Design Patterns: Parsing A File

One of the more common problems we encounter when writing software is parsing a file. Generally that involves parsing a text file–that is, turning an array or stream of characters into a data structure that can then be operated on. While this routinely comes up when designing a compiler, we see text files, and the need to parse them, literally everywhere: in configuration files, in HTTP transactions where we receive a text response to an API query, even when parsing commands internally.

There are several tools out there for parsing a file, but the general design pattern I want to talk about is the two-phase parsing process we learn about in Computer Science when talking about building a compiler.

Those two steps are “lexing” (or “tokenizing”) a stream–turning a raw array of characters into ‘tokens’ that indicate the type of thing we’re looking at (“word”, “number”, “string”, etc.)–and parsing the stream of tokens, turning them into an internal data structure that we can then directly operate on.

This two-step process is performed when we compile a program. But we also see this two-step tokenization/parsing process happen when parsing a JSON file, when parsing XML, or when parsing our own configuration files.
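
To make the two steps concrete, here is a purely illustrative example; the token names below match the ones used later in this article, but the exact labeling is up to whoever writes the tokenizer. Given the input text

{ "pi": 3.14, "ok": true }

the tokenizer would emit a stream of tokens along the lines of

'{'  STRING("pi")  ':'  NUMBER("3.14")  ','  STRING("ok")  ':'  TOKEN("true")  '}'

and the parser would then consume that stream, building an in-memory object with the fields “pi” and “ok”.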


As a side note, the idea that we’re employing here, of “chunking” big problems into little problems, is its own “design pattern” of sorts.

And “chunking” big problems into more manageable parts is a common idea we see everywhere: in the implementation of a protocol stack for network communications, where each “layer” implements just one piece of the puzzle and is only responsible for solving one problem. In a graphics pipeline, where each stage of computation is handled by a separate step which is responsible for handling only one part of the geometric transformation that results in pretty pictures on a screen. Even in the development of a computer operating system, which is generally assembled not as a single monolithic entity, but as a series of individual computer programs, each responsible for solving only one specialized problem.

This idea of chunking a problem into smaller bits is also referred to as separation of concerns, and is worth understanding in its own right. “Separation of concerns” not only covers chunking problems, but formalizes the idea, throwing in things like defining well-defined interfaces and building your code as a series of “black boxes” which can stand on their own.


Tokenization

The first step is tokenization–that is, taking the text file and discovering the “tokens” in that text file which represent our “language.”

Tokenization depends on what we’re parsing. For example, if we were interested in parsing JSON, the tokens we need to look for are specified in the JSON specification, which itself is a very small subset of Javascript. And we can use the specification on this page to build a tokenizer by hand.

The JSON specification itself describes three fundamental object types beyond the incidental one-character punctuation marks that are used to define objects: strings (which are surrounded by double quotes), numbers, and tokens (such as “true” or “false”). As JSON is based on the Javascript specification, I assume “tokens” are all symbols that start with a letter or an underscore, and are followed by letters, underscores or numbers. (So, for example, NULL would be a token, and A134 is a token. Anything a modern computer programming language would consider a ‘variable’ would be a token.)

Now whenever we read a file for tokenization, we generally want to create a mechanism to “push back” read characters. The idea here is that sometimes we have to read the next character to know it’s not part of the current token. So, for example, if we see the following:

3.14abc

We know after reading ‘3’, ‘.’, ‘1’ and ‘4’, ‘a’ is not part of the number. So we want to save ‘a’ for later use for the next token.

So we need routines which handle reading from our input stream and pushing back to our input stream. One way to handle this is with an array of pushed-back characters.

We define our tokenizer class in C++, ignoring the things we’re not talking about now:

/*  JSONLexer
 *
 *      JSON Lexer engine
 */

class JSONLexer
{
    public:
                        JSONLexer(FILE *f);
                        ~JSONLexer();
...
    private:
        FILE            *file;
        uint32_t        line;

        uint8_t         pos;
        uint8_t         stack[8];

        int             readChar();
        void            pushChar(int ch);
...
};

That is, we create a lexer which tokenizes the contents of the file, and we define the push-back stack as the internal members pos and stack.

Our implementation of the internal readChar and pushChar methods–which follow our own notion of “chunking” from above, and thus are only responsible for reading characters from, and pushing characters back to, our file–looks like this:

/*  JSONLexer::readChar
 *
 *      Read the next character in the stream. Note this reads as 8-bit bytes,
 *  and does not validate unicode as UTF-8 characters.
 */

int JSONLexer::readChar()
{
    int ret;

    if (pos > 0) {
        ret = stack[--pos];
    } else {
        ret = fgetc(file);
    }

    if (ret == '\n') ++line;
    return ret;
}

/*  JSONLexer::pushChar
 *
 *      Push the read character back for re-reading again
 */

void JSONLexer::pushChar(int ch)
{
    stack[pos++] = (uint8_t)ch;

    if (ch == '\n') --line;
}

The push routine is easiest to explain: we simply add the character to our stack. (The stack is limited to 8 characters, but we assume we won’t need to push back nearly that many characters during parsing.) We also keep track of the number of newlines we’ve read, for reporting purposes. (This isn’t strictly necessary, but becomes convenient later.)

Note that we’d write the same two routines regardless of the file we’re parsing.

That is because any text file we’re reading may require a few characters “look-ahead” to determine the token we’re currently reading. That is also true if we’re parsing the C language or parsing an XML file.


Now that we have our push-back stack, we can build the tokenizer itself. Essentially our tokenizer reads as many characters as possible that fit the pattern for “number”, “token” or “string”–and returns the text value of that object as well as the token ID for that object.

Basically, think of this as an overblown version of ‘readChar’, except that we read a token and return the token type as well as the contents of the token. (So, for example, our routine readToken may read “null” and return the value TOKEN, with the token string value set to ‘null’.)

Note that sometimes our parsers also need to read a token ahead. So we also need a flag that indicates we should return the last parsed token rather than read a new token.

The parts of our class that handle reading tokens look like this:

/*  JSONLexer
 *
 *      JSON Lexer engine
 */

class JSONLexer
{
    public:
                        JSONLexer(FILE *f);
                        ~JSONLexer();

        int             readToken();
        void            pushToken()
                            {
                                pushBack = true;
                            }

        std::string     token;
...
    private:
...
        bool            pushBack;
        int             lastToken;
};

That is, we track the last token we read, and we set a variable indicating if we should parse the next token or not. The text value of the token is in the field token.

Our tokenizer routine looks like the following. First, we determine if we’re just returning the last read token. If so, clear the flag, and return the token.

/*  JSONLexer::readToken
 *
 *      Read the next token in the stream
 */

int JSONLexer::readToken()
{
    int c;

    /*
     *  If we need to, return last token
     */

    if (pushBack) {
        pushBack = false;
        return lastToken;
    }

Now we strip out whitespace.

Note that for most programming languages and configuration files, white space is basically syntax sugar which separates tokens; it’s what makes ‘ABC DEF’ two tokens instead of one, from the parser’s perspective.

(Some languages make white space semantically important; Markdown is one example. You’d need a different tokenizer design to track white space in those languages. But for JSON, white space is not semantically meaningful, so we can strip it out.)

If we hit the end of the file, we return the end of file marker value, -1.

    /*
     *  Skip whitespace
     */

    while (isspace(c = readChar())) ;
    if (c == -1) return -1;         /* At EOF */

Now comes the interesting part. Basically we read the next character and, based on that character, we determine the thing we’re reading. For most tokenizers the next character completely defines what we’re reading: if it’s a token, a number, or a string. Some languages may require us to read a few characters ahead to figure out what we’re doing–but for JSON it’s far simpler.

Stripped of the logic of reading the rest of the token, our tokenizer looks like this:

    /*
     *  Parse strings
     */

    token.clear();
    if (c == '"') {
        ... Read the rest of the string ...
        return lastToken = STRING;
    }

    if (isalpha(c)) {
        ... Read the rest of the token ...
        return lastToken = TOKEN;
    }

    if (isdigit(c) || (c == '-')) {
        ... Read the rest of the number, or determine if the minus sign is
            just a minus sign ...
        return lastToken = NUMBER;
    }

    /*
     *  Return single character
     */

    token.push_back((char)c);
    return lastToken = c;
}
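
As an example of what one of those elided branches might look like, here is a sketch of the token-reading case. This is only a sketch (the real implementation is in the GitHub sources mentioned below); it assumes the TOKEN constant and the readChar/pushChar methods shown above:

    if (isalpha(c)) {
        /* Accumulate letters, digits and underscores into the token buffer */
        while ((c != -1) && (isalnum(c) || (c == '_'))) {
            token.push_back((char)c);
            c = readChar();
        }
        if (c != -1) pushChar(c);   /* First character past the token; save it */
        return lastToken = TOKEN;
    }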

The full source code for the class (and the rest of the JSON parser and pretty printer) can be found on GitHub.


Parsing

Now that we have a stream of tokens, we can carry out the second half of the problem, parsing the tokens and making some sense of what those tokens mean.

There are several ways we can handle the parsing problem. One conceptually easy way is through hand-rolling a recursive descent parser. The idea here is that at the top our language defines a ‘thing’ which our parser’s parsing method is responsible for parsing. As we uncover the next token we then descend into a method which parses that sub-thing, and sometimes that requires us to recursively call earlier routines to handle the parsing task.

For JSON, there are fundamentally three language constructs: a “value” which can be a string, a number, “true”, “false”, “null”, or it can be an “object”, or an “array.” Objects and arrays in turn contain values themselves.
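
Written out as an informal grammar (my notation here, not the official one from the specification), that structure is:

value  := STRING | NUMBER | "true" | "false" | "null" | object | array
object := '{' [ STRING ':' value { ',' STRING ':' value } ] '}'
array  := '[' [ value { ',' value } ] ']'

Note the recursion: a value may be an object or an array, and objects and arrays in turn contain values.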

Our JSON language parser explicitly defines three methods, each which parses a “value”, an “object” and an “array”. (Note during the parsing process an internal data structure of the JSON object is built inside our class–but for the purposes of the discussion that’s not as important now.)

/*  JSONParser::parseValue
 *
 *      Parse the value. This assumes we are at the start of a value to parse,
 *  and assumes the next token is either a start of object, start of array,
 *  true, false, null, a number or a string
 */

bool JSONParser::parseValue()
{
    ... Parse the value. If we see a '{', call 'parseObject'. If we see a '[', call 'parseArray' ...
}

/*  JSONParser::parseObject
 *
 *      Parse the object. This is called after the '{' token is seen
 */

bool JSONParser::parseObject()
{
    ... Parse the object. When we need to parse a value, call 'parseValue' ...
}

/*  JSONParser::parseArray
 *
 *      Parse the array object
 */

bool JSONParser::parseArray()
{
    ... Parse the array. Values in the array are parsed by calling 'parseValue' ...
}

Side note: One aspect we use when building our parser here is that we take a page from the playbook of the SAX parser for XML: instead of having our JSONParser object construct a data representation of the JSON object as we parse it, our parsing routines call several abstract methods indicating ‘events’, like “start object” and “start array.” This allows our parser to focus on just parsing the text, and defers the logic of what to do as we see different symbols to a separate class.

You can see this in the way we separate the concerns: the JSONParser class parses the file and fires the 11 separate abstract virtual methods in a specific order as we encounter values or objects. The separate JSONRecordParser extends our parser and implements the interface routines, using them to assemble a data structure which we can use elsewhere.

By separating the concerns in this fashion, in theory you could create your own class as a child class of JSONParser, intercepting the events and handling them in a different way.
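
To give a flavor of what that looks like, the abstract parser might declare something like the following. (The method names and count here are illustrative; the actual interface in the GitHub sources differs in its details.)

class JSONParser
{
    public:
        bool            parse(JSONLexer *lexer);

    protected:
        /* Events fired as constructs are encountered during parsing */
        virtual void    startObject() = 0;
        virtual void    endObject() = 0;
        virtual void    startArray() = 0;
        virtual void    endArray() = 0;
        virtual void    objectKey(const std::string &key) = 0;
        virtual void    stringValue(const std::string &value) = 0;
        virtual void    numberValue(double value) = 0;
        virtual void    booleanValue(bool value) = 0;
        virtual void    nullValue() = 0;
};

A child class then overrides these methods to build whatever in-memory representation it needs–or to do something else entirely.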


Overall, then, our parsing process is invoked at the top level with the calls:

JSONLexer lexer(f);
JSONRecordParser parser;
JSONNode *node = parser.parse(&lexer);

And this is a hallmark of our overall two-step pattern: each step is represented as a separate class which handles its own operations. The second class then returns a data structure or object which represents what we parsed, and we can then use the contents of that item as we need to later in our code.


Conclusion

Whenever you see a text file that needs to be parsed, consider breaking the problem into a tokenization step, where you break the text file into a series of tokens, and a separate parsing step, which parses the tokens and makes sense of what they represent. Usually the parser will wind up building an internal data structure: a parse tree, a document object model or an abstract syntax tree. Sometimes, though, you may wind up employing other algorithms, such as the shunting yard algorithm to parse infix-notation mathematics, evaluating the expressions as they are parsed.

There are also tools that make building such parsers easier, though they are not strictly required in order to build a parser. Sometimes building the parser and tokenizer by hand gives you greater flexibility in catching and correcting syntax errors in the input text.

Design Patterns: Introduction

A design pattern is defined as a “general, reusable solution to a commonly occurring problem within a given context in software design.” The idea is that when you encounter a problem, a design pattern provides a ready-to-use template for helping to solve it.

If you look at the Wikipedia article linked above, it then drops into a long list of things called “design patterns.” And while they are useful, I have never really cared for the way we use the term “design pattern” in practice, because many of them are so “small,” in a sense. A lot of design patterns seem applicable to user interface design–which is good, I suppose–but many of them aren’t more than a way to rearrange objects in an object-oriented system. (And as the criticism section of the article notes, many of these “design patterns” are more a reflection of missing features in a programming language than they are true “patterns.”)

For me, the more interesting design patterns I have encountered in my life are far more complex than this. I see them often enough in my coding that I resort to these “algorithmic” design patterns to solve far more complex problems than “how to unspaghettify my code.”


I think it’s also worth noting the difference between a “pattern” and an “algorithm.”

To me, an algorithm is something that can be built as a self-contained unit. For example, sorting an array can be written in most modern languages using things like Java generics: define the object, specify the comparator which compares objects, and toss it all to a sorting routine with the generic specified as the object you’re sorting. Things like red-black trees can similarly be coded in self-contained packages, in such a way that you simply specify the object you want stored–and you’re good to go.

The implementation details at that point, in a well-written library, become almost academic: you have no need to know how merge sort or red-black trees work; just pass your data to Arrays.sort() or create a TreeSet and call it a day.
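
The same point holds outside of Java. As a sketch in C++ terms (the Employee type and comparator here are made up for illustration), the entire sorting problem reduces to supplying a comparison function:

#include <algorithm>
#include <string>
#include <vector>

struct Employee
{
    std::string name;
    int badge;
};

/* The comparator is the only piece we write; the sort itself is a black box */
static bool CompareByName(const Employee &a, const Employee &b)
{
    return a.name < b.name;
}

void SortRoster(std::vector<Employee> &roster)
{
    std::sort(roster.begin(), roster.end(), CompareByName);
}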

But some things we need to do are as much “design pattern” as they are “algorithm”; that is, they’re techniques which cannot exactly be packaged, but are larger than a pattern like the observer pattern.


So that is the point of this series of essays: to describe design patterns that are much bigger and more interesting than the design patterns which are arguably just missing features in a programming language. I may touch upon more interesting variations of existing design patterns as I encounter them, and I may discuss things that are not quite ‘algorithmic design patterns’ so much as ‘ways to solve problems’ that go beyond simple design patterns.

Mostly when I encounter a thing that’s interesting to me, I plan to describe it here.

Updated “Metal Introduction” Document.

I’ve added a section showing the process of implementing Constructive Solid Geometry on the fly in Metal, using the algorithms outlined in the paper An improved z-buffer CSG rendering algorithm, with example code uploaded to GitHub.

My goal is to eventually turn this into a CSG library for Metal.

The updated paper (with the additional section) can be downloaded from here: Metal: An Introduction. It supersedes the paper from my prior post.

And when you put it all together you should get:

[Screenshot: the example CSG object rendered with Metal]

OCTools

OCTools is a suite of tools which serve as a plug-in replacement for yacc and lex. The goal is to provide a (roughly) source-compatible tool which can convert yacc and lex grammars into Objective C output, for building parsers that run on MacOS and iOS.

The source kit can be found on GitHub.

Both tools can be compiled on the Macintosh using Xcode, and generate a command-line tool which can be run from the terminal or included into an Xcode project. (Both tools are built in C, so they should be portable to other platforms; however, I haven’t done the port so I’m not sure how successful porting would be.)

The goal of this project was to create Yacc and Lex analogs which generate re-entrant Objective C classes which implement the parsers, and for the Lex analog to use an Objective-C protocol definition for the input file, for maximum flexibility.


More information can be found here: OCTools.

3D Clipping in Homogeneous Coordinates.

Recently I got an Arduboy, and I thought it’d be an interesting exercise to blow the dust off my computer graphics knowledge and put a simple 3D vector drawing system together. After all, while by today’s standards 16 MHz doesn’t sound like a lot, back when I was learning computer graphics at Caltech in the 1980’s, in Jim Blinn’s computer graphics class, that was quite a bit of computational power.

A limiting factor on the Arduboy is available RAM; after you factor in the off-screen bitmap for drawing, you only have perhaps 1250 bytes left–not a lot for a polygonal renderer. But when a 4×4 transformation matrix fits into 64 bytes, that’s plenty of space for a vector graphics pipeline.

The first step in building a vector graphics pipeline, of course, is building a 3D line clipping system; you want to be able to clip lines not just at the left and right, top and bottom, but also at the near and far clipping planes. And you do want to clip before you draw the lines; on the Arduboy, line drawing coordinates are represented as 16-bit signed integers, and with a 3D rendering system it would be very easy to blow those limits.

Sadly, however, I was unable to find a concise description of the homogeneous coordinate system clipping algorithm that I used when I was at Caltech. It’d been 30 years, and I had a vague memory of how we did it–how we combined the Cohen-Sutherland algorithm with Clipping using homogeneous coordinates and some general observations made during class to build an efficient 3D line clipper. And an Internet search yielded very little.

So I thought I’d write an article here, in case in another 20 years I need to dip back into the archives to build yet another line clipping system.

If you want to skip the math and get to the algorithm, scroll down to the section marked “Algorithm.”


Some preliminaries, before we get started.

In Computer Graphics, it is advantageous to represent 3D points using homogeneous coordinates. That is, for any point in 3D space (x,y,z), you add an additional term w, giving (x,y,z,w).

To convert from a homogeneous coordinate (x,y,z,w), you use the following relationship:

(x, y, z, w) \to \left( \frac{x}{w},\ \frac{y}{w},\ \frac{z}{w} \right)

Typically when we convert a point from 3D space to homogeneous coordinates we simply tack a 1 at the end. There are special cases where you may want to do something different (such as representing stars plotted infinitely far away), but typically you simply tack a 1 on:

(x, y, z) \to (x, y, z, 1)

The purpose of this extra parameter is to simplify coordinate transformation through transformation matrices. This extra 1 term, for example, allows us to represent translation (that is, moving a point from one location to another) as a simple matrix multiplication. So if we are moving the points in our system over by (5,3,2), we can create a 4×4 translation matrix:

T = \begin{pmatrix} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix}

and a point (x,y,z) would be translated through multiplication:

T \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} x + 5 \\ y + 3 \\ z + 2 \\ 1 \end{pmatrix}


One nice thing about using homogeneous coordinates is that we can represent perspective through a matrix multiplication. I’ll save you the diagrams you see on other countless web sites by noting that to represent point perspective, you divide the location (x,y) by the distance z; objects far away appear smaller and closer objects appear larger.

If we use a perspective matrix which adjusts the size (x,y) to fit our screen (so objects to the far left are at x = -1, objects to the far top are at y = -1, right and bottom are at +1 respectively), then a point at (x,y,z) would map in our perspective transform:

(x, y, z, 1) \mapsto (a x,\ b y,\ c z + d,\ -n z)

(Here a and b scale x and y to fit the screen; the constants c, d and n encode how depth is mapped, and their exact values don’t matter for the clipping discussion.)

(Side note: we use a right hand rule for coordinates. This means if up is +y and right is +x, then distance is -z. Which brings me to a rule of thumb with computer graphics: if the image looks wrong, look for an incorrect minus sign. While writing this article and testing the code below, I encountered at least 4 places where I had the wrong minus sign somewhere.)

Now if we convert our coordinates using the homogeneous coordinate to 3D point transformation above:

(a x,\ b y,\ c z + d,\ -n z) \to \left( \frac{a x}{-n z},\ \frac{b y}{-n z},\ \frac{c z + d}{-n z} \right)

That is, our 3D point (in what we call “screen space”) has the point perspective transformation baked in, and points in 3D space at (x,y,z) show up with perspective on our 2D screen: we just drop the third term and plot points on our screen at the location (ax/-nz,by/-nz).


Notice that we’ve set our boundaries for screen space with the screen space rectangle (-1,-1)-(1,1). It is advantageous for us to clip our drawings at those boundaries, and (for reasons I’ll skip here), it helps to also clip Z coordinates at a near clipping plane (generally z=-1), and a far clipping plane (z=-∞).

The rest of this article will describe in concrete steps how to do this clipping in homogeneous coordinates, using the two articles cited at the start of this article.


First, let us note that to clip in the screen space x/w >= -1, in homogeneous coordinates we really are clipping at x >= -w.

Second, let us define what we mean by clipping. When we say we are clipping a line, what we’re doing is finding the point where a line intersects our clipping plane x >= -w. We then shorten the line at that new point, drawing only the segment that is visible.

So, given two points A and B, we want to find the point C where C intersects the plane x = -w, or rather, where x + w = 0:

[Figure: the line from A to B crossing the clipping plane x + w = 0 at the point C]

We can do this by observing that the dot product of a point and a vector is the distance that point is along the vector, and that our clipping plane x + w = 0 can also be defined as the set of points P where

P \cdot (1, 0, 0, 1) = 0

So to find our point C, we calculate the distance along the vector (1,0,0,1) (that is, x = -w) for the points A and B, and linearly interpolate to find the location C.

\alpha = \frac{A \cdot (1,0,0,1)}{A \cdot (1,0,0,1) - B \cdot (1,0,0,1)}, \qquad C = A + \alpha\,(B - A)

There are some other useful observations we can make about the set of relationships above.

First, if x is greater than or equal to -w–that is, if we are on the inside of the clipping plane x/w >= -1–then the dot product A(1,0,0,1) will be greater than or equal to 0.

Second, if A(1,0,0,1) and B(1,0,0,1) are both negative, then trivially our line is not visible, because both endpoints of the line are not visible.

And we can repeat this operation for all the relationships: for x >= -w, x <= w, y >= -w, y <= w, and z >= -w.

The far clipping plane can also be taken care of, by observing that our far clipping plane at infinity is reached as z/w approaches infinity; that is, as w approaches zero. So our far clipping plane would be the relationship z <= 0.

This gives us the following normal vectors representing our 6 clipping planes:

x >= -w:  (1, 0, 0, 1)
x <= w:   (-1, 0, 0, 1)
y >= -w:  (0, 1, 0, 1)
y <= w:   (0, -1, 0, 1)
z >= -w:  (0, 0, 1, 1)
z <= 0:   (0, 0, -1, 0)

Note that the vectors are set up so that when we are outside of a clipping plane, the value of the dot product is negative: that is, if the relationship x <= w holds, then 0 <= w - x, and our clipping plane vector for that relationship is (-1 0 0 1).


The algorithm

This allows us to start sketching our solution.

Note our definition of a 3D vector is a homogeneous vector:

struct Vector3D
{
    double x,y,z,w;
};

First, we need a method to calculate all our clipping dot products. Note that because we’ve transformed our process into clipping against a set of vectors, the routine below could be generalized to add additional clipping planes. (For example, you may want to create a renderer that only renders the stuff in front of a wall in your game.)

static double Dot(int clipPlane, const Vector3D &v)
{
    switch (clipPlane) {
        case 0:
            return v.x + v.w;        /* v * (1 0 0 1) */
        case 1:
            return - v.x + v.w;      /* v * (-1 0 0 1) */
        case 2:
            return v.y + v.w;        /* v * (0 1 0 1) */
        case 3:
            return - v.y + v.w;      /* v * (0 -1 0 1) */
        case 4:
            return v.z + v.w;        /* v * (0 0 1 1) */
        case 5:
            return - v.z;            /* v * (0 0 -1 0) */
    }
    return 0; /* Huh? */
}

Now we borrow from the Cohen-Sutherland algorithm, and create a method which generates an “outcode”; this is a bitmap with 6 bits, where each bit is set if the point is “outside” the corresponding clipping plane–that is, if the dot product for that plane is negative.

static uint8_t OutCode(const Vector3D &v)
{
    uint8_t outcode = 0;
    for (int i = 0; i < 6; ++i) {
        if (Dot(i,v) < 0) outcode |= (1 << i);
    }
    return outcode;
}

(Note that we could easily optimize this method by using direct compares rather than by calling into our Dot method. We can also generalize it if the dot product method above is generalized.)
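
A direct-compare version might look like the following sketch; given our six plane vectors, it computes exactly the same bits as the loop above:

static uint8_t OutCode(const Vector3D &v)
{
    uint8_t outcode = 0;
    if (v.x < -v.w) outcode |= 0x01;    /* x >= -w fails */
    if (v.x >  v.w) outcode |= 0x02;    /* x <= w fails  */
    if (v.y < -v.w) outcode |= 0x04;    /* y >= -w fails */
    if (v.y >  v.w) outcode |= 0x08;    /* y <= w fails  */
    if (v.z < -v.w) outcode |= 0x10;    /* near plane    */
    if (v.z > 0)    outcode |= 0x20;    /* far plane     */
    return outcode;
}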

A nice feature about our OutCodes is that if we have the outcodes for points A and B, the following relationships are true:

  • if ((a_outcode & b_outcode) != 0) then both points are on the wrong side of at least one edge, and we can trivially reject this line.
  • if ((a_outcode | b_outcode) == 0) then both points are inside our viewing area, and we can trivially accept this line.

Now if neither relationship holds–the line is neither trivially rejected nor trivially accepted–then we must clip our line A – B. And we can do that by calculating the alpha coordinates along A and B: if A is outside, we track the alpha on the A side of the line, and if B is outside, we track an alpha along the B side. And if the two flip, then we have a case like the one shown below:

[Figure: a line whose endpoints lie outside two different clipping planes; the clipped spans flip past each other, so no part of the line is visible]

This allows us to set up our final clipping algorithm.

First, we track the last point that was added, and the out code for that point:

static Vector3D oldPos;      // The last point we moved or drew to
static uint8_t oldOutCode;   // The last outcode for the old point.

We also rely on a drawing routine which actually draws in screen space. This should take the 3D homogeneous point provided, transform (using the transformation to 3D space noted above) to a 3D point, drop the Z component, convert the (x,y) components from the rectangle (-1,-1) – (1,1) to screen pixels, and draw the line as appropriate:

void MoveDrawScreen(bool draw, const Vector3D &v);
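
A minimal sketch of that routine, assuming the Arduboy’s 128×64 display and a hypothetical DrawLine(x0,y0,x1,y1) primitive (the real routine would use whatever line-drawing call the platform provides), might look like:

static int16_t lastX, lastY;    /* Last screen location we moved or drew to */

void MoveDrawScreen(bool draw, const Vector3D &v)
{
    /* Convert homogeneous coordinates to screen pixels. The clipper has
       already guaranteed -1 <= x/w <= 1 and -1 <= y/w <= 1. */
    int16_t x = (int16_t)(64 + 63 * (v.x / v.w));
    int16_t y = (int16_t)(32 - 31 * (v.y / v.w));   /* Screen y grows downward */

    if (draw) DrawLine(lastX, lastY, x, y);
    lastX = x;
    lastY = y;
}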

One last method we need is a utility method that, given a start point and an end point, and an alpha, calculates the linear interpolation between the start and end points.

static void Lerp(const Vector3D &a, const Vector3D &b, float alpha, Vector3D &out)
{
    float a1 = 1.0f - alpha;
    out.x = a1 * a.x + alpha * b.x;
    out.y = a1 * a.y + alpha * b.y;
    out.z = a1 * a.z + alpha * b.z;
    out.w = a1 * a.w + alpha * b.w;
}

Now our method declaration takes a new point and a flag that is true if we’re drawing a line, or false if we are moving to the point:

void MoveDrawClipping(bool draw, const Vector3D &v)
{
    Vector3D lerp;
    float aold;
    float anew;
    float alpha;
    float olda,newa;
    uint8_t m;
    uint8_t i;
    uint8_t newOutCode;
    uint8_t mask;

    /*
     *    Calculate the new outcode
     */

    newOutCode = OutCode(v);

In the above, we calculate the outcode for our new point v. This allows us to quickly determine if the point is inside or outside our visible area, which is useful when we want to just move to a point rather than draw a line:

    /*
     *  If we are not drawing, and the point we're moving to is visible,
     *  pass the location upwards.
     */

    if (!draw) {
        if (newOutCode == 0) {
            MoveDrawScreen(draw,v);
        }
    } else {

If draw is true, then we’re drawing a line, and so we get to the meat of our clipping algorithm. Note that our old out code and old points (stored in the globals above) have already been initialized. (Or rather, we assume they have been with an initial move call.)

So we can borrow from the Cohen-Sutherland algorithm and quickly determine if our line is entirely clipped or entirely visible:

        /*
         *  Fast accept/reject, from the Cohen-Sutherland algorithm.
         */

        if (0 == (newOutCode & oldOutCode)) {
            /*
             *  The AND product is zero, which means the line may be
             *  visible. Calculate the OR product for fast accept.
             *  We also use the mask to quickly ignore those clipping
             *  planes which do not intersect our line
             */

            mask = newOutCode | oldOutCode;
            if (0 == mask) {
                /*
                 *  Fast accept. Just draw the line. Note that our
                 *  code is set up so that the previous visible line has
                 *  already been moved to
                 */

                MoveDrawScreen(true,v);
            } else {

If we get to the code below, this means we have a value for mask which indicates the clipping planes that we intersect with. So now we want to iterate through all the clipping planes, calculating the dot product for only those planes we intersect with:

                /*
                 *  At this point mask contains a 1 bit for every clipping
                 *  plane we're clipping against. Calculate C alpha to find
                 *  C, and adjust the endpoints A alpha (aold) and B alpha
                 *  (anew)
                 */

                aold = 0;            /* (1 - alpha) * old + alpha * new = pt */
                anew = 1;            /* so alpha = 0 == old, alpha = 1 == new */
                m = 1;
                for (i = 0; i < 6; ++i) {
                    if (mask & m) {

                        /* Calculate alpha; the intersection along the line */
                        /* vector intersecting the specified edge */

                        olda = Dot(i,oldPos);
                        newa = Dot(i,v);
                        alpha = olda/(olda-newa);

And next we need to bring in the side of the line which we know has clipped against the clipping plane. Meaning if A is outside our visible area and B is inside, we want to adjust Aα to the new value calculated in alpha; but if A is inside and B outside, we want to adjust Bα instead.

                        /* Now adjust the aold/anew endpoints according to */
                        /* which side (old or v) is outside. */
                        if (oldOutCode & m) {
                            if (aold < alpha) aold = alpha;
                        } else {
                            if (anew > alpha) anew = alpha;
                        }

And finally if somehow the edges cross, we have the circumstance above where the line is not visible but crosses two clipping planes. So we quickly reject that case:

                        if (aold > anew) {
                            /*
                             *  Our line is not visible, so reject.
                             */
                            break;
                        }

The rest simply closes our loop:

                    }
                    m <<= 1;
                }

At this point if we have iterated through all six planes, we have a line which is visible. If our starting point in oldPos was not visible, then we need to move to the place where our starting point intersects with the clipping boundaries of our view space:

                if (i >= 6) {
                    /* Ran all clipping edges. */
                    if (oldOutCode) {
                        /* Old line coming into view */
                        Lerp(oldPos,v,aold,lerp);
                        MoveDrawScreen(false,lerp);
                    }

And if our destination point is visible, we draw to it; otherwise, we draw to the point where our line crosses the clipping boundary:

                    /* Draw to the new point */
                    if (newOutCode) {
                        Lerp(oldPos,v,anew,lerp);
                        MoveDrawScreen(true,lerp);
                    } else {
                        MoveDrawScreen(true,v);
                    }

Our clipping is done; all that is left is to close up the loops and conditionals, and store away in our globals the outcode for the point we just moved to, as well as the location of that point:

                }
            }
        }
    }

    /*
     *    Now update the old point to what we just moved to
     */

    oldOutCode = newOutCode;
    oldPos = v;
}

There are a number of optimizations that can take place with the clipping code above. For example, we can track an array of dot products so we’re not repeatedly calculating the same dot product. We could replace our outcode routine with simple compare operations. We could also extend the above routine to handle additional clipping planes.
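
As a sketch of the first idea (the array name and placement here are illustrative), we can compute the six dot products for each point once, reuse them to build the outcode, and then reuse them again in the clipping loop:

static double oldDots[6];       /* Cached dot products for oldPos */

/* Inside MoveDrawClipping: compute the dot products for v once... */
double newDots[6];
for (i = 0; i < 6; ++i) newDots[i] = Dot(i,v);

/* ...build the outcode from the cached values... */
newOutCode = 0;
for (i = 0; i < 6; ++i) {
    if (newDots[i] < 0) newOutCode |= (uint8_t)(1 << i);
}

/* ...then use newDots[i] and oldDots[i] in place of Dot(i,v) and
   Dot(i,oldPos) in the clipping loop, and save the values along with
   the point at the bottom of the routine: */
for (i = 0; i < 6; ++i) oldDots[i] = newDots[i];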

But this gives us a vector line drawing clipping system that clips in homogeneous coordinates, complete with source code that can be compiled and executed.


What is old is new again; what started as a powerful desktop computer has now appeared as an embedded processor. So it’s good to remember the old techniques as we see more and more microcontrollers show up in the products we use.

Finding the boundary of a one-bit-per-pixel monochrome blob

Recently I’ve had need to develop an algorithm which can find the boundary of a 1-bit per pixel monochrome blob. (In my case, each pixel had a color quality which could be converted to a single-bit ‘true’/’false’ test, rather than a literal monochrome image, but essentially the problem set is the same.)

In this post I intend to describe the algorithm I used.


Given a 2D monochrome image with pixels aligned in an X/Y grid:

[Figure: an arbitrary 2D blob of set pixels on a grid]

We’d like to find the “boundary” of this blob. For now I’ll ignore the problem of what the “boundary” is, but suffice it to say that with the boundary we can discover which pixels inside the blob border the exterior, which pixels on the outside border the pixels inside the blob, or find other attributes of the blob, such as a rectangle which bounds it.

If we treat each pixel as a discrete 1×1 square rather than a dimensionless point, then I intend to define the boundary as the set of (x,y) coordinates which define the borders around each of those squares, with the upper-left corner of the square representing the coordinate of the point:

[Figure: a pixel as a discrete 1×1 square, with the upper-left corner of the square representing its coordinate]

So, in this case, I intend to define the boundary as the set of (x,y) coordinates which define the border between the black and the white pixels:

[Figure: the blob with the boundary between the black and white pixels outlined]

We do this by walking the edge of the blob through a set of very simple rules which effectively walk the boundary of the blob in a clock-wise fashion. This ‘blob walking’ only requires local knowledge at the point we are currently examining.

Because we are walking the blob in a clock-wise fashion, it is easy to find the first point in a blob we are interested in walking: through iteratively searching all of the pixels from upper left to lower right:

(Algorithm 1: Find first point)

-- Given: maxx the maximum width of the pixel image
          maxy the maximum height of the pixel image

   Returns the found pixel for an eligible blob or 'PixelNotFound'.

for (int x = 0; x < maxx; ++x) {
    for (int y = 0; y < maxy; ++y) {
        if (IsPixelSet(x,y)) {
            return Pixel(x,y);
        }
    }
}
return PixelNotFound;

Once we have found our eligible pixel, we can start walking clockwise, tracking each of the coordinates we find as we walk around the perimeter of the blob, traversing either horizontally or vertically.


Given the way we’ve encountered our first pixel in the algorithm above, the pixels around the immediate location at (x,y) look like the following:

[Figure: the 2×2 pixel neighborhood around (x,y), with only the pixel at (x,y) set]

That’s because of the way we iterated through the image: the first pixel we encountered at (x,y) implies that (x-1,y), (x,y-1) and (x-1,y-1) must all be clear. Also, if we are to progress in a clock-wise fashion, clearly we should move our current location from (x,y) to (x+1,y):

[Figure: the current location advancing from (x,y) to (x+1,y) along the top edge of the first pixel]


The design of our algorithm proceeds in a similar fashion: by examining each of the 16 possible pixel configurations we can find in the surrounding 4 pixels, and tracking which of the 4 possible incoming paths we took, we can construct a matrix of directions to take in order to continue progressing clockwise around our eligible blob. In all but two of the configuration cases there is only one possible incoming path we could have taken to get to the center point, since, as we presume we are following the edge of the blob, we could not have entered between two black pixels or between two white pixels. Some combinations are also illegal because we presume we are walking around the blob in a clock-wise fashion rather than a counter-clockwise fashion. (This means that, standing at the point location and facing the direction of travel, set pixels should be on the right and never on the left. There is a proof of this which I will not sketch here.)

The 16 possible configurations and the outgoing paths we can take are illustrated below:

[Figure: table of the 16 pixel configurations and, for each of the four incoming directions, the valid outgoing path]

The top of this table shows the four possible incoming directions: from the left, from the top, from the right and from the bottom. Each of the 16 possible pixel combinations is shown from top to bottom, and blanks indicate where an incoming path was illegal–either because it comes in between two blacks or two whites, or because the path would have placed the black pixel on the left of the incoming line rather than on the right.

Note that, with only two exceptions, each possible combination of pixels produces only one valid outgoing path. For those two exceptions we arbitrarily pick the one of the two possible outgoing paths which excludes the diagonal pixel; we could just as easily have gone the other way and included the diagonal, but that may have had the property of including blobs with holes. (Whether this is acceptable or not depends on how you are using the algorithm.)

This indicates that we could easily construct a switch statement, converting each possible row into an integer from 0 to 15:

Algorithm 2: converting to a pixel value.

-- Given: IsPixel(x,y) returns true if the pixel is set and false 
          if it is not set or if the pixel is out of the range from
          (0,maxx), (0,maxy)
   Return an integer value from 0 to 15 indicating the pixel combination

int GetPixelState(int x, int y)
{
    int ret = 0;
    if (IsPixel(x-1,y-1)) ret |= 1;
    if (IsPixel(x,y-1))   ret |= 2;
    if (IsPixel(x-1,y))   ret |= 4;
    if (IsPixel(x,y))     ret |= 8;
    return ret;
}

We now can build our switch statement:

Algorithm 3: Getting the next pixel location

-- Given: the algorithm above to get the current pixel state,
          the current (x,y) location,
          the incoming direction dir, one of LEFT, UP, RIGHT, DOWN

   Returns the outgoing direction LEFT, UP, RIGHT, DOWN or ILLEGAL if the
   state was illegal.

Note: we don't test the incoming path when there was only one choice. We
could, by adding some complexity to this algorithm, for testing purposes.
The values below are obtained from examining the table above.

int state = GetPixelState(x,y);
switch (state) {
    case 0:    return ILLEGAL;
    case 1:    return LEFT;
    case 2:    return UP;
    case 3:    return LEFT;
    case 4:    return DOWN;
    case 5:    return DOWN;
    case 6:    {
                   if (dir == RIGHT) return DOWN;
                   if (dir == LEFT) return UP;
                   return ILLEGAL;
               }
    case 7:    return DOWN;
    case 8:    return RIGHT;
    case 9:    {
                   if (dir == DOWN) return LEFT;
                   if (dir == UP) return RIGHT;
                   return ILLEGAL;
               }
    case 10:   return UP;
    case 11:   return LEFT;
    case 12:   return RIGHT;
    case 13:   return RIGHT;
    case 14:   return UP;
    case 15:   return ILLEGAL;
}

From all of this we can now easily construct an algorithm which traces the outline of a blob. First, we use Algorithm 1 to find an eligible point. We then set ‘dir’ to UP, since the pixel state we discovered was pixel state 8, and the only legal incoming direction had we been tracing around the blob was UP.

We store away the starting position (x,y), and then we iterate through Algorithm 3, getting new directions and updating (x,y)–moving to (x+1,y) for RIGHT, (x,y-1) for UP, (x-1,y) for LEFT and (x,y+1) for DOWN, with y growing downward from the upper left–until we arrive back at our starting point.

As we traverse the perimeter, we could either build an array of found (x,y) values, or we could do something else–such as maintaining a bounding rectangle, bumping the boundaries as we find (x,y) values that fall outside the rectangle’s current boundaries.
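
Putting the pieces together, the whole trace might look like the following sketch, where Step() stands in for Algorithm 3 and AddBoundaryPoint() is whatever per-point bookkeeping you want to do (both names are made up for illustration):

int x = startX;                    /* Starting point found by Algorithm 1 */
int y = startY;
int dir = UP;                      /* Only legal incoming direction for state 8 */

do {
    AddBoundaryPoint(x,y);         /* Collect points, bump a bounding rect, etc. */

    dir = Step(x,y,dir);           /* Algorithm 3: get the outgoing direction */
    if (dir == ILLEGAL) break;     /* Should not happen on a well-formed blob */

    switch (dir) {                 /* Advance the corner point */
        case RIGHT: ++x; break;
        case UP:    --y; break;
        case LEFT:  --x; break;
        case DOWN:  ++y; break;
    }
} while ((x != startX) || (y != startY));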

On Memory Leaks in Java and in Android.

Just because it’s a garbage collected language doesn’t mean you can’t leak memory or run out of it. Especially on Android where you get so little to begin with.

Now of course sometimes the answer is that you just need more memory. If your program is a Java command line program to load the entire road map of the United States to do some network algorithms, you probably need more than the default JVM configurations give you.

Sometimes it’s not even a full-on leak, but a large chunk of memory that isn’t being released in time, as a consequence of some holder object hanging on to it longer than necessary.

There are some tools that can help. With Android, you can use DDMS to get an idea of what’s going on, and you can even dump a snapshot of the heap by using the Dump HPROF File option. (You can also programmatically capture uncaught exceptions on startup of your application or activity, and dump an hprof file within the exception handler, like so:)

public void onCreate(Bundle savedInstanceState)
{
...
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler()
    {
        @Override
        public void uncaughtException(Thread thread, Throwable ex)
        {
            try {
                File f = new File(Environment.getExternalStorageDirectory(),"error.hprof");
                String path = f.getAbsolutePath();
                Debug.dumpHprofData(path);
                Log.d("error", "HREF dumped to " + path);
            }
            catch (IOException e) {
                Log.d("error","Huh?",e);
            }
        }
    });
...
}

Of course, once you have an .hprof file from Android, you have to convert it to something that can be used by an application such as the Eclipse Memory Analyzer tool, using the hprof-conv command-line application included as part of the Android SDK. There is more information on how to do this, and on how to use the MAT tool, here: Attacking memory problems on Android.

One place where I’ve been running into issues is with a clever little bit of code which loads images from a separate thread from a remote resource, and puts them into a custom view that replaces the ImageView class. This little bit of code creates a background thread which is used to talk to a remote server to download images; once the image is loaded, a callback causes the custom view to redraw itself with the correct contents. A snippet of that code is below:

/*  Cache.java
 *
 *  Created on May 15, 2011 by William Edward Woody
 */

package com.chaosinmotion.android.utils;

import java.io.InputStream;
import java.util.Iterator;
import java.util.LinkedList;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Handler;

public class Cache
{
    /**
     * Our callback interface
     */
    public interface Callback
    {
        void loaded(String url, Bitmap bitmap);
        void failure(String url, Throwable th);
    }
    
    /**
     * Item in the queue which is waiting to be processed by our network thread(s)
     */
    private static class QueueItem
    {
        String url;
        Callback callback;
        
        QueueItem(String u, Callback c)
        {
            url = u;
            callback = c;
        }
    }
    
    /// The handler to thread to the UI thread
    private Handler fHandler;
    /// The event queue
    private LinkedList<QueueItem> fQueue;
    /// The global cache object, which will be created by the class loader on load.
    /// Because this is normally called from our UI objects, this means our Handler
    /// will be created on our UI thread
    public static Cache gCache = new Cache();
    
    /**
     * Internal runnable for our background loader thread
     */
    private class NetworkThread implements Runnable
    {
        public void run()
        {
            // Start HTTP Client
            HttpClient httpClient = new DefaultHttpClient();
            
            for (;;) {
                /*
                 * Dequeue next request
                 */
                QueueItem q;

                synchronized(fQueue) {
                    // Standard guarded wait: a spurious wakeup simply
                    // re-tests whether the queue has anything in it
                    while (fQueue.isEmpty()) {
                        try {
                            fQueue.wait();
                        }
                        catch (InterruptedException e) {
                        }
                    }

                    /*
                     * Get the next item
                     */
                    q = fQueue.removeLast();
                }
                
                /*
                 * Read the network
                 */
                
                try {
                    /*
                     * Set up the request and get the response
                     */
                    HttpGet get = new HttpGet(q.url);
                    HttpResponse response = httpClient.execute(get);
                    HttpEntity entity = response.getEntity();
                    
                    /*
                     * Get the bitmap from the URL response
                     */
                    InputStream is = entity.getContent();
                    final Bitmap bmap = BitmapFactory.decodeStream(is);
                    is.close();

                    entity.consumeContent();
                    
                    /*
                     * Send notification indicating we loaded the image on the
                     * main UI thread
                     */
                    final QueueItem qq = q;
                    fHandler.post(new Runnable() {
                        public void run()
                        {
                            qq.callback.loaded(qq.url,bmap);
                        }
                    });
                }
                catch (final Throwable ex) {
                    final QueueItem qq = q;
                    fHandler.post(new Runnable() {
                        public void run()
                        {
                            qq.callback.failure(qq.url,ex);
                        }
                    });
                }
            }
            
//            httpClient.getConnectionManager().shutdown();
        }
    }
    
    /**
     * Start up this object
     */
    private Cache()
    {
        fHandler = new Handler();
        fQueue = new LinkedList<QueueItem>();
        Thread th = new Thread(new NetworkThread());
        th.setDaemon(true);
        th.start();
    }
    
    /**
     * Get the singleton cache object
     */
    public static Cache get()
    {
        return gCache;
    }
    
    /**
     * Get the image from the remote service. This will call the callback once the
     * image has been loaded
     * @param url
     * @param callback
     */
    public void getImage(String url, Callback callback)
    {
        synchronized(fQueue) {
            fQueue.addFirst(new QueueItem(url,callback));
            fQueue.notify();
        }
    }
}

Now what this does is rather simple: we have a queue of items which are put into a linked list, and our background thread loads those items, one at a time. Once the item is loaded, we call our callback so the image can then be handled by whatever is using the service to load images from a network connection.

Of course we can make this far more sophisticated; we can save the loaded files to a cache, we can collapse multiple requests for the same image so we don’t try to load it repeatedly. We can also make the management of the threads more sophisticated by creating a thread group of multiple threads all handling network loading.

We can then use this with a custom view class to draw the image, drawing a temporary image showing the real image hasn’t been loaded yet:

/*  RemoteImageView.java
 *
 *  Created on May 15, 2011 by William Edward Woody
 */

package com.chaosinmotion.android.utils;

import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.Canvas;
import android.graphics.Color;
import android.graphics.Paint;
import android.view.View;

public class RemoteImageView extends View
{
    private Paint fPaint;
    private Bitmap fBitmap;
    private String fURL;

    public RemoteImageView(Context context)
    {
        super(context);
        // TODO Auto-generated constructor stub
    }
    
    public void setImageURL(String url)
    {
        fBitmap = null;
        fURL = url;
        
        Cache.get().getImage(fURL, new Cache.Callback() {
            public void loaded(String url, Bitmap bitmap)
            {
                fBitmap = bitmap;
                invalidate();
            }

            public void failure(String url, Throwable th)
            {
                // Ignoring for now. Could display broken link image
            }
        });
    }

    @Override
    protected void onDraw(Canvas canvas)
    {
        if (fPaint == null) fPaint = new Paint();
        
        canvas.drawColor(Color.BLACK);
        if (fBitmap == null) return;        // could display "not loaded" image
        canvas.drawBitmap(fBitmap, 0, 0, fPaint);
    }
}

This is a very simple example of using the Cache object to load images from a background thread. We can make this far more sophisticated: we can (for example) display a “loading” image and an “image link broken” image. We can also alter the reported size during onMeasure to return the size of the bitmap, or we can center the displayed bitmap or scale it to fit. But at its core, we have a simple mechanism for displaying the loaded image in our system.

Can you spot the leak?

I didn’t, at first.

Here’s a hint: Avoiding Memory Leaks

Here’s another: the RemoteImageView, being a child of the View class, holds a reference to its parent, and on up the line until we get to the top-level activity–which holds a reference to, well, just about everything.

No?

Okay, here goes.

So when we call:

        Cache.get().getImage(fURL, new Cache.Callback() { ... });

The anonymous inner class we create for our callback holds an implicit reference to the enclosing RemoteImageView. And that inner class doesn’t go away until after the image is loaded. So if we have a few dozen of these and a very slow connection, and the user switches from one activity to another, we can’t let the activity go, because we’re still waiting for the images to load and be copied into the image view.

So while it’s not exactly a memory leak, the view can’t be let go of, nor can all the associated resources, until our connection completes or times out. Eventually the memory will be released–but it won’t be released soon enough for our purposes. And so we crash.

So how do we fix this?

Well, we need to add two things. First, we need to somehow disassociate our view from the anonymous inner class so that, when our view no longer exists, the callback class no longer holds a reference to the view. That way, the activity can be reclaimed by the garbage collector even though our callback continues to exist. Second, we can remove the unprocessed callbacks so they don’t make a network call to load an image that is no longer needed.

To do the first, we change our anonymous inner class to a static nested class (that way it doesn’t hold an implicit reference to the outer ‘this’), and explicitly pass a reference to our view to it, one that can then be cleared:

/*  RemoteImageView.java
 *
 *  Created on May 15, 2011 by William Edward Woody
 */

package com.chaosinmotion.android.utils;

import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.Canvas;
import android.graphics.Color;
import android.graphics.Paint;
import android.view.View;

public class RemoteImageView extends View
{
    private Paint fPaint;
    private Bitmap fBitmap;
    private String fURL;
    private OurCallback fCallback;
    
    public RemoteImageView(Context context)
    {
        super(context);
        // TODO Auto-generated constructor stub
    }
    

    private static class OurCallback implements Cache.Callback
    {
        private RemoteImageView pThis;
        
        OurCallback(RemoteImageView r)
        {
            pThis = r;
        }
        
        public void loaded(String url, Bitmap bitmap)
        {
            if (pThis != null) {
                pThis.fBitmap = bitmap;
                pThis.invalidate();
                pThis.fCallback = null; // our callback ended; remove reference
            }
        }

        public void failure(String url, Throwable th)
        {
            // Ignoring for now. Could display broken link image
            if (pThis != null) {
                pThis.fCallback = null; // our callback ended; remove reference
            }
        }
    }

    public void setImageURL(String url)
    {
        fBitmap = null;
        fURL = url;
        
        fCallback = new OurCallback(this);
        Cache.get().getImage(fURL, fCallback);
    }

    @Override
    protected void onDraw(Canvas canvas)
    {
        if (fPaint == null) fPaint = new Paint();
        
        canvas.drawColor(Color.BLACK);
        if (fBitmap == null) return;        // could display "not loaded" image
        canvas.drawBitmap(fBitmap, 0, 0, fPaint);
    }

    @Override
    protected void onDetachedFromWindow()
    {
        // Detach us from our callback
        if (fCallback != null) fCallback.pThis = null;
        
        super.onDetachedFromWindow();
    }
}

The two biggest changes are to create a new static OurCallback class which holds a reference to the view being acted on, and to hold a reference to the callback which is zeroed out when the callback completes, either on failure or on success. Then, in the onDetachedFromWindow callback, if we have a request outstanding (because fCallback is not null), we detach the view from the callback. Note that because all the calls in the callback are done on the UI thread, we don’t need to synchronize access.

This will now detach the view from the callback when the view goes away, so the activity that contains the view can be reclaimed by the memory manager.

Our second change is to remove unprocessed requests from the queue, so we don’t tie up resources loading images that are no longer needed. While not strictly necessary for memory management purposes, it helps our network performance: we explicitly remove our callback from the queue.

First, we change our onDetachedFromWindow() call to remove us (by callback) from the cache:

    @Override
    protected void onDetachedFromWindow()
    {
        // Detach us from our callback
        if (fCallback != null) {
            fCallback.pThis = null;
            Cache.get().removeCallback(fCallback);
        }
        
        super.onDetachedFromWindow();
    }

Second, we add a method to the cache to look for all instances of requests with the same callback, and delete the request from the queue. If it isn’t in the queue, it’s probably because the request is now being acted upon by our networking thread. (If we were particularly clever we could signal our networking thread to stop the network request, but I’m not going to do that here.)

So our method added to the Cache is:

    /**
     * Remove from the queue all requests with the specified callback. Done when the
     * result is no longer needed because the view is going away.
     * @param callback
     */
    public void removeCallback(Callback callback)
    {
        synchronized(fQueue) {
            Iterator<QueueItem> iter = fQueue.iterator();
            while (iter.hasNext()) {
                QueueItem i = iter.next();
                if (i.callback == callback) {
                    iter.remove();
                }
            }
        }
    }

This iterates through the queue, removing entries that match the callback.

I’ve noted this on my list of things not to forget because this (and variations of it) comes up whenever a thread that can outlive an activity holds references to that activity’s View objects.

The basic model is, when the view goes away (which we can detect with a callback to onDetachedFromWindow), to disassociate the callback from the view and (preferably) to kill the background request, so the view object (and the activity associated with that view) can be garbage collected in a timely fashion.

Some observations on the polygon intersection algorithm in Chapter 2 of Computational Geometry: Algorithms and Applications.

I’m building a CSG library because eventually I’d like to have a native Macintosh editor drive my MakerBot. Building a CSG algorithm that uses BSP trees to handle the 3 dimensional math is pretty trivial; the algorithm I’m using came from the paper “Improved Binary Space Partition Merging” by Lysenko et al., which itself uses the convex hull algorithm outlined in “Linear Programming and Convex Hulls Made Easy” by Seidel.

The problem is, once you have a BSP representation of your object, you need to turn it back into a collection of polygons. And that means a good 2D polygon manipulation engine. (I’m converting directly from a BSP tree to polygons; the way this works is to define a cube bounding the entire BSP tree (by picking one that is “big enough”), then recursing down the BSP tree structure, splitting that cube as you go. At the bottom, you throw away or return the split cube, and “glue it back together” as you come back up the tree. This implies that on return you have to glue arbitrary polygon shapes on both sides of a split plane together–and on the polygon surface, that means doing a 2 dimensional XOR operation on the polygons there.)

So I’ve been going through the book “Computational Geometry” trying to implement the intersection algorithm outlined in Chapter 2 there.

And I have a few observations.

(1) The exposition of the algorithm is geared more towards a general discussion of the algorithm than towards a complete description suitable for implementation. That means implementing the algorithm requires a fundamental understanding of what’s going on; you can’t just take the outline of the algorithm and convert it to code. And you need to run various data sets through your implementation to discover its gaps and find where things go south.

(2) The algorithm fundamentally relies on the data structure in section 2.2, which uses half-edges to represent each ring. The half edges must be complete: that is, if you have just a naked polygon sitting in an otherwise empty space, you need both the inner ring representing the inside, and an outer ring of edges which represent the outer boundary. You can’t get away with unpaired half-edges and easily make this thing work. No shortcuts.

(That’s because when two half edges intersect (figure 2.5, section 2.3), you need to tie together the half edges that meet at the point–and sometimes an inner edge pairs with an outer edge and vice versa. Imagine two polygons intersecting: the outer edges of the two polygons form the inner edges of the penetrating part of the two outer polygons of the intersection.)

You can be excused, by the way, for thinking the half-edges could be naked, without a matching pair for an outer defined polygon, given that the discussion of monotone polygons and polygon triangulation in Chapter 3 works with naked half-edge polygon rings.

(3) The algorithm itself also relies on a sorted tree structure that you can search by edge or by point. In C++ or Objective C, that means some additional work. Fortunately the number of polygons is small enough that it’s not worth the overhead of using a balanced tree structure.

(4) The algorithm goes south when two edges are coincident. The trick appears to be throwing away identical half edges from S1 and from S2 in Algorithm MapOverlay in section 2.3, at step 1, when creating the new doubly-connected edge list D. (In practice you can do this by dropping duplicate edges in step 2, when initializing the event queue Q: as each edge is added to an event in Q, determine whether an identical edge is already in that event.)

Also, when computing the intersections, if an intersection test creates a new intersection edge that is identical to an existing edge (because the two intersecting edges were coincident but did not share the same start or end points), throw away the newly constructed edge. In practice this means that in FindNewEvent (called from algorithm HandleEventPoint in section 2.1 when an intersecting edge is found), you must verify the found segments aren’t already in the event before inserting them.

(5) The “final step” referred to in the paragraph preceding MapOverlay, discovering which polygon in one overlay subdivision contains the vertices of the other, can be determined by using a sweep algorithm, sweeping on the vertices in both subdivisions S1 and S2, storing the edges of each in their own tree list. At each point, you can then determine which edge of one subdivision is to the left of the point in the other; that leftmost edge determines which polygon contains our vertex.

I’m sure you could probably also do this during the sweep in step 2 of MapOverlay; in fact, we do capture the half edge to the left of the event point, though that half edge is of D, which is the combination of S1 and S2. So it may be a matter of continuing the leftward walk to find representatives of both sets, or simply maintaining two other tree structures during the sweep so you can get the leftward edges for D, S1 and S2 at the same time.

At some point, after I successfully get an implementation of my XOR algorithm going correctly, I’m going to write an exposition of this algorithm that isn’t so damned obtuse and which can be more easily implemented as code. Because this chapter (along with the chapter describing triangulation of a polygon) feels like it was written by a professor who intuitively knew the algorithms, but was not working from a functional implementation of the algorithms he was describing when writing the text.

That’s because the gaps and holes in the exposition are big enough to drive a truck through. (For example, in the description of FindNewEvent in section 2.1, we have the values L(p), U(p) and C(p) defined; however, from the description of FindNewEvent it’s unclear what we do at step 2: “Insert the event point as an event into Q.” Given an event contains L(p), U(p) and C(p), what is the event we insert when we insert the event point? And let’s skip right by the “However, we won’t explain this final step here. It’s a good exercise to test your understanding of the plane sweep approach to design the algorithm yourself” comment later in the chapter.)

So if you were trying to implement a polygon mesh intersection from this discussion and decided you were just dumb, well, no you’re not: I’ve wasted a couple of months of my spare time trying to sort all this out, and I can tell you: you’re not dumb. The description is incomplete.

On Memory and Memory Management

This will be a bit of an introductory article on memory, written in part for my wife and for anyone else who may find such an introduction useful.

In The Beginning…

In the beginning there was RAM, random access memory. A microprocessor which could execute instructions would be attached to that RAM, which could be accessed by special instructions which operated on the contents of that memory.

Some microprocessors (for example, the Z-80) used special instructions which would use two 8-bit integer registers (such as registers D and E) to form a 16-bit address into RAM, or would use special registers (the IX and IY index registers) to access memory. Others (such as the 68000 CPU) had a bank of dedicated address registers which were used to access memory.

Writing software in this era was simple: you would define the location of various objects that would be stored in memory, and you would use those locations to access the stored data.

In today’s modern parlance all objects are fixed and global. There is no allocating or releasing of objects; that came later. Instead, there is just picking the size of the records (or structures) stored in memory, and making sure there is enough space left in RAM for your stack.

Since early microprocessors only had a very small amount of memory, this was fine. Embedded toolchains such as the Small Device C Compiler, which can compile code for the HC08 series of CPUs, generally don’t provide any way to allocate memory; instead, you declare all of your objects as global structures.

As an example, a program I’ve been tinkering with on the Arduino platform (and I really believe every high school student who wants to get into programming needs one) emulates a calculator with no memory. The memory declaration looks something like:

/*  Calculator Storage in RAM */
Double GDisplay;
Double GInput;
Double GScratch;
boolean GIsInput;
boolean GIsClear;

When compiled, this will turn into assembler declarations that may look something like:

.EQU GDisplay = 0x0100
.EQU GInput = 0x0108
.EQU GScratch = 0x0110
.EQU GIsInput = 0x0118
.EQU GIsClear = 0x0119

Fixed allocation of memory structures also makes sense for certain types of game development, where performance and reliability of the code path is essential. For example, a 3D game such as the early game Descent could store, for each level, the size of the rendering map and the maximum number of records needed to process and render the level regardless of your location in the maze. Then, when the level loads, the proper amount of memory can be obtained using a function call similar to sbrk, which tells the operating system you need a fixed amount of RAM; you then partition the contents up yourself.

Various compiled games also used this technique, where the game is specified in a specialized game specification language. By walking through all potential states in the game, it is possible to determine the maximum memory footprint for that game, then request that fixed amount of memory for storage. This technique was used by Zork, amongst others.

Fixed allocation of memory can also be used for certain high-performance applications where accuracy and speed and stability are paramount. For example, software handling incoming click requests in a search advertising system may wind up handling millions (or even billions) of click requests per minute. Given that each click request can be designed as a fixed-size record (or a record with a maximum known size), click requests can be accumulated into a fixed-size array, with certain summary data accumulated in global variables as they are stored. Then, when a maximum time (or a maximum number) is reached, the formatted buffer and summary data can be spilled in a single write request into upstream systems, which may only need the summary data.
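
A minimal sketch of what such an accumulator might look like, assuming a hypothetical spillUpstream() routine for the single upstream write (the names and record layout here are invented for illustration):

#include <stdint.h>

#define MAX_CLICKS	1000000			/* sized for the worst case up front */

struct ClickRecord					/* one fixed-size record per click */
{
	uint32_t	adID;
	uint32_t	timestamp;
	uint32_t	cost;
};

static struct ClickRecord GClicks[MAX_CLICKS];	/* fixed allocation */
static uint32_t GClickCount;
static uint64_t GTotalCost;						/* summary data */

extern void spillUpstream(const struct ClickRecord *records,
	uint32_t count, uint64_t totalCost);		/* hypothetical upstream write */

void recordClick(uint32_t adID, uint32_t timestamp, uint32_t cost)
{
	struct ClickRecord *r = &GClicks[GClickCount++];
	r->adID = adID;
	r->timestamp = timestamp;
	r->cost = cost;
	GTotalCost += cost;
	
	if (GClickCount >= MAX_CLICKS) {	/* buffer full: spill in one write */
		spillUpstream(GClicks, GClickCount, GTotalCost);
		GClickCount = 0;
		GTotalCost = 0;
	}
}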

Then Came The Heap.

Not all programs work well with the fixed allocation of objects. Sometimes you need a linked list of objects, or a heap list, or other more complex data structures where you cannot know a priori the number of records you will be handling.

Heap processing more or less works the same regardless of the programming language: there is a mechanism by which you can request a chunk of memory of a given size, and another call to release that memory. Sometimes there is a call that allows you to resize that memory block; resizing can also be handled by allocating a new block, copying the contents from the old block to the new, and then releasing the old block.
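
For example, here is a minimal sketch of a resize operation built on top of the alloc() and free() routines we develop below (a hypothetical helper, not part of the allocator itself; the caller supplies the old size):

#include <stdint.h>
#include <string.h>					/* memcpy */

extern void *alloc(uint32_t size);	/* the routines we build below */
extern void free(void *ptr);

void *resize(void *ptr, uint32_t oldSize, uint32_t newSize)
{
	void *tmp = alloc(newSize);		/* grab the new block first */
	if (tmp == NULL) return NULL;	/* out of memory; old block untouched */
	
	/* Copy before freeing; the old contents must survive the move */
	memcpy(tmp, ptr, (oldSize < newSize) ? oldSize : newSize);
	
	free(ptr);
	return tmp;
}

Note the order of operations: the copy happens before the old block is released, and the old block is left untouched if the allocation fails.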

Heap allocation works by dedicating a large chunk of RAM to the heap and, on each request for a chunk of memory, reserving a piece of that RAM. There are many ways this can be done; the easiest to explain is the greedy “first fit” algorithm, which reserves the first chunk of free memory that is big enough.

Allocated memory requires some bookkeeping information so the memory allocation code knows how to handle each chunk of memory: whether it is allocated, and how big the reserved chunk is. So when you allocate memory, additional space is generally reserved just before the returned address pointer to hold that bookkeeping data; this is then used to resize memory (if resizing is allowed), and to know how much memory to mark as free when the free routine is called.

[Figure: an allocated block, with the bookkeeping data stored just before the pointer returned to the caller]

A simple malloc() and free()

We can easily implement a simple malloc and free routine to illustrate how a typical heap may work. We’ll make two assumptions, which are common on today’s modern processors. First, all allocated memory will be aligned (for efficiency’s sake) on a 16-byte boundary. (Some processors address things on 16-byte boundaries far more efficiently.) Second, we will use a four-byte bookkeeping object which gives the size of the memory allocated, along with a 1-bit flag indicating if the area of memory is free or still allocated. Because we know all memory is aligned on a 16-byte boundary, we know the least significant bit of the 32-bit length will never be used, so we use that bit for the free flag; this way we only need 4 bytes for our bookkeeping data.
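
To make that convention concrete, the bookkeeping word could be picked apart with a few helper macros like these (they are not used by the code below; they simply spell out the convention):

/* The 32-bit bookkeeping word stores the total block size (always a
 * multiple of 16), with the low bit doubling as the free flag. */
#define BLOCK_IS_FREE(bk)	(0 != ((bk) & 1))	/* low bit set: block is free */
#define BLOCK_SIZE(bk)		((bk) & ~1UL)		/* size with the flag masked off */
#define FREE_BLOCK(size)	((size) | 1)		/* bookkeeping word for a free block */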

By sketching out the code for a simple malloc() and free() we can also illustrate some of the interesting bugs that can come up while using the heap, such as accidentally using memory that was freed previously.

Footnote: We use ‘uint8_t’ to refer to 8-bit bytes, and ‘uint32_t’ to refer to 32-bit (4-byte) unsigned integers. Because on most modern operating systems memory can be addressed on a byte boundary, we can add to or subtract from an address pointer by converting that pointer to a pointer to a byte object. For example:

a = (uint32_t *)(((uint8_t *)a) + 3);

The expression above will add 3 to the address register in ‘a’, pointing three bytes in from where it was pointing before.

Notice if we were doing this for a microprocessor with 16-bit addresses, we could use a uint16_t, or 2 byte integer, for the bookkeeping area instead.

Before we can use the memory, we must initialize the memory heap area to note that the entire heap is free. This means we need to set up a bookkeeping header that indicates that all of memory (minus the area for bookkeeping) is free. We also need to make sure the free memory itself is aligned on a 16 byte boundary–which means we must waste the first 12 bytes of memory. More advanced memory allocation schemes may make use of that area for other bookkeeping information, but for now we simply waste the memory.

When initialized our RAM area will look like:

[Figure: the heap after initialization–a single free block, with its bookkeeping word 12 bytes in]

Our initialization routine will look like:

/*	initialize_ram
 *
 *		Given the block of ram memory, we mark the entire 
 *	block as free. We do this by setting up a free block 16
 *	bytes in, so that the first allocated block
 *	will be aligned on a 16 byte boundary. This wastes the
 *	bottom 12 bytes of memory which could, for a more
 *	sophisticated system, be used for other stuff.
 *
 *		We have GRam point to the first byte of RAM, and
 *	RAMSIZE is the size of the RAM area being managed.
 */

void initialize_ram()
{
	/* Find the location of the bookkeeping block for this.
	 * The address is 12 bytes in; 16 bytes for alignment 
	 * minus 4 bytes for the bookkeeping area
	 */
	
	uint32_t *bookkeeping = (uint32_t *)(((uint8_t *)GRam) + 12);
	
	/*
	 *	Now mark the memory as free. The free size is 16
	 *	bytes smaller than the max ram size, and mark it
	 *	as free. We assume RAMSIZE is divisible by 16.
	 */
	
	*bookkeeping = (RAMSIZE - 16) | 1;
	
	/*
	 *	And finally mark the end bookkeeping block. Because of the way
	 *	we allocate memory, the top 4 bytes must be specially
	 *	marked as a reserved block. That's so we know we have
	 *	hit the top.
	 */
	
	bookkeeping = (uint32_t *)(((uint8_t *)GRam) + RAMSIZE - 4);
	*bookkeeping = 4;		// marked as reserved 4 bytes.
}

Free is simple. Because our convention is that a block of memory is considered free if the least significant bit of the 32-bit bookkeeping area is set, free simply sets that bit.

/*	free
 *
 *		Free memory. This works by finding the bookkeeping block
 *	that is 4 bytes before the current pointer, and marking the
 *	free bit.
 */

void free(void *ptr)
{
	/*
	 *	Subtract 4 bytes from the current pointer to get the
	 *	address of the bookkeeping memory
	 */
	
	uint32_t *bookkeeping = (uint32_t *)(((uint8_t *)ptr) - 4);
	
	/*
	 *	Mark the memory as free by setting the lowest bit
	 */
	
	*bookkeeping |= 1;
}

Most of the meat of our memory management system will be in the alloc call. This has to scan through all the free blocks, finding the first chunk of memory that can be reserved. We also, by convention, return NULL if there is no memory left that can be allocated.

The first thing our allocation routine does is find out how big a chunk of memory we need to reserve. Because we must guarantee everything is aligned on a 16-byte boundary, this means a request for 2 bytes must reserve 16 bytes; that way, the next allocation request will also be aligned correctly. (More sophisticated memory management systems may know enough to subdivide memory for smaller allocation requests; we don’t do that here.)

So we must calculate the size, including the extra 4 bytes needed for our bookkeeping block:

	/*
	 *	Step 1: Change the size of the allocation to align
	 *	to 16 bytes, and add in the bookkeeping chunk that
	 *	we allocate. Our pointer will return the first
	 *	byte past the bookkeeping memory.
	 */
	
	size = size + 4;		// Add bookkeeping
	if (0 != (size % 16)) size += 16 - (size % 16);	// align
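
To put numbers on it: a request for 2 bytes becomes 6 with the bookkeeping added, which rounds up to 16; a request for 13 becomes 17, which rounds up to 32.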

Next, we need to scan through memory from the first block in memory, searching for a collection of free blocks which are big enough for us to reserve for our requested block.

	/*
	 *	Step 2: scan to find a space where this will fit.
	 *	Notice that we may have to piece together multiple
	 *	free blocks, since our free doesn't glue together
	 *	free blocks.
	 */
	
	ptr = (uint32_t *)(((uint8_t *)GRam) + 12);
	end = (uint32_t *)(((uint8_t *)GRam) + RAMSIZE);
	while (ptr < end) {
		if (0 == (1 & *ptr)) {
			/* Lowest bit is clear; this is allocated memory. */
			/* The bookkeeping area holds the total size of the */
			/* allocated area in bytes; thus, we can skip to */
			/* the next block by adding the bookkeeping area */
			/* to the current block. */
			ptr = (uint32_t *)(((uint8_t *)ptr) + *ptr);
		} else {
			/*
			 *	This area is free. Note that this is found, then
			 *	start scanning free areas until we piece together
			 *	something big enough to fit my request
			 */
			
			found = ptr;
			asize = *ptr & ~1UL;		// Get the size, clearing free bit
			ptr = (uint32_t *)(((uint8_t *)ptr) + (*ptr & ~1UL));	// next block
			while ((asize < size) && (ptr < end)) {
				if (0 == (1 & *ptr)) {
					/* We bumped against an allocated block of memory */
					/* Exit this loop and continue scanning */
					break;
				}
				
				/* Block is free. Add it up to this block and continue */
				asize += *ptr & ~1UL;
				ptr = (uint32_t *)(((uint8_t *)ptr) + (*ptr & ~1UL));	// next block
			}
			if (asize >= size) break;	// found enough space
		}
	}
	
	if (ptr >= end) return NULL;		// Could not find free space

This gets a bit complicated.

First, we set ptr and end to point to the start block and the end of memory, respectively. Next, we walk through while our pointer has not reached the end, looking for free memory.

Now our free() routine simply marks the memory as free; it doesn’t gather up blocks of free memory and glue them together. So our allocation routine has to do the job. When we first encounter a free block, we note how much memory the free block represents in asize, and we put the start of that free block in found. We then continue to scan forward until we either piece enough memory together to satisfy the size we need, or until we find a reserved block which prevents us from using this chunk of free memory.

If we run out of memory: that is, if the pointer ptr goes past end, then we can’t satisfy this request, so we return NULL.

Now that we’ve found a chunk of memory that satisfies our request, we piece together the free blocks, breaking the last free block in the list of free blocks if needed.

When this chunk of code is reached, found points to the start of our free memory, and ptr points to the last free block in the list of free blocks that may need to be split.

We write the correct bookkeeping data to mark the memory as allocated, and if that is shy of the end of the free blocks, we then write a free block bookkeeping mark:

	/*
	 *	Step 3: mark the block as allocated, and split the last free block
	 *	if needed
	 */
	
	*found = size;						// mark the size we've allocated.
	if (size < asize) {
		/* We have more than enough memory. Split the free block */
		/* First, find the pointer to where the free block should go */
		ptr = (uint32_t *)(((uint8_t *)found) + size);
		
		/* Next, mark this as a free block */
		*ptr = (asize - size) | 1;
	}

This works correctly because asize is the total size of the range of free blocks we just glued together, so (uint8_t *)found + asize will point to the next block of memory past the free memory we just found.
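
For example, if we pieced together 48 bytes of free space (asize) but only need 32 (size), we write 32 into the bookkeeping word at found, and write (48 - 32) | 1–a 16-byte free block–at found + 32.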

Now that we’ve peeled off a chunk of memory, we need to return the memory itself. Up until now our pointers have been pointing at the first byte of the 4-byte bookkeeping record; the memory we’re allocating is just past that 4 byte record. So we need to return the first byte of our allocated memory itself:

	
	/*
	 *	The found pointer points to the bookkeeping block. We need to
	 *	return the free memory itself, which starts with the first byte
	 *	past the bookkeeping mark.
	 */
	
	return (void *)(((uint8_t *)found) + 4);

Putting our alloc routine together we get:

/*	alloc
 *
 *		Allocate the requested memory by scanning the heap
 *	of free blocks until we find something that will fit.
 *	We then split the free blocks and return the allocated
 *	memory.
 *
 *		If we are out of memory, we return NULL.
 */

void *alloc(uint32_t size)
{
	uint32_t *ptr;
	uint32_t *end;
	uint32_t *found;
	uint32_t asize;
	
	/*
	 *	Step 1: Change the size of the allocation to align
	 *	to 16 bytes, and add in the bookkeeping chunk that
	 *	we allocate. Our pointer will return the first
	 *	byte past the bookkeeping memory.
	 */
	
	size = size + 4;		// Add bookkeeping
	if (0 != (size % 16)) size += 16 - (size % 16);	// align
	
	/*
	 *	Step 2: scan to find a space where this will fit.
	 *	Notice that we may have to piece together multiple
	 *	free blocks, since our free doesn't glue together
	 *	free blocks.
	 */
	
	ptr = (uint32_t *)(((uint8_t *)GRam) + 12);
	end = (uint32_t *)(((uint8_t *)GRam) + RAMSIZE);
	while (ptr < end) {
		if (0 == (1 & *ptr)) {
			/* Lowest bit is clear; this is allocated memory. */
			/* The bookkeeping area holds the total size of the */
			/* allocated area in bytes; thus, we can skip to */
			/* the next block by adding the bookkeeping area */
			/* to the current block. */
			ptr = (uint32_t *)(((uint8_t *)ptr) + *ptr);
		} else {
			/*
			 *	This area is free. Note that this is found, then
			 *	start scanning free areas until we piece together
			 *	something big enough to fit my request
			 */
			
			found = ptr;
			asize = *ptr & ~1UL;		// Get the size, clearing free bit
			ptr = (uint32_t *)(((uint8_t *)ptr) + (*ptr & ~1UL));	// next block
			while ((asize < size) && (ptr < end)) {
				if (0 == (1 & *ptr)) {
					/* We bumped against an allocated block of memory */
					/* Exit this loop and continue scanning */
					break;
				}
				
				/* Block is free. Add it up to this block and continue */
				asize += *ptr & ~1UL;
				ptr = (uint32_t *)(((uint8_t *)ptr) + (*ptr & ~1UL));	// next block
			}
			if (asize >= size) break;	// found enough space
		}
	}
	
	if (ptr >= end) return NULL;		// Could not find free space
	
	/*
	 *	Step 3: mark the block as allocated, and split the last free block
	 *	if needed
	 */
	
	*found = size;						// mark the size we've allocated.
	if (size < asize) {
		/* We have more than enough memory. Split the free block */
		/* First, find the pointer to where the free block should go */
		ptr = (uint32_t *)(((uint8_t *)found) + size);
		
		/* Next, mark this as a free block */
		*ptr = (asize - size) | 1;
	}
	
	/*
	 *	The found pointer points to the bookkeeping block. We need to
	 *	return the free memory itself, which starts with the first byte
	 *	past the bookkeeping mark.
	 */
	
	return (void *)(((uint8_t *)found) + 4);
}

We can now use this code to illustrate some interesting things about heap memory allocation.

First, notice how free() simply marks the memory as free, but without overwriting the memory. This is why the following code, while quite illegal, can still work:

int error1()
{
	/* Allocate some memory and initialize it */
	int *someWord = (int *)alloc(sizeof(int));
	*someWord = 5;
	
	/* Free the memory */
	free(someWord);
	
	/* Now access the freed memory */
	return *someWord;
}

This is not guaranteed to work. Far from it: it’s illegal to access memory after it has been released–you don’t know what is going to happen to that chunk of memory. But in most cases the memory is simply marked as no longer reserved–and the old values are still sitting there.

And this becomes a problem if the memory is reallocated to some other pointer which does something else with the memory:

int error2()
{
	/* Allocate some memory and initialize it */
	int *someWord = (int *)alloc(sizeof(int));
	*someWord = 5;
	
	/* Free the memory */
	free(someWord);
	
	/* Allocate some other memory */
	int *someOtherWord = (int *)alloc(sizeof(int));
	*someOtherWord = 6;
	
	/* Now access the freed memory */
	return *someWord;
}

What makes bugs like this very frustrating to find is that, in general, the patterns of allocs and frees are not quite so uniform. The behavior can be quite unpredictable; for example, while the above probably will cause 6 to be returned using our version of alloc and free, the following may return 5 or 6 or even some other value, depending on how memory has been fragmented in the past:

int error3()
{
	/* Allocate some memory and initialize it */
	int *someWord = (int *)alloc(sizeof(int));
	*someWord = 5;
	
	/* Free the memory */
	free(someWord);
	
	/* Allocate some other memory */
	int *someOtherWord = (int *)alloc(28);
	*someOtherWord = 6;
	
	/* Now access the freed memory */
	return *someWord;
}

Because the size of the second allocation is different than the first, it may or may not use the previously allocated memory.

Second, over time memory can fragment. Fragmentation can cause things to slow down, and it can even put you in the situation where, after doing a bunch of small allocations and frees, you still have plenty of memory left–but no single free block is large enough to satisfy your request.

Different memory management methods attempt to resolve this problem using various techniques, of course–and on most modern operating systems, with large address spaces and plenty of memory, fragmentation is far less likely to bite.
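
As a hypothetical illustration using the alloc() and free() above, assume a heap just big enough for three blocks (each 16-byte request grows to 32 bytes once bookkeeping and alignment are added):

void fragmentationDemo()
{
	void *a = alloc(16);	/* three blocks fill our hypothetical heap */
	void *b = alloc(16);
	void *c = alloc(16);
	
	free(a);				/* free the first and third... */
	free(c);
	
	/* ...leaving two 32-byte holes separated by block b. A request
	 * needing 64 bytes fails, even though 64 bytes are free. */
	void *d = alloc(48);	/* returns NULL: no single hole is big enough */
	
	(void)b; (void)d;		/* quiet "unused variable" warnings */
}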

As an aside, sometimes it is useful to allocate a whole bunch of tiny little objects and release them all at once. For example, a 3D rendering program may dynamically allocate a whole bunch of objects during rendering–but free them only after the entire image is drawn on the screen.

To do this you can allocate large chunks of memory, then subdivide the memory as needed, keeping the large allocated chunks of memory in a linked list, so when it comes time to free all of memory, the free operation can be done in one call.
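
A minimal sketch of such a pool, using the standard library’s malloc() and free() for the underlying chunks (not our toy versions above); all the names here are invented for illustration:

#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE	65536			/* each big chunk grabbed from the heap */

struct Chunk
{
	struct Chunk *next;				/* chain of chunks, newest first */
	uint32_t used;					/* bytes handed out from this chunk */
	uint8_t data[CHUNK_SIZE];
};

static struct Chunk *GChunks;		/* linked list of allocated chunks */

/* Hand out a small piece, grabbing a new chunk when the current one
 * is exhausted. Assumes size <= CHUNK_SIZE. */
void *poolAlloc(uint32_t size)
{
	size = (size + 15) & ~15u;		/* keep 16-byte alignment, as before */
	
	if ((GChunks == NULL) || (GChunks->used + size > CHUNK_SIZE)) {
		struct Chunk *c = (struct Chunk *)malloc(sizeof(struct Chunk));
		if (c == NULL) return NULL;
		c->next = GChunks;
		c->used = 0;
		GChunks = c;
	}
	
	void *ptr = GChunks->data + GChunks->used;
	GChunks->used += size;
	return ptr;
}

/* Release every object at once by walking the chunk list */
void poolFreeAll(void)
{
	while (GChunks != NULL) {
		struct Chunk *next = GChunks->next;
		free(GChunks);
		GChunks = next;
	}
}

The trade is deliberate: individual objects are never freed, so each allocation is just a pointer bump, and tearing down the whole pool is a single walk down the chunk list.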

Another interesting thing to point out is that memory has to be explicitly allocated and freed. We are just one step above address registers in the microprocessor; our heap is something we must manually maintain. If we set a pointer to a new address without freeing the memory it used to point to, nothing will ever release that old memory–it simply leaks.

In C and C++ this is the status quo: you must explicitly allocate and free memory. C++ adds the concept of ‘new’ and ‘delete’, which call a class constructor after the memory is allocated and the class destructor before the memory is freed; however, memory must still be explicitly allocated and freed.

In a world where there is only global memory, auto (stack-based) memory and heap memory, this works okay. But once we start talking about object-oriented programming, it is natural to have two pointers pointing to the same object in memory. And knowing when that object should be freed becomes far more complicated than in the traditional procedural allocation scheme, where we guarantee by convention that only one pointer holds onto a chunk of memory at a time.
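
Here is a sketch of the problem, using the alloc() and free() above (the function name is invented for illustration):

/*	sharedOwnershipError
 *
 *		Two pointers end up sharing one chunk of memory under the
 *	"one owner" convention. With our toy free() the second call
 *	merely sets an already set bit, but a real allocator's
 *	bookkeeping (or a block already handed out to someone else)
 *	would be corrupted.
 */

void sharedOwnershipError()
{
	int *a = (int *)alloc(sizeof(int));
	int *b = a;			/* two pointers, one chunk of memory */
	
	free(a);			/* the first "owner" releases the memory... */
	free(b);			/* ...and the second frees it again */
}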

And there are two ways we can solve this problem: reference counting and garbage collection.

Garbage Collection

Garbage collection is a technique whereby the runtime will automatically find memory that is no longer being used and release it. The advantage of garbage collection is that you don’t have to remember to call free(). The disadvantage is that garbage collection can be computationally expensive and hard to get right.

There are several ways to handle garbage collection. The approach I’ll outline here is simple mark and sweep: find (and mark) all memory that is currently being used, then sweep through and free the memory that is not marked.

Essentially it works like this.

With each allocated chunk of memory, we also reserve a bit used to mark that memory. In our allocator above, we can use the second least significant bit of the bookkeeping word for marking. We need a mark routine to set that bit:

/*	mark
 *
 *		This marks the block of memory containing 'ptr' by setting
 *	the mark bit in its bookkeeping word
 */

void mark(void *ptr)
{
	if (ptr == NULL) return;
	
	/*
	 *	Subtract 4 bytes from the current pointer to get the
	 *	address of the bookkeeping memory
	 */
	
	uint32_t *bookkeeping = (uint32_t *)(((uint8_t *)ptr) - 4);
	
	/*
	 *	Mark the memory by setting the second lowest bit
	 */
	
	*bookkeeping |= 2;
}

The first step to garbage collection is to do the mark phase: to mark all of the memory that is currently referred to by other chunks of memory in our system. To do this we use a ptr variable which points at the current block; as we sweep forward across all the blocks in the system, if we find a block that is marked, we then try to find all the pointers in that block, and mark them. This repeated marking continues until we no longer have any new memory blocks that need to be marked.

After we’ve done this marking, we sweep, freeing all blocks of memory that are not marked.

The hard part of any garbage collection mechanism is knowing which values in global memory and on the stack are address pointers and which are not. Languages such as Java keep class information around to allow the garbage collector to know exactly which things in memory refer to address pointers. Other languages, such as C, do not maintain this information–and so the garbage collector effectively “guesses.”

We assume in our code we have three methods: one which will mark all pointers in global memory and on the stack, another which returns the number of pointers inside an allocated memory object, and a third which returns the pointers in an allocated memory object.
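
Concretely, the assumed signatures would look something like this (how these would be implemented depends on how much type information your environment keeps around at runtime):

/* Mark (via mark() above) every block directly referenced from
 * global memory or from the stack */
extern void markGlobalStack(void);

/* The number of pointers held inside the allocated block whose
 * bookkeeping word is at 'ptr' */
extern uint16_t numPointersInAllocBlock(uint32_t *ptr);

/* The i-th pointer inside that block, returned as a pointer to the
 * referenced block's bookkeeping word, or NULL if it does not point
 * into our heap */
extern uint32_t *pointerInAllocBlock(uint32_t *ptr, uint16_t i);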

/*	allocGC
 *
 *		Allocate but with a garbage collector. We do the mark/sweep phase
 *	on memory. We rely upon three other calls: one to mark all pointers
 *	in global memory and on the stack, and two which together tell us
 *	which pointers live within an allocated chunk of memory.
 *
 *		We assume markGlobalStack will call mark() above to flag
 *	each referenced block as in use
 */

void *allocGC(uint32_t allocLen)
{
	uint32_t *ptr;
	uint32_t *end;
	uint32_t *tmp;
	uint32_t *tmp2;
	uint16_t len,i;

	/*
	 *	Try to allocate
	 */
	
	ptr = alloc(allocLen);
	
	/*
	 *	Out of memory?
	 */
	
	if (ptr == NULL) {
		/*
		 *	Out of memory; do garbage collection. First, clear the mark
		 *	bits for allocated memory
		 */
		
		ptr = (uint32_t *)(((uint8_t *)GRam) + 12);
		end = (uint32_t *)(((uint8_t *)GRam) + RAMSIZE);
		while (ptr < end) {
			*ptr &= ~2UL;		// clear second least bit
								// move to next block in memory
			ptr = (uint32_t *)(((uint8_t *)ptr) + (*ptr & ~1UL));
		}
		
		/*
		 *	Now ask to mark memory on the stack and in global heap space
		 */
		
		markGlobalStack();
		
		/*
		 *	Run through and mark all references. We rely upon the 
		 *	routines numPointersInAllocBlock and pointerInAllocBlock
		 *	to return the pointers inside a block
		 */
		
		ptr = (uint32_t *)(((uint8_t *)GRam) + 12);
		while (ptr < end) {
			/*
			 *	If this pointer is marked, then find all the pointers that
			 *	are inside this pointer, and mark them, moving the pointer
			 *	backwards to the earliest unmarked object
			 */
			
			if (0 != (*ptr & 2)) {
				/*
				 *	Memory marked. Find all the pointers inside
				 */
				
				tmp2 = ptr;
				len = numPointersInAllocBlock(ptr);
				for (i = 0; i < len; i++) {
					tmp = pointerInAllocBlock(ptr,i);
					if ((tmp != NULL) && (0 == (*tmp & 2))) {
						/* Unmarked; mark it, and if it lies earlier */
						/* in memory, remember it so we can rewind */
						if (tmp < tmp2) tmp2 = tmp;
						*tmp |= 2;
					}
				}
				
				/*
				 *	We may have moved tmp2 before ptr; this means we
				 *	have a pointer pointing backwards in memory, and we
				 *	need to pick the sweep back up from tmp2
				 */
				
				if (tmp2 < ptr) {
					ptr = tmp2;
					continue;
				}
			}
			
			/*
			 *	Move to next block
			 */
			
			ptr = (uint32_t *)(((uint8_t *)ptr) + (*ptr & ~3UL));	// mask free and mark bits
		}
		
		/*
		 *	Now that we've gotten here, we need to free all allocated memory
		 *	that is not marked
		 */
		
		ptr = (uint32_t *)(((uint8_t *)GRam) + 12);
		while (ptr < end) {
			if (0 == (0x3 & *ptr)) {		// allocated, unmarked?
				*ptr |= 1;					// mark as freed
			} else {
				*ptr &= ~2UL;				// clear the mark for next time
			}
			
			/*
			 *	Move to next block
			 */
			
			ptr = (uint32_t *)(((uint8_t *)ptr) + (*ptr & ~1UL));
		}
		
		/*
		 *	Now try again
		 */
		
		ptr = alloc(allocLen);
	}
	return ptr;
}

Essentially our garbage collector starts by sweeping all memory and clearing the mark bit.

[Figure: the heap at the start of collection, with every mark bit cleared]

We start by marking all the objects (using markGlobalStack) that are pointed to by objects on the stack, or by objects in global memory:

[Figure: the blocks referenced from the stack and global memory are now marked]

Now we start the loop that runs through the blocks. Our sweep pointer searches for the next marked block of memory, and then marks all the blocks that block points to:

[Figure: the sweep pointer reaches a marked block and marks the blocks it points to]

We continue to mark forward, moving the pointer backwards only if we encounter a pointer to an earlier block in memory that is currently unmarked. This rewinding of the pointer is necessary to handle backwards-pointing references without having to rewalk all of the blocks from the start of the heap:

[Figure: marking continues forward through the heap]

(Because this pointer refers to something before the previous pointer, we move our pointer backwards:)

[Figure: the sweep pointer rewinds to the earlier, newly marked block]

Once we’re done–and this is guaranteed to complete because there are only a finite number of objects, and we only rewind the pointer when something is unmarked–we then sweep through all of memory, marking as free the objects we were unable to reach:

[Figure: the unreachable blocks are swept and marked as free]

Reference Counting

Garbage collection is very difficult to do correctly. You need a way, no matter what state you’re in, to know where the pointers are and what they point to. And you have to have a way to look inside every object and know which chunks of memory in the heap are pointers, in order to correctly mark things.

In other words, you need to provide markGlobalStack, numPointersInAllocBlock, and pointerInAllocBlock or the equivalent.

An easier way to track the pointers pointing to an object is to associate a reference count with each object. This requires some work on the part of the developer; in fact, it requires that you explicitly call routines similar to alloc and free to maintain the reference count on each object. On the other hand, it doesn’t require a lot of work to get working correctly. And this technique has been adopted by Microsoft’s COM system and Apple’s Objective-C on the Macintosh and iOS operating systems.

(Yes, I know the latest versions of Objective C on the Macintosh provide garbage collection. However, you can still do reference counting, and you must do reference counting on iOS.)

Reference counting is extremely easy to do. Essentially it involves having the objects you want managed via reference counting internally store a reference count. Newly allocated objects set the reference count to 1, and if the reference count reaches zero, the object frees itself.

In C++ we can easily declare a base object for reference counting:

class BaseObject
{
	public:
					BaseObject();
		
		void			Retain();
		void			Release();
		
	protected:
		virtual		~BaseObject();

	private:
		uint32_t		fRefCount;
};

Our base class stores a reference count, called fRefCount. When we allocate our object, we first set the reference count to 1:

BaseObject::BaseObject()
{
	fRefCount = 1;
}

We then need to mark something as retained: meaning there is a new pointer pointing to the same object. The retain method can be written:

void BaseObject::Retain()
{
	++fRefCount;
}

Releasing the object then decrements the count, and if the count reaches zero, frees the object:

void BaseObject::Release()
{
	if (0 == --fRefCount) {
		delete this;
	}
}

In Objective C (but not on Microsoft COM) we have an additional method, called “autorelease”, which adds the object to an NSAutoreleasePool, which automatically calls release when the pool is drained, either explicitly or implicitly at the end of each event loop. In C++ we can do something similar by extending our base class by adding an array of object pointers:

class BaseObject
{
	public:
					BaseObject();
		
		void			Retain();
		void			Release();
		void			AutoRelease();
		
		static void	Drain();
		
	protected:
		virtual		~BaseObject();

	private:
		uint32_t		fRefCount;
		static std::vector<BaseObject *> gPool;
};

We then add AutoRelease and Drain methods:

void BaseObject::AutoRelease()
{
	gPool.push_back(this);
}

void BaseObject::Drain()
{
	/* Run through our array of objects */
	int i,len = gPool.size();
	for (i = 0; i < len; i++) {
		gPool[i]->Release();
	}
	/* Now erase the array */
	gPool.clear();
}

Notice that simple assignment of pointers doesn’t actually call Retain or Release any more than it automatically called alloc() and free() earlier. Instead, we must use a coding convention to know when we should Retain, when we should Release, and (if present) when we should AutoRelease.

The Microsoft COM rules are quite simple: if a function call returns a pointer, you must release that pointer in your routine when you are no longer using it. So, for example, suppose we have a routine GetThing() which returns a BaseObject:

BaseObject *GetThing()
{
	return new BaseObject();
}

Then a caller must in turn release the value:

void UseBaseThing()
{
	BaseObject *obj = GetThing();
	
	/* Do some stuff to this */
	
	obj->Release();
}

Now if the routine GetThing returns a reference to an object that it is also storing globally, the caller receives ownership of the returned reference, but the routine must continue to hold its own reference locally. So you would use Retain:

BaseObject *gPtr;

BaseObject *GetThing()
{
	gPtr->Retain();
	return gPtr;
}

And similarly, if the caller function wants to hold onto the pointer (say, by storing it in a global variable), rather than calling Retain we simply keep the ownership we were handed:

BaseObject *gPtr2;

void UseBaseThing()
{
	BaseObject *obj = GetThing();
	/* Store object; don't release it--we still own the reference. */
	gPtr2 = obj;
}

The rule is quite simple: if a pointer is returned, the pointer must be released. But it adds complexity: it means you must constantly call the ‘Release’ method, and that can become quite cumbersome.

On the other hand–and this is the key point of all of this–the retain and release rules are simple, and they are local: you don’t need to know how any other module in the system works; you only need to know what to do locally.

Apple gets around the problem of constantly having to release objects (and retain objects that are held locally) by introducing the -autorelease method.

On the Macintosh (and iOS), the rules are given on Apple’s web site. The two rules are:

(1) You gain ownership of an object only if you create it (using -alloc or any method that starts with ‘new’, or contains ‘copy’), or if you explicitly retain the object.

(2) You release or autorelease the object when you no longer need to hold ownership of the object.

Using these two rules, we wind up writing less code–but things can get a little more complicated. In our example above, our ‘GetThing’ routine, because it does not start with ‘new’, would simply return the object:

BaseObject *gPtr;
+ (BaseObject *)getThing
{
	return gPtr;
}

If we were allocating this object (as in our first example), we would, because of the naming convention, either rename our method to ‘newThing’:

+ (BaseObject *)newThing
{
	return [[BaseObject alloc] init];
}

Or we would need to make sure that ownership is not passed up by marking the object as autorelease:

+ (BaseObject *)getThing
{
	return [[[BaseObject alloc] init] autorelease];
}

In fact, this pattern is so common you’ll find yourself typing “alloc] init] autorelease]” nearly on autopilot.

The call “useBaseThing” is similarly changed, depending on how we’re using the object. If we don’t hold onto a reference to the object, we don’t need to call ‘release’:

+ (void)useBaseThing
{
	BaseObject *obj = [BaseObject getThing];
	
	/* Do some stuff to this */
	/* Notice we don't release this object */
}

And if we are hanging onto the object, we must retain the object:

+ (void)useBaseThing
{
	BaseObject *obj = [BaseObject getThing];
	gPtr2 = [obj retain];
}

Likewise, if we are calling newThing, we’d be getting ownership back from the call–so we would need to release it as appropriate. So, our examples would be:

+ (void)useBaseThing
{
	BaseObject *obj = [BaseObject newThing];
	
	/* Do some stuff to this */

	[obj release];
}

And, if holding the object, we already have ownership, so we don’t need to release it:

+ (void)useBaseThing
{
	BaseObject *obj = [BaseObject newThing];
	gPtr2 = obj;
}

Notice in all of the above, simply assigning or copying pointers around in a program doesn’t actually do anything on its own. A pointer is simply like an integer, but with the integer referring to an address in memory. In all of this evolution from fixed locations in RAM to garbage collection and object reference counting, we have never changed the immutable fact that simply copying or adding values to an address doesn’t affect how heap memory is handled. Garbage collection takes place after an object is no longer pointed to–and sometimes objects that are no longer referenced by a pointer can live for a very long time.

There are other subtleties that can take place here: different memory management tools may do different things when a chunk of memory is allocated or freed. Some test tools may even store the location in a program where an object is allocated, so if there is an unbalanced alloc/free cycle, the line of code in error can be discovered easily. And there are many flavors of garbage collection out there which each behave differently, though they ultimately achieve the same end.

Further, with reference counting, cycles of objects can easily be created which cause those objects to linger long after they are no longer actually in use. This is why it is important to think about cycles of pointers, especially when developing UIView objects which may hold references to other UIViews to perform certain operations.

In fact, a very common bug is to create one UIView that refers to another:

@class BView;
@interface AView : UIView
{
	BView *view;
}

@property (retain) BView *view;
@end

@interface BView : UIView
{
	AView *view;
}

@property (retain) AView *view;
@end

The problem here is that if AView and BView are part of the same view hierarchy, and are assigned to each other, then when the view hierarchy is released, AView and BView retain each other–preventing the memory from being reclaimed even though it is no longer being used.

In cases like this, when a third object holds two others which refer to each other–in this case, implicitly, by the UIWindow holding all of the views in the view hierarchy–it is better if only an unretained reference is held to these objects instead:

@class BView;
@interface AView : UIView
{
	BView *view;
}

@property (assign) BView *view;
@end

@interface BView : UIView
{
	AView *view;
}

@property (assign) AView *view;
@end

Assignment is not ownership, and when the window is released, AView and BView will both be released successfully.

Hopefully this brief overview of memory management, from setting fixed locations in RAM to heaps to garbage collection and reference counting, will give you a good idea of how memory management actually works–and what the pitfalls are.

The key takeaways should be:

(1) Assigning pointers is not ownership. Simply writing Pointer *a = b; doesn’t actually grant ownership to ‘a’ when it points to ‘b’. You must, unless in a garbage collection environment, explicitly ask for and release ownership.

(2) If not using reference counting, the assumption is that only one pointer actually “owns” an object. For object-oriented programming this is too strict a restriction, and thus we must either introduce garbage collection or establish reference counting and conventions on how references are counted.

(3) For reference counting systems, there are (to my knowledge) two conventions for reference counting: the Microsoft COM convention, and the Apple Objective-C convention. You must also be aware of cyclical ownership references: if you accidentally grant ownership that turns into a cycle or ring of ownership, objects may leak because they mutually refer to each other, without actually being used anywhere else.

I love Wikipedia.

The other day I had to hook up some code via JSON. Having no idea what JSON is, I looked on Wikipedia. Basic data types, examples, syntax, and a link to RFC 4627 later, and I was set. Cool!

I wanted to implement a hash map, but I wasn’t sure if I wanted to do what I’ve done in the past, which is to represent each hash bucket with a linked list, or if I wanted to use a list. Unsure of the pros and cons, I looked up Hash tables on Wikipedia and got a reasonable overview discussion.

I’m finding there is a lot on Wikipedia that provides a good cursory overview of different protocols and algorithms, and more importantly, points me to the relevant research, descriptions or RFCs which describe the thing in greater detail. And while I tend to take Wikipedia with a grain of salt (you never know if the guy who wrote the article knew what he was talking about, or if the page was vandalized), the cursory overview (even inaccurate) combined with pointers to scholarly research or published standards makes a great reference.

I know this is old news; however, in the past few days I’ve found myself using Wikipedia as a starting point for a lot of CS-related searches.