I recently decided to use a binary tree data structure as a part of a much larger C project.  As it turns out, red-black trees are a relatively simple way to keep the additions balanced so that the tree doesn't end up all lop-sided with depth larger than the optimal O(log n).  So of course, like a good coder, I found a few freely available implementations and the became dissatisfied with them.

  It also turns out that most implementations use parent pointers, along with their associated baggage (data structure imposition, maintenance and excessive pointer dereferencing).  I'd like to keep my data structure how it is but use some generic functions to insert and delete (and of course lookup) values.  The whole thing being implemented in C, it's easy to find examples for specialized data structures, but hard to find a generic one.  I think this has to do with the procedural way C programmers approach problems.

  Anyway, any data structure that can be sorted, that contains left and right pointers, and has one bit for ID-ing red/blackness can be worked with when we provide the following 'key':

typedef struct {
    int (*cmp)(const void *, const void *); // as in qsort
    unsigned int coff; // offset of 2 (left / right) child pointers
    unsigned int boff; // offset of char holding R/B bit.
    unsigned char mask; // contains a one where red/black bit is set.
} rbop_t;

The problem with no parent pointers is this:  Since the red/black property has to be fussed with in a bottom-to-top direction during addition, a simple top-to-bottom traversal (the only one possible with no parent pointers) doesn't have all the parent pointers it needs to do the recursion.  Instead, the stack has to be relied on in some way.

There are three major possibilities to do work around the top of the tree after working somewhere deep down below.  First, a recursive function is called and uses a post-traversal to do work remaining at the top node after the bottom ones are done.  The problem here is that some information available about the traversal after the top nodes is lost on each return.  I admit that this one would work for red-black tree insertion -- but I hadn't read through the whole algorithm yet, and from the use of a possibly un-ending list of parent pointers it wasn't clear this would be the case.  This is just that boring recursive function stuff anyway.

Next, come some more intriguing ideas.  Suppose the recursive function did some work at the top, descended to the bottom, then went back to the top (as in a post-traversal) -- but didn't stop there?  Instead, we mimick a finite state machine as in parsing:

int rec_state(void *info, int n) {
    while(n >= 0) {
        if(n > 0) { // go down the stack
            n = rec_state(info->next, n-1);
        } else {
            n = work(info);
        }
    }
    return n+1; // go up the stack
}

Like the head in a primitive Turing machine, the state could let us know which point of the stack to operate on.  Since each level maintains a local view of the tree structure, we can use the stack to hold the whole traversal.

On a third examination, we recognize that assembly programmers would laugh at us for doing this (in fact almost the only difference between C and asm is that the former hides the operations of the stack from us).  If you know your architectures, and size added to the stack for each function call, you could just access after the end of the local stack to get those previous generations.  That's a bit hacky, but a POSIX-compliant C way to do this is to just pass the location of the caller's stack:

typedef struct rst rec_stack_t;
struct rst {
  rec_stack_t *prev;
  void *info;
};

int rec_stack(rec_stack_t *prev) {
    void *info = child_ptr(prev->info);
    rec_stack_t my = {prev, info};

    work(prev->prev->info, prev->info, info); // three generations
    return rec_stack(my); // or the recursion
}

Now we have a LISP-style (reversed) linked-list allocated for us on the stack automatically with each function call.  What's amazing here is that
1. LISP wins yet again in the Turing vs. LISP implementation wars (arguably more efficient and eventually readable but harder to initially understand).
2. The presence of the stack is both statically typed and very LISP-y from the beginning.
3. I have not presented any code for implementing red/black trees using these techniques -- as I said, it can be done with simple (non-tail) recursion anyway.


 


Comments




Leave a Reply