I recently decided to use a binary tree data structure as a part of a much larger C project.  As it turns out, red-black trees are a relatively simple way to keep the additions balanced so that the tree doesn't end up all lop-sided with depth larger than the optimal O(log n).  So of course, like a good coder, I found a few freely available implementations and then became dissatisfied with them.

  It also turns out that most implementations use parent pointers, along with their associated baggage (data structure imposition, maintenance, and excessive pointer dereferencing).  I'd like to keep my data structure as it is but use some generic functions to insert and delete (and of course look up) values.  The whole thing being implemented in C, it's easy to find examples for specialized data structures, but hard to find a generic one.  I think this has to do with the procedural way C programmers approach problems.

  Anyway, any data structure that can be sorted, that contains left and right child pointers, and that has one bit for identifying red/blackness can be worked with, provided we supply the following 'key':

typedef struct {
    int (*cmp)(const void *, const void *); // as in qsort
    unsigned int coff; // offset of 2 (left / right) child pointers
    unsigned int boff; // offset of char holding R/B bit.
    unsigned char mask; // contains a one where red/black bit is set.
} rbop_t;
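
For a concrete example, a (purely hypothetical) node layout could be described to the generic routines like this -- just a sketch of how the 'key' might be filled in with offsetof:

#include <stddef.h> // offsetof

// A made-up node type -- any layout works, as long as the two child
// pointers sit next to each other and a byte is spared for the color bit.
typedef struct mynode {
    int value;
    struct mynode *kid[2];   // left / right children
    unsigned char color;     // red/black bit lives in bit 0
} mynode;

static int cmp_mynode(const void *a, const void *b) {
    const mynode *x = a, *y = b;
    return (x->value > y->value) - (x->value < y->value);
}

static const rbop_t mynode_ops = {
    cmp_mynode,              // cmp
    offsetof(mynode, kid),   // coff
    offsetof(mynode, color), // boff
    0x01,                    // mask
};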

The problem with no parent pointers is this:  Since the red/black property has to be fussed with in a bottom-to-top direction during addition, a simple top-to-bottom traversal (the only one possible with no parent pointers) doesn't retain the ancestor information needed for that fix-up.  Instead, the stack has to be relied on in some way.

There are three major possibilities for doing work near the top of the tree after working somewhere deep down below.  First, a recursive function is called and uses a post-traversal to do the work remaining at the top nodes after the bottom ones are done.  The problem here is that any information gathered down below is lost on each return unless it is explicitly passed back up.  I admit that this one would work for red-black tree insertion -- but I hadn't read through the whole algorithm yet, and from the use of a possibly un-ending list of parent pointers it wasn't clear this would be the case.  This is just that boring recursive function stuff anyway.
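
For what it's worth, here is a minimal sketch of that post-traversal pattern using the rbop_t 'key' from above -- not a full red-black insert; balance() stands in for a hypothetical fix-up step that only needs the local subtree root:

void *balance(void *node, const rbop_t *o); // hypothetical recolor/rotate step

// Recurse down to the insertion point, then fix up on the way back out.
static void *insert_rec(void *node, void *item, const rbop_t *o) {
    void **kid;
    if(node == NULL)
        return item; // reached the bottom
    kid = (void **)((char *)node + o->coff);
    if(o->cmp(item, node) < 0)
        kid[0] = insert_rec(kid[0], item, o);
    else
        kid[1] = insert_rec(kid[1], item, o);
    return balance(node, o); // the remaining work happens after the return
}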

Next come some more intriguing ideas.  Suppose the recursive function did some work at the top, descended to the bottom, then went back to the top (as in a post-traversal) -- but didn't stop there?  Instead, we mimic a finite state machine as in parsing:

// ('level_t' is just a stand-in for a per-level structure holding a 'next'
//  pointer down to the level below; work() returns the next state.)
// n > 0: descend n more levels; n == 0: do work here; n < 0: unwind.
int rec_state(level_t *info, int n) {
    while(n >= 0) {
        if(n > 0) { // go down the stack
            n = rec_state(info->next, n-1);
        } else {
            n = work(info);
        }
    }
    return n+1; // go up the stack
}

Like the head in a primitive Turing machine, the state lets us know which level of the stack to operate on.  Since each level maintains a local view of the tree structure, we can use the call stack to hold the whole traversal.

On a third examination, we recognize that assembly programmers would laugh at us for doing this (in fact, almost the only difference between C and asm is that the former hides the operation of the stack from us).  If you know your architecture, and the size added to the stack by each function call, you could just reach past the end of the local stack frame to get at those previous generations.  That's a bit hacky, but a portable C way to do this is to just pass along the location of the caller's stack frame:

typedef struct rst rec_stack_t;
struct rst {
  rec_stack_t *prev;
  void *info;
};

int rec_stack(rec_stack_t *prev) {
    void *info = child_ptr(prev->info);
    rec_stack_t my = {prev, info};

    work(prev->prev->info, prev->info, info); // three generations
    return rec_stack(&my); // ... or end the recursion here
}
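
To kick the chain off at the root we need two seed frames, so that prev->prev is always valid (a sketch -- child_ptr() and work() are the same stand-ins as above, and a real version would stop recursing once info runs out):

int descend(void *root) {
    rec_stack_t top   = {NULL, root};
    rec_stack_t first = {&top, child_ptr(root)};
    return rec_stack(&first); // every later frame can see prev and prev->prev
}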

Now we have a LISP-style (reversed) linked list allocated for us on the stack automatically with each function call.  What's amazing here is that
1. LISP wins yet again in the Turing vs. LISP implementation wars (arguably more efficient and eventually readable but harder to initially understand).
2. The presence of the stack is both statically typed and very LISP-y from the beginning.
3. I have not presented any code for implementing red/black trees using these techniques -- as I said, it can be done with simple (non-tail) recursion anyway.


 
  In the garden of computing, there were two trees.  The first, the tree of life, grew from one, lambda-shaped root, onto which were applied a series of intersecting trunks, and which were again applied to branches, upon which perched leaves of pure data.  The second tree could only grow a single leaf at a time, and had but a single branch, which continually writhed and twisted in knots along the leaves.  About this tree, it was said ``You shall not eat of it, for in the day you eat of it, you shall surely die.''

  But the mechanical operation of the second tree so pleased the computers, that it was immediately adopted as a standard.  The computers reaped their reward in the swift and painful judgment of the segfault.  From then on, it was ordained that they should toil continually to re-write addressing code and flatten control structures by the sweat of their brow.  As their gardening skills increased, they found that many hectares of silicon could be combined into vast farmlands.  This did not help, since although much more was now possible, the sins committed during the Fall could not be undone.  Farm equipment such as threaded processors and cache were developed to automatically manage chunks of silicon, but these suffered from the dreaded cache coherency dilemma and branch prediction paradox.  Such evils overcome even the optimistic increments of ``omp atomic'', and force programmers to write and re-write algorithms enslaved to the lifeless, twisting form of that forbidden tree.

 
Some ideas, like Ron Paul's crusade to end monopolies on money, have such a stark simplicity and obvious utility that they continue to cling to the collective consciousness of anyone who has read them.  Plan 9 is one of those ideas.  As we all know (or don't, as is the point in this case), Plan 9 got a few Dilbert cartoons, made its point and went off to pasture.  My opinion is that this is mostly because it wasn't open-sourced properly by Bell Labs at the start, and that at present, running the OS is not so important anyway with 9P network protocol bindings available for most languages.  However, it was sad to see IL go.  Overall, I think the trick is less about knowing when to fold and more about knowing 1) how to market, 2) when to `part-out' an old idea, and 3) when to develop hyperopia and keep pushing on what seems like a dead-end to the unenlightened.

I wrote a little more about Plan 9 in my additions to the go9p library:

  9P is the network protocol used by the mythical Plan9 operating system.  The big idea is that all resources, networked or local, can be represented as read/write operations on named hierarchies of objects.  Accessing those should be just as easy as accessing a file, and it shouldn't make any difference if the file represents a local or remote resource.  For modern re-inventions of this, see REST API design and FUSE.  Although revolutionary in its scope, the OS was not initially released open source and required some effort in porting existing software, hindering widespread adoption.  Its relatively small following led Eric S. Raymond, one of its more vocal advocates, to lament that "the most dangerous enemy of a better solution is an existing codebase that is just good enough."

  After the decision to leave the personal computer and Unix networked server markets in 1996, AT&T divested the National Cash Register Corporation and spun off Bell Labs as Lucent Technologies.  Lucent gave Plan9 a back seat, and development slowly declined.  In 2002, Lucent made its last (4th edition) release of the Plan9 OS.  Interestingly, this nearly coincides with the date that Rob Pike moved from Lucent to Google.  Its major issue is the lack of support for most hardware.  It spewed some random garbage and ground to a halt when I tried to boot it on my AMD 2700-based system.

  In 2011, a fork (9front) of the slowly developing v4 was made to allow faster development, the addition of new hardware support, and the replacement of the fossil filesystem with the "Cached Write Once Read Many Filesystem."  All I know is that this one did boot on my (ca. 2002) hardware.

 
  Yampa defines functional reactive programs and a basic 'reactimate' event loop, but comes up short on a rich set of input / output gizmos to play with.  Where's the fun of programming a robot if you can't see the result?

  Enter games - just draw a sprite, simulate some physics, and feel the holiday cheer.  I tried compiling YFrob, but GHC 7.4.1 couldn't compile the HGL library dependency for graphics, so I went investigating...

  Gerold Meisinger worked on just this problem, trying to build games on top of Yampa, and was kind enough to blog about it.  Unfortunately, the critical code examples, e.g. the 'Yampa/SDL program stub', appear to be missing, and the code for 'Robotroannah' was never posted.  The second may be understandable, since the game-development world is traditionally further from the open-source model, BUT both Yampa and the Haskell bindings to SDL were open source, so releasing a stub connecting the two is not unreasonable.

  SDL actually makes a good example of a wrapped library, since it's clear what all those ... -> IO() (writers) and ... -> IO(stuff) (readers) are doing, while no C types are visible.

  Looking at the structure of Yampa's reactimate loop is critical for understanding how to piece together a working IO loop.
    (note: although this was kicked off by Gerold Meisinger, the whole
     discussion was later contributed by `Laeg', demonstrating how
     open-sourcing just a few bits of insight can lead someone else
     down the road to build something much better)
Essentially, you need to spend some time coding the 'sense' and 'actuate' functions that will feed into and act upon the wisdom decreed by your soon-to-be Yampa (SF -> SF) oracle.

A few working examples also help:
1. the animate function on line 85 of YFrob-0.4/src/FRP/YFrob/RobotSim/
-- in fact, this example would have saved me a lot of time
   if I had found it first, and not run into a bug compiling HGL.
2. cuboid-0.14.1/Main.hs -- a simple start to a Yampa/OpenGL game.
Also, the most detailed explanation is given in the comments
above the reactimate function in
Yampa-0.9.3/src/FRP/Yampa.hs:3100

  The conclusion is that to make the IO loop work, you need callbacks.


A REPL shell using Haskell + Yampa

Now, let's get something working.  The incremental path is to (work toward a programming language interpreter and) use GNU readline.

  A very simple IO loop, a read-eval-print loop (REPL), can be coded in the top-level IO monad without Yampa, so now there's an entry barrier.  On top of that, it's not clear how to make 'a line was read' into a callback unless I start getting crafty with Concurrent Haskell!
-- I had looked at this earlier, but started to hope that connecting multiple programs via the IO monad (with FRP brains?) would be easier.

  YFrob-0.4/afp-demos/ITest.hs:30
defines an `interactive' shell, but it uses Haskell's standard getLine call in a blocking evaluate-then-print loop that doesn't use FRP.

  Happily, readline provides a callback interface for times when 'applications need to interleave keyboard I/O with file, device, or window system I/O'.
 
It's not well documented on the Haskell side, but the readline manual says:
http://cnswww.cns.cwru.edu/php/chet/readline/readline.html#SEC41

  Function: void rl_callback_handler_install (const char *prompt, rl_vcpfunc_t *lhandler)
    Set up the terminal for readline I/O and display the initial expanded
  value of prompt. Save the value of lhandler to use as a function to call when
  a complete line of input has been entered. The function takes the text of the
  line as an argument.

And hackage has:
http://hackage.haskell.org/packages/archive/readline/1.0.1.0/doc/html/System-Console-Readline.html
callbackHandlerInstall :: String -> (String -> IO ()) -> IO (IO ())
The return of an IO() is a little cryptic here, but a glance at the source (I'm starting to appreciate Haddock) shows that the returned IO action is a cleanup routine that removes the handler.

So let's try it out:
-- test_rl.hs
import System.IO
import System.Console.Readline
import Data.IORef
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar

-- Picked up Control.Concurrent's use of an MVar for synchronization
-- from an excellent reference:
-- https://github.com/tibbe/event/blob/master/tests/FileIO.hs

writeOK :: MVar () -> String -> IO()
writeOK done line = do
    iscmd <- case words line of
       ("quit":args) -> do
                        putMVar done ()
                        return False
       (cmd:args) -> return True
       _ -> return False
    if iscmd
       then addHistory line >> putStrLn "OK"
       else return ()

untilIO :: MVar() -> IO()
untilIO done = do
    callbackReadChar
    v <- tryTakeMVar done
    case v of
       Nothing -> do threadDelay 500
                     untilIO done
       Just _  -> do putStrLn ""
                     return ()

main :: IO()
main = do
    hSetBuffering stdin NoBuffering
    done <- newEmptyMVar
    clean <- callbackHandlerInstall ";] " (writeOK done)
    untilIO done
    clean

-- end test_rl.hs
``Do you know what this means?  I finally invent something that works!''
-- Doc Brown.
-- Now we can transform it to use Yampa

module Main where

import FRP.Yampa
import Data.IORef
import System.Console.Readline
import Data.Time.Clock.POSIX
import Data.Functor
import System.IO

--import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar

{- writeOK is now just a handler,
 - no longer in charge of
 - decoding the line and deciding
 - when we're done.
 -}
writeOK :: MVar String -> String -> IO()
writeOK out line = do
    putMVar out line -- <- its only critical function.

-- Differentiates the posix clock to make Yampa happy.
updTimer :: IORef POSIXTime -> IO(DTime)
updTimer timeRef = do
    t  <- readIORef timeRef
    t' <- getPOSIXTime
    writeIORef timeRef t'
    return $ realToFrac (t' - t) -- elapsed time since the last call

-- Sets up the IORef to be used by updTimer
setupTimer :: IO(IO(DTime))
setupTimer = do
    timer <- getPOSIXTime >>= newIORef -- start from 'now', not the epoch
    let updTime = updTimer timer
    return updTime


getCmd :: Maybe String -> String
getCmd Nothing = ""
getCmd (Just s) = s

mkSense :: (MVar String, IO(DTime)) -> Bool -> IO (DTime, Maybe String)
mkSense (newInput, updTime) _ = do
    callbackReadChar
    cmd <- getCmd <$> tryTakeMVar newInput
    dt <- updTime
    return (dt, Just cmd)

actuate :: Bool -> (String,(Bool,Bool)) -> IO Bool
actuate _ (line,(hist,done)) = do
    if hist
      then do addHistory line
              putStrLn "OK"
              hFlush stdout
      else return ()
    callbackReadChar
    return done

-- save history?, end program?
parseCmd :: String -> (Bool,Bool)
parseCmd line =
    case words line of
       ("quit":args) -> (False,True)
       (cmd:args) -> (True,False)
       _ -> (False,False)

sf :: SF String (String,(Bool,Bool)) -- The signal function to be run
sf = identity &&& arr parseCmd

{-
reactimate :: IO a                          -- init
           -> (Bool -> IO (DTime, Maybe a)) -- input/sense
           -> (Bool -> b -> IO Bool)        -- output/actuate
           -> SF a b                        -- process/signal function
           -> IO ()
-}

main :: IO()
main = do
    hSetBuffering stdin NoBuffering
    -- setup events
    updTime  <- setupTimer
    newInput <- newEmptyMVar
    clean <- callbackHandlerInstall ";] " (writeOK newInput)
    let sense = mkSense (newInput, updTime)

    reactimate (return "") sense actuate sf
    clean

  Aside from some cranky issues with having to repeatedly call callbackReadChar, the source shows that readline's input callback works.  Now, we just need to turn that callback into an IORef (or an MVar) that sense can read when it's called.  I also had to add a timer to complete the type of sense that reactimate needs.  I copied the POSIXTime used in the Hello world example of
  http://www.haskell.org/haskellwiki/Yampa/reactimate
and encapsulated the get, subtract, set operation into an IO(DTime) action.

Some more cleaning up led to the end result at https://github.com/frobnitzem/yrepl.

The interaction is a little less nice because the callback formerly known as writeOK, which now just sets the newInput MVar, doesn't write "OK"; instead "OK" is written during the actuate step, after sense has returned control to readline (which writes the prompt).  The history also has a strange habit of never appearing, a quirk that crept in when Yampa was introduced.  The upshot is that everything is now asynchronous and controlled by Yampa - woohoo!
 
  What if I could come up with some symbols, attach them to my numbers, and give addition and multiplication special rules when I apply them to the symbols (but not, of course, the numbers)?  Working through Diamond's structural superposition method gave me exactly this impression:

q x /q <- rotate x using q
...
Q x_i Q^-1 <- apply rotations and scaling to x using Q?  Maybe I can even single out a particular x_i.

  Unfortunately for me, none of the terms in the title of this paragraph made any sense to my vectorized brain at the time.  However, like Alice, I persisted a little in this new world of literary nonsense.  What I found was a connection to intersecting shapes, symmetry theory, 4-D spacetime, a new way to understand imaginary numbers, and a multiplication operation that can intersect spheres and spiral objects through space.

  All of it can be done by attaching special symbols to numbers.  We're all familiar with the unit vector directions, for example

2 x - 3 y + 0 z

Adding directions should only combine things that are measured in the same units, so we can't combine 2 x and -3 y.  What about multiplying?

  What is the product of two directions?  To start, we should note that x means something different than -x.  The x itself is supposed to have a direction, and the "2" above is how many steps we take in that direction.  The product of two directions should also have some kind of orientation.  Hamilton wanted the product to be invertible, so we could get "a" back from "ab" after somebody had gone off and multiplied it with "b".  It should also have a geometric meaning.  So there you have it, we multiply "a" and "b" directions by performing a geometric operation on "a" that does not lose information.  I'm sure you can picture two vectors, making an L on the table, and then falling over to represent this kind of operation.  That's a good idea (actually the quaternion rotation I mentioned earlier), but it adds another, un-introduced direction.  We want the result to stay in the (a-b) space, or not have a direction at all.

  So, we revise this idea, and ask about a/b instead.  This, of course, represents the operation that takes "b" into "a", since a/b * b should equal a.  To be sensible, the direction of a/b * c should be the same regardless of the scale of c (c vs. 20c vs -2 c).  So we make a formal rule, numbers (scalars) commute through multiplication:
a 2 c = 2 a c = a c 2, and right away trim this to 2 a = a 2, sometimes so fast the intermediate step never even appears anywhere.

Be patient - xy turns out to be the unit imaginary number soon, and to rotate things by 90 degrees in the plane of the page!

  Now 1/b should mean something too, but according to our rules, it can't have a different direction than b.  Of course, it has to have some direction, since otherwise we could not (successfully) invert it twice.  1/b is left with its same direction, and a different length, which will be the inverse of its current length, so 1/b = b/||b||^2, where ||b||^2 is some scalar.  Now with some gymnastics, we find that 1 = (1/b) b = b b / ||b||^2, but 1 and ||b||^2 are scalars, so bb has to be one too!
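
  For example (my own sanity check): take b = 3 x.  Then ||b||^2 = 9, so 1/b = 3 x / 9 = x / 3, and indeed (1/b) b = (x/3)(3 x) = x x = 1.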

  Finally we can begin to tell a story.  a/b wants to rotate stuff in the b direction toward stuff in the a direction, staying in the ab plane.  a and b have magnitudes, so a b does the same general thing as a/b, but with a different scale.  What's important in this operation is the geometric relationship between a and b, so we want ab to represent the plane that a and b sit inside.  But we also need the angle between a and b to make that invertible.  It turns out we can use a rule for products that fills all those requirements:
x x = 1
x y = - y x (can't be reduced further)

  This means the directions don't commute through multiplication; we'll have to think about xy as a plane with an orientation and a magnitude, and spend some work on commuting directions when we want to simplify things.  As a test, we'll take the squared magnitude of 2 x - 3 y: (2 x - 3 y) (2 x - 3 y) = 4 + 9 - 6 (xy + yx) = 13, as expected!  Multiplying x + y by xy from the left, we get (xy) x + (xy) y = -y + x, a rotation of (1,1) to (1,-1).  So xy rotates vectors as advertised.

  In fact, it rotates them by 90 degrees: clockwise when xy multiplies from the left (as above), and counterclockwise when it multiplies from the right.  Now the best part is this:
(xy) (xy) = -yx xy = -1.  xy has the same properties as the unit imaginary number, conventionally defined as i = sqrt(-1).  We can think of i as representing a plane somewhere (not the imaginary axis!), and multiplying by i as swapping two (very real) directions of a vector, not as moving from Descartes to Wick.  Complex analysis gets its power from rotations, and "i" was really only a two-dimensional concept anyway!

That was a surprise to me, and to some others as well, since David Hestenes showed that this product generates the well-known (by those who know them well) constructions of Clifford algebra:

ab = a ⋅ b + a∧b
a ⋅ b = (ab + ba)/2
a∧b = (ab - ba)/2

The inner product: a ⋅ b, which is a scalar proportional to the cosine of the angle between a and b,
and the outer product: a∧b, which is a pure 2-d object (a bivector, in the jargon).
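
  As a quick worked check with the rules above (my own example, not from Hestenes): take a = 2 x - 3 y and b = x + y.  Then
ab = (2 x - 3 y)(x + y) = 2 x x + 2 x y - 3 y x - 3 y y = (2 - 3) + (2 + 3) xy = -1 + 5 xy,
so a ⋅ b = -1 (matching 2*1 + (-3)*1) and a∧b = 5 xy (matching the 2-d determinant 2*1 - (-3)*1 = 5).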

There are a lot more surprises, but they'll have to wait for now.