On the development of my own Lisp interpreter, part 3: revisiting closures.

I managed to solve my lisp closure problem outlined in this post by adopting the ideas behind this paper: Closure generation based on viewing LAMBDA as EPSILON plus COMPILE. The basic idea of the paper is that you can rewrite Lambda expressions as “epsilon” expressions (that is, ‘lambda’ but which all variable state outside of the lambda block is passed in as arguments), and compiling those functions with a special method which modifies the stack on calling the lambda function.

Let me explain. Suppose we have the following bit of Lisp:

    (define (funct a)
      (lambda (b) (+ a b)))

When we call this function with an argument, say (funct 5), this returns a function pointer to a function that takes one argument (b) and adds 5 to b. So:

> (define tmp (funct 5))
> (tmp 3)
8

See what happened here? By calling funct we created a new function (which we stored in tmp). We can then call that function, and get a result.

The idea of the paper is that in order to get proper closure rules, you can walk the expression (because the beauty of Lisp is that Lisp programs are just lists of lists that a Lisp program can easily parse), and perform a substitution to rewrite our function in terms of “epsilon”, or rather, lambda functions which only have local scope.

Thus, the lambda expression of our function funct above:

    (lambda (a)
        (lambda (b) (+ a b)))

which describes a function taking one argument that returns a function taking one argument, would be rewritten:

    (lambda (a) 
        ($$closure (lambda (a b) (+ a b)) a))

Now this special function $$closure is magic. It takes as it’s first argument a precompiled function, and as the tail of the arguments a list of parameters. It then creates a new function with pre-pends it’s arguments on the stack and jumps to the original function. Thus:

    ($$closure #<procedure:anon-1> 5)

In the above case, the above procedure would generate the following function:

	POP DS   ; Pop the marker giving the size of the stack
	PUSH 5   ; Push the parameter 5 onto the stack
	PUSH DS  ; Push the stack size marker back onto the stack
	JMP <procedure:anon-1>

That is, the function would simply pre-pend the list of arguments to the start of the parameter list (adjusting the stack frame as needed to properly mark the start and end of the stack frame), and then jump to the original procedure.

The code rewriter I built also tracks the current usage of variables so I properly ignore variables not used interior to a function. For example:

    (lambda (a b)
      (set! f1 (lambda (c) (+ a c)))
      (set! f2 (lambda (c) (- b c))))

This is rewritten as:

    (lambda (a b)
      (set! f1 
            ($$closure (lambda (a c) (+ a c)) 
                       a)) 
      (set! f2 
            ($$closure (lambda (b c) (- b c)) 
                       b)))

Meaning we run our $$closure function pushing only the additional parameters that are needed to each of my interior lambda functions.

We also handle the case where you can create a closure that is only accessible from an interior method. For example, if we define a function foo:

    (define foo (lambda (a) 
                        (lambda (b) (set! a (+ a b)) 
                                    a)))

This defines a function which, once run, will store the value ‘a’ away in a variable that is only accessible from the function that is returned. Calls to that interior function can then add to the interior value, and return that value.

Thus:

> (define bar (foo 5))
> (bar 1)
6
> (bar 2)
8
> (bar -3)
5

This interior variable is of course not shared:

> (define bletch (foo 3))
> (bletch 1)
4
> (bar 1)
6

This is accomplished by our routine by detecting if any of the interior lambdas perform a set! operation to update an interior variable held by closure. If so, we then “wrap” the variable in a list, so in effect we use the variable by reference.

Compiling our routine with the lambda closure function, we get:

    (lambda (a) 
      (set! a ($$wrap a)) 
      ($$closure (lambda (a b) ($$store a (+ ($$fetch a) b)) 
                               ($$fetch a)) 
                 a))

The $$wrap function wraps the variable with a list. The function $$store sets the wrapped value, and $$fetch obtains the wrapped value. These three functions are in fact declared as:

(define ($$wrap v) (cons v '()))
(define ($$fetch v) (car v))
(define ($$store v val) (set-car v val))

The way my code works is by doing an analysis pass, which computes the variable usage across the various lambdas in my declaration, then a rewrite pass, to use the information gathered in the variable scope parameters to determine how to rewrite my lambda functions.

The first method is $$eval-scope, which walks through the lambda function, maintaining a record of all the variables that have been encountered. The specific record that I track contains the following information:

1) An association list of the currently encountered variables, maintained as a “push-down” association list. When I encounter a new variable, the variable information is pre-pended to the list of variables. This variable state information contains the following information:

a) The variable name
b) A flag indicating if the variable was written to by an interior lambda (that is, by a lambda function not where this one was declared). This actually has three states: $$read (the variable was read), $$write (the variable was written) and $$unused (the variable was not used by an interior lambda).
c) A pointer to the lambda where this variable was declared,
d) A list of pointers to the lambdas where this variable was used.

Note that we only create records for the variables that we find inside declaration headers: that is, for an expression like (lambda (a b)), we only note ‘a’ and ‘b’. If we see a variable that hasn’t been declared in this lambda or an outer lambda, we assume it was a global variable and thus ignore it.

2) A flag next to the association list indicating if we’ve seen multiple lambdas in this pass. We can use this flag to ignore rewriting the function if we don’t see any embedded lambdas.

3) A push-down list of the lambdas that we’re currently examining. This is important because if we have three nested lambdas, but the middle lambda doesn’t use a variable declared in the outer that is used in the inner, we must pass the used variable through the middle lambda. So a lambda used in the interior is considered “used” by all of the lambdas between the current one we’re examining and the lambda where this variable was first declared.

4) A push-down list of pointers to our association list. This is used essentially to push and pop variable scope: when we enter a lambda function we push the old scope state, and when we’re done with the lambda we pop the scope state. This allows us to quickly unwind the current variable state once we’re done processing an interior lambda.

The $$eval-scope method walks through all of the declarations we’re currently examining, and updates the scope variable. We also, for convenience sake, rewrite lambdas we encounter (once we’ve processed them) with a special token ($$scope), which notes for each lambda we’ve encountered (a) the lambda method itself (transformed by $$eval-scope, naturally), (b) the un-transformed lambda expression (which we need because our scope variable uses pointers to the un-transformed lambda expression as an index), (c) the variables that are used outside the lambda function (that is, the state of the closure when the lambda was encountered), and (d) the variables created and used inside the lambda.

This can be a quite complicated: for our above example:

    (define foo (lambda (a) 
                        (lambda (b) (set! a (+ a b)) 
                                    a)))

The function $$eval-scope returns:

($$scope
 (lambda (a)
   ($$scope
    (lambda (b) (set! a (+ a b)) a)
    (lambda (b) (set! a (+ a b)) a)
    ((a $$write (lambda (a) (lambda (b) (set! a (+ a b)) a)) (lambda (b) (set! a (+ a b)) a) (lambda (a) (lambda (b) (set! a (+ a b)) a))))
    ((b $$unused (lambda (b) (set! a (+ a b)) a) (lambda (b) (set! a (+ a b)) a))
     (a $$write (lambda (a) (lambda (b) (set! a (+ a b)) a)) (lambda (b) (set! a (+ a b)) a) (lambda (a) (lambda (b) (set! a (+ a b)) a))))))
 (lambda (a) (lambda (b) (set! a (+ a b)) a))
 ()
 ((a $$write (lambda (a) (lambda (b) (set! a (+ a b)) a)) (lambda (b) (set! a (+ a b)) a) (lambda (a) (lambda (b) (set! a (+ a b)) a)))))

Ignoring the inner lambda, the outer lambda is:

($$scope
 (lambda (a)
   ($$scope ... ;; rewritten method)
 (lambda (a) (lambda (b) (set! a (+ a b)) a))
 ()
 ((a $$write (lambda (a) (lambda (b) (set! a (+ a b)) a)) (lambda (b) (set! a (+ a b)) a) (lambda (a) (lambda (b) (set! a (+ a b)) a)))))

This indicates that our outer lambda method declares one variable a (see it in the fourth list but not in the third? It starts “((a $$write …”), which is written to. It is used by our outer and inner lambda functions, and declared in our outer lambda. (Let me reformat the association list so you can see it:)

 ((a $$write                                         ;; The variable is written to
     (lambda (a) (lambda (b) (set! a (+ a b)) a))    ;; Where the variable was declared
     (lambda (b) (set! a (+ a b)) a)                 ;; All the places where it was used...
     (lambda (a) (lambda (b) (set! a (+ a b)) a)))))

The interior lambda gets the same treatment:

   ($$scope
    (lambda (b) (set! a (+ a b)) a) ;; The transformed lambda (which needed no transformation)
    (lambda (b) (set! a (+ a b)) a) ;; The original declaration
    ;; Variable state outside this lambda
    ((a $$write (lambda (a) (lambda (b) (set! a (+ a b)) a)) (lambda (b) (set! a (+ a b)) a) (lambda (a) (lambda (b) (set! a (+ a b)) a))))
    ;; Variable state inside this lambda
    ((b $$unused (lambda (b) (set! a (+ a b)) a) (lambda (b) (set! a (+ a b)) a))
     (a $$write (lambda (a) (lambda (b) (set! a (+ a b)) a)) (lambda (b) (set! a (+ a b)) a) (lambda (a) (lambda (b) (set! a (+ a b)) a))))))

Our second pass, the $$rewrite-lambda function, takes the state information expressed in our rewritten lambda function, and does two things:

(1) We rewrite all variable accesses to variables that are ‘written’ to (marked $$write in the association list) by substituting all ‘set!’ operations with ‘$$store’, all read operations with $$fetch, and we add to the start of the declaring lambda function a $$wrap call to wrap our arguments. (Our particular dialect of Lisp doesn’t have macros (yet), so ‘do’ and ‘let’ are explicit objects within our language–and the rewrite engine handles those correctly as well, rewriting things like ‘(do ((x 1 (+ 1 x))…’ as ‘(do ((x ($$wrap 1) ($$wrap (+ 1 ($$fetch x))))) …’.)

(2) We rewrite all $$scope expressions as lambda + closure. That is, our rewrite rules examine the $$scope function and generates a ($$closure (lambda …) …), prepending the variables that are passed outside of our function in as new parameters (added at the start), and creating a $$closure declaration with those added variables, in the same order.

In our above example, the lambda declaration gets it’s final form (for our compiler):

    (lambda (a) 
      (set! a ($$wrap a)) 
      ($$closure (lambda (a b) ($$store a (+ ($$fetch a) b)) 
                               ($$fetch a)) 
                 a))

If this seems a little heavy-weight, it’s because it probably is: for the Lisp compiler I’m building, I’m not expecting to use the closure feature a lot–and so, having something that rewrites my functions during the compilation process seems a reasonable trade-off for dealing with closures. It also means that my Virtual Machine doesn’t need to understand closures at all; I can just build it as a normal stack-based VM engine with garbage collection.

Development Chaos Theory

An on-going conversation about stuff I find interesting

On the development of my own Lisp interpreter, part 3: revisiting closures.

Leave a comment Cancel reply

Share this:

Leave a comment Cancel reply