I was looking at an old post of mine about how I implemented closures using closure generation, and it seems… overly complex.
So I went about doing a simpler implementation.
For review, we implement a LISP interpreter with a special instruction which essentially takes the following arguments:
($$closure function arg1 arg2 ...)
This special function takes the lambda declaration, pushes the arguments provided as constants, and generates a new function which in essence pushes the constants and calls a special CLOSURE instruction in the VM.
PUSHC function PUSHC arg1 PUSHC arg2 ... CLOSURE n
The closure instruction modifies the stack by putting the arguments at the top of the stack (through rotating the stack), then jumps to the function.
The purpose of this instruction is to allow us to add the closure of a lambda to the argument list of the lambda function; thus, if we have the following declaration:
(let ((a 0)) (lambda (b) (+ a b)))
We rewrite this using our special $$closure function (using the notation that ‘epsilon’ is ‘lambda’ without closures) as:
(let ((a 0)) ($$closure (epsilon (a b) (+ a b)) a)))
Our $$closure instruction then generates a new procedure which takes one argument ‘b’, adds ‘a’ to the stack, then calls our original epsilon (think “lambda without closures”) function.
We can handle variables from an outer scope by wrapping it; we declare the following methods:
(define ($$wrap v) (cons v nil)) (define ($$fetch v) (car v)) (define ($$store v val) (rplaca v val) val)
This has the net effect of using an indirect reference to refer to a particular element, and has the nice property that it can be passed as a “constant” to our closure function.
Thus, if we have the following:
(let ((a 0)) (lambda (b) (set! a (+ a b))))
Our rewrite function should generate the following:
(let ((a ($$wrap 0))) ($$closure (epsilon (a b) ($$store a (+ ($$fetch a) b))) a))
In other words, we wrap a as a list of one item, and this gives us the ability to access the contents of a even as our let statement passes out of scope.
This means we can do the following:
(define (makeproc) (let ((a 0)) (lambda (b) (set! a (+ a b))))) (set! Function1 (makeproc)) (set! Function2 (makeproc)) (Function1 3) > 3 (Function2 5) > 5 (Function1 4) > 7 (Function2 -4) > 1
Each invocation of makeproc creates a new list item (0) which is then stored in our new lambda function. This means Function1 and Function2 point to two separate (0) objects, which are then independently updated as we call either function.
Now my previous implementation seemed overly complex, so I re-implemented the algorithm.
The essence of the state that we track as we parse through the LISP code to detect closures is driven by the following observations:
First, we only need to make the following modifications to our statement as we perform closure analysis:
(1) Wrap all ‘lambda’ declarations with a $$closure statement with the closure inside the lambda. We don’t rewrite ‘lambda’ declarations that do not read or write variables outside of the scope of the lambda function.
(Note: while in our discussion above we use the keyword ‘epsilon’ to distinguish from ‘lambda’, in our implementation we use ‘lambda’ exclusively.)
(2) For those variables from an outer scope that is written to by an inner scope, we need to replace all ‘set!’ invocations with ‘$$store’, all instances of the variable with ‘$$fetch’, and instances where we create the variable with ‘$$wrap’.
Second, as we rewrite statements, the contents of a statement does not affect the outer part of the statement. Meaning as we recurse down we can rewrite statements inline; there is no context inside a statement (aside from if it is accessed) that we need to maintain.
Third, each variable is declared only in one place. Yes, a variable can be masked by an inner declaration, but that refers to a distinct variable.
My second point and third point means I can rewrite statements as I recurse and perform analysis.
Now the goal of my analysis and rewrite is the following:
(1) Determine which variables from an outer scope is written to or read from. Variables declared in my own scope are ignored, and we only need to understand the usage state of the first instance of a variable declared outside of my scope.
(2) We only need to determine if a variable is read from or written to; we do not need to track any other state.
So the state which I track as I recurse through the statements in my system is:
( local-scope usage )
where local-scope is a set of variables that have been declared in my local scope, and usage is a push-down stack of variables and their usage, as a pair of values–the first the variable name, the second a number 0 (declared but unused), 1 (read from) or 2 (written to).
For example, if we encounter the following declaration:
(let (a b c) ...)
Our state is:
((a b c) ((a 0) (b 0) (c 0)))
if, in an inner scope we encounter:
(let (c d e) ...)
our state is updated to
((d e a b c) ((c 0) (d 0) (e 0) (a 0) (b 0) (c 0)))
This means that inside if we encounter a lambda function, when it updates the usage, it only updates the usage of the variable in the usage list which is in our inner-most scope.
Meaning if we encounter the following inside our inner let:
(lambda (a) (set! c a))
First, note that we’re inside a lambda–a change of scope. So our state as we parse the contents of the lambda dumps the variables in local-scope list and replaces it with
((a) ((a 0) (c 0) (d 0) (e 0) (a 0) (b 0) (c 0)))
(Note that we push (a 0) on our usage stack because inside our lambda, the variable a has local scope.)
Next we examine the statements. We see a set!, and we examine c: since it is not in our local-scope we modify our state to show that our variable c is written. We only change the first item in the stack, since the variable c we’re interested in is in the inner scope:
((a) ((a 0) (c 2) (d 0) (e 0) (a 0) (b 0) (c 0)))
We also examine ‘a’, which is read from–but because it is inside the local-scope, we do not mark the usage of ‘a’.
Now our method we use to do this scanning ($$ScanUsage) returns a list of the variables in this inner scope which were found and modified; in our case, this is the list
On return, our inner let now sees the following:
((d e a b c) ((c 2) (d 0) (e 0) (a 0) (b 0) (c 0)))
We also get back the return value
(c), indicating that the variable c was modified in the inner scope of the lambda function.
The lambda function then can be modified for closure to pass in the variable found inside of it: our lambda above is rewritten in-place as:
($$closure (lambda (c a) (set! c a)) c)
Now on return we have our return value
(c), which we now examine against usage to see if the usage of the variable needs to be rewritten. Since in our usage list, the usage value is 2 (written to), we call another method $$RewriteWrite which recurses down and rewrites our closure to its final form:
($$closure (lambda (c a) ($$store c a)) c)
This also rewrites our inner let as:
(let ((c ($$wrap nil)) d e) ...)
That is, ‘c’ is rewritten as a list, initially initialized to
Our $$RewriteWrite method, of course, must examine each statement and if it is a declaration which masks the variable we are rewriting, we stop recursion at that point; the inner scope is another variable, albeit with the same name.
And discovering that scope in terms of our $$closure statement must take into account that the only variables in our lambda which are in the inner scope are those not in the parameter list of our $$closure statement.
Also note that our algorithm heavily leans on ‘rplaca’ and ‘rplacd’ to rewrite our statement rather than generating a new statement. This has implications about statements that may be generated and then later evaluated, where large parts of the code is contained in quote statements.
I believe this is an easier implementation in that the data structure being passed around is local to just the statement being examined, and is far simpler to understand than the older implementation. The code I have appears to correctly rewrite closure statements, though at this stage I haven’t finished testing yet.