Sunday, April 30, 2006

More v1 Compilation Experiments

Since it's turned out to be so easy to generate bytecodes based off the Ruby AST, I tackled another trouble spot for us: literal array creation.

The current interpreter, in order to follow a generic, iterative model, has a large overhead for instantiating literal arrays. For example, to create the following array:

['this', 'is', 'an', 'array', 'of', 'strings']

There are at least seven AST nodes to process: the array node and six nodes for the elements (technically, there's more that one node for each element, but we'll call it one for simplicity). The recursive way to process these nodes would be to visit the array node, and then recurse to process each of the elements. However, we're trying to escape recursion, so a different model was necessary.

The current interpreter avoids recursion by maintaining its location in the AST on a stack. As nodes are encountered, they are pushed onto that stack, and their instructions--rather than recursing--expand the subnodes by also pushing them onto the stack. Once the expansion reaches a termination point (such as a literal value or the last element in a block), the expanded nodes are processed one at a time.

The instructions associated with each node perform whatever JRuby operations are required to process that node. They instantiate classes, define methods, assign variables. For this reason, they actually are of the type Instruction internally, and this is where we can plug in the compiler.

Compiler Design Version One: Microcompilation

Because the interpreter simply traverses the AST, retrieving and caching instructions as each node is encountered, we can short-circuit this process by pre-populating the appropriate instruction for a parent node with a compiled instruction based on its children. When the interpreter reached that node and sees an instruction has been prefetched, it executes that rather than continuing to traverse. As a result, we can selectively compile branches of the AST for targetted speed gains. I call this microcompilation.

The benefit to this approach is that the compiler can be written a piece at a time, tested incrementally as those pieces come together. It also has the huge benefit of allowing compiled code to run within the existing interpreter engine without any modifications to JRuby.

There is, however, a downside to my current approach. Instead of pushing toward a green-threadable and continuation-able iterative model, this simple compiler will still deepen the stack. However, given that no applications we are currently working to support seem to require continuations or threads, I believe this is a good initial approach. It may also prove true that this simpler compiler is fastest (while perhaps not the most Ruby-compliant), and for many embedded Ruby applications it will continue to be useful into the future.

Processing The Array

So back to our six-element array. In order to handle it iteratively, rather than recursing to process each element, a complicated transformation happens. The instructions elements of the array are mixed with instructions for aggregating the results, and another instruction is added at the end to finally build the array. So the sequence of instructions processed, after initially processing the array node, goes something like this:

String Node: 'this'
Aggregate (push down result accumulator)
String Node: 'is'
String Node: 'an'
String Node: 'array'
String Node: 'of'
String Node: 'strings'
Build Array (deaggregates all results and constructs a ruby array)

For more complex elements, this approach also works; if one element requires a method call or a variable lookup, that branch of the AST is processed inline, with the end result placed on the top of the accumulator stack. Processing then continues with the next element.

The added overhead of this approach is not without reason. Yes, it would have been possible to modify the interpreter engine to understand arrays and handle them specially, but my goal originally was to create a generic iterative interpreter (and perhaps to prove that all nodes could be processed in this way). However, that design mostly proven out, it is now time to address its deficiencies.

Compiling The Array

Compiling the array turns out to be a simple matter, especially considering that the only element type supported by the compiler right now is a literal string. The compiled code instantiates a new IRubyObject[] of the appropriate size, then visits the compilers for each of the elements. The String node compiler I wrote for the "foo" test is reused here, and as each element is encountered bytecodes are generated to construct ruby String objects and insert them into the array. Finally, an operation is added to construct the ruby array instance, and our work is done.

During this round of work, it has also become apparent that although we make heavy use of JRuby internals from the compiled code, large portions of the interpreter itself can go away: for example, with a handy-dandy operand stack available in bytecode mode, we would no longer need to maintain state on a separate result accumulator. The "v1" compiler is turning out to be a fun and easy affair.

The Numbers

The results of these little microcompilations are perhaps anti-climactic...after all this noise I post a handful of numbers for you to look at. Even if the numbers are great, it's a small payoff for listening to my rambling. Sorry! That's how it is!

The test methods take the same form:

def foo_arr; ['this', 'is', 'an', 'array', 'of', 'strings']; end

The foo_arr method is our compiled version, and bar_arr is the uncompiled version. And so after another 1_000_000.times { foo_arr } and { bar_arr } we have:

foo_arr time: 32.628
bar_arr time: 93.31700000000001

This equates to a roughly 60% speedup from what is, again, a fairly simple operation even in the current interpreter.

Coming up in future posts: more v1 microcompilation and numbers, v2 and v3 compiler designs, and more...