Tuesday, May 09, 2006

A DSL for Bytecode Generation

Having discovered the power and magic of bytecode generation, it occurred to me that none of the existing libraries have the subtle elegance that most code generation tasks really deserve. I believe there's a simple reason for this: they're written in Java. Some of them really try to make things easier, and perhaps come close to succeeding, but they're all still cumbersome, clunky, and very, very verbose.

I have been writing JRuby's compiler in pure Ruby by calling out to a Java-based bytecode generation library. My initial attempts were fairly straightforward calls: push this, call that, pop the other. Very linear, very boring, very verbose, and not a great deal simpler than the equivalent Java code. It seemed such a shame to waste an expressive language like Ruby on such a menial task, so I've decided to build a domain-specific language for Java bytecode generation.

A short sample of what works today (basically the operations I needed for my test compiler):

class_bytes = ClassBuilder.def_class :public, "FooClass" do |c|
  c.def_constructor :public do |con|
    con.call_super
    con.return_void
  end

  c.def_method :public, :string, "myMethod", [:void], [:exception] do |m|
    m.call_this GenUtils.array_cls(:string), "getStringArr"
    m.call_this :string, "getMessage"
    m.return_top :ref
  end

  c.def_method :private, :string, "getMessage", [:void] do |m|
    m.construct_obj :stringbuffer, [:string] do |p|
      p.constant "Now I will say: "
    end

    m.call_method :stringbuffer, "append", :string, :stringbuffer do |p|
      p.constant "Hello CodeGen!"
    end
    m.call_method :string, "toString", :void, :stringbuffer
    m.return_top :ref
  end

  c.def_method :public, GenUtils.array_cls(:string), "getStringArr" do |m|
    m.construct_array :string, 5 do |p,i|
      p.constant "string \##{i}"
    end

    m.array_set 2, :string do |p|
      p.constant "replacement at index 2"
    end

    m.return_top :ref
  end
end


This approach has a number of advantages over others:

  • The structure of the generator is very similar to that of the generated code
  • Method parameters and array initializers (or the code to make them available) are logically associated with the eventual call or array they'll apply to
  • The builders maintain some internal state, and will be able to count stack depth, validate typing, automatically attempt casts, and automatically return the correct types
  • It's far easier to read
  • It's far more fun

Some notes on the code above:

  • The param-building blocks (with |p| params) are in all cases optional. If omitted, method calls will assume you have prepared all params, array creation will create an empty array, and array sets will assume the value is already present on the stack.
  • The core bytecode operations (dup, etc.) are still present and callable on the MethodBuilder m. This allows you to fall back to linear style when necessary.
  • Various classes (perhaps eventually all classes) from java.lang are aliased as symbols like :string and :object. At the ClassBuilder level, it is also possible to "import" classes, as in c.import "javax.swing.JFrame", :jframe, and then use the aliased symbol throughout the generation code (much like import in a .java file).
  • I'm looking for a better way to handle arrays. GenUtils is only used internally except for array types, and I'd like to hide it completely.
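
To make the state-tracking and optional-block ideas concrete, here is a minimal sketch of how a builder along these lines might count stack depth, resolve symbol aliases, and treat the param-building block as optional. It is not the actual library: SketchMethodBuilder, its instruction tuples, and the alias table are all made up for illustration, and it only records instructions rather than emitting real bytecode. It also does no type checking.

```ruby
# Hypothetical sketch only; none of these names come from the real DSL.
class SketchMethodBuilder
  # Symbol aliases for a few java.lang classes, in JVM internal-name form.
  ALIASES = {
    string:       "java/lang/String",
    object:       "java/lang/Object",
    stringbuffer: "java/lang/StringBuffer"
  }

  attr_reader :instructions, :depth, :max_depth

  def initialize
    @aliases = ALIASES.dup
    @instructions = []
    @depth = 0       # current operand stack depth
    @max_depth = 0   # deepest the stack has been so far
  end

  # Register an extra alias, similar in spirit to c.import in the post.
  def import(class_name, sym)
    @aliases[sym] = class_name.tr(".", "/")
  end

  # Linear-style primitives, still callable directly on the builder.
  def constant(value)
    emit [:ldc, value], 1
  end

  def aload(index)
    emit [:aload, index], 1
  end

  def pop
    emit [:pop], -1
  end

  # Call a virtual method; the receiver must already be on the stack.
  # With a block, the block pushes the arguments and the builder checks
  # that it pushed the right number. Without a block, the arguments are
  # assumed to be on the stack already.
  def call_method(ret, name, params, target)
    params = Array(params).reject { |p| p == :void }
    if block_given?
      before = @depth
      yield self
      pushed = @depth - before
      unless pushed == params.size
        raise ArgumentError,
              "#{name}: expected #{params.size} args, block pushed #{pushed}"
      end
    end
    # Pops the receiver and args, pushes the result unless it is void.
    delta = -(params.size + 1) + (ret == :void ? 0 : 1)
    emit [:invokevirtual, resolve(target), name,
          params.map { |p| resolve(p) }, resolve(ret)], delta
  end

  private

  # Look up an alias; unknown symbols fall back to their string form.
  def resolve(sym)
    @aliases.fetch(sym) { sym.to_s }
  end

  # Record an instruction and update the stack bookkeeping.
  def emit(insn, delta)
    @instructions << insn
    @depth += delta
    raise "stack underflow at #{insn.inspect}" if @depth < 0
    @max_depth = @depth if @depth > @max_depth
  end
end

m = SketchMethodBuilder.new
m.aload 1   # assume local variable 1 holds a StringBuffer
m.call_method :stringbuffer, "append", :string, :stringbuffer do |p|
  p.constant "Hello CodeGen!"
end
puts m.instructions.inspect
puts "depth=#{m.depth} max=#{m.max_depth}"
```

Because every operation funnels through one bookkeeping method, the builder can report the maximum stack size for a method and catch a block that pushes the wrong number of arguments before any class file is written.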

I'm tossing this working snippet out to the world for comments and critique...and perhaps as a little teaser of things to come. I'm planning to add a few more operations and port the early v1 compiler over to this soon...then both will develop together. I see this DSL/library as having huge potential for other projects that want a simple, elegant way to do bytecode generation.

Thoughts? Comments?

3 comments:

Chris Nokleberg said...

Even in Java it probably isn't recommended to be calling the low-level visitor methods like push and pop directly, which is why there are helper classes like GeneratorAdapter (in asm-commons.jar). I'd recommend layering on top of it instead of a bare MethodVisitor (assuming you're still using ASM).

The DSL idea itself seems good, although I don't know enough Ruby to comment on the syntax specifically.

John Lam said...

Here's a link to my generate.rb file that contains the core proxy generators for RubyCLR:

http://rubyforge.org/plugins/scmsvn/viewcvs.php/trunk/Src/Ruby/generate.rb?root=rubyclr&view=markup

I use a bunch of eval magic to eliminate the need for a variable inside of the blocks. For implementation details on how that's done, take a look at:

http://rubyforge.org/plugins/scmsvn/viewcvs.php/trunk/Src/Ruby/dynamicmethod.rb?root=rubyclr&view=markup

In particular, look at the magic in core_create_raw_ruby_method.

Paul said...

Very tidy -- a great demonstration of what one can do with a dynamic language!