Tuesday, November 20, 2007

Bytecode Tools in Ruby: A Low-level DSL

I've been toying with the idea of rewriting the JRuby compiler in Ruby, or at least writing the appropriate plumbing that would allow someone to do something similar. Migrating the JRuby compiler may or may not be worth it, since the existing Java compiler is basically done and working well, and a conversion would be sure to introduce bugs here and there. But it would certainly be a show of faith to give it a try.

As part of this effort, I've built up some basic utility code and a simple JVM bytecode builder that could act as the lowest level of such a compiler. I'm looking for input on the syntax at this point, while I take a break from it to explore JRuby Java integration improvements I think should be done before 1.1.

So here's the Ruby source of the builder, as contained within a test case:

require 'test/unit'
require 'compiler/builder'
require 'compiler/signature'

class TestBuilder < Test::Unit::TestCase
  import java.lang.String
  import java.util.ArrayList
  import java.lang.Void
  import java.lang.Object
  import java.lang.Boolean

  include Compiler::Signature

  def test_class_builder
    cb = Compiler::ClassBuilder.build("MyClass", "MyClass.java") do
      field :list, ArrayList

      constructor(String, ArrayList) do
        aload 0
        invokespecial Object, "<init>", Void::TYPE
        aload 0
        aload 1
        aload 2
        invokevirtual this, :bar, [ArrayList, String, ArrayList]
        aload 0
        putfield this, :list, ArrayList
      end

      static_method(:foo, this, String) do
        new this
        aload 0
        new ArrayList
        invokespecial ArrayList, "<init>", Void::TYPE
        invokespecial this, "<init>", [Void::TYPE, String, ArrayList]
      end

      method(:bar, ArrayList, String, ArrayList) do
        aload 1
        invokevirtual(String, :toLowerCase, String)
        aload 2
        invokevirtual(ArrayList, :add, [Boolean::TYPE, Object])
        aload 2
      end

      method(:getList, ArrayList) do
        aload 0
        getfield this, :list, ArrayList
      end

      static_method(:main, Void::TYPE, String[]) do
        aload 0
        ldc_int 0
        invokestatic this, :foo, [this, String]
        invokevirtual this, :getList, ArrayList
      end
    end
  end
end


For those of you who don't speak bytecode, here's roughly the Java code that this would produce:
import java.util.ArrayList;

public class MyClass {
  public ArrayList list;

  public MyClass(String a, ArrayList b) {
    list = bar(a, b);
  }

  public static MyClass foo(String a) {
    return new MyClass(a, new ArrayList());
  }

  public ArrayList bar(String a, ArrayList b) {
    b.add(a.toLowerCase());
    return b;
  }

  public ArrayList getList() {
    return list;
  }

  public static void main(String[] args) {
    foo(args[0]).getList();
  }
}

The general idea is that fairly clean-looking Ruby code can be used to generate real Java classes, providing a readable base for code generation tools like compilers.
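To give a feel for how a block-based DSL of this shape can work, here is a minimal sketch built on instance_eval and method_missing. It is not the actual compiler/builder code: SketchClassBuilder, SketchMethodBody and defined_methods are names made up for this example, types are plain strings so it runs without JRuby, and it only records instructions instead of emitting bytecode.

```ruby
# Hypothetical recorder, NOT the real compiler/builder implementation:
# it collects instructions as data rather than writing a class file.
class SketchMethodBody
  attr_reader :instructions

  def initialize
    @instructions = []
  end

  # Any bare word inside the block (aload, getfield, ...) ends up here
  # and is stored as [opcode, *operands].
  def method_missing(opcode, *operands)
    @instructions << [opcode, *operands]
    nil
  end
end

class SketchClassBuilder
  attr_reader :name, :defined_methods

  def self.build(name, &block)
    builder = new(name)
    builder.instance_eval(&block) # run the block with the builder as self
    builder
  end

  def initialize(name)
    @name = name
    @defined_methods = {}
  end

  # Mirrors the shape of method(:name, ReturnType, ArgType, ...) do ... end
  def method(name, *signature, &block)
    body = SketchMethodBody.new
    body.instance_eval(&block)
    @defined_methods[name] = { :signature => signature, :body => body.instructions }
  end
end

sketch = SketchClassBuilder.build("MyClass") do
  method(:getList, "java.util.ArrayList") do
    aload 0
    getfield :this, :list, "java.util.ArrayList"
  end
end

p sketch.defined_methods[:getList][:body]
```

A real builder would hand each recorded instruction to a bytecode library instead of keeping it in an array, but the block handling would look much the same.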

There are a few things to notice here:
  • Everything is public. I have not wired in visibility and other modifiers mainly because it starts to look cluttered no matter how I try. Suggestions are welcome.
  • The bytecode, while clean looking, is pretty raw. This interface also doesn't save you from yourself; if you're not ordering your bytecodes right, you'll end up with an unverifiable class file.
  • It's not apparent just from looking at the code which types specified are return values and which are argument values. Something more explicit could be useful here.
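On the last point: in the listing above, the return type comes first and the argument types follow, so [ArrayList, String, ArrayList] means "takes a String and an ArrayList, returns an ArrayList". The sketch below shows how such a list maps to a standard JVM method descriptor. The helper names type_descriptor and method_descriptor are assumptions for illustration, not part of the builder, and class names are plain strings so it runs without JRuby.

```ruby
# Hypothetical helpers; only the descriptor format itself comes from the JVM.
PRIMITIVE_DESCRIPTORS = {
  "void" => "V", "boolean" => "Z", "byte" => "B", "char" => "C",
  "short" => "S", "int" => "I", "long" => "J", "float" => "F",
  "double" => "D"
}

def type_descriptor(name)
  # A trailing [] marks an array type, which gets a leading "[".
  return "[" + type_descriptor(name[0..-3]) if name.end_with?("[]")
  # Primitives have one-letter codes; everything else is L<slashed/name>;
  PRIMITIVE_DESCRIPTORS[name] || "L" + name.tr(".", "/") + ";"
end

# signature is [return_type, *argument_types], the ordering used in the listing.
def method_descriptor(signature)
  return_type = signature.first
  argument_types = signature[1..-1]
  "(" + argument_types.map { |t| type_descriptor(t) }.join + ")" + type_descriptor(return_type)
end

puts method_descriptor(["java.util.ArrayList", "java.lang.String", "java.util.ArrayList"])
# prints (Ljava/lang/String;Ljava/util/ArrayList;)Ljava/util/ArrayList;
```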
I'd like to continue this work. The above code, run against JRuby trunk and the lib/ruby/site_ruby/1.8/compiler library I'm working on, will produce a working MyClass class file:
~/NetBeansProjects/jruby $ jruby test/compiler/test_builder.rb
Loaded suite test/compiler/test_builder
Finished in 0.096 seconds.

1 tests, 0 assertions, 0 failures, 0 errors
~/NetBeansProjects/jruby $ java -cp . MyClass foo

So it's actually emitting the appropriate bytecode for this class.

Comments? Thoughts for improvement?


raggi said...

I like it, certainly opens up the scope to people interested in having a look.

As you say, the utility code capabilities are also very useful.

Christian Seiler said...

From a user's perspective I'd rather see 1.1 out first and performing well than a migration of the compiler to Ruby. Just my two cents.

Pickles said...

Just a thought for handling visibility and making it more clear the difference between return type and parameters:

method(:bar, :visibility => :private, :returns => ArrayList, :arguments => [String, ArrayList]) do

You could have reasonable defaults too, like public for visibility and void for return type.
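A rough sketch of how that options hash and its defaults might be handled, in plain Ruby. The method_options helper and the METHOD_DEFAULTS constant are hypothetical, and types are plain strings so the example runs without JRuby:

```ruby
# Hypothetical option handling for the suggestion above; not part of the builder.
METHOD_DEFAULTS = { :visibility => :public, :returns => "void", :arguments => [] }

def method_options(name, options = {})
  # Reject misspelled keys early instead of silently ignoring them.
  unknown = options.keys - METHOD_DEFAULTS.keys
  raise ArgumentError, "unknown option(s): #{unknown.inspect}" unless unknown.empty?
  # Explicit options win over the defaults.
  METHOD_DEFAULTS.merge(options).merge(:name => name)
end

p method_options(:bar, :visibility => :private,
                 :returns => "java.util.ArrayList",
                 :arguments => ["java.lang.String", "java.util.ArrayList"])
p method_options(:run) # falls back to public, void, no arguments
```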

John Lam said...

I found that using strings for method call targets worked really well as opposed to passing multiple arguments. It makes the DSL more readable and closer to the output of ILDASM on the CLR.

Charles said...

If the return value of constructor, method and friends were an identifier for the code being generated, you could just prepend them with a call to public.

Or you could keep the current visibility in an instance variable and pull the same trick Ruby uses in class bodies, where a bare public, private or protected applies to everything declared after it.

Or just make public_method, private_constructor, etc methods.
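The instance-variable variant could be sketched roughly as follows. VisibilitySketch and its declared list are made-up names for illustration, and the example only records declarations instead of generating bytecode:

```ruby
# Hypothetical sketch of the "visibility region" idea; names are illustrative only.
class VisibilitySketch
  attr_reader :declared

  def initialize
    @visibility = :public # default: everything is public
    @declared = []
  end

  # A bare call switches the visibility for everything declared afterwards,
  # mirroring how public/private behave inside a Ruby class body.
  def public
    @visibility = :public
  end

  def private
    @visibility = :private
  end

  def protected
    @visibility = :protected
  end

  def method(name, *signature)
    @declared << [@visibility, :method, name]
  end

  def constructor(*signature)
    @declared << [@visibility, :constructor, "<init>"]
  end
end

sketch = VisibilitySketch.new
sketch.instance_eval do
  constructor "java.lang.String"
  private
  method :bar, "java.util.ArrayList"
  public
  method :getList, "java.util.ArrayList"
end

p sketch.declared
```

Here bar is recorded as private while the constructor and getList stay public, with no extra arguments on the individual declarations.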