Thursday, September 27, 2007

The Compiler Is Complete

It is a glorious day in JRuby-land, for the compiler is now complete.

Tom and I have been traveling in Europe the past two weeks, first for RailsConf EU in Berlin and currently in Århus, Denmark for JAOO (which was an excellent conference, I highly recommend it). And usually, that would mean getting very little work done...but time is short, and I've been putting in full days for almost the entire trip.

Let's recap my compiler-related commits made on the road:

  • r4330, Sep 16: ClassVarDeclNode and RescueNode compiling; all tests pass...we're in the home stretch now!
  • r4339, Sep 17: Fix for return in rescue not bubbling all the way out of the synthetic method generated to wrap it.
  • r4341, Sep 17: Adding super compilation, disabled until the whole findImplementers thing is tidied up in the generated code.
  • r4342, Sep 17: Enable super compilation! I had to disable an assertion to do it, but it doesn't seem to hurt things and I have a big fixme on it.
  • r4355, Sep 19: zsuper is getting closer, but still not enabled.
  • r4362, Sep 20: Enabled a number of additional flow-control syntax within ensure bodies, and def within blocks.
  • r4363, Sep 20: Re-disabling break in ensure; it caused problems for Rails and needs to be investigated in more depth.
  • r4367, Sep 21: Removing the overhead of constructing ISourcePosition objects for every line in the compiler; I moved construction to be a one-time cost and perf numbers went back to where they were before.
  • r4368, Sep 21: Small optz for literal string perf and memory use: cache a single bytelist per literal in the compiled class and share it across all literal strings constructed.
  • r4370, Sep 22: Enable compilation of multiple assignment with args (rest args)
  • r4375, Sep 24: Total refactoring of zsuper argument processing, and zsuper is now enabled in the compiler. We still need more/better tests and specs for zsuper, unfortunately.
  • r4377, Sep 24: Compile the remaining part of case/when, for when *whatever (appears as nested when nodes in the ast...why?)
  • r4388, Sep 25: Add compilation of global and constant assignment in masgn/block args
  • r4392, Sep 25: Compilation of break within ensurified sections; basically just do a normal breakjump instead of Java jumps
  • r4400, Sep 25: Fix for JRUBY-1388, plus an additional fix where it wasn't scoping constants in the right module.
  • r4401, Sep 25: Retry compilation!
  • r4402, Sep 26: Multiple additional cleanups, fixes, to the compiler; expand stack-based methods to include those with opt/rest/block args, fix a problem with redo/next in ensure/rescue; fix an issue in the ASTInspector not inspecting opt arg values; shrink the generated bytecode by offloading to CompilerHelpers in a few places. Ruby stdlib now compiles completely. Yay!
  • r4404, Sep 26: Add ASM "CheckClass" adapter to command-line (class file dumping) part of compiler.
  • r4405, Sep 26: A few additional fixes for rescue method names and reduced size for the pre-allocated calladapters, strings, and positions.
  • r4410, Sep 27: A number of additional fixes for the compiler to remedy inconsistent stack issues, and a whole slew of work to make apps run correctly with AOT-compiled stdlib. Very close to "complete" in my eyes.
  • r4412, Sep 27: Fixes to top-level scoping for AOT-compiled methods, loading sequence, and some minor compiler tweaks to make rubygems start up and run correctly with AOT-compiled stdlib.
  • r4413, Sep 27: Fixed the last known bug in the compiler. It is now complete.
  • r4414, Sep 27: Ok, now the compiler is REALLY complete. I forgot about BEGIN and END nodes. The only remaining node that doesn't compile is OptN, which we won't put in the compiled output (we'll just wrap execution of scripts with the appropriate logic). It's a good day to be alive!
I think I've done a decent job proving you can get some serious work done on the road, even while preparing two talks and hob-nobbing with fellow geeks. But of course this is an enormous milestone for JRuby in general.

For the first time ever, there is a complete, fully-functional Ruby 1.8 compiler. There have been other compilers announced that were able to handle all Ruby syntax, and perhaps even compile the entire standard library. But they have never gotten to what in my eyes is really "complete": being able to dump the stdlib .rb files and continue running nontrivial applications like IRB or RubyGems. I think I'm allowed to be a little proud of that accomplishment. JRuby has the first complete and functional 1.8-semantics compiler. That's pretty cool.

What's even more cool is that this has all been accomplished while keeping a fully-functional interpreter working in concert. We've even made great strides in speeding up interpreted mode to almost as fast as the C implementation of Ruby 1.8, and we still have more ideas. So for the first time, there's a mixed-mode Ruby runtime that can run interpreted, compiled, or both at the same time. Doubly cool. This also means that we don't have to pay a massive compilation cost for 'eval' and friends, and that we can be deployed in a security-restricted environment where runtime code-generation is forbidden.

I will try to prepare a document soon about the design of the compiler, the decisions made, and what the future holds. But for now, I have at least one teaser for you to chew on: there is a second compiler in the works, this time for creating real Java classes you can construct and invoke directly from Java-land. Yes, you heard me.

Compiler #2

Compiler #2 will basically take a Ruby class in a given file (or multiple Ruby classes, if you so choose) and generate a normal Java type. This type will look and feel like any other Java class:
  • You can instantiate it with a normal new MyClass(arg1, arg2) from Java code
  • You can invoke all its methods with normal Java invocations
  • You can extend it with your own Java classes
The basic idea behind this compiler is to take all the visible signatures in a Ruby class definition, as seen during a quick walk through the code, and turn them into Java signatures on a normal class. Behind the scenes, those signatures will just dynamically invoke the named method, passing arguments through as normal. So for example, a piece of Ruby code like this:
class MyClass
  def initialize(arg1, arg2); end
  def method1(arg1); end
  def method2(arg1, arg2 = 'foo', *arg3); end
end
Might produce a Java class equivalent to this:
public class MyClass extends RubyObject {
    public MyClass(Object arg1, Object arg2) {
        callMethod("initialize", arg1, arg2);
    }

    public Object method1(Object arg1) {
        return callMethod("method1", arg1);
    }

    public Object method2(Object arg1, Object... optAndRest) {
        return callMethod("method2", arg1, optAndRest);
    }
}
It's a pretty trivial amount of code to generate, but it completes that "last mile" of Java integration, being directly callable from Java and directly integrated into Java type hierarchies. Triply cool?
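
To make the signature mapping a little more concrete, here is a minimal sketch in plain Ruby. It is not how Compiler #2 works: the post describes a quick walk through the source, whereas this sketch swaps in runtime reflection (Method#parameters, which needs Ruby 1.9.2 or later). The helper name java_signatures is hypothetical, and the rule of folding optional and rest arguments into a single Object... parameter is simply copied from the Java example above.

```ruby
# Hypothetical helper, not part of JRuby: print the Java-style signatures
# that a class's visible Ruby methods might map to.
class MyClass
  def initialize(arg1, arg2); end
  def method1(arg1); end
  def method2(arg1, arg2 = 'foo', *arg3); end
end

def java_signatures(klass)
  # Public/protected methods defined directly on the class, in a stable order.
  names = klass.instance_methods(false).sort
  # initialize is private, so it has to be looked up separately.
  names.unshift(:initialize) if klass.private_instance_methods(false).include?(:initialize)

  names.map do |name|
    params = klass.instance_method(name).parameters
    java_params = params.select { |kind, _| kind == :req }.map { |_, n| "Object #{n}" }
    # Assumption: optional and rest arguments collapse into one varargs parameter.
    flexible = params.any? { |kind, _| kind == :opt || kind == :rest }
    java_params << 'Object... optAndRest' if flexible

    if name == :initialize
      "public #{klass.name}(#{java_params.join(', ')})"
    else
      "public Object #{name}(#{java_params.join(', ')})"
    end
  end
end

puts java_signatures(MyClass)
```

Run against the MyClass above, this should print the three signatures from the Java listing (constructor, method1, method2), minus their bodies.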

Of course the use of Object everywhere is somewhat less than ideal, so I've been thinking through implementation-independent ways to specify signatures for Ruby methods. The requirement in my mind is that the same code can run in JRuby and any other Ruby without modification, but in JRuby it will gain additional static type signatures for calls from Java. The syntax I'm kicking around right now looks something like this:
class MyClass
  ...
  {String => [Integer, Array]}
  def mymethod(num, ary); end
end
If you're unfamiliar with it, this is basically just a literal hash syntax. The return type, String, is associated with the types of the two method arguments, Integer and Array. In any normal Ruby implementation, this line would be executed, a hash constructed, and execution would proceed with the hash likely getting quickly garbage collected. However Compiler #2 would encounter these lines in the class body and use them to create method signatures like this:
public String mymethod(int num, List ary) {
    ...
}
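
As a rough illustration of how a tool could pick up those hash lines, here is a small Ruby sketch that scans source text and pairs each signature hash with the def that follows it. Everything in it is an assumption made for illustration: the regular expressions, the typed_signatures helper, and the JAVA_TYPES table (which only mirrors the String, Integer and Array mapping shown above). A real compiler would presumably work from the parsed AST rather than from line-by-line matching.

```ruby
# Hypothetical sketch, not JRuby code: pair "{Return => [ArgTypes]}" lines
# with the method definition that immediately follows them.
JAVA_TYPES = { 'String' => 'String', 'Integer' => 'int', 'Array' => 'List' }

SOURCE = <<'RUBY'
class MyClass
  {String => [Integer, Array]}
  def mymethod(num, ary); end
end
RUBY

SIG_LINE = /\A\s*\{\s*(\w+)\s*=>\s*\[(.*?)\]\s*\}\s*\z/
DEF_LINE = /\A\s*def\s+(\w+)\s*\((.*?)\)/

def typed_signatures(source)
  pending = nil # signature hash waiting for its def, if any
  result = []
  source.each_line do |line|
    if (m = SIG_LINE.match(line))
      pending = [m[1], m[2].split(',').map(&:strip)]
    elsif (m = DEF_LINE.match(line)) && pending
      ret, arg_types = pending
      names = m[2].split(',').map(&:strip)
      # Unknown Ruby types fall back to Object (an assumption).
      params = arg_types.zip(names).map { |t, n| "#{JAVA_TYPES.fetch(t, 'Object')} #{n}" }
      result << "public #{JAVA_TYPES.fetch(ret, 'Object')} #{m[1]}(#{params.join(', ')})"
      pending = nil
    else
      # Any other non-blank line breaks the pairing.
      pending = nil unless line.strip.empty?
    end
  end
  result
end

puts typed_signatures(SOURCE)
```

Methods without a preceding hash line are simply skipped here; whether they should fall back to all-Object signatures is left open, just as the final syntax is.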

The final syntax is of course open for debate, but I can assure you this compiler will be far easier to write than the general compiler. It may not be complete before JRuby 1.1 in November, but it won't take long.

So there you have it, friends. Our work on JRuby has shown that it is possible to fully compile Ruby code for a general-purpose VM, and even that Ruby can be made to integrate as a first-class citizen on the Java platform, fitting in wherever Java code may be used today.

Are you as excited as I am?

30 comments:

Anonymous said...

Congratulations. I know it always feels great when you achieve something like that.

Anonymous said...

Any reason for not choosing to use comments? At least that way the markup is benign.

Anonymous said...

Great news

sutch said...

Congratulations. Your Ruby to Java class compiler sounds very interesting, especially to those of us that need to add functionality to Java systems.

Jason Morrison said...

Congratulations! That's an awesome achievement, and an immense contribution to the Ruby and Java communities.

Raphaël Valyi said...

First of all congrats for the compiler, for sure that's a big day.

But I should say I REALLY like your second compiler idea. This would be a fantastic way to get optional typing. I like Ruby quite much, but I'll never dismiss: in some cases (large teams and large time scales), static typing is required.

JRuby was already a good way to overcome that by allowing one to split up the code base between what should be statically constrained and what should, on the contrary, allow quick prototyping. Your new compiler idea is just another major step forward in that direction. That's great!

Raphaël Valyi

Anonymous said...

Congratulations!

Have you done any performance test compared to the interpreted one? Is it much faster?

Mariano said...

Triply congrats...

Okke said...

Great news! And indeed it would be a more than lovely addition when 'pure' java classes can be generated from ruby code.

About the proposed syntax. Why not open up the class method and add a method to define the java signature of a ruby method so you write something like

class MyClass
  jsignature :bla, :returns => String, :x => Integer, :y => Integer

  def bla(x, y)
    return "#{x},#{y}"
  end
end

A bit more verbose but better readable.

Another idea is to apply a 'by convention mechanism'

def str_bla(int_x, int_y)
  ....
end

and finally, since you control the compiler, why not introduce compiler directives using comment lines

# @param x int
# @param y int
# @return String
#
def str_bla(int_x, int_y)
  ....
end

Or support all three ways to specify a signature. Just a few ideas ...

Nevertheless, great work!

Charles Oliver Nutter said...

anonymous 2: comments would work too, but it would require bit of additional parsing magic and might look a little ugly in the actual code. Of course, the map version isn't exactly beautiful either.

anonymous 3: performance when compiled is roughly double the interpreted performance, but there's still a lot of optimization to be done in both the compiled code and in the call logic.

okke: Of your ideas, only the convention and comment versions would meet my goal of having the same code run in both places. I think that's absolutely essential, since we have no intention of doing the whole "embrace and extend" thing in compatibility-breaking ways.

Anonymous said...

i might be a little silly but maybe someone can help explain

does this mean i could use ruby, and have a java-like thingy ready to move to other machines?
or in other words, write in ruby but could distribute something like a .jar file or similar?

msp said...

Congrats on a fantastic achievement! Things are getting REALLY interesting in JRuby world :)

Charles Oliver Nutter said...

anonymous 4: yep, that's about the size, though you can also load .rb files from a JAR right now too.

Brian said...

This is super cool - much needed.

1) Can you compile in GUI code too - maybe Swing or something simpler?
2) Will you be posting simple "Hello World"-sized compiled examples? (And how to run the compiler from NetBeans?)
3) Can you post a simple "Hello World" of loading .rb files from a JAR right now?

Congrats - this is really revolutionary - and much needed.

J. Whitley said...

One possible advantage for a comment-based annotation is that it would be more amenable to rdoc processing.

Juan said...

Congratulations ! It's simply amazing how the JRuby is evolving.

Now, take a good rest and enjoy some beers to celebrate it ! :)

Thank you for your work.

David Pollak said...

Congratulations!

You Rock!!

Anonymous said...

Excellent! Thank you guys! This is a huge accomplishment -- you should all be very proud.

bluetechnx said...

I like the jsignature idea.
jsig :method_name, :result, { :operand1 => :string, :operand2 => :int, etc... };

Or something to that effect.

I have used the comment version before in PHP5 using Services API/libraries not native to PHP for generating soap service xml files. The comment solution is very messy and you have scattered signatures all over the file structure.

Anonymous said...

Congratulations!
jruby will become a first class citizen of java AND ruby world. I like the idea of the second compiler with lisp like optional types. Why not use a comment? With the correct mathematical order?
# Integer, HashMap -> String

Unknown said...

Congratulations Charlie. Your hard work and such rapid progress is truly inspiring.

War Pig said...

This is great news. I am involved in a project that looks to use JRuby for a SOAP to CORBA bridge. Ruby handles the SOAP well and Java the CORBA.

Unknown said...

Congratulations on a truly awesome job.

I am curious how you are handling a couple of things though,

1. evals, particularly ones where you yield self
2. dynamic class mutations, both cases like,

class MyClass
if (...)
def optionalMessage
...
end
end
end

or a more dynamic,

def addDynamicMethod (dynamicMessage)
class_eval %{
def #{dynamicMessage}
..
end
}
end

Unknown said...

Congratulations! This is a great achievement. It is truly amazing how quickly JRuby is evolving!

Daniel Berger said...

FYI, here's the Python 3000 approach to type annotations:

def foobar(a: Integer, b: Sequence) -> String:

From Guido:

"Function and method signatures may now be 'annotated'. The core language assigns no meaning to these annotations (other than making them available for introspection), but some standard library modules may do so; for example, generic functions can use these. The syntax is easy to read"

I definitely prefer to have them inlined in some fashion. I find your proposed notation rather ugly, sorry.

Anonymous said...

A truly great accomplishment by the whole JRuby team! I'm looking forward to even more Java/JRuby goodness from Compiler #2, #3, ..., #100... ;)

Anonymous said...

My only problem with comment based type signatures is that they aren't amenable to use for annotating dynamically defined methods.

I think probably the best option is an explicit class method in Class, like the jsig proposals, with perhaps a NOP definition for use in regular ruby.

Ellie said...

Congratulations on getting the compiler finished, this ought to shake things up a bit :)

With regards to your second compiler, how about using an accessor-style syntax for type information:

def fun(foo, bar)
returns :fun => :String
accepts :foo => :Integer, :bar => :Array
...
end

where the action of returns and accepts would be implementation-dependent. This is obviously a more verbose approach than the bare hash, but does add some additional flexibility.

luposlip said...

Amazing how you could work on a compiler, while participating (actively) in a conference and having beer in the evening (while fixing the last bug in the compiler).

Congratulations, and nice seeing you at JAOO.

Anonymous said...

Thanks for sharing!