Tuesday, September 11, 2007

JRuby Compiler Update, and a Nice Performance Milestone

Hello again friends! It's time to update you on the status of the JRuby compiler.

Compiler Status

I've been working feverishly for the past several weeks to get the rest of the compiler complete. Currently, it's able to handle the majority of Ruby syntax. Here's a list of the remaining language features that do not compile:

  • "rescue" blocks; exception handling in Ruby is rather complicated, and there are some particularly odd uses of rescue that will be a bit tricky to support with normal Java exception handling.
  • "class var declaration" is not yet supported. This is when you declare a class variable (@@foo) from within the body of a class or module. This primarily affects compiling class bodies, so although it prevents AOT compilation of some scripts, it doesn't usually affect individual methods.
  • "opt n" execution. This is specifying "-n" to the Ruby runtime, and it loops the provided script as though it were surrounded by "while gets(); ... end". It's useful for line-by-line processing of stdin.
  • "post execution" blocks. Post exe blocks are when you specify an END { ... } block somewhere in your script. These blocks are saved up and executed at the end of the script execution, regardless of where they appear in the script. They're a bit like Kernel#at_exit blocks.
  • "retry". Tell me friends, do you know what "retry" actually does? Retry is used within a block/closure, and it causes the method containing the closure to be re-called anew. And as an interesting quirk, the original arguments to the method are re-evaluated, so if you call foo(bar()) and a retry is triggered within foo(), bar() will get invoked again for the retried call to foo(). Weird, eh? Update: I didn't explain this well. Here's another attempt: if you have the following code:
    def foo(x = bar()); 1.times {retry}; end
    And you call foo with no arguments, allowing the default argument logic to fire, retry will cause that logic to fire again and again. It's essentially re-entering the method anew with the original arguments, but causing *argument processing* to be revisited. I'm not sure why you'd want this behavior, since it could frequently result in default arguments re-calling methods that might only be valid the first time.
  • Some non-local flow control is not yet complete. Non-local flow control happens any time you return, break, or next from within a block (when not immediately inside a normal loop construct). Much of non-local flow control is working, but I need to flush out any remaining cases that aren't running correctly.
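Of the items above, END blocks are easy to demonstrate. This is a minimal sketch (the demo script string and its messages are my own invention) that spawns a child Ruby process to show that END blocks run at process exit, last-registered first, much like Kernel#at_exit handlers:

```ruby
require "rbconfig"

# A tiny script using END blocks: they are registered wherever they
# appear, but all run at process exit, in reverse order of registration.
script = 'END { puts "runs last" }; END { puts "runs first" }; puts "body"'

# Run it in a fresh Ruby process and capture stdout.
output = IO.popen([RbConfig.ruby, "-e", script], &:read)
puts output
# body
# runs first
# runs last
```

The body prints first, then the END blocks fire in LIFO order as the process exits.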
It's a pretty short list, eh? Obviously "rescue" is the biggest and trickiest item here. Without exception handling, it's hard to say the compiler is near completion. The complications I mentioned involve the ability to embed rescue processing into arbitrary expressions. Here's a good example:
a = [1, 2, (begin; raise; rescue; 3; end)]
When this code is compiled, it turns into a local variable assignment. The value assigned is a literal array construction with three elements: a Fixnum 1, a Fixnum 2, and a rescued block of code. The typical way to construct the array then is to follow these steps:
  1. Construct an array of the appropriate size
  2. Dup the array reference
  3. Push a constant integer zero
  4. Push Fixnum 1
  5. Insert Fixnum 1 at index zero in the array. This consumes the dup'ed array, the index, and the Fixnum 1.
  6. Dup the array reference again
  7. Push a constant integer one
  8. Push Fixnum 2
  9. Insert Fixnum 2
  10. Dup the array reference again
  11. Push a constant integer two
  12. Now it gets complicated; we must recurse in the compiler to handle the rescue block
  13. The rescue block is compiled and a "raise" is triggered in the code
  14. The exception raised is handled, resulting in the whole rescue leaving a Fixnum 3 on the stack
  15. Insert the Fixnum 3
  16. Construct a RubyArray object with the remaining object array
Now that seems simple enough. However, there's a sneaky complication at steps 13 and 14: catching an exception clears the operand stack, so the originally created array, its duplicated reference, and the integer two all disappear as a result. The value "returned" from the rescue section therefore has nowhere to go.

We will likely have to solve this complication in one of two ways:
  • We could save off the stack when entering code that might trigger exception handling
  • We could put exception-handling logic in a separate method and invoke it in-place, thereby protecting our executing stack from clearage.
It remains to be seen which mechanism will work out to be simplest to compile and most performant.
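To make the second approach concrete, here's a hand-desugared sketch in Ruby (the helper method name is hypothetical): the rescue section is hoisted into its own method, so catching the exception can only disturb that method's operand stack, never the caller's in-progress array construction.

```ruby
# Hypothetical compiler-generated helper: the begin/rescue section
# from "a = [1, 2, (begin; raise; rescue; 3; end)]" as its own method.
# An exception handler here clears only this method's stack.
def rescue_section_0
  begin
    raise "boom"   # the raise from the original expression
  rescue           # bare rescue catches StandardError
    3              # the rescued value becomes the return value
  end
end

# The caller's array construction proceeds normally; the rescued
# value arrives as an ordinary method return.
a = [1, 2, rescue_section_0]
# a == [1, 2, 3]
```

The method-call boundary is what protects the half-built array: the return value lands back on the caller's stack exactly where the literal 3 would have.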

A Nice Performance Milestone

And on the topic of performance, the recent compiler work has allowed us to reach a new milestone: we now exceed Ruby 1.8.6's performance on M. Edward (Ed) Borasky's MatrixBenchmark.

Some months back, after the Mountain West RubyConf in Salt Lake City, Ed posted an interesting blog entry where he professed a lot of confidence in JRuby's future. We emailed a bit offline, and he pointed me to this matrix benchmark he'd been using to measure the relative performance of Ruby 1.8.6 and Ruby 1.9 (YARV). I told him I'd give it a try.

Originally, we were perhaps 50% to 100% slower than Ruby 1.8.6. This was back when hardly anything was compiling, and there had been few serious efforts to optimize the JRuby runtime. Performance slowly crept up as time went on. But as recently as a week ago, JRuby performance was still roughly 20-25% slower than 1.8.6.

So last week, I dug into it a bit more. I turned on JRuby's JIT logging (-J-Djruby.jit.logging=true) and verbose logging (-J-Djruby.jit.logging.verbose=true) to log compiling and non-compiling methods, respectively. As it turned out, the "inverse_from" method in matrix.rb was not yet compiling...and it was where the bulk of MatrixBenchmark's work was happening.

The final sticking point in the compiler for this method was "operator element assignment" syntax, or basically anything that looks like a[0] += 5. It's a little involved to compile; you have to retrieve the element, calculate the value, call the operator method, and reassign all in one operation. For the ||= or &&= versions, you have to perform a boolean check against the element to see if you should proceed to the assignment. A good bit of compiler code, but it had to be done.
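The expansion described above can be sketched in plain Ruby. This is a minimal desugaring (variable names are my own) of what operator element assignment amounts to:

```ruby
# a[0] += 5 expands to roughly: fetch, operate, store.
a = [1, 2, 3]

tmp = a[0]       # call [] to retrieve the element
a[0] = tmp + 5   # call the operator method (+), then []= to reassign
# a == [6, 2, 3]

# The ||= form first tests the fetched element and only proceeds
# to the assignment when it is nil or false:
b = [nil, 2]
b[0] ||= 99      # fetched nil -> assignment fires
b[1] ||= 99      # fetched 2   -> element left as-is
# b == [99, 2]
```

Each of those steps (the [] call, the operator call, the boolean check, the []= call) is a separate chunk of bytecode the compiler has to emit and sequence correctly.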

So then, with "OpElementAsgn" compiling, it was time to re-run the numbers. And finally, finally, we were comfortably exceeding Ruby 1.8.6 performance:
Ruby 1.8.6:
Hilbert matrix of dimension 128 times its inverse = identity? true
586.110000 5.710000 591.820000 (781.251569)

JRuby trunk, Java 6 server, ObjectSpace disabled:
Hilbert matrix of dimension 128 times its inverse = identity? true
372.950000 0.000000 372.950000 (372.950000)
Or should I say vastly exceeding? By my calculation this is an easy 2x performance increase, and perhaps a 70% improvement just by getting this one extra method to compile.

On Beyond Zebra

I believe we're pretty well on-target to have the compiler completed by RubyConf in November. I'm about to embark on a refactoring adventure to prepare for the stack-juggling I'll have to do to support rescue blocks. That will mean minimal progress on adding to the compiler until the end of the month, but ideally the refactoring will make it easy to get rescue compilation complete. The others are just a matter of spending some time.

Once the JRuby compiler is complete, we will start testing in earnest against a fully pre-compiled Ruby stdlib. Along with that, we'll wire in support for pre-compiling RubyGems as they install and pre-compiling Ruby scripts as they are executed and loaded. Much of this works already in prototype form, but it waits for the completion of the compiler to go into general use.

I also have plans for a "static" compiler for JRuby that enables compiling Ruby classes into normal, instantiable, callable, static Java classes. This would bring us on par with other compiled languages on the JVM, and allow you to directly instantiate and invoke JRuby/Ruby objects from within your Java code.

Beyond all this work, Tom and I have been discussing a whole raft of performance improvements we could make to the underlying JRuby runtime. There's a lot more performance to be had, and it's just around the corner.

Exciting times, friends. Exciting times.


Anonymous said...

Nice :)
Having a static compiler is a great feature !

Vladimir Sizikov said...


Charles, is there anything that somebody like me (with Java and Ruby experience, but without much knowledge of JRuby internals) could help out with? I have some spare time and would like to spend it better than just watching TV. :)

Any pointers or links would be appreciated!

Charles Oliver Nutter said...

Vladimir: For someone in your position, there's probably a perfect option: help us profile memory and performance and test multi-threaded operation. We have been working hard to improve perf and memory usage, and to make multi-threaded applications more stable, but there's only limited resources available to test and profile. Help from the community is needed...if people can identify areas where performance, memory, or threading need work, we can take it from there.

Vladimir Sizikov said...

Charles: Wow, profilers and memory leaks are the third soft spot for me (besides Java and Ruby), so all-in-all, very interesting suggestion!

Do you have any specific tests/test-cases to try out first? Any particular area that requires more attention?

I definitely could start with my own, home-grown scripts, but maybe there is something out there (some open performance benchmarks or tests) that is already used by you and the community?

P.S. As for the JRuby version, I'm just using JRuby hand-compiled from the trunk, right?

Anjan said...

Hi Charles,

static compiler : this will enable better unit testing integration with existing java projects. Cool.

Thank you guys for all the hard work.


Anonymous said...

I'd focus on Rails performance. Probably the most important benchmark at all. It would be great if JRuby 1.1 comes close to MRI performance in this regard.

Something I noticed when looking at Rails performance with a profiler:
RubyModule.includeModule() -> getRuntime().getCacheMap().clear() consumed about 10% of the complete request (including Jetty/Goldspike)

This was a very simple page, though, and the profiling was nothing scientific, just a quick shot (the stack traces are impressive, by the way).

Charles Oliver Nutter said...

Vladimir: No specific test that would be good for profiling, but running Rails would be a great real-world test. And yes, run JRuby from trunk; it's quite stable and easy to build.