Monday, April 16, 2007

Paving the Road to JRuby 1.0: Performance

Since it looks like Antonio Cangiano is going to delay the next running of the bulls until Ubuntu 7.04 is released, I figured the next JRuby 1.0 update could be about performance.

Performance is such a tricky area for Ruby. Folks outside the Ruby community happily malign its performance, fueled by both FUD and by some truths. Folks within the Ruby community either aren't affected by Ruby's performance (it's "fast enough") or they simply don't care (it's slow, but I still love it too much to leave). A small part of the Ruby community takes what is in my opinion a rather anti-Ruby stance: "write it in C" as a targeted solution for identified bottlenecks. I suppose reality lies somewhere inbetween all these views, with Ruby's performance certainly not being stellar in the general case, but reasonable and sometimes surprisingly good for specific cases.

Ruby 1.9 has raised the promise of a new bytecode-based interpreter engine--a Ruby "virtual machine" by some reckonings--with the goal of improving performance foremost on the minds of Ruby's developers. And again the performance question is rather complicated. Antonio's recent shootout, running only Ruby 1.9's chosen benchmarks against all other implementations, shows it doing extremely well. It comes out many times faster than Ruby 1.8 in almost every test, and no other implementation even comes close. The truth however doesn't change very much; for non-synthetic benchmarks (like running Rails) the situation only improves by about 15% for some tests, and for many other tests performance actually degrades. Of course Ruby 1.9 is still under heavy development, and many more improvements are ahead, but the wide range of results demonstrates again that benchmarking must be taken with a grain of salt.

Et tu, JRuby?

So then there's us and JRuby. JRuby's professed goal has never been to be a better Ruby; at best, we're trying to build the best Ruby possible on top of what we believe is the best VM in existence. And toward that end we've put most of our time into compatibility and correctness above all else, aiming for the goal of complete Ruby language compatibility and near-complete builtin-class compatibility, hoping for JRuby to someday be treated as "just another Ruby implementation" on which people can run their apps and design their clever frameworks and libraries. But in the past year, it's become apparent that we could actually exceed Ruby's performance for specific cases in the near term, and for general cases over time. So the target market for JRuby seems to be changing from "Ruby users that must use Java VMs, libraries, and servers" to "Ruby users that want a better-performing, more scalable implementation". And our noble quest for near-complete compatibility gets muddled with all these fiddly performance details.

But that's life, right? The goals you set out for yourself and your projects rarely align perfectly with the goals others set out for you. The trick is achieving a balance between what you want to do with your life and what others (like your community members or your employers) want you to do. Perhaps the successful developer is the one who can derive pleasure from both tasks.

On the road to JRuby 1.0, we've done our best to balance compatibility and performance. We are now nearing the end of the compatibility road, with Ruby language features nearly 100% and builtin classes almost as complete as we can make them on the JVM. The real reason for a JRuby 1.0 now is that we believe we're finally approaching "Ruby compatibility" for some high measure of compatibility, such that the vast majority of platform-agnostic Ruby code should run successfully on JRuby. And that's certainly no small feat, given that just a year ago we celebrated a mostly-broken cobbled-together Rails 1.1 app slowly handling CRUD operations. Today, people are deploying JRuby on Rails apps in production, and the game has only gotten more interesting.

So then, performance. You're all wondering what the answer is to this performance thing, aren't you? Is JRuby going to blow away all competition, including the nascent Ruby 1.9 and mid-term projects like XRuby, Rubinius, and Ruby.NET? It's certainly possible, but it's not our goal. Is JRuby going to be faster than Ruby 1.8 when 1.0 is released? For specific cases, I'd say yes...there's plenty of areas we already perform better than Ruby 1.8. For the general cases, it's hard to say. We perform well serving up Rails requests today, but only about 50-70% of Ruby 1.8's performance. And though we know where most of the bottlenecks lie, we're a little resource limited trying to fix them. Do we believe that we'll be faster than Ruby 1.8 in all general cases in the near future? Yes, we strongly believe that will happen.

Now of course I could ramble on and on about performance and put you all to sleep, but actual numbers will probably keep your interest better than my droning.

The Test

Like the shootout, I'm just running the Ruby 1.9 benchmarks here. We have not done any targeted optimization for these tests; there's no Fixnum magic or anything like that under the covers. What we have done is implement multiple general-purpose optimizations to speed method and block invocation, object creation, and interpretation. We've also spent a little more time getting these tests to compile successfully, but of course any work done on the compiler is generally applicable as well.

These results are all based on JRuby trunk code, revision 3480. I'm running Java 6 on a MacBook Pro 2.16GHz Core Duo, and all code was compiled to target Java 6.

For the first set of results the JRuby command executed basically amounts to the following:

JAVA_OPTS=-Xverify:none jruby SERVER -O [script.rb]
  • JAVA_OPTS=-Xverify:none specifies not to verify classes on startup; this is a large part of the speed hit Java applications have when starting. We turn it off here to remove a little of that overhead from the benchmarks, since most of them are very short runs to begin with.
  • SERVER specifies that JRuby should use the "server" VM, which takes a bit longer to optimize a bit more heavily when JITting Java code into native instructions. JRuby generally performs best under the server VM, though using it impacts startup time.
  • -O disables ObjectSpace in JRuby. This may seem like cheating, but the truth is that when ObjectSpace is enabled we pay double or triple the object creation cost in JRuby since we have to track all objects separately. Ruby's ObjectSpace is essentially zero-cost...it's just a window into the memory manager. Since we don't have a low or zero-cost way to implement ObjectSpace, it falls into what I categorize as "optional incompatibility". If you don't need it, turn it off and JRuby performance will improve. You can call it cheating if you like...the truth is that practically no code actually depends on ObjectSpace.
And then there's the standard disclaimer for any Java application: these times include startup, about 1.0 to 1.3 seconds. I know you all are picky about how benchmarks are run, and you love to include startup time even though the vast majority of the world's work is not done in the first few seconds of execution, but if you'll forgive the disabling of ObjectSpace I'm willing to meet you half way. Startup time is included.
TEST                    MRI     JRuby
--------------------------------------
app_answer 0.584 2.239
app_factorial ERROR 4.248
app_fib 7.126 10.549
app_mandelbrot 2.346 10.300
app_raise 2.587 4.441
app_strconcat 1.829 2.141
app_tak 9.711 13.345
app_tarai 7.529 11.050
loop_times 5.475 6.903
loop_whileloop 9.982 11.797
loop_whileloop2 2.009 3.292
so_ackermann 13.726 26.132
so_array 7.257 8.801
so_concatenate 2.063 3.546
so_count_words 0.507 4.491
so_exception 4.342 8.346
so_lists 1.238 2.744
so_matrix 2.258 4.241
so_nested_loop 5.609 7.898
so_object 7.050 6.496
so_random 2.139 4.643
so_sieve 0.740 2.240
vm1_block 23.604 27.405
vm1_const 16.774 20.650
vm1_ensure 16.546 15.800
vm1_length 21.210 21.899
vm1_rescue 13.170 16.197
vm1_simplereturn 21.091 31.376
vm1_swap 25.114 17.949
vm2_array 6.049 5.690
vm2_method 13.528 17.759
vm2_poly_method 16.886 24.956
vm2_poly_method_ov 4.576 6.972
vm2_proc 7.060 7.797
vm2_regexp 4.421 9.353
vm2_send 4.332 8.198
vm2_super 4.992 7.944
vm2_unif1 3.838 6.095
vm2_zsuper 5.409 8.452
vm3_thread_create_join 0.019 1.592
Some of these are rather surprising results. This is JRuby running in plain old interpreted mode, with no compilation involved. The majority of the tests still have JRuby slower than Ruby 1.8, but the gap has narrowed an incredible amount since last year. In only a few tests are we more than twice as slow as MRI, and on a couple we're almost twice as fast. If you'll imagine startup time removed from these numbers, and believe me when I say Java takes far more than 30-60 seconds to rev up to full speed, then the situation looks even better. What's more, we've got a good several weeks before the first 1.0ish release is scheduled (something betaish or RCish that proudly proclaims it's "done") and a bunch of great committers and community members eyeing performance metrics.

Ok, you may be asking "what about JRuby's compiler?" Yes, there is a compiler in the works. It's primarily been my job to build out the compiler, though Ola has jumped in a few times to offer his excellent help. And progress has been slow but steady. You have to remember that in all the world, there's no known 100% complete Ruby compiler for a general-purpose VM. There's Ruby 1.9, but its bytecodes have been custom designed around Ruby. There's XRuby and Ruby.NET, but it's still unclear how complete they really are. So this is an open area of research and development. But the results are looking great so far.

For this test, both the ahead-of-time (AOT) compiler and the just-in-time (JIT) compiler modes are activated. This forces the target script to be compiled before execution and also compiles any methods hit heavily once execution gets going. The command amounts to the following:
JAVA_OPTS="-Djruby.jit.enabled=true -Xverify:none" /
jruby SERVER -O -C [script.rb]
  • -Djruby.jit.enabled=true enables the JIT compiler. The default threshold at which a (compilable) method gets compiled is 50 invocations.
  • -C tells JRuby to compile the target script before executing it. If the script can't be compiled, JRuby bombs out with an error.
The compiler can't handle all the Ruby 1.9 tests yet. Specifically, it doesn't handle multiple assignment (e.g. vm1_swap), exception handling (anything involving rescue or ensure), or full class definitions. But it compiles the majority of the tests.
TEST                    MRI     JRuby
--------------------------------------
app_factorial 0.029 3.459
app_fib 7.094 5.093
app_mandelbrot 2.340 9.011
app_strconcat 1.827 2.391
app_tak 9.714 5.394
app_tarai 7.515 4.642
loop_times 5.428 2.942
loop_whileloop 10.016 6.027
loop_whileloop2 2.012 2.191
so_ackermann 13.610 11.254
so_concatenate 2.043 2.396
so_lists 1.250 2.141
so_matrix 2.256 2.394
so_nested_loop 5.614 4.167
so_random 2.158 3.291
so_sieve 0.741 1.798
vm1_block 23.392 12.397
vm1_const 16.980 8.908
vm1_length 21.094 10.901
vm1_simplereturn 21.252 9.345
vm2_array 6.025 3.041
vm2_method 13.049 7.794
vm2_regexp 4.468 7.647
vm2_unif1 3.855 3.293
vm3_thread_create_join 0.017 1.389
Ahh, now things look a bit different! In almost every case, JRuby performs better than Ruby 1.8. In the long running cases, the difference is even more obvious. Short runs still put Ruby 1.8 ahead, but I'm totally ok with that. Java, and by extension JRuby, has never had stellar short-run and startup performance. But we don't really have to if the heavy, long-running apps people actually use end up running faster.

Note also that this is the first real compiler we've had; it's not doing any optimization like Ruby 1.9's and Rhino's compilers, and it's almost certainly far from being efficient. I'm no compiler expert, and this is my first real attempt. These numbers already looking so good demonstrates that there's a grand adventure ahead of us: Ruby really can be made to perform well on the JVM. It just requires a little confidence and a little more effort.

That pretty much wraps up this installment. The bottom line: As we approach JRuby 1.0, our performance is better than it's ever been--faster than Ruby 1.8 for many specific cases, with stellar across-the-board performance right around the corner. And with other implementations like Ruby 1.9, XRuby, Rubinius, and Ruby.NET rapidly coming of age, Ruby's future is looking extremely solid.

script type="text/ruby"

Remember way back when the IRB applet was first thrust upon the world, and I promised that you truly could now use Ruby in the browser? At the time, it was still a little cumbersome, and generally you could only use Ruby by passing it directly into the applet.

Well Dion Almaer has taken it to the next awesome step:

<script type="text/ruby">

Check out his blog entry on how to enable true Ruby scripting on the client side, as well as the link to his demo page (still loads up the applet in the background, but that's a minor implementation detail).

Awesome.