Sunday, June 01, 2008

Maglev

Of course anyone who reads my blog expected I'd have something to say about Maglev once it was made public. I've previously performed what I thought was a fair analysis of the various Ruby implementations, and Maglev was mostly a sidebar. With their coming out at RailsConf, they're now fair game for some level of analysis.

Avi Bryant and Bob Walker talked about Maglev, a new Ruby VM based on Gemstone's Smalltalk VM, at RailsConf this weekend. And there's been an explosion of coverage about it.

First off, they demonstrated its distributed object database automatically synchronizing globally-reachable state across multiple VMs. It's an amazing new idea that the world has never really seen...

except that it isn't. This is based on existing OODB technology that Gemstone and others have been promoting for better than a decade. It's cool stuff, no doubt, but it's been available in Gemstone's Smalltalk product and in their Java product for years, and hasn't seen widespread adoption. Maybe it's on the rise, I really don't know. It's certainly cool, but it's certainly not new.

The duo eventually moved on to show off some performance numbers. And please pardon me if I don't have these numbers exactly right. They showed our old friend fib running something like 15x faster. Method dispatch something like 30x faster. While loops 100x faster. Amazing results.

Except that these are results reported entirely in a vacuum. Whether this is fib following the "rules" of Ruby is entirely an open question. Whether this is method dispatch adhering to Ruby's call logic is entirely an open question. Whether this is a while loop using all method calls for its condition and increment steps is an open question. Because the Maglev guys haven't started running Ruby tests yet. Is it Ruby?
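To make the while-loop point concrete: in Ruby, the `i < n` condition and the `i += 1` increment are ordinary method calls on whatever object `i` happens to be. Here's a minimal sketch using a hypothetical `LoggedCounter` class (purely an illustration, not anything shown in the Maglev talk) that records each dispatch as the loop runs:

```ruby
# Hypothetical illustration: a counter object that logs every operator call,
# showing that a while loop's condition and increment are method dispatches.
class LoggedCounter
  attr_reader :value, :calls

  def initialize(value, calls = [])
    @value = value
    @calls = calls
  end

  # The loop condition `i < limit` lands here.
  def <(limit)
    @calls << :<
    @value < limit
  end

  # `i += 1` expands to `i = i + 1`, which lands here.
  def +(step)
    @calls << :+
    LoggedCounter.new(@value + step, @calls)
  end
end

# Run a plain while loop over the logging counter and return the final object.
def count_to(limit)
  i = LoggedCounter.new(0)
  while i < limit
    i += 1
  end
  i
end

counter = count_to(3)
puts counter.value          # 3
puts counter.calls.inspect  # [:<, :+, :<, :+, :<, :+, :<]
```

A "100x faster" while loop only means something if the implementation still honors those dispatches when the operands are plain integers whose methods could be redefined.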

I don't want to come off as too defensive here, and I don't want to appear as though I'm taking shots at another implementation. I've certainly launched my share of controversial commentary at Rubinius and IronRuby over the past few months, and while some of it may perhaps have slipped over the edge of polite commentary, I always thought I was being at least honest.

But there's an entirely new situation with Maglev. Maglev has begun to publish glowing performance numbers well in advance of actually running anything at all. They haven't started running the RubySpecs and have no compatibility story today. You can't actually get Maglev yet and run anything on it. It's worse than Vaporware, it's Presentationware. Go to Gemstone's site and download Maglev (you can't). Pull the source (you can't). Build it yourself and investigate what it does (you can't). You start to understand what I mean. And this is what the "Ruby media" is calling the most disruptive new Ruby technology. Dudes, come on. Were you born yesterday?

It's time for a confession. I've been too hard on IronRuby and Rubinius. Both teams are working really hard on their respective implementations, and both teams have really tried to stay true to Ruby ideals in everything they do. Guess what...IronRuby runs Rails. Rubinius runs Rails. And if they're not production ready now, they will be soon. And that's a good thing for Ruby. Sure, I still believe both teams may have made unreasonable claims about what they'd be able to accomplish in a given period of time, but we've all made those claims. If they haven't delivered on all milestones, they've delivered on most of the important ones. And it's those milestones I think deserve some credit now.

My sin is pride. I'm proud of what we've accomplished with JRuby. And when new implementers come along saying they're going to do it in half the time, I feel like it belittles the effort we've put in. IronRuby has done it. Rubinius has done it. And while I've occasionally lashed out at them as a result, I've always been right there trying to help them...answering questions, contributing specs, suggesting strategies and even committing code. In the end it's the cockiness...the attitude...the belief that "I know better than you do" that irritates me, and I'm too sensitive to it. Color me human. But it's time for me and others to understand another side of IronRuby and Rubinius in light of this new contender.

The Rubinius and IronRuby teams have always treated compatibility as the primary goal. If you can't run Ruby apps, you're not Ruby, right? And so every step of the way, as they published performance results AND compatibility metrics, they've always been honest about the future.

IronRuby has managed to get great performance on several benchmarks by leveraging the DLR and the excellent language implementation folks on the DLR and IronPython teams at Microsoft. So if nothing else, they've proven many of the "fast-bootstrapping" claims they've made about the DLR. And they've always been balanced in reporting results...John Lam has shown a couple slow benchmarks along with fast benchmarks at every talk, not to mention showing spec results with pass/fail rates clearly spelled out. That honesty has not gone unnoticed, and it shows a realism and humility that will ensure IronRuby's future; a realism that will ensure Ruby users who really want or need a .NET implementation will receive an excellent one.

Rubinius has taken an entirely new approach to implementing Ruby by attempting to write as much as possible in Ruby itself. Maybe they have a lot of C/C++ code right now, but it's not that big a deal...and I was perhaps too pedantic to focus on this ratio in previous posts. What's important is that Rubinius has always tried to be an entirely open, community-driven project. Their successes and failures are immediately accessible to anyone who wants to pull the source; and anyone who wants to pull the source can probably become a Rubinius contributor within a short amount of time. They've had performance ups and downs, but again they've been honest about both the good and the bad. And like IronRuby, if they haven't trumpeted the bad side of things, it's because they're already proving that the Ruby-in-Ruby approach absolutely can work. The bad side will lessen over time until it completely disappears.

Then there's Maglev. As with the other impls, I'm excited that there's a new possibility for Ruby to succeed. A high performance, "scalable" Ruby implementation is certainly what this community needs. But unlike most of the other implementations, it seems like Maglev is pushing performance numbers without compatibility metrics; marketing before reality. Am I far off here?

Let's take a step back. Maglev will probably be amazing. It will probably be fast, maybe on some order approaching the numbers they've reported. Maybe this will happen some day along with support for existing Ruby code. And hell, maybe I'll use it too...I want to be able to write applications in Ruby and have insane performance so I can just write code the way I want to write code. So do you.

But we're talking theory here. So let's do an experiment using JRuby briefly.

Maglev published fib numbers as being around 15x MRI performance. That's very impressive. So let's check MRI perf on my machine (keeping in mind, as I've stated previously, that fib is far from indicative of any real-world performance):

Ruby 1.8.6, fib(34), best of 10: 6.56s
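The post doesn't include the benchmark script itself, so the following is only a sketch of what a "best of N" fib measurement might look like. The `fib` definition is the usual doubly-recursive one, and `best_of` is a hypothetical helper name; neither is taken from the actual harness used for these numbers.

```ruby
# Naive doubly-recursive Fibonacci, the usual shape of this microbenchmark.
# Assumption: the real benchmark used something equivalent.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Hypothetical helper: run the block `runs` times and return the fastest
# wall-clock time in seconds, measured with a monotonic clock.
def best_of(runs)
  Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end.min
end
```

Something like `printf("fib(34), best of 10: %.3fs\n", best_of(10) { fib(34) })` would then produce a line in the same shape as the results quoted here. Taking the minimum rather than the mean filters out runs disturbed by GC or other processes, which matters for a JIT-compiled implementation that needs a few iterations to warm up.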

Now let's try stock JRuby, with full compatibility:

JRuby 1.1.2, fib(34), best of 10: 1.735s (3.8x faster)

Not bad, but certainly not up to Maglev speeds, right? Well...perhaps. JRuby, like IronRuby and Rubinius, has always focused first on compatibility. This means we're bending over backwards to make normal Ruby code run. So in many cases, we're doing more work than we need to, because compatibility has always been the primary goal. IronRuby and Rubinius will report the same process. Make it work, then make it fast. And both IronRuby and Rubinius are now starting to run Rails, so I think we've proven at least three times that this is the right approach.

But let's say we could tweak JRuby to run with some "future" optimizations, optimizations that might not be quite "Ruby" but which would still successfully run these benchmarks.

First, we'll turn off first-class frame object allocation/initialization, since it's not needed here:

JRuby 1.1.2, fib(34), no frames: 1.273s (5.15x faster than MRI)

Now we'll turn off thread checkpointing needed to implement operations like Thread#kill and Thread#raise, as well as turning off artificial line-position updates:

JRuby 1.1.2, fib(34), -frames, -checkpoints, -positions: 1.25s (5.24x faster)

Now we'll add in some fast integer operations like Ruby 1.9 includes, where Fixnum#+, -, etc are specially-handled by the compiler. And we'll simultaneously omit some last framing overhead that's still around to handle backtrace information:

JRuby 1.1.2, fib(34), "fastest" mode: 0.984s (6.67x faster)
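For anyone who wants to check the arithmetic, the speedup figures are just the MRI time divided by each JRuby time. A small sketch that recomputes them from the timings listed in this post (the constant and method names are only illustrative):

```ruby
# Timings in seconds, copied from the results quoted in this post.
MRI_SECONDS = 6.56

JRUBY_SECONDS = {
  "stock"                             => 1.735,
  "no frames"                         => 1.273,
  "-frames, -checkpoints, -positions" => 1.25,
  '"fastest" mode'                    => 0.984,
}

# Speedup relative to a baseline: how many times faster `measured` is.
def speedup(baseline, measured)
  (baseline / measured).round(2)
end

JRUBY_SECONDS.each do |mode, seconds|
  puts format("%-36s %.3fs  %.2fx", mode, seconds, speedup(MRI_SECONDS, seconds))
end
```

The recomputed ratios agree with the quoted ones to within rounding. Note that the same data read relative to stock JRuby rather than MRI gives a more modest figure: 1.735s down to 0.984s is roughly 1.76x.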

So just by tweaking a few things we've gained nearly another 3x over MRI, going from 3.8x to 6.67x. Are we having fun yet? Should we extrapolate to optimizations X, Y, Z that bring JRuby performance another half-dozen times faster than MRI? If we can run the benchmarks, it shouldn't matter that we can't run Ruby code, right?

The truth is that not all of these optimizations are kosher right now. Removing the ability to override Fixnum#+ certainly makes it easier to optimize addition, but it's not in the spirit of Ruby. Removing frames may be legal in some cases (like this one) but it's not legal in all cases. And of course I've blogged about how Thread#kill and Thread#raise are broken, but we have to support them anyway. On and on we can go through lots of optimizations you might make in the first 100 days of your implementation, only to back out later when you realize you're actually breaking features people depend on.
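Here's a minimal sketch of why integer addition can't simply be hard-wired. It targets a current Ruby, where the Fixnum of the 1.8 era has been folded into Integer, and `count_plus_calls` is a hypothetical helper written for this illustration. It temporarily wraps `Integer#+`, runs a block, and restores the original afterwards:

```ruby
# Same naive fib as the benchmark; each non-leaf call performs one addition.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Hypothetical helper: temporarily wrap Integer#+ so every dispatch is counted
# while the block runs, then put the original method back.
# Returns [block_result, number_of_plus_calls].
def count_plus_calls
  calls = 0
  Integer.class_eval do
    alias_method :__original_plus, :+
    define_method(:+) do |other|
      calls = calls.succ        # succ avoids re-entering the wrapped +
      __original_plus(other)
    end
  end
  result = yield
  [result, calls]
ensure
  # Always restore the built-in definition, even if the block raised.
  Integer.class_eval do
    alias_method :+, :__original_plus
    remove_method :__original_plus
  end
end

result, calls = count_plus_calls { fib(10) }
puts "fib(10) = #{result}, Integer#+ dispatched #{calls} times"
```

Legal Ruby code can do this at any moment, so an implementation that compiles `a + b` straight down to a machine add has to either detect the redefinition and fall back, or give up on running programs that rely on it.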

This all adds up to a very different picture of Ruby implementation. Rather than wishing for a rose-colored world where anyone with a new VM can swoop in and post magic performance numbers, perhaps we as Ruby community members should be focusing on whether this is going to help us actually run today's apps any better; whether these results are repeatable in ways that actually help us get shit done. Perhaps we should be focusing on the compatibility story over bleeding-edge early performance numbers; focusing on tangible steps toward the future rather than the "furs and gold rings" that David warned about in his keynote. Maybe we should think more about the effect that broadcasting vaporware performance numbers will have on the community, rather than rushing to be the first to republish the latest numbers on the latest slides. Maybe it's worth taking all this microbenchmark nonsense with a grain of salt and trying it out ourselves (if, of course, that's even possible) before serving as the mouthpiece for others' commercial ventures.

Am I wrong? Am I being unfair? Am I taking an unreasonable shot at Maglev?