Headius

Wednesday, February 21, 2007

Tech Days Talk a Success

I just completed my talk at Tech Days Hyderabad, and I think it was a resounding success. If I heard right, the total head count for my "JRuby Essentials" talk was over 1200 people, by far the largest crowd I've presented for. There was a horde of questioners immediately after the talk, who followed me into the hall to keep asking. I gave out my entire in-pocket batch of business cards, and even received a round of applause after the JRuby on Rails demo. All this even after my talk was delayed and cut to 40 minutes because the earlier talks ran too long. It was a sprint, but I think it went very well.

I will be posting the slides, but they're basically the same content many of you have probably seen the past couple months. For those who haven't had the pleasure, we'll start posting slides more regularly, and I'll try to blog a few example walkthroughs for those of you who want to duplicate them for your own talks. Feel free to steal it all and spread the word.

Tuesday, February 20, 2007

JRuby World Tour: Hyderabad

Success! It is now 3:00AM in Hyderabad India, and I have checked into the Novotel in HiTec City. The flight was uneventful; I got my demos basically working and then slept the majority of the time. The immigration and customs at the airport were perhaps the least threatening of any I've seen. Immigration basically just looked for my visa, took my arrival card, and stamped my passport. Customs was little more than a big white X-ray machine with "CUSTOMS" printed on a white piece of paper taped to the front.

The smooth arrival was just what the doctor ordered. The reason I'm on this round-the-globe trip is because my original tickets to India were issued to the wrong name: Charles O'Nutter instead of Charles (O) Nutter. I never noticed the slip-up on the itinerary (nor would I ever have recognized it as a slip-up, since the output was pretty mainframe-ish), and only upon arriving at the airport in Minneapolis was I told I could not check in. Twenty-four hours and a lot of scrambling later, I had new tickets issued on the only flights available, hence my return trip heading east via Bangkok and Tokyo. But aside from that initial snafu, the trip has been pretty routine. Saturday will be interesting...my ticket to Bangalore has to be purchased in-person, so there's great potential for more headaches. But I can take it in stride, and I'm sure things will work out well in the end.

Hyderabad is about what I expected, and I'm looking forward to getting into the city one of these three days before I leave for Bangalore. It feels very similar to Beijing, with old meeting new and construction everywhere. The city looks very appealing; street-level shops line almost every sidewalk, there's green plant life everywhere, and everything has sort of a dusty, dingy, comfortable feel. It's not dirty, don't get me wrong...just dusty, like any city kicking earth into the air with a thousand construction projects might be. The auto-rickshaws (three-wheeled rickshaw-looking things with small engines) are pervasive; I can imagine that during the day they swarm and buzz like bees. They drive on the British side of the road, which isn't much of a surprise, really.

It's really unfortunate when traveling to more exotic places like this that I have to check into a tidy, pristine western-style hotel. I'd rather stay in the midst of the city, where I can walk out the front door into daily Hyderabadian life. I doubt I'll be at the hotel for all three days of the conference...there's exploring to be done.

If you're located in Hyderabad or Bangalore and might be able to show me around a day this week (Thursday or Friday for Hyderabad) or next Sunday or Tuesday (Bangalore) I'd appreciate it. I'm not put off by crowds of people, even if many don't speak my language, but knowing the hot spots for shopping and eating makes exploring a bit more productive.

I give my presentation in just over 12 hours, so I'm going to try to settle in with a nice Kingfisher lager beer and relax. The conference schedule has things starting up in about 6 hours, so I should be fine for sleep and fairly well-delagged by then.

On the plane, I managed to duplicate the rails-integration WARfile-based deployment of a simple webapp, but I was unable to get an embedded Derby DB to work with it. I'm sure it's a matter of permissions, connection pooling, and such. I may or may not have time to get that working by tomorrow, but if I demo the basic app working with WEBrick and then WAR it up and show the non-DB stuff working in GlassFish, I think the point will have been made. I must also keep reminding myself that my talk is only 50 minutes now. I can fill 50 minutes without batting an eye. So we'll see if there's even time to get to GlassFish, since it eats up a good five minutes building and deploying the app from scratch.

JRuby World Tour: Amsterdam

I'm now relaxing at Schiphol Airport, where I've just plunked down my 10 euros for internet access. After a light breakfast, my current task is now to ensure demos are ready and working for TechDays. I discovered a bit ago that the Rails Integration project uses maven to build, and so I need internet access for that first install command (to pull down required dependencies). I had worried a bit about finding internet access, but this connection appears to be pretty good. I've managed to build the Rails Integration module, and hopefully I'll be able to get Rails-in-a-WARfile working in the next few hours.

If I discover anything interesting in the process, I'll post it. From what I've observed, the rails-integration guys and Ashish have had lots of success lately.

If you happen to be at Schiphol before I depart at 11:55, I'm in Lounge 2 near the brasserie. I'm the one with the MacBook Pro, fighting jetlag and overlooking the concourse.

The next stop will be Hyderabad. My 11:55 flight departs from gate F8.

Update: I managed to get Rails up and running nicely in a WAR file. Huzzah! I'll play with it a bit on the plane and get our usual Rails demo to feed directly into "now we'll make it a WAR file". Thanks much to the rails-integration guys for putting together a really slick piece of work. Gotta love this whole open-source thing.

Side note: jet lag is a weird feeling. Right now it's 9:49 in Amsterdam and around 14:19 in Hyderabad, so I have to pretend it's mid-afternoon. My flight will board in about two hours, and toward the end of the flight I'll have to pretend it's evening and start getting a little sleep. It's quite unfortunate that I'm arriving in Hyderabad at 1:20AM on the day I'm presenting, but that's how it goes!

Sunday, February 18, 2007

The JRuby WORLD TOUR 2007

Yes, you read that right! I'm announcing my intent to circumnavigate the globe in only NINE DAYS spreading JRuby and joy at every stop. The Earth shall be wrapped in JRuby goodness!

To celebrate this monumental occasion, I'm inviting all developers with an interest in Ruby or Java to join the JRuby mailing lists and volunteer your services. Eternal gratitude and temporary fame could be yours, for the small price of bug reports, bug fixes, or high-performance rewrites of core JRuby libraries (it's so easy!). Operators are standing by to receive your emails and direct you to the promised land!

Now, on to the tour!

First Stop: Amsterdam, Netherlands

The first leg of my journey takes me from my home in Minneapolis, Minnesota to beautiful Amsterdam. There I will enjoy a delicious airport breakfast followed by a five-hour tour of the terminal. I will be available for autograph signing from 7:00 to 11:30 by appointment only. The first annual JRuby New-Age Concert of Magic will follow from 11:30 to 11:35. Tickets are first-come, first-served; cash only, please! I will depart from Amsterdam at 11:55 via a chartered jet I'm generously sharing with three-hundred other passengers and a small KLM Dutch Airways flight crew, en route to exotic Hyderabad, India!

Second Stop: Hyderabad, India

The next part of the JRuby "World Domination Tour 2007" takes me to Sun's Tech Days event in Hyderabad. In addition to my JRuby session, Tech Days will host such scintillating topics as "iPod Giveaway" and "Java Jacket Giveaway". But on Wednesday the 21st at 3:35PM you too can learn why JRuby is the programming alchemist's "Developer's Stone", transmuting static lead into dynamic gold. The topics covered will be exactly like those in recent JRuby talks, except cooler, faster, and after 26 hours of non-stop travel and five hours of sleep. Prepare for a punchy, laugh-riotous affair! Will I be able to maintain 90WPM during my interactive demonstrations? Will I remember that "alias" takes two arguments *without* a comma and with the new name first? Join me for what's sure to be 50 minutes you'll remember the rest of your life!

Third Stop: Bangalore, India

From Hyderabad, I take a short one-hour flight to Bangalore, home to Sun Microsystems India and the third leg of the JRuby Globe-trotting Festival of Light! On Monday, February 26th, I will present JRuby to my tropical counterparts on the opposite side of the earth, featuring the exact same topics from Tech Days...in RANDOM ORDER. You never know what I'm going to do next.

I actually have all that Sunday free and Tuesday up until about 8:00 free to do some exploring. Event suggestions are welcome, and I challenge any of you to find food too spicy for me to eat! (impossible, I say!)

Fourth Stop: Bangkok, Thailand

The next step takes me from Bangalore to beautiful Thailand, jewel of Southeast Asia and home to one of my favorite cuisines. I will be available from 4:00AM to 5:30AM for a special, once-in-a-lifetime event I'm calling "JRuby Dawn at Bangkok Airport". And the great question on everyone's lips will be foremost on my mind:

Will the airport's Thai restaurants open before I board my next flight at 6:00? Stay Tuned!

Fifth Stop: Tokyo, Japan

Continuing the Asian leg of the tour, I'll spend 80 fun-filled minutes exploring the international terminal in Narita, only an hour's train ride from the Emperor's Palace! Naturally I'll be available for handshakes and baby-kissing, and hopefully the always-humorous photograph of "buying beer from a vending machine". This trip will serve as a preview for the main Tokyo event: Ruby Kaigi 2007 in June, where I'll finally present JRuby to the Land of the Rising Sun. Come 3:10PM it's time for "so long Japan", but I'll be back soon!

Sixth Stop: Minneapolis, Minnesota

And just 9 days after I departed, I'll be home in Minneapolis again, ready for my next thrilling adventure: The Greater Wisconsin Software Symposium (a No Fluff Just Stuff event) in Milwaukee from March 2-4. Join me for my two fabulous sessions "Bringing Ruby and Rails to the JVM" and "Become Super-Powerful with JRuby", putting the greatest dynamic language ever in the palm of your JVM.

Wednesday, February 14, 2007

Jython 2.2 Beta 1 Released!

The Phoenix Is Rising!

After being considered dead for many years, Jython is back in business with the beta 1 release of Jython 2.2. It's been teetering on the edge of 2.2 compatibility for a long time, but over the past several months the core team and several contributors have rounded off the edges, to the point that a beta release of the long-awaited 2.2 version is now available.

For all you Pythonistas, this should mean two things:

- You have a project for the day. Go get Jython 2.2, try it out with your Python 2.2-compatible apps and libraries, and report any issues you find.
- Start contributing your time, either helping with the Java coding, helping to debug Python apps and libraries, or helping with efforts to write C-based libraries in Python.

I will also remind you that a large percentage of JRuby's success is due entirely to its community. Can you really get up in the morning and look at yourself in the mirror without knowing you've helped Jython get back up to speed? Can you?

Tuesday, February 13, 2007

Rails Support Status Update

Things are moving along well...so well I've found time to reimplement String with byte[], work on closures in the compiler, and, well, those are posts for another day. Today I update you on the progress of supporting Rails in JRuby.

Rails is quite an interesting beast. I've learned more about Ruby looking through Rails code than from any other source...out of sheer necessity. They say the best way to learn a language is through immersion, right? Is debugging 50-deep stack traces on a questionable interpreter, digging for a reducible test case, immersive enough for you? Yeah, I thought so.

We actually hit Rails support hard right at the beginning of this month, or the end of January. The early results were pretty solid, so our efforts have been distracted onto other large JRuby issues not necessarily Rails-related (but all still critical for an eventual 1.0). We also made the bold move of jumping to Rails 1.2.x for all our testing, to show we can keep up with the Rails development process. And to show that we've made great progress, I give you the following results.

ActiveSupport

ActiveSupport is the base module for Rails. It monkey-patches a number of core classes, provides a multibyte String wrapper for UTF-8 encoded text, and handles most of the details of running and configuring an application. It "supports" the other libraries, and is the first crucial leg needed to run Rails. And so we must "support" it well.

Here are the current results of a full test run:

498 tests, 1809 assertions, 16 failures, 6 errors

That's already in the 95% range, so we're in darn good shape. But it turns out a number of these errors are caused by a glitch in our parser related to KCODE. So if we work around that known issue, the results improve to:

498 tests, 1845 assertions, 12 failures, 1 errors

So more like 97% passing once we fix the KCODE parser problem. The remaining issues are almost all related to time-formatting bugs, with a couple multibyte and exception-handling issues tossed in. I haven't attacked the formatting bugs because Printf code is bloody painful, the multibyte issues are waiting on the KCODE parser fix, and the others...well, they're boring.
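For anyone who wants to check my percentages, here's a quick sketch of the arithmetic. It assumes each failure or error corresponds to one distinct failing test, which is a simplification, and the pass_rate helper is just a name I made up for illustration.

```ruby
# Rough pass-rate arithmetic for a Test::Unit-style summary line.
# Assumption: each failure or error counts as one distinct failing test.
def pass_rate(summary)
  tests    = summary[/(\d+) tests/, 1].to_i
  failures = summary[/(\d+) failures/, 1].to_i
  errors   = summary[/(\d+) errors/, 1].to_i
  passed   = tests - failures - errors
  (passed * 100.0 / tests).round(1)
end

before = "498 tests, 1809 assertions, 16 failures, 6 errors"
after  = "498 tests, 1845 assertions, 12 failures, 1 errors"

puts pass_rate(before)  # => 95.6
puts pass_rate(after)   # => 97.4
```

The same helper works on any of the other summary lines in this post.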

ActionPack

ActionPack is the brains of a Rails app, housing the mechanisms for controllers, views, and any code to support them. So it's leg two of the crucial three-legged support necessary to run a Rails application.

I hit ActionPack hard in the past, and a bit this month. Ola took off with it and completed most of the remaining failures. Running with the same KCODE workaround, we have the following results today:

1157 tests, 4811 assertions, 7 failures, 14 errors

That's above 98% passing. The remaining failures include a number of dupes (a single failure that breaks a number of tests), some additional string-formatting failures, and a couple that run ok outside of Rake. So it's damn close to perfect.

ActiveRecord

You should all know ActiveRecord by now, right? It's Rails' DB layer, based on the ActiveRecord pattern. It is the third core leg of the Rails platform, though you can certainly have apps that don't use AR for database support.

ActiveRecord is an extensive piece of code and it's the only part of Rails that usually requires a native library to run properly (though there is a pure Ruby impl of its MySQL support that nobody uses). In order to support AR, a number of the JRuby community members have cooperated over the past 9 months to build ActiveRecord-JDBC, a gem-installable module that provides ActiveRecord DB support via JDBC. It's a great bit of hackery, and runs surprisingly well considering our less-than-beautiful Java integration performance (under repair).

To limit the scope of this month's Rails work, we're targeting MySQL, since it's the de-facto standard for Rails apps in most quarters. But most of the work we're doing will apply equally well to the other databases, since JDBC is generally very consistent. Tom has been spending lots of time on Derby, for example, to the point that our ActiveRecord-on-Derby failures are almost entirely limited to SQL features it doesn't support yet.

So then ActiveRecord test results, minus the KCODE workaround (since Tom ran these for me):

1012 tests, 3417 assertions, 41 failures, 35 errors

Here our results dip to around 92% passing, but it's really the extended features of AR that have failures here. We've come a long way on this; results on the months-outdated JRuby wiki show Rails 1.1.6's ActiveRecord only passing about 60% of that release's tests, so there's been a ton of improvement since November. And we generally understand how to fix the remaining failures, so it's only a matter of (short) time.

ActionMailer

Can you guess what ActionMailer does? ActionMailer provides support for composing, formatting, and sending email from a Rails app. And that's about it. It's not a huge library, but it's essential for many apps.

64 tests, 142 assertions, 9 failures, 6 errors

That's about 75% passing, with a grand total of six test scripts. We haven't focused on this much, other than Tom's initial KCODE work a few months back. The current failures are almost all mail-formatting or SMTP-wrangling issues. I don't expect them to be hard to repair.

ActionWebservice

Because ActionWebservice's tests require some database setup, we haven't tackled them yet. But I would lay even money that they'll be comparable to ActiveRecord at their worst. For now, they all just fail because MySQL isn't set up correctly for them.

96 tests, 0 assertions, 0 failures, 96 errors

Any community member that wants to dive into ActionWebservice or ActionMailer would fast become a JRuby Hero.

Railties

Railties is the final piece of the Rails puzzle, and it...well..."ties Rails" together. I show the results here mostly for completeness; many of the libraries in Railties we'll never support (fcgi, for example) and most JRuby-on-Rails deployments will use alternative mechanisms for hitting the web.

5 tests, 19 assertions, 2 failures, 0 errors

Official Rails Support?

Because things are looking pretty solid, we've been looking for an answer to this question: What does Rails Support in JRuby mean? Do we have to pass all test cases 100% to "officially" support Rails? That might never happen, since there's POSIX and external library stuff we won't ever handle (nor will we need to for JRuby on Rails apps). So then is it a certain percentage? 95%? 98%?

I think the truth is that we could really announce support for Rails now. Almost all the visible, outstanding issues with actually *running* Rails apps have been resolved, and most apps and scripts work fine. There's ongoing work to improve ActiveRecord-JDBC's support for other databases, but that's an endless quest. And of course there's more work needed to support Grizzly, Mongrel, and WAR-based deployment of JRuby on Rails, but those are peripheral to the official announcement. Even when we do make an official announcement, it will be for a pre-1.0 version of JRuby, since we know there's another few months left on 1.0 fixes and features.

So what do you think, dear reader? At what point would you feel safe saying "Let's have our non-JRuby hackers try using JRuby on Rails"? You probably would *be* safe right now, since even if you found issues we've got a busy community ready to help solve them. And we'll probably tiptoe closer to "perfect" Rails support in JRuby over the next couple months, chasing the long tail of Ruby compatibility. But how do these numbers and this update make you feel about JRuby on Rails today?

And as always, we love to have additional contributors, so we'll bend over backwards to make it easy for you to help. Join the lists, join #jruby on freenode IRC, or toss us email privately. JRuby is an amazing community-driven success story, and the only thing missing is you.

California Schemin'

Last week Tim Bray and I were at Menlo Park to meet with the Open Source Software Society Shimane, a delegation of developers, managers, and company heads from the Shimane prefecture of Japan. They were visiting Sun to talk with us about opportunities for cooperation, Sun hardware and software, and most importantly: Ruby. For you see Shimane is the home of Yukihiro "Matz" Matsumoto, creator of Ruby, and he accompanied the group to California.

The evening before the event, we went to Fuki Sushi in Palo Alto, a short two blocks from my hotel. I don't believe I've ever eaten such quantity or variety of Japanese cuisine, and I think our guests felt the same way. They marveled at the size of most dishes, especially the ice-cream-scoop-sized lump of wasabi and the two-foot-long sushi tray. They also photographed almost everything...I think I posed for a couple dozen snaps.

During the following day, Thursday, they sat through numerous presentations on Sun hardware and software. Tim and I also discussed Sun's position on Ruby and JRuby for an hour before and about forty minutes after lunch. Tim hit the high-level points about where Ruby will likely fit into the Java ecosystem in the future, and I supplied details and demos of JRuby. I also threw in a demo of JRuby's compiler beating Ruby 1.8 in the standard fib algorithm, which elicited a smile and laugh from Matz himself (whew! I was worried how he'd react!).
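For the curious, the fib benchmark usually looks something like the sketch below. This isn't the exact script from the demo; the input of 25 is an arbitrary value picked to keep the run short, and the timing will vary wildly by machine and implementation.

```ruby
require 'benchmark'

# Naive doubly-recursive Fibonacci, the usual shape of the "fib" microbenchmark.
# It stresses method dispatch and integer math rather than any library code.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# 25 is an arbitrary input chosen for illustration.
result = nil
elapsed = Benchmark.realtime { result = fib(25) }
puts "fib(25) = #{result} in #{'%.3f' % elapsed}s"
```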

Most interesting to me, however, was my discussion with Matz that night.

I was invited to join the delegation for a crab dinner in San Francisco. We went to Crustacean, a moderately upscale joint near California and 101. And after the attendant tied my plastic bib on, we were ready to go.

Since Matz and I ended up sitting together, and since very few others at the table spoke English, we managed to get in some time discussing a couple Ruby 2.0 design issues. Here's a quick summary:

- Matz seems to have come around to my visibility proposal for "private" in Ruby 2.0, which is largely the same as how Java handles private visibility. I believe this model is a good simplification over the original proposal. See ruby-core:9996 and related for the original discussion. The basic facts of private then would be:
  - You must dispatch to private methods using a functional call, as in foo() versus xyz.foo(). I didn't like this at first, but I've come around to using call syntax to force certain aspects of visibility.
  - Dispatches to private methods will only look in the same class for the method definition.
  - Methods that are public in superclasses can't be made private in subclasses.
  - Methods that are private in superclasses are not visible to subclasses, and so new methods of the same name and any visibility can exist in subclasses.
- Protected methods in Ruby 2.0 could potentially act like private methods now, though Matz is worried it would be too much of a change. I think it's appropriate; current private method behavior is very similar to Java's model for protected methods, where the methods can't be seen from outside the hierarchy, but can be called and overridden within the hierarchy as normal. I voiced my opinion, so we'll see where Matz goes from here.
- Matz is still comfortable with removing set_trace_func if a better mechanism for profiling and debugging can replace it. I had a few suggestions for alternate mechanisms, but I also promised to look into Java's model, since it seems to work quite well. I also suggested there may be something to learn from DTrace.
- Matz has come around to the idea that encoded character sequences are a different type than unencoded byte arrays, though he still wants them to have the same outward interface.
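To make the private and protected points in that list a bit more concrete, here's a small sketch of how private behaves in Ruby as it exists today. It only shows current behavior and doesn't attempt to model the proposed 2.0 rules; the Account and Savings classes are made up for illustration.

```ruby
class Account
  def report
    # Functional call (no explicit receiver): allowed for private methods.
    secret
  end

  private

  def secret
    "base secret"
  end
end

class Savings < Account
  # Today a subclass can call (and could override) a superclass's private
  # method, which is why current "private" resembles Java's "protected".
  def peek
    secret
  end
end

acct = Savings.new
puts acct.report
puts acct.peek

begin
  acct.secret   # explicit receiver from outside the hierarchy
rescue NoMethodError => e
  puts "explicit receiver rejected: #{e.class}"
end
```

Under the proposal, the call inside Savings#peek is the part that would stop working, since private dispatch would only look in the same class.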

That last item, on encoded character strings, warrants a bit more discussion.

The topic of encoded character strings came up a few times during Matz's visit, usually with him asking how we're doing things in JRuby. I explained that we mostly just follow Ruby 1.8, with our String now being backed by a byte[], but that we're also providing out-of-the-box native support for the new Rails ActiveSupport::Multibyte::Chars class, a wrapper around string that enforces character boundaries and encodings.

At dinner, we continued the discussion. I made my case for a separate type with the following points:

- A separate type would not require String's interface to change, and it could remain a byte array
- By having separate types, we can use polymorphic behavior to avoid checking and re-checking encodings for every operation

The first item was mostly a non-issue...Matz is fairly intent on changing the String interface in 2.0, and much of that work is already complete. But he had an interesting response to the second item: he's already planning to have separate types internally for encoded character strings. This was very good news to me, since it meant that JRuby could easily support M17N in the future by simply providing different String types that handle the other encodings, where our UTF-16 String implementation could simply be backed up by java.lang.String/StringBuffer/Builder.
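Here's a rough sketch of the polymorphism argument, using plain Ruby classes. ByteString is a hypothetical name, UTF8String is just my own placeholder name, and neither is a real Ruby or JRuby type; the point is only that the caller asks for length and never checks an encoding flag.

```ruby
# Hypothetical sketch: ByteString and UTF8String are illustrative names,
# not real Ruby or JRuby classes.
class ByteString
  def initialize(bytes)
    @bytes = bytes            # array of integers in 0..255
  end

  # Unencoded data: one element per byte.
  def length
    @bytes.length
  end
end

class UTF8String < ByteString
  # Encoded data: count decoded code points instead of raw bytes.
  def length
    @bytes.pack('C*').unpack('U*').length
  end
end

cafe = [0x63, 0x61, 0x66, 0xC3, 0xA9]   # "café" encoded as UTF-8
[ByteString.new(cafe), UTF8String.new(cafe)].each do |s|
  puts "#{s.class}: #{s.length}"
end
```

The two objects hold identical bytes, but the first reports 5 and the second reports 4, with the choice made by ordinary method dispatch rather than a per-call encoding check.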

So the result of the String discussion can be summarized in a few points:

- String's interface will change from 1.8 to work with characters rather than bytes, both in the encoded and unencoded forms of String. The plan for String methods' behaviors does not change from current Ruby 1.9.
- String will have subtypes that represent encoded character data, though in most cases you won't need to know about those types. If you do need to go after a UTF8String (my name), you can, but there will also be some sort of factory model for generating encoded strings and Ruby 2's encoding pragma will handle literals.

All told, I think it was a very productive trip, and it was great to help Matz work through a few Ruby 2.0 design questions.

Wednesday, January 31, 2007

The End is Nigh: Help Squash Rails Bugs

My friends, the end of the road is in sight for official Rails support.

Tom, Ola, and I have been working over the past week to get remaining Rails issues wrapped up. As a result of our efforts:

- ActionPack is now "practically" 100% working, minus a test or two we can't support and a few tests that are broken or that run fine in isolation (it would be nice to know *why* those fail)
- ActiveSupport is well above 95% passing
- ActiveRecord is in the 90% range passing with MySQL and in the 80% range with Derby

The remaining modules have yet to be worked on, but they are mostly passing at a high percentage as well: ActionWebService, ActionMailer, Railties

So here is a set of instructions on how YOU, dear reader, can help round out JRuby's support for Ruby on Rails.

Getting a testbed set up
1. Update JRuby trunk (http://svn.codehaus.org/jruby/trunk/jruby), build it (ant clean jar), love it
2. Install Rake (gem install rake)
3. Fetch Rails 1.2.1 from Rails SVN at http://dev.rubyonrails.org/svn/rails/tags/rel_1-2-1/

Running the tests
1. From within the module under test (like activesupport/ or actionmailer/) just run "rake". All tests should execute and a report should be provided at the end. You can usually run tests individually as well, although a few depend on the side-effects of previous tests.
2. ActiveRecord requires some additional setup; I'll update this post with Tom's instructions and patches shortly.

Reporting issues
1. Reporting failures in Rails is good
2. Reporting reduced test cases or clear explanations for failures in Rails is better
3. Reporting reduced test cases and including patches for failures in Rails is BEST
4. Don't forget to check if your issue has already been reported, and please sync up on the mailing list while you're working
5. Patches will probably not be accepted without a reusable test case. We're trying to grow our regression suite as a result of this work.
6. JRuby's JIRA is here: http://jira.codehaus.org/browse/JRUBY

Caveats, things to watch for, things to try
1. ActiveRecord (with the AR-JDBC adapter) could use wider DB testing. We've done quite a bit of work with MySQL and Tom has been improving Derby support, but there are lots of other databases out there. Pick your favorite database, follow Tom's instructions to get up and going, and report issues (in the JRuby JIRA at least, but also report to jruby-extras project if appropriate)
2. Railties includes code that will never run under JRuby, like its fcgi-based dispatcher tests. You should confirm with us that they're expected, and then ignore or delete them for your future runs.
3. Rails is a very...interesting...application to debug. Feel free to ask on-list if you simply don't get something. I've seen things in Rails code no man should have to see, so I know it can be frustrating to debug at times.

We're on the home stretch now, and Rails is getting more and more solid every day. With you all helping, we should be able to finish off the remaining failures, clean up major outstanding JRuby issues, and kick out a pretty sweet "Rails-supporting" JRuby release in the next couple weeks.

Update: A couple folks pointed out that the codebase didn't compile under Java 1.4.2. That has been corrected!

Update 2: A few folks are seeing a problem installing gems related to the %p operator to printf. We're working on that, and it's a fairly minor issue, but to avoid it there's one additional step before you install rake: set the JRUBY_HOME env var to the root of your JRuby stuff.

Tuesday, January 30, 2007

Improving Java Integration Performance

I created JRUBY-501 to track performance improvements to Java integration, since it's come to light recently that it may be one of our biggest bottlenecks now. And I found a ripe, juicy fix already.

For every call to a Java type, we call JavaUtilities.matching_method with a list of potential methods and the given argument list. matching_method compares the available methods and the types of the arguments, choosing the best option and returning it to be called. This is essentially our heuristic for choosing an overloaded method from many options, given a set of arguments.

Problem was, we didn't cache anything.

Given a list of argument types and a list of methods, there's only ever going to be one appropriate choice. Unfortunately our code was doing the search for every single call, and you can imagine how much additional overhead that added. Or perhaps you can't, and I'll show you.

Here are the numbers before my tiny change:

 38.862000   0.000000  38.862000 ( 38.861000)
 40.230000   0.000000  40.230000 ( 40.230000)

This test basically just instantiates a StringBuffer and appends the same character to it 100_000 times. It takes roughly 40 seconds to do that with the old code.
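The script itself isn't shown here, but a test of that shape might look roughly like the following sketch. The append_chars helper is a made-up name, and the plain-Ruby String branch is only there so the script still runs outside JRuby; it doesn't exercise Java integration at all.

```ruby
require 'benchmark'

ITERATIONS = 100_000

# Append the same character `count` times and return the final length.
# Under JRuby this goes through Java integration (java.lang.StringBuffer);
# elsewhere it falls back to a plain Ruby String so the script still runs.
def append_chars(count)
  if RUBY_PLATFORM == 'java'
    require 'java'
    buffer = java.lang.StringBuffer.new
    count.times { buffer.append('a') }
  else
    buffer = String.new
    count.times { buffer << 'a' }
  end
  buffer.length
end

# Two runs, printed in the usual user/system/total/real columns.
Benchmark.bm do |bm|
  2.times { bm.report { append_chars(ITERATIONS) } }
end
```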

And here's with my changes:

  3.295000   0.000000   3.295000 (  3.294000)
  2.933000   0.000000   2.933000 (  2.933000)

Yes, you're reading that right. It's roughly a 13x improvement.

And the change was trivial: given the list of methods and argument types, cache the correct method. So simple, so elegant, so effective.
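In case the idea isn't obvious, here's a minimal sketch of that kind of cache in plain Ruby. It is not the actual JavaUtilities code: OverloadResolver, its data layout, and the "first compatible candidate wins" rule are all simplifications I've made up for illustration.

```ruby
# Hypothetical sketch of caching overload selection. The class and its data
# layout are illustrative only; this is not JRuby's JavaUtilities code.
class OverloadResolver
  attr_reader :searches

  def initialize
    @cache = {}
    @searches = 0
  end

  # candidates: array of [name, [parameter classes]] pairs
  # args:       the actual arguments for this call
  def matching_method(candidates, args)
    key = [candidates, args.map { |arg| arg.class }]
    @cache.fetch(key) { @cache[key] = search(candidates, args) }
  end

  private

  # The expensive part: walk every candidate and compare parameter types.
  # Simplification: takes the first compatible candidate, not the "best" one.
  def search(candidates, args)
    @searches += 1
    candidates.find do |_name, param_types|
      param_types.length == args.length &&
        param_types.zip(args).all? { |type, arg| arg.is_a?(type) }
    end
  end
end

append_overloads = [[:append_string, [String]], [:append_int, [Integer]]]
resolver = OverloadResolver.new
100_000.times { resolver.matching_method(append_overloads, ['a']) }
puts resolver.searches   # => 1
```

The search runs once for a given combination of candidate list and argument types; every later call with the same combination is a hash lookup.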

So does this affect regular Ruby code? You better believe it does!

I had been intrigued by the fact that some of the first methods JITed during rdoc generation were all JavaSupport methods. That told me something in rdoc was using a class we provide through Java integration, rather than natively or in pure Ruby. So I figured with this change, I'd re-run the numbers.

Before the change, a full rake install with rdoc took about 42s, or about 31s with ObjectSpace disabled. And now, the "after" numbers:

with ObjectSpace:
real    0m29.765s
user    0m28.843s
sys     0m2.169s

without ObjectSpace:
real    0m24.984s
user    0m23.559s
sys     0m1.757s

This is by far the largest increase we've seen in rdoc performance in several months. The fix should also drastically improve the performance of libraries like ActiveRecord-JDBC, which is extremely Java-integration-heavy.

Another area that's been painful was installing Rails with all docs. It used to take over an hour, but now it's under *seven minutes*.

I hope those of you who've seen or blogged about performance problems (especially with the aforementioned ActiveRecord-JDBC) will try re-running your tests. This improvement ought to have a very noticeable effect on benchmarks.

Now the only concern I have with the caching is that it's a little coarse; there may be better places to do the caching, or finer-grained items to cache against. And we could probably pre-fill the cache with some likely candidates. But an improvement like this outweighs those concerns, so it's been committed...and there's bound to be similar improvements as well.

Boy oh boy is that low-hanging fruit looking ripe.

Friday, January 19, 2007

Velocity, F3, Grizzly on Rails, JParseTree

Roundup!

Martin Fowler enjoys using JRuby with Velocity

Jean Lazarou creates an F3 clone with JRuby

Ashish Sahni posts a walkthrough for JRuby on Rails under Grizzly

Werner Schuster releases JParseTree, sexp-based JRuby parse tree generator.

JRuby, JRuby, JRuby.

Thursday, January 18, 2007

JRuby Compiler: In Trunk and Ready to Play

Times they are a-changing.

I posted previously on JRuby's compiler work. There have been various iterations of the compiler, many purely prototypes never intended to be completed, and a few genuine attempts at evolving toward full Ruby support. However, I believe that in recent weeks I've settled on a design that will carry us to the JRuby compiler endgame.

For the past year, we've emphasized correctness over performance nine times out of ten. When we did focus on performance, it was solely on improving JRuby's interpreter speed, in an attempt to match Ruby's performance in this area and because we knew that JRuby could never entirely escape interpretation. Ruby's just too dynamic for that. So while compatibility with Ruby 1.8.x continued to improve by leaps and bounds, our performance was rather poor in comparison.

This past fall, things started to change. Compatibility reached a point where we could finally be confident about our set of regression tests and our understanding of "how Ruby works" across all its weirdest features. As we understood better the design of the C implementation and the quirky intricacies of the language, we started to see a path to enlightenment. We started to realize how we could support Ruby as it exists today while simultaneously evolving JRuby into a more efficient and cleaner design. And so the performance numbers started to change.

From 0.9.0 to 0.9.1, we had a clean doubling of performance across the board. Our favorite benchmark--RDoc generation--was easily twice as fast, and other simpler benchmarks like fib had similar improvements. 0.9.2 was more of a rushed release for JavaPolis, but we had a good 1/4 to 1/3 speedup even then, since the ongoing refactoring removed another large chunk of overhead from JRuby's core runtime.

From 0.9.2 to current trunk, however, has been a different matter entirely.

The first major change is that we've started to seriously alter the way JRuby does dynamic method dispatching. I did some research, read a few papers, and mocked up and benchmarked a few options. What we've settled on for the moment is a combination of STI (selector table indexing) for the core classes (STI provides a large table mapping methods and classes to actual code) and various forms of inline caching for non-core classes (basically, for pure Ruby classes; though this is yet to be implemented in trunk). STI provides an extremely fast path for dispatch on those hardest-hit methods, since it reduces calling most core methods to two array indexes and a switch, a vast improvement over the hash lookup and multiple layers of abstraction and framing we had before.
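For anyone who hasn't seen table-based dispatch before, here's a toy sketch of the "two array indexes and a switch" shape in plain Ruby. It's an illustration under assumptions, not JRuby's implementation: `CLASS_INDEX`, `SELECTOR_INDEX`, `DISPATCH_TABLE`, and `fast_call` are all hypothetical names. In a real runtime the class index would be stored on the class and the selector index resolved ahead of time, so the two hashes here only stand in for that bookkeeping.

```ruby
# Hypothetical indexes; a real runtime would assign these once, up front.
CLASS_INDEX    = { Integer => 0, String => 1 }.freeze
SELECTOR_INDEX = { to_s: 0, size: 1 }.freeze

# DISPATCH_TABLE[class index][selector index] holds a small switch value.
# 0 means "no fast path, fall back to ordinary dispatch".
DISPATCH_TABLE = [
  [1, 0], # Integer: to_s -> case 1, size -> slow path
  [2, 3]  # String:  to_s -> case 2, size -> case 3
].freeze

def fast_call(receiver, selector)
  class_idx    = CLASS_INDEX[receiver.class]
  selector_idx = SELECTOR_INDEX[selector]
  code = class_idx && selector_idx ? DISPATCH_TABLE[class_idx][selector_idx] : 0

  # The switch: each case stands in for a direct call to core code.
  case code
  when 1 then receiver.to_s  # Integer#to_s
  when 2 then receiver.to_s  # String#to_s
  when 3 then receiver.size  # String#size
  else receiver.public_send(selector) # slow path: normal lookup
  end
end

puts fast_call(42, :to_s)
puts fast_call("abc", :size)
puts fast_call("abc", :upcase) # not in the table, so it takes the slow path
```

The table only pays off for a fixed, known set of classes and selectors, which is why it fits the core classes and why something like inline caching is the better fit for arbitrary Ruby classes.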

We are continuing to expand our use of STI as it is applicable, and I will soon start exploring options for interpreted-mode inline caching (polymorphic, likely, though I need to run a few trials to get numbers balanced right). So fast dynamic dispatching is well on its way, and will improve performance across the board.

Then there's the compiler work. You have no idea how much it's irritated me to hear people talk about JRuby the past year and say "yeah, but it doesn't compile to Java bytecode." This obviously amounts to pure FUD, but beyond that it totally ignores the complexity of the problem: not a single person on this earth has managed to compile Ruby to a general-purpose VM yet. So complaining about our missing compiler is a bit like complaining that we haven't moved mountains. Honestly people, what do you expect?

Of course, there's the flip side of this statement: compiling Ruby is a hard problem, and I like hard problems. For me it's doubly hard, since I've never written a compiler before. But hell, before JRuby I'd never even worked on an interpreter or language implementation, and that seems to have gone alright. So there it is...Mount Ruby, waiting to be climbed. And climb it I must!

The current compiler design lives in two halves: the AST-walking half and the code-generation half. I chose to split these two because it makes several things easier. For starters, it allows me to abstract all the bytecode generation logic behind a simple interface, an interface that presents coarse-grained operations like invokeDynamic() and retrieveLocalVariable(). The ultimate implementation of those operations can then be modified at will. It also allows us to evolve the AST independently of the compiler backend, even to the point of swapping in a completely different parser and in-memory code representation (like YARV bytecodes) without harming the evolving code generator backend. So this split helps future-proof the compiler work.
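Here's a rough sketch of that split, written in Ruby rather than the Java the real compiler uses. Everything in it is hypothetical: the array-based AST, the `Walker` and `ListingBackend` classes, and the `NotCompilable` error are stand-ins, and the backend just records operations instead of emitting bytecode. The method names `invoke_dynamic` and `retrieve_local_variable` are Ruby-style renderings of the coarse-grained operations mentioned above.

```ruby
# Raised when the walker meets a node type it doesn't know how to compile.
class NotCompilable < StandardError; end

# Backend half: the only place that knows what gets emitted.
# This one records a listing; a real one would generate bytecode.
class ListingBackend
  attr_reader :ops

  def initialize
    @ops = []
  end

  def push_fixnum(value)
    @ops << [:push_fixnum, value]
  end

  def retrieve_local_variable(index)
    @ops << [:load_local, index]
  end

  def invoke_dynamic(name, argc)
    @ops << [:invoke_dynamic, name, argc]
  end
end

# Walker half: knows the AST shape, talks to the backend only via its interface.
class Walker
  def initialize(backend)
    @backend = backend
  end

  def compile(node)
    type, *rest = node
    case type
    when :fixnum
      @backend.push_fixnum(rest[0])
    when :local
      @backend.retrieve_local_variable(rest[0])
    when :call
      receiver, name, args = rest
      compile(receiver)
      args.each { |arg| compile(arg) }
      @backend.invoke_dynamic(name, args.size)
    else
      raise NotCompilable, "Can't compile node: #{type}"
    end
  end
end

# AST for "x + 1", where x is local variable 0.
backend = ListingBackend.new
Walker.new(backend).compile([:call, [:local, 0], :+, [[:fixnum, 1]]])
p backend.ops
```

Swapping in a different backend means writing another class with the same three methods; the walker never changes.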

The current design also has another advantage: not all of Ruby has to compile for it to be useful. Currently, as the AST walker encounters nodes, if it finds a node it can't deal with, it simply raises an exception. Compilation terminates, and the compiler's client can deal with the result as it will. This leads to a really powerful feature of this design: we can install the compiler now as a JIT, and as it evolves, more and more code will automatically get optimized. So once we're confident that a given node type is 100% compiling correctly, that node will now be eligible for JIT compilation. As an example, here's the output from a gem installation with the current compiler enabled as a JIT (with my logging in place, naturally):

compiled: TarHeader.empty?
compiled: Entry.initialize
compiled: Entry.full_name
compiled: Entry.bytes_read
compiled: Entry.close
compiled: Entry.invalidate
Successfully installed rake, version 0.7.1
Installing ri documentation for rake-0.7.1...
compiled: LeveledNotifier.notify?
compiled: LeveledNotifier.<=>
compiled: RubyLex.getc
compiled: null.debug?
compiled: BufferedReader.ungetc
compiled: Token.set_text
compiled: RubyLex.line_no
compiled: RubyLex.char_no
compiled: BufferedReader.column
compiled: RubyToken.set_token_position
compiled: Token.initialize
compiled: RubyLex.get_read
compiled: RubyLex.getc_of_rests
compiled: BufferedReader.getc_already_read
compiled: BufferedReader.peek
compiled: RubyParser.peek_tk
compiled: TokenStream.add_token
compiled: TokenStream.pop_token
compiled: CodeObject.initialize
compiled: RubyParser.remove_token_listener
compiled: Context.ongoing_visibility=
compiled: PreProcess.initialize
compiled: AttrSpan.[]
compiled: null.wrap
compiled: JavaProxy.to_java_object
compiled: Lines.next
compiled: Line.isBlank?
compiled: Fragment.add_text
compiled: Fragment.initialize
compiled: ToFlow.convert_string
compiled: LineCollection.add
compiled: Entry_.path
compiled: Entry_.directory?
compiled: Entry_.dereference?
compiled: AttrSpan.initialize
compiled: Entry_.prefix
compiled: Entry_.rel
compiled: Entry_.remove
compiled: Lines.rewind
compiled: AnyMethod.<=>
compiled: Description.serialize
compiled: AttributeManager.change_attribute
compiled: AttributeManager.attribute
compiled: ToFlow.annotate
compiled: NamedThing.initialize
compiled: ClassModule.full_name
compiled: Lines.initialize
compiled: Lines.empty?
compiled: LineCollection.normalize
compiled: ToFlow.end_accepting
compiled: Verbatim.add_text
compiled: FalseClass.to_s
compiled: TopLevel.full_name
compiled: Attr.<=>
Installing RDoc documentation for rake-0.7.1...
compiled: Context.add_attribute
compiled: Context.add_require
compiled: Context.add_class
compiled: AbstructNotifier.notify?
compiled: Context.add_module
compiled: LineReader.read
compiled: null.instance
compiled: HtmlMethod.path
compiled: HtmlMethod.aref
compiled: ContextUser.initialize
compiled: HtmlClass.name
compiled: TokenStream.token_stream
compiled: LineReader.initialize
compiled: TemplatePage.write_html_on
compiled: Context.push
compiled: Context.pop
compiled: HtmlMethod.name
compiled: Context.find_local_symbol
compiled: SimpleMarkup.add_special
compiled: TopLevel.find_module_named
compiled: Context.find_enclosing_module_named
compiled: HtmlMethod.<=>
compiled: ToHtml.annotate
compiled: HtmlMethod.visibility
compiled: HtmlMethod.section
compiled: HtmlMethod.document_self
compiled: LineReader.dup
compiled: Lines.unget
compiled: ToHtml.accept_paragraph
compiled: ContextUser.document_self
compiled: ToHtml.accept_heading
compiled: Heading.head_level
compiled: ToHtml.accept_list_start
compiled: ToHtml.accept_list_end
compiled: ToHtml.accept_verbatim
compiled: SimpleMarkup.initialize
compiled: AttributeManager.initialize
compiled: ToHtml.initialize
compiled: ToHtml.end_accepting
compiled: HtmlMethod.singleton
compiled: Context.modules
compiled: Context.classes
compiled: ContextUser.build_include_list
compiled: HtmlMethod.description
compiled: HtmlMethod.parent_name
compiled: HtmlMethod.aliases
compiled: HtmlClass.parent_name
compiled: ContextUser.as_href
compiled: ContextUser.url
compiled: ContextUser.aref_to
compiled: HtmlFile.<=>
compiled: HtmlClass.<=>

You can see from the output that not only are RubyGems methods getting compiled, but so are stdlib methods and our own Java integration methods. And this is with the current compiler, which doesn't support compiling class defs, blocks, case statements, ... Hopefully you get the picture; this bit-by-bit implementation of the compiler allows us to slowly grow our ability to optimize Ruby into Java bytecodes.

So then, how well does it perform? It performs just dandy, when we're able to compile. Witness the following results for a simple recursive fib algorithm running under Ruby 1.8.5 and JRuby trunk with the JIT enabled.

$ ruby test/bench/bench_fib_recursive.rb
12.760000   1.400000  14.160000 ( 14.718925)
12.660000   1.490000  14.150000 ( 14.648681)
$ JAVA_OPTS=-Djruby.jit.enabled=true jruby test/bench/bench_fib_recursive.rb
compiled: Object.fib_ruby
8.780000   0.000000   8.780000 (  8.780000)
7.761000   0.000000   7.761000 (  7.761000)

Yes, that's nearly double the performance of the C implementation of Ruby. And this is absolutely real.

Now JITing is great, and it's obviously carried Java a long ways. The HotSpot JIT is an unbelievable piece of work, and any app that runs a long time is guaranteed to perform better and better as deeper optimizations start to take hold. But we're talking about Ruby here, which starts up at C-program speeds, and runs as fast as it does immediately. So then JRuby needs a way to compete for immediate execution performance, and the most straightforward way to do that is with an ahead-of-time compiler. That compiler is now also available in JRuby trunk.

The name of the command is "jrubyc", and it does just what you'd expect: it outputs a Java class file for your Ruby code. However, the mapping from Ruby code to a class file is not as straightforward as you'd expect: a Ruby script may contain many classes or no classes at all, and those classes may be opened and re-opened by the same script or other scripts at runtime. So there's no way to map directly from a Ruby class to a Java class given the strict limitations of Java's class model. But there is a much smaller unit of code that does not change over time, aside from being mercilessly juggled around: methods.

Ruby, in the end, is a creative and sometimes complicated jumble of method "objects", floating from class to class, from module to module, from namespace to namespace. Methods can be renamed, redefined, added and removed, but never can they be directly modified. And so here is where we have our immutable item to compile.

JRuby's compiler takes a given Ruby script and generates the following Java methods out of it: One Java method for the top-level, straight-through execution of the script, including class bodies and "def"s and the like (called "__file__" in the eventual Java class...thanks Ola for the idea), and a Java method for every Ruby method body and closure contained therein, named in such a way as to avoid conflicts. So for the following piece of code:

require 'foo'

def bar
  baz { puts "hello" }
end

def baz
  yield
end

There would be four Java methods generated: one for the toplevel execution of the script, two for the bar and baz methods, and one for the closure contained within bar. The resulting class file would store these as static methods, so they are accessible from any class or object as necessary, and the toplevel run-through would bind the two Ruby methods to their appropriate names in Ruby-space.
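If you want to check that count yourself, here's a small sketch that tallies the separately compiled bodies in a script. It uses Ruby's bundled Ripper parser, not JRuby's compiler, and it's an approximation under assumptions: it only counts plain `def`s and `{}`/`do` blocks, plus one for the toplevel, and ignores things like singleton methods and lambdas. The helper names `count_units` and `java_method_count` are made up for this example.

```ruby
require 'ripper'

# Node types that would each get their own generated method in this sketch.
UNIT_NODES = [:def, :brace_block, :do_block].freeze

# Recursively count unit nodes in a Ripper s-expression.
def count_units(node)
  return 0 unless node.is_a?(Array)

  own = UNIT_NODES.include?(node.first) ? 1 : 0
  own + node.sum { |child| count_units(child) }
end

# One method for the toplevel script body, plus one per method body and closure.
def java_method_count(source)
  sexp = Ripper.sexp(source)
  raise ArgumentError, "could not parse source" if sexp.nil?

  1 + count_units(sexp)
end

script = <<~RUBY
  require 'foo'

  def bar
    baz { puts "hello" }
  end

  def baz
    yield
  end
RUBY

puts java_method_count(script)
```

For the script above that's the toplevel, `bar`, `baz`, and the one closure inside `bar`.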

Quite simple, really!

So then an example of the precious, precious JRuby compiler:

$ cat fib_recursive.rb
def fib_ruby(n)
  if n < 2
    n
  else
    fib_ruby(n - 2) + fib_ruby(n - 1)
  end
end

puts fib_ruby(34)
$ jrubyc fib_recursive.rb
$ ls fib_recursive.*
fib_recursive.class  fib_recursive.rb
$ time java -cp lib/jruby.jar:lib/asm-2.2.2.jar:. fib_recursive
5702887

real    0m8.126s
user    0m7.632s
sys     0m0.208s
$ time ruby fib_recursive.rb
5702887

real    0m14.649s
user    0m12.945s
sys     0m1.480s

Again, about twice as fast as Ruby 1.8.5 for this particular benchmark.

Now I don't want you going off and saying JRuby has a perfect compiler that will double the performance of your Rails apps. That's not true yet. The current compiler covers only about 30% of the possible code constructs in Ruby, and the remaining 60% (Update: 70%...that's what I get for late-night blogging) contains some of the biggest challenges like closures and class definitions. It's sure to be buggy right now, and the JIT isn't even enabled by default, plus it has my nasty logging message burned into it, to discourage any production use.

But it is very real. JRuby has a partial but growing compiler for Ruby to Java bytecode now.

And oh my, look at the time. Tonight I have to finish my visa application for a trip to India, nail down schedules and descriptions for several upcoming talks, and prepare some slides and notes for presentations in the coming weeks. You will see more about the Java compilation and our developing YARV/Ruby 2.0 bytecode support over the next couple months...and you can expect JavaOne to be an interesting time for Ruby on the JVM this year ;)

Friday, January 12, 2007

Ruby Compiler Fun: AOT and JIT Compilation

Who knew writing a compiler could be so much fun.

I managed to accomplish two things tonight. It's late and I have a flight home tomorrow, so I'll be brief.

jrubyc: JRuby's Ahead-Of-Time (AOT) Compiler

I have whipped together the very barest of command-line, ahead-of-time compilers, along with a simple script to invoke it.

~/NetBeansProjects/jruby $ jrubyc
Usage: jrubyc <filename> [<dest>]

It's mostly just a very thin wrapper around the existing compiler code, so it can only compile constructs it knows about. However, for really simple scripts without any unrecognized nodes, it works fine:

~/NetBeansProjects/jruby $ cat samples/fib.rb
# calculate Fibonacci(20)
# for benchmark
def fib(n)
  if n<2
    n
  else
    fib(n-2)+fib(n-1)
  end
end
print(fib(20), "\n")
~/NetBeansProjects/jruby $ jrubyc samples/fib.rb tmp
~/NetBeansProjects/jruby $ ls tmp/samples
fib$MultiStub0.class    fib.class

At the moment, two classes are generated; one is a class to hold the script entry points and the other is a stub class for all the actual blocks of code contained within the script (toplevel code, method code, etc). This will soon be a single class file, so pay the MultiStub no mind.

We can then execute the script like you'd expect, specifying the JRuby and ASM jar files on the classpath:

~/NetBeansProjects/jruby $ export CLASSPATH=lib/jruby.jar:lib/asm-2.2.2.jar:tmp
~/NetBeansProjects/jruby $ java samples/fib
6765

Huzzah! Compilation!

Now of course, as I mentioned, this only compiles scripts containing constructs it knows about. If you try to compile a script it can't handle, you'll get an error:

~/NetBeansProjects/jruby $ jrubyc lib/ruby/1.8/singleton.rb
Error -- Not compileable: Can't compile node: ModuleNode[]

The compiler currently supports only literal fixnums, strings, and arrays, simple method definitions, while loops, if/else, and calls that don't involve blocks or splatted arguments. More will come as time progresses. The benefit of building the compiler piecemeal like this becomes more apparent in the next section...

JIT Compilation

The current compiler only understands enough of Ruby to handle my experimentation and research. The compiler also does not output one-to-one Ruby-to-Java classes or even a single large method: it outputs a class containing a method for every semantically separate block of code in a given script. In Ruby's case, that means toplevel code, code found within the body of a class, and code found within the body of a method definition. By combining these two traits, we have everything necessary for a simple JIT.

A JIT, or Just-In-Time compiler, performs its compilation at runtime, usually based on some gathered information about the executing code. HotSpot, for example, has an extensive array of optimizations it can perform on running code just by watching how it executes and eliminating unnecessary overhead. My vastly simpler JIT uses a much more basic metric: the number of times a method has been invoked.

The actual compiler code is the same as that used for the AOT compiler, with one major difference. Instead of the generated code being dumped to a file for later execution, it's immediately loaded, instantiated, and snuggled away in the same location where interpreted code used to live. The logic goes like this:

1. A method is called. We'll name it "foo".
2. foo's code is written in Ruby, so it's just a sequence of AST nodes to be interpreted.
3. We interpret foo's nodes, but each time we increment a counter. When the counter reaches some number (currently 50), the compiler kicks in.
4. If the code can't be compiled, we continue with interpretation, but we set a flag and never try to compile again.
5. If the code can be compiled, we save the generated code and use it for all future invocations.
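The steps above can be sketched in a few lines of Ruby. This is a toy model, not the code in JRuby: `JitMethod`, `NotCompilable`, and the two callables passed in (one standing in for the interpreter, one for the compiler) are hypothetical. Only the threshold of 50 comes from the description above.

```ruby
# Raised by the (pretend) compiler when it meets something it can't handle.
class NotCompilable < StandardError; end

class JitMethod
  THRESHOLD = 50

  # interpreted: callable that runs the method body the slow way
  # compiler:    callable that returns a faster callable, or raises NotCompilable
  def initialize(interpreted, compiler)
    @interpreted = interpreted
    @compiler = compiler
    @calls = 0
    @compiled = nil
    @gave_up = false
  end

  def compiled?
    !@compiled.nil?
  end

  def gave_up?
    @gave_up
  end

  def call(*args)
    # Once compiled code exists, every invocation uses it.
    return @compiled.call(*args) if @compiled

    @calls += 1
    try_compile if @calls >= THRESHOLD && !@gave_up
    @interpreted.call(*args)
  end

  private

  def try_compile
    @compiled = @compiler.call
  rescue NotCompilable
    # Remember the failure so we never try again.
    @gave_up = true
  end
end

foo = JitMethod.new(
  ->(n) { n + 1 },        # "interpreted" body
  -> { ->(n) { n + 1 } }  # "compiler" that always succeeds
)

49.times { foo.call(1) }
puts foo.compiled?
foo.call(1)
puts foo.compiled?
```

Note that the call that trips the threshold still runs interpreted; the compiled version only takes over from the next invocation on.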

Because the compiler can generate these small pieces of code, we're able to JIT Ruby code that was not compiled before execution began, gaining the benefits of a compiled platform without losing the flexibility of an agile script-based development model. It also means we can start benefiting from bytecode compilation even before the compiler is complete.

So how well does it perform? Very well, provided you don't go outside the narrow range of AST nodes the compiler supports:

~/NetBeansProjects/jruby $ cat test/bench/bench_fib_recursive.rb
require 'benchmark'

def fib_ruby(n)
  if n < 2
    n
  else
    fib_ruby(n - 2) + fib_ruby(n - 1)
  end
end

puts Benchmark.measure { fib_ruby(30) }
puts Benchmark.measure { fib_ruby(30) }

Here we have a fib benchmark script with a few nodes the compiler can't handle. For example, the blocks at the bottom of the script won't compile correctly at present. So it's a good candidate for the JIT.

Once the JRuby JIT's been wired up, we can simply run the code as normal:

~/NetBeansProjects/jruby $ jruby test/bench/bench_fib_recursive.rb
compiled: Object.fib_ruby
2.877000   0.000000   2.877000 (  2.876000)
2.955000   0.000000   2.955000 (  2.955000)

You will notice the "compiled" logging output I currently have in the JIT. The only method hit hard enough to be compiled during this run was the fib_ruby method defined on the toplevel Object instance. Now this performance is drastically increased over the current trunk, largely due to compilation but also due to a faster dynamic method invocation algorithm we're experimenting with. And there's still a lot of optimization left to be done at both the compiler and runtime levels. But it's already a vast improvement over JRuby from even a month ago. Things are moving very quickly now.

We also look better running under the Java 6 server VM. The "server" VM performs more aggressive optimizations of Java code than does the default "client" VM. It isn't the default largely because those optimizations make it start up a bit more slowly, since it waits longer and gathers more information before JITing. In this case, however, the results are very impressive when we compare the JRuby JIT running under the Java 6 server VM against Ruby 1.8.5:

~/NetBeansProjects/jruby $ jruby SERVER test/bench/bench_fib_recursive.rb
compiled: Object.fib_ruby
1.645000   0.000000   1.645000 (  1.645000)
1.452000   0.000000   1.452000 (  1.453000)
~/NetBeansProjects/jruby $ ruby test/bench/bench_fib_recursive.rb
1.670000   0.000000   1.670000 (  1.677901)
1.660000   0.000000   1.660000 (  1.671957)

The future's looking pretty bright.

None of this code is in trunk at the moment, but it should land fairly soon. The AOT compiler may come before the JIT, since it's minimally invasive and won't affect normal interpreted mode execution. Look for both to be available in JRuby proper within a week or two, and watch for the compiler itself to move toward completion over the coming weeks.

Saturday, January 06, 2007

Five Things About Me

Tor, you sneaky devil. You tagged me before anyone else had a chance. You grabbed the brass ring. Kudos.

So to continue the "5 Things" meme (for the record, I really hate the word "meme"), I present for you five things you probably don't know about me. Actually, some of you will know some of these facts, but I doubt any of you will know them all. I've tried to pick the most quirky or interesting bits out of my otherwise humdrum life.

Some time in 1998, I became the lead developer on the LiteStep project. LiteStep was a very popular replacement for the Explorer desktop shell on Windows during the late 90s. It provided a new taskbar, desktop window, NeXT-like dock, and pluggable UI and theming system. For hardcore users tired of the boring Explorer UI, it was the state of the art.

Originally created by a fellow named Francis Gastellu, it had by 1998 grown rather quiet. At the time, the codebase was silently fading away, with none of the original developers still working on the project and few active developers interested in or able to make a large time commitment to get LiteStep going again. I discovered LiteStep and was attracted by its ability to replace the entire desktop Look & Feel of my Windows machines. I had also been an avid Win32 developer, releasing the shareware program "Hack-It" to some minimal financial success. However the LiteStep code was in really rough shape.

Almost all the logic was packed into a single large C file that controlled the main desktop window. All the other modules were heavily dependent on this one piece of code, which ultimately crippled LiteStep's ability to incorporate certain types of UI plugins into a user's desktop. I tackled the problem in two ways:
1. I started converting the core plugins to C++ pure virtual classes and implementations, to allow for a more componentized system
2. And I reworked all the critical functionality from the desktop module into a central runtime, allowing all other modules to finally remove their desktop dependencies
Over the next year, LiteStep started to grab the attention of the desktop theming community once again. "Skinning" in general really took off during this time, with the launch of new shells GeoShell, DarkStep, and others. An article published in Wired (for which I was interviewed but not quoted) detailed this new movement.

Sadly, with the release of theming capabilities in Windows XP, the rise of Linux desktops, and the rebirth of Macintosh with OS X, LiteStep has long since fallen from grace. But to this day I still have the odd person walk up to me and thank me for my efforts during that time. LiteStep, we barely knew ye.

I suppose an addendum to this item is that for many years I wrote at least as much Win32 C++ code as I did Java, and I still have the programming guides to prove it. How's that for diversity?

I do not remember a time in my life I was not in front of a computer. The first computing experience I can remember was programming and playing with BASIC on my Atari 400, writing little games and buying programming books containing short apps I could type in...carefully...one finger at a time. I remember saving my programs to the Atari cassette tape drive and praying, praying, praying it would actually take. I remember dialing up to text-based information services at 300bps over an acoustic coupler. In third grade, a mentor came to my elementary school to teach me to program in Apple BASIC, though I never owned an Apple computer until my current MacBook Pro.

Throughout gradeschool and highschool, my primary interests lay with computers. I ran a BBS called "Terminal Nightmare" (clever, eh?) for which I toiled many hours creating ANSI graphics and advertising on more popular boards. I brought C programming manuals to school in 8th grade to read during slow periods. I wrote C and assembler code on embedded processors for my dad's electronics design ventures in 9th grade. And so on and so forth. I've been a computer geek as long as I can remember, and I've never had a problem with that.

Toward the end of highschool I started thinking about degree programs. I initially started my post-secondary education in Organic Chemistry, and completed the first two years of requirements. But I hated labs. Some time during the second year, I discovered that there was something called a "Computer Science" degree. Oh, hell yes. From then on I never performed another titration or chromatograph, and I couldn't be happier.

When I am not programming (which is extremely rare) I am an enthusiast of complete-information strategy games. I have spent some amount of time reading about and studying Go, which is my favorite game. I enjoy playing various Shogi variants (including Shogi, Chu Shogi, Tenjiku Shogi, and Tori Shogi), though I don't claim to be good at any of them. I will play Xiang Qi, but it's not one of my favorites, and I have not learned any particularly good strategies. I also play Chess, having been taught by my father at an early age.

Occasionally a few local friends and I will get together and play these games until the wee hours of the morning. Some people have LAN parties; we have strategy gaming parties. We most frequently play Bughouse when we can find four people and two clocks, but we often just get together to play the above games one-on-one.

And by "complete-information" games, I mean those in which there is no element of chance. I do not enjoy dice games, and I will play card games only if present company prefers such games. My opinion is that if I lose a game, I would much rather it be due to my own ineptitude than due to random chance.

I was one of the best fight-game players in local arcades in the late 1990s. Oddly enough, I was never drawn to Street Fighter, but I spent literally thousands of dollars over the years getting good at the Mortal Kombat and Killer Instinct series of games from Midway. My friend and I would generally spend most weekend nights at arcades, usually playing for minimum cost against players short on skill but long on quarters. We got quite good.

I was also pretty heavily addicted to those games. During my first two years at the University of Minnesota, I generally skipped class to play. There was such a rush from getting a higher combo, or beating a new player who tried to represent. I also made many friends in those arcades whose names I never knew and whom I have never seen since...but there was a bond among us gamers.

When I had the means, I began to collect arcade machines. Unfortunately, the means ran dry after only a few purchases, but I've been happy to have them. I own the following arcade machines, stowed in my basement and occasionally played:
- Ultimate Mortal Kombat 3 (in an old Atari Rampart cabinet)
- Killer Instinct 2 (in a KI1 cabinet; the sound ROMs are corrupt, so they need a refresh)
- Killer Instinct 1 (original cabinet; not functional at the moment)
- Mortal Kombat 2 (board only)
- Mortal Kombat 1 (board only)
- Teenage Mutant Ninja Turtles (in an old Taito cabinet with no volume control so it's freaking loud)
- Asteroids (yes, the original, in great working condition; however it's in a Lunar Lander cabinet, of which only a few thousand were ever made. Definitely the gem of the collection).
I also own the hollowed-out remnants of an old Gun Fight cabinet. I intended to restore it, but the side art and wood were in very poor shape. It's rotting in the garage.

I'd love to have a Ms PacMan, Q*Bert, or Tron machine. Unfortunately, so would the rest of the world.

I write and eat left-handed, though I prefer my right hand for almost everything else. Unfortunately, like most lefties, this means I can't use writing utensils that may smear or smudge. You lefties know what I'm talking about: the dreaded "pencil hand" you get from dragging your hand through what you've just written. In junior high I finally got tired of having to wash pencil lead off my hand every day, and for several years I utilized a novel solution:

I wrote backwards.

Friday, January 05, 2007

Ruby Breaks TIOBE Top Ten; Declared Language of the Year

The headline says it all, really!

The TIOBE Programming Community Index measures language popularity based on "the world-wide availability of skilled engineers, courses and third party vendors" using the major search engines. It's not a terribly scientific way to measure popularity, but I'm not sure anyone has a better index.

Ruby has been moving up every month during 2006 and for the first time has broken the top ten in January 2007. TIOBE also declared it the "Programming Language of 2006", which comes as no surprise to us Rubyists who love the language so much.

Congratulations, Ruby!

Thursday, January 04, 2007

New JRuby Compiler: Progress Updates

I've been cranking away on the new compiler. I'm a bit tired and planning to get some sleep, but I've gotten the following working:

all three kinds of calls
local variables
string, fixnum, array literals
'def' for simple methods and arg lists
closures

Now that last item comes with a big caveat: I have no way to pass closures I create. The code is compiling, basically as a Closure class that you initialize with local variables and invoke. But since block management in JRuby is still heavily dependent on ThreadContext nonsense, there's no easy way to pass it to a given method. So the next step to getting closures to work in the compiler is to start passing them on the call path as a parameter, much like we do for ThreadContext.

I've managed to keep the compiler fairly well isolated between node walking and bytecode generation, though the bytecode generator impl I have currently is getting a little large and cumbersome. It's commented to death, but it's pushing 900 LOC. It needs some heavy refactoring. However, it's behind a fairly straightforward interface, so the node-walking code doesn't ever see the ugliness. I believe it will be much easier to maintain, and it's certainly easier to follow.

In general, things are moving along well. I'm skipping edge cases for some nodes at the moment to get bulk code compiling. There's a potential that as this fills out more and handles compiling more code, it could start to be wired in as a JIT. Since it can fail gracefully if it can't compile an AST, we'd just drop back to interpreted mode in those cases.

So that's it.

...

Ok, ok, here's performance numbers. Twist my arm why don't you.

(best times only)

The new method dispatch benchmark tests 100M calls to a simple no-arg method that returns 'self', in this case Fixnum#to_i. The first part of the test is a control run that just does 100M local variable lookups.
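For readers who want to try something similar, here is a rough sketch of the shape of such a benchmark. It is not the script from test/bench/compiler: the loop structure, the method names and the reduced iteration count are all assumptions, and Integer#to_i stands in for Fixnum#to_i on current Ruby versions.

```ruby
# The post uses 100M iterations; a smaller assumed count keeps this quick.
ITERATIONS = 1_000_000

# Control run: local variable reads only.
def control_run(n)
  a = 5
  b = nil
  i = 0
  while i < n
    b = a
    i += 1
  end
  b
end

# Test run: the same loop plus a no-arg call that returns self.
def dispatch_run(n)
  a = 5
  b = nil
  i = 0
  while i < n
    b = a.to_i
    i += 1
  end
  b
end

# Wall-clock timing helper using the monotonic clock.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

puts "control:  #{timed { control_run(ITERATIONS) }}"
puts "dispatch: #{timed { dispatch_run(ITERATIONS) }}"
```

Note that the loop counter itself involves calls to < and +, so the control run is not free of dispatch; the difference between the two timings is the interesting number.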

method dispatch, control (var access only):
 interpreted, client VM: 1.433
 interpreted, server VM: 1.429
 ruby 1.8.5: 0.552
 compiled, client VM: 0.093
 compiled, server VM: 0.056

Much better. The compiler handles local var lookups using an array, rather than going through ThreadContext to get a DynamicScope object. Much faster, and HotSpot hits it pretty hard. At worst it takes about 0.223s, so it's faster than Ruby even before HotSpot gets ahold of it. The second part of the test just adds in the method calls.

method dispatch, test (with method calls):
 interpreted, client VM: 5.109
 interpreted, server VM: 3.876
 ruby 1.8.5: 1.294
 compiled, client VM: 3.167
 compiled, server VM: 1.932

Better than interpreted, but slow method lookup and dispatch is still getting in the way. Once we find a single fast way to dynamic dispatch I think this number will improve a lot.

So then, on to the good old fib tests.

recursive fib:
 interpreted, client VM: 6.902
 interpreted, server VM: 5.426
 ruby 1.8.5: 1.696
 compiled, client VM: 3.721
 compiled, server VM: 2.463

Looking a lot better, and showing more improvement over interpreted than the previous version of the compiler. It's not as fast as Ruby, but with the client VM it's a little over 2x slower and with the server VM it's in the 1.5x range. Our heavyweight Fixnum and method dispatch issues are to blame for the remaining performance trouble.

iterative fib:
 interpreted, client VM: 17.865
 interpreted, server VM: 13.284
 ruby 1.8.5: 17.317
 compiled, client VM: 17.549
 compiled, server VM: 12.215

Finally the compiler shows some improvement over the interpreted version for this benchmark! Of course this one's been faster than Ruby in server mode for quite a while, and it's more a test of Java's BigInteger support than anything else, but it's a fun one to try.
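For reference, the two fib variants look roughly like this. This is a sketch, not the benchmark files themselves; the method names and the inputs you would time them with are assumptions.

```ruby
# Recursive fib: dominated by method dispatch and small-integer math.
def fib_recursive(n)
  n < 2 ? n : fib_recursive(n - 1) + fib_recursive(n - 2)
end

# Iterative fib: for large n the values outgrow machine integers, so the
# time goes to arbitrary-precision arithmetic (BigInteger on the JVM).
def fib_iterative(n)
  a = 0
  b = 1
  n.times { a, b = b, a + b }
  a
end

puts fib_recursive(20)
puts fib_iterative(20)
```

Both calls print 6765. Timing fib_iterative with a very large n is what turns it into a bignum test rather than a dispatch test.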

All the benchmarks are available in test/bench/compiler, and you can just run them directly. If you like, you can open them up and see how to use the compiler yourself; it's pretty easy. I will be continuing to work on this after I get some sleep, but any feedback is welcome.

Wednesday, January 03, 2007

InvokeDynamic: Actually Useful?

Over time I've become less convinced that hotswappable classes would be an absolute requirement for the proposed invokedynamic bytecode to be useful, and more convinced that there's a number of ways a dynamic language like Ruby or Groovy could utilize the new bytecode. This post gives a little background on invokedynamic and attempts to summarize a few ideas off the top of my head.

Many folks, myself included, have long held that the proposed invokedynamic bytecode would only be useful if coupled with hotswappable classes. Hotswapping is the mechanism by which we could alter class structure after definition and have existing instances of the class pick up those changes. It's true this would be required if we were to compile Ruby all the way to bytecode; since Ruby classes are always open, we need the ability to add and remove methods without destroying already-created instances. The argument goes that if invokedynamic requires a dynamically-invoked method to exist on a target receiver's type, then we would only ever be able to invokedynamic against compiled Ruby code if we could continue to alter those types when classes get re-opened.

I do believe that hotswapping would be useful, but it's fraught with many really difficult problems. To begin with, there's Java's security model, whereby a class that's been loaded into the system *can not* be modified in most typical security contexts. The JVM does have the ability to replace existing method definitions at runtime, but that's generally reserved for debugging purposes, and it doesn't allow adding or removing methods. It also does not currently have the ability to wholesale remove and replace a class that has live instances, and it's an open research question to even consider the ramifications of allowing such a thing.

So what are the alternatives? Gilad Bracha proposed having the ability to attach methods dynamically to a given static class at runtime. This would perhaps be similar to the CLR's "dynamic methods". This idea perhaps has more merit...one issue not addressed by hotswappable classes is that even once we compile Ruby to bytecode, it's still dynamic and duck-typed. Would all methods accept Object and return Object? Is that useful? By specifically stating that some methods are dynamic and mutable (in the case of a Ruby class, likely all methods we've compiled), you effectively create the equivalent of hotswapping without breaking existing static types and their security semantics.

But this is all research that could and perhaps should occur outside invokedynamic, and it all may or may not be related. So then, can invokedynamic be useful with these class-structure questions unanswered? What does invokedynamic mean?

To me, invokedynamic means the ability to invoke a method without statically binding to a specific type, and perhaps additionally without specifying static types for the parameter list. For those that don't know, when generating method-call bytecodes for the JVM, you must always provide two things in addition to the method name: the class within which the method you're invoking lives and the precise parameter list of the method you want to call. And there's not much wiggle room there; if you're off on the target type or if the receiver you're calling against has not yet been cast to (or been determined to match) that type, kaboom. If your parameter list doesn't match one on the target type, kaboom. If your parameters haven't been confirmed as being compatible with that signature, kaboom. Perhaps you can see, then, why writing a compiler for the JVM is such a complicated affair.

So there's potential for invokedynamic to make even static compilation easier. Without the need to specify all those types, we can defer that compile-time magic to the VM, if we so choose. We don't have to dig around for the exact signature we want or the exact target type. Given a receiver object, a method name, and a bundle of parameter objects, invokedynamic should "do the right thing."
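As a Ruby-level analogy only (this says nothing about how the bytecode would actually be specified), the call shape described here is what Ruby's own public_send provides: the call site names a receiver, a method name and some arguments, and the receiver decides how to respond. DynamicReceiver and dynamic_invoke are hypothetical names for illustration.

```ruby
# A receiver whose methods live in a table that can be rewired at runtime.
class DynamicReceiver
  def initialize
    @table = {} # method name => callable body
  end

  def define(name, &body)
    @table[name] = body
  end

  # Ruby's interception hook: called when normal lookup finds nothing.
  def method_missing(name, *args)
    body = @table[name]
    return super unless body
    body.call(*args)
  end

  def respond_to_missing?(name, include_private = false)
    @table.key?(name) || super
  end
end

# The call site knows only the receiver, the name and the arguments.
def dynamic_invoke(receiver, name, *args)
  receiver.public_send(name, *args)
end

obj = DynamicReceiver.new
obj.define(:greet) { |who| "hello, #{who}" }
puts dynamic_invoke(obj, :greet, "JRuby")  # prints "hello, JRuby"
obj.define(:greet) { |who| "goodbye, #{who}" }
puts dynamic_invoke(obj, :greet, "JRuby")  # prints "goodbye, JRuby"
puts dynamic_invoke("abc", :upcase)        # prints "ABC"
```

The same call site works against an ordinary String and against an object whose methods were swapped after creation, without ever naming a target type.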

Now we start to see where this could be useful. Any dynamic language on the JVM is going to be most interesting in the context of the platform's available libraries. Ruby is great on its own, and there's certainly an entire (potentially large) market segment that's interested in JRuby purely as an alternative Ruby runtime. But the larger market, and the more intriguing application of JRuby, is as a language to tie the thousands of available Java libraries together. And that requires calling Java code from Ruby and Ruby code from Java with as little complexity and overhead as possible.

Enter invokedynamic.

Now I've only recently started to see how invokedynamic could really be useful even without dynamic methods or hotswappable classes, so this list is bound to grow. I'd love to have all three features, of course, but here are a few areas where invokedynamic alone would be useful:

Our native implementations of Ruby methods can't really be tied to a specific concrete class, since we have to be able to rewire them at runtime if they're redefined. If invokedynamic came along with a mechanism for doing a Java-based "method_missing", whereby we could intercept dynamic calls to a given object and dispatch in our own way, we could make use of the bytecode without having hot-swappable classes.
It would also aid compilation and code generation. In my work on the prototype compiler, one of the biggest stumbling blocks is making sure I'm binding method calls to the appropriate target type. I must make sure the receiver of a method has been cast to the type I intend to bind to or Java complains about it. If there were a way to just say invokedynamic, omitting the target type, it would make compilation far simpler; and I don't believe HotSpot would have to do any additional work to make it fast, since it already has optimizations under the covers that are fairly type-agnostic.
To a lesser extent, invokedynamic could push the smarts of determining appropriate method signatures onto the VM. I would supply a series of parameters and a method name, and tell the VM to invokedynamic. The VM, in turn, would look at the params and name and select an appropriate method from the receiving object. This is in essence all that's needed for real duck typing to work.

This last item calls out a perhaps surprising area where invokedynamic would be very useful: invoking Java code from a dynamic language.

When calling Java code from Ruby, for example, all we really have to work with are two details: a method name and potentially an arity. We can do some inference based on the actual types of parameters, but there's a lot of magic and a number of heuristics involved. If there were a JVM-native mechanism for calling arbitrary methods on a given object, without having to statically bind to those methods, it would eliminate much of our Java integration layer.
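To give a feel for the kind of heuristic involved, here is a toy sketch of picking an overload from a method name, an arity and the runtime types of the arguments. It is not JRuby's Java integration code; Signature, CANDIDATES and select_signature are hypothetical, and Ruby classes stand in for Java parameter types.

```ruby
# Hypothetical description of one overload: a name plus parameter types.
Signature = Struct.new(:name, :param_types)

CANDIDATES = [
  Signature.new(:append, [String]),
  Signature.new(:append, [Integer]),
  Signature.new(:append, [String, Integer]),
  Signature.new(:length, [])
].freeze

# Narrow by name and arity first, then take the first candidate whose
# parameter types accept the actual arguments. Returns nil if none fit.
def select_signature(candidates, name, args)
  by_arity = candidates.select do |sig|
    sig.name == name && sig.param_types.size == args.size
  end
  by_arity.find do |sig|
    sig.param_types.zip(args).all? { |type, arg| arg.is_a?(type) }
  end
end

p select_signature(CANDIDATES, :append, ["x"]).param_types
p select_signature(CANDIDATES, :append, [3]).param_types
p select_signature(CANDIDATES, :append, [1.5])
```

A real integration layer also has to rank several compatible overloads and handle conversions between Ruby and Java values, which is where most of the "magic" mentioned above comes from.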

All told, I think invokedynamic would definitely be much more than a PR stunt, as some have claimed. It would eliminate one of the most difficult barriers to generating JVM bytecodes by allowing arbitrary method calls that aren't necessarily bound to specific types. I for one would vote yes, and I plan to throw my weight behind making invokedynamic do everything I need it to do...with or without hotswapping.

Tuesday, January 02, 2007

Groovy 1.0 is Released!

Congratulations to the Groovy team on their release of Groovy 1.0! Groovy is another dynamic language for the JVM inspired by features in Smalltalk, Python, and of course Ruby. It's been a long time coming, and a lot of hard work involved, but Groovy 1.0 is finally here.

See the announcement from Guillaume Laforge, one of the Groovy team members.

Here's hoping there's a bright future of cooperation between the Groovy team and the other dynamic languages for the JVM.

Monday, January 01, 2007

Welcome Nick Sieger to the JRuby Team

The team has grown again! After I asked the JRuby community to nominate a new team member, based on past code, mailing list, documentation, or other contributions, a number of folks thought Nick Sieger would be a good addition. And we agreed.

Nick is the original author of the ActiveRecord-JDBC connector, and has done a lot of work wiring JRuby up with NanoContainer. He's been an active member of the mailing lists and you've probably all read his blog at some point...if only for his excellent summary posts from RubyConf 2006. Even better, Nick hails from the Minneapolis area like Tom and me, and we attend the same Ruby user group meetings with the Ruby Users of Minnesota.

We also expect Nick will bring his familiarity with Maven 2 and his professional experience leading both Java and Ruby-based projects. He's a good developer and a good leader to add to the team.

Hopefully this will also serve as a reminder that JRuby is a true Open Source project, and anyone with Ruby and/or Java experience can easily start helping out. The team and the community continue to grow, as does Ruby's potential on the JVM.

Welcome to the team, Nick!

Wednesday, December 27, 2006

Making Dynamic Invocation Fast: An Idea?

Evan Phoenix (of Rubinius fame) and I were discussing dynamic dispatch today on #rubinius, sharing our caching strategies and our dispatch woes. We talked at length about various strategies for speeding dispatch, cache invalidation mechanisms, compilation techniques, and so on. All gloriously fun stuff.

So at one point I related to him the extended version of my plans for speeding up dynamic dispatch. I relate it now to you to hopefully spur some discussion. There are no original ideas anymore, right? Or are there?

The initial experiment with Fixnum basically was the static version of my fuzzy vision. During parse time, a very trivial static table mapped method names to numbers. I only handled three method names, +, -, and <. Then those numbers were passed to Fixnum during method dispatch, where a very trivial static switch statement used them to "fast dispatch" to the appropriate methods.

The ultimate vision, however, is much grander.

The general idea is that during dynamic dispatch, we want to use the method name and the class as keys to locate a specific method instance. The typical way this is done is by keeping a hashtable on every class, and looking up methods by name. This works reasonably well, but each class must have its own table and we must pay some hashtable cost for every search. This is compounded when we consider class hierarchies, where the method we're looking for might be in a super class or a super class's super class, ad infinitum.

Retrieving the class is the easy part; every method we want to dispatch will have a receiver, and receivers have a reference to their class.

So then, how do we perform this magic mapping from M method names and N classes to a given method? Well hell, that's a matrix!

So then the great idea: build a giant matrix mapping method names (method IDs) and classes (class IDs) to the appropriate switchable value. Then when we dispatch to the class, we use that switchable value in a neatly dense switch statement.

Let me repeat that in another way to make it clear...the values in the matrix are simple "int" values, ideally as closely-numbered (dense) as possible for a given class's set of methods. In order to dispatch to a named method on a given object, we have the following steps:

Get the method ID from the method (likely calculated at parse or compile time)
Get the class ID from the class (calculated sequentially at runtime)
Index into the matrix using method ID and class ID to get the class-specific method switch value
Use the switch value to dispatch (via a switch statement, of course) to the appropriate method in the target class
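The four steps above can be sketched in Ruby as follows. This is an illustration under assumptions, not an implementation from JRuby or Rubinius: DispatchMatrix and its methods are hypothetical, a per-class array of callables stands in for the switch statement, IDs are handed out at runtime for simplicity, and superclass methods are not copied into child columns.

```ruby
class DispatchMatrix
  def initialize
    @method_ids = {} # method name => row index
    @class_ids  = {} # class => column index
    @matrix     = [] # @matrix[method_id][class_id] => switch value
    # class_id => bodies indexed by switch value; slot 0 is reserved
    @switches = Hash.new { |hash, key| hash[key] = [nil] }
  end

  def method_id(name)
    @method_ids[name] ||= @method_ids.size
  end

  def class_id(klass)
    @class_ids[klass] ||= @class_ids.size
  end

  # Add a method body for a class and record its switch value in the matrix.
  def register(klass, name, &body)
    bodies = @switches[class_id(klass)]
    bodies << body
    row = (@matrix[method_id(name)] ||= [])
    row[class_id(klass)] = bodies.size - 1
  end

  def dispatch(receiver, name, *args)
    mid = method_id(name)                      # step 1: method ID
    cid = class_id(receiver.class)             # step 2: class ID
    switch_value = (@matrix[mid] || [])[cid] || 0 # step 3: matrix lookup
    return :method_missing if switch_value.zero?
    @switches[cid][switch_value].call(receiver, *args) # step 4: "switch"
  end
end

matrix = DispatchMatrix.new
matrix.register(Integer, :double) { |recv| recv * 2 }
matrix.register(String, :double) { |recv| recv + recv }
p matrix.dispatch(21, :double)   # prints 42
p matrix.dispatch("ab", :double) # prints "abab"
p matrix.dispatch(1.5, :double)  # prints :method_missing
```

An empty cell reads as zero, which this sketch reports as :method_missing rather than actually invoking a handler.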

Ignoring the size of this matrix for a moment, there are a number of benefits to this data structure:

We have no need for per-class or inline method caches
Repopulating the switch table for a given class's methods is just a matter of refilling its column in the matrix with appropriate values
Since the matrix represents all classes, we can avoid searching hierarchies once parents' methods are known; we just represent those methods in the matrix under the child's column and fast-dispatch to them as normal
We can save off and reconstitute the mapping to avoid having to regenerate it

...and there's also the obvious benefit: using the switch values stored in the matrix, we can do a fast dispatch on the target object with only the minimal cost of looking up a value in the matrix. An additional benefit for C code would be simply storing the address of the appropriate code in the matrix, avoiding any switch values completely.

Now naturally we're talking about a big matrix here, I'll admit that. A fellow by the name of "Defiler" piped up and said an incomplete app of his showed 1147 unique method names across 1258 classes. That works out to a matrix with 1.4M elements. In the Java world, where everything is 32 bits, that's around 6MB of memory consumed just for method mapping. Acceptable? Perhaps. But what about using a sparse matrix?

A sparse matrix attempts to be as efficient as possible when there are large numbers of "zeros" in the matrix elements. That is, interestingly enough, exactly what we have here. The vast majority of entries in our method-map would contain a zero, since most methods would not exist on most classes. Zero, as it turns out, would neatly map to our good friend "method_missing", and so dispatching to a method on a target class that doesn't implement it would continue to work as it does today. Even better, method_missing dispatch would be just as fast as a regular method dispatch, excluding the actual method_missing implementation code, of course.
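The size estimate and the sparse alternative can be checked with a few lines of Ruby. This is a sketch only: SparseDispatchTable and dense_matrix_bytes are hypothetical names, and a Hash keyed by [method_id, class_id] is just one simple way to get a table whose empty cells read as zero.

```ruby
# Bytes needed for a dense matrix with one fixed-size entry per cell.
def dense_matrix_bytes(method_names, classes, bytes_per_entry = 4)
  method_names * classes * bytes_per_entry
end

# Only non-zero cells are stored; anything else reads back as 0, the
# value reserved above for method_missing.
class SparseDispatchTable
  def initialize
    @entries = Hash.new(0)
  end

  def []=(method_id, class_id, switch_value)
    key = [method_id, class_id]
    if switch_value.zero?
      @entries.delete(key)
    else
      @entries[key] = switch_value
    end
  end

  def [](method_id, class_id)
    @entries[[method_id, class_id]]
  end

  def stored_entries
    @entries.size
  end
end

puts dense_matrix_bytes(1147, 1258) # prints 5771704

table = SparseDispatchTable.new
table[3, 7] = 2
p table[3, 7]           # prints 2
p table[3, 8]           # prints 0
p table.stored_entries  # prints 1
```

1147 method names times 1258 classes is 1,442,926 cells, and at 4 bytes each that is 5,771,704 bytes, which matches the "around 6MB" figure. A hash lookup is slower than an array index, though, so a real sparse layout would need to keep the lookup close to constant cost for the idea to pay off.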

So then, there it is...a potential plan for fast dynamic method invocation in JRuby (and perhaps in Rubinius, if Evan agrees the idea has merit). Any of you out there heard of such a thing? Is this an old idea? Are there problems with it? Share!

Holiday Fun: Interpretation, Loading, Dispatching Work

This week (what's left of it) I'm spending on performance again. It's officially a holiday for Sun, so I'm not technically on the clock. I could even sleep the entire week and pick things back up on Tuesday.

Yeah, right.

One of the oddities of working on JRuby while working at Sun stems from the fact that I was working on JRuby for fun for the past two years. Now that this is the full-time job, it's even harder to pull myself away from it. I thought perhaps making JRuby my job might take away some of the attraction. In reality, it's just made a good thing better, since I can spend long hours on the hard problems I would never have tackled before.

So instead of stopping work completely, I'm shifting gears a bit. Instead of the heavy Rails focus we've had over the past month, I'm hitting other "fun" stuff instead. Compilation, interpretation performance, and so on. We know there's still lots of fruit to be plucked from the performance tree, and of course any additional work/research on compilation will help that eventual goal.

Compiler Refactoring Underway

As far as compilation goes, I've started committing a refactored compiler; it separates AST-node-walking from code generation, so the backend could be swapped out with a YARV generator or future compiler revisions. I think that's needed to allow the compiler and the AST to evolve independently, since I believe there will be many possible compile targets and potentially many AST changes in the future. Nothing too fancy there, but it's a bit more readable so hopefully others can also contribute.

Interpreter Enhancements and Fixes

On the interpretation front, there are a few items I've been working on.

1. Speeding up method invocation and block management (committed)

Ruby's AST represents method calls that take a block by putting the block first in the AST. You encounter an "iter node" that points at the "call" with which it is associated. So the typical way you evaluate those nodes is to create the block object and then visit the call. Unfortunately, the call itself may lead to other calls with their own blocks, for evaluating the arguments or receiver.

The end result of this node ordering is that every time a method is called, the evaluation of its args and receiver has to juggle the current "block available" status around, so the block doesn't get consumed before it's needed. Because the block gets created early, we have to push it and the "block available" status onto a stack. This is the primary reason the block logic in the interpreter and ThreadContext is so complicated.

An example would help illustrate what's happening. For the following code:

foo("hello") { 1 }

The parser generates the following hierarchy of AST nodes:

IterNode[]        <= this is the block
 NewlineNode[]
  FixnumNode[] 1  <= this is the Fixnum 1 inside the block
 FCallNode[] foo  <= this is the call to foo
  ArrayNode[]: {StrNode[]}
   StrNode[]"hello"

The node associated with the block is encountered first, so we construct the block then. We move on to the "foo" call, and the block is consumed. No problem, right? However, here's a more complicated example that illustrates the trouble with this AST ordering:

foo { 1 }.bar { 2 }.baz(hello { 3 }) { 4 }

Here things are a bit more interesting. The parser produces the following output:

IterNode[]            <= the "4" block
 NewlineNode[]
  FixnumNode[] 4
 CallNode[] baz
  IterNode[]          <= the "2" block
   NewlineNode[]
    FixnumNode[] 2
  CallNode[] bar
   IterNode[]         <= the "1" block
    NewlineNode[]
     FixnumNode[] 1
   FCallNode[] foo
 ArrayNode[]: {IterNode[]}
  IterNode[]          <= the "3" block
   NewlineNode[]
    FixnumNode[] 3
  FCallNode[] hello

So what actually happens here? First the "4" block is encountered, instantiated, and pushed onto the block stack. Then we proceed to the "baz" call. Unfortunately, the "baz" call has both a receiver and arguments, so we have to hide the "4" block to prevent it being consumed. We proceed to evaluate the receiver for "baz", encountering the "2" block. The "2" block is instantiated and pushed down, and we move on to the "bar" call. The "bar" call has another receiver; we evaluate that, encountering the "1" block and the "foo" call it's associated with. The "foo" call consumes the "1" block and returns a receiver for "bar". The "bar" call consumes the "2" block and returns a receiver for "baz". Now the "baz" call has to evaluate its arguments, so the "3" block is created and consumed by the "hello" call. Finally, with a receiver and args, we can call "baz" and consume the "4" block.
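The order in which the calls consume their blocks can be confirmed with plain Ruby. The definitions below are hypothetical stand-ins (the post never defines foo, bar, baz or hello); each one just records its name and the value its block returns.

```ruby
CALL_LOG = []

# Receiver object returned by foo, so that bar and baz can be chained.
class Chain
  def bar
    CALL_LOG << [:bar, yield]
    self
  end

  def baz(_arg)
    CALL_LOG << [:baz, yield]
    self
  end
end

def foo
  CALL_LOG << [:foo, yield]
  Chain.new
end

def hello
  CALL_LOG << [:hello, yield]
  :arg
end

foo { 1 }.bar { 2 }.baz(hello { 3 }) { 4 }

p CALL_LOG # prints [[:foo, 1], [:bar, 2], [:hello, 3], [:baz, 4]]
```

The "4" block appears first in the AST but is the last one consumed, which is exactly why it has to be parked while the receiver and arguments are evaluated.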

Confused? Me too. Why would you order it this way? Perhaps it's to ease parsing, or perhaps there's some reason I don't know. However, I believe the following ordering is much simpler (and I know it makes interpretation easier):

(extraneous nodes omitted)

CallNode[] baz
 CallNode[] bar       <= the receiver for "baz", a call to "bar"
  FCallNode[] foo     <= the receiver for "bar", a call to "foo"
   IterNode[]         <= the "1" block associated with "foo"
  IterNode[]          <= the "2" block associated with "bar"
 ArrayNode[]          <= args to "baz"
  FCallNode[] hello   <= the call to "hello"
   IterNode[]         <= ...and its "3" block
 IterNode[]           <= finally the "4" block

The advantages here should be obvious. We encounter the blocks in order, so there's no stack juggling involved. Because there's no stack juggling, we don't have to "hide" blocks as we evaluate receivers and arguments. Finally, because we know we'll only encounter blocks for methods that require them, there's no additional overhead for methods that don't need blocks. It's a good change, and I would love to understand why Ruby uses the more complicated AST structure instead of this.

I have already committed a change to reorder the way these AST nodes are handled. The AST itself is unchanged, but the visit to a given block (IterNode) just sets that node into the associated call and proceeds. The calls themselves are now responsible for creating an associated block (if necessary)...*after* receiver and args have been dealt with. This means two things: calls that don't accept blocks don't pay any block-manipulation penalty; and calls with peripheral blocks (for finding args or receivers) don't pay any block-manipulation penalty either. Only the calls that need blocks have to deal with them.
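Here is a toy evaluator showing the shape of that change. It is a heavily simplified sketch, not JRuby's interpreter: CallNode, IterNode, LitNode and Evaluator are hypothetical, the iter is assumed to already be attached to its call, and "calling" a method just records its name and block value.

```ruby
# Hypothetical, simplified node types; JRuby's real nodes differ.
CallNode = Struct.new(:name, :receiver, :args, :iter)
IterNode = Struct.new(:value)
LitNode  = Struct.new(:value)

class Evaluator
  attr_reader :trace

  def initialize
    @trace = []
  end

  def evaluate(node)
    case node
    when LitNode
      node.value
    when CallNode
      # Receiver and arguments are evaluated first (results are unused
      # in this toy, only the ordering matters)...
      evaluate(node.receiver) if node.receiver
      node.args.each { |arg| evaluate(arg) }
      # ...and only then is the block created, and only if the call has one.
      block = node.iter && make_block(node.iter)
      @trace << [node.name, block ? block.call : nil]
      node.name
    end
  end

  private

  def make_block(iter)
    -> { iter.value }
  end
end

# foo { 1 }.bar { 2 }.baz(hello { 3 }) { 4 }
tree = CallNode.new(
  :baz,
  CallNode.new(:bar, CallNode.new(:foo, nil, [], IterNode.new(1)), [], IterNode.new(2)),
  [CallNode.new(:hello, nil, [], IterNode.new(3))],
  IterNode.new(4)
)

evaluator = Evaluator.new
evaluator.evaluate(tree)
p evaluator.trace
```

No block stack appears anywhere: each block exists only for the duration of the call that owns it, and a call without an iter never touches block machinery at all.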

Eventually this will become an AST change, but this short-term fix resolves 90% of the interpreter goofiness right now. This change will also eventually mean the iter stack (and potentially the block stack) disappear too. Huzzah!

2. LoadService fixes, enhancements, and optimizations (committed)

LoadService cleanup and improvements are well under way. LoadService is responsible for "load" and "require" calls and does all the searching for files and management of loaded extensions and libraries. Unfortunately the existing heuristic was both broken and terribly inefficient.

Ruby's 'load' behavior is easy enough...just look for the exact file and execute it in the current runtime. Ruby's 'require' however has a bit more magic to it.

If you specify a full filename to 'require' it will use that filename to load either a source file or an extension, depending on whether you specify ".rb" or ".[so|o|dll|etc..]". If you do not specify an extension, it will search for .rb, .so, etc in turn until it finds something. If it finds nothing, that's a load error. Here's the "ri" doc for MRI's "require":

--------------------------------------------------------- Kernel#require
  require(string)    => true or false
------------------------------------------------------------------------
  Ruby tries to load the library named _string_, returning +true+ if
  successful. If the filename does not resolve to an absolute path,
  it will be searched for in the directories listed in +$:+. If the
  file has the extension ``.rb'', it is loaded as a source file; if
  the extension is ``.so'', ``.o'', or ``.dll'', or whatever the
  default shared library extension is on the current platform, Ruby
  loads the shared library as a Ruby extension. Otherwise, Ruby tries
  adding ``.rb'', ``.so'', and so on to the name. The name of the
  loaded feature is added to the array in +$"+. A feature will not be
  loaded if it's name already appears in +$"+. However, the file name
  is not converted to an absolute path, so that ``+require
  'a';require './a'+'' will load +a.rb+ twice.

There's also the issue of what paths to search. Ruby searches the current directory first, followed by custom load paths, site_ruby dirs, and ruby/1.8 dirs. This allows a number of mechanisms for overriding more general locations with more specific ones for particular uses.

JRuby adds a new wrinkle here: classloader resources. JRuby supports JARing up source files and loading them through Java's classloader mechanisms. This is how the JRuby applet and the "complete" JRuby JAR work: they simply include all Ruby source into the archive. So then for us the classloader/classpath represents an additional path to be searched.

The primary problem with JRuby's load heuristic is that it ended up searching the classloader far too frequently; and in many cases it searched it multiple times for files that could not exist, such as for complete absolute paths (starting with '/', which *never* works for classloader resources). A second problem with the heuristic is that it would try all locations for all extensions, so for example it would search for xxx.rb everywhere possible, then xxx.rb.ast.ser (our serialized AST format) everywhere possible, then xxx.so (used internally for extensions...though that may change), and so on. The result of this is that any extensions or serialized scripts went through the full monty of searches before being found on a second or third pass.

These two issues were compounded when running JRuby with a very large classpath. Because classloader resources can be expensive to search, our loading became linearly slower in relation to the number of JARs (or perhaps the size of jars) included into the JVM. More JARs, slower classloader resource searching, slower startup.

The fix for these issues is twofold: search a given load location for all filename extensions before moving on, and only search the classloader as a last resort for filenames likely to be found there. It's fairly simple to explain, but the LoadService code had been endlessly hacked and rejiggered over the years. The cleanup was 90% of the battle; the fixes were considerably easier.
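The difference between the two search orders is easy to see in a small sketch. This is not LoadService code: the method names, the extension list and the injectable existence check are assumptions for illustration, and the classloader step is left out.

```ruby
# Extensions mentioned above, in the order they were tried.
LOAD_EXTENSIONS = [".rb", ".rb.ast.ser", ".so"].freeze

# Old order: one extension at a time across every location.
def old_search_order(name, locations)
  LOAD_EXTENSIONS.flat_map do |ext|
    locations.map { |loc| File.join(loc, name + ext) }
  end
end

# New order: every extension at one location before moving on.
def new_search_order(name, locations)
  locations.flat_map do |loc|
    LOAD_EXTENSIONS.map { |ext| File.join(loc, name + ext) }
  end
end

# Returns the first candidate path that exists, or nil. The existence
# check is a parameter so the sketch can run without touching the disk.
def find_feature(name, locations, exists = File.method(:file?))
  new_search_order(name, locations).find { |path| exists.call(path) }
end

locations = ["lib", "site_ruby"]
p old_search_order("foo", locations).index("lib/foo.so") # prints 4
p new_search_order("foo", locations).index("lib/foo.so") # prints 2
```

With only two locations the extension in the first directory moves from the fifth probe to the third; with a long load path and an expensive classloader search at the end, the old order's wasted probes add up quickly.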

A final issue with LoadService, which is now fixed, was that it allowed require to include files with no extensions at all. This is not correct Ruby behavior, and so now those files can't be loaded. This actually caused a bug a long time ago where the extension-free "rake" startup script was being loaded before the "rake.rb" library file it tried to locate. The result was an eventual stack overflow as the file tried to continually load itself. Goofy behavior we should never see again.

3. Speeding up dynamic dispatch

I'm also doing some experimental work to speed up dynamic dispatch. Currently, methods are located by name, looking up a callable object out of a big hash on a per-class basis. This works reasonably well, and there are caches to speed the process, but with all the recent performance work this search has started to become the new bottleneck.

The eventual callable objects looked up have another flaw: they make Hotspot optimization more difficult. Because they're behind an ICallable interface, because they have multiple levels of logic as well as pre/post-call setup and teardown, and because there are many different implementations of ICallable, they end up slowing execution down significantly.

Hotspot is really an amazing piece of work. For the vast majority of Java code, it's able to unroll loops, inline invocations, dynamically optimize conditionals and switches, and generally improve the speed of code by a drastic amount. When running in "server" mode, where Hotspot lets code run longer in interpreted mode before optimizing it, even greater improvements can be seen. For example, here are the numbers for a fib benchmark--recursive and iterative--under Java 6 client and server VMs, with MRI for comparison:

(best times shown)

client recursive:
8.551000

client iterative:
17.995000

server recursive:
5.008000

server iterative:
13.191000

MRI recursive:
1.670000

MRI iterative:
16.964403

These numbers aren't bad, really. We even beat MRI for this trivial benchmark when we start hitting Bignums heavily (Java's BigInteger implementation is quite a bit faster than Ruby's). However, we can certainly do better.

There are two experimental changes I've been working on. The first eliminates the pre/post method setup for core libraries when that setup is not necessary. I call it "fast invocation", and it's applicable to a large majority of core class methods. For example, with just fast invocation, the recursive fib numbers above drop to the sub-5s range. This change is perfectly safe, since it's just eliminating interpreter overhead that would otherwise be wasted cycles. You can expect to see it included in JRuby soon.

The second change is the holy grail of dynamic invocation: eliminate, to the greatest extent possible, the overhead of looking up and dispatching to a given method. In short, make it as close to a simple static dispatch as possible. This is where the real speed gains in JRuby will start to show up.

I have some experimental code right now, focused on the fib benchmark, that is both safe and drastically improves performance. It's sub-par code at the moment, but it does produce results like this:

recursive before:
5.008000

recursive after:
3.864000

Now of course, this is still interpreted. The same change when applied to my experimental Ruby compiler produces a much more drastic effect:

compiled recursive after:
1.550100

Now we start to see the value of eliminating dynamic-dispatch overhead. This is actually *faster* than Ruby's recursive fib, a feat that hasn't been accomplished by JRuby at any time in the past.

The trick to this is fairly simple. For common core methods which are known to be simple Java code, such as for Fixnum's +, -, and < implementations, I provide integer IDs. Within the Fixnum implementation there's a new implementation of "callMethod", our dynamic-dispatcher, which switches on these IDs. For methods it knows, such as the aforementioned +, -, and <, it dispatches directly to op_plus, op_minus, or op_lt, the Java implementations. This skips the lookup phase, the ICallable implementation, the ThreadContext manipulation, and the pre/post-method setup code completely. It's also perfectly safe, again, because all those pieces only waste cycles for simple methods like this.

Now one problem with a simple approach like this is that if you redefine Fixnum#+, that change won't be picked up. The simple Fixnum callMethod won't ever try a method search for methods it knows can be fast-dispatched. I resolved this in my experimental code by adding a "clean" flag to the Fixnum class. If at any point after its initial definition the Fixnum class becomes "dirty", e.g. if you add or redefine a method, the old, slow dispatch will come back into play. My simple version is too coarse-grained, killing fast dispatching for all methods if any of them are changed, but the principle is sound. I'm going to be exploring this the rest of the week, trying to find a more complete solution...but some good things are around the corner.
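A Ruby sketch of the switch-plus-clean-flag idea follows. It is not the experimental Java code: SketchFixnum is a hypothetical stand-in for JRuby's Fixnum, the integer IDs are made up, and Ruby's method_added hook plays the role of whatever marks the class dirty. Only the names op_plus, op_minus and op_lt and the overall behavior come from the description above.

```ruby
class SketchFixnum
  METHOD_IDS = { :+ => 1, :- => 2, :< => 3 }.freeze # assumed IDs

  attr_reader :value

  def initialize(value)
    @value = value
  end

  def op_plus(other)
    SketchFixnum.new(@value + other.value)
  end

  def op_minus(other)
    SketchFixnum.new(@value - other.value)
  end

  def op_lt(other)
    @value < other.value
  end

  # Slow-path entry points, found by ordinary name lookup.
  def +(other)
    op_plus(other)
  end

  def -(other)
    op_minus(other)
  end

  def <(other)
    op_lt(other)
  end

  # Dispatcher: switch on the ID while the class is clean, otherwise
  # (or for unknown names) fall back to name-based lookup.
  def call_method(name, arg)
    if SketchFixnum.clean?
      case METHOD_IDS[name]
      when 1 then return op_plus(arg)
      when 2 then return op_minus(arg)
      when 3 then return op_lt(arg)
      end
    end
    public_send(name, arg)
  end

  @clean = true

  def self.clean?
    @clean
  end

  # Installed last so the definitions above do not trip it. Any later
  # instance-method definition marks the whole class dirty.
  def self.method_added(_name)
    @clean = false
  end
end

one = SketchFixnum.new(1)
two = SketchFixnum.new(2)
puts one.call_method(:+, two).value # fast path, prints 3

class SketchFixnum
  def +(_other)
    SketchFixnum.new(42)
  end
end

puts one.call_method(:+, two).value # slow path sees the redefinition, prints 42
```

As in the post, the flag is coarse: redefining + also sends - and < back through the slow path, even though their fast versions are still valid.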

--

All told, it's been a productive few days since JavaPolis. I'm hoping to write up a JavaPolis recap soon, but I'm keen to use this holiday time to get some cool stuff done. You'll hear more after the first of the year...hopefully with committed code and additional benchmarks against JRuby trunk :)