Camping
After a minor fix provided by Ola Bini (thanks Ola!) we now have Camping running under JRuby, using the ActiveRecord JDBC adapter. According to Ola, Camping under JRuby seems to run very well, and feels very snappy. I'm going to be playing with it a bit soon, and may have a demo site up by tomorrow.
JRuby Extras
The JRuby Extras project is officially launched on RubyForge. The ActiveRecord JDBC adapter is there and has received modifications to work with Oracle as well. The work thus far on Mongrel is also there, and it appears that we may have Mongrel working under JRuby shortly. If you have a particular Ruby app that needs some JRuby-specific modifications or extensions, please let me know; this is a community-driven project to make Ruby apps spectacular under JRuby.
Monday, August 07, 2006
Sunday Night Niblets
Thursday, August 03, 2006
Calling all Tor Norbyes
Ok, perhaps the blogosphere can help with this one. I've been trying to respond to an email from Tor Norbye at Sun Microsystems since Sunday. Unfortunately, my emails seem to be shuffled off into the ether. He sent another email to me today, saying he hasn't heard back from me.
Tor! I'm right here! Give me an alternate way to contact you...your Sun address seems to be kaput.
Wednesday, August 02, 2006
JRuby: It's pretty much my favorite animal.
Some of the local Rubyists and I were talking about publishers for a future JRuby book...ligers were brought up...and, well, here's the result.
It's just perfect...the lion of Java mated with the tiger of Ruby...and the magical JRuby is their offspring. It brings a tear to your eye, doesn't it?
Feel free to digg it.
Update: A couple folks have noted that O'Reilly's Java animal is already a tiger, so I guess I've got the roles reversed!
Busy Bees
I haven't posted anything substantive in a while, but things are moving rapidly forward. Here's a quick summary of what's been going on with JRuby:
RubySpec
I have launched an effort to build up a Ruby specification, Wiki-style. At The RubySpec Wiki, contributors can write short pages/articles on any aspect of Ruby: the language, the libraries, or the implementations. The eventual goal of this is to create a comprehensive library of content describing in detail how every aspect of Ruby is *supposed* to work. This in turn will help alternative implementations like JRuby, Ruby.NET, Cardinal, and others ensure they are functioning correctly.
CONTRIBUTORS ARE NEEDED! Please create an account and add whatever you can. Found out about a new feature, quirk, or bug in Ruby? Add it! Feel like porting over some core docs in a more spec-like format (i.e. including edge cases and formal semantics)? Go for it! The Spec will only succeed with user contributions. I may sponsor contests to see who can contribute the most...so keep an eye out!
The New RubyTests
The RubyTests project on RubyForge mainly houses the Rubicon suite, a collection of tests originally created for the first PickAxe book and based on Ruby 1.6. Over the past several years, it's been slowly, slowly updated for 1.8, but the library is showing its age. To complicate matters, other test libraries have sprung up to remedy some of Rubicon's deficiencies: BFTS, from the MetaRuby guys, and now a RubyTests project from the Ruby.NET team out of QUT. In addition, contributors to the RubySpec have called for a place to keep tests that go along with the specification. Something had to be done.
This past week, I sent out a proposal to all the RubyTests project members and the MetaRuby guys about finally unifying all our efforts under one grand test suite. The response so far has been excellent...Ryan Davis of MetaRuby told me he agrees with my plan, and others on the RubyTests project also agree this is the way to go. The wheels are in motion!
I will act as steward for the new RubyTests project, but only to foster community collaboration. We'll initially consider pulling all the myriad projects under the RubyTests umbrella, and then start discussing issues like what testing framework to use, how or whether to generate tests, and how to provide traceability back to items in the nascent RubySpec. I encourage anyone interested in seeing Ruby improve and flourish on all platforms to join the project and contribute.
Block Refactoring Work
It recently became apparent that the current block-management code in JRuby (modeled almost exactly on C Ruby) is rather inefficient; doubly so in JRuby because we don't have C tricks like unions and longjmps. Tom also discovered after some research that much of the block-scoping semantics can be pulled out during the parsing process and stored, saving many searches later on. To these ends, we have both been working on refactoring JRuby internals to improve how blocks function.
I have been working to modify the call chain to pass blocks along as part of the frame. This simplifies a great many things, since the correct block to which to yield is now just a field-access away (and eventually, just an argument-access away). Previously, multiple stack pushes and pops were necessary to get the correct frame, adding considerable unnecessary overhead. Also, I have rewired how Proc instances are invoked, so instead of two pushes and two pops on our internal "block stack", it now just calls the proc directly. Much cleaner. The eventual goal of this is to eliminate the "block stack" and also the "iter stack", which maintains a stack of flags indicating whether a block is available or currently executing.
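A rough sketch of the frame-carried block idea follows. This is illustrative Java only, not actual JRuby source; every name here is invented for the example:

```java
import java.util.function.Function;

// A Ruby block, reduced to a one-argument closure for illustration.
class Block {
    final Function<Object, Object> body;
    Block(Function<Object, Object> body) { this.body = body; }
    Object call(Object arg) { return body.apply(arg); }
}

// Each call frame carries the block it was given, so finding the block
// to yield to is a field access rather than a search of a block stack.
class Frame {
    final Block block; // null if no block was passed
    Frame(Block block) { this.block = block; }
}

class Interp {
    private Frame current = new Frame(null);

    // Invoke a method, passing the block along as part of the new frame.
    Object callWithBlock(Block block, Function<Interp, Object> method) {
        Frame previous = current;
        current = new Frame(block);
        try {
            return method.apply(this);
        } finally {
            current = previous; // restore the caller's frame
        }
    }

    // 'yield' just reads the block off the current frame.
    Object yieldToBlock(Object arg) {
        return current.block.call(arg);
    }
}
```

Compare this to pushing and popping a separate block stack around every invocation: the block is now effectively just another argument of the call.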
Tom's work will make static much of the information about how blocks are scoped, since their relative position in the original source provides almost all the information we need. This will let us locate the appropriate variable more quickly when accessing it from within a block, and ensure our variable scoping is handled correctly with multiple nested blocks. He is also keeping in mind that evals can change the list of variables, so the end result should work fine in those cases as well. The upshot is that variable scoping will be much more reliable and performant when blocks are involved.
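The static-scoping approach can be sketched as a toy (again, invented names, not JRuby's actual implementation): at parse time each variable reference is resolved to a (depth, index) pair, so runtime lookup becomes two array hops instead of a name search.

```java
import java.util.ArrayList;
import java.util.List;

// A compile-time scope: knows its parent and the variables declared in it.
class StaticScope {
    final StaticScope parent;
    final List<String> names = new ArrayList<>();

    StaticScope(StaticScope parent) { this.parent = parent; }

    // Declare a variable in this scope, returning its slot index.
    int declare(String name) {
        names.add(name);
        return names.size() - 1;
    }

    // Resolve a name at parse time to {depth, index}: how many scopes
    // up it lives, and which slot it occupies there. Returns null when
    // the name is unknown (an eval could still introduce it later, so
    // a real implementation must also handle that case dynamically).
    int[] resolve(String name) {
        int depth = 0;
        for (StaticScope scope = this; scope != null; scope = scope.parent, depth++) {
            int index = scope.names.indexOf(name);
            if (index >= 0) return new int[] { depth, index };
        }
        return null;
    }
}
```

With nested blocks, each block body gets its own StaticScope chained to the enclosing method's scope, so a reference to a method-local variable from inside a block resolves once, at parse time, instead of on every access.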
RubyInline for JRuby
After doing a bit of exploration on how Ruby extensions are written, I stumbled across yet another post from Ryan Davis about the beauty and simplicity of RubyInline. I am not a huge C fan, having had my fill of it during my old LiteStep days (I was lead LiteStep dev during the "great redesign" period), but the attraction of RubyInline is undeniable.
Ryan and I had a brief discussion over IM, during which we agreed that adding JRuby/Java support to RubyInline would be a really great idea. Then instead of just specifying C code in your RubyInline blocks, you could easily do the following:
class Example
  inline(:C) do |builder|
    builder.c "int test1() {
      int x = 10;
      return x;
    }"
  end

  inline(:java) do |builder|
    builder.java "public int test1() {
      int x = 10;
      return x;
    }"
  end
end
...and know that whether under JRuby or C Ruby, your inlined code would shine through. Look for this effort to pick up soon; Ryan has agreed to include it in RubyInline once it's ready.
Mongrel for JRuby
Danny Lagrouw and Ola Bini, perennial JRuby community superstars, have been working on implementing the native bits of Mongrel in Java. Danny put together a YACC-based HTTP request parser (since we don't have a Java Ragel yet) and today Ola implemented a quick ternary search tree in Java. With these two pieces working, we just have to wire up Mongrel and try it out in JRuby. It's very close.
What's the value of Mongrel when we have servlet containers to host Rails apps? That question answers itself. Name one Rails developer who's enamored of servlet containers. Yeah, I didn't think so. WEBrick is a poor substitute for a real container, and almost all Rails deployments are going Mongrel now. Not supporting Mongrel would be a showstopper for many, many Rails projects. Therefore, we're making it happen.
JRuby Extras!
I have requested a new project on RubyForge called "JRuby Extras". This project is intended to be a JRuby community love-fest, hosting all the bits and pieces needed to support Ruby apps running under JRuby. It will hold such juicy tidbits as:
- The upcoming Mongrel support libraries (at least until they're hopefully included in Mongrel proper)
- Nick Sieger's excellent ActiveRecord JDBC adapter (until included in Rails)
- Any other JRuby-related extensions that don't have good homes elsewhere
- Any Java or JRuby-related updates to other projects (like RubyInline) until included directly into those projects
The project, once approved, will be jruby-extras on RubyForge.
Standardizing JRuby Extensions
JRuby includes a number of internal extensions that stand in for functionality C Ruby implements in C. There are also a number of extensions being developed externally, to support things like Mongrel. Unfortunately, there is no standard way to write JRuby extensions like there is for C Ruby. The APIs we expose are subject to change, and the Java world brings along its own conventions and expectations for how plugins ought to work. In order to settle this question, I have kicked off a thread on the JRuby dev mailing list.
We're going to figure out the best way to support JRuby extensions, along these rough lines:
- Requiring an extension will look for an extension library just as it does in Ruby; however, it will be looking for a jar file in the load or class paths containing an appropriately-named entry point.
- require "my/extension" will most likely look for extension.jar under the my/ directory on the load or class path, and then load my.ExtensionLibrary contained therein
- Since we'll want to use direct invocation in JRuby now, we'll want an easy way for extensions to get the same benefits. Rather than having them implement direct-callable interfaces, we'll likely build a code generator that can take a class and a list of method mappings and generate all the stubs and callables JRuby needs. This will simplify our own classes and extensions as well.
- There are at least two ways within JRuby to define a new class, its metaclass, and their methods. One is easy but a bit broken; the other is correct but cumbersome. Extension writers will get something in the middle...easy but correct, via various helpers and factories. The same model will also be applied internally. Unification!
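The proposed require-to-jar mapping could look something like the following. This is a hypothetical sketch only: as noted above, the extension API was still unsettled, so the Library interface and the naming convention here are invented to illustrate the idea, not a real JRuby API.

```java
// Hypothetical extension entry point: a jar on the load or class path
// would contain a class implementing something like this interface.
interface Library {
    void load(Object runtime); // register classes and methods here
}

class ExtensionLoader {
    // Map a require path like "my/extension" to an entry-point class
    // name like "my.ExtensionLibrary", per the proposed convention.
    static String entryPointFor(String requirePath) {
        int slash = requirePath.lastIndexOf('/');
        String pkg = slash < 0 ? "" : requirePath.substring(0, slash).replace('/', '.') + ".";
        String base = requirePath.substring(slash + 1);
        String typeName = Character.toUpperCase(base.charAt(0)) + base.substring(1) + "Library";
        return pkg + typeName;
    }
}
```

A real loader would then instantiate the named class reflectively and call its load method against the running JRuby runtime.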
On a more personal note, there are events afoot that may give me more time to work on JRuby. I won't go into specifics...just let your imagination run wild.
Sunday, July 30, 2006
RailsConf Europe 2006 - Will I Be There?
Well now I've gone and done it.
Dear Charles,
We are pleased to inform you that your RailsConf Europe 2006 presentation proposal:
JRuby on Rails
has been accepted, and will be included in one of the conference tracks.
So there's only two problems now:
- I'm not funded through any source for RailsConf Europe.
- I have to spend the next month (working with the other contributors) making Rails run well enough to be presentable.
Blast it all...now I have to start picking and choosing at which venues I will speak. Why's it got to be so complicated?
Saturday, July 29, 2006
On Coding
It's 2AM. The western world is asleep. Bats flit and chatter outside my urban Minnesota home, chasing mosquitos. My available television channels (only broadcast; I don't have cable) have been reduced to dating shows, infomercials, and evangelical fundraisers.
I'm up coding. Why?
Perhaps it's because I eat, sleep and breathe code. I'm usually up until 2 or 3AM hacking or studying code. I wake up around 7, head to work, and crack open my laptop on the bus to do more coding. I code all day on Java EE apps with occasional JRuby breaks. I come home, sit down in my home office and code on JRuby. I go to bed at 2 or 3AM and the process repeats.
I blog about coding.
I go to user groups focused on coding.
I feel uncomfortable at parties unless I can talk about coding (although I do have other hobbies; mathematical/logical puzzles, pool, go, movies, console video games, and beer among them).
When I get drunk, I go on long-winded rants about coding and code-related topics. When I sober up, my first worry is whether I've damaged part of my code-brain.
My touch-typing method has my right-hand home row permanently set at KL;', since I'm one step closer to ;, ", ', |, \, and Enter (and no, Dvorak doesn't work for coding; I've tried *really* hard).
The nonfiction books in my bookshelf are all books on coding or remnants of my CS undergrad studies.
I am a coder.
Passion
I want to know where the other passionate coders are. I know they're out there, juggling bits and optimizing algorithms at all hours of the night. I know they share many of my characteristics. I know they love doing what they do, and perhaps they--like me--have always wanted to spend their lives coding.
How do we find them?
Google seems to know how. Give the coders what they want: let them work when the sun is asleep, let them eat when they want to eat, dress like they want to dress, play like they want to play; let them follow their creativity to whatever end, and reward that creativity both monetarily and politically; let them be.
Is this approach feasible? Google's bottom line seems to say so, boom-inflated numbers notwithstanding. And Google's approach is really just the current in a long line of attempts to appease the coder ethos. The dot-commers tried to figure it out, but rewarded breathing and loud talking as much as true inspiration and hard work. Other companies are now learning from those mistakes; Google is just the most prominent.
What is it that we want? What makes me say "this is the job for me"?
Reread that list of characteristics above. What theme shines through?
Perhaps coders just want the freedom to think, to learn, to create in their own ways. Perhaps it's not about timelines and budgets and marketability. Perhaps coding--really hardcore, 4AM, 24-hours-awake coding--is the passionate, compelling, empowering art form of our time.
Artists are mocked. Artists are ridiculed. Artists are persecuted. Artists are sought out. Artists are revered.
So are coders.
Artists are frequently unsolvable, incomprehensible, unmanageable, intractable.
So are coders.
Artists create their best work when left to their own devices, isolated from the terrible triviums of modern living.
So do coders.
Artists go on long-winded, oft-maligned midnight rants about what it means to be an artist, man, and what it means to create art.
So do coders.
Perhaps what we've always hoped is true. Perhaps we're not misfits or malcontents. Perhaps we're the latest result of that indescribable human spark that moves mountains and shoots the moon. Perhaps it's no longer presumptuous to say it:
Code is the new art.
Sunday, July 23, 2006
The Fastest Ruby Platform? or Hey, Isn't Java Supposed to be Slow?
I love stirring up trouble. In working on the compiler for JRuby, it's become apparent that a few targeted areas could yield tremendous performance benefit even in pure-interpreted mode. I describe a bit of this evening's study here.
As an experiment, I cut out a bunch of the stuff in our ThreadContext that caused a lot of overhead for Java-implemented methods. This isn't a safe thing to do for general cases, since most of these items are important, but I wanted to see what bare invocation might do for speed by reducing some of these areas to theoretical "zero cost".
Cut from per-method overhead:
- block/iter stack manipulation
- rubyclass stack manipulation
Just two pieces of our overall overhead, but two reasonably expensive pieces not needed for fib().
Explanation and wild theories follow the numbers below.
Recall that the original compiled fib was only twice as fast as the interpreted version. The new numbers put it closer to 2/3 faster:
Time for bi-recursive, interpreted: 18.37
Time for bi-recursive, compiled: 6.6160000000000005
Time for bi-recursive, interpreted: 17.837
Time for bi-recursive, compiled: 6.878
Time for iterative, interpreted: 25.222
Time for iterative, compiled: 24.885
So with the unnecessary overhead removed (simulating zero-cost for those bits of the method call process) we're down to mid 6-seconds for fib(30). The iterative version is calculating fib(500000), but I'll come back to that.
Now consider Ruby's own numbers for both of these same tests:
Time for bi-recursive, interpreted: 2.001974
Time for iterative, interpreted: 9.015137
Hmm, now we're not looking so bad anymore, are we? For the recursive version, we're only about 3.5x slower with the reduced overhead. For iterative, only about 2.5x slower. So there's a few other things to consider:
- Our benchmark still creates and pushes/pops a frame per invocation
- Our benchmark still has fairly costly overhead for method arguments, both on the caller and callee sides (think multiple arrays allocated and copied on both ends)
- Our benchmark is still using reflected invocation
Yes, the bits I removed simulate zero cost, which we'll never achieve. However, if we assume we'll get close (or at least minimize overhead for cases like this where those removed bits are obviously not needed), these numbers are not unreasonable. If we further assume we can trim more time off each call by simplifying and speeding up argument/param processing, we're even better. If we eliminate framing or reduce its cost in any way, we're better still. However, the third item above is perhaps the most compelling.
You should all have seen my microbenchmarks for reflected versus direct invocation. Even in the worst comparison, direct invocation (via INVOKEINTERFACE) took at most 1/8 as much time as reflected. The above fib invocation and all the methods it calls are currently using reflected invocation, just like most stuff in JRuby.
So what does performance hypothetically look like for 6.5s times 1/8? How does around 0.8s sound? A full 50-60% faster than C Ruby! What about for iterative...24s / 8 = 3s, a solid 66% boost over C Ruby again. Add in the fact that we're missing a bunch of optimizations, and things are looking pretty damn good. Granted, the actual cost of invoking all those reflected methods is somewhat less than the total, but it's very promising. Even if we assume that the cost of the unoptimized bits is 50% of the total time, leaving 50% cost for reflection, we'd basically be on par with C Ruby.
It's also interesting to note that the interpreted and compiled times for the iterative version are almost identical. Interpretation is expensive for many things, but not for a simple while loop. The iterative version's code is below:
def fib_iter_ruby(n)
  i = 0
  j = 1
  cur = 1
  while cur <= n
    k = i
    i = j
    j = k + j
    cur = cur + 1
  end
  i
end
This is a good example of code that's very light on interpretation. While loops in Ruby and JRuby boil down in both cases to little more than while loops in the underlying language, interpretation or not. The expense of this method is almost entirely in two areas: variable assignment and method calls, neither of which is sped up by compilation. The similarity of the compiled and interpreted numbers for this iterative algorithm shows one thing extremely clearly: our method call overhead really, really stinks. It is here we should focus all our efforts in the short term.
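To see why a while loop is so cheap to interpret, consider this toy AST-interpreter sketch (invented names, not JRuby source): the interpreter's while node is itself driven by a host-language while loop, so the loop construct adds almost no dispatch overhead beyond evaluating its condition and body.

```java
// Minimal AST-interpreter sketch: WhileNode.eval is just a Java while
// loop in disguise, so the real cost lives in condition and body
// evaluation (method calls, variable assignment), not in looping.
interface Node {
    Object eval(Env env);
}

// A trivial environment: a handful of local variable slots.
class Env {
    int[] locals = new int[8];
}

class WhileNode implements Node {
    final Node condition, body;

    WhileNode(Node condition, Node body) {
        this.condition = condition;
        this.body = body;
    }

    public Object eval(Env env) {
        // One host-language loop drives the whole interpreted loop.
        while (Boolean.TRUE.equals(condition.eval(env))) {
            body.eval(env);
        }
        return null;
    }
}
```

This is why compiling the iterative fib buys so little: the loop itself was never the bottleneck.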
Given these new numbers and the fact that we have many optimizations left to do, I think it's now very reasonable to say we could beat C Ruby performance by the end of the year.
Side Note: The compiler work has gone very well, and now supports several types of variables and while loops. This compiler is mostly educational, since it is heavily dependent on the current ThreadContext semantics and primitives. As we attack the call pipeline, the current compiler will break and be left behind, but it has certainly paved the way, showing us what we need to do to make JRuby the fastest Ruby interpreter available.
Wednesday, July 19, 2006
Conference Updates
JavaPolis 2006 - Antwerp, Belgium
I have received confirmation from my employer, Ventera Corporation, that they will fund my trip to Antwerp. Hooray! I'm also planning on bringing my wife and spending the holiday season in Europe. It ought to be a great trip, right on the heels of presenting JRuby.
Any Europeans in Amsterdam, Antwerp, Paris, Venice, Prague, or points nearby that might like to chat some time, let me know. We're planning on visiting at least those five cities.
And for the record, I speak only one European language: Spanish (and very poorly, I might add). My Mandarin Chinese is better, but I don't expect that will help much.
RubyConf 2006 - Denver, Colorado, USA
I will be attending RubyConf 2006, but I still will not have my own presentation. The selection process is complete. I still have standing offers from two other potential presenters to share time, but I have not yet heard whether they were accepted.
Oddly enough, I did receive the following email:
We have finished the presentation selection process, and regret to
have to inform you that your paper was not among those chosen for
inclusion.
Since I missed the submission deadline by a day, it's rather unremarkable that I was not selected to present. So I missed the deadline AND was declined? Ouch!
RailsConf Europe 2006 - London, UK
I have submitted a talk entitled "JRuby on Rails" for RailsConf Europe 2006. I have also heard from my employer that they'll only pay for one European conference. Phooey.
If I'm accepted, I'll have to find another way of funding the trip, since it's not something I can swallow myself. We shall see.
Tuesday, July 18, 2006
JRuby in the News
July 17 was a particularly big news day for JRuby. Three articles were posted, one in two places, on JRuby and its two core developers, Tom and me. I present here a collection of articles and blogs that have particular significance to the Ruby+Java world.
Ruby for the Java world
Joshua Fox - JavaWorld - July 17, 2006
Joshua Fox provides a brief introduction to Ruby and demonstrates Ruby on Java using JRuby. Joshua corresponded with Tom and me about this article, and I think the end result turned out well.
Interview: JRuby Development Team
Pat Eyler - Linux Journal - July 17, 2006
Pat put together a great set of questions and we told all in this 3800-word interview. I'm awaiting the flames from my response to "What's next for Ruby".
Interviewing the JRuby Developers
Pat Eyler - O'Reilly Ruby - July 17, 2006
The same interview from above, trimmed for length and posted to the O'Reilly Network.
JRuby Leaves SourceForge for Greener Pastures at Codehaus
Obie Fernandez - July 17, 2006
Obie covers our move from SourceForge to Codehaus, quoting Tom and me venting our frustration over multiple downtimes (including the infamous week-long CVS outage right before JavaOne).
A Gem Of A Language for Java and .NET
Andy Patrizio - May 26, 2006
Andy released this article shortly after our JavaOne press conference appearance. He provides a good executive summary of Ruby, IronRuby, and JRuby.
Ugo Cei on Ruby/Java Integration
In addition, I was today pointed to Ugo Cei's series of blog postings on Ruby/Java integration. They're short, but give a good quick overview of what works and what doesn't. Covered topics include the Ruby/Java bridges RJB (worked, but looks cumbersome) and YAJB (did not work), JRuby (worked like gangbusters, naturally ;), and remoted Java code over XML-RPC (a fairly popular recommendation from Rubyists).
Part I - RJB part 1
Part II - RJB for a more complicated case
Part III - YAJB...OutOfMemory on a simple script
Part IV - JRuby, a great comparison to the second RJB posting
Part V - XML-RPC part 1
Part VI - XML-RPC part 2
Ugo will also be speaking at OSCON on Ruby for Java Programmers, which is great since our own proposal was rejected.
JRuby in the News
Our Codehaus page will track articles as they're published. If you know of good blog entries or articles we should include here, please let me know!
Friday, July 14, 2006
Compilers, Conferences, Codehaus...Oh My!
I keep wondering if things can continue moving along so well indefinitely. JRuby's got the momentum of a steamroller shot out of a cannon these days.
Compilers
I have been back at work on the compiler, which is starting to take real shape now. I'm doing this initial straightforward version in Java, so I can work on evolving my code-generation library independently. This version also isn't quite so complicated that it warrants using Ruby.
Early returns have been very good. I've been testing it so far with the standard "bi-recursive" fib algorithm, with a few other node types thrown in as I implement them:
def fib_java(n)
  1.0
  false
  true
  [1, 2, 3, 4, 5]
  "hello"
  if n < 2
    n
  else
    fib_java(n - 2) + fib_java(n - 1)
  end
end
The performance boost from the early compiler is roughly 50% when running fib(20):
fib(20)
Time for interpreted: 1.524
Time for compiled: 0.729
The fib algorithm is obviously extremely call-heavy. There are two recursions to fib plus four other calls for '<', '-', and '+'. In JRuby, this means that the overhead of making dyn-dispatched method calls is almost as high as directly interpreting code. For larger runs of fib, performance tails off a bit because of this overhead. We're taking a two-pronged approach to performance right now; the compiler is obviously one prong, but overall runtime optimization is the other. The compiler gives us an easy 50% boost, but we may see another large boost just by cleaning up and redesigning the runtime to optimize call paths. I would not be surprised if we're able to exceed C Ruby's performance in the near term.
Oh, and I love the following stack trace from a bug in the compiler. Note the file and line number where the error occurred:
Exception in thread "main" java.lang.IncompatibleClassChangeError
    at MyCompiledScript.fib_java (samples/test_fib_compiler.rb:2)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
How cool is that?
Conferences
Close on the heels of my disappointing mixup with RubyConf 2006, we have received an invitation to present at JavaPolis 2006 in Antwerp, Belgium. We had actually been hoping to go and would have planned on delivering a BOF or Quickie session, but after we sent a proposal, one of the conference organizers said JRuby was already on the wish list. We're guaranteed "at least" a full session. JavaPolis will be held December 13-15, and we'll be getting cozy with some of the top Java folks in the world. It should be a fun and exciting event, especially considering what we'll get done in the five months before then.
I've also received other offers from potential RubyConf 2006 presenters to share their time slots with me. We should be able to get word out and demos presented after all. I'd still like to be able to present without stepping on others' time, so if you really would like to see JRuby have a full presentation at RubyConf this year, email the organizers.
Codehaus
We have made the move to Codehaus, and we've mostly got JRuby operations migrated there from SourceForge. JIRA is up, Confluence is up with a minimal set of pages, and we're totally running off Codehaus SVN now. So far we're very happy with the move; JIRA is beautiful and the SVN repo works great. We have not yet migrated off SF.net email lists and we haven't moved over all bugs from SF's tracker, but otherwise we're basically a Codehaus project now. Huzzah!
Sunday, July 09, 2006
Is Reflection Really as Fast as Direct Invocation?
This was originally posted to the jruby-devel mailing list, but I am desperate to be proven wrong here. We use reflection extensively to bind Ruby methods to Java impls in JRuby, and the rumors of how fast reflection is have always bothered me. What is the truth? Certainly there are optimizations that make reflection very fast, but as fast as INVOKEINTERFACE and friends? Show me the numbers! Prove me wrong!!
--
It has long been assumed that reflection is fast, and that much is true. The JVM has done some amazing things to make reflected calls really f'n fast these days, and for most scenarios they're as fast as you'd ever want them to be. I certainly don't know the details, but the rumors are that there's code generation going on, reflection calls are actually doing direct calls, the devil and souls are involved, and so on. Many stories, but not a lot of concrete evidence.
A while back, I started playing around with a "direct invocation method" in JRuby. Basically, it's an interface that provides an "invoke" method. The idea is that for every Ruby method we provide in Java code you would create an implementation of this interface; then when the time comes to invoke those methods, we are doing an INVOKEINTERFACE bytecode rather than a call through reflection code.
The down side is that this would create a class for every Ruby method, which amounts to probably several hundred classes. That's certainly not ideal, but perhaps manageable considering you'd have JRuby loaded once in a whole JVM for all uses of it. It could also be mitigated by only doing this for heavily-hit methods. Still, requiring lots of punky little classes is a big deal. [OT: Oh what I would give for delegates right about now...]
The up side, or so I hoped, would be that a straight INVOKEINTERFACE would be faster than a reflected call, regardless of any optimization going on, and we wouldn't have to do any wacked-out code generation.
Initial results seemed to agree with the upside, but in the long term nothing seemed to speed up all that much. There's actually a number of these "direct invocation methods" still in the codebase, specifically for a few heavily-hit String methods like hash, [], and so on.
So I figured I'd resolve this question once and for all in my mind. Is a reflected call as fast as this "direct invocation"?
A test case is attached. I ran the loops for ten million invocations...then ran them again timed, so that hotspot could do its thing. The results are below for both pure interpreter and hotspotted runs (times are in ms).
Hotspotted:
first time reflected: 293
second time reflected: 211
total invocations: 20000000
first time direct: 16
second time direct: 8
total invocations: 20000000
Interpreted:
first time reflected: 9247
second time reflected: 9237
total invocations: 20000000
first time direct: 899
second time direct: 893
total invocations: 20000000
I would really love for someone to prove me wrong, but according to this simple benchmark, direct invocation is faster--way, way faster--in all cases. It's obviously way faster when we're purely interpreting or before hotspot kicks in, but it's even faster after hotspot. I made both invocations increment a static variable, which I'm hoping prevented hotspot from optimizing code into oblivion. However even if hotspot IS optimizing something away, it's apparent that it does a better job on direct invocations. I know hotspot does some inlining of code when it's appropriate to do so...perhaps reflected code is impossible to inline?
Anyone care to comment? I wouldn't mind speeding up Java-native method invocations by a factor of ten, even if it did mean a bunch of extra classes. We could even selectively "directify" methods, like do everything in Kernel and Object and specific methods elsewhere.
--
The test case was attached to my email...I include the test case contents here for your consumption.
private static interface DirectCall {
    public void call();
}

public static class DirectCallImpl implements DirectCall {
    public static int callCount = 0;
    public void call() { callCount += 1; }
}

public static DirectCall dci = new DirectCallImpl();

public static int callCount = 0;
public static void call() { callCount += 1; }

public void testReflected() {
    try {
        Method callMethod = getClass().getMethod("call", new Class[0]);

        long time = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            callMethod.invoke(null, null);
        }
        System.out.println("first time reflected: " + (System.currentTimeMillis() - time));

        time = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            callMethod.invoke(null, null);
        }
        System.out.println("second time reflected: " + (System.currentTimeMillis() - time));
        System.out.println("total invocations: " + callCount);
    } catch (Exception e) {
        e.printStackTrace();
        assertTrue(false);
    }
}

public void testDirect() {
    long time = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
        dci.call();
    }
    System.out.println("first time direct: " + (System.currentTimeMillis() - time));

    time = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
        dci.call();
    }
    System.out.println("second time direct: " + (System.currentTimeMillis() - time));
    System.out.println("total invocations: " + DirectCallImpl.callCount);
}
Update: A commenter noticed that the original code was allocating a new Object[0] for every call to the reflected method; that was a rather dumb mistake on my part. The commenter also noted that I was doing a direct call to the impl rather than a call to the interface, which was also true. I updated the above code and re-ran the numbers, and reflection does much better as a result...but still not as fast as the direct call:
Hotspotted:
first time reflected: 146
second time reflected: 109
total invocations: 20000000
first time direct: 15
second time direct: 8
total invocations: 20000000
Interpreted:
first time reflected: 6560
second time reflected: 6565
total invocations: 20000000
first time direct: 912
second time direct: 920
total invocations: 20000000
Wednesday, July 05, 2006
On Net Neutrality
It's not really Ruby or Java-related, but with all the noise about net neutrality the past few weeks I figured I'd put my opinions out there. I doubt any of this is particularly original, but it's where I stand.
I believe, as do most well-informed Netizens, that Verizon et al are nuts to demand a new "pay-for-performance" model. They claim that since traffic to and from Google, Amazon, Yahoo, and friends takes up a larger percentage of their bandwidth than that from other sites, those heavy-hitters should pay intermediate providers a little something extra to make sure traffic arrives quickly. They liken it to a high-speed highway toll lane, allowing those who pay the price a higher quality of service. They are also completely wrong.
Plenty of metaphors have been tossed around by both sides. The toll lane metaphor in particular fails badly because of one simple fact: Google et al do not decide who visits their sites, and therefore they do not decide where their traffic goes. Google, for example, is a simple supplier of a service; the fact that the service is provided over the internet is entirely irrelevant since the internet is not funded based on who requests what from whom; it's based on the privilege of making, serving, and receiving requests. I'll talk in terms of Google from here on.
Traffic from Google (and, to a lesser extent, traffic to Google) certainly takes up its share of overall internet bandwidth. This much is true. However, Google already pays for some of that bandwidth on their end of the net: they pay for the privilege of receiving and serving requests, and of sending requests to and receiving responses from the sites they index or integrate with.
The other end of the internet is also already being paid for. Users pay for their internet connections, be they dialup or broadband, and in return are given the privilege of sending and receiving requests over that connection. Service pricing scales up with speed and reliability; corporate users pay the highest rates to guarantee quality of service, while home users pay much lower consumer rates. Traffic that crosses the internet from point A to point B is already paid for twice: once on the service end and once on the consumer end. This is how the internet has operated for better than two decades; today it is essentially a utility, like power or water.
Adding a requirement that Google pay for traffic crossing various networks adds a third payment into this system. Where providers previously could only profit from what happened at the edges of the internet, they now want to choke money from the middle as well. Not only is this absurd given the structure and organization of the internet, it is also quite infeasible. If Google were expected to pay every primary and secondary backbone provider for all traffic crossing their wires, the costs could be astronomical...a death by a thousand cuts. It would, however, represent an explosive boom in revenues for the largest providers, since they would profit from every byte they delivered from point A to point B...while simultaneously charging A and B for the right to such extortion.
Let us assume that the providers are truthful in saying they need such a system to handle the explosive popularity of sites like Google's. What brought us to this situation? Again, the fault lies with the providers themselves.
Today, you can get a 1 Mbps DSL connection from any provider for about $25-$30 per month. Such a speed was unheard of only a couple of years ago, yet it is now standard in many regions. If you are willing to accept a few restrictions on how you use it, you can get 4-8 Mbps cable internet for about the same price. Again, a remarkably low price for the service you get.
Supply and demand drives the market, wherein a fixed supply combined with increased demand causes prices to go up. While the supply of bandwidth has increased over the past few years, the demand for faster connections and the amount of data traversing them has skyrocketed. At the same time, prices have dropped.
Say a given provider serves 1,000 users, all with 56 Kbps modems. At peak theoretical usage this would require 56 Mbps of upstream bandwidth to serve (considering only downloads). Now consider a three-year period in which our friendly provider increases the size of their pipe ten times over (a very large increase), for a total upstream bandwidth of 560 Mbps. Suppose over the same period half of their users switch to 4 Mbps broadband connections (also a large number, but perhaps not unreasonable in an internet-savvy population), and proceed to utilize them at no more than 50% (reasonable given modern internet media services). This brings the total theoretical demand up to just over 1 Gbps, nearly double the ten-times buildout. Even dropping the number of broadband subscribers to 30% exceeds the buildout by a healthy margin.
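That back-of-the-envelope math is easy to check. The sketch below just replays the paragraph's illustrative numbers; they are hypothetical, not real provider data:

```ruby
# Replaying the buildout scenario above; all figures in Mbps.
users     = 1000
dialup    = 0.056   # a 56 Kbps modem
broadband = 4.0     # a 4 Mbps broadband line

peak_before = users * dialup     # 56 Mbps of peak demand
upstream    = peak_before * 10   # 560 Mbps after a ten-times buildout

# Half the users move to broadband at 50% utilization; half stay on dialup:
demand_half = (500 * broadband * 0.5) + (500 * dialup)   # ~1028 Mbps
# Only 30% move to broadband:
demand_30   = (300 * broadband * 0.5) + (700 * dialup)   # ~639 Mbps

demand_half > upstream   # true: nearly double the buildout
demand_30   > upstream   # true: still exceeds it comfortably
```

By this accounting the half-broadband scenario lands around 1.03 Gbps, the same ballpark as the figure above.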
I believe it is for this reason the providers have taken their current position. Rather than any one of them unilaterally increasing broadband prices to the consumer or service prices to corporate endpoints, they have decided to force content providers to foot the bill for their lack of foresight. They have not built out their networks fast enough to match the speed at which they have rolled out low-cost, high-speed internet. Now they're paying the price for over-marketing and under-planning with maxed-out networks and consumers ever-hungrier for more speed at lower cost.
Did they think they could decrease prices and increase service indefinitely without appropriate infrastructure to back it up?
So here we are stuck in this current situation. The large providers have had their "oh, shit" moment. They unfortunately hold most of the cards and have nearly convinced enough congresspeople to follow their lead. Again the needs of America's corporations are being put before the needs of her people. What other costs might there be?
For one, any other country that continues to observe network neutrality will continue to see the internet growing explosively. The US will fall behind out of a desire to increase provider profits.
Also, since consumers have little choice over which networks their data flows through, providers exercise local monopolies. Without network neutrality, their networks will be open only to the highest bidders, eventually decreasing the level of choice and quality of service for end users. When neutrality falls, how long before content providers bid each other completely off networks? Will I have to choose my next internet provider (if I have a choice) based on whether they let me more easily reach Yahoo or Google?
So what are we to do? We should make every effort to ensure our congresspeople vote in our best interests; that much is obvious. But I would propose another possible solution for Google and friends: if neutrality falls...don't pay. In fact, don't serve those providers at all.
Stay with me here. Google and its family of services are likely the most popular destinations on the internet. Popularity is demand; users demand that Google services be available over their internet connections. They may not be screaming now about network neutrality, but when their ISP or the upstream provider tries to extort money from Google and Google responds by serving notice of such extortion back to the users in place of their desired services, you'll have thousands upon thousands of angry customers.
They will be angry not at Google--oh no. Anger is most often directed at the nearest target. If the reception on your TV is bad, the first thing blamed is the TV itself or the hardware or wires connecting it. If your gut grows beyond control, you may likely blame genetics or stress rather than the sodas and chips you consume daily. If you can't reach Google or GMail, guess where the blame and anger fall first: that's right, your ISP. Your ISP will quickly turn around and direct their own anger at the upstream providers that brought this new nightmare down upon them. "My customers can't reach Google because Google refuses to pay your fees. Resolve it or I and my 50,000 subscribers will no longer do business with you."
The beauty of this plan is that an ad-based company like Google won't even stand to lose much money. Google ads are delivered on countless pages across the net, all of which would still operate just as before. Google could bide its time while the furious public demanded net neutrality be reinstated. Supply and demand, and we're back where we started.
There's one other fact I need to call out, unfortunately. Those of you who have demanded absurdly fast connections at absurdly low prices will very likely have to accept that prices may increase to preserve neutrality. The internet as it exists today can't accommodate 100% of users pulling multiple megabits per second...the infrastructure just isn't there. If there really is a crisis brewing such that aggregate ISP demand will greatly exceed what backbone providers can handle, either endpoint speeds will need to decrease or prices will have to increase throughout the pipeline to fund network buildout. I would consider it a small price to pay to ensure the future of the internet as we know it. I hope you will too.
For more information on network neutrality, please visit the following links:
SaveTheInternet.com - a coalition fighting for network neutrality
Wikipedia's Article on Net Neutrality
Christopher Stern's excellent Washington Post article on the issues
And don't forget to contact your congresspeople to let them know where you stand:
US House
US Senate
JRuby 0.9.0 Released
And it's out the door! We are currently sending out the announcement: JRuby 0.9.0 is released!
Any frequent readers out there will know there's a ton of improvements in this release. I'll be blogging a bit later on the details of each, but here's a short list:
- All non-native Ruby libraries (lib/ruby/1.8 from the C version) are included in this release, so you can download, unpack, and hit the ground running. No installing, no copying files, no fuss.
- RubyGems 0.9.0 (coincidence?) is included pre-installed with appropriate scripts to run with JRuby. Gems install beautifully, but I'd recommend you use --no-rdoc and --no-ri until we get a few performance issues worked out (doc generation is still pretty slow).
- WEBrick finally runs and can serve requests. This also allows the Rails 'server' script to serve up your Rails apps. Speaking of Rails...
- Ruby on Rails now runs most scripts and many simple applications out-of-the-box. The 1.1.4 release of Rails included our patch for block-argument assignment, eliminating an unsupported (by JRuby) and unusual syntax preventing scripts from running. We've also improved performance a bit and as mentioned before, got the WEBrick-based 'server' script running well.
- YAML parsing, StringIO, and Zlib are now Java-based instead of the original pure-Ruby implementations. This has greatly improved the performance of RubyGems and other libraries/apps that make use of these libraries. There are a few libraries remaining to be "nativified" and we expect to get future performance benefits from doing so.
- As always, there are a large number of interpreter, core, and library fixes as part of this release.
Special thanks to JRuby contributors Ola Bini and Evan Buswell for their work on YAML/Zlib reimplementation and server sockets, respectively. We wouldn't have RubyGems or WEBrick working without their contributions.
- Information about JRuby and the release can be found at jruby.org.
- The release archives are on SourceForge. This will be the last release from SourceForge; we are in the process of moving to CodeHaus and should be up and running there shortly!
Sunday, July 02, 2006
JRuby Not to Appear at RubyConf 2006...
...and it's my fault!
So...I had the proposal all ready and in the RubyConf system. I got some feedback from Tom and could have pushed the "final submit" button at any time.
Then we got busy wrapping up the very-exciting 0.9.0 release, I got my dates mixed up and...well, enough excuses. I missed the deadline by *one day*, and the organizers stated that they ethically couldn't make an exception for me.
I'm pretty devastated. By RubyConf we were planning to have Rails working and performance vastly improved. It was to be a true "coming of age" for Ruby on the JVM, and I wanted to share that with the Ruby community at their definitive conference. Sadly, it is not to be.
If any readers are particularly interested in seeing JRuby present at RubyConf 2006, you might drop an email to the organizers and perhaps they'll change their minds.
Now if you'll excuse me, I'm going to spend the rest of the day sulking.
Updates:
- John Lam (of the RubyCLR project) would like to see us at RubyConf 2006, and has offered to share his time (assuming his talk is accepted) if we can't sway the organizers. I'm still hoping there's a way to get our own presentation, but it's great to have backers out there pulling for us. Thanks John!
Saturday, July 01, 2006
Rails 1.1.4 Runs Unmodified in JRuby
Rails 1.1.4 was released on Friday to very little fanfare. It was basically a fixit release for routing logic that broke in the 1.1.3 release (which broke while fixing a security issue in 1.1.2, etc). However, there's a less visible but vastly more interesting fix (for JRubyists) also included in 1.1.4.
Esoterica
Rails pre-1.1.4 used a peculiar block syntax to handle simple array and hash assignment:
options = {} # a hash
...
some_object.some_method { |options[:key]| }
Which basically boils down to this:
some_object.some_method { |v| options[:key] = v }
The former is certainly terse, but it takes a little head-twisting to follow what's going on. Infinitely more serious: it doesn't work in JRuby.
Anguish
I don't know when this syntax became valid or if it's always been valid, but it is seldom used. We had not seen it anywhere except in Rails, and there it's only used in a few top-level scripts for gathering command-line options. The level of effort to implement this missing functionality in JRuby is nontrivial:
- there are parser changes necessary to treat the block arg as an assignment to a hash/array, which is a Call instead of a DVar (dynamic var)
- there are interpreter changes to allow Call nodes to appear in block arg lists
Naturally, we wanted to avoid adding this functionality if possible, so I emailed the ruby-core mailing list to ask the man himself, matz, whether this was an accepted standard. His response was music to my ears:
FYI, in the future, block parameters will be restricted to local variables only, so that I don't recommend using something other than local variables for the parameters.

Redemption
Being the rabble-rouser that I am, and armed with matz's Word, I contacted the Rails team about "fixing" this unusual block syntax. Naturally, we wanted it fixed because it was going away, and not just because we're lazy. A bug was created, patches were assembled, and finally, in 1.1.4, this syntax has been fixed.
Enlightenment
So what does this mean for JRuby? Until 1.1.4 was released, we had planned to include our own patch for this syntax, to allow Rails to work correctly. You would have had to patch Rails after installing, or none of the standard Rails scripts would run. Now, with the release of 1.1.4, the following will all work out-of-the-box with JRuby 0.9.0 (currently in RC status):
gem install rails --include-dependencies
# and use --no-rdoc for now...it's too slow
rails ~/myapp
# might need to tweak the rails script a bit to fix the shebang, or just call "jruby path/to/rails"
cd ~/myapp
jruby script/generate controller MyTest hello
jruby script/server webrick
# specify webrick explicitly; Rails uses platform-dependent tricks otherwise
And suddenly, you have http://localhost:3000/mytest/hello running JRuby on Rails.
We'll have a much better HOWTO packed into the 0.9.0 release, and I'll certainly be blogging on various aspects of JRuby on Rails over the next week or two. Keep in mind that we don't officially "support" Rails just yet; there's more work to do. We are, however, another leap beyond the JavaOne demo, and at this rate things are looking very good for an end-of-summer JRuby 1.0.
Connections Freeze over Linux PPTP Client
Because my last "howto" post about an Eclipse issue ended up helping someone, I'm posting another.
I have PPTP set up on my Linux-based desktop, but I had been having no end of problems with large file transfers or long-running connections over it. They appeared to freeze up or die halfway through.
Unfortunately this was all happening when I was trying to access a remote CVS server. After I spent two hours diagnosing the poor CVS server I determined that there wasn't a bloody thing wrong with it. Turning my attentions locally, I thought I'd see if there were any troubleshooting items related to pptpclient. Turns out there are:
Connections Freeze over Linux PPTP Client
In my case, it was that MTU negotiation was busted. Adding "mtu 1400" to my peer file in /etc/ppp/peers appears to have solved all issues. Huzzah!
Hopefully my adding another descriptive, keyword-laden link to the world will help others find this more quickly.
Tuesday, June 27, 2006
Ruby.NET Compiler Passes test.rb
Queensland University of Technology's Ruby.NET project has achieved a major milestone: their Ruby to .NET compiler is able to run and pass all the test.rb samples in the C Ruby distribution. They've got plenty more work to do to be "Ruby compatible" in any sense of the words, but they're making very good progress.
I'm going to stay involved with this project to help in any way I can, be it discussions on implementation challenges or helping to debug and write tests. The .NET CLR is an impressive piece of technology, and I'd certainly like to see Ruby gain traction in the .NET world as well.
At any rate, kudos to the QUT folks for their hard work. I'm also glad to see they have a very liberal license and all source available online. Once I confirm I won't be tainted in any way I'll be reviewing their code and compiler...I'll report back on what I find.
It's amazing what financial backing from a major software company can do for an open-source project :)
Thursday, June 15, 2006
Mongrel in JRuby?
I'm going to try to post more frequent, less-verbose entries to break up the time between longer articles. This is the first.
For those of you unfamiliar with it, Mongrel is a (mostly) Ruby HTTP 1.1 server, designed to escape the perils of cgi/fcgi/scgi when running web frameworks like Rails or Camping. From what I hear it's very fast and very promising. Think of it as a Ruby equivalent to Java's servlet/web containers like Jetty and Tomcat.
Deploying Rails apps under JRuby currently has only two reasonable outlets:
- You can wrap all the Rails-dispatching logic into a servlet, deploying as you would a web application.
- You can (mostly) run the "server" script, which starts up Rails in WEBrick, Ruby's builtin lightweight server framework.
The first option would obviously provide the best integration with existing Java code and infrastructure. The second is good for development-time testing.
However, with JRuby rapidly becoming production-usable, there will be many folks who want a third option: deploying Rails--just Rails--in a production environment without a meaty servlet engine. For Ruby, that's where Mongrel comes in.
Mongrel is by the developer's own admission "mostly" Ruby code. The one big area where it is not Ruby is in its HTTP parser library (there's also a native ternary search tree, but that's no big deal to reimplement). Using Mongrel, require 'http11' pulls in a C-based HTTP 1.1 parser and related utilities. Ruby is notoriously slow for parsing work, so this native code is not unreasonable. However, it would be a barrier to running Mongrel within JRuby; we would need our own native implementation of the http11 library.
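To make the problem concrete, here's a hypothetical pure-Ruby sketch of the kind of thing http11 does in C: pulling the request line and headers out of a raw HTTP/1.1 request. This is not Mongrel's actual parser, which is a far stricter native state machine; the names here are my own invention.

```ruby
# Hypothetical sketch of http11-style request parsing in plain Ruby.
# Illustrates the shape of the job, not Mongrel's real implementation.
def parse_request(raw)
  head, _sep, body = raw.partition("\r\n\r\n")
  request_line, *header_lines = head.split("\r\n")
  http_method, path, version = request_line.split(" ", 3)
  headers = header_lines.each_with_object({}) do |line, h|
    name, value = line.split(": ", 2)
    h[name.downcase] = value
  end
  { method: http_method, path: path, version: version,
    headers: headers, body: body }
end

req = "GET /mytest/hello HTTP/1.1\r\nHost: localhost:3000\r\n\r\n"
parse_request(req)[:path]   # => "/mytest/hello"
```

A Java port for JRuby would do the same job, ideally as a proper state machine rather than string splitting.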
So, any takers? If I had a nickel for every JRuby sub-project I want to work on, I'd have a fistful of nickels. This one will probably be pretty far back in the queue.
Known possibilities for lightweight HTTP 1.1 support (really, all Mongrel needs is an HTTP 1.1 parsing library, though either of these could probably serve on its own):
- Jetty
- Simple
For those of you that say "Why not just wire Rails into [Tomcat|Jetty|Simple] and be done with it" I have this answer: Rubyists are not particularly fond of Java libraries. My aim of late is working toward supporting both Ruby/Java folks who will happily marry the two and pure Ruby folks that simply want another runtime to use. Mongrel is bound to become a very popular choice for web serving Ruby applications, and we would be remiss if we did not attempt to support it.
Wednesday, June 14, 2006
Unicode in Ruby, Unicode in JRuby?
The great unicode debate has started up again on the ruby-talk mailing list, and for once I'm not actively avoiding it. We have been asked many times by Java folks why JRuby doesn't support unicode, and there's really been no good answer other than "that's the way Ruby does it".
For the record, Ruby does support utf8, but not multibyte. Internally, it usually assumes strings are byte vectors, though there are libraries and tricks you can usually use to make things work. The array of libraries and tricks, however, is baffling to me.
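The byte-vector issue is easy to demonstrate. Under 1.8 semantics, String#length counts bytes; later, encoding-aware Rubys exposed both views. This snippet uses the modern methods purely to illustrate the distinction under discussion:

```ruby
# Bytes vs characters: the heart of the unicode debate.
# Under Ruby 1.8's byte-vector semantics, "résumé".length was 8.
s = "résumé"   # UTF-8: each é encodes as two bytes
s.bytesize     # => 8   (the byte-vector view)
s.length       # => 6   (the character view)
```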
So here, now, I open up this question to the Java, JRuby, and Ruby communities in general as I did on the ruby-talk and rails-core mailing lists:
Every time these unicode discussions come up my head spins like a top. You should see it.
We JRubyists have headaches from the unicode question too. Since JRuby is currently 1.8-compatible, we do not have what most call *native* unicode support. This is primarily because we do not wish to create an incompatible version of Ruby or build in support for unicode now that would conflict with Ruby 2.0 in the future. It is, however, embarrassing to say that although we run on top of Java, which has arguably pretty good unicode support, we don't support unicode. Perhaps you can see our conundrum.
I am no unicode expert. I know that Java uses UTF16 strings internally, converted to/from the current platform's encoding of choice by default. It also supports converting those UTF16 strings into just about every encoding out there, just by telling it to do so. Java supports the Unicode specification version 3.0. So Unicode is not a problem for Java.
We would love to be able to support unicode in JRuby, but there's always that nagging question of what it should look like and what would mesh well with the Ruby community at large. With the underlying platform already rich with unicode support, it would not take much effort to modify JRuby. So then there's a simple question:
What form would you, the Ruby [or JRuby or Java] users, want unicode to take? Is there a specific library that you feel encompasses a reasonable implementation of unicode support, e.g. icu4r? Should the support be transparent, e.g. no longer treat or assume strings are byte vectors? JRuby, because we use Java's String, is already using UTF16 strings exclusively...however there's no way to get at them through core Ruby APIs. What would be the most comfortable way to support unicode now, considering where Ruby may go in the future?
So there it is, blogosphere. Unicode support is all but certain for JRuby; its usefulness as a JVM language depends on it. What should that support look like?
Saturday, June 10, 2006
Bringing RubyGems to JRuby OR The Zen of Slow-running Code
JRuby now supports RubyGems, and gems install correctly from local and remote. That's a huge achievement, especially considering the extra work that was required around YAML to get to this point. However, I'll start off with the caveats this time.
JRuby is very slow to install gems. You'll see what I mean in a moment, but it's so slow that something's obviously broken. That's perhaps the good news: even on a high-end box it's so intolerably slow that there has to be a key fault keeping the speed down. We believe there are a couple of reasons for it.

headius@opteron:~/rubygems-0.8.11$ time jruby gem install rails --include-dependencies
Attempting local installation of 'rails'
Local gem file not found: rails*.gem
Attempting remote installation of 'rails'
Updating Gem source index for: http://gems.rubyforge.org
Successfully installed rails-1.1.2
Successfully installed rake-0.7.1
Successfully installed activesupport-1.3.1
Successfully installed activerecord-1.14.2
Successfully installed actionpack-1.12.1
Successfully installed actionmailer-1.2.1
Successfully installed actionwebservice-1.1.2
Installing RDoc documentation for rake-0.7.1...
Installing RDoc documentation for activesupport-1.3.1...
Installing RDoc documentation for activerecord-1.14.2...
Installing RDoc documentation for actionpack-1.12.1...
Installing RDoc documentation for actionmailer-1.2.1...
Installing RDoc documentation for actionwebservice-1.1.2...
real 63m16.575s
user 55m5.939s
sys 0m25.547s
Ruby in Ruby
I'll tackle the simpler reason first: we still have a number of libraries implemented in Ruby code.
At various times, the following libraries have all been implemented in Ruby code within JRuby: zlib, yaml, stringio, strscan, socket...and others irrelevant to this discussion. This provided us a much faster way to implement those libraries, but owing to the sluggishness of JRuby's interpreter, it also meant these libraries were slower than we would like. This is actually no different from C Ruby; many of these performance-intensive libraries are implemented there in C, with no Ruby code to be seen.
Some, such as zlib, yaml, and stringio are on their way to becoming 100% Java implementations, but they're not all the way there yet. This is generally because Ruby code is so much simpler and shorter than Java code; completing the conversion to Java is painful in many ways.
Ola's work on the zlib and yaml libraries has been a tremendous help. He provided the first Ruby implementation of zlib, and has provided incremental improvements to it, generally by sliding it closer and closer to 100% Java. Ola also ported a fast YAML parser from Python, first to Ruby and now increasingly to Java, resulting in his RbYAML and JvYAML projects. Our old, original Ruby 1.6 yaml.rb parser was extremely slow; the new parsers have made YAML parsing many orders of magnitude faster.

Tom and Ola have both worked to improve stringio, which provides an IO-like interface into a string, much like Java's StringBuffer/StringBuilder classes. In C Ruby, understandably, this is implemented entirely in C. Our version, while slowly becoming 100% Java, is still quite a bit slower than it ought to be.
The continued porting of these hot-spot libraries from Ruby to Java will have perhaps the largest effect on gem install performance. However, there's another cause for alarm.
Fun with Threads
Threading in Ruby has a somewhat different feel from threading in most other languages and platforms. Ruby's threads are so easy to use and so lightweight that calling them "threads" is a bit misleading. They can be stopped, killed, and terminated in a variety of ways from outside themselves, and they are trivial to launch: Thread.new { do_something }. C Ruby also implements them as green threads, so no matter how many threads you spawn, you're looking at a single native thread executing them all. That means considerably less overhead, but practically no multi-core or multi-threading scalability at all. In short, Ruby's threads let you use and manipulate them in ways no other platform or language's threads allow, while giving you only a subset of the typical threading benefits. For the sake of brevity, I will refer to them as rthreads for the rest of this article.
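A quick illustration of that manipulability, in plain Ruby:

```ruby
# Ruby threads are trivial to launch and can be controlled from outside:
t = Thread.new { sleep }    # park a lightweight thread indefinitely
t.alive?                    # => true

t.raise(RuntimeError, "poked from outside")  # raise INTO another thread
t.join rescue nil           # the raise terminates it; swallow the error
t.alive?                    # => false

t2 = Thread.new { loop { sleep 0.01 } }
t2.kill                     # or just kill it outright
t2.join
t2.alive?                   # => false
```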
The increased flexibility of rthreads means that kicking off an asynchronous job is trivial. You can spin off many more threads than native threading could support, using them for all manner of tasks where a parallel or asynchronous job is useful. The rthread is perhaps friendlier to users of the language than native threads: most of the typical benefits of threading are there without many of the gotchas. Because of this, I have always expected that Ruby code would use rthreading in ways that would horrify those of us raised on pure native threading. Therefore, I decided during my early redesign that supporting green threading--and even better, m:n threading--should be a priority. Our research into why gems are slow seems to have confirmed this is the right path.
RubyGems makes heavy use of the net/http package in Ruby. It provides a reasonably simple interface to connect, download, and manipulate http requests and responses. However, it shows its age in a few ways; other implementations of client-side http are around, and there are occasional calls to replace net/http as the standard.
net/http makes heavy use of net/protocol, a protocol-agnostic library for managing sockets and socket IO. It has various utilities for buffered IO and the like. It also makes use of Ruby's "timeout" library.
The timeout library allows you to specify that a given block of code should only execute for a given time. As you might guess, this requires the use of threading. However, you might be surprised how it works:
from lib/ruby/1.8/timeout.rb:

def timeout(sec, exception=Error)
  return yield if sec == nil or sec.zero?
  raise ThreadError, "timeout within critical session" if Thread.critical
  begin
    x = Thread.current
    y = Thread.start {
      sleep sec
      x.raise exception, "execution expired" if x.alive?
    }
    yield sec
    # return true
  ensure
    y.kill if y and y.alive?
  end
end
It's fairly straightforward code. You provide a block and an optional timeout period. If you specify no timeout, just execute the block. If we're in a critical section (which prevents more than one thread from running), throw an error. Otherwise, start up a thread that sleeps for the timeout duration and execute the block. If the timeout thread wakes up before the block is complete, interrupt the working thread. Otherwise, kill the timeout thread and return.
With rthreads, this is a fairly trivial operation. It gives Ruby's thread scheduler one extra task...starting up a lightweight thread and immediately putting it to sleep. Now it can be argued that this is a waste of resources, creating a thread every time you want to timeout a task. I would agree, since a single thread-local "timeout worker" would suffice, and would not require launching many threads. However, this sort of pattern is not unexpected with such a simple and consumable threading API. Unfortunately, it's a larger problem under JRuby.
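For comparison, the "single timeout worker" idea could look something like the sketch below. This is my own invention, not stdlib or JRuby code: one long-lived thread services every timeout call, instead of Thread.start spawning a thread per call as timeout.rb does above. It deliberately stays simple and so serializes jobs; a real implementation would keep a queue of deadlines.

```ruby
require 'timeout'  # only for the Timeout::Error class

# Sketch: ONE long-lived worker thread watches deadlines for all callers,
# avoiding a thread spawn per timeout call. Invented for illustration.
class TimeoutWorker
  Job = Struct.new(:target, :sec, :mutex, :cond, :done)

  def initialize
    @jobs = Queue.new
    @worker = Thread.new do
      while (job = @jobs.pop)
        job.mutex.synchronize do
          deadline = Time.now + job.sec
          until job.done
            remaining = deadline - Time.now
            break if remaining <= 0
            job.cond.wait(job.mutex, remaining)  # sleep until done or deadline
          end
          if !job.done && job.target.alive?
            job.target.raise(Timeout::Error, "execution expired")
          end
        end
      end
    end
  end

  def timeout(sec)
    return yield if sec.nil? || sec.zero?
    job = Job.new(Thread.current, sec, Mutex.new, ConditionVariable.new, false)
    @jobs << job
    begin
      yield sec
    ensure
      # Tell the worker we finished (in time or not) so it stops watching.
      job.mutex.synchronize { job.done = true; job.cond.signal }
    end
  end
end
```

Usage mirrors the stdlib call: tw = TimeoutWorker.new; tw.timeout(5) { slow_io } behaves the same without a thread launch per invocation.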
JRuby is still 1:1 rthread:native thread, which means the timeout code above launches a native thread for every timeout call. Obviously this is less than ideal. It becomes even less ideal when you examine more closely how timeout is used in net/protocol:
from lib/ruby/1.8/net/protocol:

def read(len, dest = '', ignore_eof = false)
  LOG "reading #{len} bytes..."
  read_bytes = 0
  begin
    while read_bytes + @rbuf.size < len
      dest << (s = rbuf_consume(@rbuf.size))
      read_bytes += s.size
      rbuf_fill
    end
    dest << (s = rbuf_consume(len - read_bytes))
    read_bytes += s.size
  rescue EOFError
    raise unless ignore_eof
  end
  LOG "read #{read_bytes} bytes"
  dest
end

...

def rbuf_fill
  timeout(@read_timeout) {
    @rbuf << @io.sysread(1024)
  }
end
For those of you not as familiar with Ruby code, let me translate. The read operation performs a buffered IO read, pulling bytes into a buffer until the requested quantity can be returned. To do this, it calls rbuf_fill repeatedly to fill the buffer. rbuf_fill, in order to enforce a protocol timeout, wraps each 1024-byte read from the stream in a call to the timeout method.
Here's where my defense of Ruby ends. Let's dissect this a bit.
First off, 1024 is nowhere near large enough. If I want to do a buffered read of a larger file (like oh, say, a gem) I will end up reading it in 1024-byte chunks. For a large file, that's hundreds or potentially thousands of read calls. What exactly is the purpose of buffering at this point?
Second, because of the timeout, I am now spawning a thread--however green--for every 1024 bytes coming off the stream. Because of the inefficiency of net/protocol and timeout, we have a substantial waste of time and resources.
Now translate that to JRuby. Much of JRuby is still implemented in Ruby, which means that some calls which are native in Ruby are much slower in JRuby today. Socket IO is in that category, so doing a read every 1024 bytes greatly increases the overhead of installing a gem. Perhaps worse, JRuby implements rthreads with native threads, resulting in a native thread spinning up for every 1024 bytes read. For a 500k file, that means we're reading 500 times and launching 500 timeout threads in the process. Not exactly efficient.
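To put rough numbers on that, here's a quick stand-in (StringIO in place of the socket; the 16k chunk size is just an illustrative alternative) showing how the read count collapses with bigger chunks:

```ruby
require 'stringio'

# Count how many reads (and thus timeout threads) a 500k transfer costs
# at net/protocol's 1024-byte chunk size versus a hypothetical 16k one.
def count_reads(total, chunk)
  io = StringIO.new("x" * total)   # stand-in for the network stream
  reads = 0
  buf = ''
  until io.eof?
    buf << io.read(chunk)          # each of these would be one timeout call
    reads += 1
  end
  reads
end

puts count_reads(500_000, 1024)       # => 489 reads / timeout threads
puts count_reads(500_000, 16 * 1024)  # => 31
```

Nearly five hundred timeout threads for one gem versus a few dozen, just by changing the chunk size.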
We will likely try to submit a better timeout implementation, or a protocol implementation that reads in larger chunks (say 8k or 16k), but we have learned a valuable lesson here: rthreads allow for--and sometimes make far easier--threading scenarios we never would have attempted with native threads. For that reason, and because we'll certainly see this pattern in other libraries and applications, we will continue down the m:n path.
Coolness Still Abounds
As always, despite these obstacles and landmines, we have arrived at a huge milestone in JRuby's development. RubyGems and Ruby go hand-in-hand like Java and jarfiles. The ability to install gems is perhaps the first step toward a really usable general-purpose Ruby implementation. Look for a release of JRuby--with a full complement of Ruby libraries and RubyGems preinstalled--sometime in the next week or two.