Headius

JRuby on Rails on Amazon Elastic Beanstalk

2011-01-19T14:31:00.001-06:00

Amazon this week announced Elastic Beanstalk, a managed Apache Tomcat service for AWS. Naturally, I had to try JRuby on it.

First, the bad:

AWSEB is really slow to deploy stuff. Several times it got "stuck" and I waited for more than 30 minutes for it to recover. It did not appear to be an app issue, since the app came up just fine.
The default instance size is t1.micro. I was able to get a Rails app to boot there, but it's a very underpowered size.
It appears to start up JVMs with 256MB of memory max and 64MB of permgen. For a larger app, or one with many Rails instances, that might not be enough. For a "threadsafe" Rails app, though, it's plenty.
The default EC2 load balancer for the new Beanstalk instance is set to ping the "/" URL. If you don't rig up a / route in your Rails app (like I forgot to do) the app will come up for a few minutes and immediately get taken out.

And the good news: it works great once you get past the hassles! Here's the process that worked for my app (assuming app is already build and ready for deploy).

Preparing the app:

Ensure jruby-openssl is in Gemfile. Rails seems to want it in production mode.
Edit config/environments/production.rb to enable threadsafe mode.
`warble`

Preparing Elastic Beanstalk:

Create a new instance, specifying the .war file Warbler created above as the app to deploy
There is no step two

Once the instance has been prepared, you may want to resize it to something larger than t1.micro if it's meant to be a real app...but it should boot ok.

Have fun!

Representing Non-Unicode Text on the JVM

2011-01-05T18:04:00.004-06:00

JRuby is an implementation of Ruby, and in order to achieve the high level of compatibility we boast we've had to put in some extra work. Probably the biggest area is in management of String data.

Strings in Ruby 1.8

Ruby 1.8 did not differentiate between strings of bytes or strings of characters. A String was just an array of bytes, and the representation of those bytes was dependent on how you used them. You could do regular expression matches against non-text binary sequences (used by some to parse binary formats like PNG). You could treat them as UTF-8 text by setting global encoding variables to assume UTF-8 in literal strings. And you could use the same methods for dealing with strings of bytes you would with strings of characters...split, gsub, and so on. The lack of a character abstraction was painful, but the ability to do character-string operations against byte-strings was frequently useful.

In order to support all this, we were forced in JRuby to also represent strings as byte[]. This was not an easy decision. Java's strings are all UTF-16 internally. By moving to byte[]-based strings, we lost many benefits of being on the JVM, like built-in regular expression support, seamless passing of strings to Java methods, and easy interaction with any Java libraries that accept, return, or manipulate Java strings. We eventually had to implement our own regexp engines (or the byte[]-to-char[]-to-byte[] overhead would kill us) and calls from Ruby to Java still pay a cost to pass Strings.

But we got a lot out of it too. We would not have been able to represent binary content easily with a char[]-based string, since it would either get garbled (when Java tried to decode it) or we'd have to force the data into only the lower byte of each char, doubling the size of all strings in memory. We have some of the fastest String IO capabilities of any JVM language, since we never have to decode text. And most importantly, we've achieved an incredibly high level of compatibility with C Ruby that would have been difficult or impossible forcing String data into char[].

There's also another major benefit: we can support Ruby 1.9's "multilingualization".

Strings in Ruby 1.9

Ruby 1.9 still represents all string data as byte[] internally, but it adds an additional field to all strings: Encoding (there's also "code range", but it's merely an internal optimization).

The biggest problem with encoded text in MRI was that you never knew what a string's encoding was supposed to be; the String object itself only aggregated byte[], and if you ever ended up with mixed encodings in a given app, you'd be in trouble. Rails even introduced its own "Chars" type specifically to deal with the lack of encoding support.

In Ruby 1.9, however, Strings know their own encoding. Strings can be forced to a specific encoding or transcoded to another. IO streams are aware of (and configurable for) external and internal encodings, and there's also default external and internal encodings. And you can still deal with raw binary data in the same structure and with the same String-manipulating features. For a full discussion of encoding support in Ruby 1.9, see Yehuda Katz's excellent post on Ruby 1.9 Encodings: A Primer.

As part of JRuby 1.6, we've been working on getting much closer to 100% compatible with Ruby 1.9. Of course this has meant working on encoding support. Luckily, we had a hacking machine some years ago in Marcin Mielzynski, who implemented not only our encoding-agnostic regexp engine (a port of Oniguruma from C), but also our byte[]-based String logic and almost all of our Encoding support. The remaining work has trickled in over the subsequent years, leading up the the last few months of heavy activity on 1.9 support.

How It Works

You might find it interesting to know how all this works, since JRuby is a JVM-based language where Strings are supposed to be UTF-16 always.

First off, String, implemented by the RubyString class. RubyString aggregates an Encoding and an array of bytes, using a structure we call ByteList. ByteList represents an arbitrary array of bytes, a view into them, and an encoding. All operations against a String operate within RubyString's code against ByteLists.

IO streams, implemented by RubyIO (and subclasses) and ChannelStream/ChannelDescriptor, accept and return ByteList instances. ByteList is the text/binary currency of JRuby...our java.lang.String.

Regexp is implemented in RubyRegexp using Joni, our Java port of the Oniguruma regular expression library. Oniguruma accepts byte arrays and uses encoding-specific information at match time to know what constitutes a character in that byte array. It is the only regular expression engine on the JVM capable of dealing with encodings other than UTF-16.

The JRuby parser also ties into encoding, using it on a per-file basis to know how to encode each literal string it encounters. Literal strings in the AST are held in StrNode, which aggregates a ByteList and constructs new String instances from it.

The compiler is an interesting case. Ideally we would like all literal strings to still go into the class file's constant pool, so that they can be loaded quickly and live as part of the class metadata. In order to do this, the byte[]-based string content is forced into a char[], which is forced into a java.lang.String that goes in the constant pool. At load time, we unpack the bytes and return them to a ByteList that knows their actual encoding. Dare I claim that JRuby is the first JVM language to allow representing literal strings in encodings other than UTF-16?

What It Means For You

At the end of the day, how are you affected by all this? How is your life improved?

If you are a Ruby user, you can count on JRuby having solid support for Ruby 1.9's "M17N" strings. That may not be complete for JRuby 1.6, but we intend to take it as far as possible. JRuby 1.6 *will* have the lion's share of M17N in-place and working.

If you are a JVM user...JRuby represents the *only* way you can deal with arbitrarily-encoded text without converting it to UTF-16 Unicode. At a minimum, this means JRuby has the potential to deal with raw wire data much more efficiently than libraries that have to up-convert to UTF-16 and downconvert back to UTF-8. It may also mean encodings without complete representation in Unicode (like Japanese "emoji" characters) can *only* be losslessly processed using JRuby, since forcing them into UTF-16 would either destroy them or mangle characters. And of course no other JVM language provides JRuby's capabilities for using String-like operations against arbitrary binary data. That's gotta be worth something!

I want to also take this opportunity to again thank Marcin for his work on JRuby in the past; Tom Enebo for his suffering through encoding-related parser work the past few weeks; Yukihiro "Matz" Matsumoto for adding encoding support to Ruby; and all JRuby committers and contributors who have helped us sort out M17N for JRuby 1.6.

Flat and Graph Profiles for JRuby 1.6

2011-01-04T05:31:00.004-06:00

Sometimes it's the little things that make all the difference in the world.

For a long time, we've extolled the virtues of the amazing JVM tool ecosystem. There's literally dozens of profiling, debugging, and monitoring tools, making JRuby perhaps the best Ruby tool ecosystem you can get. But there's a surprising lack of tools for command-line use, and that's an area many Rubyists take for granted.

To help improve the situation, we recently got the ruby-debug maintainers to ship our JRuby version, so JRuby has easy-to-use command-line Ruby debugging support. You can simply "gem install ruby-debug" now, so we'll stop shipping it in JRuby 1.6.

We also shipped a basic "flat" instrumented profiler for JRuby 1.5.6. It's almost shocking how few command-line profiling tools there are available for JVM users; most require you to boot up a GUI and click a bunch of buttons to get any information at all. Even when there are tools for profiling, they're often bad at reporting results for non-Java languages like JRuby. So we decided to whip out a --profile flag that gives you a basic, flat, instrumented profile of your code.

To continue this trend, we enlisted the help of Dan Lucraft of the RedCar project to expand our profiler to include "graph" profiling results. Dan previously implemented JRuby support for the "ruby-prof" project, a native extension to C Ruby, in the form of "jruby-prof" (which you can install and use today on any recent JRuby release). He was a natural to work on the built-in profiling support.

For the uninitiated, "flat" profiles just show how much time each method body takes, possibly with downstream aggregate times and total aggregate times. This is what you usually get from built-in command-line profilers like the "hprof" profiler that ships with Hotspot/OpenJDK. Here's a "flat" profile for a simple piece of code.

~/projects/jruby ➔ jruby --profile.flat -e "def foo; 100000.times { (2 ** 200).to_s }; end; foo"
Total time: 0.99

    total        self    children       calls  method
----------------------------------------------------------------
     0.99        0.00        0.99           1  Object#foo
     0.99        0.08        0.90           1  Fixnum#times
     0.70        0.70        0.00      100000  Bignum#to_s
     0.21        0.21        0.00      100000  Fixnum#**
     0.00        0.00        0.00         145  Class#inherited
     0.00        0.00        0.00           1  Module#method_added

A "graph" profile shows the top N call stacks from your application's run, breaking them down by how much time is spent in each method. It gives you a more complete picture of where time is being spent while running your application. Here's a "graph" profile (abbreviated) for the same code.

~/projects/jruby ➔ jruby --profile.graph -e "def foo; 100000.times { (2 ** 200).to_s }; end; foo"
%total   %self    total        self    children                 calls  name
---------------------------------------------------------------------------------------------------------
100%     0%        1.00        0.00        1.00                     0  (top)
                   1.00        0.00        1.00                   1/1  Object#foo
                   0.00        0.00        0.00               145/145  Class#inherited
                   0.00        0.00        0.00                   1/1  Module#method_added
---------------------------------------------------------------------------------------------------------
                   1.00        0.00        1.00                   1/1  (top)
 99%     0%        1.00        0.00        1.00                     1  Object#foo
                   1.00        0.09        0.91                   1/1  Fixnum#times
---------------------------------------------------------------------------------------------------------
                   1.00        0.09        0.91                   1/1  Object#foo
 99%     8%        1.00        0.09        0.91                     1  Fixnum#times
                   0.70        0.70        0.00         100000/100000  Bignum#to_s
                   0.21        0.21        0.00         100000/100000  Fixnum#**
---------------------------------------------------------------------------------------------------------
                   0.70        0.70        0.00         100000/100000  Fixnum#times
 69%    69%        0.70        0.70        0.00                100000  Bignum#to_s
---------------------------------------------------------------------------------------------------------
                   0.21        0.21        0.00         100000/100000  Fixnum#times
 21%    21%        0.21        0.21        0.00                100000  Fixnum#**
---------------------------------------------------------------------------------------------------------
                   0.00        0.00        0.00               145/145  (top)
  0%     0%        0.00        0.00        0.00                   145  Class#inherited
---------------------------------------------------------------------------------------------------------
                   0.00        0.00        0.00                   1/1  (top)
  0%     0%        0.00        0.00        0.00                     1  Module#method_added

As you can see, you get a much better picture of why certain methods are taking up time and what component calls are contributing most to that time.

We haven't settled on the final command-line flags, but look for the new graph profiling (and the cleaned-up flat profile) to ship with JRuby 1.6 (real soon now!)

Improved JRuby Startup by Deferring Gem Plugins

2010-12-23T16:01:00.002-06:00

Another present for you JRubyists out there!

JRuby has had notoriously bad startup times. Not as bad as, say, IronRuby (sorry guys!), but definitely a big fat hit every time you need to run some Ruby code from the command line. Some of this overhead was related to JRuby, and we've steadily worked to improve that over the years. Some of it is due to the JVM, most commonly due to running on the "server" Hotspot VM or another JVM that does not have an interpreter (both of which start up considerably slower than Hotspot/OpenJDK's "client" mode). I've blogged tips and tricks for JRuby startup before, and these mostly apply to vanilla JRuby startup performance.

However, a large part of the overhead was not specifically due to JRuby or the JVM, but to RubyGems. RubyGems in version 1.3 added support for "plugins", whereby gems could include a specially-named file to extend the functionality of RubyGems itself. Most of these plugins added command-line tools like "gem push" for pushing a new gem to gemcutter.org (now built-in for pushing to rubygems.org). Unfortunately, the feature was originally added by having RubyGems do a full scan of all installed gems on every startup. If you only had a few gems, this was a minor problem. If you had more than a few, it became a big fat O(N) problem, where each of those N could be arbitrarily complex in themselves.

The good news is that it looks like my proposed change – making plugin scanning happen *only* when using the "gem" command – appears likely to be approved for RubyGems 1.4, due out reasonably soon.

Here's the patch and the impact to RubyGems startup times are below. The first two times are without the patch, with the first time against a "cold" filesystem. The final time is with the patch in place. In all cases, it's against my local JRuby working copy, which has around 500 gems installed.

~/projects/jruby ➔ jruby -e "t = Time.now; require 'rubygems'; puts Time.now - t"
17.09

~/projects/jruby ➔ jruby -e "t = Time.now; require 'rubygems'; puts Time.now - t"
6.959

~/projects/jruby ➔ git stash pop
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified:   lib/ruby/site_ruby/1.8/rubygems.rb
# modified:   lib/ruby/site_ruby/1.8/rubygems/gem_runner.rb
...

~/projects/jruby ➔ jruby -e "t = Time.now; require 'rubygems'; puts Time.now - t"
0.481

It's truly a shocking difference, and it's easy to see why JRuby (plus RubyGems) has had such a bad startup-time reputation.

I've already made this change locally to JRuby's copy of RubyGems, which should help any users working against JRuby master. The change will almost certainly ship in JRuby 1.6, with RCs showing up in the next couple weeks. So with this change and my JRuby startup tips, we're on the road to a much more pleasant JRuby experience.

Happy Hacking!

JRuby Finally Installs ruby-debug Gem

2010-12-23T15:31:00.002-06:00

This should be a great Christmas present for many of you.

After over three years, the "ruby-debug" gem finally installs properly on JRuby.

~/projects/jruby ➔ gem install ruby-debug
Successfully installed ruby-debug-base-0.10.4-java
Successfully installed ruby-debug-0.10.4
2 gems installed

Back in 2007, folks working on NetBeans, Eclipse, and IntelliJ support for Ruby came together to build a new version of the ruby-debug backend that would work on JRuby. They shared the effort, we JRuby guys added features they needed to do a clean port of the ruby-debug C code, and the ultimate result was a ruby-debug-base gem that isolated the platform-specific bits.

For whatever reason, the JRuby version of ruby-debug-base never got pushed as a real, live "-java" gem. This meant that you had to download ruby-debug-base-VERSION-java.gem yourself to get ruby-debug to install. To make this process easier, we even shipped ruby-debug and ruby-debug-base preinstalled in JRuby 1.5.

Unfortunately, this was only a partial answer. Many libraries and applications want to install all their dependencies clean. If one of those dependencies was ruby-debug, it would fail to install. Rails even includes special JRuby-specific lines in its default Bundler Gemfile to exclude ruby-debug when bundling on JRuby.

All that nonsense ends today. Rocky Bernstein, one of the maintainers of the ruby-debug gem, agreed to push our ruby-debug-base to the canonical rubygems.org repository. As a result, ruby-debug now installs properly on JRuby. It only took three years to get that gem pushed (by nobody's fault...I think everyone expected everyone else to follow through on it).

Merry Christmas, Happy Chanukah, Joyous Kwanza, and enjoy your holiday season!

Quick Thoughts on Oracle/Apache and the Java TCK

2010-12-23T13:43:00.003-06:00

This was going to be a reply to a friend on Twitter, when I realized it would be several tweets and I might as well put them in one place.

Grant Michaels (@grantmichaels) sent me this link to the October JCP (Java Community Process) EC (Expert Committee) meeting notes. The Apache/TCK (Technology Compatibility Kit) issue was discussed at length.

For those of you in the dark, Apache recently resigned from the JCP because of the ongoing dispute over their "Harmony" OSS (Open-Source Software) Java implementation's inability to get an unencumbered license to the Java TCK. Passing the TCK is a requirement for an implementation to officially be accepted as "Java".

I had heard about this problem from a distance, but only recently started to understand its complexity. The TCK includes FOU (Field of Use) clauses preventing TCK-tested implementations other than OpenJDK from being released as open-source. Only implementations "largely" based on OpenJDK (Open Java Development Kit, Sun's GPLed "Hotspot" VM and class libraries) are allowed to get around this requirement. Apache's Harmony, being entirely independent and Apache-licensed, does not qualify.

If that were all there is to it, Apache would not have any real grounds for complaining. But the JSPA (Java Specification Participation Agreement, PDF) requires only unencumbered specifications, reference implementations, and test kits be submitted to the JCP. This sets the stage for an ugly licensing battle that has stymied Java's progress for over three years.

I'm not going to do a tl;dr post on this like I did with Oracle/Google. Instead, just a few quick thoughts as I read through the notes.

Overview

The first thing you will notice is that this isn't a simple cut-and-dried issue. The meeting notes express Oracle's position as outwardly fearful of Harmony leading to many downstream forks, with no recourse for asserting they fulfill the requirements of the Java specification. There seems to be an implied stab at Android here, which uses Harmony's class libraries atop the Dalvik VM to implement a substantial portion (but not all) of what looks like Java SE 5. Oracle states this decision is final; Apache will not be granted an unencumbered TCK license. Oracle is not known for changing their minds.

Several EC members, including Doug Lea and Josh Bloch, point out that it's fairly clear the encumbered TCK violates the JSPA's openness clauses. Oracle refuses to comment on this "legal" matter. Doug suggests that EC members might be able to vote to move Java forward with a clearer conscience if the JSPA were amended to make the encumbered TCK "legal".

Another point brought up by several members is the frustration that they have to deal with licensing at all. They recall a golden age of the JCP where it actually voted on technical matters rather than arguing over licensing.

IBM declares they are unhappy with this decision, but even more unhappy that the Java platform has stagnated because of it for so long. IBM would eventually go on to vote "yes" to the disputed Java 7 JSR, even in the presence of the apparent JSPA violation.

Apache representative and longtime Harmony advocate Geir Magnusson also weighed in. He argued that the health of the platform would only be bolstered by allowing for many independent open-source implementations, and damaged by disallowing them. When asking for a clarification of why OpenJDK gets a free pass, Adam Messinger (Oracle) stated that he didn't want to answer a legal question, but that OpenJDK's GPL (Gnu Public License) requires reciprocity from downstream forks, reducing the damage and confusion they might cause if released publicly without full spec compliance (I'm paraphrasing based on the notes here).

Toward the end of the licensing discussion, Adam again called for all memory organizations to participate in OpenJDK. It's fairly clear from these notes and from previous announcements and discussions that Oracle intends for OpenJDK to be the "one true OSS Java", and for all comers to contribute to it. They even managed to get Apple and IBM, longtime GPL foes, to join the family. Apache doesn't do GPL, and so Apache will not contribute to or base projects off OpenJDK.

No Right Answer

Now we enter entirely into the realm of my own opinions.

At a glance, I hate Oracle for disallowing free use of the TCK. Nobody should be disallowed from implementing their own OSS Java. Harmony is a promising project that would have a promising future if the cloud of licensing, patent protection, and "compliance" could be lifted. It seems that's nearly impossible now.

I also hate to see a good project like Harmony "die" or be stuck in legal limbo. I'm sure dozens of developers have poured their hearts into Harmony, and they deserve to see it thrive.

On the other hand, I applaud Oracle for so vigorously promoting OpenJDK. OpenJDK is certainly a more mature JVM and class library than Harmony, having been Sun's official Java implementation for years and years. Bringing IBM and Apple into OpenJDK will help ensure it moves forward on all platforms of note. If you look only at OpenJDK, OSS Java is stronger now than it ever has been.

Oracle may have a point with the forking concerns. Because Apache's license is so open that anyone can create and release binary-only forks, it would in theory be possible to speckle the Java landscape with Harmony derivatives that are incompatible in subtle ways. Possible, but perhaps unlikely.

Ultimately, I feel sad about the direction Oracle has taken, but I'm bolstered by hopes that OpenJDK is going to really thrive and grow, and as a result "Java" will continue to see widespread use and adoption across a variety of platforms.

Other Ways Out

Now I go into pure speculation-mode.

A key problem with the TCK restriction is the fact that the Java specification provides patent grants only to compliant implementations. In other words, if you don't want an Oracle/Google-esque lawsuit once you have billions in the bank, you need to be compliant with the one true TCK. In order to be compliant with the TCK, you can't release your implementation as open source. The patent grant is obviously intended to ensure that third-party implementations are "really" Java.

There's a bit of a chicken-and-egg thing here. What if Harmony could pass the TCK? In theory, they may be at that point right now. Does the ability to pass the TCK but the inability to run it without tainting mean they're Java or not? If an unrelated third-party forked Harmony into "Barmony", acquired a TCK license, and proved that it passed...would that mean Harmony could be considered compliant without ever having run the TCK?

What if, as in Android, Harmony simply moved forward without claiming they were compliant? Oracle could eventually club them to death with patent bats. Perhaps such a legal battle would force the legal remifications of the JSPA violation to be addressed in court? Could Oracle survive a legal test of their violation in attempting to back up a patent suit?

What It Means For You

The ultimate question is how this affects Java developers today and going forward.

By most estimates, 99% of the worlds Java developers run on one of the standard Java implementations from Oracle, IBM, and other licensees. A massive number of them run atop Hotspot, either in an old closed-source Java 5/6 form or in an OpenJDK 6/7 form. "Nobody" runs Harmony, and so few if any day-to-day Java developers will be affected. That doesn't excuse the situation, but it does soften the actual damage caused.

Android is another peculiar case. It would be difficult for it to be tested compliant, even if there weren't FOU restrictions on doing so. It uses Harmony libraries but Dalvik VM. It is also a massive force in mobile development now, and killing it would likely put the final nail in mobile Java's coffin. Oracle has to know this. Oracle also has to know that killing Android would hand the mobile keys over to Apple and Microsoft forever. Could Android switch to using OpenJDK-based class libraries? Would that qualify it as being "largely" based on OpenJDK (noting that the class libraries are the vast majority of the code in OpenJDK)? Oracle/Google is likely to be stuck in court for a long time, while Android continues to expand into televisions and tablets along with telephones.

How about projects that build atop Java, like JRuby? Perhaps even higher a percentage of JRuby users are already running atop OpenJDK or an OpenJDK derivative like IcedTea or SoyLatte. Oracle pushing OpenJDK will only benefit those users. Ruboto (JRuby on Android) will follow whatever path Android itself ends up following, and nobody can see that future yet...but it seems unlikely Ruboto will ever die since it's unlikely Android will ever die.

In closing...I encourage everyone to read the EC notes and gather as much information as they can before claiming this is now the final "death" of Java. I also encourage everyone to contribute thoughts, clarifications, and speculation in the comments here.

Have a happy holiday!

Predator and Prey

2010-09-23T03:39:00.002-05:00

(This is a repost of an article I wrote in 2004, which I stumbled upon this evening and thought worthy of a reprint. Feel free to rip it up and offer your own commentary. I think it is still 100% valid.)

I came up with the most compelling idea for a Disney-style film the other day. (Ok, perhaps not the most compelling idea, but certainly a fair shot at one)

Over the years I've heard a number of biologists (ecologists, environmentalists, what have you) comment on (as in expound endlessly upon) something called the "Bambi Syndrome." Simply put, the "Bambi Syndrome" is brought about by cutesy, utopian images of nature, where only unexpected, amorphous entities (usually accompanied by menacing percussion or something equally non-musical) can embody "evil"; it is a view that, in all its splendor and glory, "nature" is "good," while "man" is "bad." The parallel between this viewpoint and several (all?) nature-based Disney films is apparent (although it should be said that Disney is far from being the only perpetrator of "Bambiism").

So then, you ask, if nature isn't "good", then what is it? Evil and good are purely human constructions. Truth be told, nothing that exists is innately "good" or "evil". These concepts exist only in the eye of the beholder: to the prey, the successful predator is evil; to the predator, the successful prey is evil.

It could then be considered a great disservice to continue teaching these false ideals to our children, no? This has been my opinion, and I have tried to take an approach with my own son of presenting these facts of nature in as unbiased a way as possible--whence springs the compelling idea.

Take a typical Disney movie; its clear definition of "good" and "evil" and its even clearer illustration of which roles fall into which category. This movie would begin the same. Also typically, it would be based in nature, perhaps at a very low stratus of the animal kingdom. Predator and prey would be represented by species A (the "good" prey) and B (the "evil" predator). A typical scene ensues, a contest between good and evil, predator and prey. The predator's evil nature is clearly illustrated here, but atypically, the predator wins.

Just as people in the audience are questioning their faith in Hollywood, we move up one stratum. The evil predator, returning home with the spoils of war, becomes a gentle, caring mother. She was not simply an "evil" aggressor, bent on death and destruction, but a doting, protective mother, expending her own effort, at risk of her life, to care for her childen. In this way, stratum after stratum, "evil" becomes "good", and the elaborate network that makes up our natural system becomes more recognizable for the purity, neutrality, simplicity of its form.

Finally, as you would expect in such a movie, we would arrive at the most prolific of the Great Apes: man. Illustrating that all kingdoms on earth are becoming man's prey, with as much tree-hugging, granola-chomping tripe as possible to make sure we, the lords of creation, masters of destiny, killers of all, Shiva to nature's Brahma , are shown--incontrovertably--as the only pure "evil" on earth, the movie careens ever faster toward some measure of certainty: "Ahh, now I understand the film's message."

But man is just another spoke in the wheel. We can easily flip the coin, showing mothers feeding, defending children, innocents preyed upon by murderers, hunters taking prey not for food, but for the feeding of other hungers. We do what we do not out of pure evil, but because it is our capacity to do so to further our own species, further our goals, perpetuate. But we also have a capacity no other species possesses: the ability to create our own destinies. The only true evil we encounter in a world where we nearly reign supreme is ourselves. We daily pit our most animal desires--acqusition of resources and destruction of usurpers--against our knowledge that such desires run rampant will complicate our path through history, perhaps even terminating it. Can such a machine be affected by the changing opinions of a few small components? That is the question we leave for the viewers.

The challenge in such a film would almost certainly be not overplaying the hand. No evil must ever appear to be of any different motivation than its antithesis; and man must, in the end, appear as the most schizophrenic creature on Earth. Our "evil" predatory instincts must be tempered by the "good" effects of our fear of intimate and ultimate mortality for us to continue indefinitely. In this, man has another trait not found among the animals: Our system balances on our own decisions alone. With the capacity we will soon possess to control nature completely, without fear of predators, we can only undo ourselves. The balance comes from within.

Where will the viewer lie?

I'd hope every kid was as confused as possible by then; and eventually a bit more suspicious of being told what is "good" or "evil".

My Thoughts on Oracle v Google

2010-08-15T20:53:00.022-05:00

As you've probably heard by now, Oracle has decided to file suit against Google, claiming multiple counts of infringement against Java or JVM patents and copyrights they acquired when they assimilated Sun Microsystems this past year. Since I'm unlikely to keep my mouth shut about even trivial matters, something this big obviously requires at least a couple thousand words.

Who Am I?

Any post of this nature really requires an author to identify where they stand, so their unavoidable biases can be taken with the appropriate dosage of salt. Rather than having you dig through my past and learn who and what I am, I'll just lay it out here.

I am a Java developer. I've been a Java developer since 1996 or so, when I got my first University job writing stupid little applets in this new-fangled web language. That job expanded into a web development position, also using Java, and culminated with me joining a few senior developers for a 6-month shared development gig with IBM's then-nascent Pacific Development Center in Vancouver, BC. Since then I've had a string of Java-related jobs...some as a trenches developer, some as a lead, some as "architect", but all of them heavily wrapped up in this thing called Java. And I can't say that I've ever been particularly annoyed with Java as a language or a platform. Perhaps I haven't spent enough time on other runtimes, or perhaps I've got tunnel-vision after being a Java developer for so many years. But I'd like to think that I've become "seasoned" enough as a developer to realize no platform is perfect, and the manifold benefits of the JVM and the Java platform vastly outweigh the troublesome aspects.

I am an open-source developer. In the late 90s, I worked in earnest on my first open-source project: the LiteStep desktop replacement for Windows. At the time, the LiteStep project was a loosely-confederated glob of C code and amateur C hackers. Being a Windows user at the time, I was looking to improve my situation...specifically, I had worked for years on a small application called Hack-It that exposed aspects of the win32 API normally unavailable through standard Windows UI elements, and I was interested in taking that further. LiteStep was not my creation. It had many developers before me and many after, but my small contribution to the project was an almost complete rewrite in amateur-friendly C++ and a decoupling of the core LiteStep "kernel" from the various plugin mechanisms. I was also interviewed for a Wired article on the then-new domain of "skinning" computers, desktops, applications, and so on, though none of my quotes made it into the article. After LiteStep, I fell back into mostly anonymous corporate software development, all still using Java and many open-source technologies, but not much of a visible presence in the OSS world. Then, in 2004 while working as the lead "Java EE Architect" for a multi-million-dollar US government contract, I found JRuby.

I am a JRuby developer. Since 2004 (or really since late 2005, when I started helping out in earnest) I've been partially responsible for turning JRuby from an interesting novelty project into one of the top Ruby implementations. We've become well known as one of the best-performing – if not the best-performing – Ruby implementations, even faced with increasing competition from the young upstarts. We're also increasingly popular (and perhaps the easiest path) for bringing Ruby and its many paradigm-shifting libraries and frameworks (like Rails) to Java and JVM users around the world – without them having to change platforms or leave any of their legacy code behind. Part of my interest in JRuby has been to bring Ruby to the JVM, plain and simple. I like Ruby, I like the Ruby community, and on most days I like the cockiness and enthusiasm of those community members toward trying crazy new things. But another large part of my interest in JRuby is more sinister: I want to prove to naysayers what a great platform the JVM actually is, and perhaps make them think twice about knee-jerk biases they've carried and cultivated for so many years.

You'll notice I refer to JRuby not as "it" or "she" or "he", but as "we". "We've become well known...We're also increasingly popular..." That's not an accident. There's now over five years of my efforts in JRuby, and I consider it to be as much a part of me as I am a part of it. And so because of that, I have a much deeper, emotional investment in the platform upon which JRuby rests.

I am a Java developer. I am an open-source developer. I am a JRuby developer and a Ruby fan.

I am not a lawyer.

The Facts, According to Me

These are the facts as I see them. You're free to disagree with my interpretation of the world, and I encourage you to do so in the comments, on other forums, over email, or to my face (but buy me a beer first).

On Java

The Java platform is big. Really big. You just won't believe how vastly hugely mindbogglingly big it is. And by big, I mean it's everywhere.

There are three mainstream JVMs people know about: JRockit (WebLogic's first and then Oracle's after it acquired them), Hotspot (Which came to Sun through an acquisition and eventually became OpenJDK), and J9 (IBM's own JVM, fully-licensed and with all its shots). Upon those three JVMs lives a gigantic world. If you want the details, there's numerous studies and reports about the use of Java in all manner of business, from the hippest new startups (Twitter recently switched much of their stack to the JVM) to the oldest of the old financial concerns. It's the favored choice for government server applications, the strongest not-quite-completely-Free managed runtime for open-source libraries and applications, and now with Android it's rapidly becoming one of the strongest (if not the strongest) mobile OS platform (even though Android isn't *really* Java, as I'll get into later). You may love or hate Java, but I guarantee it's part of your life in some way or another.

There are a few open-source implementations of Java. The most well-known is OpenJDK, the Hotspot JVM that Sun relicensed under the GPL and set Free into the world. There's also Apache Harmony, whose class libraries form part of Dalvik's (Android's VM) Java-compatibility layer. There's GNU Classpath, a GPL-based implementation of the Java class libraries used for the ahead-of-time Java compiler GCJ. There's JamVM, which leverages Classpath to provide a very light, very minimal, (and very simple) JVM implementation. And there's others of varying qualities and relevance like IKVM (Java for .NET), VMKit (a Java compiler atop LLVM), and so on. OpenJDK is certainly the big daddy, though, and its release as GPL guarantees we'll at least have a solid Java 6 implementation forever.

Java is not an entirely open platform, what with the now-obvious encumbrances of patents and copyrights (not to mention draconian policies toward Java's various specifications, which are often very slow to evolve due to the JCP quagmire). That's not a great state of affairs, and if nothing else you have to recognize that folks at Sun at least tried to release the platform from its shackles by chasing OpenJDK. But the process of "freeing" Java has been pretty rocky; OpenJDK itself took years to gain acceptance from OSS purists, and the choice of the GPL has meant that folks afeared of the GPL's "viral" side still had to look for other options (which is a large part of why Apache Harmony was used as part of the basis for Android). Perhaps the biggest nail in the coffin is that Sun's Java test kit, the gold standard of whether an implementation is "compliant" or not, has never been released in open-source form, ultimately binding the hands of developers who wished to build a fully-compatible open-source Java.

Java is not an entirely closed platform, either. OpenJDK was a huge step in the direction of Freeing Java, and the Java community in general has a very strong OSS ethos. There's no piece of Java software that isn't at least partially based on open-source componenents, and most Java library, framework, or application developers either initially or eventually open-source some or all of their works. Open-source development and the Java platform go hand-in-hand, and without that relationship the platform would not be where it is today. Contrast that to other popular environments like Microsoft's .NET – which has been admirably Freed through open standards, but which has not yet become synonymous with or popular for OSS development – or Apple's various platforms – which aren't based on open-standards *or* open-source, but which have managed to become many OSS developers' environment of choice...for writing or consuming non-Apple open-source software. Among the corporation-controlled runtimes, the Java platform has more OSS in its blood than all others combined...many times more.

Java is not perfect, but it's pretty darn good. Every platform has its warts. The Java platform represents a decade and a half of tradeoffs, and it's impossible in that amount of time to make everyone happy all the time. One of the big contentious items is the addition in Java 5 of parametric polymorphism as a compile-time trick without also adding VM-level support for reifying per-type specializations as .NET can do. But ask most Java developers if they'd rather have nothing at all, and you'll get mixed responses. The sad, crippled version of generics in Java 5 doesn't do everything static-typing purists want, nor does it really extend to runtime at all (making reflective introspection almost impossible), but they do provide some nice surface-level sugar for Java developers. The same can be said of many Java "features" and tradeoffs. JavaEE became an abortively complicated jumble of mistakes (tradeoffs that went bad), but even upstarts that arguably made better decisions initially have themselves graduated into chaos (I believe the Spring framework has now grown even larger than the largest Java EE conglomerate, and Microsoft's periodically reboots their blessed-framework-of-the-week, resulting in an even more disruptive environment than a slow-moving, bulky standard like JavaEE). Designing good software is hard. Designing good *big* software is exponentially harder. Designing good *big* software that pleases everyone is impossible.

Why People Hate Java

Java is second only to Microsoft's platforms for being both wildly successful and almost universally hated by the self-sure software elite. The reasons for this are manifold and complex.

First of all, back in the 90s Java started getting shoved down everyone's throat. Developers were increasingly told to investigate this new platform, since their managers and long-disconnected tech leads kept hearing how great it was from Sun Microsystems, then a big deal in server applications and hardware. So developers that were happily using other environments (many of which exist to this day) often found themselves forced to suck it up and become Java developers. Making matters worse, Java itself was designed to be a fairly limited language...or at least limited in how easily a developer could paint themselves into a corner. Many features those reluctant developers had become used to in other environments were explicitly rejected for Java on the grounds that they added too much complexity, too much confusion, and too little value to trenches developers. So people that were happily doing Perl or C++ or Smalltalk or what have you were suddenly forced into a little J-shaped box and forced to write all those same applications upon Java and the JVM at a time when both were still poorly-suited to those domains. Those folks have had a white-hot hate for anything relating to Java ever since, and many will stop at nothing to see the entire platform ejected into space.

Second, as mentioned quickly above, Java in the 90s was simply not that great a platform. It had most of the current warts (classpath issues, VM limitations, poor system-level integration, a very limited language) on top of the fact that it was slow (optimizing JVMs didn't come around until the 2000s), marketed for highly-visible, highly-fickle application domains like desktop and browser-based applications (everyone's cursed a Java app or applet at some point in their life), and still largely driven and controlled by a single company (at a time when many developers were trying to get out from under Microsoft's thumb). It wasn't until Java 1.2 that we started to get a large and diverse update to Java's core libraries. Java 1.3 was the first release to ship Hotspot, which started to get the performance monkey off our backs. Java 1.5 brought the first major changes to the Java language, all designed to aid developers in expressing what they meant in standard ways (like using type-safe enums instead of static final ints, or generics for compiler-level assurances of collection homogeneity). And Java 6, the last major version, made great strides in improving startup time, overall performance, and manageability of JVM processes. Java 7, should it ever ship, will bring new changes to the Java language like support for closures and other syntactic sugar, better system-level integration features as found in NIO.2, and the feather in the cap: VM-level support for function objects and non-standard invocation sequences via Method Handles and InvokeDynamic. But unless you've been a Java developer for the past decade, all you remember is the roaring 90s and the pain Java caused you as a developer or a user.

Third, the Java language and environment has stagnated. Given years of declining fortunes at Sun Microsystems, disagreement among JCP members about the direction the platform should go, and a year of uncertainty triggered by Sun's collapse and rescue at the hands of Oracle, it's surprising anything's managed to get done at all. Java 7 is now many years overdue; they were talking about it when I joined Sun in 2006, and hoped to have preview releases within a year. For both technical and political reasons, it's taken a long time to bring the platform to the next level, and as a result many of the truly excellent improvements have remained on the shelf (much to my dismay...we really could use them in JRuby). For fast-moving technology hipsters, that's as good as dying on the vine; you need to shift paradigms on a regular schedule or you're yesterday's news.

Update: At least one commenter also pointed out that it took a long time for Java to be "everywhere", and even today most users still need to download and install it at least once on any newly-installed OS. Notable exceptions include Mac OS X, which ships a cracker-jack Java 6 based on Hotspot, and some flavors of Linux that come with some sort of Java installed out of the box. But this was definitely a very real problem; developers were being pushed to write apps and applets in Java, and users were forced to download a multi-megabyte installer just to run them...at a time when downloading multi-megabyte software was often a very painful ordeal. That would put a bad taste in anyone's mouth.

It's because of these and similar reasons that folks like Google finally said "enough is enough," and opted to start doing their own things. On the JRuby project, we've routinely hacked around the limitations of the JVM, be they related to its piss-poor process management APIs, its cumbersome support for binding native libraries, or its stubborn reluctance to become the world's greatest dynamic language VM. I've thought on numerous occasions how awesome it would be to spin off a company that took OpenJDK and made it "right" for the kinds of development people want to do today (and I'd love to be a part of that company), but such ventures are both expensive and light on profitability. Nobody pays for platforms or runtimes...they pay for services around those platforms or runtimes, services which are often anathema to the developers of those platforms and runtimes. So it required someone "bigger" to make that happen...someone who could write off the costs of the platform by funding it in creative new ways. Someone with a massive existing investment in Java. Someone with deep pockets and an army of the best developers in the business who love nothing more than a challenge. Someone like Google.

Why Android?

(Note that a lot of this is based on what information I've managed to glean from various conversations. Clarifications or corrections are welcome.)

There's an incredibly successful mobile Java platform out there. One that boasts millions of devices from almost all the major manufacturers, in form factors ranging from crappy mid-00s clamshells to high-end smartphones. A platform with hundreds or thousands of games and applications and freely-available development tools. That platform is Java ME.

Java ME started out as an effort to bring Java back to its original roots: as a language and environment for writing embedded applications. The baseline ME profiles are pretty bare; I did some CLDC development years ago and had to implement my own buffered streams and various data structures just to get by. Even the biggest profiles are still fairly restricted, and I don't believe any of them have ever graduated beyond Java 1.3-level featuresets. So Sun did a great job of getting Java ME on devices, back when people cared about Sun...and then they let mobile Java stagnate to a terrible degree while they spent all resources trying to get people to use Java EE and trying to get Java EE to suck less. So while resources were getting poured into EE, people started to form the same opinions of mobile Java they had formed about desktop and server Java years earlier.

At the same time, Java ME was one of the few Java-related technologies that brought in money. You see, in order for handset manufacturers to ship (and boast about) Java ME support, they had to license the technology from Sun. It wasn't a huge cash cow, but it was a cow nonetheless. Java ME actually made money for Sun. So in true Sun form, they loused it up terribly.

Fast forward to a few years ago. Google, recognizing that mobile devices finally were becoming the next great technology market, decided that leaving the mobile world in the hands of proprietary platforms was a bad idea. Java ME seemed like it could be an answer, but Sun was starting to get desperate for both revenue and relevance...and they'd started to back a completely new horse-that-would-be-cow called JavaFX, which they hoped to pimp as the next great development environment for in-browser and on-device apps alike. They weren't interested in making Java ME be what Google wanted it to be.

Google decided to take the hard route: they'd fund development of a new platform, building it entirely from open-source components, and leveraging two of the best platform technologies available: Linux, for the kernel, and Java, for the runtime environment. However there was a problem with Java: it was encumbered by all sorts of patents and copyrights and specifications and restrictions. Hell, even OpenJDK itself, the most complete and competitive OSS implementation of Java, could not be customized and shipped in binary-only form by hardware manufacturers and service providers due to it being GPL. So the answer was to build a new VM, use unencumbered versions of the core Java class libraries, and basically remake the world in a new, copyright and patent-free image. Android was born.

There's many parts to Android, several of which I'm not really qualified to talk about. But the application environment that runs atop the Dalvik VM needs some explanation.

First, there's the VM. Dalvik is *not* a JVM. It doesn't run JVM bytecode, and you can't ship JVM bytecode expecting it to work on Dalvik. You must recompile it to Dalvik's own bytecode using one of the provided translation tools. This is similar to how IKVM gets Java code to run on .NET: you're not actually running a JVM, you're transforming your code into a different form so it will run on someone else's VM. So it bears repeating, lest anyone get confused: Dalvik is not a JVM...it just plays one on TV.

Second, there's the core Java class libraries. Android supports a rough (but large) subset of the Java 1.5 class libraries. That subset is large enough that projects as complicated as JRuby can basically run unmodified on Android, with very few restrictions (a notable one is the fact that since we can't generate JVM bytecode, we can't reoptimize Ruby code at runtime right now). In order to do this without licensing Sun's class libraries (as most other mainstream Java runtimes like JRockit and J9 do), Google opted to go with the not-quite-complete-but-pretty-close Apache Harmony class libraries, which had for years been developed independent of Sun or OpenJDK but never really tested against the Java compatibility kits (and there's a long and storied history behind this situation).

So by building their own non-JVM VM and using translated versions of non-Sun, non-encumbered class libraries, Google hoped to avoid (or at least blunt) the possibility that their "unofficial", "unlicensed" mobile Java platform might face a legal test. In short, they hoped to build the open mobile Java platform developers wanted without the legal and financial encumbrances of Java ME.

At first, they seemed to be on a gravy train with biscuit wheels.

Splitting Up the Pie

Sun Microsystems was not amused. A little over a year ago, when several Sun developers started to take an eager interest in Android, we were all told to back off. It wasn't yet clear whether Android stood on solid legal ground, and Sun execs didn't want egg on their face if a bunch of their own employees turned out to be supporting a platform they'd eventually have to attack. Furthermore, it was an embarrassment to see Android drawing in the same developers Sun really, really wanted to look at JavaFX or PersonalJava or whatever the latest attempt to bring developers back might be. Android actually *was* a great platform that supported existing Java developers and libraries incredibly well (without actually being a Java environment), and for the first time there was a serious contender to "standard" Java that Sun had absolutely no control over.

To make matters worse, handset manufacturers started to sign on in droves to this new non-Java ME platform, which meant all that technology licensing revenue was reaching a dead end. Nobody (including me) wanted to do Java ME development anymore. Everyone wanted to do Android development.

Now we must say one thing to Sun's credit: they didn't do what Oracle is now attempting to do. As James Gosling blogged recently, patent litigation just wasn't in Sun's blood...even if there might have been legal ground to file suit. So while we Sun employees were still quietly discouraged from looking at or talking about Android, the rest of the world took Sun's silence as carte blanche to stuff Android into everything from phones to TVs, and mobile app developers started to think there might be hope for a real competitor to Apple's iPhone. Things might have proceeded in this way indefinitely, with Android continuing to grab market share (it recently passed iPhone in raw numbers with no slowing in sight) and mindshare (Android is far more approachable than almost any other mobile development environment, especially if you're one of the millions of developers who know Java.)

And then it all started to go wrong.

The Mantle of Java Passes to Oracle

If for nothing else, Jonathan Schwartz will be remembered as the man who broke open the Sun piñata, simultaneously releasing more open-source software than any company in history and killing Sun in the process. Either Jonathan had no "step 2" or the inertia of a company built on closed-source products was too great to overcome. In either case, by spring of 2009 Sun was hemorrhaging. Many reports claim that Jonathan had started shopping Sun around to possible buyers as early as 2008, but it wasn't until 2009 that the first candidates started lining up. Initially, it was IBM, hoping to gobble up its former competitor along with the IP, patents, and copyrights they carried. That deal ultimately went south when Sun refused to consider any deal that IBM wouldn't promise to carry to completion, even in the face of regulatory roadblocks sure to come up. Many of us breathed a sigh of relief; if there's any Java company even more firmly stuck in the old world than Sun, it's IBM...and we weren't looking forward to dealing with that.

Once that deal fell through, folks like me became resigned to the fact that Sun was nearing the end of its independent life. Years of platform negligence, management incompetence, and resting on laurels had dug a hole far too deep for anyone to climb out of. Would it be Cisco, who had recently started building up an interesting new portfolio of application server hardware and virtualization software? What about VMWare, who had recently gobbled up Springsource and seemed to be making all the right moves toward a large-scale virtualized "everything cloud." Or perhaps Oracle, a long-time partner to Sun, whose software was either Java-based or widely deployed on Sun hardware and operating systems. Dear god, please don't let it be Oracle.

Don't get me wrong...Oracle's a highly successful company. They've managed to turn almost every acquisition into gold while coaxing profitability out of just about every one of their divisions. But Oracle's not a developer-oriented company (like Sun)...it's a profit-oriented company (unlike Sun, sadly), and you need to either feed the bottom line or feed others in the company that do. So when it turned out that Oracle would gobble up Sun, many of us OSS folks started to get a little nervous.

You see, many of us at Sun had been actively trying to change the perception of the platform from that of a corporate, enterprisey, closed world to that of a great VM with a great OSS ecosystem and an open-source reference implementation. Folks like Jonathan believed that by freeing Java we'd free the platform, and both the platform and the developer community would be better for it. We were half right...the OpenJDK genie is out of the bottle, and there's basically no way to put it back now (and for that, the world owes Sun a great debt). But only part of the platform was Freed...the patents and copyrights surrounding Hotspot and Java itself remained in place, carefully tucked away in the vault of a company that just didn't mount patent or copyright-driven legal attacks.

Oracle, now in control of those patents and copyrights, obviously has different plans.

The Suit

So now, after spending 4000 words of your time, we come to the meat of the article: the actual Oracle v Google suit. The full text is provided various places online, though the Software Patents Wiki has probably the best collection of related facts (though the wiki-driven discussions of the actual patents are woefully inaccurate).

The suit largely comes down to a patent-infringement battle. Oracle claims that by developing and distributing Android, Google is in violation of seven patents. There's also an amorphous copyright claim without much backing information ("Google probably stole something copyrighted so we'll list a bunch of stuff commonly stolen in that way"), so we'll skip that one today.

Before looking at the actual patents involved, I want to make one thing absolutely clear: Oracle has not already won this suit. Even after a couple days of analysis, nobody has any idea whether they *can* win such a suit, given that Google seems to have taken great pains to avoid legal entanglements when designing Android. So everybody needs to take a deep breath and let things progress as they should, and either trust that things will go the right direction or start doing your damndest to make sure they go the right direction.

With that said, let's take a peek at the patents, one by one. And as always, the "facts" here are based on my reading of the patents and my understanding of the related systems.

Protection Domains To Provide Security In A Computer System (6,125,447) and Controlling Access To A Resource (6,192,476)

The first two patents describe the Java Security Policy system, for controlling access to resources. One of the least-interesting but most-important aspects of the Java platform is its approach to security. Code under a specific classloader or thread can be forced to comply with a specific security policy by installing a security manager. These permissions control just about every aspect of the code's interaction with the JVM and with the host operating system: loading new code, reflectively accessing existing classes, accessing system-level resources like devices and filesystems, and so on. It's even easy for you to build up security policies of your own by checking for custom-named permissions and only granting them when appropriate. It's a pretty good system, and one of the reasons Java has a much stronger security track record than other runtimes that don't have pervasive security in mind from the beginning.

In order to host applications written for the Java platform, and to sandbox them in a compatible way, Android necessarily had to support the same security mechanisms. The problem here is the same problem that plagues many patents: what boils down to a fairly simple and obvious way to solve a problem (associate pieces of code with sets of permissions, don't let that code do anything outside those permissions) becomes so far-reaching that almost any reasonable *implementation* of that idea would violate these patents. In this case the '447 and '476 patents do describe mechanisms for implementing Java security policies, but even that simple implementation is very vague and would be hard to avoid with even a clean-room implementation.

Now I do not know exactly how Android implements security policies, but it's probably pretty close to what's described in these patents...since just about every implementation of security policies would be pretty close to what's described.

Method And Apparatus For Preprocessing And Packaging Class Files (5,966,702)

This is basically the patent governing the "Pack200" compression format provided as part of the JDK and used to better-compress class file archives.

Update: Alex Blewitt has posted a discussion of the Pack200 specification. He says this patent isn't nearly as comprehensive, but that it may touch upon how Pack200 works. His post is a more complete treatment of the details of the class file format and how Pack200 improves compression ratios for class archives. It also occurs to me now that this patent could be related to mobile/embedded Java too, where better compression would obviously have an enormous savings.

Java class files are filled with redundant data. For example, every class that contains code that calls PrintStream.println (as in System.out.println) contains the same "constant pool" entry identifying that method by name, a la "java/io/PrintStream.println:(Ljava/lang/String;)V". Every field lookup, class reference, literal string, or method invocation will have some sort of entry in the constant pool. Pack200 takes advantage of this fact by compressing all class files as a single unit, batching duplicate data into one place so that the actual unique class data boils down to just the unique class, method, and code structure.

The reason for having a separate compression format is because "zip" files, which includes Java's "jar" files, are notoriously bad at compressing many small files with redundant data. Because one of the features of the "zip" format is that you can easily pull a single file out, compressing all files together as a single unit prevents introducing any interdependencies between those files or a global table. This is a large part of why compression formats like "tar.gz" do a better job of compressing many small files: tar turns many files into one file, and gzip or bzip2 compress that one large file as a single unit (conversely, this is why you can't easily get a single file out of a tarball).

On Android, this is accomplished in a similar way by the "dex" tool, which in the process of translating JVM bytecode into Dalvik bytecode also localizes all duplicate class data in a single place. The general technique is standard data compression theory, so presumably the novelty lies in applying decades-old compression theory specifically to Java classfile structure.

If I've lost you at this point, we can summarize it this way: part of Oracle's suit lies in a patent for a better compression mechanism for archives containing many class files that takes advantage of redundant data in those files.

Are you laughing yet?

System And Method For Dynamic Preloading Of Classes Through Memory Space Cloning Of A Master Runtime System Process (7,426,720)

I'm not sure this patent ever saw the light of day in a mainstream JVM implementation. It describes a mechanism by which a master parent process could pre-load and pre-initialize code for a managed system, and then new processes that need to boot quickly would basically be memory-copied (plus copy-on-write friendly) "forks" of that master process, with the master maintaining overall control of those child processes through some sort of IPC.

Ignore for the moment the obvious prior art of "fork" itself as applied to pre-initializing application state for many children. Anyone who's ever used fork to initialize a heavy process or runtime to avoid the cost of reinitializing children has either violated this patent (if done since 2003) or has a compelling case for prior art (if done before 2003).

It's likely that this patent was formulated as an answer to the poor semantics of running many applications under the same JVM. Java servlets and later Java EE made it possible to consider deploying all of your company's applications in a single process, isolated by classloaders and security policies. What they never really addressed was the fact that code isn't the only thing you're sharing in this model; you're also sharing memory space, CPU time, and process resources like file descriptors. No amount of Java classloader or security trickery could make this a seamless multiapp environment, and so work like this patent hoped to find a lightweight way for all those child applications to actually live as their own processes.

On Android, this manifests in the fact that each application runs independently, and they (like most operating systems) fork off from either the kernel process or some master process.

In this case, Oracle's banking on being able to litigate with a patent for a very common application of "fork".

Method And Apparatus For Resolving Data References In Generated Code (RE38,104)

This patent, invented by James Gosling himself, basically describes a mechanism by which symbolic data references in code (e.g. Java field references) can be resolved dynamically at runtime into actual direct memory accesses, eliminating the symbolic lookup overhead. It's part of standard JIT optimization techniques, and there's a lot of references in this patent many great JIT patents and papers of the past.

Here there may actually be merit, or as much merit as can be found in a software patent to begin with. The patent itself is tiny, as most of these patents are. The techniques seem obvious to me, but perhaps they're obvious because this patent helped make them standard. I'm not qualified to judge. What I can say is that I can't imagine a VM in existence that doesn't violate the spirit – if not the letter – of this patent as well. All systems with symbolic references will seek to eliminate the symbolic references in favor of direct access. The novelty of this patent may be in doing that translation on the fly...not even at a decidedly coarse-grained per-method level, but by rewriting code while the method is actually executing.

I would guess that this is a patent filed during the development of Java's earlier JIT technologies, before systems like Hotspot came along to do a much better large-scale, cross-method job of optimization. It doesn't seem like it would be hard to debunk the novelty of the patent, or at least show prior art that makes it irrelevant.

Update: I actually found a reference in the article Dalvik Optimization and Verification with dexopt to the technique described here (about 3/4 down the page, under "Optimization"):

"The Dalvik optimizer does the following: ... For instance field get/put, replace the field index with a byte offset. ..."

But Dalvik still does this only once, before running the code (actually, at install time); not *while* running the code as described in the patent.
Interpreting Functions Utilizing A Hybrid Of Virtual And Native Machine Instructions (6,910,205)

This patent, invented by Lars Bak of V8 fame, describes a mechanism for building a "mixed mode" VM that can execute interpreted code and compiled (presumably optimized) code in the same VM process, and flip between the two over time to produce better-optimized compiled code. This describes the basic underpinnings of VMs like Hotspot, which alternate between interpreting virtual machine code and executing real machine code even within the same thread of execution (and sometimes, even branching from virtual code to real code and back within the same method body). Any other VMs that are mixed mode would probably violate this patent, so its impact could reach much farther than Android. (In a sense, even JRuby might violate this patent, though our two mixed modes are both virtual instruction sets.)

Now you might think the other mainstream JVMs would violate this patent, but they don't. Neither JRockit nor J9 have interpreters; they both go immediately to native code with various tiers of instrumentation to do the runtime profile data gathering. They iterative regenerate native code with successively more and better optimizations. Lars most recent VM, the V8 Javascript VM at the heart of Chrome, also goes straight to native code.

Now here's where it gets weird: Up until Froyo (Android 2.2) Dalvik did a once-only compilation to native code before anything started executing, which means by definition that it was not mixed-mode. And even in Froyo, I believe it still does its initial execution in native code form with instrumentation to allow subsequent compiles to do a better job. Dalvik does not have an interpreter, Dalvik does not interpret Dalvik bytecode.

Perhaps someone can explain how this patent even applies to Dalvik or Android?

Update: A couple commenters correct me here: Dalvik actually was 100% interpreted before Froyo, and is now a standard mixed-mode environment post-Froyo. So if this suit had been filed a year ago this patent might not have been applicable, but it probably is now.

Method And System for Performing Static Initialization (6,061,520)

Sigh. This patent appears to revolve completely around a mechanism by which the static initialization of arrays could be "play executed" in a preloader and then rewritten to do static initialization in one shot, or at least more efficiently than running dozens of class initializers that just construct arrays and populate them. Of all the patents, this is probably the narrowest, and the mechanism described are again not very unusual, but there's probably a good chance that the "dex" tool does something along these lines to tidy up static initializers in Android applications.

Given the "preloader" aspect of this patent, I'd surmise that it was formulated in part to simplify static initialization of code on embedded devices or in applet environments (because on servers...the boot time of static initialization is probably of little concern). Because of the much more limited nature of embedded environments (especially in 1998, when this patent was filed) it would be very beneficial to turn programmatic data initialization into a simple copy operation or a specialized virtual machine instruction. And this may be why it could apply to Android; it's another sort of embedded Java, with a preloader (either dex or the dexopt tool that jit-compiles your app on the device) and resource limitations that would warrant optimizing static initialization.

So, Does the Suit Have Merit?

I'll again reiterate that I'm not a lawyer. I'm just a Java developer with a logical mind and a penchant for debunking myths about the Java platform.

The collection of patents specified by the suit seems pretty laughable to me. If I were Google, I wouldn't be particularly worried about showing prior art for the patents in question or demonstrating how Android/Dalvik don't actually violate them. Some, like the "mixed mode" patent, don't actually seem to apply at all. It feels very much like a bunch of Sun engineers got together in a room with a bunch of lawyers and started digging for patents that Google might have violated without actually knowing much about Android or Dalvik to begin with.

But does the suit have merit? It depends if you consider baseless or over-general patents to have merit. The most substantial patent listed here is the "mixed mode" patent, and unless I'm wrong that one doesn't apply. The others are all variations on prior art, usually specialized for a Java runtime environment (and therefore with some question as to whether they can apply to a non-Java runtime environment that happens to have a translator from Java code). Having read through the suit and scanned the patents, I have to say I'm not particularly worried. But then again, I don't know what sort of magic David Boies and company might be able to pull off.

What Might Happen?

In the unlikely event of a total victory by Oracle, there's probably a lot of possible outcomes. I don't see the "death of Java" among them. There's also the possibility that Google could win a convincing victory. What might happen in each case?

The Nuclear Option

The worst case scenario would be that Android is completely destroyed, all Android handsets are confiscated by the Oracle mafia and burned in the city square, and all hope for a Free and Open Java are forever laid to rest. Long live Mono.

To understand why this won't happen we need to explore Oracle's possible motives.

As I mentioned above, Java ME actually did bring licensing revenue to Sun. There's a lot of handset manufacturers, millions of handsets, and every one put a couple cents (or a couple bucks?) in Sun's pocket. In the heady high times of Java ME, it was the only managed mobile runtime in town, with sound and graphics and standard UI elements. It wasn't always pretty, but it worked well and it was really easy to write for.

Now with Android rapidly becoming the preferred mobile and embedded Java, it's become apparent that there's no future for Java ME - or at least no future in the expanding "smart" consumer electronics business. Java ME lives on in Blackberries, some other low-end phones, in most Blu-Ray devices (BD-J is a standard for writing Java apps that run on Blu-Ray systems, utilizing one of the richer class libraries available for Java ME), and in some sub-micro devices like Ajile's AJ-200 Java-based multimedia CPU. If you want Java on a phone or in your TV, Android is taking that world by storm. That means Java ME licensing revenue is rapidly drying up.

So why wouldn't Oracle want to take a bite of the rapidly-growing Android pie? Would they turn down a portion of that revenue and instead completely destroy a very popular and successful mobile Java, or would they just strongarm a few bucks out of Google and Android handset manufacturers? Remember we're talking about a profit-driven company here. Java ME is never going to come back to smartphones, that much is certain and I don't think even Oracle could argue it. There's no profit in filing this suit just to kill Android, since it would just mean competing mobile platforms like Windows Phone, RIM, Symbian, or iOS would just canibalize their younger brother. Instead of getting a slice of the fastest-growing segment of Java developers, you'd kill off the entire segment and force those developers to non-Java, non-Oracle-friendly platforms.

Oracle may be big and evil, but they're not stupid.

Google Licensing Deal

A more likely outcome might be that Google would be forced to license the patents or pay royalties on Android revenue. I honestly believe this is the goal of this lawsuit; Oracle wants to get their foot into the door of the smartphone world, and they know they can't innovate enough to make up for the collapse of Java ME. So they're hoping that by sabre-rattling a few patents, Google will be forced (or scared) into sharing the harvest.

Given the contents of the suit and the patents, I think this one is pretty unlikely too. Much of Android and Dalvik's designs are specifically crafted to avoid Java entanglements, and I think it's unlikely if this suit goes to trial that Oracle's lawyers would be able to make a convincing argument that the patents were both novel and that they were violated by Google. But let's not put anything past either the lawyers or the US federal court system, eh?

Nothing At All

There's a good chance that either Oracle or the court will realize quickly that the case has no merit, and drop all charges. I'm obviously hoping for this one, but it's likely to take the longest of all. First, the court would need to gather all facts in the case, which could take months (especially given the highly technical nature of some of the compaints). Then there's the rebuttals of those facts, sorting out the wheat from the chaff, deciding there's not enough there to proceed, and either Oracle backs out or the court tosses the case. In the latter case, there's the possibility of appeals, and things could start to get very expensive.

Total Collapse of Software Patents

This is probably the one of the highest-profile cases involving software patents in recent years. The other would be Apple's recent suit against HTC for design elements of the iPhone. Several other bloggers and analysts have called out the possibility that this could lead to the death of software patents in general. I think that's a bit optimistic, but both Google *and* Oracle have come down officially against patents in the past (though perhaps Oracle's had a change of heart since acquiring Sun's portfolio).

As much as I'd like to see it happen, software patents probably won't be dead in the next year or two. But this might be a nail in the coffin.

What Does This Mean for Java?

Now we come to the biggest question of all: how does this suit affect the Java world, regardless of outcome?

Well it's obviously not great to have two Java heavyweights bickering like schoolchildren, and it would be positively devastating if Android were obliterated because of this. But I think the real damage will be in how the developer community perceives Java, rather than in any lasting impact on the platform itself.

Let's return to some of our facts. First off, nothing in this suit would apply to any of the three mainstream JVMs that 99% of the world's Java runs on. Hotspot and JRockit are both owned by Oracle, and J9 is subject to the Java specification's patent grant for compliant implementations. The lesson here is that Android is the first Java-like environment since Microsoft's J++ to attempt to unilaterally subset or superset the platform (with the difference in Android's case being that it doesn't claim to be a Java environment, and it may not actually need the patent grant). Other Java implementations that "follow the Rules" are in the clear, and so 99% of the world's use of Java is in the clear. Sorry, Java haters...this isn't your moment.

This certainly does some damage to the notion of open-source Java implementations, but only those that are not (or can not be) compliant with the specification. As the Apache Harmony folks know all too well, it's really hard to build a clean-room implementation of Java and expect to get the "spec compliance patent grant" if you don't actually have the tools necessary to show spec compliance. Tossing the code over to Sun to run compliance testing is a nonstarter; the actual test kit is enormous and requires a huge time investment to set up and run (and Sun/Oracle have better things to do with their time than help out a competing OSS Java implementation). If the test kit had been open-sourced before Sun foundered, there would be no problem; everyone that wanted to make an open-source java would just aim for 100% compliance with the spec and all would be well. As it stands, independently implemented (i.e. non-OpenJDK) open-source Java is a really hard thing to create, especially if you have to clean-room implement all the class libraries yourself. Android has neatly dodged this issue by letting Android just be what it is: a subset of a Java-like platform that doesn't actually run Java bytecode and doesn't use any code from OpenJDK.

How will it affect Android if this case drags on? It could certainly hurt Android's adoption by hardware manufacturers, but they're already getting such an oustanding deal on the platform that they might not even care. Android is the first platform that has the potential to unify all hardware profiles, freeing manufacturers from the drudgery of building their own OSes or licensing OSes from someone else. Hell, HTC rose from zero to Hero largely because of their backing of Android and shipping of Android devices. Are they going to back off from that platform now just because Oracle's throwing lawyerbombs at Google? Probably not.

What Does This Mean For You?

If you're a non-Android Java developer...don't lose sleep over this. The details are going to take months to play out, and regardless of the outcome you're probably not going to be affected. Be happy, do great things, and keep making the Java platform a better place.

If you're an Android developer...don't lose sleep over this. Even if things go the way of the "Nuclear Option", you've still got a lot of time to build and sell apps and improve yourself as a developer. For a bit of novelty, start considering what a migration path might look like and turn that into a nice Android-agnostic application layer, something that's largely lacking in the current Android APIs. Or explore Android development in languages like JRuby, which are based on off-platform ecosystems that will survive regardless of Android's fate. Whatever you do, don't panic and run for the hills, and don't tell your friends to panic.

If you're mad as hell about this...I sympathize. I'm personally going to do whatever I can to keep people informed and keep pushing Android, including but not limited to writing 8000-word essays with my moderately-educated analysis of the "facts". I welcome your help in that fight, and I think it's a damn good time for people that want an open Java and an open mobile platform to show their quality by standing up and letting the world know we're here.

"All that is necessary for the triumph of evil is for good men to do nothing."

Do something, and we'll get through this together.

Footnote: Java Copyrights

I'd love for someone versed in copyright law to provide a brief analysis of how the Java copyrights described (vaguely) in the lawsuit might play out in Android. Java is certainly not ignored as a concept in Android docs, tools, and libraries, but it's unclear to me whether those copyrights amount to something enforceable when it comes to Android or Dalvik.

Update: "Crazy" Bob Lee emailed me to clear up a few facts. First off, Android and OpenJDK first came out around roughly the same time, so there was never really time to consider using OpenJDK's GPL'ed class libraries in Android. Bob also claims that Dalvik's design decisions were all technical and not made to circumvent IP, but it seems impossible to me that IP, patent, and licensing issues didn't have *some* influence on those decisions. He goes on to say that Android relies on process separation to sandbox applications, rather than leveraging Java security policies (or similar mechanisms (which Bob insists are badly designed anyway, and I might agree). Finally, he believes that in the worst case scenario, Dalvik would probably only require minor modifications to address the complaints in this suit. The "nuclear option" is, according to Bob, out of the realm of possibility.

Thanks for the clarifications, Bob!

What JRuby C Extension Support Means to You

2010-07-19T16:16:00.004-05:00

As part of the Ruby Summer of Code, Tim Felgentreff has been building out C extension support for JRuby. He's already made great progress, with simple libraries like Thin and Mongrel working now and larger libraries like RMagick and Yajl starting to function. And we haven't even reached the mid-term evaluation yet. I'd say he gets an "A" so far.

I figured it was time I talked a bit about C extensions, what they mean (or don't mean) for JRuby, and how you can help.

The Promise of C Extensions

One of the "last mile" features keeping people from migrating to JRuby has been their dependence on C extensions that only work on regular Ruby. In some cases, these extensions have been written to improve performance, like the various json libraries. Some of that performance could be less of a concern under Ruby 1.9, but it's hard to claim that any implementation will be able to run Ruby as fast as C for general-purpose libraries any time soon.

However, a large number of extensions – perhaps a majority of extensions – exist only to wrap a well-known and well-trusted C library. Nokogiri, for example, wraps the excellent libxml. RMagick wraps ImageMagick. For these cases, there's no alternative on regular Ruby...it's the C library or nothing (or in the case of Nokogiri, your alternatives are only slow and buggy pure-Ruby XML libraries).

For the performance case, C extensions on JRuby don't mean a whole lot. In most cases, it would be easier and just as performant to write that code in Java, and many pure-Ruby libraries perform well enough to reduce the need for native code. In addition, there are often libraries that already do what the perf-driven extensions were written for, and it's trivial to just call those libraries directly from Ruby code.

But the library case is a bit stickier. Nokogiri does have an FFI version, but it's a maintenance headache for them and a bug report headache for us, due to the lack of a C compiler tying the two halves together. There's a pure-Java Nokogiri in progress, but building both the Ruby bindings and emulating libxml behavior takes a long time to get right. For libraries like RMagick or the native MySQL and SQLite drivers, there are basically no options on the JVM. The Google Summer of Code project RMagick4J, by Sergio Arbeo, was a monumental effort that still has a lot of work left to be done. JDBC libraries work for databases, but they provide a very different interface from the native drivers and don't support things like UNIX domain sockets.

There's a very good chance that JRuby C extension support won't perform as well as C extensions on C Ruby, but in many cases that won't matter. Where there's no equivalent library now, having something that's only 5-10x slower to call – but still runs fast and matches API – may be just fine. Think about the coarse-grained operations you feed to a MySQL or SQLite and you get the picture.

So ultimately, I think C extensions will be a good thing for JRuby, even if they only serve as a stopgap measure to help people migrate small applications over to native Java equivalents. Why should the end goal be native Java equivalents, you ask?

The Peril of C Extensions

Now that we're done with the happy, glowing discussion of how great C extension support will be, I can make a confession: I hate C extensions. No feature of C Ruby has done more to hold it back than the desire for backward compatibility with C extensions. Because they have direct pointer access, there's no easy way to build a better garbage collector or easily support multiple runtimes in the same VM, even though various research efforts have tried. I've talked with Koichi Sasada, the creator of Ruby 1.9's "YARV" VM, and there's many things he would have liked to do with YARV that he couldn't because of C extension backward compatibility.

For JRuby, supporting C extensions will limit many features that make JRuby compelling in the first place. For example, because C extensions often use a lot of global variables, you can't use them from multiple JRuby runtimes in the same process. Because they expect a Ruby-like threading model, we need to restrict concurrency when calling out from Java to C. And all the great memory tooling I've blogged about recently won't see C extensions or the libraries they call, so it introduces an unknown.

All that said, I think it's a good milestone to show that we can support C extensions, and it may make for a "better JNI" for people who really just want to write C or who simply need to wrap a native library.

How You Can Help

There's a few things I think users like you can help with.

First off, we'd love to know what extensions you are using today, so we can explore what it would take to run them under JRuby (and so we can start exploring pure-Java alternatives, too.) Post your list in the comments, and we'll see what we can come up with.

Second, anyone that knows C and the Ruby C API (like folks who work on extensions) could help us fill out bits and pieces that are missing. Set up the JRuby cext branch (I'll show you how in a moment), and try to get your extensions to build and load. Tim has already done the heavy lifting of making "gem install xyz" attempt to build the extension and "require 'xyz'" try to load the resulting native library, so you can follow the usual processes (including extconf.rb/mkmf.rb for non-gem building and testing.) If it doesn't build ok, help us figure out what's missing or incorrect. If it builds but doesn't run, help us figure out what it's doing incorrectly.

Building JRuby with C Extension Support

Like building JRuby proper, building the cext work is probably the easiest thing you'll do all day (assuming the C compiler/build/toolchain doesn't bite you.

Check out (or fork and check out) the JRuby repository from http://github.com/jruby/jruby:
```
git clone git://github.com/jruby/jruby.git
```

Switch to the "cext" branch:
```
git checkout -b cext origin/cext
```

Do a clean build of JRuby plus the cext subsystem:
```
ant clean build-jruby-cext-native
```

At this point you should have a JRuby build (run with bin/jruby) that can gem install and load native extensions.

Browsing Memory with Ruby and Java Debug Interface

2010-07-17T16:58:00.005-05:00

This is the third post in a series. The first two were on Browsing Memory the JRuby Way and Finding Leaks in Ruby Apps with Eclipse Memory Analyzer

Hello again, friends! I'm back with more exciting memory analysis tips and tricks! Ready? Here we go!

After my previous two posts, several folks asked if it's possible to do all this stuff from Ruby, rather than using Java or C-based apps shipped with the JVM. The answer is yes! Because of the maturity of the Java platform, there are standard Java APIs you can use to access all the same information the previous tools consumed. And since we're talking about JRuby, that means you have Ruby APIs you can use to access that information.

That's what I'm going to show you today.

Introducing JDI

The APIs we'll be using are part of the Java Debug Interface (JDI), a set of Java APIs for remotely inspecting a running application. It's part of the Java Platform Debugger Architecture, which also includes a C/++ API, a wire protocol, and a raw wire protocol API. Exploring those is left as an exercise for the reader...but they're also pretty cool.

We'll use the Rails app from before, inspecting it immediately after boot. JDI provides a number of ways to connect up to a running VM, using VirtualMachineManager; you can either have the debugger make the connection or the target VM make the connection, and optionally have the target VM launch the debugger or the debugger launch the target VM. For our example, we'll have the debugger attach to a target VM listening for connections.

Preparing the Target VM

The first step is to start up the application with the appropriate debugger endpoint installed. This new flag is a bit of a mouthful (and we should make a standard flag for JRuby users), but we're simply setting up a socket-based listener on port 12345, running as a server, and we don't want to suspend the JVM when the debugger connects.

jruby -J-agentlib:jdwp=transport=dt_socket,server=y,address=12345,suspend=n -J-Djruby.reify.classes=true script/server -e production

The -J-Djruby.reify.classes bit I talked about in my first post. It makes Ruby classes show up as Java classes for purposes of heap inspection.

The rest is just running the server in production mode.

As you can see, remote debugging is already baked into the JVM, which means we didn't have to write it or debug it. And that's pretty awesome.

Let's connect to our Rails process and see what we can do.

Connecting to the target VM

In order to connect to the target VM, you need to do the Java factory dance. We start with the com.sun.jdi.Bootstrap class, get a com.sun.jdi.VirtualMachineManager, and then connect to a target VM to get a com.sun.jdi.VirtualMachine object.

vmm = Bootstrap.virtual_machine_manager
sock_conn = vmm.attaching_connectors[0] # not guaranteed to be Socket
args = sock_conn.default_arguments
args['hostname].value = "localhost"
args['port'].value = "12345"
vm = sock_conn.attach(args)

Notice that I didn't dig out the socket connector explicitly here, because on my system, the first connector always appears to be the socket connector. Here's the full list for me on OS X:

➔ jruby -rjava -e "puts com.sun.jdi.Bootstrap.virtual_machine_manager.attaching_connectors
> "
[com.sun.jdi.SocketAttach (defaults: timeout=, hostname=charles-nutters-macbook-pro.local, port=),
com.sun.jdi.ProcessAttach (defaults: pid=, timeout=)]

The ProcessAttach connector there isn't as magical as it looks; all it does is query the target process to find out what transport it's using (dt_socket in our case) and then calls the right connector (e.g. SocketAttach in the case of dt_socket or SharedMemoryAttach if you use dt_shmem on Windows). In our case, we know it's listening on a socket, so we're using the SocketAttach connector directly.

The rest is pretty simple: we get the default arguments from the connector, twiddle them to have the right hostname and port number, and attach to the VM. Now we have a VirtualMachine object we can query and twiddle; we're inside the matrix.

With Great Power...

So, what can we do with this VirtualMachine object? We can:

walk all classes and objects on the heap
install breakpoints and step-debug any running code
inspect and modify the current state of any running thread, even manipulating in-flight arguments and variables
replace already-loaded classes with new definitions (such as to install custom instrumentation)

Here's the output from JRuby's ri command when we ask about VirtualMachine:

➔ ri --java com.sun.jdi.VirtualMachine
-------------------------------------- Class: com.sun.jdi.VirtualMachine
     (no description...)
------------------------------------------------------------------------


Instance methods:
-----------------
     allClasses, allThreads, canAddMethod, canBeModified,
     canForceEarlyReturn, canGetBytecodes, canGetClassFileVersion,
     canGetConstantPool, canGetCurrentContendedMonitor,
     canGetInstanceInfo, canGetMethodReturnValues,
     canGetMonitorFrameInfo, canGetMonitorInfo, canGetOwnedMonitorInfo,
     canGetSourceDebugExtension, canGetSyntheticAttribute, canPopFrames,
     canRedefineClasses, canRequestMonitorEvents,
     canRequestVMDeathEvent, canUnrestrictedlyRedefineClasses,
     canUseInstanceFilters, canUseSourceNameFilters,
     canWatchFieldAccess, canWatchFieldModification, classesByName,
     description, dispose, eventQueue, eventRequestManager, exit,
     getDefaultStratum, instanceCounts, mirrorOf, mirrorOfVoid, name,
     process, redefineClasses, resume, setDebugTraceMode,
     setDefaultStratum, suspend, toString, topLevelThreadGroups,
     version, virtualMachine

We can basically make the target VM dance any way we want, even going so far as to write our own debugger entirely in Ruby code. But that's a topic for another day. Right now, we're going to do some memory inspection.

Creating a Histogram of the Heap
The simplest heap inspection we might do is to produce a histogram of all objects on the heap. And as you might expect, this is one of the easiest things to do, because it's the first thing everyone looks for when debugging a memory issue.

classes = VM.all_classes
counts = VM.instance_counts(classes)
classes.zip(counts)

VirtualMachine.all_classes gives you a list (a java.util.List, but we make those behave mostly like a Ruby Array) of every class the JVM has loaded, including Ruby classes, JRuby core and runtime classes, and other Java classes that JRuby and the JVM use. VirtualMachine.instance_counts takes that list of classes and returns another list of instance counts. Zip the two together, and we have an array of classes and instance counts. So easy!

Let's take these two pieces and put them together in an easy-to-use class

require 'java'

module JRuby
  class Debugger
    VMM = com.sun.jdi.Bootstrap.virtual_machine_manager
    
    attr_accessor :vm
    
    def initialize(options = {})
      connectors = VMM.attaching_connectors
      if options[:port]
        connector = connectors.find {|ac| ac.name =~ /Socket/}
      elsif options[:pid]
        connector = connectors.find {|ac| ac.name =~ /Process/}
      end

      args = connector.default_arguments
      for k, v in options
        args[k.to_s].value = v.to_s
      end
      
      @vm = connector.attach(args)
    end

    # Generate a histogram of all classes in the system
    def histogram
      classes = @vm.all_classes
      counts = @vm.instance_counts(classes)
      classes.zip(counts)
    end
  end
end

I've taken the liberty of expanding the connection process to handle pids and other arguments passed in. So to get a histogram from a VM listening on localhost port 12345, we can simply do:

JRuby::Debugger.new(:hostname => 'localhost', :port => 12345).histogram

Now of course this list is going to have a lot of JRuby and Java objects that we might not be interested in, so we'll want to filter it to just the Ruby classes. On JRuby master, all the generated Ruby classes start with a package name "ruby". Unfortunately, jitted Ruby methods start with a package of "ruby.jit" right now, so we'll want to filter those out too (unless you're interested in them, of course...JRuby is an open book!)

require 'jruby_debugger'

# connect to the VM
debugr = JRuby::Debugger.new(:hostname => 'localhost', :port => 12345)
histo = debugr.histogram
# sort by count
histo.sort! {|a,b| b[1] <=> a[1]}
# filter to only user-created Ruby classes with >0 instances
histo.each do |cls,num|
  next if num == 0 || cls.name[0..4] != 'ruby.' || cls.name[5..7] == 'jit'
  puts "#{num} instances of #{cls.name[5..-1].gsub('.', '::')}"
end

If we run this short script against our Rails application, we see similar results to the previous posts (but it's cooler, because we're doing it all from Ruby!)

➔ jruby ruby_histogram.rb | head -10
11685 instances of TZInfo::TimezoneTransitionInfo
1071 instances of Gem::Version
1012 instances of Gem::Requirement
592 instances of TZInfo::TimezoneOffsetInfo
432 instances of Gem::Dependency
289 instances of Gem::Specification
142 instances of ActiveSupport::TimeZone
118 instances of TZInfo::DataTimezoneInfo
118 instances of TZInfo::DataTimezone
45 instances of Gem::Platform

Just so we're all on the same page, it's important to know what we're actually dealing with here. VirtualMachine.all_classes returns a list of com.sun.jdi.ReferenceType objects. Let's ri that.

➔ ri --java com.sun.jdi.ReferenceType
--------------------------------------- Class: com.sun.jdi.ReferenceType
     (no description...)
------------------------------------------------------------------------


Instance methods:
-----------------
     allFields, allLineLocations, allMethods, availableStrata,
     classLoader, classObject, compareTo, constantPool,
     constantPoolCount, defaultStratum, equals, failedToInitialize,
     fieldByName, fields, genericSignature, getValue, getValues,
     hashCode, instances, isAbstract, isFinal, isInitialized,
     isPackagePrivate, isPrepared, isPrivate, isProtected, isPublic,
     isStatic, isVerified, locationsOfLine, majorVersion, methods,
     methodsByName, minorVersion, modifiers, name, nestedTypes,
     signature, sourceDebugExtension, sourceName, sourceNames,
     sourcePaths, toString, virtualMachine, visibleFields,
     visibleMethods

You can see there's quite a bit more you can do with a ReferenceType. Let's try something.

Digging Deeper Into TimezoneTransitionInfo

Let's actually take some time to explore our old friend TimezoneTransitionInfo (hereafter referred to as TTI). Instead of walking all classes in the system, we'll want to just grab TTI directly. For that we use VirtualMachine.classes_by_name, which returns a list of classes on the target VM of that name. There should be only one, since we only have a single JRuby instance in our server, so we'll grab that class and request exactly one instance of it...any old instance.

tti_class = debugr.vm.classes_by_name('ruby.TZInfo.TimezoneTransitionInfo')[0]
tti_obj = tti_class.instances(1)[0]
puts tti_obj

Running this we can see we've got the reference we're looking for.

➔ jruby tti_digger.rb
instance of ruby.TZInfo.TimezoneTransitionInfo(id=2)

ReferenceType.instances returns a list (no larger than the specified size, or all instances if you specify 0) of com.sun.jdi.ObjectReference objects.

➔ ri --java com.sun.jdi.ObjectReference
------------------------------------- Class: com.sun.jdi.ObjectReference
     (no description...)
------------------------------------------------------------------------


Instance methods:
-----------------
     disableCollection, enableCollection, entryCount, equals, getValue,
     getValues, hashCode, invokeMethod, isCollected, owningThread,
     referenceType, referringObjects, setValue, toString, type,
     uniqueID, virtualMachine, waitingThreads

Among the weirder things like disabling garbage collection for this object or listing all threads waiting on this object's monitor (a la 'synchronize' in Java), we can access the object's fields through getValue and setValue.

Let's examine the instance variables TTI contains. You may recall from previous posts that all Ruby objects in JRuby store their instance variables in an array, to avoid the large memory and cpu cost of storing them in a map. We can grab a reference to that array and display its contents.

var_table_field = tti_class.field_by_name('varTable')
tti_vars = tti_obj.get_value(var_table_field)
puts "varTable: #{tti_vars}"
puts tti_vars.values.map(&:to_s)

And the new output:

➔ jruby tti_digger.rb
varTable: instance of java.lang.Object[7] (id=13)
instance of ruby.TZInfo.TimezoneOffsetInfo(id=15)
instance of ruby.TZInfo.TimezoneOffsetInfo(id=16)
instance of org.jruby.RubyFixnum(id=17)
instance of org.jruby.RubyFixnum(id=18)
instance of org.jruby.RubyNil(id=19)
instance of org.jruby.RubyNil(id=19)
instance of org.jruby.RubyNil(id=19)

Since the varTable field is a simple Object[] in Java, the reference we get to it is of type com.sun.jdi.ArrayReference.

➔ ri --java com.sun.jdi.ArrayReference
-------------------------------------- Class: com.sun.jdi.ArrayReference
     (no description...)
------------------------------------------------------------------------


Instance methods:
-----------------
     disableCollection, enableCollection, entryCount, equals, getValue,
     getValues, hashCode, invokeMethod, isCollected, length,
     owningThread, referenceType, referringObjects, setValue, setValues,
     toString, type, uniqueID, virtualMachine, waitingThreads

Of course each of these references can be further explored, but already we can see that this TTI instance has seven instance variables: two TimezoneOffsetInfo objects, two Fixnums, and three nils. But we don't have instance variable names!

Instance variable names are only stored on the object's class. There, a table of names to offsets is kept up-to-date as new instance variable names are discovered. We can access this from the TTI class reference and combine it with the variable table to get the output we want to see.

# get the metaclass object and class reference
metaclass_field = tti_class.field_by_name('metaClass')
tti_class_obj = tti_obj.get_value(metaclass_field)
tti_class_class = tti_class_obj.reference_type

# get the variable names from the metaclass object
var_names_field = tti_class_class.field_by_name('variableNames')
var_names = tti_class_obj.get_value(var_names_field)

# splice the names and values together
table = var_names.values.zip(tti_vars.values)
puts table

This looks a bit complicated, but there's actually a lot of boilerplate here we could put into a utility class. For example, the metaClass and variableNames fields are standard on all (J)Ruby objects and classes, respectively. But considering that we're actually walking a remote VM's *live* heap...this is pretty simple code.

Here's what our script outputs now:

➔ jruby tti_digger.rb
"@offset"
instance of ruby.TZInfo.TimezoneOffsetInfo(id=25)
"@previous_offset"
instance of ruby.TZInfo.TimezoneOffsetInfo(id=26)
"@numerator_or_time"
instance of org.jruby.RubyFixnum(id=27)
"@denominator"
instance of org.jruby.RubyFixnum(id=28)
"@at"
instance of org.jruby.RubyNil(id=29)
"@local_end"
instance of org.jruby.RubyNil(id=29)
"@local_start"
instance of org.jruby.RubyNil(id=29)

We could go even deeper, but I think you get the idea.

Your Turn

Here's a gist of the three scripts we've created, so you can refer to and build off of them. And of course the javadocs and ri docs will help you as well, plus everything we've done here you can do in a jirb session.

There's a lot to the JDI API, but once you've got the VirtualMachine object in hand it's pretty easy to follow. As you'd expect from any debugger API, you need to know a bit about how things work on the inside, but through the magic of JRuby it's actually possible to write most of those fancy memory and debugging tools entirely in Ruby. Perhaps this article has peaked your interest in exploring JRuby internals using JDI and you might start to write debugging tools. Perhaps we can ship a few utilities to make some of the boilerplate go away. In any case, I hope this series of articles shows that JRuby users have an amazing library of tools available to them, and you don't even have to leave your comfort zone if you don't want to.

Note: The variableNames field is a recent addition to JRuby master, so if you'd like to play with that you'll probably want to build JRuby yourself or wait for a nightly build that picks it up. But you can certainly do a lot of exploring even without that patch.

Finding Leaks in Ruby Apps with Eclipse Memory Analyzer

2010-07-12T11:35:00.028-05:00

After my post on Browsing Memory the JRuby Way, one commenter and several other folks suggested I actually show using Eclipse MAT with JRuby. So without further ado...

The Eclipse Memory Analyzer, like many Eclipse-based applications, starts up with a "for dummies" page linking to various actions.

The most interesting use of MAT is to analyze a heap dump in a bit more interactive way than with the "jhat" tool. The analysis supports the "jmap" dump format, so we'll proceed to get a jmap dump of a "leaky" Rails application.

I've added this controller to a simple application:

class LeakyController < ApplicationController
  class MyData
    def initialize(params)
      @params = params
    end
  end
  
  LEAKING_ARRAY = {}
  def index
    LEAKING_ARRAY[Time.now] = MyData.new(params)
    render :text => "There are #{LEAKING_ARRAY.size} elements now!"
  end
end

Some genius has decided to save all recent request parameters into a constant on the LeakyController, keyed by time, wrapped in a custom type, and never cleaned out. Perhaps this was done temporarily for debugging, or perhaps we have a moron on staff. Either way, we need to find this problem and fix it.

We'll run this application and crank 10000 requests through the /leaky index, so the final request should output "There are 10000 elements now!"

~ ➔ ab -n 10000 http://localhost:3000/leaky
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
...

After 10000 requests have completed, we notice this application seems to grow and grow until it maxes out the heap (JRuby, being on the JVM, automatically limits heap sizes for you). Let's start by using jmap to investigate the problem.

~ ➔ jps -l
61976 org/jruby/Main
61999 sun.tools.jps.Jps
61837

~ ➔ jmap -histo 61976 | grep " ruby\." | head -5
 37:         11685         280440  ruby.TZInfo.TimezoneTransitionInfo
 40:         10000         240000  ruby.LeakyController.MyData
133:           970          23280  ruby.Gem.Version
137:           914          21936  ruby.Gem.Requirement
170:           592          14208  ruby.TZInfo.TimezoneOffsetInfo

We can see our old friend TimezoneTransitionInfo in there, but of course we've learned to accept that one. But what's this LeakyController::MyData object we've apparently got 10000 instances of? Where are they coming from? Who's holding on to them?

At this point, we can proceed to get a memory dump and move over to MAT, or have MAT acquire and open the dump in one shot, similar to VisualVM. Let's have MAT do it for us.

Getting Our Heap Into MAT

(Caveat: While preparing this post, I discovered that the jmap tool for the current OS X Java 6 (build 1.6.0_20-b02-279-10M3065) is not properly dumping all information. As a result, many fields and objects don't show up in dump analysis tools like MAT. Fortunately, there's a way out; on OS X, you can grab Soylatte or OpenJDK builds from various sources that work properly. In my case, I'm using a local build of OpenJDK 7.)

From the File menu, we select Acquire Heap Dump.

The resulting dialog should be familiar, since it lists the same JVM processes the "jps" command listed above. (If you had to specify a specific JDK home, like me, you'll need to click the "Configure" button and set the "jdkhome" flag" for "HPROF jmap dump provider".)

We'll pick our Rails instance (pid 61976) and proceed.

MAT connects to the process, pulls a heap dump to disk, and immediately proceeds to parse and open it.

Once it has completed parsing, we're presented with a few different paths to follow.

On other days, we might be interested in doing some component-by-component browsing to look for fat objects or minor leaks, or we might want to revisit the results of previous analyses against this heap. But today, we really need to figure out this MyData leak, so we'll run the Leak Suspects Report.

Leak Suspects?

Are you kidding? A tool that can search out and report possible leaks in a running system? Yes, Virginia, there is a Santa Claus!

This is the good side of the "plague of choices" we have on the JVM. Because there's so many tools for almost every basic purpose (like the dozen – at least – memory inspection tools), tool developers have moved on to more specific needs like leak analysis. MAT is my favorite tool for leak-hunting (and it uses less memory than jhat for heap-browsing, which is great for larger dumps).

Once MAT has finished chewing on our heap, it presents a pie chart of possible leak suspects. The logic used essentially seeks out data structures whose accumulated size is large in comparison to the rest of the heap. In this case, MAT has identified three suspects that in total comprise over half of the live heap data.

Scrolling down we start to get details about these leak candidates.

So there's a Hash, a Module, and 711 Class objects in our list of suspects. The Class objects are probably just loaded classes, since the JRuby core classes and additional classes loaded from Rails and its dependent libraries will easily number in the hundreds. We'll ignore that one for now. There's also an unusually large Module taking up almost 4MB of memory. We'll come back to that.

The Hash seems like the most likely candidate. Let's expand that.

The first new block of information gives us a list of "shortest paths" to the "accumulation point", or the point at which all this potentially-leaking data is gathering. There's more to this in the actual application, but I'm showing the top of the "path" here.

At the top of this list, we see the RubyHash object originally reported as a suspect, and a tree of objects that lead to it. In this case, we go from the Hash itself into a ConcurrentHashMap (note that we're hiding nothing here; you can literally browse anything in memory) which in turn is referenced by the "constants" field of a Class. So already we know that this hash is being referenced in some class's constant table. Pretty cool, eh?

Let's make sure we've got the right Hash and not some harmless data structure inside Rails. If we scroll down a bit more, we see a listing of all the objects this Hash has accumulated. Let's see what's in there.

Ok, so it's a hashtable structure with a table of entries. Can we get more out of this?

Of course like most of these tools, just about everything is clickable. We can dive into one of the hash entries and see what's in there. Clicking on an entry gives us several new ways to display the downstream objects we've managed to aggregate. In this case, we'll just do "List Objects", and the suboption "With Outgoing References" for downstream data.

Now finally in the resulting view of this particular RubyHashEntry, we can see that our MyData object is happily tucked away inside.

Ok, so we definitely have the right data structure. Not only that, but we can see that the entry's "key" is a Time object (org.jruby.RubyTime). Let's go back to the "Shortest Paths" view and examine the ConcurrentHashMap entry that's holding this Hash object. Each entry in this hash maps a constant name to a value, so we should be able to see which constant is holding the leaking references.

(At this point you'll see the side effects of my switch to OpenJDK 7; the memory addresses have changed, but the structure is the same.)

We'll do another "List Objects" "with outgoing references" against the the HashEntry object immediately referencing our RubyHash.

And there it is! In the "key" field of the HashEntry, we see our constant name "LEAKING_ARRAY".

What About That Module?

Oh yeah, what about that Module that showed up in the leak suspects? It was responsible for almost 4MB of the heap. Let's go back and check it out.

A-ha! Eclipse MAT has flagged the Gem module as being a potential leak suspect. But why? Let's go back to the suspect report and look at the Accumulated Objects by Class table, toward the bottom.

Ok, so the Gem module eventually references nearly ~~6000~~ Gem::Specification objects, which makes up the bulk of our 3.8MB. ~~I guarantee I don't have 6000 gem versions installed.~~ Perhaps that's something that RubyGems should endeavor to fix? Perhaps we've just used JRuby and Eclipse MAT to discover either a leak or wasteful memory use in RubyGems?

Evan Phoenix pointed out that I misread the columns. It's actually 249 Specification objects, their "self" size is almost 6000 bytes, and their "retained" size is 3.8MB. But that gives me an opportunity to show off another feature of MAT: Customized Retained Set calculation.

In this case, the retained size seems a bit suspect. Could there really be 3.8MB of data kept alive by Gem::Specification objects? It seems like a bit much, to be sure, but digging through the tree of references from the Gem module down shows there's several references to classes and modules, which in turn reference constant tables, method tables, and so on. How can we filter out that extra noise?

First we'll return to the view of the Gem module (two screenshots up) by going back to leak suspect #2, expanding "Shortest Paths". The topmost RubyModule in that list is the Gem module, so we're all set to calculate a Customized Retained Set.

The resulting dialog provides a list of options through which you can specify classes or fields to ignore when calculating the retained set from a given starting point. In this case, it's simple enough to filter out org.jruby.RubyClass and org.jruby.RubyModule, so that references from Gem::Specification back into the class/module hierarchy don't get included in calculations.

Which results in a similar view to those we've seen, but with objects sorted by retained heap.

Well what the heck? It looks like it's all String data?

JRuby's String implementation is an org.jruby.RubyString object, aggregating an org.jruby.util.ByteList object, aggregating a byte array, so the top three entries there in total are essentially all String memory. The best way to investigate where they're coming from is to do "List Objects" on RubyString, but instead of "with outgoing references" we'll use "with incoming references" to show where all those Strings are coming from.

Finally we have a view that lets us hunt through all these strings and see where they're coming from. Poking at the first few shows they're stored in constant tables of the Gem module (that last RubyModule I haven't expanded in). That's probably not a big deal. But if we sort the the list of RubyString objects by their retained sizes, we get a different picture of the system.

If we dig into the *largest* String objects, they appear to be referenced by Gem::Specification instance variables! So there's probably something worth investigating here.

It's also worth noting that any Ruby application is going to have a lot of Strings in it, so this isn't all that unusual to see. But it's nice to have a tool that lets you investigate potential inefficiencies (even down to the raw bytes!), and it's nice to know that at least some of that retained data for the Gem module is "real" and not just references back into the class hierarchy.

(And I'm not convinced all those Strings really *need* to be alive...but you're welcome to take it from here!)

Your Turn

Eclipse MAT is probably one of the nicest of the free tools. In addition to object browsing, leak detection, GC root analysis, and object query language support, there's a ton of other features, both in the main distribution and available from third parties. If you're hunting for memory leaks, or just want to investigate the memory usage of your (J)Ruby application, MAT is a tool worth playing with (and as always, I hope you will blog and report your experiences!)

Browsing Memory the JRuby Way

2010-07-09T01:23:00.023-05:00

There's been a lot of fuss made lately over memory inspection and profiling tools for Ruby implementations. And it's not without reason; inspecting a Ruby application's memory profile, much less diagnosing problems, has traditionally been very difficult. At least, difficult if you don't use JRuby.

Because JRuby runs on the JVM, we benefit from the dozens of tools that have been written for the JVM. Among these tools are numerous memory inspection, profiling, and reporting tools, some built into the JDK itself. Want a heap dump? Check out the jmap (Java memory map) and jhat (Java heap analysis tool) shipped with Hotspot-based JVMs (Sun, OpenJDK). Looking for a bit more? There's the Memory Analysis Tool based on Eclipse, the YourKit memory and CPU profiling app, VisualVM, now also shipped with Hotspot JVMs...and many more. There's literally dozens of these tools, and they provide just about everything you can imagine for investigating memory.

In this post, I'll show how you can use two of these tools: VisualVM, a simple, graphical tool for exploring a running JVM; and the jmap/jhat combination, which allows you to dump the memory heap to disk for inspection offline.

Getting JRuby Prepared

All these tools work with any version of JRuby, but as part of JRuby 1.6 development I've been adding some enhancements. Specifically, I've made some modifications that allow Ruby objects to show up side-by-side with Java objects in memory profiles. A little explanation is in order.

In JRuby, all the core classes are represented by "native" Java classes. Object is represented by org.jruby.RubyObject, String is org.jruby.RubyString, and so on. Normally, if you extend one of the core classes, we don't actually create a new "native" class to represent it; instead, all user-created classes that extend Object simply show up as RubyObject in memory. This is still incredibly useful; you can look into RubyObject and see the metaClass field, which indicates the actual Ruby type.

Let's see what that looks like, so we know where we're starting from. We'll run a simple script that creates a custom class, instantiates and saves 10000 instances of it, and then sleeps.

~/projects/jruby ➔ cat foo_heap_example.rb 
class Foo
end

ary = []
10000.times { ary << Foo.new }

puts "ready for analysis!"
sleep

~/projects/jruby ➔ jruby foo_heap_example.rb 
ready for analysis!

So we have our test subject ready to go. To use the jmap tool, we need the pid of this process. Of course we can use the usual shell tricks to get it, but the JDK comes with a nice tool for finding all JVM pids active on the system: jps

~/projects/jruby ➔ jps -l
52862 sun.tools.jps.Jps
52857 org/jruby/Main
48716 com.sun.enterprise.glassfish.bootstrap.ASMain

From this, you can see I have three JVMs running on my system right now: jps itself; our JRuby instance; and a GlassFish server I used for testing earlier today. We're interested in the JRuby instance, pid 52857. Let's see what jmap can do with that.

~/projects/jruby ➔ jmap
Usage:
    jmap [option] <pid>
        (to connect to running process)
    jmap [option] <executable <core>
        (to connect to a core file)
    jmap [option] [server_id@]<remote server IP or hostname>
        (to connect to remote debug server)

where <option> is one of:
    <none>               to print same info as Solaris pmap
    -heap                to print java heap summary
    -histo[:live]        to print histogram of java object heap; if the "live"
                         suboption is specified, only count live objects
    -permstat            to print permanent generation statistics
    -finalizerinfo       to print information on objects awaiting finalization
    -dump:<dump-options> to dump java heap in hprof binary format
                         dump-options:
                           live         dump only live objects; if not specified,
                                        all objects in the heap are dumped.
                           format=b     binary format
                           file=<file>  dump heap to <file>
                         Example: jmap -dump:live,format=b,file=heap.bin <pid>
    -F                   force. Use with -dump:<dump-options> <pid> or -histo
                         to force a heap dump or histogram when <pid> does not
                         respond. The "live" suboption is not supported
                         in this mode.
    -h | -help           to print this help message
    -J<flag>             to pass <flag> directly to the runtime system

<

The simplest option here is -histo, to print out a histogram of the objects on the heap. Let's run that against our JRuby instance.

~/projects/jruby ➔ jmap -histo:live 52857

 num     #instances         #bytes  class name
----------------------------------------------
   1:         22677        3192816  <constMethodKlass>
   2:         22677        1816952  <methodKlass>
   3:         35089        1492992  <symbolKlass>
   4:          2860        1389352  <instanceKlassKlass>
   5:          2860        1193536  <constantPoolKlass>
   6:          2798         739264  <constantPoolCacheKlass>
   7:          5861         465408  [B
   8:          5399         298120  [C
   9:          3042         292032  java.lang.Class
  10:          4037         261712  [S
  11:         10002         240048  org.jruby.RubyObject
  12:          3994         179928  [[I
  13:          5474         131376  java.lang.String
  14:          1661          95912  [I
...

The resulting output is a listing of literally every object in the system...not just Ruby objects even! The value of this should be apparent; not only can you start to investigate the memory overhead of code you've written, you'll also be able to investigate the memory overhead of every library and every piece of code running in the same process, right down to byte arrays (the "[B" above) and "native" Java strings ("java.lang.String" above). And so far we haven't had to do anything special to JRuby. Nice, eh?

So, back to the matter at hand: the Foo class from our example. Where is it?

Well, the answer is that it's right there; 10000 of those 10002 org.jruby.RubyObject instances are our Foo objects; the other two are probably objects constructed for JRuby runtime purposes. But obviously, there's nothing in this output that tells us how to find our Foo instances. This is what I'm remedying in JRuby 1.6.

On JRuby master, there's now a flag you can pass that will stand up a JVM class for every user-created Ruby class. Among the many benefits of doing this, we also get a more useful profile. Let's see how to use the flag (which will either be default or very easy to access by the time we release JRuby 1.6).

~/projects/jruby ➔ jruby -J-Djruby.reify.classes=true foo_heap_example.rb 
ready for analysis!

If we run jmap against this new instance, we see a more interesting result.

 num     #instances         #bytes  class name
----------------------------------------------
   1:         22677        3192816  <constMethodKlass>
   2:         22677        1816952  <methodKlass>
   3:         35089        1492992  <symbolKlass>
   4:          2860        1389352  <instanceKlassKlass>
   5:          2860        1193536  <constantPoolKlass>
   6:          2798         739264  <constantPoolCacheKlass>
   7:          5863         465456  [B
   8:          5401         298208  [C
   9:          3042         292032  java.lang.Class
  10:          4037         261712  [S
  11:         10000         240000  ruby.Foo
  12:          3994         179928  [[I
  13:          5476         131424  java.lang.String
  14:          1661          95912  [I

A-ha! There's our Foo instances! The "reify classes" option generates a JVM class of the same name as the Ruby class, prefixed by "ruby." to separate it from other JVM classes. Now we can start to see the real power of the tools, and we're just at the beginning. Let's see what a simple Rails application looks like.

~/projects/jruby ➔ jmap -histo:live 52926 | grep " ruby."
  29:         11685         280440  ruby.TZInfo.TimezoneTransitionInfo
  97:           970          23280  ruby.Gem.Version
  98:           914          21936  ruby.Gem.Requirement
 122:           592          14208  ruby.TZInfo.TimezoneOffsetInfo
 138:           382           9168  ruby.Gem.Dependency
 159:           265           6360  ruby.Gem.Specification
 201:           142           3408  ruby.ActiveSupport.TimeZone
 205:           118           2832  ruby.TZInfo.DataTimezoneInfo
 206:           118           2832  ruby.TZInfo.DataTimezone
 273:            41            984  ruby.Gem.Platform
 383:            14            336  ruby.Mime.Type
 403:            13            312  ruby.Set
 467:             8            192  ruby.ActionController.MiddlewareStack.Middleware
 476:             8            192  ruby.ActionView.Template
 487:             7            168  ruby.ActionController.Routing.DividerSegment
 508:             6            144  ruby.TZInfo.LinkedTimezoneInfo
 523:             6            144  ruby.TZInfo.LinkedTimezone
 810:             4             96  ruby.ActionController.Routing.DynamicSegment
2291:             2             48  ruby.ActionController.Routing.Route
2292:             2             48  ruby.I18n.Config
2293:             2             48  ruby.ActiveSupport.Deprecation.DeprecatedConstantProxy
2298:             2             48  ruby.ActionController.Routing.ControllerSegment
...

This time I've opted to grep out just the "ruby." items in the histogram, and the results are pretty impressive! We can see the baffling fact that there's 970 instance of Gem::Version, using at least 23280 bytes of memory. We can see the even more depressing fact that there's 11685 live instances of TZInfo::TimezoneTransitionInfo, using at least 280440 bytes.

Now that we're getting useful data, let's look at the first of our tools in more detail: jmap and jhat.

jmap and jhat

As you might guess, I do a lot of profiling in the process of developing JRuby. I've used probably a dozen different tools at different times. But the first tool I always reach for is the jmap/jhat combination.

You've seen the simple case of using jmap above, generating a histogram of the live heap. Let's take a look at an offline heap dump.

~/projects/jruby ➔ jmap -dump:live,format=b,file=heap.bin 52926
Dumping heap to /Users/headius/projects/jruby/heap.bin ...
Heap dump file created

That's how easy it is! The binary dump in heap.bin is supported by several tools: jhat (obviously), VisualVM, the Eclipse Memory Analysis Tool, and others. It's not officially a "standard" format, but it hasn't changed in a long time. Let's have a look at jhat options.

~/projects/jruby ➔ jhat
ERROR: No arguments supplied
Usage:  jhat [-stack <bool>] [-refs <bool>] [-port <port>] [-baseline <file>] [-debug <int>] [-version] [-h|-help] <file>

 -J<flag>          Pass <flag> directly to the runtime system. For
     example, -J-mx512m to use a maximum heap size of 512MB
 -stack false:     Turn off tracking object allocation call stack.
 -refs false:      Turn off tracking of references to objects
 -port <port>:     Set the port for the HTTP server.  Defaults to 7000
 -exclude <file>:  Specify a file that lists data members that should
     be excluded from the reachableFrom query.
 -baseline <file>: Specify a baseline object dump.  Objects in
     both heap dumps with the same ID and same class will
     be marked as not being "new".
 -debug <int>:     Set debug level.
       0:  No debug output
       1:  Debug hprof file parsing
       2:  Debug hprof file parsing, no server
 -version          Report version number
 -h|-help          Print this help and exit
 <file>            The file to read

For a dump file that contains multiple heap dumps,
you may specify which dump in the file
by appending "#<number>" to the file name, i.e. "foo.hprof#3".

All boolean options default to "true"

Generally you can just point jhat at a heap dump and away it goes. Occasionally if the heap is large, you may need to use the -J option to increase the maximum heap size of the JVM jhat runs in. Since we're running a Rails app, we'll bump the heap up a little bit.

~/projects/jruby ➔ jhat -J-Xmx200M heap.bin
Reading from heap.bin...
Dump file created Fri Jul 09 02:07:46 CDT 2010
Snapshot read, resolving...
Resolving 604115 objects...
[much verbose logging elided for brevity]

Chasing references, expect 120 dots........................................................................................................................
Eliminating duplicate references........................................................................................................................
Snapshot resolved.
Started HTTP server on port 7000
Server is ready.

"Server is ready"? Damn you Java people! Does everything have to be a server with you?

In this case, it's actually an incredibly useful tool. jhat starts up a small web application on port 7000 that allows you to click through the dump file. Let's see what that looks like.

Here's the front page of the tool. We see a listing of all JVM classes in the system. If you scroll to the bottom, there's a few more general functions.

Let's go with what we know and view the heap histogram again.

Here we can see that there's lots of objects taking up memory, and they're a mix of JVM-native types, JRuby implementation classes, and actual Ruby classes. In fact, here we can see our friend TZInfo::TimezoneTransitionInfo again. Let's click through.

Pretty mundane stuff so far; basically just information about the class itself. But you see at the bottom of this screenshot that we can go from here to viewing all instances of TimezoneTransitionInfo. Let's try that.

Ahh, that's more like it! Now we can see that there's a heck of a lot of these things floating around. Let's investigate a bit more and click through the first instance.

Now this is some cool stuff!

We can see that the JVM class generated for TimezoneTransitionInfo has three fields: metaClass, which points at the Ruby Class object; varTable, which is an array of Object references used for instance variables and other "internal" variables; and a flags field containing runtime flags for the object, like whether it's frozen, tainted, and so on. We can see that this object has no special flags set, and we can dig deeper into those fields if we like. We'll skip that today.

Moving further down, we see a few more amazing links. First, there's a list of all references to this object. Ahh, now we can start to investigate why they're staying in memory, even though we're not using them. We can even have jhat show us the full chains of references keeping these objects alive; a series of objects leading all the way back to one "rooted" by a thread or by global JVM state. And we can explore the other direction as well, walking all objects reachable from this one.

This is only a small part of what you can do with jmap and jhat, and they're so simple to use it feels almost criminal. But what if we want to inspect an application while it's running? Dumping heaps and analyzing them offline can tell you much of the story, but sometimes you just want to see the objects coming and going yourself. Let's move on to VisualVM.

VisualVM

VisualVM spawned out of the NetBeans profiling tools. One of the biggest complaints about the JVMs of old were that all the built-in tooling seemed to be designed for JVM engineers alone. Because Sun had the foresight to build and own their own IDE and related modules, it eventually became a natural fit to pull out the profiling tools for use by everyone. And so VisualVM was born.

On most systems with Java 6 installed, you should have a "jvisualvm" command. Let's run it now.

When you start up VisualVM, you're presented with a list of running JVMs, similar to using the 'jps' command. You can also connect to remote machines, browse offline heap and core dump files, and look through memory and CPU profiling snapshots from previous runs. Today, we'll just open up our running Rails app and see what we can see.

VisualVM connects to the running process and brings up a basic information pane with process information, JVM information, and so on. We're interested in monitoring heap usage, so let's move to the "Monitor" tab.

Already we're getting some useful information. This view shows CPU usage (currently zero, since it's an idle Rails app), Heap usage over time, and the number of JVM classes and threads that are active. We can trigger a full GC, if we'd like to tidy things up before we start poking around. But most importantly, we can do the jmap/jhat dance in one step, by clicking the Heap Dump button. Tantalizing, isn't it?

Initially, we see a basic summary of the heap: total size, number of classes and GC roots, and so on. We're looking for our friend TimezoneTransitionInfo, so let's look for it in the "Classes" pane.

Ahh, there it is, just a little ways down the list. The counts are as we expect, so let's double-click and dig a bit deeper.

Here we have a lot of the same information about object instances that we did with jhat, but presented in a much richer format. Almost everything is active; you can jump around the heap and do analysis that would take a lot of manual work very easily. Let's try another tool: the Retained Size calculator.

Because our JVM tools see all objects equally, the reported size for a Ruby object on the heap is only part of the story. There's also the variable table, the object's instance variables, and objects they reference to consider. Let's jump to a different object now, Gem::Version.

We don't want to have to scroll through the list of classes to find ruby.Gem.Version, so let's make use of the Object Query Language console. With the OQL console, you can write SQL-like queries to retrieve listings of objects in the heap. We'll search for all instances of ruby.Gem.Version.

The query runs and we get a listing of Gem::Version objects. Let's dig deeper and see how much retained memory each Version object is keeping alive.

Clicking on the "Compute Retained Sizes" link in the "Instances" pane prompts us with this dialog. We're tough...we can take it.

Reticulating splines...

So it looks like each of the Version objects take from 125 to 190 bytes for a total of 19400 bytes, most of which is from the variable table. What's in there?

Ahh...looks like there's a String and an Array. And of course we can poke around the heap ad infinatum, into and out of "native" JRuby and JVM classes, and truly get a complete picture of what our running applications look like. Now you're playing with power.

Your Turn

This is obviously only the tip of the iceberg. Tools like Eclipse Memory Analysis Tool include features for detecting leaks; VisualVM and NetBeans both allow you to turn on allocation tracing, to show where in your code all those objects are being created. There's tools for monitoring live GC behavior, and many of these tools even allow you to dig into a running heap and modify live objects. If you can dream it, there's a tool that can do it. And you get all that for free by using JRuby.

If you'd like to play with this, it all works with JRuby 1.5.1 but you won't get the nice JVM classes for Ruby classes. For that, you can pull and build JRuby master, download a 1.6.0.dev snapshot, or just wait for JRuby 1.6. And if you do play with these or other tools, I hope you'll let us know and blog about your experience!

In the future, I'll try to show some of the other tools plus some of the CPU profiling capabilities they bring to the table. For now, rest assured that if you're using JRuby, you really do have the best tools available to you.

My Short List of Key Missing JVM Features

2010-06-16T17:44:00.002-05:00

I mused today on Twitter that there's just a few small things that the JVM/JDK need to become a truly awesome platform for all sorts of development. Since so many people asked for more details, I'm posting a quick list here. There's obviously other things, but these are the ones on my mind today.

Cold Performance

Current JVMs start up pretty fast, and there's changes coming in Hotspot in Java 7 that will make them even better. Usually this comes from combinations of pre-verifying bytecode (or providing verification hints), sharing class data across processes, and run-of-the-mill tweaks to make the loading and linking processes more efficient. But for many apps, this doesn't do anything to solve the biggest startup hit of all: cold execution performance. Because the JVM doesn't save off the jitted products of each run, it must start "cold" every time, running everything in the bytecode interpreter until it gets hot enough to compile. Even on JVMs that don't have an interpreter, the initial cost of compiling everything to not-particularly-optimized assembly also causes a major startup hit (try running command-line stuff on JRockit or J9).

There's a few things people have suggested, and they're all hard:

Tiered compilation - compile earlier using the fastest-possible, least-optimizing compiler, but decorate the compiled code with appropriate profiling logic to do a better job later. Hotspot in Java 7 may ship a tiered compiler, but there have been some resource setbacks that delayed its development.
Save off compilation or optimization artifacts - this is theoretically possible, but the deeper you go the harder it is to save it. Usually the in-memory results of optimization and compilation depend on the layout of things...in memory. Saving them to disk means you need to scrub out anything that might be different in a new process like memory addresses and class identities. But .NET can do this, though it largely *just* does static compilation. Happy medium?
Keep a JVM process running and toss it new work. We do this in JRuby with the Nailgun library, but it has some problems. First off, it can leave various aspects of the JVM in a dirty state, like system properties and memory footprint. Second, it can't kill off rogue threads that don't terminate, so they can collect over time. And third...it's not actually running at the console, so a lot of console things you'd do normally don't work.

This is probably the biggest unsolvable problem for JRuby right now, and the one we most often have to apologize for. JRuby is fast...at times, very fast...and getting faster every day. But not during the first 5 seconds, and so everyone gets the same bad impression.

Better Console/Terminal Support

There's endless blogs out there complaining about how the standard IO streams you get from the JVM are crippled in various ways. You can't select on them, for example, which is the source of a few unfixable bugs in JRuby. You can't pass them along to subprocesses, which is perhaps more a failing of the process-launching APIs in the JDK than standard IO itself. There's no direct terminal support in any of Java's APIs, so people end up yanking in libraries like jline just to support line editing. If the JDK shipped with some nice terminal and process APIs, a lot of the hassles developers have writing command-line tools in Java would melt away.

There's some light at the end of the tunnel. NIO2, scheduled to be part of Java 7, will bring better process launching APIs (with inherited standard IO streams, if you desire), a broader range of selectable channels, and much more. Hopefully it will be enough.

Fix the Busted APIs

JDBC is broken. Why? Because you have to register your driver in a global hard-referencing hash, and have to unregister it from the same classloader or it will leak. That means that if you're loading JDBC drivers from within a webapp or EE application, *your entire application remains in memory* because the driver references it and that map references the driver. This is the primary reason why most Java web application servers leak memory on undeploy, and it's another "unfixable" from JRuby's perspective.

Object serialization is broken. Why? Because it plays all sorts of tricks to get your classloader, reflectively access fields (if you're going to reflectively access them anyway, why not just break encapsulation if security allows it), and construct object instances without allowing you the opportunity to initialize them appropriately yourself. You have to provide no-arg constructors, have to un-final fields so they can be set up outside of construction, and heaven forbid you use default serialization: it's dead slow.

Reflection is too slow and there's no way around it. Not only do you end up calling through many extra levels of logic for reflective invocation, you have to box your argument lists, box your numerics, and wrap everything in exception-handling. And it doesn't have to be this way. The invokedynamic work brings along with it method handles, which are fast, direct pointers to methods. This should have been added long ago, but thankfully it's on the way in Java 7. Until then, projects like JRuby will have to continue eating the cost of reflection...or generate method handles by hand. We do both.

Regular expressions are broken. Why? Because simple alternations can blow the Java stack when fed especially large input. The current Sun-created regex implementation recurses for things like alternation, making it easy for it to fail to match on large input. The problem is so bad that we've actually switched regular expression engines in JRuby *four times*, including two implementations we wrote ourselves. Nobody can say we haven't bled for our users.

And there's numerous other examples. Some are relics of Java 1.0 that never got corrected (because old APIs don't die, they just get deprecated...or ignored). Some are relics of the idea that gigantic monolithic servers hosting dozens of apps (and leaking memory when they undeploy, or else contending for basic resources that separate processes would not) are a good idea, when in actuality running multiple JVMs that each only host one or a few apps works far better. Making a real effort to smooth these bad APIs would go a long way.

Better Support for Native Libraries and POSIX Features

As of Java 6, there's still no support for working with symlinks and only limited support for setting file permissions. Process launching is absolutely terrible. You can't select on all channels...only on sockets. If you want to use a native library, you have to write JNI code to do it, even though there are libraries like JNA and JFFI in the wild that do an outstanding job of dynamically loading and binding those libraries.

Missing the POSIXy features is basically inexcusable today. Most of the system-level APIs in the JDK are still based on a lowest common denominator somewhere near Windows 95, even though all modern operating systems provide at least a substantial subset of those APIs. NIO2 will bring many improvements, but it's almost certain that some parts of POSIX won't be exposed, either because there's not enough resources to spec out the Java APIs for them or because they end up being too system-specific.

As for loading native libraries...this is again something that should have been rolled into the JDK a long time ago. Many people will cry foul..."pure Java!" they'll shout...and I agree. But there are times when some functionality simply doesn't exist in a Java library, or doesn't scale well as an out-of-process call. For these times, you just have to use the native library...and the barrier to entry for doing that on the Java platform is just too high. Rolle JNA or JFFI or something similar into the JDK, so we grown-ups can choose when we want to use native code.

The Punchline

The punchline is that for most of these things, we've solved or worked around them in JRuby...at *great expense*. I'd go so far as to say we've done more to work around the JVM and the JDK's lackings than any other project. We've gone out of our way to improve startup by any means possible. We ship beautiful, solidly-performing libraries for binding native libs without writing a line of C code. We generate a large amount of code at compile and runtime to avoid using reflection as much. We maintain and ship native POSIX layers for a dozen platforms. We've (i.e. one of our champions, Marcin Mielzynski) implemented our own regular expression engine (i.e. a port of Oniguruma). We've pulled every trick in the book to get process launching to work nicely and behave like Ruby users expect. And so on and so forth.

But it's not sustainable. We can't continue to patch around the JVM and JDK forever, even if we've done a great job so far. Hopefully this will serve as a wake-up call for JVM and JDK implementers around the world: If you don't want the Java platform to be a server-only, large-app-only, long-running-only, headless-only world...it's time to fix these things. I'm standing by to help coordinate those efforts :)

Restful Services in Ruby using JRuby and Jersey

2010-06-01T18:09:00.003-05:00

There's lots of ways to present RESTful web services these days, and REST has obviously become the new "it's IPC no it's not" hotness. And of course Rubyists have been helping to lead the way, building restfulness into just about everything they write. Rails itself is built around REST, with most controllers doubling as RESTful interfaces, and it even provides extra tools to help you transparently make RESTful calls from your application. If you're doing RESTful services for a typical Ruby application, Rails is the way to go (and even if you're not using Ruby in general...you should be considering JRuby + Rails).

In the Java world the options aren't quite as clear, but one API has at least attempted to standardize the idea of doing RESTful services on the Java platform: JSR-311, otherwise known as JAX-RS.

JAX-RS in theory makes it easy for you to simply mark up a piece of Java code with annotations and have it automatically be presented as a RESTful service. Of the available options, it may be the simplest, quickest way to get a Java-based service published and running.

So I figured I'd try to use it from JRuby, and when doing it in Ruby it's actually surprisingly clean, even compared to Ruby options.

The Service

I followed the Jersey Getting Started tutorial using Ruby for everything (and not using Maven in this case).

My version of their HelloWorldResource looks like this in Ruby:

require 'java'java_import 'javax.ws.rs.Path'
java_import 'javax.ws.rs.GET'
java_import 'javax.ws.rs.Produces'

java_package 'com.headius.demo.jersey'
java_annotation 'Path("/helloworld")'
class HelloWorld
java_annotation 'GET'
java_annotation 'Produces("text/plain")'
def cliched_message
 "Hello World"
end
end

Notice that we're using the new features in JRuby 1.5 for producing "real" Java classes: java_package to specify a target package for the Java class, java_annotation to specify class and method annotations.

We compile it using jrubyc from JRuby 1.5 like this:

~/projects/jruby ➔ jrubyc -c ../jersey-archive-1.2/lib/jsr311-api-1.1.1.jar --javac restful_service.rb
Generating Java class HelloWorld to /Users/headius/projects/jruby/com/headius/demo/jersey/HelloWorld.java
javac  -d /Users/headius/projects/jruby -cp /Users/headius/projects/jruby/lib/jruby.jar:../jersey-archive-1.2/lib/jsr311-api-1.1.1.jar /Users/headius/projects/jruby/com/headius/demo/jersey/HelloWorld.java

The new --java(c) flags in jrubyc examine your source for any classes, spitting out .java source for each one in turn. Along the way, if you have marked it up with signatures, annotations, imports, and so on, it will emit those into the Java source as well. If you specify --javac (as opposed to --java), it will also compile the resulting sources for you with jruby and your user-specified jars in the classpath, as I've done here.

The result looks like this to Java:

~/projects/jruby ➔ javap com.headius.demo.jersey.HelloWorld
Compiled from "HelloWorld.java"
public class com.headius.demo.jersey.HelloWorld extends org.jruby.RubyObject{
 public static org.jruby.runtime.builtin.IRubyObject __allocate__(org.jruby.Ruby, org.jruby.RubyClass);
 public com.headius.demo.jersey.HelloWorld();
 public java.lang.Object cliched_message();
 static {};
}

Under the covers, this class will load in the source of our restful_service.rb file and wire up all the Ruby and Java pieces so that both sides see HelloWorld as the in-memory representation of the Ruby HelloWorld class. Method calls are dispatched to the Ruby code, constructors dispatch to initialize, and so on. It's truly living in both worlds.

With the service in hand, we now need a server script to start it up.

The Server Script

Continuing with the tutorial, I've taken their simple Java-based server script and ported it directly to Ruby:

require 'java'
java_import com.sun.jersey.api.container.grizzly.GrizzlyWebContainerFactory

base_uri = "http://localhost:9998/"
init_params = {
"com.sun.jersey.config.property.packages" => "com.headius.demo.jersey"
}

puts "Starting grizzly"
thread_selector = GrizzlyWebContainerFactory.create(
 base_uri, init_params.to_java)

puts <<EOS
Jersey app started with WADL available at #{base_uri}application.wadl
Try out #{base_uri}helloworld
Hit enter to stop it...
EOS

gets

thread_selector.stop_endpoint

exit(0)

It's somewhat cleaner and shorter, but it wasn't particularly large to begin with. At any rate, it shows how simple it is to launch a Grizzly server and how nice annotation-based APIs can be for auto-configuring our JAX-RS service.

The CLASSPATH

Ahh CLASSPATH. You are so maligned when all you hope to do is make it explicit where libraries are coming from. The world should learn from your successes and your failures.

There's five jars required for this Jersey example to run. I've tossed them into my CLASSPATH env var, but you're free to do it however you like

jersey-core-1.2.jar
jersey-server-1.2.jar
jsr311-api-1.1.1.jar
asm-3.1.jar
grizzly-servlet-webserver-1.9.9.jar

The first four are available in the jersey-archive download, and you can fetch the Grizzly jar from Maven or other places.

Testing It Out

The lovely bit about this is that it's essentially a four-step process to do the entire thing: write and compile the service, write the server script, set up CLASSPATH, and run the server. Here's the server output, finding my HelloWorld service right where it should:

~/projects/jruby ➔ jruby main.rb
Starting grizzly
Jersey app started with WADL available at http://localhost:9998/application.wadl
Try out http://localhost:9998/helloworld
Hit enter to stop it...
Jun 1, 2010 6:36:55 PM com.sun.jersey.api.core.PackagesResourceConfig init
INFO: Scanning for root resource and provider classes in the packages:
com.headius.demo.jersey
Jun 1, 2010 6:36:55 PM com.sun.jersey.api.core.ScanningResourceConfig logClasses
INFO: Root resource classes found:
class com.headius.demo.jersey.HelloWorld
Jun 1, 2010 6:36:55 PM com.sun.jersey.api.core.ScanningResourceConfig init
INFO: No provider classes found.
Jun 1, 2010 6:36:55 PM com.sun.jersey.server.impl.application.WebApplicationImpl initiate
INFO: Initiating Jersey application, version 'Jersey: 1.2 05/07/2010 02:04 PM'

And perhaps the most anti-climactic climax ever, curling our newly-deployed service:

~ ➔ curl http://localhost:9998/helloworld
Hello World

It works!

What We've Learned

This was a simple example, but we demonstrated several things:

JRuby's new jrubyc --java(c) support for generating "real" Java classes
Tagging a Ruby class with Java annotations, so it can be seen by a Java framework
Booting a Grizzly server from Ruby code
Implementing a JAX-RS service with Jersey in Ruby code

Do you have any examples of other nice annotation-based APIs we could test out with JRuby?

Kicking JRuby Performance Up a Notch

2010-05-31T01:15:00.004-05:00

We've often talked about the benefit of running on the JVM, and while our performance numbers have been *good*, they've never been as incredibly awesome as a lot of people expected. After all, regardless of how well we performed when compared to other Ruby implementations, we weren't as fast as statically-typed JVM languages.

It's time for that to change.

I've started playing with performing more optimizations based on runtime information in JRuby. You may know that JRuby has always had a JIT (just-in-time compiler), which lazily compiled Ruby AST into JVM bytecode. What you may not know is that unlike most other JIT-based systems, we did not gather any information at runtime that might help the eventual compilation produces a better result. All we applied were the same static optimizations we could safely do for AOT compilation (ahead-of-time, like jrubyc), and ultimately the JIT mode just deferred compilation to reduce the startup cost of compiling everything before it runs.

This was a pragmatic decision; the JVM itself does a lot to boost performance, even with our very naïve compiler and our limited static optimizations. Because of that, our performance has been perfectly fine for most users; we haven't even made a concentrated effort to improve execution speed for almost 18 months, since the JRuby 1.1.6 release. We've spent a lot more time handling the issues users actually wanted us to work on: better Java integration, specific Ruby incompatibilities, memory reduction (and occasional leaks), peripheral libraries like jruby-rack and activerecord-jdbc, and general system stability. When we asked users what we should focus on, performance always came up as a "nice to have" but rarely as something people had a problem with. We performed very nicely.

These days, however, we're recognizing a few immutable truths:

You can never be fast enough, and if your performance doesn't steadily improve people will feel like you're moving backward.
People eventually *do* want Ruby code to run as fast as C or Java, and if we can make it happen we should.
JRuby users like to write in Ruby, obviously, so we should make an effort to allow writing more of JRuby in Ruby, which requires that we suffer no performance penalty in the process.
Working on performance can be a dreadful time sink, but succeeding in improving it can be a tremendous (albeit shallow) ego boost.

So along with other work for JRuby 1.6, I'm back in the compiler.

I'm just starting to play with this stuff, so take all my results as highly preliminary.

A JRuby Call Site Primer

At each location in Ruby code where a dynamic call happens, JRuby installs what's called a call site. Other VMs may simply refer to the location of the call as the "call site", but in JRuby, each call site has an implementation of org.jruby.runtime.CallSite associated with it. So given the following code:

def fib(a)
  if a < 2
    a
  else
    fib(a - 1) + fib(a - 2)
  end
end

There are six call sites: the "<" call for "a < 2", the two "-" calls for "a - 1" and "a - 2", the two "fib" calls, and the "+" call for "fib(a - 1) + fib(a - 2)". The naïve approach to doing a dynamic call would be to query the object for the named method every time and then invoke it, like the dumbest possible reflection code might do. In JRuby (like most dynamic language VMs) we instead have a call site cache at each call site to blunt that lookup cost. And in implementation terms, that means each caching CallSite is an instance of org.jruby.runtime.callsite.CachingCallSite. Easy, right?

The simplest way to take advantage of runtime information is to use these caching call sites as hints for what method we've been calling at a given call site. Let's take the example of "+". The "+" method here is being called against a Fixnum object every time it's encountered, since our particular run of "fib" never overflows into Bignum and never works with Float objects. In JRuby, Fixnum is implemented with org.jruby.RubyFixnum, and the "+" method is bound to RubyFixnum.op_plus. Here's the implementation of op_plus in JRuby:

    @JRubyMethod(name = "+")
    public IRubyObject op_plus(ThreadContext context, IRubyObject other) {
        if (other instanceof RubyFixnum) {
            return addFixnum(context, (RubyFixnum)other);
        }
        return addOther(context, other);
    }

The details of the actual addition in addFixnum are left as an exercise for the reader.

Notice that we have an annotation on the method: @JRubyMethod(name = "+"). All the Ruby core classes implemented in JRuby are tagged with a similar annotation, where we can specify the minimum and maximum numbers of arguments, the Ruby-visible method name (e.g. "+" here, which is not a valid Java method name), and other details of how the method should be called. Notice also that the method takes two arguments: the current ThreadContext (thread-local Ruby-specific runtime structures), and the "other" argument being added.

When we build JRuby, we scan the JRuby codebase for JRubyMethod annotations and generate what we call invokers, one for every unique class+name combination in the core classes. These invokers are all instance of org.jruby.internal.runtime.methods.DynamicMethod, and they carry various informational details about the method as well as implementing various "call" signatures. The reason we generate an invoker class per method is simple: most JVMs will not inline through code used for many different code paths, so if we only had a single invoker for all methods in the system...nothing would ever inline.

So, putting it all together. At runtime, we have a CachingCallSite instance for every method call in the code. The first time we call "+" for example, we go out and fetch the method object and cache it in place for future calls. That cached method object is a subclass of DynamicMethod, which knows how to do the eventual invocation of RubyFixnum.op_plus and return the desired result. Simple!

First Steps Toward Runtime Optimization

The changes I'm working on locally will finally take advantage of the fact that after running for a while, we have a darn good idea what methods are being called. And if we know what methods are being called, those invocations no longer have to be dynamic, provided our assumptions hold (assumptions being things like "we're always calling against Fixnums" or "it's always the core implementation of "+"). Put differently: because JRuby defers its eventual compilation to JVM bytecode, we can potentially turn dynamic calls into static calls.

The process is actually very simple, and since I just started playing with it two days ago, I'll describe the steps I took.

Step One: Turn Dynamic Into Static

First, I needed the generated methods to bring along enough information for me to do a direct call. Specifically, I needed to know what target Java class they pointed at (RubyFixnum in our "+" example), what the Java name of the method was (op_plus), what arguments the method signature expected to receive (returning IRubyObject and receiving ThreadContext, IRubyObject), and whether the implementation was static (as are most module-based methods...but our "op_plus" is not). So I bundled this information into what I called a "NativeCall" data structure, populated on any invokers for which a native call might be possible.

    public static class NativeCall {
        private final Class nativeTarget;
        private final String nativeName;
        private final Class nativeReturn;
        private final Class[] nativeSignature;
        private final boolean statik;
        ...

In the compiler, I added a similar bit of logic. When compiling a dynamic call, it now will look to see whether the call site has already cached some method. If so, it checks whether that method has a corresponding NativeCall data structure associated with it. If it does, then instead of emitting the dynamic call logic it would normally emit, it compiles a normal, direct JVM invocation. So where we used to have a call like this:

INVOKEVIRTUAL org/jruby/runtime/CallSite.call

We now have a call like this:

INVOKEVIRTUAL org/jruby/RubyFixnum.op_plus

This call to "+" is now essentially equivalent to writing the same code in Java against our RubyFixnum class.

This comprises the first step: get the compiler to recognize and "make static" dynamic calls we've seen before. My experimental code is able to do this for most core class methods right now, and it will not be difficult to extend it to any call.

(The astute VM implementer will notice I made no mention of inserting a guard or test before the direct call, in case we later need to actually call a different method. This is, for the moment, intentional. I'll come back to it).

Step Two: Reduce Some Fixnum Use

The second step I took was to allow some call paths to use primitive values rather than boxed RubyFixnum objects. RubyFixnum in JRuby always represents a 64-bit long value, but since our call path only supports invocation with IRubyObject, we generally have to construct a new RubyFixnum for all dynamic calls. These objects are very small and usually very short-lived, so they don't impact GC times much. But they do impact allocation rates; we still have to grab the memory space for every RubyFixnum, and so we're constantly chewing up memory bandwidth to do so.

There are various ways in which VMs can eliminate or reduce object allocations like this: stack allocation, value types, true fixnums, escape analysis, and more. But of these, only escape analysis is available on current JVMs, and it's a very fragile optimization: all paths that would consume an object must be completely inlined, or else the object can't be elided.

So to help reduce the burden on the JVM, we have a couple call paths that can receive a single long or double argument where it's possible for us to prove that we're passing a boxed long or double value (i.e. a RubyFixnum or RubyFloat). The second step I took was to make the compiler aware of several of these methods on RubyFixnum, such as the long version of op_plus:

    public IRubyObject op_plus(ThreadContext context, long other) {
        return addFixnum(context, other);
    }

Since we have not yet implemented more advanced means of proving a given object is always a Fixnum (like doing local type propagation or per-variable type profiling at runtime), the compiler currently can only see that we're making a call with a literal value like "100" or "12.5". As luck would have it, there's three such cases in the "fib" implementation above: one for the "<" call and two for the "-" calls. By combining the compiler's knowledge of which actual method is being called in each case and how to make a call without standing up a RubyFixnum object, our actual call in the bytecode gets simplified further:

INVOKEVIRTUAL org/jruby/RubyFixnum.op_minus (Lorg/jruby/runtime/ThreadContext;J)

(For the uninitiated, that "J" at the end of the signature means this call is passing a primitive long for the second argument to op_minus.)

We now have the potential to insert additional compiler smarts about how to make a call more efficiently. This is a simple case, of course, but consider the potential for another case: calling from Ruby to arbitrary Java code. Instead of encumbering that call with all our own multi-method dispatch logic *plus* the multiple layers of Java's reflection API, we can instead make the call *directly*, going straight from Ruby code to Java code with no intervening code. And with no intervening code, the JVM will be able to inline arbitrary Java code straight into Ruby code, optimize it as a whole. Are we getting excited yet?

Step Three: Steal a Micro-optimization

There's two pesky calls remaining in "fib": the recursive invocations of "fib" itself. Now the smart thing to do would be to proceed with adding logic to the compiler to be aware of calls from currently-jitting Ruby code to not-yet jitted Ruby code and handle that accordingly. For example, we might also force those methods to compile, and the methods they call to compile, and so on, allowing us to optimize from a "hot" method outward to potentially "colder" methods within some threshold distance. And of course, that's probably what we'll do; it's trivial to trigger any Ruby method in JRuby to JIT at any time, so adding this to the compiler will be a few minutes work. But since I've only been playing with this since Thursday, I figured I'd do an even smarter thing: cheat.

Instead of adding the additional compiler smarts, I instead threw in a dirt-simple single-optimization subset of those smarts: detect self-recursion and optimize that case alone. In JRuby terms, this meant simply seeing that the "fib" CachingCallSite objects pointed back to the same "fib" method we were currently compiling. Instead of plumbing them through the dynamic pipeline, we make them be direct recursive calls, similar to the core method calls to "<" and "-" and "+". So our fib calls turn into the following bytecode-level invocation:

INVOKESTATIC ruby/jit/fib_ruby_B52DB2843EB6D56226F26C559F5510210E9E330D.__file__

Where "B52DB28..." is the SHA1 hash of the "fib" method, and __file__ is the default entry point for jitted calls.

Everything Else I'm Not Telling You

As with any experiment, I'm bending compatibility a bit here to show how far we can move JRuby's "upper bound" on performance. So these optimizations currently include some caveats.

They currently damage Ruby backtraces, since by skipping the invokers we're no longer tracking Ruby-specific backtrace information (current method, file, line, etc) in a heap-based structure. This is one area where JRuby loses some performance, since the JVM is actually double-tracking both the Java backtrace information and the Ruby backtrace information. We will need to do a bit more work toward making the Java backtrace actually show the relevant Ruby information, or else generate our Ruby backtraces by mining the Java backtrace (and the existing "--fast" flag actually does this already, matching normal Ruby traces pretty well).

The changes also don't track the data necessary for managing "backref" and "lastline" data (the $~ and $_ pseudo-globals), which means that regular expression matches or line reads might have reduced functionality in the presence of these optimizations (though you might never notice; those variables are usually discouraged). This behavior can be restored by clever use of thread-local stacks (only pushing down the stack when in the presence of a method that might use those pseudo-globals), or by simply easing back the optimizations if those variables are likely to be used. Ideally we'll be able to use runtime profiling to make a smart decision, since we'll know whether a given call site has ever encountered a method that reads or writes $~ or $_.

Finally, I have not inserted the necessary guards before these direct invocations that would branch to doing a normal dynamic call. This was again a pragmatic decision to speed the progress of my experiment...but it also raises an interesting question. What if we knew, without a doubt, that our code had seen all the types and methods it would ever see. Wouldn't it be nice to say "optimize yourself to death" and know it's doing so? While I doubt we'll see a JRuby ship without some guard in place (ideally a simple "method ID" comparison, which I wouldn't expect to impact performance much), I think it's almost assured that we'll allow users to "opt in" to completely "statickifying" specific of code. And it's likely that if we started moving more of JRuby's implementation into Ruby code, we would take advantage of "fully" optimizing as well.

With all these caveats in mind, let's take a look at some numbers.

And Now, Numbers Meaningless to Anyone Not Using JRuby To Calculate Fibonacci Numbers

The "fib" method is dreadfully abused in benchmarking. It's largely a method call benchmark, since most calls spawn at least two more recursive calls and potentially several others if numeric operations are calls as well. In JRuby, it doubles as an allocation-rate or memory-bandwidth benchmark, since the bulk of our time is spent constructing Fixnum objects or updating per-call runtime data structures.

But since it's a simple result to show, I'm going to abuse it again. Please keep in mind these numbers are meaningless except in comparison to each other; JRuby is still cranking through a lot more objects than implementations with true Fixnums or value types, and the JVM is almost certainly over-optimizing portions of this benchmark. But of course, that's part of the point; we're making it possible for the JVM to accelerate and potentially eliminate unnecessary computation, and it's all becoming possible because we're using runtime data to improve runtime compilation.

I'll be using JRuby 1.6.dev (master plus my hacks) on OS X Java 6 (1.6.0_20) 64-bit Server VM on a MacBook Pro Core 2 Duo at 2.66GHz.

First, the basic JRuby "fib" numbers with full Ruby backtraces and dynamic calls:

  0.357000   0.000000   0.357000 (  0.290000)
  0.176000   0.000000   0.176000 (  0.176000)
  0.174000   0.000000   0.174000 (  0.174000)
  0.175000   0.000000   0.175000 (  0.175000)
  0.174000   0.000000   0.174000 (  0.174000)

Now, with backtraces calculated from Java backtraces, dynamic calls restructured slightly to be more inlinable (but still dynamic and via invokers), and limited use of primitive call paths. Essentially the fastest we can get in JRuby 1.5 without totally breaking compatibility (and with some minor breakages, in fact):

  0.383000   0.000000   0.383000 (  0.330000)
  0.109000   0.000000   0.109000 (  0.108000)
  0.111000   0.000000   0.111000 (  0.112000)
  0.101000   0.000000   0.101000 (  0.101000)
  0.101000   0.000000   0.101000 (  0.101000)

Now, using all the runtime optimizations described above. Keep in mind these numbers will probably drop a bit with guards in place (but they're still pretty fantastic):

  0.443000   0.000000   0.443000 (  0.376000)
  0.035000   0.000000   0.035000 (  0.035000)
  0.036000   0.000000   0.036000 (  0.036000)
  0.036000   0.000000   0.036000 (  0.036000)
  0.036000   0.000000   0.036000 (  0.036000)

And a couple comparisons of another mostly-recursive, largely-abused benchmark target: the tak function...

Static optimizations only, as fast as we can make it:

  1.750000   0.000000   1.750000 (  1.695000)
  0.779000   0.000000   0.779000 (  0.779000)
  0.764000   0.000000   0.764000 (  0.764000)
  0.775000   0.000000   0.775000 (  0.775000)
  0.763000   0.000000   0.763000 (  0.763000)

And with runtime optimizations:

  0.899000   0.000000   0.899000 (  0.832000)
  0.331000   0.000000   0.331000 (  0.331000)
  0.332000   0.000000   0.332000 (  0.332000)
  0.329000   0.000000   0.329000 (  0.329000)
  0.331000   0.000000   0.331000 (  0.332000)

So for these trivial benchmarks, a couple hours hacking in JRuby's compiler has produced numbers 2-3x faster than the fastest JRuby performance we've achieved thusfar. Understanding why requires one last section.

Hooray for the JVM

There's a missing trick here I haven't explained: the JVM is still doing most of the work for us.

In the plain old dynamic-calling, static-optimized case, the JVM is dutifully optimizing everything we throw at it. It's taking our Ruby calls, inlining the invokers and their eventual calls, inlining core class methods (yes, this means we've always been able to inline Ruby into Ruby or Java into Ruby or Ruby into Java, given the appropriate tweaks), and optimizing things very well. But here's the problem: the JVM's default settings are tuned for optimizing *Java*, not for optimizing Ruby. In Java, a call from one method to another has exactly one hop. In JRuby's dynamic calls, it may take three or four hops, bouncing through CallSites and DynamicMethods to eventually get to the target. Since Hotspot only inlines up to 9 levels of calls by default, you can see we're eating up that budget very quickly. Add to that the fact that Java to Java calls have no intervening code to eat up bytecode size thresholds, and we're essentially only getting a fraction of the optimization potential out of Hotspot that a "simpler" language like Java does.

Now consider the dynamically-optimized case. We've eliminated both the CallSite and the DynamicMethod from most calls, even if we have to insert a bit of guard code to do it. That means nine levels of Ruby calls can inline, or nine levels of logic from Ruby to core methods or Ruby to Java. We've now given Hotspot a much better picture of the system to optimize. We've also eliminated some of the background noise of doing a Ruby call, like updating rarely-used data structures. We'll need to ensure backtraces come out in some usable form, but at least we're not double-tracking them.

The bottom line here is that instead of doing all the much harder work of adding inlining to our own compiler, all we need to do is let the JVM do it, and we benefit from the years of work that have gone into current JVMs' optimizing compilers. The best way to solve a hard problem is to get someone else to solve it for you, and the JVM does an excellent job of solving this particular hard problem. Instead of giving the JVM a fish *or* teaching it how to fish, we're just giving it a map of the lake; it's already an expert fisherman.

There's also another side to these optimizations: our continued maintenance of an interpreter is starting to pay off. Other JVM languages that don't have an interpreted mode have a much more difficult time doing any runtime optimization; simply put, it's really hard to swap out bytecode you've already loaded unless you're explicitly abstracting how that bytecode gets generated in the first place. In JRuby, where methods have always had to switch from interpreted to jitted at runtime, we're can can hop back and forth much more freely, optimizing along the way.

That's all for now. Don't expect to download JRuby 1.6 in a few months and see everything be three times (or ten times! or 100 times!) faster. There's a lot of details we need to work out about when these optimizations will be safe, how users might be able to opt-in to "full" optimization if necessary, and whether we'll get things fast enough to start replacing core code with Ruby. But early results are very promising, and it's assured we'll ship some of this very soon.

Building Ruboto: Precompiling Ruby for Android

2010-04-30T01:12:00.002-05:00

I originally started to send this to the JRuby dev list and to the Ruboto list, but realized quickly that it might make a good blog post. Since I don't blog enough lately, here it is.

I've been looking into better ways to precompile Ruby code to classes for deploy on Android devices. Normally, JRuby users can just jar up their .rb files and load and require them as though they were on the filesystem; JRuby finds, loads, and runs them just fine. This works well enough on Android, but since there's no way to generate bytecode at runtime, JRuby code that isn't precompiled must run interpreted forever...and run a bit slower than we'd like because of it. In order for Ruby to be a first-class language for Android development, we must make it possible to precompile Ruby code *completely* and bundle it up with the application. So this evening I spent some time making that possible.

I have some good news, some bad news, and some good news. First, a bit of background into JRuby's compiler.

What JRuby's Compiler Produces

JRuby's ahead-of-time compiler produces a single .class file per .rb file, to ease deployment and lookup of those files (and because I think it's ugly to always vomit out an unidentifiable class for every method body). This produces a nice 1:1 mapping between .rb and .class, but it comes at a cost: since those .class files are just "bags of methods" we need to bind those methods somehow. This usually happens at runtime, with JRuby generating a small "handle" class for every method as it is bound. So for a script like this:

# foo.rb
class Foo
  def bar; end
end

def hello; end

You will get one top-level class file when you AOT compile, and then two more class files are generated at runtime for the "handles" for methods "bar" and "hello". This provides the best possible performance for invocation, plus a nice 1:1 on-disk format...but it means we're still generating a lot of code at runtime.

The other complication is that jrubyc normally outputs a .class file of the same name as the .rb file, to ease lookup of that .class file at runtime. So the main .class for the above script would be called "foo.class". The problem with this is that "foo.rb" may not always be loaded as "foo.rb". A user might load '../yum/../foo.rb' or some other peculiar path. As a result, the base name of the file is not enough to determine what class name to load. To solve this, I've introduced an alternate naming scheme that uses the SHA1 hash of the *actual* content of the file as the class name. So, for the above script, the resulting class would be named:

ruby.jit.FILE_351347C9126659D4479558A2706DBC35E45D16D2

While this isn't a pretty name, it does provide a way to locate the compiled version of a script universally, regardless of what path is used to load it.

The Good News

I've modified jrubyc (on master only...we need to talk about whether this should be a late addition to 1.5) to have a new --sha1 flag. As you might guess, this flag alters the compile process to generate the sha1-named class for each compiled file.

~/projects/jruby ➔ jrubyc foo.rb 
Compiling foo.rb to class foo

~/projects/jruby ➔ jrubyc --sha1 foo.rb 
Compiling foo.rb to class ruby.jit.FILE_351347C9126659D4479558A2706DBC35E45D16D2

~/projects/jruby ➔ jruby -X+C -J-Djruby.jit.debug=true -e "require 'foo'"
...
found jitted code for ./foo.rb at class: ruby.jit.FILE_351347C9126659D4479558A2706DBC35E45D16D2
...

This is actually finding the foo.rb file, calculating its SHA1 hash, and then loading the .class file instead. So if you had a bunch of .rb code for an Android application and wanted to precompile it, you'd run this command to get the sha1 classes, and then include both the .rb file and the .class file in your application (the .rb file must be there because...you guessed it...we need to calculate the sha1 hash from its contents).

To test this out, I actually ran jrubyc against the Ruby stdlib to produce a sha1 class for every .rb file:

~/projects/jruby ➔ jrubyc -t /tmp --sha1 lib/ruby/1.8/
Compiling all in '/Users/headius/projects/jruby/lib/ruby/1.8'...
Compiling lib/ruby/1.8//abbrev.rb to class ruby.jit.FILE_4F30363F88066CC74555ABA5BE4B73FDE323BE1A
Compiling lib/ruby/1.8//base64.rb to class ruby.jit.FILE_DD42170B797E34D082C952B92A19474E3FDF3FA2
Compiling lib/ruby/1.8//benchmark.rb to class ruby.jit.FILE_0C42EBD7F248AF396DE7A70C0FBC31E9E8D233DE
...
Compiling lib/ruby/1.8//xsd/xmlparser/rexmlparser.rb to class ruby.jit.FILE_8B106B9E9F2F1768470A7A4E6BD1A36FC0859862
Compiling lib/ruby/1.8//xsd/xmlparser/xmlparser.rb to class ruby.jit.FILE_AF51477EA5467822D8ADED37EEB5AB5D841E07D9
Compiling lib/ruby/1.8//xsd/xmlparser/xmlscanner.rb to class ruby.jit.FILE_3203482AEE794F4B9D5448BF51935879B026092C

This produces 524 class files for 524 .rb files, just as it should, and running with forced compilation (-X+C) and jruby.jit.debug=true shows that it finds each class when loading anything from stdlib. That's a good start!

What About the Handles?

I mentioned above that we also generate, at runtime, a small handle class for every bound method in a given script. And again, since we can't generate bytecode on-device, we need a way to pregenerate all those handles.

An hour's worth of work later, and jrubyc has a --handles flag that will additionally spit out all method handles for each script compiled. Here's our foo script compiled with --sha1 and --handles, along with the resulting .class files:

~/projects/jruby ➔ jrubyc --sha1 --handles foo.rb
Compiling foo.rb to class ruby.jit.FILE_351347C9126659D4479558A2706DBC35E45D16D2
Generating direct handles for foo.rb

~/projects/jruby ➔ ls ruby/jit/*351347*
ruby/jit/FILE_351347C9126659D4479558A2706DBC35E45D16D2.class

~/projects/jruby ➔ ls *351347*
ruby_jit_FILE_351347C9126659D4479558A2706DBC35E45D16D2Invokermethod__1$RUBY$barFixed0.class
ruby_jit_FILE_351347C9126659D4479558A2706DBC35E45D16D2Invokermethod__2$RUBY$helloFixed0.class

And sure enough, we can also see that these handles are being loaded instead of generated at runtime. So it's possible with these two options to *completely* precompile JRuby sources into .class files. Hooray!

The Bad News

My next step was obviously to try to precompile and dex the entire Ruby standard library. That's 524 files, but how many method bodies? We'd need to generate a handle for each one of them.

~/projects/jruby ➔ mkdir stdlib-compiled

~/projects/jruby ➔ jrubyc --sha1 --handles -t stdlib-compiled/ lib/ruby/1.8/
Compiling all in '/Users/headius/projects/jruby/lib/ruby/1.8'...
Compiling lib/ruby/1.8//abbrev.rb to class ruby.jit.FILE_4F30363F88066CC74555ABA5BE4B73FDE323BE1A
Generating direct handles for lib/ruby/1.8//abbrev.rb
Compiling lib/ruby/1.8//base64.rb to class ruby.jit.FILE_DD42170B797E34D082C952B92A19474E3FDF3FA2
Generating direct handles for lib/ruby/1.8//base64.rb
...
Compiling lib/ruby/1.8//xsd/xmlparser/xmlparser.rb to class ruby.jit.FILE_AF51477EA5467822D8ADED37EEB5AB5D841E07D9
Generating direct handles for lib/ruby/1.8//xsd/xmlparser/xmlparser.rb
Compiling lib/ruby/1.8//xsd/xmlparser/xmlscanner.rb to class ruby.jit.FILE_3203482AEE794F4B9D5448BF51935879B026092C
Generating direct handles for lib/ruby/1.8//xsd/xmlparser/xmlscanner.rb

~/projects/jruby ➔ find stdlib-compiled/ -name \*.class | wc -l
    8212

Wowsers, that's a lot of method bodies..over 7500 of them. But of course this is the entire Ruby standard library, with code for network protocols, templating, xml parsing, soap, and so on. Now for the more frightening numbers: keeping in mind that .class is a pretty verbose file format, how big are all these class files?

~/projects/jruby ➔ du -ks stdlib-compiled/ruby
14008 stdlib-compiled/ruby

~/projects/jruby ➔ du -ks stdlib-compiled/
44784 stdlib-compiled/

Yeeow! The standard library alone (without handles) produces 14MB of .class files, and with handles it goes up to a whopping 44MB of .class files! That seems a bit high, doesn't it? Especially considering that the .rb files add up to around 4.5MB?

Well there's a few explanations for this. First off, the generated handles are rather small, around 2k each, but they each are probably 50% the exact same code. They're generated as separate handles primarily because the JVM will not inline the same loaded body of code through two different call paths, so we have to duplicate that logic repeatedly. Java 7 fixes some of this, but for now we're stuck. The handle classes also share almost identical constant pools, or in-file tables of strings. Many of the same characteristics apply to the compiled Ruby scripts, so the 44MB number is a bit larger than it needs to be.

We can show a more realistic estimate of on-disk size by compressing the lot, first with normal "jar", and then with the "pack200" utility, which takes greater advantage of the .class format's intra-file redundancy:

~/projects/jruby ➔ cd stdlib-compiled/

~/projects/jruby/stdlib-compiled ➔ jar cf stdlib-compiled.jar .

~/projects/jruby/stdlib-compiled ➔ pack200 stdlib-compiled.pack.gz stdlib-compiled.jar

~/projects/jruby/stdlib-compiled ➔ ls -l stdlib-compiled.*
-rw-r--r--  1 headius  staff  13424221 Apr 30 01:43 stdlib-compiled.jar
-rw-r--r--  1 headius  staff   4051355 Apr 30 01:44 stdlib-compiled.pack.gz

Now we're seeing more reasonable numbers. A 13MB jar file is still pretty large, but it's not bad considering we started with 44MB of .class files. The packed size is even better: only 4MB for a completely-compiled Ruby standard library, and ultimately *smaller* than the original sources.

So what's the bad news? It obviously wasn't the size, since I just showed that was a red herring. The bad news is when we try to dex this jar.

The "dx" Tool

The Android SDK ships with a tool called "dx" which gets used at build time to translate Java bytecode (in .class files, .jar files, etc) to Dalvik bytecode (resulting in a .dex file. Along the way it optimizes the code, tidies up redundancies, and basically makes it as clean and compact as possible for distribution to an Android device. Once on the device, the Dalvik bytecode gets immediately compiled into whatever the native processor runs, so the dex data needs to be as clean and optimized as possible.

Every Java application shipped for Android must pass through dx in some way, so my next step was to try to "dex" the compiled standard library:

~/projects/jruby/out ➔ ../../android-sdk-mac_86/platforms/android-7/tools/dx --dex --verbose --positions=none --no-locals --output=stdlib-compiled.dex stdlib-compiled.jar 
processing archive stdlib-compiled.jar...
ignored resource META-INF/MANIFEST.MF
processing ruby/jit/FILE_003796EE1C0C24540DF7239B8197C183BC7017BB.class...
processing ruby/jit/FILE_00499F5FE29ED8EDB63965B0F65B19CFE994D120.class...
...
processing ruby_jit_FILE_FEF23DE8CDA5B9BD9D880CBC08D3249158379E58Invokermethod__5$RUBY$run_suiteFixed0.class...
processing ruby_jit_FILE_FEF23DE8CDA5B9BD9D880CBC08D3249158379E58Invokermethod__6$RUBY$create_resultFixed0.class...

trouble writing output: format == null

Uh-oh, that doesn't look good. What happened?

Well it turns out that the Ruby standard library *plus* all the handles needed to bind it is too much for the current dex file format It's a known issue that similarly bit Maciek Makowski (reported of the linked bug) when he tried to dex both the Scala compiler and the Scala base set of libraries in one go. And similar to his case, I was able to successfully dex *either* the precompiled stdlib *or* the generated handles...but not both at the same time.

What Can We Do?

It appears that for the moment, it's not going to be possible to completely precompile the entire Ruby standard library. But there's ways around that.

First off, probably no application on the planet needs the entire standard library, so we can easily just include the files needed for a given app. That may be enough to cut the size down tremendously. It's also perfectly possible to build a very complicated Ruby application for Android that will easily fit into the current dex format; I doubt most mobile applications would result in 4.5MB of uncompressed .rb source. So the added --sha1 and --handle features will be immediately useful for Android development.

Secondly, I've been planning on adding a different way to bind methods that doesn't require a class file per method. I would probably generate a large switch for each .rb file and then bind the methods numerically, so only a single additional class (and only a few methods in that class) would be needed to bind an entire compiled .rb script. This issue with dex will force me to finally do that.

And lastly, there's a bit more good news. Remember that the packed size of the entire standard library plus handles was around 4MB? Here's the sizes of the dex'ed standard library and handles:

~/projects/jruby ➔ ls -l *.dex
-rw-r--r--  1 headius  staff  3718340 Apr 30 00:57 stdlib-compiled-solo.dex
-rw-r--r--  1 headius  staff  8656300 Apr 30 00:52 stdlib-compiled-handles.dex

~/projects/jruby ➔ jar cf blah.apk *.dex

~/projects/jruby ➔ ls -l blah.apk 
-rw-r--r--  1 headius  staff  3179625 Apr 30 02:01 blah.apk

Once dex has worked its magic against our sources, we're now down to 3.1MB of compressed code...a pretty good size for the entire Ruby standard library plus 7500+ noisy, repetitive handles. We're definitely within reach of making full Ruby development for Android a reality.

Nokogiri Java Port: Help Us Finish It!

2010-04-11T12:27:00.005-05:00

One of the most commonly used native extensions for Ruby is the Nokogiri XML API. Nokogiri wraps libxml and has a fair amount of C code that links directly against the Ruby C extension API...an API we don't support in JRuby (and won't, without a lot of community help).

A bit over a year ago, the Nokogiri folks did us a big favor by creating an FFI version of Nokogiri that works surprisingly well; it's probably the most widely-used FFI-based Ruby library around. But the endgame for Nokogiri on JRuby has always been to get a pure-Java version. Not everyone is allowed to link native libraries on their Java platform of choice, and those that are often have trouble getting the right libxml versions installed. The Java version needs to happen.

That day is very close.

I spent a bit of time this weekend getting the Nokogiri "java" port running on my system, and the folks working on it have brought it almost to 100% passing. It's time to push it over the edge.

Building and Testing

Here's my process for getting it building. Let me know if this needs to be edited.

Update: Added rake-compiler and hoe to gems you need to install and modified the git command-line for versions that don't automatically create a local tracking branch.

1. Clone the Nokogiri repository and switch to the "java" branch

~/projects ➔ git clone git://github.com/tenderlove/nokogiri.git
Initialized empty Git repository in /Users/headius/projects/nokogiri/.git/
remote: Counting objects: 14767, done.
remote: Compressing objects: 100% (3882/3882), done.
remote: Total 14767 (delta 10482), reused 13969 (delta 9945)
Receiving objects: 100% (14767/14767), 3.73 MiB | 742 KiB/s, done.
Resolving deltas: 100% (10482/10482), done.

~/projects ➔ cd nokogiri/

~/projects/nokogiri ➔ git checkout -b java origin/java
Branch java set up to track remote branch java from origin.
Switched to a new branch 'java'

2. Install racc, rexical, rake-compiler, and hoe into Ruby (C Ruby, that is, since they also have extensions)

~/projects/nokogiri ➔ sudo gem install racc rexical rake-compiler hoe
Building native extensions.  This could take a while...
Successfully installed racc-1.4.6
Successfully installed rexical-1.0.4
Successfully installed rake-compiler-0.7.0
Successfully installed hoe-2.6.0
4 gems installed

3. Build the lexer and parser using C Ruby

~/projects/nokogiri ➔ rake gem:dev:spec
(in /Users/headius/projects/nokogiri)
warning: couldn't activate the debugging plugin, skipping
rake-compiler must be configured first to enable cross-compilation
/usr/bin/racc -l -o lib/nokogiri/css/generated_parser.rb lib/nokogiri/css/parser.y
rex --independent -o lib/nokogiri/css/generated_tokenizer.rb lib/nokogiri/css/tokenizer.rex

4. Build the Java bits (using rake in JRuby)

~/projects/nokogiri ➔ jruby -S rake java:build
(in /Users/headius/projects/nokogiri)
warning: couldn't activate the debugging plugin, skipping
javac -g -cp /Users/headius/projects/jruby/lib/jruby.jar:../../lib/nekohtml.jar:../../lib/nekodtd.jar:../../lib/xercesImpl.jar:../../lib/isorelax.jar:../../lib/jing.jar nokogiri/*.java nokogiri/internals/*.java
jar cf ../../lib/nokogiri/nokogiri.jar nokogiri/*.class nokogiri/internals/*.class

5. Run the tests (again with rake on JRuby)

~/projects/nokogiri ➔ jruby -S rake test
(in /Users/headius/projects/nokogiri)
...full output...

On my system, I get about 8 failures and 19 errors, out of 785 tests and 1657 assertions. We're very close!

A few other useful tasks:

jruby -S rake java:clean_all wipes out the build Java stuff
jruby -S rake java:gem builds the Java gem, if you want to try installing it

Helping Out

If you'd like to help fix these bugs, there's a few ways to approach it.

Join the nokogiri-talk Google Group so you can communicate with others working on the port. The key folks right now are Yoko Harada and Sergio Arbeo (who did the original bulk of the work for GSoC 2009). I'm also poking at it a bit in my spare time.
Post to the group to let folks know you want to help. This will help avoid duplicated effort.
Pick tests that appear to be missing or incorrect Ruby logic, like "not implemented", nil results ("method blah not found for nil") or arity errors ("3 arguments for 2" kinds of things). These are often the simplest ones to fix.
Don't give up! We're almost there!

It would be great if we could have a 100% working Nokogiri Java port for JRuby 1.5 final this month. I hope to see you on the nokogiri-talk list! Feel free to comment here if you have questions about getting bootstrapped.

Getting Started with Duby

2010-04-03T19:05:00.005-05:00

Hello again!

As you may know, I've been working part-time on a new language called Duby. Duby looks like Ruby, since it co-opts the JRuby parser, and includes some of the features of the Ruby language like optional arguments and closures. But Duby is not Ruby; it's statically typed, compiles to "native" code (JVM bytecode, for example) before running, and does not have any built-in library of its own (preferring to just use what's available on a given runtime). Here's a quick sample of Duby code:

class Foo
  def initialize(hello:String)
    puts 'constructor'
    @hello = hello
  end

  def hello(name:String)
    puts "#{@hello}, #{name}"
  end
end

Foo.new('Hiya').hello('Duby')

This post is not going to be an overview of the Duby language; I'll get that together soon, once I take stock of where the language stands as far as features go. Instead, this "getting started" post will show how you can grab the Duby repository and start playing with it right now.

First you need to pull down three resources: Duby itself, BiteScript (the Ruby DSL I use to generate JVM bytecode), and a JRuby 1.5 snapshot:

~/projects/tmp ➔ git clone git://github.com/headius/duby.git
Initialized empty Git repository in /Users/headius/projects/tmp/duby/.git/
remote: Counting objects: 2810, done.
remote: Compressing objects: 100% (1291/1291), done.
remote: Total 2810 (delta 1690), reused 2509 (delta 1447)
Receiving objects: 100% (2810/2810), 10.64 MiB | 722 KiB/s, done.
Resolving deltas: 100% (1690/1690), done.

~/projects/tmp ➔ git clone git://github.com/headius/bitescript.git
Initialized empty Git repository in /Users/headius/projects/tmp/bitescript/.git/
remote: Counting objects: 470, done.
remote: Compressing objects: 100% (404/404), done.
remote: Total 470 (delta 166), reused 313 (delta 57)
Receiving objects: 100% (470/470), 93.56 KiB, done.
Resolving deltas: 100% (166/166), done.

~/projects/tmp ➔ curl http://ci.jruby.org/snapshots/jruby-bin-1.5.0.dev.tar.gz | tar xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11.3M  100 11.3M    0     0   353k      0  0:00:32  0:00:32 --:--:--  262k

~/projects/tmp ➔ ls
bitescript duby  jruby-1.5.0.dev

~/projects/tmp ➔ mv jruby-1.5.0.dev/ jruby

Once you have these three pieces in place, Duby can now be run. It's easiest to put the JRuby snapshot in PATH, but you can just run it directly too:

~/projects/tmp ➔ cd duby

~/projects/tmp/duby ➔ ../jruby/bin/jruby bin/duby -e "puts 'hello'"
hello

~/projects/tmp/duby ➔ ../jruby/bin/jruby bin/dubyc -e "puts 'hello'"

~/projects/tmp/duby ➔ java DashE
hello

Finally, you may want to create a "complete" Duby jar that includes Duby, BiteScript, JRuby, and Java classes for command-line or Ant task usage. Using JRuby 1.5's Ant integration, the Duby Rakefile can produce that for you:

~/projects/tmp/duby ➔ ../jruby/bin/jruby -S rake jar:complete
(in /Users/headius/projects/tmp/duby)
mkdir -p build
Compiling Ruby sources
Generating Java class DubyCommand to DubyCommand.java
javac -d build -cp ../jruby/lib/jruby.jar:. DubyCommand.java
Compiling Duby sources
mkdir -p dist
Building jar: /Users/headius/projects/tmp/duby/dist/duby.jar
mkdir -p dist
Building jar: /Users/headius/projects/tmp/duby/dist/duby-complete.jar

~/projects/tmp/duby ➔ java -jar dist/duby-complete.jar run -e 'puts "Duby is Awesome!"'
Duby is Awesome!

Hopefully we'll soon have duby.jar, duby-complete.jar, and a new Duby gem released, but this is a quick way to get involved.

I'll get back to you with a post on the Duby language itself Real Soon Now!

Update: I have also uploaded a snapshot duby-complete.jar (which includes both the Main-Class for jar execution and the simple Ant task org.jruby.duby.ant.Compiler) on the Duby Github downloads page. Have fun!

Using Ivy with JRuby 1.5's Ant Integration

2010-04-03T17:18:00.003-05:00

JRuby 1.5 will be released soon, and one of the coolest new features is the integration of Ant support into Rake, the Ruby build tool. Tom Enebo wrote an article on the Rake/Ant integration for the Engine Yard blog, which has lots of examples of how to start migrating to Rake without leaving Ant behind. I'm not going to cover all that here.

I've been using the Rake/Ant stuff for a few weeks now, first for my "weakling" RubyGem which adds a queue-supporting WeakRef to JRuby, and now for cleaning up Duby's build process. Along the way, I've realized I really never want to write Ant scripts again; they're so much nicer in Rake, and I have all of Ruby and Ant available to me.

One thing Ant still needs help with is dependency resolution. Many people make the leap to Maven, and let it handle all the nuts and bolts. But that only works if you really buy into the Maven way of life...a problem if you're like me and you live in a lot of hybrid worlds where the Maven way doesn't necessarily fit. So many folks are turning to Apache Ivy to get dependency management in their builds without using Maven.

Today I thought I'd translate the simple "no-install" Ivy example build (warning, XML) to Rake, to see how easy it would be. The results are pretty slick.

First we need to construct the equivalent to the "download-ivy" and "install-ivy" ant tasks. I chose to put that in a Rake namespace, like this:

namespace :ivy do
  ivy_install_version = '2.0.0-beta1'
  ivy_jar_dir = './ivy'
  ivy_jar_file = "#{ivy_jar_dir}/ivy.jar"

  task :download do
    mkdir_p ivy_jar_dir
    ant.get :src => "http://repo1.maven.org/maven2/org/apache/ivy/ivy/#{ivy_install_version}/ivy-#{ivy_install_version}.jar",
      :dest => ivy_jar_file,
      :usetimestamp => true
  end

  task :install => :download do
    ant.path :id => 'ivy.lib.path' do
      fileset :dir => ivy_jar_dir, :includes => '*.jar'
    end

    ant.taskdef :resource => "org/apache/ivy/ant/antlib.xml",
      #:uri => "antlib:org.apache.ivy.ant",
      :classpathref => "ivy.lib.path"
  end
end

Notice that instead of using Ant properties, I've just used Ruby variables for the Ivy install version, dir, and file. I've also removed the "uri" element to ant.taskdef because I'm not sure if we have an equivalent for that in Rake yet (note to self: figure out if we have an equivalent for that).

With these two tasks, we can now fetch ivy and install it for the remainder of the build. Here's running the download task from the command line:

~/projects/duby ➔ rake ivy:download
(in /Users/headius/projects/duby)
mkdir -p ./ivy
Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.0.0-beta1/ivy-2.0.0-beta1.jar
To: /Users/headius/projects/duby/ivy/ivy.jar

Now we want a simple task that uses ivy:install to fetch resources and make them available for the build. Here's the example from Apache, using the cachepath task, written in Rake:

task :go => "ivy:install" do
  ant.cachepath :organisation => "commons-lang",
    :module => "commons-lang",
    :revision => "2.1",
    :pathid => "lib.path.id",
    :inline => "true"
end

Pretty clean and simple, and it fits nicely into the flow of the Rakefile. I can also switch this to using the "retrieve" task, which just pulls the jars down and puts them where I want them:

task :go => "ivy:install" do
  ant.retrieve :organisation => 'commons-lang',
    :module => 'commons-lang',
    :revision => '2.1',
    :pattern => 'javalib/[conf]/[artifact].[ext]',
    :inline => true
end

This fetches the Apache Commons Lang package along with all dependencies into javalib, separated by what build configuration they are associated with (runtime, test, etc). Here it is in action:

~/projects/duby ➔ rake go
(in /Users/headius/projects/duby)
mkdir -p ./ivy
Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.0.0-beta1/ivy-2.0.0-beta1.jar
To: /Users/headius/projects/duby/ivy/ivy.jar
Not modified - so not downloaded
Trying to override old definition of task buildnumber
:: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
:: loading settings :: url = jar:file:/Users/headius/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
:: resolving dependencies :: commons-lang#commons-lang-caller;working
 confs: [default, master, compile, provided, runtime, system, sources, javadoc, optional]
 found commons-lang#commons-lang;2.1 in public
:: resolution report :: resolve 64ms :: artifacts dl 3ms
 ---------------------------------------------------------------------
 |                  |            modules            ||   artifacts   |
 |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
 ---------------------------------------------------------------------
 |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
 |      master      |   1   |   0   |   0   |   0   ||   1   |   0   |
 |      compile     |   1   |   0   |   0   |   0   ||   0   |   0   |
 |     provided     |   1   |   0   |   0   |   0   ||   0   |   0   |
 |      runtime     |   1   |   0   |   0   |   0   ||   0   |   0   |
 |      system      |   1   |   0   |   0   |   0   ||   0   |   0   |
 |      sources     |   1   |   0   |   0   |   0   ||   1   |   0   |
 |      javadoc     |   1   |   0   |   0   |   0   ||   1   |   0   |
 |     optional     |   1   |   0   |   0   |   0   ||   0   |   0   |
 ---------------------------------------------------------------------
:: retrieving :: commons-lang#commons-lang-caller
 confs: [default, master, compile, provided, runtime, system, sources, javadoc, optional]
 4 artifacts copied, 0 already retrieved (1180kB/30ms)

But if I have multiple artifacts, this could be pretty cumbersome. Since this is Ruby, I can just put this in a method and call it repeatedly:

def ivy_retrieve(org, mod, rev)
  ant.retrieve :organisation => org,
    :module => mod,
    :revision => rev,
    :pattern => 'javalib/[conf]/[artifact].[ext]',
    :inline => true
end

artifacts = %w[
  commons-lang commons-lang 2.1
  org.jruby jruby 1.4.0
]

task :go => "ivy:install" do
  artifacts.each_slice(3) do |*artifact|
    ivy_retrieve(*artifact)
  end
end

Look for JRuby 1.5 release candidates soon, and let us know what you think of the new Ant integration!

Ruby Summer of Code 2010

2010-03-28T13:24:00.002-05:00

This year, no major Ruby organization got accepted to Google's Summer of Code (even though a half dozen Python projects got accepted, but I won't rant here). What do we as Rubyists do? Take it sitting down? NO! We make our own Summer of Code!

Thanks to Engine Yard, Ruby Central, and the Rails team, Ruby Summer of Code has raised $100k in just three days, allowing us to run 20 student projects! Hooray!

Ruby Summer of Code

Now of course we really would love to have some JRuby projects involved. There's so much exciting stuff going on with JRuby, and I believe it's the most promising platform for really growing the Ruby community. So we've set up a page for JRuby Ruby Summer of Code 2010 ideas. Here's a few to get you started:

JRuby on Android work, including command-line tooling, performance work, and all the little bits and pieces needed to make Ruby a first-class Android language.
Porting key C extensions to JRuby, so there's an alternative for people migrating.
A super-fast lightweight server similar to the GlassFish gem.
A full Hibernate and/or JPA backend for DataMapper or DataObjects, so that all databases Hibernate supports "just work" with JRuby.
Work on JRuby's nascent suport for Ruby C extensions by building the API out
Help get JRuby's early optimizing compiler wired up, to take JRuby's perf to the next level
Duby-related projects, like IDE support, better tooling, codebase cleanup, features, documentation.

And there's dozens of other projects out there just waiting for you! Add yourself as a student on the RubySOC page, add some ideas to the JRuby ideas page, and let's get hacking!

JRuby Startup Time Tips

2010-03-01T15:52:00.005-06:00

JRuby has become notorious among Ruby implementations for having a slow startup time. Some of this is the JVM's fault, since it doesn't save JIT products and often just takes a long time to boot and get going. Some of this is JRuby's fault, though we work very hard to eliminate startup bottlenecks once they're reported and diagnosed. But a large number of startup time problems stem from simple configuration issues. Here's a few tips to help you improve JRuby's startup time.

(Note: JRuby actually does pretty well on base startup time compared to other JVM languages; but MRI, while generally not the fastest Ruby impl, starts up incredibly fast.)

Make sure you're running the client JVM

This is by far the easiest way to improve startup. On HotSpot, the JVM behind OpenJDK and Sun's JDK, there are two different JIT-compiling backends: "client" and "server". The "client" backend does only superficial optimizations to JVM code as it compiles, but compiles things earlier and more quickly. The "server" backend performs larger-scale, more-global optimizations as it compiles, but takes a lot longer to get there and uses more resources when it compiles.

Up until recently, most JVM installs preferred the "client" backend, which meant JVM startup was about as fast as it could get. Unfortunately, many newer JVM releases on many operating systems are either defaulting to "server" or defaulting to a 64-bit JVM (which only has "server"). This means that without you or JRuby doing anything wrong, startup time takes a major hit.

Here's an example session showing a typical jruby -v line for a "server" JVM and the speed of doing some RubyGems requires (which is where most time is spent booting a RubyGems-heavy application):

~/projects/jruby ➔ jruby -v
jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-03-01 6857a4e) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_17) [x86_64-java]

~/projects/jruby ➔ time jruby -e "require 'rubygems'; require 'active_support'"

real 0m5.174s
user 0m7.643s
sys 0m0.422s

~/projects/jruby ➔ time jruby -e "require 'rubygems'; require 'active_support'"

real 0m5.068s
user 0m7.662s
sys 0m0.449s

Ouch, over 5 seconds just to boot RubyGems and load ActiveSupport. You can see in this case that the JVM is preferring the "64-Bit Server VM", which will give you great runtime performance but pretty dismal startup time. If you see this sort of -v output, you can usually force the JVM to run in 32-bit "client" mode by specifying "-d32" as a JVM option. Here's an example using an environment variable:

~/projects/jruby ➔ export JAVA_OPTS="-d32"

~/projects/jruby ➔ time jruby -e "require 'rubygems'; require 'active_support'"

real 0m2.320s
user 0m2.583s
sys 0m0.207s

~/projects/jruby ➔ time jruby -e "require 'rubygems'; require 'active_support'"

real 0m2.275s
user 0m2.580s
sys 0m0.207s

On my system, a MacBook Pro running OS X 10.6, switching to the "client" VM improves this scenario by well over 50%. You may also want to try the "-client" option alone or in combination with -d32, since some 32-bit systems will still default to "server". Play around with it and see what works for you, and then let us know your platform, -v string, and how much improvement you see.

Regenerate the JVM's shared archive

Starting with Java 5, the HotSpot JVM has included a feature known as Class Data Sharing (CDS). Originally created by Apple for their OS X version of HotSpot, this feature loads all the common JDK classes as a single archive into a shared memory location. Subsequent JVM startups then simply reuse this read-only shared memory rather than reloading the same data again. It's a large reason why startup times on Windows and OS X have been so much better in recent years, and users of those systems may be able to ignore this tip.

On Linux, however, the shared archive is often *never* generated, since installers mostly just unpack the JVM into its final location and never run it. In order to force your system to generate the shared archive, run some equivalent of this command line:

headius@headius-desktop:~/jruby$ sudo java -Xshare:dump
Loading classes to share ... done.
Rewriting and unlinking classes ... done.
Calculating hash values for String objects .. done.
Calculating fingerprints ... done.
Removing unshareable information ... done.
Moving pre-ordered read-only objects to shared space at 0x94030000 ... done.
Moving read-only objects to shared space at 0x9444bef8 ... done.
Moving common symbols to shared space at 0x9444d870 ... done.
Moving remaining symbols to shared space at 0x945154e0 ... done.
Moving string char arrays to shared space at 0x94516118 ... done.
Moving additional symbols to shared space at 0x945ac5b0 ... done.
Read-only space ends at 0x94612560, 6169952 bytes.
Moving pre-ordered read-write objects to shared space at 0x94830000 ... done.
Moving read-write objects to shared space at 0x94ea64a0 ... done.
Moving String objects to shared space at 0x94ee3708 ... done.
Read-write space ends at 0x94f27448, 7304264 bytes.
Updating references to shared objects ... done.

For many users, this can make a tremendous difference in startup time, since all that extra class data remains available in memory. You may need to specify the -d32 or -client options along with the -Xshare option. Play with those and the other -Xshare modes to see if it helps your situation.

Delay or disable JRuby's JIT

An interesting side effect of JRuby's JIT is that it sometimes actually slows execution for really short runs. The compiler isn't free, obviously, nor is the cost of loading, verifying, and linking the resulting JVM bytecode. If you have a very short command that touches a lot of code, you might want to try disabling or delaying the JIT.

Disabling is easy: pass the -X-C flag or set the jruby.compile.mode property to "OFF":

~/projects/jruby ➔ time jruby -S gem install rake
Successfully installed rake-0.8.7
1 gem installed
Installing ri documentation for rake-0.8.7...
Installing RDoc documentation for rake-0.8.7...

real 0m13.188s
user 0m12.342s
sys 0m0.685s

~/projects/jruby ➔ time jruby -X-C -S gem install rake
Successfully installed rake-0.8.7
1 gem installed
Installing ri documentation for rake-0.8.7...
Installing RDoc documentation for rake-0.8.7...

real 0m12.590s
user 0m12.342s
sys 0m0.583s

This doesn't generally give you a huge boost, but it can be enough to keep you from going mad.

Avoid spawning "sub-rubies"

It's a fairly common idiom for Rubyists to spawn a Ruby subprocess using Kernel#system, Kernel#exec, or backquotes. For example, you may want to prepare a clean environment for a test run. That sort of scenario is perfectly understandable, but spawning many sub-rubies can take a tremendous toll on overall runtime.

When JRuby sees a #system, #exec, or backquote starting with "ruby", we will attempt to run it in the same JVM using a new JRuby instance. Because we have always supported "multi-VM" execution (where multiple isolated Ruby environments share a single process), this can make spawning sub-Rubies considerably faster. This is, in fact, how JRuby's Nailgun support (more on that later) keeps a single JVM "clean" for multiple JRuby command executions. But even though this can improve performance, there's still a cost for starting up those JRuby instances, since they need to have fresh, clean core classes and a clean runtime.

The worst-case scenario is when we detect that we can't spin up a JRuby instance in the same process, such as if you have shell redirection characters in the command line (e.g. system 'ruby -e blah > /dev/null'). In those cases, we have no choice but to launch an entirely new JRuby process, complete with a new JVM, and you'll be paying the full zero-to-running cost.

If you're able, try to limit how often you spawn "sub-rubies" or use tools like Nailgun or spec-server to reuse a given process for multiple hits.

Do less at startup

This is a difficult tip to follow, since often it's not your code doing so much at startup (and usually it's RubyGems itself). One of the sad truths of JRuby is that because we're based on the JVM, and the JVM takes a while to warm up, code executed early in a process runs a lot slower than code executed later. Add to this the fact that JRuby doesn't JIT Ruby code into JVM bytecode until it's been executed a few times, and you can see why cold performance is not one of JRuby's strong areas.

It may seem like delaying the inevitable, but doing less at startup can have surprisingly good results for your application. If you are able to eliminate most of the heavy processing until an application window starts up or a server starts listening, you may avoid (or spread out) the cold performance hit. Smart use of on-disk caches and better boot-time algorithms can help a lot, like saving a cache of mostly-read-only data rather than reloading and reprocessing it on every boot.

Try using Nailgun

In JRuby 1.3, we officially shipped support for Nailgun. Nailgun is a small library and client-side tool that reuses a single JVM for multiple invocations. With Nailgun, small JRuby command-line invocations can be orders of magnitude faster. Have a look at my article on Nailgun in JRuby 1.3.0 for details.

Nailgun seems like a magic bullet, but unfortunately it does little to help certain common cases like booting RubyGems or starting up Rails (such as when running tests). It also can't help cases where you are causing lots of sub-rubies to be launched. Your best bet is to give it a try and let us know if it helps.

Play around with JVM flags

There's scads of JVM flags that can improve (or degrade) startup time. For a good list of flags you can play with, see my article on my favorite JVM flags, paying special attention to the heap-sizing and GC flags. If you find combinations that really help your application start up faster, let us know!

Help us find bottlenecks

The biggest advances in startup-time performance have come from users like you investigating the load process to see where all that time is going. If you do a little poking around and find that particular libraries take unreasonably long to start (or just do too much at startup) or if you find that startup time seems to be limited by something other than CPU (like if your hard drive starts thrashing madly or your memory bus is being saturated) there may be improvements possible in JRuby or in the libraries and code you're loading. Dig a little...you may be surprised what you find.

Here's a few JRuby flags that might help you investigate:

--sample turns on the JVM's sampling profiler. It's not super accurate, but if there's some egregious bottleneck it should rise to the top.
-J-Xrunhprof:cpu=times turns on the JVM's instrumented profiler, saving profile results to java.hprof.txt. This slows down execution tremendously, but can give you more accurate low-level timings for JRuby and JDK code.
-J-Djruby.debug.loadService.timing=true turns on timing of all requires, showing fairly accurately where boot-time load costs are heaviest.
On Windows, where you may not have a "time" command, pass -b to JRuby (as in 'jruby -b ...') to print out a timing of your command's runtime execution (excluding JVM boot time).

Let us help you!

Sometimes, there's an obvious misconfiguration or bug in JRuby that causes an application to start up unreasonably slowly. If startup time seems drastically different than what you've seen here, there's a good chance something's wrong with your setup. Post your situation to the JRuby mailing list or find us on IRC, and we (along with other JRuby users) will do our best to help you.

Five Reasons to Wait on the iPad

2010-01-28T01:49:00.003-06:00

Like most geeks, I watched with great anticipation as the iPad was unveiled yesterday. In my case, it was from liveblogs monitored on spotty 3G and WiFi access at the Jfokus after-conference event. I'm reasonably impressed with the device, especially the $500 price point on the low end. But after giving it some thought, I have a few good reasons to wait on purchasing one. Here's five I came up with:

Wait for the next generation. Never, ever buy the first generation of a device this new and this complex. Doubly so for anything the Apple hype machine is pimping. First-gen devices are always at least a little bit tweaky, and Apple has proven repeatedly that their first-gen devices are overpriced, underpowered, and replaced by something better in 4-6 months.
Wait for competitors to answer. Sure, there have been other tablets and "slates" announced over the past year, but the iPad has moved the bar. Given the number of competing vendors, the availability of viable tablet options like Windows 7 and Android (or Chrome OS), and the ever-present iControl and iLock-in associated with all things Apple, you can bet there's going to be a bunch of competitive options during 2010.
Wait for everyone else to buy it. Yeah, this one is painful, but think about all the suckers that bought the iPhone right when it came out. You'll spend a couple months as a temporary luddite, ridiculed by your peers. And then you'll get a better device for cheaper and have the last laugh. I mean, isn't half the joy of the iPad in having the bigger iPenis? You can hold off for a while.
Wait for crackers to bust it wide open. Nobody's happy that the iPhone is a closed platform, nor are they happy that the App Store is so sketchy at approving applications. So why not wait and see what the busy iPhone hackers can do with an iPad before diving it? Chances are it will be a far better laptop replacement once they get ahold of it.
Wait because you don't actually need it. It can't replace your phone. It can't replace your laptop. It can't replace your 50" LCD TV. Seriously now...what do you need it for?

Me, I'm on the fence. I can afford it, but then I probably wouldn't be able to afford something else. And I'm a programmer...I want to be able to put my own apps on the device (or give apps to friends) without dealing with the App Store gatekeeper. In my ideal world, it would be Apple's hardware and design sensibility combined with Android's open platform and familiar runtime. Anything even close to that would outshine the iPad for me.

Update: One last bit of anecdotal evidence. Before the iPhone, I had held off buying anything other than the crappiest, cheapest phones, the lowest-end music devices (yep, I had the "pack of gum" Shuffle), the most basic digital cameras, and no PDA. I was waiting for something that would allow me to get rid of all devices at once. iPhone obviously did that, as a music/media player, internet device, PDA, phone, and camera all in one. iPad takes two of those features away (phone and camera) and only adds a larger screen with the potential for large-form apps.

Busy Week: JRuby with Android, Maven, Rake, C exts, and More!

2010-01-08T14:20:00.003-06:00

(How long has it been since I last blogged? Answer: a long time. I'll try to blog more frequently now that the (TOTALLY INSANE) fall conference season is over. Remind me to never have three conferences in a month ever again.)

Hello friends! It's been a busy week! I thought I'd show some of what's happening with JRuby, so you know we're still here plugging away.

Android Update

Earlier this week I helped Jan Berkel get the Ruboto IRB application working on a more recent JRuby build. There were a few minor tweaks needed, and Jan was interested in helping out, so I made the tweaks and added him as a committer.

The Ruboto IRB project repository is here: http://github.com/headius/ruboto-irb

Lately my interest in JRuby on Android has increased. I realized very recently that JRuby is just about the only mainstream JVM languge that can create *new* code while running on the device, which opens up a whole host of possibilities. It is not possible to implement an interactive shell in Groovy or Scala or Clojure, for example, since they all must first compile code to JVM bytecode, and JVM bytecode can't run on the Dalvik VM directly.

As I played with Ruboto IRB this week, I discovered something even more exciting: almost all the bugs that prevented JRuby from working well in early Android releases have been solved! Specifically, the inability to reflect any android.* classes seems to be fixed in both Android 1.6 and Android 2.0.x. Why is this so cool? It's cool because with Ruboto IRB you can interactively play with almost any Android API:

This example accesses the Activity (the IRB class in Ruboto IRB), constructs a new WebView, loads some HTML into it, and (not shown) replaces the content WebView. Interactively. On the device. Awesome.

I am trusting you to not go mad with power, and to use Ruboto only for good.

RubyGems/Maven Integration

JRuby has always been a Ruby implementation first, and as a result we've often neglected Java platform integration. But with Ruby 1.8.7 compatibility very solid and Ruby 1.9 work under way, we've started to turn our attentions toward Java again.

One of the key areas for language integration is tool support. And for Java developers, tool support invariably involves Maven.

About a year ago, I started a little project to turn Maven artifacts into RubyGems. The mapping was straightforward: both have dependencies, a name, a description, a unique identifier, version numbers, and a standard file format for describing a given package. The maven_gem project was my proof-of-concept that it's possible to merge the two worlds.

The maven_gem repository is here: http://github.com/jruby/maven_gem

The project sat mostly dormant until I circled back to it this fall. But once I got the guys from Sonatype involved (purveyors of the Nexus Maven server) things really got interesting.

Thanks to Tamas Cservenak from Sonatype, we now have something once thought impossible: full RubyGems integration of all the Maven artifacts in the world!

The Nexus RubyGems support repository is here: http://github.com/cstamas/nexus-ruby-support/

~/projects/jruby ➔ gem install com.lowagie.itext-rtf
Successfully installed bouncycastle.bcmail-jdk14-138-java
Successfully installed bouncycastle.bcprov-jdk14-138-java
Successfully installed bouncycastle.bctsp-jdk14-138-java
Successfully installed com.lowagie.itext-rtf-2.1.7-java
4 gems installed
Installing ri documentation for bouncycastle.bcmail-jdk14-138-java...
Installing ri documentation for bouncycastle.bcprov-jdk14-138-java...
Installing ri documentation for bouncycastle.bctsp-jdk14-138-java...
Installing ri documentation for com.lowagie.itext-rtf-2.1.7-java...
Installing RDoc documentation for bouncycastle.bcmail-jdk14-138-java...
Installing RDoc documentation for bouncycastle.bcprov-jdk14-138-java...
Installing RDoc documentation for bouncycastle.bctsp-jdk14-138-java...
Installing RDoc documentation for com.lowagie.itext-rtf-2.1.7-java...

Here's a an example of a full session, where I additionally install Rhino and then use it from IRB: http://gist.github.com/271764

I should reiterate what this means for JRuby users: as of JRuby 1.5, you'll be able to install or use in dependencies any Java library ever published to the public Maven repository. In short, you now have an additional 60000-some libraries at your fingertips. Awesome, no?

There are some remaining issues to work through, like the fact that RubyGems itself chokes on that many gems when generating indexes, but there's a test server up and working. We'll get all the issues resolved by the time we release JRuby 1.5 RC1. Jump on the JRuby mailing list if you're like to discuss this new capability.

Rake? Ant? Why not Both?

Another item on the integration front is Tom Enebo's work on providing seamless two-way integration of Rake (Ruby's build tool) and Ant. There's three aspects to Rake/Ant integration:

Using Rake tasks from Ant and Ant tasks from Rake
Calling Rake targets from Ant and Ant targets from Rake
Mixed build systems with part in Rake and part in Ant

Tom's work so far has focused on the first bullet, but the other two will come along as well. You'll be able to translate your Ant script to Rake and have it work without modification, call out to Rake from Ant, include a Rakefile in Ant and use its targets, and so on.

Here's an example of pulling in a build.xml file, depending on its targets, and calling Ant tasks from Rake:

require 'ant'

ant.load 'build.xml' # defines tasks :mkdir + :setup in ant!

task :compile => [:mkdir, :setup] do
  ant.javac(:srcdir => src_dir, :destdir => "./build/classes") do
    classpath :refid => "project.class.path" 
  end
end

Ideally we'll cover all possible integration scenarios, and finally blur the lines between Rake and Ant. And we'll be able to move JRuby's build into Rake, which will make all of us very happy. Look forward to this in JRuby 1.5 as well.

The C Extension Question

One aspect of Ruby itself that we've punted on is support for Ruby's C extension API. We haven't done that to spite C extension users or developers--far from it...we'd love to flip a switch and have C extensions magically work. The problem is that Ruby's C API provides too-invasive access to the internals of objects, and there's just no way we can support that sort of access without incurring a massive overhead (because we'd have to copy stuff back and forth for every operation).

But there's another possibility we've started to explore: supporting only a "safe" subset of the Ruby C API, and providing a few additional functions to replace the "unsafe" invasive bits. To that end, we (as in Wayne Meissner, creator of the FFI implementations for JRuby and Ruby) have cleaned up and published a C extension API shim library to Github.

The JRuby C extension shim library repository is here: http://github.com/wmeissner/jruby-cext

Last night I spent some time getting this building and working on OS X, and to my surprise I was able to get a (very) trivial C extension to work!

Here's the part in C. Note that this is identical to what you'd write if you were implementing the extension for Ruby:

#include <stdio.h>
#include <ruby.h>
 
VALUE HelloModule = Qnil;
VALUE HelloClass = Qnil;
 
VALUE say_hello(VALUE self, VALUE hello);
VALUE get_hello(VALUE self);
 
void
Init_defineclass()
{
    HelloClass = rb_define_class("Hello", rb_cObject);
    rb_define_method(HelloClass, "get_hello", get_hello, 0);
    rb_define_method(HelloClass, "say_hello", say_hello, 1);
}
 
VALUE
say_hello(VALUE self, VALUE hello)
{
    return Qnil;
}
VALUE
get_hello(VALUE self)
{
    return rb_str_new2("Hello, World");
}

Here's the little snippit of Ruby code that loads and calls it. Note that the ModuleLoader logic would be hidden behind require/load in a final version of the C extension support.

require 'java'
 
m = org.jruby.cext.ModuleLoader.new
m.load(self, "defineclass")
 
h = Hello.new
puts "Hello.new returned #{h.inspect}"
puts "h.get_hello returns #{h.get_hello}"

Among the C API pieces we probably won't ever support are things like RSTRING, RARRAY, RHASH that give direct access to string, array, and hash internals, anything dealing with Ruby's threads or runtime, and so on. Basically the pieces that don't fit well into JNI (the Java VM C API) would not be supported.

It's also worth mentioning that this is really a user-driven venture. If you are interested in a C extension API for JRuby, then you'll need to help us get there. Not only are we plenty busy with Java and Ruby support, we are also not extension authors ourselves. Have a look at the repository, hop on the JRuby dev list, and we'll start collaborating.

JRuby in 2010

There's a lot more coming for JRuby in 2010. We're going to finally let you create "real" Java classes from Ruby code that you can compile against, serialize, annotate, and specify in configuration files. We're going to offer JRuby development support to help prioritize and accelerate bug fixes for commercial users that really need them. We're going to have a beautiful, simple deployment option in Engine Yard Cloud, with fire-and-forget for your JRuby-based applications (including any non-Ruby libraries and code you might use like Java, Scala or Clojure). And we're going to finally publish a JRuby book, with all the walkthroughs, reference material, and troubleshooting tips you'll need to adopt JRuby today.

It's going to be a great year...and a lot of fun.

Missed RubyConf? Attend Qcon's Ruby Track!

2009-10-31T12:48:00.005-05:00

This year, RubyConf reportedly reduced their attendee cap to 250 people (Update: actually 450 people), after hosting a 500 to 600-person conference last year. As you can imagine, this meant a lot of people that wanted to attend were not able to get tickets. To complicate matters, the RubyConf registration site happened to go live during the middle of the night EU time, and by the time most Europeans woke up it was already sold out. What's a Rubyist to do?

Well there's another option. Working with Ryan Slobojan of InfoQ and the organizers of Qcon San Francisco, I'll be hosting a one-day Ruby track the day before RubyConf! Qcon's main conference runs Wednesday through Friday, with my track on Wednesday, November 18th.

And now the REALLY good news! Because we wanted this to be a fallback for folks that could not attend RubyConf, we realized that the full conference fee was simply too high (ranging from $1500 up). So to make it possible for people to attend just the one day Ruby track, you can register with the code "rubywednesday" to get a drastically reduced $350 one-day conference pass. And to sweeten the deal even more, you can pop over to other tracks and attend the keynotes that day. Yes Virginia, there is a Santa Claus!

Those of you attending RubyConf are also welcome to attend this one-day track as well; most of the presentations won't overlap. Here's the lineup, including a special opening presentation by Yukihiro Matsumoto himself!

10:20 - "The Many Facets of Ruby" track opening by me

Ruby has seen a dramatic growth in popularity over the last few years, and there are now many facets to the Ruby story - multiple implementations, game-changing web frameworks, and large-scale use in enterprise solutions. Join us as we explore many aspects of Ruby in today's world.

10:30 - "Why we love Ruby?" by Yukihiro Matsumoto

Why we love Ruby? I have asked myself this question repeatedly. In this presentation, I will disclose my answer as of 2009. The keyword is QWAN.

11:45 - "Basking in the Limelight" by Paul Pagel

Limelight is Ruby on the desktop. Build applications with multiple windows, or just one window. Take control of the desktop, or play nicely with the desktop. Create fun animated games, or productive business apps. Develop rich internet applications, or unwired apps to make you rich. Publish your apps on the internet, or keep them for you and your friends. Do all this, writing nothing but Ruby code, in Limelight.

13:45 - "You've Got Java in my Ruby" by Thomas Enebo

JRuby is now well-established as a popular alternative implementation of Ruby.But why would you want to use it? How can it help you? This talk will detail some of the more interesting differences and advantages of using JRuby. Expect to get a better understanding of how Java makes a faster and more stable Ruby as well as how you can leverage Java features as an extra set of tools for your project.

15:00 - "Rails 3" by Yehuda Katz

I don't have a full abstract for this, but it's what you might expect...an overview of why Rails 3 is really "growing up" the framework, making it more clearly componentized and easier to adapt to more complicated (dare I say "enterprise") applications in the future. In working with Yehuda I know he's also paid special attention to performance.. Rails 3 is going to be excellent.

16:30 - "Rails in the Large: How Agility Allows Us to Build One of the World's Biggest Rails Apps" by Neal Ford

While others have been debating whether Rails can scale to enterprise levels, we've been demonstrating it. ThoughtWorks is running one of the largest Rails projects in the world, for an Enterprise. This session discusses tactics, techniques, best practices, and other things we've learned from scaling rails development. I discuss infrastructure, testing, messaging, optimization, performance, and the results of lots of lessons learned, including killer rock-scissors-paper tricks to help you avoid babysitting the view tests!

I think it's going to an outstanding track, and I'd probably pay the $350 just to see Matz speak if I knew I wouldn't get another chance for a long time. Limelight looks like an outstanding way to build rich client apps using JRuby, and of course you know I like JRuby. Tom will show some of the latest advancements we've done in JRuby, including the ability to produce "real" Java classes at runtime for integrating with Java frameworks better. Rails 3 I've described above, but you really have to see Yehuda present it himself. And of course everyone would like to know how to scale Rails to the moon...Neal knows his stuff.

Here's the track page: The Many Facets of Ruby

And the full conference schedule for Wednesday.

And finally, the registration page (don't forget to use code "rubywednesday").

I really hope to see you all there, so you can get your Ruby conference fix this fall. Tell your friends and let me know if you have any questions!

Announcing JRubyConf 2009

2009-09-11T11:58:00.004-05:00

Good news everybody! We're finally going to have JRubyConf!

After over 3 years of heavy development, dozens of deployments and hundreds of users, it's time for a conference for JRuby users. We've talked about it on the JRuby mailing lists, polled users, and seen other JRubyists do the same. And the chorus slowly grew: you all wanted JRubyConf more and more.

Now, thanks to Engine Yard, who's producing the conference, and to sponsors EdgeCase and ThoughtWorks, we'll host the first ever JRubyConf the day after RubyConf, on Sunday November 22nd. This should allow folks attending RubyConf to also attend JRubyConf and not have to schedule a separate trip.

So here's the details:

What: JRubyConf 2009 (the first ever!)
When: Sunday, November 22nd; the day after RubyConf 2009
Where: Same hotel as RubyConf, the Embassy Suites at San Francisco Airport. You don't even have to switch locations!
Why: Because we love you!
Price: FREE!!!

We've been putting together a wide range of talks and speakers, from the JRuby core team members to real-world users; from the latest on deploying Rails to gaming and desktop development. It's going to be a fast-paced event with something for everyone, and best of all, it's FREE!

Space is limited for the event, and you will have to register separately from RubyConf to secure your seat (but you don't have to go to RubyConf to attend JRubyConf!). Check out www.jrubyconf.com for information, registration, and so on, and be quick about it!

See you at JRubyConf 2009!