Wednesday, January 19, 2011

JRuby on Rails on Amazon Elastic Beanstalk

Amazon this week announced Elastic Beanstalk, a managed Apache Tomcat service for AWS. Naturally, I had to try JRuby on it.


First, the bad:
  • AWSEB is really slow to deploy stuff. Several times it got "stuck" and I waited for more than 30 minutes for it to recover. It did not appear to be an app issue, since the app came up just fine.
  • The default instance size is t1.micro. I was able to get a Rails app to boot there, but it's a very underpowered size.
  • It appears to start up JVMs with 256MB of memory max and 64MB of permgen. For a larger app, or one with many Rails instances, that might not be enough. For a "threadsafe" Rails app, though, it's plenty.
  • The default EC2 load balancer for the new Beanstalk instance is set to ping the "/" URL. If you don't rig up a / route in your Rails app (like I forgot to do) the app will come up for a few minutes and immediately get taken out.
And the good news: it works great once you get past the hassles! Here's the process that worked for my app (assuming app is already build and ready for deploy).

Preparing the app:
  • Ensure jruby-openssl is in Gemfile. Rails seems to want it in production mode.
  • Edit config/environments/production.rb to enable threadsafe mode.
  • `warble`
Preparing Elastic Beanstalk:
  • Create a new instance, specifying the .war file Warbler created above as the app to deploy
  • There is no step two
Once the instance has been prepared, you may want to resize it to something larger than t1.micro if it's meant to be a real app...but it should boot ok.



Have fun!

Wednesday, January 05, 2011

Representing Non-Unicode Text on the JVM

JRuby is an implementation of Ruby, and in order to achieve the high level of compatibility we boast we've had to put in some extra work. Probably the biggest area is in management of String data.

Strings in Ruby 1.8

Ruby 1.8 did not differentiate between strings of bytes or strings of characters. A String was just an array of bytes, and the representation of those bytes was dependent on how you used them. You could do regular expression matches against non-text binary sequences (used by some to parse binary formats like PNG). You could treat them as UTF-8 text by setting global encoding variables to assume UTF-8 in literal strings. And you could use the same methods for dealing with strings of bytes you would with strings of characters...split, gsub, and so on. The lack of a character abstraction was painful, but the ability to do character-string operations against byte-strings was frequently useful.

In order to support all this, we were forced in JRuby to also represent strings as byte[]. This was not an easy decision. Java's strings are all UTF-16 internally. By moving to byte[]-based strings, we lost many benefits of being on the JVM, like built-in regular expression support, seamless passing of strings to Java methods, and easy interaction with any Java libraries that accept, return, or manipulate Java strings. We eventually had to implement our own regexp engines (or the byte[]-to-char[]-to-byte[] overhead would kill us) and calls from Ruby to Java still pay a cost to pass Strings.

But we got a lot out of it too. We would not have been able to represent binary content easily with a char[]-based string, since it would either get garbled (when Java tried to decode it) or we'd have to force the data into only the lower byte of each char, doubling the size of all strings in memory. We have some of the fastest String IO capabilities of any JVM language, since we never have to decode text. And most importantly, we've achieved an incredibly high level of compatibility with C Ruby that would have been difficult or impossible forcing String data into char[].

There's also another major benefit: we can support Ruby 1.9's "multilingualization".

Strings in Ruby 1.9

Ruby 1.9 still represents all string data as byte[] internally, but it adds an additional field to all strings: Encoding (there's also "code range", but it's merely an internal optimization).

The biggest problem with encoded text in MRI was that you never knew what a string's encoding was supposed to be; the String object itself only aggregated byte[], and if you ever ended up with mixed encodings in a given app, you'd be in trouble. Rails even introduced its own "Chars" type specifically to deal with the lack of encoding support.

In Ruby 1.9, however, Strings know their own encoding. Strings can be forced to a specific encoding or transcoded to another. IO streams are aware of (and configurable for) external and internal encodings, and there's also default external and internal encodings. And you can still deal with raw binary data in the same structure and with the same String-manipulating features. For a full discussion of encoding support in Ruby 1.9, see Yehuda Katz's excellent post on Ruby 1.9 Encodings: A Primer.

As part of JRuby 1.6, we've been working on getting much closer to 100% compatible with Ruby 1.9. Of course this has meant working on encoding support. Luckily, we had a hacking machine some years ago in Marcin Mielzynski, who implemented not only our encoding-agnostic regexp engine (a port of Oniguruma from C), but also our byte[]-based String logic and almost all of our Encoding support. The remaining work has trickled in over the subsequent years, leading up the the last few months of heavy activity on 1.9 support.

How It Works

You might find it interesting to know how all this works, since JRuby is a JVM-based language where Strings are supposed to be UTF-16 always.

First off, String, implemented by the RubyString class. RubyString aggregates an Encoding and an array of bytes, using a structure we call ByteList. ByteList represents an arbitrary array of bytes, a view into them, and an encoding. All operations against a String operate within RubyString's code against ByteLists.

IO streams, implemented by RubyIO (and subclasses) and ChannelStream/ChannelDescriptor, accept and return ByteList instances. ByteList is the text/binary currency of JRuby...our java.lang.String.

Regexp is implemented in RubyRegexp using Joni, our Java port of the Oniguruma regular expression library. Oniguruma accepts byte arrays and uses encoding-specific information at match time to know what constitutes a character in that byte array. It is the only regular expression engine on the JVM capable of dealing with encodings other than UTF-16.

The JRuby parser also ties into encoding, using it on a per-file basis to know how to encode each literal string it encounters. Literal strings in the AST are held in StrNode, which aggregates a ByteList and constructs new String instances from it.

The compiler is an interesting case. Ideally we would like all literal strings to still go into the class file's constant pool, so that they can be loaded quickly and live as part of the class metadata. In order to do this, the byte[]-based string content is forced into a char[], which is forced into a java.lang.String that goes in the constant pool. At load time, we unpack the bytes and return them to a ByteList that knows their actual encoding. Dare I claim that JRuby is the first JVM language to allow representing literal strings in encodings other than UTF-16?

What It Means For You

At the end of the day, how are you affected by all this? How is your life improved?

If you are a Ruby user, you can count on JRuby having solid support for Ruby 1.9's "M17N" strings. That may not be complete for JRuby 1.6, but we intend to take it as far as possible. JRuby 1.6 *will* have the lion's share of M17N in-place and working.

If you are a JVM user...JRuby represents the *only* way you can deal with arbitrarily-encoded text without converting it to UTF-16 Unicode. At a minimum, this means JRuby has the potential to deal with raw wire data much more efficiently than libraries that have to up-convert to UTF-16 and downconvert back to UTF-8. It may also mean encodings without complete representation in Unicode (like Japanese "emoji" characters) can *only* be losslessly processed using JRuby, since forcing them into UTF-16 would either destroy them or mangle characters. And of course no other JVM language provides JRuby's capabilities for using String-like operations against arbitrary binary data. That's gotta be worth something!

I want to also take this opportunity to again thank Marcin for his work on JRuby in the past; Tom Enebo for his suffering through encoding-related parser work the past few weeks; Yukihiro "Matz" Matsumoto for adding encoding support to Ruby; and all JRuby committers and contributors who have helped us sort out M17N for JRuby 1.6.

Tuesday, January 04, 2011

Flat and Graph Profiles for JRuby 1.6

Sometimes it's the little things that make all the difference in the world.

For a long time, we've extolled the virtues of the amazing JVM tool ecosystem. There's literally dozens of profiling, debugging, and monitoring tools, making JRuby perhaps the best Ruby tool ecosystem you can get. But there's a surprising lack of tools for command-line use, and that's an area many Rubyists take for granted.

To help improve the situation, we recently got the ruby-debug maintainers to ship our JRuby version, so JRuby has easy-to-use command-line Ruby debugging support. You can simply "gem install ruby-debug" now, so we'll stop shipping it in JRuby 1.6.

We also shipped a basic "flat" instrumented profiler for JRuby 1.5.6. It's almost shocking how few command-line profiling tools there are available for JVM users; most require you to boot up a GUI and click a bunch of buttons to get any information at all. Even when there are tools for profiling, they're often bad at reporting results for non-Java languages like JRuby. So we decided to whip out a --profile flag that gives you a basic, flat, instrumented profile of your code.

To continue this trend, we enlisted the help of Dan Lucraft of the RedCar project to expand our profiler to include "graph" profiling results. Dan previously implemented JRuby support for the "ruby-prof" project, a native extension to C Ruby, in the form of "jruby-prof" (which you can install and use today on any recent JRuby release). He was a natural to work on the built-in profiling support.

For the uninitiated, "flat" profiles just show how much time each method body takes, possibly with downstream aggregate times and total aggregate times. This is what you usually get from built-in command-line profilers like the "hprof" profiler that ships with Hotspot/OpenJDK. Here's a "flat" profile for a simple piece of code.

~/projects/jruby ➔ jruby --profile.flat -e "def foo; 100000.times { (2 ** 200).to_s }; end; foo"
Total time: 0.99

total self children calls method
----------------------------------------------------------------
0.99 0.00 0.99 1 Object#foo
0.99 0.08 0.90 1 Fixnum#times
0.70 0.70 0.00 100000 Bignum#to_s
0.21 0.21 0.00 100000 Fixnum#**
0.00 0.00 0.00 145 Class#inherited
0.00 0.00 0.00 1 Module#method_added

A "graph" profile shows the top N call stacks from your application's run, breaking them down by how much time is spent in each method. It gives you a more complete picture of where time is being spent while running your application. Here's a "graph" profile (abbreviated) for the same code.

~/projects/jruby ➔ jruby --profile.graph -e "def foo; 100000.times { (2 ** 200).to_s }; end; foo"
%total %self total self children calls name
---------------------------------------------------------------------------------------------------------
100% 0% 1.00 0.00 1.00 0 (top)
1.00 0.00 1.00 1/1 Object#foo
0.00 0.00 0.00 145/145 Class#inherited
0.00 0.00 0.00 1/1 Module#method_added
---------------------------------------------------------------------------------------------------------
1.00 0.00 1.00 1/1 (top)
99% 0% 1.00 0.00 1.00 1 Object#foo
1.00 0.09 0.91 1/1 Fixnum#times
---------------------------------------------------------------------------------------------------------
1.00 0.09 0.91 1/1 Object#foo
99% 8% 1.00 0.09 0.91 1 Fixnum#times
0.70 0.70 0.00 100000/100000 Bignum#to_s
0.21 0.21 0.00 100000/100000 Fixnum#**
---------------------------------------------------------------------------------------------------------
0.70 0.70 0.00 100000/100000 Fixnum#times
69% 69% 0.70 0.70 0.00 100000 Bignum#to_s
---------------------------------------------------------------------------------------------------------
0.21 0.21 0.00 100000/100000 Fixnum#times
21% 21% 0.21 0.21 0.00 100000 Fixnum#**
---------------------------------------------------------------------------------------------------------
0.00 0.00 0.00 145/145 (top)
0% 0% 0.00 0.00 0.00 145 Class#inherited
---------------------------------------------------------------------------------------------------------
0.00 0.00 0.00 1/1 (top)
0% 0% 0.00 0.00 0.00 1 Module#method_added

As you can see, you get a much better picture of why certain methods are taking up time and what component calls are contributing most to that time.

We haven't settled on the final command-line flags, but look for the new graph profiling (and the cleaned-up flat profile) to ship with JRuby 1.6 (real soon now!)

Thursday, December 23, 2010

Improved JRuby Startup by Deferring Gem Plugins

Another present for you JRubyists out there!

JRuby has had notoriously bad startup times. Not as bad as, say, IronRuby (sorry guys!), but definitely a big fat hit every time you need to run some Ruby code from the command line. Some of this overhead was related to JRuby, and we've steadily worked to improve that over the years. Some of it is due to the JVM, most commonly due to running on the "server" Hotspot VM or another JVM that does not have an interpreter (both of which start up considerably slower than Hotspot/OpenJDK's "client" mode). I've blogged tips and tricks for JRuby startup before, and these mostly apply to vanilla JRuby startup performance.

However, a large part of the overhead was not specifically due to JRuby or the JVM, but to RubyGems. RubyGems in version 1.3 added support for "plugins", whereby gems could include a specially-named file to extend the functionality of RubyGems itself. Most of these plugins added command-line tools like "gem push" for pushing a new gem to gemcutter.org (now built-in for pushing to rubygems.org). Unfortunately, the feature was originally added by having RubyGems do a full scan of all installed gems on every startup. If you only had a few gems, this was a minor problem. If you had more than a few, it became a big fat O(N) problem, where each of those N could be arbitrarily complex in themselves.

The good news is that it looks like my proposed change – making plugin scanning happen *only* when using the "gem" command – appears likely to be approved for RubyGems 1.4, due out reasonably soon.

Here's the patch and the impact to RubyGems startup times are below. The first two times are without the patch, with the first time against a "cold" filesystem. The final time is with the patch in place. In all cases, it's against my local JRuby working copy, which has around 500 gems installed.

~/projects/jruby ➔ jruby -e "t = Time.now; require 'rubygems'; puts Time.now - t"
17.09

~/projects/jruby ➔ jruby -e "t = Time.now; require 'rubygems'; puts Time.now - t"
6.959

~/projects/jruby ➔ git stash pop
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: lib/ruby/site_ruby/1.8/rubygems.rb
# modified: lib/ruby/site_ruby/1.8/rubygems/gem_runner.rb
...

~/projects/jruby ➔ jruby -e "t = Time.now; require 'rubygems'; puts Time.now - t"
0.481


It's truly a shocking difference, and it's easy to see why JRuby (plus RubyGems) has had such a bad startup-time reputation.

I've already made this change locally to JRuby's copy of RubyGems, which should help any users working against JRuby master. The change will almost certainly ship in JRuby 1.6, with RCs showing up in the next couple weeks. So with this change and my JRuby startup tips, we're on the road to a much more pleasant JRuby experience.

Happy Hacking!

JRuby Finally Installs ruby-debug Gem

This should be a great Christmas present for many of you.

After over three years, the "ruby-debug" gem finally installs properly on JRuby.

~/projects/jruby ➔ gem install ruby-debug
Successfully installed ruby-debug-base-0.10.4-java
Successfully installed ruby-debug-0.10.4
2 gems installed


Back in 2007, folks working on NetBeans, Eclipse, and IntelliJ support for Ruby came together to build a new version of the ruby-debug backend that would work on JRuby. They shared the effort, we JRuby guys added features they needed to do a clean port of the ruby-debug C code, and the ultimate result was a ruby-debug-base gem that isolated the platform-specific bits.

For whatever reason, the JRuby version of ruby-debug-base never got pushed as a real, live "-java" gem. This meant that you had to download ruby-debug-base-VERSION-java.gem yourself to get ruby-debug to install. To make this process easier, we even shipped ruby-debug and ruby-debug-base preinstalled in JRuby 1.5.

Unfortunately, this was only a partial answer. Many libraries and applications want to install all their dependencies clean. If one of those dependencies was ruby-debug, it would fail to install. Rails even includes special JRuby-specific lines in its default Bundler Gemfile to exclude ruby-debug when bundling on JRuby.

All that nonsense ends today. Rocky Bernstein, one of the maintainers of the ruby-debug gem, agreed to push our ruby-debug-base to the canonical rubygems.org repository. As a result, ruby-debug now installs properly on JRuby. It only took three years to get that gem pushed (by nobody's fault...I think everyone expected everyone else to follow through on it).

Merry Christmas, Happy Chanukah, Joyous Kwanza, and enjoy your holiday season!

Quick Thoughts on Oracle/Apache and the Java TCK

This was going to be a reply to a friend on Twitter, when I realized it would be several tweets and I might as well put them in one place.

Grant Michaels (@grantmichaels) sent me this link to the October JCP (Java Community Process) EC (Expert Committee) meeting notes. The Apache/TCK (Technology Compatibility Kit) issue was discussed at length.

For those of you in the dark, Apache recently resigned from the JCP because of the ongoing dispute over their "Harmony" OSS (Open-Source Software) Java implementation's inability to get an unencumbered license to the Java TCK. Passing the TCK is a requirement for an implementation to officially be accepted as "Java".

I had heard about this problem from a distance, but only recently started to understand its complexity. The TCK includes FOU (Field of Use) clauses preventing TCK-tested implementations other than OpenJDK from being released as open-source. Only implementations "largely" based on OpenJDK (Open Java Development Kit, Sun's GPLed "Hotspot" VM and class libraries) are allowed to get around this requirement. Apache's Harmony, being entirely independent and Apache-licensed, does not qualify.

If that were all there is to it, Apache would not have any real grounds for complaining. But the JSPA (Java Specification Participation Agreement, PDF) requires only unencumbered specifications, reference implementations, and test kits be submitted to the JCP. This sets the stage for an ugly licensing battle that has stymied Java's progress for over three years.

I'm not going to do a tl;dr post on this like I did with Oracle/Google. Instead, just a few quick thoughts as I read through the notes.

Overview

The first thing you will notice is that this isn't a simple cut-and-dried issue. The meeting notes express Oracle's position as outwardly fearful of Harmony leading to many downstream forks, with no recourse for asserting they fulfill the requirements of the Java specification. There seems to be an implied stab at Android here, which uses Harmony's class libraries atop the Dalvik VM to implement a substantial portion (but not all) of what looks like Java SE 5. Oracle states this decision is final; Apache will not be granted an unencumbered TCK license. Oracle is not known for changing their minds.

Several EC members, including Doug Lea and Josh Bloch, point out that it's fairly clear the encumbered TCK violates the JSPA's openness clauses. Oracle refuses to comment on this "legal" matter. Doug suggests that EC members might be able to vote to move Java forward with a clearer conscience if the JSPA were amended to make the encumbered TCK "legal".

Another point brought up by several members is the frustration that they have to deal with licensing at all. They recall a golden age of the JCP where it actually voted on technical matters rather than arguing over licensing.

IBM declares they are unhappy with this decision, but even more unhappy that the Java platform has stagnated because of it for so long. IBM would eventually go on to vote "yes" to the disputed Java 7 JSR, even in the presence of the apparent JSPA violation.

Apache representative and longtime Harmony advocate Geir Magnusson also weighed in. He argued that the health of the platform would only be bolstered by allowing for many independent open-source implementations, and damaged by disallowing them. When asking for a clarification of why OpenJDK gets a free pass, Adam Messinger (Oracle) stated that he didn't want to answer a legal question, but that OpenJDK's GPL (Gnu Public License) requires reciprocity from downstream forks, reducing the damage and confusion they might cause if released publicly without full spec compliance (I'm paraphrasing based on the notes here).

Toward the end of the licensing discussion, Adam again called for all memory organizations to participate in OpenJDK. It's fairly clear from these notes and from previous announcements and discussions that Oracle intends for OpenJDK to be the "one true OSS Java", and for all comers to contribute to it. They even managed to get Apple and IBM, longtime GPL foes, to join the family. Apache doesn't do GPL, and so Apache will not contribute to or base projects off OpenJDK.

No Right Answer

Now we enter entirely into the realm of my own opinions.

At a glance, I hate Oracle for disallowing free use of the TCK. Nobody should be disallowed from implementing their own OSS Java. Harmony is a promising project that would have a promising future if the cloud of licensing, patent protection, and "compliance" could be lifted. It seems that's nearly impossible now.

I also hate to see a good project like Harmony "die" or be stuck in legal limbo. I'm sure dozens of developers have poured their hearts into Harmony, and they deserve to see it thrive.

On the other hand, I applaud Oracle for so vigorously promoting OpenJDK. OpenJDK is certainly a more mature JVM and class library than Harmony, having been Sun's official Java implementation for years and years. Bringing IBM and Apple into OpenJDK will help ensure it moves forward on all platforms of note. If you look only at OpenJDK, OSS Java is stronger now than it ever has been.

Oracle may have a point with the forking concerns. Because Apache's license is so open that anyone can create and release binary-only forks, it would in theory be possible to speckle the Java landscape with Harmony derivatives that are incompatible in subtle ways. Possible, but perhaps unlikely.

Ultimately, I feel sad about the direction Oracle has taken, but I'm bolstered by hopes that OpenJDK is going to really thrive and grow, and as a result "Java" will continue to see widespread use and adoption across a variety of platforms.

Other Ways Out

Now I go into pure speculation-mode.

A key problem with the TCK restriction is the fact that the Java specification provides patent grants only to compliant implementations. In other words, if you don't want an Oracle/Google-esque lawsuit once you have billions in the bank, you need to be compliant with the one true TCK. In order to be compliant with the TCK, you can't release your implementation as open source. The patent grant is obviously intended to ensure that third-party implementations are "really" Java.

There's a bit of a chicken-and-egg thing here. What if Harmony could pass the TCK? In theory, they may be at that point right now. Does the ability to pass the TCK but the inability to run it without tainting mean they're Java or not? If an unrelated third-party forked Harmony into "Barmony", acquired a TCK license, and proved that it passed...would that mean Harmony could be considered compliant without ever having run the TCK?

What if, as in Android, Harmony simply moved forward without claiming they were compliant? Oracle could eventually club them to death with patent bats. Perhaps such a legal battle would force the legal remifications of the JSPA violation to be addressed in court? Could Oracle survive a legal test of their violation in attempting to back up a patent suit?

What It Means For You

The ultimate question is how this affects Java developers today and going forward.

By most estimates, 99% of the worlds Java developers run on one of the standard Java implementations from Oracle, IBM, and other licensees. A massive number of them run atop Hotspot, either in an old closed-source Java 5/6 form or in an OpenJDK 6/7 form. "Nobody" runs Harmony, and so few if any day-to-day Java developers will be affected. That doesn't excuse the situation, but it does soften the actual damage caused.

Android is another peculiar case. It would be difficult for it to be tested compliant, even if there weren't FOU restrictions on doing so. It uses Harmony libraries but Dalvik VM. It is also a massive force in mobile development now, and killing it would likely put the final nail in mobile Java's coffin. Oracle has to know this. Oracle also has to know that killing Android would hand the mobile keys over to Apple and Microsoft forever. Could Android switch to using OpenJDK-based class libraries? Would that qualify it as being "largely" based on OpenJDK (noting that the class libraries are the vast majority of the code in OpenJDK)? Oracle/Google is likely to be stuck in court for a long time, while Android continues to expand into televisions and tablets along with telephones.

How about projects that build atop Java, like JRuby? Perhaps even higher a percentage of JRuby users are already running atop OpenJDK or an OpenJDK derivative like IcedTea or SoyLatte. Oracle pushing OpenJDK will only benefit those users. Ruboto (JRuby on Android) will follow whatever path Android itself ends up following, and nobody can see that future yet...but it seems unlikely Ruboto will ever die since it's unlikely Android will ever die.

In closing...I encourage everyone to read the EC notes and gather as much information as they can before claiming this is now the final "death" of Java. I also encourage everyone to contribute thoughts, clarifications, and speculation in the comments here.

Have a happy holiday!

Thursday, September 23, 2010

Predator and Prey

(This is a repost of an article I wrote in 2004, which I stumbled upon this evening and thought worthy of a reprint. Feel free to rip it up and offer your own commentary. I think it is still 100% valid.)

I came up with the most compelling idea for a Disney-style film the other day. (Ok, perhaps not the most compelling idea, but certainly a fair shot at one)

Over the years I've heard a number of biologists (ecologists, environmentalists, what have you) comment on (as in expound endlessly upon) something called the "Bambi Syndrome." Simply put, the "Bambi Syndrome" is brought about by cutesy, utopian images of nature, where only unexpected, amorphous entities (usually accompanied by menacing percussion or something equally non-musical) can embody "evil"; it is a view that, in all its splendor and glory, "nature" is "good," while "man" is "bad." The parallel between this viewpoint and several (all?) nature-based Disney films is apparent (although it should be said that Disney is far from being the only perpetrator of "Bambiism").

So then, you ask, if nature isn't "good", then what is it? Evil and good are purely human constructions. Truth be told, nothing that exists is innately "good" or "evil". These concepts exist only in the eye of the beholder: to the prey, the successful predator is evil; to the predator, the successful prey is evil.

It could then be considered a great disservice to continue teaching these false ideals to our children, no? This has been my opinion, and I have tried to take an approach with my own son of presenting these facts of nature in as unbiased a way as possible--whence springs the compelling idea.

Take a typical Disney movie; its clear definition of "good" and "evil" and its even clearer illustration of which roles fall into which category. This movie would begin the same. Also typically, it would be based in nature, perhaps at a very low stratus of the animal kingdom. Predator and prey would be represented by species A (the "good" prey) and B (the "evil" predator). A typical scene ensues, a contest between good and evil, predator and prey. The predator's evil nature is clearly illustrated here, but atypically, the predator wins.

Just as people in the audience are questioning their faith in Hollywood, we move up one stratum. The evil predator, returning home with the spoils of war, becomes a gentle, caring mother. She was not simply an "evil" aggressor, bent on death and destruction, but a doting, protective mother, expending her own effort, at risk of her life, to care for her childen. In this way, stratum after stratum, "evil" becomes "good", and the elaborate network that makes up our natural system becomes more recognizable for the purity, neutrality, simplicity of its form.

Finally, as you would expect in such a movie, we would arrive at the most prolific of the Great Apes: man. Illustrating that all kingdoms on earth are becoming man's prey, with as much tree-hugging, granola-chomping tripe as possible to make sure we, the lords of creation, masters of destiny, killers of all, Shiva to nature's Brahma , are shown--incontrovertably--as the only pure "evil" on earth, the movie careens ever faster toward some measure of certainty: "Ahh, now I understand the film's message."

But man is just another spoke in the wheel. We can easily flip the coin, showing mothers feeding, defending children, innocents preyed upon by murderers, hunters taking prey not for food, but for the feeding of other hungers. We do what we do not out of pure evil, but because it is our capacity to do so to further our own species, further our goals, perpetuate. But we also have a capacity no other species possesses: the ability to create our own destinies. The only true evil we encounter in a world where we nearly reign supreme is ourselves. We daily pit our most animal desires--acqusition of resources and destruction of usurpers--against our knowledge that such desires run rampant will complicate our path through history, perhaps even terminating it. Can such a machine be affected by the changing opinions of a few small components? That is the question we leave for the viewers.

The challenge in such a film would almost certainly be not overplaying the hand. No evil must ever appear to be of any different motivation than its antithesis; and man must, in the end, appear as the most schizophrenic creature on Earth. Our "evil" predatory instincts must be tempered by the "good" effects of our fear of intimate and ultimate mortality for us to continue indefinitely. In this, man has another trait not found among the animals: Our system balances on our own decisions alone. With the capacity we will soon possess to control nature completely, without fear of predators, we can only undo ourselves. The balance comes from within.

Where will the viewer lie?

I'd hope every kid was as confused as possible by then; and eventually a bit more suspicious of being told what is "good" or "evil".

Sunday, August 15, 2010

My Thoughts on Oracle v Google

As you've probably heard by now, Oracle has decided to file suit against Google, claiming multiple counts of infringement against Java or JVM patents and copyrights they acquired when they assimilated Sun Microsystems this past year. Since I'm unlikely to keep my mouth shut about even trivial matters, something this big obviously requires at least a couple thousand words.

Who Am I?

Any post of this nature really requires an author to identify where they stand, so their unavoidable biases can be taken with the appropriate dosage of salt. Rather than having you dig through my past and learn who and what I am, I'll just lay it out here.

I am a Java developer. I've been a Java developer since 1996 or so, when I got my first University job writing stupid little applets in this new-fangled web language. That job expanded into a web development position, also using Java, and culminated with me joining a few senior developers for a 6-month shared development gig with IBM's then-nascent Pacific Development Center in Vancouver, BC. Since then I've had a string of Java-related jobs...some as a trenches developer, some as a lead, some as "architect", but all of them heavily wrapped up in this thing called Java. And I can't say that I've ever been particularly annoyed with Java as a language or a platform. Perhaps I haven't spent enough time on other runtimes, or perhaps I've got tunnel-vision after being a Java developer for so many years. But I'd like to think that I've become "seasoned" enough as a developer to realize no platform is perfect, and the manifold benefits of the JVM and the Java platform vastly outweigh the troublesome aspects.

I am an open-source developer. In the late 90s, I worked in earnest on my first open-source project: the LiteStep desktop replacement for Windows. At the time, the LiteStep project was a loosely-confederated glob of C code and amateur C hackers. Being a Windows user at the time, I was looking to improve my situation...specifically, I had worked for years on a small application called Hack-It that exposed aspects of the win32 API normally unavailable through standard Windows UI elements, and I was interested in taking that further. LiteStep was not my creation. It had many developers before me and many after, but my small contribution to the project was an almost complete rewrite in amateur-friendly C++ and a decoupling of the core LiteStep "kernel" from the various plugin mechanisms. I was also interviewed for a Wired article on the then-new domain of "skinning" computers, desktops, applications, and so on, though none of my quotes made it into the article. After LiteStep, I fell back into mostly anonymous corporate software development, all still using Java and many open-source technologies, but not much of a visible presence in the OSS world. Then, in 2004 while working as the lead "Java EE Architect" for a multi-million-dollar US government contract, I found JRuby.

I am a JRuby developer. Since 2004 (or really since late 2005, when I started helping out in earnest) I've been partially responsible for turning JRuby from an interesting novelty project into one of the top Ruby implementations. We've become well known as one of the best-performing – if not the best-performing – Ruby implementations, even faced with increasing competition from the young upstarts. We're also increasingly popular (and perhaps the easiest path) for bringing Ruby and its many paradigm-shifting libraries and frameworks (like Rails) to Java and JVM users around the world – without them having to change platforms or leave any of their legacy code behind. Part of my interest in JRuby has been to bring Ruby to the JVM, plain and simple. I like Ruby, I like the Ruby community, and on most days I like the cockiness and enthusiasm of those community members toward trying crazy new things. But another large part of my interest in JRuby is more sinister: I want to prove to naysayers what a great platform the JVM actually is, and perhaps make them think twice about knee-jerk biases they've carried and cultivated for so many years.

You'll notice I refer to JRuby not as "it" or "she" or "he", but as "we". "We've become well known...We're also increasingly popular..." That's not an accident. There's now over five years of my efforts in JRuby, and I consider it to be as much a part of me as I am a part of it. And so because of that, I have a much deeper, emotional investment in the platform upon which JRuby rests.

I am a Java developer. I am an open-source developer. I am a JRuby developer and a Ruby fan.

I am not a lawyer.

The Facts, According to Me

These are the facts as I see them. You're free to disagree with my interpretation of the world, and I encourage you to do so in the comments, on other forums, over email, or to my face (but buy me a beer first).

On Java

The Java platform is big. Really big. You just won't believe how vastly hugely mindbogglingly big it is. And by big, I mean it's everywhere.

There are three mainstream JVMs people know about: JRockit (WebLogic's first and then Oracle's after it acquired them), Hotspot (Which came to Sun through an acquisition and eventually became OpenJDK), and J9 (IBM's own JVM, fully-licensed and with all its shots). Upon those three JVMs lives a gigantic world. If you want the details, there's numerous studies and reports about the use of Java in all manner of business, from the hippest new startups (Twitter recently switched much of their stack to the JVM) to the oldest of the old financial concerns. It's the favored choice for government server applications, the strongest not-quite-completely-Free managed runtime for open-source libraries and applications, and now with Android it's rapidly becoming one of the strongest (if not the strongest) mobile OS platform (even though Android isn't *really* Java, as I'll get into later). You may love or hate Java, but I guarantee it's part of your life in some way or another.

There are a few open-source implementations of Java. The most well-known is OpenJDK, the Hotspot JVM that Sun relicensed under the GPL and set Free into the world. There's also Apache Harmony, whose class libraries form part of Dalvik's (Android's VM) Java-compatibility layer. There's GNU Classpath, a GPL-based implementation of the Java class libraries used for the ahead-of-time Java compiler GCJ. There's JamVM, which leverages Classpath to provide a very light, very minimal, (and very simple) JVM implementation. And there's others of varying qualities and relevance like IKVM (Java for .NET), VMKit (a Java compiler atop LLVM), and so on. OpenJDK is certainly the big daddy, though, and its release as GPL guarantees we'll at least have a solid Java 6 implementation forever.

Java is not an entirely open platform, what with the now-obvious encumbrances of patents and copyrights (not to mention draconian policies toward Java's various specifications, which are often very slow to evolve due to the JCP quagmire). That's not a great state of affairs, and if nothing else you have to recognize that folks at Sun at least tried to release the platform from its shackles by chasing OpenJDK. But the process of "freeing" Java has been pretty rocky; OpenJDK itself took years to gain acceptance from OSS purists, and the choice of the GPL has meant that folks afeared of the GPL's "viral" side still had to look for other options (which is a large part of why Apache Harmony was used as part of the basis for Android). Perhaps the biggest nail in the coffin is that Sun's Java test kit, the gold standard of whether an implementation is "compliant" or not, has never been released in open-source form, ultimately binding the hands of developers who wished to build a fully-compatible open-source Java.

Java is not an entirely closed platform, either. OpenJDK was a huge step in the direction of Freeing Java, and the Java community in general has a very strong OSS ethos. There's no piece of Java software that isn't at least partially based on open-source componenents, and most Java library, framework, or application developers either initially or eventually open-source some or all of their works. Open-source development and the Java platform go hand-in-hand, and without that relationship the platform would not be where it is today. Contrast that to other popular environments like Microsoft's .NET – which has been admirably Freed through open standards, but which has not yet become synonymous with or popular for OSS development – or Apple's various platforms – which aren't based on open-standards *or* open-source, but which have managed to become many OSS developers' environment of choice...for writing or consuming non-Apple open-source software. Among the corporation-controlled runtimes, the Java platform has more OSS in its blood than all others combined...many times more.

Java is not perfect, but it's pretty darn good. Every platform has its warts. The Java platform represents a decade and a half of tradeoffs, and it's impossible in that amount of time to make everyone happy all the time. One of the big contentious items is the addition in Java 5 of parametric polymorphism as a compile-time trick without also adding VM-level support for reifying per-type specializations as .NET can do. But ask most Java developers if they'd rather have nothing at all, and you'll get mixed responses. The sad, crippled version of generics in Java 5 doesn't do everything static-typing purists want, nor does it really extend to runtime at all (making reflective introspection almost impossible), but they do provide some nice surface-level sugar for Java developers. The same can be said of many Java "features" and tradeoffs. JavaEE became an abortively complicated jumble of mistakes (tradeoffs that went bad), but even upstarts that arguably made better decisions initially have themselves graduated into chaos (I believe the Spring framework has now grown even larger than the largest Java EE conglomerate, and Microsoft's periodically reboots their blessed-framework-of-the-week, resulting in an even more disruptive environment than a slow-moving, bulky standard like JavaEE). Designing good software is hard. Designing good *big* software is exponentially harder. Designing good *big* software that pleases everyone is impossible.

Why People Hate Java

Java is second only to Microsoft's platforms for being both wildly successful and almost universally hated by the self-sure software elite. The reasons for this are manifold and complex.

First of all, back in the 90s Java started getting shoved down everyone's throat. Developers were increasingly told to investigate this new platform, since their managers and long-disconnected tech leads kept hearing how great it was from Sun Microsystems, then a big deal in server applications and hardware. So developers that were happily using other environments (many of which exist to this day) often found themselves forced to suck it up and become Java developers. Making matters worse, Java itself was designed to be a fairly limited language...or at least limited in how easily a developer could paint themselves into a corner. Many features those reluctant developers had become used to in other environments were explicitly rejected for Java on the grounds that they added too much complexity, too much confusion, and too little value to trenches developers. So people that were happily doing Perl or C++ or Smalltalk or what have you were suddenly forced into a little J-shaped box and forced to write all those same applications upon Java and the JVM at a time when both were still poorly-suited to those domains. Those folks have had a white-hot hate for anything relating to Java ever since, and many will stop at nothing to see the entire platform ejected into space.

Second, as mentioned quickly above, Java in the 90s was simply not that great a platform. It had most of the current warts (classpath issues, VM limitations, poor system-level integration, a very limited language) on top of the fact that it was slow (optimizing JVMs didn't come around until the 2000s), marketed for highly-visible, highly-fickle application domains like desktop and browser-based applications (everyone's cursed a Java app or applet at some point in their life), and still largely driven and controlled by a single company (at a time when many developers were trying to get out from under Microsoft's thumb). It wasn't until Java 1.2 that we started to get a large and diverse update to Java's core libraries. Java 1.3 was the first release to ship Hotspot, which started to get the performance monkey off our backs. Java 1.5 brought the first major changes to the Java language, all designed to aid developers in expressing what they meant in standard ways (like using type-safe enums instead of static final ints, or generics for compiler-level assurances of collection homogeneity). And Java 6, the last major version, made great strides in improving startup time, overall performance, and manageability of JVM processes. Java 7, should it ever ship, will bring new changes to the Java language like support for closures and other syntactic sugar, better system-level integration features as found in NIO.2, and the feather in the cap: VM-level support for function objects and non-standard invocation sequences via Method Handles and InvokeDynamic. But unless you've been a Java developer for the past decade, all you remember is the roaring 90s and the pain Java caused you as a developer or a user.

Third, the Java language and environment has stagnated. Given years of declining fortunes at Sun Microsystems, disagreement among JCP members about the direction the platform should go, and a year of uncertainty triggered by Sun's collapse and rescue at the hands of Oracle, it's surprising anything's managed to get done at all. Java 7 is now many years overdue; they were talking about it when I joined Sun in 2006, and hoped to have preview releases within a year. For both technical and political reasons, it's taken a long time to bring the platform to the next level, and as a result many of the truly excellent improvements have remained on the shelf (much to my dismay...we really could use them in JRuby). For fast-moving technology hipsters, that's as good as dying on the vine; you need to shift paradigms on a regular schedule or you're yesterday's news.

Update: At least one commenter also pointed out that it took a long time for Java to be "everywhere", and even today most users still need to download and install it at least once on any newly-installed OS. Notable exceptions include Mac OS X, which ships a cracker-jack Java 6 based on Hotspot, and some flavors of Linux that come with some sort of Java installed out of the box. But this was definitely a very real problem; developers were being pushed to write apps and applets in Java, and users were forced to download a multi-megabyte installer just to run them...at a time when downloading multi-megabyte software was often a very painful ordeal. That would put a bad taste in anyone's mouth.

It's because of these and similar reasons that folks like Google finally said "enough is enough," and opted to start doing their own things. On the JRuby project, we've routinely hacked around the limitations of the JVM, be they related to its piss-poor process management APIs, its cumbersome support for binding native libraries, or its stubborn reluctance to become the world's greatest dynamic language VM. I've thought on numerous occasions how awesome it would be to spin off a company that took OpenJDK and made it "right" for the kinds of development people want to do today (and I'd love to be a part of that company), but such ventures are both expensive and light on profitability. Nobody pays for platforms or runtimes...they pay for services around those platforms or runtimes, services which are often anathema to the developers of those platforms and runtimes. So it required someone "bigger" to make that happen...someone who could write off the costs of the platform by funding it in creative new ways. Someone with a massive existing investment in Java. Someone with deep pockets and an army of the best developers in the business who love nothing more than a challenge. Someone like Google.

Why Android?

(Note that a lot of this is based on what information I've managed to glean from various conversations. Clarifications or corrections are welcome.)

There's an incredibly successful mobile Java platform out there. One that boasts millions of devices from almost all the major manufacturers, in form factors ranging from crappy mid-00s clamshells to high-end smartphones. A platform with hundreds or thousands of games and applications and freely-available development tools. That platform is Java ME.

Java ME started out as an effort to bring Java back to its original roots: as a language and environment for writing embedded applications. The baseline ME profiles are pretty bare; I did some CLDC development years ago and had to implement my own buffered streams and various data structures just to get by. Even the biggest profiles are still fairly restricted, and I don't believe any of them have ever graduated beyond Java 1.3-level featuresets. So Sun did a great job of getting Java ME on devices, back when people cared about Sun...and then they let mobile Java stagnate to a terrible degree while they spent all resources trying to get people to use Java EE and trying to get Java EE to suck less. So while resources were getting poured into EE, people started to form the same opinions of mobile Java they had formed about desktop and server Java years earlier.

At the same time, Java ME was one of the few Java-related technologies that brought in money. You see, in order for handset manufacturers to ship (and boast about) Java ME support, they had to license the technology from Sun. It wasn't a huge cash cow, but it was a cow nonetheless. Java ME actually made money for Sun. So in true Sun form, they loused it up terribly.

Fast forward to a few years ago. Google, recognizing that mobile devices finally were becoming the next great technology market, decided that leaving the mobile world in the hands of proprietary platforms was a bad idea. Java ME seemed like it could be an answer, but Sun was starting to get desperate for both revenue and relevance...and they'd started to back a completely new horse-that-would-be-cow called JavaFX, which they hoped to pimp as the next great development environment for in-browser and on-device apps alike. They weren't interested in making Java ME be what Google wanted it to be.

Google decided to take the hard route: they'd fund development of a new platform, building it entirely from open-source components, and leveraging two of the best platform technologies available: Linux, for the kernel, and Java, for the runtime environment. However there was a problem with Java: it was encumbered by all sorts of patents and copyrights and specifications and restrictions. Hell, even OpenJDK itself, the most complete and competitive OSS implementation of Java, could not be customized and shipped in binary-only form by hardware manufacturers and service providers due to it being GPL. So the answer was to build a new VM, use unencumbered versions of the core Java class libraries, and basically remake the world in a new, copyright and patent-free image. Android was born.

There's many parts to Android, several of which I'm not really qualified to talk about. But the application environment that runs atop the Dalvik VM needs some explanation.

First, there's the VM. Dalvik is *not* a JVM. It doesn't run JVM bytecode, and you can't ship JVM bytecode expecting it to work on Dalvik. You must recompile it to Dalvik's own bytecode using one of the provided translation tools. This is similar to how IKVM gets Java code to run on .NET: you're not actually running a JVM, you're transforming your code into a different form so it will run on someone else's VM. So it bears repeating, lest anyone get confused: Dalvik is not a JVM...it just plays one on TV.

Second, there's the core Java class libraries. Android supports a rough (but large) subset of the Java 1.5 class libraries. That subset is large enough that projects as complicated as JRuby can basically run unmodified on Android, with very few restrictions (a notable one is the fact that since we can't generate JVM bytecode, we can't reoptimize Ruby code at runtime right now). In order to do this without licensing Sun's class libraries (as most other mainstream Java runtimes like JRockit and J9 do), Google opted to go with the not-quite-complete-but-pretty-close Apache Harmony class libraries, which had for years been developed independent of Sun or OpenJDK but never really tested against the Java compatibility kits (and there's a long and storied history behind this situation).

So by building their own non-JVM VM and using translated versions of non-Sun, non-encumbered class libraries, Google hoped to avoid (or at least blunt) the possibility that their "unofficial", "unlicensed" mobile Java platform might face a legal test. In short, they hoped to build the open mobile Java platform developers wanted without the legal and financial encumbrances of Java ME.

At first, they seemed to be on a gravy train with biscuit wheels.

Splitting Up the Pie

Sun Microsystems was not amused. A little over a year ago, when several Sun developers started to take an eager interest in Android, we were all told to back off. It wasn't yet clear whether Android stood on solid legal ground, and Sun execs didn't want egg on their face if a bunch of their own employees turned out to be supporting a platform they'd eventually have to attack. Furthermore, it was an embarrassment to see Android drawing in the same developers Sun really, really wanted to look at JavaFX or PersonalJava or whatever the latest attempt to bring developers back might be. Android actually *was* a great platform that supported existing Java developers and libraries incredibly well (without actually being a Java environment), and for the first time there was a serious contender to "standard" Java that Sun had absolutely no control over.

To make matters worse, handset manufacturers started to sign on in droves to this new non-Java ME platform, which meant all that technology licensing revenue was reaching a dead end. Nobody (including me) wanted to do Java ME development anymore. Everyone wanted to do Android development.

Now we must say one thing to Sun's credit: they didn't do what Oracle is now attempting to do. As James Gosling blogged recently, patent litigation just wasn't in Sun's blood...even if there might have been legal ground to file suit. So while we Sun employees were still quietly discouraged from looking at or talking about Android, the rest of the world took Sun's silence as carte blanche to stuff Android into everything from phones to TVs, and mobile app developers started to think there might be hope for a real competitor to Apple's iPhone. Things might have proceeded in this way indefinitely, with Android continuing to grab market share (it recently passed iPhone in raw numbers with no slowing in sight) and mindshare (Android is far more approachable than almost any other mobile development environment, especially if you're one of the millions of developers who know Java.)

And then it all started to go wrong.

The Mantle of Java Passes to Oracle

If for nothing else, Jonathan Schwartz will be remembered as the man who broke open the Sun piñata, simultaneously releasing more open-source software than any company in history and killing Sun in the process. Either Jonathan had no "step 2" or the inertia of a company built on closed-source products was too great to overcome. In either case, by spring of 2009 Sun was hemorrhaging. Many reports claim that Jonathan had started shopping Sun around to possible buyers as early as 2008, but it wasn't until 2009 that the first candidates started lining up. Initially, it was IBM, hoping to gobble up its former competitor along with the IP, patents, and copyrights they carried. That deal ultimately went south when Sun refused to consider any deal that IBM wouldn't promise to carry to completion, even in the face of regulatory roadblocks sure to come up. Many of us breathed a sigh of relief; if there's any Java company even more firmly stuck in the old world than Sun, it's IBM...and we weren't looking forward to dealing with that.

Once that deal fell through, folks like me became resigned to the fact that Sun was nearing the end of its independent life. Years of platform negligence, management incompetence, and resting on laurels had dug a hole far too deep for anyone to climb out of. Would it be Cisco, who had recently started building up an interesting new portfolio of application server hardware and virtualization software? What about VMWare, who had recently gobbled up Springsource and seemed to be making all the right moves toward a large-scale virtualized "everything cloud." Or perhaps Oracle, a long-time partner to Sun, whose software was either Java-based or widely deployed on Sun hardware and operating systems. Dear god, please don't let it be Oracle.

Don't get me wrong...Oracle's a highly successful company. They've managed to turn almost every acquisition into gold while coaxing profitability out of just about every one of their divisions. But Oracle's not a developer-oriented company (like Sun)...it's a profit-oriented company (unlike Sun, sadly), and you need to either feed the bottom line or feed others in the company that do. So when it turned out that Oracle would gobble up Sun, many of us OSS folks started to get a little nervous.

You see, many of us at Sun had been actively trying to change the perception of the platform from that of a corporate, enterprisey, closed world to that of a great VM with a great OSS ecosystem and an open-source reference implementation. Folks like Jonathan believed that by freeing Java we'd free the platform, and both the platform and the developer community would be better for it. We were half right...the OpenJDK genie is out of the bottle, and there's basically no way to put it back now (and for that, the world owes Sun a great debt). But only part of the platform was Freed...the patents and copyrights surrounding Hotspot and Java itself remained in place, carefully tucked away in the vault of a company that just didn't mount patent or copyright-driven legal attacks.

Oracle, now in control of those patents and copyrights, obviously has different plans.

The Suit

So now, after spending 4000 words of your time, we come to the meat of the article: the actual Oracle v Google suit. The full text is provided various places online, though the Software Patents Wiki has probably the best collection of related facts (though the wiki-driven discussions of the actual patents are woefully inaccurate).

The suit largely comes down to a patent-infringement battle. Oracle claims that by developing and distributing Android, Google is in violation of seven patents. There's also an amorphous copyright claim without much backing information ("Google probably stole something copyrighted so we'll list a bunch of stuff commonly stolen in that way"), so we'll skip that one today.

Before looking at the actual patents involved, I want to make one thing absolutely clear: Oracle has not already won this suit. Even after a couple days of analysis, nobody has any idea whether they *can* win such a suit, given that Google seems to have taken great pains to avoid legal entanglements when designing Android. So everybody needs to take a deep breath and let things progress as they should, and either trust that things will go the right direction or start doing your damndest to make sure they go the right direction.

With that said, let's take a peek at the patents, one by one. And as always, the "facts" here are based on my reading of the patents and my understanding of the related systems.

Protection Domains To Provide Security In A Computer System (6,125,447) and Controlling Access To A Resource (6,192,476)

The first two patents describe the Java Security Policy system, for controlling access to resources. One of the least-interesting but most-important aspects of the Java platform is its approach to security. Code under a specific classloader or thread can be forced to comply with a specific security policy by installing a security manager. These permissions control just about every aspect of the code's interaction with the JVM and with the host operating system: loading new code, reflectively accessing existing classes, accessing system-level resources like devices and filesystems, and so on. It's even easy for you to build up security policies of your own by checking for custom-named permissions and only granting them when appropriate. It's a pretty good system, and one of the reasons Java has a much stronger security track record than other runtimes that don't have pervasive security in mind from the beginning.

In order to host applications written for the Java platform, and to sandbox them in a compatible way, Android necessarily had to support the same security mechanisms. The problem here is the same problem that plagues many patents: what boils down to a fairly simple and obvious way to solve a problem (associate pieces of code with sets of permissions, don't let that code do anything outside those permissions) becomes so far-reaching that almost any reasonable *implementation* of that idea would violate these patents. In this case the '447 and '476 patents do describe mechanisms for implementing Java security policies, but even that simple implementation is very vague and would be hard to avoid with even a clean-room implementation.

Now I do not know exactly how Android implements security policies, but it's probably pretty close to what's described in these patents...since just about every implementation of security policies would be pretty close to what's described.

Method And Apparatus For Preprocessing And Packaging Class Files (5,966,702)

This is basically the patent governing the "Pack200" compression format provided as part of the JDK and used to better-compress class file archives.

Update: Alex Blewitt has posted a discussion of the Pack200 specification. He says this patent isn't nearly as comprehensive, but that it may touch upon how Pack200 works. His post is a more complete treatment of the details of the class file format and how Pack200 improves compression ratios for class archives. It also occurs to me now that this patent could be related to mobile/embedded Java too, where better compression would obviously have an enormous savings.

Java class files are filled with redundant data. For example, every class that contains code that calls PrintStream.println (as in System.out.println) contains the same "constant pool" entry identifying that method by name, a la "java/io/PrintStream.println:(Ljava/lang/String;)V". Every field lookup, class reference, literal string, or method invocation will have some sort of entry in the constant pool. Pack200 takes advantage of this fact by compressing all class files as a single unit, batching duplicate data into one place so that the actual unique class data boils down to just the unique class, method, and code structure.

The reason for having a separate compression format is because "zip" files, which includes Java's "jar" files, are notoriously bad at compressing many small files with redundant data. Because one of the features of the "zip" format is that you can easily pull a single file out, compressing all files together as a single unit prevents introducing any interdependencies between those files or a global table. This is a large part of why compression formats like "tar.gz" do a better job of compressing many small files: tar turns many files into one file, and gzip or bzip2 compress that one large file as a single unit (conversely, this is why you can't easily get a single file out of a tarball).

On Android, this is accomplished in a similar way by the "dex" tool, which in the process of translating JVM bytecode into Dalvik bytecode also localizes all duplicate class data in a single place. The general technique is standard data compression theory, so presumably the novelty lies in applying decades-old compression theory specifically to Java classfile structure.

If I've lost you at this point, we can summarize it this way: part of Oracle's suit lies in a patent for a better compression mechanism for archives containing many class files that takes advantage of redundant data in those files.

Are you laughing yet?

System And Method For Dynamic Preloading Of Classes Through Memory Space Cloning Of A Master Runtime System Process (7,426,720)

I'm not sure this patent ever saw the light of day in a mainstream JVM implementation. It describes a mechanism by which a master parent process could pre-load and pre-initialize code for a managed system, and then new processes that need to boot quickly would basically be memory-copied (plus copy-on-write friendly) "forks" of that master process, with the master maintaining overall control of those child processes through some sort of IPC.

Ignore for the moment the obvious prior art of "fork" itself as applied to pre-initializing application state for many children. Anyone who's ever used fork to initialize a heavy process or runtime to avoid the cost of reinitializing children has either violated this patent (if done since 2003) or has a compelling case for prior art (if done before 2003).

It's likely that this patent was formulated as an answer to the poor semantics of running many applications under the same JVM. Java servlets and later Java EE made it possible to consider deploying all of your company's applications in a single process, isolated by classloaders and security policies. What they never really addressed was the fact that code isn't the only thing you're sharing in this model; you're also sharing memory space, CPU time, and process resources like file descriptors. No amount of Java classloader or security trickery could make this a seamless multiapp environment, and so work like this patent hoped to find a lightweight way for all those child applications to actually live as their own processes.

On Android, this manifests in the fact that each application runs independently, and they (like most operating systems) fork off from either the kernel process or some master process.

In this case, Oracle's banking on being able to litigate with a patent for a very common application of "fork".

Method And Apparatus For Resolving Data References In Generated Code (RE38,104)

This patent, invented by James Gosling himself, basically describes a mechanism by which symbolic data references in code (e.g. Java field references) can be resolved dynamically at runtime into actual direct memory accesses, eliminating the symbolic lookup overhead. It's part of standard JIT optimization techniques, and there's a lot of references in this patent many great JIT patents and papers of the past.

Here there may actually be merit, or as much merit as can be found in a software patent to begin with. The patent itself is tiny, as most of these patents are. The techniques seem obvious to me, but perhaps they're obvious because this patent helped make them standard. I'm not qualified to judge. What I can say is that I can't imagine a VM in existence that doesn't violate the spirit – if not the letter – of this patent as well. All systems with symbolic references will seek to eliminate the symbolic references in favor of direct access. The novelty of this patent may be in doing that translation on the fly...not even at a decidedly coarse-grained per-method level, but by rewriting code while the method is actually executing.

I would guess that this is a patent filed during the development of Java's earlier JIT technologies, before systems like Hotspot came along to do a much better large-scale, cross-method job of optimization. It doesn't seem like it would be hard to debunk the novelty of the patent, or at least show prior art that makes it irrelevant.

Update: I actually found a reference in the article Dalvik Optimization and Verification with dexopt to the technique described here (about 3/4 down the page, under "Optimization"):

"The Dalvik optimizer does the following: ... For instance field get/put, replace the field index with a byte offset. ..."

But Dalvik still does this only once, before running the code (actually, at install time); not *while* running the code as described in the patent.
Interpreting Functions Utilizing A Hybrid Of Virtual And Native Machine Instructions (6,910,205)

This patent, invented by Lars Bak of V8 fame, describes a mechanism for building a "mixed mode" VM that can execute interpreted code and compiled (presumably optimized) code in the same VM process, and flip between the two over time to produce better-optimized compiled code. This describes the basic underpinnings of VMs like Hotspot, which alternate between interpreting virtual machine code and executing real machine code even within the same thread of execution (and sometimes, even branching from virtual code to real code and back within the same method body). Any other VMs that are mixed mode would probably violate this patent, so its impact could reach much farther than Android. (In a sense, even JRuby might violate this patent, though our two mixed modes are both virtual instruction sets.)

Now you might think the other mainstream JVMs would violate this patent, but they don't. Neither JRockit nor J9 have interpreters; they both go immediately to native code with various tiers of instrumentation to do the runtime profile data gathering. They iterative regenerate native code with successively more and better optimizations. Lars most recent VM, the V8 Javascript VM at the heart of Chrome, also goes straight to native code.

Now here's where it gets weird: Up until Froyo (Android 2.2) Dalvik did a once-only compilation to native code before anything started executing, which means by definition that it was not mixed-mode. And even in Froyo, I believe it still does its initial execution in native code form with instrumentation to allow subsequent compiles to do a better job. Dalvik does not have an interpreter, Dalvik does not interpret Dalvik bytecode.

Perhaps someone can explain how this patent even applies to Dalvik or Android?

Update: A couple commenters correct me here: Dalvik actually was 100% interpreted before Froyo, and is now a standard mixed-mode environment post-Froyo. So if this suit had been filed a year ago this patent might not have been applicable, but it probably is now.

Method And System for Performing Static Initialization (6,061,520)

Sigh. This patent appears to revolve completely around a mechanism by which the static initialization of arrays could be "play executed" in a preloader and then rewritten to do static initialization in one shot, or at least more efficiently than running dozens of class initializers that just construct arrays and populate them. Of all the patents, this is probably the narrowest, and the mechanism described are again not very unusual, but there's probably a good chance that the "dex" tool does something along these lines to tidy up static initializers in Android applications.

Given the "preloader" aspect of this patent, I'd surmise that it was formulated in part to simplify static initialization of code on embedded devices or in applet environments (because on servers...the boot time of static initialization is probably of little concern). Because of the much more limited nature of embedded environments (especially in 1998, when this patent was filed) it would be very beneficial to turn programmatic data initialization into a simple copy operation or a specialized virtual machine instruction. And this may be why it could apply to Android; it's another sort of embedded Java, with a preloader (either dex or the dexopt tool that jit-compiles your app on the device) and resource limitations that would warrant optimizing static initialization.

So, Does the Suit Have Merit?

I'll again reiterate that I'm not a lawyer. I'm just a Java developer with a logical mind and a penchant for debunking myths about the Java platform.

The collection of patents specified by the suit seems pretty laughable to me. If I were Google, I wouldn't be particularly worried about showing prior art for the patents in question or demonstrating how Android/Dalvik don't actually violate them. Some, like the "mixed mode" patent, don't actually seem to apply at all. It feels very much like a bunch of Sun engineers got together in a room with a bunch of lawyers and started digging for patents that Google might have violated without actually knowing much about Android or Dalvik to begin with.

But does the suit have merit? It depends if you consider baseless or over-general patents to have merit. The most substantial patent listed here is the "mixed mode" patent, and unless I'm wrong that one doesn't apply. The others are all variations on prior art, usually specialized for a Java runtime environment (and therefore with some question as to whether they can apply to a non-Java runtime environment that happens to have a translator from Java code). Having read through the suit and scanned the patents, I have to say I'm not particularly worried. But then again, I don't know what sort of magic David Boies and company might be able to pull off.

What Might Happen?

In the unlikely event of a total victory by Oracle, there's probably a lot of possible outcomes. I don't see the "death of Java" among them. There's also the possibility that Google could win a convincing victory. What might happen in each case?

The Nuclear Option

The worst case scenario would be that Android is completely destroyed, all Android handsets are confiscated by the Oracle mafia and burned in the city square, and all hope for a Free and Open Java are forever laid to rest. Long live Mono.

To understand why this won't happen we need to explore Oracle's possible motives.

As I mentioned above, Java ME actually did bring licensing revenue to Sun. There's a lot of handset manufacturers, millions of handsets, and every one put a couple cents (or a couple bucks?) in Sun's pocket. In the heady high times of Java ME, it was the only managed mobile runtime in town, with sound and graphics and standard UI elements. It wasn't always pretty, but it worked well and it was really easy to write for.

Now with Android rapidly becoming the preferred mobile and embedded Java, it's become apparent that there's no future for Java ME - or at least no future in the expanding "smart" consumer electronics business. Java ME lives on in Blackberries, some other low-end phones, in most Blu-Ray devices (BD-J is a standard for writing Java apps that run on Blu-Ray systems, utilizing one of the richer class libraries available for Java ME), and in some sub-micro devices like Ajile's AJ-200 Java-based multimedia CPU. If you want Java on a phone or in your TV, Android is taking that world by storm. That means Java ME licensing revenue is rapidly drying up.

So why wouldn't Oracle want to take a bite of the rapidly-growing Android pie? Would they turn down a portion of that revenue and instead completely destroy a very popular and successful mobile Java, or would they just strongarm a few bucks out of Google and Android handset manufacturers? Remember we're talking about a profit-driven company here. Java ME is never going to come back to smartphones, that much is certain and I don't think even Oracle could argue it. There's no profit in filing this suit just to kill Android, since it would just mean competing mobile platforms like Windows Phone, RIM, Symbian, or iOS would just canibalize their younger brother. Instead of getting a slice of the fastest-growing segment of Java developers, you'd kill off the entire segment and force those developers to non-Java, non-Oracle-friendly platforms.

Oracle may be big and evil, but they're not stupid.

Google Licensing Deal

A more likely outcome might be that Google would be forced to license the patents or pay royalties on Android revenue. I honestly believe this is the goal of this lawsuit; Oracle wants to get their foot into the door of the smartphone world, and they know they can't innovate enough to make up for the collapse of Java ME. So they're hoping that by sabre-rattling a few patents, Google will be forced (or scared) into sharing the harvest.

Given the contents of the suit and the patents, I think this one is pretty unlikely too. Much of Android and Dalvik's designs are specifically crafted to avoid Java entanglements, and I think it's unlikely if this suit goes to trial that Oracle's lawyers would be able to make a convincing argument that the patents were both novel and that they were violated by Google. But let's not put anything past either the lawyers or the US federal court system, eh?

Nothing At All

There's a good chance that either Oracle or the court will realize quickly that the case has no merit, and drop all charges. I'm obviously hoping for this one, but it's likely to take the longest of all. First, the court would need to gather all facts in the case, which could take months (especially given the highly technical nature of some of the compaints). Then there's the rebuttals of those facts, sorting out the wheat from the chaff, deciding there's not enough there to proceed, and either Oracle backs out or the court tosses the case. In the latter case, there's the possibility of appeals, and things could start to get very expensive.

Total Collapse of Software Patents

This is probably the one of the highest-profile cases involving software patents in recent years. The other would be Apple's recent suit against HTC for design elements of the iPhone. Several other bloggers and analysts have called out the possibility that this could lead to the death of software patents in general. I think that's a bit optimistic, but both Google *and* Oracle have come down officially against patents in the past (though perhaps Oracle's had a change of heart since acquiring Sun's portfolio).

As much as I'd like to see it happen, software patents probably won't be dead in the next year or two. But this might be a nail in the coffin.

What Does This Mean for Java?

Now we come to the biggest question of all: how does this suit affect the Java world, regardless of outcome?

Well it's obviously not great to have two Java heavyweights bickering like schoolchildren, and it would be positively devastating if Android were obliterated because of this. But I think the real damage will be in how the developer community perceives Java, rather than in any lasting impact on the platform itself.

Let's return to some of our facts. First off, nothing in this suit would apply to any of the three mainstream JVMs that 99% of the world's Java runs on. Hotspot and JRockit are both owned by Oracle, and J9 is subject to the Java specification's patent grant for compliant implementations. The lesson here is that Android is the first Java-like environment since Microsoft's J++ to attempt to unilaterally subset or superset the platform (with the difference in Android's case being that it doesn't claim to be a Java environment, and it may not actually need the patent grant). Other Java implementations that "follow the Rules" are in the clear, and so 99% of the world's use of Java is in the clear. Sorry, Java haters...this isn't your moment.

This certainly does some damage to the notion of open-source Java implementations, but only those that are not (or can not be) compliant with the specification. As the Apache Harmony folks know all too well, it's really hard to build a clean-room implementation of Java and expect to get the "spec compliance patent grant" if you don't actually have the tools necessary to show spec compliance. Tossing the code over to Sun to run compliance testing is a nonstarter; the actual test kit is enormous and requires a huge time investment to set up and run (and Sun/Oracle have better things to do with their time than help out a competing OSS Java implementation). If the test kit had been open-sourced before Sun foundered, there would be no problem; everyone that wanted to make an open-source java would just aim for 100% compliance with the spec and all would be well. As it stands, independently implemented (i.e. non-OpenJDK) open-source Java is a really hard thing to create, especially if you have to clean-room implement all the class libraries yourself. Android has neatly dodged this issue by letting Android just be what it is: a subset of a Java-like platform that doesn't actually run Java bytecode and doesn't use any code from OpenJDK.

How will it affect Android if this case drags on? It could certainly hurt Android's adoption by hardware manufacturers, but they're already getting such an oustanding deal on the platform that they might not even care. Android is the first platform that has the potential to unify all hardware profiles, freeing manufacturers from the drudgery of building their own OSes or licensing OSes from someone else. Hell, HTC rose from zero to Hero largely because of their backing of Android and shipping of Android devices. Are they going to back off from that platform now just because Oracle's throwing lawyerbombs at Google? Probably not.

What Does This Mean For You?

If you're a non-Android Java developer...don't lose sleep over this. The details are going to take months to play out, and regardless of the outcome you're probably not going to be affected. Be happy, do great things, and keep making the Java platform a better place.

If you're an Android developer...don't lose sleep over this. Even if things go the way of the "Nuclear Option", you've still got a lot of time to build and sell apps and improve yourself as a developer. For a bit of novelty, start considering what a migration path might look like and turn that into a nice Android-agnostic application layer, something that's largely lacking in the current Android APIs. Or explore Android development in languages like JRuby, which are based on off-platform ecosystems that will survive regardless of Android's fate. Whatever you do, don't panic and run for the hills, and don't tell your friends to panic.

If you're mad as hell about this...I sympathize. I'm personally going to do whatever I can to keep people informed and keep pushing Android, including but not limited to writing 8000-word essays with my moderately-educated analysis of the "facts". I welcome your help in that fight, and I think it's a damn good time for people that want an open Java and an open mobile platform to show their quality by standing up and letting the world know we're here.

"All that is necessary for the triumph of evil is for good men to do nothing."

Do something, and we'll get through this together.

Footnote: Java Copyrights

I'd love for someone versed in copyright law to provide a brief analysis of how the Java copyrights described (vaguely) in the lawsuit might play out in Android. Java is certainly not ignored as a concept in Android docs, tools, and libraries, but it's unclear to me whether those copyrights amount to something enforceable when it comes to Android or Dalvik.

Update: "Crazy" Bob Lee emailed me to clear up a few facts. First off, Android and OpenJDK first came out around roughly the same time, so there was never really time to consider using OpenJDK's GPL'ed class libraries in Android. Bob also claims that Dalvik's design decisions were all technical and not made to circumvent IP, but it seems impossible to me that IP, patent, and licensing issues didn't have *some* influence on those decisions. He goes on to say that Android relies on process separation to sandbox applications, rather than leveraging Java security policies (or similar mechanisms (which Bob insists are badly designed anyway, and I might agree). Finally, he believes that in the worst case scenario, Dalvik would probably only require minor modifications to address the complaints in this suit. The "nuclear option" is, according to Bob, out of the realm of possibility.

Thanks for the clarifications, Bob!

Monday, July 19, 2010

What JRuby C Extension Support Means to You

As part of the Ruby Summer of Code, Tim Felgentreff has been building out C extension support for JRuby. He's already made great progress, with simple libraries like Thin and Mongrel working now and larger libraries like RMagick and Yajl starting to function. And we haven't even reached the mid-term evaluation yet. I'd say he gets an "A" so far.

I figured it was time I talked a bit about C extensions, what they mean (or don't mean) for JRuby, and how you can help.

The Promise of C Extensions

One of the "last mile" features keeping people from migrating to JRuby has been their dependence on C extensions that only work on regular Ruby. In some cases, these extensions have been written to improve performance, like the various json libraries. Some of that performance could be less of a concern under Ruby 1.9, but it's hard to claim that any implementation will be able to run Ruby as fast as C for general-purpose libraries any time soon.

However, a large number of extensions – perhaps a majority of extensions – exist only to wrap a well-known and well-trusted C library. Nokogiri, for example, wraps the excellent libxml. RMagick wraps ImageMagick. For these cases, there's no alternative on regular Ruby...it's the C library or nothing (or in the case of Nokogiri, your alternatives are only slow and buggy pure-Ruby XML libraries).

For the performance case, C extensions on JRuby don't mean a whole lot. In most cases, it would be easier and just as performant to write that code in Java, and many pure-Ruby libraries perform well enough to reduce the need for native code. In addition, there are often libraries that already do what the perf-driven extensions were written for, and it's trivial to just call those libraries directly from Ruby code.

But the library case is a bit stickier. Nokogiri does have an FFI version, but it's a maintenance headache for them and a bug report headache for us, due to the lack of a C compiler tying the two halves together. There's a pure-Java Nokogiri in progress, but building both the Ruby bindings and emulating libxml behavior takes a long time to get right. For libraries like RMagick or the native MySQL and SQLite drivers, there are basically no options on the JVM. The Google Summer of Code project RMagick4J, by Sergio Arbeo, was a monumental effort that still has a lot of work left to be done. JDBC libraries work for databases, but they provide a very different interface from the native drivers and don't support things like UNIX domain sockets.

There's a very good chance that JRuby C extension support won't perform as well as C extensions on C Ruby, but in many cases that won't matter. Where there's no equivalent library now, having something that's only 5-10x slower to call – but still runs fast and matches API – may be just fine. Think about the coarse-grained operations you feed to a MySQL or SQLite and you get the picture.

So ultimately, I think C extensions will be a good thing for JRuby, even if they only serve as a stopgap measure to help people migrate small applications over to native Java equivalents. Why should the end goal be native Java equivalents, you ask?

The Peril of C Extensions

Now that we're done with the happy, glowing discussion of how great C extension support will be, I can make a confession: I hate C extensions. No feature of C Ruby has done more to hold it back than the desire for backward compatibility with C extensions. Because they have direct pointer access, there's no easy way to build a better garbage collector or easily support multiple runtimes in the same VM, even though various research efforts have tried. I've talked with Koichi Sasada, the creator of Ruby 1.9's "YARV" VM, and there's many things he would have liked to do with YARV that he couldn't because of C extension backward compatibility.

For JRuby, supporting C extensions will limit many features that make JRuby compelling in the first place. For example, because C extensions often use a lot of global variables, you can't use them from multiple JRuby runtimes in the same process. Because they expect a Ruby-like threading model, we need to restrict concurrency when calling out from Java to C. And all the great memory tooling I've blogged about recently won't see C extensions or the libraries they call, so it introduces an unknown.

All that said, I think it's a good milestone to show that we can support C extensions, and it may make for a "better JNI" for people who really just want to write C or who simply need to wrap a native library.

How You Can Help

There's a few things I think users like you can help with.

First off, we'd love to know what extensions you are using today, so we can explore what it would take to run them under JRuby (and so we can start exploring pure-Java alternatives, too.) Post your list in the comments, and we'll see what we can come up with.

Second, anyone that knows C and the Ruby C API (like folks who work on extensions) could help us fill out bits and pieces that are missing. Set up the JRuby cext branch (I'll show you how in a moment), and try to get your extensions to build and load. Tim has already done the heavy lifting of making "gem install xyz" attempt to build the extension and "require 'xyz'" try to load the resulting native library, so you can follow the usual processes (including extconf.rb/mkmf.rb for non-gem building and testing.) If it doesn't build ok, help us figure out what's missing or incorrect. If it builds but doesn't run, help us figure out what it's doing incorrectly.

Building JRuby with C Extension Support

Like building JRuby proper, building the cext work is probably the easiest thing you'll do all day (assuming the C compiler/build/toolchain doesn't bite you.

  1. Check out (or fork and check out) the JRuby repository from http://github.com/jruby/jruby:
    git clone git://github.com/jruby/jruby.git

  2. Switch to the "cext" branch:
    git checkout -b cext origin/cext

  3. Do a clean build of JRuby plus the cext subsystem:
    ant clean build-jruby-cext-native

At this point you should have a JRuby build (run with bin/jruby) that can gem install and load native extensions.