Wednesday, June 16, 2010

My Short List of Key Missing JVM Features

I mused today on Twitter that there's just a few small things that the JVM/JDK need to become a truly awesome platform for all sorts of development. Since so many people asked for more details, I'm posting a quick list here. There's obviously other things, but these are the ones on my mind today.


Cold Performance

Current JVMs start up pretty fast, and there's changes coming in Hotspot in Java 7 that will make them even better. Usually this comes from combinations of pre-verifying bytecode (or providing verification hints), sharing class data across processes, and run-of-the-mill tweaks to make the loading and linking processes more efficient. But for many apps, this doesn't do anything to solve the biggest startup hit of all: cold execution performance. Because the JVM doesn't save off the jitted products of each run, it must start "cold" every time, running everything in the bytecode interpreter until it gets hot enough to compile. Even on JVMs that don't have an interpreter, the initial cost of compiling everything to not-particularly-optimized assembly also causes a major startup hit (try running command-line stuff on JRockit or J9).

There's a few things people have suggested, and they're all hard:
  • Tiered compilation - compile earlier using the fastest-possible, least-optimizing compiler, but decorate the compiled code with appropriate profiling logic to do a better job later. Hotspot in Java 7 may ship a tiered compiler, but there have been some resource setbacks that delayed its development.
  • Save off compilation or optimization artifacts - this is theoretically possible, but the deeper you go the harder it is to save it. Usually the in-memory results of optimization and compilation depend on the layout of things...in memory. Saving them to disk means you need to scrub out anything that might be different in a new process like memory addresses and class identities. But .NET can do this, though it largely *just* does static compilation. Happy medium?
  • Keep a JVM process running and toss it new work. We do this in JRuby with the Nailgun library, but it has some problems. First off, it can leave various aspects of the JVM in a dirty state, like system properties and memory footprint. Second, it can't kill off rogue threads that don't terminate, so they can collect over time. And third...it's not actually running at the console, so a lot of console things you'd do normally don't work.
This is probably the biggest unsolvable problem for JRuby right now, and the one we most often have to apologize for. JRuby is fast...at times, very fast...and getting faster every day. But not during the first 5 seconds, and so everyone gets the same bad impression.

Better Console/Terminal Support

There's endless blogs out there complaining about how the standard IO streams you get from the JVM are crippled in various ways. You can't select on them, for example, which is the source of a few unfixable bugs in JRuby. You can't pass them along to subprocesses, which is perhaps more a failing of the process-launching APIs in the JDK than standard IO itself. There's no direct terminal support in any of Java's APIs, so people end up yanking in libraries like jline just to support line editing. If the JDK shipped with some nice terminal and process APIs, a lot of the hassles developers have writing command-line tools in Java would melt away.

There's some light at the end of the tunnel. NIO2, scheduled to be part of Java 7, will bring better process launching APIs (with inherited standard IO streams, if you desire), a broader range of selectable channels, and much more. Hopefully it will be enough.

Fix the Busted APIs

JDBC is broken. Why? Because you have to register your driver in a global hard-referencing hash, and have to unregister it from the same classloader or it will leak. That means that if you're loading JDBC drivers from within a webapp or EE application, *your entire application remains in memory* because the driver references it and that map references the driver. This is the primary reason why most Java web application servers leak memory on undeploy, and it's another "unfixable" from JRuby's perspective.

Object serialization is broken. Why? Because it plays all sorts of tricks to get your classloader, reflectively access fields (if you're going to reflectively access them anyway, why not just break encapsulation if security allows it), and construct object instances without allowing you the opportunity to initialize them appropriately yourself. You have to provide no-arg constructors, have to un-final fields so they can be set up outside of construction, and heaven forbid you use default serialization: it's dead slow.

Reflection is too slow and there's no way around it. Not only do you end up calling through many extra levels of logic for reflective invocation, you have to box your argument lists, box your numerics, and wrap everything in exception-handling. And it doesn't have to be this way. The invokedynamic work brings along with it method handles, which are fast, direct pointers to methods. This should have been added long ago, but thankfully it's on the way in Java 7. Until then, projects like JRuby will have to continue eating the cost of reflection...or generate method handles by hand. We do both.

Regular expressions are broken. Why? Because simple alternations can blow the Java stack when fed especially large input. The current Sun-created regex implementation recurses for things like alternation, making it easy for it to fail to match on large input. The problem is so bad that we've actually switched regular expression engines in JRuby *four times*, including two implementations we wrote ourselves. Nobody can say we haven't bled for our users.

And there's numerous other examples. Some are relics of Java 1.0 that never got corrected (because old APIs don't die, they just get deprecated...or ignored). Some are relics of the idea that gigantic monolithic servers hosting dozens of apps (and leaking memory when they undeploy, or else contending for basic resources that separate processes would not) are a good idea, when in actuality running multiple JVMs that each only host one or a few apps works far better. Making a real effort to smooth these bad APIs would go a long way.

Better Support for Native Libraries and POSIX Features

As of Java 6, there's still no support for working with symlinks and only limited support for setting file permissions. Process launching is absolutely terrible. You can't select on all channels...only on sockets. If you want to use a native library, you have to write JNI code to do it, even though there are libraries like JNA and JFFI in the wild that do an outstanding job of dynamically loading and binding those libraries.

Missing the POSIXy features is basically inexcusable today. Most of the system-level APIs in the JDK are still based on a lowest common denominator somewhere near Windows 95, even though all modern operating systems provide at least a substantial subset of those APIs. NIO2 will bring many improvements, but it's almost certain that some parts of POSIX won't be exposed, either because there's not enough resources to spec out the Java APIs for them or because they end up being too system-specific.

As for loading native libraries...this is again something that should have been rolled into the JDK a long time ago. Many people will cry foul..."pure Java!" they'll shout...and I agree. But there are times when some functionality simply doesn't exist in a Java library, or doesn't scale well as an out-of-process call. For these times, you just have to use the native library...and the barrier to entry for doing that on the Java platform is just too high. Rolle JNA or JFFI or something similar into the JDK, so we grown-ups can choose when we want to use native code.

The Punchline

The punchline is that for most of these things, we've solved or worked around them in JRuby...at *great expense*. I'd go so far as to say we've done more to work around the JVM and the JDK's lackings than any other project. We've gone out of our way to improve startup by any means possible. We ship beautiful, solidly-performing libraries for binding native libs without writing a line of C code. We generate a large amount of code at compile and runtime to avoid using reflection as much. We maintain and ship native POSIX layers for a dozen platforms. We've (i.e. one of our champions, Marcin Mielzynski) implemented our own regular expression engine (i.e. a port of Oniguruma). We've pulled every trick in the book to get process launching to work nicely and behave like Ruby users expect. And so on and so forth.

But it's not sustainable. We can't continue to patch around the JVM and JDK forever, even if we've done a great job so far. Hopefully this will serve as a wake-up call for JVM and JDK implementers around the world: If you don't want the Java platform to be a server-only, large-app-only, long-running-only, headless-only world...it's time to fix these things. I'm standing by to help coordinate those efforts :)

Tuesday, June 01, 2010

Restful Services in Ruby using JRuby and Jersey

There's lots of ways to present RESTful web services these days, and REST has obviously become the new "it's IPC no it's not" hotness. And of course Rubyists have been helping to lead the way, building restfulness into just about everything they write. Rails itself is built around REST, with most controllers doubling as RESTful interfaces, and it even provides extra tools to help you transparently make RESTful calls from your application. If you're doing RESTful services for a typical Ruby application, Rails is the way to go (and even if you're not using Ruby in general...you should be considering JRuby + Rails).

In the Java world the options aren't quite as clear, but one API has at least attempted to standardize the idea of doing RESTful services on the Java platform: JSR-311, otherwise known as JAX-RS.

JAX-RS in theory makes it easy for you to simply mark up a piece of Java code with annotations and have it automatically be presented as a RESTful service. Of the available options, it may be the simplest, quickest way to get a Java-based service published and running.

So I figured I'd try to use it from JRuby, and when doing it in Ruby it's actually surprisingly clean, even compared to Ruby options.

The Service

I followed the Jersey Getting Started tutorial using Ruby for everything (and not using Maven in this case).

My version of their HelloWorldResource looks like this in Ruby:

require 'java'java_import 'javax.ws.rs.Path'
java_import 'javax.ws.rs.GET'
java_import 'javax.ws.rs.Produces'

java_package 'com.headius.demo.jersey'
java_annotation 'Path("/helloworld")'
class HelloWorld
java_annotation 'GET'
java_annotation 'Produces("text/plain")'
def cliched_message
"Hello World"
end
end
Notice that we're using the new features in JRuby 1.5 for producing "real" Java classes: java_package to specify a target package for the Java class, java_annotation to specify class and method annotations.

We compile it using jrubyc from JRuby 1.5 like this:
~/projects/jruby ➔ jrubyc -c ../jersey-archive-1.2/lib/jsr311-api-1.1.1.jar --javac restful_service.rb
Generating Java class HelloWorld to /Users/headius/projects/jruby/com/headius/demo/jersey/HelloWorld.java
javac -d /Users/headius/projects/jruby -cp /Users/headius/projects/jruby/lib/jruby.jar:../jersey-archive-1.2/lib/jsr311-api-1.1.1.jar /Users/headius/projects/jruby/com/headius/demo/jersey/HelloWorld.java
The new --java(c) flags in jrubyc examine your source for any classes, spitting out .java source for each one in turn. Along the way, if you have marked it up with signatures, annotations, imports, and so on, it will emit those into the Java source as well. If you specify --javac (as opposed to --java), it will also compile the resulting sources for you with jruby and your user-specified jars in the classpath, as I've done here.

The result looks like this to Java:
~/projects/jruby ➔ javap com.headius.demo.jersey.HelloWorld
Compiled from "HelloWorld.java"
public class com.headius.demo.jersey.HelloWorld extends org.jruby.RubyObject{
public static org.jruby.runtime.builtin.IRubyObject __allocate__(org.jruby.Ruby, org.jruby.RubyClass);
public com.headius.demo.jersey.HelloWorld();
public java.lang.Object cliched_message();
static {};
}
Under the covers, this class will load in the source of our restful_service.rb file and wire up all the Ruby and Java pieces so that both sides see HelloWorld as the in-memory representation of the Ruby HelloWorld class. Method calls are dispatched to the Ruby code, constructors dispatch to initialize, and so on. It's truly living in both worlds.

With the service in hand, we now need a server script to start it up.

The Server Script

Continuing with the tutorial, I've taken their simple Java-based server script and ported it directly to Ruby:
require 'java'
java_import com.sun.jersey.api.container.grizzly.GrizzlyWebContainerFactory

base_uri = "http://localhost:9998/"
init_params = {
"com.sun.jersey.config.property.packages" => "com.headius.demo.jersey"
}

puts "Starting grizzly"
thread_selector = GrizzlyWebContainerFactory.create(
base_uri, init_params.to_java)

puts <<EOS
Jersey app started with WADL available at #{base_uri}application.wadl
Try out #{base_uri}helloworld
Hit enter to stop it...
EOS

gets

thread_selector.stop_endpoint

exit(0)
It's somewhat cleaner and shorter, but it wasn't particularly large to begin with. At any rate, it shows how simple it is to launch a Grizzly server and how nice annotation-based APIs can be for auto-configuring our JAX-RS service.

The CLASSPATH

Ahh CLASSPATH. You are so maligned when all you hope to do is make it explicit where libraries are coming from. The world should learn from your successes and your failures.

There's five jars required for this Jersey example to run. I've tossed them into my CLASSPATH env var, but you're free to do it however you like
jersey-core-1.2.jar
jersey-server-1.2.jar
jsr311-api-1.1.1.jar
asm-3.1.jar
grizzly-servlet-webserver-1.9.9.jar
The first four are available in the jersey-archive download, and you can fetch the Grizzly jar from Maven or other places.

Testing It Out

The lovely bit about this is that it's essentially a four-step process to do the entire thing: write and compile the service, write the server script, set up CLASSPATH, and run the server. Here's the server output, finding my HelloWorld service right where it should:
~/projects/jruby ➔ jruby main.rb
Starting grizzly
Jersey app started with WADL available at http://localhost:9998/application.wadl
Try out http://localhost:9998/helloworld
Hit enter to stop it...
Jun 1, 2010 6:36:55 PM com.sun.jersey.api.core.PackagesResourceConfig init
INFO: Scanning for root resource and provider classes in the packages:
com.headius.demo.jersey
Jun 1, 2010 6:36:55 PM com.sun.jersey.api.core.ScanningResourceConfig logClasses
INFO: Root resource classes found:
class com.headius.demo.jersey.HelloWorld
Jun 1, 2010 6:36:55 PM com.sun.jersey.api.core.ScanningResourceConfig init
INFO: No provider classes found.
Jun 1, 2010 6:36:55 PM com.sun.jersey.server.impl.application.WebApplicationImpl initiate
INFO: Initiating Jersey application, version 'Jersey: 1.2 05/07/2010 02:04 PM'
And perhaps the most anti-climactic climax ever, curling our newly-deployed service:
~ ➔ curl http://localhost:9998/helloworld
Hello World
It works!

What We've Learned

This was a simple example, but we demonstrated several things:
  • JRuby's new jrubyc --java(c) support for generating "real" Java classes
  • Tagging a Ruby class with Java annotations, so it can be seen by a Java framework
  • Booting a Grizzly server from Ruby code
  • Implementing a JAX-RS service with Jersey in Ruby code
Do you have any examples of other nice annotation-based APIs we could test out with JRuby?