Monday, March 01, 2010

JRuby Startup Time Tips

JRuby has become notorious among Ruby implementations for having a slow startup time. Some of this is the JVM's fault, since it doesn't save JIT products and often just takes a long time to boot and get going. Some of this is JRuby's fault, though we work very hard to eliminate startup bottlenecks once they're reported and diagnosed. But a large number of startup time problems stem from simple configuration issues. Here's a few tips to help you improve JRuby's startup time.

(Note: JRuby actually does pretty well on base startup time compared to other JVM languages; but MRI, while generally not the fastest Ruby impl, starts up incredibly fast.)

Make sure you're running the client JVM

This is by far the easiest way to improve startup. On HotSpot, the JVM behind OpenJDK and Sun's JDK, there are two different JIT-compiling backends: "client" and "server". The "client" backend does only superficial optimizations to JVM code as it compiles, but compiles things earlier and more quickly. The "server" backend performs larger-scale, more-global optimizations as it compiles, but takes a lot longer to get there and uses more resources when it compiles.

Up until recently, most JVM installs preferred the "client" backend, which meant JVM startup was about as fast as it could get. Unfortunately, many newer JVM releases on many operating systems are either defaulting to "server" or defaulting to a 64-bit JVM (which only has "server"). This means that without you or JRuby doing anything wrong, startup time takes a major hit.

Here's an example session showing a typical jruby -v line for a "server" JVM and the speed of doing some RubyGems requires (which is where most time is spent booting a RubyGems-heavy application):
~/projects/jruby ➔ jruby -v
jruby (ruby 1.8.7 patchlevel 174) (2010-03-01 6857a4e) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_17) [x86_64-java]

~/projects/jruby ➔ time jruby -e "require 'rubygems'; require 'active_support'"

real 0m5.174s
user 0m7.643s
sys 0m0.422s

~/projects/jruby ➔ time jruby -e "require 'rubygems'; require 'active_support'"

real 0m5.068s
user 0m7.662s
sys 0m0.449s
Ouch, over 5 seconds just to boot RubyGems and load ActiveSupport. You can see in this case that the JVM is preferring the "64-Bit Server VM", which will give you great runtime performance but pretty dismal startup time. If you see this sort of -v output, you can usually force the JVM to run in 32-bit "client" mode by specifying "-d32" as a JVM option. Here's an example using an environment variable:
~/projects/jruby ➔ export JAVA_OPTS="-d32"

~/projects/jruby ➔ time jruby -e "require 'rubygems'; require 'active_support'"

real 0m2.320s
user 0m2.583s
sys 0m0.207s

~/projects/jruby ➔ time jruby -e "require 'rubygems'; require 'active_support'"

real 0m2.275s
user 0m2.580s
sys 0m0.207s
On my system, a MacBook Pro running OS X 10.6, switching to the "client" VM improves this scenario by well over 50%. You may also want to try the "-client" option alone or in combination with -d32, since some 32-bit systems will still default to "server". Play around with it and see what works for you, and then let us know your platform, -v string, and how much improvement you see.

Regenerate the JVM's shared archive

Starting with Java 5, the HotSpot JVM has included a feature known as Class Data Sharing (CDS). Originally created by Apple for their OS X version of HotSpot, this feature loads all the common JDK classes as a single archive into a shared memory location. Subsequent JVM startups then simply reuse this read-only shared memory rather than reloading the same data again. It's a large reason why startup times on Windows and OS X have been so much better in recent years, and users of those systems may be able to ignore this tip.

On Linux, however, the shared archive is often *never* generated, since installers mostly just unpack the JVM into its final location and never run it. In order to force your system to generate the shared archive, run some equivalent of this command line:
headius@headius-desktop:~/jruby$ sudo java -Xshare:dump
Loading classes to share ... done.
Rewriting and unlinking classes ... done.
Calculating hash values for String objects .. done.
Calculating fingerprints ... done.
Removing unshareable information ... done.
Moving pre-ordered read-only objects to shared space at 0x94030000 ... done.
Moving read-only objects to shared space at 0x9444bef8 ... done.
Moving common symbols to shared space at 0x9444d870 ... done.
Moving remaining symbols to shared space at 0x945154e0 ... done.
Moving string char arrays to shared space at 0x94516118 ... done.
Moving additional symbols to shared space at 0x945ac5b0 ... done.
Read-only space ends at 0x94612560, 6169952 bytes.
Moving pre-ordered read-write objects to shared space at 0x94830000 ... done.
Moving read-write objects to shared space at 0x94ea64a0 ... done.
Moving String objects to shared space at 0x94ee3708 ... done.
Read-write space ends at 0x94f27448, 7304264 bytes.
Updating references to shared objects ... done.
For many users, this can make a tremendous difference in startup time, since all that extra class data remains available in memory. You may need to specify the -d32 or -client options along with the -Xshare option. Play with those and the other -Xshare modes to see if it helps your situation.

Delay or disable JRuby's JIT

An interesting side effect of JRuby's JIT is that it sometimes actually slows execution for really short runs. The compiler isn't free, obviously, nor is the cost of loading, verifying, and linking the resulting JVM bytecode. If you have a very short command that touches a lot of code, you might want to try disabling or delaying the JIT.

Disabling is easy: pass the -X-C flag or set the jruby.compile.mode property to "OFF":
~/projects/jruby ➔ time jruby -S gem install rake
Successfully installed rake-0.8.7
1 gem installed
Installing ri documentation for rake-0.8.7...
Installing RDoc documentation for rake-0.8.7...

real 0m13.188s
user 0m12.342s
sys 0m0.685s

~/projects/jruby ➔ time jruby -X-C -S gem install rake
Successfully installed rake-0.8.7
1 gem installed
Installing ri documentation for rake-0.8.7...
Installing RDoc documentation for rake-0.8.7...

real 0m12.590s
user 0m12.342s
sys 0m0.583s
This doesn't generally give you a huge boost, but it can be enough to keep you from going mad.

Avoid spawning "sub-rubies"

It's a fairly common idiom for Rubyists to spawn a Ruby subprocess using Kernel#system, Kernel#exec, or backquotes. For example, you may want to prepare a clean environment for a test run. That sort of scenario is perfectly understandable, but spawning many sub-rubies can take a tremendous toll on overall runtime.

When JRuby sees a #system, #exec, or backquote starting with "ruby", we will attempt to run it in the same JVM using a new JRuby instance. Because we have always supported "multi-VM" execution (where multiple isolated Ruby environments share a single process), this can make spawning sub-Rubies considerably faster. This is, in fact, how JRuby's Nailgun support (more on that later) keeps a single JVM "clean" for multiple JRuby command executions. But even though this can improve performance, there's still a cost for starting up those JRuby instances, since they need to have fresh, clean core classes and a clean runtime.

The worst-case scenario is when we detect that we can't spin up a JRuby instance in the same process, such as if you have shell redirection characters in the command line (e.g. system 'ruby -e blah > /dev/null'). In those cases, we have no choice but to launch an entirely new JRuby process, complete with a new JVM, and you'll be paying the full zero-to-running cost.

If you're able, try to limit how often you spawn "sub-rubies" or use tools like Nailgun or spec-server to reuse a given process for multiple hits.

Do less at startup

This is a difficult tip to follow, since often it's not your code doing so much at startup (and usually it's RubyGems itself). One of the sad truths of JRuby is that because we're based on the JVM, and the JVM takes a while to warm up, code executed early in a process runs a lot slower than code executed later. Add to this the fact that JRuby doesn't JIT Ruby code into JVM bytecode until it's been executed a few times, and you can see why cold performance is not one of JRuby's strong areas.

It may seem like delaying the inevitable, but doing less at startup can have surprisingly good results for your application. If you are able to eliminate most of the heavy processing until an application window starts up or a server starts listening, you may avoid (or spread out) the cold performance hit. Smart use of on-disk caches and better boot-time algorithms can help a lot, like saving a cache of mostly-read-only data rather than reloading and reprocessing it on every boot.

Try using Nailgun

In JRuby 1.3, we officially shipped support for Nailgun. Nailgun is a small library and client-side tool that reuses a single JVM for multiple invocations. With Nailgun, small JRuby command-line invocations can be orders of magnitude faster. Have a look at my article on Nailgun in JRuby 1.3.0 for details.

Nailgun seems like a magic bullet, but unfortunately it does little to help certain common cases like booting RubyGems or starting up Rails (such as when running tests). It also can't help cases where you are causing lots of sub-rubies to be launched. Your best bet is to give it a try and let us know if it helps.

Play around with JVM flags

There's scads of JVM flags that can improve (or degrade) startup time. For a good list of flags you can play with, see my article on my favorite JVM flags, paying special attention to the heap-sizing and GC flags. If you find combinations that really help your application start up faster, let us know!

Help us find bottlenecks

The biggest advances in startup-time performance have come from users like you investigating the load process to see where all that time is going. If you do a little poking around and find that particular libraries take unreasonably long to start (or just do too much at startup) or if you find that startup time seems to be limited by something other than CPU (like if your hard drive starts thrashing madly or your memory bus is being saturated) there may be improvements possible in JRuby or in the libraries and code you're loading. Dig a may be surprised what you find.

Here's a few JRuby flags that might help you investigate:
  • --sample turns on the JVM's sampling profiler. It's not super accurate, but if there's some egregious bottleneck it should rise to the top.
  • -J-Xrunhprof:cpu=times turns on the JVM's instrumented profiler, saving profile results to java.hprof.txt. This slows down execution tremendously, but can give you more accurate low-level timings for JRuby and JDK code.
  • -J-Djruby.debug.loadService.timing=true turns on timing of all requires, showing fairly accurately where boot-time load costs are heaviest.
  • On Windows, where you may not have a "time" command, pass -b to JRuby (as in 'jruby -b ...') to print out a timing of your command's runtime execution (excluding JVM boot time).

Let us help you!

Sometimes, there's an obvious misconfiguration or bug in JRuby that causes an application to start up unreasonably slowly. If startup time seems drastically different than what you've seen here, there's a good chance something's wrong with your setup. Post your situation to the JRuby mailing list or find us on IRC, and we (along with other JRuby users) will do our best to help you.


Toby said...

Good and informative post!

One thing that I'm curious about is the -X-C flag / jruby.compile.mode - does that disable any and all bytecode generation in JRuby? I'm interested because I've used both JRuby and Jython in a context where thousands of little scripts are run in succession, and started seeing some strange behavior with the JVM not properly collecting all the generated bytecode. Symptoms were used up permgen and VM frezes. I've tried messing with some JVM parameters (e.g. -XX:+CMSClassUnlaodingEnabled), but didn't have much luck. I haven't investigated further due to time pressure and finding a workaround, but I would be happy to ditch my workaround if the aforementioned flag gets around this issue.

Charles Oliver Nutter said...

Toby: Yes, it turns off all bytecode generation within JRuby itself (obviously other libraries may generate bytecode). If you're seeing permgen filling up, however, that could very likely a bug we would want to fix (or may have already fixed for 1.5), so you should report it and work with us to diagnose the problem. We go to a great deal of effort to prevent generated bytecode from clogging up the JVM.

awksedgreep said...

Perfect timing on this post. I'm putting together a JRuby specific gem(SNMP4JR) and a more or less JRuby specific framework(CommonThread) and rake, gem, generate, etc are all buggin the crap out of me. I find myself installing the JRuby gems, etc in MRI just so I'm not sitting and waiting for a prompt 50% of the time.

Big big help, thanks.

awksedgreep said...

Great startup time improvement on 1.5RC1.

Thanks very much for this.

mark-cotners-macbook-pro:opt markcotner$ time jruby -e "require 'rubygems';require 'active_support'"

real 0m2.947s
user 0m3.023s
sys 0m0.270s
mark-cotners-macbook-pro:opt markcotner$ time jruby-1.4.0/bin/jruby -e "require 'rubygems';require 'active_support'"

real 0m5.086s
user 0m3.584s
sys 0m0.344s

Charles Oliver Nutter said...

awksedgreep: Great improvement! Thanks for posting it!