Friday, April 30, 2010

Building Ruboto: Precompiling Ruby for Android

I originally started to send this to the JRuby dev list and to the Ruboto list, but realized quickly that it might make a good blog post. Since I don't blog enough lately, here it is.

I've been looking into better ways to precompile Ruby code to classes for deployment on Android devices. Normally, JRuby users can just jar up their .rb files and load and require them as though they were on the filesystem; JRuby finds, loads, and runs them just fine. This works well enough on Android, but since there's no way to generate bytecode at runtime on the device, JRuby code that isn't precompiled must run interpreted forever...and run a bit slower than we'd like because of it. In order for Ruby to be a first-class language for Android development, we must make it possible to precompile Ruby code *completely* and bundle it up with the application. So this evening I spent some time making that possible.

I have some good news, some bad news, and some good news. First, a bit of background on JRuby's compiler.

What JRuby's Compiler Produces

JRuby's ahead-of-time compiler produces a single .class file per .rb file, to ease deployment and lookup of those files (and because I think it's ugly to always vomit out an unidentifiable class for every method body). This produces a nice 1:1 mapping between .rb and .class, but it comes at a cost: since those .class files are just "bags of methods" we need to bind those methods somehow. This usually happens at runtime, with JRuby generating a small "handle" class for every method as it is bound. So for a script like this:

# foo.rb
class Foo
  def bar; end
end

def hello; end

You will get one top-level class file when you AOT compile, and then two more class files are generated at runtime for the "handles" for methods "bar" and "hello". This provides the best possible performance for invocation, plus a nice 1:1 on-disk format...but it means we're still generating a lot of code at runtime.

The other complication is that jrubyc normally outputs a .class file of the same name as the .rb file, to ease lookup of that .class file at runtime. So the main .class for the above script would be called "foo.class". The problem with this is that "foo.rb" may not always be loaded as "foo.rb". A user might load '../yum/../foo.rb' or some other peculiar path. As a result, the base name of the file is not enough to determine what class name to load. To solve this, I've introduced an alternate naming scheme that uses the SHA1 hash of the *actual* content of the file as the class name. So, for the above script, the resulting class would be named:

ruby.jit.FILE_351347C9126659D4479558A2706DBC35E45D16D2

While this isn't a pretty name, it does provide a way to locate the compiled version of a script universally, regardless of what path is used to load it.
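As a rough illustration of the naming scheme, here is a minimal sketch in plain Ruby. The helper name sha1_class_name is hypothetical, and the sketch assumes the hash is taken over the raw bytes of the file; the exact input JRuby hashes may differ, so the names it prints are not guaranteed to match jrubyc's.

```ruby
require 'digest/sha1'

# Hypothetical helper: build a content-based class name in the style
# shown above. Assumption: the hash covers the raw file contents.
def sha1_class_name(source)
  "ruby.jit.FILE_" + Digest::SHA1.hexdigest(source).upcase
end

source = "class Foo\n  def bar; end\nend\n\ndef hello; end\n"

# The name depends only on the content, never on the path used to load it.
via_plain_path = sha1_class_name(source)   # as if loaded as 'foo.rb'
via_odd_path   = sha1_class_name(source)   # as if loaded as '../yum/../foo.rb'

puts via_plain_path
puts via_plain_path == via_odd_path                     # true
puts via_plain_path == sha1_class_name(source + "\n")   # false: content changed
```

The flip side of hashing the content is that any change to the file, even a trailing newline, yields a different class name.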

The Good News

I've modified jrubyc (on master only...we need to talk about whether this should be a late addition to 1.5) to have a new --sha1 flag. As you might guess, this flag alters the compile process to generate the sha1-named class for each compiled file.

~/projects/jruby ➔ jrubyc foo.rb 
Compiling foo.rb to class foo

~/projects/jruby ➔ jrubyc --sha1 foo.rb
Compiling foo.rb to class ruby.jit.FILE_351347C9126659D4479558A2706DBC35E45D16D2

~/projects/jruby ➔ jruby -X+C -J-Djruby.jit.debug=true -e "require 'foo'"
...
found jitted code for ./foo.rb at class: ruby.jit.FILE_351347C9126659D4479558A2706DBC35E45D16D2
...

This is actually finding the foo.rb file, calculating its SHA1 hash, and then loading the .class file instead. So if you had a bunch of .rb code for an Android application and wanted to precompile it, you'd run this command to get the sha1 classes, and then include both the .rb file and the .class file in your application (the .rb file must be there because...you guessed it...we need to calculate the sha1 hash from its contents).
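The lookup just described might be pictured like this. It is only a sketch: PRECOMPILED, class_name_for, and resolve are made-up names standing in for JRuby's real load logic, and the fallback to interpretation when no class matches is an assumption based on the behavior described earlier in this post.

```ruby
require 'digest/sha1'

# Hypothetical registry standing in for the classes bundled with the app,
# keyed by their SHA1-derived names.
PRECOMPILED = {}

def class_name_for(source)
  "ruby.jit.FILE_" + Digest::SHA1.hexdigest(source).upcase
end

# Decide how a script would run: precompiled if a class with the matching
# name exists, interpreted otherwise (assumed fallback).
def resolve(source)
  name = class_name_for(source)
  PRECOMPILED.key?(name) ? [:precompiled, name] : [:interpreted, nil]
end

shipped_source = "def hello; end\n"
PRECOMPILED[class_name_for(shipped_source)] = :stub_for_compiled_class

p resolve(shipped_source).first                  # :precompiled
p resolve(shipped_source + "# edited\n").first   # :interpreted
```

This also shows why the .rb file has to ship alongside the class: without its contents there is nothing to hash, and an edited .rb no longer matches the class compiled from the old version.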

To test this out, I actually ran jrubyc against the Ruby stdlib to produce a sha1 class for every .rb file:

~/projects/jruby ➔ jrubyc -t /tmp --sha1 lib/ruby/1.8/
Compiling all in '/Users/headius/projects/jruby/lib/ruby/1.8'...
Compiling lib/ruby/1.8//abbrev.rb to class ruby.jit.FILE_4F30363F88066CC74555ABA5BE4B73FDE323BE1A
Compiling lib/ruby/1.8//base64.rb to class ruby.jit.FILE_DD42170B797E34D082C952B92A19474E3FDF3FA2
Compiling lib/ruby/1.8//benchmark.rb to class ruby.jit.FILE_0C42EBD7F248AF396DE7A70C0FBC31E9E8D233DE
...
Compiling lib/ruby/1.8//xsd/xmlparser/rexmlparser.rb to class ruby.jit.FILE_8B106B9E9F2F1768470A7A4E6BD1A36FC0859862
Compiling lib/ruby/1.8//xsd/xmlparser/xmlparser.rb to class ruby.jit.FILE_AF51477EA5467822D8ADED37EEB5AB5D841E07D9
Compiling lib/ruby/1.8//xsd/xmlparser/xmlscanner.rb to class ruby.jit.FILE_3203482AEE794F4B9D5448BF51935879B026092C

This produces 524 class files for 524 .rb files, just as it should, and running with forced compilation (-X+C) and jruby.jit.debug=true shows that it finds each class when loading anything from stdlib. That's a good start!

What About the Handles?

I mentioned above that we also generate, at runtime, a small handle class for every bound method in a given script. And again, since we can't generate bytecode on-device, we need a way to pregenerate all those handles.

An hour's worth of work later, and jrubyc has a --handles flag that will additionally spit out all method handles for each script compiled. Here's our foo script compiled with --sha1 and --handles, along with the resulting .class files:

~/projects/jruby ➔ jrubyc --sha1 --handles foo.rb
Compiling foo.rb to class ruby.jit.FILE_351347C9126659D4479558A2706DBC35E45D16D2
Generating direct handles for foo.rb

~/projects/jruby ➔ ls ruby/jit/*351347*
ruby/jit/FILE_351347C9126659D4479558A2706DBC35E45D16D2.class

~/projects/jruby ➔ ls *351347*
ruby_jit_FILE_351347C9126659D4479558A2706DBC35E45D16D2Invokermethod__1$RUBY$barFixed0.class
ruby_jit_FILE_351347C9126659D4479558A2706DBC35E45D16D2Invokermethod__2$RUBY$helloFixed0.class

And sure enough, we can also see that these handles are being loaded instead of generated at runtime. So it's possible with these two options to *completely* precompile JRuby sources into .class files. Hooray!

The Bad News

My next step was obviously to try to precompile and dex the entire Ruby standard library. That's 524 files, but how many method bodies? We'd need to generate a handle for each one of them.

~/projects/jruby ➔ mkdir stdlib-compiled

~/projects/jruby ➔ jrubyc --sha1 --handles -t stdlib-compiled/ lib/ruby/1.8/
Compiling all in '/Users/headius/projects/jruby/lib/ruby/1.8'...
Compiling lib/ruby/1.8//abbrev.rb to class ruby.jit.FILE_4F30363F88066CC74555ABA5BE4B73FDE323BE1A
Generating direct handles for lib/ruby/1.8//abbrev.rb
Compiling lib/ruby/1.8//base64.rb to class ruby.jit.FILE_DD42170B797E34D082C952B92A19474E3FDF3FA2
Generating direct handles for lib/ruby/1.8//base64.rb
...
Compiling lib/ruby/1.8//xsd/xmlparser/xmlparser.rb to class ruby.jit.FILE_AF51477EA5467822D8ADED37EEB5AB5D841E07D9
Generating direct handles for lib/ruby/1.8//xsd/xmlparser/xmlparser.rb
Compiling lib/ruby/1.8//xsd/xmlparser/xmlscanner.rb to class ruby.jit.FILE_3203482AEE794F4B9D5448BF51935879B026092C
Generating direct handles for lib/ruby/1.8//xsd/xmlparser/xmlscanner.rb

~/projects/jruby ➔ find stdlib-compiled/ -name \*.class | wc -l
8212

Wowsers, that's a lot of method bodies...over 7500 of them. But of course this is the entire Ruby standard library, with code for network protocols, templating, xml parsing, soap, and so on. Now for the more frightening numbers: keeping in mind that .class is a pretty verbose file format, how big are all these class files?

~/projects/jruby ➔ du -ks stdlib-compiled/ruby
14008 stdlib-compiled/ruby

~/projects/jruby ➔ du -ks stdlib-compiled/
44784 stdlib-compiled/

Yeeow! The standard library alone (without handles) produces 14MB of .class files, and with handles it goes up to a whopping 44MB of .class files! That seems a bit high, doesn't it? Especially considering that the .rb files add up to around 4.5MB?

Well there's a few explanations for this. First off, the generated handles are rather small, around 2k each, but they each are probably 50% the exact same code. They're generated as separate handles primarily because the JVM will not inline the same loaded body of code through two different call paths, so we have to duplicate that logic repeatedly. Java 7 fixes some of this, but for now we're stuck. The handle classes also share almost identical constant pools, or in-file tables of strings. Many of the same characteristics apply to the compiled Ruby scripts, so the 44MB number is a bit larger than it needs to be.
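For anyone who wants to check the arithmetic, the numbers from the commands above work out as follows. The only inputs are the figures already shown; the variable names are just for illustration.

```ruby
total_classes  = 8212     # find stdlib-compiled/ -name \*.class | wc -l
script_classes = 524      # one class per .rb file
scripts_kb     = 14_008   # du -ks stdlib-compiled/ruby
total_kb       = 44_784   # du -ks stdlib-compiled/

handle_classes = total_classes - script_classes
handles_kb     = total_kb - scripts_kb
kb_per_handle  = handles_kb.to_f / handle_classes

puts handle_classes           # 7688
puts handles_kb               # 30776
puts kb_per_handle.round(2)   # 4.0
```

Note that du reports allocated disk space rather than file length. Roughly 4KB on disk per handle is consistent with files of around 2k each if the filesystem allocates in 4KB blocks, which is my guess here rather than something I measured.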

We can show a more realistic estimate of on-disk size by compressing the lot, first with normal "jar", and then with the "pack200" utility, which takes greater advantage of the redundancy within and across .class files:

~/projects/jruby ➔ cd stdlib-compiled/

~/projects/jruby/stdlib-compiled ➔ jar cf stdlib-compiled.jar .

~/projects/jruby/stdlib-compiled ➔ pack200 stdlib-compiled.pack.gz stdlib-compiled.jar

~/projects/jruby/stdlib-compiled ➔ ls -l stdlib-compiled.*
-rw-r--r-- 1 headius staff 13424221 Apr 30 01:43 stdlib-compiled.jar
-rw-r--r-- 1 headius staff 4051355 Apr 30 01:44 stdlib-compiled.pack.gz

Now we're seeing more reasonable numbers. A 13MB jar file is still pretty large, but it's not bad considering we started with 44MB of .class files. The packed size is even better: only 4MB for a completely-compiled Ruby standard library, and ultimately *smaller* than the original sources.

So what's the bad news? It obviously wasn't the size, since I just showed that was a red herring. The bad news is when we try to dex this jar.

The "dx" Tool

The Android SDK ships with a tool called "dx" which gets used at build time to translate Java bytecode (in .class files, .jar files, etc) to Dalvik bytecode (resulting in a .dex file). Along the way it optimizes the code, tidies up redundancies, and basically makes it as clean and compact as possible for distribution to an Android device. Once on the device, the Dalvik bytecode gets immediately compiled into whatever the native processor runs, so the dex data needs to be as clean and optimized as possible.

Every Java application shipped for Android must pass through dx in some way, so my next step was to try to "dex" the compiled standard library:

~/projects/jruby/out ➔ ../../android-sdk-mac_86/platforms/android-7/tools/dx --dex --verbose --positions=none --no-locals --output=stdlib-compiled.dex stdlib-compiled.jar 
processing archive stdlib-compiled.jar...
ignored resource META-INF/MANIFEST.MF
processing ruby/jit/FILE_003796EE1C0C24540DF7239B8197C183BC7017BB.class...
processing ruby/jit/FILE_00499F5FE29ED8EDB63965B0F65B19CFE994D120.class...
...
processing ruby_jit_FILE_FEF23DE8CDA5B9BD9D880CBC08D3249158379E58Invokermethod__5$RUBY$run_suiteFixed0.class...
processing ruby_jit_FILE_FEF23DE8CDA5B9BD9D880CBC08D3249158379E58Invokermethod__6$RUBY$create_resultFixed0.class...

trouble writing output: format == null

Uh-oh, that doesn't look good. What happened?

Well, it turns out that the Ruby standard library *plus* all the handles needed to bind it is too much for the current dex file format. It's a known issue that similarly bit Maciek Makowski (reporter of the linked bug) when he tried to dex both the Scala compiler and the Scala base set of libraries in one go. And similar to his case, I was able to successfully dex *either* the precompiled stdlib *or* the generated handles...but not both at the same time.

What Can We Do?

It appears that for the moment, it's not going to be possible to completely precompile the entire Ruby standard library. But there's ways around that.

First off, probably no application on the planet needs the entire standard library, so we can easily just include the files needed for a given app. That may be enough to cut the size down tremendously. It's also perfectly possible to build a very complicated Ruby application for Android that will easily fit into the current dex format; I doubt most mobile applications would result in 4.5MB of uncompressed .rb source. So the added --sha1 and --handles features will be immediately useful for Android development.
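One possible way to find out which files a given app actually needs (not something jrubyc does for you, just a sketch using Ruby's standard $LOADED_FEATURES global) is to snapshot the loaded features, run the app's requires on a desktop JRuby, and look at what was added. The 'optparse' require is only a stand-in for whatever the application really loads.

```ruby
# Snapshot what is already loaded before the app's own requires run.
before = $LOADED_FEATURES.dup

require 'optparse'   # stand-in for the application's real requires

# Everything loaded since the snapshot; only .rb sources are candidates
# for precompilation with jrubyc.
needed = ($LOADED_FEATURES - before).select { |f| f.end_with?('.rb') }

puts needed
```

Anything required lazily (inside a method that didn't run, say) won't show up, so a list produced this way is a starting point rather than a guarantee.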

Secondly, I've been planning on adding a different way to bind methods that doesn't require a class file per method. I would probably generate a large switch for each .rb file and then bind the methods numerically, so only a single additional class (and only a few methods in that class) would be needed to bind an entire compiled .rb script. This issue with dex will force me to finally do that.
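To make the switch idea a little more concrete, here is a toy model in plain Ruby. It is not how JRuby will implement it (that would be generated Java bytecode); CompiledFoo, BoundMethod, and the method names are all invented for the illustration.

```ruby
# Stand-in for one compiled .rb file: a "bag of methods" plus a single
# numeric dispatcher, instead of one handle class per method.
class CompiledFoo
  def method__1_bar(receiver);   :bar_result;   end
  def method__2_hello(receiver); :hello_result; end

  # The "large switch": one branch per method body in the script.
  def invoke(index, receiver, *args)
    case index
    when 1 then method__1_bar(receiver, *args)
    when 2 then method__2_hello(receiver, *args)
    else raise ArgumentError, "no method body with index #{index}"
    end
  end
end

# A bound method is now just (script, index); no per-method class needed.
BoundMethod = Struct.new(:script, :index) do
  def call(receiver, *args)
    script.invoke(index, receiver, *args)
  end
end

script = CompiledFoo.new
table  = { "bar"   => BoundMethod.new(script, 1),
           "hello" => BoundMethod.new(script, 2) }

p table["bar"].call(nil)     # :bar_result
p table["hello"].call(nil)   # :hello_result
```

The tradeoff is the one mentioned earlier: every call funnels through the same dispatch code, which is exactly the shape the JVM has trouble inlining, in exchange for far fewer classes to dex.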

And lastly, there's a bit more good news. Remember that the packed size of the entire standard library plus handles was around 4MB? Here's the sizes of the dex'ed standard library and handles:

~/projects/jruby ➔ ls -l *.dex
-rw-r--r-- 1 headius staff 3718340 Apr 30 00:57 stdlib-compiled-solo.dex
-rw-r--r-- 1 headius staff 8656300 Apr 30 00:52 stdlib-compiled-handles.dex

~/projects/jruby ➔ jar cf blah.apk *.dex

~/projects/jruby ➔ ls -l blah.apk
-rw-r--r-- 1 headius staff 3179625 Apr 30 02:01 blah.apk

Once dex has worked its magic against our sources, we're now down to 3.1MB of compressed code...a pretty good size for the entire Ruby standard library plus 7500+ noisy, repetitive handles. We're definitely within reach of making full Ruby development for Android a reality.

11 comments:

John Woodell said...

Wow, this is so exciting. I'd love to see this roll into 1.5.

Jonas Elfström said...

Do you have to distribute the stdlib with every app or could it be reused?

Anonymous said...

Yeah, that's a nice question. That was the first thing i asked myself as well: Can u reuse the stdlib ?

Chris Thiel said...

This is awesome. Keep up the good work, sir.

Gary S. Weaver said...

Being able to write Android apps in JRuby is nothing short of frickin' awesome. Thanks so much for your work on this!

danfuzz said...

Hi! I'm the tech lead for Dalvik within the Android project.

First of all, I'm pleased as punch that you're doing this work, and I am hoping that we can tweak the dex format to make this easier for you (and other similar efforts) in the not-too-distant future.

In the mean time, there are workarounds other than what you mentioned. I updated the bug to clarify, but for your convenience here's a recap: The dalvik.system package contains classes that can be used to create ClassLoader instances from arbitrary dex files. Though Android doesn't come out-of-the-box with a fully baked way to use more than one dex file in an apk, all the right underlying facilities are there for you to code up a solution that fits your needs.

I would love to hear your feedback in terms of experiences trying this all out, and really in terms of the whole effort (not just the above suggestion). The Dalvik team takes a customer-focused attitude, and though we can't promise immediate turnaround on any given issue, the more feedback we get, the better chance we have of making good forward progress.

Cheers.

Scott M said...

Awesome work! For Ruboto IRB I'm sure that a whole chunk of files from stdlib could be left as rbs (pulled in if needed). That might allow things through dx.

For the classes that do get compiled, would it make sense to store the sha1 values in a file to avoid the file system overhead of the rbs?

Dan, it's great to hear your thoughts. I experimented with classes in dalvik.system with some success (playing with this for the Ruboto IRB project). I think this might be the way to go if we create a unique LoadService to handle dex files. I was hoping to figure out a way to use stock JRuby and normal ruby scripts (i.e., using a simple require).

booOOOoooOoooO said...

Amazing work!! I've been trying to optimize jruby under the Android Scripting Environment project as it is currently extremely slow.

I hope to see some of this in Jruby 1.5, cause that would make a world of difference in trying to win more Jruby supporters in the ASE project.

Also, if any of this could apply to Beanshell or vice versa, that could greatly reduce their footprints.

carmen said...

im just going to quote:
I think much of technology in the past twenty years has been about make-work

lets just not use libc, and while we're at it, lets not use JVM bytecodes. itll be cool, you can write absurd 8 page blogposts about getting thigns to work!

Charles Oliver Nutter said...

John Woodell: It is in! Both flags will be available in RC3.

Jonas, Anonymous: It could potentially be reused, like if there were a standard "Ruboto" app that every Ruby-based Android app depended on. We're trying to work out the details of how to access it from other apps.

danfuzz: Thanks for that! That may be a big help for this. I'm hoping to concentrate more effort on making JRuby compact, fast, and full-featured for Android after JRuby 1.5 comes out (next weekish). I'm sure we can get it a lot closer to Java perf for mundane tasks, and this work is a step toward that.

Scott: It didn't land for 1.5, but I'd also like to add an option to emit .rb files containing just the SHA1 hash, so that you don't have to ship them. The loading process would then check if it's a "SHA1-only" file and use that, otherwise calculating SHA1 on the fly. It will come soon.

boooo: Maybe you guys can get some information on why it's "extremely slow"? I think there are aspects of Dalvik that make our interpreter a lot slower than running on Hotspot, where it actually performs reasonably well (at least as good as Ruby 1.8's C-based interpreter). Precompiling will certainly help as well, but I need a better picture of what Dalvik does and does not optimize and how to leverage that optimization better.

carmen: I do not understand what you're trying to say.

booOOOoooOoooO said...

@Charles Oliver Nutter:
There are several things slowing Jruby down in ASE. Right now, Jruby has to go through an extra layer of interpretation to get the Dex bytecode (note: this may go away with your precompiling). Also, Dalvik is trying to load a bunch of the sun.java stuff left in the jar file. This is compared to Python actually being a cross compiled C version, running natively in its own process. However, interpreted languages running in native C have huge limitations running outside of ASE. They can't call Android intents directly.

Jruby can be a complete API bridge, running in its own process. I've seen a Beanshell APK that had direct access to android. This may not be considered "safe", but it's not for end users. It's for truly mobile devs. Your precompiling tradeoff could speed things up on the same order of rainbow tables and rainbowcrack.

Hope that helps. Keep up the great work!!