Monday, July 19, 2010

What JRuby C Extension Support Means to You

As part of the Ruby Summer of Code, Tim Felgentreff has been building out C extension support for JRuby. He's already made great progress, with simple libraries like Thin and Mongrel working now and larger libraries like RMagick and Yajl starting to function. And we haven't even reached the mid-term evaluation yet. I'd say he gets an "A" so far.

I figured it was time I talked a bit about C extensions, what they mean (or don't mean) for JRuby, and how you can help.

The Promise of C Extensions

One of the "last mile" features keeping people from migrating to JRuby has been their dependence on C extensions that only work on regular Ruby. In some cases, these extensions have been written to improve performance, like the various json libraries. Some of that performance could be less of a concern under Ruby 1.9, but it's hard to claim that any implementation will be able to run Ruby as fast as C for general-purpose libraries any time soon.

However, a large number of extensions, perhaps a majority, exist only to wrap a well-known and well-trusted C library. Nokogiri, for example, wraps the excellent libxml2. RMagick wraps ImageMagick. For these cases, there's no alternative on regular Ruby: it's the C library or nothing (or in the case of Nokogiri, your alternatives are only slow and buggy pure-Ruby XML libraries).

For the performance case, C extensions on JRuby don't mean a whole lot. In most cases, it would be easier and just as performant to write that code in Java, and many pure-Ruby libraries perform well enough to reduce the need for native code. In addition, there are often libraries that already do what the perf-driven extensions were written for, and it's trivial to just call those libraries directly from Ruby code.
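To make "call those libraries directly from Ruby code" concrete, here is a minimal sketch. It assumes a hashing task as a stand-in for any performance-sensitive routine, and `sha256_hex` is a hypothetical helper name. On JRuby it calls the JDK's `java.security.MessageDigest` with no extension involved; on other Rubies it falls back to the bundled Digest library so the example still runs.

```ruby
require 'digest'

# Hex SHA-256 digest of a string (hypothetical helper for illustration).
# On JRuby this calls the JDK class directly; elsewhere it falls back
# to Ruby's bundled Digest library.
def sha256_hex(data)
  if RUBY_PLATFORM == 'java'
    require 'java'
    md = java.security.MessageDigest.getInstance('SHA-256')
    # Convert the Ruby string to a Java byte[] and the result back again.
    String.from_java_bytes(md.digest(data.to_java_bytes)).unpack('H*').first
  else
    Digest::SHA256.hexdigest(data)
  end
end

puts sha256_hex('hello')
```

The JRuby branch needs no compiler and no build step, which is the point being made for the performance case.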

But the library case is a bit stickier. Nokogiri does have an FFI version, but it's a maintenance headache for them and a bug report headache for us, due to the lack of a C compiler tying the two halves together. There's a pure-Java Nokogiri in progress, but building both the Ruby bindings and emulating libxml behavior takes a long time to get right. For libraries like RMagick or the native MySQL and SQLite drivers, there are basically no options on the JVM. The Google Summer of Code project RMagick4J, by Sergio Arbeo, was a monumental effort that still has a lot of work left to be done. JDBC libraries work for databases, but they provide a very different interface from the native drivers and don't support things like UNIX domain sockets.

There's a very good chance that JRuby C extension support won't perform as well as C extensions on C Ruby, but in many cases that won't matter. Where there's no equivalent library now, having something that's only 5-10x slower to call – but still runs fast and matches API – may be just fine. Think about the coarse-grained operations you feed to MySQL or SQLite and you get the picture.

So ultimately, I think C extensions will be a good thing for JRuby, even if they only serve as a stopgap measure to help people migrate small applications over to native Java equivalents. Why should the end goal be native Java equivalents, you ask?

The Peril of C Extensions

Now that we're done with the happy, glowing discussion of how great C extension support will be, I can make a confession: I hate C extensions. No feature of C Ruby has done more to hold it back than the desire for backward compatibility with C extensions. Because they have direct pointer access, there's no easy way to build a better garbage collector or to support multiple runtimes in the same VM, even though various research efforts have tried. I've talked with Koichi Sasada, the creator of Ruby 1.9's "YARV" VM, and there are many things he would have liked to do with YARV that he couldn't because of C extension backward compatibility.

For JRuby, supporting C extensions will limit many features that make JRuby compelling in the first place. For example, because C extensions often use a lot of global variables, you can't use them from multiple JRuby runtimes in the same process. Because they expect a Ruby-like threading model, we need to restrict concurrency when calling out from Java to C. And all the great memory tooling I've blogged about recently won't see C extensions or the libraries they call, so it introduces an unknown.

All that said, I think it's a good milestone to show that we can support C extensions, and it may make for a "better JNI" for people who really just want to write C or who simply need to wrap a native library.

How You Can Help

There are a few things I think users like you can help with.

First off, we'd love to know what extensions you are using today, so we can explore what it would take to run them under JRuby (and so we can start exploring pure-Java alternatives, too). Post your list in the comments, and we'll see what we can come up with.
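If you're not sure which of your gems include C extensions, a rough way to find out is to ask RubyGems on your current (C Ruby) install. This is a sketch, and `gems_with_native_extensions` is a hypothetical helper name. It relies on each gemspec's `extensions` list, so it only finds gems that declare their extensions there.

```ruby
require 'rubygems'

# Names and versions of installed gems that declare at least one
# native extension (an extconf.rb or similar) in their gemspec.
def gems_with_native_extensions
  Gem::Specification
    .select { |spec| !spec.extensions.empty? }
    .map { |spec| "#{spec.name} #{spec.version}" }
    .uniq
    .sort
end

puts gems_with_native_extensions
```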

Second, anyone who knows C and the Ruby C API (like folks who work on extensions) could help us fill out bits and pieces that are missing. Set up the JRuby cext branch (I'll show you how in a moment), and try to get your extensions to build and load. Tim has already done the heavy lifting of making "gem install xyz" attempt to build the extension and "require 'xyz'" try to load the resulting native library, so you can follow the usual processes (including extconf.rb/mkmf.rb for non-gem building and testing). If it doesn't build, help us figure out what's missing or incorrect. If it builds but doesn't run, help us figure out what it's doing incorrectly.
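For the "builds but doesn't load" case, a small script that requires each library and records the error makes for a more useful report than "it didn't work". This is only a sketch: `check_requires` is a hypothetical helper, and the library names passed to it are placeholders for whatever extensions you're testing.

```ruby
# Try to require each library and record what happened.
# Returns a Hash of name => :ok, or name => "ErrorClass: message".
def check_requires(names)
  names.each_with_object({}) do |name, results|
    begin
      require name
      results[name] = :ok
    rescue LoadError, StandardError => e
      # LoadError is not a StandardError, so it is rescued explicitly.
      results[name] = "#{e.class}: #{e.message}"
    end
  end
end

# 'no_such_extension_xyz' is a made-up name used to show a failure.
check_requires(%w[json no_such_extension_xyz]).each do |name, status|
  puts "#{name}: #{status}"
end
```

Run it with bin/jruby from the cext build and include the output when you report a problem.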

Building JRuby with C Extension Support

Like building JRuby proper, building the cext work is probably the easiest thing you'll do all day (assuming the C compiler/build toolchain doesn't bite you).

  1. Check out (or fork and check out) the JRuby repository from http://github.com/jruby/jruby:
    git clone git://github.com/jruby/jruby.git

  2. Switch to the "cext" branch:
    git checkout -b cext origin/cext

  3. Do a clean build of JRuby plus the cext subsystem:
    ant clean build-jruby-cext-native

At this point you should have a JRuby build (run with bin/jruby) that can gem install and load native extensions.

16 comments:

metaphysicaldeveloper said...

I agree that Ruby would be much better off without the need for C-extensions. It would be far more portable and more hackable (turtles all the way down are great for ParseTree AST metaprogramming).

However, the problem is the need for such extensions. Having a more performatic implementation of ruby would go a long way in diminishing the need. But the design decisions were to make the language usable, not fast. We can only hope implementations will eventually have as much JIT optimization opportunities as the jvm.

skim said...

The only one I would like to see for JRuby is bson_ext which MongoDB depends on for "significantly improved performance".

http://mongodb.org/display/DOCS/Ruby+Language+Center

skim said...

The only one I would like to see for JRuby is bson_ext which mongo gem depends on for significantly improved performance.

http://mongodb.org/display/DOCS/Ruby+Language+Center

openid said...

Things I've used in the past few months that needed C-Extensions:

- BSON_EXT (MongoDB)
- Typhoeus (A fast HTTP client)
- yajl-ruby (A fast JSON parser and drop-in replacement for the JSON gem. Supports streaming JSON input)

Claudio said...

Ruby C extensions I need:

- Gsl (rb-gsl)
- Image Magick (RMagick)
- Mongrel

AnonymousWoodDuck said...

I don't get it. If C Ruby holds it (Ruby?) back pure Java extensions will only make matters worse since your complaint about being tied to an API holds true regardless of the underlying language.

Charles Oliver Nutter said...

AnonymousWoodDuck: It's not necessarily that it's the API that holds it back, it's that the API exposes details about the internal implementation of MRI. Since many extensions depend on those details, it's very difficult to remove or change those parts of the API and even more difficult to implement them on other implementations.

Paul said...

I just started looking for a kerberos implementation in ruby and found:
http://rubyforge.org/projects/krb5-auth/
Unfortunately, it's a C extension and doesn't work in JRuby. Is this a good candidate or should I be looking to wrap a Java implementation instead?

Charles Oliver Nutter said...

Paul: It may be possible to get the C ext working as a short-term measure, so I'd recommend giving it a try. However in almost all cases the best long-term solution will be to find a pure Java library that can simulate or replace the C ext. This is mainly because C exts are cumbersome, often have difficult-to-manage dependencies, and may not be permitted at all on typical Java servers.

Kyle Banker said...

@skim @openid

We're working on a Java version of bson_ext. You can track progress here:
http://github.com/mongodb/mongo-ruby-driver/tree/jruby

Kyle

skim said...

@Kyle Banker: thanks! look forward to the updates

Paul Brannan said...

Could you use RTLD_LOCAL or RTLD_DEEPBIND to use multiple instances of the same C extension in the same process?

Charles Oliver Nutter said...

Paul: I do not know of such things, but I would love to learn!

openid said...

Just discovered a new one that might be interesting:

- Cassandra (http://github.com/fauna/cassandra#readme)

Marc Seeger said...

aaand another one:

-em-http-request

Anonymous said...

Maybe I am just lucky but I have had zero problems with FFI, and the FFI code always works the same for both MRI and JRuby.

That makes my libraries far more valuable.