Tuesday, November 27, 2007

REXML Numbers With Joni

As Ola reported earlier today, we've merged Joni, Marcin Mielczynski's port of Oniguruma, to JRuby trunk. Here's the description from the Oniguruma home page:

Oniguruma is a regular expressions library.
The characteristics of this library is that different character encoding
for every regular expression object can be specified.
The benefit for us is avoiding the encode/decode we previously had to do for every regular expression match, since Ruby uses byte[]-based strings and all Java regular expression engines work with char[]. You can imagine the overhead all that array churn introduced.

After running through a series of basic optimizations, most of the key expressions we worried about were performing as well as or much better than JRegex, so Ola went through with the conversion over the past couple days. Marcin is continuing to work on various optimizations, but both Ola and I have been playing with the new code. And it's looking great.

You may remember I reported recently about how the regexp bottleneck impacted XML parsing with REXML. Here's the numbers run against JRuby immediately before merging Joni:
read content from stream, no DOM
3.362000 0.000000 3.362000 ( 3.362000)
1.232000 0.000000 1.232000 ( 1.232000)
0.887000 0.000000 0.887000 ( 0.887000)
1.009000 0.000000 1.009000 ( 1.010000)
0.801000 0.000000 0.801000 ( 0.801000)
read content once, no DOM
9.869000 0.000000 9.869000 ( 9.869000)
9.779000 0.000000 9.779000 ( 9.779000)
9.786000 0.000000 9.786000 ( 9.786000)
9.655000 0.000000 9.655000 ( 9.655000)
9.601000 0.000000 9.601000 ( 9.601000)
read content from stream, build DOM
1.368000 0.000000 1.368000 ( 1.368000)
1.297000 0.000000 1.297000 ( 1.297000)
1.192000 0.000000 1.192000 ( 1.192000)
1.131000 0.000000 1.131000 ( 1.131000)
0.812000 0.000000 0.812000 ( 0.812000)
read content once, build DOM
10.595000 0.000000 10.595000 ( 10.595000)
9.489000 0.000000 9.489000 ( 9.488000)
9.947000 0.000000 9.947000 ( 9.947000)
9.821000 0.000000 9.821000 ( 9.821000)
9.414000 0.000000 9.414000 ( 9.415000)
And here's the performance numbers today, with Joni:
read content from stream, no DOM
2.309000 0.000000 2.309000 ( 2.308000)
1.217000 0.000000 1.217000 ( 1.217000)
0.776000 0.000000 0.776000 ( 0.776000)
0.825000 0.000000 0.825000 ( 0.825000)
0.637000 0.000000 0.637000 ( 0.637000)
read content once, no DOM
0.370000 0.000000 0.370000 ( 0.369000)
0.415000 0.000000 0.415000 ( 0.415000)
0.288000 0.000000 0.288000 ( 0.288000)
0.260000 0.000000 0.260000 ( 0.260000)
0.254000 0.000000 0.254000 ( 0.254000)
read content from stream, build DOM
1.455000 0.000000 1.455000 ( 1.455000)
0.916000 0.000000 0.916000 ( 0.916000)
0.887000 0.000000 0.887000 ( 0.888000)
0.827000 0.000000 0.827000 ( 0.827000)
0.607000 0.000000 0.607000 ( 0.607000)
read content once, build DOM
0.630000 0.000000 0.630000 ( 0.630000)
0.664000 0.000000 0.664000 ( 0.664000)
0.680000 0.000000 0.680000 ( 0.680000)
0.553000 0.000000 0.553000 ( 0.553000)
0.650000 0.000000 0.650000 ( 0.650000)
Marcin's being modest about the work, but we're all absolutely amazed by it.

So finally the last really gigantic performance bottleneck in JRuby is gone, and it appears that JRuby's slow regexp era has come to a close. Next targets: the remaining issues with IO and Java integration performance.

Java 6 Port for OS X (Tiger and Leopard)

I just stumbled across this little gem today:

Landon Fuller's JDK 6 Port for OS X

Who Landon Fuller is I don't know. But I find it incredibly impressive that he's managed to get the base JDK 6 ported to OS X and working. Talk about showing the value of an open-source JDK...Landon Fuller FTW. Apple, are you hiring? Perhaps this guy can kick the Apple JDK process in the ass.

So naturally when I'm confronted with a final JDK 6 release for OS X one thing immediately springs to mind: performance.

We'd always suspected that the early preview version of JDK 6 on OS X was not showing us the true awesome performance we could expect from a final version.

We were right.

So I'll give you two sets of numbers, one that's specious and unreliable and the other that's a more real-world test.

fib numbers

Yes, good old fib. A constant in benchmarking. It shows practically nothing, and yet people use it to demonstrate perf. And in the case of Ruby 1.9, they've specifically optimized for integer-math-heavy benchmarks like this.

Ruby 1.9:

  0.400000   0.000000   0.400000 (  0.413737)
0.420000 0.010000 0.430000 ( 0.421622)
0.400000 0.000000 0.400000 ( 0.411591)
0.410000 0.000000 0.410000 ( 0.411593)
0.400000 0.000000 0.400000 ( 0.410080)
0.410000 0.000000 0.410000 ( 0.408836)
0.400000 0.000000 0.400000 ( 0.408572)
0.410000 0.000000 0.410000 ( 0.408114)
0.400000 0.000000 0.400000 ( 0.410374)
0.400000 0.000000 0.400000 ( 0.413096)

Very nice numbers, especially considering Ruby 1.8 benchmarks at about 1.7s on my system. Ruby 1.9 contains several optimizations for integer math, including the use of "tagged integers" for Fixnum values (saving object costs) and fast math opcodes in the 1.9 bytecode specification (avoiding method dispatch). JRuby does neither of these, representing Fixnums as a normal Java object containing a wrapped Long and dispatching as normal for all numeric operations.

JRuby trunk:
  0.783000   0.000000   0.783000 (  0.783000)
0.510000 0.000000 0.510000 ( 0.510000)
0.510000 0.000000 0.510000 ( 0.510000)
0.506000 0.000000 0.506000 ( 0.506000)
0.505000 0.000000 0.505000 ( 0.504000)
0.507000 0.000000 0.507000 ( 0.507000)
0.510000 0.000000 0.510000 ( 0.510000)
0.507000 0.000000 0.507000 ( 0.507000)
0.508000 0.000000 0.508000 ( 0.508000)
0.510000 0.000000 0.510000 ( 0.510000)

This is improved from numbers in the 0.68s range under the Apple JDK 6 preview. Pretty damn hot, if you ask me. I love being able to sit back and do nothing while performance numbers improve. It's a nice change from 16 hour days.

Anyway, back to performance. JRuby also supports an experimental frameless execution mode that omits allocating and initializing per-call frame information. In Ruby, frames are used for such things as holding the current method visibility, the current "self", the arguments and block passed to a method, and so on. But in many cases, it's safe to omit it entirely. I haven't got it running 100% safe in JRuby yet, and probably won't before 1.1 final comes out...but it's on the horizon. So then...numbers.

JRuby trunk, frameless execution:
  0.627000   0.000000   0.627000 (  0.627000)
0.409000 0.000000 0.409000 ( 0.409000)
0.401000 0.000000 0.401000 ( 0.401000)
0.402000 0.000000 0.402000 ( 0.402000)
0.403000 0.000000 0.403000 ( 0.403000)
0.403000 0.000000 0.403000 ( 0.403000)
0.404000 0.000000 0.404000 ( 0.405000)
0.401000 0.000000 0.401000 ( 0.401000)
0.403000 0.000000 0.403000 ( 0.403000)
0.405000 0.000000 0.405000 ( 0.405000)

Hello hello? What do we have here? JRuby actually executing fib faster than an optimized Ruby 1.9? Can it truly be?

Pardon my snarkiness, but we never thought we'd be able to match Ruby 1.9's integer math performance without seriously stripping down Fixnum and introducing fast math operations into the compiler. I guess we were wrong.

M. Ed Borasky's MatrixBenchmark

I like Borasky's matrix benchmark because it's a non-trivial piece of code, and pulls in a Ruby standard library (matrix.rb) as well. It basically inverts a matrix of a particular size and multiplies the original by the inverse. I show here numbers for a 64x64 matrix, since it's long enough to show the true benefit of JRuby but short enough I don't get bored waiting.

Ruby 1.9:
Hilbert matrix of dimension 64 times its inverse = identity? true
21.630000 0.110000 21.740000 ( 21.879126)
JRuby trunk:
Hilbert matrix of dimension 64 times its inverse = identity? true
14.780000 0.000000 14.780000 ( 14.780000)

This is down from 16-17s under the Apple JDK 6 preview and a clean 25% faster than Ruby 1.9.

So what have we learned today?
  • Sun's JDK 6 provides frigging awesome performance
  • Apple users are crippled without a JDK 6 port. Apple, I hope you're paying attention.
  • Landon Fuller is my hero of the week. I know Landon will just point at the excellent work to port JDK 6 to FreeBSD and OpenBSD...but give yourself some credit, you did what none of the other Leopard whiners did.
  • JRuby rocks
Note: You have to be a Java Research License licensee to legally download the binary or source versions of Landon's port. That or complain to Apple about some dude making a working port before they did. Landon mentions on his blog that he plans to contribute this work to OpenJDK soon...which would quickly result in a buildable GPLed JDK for OS X. Awesome.