Saturday, March 08, 2008

RubyInline for JRuby? Easy!

With JRuby 1.1 approaching, performance looking great, and general Ruby compatibility better than it's ever been, we've started to branch out into more libraries and applications. So a couple days ago, I thought I'd make good on a promise to myself and have a look at getting RubyInline working on JRuby.

RubyInline is a library by Ryan "zenspider" Davis which allows you to embed snippets of C code in your Ruby scripts. RubyInline does a minimal parse of that source and, based on the function signature you provide, wires it into the containing class as a Ruby method entry point, performing the appropriate entry and exit type conversions for whatever C types you happen to use. It's particularly useful if you have a small algorithm you need to run fast, and you'd like to run it in a somewhat faster language by "throwing work over" to it.

Here's the example of RubyInline from Ryan's page:

class MyTest

  def factorial(n)
    f = 1
    n.downto(2) { |x| f *= x }
    f
  end

  inline do |builder|
    builder.c "
      long factorial_c(int max) {
        int i=max, result=1;
        while (i >= 2) { result *= i--; }
        return result;
      }"
  end
end

The interesting bit is the builder.c call, one of several functions on the C builder. Others allow you to add arbitrary preamble code, imports, and "bare" methods (with no type conversion) among other things. And as you'd expect, performance is greatly improved by writing some algorithms in C instead of Ruby.
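
For illustration, here's roughly how a couple of those other builder calls read. This is only a sketch based on RubyInline's C builder; double-check the exact method names (builder.include and builder.prefix here) against the RubyInline docs:

inline do |builder|
  builder.include "<math.h>"                      # add an #include to the generated C source
  builder.prefix "#define SQUARE(x) ((x)*(x))"    # arbitrary preamble code before the functions
  builder.c "
    double hypot_c(double a, double b) {
      return sqrt(SQUARE(a) + SQUARE(b));
    }"
end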

Naturally, we want to have the same thing work in Java, and ideally use the same API and the same RubyInline plumbing for the rest of it. So then, I present to you java_inline, a RubyInline builder for JRuby!

This represents about four hours of work, and although it doesn't yet have a complete complement of tests and could use some "robustification", it's already working in about 100 lines of code. It was far easier to build for JRuby than for C Ruby because we already have a full-featured Java integration layer to handle the argument mapping.

Here's a sample java_inline script to show what it looks like, similar to the fastmath.rb sample provided with RubyInline.
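
What follows is a minimal sketch of such a script rather than the exact sample; it assumes the builder exposes a java method analogous to builder.c, so treat the builder API details as illustrative:

require 'inline'
require 'java_inline'

class FastMath
  inline :Java do |builder|
    builder.java "
      public double fastExp(double value) {
        return Math.exp(value);
      }"
  end
end

puts FastMath.new.fastExp(1.0)    # dispatches into the compiled Java method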

So how does it work? Well the Ruby side of things is largely the same as RubyInline's C builder...parse signature, compose the code together and compile it, and bind the method. It wires directly into the RubyInline pipeline, so all you need to do is install RubyInline, require the java_inline.rb script and you're all set. On the Java side, it's using the Java Compiler API, provided in Java 6 implementations. (OT: This has to be the worst-designed API I've ever seen. Go see for yourself. It's cruel and unusual. I won't dwell on it.) So yes, this will only work on Java 6. Deal with it...or submit a patch to get it working on Java 5 as well :)

It's not released in any official form yet, but I'll probably try to wire up a gem or something. I have to make sure I'm dotting my eyes and crossing my tees when I release stuff, even if it's only 100 lines of code. But the repository is obviously public, so play with it, submit patches and improvements, and let me know if you'd like to use it or help work on it more. I'm also interested in suggestions for other libraries you'd like to see special JRuby support for, so pass that along too.

Thursday, March 06, 2008

PyCon, Euruko, and Scotland on Rails

Upcoming event round-up!

  1. Next weekend, Tom Enebo, Nick Sieger and I will be at PyCon, checking out all the pythony goodness and hooking up with the Jython guys for some hacking.
  2. Tom and I will travel to Prague for Euruko 2008, the European RubyConf. We'll present JRuby, unsurprisingly, and hopefully meet up with more European Rubyists we haven't talked to before.
  3. After Euruko, Tom and I will be in Edinburgh for Scotland on Rails. We're supposed to do a 90-minute talk about JRuby on Rails. We'll try to make it entertaining and informative.

Monday, March 03, 2008

Welcome Pythonistas to Sun!

Today we can finally announce another exciting language-related event: we've hired Frank Wierzbicki of the Jython project and Ted Leung of OSAF (and a bunch of other stuff) to help with the Python story at Sun. Hooray!

Frank Wierzbicki has been tirelessly plodding away on Jython for years, since the early days. Even when Jython was considered by many to be "dead", Frank kept the project moving forward as best he could. Now he's been a major part of Jython's revitalization...a new release last year and rapid work to get compatibility up to current-Python levels prove that.

Ted Leung has long been involved with the Open Source Applications Foundation and the Apache Software Foundation, and was one of the original developers of the Xerces Java XML parser. I don't know Ted personally, but I'm excited to be working with him, especially in light of some of his previous roles.

So what does this mean for Sun? Well, it means we're serious about improving support for Python on Sun platforms. Jython is a big part of the story, since they have many challenges similar to JRuby, but a bunch of new ones as well. So we'll be looking to share key libraries and subsystems as much as possible, and we'll start looking at Jython as another driver for future JVM and platform improvement. On the non-Java side of the world, it means we'll ramp up support for Python itself. Sun has already settled on Mercurial for source control for all our OSS projects, and the package management system being worked up for Indiana is written in Python as well. But there's more we need to do.

Please help me in congratulating Frank and Ted on their new roles at Sun. It's going to be another interesting year for polyglots :)

Wednesday, February 27, 2008

University of Tokyo and JRuby Team to Collaborate on MVM

Finally the full announcement is available. We're pretty excited about this one...

The University of Tokyo and Sun Microsystems Commence Joint Research Projects on High Performance Computing and Web-based Programming Languages

Improving Efficiency and Simplicity of Fortress, Ruby and JRuby Languages is Current Focus for Unique Academic-Corporate Collaborative Model

TOKYO and SANTA CLARA, Calif. -- February 27, 2008 -- The University of Tokyo and Sun Microsystems, Inc. (Nasdaq: JAVA) today announced two joint research projects that will focus on High-Performance Computing (HPC) and Web-based programming languages.

...

The two research topics are:
  • Development of a library based on skeletal parallel programming in Fortress
  • Implementation of a multiple virtual machine (MVM) environment on Ruby and JRuby

Some of you knew about this from Tim Bray's rather cryptic announcement at RubyConf. The basic idea here is that we on the JRuby team and folks in Professor Ikuo Takeuchi's group at the University of Tokyo will be cooperating to come up with an MVM specification and implementations for the Ruby 1.9 and JRuby codebases. Tom and I have had brief discussions with Professor Koichi Sasada (ko1, creator of YARV) about this already, and there are mailing lists and wikis already running. I've also made sure that Evan Phoenix, of Rubinius, is included in the discussions. Like JRuby, Rubinius largely has MVM features implemented, but the eventual form they'll take and API they'll present is up in the air.

This should be an exciting collaboration between Sun and U Tokyo, as well as between the three most functional next-generation Ruby implementations. I'll certainly keep you posted on all MVM developments as they come up, and I welcome any comments or suggestions you might have.

Tuesday, February 26, 2008

JRuby in Google Summer of Code 2008

Greetings!

Google's Summer of Code for 2008 is starting up again, and we're looking for folks to submit proposals. The JRuby Community or Sun or me or someone will sign up as a mentoring organization, so start thinking about or discussing possible proposals. Here's a few ideas to get you started (and hopefully, there are many other ideas out there not on this list):

  • Writing a whole suite of specs for current Java integration behavior...and then expanding that suite to include behavior we want to add. This work would go hand-in-hand with a rework of Java Integration we're likely to start soon.
  • Collect and help round out all the profiling/debugging/etc tools for JRuby and get them to final releasable states. There's several projects in the works, but most of them are stalled and folks need better debugging. This project could also include simply working with existing IDEs (NetBeans, etc) to figure out how to get them to debug compiled Ruby code correctly (currently they won't step into .rb files).
  • Continue work on an interface-compatible RMagick port. There's already RMagickJR which has a lot of work into it, but nobody's had time to continue it. A working RMagick gem would ease migration for lots of folks using Ruby.
  • Putting together a definitive set of fine and coarse-grained benchmarks for Rails. JRuby on most benchmarks has been faster than Ruby 1.8.6...and yet higher-performance Rails has been elusive. We need better benchmarks and better visibility into core Rails. Bonus work: help nail down what's slower about JRuby.
  • Survey all existing JRuby extensions and put together an official public API based on core JRuby methods they're using. This would help us reduce the hassle of migrating extensions across JRuby versions.
Please, anyone else who has ideas, feel free to post them here or on the mailing list for discussion. And if you have a proposal, go ahead and mail the JRuby dev list directly.

Update: Martin Probst added this idea in the comments:
Another idea might be to "fix" RDoc, whatever that means. That's not really JRuby centric, but still a very worthwhile task, I think.

A documentation system that allows easy doc writing (Wiki-alike) and provides a better view on the actual functionality you can find in a certain instance would be really helpful. Plus a decent search feature.
And that makes me think of another idea:
  • Add RDoc comments to all the JRuby versions of Ruby core methods, and get RI/RDocs generating as part of JRuby distribution builds. Then we could ship RI and have it work correctly. Ola Bini has already started some of the work, creating an RDoc annotation we can add to our Ruby-bound methods.
Update 2: sgwong in the comments suggested implementing the win32ole library for JRuby. This would also be an excellent contribution, since there's already libraries like Jacob to take some of the pain out of it, and it would be great to have it working on JRuby. And again, this makes me think of a few additional options:
  • Implement a full, compatible set of Apple Cocoa bindings for JRuby. You could use JNA, and I believe there's already Cocoa bindings for Java you could reuse as well, but I'm not familiar with them.
  • Complete implementation of the DL library (Ruby's stdlib for programmatic loading and calling native dynamic libraries) and/or Rubinius's FFI (same thing, with a somewhat tidier interface). Here too there's lots of help: I've already partially implemented DL using JNA, and it wouldn't be hard to finish it and/or implement Rubinius's FFI. And implementing Rubinius's FFI would have the added benefit of allowing JRuby to share some of Rubinius's library wrappers.
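
To make that last idea concrete, here's a rough sketch of the kind of binding such an FFI layer would enable. The ffi_lib/attach_function names follow the Rubinius-style FFI mentioned above, but treat the exact API as an assumption rather than a finished interface:

module LibC
  extend FFI::Library              # hypothetical mixin providing the binding DSL
  ffi_lib 'c'                      # bind against the standard C library
  attach_function :getpid, [], :int
  attach_function :getuid, [], :uint
end

puts "pid: #{LibC.getpid}, uid: #{LibC.getuid}"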

Sunday, February 24, 2008

Ruby's Thread#raise, Thread#kill, timeout.rb, and net/protocol.rb libraries are broken

I'm taking a break from some bug fixing to bring you this public service announcement:

Ruby's Thread#raise, Thread#kill, and the timeout.rb standard library based on them are inherently broken and should not be used for any purpose. And by extension, net/protocol.rb and all the net/* libraries that use timeout.rb are also currently broken (but they can be fixed).

I will explain, starting with timeout.rb. You see, timeout.rb allows you to specify that a given block of code should only run for a certain amount of time. If it runs longer, an error is raised. If it completes before the timeout, all is well.
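
For anyone who hasn't used it, a minimal usage sketch looks something like this (Timeout::Error is what gets raised by default when the block runs too long):

require 'timeout'

begin
  Timeout.timeout(2) do
    sleep 10                       # stand-in for slow work
  end
rescue Timeout::Error
  puts "took longer than 2 seconds"
end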

Sounds innocuous enough, right? Well, it's not. Here's the code:

def timeout(sec, exception=Error)
  return yield if sec == nil or sec.zero?
  raise ThreadError, "timeout within critical session"\
    if Thread.critical
  begin
    x = Thread.current
    y = Thread.start {
      sleep sec
      x.raise exception, "execution expired" if x.alive?
    }
    yield sec
    # return true
  ensure
    y.kill if y and y.alive?
  end
end

So you call timeout with a number of seconds, an optional exception type to raise, and the code to execute. A new thread is spun up (for every invocation, I might add) and set to sleep for the specified number of seconds, while the main/calling thread yields to your block of code.

If the code completes before the timeout thread wakes up, all is well. The timeout thread is killed and the result of the provided block of code is returned.

If the timeout thread wakes up before the block has completed, it tells the main/calling thread to raise a new instance of the specified exception type.

All this is reasonable if we assume that Thread#kill and Thread#raise are safe. Unfortunately, they're provably unsafe.

Here's a reduced example:

main = Thread.current
timer = Thread.new { sleep 5; main.raise }
begin
  lock(some_resource)
  do_some_work
ensure
  timer.kill
  unlock_some_resource
end

Here we have a simple timeout case. A new thread is spun up to wait for five seconds. A resource is acquired (in this case a lock) and some work is performed. When the work has completed, the timer is killed and the resource is unlocked.

The problem, however, is that you can't guarantee when the timer might fire.

In general, with multithreaded applications, you have to assume that cross-thread events can happen at any point in the program. So we can start by listing a number of places where the timer's raise call might actually fire in the main part of the code, between "begin" and "ensure".
  1. It could fire before the lock is acquired
  2. It could fire while the lock is being acquired (potentially corrupting whatever resource is being locked, but we'll ignore that for the moment)
  3. It could fire after the lock is acquired but before the work has started
  4. It could fire while the work is happening (presumably the desired effect of the timeout, but it also suffers from potential data corruption issues)
  5. It could fire immediately after the work completes but before entering the ensure block
Other than the data corruption issues (which are very real concerns) none of these is particularly dangerous. We could even assume that the lock is safe and the work being done with the resource is perfectly synchronized and impossible to corrupt. Whatever. The bad news is what happens in the ensure block.

If we assume we've gotten through the main body of code without incident, we now enter the ensure. The main thread is about to kill the timeout thread, when BAM, the raise call fires. Now we're in a bit of a predicament. We're already outside the protected body, so the remaining code in the ensure is going to fail. What's worse, we're about to leave a resource locked that may never get unlocked, so even if we can gracefully handle the timeout error somewhere else, we're in trouble.

What if we move the timer kill inside the protected body, to ensure we kill the timer before proceeding to the lock release?

main = Thread.current
timer = Thread.new { sleep 5; main.raise }
begin
  lock(some_resource)
  do_some_work
  timer.kill
ensure
  unlock_some_resource
end

Now we have to deal with the flip side of the coin: if the work we're performing raises an exception, we won't kill the timer thread, and all hell breaks loose. Specifically, after our lock has been released and we've bubbled the exception somewhere up into the call stack, BAM, the raise call fires. Now it's anybody's guess what we've screwed up in our system. And the same situation applies to any thread you might want to call Thread#raise or Thread#kill against: you can't make any guarantees about what damage you'll do.

There's a good FAQ in the Java SE platform docs entitled Why Are Thread.stop, Thread.suspend, Thread.resume and Runtime.runFinalizersOnExit Deprecated?, which covers this phenomenon in more detail. You see, in the early days of Java, it had all these same operations: killing a thread with Thread.stop(), causing a thread to raise an arbitrary exception with Thread.stop(Throwable), and a few others for suspending and resuming threads. But they were a mistake, and in any current Java SE platform implementation, these methods no longer function.

It is provably unsafe to be able to kill a target thread or cause it to raise an exception arbitrarily.

So what about net/protocol.rb? Here's the relevant code, used to fill the read buffer from the protocol's socket:
def rbuf_fill
  timeout(@read_timeout) {
    @rbuf << @io.sysread(8196)
  }
end

This is from JRuby's copy; MRI performs a read of 1024 bytes at a time (spinning up a thread for each) and Rubinius has both a size modification and a timeout.rb modification to use a single timeout thread. But the problems with timeout remain; specifically, if you have any code that uses net/protocol (like net/http, which I'm guessing a few of you rely on) you have a chance that a timeout error might be raised in the middle of your code. You are all rescuing Timeout::Error when you use net/http to read from a URL, right? What, you aren't? Well you better go add that to every piece of code you have that calls net/http, and while you're at it add it to every other library in the world that uses net/http. And then you can move on to the other net/* protocols and third-party libraries that use timeout.rb. Here's a quick list to get you started.
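
For the record, defensively wrapping a net/http call looks something like this (a minimal sketch; the URL is just a placeholder):

require 'net/http'
require 'uri'
require 'timeout'

begin
  response = Net::HTTP.get_response(URI.parse("http://example.com/"))
  puts response.code
rescue Timeout::Error
  puts "the read timed out somewhere inside net/protocol"
end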

Ok, so you don't want to do all that. What are your options? Here's a few suggestions to help you on your way:
  1. Although you don't have to take my word for it, eventually you're going to have to accept the truth. Thread#kill, Thread#raise, timeout.rb, net/protocol.rb all suffer from these problems. net/protocol.rb could be fixed to use nonblocking IO (select), as could I suspect most of the other libraries, but there is no safe way to use Thread#kill or Thread#raise. Let me repeat that: there is no safe way to use Thread#kill or Thread#raise.
  2. Start lobbying the various Ruby implementations to eliminate Thread#kill and Thread#raise (and while you're at it, eliminate Thread#critical= as well, since it's probably the single largest thing preventing Ruby from being both concurrently-threaded and high-performance).
  3. Start lobbying the library and application maintainers using Thread#kill, Thread#raise, and timeout.rb to stop.
  4. Stop using them yourself.
Now I want this post to be productive, so I'll give a brief overview of how to avoid using these seductively powerful and inherently unsafe features:
  • If you want to be able to kill a thread, write its code such that it periodically checks whether it should terminate. That allows the thread to safely clean up resources and prepare to "die itself". (There's a sketch of this pattern after this list.)
  • If you need to time out an operation, you're going to have to find a different way to do it. With IO, it's pretty easy. Just look up IO#select and learn to love it (also sketched after this list). With arbitrary code and libraries, you may be able to successfully lobby the authors to add timeout options, or you may be able to hook into them yourself. If you can't do either of those...well, you're SOL. Welcome to threads. I hope others will post suggestions in the comments.
  • If you think you can ignore this, think again. Eventually you're going to get bitten in the ass, be it from a long-running transaction that gets corrupted due to a timeout error or a filesystem operation that wipes out some critical file. You're not going to escape this one, so we should start trying to fix it now.
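
As promised above, here's a minimal sketch of both ideas: a worker that checks a flag instead of being killed from outside, and a read bounded by IO.select instead of timeout.rb. Names like @should_stop, do_one_unit_of_work, and release_resources are just illustrative placeholders:

# Cooperative termination: the thread polls a flag and cleans up on its own.
class Worker
  def initialize
    @should_stop = false
    @thread = Thread.new do
      do_one_unit_of_work until @should_stop   # placeholder for the real work
      release_resources                        # cleanup happens inside the thread itself
    end
  end

  def stop
    @should_stop = true                        # ask politely instead of Thread#kill
    @thread.join
  end
end

# Timing out a read without timeout.rb: IO.select returns nil on timeout.
def read_with_timeout(io, seconds)
  ready = IO.select([io], nil, nil, seconds)
  raise "read timed out" unless ready
  io.sysread(8192)
end
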
I'm hoping this will start a discussion eventually leading to these features being deprecated and removed from use. I believe Ruby's viability in an increasingly parallel computing world depends on getting threading and concurrency right, and Thread#raise, Thread#kill, and the timeout.rb library need to go away.

Thursday, February 21, 2008

FOSDEM

It's 5am in Brussels and I'm awake. That can only mean one thing. Time to blog!

This weekend I'm presenting JRuby at FOSDEM, the "Free and Open Source Developers European Meeting." I was invited to talk, and who could pass up an invitation to Belgium?

I've been trying to shake things up with my recent JRuby talks. At Lang.NET, I obviously dug into technical details a lot more because it was a crowd that could stomach such things. At acts_as_conference, I threw out the old JRuby on Rails demo and focused only on things that make JRuby on Rails different (better) than classic Ruby on Rails development, such as improved performance, easier deployment, and access to a world of Java libraries. And FOSDEM will include all new content as well: I'm spending Friday putting together a talk that discusses JRuby capabilities and status while simultaneously illustrating the impact community developers have had on the project. After all, it's an OSS conference, so I'll continue my recent trend and try to present something directly on-topic.

For those of you unable to attend, it turns out there will be a live video stream of some of the talks, including the whole Programming Languages track. I don't think I've ever been livestreamed before.

Saturday, February 16, 2008

JRuby RC2 Released; What's Next?

Today, Tom got the JRuby 1.1 RC2 release out. It's an astounding collection of performance improvements and compatibility fixes. Here's Tom's JRuby 1.1 RC2 announcement.

Let's recap a little bit:

  • There have been additional performance improvements in RC2 over RC1. Long story short, performance of most trivial numeric benchmarks approaches or exceeds Ruby 1.9, while most others are on par. So in general, we have started using Ruby 1.9 performance as our baseline.
  • Unusual for an RC cycle are the 260 bugs we fixed since RC1. Yes, we're bending the rules of RCs a bit, but if we can fix that many bugs in just over a month, who can really fault us for doing so? Nobody can question that JRuby is more compatible with Ruby 1.8.6 than any other Ruby implementation available.
  • We've also included a number of memory use improvements that should reduce the overall memory usage of a JRuby instance and, perhaps more importantly, reduce the memory cost of JIT-compiled code (the so-called "Perm Gen" space used by Ruby code that's been compiled to JVM bytecode). In quick measurements, an application running N instances of JRuby used roughly 1/Nth as much perm gen space.
So there's really two big messages that come along with this release. The first is that JRuby is rapidly becoming the Ruby implementation of choice if you want to make a large application as fast and as scalable as possible. The thousands of hours of time we've poured into compatibility, stability, and performance are really paying off. The second message comes from me to you:

If you haven't looked at JRuby, now's the time to do so.

All the raw materials are falling into place. JRuby is ready to be used and abused, and we're ready to take it to the next level for whatever kinds of applications you want to throw at it. And not only are we ready, it's our top priority for JRuby. We want to make JRuby work for you.

And that brings me to the "What's Next" portion of this post. Where do we go from here?

We've got a window for additional improvements and fixes before 1.1 final. That much is certain, and we want to fill that time with everything necessary to make JRuby 1.1 a success. So we need your help...find bugs, file reports, let us know about performance bottlenecks. And if you're able, help us field those reports by fixing issues and contributing patches.

But we also have a unique opportunity here. JRuby offers features not found in any other Ruby implementation, features we've only begun to utilize:

Inside a single JVM process, JRuby can be used to host any number of applications, scaling them to any number of concurrent users.

This one applies equally well to Rails and to other web frameworks rapidly gaining popularity like Merb. And new projects like the GlassFish gem are leading the way to simple, scalable, no-hassle hosting for Ruby web applications. But we're pretty resource-limited on the JRuby project. We've got two full-time committers and a handful of part-timers and after-hours contributors pouring their free time into helping out. For JRuby web app hosting to improve and meet your requirements, we're going to need your use cases, your experience, and your input. We're attempting to build the most scalable, best-performing Ruby web platform with JRuby, but we're doing it in true OSS style. No secrets, no hidden agendas. This is going to be your web platform, or our efforts are wasted. So what should it look like?

The GlassFish gem is the first step. It leverages some of the best features of the Java platform: high-speed asynchronous IO, native threading, built-in management and monitoring, and application namespace isolation (yes, even classloaders have a good side) to make Ruby web applications a push-button affair to deploy and scale. With one command, your app is production-ready. No mongrel packs to manage, no cluster of apps to monitor, and no WAR file relics to slow you down. "glassfish_rails myapp" and you're done; it's true one-step deployment. Unfortunately right now it only supports Rails. We want to make it not only a rock-solid and high-performance Rails server, but also a general-purpose "mod_ruby" for all Ruby web-development purposes. It's the right platform at the right time, and we're ready to take it to the next level. But we need you to try it out and let us know what it needs.

JRuby's performance regularly exceeds Ruby 1.8.6, and in many cases has started to exceed Ruby 1.9.

At this point I'm convinced JRuby will be able to claim the title of "fastest Ruby implementation", for some definition of "Ruby". And if we're not there yet, we will be soon. With most benchmarks meeting or exceeding Ruby 1.8.6 and many approaching or exceeding Ruby 1.9 we're feeling pretty good. What I've learned is that performance is important, but it's not the driving concern for most Ruby developers, especially those building web applications usually bounded by unrelated bottlenecks in IO or database access. But that's only what I've been able to gather from conversations with a few of you. Is there more we need to do? Should we keep going on performance, or are there other areas we should focus on? Do you have cases where JRuby's performance doesn't meet your needs?

JRuby's performance future is largely an open question right now. A stock JRuby install performs really well, "better enough" that many folks are using it for performance-sensitive apps already. We know there's bottlenecks, but we're solving them as they come up, and we're on the downward slope. Outside of bottlenecks, do we have room to grow? You bet we do. I've got prototype and "experimental" features already implemented and yet to be explored that will improve JRuby's performance even more. Of course there's always tradeoffs. You might have to accept a larger memory footprint, or you may have to turn off some edge-case features of Ruby that incur automatic performance handicaps. And some of the wildest improvements will depend on dynamic invocation support (hopefully in JDK 7) and a host of other features (almost certain to be available in the OpenJDK "Multi-language VM" subproject). But where performance improvements are needed, they're going to happen, and if I have any say they're going to happen in such a way that all other JVM languages can benefit as well. I'm looking to you folks to help us prioritize performance and point us in the right direction for your needs.

JRuby makes the JVM and the Java platform really shine, with excellent language performance and a "friendlier" face on top of all those libraries and all that JVM magic.

I think this is an area we've only just started to realize. Because JRuby is hosted on the JVM, we have access to the best JIT technology, the best GC technology, and the best collection of libraries ever combined into a single platform. Say what you will about Java. You may love the language or you may hate it...but it's becoming more and more obvious that the JVM is about a lot more than Java. JRuby is, to my knowledge, the only time a non-JVM language has been ported to the JVM and used for real-world, production deployments of a framework never designed with the JVM in mind. We've taken an unspecified single-implementation language (Matz's Ruby) and its key application framework (Rails), and delivered a hosting and deployment option that at least parallels the original and in many ways exceeds it. And this is only the first such case...the Jython guys now have Django working, and it's only a matter of time before Jython becomes a real-world usable platform for Python web development. And there's already work being done to make Merb--a framework inspired by Rails but without many of Rails' warts--run perfectly in JRuby. And it's all open source, from the JVM up. This is your future to create.

I think the next phase for JRuby will bring tighter, cleaner integration with the rest of the Java platform. Already you can call any Java-based library as though it were just another piece of Ruby code. But it's not as seamless as we'd like. Some issues, like choosing the right target method or coercing types appropriately, are issues all dynamic languages face calling into static-typed code. Groovy has various features we're likely to copy, features like explicit casting and coercing of types and limited static type declaration at the edges. Frameworks like the DLR from Microsoft have a similar approach, since they've been designed to make new languages "first class" from day one. We will work to find ways to solve these sorts of problems for all JVM languages at the same time we solve them for JRuby. But there's also a lot that needs to come from Ruby users. What can we do to make the JVM and all those libraries work for you?

I guess there's a simple bottom line to all this. JRuby is an OSS project driven mostly by community contributors (5 out of 8 committers working in their free time and hundreds of others contributing patches and bug reports), based on an OSS platform (not only OpenJDK, but a culture of Free software and open source that permeates the entire Java ecosystem), hosting OSS frameworks and libraries (Rails, Merb, and the host of other apps in the Ruby world). All this is meaningless without you and your applications. We're ready to pour our effort into making this platform work for you. Are you ready to help us?

Monday, January 28, 2008

Lang.NET 2008: Day 1 Thoughts

Yes friends, I'm at Microsoft's Lang.NET symposium this week. Does this strike you as a bit peculiar?

Lang.NET is Microsoft's event for folks interested in using and implementing "alternative" languages on the CLR. From the Lang.NET site itself:

Lang .NET 2008 Symposium is a forum for discussion on programming languages, managed execution environments, compilers, multi-language libraries, and integrated development environments. This conference provides an excellent opportunity for Programming Language Implementers and Researchers from both industry and academia to meet and share their knowledge, experience, and suggestions for future research and development in the area of programming languages
Of course .NET and the CLR aren't mentioned explicitly here, but being a Microsoft event I think it's reasonable to expect it to be .NET heavy. And that's certainly ok, since at least one of the key professed design goals of the CLR is to support multiple languages, and by multiple we'd like to think they mean more than VB or C#. Microsoft would understandably like to cultivate a culture of language implementation and experimentation on CLR, since history has shown repeatedly that programming language diversity is a must for any platform, system, or framework to be long-lived. This fact is not new to many of us, although there are certain companies that until recently refused to believe it. But I digress.

What is a long time Java user and short time JVM language implementer doing here? Why would I subject myself to the slings and arrows of a potentially unfriendly .NET crowd, or at least subject myself to the scalding apathy of a roomful of engineers with no interest in the JVM or my work on JRuby? Well, that's an excellent question.

Perhaps it's more entertaining to describe *what* I'm doing here before explaining *why* I'm doing it. The credit for the idea goes to Brian Goetz, a coworker of mine at Sun and currently an engineer working on JavaFX (though you probably know him better from his talks on Java performance or his book and talks on Java concurrency). Given that we're both working on JVM language-related projects, and we both would like to see the JVM and the Java platform continue to grow, evolve, and expand, he thought it might be useful for us two and John Rose to attend the conference to present and discuss the many enhancements we're considering to support non-Java languages. You may know John Rose from his posts on adding various wild features to the JVM...features that you, dear reader, and others like you may never have expected to enter the JVM in any form. That's how I know him, and from my many discussions with him about how to make JRuby perform well on current JVM versions. But we must look to the future, and attending Lang.NET is at least in part to help us validate that this is an appropriate future to pursue.

So today John Rose and I presented his Da Vinci Machine project to a largely .NET crowd, with John covering the theoretical aspects of what features we'd like to add to the JVM and how he plans to add them, and me covering the practical reasons why we'd like to add those features using JRuby as an example. Those of you who saw my presentation at RubyConf would have recognized my slides; I largely presented the same "JRuby design internals" content in more depth and with language implementers and enthusiasts in mind. Specifically I presented details about JRuby's parser, interpreter, core class implementations, extensions and POSIX support, and of course its compiler and related optimizations. The last area was obviously the most interesting for this crowd, but I'll explain that later.

So there's the primary "what" of the trip. And it was the primary "what" we three shared in common. But there were a few other "what"s I had in mind entering into this, and they also explain the "why":
  1. I hoped to understand better how the DLR (and to a lesser extent the CLR) improve the language-implementing ecosystem on .NET.
  2. I wanted to meet and talk with .NET language implementers about their strategies for various problems facing JRuby and similar JVM languages.
  3. I wanted to meet my counterparts on the .NET side of the world, which is increasingly becoming a parallel universe to the JVM and the Java platform...at least as far as problem-solving strategies and future directions go.
  4. I enjoy meeting and talking with smart people. There's a few of them here.
So in order to "yeggefy" my post a bit, I want to elaborate on the first and primary point in more detail, rather than give another yawner of a conference play-by-play.

Pain

It's worth prefacing this with a disclaimer: I am not a Microsoft fan. Perhaps it's like being burned too many times by a hot iron. My work on LiteStep, my enthusiasm for open source, my interest in collaborative, communal development processes, and my dark days learning Win32, MFC, and COM mean I'll probably never be particularly warm to Microsoft projects, technology, or ideals, regardless of how they may evolve over time. That's not to say Microsoft can't change or isn't changing...but as you probably know it's hard to make yourself touch a cold iron. Pain rewires our circuitry, and lingers on in our altered behavior forever. I have felt pain.

But I endeavor to rise above my own wiring and bigotry, which is why I force myself to help other JVM language implementations, force myself to help other Ruby implementations, and force myself to share freely and openly all I can even with members of otherwise competing worlds. In this sense, the pain that comes from cooperating with projects I might otherwise want to undermine, derail, or discount is antithetical to my belief in open doors and open processes. I must cure myself of this affliction, rewrite myself to not feel that pain.

And so I am trying to study the DLR, which brings me to the big point #1. I will appreciate any and all corrections to everything I state here.

Eyeing the DLR

The DLR is Microsoft's Dynamic Language Runtime, a set of libraries and tools designed to make it easier to implement dynamic languages atop the CLR. The DLR provides facilities for compiler and interpreter generation (via language-agnostic expression trees), fast dynamic invocation (via self-updating dynamic call sites), and cross-language method dispatch and type system support. It is, to my eyes, intended to be the "everytool" needed to implement dynamic languages for .NET.

The DLR has largely grown out of Jim Hugunin's work on IronPython, and aims to tackle all the same issues we've dealt with while working on JRuby. But it also provides more than that in its self-optimizing capabilities and its expression tree logic, two features that undoubtedly make it easier to produce CLR dynamic language implementations with acceptable to excellent performance characteristics.

Expression trees in the DLR are used as an intermediate representation of language semantics. Where the typical language implementation process has you parse to an AST and then either interpret that or perform the additional work to compile the AST to a target "assembly" language, expression trees are meant to be (mostly) the last step in your language implementation process (outside of implementing language-specific types and dealing with language-specific nuances). Up to the point of creating a parsed AST the two paths are largely the same. But on the DLR, instead of proceeding to compiler work, you further transform the AST into an expression tree that represents the language behavior in DLR terms. So if your language provides mechanisms to conditionally run a given expression (modifiers in Ruby terms), you might turn your one conditional expression AST node into equivalent test and execute DLR expression tree nodes. It appears the translation is almost always a widening translation, where the DLR hopes to provide a broad set of "microsemantics" that can be used to compose a broader range of language features. And of course the idea is that there's a quickly-flattening asymptotic curve to the number of expression tree nodes required to implement each new language. Whether that's the case is yet to be seen.

I must admit, I've poo-pooed the expression tree idea in the past, mostly because I did not understand it and had not found a way to see beyond my personal biases. While a language-generic expression tree is certainly a very appealing goal, one that lowers the entry cost of language implementation considerably, it always seemed like a too-lofty ideal. Of course it's impossible to predict what features future languages might decide to add, so can we reasonably expect to provide a set of expression nodes that can represent those languages? Won't we be endlessly forced to extend that set of node types for each new language that comes along? And if to alleviate such pain we are given hooks to call back to custom language-specific code for cases where the DLR falls down, doesn't that eliminate the advantage we hoped to gain in the first place?

I'll make no secret about my skepticism here, but I've seen a couple things today that make me far more optimistic about the utility of language-agnostic expression trees.

The first was during Anders Hejlsberg's talk about C# 3.0. C# 3.0 adds a considerable slate of features, some of which I can appreciate (type inference, for example, would take 99% of the pain out of using generics; Anders called that out as a key reason for adding inference to C#) and others which I'm still dubious about (LINQ's additions to C#'s already large set of keywords). But his talk made one thing abundantly clear: these changes are little more than compiler magic, smoke and mirrors to produce syntactic sugar. And he admitted as much, showing that behind the scenes LINQ and inferred types and their ilk basically compile down to the same IL that would be produced if you wrote the same sugar by hand. And that's particularly heartening to me, since it means the vast majority of these features could be added to Java or supported by JVM languages without modifying the JVM itself. Whether they're features we want to add is a separate discussion.

The second detail that made me better appreciate expression trees stems directly from the first: they're just objects, man. Although expression tree support is being wired directly into C# and VB.NET for at least the small subset of expressions LINQ encompasses, in general we're talking about nothing more than a graph of objects and a set of libraries and conventions that can utilize them. If it were reasonable for us to expose JRuby's AST as a "one true AST" to represent all languages, we'd be able to support a similar set of features, adding syntax to Java to generate Ruby ASTs at runtime and capabilities for manipulating, executing, and compiling those ASTs. Naturally, we'd never try to make an argument that the JRuby AST can even come close to approximating a set of low-level features "all languages" might need to support, but hopefully you see the parallel. So again, we're working at a level well above the VM itself. That means the power of expression trees could definitely be put into the hands of JVM language developers without any modifications to existing JVMs. And in the DLR's case, they're even considering the implications of allowing new language implementations to provide extensions to the set of expression tree nodes, so at least some of them recognize that it's all just objects too...and perhaps starting to recognize that nobody will ever build the perfect mouse trap.

So that brings me to my other key point of interest in the DLR: its mechanisms for optimizing performance at runtime.

The DLR DynamicSite is basically an invokable object that represents a given call at a given position in a program's source code. When you perform a dynamic invocation, the DynamicSite runs through a list of rules, checking each in turn until it finds a match. If it finds a match, it invokes the associated target code, which presumably will do things like coerce types, look up methods, and perform the actual work of the invocation. As more calls pass through the same call site, DynamicSite will (if necessary) repeatedly replace itself with new code (new IL in fact) that includes some subset of previous rules and targets (potentially all of them if they're all still valid or none of them if they're all now invalid) along with at least one new rule for the new set of circumstances brought about by this call. I've prototyped this exact same behavior in JRuby's call sites (on a more limited scale), but have not had the time to include them in JRuby proper just yet (and might never have to...more on that later). The idea is that as the program runs, new information becomes available that will either make old assumptions incorrect or make new optimizations possible. So the DynamicSite continually evolves, presumably toward some state of harmony or toward some upper limit (i.e. "I give up").
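
To make that concrete, here's a toy Ruby sketch of a self-updating call site: it caches one rule (the receiver's class) plus the matching target, and falls back to a slow lookup that installs a new rule when the cached one stops matching. This is a heavy simplification of what the DLR's DynamicSite and JRuby's prototype call sites do, and the class and method names are invented for illustration:

class ToyCallSite
  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil
    @cached_method = nil
  end

  def call(receiver, *args)
    if receiver.class == @cached_class
      # Rule still holds: use the cached target directly.
      @cached_method.bind(receiver).call(*args)
    else
      # Miss: do the slow lookup, then install a new rule and target.
      @cached_class  = receiver.class
      @cached_method = receiver.class.instance_method(@method_name)
      @cached_method.bind(receiver).call(*args)
    end
  end
end

site = ToyCallSite.new(:length)
site.call("hello")   # slow path: looks up and caches String#length
site.call("world")   # fast path: cached rule still matches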

And there's a simple reason why the DLR must do this to get acceptable performance out of languages like Python and Ruby:

Because the CLR doesn't.

A Solid Base

In the JVM world, we've long been told about and only recently realized the benefits of dynamic optimization. The JVM has over time had varying capabilities to dynamically optimize and reoptimize code...and perhaps most importantly, to dynamically *deoptimize* code when necessary. Deoptimization is very exciting when dealing with performance concerns, since it means you can make much more aggressive optimizations--potentially unsafe optimizations in the uncertain future of an entire application--knowing you'll be able to fall back on a tried and true safe path later on. So you can do things like inlining entire call paths once you've seen the same path a few times. You can do things like omitting synchronization guards until it becomes obvious they're needed. And you can change the set of optimizations applied after the fact...in essence, you can safely be "wrong" and learn from your mistakes at runtime. This is the key reason why Java now surpasses C and C++ in specific handcrafted benchmarks and why it should eventually be able to exceed C and C++ in almost all benchmarks. And it's a key reason why we're able to get acceptable performance out of JRuby having done far less work than Microsoft has done on IronPython and the DLR.

The CLR, on the other hand, does not (yet) have the same level of dynamic optimization that the JVM does. Specifically, it does not currently support deoptimizing code that has already been JITed, and it does not always (occasionally? rarely?) include type information into its consideration of how to optimize code. (Those of you more familiar with CLR internals please update or correct me on how much optimization exists today and how deep those optimizations go.) So for dynamic languages to perform well, you simply have to do additional work. You don't have a static call path you can JIT early and trust to never change. You can't bind a call site to a specific method body, knowing verifiably that it will forever be the same code called from that site. And ultimately, that means you must implement those capabilities yourself, with profiling, self-updating call sites that build up rule and target sets based on gathered information.

So I think there's a reasonably simple answer now to folks asking me if I believe we need a "DLR" for JVM language implementers to target. The answer is that certain parts of the DLR are definitely attractive, and they may be worth implementing to ease the pain of JVM language implementation or at least reduce the barrier to entry. But there's a large set of DLR features we simply don't need to create if we can find a way to open up the JVM's existing ability to optimize dynamically...if we can punch a hole in the bulky, dynamic-unfriendly Java exterior. And that's exactly what we're doing (really, John is doing) with JSR-292 (dynamic invocation for JVM) and the MLVM.

Where then do we go from here? If the correct path is only partly down DLR street, what else is missing? Well, some of my Sun brethren may not like to hear these (and some other may love to hear them), but here's a short list of what I believe needs to happen to keep the JVM relevant into the future (at least if we take it as written that multi-language support is a must):
  1. We need clear and consistent proof that a multi-language JVM is a priority, and not just at Sun Microsystems but also in the wider JVM and Java platform communities. The libraries and runtimes and support code necessary will grow out of that commitment. It's obviously not worth the effort to make all this happen if nobody cares and nobody will want to use it. But I don't think that's the case, and currently there's not a strong enough indication from any of the major community players that this cause is worth fighting for. At least in Sun's case, we've thrown down the gauntlet by open-sourcing Java and initiating projects like the JVM Languages mailing list, the JVM Language Runtime project, and the Multi-Language VM project. But we need you, and that brings me to my second point.
  2. We're fighting an uphill battle for talented, excited resources in this domain, and without more money and time spent by the major players on multi-language support for the JVM (in collaboration, I might add), it's not going to happen. Again, whether that's a bad thing depends on whether you believe in a multi-language world. I do, and I'm willing to spend whatever of my time is necessary to make this happen. But I'm no patsy for political interests that might route us down the wrong path, and I'm definitely not going to be able to do this alone. Where do you stand?
  3. We need the freedom to make the JVM "reborn". Java is suffering from middle age (or old age, depending on who you ask), so making nontrivial changes to the JVM specification is generally met with stiff resistance. But I believe this resistance comes either from folks who don't realize the stakes involved, or from folks with their own bigotry and biases, or perhaps simply from rank-and-file pragmatists who don't want to or can't invest the resources necessary to make their implementation of the "JVM" more useful to the relative few of us currently interested in next-generation JVM language support. And of course, there were lots of folks who believed without a doubt that the Titanic could never sink, and that any additional resources spent ensuring it didn't would be wasted. It's time to learn from such mistakes. Yes, we can and will use MLVM as a proving ground for these features, and yes, I'm sure Sun and other players will continue to use careful, measured steps to evolve the JVM and the Java language. These are both appropriate directions. But we must never shut the door on the future by claiming that either the JVM or the Java language are "done". There's three words for an entity that can no longer change: "it's dead, Jim". Java is not dead...and I'll be damned if I'm going to let it or the JVM die.
As always my opinions are my own, and may or may not reflect the positions of Sun Microsystems or its partners. But I know for a fact there are many other like-minded individuals that feel the same way.

I'm looking forward to day two of Lang.NET 2008, and to hearing from all of you out there. Comments, questions, flames and praise are welcome. But actions speak louder than words, and the time for action is now.

Update: Here's a link to Tom's and my RubyConf slides. I'll add a link to the Lang.NET slides when they're posted.

Thursday, January 03, 2008

Jython's Back, Baby!

Well it's been a long hard slog for the Jython team. Once thought dead, they seemed to pick up steam more and more over the past year. They got out a long awaited 2.2 release and started to work on many missing 2.3, 2.4, and 2.5 features. They started tackling new parsers and compilers. They spoke with me and other folks at Sun about the future of dynamic languages on the JVM. And above all, they've been working their asses off, having sprints and codefests and hacking away on every corner of Jython.

It looks like their hard work is paying off. Jim Baker reports that they have successfully run Django on Jython. They're using bleeding edge revisions of both Jython and Django, and there's a bit more work to be done, but hey, lots of folks thought it would be impossible. Haven't we heard that somewhere before?

Hats off to the whole Jython team and their obviously excellent community for making this a reality. This is just the beginning!

Saturday, December 22, 2007

Project Idea: Native JSON Gem for JRuby

json-lib 2.2 has been released by Andres Almiray, and he boldly claims that "it now has become the most complete JSON parsing library". It supports Java, Groovy, and JRuby quite well, and Andres has an extensive set of json-lib examples to get you started.

But this post is about a project idea for anyone interested in tackling it: make a JRuby-compatible version of the fast JSON library from Florian Frank using Andres's json-lib (or another library of your choosing, if it's as "complete" as json-lib).

JSON is being used more and more for remoting, not just for AJAX but for RESTful APIs as well. Rails 2.0 supports REST APIs using either XML or JSON (or YAML, I believe), and many shops are settling on JSON.

So there's a possibility this could be a bottleneck for JRuby unless we have a fast native JSON library. There's json-pure, also from Florian Frank, which is a pure Ruby version of the library...but that will never compete with a fast version in C or Java.

Anyone up to the challenge? JRUBY-1767: JRuby needs a fast JSON library

Update: Marcin tells me that Florian's JSON library uses Ragel, which may be an easier path to getting it up and running on JRuby. Hpricot and Mongrel also use Ragel, and both already have JRuby versions.

Thursday, December 20, 2007

A Few Easy (?) JRuby 1.1 Bugs

When I posted on a few easy JRuby 1.0.2 bugs a couple months ago, I got a great response. So since I'm doing a bug tracker review today for JRuby 1.1, here's a new list for intrepid developers.

DST bug in second form of Time.local - This doesn't seem like it should be hard to correct, provided we can figure out what the correct behavior is. We've recently switched JRuby to Joda Time from Java's Calendar, so this one may have improved or gotten easier to resolve.

Installing beta RubyGems fails - This actually applies to the current release of RubyGems, 1.0.0...I tried to do the update and got this nasty error, different from the one originally reported. Someone more knowledgeable about RubyGems internals could probably make quick work of this.

weakref.rb could (should?) be implemented in Java - So I already went ahead and did this, because it was a trivial bit of work (and perhaps a good piece of code to look at if you want to see how to easily write extensions to JRuby). But we're still missing any sort of weakref tests or specs. My preference would be for you to add specs to Rubinius's suite, which will someday soon graduate to a top-level standard test suite of its own. But at any rate, a nice set of tests/specs for weakref would do good for everyone.

AnnotationFormatError when running trunk jruby-complete --command irb - This is actually a problem with JarJar Links, for which we're currently using a patched version. Problem is...even the current jarjar-1.0rc6 still breaks when I incorporate it into the build. These sorts of bugs can drive a person mad, so if anyone wants to suss out what's wrong with jarjar here, we'd really appreciate it.

Failure in ruby_test Numeric#truncate test - The first of a few trivial numeric failures from Daniel Berger's "ruby_test" suite. Pretty easy to jump into, I would expect.

Failure in ruby_test Numeric#to_int test - And another, perhaps the same sort of issue.

IO.select does not work properly with timeout - This one would involve digging into JRuby IO code a bit, but for someone who knows IO reasonably well it may not be difficult. The primary issue is that while almost all other types of IO in JRuby use NIO channels, stdio does not. So basically, you can't do typical non-blocking IO operations against stdin, stdout, or stderr. Think you can tackle it?

Iconv character set option //translit is not supported - The Iconv library, used by MRI for character encoding/decoding/transcoding, supports more transliteration options than we've been able to emulate with Java's Charset classes. Do you know of a way to support Iconv-like transliteration in Java?

jirb exits on ESC followed by any arrow key followed by return - this is actually probably a Jline bug, but maybe one that's easily fixed?

bfts test_file_test failures - The remaining failures here seem to mostly be because we don't have UNIXServer implemented (since UNIX domain sockets aren't supported on the JVM). However, since JRuby ships with JNA, it could be possible for someone familiar with UNIX socket C APIs to wire up a nice implementation for us. And for that matter, it would be useful for just about anyone who wants to use UNIX sockets from Java.

Process module does not implement some methods - Again, here's a case where it probably would be easy to use JNA to add some missing POSIX/libc functions to JRuby.

Implement win32ole library using one of the available Java-COM bridges - This one already has a start! Some fella named Rui Lopes started implementing the Win32OLE Ruby library using Jacob, a Java/COM bridge. I'd love to see this get completed, since Ruby's DSL capabilities map really nicely to COM/ActiveX objects. (It's also complicated by the fact that most of the JRuby core team develops on Macs.)

allow "/" as absolute path in Windows - this is the second oldest JRuby 1.1-flagged bug, and from reviewing the comments we still aren't sure the correct way to make it work. It can't be this hard, can it?

IO CRLF compatibility with cruby on Windows - Another Windows-related bug that really should be brought to a close. Koichiro Ohba started looking into it some time ago, but we haven't heard from him recently. This is the oldest JRuby 1.1 bug, numbered JRUBY-61, and is actually the second-oldest open bug overall. Can't someone help figure this bloody thing out?

Saturday, December 08, 2007

Upcoming Events: Dec 2007, Jan/Feb 2008

JavaPolis 2007 - Antwerp, Belgium - December 10-14 - Sounds like a great event this year, with claims of over 3200 registrations so far. I'll be sharing the JRuby/NetBeans tutorial with Brian Leonard on the 10th and the JRuby/Rails talk with Ola Bini on the 12th. Outside of that, I'll probably be hacking in the main area. Come say hi.

Microsoft Lang.NET Symposium - Redmond, Washington - January 28-30 - I'll be there to get ideas about building a language platform, sharing war stories with fellow language implementers, and probably contributing a bit to John Rose's talk on the Multi-Language VM project. Oughta be a fun time...though it feels a bit weird making my first trip to Microsoft.

acts_as_conference - Orlando, Florida - February 8-9 - Robert Dempsey of Rails For All invited me to come talk about JRuby and Rails...though I'll be doing things a bit differently this time (not showing how to build a Rails app, but showing purely how JRuby improves the Rails ecosystem). Who could pass up a trip to Florida from Minnesota at this time of year?

FOSDEM 2008 - Brussels, Belgium - February 23-24 - FOSDEM invited me to present on the OSS languages track. I've got some great ideas for how to tackle this one. Given that it's an OSS conference, I think it's finally time to show how JRuby has evolved in the past three years from a slow, partial interpreter and runtime to the fastest Ruby 1.8-compatible implementation around. It's been a hell of a ride, and it's gotta qualify as an OSS success story.

Outside these four events, I've had invitations for plenty others (I could probably just do conferences...but how would I ever get anything done?) so I'm sure there will be more to come. You can also count on JavaOne in San Francisco this spring, Ruby Kaigi in Tokyo this summer, RubyConf Europe in Prague some time between April and July, and maybe RailsConf 2008 in Portland (though there's a good chance I won't be presenting).

Friday, December 07, 2007

Groovy 1.5 Released!

The Groovy team has kicked out their second major production release, Groovy 1.5...and skipped straight from 1.0 to 1.5. Why? Perhaps because they added generics, enums, static imports, annotations, fully dynamic metaclasses, improved performance, ... and much more. I think the move to 1.5 was certainly warranted, and we've been considering making the next JRuby release 1.5 for the same reasons.

Congratulations to the Groovy team! I'm looking forward to seeing 1.6 and 2.0 in the future!

OpenJDK Migration to Mercurial is Complete!

I'm really excited about this one. Kelly O'Hair reports that OpenJDK source has been fully migrated to Mercurial! This means that daily development on OpenJDK (eventually to produce Java 7 and other great things) will happen on the same repository that you, dear reader, can access from home. And it's using Mercurial, one of the two big Distributed SCM apps, so you can pull off an entire repo and maintain your own OpenJDK workshop at home. Excellent news...I now finally have an excuse to learn Hg, and I can finally put in the effort to get OpenJDK building with the knowledge that I'll safely be able to "pull" changes as they happen. Thank you to the OpenJDK migration team!

See also Kelly O'Hair's sun.com blog for articles on the OpenJDK Mercurial layout and how to work with it.

Wednesday, December 05, 2007

Groovy in Ruby: Implement Interface with a Map

Some of you may know I participate in the Groovy community as well. I'm hoping to start contributing some development time to the Groovy codebase, but for now I've mostly been monitoring their progress. One thing the Groovy team has more experience with is integrating with Java.

Now if you ask the Groovy team, they'll make some claim like "it's all Java objects" or "Groovy integrates seamlessly with Java", but neither of those claims is entirely true. Groovy does integrate extremely well with Java, but it's because of a number of features they've added over time to make it so...many of them not directly part of the Groovy language but features of their core libraries and portions of their runtime.

Since Ruby and Groovy seem to be the two most popular (or noisiest) non-Java JVM languages these days, I thought I'd start a series of posts showing how to add Groovy features missing from Ruby to JRuby. But there's a catch: I'll use only Ruby code to do this, and what I show will work on any unmodified JRuby release. That's the beauty of Ruby: the language is so flexible and fluid, you can implement many features from other languages without ever modifying the implementation.

First up, Groovy's ability to implement an interface from a Map.

1. impl = [
2.   i: 10,
3.   hasNext: { impl.i > 0 },
4.   next: { impl.i-- },
5. ]
6. iter = impl as Iterator
7. while ( iter.hasNext() )
8.   println iter.next()
Ok, this is Groovy code. The brackety thing assigned to 'impl' shows Groovy's literal Map syntax (a Hash to you Rubyists). Instead of providing literal strings for the keys, Groovy automatically turns whatever token is in the key position into a Java String. So 'i' becomes a String key referencing 10, 'hasNext' becomes a String key referencing a block of code that checks if impl.i is greater than zero, and so on.

The magic comes on line 6, where the newly-constructed Map is coerced into a java.util.Iterator implementation. The resulting object can then be passed to other code that expects Iterator, such as the while loop on lines 7 and 8, and the values from the Map will be used as the code for the implemented methods.

To be honest, I find this feature a bit weird. In JRuby, you can implement a given interface on any class, add methods to that class at will, and get most of this functionality without ever touching a Hash object. But it's pretty simple to implement this in JRuby:
1. module InvokableHash
2.   def as(java_ifc)
3.     java_ifc.impl {|name, *args| self[name].call(*args)}
4.   end
5. end
Here we have one of Ruby's wonderful modules, which I appreciate more each day. This InvokableHash module provides only a single method 'as', which accepts a Java interface type and produces an implementation of that type whose methods are implemented by the hash's values, looked up by method name. That's really all there is to it. So by reopening the Hash class, we gain this functionality:
1. class Hash
2.   include InvokableHash
3. end
And we're done! Let's see the fruits of our labor in action:
1. impl = {
2.   :i => 10,
3.   :hasNext => proc { impl[:i] > 0 },
4.   :next => proc { impl[:i] -= 1 }
5. }
6. iter = impl.as java.util.Iterator
7. while (iter.hasNext)
8.   puts iter.next
9. end
Our final Ruby code looks roughly like the Groovy code. On lines 1 through 5 we construct a literal Hash. Notice that instead of automatically turning identifier tokens into Strings, Ruby uses the exact object you specify for the key, and so here we use Ruby Symbols as our hash keys (they're roughly like interned Strings, and highly recommended for hash keys). On line 6, we coerce our Hash into an Iterator instance (and we could have imported Iterator above to avoid the long name). And then lines 7 through 9 use the new Iterator impl in exactly the same way as the Groovy code.

You've gotta love a language this flexible, especially with JRuby's magic Java integration features to back it up.

Thursday, November 29, 2007

Coming Soon

$ jruby -J-Djruby.compat.version=ruby1_9 -v
ruby 1.9.1 (2007-11-29 rev 4842) [i386-jruby1.1b1]

Wednesday, November 28, 2007

Tab Sweep November 28

WiiHear - Streaming audio for the Wii. Very cool. It was a little spotty for me, but your results may be better. And in related news: Super Mario Galaxy is the best Mario ever. I'll post a review later.

Easy(?) JRuby bugs from Rubinius specs - Vladimir Sizikov has been contributing new/fixed specs to Rubinius while running all the specs against JRuby. He's reported several new bugs as a result, which could potentially be easy API edge cases. Give them a shot. I'll try to post an "easy bug round-up" soon too.

File IO in Different Languages - A quick comparison of (very) basic file I/O in TCL, Python, PHP, Ruby, Groovy, Scala, and JavaScript. The poster then goes on to compare startup times for the Java implementations of these languages, showing an area where JRuby has some issues (slowest startup time, at around 1.6s). We know about the problem, and hopefully NailGun plus other improvements in the future will help. Largely, it's a JVM issue; classes are slow to load and verify, and we create literally hundreds of tiny classes.

Minneapolis - I live in Richfield, an "urban suburb" of Minneapolis (often dubbed Minneapolis Junior), but I generally just say I'm from Minneapolis, since most people recognize it. The Wikipedia article on Minneapolis is excellent (as is the one on the Twin Cities area). A few choice facts:

  • Minneapolis has more per-capita theater seats than any US city other than New York
  • The Twin Cities is one of the warmest places in Minnesota, with an average annual temperature of 45.4 °F. Monthly average daily high temperatures range from 21.9 °F (-5.6 °C) in January to 83.3 °F (28.5 °C) in July; the average daily minimum temperatures for the two months are 4.3 °F (-15.4 °C) and 63.0 °F (17 °C) respectively. So it gets cold in the winter and hot in the summer, but neither extreme is unbearable, and it's nice to have four seasons.
  • Every home in Minneapolis is no more than six blocks away from a wooded park. In the summer, Minneapolis from the air looks like a bunch of skyscrapers surrounded by a bed of trees.
It's a nice place to live. I'd be hard pressed to find somewhere better.

Oniguruma Regular Expression Syntax version 5.6 - This is the library Marcin ported. I believe all of this is supported in Joni. Of course we wrap it so normal Ruby regular expressions work exactly as they would under Ruby 1.8, but Marcin will release Joni as a standalone library too.

Dr. Nic Installs Mingle with Capistrano - A nice walkthrough.

JavaPolis 2007 - I will be attending and co-presenting a JRuby/NetBeans tutorial with Brian Leonard and a JRuby/Rails session with Ola Bini. Only two weeks and I'll be in the land of beer, chocolate, and diamonds.

acts_as_conference 2008 - I will be attending and presenting something JRubyish and Railsy. Perhaps my last Rails presentation before handing such things off to people who actually know Rails? This conference also wins my vote for "most cumbersome title".

RailsConf 2008 CFP - I haven't decided if I plan to present anything this year. There are many, many other folks doing real-world JRuby on Rails work that could probably do a better job.

FOSDEM 2008 - I have been invited and will be attending. Back to the land of beer, chocolate and diamonds.

Tuesday, November 27, 2007

REXML Numbers With Joni

As Ola reported earlier today, we've merged Joni, Marcin Mielczynski's port of Oniguruma, to JRuby trunk. Here's the description from the Oniguruma home page:

Oniguruma is a regular expressions library.
The characteristics of this library is that different character encoding
for every regular expression object can be specified.

The benefit for us is avoiding the encode/decode we previously had to do for every regular expression match, since Ruby uses byte[]-based strings and all Java regular expression engines work with char[]. You can imagine the overhead all that array churn introduced.

After a series of basic optimizations, most of the key expressions we worried about were performing as well as or much better than they did under JRegex, so Ola went through with the conversion over the past couple of days. Marcin is continuing to work on various optimizations, but both Ola and I have been playing with the new code. And it's looking great.

You may remember I recently reported on how the regexp bottleneck impacted XML parsing with REXML. Here are the numbers run against JRuby immediately before merging Joni:
read content from stream, no DOM
3.362000 0.000000 3.362000 ( 3.362000)
1.232000 0.000000 1.232000 ( 1.232000)
0.887000 0.000000 0.887000 ( 0.887000)
1.009000 0.000000 1.009000 ( 1.010000)
0.801000 0.000000 0.801000 ( 0.801000)
read content once, no DOM
9.869000 0.000000 9.869000 ( 9.869000)
9.779000 0.000000 9.779000 ( 9.779000)
9.786000 0.000000 9.786000 ( 9.786000)
9.655000 0.000000 9.655000 ( 9.655000)
9.601000 0.000000 9.601000 ( 9.601000)
read content from stream, build DOM
1.368000 0.000000 1.368000 ( 1.368000)
1.297000 0.000000 1.297000 ( 1.297000)
1.192000 0.000000 1.192000 ( 1.192000)
1.131000 0.000000 1.131000 ( 1.131000)
0.812000 0.000000 0.812000 ( 0.812000)
read content once, build DOM
10.595000 0.000000 10.595000 ( 10.595000)
9.489000 0.000000 9.489000 ( 9.488000)
9.947000 0.000000 9.947000 ( 9.947000)
9.821000 0.000000 9.821000 ( 9.821000)
9.414000 0.000000 9.414000 ( 9.415000)
And here are the performance numbers today, with Joni:
read content from stream, no DOM
2.309000 0.000000 2.309000 ( 2.308000)
1.217000 0.000000 1.217000 ( 1.217000)
0.776000 0.000000 0.776000 ( 0.776000)
0.825000 0.000000 0.825000 ( 0.825000)
0.637000 0.000000 0.637000 ( 0.637000)
read content once, no DOM
0.370000 0.000000 0.370000 ( 0.369000)
0.415000 0.000000 0.415000 ( 0.415000)
0.288000 0.000000 0.288000 ( 0.288000)
0.260000 0.000000 0.260000 ( 0.260000)
0.254000 0.000000 0.254000 ( 0.254000)
read content from stream, build DOM
1.455000 0.000000 1.455000 ( 1.455000)
0.916000 0.000000 0.916000 ( 0.916000)
0.887000 0.000000 0.887000 ( 0.888000)
0.827000 0.000000 0.827000 ( 0.827000)
0.607000 0.000000 0.607000 ( 0.607000)
read content once, build DOM
0.630000 0.000000 0.630000 ( 0.630000)
0.664000 0.000000 0.664000 ( 0.664000)
0.680000 0.000000 0.680000 ( 0.680000)
0.553000 0.000000 0.553000 ( 0.553000)
0.650000 0.000000 0.650000 ( 0.650000)
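
For the curious, the four scenarios above boil down to REXML either firing stream events with no DOM or building a full document tree, over content read either from a stream or from a preloaded string. A minimal sketch of that shape (not the actual benchmark script; test.xml is a hypothetical document):

require 'benchmark'
require 'rexml/document'
require 'rexml/streamlistener'

# Stream parsing: events fire on the listener and no DOM is built.
class NullListener
  include REXML::StreamListener
end

xml = File.read('test.xml')

5.times { puts Benchmark.measure { REXML::Document.parse_stream(xml, NullListener.new) } }

# DOM parsing: the whole document tree is built in memory.
5.times { puts Benchmark.measure { REXML::Document.new(xml) } }
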
Marcin's being modest about the work, but we're all absolutely amazed by it.

So finally the last really gigantic performance bottleneck in JRuby is gone, and it appears that JRuby's slow regexp era has come to a close. Next targets: the remaining issues with IO and Java integration performance.

Java 6 Port for OS X (Tiger and Leopard)

I just stumbled across this little gem today:

Landon Fuller's JDK 6 Port for OS X

I don't know who Landon Fuller is, but I find it incredibly impressive that he's managed to get the base JDK 6 ported to OS X and working. Talk about showing the value of an open-source JDK...Landon Fuller FTW. Apple, are you hiring? Perhaps this guy can kick the Apple JDK process in the ass.

So naturally when I'm confronted with a final JDK 6 release for OS X one thing immediately springs to mind: performance.

We'd always suspected that the early preview version of JDK 6 on OS X was not showing us the true awesome performance we could expect from a final version.

We were right.

So I'll give you two sets of numbers, one that's specious and unreliable and the other that's a more real-world test.

fib numbers

Yes, good old fib. A constant in benchmarking. It shows practically nothing, and yet people use it to demonstrate perf. And in the case of Ruby 1.9, they've specifically optimized for integer-math-heavy benchmarks like this.
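
For reference, the benchmark is just the usual doubly-recursive fib wrapped in a timing loop, something along these lines (the argument and iteration count here are arbitrary, not necessarily what produced the numbers below):

require 'benchmark'

def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

10.times do
  puts Benchmark.measure { fib(30) }
end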

Ruby 1.9:

  0.400000   0.000000   0.400000 (  0.413737)
  0.420000   0.010000   0.430000 (  0.421622)
  0.400000   0.000000   0.400000 (  0.411591)
  0.410000   0.000000   0.410000 (  0.411593)
  0.400000   0.000000   0.400000 (  0.410080)
  0.410000   0.000000   0.410000 (  0.408836)
  0.400000   0.000000   0.400000 (  0.408572)
  0.410000   0.000000   0.410000 (  0.408114)
  0.400000   0.000000   0.400000 (  0.410374)
  0.400000   0.000000   0.400000 (  0.413096)

Very nice numbers, especially considering Ruby 1.8 benchmarks at about 1.7s on my system. Ruby 1.9 contains several optimizations for integer math, including the use of "tagged integers" for Fixnum values (saving object costs) and fast math opcodes in the 1.9 bytecode specification (avoiding method dispatch). JRuby does neither of these, representing Fixnums as normal Java objects wrapping a long value and dispatching normally for all numeric operations.

JRuby trunk:
  0.783000   0.000000   0.783000 (  0.783000)
  0.510000   0.000000   0.510000 (  0.510000)
  0.510000   0.000000   0.510000 (  0.510000)
  0.506000   0.000000   0.506000 (  0.506000)
  0.505000   0.000000   0.505000 (  0.504000)
  0.507000   0.000000   0.507000 (  0.507000)
  0.510000   0.000000   0.510000 (  0.510000)
  0.507000   0.000000   0.507000 (  0.507000)
  0.508000   0.000000   0.508000 (  0.508000)
  0.510000   0.000000   0.510000 (  0.510000)

This is improved from numbers in the 0.68s range under the Apple JDK 6 preview. Pretty damn hot, if you ask me. I love being able to sit back and do nothing while performance numbers improve. It's a nice change from 16-hour days.

Anyway, back to performance. JRuby also supports an experimental frameless execution mode that omits allocating and initializing per-call frame information. In Ruby, frames are used for such things as holding the current method visibility, the current "self", and the arguments and block passed to a method. But in many cases, it's safe to omit the frame entirely. I haven't got it running 100% safely in JRuby yet, and probably won't before 1.1 final comes out...but it's on the horizon. So then...numbers.

JRuby trunk, frameless execution:
  0.627000   0.000000   0.627000 (  0.627000)
  0.409000   0.000000   0.409000 (  0.409000)
  0.401000   0.000000   0.401000 (  0.401000)
  0.402000   0.000000   0.402000 (  0.402000)
  0.403000   0.000000   0.403000 (  0.403000)
  0.403000   0.000000   0.403000 (  0.403000)
  0.404000   0.000000   0.404000 (  0.405000)
  0.401000   0.000000   0.401000 (  0.401000)
  0.403000   0.000000   0.403000 (  0.403000)
  0.405000   0.000000   0.405000 (  0.405000)

Hello hello? What do we have here? JRuby actually executing fib faster than an optimized Ruby 1.9? Can it truly be?

Pardon my snarkiness, but we never thought we'd be able to match Ruby 1.9's integer math performance without seriously stripping down Fixnum and introducing fast math operations into the compiler. I guess we were wrong.

M. Ed Borasky's MatrixBenchmark

I like Borasky's matrix benchmark because it's a non-trivial piece of code, and it pulls in a Ruby standard library (matrix.rb) as well. It basically inverts a matrix of a particular size and multiplies the original by the inverse. I show numbers here for a 64x64 matrix, since it's long enough to show the true benefit of JRuby but short enough that I don't get bored waiting.
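
The shape of the benchmark is roughly this (a sketch, not Borasky's actual script): build an exact Hilbert matrix with Rational entries, invert it, and verify that the product with the original is the identity.

require 'matrix'
require 'rational'
require 'benchmark'

def hilbert(n)
  Matrix.rows((0...n).map { |i| (0...n).map { |j| Rational(1, i + j + 1) } })
end

n = 64
h = hilbert(n)
puts Benchmark.measure {
  identity = (h * h.inverse == Matrix.identity(n))
  puts "Hilbert matrix of dimension #{n} times its inverse = identity? #{identity}"
}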

Ruby 1.9:
Hilbert matrix of dimension 64 times its inverse = identity? true
21.630000 0.110000 21.740000 ( 21.879126)
JRuby trunk:
Hilbert matrix of dimension 64 times its inverse = identity? true
14.780000 0.000000 14.780000 ( 14.780000)

This is down from 16-17s under the Apple JDK 6 preview, and roughly 30% faster than Ruby 1.9.

So what have we learned today?
  • Sun's JDK 6 provides frigging awesome performance
  • Apple users are crippled without a JDK 6 port. Apple, I hope you're paying attention.
  • Landon Fuller is my hero of the week. I know Landon will just point at the excellent work to port JDK 6 to FreeBSD and OpenBSD...but give yourself some credit, you did what none of the other Leopard whiners did.
  • JRuby rocks
Note: You have to be a Java Research License licensee to legally download the binary or source versions of Landon's port. That or complain to Apple about some dude making a working port before they did. Landon mentions on his blog that he plans to contribute this work to OpenJDK soon...which would quickly result in a buildable GPLed JDK for OS X. Awesome.