Headius

Friday, March 21, 2008

More Fun With Duby

It's been...oh my, almost two weeks since I broke the news about Duby. Since then I attended PyCon and we managed to get JRuby 1.1 RC3 out the door, which is looking like it will become JRuby 1.1 final. But I've still been working on Duby in my spare time. It's kinda nice to have a different "spare time" project than JRuby for a while.

After my previous post, I continued to carry the compiler forward. I managed to get it compiling the following syntactic structures:

until and while loops
static method definitions and calls (using "def self.foo" syntax)
array types (which combined with static calls allowed creating a "main" method that worked!)
instance variables (using @foo to mean the "foo" field on a Java type)
type resolution from any Java type
imports (with "import foo.bar.Baz" syntax)
packages (using "module Foo" syntax, though that will probably change)
type inference across methods on the same type, including smarts for instance vs static

All very exciting stuff, and starting to get very close to a language that would be usable for simple algorithms. Except there was just one problem: every new feature I added made the codebase more difficult to understand.

Reloaded

I had just started to explore the next steps when I realized the codebase was becoming too hard to follow. After chasing around a few inference bugs I decided it was time to take a step back. I also hoped all along to eliminate the hard requirement to declare a return type, since that at least should be apparent from inspecting the method body...but it wouldn't be possible with the original design.

The initial Duby compiler was entirely made up of decorations on the existing Ruby AST. A CallNode, for example, was decorated with methods called "compile", "declared_type", "signature" and so on that were called in various sequences to first infer types and then produce bytecode. There were a few problems with this approach that became apparent as work continued:

The hypothetical Duby AST structure did not map 1:1 to the Ruby AST structure; it was richer, with type information, method signatures, and overloaded/hooked syntax. To support this in the existing AST, many layers of complexity had to be added to the compilation process: "is this call being used as a call or as a type declaration?"
The decorator methods were inextricably entwined with Java/JVM-specific details, be they type names, bytecodes, or just obviously Java-centric syntax. This not only made it more difficult to evolve the language, but made it impossible for the language to live anywhere but on JVM and be compiled by anything but JRuby.
My grand plans for the language were quickly exceeding what would be possible by simply decorating the AST.

The third bullet may spark some interest, so I'll explain very briefly (since it's still just forming in my head). As I thought more about how to map Ruby to the JVM, I realized that very few algorithms, very little syntax couldn't be made to map to *something* fast, static-typed, and conceptually the same. Mix-ins, for example, could easily be implemented like C# 3.0's extension methods. Blocks could be implemented behind the scenes as anonymous types of particular signatures. Even operator overloading could easily be mapped to appropriate methods on Numeric types, Collection types, and many others. The key to all this was having type inferencing and compiler layers that were not just flexible, but infinitely pluggable.

The Dream of Duby

The dream, as I see it, would be to use a Ruby-like syntax to wire up any arbitrary types using what looks generally like idiomatic Ruby code. For example, our friend fib(). The same fib() method I showed before, executing against primitive integer values, could execute against boxed Integer objects or BigInteger objects just by specifying the right plugin. So if within the context of a file, I declare that "fixnum" types are to be represented as BigInteger objects, and math operations against them are to call appropriate methods on BigInteger, so be it...that's what comes out the other side.

The term for this is, I believe, "specialization". In the case above, it's explicit specialization on a case-by-case basis. For many cases, this is all you need...you know the types you're dealing with ahead of time and can comfortably declare them or specify type mappings during compilation. But the more interesting side of this comes from general-purpose specialization in the form of parametric polymorphism...or in terms you might be more familiar with: generics.

I am certainly stepping beyond my current education and understanding here, but the pieces are starting to fit together for me. I and others like me have long been envious of languages like Haskell which infer types so incredibly well. And living between the Ruby and Java worlds, I've often felt like there had to be some middle ground that would satisfy both ends of the spectrum: a static, verified type system flexible enough to consume and specialize against any incoming types just by loading a plugin or recompiling, combined with the cleanest, most expressive (while still being one of the most readable) dynamic language syntaxes around. And so that's where I'd like Duby to fit.

The New Version

So where does it stand now?

Things have been moving fast. With JRuby 1.1 RC3 out of the way, I've taken some time to go back and rework the Duby pipeline.

Now, the only decoration on the Ruby AST is in the form of transformation logic to produce a richer Duby syntax tree. Literals are the same. The three call types have been unified into two: Call and FunctionalCall (against "self"). The various types of code bodies have been unified into a single Body construct. Method definition is represented through MethodDefinition and StaticMethodDefinition, both of which aggregate a "signature" element to represent declared or inferred signature information. The several looping constructs (excluding for loops, which are block-based iterations) have been unified into Loop. And so on. Not all the syntax supported by the Duby prototype has been represented in transformation, but it's not far off. And I'm taking a much more measured approach now.

The new AST structure affords a much more realistic typing and compilation pipeline. Each of the Duby AST nodes defines a single method "infer" which, when passed a Typer, will walk its children and infer as many types as it is able. Each child walks its own children, and so on, unifying and generalizing types as it goes (though the unification and generalization is stubbed out now to only allow exact matches...this will eventually be the responsibility of the back-end to handle). Simple code that calls fully-resolved target methods and has no unknown branches may completely resolve during this first pass.

In more complicated cases, each node that can't make an accurate determination about inferred type registers itself with the Typer in its "deferred" list. Once the initial inference cycle has run, all nodes in the AST will have either successfully inferred their result type or registered with the deferred list. At this point, you can either continue to pass ASTs through the Typer, or you can begin the resolution phase.

To start the resolution phase, the "resolve" method on the Typer is called, which attempts to iteratively resolve all deferred nodes in the AST. This resolution process in theory will loop until either the deferred list has been drained of all nodes (which will presumably then be 100% resolved), or until two successive resolution cycles fail to alter the list (perhaps because more information is needed or there are circular references for which the user must add hints). In general, this means that the deepest unresolved child nodes will fall out first. For example, suppose you have a method "foo" that calls a method "bar" before "bar" has been defined in the source:

def foo
  bar
end
def bar
  1
end

During the first inference cycle, bar will completely resolve but foo will be deferred. During the resolution phase, foo will now see that bar has been defined and resolved, and will itself resolve. Both return :fixnum (though the decision of what type ":fixnum" resolves to will be left to the compiler backend for a given system).
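To make the infer/defer/resolve cycle above a little more concrete, here is a minimal sketch in plain Ruby. It is not the Duby code: SketchTyper, MethodDef, CallNode and FixnumLit are names made up for this example, types are bare symbols, and the loop stops after a single cycle without progress rather than two.

```ruby
# Illustrative sketch only: SketchTyper, MethodDef, CallNode and FixnumLit are
# hypothetical names, not the classes in the Duby source tree.
class SketchTyper
  attr_reader :deferred, :method_types

  def initialize
    @deferred = []      # nodes that could not be typed yet
    @method_types = {}  # method name => inferred return type
  end

  def defer(node)
    @deferred << node unless @deferred.include?(node)
  end

  # Re-infer deferred nodes until the list drains or a cycle makes no progress.
  # Pass true to raise when nodes remain unresolved.
  def resolve(raise_on_failure = false)
    until @deferred.empty?
      pending, @deferred = @deferred, []
      pending.each { |node| node.infer(self) }  # unresolved nodes defer themselves again
      break if @deferred.size >= pending.size   # nothing new was learned
    end
    if raise_on_failure && !@deferred.empty?
      raise "Could not infer typing for #{@deferred.size} node(s)"
    end
    @deferred.empty?
  end
end

class FixnumLit
  def infer(_typer)
    :fixnum
  end
end

class CallNode
  attr_reader :type

  def initialize(name)
    @name = name
    @type = nil
  end

  def infer(typer)
    @type ||= typer.method_types[@name]
    typer.defer(self) unless @type
    @type
  end
end

class MethodDef
  attr_reader :type

  def initialize(name, body)
    @name = name
    @body = body
    @type = nil
  end

  def infer(typer)
    @type ||= @body.infer(typer)
    if @type
      typer.method_types[@name] = @type  # callers can now resolve against this
    else
      typer.defer(self)
    end
    @type
  end
end

typer = SketchTyper.new
foo = MethodDef.new(:foo, CallNode.new(:bar))
bar = MethodDef.new(:bar, FixnumLit.new)

[foo, bar].each { |method| method.infer(typer) }  # first inference pass
p foo.type     # nil (deferred: bar was not known yet)
p bar.type     # :fixnum
typer.resolve
p foo.type     # :fixnum
```

A method whose body is only a call to itself never leaves the deferred list in this sketch, so resolve returns false for it (or raises when called with true).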

Back to fib()

Here's our friend fib(), which serves as a nice simple example:

def fib(n)
  {n => :fixnum}

  if n < 2
    n
  else
    fib(n - 1) + fib(n - 2)
  end
end

The fib() method is actually fairly interesting here because it recurses. If this were a simple recursion, it would be impossible to determine what actual type fib returns without an explicit declaration, since no other information is available, and this would produce an error of the following variety:

~/NetBeansProjects/jruby ➔ cat recurse.rb
def recurse
  recurse
end
~/NetBeansProjects/jruby ➔ jruby lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb recurse.rb 
Could not infer typing for the following nodes:
  FunctionalCall(recurse) (child of MethodDefinition(recurse))
  MethodDefinition(recurse) (child of Script)
  Script (child of )

AST:
Script
 MethodDefinition(recurse)
  {:return=>Type(notype)}
  Arguments
  FunctionalCall(recurse)

Here we see a simple "recurse" method that just calls itself, and as you'd expect type inference fails. Because the return value of "recurse" depends on knowing the return value of "recurse", resolution fails.

However in the case of fib(), we don't have a simple recursion, we have a conditional recursion. The default behavior for the Duby "Simple" typer is to assume that if one branch of an If can successfully resolve, that's the type to use (temporarily) as the value of the If (while still marking the if as unresolved, and unifying the two bodies later). And since the root of the fib() method only contains an If, it can successfully resolve. Let's try it:

~/NetBeansProjects/jruby ➔ jruby lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb fib.rb
Could not infer typing for the following nodes:
  Call(<) (child of Condition)
  Condition (child of If)
  Call(-) (child of FunctionalCall(fib))
  FunctionalCall(fib) (child of Call(+))
  Call(-) (child of FunctionalCall(fib))
  FunctionalCall(fib) (child of Call(+))
  Call(+) (child of If)
...

Ouch, what happened here? Actually it's pretty easy to understand...the calls to "<", "-", and "+" were unknown to the Typer, and so they could not be resolved. As a result, the If Condition could not resolve, nor could the body of the Else statement. This is not necessarily a fatal state, merely an incomplete one. The "resolve" method on Typer can be called with "true" to force an error to be raised, or with no arguments to just "do the best it can do". In this case, using the simple call to the typer, it raises and displays the error, but there's no reason that more information couldn't be added to the system to allow a subsequent resolution to proceed.
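The branch rule for If described earlier (take the type of whichever branch resolves, but keep the If marked unresolved until both branches agree) can be sketched in isolation. This is only an illustration under stated assumptions: infer_if is a hypothetical helper, types are bare symbols with nil meaning "not inferred yet", and unification is exact-match only.

```ruby
# Illustrative sketch only: infer_if is a hypothetical helper, not Duby's API.
# Types are symbols; nil means "not inferred yet".
# Returns [provisional_type, fully_resolved].
def infer_if(then_type, else_type)
  if then_type && else_type
    # Both branches known: this sketch only unifies exact matches.
    unless then_type == else_type
      raise TypeError, "branches disagree: #{then_type} vs #{else_type}"
    end
    [then_type, true]
  else
    # At most one branch known: use it for now, but stay unresolved.
    [then_type || else_type, false]
  end
end

method_types = {}

# fib: the "then" branch is the local n (:fixnum); the "else" branch needs
# fib's own return type, which is unknown on the first pass.
type, resolved = infer_if(:fixnum, method_types[:fib])
p [type, resolved]           # [:fixnum, false]
method_types[:fib] = type    # provisional return type for fib

# Second look: assuming + on two fixnums yields a fixnum, the else branch
# now has a type, and the two branches unify.
p infer_if(:fixnum, method_types[:fib])  # [:fixnum, true]
```

With neither branch known, as in the bare "recurse" example, there is nothing to borrow a type from and the result stays nil.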

Pluggable Inference

This is where having a pluggable engine starts to come in handy. Though the mechanism is currently pretty crude, there's already a basic ability to specify a plugin for method resolution. In order to enlist in method resolution, a plugin must define a "method_type" method that accepts a parent Typer, a target type, a method name, and a list of parameter types. If at any time method resolution fails in the Simple Typer, the plugin Typers will be called in turn. So in this case, I created a simple "Math" typer that's aware of a few simple operations with LHS and RHS of :fixnum. Let's try it:

~/NetBeansProjects/jruby ➔ jruby -rcompiler/duby/plugin/math lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb fib.rb

AST:
Script
 MethodDefinition(fib)
  {:return=>Type(fixnum), :n=>Type(fixnum)}
  Arguments
   RequiredArgument(n)
  Body
   Noop
   If
    Condition
     Call(<)
      Local(name = n, scope = MethodDefinition(fib))
      Fixnum(2)
    Local(name = n, scope = MethodDefinition(fib))
    Call(+)
     FunctionalCall(fib)
      Call(-)
       Local(name = n, scope = MethodDefinition(fib))
       Fixnum(1)
     FunctionalCall(fib)
      Call(-)
       Local(name = n, scope = MethodDefinition(fib))
       Fixnum(2)

Notice now that the return type of fib() has been correctly inferred to be :fixnum. Huzzah! We are successful!
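For readers curious what such a plugin might look like, here is a rough sketch. Only the method name "method_type" and its four arguments (parent typer, target type, method name, parameter types) come from the description above; the class names SketchMathTyper and SketchSimpleTyper, the learn_method_type helper, and the use of bare symbols for types are assumptions made for this example. It covers just the three operators that appear in the fib example.

```ruby
# Illustrative sketch only: class names and learn_method_type are hypothetical.
class SketchMathTyper
  RESULT_TYPES = {
    "+" => :fixnum,
    "-" => :fixnum,
    "<" => :boolean
  }.freeze

  # Returns a result type, or nil when this plugin has no answer.
  def method_type(_typer, target_type, name, parameter_types)
    return nil unless target_type == :fixnum && parameter_types == [:fixnum]
    RESULT_TYPES[name]
  end
end

class SketchSimpleTyper
  def initialize(plugins = [])
    @known = {}        # [target_type, name, parameter_types] => result type
    @plugins = plugins
  end

  def learn_method_type(target_type, name, parameter_types, type)
    @known[[target_type, name, parameter_types]] = type
  end

  # Look in the typer's own table first, then ask each plugin in turn.
  def method_type(target_type, name, parameter_types)
    found = @known[[target_type, name, parameter_types]]
    return found if found
    @plugins.each do |plugin|
      type = plugin.method_type(self, target_type, name, parameter_types)
      return type if type
    end
    nil
  end
end

typer = SketchSimpleTyper.new([SketchMathTyper.new])
p typer.method_type(:fixnum, "<", [:fixnum])    # :boolean
p typer.method_type(:fixnum, "-", [:fixnum])    # :fixnum
p typer.method_type(:script, "fib", [:fixnum])  # nil (the math plugin knows nothing about fib)
typer.learn_method_type(:script, "fib", [:fixnum], :fixnum)
p typer.method_type(:script, "fib", [:fixnum])  # :fixnum
```

Returning nil for "no answer" lets the caller fall through to the next plugin or defer the node, which mirrors the "not found" then "Deferring inference" lines in the debugging output.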

Debugging

Along with work to make the system more pluggable and the code easier to follow, I've also been trying to provide useful debugging output. Man, I think making debugging output useful and readable is harder than writing a type inference engine in the first place. I must have spent a good hour just tweaking output so it didn't look totally heinous. And it's still not great, but it's definitely usable. Here, for your edification, is the debugging output from type inference on fib.rb:

* [Simple] Learned local type under MethodDefinition(fib) : n = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "<" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "<" Type(fixnum) on Type(fixnum) = Type(boolean)
* [AST] [Call] resolved!
* [AST] [Condition] resolved!
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Method type for "+"  on  not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "+"  on  not found
* [Simple] Deferring inference for Call(+)
* [Simple] Deferring inference for If
* [Simple] Learned method fib (Type(fixnum)) on Type(script) = Type(fixnum)
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "+" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "+" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] [Cycle 0]: Inferred type for Call(+): Type(fixnum)
* [AST] [If] resolved!
* [Simple] [Cycle 0]: Inferred type for If: Type(fixnum)
* [Simple] Inference cycle 0 resolved all types, exiting

What's Next

I should clarify a few things before getting back to work:

This codebase is mostly separate from, but heavily advised by the original Duby prototype. I learned a lot from that code, and it's still more functional from a compilation standpoint, but it's not really something I can evolve. The new codebase will probably be at the same level of functionality within a week or so.

Largely the more measured pace of this new codebase is because of two key goals.

I'd like to move the system ever toward full, general type inference when possible. That includes inferring method parameters as well as return types. That also means there will have to be a much more comprehensive type-inference engine than the current "simple" engine, but nothing about the current system will break forward compatibility with such work.

I'd also like to see the entire type inferencing pipeline and ideally most of the compilation pipeline entirely platform-independent. There has been a surprising amount of interest in Duby from folks desiring to target LLVM, C, x86 ASM, Parrot, and others. Sadly, without a realistic codebase to work from--one which isolates typing logic from the underlying platform--none of that work has moved forward (though it's almost frightening how quickly people pounced on the idea). So the new system is almost entirely independent of Java, the JVM, and JRuby. Ideally the only pieces you'd need to reimplement (or plug in) for a given platform would be the parser (producing a Duby AST through whatever mechanism you choose) and the type mapper/compiler for a given system. The parser would probably be the more difficult, but since the language syntax is "basically Ruby" and designed to support full AOT compilation you can use any Ruby parser to produce a Ruby AST and then transform it in the same way I transform JRuby's AST. The compiler will mostly be a matter of mapping Duby syntactic structures and inferred generic types to code sequences and native types on your platform of choice.

Over the weekend I'll probably try to absorb the available literature on type inference, to learn what I'm doing wrong and what I could be doing better...but I think my "common sense" approach seems to be working out pretty well so far. We shall see. Suggestions for papers, ideally papers designed for mortals, are welcome.

So there you have it...the Duby update several of you have been asking for. Satisfied for now? :)

(BTW, the code is all available in the JRuby repository, under lib/ruby/site_ruby/1.8/compiler/duby. The new stuff is largely under the ast/ subdir and in the "transform.rb" and "typer2.rb" files. The tests of interest are in test/compiler/duby/test_ast.rb and test_typer.rb)

Monday, March 17, 2008

Another GSoC Idea

We were just discussing GSoC a bit, and another idea occurred to me:

Survey existing language implementations and how they're solving similar problems like POSIX, code generation/compilation, parsing, and so on, and work with project leads to pull out common solutions into reusable APIs.

This would be a huge help to all such language projects, since we all really want to work together but we're generally swamped with fixes and whatnot for our own projects. It would even be possible to narrow the focus a bit, such as to take on POSIX and use Tom Enebo's current library in JRuby as a base to start building out complete JVM POSIX support for all languages. If you're interested in helping JVM languages succeed, this would be a great way to help them all at once.

Monday, March 10, 2008

Duby: A Type-Inferred Ruby-Like JVM Language

It's been one of those nights, my friends! An outstanding night!

All day Sunday I had the stupids. It was probably related to a few beers on Saturday night, or the couple glasses of red wine before that. Whatever it was, I didn't trust myself to work on any JRuby bugs for fear of introducing problems if my brain clicked off mid-stream. So I started playing around with my old bytecode generating DSL from a few months back. Then things got interesting.

We've long wanted to have a "Ruby-like" language, probably a subset of Ruby syntax, that we could compile to solid, fast, idiomatic JVM bytecode. Not a compiler for Ruby, with all the bells and whistles that make Ruby both difficult to support and inefficient to use for implementing itself. A real subset language that produces clean, tight JVM bytecode on par with what you'd get from compiled Java. But better, because it still looks and feels mostly like Ruby.

So I wrote one! And I used my bytecode library too!

Let's say we want to implement our good friend fib.

class Foo
  def fib(n)
    if (n < 2)
      n
    else
      fib(n - 2) + fib(n - 1)
    end
  end
end

This is normal Ruby code. Given a Fixnum input, it calculates the appropriate Fibonacci number and returns it. It's slow in Ruby for a few reasons:

In JRuby, it uses a boxed integer value. Matz's Ruby and Rubinius use tagged integers to improve performance, and we rely on the JVM to optimize as much as it can (which turns out to be a *lot*). But it's still way slower than using primitives directly.
The comparison operations, integer math operations, and fib operations are all dynamic invocations. So there's at least a bit of method lookup cost, and then a bunch of abstraction cost. You can reduce it, but you can't eliminate it.
There are many Ruby features that influence performance negatively even when you're not using them. It's very difficult, for example, to optimally store local variables when the local scope can be captured at any time. So either you rely on tricks, or you store local variables on the heap and deal with them being slow.

When working with a statically-typed language, you can eliminate all of this. In Java, for example, you have both object types and primitive types; primitive operations are extremely fast and eventually JIT down to machine-code equivalents; and the feature set is suitably narrow to allow current JVMs to do amazing levels of optimization.

But of course Java has its problems too. For one, it does very little guessing or "inferring" of types for you, which means you generally have to declare them all over the place. On local variables, on parameters, on return types. C# 3.0 aims to correct this by adding type inference all over, but then there's still other syntactic hassle using any C-like language: curly braces, semicolons, and other gratuitous syntax that make up a lot of visual noise.

Wouldn't it be nice if we could take the code above, add some type inference logic, and turn it into a fast, static-typed language?

class Foo
  def fib(n)
    {n => java.lang.Integer::TYPE, :return => java.lang.Integer::TYPE}
    if (n < 2)
      n
    else
      fib(n - 2) + fib(n - 1)
    end
  end
end

And there it is! This is the same code as before, but now it's been decorated with a little type declaration block (in the form of a Ruby hash/map) immediately preceding the body of the method. The type decl describes that the 'n' argument is to be mapped to a primitive int, and the method itself will return a primitive int (and yes, I know those could be inferred too...it shall be soon). The rest of the method just works like you'd expect, except that it's all primitive operations, chosen based on the inferred types. For the bold, here's the javap disassembly output from the compiler:

Compiled from "superfib.rb"
public class Foo extends java.lang.Object{
public int fib(int);
  Code:
   0: iload_1
   1: ldc #10; //int 2
   3: if_icmpge 10
   6: iload_1
   7: goto 27
   10: aload_0
   11: iload_1
   12: ldc #10; //int 2
   14: isub
   15: invokevirtual #12; //Method fib:(I)I
   18: aload_0
   19: iload_1
   20: ldc #13; //int 1
   22: isub
   23: invokevirtual #12; //Method fib:(I)I
   26: iadd
   27: ireturn

public Foo();
  Code:
   0: aload_0
   1: invokespecial #15; //Method java/lang/Object."<init>":()V
   4: return

}

A few items to point out:

A default constructor is generated, as you'd expect in Java. This will be expected to also recognize "def initialize" constructors. I haven't decided if I'll allow overloading or not.
Notice the type signature for fib and all the type signatures for the calls it makes are inferred to the correct types.
Notice all the comparison and arithmetic operations are compiled to the correct bytecodes (iadd, isub, if_icmpge and so on).

And the performance is what you'd hope for:

$ jruby -rjava -e "t = Time.now; 5.times {Java::Foo.new.fib(35)}; p Time.now - t"
0.681
$ jruby -rnormalfib -e "t = Time.now; 5.times {Foo.new.fib(35)}; p Time.now - t"
27.851

Here's another example, with some string operations thrown in:

class Foo
 def bar
   {:return => java.lang.String}
 
   'here'
 end
end

class Foo
 # reopening classes works in the same file only (for now)
 def baz(a)
   {a => java.lang.String}
 
   b = "foo"
   a = a + bar + b
   puts a
 end
end

It works, of course:

$ jruby -rjava -e "Java::Foo.new.baz('Type inference is fun')"
Type inference is funherefoo

And once again, the disassembled output:

Compiled from "stringthing.rb"
public class Foo extends java.lang.Object{
public java.lang.String bar();
 Code:
  0: ldc #13; //String here
  2: areturn

public void baz(java.lang.String);
 Code:
  0: ldc #15; //String foo
  2: astore_2
  3: aload_1
  4: checkcast #17; //class java/lang/String
  7: aload_0
  8: invokevirtual #19; //Method bar:()Ljava/lang/String;
  11: invokevirtual #23; //Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
  14: checkcast #17; //class java/lang/String
  17: aload_2
  18: invokevirtual #23; //Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
  21: astore_1
  22: getstatic #29; //Field java/lang/System.out:Ljava/io/PrintStream;
  25: aload_1
  26: invokevirtual #34; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
  29: return

public Foo();
 Code:
  0: aload_0
  1: invokespecial #36; //Method java/lang/Object."<init>":()V
  4: return

}

Notice here that the + operation was detected as acting on two strings, so it was compiled to call String#concat rather than try to do a numeric operation. These sorts of mappings are simple to add, and since there's type information everywhere it's also easy to come up with cool ways to map Ruby syntax to Java types.
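One way to picture this kind of mapping is a lookup keyed on the operand types and the operator. The table below is only a sketch: the instruction names are the ones visible in the javap listings above, but OPERATOR_TABLE and instruction_for are hypothetical names and this is not how the Duby compiler is actually organized.

```ruby
# Illustrative sketch only: OPERATOR_TABLE and instruction_for are hypothetical.
# The instruction names are those shown in the javap output.
OPERATOR_TABLE = {
  [:int, "+", :int]       => "iadd",
  [:int, "-", :int]       => "isub",
  [:string, "+", :string] => "invokevirtual java/lang/String.concat"
}.freeze

# Pick an instruction from the inferred operand types, failing loudly when
# no mapping has been registered.
def instruction_for(lhs_type, operator, rhs_type)
  OPERATOR_TABLE.fetch([lhs_type, operator, rhs_type]) do
    raise ArgumentError, "no mapping for #{lhs_type} #{operator} #{rhs_type}"
  end
end

p instruction_for(:int, "+", :int)        # "iadd"
p instruction_for(:string, "+", :string)  # "invokevirtual java/lang/String.concat"
```

The same "+" in the source picks a different instruction purely from the inferred types, and adding a new mapping is a one-line change to the table.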

My working name for this is going to be Duby, pronounced "Doobie", after Duke plus Ruby. Duby! It has a nice ring to it. It may be subject to change, but that's what we'll go with for the moment.

Currently, Duby supports only the features you see here. It's a very limited subset of Ruby right now, and the subset doesn't support all Java primitive types, for example, so there's a lot of blanks to be filled. It also doesn't support static (class) methods, doesn't wire up "initialize" methods, doesn't support packages (for namespacing) or imports (to shrink type names) or superclasses or interfaces or enums or generics or what have you. But it's already functional for simple code, as you see here, so I think it's a great start for 10 hours of work. The rest will come, as needed and as time is available.

What are my plans for it? Well many of you may know Rubinius, an effort to reimplement Ruby in Ruby, modeled after the design of many Smalltalk VMs. Well, in order to make JRuby more approachable to Ruby developers, Duby seems like the best way to go. They'll be able to write mostly idiomatic Ruby code and know it will both perform at Java speeds and provide compile-time checking that all is wired up correctly. So we can avoid the overhead of "turtles all the way down" by just teaching one turtle how to speak JVM bytecode and building on that.

I also hope this will lead to lots of new ideas about how to implement languages for the JVM. It's certainly attractive to language users to be able to contribute to language implementations using the language in question, and if we can come up with compilers and sub-languages like Duby it could make the JVM more approachable to a wide range of developers.

Oh, and for those of you curious: the Duby compiler is written mostly in Ruby too.

Saturday, March 08, 2008

RubyInline for JRuby? Easy!

With JRuby 1.1 approaching, performance looking great, and general Ruby compatibility better than it's ever been, we've started to branch out into more libraries and applications. So a couple days ago, I thought I'd make good on a promise to myself and have a look at getting RubyInline working on JRuby.

RubyInline is a library by Ryan "zenspider" Davis which allows you to embed snippets of C code into your Ruby scripts. RubyInline does a minimal parse of that source, and based on the function signature you provide it wires it to the containing class as a Ruby method entry point and performs appropriate entry and exit type conversion to whatever C types you happen to use. It's particularly useful if you have a small algorithm you need to run fast, and you'd like to run it in a somewhat faster language by "throwing work over" to it.

Here's the example of RubyInline from Ryan's page:

class MyTest

  def factorial(n)
    f = 1
    n.downto(2) { |x| f *= x }
    f
  end

  inline do |builder|
    builder.c "
    long factorial_c(int max) {
      int i=max, result=1;
      while (i >= 2) { result *= i--; }
      return result;
    }"
  end
end

The interesting bit is the builder.c call, one of several functions on the C builder. Others allow you to add arbitrary preamble code, imports, and "bare" methods (with no type conversion) among other things. And as you'd expect, performance is greatly improved by writing some algorithms in C instead of Ruby.

Naturally, we want to have the same thing work in Java, and ideally use the same API and the same RubyInline plumbing for the rest of it. So then, I present to you java_inline, a RubyInline builder for JRuby!

This represents about four hours of work, and although it doesn't yet have a complete complement of tests and could use some "robustification", it's already working in about 100 lines of code. It's made far easier in JRuby than in C Ruby because we already have a full-featured Java integration layer to handle the argument mapping.

Here's a sample java_inline script to show what it looks like, similar to the fastmath.rb sample provided with RubyInline.

So how does it work? Well the Ruby side of things is largely the same as RubyInline's C builder...parse signature, compose the code together and compile it, and bind the method. It wires directly into the RubyInline pipeline, so all you need to do is install RubyInline, require the java_inline.rb script and you're all set. On the Java side, it's using the Java Compiler API, provided in Java 6 implementations. (OT: This has to be the worst-designed API I've ever seen. Go see for yourself. It's cruel and unusual. I won't dwell on it.) So yes, this will only work on Java 6. Deal with it...or submit a patch to get it working on Java 5 as well :)

It's not released in any official form yet, but I'll probably try to wire up a gem or something. I have to make sure I'm dotting my eyes and crossing my tees when I release stuff, even if it's only 100loc. But the repository is obviously public, so play with it, submit patches and improvements, and let me know if you'd like to use it or help work on it more. I'm also interested in suggestions for other libraries you'd like to see special JRuby support for, so pass that along too.

Thursday, March 06, 2008

PyCon, Euruko, and Scotland on Rails

Upcoming event round-up!

Next weekend, Tom Enebo, Nick Sieger and I will be at PyCon, checking out all the pythony goodness and hooking up with the Jython guys for some hacking.
Tom and I will travel to Prague for Euruko 2008, the European RubyConf. We'll present JRuby, unsurprisingly, and hopefully meet up with more European Rubyists we haven't talked to before.
After Euruko, Tom and I will be in Edinburgh for Scotland on Rails. We're supposed to do a 90-minute talk about JRuby on Rails. We'll try to make it entertaining and informative.

Monday, March 03, 2008

Welcome Pythonistas to Sun!

Today we can finally announce another exciting language-related event: we've hired Frank Wierzbicki of the Jython project and Ted Leung of OSAF (and a bunch of other stuff) to help with the Python story at Sun. Hooray!

Frank Wierzbicki has been tirelessly plodding away on Jython for years, since the early days. Even when Jython was considered by many to be "dead", Frank kept the project moving forward as best he could. Now he's been a major part of Jython's revitalization...a new release last year and rapid work to get compatibility up to current-Python levels prove that.

Ted Leung has long been involved with the Open Source Applications Foundation and the Apache Software Foundation, and was one of the original developers of the Xerces Java XML parser. I don't know Ted personally, but I'm excited to be working with him, especially in light of some of his previous roles.

So what does this mean for Sun? Well, it means we're serious about improving support for Python on Sun platforms. Jython is a big part of the story, since they have many challenges similar to JRuby, but a bunch of new ones as well. So we'll be looking to share key libraries and subsystems as much as possible, and we'll start looking at Jython as another driver for future JVM and platform improvement. On the non-Java side of the world, it means we'll ramp up support for Python itself. Sun has already settled on Mercurial for source control for all our OSS projects, and the package management system being worked up for Indiana is written in Python as well. But there's more we need to do.

Please help me in congratulating Frank and Ted on their new roles at Sun. It's going to be another interesting year for polyglots :)

Wednesday, February 27, 2008

University of Tokyo and JRuby Team to Collaborate on MVM

Finally the full announcement is available. We're pretty excited about this one...

The University of Tokyo and Sun Microsystems Commence Joint Research Projects on High Performance Computing and Web-based Programming Languages

Improving Efficiency and Simplicity of Fortress, Ruby and JRuby Languages is Current Focus for Unique Academic-Corporate Collaborative Model

TOKYO and SANTA CLARA, Calif. -- February 27, 2008 -- The University of Tokyo and Sun Microsystems, Inc. (Nasdaq: JAVA) today announced two joint research projects that will focus on High-Performance Computing (HPC) and Web-based programming languages.

...

The two research topics are:
Development of a library based on skeletal parallel programming in Fortress
Implementation of a multiple virtual machine (MVM) environment on Ruby and JRuby

Some of you knew about this from Tim Bray's rather cryptic announcement at RubyConf. The basic idea here is that we on the JRuby team and folks in Professor Ikuo Takeuchi's group at the University of Tokyo will be cooperating to come up with an MVM specification and implementations for the Ruby 1.9 and JRuby codebases. Tom and I have had brief discussions with Professor Koichi Sasada (ko1, creator of YARV) about this already, and there are mailing lists and wikis already running. I've also made sure that Evan Phoenix, of Rubinius, is included in the discussions. Like JRuby, Rubinius largely has MVM features implemented, but the eventual form they'll take and API they'll present is up in the air.

This should be an exciting collaboration between Sun and U Tokyo, as well as between the three most functional next-generation Ruby implementations. I'll certainly keep you posted on all MVM developments as they come up, and I welcome any comments or suggestions you might have.

Tuesday, February 26, 2008

JRuby in Google Summer of Code 2008

Greetings!

Google's Summer of Code for 2008 is starting up again, and we're looking for folks to submit proposals. The JRuby Community or Sun or me or someone will sign up as a mentoring organization, so start thinking about or discussing possible proposals. Here's a few ideas to get you started (and hopefully, there are many other ideas out there not on this list):

Writing a whole suite of specs for current Java integration behavior...and then expanding that suite to include behavior we want to add. This work would go hand-in-hand with a rework of Java Integration we're likely to start soon.
Collect and help round out all the profiling/debugging/etc tools for JRuby and get them to final releasable states. There's several projects in the works, but most of them are stalled and folks need better debugging. This project could also include simply working with existing IDEs (NetBeans, etc) to figure out how to get them to debug compiled Ruby code correctly (currently they won't step into .rb files).
Continue work on an interface-compatible RMagick port. There's already RMagickJR which has a lot of work into it, but nobody's had time to continue it. A working RMagick gem would ease migration for lots of folks using Ruby.
Putting together a definitive set of fine and coarse-grained benchmarks for Rails. JRuby on most benchmarks has been faster than Ruby 1.8.6...and yet higher-performance Rails has been elusive. We need better benchmarks and better visibility into core Rails. Bonus work: help nail down what's slower about JRuby.
Survey all existing JRuby extensions and put together an official public API based on core JRuby methods they're using. This would help us reduce the hassle of migrating extensions across JRuby versions.

Please, anyone else who has ideas, feel free to post them here or on the mailing list for discussion. And if you have a proposal, go ahead and mail the JRuby dev list directly.

Update: Martin Probst added this idea in the comments:

Another idea might be to "fix" RDoc, whatever that means. That's not really JRuby centric, but still a very worthwhile task, I think.

A documentation system that allows easy doc writing (Wiki-alike) and provides a better view on the actual functionality you can find in a certain instance would be really helpful. Plus a decent search feature.

And that makes me think of another idea:

Add RDoc comments to all the JRuby versions of Ruby core methods, and get RI/RDocs generating as part of JRuby distribution builds. Then we could ship RI and have it work correctly. Ola Bini has already started some of the work, creating an RDoc annotation we can add to our Ruby-bound methods.

Update 2: sgwong in the comments suggested implementing the win32ole library for JRuby. This would also be an excellent contribution, since there's already libraries like Jacob to take some of the pain out of it, and it would be great to have it working on JRuby. And again, this makes me think of a few additional options:

Implement a full, compatible set of Apple Cocoa bindings for JRuby. You could use JNA, and I believe there's already Cocoa bindings for Java you could reuse as well, but I'm not familiar with them.
Complete implementation of the DL library (Ruby's stdlib for programmatic loading and calling native dynamic libraries) and/or Rubinius's FFI (same thing, with a somewhat tidier interface). Here too there's lots of help: I've already partially implemented DL using JNA, and it wouldn't be hard to finish it and/or implement Rubinius's FFI. And implementing Rubinius's FFI would have the added benefit of allowing JRuby to share some of Rubinius's library wrappers.

Sunday, February 24, 2008

Ruby's Thread#raise, Thread#kill, timeout.rb, and net/protocol.rb libraries are broken

I'm taking a break from some bug fixing to bring you this public service announcement:

Ruby's Thread#raise, Thread#kill, and the timeout.rb standard library based on them are inherently broken and should not be used for any purpose. And by extension, net/protocol.rb and all the net/* libraries that use timeout.rb are also currently broken (but they can be fixed).

I will explain, starting with timeout.rb. You see, timeout.rb allows you to specify that a given block of code should only run for a certain amount of time. If it runs longer, an error is raised. If it completes before the timeout, all is well.

Sounds innocuous enough, right? Well, it's not. Here's the code:

def timeout(sec, exception=Error)
  return yield if sec == nil or sec.zero?
  raise ThreadError, "timeout within critical session"\
                                      if Thread.critical
  begin
    x = Thread.current
    y = Thread.start {
      sleep sec
      x.raise exception, "execution expired" if x.alive?
    }
    yield sec
    #    return true
  ensure
    y.kill if y and y.alive?
  end
end

So you call timeout with a number of seconds, an optional exception type to raise, and the code to execute. A new thread is spun up (for every invocation, I might add) and set to sleep for the specified number of seconds, while the main/calling thread yields to your block of code.

If the code completes before the timeout thread wakes up, all is well. The timeout thread is killed and the result of the provided block of code is returned.

If the timeout thread wakes up before the block has completed, it tells the main/calling thread to raise a new instance of the specified exception type.
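To make the two outcomes concrete, here is a minimal usage sketch. It calls the standard library's Timeout.timeout (current Rubies implement it differently from the 1.8 code quoted above, but the calling convention is the same), and the run_with_limit helper is a hypothetical name introduced only for illustration.

```ruby
require 'timeout'

# Hypothetical helper: run the block under a time limit and report
# :timed_out instead of letting Timeout::Error propagate.
def run_with_limit(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

# The block finishes well inside the limit, so its value is returned.
fast = run_with_limit(1) { :done }

# The block outlives the limit, so the calling thread is interrupted.
slow = run_with_limit(0.1) { sleep 1; :done }
```

In the first call the block's result comes back untouched; in the second, the exception is raised into the calling thread while it is still sleeping.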

All this is reasonable if we assume that Thread#kill and Thread#raise are safe. Unfortunately, they're provably unsafe.

Here's a reduced example:


main = Thread.current
timer = Thread.new { sleep 5; main.raise }
begin
  lock(some_resource)
  do_some_work
ensure
  timer.kill
  unlock_some_resource
end

Here we have a simple timeout case. A new thread is spun up to wait for five seconds. A resource is acquired (in this case a lock) and some work is performed. When the work has completed, the timer is killed and the resource is unlocked.

The problem, however, is that you can't guarantee when the timer might fire.

In general, with multithreaded applications, you have to assume that cross-thread events can happen at any point in the program. So we can start by listing a number of places where the timer's raise call might actually fire in the main part of the code, between "begin" and "ensure".

It could fire before the lock is acquired
It could fire while the lock is being acquired (potentially corrupting whatever resource is being locked, but we'll ignore that for the moment)
It could fire after the lock is acquired but before the work has started
It could fire while the work is happening (presumably the desired effect of the timeout, but it also suffers from potential data corruption issues)
It could fire immediately after the work completes but before entering the ensure block

Other than the data corruption issues (which are very real concerns) none of these is particularly dangerous. We could even assume that the lock is safe and the work being done with the resource is perfectly synchronized and impossible to corrupt. Whatever. The bad news is what happens in the ensure block.

If we assume we've gotten through the main body of code without incident, we now enter the ensure. The main thread is about to kill the timeout thread, when BAM, the raise call fires. Now we're in a bit of a predicament. We're already outside the protected body, so the exception aborts the ensure block and the remaining code in it never runs. What's worse, we're about to leave a resource locked that may never get unlocked, so even if we can gracefully handle the timeout error somewhere else, we're in trouble.
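Here is a small sketch that forces that unlucky ordering so you can watch it happen. Everything in it is made up for the demonstration: the DemoTimeout class, the ensure_race_demo method, and the sleep inside the ensure, which stands in for the scheduler pausing the main thread at exactly the wrong moment.

```ruby
# Hypothetical exception class used only for this demonstration.
class DemoTimeout < StandardError; end

def ensure_race_demo
  log = []
  main = Thread.current
  # The timer fires after 50ms, by which time main is inside the ensure.
  timer = Thread.new { sleep 0.05; main.raise(DemoTimeout, "execution expired") }
  begin
    begin
      log << :locked   # stand-in for acquiring the resource
      log << :worked   # stand-in for the work itself
    ensure
      sleep 0.3        # simulated bad luck: main is paused inside the ensure
      timer.kill       # never reached
      log << :unlocked # never reached, so the resource stays locked
    end
  rescue DemoTimeout
    log << :timed_out
  end
  timer.join
  log
end

events = ensure_race_demo
```

The log ends up with :locked, :worked and :timed_out but no :unlocked: the cleanup was skipped even though it sat in an ensure block.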

What if we move the timer kill inside the protected body, to ensure we kill the timer before proceeding to the lock release?


main = Thread.current
timer = Thread.new { sleep 5; main.raise }
begin
  lock(some_resource)
  do_some_work
  timer.kill
ensure
  unlock_some_resource
end

Now we have to deal with the flip side of the coin: if the work we're performing raises an exception, we won't kill the timer thread, and all hell breaks loose. Specifically, after our lock has been released and we've bubbled the exception somewhere up into the call stack, BAM, the raise call fires. Now it's anybody's guess what we've screwed up in our system. And the same situation applies to any thread you might want to call Thread#raise or Thread#kill against: you can't make any guarantees about what damage you'll do.

There's a good FAQ in the Java SE platform docs entitled Why Are Thread.stop, Thread.suspend, Thread.resume and Runtime.runFinalizersOnExit Deprecated?, which covers this phenomenon in more detail. You see, in the early days of Java, it had all these same operations: killing a thread with Thread.stop(), causing a thread to raise an arbitrary exception with Thread.stop(Throwable), and a few others for suspending and resuming threads. But they were a mistake, and they have long since been deprecated in the Java SE platform.

It is provably unsafe to be able to kill a target thread or cause it to raise an exception arbitrarily.

So what about net/protocol.rb? Here's the relevant code, used to fill the read buffer from the protocol's socket:

def rbuf_fill
  timeout(@read_timeout) {
    @rbuf << @io.sysread(8196)
  }
end

This is from JRuby's copy; MRI performs a read of 1024 bytes at a time (spinning up a thread for each) and Rubinius has both a size modification and a timeout.rb modification to use a single timeout thread. But the problems with timeout remain; specifically, if you have any code that uses net/protocol (like net/http, which I'm guessing a few of you rely on) you have a chance that a timeout error might be raised in the middle of your code. You are all rescuing Timeout::Error when you use net/http to read from a URL, right? What, you aren't? Well you better go add that to every piece of code you have that calls net/http, and while you're at it add it to every other library in the world that uses net/http. And then you can move on to the other net/* protocols and third-party libraries that use timeout.rb. Here's a quick list to get you started.
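For the record, this is roughly what "rescuing Timeout::Error" looks like at a call site. It's only a sketch: body_or_nil is a hypothetical helper name, and the net/http call is left in a comment because it needs a network to run.

```ruby
require 'timeout'

# Hypothetical helper: run a request block and turn a timeout into nil.
# The class is named explicitly because a bare `rescue` only catches
# StandardError, and Timeout::Error has not been one on every Ruby version.
def body_or_nil
  yield
rescue Timeout::Error
  nil
end

# Intended usage (assumes `require 'net/http'` and a reachable host):
#   body = body_or_nil { Net::HTTP.get(URI("http://example.com/")) }
```

Note that this only handles the error once it arrives. It does nothing about where in your code the exception gets raised, which is the real problem.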

Ok, so you don't want to do all that. What are your options? Here's a few suggestions to help you on your way:

Although you don't have to take my word for it, eventually you're going to have to accept the truth. Thread#kill, Thread#raise, timeout.rb, net/protocol.rb all suffer from these problems. net/protocol.rb could be fixed to use nonblocking IO (select), as could I suspect most of the other libraries, but there is no safe way to use Thread#kill or Thread#raise. Let me repeat that: there is no safe way to use Thread#kill or Thread#raise.
Start lobbying the various Ruby implementations to eliminate Thread#kill and Thread#raise (and while you're at it, eliminate Thread#critical= as well, since it's probably the single largest thing preventing Ruby from being both concurrently-threaded and high-performance).
Start lobbying the library and application maintainers using Thread#kill, Thread#raise, and timeout.rb to stop.
Stop using them yourself.
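The select-based approach mentioned in the first suggestion might look something like this. It is a sketch under my own assumptions, not a patch to net/protocol.rb: read_with_timeout is a hypothetical name, and returning nil on timeout is just one possible convention.

```ruby
# Hypothetical helper: wait up to timeout_sec for io to become readable,
# then read whatever is available (at most max_bytes). No second thread,
# no Thread#raise: the waiting thread decides for itself when to give up.
def read_with_timeout(io, max_bytes, timeout_sec)
  ready, = IO.select([io], nil, nil, timeout_sec)
  return nil if ready.nil?   # IO.select returns nil when the wait times out
  io.readpartial(max_bytes)  # raises EOFError if the other end has closed
end

reader, writer = IO.pipe
nothing_yet = read_with_timeout(reader, 1024, 0.05)  # nothing written yet
writer.write("hello")
writer.flush
got = read_with_timeout(reader, 1024, 0.05)
```

Because the timeout is handled by the thread doing the read, it can only "fire" at one well-defined point: the return from IO.select.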

Now I want this post to be productive, so I'll give a brief overview of how to avoid using these seductively powerful and inherently unsafe features:

If you want to be able to kill a thread, write its code such that it periodically checks whether it should terminate. That allows the thread to safely clean up resources and prepare to "die itself".
If you need to time out an operation, you're going to have to find a different way to do it. With IO, it's pretty easy. Just look up IO#select and learn to love it. With arbitrary code and libraries, you may be able to successfully lobby the authors to add timeout options, or you may be able to hook into them yourself. If you can't do either of those...well, you're SOL. Welcome to threads. I hope others will post suggestions in the comments.
If you think you can ignore this, think again. Eventually you're going to get bitten in the ass, be it from a long-running transaction that gets corrupted due to a timeout error or a filesystem operation that wipes out some critical file. You're not going to escape this one, so we should start trying to fix it now.
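The first suggestion above, a thread that periodically checks whether it should terminate, can be sketched like this. The StoppableWorker class and its method names are hypothetical, and the sleep stands in for one unit of real work.

```ruby
# Hypothetical worker that shuts itself down when asked, instead of
# being killed from outside with Thread#kill.
class StoppableWorker
  def initialize
    @lock = Mutex.new
    @stop = false
    @cleaned_up = false
  end

  def start
    @thread = Thread.new do
      begin
        until stop_requested?
          sleep 0.01 # stand-in for one unit of work
        end
      ensure
        # Runs on the worker's own terms, at a point it chose.
        @cleaned_up = true
      end
    end
    self
  end

  # Ask the worker to finish, then wait until it has.
  def stop
    @lock.synchronize { @stop = true }
    @thread.join
  end

  def cleaned_up?
    @cleaned_up
  end

  def alive?
    @thread.alive?
  end

  private

  def stop_requested?
    @lock.synchronize { @stop }
  end
end

worker = StoppableWorker.new.start
sleep 0.05
worker.stop
```

The worker only ever exits between units of work, so its cleanup always runs and nothing is left half-done.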

I'm hoping this will start a discussion eventually leading to these features being deprecated and removed from use. I believe Ruby's viability in an increasingly parallel computing world depends on getting threading and concurrency right, and Thread#raise, Thread#kill, and the timeout.rb library need to go away.

Thursday, February 21, 2008

FOSDEM

It's 5am in Brussels and I'm awake. That can only mean one thing. Time to blog!

This weekend I'm presenting JRuby at FOSDEM, the "Free and Open Source Developers European Meeting." I was invited to talk, and who could pass up an invitation to Belgium?

I've been trying to shake things up with my recent JRuby talks. At Lang.NET, I obviously dug into technical details a lot more because it was a crowd that could stomach such things. At acts_as_conference, I threw out the old JRuby on Rails demo and focused only on things that make JRuby on Rails different (better) than classic Ruby on Rails development, such as improved performance, easier deployment, and access to a world of Java libraries. And FOSDEM will include all new content as well: I'm spending Friday putting together a talk that discusses JRuby capabilities and status while simultaneously illustrating the impact community developers have had on the project. After all, it's an OSS conference, so I'll continue my recent trend and try to present something directly on-topic.

For those of you unable to attend, it turns out there will be a live video stream of some of the talks, including the whole Programming Languages track. I don't think I've ever been livestreamed before.

Saturday, February 16, 2008

JRuby RC2 Released; What's Next?

Today, Tom got the JRuby 1.1 RC2 release out. It's an astounding collection of performance improvements and compatibility fixes. Here's Tom's JRuby 1.1 RC2 announcement.

Let's recap a little bit:

There have been additional performance improvements in RC2 over RC1. Long story short, performance of most trivial numeric benchmarks approaches or exceeds Ruby 1.9, while most others are on par. So in general, we have started using Ruby 1.9 performance as our baseline.
Unusual for an RC cycle are the 260 bugs we fixed since RC1. Yes, we're bending the rules of RCs a bit, but if we can fix that many bugs in just over a month, who can really fault us for doing so? Nobody can question that JRuby is more compatible with Ruby 1.8.6 than any other Ruby implementation available.
We've also included a number of memory use improvements that should reduce the overall memory usage of a JRuby instance and perhaps more importantly reduce the memory cost of JIT compiled code (so-called "Perm Gen" space used by Ruby code that's been compiled to JVM bytecode). In quick measurements, an application running N instances of JRuby used roughly 1/Nth as much perm gen space.

So there's really two big messages that come along with this release. The first is that JRuby is rapidly becoming the Ruby implementation of choice if you want to make a large application as fast and as scalable as possible. The thousands of hours of time we've poured into compatibility, stability, and performance are really paying off. The second message comes from me to you:

If you haven't looked at JRuby, now's the time to do so.

All the raw materials are falling into place. JRuby is ready to be used and abused, and we're ready to take it to the next level for whatever kinds of applications you want to throw at it. And not only are we ready, it's our top priority for JRuby. We want to make JRuby work for you.

And that brings me to the "What's Next" portion of this post. Where do we go from here?

We've got a window for additional improvements and fixes before 1.1 final. That much is certain, and we want to fill that time with everything necessary to make JRuby 1.1 a success. So we need your help...find bugs, file reports, let us know about performance bottlenecks. And if you're able, help us field those reports by fixing issues and contributing patches.

But we also have a unique opportunity here. JRuby offers features not found in any other Ruby implementation, features we've only begun to utilize:

Inside a single JVM process, JRuby can be used to host any number of applications, scaling them to any number of concurrent users.

This one applies equally well to Rails and to other web frameworks rapidly gaining popularity like Merb. And new projects like the GlassFish gem are leading the way to simple, scalable, no-hassle hosting for Ruby web applications. But we're pretty resource-limited on the JRuby project. We've got two full-time committers and a handful of part-timers and after-hours contributors pouring their free time into helping out. For JRuby web app hosting to improve and meet your requirements, we're going to need your use cases, your experience, and your input. We're attempting to build the most scalable, best-performing Ruby web platform with JRuby, but we're doing it in true OSS style. No secrets, no hidden agendas. This is going to be your web platform, or our efforts are wasted. So what should it look like?

The GlassFish gem is the first step. It leverages some of the best features of the Java platform: high-speed asynchronous IO, native threading, built-in management and monitoring, and application namespace isolation (yes, even classloaders have a good side) to make Ruby web applications a push-button affair to deploy and scale. With one command, your app is production-ready. No mongrel packs to manage, no cluster of apps to monitor, and no WAR file relics to slow you down. "glassfish_rails myapp" and you're done; it's true one-step deployment. Unfortunately right now it only supports Rails. We want to make it not only a rock-solid and high-performance Rails server, but also a general-purpose "mod_ruby" for all Ruby web-development purposes. It's the right platform at the right time, and we're ready to take it to the next level. But we need you to try it out and let us know what it needs.

JRuby's performance regularly exceeds Ruby 1.8.6, and in many cases has started to exceed Ruby 1.9.

At this point I'm convinced JRuby will be able to claim the title of "fastest Ruby implementation", for some definition of "Ruby". And if we're not there yet, we will be soon. With most benchmarks meeting or exceeding Ruby 1.8.6 and many approaching or exceeding Ruby 1.9 we're feeling pretty good. What I've learned is that performance is important, but it's not the driving concern for most Ruby developers, especially those building web applications usually bounded by unrelated bottlenecks in IO or database access. But that's only what I've been able to gather from conversations with a few of you. Is there more we need to do? Should we keep going on performance, or are there other areas we should focus on? Do you have cases where JRuby's performance doesn't meet your needs?

JRuby's performance future is largely an open question right now. A stock JRuby install performs really well, "better enough" that many folks are using it for performance-sensitive apps already. We know there's bottlenecks, but we're solving them as they come up, and we're on the downward slope. Outside of bottlenecks, do we have room to grow? You bet we do. I've got prototype and "experimental" features already implemented and yet to be explored that will improve JRuby's performance even more. Of course there's always tradeoffs. You might have to accept a larger memory footprint, or you may have to turn off some edge-case features of Ruby that incur automatic performance handicaps. And some of the wildest improvements will depend on dynamic invocation support (hopefully in JDK 7) and a host of other features (almost certain to be available in the OpenJDK "Multi-language VM" subproject). But where performance improvements are needed, they're going to happen, and if I have any say they're going to happen in such a way that all other JVM languages can benefit as well. I'm looking to you folks to help us prioritize performance and point us in the right direction for your needs.

JRuby makes the JVM and the Java platform really shine, with excellent language performance and a "friendlier" face on top of all those libraries and all that JVM magic.

I think this is an area we've only just started to realize. Because JRuby is hosted on the JVM, we have access to the best JIT technology, the best GC technology, and the best collection of libraries ever combined into a single platform. Say what you will about Java. You may love the language or you may hate it...but it's becoming more and more obvious that the JVM is about a lot more than Java. JRuby is, to my knowledge, the only time a non-JVM language has been ported to the JVM and used for real-world, production deployments of a framework never designed with the JVM in mind. We've taken an unspecified single-implementation language (Matz's Ruby) and its key application framework (Rails) and delivered a hosting and deployment option that at least parallels the original and in many ways exceeds it. And this is only the first such case...the Jython guys now have Django working, and it's only a matter of time before Jython becomes a real-world usable platform for Python web development. And there's already work being done to make Merb--a framework inspired by Rails but without many of Rails' warts--run perfectly in JRuby. And it's all open source, from the JVM up. This is your future to create.

I think the next phase for JRuby will bring tighter, cleaner integration with the rest of the Java platform. Already you can call any Java-based library as though it were just another piece of Ruby code. But it's not as seamless as we'd like. Some issues, like choosing the right target method or coercing types appropriately, are issues all dynamic languages face calling into static-typed code. Groovy has various features we're likely to copy, features like explicit casting and coercing of types and limited static type declaration at the edges. Frameworks like the DLR from Microsoft have a similar approach, since they've been designed to make new languages "first class" from day one. We will work to find ways to solve these sorts of problems for all JVM languages at the same time we solve them for JRuby. But there's also a lot that needs to come from Ruby users. What can we do to make the JVM and all those libraries work for you?

I guess there's a simple bottom line to all this. JRuby is an OSS project driven mostly by community contributors (5 out of 8 committers working in their free time and hundreds of others contributing patches and bug reports), based on an OSS platform (not only OpenJDK, but a culture of Free software and open source that permeates the entire Java ecosystem), hosting OSS frameworks and libraries (Rails, Merb, and the host of other apps in the Ruby world). All this is meaningless without you and your applications. We're ready to pour our effort into making this platform work for you. Are you ready to help us?

Monday, January 28, 2008

Lang.NET 2008: Day 1 Thoughts

Yes friends, I'm at Microsoft's Lang.NET symposium this week. Does this strike you as a bit peculiar?

Lang.NET is Microsoft's event for folks interested in using and implementing "alternative" languages on the CLR. From the Lang.NET site itself:

Lang .NET 2008 Symposium is a forum for discussion on programming languages, managed execution environments, compilers, multi-language libraries, and integrated development environments. This conference provides an excellent opportunity for Programming Language Implementers and Researchers from both industry and academia to meet and share their knowledge, experience, and suggestions for future research and development in the area of programming languages

Of course .NET and the CLR aren't mentioned explicitly here, but being a Microsoft event I think it's reasonable to expect it to be .NET heavy. And that's certainly ok, since at least one of the key professed design goals of the CLR is to support multiple languages, and by multiple we'd like to think they mean more than VB or C#. Microsoft would understandably like to cultivate a culture of language implementation and experimentation on CLR, since history has shown repeatedly that programming language diversity is a must for any platform, system, or framework to be long-lived. This fact is not new to many of us, although there are certain companies that until recently refused to believe it. But I digress.

What is a long time Java user and short time JVM language implementer doing here? Why would I subject myself to the slings and arrows of a potentially unfriendly .NET crowd, or at least subject myself to the scalding apathy of a roomful of engineers with no interest in the JVM or my work on JRuby? Well, that's an excellent question.

Perhaps it's more entertaining to describe *what* I'm doing here before explaining *why* I'm doing it. The credit for the idea goes to Brian Goetz, a coworker of mine at Sun and currently an engineer working on JavaFX (though you probably know him better from his talks on Java performance or his book and talks on Java concurrency). Given that we're both working on JVM language-related projects, and we both would like to see the JVM and the Java platform continue to grow, evolve, and expand, he thought it might be useful for us two and John Rose to attend the conference to present and discuss the many enhancements we're considering to support non-Java languages. You may know John Rose from his posts on adding various wild features to the JVM...features that you, dear reader, and others like you may never have expected to enter the JVM in any form. That's how I know him, and from my many discussions with him about how to make JRuby perform well on current JVM versions. But we must look to the future, and attending Lang.NET is at least in part to help us validate that this is an appropriate future to pursue.

So today John Rose and I presented his Da Vinci Machine project to a largely .NET crowd, with John covering the theoretical aspects of what features we'd like to add to the JVM and how he plans to add them, and me covering the practical reasons why we'd like to add those features using JRuby as an example. Those of you who saw my presentation at RubyConf would have recognized my slides; I largely presented the same "JRuby design internals" content in more depth and with language implementers and enthusiasts in mind. Specifically I presented details about JRuby's parser, interpreter, core class implementations, extensions and POSIX support, and of course its compiler and related optimizations. The last area was obviously the most interesting for this crowd, but I'll explain that later.

So there's the primary "what" of the trip. And it was the primary "what" we three shared in common. But there were a few other "what"s I had in mind entering into this, and they also explain the "why":

I hoped to understand better how the DLR (and to a lesser extent the CLR) improve the language-implementing ecosystem on .NET.
I wanted to meet and talk with .NET language implementers about their strategies for various problems facing JRuby and similar JVM languages.
I wanted to meet my counterparts on the .NET side of the world, which is increasingly becoming a parallel universe to the JVM and the Java platform...at least as far as problem-solving strategies and future directions go.
I enjoy meeting and talking with smart people. There's a few of them here.

So in order to "yeggefy" my post a bit, I want to elaborate on the first and primary point in more detail, rather than give another yawner of a conference play-by-play.

Pain

It's worth prefacing this with a disclaimer: I am not a Microsoft fan. Perhaps it's like being burned too many times by a hot iron. My work on LiteStep, my enthusiasm for open source, my interest in collaborative, communal development processes, and my dark days learning Win32, MFC, and COM mean I'll probably never be particularly warm to Microsoft projects, technology, or ideals, regardless of how they may evolve over time. That's not to say Microsoft can't change or isn't changing...but as you probably know it's hard to make yourself touch a cold iron. Pain rewires our circuitry, and lingers on in our altered behavior forever. I have felt pain.

But I endeavor to rise above my own wiring and bigotry, which is why I force myself to help other JVM language implementations, force myself to help other Ruby implementations, and force myself to share freely and openly all I can even with members of otherwise competing worlds. In this sense, the pain that comes from cooperating with projects I might otherwise want to undermine, derail, or discount is antithetical to my belief in open doors and open processes. I must cure myself of this affliction, rewrite myself to not feel that pain.

And so I am trying to study the DLR, which brings me to the big point #1. I will appreciate any and all corrections to everything I state here.

Eyeing the DLR

The DLR is Microsoft's Dynamic Language Runtime, a set of libraries and tools designed to make it easier to implement dynamic languages atop the CLR. The DLR provides facilities for compiler and interpreter generation (via language-agnostic expression trees), fast dynamic invocation (via self-updating dynamic call sites), and cross-language method dispatch and type system support. It is, to my eyes, intended to be the "everytool" needed to implement dynamic languages for .NET.

The DLR has largely grown out of Jim Hugunin's work on IronPython, and aims to tackle all the same issues we've dealt with while working on JRuby. But it also provides more than that in its self-optimizing capabilities and its expression tree logic, two features that undoubtedly make it easier to produce CLR dynamic language implementations with acceptable to excellent performance characteristics.

Expression trees in the DLR are used as an intermediate representation of language semantics. Where the typical language implementation process has you parse to an AST and then either interpret that or perform the additional work to compile the AST to a target "assembly" language, expression trees are meant to be (mostly) the last step in your language implementation process (outside of implementing language-specific types and dealing with language-specific nuances). Up to the point of creating a parsed AST the two paths are largely the same. But on the DLR, instead of proceeding to compiler work, you further transform the AST into an expression tree that represents the language behavior in DLR terms. So if your language provides mechanisms to conditionally run a given expression (modifiers in Ruby terms), you might turn your one conditional expression AST node into equivalent test and execute DLR expression tree nodes. It appears the translation is almost always a widening translation, where the DLR hopes to provide a broad set of "microsemantics" that can be used to compose a broader range of language features. And of course the idea is that there's a quickly-flattening asymptotic curve to the number of expression tree nodes required to implement each new language. Whether that's the case is yet to be seen.
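
To make the "widening translation" concrete, here is a minimal sketch in Ruby. Everything in it is hypothetical: ModifierIf and Literal stand in for a language-specific AST, Constant and Conditional stand in for generic expression tree nodes, and lower and evaluate are illustrative helpers. None of these names come from the DLR, and the real node set is certainly richer.

```ruby
# Language-specific AST nodes (hypothetical): "<body> if <condition>" and a literal.
ModifierIf = Struct.new(:body, :condition)
Literal    = Struct.new(:value)

# Generic "expression tree" nodes (hypothetical; not the DLR's actual node set).
Constant    = Struct.new(:value)
Conditional = Struct.new(:test, :if_true, :if_false)

# Lower a language-specific node into generic nodes. One modifier node
# widens into a test node, an execute node, and an explicit "else nil".
def lower(node)
  case node
  when Literal
    Constant.new(node.value)
  when ModifierIf
    Conditional.new(lower(node.condition), lower(node.body), Constant.new(nil))
  else
    raise ArgumentError, "no lowering for #{node.class}"
  end
end

# A tiny evaluator that only understands the generic nodes.
def evaluate(expr)
  case expr
  when Constant
    expr.value
  when Conditional
    evaluate(expr.test) ? evaluate(expr.if_true) : evaluate(expr.if_false)
  else
    raise ArgumentError, "unknown expression #{expr.class}"
  end
end

tree = lower(ModifierIf.new(Literal.new("ran"), Literal.new(true)))
puts evaluate(tree)
```

The evaluator never sees ModifierIf. That is the appeal: anything that consumes the generic nodes (an interpreter, a compiler) only has to be written once.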

I must admit, I've pooh-poohed the expression tree idea in the past, mostly because I did not understand it and had not found a way to see beyond my personal biases. While a language-generic expression tree is certainly a very appealing goal, one that lowers the entry cost of language implementation considerably, it always seemed like a too-lofty ideal. Of course it's impossible to predict what features future languages might decide to add, so can we reasonably expect to provide a set of expression nodes that can represent those languages? Won't we be endlessly forced to extend that set of node types for each new language that comes along? And if to alleviate such pain we are given hooks to call back to custom language-specific code for cases where the DLR falls down, doesn't that eliminate the advantage we hoped to gain in the first place?

I'll make no secret about my skepticism here, but I've seen a couple things today that make me far more optimistic about the utility of language-agnostic expression trees.

The first was during Anders Hejlsberg's talk about C# 3.0. C# 3.0 adds a considerable slate of features, some of which I can appreciate (type inference, for example, would take 99% of the pain out of using generics; Anders called that out as a key reason for adding inference to C#) and others which I'm still dubious on (LINQ's additions to C#'s already large set of keywords). But his talk made one thing abundantly clear: these changes are little more than compiler magic, smoke and mirrors to produce syntactic sugar. And he admitted as much, showing that behind the scenes LINQ and inferred types and their ilk basically compile down to the same IL that would be produced if you wrote the equivalent code by hand, without the sugar. And that's particularly heartening to me, since it means the vast majority of these features could be added to Java or supported by JVM languages without modifying the JVM itself. Whether they're features we want to add is a separate discussion.

The second detail that made me better appreciate expression trees stems directly from the first: they're just objects, man. Although expression tree support is being wired directly into C# and VB.NET for at least the small subset of expressions LINQ encompasses, in general we're talking about nothing more than a graph of objects and a set of libraries and conventions that can utilize them. If it were reasonable for us to expose JRuby's AST as a "one true AST" to represent all languages, we'd be able to support a similar set of features, adding syntax to Java to generate Ruby ASTs at runtime and capabilities for manipulating, executing, and compiling those ASTs. Naturally, we'd never try to make an argument that the JRuby AST can even come close to approximating a set of low-level features "all languages" might need to support, but hopefully you see the parallel. So again, we're working at a level well above the VM itself. That means the power of expression trees could definitely be put into the hands of JVM language developers without any modifications to existing JVMs. And in the DLR's case, they're even considering the implications of allowing new language implementations to provide extensions to the set of expression tree nodes, so at least some of them recognize that it's all just objects too...and perhaps starting to recognize that nobody will ever build the perfect mouse trap.

So that brings me to my other key point of interest in the DLR: its mechanisms for optimizing performance at runtime.

The DLR DynamicSite is basically an invokable object that represents a given call at a given position in a program's source code. When you perform a dynamic invocation, the DynamicSite runs through a list of rules, checking each in turn until it finds a match. If it finds a match, it invokes the associated target code, which presumably will do things like coerce types, look up methods, and perform the actual work of the invocation. As more calls pass through the same call site, DynamicSite will (if necessary) repeatedly replace itself with new code (new IL in fact) that includes some subset of previous rules and targets (potentially all of them if they're all still valid or none of them if they're all now invalid) along with at least one new rule for the new set of circumstances brought about by this call. I've prototyped this exact same behavior in JRuby's call sites (on a more limited scale), but have not had the time to include them in JRuby proper just yet (and might never have to...more on that later). The idea is that as the program runs, new information becomes available that will either make old assumptions incorrect or make new optimizations possible. So the DynamicSite continually evolves, presumably toward some state of harmony or toward some upper limit (i.e. "I give up").
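
A rough sketch of that rule-list behavior in plain Ruby, for illustration only. SketchCallSite, its Rule struct, and the max_rules limit are all made-up names, not the DLR's API. The sketch keeps its rules in an array instead of regenerating IL, and it does not invalidate rules when a method is redefined, which a real implementation would have to handle.

```ruby
# Hypothetical self-updating call site: a list of (guard, target) rules
# that grows as new receiver classes show up at this call position.
class SketchCallSite
  Rule = Struct.new(:guard, :target)

  attr_reader :rules

  def initialize(method_name, max_rules = 4)
    @method_name = method_name
    @max_rules = max_rules
    @rules = []
  end

  def call(receiver, *args)
    # Check each rule in turn until one matches.
    rule = @rules.find { |r| r.guard.call(receiver) }
    return rule.target.call(receiver, *args) if rule

    # Upper limit reached ("I give up"): plain dynamic dispatch, no new rule.
    return receiver.public_send(@method_name, *args) if @rules.size >= @max_rules

    add_rule_for(receiver).target.call(receiver, *args)
  end

  private

  # Do the slow method lookup once for this class and remember the result.
  def add_rule_for(receiver)
    klass = receiver.class
    unbound = klass.instance_method(@method_name)
    rule = Rule.new(
      ->(obj) { obj.class.equal?(klass) },
      ->(obj, *args) { unbound.bind(obj).call(*args) }
    )
    @rules << rule
    rule
  end
end

site = SketchCallSite.new(:to_s)
site.call(42)    # no rule matches, so a rule for 42's class is added
site.call(:sym)  # a second rule is added for Symbol
site.call(7)     # the first rule matches and is reused
```

The guard here is a simple class identity check. The interesting engineering is in what the sketch leaves out: deciding when a rule has gone stale, and turning the rule list into code that the underlying VM can optimize.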

And there's a simple reason why the DLR must do this to get acceptable performance out of languages like Python and Ruby:

Because the CLR doesn't.

A Solid Base

In the JVM world, we've long been told about and only recently realized the benefits of dynamic optimization. The JVM has over time had varying capabilities to dynamically optimize and reoptimize code...and perhaps most importantly, to dynamically *deoptimize* code when necessary. Deoptimization is very exciting when dealing with performance concerns, since it means you can make much more aggressive optimizations--potentially unsafe optimizations in the uncertain future of an entire application--knowing you'll be able to fall back on a tried and true safe path later on. So you can do things like inlining entire call paths once you've seen the same path a few times. You can do things like omitting synchronization guards until it becomes obvious they're needed. And you can change the set of optimizations applied after the fact...in essence, you can safely be "wrong" and learn from your mistakes at runtime. This is the key reason why Java now surpasses C and C++ in specific handcrafted benchmarks and why it should eventually be able to exceed C and C++ in almost all benchmarks. And it's a key reason why we're able to get acceptable performance out of JRuby having done far less work than Microsoft has done on IronPython and the DLR.

The CLR, on the other hand, does not (yet) have the same level of dynamic optimization that the JVM does. Specifically, it does not currently support deoptimizing code that has already been JITed, and it does not always (occasionally? rarely?) take type information into account when deciding how to optimize code. (Those of you more familiar with CLR internals please update or correct me on how much optimization exists today and how deep those optimizations go.) So for dynamic languages to perform well, you simply have to do additional work. You don't have a static call path you can JIT early and trust to never change. You can't bind a call site to a specific method body, knowing verifiably that it will forever be the same code called from that site. And ultimately, that means you must implement those capabilities yourself, with profiling, self-updating call sites that build up rule and target sets based on gathered information.

So I think there's a reasonably simple answer now to folks asking me if I believe we need a "DLR" for JVM language implementers to target. The answer is that certain parts of the DLR are definitely attractive, and they may be worth implementing to ease the pain of JVM language implementation or at least reduce the barrier to entry. But there's a large set of DLR features we simply don't need to create if we can find a way to open up the JVM's existing ability to optimize dynamically...if we can punch a hole in the bulky, dynamic-unfriendly Java exterior. And that's exactly what we're doing (really, John is doing) with JSR-292 (dynamic invocation for JVM) and the MLVM.

Where then do we go from here? If the correct path is only partly down DLR street, what else is missing? Well, some of my Sun brethren may not like to hear these (and some others may love to hear them), but here's a short list of what I believe needs to happen to keep the JVM relevant into the future (at least if we take it as written that multi-language support is a must):

We need clear and consistent proof that a multi-language JVM is a priority, and not just at Sun Microsystems but also in the wider JVM and Java platform communities. The libraries and runtimes and support code necessary will grow out of that commitment. It's obviously not worth the effort to make all this happen if nobody cares and nobody will want to use it. But I don't think that's the case, and currently there's not a strong enough indication from any of the major community players that this cause is worth fighting for. At least in Sun's case, we've thrown down the gauntlet by open-sourcing Java and initiating projects like the JVM Languages mailing list, the JVM Language Runtime project, and the Multi-Language VM project. But we need you, and that brings me to my second point.
We're fighting an uphill battle for talented, excited resources in this domain, and without more money and time spent by the major players on multi-language support for the JVM (in collaboration, I might add), it's not going to happen. Again, whether that's a bad thing depends on whether you believe in a multi-language world. I do, and I'm willing to spend whatever of my time is necessary to make this happen. But I'm no patsy for political interests that might route us down the wrong path, and I'm definitely not going to be able to do this alone. Where do you stand?
We need the freedom to make the JVM "reborn". Java is suffering from middle age (or old age, depending on who you ask) so making nontrivial changes to the JVM specification is generally met with stiff resistance. But I believe this resistance comes either from folks who don't realize the stakes involved, or from folks with their own bigotry and biases, or perhaps simply from rank-and-file pragmatists who don't want to or can't invest the resources necessary to make their implementation of the "JVM" more useful to the relative few of us currently interested in next-generation JVM language support. And of course, there were lots of folks who believed without a doubt the Titanic could never sink, and that additional resources to ensure it would be wasted. It's time to learn from such mistakes. Yes, we can and will use MLVM as a proving ground for these features, and yes, I'm sure Sun and other players will continue to use careful, measured steps to evolve the JVM and the Java language. These are both appropriate directions. But we must never shut the door on the future by claiming that either the JVM or the Java language are "done". There's three words for an entity that can no longer change: "it's dead, Jim". Java is not dead...and I'll be damned if I'm going to let it or the JVM die.

As always my opinions are my own, and may or may not reflect the positions of Sun Microsystems or its partners. But I know for a fact there are many other like-minded individuals that feel the same way.

I'm looking forward to day two of Lang.NET 2008, and to hearing from all of you out there. Comments, questions, flames and praise are welcome. But actions speak louder than words, and the time for action is now.

Update: Here's a link to Tom's and my RubyConf slides. I'll add a link to the Lang.NET slides when they're posted.

Thursday, January 03, 2008

Jython's Back, Baby!

Well it's been a long hard slog for the Jython team. Once thought dead, they seemed to pick up steam more and more over the past year. They got out a long awaited 2.2 release and started to work on many missing 2.3, 2.4, and 2.5 features. They started tackling new parsers and compilers. They spoke with me and other folks at Sun about the future of dynamic languages on the JVM. And above all, they've been working their asses off, having sprints and codefests and hacking away on every corner of Jython.

It looks like their hard work is paying off. Jim Baker reports that they have successfully run Django on Jython. They're using bleeding edge revisions of both Jython and Django, and there's a bit more work to be done, but hey, lots of folks thought it would be impossible. Haven't we heard that somewhere before?

Hats off to the whole Jython team and their obviously excellent community for making this a reality. This is just the beginning!

Saturday, December 22, 2007

Project Idea: Native JSON Gem for JRuby

json-lib 2.2 has been released by Andres Almiray, and he boldly claims that "it now has become the most complete JSON parsing library". It supports Java, Groovy, and JRuby quite well, and Andres has an extensive set of json-lib examples to get you started.

But this post is about a project idea for anyone interested in tackling it: make a JRuby-compatible version of the fast JSON library from Florian Frank using Andres's json-lib (or another library of your choosing, if it's as "complete" as json-lib).

JSON is being used more and more for remoting, not just for AJAX but for RESTful APIs as well. Rails 2.0 supports REST APIs using either XML or JSON (or YAML, I believe), and many shops are settling on JSON.

So there's a possibility this could be a bottleneck for JRuby unless we have a fast native JSON library. There's json-pure, also from Florian Frank, which is a pure Ruby version of the library...but that will never compete with a fast version in C or Java.

Anyone up to the challenge? JRUBY-1767: JRuby needs a fast JSON library

Update: Marcin tells me that Florian's JSON library uses Ragel, which may be an easier path to getting it up and running on JRuby. Hpricot and Mongrel also use Ragel, and both already have JRuby versions.

Thursday, December 20, 2007

A Few Easy (?) JRuby 1.1 Bugs

When I posted on a few easy JRuby 1.0.2 bugs a couple months ago, I got a great response. So since I'm doing a bug tracker review today for JRuby 1.1, here's a new list for intrepid developers.

DST bug in second form of Time.local - This doesn't seem like it should be hard to correct, provided we can figure out what the correct behavior is. We've recently switched JRuby to Joda Time from Java's Calendar, so this one may have improved or gotten easier to resolve.

Installing beta RubyGems fails - This actually applies to the current release of RubyGems, 1.0.0...I tried to do the update and got this nasty error, different from the one originally reported. Someone more knowledgeable about RubyGems internals could probably make quick work of this.

weakref.rb could (should?) be implemented in Java - So I already went ahead and did this, because it was a trivial bit of work (and perhaps a good piece of code to look at if you want to see how to easily write extensions to JRuby). But what's still missing are any sort of weakref tests or specs. My preference would be for you to add specs to Rubinius's suite, which will someday soon graduate to a top-level standard test suite of its own. But at any rate, a nice set of tests/specs for weakref would do good for everyone.

AnnotationFormatError when running trunk jruby-complete --command irb - This is actually a JarJar Links problem that we're using a patched version for right now. Problem is...even the current jarjar-1.0rc6 still breaks when I incorporate it into the build. These sorts of bugs can drive a person mad, so if anyone wants to shag out what's wrong with jarjar here, we'd really appreciate it.

Failure in ruby_test Numeric#truncate test - The first of a few trivial numeric failures from Daniel Berger's "ruby_test" suite. Pretty easy to jump into, I would expect.

Failure in ruby_test Numeric#to_int test - And another, perhaps the same sort of issue.

IO.select does not work properly with timeout - This one would involve digging into JRuby IO code a bit, but for someone who knows IO reasonably well it may not be difficult. The primary issue is that while almost all other types of IO in JRuby use NIO channels, stdio does not. So basically, you can't do typical non-blocking IO operations against stdin, stdout, or stderr. Think you can tackle it?

Iconv character set option //translit is not supported - The Iconv library, used by MRI for character encoding/decoding/transcoding, supports more transliteration options than we've been able to emulate with Java's Charset classes. Do you know of a way to support Iconv-like transliteration in Java?

jirb exits on ESC followed by any arrow key followed by return - This is actually probably a JLine bug, but maybe one that's easily fixed?

bfts test_file_test failures - The remaining failures here seem to mostly be because we don't have UNIXServer implemented (since UNIX domain sockets aren't supported on the JVM). However, since JRuby ships with JNA, it could be possible for someone familiar with UNIX socket C APIs to wire up a nice implementation for us. And for that matter, it would be useful for just about anyone who wants to use UNIX sockets from Java.

Process module does not implement some methods - Again, here's a case where it probably would be easy to use JNA to add some missing POSIX/libc functions to JRuby.

Implement win32ole library using one of the available Java-COM bridges - This one already has a start! Some fella named Rui Lopes started implementing the Win32OLE Ruby library using Jacob, a Java/COM bridge. I'd love to see this get completed, since Ruby's DSL capabilities map really nicely to COM/ActiveX objects. (It's also complicated by the fact that most of the JRuby core team develops on Macs.)

allow "/" as absolute path in Windows - This is the second oldest JRuby 1.1-flagged bug, and from reviewing the comments we still aren't sure of the correct way to make it work. It can't be this hard, can it?

IO CRLF compatibility with cruby on Windows - Another Windows-related bug that really should be brought to a close. Koichiro Ohba started looking into it some time ago, but we haven't heard from him recently. This is the oldest JRuby 1.1 bug, numbered JRUBY-61, and is actually the second-oldest open bug overall. Can't someone help figure this bloody thing out?
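
For anyone picking up the IO.select item in the list above, here is a small baseline showing how select-with-timeout behaves on an ordinary pipe in Ruby. It is a sketch for comparison only: probe_select is a made-up helper name, and the sketch uses a pipe rather than stdin, so it shows the behavior a fix would be aiming for, not the JRuby bug itself.

```ruby
# Hypothetical helper: run IO.select on a fresh pipe before and after
# data is written, returning both results for comparison.
def probe_select(timeout)
  reader, writer = IO.pipe

  # Nothing has been written yet, so select waits out the timeout and returns nil.
  before = IO.select([reader], nil, nil, timeout)

  writer.write("x")

  # Now the reader is readable, so select returns [[reader], [], []].
  after = IO.select([reader], nil, nil, timeout)

  [before, after, reader]
ensure
  writer.close if writer
end

before, after, reader = probe_select(0.1)
puts before.inspect
puts after[0] == [reader]
reader.close
```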

Saturday, December 08, 2007

Upcoming Events: Dec 2007, Jan/Feb 2008

JavaPolis 2007 - Antwerp, Belgium - December 10-14 - Sounds like a great event this year, with claims of over 3200 registrations so far. I'll be sharing the JRuby/NetBeans tutorial with Brian Leonard on the 10th and the JRuby/Rails talk with Ola Bini on the 12th. Outside of that, I'll probably be hacking in the main area. Come say hi.

Microsoft Lang.NET Symposium - Redmond, Washington - January 28-30 - I'll be there to get ideas about building a language platform, sharing war stories with fellow language implementers, and probably contributing a bit to John Rose's talk on the Multi-Language VM project. Oughta be a fun time...though it feels a bit weird making my first trip to Microsoft.

acts_as_conference - Orlando, Florida - February 8-9 - Robert Dempsey of Rails For All invited me to come talk about JRuby and Rails...though I'll be doing things a bit differently this time (not showing how to build a Rails app, but showing purely how JRuby improves the Rails ecosystem). Who could pass up a trip to Florida from Minnesota at this time of year?

FOSDEM 2008 - Brussels, Belgium - February 23-24 - FOSDEM invited me to present on the OSS languages track. I've got some great ideas for how to tackle this one. Given that it's an OSS conference, I think it's finally time to show how JRuby has evolved in the past three years from a slow, partial interpreter and runtime to the fastest Ruby 1.8-compatible implementation around. It's been a hell of a ride, and it's gotta qualify as an OSS success story.

Outside these four events, I've had invitations for plenty of others (I could probably just do conferences...but how would I ever get anything done?) so I'm sure there will be more to come. You can also count on JavaOne in San Francisco this spring, Ruby Kaigi in Tokyo this summer, RubyConf Europe in Prague some time between April and July, and maybe RailsConf 2008 in Portland (though there's a good chance I won't be presenting).

Friday, December 07, 2007

Groovy 1.5 Released!

The Groovy team has kicked out their second major production release, Groovy 1.5...and skipped straight from 1.0 to 1.5. Why? Perhaps because they added generics, enums, static imports, annotations, fully dynamic metaclasses, improved performance, ... and much more. I think the move to 1.5 was certainly warranted, and we've been considering making the next JRuby release 1.5 for the same reasons.

Congratulations to the Groovy team! I'm looking forward to seeing 1.6 and 2.0 in the future!

OpenJDK Migration to Mercurial is Complete!

I'm really excited about this one. Kelly O'Hair reports that OpenJDK source has been fully migrated to Mercurial! This means that daily development on OpenJDK (eventually to produce Java 7 and other great things) will happen on the same repository that you, dear reader, can access from home. And it's using Mercurial, one of the two big Distributed SCM apps, so you can pull off an entire repo and maintain your own OpenJDK workshop at home. Excellent news...I now finally have an excuse to learn Hg, and I can finally put in the effort to get OpenJDK building with the knowledge that I'll safely be able to "pull" changes as they happen. Thank you to the OpenJDK migration team!

See also Kelly O'Hair's sun.com blog for articles on the OpenJDK Mercurial layout and how to work with it.

Wednesday, December 05, 2007

Groovy in Ruby: Implement Interface with a Map

Some of you may know I participate in the Groovy community as well. I'm hoping to start contributing some development time to the Groovy codebase, but for now I've mostly been monitoring their progress. One thing the Groovy team has more experience with is integrating with Java.

Now if you ask the Groovy team, they'll make some claim like "it's all Java objects" or "Groovy integrates seamlessly with Java" but neither of those is entirely true. Groovy does integrate extremely well with Java, but it's because of a number of features they've added over time to make it so...many of them not directly part of the Groovy language but features of their core libraries and portions of their runtime.

Since Ruby and Groovy seem to be the two most popular (or noisiest) non-Java JVM languages these days, I thought I'd start a series of posts showing how to add Groovy features missing from Ruby to JRuby. But there's a catch: I'll use only Ruby code to do this, and what I show will work on any unmodified JRuby release. That's the beauty of Ruby: the language is so flexible and fluid, you can implement many features from other languages without ever modifying the implementation.

First up, Groovy's ability to implement an interface from a Map.

1. impl = [
2.   i: 10,
3.   hasNext: { impl.i > 0 },
4.   next: { impl.i-- },
5. ]
6. iter = impl as Iterator
7. while ( iter.hasNext() )
8.   println iter.next()

Ok, this is Groovy code. The brackety thing assigned to 'impl' shows Groovy's literal Map syntax (a Hash to you Rubyists). Instead of providing literal strings for the keys, Groovy automatically turns whatever token is in the key position into a Java String. So 'i' becomes a String key referencing 10, 'hasNext' becomes a String key referencing a block of code that checks if impl.i is greater than zero, and so on.

The magic comes on line 6, where the newly-constructed Map is coerced into a java.util.Iterator implementation. The resulting object can then be passed to other code that expects Iterator, such as the while loop on lines 7 and 8, and the values from the Map will be used as the code for the implemented methods.

To be honest, I find this feature a bit weird. In JRuby, you can implement a given interface on any class, add methods to that class at will, and get most of this functionality without ever touching a Hash object. But it's pretty simple to implement this in JRuby:

module InvokableHash
  def as(java_ifc)
    java_ifc.impl {|name, *args| self[name].call(*args)}
  end
end

Here we have one of Ruby's wonderful modules, which I appreciate more each day. This InvokableHash module provides only a single method 'as' which accepts a Java interface type and produces an implementation of that type that uses the contents of hash keys to implement the methods. That's really all there is to it. So by reopening the Hash class, we gain this functionality:

class Hash
  include InvokableHash
end

And we're done! Let's see the fruits of our labor in action:

1. impl = {
2.   :i => 10,
3.   :hasNext => proc { impl[:i] > 0 },
4.   :next => proc { impl[:i] -= 1 }
5. }
6. iter = impl.as java.util.Iterator
7. while (iter.hasNext)
8.   puts iter.next
9. end

Our final Ruby code looks roughly like the Groovy code. On lines 1 through 5 we construct a literal Hash. Notice that instead of automatically turning identifier tokens into Strings, Ruby uses the exact object you specify for the key, and so here we use Ruby Symbols as our hash keys (they're roughly like interned Strings, and highly recommended for hash keys). On line 6, we coerce our Hash into an Iterator instance (and we could have imported Iterator above to avoid the long name). And then lines 7 through 9 use the new Iterator impl in exactly the same way as the Groovy code.
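
If you want to poke at the same idea without a JVM in the picture, here is a plain-Ruby sketch. It is not what JRuby's impl does internally. DuckFromHash and to_duck are made-up names for illustration, and instead of implementing a Java interface the sketch just builds an object whose methods are the callable values of the hash.

```ruby
# Hypothetical plain-Ruby analogue of InvokableHash#as: no Java interface,
# just an object with one singleton method per callable hash value.
module DuckFromHash
  def to_duck
    duck = Object.new
    each do |name, value|
      # Plain data entries such as :i are skipped; only procs become methods.
      next unless value.respond_to?(:call)
      duck.define_singleton_method(name) { |*args| value.call(*args) }
    end
    duck
  end
end

impl = {
  :i => 3,
  :hasNext => proc { impl[:i] > 0 },
  :next => proc { impl[:i] -= 1 }
}
impl.extend(DuckFromHash)

iter = impl.to_duck
puts iter.next while iter.hasNext
```

Extending the one hash instead of reopening Hash keeps the sketch from touching every Hash in the program. Reopening the class, as shown earlier, gets you the nicer syntax everywhere.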

You've gotta love a language this flexible, especially with JRuby's magic Java integration features to back it up.