Thursday, March 12, 2009

More Compiling Ruby to Java Types

I did another pass on compiler2 and managed to wire in signature support. So let's look at a couple of examples:

class MyRubyClass
  def helloWorld
    puts "Hello from Ruby"
  end

  def goodbyeWorld(a)
    puts a
  end

  signature :helloWorld, [] => Java::void
  signature :goodbyeWorld, [java.lang.String] => Java::void
end

In this case we have our friend MyRubyClass once again, with helloWorld and goodbyeWorld methods. You'll recall from my previous post that these two methods originally compiled as returning IRubyObject, and goodbyeWorld compiled as receiving a single IRubyObject parameter.

But with signature support, things are so much cooler! The two "signature" lines at the bottom of the class (syntax and structure are totally up for debate) associate signatures with the two methods: helloWorld receives no parameters and has a void return type, and goodbyeWorld receives a single String parameter and has a void return type.
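To make the idea concrete outside JRuby, here's a minimal plain-Ruby sketch of what a "signature" class method could do: record each method's parameter and return types in a per-class table that a compiler can read later. The module name SignatureSketch, the java_signatures reader, and the use of symbols like :void and :string in place of Java::void and java.lang.String are all assumptions for illustration; this is not compiler2's actual implementation.

```ruby
# Hypothetical stand-in for the "signature" class method. Java types are
# modelled as plain symbols because Java::void and java.lang.String only
# exist under JRuby.
module SignatureSketch
  # Per-class table: method name => { params: [...], returns: type }
  def java_signatures
    @java_signatures ||= {}
  end

  # Same call shape as the post: signature :name, [params] => return_type
  def signature(name, spec)
    params, returns = spec.first
    java_signatures[name.to_sym] = { params: params, returns: returns }
  end
end

class MyRubyClass
  extend SignatureSketch

  def helloWorld
    puts "Hello from Ruby"
  end

  def goodbyeWorld(a)
    puts a
  end

  signature :helloWorld, [] => :void
  signature :goodbyeWorld, [:string] => :void
end

p MyRubyClass.java_signatures[:goodbyeWorld]
```

Because the table lives on the class object, anything that runs after the class body (a compiler, a test, an IDE plugin) can inspect it.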

The compiler takes this new information and produces a more normal-looking set of Java signatures:

Compiled from "MyObject.java.rb"
public class MyObject extends org.jruby.RubyObject{
    static {};
    public MyObject();
    public void helloWorld();
    public void goodbyeWorld(java.lang.String);
}
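The step from recorded signatures to declarations like the ones in that javap listing can be approximated in plain Ruby. This only renders javap-style text; compiler2 itself emits JVM bytecode, and the JAVA_TYPE_NAMES table and java_declaration helper are hypothetical.

```ruby
# Hypothetical mapping from this sketch's type symbols to Java type names.
JAVA_TYPE_NAMES = {
  void:   "void",
  string: "java.lang.String",
  int:    "int",
  double: "double"
}.freeze

# Render one javap-style declaration from a recorded signature.
def java_declaration(name, params, returns)
  param_list = params.map { |type| JAVA_TYPE_NAMES.fetch(type) }.join(", ")
  "public #{JAVA_TYPE_NAMES.fetch(returns)} #{name}(#{param_list});"
end

signatures = {
  helloWorld:   { params: [],        returns: :void },
  goodbyeWorld: { params: [:string], returns: :void }
}

signatures.each do |name, sig|
  puts java_declaration(name, sig[:params], sig[:returns])
end
# public void helloWorld();
# public void goodbyeWorld(java.lang.String);
```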

Huzzah! There's almost nothing here to give away that we're actually dealing with Ruby code under the covers. And the code that consumes this is just as simple:

public class MyObjectTest {
    public static void main(String[] args) {
        MyObject obj = new MyObject();
        obj.helloWorld();
        obj.goodbyeWorld("hello");
    }
}

And that's literally all there is to it. Here's a more advanced example:

class MyRubyClass
  %w[boolean byte short char int long float double].each do |type|
    java_type = Java.send type
    eval "def #{type}Method(a); a; end"
    signature "#{type}Method", [java_type] => java_type
  end
end

This time we're actually *generating* the methods, looping over a list of Java primitives and eval'ing a method for each. So this is *runtime* generation of methods, like any good Rubyist loves to do. And of course, this is absolutely no problem for compiler2:

Compiled from "MyObject2.java.rb"
public class MyObject2 extends org.jruby.RubyObject{
    static {};
    public MyObject2();
    public double doubleMethod(double);
    public int intMethod(int);
    public char charMethod(char);
    public short shortMethod(short);
    public boolean booleanMethod(boolean);
    public float floatMethod(float);
    public long longMethod(long);
    public byte byteMethod(byte);
}

All the methods are there, just as you'd expect them! Fantastic!!! (Though the ordering is a little peculiar; I think that's because we don't have an ordered method table in our class impl. Does it matter?)
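For anyone who wants to poke at the runtime-generation idea without JRuby, here's a plain-Ruby sketch of that loop. It swaps the string eval for define_method (both define ordinary instance methods) and records a hypothetical signature per method in a class-level table. Since Ruby hashes preserve insertion order (as of Ruby 1.9), a table built this way iterates in the same order as the %w[] list, which is one way to get a predictable method order.

```ruby
# Plain-Ruby version of the generation loop: one identity method per Java
# primitive, each with a (hypothetical) recorded signature.
class MyRubyClass2
  PRIMITIVES = %w[boolean byte short char int long float double].freeze

  # Hypothetical signature table: method name => [[param types], return type]
  def self.java_signatures
    @java_signatures ||= {}
  end

  PRIMITIVES.each do |type|
    name = :"#{type}Method"
    define_method(name) { |a| a } # identity, like the eval'd "a; end" body
    java_signatures[name] = [[type.to_sym], type.to_sym]
  end
end

p MyRubyClass2.new.intMethod(42)             # prints 42
p MyRubyClass2.java_signatures.keys.first(3)
# prints [:booleanMethod, :byteMethod, :shortMethod] (insertion order)
```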

Even better, the above methods are doing the same type coercion on the way in and out that we do for any other Java-based method calling. So your integral numerics are presented to Ruby as Fixnums, floating-point numerics are Floats, and booleans come through as Ruby true or false.
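The "way out" half of that coercion has to make sure a Ruby value actually fits the declared primitive. Here's a hypothetical sketch of such a check using the JVM's ranges for the integral types; it is not JRuby's real conversion code, and what JRuby actually does on overflow isn't modelled. (In Ruby 2.4 and later, Fixnum and Bignum are unified into Integer, so the sketch checks for Integer.)

```ruby
# JVM value ranges for the integral primitives (char is unsigned 16-bit).
INTEGRAL_RANGES = {
  byte:  -2**7..2**7 - 1,
  short: -2**15..2**15 - 1,
  char:  0..2**16 - 1,
  int:   -2**31..2**31 - 1,
  long:  -2**63..2**63 - 1
}.freeze

# Hypothetical check applied to a Ruby return value before handing it to
# Java as the declared primitive type. Raises instead of truncating.
def coerce_out(java_type, value)
  case java_type
  when :boolean
    raise TypeError, "expected true or false" unless value == true || value == false
    value
  when :float, :double
    raise TypeError, "expected a Numeric" unless value.is_a?(Numeric)
    value.to_f
  else
    range = INTEGRAL_RANGES.fetch(java_type)
    raise TypeError, "expected an Integer" unless value.is_a?(Integer)
    raise RangeError, "#{value} out of range for #{java_type}" unless range.cover?(value)
    value
  end
end

p coerce_out(:byte, 127)   # prints 127
p coerce_out(:double, 2)   # prints 2.0
```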

There's certainly more work to be done:
  • There's no support for overloads at the moment, but I'll likely provide a method aliasing facility so you can define multiple Ruby methods and then say which one maps to which overload. And of course, you'll be able to define multiple overloads that go to the same method body if you wish.
  • I also have not wired in varargs, but it will be an easy match to Ruby's restargs. And optional arguments could automatically generate different-arity Java signatures.
  • Annotations will also be trivial to add; it's just a matter of attaching appropriate metadata and having compiler2 emit them. So you'll be able to use Java EE 5, JUnit 4, and any other frameworks that depend on having annotations present.
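The optional-arguments idea can be prototyped with UnboundMethod#parameters, which reports each parameter as :req, :opt, :rest and so on. This sketch (the Greeter class and java_arities helper are made up for illustration) lists which fixed Java arities a method with optional arguments could be exposed with, and flags restargs as a candidate for a Java varargs parameter.

```ruby
class Greeter
  # Hypothetical example methods: one with optional args, one with restargs.
  def greet(name, greeting = "Hello", punctuation = "!")
    "#{greeting}, #{name}#{punctuation}"
  end

  def shout(*words)
    words.join(" ").upcase
  end
end

# Which fixed Java arities could a Ruby method be exposed with, and does it
# take restargs (a candidate for a Java varargs parameter)?
def java_arities(klass, method_name)
  params = klass.instance_method(method_name).parameters
  required = params.count { |kind, _| kind == :req }
  optional = params.count { |kind, _| kind == :opt }
  {
    arities: (required..required + optional).to_a,
    varargs: params.any? { |kind, _| kind == :rest }
  }
end

p java_arities(Greeter, :greet)[:arities]   # prints [1, 2, 3]
p java_arities(Greeter, :shout)[:varargs]   # prints true
```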

Of course this is all checked into JRuby trunk, so feel free to give it a try. Stop by the JRuby mailing lists or IRC if you have questions. And it's all still written in Ruby; signature support bloated the compiler up to a whopping 178 lines of code, most of that for dealing with the JVM opcodes for primitive types.

This is just the beginning!

20 comments:

Martin Probst said...

I think method order is insignificant. You can observe it when iterating over the methods of a class using reflection, but at least for Java the Javadocs explicitly say that you must not assume any specific order.

This has bitten me once, as IBM's JDK behaves differently from Sun's there.

dvae said...

I think method order is important.

Only because a compiler should be deterministic. For the same input the output should always be the same. Down to the last byte!

If not, then the md5sum of your project may be different each time!

If you rely on hash codes for ordering, then, as some strings may get interned, the order can change.

Also note that in Maven 2.0.10, many of the main uses of HashMap have been replaced with LinkedHashMap to avoid non-deterministic dependency orders.

Sure, it doesn't matter to reflection or compilation. But why introduce non-determinism when you don't have to?

sundog said...

Your awesomeness knows no bounds :-) Great work. Looks like a very natural way to add the meta data necessary to interface to the Java world.

Dave Newton said...

The "signature" method is great--no new Ruby syntax, and provides all the hinting necessary (and opens up new possibilities as a bonus).

Daniel Spiewak said...

I'm impressed! I didn't think there was any way you could create a Ruby compiler which produced real JVM classes. The downside is that I don't think this could be practically applied to *every* Ruby class in a project, but since an API is generally defined by a few "outer" classes, I don't think that will be a problem.

JoergWMittag said...

There is one thing I am a little worried about: the proliferation of static type annotations in Ruby:

While RDoc itself does not have support for type annotations (and its current maintainer has made it very clear that it never will – I believe his exact words were "Over my dead body"), the core and standard library documentation that is part of MRI and YARV, actually includes a lot of static type annotations in a semi-formal, semi-consistent format. See String#[] for an example.

Diamondback Ruby uses annotations that are inspired by the RDoc annotations in the MRI core libraries, but are much richer, since DRuby's type system is much richer than the imaginary type system used by the MRI documentation. They also have slightly different markup, to differentiate them from normal RDoc comments. See <stubs/1.8/basetypes.rb#L1102-1104> for the same example as above – note that because of DRuby's richer type system, it only needs 3 annotations instead of 6.

YARD supports type annotations but uses a completely different syntax from MRI or DRuby; its syntax is essentially the same as JavaDoc's.

Then there is Eivind Eklund's type library.

NetBeans and Ruby in Steel actually seem to share the same syntax for type annotations, which is at least a start, although this syntax is different from all the others in this list.

And, of course, we have Duby, which differs from the above five versions in that it is not constrained by backwards-compatibility, thus its annotation syntax is actually much nicer since it can afford to be illegal Ruby syntax.

Now, JRuby introduces yet another syntax for type annotations, which is actually pretty similar to the type library's, but not identical.

So, if I want my code to work with JRuby and DRuby, and want to have nice documentation and IDE support, I need to annotate my methods four times with the exact same type information. I'd much rather have just one annotation syntax. It would be nice if the designers of those projects (and other projects that might be interested, presumably pretty much all Ruby implementations that provide tight integration with the underlying platform like XRuby, Ruby.NET, IronRuby, Ruby Red, but also projects like SWIG and Ruby-FFI, API documentation projects, other IDEs, projects with heavy data marshaling needs like DataMapper, Sequel and ActiveRecord) could come to an agreement on both a unified object model and syntax for annotations.

Of course, the best way to solve this would be to add annotations to Ruby, like Python 3.0 did, but unfortunately the specification for Ruby 1.9 was frozen 5 months ago and Ruby 2.0 is still another 5 years or so out. Also, every mention of type signatures seems to be met with a Pavlovian reflex to tell the proponent to fuck off and stay with Java, without even so much as looking at what was actually proposed. (Hint: it has nothing to do with static typing.)

Ken Bloom said...

Allow me to suggest that if you want to overload a method, you consider skipping the aliasing and just do it by providing multiple calls to "signature" for that same method. The programmer will have to decode the parameters themselves, but anyone doing that with Ruby method calls is used to that by now.

e.g.

class MyRubyClass
  def overloaded(*args)
    if args.empty?
      # implement the no-args version here
    elsif args.size == 1 && args[0].is_a?(String)
      # implement the String version
    end
  end

  signature :overloaded, [] => Java::void
  signature :overloaded, [java.lang.String] => Java::void
end
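Ken's approach needs a signature table that keeps every call instead of overwriting the previous one. A minimal plain-Ruby sketch under that assumption follows; the OverloadSignatures module and java_overloads reader are hypothetical, and symbols stand in for the Java types. Each recorded entry would become one Java overload, all routed to the same Ruby body.

```ruby
# Hypothetical "signature" variant that keeps every call for a method
# instead of overwriting, so each call can describe one Java overload.
module OverloadSignatures
  def java_overloads
    @java_overloads ||= Hash.new { |hash, name| hash[name] = [] }
  end

  def signature(name, spec)
    params, returns = spec.first
    java_overloads[name.to_sym] << [params, returns]
  end
end

class OverloadedExample
  extend OverloadSignatures

  # One Ruby body; it decodes its arguments itself, as in Ken's example.
  def overloaded(*args)
    if args.empty?
      "no-args version"
    elsif args.size == 1 && args[0].is_a?(String)
      "String version: #{args[0]}"
    else
      raise ArgumentError, "no overload matches #{args.inspect}"
    end
  end

  signature :overloaded, [] => :string
  signature :overloaded, [:string] => :string
end

p OverloadedExample.java_overloads[:overloaded].size   # prints 2
p OverloadedExample.new.overloaded("hi")               # prints "String version: hi"
```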

Charles Oliver Nutter said...

Ken: Not a bad thought. And of course patches are accepted; but that sounds like a good way to do it, and I'll probably get around to that soon. I'm really hoping more people will have a look at the code, since it's just Ruby and pretty simple to figure out. It's tool/compiler2.rb in the JRuby repo.

Charles Oliver Nutter said...

dvae: The reflection ordering doesn't convince me but determinism does. I'll see what I can do to get the methods generating in the same order every time (probably alphabetical).
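Sorting before emitting is enough to make the output independent of method-table iteration order. A small sketch, with a hypothetical emit_declarations helper and every method rendered as void just to keep it short, showing that two different input orders produce byte-identical output (and therefore the same MD5):

```ruby
require "digest/md5"

# Hypothetical emitter: sort method names before rendering so the output
# does not depend on the order methods come out of the method table.
def emit_declarations(method_names)
  method_names.map(&:to_s).sort.map { |name| "public void #{name}();" }.join("\n")
end

first  = emit_declarations(%w[intMethod doubleMethod charMethod])
second = emit_declarations(%w[charMethod intMethod doubleMethod])

puts first
puts Digest::MD5.hexdigest(first) == Digest::MD5.hexdigest(second)   # prints true
```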

Charles Oliver Nutter said...

JoergWMittag: I sympathize, and that's why I've left signature specification intentionally vague. The only requirement that would be set in stone is that there be a way for the compiler to get signature data; how that signature data is attached to the class is up for debate.

So, for example, someone could take any one of those other type-annotating schemes and tweak them for compiler2. I wouldn't mind at all. The syntax here, with the "signature" method, is just something simple to get the compiler itself working.
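One way to keep the syntax open is to have the compiler depend only on a signature table and let any front-end fill it in. The sketch below is hypothetical throughout: collect_signatures plays the compiler, and CallStyle and TableStyle are two invented front-ends that end up producing the same table.

```ruby
# Hypothetical compiler entry point: it only needs a signature table and
# does not care which syntax produced it.
def collect_signatures(klass)
  klass.respond_to?(:java_signatures) ? klass.java_signatures : {}
end

# Front-end A: the post's method-call style.
module CallStyle
  def java_signatures
    @java_signatures ||= {}
  end

  def signature(name, spec)
    params, returns = spec.first
    java_signatures[name.to_sym] = [params, returns]
  end
end

# Front-end B: an invented table style, standing in for any other
# annotation scheme that could be adapted to feed the same compiler.
module TableStyle
  def java_signatures
    @java_signatures ||= {}
  end

  def signatures(table)
    table.each do |name, (params, returns)|
      java_signatures[name.to_sym] = [params, returns]
    end
  end
end

class CallStyleClass
  extend CallStyle
  def hello; end
  signature :hello, [] => :void
end

class TableStyleClass
  extend TableStyle
  def hello; end
  signatures hello: [[], :void]
end

p collect_signatures(CallStyleClass) == collect_signatures(TableStyleClass)   # prints true
```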

Charles Oliver Nutter said...

Daniel: Yeah, I don't think there's even a need to produce a Java class for every Ruby class in the system, and really you don't need to lock yourself into Java types except where you intend to present an API. Of course, I don't see that there are any limitations to this means of compilation, so in theory you *could* annotate every API in your system. But I doubt that's desirable.

Sean said...

Great work! It spurs my enthusiasm too; I can even imagine myself working with Java again, after I got bitter over its blown-uppedness.

Of course people like me will also start to demand type-inference assistance further down the road. But given the facility of type annotations, that may very well be done by a different project.

Fabio Kung said...

Great stuff! Am I able to add signature info later, reopening the class?

That way, we could conditionally add this information in separate .rb files, only when running on JRuby, to build portable apps.

I guess it could be done, as the compiler works with the "runtime version" of classes.

Charles Oliver Nutter said...

Fabio Kung: Yes, you can add the signature info anywhere, any time in your application, so long as it's present for compiler2 to inspect and emit the Java type information. That's what makes it so much nicer than any options that required syntax changes, "special" structured classes, or offline inspection of an AST to get the compiler information.
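Fabio's portable-app idea might look like this in plain Ruby: the class is defined with no Java metadata, and a later reopening attaches the signatures. In a real app that second part could live in its own file, loaded only when defined?(JRUBY_VERSION) is true. The SignatureSketch module is a hypothetical stand-in for compiler2's signature support.

```ruby
# Portable part: plain Ruby, no Java metadata at all.
class Portable
  def hello
    "Hello from Ruby"
  end
end

# Hypothetical stand-in for compiler2's signature support.
module SignatureSketch
  def java_signatures
    @java_signatures ||= {}
  end

  def signature(name, spec)
    params, returns = spec.first
    java_signatures[name.to_sym] = { params: params, returns: returns }
  end
end

# JRuby-only part: in a real app this could sit in a separate file that is
# loaded only when defined?(JRUBY_VERSION). Reopening the class adds the
# signature without touching the method body.
class Portable
  extend SignatureSketch
  signature :hello, [] => :string
end

p Portable.new.hello                      # prints "Hello from Ruby"
p Portable.java_signatures.key?(:hello)   # prints true
```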

Herve said...

@headius: I just translated the French wiki entry for Rails 2.3.2. I'm not the original author, so let me know if some things are odd or still difficult to understand.

Herve said...

I follow you on Twitter regularly but I don't have a Twitter account ;)

Charles Oliver Nutter said...

Herve: Hey thanks! Looks good to me!

Jonathan said...

This work is fantastic. I've modified compiler2 a bit to load up some gems-in-jars and now I've got Ruby files in my Java web service that are being compiled by the same ant task as everything else.

What's the work that's required for subclassing, or perhaps I should ask what the strategy is? I'm unsure how one might wrap the Ruby guts to expose a class as anything other than a class that extends RubyObject. I'd love to fiddle around and maybe get onto this work. My dream is to write Wicket in Ruby....

Charles Oliver Nutter said...

Jonathan: I'd love to have you collaborate on it. I think I'm going to spin this off as a separate project today so others can start to contribute to it, and we'll plan to just release it as a gem.

Jae said...

I am working on an integration where my JRuby code needs to call Java framework code. The Java framework then needs to load a custom class and invoke methods on it.

Using compiler2 seemed perfect: I can compile my JRuby code, and the Java framework can load the plugin as it would any Java class.

When I tried that, I kept getting 'Illegal type in constant pool':

class RcmSyncSource
  def helloWorld
    return "Hello from Ruby"
  end
  signature :helloWorld, [] => Java::java.lang.String
end

java -classpath c:\dev\jruby-1.4.0RC1\lib\jruby.jar;. RcmSyncSource

Exception in thread "main" java.lang.VerifyError: (class: RcmSyncSource, method: getName signature: ()Ljava/lang/String;) Illegal type in constant pool

I am using JRuby 1.4.0RC1 and JDK 1.6.0_07

Looking at the article, I only see primitive types. Does compiler2 support compiling methods that return Java objects? Also, does it support compiling JRuby classes that implement a Java interface?