Monday, March 10, 2008

Duby: A Type-Inferred Ruby-Like JVM Language

It's been one of those nights, my friends! An outstanding night!

All day Sunday I had the stupids. It was probably related to a few beers on Saturday night, or the couple glasses of red wine before that. Whatever it was, I didn't trust myself to work on any JRuby bugs for fear of introducing problems if my brain clicked off mid-stream. So I started playing around with my old bytecode generating DSL from a few months back. Then things got interesting.

We've long wanted to have a "Ruby-like" language, probably a subset of Ruby syntax, that we could compile to solid, fast, idiomatic JVM bytecode. Not a compiler for Ruby, with all the bells and whistles that make Ruby both difficult to support and inefficient to use for implementing itself. A real subset language that produces clean, tight JVM bytecode on par with what you'd get from compiled Java. But better, because it still looks and feels mostly like Ruby.

So I wrote one! And I used my bytecode library too!

Let's say we want to implement our good friend fib.

class Foo
def fib(n)
if (n < 2)
n
else
fib(n - 2) + fib(n - 1)
end
end
end
This is normal ruby code. Given a Fixnum input, it calculates the appropriate fibbonaci number and returns it. It's slow in Ruby for a few reasons:
  1. In JRuby, it uses a boxed integer value. Matz's Ruby and Rubinius use tagged integers to improve performance, and we rely on the JVM to optimize as much as it can (which turns out to be a *lot*). But it's still way slower than using primitives directly.
  2. The comparison operations, integer math operations, and fib operations are all dynamic invocations. So there's at least a bit of method lookup cost, and then a bunch of abstraction cost. You can reduce it, but you can't eliminate it.
  3. There are many Ruby features that influence performance negatively even when you're not using them. It's very difficult, for example, to optimally store local variables when the local scope can be captured at any time. So either you rely on tricks, or you store local variables on the heap and deal with them being slow.
When working with a statically-typed language, you can eliminate all of this. In Java, for example, you have both object types and primitive types; primitive operations are extremely fast and eventually JIT down to machine-code equivalents; and the feature set is suitably narrow to allow current JVMs to do amazing levels of optimization.

But of course Java has its problems too. For one, it does very little guessing or "inferring" of types for you, which means you generally have to declare them all over the place. On local variables, on parameters, on return types. C# 3.0 aims to correct this by adding type inference all over, but then there's still other syntactic hassle using any C-like language: curly braces, semicolons, and other gratuitous syntax that make up a lot of visual noise.

Wouldn't be nice if we could take the code above, add some type inference logic, and turn it into a fast, static-typed language?
class Foo
def fib(n)
{n => java.lang.Integer::TYPE, :return => java.lang.Integer::TYPE}
if (n < 2)
n
else
fib(n - 2) + fib(n - 1)
end
end
end
And there it is! This is the same code as before, but now it's been decorated with a little type declaration block (in the form of a Ruby hash/map) immediately preceding the body of the method. The type decl describes that the 'n' argument is to be mapped to a primitive int, and the method itself will return a primitive int (and yes, I know those could be inferred too...it shall be soon). The rest of the method just works like you'd expect, except that it's all primitive operations, chosen based on the inferred types. For the bold, here's the javap disassembly output from the compiler:
Compiled from "superfib.rb"
public class Foo extends java.lang.Object{
public int fib(int);
Code:
0: iload_1
1: ldc #10; //int 2
3: if_icmpge 10
6: iload_1
7: goto 27
10: aload_0
11: iload_1
12: ldc #10; //int 2
14: isub
15: invokevirtual #12; //Method fib:(I)I
18: aload_0
19: iload_1
20: ldc #13; //int 1
22: isub
23: invokevirtual #12; //Method fib:(I)I
26: iadd
27: ireturn

public Foo();
Code:
0: aload_0
1: invokespecial #15; //Method java/lang/Object."<init>":()V
4: return

}
A few items to point out:
  • A default constructor is generated, as you'd expect in Java. This will be expected to also recognize "def initialize" constructors. I haven't decided if I'll allow overloading or not.
  • Notice the type signature for fib and all the type signatures for calls it makes are correctly inferred to the correct types.
  • Notice all the comparison and arithmetic operations are compiled to the correct bytecodes (iadd, isub, if_icmpge and so on).
And the performance is what you'd hope for:
$ jruby -rjava -e "t = Time.now; 5.times {Java::Foo.new.fib(35)}; p Time.now - t"
0.681
$ jruby -rnormalfib -e "t = Time.now; 5.times {Foo.new.fib(35)}; p Time.now - t"
27.851
Here's another example, with some string operations thrown in:
class Foo
def bar
{:return => java.lang.String}

'here'
end
end

class Foo
# reopening classes works in the same file only (for now)
def baz(a)
{a => java.lang.String}

b = "foo"
a = a + bar + b
puts a
end
end
It works, of course:
$ jruby -rjava -e "Java::Foo.new.baz('Type inference is fun')"
Type inference is funherefoo
And once again, the disassembled output:
Compiled from "stringthing.rb"
public class Foo extends java.lang.Object{
public java.lang.String bar();
Code:
0: ldc #13; //String here
2: areturn

public void baz(java.lang.String);
Code:
0: ldc #15; //String foo
2: astore_2
3: aload_1
4: checkcast #17; //class java/lang/String
7: aload_0
8: invokevirtual #19; //Method bar:()Ljava/lang/String;
11: invokevirtual #23; //Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
14: checkcast #17; //class java/lang/String
17: aload_2
18: invokevirtual #23; //Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
21: astore_1
22: getstatic #29; //Field java/lang/System.out:Ljava/io/PrintStream;
25: aload_1
26: invokevirtual #34; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
29: return

public Foo();
Code:
0: aload_0
1: invokespecial #36; //Method java/lang/Object."<init>":()V
4: return

}
Notice here that the + operation was detected as acting on two strings, so it was compiled to call String#concat rather than try to do a numeric operation. These sorts of mappings are simple to add, and since there's type information everywhere it's also easy to come up with cool ways to map Ruby syntax to Java types.

My working name for this is going to be Duby, pronounced "Doobie", after Duke plus Ruby. Duby! It has a nice ring to it. It may be subject to change, but that's what we'll go with for the moment.

Currently, Duby supports only the features you see here. It's a very limited subset of Ruby right now, and the subset doesn't support all Java primitive types, for example, so there's a lot of blanks to be filled. It also doesn't support static (class) methods, doesn't wire up "initialize" methods, doesn't support packages (for namespacing) or imports (to shrink type names) or superclasses or interfaces or enums or generics or what have you. But it's already functional for simple code, as you see here, so I think it's a great start for 10 hours of work. The rest will come, as needed and as time is available.

What are my plans for it? Well many of you may know Rubinius, an effort to reimplement Ruby in Ruby, modeled after the design of many Smalltalk VMs. Well, in order to make JRuby more approachable to Ruby developers, Duby seems like the best way to go. They'll be able to write mostly idiomatic Ruby code and know it will both perform at Java speeds and provide compile-type checking that all is wired up correctly. So we can avoid the overhead of "turtles all the way down" by just teaching one turtle how to speak JVM bytecode and building on that.

I also hope this will lead to lots of new ideas about how to implement languages for the JVM. It's certainly attractive to language users to be able to contribute to language implementations using the language in question, and if we can come up with compilers and sub-languages like Duby it could make the JVM more approachable to a wide range of developers.

Oh, and for those of you curious: the Duby compiler is written mostly in Ruby too.

37 comments:

Anonymous said...

That seems a bit dubious :-D

Seriously: that's really cool. Now what about parsing those type annotations in regular Ruby code, too, and optimizing JRuby's code generation for that?

Maybe you could also allow type annotations like this in JavaDoc-style comments to methods, so you have documentation and performance enhancements at the same time!

Anonymous said...

I like this feature. Ola had wrote a post on the similar stuff here.

I will write ruby without the type declaration and declare the type later for optimization if needed.

Charles Oliver Nutter said...

@martin I've been looking into exactly that sort of thing, trying to decide on a minimally-invasive way to allow people to mark up code for performance. I'm interested to see what the reaction is to Duby to see if others are open to this kind of language-bending. I also want to get imports working (after some sleep) so that declarations can just be like {:a => int, :b => String}. There's a lot more to do, but I think this could be very powerful.

Patrick Mueller said...

Why the Java slant? Instead of

{n => java.lang.Integer::TYPE, :return => java.lang.Integer::TYPE}

howabout

{n => :integer, :return => :integer}

Basically, keep it implementation language neutral. As to Martin's comment, make this info introspectable, so that we can extract this information and add it to generated documentation.

Charles Oliver Nutter said...

@sgwong To be clear, this is definitely not running as normal Ruby after it's been compiled. By then it's been turned into fairly standard JVM bytecodes, without dynamic invocation and many other Ruby features. But these techniques could be easily added to the main JRuby compiler to allow you to specify the runtime characteristics of a given method, or to allow directly embedding a Duby method into your Ruby code. There's a world of possibilities.

Charles Oliver Nutter said...

@patrick It will eventually look exactly like that, once I support "import" for types. You would then import java.lang.String or java.lang.Integer::TYPE and use them directly as "String" and "int" in type declarations. And of course those common types would probably be auto-imported for you automatically.

But now I must sleep. Perhaps when I wake up.

Scot McSweeney-Roberts said...

It was probably related to a few beers on Saturday night, or the couple glasses of red wine before that.

As a uni lecturer of mine once said -

Beer before wine, that's fine. Wine before beer, oh dear.

Unknown said...

This is a great step in a very good direction.

Is it possible to use Scala-style optional typing?

name [':' type]

I guess that type declarations may be needed for local variables, instance variables and class variables, as well as for method parameters and result types. At first sight the use of colons in declarations doesn't clash with their use elsewhere in Ruby syntax.

Like Patrick, I'd prefer the typing to be in terms of Ruby's types, rather than Java's, so that the same optimisation could be used in other implementations.

jherber said...

charles, that rocks! keep the innovations coming :)

would it be an advantage for going between ruby and duby if annotation were in a ruby comment and perhaps not confusable with ruby code. for instance (borrowing functional notation):

class Foo
  def mult(x,y)  #(int,int)->int
    x*y
  end
end

JoBro Hater #3 said...

Where can I get Duby? I don't see a link.

Anonymous said...

This looks cool! It isn't really type inference what you're doing, but it definitely saves a lot of type-typing ;)

Charles Oliver Nutter said...

@scot I think I've heard just about every permutation of liquor rhyme possible, and you know what? They're all right, because if you're drinking enough to worry about one after another, you're already in trouble.

@justinforder That would certainly be nice, but not directly possible in current Ruby grammars. We could certainly put our heads together and come up with a grammar modification that looks more pleasing, but I don't know how well that would be received by the larger Ruby community. Perhaps it wouldn't be so bad?

@jherber That's another one of the options, for sure. The advantage is that it truly is a no-op, and would have no effect on performance when running as normal Ruby. On the other hand, it would require additional parsing smarts. It's still on the table though.

@johnathan It's already in JRuby trunk, though it won't be supported or used for anything in the 1.1 release. Look under lib/ruby/site_ruby/1.8/compiler.

@chris Are you sure it isn't type inference? Notice in the second example no type is ever declared for the 'b' variable, but because it's assigned a string that's the type that gets used. I should come up with a more complex example that allows a variable's type to be determined from a method it calls, but that also works right now. If this isn't type inference, can you point me toward a better definition?

Anonymous said...

This is a pretty cool development, and I hope it doesn't get shelved away.

It looks a bit more like how Cobra handles type inference, with all those precondition hints to help the compiler generate native code.

Anonymous said...

@charles:

Generally, type inference means that the types are fully omitted from a program. The types are then "reconstructed" by an algorithm. Most of these algorithms are based on Algorithm W by Damas/Milner. What you do is often also referred to as type-checking, and indeed, you only need the types of the function arguments to be able to type-check a full program.

Things are going to get more interesting when you are going to add things like mixins, and there are some other catches.

That being said, I have been thinking about doing something similar, and I think it's really cool that there's actually somebody who's doing this. Did you implement all this in Ruby?

Charles Oliver Nutter said...

@laurence Me too...but I don't plan to shelve it away, and I'm already devising ways to use it for certain pieces of JRuby internals.

@chris Ok, so would you say then that Scala is not a type-inferred language, since it requires you to specify argument types? And to answer your question, the main compiler logic is all in Ruby, though it uses our Java integration layer heavily to reflectively inspect Java types. The bytecode layer is a Ruby wrapper around ASM.

Anonymous said...

maybe you know this, maybe you don't:

http://lua-users.org/lists/lua-l/2008-02/msg00051.html

http://www.bluishcoder.co.nz/2008/02/quick-introduction-to-tamarin-tracing.html

Charles Oliver Nutter said...

@superfast Hey, thanks for those links, they both look interesting. We're probably going to start looking into profile-based JIT after 1.1, since we have mixed-mode execution already. So these will be very useful to read. Thank you!

Daniel Berger said...

def fib(Integer n) => Integer
...
end

Break with MRI. Stop fighting it.

Unknown said...

To go along with justinforder's comment about how Scala adds type information to parameters, that is how Python adds parameter annotations in 2.6/3.0:

def fib(n:Integer) -> Integer

As you mentioned, it might not work out with Ruby's syntax, but at least two other languages like it. =)

Unknown said...

I'd very much prefer pulling the signature out of the method definition:

sig fib(Integer) => Integer
def fib(n)
...
end

or (valid ruby syntax)

sig Integer, Integer
def fib(n)
...
end

Anyway, excellent idea. jruby becomes increasingly interesting.

(Any chance jruby will have macros one day?)

Bill said...

Hi Charles,

Just for the record, here's what it would look like in Scala:

class Fibo {
  def fib(n: Int): Int =
    if (n < 2) n else fib(n - 2) + fib(n - 1)
}

The bytecodes look similar to those generated by the Duby version:

public int fib(int);
Code:
Stack=4, Locals=2, Args_size=2
0: iload_1
1: iconst_2
2: if_icmpge 9
5: iload_1
6: goto 24
9: aload_0
10: iload_1
11: iconst_2
12: isub
13: invokevirtual #16; //Method fib:(I)I
16: aload_0
17: iload_1
18: iconst_1
19: isub
20: invokevirtual #16; //Method fib:(I)I
23: iadd
24: ireturn

I guess you're after something that looks more like Ruby. By the way, this algo isn't tail recursive, so larger numbers passed in could cause other performance problems. A little searching on the web and I found this tail-recursive algorithm:

class FiboTail {

  def fib(n: Int): Int = {

    def aux(n: Int, next: Int, result: Int): Int =
      if (n == 0)
        result
      else
        aux(n - 1, next + result, next)

    aux(n, 1, 0)
  }
}

The bytecodes for this one show just one call to the auxiliary function aux (which I defined as a local function inside fib):

public int fib(int);
Code:
Stack=4, Locals=2, Args_size=2
0: aload_0
1: iload_1
2: iconst_1
3: iconst_0
4: invokespecial #25; //Method aux$1:(III)I
7: ireturn

The tail recursion inside aux gets optimized away into a loop:

private final int aux$1(int, int, int);
Code:
Stack=3, Locals=5, Args_size=4
0: aload_0
1: astore 4
3: iload_1
4: iconst_0
5: if_icmpne 10
8: iload_3
9: ireturn
10: iload_1
11: iconst_1
12: isub
13: iload_2
14: iload_3
15: iadd
16: iload_2
17: istore_3
18: istore_2
19: istore_1
20: goto 3

This tail recursive one seems to give the same answers as the non-tail recursive one (for small numbers):

scala> :load dubious.scala
Loading dubious.scala...
defined class Fibo
defined class FiboTail

scala> val fibo = new Fibo
fibo: Fibo = Fibo@ee1c42

scala> val fiboTail = new FiboTail
fiboTail: FiboTail = FiboTail@616fde

scala> for (i <- 1 to 10)
     |   println(fibo.fib(i) + " " + fiboTail.fib(i))
1 1
1 1
2 2
3 3
5 5
8 8
13 13
21 21
34 34
55 55

Sorry to crash your party with Scala code. I'm sure you're looking for a more Ruby-like syntax, but when you started talking about wanting a language with concise expression, type inference, and optimized bytecodes for the JVM, it rang a bell for me.

Unknown said...

Someone knows of any "backend" of Ruby or subset of ruby (a general computing library) that uses GPU?

GPUs dont have the control unit needed for general Ruby usage but they have a whole bunch of useful instructions and of course a huge amount or ALU to become a very good backend for some Ruby operations.

If I am interested in developing something like that, the only option would be write a C library and make a ruby wrap to it?

Anonymous said...

I second Tom's suggestion of pulling the signature out of the method body.

Daniel Berger said...

Tom, Mark,

Putting it outside the method definition will be problematic for metaprogramming, e.g. Module#define_method. It'll probably be more difficult to parse, too.

Also, I don't see the point of separating it. Python doesn't, and it seems to be working for them. Inlining the type looks much nicer.

Anonymous said...

Maybe I'm missing something, but:
regular def fib(n)
is not the same as
def fib(n)
{n => java.lang.Integer::TYPE, :return => java.lang.Integer::TYPE}

Simply because Ruby's integer numbers go from Fixnum to Bignum when necessary - but java.lang.Integer implies the Java 32 bit signed integer, which will go titsup once it gets to big. So I'd call the 2nd fib implementation to be buggy... why in the world should one fib invocation fail, just because it's result happens to be greater than 31 bit?

Let the Polymorphism in:
http://www.jroller.com/murphee/entry/hell_hath_no_fury_like

You know the saying: Every time a developer uses fixed width numbers, Turing eats a kitten.

Charles Oliver Nutter said...

@murphee Duby doesn't (currently) support arithmetic overflow into arbitrary-precision because the JVM doesn't support it. Duby is being designed to put a Ruby face on what works well on the JVM. The primary motivator is to give a static-typed lookalike for Ruby we can use to implement static signatures on compiled Ruby classes and to implement JRuby internals without all the Java fuss. Anything outside that...I'm open to contributions.

Tiago said...

Why not having also optional explicit parameters a la CAML and Scala?

I am not suggesting compulsory explicit typing, but allowing for the programmer to declare the types can help a lot in the debugging phase and also for some automated tools like IDEs, code generators, etc.

Charles Oliver Nutter said...

@tiago I assume you mean in Ruby, yes? We're considering it.

Anonymous said...

please do not involve rubyists with the java not-convention between data-objects and privitives ( Integer not egal int) etc.

so its the best to use only the ruby syntax - so we can use the type hints for other ruby stuff and not only for the JVM

so int should be a :integer etc.



an other suggestion for syntax is:


class Foo
__def mult(x,y)
____params x => :integer, x > :integer
____returns :integer
____x*y
__end
end

Anonymous said...

Just another thought after reading everyone's comments regarding syntax. I'd love to see something like:

def foo(a)
types :a => Fixnum, :return => String
...
end

Anonymous said...

singpolyma

Something like that could be very easily implemented directly in ruby as a simple gem.

And IDE's could use it for their type inference.

Roger Pack said...

I like duby as an option for "strongly typed, more easily compilable" codes

I assume this is like pyrex/cython?

-=R

Unknown said...

It would be pretty funny if you did it as comments, and then rdoc went "oh look, some people are putting type information in the comments, let's generate docs from it", and then IDE developers went "oh look, some people are putting type information in the comments, let's make autocomplete work much better".

Charles Oliver Nutter said...

Trejkaz: Actually I believe NetBeans already does that!

Unknown said...

What is the syntax to put in the comment to make it work? Maybe IDEA supports it too and I just don't know because I'm only using call-seq and other "normal" Ruby ways of documenting methods.

Charles Oliver Nutter said...

Trejkaz: Here's a screenshot of it; you might poke around http://wiki.netbeans.org/Ruby as well, it's probably documented somewhere.

http://blogs.sun.com/tor/resource/ruby-types.png

Roger Pack said...

You should start a mailing list or something so that users can contribute.
Cheers!
-=r