Sunday, August 31, 2008

A Duby Update

I haven't forgotten about my promise to post on FFI and MVM APIs, but I've been taking occasional breaks from JRuby (heaven forbid!) to get some time on on Duby.

What Is Duby?

Duby, for those who have not heard of it, is my little toy language. It's basically a Ruby-like static-typed language with local type inference and not a whole lot of bells and whistles. The goal for Duby (which is most definitely a working name...it will probably change), is to provide the all the best parts of Ruby syntax people are familiar with, but add to it:

  • Written all in Ruby (and obviously the eventual plan would be to port it to Duby)
  • Backend-agnostic (JVM is obviously my focus, but nothing stops someone from building an LLVM or CLR typer+compiler)
  • Minimally-intrusive static typing (Duby infers types from arguments and calls, like Scala)
  • Features missing from Java (Duby treats module inclusion and class reopening like defining extension methods in C#)
  • A very pluggable type inference engine (Duby's "Java" typer is currently all of about 20 lines of code that plugs into the engine)
  • A pluggable compiler (Duby will allow adding compiler plugins to turn str1 + str2 into concatenation or StringBuffer calls, for example)
  • Absolutely no runtime dependencies (I want compiled output from Duby to be *done*, so there's no runtime library to lug along so it works; once compiled, there are no dependencies on Duby)
The primary motivation for Duby was originally to have a Ruby-like language we could use to implement parts of JRuby. The JVM, it's type system, and its bytecode are all actually really, really nice. There's a huge collection of libraries, fast primitives (that get optimized into faster native code), and a bytecode specification that's pretty easy for almost anyone to grok. But there's a problem: Java.

Now don't get me wrong, Java is a great language, but it's become a victim of its own success. While other languages have been adding niceities like local type inference, structural typing, closures and extension methods, Java's stayed pretty much the same. There have been no major language changes to Java since Java 5's additions of generics, enums, annotations, varargs, and a few other miscellaneous odds and ends. Meanwhile, I live in a torturous world between Ruby and Java, where I'd love to write everything in Ruby (too slow, too inexact for "stable layer" code), but must instead write everything in Java (with associated syntactic baggage and 20th-century language design). And so necessity dictates taking a new approach.
def fib(a => :fixnum)
if a < 2
a
else
fib(a - 1) + fib(a - 2)
end
end

puts fib(45)
So here we see an example I've shown in previous posts, but with a twist. First off, it's almost exactly the same as the equivalent Ruby code except for the argument type declaration => :fixnum. The rest of the script is all vanilla Ruby, even down to the puts call at the bottom.

But all is not as it seems. This is not Ruby code.

The type declaration in the method def looks natural, but it's not actually parseable Ruby. I had Tom Enebo hack a change in to JRuby's parser (off by default) to allow that syntax. Duby originally had a syntax something like this, so it could be parsed by any Ruby impl:

def fib(a)
{a => :fixnum}
...
end
But it's obviously a lot uglier.

New Type Inference Engine

Ignoring Java for a moment we can focus on the type inference happening here. Originally Duby only worked with explicit Java types, which obviously meant it would only ever be useful as a JVM language. The use of those types was also rather ugly, especially in cases where you just want something "Fixnum-like". So even though I had a working Duby compiler several months ago, I took a step back to rewrite it. The rewrite involved two major changes:
  1. Rather than build Duby directly on top of JRuby's AST I introduced a transformation phase, where the Ruby AST goes in and a Duby AST comes out. This allowed me to build up a structure that more accurately represented Duby, and also has the added bonus that transformers could be built from any Ruby parse output (like that of ruby_parser).
  2. Instead of being inextricably tied to the JVM's types and type system, I rewrote the inference engine to be type-system independent. Basically it uses all symbolic and string-based type identifiers, and allows wiring in any number of typing plugins, passing unresolved nodes to them in turn. Two great example plugins exist now: a Math plugin that knows how to handle mathematical and boolean operators against numeric types like :fixnum (it knows :fixnum < :fixnum produces a :boolean, for example), and a Java plugin that knows how to reach out into Java's classes and methods to infer return types for calls out of Duby-space.
The result of this is that up to the point of compilation, there's no explicit dependency on any named set of types, any type system, or any backend. Here's the output from the type inference engine running against that fib script above:
* [Simple] Learned local type under MethodDefinition(fib) : a = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : a = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "<" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "<" Type(fixnum) on Type(fixnum) = Type(boolean)
* [AST] [Call] resolved!
* [AST] [Condition] resolved!
* [Simple] Retrieved local type in MethodDefinition(fib) : a = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : a = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Invoking plugin: #<Duby::Typer::JavaTyper:0x1635aad>
* [Java] Failed to infer Java types for method "fib" Type(fixnum) on Type(script)
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Retrieved local type in MethodDefinition(fib) : a = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Invoking plugin: #<Duby::Typer::JavaTyper:0x1635aad>
* [Java] Failed to infer Java types for method "fib" Type(fixnum) on Type(script)
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Method type for "+" on not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "+" on not found
* [Simple] Invoking plugin: #<Duby::Typer::JavaTyper:0x1635aad>
* [Java] Failed to infer Java types for method "+" on
* [Simple] Deferring inference for Call(+)
* [Simple] Deferring inference for If
* [Simple] Learned method fib (Type(fixnum)) on Type(script) = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [AST] [PrintLine] resolved!
* [Simple] Entering type inference cycle
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "+" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "+" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] [Cycle 0]: Inferred type for Call(+): Type(fixnum)
* [AST] [If] resolved!
* [Simple] [Cycle 0]: Inferred type for If: Type(fixnum)
* [Simple] Inference cycle 0 resolved all types, exiting
There's a lot going on here. You can see the MathTyper and JavaTyper both getting involved here. Since there's no explicit Java calls it's mostly the MathTyper doing all the heavy lifting. The inference stage progresses as follows:
  1. Make a first pass over all AST nodes, performing trivial inferences (declared arguments, literals, etc).
  2. Add each unresolvable node encountered to an unresolved list.
  3. Cycle over that list repeatedly until either all nodes have resolved or the list's contents do not change from one cycle to the next.
It's a fairly brute-force inference mechanism, certainly not on the scale of a full Hindley-Milner inference. Honestly I find the type declaration in the argument list to be far more helpful than harmful, though, and I'm not smart enough to write my own H/M engine at the moment.

Include Java

Anyway, back to Duby. Here's a more complicated example that makes calls out to Java classes:
import "System", "java.lang.System"

def foo
home = System.getProperty "java.home"
System.setProperty "hello.world", "something"
hello = System.getProperty "hello.world"

puts home
puts hello
end

puts "Hello world!"
foo
Here we see a few new concepts introduced.

First off, there's an import. Unlike in Java however, import knows nothing about Java types; it's simply associating a short name with a long name. The syntax (and even the name "import") is up for debate...I just wired this in quickly so I could call Java code.

Second, we're actually making calls that leave the known Duby universe. System.getProperty and setProperty are calls to the Java type java.lang.System. Now the Java typer gets involved. Here's a snippit of the inference output for this code:
* [Simple] Method type for "getProperty" Type(string) on Type(java.lang.System meta) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xaf17c7>
* [Math] Method type for "getProperty" Type(string) on Type(java.lang.System meta) not found
* [Simple] Invoking plugin: #<Duby::Typer::JavaTyper:0x1eb717e>
* [Java] Method type for "getProperty" Type(string) on Type(java.lang.System meta) = Type(java.lang.String)
* [AST] [Call] resolved!
The Java typer is fairly simple at the moment. When asked to infer the return type for a call, it takes the following path:
  1. Attempt to instantiate known Java types for the target and arguments. It makes use of the list of "known types" in the typing engine, augmented by import statements. If those types successfully resolve to Java types...
  2. It uses Java reflection APIs (through JRuby) to look up a method of that name with those arguments on the target type. From this method, then, we have a return type. The return type is reduced to a symbolic name (since again, the rest of the type inference engine knows nothing of Java types) and we consider it a successful inference. If the method does not exist, we temporarily fail to resolve; it may be that additional methods are defined layer that will support this name and argument list.
So in this case, the "System" type has been associated with the "java.lang.System" class (the "meta" in the type reference means it's a class reference rather than an instance reference), and the argument type "string" resolves to "java.lang.String". So java.lang.System.getProperty(java.lang.String) resolves as returning java.lang.String, and we have successfully resolved the call.

Next Steps

I see getting the JVM backend and typer working as two major milestones. Duby already can learn about Java types anywhere in the system and can compile calls to them. But mostly what works right now is what you see above. There's no support for array types, instantiating objects, or hierarchy-aware type inference. There's no logic in place to define new types, static methods, or to define or access fields. All this will come in time, and probably will move very quickly now that the basic plumbing is installed.

I'm hoping to get a lot done on Duby this month while I take a "pseudo-vacation" from constant JRuby slavery. I also have another exciting project on my plate: wiring JRuby into the now-functional "invokedynamic" support in John Rose's MLVM. So I'll probably split my time between those. But I'm very interested in feedback on Duby. This is real, and I'm going to continue moving it forward. I hope to be able to use this as my primary language some day soon.

Update: A few folks asked me to post performance numbers for that fib script above. So here's the comparison between Java and Duby for fib(45).

Java source:
public class FibJava {
public static int fib(int a) {
if (a < 2) {
return a;
} else {
return fib(a - 1) + fib(a - 2);
}
}

public static void main(String[] args) {
System.out.println(fib(45));
}
}
Java time:
➔ time java -cp . FibJava
1134903170

real 0m13.368s
user 0m12.684s
sys 0m0.154s
Duby source:
def fib(a => :fixnum)
if a < 2
a
else
fib(a - 1) + fib(a - 2)
end
end

puts fib(45)
Duby time:
➔ time java -cp . fib
1134903170

real 0m12.971s
user 0m12.687s
sys 0m0.112s
So the performance is basically identical. But I prefer the Duby version. How about you?

14 comments:

chanwit said...

Charlie,

I have been reading this paper:
http://progtools.comlab.ox.ac.uk/publications/oopsla08abc

Hope it may be useful for the type inference engine.

To be honest, I really like most part of Duby, but I need something more Java. For example,

a => :fixnum

I just need Pascal style 'a: fixnum' or C style 'fixnum a'.

Hopefully I can find a way to fork it, but I never ever been familiar with Ruby syntax :-)

Charles Oliver Nutter said...

Chanwit: I probably should spin off Duby into its own top-level project, so that folks like you could contribute to it. And I understand where you're coming from on the type declaration. Thankfully Ruby 1.9 supports using : instead of => so there's a good chance we'd be able to do that once we get Ruby 1.9 syntax added into the parser. C-style declaration will probably never be supported though, since it both disturbs the Ruby flow of the argument and it would require much more drastic changes to the parser.

chanwit said...

Charlie,

As I really see some hidden potential of Duby, it would be great if I could be able to join this project.

Charles Oliver Nutter said...

Chanwit: It's all in JRuby's source tree under lib/ruby/site_ruby/1.8/duby. If you find me on IM (headius@Yahoo, headius@gmail.com, headiusmaximus@AIM) or on IRC (headius@freenode on #jruby) I can walk you through some of it.

Anonymous said...

Very impressive!

Ad. ": instead of =>" - thankfully, indeed!
=> means "equal or greater"

Martin Probst said...

Nice to see progress on this.

Maybe it would be possible to turn parts of this into a Ruby type advisor a la FindBugs - a tool that does not do proper, complete type inference to create statically typed programs, but rather something that checks your Rails app for the common method name typos and such.

For that, you'd also need new type sources, i.e. the database schema as a back end to know that UserModel has an 'email' field. And of course you need to be allow turning off the warnings for DSL like meta programming parts, i.e. XML builders.

Anonymous said...

Pretty cool. Similar to work the PyPy guys did with RPython; a statically typed Python subset used to implement the PyPy compiler and runtime. There's a lot of boiler plate in compilers, it's only logical to use a language with nicer syntax than C for implementations.

opinali said...

In the Java code, the ternary operator is your friend:

public class FibJava {
public static int fib(int a) {
return (a < 2)
  ? a
  : fib(a - 1) + fib(a - 2);
}
}

P.S.: Contrary to the bozos who consider the ternary operator a bad practice and flag it with tools like Checkstyle, I use it massively, and it helps readability a lot (especially on functional-style code).

Charles Oliver Nutter said...

opinali: As it turns out, ternary already works because it's just an if statement in Ruby. So yay for ternary in Duby, it makes the code even tighter:

def fib(a => :fixnum)
a < 2 ? a : fib(a - 1) + fib(a - 2)
end

Slick.

aquasync said...

Hi Charlie,

This looks like a pretty cool project. I'm working on something similar, for converting ruby to c (with all the gobject trappings).

The first stage makes typed s-expressions. The gobject c output is still in progress. It uses ruby_parser, a custom type inference, and can dump the typed ruby back out for inspection, ie like:

# String
x = "asdf"
# [Nothing]
y = []
# Integer
y[(1 + 1)] = x
# [String]
y
# String | NilClass
z = y[0] if true
# NilClass
p z

Note that it has Array and Union composite types also. Anyway, I'd be interested in seeing if we can merge our type systems & type inferencers & annotations etc, so that one chunk of ruby could become java / gobject c / ...

Roger Pack said...

I suppose once you have type inference of duby then perhaps you can apply it to straight ruby [i.e. tight inner loops with "guessable" parameters can be converted to java on the fly].
That might be cool.
-=R

deep said...

Makes it look like ruby on mobile divices might really be possible...

Mark Essel said...

Howdy Charlie,
Was it you on that recent Jruby support Google video talking about Duby?

I'm trying to get my head around the differences between JRuby and Duby.
=> Jruby is a JVM version of Ruby syntax

=> while Duby is a statically typed language that looks similar to ruby, and uses inference to guess types. If it could infer all types (not saying it can) would that make it a Ruby to JVM compiler?

Is Duby "faster" than JRuby?

Charles Oliver Nutter said...

Mark: That was actually my primary co-conspirator from Google, Ryan Brown.

This post is a little old now, so I think need a new one. But Duby has basically become a Ruby syntax that compiles to some variety of statically-typed backend. In the current incarnation, it can generate .class and .java files for the JVM. It is not Ruby, and only has access to the libraries you give it, like other jars or the JDK libraries themselves. But it does add a lot of language-level features that Java doesn't have and Ruby does, like closures, optional arguments, and various literal syntaxes.

Think of it as a Rubyesque face on top of Java, and you're pretty much there.