Sunday, August 31, 2008

A Duby Update

I haven't forgotten about my promise to post on FFI and MVM APIs, but I've been taking occasional breaks from JRuby (heaven forbid!) to get some time on on Duby.

What Is Duby?

Duby, for those who have not heard of it, is my little toy language. It's basically a Ruby-like static-typed language with local type inference and not a whole lot of bells and whistles. The goal for Duby (which is most definitely a working name...it will probably change), is to provide the all the best parts of Ruby syntax people are familiar with, but add to it:

  • Written all in Ruby (and obviously the eventual plan would be to port it to Duby)
  • Backend-agnostic (JVM is obviously my focus, but nothing stops someone from building an LLVM or CLR typer+compiler)
  • Minimally-intrusive static typing (Duby infers types from arguments and calls, like Scala)
  • Features missing from Java (Duby treats module inclusion and class reopening like defining extension methods in C#)
  • A very pluggable type inference engine (Duby's "Java" typer is currently all of about 20 lines of code that plugs into the engine)
  • A pluggable compiler (Duby will allow adding compiler plugins to turn str1 + str2 into concatenation or StringBuffer calls, for example)
  • Absolutely no runtime dependencies (I want compiled output from Duby to be *done*, so there's no runtime library to lug along so it works; once compiled, there are no dependencies on Duby)
The primary motivation for Duby was originally to have a Ruby-like language we could use to implement parts of JRuby. The JVM, it's type system, and its bytecode are all actually really, really nice. There's a huge collection of libraries, fast primitives (that get optimized into faster native code), and a bytecode specification that's pretty easy for almost anyone to grok. But there's a problem: Java.

Now don't get me wrong, Java is a great language, but it's become a victim of its own success. While other languages have been adding niceities like local type inference, structural typing, closures and extension methods, Java's stayed pretty much the same. There have been no major language changes to Java since Java 5's additions of generics, enums, annotations, varargs, and a few other miscellaneous odds and ends. Meanwhile, I live in a torturous world between Ruby and Java, where I'd love to write everything in Ruby (too slow, too inexact for "stable layer" code), but must instead write everything in Java (with associated syntactic baggage and 20th-century language design). And so necessity dictates taking a new approach.
def fib(a => :fixnum)
if a < 2
a
else
fib(a - 1) + fib(a - 2)
end
end

puts fib(45)
So here we see an example I've shown in previous posts, but with a twist. First off, it's almost exactly the same as the equivalent Ruby code except for the argument type declaration => :fixnum. The rest of the script is all vanilla Ruby, even down to the puts call at the bottom.

But all is not as it seems. This is not Ruby code.

The type declaration in the method def looks natural, but it's not actually parseable Ruby. I had Tom Enebo hack a change in to JRuby's parser (off by default) to allow that syntax. Duby originally had a syntax something like this, so it could be parsed by any Ruby impl:

def fib(a)
{a => :fixnum}
...
end
But it's obviously a lot uglier.

New Type Inference Engine

Ignoring Java for a moment we can focus on the type inference happening here. Originally Duby only worked with explicit Java types, which obviously meant it would only ever be useful as a JVM language. The use of those types was also rather ugly, especially in cases where you just want something "Fixnum-like". So even though I had a working Duby compiler several months ago, I took a step back to rewrite it. The rewrite involved two major changes:
  1. Rather than build Duby directly on top of JRuby's AST I introduced a transformation phase, where the Ruby AST goes in and a Duby AST comes out. This allowed me to build up a structure that more accurately represented Duby, and also has the added bonus that transformers could be built from any Ruby parse output (like that of ruby_parser).
  2. Instead of being inextricably tied to the JVM's types and type system, I rewrote the inference engine to be type-system independent. Basically it uses all symbolic and string-based type identifiers, and allows wiring in any number of typing plugins, passing unresolved nodes to them in turn. Two great example plugins exist now: a Math plugin that knows how to handle mathematical and boolean operators against numeric types like :fixnum (it knows :fixnum < :fixnum produces a :boolean, for example), and a Java plugin that knows how to reach out into Java's classes and methods to infer return types for calls out of Duby-space.
The result of this is that up to the point of compilation, there's no explicit dependency on any named set of types, any type system, or any backend. Here's the output from the type inference engine running against that fib script above:
* [Simple] Learned local type under MethodDefinition(fib) : a = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : a = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "<" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "<" Type(fixnum) on Type(fixnum) = Type(boolean)
* [AST] [Call] resolved!
* [AST] [Condition] resolved!
* [Simple] Retrieved local type in MethodDefinition(fib) : a = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : a = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Invoking plugin: #<Duby::Typer::JavaTyper:0x1635aad>
* [Java] Failed to infer Java types for method "fib" Type(fixnum) on Type(script)
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Retrieved local type in MethodDefinition(fib) : a = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Invoking plugin: #<Duby::Typer::JavaTyper:0x1635aad>
* [Java] Failed to infer Java types for method "fib" Type(fixnum) on Type(script)
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Method type for "+" on not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "+" on not found
* [Simple] Invoking plugin: #<Duby::Typer::JavaTyper:0x1635aad>
* [Java] Failed to infer Java types for method "+" on
* [Simple] Deferring inference for Call(+)
* [Simple] Deferring inference for If
* [Simple] Learned method fib (Type(fixnum)) on Type(script) = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [AST] [PrintLine] resolved!
* [Simple] Entering type inference cycle
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "+" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xcc5002>
* [Math] Method type for "+" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] [Cycle 0]: Inferred type for Call(+): Type(fixnum)
* [AST] [If] resolved!
* [Simple] [Cycle 0]: Inferred type for If: Type(fixnum)
* [Simple] Inference cycle 0 resolved all types, exiting
There's a lot going on here. You can see the MathTyper and JavaTyper both getting involved here. Since there's no explicit Java calls it's mostly the MathTyper doing all the heavy lifting. The inference stage progresses as follows:
  1. Make a first pass over all AST nodes, performing trivial inferences (declared arguments, literals, etc).
  2. Add each unresolvable node encountered to an unresolved list.
  3. Cycle over that list repeatedly until either all nodes have resolved or the list's contents do not change from one cycle to the next.
It's a fairly brute-force inference mechanism, certainly not on the scale of a full Hindley-Milner inference. Honestly I find the type declaration in the argument list to be far more helpful than harmful, though, and I'm not smart enough to write my own H/M engine at the moment.

Include Java

Anyway, back to Duby. Here's a more complicated example that makes calls out to Java classes:
import "System", "java.lang.System"

def foo
home = System.getProperty "java.home"
System.setProperty "hello.world", "something"
hello = System.getProperty "hello.world"

puts home
puts hello
end

puts "Hello world!"
foo
Here we see a few new concepts introduced.

First off, there's an import. Unlike in Java however, import knows nothing about Java types; it's simply associating a short name with a long name. The syntax (and even the name "import") is up for debate...I just wired this in quickly so I could call Java code.

Second, we're actually making calls that leave the known Duby universe. System.getProperty and setProperty are calls to the Java type java.lang.System. Now the Java typer gets involved. Here's a snippit of the inference output for this code:
* [Simple] Method type for "getProperty" Type(string) on Type(java.lang.System meta) not found.
* [Simple] Invoking plugin: #<Duby::Typer::MathTyper:0xaf17c7>
* [Math] Method type for "getProperty" Type(string) on Type(java.lang.System meta) not found
* [Simple] Invoking plugin: #<Duby::Typer::JavaTyper:0x1eb717e>
* [Java] Method type for "getProperty" Type(string) on Type(java.lang.System meta) = Type(java.lang.String)
* [AST] [Call] resolved!
The Java typer is fairly simple at the moment. When asked to infer the return type for a call, it takes the following path:
  1. Attempt to instantiate known Java types for the target and arguments. It makes use of the list of "known types" in the typing engine, augmented by import statements. If those types successfully resolve to Java types...
  2. It uses Java reflection APIs (through JRuby) to look up a method of that name with those arguments on the target type. From this method, then, we have a return type. The return type is reduced to a symbolic name (since again, the rest of the type inference engine knows nothing of Java types) and we consider it a successful inference. If the method does not exist, we temporarily fail to resolve; it may be that additional methods are defined layer that will support this name and argument list.
So in this case, the "System" type has been associated with the "java.lang.System" class (the "meta" in the type reference means it's a class reference rather than an instance reference), and the argument type "string" resolves to "java.lang.String". So java.lang.System.getProperty(java.lang.String) resolves as returning java.lang.String, and we have successfully resolved the call.

Next Steps

I see getting the JVM backend and typer working as two major milestones. Duby already can learn about Java types anywhere in the system and can compile calls to them. But mostly what works right now is what you see above. There's no support for array types, instantiating objects, or hierarchy-aware type inference. There's no logic in place to define new types, static methods, or to define or access fields. All this will come in time, and probably will move very quickly now that the basic plumbing is installed.

I'm hoping to get a lot done on Duby this month while I take a "pseudo-vacation" from constant JRuby slavery. I also have another exciting project on my plate: wiring JRuby into the now-functional "invokedynamic" support in John Rose's MLVM. So I'll probably split my time between those. But I'm very interested in feedback on Duby. This is real, and I'm going to continue moving it forward. I hope to be able to use this as my primary language some day soon.

Update: A few folks asked me to post performance numbers for that fib script above. So here's the comparison between Java and Duby for fib(45).

Java source:
public class FibJava {
public static int fib(int a) {
if (a < 2) {
return a;
} else {
return fib(a - 1) + fib(a - 2);
}
}

public static void main(String[] args) {
System.out.println(fib(45));
}
}
Java time:
➔ time java -cp . FibJava
1134903170

real 0m13.368s
user 0m12.684s
sys 0m0.154s
Duby source:
def fib(a => :fixnum)
if a < 2
a
else
fib(a - 1) + fib(a - 2)
end
end

puts fib(45)
Duby time:
➔ time java -cp . fib
1134903170

real 0m12.971s
user 0m12.687s
sys 0m0.112s
So the performance is basically identical. But I prefer the Duby version. How about you?