Headius

Thursday, March 02, 2006

IRB is GO!

I should post about things not working more often.

Within a few hours after my previous post, where I showed the world how IRB now starts up successfully in JRuby but does not work, I was back at it trying to fix the next few bugs preventing it from working. The first issue was a NullPointerException deep in the interpreter, when executing an "until" block. Our parser, for right or for wrong, was producing an AST "UntilNode" with no body. While this could be correct or incorrect behavior--since the "until" in question actually did have an empty body--we still were not handling it correctly. The interpreter assumed that all "until"s would have bodies, and when a body turned up null...kaboom. A fix to check for null and not attempt to evaluate the body was an easy, if not entirely kosher, way to fix it. Done.
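For readers wondering what an "until" with no body looks like, here is a minimal sketch in plain Ruby. It is not the IRB code that triggered the bug, and the variable name is made up for illustration. All the work happens in the condition, so the parser has nothing to put in the loop body.

```ruby
# An "until" loop whose body is empty: the condition does all the work.
# The name `attempts` is hypothetical, chosen only for this illustration.
attempts = 0
until (attempts += 1) >= 3
end
puts attempts
```

An interpreter that always tries to evaluate the loop body has to cope with there being nothing to evaluate here.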

However, nothing could prepare me for what followed.

C:\JRubyWork\jruby3>jruby C:\ruby\bin\irb
irb(#<IRB::WorkSpace:0x5a9c6e>):001:0> x = 1
=> 1

I expected that the "until" bug would go away...that much was easy. However, I did not expect the variable assignment to work. "Ok," I thought, "that's better progress than I expected, but let's try something more complicated."

irb(#<IRB::WorkSpace:0x5a9c6e>):002:0> puts x
NameError: undefined local variable or method 'x' for #<IRB::WorkSpace:0x5a9c6e>
from (irb):1:in `method_missing'
...

Ahh, there's the comforting disappointment I was used to. The 'x' variable had been declared and assigned, but for whatever reason, it was not visible in the current scope.
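To make the expected behavior concrete, here is a rough sketch assuming an interactive shell that evaluates every line against one shared Binding, which is roughly how IRB's workspace operates as far as I understand it. The helper name `fresh_workspace` is hypothetical and not part of IRB.

```ruby
# A helper that returns a fresh scope to evaluate code in.
# `fresh_workspace` is a hypothetical name, not part of IRB.
def fresh_workspace
  binding
end

workspace = fresh_workspace

# First "line" typed at the prompt: declares and assigns x in the workspace.
eval("x = 1", workspace)

# A later line evaluated against the same binding should still see x.
second_result = eval("x + 1", workspace)
puts second_result
```

The NameError above is what you get when that second evaluation cannot find the variable the first one created.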

Normally, I would have continued on to fix this scoping issue, which certainly would have involved a complicated dive into JRuby internals, hunting for mishandled scopes, bindings, frames, and wrapper objects. In this case, however, I decided to give IRB's "single IRB" mode a try, which simplifies the logical scoping of the IRB workspace. What follows is a series of annotated IRB sessions running--yes, running successfully--under JRuby.

This first demo shows something basic: a multiline do/end array iteration.

C:\JRubyWork\jruby3>jruby C:\ruby\bin\irb --single-irb
irb(#<IRB::WorkSpace:0x175ace6>):001:0> [1, 2, 3].each do |i|
irb(#<IRB::WorkSpace:0x175ace6>):002:1* puts i
irb(#<IRB::WorkSpace:0x175ace6>):003:1> end
1
2
3
=> [1, 2, 3]

This confirmed several things:

- method calls were working just fine
- array instantiation and integer literals were ok
- multi-line constructs were being handled correctly

It is this last one that surprised me a bit. I had not expected multi-line constructs to work so well and without any problems, but there it was. Playing around a bit more, I discovered some other surprises:

- line editing was working successfully, and I could arrow-key left and right to correct mistakes
- command history was also working, so that up and down arrow would retrieve the next and previous lines, respectively
- tab completion did not work

Excluding the tab completion issue (hitting tab just inserts a "tab" character into the current line), the perfectly working line editing and command history totally blew me away. I have NEVER seen a console-mode Java application do such things so seamlessly, much less one running an interactive shell. It appears that IRB's fallback "StdioInputHandler" is far less "dumb" than I expected. It was making Java do things I didn't know Java could do. Excited, I pressed on.

This next demonstration tests the declaration and instantiation of a multiline class, another area I thought would never work correctly.

irb(#<IRB::WorkSpace:0x175ace6>):001:0> class MyClass
irb(#<IRB::WorkSpace:0x175ace6>):002:1> def hello
irb(#<IRB::WorkSpace:0x175ace6>):003:2> "Hello from IRB!"
irb(#<IRB::WorkSpace:0x175ace6>):004:2> end
irb(#<IRB::WorkSpace:0x175ace6>):005:1> end
=> nil
irb(#<IRB::WorkSpace:0x175ace6>):006:0> x = MyClass.new
=> #<MyClass:0x1621fe6>
irb(#<IRB::WorkSpace:0x175ace6>):007:0> puts x.hello
Hello from IRB!
=> nil

Once again, JRuby (and IRB) thoroughly surprised me. Defining a class over multiple lines worked perfectly, just as you'd expect from IRB running under C Ruby. At this point, IRB was running so well I began to have some doubts. Could it be that IRB had called out to an external C Ruby process for running the interactive portion of the shell? Such a thing would not be unheard of; Rake launches external Ruby processes to run test cases, though you might never notice such a thing. There was, however, a simple way to confirm that I was actually seeing JRuby at work and not C Ruby: call Java code.

JRuby's greatest strength lies, unsurprisingly, in its ability to neatly tie Ruby and Java code together. For what other purpose would we want Ruby running in the JVM than to take advantage of the wealth of libraries the Java world has to offer? The integration is improving more and more with each release, and has become extremely powerful, usable, and above all very Ruby-like.

This next demonstration shows IRB calling Java code.

irb(#<IRB::WorkSpace:0x175ace6>):001:0> require 'java'
=> true
irb(#<IRB::WorkSpace:0x175ace6>):002:0> include_class "java.lang.System"
=> ["java.lang.System"]
irb(#<IRB::WorkSpace:0x175ace6>):003:0> System.out.println("Hello from Java")
Hello from Java
=> nil

Now some of you may not realize what this means. The ability to interactively script and exercise Java code from within an IRB session has huge potential for testing Java code, debugging JRuby (perhaps that's more exciting to me...oh well), and providing all the interactive goodies that Rubyists have taken for granted with the power and variety of Java's capabilities.

So, a final demonstration is in order.

irb(#<IRB::WorkSpace:0x175ace6>):001:0> require 'java'
=> true
irb(#<IRB::WorkSpace:0x175ace6>):002:0> include_class "javax.swing.JFrame"
=> ["javax.swing.JFrame"]
irb(#<IRB::WorkSpace:0x175ace6>):003:0> include_class "javax.swing.JButton"
=> ["javax.swing.JButton"]
irb(#<IRB::WorkSpace:0x175ace6>):004:0> frame = JFrame.new("my frame")
=> javax.swing.JFrame[long desc omitted...]
irb(#<IRB::WorkSpace:0x175ace6>):005:0> button = JButton.new("my button")
=> javax.swing.JButton[long desc omitted...]
irb(#<IRB::WorkSpace:0x175ace6>):006:0> frame.contentPane.add(button)
=> javax.swing.JButton[long desc omitted...]
irb(#<IRB::WorkSpace:0x175ace6>):007:0> frame.setSize(200, 100)
=> nil
irb(#<IRB::WorkSpace:0x175ace6>):008:0> frame.show
=> nil

The result:

So there you have it. With a few small caveats (like --single-irb), IRB is actually up and working in JRuby, far sooner than I expected. This is turning out to be a really good week.

JRuby Progress Updates: JRuby on Rails, IRB, and the Future

We've been very productive on JRuby over the past week. Progress is being made on many fronts, and I'm more excited than ever about JRuby's potential. Here are a few updates on Rails, IRB, and JRuby's future for those of you following along. It's a long post, but each subsection stands on its own.

JRuby on Rails

I've continued my work getting the Rails "generate" script to run with JRuby. The major hurdle we were coping with last time was parsing the database.yml file. First, a bit of background.

YAML, as most of you will know, is a markup language (or rather, "YAML Ain't Markup Language") used prominently in Ruby applications for configuration files and for some types of object persistence. _why, the author of Ruby's YAML support, originally wrote YAML parsers and libraries in the language of the target platform; in other words, the original YAML parser was pure Ruby code, written to use Ruby's compiler-compiler library RACC. Now Ruby's troubles with performance are fairly well-documented, but these issues were considerably more pronounced when doing such an intensive process as parsing a large YAML file. _why's solution was to write a new C library for parsing YAML called "syck". Syck did two things: first, it sped up YAML parsing considerably and allowed many languages to use the same parser via language plugin mechanisms; and second, it eliminated both the need for and the availability of a pure Ruby YAML parser.

Enter JRuby. With YAML now being parsed almost exclusively using the C-based syck library, we have been forced to use an older version of the RACC-based pure Ruby parser. When starting work on Rails (and really, when playing with making RubyGems work) the complexity of the YAML parser brought out some problems in JRuby. That was a couple weeks ago.

Almost all of those problems have now been solved.

Our StringIO library (again, Ruby uses C code for StringIO to improve performance...see a pattern forming?) had been tested using all available test cases, but unfortunately those test cases did not cover the simple cases. When yaml.rb (the pure Ruby YAML parser we're using) started to make heavy use of StringIO, failures showed up. Tom Enebo is currently working on fixing the last of those failures, writing more extensive test cases at the same time. However, yaml.rb also contains a "stripped-down" version of StringIO for its own use. During my continued testing, I have been forcing it to use that version while Tom completes his fixes.
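For anyone who has not used it, StringIO wraps a String in the familiar IO interface. The sketch below shows the kind of basic reading and writing a YAML parser leans on; it uses the StringIO bundled with standard Ruby, and the sample strings are made up for this example.

```ruby
require "stringio"

# Reading: treat a String as if it were an open file.
input = StringIO.new("first line\nsecond line\n")
first = input.gets   # reads up to and including the first newline
rest  = input.read   # reads everything that is left

# Writing: collect output in memory instead of on disk.
output = StringIO.new
output << "key: "
output.puts "value"
written = output.string

puts first, rest, written
```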

Other problems ranged from interpreter bugs (variable scoping, throw/catch not working, etc) to parser bugs (JRuby's parser did not take into account an "eval" called from within a block, which causes variables to be handled a bit differently). Those issues are now sufficiently resolved so that YAML does not show failures.
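The "eval called from within a block" case is easy to reproduce in ordinary Ruby. This is only an illustrative sketch with made-up variable names, not the code from yaml.rb: the string passed to eval has to resolve both the block parameter and a variable from the enclosing scope.

```ruby
# `offset` lives in the enclosing scope; `n` exists only inside the block.
offset = 10

# The eval'd string must see the block parameter n...
doubled = [1, 2, 3].map { |n| eval("n * 2") }

# ...and also variables captured from outside the block.
shifted = [1, 2, 3].map { |n| eval("n + offset") }

p doubled
p shifted
```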

So back to Rails. Where the "generate" script originally got to the point of parsing database.yml and blew up, it now successfully parses that file and continues on. The next task in the initializer's "initialize_database" step is to actually instantiate an ActiveRecord adapter based on database.yml. This is where the current failure lies, and where my attention will be focused.
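As a rough picture of what that parsing step produces, here is a sketch using the YAML library bundled with current Ruby rather than the RACC-based yaml.rb discussed above. The file contents are hypothetical; the keys follow the usual Rails layout, but the values are invented.

```ruby
require "yaml"

# Hypothetical database.yml contents; the values are made up for this example.
database_yml = <<~YML
  development:
    adapter: mysql
    database: myapp_development
    host: localhost
YML

# Parsing yields nested Hashes keyed by environment name.
config  = YAML.load(database_yml)
adapter = config["development"]["adapter"]

puts "would instantiate the #{adapter} adapter"
```

The adapter name pulled out of the parsed Hash is what drives the ActiveRecord adapter instantiation that follows.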

So to recap Rails progress, the "generate" script's call to the railties initializer successfully runs up to ActiveRecord instantiation, as well as successfully running a number of other initialize tasks. It's getting closer every day.

IRB

Oh, IRB, how we love thee. For those unfamiliar, IRB is the "interactive ruby" shell where you can enter in line-by-line Ruby code and immediately see results. Multi-line constructs like classes, methods, and modules are handled very elegantly, and therefore you can test some fairly complex bits of Ruby quickly and easily. It's a wonderful interactive environment for testing, learning, and experimenting with Ruby.

Unfortunately, it doesn't run under JRuby.

IRB is a very complicated beast. Running IRB results in almost every aspect of the underlying interpreter getting a good pounding; the parser is brutalized for parsing small snippets of code, the evaluator must translate that code into appropriate state changes, and any aspect of the Ruby language must be instantiable and callable interactively. Beyond even the Ruby aspects, IRB provides line-editing capabilities, tab completion, and command history features. Naturally, this presents many challenges for JRuby, and the ability to run IRB would be a huge demonstration of JRuby's maturity.

Running IRB under JRuby originally just blew up immediately; there were core bugs in the libraries and interpreter that prevented early stages of IRB's startup from completing successfully. Many of those issues were the same ones fixed for Rails' "generate" script, such as the parser/block issues and many interpreter bugs. Today I fixed another issue affecting both Rails and IRB, where throw/catch was not correctly passing back the symbol thrown. I was working on "generate", but remembered that I had stopped previous IRB work because of an apparent throw/catch problem.
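For reference, this is how throw/catch is supposed to behave in Ruby. The sketch uses made-up data: the thrown symbol has to find its way back to the matching catch, which then returns the value passed along with it.

```ruby
# throw unwinds to the catch with the matching symbol and hands back a value.
found = catch(:found) do
  [4, 8, 15, 16].each do |n|
    throw :found, n if n.odd?
  end
  :nothing_found
end

# When nothing is thrown, catch simply returns the value of its block.
not_found = catch(:found) do
  [4, 8, 16].each do |n|
    throw :found, n if n.odd?
  end
  :nothing_found
end

p found
p not_found
```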

So I took a break from Rails and attempted to start up IRB.

C:\rails>jruby C:\ruby\bin\irb
irb(#<IRB::WorkSpace:0x5a9c6e>):001:0>

To my amazement, IRB successfully started up. Although hopeful, I had always worried that there were core requirements of IRB that could never be satisfied by JRuby, and that even starting it up would never be possible. Seeing the IRB prompt come up successfully was a huge relief to me and an unexpected nugget of joy. I'm so glad it happened in the morning; I'll be glowing all day.

Now don't get me wrong. IRB still doesn't work right. I naturally proceeded to type in the beginning of a class definition, and IRB blew up immediately after hitting enter. I never expected the prompt would just start working, and the blow up doesn't temper my joy in any way. There's still more work to do, but this is a very exciting milestone in my book. I now believe without a doubt that we will get IRB to run. The implications of successfully running such a complicated script in JRuby are tremendous, and finally reaching this milestone has made my day.

The Future

Ahh, the future. Such a magical time. If not for the promise of the future, what point would there be in writing software? Truly, my greatest motivation for rolling out of bed each day is the possible future I will be walking into.

My hopes for JRuby's future are starting to take shape.

Recently, I encountered more issues with JRuby's performance being a bit lacking. Actually, let's just say it: JRuby is really slow right now. A microbenchmark recently posted to the ruby-talk mailing list implemented a brute-force Sudoku-solving algorithm. The original poster compared Ruby's performance to native C code; where the C code took seconds to run, the Ruby version of the algorithm took over half a minute.

Again, Ruby's struggles with performance are widely known. It's also obvious that Ruby's creators and developers are aware of these issues, since many core libraries are implemented in C and since Ruby 2.0 will boast a new interpreter and Virtual Machine as well as many VM features comparable to those in Java and .NET's runtimes.

Naturally, curious how JRuby would perform on this benchmark--and with full awareness that JRuby's performance is far from spectacular--I ran it and waited for a result.

And waited. And waited.

After a few minutes, I killed the VM, assuming that there was something broken in JRuby that prevented the algorithm from terminating successfully. I did a bit of debugging, traced into the very depths of JRuby's evaluator, and found nothing. As far as I could tell, progress was being made and the algorithm was moving forward. My findings warranted another run.

JRuby took over 800 seconds to complete the benchmark, around 13.6 minutes.

I will admit the realization that JRuby is an order of magnitude slower than C Ruby came as a bit of a shock to me. There are many definitions of slow; Ruby's "slow" is for most purposes "fast enough". Java's "slow" is in most cases much faster than is required, and in some cases faster than native C code. JRuby's "slow", it would seem, is a different beast altogether.

However, I am reactionary. Such disappointment immediately sours my stomach and gives me a headache. Could I have been wrong about JRuby's potential? Will this never work?

Performance is a unique problem in JRuby. Since we do not have the option of running native C code for any libraries, and since reimplementing core features in pure Java is both time-consuming and not in the spirit of what we're trying to accomplish, performance concerns have taken a back seat to functionality, compatibility, and correctness. Performance problems are not easily isolated, and never easily solved. However...I love a challenge.

The redesign of JRuby's interpreter over the past several months has been focused on two things: enabling missing features like continuations and green threading; and providing a Java-friendly design that could more easily transition to optimized interpreters and eventually bytecode compilation. What I've essentially been doing amounts to painstaking refactoring of all JRuby's functional guts, from the AST-walking evaluator to the class and object implementations to the threading, framing, scoping, and call mechanisms. All these areas were originally written and designed based on Ruby 1.6 code; there were flashes of OO genius, but the mostly procedural approach of Ruby's C code shined bright throughout JRuby. As you might guess, this is certainly the easiest way to port a language interpreter to any platform: reimplement the same code in your target language of choice. As you might also guess, this does not generally take advantage of that target language's best features.

In JRuby's case, a major missing piece was longjmp, C's function for leaping from one call stack to another. longjmp is heavily used (understatement!) in Ruby for everything from threading to continuations to exception handling. The lack of longjmp in Java leaves a very large hole when porting Ruby's C code. Many creative attempts to mimic longjmp were therefore made: exception-based flow control allowed loop keywords like 'next' to throw control back to a higher-level loop construct; a recursive evaluator repeatedly called itself for new AST nodes, ever-deepening the stack but always keeping lower nodes within the context of higher ones; exception-based "return sleds" allowed returns to bubble their results back up to the appropriate recipient; and on and on. Many of these approaches were extremely novel, worthy of their own papers and accolades. Indeed, several of them have shown up in academic papers and PhD theses in some form or another.
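To give a feel for the exception-based "return sled" idea, here is a toy sketch written in Ruby itself for brevity. JRuby's real implementation is Java code and certainly differs; the class and method names below are hypothetical.

```ruby
# Hypothetical exception class that carries a return value up the stack.
class ReturnJump < StandardError
  attr_reader :value

  def initialize(value)
    super("non-local return")
    @value = value
  end
end

# Plays the part of the interpreter's method-call machinery: run the body
# and turn a ReturnJump back into an ordinary return value.
def call_method(body)
  body.call
rescue ReturnJump => jump
  jump.value
end

method_body = lambda do
  [1, 2, 3].each do |n|
    # Stands in for "return n * 10" executed deep inside nested evaluation.
    raise ReturnJump.new(n * 10) if n == 2
  end
  :fell_off_the_end
end

returned = call_method(method_body)
p returned
```

The exception escapes every intermediate frame on its way up, which is the one aspect of longjmp that exceptions can imitate.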

Unfortunately, these features still tried to mimic the way C code worked, which was never 100% achievable. longjmp is an extremely powerful tool that requires the capability to store, retrieve, and manipulate your own call stack. Java provides no such capability, and while exceptions do allow us to escape the stack--mimicking one aspect of longjmp--there is no ability to restore that stack. A new approach was needed.

Enter the JRuby redesign. In October of 2005, I began the process of unraveling JRuby's code with a number of design goals in mind:

1. The new interpreter must be iterative, rather than recursive, so escaping and restoring the stack are possible. This would enable continuations and green threading.
2. The JRuby code must be drastically cleaned up and simplified, and there must be a clear separation of concerns to allow future implementations of key subsystems.
3. JRuby must continue to work with no functional regression throughout this redesign.

The first two points are fairly straightforward. The new interpreter design enables us to provide all the required Ruby language features in a much more Java-friendly way. It also helps qualify JRuby as a real "VM", or at least a micro-VM layer on top of the JVM. I'm planning to start documenting this new design (since it has evolved over time and out of necessity), but it's fairly well-understood within the JRuby team.

The third goal, however, continues to be a serious pain-in-the-ass.

Ruby as a language and as a platform is poorly-specified. There is no conclusive specification; indeed the best spec is the incomplete (but still astounding) documentation provided by Dave Thomas's "Pickaxe" book, Programming Ruby. Given this lack, the only way a Ruby interpreter can be determined to conform to the Ruby Way is by actually running it. Primarily, this means unit tests.

The Rubicon project was spawned out of a set of unit tests Dave and the PragProg folks created while writing the first edition of "Pickaxe". It tested out many of the features and scriptlets demonstrated in the book, and provided a wide but fairly shallow set of test cases to exercise Ruby features. Rubicon today exists as the "rubytests" project on RubyForge, where it has languished in recent years. Nobody likes writing tests after-the-fact, and the value of such tests is dubious.

JRuby makes heavy use of Rubicon, as well as some of Ruby's and our own internal unit tests, to ensure compatibility and prevent regression. Anything not covered by those tests or by applications that run on JRuby remains an unknown, untested area until discovered by a new script or application. However, they're the best we've got right now. By implementing a Ruby that can run all or most of those tests as well as a few key applications, we can cobble together over time a pretty good Ruby. Current efforts to run more advanced applications like Rails or IRB are driven by the fact that those test cases do not exercise enough of JRuby to be conclusive, and the more we run the better we get.

When the redesign began, it was immediately apparent that without continually running those test cases and applications we would be diving down the rabbit hole with no insurance; refactoring an entire interpreter is obviously extremely dangerous without a language spec or appropriate unit tests. Goal #3 above became an absolute necessity.

As a result, after every major VM change these past months we have continued to run test cases and scripts to ensure that regressions are prevented or kept to a bare minimum. JRuby's codebase is not terribly large; a wholesale refactoring would not normally take months to complete. However with the added restriction that it must continue to work, the time-to-implementation increased tremendously. In addition, and of primary importance to performance, tradeoffs had to be made between "doing things right" and "doing things fast". Things had to get worse before they could get better.

JRuby's performance was no great shakes before the refactoring, but the 0.8.2 release appears to be as much as 30% faster than the current HEAD version in some scenarios. While such a decrease in speed is worrisome, it comes with the fact that the new VM will enable performance-enhancing optimizations in ways the original never could.

My interest in those enhancements was revitalized by the poor benchmark results. Perhaps one of the most important is JRuby's eventual ability to compile Ruby code into Java bytecode. After a long discussion with my good friend Kelly, I believe we have devised a way to make compilation happen without sacrificing goal #1 above. More on that in a future post.

I also started looking to isolate the performance problems. Immediately, I started looking at the redesigned interpreter engine. To make a long story short, the current interpreter has more overhead than the original because rather than recursing for additional nodes in the AST, it "trampolines" from one to the next. Each node encountered is associated with a number of instructions; those instructions are executed in sequence, allowing the Java call stack to remain at the same level and enabling the potential for continuations and green threading (since we can now step away from one instruction sequence and into another, effectively doing what longjmp does for C). This flexibility initially comes with decreased performance since the instruction fetch cycle, decoding that instruction into sub-instructions, maintaining a cursor within the AST and instruction sequence, and double-dispatching for each instruction all add overhead.
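The general shape of an iterative evaluator can be sketched in a few lines. This is a toy written in Ruby for a made-up two-node AST (literals and addition), not JRuby's actual instruction set or design; every name in it is hypothetical. Instead of recursing into child nodes, it keeps pending instructions on an explicit list and loops over them, so the host call stack never deepens.

```ruby
# Toy AST: [:lit, number] or [:add, left_node, right_node].
def evaluate_iteratively(root)
  pending = [[:visit, root]]  # instructions still to run (used as a stack)
  values  = []                # intermediate results

  until pending.empty?
    instruction, node = pending.pop
    case instruction
    when :visit
      case node[0]
      when :lit
        values.push(node[1])
      when :add
        # Schedule the children first, then the addition that combines them.
        pending.push([:apply_add, nil])
        pending.push([:visit, node[2]])
        pending.push([:visit, node[1]])
      end
    when :apply_add
      right = values.pop
      left  = values.pop
      values.push(left + right)
    end
  end

  values.pop
end

tree = [:add, [:lit, 1], [:add, [:lit, 2], [:lit, 3]]]
total = evaluate_iteratively(tree)
p total
```

Because all interpreter state lives in `pending` and `values` rather than on the host stack, it could in principle be saved and resumed later, which is the property continuations and green threads need. The bookkeeping on every step is also where the extra overhead comes from.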

Small changes in the interpreter can have a drastic effect on performance, and so to put my mind at ease I went ahead with a couple optimizations I had put off. Specifically, I reworked the way flow-control, return values, and exception handling worked, reducing the number of calls and objects created. The results were very promising: a subset of the sudoku benchmark improved by roughly 9%. Since this small change only represented one tiny aspect of the interpreter, my fears have been temporarily put to rest.

Based on my reexamination of the interpreter and on the results of this small optimization, I do not believe that JRuby's performance issues will be a problem much longer. I'm also confident that we can begin improving performance rather than degrading it, since the current interpreter is only a few steps off from its eventual structure. Combining future interpreter optimizations with potentially compiling many or all pure Ruby methods to Java bytecode means we should see drastic improvements in the coming months. Will we ever run as fast as Ruby 1.8 or 2.0? Will we run faster? Time will tell.

Wednesday, February 22, 2006

Migrating from Ant to Rake?

I will admit it. I'm fed up with Ant.

My current charge has me as architect for a number of larger applications and the platform on which they run--perhaps over 1Mloc total from DB to front-end. These applications must be deployable across several environments (dev, integ, qa, ua, prod, prod backup). Each application pushes its own configurations out to the app and web servers, restarting them as appropriate. In addition, our reports are automated via a similar process. All that's required to take a bare web cluster, app cluster, and report cluster from nothing to fully functional is a single build command. The entire process, from beans to nuts, is automated through Ant.

Some particularly nice nuggets:

- Apache configurations, based on a generic template, are generated with environment-specific settings at build time. If you are building to the QA environment, QA servers, urls, and filesystem paths are inserted into the template. There is also a template for installing an app-wide or site-wide outage page.
- WebLogic configurations are generated the same way, with a base template filled out with environment-specific details.
- In both WebLogic and Apache cases, the generated configs are pushed out to the servers as part of the build. In this way, the actual app configuration is versioned along with the apps themselves, and rolling back to a previous release automatically downgrades server configurations.
- Via tools like net.exe and sc.exe in Windows, the build logs on to remote shares and manipulates remote services. This is especially important for WebLogic, where many system classpath and domain-wide configuration settings require a restart of the administration server.
- In order to support "throwing it over the wall" to our client, the build command has been made extremely simple:
ant -Denvironment=qa clean build deploy.all
...will completely build and deploy any one of the applications (or the platform itself) to the QA environment.

And so on. All told, it's a beautifully automated, extensively documented build process that reduces even the most complicated build tasks to a single target. It's the epitome of what an Ant script should be able to do.

It's also over 5000 lines of grotty XML.

Or at least I should say, it's over 2000 lines for one of the applications, while the platform's script is just over 1000 lines, and another smaller application is in the 1100 range. All told, there's at least 5000 lines of build script to maintain, though there's obviously some duplication and a lot of mirroring across those scripts. And unfortunately, my case is far from unique.

Over the past year, I've made multiple efforts to find a way to rewrite or simply refactor these scripts, ranging from genericizing tasks that appear again and again across builds to breaking up larger scripts into smaller ones for specific subsystems. In every case, the end result is no simpler and no easier to maintain than the original; many genericizing attempts actually resulted in more code rather than less, since clients of that genericized code must pass more state and more configuration along. It seems that for these projects and these applications, what we have is as good as it gets.

At least, as good as it gets with Ant.

Ant itself suffers from a number of flaws that I don't need to discuss here. I will, however, call out a few specifics:

- The declarative vs procedural debate rages endlessly; however being a programmer I think procedurally, and most build tasks I wish to automate are procedures rather than simple relationships between disconnected tasks. An enormous amount of overhead is spent in Ant scripts bridging this gap between declarative and procedural worlds, and some of those hacks are seriously ugly.
- Ant does not provide good support for creating "template" build targets, where various elements and tasks within that target are configurable at runtime. If, for example, I have the same rough target for installing an Apache outage-page configuration, I should be able to create a generic version of that target that takes in app-specific tasks and parameters and modifies its behavior accordingly. Ant's minimal support for "params" and "properties" in ant and antcall targets works fine for passing along configuration (aside from the requisite XML overhead from entering <param name="someName" value="someValue"> for every single parameter), but it does little to actually change the behavior of the target itself. You can't change the tasks called and you can't provide alternative targets to execute.
- Ant is extremely poor at sharing across build files. In one of the refactoring efforts, we broke out build targets by subsystem, with EJB stuff in one file and Web stuff in another. Unfortunately, those targets depended on many of the same configurations, tasks, and targets. Making n build scripts work together as a cohesive whole was exponentially more difficult than making a single build script work well.
- Ant, being XML-based, is grossly verbose. At least 75% of those 5000 lines is due to XML bloat.

I'm sure readers will have any number of alternative solutions to issues I list above...in some cases, you may be able to solve many of Ant's deficiencies. However I would wager a guess that no amount of hacking or refactoring will be able to address all the issues with Ant, and I think there's a body of work that agrees with me.

So what is the alternative?

Maven provides some enhancement to the build process, specifically in the area of managing dependencies, subsystems, and cross-project builds. It also provides a more procedural language, Jelly, which can call and be called from Ant scripts (though Jelly is still XML-based and suffers from the same verbosity). Maven, like other solutions, might provide relief for a few of Ant's failures, but it introduces many more of its own. These applications are also not highly componentized; the size and complexity is almost entirely from application-specific business and presentation logic. While the applications themselves could (and perhaps should) be better componentized, Maven is not a useful or realistic option in the near future, and I'm dubious as to whether it would reduce or increase overall complexity. The old standby Make is of course another option, but there are reasons people use Ant instead.

Here's what I need:
- A procedural build process that understands declarative dependencies
- An elegant and simple language that I can easily write and others can easily read
- Tight integration with Java and awareness of how Java builds must proceed
- Reuse of existing build utilities, including existing Ant tasks and javac support in the JVM

I believe that Ruby's build tool "Rake" is the answer I've been looking for.

Rake was created for many of the same reasons Ant was created, primarily because Make's many faults and deficiencies--however minor--could no longer be overlooked. Rake provides a very procedural way to run builds, but also has awareness of dependencies and task ordering. Most importantly, Rakefiles are simply Ruby code, and so anything you can do in a Ruby script you can do in a Rakefile. The ability to actually use an "if" statement or a loop can't be overstated here; anyone who's tried to do the same operations in Ant fully realizes how difficult it can be. So out of the box, Rake easily fulfills the first two requirements above. I believe it's time to help Rake realize the last two requirements, and JRuby will make that happen.
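Here is a small sketch of what that looks like in practice, assuming the rake library is installed. The task names, environment list, and logged messages are all hypothetical stand-ins for real build steps. One loop stamps out a deploy task per environment, each task declares its dependency on the build, and an ordinary "if" changes behavior for production.

```ruby
require "rake"
extend Rake::DSL   # ensures `task` is available outside a Rakefile

deploy_log  = []                 # records what ran, in place of real work
target_envs = %w[dev qa prod]    # hypothetical environment names

task :build do
  deploy_log << "build"
end

# One templated task per environment, each depending on :build.
target_envs.each do |env|
  task "deploy_#{env}" => :build do
    if env == "prod"
      deploy_log << "install outage page"
    end
    deploy_log << "deploy to #{env}"
  end
end

Rake::Task["deploy_qa"].invoke
p deploy_log
```

In a real Rakefile the invoke line would be dropped and the task run from the command line, for example with `rake deploy_qa`.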

JRuby, as you may all know, is an implementation of Ruby that runs on the JVM. Originally written to match Ruby 1.6, it has recently come again under heavy development to finally achieve 1.8-compatibility. In addition, we have started to run the "big ticket" Ruby applications like Rails in an effort to flush out remaining interpreter incompatibilities. Rake is one of those applications.

Currently, there is still work to be done to get Rake working with JRuby. However, while Rake is really an outstanding work of simplicity and a great example of Ruby's power, it does not in my estimation do anything crazy with Ruby...or at least, nothing crazy that JRuby can't support in the near term. Among many other JRuby-related projects, I intend to get Rake working, and I believe I can.

So if we set aside the current compatibility concerns, we can start to see the potential of Rake+JRuby for building Java applications. First and foremost, Rake running within the JVM would have access to all the same libraries and tools that Ant uses. Calling out to javac, hitting databases with JDBC, running XDoclet or EJBGEN or Annotation-based tools--all will be simple to do from within a Rake+JRuby Rakefile. Second, and perhaps more compelling, there's no reason why a Rakefile couldn't use existing Ant tasks and tools directly. A Rakefile could either transparently wrap existing Ant builds or could directly call Ant tasks (providing an appropriate execution context, of course). In a perfect world, every target in my existing Ant-based build process would be 100% supported in my future Rakefile, with a minimum of wrapping fuss.
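As a rough idea of what "calling out to javac" might look like from Ruby, here's a small sketch. The helper name javac_command and the file paths are hypothetical; the -d and -classpath options are standard javac flags. It only assembles the command line, so it runs under any Ruby without a JDK present.

```ruby
# Hypothetical helper: build a javac command line as an argument array.
# Nothing is executed here; the caller decides how to run it.
def javac_command(sources, dest_dir:, classpath: [])
  cmd = ["javac", "-d", dest_dir]
  # File::PATH_SEPARATOR is ":" on Unix and ";" on Windows,
  # which matches what javac expects for classpath entries.
  cmd += ["-classpath", classpath.join(File::PATH_SEPARATOR)] unless classpath.empty?
  cmd + sources
end

puts javac_command(
  ["src/Foo.java", "src/Bar.java"],          # made-up source files
  dest_dir: "build/classes",
  classpath: ["lib/a.jar", "lib/b.jar"]
).join(" ")
```

Inside a Rake task, the array could be handed to Rake's sh helper, e.g. sh(*javac_command(...)). Under JRuby the same task could presumably call Java classes directly rather than shelling out, but I haven't shown that here.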

So what remains to be done for Rake to replace Ant in my world? Surely the JRuby issues are the first things to resolve; this is of course why JRuby is number one on my spare-time project list. Again putting that aside, there are three areas that would need some additional work for Rake:

1. Rake must integrate seamlessly with Java and be aware of its idiosyncrasies, from classpath/classloader management to compilation quirks. As Ant is able to do, Rake must at a minimum handle the basic Java build operations seamlessly.
2. Existing tools, Ant tasks, and frameworks frequently used in builds must either be wrapped as appropriate or there must be a simple, elegant way to make those tools, tasks, and frameworks accessible from within Rakefiles. I do not believe there should be any concerted effort to reimplement or wrap existing code, if there is a way to make that code accessible without excessive overhead.
3. Finally, Rake must represent a reasonable migration path for existing Ant-based builds from a configuration management perspective. Specifically, current Ant use cases should, as much as possible, have identical or very similar analogs in the Rake world. Rake and Rakefiles currently look and feel (from the outside) very similar to Make and Makefiles for this exact reason. Similar care must be taken on the Ant side.
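For items 2 and 3, the cheapest migration path I can imagine is a Rakefile that simply delegates to the existing Ant build, one Rake task per Ant target. This is only a sketch: the target names and the ant_command helper are hypothetical, and it assumes the ant launcher is on the PATH (-f is Ant's option for choosing the build file). The tasks are defined here but not run.

```ruby
require "rake"

ANT_TARGETS = %w[clean compile dist]   # hypothetical targets from an existing build.xml

# Hypothetical helper: command line that runs one Ant target.
def ant_command(target, buildfile = "build.xml")
  ["ant", "-f", buildfile, target]
end

# Define ant:clean, ant:compile and ant:dist, each shelling out to Ant.
namespace :ant do
  ANT_TARGETS.each do |target|
    desc "Delegate to the Ant target '#{target}'"
    task target do
      sh(*ant_command(target))   # sh is Rake's helper for running a command
    end
  end
end
```

With something like this in place, `rake ant:compile` would behave like `ant compile`, and individual targets could be rewritten as native Rake tasks one at a time.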

I believe this is all possible, and very likely possible in the near future. JRuby has been improving by leaps and bounds over the past year, and the market is ripe for an alternative to Ant. Even more than Rails, the ability to build Java applications using Rake is very exciting to me...if only as a way to escape my 5000-line build script hell.

Now if only I had another 8 hours every day to spend exclusively working on this stuff.

Tuesday, February 21, 2006

Making Progress on Rails

I've been making good progress on my end of the Rails-on-JRuby work. I have been focusing on getting the "generate" script working, and as a result fixing multiple bugs and minor issues as they come up. Here's an update.

The most recent issue, now apparently mostly fixed, involved evaluating a Ruby script from within a block. The JRuby parser was not wired to understand parsing from within scopes other than the top-level Object, and so it declared and defined certain variables incorrectly. I modified the parser to allow specifying that it will run from within a block, and the issues have been remedied.
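To illustrate the kind of code involved (this is not the actual Rails code that tripped the bug, just a small stand-in), here is a string being evaluated from inside a block. The evaluated code has to see the block's own variables, so the parser needs to know it is not at the top level.

```ruby
RESULTS = []

[1, 2].each do |n|
  doubled = n * 2   # local to this block
  # The string is parsed at run time, yet `doubled` and `n` must resolve
  # against the enclosing block's scope rather than the top-level Object.
  RESULTS << eval("doubled + n")
end

p RESULTS   # => [3, 6]
```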

With this fix, here's a rough description of the progress of the generate script:

1. Execution proceeds into the initializer (vendor/rails/railties/lib/initializer.rb). In the scenario I'm executing (generate script with no parameters) the process method is eventually invoked.
2. The load path is set (Initializer#set_load_path) without any issues.
3. Connection adapters are set (Initializer#set_connection_adapters).
4. All frameworks are required in (Initializer#require_frameworks). This was where most of the failures and fixes came into play, but it now executes successfully.
5. The current environment is loaded (Initializer#load_environment). I'm using the default environment right now.
6. Initialization of the database begins to run, but appears to get stuck in an infinite loop during YAML parsing of the default database.yml file. I have not investigated this issue yet, and disabled database initialization to continue testing.
7. The logger initializes without any issues (Initializer#initialize_logger).
8. The framework logging and views are initialized (Initializer#initialize_framework_logging and Initializer#initialize_framework_views).
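For context on the database step in the list above: it depends on Ruby's YAML library turning database.yml into nested hashes keyed by environment. The sketch below shows that parsing step in isolation under standard Ruby. The file contents and the database_settings helper are made up for illustration, and this is not the Rails initializer's own code.

```ruby
require "yaml"

# Hypothetical stand-in for config/database.yml; the values are made up.
DATABASE_YML = <<~CONFIG
  development:
    adapter: mysql
    database: myapp_development
    username: root
  test:
    adapter: mysql
    database: myapp_test
    username: root
CONFIG

# Parse the YAML text and return the settings hash for one environment.
def database_settings(yaml_text, environment)
  YAML.load(yaml_text).fetch(environment)
end

settings = database_settings(DATABASE_YML, "development")
puts "#{settings['adapter']} / #{settings['database']}"
```

A file this simple is a handy smoke test for an interpreter's YAML support, since a hang here blocks everything after it.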

At this point, the next step is initializing routing (presumably request routing; I'm not any sort of Rails expert yet). This fails with what appears to be some scoping issues, and I have not gotten further at the moment.

Tom Enebo, the other main JRuby developer, is also making progress on actually running Rails in the simplest of deployment scenarios. He's currently beefing up our Socket implementation so that WEBrick will run correctly. The current sticking point is our less-than-great support for Ruby's IO; namely, we do not support select correctly (or perhaps at all).
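For anyone unfamiliar with select, here's a minimal sketch of what it's expected to do in standard Ruby, using a pipe instead of a socket so it runs anywhere. The readable_ios helper is my own name for illustration; IO.select and IO.pipe are core Ruby. A server would use the same call to wait on its sockets.

```ruby
# Wait up to `timeout` seconds and return the IO objects that are ready to read.
def readable_ios(ios, timeout)
  ready = IO.select(ios, nil, nil, timeout)   # returns nil if the timeout expires
  ready ? ready.first : []
end

reader, writer = IO.pipe

# Nothing has been written yet, so select should time out.
puts "ready before write: #{readable_ios([reader], 0.1).length}"

writer.write("ping")

# Now there is data waiting, so the reader should be reported as readable.
puts "ready after write:  #{readable_ios([reader], 0.1).length}"
```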

I think we've made great progress on both fronts, despite ongoing issues. It's very heartening that for the generate script all the libraries are required successfully and several subsystems initialize without any problems. There's still quite a bit of work to be done, but we're definitely getting there. I'll post more updates as they come in!

Friday, January 27, 2006

Bogle’s Blog » Ruby On Spring

I heard from a friend of mine at Object Partners that JRuby was being used with Spring and Hibernate for rapid prototyping of applications, and today I stumbled across a blog entry by the guy that's doing it.

Adam Waldal provides a short description of the work they're doing with JRuby. They've got it wired into Hibernate for database access and Spring for reusable services. The whole thing is fronted by Rubyfied JSPs, and it's truly a breath of fresh air compared to the typical top-heavy J2EE applications.

It's encouraging to see JRuby put to such good use!

Thursday, January 26, 2006

Twin Cities Ruby Users Group meeting #2

We're on for the second meeting of the Twin Cities Ruby Users Group (official name pending). The details are available on the group's home page, but the basics are that it's at Digital River in Eden Prairie, Minnesota, on January 31st at 7:30PM CST. We're hoping to have a big turnout, so if you're in the area, please come!

If you would be interested in presenting at one of our meetings, let me know and I'll put you in touch with the appropriate folks. Thank you!

Friday, January 20, 2006

We're Going to San Francisco

I think it's perfect to start this blog with some good news: Tom Enebo and I have been accepted to present JRuby at this year's JavaOne conference. It will be a technical session (yay!) and we hope to pack in as much practical and technical information as possible on all the latest JRuby developments.

Over the next several weeks I'll be posting a few updates on my work with JRuby, mainly focusing around the redesign of the core interpreter and how it plays into Tom's and my plans for JRuby's future. I'll also provide some context by describing the practical details of the redesign, from its earliest stages back in September of 2005.

For those of you that can't wait until Sun's official JavaOne catalog, here's the full abstract we submitted.

JRuby: Bringing Ruby to the JVM

JRuby is an implementation of the Ruby programming language targeted at the Virtual Machine for the Java™ (JVM) platform. Ruby is a dynamically-typed object-oriented language with support for blocks, continuations, and all the usual OO trimmings. JRuby aims to not only support the full Ruby platform, but also provide an enhanced m:n threading model, a heap-allocated “stackless” call stack, AOT and JIT compilation of Ruby to bytecodes, and extensive, pervasive integration between Ruby and Java technology.

Ruby has become a very popular language recently, in part because of the popularity of the Rails web framework, but also due to the careful, cautious evolution of the language and libraries. Because of this popularity, many powerful tools and frameworks are available that would fit well into existing Java applications. We plan for JRuby to run all the high-visibility Ruby applications in concert with existing Java applications and frameworks. Imagine Rails with JDBC ActiveRecord connectors, session or entity beans implemented in Ruby, middle-tier Ruby-based business rule engines, or building your application using the elegant Rake build tool. JRuby will help both Ruby and the JVM language benefit from all these possibilities.

The JRuby session will show you how to apply Ruby on the JVM to common use-cases. We will also show off projects that utilize JRuby and demonstrate the most compelling capabilities offered when Ruby and Java work together. You only need an interest in alternative JVM languages to come away with an appreciation of JRuby’s potential.

The final abstract may differ somewhat from this, but there ya go! I hope to see you all at JavaOne!

Saturday, January 14, 2006

The First Day

I've been up for 24 hours, working on JRuby for about 20 of those. I thought it might be a good time to start a tech blog of my own.

More to come.