Headius

Sunday, August 27, 2006

From Eclipse to NetBeans, Part 1

I am attempting to make the switch from Eclipse to NetBeans, and this is a raw dump of the pros and cons so far during that process. Note that these are not meant to question design decisions behind various NetBeans features; they are simply differences that have made the conversion harder or easier.

Stats

Opteron 150 w/ 2G memory
Ubuntu Linux 6, AMD64
Java 5, x86_64 version
NetBeans 5.5b2

The Bad

I'll start off with the bad, because these are more obvious. It's much harder to list good things, since the "best" features will be those that are intuitive to an Eclipse user and that require no re-learning. Please keep in mind that I'm a rank newbie when it comes to NetBeans, but I'm a pretty solid developer. In other words, if I have trouble with these things, most new users will too. There will be more to come, and I'm willing to discuss these with any of the core NetBeans folks at length.

The list, in the order in which I encountered them this evening:

Why isn't antialiasing turned on by default, and why is it a bit hard to find? I don't think of this as an editor setting...I want antialiasing on everywhere. To be honest, I don't understand why this is even a setting. Why would I choose the "make everything uglier" option?
SVN should be available in the default install. Perhaps I'm biased because I need it, but I really, really loathe having to install additional plugins whenever I set up a new workspace.
The concept of multiple workspaces with their own plugins and settings is VERY attractive. NetBeans seems to lack this concept.
I have always preferred to keep the concepts of repository management and project sources separate, rather than making a hard link in my IDE between the two. NetBeans appears to require me to go through a lot of anguish each time I want to switch a project from trunk to branch or back.
Also, a standard has developed in the Subversion world of using top-level "trunk", "branches", and "tags" dirs, and at least one SVN plugin for Eclipse uses that convention to make switching branches, merging, and updating more intuitive. NetBeans should do the same, allowing the typical branch/tag operations against this de facto standard layout.
I tried to delete a CVS-based project to re-checkout via SVN, and it didn't appear to delete all the files successfully. I had to do it manually AND go back through the Subversion wizard again.
I hate wizards. Don't make me go through a wizard for every one or two settings when I could easily enter them all at once. Wizards like this are totally unhelpful.
Some time back, Eclipse made the wise decision to also show all non-source files in the source or "package" view. It was widely considered a very good idea, and I miss it. I'd like to have a single view that has source or "Project" smarts without masking files and dirs I really want to see.
Eclipse recently added a full-text search to their settings dialog that works very well for finding the location of hard-to-reach settings. I would have found two NetBeans settings much more quickly with such a search by entering "antialias" and "browser".
Browser selection should prefer the settings of the host platform (I'm on Ubuntu Linux).
NetBeans is a goofy name for an IDE. When I think NetBeans I think "beans for doing network stuff". Forte and Eclipse are better names, though completely undescriptive. Something along the lines of "UberIDE" would be even better. Names like Visual Studio, JBuilder, IDEA, and so on have far less ambiguity about what they mean. NetBeans seems misleading, and I doubt anyone would guess it's an IDE if asked.
Eclipse keybindings should include C-A-t (open a specific class) and C-A-r (open a specific file/resource anywhere in the project)
Eclipse provides options for sorting the Outline or "Members View" of a class by name or by visibility, which is nice.
Eclipse provides options to show packages in a hierarchy rather than as expanded names, which is very nice when expanded package names take up a lot of space (as in JRuby). There are also features to allow using a hierarchy but still collapsing empty packages, so I could have org.jruby as a top-level node (because the org package is empty) and then source files and subpackages under it.
Editors are anti-aliased, but nothing else is. Why?
Eclipse supports collapsing multiline /* */ comments, which is very nice for us since every JRuby file includes a reasonably long licensing block at the top.
Eclipse uses color, bolding, and italics better to differentiate different kinds of variables, methods, keywords, and type names. I would rather see type names, static variables and methods, constants, and fields offset in text than method calls and names. It appears that NetBeans by default only offsets the following:
- Comments: grey (I think green is better/easier to read, but that's a matter of opinion)
- Method names, both declarations and calls: bold black
- Keywords: blue
Perhaps accessibility plays a role in the more drab, colorblind-friendly defaults?
Idle/startup memory use seems a lot higher in NetBeans. Mine's idling at 399MB, where Eclipse hovered under 200MB most of the time with several other projects checked out.
FIXME-style task tags in comments are really, really nice, and I know a LOT of projects that use them (basically, EVERY project that uses Eclipse)
I would like to be able to manually clear search results from previous searches to eliminate text highlighting.
Being able to right click within the file and have SCM actions available in that menu is extremely nice, so I can go to an editor for a file I know I want to commit or compare and do that action right away.
Many files in my project do not have a Subversion submenu in any view...they have a CVS view, and it wants to try to add or delete them from CVS. This isn't even a CVS project, so I have no idea why the CVS submenu would EVER be appropriate to show up. As it is, a large number of files show the CVS submenu and can't be committed directly; I have to commit at a higher level. This is a major goofy bug, perhaps in the Subversion module.
NetBeans takes longer than Eclipse to start up.

The Good

It's only fair to sugar the salt a bit, since it's so far been a pretty good migration. NetBeans has come a long way in the past year or two, and I'm very impressed.

I'm absolutely stunned at how responsive the UI is most of the time. I frequently have to double-check that an operation has finished successfully because they happen so fast. Sluggish response was one of my biggest reasons for not using NetBeans in the early days.
I appreciate the fact that NB uses the ant script for builds. I think this is "The Right Way", though I have my doubts about Ant as the "Right Tool" for building software in general. Seamless maven2 integration might be a good thing to add.
The Runtime pane is nicer and includes more useful things than stock Eclipse provides.
The default monospaced editor font is much neater and more compact than most Courier fonts, which is what Eclipse uses by default. I dislike serifed fonts for code, even more than I dislike non-monospaced fonts.
Automatically updating the UI based on outside changes to the project's files and dirs is a no-brainer; I hate having to refresh in Eclipse.
Undoing all changes to an unsaved modified file causes it to be marked as unmodified again; this is a Very Good Thing.

Friday, August 18, 2006

To Multibyte, Or Not To Multibyte

We've been wrestling with parser speed this past week on the JRuby project, tweaking the lexer, fiddling with the grammar and parser generator, and micro-optimizing all the various support classes. None of those experiments have helped much; performance in each case improved by only a few percentage points.

We've also been wrestling with the issue of Unicode support, since Java supports it well and Ruby does not. We're caught between worlds here, not wanting to create an incompatible Ruby but realizing the absurdity of our lacking Unicode support under Java.

It seems that solutions for the two issues may be mutually exclusive.

Character Pain

After a recent speed comparison between Java, Ruby, C, and a few other languages erupted on the ruby-talk mailing list, it quickly became apparent how expensive writing UTF-16 character sequences to a single-byte encoding can be. The best optimization of the program on that thread pre-encoded and cached all strings to be written (none dynamically generated) as byte[], saving the cost of encoding them later during a stream write. Because of that version's success, I started to wonder what the cost of reading and writing char versus byte in Java might be. The results surprised me.

When I first suggested this comparison on the JRuby dev list, Ola Bini quickly tested out a version of the lexer that used only streams and byte[], rather than readers and Strings. With no other optimizations, that change improved the overall parse performance by almost 20% (Java 5 on Windows x86). Shocking, to say the least.

Given that surprising speed boost, I thought I'd run a few microbenchmarks on reading and writing bytes and characters.

The Test

The source files come in two flavors:

yes.txt, an ISO-8859-1-encoded file filled with "y" characters
yes2.txt, the same file encoded in UTF-16.

Eight scenarios were tested:

reading bytes straight out of the file
reading characters straight out of the file
buffered reading of bytes straight out of the file
buffered reading of characters straight out of the file
reading bytes from a byte array
reading characters from a byte array
writing bytes to a byte array
writing characters to a byte array

I didn't play with various types of streams much; I mainly just ran with a few basic ones I'm familiar with. If there's an optimal way to perform each of these scenarios, please let me know.

Each cycle was run 1000 times, reconstructing streams and readers each time. Each cycle read the equivalent of 10 million characters, which in the case of yes2.txt meant reading 20 million bytes.

All tests were run on an Opteron 150, 2.6GHz, running 64-bit Linux and 64-bit Java 5.

Results: yes.txt, ISO-8859-1, 10 million characters (10 million bytes)

I first ran against the single-byte version:

1000 direct byte reads from file: 22435
1000 direct char reads from file: 112372
1000 buffered byte reads from file: 22625
1000 buffered char reads from file: 112594
1000 buffered byte reads from array: 9477
1000 buffered char reads from byte array: 107975
1000 buffered byte writes to array: 8556
1000 buffered char writes to byte array: 16198

Ouch. In both buffered and unbuffered direct reads from a file, characters fare rather poorly, taking over five times as long. Note that buffering here didn't really help, since filesystem IO is apparently not a limiting factor on this machine.

Notice also how little character reads improved from an in-memory byte array. In this case, I had the code read from an InputStreamReader wrapped around a ByteArrayInputStream. It's certainly possible this number would improve if simply passing the byte[] directly to a String constructor, but the current code seems far slower than I expected.

Not terribly surprising is how much better character writes to a byte array performed. Down-encoding from UTF-16 to a single-byte encoding--especially when we're dealing with all ASCII characters--is pretty cheap. Still, it took nearly twice as long as writing bytes.

Results: yes2.txt, UTF-16, 10 million characters (20 million bytes)

1000 direct byte reads from file: 44717
1000 direct char reads from file: 126998
1000 buffered byte reads from file: 44595
1000 buffered char reads from file: 126401
1000 buffered byte reads from array: 17423
1000 buffered char reads from byte array: 122082
1000 buffered byte writes to array: 17893
1000 buffered char writes to byte array: 57915

Here the character reads fare better, but not by much. While the byte reads took twice as long (duh, we're reading twice as many bytes), the character reads have increased by only about 12 to 13%. Since the work done for character reads should be a superset of the work done for byte reads, this shows that it's obviously faster reading from UTF-16 into UTF-16. Unfortunately, any speed gains are wiped out when we have to read twice as much data.

The write numbers are confusing, and could indicate an error in my test. Where the byte writes took twice as long, the character writes took roughly 3.6 times as long. Either I'm doing something wrong or someone else is. If anything, I would have expected the performance of character writes to decrease no more than the performance of byte writes, since no down-encoding was now necessary. And if I had wired the test wrong and down-encoding is actually occurring, the numbers should have matched the single-byte file.
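The ratios discussed above can be recomputed directly from the two result listings. This is only arithmetic on the published numbers, using the direct (unbuffered) file figures; the hash keys and the `ratio` helper are names made up for this sketch.

```ruby
# Timings copied verbatim from the two result listings (units as reported by the test).
ISO_8859_1 = {
  byte_file_read: 22435, char_file_read: 112372,
  byte_array_read: 9477, char_array_read: 107975,
  byte_array_write: 8556, char_array_write: 16198
}.freeze

UTF_16 = {
  byte_file_read: 44717, char_file_read: 126998,
  byte_array_read: 17423, char_array_read: 122082,
  byte_array_write: 17893, char_array_write: 57915
}.freeze

# How many times longer `slow` took than `fast`, rounded to two places.
def ratio(slow, fast)
  (slow.to_f / fast).round(2)
end

# Characters versus bytes within the single-byte file.
CHAR_VS_BYTE = {
  file_read: ratio(ISO_8859_1[:char_file_read], ISO_8859_1[:byte_file_read]),
  array_write: ratio(ISO_8859_1[:char_array_write], ISO_8859_1[:byte_array_write])
}.freeze

# Growth of each scenario when the input switches from ISO-8859-1 to UTF-16.
GROWTH = ISO_8859_1.keys.map { |k| [k, ratio(UTF_16[k], ISO_8859_1[k])] }.to_h.freeze

CHAR_VS_BYTE.each { |name, r| puts format("%-18s chars take %.2fx as long as bytes", name, r) }
GROWTH.each { |name, r| puts format("%-18s UTF-16 takes %.2fx as long", name, r) }
```

Run as-is, this reports character file reads at about 5.01x the byte reads, and character writes growing by about 3.58x between the two files while byte writes grow by about 2.09x.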

How This Affects JRuby

MRI (Matz's Ruby Interpreter) currently has poor support for Unicode, mostly cobbled together from various community projects. Ruby 2.0 promises support for every string encoding possible, including Unicode encodings and many others, but we're unlikely to see it for well over a year. Because JRuby runs on Java, our toeing the line and also avoiding Unicode support simply doesn't make sense. As much as we'd like to avoid diverging from MRI, many Javaists simply can't use JRuby effectively without Unicode.

A number of different schemes have been discussed for supporting Unicode in JRuby. Some are based on the Ruby 2.0 plans, or as far as we can take them without causing incompatibility with Ruby 1.8, its libraries, or applications written for it. Some leverage the fact that our Ruby String implementation is using Java's UTF-16 String, simply allowing incoming files to be in any encoding and allowing the parser to work with full UTF-16 characters rather than with our present 0xFF-masked byte-in-a-char. Still others propose we support multibyte encodings, but only in literal strings...which matches Ruby 2.0 plans to only allow single-byte-encoded identifiers in code, but any encoding for embedded literal strings.
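To make the "0xFF-masked byte-in-a-char" idea concrete, here is a small plain-Ruby illustration of the difference between giving every input byte its own character and actually decoding the bytes. It is not JRuby's code; the method names are invented for the example, and UTF-8 stands in for whatever encoding the incoming file uses.

```ruby
# Illustration only: these helpers are hypothetical and not part of JRuby.

# One character per input byte, the way a byte-in-a-char parser sees the source.
def byte_in_a_char(bytes)
  bytes.map { |b| (b & 0xFF).chr(Encoding::UTF_8) }.join
end

# Decode the same bytes with a real encoding into proper characters.
def fully_decoded(bytes, encoding)
  bytes.pack("C*").force_encoding(encoding).encode(Encoding::UTF_8)
end

UTF8_SOURCE_BYTES = [0x63, 0x61, 0x66, 0xC3, 0xA9].freeze  # "café" encoded as UTF-8

masked  = byte_in_a_char(UTF8_SOURCE_BYTES)
decoded = fully_decoded(UTF8_SOURCE_BYTES, Encoding::UTF_8)

puts masked.length   # 5: the two bytes of "é" became two separate characters
puts decoded.length  # 4: the two bytes of "é" were decoded into one character
```

The masked form keeps every byte intact, which suits Ruby's use of String as a byte buffer, but a multibyte character is no longer a single unit to the parser.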

The simplest to support, obviously, is to just allow Java to handle decoding the incoming stream, possibly allowing a pragma line (Ruby 2.0-style) to specify a specific encoding. While reading, we handle the pragma and set the remainder of the file to read with the given encoding into full UTF-16 Strings. This achieves the primary goal of Unicode string literals, but has the side effect of allowing Unicode identifiers, something which is so far not supported for Ruby 2.0.
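A minimal sketch of the pragma approach, written in plain Ruby rather than in JRuby's Java parser. The pragma syntax (`# coding: name` on the first line), the `decode_script` name, and the ISO-8859-1 default are all assumptions made for the example, and UTF-8 plays the role that UTF-16 Strings would play on the Java side.

```ruby
# Hypothetical pragma pattern: a first-line comment such as "# coding: utf-8".
PRAGMA = /\A#.*coding[:=]\s*([\w.-]+)/

# Read the first line as raw bytes, look for a pragma, then decode the whole
# script with the named encoding into one internal encoding.
def decode_script(raw_bytes, default: Encoding::ISO_8859_1)
  first_line = raw_bytes.b.lines.first.to_s
  name = first_line[PRAGMA, 1]
  encoding = name ? Encoding.find(name) : default  # unknown names raise ArgumentError
  raw_bytes.dup.force_encoding(encoding).encode(Encoding::UTF_8)
end

latin1_script = "# coding: iso-8859-1\nputs 'caf\xE9'\n".b
puts decode_script(latin1_script)
```

Note that this only works when the pragma line itself can be read byte-by-byte as ASCII, so a script stored in an encoding that is not ASCII-compatible would need some other way to announce itself.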

The Ruby 2.0-ish way to handle encodings would be to read the file in as a single-byte encoding first, only using specialized encodings when encountering string literals. Say what you want about that method; I won't comment on its quality, but I will say it would be considerably more difficult for us to implement, and I'm not sure how you would embed non-ASCII-compatible string literals into an ASCII-compatible script file.

I am leaning toward the full Unicode support, where incoming files can be any encoding Java supports and all text can use the full complement of UTF-16-compatible Unicode characters. The compatibility with existing Ruby code is apparent: almost everything out there right now is in an ASCII-compatible format, which we'd be able to support without any work at all. However, JRuby scripts that use Unicode characters would almost certainly be incompatible with MRI if any of those characters require multiple bytes; it would be impossible, for example, to take a UTF-16 encoded JRuby file and run it under MRI without modification.

So What?

There are two conflicting goals here: performance and Unicode support.

On the performance front, we would like to always read, parse, and store simple bytes, rather than paying the thunk cost for every character. Perhaps more serious and drastic, we'd like to use a byte[]-based UTF-8 String implementation internally, since Ruby uses String as a general-purpose byte-buffer (for which we currently pay the thunk cost on every read or write operation). The cost of using all characters internally, when everything else comes in the form of bytes, is apparent from the benchmark numbers.

On the Unicode front, we'd like seamless, Java-style Unicode support without quirks or gotchas. We'd like to continue using Java's String internally, and do all our parsing through readers. We would have to suck up the (sometimes large) thunk cost, but we'd have arguably the best Unicode support of any Ruby implementation currently available. We would unfortunately also then support writing scripts that are incompatible with MRI.

What To Do?

All these numbers and all these ideas boil down to a few key questions:

Is the ability to create incompatible scripts for JRuby a showstopper? Is it enough to warn people that we support Unicode more fully than MRI, but that support comes at a price? Is full Unicode support more important than backward-compatibility for JRuby scripts under Ruby 1.8 (or even forward-compatibility for JRuby scripts under Ruby 2.0 as currently specified)?
Is there anything that can be done about the dismal performance of byte-to-char thunking? It worries me for parsing, but worries me even more for our String implementation, which uses the char-based StringBuffer internally as a byte buffer for all Ruby's IO operations. Are parse and Ruby IO performance more important than full Unicode support? Should we hobble JRuby for (perhaps large) performance gains?

I'm anxious to solve both issues, but we may end up having to choose one or the other. However, if we could resolve the character-thunking performance issue, the answer would be clear.

Update: The source code for the test, as it was run, is available here.

Thursday, August 17, 2006

Ola Bini: JRuby Goes Camping

Ola Bini on Java, Lisp, Ruby and AI

As part of his series of JRuby "howto" articles, Ola has put together an outstanding walkthrough for getting Camping running under JRuby. It has all the trimmings, including ActiveRecord over JDBC. It took surprisingly little work for us to support in JRuby. I made a few tricky interpreter fixes, and Ola solved some other good bugs, but ActiveRecord has been working since June. Ultimately it seems that a number of recent fixes made on my JRuby branch solved the last few problems, and we can now say JRuby supports Camping.

Thanks go to Ola for his ongoing contributions and again to Evan Buswell for his WEBrick-enabling NIO work in the past.

This is another very compelling application and use case for JRuby. Things are getting very exciting.

Tuesday, August 15, 2006

InfoQ: The Resurgence of Java the Platform


A prescient post from Scott Delap, InfoQ's newest Java editor. As you can probably guess, I also believe Java the platform is entering a renaissance with Sun's recent promise for the platform to be "multilingual" and projects like JRuby finally coming into their own. Java the platform--the sleeping giant beneath Java the language--is awakening...and it will speak in dynamic tongues.

Friday, August 11, 2006

.NET and J2EE to get better dynamic language support

Digg: Microsoft and Sun Microsystems have observed growing interest in dynamic programming, and plan to integrate more extensive support for dynamic language features in their respective managed language platforms.

It's interesting to see this kind of article make it to the Digg front page. The links to the eWeek articles on Microsoft and Sun's efforts are also very interesting to read. The Sun article goes into more depth about what changes might be made to the JVM.


Nibbling Away at Performance

JRuby's performance has never been stellar. Even before the current performance-hindering refactoring and "correctification" work began, it was almost an order of magnitude slower than MRI ("Matz's Ruby Interpreter"). When I started working on my parts of the JRuby internal redesign, I knew things were going to get worse before they got better...but I think they're finally starting to get better.

I ran some quick numbers comparing performance of JRuby 0.9.0 versus current trunk:

Under 0.9.0, gem install rake-0.7.1.gem:
real 1m39.088s
user 1m37.666s
sys 0m1.128s

Under trunk:
real 1m16.388s
user 1m15.233s
sys 0m0.924s

That equates to about a 23% reduction in wall-clock time. Considering that we've only been nibbling at performance and that our large-scale performance-related refactoring has just begun, things are looking a lot better than they were six months ago.
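The arithmetic behind that figure, using the two "real" timings above (the constant names are just labels for this sketch):

```ruby
# Wall-clock ("real") timings from the two runs above, in seconds.
RELEASE_SECONDS = 60 + 39.088   # 1m39.088s under 0.9.0
TRUNK_SECONDS   = 60 + 16.388   # 1m16.388s under trunk

# Share of the original run time that trunk no longer spends.
TIME_SAVED_PERCENT = (RELEASE_SECONDS - TRUNK_SECONDS) / RELEASE_SECONDS * 100

# The same change expressed as a speed-up factor.
SPEEDUP_FACTOR = RELEASE_SECONDS / TRUNK_SECONDS

puts format("time saved: %.1f%%", TIME_SAVED_PERCENT)  # time saved: 22.9%
puts format("speed-up:   %.2fx", SPEEDUP_FACTOR)       # speed-up:   1.30x
```

So trunk finishes the same gem install in roughly 23% less time, which is the same thing as running about 1.3 times as fast.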

The current goal is to get interpreted-mode JRuby as close as possible to MRI performance before we commit to a bytecode compiler. Because the eventual compiler will have to appropriately hook into JRuby's runtime, this only makes sense: if we go full-bore on a compiler now we may see great improvement in performance, but we'll have a much harder time evolving the runtime. By making the interpreter runtime as well-designed and as fast as possible now, we run less of a risk that compilation later on will tie us to a poor runtime design. I believe too many language projects fall into the trap of immediately diving into compilation without first considering how a language should best be represented on the target machine. When we do the hard work of improving the interpreter first, we learn the nuances of the language and gain a better understanding of how that compiler should eventually look. It may even be the case that we find a more direct mapping from the language to the platform that allows us to minimize or eliminate the runtime entirely for compiled code. We'd never reach that conclusion if we prematurely optimized by banking on a compiler too early.

At any rate, things are looking good for JRuby performance, both for small-scale optimizations and large-scale refactorings. The compiler will just make good...better.

Tuesday, August 08, 2006

Interfaces Should Be Modules

Currently, in order to implement a Java interface in JRuby, you extend from it like so:

require 'java'

include_class "java.awt.event.ActionListener"

class MyActionListener < ActionListener
  def actionPerformed(event)
    puts event
  end
end

While documenting a JRUBY-66 workaround and thinking about a longterm fix, it hit me like a diamond bullet through my forehead: interfaces should be treated like modules.

My justification:

You can include many modules, but only extend one class...just like interfaces. Currently in JRuby you can only implement one interface, which is stupid.
Modules imply a particular set of behaviors not specific to a given class hierarchy...just like interfaces.
Ruby implementations of Java interfaces can't extend any other classes; you can't both extend Array and implement Collection, if that were your goal.
Ruby implementations of Java interfaces have bugs when defining initialize, since they don't really just implement that interface...they extend one of our JavaSupport proxies.

Item #1 will be of particular importance as we start using JRuby more and more to implement Java services. In my opinion, this is an unacceptable limitation on JRuby's Java integration capabilities.

Item #3 limits your ability to re-open core Ruby classes and add new Java interfaces to them, something that might greatly simplify mapping Ruby types to Java-land.

Item #4 is the cause of JRUBY-66, since we need to make sure the proxy's initializer is called.

In our defence, we inherited much of this Java integration behavior from the original project owners; however, I think mapping interfaces to modules allows for much more powerful and uniform Java integration support.

I know it would be a fairly significant change to make Java interfaces act like modules, but it seems much more logical to me. It's also primarily a new feature we could phase in, with the < syntax continuing to work for old-style interface implementation.

Thoughts?

# yes, I know encapsulation would be better...this is just an example
...
include_class "javax.swing.JButton"

class MyActionRecorder < Array
  include ActionListener

  def actionPerformed(event)
    self << event
  end
end

Monday, August 07, 2006

Distributed Ruby (DRb) "Working Well"

A new member of the JRuby community, Blane Dabney, submitted a patch for JRuby socket IO to resolve a DRb issue he'd been having. Our original implementation of a "write" method was not properly handling line terminators, and would end up blocking on write calls with nothing coming out the other end. After some investigation by us both, Blane managed to put together a simple, working patch that solves the issue.

According to him, DRb from a Ruby client to a JRuby backend now "seems to be working well." I'm letting the patch stew for a bit, but it will likely be committed to trunk in the next couple days.

The ability to use DRb from Ruby to JRuby opens up a whole new world of integration with Java services. I guess it's time one of us got busy on a DRb-to-EJB gateway, don't you think?

Conference Updates

RubyConf*MI

I am registered for RubyConf*MI, though it's still uncertain if I'll attend. The registration cost is a measly $20, but it sounds like it will be a good time. Grand Rapids is about a 9-hour drive, however, so I'm looking for someone to share transportation with from Minneapolis. I probably won't go if it's just me alone.

MinneDemo

I'll be doing a quick (<15 mins) JRuby demo at Minnesota's first DemoCamp. I have no idea what I'll demo yet, but perhaps a more elaborate IRB-based Swing demo like the one I did at JavaOne.

RailsConf Europe

Various events that are in motion, and which I won't elaborate on, may lead to me attending and presenting at RailsConf Europe. Hopefully those events pan out (and hopefully there's enough time between now and the conf to get Rails working suitably well).

RubyConf

I may not be presenting, but I shall attend! I managed to secure one of the coveted registrations before they sold out two hours later. Regardless of corporate reimbursement, I'm going to make the trek to Denver. I hope to do an unofficial or "lightning" session on JRuby as well, since I know there are many attendees interested in hearing about it.

Sunday Night Niblets

Camping

After a minor fix provided by Ola Bini (thanks Ola!) we now have Camping running under JRuby, using the ActiveRecord JDBC adapter. According to Ola, Camping under JRuby seems to run very well, and feels very snappy. I'm going to be playing with it a bit soon, and may have a demo site up by tomorrow.

JRuby Extras

The JRuby Extras project is officially launched on RubyForge. The ActiveRecord JDBC adapter is there and has received modifications to work with Oracle as well. The work thus far on Mongrel is also there, and it appears that we may have Mongrel working under JRuby shortly. If you have a particular Ruby app that needs some JRuby-specific modifications or extensions, please let me know; this is a community-driven project to make Ruby apps spectacular under JRuby.

Thursday, August 03, 2006

Calling all Tor Norbyes

Ok, perhaps the blogosphere can help with this one. I've been trying to respond to an email from Tor Norbye at Sun Microsystems since Sunday. Unfortunately, my emails seem to be shuffled off into the ether. He sent another email to me today, saying he hasn't heard back from me.

Tor! I'm right here! Give me an alternate way to contact you...your Sun address seems to be kaput.

Wednesday, August 02, 2006

JRuby: It's pretty much my favorite animal.

Some of the local Rubyists and I were talking about publishers for a future JRuby book...ligers were brought up...and, well, here's the result.

It's just perfect...the lion of Java mated with the tiger of Ruby...and the magical JRuby is their offspring. It brings a tear to your eye, doesn't it?

Feel free to digg it.

Update: A couple folks have noted that O'Reilly's Java animal is already a tiger, so I guess I've got the roles reversed!

Busy Bees

I haven't posted anything substantive in a while, but things are moving rapidly forward. Here's a quick summary of what's been going on with JRuby:

RubySpec

I have launched an effort to build up a Ruby specification, Wiki-style. At The RubySpec Wiki, contributors can write short pages/articles on any aspect of Ruby: the language, the libraries, or the implementations. The eventual goal of this is to create a comprehensive library of content describing in detail how every aspect of Ruby is *supposed* to work. This in turn will help alternative implementations like JRuby, Ruby.NET, Cardinal, and others ensure they are functioning correctly.

CONTRIBUTORS ARE NEEDED! Please create an account and add whatever you can. Found out about a new feature, quirk, or bug in Ruby? Add it! Feel like porting over some core docs in a more spec-like format (i.e. including edge cases and formal semantics)? Go for it! The Spec will only succeed with user contributions. I may sponsor contests to see who can contribute the most...so keep an eye out!

The New RubyTests

The RubyTests project on RubyForge mainly houses the Rubicon suite, a collection of tests originally created for the first PickAxe book and based on Ruby 1.6. Over the past several years, it's been slowly, slowly updated for 1.8, but the library is showing its age. To complicate matters, other test libraries have sprung up to remedy some of Rubicon's deficiencies: BFTS, from the MetaRuby guys, and now a RubyTests project from the Ruby.NET team out of QUT. In addition, contributors to the RubySpec have called for a place to keep tests that go along with the specification. Something had to be done.

This past week, I sent out a proposal to all the RubyTests project members and the MetaRuby guys about finally unifying all our efforts under one grand test suite. The response so far has been excellent...Ryan Davis of MetaRuby told me he agrees with my plan, and others on the RubyTests project also agree this is the way to go. The wheels are in motion!

I will act as steward for the new RubyTests project, but only to foster community collaboration. We'll initially consider pulling all the myriad projects under the RubyTests umbrella, and then start discussing issues like what testing framework to use, how or whether to generate tests, and how to provide traceability back to items in the nascent RubySpec. I encourage anyone interested in seeing Ruby improve and flourish on all platforms to join the project and contribute.

Block Refactoring Work

It recently became apparent that the current block-management code in JRuby (modeled almost exactly on C Ruby) is rather inefficient; doubly so in JRuby because we don't have C tricks like unions and longjmps. Tom also discovered after some research that much of the block-scoping semantics can be pulled out during the parsing process and stored, saving many searches later on. To these ends, we have both been working on refactoring JRuby internals to improve how blocks function.

I have been working to modify the call chain to pass blocks along as part of the frame. This simplifies a great many things, since the correct block to which to yield is now just a field-access away (and eventually, just an argument-access away). Previously, multiple stack pushes and pops were necessary to get the correct frame, causing great undue overhead. Also, I have rewired how Proc instances are invoked, so instead of two pushes and two pops on our internal "block stack", it now just calls the proc directly. Much cleaner. The eventual goal of this is to eliminate the "block stack" and also the "iter stack", which maintains a stack of flags indicating whether a block is available or currently executing.
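A toy contrast between the two shapes described above, in plain Ruby. Every name here (`Frame`, `BlockStackRuntime`, `FrameRuntime`) is hypothetical and chosen for the sketch; none of it is taken from JRuby's source, and the real interpreter does far more bookkeeping per call.

```ruby
# Hypothetical miniature of the two designs; none of these names come from JRuby.
Frame = Struct.new(:name, :block)

# Old shape: the block lives on a separate stack that is pushed and popped
# around every call, and a yield has to consult the top of that stack.
class BlockStackRuntime
  def initialize
    @block_stack = []
  end

  def call(name, block = nil)
    @block_stack.push(block)
    yield self
  ensure
    @block_stack.pop
  end

  def yield_to_block(*args)
    @block_stack.last.call(*args)
  end
end

# New shape: the block travels with the frame, so yielding is a field access.
class FrameRuntime
  def call(name, block = nil)
    yield Frame.new(name, block)
  end
end

DOUBLE = ->(x) { x * 2 }

OLD_RESULT = BlockStackRuntime.new.call(:each, DOUBLE) { |runtime| runtime.yield_to_block(21) }
NEW_RESULT = FrameRuntime.new.call(:each, DOUBLE) { |frame| frame.block.call(21) }

puts OLD_RESULT  # 42
puts NEW_RESULT  # 42
```

Both versions produce the same answer; the difference is that the second never touches a side stack to find the block.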

Tom's work will make static much of the information about how blocks are scoped, since their relative orientation in the original source provides almost all the information we need. This will allow us to locate the appropriate variable automatically, or at least more quickly, when accessing it from within a block, and will help ensure our variable scoping is handled correctly with multiple nested blocks. He is also keeping in mind that evals can change the list of variables, so his design should work fine in those cases as well. The end result is that variable scoping will be much more reliable and performant when blocks are involved.

RubyInline for JRuby

After doing a bit of exploration on how Ruby extensions are written, I stumbled across yet another post from Ryan Davis about the beauty and simplicity of RubyInline. I am not a huge C fan, having had my fill of it during my old LiteStep days (I was lead LiteStep dev during the "great redesign" period), but the attraction of RubyInline is undeniable.

Ryan and I had a brief discussion over IM, during which we agreed that adding JRuby/Java support to RubyInline would be a really great idea. Then instead of just specifying C code in your RubyInline blocks, you could easily do the following:

class Example
inline(:C) do |builder|
  builder.c "int test1() {
               int x = 10;
               return x;
             }"
end
inline(:java) do |builder|
  builder.java "public int test1() {
               int x = 10;
               return x;
             }"
end
end

...and know that whether under JRuby or C Ruby, your inlined code would shine through. Look for this effort to pick up soon; Ryan has agreed to include it in RubyInline once it's ready.

Mongrel for JRuby

Danny Lagrouw and Ola Bini, perennial JRuby community superstars, have been working on implementing the native bits of Mongrel in Java. Danny put together a YACC-based HTTP request parser (since we don't have a Java Ragel yet) and today Ola implemented a quick ternary search tree in Java. With these two pieces working, we just have to wire up Mongrel and try it out in JRuby. It's very close.

What's the value of Mongrel when we have servlet containers to host Rails apps? That question answers itself. Name one Rails developer who's enamored of servlet containers. Yeah, I didn't think so. WEBrick is a poor substitute for a real container, and almost all Rails deployments are going Mongrel now. Not supporting Mongrel would be a showstopper for many, many Rails projects. Therefore, we're making it happen.

JRuby Extras!

I have requested a new project on RubyForge called "JRuby Extras". This project is intended to be a JRuby community love-fest, hosting all the bits and pieces needed to support Ruby apps running under JRuby. It will hold such juicy tidbits as:

The upcoming Mongrel support libraries (at least until they're hopefully included in Mongrel proper)
Nick Sieger's excellent ActiveRecord JDBC adapter (until included in Rails)
Any other JRuby-related extensions that don't have good homes elsewhere
Any Java or JRuby-related updates to other projects (like RubyInline) until included directly into those projects

Where the main JRuby project has only Tom and me as gatekeepers, the jruby-extras project will be more community-oriented. If you've got a good idea for how JRuby can be improved (think like Groovy, with its ten-thousand add-ons), toss us an email...and get busy!

The project, once approved, will be jruby-extras on RubyForge.

Standardizing JRuby Extensions

There are a number of extensions to JRuby internally, to replace missing C functionality from C Ruby. There are also a number of extensions being developed externally, to support things like Mongrel. Unfortunately, there is no standard way to write JRuby extensions like there is for C Ruby. The APIs that we expose are subject to change, and the Java world brings along its own conventions and expectations for how plugins ought to work. In order to settle this question, I have kicked off a thread on the JRuby dev mailing list.

We're going to figure out the best way to support JRuby extensions, along these rough lines:

Requiring an extension will look for an extension library just as it does in Ruby; however, it will be looking for a jar file in the load or class paths containing an appropriately-named entry point.
require "my/extension" will most likely look for extension.jar under my/ in the load or class path, and then load my.ExtensionLibrary contained therein
Since we'll want to use direct invocation now in JRuby, we'll want an easy way for extensions to have the same benefits. Rather than having them implement direct-callable interfaces, we'll likely build a code generator that can take a class and a list of method mappings and generate all stubs and callables needed for JRuby. This will also simplify our own core classes and extensions.
There are at least two ways within JRuby to define a new class, its metaclass, and their methods. One is easy but a bit broken; the other is correct but cumbersome. Extension writers will get something in the middle...easy but correct, via various helpers and factories. The same model will also be applied internally. Unification!
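The naming convention in the second item can be sketched as a small helper. This is only an illustration of the mapping as described; extension_lookup is a hypothetical name, not a JRuby API, and the final convention may well differ once the mailing list thread settles:

```ruby
# Hypothetical helper: maps a require path to the jar file and entry-point
# class names suggested by the convention above. Not part of JRuby.
def extension_lookup(require_path)
  parts = require_path.split("/")
  base = parts.last          # "extension"
  package = parts[0...-1]    # ["my"]
  camel = base.split("_").map { |word| word.capitalize }.join
  {
    jar: (package + ["#{base}.jar"]).join("/"),
    entry_point: (package + ["#{camel}Library"]).join(".")
  }
end

p extension_lookup("my/extension")
```

Handling of underscores in the file name (my/some_extension becoming my.SomeExtensionLibrary) is an assumption of this sketch; the list above doesn't say how such names would be treated.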

Making a Move

On a more personal note, there are events afoot that may give me more time to work on JRuby. I won't go into specifics...just let your imagination run wild.

Sunday, July 30, 2006

RailsConf Europe 2006 - Will I Be There?

Well now I've gone and done it.

Dear Charles,

We are pleased to inform you that your RailsConf Europe 2006 presentation proposal:

JRuby on Rails

has been accepted, and will be included in one of the conference tracks.

So there's only two problems now:

- I'm not funded through any source for RailsConf Europe.
- I have to spend the next month (working with the other contributors) making Rails run well enough to be presentable.

Blast it all...now I have to start picking and choosing at which venues I will speak. Why's it got to be so complicated?

Saturday, July 29, 2006

On Coding

It's 2AM. The western world is asleep. Bats flit and chatter outside my urban Minnesota home, chasing mosquitos. My available television channels (only broadcast; I don't have cable) have been reduced to dating shows, infomercials, and evangelical fundraisers.

I'm up coding. Why?

Perhaps it's because I eat, sleep and breathe code. I'm usually up until 2 or 3AM hacking or studying code. I wake up around 7, head to work, and crack open my laptop on the bus to do more coding. I code all day on Java EE apps with occasional JRuby breaks. I come home, sit down in my home office and code on JRuby. I go to bed at 2 or 3AM and the process repeats.

I blog about coding.

I go to user groups focused on coding.

I feel uncomfortable at parties unless I can talk about coding (although I do have other hobbies; mathematical/logical puzzles, pool, go, movies, console video games, and beer among them).

When I get drunk, I go on long-winded rants about coding and code-related topics. When I sober up, my first worry is whether I've damaged part of my code-brain.

My touch-typing method has my right-hand home row permanently set at KL;', since I'm one step closer to ;, ", ', |, \, and Enter (and no, Dvorak doesn't work for coding; I've tried *really* hard).

The nonfiction books in my bookshelf are all books on coding or remnants of my CS undergrad studies.

I am a coder.

Passion

I want to know where the other passionate coders are. I know they're out there, juggling bits and optimizing algorithms at all hours of the night. I know they share many of my characteristics. I know they love doing what they do, and perhaps they--like me--have always wanted to spend their lives coding.

How do we find them?

Google seems to know how. Give the coders what they want: let them work when the sun is asleep, let them eat when they want to eat, dress like they want to dress, play like they want to play; let them follow their creativity to whatever end, and reward that creativity both monetarily and politically; let them be.

Is this approach feasible? Google's bottom line seems to say so, boom-inflated numbers notwithstanding. And Google's approach is really just the latest in a long line of attempts to appease the coder ethos. The dot-commers tried to figure it out, but rewarded breathing and loud talking as much as true inspiration and hard work. Other companies are now learning from those mistakes; Google is just the most prominent.

What is it that we want? What makes me say "this is the job for me"?

Reread that list of characteristics above. What theme shines through?

Perhaps coders just want the freedom to think, to learn, to create in their own ways. Perhaps it's not about timelines and budgets and marketability. Perhaps coding--really hardcore, 4AM, 24-hours-awake coding--is the passionate, compelling, empowering art form of our time.

Artists are mocked. Artists are ridiculed. Artists are persecuted. Artists are sought out. Artists are revered.

So are coders.

Artists are frequently unsolvable, incomprehensible, unmanageable, intractable.

So are coders.

Artists create their best work when left to their own devices, isolated from the terrible triviums of modern living.

So do coders.

Artists go on long-winded, oft-maligned midnight rants about what it means to be an artist, man, and what it means to create art.

So do coders.

Perhaps what we've always hoped is true. Perhaps we're not misfits or malcontents. Perhaps we're the latest result of that indescribable human spark that moves mountains and shoots the moon. Perhaps it's no longer presumptuous to say it:

Code is the new art.

Sunday, July 23, 2006

The Fastest Ruby Platform? or Hey, Isn't Java Supposed to be Slow?

I love stirring up trouble. In working on the compiler for JRuby, it's become apparent that a few targeted areas could yield tremendous performance benefit even in pure-interpreted mode. I describe a bit of this evening's study here.

As an experiment, I cut out a bunch of the stuff in our ThreadContext that caused a lot of overhead for Java-implemented methods. This isn't a safe thing to do for general cases, since most of these items are important, but I wanted to see what bare invocation might do for speed by reducing some of these areas to theoretical "zero cost".

Cut from per-method overhead:
- block/iter stack manipulation
- rubyclass stack manipulation

Just two pieces of our overall overhead, but two reasonably expensive pieces not needed for fib().

Explanation and wild theories follow the numbers below.

Recall that the original compiled fib was only twice as fast as the interpreted version. The new numbers put it closer to three times as fast, cutting roughly 2/3 of the run time:

Time for bi-recursive, interpreted: 18.37
Time for bi-recursive, compiled: 6.6160000000000005
Time for bi-recursive, interpreted: 17.837
Time for bi-recursive, compiled: 6.878
Time for iterative, interpreted: 25.222
Time for iterative, compiled: 24.885

So with the unnecessary overhead removed (simulating zero-cost for those bits of the method call process) we're down to mid 6-seconds for fib(30). The iterative version is calculating fib(500000), but I'll come back to that.

Now consider Ruby's own numbers for both of these same tests:

Time for bi-recursive, interpreted: 2.001974
Time for iterative, interpreted: 9.015137

Hmm, now we're not looking so bad anymore, are we? For the recursive version, we're only about 3.5x slower with the reduced overhead. For iterative, only about 2.5x slower. So there's a few other things to consider:

- Our benchmark still creates and pushes/pops a frame per invocation
- Our benchmark still has fairly costly overhead for method arguments, both on the caller and callee sides (think multiple arrays allocated and copied on both ends)
- Our benchmark is still using reflected invocation

Yes, the bits I removed simulate zero cost, which we'll never achieve. However, if we assume we'll get close (or at least minimize overhead for cases like this where those removed bits are obviously not needed), these numbers are not unreasonable. If we further assume we can trim more time off each call by simplifying and speeding up argument/param processing, we're even better. If we eliminate framing or reduce its cost in any way, we're better still. However, the third item above is perhaps the most compelling.

You should all have seen my microbenchmarks for reflected versus direct invocation. Even in the worst comparison, direct invocation (via INVOKEINTERFACE) took at most 1/8 as much time as reflected. The above fib invocation and all the methods it calls are currently using reflected invocation, just like most stuff in JRuby.

So what does performance hypothetically look like for 6.5s times 1/8? How does around 0.8s sound? A full 50-60% faster than C Ruby! What about for iterative...24s / 8 = 3s, a solid 66% boost over C Ruby again. Add in the fact that we're missing a bunch of optimizations, and things are looking pretty damn good. Granted, the actual cost of invoking all those reflected methods is somewhat less than the total, but it's very promising. Even if we assume that the cost of the unoptimized bits is 50% of the total time, leaving 50% cost for reflection, we'd basically be on par with C Ruby.
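For anyone who wants to play with these back-of-the-envelope numbers, here is a small sketch of the arithmetic. The helper names are made up for illustration, the 8x factor is the microbenchmark figure quoted above, and the inputs are the rounded 6.5s and 24s figures:

```ruby
# Projected run time if the reflected share of a run became `speedup` times cheaper.
# reflection_share = 1.0 is the optimistic case where all the time is reflection.
def projected_time(measured, reflection_share: 1.0, speedup: 8.0)
  reflected = measured * reflection_share
  other = measured - reflected
  other + reflected / speedup
end

# Percentage of C Ruby's run time saved (negative would mean slower).
def percent_less_time(jruby, c_ruby)
  ((c_ruby - jruby) / c_ruby * 100).round
end

recursive = projected_time(6.5)   # 6.5 / 8
iterative = projected_time(24.0)  # 24 / 8
puts "recursive: #{recursive.round(2)}s, #{percent_less_time(recursive, 2.001974)}% less time than C Ruby"
puts "iterative: #{iterative.round(2)}s, #{percent_less_time(iterative, 9.015137)}% less time than C Ruby"
```

Passing reflection_share: 0.5 models the more conservative split from the last sentence above; for the recursive case that works out to roughly 3.7s, so how close that lands to C Ruby depends heavily on the other optimizations.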

It's also interesting to note that the interpreted and compiled times for the iterative version are almost identical. Interpretation is expensive for many things, but not for a simple while loop. The iterative version's code is below:

def fib_iter_ruby(n)
   i = 0
   j = 1
   cur = 1
   while cur <= n
     k = i
     i = j
     j = k + j
     cur = cur + 1
   end
   i
end

This is a good example of code that's very light on interpretation. While loops in Ruby and JRuby boil down in both cases to little more than while loops in the underlying language, interpretation or not. The expense of this method is almost entirely in two areas: variable assignment and method calls, neither of which is sped up by compilation. The similarity of the compiled and interpreted numbers for this iterative algorithm shows one thing extremely clearly: our method call overhead really, really stinks. It is here we should focus all our efforts in the short term.
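If you want to try the iterative case yourself, a minimal timing sketch follows. It is not the harness used for the numbers above; time_fib_iter is a hypothetical helper built on Ruby's standard Benchmark module, and n is kept small so it finishes quickly:

```ruby
require 'benchmark'

# Same iterative fib as above.
def fib_iter_ruby(n)
  i = 0
  j = 1
  cur = 1
  while cur <= n
    k = i
    i = j
    j = k + j
    cur = cur + 1
  end
  i
end

# Hypothetical timing helper; returns elapsed wall-clock seconds.
def time_fib_iter(n)
  Benchmark.realtime { fib_iter_ruby(n) }
end

n = 10_000  # much smaller than the 500000 used for the numbers above
puts "Time for iterative, n=#{n}: #{time_fib_iter(n)}"
```

Each pass through the loop makes three method calls (one <= and two +) plus four assignments, which is why call overhead dominates here.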

Given these new numbers and the fact that we have many optimizations left to do, I think it's now very reasonable to say we could beat C Ruby performance by the end of the year.

Side Note: The compiler work has gone very well, and now supports several types of variables and while loops. This compiler is mostly educational, since it is heavily dependent on the current ThreadContext semantics and primitives. As we attack the call pipeline, the current compiler will break and be left behind, but it has certainly paved the way, showing us what we need to do to make JRuby the fastest Ruby interpreter available.

Wednesday, July 19, 2006

Conference Updates

JavaPolis 2006 - Antwerp, Belgium

I have received confirmation from my employer, Ventera Corporation, that they will fund my trip to Antwerp. Hooray! I'm also planning on bringing my wife and spending the holiday season in Europe. It ought to be a great trip, right on the heels of presenting JRuby.

Any Europeans in Amsterdam, Antwerp, Paris, Venice, Prague, or points nearby that might like to chat some time, let me know. We're planning on visiting at least those five cities.

And for the record, I speak only one European language: Spanish (and very poorly, I might add). My Mandarin Chinese is better, but I don't expect that will help much.

RubyConf 2006 - Denver, Colorado, USA

I will be attending RubyConf 2006, but I still will not have my own presentation. The selection process is complete. I still have standing offers from two other potential presenters to share time, but I have not yet heard whether they were accepted.

Oddly enough, I did receive the following email:

We have finished the presentation selection process, and regret to
have to inform you that your paper was not among those chosen for
inclusion.

Since I missed the submission deadline by a day, it's rather unremarkable that I was not selected to present. So I missed the deadline AND was declined? Ouch!

RailsConf Europe 2006 - London, UK

I have submitted a talk entitled "JRuby on Rails" for RailsConf Europe 2006. I have also heard from my employer that they'll only pay for one European conference. Phooey.

If I'm accepted, I'll have to find another way of funding the trip, since it's not something I can swallow myself. We shall see.

Tuesday, July 18, 2006

JRuby in the News

July 17 was a particularly big news day for JRuby. Three articles were posted, one in two places, on JRuby and the two core developers, Tom and me. I present here a collection of articles and blogs that have particular significance to the Ruby+Java world.

Ruby for the Java world
Joshua Fox - JavaWorld - July 17, 2006

Joshua Fox provides a brief introduction to Ruby and demonstrates Ruby on Java using JRuby. Joshua corresponded with Tom and me about this article, and I think the end result turned out well.

Interview: JRuby Development Team
Pat Eyler - Linux Journal - July 17, 2006

Pat put together a great set of questions and we told all in this 3800-word interview. I'm awaiting the flames from my response to "What's next for Ruby".

Interviewing the JRuby Developers
Pat Eyler - O'Reilly Ruby - July 17, 2006

The same interview from above, trimmed for length and posted to the O'Reilly Network.

JRuby Leaves SourceForge for Greener Pastures at Codehaus
Obie Fernandez - July 17, 2006

Obie covers our move from SourceForge to Codehaus, quoting Tom and me venting our frustration over multiple downtimes (including the infamous week-long CVS outage right before JavaOne).

A Gem Of A Language for Java and .NET
Andy Patrizio - May 26, 2006

Andy released this article shortly after our JavaOne press conference appearance. He provides a good executive summary of Ruby, IronRuby, and JRuby.

Ugo Cei on Ruby/Java Integration

In addition, I was today pointed to Ugo Cei's series of blog postings on Ruby/Java integration. They're short, but give a good quick overview of what works and what doesn't. Covered topics include the Ruby/Java bridges RJB (worked, but looks cumbersome) and YAJB (did not work), JRuby (worked like gangbusters, naturally ;), and remoted Java code over XML-RPC (a fairly popular recommendation from Rubyists).

Part I - RJB part 1
Part II - RJB for a more complicated case
Part III - YAJB...OutOfMemory on a simple script
Part IV - JRuby, a great comparison to the second RJB posting
Part V - XML-RPC part 1
Part VI - XML-RPC part 2

Ugo will also be speaking at OSCON on Ruby for Java Programmers, which is great since our own proposal was rejected.

JRuby in the News

Our Codehaus page will track articles as they're published. If you know of good blog entries or articles we should include here, please let me know!

Friday, July 14, 2006

Compilers, Conferences, Codehaus...Oh My!

I keep wondering if things can continue moving along so well indefinitely. JRuby's got the momentum of a steamroller shot out of a cannon these days.

Compilers

I have been back at work on the compiler, which is starting to take real shape now. I'm doing this initial straightforward version in Java, so I can work on evolving my code-generation library independently. This version also isn't quite so complicated that it warrants using Ruby.

Early returns have been very good. I've been testing it so far with the standard "bi-recursive" fib algorithm, with a few other node types thrown in as I implement them:

def fib_java(n)
 1.0
 false
 true
 [1, 2, 3, 4, 5]
 "hello"
 if n < 2
   n
 else
   fib_java(n - 2) + fib_java(n - 1)
 end
end

The performance boost from the early compiler is roughly 50% when running fib(20):

fib(20)
Time for interpreted: 1.524
Time for compiled: 0.729

The fib algorithm is obviously extremely call-heavy. There's two recursions to fib plus four other calls for '<', '-', and '+'. In JRuby, this means that the overhead of making dyn-dispatched method calls is almost as high as directly interpreting code. For larger runs of fib performance tails off a bit because of this overhead. We're taking a two-pronged approach to performance right now; the compiler is obviously one prong but overall runtime optimization is the other. The compiler gives us an easy 50% boost, but we may see another large boost just by cleaning up and redesigning the runtime to optimize call paths. I would not be surprised if we're able to exceed C Ruby's performance in the near term.
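To get a feel for just how call-heavy it is, here is a small counting sketch. It is not instrumentation of JRuby itself; fib_call_counts is a hypothetical helper that mirrors the shape of the recursion and tallies calls using the counts given above (one '<' on every invocation, plus two '-' and one '+' whenever it recurses):

```ruby
# Tallies the dynamically dispatched calls a bi-recursive fib(n) would make.
# :fib counts invocations of fib itself, :operators counts '<', '-' and '+'.
def fib_call_counts(n)
  # Base case: one fib invocation and a single '<' comparison.
  return { fib: 1, operators: 1 } if n < 2

  a = fib_call_counts(n - 2)
  b = fib_call_counts(n - 1)
  {
    fib: 1 + a[:fib] + b[:fib],
    # Recursive case: one '<', two '-', one '+'.
    operators: 4 + a[:operators] + b[:operators]
  }
end

counts = fib_call_counts(20)
puts "fib(20): #{counts[:fib]} fib calls, #{counts[:operators]} operator calls"
```

The operator calls outnumber the fib calls by a wide margin, which is why dispatch overhead shows up so clearly in this benchmark.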

Oh, and I love the following stack trace from a bug in the compiler. Note the file and line number where the error occurred:

Exception in thread "main" java.lang.IncompatibleClassChangeError
       at MyCompiledScript.fib_java (samples/test_fib_compiler.rb:2)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

How cool is that?

Conferences

Close on the heels of my disappointing mixup with RubyConf 2006, we have received an invitation to present at JavaPolis 2006 in Antwerp, Belgium. We actually were hoping to go and would have planned on delivering a BOF or Quicky session, but after sending a proposal one of the conference organizers said JRuby was already on the wish list. We're guaranteed "at least" a full session. JavaPolis will be held December 13-15, and we'll be getting cozy with some of the top Java folks in the world. It should be a fun and exciting event, especially considering what we'll get done in the five months before then.

I've also received other offers from potential RubyConf 2006 presenters to share their time slots with me. We should be able to get word out and demos presented after all. I'd still like to be able to present without stepping on others' time, so if you really would like to see JRuby have a full presentation at RubyConf this year, email the organizers.

Codehaus

We have made the move to Codehaus, and we've mostly got JRuby operations migrated there from SourceForge. JIRA is up, Confluence is up with a minimal set of pages, and we're totally running off Codehaus SVN now. So far we're very happy with the move; JIRA is beautiful and the SVN repo works great. We have not yet migrated off SF.net email lists and we haven't moved over all bugs from SF's tracker, but otherwise we're basically a Codehaus project now. Huzzah!

Sunday, July 09, 2006

Is Reflection Really as Fast as Direct Invocation?

This was originally posted to the jruby-devel mailing list, but I am desperate to be proven wrong here. We use reflection extensively to bind Ruby methods to Java impls in JRuby, and the rumors of how fast reflection is have always bothered me. What is the truth? Certainly there are optimizations that make reflection very fast, but as fast as INVOKEINTERFACE and friends? Show me the numbers! Prove me wrong!!

--

It has long been assumed that reflection is fast, and that much is true. The JVM has done some amazing things to make reflected calls really f'n fast these days, and for most scenarios they're as fast as you'd ever want them to be. I certainly don't know the details, but the rumors are that there's code generation going on, reflection calls are actually doing direct calls, the devil and souls are involved, and so on. Many stories, but not a lot of concrete evidence.

A while back, I started playing around with a "direct invocation method" in JRuby. Basically, it's an interface that provides an "invoke" method. The idea is that for every Ruby method we provide in Java code you would create an implementation of this interface; then when the time comes to invoke those methods, we are doing an INVOKEINTERFACE bytecode rather than a call through reflection code.

The down side is that this would create a class for every Ruby method, which amounts to probably several hundred classes. That's certainly not ideal, but perhaps manageable considering you'd have JRuby loaded once in a whole JVM for all uses of it. It could also be mitigated by only doing this for heavily-hit methods. Still, requiring lots of punky little classes is a big deal. [OT: Oh what I would give for delegates right about now...]

The up side, or so I hoped, would be that a straight INVOKEINTERFACE would be faster than a reflected call, regardless of any optimization going on, and we wouldn't have to do any wacked-out code generation.

Initial results seemed to agree with the upside, but in the long term nothing seemed to speed up all that much. There's actually a number of these "direct invocation methods" still in the codebase, specifically for a few heavily-hit String methods like hash, [], and so on.

So I figured I'd resolve this question once and for all in my mind. Is a reflected call as fast as this "direct invocation"?

A test case is attached. I ran the loops for ten million invocations...then ran them again timed, so that hotspot could do its thing. The results are below for both pure interpreter and hotspotted runs (times are in ms).

Hotspotted:
first time reflected: 293
second time reflected: 211
total invocations: 20000000
first time direct: 16
second time direct: 8
total invocations: 20000000

Interpreted:
first time reflected: 9247
second time reflected: 9237
total invocations: 20000000
first time direct: 899
second time direct: 893
total invocations: 20000000

I would really love for someone to prove me wrong, but according to this simple benchmark, direct invocation is faster--way, way faster--in all cases. It's obviously way faster when we're purely interpreting or before hotspot kicks in, but it's even faster after hotspot. I made both invocations increment a static variable, which I'm hoping prevented hotspot from optimizing code into oblivion. However even if hotspot IS optimizing something away, it's apparent that it does a better job on direct invocations. I know hotspot does some inlining of code when it's appropriate to do so...perhaps reflected code is impossible to inline?

Anyone care to comment? I wouldn't mind speeding up Java-native method invocations by a factor of ten, even if it did mean a bunch of extra classes. We could even selectively "directify" methods, like do everything in Kernel and Object and specific methods elsewhere.
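The "selectively directify" idea can be sketched in Ruby terms, purely as an analogy. In JRuby itself this would be Java classes implementing the invoke interface; here MethodBinder is a hypothetical class where hot methods get a callable resolved once up front and everything else falls back to a by-name lookup on every call:

```ruby
# Hypothetical binder: hot methods get a dedicated callable object (the analogue
# of one small class per method); the rest go through a lookup by name.
class MethodBinder
  def initialize(target)
    @target = target
    @direct = {}
  end

  # "Directify" one method: resolve it once and keep the callable around.
  def directify(name)
    @direct[name] = @target.method(name)
  end

  def invoke(name, *args)
    callable = @direct[name]
    if callable
      callable.call(*args)              # direct path: already resolved
    else
      @target.public_send(name, *args)  # fallback path: looked up by name each time
    end
  end

  def direct?(name)
    @direct.key?(name)
  end
end

binder = MethodBinder.new("hello")
binder.directify(:hash)
binder.directify(:[])
puts binder.invoke(:[], 0)
puts binder.invoke(:upcase)
```

Both paths give the same answers; the only difference is how the method is found, which is exactly the cost the benchmark above is trying to measure on the Java side.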

--

The test case was attached to my email...I include the test case contents here for your consumption.

private static interface DirectCall {
    public void call();
}

public static class DirectCallImpl implements DirectCall {
    public static int callCount = 0;
    public void call() { callCount += 1; }
}

public static DirectCall dci = new DirectCallImpl();

public static int callCount = 0;
public static void call() { callCount += 1; }

public void testReflected() {
    try {
        Method callMethod = getClass().getMethod("call", new Class[0]);

        long time = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            callMethod.invoke(null, null);
        }
        System.out.println("first time reflected: " + (System.currentTimeMillis() - time));
        time = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            callMethod.invoke(null, null);
        }
        System.out.println("second time reflected: " + (System.currentTimeMillis() - time));
        System.out.println("total invocations: " + callCount);
    } catch (Exception e) {
        e.printStackTrace();
        assertTrue(false);
    }
}

public void testDirect() {
    long time = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
        dci.call();
    }
    System.out.println("first time direct: " + (System.currentTimeMillis() - time));
    time = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
        dci.call();
    }
    System.out.println("second time direct: " + (System.currentTimeMillis() - time));
    System.out.println("total invocations: " + DirectCallImpl.callCount);
}

Update: A commenter noticed that the original code was allocating a new Object[0] for every call to the reflected method; that was a rather dumb mistake on my part. The commenter also noted that I was doing a direct call to the impl rather than a call to the interface, which was also true. I updated the above code and re-ran the numbers, and reflection does much better as a result...but still not as fast as the direct call:

Hotspotted:

first time reflected: 146
second time reflected: 109
total invocations: 20000000
first time direct: 15
second time direct: 8
total invocations: 20000000

Interpreted:

first time reflected: 6560
second time reflected: 6565
total invocations: 20000000
first time direct: 912
second time direct: 920
total invocations: 20000000