Thursday, August 31, 2006

Performance: Block Variables Breakdown

In response to my previous blog, Chris Nokleberg noted that if we're using String#equals a lot, interning will have an additional benefit...namely because String#equals short-circuits if both String objects are ==. I had forgotten about that benefit, so I thought I'd poke around for places it might have an effect.

In digging a little, I was reminded that DynamicVariableSet, which holds block-local variables, does a linear search for retrievals and mutations. Compare that to method-local variables in Scope, for which all clients must pass an index to get or set a value. I'm not sure if there's a reason why DynVars must be retrieved by name and Scope vars can be directly indexed...it's probably historical from the C code or a limitation of the current parser. In any case, a linear search could theoretically represent a performance issue accessing or modifying block-local variables. So I thought I'd run some numbers to see how frequently we have to search past the 0 value in the DynVarSet. Here's the breakdown while gem installing rake:

Count of block var retrieves from indicies 0 through 6:
516436
226049
42687
1170
656
122
1

So a total of 787121 block variable lookups, with percentages below:

index 0: 65.6%
index 1: 28.7%
index 2: 5.4%
...with the remainder filling out the last 0.3% of accesses

What can we learn from this? Well DynVarSet currently allocates initial space for up to 8 variables, based on someone's long-past examination of code that showed a maximum of 6 was a good estimate. The numbers above show that, for RubyGems + RDoc at least, a maximum of 3 would cover 99.7% of all cases without requiring expansion of the array, and that with a subsequent expansion to 6 slots we would cover all but 0.0000127% of cases. So using 6 as a maximum and always allocating 8 slots is perhaps a bit generous. More profiling on more code is warranted here, but it might be safe to drop the initial size to 3 and leave in the doubling when expanding.

It also shows us that we pay the cost of two String#equals calls for 28.7% of lookups, and pay for three in 5.4% of lookups, with higher counts statistically insignificant. So although it might seem like we're unlikely to suffer much of a performance hit because most block var lookups are at index 0, our cost to lookup or modify block variables is doubled in over a quarter of cases and tripled in a twentieth of cases. If we normalize all lookups to the same cost (X * 99.7 versus X * 139.2), that translates to a 28% performance boost for block variable lookup cost. Note that I'm not measuring block variable modifications here either, which would conceivably see a similar boost.

And of course what it doesn't show (but which we know to be true) is that if we were able to eliminate both the linear search and the call to String#equals, we'd save a very large percentage of block var lookup time, since we'd reduce it to a simple array indexing. I think it's worth exploring how we could do that--especially since we already do it for Scope in all the same places.

Isn't optimizing legacy code fun?

Performance: Inlining Strings

A while ago, we decided to inline all appropriate symbolic strings as they entered the AST. This appeared to help performance a measurable amount, presumably for two reasons:

  1. The AST would take up less space in the long term
  2. Since Strings cache their hashcodes, having each identical string in the AST be the same object would reduce the number of hash calculations.
And the numbers seem to bear it out. Here's a local rake install:

Without interning AST strings:
real 1m19.894s
user 1m18.649s
sys 0m0.920s

With interning AST strings:
real 1m15.021s
user 1m13.785s
sys 0m0.948s

So it's very measurable...in this case 4-5 seconds out of 80 seconds, or about a 6% gain with interning.

However this week I realized the fallacy of the second point above. In order to intern each incoming string, Java must hash them. This causes all strings entering the AST to be hashed once anyway in order to get the interned version. In fact, it could reduce performance, since we're now forced to hash strings we might never encounter during execution.

I can't confirm that this is 100% true, but it's a reasonable theory. String.intern() is native code, but logic dictates that the fastest way to intern strings would be to use an internal hash/symbol table. So I proceeded this evening to try removing interning to see how it affected performance. I hoped to get a small gain during execution due to a large gain during parsing. The numbers above show that I lose a measurable amount overall by interning. However, I do see substantial parse performance gains:

With interning AST strings (Time.now - t method of measuring 100 parses):
11.101

Without interning AST strings:
9.404

About 1.7 seconds out of 11, or a whopping 15% gain in parse speed without interning. AARGH.

At this point I'm leaning toward removing interning and swallowing the 6% performance hit temporarily until we can figure out where it's coming from. Logically, interning everything should only add overhead, or so my brain tells me. Where's the flaw in my logic?

My London Schedule

So here's the times and places. I want to get together JRubyists to just chat a bit at some point, and it seems like Thursday or Friday would work best. The only concrete suggestion so far is the Fitzroy Tavern on the 15th, around 6:30PM-7:00PM (thanks Damian). This works for me, and it's after the conference. Here's a goofy Google Map to Fitzroy from the conference center.

I'm also planning to attend the "Pizza on Rails" event on the 13th, so folks can find me there.

I'll keep this entry updated as things come in, so it can be used for reference.

13 September, 6:00AM: Arrival in London (Heathrow)

Other than getting checked into the hotel, I have that whole day free. I don't plan to cram for my presentation or anything, since it's not until Friday (and I don't really work that way). To avoid the lag I'll probably sleep as much as I can on the plane. Getting out and about once I get there ought to get the circadians whipped into shape.

13 September, 6:00PM: Pizza on Rails

Pizza on Rails is around 6PM, with pizza at 7PM.

14 September, 9:00AM: RailsConf Day One

The "Welcome" is at 9, and it goes straight through until 8:30PM with a lunch break from 12:30PM to 2:00PM and a dinner break from 5:50PM to 7:30PM. I'll probably try to grab meals with folks from the conf.

15 September, 9:30AM: RailsConf Day Two

Lunch break is the same as Day One, and the conf is scheduled to go through until 6PM, with "Closing Remarks" coming after that. Probably be out by 6:30, and the Fitzroy is pretty close. Maybe someone not going to the conf can get there early to scope it out.

15 September, 3:00PM: JRuby on Rails

I'm in the last set of speakers, before the final plenary session with James Duncan Davidson and Dave Thomas. I'm up against cleaning up rails, adding unicode to rails, and project management for rails. I wish I could attend the unicode session, but I'll try to grab Dominic offline.

15 September, 6:30PM: JRuby Meetup

So far I think we'll just be at Fitzroy Tavern, as mentioned at the top of this post. All are welcome...I don't fly out until 1:25PM the next day, so I can stick around for a while.

16 September 10:00AM: Departure

Given the recent TERROR scare, I'm going to be extra careful and head to the airport nice and early. I'm not particularly worried about TERROR, but I am over-cautious about giving security enough time to scour my person for soda pop and lip balm.

Thus ends my first trip to London. It's too bad it will be so short, but I have business to attend to stateside immediately after the conference. Hopefully I'll stop back in when I'm in the EU for JavaPolis.

I hope to see plenty of JRubyists in London!