Tuesday, January 04, 2011

Flat and Graph Profiles for JRuby 1.6

Sometimes it's the little things that make all the difference in the world.

For a long time, we've extolled the virtues of the amazing JVM tool ecosystem. There's literally dozens of profiling, debugging, and monitoring tools, making JRuby perhaps the best Ruby tool ecosystem you can get. But there's a surprising lack of tools for command-line use, and that's an area many Rubyists take for granted.

To help improve the situation, we recently got the ruby-debug maintainers to ship our JRuby version, so JRuby has easy-to-use command-line Ruby debugging support. You can simply "gem install ruby-debug" now, so we'll stop shipping it in JRuby 1.6.

We also shipped a basic "flat" instrumented profiler for JRuby 1.5.6. It's almost shocking how few command-line profiling tools there are available for JVM users; most require you to boot up a GUI and click a bunch of buttons to get any information at all. Even when there are tools for profiling, they're often bad at reporting results for non-Java languages like JRuby. So we decided to whip out a --profile flag that gives you a basic, flat, instrumented profile of your code.

To continue this trend, we enlisted the help of Dan Lucraft of the RedCar project to expand our profiler to include "graph" profiling results. Dan previously implemented JRuby support for the "ruby-prof" project, a native extension to C Ruby, in the form of "jruby-prof" (which you can install and use today on any recent JRuby release). He was a natural to work on the built-in profiling support.

For the uninitiated, "flat" profiles just show how much time each method body takes, possibly with downstream aggregate times and total aggregate times. This is what you usually get from built-in command-line profilers like the "hprof" profiler that ships with Hotspot/OpenJDK. Here's a "flat" profile for a simple piece of code.

~/projects/jruby ➔ jruby --profile.flat -e "def foo; 100000.times { (2 ** 200).to_s }; end; foo"
Total time: 0.99

total self children calls method
----------------------------------------------------------------
0.99 0.00 0.99 1 Object#foo
0.99 0.08 0.90 1 Fixnum#times
0.70 0.70 0.00 100000 Bignum#to_s
0.21 0.21 0.00 100000 Fixnum#**
0.00 0.00 0.00 145 Class#inherited
0.00 0.00 0.00 1 Module#method_added

A "graph" profile shows the top N call stacks from your application's run, breaking them down by how much time is spent in each method. It gives you a more complete picture of where time is being spent while running your application. Here's a "graph" profile (abbreviated) for the same code.

~/projects/jruby ➔ jruby --profile.graph -e "def foo; 100000.times { (2 ** 200).to_s }; end; foo"
%total %self total self children calls name
---------------------------------------------------------------------------------------------------------
100% 0% 1.00 0.00 1.00 0 (top)
1.00 0.00 1.00 1/1 Object#foo
0.00 0.00 0.00 145/145 Class#inherited
0.00 0.00 0.00 1/1 Module#method_added
---------------------------------------------------------------------------------------------------------
1.00 0.00 1.00 1/1 (top)
99% 0% 1.00 0.00 1.00 1 Object#foo
1.00 0.09 0.91 1/1 Fixnum#times
---------------------------------------------------------------------------------------------------------
1.00 0.09 0.91 1/1 Object#foo
99% 8% 1.00 0.09 0.91 1 Fixnum#times
0.70 0.70 0.00 100000/100000 Bignum#to_s
0.21 0.21 0.00 100000/100000 Fixnum#**
---------------------------------------------------------------------------------------------------------
0.70 0.70 0.00 100000/100000 Fixnum#times
69% 69% 0.70 0.70 0.00 100000 Bignum#to_s
---------------------------------------------------------------------------------------------------------
0.21 0.21 0.00 100000/100000 Fixnum#times
21% 21% 0.21 0.21 0.00 100000 Fixnum#**
---------------------------------------------------------------------------------------------------------
0.00 0.00 0.00 145/145 (top)
0% 0% 0.00 0.00 0.00 145 Class#inherited
---------------------------------------------------------------------------------------------------------
0.00 0.00 0.00 1/1 (top)
0% 0% 0.00 0.00 0.00 1 Module#method_added

As you can see, you get a much better picture of why certain methods are taking up time and what component calls are contributing most to that time.

We haven't settled on the final command-line flags, but look for the new graph profiling (and the cleaned-up flat profile) to ship with JRuby 1.6 (real soon now!)

5 comments:

Dave Griffiths said...

If JRuby is generating bytecode rather than using an interpreter loop, can't you use hprof itself as the profiler? Even if mixed-mode, all the hot methods that you are interested in are going to get compiled to bytecode. Plus you would also see any meta stuff from the JRuby compiler itself. And you would also see the input from any pure Java calls that JRuby makes.

Or is there no simple mapping from JRuby methods to Java methods?

Charles Oliver Nutter said...

Dave: Yep, that works just fine too, and JRuby can be set to force methods to compile before invocation, so they'll show up in Java tools.

The problem with hprof, though, is that it's not possible to narrow its focus to just Ruby methods (or to exclude core JDK or JRuby implementation methods). As a result, hprof instrumented profiling is *really slow* since it's profiling every piece of Java code loaded.

If you can get past that, the next issue is that our generated bytecode for Ruby methods often has long, mangled names that would be hard for a Rubyist to interpret, and hprof doesn't show filenames or line numbers.

Ultimately I'd love to see better JVM tool support for alternative languages with "peculiar" bytecode, but a JRuby-specific profiler will probably always be needed.

williamlouth said...

Charles,

OpenCore's metering engine (which powers JXInsight) is completely command/console/terminal based.

http://www.jinspired.com/videos/opencore/specjvm2008.top.probes.plugin.mp4

Its also the fastest Java profiler. The fastest JRuby/Ruby profiler and the fastest Jython/Python profiler.

http://williamlouth.wordpress.com/2010/08/23/benchmarking-opencore-probes-looptest-in-ruby/

If you want to go build an adhoc profiling solution for JRuby that's fine but please don't start trying to justify it with claims that such tools don't exist. They do and they are far much more scalable.

http://williamlouth.wordpress.com/2010/09/14/scalable-application-performance-management-part-1/

Charles Oliver Nutter said...

William: Of course I know about JXInsight and OpenCore, and I didn't mean to imply that there are no such command-line tools for profiling. I merely wanted to lament the fact that there's so *few* command-line tools. Out of all the dozens of profiling options for the JVM, I know of exactly three command-line-only tools: "hprof", which is very limited, slow, and profiles *everything*; The Java Interactive Profiler (JIP) library, which seems to have been abandoned (and so we stopped shipping it with JRuby); and OpenCore. Of these, only OpenCore is really useful for instrumented profiling.

The main reason we don't ship OpenCore as the built-in profiler is the fact that it's not open source and therefore not distribution-compatible with JRuby's dist licenses. But there's no reason JRuby users shouldn't look into it.

OpenCore really is an amazing tool, and we'd love to promote it more. Perhaps there's a way it could be easily bundled up as a Ruby gem so that JRuby users could have easier access to it? It could be the "professional grade" profiler while we continue to ship our own "basic" profiling logic as part of JRuby.

I would also like to standardize the hooks we put in place for profiling in JRuby, so that OpenCore can use "blessed" entry points and hopefully have a better guarantee of working from version to version.

Toss me an email :) I didn't mean to slight OpenCore or JXInsight in any way, and you know I'd love to get them in front of more JRuby users. We just wanted a dead simple (and perhaps not as scalable) built-in option, so people have an easily accessible "first tool" to reach for.

williamlouth said...

Hi Charles,

I posted my comment late last night (1 am) with much more emotion than I would have liked.

I definitely want to get more JRuby users trying out OpenCore's JRuby-to-Ruby extension especially those high profile sites with high volumes in production which is the target for OpenCore. Somewhat disappointed that this has not materialized when you consider some of these still have performance issues.

I would really like to see some standardization of the hooks to help with our maintenance and create more competition in the implementation space. I am more than willing to provide some advice in this area to ensure that the interface is optimal and supports very efficient profiling (i.e. static/dynamic meta/state mgmt across callbacks).

I think it is good you finally move away from JIP.

Btw the hope for OpenCore was to have it standardized within JSE. I put it forward to IBM, Sun and Oracle. IBM were approachable and view it very favorably. Sun had considered for their Cloud intiative. Talking to the Oracle/JRockit team is like watching paint dry but it could be worse (i.e. talking with the Oracle Enterprise Manager product team).