Finding Ruby Performance Hotspots via Allocation Stats
RubyParser is a library written by Ryan Davis for parsing Ruby code and producing an abstract syntax tree. It is used by Brakeman and several other static analysis gems.
Recently I was poking around to see if there was any low-hanging fruit for performance improvements. At first, I was interested in the generated parsers. Racc outputs some crazy arrays of state machine changes. Instead of generating arrays of integers, it outputs arrays of strings, then splits those strings into integers which it loads into the final array. I thought for sure skipping this and starting with the final array of integers would be faster, but…somehow it wasn’t.
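Very roughly, the generated parser ships its tables as strings of comma-separated numbers and converts them into integer arrays at load time. The pattern looks something like this simplified illustration (not the actual Racc output):

# Simplified illustration of the pattern (not real Racc output):
# the parser tables arrive as strings of comma-separated numbers...
clist = [
  '3,-1,7,12,5',
  '9,4,-2,,8',
]

# ...which are split and converted into one flat array of integers at load time.
racc_action_table = []
clist.each do |str|
  str.split(',', -1).each do |cell|
    racc_action_table << (cell.empty? ? nil : cell.to_i)
  end
end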
I moved on to thinking about frozen string literals, which led me to checking String allocations.
Measuring String Allocations
I found the allocation_stats gem very useful for this.
I set up a test like this to read in a file and parse it:
require 'ruby_parser'
require 'allocation_stats'
f = File.read(ARGV[0])
rp = RubyParser.new
stats = AllocationStats.trace do
  rp.parse(f, ARGV[0], 40)
end
puts stats.allocations(alias_paths: true).where(class: String).group_by(:sourcefile, :sourceline).sort_by_count.to_text
This outputs a report like this (truncated here):
sourcefile                                          sourceline  count
--------------------------------------------------  ----------  -----
<GEM:ruby_parser-3.11.0>/lib/ruby_parser.rb                 20  70686
<GEM:ruby_parser-3.11.0>/lib/ruby_parser_extras.rb        1361  58154
<GEM:ruby_parser-3.11.0>/lib/ruby_parser_extras.rb        1362  54672
<GEM:ruby_parser-3.11.0>/lib/ruby_lexer.rb                 373  19019
<GEM:ruby_parser-3.11.0>/lib/ruby_lexer.rb                 770  12005
<GEM:ruby_parser-3.11.0>/lib/ruby_lexer.rex.rb             109   8252
<GEM:ruby_parser-3.11.0>/lib/ruby_parser_extras.rb        1015   6818
Right away, these look like some juicy targets.
Version Creation
Let’s take a look at the first one:
class Parser < Racc::Parser
  include RubyParserStuff

  def self.inherited x
    RubyParser::VERSIONS << x
  end

  def self.version
    Parser > self and self.name[/(?:V|Ruby)(\d+)/, 1].to_i
  end
end
Here you can see the Parser.version method. RubyParser is actually not just one parser, but multiple parsers for different versions of Ruby. So there is a RubyParser class, but also Ruby18Parser, Ruby19Parser, etc. and RubyParser::V18, RubyParser::V19, etc.
To figure out the version of the current class, the code above grabs the version from the class name itself.
The problem is that this code is called a lot (70k+ times in the example above) to make version-specific decisions during the lexing phase. This is fairly easy to fix.
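Since the version never changes for a given parser class, one straightforward option is to compute it once and memoize it. Here is a minimal sketch of that idea (the change that actually shipped may look different):

def self.version
  # Cache the result in a class-level instance variable so the regexp match
  # and substring allocation happen once per class, not on every call.
  @version ||= (Parser > self && self.name[/(?:V|Ruby)(\d+)/, 1].to_i)
end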
In my testing, this reduced string allocations by ~25% and parse time by 5-10%. One thing I have noticed - and you may also find if you go chasing object allocations in Ruby programs - is that reducing allocations doesn’t necessarily help with peak memory use or run time. It seems the Ruby VM has gotten pretty good at allocating and garbage collecting objects efficiently.
Debug Code
Let’s take a look at the next two large sources of String allocations:
sourcefile                                          sourceline  count
--------------------------------------------------  ----------  -----
<GEM:ruby_parser-3.11.0>/lib/ruby_parser_extras.rb        1361  58154
<GEM:ruby_parser-3.11.0>/lib/ruby_parser_extras.rb        1362  54672
Interesting: just two lines apart, with over 100k allocations between them.
def push val
  @stack.push val
  c = caller.first
  c = caller[1] if c =~ /expr_result/
  warn "#{name}_stack(push): #{val} at line #{c.clean_caller}" if debug
  nil
end
The two lines of interest are the assignments to the local variable c, which pull information from caller. caller is a fairly expensive method, since it needs to generate a stack trace for the current method call. On closer inspection, it's clear the c variable is only used in the message on the following line, and that message is only printed if the debug flag is set.
This means we can wrap all that code in a condition, like this:
def push val
  @stack.push val
  if debug
    c = caller.first
    c = caller[1] if c =~ /expr_result/
    warn "#{name}_stack(push): #{val} at line #{c.clean_caller}"
  end
  nil
end
This change saves 38-50% on string allocations and 20-26% on parse time.
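As an aside, it's easy to get a rough feel for how expensive caller is on its own using the standard library's Benchmark module. This is just a sketch (exact numbers will vary with Ruby version and stack depth), with caller_locations included only as a cheaper point of comparison:

require 'benchmark'

n = 100_000

Benchmark.bm(18) do |x|
  # Builds the entire backtrace as an array of strings on every call
  x.report('caller')           { n.times { caller.first } }
  # Grabs only the immediate frame as a Thread::Backtrace::Location
  x.report('caller_locations') { n.times { caller_locations(1, 1).first } }
end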
Reading Lines
Skipping down past a few unavoidable string allocations, there's this one:
sourcefile                                          sourceline  count
--------------------------------------------------  ----------  -----
<GEM:ruby_parser-3.11.0>/lib/ruby_parser_extras.rb        1015   6818
Here’s the code:
header = str.lines.first(2)
RubyParser checks the first couple lines of a file for any comments setting the encoding for the file. The trouble is that calling String#lines will split the entire string up when we only need the first two lines.
Grabbing only the first two lines ends up being pretty trivial thanks to Ruby’s standard approach of returning enumerators for enumeration methods if a block is not supplied:
header = str.each_line.first(2)
String#each_line will lazily return the lines from the string, so it only does the work needed.
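To see the difference on a toy string (not RubyParser code):

str = "line 1\nline 2\nline 3\nline 4\n"

str.lines               #=> ["line 1\n", "line 2\n", "line 3\n", "line 4\n"]
                        #   splits the whole string into an array up front

str.each_line.first(2)  #=> ["line 1\n", "line 2\n"]
                        #   the enumerator stops after yielding two lines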
Sadly, this didn’t do much for overall string allocations and parse time since this method is only called once, but I think it’s a clear improvement to only grab the two lines needed.
Freezing Strings
Finally, back to the original idea. By the time I made it back to freezing string literals, I was feeling pretty lazy, so I just threw the frozen string header on ruby_lexer.rb:
# frozen_string_literal: true
Running the tests showed only one method where frozen string literals did not work, so these strings needed to be dup'd.
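In general, the pattern looks something like this (a contrived example, not the actual lexer code):

# frozen_string_literal: true

# With the magic comment, every string literal in this file is frozen and
# deduplicated, so building a string in place needs an explicit mutable copy:
buffer = +""          # or "".dup
buffer << "token text"

# Literals that are never mutated need no change at all:
marker = "end"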
String allocations were reduced by 24-30%, but with almost no parse time change. Probably because these were tiny, tiny strings.
Final Metrics
With these four changes, string allocations were reduced by 75-83% and parse time was reduced by 30-37%. The test suite for RubyParser ran 33% faster on my machine.
I did not see a huge decrease in peak memory use. Maybe 3%. My guess is this is because the String representation in Ruby is fairly well-optimized already (e.g. copy-on-write).
For Brakeman, parsing is a decent part of the run time (30-60% even), so a faster RubyParser definitely makes Brakeman scans faster. From a few test scans, I saw as much as a 30% improvement in total scan time.
Final Changes
The final version of the changes applied by Ryan is in this commit.
I expect these improvements will be in the next RubyParser and Brakeman releases.