*.blog

by Justin Collins

Faster Call Indexing in Brakeman

Background

About a month ago, an issue was reported where Brakeman taking a ridiculously long time on the “call indexing” step. At the time, I was pessimistic about opportunities to improve the performance of call indexing, since it is a pretty simple operation.

Call Indexing

The majority of the checks performed by Brakeman involve finding and examining method calls (e.g., SQL queries). In order to make these checks faster, Brakeman scans an app once and then saves information about each method call in a data structure called the “call index”. This makes searching for specific method calls very fast.

The call index allows search for methods by name and also by the class of the target. For example, it is possible to search for all calls to open on File. It also allows searching via regular expressions for methods and targets.

Investigation

I happened to notice a Rails application in my collection which also seemed to take a long time indexing calls. So I ran it through perftools.rb to see if there was anything interesting going on.

This is the result:

Brakeman 1.8.2 scan

The large amount of time spent in the garbage collector (60%) was high even for Brakeman. But then something else caught my eye:

Call indexing in 1.8.2

This scan spent 4.7% of its time converting Sexps to strings while indexing calls. This seemed excessive.

This is the entirety of the call indexing code:

call_index.rb
1
2
3
4
5
6
7
8
  def index_calls calls
    calls.each do |call|
      @methods << call[:method].to_s
      @targets << call[:target].to_s
      @calls_by_method[call[:method]] << call
      @calls_by_target[call[:target]] << call
    end
  end

@methods and @targets are sets which contain the string versions of all the methods and method targets. This is exclusively used to search for methods and targets via regular expressions.

The call method will always be a symbol…but what about the target? It turns out that while much of the time it is a symbol, if a sane value like :File or :@something cannot be found, then it will be the entire Sexp! This is where Brakeman was wasting time calling Sexp#to_s.

The quick fix was to only store symbol targets in the @targets set, ignoring any other target values.

Results

Scanning the same application with Brakeman 1.8.3 has this result:

Brakeman 1.8.3 scan

Garbage collection time dropped from 60% to 42%. While still very high, this is a good sign. Time spent indexing calls has dropped from 5.4% to 1.8% and the calls to Sexp#to_s have vanished.

The total scan time dropped from 3.5 minutes to about 2 minutes. For the original reporter, scan times went from 78 minutes to 40 seconds.

More Improvements

Looking through Brakeman, it does not currently use the “search via regex” feature for the call index. So the method and target name sets can be removed entirely.

Going even further, nowhere does Brakeman search for targets by any values other than symbols. Note in the graph below that Array#eql? was sampled 1,330 times during call indexing:

Call indexing hashing

Since Sexps are subclassed from Array, it is clear that these calls are generated when using the call[:target] as a hash key (line 6 above). Again, the current Brakeman code only searches for call targets by symbol, never by a full Sexp. There is no reason to the call targets that are Sexps.

This is the modified call indexing code:

Modified call_index.rb
1
2
3
4
5
6
7
8
9
  def index_calls calls
    calls.each do |call|
      @calls_by_method[call[:method]] << call

      unless call[:target].is_a? Sexp
        @calls_by_target[call[:target]] << call
      end
    end
  end

With this code in place, call indexing does not even show up under perftools. Speed improvements vary by project, but this should at least shave off a few seconds. YMMV.

Wrapping Up

Some quick profiling led me to performance improvements where I really did not expect to find them. Sadly, this is one of the cleanest, simplest parts of Brakeman, so I know there are many other instances where Brakeman can be improved. Prior to the introduction of the call index in Brakeman 1.0, I was trying to keep Brakeman scans under 20 minutes (on large applications). Now I worry when scans take longer than a few minutes.

97% of the open source Rails applications I use as test cases can be scanned in less than 30 seconds. Unfortunately, this probably does not reflect scan times for large, commercial applications. Please report any long-running scans! It may lead to more speed improvements like the ones above.

Comments