Another Reason to Avoid constantize in Rails

Backstory

Recently, a friend asked me if just calling constantize on user input was dangerous, even if subsequent code did not use the result:

params[:class].classify.constantize

Brakeman generates a “remote code execution” warning for this code:

Confidence: High
Category: Remote Code Execution
Check: UnsafeReflection
Message: Unsafe reflection method `constantize` called with parameter value
Code: params[:class].classify.constantize
File: app/controllers/users_controller.rb
Line: 7

But why? Surely just converting a string to a constant (if the constant even exists!) can’t be dangerous, right?

Coincidentally, around that same time I was looking at Ruby deserialization gadgets - in particular this one which mentions that Ruby’s Digest module will load a file based on the module name. For example, Digest::A will try to require 'digest/a':

2.7.0 :001 > require 'digest'
 => true 
2.7.0 :002 > Digest::Whatever
Traceback (most recent call last):
        5: from /home/justin/.rvm/rubies/ruby-2.7.0/bin/irb:23:in `<main>'
        4: from /home/justin/.rvm/rubies/ruby-2.7.0/bin/irb:23:in `load'
        3: from /home/justin/.rvm/rubies/ruby-2.7.0/lib/ruby/gems/2.7.0/gems/irb-1.2.1/exe/irb:11:in `<top (required)>'
        2: from (irb):2
        1: from /home/justin/.rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/digest.rb:16:in `const_missing'
LoadError (library not found for class Digest::Whatever -- digest/whatever)

The Digest library uses the const_missing hook to implement this functionality.
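
The hook looks roughly like this (a simplified sketch of digest.rb's behavior, not the exact implementation):

module Digest
  def self.const_missing(name)
    # Guess a file name from the constant name and try to load it
    lib = File.join('digest', name.to_s.downcase)

    begin
      require lib
    rescue LoadError
      raise LoadError, "library not found for class Digest::#{name} -- #{lib}"
    end

    # If the require succeeded but still did not define the constant, fail
    # instead of recursing back into const_missing
    unless const_defined?(name)
      raise NameError, "uninitialized constant Digest::#{name}"
    end

    const_get(name)
  end
end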

This made me wonder if constantize and const_missing could be connected, and what the consequences would be.

Constantizing in Rails

The constantize method in Rails turns a string into a constant. If the constant does not exist then a NameError will be raised.
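
For example, in a Rails console (assuming an application with a User model):

'User'.constantize        # => User
'NoSuchClass'.constantize # raises NameError (uninitialized constant NoSuchClass)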

However, it is possible to hook into the constant lookup process in Ruby by defining a const_missing method. If a constant cannot be found in a given module, and that module has const_missing defined, then const_missing will be invoked.

2.7.0 :001 > module X
2.7.0 :002 >   def self.const_missing(name)
2.7.0 :003 >     puts "You tried to load #{name.inspect}"
2.7.0 :004 >   end
2.7.0 :005 > end
 => :const_missing 
2.7.0 :006 > X::Hello
You tried to load :Hello
 => nil

If const_missing is implemented with behavior based on the constant name, such as loading a file or creating a new object, there is an opportunity for malicious behavior.

Some Vulnerable Gems

Fortunately, const_missing is not used very often. When it is, the implementation is not usually exploitable.

Searching across ~1300 gems, I found only ~40 gems with a const_missing implementation.

Of those, the majority were not exploitable because they checked the constant name against expected values or called const_get which raises an exception if the constant does not exist.
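
A typical non-exploitable pattern looks something like this hypothetical sketch, where only a fixed set of names is ever handled:

module SafeExample
  ALLOWED = [:Foo, :Bar].freeze

  def self.const_missing(name)
    # Unknown names raise NameError, just like normal constant lookup
    super unless ALLOWED.include?(name)

    # Only a fixed, finite set of constants can ever be created
    const_set(name, Module.new)
  end
end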

One gem, coderay, loads files based on constant names, much like the Digest library. And as with Digest, this does not appear to be exploitable, because the loaded files are limited to a single coderay directory.

The two gems below, however, have memory leaks, which can enable denial of service attacks through memory exhaustion.

Temple

The Temple gem is a foundational gem used by Haml, Slim, and other templating libraries.

In Temple, there is a module called Temple::Mixins::GrammarDSL that implements const_missing like this:

def const_missing(name)
  const_set(name, Root.new(self, name))
end

The method creates a new constant based on the given name and assigns it a new Root object.

This is a memory leak since constants are never garbage collected. If an attacker can trigger it, they can create an unlimited number of permanent objects, using up as much memory as possible.

Unfortunately, it is easy to exploit this code.

Temple::Grammar extends Temple::Mixins::GrammarDSL and is a core class for Temple. Let’s see if it is loaded by Haml, a popular templating library often used with Rails:

2.7.0 :001 > require 'haml'
 => true 
2.7.0 :002 > Temple::Grammar
 => Temple::Grammar 

Great! What happens if we try to reference a module that definitely does not exist?

2.7.0 :003 > Temple::Grammar::DefinitelyDoesNotExist
 => #<Temple::Mixins::GrammarDSL::Root:0x000055a79b011060 @grammar=Temple::Grammar, @children=[], @name=:DefinitelyDoesNotExist> 

As can be seen above, the constant is created along with a new object.

To go one step further… does the use of constantize invoke this code?

We can test by loading a Rails console for an application using Haml:

Loading development environment (Rails 6.0.3.2)
2.7.0 :001 > require 'haml'
 => false 
2.7.0 :002 > 'Temple::Grammar::DefinitelyDoesNotExist'.constantize
 => #<Temple::Mixins::GrammarDSL::Root:0x000055ba28031a50 @grammar=Temple::Grammar, @children=[], @name=:DefinitelyDoesNotExist> 

It does!

Any Ruby on Rails application using Haml or Slim that calls constantize on user input (e.g. params[:class].classify.constantize) is vulnerable to a memory leak via this method.

Restforce

A very similar code pattern is implemented in the restforce gem.

The ErrorCode module uses const_missing like this:

module ErrorCode
  def self.const_missing(constant_name)
    const_set constant_name, Class.new(ResponseError)
  end
end

Nearly the same, except this actually creates new classes, not just regular objects.

We can verify again:

Loading development environment (Rails 6.0.3.2)
2.7.0 :001 > require 'restforce'
 => false 
2.7.0 :002 > Restforce::ErrorCode::WhateverWeWant
 => Restforce::ErrorCode::WhateverWeWant 

This time we get as many new classes as we want.

This has been fixed in Restforce 5.0.0.

Finding and Exploiting Memory Leaks

Finding vulnerable code like this in a production application would be difficult. You would need to guess which parameters might be constantized.

Verifying that you’ve found a memory leak is also a little tricky, especially since the two leaks described above create very small objects.

From what I could estimate, a new Root object in Temple uses about 300 bytes of memory, while a new class in Restforce takes up almost 1,000 bytes.

Based on that and my testing, it would take roughly 1 to 4 million requests to use just 1GB of memory (1GB holds about 1 million 1,000-byte classes, or over 3 million 300-byte objects).

Given that web applications are typically restarted on a regular basis, and it’s rarely a big deal to kill off a process and start a new one, this does not seem particularly impactful.

However, it would be annoying and possibly harmful for smaller sites. For example, a base Heroku dyno only has 512MB of memory.

Another note here: memory leaks are not the worst outcome of an unprotected call to constantize - remote code execution is the more likely danger. The real issue I am trying to explore here is the unexpected behavior that may be hidden in dependencies.

Conclusions

In short: Avoid using constantize in Rails applications. If you need to use it, check against an allowed set of class names before calling constantize. (Calling classify before checking is okay, though.)
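
For example, a minimal sketch (the model names here are hypothetical):

ALLOWED_CLASSES = %w[User Post Comment].freeze

def class_from_param(param)
  name = param.classify

  unless ALLOWED_CLASSES.include?(name)
    raise ArgumentError, "unexpected class name: #{name}"
  end

  name.constantize
end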

Likewise for const_missing in Ruby libraries. Doing anything dynamic with the constant name (loading files, creating new objects, evaluating code, etc.) should be avoided. Ideally, check against an expected list of names and reject anything else.

In the end, this comes down to the security basics of not trusting user input and strictly validating inputs.

Edit: It seems some language I used above was a little ambiguous, so I tweaked it. Calling classify does not make the code safe - I meant calling classify is not dangerous by itself. It’s the subsequent call to constantize that is dangerous. So you can safely call classify, check against a list of allowed classes, then take the appropriate action.

Why 'Escaping' JavaScript is Dangerous

A recent vulnerability report and the blog post behind it brought my attention back to the escape_javascript Ruby on Rails helper method.

It’s bad form to drop a blanket statement like the title above without explanation or evidence, so here it is:

Escaping HTML

Part of the danger of escape_javascript is the name and apparent relationship to html_escape.

HTML is a markup language for writing documents. Therefore, it must have a method for representing itself in text. In other words, there must be a way to encode <b> such that the browser displays <b> and does not interpret it as HTML.

As a result, HTML has a well-defined HTML encoding strategy. In the context of security and cross-site scripting, if a value output in an HTML context is HTML escaped, it is safe - the value will not be interpreted as HTML.

(See my post all about escaping!)

Escaping JavaScript

On the other hand, JavaScript has no such escaping requirements or capabilities.

Therefore, the “escaping” performed by escape_javascript is limited. The vulnerability report states the method is for “escaping JavaScript string literals”.

In particular, escape_javascript is only useful in one, single context: inside JavaScript strings!

For example:

# ERb Template
<script>
  var x = '<%= escape_javascript some_value %>';
</script>

Use of escape_javascript in any other context is incorrect and dangerous!

This is and always has been dangerous (note the missing quotes):

# ERb Template
<script>
  var x = <%= escape_javascript some_value %>;
</script>

some_value could be a payload like 1; do_something_shady(); // which would result in the following HTML:

<script>
  var x = 1; do_something_shady(); //; 
</script>

The escape_javascript helper does not and cannot make arbitrary values inserted into JavaScript “safe” in the same way html_escape makes values safe for HTML.

CVE-2020-5267

Jesse’s post has more details, but here’s the gist: JavaScript added a new string literal syntax, template literals. Instead of just single and double quotes, there are now also backticks `, which support string interpolation (like Ruby!).

This meant it was simple to bypass escape_javascript and execute arbitrary JavaScript by using a backtick to break out of the string, or ${...} to execute code during interpolation.

For example, if this were our code:

# ERb Template
<script>
  var x = `<%= escape_javascript some_value %>`;
</script>

Then if some_value had a payload of `; do_something_shady(); // (note the leading backtick), the resulting HTML would be:

<script>
  var x = ``; do_something_shady(); //`;
</script>

This is because escape_javascript was not aware of backticks for strings.
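
The fix for CVE-2020-5267 added backticks (and $, used in interpolation) to the characters escape_javascript escapes. A sketch of the patched behavior, as I understand it:

escape_javascript("`; do_something_shady(); //")
# => "\\`; do_something_shady(); //"

# So the rendered HTML stays a harmless string assignment:
#   var x = `\`; do_something_shady(); //`;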

Dangers of Dynamic Code Generation

As I have talked about before, web applications are essentially poorly-defined compilers generating code with untrusted inputs. In the end, the server is just returning a mishmash of code for the browser to interpret.

However, directly trying to generate safe code in a Turing-complete language like JavaScript or Ruby via string manipulation is a risky game. Methods like escape_javascript make it tempting to do so because the name sounds like it will make the code safe.

If at all possible, avoid dynamic code generation!

Sanitizing, Escaping, and Encoding

“We need to sanitize this data” is a phrase I have heard too many times in the context of web security. It always makes me a little nervous.

The implication of the term “sanitize” is somehow cleaning the data or rendering it “safe”. But the details of how that safety is achieved are a little vague.

Often it means simply searching for a function containing sanitize and blindly using that function.

That is usually the wrong thing!

Injection Vulnerabilities

Injection vulnerabilities, including cross-site scripting, are a top category of web vulnerabilities.

The root cause of injection vulnerabilities is the mixing of code and data which is then handed to a parser (the browser, database driver, shell, etc). Injection is possible when the data is treated as code.

(See my talk about injection for a deeper dive!)

Since proper escaping or sanitization is the mitigation for injection vulnerabilities, it is important to have a clear understanding of what those terms mean.

Escaping

The term “escaping” originates from situations where text is being interpreted in some mode and we want to “escape” from that mode into a different mode.

For example, there are ANSI “escape codes” to tell your terminal to switch from a text mode to interpreting a sequence of control characters.

The more common situation is when a developer needs to tell a parser to not interpret a value as code. For example, when one is writing a string and wants to include a double-quote inside the string:

"blah\"blah"

The backslash \ is an escape character that tells the parser to treat the following character as just a value, not the end of the string literal.

However, especially in web security, when we say “escaping” we typically mean “encoding”:

Encoding

Encoding involves replacing special characters with a different representation.

HTML encoding uses HTML entities.

For example, < would normally be interpreted as the start of an HTML tag. To display a < character without it being interpreted as a tag, use &lt;.

In HTML, & is the escape character. So now you can see how encoding and escaping are intertwined.
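
For example, Ruby’s standard library can perform HTML encoding:

require 'cgi'

CGI.escapeHTML('<b>"A & B"</b>')
# => "&lt;b&gt;&quot;A &amp; B&quot;&lt;/b&gt;"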

In URLs, encoding involves replacing characters with % followed by a hexadecimal number that corresponds to the ASCII code for that character.

For example, / in a URL would normally be interpreted as a path separator. To pass in / without it being interpreted that way, use %2F.

This is called “URL encoding” or “percent encoding”, and % is the escape character.
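
Again, Ruby’s standard library can do this (note that CGI.escape uses form encoding, so spaces become +):

require 'cgi'

CGI.escape('a/b')  # => "a%2Fb"
CGI.escape('a b')  # => "a+b" (ERB::Util.url_encode gives "a%20b")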

Encoding special characters is typically a simple and straightforward process: each special character is replaced with its encoded value in a single linear pass.

The encoding scheme used depends on context. For any type of interpretation (HTML, JavaScript, URLs, CSS, SQL, JSON, …) there will be a different encoding scheme. It is important to use the correct encoding for the context.

Also note that encoding is a completely reversible process! Given an encoded string, we can easily decode it back to the original value.
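
Continuing the examples above:

require 'cgi'

CGI.unescapeHTML('&lt;b&gt;') # => "<b>"
CGI.unescape('a%2Fb')         # => "a/b"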

Sanitizing

Unlike encoding and escaping, sanitization involves removing characters entirely in order to make the value “safe”.

This is a complicated, error-prone process.

Here is a classic example of bad sanitization:

# Remove script tags!
def sanitize_js(input)
  input.gsub(/<\/?script>/, "")
end

sanitize_js("<script>alert(1)</script>") # => "alert(1)"

sanitize_js("<scri<script>pt>alert(1)</scr</script>ipt>") # => "<script>alert(1)</script>"

This is not just an amusing theoretical example - I have seen this exact approach used in production applications.

Since sanitization is so difficult - nearly impossible - to do correctly, most sanitization implementations have seen a number of bypasses.

Also, unlike encoding, sanitization is not reversible! Information is lost when the data is sanitized. You cannot retrieve the original input once it has gone through a sanitization process. This is rarely a desirable side-effect.

Sanitization can also mean removal or replacement of sensitive data. That is a different usage not being discussed here.

Using the Right Approach

From a security perspective, contextually encoding untrusted values at time of use is the preferred approach.

The tricky part is understanding the output context of the data and which encoding to use. HTML can easily have more than four different contexts in a single document! Also, it makes no sense to use HTML encoding in SQL.

When possible, use encoding routines provided by libraries or frameworks.

Sanitization should be reserved for cases when encoding is simply not possible - for example, when an application must accept and display HTML from users. There is no way to use encoding in that scenario.

Again, when possible, do not write your own sanitization! Use existing libraries.
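
For example, in a Rails application the rails-html-sanitizer gem provides allowlist-based sanitization (a sketch - check the API of the version you are using):

require 'rails-html-sanitizer'

sanitizer = Rails::Html::SafeListSanitizer.new

# Only <b> and <i> tags survive, and no attributes are allowed
sanitizer.sanitize('<b onclick="alert(1)">hi</b>', tags: %w[b i], attributes: [])
# => "<b>hi</b>"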

Summary

When discussing handling potentially dangerous data, be precise with terms!

The security industry seems to have settled on “escaping” to actually mean “encoding”. In other words, a reversible transformation that encodes special characters so they will not be interpreted as code.

Sanitization, in this context, means an irreversible stripping of special characters.

When possible, prefer encoding/escaping to sanitization!

See Also

OWASP Cross-Site Scripting Prevention Cheatsheet

OWASP Injection Prevention Cheatsheet

Taking on the King: Killing Injection Vulnerabilities