Another Reason to Avoid constantize in Rails
Backstory
Recently, a friend asked me if just calling constantize on user input was dangerous, even if subsequent code did not use the result:

params[:class].classify.constantize
Brakeman generates a “remote code execution” warning for this code:
Confidence: High
Category: Remote Code Execution
Check: UnsafeReflection
Message: Unsafe reflection method `constantize` called with parameter value
Code: params[:class].classify.constantize
File: app/controllers/users_controller.rb
Line: 7
But why? Surely just converting a string to a constant (if the constant even exists!) can’t be dangerous, right?
Coincidentally, around that same time I was looking at Ruby deserialization gadgets - in particular this one, which mentions that Ruby's Digest module will load a file based on the module name. For example, Digest::A will try to require 'digest/a':

2.7.0 :001 > require 'digest'
 => true
2.7.0 :002 > Digest::Whatever
Traceback (most recent call last):
        5: from /home/justin/.rvm/rubies/ruby-2.7.0/bin/irb:23:in `<main>'
        4: from /home/justin/.rvm/rubies/ruby-2.7.0/bin/irb:23:in `load'
        3: from /home/justin/.rvm/rubies/ruby-2.7.0/lib/ruby/gems/2.7.0/gems/irb-1.2.1/exe/irb:11:in `<top (required)>'
        2: from (irb):2
        1: from /home/justin/.rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/digest.rb:16:in `const_missing'
LoadError (library not found for class Digest::Whatever -- digest/whatever)
The Digest library uses the const_missing hook to implement this functionality.
This made me wonder if constantize and const_missing could be connected, and what the consequences would be.
Constantizing in Rails
The constantize method in Rails turns a string into a constant. If the constant does not exist, a NameError will be raised.
However, it is possible to hook into the constant lookup process in Ruby by defining a const_missing method. If a constant cannot be found in a given module, and that module has const_missing defined, then const_missing will be invoked.
2.7.0 :001 > module X
2.7.0 :002 >   def self.const_missing(name)
2.7.0 :003 >     puts "You tried to load #{name.inspect}"
2.7.0 :004 >   end
2.7.0 :005 > end
 => :const_missing
2.7.0 :006 > X::Hello
You tried to load :Hello
 => nil
If const_missing is implemented with behavior based on the constant name, such as loading a file or creating a new object, there is an opportunity for malicious behavior.
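For example, a contrived const_missing in the same spirit as Digest's might look like this (the Plugins module and its file layout are hypothetical, not from any real gem):

module Plugins
  def self.const_missing(name)
    # DANGEROUS: the required path is derived from the constant name,
    # so whoever controls the name influences which file gets loaded.
    require "plugins/#{name.to_s.downcase}"
    raise NameError, "#{name} not defined" unless const_defined?(name)
    const_get(name)
  end
end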
Some Vulnerable Gems
Fortunately, const_missing is not used very often. When it is, the implementation is usually not exploitable.
Searching across ~1300 gems, I found only ~40 gems with a const_missing implementation.
Of those, the majority were not exploitable because they checked the constant name against expected values or called const_get, which raises an exception if the constant does not exist.
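A sketch of that safer style (the module and constant names here are made up for illustration):

module SafeNamespace
  EXPECTED = [:Foo, :Bar].freeze

  def self.const_missing(name)
    # Only act on a small, fixed set of expected names;
    # everything else raises NameError as usual.
    return super unless EXPECTED.include?(name)
    const_set(name, Module.new)
  end
end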
One gem, coderay, loads files based on constant names like the Digest library. Also like the Digest library, this does not appear to be exploitable because the files are limited to a single coderay directory.
The next two gems below have memory leaks, which can enable denial of service attacks through memory exhaustion.
Temple
The Temple gem is a foundational gem used by Haml, Slim, and other templating libraries.
In Temple, there is a module called Temple::Mixins::GrammarDSL that implements const_missing like this:
def const_missing(name)
const_set(name, Root.new(self, name))
end
The method creates a new constant based on the given name and assigns it a new object.
This is a memory leak since constants are never garbage collected. If an attacker can trigger it, they can create an unlimited number of permanent objects, using up as much memory as possible.
Unfortunately, it is easy to exploit this code.
Temple::Grammar extends Temple::Mixins::GrammarDSL and is a core class for Temple. Let’s see if it is loaded by Haml, a popular templating library often used with Rails:
2.7.0 :001 > require 'haml'
 => true
2.7.0 :002 > Temple::Grammar
 => Temple::Grammar
Great! What happens if we try to reference a module that definitely does not exist?
2.7.0 :003 > Temple::Grammar::DefinitelyDoesNotExist
 => #<Temple::Mixins::GrammarDSL::Root:0x000055a79b011060 @grammar=Temple::Grammar, @children=[], @name=:DefinitelyDoesNotExist>
As can be seen above, the constant is created along with a new object.
To go one step further… does the use of constantize invoke this code?
We can test by loading a Rails console for an application using Haml:
Loading development environment (Rails 6.0.3.2)
2.7.0 :001 > require 'haml'
 => false
2.7.0 :002 > 'Temple::Grammar::DefinitelyDoesNotExist'.constantize
 => #<Temple::Mixins::GrammarDSL::Root:0x000055ba28031a50 @grammar=Temple::Grammar, @children=[], @name=:DefinitelyDoesNotExist>
It does!
Any Ruby on Rails application using Haml or Slim that calls constantize on user input (e.g. params[:class].classify.constantize) is vulnerable to a memory leak via this method.
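As a rough sketch of what exploitation could look like (the host, endpoint, and parameter here are hypothetical), an attacker only has to send a stream of unique class names:

require 'net/http'

# Each unique value classifies to something like Temple::Grammar::LeakN,
# which const_missing then defines as a brand-new, permanent constant.
100_000.times do |i|
  Net::HTTP.get(URI("http://victim.example/users?class=temple/grammar/leak_#{i}"))
end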
Restforce
A very similar code pattern is implemented in the restforce gem.
The ErrorCode module uses const_missing like this:
module ErrorCode
def self.const_missing(constant_name)
const_set constant_name, Class.new(ResponseError)
end
end
Nearly the same, except this actually creates new classes, not just regular objects.
We can verify again:
Loading development environment (Rails 6.0.3.2)
2.7.0 :001 > require 'restforce'
 => false
2.7.0 :002 > Restforce::ErrorCode::WhateverWeWant
 => Restforce::ErrorCode::WhateverWeWant
This time we get as many new classes as we want.
This has been fixed in Restforce 5.0.0.
Finding and Exploiting Memory Leaks
Finding vulnerable code like this in a production application would be difficult. You would need to guess which parameters might be constantized.
Verifying that you’ve found a memory leak is also a little tricky, especially since the two memory leaks described above create very small objects.
From what I could estimate, a new Rule object in Temple uses about 300 bytes of memory, while a new class in Restforce takes up almost 1,000 bytes. Based on that and my testing, it would take one to four million requests to use up just 1GB of memory.
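One rough way to reproduce such an estimate yourself is from a Rails console in an app that uses Haml; ObjectSpace only measures object slots, so treat the number as a lower bound:

require 'objspace'

GC.start
before = ObjectSpace.memsize_of_all
# Each constantize call defines one more permanent constant.
10_000.times { |i| "Temple::Grammar::Leak#{i}".constantize }
GC.start
after = ObjectSpace.memsize_of_all
puts "~#{(after - before) / 10_000} bytes per leaked object"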
Given that web applications are usually restarted on a regular basis and it’s not usually a big deal to kill off a process and start a new one, this does not seem particularly impactful.
However, it would be annoying and possibly harmful for smaller sites. For example, the base Heroku instance only has 512MB of memory.
Another note here: memory leaks are not the worst outcome of an unprotected call to constantize - remote code execution is the more serious possibility. The real issue I am trying to explore here is the unexpected behavior that may be hidden in dependencies.
Conclusions
In short: Avoid using constantize in Rails applications. If you need to use it, check against an allowed set of class names before calling constantize. (Calling classify before checking is okay, though.)
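For example, a minimal version of that check might look like this (the allowed list is illustrative):

ALLOWED_CLASSES = %w[User Project Report].freeze

def lookup_class(param)
  name = param.classify
  # Refuse anything that is not an expected class name.
  raise ArgumentError, "unexpected class: #{name}" unless ALLOWED_CLASSES.include?(name)
  name.constantize
end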
Likewise for const_missing in Ruby libraries: doing anything dynamic with the constant name (loading files, creating new objects, evaluating code, etc.) should be avoided. Ideally, check against an expected list of names and reject anything else.
In the end, this comes down to the security basics of not trusting user input and strictly validating inputs.
Edit: It seems some language I used above was a little ambiguous, so I tweaked it. Calling classify does not make the code safe - I meant calling classify is not dangerous by itself. It’s the subsequent call to constantize that is dangerous. So you can safely call classify, check against a list of allowed classes, then take the appropriate action.
Why 'Escaping' JavaScript is Dangerous
A recent vulnerability report and the blog post behind it brought my attention back to the escape_javascript Ruby on Rails helper method.
Let me say it again... if you are calling `escape_javascript` or `j` in your Rails code, please don't! https://t.co/60KLEjHX3T
— Justin Collins (@presidentbeef) May 9, 2020
It’s bad form to drop blanket statements without explanation or evidence, so here it is:
Escaping HTML
Part of the danger of escape_javascript is the name and its apparent relationship to html_escape.
HTML is a markup language for writing documents. Therefore, it must have a method for representing itself in text.
In other words, there must be a way to encode <b> such that the browser displays <b> and does not interpret it as HTML.
As a result, HTML has a well-defined HTML encoding strategy. In the context of security and cross-site scripting, if a value output in an HTML context is HTML escaped, it is safe - the value will not be interpreted as HTML.
(See my post all about escaping!)
Escaping JavaScript
On the other hand, JavaScript has no such escaping requirements or capabilities.
Therefore, the “escaping” performed by escape_javascript is limited.
The vulnerability report states the method is for “escaping JavaScript string literals”.
In particular, escape_javascript is only useful in one single context: inside JavaScript strings!
For example:
# ERb Template
<script>
var x = '<%= escape_javascript some_value %>';
</script>
Use of escape_javascript in any other context is incorrect and dangerous!
This is and always has been dangerous (note the missing quotes):
# ERb Template
<script>
var x = <%= escape_javascript some_value %>;
</script>
some_value could be a payload like 1; do_something_shady(); //, which would result in the following HTML:
<script>
var x = 1; do_something_shady(); //;
</script>
The escape_javascript helper does not and cannot make arbitrary values inserted into JavaScript “safe” in the same way html_escape makes values safe for HTML.
CVE-2020-5267
Jesse’s post has more details, but here’s the gist: JavaScript added a new string literal syntax. Instead of just single and double quotes, there are now also backticks (`), which support string interpolation (like Ruby!).
This meant it was simple to bypass escape_javascript and execute arbitrary JavaScript by using a backtick to break out of the string, or just ${...} to execute code during interpolation.
For example, if this were our code:
# ERb Template
<script>
var x = `<%= escape_javascript some_value %>`;
</script>
Then if some_value had a payload of `; do_something_shady(); // the resulting HTML would be:
<script>
var x = ``; do_something_shady(); //`
</script>
This is because escape_javascript was not aware of backticks for strings.
Dangers of Dynamic Code Generation
Let me say it again… using dynamic javascript under practically any circumstance is inviting trouble. It might be ok. I’d rather not have to worry about it. https://t.co/wnPy3OnkKI
— Shake, Oreo (@ndm) May 9, 2020
As I have talked about before, web applications are essentially poorly-defined compilers generating code with untrusted inputs. In the end, the server is just returning a mishmash of code for the browser to interpret.
However, directly trying to generate safe code in a Turing-complete language like JavaScript or Ruby via string manipulation is a risky game.
Methods like escape_javascript make it tempting to do so because the name sounds like it will make the code safe.
If at all possible, avoid dynamic code generation!
Sanitizing, Escaping, and Encoding
“We need to sanitize this data” is a phrase I have heard too many times in the context of web security. It always makes me a little nervous.
The implication of the term “sanitize” is somehow cleaning the data or rendering it “safe”. But the details of how that safety is achieved are a little vague.
Often it means simply searching for a function with “sanitize” in the name and blindly using that function.
That is usually the wrong thing!
Injection Vulnerabilities
Injection vulnerabilities, including cross-site scripting, are a top category of web vulnerabilities.
The root cause of injection vulnerabilities is the mixing of code and data which is then handed to a parser (the browser, database driver, shell, etc). Injection is possible when the data is treated as code.
(See my talk about injection for a deeper dive!)
Since proper escaping or sanitization is the mitigation for injection vulnerabilities, it is important to have a clear understanding of what those terms mean.
Escaping
The term “escaping” originates from situations where text is being interpreted in some mode and we want to “escape” from that mode into a different mode.
For example, there are ANSI “escape codes” to tell your terminal to switch from a text mode to interpreting a sequence of control characters.
The more common situation is when a developer needs to tell a parser to not interpret a value as code. For example, when one is writing a string and wants to include a double-quote inside the string:
"blah\"blah"
The backslash \ is an escape character that tells the parser to treat the following character as just a value, not the end of the string literal.
However, especially in web security, when we say “escaping” we typically mean “encoding”:
Encoding
Encoding involves replacing special characters with a different representation.
HTML encoding uses HTML entities.
For example, < would normally be interpreted as the start of an HTML tag. To display a < character without it being interpreted as a tag, use &lt; instead.
In HTML, & is the escape character. So now you can see how encoding and escaping are intertwined.
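Ruby’s standard library can do this encoding for you. For example, CGI.escapeHTML:

require 'cgi'

CGI.escapeHTML(%q{<b>"bold" & dangerous</b>})
# => "&lt;b&gt;&quot;bold&quot; &amp; dangerous&lt;/b&gt;"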
In URLs, encoding involves replacing characters with % followed by a hexadecimal number corresponding to the ASCII code for that character.
For example, / in a URL would normally be interpreted as a path separator. To pass in a / without it being interpreted that way, use %2F.
This is called “URL encoding” or “percent encoding”, and the % character is the escape character. The value after the % is the hex representation of the ASCII code for the desired display character.
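Again, the standard library handles this. CGI.escape percent-encodes reserved characters (note it also turns spaces into +):

require 'cgi'

# '/' (ASCII 0x2F) becomes %2F, so it is no longer a path separator.
CGI.escape("a/b")  # => "a%2Fb"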
Encoding special characters is typically a very simple and straightforward process. Characters are simply replaced with their encoded value in a linear fashion.
The encoding scheme used depends on context. For any type of interpretation (HTML, JavaScript, URLs, CSS, SQL, JSON, …) there will be a different encoding scheme. It is important to use the correct encoding for the context.
Also note that encoding is a completely reversible process! Given an encoded string, we can easily decode it back to the original value.
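The round trip is easy to demonstrate:

require 'cgi'

encoded = CGI.escapeHTML("<b>&</b>")  # => "&lt;b&gt;&amp;&lt;/b&gt;"
CGI.unescapeHTML(encoded)             # => "<b>&</b>"
CGI.unescape("a%2Fb")                 # => "a/b"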
Sanitizing
Unlike encoding and escaping, sanitization involves removing characters entirely in order to make the value “safe”.
This is a complicated, error-prone process.
Here is a classic example of bad sanitization:
# Remove script tags!
def sanitize_js(input)
input.gsub(/<\/?script>/, "")
end
sanitize_js("<script>alert(1)</script>") # => "alert(1)"
sanitize_js("<scri<script>pt>alert(1)</scr</script>ipt>") # => "<script>alert(1)</script>"
This is not just an amusing theoretical example - I have seen this exact approach used in production applications.
Since sanitization is so difficult - nearly impossible - to do correctly, most sanitization implementations have seen a number of bypasses.
Also, unlike encoding, sanitization is not reversible! Information is lost when the data is sanitized. You cannot retrieve the original input once it has gone through a sanitization process. This is rarely a desirable side-effect.
Sanitization can also mean removal or replacement of sensitive data. That is a different usage not being discussed here.
Using the Right Approach
From a security perspective, contextually encoding untrusted values at time of use is the preferred approach.
The tricky part is understanding the output context of the data and which encoding to use. HTML can easily have more than four different contexts in a single document! Also, it makes no sense to use HTML encoding in SQL.
When possible, use encoding routines provided by libraries or frameworks.
Sanitization should be reserved for cases when encoding is simply not possible. For example, if an application must accept and display HTML from users. There is no way to use encoding in that scenario.
Again, when possible, do not write your own sanitization! Use existing libraries.
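For example, Rails ships with the rails-html-sanitizer gem. Something along these lines (the allowed tag list is illustrative, and the exact output varies by gem version) is far more robust than a hand-rolled regex:

require "rails-html-sanitizer"

sanitizer = Rails::Html::SafeListSanitizer.new
# The nested-tag trick from the example above no longer survives,
# because the input is actually parsed rather than pattern-matched.
sanitizer.sanitize("<scri<script>pt>alert(1)</scr</script>ipt>", tags: %w[b i])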
Summary
When discussing handling potentially dangerous data, be precise with terms!
The security industry seems to have settled on “escaping” to actually mean “encoding”. In other words, a reversible transformation that encodes special characters so they will not be interpreted as code.
Sanitization, in this context, means an irreversible stripping of special characters.
When possible, prefer encoding/escaping to sanitization!
See Also
OWASP Cross-Site Scripting Prevention Cheatsheet