Understanding ReDoS: The Hidden Threat of Regex Exploitation
Introduction to ReDoS
When you hear about Denial-of-Service (DoS) attacks, you likely picture a swarm of bots overwhelming a web server's resources. While this is indeed a common method to execute such an attack, there's another, less well-known tactic called Regular Expression Denial of Service, or ReDoS, which exploits the vulnerabilities in regular expressions.
You might wonder, how can regular expressions—tools designed to simplify string matching—be used maliciously? While they do serve to filter and validate strings, attackers can manipulate them to render a server unresponsive. Let's explore how this can happen.
What is a Regular Expression?
At its core, a Regular Expression (regex) is a sequence of characters that forms a search pattern, primarily for string matching in programming languages. For example, consider the following JavaScript code that validates email addresses:
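// A regex literal keeps \. as an escaped dot; inside a quoted string the backslash would be dropped and "." would match any character.
let regex = /[a-z0-9]+@[a-z]+\.[a-z]{2,3}/;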
let testEmails = ["notanemail.com", "[email protected]", "[email protected]", "[email protected]"];
testEmails.forEach((address) => {console.log(regex.test(address))});
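Here, the regex checks whether each address contains text in a certain format. Because the pattern is not anchored with ^ and $, test() succeeds as soon as any part of the string matches. Breaking it down: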
- [a-z0-9]+ allows lowercase letters and digits, requiring at least one character.
- @ indicates the presence of the "at" symbol.
- [a-z]+ ensures the domain consists solely of lowercase letters.
- \. represents the dot in the domain.
- [a-z]{2,3} restricts the top-level domain to two or three letters.
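A detail that is easy to miss with patterns like this one: when the pattern is passed to the RegExp constructor as a quoted string, a single backslash before the dot is consumed by the string literal, so the engine sees a bare "." that matches any character. The sketch below compares the two forms; the input `user@examplexcom` is a made-up address chosen only because it has no dot in the domain.

```javascript
// Pattern built from a quoted string: '\.' collapses to '.', which matches any character.
const fromString = new RegExp('[a-z0-9]+@[a-z]+\.[a-z]{2,3}');

// The same pattern written as a regex literal: '\.' stays an escaped, literal dot.
const fromLiteral = /[a-z0-9]+@[a-z]+\.[a-z]{2,3}/;

// Hypothetical input with no dot in the domain part.
const input = 'user@examplexcom';

console.log(fromString.source);       // [a-z0-9]+@[a-z]+.[a-z]{2,3}
console.log(fromString.test(input));  // true  (the "x" is accepted in place of the dot)
console.log(fromLiteral.test(input)); // false (a literal dot is required)
```

If the constructor form is needed, for example to build a pattern dynamically, doubling the backslash ('\\.') gives the same result as the literal.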
This regex looks harmless, and patterns like it appear in countless applications. So how can a regular expression lead to a Denial-of-Service attack?
During a ReDoS attack, malicious input forces a backtracking regex engine to try an enormous number of ways to match before it can give up. The evaluation ties up CPU time, and in a single-threaded runtime such as Node.js it blocks every other request while it runs. Legitimate users can no longer reach the service, which effectively cripples the web application.
A related risk is that a poorly designed pattern can also validate incorrectly, rejecting valid input or accepting invalid input, in addition to taking excessive time to process.
Understanding Evil Regex Patterns
An "Evil Regex" refers to a regex that can be exploited by attackers. According to Wikipedia, these patterns often involve:
- Repetition operators like "+" or "*" applied to complex sub-expressions.
- A repeated sub-expression that can match the same input in more than one way, for example because one of its matches is also a suffix of another valid match.
For instance, consider the regex ^(a+)+$. This pattern matches only strings made up entirely of one or more 'a' characters. Given a long string of 'a's, the regex evaluator accepts it quickly. However, if you append an exclamation mark, as in aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!, the evaluation time increases drastically due to backtracking: instead of returning false within milliseconds, the engine can take seconds or far longer, depending on the engine and the length of the input.
This delay occurs because the regex engine tries every possible way of dividing the run of 'a's between the inner a+ and the outer group before it can conclude that no match exists. The number of such divisions roughly doubles with each additional 'a', so the work grows exponentially with the input length.
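The growth described above can be observed directly. The following is a minimal sketch, assuming a backtracking engine such as the one in Node.js or a browser; the helper name `timeRejection` and the chosen lengths are illustrative, and the lengths are kept small on purpose so the script finishes quickly. Actual timings depend on the machine and engine, so none are shown.

```javascript
// The "evil" pattern discussed above.
const evil = /^(a+)+$/;

// Hypothetical helper: time how long the engine needs to reject
// n 'a' characters followed by a single '!'.
function timeRejection(n) {
  const input = 'a'.repeat(n) + '!';
  const start = performance.now();
  const matched = evil.test(input); // always false: the '!' can never match
  const elapsedMs = performance.now() - start;
  return { n, matched, elapsedMs };
}

// Each step adds four characters; on a backtracking engine the time
// should grow by roughly a factor of 16 per step once it is measurable.
const results = [10, 14, 18, 22].map(timeRejection);

for (const { n, matched, elapsedMs } of results) {
  console.log(`n=${n} matched=${matched} time=${elapsedMs.toFixed(2)}ms`);
}
```

Raising the lengths much beyond the values used here can freeze the process for a long time, which is exactly the behavior an attacker relies on.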
The Exploitability of Regex Patterns
Developers, like anyone, can make errors. Many applications might incorporate regex patterns that are vulnerable to exploitation. For seasoned hackers, identifying these weaknesses can be straightforward, especially when source code is publicly accessible.
Bad regex patterns may allow attackers to circumvent security measures, whether at the web application level or through network firewalls. Therefore, it's crucial to rigorously test regex patterns before deploying applications into production.
Mitigation Strategies
To avoid the pitfalls of regex vulnerabilities, it's best to approach regex with caution, particularly for those who are less experienced. If regex is necessary, prefer simple patterns that avoid nested quantifiers, or use plain string operations where they achieve the same outcome. Testing your regex on platforms like regex101.com can help confirm that it behaves as intended, although a pattern that passes functional tests can still backtrack badly on hostile input.
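As one illustration of a safer alternative, the nested pattern ^(a+)+$ accepts exactly the same strings as the flat pattern ^a+$, which has only one way to consume a run of 'a's. The sketch below checks that the two agree on a handful of made-up samples and that the flat version rejects a long hostile string without exponential backtracking. It is an example of the idea, not a general recipe for rewriting arbitrary patterns.

```javascript
const nested = /^(a+)+$/; // nested quantifier: exponential backtracking on failure
const flat = /^a+$/;      // same language, single quantifier

// Hypothetical samples, short enough that the nested pattern stays fast.
const samples = ['a', 'aaaa', '', 'b', 'aab', 'aaa!'];
const agree = samples.every((s) => nested.test(s) === flat.test(s));
console.log(agree); // true

// A long hostile input is only safe to try against the flat pattern.
const hostile = 'a'.repeat(50000) + '!';
console.log(flat.test(hostile)); // false
```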
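Additionally, sanitizing user inputs is an age-old practice that remains effective, and capping input length before a string reaches a regex bounds the work an attacker can trigger. Regex engines that do not backtrack, such as RE2, guarantee matching time that is linear in the input length, and tools like safe-regex can help flag potentially vulnerable patterns.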
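A length check in front of the regex might look like the sketch below. The helper `testWithLengthCap`, the limit of 254 characters, and the anchored variant of the email pattern are all assumptions made for illustration. Note that a cap only bounds the damage: a pattern with exponential backtracking can already be slow on a few dozen characters, so the pattern itself still needs to be sound.

```javascript
// Hypothetical upper bound on accepted input length.
const MAX_INPUT_LENGTH = 254;

// Hypothetical helper: reject non-strings and oversized input before
// the regex engine ever sees them.
function testWithLengthCap(regex, input, maxLength = MAX_INPUT_LENGTH) {
  if (typeof input !== 'string' || input.length > maxLength) {
    return false;
  }
  return regex.test(input);
}

// Anchored variant of the email pattern from earlier (an assumption, not the original).
const emailPattern = /^[a-z0-9]+@[a-z]+\.[a-z]{2,3}$/;

console.log(testWithLengthCap(emailPattern, 'user@example.com'));                  // true
console.log(testWithLengthCap(emailPattern, 'a'.repeat(1000) + '@example.com'));   // false (too long)
console.log(testWithLengthCap(emailPattern, 12345));                               // false (not a string)
```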
Conclusion
Often overlooked, the implications of regex patterns can significantly expand the attack surface of an application. In this discussion, we explored ReDoS attacks, their underlying causes, and how to mitigate associated risks. Addressing these vulnerabilities is essential to maintain the availability of your web services and prevent costly downtime.
The first video titled "Revealer: Detecting and Exploiting Regular Expression Denial-of-Service Vulnerabilities" provides insights into identifying and exploiting regex vulnerabilities.
The second video, "USENIX Security '22 - RegexScalpel: Regular Expression Denial of Service (ReDoS) Defense," offers strategies for defending against ReDoS attacks.