DOS via a billion laughs 😈

Consume arbitrarily much RAM through repeated referencing

The billion laughs attack has been known since 2003 (source). It uses references in XML files so that a small source file becomes huge in memory once all references are expanded. It is also known as a LOL bomb or XML bomb; variations include the YAML bomb and the git bomb. It is a type of denial of service (DOS) attack, as it can bring a service down.

Why you should care

This attack is too specific to show up in many news articles. However, several big projects have been vulnerable over the years:

2003: libxml2 was vulnerable (CVE-2003-1564)
2015: MediaWiki was vulnerable (CVE-2015-2942)
2016: libxml2 was vulnerable … again (CVE-2016-3705)
2016: HTTP/2 header compression was used to build an HPACK bomb (CVE-2016-6581)
2019: Kubernetes was vulnerable (source, CVE-2019-11253)
2019: c3p0 (a JDBC connection pooling library) was vulnerable (CVE-2019-5427)

How it works

The following XML defines an entity ha, then an entity ha2 which contains ha twice, then an entity ha3 which contains ha2 twice, and so on. Every level doubles the content, so ha5 indirectly contains ha 2⁴ = 16 times. You can see the exponential growth, can’t you?

<?xml version="1.0"?>

<!DOCTYPE root [
<!ENTITY ha "😆">
<!ENTITY ha2 "&ha; &ha;">
<!ENTITY ha3 "&ha2; &ha2;">
<!ENTITY ha4 "&ha3; &ha3;">
<!ENTITY ha5 "&ha4; &ha4;">
]>

<root>&ha5;</root>

With ha31, we would have 2³⁰ times 😆. That is a billion laughs. Note how asymmetric this is: with a document smaller than 1 kB, the attacker can make the parser consume several gigabytes of memory. This can easily use up all memory of a machine and render it unusable until the parser is killed or the machine is restarted.
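
To make the asymmetry concrete, here is a small sketch that builds such a document for any nesting depth and computes the size of the expansion with plain arithmetic, without ever parsing it. The helper names build_bomb, expanded_laughs and expanded_bytes are made up for this illustration, and the byte count assumes UTF-8, where 😆 takes 4 bytes:

```python
def build_bomb(depth: int) -> str:
    """Build a billion-laughs style document with entities ha, ha2, ..., ha<depth>."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE root [", '<!ENTITY ha "😆">']
    previous = "ha"
    for level in range(2, depth + 1):
        name = f"ha{level}"
        # Every entity contains the previous one twice, so the content doubles.
        lines.append(f'<!ENTITY {name} "&{previous}; &{previous};">')
        previous = name
    lines += ["]>", f"<root>&{previous};</root>"]
    return "\n".join(lines)


def expanded_laughs(depth: int) -> int:
    """Number of 😆 in the fully expanded ha<depth> entity."""
    return 2 ** (depth - 1)


def expanded_bytes(depth: int) -> int:
    """UTF-8 size of the expansion: 4 bytes per 😆 plus one space between neighbours."""
    laughs = expanded_laughs(depth)
    return 4 * laughs + (laughs - 1)


if __name__ == "__main__":
    source_size = len(build_bomb(31).encode("utf-8"))
    print(f"source: {source_size} bytes, expanded: {expanded_bytes(31)} bytes")
```

For depth 31, the source stays below 1 kB while the expansion is roughly 5 GiB of text, before counting any overhead of the parser's own data structures.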

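A slight variation of the billion laughs attack is called quadratic blowup. Instead of nesting entities, it defines one large entity and references it many times. The expanded size then grows quadratically with the document size, and limits on the nesting depth do not catch it.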
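
As a rough sketch of quadratic blowup, assuming the usual construction with one long entity that is referenced many times without any nesting, the numbers look like this. The function names and the entity name a are invented for this example:

```python
def build_quadratic_blowup(entity_length: int, references: int) -> str:
    """Build a document with one long entity that is referenced many times."""
    entity_value = "a" * entity_length
    body = "&a;" * references
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE root [\n"
        f'<!ENTITY a "{entity_value}">\n'
        "]>\n"
        f"<root>{body}</root>"
    )


def expanded_chars(entity_length: int, references: int) -> int:
    """Characters inside <root> after every reference has been replaced."""
    return entity_length * references


if __name__ == "__main__":
    document = build_quadratic_blowup(50_000, 50_000)
    print(len(document), "characters in,", expanded_chars(50_000, 50_000), "characters out")
```

The growth is much slower than with nested entities, but a document of about 200 kB still expands to 2.5 billion characters, and there is no deep nesting a parser could use to detect it.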

Note that similar attacks are possible in other file formats such as YAML. The key point is that these formats support references.

How can I defend against a billion laughs?

Assuming you cannot control the input and keep malicious XML documents from reaching you in the first place, I can think of four measures:

Lazy evaluation of references: Instead of evaluating the whole document at once, references are only resolved when necessary. This helps when the dangerous part of the document is never accessed, but not when the expanded content is actually read.
No evaluation of references: Dropping the dangerous feature entirely means that you are no longer vulnerable to the attack. You need to make sure it doesn’t affect your users, though. Communicating this might be hard.
Reference recursion depth limit: The parser itself could be aware of this issue and have a threshold at which it stops evaluating references. However, this might also lead to false positives: documents that are not parsed because the parser thinks they are an attack.
RAM restriction: You can run the code that might execute the billion laughs attack under resource restrictions. Ideally, the affected thread/process receives a (catchable) exception and can continue execution normally. Even if no exception is raised and the thread/process is simply killed, the rest of your system stays fine.
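
To illustrate the third measure, here is a toy entity resolver with a nesting depth limit and an additional budget for the expanded size. It is only a sketch under my own assumptions: it works on a plain dict of entity values instead of a real XML document, and the names expand_entity, ExpansionLimitExceeded, max_depth and max_chars are made up and do not belong to any real parser:

```python
import re

# Matches references such as "&ha2;" and captures the entity name.
_REFERENCE = re.compile(r"&(\w+);")


class ExpansionLimitExceeded(Exception):
    """Raised when an entity nests too deeply or expands to too much text."""


def expand_entity(name, entities, max_depth=16, max_chars=1_000_000):
    """Expand entities[name] recursively, giving up once a limit is hit."""
    produced = 0  # literal characters emitted so far

    def walk(entity_name, depth):
        nonlocal produced
        if depth > max_depth:
            raise ExpansionLimitExceeded(f"nesting deeper than {max_depth}")
        pieces = []
        # re.split with one capture group alternates literal text (even
        # indexes) and referenced entity names (odd indexes).
        for index, token in enumerate(_REFERENCE.split(entities[entity_name])):
            if index % 2 == 1:
                pieces.append(walk(token, depth + 1))
            else:
                produced += len(token)
                if produced > max_chars:
                    raise ExpansionLimitExceeded(f"more than {max_chars} characters")
                pieces.append(token)
        return "".join(pieces)

    return walk(name, 1)
```

The depth limit stops the nested ha… entities almost immediately, while the size budget is what catches a flat document with many references to one big entity. A legitimate document with deep nesting would be rejected as well, which is exactly the false-positive risk mentioned above.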

So, how do you do this with Python?

For XML, the simplest solution is to use the defusedxml package (https://pypi.org/project/defusedxml/), as pointed out by Diederik van der Boor (thank you!).

The resource restriction is the easiest to set up, since it works no matter which parser you use.
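
The snippet originally embedded here is not part of this text, so the following is only a minimal sketch of how such a restriction can look, not necessarily the original code. It assumes a Unix-like system (the resource module and RLIMIT_AS are not available or not enforced everywhere) and runs the risky code in a child process. The names run_limited and MEMORY_HOG are made up, and the child code merely simulates a parser that tries to allocate about 4 GiB:

```python
import resource
import subprocess
import sys

# Stand-in for "parse untrusted XML": tries to allocate roughly 4 GiB.
MEMORY_HOG = """
n = 2 ** 31
try:
    laughs = "ha" * n
except MemoryError:
    print("MemoryError caught, input rejected")
"""


def run_limited(code: str, max_bytes: int) -> subprocess.CompletedProcess:
    """Run Python code in a child process whose address space is capped."""

    def limit_memory() -> None:
        # Runs in the child just before the interpreter starts.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=limit_memory,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_limited(MEMORY_HOG, max_bytes=1024 ** 3)  # 1 GiB
    print(result.stdout.strip())
```

With the limit in place, the allocation fails inside the child with a catchable MemoryError instead of eating the memory of the whole machine. The 1 GiB value is arbitrary and has to leave enough room for the interpreter and for legitimate inputs.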

Whether you can restrict the parser depends on the parser. Some have parameters for this, such as resolve_entities in lxml (https://lxml.de/api/lxml.etree.XMLParser-class.html).
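
With the expat parser from the standard library, one way to restrict the parser is to reject every entity declaration before anything gets expanded. The sketch below shows the idea; the exception class EntitiesForbidden and the function parse_without_entities are defined locally for this example, and a maintained package such as defusedxml is the safer choice for real code:

```python
import xml.parsers.expat


class EntitiesForbidden(ValueError):
    """Raised when the document declares an entity in its DTD."""


def parse_without_entities(xml_text: str) -> list[str]:
    """Return the character data chunks; refuse documents that declare entities."""
    chunks: list[str] = []

    def reject_entity(name, is_parameter_entity, value, base,
                      system_id, public_id, notation_name):
        # expat calls this handler for every <!ENTITY ...> declaration.
        raise EntitiesForbidden(f"entity declaration not allowed: {name!r}")

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = reject_entity
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)  # True marks the end of the input
    return chunks
```

Parsing stops at the first entity declaration, so the cost of rejecting a bomb does not depend on how large its expansion would be. The price is that documents with harmless entity declarations are rejected too.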

Limiting the maximum decompression size was the fix for the HTTP/2 “HPACK” bomb (source: https://python-hyper.org/projects/hpack/en/latest/security/CVE-2016-6581.html#the-solution).
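
The same idea of capping the decompressed size can be shown with zlib from the standard library. To be clear, this is an analogy and not the HPACK fix itself; bounded_decompress and DecompressionTooLarge are hypothetical names for this sketch:

```python
import zlib


class DecompressionTooLarge(Exception):
    """Raised when the decompressed data would exceed the allowed size."""


def bounded_decompress(data: bytes, max_size: int) -> bytes:
    """Decompress zlib data, but never produce more than max_size bytes."""
    decompressor = zlib.decompressobj()
    # Ask for one byte more than allowed: getting it means the limit is exceeded.
    output = decompressor.decompress(data, max_size + 1)
    if len(output) > max_size:
        raise DecompressionTooLarge(f"output exceeds {max_size} bytes")
    return output


if __name__ == "__main__":
    bomb = zlib.compress(b"\0" * 10_000_000)
    print(len(bomb), "compressed bytes")
    try:
        bounded_decompress(bomb, max_size=1_000_000)
    except DecompressionTooLarge as error:
        print("rejected:", error)
```

The important detail is that the limit is enforced while decompressing, so the oversized output is never held in memory.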

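See also

Kate Murphy wrote an awesome article about git bombs, check it out: Exploding Git Repositories (https://kate.io/blog/git-bomb/)

What’s next?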

In this series about application security (AppSec), we have already covered some techniques of the attackers 😈 and of the defenders 😇:

Part 1: SQL Injections 😈
Part 2: Don’t leak Secrets 😇
Part 3: Cross-Site Scripting (XSS) 😈
Part 4: Password Hashing 😇
Part 5: ZIP Bombs 😈
Part 6: CAPTCHA 😇
Part 7: Email Spoofing 😈
Part 8: Software Composition Analysis (SCA) 😇
Part 9: XXE attacks 😈
Part 10: Effective Access Control 😇
Part 11: DOS via a Billion Laughs 😈

And this is still to come:

CSRF 😈
DOS 😈
Credential Stuffing 😈
Cryptojacking 😈
Single-Sign-On 😇
Two-Factor Authentication 😇
Backups 😇
Disk Encryption 😇

Let me know if you are interested in more articles about AppSec / InfoSec!