fails test test_billion_laughs #22

Open
jonassmedegaard opened this issue May 11, 2022 · 3 comments

@jonassmedegaard

Hi,

Building on a Debian system, the test suite currently fails like this:

======================================================================
FAIL: test_billion_laughs (html_sanitizer.tests.SanitizerTestCase) (before='<?xml version="1.0"?>\n<!DOCTYPE lolz [\n <!ENTITY lol "lol">\n <!ELEMENT lolz (#PCDATA)>\n <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">\n <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">\n <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">\n <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">\n <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">\n <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">\n <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">\n <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">\n <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">\n]>\n<lolz>&lol9;</lolz>\n', after='             ]&gt; &amp;lol9; ')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/python-html-sanitizer-1.9.3/.pybuild/cpython3_3.10/build/html_sanitizer/tests.py", line 15, in run_tests
    self.assertEqual(
AssertionError: '&lt;!ELEMENT lolz (#PCDATA)&gt; &lt;!ENTI[1109 chars]ol9;' != ']&gt; &amp;lol9;'
Diff is 1177 characters long. Set self.maxDiff to None to see it. : Cleaning 'b'<?xml version="1.0"?>\\n<!DOCTYPE lolz [\\n <!ENTITY lol "lol">\\n <!ELEMENT lolz (#PCDATA)>\\n <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">\\n <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">\\n <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">\\n <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">\\n <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">\\n <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">\\n <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">\\n <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">\\n <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">\\n]>\\n<lolz>&lol9;</lolz>\\n'', expected 'b'             ]&gt; &amp;lol9; '' but got 'b'  &lt;!ELEMENT lolz (#PCDATA)&gt; &lt;!ENTITY lol1 "&amp;lol;&amp;lol;&amp;lol;&amp;lol;&amp;lol;&amp;lol;&amp;lol;&amp;lol;&amp;lol;&amp;lol;"&gt; &lt;!ENTITY lol2 "&amp;lol1;&amp;lol1;&amp;lol1;&amp;lol1;&amp;lol1;&amp;lol1;&amp;lol1;&amp;lol1;&amp;lol1;&amp;lol1;"&gt; &lt;!ENTITY lol3 "&amp;lol2;&amp;lol2;&amp;lol2;&amp;lol2;&amp;lol2;&amp;lol2;&amp;lol2;&amp;lol2;&amp;lol2;&amp;lol2;"&gt; &lt;!ENTITY lol4 "&amp;lol3;&amp;lol3;&amp;lol3;&amp;lol3;&amp;lol3;&amp;lol3;&amp;lol3;&amp;lol3;&amp;lol3;&amp;lol3;"&gt; &lt;!ENTITY lol5 "&amp;lol4;&amp;lol4;&amp;lol4;&amp;lol4;&amp;lol4;&amp;lol4;&amp;lol4;&amp;lol4;&amp;lol4;&amp;lol4;"&gt; &lt;!ENTITY lol6 "&amp;lol5;&amp;lol5;&amp;lol5;&amp;lol5;&amp;lol5;&amp;lol5;&amp;lol5;&amp;lol5;&amp;lol5;&amp;lol5;"&gt; &lt;!ENTITY lol7 "&amp;lol6;&amp;lol6;&amp;lol6;&amp;lol6;&amp;lol6;&amp;lol6;&amp;lol6;&amp;lol6;&amp;lol6;&amp;lol6;"&gt; &lt;!ENTITY lol8 "&amp;lol7;&amp;lol7;&amp;lol7;&amp;lol7;&amp;lol7;&amp;lol7;&amp;lol7;&amp;lol7;&amp;lol7;&amp;lol7;"&gt; 
&lt;!ENTITY lol9 "&amp;lol8;&amp;lol8;&amp;lol8;&amp;lol8;&amp;lol8;&amp;lol8;&amp;lol8;&amp;lol8;&amp;lol8;&amp;lol8;"&gt; ]&gt; &amp;lol9; ''
======================================================================
FAIL: test_billion_laughs (html_sanitizer.tests.SanitizerTestCase) (before=' <?xml version="1.0"?>\n  <!DOCTYPE foo [\n   <!ELEMENT foo ANY >\n   <!ENTITY xxe SYSTEM "file:///dev/random" >]><foo>&xxe;</foo>\n', after='    ]&gt;&amp;xxe; ')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/python-html-sanitizer-1.9.3/.pybuild/cpython3_3.10/build/html_sanitizer/tests.py", line 15, in run_tests
    self.assertEqual(
AssertionError: '&lt;!ENTITY xxe SYSTEM "file:///dev/random" &gt;]&gt;&amp;xxe;' != ']&gt;&amp;xxe;'
- &lt;!ENTITY xxe SYSTEM "file:///dev/random" &gt;]&gt;&amp;xxe;
+ ]&gt;&amp;xxe;
 : Cleaning 'b' <?xml version="1.0"?>\\n  <!DOCTYPE foo [\\n   <!ELEMENT foo ANY >\\n   <!ENTITY xxe SYSTEM "file:///dev/random" >]><foo>&xxe;</foo>\\n'', expected 'b'    ]&gt;&amp;xxe; '' but got 'b'   &lt;!ENTITY xxe SYSTEM "file:///dev/random" &gt;]&gt;&amp;xxe; ''

----------------------------------------------------------------------
Ran 28 tests in 0.041s

FAILED (failures=2)
E: pybuild pybuild:369: test: plugin distutils failed with: exit code=1: cd /build/python-html-sanitizer-1.9.3/.pybuild/cpython3_3.10/build; python3.10 -m unittest discover -v

The build succeeded in January. Notable changes since then:

  • Python upgraded from 3.9.9 to 3.10.4
  • lxml upgraded from 4.6.5 to 4.8.0
  • bs4 upgraded from 4.10.0 to 4.11.1
@matthiask
Owner

Thanks! Nice, finally a reproduction. I always wondered why I couldn't reproduce this, since the code shouldn't be safe against it.

Annoyingly enough, I still cannot reproduce this locally using Python 3.10.4, beautifulsoup4==4.11.1, and lxml==4.8.0 (Ubuntu, both native and WSL2).

Do you have any idea which version of libxml2 you're using? It seems I'm using 2.9.12 here. You may be able to find this information in .../lib/python3.10/site-packages/lxml/includes/libxml/xmlversion.h
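
A minimal sketch of two ways to look this up, in case it helps. The helper names are hypothetical. The first helper assumes the header contains a `#define LIBXML_DOTTED_VERSION "..."` line, which libxml2's xmlversion.h normally does; the second relies on `lxml.etree.LIBXML_VERSION`, a version tuple exposed by lxml, and returns None when lxml is not installed. The sample header text is made up for illustration.

```python
import re


def libxml2_version_from_header(header_text):
    """Extract the dotted libxml2 version from the contents of xmlversion.h."""
    match = re.search(
        r'#define\s+LIBXML_DOTTED_VERSION\s+"([^"]+)"', header_text
    )
    return match.group(1) if match else None


def libxml2_version_from_lxml():
    """Return the libxml2 version lxml is running against, or None without lxml."""
    try:
        from lxml import etree
    except ImportError:
        return None
    # LIBXML_VERSION is a tuple of integers such as (2, 9, 14).
    return ".".join(str(part) for part in etree.LIBXML_VERSION)


# Illustrative header excerpt, not copied from a real installation.
sample_header = (
    '#define LIBXML_DOTTED_VERSION "2.9.14"\n'
    "#define LIBXML_VERSION 20914\n"
)
print(libxml2_version_from_header(sample_header))
print(libxml2_version_from_lxml())
```

Note that the bundled header reflects the version lxml was compiled against, which may differ from the library actually loaded at runtime.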

@jonassmedegaard
Author

Do you have any idea which version of libxml2 you're using?

Sure: I build in a clean Debian sid system, so that's currently 2.9.14.

@matthiask
Owner

I have been searching for a way to specify resolve_entities=False somehow, but didn't find one.

The library doesn't really seem to be vulnerable to the billion laughs attack, since the diff is only 1177 characters long: the entity declarations come back escaped rather than expanded.
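
For comparison, here is a rough sketch of how large the output would be if the entities from the test input were actually expanded. It only does the arithmetic and never builds the expanded string. The function name and the regular expression are illustrative assumptions, and the recursion assumes the entity definitions are acyclic, as they are in this test case.

```python
import re

# Entity definitions mirroring the failing test input: lol1..lol9 each
# reference the previous entity ten times.
ENTITIES = {"lol": "lol"}
for level in range(1, 10):
    previous = "lol" if level == 1 else f"lol{level - 1}"
    ENTITIES[f"lol{level}"] = f"&{previous};" * 10

ENTITY_REF = re.compile(r"&(\w+);")


def expanded_length(name, entities, cache=None):
    """Return the length of a fully expanded entity without building it."""
    if cache is None:
        cache = {}
    if name not in cache:
        value = entities[name]
        # Literal characters left once the entity references are removed.
        literal = len(ENTITY_REF.sub("", value))
        # Each reference contributes the expanded length of its target.
        nested = sum(
            expanded_length(ref, entities, cache)
            for ref in ENTITY_REF.findall(value)
        )
        cache[name] = literal + nested
    return cache[name]


print(expanded_length("lol9", ENTITIES))  # 3000000000
```

A full expansion of &lol9; would be 3 × 10^9 characters, so a diff of 1177 characters is consistent with no expansion having taken place.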

I'm unsure whether we should simply trust the protections implemented in libxml2 and lxml and remove the offending test. That doesn't seem like a good solution to me, though...
