Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove backtracking from parsing tarfile members and headers #121285

Closed
sethmlarson opened this issue Jul 2, 2024 · 5 comments
Closed

Remove backtracking from parsing tarfile members and headers #121285

sethmlarson opened this issue Jul 2, 2024 · 5 comments
Assignees
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.13 bugs and security fixes stdlib Python modules in the Lib dir type-security A security issue

Comments

@sethmlarson
Copy link
Contributor

sethmlarson commented Jul 2, 2024

Bug description:

Today the tarfile module parsing of header values allows for backtracking when parsing header values. Headers have a well-known format that doesn't require backtracking to parse reliably, the new method of parsing will only require a single pass over a byte stream.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

@sethmlarson sethmlarson added the type-security A security issue label Jul 2, 2024
gpshead added a commit that referenced this issue Aug 31, 2024
* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 31, 2024
…thonGH-121286)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 31, 2024
…thonGH-121286)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
@gpshead
Copy link
Member

gpshead commented Aug 31, 2024

main version merged. I'll let you work out any non-automatic backports.

gpshead added a commit that referenced this issue Aug 31, 2024
…H-121286) (GH-123543)

gh-121285: Remove backtracking when parsing tarfile headers (GH-121286)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
@picnixz picnixz added stdlib Python modules in the Lib dir 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes 3.8 (EOL) end of life 3.13 bugs and security fixes labels Sep 1, 2024
@picnixz
Copy link
Member

picnixz commented Sep 1, 2024

(I just added the labels that still require the backports and you could remove them one by one when you're done)

@gpshead gpshead moved this to In Progress in Tarfile issues Sep 1, 2024
Yhg1s pushed a commit that referenced this issue Sep 2, 2024
…H-121286) (#123542)

gh-121285: Remove backtracking when parsing tarfile headers (GH-121286)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
sethmlarson added a commit to sethmlarson/cpython that referenced this issue Sep 3, 2024
…ers (pythonGH-121286)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
sethmlarson added a commit to sethmlarson/cpython that referenced this issue Sep 3, 2024
…ers (pythonGH-121286)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
sethmlarson added a commit to sethmlarson/cpython that referenced this issue Sep 3, 2024
…rs (pythonGH-121286)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
sethmlarson added a commit to sethmlarson/cpython that referenced this issue Sep 3, 2024
…rs (pythonGH-121286)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
pablogsal pushed a commit that referenced this issue Sep 3, 2024
…H-121286) (#123639)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
pablogsal pushed a commit that referenced this issue Sep 3, 2024
…H-121286) (#123640)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
@frenzymadness
Copy link
Contributor

The CVE description contains:

Regular expressions that allowed excessive backtracking during tarfile.TarFile header parsing are vulnerable to ReDoS via specifically-crafted tar archives.

but the test added to verify the fix does not cause/test the ReDoS. I mean, when you run the test with unpatched Python, it raises tarfile.ReadError as well just with a different message, for example:

tarfile.ReadError: file could not be opened successfully:
- method gz: ReadError('not a gzip file')
- method bz2: ReadError('not a bzip2 file')
- method xz: ReadError('not an lzma file')
- method tar: ReadError('bad checksum')

Could you please help me understand how one of the regexes (fox example: br"(\d+) ([^=]+)=") can cause a regex backtracking? There is no combination of greedy nested quatifiers, greedy quantifiers are limited by the character class and I don't see a need for the regex algorithm to go back and consider another path if the match is not found.

@sethmlarson
Copy link
Contributor Author

@frenzymadness Apologies for the confusion, not all of the regular expressions in the PR have issues with backtracking. The regex you mention was replaced to process the entire tar header block in one pass rather than multiple passes.

ambv pushed a commit that referenced this issue Sep 4, 2024
…-121286) (#123642)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
ambv pushed a commit that referenced this issue Sep 4, 2024
…-121286) (#123641)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
@ambv ambv closed this as completed Sep 4, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Tarfile issues Sep 4, 2024
frenzymadness pushed a commit to frenzymadness/cpython that referenced this issue Sep 5, 2024
…e headers (pythonGH-121286) (python#123642)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
frenzymadness added a commit to frenzymadness/cpython that referenced this issue Sep 5, 2024
…e headers (pythonGH-121286) (python#123642)

* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Lumír Balhar <lbalhar@redhat.com>
@mcepl
Copy link
Contributor

mcepl commented Sep 18, 2024

Patch for Python 3.8 applies more or less painlessly on the 3.6 branch as well.

mcepl pushed a commit to openSUSE-Python/cpython that referenced this issue Oct 9, 2024
* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

(cherry picked from commit 34ddb64)

Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Fixes: bsc#1230227 (CVE-2024-6232)
Fixes: gh#python#121285
From-PR: gh#python/cpython!123642
Patch: CVE-2024-6232-ReDOS-backtrack-tarfile.patch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.13 bugs and security fixes stdlib Python modules in the Lib dir type-security A security issue
Projects
Status: Done
Development

No branches or pull requests

6 participants