Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leaked file object with gzip compression #630

Closed
ampanasiuk opened this issue Jul 29, 2021 · 2 comments · Fixed by #636
Closed

Leaked file object with gzip compression #630

ampanasiuk opened this issue Jul 29, 2021 · 2 comments · Fixed by #636
Labels

Comments

@ampanasiuk
Copy link
Contributor

ampanasiuk commented Jul 29, 2021

Problem description

I am getting a warning that a file object is not closed when doing something like

from smart_open import open


def fun():
    with open("my_file.gz", encoding="utf-8") as f:
        print(f.read())


fun()
gc.collect()

This is because closing a GzipFile instance does not close the underlying stream. I didn't check whether the same applies to bz2.BZ2File and the support for .bz2 in smart_open.

https://github.com/RaRe-Technologies/smart_open/blob/develop/smart_open/compression.py

def _handle_gzip(file_obj, mode):
    import gzip
    return gzip.GzipFile(fileobj=file_obj, mode=mode)

https://docs.python.org/3/library/gzip.html#gzip.GzipFile

Calling a GzipFile object’s close() method does not close fileobj, since you might wish to append more material after the compressed data. This also allows you to pass an io.BytesIO object opened for writing as fileobj, and retrieve the resulting memory buffer using the io.BytesIO object’s getvalue() method.

@mpenkov mpenkov added the bug label Jul 30, 2021
@mpenkov
Copy link
Collaborator

mpenkov commented Jul 30, 2021

Good catch! Are you interested in making a PR to fix this?

@ampanasiuk
Copy link
Contributor Author

Yea, I'll look at this next week.

ampanasiuk added a commit to ampanasiuk/smart_open that referenced this issue Aug 4, 2021
* Don't leak compressed stream

Transparent compression works by adapting the binary stream with the
GzipFile or BZ2File class. These, however, by design don't close the
adapted stream. Add a matching
CompressionFormatTest.test_closes_compressed_stream test.
ampanasiuk added a commit to ampanasiuk/smart_open that referenced this issue Aug 4, 2021
ampanasiuk added a commit to ampanasiuk/smart_open that referenced this issue Aug 5, 2021
* Don't leak compressed stream

Transparent compression works by adapting the binary stream with the
GzipFile or BZ2File class. These, however, by design don't close the
adapted stream. Add a matching
CompressionFormatTest.test_closes_compressed_stream test.
ampanasiuk added a commit to ampanasiuk/smart_open that referenced this issue Aug 5, 2021
* Don't leak compressed stream

Transparent compression works by adapting the binary stream with the
GzipFile or BZ2File class. These, however, by design don't close the
adapted stream. Add a matching
CompressionFormatTest.test_closes_compressed_stream test.
ampanasiuk added a commit to ampanasiuk/smart_open that referenced this issue Aug 5, 2021
mpenkov added a commit that referenced this issue Aug 18, 2021
* Don't leak compressed stream (#630)

* Don't leak compressed stream

Transparent compression works by adapting the binary stream with the
GzipFile or BZ2File class. These, however, by design don't close the
adapted stream. Add a matching
CompressionFormatTest.test_closes_compressed_stream test.

* fixup! Don't leak compressed stream (#630)

* add docstring, improve readability

* update CHANGELOG.md

Co-authored-by: Michael Penkov <misha.penkov@gmail.com>
Co-authored-by: Michael Penkov <m@penkov.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants