-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix pandas compatibility with Python installations lacking bzip2 headers #53858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the PR. Can you add a test? |
In principle, yes, but because The current |
Maybe we can just check that |
Do you mean rather than catching the |
I meant for the test. Everything else looks good. I was thinking something like this.
(except for the last line, you would use an assert I think). Does this work? |
Ah, I understand. That sounds like it should work. Where should I put that test so as to not interfere with those that require bz2? |
I think any one of the test files that doesn't already explictly import bz2 with ( |
It looks like that one has a compression test: @pytest.mark.parametrize("encoding", ["utf-16", "utf-32"])
@pytest.mark.parametrize("compression_", ["bz2", "xz"])
def test_warning_missing_utf_bom(self, encoding, compression_):
"""
bz2 and xz do not write the byte order mark (BOM) for utf-16/32.
https://stackoverflow.com/questions/55171439
GH 35681
""" So I guess it would break that test? I'll do the following for a test FYI: ...
sys.modules.pop("bz2", None) # Remove 'bz2' from available modules for testing
import pandas as pd
...
from pandas.compat import get_bz2_file
...
def test_bz2_nonimport():
assert "bz2" not in sys.modules
msg = "bz2 module not available."
with pytest.raises(RuntimeError, match=msg):
get_bz2_file() |
You can try That should probably be safer. |
I think you're missing the first test now, which is to check that bz2 is not in Can you add that test (similar to how you're doing the other one now)? |
Sorry I'm not sure I understand. Wouldn't bz2 always be in sys.modules in this test? We are just setting it to |
Popping bz2 from the modules does not trigger an import error, if that is the point of confusion: [ins] In [1]: import sys
[ins] In [2]: sys.modules.pop('bz2', None)
Out[2]: <module 'bz2' from '/Users/mcranmer/mambaforge/envs/pandas-dev/lib/python3.10/bz2.py'>
[ins] In [3]: sys.modules['bz2']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[3], line 1
----> 1 sys.modules['bz2']
KeyError: 'bz2'
[ins] In [4]: import pandas as pd
[nav] In [5]: sys.modules['bz2']
Out[5]: <module 'bz2' from '/Users/mcranmer/mambaforge/envs/pandas-dev/lib/python3.10/bz2.py'> Only |
To clarify, I meant to check that bz2 is not in To avoid messing with the other tests, you should launch a new Python process like the other test. The way the import error test is done now is fine. |
This is what I showed in the example above. Even after popping it, it reappears in |
This comment was marked as resolved.
This comment was marked as resolved.
No, bz2 is still imported when loading pandas, but now it only happens when bz2 is installed. I did it this way to be consistent with the other optional libraries in pandas (e.g., lzma), but I could change it; let me know. What happens at runtime now is that it checks whether bz2 was loaded or not before importing the BZ2File class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @MilesCranmer |
… headers (pandas-dev#53858) * BUG: Make bz2 import optional * CLN: Create `get_bz2_file` to match `get_lzma_file` * DOC: Add bz2 bugfix to changelog * TST: Test bz2 non-import works * TST: Test bz2 non-import from subprocess * TST: Fix bz2 non-import test * TST: Fix indentation issues in bz2 import test * MAINT: Clean up merge commit * Mark bz2 missing test with `single_cpu`
This simple change makes the
bz2
import optional. Some Python installations do not contain bzip2 headers, so unless you are working with bzip compression, there is no reason for pandas to eagerly load (and require) abz2
import.