add support for local file ignore_extension #173

Closed
wants to merge 1 commit into from

shmuelamar

Fix for #172. A few notes:

  1. I'll add tests if everything else looks fine.
  2. I used the ClosingGzipFile class for S3 instead of plain GzipFile. From reviewing the Python 3 code for GzipFile, this does not seem to change behaviour, but it is worth reviewing in case it does (it might even be better). LMK if you want me to use the plain stdlib GzipFile for S3 and I'll change the implementation.
  3. It would be nice to add ignore_extension to the docs; to find it in the first place I had to dig into the source.
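
For readers skimming this thread, the behaviour under discussion can be sketched with the standard library alone. The helper name `open_maybe_compressed` is hypothetical and this is not smart_open's actual implementation; it only illustrates picking a decompressor from the file extension, with a flag to opt out.

```python
import bz2
import gzip
import os
import tempfile


def open_maybe_compressed(path, mode="rb", ignore_extension=False):
    """Hypothetical helper (not smart_open's code): pick a decompressor from the extension."""
    if not ignore_extension:
        _, ext = os.path.splitext(path)
        if ext == ".gz":
            return gzip.open(path, mode)
        if ext == ".bz2":
            return bz2.open(path, mode)
    # The caller opted out, or the extension is not recognised: return the raw bytes.
    return open(path, mode)


# Write a small gzip file into a temporary directory for the demonstration.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "test.txt.gz")
with gzip.open(gz_path, "wb") as fout:
    fout.write(b"hello world\n")

with open_maybe_compressed(gz_path) as fin:
    decompressed = fin.read()

with open_maybe_compressed(gz_path, ignore_extension=True) as fin:
    raw = fin.read()

print(decompressed)
print(raw[:2])  # gzip streams begin with the magic bytes 0x1f 0x8b
```

With the flag set, the caller gets the compressed bytes unchanged, which is useful when the file is only being copied or checksummed.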

Contributor

@menshikh-iv menshikh-iv left a comment

Good work @shmuelamar!

Looks good to me, but a few changes are needed:

  • Add test for case when ignore_extension=True
  • (Maybe?) Revert current changes for tests (with explicit ignore_extension=False)
  • Add information about this feature to README.md with example

CC: @mpenkov can you review too please

     _, ext = os.path.splitext(filename)
     if ext == '.bz2':
         return ClosingBZ2File(file_obj, mode)
-    elif ext == '.gz':
+    if ext == '.gz':
Contributor

why this change?

@@ -292,29 +292,29 @@ def test_file(self, mock_smart_open):
         smart_open_object = smart_open.smart_open(prefix+full_path, read_mode)
         smart_open_object.__iter__()
         # called with the correct path?
-        mock_smart_open.assert_called_with(full_path, read_mode, encoding=None, errors='strict')
+        mock_smart_open.assert_called_with(full_path, read_mode, encoding=None, errors='strict', ignore_extension=False)
Contributor

We also need a test for this feature (with ignore_extension=True).
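
A minimal sketch of the kind of test being requested, using `unittest.mock`. The `dispatch` function and its `opener` argument are hypothetical stand-ins for the library's front end and the patched low-level opener; the point is only that `assert_called_with` compares keyword arguments exactly, so a forwarded `ignore_extension` must appear in the expectation.

```python
from unittest import mock


def dispatch(opener, path, mode="rb", ignore_extension=False):
    """Hypothetical front end: always forwards ignore_extension to the opener."""
    return opener(path, mode, encoding=None, errors="strict",
                  ignore_extension=ignore_extension)


# Default case: the flag is still forwarded, so the assertion has to name it.
opener = mock.Mock()
dispatch(opener, "/tmp/test.txt.gz")
opener.assert_called_with("/tmp/test.txt.gz", "rb", encoding=None,
                          errors="strict", ignore_extension=False)

# Opt-out case: the flag reaches the opener as True.
opener = mock.Mock()
dispatch(opener, "/tmp/test.txt.gz", ignore_extension=True)
opener.assert_called_with("/tmp/test.txt.gz", "rb", encoding=None,
                          errors="strict", ignore_extension=True)

# An expectation without the keyword no longer matches the recorded call.
try:
    opener.assert_called_with("/tmp/test.txt.gz", "rb",
                              encoding=None, errors="strict")
    old_assertion_matches = True
except AssertionError:
    old_assertion_matches = False

print(old_assertion_matches)
```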

@menshikh-iv
Contributor

Ping @shmuelamar, are you planning to finish this PR?

@shmuelamar
Author

Yes @menshikh-iv, I'll try to finish it this week; it's been a busy week.

@shmuelamar force-pushed the master branch 2 times, most recently from 080fcee to a46e0e8 (April 1, 2018 19:47)
@shmuelamar
Author

shmuelamar commented Apr 1, 2018

@menshikh-iv I've just finished the tests and reverted the if/else change.
I also added a README example.

Regarding reverting the current changes to the tests (the explicit ignore_extension): I don't think that is possible without changing the implementation itself, because file_smart_open() accepts no kwargs, so I can't do something like kwargs.pop('ignore_extension', False).

I'm open to suggestions; please LMK if I should do anything else.

thanks :) shmulik

Contributor

@menshikh-iv menshikh-iv left a comment

Looks good to me (just a few questions). @mpenkov, please have a look too.

@@ -215,7 +219,7 @@ def smart_open(uri, mode="rb", **kw):
         raise TypeError('don\'t know how to handle uri %s' % repr(uri))


-def s3_open_uri(parsed_uri, mode, **kwargs):
+def s3_open_uri(parsed_uri, mode, ignore_extension=False, **kwargs):
Contributor

Should ignore_extension be an explicit parameter, or should it be passed through kwargs? Same question for s3_open_key. @mpenkov, thoughts?
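
The two options in this question can be compared side by side. Both functions below are hypothetical illustrations, not the signatures used in smart_open; they return a plain dict so the effect of each style is easy to inspect.

```python
import inspect


def open_with_explicit_flag(uri, mode="rb", ignore_extension=False, **kwargs):
    """Hypothetical: the flag is a named parameter, visible in the signature."""
    return {"uri": uri, "mode": mode,
            "ignore_extension": ignore_extension, "extra": kwargs}


def open_with_popped_flag(uri, mode="rb", **kwargs):
    """Hypothetical: the flag is pulled out of **kwargs with a default."""
    ignore_extension = kwargs.pop("ignore_extension", False)
    return {"uri": uri, "mode": mode,
            "ignore_extension": ignore_extension, "extra": kwargs}


explicit = open_with_explicit_flag("s3://bucket/key.gz", ignore_extension=True, profile="dev")
popped = open_with_popped_flag("s3://bucket/key.gz", ignore_extension=True, profile="dev")

# Callers see the same behaviour either way...
print(explicit == popped)

# ...but only the explicit form documents the flag in its signature.
print("ignore_extension" in inspect.signature(open_with_explicit_flag).parameters)
print("ignore_extension" in inspect.signature(open_with_popped_flag).parameters)
```

The explicit form is easier to discover through `help()` and IDE tooling; the popped form keeps the signature stable but hides the option, and it only works when the function accepts `**kwargs` in the first place.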

-        mock_smart_open.assert_called_with(full_path, read_mode, encoding=None, errors='strict')
+        mock_smart_open.assert_called_with(full_path, read_mode, encoding=None, errors='strict', ignore_extension=False)

     @mock.patch('smart_open.smart_open_lib.file_smart_open')
Contributor

Should this be a maybe_.. decorator instead?

@shmuelamar
Author

Hey @menshikh-iv @mpenkov, please LMK how I can help get this PR merged. Is there anything else to fix or answer?

@menshikh-iv
Contributor

@shmuelamar we'll merge #185 first and then this PR, don't worry.

@menshikh-iv
Contributor

@shmuelamar can you rebase onto the current master, please?

@mpenkov
Collaborator

mpenkov commented Apr 15, 2018

The current master handles ignore_extension across all interfaces. It's now in one place: https://github.com/RaRe-Technologies/smart_open/blob/master/smart_open/smart_open_lib.py#L195

@shmuelamar please have a look there, it should help with your rebase.

@menshikh-iv
Contributor

Ping @shmuelamar, what's the status of this PR?

@menshikh-iv
Contributor

Ping @shmuelamar, are you planning to finish this PR?

@mpenkov
Collaborator

mpenkov commented Feb 26, 2019

Abandoned.

Furthermore, the current code seems to include the feature mentioned in this PR.

$ python -c 'import smart_open as s;print(s.smart_open("test.txt.gz").read())'
b'hello world\n'
$ python -c 'import smart_open as s;print(s.smart_open("test.txt.gz", ignore_extension=True).read())'
b'\x1f\x8b\x08\x08\xd8\xdbt\\\x00\x03test.txt\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\xe1\x02\x00-;\x08\xaf\x0c\x00\x00\x00'
$
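
The raw bytes printed by the second command are an ordinary gzip stream, which can be checked with the standard library. This is a sketch for the reader rather than part of the original comment; the byte string is copied from the output above, and the field offsets follow the gzip header layout (magic, method, flags, mtime).

```python
import gzip
import struct

# Bytes copied verbatim from the ignore_extension=True output above.
raw = b'\x1f\x8b\x08\x08\xd8\xdbt\\\x00\x03test.txt\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\xe1\x02\x00-;\x08\xaf\x0c\x00\x00\x00'

magic = raw[:2]                               # gzip magic number
method = raw[2]                               # 8 means DEFLATE
flags = raw[3]                                # bit 0x08 means an original file name follows
mtime = struct.unpack("<I", raw[4:8])[0]      # little-endian Unix timestamp

# The file name is NUL-terminated and starts after the 10-byte fixed header.
name_end = raw.index(b"\x00", 10)
original_name = raw[10:name_end].decode("latin-1")

payload = gzip.decompress(raw)

print(magic, method, bool(flags & 0x08), original_name, mtime)
print(payload)
```

Decompressing the raw bytes should give back the same content that the first command printed, confirming that `ignore_extension=True` only skips the decompression step.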

@mpenkov mpenkov closed this Feb 26, 2019