Add pathlib monkeypatch with replacement of pathlib.Path.open
#436
Looks good to me! Merged. Thank you for your contribution, @menshikh-iv!
Drop-in replacement of ``pathlib.Path.open``
--------------------------------------------

Now you can natively use ``smart_open.open`` with your ``Path`` objects.
What is "now" referring to? @mpenkov let's clean this up a bit. How about:
``smart_open.open`` can also be used with ``Path`` objects. The built-in `Path.open()` is not able to read text from compressed files, so use ``patch_pathlib`` to replace it with `smart_open.open()` instead:
.. code-block:: python
>>> from pathlib import Path
>>> from smart_open.smart_open_lib import patch_pathlib
>>>
>>> patch_pathlib() # replace `Path.open` with `smart_open.open`.
>>>
>>> path = Path("smart_open/tests/test_data/crime-and-punishment.txt.gz")
>>>
>>> with path.open("r") as infile:
...     print(infile.readline()[:41])
В начале июля, в чрезвычайно жаркое время
>>> # You can also use the patch as a context manager, to automatically restore the original ``Path.open()`` at the end:
>>> with patch_pathlib():
...     with Path("smart_open/tests/test_data/crime-and-punishment.txt.gz").open("r") as infile:
...         print(infile.readline()[:41])
OK with the first part, but I'm definitely against the context-manager usage in this case, because:
your variant
with patch_pathlib():
    with Path("smart_open/tests/test_data/crime-and-punishment.txt.gz").open("r") as infile:
        print(infile.readline()[:41])
how I'd do that
with smart_open("smart_open/tests/test_data/crime-and-punishment.txt.gz") as infile:
    print(infile.readline()[:41])
shorter and simpler (and nothing changes if you replace str with Path here)
That's true.
pathlib = sys.modules.get("pathlib", None)

if not pathlib:
    raise RuntimeError("Can't patch 'pathlib.Path.open', you should import 'pathlib' first")
Why not import it ourselves?
We shouldn't do that ourselves; it's bad form.
BTW, it doesn't help the user either, because the user won't have the Path class available even if we import pathlib ourselves.
I don't follow – imported modules are singletons, it doesn't matter who imported it.
That's true, but the user has no access to pathlib itself (without digging into sys.modules), for example:
In [1]: def f():
   ...:     import pathlib
...:
In [2]: pathlib
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-7bc182bbe5df> in <module>
----> 1 pathlib
NameError: name 'pathlib' is not defined
I expect users to UNDERSTAND what they are doing. For this reason, I expect the user to import pathlib first and apply the monkeypatch afterwards.
I still don't get it. What's the point of failing with "import pathlib first!", when we can import it ourselves?
…plus pathlib seems to be already pre-imported internally:
>>> import sys
>>> sys.modules.get("pathlib", None)
<module 'pathlib' from '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pathlib.py'>
So when can this if condition actually happen? Is it some Python version compatibility thing?
If so, it needs a better comment and a different exception message.
In my environment:
(smart_open) sergeyich:smart_open misha$ python
Python 3.7.6 (default, Dec 30 2019, 19:38:28)
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.modules.get('pathlib', 'not there')
'not there'
However, smart_open already imports pathlib by itself: https://github.com/RaRe-Technologies/smart_open/blob/master/smart_open/smart_open_lib.py#L51
So, given that we're already doing it, it's probably simpler to patch it directly.
Ok @mpenkov, if we do that anyway, I agree.
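The two points made in this thread can be checked with a short, self-contained sketch: a function-local import does not bind the name for the caller, yet the module object is cached in `sys.modules`, so a patch applied through that local import is visible to everyone. The `patched_by_library` attribute is a hypothetical marker used only for this illustration; it is not part of pathlib or smart_open.

```python
import sys


def library_side_patch():
    # The import binds the name "pathlib" only inside this function...
    import pathlib
    # ...but the module object is shared, so this attribute is visible everywhere.
    pathlib.Path.patched_by_library = True  # hypothetical marker attribute
    return pathlib


module_seen_by_library = library_side_patch()

# "User" code: a later import returns the cached module from sys.modules.
import pathlib

same_module = module_seen_by_library is pathlib is sys.modules["pathlib"]
user_sees_patch = getattr(pathlib.Path, "patched_by_library", False)

del pathlib.Path.patched_by_library  # undo the demonstration patch
```

Under these assumptions, requiring the user to import pathlib before patching gains nothing: the library can import the module itself and patch the shared class directly.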
lines = infile.readlines()

_patch_pathlib(obj.old_impl)
Can you add a test for the context manager? That should be the preferred way of restoring the original, instead of calling _patch_pathlib(obj.old_impl).
I added the context manager for testing purposes; I see no "real usage" cases for a context manager here.
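A test along the lines requested above might look like the following sketch. It does not use the PR's actual helpers: `patched_open` is a hypothetical context manager written with `contextlib` so that the example runs on the standard library alone, and `fake_open` is a dummy replacement.

```python
import contextlib
import pathlib
import unittest


@contextlib.contextmanager
def patched_open(new_open):
    """Hypothetical helper: swap Path.open and always restore it afterwards."""
    old_impl = pathlib.Path.open
    pathlib.Path.open = new_open
    try:
        yield old_impl
    finally:
        pathlib.Path.open = old_impl


def fake_open(self, *args, **kwargs):
    # Dummy replacement so the test can tell which implementation ran.
    return "patched"


class PatchPathlibTest(unittest.TestCase):
    def test_context_manager_restores_original(self):
        original = pathlib.Path.open
        with patched_open(fake_open) as old_impl:
            self.assertIs(old_impl, original)
            self.assertEqual(pathlib.Path("anything").open(), "patched")
        self.assertIs(pathlib.Path.open, original)

    def test_original_restored_after_exception(self):
        original = pathlib.Path.open
        with self.assertRaises(ValueError):
            with patched_open(fake_open):
                raise ValueError("boom")
        self.assertIs(pathlib.Path.open, original)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PatchPathlibTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test covers the case that a manual restore call after the test body would miss: if the body raises, the patch would otherwise leak into later tests.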
Sorry for the late review, I see this was already merged. Needs a little language + API clean up.
@mpenkov the main
@menshikh-iv @piskvorky Please see #437, I've incorporated the post-merge comments into there.
Problem
If you use pathlib.Path instead of str in your code, you should use the internal open method. Of course, you can pass Path objects to smart_open, but this doesn't look "good enough".
Solution
I implemented a "monkeypatch" function that replaces pathlib.Path.open with smart_open.open.
Example