Unable to read from io.StringIO #570
version 4.0.1 |
I briefly looked into this because one of the tests dealing with it failed. If you want to support this I think the only way is to explicitly check for |
I don't like the idea of an explicit check. It's too narrow. How about a fin.peek(0) to see what's in the actual stream? If it's text, we can't do anything other than return it as is. |
I don't think StringIO has a In general I'm not sure accepting file objects even makes sense. Why would you call smart_open or even the built in open if you already have a file object? I can see the point of having StringIO/BytesIO work for being able to mock something in memory but beyond that I'm not convinced. Supporting StringIO also breaks the concept of smart_open always treating the underlying file as a byte stream which complicates things and may set you up for quite a bit of maintenance pain in the future. |
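For reference, a minimal sketch of telling a text stream from a binary one without an explicit StringIO check. In the standard library, `peek` exists on `io.BufferedReader` but not on `io.StringIO`, so the sketch uses a zero-length `read` instead. `is_text_stream` is a hypothetical helper written for illustration, not part of smart_open.

```python
import io


def is_text_stream(fin):
    """Hypothetical helper: True if reading from fin yields str, not bytes."""
    # A zero-length read consumes nothing from the stream, but the type of
    # the empty result shows whether the stream is text ('') or binary (b'').
    return isinstance(fin.read(0), str)


text_stream = io.StringIO("abc")
binary_stream = io.BytesIO(b"abc")
buffered = io.BufferedReader(io.BytesIO(b"abc"))

# peek is only available on the buffered binary reader.
has_peek = {
    "StringIO": hasattr(text_stream, "peek"),
    "BufferedReader": hasattr(buffered, "peek"),
}

text_result = is_text_stream(text_stream)
binary_result = is_text_stream(binary_stream)

# The probe leaves the stream position untouched.
remaining = text_stream.read()
```

This only detects the stream type. It does not resolve whether a text stream should be accepted at all, which is the open question in this thread.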
@markopy You make valid points.
@piskvorky WDYT? What was the motivation of allowing smart_open to accept a file object as input? It breaks symmetry with the built-in function.
>>> open(open('setup.py'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected str, bytes or os.PathLike object, not _io.TextIOWrapper
Perhaps we should do exactly as the built-in does, and accept the above three classes as inputs exclusively? |
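A rough sketch of what such a strict check could look like, assuming the goal is to mirror the error raised by the built-in `open`. `check_openable` is a hypothetical name used only for illustration. It is not smart_open's actual implementation.

```python
import io
import os
import pathlib


def check_openable(uri):
    """Hypothetical guard: accept only str, bytes or os.PathLike inputs."""
    if isinstance(uri, (str, bytes, os.PathLike)):
        # os.fspath returns str and bytes unchanged and converts
        # path-like objects to their str or bytes representation.
        return os.fspath(uri)
    raise TypeError(
        "expected str, bytes or os.PathLike object, not %s" % type(uri).__name__
    )


from_str = check_openable("setup.py")
from_path = check_openable(pathlib.Path("setup.py"))

try:
    check_openable(io.StringIO("already a file object"))
    rejected = False
except TypeError:
    rejected = True
```

Under this sketch, file objects such as StringIO are rejected up front instead of being passed through.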
I'm not sure, would have to dig through the project history. I like @markopy's argument; it makes sense to keep smart_open's mission clean and contained. |
Just one thing to add: for complete consistency with the built-in open, you also need to accept integer file descriptors, as pointed out in #528. |
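To illustrate the file descriptor case, here is a small sketch using only the standard library. It shows that the built-in `open` accepts an integer descriptor obtained from `os.open`. The temporary file is created purely for the demonstration.

```python
import os
import tempfile

# Create a temporary file and write a line to it through its descriptor.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello\n")
    os.close(fd)

    # Reopen the file at the OS level, then hand the integer descriptor
    # to the built-in open, which wraps it in a regular file object.
    raw_fd = os.open(path, os.O_RDONLY)
    with open(raw_fd) as fin:
        # Leaving the with-block also closes raw_fd (closefd defaults to True).
        content = fin.read()
finally:
    os.remove(path)
```

An API that aims to match the built-in exactly would need to handle this integer case in addition to str, bytes and os.PathLike.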
I can't recall the motivation. I vaguely think some bit of example code or downstream code in Gensim may have needed this behavior, and at the time I may have thought that even Python's Given my minimal comment & your rapid review & integration, we'd probably talked about it somewhere else, but I'm not sure where. If we try removing the passthrough case & running all |
Are you interested in investigating this further @markopy ? |
The change from #32 now resides in There are a whole bunch of tests explicitly for
I find support for this somewhat questionable, but I don't have time to investigate why it exists or whether removing it again makes sense. The easiest thing to do is to keep the current support for binary file objects but not extend it to text streams like StringIO. |
@piskvorky Removing support for open-ing a buffer (as opposed to a string) seems to have no effect on gensim tests. What do you think about removing this functionality? |
I have no chips on the table – if It's a top-level API change though… does it warrant a major version bump? |
Yes, I think it does, even though it doesn't seem like people use this particular feature much. |