-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Gracefully handle all utf-8 characters in json urls #17933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
99ff08f
to
ec9cc9a
Compare
Codecov Report
@@ Coverage Diff @@
## master #17933 +/- ##
==========================================
- Coverage 91.23% 91.22% -0.02%
==========================================
Files 163 163
Lines 50113 50114 +1
==========================================
- Hits 45723 45714 -9
- Misses 4390 4400 +10
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17933 +/- ##
==========================================
- Coverage 91.23% 91.22% -0.02%
==========================================
Files 163 163
Lines 50113 50114 +1
==========================================
- Hits 45723 45714 -9
- Misses 4390 4400 +10
Continue to review full report at Codecov.
|
@@ -187,6 +187,7 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None, | |||
filepath_or_buffer = _stringify_path(filepath_or_buffer) | |||
|
|||
if _is_url(filepath_or_buffer): | |||
filepath_or_buffer = quote(filepath_or_buffer, safe=';/?:@&=+$,') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason the default is not enough here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if yes, can you expand the test that exercises that (with another case)
@@ -998,6 +998,7 @@ I/O | |||
- Bug in :meth:`DataFrame.to_html` in which there was no validation of the ``justify`` parameter (:issue:`17527`) | |||
- Bug in :func:`HDFStore.select` when reading a contiguous mixed-data table featuring VLArray (:issue:`17021`) | |||
- Bug in :func:`to_json` where several conditions (including objects with unprintable symbols, objects with deep recursion, overlong labels) caused segfaults instead of raising the appropriate exception (:issue:`14256`) | |||
- Bug in :func:`read_json` where all utf-8 characters were not encoded properly when reading json data from a url (:issue:`17918`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this is generally true (not just for read_json), so maybe amend to say for urls. If you can add a test for more readers would be great.
@jreback Ok so the more I dig into this issue, the less confident I am that this is a worthwhile change. The issue was raised due to an error raised when a url passed to
we run the risk of not encoding the In addition, Perhaps a better approach would be to add a note to the docs stating that properly encoding the url is the responsibility of the user. I would be happy to open a PR if you think that would be a good idea. |
actually that sounds like a good idea to add to docs and the doc-strings of all the read_* functions that take urls (most of them) |
Please remove me from the list
On Sat, Oct 21, 2017 at 9:07 AM Grant Cooksey ***@***.***> wrote:
- closes #xxxx
- tests added / passed
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry
Url is passed through quote before doing the request. The safe parameter
characters are the reserved character set defined in RFC 2396
<https://www.ietf.org/rfc/rfc2396.txt>(See section 2.2).
------------------------------
You can view, comment on, or merge this pull request online at:
#17933
Commit Summary
- BUG: Gracefully handle all utf-8 characters in urls GH17918
File Changes
- *M* doc/source/whatsnew/v0.21.0.txt
<https://github.com/pandas-dev/pandas/pull/17933/files#diff-0> (1)
- *M* pandas/io/common.py
<https://github.com/pandas-dev/pandas/pull/17933/files#diff-1> (3)
- *M* pandas/tests/io/json/test_pandas.py
<https://github.com/pandas-dev/pandas/pull/17933/files#diff-2> (10)
Patch Links:
- https://github.com/pandas-dev/pandas/pull/17933.patch
- https://github.com/pandas-dev/pandas/pull/17933.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#17933>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAJfa3JzCLqUaiLmYH-0Y6N7Ql2b7r_Xks5suZgqgaJpZM4QBdko>
.
--
Greg Gurevich
greggurevich@gmail.com
cell: 1-917-504-7105
|
@jreback Ok I will work on that and close this PR. Thanks! |
git diff upstream/master -u -- "*.py" | flake8 --diff
Url is passed through
quote
before doing the request. Thesafe
parameter characters are the reserved character set defined in RFC 2396(See section 2.2).