BUG: read_csv fails some http servers if port number is specified #17019
Comments
@skynss : Why can't we just set the header ourselves if that's the problem? It seems like your code in the example could address that.
@gfyoung I recommend not fixing it for now because
@skynss : Fair enough. We still have to merge your original PR for user-auth in the first place 😄
What's the status of this? I'm getting an error when trying to call pd.read_csv on a URL with a non-standard port in it.
It’s an open issue; pull requests are always welcome.
Headers can be modified now by using
xref #16716
Code Sample, a copy-pastable example if possible
Problem description
The problem is that with at least one web server, `urlopen`, and therefore `pandas.read_csv`, fails when a URL of the form `http://<fqdn>:<port>` is specified, even if the port is the default port 80. However, if the `python-requests` library is used instead of `urlopen`, the same URL works. The difference is that `requests` sets the header `Host: fqdn`, whereas `urlopen` sets it to `Host: fqdn:port`. While `urlopen` still adheres to the HTTP RFC, requests, Firefox, Chrome, IE, and curl all work with all URLs, so a pandas user may well wonder why pandas returns a 404.
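The Host header difference described above can be sketched with the standard library alone. This is a minimal illustration, not what pandas does internally: the helper name `request_without_port_in_host` and the example URL are hypothetical, and it relies on `urllib` adding its own `Host` header only when the request does not already carry one.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def request_without_port_in_host(url: str) -> Request:
    """Build a Request whose Host header carries only the hostname.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    parts = urlsplit(url)
    req = Request(url)
    # Set Host explicitly so that urlopen does not derive "fqdn:port"
    # from the URL; this mirrors the header that requests sends.
    req.add_header("Host", parts.hostname)
    return req


# Example URL is an assumption; substitute the server that returns 404.
req = request_without_port_in_host("http://example.com:80/data.csv")
print(req.get_header("Host"))

# Possible usage (performs a network call, so left commented out):
# from urllib.request import urlopen
# import pandas as pd
# with urlopen(req) as response:
#     df = pd.read_csv(response)
```

Whether this avoids the 404 depends on the server in question; the sketch only shows where the header could be overridden.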
How big an issue is this? I don't know, so I cannot immediately recommend that it be fixed. But we should watch out for similar issues in the future and then consider either modifying the Host header or using the requests library.
Expected Output
A dataframe should be read.
Output of pd.show_versions()