fread showing HTTP error when read.csv does not #4659

Open
dkgaraujo opened this issue Aug 4, 2020 · 3 comments

@dkgaraujo

fread correctly loads a csv file that was downloaded to a local folder, but it fails to load the same csv file directly from the web. read.csv loads the same csv file from the web without problems. The code that triggers the issue is in the attached PDF, which also contains the session info. I would also note that I'm not running this behind any corporate firewall (although I got the same error when running behind a corporate firewall on another computer).

fread_bug_report.pdf

@ben-schwen
Member

ben-schwen commented Aug 5, 2020

It seems that curl has problems with unencoded URLs.

url = "https://github.com/OpportunityInsights/EconomicTracker/raw/main/data/Google Mobility - County - Daily.csv"

# fails
dt = fread(url)
# works
dt = fread(utils::URLencode(url))

# also works
df = read.csv(url, nrows = 10)

To avoid relying on utils::URLencode, one could use

# replace every character outside the RFC 3986 allowed set with "%20"
# (strictly correct only for spaces, which is the case here)
invalid_chars = "[^][!$&'()*+,;=:/?@#ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._~-]"
valid_url = gsub(invalid_chars, "%20", url)
fread(valid_url)
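
A more general variant (my own sketch, not from the thread; the helper name percent_encode is made up) would encode each disallowed character with its own hex value instead of hard-coding %20, so characters other than spaces are covered too:

# hypothetical helper: percent-encode every character outside the RFC 3986 set
# (ASCII characters only; multi-byte characters would need byte-wise encoding)
percent_encode = function(url) {
  invalid_chars = "[^][!$&'()*+,;=:/?@#A-Za-z0-9._~-]"
  chars = strsplit(url, "")[[1]]
  bad = grepl(invalid_chars, chars)
  chars[bad] = sprintf("%%%02X", vapply(chars[bad], utf8ToInt, integer(1)))
  paste(chars, collapse = "")
}
fread(percent_encode(url))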

@MichaelChirico
Member

MichaelChirico commented Aug 5, 2020 via email

Thanks for the investigation... why then does read.csv succeed? Does it have such a redirect / is it doing URL encoding?

@ben-schwen
Member

ben-schwen commented Aug 5, 2020


As far as I understand: read.table (read.csv is just a wrapper around it) opens a connection to the URL and then calls base::scan, which in turn calls do_scan from the base scan.c. fread, in contrast, does not read the remote file directly but downloads it to a tempfile first.
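
As a rough illustration of that download-first behaviour (my own sketch, not the actual fread internals), the manual equivalent in base R would be:

# sketch: download to a tempfile first, then read the local copy;
# encoding the URL up front avoids the curl error described above
tmp = tempfile(fileext = ".csv")
download.file(utils::URLencode(url), tmp, mode = "wb")
dt = fread(tmp)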

Furthermore, the base::connections docs also mention: 'Most methods do not percent-encode special characters such as spaces in http:// URLs (see URLencode), but it seems the "wininet" method does.'

It seems like this is another follow-up to #1686.

Strangely enough, read.csv appears to work with an unencoded URL on Windows but not on Linux.
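
One way to check which download method a platform defaults to (my own addition; the option and capability names are the standard base R ones):

# "wininet" (Windows) percent-encodes spaces itself; "libcurl" (common on Linux) does not
getOption("download.file.method", default = "auto")
capabilities("libcurl")
.Platform$OS.type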
