Strange error with get_resource for a .csv file = EOF within quoted string #11

Open
jamiedtor opened this issue Sep 9, 2021 · 3 comments

Comments

@jamiedtor

First, excellent, super useful package. Thanks very much.

Second, I have hit one small snag. When I call get_resource() with the following code, the .csv file is parsed incorrectly.

active_building_permits <- search_packages("Active permits") %>% list_package_resources() %>% dplyr::filter(name == "Active permits (CSV)") %>% get_resource()

I have far fewer records than I should and information appears in the wrong columns. I get the following warning:

In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string

I know similar things can happen when calling read.table or read.csv directly, because of stray quotes in the data. See https://kodlogs.com/33766/in-scan-file-file-what-what-sep-sep-quote-quote-dec-dec-eof-within-quoted-string

But I'm not sure what is happening here.
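
For reference, here is a minimal sketch of how a single unmatched quote can upset read.csv. The file contents and column names are made up for illustration and are not taken from the Active permits data, and I have not checked which rows in the real file are responsible. Passing quote = "" turns quote handling off, which is only safe when no field relies on quotes to protect an embedded comma.

```r
# Toy CSV (hypothetical data): the second record contains one unmatched
# double quote, used here as an inch mark.
csv_lines <- c(
  "id,address",
  "1,12 Main St",
  "2,5\" pipe on King St",
  "3,40 Queen St"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# With the default quote character, read.csv treats the stray quote as the
# start of a quoted string, so later lines may be absorbed into one field.
default_quote <- suppressWarnings(read.csv(path))

# With quoting disabled, every quote is kept as an ordinary character.
no_quote <- read.csv(path, quote = "")

# Compare how many records each call produced.
nrow(default_quote)
nrow(no_quote)
```

If the two row counts differ, unbalanced quotes are a likely cause of the missing records.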

@sharlagelfand
Owner

Hi, thanks for the issue!

I have actually run into this problem myself, with this exact data set! The issue is definitely with the underlying CSV - read.csv doesn't seem to parse it properly, but readr::read_csv() does. Unfortunately right now ckanr (the package that opendatatoronto uses to access the portal) uses read.csv and not readr::read_csv().

I'll open an issue over on ckanr with this - I'm the maintainer on that too so will have a think about how to handle it.

In the meantime, you can access the file more manually by using ckanr functions and reading the CSV yourself - here is some code to do that:

library(opendatatoronto)
library(ckanr)
#> Loading required package: DBI
library(readr)

active_building_permits <- search_packages("Active permits") %>% 
  list_package_resources() %>% dplyr::filter(name == "Active permits (CSV)")

active_building_permits_id <- active_building_permits[["id"]]
  
# Get URL of resource
resource <- resource_show(active_building_permits_id, url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/", as = "list")

# Make a directory to save into
dir <- tempdir()
resource_dir <- fs::dir_create(paste0(dir, "/", active_building_permits_id))

# Save the ZIP file
save_path <- ckan_fetch(resource[["url"]], store = "disk", path = paste0(dir, "/", active_building_permits_id, "/", "res.zip"))

# Unzip it
csv_files <- unzip(save_path[["path"]], exdir = resource_dir)

# Read it in 
res <- read_csv(csv_files)
#> Rows: 246434 Columns: 30
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (16): PERMIT_NUM, REVISION_NUM, PERMIT_TYPE, STRUCTURE_TYPE, WORK, STREE...
#> dbl (13): GEO_ID, APPLICATION_DATE, ISSUED_DATE, DWELLING_UNITS_CREATED, DWE...
#> lgl  (1): COMPLETED_DATE
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dim(res)
#> [1] 246434     30

# Compare to via read.csv()
bad_res <- read.csv(csv_files)
#> Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
#> EOF within quoted string

dim(bad_res)
#> [1] 135718     30

Hope this is helpful in the meantime!
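
If you want to look for the lines that confuse read.csv, base R's count.fields() can help. The sketch below runs on a small made-up file rather than the real download, so the data and the variable names are assumptions. The same calls should work on the unzipped CSV path, though I have not tried that here.

```r
# Toy file (hypothetical data) with one unmatched double quote on line 3.
csv_lines <- c(
  "id,address",
  "1,12 Main St",
  "2,5\" pipe on King St",
  "3,40 Queen St"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Count fields per line using the same quote character as read.csv.
# An NA marks a line where a quoted string was still open at the line end.
n_quoted <- suppressWarnings(count.fields(path, sep = ",", quote = "\""))

# Count again with quoting disabled to see the raw comma structure.
n_plain <- count.fields(path, sep = ",", quote = "")

# Lines whose quoted count is NA or differs from the header are worth a look.
suspect <- which(is.na(n_quoted) | n_quoted != n_quoted[1])
suspect
```

Any line numbers in suspect are candidates for inspection with readLines(path)[suspect].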

@jamiedtor
Author

This works perfectly. Thanks so much for the quick fix (and the great package). Shall I close the issue since the manual code works or do you want me to leave it open as a placeholder to think about?

@sharlagelfand
Owner

Great, so glad it worked for you! Let's leave it open as a placeholder - thanks!
