Strange error with get_resource for a .csv file = EOF within quoted string #11
Hi, thanks for the issue! I have actually run into this problem myself, with this exact data set! The issue is definitely with the underlying CSV, and I'll open an issue about it. In the meantime, you can access the file more manually by using:
library(opendatatoronto)
library(ckanr)
#> Loading required package: DBI
library(readr)
active_building_permits <- search_packages("Active permits") %>%
list_package_resources() %>% dplyr::filter(name == "Active permits (CSV)")
active_building_permits_id <- active_building_permits[["id"]]
# Get URL of resource
resource <- resource_show(active_building_permits_id, url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/", as = "list")
# Make a directory to save into
dir <- tempdir()
resource_dir <- fs::dir_create(paste0(dir, "/", active_building_permits_id))
# Save the ZIP file
save_path <- ckan_fetch(resource[["url"]], store = "disk", path = paste0(dir, "/", active_building_permits_id, "/", "res.zip"))
# Unzip it
csv_files <- unzip(save_path[["path"]], exdir = resource_dir)
# Read it in
res <- read_csv(csv_files)
#> Rows: 246434 Columns: 30
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (16): PERMIT_NUM, REVISION_NUM, PERMIT_TYPE, STRUCTURE_TYPE, WORK, STREE...
#> dbl (13): GEO_ID, APPLICATION_DATE, ISSUED_DATE, DWELLING_UNITS_CREATED, DWE...
#> lgl (1): COMPLETED_DATE
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dim(res)
#> [1] 246434 30
# Compare to reading the same file via read.csv()
bad_res <- read.csv(csv_files)
#> Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
#> EOF within quoted string
dim(bad_res)
#> [1] 135718 30
Hope this is helpful in the meantime!
This works perfectly. Thanks so much for the quick fix (and the great package). Shall I close the issue since the manual code works, or do you want me to leave it open as a placeholder to think about?
Great, so glad it worked for you! Let's leave it open as a placeholder. Thanks!
First, excellent, super useful package. Thanks very much.
Second, I have hit one small snag. When I call get_resource() with the following code, the .csv file ends up being parsed incorrectly.
active_building_permits <- search_packages("Active permits") %>% list_package_resources() %>% dplyr::filter(name == "Active permits (CSV)") %>% get_resource()
I have far fewer records than I should, and information appears in the wrong columns. I get the following warning: EOF within quoted string
I know similar things can happen when using read.csv directly, because of stray quotes in the data. See https://kodlogs.com/33766/in-scan-file-file-what-what-sep-sep-quote-quote-dec-dec-eof-within-quoted-string
But I'm not sure what is happening here.
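For anyone landing here later, the stray-quote explanation above can be reproduced on a small scale. This is only a sketch using base R on a made-up ten-row file (the `permit` descriptions, the file path, and the position of the bad row are all hypothetical, not taken from the Active permits data). It assumes the cause is an unmatched double quote in one field, which may or may not be exactly what is wrong with the real CSV.

```r
# Hypothetical ten-row CSV; row 8 opens a double quote that is never closed.
descriptions <- paste("permit", 1:10)
descriptions[8] <- "\"unterminated permit 8"
path <- tempfile(fileext = ".csv")
writeLines(c("id,description", paste(1:10, descriptions, sep = ",")), path)

# One record per physical line, minus the header (true for this toy file only).
expected_rows <- length(readLines(path)) - 1

# Default quote handling: the stray quote starts a quoted field that runs to
# the end of the file, so the remaining rows are swallowed into that field.
default_read <- suppressWarnings(read.csv(path))

# quote = "" turns quote handling off, so the quote is kept as a literal character.
literal_read <- read.csv(path, quote = "")

# Compare the row counts side by side.
c(expected = expected_rows,
  default = nrow(default_read),
  literal = nrow(literal_read))
```

Note that `quote = ""` is only a reasonable choice when no field legitimately contains a comma inside quotes; otherwise those fields would be split across columns. Comparing the parsed row count against an independent count, as above or as in the `read_csv()` versus `read.csv()` comparison earlier in this thread, is a quick way to notice silent truncation.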