list_datasets only grabs first 50 results #141

Closed
combinatorist opened this issue Feb 3, 2017 · 8 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@combinatorist

This happens because the Google BigQuery API lists only 50 results by default. You can override this by passing a maxResults argument or setting all to True.

https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list#parameters

Please let callers pass one or both of these arguments through bigrquery's list_datasets.
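
For readers who want to see the alternative to one very large maxResults: the REST reference linked above also describes a pageToken request parameter and a nextPageToken response field, so the listing can be walked page by page. The following is a minimal sketch of that loop, not bigrquery's implementation. `fetch_page` and `fake_fetch_page` are hypothetical names; the fake stands in for the real API so the example runs offline.

```r
# Collect dataset IDs across every page of a paginated listing.
# `fetch_page` is a hypothetical function(page_token) returning one parsed
# response: a list with `datasets` and, when more pages remain, `nextPageToken`.
list_all_dataset_ids <- function(fetch_page) {
  ids <- character()
  token <- NULL
  repeat {
    page <- fetch_page(token)
    new_ids <- vapply(
      page$datasets,
      function(x) x$datasetReference$datasetId,
      character(1)
    )
    ids <- c(ids, new_ids)
    token <- page$nextPageToken
    if (is.null(token)) break  # no token means this was the last page
  }
  ids
}

# Stand-in for the API (an assumption for illustration only):
# serves 120 fake datasets, 50 per page, using the start index as the token.
fake_fetch_page <- function(page_token = NULL) {
  all_ids <- sprintf("dataset_%03d", 1:120)
  start <- if (is.null(page_token)) 1L else as.integer(page_token)
  end <- min(start + 49L, length(all_ids))
  page <- list(
    datasets = lapply(all_ids[start:end], function(id) {
      list(datasetReference = list(datasetId = id))
    })
  )
  if (end < length(all_ids)) page$nextPageToken <- as.character(end + 1L)
  page
}

ids <- list_all_dataset_ids(fake_fetch_page)
length(ids)  # 120
```

In real use, `fetch_page` would wrap whatever request function you have and pass the token along as the pageToken query parameter.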

@combinatorist
Author

I'm happy to provide more context to this issue, but I couldn't find a default issue template or anything in your README.md.

Adding @meridithperatikos to help watch this issue.

@meridithperatikos

Relates to #108.

@combinatorist
Author

Just to be explicit: #108 refers to list_tables (within a given dataset), but it has an analogous problem, so they could easily be fixed together.

@Alexander-McLean

Here's a quick fix, in case anyone else has the same issue:

list_datasets <- function(project) { assert_that(is.string(project)) url <- sprintf("projects/%s/datasets", Project) query <- list() query$maxResults <- 999999 data <- bigrquery:::bq_get(url, query = query)$datasets unlist(lapply(data, function(x) x$datasetReference$datasetId)) }

@combinatorist
Author

@Alexander-McLean, I'm assuming you meant this? (use triple backticks "```" for multi-line code)

list_datasets <- function(project) { 
  assert_that(is.string(project)) 
  url <- sprintf("projects/%s/datasets", Project) 
  query <- list() 
  query$maxResults <- 999999 
  data <- bigrquery:::bq_get(url, query = query)$datasets 
  unlist(lapply(data, function(x) x$datasetReference$datasetId)) 
}

@combinatorist
Author

Thanks, @Alexander-McLean!

I got it to run with the following tweaks:

  • commented out assert_that (not installed for me)
  • lowercased the "Project" variable, since the capitalized name wasn't defined (at least within the scope of the function)
> list_datasets <- function(project) { 
+ #    assert_that(is.string(project)) 
+     url <- sprintf("projects/%s/datasets", project) 
+     query <- list() 
+     query$maxResults <- 999999 
+     data <- bigrquery:::bq_get(url, query = query)$datasets 
+     unlist(lapply(data, function(x) x$datasetReference$datasetId)) 
+ }
> 
> 
> list_datasets('bigquery-public-data')
 [1] "baseball"                "bls"                     "cloud_storage_geo_index"
 [4] "common_eu"               "common_us"               "fec"                    
 [7] "ghcn_d"                  "ghcn_m"                  "github_repos"           
[10] "hacker_news"             "irs_990"                 "medicare"               
[13] "new_york"                "noaa_gsod"               "open_images"            
[16] "samples"                 "san_francisco"           "stackoverflow"          
[19] "usa_names"              
> 

Note that this public project doesn't have enough datasets to show that we get past the 50-result limit, but I ran it on the private project where we noticed the problem, and it successfully retrieved 178 datasets.
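
If assert_that isn't available, the argument check doesn't have to be dropped entirely. Here is a base R sketch that approximates `assert_that(is.string(project))`; `check_string` is a hypothetical helper name, and the NA check is an extra assumption on my part rather than something the original relied on.

```r
# Base R stand-in for assert_that(is.string(x)):
# accept only a single, non-missing character value.
check_string <- function(x, name = "argument") {
  if (!is.character(x) || length(x) != 1L || is.na(x)) {
    stop(sprintf("`%s` must be a single string", name), call. = FALSE)
  }
  invisible(x)
}

check_string("bigquery-public-data", "project")  # passes silently
```

Calling `check_string(project, "project")` as the first line of the workaround would keep the validation without the extra dependency.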

@combinatorist
Author

combinatorist commented Mar 16, 2017

@Alexander-McLean, I'm not really familiar with R. I just tried adapting the same approach to list_tables (#108), but it didn't work (for me). Any ideas?

> list_tables <- function(dataset) { 
+     # assert_that(is.string(dataset)) 
+     url <- sprintf("dataset/%s/tables", dataset) 
+     query <- list() 
+     query$maxResults <- 999999 
+     data <- bigrquery:::bq_get(url, query = query)$tables 
+     unlist(lapply(data, function(x) x$tableReference$tableId)) 
+ }
> 
> 
> list_tables('bigquery-public-data:baseball')
Error: HTTP error [404] Not Found 
4. stop("HTTP error [", req$status, "] ", out, call. = FALSE) 
3. process_request(req) 
2. bigrquery:::bq_get(url, query = query) 
1. list_tables("bigquery-public-data:baseball") 
> 
> 
> list_tables('baseball')
Error: HTTP error [404] Not Found 
4. stop("HTTP error [", req$status, "] ", out, call. = FALSE) 
3. process_request(req) 
2. bigrquery:::bq_get(url, query = query) 
1. list_tables("baseball") 
> 
> 
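
A likely cause of the 404s above is the URL: the attempt requests "dataset/%s/tables", whereas the BigQuery REST path for listing tables needs both the project and the dataset ID, in the form projects/{projectId}/datasets/{datasetId}/tables. A minimal sketch of building that path follows; `tables_url` is a hypothetical helper name, and passing its result to `bigrquery:::bq_get` as in the workaround above is untested here.

```r
# Build the REST path for listing the tables in one dataset.
# Path shape: projects/{projectId}/datasets/{datasetId}/tables
tables_url <- function(project, dataset) {
  stopifnot(is.character(project), length(project) == 1L,
            is.character(dataset), length(dataset) == 1L)
  sprintf("projects/%s/datasets/%s/tables", project, dataset)
}

tables_url("bigquery-public-data", "baseball")
# "projects/bigquery-public-data/datasets/baseball/tables"
```

Note that the project and dataset are separate arguments here, not a single "project:dataset" string.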

@hadley added the bug label (an unexpected problem or unintended behavior) on Apr 18, 2017
@hadley closed this as completed in 4dbc2ee on Apr 19, 2017

@combinatorist
Author

Thanks, @hadley!

Zsedo pushed a commit to Zsedo/bigrquery that referenced this issue Jun 26, 2017
Fixes r-dbi#141

And fix a bunch of R CMD check problems :(
4 participants