-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use dummy play in update_db() to help predefine column types in some db drivers #324
Comments
Doing a little more testing and it appears the DBI package is a little smarter than just using the first line. It appears (I have not done rigorous testing) it gets the value of the largest string in each column of the tibble passed to it. I assume nflfastr batches things by seasons, passing a new tibble for each, which may be why this does not work. If the values in the n+1 tibble are larger than in tibble n, we'll get an error. This should not affect the proposed solution. |
Went through the DBI docs a bit. So, when DBI writes a new table, it uses pbp <- nflreadr::load_pbp(1999)
DBI::dbDataType(RMariaDB::MariaDB(), pbp) |> head(10)
#> play_id game_id old_game_id home_team away_team season_type
#> "DOUBLE" "TEXT" "TEXT" "TEXT" "TEXT" "TEXT"
#> week posteam posteam_type defteam
#> "INT" "TEXT" "TEXT" "TEXT"
DBI::dbDataType(RMySQL::MySQL(), pbp) |> head(10)
#> [1] "text"
DBI::dbDataType(RSQLite::SQLite(), pbp) |> head(10)
#> play_id game_id old_game_id home_team away_team season_type
#> "DOUBLE" "TEXT" "TEXT" "TEXT" "TEXT" "TEXT"
#> week posteam posteam_type defteam
#> "INT" "TEXT" "TEXT" "TEXT"
DBI::dbDataType(RPostgres::Postgres(), pbp) |> head(10)
#> play_id game_id old_game_id home_team
#> "DOUBLE PRECISION" "TEXT" "TEXT" "TEXT"
#> away_team season_type week posteam
#> "TEXT" "TEXT" "INTEGER" "TEXT"
#> posteam_type defteam
#> "TEXT" "TEXT"
DBI::dbDataType(RPostgreSQL::PostgreSQL(), pbp) |> head(10)
#> [1] "text"
DBI::dbDataType(duckdb::duckdb(), pbp) |> head(10)
#> play_id game_id old_game_id home_team away_team season_type
#> "DOUBLE" "STRING" "STRING" "STRING" "STRING" "STRING"
#> week posteam posteam_type defteam
#> "INTEGER" "STRING" "STRING" "STRING" |
It looks like some db drivers define column types with the very first row in the table.
This can make problems esp. with character columns where the first row is not the longest character string.
We can try to fix this by adding a dummy play at the very first line in update_db() that wraps all data types and character lengths.
That dummy play should be saved on sysdata and removed from the table after the function is done.
https://twitter.com/friscojosh/status/1526288878373507072?s=21&t=oEfim2Cyr45Z7HN3ltXDfQ
The text was updated successfully, but these errors were encountered: