Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dummy play in update_db() to help predefine column types in some db drivers #324

Closed
mrcaseb opened this issue May 16, 2022 · 2 comments · Fixed by #325
Closed

Use dummy play in update_db() to help predefine column types in some db drivers #324

mrcaseb opened this issue May 16, 2022 · 2 comments · Fixed by #325

Comments

@mrcaseb
Copy link
Member

mrcaseb commented May 16, 2022

It looks like some db drivers define column types with the very first row in the table.
This can make problems esp. with character columns where the first row is not the longest character string.

We can try to fix this by adding a dummy play at the very first line in update_db() that wraps all data types and character lengths.

That dummy play should be saved on sysdata and removed from the table after the function is done.

https://twitter.com/friscojosh/status/1526288878373507072?s=21&t=oEfim2Cyr45Z7HN3ltXDfQ

@friscojosh
Copy link

friscojosh commented May 16, 2022

Doing a little more testing and it appears the DBI package is a little smarter than just using the first line. It appears (I have not done rigorous testing) it gets the value of the largest string in each column of the tibble passed to it. I assume nflfastr batches things by seasons, passing a new tibble for each, which may be why this does not work. If the values in the n+1 tibble are larger than in tibble n, we'll get an error.

This should not affect the proposed solution.

@mrcaseb
Copy link
Member Author

mrcaseb commented May 17, 2022

Went through the DBI docs a bit. So, when DBI writes a new table, it uses dbDataType() to determine the SQL data type. Every DB backend has to provide a method for this generic. Below shows examples of the output when calling the methods with nflfastR pbp.

pbp <- nflreadr::load_pbp(1999)
DBI::dbDataType(RMariaDB::MariaDB(), pbp) |> head(10)
#>      play_id      game_id  old_game_id    home_team    away_team  season_type 
#>     "DOUBLE"       "TEXT"       "TEXT"       "TEXT"       "TEXT"       "TEXT" 
#>         week      posteam posteam_type      defteam 
#>        "INT"       "TEXT"       "TEXT"       "TEXT"
DBI::dbDataType(RMySQL::MySQL(), pbp) |> head(10)
#> [1] "text"
DBI::dbDataType(RSQLite::SQLite(), pbp) |> head(10)
#>      play_id      game_id  old_game_id    home_team    away_team  season_type 
#>     "DOUBLE"       "TEXT"       "TEXT"       "TEXT"       "TEXT"       "TEXT" 
#>         week      posteam posteam_type      defteam 
#>        "INT"       "TEXT"       "TEXT"       "TEXT"
DBI::dbDataType(RPostgres::Postgres(), pbp) |> head(10)
#>            play_id            game_id        old_game_id          home_team 
#> "DOUBLE PRECISION"             "TEXT"             "TEXT"             "TEXT" 
#>          away_team        season_type               week            posteam 
#>             "TEXT"             "TEXT"          "INTEGER"             "TEXT" 
#>       posteam_type            defteam 
#>             "TEXT"             "TEXT"
DBI::dbDataType(RPostgreSQL::PostgreSQL(), pbp) |> head(10)
#> [1] "text"
DBI::dbDataType(duckdb::duckdb(), pbp) |> head(10)
#>      play_id      game_id  old_game_id    home_team    away_team  season_type 
#>     "DOUBLE"     "STRING"     "STRING"     "STRING"     "STRING"     "STRING" 
#>         week      posteam posteam_type      defteam 
#>    "INTEGER"     "STRING"     "STRING"     "STRING"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants