Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

methods for tbl_lazy and connections #150

Open
moodymudskipper opened this issue Apr 22, 2023 · 1 comment
Open

methods for tbl_lazy and connections #150

moodymudskipper opened this issue Apr 22, 2023 · 1 comment
Labels
Milestone

Comments

@moodymudskipper
Copy link
Collaborator

A tbl_lazy is probably called by tbl(con, "my_table") if there is no query, if there is a query I believe the dplyr code is stored somewhere in the object so we might use it, we might also use the sql with something like tbl(con, sql(...)) (IIRC how it works).

so I think constructors are "dplyr" and "sql" (and either way we just take the name of the table if there is no query. Need to use dbplyr::in_schema() too if relevant.

connection objects are trickier.
*1) We might just use con and user is supposed to know how they do it
*2) We construct the connection code, but I'm not sure credentials are stored in the object, and if so we might want to hide them by default, using "password" etc for instance.
*3) We might construct a local DB with the data but then it has to be SQLite and the dbplyr code (and the sql to a greater extent) won't be robust anymore.

method 2) with password is the only one that would produce an equivalent object.

@moodymudskipper moodymudskipper added this to the 0.2 milestone May 28, 2023
@moodymudskipper
Copy link
Collaborator Author

moodymudskipper commented May 29, 2023

library(constructive)
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite())

# The connection object is beautifully handled by our S4 support
# we might use directly the DBI::dbConnect(<driver>,) syntax, though it will
# necessarily create a new pointer (and a new env but waldo will consider those equivalent)
construct(con)
#> new(
#>   "SQLiteConnection" |>
#>     structure(package = "RSQLite"),
#>   ptr = constructive::external_pointer("0x10e708290"),
#>   dbname = "",
#>   loadable.extensions = TRUE,
#>   flags = 70L,
#>   vfs = "",
#>   ref = constructive::env("0x10af7ce28", parents = "empty"),
#>   bigint = "integer64",
#>   extended_types = FALSE
#> )

DBI::dbWriteTable(con, "cars", cars)

# Maybe the class of the nested lazy query <lazy_base_remote_query/lazy_base_query/lazy_query> 
# indicates that there is no query as long as unclass(object$lazy_query$x) is a name,
# so we might construct tbl(con, "cars") accurately here
construct(tbl(con, "cars"), data = list(con = con))
#> list(
#>   src = list(con = con, disco = NULL) |>
#>     structure(class = c("src_SQLiteConnection", "src_dbi", "src_sql", "src")),
#>   lazy_query = list(
#>     x = "cars" |>
#>       structure(class = c("ident", "character")),
#>     vars = c("speed", "dist"),
#>     group_vars = character(0),
#>     order_vars = NULL,
#>     frame = NULL
#>   ) |>
#>     structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
#> ) |>
#>   structure(class = c("tbl_SQLiteConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl"))

# Here we see the class of the lazy query change into "<lazy_select_query/lazy_query>"
# the filter clause ends up in object$lazy_quey$where but it's not clear how we could reconstruct
# the query from there
waldo::compare(
  max_diffs = Inf,
  construct(tbl(con, "cars")),
  construct(tbl(con, "cars") |> filter(speed < 10))
)
#> old$code[16:28] vs new$code[16:48]
#>   "  ) |>"
#>   "    structure(class = c(\"src_SQLiteConnection\", \"src_dbi\", \"src_sql\", \"src\")),"
#>   "  lazy_query = list("
#> - "    x = \"cars\" |>"
#> - "      structure(class = c(\"ident\", \"character\")),"
#> - "    vars = c(\"speed\", \"dist\"),"
#> + "    x = list("
#> + "      x = \"cars\" |>"
#> + "        structure(class = c(\"ident\", \"character\")),"
#> + "      vars = c(\"speed\", \"dist\"),"
#> + "      group_vars = character(0),"
#> + "      order_vars = NULL,"
#> + "      frame = NULL"
#> + "    ) |>"
#> + "      structure(class = c(\"lazy_base_remote_query\", \"lazy_base_query\", \"lazy_query\")),"
#> + "    select = tibble::tibble("
#> + "      name = c(\"speed\", \"dist\"),"
#> + "      expr = list(quote(speed), quote(dist)),"
#> + "      group_vars = list(character(0), character(0)),"
#> + "      order_vars = list(NULL, NULL),"
#> + "      frame = list(NULL, NULL),"
#> + "    ),"
#> + "    where = list(rlang::new_quosure(quote(speed < 10))),"
#> + "    group_by = NULL,"
#> + "    order_by = NULL,"
#> + "    distinct = FALSE,"
#> + "    limit = NULL,"
#> + "    select_operation = \"select\","
#> + "    message_summarise = NULL,"
#>   "    group_vars = character(0),"
#>   "    order_vars = NULL,"
#>   "    frame = NULL"
#>   "  ) |>"
#> - "    structure(class = c(\"lazy_base_remote_query\", \"lazy_base_query\", \"lazy_query\"))"
#> + "    structure(class = c(\"lazy_select_query\", \"lazy_query\"))"
#>   ") |>"
#>   "  structure(class = c(\"tbl_SQLiteConnection\", \"tbl_dbi\", \"tbl_sql\", \"tbl_lazy\", \"tbl\"))"

# note that it is again a object
# but lazy_query$x contains the query here, so we might be able to reconstruct this call
# as well
construct(tbl(con,  sql("SELECT *\nFROM `cars`\nWHERE (`speed` < 10.0)")), data = list(con = con))
#> list(
#>   src = list(con = con, disco = NULL) |>
#>     structure(class = c("src_SQLiteConnection", "src_dbi", "src_sql", "src")),
#>   lazy_query = list(
#>     x = "SELECT *\nFROM `cars`\nWHERE (`speed` < 10.0)" |>
#>       structure(class = c("sql", "character")),
#>     vars = c("speed", "dist"),
#>     group_vars = character(0),
#>     order_vars = NULL,
#>     frame = NULL
#>   ) |>
#>     structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
#> ) |>
#>   structure(class = c("tbl_SQLiteConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl"))


# the lazy query is a recursive element, it would be muche asier to follow if the list
# calls were piped.
# note that dbplyr has constructors lazy_query and lazy_*_query that can be used too, we can pipe them by default
# lazy_query is just a list with a class so it's an easy low hanging fruit.
construct(tbl(con, "cars") |> filter(speed < mean(speed)) |> group_by(dist) |> mutate(foo = mean(dist)), data = list(con = con))
#> {constructive} couldn't create code that reproduces perfectly the input
#> ℹ Call `construct_issues()` to inspect the last issues
#> list(
#>   src = list(con = con, disco = NULL) |>
#>     structure(class = c("src_SQLiteConnection", "src_dbi", "src_sql", "src")),
#>   lazy_query = list(
#>     x = list(
#>       x = list(
#>         x = "cars" |>
#>           structure(class = c("ident", "character")),
#>         vars = c("speed", "dist"),
#>         group_vars = character(0),
#>         order_vars = NULL,
#>         frame = NULL
#>       ) |>
#>         structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query")),
#>       select = tibble::tibble(
#>         name = c("speed", "dist", "q06"),
#>         expr = list(
#>           quote(speed),
#>           quote(dist),
#>           rlang::new_quosure(
#>             quote(mean(speed)),
#>             constructive::env("0x104c59390", parents = "namespace:dbplyr")
#>           )
#>         ),
#>         group_vars = list(character(0), character(0), character(0)),
#>         order_vars = list(NULL, NULL, NULL),
#>         frame = list(NULL, NULL, NULL),
#>       ),
#>       where = NULL,
#>       group_by = NULL,
#>       order_by = NULL,
#>       distinct = FALSE,
#>       limit = NULL,
#>       select_operation = "mutate",
#>       message_summarise = NULL,
#>       group_vars = character(0),
#>       order_vars = NULL,
#>       frame = NULL
#>     ) |>
#>       structure(class = c("lazy_select_query", "lazy_query")),
#>     select = tibble::tibble(
#>       name = c("speed", "dist", "foo"),
#>       expr = list(quote(speed), quote(dist), rlang::new_quosure(quote(mean(dist)))),
#>       group_vars = list("dist", "dist", "dist"),
#>       order_vars = list(NULL, NULL, NULL),
#>       frame = list(NULL, NULL, NULL),
#>     ),
#>     where = list(quote(speed < q06)),
#>     group_by = NULL,
#>     order_by = NULL,
#>     distinct = FALSE,
#>     limit = NULL,
#>     select_operation = "select",
#>     message_summarise = NULL,
#>     group_vars = "dist",
#>     order_vars = NULL,
#>     frame = NULL
#>   ) |>
#>     structure(class = c("lazy_select_query", "lazy_query"))
#> ) |>
#>   structure(class = c("tbl_SQLiteConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl"))

# we also have an environment issue above worth investigating
construct_issues()
#> `attr(original$lazy_query$select$expr[[3]], '.Environment')` is <env:global>
#> `attr(recreated$lazy_query$select$expr[[3]], '.Environment')` is <env:0x10af27130>

# actually rebuilding the dbplyr code is probably a very deep rabbit hole, and considering
# the fst evolution of dbplyr the code is likely to not be stable.

Created on 2023-05-29 with reprex v2.0.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

1 participant