feat: `dm_from_con()` gains `.names` argument for pattern-based construction of table names in the dm object #1790

owenjonesuob · 2023-02-13T13:43:24Z

N.B. This adds to #1789, and should probably be merged after that PR has been merged.

We could enable name repairing for a dm object, in a similar way to how column names can be repaired within tibble::tibble().

With SQL, we should be able to assume that:

Schema names are unique
Within a schema, table names are unique

So using a default repair strategy of "check_unique" shouldn't actually ever result in failure.

In more drastic cases, e.g. if the user passes a custom function to .name_repair, we might end up with name clashes - in which case, we fail gracefully with an informative error message.

See #1534 for further discussion. If we assume schema/table names do not contain ., then the "snakecase" approach for tidy_names as described there could now be achieved with:

dm_from_con(
  ...,
  .name_repair = ~ snakecase::to_snake_case(.x, sep_in = "[^[:alnum:].]")
)

…epair

owenjonesuob · 2023-03-22T11:29:59Z

Following discussion in #1789, this feature now follows a different approach. Quoting from a comment in that PR:

Add a .names = NULL argument to dm_from_con(), where NULL is a "smart default" like the one used in dplyr::across(). So the user can specify their own pattern, taking advantage of special variables .schema and .table if they would like to - but by default:

When length(schema) == 1, use "{.table}", akin to the "single" default of "{.col}" in across()

Otherwise use "{.schema}.{.table}", akin to the "plural" default of "{.col}_{.fn}" in across(), but adhering more closely to how entities are usually specified in SQL

Regarding possible name clashes: for now, I've kept the warning-based approach implemented in #1789, but we could upgrade this to an error if we think that's more appropriate.

…epair

krlmlr · 2023-06-21T09:35:52Z

Thank you for your patience. Merging the other PR soon, and perhaps #1184 on top of that. Are you open to extending this PR to allow for user-defined resolution of clashes? Note that the internals have changed quite a bit with #1894.

owenjonesuob · 2023-06-22T19:25:07Z

Could I check what you mean by "user-defined resolution of clashes"? Would we want some kind of interactive process for that?

Fwiw, with the "smart default" as described above (implemented in this branch at present), we are guaranteed to avoid clashes, due to SQL naming constraints: table names must be unique within a schema, and schema names must be unique within a database. So it is only possible for a user to encounter clashes, if they have intentionally passed a non-default value to the .names parameter in dm_from_con() - in which case, can we assume that the current warning is sufficient to help them diagnose and fix their custom naming pattern?

krlmlr

Thanks. Another way to disambiguate is to offer name repair via vctrs::vec_as_names() . If the default is to fail in case of clashes, we could offer an error message (instead of a warning) with a clear remedy.

krlmlr · 2023-06-23T05:41:04Z

R/db-helpers.R

+ names_pattern <- if (length(schema) == 1) {
+ names %||% "{.table}"
+ } else {
+ names %||% "{.schema}.{.table}"
+ }


Will this be surprising for users of this function?

You know this package's audience far better than I do! So I'll ultimately take your advice here. But I'll share why I believe this default isn't too surprising for users with either a very little bit of SQL/database experience, or some slightly broader tidyverse experience:

With (a very little) SQL experience:

In SQL (and assuming we are working within a single database), entities are usually specified as schema.table, e.g. select * from schema.table

Although it is strongly discouraged when writing SQL directly, it's possible (for at least some versions of SQL) to omit the schema name - in which case, the table is searched for in the user's "default-schema" (or, in each schema in turn along their "search path" in some versions e.g. Postgres)

The default-schema has a default value (e.g. dbo for SQL Server, public for Postgres), but it is possible for a user to override it via an explicit command (e.g. StackOverflow instructions here for SQL Server)

So with this in mind, I think users fall into one of three categories:

"Huh, what's a schema?": no problem, they use the default default-schema for whichever database they connect to (i.e. they do not specify schema = in dm_from_con()), and can reach all the tables in that schema via a single-part name like table

"I only need things in one particular schema": they either intentionally use the default schema, or specify a single non-default schema in dm_from_con() - but either way, they are conscious that they are using a single schema of their choice, i.e. they have knowingly chosen their personal default-schema and can therefore use single-part names like table

"I need multiple schemas!": they specify multiple schemas in dm_from_con() - they are conscious of this decision, and are effectively setting their "search path" for tables. The default naming pattern is guaranteed to give unique two-part names like schema.table, but the user can, against best practice, choose to use a different pattern, e.g. {.table} to generate one-part names. Only the first table on the search path with each name will be kept, with a warning raised about tables not kept, thanks to feat: dm_from_con() can retrieve multiple schemas, pass a character vector to the schema argument #1789.

With some broader tidyverse experience:

There are already well-established examples of using glue patterns and "special variables" (for us, .schema and .table) for construction of object names elsewhere in the tidyverse - for example:

.names in dplyr::across() - which has a near-identical "smart default" to ours here, i.e. switches from one-part to two-part names in order to avoid ambiguity when multiple "sources" are involved (schemas in our case, functions in across()'s case)

names_glue in tidyr::pivot_wider()

owenjonesuob · 2023-06-23T13:08:15Z

Thanks. Another way to disambiguate is to offer name repair via vctrs::vec_as_names() . If the default is to fail in case of clashes, we could offer an error message (instead of a warning) with a clear remedy.

The first attempt to implement this feature did actually use vctrs::vec_as_names() - see 9d92ebe and preceding commits on this branch if you like! But I think pattern-based name construction is a more intuitive, and more powerful alternative to name repair - unless you think both could coexist somehow?

Re warning/error for name clashes: we could upgrade to an error, but I think this comment from #1789 still applies:

... I think the only advice we can provide is for the user to rename the one of the tables in the database itself, which is probably not feasible. So an error might completely block this functionality for users, whereas an unambiguous warning would allow them to carefully sidestep the problem by making sure schemas is ordered correctly for their use case.

Perhaps there is one additional piece of advice we could provide now: if there is a name clash, it MUST be because the user has overridden the "smart default" for .names in dm_from_con(), as reasoned in a previous comment. So rather than assuming that the user will figure that out, perhaps we could provide a tidyverse-esque hint that they might want to check their .names pattern and adjust it to avoid clashes.

krlmlr · 2023-07-11T23:43:16Z

Thanks! I'm not in love with the automagic, but your arguments all make sense. Now there's a message when the slightly surprising option is chosen, and the argument and behavior is marked experimental.

owenjonesuob added 3 commits February 13, 2023 13:07

Add name repair functionality

0b688c7

Add .name_repair to dm_from_con()

d7392ad

Remove redundant approach

47f468c

owenjonesuob mentioned this pull request Feb 13, 2023

Allow for multiple schemata in filter condition #1534

Closed

Merge branch 'b-1533-multiple-schemas' into f-multiple-schemas-name-r…

9d92ebe

…epair

owenjonesuob mentioned this pull request Mar 21, 2023

feat: dm_from_con() can retrieve multiple schemas, pass a character vector to the schema argument #1789

Merged

owenjonesuob added 5 commits March 22, 2023 10:17

Merge branch 'b-1533-multiple-schemas' into f-multiple-schemas-name-r…

1ecf060

…epair

Replace .name_repair with glue-based naming patterns

1037d90

Add "local_name" to global variables

06b156e

Fix typo in column name

23cceb1

Merge branch 'b-1533-multiple-schemas' into f-multiple-schemas-name-r…

4b28e0b

…epair

owenjonesuob changed the title ~~Add name repair functionality~~ Add pattern-based construction of local names Mar 22, 2023

owenjonesuob added 2 commits March 22, 2023 18:22

Merge branch 'b-1533-multiple-schemas' into f-multiple-schemas-name-r…

cd57deb

…epair

Fix parentheses positions in test

98356a6

krlmlr changed the title ~~Add pattern-based construction of local names~~ feat: Add pattern-based construction of local names Jun 13, 2023

Merge branch 'main' into f-multiple-schemas-name-repair

b6ba837

owenjonesuob added 3 commits June 22, 2023 20:30

Regenerate docs with roxyglobals

eeb562c

Use DBI::Id() in place of DBI::SQL()

220672c

Fix list extraction in tests

87d56ac

krlmlr reviewed Jun 23, 2023

View reviewed changes

krlmlr added 2 commits July 11, 2023 22:36

Explicit assignment, verbosity

ba4580c

Tweak docs

9a51f9c

krlmlr changed the title ~~feat: Add pattern-based construction of local names~~ feat: dm_from_con() gains .names argument for pattern-based construction of table names in the dm object Jul 11, 2023

Don't inform in base case

d6cb0c8

krlmlr merged commit 7f2dc65 into cynkra:main Jul 11, 2023
16 checks passed

owenjonesuob mentioned this pull request Jul 24, 2023

Naming pattern from .names ignored in dm_from_con() #1933

Closed

jesse-smith mentioned this pull request Aug 2, 2023

Error when supplying table_names in dm_from_con() #1936

Open

owenjonesuob mentioned this pull request Aug 7, 2023

fix: Always use get_src_tbl_names() to fetch tbl Ids within dm_from_con() #1934

Draft

owenjonesuob mentioned this pull request May 3, 2024

Naming pattern not used in dm_from_con() when keys are learned successfully #2213

Open

github-actions bot locked as resolved and limited conversation to collaborators Jul 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: `dm_from_con()` gains `.names` argument for pattern-based construction of table names in the dm object #1790

feat: `dm_from_con()` gains `.names` argument for pattern-based construction of table names in the dm object #1790

owenjonesuob commented Feb 13, 2023

owenjonesuob commented Mar 22, 2023

krlmlr commented Jun 21, 2023

owenjonesuob commented Jun 22, 2023 •

edited

Loading

krlmlr left a comment

krlmlr Jun 23, 2023

owenjonesuob Jun 23, 2023

owenjonesuob commented Jun 23, 2023

krlmlr commented Jul 11, 2023

feat: dm_from_con() gains .names argument for pattern-based construction of table names in the dm object #1790

feat: dm_from_con() gains .names argument for pattern-based construction of table names in the dm object #1790

Conversation

owenjonesuob commented Feb 13, 2023

owenjonesuob commented Mar 22, 2023

krlmlr commented Jun 21, 2023

owenjonesuob commented Jun 22, 2023 • edited Loading

krlmlr left a comment

Choose a reason for hiding this comment

krlmlr Jun 23, 2023

Choose a reason for hiding this comment

owenjonesuob Jun 23, 2023

Choose a reason for hiding this comment

With (a very little) SQL experience:

With some broader tidyverse experience:

owenjonesuob commented Jun 23, 2023

krlmlr commented Jul 11, 2023

feat: `dm_from_con()` gains `.names` argument for pattern-based construction of table names in the dm object #1790

feat: `dm_from_con()` gains `.names` argument for pattern-based construction of table names in the dm object #1790

owenjonesuob commented Jun 22, 2023 •

edited

Loading