feat: `dm_from_con()` gains `.names` argument for pattern-based construction of table names in the dm object #1790
Following discussion in #1789, this feature now follows a different approach. Quoting from a comment in that PR:
> Regarding possible name clashes: for now, I've kept the warning-based approach implemented in #1789, but we could upgrade this to an error if we think that's more appropriate.
>
> Could I check what you mean by "user-defined resolution of clashes"? Would we want some kind of interactive process for that? FWIW, with the "smart default" as described above (implemented in this branch at present), we are guaranteed to avoid clashes due to SQL naming constraints: table names must be unique within a schema, and schema names must be unique within a database. So a user can only encounter clashes if they have intentionally passed a non-default value to the …
Thanks. Another way to disambiguate is to offer name repair via `vctrs::vec_as_names()`. If the default is to fail in case of clashes, we could offer an error message (instead of a warning) with a clear remedy.
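A minimal base-R sketch of the two options mentioned above: failing with an error that names a remedy, or repairing the clash. Both `check_unique_table_names()` and `repair_unique()` are hypothetical helpers, not part of dm, and the error text is illustrative. `make.unique()` only stands in for `vctrs::vec_as_names(repair = "unique")`, which uses its own suffix scheme.

```r
# Hypothetical helpers for illustration; not part of dm.

# Option 1: fail on clashes, with an error message that names a remedy.
check_unique_table_names <- function(nm) {
  dups <- unique(nm[duplicated(nm)])
  if (length(dups) > 0) {
    stop(
      "Table names must be unique. Duplicated: ",
      paste0("`", dups, "`", collapse = ", "), ".\n",
      "Include `{.schema}` in the `.names` pattern, ",
      "e.g. `.names = \"{.schema}.{.table}\"`.",
      call. = FALSE
    )
  }
  invisible(nm)
}

# Option 2: repair clashes by suffixing duplicates (base-R stand-in for
# vctrs-style "unique" repair; the suffixes differ from the ones vctrs uses).
repair_unique <- function(nm) make.unique(nm, sep = "...")

nm <- c("orders", "orders", "items")
repair_unique(nm)  # "orders" "orders...1" "items"
try(check_unique_table_names(nm))
```

Erroring by default keeps the names predictable; repair would only be worth offering as an opt-in.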
`R/db-helpers.R` (outdated)
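```r
# Smart default: one schema -> one-part names, several schemas -> two-part names.
# An explicit `names` pattern always takes precedence (`%||%` returns its
# right-hand side only when the left-hand side is NULL).
names_pattern <- if (length(schema) == 1) {
  names %||% "{.table}"
} else {
  names %||% "{.schema}.{.table}"
}
```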
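To illustrate how such a pattern could be expanded into table names, here is a base-R sketch. It assumes that only the special variables `{.schema}` and `{.table}` appear in the pattern and that `schema` and `table` are parallel vectors with one entry per table. `choose_names_pattern()` and `expand_names_pattern()` are hypothetical names; the discussion describes these as glue patterns, and plain `gsub()` is used here only to keep the sketch dependency-free.

```r
# Hypothetical sketch, not dm's implementation.
# Mirror the smart default: an explicit pattern wins; otherwise one schema
# gives one-part names and several schemas give two-part names.
choose_names_pattern <- function(schema, names = NULL) {
  if (!is.null(names)) {
    return(names)
  }
  if (length(schema) == 1) "{.table}" else "{.schema}.{.table}"
}

# Substitute {.schema} and {.table} literally (fixed = TRUE) for each table;
# `schema` and `table` must have one entry per table.
expand_names_pattern <- function(pattern, schema, table) {
  vapply(seq_along(table), function(i) {
    out <- gsub("{.schema}", schema[[i]], pattern, fixed = TRUE)
    gsub("{.table}", table[[i]], out, fixed = TRUE)
  }, character(1))
}

pattern <- choose_names_pattern(c("public", "sales"))
expand_names_pattern(pattern, c("public", "sales"), c("orders", "orders"))
```

With two schemas the default switches to `"{.schema}.{.table}"`, so the two `orders` tables become `public.orders` and `sales.orders`.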
Will this be surprising for users of this function?
You know this package's audience far better than I do! So I'll ultimately take your advice here. But I'll share why I believe this default isn't too surprising for users with either even a little SQL/database experience, or somewhat broader tidyverse experience:
With (even a little) SQL experience:
- In SQL (and assuming we are working within a single database), entities are usually specified as `schema.table`, e.g. `select * from schema.table`.
- Although it is strongly discouraged when writing SQL directly, it's possible (in at least some SQL dialects) to omit the schema name, in which case the table is looked up in the user's "default schema" (or, in some dialects such as Postgres, in each schema in turn along their "search path").
- The default schema has a default value (e.g. `dbo` for SQL Server, `public` for Postgres), but a user can override it with an explicit command (e.g. StackOverflow instructions here for SQL Server).
- So with this in mind, I think users fall into one of three categories:
  - "Huh, what's a schema?": no problem, they use the default default-schema for whichever database they connect to (i.e. they do not specify `schema =` in `dm_from_con()`), and can reach all the tables in that schema via a single-part name like `table`.
  - "I only need things in one particular schema": they either intentionally use the default schema or specify a single non-default schema in `dm_from_con()`. Either way, they are conscious that they are using a single schema of their choice, i.e. they have knowingly chosen their personal default schema and can therefore use single-part names like `table`.
  - "I need multiple schemas!": they specify multiple schemas in `dm_from_con()`. They are conscious of this decision and are effectively setting their "search path" for tables. The default naming pattern is guaranteed to give unique two-part names like `schema.table`, but the user can, against best practice, choose a different pattern, e.g. `{.table}`, to generate one-part names. Only the first table on the search path with each name will be kept, and a warning is raised about the tables not kept, thanks to #1789 ("feat: `dm_from_con()` can retrieve multiple schemas, pass a character vector to the `schema` argument").
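The search-path behaviour in the last category can be sketched as follows. It assumes the tables arrive as parallel vectors ordered by the user's `schema` argument; `keep_first_per_name()` is a hypothetical helper and its warning text is illustrative, not dm's actual message. The sketch also shows why the two-part default cannot clash: (schema, table) pairs are unique, so joining them with `.` stays unique as long as neither part itself contains a `.`.

```r
# Hypothetical sketch, not dm's implementation.
# `schema`, `table` and `nm` are parallel vectors in search-path order;
# `nm` holds the names generated from the `.names` pattern.
keep_first_per_name <- function(schema, table, nm) {
  keep <- !duplicated(nm)  # TRUE for the first occurrence of each name
  if (any(!keep)) {
    warning(
      "Tables not kept because of duplicated names: ",
      paste0(schema[!keep], ".", table[!keep], collapse = ", "),
      call. = FALSE
    )
  }
  data.frame(
    schema = schema[keep], table = table[keep], name = nm[keep],
    stringsAsFactors = FALSE
  )
}

schemas <- c("public", "public", "sales")
tables <- c("orders", "items", "orders")

# Default two-part names: unique because (schema, table) pairs are unique.
two_part <- paste0(schemas, ".", tables)
stopifnot(anyDuplicated(two_part) == 0)

# One-part names via `{.table}`: sales.orders is dropped with a warning.
kept <- suppressWarnings(keep_first_per_name(schemas, tables, tables))
kept$name
```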
With some broader tidyverse experience:
There are already well-established examples of using glue patterns and "special variables" (for us, `.schema` and `.table`) to construct object names elsewhere in the tidyverse, for example:
- `.names` in `dplyr::across()`, which has a near-identical "smart default" to ours here, i.e. it switches from one-part to two-part names to avoid ambiguity when multiple "sources" are involved (schemas in our case, functions in `across()`'s case)
- `names_glue` in `tidyr::pivot_wider()`
The first attempt to implement this feature did actually use …

Re warning/error for name clashes: we could upgrade to an error, but I think this comment from #1789 still applies:

> Perhaps there is one additional piece of advice we could provide now: if there is a name clash, it MUST be because the user has overridden the "smart default" for …
Thanks! I'm not in love with the automagic, but your arguments all make sense. Now there's a message when the slightly surprising option is chosen, and the argument and its behavior are marked experimental.
N.B. This builds on #1789 and should probably be merged after that PR.
We could enable name repair for a `dm` object, in a similar way to how column names can be repaired within `tibble::tibble()`.

With SQL, we should be able to assume that:

So using a default repair strategy of `"check_unique"` shouldn't ever actually result in failure. In more drastic cases, e.g. if the user passes a custom function to `.name_repair`, we might end up with name clashes, in which case we fail gracefully with an informative error message.

See #1534 for further discussion. If we assume schema/table names do not contain `.`, then the "snakecase" approach for `tidy_names` as described there could now be achieved with: