Multi-statement support #402

Open
mmuurr opened this issue Nov 24, 2022 · 3 comments

mmuurr commented Nov 24, 2022

I created an issue (quite a while back!) specifically asking about this with RPostgres, where libpq has functionality to handle multi-statement SQL being sent to the DB server. The recommendation was to bubble this up to DBI, but I found a duplicate DBI issue with essentially the same use case, too: SQL scripts holding multiple statements that we'd like to relay (perhaps after string interpolation) to the DB via R/DBI. The recommendation there was to bubble this up to projects like squr or dbr, but neither of those projects seems to be active (the last commits were 3 and 2 years ago, respectively, and neither is available on CRAN).

The general pattern of:

readr::read_file("some-sql-script-template.sql") %>%
  glue::glue_sql(.con = my_con) %>%  ## maybe we do some interpolation here
  DBI::dbMultiExecute(my_con, .)  ## imaginary DBI method (connection-first, like dbExecute())

... still seems very appealing to me. I've written wrappers around this that try to robustly split the SQL script into separate statements and loop over them, but I can't help wondering whether this should be available within DBI itself, especially for the DB back-ends that support multi-statement SQL.
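
For concreteness, the splitting half of those wrappers looks roughly like this (the helper name and the bare-semicolon heuristic are just illustrative, nothing from DBI):

## Illustrative only: naive splitting on ";" breaks on semicolons inside
## string literals, comments, dollar-quoted function bodies, etc. --
## which is exactly why I'd rather hand the whole script to the driver/server.
split_sql_script <- function(sql_script) {
  statements <- strsplit(sql_script, ";", fixed = TRUE)[[1]]
  statements <- trimws(statements)
  statements[nzchar(statements)]
}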

DB back-ends differ on what multi-statement execution returns to the client and how it should be handled (PostgreSQL, for example, simply sends back the standard return of the final statement; MySQL, on the other hand, wants to send back multiple result sets). For this reason, I think a new optionally-supported execution method, like DBI::dbMultiExecute(), could be used, with return-value semantics that align with the traditional DBI::dbExecute() method: simply the last status value sent back by the DB. The risk seems to be poor interpretation of that status, e.g., if there's a failure midway, on which statement did the full execution fail? But this also seems (to me) to be a standard 'here-be-dragons' situation in programming, and anyone using multi-statement SQL with these databases outside of R is already accepting this risk. (Experienced folks should probably be wrapping those scripts in transactions anyhow, one way or another.)

This might wind up being a "won't support" feature, but I keep finding myself wanting to adopt the example pattern shown above, and I imagine there are plenty of other folks out there with SQL scripts who want to do the same without having to tackle some amount of script parsing on their own, with or without an existing R package to do so. (FWIW, I've been using {reticulate} with the sqlparse Python package to help with this, but it seems like a heavyweight solution to a problem that shouldn't exist -- at least for me, since I use PostgreSQL/libpq most of the time. :-)

Curious about your thoughts, but regardless of them, thanks much for the continued DBI work (it's highly-valued)!

krlmlr commented Nov 25, 2022

Thanks. I'm torn on this one. I see two components:

  • splitting a text file/character string into separate SQL statements (expressions?)
  • executing multiple SQL statements/expressions

The former seems hard and somewhat dependent on the SQL dialect; the latter is easy to roll manually with a simple loop.
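
For the second component, something along these lines would already work today (just a sketch; `con` and `statements` are assumed to be an open DBIConnection and an already-split character vector, and the transaction wrapper is optional):

## Sketch: execute already-split statements one by one; the transaction means
## a mid-script failure rolls everything back.
DBI::dbWithTransaction(con, {
  for (stmt in statements) {
    DBI::dbExecute(con, stmt)
  }
})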

If we want to solve the first component in DBI or in some backends, we can also help with the second. For the other way around, I'm not convinced.

AFAICT, libpq requires multiple queries to be separated by semicolons: https://www.postgresql.org/docs/current/libpq-exec.html. (And we'd need to use PQexec(), but splitting by semicolon is fairly easy with what DBI offers today, assuming a semicolon can't be part of a valid single SQL query.) What's the situation with other drivers?

mmuurr commented Dec 9, 2022

I also think splitting a text stream by some delimiter is quite difficult, as it requires re-implementing what would already be done by some of the native SQL servers/drivers. For example, with libpq, my instinct is to let PQexec() handle the multi-statement argument and not have DBI pre-process it in any way.

For SQL systems that don't have native multi-statement support, I think DBI should likewise not permit it. This is the main reason I'd propose not changing the semantics (or any code) associated with dbSendStatement() or dbExecute(), but instead introducing new multi-statement versions (e.g. dbMultiSendStatement()) and letting the individual drivers (a rough sketch follows the list below):

  • throw errors when not supported,
  • return a result that conforms to some close-to-agreed-upon semantics when supported.
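
Roughly the shape I have in mind (names and signatures here are purely illustrative, not a worked-out API proposal):

library(methods)
library(DBI)

## Illustrative sketch only: a new generic that backends opt into; the default
## method errors, so drivers without native multi-statement support reject it.
setGeneric("dbMultiSendStatement", function(conn, statement, ...) {
  standardGeneric("dbMultiSendStatement")
})

setMethod("dbMultiSendStatement", "DBIConnection", function(conn, statement, ...) {
  stop("This backend does not support multi-statement execution.", call. = FALSE)
})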

The main difference I see between PostgreSQL's and MySQL's multi-statement support is what comes back to the client: PostgreSQL returns only the final statement's result, while MySQL returns a result set per statement. These can then consistently be represented as a single list (of length one in PostgreSQL's case) of DBIResult objects. This might make working with methods like dbHasCompleted(), dbGetRowsAffected(), etc. harder, though, so perhaps one needs to make *Multi* versions of nearly all objects and methods?
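
To illustrate the list representation (hypothetical, of course): if such a call returned a plain list of DBIResult objects, the existing per-result methods would apply element-wise:

## Hypothetical: `results` is a list of DBIResult objects from one multi-statement call.
rows_affected <- vapply(results, DBI::dbGetRowsAffected, numeric(1))
all_finished  <- all(vapply(results, DBI::dbHasCompleted, logical(1)))
lapply(results, DBI::dbClearResult)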

I see the complexity of integrating this with the existing DBI API, but at the same time I see how easily this could be done with PostgreSQL in particular, plus a very compelling use case of templated SQL scripts (with either glue::glue_sql() or native value interpolation), so I'm also torn. That use case is compelling enough that I'm already doing some manual splitting of my scripts, and it feels fragile and unnecessary, since libpq can do this better on its own. (FWIW, splitting those scripts into separate files doesn't really make sense either, as some complex SQL scripts can contain hundreds of individual statements, often building temporary tables, creating-then-later-dropping functions, etc.)

mlin commented May 13, 2023

Adding a +1 for this. SQLite supports semicolon-separated statements through sqlite3_exec(). I agree it should be supported only if the underlying DB has it natively like that; I wouldn't want to see R code trying to get the SQL splitting right on its own!
