-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rows_upsert.tbl_dbi()
#616
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good.
The test failure looks weird, does it succeed locally?
What do we need unique indices for? Can you summarize the advantages of ON CONFLICT
-- or why we shouldn't use MERGE
here?
Upsert SupportAccording to Wikipedia the following Databases support MERGE:
Alternative in MySQL, SQLite, PostgreSQL
Other Databases
MergeThere are a couple of reason why Postgres doesn't support MERGE. One of them is issues with concurrency:
The biggest issue with INSERT INTO ... ON CONFLICT is that the typical ID field (i.e. an auto incrementing sequence) is increased by one for every row that one tries to insert, i.e. in particular also for existing rows that end up being updated. If you have frequent big upsert where most rows already exists this can blow up your ID quite a big and suddenly it has to be a bigint. Then the fun starts thanks to the terrible support of big integers in R... Alternative for MySQL, SQLite, PostgreSQLIf there is no unique constraint one can build an upsert query like follows WITH updated AS
(
UPDATE table
SET ...
FROM new_values
WHERE ...
RETURNING ...
)
INSERT INTO table
SELECT ...
FROM new_values
WHERE NOT EXISTS (
SELECT 1
FROM updated
WHERE ... -- if there is a primary key one can use it. Otherwise basically the same condition as in the CTE
) Apart from a longer syntax this has concurrency issues (there are many SO threads on this). --> What should be supported for these three databases? The official solution with ON CONFLICT? The old solution? Both? |
It works when I add EDIT: There is an issue in waldo 0.3.0 which is already fixed in 0.3.1. See news. Though the test still fails but this time with more useful information. |
There seems to be another issue for SQLite with temporary tables and RETURNING test_sqlite <- function(temporary) {
con <- RSQLite::dbConnect(RSQLite::SQLite(), ":memory:")
if (temporary) {
DBI::dbExecute(con, 'CREATE TEMPORARY TABLE test_frame_2021_09_14_11_22_08_18668_10 ("select" integer, "where" varchar(20), "exists" REAL);')
} else {
DBI::dbExecute(con, 'CREATE TABLE test_frame_2021_09_14_11_22_08_18668_10 ("select" integer, "where" varchar(20), "exists" REAL);')
}
DBI::dbExecute(con, "INSERT INTO test_frame_2021_09_14_11_22_08_18668_10 VALUES (1, 'a', 0.5);")
DBI::dbExecute(con, "INSERT INTO test_frame_2021_09_14_11_22_08_18668_10 VALUES (2, 'b', 1.5);")
DBI::dbExecute(con, "INSERT INTO test_frame_2021_09_14_11_22_08_18668_10 VALUES (3, NULL, 2.5);")
DBI::dbExecute(con, 'CREATE UNIQUE INDEX unique_select_id ON test_frame_2021_09_14_11_22_08_18668_10 ("select");')
DBI::dbExecute(con, 'CREATE TEMPORARY TABLE dbplyr_031 ("select" integer, "where" varchar(20));')
DBI::dbExecute(con, "INSERT INTO dbplyr_031 VALUES (2, 'x');")
DBI::dbExecute(con, "INSERT INTO dbplyr_031 VALUES (3, 'y');")
DBI::dbExecute(con, "INSERT INTO dbplyr_031 VALUES (4, 'z');")
result <- DBI::dbGetQuery(
con,
'WITH "...y"("select", "where") AS (
SELECT *
FROM "dbplyr_031"
)
INSERT INTO test_frame_2021_09_14_11_22_08_18668_10 ("select", "where")
SELECT * FROM "...y"
WHERE true
ON CONFLICT ("select")
DO UPDATE
SET "where" = excluded."where"
RETURNING *;'
)
print(result)
DBI::dbRemoveTable(con, "test_frame_2021_09_14_11_22_08_18668_10")
}
test_sqlite(FALSE)
#> select where exists
#> 1 2 x 1.5
#> 2 3 y 2.5
#> 3 4 z NA
test_sqlite(TRUE)
#> select where exists
#> 1 4 z NA Created on 2021-09-16 by the reprex package (v2.0.1) |
Conflicts: R/rows-db.R man/rows-db.Rd tests/testthat/_snaps/rows-db.md tests/testthat/test-rows-db.R
Conflicts: NAMESPACE
I don't really know why it doesn't work for MS SQL. Could someone please get rid of MS SQL?... It works fine on sqliteonline Microsoft SQL Server 2019 (RTM-CU9) (KB5000642) - 15.0.4102.2 (X64) but not on sqlfiddle Microsoft SQL Server 2017 (RTM-CU2) (KB4052574) - 14.0.3008.27 (X64) So probably it only works with MS SQL 2019. |
I feel your pain with MSSQL. We keep it as a special case for entertainment. The tests complain about a semicolon, can we pass it via DBI? Is there an upstream bug report for SQLite, is it perhaps fixed already? |
I haven't created one and in a quick search I haven't seen one. |
Can we repro using I don't mind installing MSSQL 2019 on GitHub Actions, perhaps in parallel to MSSQL 2017. There is https://github.com/ankane/setup-sqlserver that can help with that, or perhaps install manually. |
See my comment above
You can if you want to. I don't care so much about MS SQL itself. I mainly implemented it because it was one of the databases with support for MERGE. |
Yeah, let's skip the test for now with a comment. |
Head branch was pushed to by a user without write access
@krlmlr Do you know why it fails for MS SQL? Error (test-dplyr-src.R:34:3): 'copy_to.dm()' works
|
The code coverage check fails because
|
* test error with DuckDB instead of skipping * remove `sql_rows_upsert()` for MS SQL from coverage * remove `sql_rows_upsert()` for `tbl_sql` from coverage
I resolved the conflict by keeping both changes. At some point the new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks like we no longer need sql_rows_update_prep()
-- I'll clean up later.
Together with
rows_patch()
this would complete the support for therows_*()
functions 😄