Add rows_upsert.tbl_dbi() #616

Merged
merged 13 commits into from
Sep 26, 2021

mgirlich
Contributor

Together with rows_patch() this would complete the support for the rows_*() functions 😄

  • I only added SQLite, MariaDB and Postgres for now.
  • I used the ON CONFLICT clause. This only works when there is also a unique constraint. At least for Postgres, it has some advantages, but also the disadvantage that the primary key sequence is incremented.
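A minimal sketch of the ON CONFLICT approach in SQLite via DBI/RSQLite, for illustration only. It is not the dm implementation, and the tables target and incoming are hypothetical. The unique index on id is what ON CONFLICT (id) matches against. If the index is dropped, SQLite rejects the statement because the conflict target no longer matches any PRIMARY KEY or UNIQUE constraint.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical target table; the UNIQUE index is required by ON CONFLICT (id)
dbExecute(con, "CREATE TABLE target (id INTEGER, value TEXT)")
dbExecute(con, "CREATE UNIQUE INDEX target_id ON target (id)")
dbExecute(con, "INSERT INTO target VALUES (1, 'a'), (2, 'b')")

# Hypothetical incoming rows: id 2 already exists (update), id 3 is new (insert)
dbExecute(con, "CREATE TABLE incoming (id INTEGER, value TEXT)")
dbExecute(con, "INSERT INTO incoming VALUES (2, 'x'), (3, 'y')")

# WHERE true avoids SQLite's parsing ambiguity between a join's ON
# and the upsert's ON CONFLICT when the source is a SELECT
dbExecute(con, "
  INSERT INTO target (id, value)
  SELECT id, value FROM incoming WHERE true
  ON CONFLICT (id) DO UPDATE SET value = excluded.value
")

upserted <- dbGetQuery(con, "SELECT * FROM target ORDER BY id")
dbDisconnect(con)
print(upserted)
```

Row 1 is untouched, row 2 is updated in place, and row 3 is inserted, all in a single statement.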

Collaborator

@krlmlr krlmlr left a comment


Thanks, looks good.

The test failure looks weird, does it succeed locally?

What do we need unique indices for? Can you summarize the advantages of ON CONFLICT -- or why we shouldn't use MERGE here?

R/rows-db.R (resolved)
tests/testthat/test-rows-db.R (outdated, resolved)
@mgirlich
Contributor Author

mgirlich commented Sep 16, 2021

> What do we need unique indices for? Can you summarize the advantages of ON CONFLICT -- or why we shouldn't use MERGE here?

Upsert Support

According to Wikipedia, the following databases support MERGE:

  • Oracle Database
  • DB2
  • Teradata
  • EXASOL
  • Firebird (with some restrictions according to Wikipedia?)
  • CUBRID
  • H2
  • HSQLDB
  • MS SQL
  • Vectorwise
  • Apache Derby

Alternatives in MySQL, SQLite, and PostgreSQL

  • MySQL
    • INSERT ... ON DUPLICATE KEY UPDATE -> join on PRIMARY KEY or UNIQUE constraints
    • REPLACE INTO -> attempt the insert; if it fails, delete the existing row and insert the new one
    • INSERT IGNORE -> ignore duplicate keys (insert only new rows)
  • SQLite
    • INSERT OR REPLACE INTO -> similar to MySQL's REPLACE INTO
  • PostgreSQL
    • INSERT INTO ... ON CONFLICT [ conflict_target ] conflict_action -> similar to MySQL's ON DUPLICATE KEY UPDATE
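The "replace" and "update" variants are not interchangeable. A small SQLite sketch makes this concrete, using a hypothetical table t. INSERT OR REPLACE deletes the conflicting row and inserts a fresh one, so any column not listed in the statement falls back to its default. ON CONFLICT ... DO UPDATE modifies the existing row in place. (SQLite accepts the ON CONFLICT syntax as well as INSERT OR REPLACE.)

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b REAL)")
dbExecute(con, "INSERT INTO t VALUES (1, 'old', 0.5)")

# REPLACE: delete + insert, so column b (not listed) becomes NULL
dbExecute(con, "INSERT OR REPLACE INTO t (id, a) VALUES (1, 'new')")
replaced <- dbGetQuery(con, "SELECT * FROM t")

# Reset, then upsert with ON CONFLICT ... DO UPDATE: only a changes, b is kept
dbExecute(con, "DELETE FROM t")
dbExecute(con, "INSERT INTO t VALUES (1, 'old', 0.5)")
dbExecute(con, "INSERT INTO t (id, a) VALUES (1, 'new')
                ON CONFLICT (id) DO UPDATE SET a = excluded.a")
updated <- dbGetQuery(con, "SELECT * FROM t")

dbDisconnect(con)
print(replaced)
print(updated)
```

For rows_upsert(), which should only touch the columns present in y, the update semantics are the relevant ones.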

Other Databases

  • Apache Phoenix supports UPSERT VALUES[12] and UPSERT SELECT[13] syntax.
  • Spark SQL supports UPDATE SET * and INSERT * clauses in actions.[14]
  • Apache Impala supports UPSERT INTO ... SELECT.[15]

Merge

There are a couple of reasons why Postgres doesn't support MERGE. One of them is concurrency:

> the SQL standard does not require that MERGE be atomic in the sense of atomically providing either an INSERT or UPDATE, and other products do not provide any such guarantees with their MERGE statement.

The biggest issue with INSERT INTO ... ON CONFLICT is that the typical ID field (i.e. an auto-incrementing sequence) is incremented for every row one tries to insert, including existing rows that end up being updated. If you run frequent large upserts where most rows already exist, this can inflate your IDs quite a bit, and suddenly the column has to be a bigint. Then the fun starts, thanks to the terrible support for big integers in R...

Alternative for MySQL, SQLite, and PostgreSQL

If there is no unique constraint, one can build an upsert query as follows:

WITH updated AS
( 
    UPDATE table
        SET ...
    FROM new_values
    WHERE ...
    RETURNING ...
)
INSERT INTO table
SELECT ...
FROM new_values
WHERE NOT EXISTS (
    SELECT 1 
    FROM updated
    WHERE ... -- if there is a primary key one can use it. Otherwise basically the same condition as in the CTE
)

Apart from the longer syntax, this approach has concurrency issues (there are many Stack Overflow threads on this).
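The writable CTE above (an UPDATE ... RETURNING inside WITH) is PostgreSQL syntax; SQLite does not allow an UPDATE inside a WITH clause. A rough SQLite equivalent splits the same idea into two statements inside one transaction. This is a sketch under stated assumptions: the tables target and new_values and the key column id are hypothetical, new_values is assumed to have no duplicate keys, and it does nothing about the concurrency concerns raised above for other databases.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# No PRIMARY KEY or UNIQUE constraint, so ON CONFLICT is not available
dbExecute(con, "CREATE TABLE target (id INTEGER, value TEXT)")
dbExecute(con, "INSERT INTO target VALUES (1, 'a'), (2, 'b')")
dbExecute(con, "CREATE TABLE new_values (id INTEGER, value TEXT)")
dbExecute(con, "INSERT INTO new_values VALUES (2, 'x'), (3, 'y')")

dbWithTransaction(con, {
  # Step 1: update rows whose key already exists in the target
  dbExecute(con, "
    UPDATE target
    SET value = (SELECT n.value FROM new_values AS n WHERE n.id = target.id)
    WHERE id IN (SELECT id FROM new_values)
  ")
  # Step 2: insert rows whose key is not yet in the target
  dbExecute(con, "
    INSERT INTO target (id, value)
    SELECT n.id, n.value FROM new_values AS n
    WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = n.id)
  ")
})

result <- dbGetQuery(con, "SELECT * FROM target ORDER BY id")
dbDisconnect(con)
print(result)
```

Because step 2 runs after step 1, the NOT EXISTS check simply skips every key that step 1 already handled.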

--> What should be supported for these three databases? The official solution with ON CONFLICT? The old solution? Both?

@mgirlich
Contributor Author

mgirlich commented Sep 16, 2021

> The test failure looks weird, does it succeed locally?

It works when I add Sys.setenv(DM_TEST_SRC = "postgres") to helper-src.R but produces the same error without it. I'll have a look.

EDIT: There is an issue in waldo 0.3.0 that is already fixed in 0.3.1 (see NEWS). The test still fails, but now with more useful information.

@mgirlich
Contributor Author

There seems to be another issue in SQLite with temporary tables and RETURNING:

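# Runs the same ON CONFLICT ... RETURNING upsert against a regular table
# (temporary = FALSE) or a TEMPORARY table (temporary = TRUE) and prints the returned rows
test_sqlite <- function(temporary) {
  con <- RSQLite::dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(DBI::dbDisconnect(con), add = TRUE)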
  
  if (temporary) {
    DBI::dbExecute(con, 'CREATE TEMPORARY TABLE test_frame_2021_09_14_11_22_08_18668_10 ("select" integer, "where" varchar(20), "exists" REAL);')
  } else {
    DBI::dbExecute(con, 'CREATE TABLE test_frame_2021_09_14_11_22_08_18668_10 ("select" integer, "where" varchar(20), "exists" REAL);')
  }
  DBI::dbExecute(con, "INSERT INTO test_frame_2021_09_14_11_22_08_18668_10 VALUES (1, 'a', 0.5);")
  DBI::dbExecute(con, "INSERT INTO test_frame_2021_09_14_11_22_08_18668_10 VALUES (2, 'b', 1.5);")
  DBI::dbExecute(con, "INSERT INTO test_frame_2021_09_14_11_22_08_18668_10 VALUES (3, NULL, 2.5);")
  DBI::dbExecute(con, 'CREATE UNIQUE INDEX unique_select_id ON test_frame_2021_09_14_11_22_08_18668_10 ("select");')
  
  DBI::dbExecute(con, 'CREATE TEMPORARY TABLE dbplyr_031 ("select" integer, "where" varchar(20));')
  DBI::dbExecute(con, "INSERT INTO dbplyr_031 VALUES (2, 'x');")
  DBI::dbExecute(con, "INSERT INTO dbplyr_031 VALUES (3, 'y');")
  DBI::dbExecute(con, "INSERT INTO dbplyr_031 VALUES (4, 'z');")
  
  result <- DBI::dbGetQuery(
    con,
    'WITH "...y"("select", "where") AS (
    SELECT *
    FROM "dbplyr_031"
    )
    INSERT INTO test_frame_2021_09_14_11_22_08_18668_10 ("select", "where")
    SELECT * FROM "...y"
    WHERE true -- a WHERE clause avoids SQLite's parsing ambiguity with ON CONFLICT
    ON CONFLICT ("select")
    DO UPDATE
    SET "where" = excluded."where"
    RETURNING *;'
  )
  
  print(result)
  
  DBI::dbRemoveTable(con, "test_frame_2021_09_14_11_22_08_18668_10")
}

test_sqlite(FALSE)
#>   select where exists
#> 1      2     x    1.5
#> 2      3     y    2.5
#> 3      4     z     NA
test_sqlite(TRUE)
#>   select where exists
#> 1      4     z     NA

Created on 2021-09-16 by the reprex package (v2.0.1)

@mgirlich
Contributor Author

I don't really know why it doesn't work for MS SQL. Could someone please get rid of MS SQL?...

It works fine on sqliteonline

Microsoft SQL Server 2019 (RTM-CU9) (KB5000642) - 15.0.4102.2 (X64)
Jan 25 2021 20:16:12
Copyright (C) 2019 Microsoft Corporation
Express Edition (64-bit) on Linux (CentOS Linux 8)

but not on sqlfiddle

Microsoft SQL Server 2017 (RTM-CU2) (KB4052574) - 14.0.3008.27 (X64)
Nov 16 2017 10:00:49
Copyright (C) 2017 Microsoft Corporation
Express Edition (64-bit) on Linux (Ubuntu 16.04.3 LTS)

So probably it only works with MS SQL 2019.

@krlmlr
Collaborator

krlmlr commented Sep 16, 2021

I feel your pain with MSSQL. We keep it as a special case for entertainment.

The tests complain about a semicolon, can we pass it via DBI?

Is there an upstream bug report for SQLite, is it perhaps fixed already?

@mgirlich
Contributor Author

To me the syntax looks the same for 2017 and 2019.

@mgirlich
Contributor Author

> Is there an upstream bug report for SQLite, is it perhaps fixed already?

I haven't created one, and a quick search didn't turn one up.

@krlmlr
Collaborator

krlmlr commented Sep 16, 2021

Can we repro using sqlite3?

I don't mind installing MSSQL 2019 on GitHub Actions, perhaps in parallel to MSSQL 2017. There is https://github.com/ankane/setup-sqlserver that can help with that, or perhaps install manually.

@mgirlich
Contributor Author

> Can we repro using sqlite3?

See my comment above.

> I don't mind installing MSSQL 2019 on GitHub Actions, perhaps in parallel to MSSQL 2017. There is https://github.com/ankane/setup-sqlserver that can help with that, or perhaps install manually.

You can if you want to. I don't care so much about MS SQL itself. I mainly implemented it because it was one of the databases with support for MERGE.

@krlmlr
Collaborator

krlmlr commented Sep 16, 2021

Yeah, let's skip the test for now with a comment.

@krlmlr krlmlr enabled auto-merge September 16, 2021 15:30
auto-merge was automatically disabled September 17, 2021 06:52

Head branch was pushed to by a user without write access

@mgirlich
Contributor Author

@krlmlr Do you know why it fails for MS SQL?

Error (test-dplyr-src.R:34:3): 'copy_to.dm()' works
Error: Error: promise already under evaluation: recursive default argument reference or earlier problems?
Backtrace:

  1. dm:::test_src_frame(!!!mtcars)
  2. dbplyr:::copy_to.src_sql(...)
  3. dbplyr:::db_copy_to.DBIConnection(...)
  4. dbplyr:::create_indexes(con, table, unique_indexes, unique = TRUE)

@mgirlich
Contributor Author

The code coverage check fails because:

  • currently no tests are run for MariaDB. Do you want to add them to the workflow?
  • upsert tests are skipped for MS SQL -> I exclude them from coverage with #nocov
  • no database in the workflows uses sql_rows_upsert.tbl_sql() -> I exclude it from coverage with #nocov
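Assuming the coverage check uses covr, a single line can be excluded with a # nocov comment, and a whole block with a # nocov start / # nocov end pair. A minimal sketch of the block form follows. The function name is hypothetical and does not correspond to dm's actual sql_rows_upsert() methods.

```r
# Hypothetical backend method that no CI database exercises;
# covr skips everything between the two markers when computing coverage.
# nocov start
sql_rows_upsert_example <- function(x, y, by) {
  stop("Upsert is not tested for this backend.")
}
# nocov end
```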

mgirlich and others added 3 commits September 20, 2021 07:10
* test error with DuckDB instead of skipping
* remove `sql_rows_upsert()` for MS SQL from coverage
* remove `sql_rows_upsert()` for `tbl_sql` from coverage
@krlmlr
Collaborator

krlmlr commented Sep 25, 2021

I resolved the conflict by keeping both changes. At some point the new _prep() function should be integrated with sql_rows_prep() from #624?

Collaborator

@krlmlr krlmlr left a comment


Thanks! Looks like we no longer need sql_rows_update_prep() -- I'll clean up later.

@krlmlr krlmlr merged commit 6fd3206 into cynkra:main Sep 26, 2021
@mgirlich mgirlich deleted the rows_upsert branch September 28, 2021 08:00
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 29, 2022
2 participants