
R: write_adbc() does not seem to work when writing to a Snowflake table #2366

Open
liamnz opened this issue Dec 13, 2024 · 15 comments
Labels
Type: bug Something isn't working

Comments

@liamnz

liamnz commented Dec 13, 2024

What happened?

I'm using the adbcdrivermanager and adbcsnowflake packages in R and testing out writing a data-frame to a Snowflake table with write_adbc().

write_adbc() returns a NOT_FOUND error from Snowflake.

It looks like the table gets created, but the actual data is not written to the table.

Stack Trace

No response

How can we reproduce the bug?

install.packages('adbcdrivermanager')
#> package 'adbcdrivermanager' successfully unpacked and MD5 sums checked

install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")
#> package 'adbcsnowflake' successfully unpacked and MD5 sums checked

library(adbcdrivermanager)
library(adbcsnowflake)

db <- adbc_database_init(
  adbcsnowflake(),
  username = Sys.getenv('SNOWFLAKE_USER'),
  adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
  adbc.snowflake.sql.uri.protocol = 'https',
  adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
  adbc.snowflake.sql.uri.port = '443',
  adbc.snowflake.sql.auth_type = 'auth_ext_browser',
  adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
  adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
  adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE")
)

con <- adbc_connection_init(db)

execute_adbc(con, 'create or replace schema reprex')
execute_adbc(con, 'use schema reprex')

write_adbc(mtcars, con, 'MTCARS')
#> Error in adbc_statement_execute_query(stmt): NOT_FOUND: 002003 (42S02): SQL compilation error:
#> Object 'MTCARS' does not exist or not authorized.

schema_tables <- as.data.frame(read_adbc(con, 'show tables in schema reprex'))
#> Warning in convert_array_stream(x, to): 1 value(s) may have incurred loss of
#> precision in conversion to double()
schema_tables[, c('name', 'rows')]
#>     name rows
#> 1 MTCARS    0
as.data.frame(read_adbc(con, 'select * from mtcars'))
#>  [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
#> <0 rows> (or 0-length row.names)

Created on 2024-12-13 with reprex v2.1.1

Environment/Setup

Windows 11
R 4.4.2
adbcdrivermanager 0.15.0
adbcsnowflake 0.15.0

@liamnz liamnz added the Type: bug Something isn't working label Dec 13, 2024
@paleolimbot
Member

Thank you for reporting!

Just curious...does a normal "insert" query work? I know very little about Snowflake, but I'm wondering if there just aren't create table/write privileges, or whether the mechanism we use to do the inserting (which I think is a COPY INTO with Parquet) doesn't pass on everything it needs to.

@liamnz
Author

liamnz commented Dec 14, 2024

Ah, it's interesting that write_adbc() writes Parquet files; I wondered if it might be sending Arrow record batches instead.

Inserting records and manually writing Parquet files both seem to work fine :)

# install.packages('adbcdrivermanager')
# install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")
# install.packages('arrow')
# install.packages('fs')

library(adbcdrivermanager)
library(adbcsnowflake)

db <- adbc_database_init(
  adbcsnowflake(),
  username = Sys.getenv('SNOWFLAKE_USER'),
  adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
  adbc.snowflake.sql.uri.protocol = 'https',
  adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
  adbc.snowflake.sql.uri.port = '443',
  adbc.snowflake.sql.auth_type = 'auth_ext_browser',
  adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
  adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
  adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE")
)

con <- adbc_connection_init(db)

execute_adbc(con, 'create or replace schema reprex')
execute_adbc(con, 'use schema reprex')

# Inserting works
execute_adbc(con, 'create or replace table x (a int, b int, c int)')
execute_adbc(con, 'insert into x values (1, 2, 3)')
read_adbc(con, 'select * from x') |> as.data.frame()
#>   A B C
#> 1 1 2 3


# COPY INTO with Parquet works
x <- data.frame(A = 1, B = 2, C = 3)
execute_adbc(con, 'create or replace table x (a int, b int, c int)')
parquet_file_path <- fs::file_temp(ext = '.parquet')
parquet_file <- fs::path_file(parquet_file_path)
arrow::write_parquet(x, parquet_file_path)
put_command <- paste0("put 'file://", parquet_file_path,  "' @~")
execute_adbc(con, put_command)
execute_adbc(con, paste0("
      copy into x
      from @~/", parquet_file, "
      file_format = (type = parquet)
      match_by_column_name = case_sensitive
"))
read_adbc(con, 'select * from x') |> as.data.frame()
#>   A B C
#> 1 1 2 3


# write_adbc() doesn't work
x <- data.frame(A = 1, B = 2, C = 3)
execute_adbc(con, 'drop table if exists x')
write_adbc(x, con, 'X')
#> Error in adbc_statement_execute_query(stmt): NOT_FOUND: 002003 (42S02): SQL compilation error:
#> Object 'X' does not exist or not authorized.


schema_tables <- read_adbc(con, ' show tables in schema reprex') |> as.data.frame()
#> Warning in convert_array_stream(x, to): 1 value(s) may have incurred loss of
#> precision in conversion to double()
schema_tables[, c('name', 'rows')]
#>   name rows
#> 1    X    0
read_adbc(con, 'select * from x') |> as.data.frame()
#> [1] A B C
#> <0 rows> (or 0-length row.names)

Created on 2024-12-15 with reprex v2.1.1

@paleolimbot
Member

Thank you for the investigation! I don't know exactly what queries are being issued here (it may help to have the Go driver print the query alongside the error, or include it as an error detail if it doesn't already), so I'm not sure exactly which one is failing.

@zeroshade Any ideas here?

@zeroshade
Member

@paleolimbot what mode does the R write_adbc use by default? append? create?

In addition, @liamnz can you bear with me and try using a lowercase 'x' in the write_adbc call?

@paleolimbot
Member

what mode does the R write_adbc use by default? append? create?

By default it doesn't set adbc.ingest.mode (should it?):

adbc.ingest.mode = if (!identical(mode, "default")) paste0("adbc.ingest.mode.", mode),
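
For reference, a minimal sketch of forcing a specific ingest mode by setting the ADBC statement options directly (the adbc.ingest.* option names come from the ADBC spec; the statement-level calls below are illustrative rather than the documented write_adbc() path):

# Sketch only: bulk-ingest a data frame with an explicit ingest mode.
stmt <- adbc_statement_init(
  con,
  adbc.ingest.target_table = "X",
  adbc.ingest.mode = "adbc.ingest.mode.append"  # or ...mode.create / ...mode.replace
)
adbc_statement_bind(stmt, x)        # bind the data frame to be ingested
adbc_statement_execute_query(stmt)  # runs the bulk ingest
adbc_statement_release(stmt)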

@zeroshade
Member

It doesn't have to; the ADBC docs say the default ingestion mode is Create, which is what the Snowflake driver does. So given that the select * from x works afterwards, it looks like the create query is running, but something is going wrong with the actual insertions.

@liamnz Could you check the monitoring on your Snowflake dashboard, look at the actual COPY INTO queries being run, and see if anything stands out as problematic there?

@liamnz
Author

liamnz commented Dec 16, 2024

@zeroshade A lowercase table name yields the same error. Here are the commands that are issued when the write is attempted:

[screenshot: Snowflake query history showing the commands issued during the write attempt]

It looks like it attempts a count on the newly created table, but perhaps there is too much quoting going on in the IDENTIFIER() function?

https://docs.snowflake.com/en/sql-reference/identifier-literal
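
For reference, my reading of the linked docs (illustrative queries only, not necessarily what the driver issues):

# Unquoted name inside IDENTIFIER(): resolved case-insensitively, so this matches
# a table created as: create table x ... (stored in upper case as X).
read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('x')") |> as.data.frame()

# Double-quoted name inside IDENTIFIER(): resolved case-sensitively, so this only
# matches an object whose stored name is exactly lower-case x, e.g. one created
# as: create table "x" ...
read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('\"x\"')") |> as.data.frame()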

@liamnz
Author

liamnz commented Dec 16, 2024

The count works outside of write_adbc() though 🫤

# install.packages('adbcdrivermanager')
# install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")

library(adbcdrivermanager)
library(adbcsnowflake)

db <- adbc_database_init(
  adbcsnowflake(),
  username = Sys.getenv('SNOWFLAKE_USER'),
  adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
  adbc.snowflake.sql.uri.protocol = 'https',
  adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
  adbc.snowflake.sql.uri.port = '443',
  adbc.snowflake.sql.auth_type = 'auth_ext_browser',
  adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
  adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
  adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE")
)

con <- adbc_connection_init(db)

execute_adbc(con, 'create or replace schema reprex')
execute_adbc(con, 'use schema reprex')

x <- data.frame(A = 1, B = 2, C = 3)
write_adbc(x, con, 'X')
#> Error in adbc_statement_execute_query(stmt): NOT_FOUND: 002003 (42S02): SQL compilation error:
#> Object 'X' does not exist or not authorized.

read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('\"X\"')") |> as.data.frame()
#>   COUNT(*)
#> 1        0

Created on 2024-12-17 with reprex v2.1.1

@CurtHagenlocher
Contributor

Yeah, I think it should be either SELECT COUNT(*) FROM "X" or SELECT COUNT(*) FROM IDENTIFIER('X') and not this mixture of both.

@CurtHagenlocher
Contributor

The count works outside of write_adbc() though 🫤

Ah, crossed replies -- and, hmm. Are both queries happening on the same connection and/or with the same context (i.e. current database, current schema)?

@liamnz
Author

liamnz commented Dec 16, 2024

It looks like the count inside write_adbc() occurs in a different session:

[screenshot: Snowflake query history showing the COUNT query running in a separate session]

@liamnz
Author

liamnz commented Dec 16, 2024

Ah, I think I see what the problem is now. In my original code, no schema is assigned when the connection is first established; the schema is created later with a CREATE SCHEMA command. Because the ADBC client issues commands over multiple sessions, the commands it runs under the connection's default Snowflake session context fail because no schema is set there.

When I change the connection so that a schema is assigned at connection time, everything works fine.

# install.packages('adbcdrivermanager')
# install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")

library(adbcdrivermanager)
library(adbcsnowflake)

db <- adbc_database_init(
  adbcsnowflake(),
  username = Sys.getenv('SNOWFLAKE_USER'),
  adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
  adbc.snowflake.sql.uri.protocol = 'https',
  adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
  adbc.snowflake.sql.uri.port = '443',
  adbc.snowflake.sql.auth_type = 'auth_ext_browser',
  adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
  adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
  adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE"),
  adbc.snowflake.sql.schema = 'REPREX'
)

con <- adbc_connection_init(db)

x <- data.frame(A = 1, B = 2, C = 3)
execute_adbc(con, 'drop table if exists x')
write_adbc(x, con, 'X')
read_adbc(con, 'select * from x') |> as.data.frame()
#>   A B C
#> 1 1 2 3

Created on 2024-12-17 with reprex v2.1.1

[screenshot: Snowflake query history for the successful run]

@liamnz
Author

liamnz commented Dec 16, 2024

Do you folks know if this multi-session behaviour is intended for the ADBC client? It looks like it makes a new session just for those count queries; everything else occurs in the original session.

@zeroshade
Member

@liamnz At the moment that behavior is intentional, though it's supposed to transfer the context of the current schema/db etc., so I guess we must have missed something there. I'll look into whether we can just use the same session instead of a different one.

@liamnz
Author

liamnz commented Dec 18, 2024

Okay that's good to know, thanks for your help 😊
