
R: write_adbc() does not seem to work when writing to a Snowflake table #2366

Open
liamnz opened this issue Dec 13, 2024 · 15 comments
Labels
Type: bug Something isn't working

Comments

@liamnz

liamnz commented Dec 13, 2024

What happened?

I'm using the adbcdrivermanager and adbcsnowflake packages in R and testing out writing a data-frame to a Snowflake table with write_adbc().

write_adbc() returns a NOT_FOUND error from Snowflake.

It looks like the table gets created, but the actual data is not written to the table.

Stack Trace

No response

How can we reproduce the bug?

install.packages('adbcdrivermanager')
#> package 'adbcdrivermanager' successfully unpacked and MD5 sums checked

install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")
#> package 'adbcsnowflake' successfully unpacked and MD5 sums checked

library(adbcdrivermanager)
library(adbcsnowflake)

db <- adbc_database_init(
  adbcsnowflake(),
  username = Sys.getenv('SNOWFLAKE_USER'),
  adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
  adbc.snowflake.sql.uri.protocol = 'https',
  adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
  adbc.snowflake.sql.uri.port = '443',
  adbc.snowflake.sql.auth_type = 'auth_ext_browser',
  adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
  adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
  adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE")
)

con <- adbc_connection_init(db)

execute_adbc(con, 'create or replace schema reprex')
execute_adbc(con, 'use schema reprex')

write_adbc(mtcars, con, 'MTCARS')
#> Error in adbc_statement_execute_query(stmt): NOT_FOUND: 002003 (42S02): SQL compilation error:
#> Object 'MTCARS' does not exist or not authorized.

schema_tables <- as.data.frame(read_adbc(con, 'show tables in schema reprex'))
#> Warning in convert_array_stream(x, to): 1 value(s) may have incurred loss of
#> precision in conversion to double()
schema_tables[, c('name', 'rows')]
#>     name rows
#> 1 MTCARS    0
as.data.frame(read_adbc(con, 'select * from mtcars'))
#>  [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
#> <0 rows> (or 0-length row.names)

Created on 2024-12-13 with reprex v2.1.1

Environment/Setup

Windows 11
R 4.4.2
adbcdrivermanager 0.15.0
adbcsnowflake 0.15.0

@liamnz liamnz added the Type: bug Something isn't working label Dec 13, 2024
@paleolimbot
Member

Thank you for reporting!

Just curious...does a normal "insert" query work? I know very little about Snowflake, but I'm wondering if there just aren't create table/write privileges, or whether the mechanism we use to do the inserting (which I think is a COPY INTO with Parquet) doesn't pass on everything it needs to.

@liamnz
Author

liamnz commented Dec 14, 2024

Ah, it's interesting that write_adbc() writes Parquet files; I wondered if it might be sending Arrow record batches instead.

Inserting records and manually writing Parquet files both seem to work fine :)

# install.packages('adbcdrivermanager')
# install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")
# install.packages('arrow')
# install.packages('fs')

library(adbcdrivermanager)
library(adbcsnowflake)

db <- adbc_database_init(
  adbcsnowflake(),
  username = Sys.getenv('SNOWFLAKE_USER'),
  adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
  adbc.snowflake.sql.uri.protocol = 'https',
  adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
  adbc.snowflake.sql.uri.port = '443',
  adbc.snowflake.sql.auth_type = 'auth_ext_browser',
  adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
  adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
  adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE")
)

con <- adbc_connection_init(db)

execute_adbc(con, 'create or replace schema reprex')
execute_adbc(con, 'use schema reprex')

# Inserting works
execute_adbc(con, 'create or replace table x (a int, b int, c int)')
execute_adbc(con, 'insert into x values (1, 2, 3)')
read_adbc(con, 'select * from x') |> as.data.frame()
#>   A B C
#> 1 1 2 3


# COPY INTO with Parquet works
x <- data.frame(A = 1, B = 2, C = 3)
execute_adbc(con, 'create or replace table x (a int, b int, c int)')
parquet_file_path <- fs::file_temp(ext = '.parquet')
parquet_file <- fs::path_file(parquet_file_path)
arrow::write_parquet(x, parquet_file_path)
put_command <- paste0("put 'file://", parquet_file_path,  "' @~")
execute_adbc(con, put_command)
execute_adbc(con, paste0("
      copy into x
      from @~/", parquet_file, "
      file_format = (type = parquet)
      match_by_column_name = case_sensitive
"))
read_adbc(con, 'select * from x') |> as.data.frame()
#>   A B C
#> 1 1 2 3


# write_adbc() doesn't work
x <- data.frame(A = 1, B = 2, C = 3)
execute_adbc(con, 'drop table if exists x')
write_adbc(x, con, 'X')
#> Error in adbc_statement_execute_query(stmt): NOT_FOUND: 002003 (42S02): SQL compilation error:
#> Object 'X' does not exist or not authorized.


schema_tables <- read_adbc(con, ' show tables in schema reprex') |> as.data.frame()
#> Warning in convert_array_stream(x, to): 1 value(s) may have incurred loss of
#> precision in conversion to double()
schema_tables[, c('name', 'rows')]
#>   name rows
#> 1    X    0
read_adbc(con, 'select * from x') |> as.data.frame()
#> [1] A B C
#> <0 rows> (or 0-length row.names)

Created on 2024-12-15 with reprex v2.1.1

@paleolimbot
Member

Thank you for the investigation! I don't know exactly what queries are being issued here (it may help to have the Go driver print the query alongside the error, or include it as an error detail if it doesn't already), so I'm not sure exactly which one is failing.

@zeroshade Any ideas here?

@zeroshade
Member

@paleolimbot what mode does the R write_adbc use by default? append? create?

In addition, @liamnz can you bear with me and try using a lowercase 'x' in the write_adbc call?

@paleolimbot
Member

what mode does the R write_adbc use by default? append? create?

By default it doesn't set adbc.ingest.mode (should it?):

adbc.ingest.mode = if (!identical(mode, "default")) paste0("adbc.ingest.mode.", mode),
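
For reference, a minimal sketch of forcing a specific ingest mode by setting the ADBC statement options directly (the adbc.ingest.* option names come from the ADBC spec; the statement-level calls below are illustrative rather than the documented write_adbc() path):

# Sketch only: bulk-ingest a data frame with an explicit ingest mode.
stmt <- adbc_statement_init(
  con,
  adbc.ingest.target_table = "X",
  adbc.ingest.mode = "adbc.ingest.mode.append"  # or ...mode.create / ...mode.replace
)
adbc_statement_bind(stmt, x)        # bind the data frame to be ingested
adbc_statement_execute_query(stmt)  # runs the bulk ingest
adbc_statement_release(stmt)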

@zeroshade
Member

It doesn't have to; the ADBC docs say the default ingestion mode is Create, which is what the Snowflake driver does. So given that the select * from x works afterwards, it looks like the create query is running, but something is going wrong with the actual insertions.

@liamnz Could you check the monitoring on your Snowflake dashboard, look at the actual COPY INTO queries being run, and see if anything stands out as problematic there?

@liamnz
Author

liamnz commented Dec 16, 2024

@zeroshade A lowercase table name yields the same error. Here are the commands that are issued when the write is attempted:

[screenshot: Snowflake query history showing the commands issued during the write attempt]

It looks like it attempts a count on the newly created table, but perhaps there is too much quoting going on in the IDENTIFIER() function?

https://docs.snowflake.com/en/sql-reference/identifier-literal
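
For reference, my reading of the linked docs (illustrative queries only, not necessarily what the driver issues):

# Unquoted name inside IDENTIFIER(): resolved case-insensitively, so this matches
# a table created as: create table x ... (stored in upper case as X).
read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('x')") |> as.data.frame()

# Double-quoted name inside IDENTIFIER(): resolved case-sensitively, so this only
# matches an object whose stored name is exactly lower-case x, e.g. one created
# as: create table "x" ...
read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('\"x\"')") |> as.data.frame()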

@liamnz
Author

liamnz commented Dec 16, 2024

The count works outside of write_adbc() though 🫤

# install.packages('adbcdrivermanager')
# install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")

library(adbcdrivermanager)
library(adbcsnowflake)

db <- adbc_database_init(
  adbcsnowflake(),
  username = Sys.getenv('SNOWFLAKE_USER'),
  adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
  adbc.snowflake.sql.uri.protocol = 'https',
  adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
  adbc.snowflake.sql.uri.port = '443',
  adbc.snowflake.sql.auth_type = 'auth_ext_browser',
  adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
  adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
  adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE")
)

con <- adbc_connection_init(db)

execute_adbc(con, 'create or replace schema reprex')
execute_adbc(con, 'use schema reprex')

x <- data.frame(A = 1, B = 2, C = 3)
write_adbc(x, con, 'X')
#> Error in adbc_statement_execute_query(stmt): NOT_FOUND: 002003 (42S02): SQL compilation error:
#> Object 'X' does not exist or not authorized.

read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('\"X\"')") |> as.data.frame()
#>   COUNT(*)
#> 1        0

Created on 2024-12-17 with reprex v2.1.1

@CurtHagenlocher
Contributor

Yeah, I think it should be either SELECT COUNT(*) FROM "X" or SELECT COUNT(*) FROM IDENTIFIER('X') and not this mixture of both.

@CurtHagenlocher
Contributor

The count works outside of write_adbc() though 🫤

Ah, crossed replies -- and, hmm. Are both queries happening on the same connection and/or with the same context (i.e. current database, current schema)?

@liamnz
Author

liamnz commented Dec 16, 2024

It looks like the count inside write_adbc() occurs in a different session:

[screenshot: Snowflake query history showing the COUNT query running in a separate session]

@liamnz
Author

liamnz commented Dec 16, 2024

Ah, I think I see what the problem is now. In my original code, no schema is assigned when the connection is first established; the schema is created later with a CREATE SCHEMA command. Because the ADBC client issues commands over multiple sessions, the commands it runs under the connection's default Snowflake session context fail because no schema is set there.

When I change the connection so that a schema is assigned at connection time, everything works fine.

# install.packages('adbcdrivermanager')
# install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")

library(adbcdrivermanager)
library(adbcsnowflake)

db <- adbc_database_init(
  adbcsnowflake(),
  username = Sys.getenv('SNOWFLAKE_USER'),
  adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
  adbc.snowflake.sql.uri.protocol = 'https',
  adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
  adbc.snowflake.sql.uri.port = '443',
  adbc.snowflake.sql.auth_type = 'auth_ext_browser',
  adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
  adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
  adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE"),
  adbc.snowflake.sql.schema = 'REPREX'
)

con <- adbc_connection_init(db)

x <- data.frame(A = 1, B = 2, C = 3)
execute_adbc(con, 'drop table if exists x')
write_adbc(x, con, 'X')
read_adbc(con, 'select * from x') |> as.data.frame()
#>   A B C
#> 1 1 2 3

Created on 2024-12-17 with reprex v2.1.1

[screenshot: Snowflake query history for the successful run]

@liamnz
Author

liamnz commented Dec 16, 2024

Do you folks know if this multi-session behaviour is intended for the ADBC client? It looks like it makes a new session just for those count queries; everything else occurs in the original session.

@zeroshade
Member

@liamnz At the moment that behavior is intentional, though it's supposed to transfer the context of the current schema/db etc., so I guess we must have missed something there. I'll look into whether we can just use the same session instead of a different one.

@liamnz
Author

liamnz commented Dec 18, 2024

Okay that's good to know, thanks for your help 😊
