R: write_adbc() does not seem to work when writing to a Snowflake table #2366
Comments
Thank you for reporting! Just curious: does a normal "insert" query work? I know very little about Snowflake, but I'm wondering whether there just aren't create table/write privileges, or whether the mechanism we use to do the inserting (which I think is a COPY INTO with Parquet) doesn't pass on everything it needs to.
Ah, that's interesting that write_adbc() writes Parquet files; I wondered if it might be sending Arrow record batches instead. Inserting records and manually writing Parquet files seem to work fine :)

# install.packages('adbcdrivermanager')
# install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")
# install.packages('arrow')
# install.packages('fs')
library(adbcdrivermanager)
library(adbcsnowflake)
db <- adbc_database_init(
adbcsnowflake(),
username = Sys.getenv('SNOWFLAKE_USER'),
adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
adbc.snowflake.sql.uri.protocol = 'https',
adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
adbc.snowflake.sql.uri.port = '443',
adbc.snowflake.sql.auth_type = 'auth_ext_browser',
adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE")
)
con <- adbc_connection_init(db)
execute_adbc(con, 'create or replace schema reprex')
execute_adbc(con, 'use schema reprex')
# Inserting works
execute_adbc(con, 'create or replace table x (a int, b int, c int)')
execute_adbc(con, 'insert into x values (1, 2, 3)')
read_adbc(con, 'select * from x') |> as.data.frame()
#> A B C
#> 1 1 2 3
# COPY INTO with Parquet works
x <- data.frame(A = 1, B = 2, C = 3)
execute_adbc(con, 'create or replace table x (a int, b int, c int)')
parquet_file_path <- fs::file_temp(ext = '.parquet')
parquet_file <- fs::path_file(parquet_file_path)
arrow::write_parquet(x, parquet_file_path)
put_command <- paste0("put 'file://", parquet_file_path, "' @~")
execute_adbc(con, put_command)
execute_adbc(con, paste0("
copy into x
from @~/", parquet_file, "
file_format = (type = parquet)
match_by_column_name = case_sensitive
"))
read_adbc(con, 'select * from x') |> as.data.frame()
#> A B C
#> 1 1 2 3
# write_adbc() doesn't work
x <- data.frame(A = 1, B = 2, C = 3)
execute_adbc(con, 'drop table if exists x')
write_adbc(x, con, 'X')
#> Error in adbc_statement_execute_query(stmt): NOT_FOUND: 002003 (42S02): SQL compilation error:
#> Object 'X' does not exist or not authorized.
schema_tables <- read_adbc(con, ' show tables in schema reprex') |> as.data.frame()
#> Warning in convert_array_stream(x, to): 1 value(s) may have incurred loss of
#> precision in conversion to double()
schema_tables[, c('name', 'rows')]
#> name rows
#> 1 X 0
read_adbc(con, 'select * from x') |> as.data.frame()
#> [1] A B C
#> <0 rows> (or 0-length row.names)
Created on 2024-12-15 with reprex v2.1.1
Thank you for the investigation! I don't know exactly what queries are being issued here (it may help to have the Go driver print the query alongside the error, or include it as an error detail if it doesn't already), so I'm not sure exactly which one is failing. @zeroshade any ideas here?
@paleolimbot what ingestion mode does write_adbc() set here? In addition, @liamnz can you bear with me and try using a lowercase table name?
By default it doesn't set an ingestion mode; see arrow-adbc/r/adbcdrivermanager/R/helpers.R, line 129 (at 46f7a84).
It doesn't have to; the ADBC docs say the default ingestion mode is "create". @liamnz could you look at the monitoring on your Snowflake dashboard, check the actual COPY INTO queries being run, and see if anything stands out as problematic there?
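For reference, here is a minimal sketch (not from this thread) of doing the bulk ingest through the lower-level adbcdrivermanager statement API with the ingestion mode set explicitly, assuming the con object from the reprex above. The option names (adbc.ingest.target_table, adbc.ingest.mode) come from the ADBC specification; "adbc.ingest.mode.create" asks the driver to create the target table before loading.

x <- data.frame(A = 1, B = 2, C = 3)

# Initialize a statement with explicit bulk-ingest options
stmt <- adbc_statement_init(
  con,
  adbc.ingest.target_table = "x",
  adbc.ingest.mode = "adbc.ingest.mode.create"
)
adbc_statement_bind_stream(stmt, x)  # bind the data frame as an Arrow stream
adbc_statement_execute_query(stmt)   # runs the ingest (COPY INTO with Parquet, per the discussion above)
adbc_statement_release(stmt)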
@zeroshade a lowercase table name yields the same error. Here are the commands that are issued when the write is attempted (the listing of those commands is not reproduced here): it looks like it attempts a count on the newly created table, but perhaps there is too much quoting going on in the IDENTIFIER() literal: https://docs.snowflake.com/en/sql-reference/identifier-literal
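To illustrate the quoting point, a small sketch (not from the thread), assuming the con object and the unquoted table x from the reprex above: Snowflake resolves an unquoted identifier case-insensitively to its uppercase form, while a double-quoted identifier is matched case-sensitively, so the extra quotes only break things when the quoted spelling differs from the stored name.

# Both of these resolve to the same table, because `create table x ...`
# stores the unquoted name as X:
read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('X')") |> as.data.frame()
read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('\"X\"')") |> as.data.frame()

# This one would fail, because no case-sensitive table "x" exists:
# read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('\"x\"')") |> as.data.frame()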
The count works outside of write_adbc(), though:

# install.packages('adbcdrivermanager')
# install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")
library(adbcdrivermanager)
library(adbcsnowflake)
db <- adbc_database_init(
adbcsnowflake(),
username = Sys.getenv('SNOWFLAKE_USER'),
adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
adbc.snowflake.sql.uri.protocol = 'https',
adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
adbc.snowflake.sql.uri.port = '443',
adbc.snowflake.sql.auth_type = 'auth_ext_browser',
adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE")
)
con <- adbc_connection_init(db)
execute_adbc(con, 'create or replace schema reprex')
execute_adbc(con, 'use schema reprex')
x <- data.frame(A = 1, B = 2, C = 3)
write_adbc(x, con, 'X')
#> Error in adbc_statement_execute_query(stmt): NOT_FOUND: 002003 (42S02): SQL compilation error:
#> Object 'X' does not exist or not authorized.
read_adbc(con, "SELECT COUNT(*) FROM IDENTIFIER('\"X\"')") |> as.data.frame()
#> COUNT(*)
#> 1        0

Created on 2024-12-17 with reprex v2.1.1
Yeah, I think it should be either ...
Ah, crossed replies -- and, hmm. Are both queries happening on the same connection and/or with the same context (i.e. current database, current schema)?
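One way to answer that from the R side, as a sketch (not from the thread) assuming the con object from the reprex above, is to ask Snowflake for the context of the session the driver is actually using; current_session(), current_database(), and current_schema() are standard Snowflake context functions.

read_adbc(
  con,
  "select current_session() as session_id,
          current_database() as db,
          current_schema()   as schema"
) |> as.data.frame()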
Ah, I think I see what the problem is now. In my original code, a schema is not assigned when the connection is first established; it is created and selected later with create or replace schema / use schema statements. When I change the connection so that a schema is assigned at connection time, everything works fine.

# install.packages('adbcdrivermanager')
# install.packages("adbcsnowflake", repos = "https://community.r-multiverse.org")
library(adbcdrivermanager)
library(adbcsnowflake)
db <- adbc_database_init(
adbcsnowflake(),
username = Sys.getenv('SNOWFLAKE_USER'),
adbc.snowflake.sql.account = Sys.getenv('SNOWFLAKE_ACCOUNT'),
adbc.snowflake.sql.uri.protocol = 'https',
adbc.snowflake.sql.uri.host = paste0(Sys.getenv('SNOWFLAKE_ACCOUNT'), '.snowflakecomputing.com'),
adbc.snowflake.sql.uri.port = '443',
adbc.snowflake.sql.auth_type = 'auth_ext_browser',
adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE"),
adbc.snowflake.sql.schema = 'REPREX'
)
con <- adbc_connection_init(db)
x <- data.frame(A = 1, B = 2, C = 3)
execute_adbc(con, 'drop table if exists x')
write_adbc(x, con, 'X')
read_adbc(con, 'select * from x') |> as.data.frame()
#> A B C
#> 1 1 2 3

Created on 2024-12-17 with reprex v2.1.1
Do you folks know if this multi-session behaviour is intended for the ADBC client? It looks like it just makes a new session for those count queries; everything else occurs in the original session.
@liamnz at the moment that behavior is intentional, though it's supposed to transfer the context of the current schema/db etc. I guess we must have missed something there. I'll look into seeing if we can just use the same session instead of a different one.
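As a sketch of how the multi-session behaviour could be confirmed from the Snowflake side (not from the thread; it assumes the con object from the reprex above and sufficient privileges to read the query history), the information_schema.query_history() table function reports a SESSION_ID for each query, so the driver's internal COUNT / COPY INTO statements would show up under a different session.

read_adbc(con, "
  select session_id, query_text, start_time
  from table(information_schema.query_history())
  where query_text ilike '%copy into%'
     or query_text ilike '%count(*)%'
  order by start_time desc
  limit 20
") |> as.data.frame()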
Okay, that's good to know, thanks for your help 😊
What happened?
I'm using the adbcdrivermanager and adbcsnowflake packages in R and testing out writing a data frame to a Snowflake table with write_adbc(). write_adbc() returns a NOT_FOUND error from Snowflake. It looks like the table gets created, but the actual data is not written to the table.
Stack Trace
No response
How can we reproduce the bug?
Created on 2024-12-13 with reprex v2.1.1
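The reprex body itself did not survive in this extract. Based on the later comments, a minimal reconstruction of the failing sequence (connect without assigning a schema, select one with use schema, then call write_adbc()) would look roughly like this:

library(adbcdrivermanager)
library(adbcsnowflake)

db <- adbc_database_init(
  adbcsnowflake(),
  username = Sys.getenv("SNOWFLAKE_USER"),
  adbc.snowflake.sql.account = Sys.getenv("SNOWFLAKE_ACCOUNT"),
  adbc.snowflake.sql.auth_type = "auth_ext_browser",
  adbc.snowflake.sql.role = Sys.getenv("SNOWFLAKE_ROLE"),
  adbc.snowflake.sql.warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE"),
  adbc.snowflake.sql.db = Sys.getenv("SNOWFLAKE_DATABASE")
)
con <- adbc_connection_init(db)

# The schema is created and selected after connecting, not at connection time
execute_adbc(con, "create or replace schema reprex")
execute_adbc(con, "use schema reprex")

x <- data.frame(A = 1, B = 2, C = 3)
write_adbc(x, con, "X")
#> Error in adbc_statement_execute_query(stmt): NOT_FOUND: 002003 (42S02): SQL compilation error:
#> Object 'X' does not exist or not authorized.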
Environment/Setup
Windows 11
R 4.4.2
adbcdrivermanager 0.15.0
adbcsnowflake 0.15.0