SNOW-1337235 SHOW <TABLES, VIEWS> has inconsistent JSON responses that can't be converted to Arrow correctly #1110
Hi, and thank you for submitting this issue. I'll take a look sometime early next week. If the issue is very urgent or impactful, then as suggested during issue creation, please create a case with Snowflake Support instead. As a quick preliminary analysis: if the issue is indeed proven to be on the server side, then it's independent from the driver.
I did a test on Debian 11 to see how it behaves. If there's any issue in the JSON resultset parsing, it should be visible regardless of OS, but to be sure I'll try to test this on Windows too. Reproduction:
# cat gosnowflake1110.go
package main

import (
    "database/sql"
    "flag"
    "fmt"
    "log"

    sf "github.com/snowflakedb/gosnowflake"
)

func main() {
    if !flag.Parsed() {
        flag.Parse()
    }
    cfg, err := sf.GetConfigFromEnv([]*sf.ConfigParam{
        {Name: "Account", EnvName: "SNOWFLAKE_TEST_ACCOUNT", FailOnMissing: true},
        {Name: "Role", EnvName: "SNOWFLAKE_TEST_ROLE", FailOnMissing: true},
        {Name: "User", EnvName: "SNOWFLAKE_TEST_USER", FailOnMissing: true},
        {Name: "Password", EnvName: "SNOWFLAKE_TEST_PASSWORD", FailOnMissing: true},
        {Name: "Warehouse", EnvName: "SNOWFLAKE_TEST_WAREHOUSE", FailOnMissing: true},
        {Name: "Database", EnvName: "SNOWFLAKE_TEST_DATABASE", FailOnMissing: true},
        {Name: "Schema", EnvName: "SNOWFLAKE_TEST_SCHEMA", FailOnMissing: true},
    })
    if err != nil {
        log.Fatalf("failed to create Config, err: %v", err)
    }
    //cfg.Tracing = "DEBUG"
    dsn, err := sf.DSN(cfg)
    if err != nil {
        log.Fatalf("failed to create DSN from Config: %v, err: %v", cfg, err)
    }
    db, err := sql.Open("snowflake", dsn)
    if err != nil {
        log.Fatalf("failed to connect. %v, err: %v", dsn, err)
    }
    defer db.Close()
    query := "SHOW TABLES LIMIT 2999;"
    rows, err := db.Query(query) // no cancel is allowed
    if err != nil {
        log.Fatalf("failed to run a query. %v, err: %v", query, err)
    }
    defer rows.Close()
    var v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23 sql.NullString
    for rows.Next() {
        err := rows.Scan(&v1, &v2, &v3, &v4, &v5, &v6, &v7, &v8, &v9, &v10, &v11, &v12, &v13, &v14, &v15, &v16, &v17, &v18, &v19, &v20, &v21, &v22, &v23)
        if err != nil {
            log.Fatalf("failed to get result. err: %v", err)
        }
        fmt.Println(v2)
    }
    if rows.Err() != nil {
        fmt.Printf("ERROR: %v\n", rows.Err())
        return
    }
    fmt.Printf("Congrats! You have successfully run %v with Snowflake DB!\n", query)
}

Result: the expected tables are returned:
When adding cfg.Tracing = "DEBUG", the console output shows that the chunk is successfully downloaded and parsed.
As a next step, could you please provide a reproduction? Asking this because, for now, using only the Snowflake Go Driver, the issue does not reproduce.
I can't run it as-is; I get the error:
But I can run it if I drop the last column.
But that call is also just a JSON call with no Arrow context to it. I will try to get a closer re-creation.
Apparently you then have one column less in your response for some reason; maybe we're on different versions, or there are some settings that differ between our accounts. Being able to run it without the last column means it is possible to retrieve the 'big' resultset. It would be very much appreciated if you could please provide a repro; as you can see, the issue does not reproduce with the plain Go driver program above.
Not quite the repro, but this is closer to the scenario:
It results in the error:
Thank you for the repro and details!
A SHOW query will not work with GetArrowBatches. This is documented in the Supported Data Types section of the driver documentation:
You seem to be hitting this expected behaviour. Edited for a possible way forward:
rows, err := db.QueryContext(ctx, query)
batches, err := rows.(sf.SnowflakeRows).GetArrowBatches()
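For context, here is a minimal sketch of how Arrow batches are typically retrieved with gosnowflake, assuming the query actually returns Arrow data; the helper name fetchArrowBatches and the error handling are illustrative, not part of the driver:

package main

import (
    "context"
    "database/sql"
    "database/sql/driver"
    "log"

    sf "github.com/snowflakedb/gosnowflake"
)

// fetchArrowBatches is an illustrative helper, not a driver API.
func fetchArrowBatches(db *sql.DB, query string) error {
    // Ask the driver to keep results as Arrow batches instead of materializing rows.
    ctx := sf.WithArrowBatches(context.Background())

    conn, err := db.Conn(ctx)
    if err != nil {
        return err
    }
    defer conn.Close()

    // GetArrowBatches is exposed on the driver-level rows, so bypass database/sql via Raw.
    return conn.Raw(func(x any) error {
        rows, err := x.(driver.QueryerContext).QueryContext(ctx, query, nil)
        if err != nil {
            return err
        }
        defer rows.Close()

        batches, err := rows.(sf.SnowflakeRows).GetArrowBatches()
        if err != nil {
            return err
        }
        for _, batch := range batches {
            records, err := batch.Fetch() // downloads and decodes one chunk
            if err != nil {
                return err
            }
            log.Printf("fetched %d Arrow records", len(*records))
        }
        return nil
    })
}

Note that this path only helps when the server actually returns Arrow data; for SHOW-style metadata queries the response is JSON, which is the limitation discussed above.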
The problem is that the call isn't consistent. When the call uses a lower LIMIT, it runs correctly and the JSON can be parsed.
Line 651 in 088150c
I think I know what's happening. If you have enough rows in your output, Snowflake will chunk the data, returning the first chunk with the initial response and requiring you to fetch the remaining results as separate requests. It looks like snowflakeArrowStreamChunkDownloader isn't handling this scenario properly in order to report it to the consumer. In the ADBC code that @davidhcoe linked to, we check the number of batches returned. For consistency, I think the ideal scenario here would be for gosnowflake's GetArrowBatches to handle this case correctly.
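To illustrate the consumer-side check being described, here is a rough sketch of the fallback logic, assuming rows were obtained with an Arrow-batches context as in the snippet above; scanJSONRows is a hypothetical helper, and this is not the actual ADBC code:

// Sketch only: decide between Arrow batches and a JSON row-scan fallback.
batches, err := rows.(sf.SnowflakeRows).GetArrowBatches()
if err != nil {
    return err
}
if len(batches) == 0 {
    // Metadata queries such as SHOW TABLES come back as JSON only,
    // so there are no Arrow batches and the rows must be scanned directly.
    return scanJSONRows(rows) // hypothetical fallback helper
}
// The bug discussed here: when the JSON result is large enough to be chunked,
// the loader still reports a batch, so this check passes and decoding that
// "batch" as Arrow IPC later fails with an unexpected EOF.
for _, batch := range batches {
    records, err := batch.Fetch()
    if err != nil {
        return err
    }
    _ = records // consume the Arrow records here
}
return nil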
Hi @zeroshade! My input from a driver developer's perspective:
@zeroshade
The problem there is that it doesn't correctly handle the case when the JSON data is chunked. It returns the JSON data from the initial response, and it's still difficult to tell when you need to make additional calls for the remaining chunks.
You will likely run into this problem again when you do that, as customers are almost certainly going to dislike the inconsistency. Personally, my preferred solution here would be to add support for returning Arrow for these metadata queries. But for now, at minimum something should be added to make this limitation clear.
FWIW: In general, converting a result set from JSON format to Arrow format is a nontrivial operation for a client or driver library to have to perform. It requires dependencies which can be heavy and can behave subtly differently depending on which language implementation you're using. Client-side conversion also creates a greater risk of inconsistency, because the logic for converting Snowflake types to Arrow types can be complicated. +1 to Matt's suggestion to always convert to Arrow on the server side whenever possible. This is better for keeping connectors lightweight and for ensuring consistency across connectors.
@ianmcook thanks for your input! However, the problem on the backend is (to my best knowledge) similar to the problem in the drivers - there is one component that prepares the actual table data (and this component can serialise Arrow) and another one which returns metadata (and this one probably can't, but I'm not sure). @zeroshade I analysed the code a bit and I'm wondering about one thing. Currently the process looks like this:
What do you think about such an approach:
I was toying with this a bit, and I can do it, but it is a terrible solution for the general case, mostly because the chunks aren't actually valid JSON. They are a series of JSON arrays separated by commas: ["a", "b", "c", "d"], ["e", "f", "g", "h"], ... with no top-level structure, as per the JSON spec. Looking at the gosnowflake code, there's an entire section devoted to custom-parsing this JSON, because you can't just deserialize it as normal JSON. It also means that if Snowflake ever changes how it formats the JSON for the large chunks, it'll break us, as we'll be relying on an undocumented, non-standard way of representing the JSON data. While this is likely a viable workaround for this situation, I don't think it is a viable long-term solution.
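As a concrete illustration of that workaround (a minimal sketch, not gosnowflake's actual streaming parser), the non-standard chunk body can be made parseable by wrapping it in a top-level JSON array:

package chunkparse

import "encoding/json"

// parseChunk wraps the comma-separated row arrays described above in a
// top-level array so that standard JSON decoding can be used. This relies
// on the undocumented chunk format and would break if it ever changes.
func parseChunk(body []byte) ([][]*string, error) {
    wrapped := make([]byte, 0, len(body)+2)
    wrapped = append(wrapped, '[')
    wrapped = append(wrapped, body...)
    wrapped = append(wrapped, ']')

    var rows [][]*string // *string so SQL NULLs decode as nil
    if err := json.Unmarshal(wrapped, &rows); err != nil {
        return nil, err
    }
    return rows, nil
}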
…itations (#1790) Workaround to fix #1454 until snowflake addresses snowflakedb/gosnowflake#1110 with a better solution (hopefully by having the server actually return Arrow...)
Sorry for the late reply, I was absent the previous week.
Please answer these questions before submitting your issue.
In order to accurately debug the issue, this information is required. Thanks!
1. Go driver version: 1.9
2. Operating system and processor architecture: Windows x64
3. Go version (from running go version in your console): go version go1.21.3 windows/amd64
4. Server version (e.g. 1.90.1; you may get the server version by running a query): 8.15.0
What did you do?
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
The issue was first noticed and logged as an ADBC issue on apache/arrow-adbc#1454.
The problem seems to be that when a SHOW command that returns a large number of results is run, the error
arrow/ipc: could not read message schema: arrow/ipc: could not read message metadata: unexpected EOF
is received. Upon digging into this further, the behavior of the call is inconsistent. If the call includes a LIMIT clause to restrict the number of records to, say, 100, then things work as expected:
And the 100 values are returned in the JSON:
There are no Arrow record batches returned from the ArrowStreamLoader class because it is nil. The downstream calls all succeed as expected.
However, a larger number of records (of indeterminate size) in the result causes that same request to fail. For example, if limited to 700, only 562 records are returned:
And those records end with a "," and not EOF because the JSON string isn't loaded correctly.
Now, ArrowStreamLoader reports there is 1 record batch. This leads to downstream callers failing:
This is because the string can't be fully read in
What did you expect to see?
The correct number of tables from SHOW TABLES should be returned, no matter how many there are.
Can you set logging to DEBUG and collect the logs?
https://community.snowflake.com/s/article/How-to-generate-log-file-on-Snowflake-connectors
What is your Snowflake account identifier, if any? (Optional)
I included screenshots that contain QueryIDs. Hopefully that helps.