Select using request Id return one row #375

nicogon · 2022-12-06T20:35:23Z

Please answer these questions before submitting your issue. Thanks!

What version of NodeJS are you using (node --version and npm --version)?
16
What operating system and processor architecture are you using?
osx
Linux:

cat /etc/redhat-release for RedHat OS variants,
lsb_release -a for Debian OS variants
Mac:
system_profiler SPSoftwareDataType
Windows:
systeminfo | findstr /B /C:"OS Name" /C:"OS Version"
What are the component versions in the environment (npm list)?
1.6.16
What did you do?

connection.execute({
    streamResult: true,
    sqlText: 'select * from .........',
    requestId: 'eadd641b-5dbc-4efd-aff9-4d8c4837849c',
  })
  
    const recordStream = query.streamRows({ start: 0, end: 10000 });

The table i am selecting has millions of rows. If I want to resume the fetch logic using request Id
the stream only returns one row

 { '1': 1 }

What did you expect to see?

I expect to have a stream with all the query data

What did you see instead?

A stream that returns

 { '1': 1 }

Add this to get standard output.

var snowflake = require('./snowflake');
snowflake.configure(
{
  logLevel: 'trace'
});

No logs

The text was updated successfully, but these errors were encountered:

beckjake · 2022-12-09T18:52:23Z

I think you're hitting an issue introduced in #338 - that PR unconditionally overwrites your query with select 1 on retries, which causes snowflake to execute that instead of your original request if your request never made it to snowflake in the first place.

The statement object you get back will still indicate that you ran the original query, but if you look up the query in snowflake's query history you'll see it was actually select 1, the sdk just silently overwrote your sql text.

sfc-gh-dszmolka · 2023-01-19T17:11:49Z

hi, thank you for submitting this issue. special thanks to @beckjake for pinpointing where it breaks . we'll take a look.

as a workaround, since the breaking change seems to have come with v1.6.15, maybe v1.6.13 or v1.6.14 could be used as the feature was introduced in v1.6.13. i tested it and works as expected, in terms of returning the expected result.
on the downside, you'll still have to pass the sqlText too (as opposed to v1.6.15+ where purely the requestId is enough) otherwise the query won't work, but you already seem to do that anyways.

beckjake · 2023-02-03T16:19:20Z

IMO, the feature where you don't have to send sqlText on retries is of rather dubious value: There's an inherent flaw in the case where your query never reached the server that the SDK can never really resolve, because only the original caller knows the original sqlText value. The only way I can see to resolve that is for the SDK to stash attempted queries by request ID or something, which opens up a whole can of worms - how do you know when to clear the cache? Do you just grow the cache without bound? Evict retry data after some amount of time/number of calls? Both seem pretty bad, caching for arbitrary data/use cases in a library is pretty tricky.

In our local fork of this, for what it's worth, I just changed the if (!statementContext.resubmitRequest) to if (!statementContext.resubmitRequest || !statementContext.sqlText). That way if you didn't provide any SQL it tries to do the v1.6.15+ behavior where it's optional, but if you did provide any SQL, it'll just use that.

sfc-gh-dszmolka · 2023-04-18T07:01:17Z

would like to update this one with some information discussed with the team over the past weeks.

capability originally implemented with Add option to pass request id #333
feature supposed to only work as expected when request with requestId is sent within the same Connection. Tested this a while back, and it does work as I remember. It indeed does not work when the request with requestId is sent anywhere outside of the same Connection, where it was originally created. One could argue with the practical use-cases of this capability, but it is like this at this moment.
indeed a select 1 is executed in all the occassions where the request with requestId is sent outside of the Connection where it was originally created. We'll consider changing this into something which would at least provide more intuitive feedback what is happening.

epechuzal · 2023-04-21T00:23:17Z

@sfc-gh-dszmolka is there a way to consume result rows from a query in a distributed way? to me that's the main use case for using requestId from another Connection.

I imagine some flow like:

execute some query which we know returns a large number of rows
store the request ID for the query
consumers use the request id just to fetch results from the previously executed query
- maybe even with start/end params

sfc-gh-dszmolka · 2023-04-21T09:12:26Z

@epechuzal there is indeed at least one way to do this, like the following

execute the query returning any arbitrary number of rows. retrieve the queryId (not the requestId, these are two independent identifiers). The queryId retrieval method is calling getStatementId() on the statement which executes your query. (I sneakily used a third identifier for the same query, and trust your graciousness you won't ask why it's called queryId when it's statementId and vica versa 😅 )
store the queryId (not the requestId) of the given query
taking advantage of the Snowflake query caching function (enabled by default, feature of the Snowflake backend, not the snowflake-sdk), you can call RESULT_SCAN on your persisted queryId to do anything with the cached query result set (re-fetch it again, fetch only a subset, etc.) which you would normally do on a 'normal' query. On top of this, of course RESULT_SCAN can be executed from any Connection or even any client, not just inside the exact same Connection in which the requestId was created.
Hope this helps.

sfc-gh-dszmolka · 2023-04-22T06:11:40Z

#477 for enhancing the query executed in the background to return something meaningful instead of 1

sfc-gh-dszmolka · 2023-05-25T06:45:11Z

fix released with version 1.6.22

beckjake · 2023-05-26T17:00:38Z

Hi @sfc-gh-dszmolka, very sorry for the long delay on this as I hadn't realized that there was maybe some misunderstanding about the nature of this issue, at least from my perspective as an end user.

indeed a select 1 is executed in all the occassions where the request with requestId is sent outside of the Connection where it was originally created. We'll consider changing this into something which would at least provide more intuitive feedback what is happening.

I am pretty confident you can trigger this behavior without sending a request outside of the Connection where it was originally created, as long as you hit an error that occurs after the request ID is created but before the request makes it to snowflake. The process goes like this:

set up a connection like normal
user calls connection.execute({ sqlText: "select 42 as mytest", complete: (err, stmt, rows) => {...} })
via Statement.createStatementPreExec -> createContextPreExec, the connector ultimately does statementContext.requestId = uuidv4(); and eventually returns that statementContext (with some other things of course) to the caller.
Concurrently, the statement is executed. However, the http client isn't able to reach snowflake at all, and calls the provided callback with an error (and a stmt object that has the requestId!).
calls using the request ID have their SQL text replaced with some garbage (now, at least, not select 1, but still not helpful stuff)
all future calls using the same request ID will now retry the garbage inserted by the connector

You should be able to reproduce this bad behavior by introducing network errors in the connection process.

Let me describe how it looks from my end. As the caller, I see that I got an error and that I have a request ID. A-ha, I say, I will retry my request per the docs, passing my request ID in to retry the request! So I retry the request, passing along my original SQL text. That sets request ID, which means my request doesn't run and instead I get a message telling me to use RESULT_SCAN (which isn't true!) And then the connector returns successfully! But it actually didn't do the thing I asked.

To quote the linked documentation:

By including the request ID in the SQL statement, you can avoid the potential for data duplication. Resubmitting the request with the request ID from the initial request ensures that the resubmitted command executes only if the initial request failed.

This implies to me that if I use the request ID, I will 1) get the old result if it ran and 2) execute the command if the initial request failed.

sfc-gh-dszmolka · 2023-05-27T06:45:15Z

hi @beckjake that was definitely a misunderstanding on my end and thank you so much for clarifying the actual issue at hand here. Reopening this Issue now and we'll look into it further.

sfc-gh-hchaturvedi · 2023-09-15T19:09:28Z

We're working to get to a resolution for this issue. Please stay tuned. Thanks!

sfc-gh-dszmolka · 2023-10-02T17:27:35Z

fix PR #639, in progress

sfc-gh-dszmolka · 2023-10-08T07:40:00Z

fix is merged and will be part of the October's release, towards end of the month. Will update the issue once the release is out.

sfc-gh-dszmolka · 2023-11-15T18:47:33Z

node.js driver version 1.9.1 released with the fix and is available on npm. thank you all for bearing with us !

sfc-gh-dszmolka assigned sfc-gh-ext-simba-ti Jan 19, 2023

sfc-gh-dszmolka added bug Something isn't working status-in_progress Issue is worked on by the driver team labels Mar 29, 2023

sfc-gh-dszmolka removed the status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. label May 25, 2023

sfc-gh-dszmolka closed this as completed May 25, 2023

sfc-gh-dszmolka reopened this May 27, 2023

sfc-gh-dszmolka added bug Something isn't working status-in_progress Issue is worked on by the driver team and removed enhancement The issue is a request for improvement or a new feature labels May 30, 2023

sfc-gh-dszmolka assigned sfc-gh-igarish and unassigned sfc-gh-ext-simba-ti May 30, 2023

sfc-gh-dszmolka assigned sfc-gh-triage-snowpark-ecosystem-dl and unassigned sfc-gh-igarish Jun 29, 2023

sfc-gh-dszmolka added status-pr_pending_merge A PR is made and is under review and removed status-in_progress Issue is worked on by the driver team labels Oct 2, 2023

sfc-gh-dszmolka added status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. and removed status-pr_pending_merge A PR is made and is under review labels Oct 8, 2023

sfc-gh-dszmolka closed this as completed Nov 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Select using request Id return one row #375

Select using request Id return one row #375

nicogon commented Dec 6, 2022 •

edited

Loading

beckjake commented Dec 9, 2022

sfc-gh-dszmolka commented Jan 19, 2023

beckjake commented Feb 3, 2023

sfc-gh-dszmolka commented Apr 18, 2023

epechuzal commented Apr 21, 2023

sfc-gh-dszmolka commented Apr 21, 2023 •

edited

Loading

sfc-gh-dszmolka commented Apr 22, 2023

sfc-gh-dszmolka commented May 25, 2023

beckjake commented May 26, 2023 •

edited

Loading

sfc-gh-dszmolka commented May 27, 2023

sfc-gh-hchaturvedi commented Sep 15, 2023

sfc-gh-dszmolka commented Oct 2, 2023

sfc-gh-dszmolka commented Oct 8, 2023

sfc-gh-dszmolka commented Nov 15, 2023

Select using request Id return one row #375

Select using request Id return one row #375

Comments

nicogon commented Dec 6, 2022 • edited Loading

beckjake commented Dec 9, 2022

sfc-gh-dszmolka commented Jan 19, 2023

beckjake commented Feb 3, 2023

sfc-gh-dszmolka commented Apr 18, 2023

epechuzal commented Apr 21, 2023

sfc-gh-dszmolka commented Apr 21, 2023 • edited Loading

sfc-gh-dszmolka commented Apr 22, 2023

sfc-gh-dszmolka commented May 25, 2023

beckjake commented May 26, 2023 • edited Loading

sfc-gh-dszmolka commented May 27, 2023

sfc-gh-hchaturvedi commented Sep 15, 2023

sfc-gh-dszmolka commented Oct 2, 2023

sfc-gh-dszmolka commented Oct 8, 2023

sfc-gh-dszmolka commented Nov 15, 2023

nicogon commented Dec 6, 2022 •

edited

Loading

sfc-gh-dszmolka commented Apr 21, 2023 •

edited

Loading

beckjake commented May 26, 2023 •

edited

Loading