Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Select using request Id return one row #375

Closed
nicogon opened this issue Dec 6, 2022 · 14 comments
Closed

Select using request Id return one row #375

nicogon opened this issue Dec 6, 2022 · 14 comments
Assignees
Labels
bug Something isn't working status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector.

Comments

@nicogon
Copy link

nicogon commented Dec 6, 2022

Please answer these questions before submitting your issue. Thanks!

  1. What version of NodeJS are you using (node --version and npm --version)?
    16

  2. What operating system and processor architecture are you using?
    osx
    Linux:

    cat /etc/redhat-release for RedHat OS variants,
    lsb_release -a for Debian OS variants
    Mac:
    system_profiler SPSoftwareDataType
    Windows:
    systeminfo | findstr /B /C:"OS Name" /C:"OS Version"

  3. What are the component versions in the environment (npm list)?
    1.6.16

  4. What did you do?

connection.execute({
    streamResult: true,
    sqlText: 'select * from .........',
    requestId: 'eadd641b-5dbc-4efd-aff9-4d8c4837849c',
  })
  
    const recordStream = query.streamRows({ start: 0, end: 10000 });

The table i am selecting has millions of rows. If I want to resume the fetch logic using request Id
the stream only returns one row

 { '1': 1 }
  1. What did you expect to see?

I expect to have a stream with all the query data

  1. What did you see instead?

A stream that returns

 { '1': 1 }
  1. Add this to get standard output.
var snowflake = require('./snowflake');
snowflake.configure(
{
  logLevel: 'trace'
});

No logs

@beckjake
Copy link
Contributor

beckjake commented Dec 9, 2022

I think you're hitting an issue introduced in #338 - that PR unconditionally overwrites your query with select 1 on retries, which causes snowflake to execute that instead of your original request if your request never made it to snowflake in the first place.

The statement object you get back will still indicate that you ran the original query, but if you look up the query in snowflake's query history you'll see it was actually select 1, the sdk just silently overwrote your sql text.

@sfc-gh-dszmolka
Copy link
Collaborator

hi, thank you for submitting this issue. special thanks to @beckjake for pinpointing where it breaks . we'll take a look.

as a workaround, since the breaking change seems to have come with v1.6.15, maybe v1.6.13 or v1.6.14 could be used as the feature was introduced in v1.6.13. i tested it and works as expected, in terms of returning the expected result.
on the downside, you'll still have to pass the sqlText too (as opposed to v1.6.15+ where purely the requestId is enough) otherwise the query won't work, but you already seem to do that anyways.

@beckjake
Copy link
Contributor

beckjake commented Feb 3, 2023

IMO, the feature where you don't have to send sqlText on retries is of rather dubious value: There's an inherent flaw in the case where your query never reached the server that the SDK can never really resolve, because only the original caller knows the original sqlText value. The only way I can see to resolve that is for the SDK to stash attempted queries by request ID or something, which opens up a whole can of worms - how do you know when to clear the cache? Do you just grow the cache without bound? Evict retry data after some amount of time/number of calls? Both seem pretty bad, caching for arbitrary data/use cases in a library is pretty tricky.

In our local fork of this, for what it's worth, I just changed the if (!statementContext.resubmitRequest) to if (!statementContext.resubmitRequest || !statementContext.sqlText). That way if you didn't provide any SQL it tries to do the v1.6.15+ behavior where it's optional, but if you did provide any SQL, it'll just use that.

@sfc-gh-dszmolka sfc-gh-dszmolka added bug Something isn't working status-in_progress Issue is worked on by the driver team labels Mar 29, 2023
@sfc-gh-dszmolka
Copy link
Collaborator

would like to update this one with some information discussed with the team over the past weeks.

  1. capability originally implemented with Add option to pass request id #333
  2. feature supposed to only work as expected when request with requestId is sent within the same Connection. Tested this a while back, and it does work as I remember. It indeed does not work when the request with requestId is sent anywhere outside of the same Connection, where it was originally created. One could argue with the practical use-cases of this capability, but it is like this at this moment.
  3. indeed a select 1 is executed in all the occassions where the request with requestId is sent outside of the Connection where it was originally created. We'll consider changing this into something which would at least provide more intuitive feedback what is happening.

@epechuzal
Copy link

@sfc-gh-dszmolka is there a way to consume result rows from a query in a distributed way? to me that's the main use case for using requestId from another Connection.

I imagine some flow like:

  • execute some query which we know returns a large number of rows
  • store the request ID for the query
  • consumers use the request id just to fetch results from the previously executed query
    • maybe even with start/end params

@sfc-gh-dszmolka
Copy link
Collaborator

sfc-gh-dszmolka commented Apr 21, 2023

@epechuzal there is indeed at least one way to do this, like the following

  • execute the query returning any arbitrary number of rows. retrieve the queryId (not the requestId, these are two independent identifiers). The queryId retrieval method is calling getStatementId() on the statement which executes your query. (I sneakily used a third identifier for the same query, and trust your graciousness you won't ask why it's called queryId when it's statementId and vica versa 😅 )
  • store the queryId (not the requestId) of the given query
  • taking advantage of the Snowflake query caching function (enabled by default, feature of the Snowflake backend, not the snowflake-sdk), you can call RESULT_SCAN on your persisted queryId to do anything with the cached query result set (re-fetch it again, fetch only a subset, etc.) which you would normally do on a 'normal' query. On top of this, of course RESULT_SCAN can be executed from any Connection or even any client, not just inside the exact same Connection in which the requestId was created.
    Hope this helps.

@sfc-gh-dszmolka
Copy link
Collaborator

#477 for enhancing the query executed in the background to return something meaningful instead of 1

@sfc-gh-dszmolka sfc-gh-dszmolka added status-pr_pending_merge A PR is made and is under review status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. enhancement The issue is a request for improvement or a new feature and removed status-in_progress Issue is worked on by the driver team status-pr_pending_merge A PR is made and is under review bug Something isn't working labels Apr 26, 2023
@sfc-gh-dszmolka sfc-gh-dszmolka removed the status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. label May 25, 2023
@sfc-gh-dszmolka
Copy link
Collaborator

fix released with version 1.6.22

@beckjake
Copy link
Contributor

beckjake commented May 26, 2023

Hi @sfc-gh-dszmolka, very sorry for the long delay on this as I hadn't realized that there was maybe some misunderstanding about the nature of this issue, at least from my perspective as an end user.

indeed a select 1 is executed in all the occassions where the request with requestId is sent outside of the Connection where it was originally created. We'll consider changing this into something which would at least provide more intuitive feedback what is happening.

I am pretty confident you can trigger this behavior without sending a request outside of the Connection where it was originally created, as long as you hit an error that occurs after the request ID is created but before the request makes it to snowflake. The process goes like this:

  • set up a connection like normal
  • user calls connection.execute({ sqlText: "select 42 as mytest", complete: (err, stmt, rows) => {...} })
  • via Statement.createStatementPreExec -> createContextPreExec, the connector ultimately does statementContext.requestId = uuidv4(); and eventually returns that statementContext (with some other things of course) to the caller.
  • Concurrently, the statement is executed. However, the http client isn't able to reach snowflake at all, and calls the provided callback with an error (and a stmt object that has the requestId!).
  • calls using the request ID have their SQL text replaced with some garbage (now, at least, not select 1, but still not helpful stuff)
  • all future calls using the same request ID will now retry the garbage inserted by the connector

You should be able to reproduce this bad behavior by introducing network errors in the connection process.

Let me describe how it looks from my end. As the caller, I see that I got an error and that I have a request ID. A-ha, I say, I will retry my request per the docs, passing my request ID in to retry the request! So I retry the request, passing along my original SQL text. That sets request ID, which means my request doesn't run and instead I get a message telling me to use RESULT_SCAN (which isn't true!) And then the connector returns successfully! But it actually didn't do the thing I asked.

To quote the linked documentation:

By including the request ID in the SQL statement, you can avoid the potential for data duplication. Resubmitting the request with the request ID from the initial request ensures that the resubmitted command executes only if the initial request failed.

This implies to me that if I use the request ID, I will 1) get the old result if it ran and 2) execute the command if the initial request failed.

@sfc-gh-dszmolka
Copy link
Collaborator

hi @beckjake that was definitely a misunderstanding on my end and thank you so much for clarifying the actual issue at hand here. Reopening this Issue now and we'll look into it further.

@sfc-gh-dszmolka sfc-gh-dszmolka added bug Something isn't working status-in_progress Issue is worked on by the driver team and removed enhancement The issue is a request for improvement or a new feature labels May 30, 2023
@sfc-gh-hchaturvedi
Copy link
Collaborator

We're working to get to a resolution for this issue. Please stay tuned. Thanks!

@sfc-gh-dszmolka
Copy link
Collaborator

fix PR #639, in progress

@sfc-gh-dszmolka sfc-gh-dszmolka added status-pr_pending_merge A PR is made and is under review and removed status-in_progress Issue is worked on by the driver team labels Oct 2, 2023
@sfc-gh-dszmolka
Copy link
Collaborator

fix is merged and will be part of the October's release, towards end of the month. Will update the issue once the release is out.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. and removed status-pr_pending_merge A PR is made and is under review labels Oct 8, 2023
@sfc-gh-dszmolka
Copy link
Collaborator

node.js driver version 1.9.1 released with the fix and is available on npm. thank you all for bearing with us !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector.
Projects
None yet
Development

No branches or pull requests

8 participants