Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSSQL Streaming does not handle backpressure #879

Closed
fehnomenal opened this issue Feb 12, 2024 · 4 comments · Fixed by #1041
Closed

MSSQL Streaming does not handle backpressure #879

fehnomenal opened this issue Feb 12, 2024 · 4 comments · Fixed by #1041
Assignees
Labels
bug Something isn't working built-in dialect Related to a built-in dialect mssql Related to MS SQL Server (MSSQL)

Comments

@fehnomenal
Copy link

First of all thank you for this great library and support for mssql. You literally saved my life.

I'm working with rather huge datasets (3.9M rows joined with other tables) which cannot be paged without risking timeouts. Unfortunately, I cannot influence the schema or indices so I have to work with what I got and that is streaming the query results. This is were I noticed that my app crashes with OOM depending on the number of selected columns (and rows of course).

Short digression about how streaming mssql works right now:
There is an array that is filled with rows as they come in from the tedious request.
A loop (with setTimeout 0) checks if the array is larger than the requested batch size (let's call it n) and resolves a deferred with the same number n of rows.

Now this is the problem as only the first n rows are passed to the application. If the application is fast in handling the batch this creates no problem. But otherwise the row buffer fills up and is not cleared fast enough, thus consuming all the memory.

I solved this problem locally by using a (nodejs) readable stream and pausing the tedious request if the stream cannot buffer more rows. This works really great but I could only test this for reading data as I have only read-only access to the database.

Are you interested in a PR? I'm not sure about other runtimes that kysely supports.

@igalklebanov igalklebanov added bug Something isn't working built-in dialect Related to a built-in dialect mssql Related to MS SQL Server (MSSQL) labels Mar 12, 2024
@igalklebanov
Copy link
Member

Hey 👋

Thank you!

Is using a stream absolutely necessary? Shouldn't using request.pause() and request.resume() suffice?

@fehnomenal
Copy link
Author

fehnomenal commented Apr 1, 2024

Yes, I use Request.pause() and Request.resume(). I'm not familiar how to detect back pressure other than via stream, so that's what I went with.

Thinking about it, it might be enough to check the size of the rows array after splicing it and pausing the request if it is not empty. I suspect the stream implementation is doing something similar internally with a buffer array.

@igalklebanov
Copy link
Member

igalklebanov commented Apr 1, 2024

Let's try the array length approach first. PRs welcome.

@igalklebanov
Copy link
Member

igalklebanov commented Jun 15, 2024

And it's solved. Use #1041's changes for now. Array length + pause/resume did the trick.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working built-in dialect Related to a built-in dialect mssql Related to MS SQL Server (MSSQL)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants