
Minimal slonik patch for closing stream queries (release) #604

Merged

Conversation

alxndrsn
Contributor

WIP

@alxndrsn changed the title from "wip" to "Minimal slonik patch for closing stream queries (release)" on Sep 12, 2022
@matthew-white
Member

I checked out this PR on the QA server, then followed the repro steps in #482. I'm still able to reproduce a database connection leak by following those steps. However, I actually think that's expected. This patched version of Slonik makes it possible to destroy a stream, but we still need to leverage that ability by making other changes. After discussing with @alxndrsn, it seems like we have three options in this area:

  1. Wherever we use Promise.all() with a stream, instead use a utility function like the one in #482 (comment); a rough sketch of this approach appears after this list.
  2. Avoid using Promise.all() with a stream (#482 (comment)).
  3. Automatically destroy streams that haven't had a recent read (see #599, "Patch slonik to close streams more reliably").

If we want to ship this patched version of Slonik with the upcoming Central patch release, we should also ship one of the three changes above. Ultimately, we'll probably want to make more than one of these changes: I think we want to implement (3), then also either (1) or (2).
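
For illustration, option (1) could look roughly like the sketch below. This is my own paraphrase, not the actual utility from the #482 comment; the name `allAndDestroy` and the `instanceof Readable` check are assumptions.

```js
// Hypothetical sketch of option (1); not the actual helper from #482.
// Behaves like Promise.all(), but if any promise rejects, destroys any
// already-created streams among the results so their database connections
// are released instead of leaking.
const { Readable } = require('stream');

const allAndDestroy = (promises) => Promise.all(promises)
  .catch(async (err) => {
    // Wait for every promise to settle, then destroy any stream values.
    const settled = await Promise.allSettled(promises);
    for (const result of settled) {
      if (result.status === 'fulfilled' && result.value instanceof Readable)
        result.value.destroy();
    }
    throw err;
  });
```

A call site would then swap `Promise.all([streamPromise, otherPromise])` for `allAndDestroy([streamPromise, otherPromise])`, so a rejection of `otherPromise` no longer strands the stream's connection.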

@matthew-white
Member

We discussed this in the meeting today, and we're currently thinking that we should ship one of the three options above as part of the patch release. Whichever option we choose, I think we could gain confidence in the change by running the benchmarker against it. @lognaturel also made the point that if streams somehow stop working, that's obviously not a good thing, but there's actually only so much risk there. (It's not like it's a write operation and data would be corrupted.) How does shipping one of those options sound, @alxndrsn? Do you have a preference among them? I don't have a strong preference myself, but let me know if there's a way for me to help. If option (1) seems preferable, I'd be happy to continue working on that utility function.

@matthew-white
Member

I think regardless of whether we ship #608 or the timeout functionality in #599, we should go ahead and merge this PR. @alxndrsn, does that sound right? Is there more that needs to be done for this PR specifically?

@alxndrsn
Contributor Author

@matthew-white since this PR wasn't useful on its own, I've re-added stream timeouts, with tests.
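
For anyone following along, the stream-timeout idea (option 3 above, #599) amounts to something like the sketch below. This is a hedged illustration only, not the code added to this PR: `addReadTimeout` and the two-minute default are placeholders, and it assumes the stream is already being consumed in flowing mode (e.g. piped to the response), so the extra 'data' listener doesn't change its behavior.

```js
// Illustrative only: destroy a stream that hasn't produced data recently,
// so a stalled or abandoned request releases its database connection.
const addReadTimeout = (stream, timeoutMs = 2 * 60 * 1000) => {
  let timer;
  const reset = () => {
    clearTimeout(timer);
    timer = setTimeout(
      () => stream.destroy(new Error(`stream idle for ${timeoutMs}ms`)),
      timeoutMs
    );
  };
  stream.on('data', reset);                        // each chunk resets the clock
  stream.once('close', () => clearTimeout(timer)); // stop the timer on cleanup
  reset();
  return stream;
};
```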

@matthew-white
Member

matthew-white commented Sep 15, 2022

@alxndrsn ran the benchmarker for a few different options. Each benchmarker run took 300 sec.

|  | Total requests | Successes | Throughput (req/s) | Mean response time (sec) | Min response time (sec) | Max response time (sec) | Response size |
|---|---|---|---|---|---|---|---|
| #608 | 999 | 11 (1%) | 0.0 | 21.6 | 16.8 | 23.4 | 818,805 |
| Stream timeout (2 min) | 999 | 53 (5%) | 0.2 | 22.2 | 13.3 | 29.9 | 827,011 |
| Stream timeout (45 sec) | 1,000 | 180 (18%) | 0.6 | 25.0 | 14.0 | 36.4 | 871,938 |
| Stream timeout (5 sec) | 1,000 | 205 (20%) | 0.7 | 27.2 | 15.2 | 36.9 | 872,616 |

@alxndrsn wrote on Slack about #608:

> basically throughput is halved... but there don't seem to be any connection leaks

@matthew-white
Member

matthew-white commented Sep 16, 2022

@alxndrsn taught me how to run the benchmarker, so I've been running it in CircleCI. I ran it for #608, #609, and stream timeouts (this PR), as well as for v1.5.1 and the current release branch. Very surprisingly to me, #609 didn't seem to do any better than #608 (and maybe even did a little worse). Like #608, it performed worse than stream timeouts. (Note that in some cases, I ran the benchmarker multiple times for the same commit and got different results. Below I list the more successful results.)

|  | CircleCI build | Successes | Throughput (req/s) |
|---|---|---|---|
| v1.5.1 | [1][2][3] | 29 (2%) | 0.1 |
| Current release branch - no patch to Slonik (ae09ec5) | [1][2] | 34 (3%) | 0.1 |
| #608 | [1][2] | 57 (5%) | 0.2 |
| #609 | [1][2] | 33 (3%) | 0.1 |
| Stream timeout (2 min) - this PR | [1][2] | 105 (10%) | 0.3 |

Unlike #608, #609 doesn't wait until after a Promise.all() to create a stream, so I'm confused about why it's doing so much worse than stream timeouts. Stream timeouts would perform especially well when there's an issue with the client reading data. Is there a chance that the benchmarker just isn't consuming all the streams it's requesting? I changed the benchmarker to send 3 requests every 3 seconds (300 requests total) rather than 10 requests every 3 seconds (1000 requests total). I saw pretty different results:

|  | CircleCI build | Successes | Throughput (req/s) |
|---|---|---|---|
| v1.5.1 | [1][2][3] | 208 (69%) | 0.7 |
| Current release branch - no patch to Slonik (ae09ec5) | [1][2] | 244 (81%) | 0.8 |
| #608 | [1][2][3] | 241 (80%) | 0.8 |
| #609 | [1][2] | 280 (93%) | 0.9 |
| Stream timeout (2 min) - this PR | [1][2] | 257 (85%) | 0.9 |
| #609 + stream timeout | [1][2] | 241 (80%) | 0.8 |

(One surprising result: #609 + stream timeouts did worse than either one on its own!)

Based on the results above, I'm thinking:

  • There's some variability in the benchmarker results. That means that it's worth running the benchmarker multiple times and that it's harder to ascribe small differences in results to real differences in performance.
  • Fixing the database connection leaks might not be making a huge difference in these results, but it does seem like the various fixes at least aren't breaking anything.
  • Stream timeouts fix cases that #608 (create streams only after Promise.all()) and #609 (destroy streams created in rejected Promise.all()) don't fix: stream timeouts seem to be performing markedly better than the other options in the first table. (A rough sketch of the difference between those two approaches follows this list.)
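
To make that comparison concrete, here is my rough paraphrase of the two approaches, based only on the PR titles (the real diffs may differ; `createStream()` and `otherWork()` are placeholder names):

```js
// Placeholder names: createStream() opens a slonik stream query,
// otherWork() is any other awaited promise in the same handler.

// Original pattern (simplified): if otherWork() rejects, the stream is never
// consumed and its database connection leaks.
// const [stream, other] = await Promise.all([createStream(), otherWork()]);

// #608-style ("create streams only after Promise.all()"): await the other
// work first, and only open the stream once that work has succeeded.
const handler608 = async () => {
  const other = await otherWork();
  const stream = await createStream();
  return { stream, other };
};

// #609-style ("destroy streams created in rejected Promise.all()"): keep the
// concurrency, but destroy the stream if the combined promise rejects, e.g.
// via a helper like the allAndDestroy() sketch earlier in this thread.
```

Stream timeouts address a different failure mode: a stream that was created successfully but then stalls because nothing is reading it, which neither of the other two fixes covers.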

@matthew-white marked this pull request as ready for review September 19, 2022 16:58
@matthew-white
Member

See #599 for relevant discussion.

We're going to ship this PR with the patch release. We may also continue looking into #608 and #609.

@matthew-white
Member

> There's some variability in the benchmarker results.

@alxndrsn thinks it's possible that some of that variability is from CircleCI itself, specifically the state of the machine on which the benchmarker is run. That's something I'll keep in mind when I continue working on #609.
