Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf drop in latest releases - unstable postgrest-loadtest results #2842

Closed
steve-chavez opened this issue Jun 29, 2023 · 8 comments
Closed
Labels
nix related to Nix tooling perf

Comments

@steve-chavez
Copy link
Member

Problem

On #2523, the loadtest reported 420.02257774518125 throughput.

On main, the loadtest reports a 338.6473662241433 throughput.

Proposal

We need to check on which commit(s) the regression happened and then find a solution.


@wolfgangwalther I recall you ran the postgrest-loadtest on a commit range before. Perhaps you could share the script so I can check? (the script would be useful as one of our nix commands too).

@steve-chavez steve-chavez added perf nix related to Nix tooling labels Jun 29, 2023
@wolfgangwalther
Copy link
Member

@wolfgangwalther I recall you ran the postgrest-loadtest on a commit range before. Perhaps you could share the script so I can check? (the script would be useful as one of our nix commands too).

Unfortunately, I don't have that script anymore.

@develop7
Copy link
Collaborator

develop7 commented Sep 12, 2023

I've tried to reproduce the issue with postgrest-loadtest-against 0a1564ba5aaa8175dcb738271967944652cf4ce9 locally

According to the report, requests throughput does drop, along with request rate, but not 25% down:

Rate Throughput
0a1564 710.57 645.97
main 687.80 625.27
diff -4% -4%

While there is a slowdown, it's not awful lot. Might be a fluke; worst case is the slowdown is amplified by the environment Github actions are running in. I'd suggest we'd try running loadtest in a dedicated VM (in AWS for now) and see if we could reproduce the issue.

postgrest-loadtest report
❯  postgrest-loadtest-against 0a1564ba5aaa8175dcb738271967944652cf4ce9

Running loadtest on "0a1564ba5aaa8175dcb738271967944652cf4ce9"...

Preparing worktree (detached HEAD 0a1564ba)
postgrest-with-postgresql-15: You can connect with: psql 'postgres:///postgres?host=/run/user/1000/postgrest/postgrest-with-postgresql-15-qzo/socket' -U postgres
postgrest-with-postgresql-15: You can tail the logs with: tail -f /run/user/1000/postgrest/postgrest-with-postgresql-15-qzo/db.log
delaying data to/from postgres by 0ms
Building postgrest (nix)... done.
Starting postgrest... done.
delaying data to/from PostgREST by 0ms
2023/09/11 16:59:39 accept unix /run/user/1000/postgrest/postgrest-with-slow-postgrest-y1R/postgrest.socket: use of closed network connection
2023/09/11 16:59:39 accept unix /run/user/1000/postgrest/postgrest-with-slow-pg-SlF/socket/.s.PGSQL.5432: use of closed network connection
Requests      [total, rate, throughput]         42638, 710.57, 645.97
Duration      [total, attack, wait]             1m0s, 1m0s, 438.81µs
Latencies     [min, mean, 50, 90, 95, 99, max]  72.607µs, 1.406ms, 467.721µs, 1.413ms, 10.105ms, 10.945ms, 19.807ms
Bytes In      [total, mean]                     43138986, 1011.75
Bytes Out     [total, mean]                     164090460, 3848.46
Success       [ratio]                           90.91%
Status Codes  [code:count]                      200:23258  201:3876  204:11628  400:3876  
Error Set:
400 Bad Request

Done running on "0a1564ba5aaa8175dcb738271967944652cf4ce9".


Running loadtest on HEAD...

postgrest-with-postgresql-15: You can connect with: psql 'postgres:///postgres?host=/run/user/1000/postgrest/postgrest-with-postgresql-15-sOm/socket' -U postgres
postgrest-with-postgresql-15: You can tail the logs with: tail -f /run/user/1000/postgrest/postgrest-with-postgresql-15-sOm/db.log
delaying data to/from postgres by 0ms
Building postgrest (nix)... done.
Starting postgrest... done.
delaying data to/from PostgREST by 0ms
2023/09/11 17:04:39 accept unix /run/user/1000/postgrest/postgrest-with-slow-postgrest-ojG/postgrest.socket: use of closed network connection
2023/09/11 17:04:39 accept unix /run/user/1000/postgrest/postgrest-with-slow-pg-ktm/socket/.s.PGSQL.5432: use of closed network connection
Requests      [total, rate, throughput]         41274, 687.80, 625.27
Duration      [total, attack, wait]             1m0s, 1m0s, 595.107µs
Latencies     [min, mean, 50, 90, 95, 99, max]  69.772µs, 1.452ms, 465.196µs, 1.469ms, 10.637ms, 11.475ms, 19.373ms
Bytes In      [total, mean]                     42521098, 1030.22
Bytes Out     [total, mean]                     158840920, 3848.45
Success       [ratio]                           90.91%
Status Codes  [code:count]                      200:22514  201:3752  204:11256  400:3752  
Error Set:
400 Bad Request

Done running on HEAD.

@develop7
Copy link
Collaborator

On the other hand, the loadtest against the commit before the v11.2.0 reports throughput = 261.92, while the one before the v11.1.0 does 323.83 which results in 20% drop.

@develop7
Copy link
Collaborator

develop7 commented Sep 12, 2023

Ran postgrest-loadtest-against v11.1.0 locally, no luck — less that 5% drop

Reviewed loadtest results for handful of commits between v11.1.0 and v11.2.0. Most of the tests have their throughput in 320-350, and while there are some drops to 260s, they do not hold — loadtest against next commit has throughput back in 300s.

To me it looks like an irregular fluke so far.

postgrest-loadtest-against v11.1.0
postgrest-loadtest-against v11.1.0

Running loadtest on "v11.1.0"...

Preparing worktree (detached HEAD 1f13e43a)
postgrest-with-postgresql-15: You can connect with: psql 'postgres:///postgres?host=/run/user/1000/postgrest/postgrest-with-postgresql-15-uj1/socket' -U postgres
postgrest-with-postgresql-15: You can tail the logs with: tail -f /run/user/1000/postgrest/postgrest-with-postgresql-15-uj1/db.log
delaying data to/from postgres by 0ms
Building postgrest (nix)... done.
Starting postgrest... done.
delaying data to/from PostgREST by 0ms
2023/09/12 17:17:00 accept unix /run/user/1000/postgrest/postgrest-with-slow-postgrest-3O6/postgrest.socket: use of closed network connection
2023/09/12 17:17:00 accept unix /run/user/1000/postgrest/postgrest-with-slow-pg-YDd/socket/.s.PGSQL.5432: use of closed network connection
Requests      [total, rate, throughput]         41890, 698.08, 634.62
Duration      [total, attack, wait]             1m0s, 1m0s, 427.59µs
Latencies     [min, mean, 50, 90, 95, 99, max]  74.201µs, 1.431ms, 480.298µs, 1.356ms, 10.445ms, 10.965ms, 19.283ms
Bytes In      [total, mean]                     42382350, 1011.75
Bytes Out     [total, mean]                     161211680, 3848.45
Success       [ratio]                           90.91%
Status Codes  [code:count]                      200:22850  201:3808  204:11424  400:3808
Error Set:
400 Bad Request

Done running on "v11.1.0".


Running loadtest on HEAD...

postgrest-with-postgresql-15: You can connect with: psql 'postgres:///postgres?host=/run/user/1000/postgrest/postgrest-with-postgresql-15-0iR/socket' -U postgres
postgrest-with-postgresql-15: You can tail the logs with: tail -f /run/user/1000/postgrest/postgrest-with-postgresql-15-0iR/db.log
delaying data to/from postgres by 0ms
Building postgrest (nix)... done.
Starting postgrest... done.
delaying data to/from PostgREST by 0ms
2023/09/12 17:18:04 accept unix /run/user/1000/postgrest/postgrest-with-slow-postgrest-oDJ/postgrest.socket: use of closed network connection
2023/09/12 17:18:05 accept unix /run/user/1000/postgrest/postgrest-with-slow-pg-Nt9/socket/.s.PGSQL.5432: use of closed network connection
Requests      [total, rate, throughput]         40614, 676.90, 615.36
Duration      [total, attack, wait]             1m0s, 1m0s, 479.307µs
Latencies     [min, mean, 50, 90, 95, 99, max]  74.712µs, 1.476ms, 483.312µs, 1.504ms, 10.708ms, 11.614ms, 19.803ms
Bytes In      [total, mean]                     41841298, 1030.22
Bytes Out     [total, mean]                     156300820, 3848.45
Success       [ratio]                           90.91%
Status Codes  [code:count]                      200:22154  201:3692  204:11076  400:3692
Error Set:
400 Bad Request

Done running on HEAD.

@steve-chavez steve-chavez changed the title Perf drop in latest releases Perf drop in latest releases - unstable postgrest-loadtest results Sep 12, 2023
@steve-chavez
Copy link
Member Author

steve-chavez commented Sep 12, 2023

While there is a slowdown, it's not awful lot. Might be a fluke; worst case is the slowdown is amplified by the environment Github actions are running in.

@develop7 Those are great news! Thanks for reporting!

I'd suggest we'd try running loadtest in a dedicated VM (in AWS for now) and see if we could reproduce the issue.

That would be great. Having a dedicated AWS instance where postgrest-loadtest runs on every CI run would give the most stable results. Seems doable considering we already use AWS for our ARM builds.

@steve-chavez
Copy link
Member Author

Having a dedicated AWS instance where postgrest-loadtest runs on every CI run

@laurenceisla Just told me that the above is not possible, the github secrets required for the AWS instance to run are not exposed to PRs, only for the main branch. This is why our ARM build only runs on main.

@develop7
Copy link
Collaborator

I'll summarize here (for posterity's sake) our private chat with @steve-chavez from yesterday.

  • I've reviewed the loadtest reports across commits in main branch and, while there are significant drops here and there, they are not preserved further in history. Therefore I think these drops are caused by uneven resources budgeting for action runners on GitHub. We could avoid that by moving runners to the environment with more control on resources (dedicated server, spot instances on AWS, etc).
  • Although, while producing unstable results across runs, loadtest CI job does two separate loadtest runs (with postgrest-loadtest-against script) and these numbers look in line within a run, doing their job after all.
  • Essentially, comparing loadtest metrics across CI runs is unreliable.
  • On the other hand, loadtest against main branch runs twice against the same commit. I've suggested we'd make it run against latest release instead, which would require some tweaking, which @steve-chavez approved.

Bottom line is: while there's probably no performance drop against latest release, I'll fix our loadtest CI to make sure there isn't for sure.

@steve-chavez
Copy link
Member Author

Closing this as it was already clarified there was no noticeable perf loss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
nix related to Nix tooling perf
Development

No branches or pull requests

3 participants