
Improve the fence mechanism performance #283

Merged
3 commits merged into main on Aug 14, 2024

Conversation

dumbbell (Member)

This pull request is a collection of patches that improve the performance of the fence mechanism.

It uses `ra:key_metrics/2` instead of `ra:member_overview/2` and skips the preliminary query used to serialize events when it is not needed. See the individual commits for more details.

This pull request is blocked because it depends on a new release of Ra that contains rabbitmq/ra#461. To fully fix the performance problem introduced with the fence mechanism, that future release of Ra also needs to contain rabbitmq/ra#462.

@dumbbell dumbbell added the enhancement New feature or request label Aug 13, 2024
@dumbbell dumbbell added this to the v0.15.0 milestone Aug 13, 2024
@dumbbell dumbbell self-assigned this Aug 13, 2024
@dumbbell dumbbell force-pushed the improve-fence-mechanism-performance branch from 73f219c to a705687 Compare August 13, 2024 16:29
[Why]
`ra:member_overview/2` is a very expensive call.

[How]
We just need the last index and the current term from the leader, and
`ra:key_metrics/2` provides that information too.

The difference is huge: in my benchmark, the query rate goes from 15
queries per second to 100k. This is in association with a related change
in Ra; see rabbitmq/ra#462.
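
As a rough illustration of the change (not the actual `khepri_machine.erl` code): ask the leader for its key metrics and pick the last index and current term out of the returned map. The map keys `last_index` and `term`, the exact return shape of `ra:key_metrics/2`, and the function and module names used below are assumptions.

```erlang
%% Minimal sketch, assuming `ra:key_metrics(ServerId, Timeout)` returns a map
%% whose keys include the leader's last log index and current term. The key
%% names `last_index` and `term` are assumptions, not the actual
%% khepri_machine.erl code.
-module(fence_metrics_sketch).
-export([leader_last_index_and_term/2]).

leader_last_index_and_term(LeaderId, Timeout) ->
    case ra:key_metrics(LeaderId, Timeout) of
        #{last_index := LastIndex, term := Term} ->
            {ok, LastIndex, Term};
        Other ->
            %% Anything else (an error or an unexpected shape) is passed back
            %% to the caller as-is.
            {error, Other}
    end.
```
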
…ition1/3`

[Why]
There is no need to run the same checks as a user query. We can execute
the query directly.
@dumbbell dumbbell force-pushed the improve-fence-mechanism-performance branch from a705687 to 04916a5 Compare August 14, 2024 14:18

codecov bot commented Aug 14, 2024

Codecov Report

Attention: Patch coverage is 94.28571% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.55%. Comparing base (503a29a) to head (4c25407).
Report is 4 commits behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| src/khepri_machine.erl | 94.28% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #283      +/-   ##
==========================================
+ Coverage   89.36%   89.55%   +0.18%     
==========================================
  Files          21       21              
  Lines        3104     3122      +18     
==========================================
+ Hits         2774     2796      +22     
+ Misses        330      326       -4     
| Flag | Coverage Δ |
| --- | --- |
| erlang-25 | 88.78% <94.28%> (+0.16%) ⬆️ |
| erlang-26 | 89.49% <94.28%> (+0.38%) ⬆️ |
| erlang-27 | 89.49% <94.28%> (?) |
| os-ubuntu-latest | 89.55% <94.28%> (+0.18%) ⬆️ |
| os-windows-latest | 89.55% <94.28%> (?) |

Flags with carried forward coverage won't be shown.


@dumbbell dumbbell marked this pull request as ready for review August 14, 2024 14:57
@dumbbell (Member, Author)

This pull request is unblocked by the release of Ra 2.13.6.

@the-mikedavis (Member) left a comment


Just a tiny comment about docs, otherwise looks good

... if it is unneeded.

[Why]
We perform that extra query to ensure that previous async commands were
handled by the local Ra server before we proceed with the query with a
fence. This comes with a performance penalty, of course.

We don't need that extra query if the previous command or query made by
the calling process was synchronous.

[How]
We now keep a flag in the calling process' dictionary to indicate whether
the last operation was a synchronous command or a query. The flag is
cleared by an async command.

When we have to perform a query with a fence, we look at this flag to
determine if the extra query is needed.
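
A minimal sketch of that flag handling, assuming a process dictionary key and helper names that are not the actual `khepri_machine.erl` identifiers:

```erlang
%% Minimal sketch of the process-dictionary flag described above. The key
%% `khepri_last_op_was_sync` and the function names are made up for this
%% example; they are not the actual Khepri identifiers.
-module(fence_flag_sketch).
-export([remember_sync_op/0, forget_sync_op/0, need_preliminary_query/0]).

%% Called after a synchronous command or a query returns: the local Ra server
%% is known to have caught up with this process' previous commands.
remember_sync_op() ->
    _ = erlang:put(khepri_last_op_was_sync, true),
    ok.

%% Called when an asynchronous command is submitted: we no longer know whether
%% the local Ra server has handled it, so clear the flag.
forget_sync_op() ->
    _ = erlang:erase(khepri_last_op_was_sync),
    ok.

%% Called by a query with a fence: the extra serialization query is only
%% needed if the last operation from this process was not synchronous.
need_preliminary_query() ->
    erlang:get(khepri_last_op_was_sync) =/= true.
```
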
@dumbbell dumbbell force-pushed the improve-fence-mechanism-performance branch from 04916a5 to 4c25407 Compare August 14, 2024 15:13
@dumbbell (Member, Author) commented Aug 14, 2024

For the record, my local testing shows that consistent queries are about 100% slower than low-latency queries. In other words, low-latency queries achieve double the throughput of consistent ones.

Before the whole work in #260 and this pull request, consistent queries were only a single-digit percentage slower.

Edit: My message is unclear. To sum up:

  • Low-latency queries are unaffected by the recent changes.
  • Consistent query throughput was halved by these changes, in order to make them more robust.

@the-mikedavis (Member)

I think consistent queries being slower than low-latency ones is not very concerning for RabbitMQ: we only use consistent queries very rarely.

@dumbbell (Member, Author)

Indeed. Also, sync writes and sync transactions now wait for the command to be applied locally before returning, so consistent queries should rarely be needed.

@dumbbell dumbbell merged commit 94e673c into main Aug 14, 2024
12 checks passed
@dumbbell dumbbell deleted the improve-fence-mechanism-performance branch August 14, 2024 15:54