Increase target shard usage #5771

Open: 1 commit to be merged into main


@rdettai (Collaborator) commented May 20, 2025

Description

Currently, the shard usage below which the number of shards is scaled down is 20% of the maximum shard throughput. This is quite low and results in a high number of shards. This PR raises it to 25%.

Also decrease the default scale-up factor from 1.5 to 1.1. A scale-up factor of 1.1 still scales well for bigger indexes (e.g. shards are added 10 at a time when an index has 100 shards) while limiting the likelihood of overshooting.
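
To make the "10 at a time" figure concrete, here is a minimal sketch of one scale-up step. The helper `shards_added` is hypothetical and is not the function changed in this PR; it takes the factor in percent (110 for 1.1) to avoid floating-point rounding surprises, and it assumes the result is rounded up so that a small index still gains at least one shard.

```rust
/// Hypothetical helper (not from the PR): number of shards added by one
/// scale-up step. The factor is given in percent, so 110 means 1.1.
/// Rounding up is an assumption made for this sketch.
fn shards_added(num_open_shards: u64, scale_up_factor_percent: u64) -> u64 {
    // Growth part of the factor: 10 for a factor of 1.1, 50 for 1.5.
    let growth_percent = scale_up_factor_percent.saturating_sub(100);
    // Ceiling division by 100.
    (num_open_shards * growth_percent + 99) / 100
}

fn main() {
    for num_open_shards in [1u64, 10, 100] {
        println!(
            "{} open shards: +{} with factor 1.1, +{} with factor 1.5",
            num_open_shards,
            shards_added(num_open_shards, 110),
            shards_added(num_open_shards, 150),
        );
    }
}
```

Under these assumptions, an index with 100 shards grows by 10 shards per step with a factor of 1.1, versus 50 with a factor of 1.5, which is where the reduced risk of overshooting comes from.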

How was this PR tested?

Added unit tests.

-        * 0.3f32,
-    scale_down_shards_threshold_mib_per_sec: max_shard_throughput_mib_per_sec * 0.2f32,
+        * 0.375f32,
+    scale_down_shards_threshold_mib_per_sec: max_shard_throughput_mib_per_sec * 0.25f32,
     shard_scale_up_factor,

Review comment from the PR author (Collaborator):

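Increase the target load by only 25% (from 0.3 to 0.375 for the scale-up factor, and from 0.2 to 0.25 for the scale-down factor). Increasing it further would prevent scaling from 1 shard to 2 shards from happening properly. Take max_shard_throughput_mib_per_sec = 5 MiB/s and num_open_shards = 1: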

  • with 0.3 as the scale-up load factor, scale_up_shards_long_term_threshold_mib_per_sec = 1.5 MiB/s. This means that avg_long_term_ingestion_rate must be > 3 MiB/s for a scale-up event to occur.
  • with 0.375 as the scale-up load factor, scale_up_shards_long_term_threshold_mib_per_sec = 1.875 MiB/s. This means that avg_long_term_ingestion_rate must be > 3.75 MiB/s for a scale-up event to occur.
  • with 0.45 as the scale-up load factor, scale_up_shards_long_term_threshold_mib_per_sec = 2.25 MiB/s. This means that avg_long_term_ingestion_rate must be > 4.5 MiB/s for a scale-up event to occur.

This is due to how long_term_scale_up_threshold_max_shards avoids ending up in the [scale_down, scale_up] interval.
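
The arithmetic in the three cases above can be reproduced with a short sketch. Both helpers are hypothetical and are not the code of long_term_scale_up_threshold_max_shards; the second one simply encodes the relationship visible in the figures above, namely that going from 1 to 2 shards requires an ingestion rate above twice the per-shard scale-up threshold.

```rust
/// Hypothetical helper: per-shard long-term scale-up threshold in MiB/s,
/// computed as max throughput times the scale-up load factor.
fn scale_up_threshold_mib_per_sec(max_shard_throughput_mib_per_sec: f64, load_factor: f64) -> f64 {
    max_shard_throughput_mib_per_sec * load_factor
}

/// Hypothetical helper: ingestion rate (MiB/s) that must be exceeded to go
/// from 1 shard to 2. Assumption read off the examples above: the rate must
/// be more than twice the per-shard scale-up threshold.
fn min_rate_for_one_to_two_shards(threshold_mib_per_sec: f64) -> f64 {
    2.0 * threshold_mib_per_sec
}

fn main() {
    let max_shard_throughput_mib_per_sec = 5.0;
    for load_factor in [0.3, 0.375, 0.45] {
        let threshold = scale_up_threshold_mib_per_sec(max_shard_throughput_mib_per_sec, load_factor);
        let min_rate = min_rate_for_one_to_two_shards(threshold);
        println!(
            "load factor {}: threshold {:.3} MiB/s, rate must exceed {:.3} MiB/s ({:.0}% of max)",
            load_factor,
            threshold,
            min_rate,
            100.0 * min_rate / max_shard_throughput_mib_per_sec,
        );
    }
}
```

With a load factor of 0.45, the single shard would have to run at about 90% of its maximum throughput before a second shard is added, which is why the increase is capped at 0.375 here.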
