-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Description
Describe the bug
I was talking with @mch2 yesterday, saying "Hey -- we should add a dedicated API, or maybe just a flag to force_merge, to rewrite old segments using the latest codec after doing an upgrade." I'm glad I talked with him before creating an issue for it, because he told me that it already exists. Great!
I was curious about how it works, so I dove through the call to InternalEngine.forceMerge(), where it passes the upgrade = true parameter, which it sets on the OpenSearchMergePolicy, and then this logic gets called to do the actual segment rewriting. Cool.
Except, it's not cool at all, because, if there's nothing to upgrade (because all the segments have already been (re)written using the latest codec), it falls back to the underlying merge policy's force merge logic, which will compute merges with unbounded max segment size.
Overall, the real problem is OpenSearchTieredMergePolicy and that annoying forceMergePolicy that ignores max segment size. While we need it in order to maintain the existing behavior when merging down to a single segment (e.g. before hot-to-warm migration), IMO we should not use forceMergePolicy unless we're explicitly trying to merge down to a small number of segments. Specifically, I would say that we should only use forceMergePolicy if total_shard_size / maxSegmentCount > maxSegmentSizeMB -- that is, whenever we can merge down to maxSegmentCount segments without exceeding the max segment size, then we should honor the max segment size.
Related component
Indexing
To Reproduce
- Create an index with 1 shard, 0 replicas (for simplicity), called
my-index, with a merge policy that sets max segment size to 500MB. - Index 5GB of data.
- Run
POST /my-index/_upgrade - Run
GET /_cat/segments - Maybe repeat steps 3 and 4 a few times.
- Notice that (eventually), a segment with size greater than 500MB gets written.
Expected behavior
I would expect max segment size to be honored except when I force merge to a small number of segments.
Additional Details
Plugins
N/A
Screenshots
N/A
Host/Environment (please complete the following information):
- OS: My brain reading through code
- Version: Way too old
Additional context
N/A