Stop Allocating Document Source to Unpooled Buffers when Indexing #67502
Conversation
WIP: this was a holiday project and still needs some cleanup here and there, and it should be broken up into pieces since it introduces tricky new test infrastructure. But overall this commit works. It makes all the source bytes in bulk requests shared throughout the lifecycle of the bulk request instead of copying them to unpooled buffers when deserializing indexing requests. This essentially cuts the peak memory use for bulk requests almost in half (though by retaining buffers for longer, that effect is reduced somewhat compared to the GC's ability to collect individual byte arrays in a more fine-grained way). Still, overall this reduces GC times by two thirds when running the Rally PMC indexing track, reduces the risk of humongous allocations with G1GC, and makes memory use less spiky, so on balance I think this is the direction to go.
This commit adds leak-tracking infrastructure that enables assertions about the state of objects at GC time (a simplified version of what Netty uses to track `ByteBuf` instances). It uses that infrastructure to improve the quality of leak checks for page recycling in the mock NIO transport: the logic in `org.elasticsearch.common.util.MockPageCacheRecycler#ensureAllPagesAreReleased` does not run for all tests and, lacking an equivalent of the added `#touch` logic, tracks too little information to allow debugging what caused a specific leak in most cases. Relates to elastic#67502.
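For context, here is a minimal sketch of the kind of GC-time leak tracking with a `touch` history described above, loosely in the spirit of Netty's `ResourceLeakDetector`. All class and method names are illustrative stand-ins, not the actual infrastructure added by this PR.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class LeakTracker {

    private final ReferenceQueue<Object> refQueue = new ReferenceQueue<>();
    // Keep leak records strongly reachable until they are either closed or reported.
    private final Set<Leak> allLeaks = ConcurrentHashMap.newKeySet();

    /** Start tracking a resource; the caller must invoke Leak#close() when the resource is released. */
    Leak track(Object resource) {
        reportLeakedResources();
        return new Leak(resource, refQueue, allLeaks);
    }

    private void reportLeakedResources() {
        Leak leak;
        while ((leak = (Leak) refQueue.poll()) != null) {
            if (allLeaks.remove(leak)) {
                // The tracked resource was GC'd without close(): report it as a leak,
                // including the recorded access history to help find the culprit.
                System.err.println("LEAK detected, last accesses: " + leak.accessHistory());
            }
        }
    }

    static final class Leak extends WeakReference<Object> {
        private final Set<Leak> allLeaks;
        private final Deque<String> history = new ArrayDeque<>();

        private Leak(Object referent, ReferenceQueue<Object> queue, Set<Leak> allLeaks) {
            super(referent, queue);
            this.allLeaks = allLeaks;
            allLeaks.add(this);
        }

        /** Record a hint about where the resource was last touched, to aid debugging a leak. */
        synchronized void touch(String hint) {
            if (history.size() == 4) {
                history.removeFirst();
            }
            history.addLast(hint);
        }

        synchronized String accessHistory() {
            return String.join(" -> ", history);
        }

        /** Mark the resource as properly released so no leak is reported for it. */
        void close() {
            allLeaks.remove(this);
            clear();
        }
    }
}
```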
@elasticmachine update branch
Added this assertion to have an easier time debugging work on elastic#67502 and found that we were accessing `refcount == 0` bytes in the `SSLOutboundBuffer`, so I fixed that buffer to not keep references to released pages.
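To illustrate the kind of assertion referred to here, a hypothetical sketch in which reads of pooled bytes trip an assertion once the refcount has dropped to zero. The class and its fields are made up for this example; they are not the actual `SSLOutboundBuffer` or page types.

```java
// Illustrative only: a pooled page whose accessors assert that it has not been released.
final class PooledBytes {
    private final byte[] page;
    private int refCount = 1;

    PooledBytes(byte[] page) {
        this.page = page;
    }

    byte get(int index) {
        // This is the assertion that caught the SSLOutboundBuffer issue in spirit:
        // reading bytes whose pages were already released fails under -ea.
        assert hasReferences() : "accessing bytes after they were released (refCount == 0)";
        return page[index];
    }

    synchronized boolean hasReferences() {
        return refCount > 0;
    }

    synchronized void incRef() {
        assert refCount > 0 : "resurrecting already released bytes";
        refCount++;
    }

    synchronized void decRef() {
        assert refCount > 0 : "double release";
        refCount--;
        // Once refCount hits 0 the page would be handed back to the recycler here.
    }
}
```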
Pinging @elastic/es-distributed (Team:Distributed)
```java
TimeValue next = backoff.next();
logger.trace("Retry of bulk request scheduled in {} ms.", next.millis());
retryCancellable = scheduler.schedule(() -> this.execute(bulkRequestForRetry), next, ThreadPool.Names.SAME);
bulkRequestForRetry.incRef();
```
I initially thought this was inefficient relative to what we currently do: retrying will cause the whole request to be retained instead of just the requests that actually get retried. But, given where we currently are, I think this is less of an issue in the real world.
If the retry happens on a coordinating node, we already have this problem (since we use shared bytes that are tracked as one thing for the whole bulk request). If the retry happens for a replication request, then the replication request is the same unit of work as the primary bulk request received, so that seems fine as well. There is certainly room here for retaining bytes more selectively, but I don't think this PR makes us worse off.
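For readers following the refcounting discussion, here is a rough sketch of the incRef/decRef pairing around a scheduled retry, using stand-in types rather than the actual Elasticsearch `Retry`/`Scheduler` classes; the real code releases the reference via its own request lifecycle, which is not shown.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class RetryRefCountSketch {

    interface RefCounted {
        void incRef();
        void decRef();
    }

    static void scheduleRetry(ScheduledExecutorService scheduler, RefCounted request,
                              long delayMillis, Runnable execute) {
        // Take a reference so the request's pooled source bytes stay valid until the retry runs.
        request.incRef();
        scheduler.schedule(() -> {
            try {
                execute.run();
            } finally {
                // Release the reference taken for this retry attempt.
                request.decRef();
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```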
Jenkins run elasticsearch-ci/2 (unrelated ml)

Jenkins run elasticsearch-ci/1

Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
This makes the handling of bulk shard requests stop allocating a new `byte[]` for every document source. Doing so needlessly and effectively doubles the peak memory use for any bulk request not handled on a coordinating node (there the bytes are already pooled via the REST layer). With larger documents, allocating the source to unpooled `byte[]` could also lead to humongous allocations for the document source.

Unlike the coordinating-node pooling of bytes, pooling these requests per shard should behave a lot nicer in theory, since the requests (`BulkShardRequest`) get processed "in one piece" anyway and, as far as I can see, need to be retained fully for retries as well.

In some quick and dirty benchmarking I could see a significant drop in GC for the PMC benchmark run on a 3-node cluster with this, and a massive saving in the number of `byte[]` that are allocated per unit of time.
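To illustrate the difference in approach, here is a hedged sketch that uses plain `ByteBuffer` views as a stand-in for the pooled bytes machinery; the real change operates on Elasticsearch's pooled `BytesReference` types, which are not shown here.

```java
import java.nio.ByteBuffer;

final class SourceSlicingSketch {

    // Before: each document source is copied out of the shared request buffer
    // into a fresh, unpooled byte[] per document.
    static byte[] copySource(ByteBuffer requestBytes, int offset, int length) {
        byte[] copy = new byte[length]; // fresh allocation for every document
        ByteBuffer view = requestBytes.duplicate();
        view.position(offset);
        view.get(copy, 0, length);
        return copy;
    }

    // After: each document source is a slice (view) over the shared, pooled request bytes,
    // so no per-document byte[] is allocated and the underlying pages are released once,
    // when the bulk request itself is released.
    static ByteBuffer sliceSource(ByteBuffer requestBytes, int offset, int length) {
        ByteBuffer view = requestBytes.duplicate();
        view.position(offset);
        view.limit(offset + length);
        return view.slice(); // a view over the shared bytes, no copy
    }
}
```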