
Optimize coalesce #126

Merged: 5 commits, Jul 31, 2018

Conversation

@springmeyer (Contributor) commented Jul 27, 2018

This branch optimizes the coalesce function in these ways:

  • It pre-filters context objects by relev score to reduce the size of the std::vector<Context> before sorting.
  • It reverts several calls to vector.reserve that, based on profiling, were costing more in overhead than they saved, because they over-allocated memory that was never written to. The rule of thumb should be to call reserve only when we know for sure that we will use all of the reserved memory; otherwise the cost of re-allocation during growth is lower than the cost of over-allocating up front.
  • It makes several classes movable, so that re-allocating these objects inside vectors (during growth or sorting) uses move semantics instead of copies and should therefore be faster (a rough sketch of the pre-filtering and the movable class follows this list).
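
Below is a minimal C++ sketch of the first and third bullets: pre-filtering by relev before sorting, and making the class movable. The Context fields, the filterAndSortByRelev helper, and the 0.75 keep fraction are illustrative assumptions for this sketch, not the actual carmen-cache code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for carmen-cache's Context type; the fields are illustrative.
struct Context {
    double relev = 0.0;
    std::vector<uint64_t> coverList;  // heap-allocated payload that benefits from moves

    Context() = default;
    // Defaulted, noexcept move operations let std::vector relocate Contexts
    // during growth or sorting without copying coverList. (std::vector only
    // moves elements during reallocation when the move constructor is noexcept.)
    Context(Context&&) noexcept = default;
    Context& operator=(Context&&) noexcept = default;
    Context(Context const&) = default;
    Context& operator=(Context const&) = default;
};

// Illustrative pre-filter: drop contexts whose relev falls below a fraction of
// the best relev seen, so the subsequent sort runs on a smaller vector.
inline void filterAndSortByRelev(std::vector<Context>& contexts, double keepFraction = 0.75) {
    if (contexts.empty()) return;
    double maxRelev = 0.0;
    for (auto const& c : contexts) maxRelev = std::max(maxRelev, c.relev);
    double const cutoff = maxRelev * keepFraction;
    contexts.erase(
        std::remove_if(contexts.begin(), contexts.end(),
                       [cutoff](Context const& c) { return c.relev < cutoff; }),
        contexts.end());
    std::sort(contexts.begin(), contexts.end(),
              [](Context const& a, Context const& b) { return a.relev > b.relev; });
}
```

In keeping with the reserve rule of thumb in the second bullet, the sketch never calls reserve, since the post-filter size is not known in advance.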

@apendleton @aaaandrea - I've published dev binaries for this branch but haven't taken it further. Could you take over here to:

  • Check if this works in carmen
  • See if it increases performance in the places that matter

Open questions

  • Is the relev pre-filtering okay, or could it break something? If it breaks something downstream, does that mean we are missing critical test coverage here?
  • The remaining bottleneck in local profiling was __getmatching; this PR does nothing to speed that up, and I have no ideas there yet.
  • In profiling I noticed the main event loop spending about 40% of its time copying objects from C++ to JS, while the worker threads spent about 20% of their time in __getmatching and were roughly 80% idle. I'm not sure why the threads are not busier - could memory allocations or locking in rocksdb be limiting how quickly work can be dispatched to the threads?
  • In local profiling (with the bench data @apendleton provided) I did not see carmen::ContextSortByRelev show up as a significant bottleneck the way I previously did in production (refs Speed up sorting / reduce overhead of sorting #120). Does this mean that things have changed, or that the benchmark does not represent the production load that was on the machines when we captured the traces for #120?

@springmeyer (Contributor, Author)

Check if this works in carmen

https://github.com/mapbox/carmen/pull/728

See if it increases performance in the places that matter

Leaving this up to @apendleton
