
Optimize coalesce #126

Merged: 5 commits, Jul 31, 2018

Conversation

@springmeyer (Contributor) commented Jul 27, 2018

This branch optimizes the coalesce function in these ways:

  • It pre-filters context objects by relev score to reduce the size of the std::vector<Context> before sorting.
  • It reverts several calls to vector.reserve that, based on profiling, were costing more in overhead than they saved, because they over-allocated memory that was never written to. The rule of thumb should be to call reserve only when we know for sure that we will use all of the reserved memory; otherwise the cost of re-allocation during growth is lower than the cost of over-allocating up front.
  • It makes several classes movable, so that re-allocating these objects inside vectors (during growth or sorting) uses move semantics instead of copies and should therefore be faster (a rough sketch of the pre-filtering and the movable class follows this list).
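
Below is a minimal C++ sketch of the first and third bullets: pre-filtering by relev before sorting, and making the class movable. The Context fields, the filterAndSortByRelev helper, and the 0.75 keep fraction are illustrative assumptions for this sketch, not the actual carmen-cache code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for carmen-cache's Context type; the fields are illustrative.
struct Context {
    double relev = 0.0;
    std::vector<uint64_t> coverList;  // heap-allocated payload that benefits from moves

    Context() = default;
    // Defaulted, noexcept move operations let std::vector relocate Contexts
    // during growth or sorting without copying coverList. (std::vector only
    // moves elements during reallocation when the move constructor is noexcept.)
    Context(Context&&) noexcept = default;
    Context& operator=(Context&&) noexcept = default;
    Context(Context const&) = default;
    Context& operator=(Context const&) = default;
};

// Illustrative pre-filter: drop contexts whose relev falls below a fraction of
// the best relev seen, so the subsequent sort runs on a smaller vector.
inline void filterAndSortByRelev(std::vector<Context>& contexts, double keepFraction = 0.75) {
    if (contexts.empty()) return;
    double maxRelev = 0.0;
    for (auto const& c : contexts) maxRelev = std::max(maxRelev, c.relev);
    double const cutoff = maxRelev * keepFraction;
    contexts.erase(
        std::remove_if(contexts.begin(), contexts.end(),
                       [cutoff](Context const& c) { return c.relev < cutoff; }),
        contexts.end());
    std::sort(contexts.begin(), contexts.end(),
              [](Context const& a, Context const& b) { return a.relev > b.relev; });
}
```

In keeping with the reserve rule of thumb in the second bullet, the sketch never calls reserve, since the post-filter size is not known in advance.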

@apendleton @aaaandrea - I've published dev binaries for this branch but haven't taken it further. Could you take over here to:

  • Check if this works in carmen
  • See if it increases performance in the places that matter

Open questions

  • Is the relev pre-filtering okay, or could it break something? If it breaks something downstream, does that mean we are missing critical test coverage here?
  • The remaining bottleneck in local profiling was __getmatching; this PR does nothing to speed that up, and I have no ideas there yet.
  • In profiling I noticed the main event loop spending about 40% of its time copying objects from C++ to JS, while the worker threads spent about 20% of their time in __getmatching and were roughly 80% idle. I'm not sure why the threads are not busier - could memory allocations or locking in rocksdb be limiting how quickly work can be dispatched to the threads?
  • In local profiling (with the bench data @apendleton provided) I did not see carmen::ContextSortByRelev show up as a significant bottleneck the way I previously did in production (refs Speed up sorting / reduce overhead of sorting #120). Does this mean that things have changed, or that the benchmark does not represent the production load that was on the machines when we captured the traces for #120?

@springmeyer (Contributor, Author)

Check if this works in carmen

https://github.com/mapbox/carmen/pull/728

See if it increases performance in the places that matter

Leaving this up to @apendleton
