This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

serial insertion of bins into accounts index #18469

Merged
merged 1 commit into from
Jul 12, 2021

Conversation

jeffwashington

@jeffwashington jeffwashington commented Jul 7, 2021

Problem

Accounts index generation is slow. It looks like each insert carries too few items per bin for parallel insertion to be efficient. At 72M accounts and 400k slots, we end up with an average of ~11 accounts per bin per slot.
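The ~11 figure follows from dividing the account count by the slot count and then by the bin count. A minimal sketch of that arithmetic, assuming accounts are spread evenly across slots and bins (the function name is hypothetical, not from the codebase):

```rust
/// Average number of accounts that land in a single bin for a single slot,
/// assuming a perfectly even spread across slots and bins.
fn avg_accounts_per_bin_per_slot(total_accounts: u64, slots: u64, bins: u64) -> f64 {
    total_accounts as f64 / slots as f64 / bins as f64
}
```

Calling `avg_accounts_per_bin_per_slot(72_000_000, 400_000, 16)` gives 180 accounts per slot divided over 16 bins, or 11.25 accounts per bin per slot.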

Summary of Changes

Rely on parallelism at the storage level instead of within index generation. This can be tuned over time. We could collect multiple storages/slots together to reduce the cost of grabbing locks. We could also chunk multiple bins into a thread and spin up something like 4, 8, or 16 threads total instead of BINS (currently 16) threads per insert.
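As a rough illustration of the idea (not the code in this PR), the sketch below processes storages in parallel while each thread walks the bins of its storage one after another. `BinnedIndex`, `insert_storage_serially`, `generate_index`, the `u64` stand-in for a pubkey, and the modulo bin function are all hypothetical names and simplifications chosen for the example.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

const BINS: usize = 16;

type Key = u64; // stand-in for a pubkey (assumption for this sketch)
type Slot = u64;

/// Hypothetical index: one lock-protected map per bin.
struct BinnedIndex {
    bins: Vec<Mutex<HashMap<Key, Slot>>>,
}

impl BinnedIndex {
    fn new() -> Self {
        Self { bins: (0..BINS).map(|_| Mutex::new(HashMap::new())).collect() }
    }

    /// Simplified bin selection; the real index derives the bin from the pubkey.
    fn bin_of(key: Key) -> usize {
        (key % BINS as u64) as usize
    }

    /// Insert one storage's keys on the calling thread, one bin at a time.
    fn insert_storage_serially(&self, slot: Slot, keys: &[Key]) {
        // Group keys by bin so each bin's lock is taken at most once per storage.
        let mut per_bin: Vec<Vec<Key>> = vec![Vec::new(); BINS];
        for &key in keys {
            per_bin[Self::bin_of(key)].push(key);
        }
        for (bin, bin_keys) in per_bin.into_iter().enumerate() {
            if bin_keys.is_empty() {
                continue;
            }
            let mut map = self.bins[bin].lock().unwrap();
            for key in bin_keys {
                // Keep the newest slot seen for each key.
                let entry = map.entry(key).or_insert(slot);
                if *entry < slot {
                    *entry = slot;
                }
            }
        }
    }

    /// Total number of distinct keys across all bins.
    fn len(&self) -> usize {
        self.bins.iter().map(|b| b.lock().unwrap().len()).sum()
    }
}

/// Parallelism lives at the storage level: each thread handles a chunk of storages.
fn generate_index(storages: &[(Slot, Vec<Key>)], threads: usize) -> BinnedIndex {
    let index = BinnedIndex::new();
    let threads = threads.max(1);
    let chunk = ((storages.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for group in storages.chunks(chunk) {
            let index = &index;
            s.spawn(move || {
                for (slot, keys) in group {
                    index.insert_storage_serially(*slot, keys);
                }
            });
        }
    });
    index
}
```

Grouping keys by bin before locking is one way to keep lock traffic low when each bin only receives a handful of items per slot; batching several storages per lock acquisition, as suggested above, would reduce it further.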
Fixes #

@jeffwashington

jeffwashington commented Jul 7, 2021

total_us and insertion_time_us go down. Even scan_stores_us goes down. We are likely over-scheduling threads.
lemond:
this pr:
generate_index total_us= 9739623i scan_stores_us=12983168i insertion_time_us=20758566i min_bin_size=4565247i max_bin_size=4577150i total_items=73129210i
master^:
generate_index total_us=12805024i scan_stores_us=15308801i insertion_time_us=27794907i min_bin_size=4565247i max_bin_size=4577150i total_items=73129210i

wiggins:
this pr:
generate_index total_us=14764011i scan_stores_us=13823149i insertion_time_us=37207121i min_bin_size=3888540i max_bin_size=3896996i total_items=62272320i
master^:
generate_index total_us=16692284i scan_stores_us=17065313i insertion_time_us=51326983i min_bin_size=3888540i max_bin_size=3896996i total_items=62272320i

@jeffwashington jeffwashington requested a review from sakridge July 7, 2021 04:35
@jeffwashington jeffwashington marked this pull request as ready for review July 7, 2021 04:36
@jeffwashington jeffwashington force-pushed the copies9 branch 3 times, most recently from 65797e7 to e848039 Compare July 8, 2021 04:09
@codecov

codecov bot commented Jul 10, 2021

Codecov Report

Merging #18469 (4d8e9a7) into master (c2e7d39) will increase coverage by 0.0%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #18469   +/-   ##
=======================================
  Coverage    82.7%    82.7%           
=======================================
  Files         440      440           
  Lines      123905   123900    -5     
=======================================
+ Hits       102525   102549   +24     
+ Misses      21380    21351   -29     

@jeffwashington jeffwashington merged commit f5ff4b2 into solana-labs:master Jul 12, 2021
jeffwashington added a commit to jeffwashington/solana that referenced this pull request Jul 12, 2021
mergify bot pushed a commit that referenced this pull request Jul 12, 2021
mergify bot added a commit that referenced this pull request Jul 12, 2021
(cherry picked from commit f5ff4b2)

Co-authored-by: Jeff Washington (jwash) <75863576+jeffwashington@users.noreply.github.com>
@brooksprumo brooksprumo mentioned this pull request Aug 23, 2021