Uses fold+reduce for handling duplicate pubkeys during index generation #34011

@brooksprumo brooksprumo commented Nov 9, 2023

Problem

When generating the index, we visit duplicate pubkeys to collect the uncleaned roots and the accounts data len from those duplicates. This visitation is farmed out to multiple threads with rayon. To capture the results, we currently use a mutex, which causes undue communication and contention between the threads. Rayon already has primitives for doing these reductions without a shared lock.

Summary of Changes

Replace the mutex with an impl that uses Rayon's fold and reduce.
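
As a rough illustration of the pattern (not the code in this PR), the sketch below contrasts the two approaches. To stay dependency-free it uses `std::thread::scope` as a stand-in for rayon's thread pool, so the "fold" is a per-thread local accumulation and the "reduce" is a merge of the partial results. The `DuplicateEntry` type and both function names are hypothetical and exist only for this example.

```rust
use std::collections::HashSet;
use std::sync::Mutex;
use std::thread;

type Slot = u64;

/// Hypothetical stand-in for one visited duplicate pubkey entry.
#[derive(Clone, Copy)]
struct DuplicateEntry {
    slot: Slot,
    data_len: u64,
}

/// Mutex approach: every worker takes the shared lock for every entry,
/// so the threads contend with each other while they work.
fn collect_with_mutex(chunks: &[Vec<DuplicateEntry>]) -> (HashSet<Slot>, u64) {
    let shared: Mutex<(HashSet<Slot>, u64)> = Mutex::new((HashSet::new(), 0));
    thread::scope(|s| {
        for chunk in chunks {
            let shared = &shared;
            s.spawn(move || {
                for entry in chunk {
                    let mut guard = shared.lock().unwrap();
                    guard.0.insert(entry.slot);
                    guard.1 += entry.data_len;
                }
            });
        }
    });
    shared.into_inner().unwrap()
}

/// Fold + reduce approach: each worker folds its chunk into a private
/// accumulator (no locking), then the partial results are merged once.
fn collect_with_fold_reduce(chunks: &[Vec<DuplicateEntry>]) -> (HashSet<Slot>, u64) {
    let partials: Vec<(HashSet<Slot>, u64)> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                s.spawn(move || {
                    // "fold": thread-local accumulation
                    chunk
                        .iter()
                        .fold((HashSet::<Slot>::new(), 0u64), |(mut roots, len), entry| {
                            roots.insert(entry.slot);
                            (roots, len + entry.data_len)
                        })
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // "reduce": combine the per-thread partial results
    partials
        .into_iter()
        .fold((HashSet::new(), 0u64), |(mut roots, len), (r, l)| {
            roots.extend(r);
            (roots, len + l)
        })
}
```

With rayon itself, the same shape is typically written as `par_iter().fold(|| init, ...).reduce(|| identity, ...)`, where rayon decides how the work is split and when the partial accumulators are merged.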

Results

Using ledger-tool and a recent mnb snapshot, I compared master (as of 69ab8a8234) to this PR. Since this change only impacts index generation, I only looked at that metric, and specifically the accounts_data_len_dedup_time_us datapoint:

| branch     | time (us) |
|------------|-----------|
| master     | 1,974,473 |
| this pr    | 1,700,625 |
| difference | 273,848   |

So not huge (roughly a 14% reduction), but it's something!

@brooksprumo brooksprumo added the work in progress This isn't quite right yet label Nov 9, 2023
@brooksprumo brooksprumo self-assigned this Nov 9, 2023

codecov bot commented Nov 9, 2023

Codecov Report

Merging #34011 (7512e9a) into master (69ab8a8) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##           master   #34011     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         811      811             
  Lines      219412   219428     +16     
=========================================
+ Hits       179792   179797      +5     
- Misses      39620    39631     +11     

@brooksprumo brooksprumo force-pushed the generate-index/uncleaned-roots/reduce branch from 5c414c2 to 7512e9a Compare November 10, 2023 16:30
@brooksprumo brooksprumo removed the work in progress This isn't quite right yet label Nov 10, 2023
@brooksprumo brooksprumo marked this pull request as ready for review November 10, 2023 17:21
Comment on lines -9443 to -9444
self.accounts_index
.add_uncleaned_roots(uncleaned_roots.into_iter());

This was moved up into the block that makes uncleaned_roots. Same for the other "removed" lines just above here.

Comment on lines +9460 to +9465
accounts_data_len_dedup_timer.stop();
timings.accounts_data_len_dedup_time_us = accounts_data_len_dedup_timer.as_us();
timings.slots_to_clean = uncleaned_roots.len() as u64;

self.accounts_index
.add_uncleaned_roots(uncleaned_roots.into_iter());

@HaoranYi HaoranYi left a comment

Nice find. lgtm.

@jeffwashington jeffwashington left a comment

lgtm

@brooksprumo brooksprumo merged commit 3c71f85 into solana-labs:master Nov 10, 2023
18 checks passed
@brooksprumo brooksprumo deleted the generate-index/uncleaned-roots/reduce branch November 10, 2023 19:32