This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

Parallelize build_hash_chain #1334

Open: xandkar wants to merge 8 commits into master from sk/parallelize-build_hash_chain

xandkar (Contributor) commented May 10, 2022

For a full chain resync, this reduces the build_hash_chain running time from ~65 minutes to ~11 minutes on my machine (16-core Ryzen 9 with 32 GB RAM).

Overview

High

The main idea is to separate the following operations, which the previous implementation bundled together:

  1. retrieval of blocks (can be done asynchronously)
  2. deserialization of blocks (can be parallelized)
  3. walk through block child->parent relations (the only inherently serial part of the job)

Mid

  1. asynchronously:
    1.1. enumerate all {K, V} pairs in the blocks CF
  2. in parallel:
    2.1. consume the above {K, V} pairs
    2.2. deserialize blocks (the Vs above)
    2.3. look up parents
    2.4. accumulate {Child, Parent} pairs
  3. serially:
    3.1. build a child-to-parent relations map from the above pairs
    3.2. walk the relations from the youngest given hash and trace its
         longest possible lineage up to the oldest given hash
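
Steps 1 and 2 above can be sketched roughly as follows. This is an illustration in Python, not the code in this PR: the dict-backed `store`, the JSON encoding, the `prev_hash` field, and the function names are all assumptions made for the example. A thread pool keeps the sketch self-contained; for CPU-bound deserialization in Python a process pool would be needed to get real parallelism.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def enumerate_blocks(store):
    """Step 1: lazily yield (key, value) pairs.

    Hypothetical stand-in for iterating over the blocks CF.
    """
    yield from store.items()


def child_parent_pair(kv):
    """Steps 2.1-2.3: deserialize one block and look up its parent hash."""
    block_hash, raw = kv
    block = json.loads(raw)  # JSON is an assumption standing in for the real codec
    return block_hash, block["prev_hash"]  # "prev_hash" is a hypothetical field name


def collect_pairs(store, workers=4):
    """Step 2.4: accumulate (child, parent) pairs produced by a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(child_parent_pair, enumerate_blocks(store)))


# Toy data: three blocks, each pointing at its parent.
store = {
    "h1": json.dumps({"prev_hash": "h0"}),
    "h2": json.dumps({"prev_hash": "h1"}),
    "h3": json.dumps({"prev_hash": "h2"}),
}
pairs = collect_pairs(store)
```

Because each block is deserialized independently, the workers need no coordination beyond collecting their results.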

The last step, 3.2, is essentially what the previous, serial implementation used to do, but this time all the expensive deserialization has already been done, in parallel.
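
A minimal sketch of step 3, again in Python rather than the PR's own code. The name `trace_lineage`, the oldest-first result order, and the choice to include both endpoints are assumptions for illustration. It also assumes the relations contain no cycles, which holds for a hash chain.

```python
def trace_lineage(pairs, oldest, youngest):
    """Trace the lineage of `youngest` back towards `oldest`.

    `pairs` is an iterable of (child, parent) hashes. The walk stops at
    `oldest`, or earlier if a parent is unknown, so the result is the
    longest lineage that can be traced from the given relations.
    """
    parent_of = dict(pairs)  # step 3.1: child-to-parent relations map
    chain = [youngest]
    current = youngest
    # Step 3.2: the only inherently serial part, now just cheap map lookups.
    while current != oldest and current in parent_of:
        current = parent_of[current]
        chain.append(current)
    chain.reverse()  # oldest first (an assumption about the desired order)
    return chain


pairs = [("h1", "h0"), ("h2", "h1"), ("h3", "h2")]
full = trace_lineage(pairs, "h1", "h3")
partial = trace_lineage(pairs, "unknown", "h3")
```

With the toy data, `full` covers `h1` through `h3`, while `partial` runs back as far as the relations allow because the requested oldest hash is never reached.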

Low

See code :)

The main idea is to:
1. asynchronously: enumerate all `{K, V}` pairs in the blocks CF;
2. in parallel:
    2.1 deserialize blocks;
    2.2 look up parents;
    2.3 accumulate `{Parent, Child}` pairs;
3. serially:
    3.1 build a graph from the pairs accumulated above;
    3.2 walk backwards from the youngest given hash and trace its
        longest possible lineage up to the oldest given hash.

The last step, 3.2, is essentially what the previous implementation
(that this one replaces) used to do, but this time all the expensive
deserialization has already been done, in parallel.
@xandkar xandkar force-pushed the sk/parallelize-build_hash_chain branch from cb82d65 to 9bfd163 Compare May 10, 2022 14:34
After moving from topological sorting to trace-back, I realized that things are
much simpler now. Duh!

Bonus - this is also about 30 seconds faster on my machine.
@xandkar xandkar force-pushed the sk/parallelize-build_hash_chain branch from ff86a97 to a3021f3 Compare May 14, 2022 16:45
@xandkar xandkar force-pushed the sk/parallelize-build_hash_chain branch from a3021f3 to 4a55a8d Compare May 14, 2022 16:46