exp/lighthorizon: Unify single-process and map-reduce index builders. #4420
Closed: Shaptic wants to merge 27 commits into stellar:lighthorizon from Shaptic:lighthorizon_actually-parallel
Shaptic changed the title from "Lighthorizon actually parallel" to "exp/lighthorizon: Unify single-process and map-reduce versions of index building." on May 31, 2022
Shaptic changed the title from "exp/lighthorizon: Unify single-process and map-reduce versions of index building." to "exp/lighthorizon: Unify single-process and map-reduce index builders." on May 31, 2022
For local testing, S3 is a little suboptimal. This lets us use file:// paths for "batching" the maps locally.
If this isn't done, then FlushAccounts() will do absolutely nothing after a Flush(), because the previous Flush() will clear the map of indices out of memory.
According to Stack Overflow, parallel writes are thread-safe despite it not being an explicit guarantee. This might be OS-specific, though? Cross that bridge if we ever get there...
Shaptic force-pushed the lighthorizon_actually-parallel branch from 75628ca to f62d08c on June 3, 2022 00:39
Here's a bash script I used to test this locally, for posterity:

```bash
# Performs a local map-reduce indexing.
#
# Obviously, this does not offer many performance benefits over running the
# indexer using the single-process version. However, it does allow us to test it
# locally without needing to set up AWS Batch jobs.
MONOREPO="$GOPATH/src/github.com/stellar/go"
MAPREDUCE="$MONOREPO/exp/lighthorizon/index/cmd/batch"
CC_TOML="/etc/default/stellar-captive-core.toml"

# We assume the ledger exporter has already been run.
TXMETA_PATH="$HOME/workspace/txmeta-archive-bigly"
echo "Searching $TXMETA_PATH..."
LATEST_LEDGER=$(ls $TXMETA_PATH/ledgers | tail -n1)
BASE=$(ls $TXMETA_PATH/ledgers | head -n1)
START_LEDGER=$((((($BASE / 64) + 1) * 64) - 1))
COUNT=$(($LATEST_LEDGER - $START_LEDGER))
echo "Determined ledger range: [$START_LEDGER, $LATEST_LEDGER] ($COUNT ledgers)"

INDICES_PATH="$HOME/workspace/map-reduce/indices-dump"
echo "Recreating target directory: $INDICES_PATH"
# rm -rf $INDICES_PATH/G*/ $INDICES_PATH/accounts
rm -rf $INDICES_PATH
mkdir -p $INDICES_PATH

cd $MAPREDUCE/map
go build . || exit
for i in {0..3}
do
    echo "Creating map job $i"
    AWS_BATCH_JOB_ARRAY_INDEX=$i \
    BATCH_SIZE=128 \
    FIRST_CHECKPOINT=$START_LEDGER \
    TXMETA_SOURCE=file://$TXMETA_PATH \
    INDEX_TARGET=file://$INDICES_PATH/job_$i/ \
    ./map &
done

while :
do
    COUNT=$(ps aux | grep -E './map$' | wc -l)
    if [[ $COUNT -eq "0" ]]; then
        break
    fi
    sleep 1
done
echo "Map jobs are complete"

cd $MAPREDUCE/reduce
go build . || exit
for i in {0..0}
do
    echo "Creating reduce job $i"
    AWS_BATCH_JOB_ARRAY_INDEX=$i \
    REDUCE_JOB_COUNT=1 \
    MAP_JOB_COUNT=2 \
    WORKER_COUNT=2 \
    INDEX_SOURCE_ROOT=file://$INDICES_PATH/ \
    INDEX_TARGET=file://$INDICES_PATH/ \
    ./reduce &
done

while :
do
    COUNT=$(ps aux | grep -E './reduce$' | wc -l)
    if [[ $COUNT -eq "0" ]]; then
        break
    fi
    sleep 1
done
echo "Reduce jobs are complete"
```

We should probably turn this into an integration test of sorts at some point.
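As a possible first step toward that integration test, two pieces of the script above could be pulled into small functions. This is only a sketch: `next_checkpoint` and `wait_all` are hypothetical names that do not appear in the PR. `next_checkpoint` mirrors the `START_LEDGER` arithmetic, and `wait_all` replaces the `ps` polling loops with the shell's built-in `wait`, which also surfaces a failed job's exit status.

```bash
# Hypothetical helper: round a ledger sequence up to the nearest ledger of the
# form 64k - 1. Assumption: this is what the START_LEDGER line in the script
# computes, with 64 being the checkpoint interval it hard-codes.
next_checkpoint() {
    echo $(( ( ($1 / 64) + 1 ) * 64 - 1 ))
}

# Hypothetical helper: wait for every PID passed as an argument and return 1
# if any of those background jobs exited with a non-zero status.
wait_all() {
    local failed=0 pid
    for pid in "$@"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

In the script, each `./map &` could be followed by `pids="$pids $!"`, and the polling loop replaced with `wait_all $pids || exit 1`. Unlike counting processes with `ps`, this stops the run when a map job fails instead of moving on to the reduce step.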
What
This unifies the index building across the map-reduce and single-process versions:
- builder.go:BuildIndices does the work (so we get txmeta indexing "for free")
- participantsForOperations and getLedgerKeyParticipants are gone
- the ProcessAccountsWithoutBackend module does the backend-less index building
- the shouldAccountBeProcessed and shouldTransactionBeProcessed helpers are pulled-out versions of the batching code

It also cleans up and abstracts away the environment variables for AWS Batch, which should make it simpler to support other platforms or even local runs.
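To illustrate the idea of abstracting the platform's environment variables, here is a minimal shell sketch of a launcher-side wrapper. It is not the PR's Go code. `AWS_BATCH_JOB_ARRAY_INDEX` is the variable AWS Batch sets for array jobs (and the one the test script sets by hand); `JOB_INDEX` and `resolve_job_index` are hypothetical names introduced here for a platform-neutral override.

```bash
# Hypothetical helper: resolve this job's array index.
# Precedence (an assumption for this sketch, not taken from the PR):
#   1. JOB_INDEX                  - hypothetical platform-neutral override
#   2. AWS_BATCH_JOB_ARRAY_INDEX  - set by AWS Batch for array jobs
#   3. 0                          - default for a single local run
resolve_job_index() {
    local index="${JOB_INDEX:-${AWS_BATCH_JOB_ARRAY_INDEX:-0}}"
    case "$index" in
        ''|*[!0-9]*)
            # Reject anything that is not a non-negative integer.
            echo "invalid job index: $index" >&2
            return 1
            ;;
    esac
    echo "$index"
}
```

With a wrapper like this, a local run or another scheduler only needs to set one variable, and the rest of the job does not need to know which platform launched it.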
Why
Less code, more sharing. Related to #4403.
Known limitations
I haven't tested this on AWS Batch yet.