
exp/lighthorizon: Unify single-process and map-reduce index builders. #4420

Closed

Conversation


@Shaptic Shaptic commented May 31, 2022

What

This unifies the index building across the map-reduce and single-process versions:

  • builder.go:BuildIndices does the work (so we get txmeta indexing "for free")
  • the redundant participantsForOperations and getLedgerKeyParticipants are gone
  • the new ProcessAccountsWithoutBackend module does the backend-less index building
  • the shouldAccountBeProcessed and shouldTransactionBeProcessed helpers are the batching code's filtering logic, pulled out for reuse

It also cleans up and abstracts away the environment variables for AWS Batch, which should make it simpler to support other platforms or even local runs.
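As a rough illustration of how the pulled-out helpers might shard work across batch jobs, here is a minimal sketch. The hashing scheme is an assumption for illustration; the real helpers in exp/lighthorizon may shard differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shouldAccountBeProcessed decides whether this map job owns a given account:
// hash the account ID and compare it, modulo the total job count, against the
// job's index (e.g. AWS_BATCH_JOB_ARRAY_INDEX). Hypothetical logic, not the
// actual exp/lighthorizon implementation.
func shouldAccountBeProcessed(accountID string, jobIndex, jobCount uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(accountID))
	return h.Sum32()%jobCount == jobIndex
}

func main() {
	// Each account lands in exactly one of the four jobs.
	for _, acct := range []string{"GAAA", "GBBB", "GCCC"} {
		for job := uint32(0); job < 4; job++ {
			if shouldAccountBeProcessed(acct, job, 4) {
				fmt.Printf("%s -> job %d\n", acct, job)
			}
		}
	}
}
```

The same predicate works for both the single-process builder (job count 1, so everything passes) and the map-reduce builder, which is what makes the unification possible.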

Why

Less code, more sharing. Related to #4403.

Known limitations

I haven't tested this on AWS Batch yet.

@Shaptic Shaptic self-assigned this May 31, 2022
@Shaptic Shaptic changed the title Lighthorizon actually parallel exp/lighthorizon: Unify single-process and map-reduce versions of index building. May 31, 2022
@Shaptic Shaptic changed the title exp/lighthorizon: Unify single-process and map-reduce versions of index building. exp/lighthorizon: Unify single-process and map-reduce index builders. May 31, 2022
@Shaptic Shaptic changed the base branch from master to lighthorizon May 31, 2022 18:22
Shaptic added 23 commits May 31, 2022 16:06
For local testing, S3 is a little suboptimal. This lets us use file:// paths
for "batching" the maps locally.
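A minimal sketch of how a file:// vs. s3:// target might be dispatched to a backend. The parsing below is illustrative only, not the actual lighthorizon connector code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseTarget splits an index target like "file:///tmp/indices" or
// "s3://bucket/prefix" into a scheme and a path, so a caller could pick a
// local-filesystem or S3 backend accordingly.
func parseTarget(target string) (scheme, path string, err error) {
	for _, s := range []string{"file", "s3"} {
		prefix := s + "://"
		if strings.HasPrefix(target, prefix) {
			return s, strings.TrimPrefix(target, prefix), nil
		}
	}
	return "", "", fmt.Errorf("unsupported index target: %q", target)
}

func main() {
	scheme, path, _ := parseTarget("file:///tmp/indices")
	fmt.Println(scheme, path) // file /tmp/indices
}
```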
If this isn't done, then FlushAccounts() will do absolutely nothing after a
Flush(), because the previous Flush() will clear the map of indices out of
memory.
According to Stack Overflow, parallel writes are thread-safe despite this not
being an explicit guarantee. This might be OS-specific, though?
We can cross that bridge if we ever get there...
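The FlushAccounts-after-Flush ordering bug mentioned above can be sketched with a toy store. This is a minimal sketch of the failure mode, not the real exp/lighthorizon store: Flush writes out and then clears the in-memory index map, so a later FlushAccounts has nothing left to enumerate.

```go
package main

import "fmt"

// store is a toy stand-in for an index store: a map from account ID to the
// checkpoints it participated in.
type store struct {
	indices map[string][]uint32
}

// Flush would persist every index to the backend, then clears the in-memory
// map to free memory.
func (s *store) Flush() {
	// ... write each index to the backend ...
	s.indices = make(map[string][]uint32)
}

// FlushAccounts enumerates the accounts currently held in memory; if Flush()
// already ran, the map is empty and this does absolutely nothing.
func (s *store) FlushAccounts() []string {
	accounts := make([]string, 0, len(s.indices))
	for acct := range s.indices {
		accounts = append(accounts, acct)
	}
	return accounts
}

func main() {
	s := &store{indices: map[string][]uint32{"GAAA": {63}}}
	fmt.Println(len(s.FlushAccounts())) // 1: accounts still in memory
	s.Flush()
	fmt.Println(len(s.FlushAccounts())) // 0: Flush cleared the map
}
```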

Shaptic commented Jun 3, 2022

Here's a bash script I used to test this locally, for posterity:

# Performs a local map-reduce indexing.
#
# Obviously, this does not offer many performance benefits over running the
# indexer using the single-process version. However, it does allow us to test it
# locally without needing to set up AWS Batch jobs.

MONOREPO="$GOPATH/src/github.com/stellar/go"
MAPREDUCE="$MONOREPO/exp/lighthorizon/index/cmd/batch"
CC_TOML="/etc/default/stellar-captive-core.toml"

# We assume the ledger exporter has already been run.
TXMETA_PATH="$HOME/workspace/txmeta-archive-bigly"
echo "Searching $TXMETA_PATH..."

LATEST_LEDGER=$(ls $TXMETA_PATH/ledgers | tail -n1)
BASE=$(ls $TXMETA_PATH/ledgers | head -n1)
START_LEDGER=$((((($BASE / 64) + 1) * 64) - 1))
COUNT=$(($LATEST_LEDGER - $START_LEDGER))

echo "Determined ledger range: [$START_LEDGER, $LATEST_LEDGER] ($COUNT ledgers)"

INDICES_PATH="$HOME/workspace/map-reduce/indices-dump"
echo "Recreating target directory: $INDICES_PATH"

# rm -rf $INDICES_PATH/G*/ $INDICES_PATH/accounts

rm -rf $INDICES_PATH
mkdir -p $INDICES_PATH

cd $MAPREDUCE/map
go build . || exit
for i in {0..3}
do
    echo "Creating map job $i"

    AWS_BATCH_JOB_ARRAY_INDEX=$i \
    BATCH_SIZE=128 \
    FIRST_CHECKPOINT=$START_LEDGER \
    TXMETA_SOURCE=file://$TXMETA_PATH \
    INDEX_TARGET=file://$INDICES_PATH/job_$i/ \
        ./map &
done

# Wait for the backgrounded map jobs to finish.
wait

echo "Map jobs are complete"

cd $MAPREDUCE/reduce
go build . || exit
for i in {0..0}
do
    echo "Creating reduce job $i"
    AWS_BATCH_JOB_ARRAY_INDEX=$i \
    REDUCE_JOB_COUNT=1 \
    MAP_JOB_COUNT=2 \
    WORKER_COUNT=2 \
    INDEX_SOURCE_ROOT=file://$INDICES_PATH/ \
    INDEX_TARGET=file://$INDICES_PATH/ \
        ./reduce &
done

# Wait for the backgrounded reduce jobs to finish.
wait

echo "Reduce jobs are complete"

We should probably turn this into an integration test of sorts at some point.
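For reference, the START_LEDGER arithmetic in the script rounds the first exported ledger up to a checkpoint boundary: Stellar history checkpoints fall on ledgers where (seq + 1) % 64 == 0 (63, 127, 191, ...), and the shell expression finds the first such ledger at or after BASE. The same computation in Go:

```go
package main

import "fmt"

// firstCheckpoint mirrors the shell arithmetic ((($BASE / 64) + 1) * 64) - 1:
// the first checkpoint ledger (a sequence where (seq+1) % 64 == 0) at or
// after base.
func firstCheckpoint(base uint32) uint32 {
	return ((base/64)+1)*64 - 1
}

func main() {
	fmt.Println(firstCheckpoint(1))   // 63
	fmt.Println(firstCheckpoint(63))  // 63
	fmt.Println(firstCheckpoint(64))  // 127
	fmt.Println(firstCheckpoint(100)) // 127
}
```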
