Horizon Lite: Improve the performance and functionality of the batch-based indexer. #4566
Closed, tracked by #4571
Shaptic changed the title from "exp/lighthorizon/cmd/batch: performance optimizations for reduce" to "Horizon Lite: Optimize the performance of the indexer reduce job." (Aug 31, 2022)
Shaptic changed the title from "Horizon Lite: Optimize the performance of the indexer reduce job." to "Horizon Lite: Improve the performance and functionality of the batch-based indexer." (Aug 31, 2022)
We should also consider using something other than S3, since we may not end up using S3 in production (for cost reasons).
Context
There are several necessary improvements to the existing map-reduce batch job for index creation:
- The performance of `reduce` is low when the target/source index is remote, for example, S3: jobs don't complete, running forever and churning slowly on the account/tx merging routines.
- The `reduce` job operates on all modules, even if the `map` job only specified one module.

Suggestions
- For the `tx/` folder, skip iterating all 255 tx prefixes if the `map` job output does not have a `tx` folder. (This happens when `map` was configured not to include `transactions` in its `MODULES`.)
- Have `map` jobs write to a single on-disk volume or storage source, have `reduce` jobs merge them together on that same on-disk source, and write the merged result to the `target` index.
- Keep a `job_id:accountid->true/false` lookup; then the worker -> account -> read-all-map-jobs-for-account loop can check for account presence first and avoid iterative network round trips to the remote `source` index that would return empty responses anyway.

Acceptance Criteria
It's entirely possible that this task can/should be broken down into many sub-tasks based on the above suggestions, but the general criteria for completion should be:
- `reduce` only processes the relevant modules when the `map` job did not apply all modules, per the first suggestion above.
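The first and third suggestions (skip modules the `map` output lacks, and consult a per-job account presence lookup before going to the network) can be sketched together in Go. This is a minimal illustration, not the indexer's code: `MapOutput`, `fetchAccountIndex`, `reduceAccounts`, and `anyJobHasModule` are hypothetical names, and the remote read is simulated by incrementing a counter.

```go
package main

import "fmt"

// MapOutput is a hypothetical summary of what one map job produced.
type MapOutput struct {
	JobID    uint32
	Modules  map[string]bool // top-level folders present, e.g. "accounts", "tx"
	Accounts map[string]bool // job_id:accountid -> true/false presence lookup
}

// remoteReads counts simulated round trips to the remote source index.
var remoteReads int

// fetchAccountIndex stands in for a network read of one account's index
// from one map job's output (hypothetical; it just records the call).
func fetchAccountIndex(jobID uint32, account string) []uint32 {
	remoteReads++
	return []uint32{jobID}
}

// anyJobHasModule reports whether at least one map job wrote the module,
// so reduce can skip a whole folder (e.g. all tx prefixes) otherwise.
func anyJobHasModule(jobs []MapOutput, module string) bool {
	for _, job := range jobs {
		if job.Modules[module] {
			return true
		}
	}
	return false
}

// reduceAccounts merges per-account indexes, contacting only the jobs whose
// presence lookup says they touched the account.
func reduceAccounts(jobs []MapOutput, accounts []string) map[string][]uint32 {
	merged := make(map[string][]uint32)
	for _, account := range accounts {
		for _, job := range jobs {
			if !job.Modules["accounts"] || !job.Accounts[account] {
				continue // known empty: skip the network round trip
			}
			merged[account] = append(merged[account], fetchAccountIndex(job.JobID, account)...)
		}
	}
	return merged
}

func main() {
	jobs := []MapOutput{
		{JobID: 1, Modules: map[string]bool{"accounts": true}, Accounts: map[string]bool{"GA": true}},
		{JobID: 2, Modules: map[string]bool{"accounts": true}, Accounts: map[string]bool{"GB": true}},
		{JobID: 3, Modules: map[string]bool{"accounts": true}, Accounts: map[string]bool{"GA": true}},
	}
	if anyJobHasModule(jobs, "tx") {
		fmt.Println("merging tx/")
	} else {
		fmt.Println("skipping tx/: no map job wrote it")
	}
	merged := reduceAccounts(jobs, []string{"GA", "GB"})
	fmt.Println(merged["GA"], merged["GB"], remoteReads)
}
```

Running it prints `skipping tx/: no map job wrote it` and then `[1 3] [2] 3`: three simulated remote reads instead of the six (two accounts × three jobs) that a loop without the presence lookup would make.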