Support parallel writing to online store #2421

Closed
npow opened this issue Mar 18, 2022 · 3 comments
Labels
kind/feature New feature or request priority/p2

Comments

@npow

npow commented Mar 18, 2022

Is your feature request related to a problem? Please describe.
Writing to the online store (DynamoDB) currently uses a single thread, so writing billions of records takes extremely long (on the order of 12+ hours).

Describe the solution you'd like
Ability to parallelize writes to online store across multiple threads, ideally multiple nodes.

Describe alternatives you've considered
I tried using multiprocessing.Pool to parallelize writes on a single node with 96 cores, but I am hitting memory limits.
Historically, I have used AWS Glue to ingest data from S3 into DynamoDB (with DynamoDB as the sink), and it works quite well.
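
For illustration only, here is a minimal sketch of one way to parallelize batched writes across threads while keeping memory bounded. It is not Feast's implementation. `parallel_write`, `chunked`, and the `write_batch` callback are hypothetical names; `write_batch` stands in for whatever actually writes one batch to the store and is assumed to return the number of rows written. The default batch size of 25 matches the DynamoDB BatchWriteItem item limit; the other defaults are arbitrary.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(rows: Iterable, size: int) -> Iterator[List]:
    """Yield lists of at most `size` rows without materializing the whole input."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def parallel_write(
    rows: Iterable,
    write_batch: Callable[[List], int],  # hypothetical: writes one batch, returns row count
    batch_size: int = 25,
    max_workers: int = 8,
    max_in_flight: int = 32,
) -> int:
    """Submit batches to a thread pool, capping how many are queued at once."""
    written = 0
    pending = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(rows, batch_size):
            pending.add(pool.submit(write_batch, batch))
            # Block once the cap is reached so the input is consumed lazily.
            if len(pending) >= max_in_flight:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                written += sum(f.result() for f in done)
        # Drain whatever is still running; result() re-raises worker errors.
        written += sum(f.result() for f in pending)
    return written
```

Because only `max_in_flight` batches exist at any time, memory use depends on the batch size and the cap rather than on the total number of records. Threads suit this case under the assumption that the write is I/O-bound.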

Additional context
N/A

@npow npow added the kind/feature New feature or request label Mar 18, 2022
@woop
Member

woop commented Mar 21, 2022

Thanks for raising this issue @npow!

Ability to parallelize writes to online store across multiple threads, ideally multiple nodes.

Would you be open to submitting a pull request to implement this functionality?

@npow
Author

npow commented Mar 21, 2022

Sure, I can submit one once I iron out the memory issues. I'm thinking of using something like joblib, which allows switching out the parallel processing backend.
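
As a rough sketch of what the joblib idea could look like (not the eventual pull request): `write_batch` and `write_all` are hypothetical placeholders, and the real write call to the online store is stubbed out. The `backend` argument of `joblib.Parallel` is what makes the execution backend swappable.

```python
from typing import Iterable, List

from joblib import Parallel, delayed


def write_batch(batch: List) -> int:
    """Hypothetical stand-in for writing one batch to the online store."""
    # A real implementation would issue the store write here.
    return len(batch)


def write_all(batches: Iterable[List], backend: str = "threading", n_jobs: int = 4) -> int:
    """Run write_batch over all batches with the chosen joblib backend."""
    results = Parallel(n_jobs=n_jobs, backend=backend)(
        delayed(write_batch)(batch) for batch in batches
    )
    return sum(results)
```

Passing `backend="loky"` instead would run the same code in worker processes, at the cost of pickling each batch.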

@adchia
Collaborator

adchia commented Apr 21, 2022

Moving discussions to the linked issue!

@adchia adchia closed this as completed Apr 21, 2022