Is your feature request related to a problem? Please describe.
Writing to the online store (DynamoDB) currently uses a single thread and is extremely slow when writing billions of records (on the order of 12+ hours).
Describe the solution you'd like
Ability to parallelize writes to the online store across multiple threads, and ideally across multiple nodes.
Describe alternatives you've considered
I tried using multiprocessing.Pool to parallelize writes on a single node with 96 cores, but I am hitting memory limits.
Historically, I have used AWS Glue to ingest data from S3 into DynamoDB (with DynamoDB as the sink), and it works quite well.
Additional context
N/A
Sure, I can submit one once I iron out the memory issues. I'm thinking of using something like joblib, which allows switching out the parallel processing backend.
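For discussion, here is a rough sketch of one way to keep memory bounded while writing from multiple threads. It is not the project's implementation. It uses only the standard library's concurrent.futures rather than joblib or multiprocessing.Pool, and `write_batch`, `parallel_write`, `batched`, and `max_in_flight` are hypothetical names made up for illustration. The idea is to stream records lazily, cut them into small batches (25 is the per-request item limit of DynamoDB's BatchWriteItem), and cap the number of batches queued at once so the whole dataset is never held in memory.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Lazily yield lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def parallel_write(
    records: Iterable[T],
    write_batch: Callable[[List[T]], int],  # hypothetical: writes one batch, returns count written
    batch_size: int = 25,
    max_workers: int = 8,
    max_in_flight: int = 32,  # assumption: tune to available memory
) -> int:
    """Write records from a thread pool with a bounded number of pending batches."""
    written = 0
    pending = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batched(records, batch_size):
            # Block until a slot frees up, so at most `max_in_flight`
            # batches exist in memory at any time.
            if len(pending) >= max_in_flight:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    written += future.result()  # re-raises worker errors
            pending.add(pool.submit(write_batch, batch))
        # Drain whatever is still running.
        for future in pending:
            written += future.result()
    return written
```

In practice `write_batch` would wrap the actual DynamoDB call and handle retries for unprocessed items. Threads fit here because the work is I/O-bound, and the same function could run on each node over a separate shard of the input.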