batching redis writes #234

Merged: 2 commits, Jan 30, 2022
Conversation

@JacobReynolds (Contributor) commented on Jan 24, 2022

  • Adds the REDIS_WRITE_CHUNK_SIZE environment variable, which allows pushes to the Redis queue to be batched

In reference to #233, I wanted to propose a solution to slow syncs between the database and Redis. I was able to get the sync time down from ~2 minutes to 1 minute when inserting ~16k items on my local machine, with a single core dedicated to the process. I think we would see greater improvements on a more powerful machine.

Would love any feedback on this option and your help in verifying it works.

@@ -106,6 +106,8 @@
REDIS_AUTH = env.str("REDIS_AUTH", default=None)
# number of items to read from Redis at a time
REDIS_CHUNK_SIZE = env.int("REDIS_CHUNK_SIZE", default=1000)
# number of items to write to Redis at a time
REDIS_WRITE_CHUNK_SIZE = env.int("REDIS_WRITE_CHUNK_SIZE", default=1000)
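As an aside, the `env.int` calls in the hunk above read integer settings from the environment with a fallback default. A stdlib-only sketch of that pattern (the `env_int` helper is illustrative, not PGSync's actual code) looks like this:

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    value = os.environ.get(name)
    return int(value) if value is not None else default

# number of items to read from Redis at a time
REDIS_CHUNK_SIZE = env_int("REDIS_CHUNK_SIZE", 1000)
# number of items to write to Redis at a time
REDIS_WRITE_CHUNK_SIZE = env_int("REDIS_WRITE_CHUNK_SIZE", 1000)
```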
@toluaina (Owner) commented on Jan 26, 2022
I think eventually the other variable should be renamed to REDIS_READ_CHUNK_SIZE

@@ -1004,10 +1009,13 @@ def poll_db(self) -> None:
os._exit(-1)

while conn.notifies:
    if len(item_queue) >= REDIS_WRITE_CHUNK_SIZE:
@toluaina (Owner):
I've been thinking about this and wondered if there is an opportunity to just have one bulk_push in this method. Apart from that looks good. Great job.

@JacobReynolds (Contributor, Author):

Yeah, I was trying to find a way to do that. My worry is that if the length of item_queue exceeds REDIS_WRITE_CHUNK_SIZE while inside the conn.notifies loop, we would exceed the max size. Did you have any ideas?
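The flush-when-full pattern being discussed can be sketched as below. `FakeQueue.bulk_push` and the plain `for` loop are stand-ins for the real Redis-backed queue and the `conn.notifies` loop; this is an assumption-laden illustration, not PGSync's actual code:

```python
REDIS_WRITE_CHUNK_SIZE = 1000

class FakeQueue:
    """Stand-in for the Redis-backed queue; records each bulk push."""
    def __init__(self):
        self.pushes = []

    def bulk_push(self, items):
        self.pushes.append(list(items))

def drain(notifications, queue):
    """Buffer payloads locally and flush in chunks of REDIS_WRITE_CHUNK_SIZE."""
    item_queue = []
    for payload in notifications:
        item_queue.append(payload)
        # flush as soon as the buffer reaches the chunk size, so it can
        # never exceed the limit even while notifications keep arriving
        if len(item_queue) >= REDIS_WRITE_CHUNK_SIZE:
            queue.bulk_push(item_queue)
            item_queue = []
    # final partial chunk after the loop drains
    if item_queue:
        queue.bulk_push(item_queue)
```

The in-loop check is what addresses the concern above: without it, a single `bulk_push` after the loop could push an arbitrarily large buffer.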

@toluaina (Owner):

That makes sense. I was trying to get some timings against the old method to put for reference.

@JacobReynolds (Contributor, Author):

I have a sample repo you can use here: https://github.com/JacobReynolds/PGSyncSpeedTest. It has both methods available if you want to use it for timing.

@toluaina (Owner):

This yields significant improvements for write operations (10K records, batch size 1K):

Slow timer 0:00:10.480808 (10.48 sec)
Fast timer 0:00:00.651109 (0.65 sec)

Nice work!
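A benchmark along these lines might look like the sketch below: N single-item pushes versus N/chunk batched pushes. A tiny fake client stands in for redis-py so the script runs without a server; against a real server each `rpush` call would be a network round trip, which is where batching wins. This is illustrative, not the actual PGSyncSpeedTest code:

```python
import json

class FakeRedis:
    """Stand-in for a Redis client; counts calls (i.e. round trips)."""
    def __init__(self):
        self.items = []
        self.calls = 0

    def rpush(self, key, *values):
        self.calls += 1          # one round trip per call
        self.items.extend(values)

records = [json.dumps({"id": i}) for i in range(10_000)]

slow = FakeRedis()
for record in records:           # one call per record
    slow.rpush("queue", record)

fast = FakeRedis()
chunk = 1_000
for i in range(0, len(records), chunk):   # one call per chunk
    fast.rpush("queue", *records[i:i + chunk])

# same data either way, ~1000x fewer round trips when batched
assert slow.items == fast.items
print(slow.calls, fast.calls)    # prints: 10000 10
```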

@toluaina (Owner) left a review:

Looks very good. Will make a release of this right away.

@toluaina toluaina merged commit a32f117 into toluaina:master Jan 30, 2022