batching redis writes #234
Conversation
```diff
@@ -106,6 +106,8 @@
 REDIS_AUTH = env.str("REDIS_AUTH", default=None)
 # number of items to read from Redis at a time
 REDIS_CHUNK_SIZE = env.int("REDIS_CHUNK_SIZE", default=1000)
+# number of items to write to Redis at a time
+REDIS_WRITE_CHUNK_SIZE = env.int("REDIS_WRITE_CHUNK_SIZE", default=1000)
```
I think eventually the other variable should be renamed to `REDIS_READ_CHUNK_SIZE`.
```diff
@@ -1004,10 +1009,13 @@ def poll_db(self) -> None:
             os._exit(-1)

+        while conn.notifies:
+            if len(item_queue) >= REDIS_WRITE_CHUNK_SIZE:
```
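The diff above checks the buffer length inside the notification loop. A minimal sketch of that batching pattern (hypothetical names, not the actual PGSync code) looks like this:

```python
# Hypothetical sketch of capping the write buffer inside the notification
# loop; REDIS_WRITE_CHUNK_SIZE mirrors the new environment variable.
REDIS_WRITE_CHUNK_SIZE = 1000

def flush(item_queue, bulk_push):
    """Push everything buffered so far in a single call, then clear the buffer."""
    if item_queue:
        bulk_push(item_queue)
        item_queue.clear()

def drain(notifications, bulk_push):
    """Buffer payloads and flush in batches of at most REDIS_WRITE_CHUNK_SIZE."""
    item_queue = []
    for payload in notifications:
        item_queue.append(payload)
        # flushing inside the loop guarantees the buffer never exceeds the cap
        if len(item_queue) >= REDIS_WRITE_CHUNK_SIZE:
            flush(item_queue, bulk_push)
    # one final flush for any remainder smaller than a full chunk
    flush(item_queue, bulk_push)
```

For example, draining 2,500 notifications would produce bulk pushes of 1,000, 1,000, and 500 items.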
I've been thinking about this and wondered if there is an opportunity to just have one bulk_push in this method. Apart from that looks good. Great job.
Yeah, I was trying to find a way to do that. My worry is that if the length of `item_queue` exceeds `REDIS_WRITE_CHUNK_SIZE` while inside the `conn.notifies` loop, we would exceed the max size. Did you have any ideas?
That makes sense. I was trying to get some timings against the old method to put for reference.
I have a sample repo you can use here: https://github.com/JacobReynolds/PGSyncSpeedTest. It has both methods available if you want to use that for timing.
This yields significant improvements for write operations (10K records, batch size of 1K):

```
Slow timer 0:00:10.480808 (10.48 sec)
Fast timer 0:00:00.651109 (0.65 sec)
```

Nice work!
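The speedup comes from collapsing one Redis round trip per item into one per chunk. A toy illustration of the call-count reduction, using a hypothetical counting stand-in for a Redis client (the names `CountingClient`, `push_individually`, and `push_batched` are illustrative, not from the PR):

```python
class CountingClient:
    """Hypothetical stand-in for a Redis client that counts RPUSH calls."""
    def __init__(self):
        self.calls = 0

    def rpush(self, key, *values):
        self.calls += 1  # each call models one network round trip

def push_individually(client, items):
    """One RPUSH per item: one round trip each."""
    for item in items:
        client.rpush("queue", item)

def push_batched(client, items, chunk_size=1000):
    """One RPUSH per chunk: RPUSH is variadic, so a whole batch goes at once."""
    for i in range(0, len(items), chunk_size):
        client.rpush("queue", *items[i:i + chunk_size])

slow, fast = CountingClient(), CountingClient()
items = list(range(10_000))
push_individually(slow, items)
push_batched(fast, items)
print(slow.calls, fast.calls)  # 10000 round trips vs 10
```

With 10K records and a batch size of 1K, batching cuts 10,000 round trips down to 10, which is consistent with the order-of-magnitude difference in the timings above.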
Looks very good. Will make a release of this right away.
Adds a `REDIS_WRITE_CHUNK_SIZE` environment variable which allows pushes to the Redis queue to be batched.

In reference to #233, I wanted to propose a solution to slow syncs between the database and Redis. I was able to get the sync time down from ~2 minutes to ~1 minute when inserting ~16k items on my local machine, with a single core dedicated to the process. I think we would see greater improvements on a more powerful computer.
Would love any feedback on this option and your help in verifying it works.