Issue Description
Current Behavior
The all_pks method in the redis-om library lacks the option to specify the count parameter for the SCAN command. As a result, it uses a default count value, causing performance issues when dealing with a large number of keys in Redis.
# Current behavior
@classmethod
async def all_pks(cls):  # type: ignore
    key_prefix = cls.make_key(cls._meta.primary_key_pattern.format(pk=""))
    # TODO: We need to decide how we want to handle the lack of
    # decode_responses=True...
    return (
        remove_prefix(key, key_prefix)
        if isinstance(key, str)
        else remove_prefix(key.decode(cls.Meta.encoding), key_prefix)
        async for key in cls.db().scan_iter(f"{key_prefix}*", _type="HASH")
    )
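For context, the underlying client's scan_iter already accepts a count hint (both aioredis 2.x, seen in the profiles below, and redis-py's asyncio client take match, count, and _type); all_pks simply does not forward it. A minimal standalone sketch of the difference, with placeholder host, port, and key prefix:

import asyncio

from redis.asyncio import Redis  # aioredis 2.x exposes the same scan_iter parameters


async def main() -> None:
    # Placeholder connection details.
    r = Redis(host="localhost", port=6379, db=0)

    # Default COUNT hint (typically 10): each SCAN round trip returns only a
    # handful of keys, so a large keyspace means many round trips.
    async for _ in r.scan_iter(match="myprefix:*", _type="HASH"):
        pass

    # COUNT=1000: the same keys are covered in far fewer round trips.
    async for _ in r.scan_iter(match="myprefix:*", _type="HASH", count=1000):
        pass


asyncio.run(main())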
Desired Behavior
I would like to propose adding an option to pass a count parameter to the all_pks method. This would allow users to customize the number of keys processed per iteration, providing better control over performance when working with Redis.
# Expected behavior
@classmethod
# we can make count an optional parameter
# to make the change non-breaking
async def all_pks(cls, count: Optional[int] = None):  # type: ignore
    key_prefix = cls.make_key(cls._meta.primary_key_pattern.format(pk=""))
    # TODO: We need to decide how we want to handle the lack of
    # decode_responses=True...
    return (
        remove_prefix(key, key_prefix)
        if isinstance(key, str)
        else remove_prefix(key.decode(cls.Meta.encoding), key_prefix)
        async for key in cls.db().scan_iter(f"{key_prefix}*", _type="HASH", count=count)
    )
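For illustration, a caller could then tune the hint per call. This is a hypothetical helper assuming the proposed signature above; Customer is a placeholder model, and all_pks is awaited first, then the returned async generator is iterated:

from typing import List


async def load_all_pks(count: int = 1000) -> List[str]:
    # Hypothetical usage of the proposed signature; Customer is a placeholder
    # HashModel subclass.
    pks: List[str] = []
    async for pk in await Customer.all_pks(count=count):
        pks.append(pk)
    return pks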
Steps to Reproduce (if applicable)
N/A
Impact
Adding this feature would improve the performance and scalability of the library when working with Redis instances that contain a large number of keys. It would give users better control over data retrieval and let them fine-tune performance to their needs.
Additional context:
I have gathered some examples that highlight the performance gains achieved by changing the count parameter; they illustrate the improvement users could see if they were able to customize count in the all_pks method. (I used pyinstrument to profile the code.)
Note:
It is important to note that the ideal count value depends on the specific Redis instance and the number of items it contains. The count value used in my examples is based on the number of items in my Redis instance and may not be universally applicable.
Test Code
from typing import Optional

from aredis_om import HashModel, get_redis_connection


class DummyModel(HashModel):
    class Meta:
        global_key_prefix = "logomatcher_orchestrator"
        model_key_prefix = "SimilarityScoreModel"
        database = get_redis_connection(
            host="redishost",
            port=6379,
            db=0,
        )


async def print_keys_with_prefix_matching(prefix: str, count: Optional[int] = None):
    async for key in DummyModel.db().scan_iter(f"{prefix}*", _type="HASH", count=count):
        print(key)
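As noted above, the ideal count depends on the instance, so it is worth sweeping a few candidate values. Here is a rough wall-clock sketch that reuses DummyModel and Optional from the test code above (the candidate values are arbitrary; the pyinstrument sessions below give more detail):

import time


async def time_scan(prefix: str, count: Optional[int] = None) -> float:
    # Time one full scan over keys matching the prefix with a given COUNT hint.
    start = time.perf_counter()
    async for _ in DummyModel.db().scan_iter(f"{prefix}*", _type="HASH", count=count):
        pass
    return time.perf_counter() - start


async def sweep(prefix: str) -> None:
    # Arbitrary candidates; the best choice depends on keyspace size and network latency.
    for count in (None, 100, 1000, 10000):
        elapsed = await time_scan(prefix, count)
        print(f"count={count}: {elapsed:.3f}s")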
Time taken with the default count value
>>> profiler = Profiler()
>>> profiler.start()
>>>
>>> asyncio.get_event_loop().run_until_complete( print_keys_with_prefix_matching("test") )
>>> profiler.stop()
<pyinstrument.session.Session object at 0x7fc9d7002a60>
>>>
>>> profiler.print()
_ ._ __/__ _ _ _ _ _/_ Recorded: 17:40:33 Samples: 57622
/_//_/// /_\ / //_// / //_'/ // Duration: 97.692 CPU time: 48.077
/ _/ v4.4.0
Program:
97.692 MainThread <thread>:140504873899840
├─ 49.604 Redis.scan_iter aioredis/client.py:2413
│ [114 frames hidden] aioredis, asyncio, <built-in>, async_...
├─ 43.834 <module> <stdin>:1
│ [330 frames hidden] <stdin>, asyncio, aioredis, async_tim...
├─ 1.468 HiredisParser.can_read aioredis/connection.py:477
│ [8 frames hidden] aioredis, async_timeout
└─ 1.124 StreamReader.read asyncio/streams.py:643
[4 frames hidden] asyncio
Time taken with count=1000
>>> import asyncio
>>> from pyinstrument import Profiler
>>>
>>> profiler = Profiler()
>>> profiler.start()
>>>
>>> asyncio.get_event_loop().run_until_complete( print_keys_with_prefix_matching("test",count=1000) )
>>> profiler.stop()
<pyinstrument.session.Session object at 0x7fc9d6fa4cd0>
>>>
>>> profiler.print()
_ ._ __/__ _ _ _ _ _/_ Recorded: 17:46:30 Samples: 580
/_//_/// /_\ / //_// / //_'/ // Duration: 0.782 CPU time: 0.471
/ _/ v4.4.0
Program:
0.781 MainThread <thread>:140504873899840
├─ 0.644 <module> <stdin>:1
│ [130 frames hidden] <stdin>, asyncio, selectors, <built-i...
└─ 0.136 Redis.scan_iter aioredis/client.py:2413
[78 frames hidden] aioredis, asyncio, <built-in>, async_...
Environment
Library version: 0.0.27
Python version: 3.9.16