-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache statistics #34
Cache statistics #34
Conversation
This pr is for #32 |
src/cacheout/cache.py
Outdated
if value == default: | ||
self._stats_counter.record_misses(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work when self.default
is set and the value was set to it (i.e. key is set but happens to be the default value).
I'm wondering if this would be easier to do within _get()
since it will be more explicit when the key isn't set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is has()
get_ttl()
_add()
calls _has()
then _has()
calls _get()
. Should these calls be counted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ok to count those.
src/cacheout/cache.py
Outdated
if self._stats_counter: | ||
self._stats_counter.reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the stats should necessarily be cleared when clearing the cache (could be a use-case for wiping the keys but keeping the stats going). Instead, we can have an explicit method to reset the stats (e.g. reset_stats()
).
src/cacheout/cache.py
Outdated
if self._stats_counter: | ||
self._stats_counter.record_total(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we need to manually inc/dec the total key count since that can be easily determined with len(self)
. I'm worried we may miss counting somewhere which would throw off the total.
src/cacheout/cache.py
Outdated
with self._lock: | ||
return Stats(counter=self._stats_counter) if self._stats_counter else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid locking if not needed:
with self._lock: | |
return Stats(counter=self._stats_counter) if self._stats_counter else None | |
if not self._stats_counter: | |
return None | |
with self._lock: | |
return Stats(counter=self._stats_counter) |
src/cacheout/cache.py
Outdated
@@ -85,12 +88,14 @@ def __init__( | |||
timer: t.Callable[[], T_TTL] = time.time, | |||
default: t.Any = None, | |||
on_delete: t.Optional[t.Callable[[t.Hashable, t.Any, RemovalCause], None]] = None, | |||
stats: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stats: bool = False, | |
enable_stats: bool = False, |
src/cacheout/stats.py
Outdated
return self._total | ||
|
||
@property | ||
def access(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def access(self) -> int: | |
def accesses(self) -> int: |
src/cacheout/stats.py
Outdated
return self._misses | ||
|
||
@property | ||
def total(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just total
might be a little ambiguous. Maybe total_entries
instead.
src/cacheout/stats.py
Outdated
|
||
Return 1.0 when ``access_count`` == 0. | ||
""" | ||
return 1.0 if self.access == 0 else self._hits / self.access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer without the ternary so that it's a little easier to grok:
return 1.0 if self.access == 0 else self._hits / self.access | |
if self.accesses == 0: | |
return 1.0 | |
return self.hits / self.access |
Also, stay consistent with either self._*
attribute or without the leading underscore for all.
src/cacheout/stats.py
Outdated
|
||
Return 0.0 when ``access_count`` == 0. | ||
""" | ||
return 0.0 if self.access == 0 else self._misses / self.access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer no ternary. See comment on hit_rate()
.
src/cacheout/stats.py
Outdated
|
||
Return 1.0 when ``access_count`` == 0. | ||
""" | ||
return 1.0 if self.access == 0 else self._evictions / self.access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer no ternary. See comment on hit_rate()
.
I almost agree. But if we move hits/misses count into |
Usage
|
src/cacheout/cache.py
Outdated
@property | ||
def stats(self) -> StatsTracker: | ||
return self._stats_tracker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the @property
and just have self.stats
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to prevent the user from modifying the reference of _stats_tracker
(make it read-only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like for it to to be overridable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you offer me a case? It seems one Cache
should hold only one and same StatsTracker
in one lifetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't quite decided what the exact approach will be but don't want it behind a property.
src/cacheout/cache.py
Outdated
return self._get(key, default=default) | ||
value = self._get(key, default=default) | ||
|
||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change doesn't seem necessary; it's the same either way (maybe leftover from refactor?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry! I forgot it.
src/cacheout/cache.py
Outdated
): | ||
self.maxsize = maxsize | ||
self.ttl = ttl | ||
self.timer = timer | ||
self.default = default | ||
self.on_delete = on_delete | ||
self._enable_stats = enable_stats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like StatsTracker
is already managing the enablement and checking whether stats are enabled or not before incrementing/decrementing; can have Cache
call the stats methods unconditionally instead. Also, don't need this attribute since we can do something like if enable_stats: self.stats.enable()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove it.
src/cacheout/stats.py
Outdated
self._hit_count = 0 | ||
self._miss_count = 0 | ||
self._evicted_count = 0 | ||
self._total_count = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer not to define the individual counter attributes on both StatsTracker
and Stats
. Can just have a reference to a Stats
instance on StatsTracker
that's modified instead.
src/cacheout/stats.py
Outdated
self._enabled = True | ||
self._paused = False | ||
|
||
def _inc_hits(self, count: int) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with public methods for these in case users want to override them/add their own.
src/cacheout/stats.py
Outdated
def __init__(self, hits: int, misses: int, evictions: int, total: int) -> None: | ||
self._hits = hits | ||
self._misses = misses | ||
self._evictions = evictions | ||
self._total = total |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match naming with def total_entries
:
def __init__(self, hits: int, misses: int, evictions: int, total: int) -> None: | |
self._hits = hits | |
self._misses = misses | |
self._evictions = evictions | |
self._total = total | |
def __init__(self, hits: int, misses: int, evictions: int, total_entries: int) -> None: | |
self._hits = hits | |
self._misses = misses | |
self._evictions = evictions | |
self._total_entries = total_entries |
src/cacheout/cache.py
Outdated
if self._enable_stats: | ||
self._stats_tracker._total_count = len(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if instead of doing this, we pass the Cache
instance to StatsTracker
and then do return Stats(..., total_entries=len(self._cache))
so that the total is always accurate and we don't have to worry about syncing up cache key addition/removal events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will lead to a circular import if StatsTracker
holds the Cache
instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
circular import
Because of cross-typing? That can be bypassed by checking typing.TYPE_CHECKING
and only importing when True
in stats module.
This blog as a simple example: https://adamj.eu/tech/2021/05/13/python-type-hints-how-to-fix-circular-imports/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! Thanks for taking the time to add this!
Usage
1.Turn on statistics collection
Set
stats
toTrue
to turn on statistics collection.2.Get statistics snapshot
Call
get_stats()
to get aStats
object, what represents a snapshot of statistics.3.Get statistics