Cache statistics #34

Merged
merged 4 commits into from
Oct 31, 2023

uncle-lv
Contributor

Usage

1. Turn on statistics collection

Set stats to True to turn on statistics collection.

cache = Cache(stats=True)

2. Get a statistics snapshot

Call get_stats() to get a Stats object, which represents a snapshot of the statistics.

stats = cache.get_stats()

3. Get statistics

stats.hits
stats.misses
stats.total
stats.access
stats.hit_rate
stats.miss_rate
stats.eviction_rate

coveralls commented Oct 12, 2023

Coverage Status

coverage: 100.0%. remained the same when pulling e8d8f01 on uncle-lv:statistics into d9c6d8b on dgilland:master.

@uncle-lv
Contributor Author

This PR is for #32.

Comment on lines 245 to 246
if value == default:
    self._stats_counter.record_misses(1)
Owner

This won't work when self.default is set and the value was set to it (i.e. key is set but happens to be the default value).

I'm wondering if this would be easier to do within _get() since it will be more explicit when the key isn't set.

Contributor Author

The problem is that has(), get_ttl(), and _add() call _has(), which in turn calls _get(). Should these calls be counted?

Owner

Yes, ok to count those.
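
The concern in this thread can be illustrated with a minimal sketch. `TinyCache` is a hypothetical stand-in, not the project's implementation: it decides hit or miss inside `_get()` by key presence, so a stored value that happens to equal the default still counts as a hit, and `has()` is counted because it goes through `_get()`.

```python
class TinyCache:
    """Hypothetical stand-in for a cache; not the library's actual code."""

    def __init__(self, default=None):
        self.default = default
        self._data = {}
        self.hits = 0
        self.misses = 0

    def set(self, key, value):
        self._data[key] = value

    def _get(self, key, default=None):
        # Decide hit or miss by key presence, never by comparing values.
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return self.default if default is None else default

    def get(self, key, default=None):
        return self._get(key, default=default)

    def has(self, key):
        # Goes through _get(), so it is counted as an access as well.
        sentinel = object()
        return self._get(key, default=sentinel) is not sentinel


cache = TinyCache(default=0)
cache.set("a", 0)      # stored value happens to equal the default
cache.get("a")         # counted as a hit
cache.get("missing")   # counted as a miss
cache.has("a")         # also counted as a hit
```

Comparing the returned value against the default would have recorded the first `get("a")` as a miss; checking key presence avoids that.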

Comment on lines 205 to 206
if self._stats_counter:
    self._stats_counter.reset()
Owner

I don't think the stats should necessarily be cleared when clearing the cache (could be a use-case for wiping the keys but keeping the stats going). Instead, we can have an explicit method to reset the stats (e.g. reset_stats()).
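
A small sketch of the separation suggested here, assuming a hypothetical `CountingCache` class (the names are illustrative, apart from `reset_stats()`, which is the name floated in the comment): `clear()` wipes the keys only, and the counters are reset only on an explicit call.

```python
class CountingCache:
    """Hypothetical cache with stats that survive clear()."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return default

    def clear(self):
        # Wipes the keys only; the counters keep accumulating.
        self._data.clear()

    def reset_stats(self):
        # Explicit, separate reset of the counters.
        self.hits = 0
        self.misses = 0


cache = CountingCache()
cache.set("a", 1)
cache.get("a")                                   # hit
cache.clear()
cache.get("a")                                   # miss after the clear
stats_after_clear = (cache.hits, cache.misses)   # counters were not wiped
cache.reset_stats()
```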

Comment on lines 351 to 352
if self._stats_counter:
    self._stats_counter.record_total(1)
Owner

Not sure we need to manually inc/dec the total key count since that can be easily determined with len(self). I'm worried we may miss counting somewhere which would throw off the total.

Comment on lines 660 to 661
with self._lock:
    return Stats(counter=self._stats_counter) if self._stats_counter else None
Owner

Avoid locking if not needed:

Suggested change
- with self._lock:
-     return Stats(counter=self._stats_counter) if self._stats_counter else None
+ if not self._stats_counter:
+     return None
+ with self._lock:
+     return Stats(counter=self._stats_counter)
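
As a self-contained illustration of the early return, here is a sketch with hypothetical `Stats` and `LockedCache` classes (simplified stand-ins, not the code under review). The disabled case returns before the lock is ever acquired.

```python
import threading
from typing import Optional


class Stats:
    """Hypothetical snapshot object."""

    def __init__(self, hits: int, misses: int) -> None:
        self.hits = hits
        self.misses = misses


class LockedCache:
    """Hypothetical cache showing the check-before-lock pattern."""

    def __init__(self, enable_stats: bool = False) -> None:
        self._lock = threading.RLock()
        self._counter = {"hits": 0, "misses": 0} if enable_stats else None

    def get_stats(self) -> Optional[Stats]:
        # Cheap check first: no lock is taken when stats are disabled.
        if self._counter is None:
            return None
        with self._lock:
            return Stats(**self._counter)
```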

@@ -85,12 +88,14 @@ def __init__(
timer: t.Callable[[], T_TTL] = time.time,
default: t.Any = None,
on_delete: t.Optional[t.Callable[[t.Hashable, t.Any, RemovalCause], None]] = None,
stats: bool = False,
Owner

Suggested change
- stats: bool = False,
+ enable_stats: bool = False,

return self._total

@property
def access(self) -> int:
Owner

Suggested change
- def access(self) -> int:
+ def accesses(self) -> int:

return self._misses

@property
def total(self) -> int:
Owner

Just total might be a little ambiguous. Maybe total_entries instead.


Return 1.0 when ``access_count`` == 0.
"""
return 1.0 if self.access == 0 else self._hits / self.access
Owner

Prefer without the ternary so that it's a little easier to grok:

Suggested change
- return 1.0 if self.access == 0 else self._hits / self.access
+ if self.accesses == 0:
+     return 1.0
+ return self.hits / self.accesses

Also, stay consistent with either self._* attribute or without the leading underscore for all.
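
A runnable sketch of the guard-clause style applied to the rate properties. The `Stats` class below is hypothetical, and deriving `accesses` as hits plus misses is an assumption made for illustration; the zero-access return values follow the docstrings shown in the diff.

```python
class Stats:
    """Hypothetical snapshot; names follow the review suggestions."""

    def __init__(self, hits: int, misses: int) -> None:
        self.hits = hits
        self.misses = misses

    @property
    def accesses(self) -> int:
        # Assumption: every access is either a hit or a miss.
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        # Return 1.0 when there have been no accesses.
        if self.accesses == 0:
            return 1.0
        return self.hits / self.accesses

    @property
    def miss_rate(self) -> float:
        # Return 0.0 when there have been no accesses.
        if self.accesses == 0:
            return 0.0
        return self.misses / self.accesses
```

All attributes are read without a leading underscore here, which keeps the access style consistent as requested.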


Return 0.0 when ``access_count`` == 0.
"""
return 0.0 if self.access == 0 else self._misses / self.access
Owner

Prefer no ternary. See comment on hit_rate().


Return 1.0 when ``access_count`` == 0.
"""
return 1.0 if self.access == 0 else self._evictions / self.access
Owner

Prefer no ternary. See comment on hit_rate().

@uncle-lv
Contributor Author

I mostly agree.

But if we move the hit/miss counting into _get(), then not only get() calls are counted but also calls from the other methods (has(), get_ttl(), _add()) that go through _get(). That may skew the statistics.

@uncle-lv
Contributor Author

uncle-lv commented Oct 25, 2023

Usage

1. Turn on statistics collection

Set enable_stats to True to turn on statistics collection.

cache = Cache(enable_stats=True)

1. Enable statistics collection

Call cache.stats.enable() to enable statistics collection.

cache = Cache()
cache.stats.enable()

2. Control statistics

Disable statistics.

cache.stats.disable()

⚠️Statistics will be cleared if you call disable()!

Enable statistics.

cache.stats.enable()

Pause statistics.

cache.stats.pause()

Resume statistics.

cache.stats.resume()

Clear statistics.

cache.stats.reset()

3. Get statistics

Call cache.stats.info() to get a statistics snapshot.

stats = cache.stats.info()

Get details.

stats.hits
stats.misses
stats.total_entries
stats.accesses
stats.hit_rate
stats.miss_rate
stats.eviction_rate
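
The control methods listed above imply a small state machine. The sketch below is a hypothetical `StatsTracker`, written only to illustrate the described semantics (disabled by default, `disable()` clears the counters, pausing keeps them); the class merged in the PR may differ.

```python
class StatsTracker:
    """Hypothetical tracker illustrating enable/disable/pause/resume/reset."""

    def __init__(self) -> None:
        self._enabled = False
        self._paused = False
        self.hits = 0
        self.misses = 0

    def enable(self) -> None:
        self._enabled = True

    def disable(self) -> None:
        # Disabling also clears the counters, as the warning above says.
        self._enabled = False
        self._paused = False
        self.reset()

    def pause(self) -> None:
        self._paused = True

    def resume(self) -> None:
        self._paused = False

    def reset(self) -> None:
        self.hits = 0
        self.misses = 0

    def inc_hits(self, count: int = 1) -> None:
        # The tracker checks its own state, so callers need no extra check.
        if self._enabled and not self._paused:
            self.hits += count


tracker = StatsTracker()
tracker.inc_hits()        # ignored: not enabled yet
tracker.enable()
tracker.inc_hits()        # counted
tracker.pause()
tracker.inc_hits()        # ignored while paused
tracker.resume()
tracker.inc_hits()        # counted
hits_before_disable = tracker.hits
tracker.disable()         # clears the counters
```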

Comment on lines 653 to 655
@property
def stats(self) -> StatsTracker:
    return self._stats_tracker
Owner

Drop the @property and just have self.stats.

Contributor Author (uncle-lv, Oct 31, 2023)

It's to prevent the user from rebinding the reference to _stats_tracker (i.e., to make it read-only).

Owner

I'd like for it to be overridable.

Contributor Author

Could you give me a use case? It seems a Cache should hold one and the same StatsTracker for its whole lifetime.

Owner

I haven't quite decided what the exact approach will be but don't want it behind a property.

Comment on lines 235 to 244
- return self._get(key, default=default)
+ value = self._get(key, default=default)
+
+ return value
Owner

This change doesn't seem necessary; it's the same either way (maybe leftover from refactor?).

Contributor Author

Sorry! I forgot it.

):
self.maxsize = maxsize
self.ttl = ttl
self.timer = timer
self.default = default
self.on_delete = on_delete
self._enable_stats = enable_stats
Owner

Looks like StatsTracker is already managing the enablement and checking whether stats are enabled or not before incrementing/decrementing; can have Cache call the stats methods unconditionally instead. Also, don't need this attribute since we can do something like if enable_stats: self.stats.enable().

Contributor Author

I'll remove it.

Comment on lines 96 to 99
self._hit_count = 0
self._miss_count = 0
self._evicted_count = 0
self._total_count = 0
Owner

Prefer not to define the individual counter attributes on both StatsTracker and Stats. Can just have a reference to a Stats instance on StatsTracker that's modified instead.

self._enabled = True
self._paused = False

def _inc_hits(self, count: int) -> None:
Owner

Let's go with public methods for these in case users want to override them/add their own.
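
To show why public increment methods help, here is a sketch in which a user subclass overrides one. Both classes are hypothetical; `inc_hits` is the public form of the `_inc_hits` method from the diff, and `PerCallTracker` and `hit_calls` are invented for illustration.

```python
class StatsTracker:
    """Hypothetical base tracker with a public increment method."""

    def __init__(self) -> None:
        self.hits = 0

    def inc_hits(self, count: int) -> None:
        self.hits += count


class PerCallTracker(StatsTracker):
    """Hypothetical user subclass that also counts how often inc_hits ran."""

    def __init__(self) -> None:
        super().__init__()
        self.hit_calls = 0

    def inc_hits(self, count: int) -> None:
        # Add custom bookkeeping, then defer to the base behaviour.
        self.hit_calls += 1
        super().inc_hits(count)


tracker = PerCallTracker()
tracker.inc_hits(3)
tracker.inc_hits(2)
```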

Comment on lines 7 to 11
def __init__(self, hits: int, misses: int, evictions: int, total: int) -> None:
    self._hits = hits
    self._misses = misses
    self._evictions = evictions
    self._total = total
Owner

Match naming with def total_entries:

Suggested change
- def __init__(self, hits: int, misses: int, evictions: int, total: int) -> None:
-     self._hits = hits
-     self._misses = misses
-     self._evictions = evictions
-     self._total = total
+ def __init__(self, hits: int, misses: int, evictions: int, total_entries: int) -> None:
+     self._hits = hits
+     self._misses = misses
+     self._evictions = evictions
+     self._total_entries = total_entries

Comment on lines 356 to 357
if self._enable_stats:
    self._stats_tracker._total_count = len(self)
Owner

Wondering if instead of doing this, we pass the Cache instance to StatsTracker and then do return Stats(..., total_entries=len(self._cache)) so that the total is always accurate and we don't have to worry about syncing up cache key addition/removal events.

Contributor Author

It will lead to a circular import if StatsTracker holds the Cache instance.

Owner

circular import

Because of cross-typing? That can be bypassed by checking typing.TYPE_CHECKING and only importing when it is True in the stats module.

This blog post has a simple example: https://adamj.eu/tech/2021/05/13/python-type-hints-how-to-fix-circular-imports/

Contributor Author

Got it! Thanks.
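
A minimal sketch of the approach discussed in this thread, under stated assumptions: the module name `cache` in the guarded import is hypothetical, the classes are simplified stand-ins, and a plain dict plays the role of the cache in the usage lines. The guarded import is seen only by type checkers, and `total_entries` is read from the cache when the snapshot is taken.

```python
import typing as t

if t.TYPE_CHECKING:
    # Only evaluated by type checkers, so nothing is imported at runtime.
    # "cache" is a hypothetical module name used for illustration.
    from cache import Cache


class Stats:
    """Hypothetical snapshot object."""

    def __init__(self, hits: int, misses: int, total_entries: int) -> None:
        self.hits = hits
        self.misses = misses
        self.total_entries = total_entries


class StatsTracker:
    """Hypothetical tracker that holds a reference to its cache."""

    # The quoted annotation is never evaluated at runtime.
    def __init__(self, cache: "Cache") -> None:
        self._cache = cache
        self._hits = 0
        self._misses = 0

    def info(self) -> Stats:
        # The total is computed at snapshot time, so it cannot drift.
        return Stats(self._hits, self._misses, total_entries=len(self._cache))


# Any sized object works at runtime; a dict stands in for the cache here.
store = {"a": 1, "b": 2}
tracker = StatsTracker(store)
first_total = tracker.info().total_entries
store["c"] = 3
second_total = tracker.info().total_entries
```

Because the count is derived from `len()` on demand, there is no counter to keep in sync with key additions and removals.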

Owner

@dgilland dgilland left a comment

Awesome work! Thanks for taking the time to add this!

@dgilland dgilland merged commit 53d55c6 into dgilland:master Oct 31, 2023
12 checks passed
@uncle-lv uncle-lv deleted the statistics branch November 1, 2023 10:03
@uncle-lv uncle-lv mentioned this pull request Nov 4, 2023
9 tasks