Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Make event objects unhashable #15305

Closed
wants to merge 2 commits into from
Closed

Conversation

DMRobertson
Copy link
Contributor

Generally speaking one should look up an event by its id. Otherwise we'll fall back to object.__hash__ (the instance's address, under CPython) which is probably not what you want.

Noticed in #15240.

David Robertson added 2 commits March 21, 2023 22:23
Generally speaking one should look up an event by its id. Otherwise
we'll fall back to `object.__hash__` (the instance's address, under
CPython) which is probably not what you want.

Noticed in #15240.
@clokep
Copy link
Member

clokep commented Mar 21, 2023

I suppose an alternative would be to use the event ID as the hash since we assume they're unique anyway?

@DMRobertson
Copy link
Contributor Author

DMRobertson commented Mar 21, 2023

I suppose an alternative would be to use the event ID as the hash since we assume they're unique anyway?

Yeah, I mulled over that but erred towards being more militant.

This

to_remove = set()
for event, context in events_and_contexts:
outlier_persisted = have_persisted.get(event.event_id)
logger.debug(
"_update_outliers_txn: event=%s outlier=%s outlier_persisted=%s",
event.event_id,
event.internal_metadata.is_outlier(),
outlier_persisted,
)
# Ignore events which we haven't persisted at all
if outlier_persisted is None:
continue
to_remove.add(event)
seems to rely on EventBase being hashable.

I suppose this might be an unnecessary change. The footgun I have in mind is

  • we have an event E,
  • we have two different EventBases e1 and e2 representing E
  • we use a set to deduplicate the events---but this fails because they have different hashes.

More generally EventBase feels like it might be mutable, so shouldn't be hashable? It just smells like a bug to me, but maybe I'm overreacting?

@clokep
Copy link
Member

clokep commented Mar 21, 2023

I think they aren't really mutable but I'm honestly not sure? I think @erikjohnston had a branch somewhere that did magic to ensure only a single copy ever existed per event.

@erikjohnston
Copy link
Member

I think they aren't really mutable but I'm honestly not sure? I think @erikjohnston had a branch somewhere that did magic to ensure only a single copy ever existed per event.

I think the unsigned and internal_metadata fields are mutable

@clokep
Copy link
Member

clokep commented Sep 25, 2023

@DMRobertson Any idea if we should continue down this?

@DMRobertson
Copy link
Contributor Author

I think that

The footgun I have in mind is

* we have an event E,

* we have two different `EventBase`s `e1` and `e2` representing E

* we use a set to deduplicate the events---but this fails because they have different hashes.

is still a possibility, but looking back I'm not sure that making them unhashable is the right thing to do. I think let's drop this.

@clokep
Copy link
Member

clokep commented Sep 25, 2023

See also #14911 which notes that we cache some EventBase instances even though they're not immutable.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants