
Filter cache update reads to only read each entry once, when an entry has two events in the same refresh cycle. #5349

Open
edwbuck opened this issue Aug 2, 2024 · 4 comments · May be fixed by #5509
Labels
priority/backlog Issue is approved and in the backlog
Comments

@edwbuck
Contributor

edwbuck commented Aug 2, 2024

This is a small optimization of the db event framework for a specific scenario.

When an entry is modified twice within a polling cycle, we currently handle each modification (event) independently. This means that the cache updating read of the entry will read the entry twice, once for the first update and once for the second update.

Since the first update's read would likely return the record as it exists after the second update was applied, reading the entry once instead of twice should produce the same outcome, with one fewer record coming back from the database request.

For this to occur, we would have to take the list of the entries to be updated and deduplicate them. A number of relatively fast methods exist to do this, so the cost of deduplicating might be smaller than the cost of returning a duplicate record. However, if we ever want "count of events processed" to match "count of cache updates" exactly, this optimization will break that 1-to-1 event tracing.
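One of those fast methods is a map-based pass over the event list. A minimal sketch in Go, using illustrative type and field names rather than SPIRE's actual ones:

```go
package main

import "fmt"

// event pairs a database event ID with the entry it modified.
// These names are illustrative, not SPIRE's actual types.
type event struct {
	eventID int
	entryID string
}

// uniqueEntryIDs returns each entry ID at most once, preserving first-seen
// order, so an entry touched by several events is read only one time.
func uniqueEntryIDs(events []event) []string {
	seen := make(map[string]struct{}, len(events))
	ids := make([]string, 0, len(events))
	for _, e := range events {
		if _, ok := seen[e.entryID]; ok {
			continue // later event for the same entry: skip the extra read
		}
		seen[e.entryID] = struct{}{}
		ids = append(ids, e.entryID)
	}
	return ids
}

func main() {
	events := []event{{1, "entry-A"}, {5, "entry-B"}, {10, "entry-A"}}
	fmt.Println(uniqueEntryIDs(events)) // [entry-A entry-B]
}
```

The map lookup is constant time per event, which is the basis for the claim that deduplication should cost less than shipping a duplicate row back from the database.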

@amoore877
Member

For this to occur, we would have to take the list of the entries to be updated, and deduplicate them. A number of relatively fast methods exist to do this, so the cost to do this might be smaller than the cost of returning a duplicate record.

yeah, my initial thought was just a select unique, but I guess we have to be careful about making sure we can tie back all retrieved event IDs too. Like if 10 event IDs all touched one entity, we only really care that some time later we retrieved the data for that one entity.

like if I have

missedAgentEventsMap: {
  "1": agentAID,
  "5": agentBID,
  "10": agentAID,
}

and it's time to look up missed events, then first thing that happens (assuming something like #5342 is addressed):

select * where eventID in (1, 5, 10)

which, if successful, returns a struct essentially equivalent to the missedAgentEventsMap above:

missedEventsFound: [
  {event: "1", agentID: agentAID},
  {event: "5", agentID: agentBID},
  {event: "10", agentID: agentAID},
]

we de-dupe this into something like an inverse of the original map:

agentIDsToQueryMap: {
  agentAID: [1, 10],
  agentBID: [5],
}

next we query the agent table for each key in the query map above.
if all cache update ops are successful for agentBID, we remove 5 from missedAgentEventsMap
if all cache update ops are successful for agentAID, we remove both 1 and 10 from missedAgentEventsMap
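The walkthrough above can be sketched in Go. The map names follow the comment, but this is a hypothetical shape, not SPIRE code:

```go
package main

import "fmt"

func main() {
	// Event ID -> agent ID, as in the comment's missedAgentEventsMap.
	missedAgentEventsMap := map[int]string{1: "agentAID", 5: "agentBID", 10: "agentAID"}

	// Invert: agent ID -> all event IDs that touched that agent.
	agentIDsToQueryMap := make(map[string][]int)
	for eventID, agentID := range missedAgentEventsMap {
		agentIDsToQueryMap[agentID] = append(agentIDsToQueryMap[agentID], eventID)
	}

	// One query / cache update per agent; on success, clear every event
	// that referred to it, whether that was one event or ten.
	for agentID, eventIDs := range agentIDsToQueryMap {
		// ... query the agent table for agentID and update the cache here ...
		_ = agentID
		for _, eventID := range eventIDs {
			delete(missedAgentEventsMap, eventID)
		}
	}
	fmt.Println(len(missedAgentEventsMap)) // 0 once all agent updates succeed
}
```

Keeping the event IDs as the map values is what lets a failed cache update leave its events in missedAgentEventsMap to be retried on the next cycle.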

@azdagron azdagron added triage/in-progress Issue triage is in progress priority/backlog Issue is approved and in the backlog and removed triage/in-progress Issue triage is in progress labels Aug 6, 2024
@sorindumitru
Contributor

I think this might already be handled. There's a seenMap used during the polling:

It looks like we only poll the db the first time we see an entry/node id.

@edwbuck
Contributor Author

edwbuck commented Sep 3, 2024

@sorindumitru I wish it were already deduplicated, but a small code change will be needed to deduplicate the query. What you are seeing is the "change event" itself being deduplicated. The issue described above is that when two different change events refer to the same EntryID, the EntryID should be fetched only once.

This will involve a map and a second loop. We process the items once (as above) to see if we skip them, and add them to a fetching map instead of fetching them directly.

Then we fetch from the fetching map, which is keyed by the EntryID (not the EventID), which de-duplicates the fetching.

I'm working on code for this now, and will have a PR ready soon.
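A rough sketch of the two-loop shape described above, using hypothetical names and assuming the existing per-event skip logic stays in the first loop:

```go
package main

import "fmt"

// changeEvent mirrors the description above; field names are illustrative.
type changeEvent struct {
	EventID int
	EntryID string
}

func main() {
	events := []changeEvent{{1, "entry-A"}, {5, "entry-B"}, {10, "entry-A"}}

	// Loop 1: walk the events once, applying any per-event skip logic,
	// and record each EntryID in a fetching map instead of fetching it.
	toFetch := make(map[string]struct{})
	for _, e := range events {
		// ... existing skip checks would go here ...
		toFetch[e.EntryID] = struct{}{}
	}

	// Loop 2: fetch from the map, keyed by EntryID (not EventID), so an
	// entry named by several events is read from the database only once.
	for entryID := range toFetch {
		// ... fetch entryID and apply the cache update here ...
		_ = entryID
	}

	fmt.Println(len(toFetch)) // 2 fetches for 3 events
}
```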

@sorindumitru
Contributor

Isn't it done by entry id at the moment? So if the same entry id is seen twice in an update, we'll only fetch the entry once from the database. See for example:

if _, seen := seenMap[event.EntryID]; seen {
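In context, that guard looks roughly like the following. This is a sketch of the pattern, not SPIRE's actual polling loop:

```go
package main

import "fmt"

// pollEvent is a stand-in for the polled event type; only EntryID matters here.
type pollEvent struct {
	EntryID string
}

func main() {
	events := []pollEvent{{"entry-A"}, {"entry-B"}, {"entry-A"}}

	seenMap := make(map[string]struct{})
	fetches := 0
	for _, event := range events {
		if _, seen := seenMap[event.EntryID]; seen {
			continue // already fetched this entry during this poll
		}
		seenMap[event.EntryID] = struct{}{}
		fetches++ // ... fetch event.EntryID and update the cache here ...
	}

	fmt.Println(fetches) // 2: entry-A is only fetched once
}
```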
