
Filter cache update reads to only read each entry once, when an entry has two events in the same refresh cycle. #5349

Open
edwbuck opened this issue Aug 2, 2024 · 4 comments · May be fixed by #5509
Labels
priority/backlog Issue is approved and in the backlog
Comments

@edwbuck
Contributor

edwbuck commented Aug 2, 2024

This is a small optimization of the db event framework for a specific scenario.

When an entry is modified twice within a polling cycle, we currently handle each modification (event) independently. This means that the cache updating read of the entry will read the entry twice, once for the first update and once for the second update.

Since the first update's read would likely return the record as it exists after the second update was applied, reading the entry once instead of twice should produce the same outcome, with one fewer record coming back from the database request.

For this to occur, we would have to take the list of the entries to be updated and deduplicate them. A number of relatively fast methods exist to do this, so the cost of deduplicating might be smaller than the cost of returning a duplicate record. However, if we ever want "count of events processed" to match "count of cache updates" exactly, this optimization will break that 1-to-1 event tracing.
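One of those fast methods is a map-based pass over the event list. A minimal sketch in Go, using illustrative type and field names rather than SPIRE's actual ones:

```go
package main

import "fmt"

// event pairs a database event ID with the entry it modified.
// These names are illustrative, not SPIRE's actual types.
type event struct {
	eventID int
	entryID string
}

// uniqueEntryIDs returns each entry ID at most once, preserving first-seen
// order, so an entry touched by several events is read only one time.
func uniqueEntryIDs(events []event) []string {
	seen := make(map[string]struct{}, len(events))
	ids := make([]string, 0, len(events))
	for _, e := range events {
		if _, ok := seen[e.entryID]; ok {
			continue // later event for the same entry: skip the extra read
		}
		seen[e.entryID] = struct{}{}
		ids = append(ids, e.entryID)
	}
	return ids
}

func main() {
	events := []event{{1, "entry-A"}, {5, "entry-B"}, {10, "entry-A"}}
	fmt.Println(uniqueEntryIDs(events)) // [entry-A entry-B]
}
```

The map lookup is constant time per event, which is the basis for the claim that deduplication should cost less than shipping a duplicate row back from the database.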

@amoore877
Member

For this to occur, we would have to take the list of the entries to be updated, and deduplicate them. A number of relatively fast methods exist to do this, so the cost to do this might be smaller than the cost of returning a duplicate record.

yeah, my initial thought was just a select unique, but I guess we have to be careful about making sure we can tie back all retrieved event IDs too. Like if 10 event IDs all touched one entity, we only really care that some time later we retrieved the data for that one entity.

like if I have

missedAgentEventsMap: {
  "1": agentAID,
  "5": agentBID,
  "10": agentAID,
}

and it's time to look up missed events, then first thing that happens (assuming something like #5342 is addressed):

select * where eventID in (1, 5, 10)

which, if successful, returns a struct essentially equivalent to the missedAgentEventsMap above:

missedEventsFound: [
  {event: "1", agentID: agentAID},
  {event: "5", agentID: agentBID},
  {event: "10", agentID: agentAID},
]

we de-dupe this into something like an inverse of the original map:

agentIDsToQueryMap: {
  agentAID: [1, 10],
  agentBID: [5],
}

next we query the agent table for each key in the query map above.
if all cache update ops are successful for agentBID, we remove 5 from missedAgentEventsMap
if all cache update ops are successful for agentAID, we remove both 1 and 10 from missedAgentEventsMap
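The walkthrough above can be sketched in Go. The map names follow the comment, but this is a hypothetical shape, not SPIRE code:

```go
package main

import "fmt"

func main() {
	// Event ID -> agent ID, as in the comment's missedAgentEventsMap.
	missedAgentEventsMap := map[int]string{1: "agentAID", 5: "agentBID", 10: "agentAID"}

	// Invert: agent ID -> all event IDs that touched that agent.
	agentIDsToQueryMap := make(map[string][]int)
	for eventID, agentID := range missedAgentEventsMap {
		agentIDsToQueryMap[agentID] = append(agentIDsToQueryMap[agentID], eventID)
	}

	// One query / cache update per agent; on success, clear every event
	// that referred to it, whether that was one event or ten.
	for agentID, eventIDs := range agentIDsToQueryMap {
		// ... query the agent table for agentID and update the cache here ...
		_ = agentID
		for _, eventID := range eventIDs {
			delete(missedAgentEventsMap, eventID)
		}
	}
	fmt.Println(len(missedAgentEventsMap)) // 0 once all agent updates succeed
}
```

Keeping the event IDs as the map values is what lets a failed cache update leave its events in missedAgentEventsMap to be retried on the next cycle.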

@azdagron azdagron added triage/in-progress Issue triage is in progress priority/backlog Issue is approved and in the backlog and removed triage/in-progress Issue triage is in progress labels Aug 6, 2024
@sorindumitru
Contributor

I think this might already be handled. There's a seenMap used during the polling:

It looks like we only poll the db the first time we see an entry/node id.

@edwbuck
Contributor Author

edwbuck commented Sep 3, 2024

@sorindumitru I wish it were already deduplicated, but a small code change will be needed to deduplicate the query. What you are seeing is the "change event" itself being deduplicated. The issue described above is that when two different change events refer to the same EntryID, the EntryID should be fetched only once.

This will involve a map and a second loop. We process the items once (as above) to see if we skip them, and add them to a fetching map instead of fetching them directly.

Then we fetch from the fetching map, which is keyed by the EntryID (not the EventID), which de-duplicates the fetching.

I'm working on code for this now, and will have a PR ready soon.
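A rough sketch of the two-loop shape described above, using hypothetical names and assuming the existing per-event skip logic stays in the first loop:

```go
package main

import "fmt"

// changeEvent mirrors the description above; field names are illustrative.
type changeEvent struct {
	EventID int
	EntryID string
}

func main() {
	events := []changeEvent{{1, "entry-A"}, {5, "entry-B"}, {10, "entry-A"}}

	// Loop 1: walk the events once, applying any per-event skip logic,
	// and record each EntryID in a fetching map instead of fetching it.
	toFetch := make(map[string]struct{})
	for _, e := range events {
		// ... existing skip checks would go here ...
		toFetch[e.EntryID] = struct{}{}
	}

	// Loop 2: fetch from the map, keyed by EntryID (not EventID), so an
	// entry named by several events is read from the database only once.
	for entryID := range toFetch {
		// ... fetch entryID and apply the cache update here ...
		_ = entryID
	}

	fmt.Println(len(toFetch)) // 2 fetches for 3 events
}
```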

@sorindumitru
Contributor

Isn't it done by entry id at the moment? So if the same entry id is seen twice in an update, we'll only fetch the entry once from the database. See for example:

if _, seen := seenMap[event.EntryID]; seen {
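In context, that guard looks roughly like the following. This is a sketch of the pattern, not SPIRE's actual polling loop:

```go
package main

import "fmt"

// pollEvent is a stand-in for the polled event type; only EntryID matters here.
type pollEvent struct {
	EntryID string
}

func main() {
	events := []pollEvent{{"entry-A"}, {"entry-B"}, {"entry-A"}}

	seenMap := make(map[string]struct{})
	fetches := 0
	for _, event := range events {
		if _, seen := seenMap[event.EntryID]; seen {
			continue // already fetched this entry during this poll
		}
		seenMap[event.EntryID] = struct{}{}
		fetches++ // ... fetch event.EntryID and update the cache here ...
	}

	fmt.Println(fetches) // 2: entry-A is only fetched once
}
```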
