Delayed processing for ProcessManager.pidToProcessInfo #321
The old "symbolize now" mechanism is no longer needed.
  if !ok {
      log.Debugf("Skip process exit handling for unknown PID %d", pid)
-     return symbolize
+     return
  }

  // Delete all entries we have for this particular PID from pid_page_to_mapping_info.
I kept this cleanup here as there's no immediate need to postpone cleaning up the eBPF map until `traceCaptureKTime >= pidExitKtime` (unlike `pidToProcessInfo`). This also speeds up execution of `ProcessedUntil` compared to having the map cleanup take place there.
  // fast enough and this particular pid is reused again by the system.
  // NOTE: Exported only for tracer.
- func (pm *ProcessManager) ProcessPIDExit(pid libpf.PID) bool {
+ func (pm *ProcessManager) ProcessPIDExit(pid libpf.PID) {
+     exitKTime := times.GetKTime()
Moved this outside the lock for improved accuracy (there's a debug log in `ProcessedUntil` that prints exit latency).
Uses ProcessedUntil mechanism to guarantee that process metadata is not discarded before all relevant trace events have been processed.
5629d4b
to
a8c3852
Compare
processmanager/processinfo.go
Outdated
-     return symbolize
+     return
  }
  if pidExited {
We don't want to attempt a repeat cleanup for the same PID, if we've previously performed it.
@@ -348,34 +347,6 @@ func (pm *ProcessManager) MaybeNotifyAPMAgent(
      return serviceName
  }

- func (pm *ProcessManager) SymbolizationComplete(traceCaptureKTime times.KTime) {
Moved to `processinfo.go` for consistency (all `pidToProcessInfo` accessors in one place), renamed to `ProcessedUntil`, and updated to also clean up `pidToProcessInfo`.
@@ -548,9 +553,6 @@ func (pm *ProcessManager) ProcessPIDExit(pid libpf.PID) bool {
              address, pid, err)
      }
  }
- delete(pm.pidToProcessInfo, pid)
This now takes place in `ProcessedUntil`, delayed until `traceCaptureKTime >= exitKTime`.
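For readers following along, here is a minimal sketch of the delayed-cleanup idea described in this comment. It is not the code from this PR: `manager`, `processPIDExit`, and `processedUntil` are hypothetical, simplified stand-ins, `KTime` and `PID` are plain integer types standing in for `times.KTime` and `libpf.PID`, and locking is left out for brevity.

```go
package main

import "fmt"

// KTime and PID are simplified stand-ins for times.KTime and libpf.PID.
type KTime int64
type PID uint32

// manager is a hypothetical, stripped-down stand-in for ProcessManager.
type manager struct {
	pidToProcessInfo map[PID]string // metadata kept until it is safe to drop
	exitEvents       map[PID]KTime  // exit timestamp recorded per PID
}

// processPIDExit records the exit time but leaves the metadata in place.
func (m *manager) processPIDExit(pid PID, exitKTime KTime) {
	if _, ok := m.exitEvents[pid]; !ok {
		m.exitEvents[pid] = exitKTime
	}
}

// processedUntil drops metadata for every PID whose recorded exit time is
// at or before the capture time of the most recently processed trace.
func (m *manager) processedUntil(traceCaptureKTime KTime) {
	for pid, exitKTime := range m.exitEvents {
		if traceCaptureKTime >= exitKTime {
			delete(m.pidToProcessInfo, pid)
			delete(m.exitEvents, pid)
		}
	}
}

func main() {
	m := &manager{
		pidToProcessInfo: map[PID]string{42: "metadata"},
		exitEvents:       map[PID]KTime{},
	}
	m.processPIDExit(42, 100)

	m.processedUntil(99) // traces captured before the exit may still arrive
	_, stillThere := m.pidToProcessInfo[42]

	m.processedUntil(100) // everything up to the exit has been processed
	_, afterCleanup := m.pidToProcessInfo[42]

	fmt.Println(stillThere, afterCleanup) // prints "true false"
}
```

The point of the sketch is only the ordering: the exit handler records a timestamp, and the metadata is removed later, once trace processing has caught up with that timestamp.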
@@ -294,33 +293,6 @@ func (pm *ProcessManager) ConvertTrace(trace *host.Trace) (newTrace *libpf.Trace
      return newTrace
  }

- // findMappingForTrace locates the mapping for a given host trace.
Moved without changes to `processinfo.go` for consistency.
if len(pm.interpreters[pid]) > 0 {
pidExited := false
info, pidExists := pm.pidToProcessInfo[pid]
if pidExists || (pm.interpreterTracerEnabled &&
Essentially the same logic as before, with these additions:
- Don't add `exitKTime` to `pm.exitEvents` if it already exists.
- Also add `exitKTime` to `pm.exitEvents` if `pm.pidToProcessInfo[pid]` exists, as we want to clean up the latter in a delayed fashion.
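The two additions listed above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `exitTracker` and `recordExit` are hypothetical names, the field types are simplified, and no locking is shown.

```go
package main

import "fmt"

// KTime and PID are simplified stand-ins for times.KTime and libpf.PID.
type KTime int64
type PID uint32

// exitTracker is a hypothetical stand-in for the relevant ProcessManager fields.
type exitTracker struct {
	interpreterTracerEnabled bool
	interpreters             map[PID][]string
	pidToProcessInfo         map[PID]struct{}
	exitEvents               map[PID]KTime
}

// recordExit reports whether an exit was already recorded for pid
// (pidExitProcessed) and whether process metadata exists (pidExists).
func (e *exitTracker) recordExit(pid PID, exitKTime KTime) (pidExitProcessed, pidExists bool) {
	_, pidExists = e.pidToProcessInfo[pid]
	if pidExists || (e.interpreterTracerEnabled && len(e.interpreters[pid]) > 0) {
		// Only the first exit time is kept; a repeat call does not overwrite it.
		if _, pidExitProcessed = e.exitEvents[pid]; !pidExitProcessed {
			e.exitEvents[pid] = exitKTime
		}
	}
	return pidExitProcessed, pidExists
}

func main() {
	tr := &exitTracker{
		interpreters:     map[PID][]string{},
		pidToProcessInfo: map[PID]struct{}{1: {}},
		exitEvents:       map[PID]KTime{},
	}
	first, _ := tr.recordExit(1, 100)
	second, _ := tr.recordExit(1, 200)
	fmt.Println(first, second, tr.exitEvents[1]) // prints "false true 100"
}
```

A caller could use the first return value to skip a repeat cleanup for the same PID.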
      continue
  }

+ delete(pm.pidToProcessInfo, pid)
Same logic as before with this single-line addition.
First look, with some comments.

- info, ok := pm.pidToProcessInfo[pid]
- if !ok {
+ if !pidExists {
To keep the global read & write lock as short as possible, the `if !pidExists {..}` part should be moved before `if pidExists || (pm.interpreterTracerEnabled && len(pm.interpreters[pid]) > 0) {..}`.
That would prevent executing `if _, pidExited = ...` in case `(pm.interpreterTracerEnabled && len(pm.interpreters[pid]) > 0)` is true.
As Tim wrote, this would alter the logic. I tried to keep as much of the original semantics as possible to avoid introducing new races. Maybe here it's possible to safely say that `if !pidExists` then it's OK not to write `exitKTime` to `pm.exitEvents`, but we'd need to carefully examine all subsystem interactions, check for race conditions, etc.
Follow-up is done in #325.
processmanager/processinfo.go
Outdated
-     return symbolize
+     return
  }
  if pidExited {
I think `pidExited` should be renamed to `pidExitProcessed` or something similar; this would make it obvious that we want to avoid duplicate work.
Renamed
pm.mu.Lock()
defer pm.mu.Unlock()

nowKTime := times.GetKTime()
log.Debugf("ProcessedUntil captureKT: %v latency: %v ms",
    traceCaptureKTime, (nowKTime-traceCaptureKTime)/1e6)

Keep the lock held for as short a time as possible:
- pm.mu.Lock()
- defer pm.mu.Unlock()
- nowKTime := times.GetKTime()
- log.Debugf("ProcessedUntil captureKT: %v latency: %v ms",
-     traceCaptureKTime, (nowKTime-traceCaptureKTime)/1e6)
+ nowKTime := times.GetKTime()
+ log.Debugf("ProcessedUntil captureKT: %v latency: %v ms",
+     traceCaptureKTime, (nowKTime-traceCaptureKTime)/1e6)
+ pm.mu.Lock()
+ defer pm.mu.Unlock()
This can affect the latency measurement, since we're timing before the lock.
Summary
- Renamed `SymbolizationComplete` to `ProcessedUntil` and moved it to `processinfo.go`
- Delayed `ProcessManager.pidToProcessInfo` cleanup

Leverages #307 to ensure that process metadata is not discarded before all relevant trace events have been processed.
Fixes #278.
You may find reviewing commit-by-commit to be simpler.