[EBPF-357] Add GenericMap supporting batch lookup #21738
Conversation
Bloop Bleep... Dogbot Here

Regression Detector Results

Run ID: ff46f4a9-1aae-408f-b2ad-d20a4ca0b264

Performance changes are noted in the perf column of each table:

No significant changes in experiment optimization goals

Confidence level: 90.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | file_to_blackhole | % cpu utilization | +1.66 | [-4.97, +8.29] |
| ➖ | file_tree | memory utilization | +0.74 | [+0.64, +0.85] |
| ➖ | idle | memory utilization | +0.23 | [+0.20, +0.26] |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | file_to_blackhole | % cpu utilization | +1.66 | [-4.97, +8.29] |
| ➖ | file_tree | memory utilization | +0.74 | [+0.64, +0.85] |
| ➖ | process_agent_real_time_mode | memory utilization | +0.25 | [+0.22, +0.28] |
| ➖ | idle | memory utilization | +0.23 | [+0.20, +0.26] |
| ➖ | trace_agent_msgpack | ingress throughput | +0.05 | [+0.03, +0.07] |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.00 | [-0.03, +0.03] |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.06, +0.06] |
| ➖ | trace_agent_json | ingress throughput | -0.03 | [-0.07, -0.00] |
| ➖ | process_agent_standard_check | memory utilization | -0.05 | [-0.10, +0.01] |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -0.25 | [-0.32, -0.19] |
| ➖ | process_agent_standard_check_with_stats | memory utilization | -0.47 | [-0.52, -0.42] |
| ➖ | otel_to_otel_logs | ingress throughput | -1.66 | [-2.39, -0.93] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
(force-pushed from ab9ae3d to 8cb116c)
initial review
pkg/collector/corechecks/ebpf/probe/tcpqueuelength/tcp_queue_length.go
(force-pushed from 82c92f1 to 1c2a34e)
cred
pkg/ebpf/generic_map.go
// Important! If we pass here g.keys/g.values, Go will create a copy of the slice instance
// and will generate extra allocations. I am not entirely sure why it is doing that.
g.currentBatchSize, g.err = g.m.BatchLookup(&g.cursor, g.keysCopy, g.valuesCopy, nil)
I think it is fine to provide the original keys and values; the slice itself is a very slim struct that just holds a pointer to the underlying buffer. The reason Go allocates when it is passed by value is that the slice is immutable in Go.

Moreover, have you seen this comment in cilium/ebpf for the BatchLookup function?
Also, I am a bit concerned about the next lines in that comment.
Yes, the reason I did the copy first was mostly to have 0 allocations per iteration; Bryce told me to take care with that. It's true that it's not much, but I think this approach gives better performance and memory usage (only one copy is done instead of one per batch). Regarding the first comment, I think the tests cover all the cases; it's true that if you pass a pointer (e.g. &g.keysCopy) it complains about it.

For the second comment, it is indeed a little bit weird. For example, a batch might return fewer items than requested even if it's not the last one. It's also unintuitive that the return code for the last batch is ErrKeyNotExists, although in my tests I've seen that this only happens consistently for hash maps; array maps don't always return ErrKeyNotExists for the last batch, sometimes they just return 0 elements.

Reviewing both the code of the library and its usage in our code, I think the tests cover the use cases we have in the agent (I just uploaded a test for the Array case).
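For reference, a minimal self-contained sketch (assuming cilium/ebpf's cursor-based BatchLookup API; this is not the agent's GenericMap code) of a lookup loop that tolerates the semantics described above: a batch may hold fewer items than the buffers allow even mid-iteration, and the end of the map shows up either as ErrKeyNotExist or as an empty batch.

```go
package main

import (
	"errors"
	"log"

	"github.com/cilium/ebpf"
)

// dumpMap drains a uint32->uint64 map with BatchLookup. A batch may contain
// fewer entries than requested even before the end of the map, and the last
// batch is reported via ErrKeyNotExist (hash maps) or, per the observations
// above, sometimes just as an empty batch (array maps).
func dumpMap(m *ebpf.Map, batchSize int) (map[uint32]uint64, error) {
	out := make(map[uint32]uint64)
	keys := make([]uint32, batchSize)
	values := make([]uint64, batchSize)

	var cursor ebpf.MapBatchCursor
	for {
		n, err := m.BatchLookup(&cursor, keys, values, nil)
		for i := 0; i < n; i++ {
			out[keys[i]] = values[i]
		}
		if errors.Is(err, ebpf.ErrKeyNotExist) || (err == nil && n == 0) {
			return out, nil // end of map
		}
		if err != nil {
			return out, err
		}
	}
}

func main() {
	m, err := ebpf.NewMap(&ebpf.MapSpec{
		Type:       ebpf.Hash,
		KeySize:    4,
		ValueSize:  8,
		MaxEntries: 256,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer m.Close()

	entries, err := dumpMap(m, 64)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read %d entries", len(entries))
}
```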
I was more referring to:
// Warning: This API is not very safe to use as the kernel implementation for
// batching relies on the user to be aware of subtle details with regarding to
// different map type implementations.
rather than the return code.
I assumed the return code is what they were referring to, as that was the only difference I noticed while working with it and looking at the kernel code. However, we could maybe change the implementation so that the default is single-item iteration, so that batch iteration has to be selected explicitly and people using that can test that it works in their use case.
It looks like you can do this and avoid the allocations. You can also get rid of keysCopy and valuesCopy.

- g.currentBatchSize, g.err = g.m.BatchLookup(&g.cursor, g.keysCopy, g.valuesCopy, nil)
+ g.currentBatchSize, g.err = g.m.BatchLookup(&g.cursor, any(g.keys), any(g.values), nil)
Changing that breaks this test, which checks that there are no extra allocations per iteration. Should we skip that test until cilium/ebpf fixes this behavior in cilium/ebpf#1290? I can also document this in the code, keep an eye on that issue, and send a PR when they fix it.
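The allocation guard being discussed could look roughly like this (a sketch only, not the test in this PR; the iteration body is elided), so that a change in the any-conversion behavior, or a fix for cilium/ebpf#1290, shows up as a test failure rather than a silent regression:

```go
package ebpf_test

import "testing"

// TestIterationAllocations asserts a fixed allocation budget for a full
// iteration pass. The budget of 8 is the figure reported in this PR,
// independent of batch size and number of batches.
func TestIterationAllocations(t *testing.T) {
	iterateOnce := func() {
		// ... perform one full IterateWithBatchSize pass over a prepared map ...
	}

	const allocBudget = 8.0
	if allocs := testing.AllocsPerRun(10, iterateOnce); allocs > allocBudget {
		t.Fatalf("expected at most %.0f allocations per iteration, got %.0f", allocBudget, allocs)
	}
}
```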
pkg/ebpf/generic_map.go
if itops.BatchSize > int(g.m.MaxEntries()) {
	itops.BatchSize = int(g.m.MaxEntries())
}
We should have a warning here.
Or even better: fail the operation.
Personally I wouldn't fail it; right now the default is to iterate with batches (although that could change), and depending on the batch and map sizes we could be failing a lot of iterations by default.
So a warning log (once) should be sufficient.

My concern is that a naive user will try to increase the BatchSize field in IteratorOptions, and once they pass a value larger than the max, they won't know and also won't see any impact.
Or at least document the boundaries in the function's documentation (for BatchSize == 0 and BatchSize > MaxEntries)
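A small sketch of the behaviour proposed in this thread (field and function names here are illustrative, not the exact ones in generic_map.go): clamp the batch size, warn only once, and document the boundaries on the option itself.

```go
package main

import (
	"log"
	"sync"
)

// IteratorOptions is an illustrative stand-in for the real options struct.
//
// BatchSize boundaries, as discussed above:
//   - BatchSize == 0 selects a default batch size.
//   - BatchSize > the map's MaxEntries is clamped to MaxEntries, with a
//     one-time warning, since a batch can never return more entries than
//     the map can hold.
type IteratorOptions struct {
	BatchSize int
}

var clampWarnOnce sync.Once

// clampBatchSize applies the boundaries described in the doc comment above.
func clampBatchSize(opts IteratorOptions, maxEntries, defaultSize int) int {
	switch {
	case opts.BatchSize == 0:
		return defaultSize
	case opts.BatchSize > maxEntries:
		clampWarnOnce.Do(func() {
			log.Printf("requested batch size %d is larger than the map's max entries %d, clamping",
				opts.BatchSize, maxEntries)
		})
		return maxEntries
	default:
		return opts.BatchSize
	}
}

func main() {
	log.Println(clampBatchSize(IteratorOptions{BatchSize: 4096}, 1024, 32)) // warns once, prints 1024
	log.Println(clampBatchSize(IteratorOptions{}, 1024, 32))                // prints 32 (default)
}
```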
Co-authored-by: Bryce Kahle <bryce.kahle@datadoghq.com>
/merge

🚂 MergeQueue: This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or approvals are given. You can cancel this operation by commenting on your pull request.

🚂 MergeQueue: Added to the queue. There are 2 builds ahead of this PR! (estimated merge in less than 44m). You can cancel this operation by commenting on your pull request.
What does this PR do?
Adds a new GenericMap object that allows for more type safety when working with eBPF maps, and supports batch lookups during iteration.
Motivation
Usage of unsafe.Pointer to avoid marshaling in calls to eBPF maps.

When this PR is approved, I'll create a new one with the changes applied to the rest of the code that uses eBPF maps (it was previously in this PR, but now it'll be split).
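To illustrate the kind of type safety such a wrapper provides, here is a conceptual sketch of a generic wrapper around *ebpf.Map. The typedMap type below is invented for this example and is not the GenericMap implementation added in pkg/ebpf/generic_map.go.

```go
package main

import (
	"log"

	"github.com/cilium/ebpf"
)

// typedMap is an illustrative generic wrapper: the key/value types are fixed
// once at the type level, so callers never juggle interface{} or
// unsafe.Pointer arguments directly. It is NOT the GenericMap added in this
// PR, just a sketch of the concept.
type typedMap[K comparable, V any] struct {
	m *ebpf.Map
}

func (t *typedMap[K, V]) Put(key K, value V) error {
	return t.m.Put(&key, &value)
}

func (t *typedMap[K, V]) Lookup(key K) (V, error) {
	var value V
	err := t.m.Lookup(&key, &value)
	return value, err
}

func main() {
	m, err := ebpf.NewMap(&ebpf.MapSpec{
		Type:       ebpf.Hash,
		KeySize:    4,
		ValueSize:  8,
		MaxEntries: 1024,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer m.Close()

	tm := &typedMap[uint32, uint64]{m: m}
	if err := tm.Put(1, 42); err != nil {
		log.Fatal(err)
	}
	v, err := tm.Lookup(1)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("value: %d", v)
}
```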
Additional Notes
datadog-agent/pkg/ebpf/generic_map.go (line 161 in 846ee97)

MapCleaner was using BatchLookup directly, so I replaced it and compared the benchmarks (the changes to MapCleaner and other parts of the code will be done in a separate PR). Iteration goes through the IterateWithBatchSize function, with a batch size that is either zero (for a default value) or greater than zero.

Several things to note here: first, speed is maintained, as expected. Memory usage on single-item iterations is reduced, due to an update in the cilium/ebpf library (cilium/ebpf@f0d238d) where they removed allocations in MapIterator.Next. Memory usage on batch operations is reduced too, due to the reduction in copies done in BatchLookup.

While in the map_cleaner tests it doesn't look as dramatic (mainly because the bulk of the allocations in the benchmark is done by other functions, see this profiling output), I did some tests and found that in my initial implementation of the iterator (which was practically equal to the one existing in map_cleaner), the number of allocations would grow with each call to BatchLookup. After doing some profiling, I saw that Go makes a copy of the keys/values buffers (allocated once per iterator) to convert them from type []K to type any. I am not exactly sure why this behavior happens, but I checked the tests in cilium/ebpf where they make sure there is no allocation per call to BatchLookup, and the main difference is that they convert the output buffers to type any. After doing that change (i.e., creating a "view" of the key/value buffers), I managed to bring the allocations down to 8, independent of the batch size and the number of batches (controlled by this test).

Possible Drawbacks / Trade-offs
Users might not know that they're iterating with batches, and therefore run into some problems caused by that. I've tried to make it as transparent as possible and to handle different cases without interaction, but something could be wrong.

In any case, there's an option to force item-by-item iteration without batches. This is useful in certain cases where the batch lookup calls don't work properly (for example, with the OOMKiller probe, tests were failing in batch mode, but it seemed to be an error in the underlying cilium/ebpf code) and one needs to force iteration without batches.

I've also reviewed the usage of unsafe.Pointer to avoid marshaling and copies as much as possible. I think it's correct based on the usage I've seen in the code. I would have liked to have a unit test that ensures no marshaling is done, but didn't find an immediate way to do it.

Describe how to test/QA your changes
Reviewer's Checklist
- A Triage milestone is set.
- The major_change label has been added if your change either has a major impact on the code base, is impacting multiple teams, or is changing important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a release note.
- The changelog/no-changelog label has been applied.
- The qa/skip-qa label, with either of the required qa/done or qa/no-code-change labels, is applied.
- A team/.. label has been applied, indicating the team(s) that should QA this change.
- If applicable, the need-change/operator and need-change/helm labels have been applied.
- If applicable, the k8s/<min-version> label has been applied, indicating the lowest Kubernetes version compatible with this feature.