-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BPF Batch Ops Methods #207
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your PR, it's cool to have you contributing again :) I think it makes sense to expose the low level batch primitives. I'm a bit worried that Map
is getting bigger and bigger, but NextKey
, etc. is already there.
Looking at the PR I realised that the current situation with unmarshalBytes
, unmarshalMap
, marshalPtr
is pretty confusing. I wanted to suggest folding some of the []*Map
checks into unmarshalBytes
but now I'm not so sure how that would work. I'll try to clean up that code and get back to you. Take a look at my comments on the test, maybe we can get rid of the new map / program unmarshaling in the first place.
5adb79a
to
1ed71e7
Compare
0edde42
to
e979c3c
Compare
e979c3c
to
aeb1ca1
Compare
Hi Nate, I realize I'm a little late to the party, sorry about that. I have some gripes with the API proposed here, but that should not necessarily be a blocker to merge this as long as we don't slap a version number on it.
I would suggest a Let's call it.. Iterator pagination state is kept internally, and must be reset using a I know this omits a lot of the details like how we would design typing for arbitrary key/value types, but I think that's a separate discussion we should have for the lib in general. WDYT? |
You're right, it's not a very nice API. For better or worse I think it's useful to have it though: there is always something lost in translation when we build a higher-level interface, some use case that we can't foresee. This kind of bare bones API is a pressure release valve for such situations: users can just drop down to the ugly interface and aren't blocked. So my take is: we should merge this once we've got the array vs hashmap ErrNotExist thing figured out.
I kind of figured that we could make MapIterator use batch lookups behind the scenes, if they are available. So Is this something you were considering @nathanjsweet?
Can you create an issue and describe your ideas for MapIterator and/or key value typing? I agonized about |
I share your concerns @ti-mo. I'm not wild about it either. All I can say is that I'm excited for generics to land. My only real defense is that I tried to be generous in the error messaging. Also, I wouldn't say there are no examples. The tests offer some decent examples for the 3 different map types batch supports.
I'm not opposed to this at all. It's a good idea, but I really don't like obfuscating basic operations from the users of this library. There are just too many variables with using the batch operations that prevent us from abstracting them in a way that would be satisfactory to everyone. Philosophically, I don't think we should be too scared about lots of methods piling up. There are lots methods/operations in eBPF. I do think we can go beyond the harshness of libbpf, but I never want to hide basic functionality.
We do need to have a better design philosophy that we can spell out in the repository. It's getting big enough that we should have a document we can all point to for justifying an approach. Maybe we could do a megathread on slack or have a zoom meeting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the following change to the Array test:
diff --git a/map_test.go b/map_test.go
index 1fb5732..6abcedc 100644
--- a/map_test.go
+++ b/map_test.go
@@ -168,7 +168,7 @@ func TestBatchAPIArray(t *testing.T) {
Type: Array,
KeySize: 4,
ValueSize: 4,
- MaxEntries: 10,
+ MaxEntries: 2,
})
if err != nil {
t.Fatal(err)
@@ -218,6 +218,28 @@ func TestBatchAPIArray(t *testing.T) {
t.Errorf("BatchUpdate and BatchLookup values disagree: %v %v", values, lookupValues)
}
+ count, err = m.BatchLookup(uint32(1), &nextKey, lookupKeys, lookupValues, nil)
+ if !errors.Is(err, ErrKeyNotExist) {
+ t.Error("Expected ErrKeyNotExist when batch runs into end of array, got", err)
+ }
+ if count != 1 {
+ t.Error("Expected a single result, got", count)
+ }
+ if lookupKeys[0] != 1 {
+ t.Error("Expected first key to be 1, got", lookupKeys[0])
+ }
+ if lookupValues[0] != 4242 {
+ t.Error("Expected first value to be 4242, got", lookupValues[0])
+ }
+
+ count, err = m.BatchLookup(uint32(2), &nextKey, lookupKeys, lookupValues, nil)
+ if !errors.Is(err, ErrKeyNotExist) {
+ t.Error("Expected ErrKeyNotExist, got", err)
+ }
+ if count != 0 {
+ t.Error("Expected no result, got", count)
+ }
+
_, err = m.BatchLookupAndDelete(nil, &nextKey, deleteKeys, deleteValues, nil)
if !errors.Is(err, ErrBatchOpNotSup) {
t.Fatalf("BatchLookUpDelete: expected error %v, but got %v", ErrBatchOpNotSup, err)
This is roughly how I would expect the API to behave based on our conversation. Here is what I get:
=== RUN TestBatchAPIArray
/home/lorenz/dev/ebpf/map_test.go:223: Expected ErrKeyNotExist when batch runs into end of array, got <nil>
/home/lorenz/dev/ebpf/map_test.go:226: Expected a single result, got 0
/home/lorenz/dev/ebpf/map_test.go:229: Expected first key to be 1, got 0
/home/lorenz/dev/ebpf/map_test.go:232: Expected first value to be 4242, got 0
/home/lorenz/dev/ebpf/map_test.go:237: Expected ErrKeyNotExist, got <nil>
/home/lorenz/dev/ebpf/map_test.go:240: Expected no result, got 2
PTAL.
aeb1ca1
to
97bb871
Compare
There's a couple of things missing from your assumptions.
A possible workaround is that we can add a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startKey is excluded from the batch processing (i.e. it starts with the key after startKey).
Ah ok, that is quite confusing. Can you mention that in the docs? Maybe rename startKey
-> prevKey
?
The kernel returns ENOENT is used to indicate the end of the list even if a successful partial result set is returned. I decided against returning the error because it is not typical in golang to return an error even if everything went fine.
I think the API is already plenty weird, so returning ErrKeyNotExist (or some other sentinel) doesn't feel too onerous ;) It's important to be able to use the API in a generic fashion, for example as it stands it can't be used to optimize MapIterator.
A possible workaround is that we can add a done return value that is "true" when the batch operation has reached the end of the map or array.
That's what the lookup APIs were originally: Lookup(key, value) (bool, error)
. It turns out this is actually really cumbersome to use in the common case where we want to treat an absent key as an error:
var value uint32
if ok, err := m.Lookup(key, &value); !ok {
return fmt.Errorf("doesn't exist")
} else if err != nil {
return fmt.Errorf("bla: %s", err)
}
// vs
var value uint32
if err := m.Lookup(key, value); err != nil {
return fmt.Errorf("bla: %s", err) // NB: err already contains a stringification of key here for free
}
syscalls.go
Outdated
@@ -345,6 +377,10 @@ func wrapMapError(err error) error { | |||
return ErrKeyExist | |||
} | |||
|
|||
if errors.Is(err, unix.ENOTSUPP) { | |||
return ErrBatchOpNotSup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see this before, this is going to trigger false positives if some other map non-batch command returns ENOTSUPP. Why not just return ErrNotSupported in this case? Seems like you have a specific use case for ErrBatchOpNotSup in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did when we were doing this by map type, but now that we're not your suggestion is correct.
682ad4e
to
dadfc3e
Compare
@lmb I don't understand why the push test didn't work, but the PR one did. |
As of kernel v5.6 batch methods allow for the fast lookup, deletion, and updating of bpf maps so that the syscall overhead (repeatedly calling into any of these methods) can be avoided. The batch methods are as follows: * BatchUpdate * BatchLookup * BatchLookupAndDelete * BatchDelete Only the "array" and "hash" types currently support batch operations, and the "array" type does not support batch deletion. Tests are in place to test every scenario and helper functions have been written to catch errors that normally the kernel would give to helpful to users of the library. Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
startKey is now called prevKey, remove mentions of the former.
dadfc3e
to
b5f3b14
Compare
@nathanjsweet I pushed three small clean up commits, if you agree with them feel free to squash + merge. |
b5f3b14
to
95b312f
Compare
As a follow up to cilium#207, add support for PerCPU Hash and Array maps to the following methods: - BatchLookup() - BatchLookupAndDelete() - BatchUpdate() - BatchDelete() This provides a significant performance improvement by amortizing the overhead of the underlying syscall. In this change, the API contact for the batches is a flat slice of values []T: batch0cpu0,batch0cpu1,..batch0cpuN,batch1cpu0...batchNcpuN In order to avoid confusion and panics for users, the library is strict about the expected lengths of slices passed to these methods, rather than padding slices to zeros or writing partial results. An alternative design that was considered was [][]T: batch0{cpu0,cpu1,..cpuN},batch1{...},..batchN{...} []T was partly chosen as it matches the underlying semantics of the syscall, although without correctly aligned data it cannot be a zero copy pass through. Caveats: * Array maps of any type do not support batch delete. * Batched ops support for PerCPU Array Maps was only added in 5.13: https://lore.kernel.org/bpf/20210424214510.806627-2-pctammela@mojatatu.com/ Signed-off-by: Alun Evans <alun@badgerous.net> Co-developed-by: Lorenz Bauer <lmb@isovalent.com>
As a follow up to cilium#207, add support for PerCPU Hash and Array maps to the following methods: - BatchLookup() - BatchLookupAndDelete() - BatchUpdate() - BatchDelete() This provides a significant performance improvement by amortizing the overhead of the underlying syscall. In this change, the API contact for the batches is a flat slice of values []T: batch0cpu0,batch0cpu1,..batch0cpuN,batch1cpu0...batchNcpuN In order to avoid confusion and panics for users, the library is strict about the expected lengths of slices passed to these methods, rather than padding slices to zeros or writing partial results. An alternative design that was considered was [][]T: batch0{cpu0,cpu1,..cpuN},batch1{...},..batchN{...} []T was partly chosen as it matches the underlying semantics of the syscall, although without correctly aligned data it cannot be a zero copy pass through. Caveats: * Array maps of any type do not support batch delete. * Batched ops support for PerCPU Array Maps was only added in 5.13: https://lore.kernel.org/bpf/20210424214510.806627-2-pctammela@mojatatu.com/ Signed-off-by: Alun Evans <alun@badgerous.net> Co-developed-by: Lorenz Bauer <lmb@isovalent.com>
As a follow up to cilium#207, add support for PerCPU Hash and Array maps to the following methods: - BatchLookup() - BatchLookupAndDelete() - BatchUpdate() - BatchDelete() This provides a significant performance improvement by amortizing the overhead of the underlying syscall. In this change, the API contact for the batches is a flat slice of values []T: batch0cpu0,batch0cpu1,..batch0cpuN,batch1cpu0...batchNcpuN In order to avoid confusion and panics for users, the library is strict about the expected lengths of slices passed to these methods, rather than padding slices to zeros or writing partial results. An alternative design that was considered was [][]T: batch0{cpu0,cpu1,..cpuN},batch1{...},..batchN{...} []T was partly chosen as it matches the underlying semantics of the syscall, although without correctly aligned data it cannot be a zero copy pass through. Caveats: * Array maps of any type do not support batch delete. * Batched ops support for PerCPU Array Maps was only added in 5.13: https://lore.kernel.org/bpf/20210424214510.806627-2-pctammela@mojatatu.com/ Signed-off-by: Alun Evans <alun@badgerous.net> Co-developed-by: Lorenz Bauer <lmb@isovalent.com>
As a follow up to #207, add support for PerCPU Hash and Array maps to the following methods: - BatchLookup() - BatchLookupAndDelete() - BatchUpdate() - BatchDelete() This provides a significant performance improvement by amortizing the overhead of the underlying syscall. In this change, the API contact for the batches is a flat slice of values []T: batch0cpu0,batch0cpu1,..batch0cpuN,batch1cpu0...batchNcpuN In order to avoid confusion and panics for users, the library is strict about the expected lengths of slices passed to these methods, rather than padding slices to zeros or writing partial results. An alternative design that was considered was [][]T: batch0{cpu0,cpu1,..cpuN},batch1{...},..batchN{...} []T was partly chosen as it matches the underlying semantics of the syscall, although without correctly aligned data it cannot be a zero copy pass through. Caveats: * Array maps of any type do not support batch delete. * Batched ops support for PerCPU Array Maps was only added in 5.13: https://lore.kernel.org/bpf/20210424214510.806627-2-pctammela@mojatatu.com/ Signed-off-by: Alun Evans <alun@badgerous.net> Co-developed-by: Lorenz Bauer <lmb@isovalent.com>
As of kernel v5.6 batch methods allow for the
fast lookup, deletion, and updating of bpf maps
so that the syscall overhead (repeatedly calling
into any of these methods) can be avoided.
Add support for BatchUpdate, BatchLookup, BatchLookupDelete, and BatchDelete
as well as tests for all of the above.
Signed-off-by: Nate Sweet nathanjsweet@pm.me