Add BPF Batch Ops Methods #207

Merged: 4 commits merged on Feb 23, 2021

nathanjsweet (Member)

As of kernel v5.6, batch methods allow for the
fast lookup, deletion, and updating of BPF maps,
so that the overhead of one syscall per element
(repeatedly calling into any of these methods) can be avoided.

Add support for BatchUpdate, BatchLookup, BatchLookupDelete, and BatchDelete
as well as tests for all of the above.

Signed-off-by: Nate Sweet nathanjsweet@pm.me
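To put rough numbers on the overhead mentioned above, here is a minimal sketch that only does the arithmetic. It assumes that a one-at-a-time dump costs one BPF_MAP_GET_NEXT_KEY plus one BPF_MAP_LOOKUP_ELEM syscall per entry, while a batched dump costs one BPF_MAP_LOOKUP_BATCH per batchSize entries. Both functions are hypothetical helpers, and the final call that only reports the end of the map is ignored.

```go
package main

import "fmt"

// perElementSyscalls approximates the bpf(2) calls needed to dump n entries
// one at a time: one BPF_MAP_GET_NEXT_KEY plus one BPF_MAP_LOOKUP_ELEM each.
func perElementSyscalls(n int) int {
	return 2 * n
}

// batchedSyscalls approximates the BPF_MAP_LOOKUP_BATCH calls needed when
// each call returns up to batchSize entries (ceiling division).
func batchedSyscalls(n, batchSize int) int {
	if n <= 0 || batchSize <= 0 {
		return 0
	}
	return (n + batchSize - 1) / batchSize
}

func main() {
	n, batchSize := 10000, 256
	fmt.Printf("per-element: %d, batched: %d\n",
		perElementSyscalls(n), batchedSyscalls(n, batchSize))
}
```

With 10,000 entries and a batch size of 256 this gives 20,000 versus 40 calls; real savings also depend on how much data each call copies.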

lmb (Collaborator) left a comment

Thank you for your PR, it's cool to have you contributing again :) I think it makes sense to expose the low level batch primitives. I'm a bit worried that Map is getting bigger and bigger, but NextKey, etc. is already there.

Looking at the PR, I realised that the current situation with unmarshalBytes, unmarshalMap, and marshalPtr is pretty confusing. I wanted to suggest folding some of the []*Map checks into unmarshalBytes, but now I'm not so sure how that would work. I'll try to clean up that code and get back to you. Take a look at my comments on the test; maybe we can get rid of the new map / program unmarshaling in the first place.

Resolved review threads: map.go (5, outdated), map_test.go, run-tests.sh (2, outdated), prog.go (outdated), syscalls.go
nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch 2 times, most recently from 5adb79a to 1ed71e7, February 4, 2021 04:41
nathanjsweet marked this pull request as ready for review February 4, 2021 04:44
nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch 2 times, most recently from 0edde42 to e979c3c, February 4, 2021 18:21
Resolved review threads: map.go (6, outdated)
nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch from e979c3c to aeb1ca1, February 9, 2021 15:27
ti-mo (Collaborator) commented Feb 11, 2021

Hi Nate, I realize I'm a little late to the party, sorry about that. I have some gripes with the API proposed here, but that should not necessarily be a blocker to merge this as long as we don't slap a version number on it.

  • It feels more like C than Go (taking mostly references as function arguments, etc.)
  • interface{} everywhere and generous use of reflect
  • It expects the caller to allocate everything (I know this was likely done to avoid allocs)
  • Function documentation is rather sparse and there are no examples provided at all

I would suggest a MapIterator-like approach here, although I would also like to propose some changes to MapIterator to reduce its reliance on interface{}.

Let's call it.. MapBatchIterator? Instead of Next() returning a slice of results, it pulls the results into an internal preallocated scratch buffer to avoid any iterator-related allocations. After a successful pull, the caller invokes Batch() or Results() or something similar, which returns the result set. The efficiency gains here may be non-existent, but it's meant to be consistent with what I had in mind for MapIterator, where this approach could eliminate any use of interface{} for primitive k/v types.

Iterator pagination state is kept internally, and must be reset using a Reset() method to restart the iteration, allowing them to be re-used. Iterators are not thread-safe, that would be up to the caller to implement.
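A minimal sketch of the MapBatchIterator shape described above, under stated assumptions: fetchFunc is a hypothetical stand-in for one batch lookup syscall that takes an opaque cursor (nil on the first call), sliceSource is a toy in-memory map, and keys and values are fixed to uint32. It shows the scratch buffers being allocated once, Next() pulling into them, Batch() exposing the current result set, and Reset() clearing the pagination state.

```go
package main

import "fmt"

// fetchFunc is a hypothetical stand-in for one batch lookup syscall. It fills
// keys and values starting at an opaque cursor (nil on the first call) and
// returns the number of entries written, the next cursor, and whether the
// end of the map was reached.
type fetchFunc func(cursor *uint32, keys, values []uint32) (n int, next uint32, done bool)

// MapBatchIterator sketches the proposed design: results are pulled into
// scratch buffers that are allocated once, so Next does not allocate.
// It is not safe for concurrent use.
type MapBatchIterator struct {
	fetch        fetchFunc
	keys, values []uint32 // scratch buffers reused by every Next
	n            int      // number of valid entries in the scratch buffers
	cursor       uint32   // internal pagination state
	first, done  bool
}

func NewMapBatchIterator(fetch fetchFunc, batchSize int) *MapBatchIterator {
	it := &MapBatchIterator{
		fetch:  fetch,
		keys:   make([]uint32, batchSize),
		values: make([]uint32, batchSize),
	}
	it.Reset()
	return it
}

// Next pulls the next batch into the scratch buffers and reports whether
// any entries were returned.
func (it *MapBatchIterator) Next() bool {
	if it.done {
		it.n = 0
		return false
	}
	var cursor *uint32
	if !it.first {
		cursor = &it.cursor
	}
	it.n, it.cursor, it.done = it.fetch(cursor, it.keys, it.values)
	it.first = false
	return it.n > 0
}

// Batch returns the current result set. The slices alias the scratch
// buffers and are only valid until the next call to Next or Reset.
func (it *MapBatchIterator) Batch() (keys, values []uint32) {
	return it.keys[:it.n], it.values[:it.n]
}

// Reset clears the pagination state so the iterator can be reused.
func (it *MapBatchIterator) Reset() {
	it.n, it.cursor, it.first, it.done = 0, 0, true, false
}

// sliceSource is a toy fetchFunc over an array-like map whose keys are the
// indices of vals; the cursor is simply the next index to read.
func sliceSource(vals []uint32) fetchFunc {
	return func(cursor *uint32, keys, values []uint32) (int, uint32, bool) {
		start := 0
		if cursor != nil {
			start = int(*cursor)
		}
		n := copy(values, vals[start:])
		for i := 0; i < n; i++ {
			keys[i] = uint32(start + i)
		}
		next := start + n
		return n, uint32(next), next >= len(vals)
	}
}

func main() {
	it := NewMapBatchIterator(sliceSource([]uint32{10, 11, 12, 13, 14}), 2)
	for it.Next() {
		keys, values := it.Batch()
		fmt.Println(keys, values)
	}
}
```

With five entries and a batch size of 2, the loop sees two full batches and a final batch of one. Because Batch() returns views into the scratch buffers, callers that need to keep results across calls to Next() have to copy them.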

I know this omits a lot of the details like how we would design typing for arbitrary key/value types, but I think that's a separate discussion we should have for the lib in general.

WDYT?

lmb (Collaborator) commented Feb 11, 2021

You're right, it's not a very nice API. For better or worse, I think it's useful to have it though: there is always something lost in translation when we build a higher-level interface, some use case that we can't foresee. This kind of bare-bones API is a pressure release valve for such situations: users can just drop down to the ugly interface and aren't blocked. Map.Update and Map.NextKey are other examples of such an API. Their downside is that the ugly interfaces make it harder to find and use the nicer ones. I don't have a good solution for this except moving low-level stuff into a separate package. That seems like a lot of boilerplate, though.

So my take is: we should merge this once we've got the array vs. hash map ErrKeyNotExist thing figured out.

> I would suggest a MapIterator-like approach here, although I would also like to propose some changes to MapIterator to reduce its reliance on interface{}.
> Let's call it.. MapBatchIterator? Instead of Next() returning a slice of results, it pulls the results into an internal preallocated scratch buffer to avoid any iterator-related allocations.

I kind of figured that we could make MapIterator use batch lookups behind the scenes, if they are available. So Next() still operates on single elements but in the background we only issue one batch lookup every X calls to Next.
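The idea of keeping a single-element Next() while issuing one batch lookup every few calls can be sketched as follows. fetchFunc and sliceSource are hypothetical stand-ins for the batch syscall and a map (not the library's API), and the fetches counter exists only to show how many batch calls were made.

```go
package main

import "fmt"

// fetchFunc is a hypothetical stand-in for one batch lookup syscall: it fills
// keys and values from an opaque cursor (nil on the first call) and returns
// the entry count, the next cursor, and whether the end was reached.
type fetchFunc func(cursor *uint32, keys, values []uint32) (n int, next uint32, done bool)

// bufferedIterator keeps a one-element-at-a-time Next, but only issues a
// batch lookup when its buffer has been fully consumed.
type bufferedIterator struct {
	fetch        fetchFunc
	keys, values []uint32
	pos, n       int // next buffered entry to hand out, and buffer fill level
	cursor       uint32
	first, done  bool
	fetches      int // batch calls issued so far, for illustration only
}

func newBufferedIterator(fetch fetchFunc, batchSize int) *bufferedIterator {
	return &bufferedIterator{
		fetch:  fetch,
		keys:   make([]uint32, batchSize),
		values: make([]uint32, batchSize),
		first:  true,
	}
}

// Next copies one entry into key and value, refilling the buffer with a
// single batch call only when it has run dry.
func (it *bufferedIterator) Next(key, value *uint32) bool {
	if it.pos == it.n {
		if it.done {
			return false
		}
		var cursor *uint32
		if !it.first {
			cursor = &it.cursor
		}
		it.n, it.cursor, it.done = it.fetch(cursor, it.keys, it.values)
		it.pos, it.first = 0, false
		it.fetches++
		if it.n == 0 {
			it.done = true
			return false
		}
	}
	*key, *value = it.keys[it.pos], it.values[it.pos]
	it.pos++
	return true
}

// sliceSource is a toy fetchFunc over an array-like map whose keys are the
// indices of vals; the cursor is the next index to read.
func sliceSource(vals []uint32) fetchFunc {
	return func(cursor *uint32, keys, values []uint32) (int, uint32, bool) {
		start := 0
		if cursor != nil {
			start = int(*cursor)
		}
		n := copy(values, vals[start:])
		for i := 0; i < n; i++ {
			keys[i] = uint32(start + i)
		}
		next := start + n
		return n, uint32(next), next >= len(vals)
	}
}

func main() {
	it := newBufferedIterator(sliceSource([]uint32{7, 8, 9, 10, 11}), 2)
	var key, value uint32
	for it.Next(&key, &value) {
		fmt.Println(key, value)
	}
	fmt.Println("batch calls:", it.fetches)
}
```

Iterating five entries with a batch size of 2 issues three batch calls instead of one lookup per element, while the caller still sees one key/value pair per Next().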

Is this something you were considering @nathanjsweet?

> I know this omits a lot of the details like how we would design typing for arbitrary key/value types, but I think that's a separate discussion we should have for the lib in general.

Can you create an issue and describe your ideas for MapIterator and/or key value typing? I agonized about interface{} in the beginning, but in practice I can't find much wrong with it. Also keep in mind that Go generics are on the horizon (another year?) which will surely make this a lot nicer.

nathanjsweet (Member Author)

> Hi Nate, I realize I'm a little late to the party, sorry about that. I have some gripes with the API proposed here, but that should not necessarily be a blocker to merge this as long as we don't slap a version number on it.
>
>   • It feels more like C than Go (taking mostly references as function arguments, etc.)
>   • interface{} everywhere and generous use of reflect
>   • It expects the caller to allocate everything (I know this was likely done to avoid allocs)
>   • Function documentation is rather sparse and there are no examples provided at all

I share your concerns @ti-mo. I'm not wild about it either. All I can say is that I'm excited for generics to land. My only real defense is that I tried to be generous in the error messaging. Also, I wouldn't say there are no examples. The tests offer some decent examples for the 3 different map types batch supports.

> I would suggest a MapIterator-like approach here, although I would also like to propose some changes to MapIterator to reduce its reliance on interface{}.
>
> Let's call it.. MapBatchIterator? Instead of Next() returning a slice of results, it pulls the results into an internal preallocated scratch buffer to avoid any iterator-related allocations. After a successful pull, the caller invokes Batch() or Results() or something similar, which returns the result set. The efficiency gains here may be non-existent, but it's meant to be consistent with what I had in mind for MapIterator, where this approach could eliminate any use of interface{} for primitive k/v types.
>
> Iterator pagination state is kept internally, and must be reset using a Reset() method to restart the iteration, allowing them to be re-used. Iterators are not thread-safe, that would be up to the caller to implement.

I'm not opposed to this at all. It's a good idea, but I really don't like hiding basic operations from the users of this library. There are just too many variables in using the batch operations for us to abstract them in a way that would satisfy everyone. Philosophically, I don't think we should be too scared of lots of methods piling up. There are lots of methods/operations in eBPF. I do think we can go beyond the harshness of libbpf, but I never want to hide basic functionality.

> I know this omits a lot of the details like how we would design typing for arbitrary key/value types, but I think that's a separate discussion we should have for the lib in general.

We do need to have a better design philosophy that we can spell out in the repository. It's getting big enough that we should have a document we can all point to for justifying an approach. Maybe we could do a megathread on slack or have a zoom meeting.

lmb mentioned this pull request Feb 12, 2021

lmb (Collaborator) left a comment

I made the following change to the Array test:

diff --git a/map_test.go b/map_test.go
index 1fb5732..6abcedc 100644
--- a/map_test.go
+++ b/map_test.go
@@ -168,7 +168,7 @@ func TestBatchAPIArray(t *testing.T) {
 		Type:       Array,
 		KeySize:    4,
 		ValueSize:  4,
-		MaxEntries: 10,
+		MaxEntries: 2,
 	})
 	if err != nil {
 		t.Fatal(err)
@@ -218,6 +218,28 @@ func TestBatchAPIArray(t *testing.T) {
 		t.Errorf("BatchUpdate and BatchLookup values disagree: %v %v", values, lookupValues)
 	}
 
+	count, err = m.BatchLookup(uint32(1), &nextKey, lookupKeys, lookupValues, nil)
+	if !errors.Is(err, ErrKeyNotExist) {
+		t.Error("Expected ErrKeyNotExist when batch runs into end of array, got", err)
+	}
+	if count != 1 {
+		t.Error("Expected a single result, got", count)
+	}
+	if lookupKeys[0] != 1 {
+		t.Error("Expected first key to be 1, got", lookupKeys[0])
+	}
+	if lookupValues[0] != 4242 {
+		t.Error("Expected first value to be 4242, got", lookupValues[0])
+	}
+
+	count, err = m.BatchLookup(uint32(2), &nextKey, lookupKeys, lookupValues, nil)
+	if !errors.Is(err, ErrKeyNotExist) {
+		t.Error("Expected ErrKeyNotExist, got", err)
+	}
+	if count != 0 {
+		t.Error("Expected no result, got", count)
+	}
+
 	_, err = m.BatchLookupAndDelete(nil, &nextKey, deleteKeys, deleteValues, nil)
 	if !errors.Is(err, ErrBatchOpNotSup) {
 		t.Fatalf("BatchLookUpDelete: expected error %v, but got %v", ErrBatchOpNotSup, err)

This is roughly how I would expect the API to behave based on our conversation. Here is what I get:

=== RUN   TestBatchAPIArray
    /home/lorenz/dev/ebpf/map_test.go:223: Expected ErrKeyNotExist when batch runs into end of array, got <nil>
    /home/lorenz/dev/ebpf/map_test.go:226: Expected a single result, got 0
    /home/lorenz/dev/ebpf/map_test.go:229: Expected first key to be 1, got 0
    /home/lorenz/dev/ebpf/map_test.go:232: Expected first value to be 4242, got 0
    /home/lorenz/dev/ebpf/map_test.go:237: Expected ErrKeyNotExist, got <nil>
    /home/lorenz/dev/ebpf/map_test.go:240: Expected no result, got 2

PTAL.

Resolved review threads: map_test.go (outdated), map.go (2, outdated)
nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch from aeb1ca1 to 97bb871, February 17, 2021 23:04
nathanjsweet (Member Author) commented Feb 17, 2021

> This is roughly how I would expect the API to behave based on our conversation. Here is what I get:
>
> === RUN TestBatchAPIArray
> /home/lorenz/dev/ebpf/map_test.go:223: Expected ErrKeyNotExist when batch runs into end of array, got <nil>
> /home/lorenz/dev/ebpf/map_test.go:226: Expected a single result, got 0
> /home/lorenz/dev/ebpf/map_test.go:229: Expected first key to be 1, got 0
> /home/lorenz/dev/ebpf/map_test.go:232: Expected first value to be 4242, got 0
> /home/lorenz/dev/ebpf/map_test.go:237: Expected ErrKeyNotExist, got <nil>
> /home/lorenz/dev/ebpf/map_test.go:240: Expected no result, got 2

There's a couple of things missing from your assumptions.

  1. startKey is excluded from the batch processing (i.e. it starts with the key after startKey).
  2. The kernel uses ENOENT to indicate the end of the list, even when a partial result set is successfully returned. I decided against returning the error because in Go it is not typical to return an error when everything went fine.

A possible workaround is that we can add a done return value that is "true" when the batch operation has reached the end of the map or array. Otherwise, I think the way the library is working maps to libbpf's behavior pretty cleanly.
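The two points above, and why a caller has to handle them together, can be modeled in a short self-contained sketch. fakeBatchLookup is hypothetical and only mimics the behavior described in this comment for an array-like map; ErrKeyNotExist is a local stand-in rather than the library's variable. The caller consumes count entries before looking at the error, and treats ErrKeyNotExist as the end of iteration, which is the role a done return value would otherwise play.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKeyNotExist is a local stand-in for the library's sentinel error.
var ErrKeyNotExist = errors.New("key does not exist")

// fakeBatchLookup models the semantics described above over an array-like
// map (keys are indices into data). It is hypothetical, not the kernel code:
//   - the entry at *prevKey is skipped; a nil prevKey starts at the beginning,
//   - reaching the end is reported as ErrKeyNotExist, even when count > 0.
func fakeBatchLookup(data []uint32, prevKey, nextKeyOut *uint32, keys, values []uint32) (int, error) {
	if len(keys) == 0 || len(values) < len(keys) {
		return 0, errors.New("keys and values must be non-empty and equally sized")
	}
	start := 0
	if prevKey != nil {
		start = int(*prevKey) + 1 // prevKey itself is excluded
	}
	count := 0
	for i := start; i < len(data) && count < len(keys); i++ {
		keys[count], values[count] = uint32(i), data[i]
		count++
	}
	if start+count >= len(data) {
		return count, ErrKeyNotExist // end of map, possibly with partial results
	}
	*nextKeyOut = keys[count-1] // pass this as prevKey on the next call
	return count, nil
}

// dumpAll shows the caller-side pattern: keep the count entries first, then
// treat ErrKeyNotExist as "done" rather than as a failure.
func dumpAll(data []uint32, batchSize int) ([]uint32, error) {
	keys := make([]uint32, batchSize)
	values := make([]uint32, batchSize)
	var out []uint32
	var prev, next uint32
	var prevPtr *uint32 // nil on the first call: start at the beginning
	for {
		count, err := fakeBatchLookup(data, prevPtr, &next, keys, values)
		out = append(out, values[:count]...) // partial results are valid
		if errors.Is(err, ErrKeyNotExist) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		prev = next
		prevPtr = &prev
	}
}

func main() {
	values, err := dumpAll([]uint32{10, 20, 30, 40, 50}, 2)
	fmt.Println(values, err)
}
```

Dropping the partial results whenever err is non-nil would silently lose the last batch, which is why the error needs to be documented alongside the count.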

lmb (Collaborator) left a comment

> startKey is excluded from the batch processing (i.e. it starts with the key after startKey).

Ah OK, that is quite confusing. Can you mention that in the docs? Maybe rename startKey -> prevKey?

> The kernel uses ENOENT to indicate the end of the list, even when a partial result set is successfully returned. I decided against returning the error because in Go it is not typical to return an error when everything went fine.

I think the API is already plenty weird, so returning ErrKeyNotExist (or some other sentinel) doesn't feel too onerous ;) It's important to be able to use the API in a generic fashion; for example, as it stands it can't be used to optimize MapIterator.

> A possible workaround is that we can add a done return value that is "true" when the batch operation has reached the end of the map or array.

That's what the lookup APIs were originally: Lookup(key, value) (bool, error). It turns out this is actually really cumbersome to use in the common case where we want to treat an absent key as an error:

    var value uint32
    if ok, err := m.Lookup(key, &value); err != nil {
        return fmt.Errorf("bla: %w", err)
    } else if !ok {
        return fmt.Errorf("doesn't exist")
    }

    // vs

    var value uint32
    if err := m.Lookup(key, &value); err != nil {
        return fmt.Errorf("bla: %w", err) // NB: err already contains a stringification of key here for free
    }

Review thread on syscalls.go (outdated):

    @@ -345,6 +377,10 @@ func wrapMapError(err error) error {
            return ErrKeyExist
        }

        if errors.Is(err, unix.ENOTSUPP) {
            return ErrBatchOpNotSup

Collaborator:

I didn't see this before; this is going to trigger false positives if some other non-batch map command returns ENOTSUPP. Why not just return ErrNotSupported in this case? Seems like you have a specific use case for ErrBatchOpNotSup in mind?

Member Author:

I did when we were doing this by map type, but now that we're not, your suggestion is correct.

Resolved review thread: marshalers.go (outdated)
nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch 2 times, most recently from 682ad4e to dadfc3e, February 22, 2021 17:49

nathanjsweet (Member Author):

@lmb I don't understand why the push test didn't work, but the PR one did.

nathanjsweet and others added 3 commits February 23, 2021 10:31
As of kernel v5.6 batch methods allow for the fast lookup,
deletion, and updating of bpf maps so that the syscall
overhead (repeatedly calling into any of these methods)
can be avoided.

The batch methods are as follows:
 * BatchUpdate
 * BatchLookup
 * BatchLookupAndDelete
 * BatchDelete

Only the "array" and "hash" types currently support
batch operations, and the "array" type does not support
batch deletion.

Tests are in place to test every scenario, and helper
functions have been written to catch errors that the
kernel would otherwise report, and to turn them into
messages that are helpful to users of the library.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
startKey is now called prevKey, remove mentions of the former.
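As an illustration of the kind of helper the first commit message describes, here is a hypothetical pre-flight check (not the library's code) that rejects malformed batch arguments with a descriptive message instead of a bare errno from the kernel. It assumes keys and values are slices of fixed-size types whose Go size equals the map's key and value size, which ignores encoding subtleties such as struct padding.

```go
package main

import (
	"fmt"
	"reflect"
)

// checkBatchArgs is a hypothetical validation helper: it checks that keys
// and values are slices of equal length whose element sizes match the map,
// and returns the number of entries on success.
func checkBatchArgs(keys, values interface{}, keySize, valueSize uintptr) (int, error) {
	kv, vv := reflect.ValueOf(keys), reflect.ValueOf(values)
	if kv.Kind() != reflect.Slice {
		return 0, fmt.Errorf("keys must be a slice, got %T", keys)
	}
	if vv.Kind() != reflect.Slice {
		return 0, fmt.Errorf("values must be a slice, got %T", values)
	}
	if kv.Len() != vv.Len() {
		return 0, fmt.Errorf("keys and values must have the same length: %d != %d", kv.Len(), vv.Len())
	}
	if size := kv.Type().Elem().Size(); size != keySize {
		return 0, fmt.Errorf("key element size %d does not match map key size %d", size, keySize)
	}
	if size := vv.Type().Elem().Size(); size != valueSize {
		return 0, fmt.Errorf("value element size %d does not match map value size %d", size, valueSize)
	}
	return kv.Len(), nil
}

func main() {
	_, err := checkBatchArgs(make([]uint32, 4), make([]uint64, 4), 4, 4)
	fmt.Println(err) // value element size 8 does not match map value size 4
}
```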
lmb force-pushed the pr/nathanjsweet/batch-map-ops branch from dadfc3e to b5f3b14, February 23, 2021 10:39

lmb (Collaborator) commented Feb 23, 2021

@nathanjsweet I pushed three small clean-up commits; if you agree with them, feel free to squash + merge.

lmb force-pushed the pr/nathanjsweet/batch-map-ops branch from b5f3b14 to 95b312f, February 23, 2021 10:40
nathanjsweet merged commit 0ad1835 into master Feb 23, 2021
nathanjsweet deleted the pr/nathanjsweet/batch-map-ops branch February 23, 2021 17:36
alxn added a commit to alxn/ebpf that referenced this pull request Dec 14, 2023
As a follow up to cilium#207, add support
for PerCPU Hash and Array maps to the following methods:

- BatchLookup()
- BatchLookupAndDelete()
- BatchUpdate()
- BatchDelete()

This provides a significant performance improvement by amortizing the
overhead of the underlying syscall.

In this change, the API contract for the batches is a flat slice of
values []T:

    batch0cpu0,batch0cpu1,..batch0cpuN,batch1cpu0...batchNcpuN

In order to avoid confusion and panics for users, the library is
strict about the expected lengths of slices passed to these methods,
rather than padding slices to zeros or writing partial results.

An alternative design that was considered was [][]T:

    batch0{cpu0,cpu1,..cpuN},batch1{...},..batchN{...}

[]T was partly chosen as it matches the underlying semantics of the
syscall, although without correctly aligned data it cannot be a
zero-copy pass-through.

Caveats:

* Array maps of any type do not support batch delete.
* Batched ops support for PerCPU Array Maps was only added in 5.13:
  https://lore.kernel.org/bpf/20210424214510.806627-2-pctammela@mojatatu.com/

Signed-off-by: Alun Evans <alun@badgerous.net>
Co-developed-by: Lorenz Bauer <lmb@isovalent.com>
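The flat []T layout and the strict length check described in this commit message can be illustrated with a short sketch; all helpers are hypothetical, and numCPU stands for the number of possible CPUs. kernelStride reflects the kernel rounding each per-CPU value slot up to 8 bytes, which is why a []uint32 (4-byte stride) cannot be passed through without copying. nestedView shows that a [][]T view can still be built on top of the flat slice by reslicing, without copying the values.

```go
package main

import "fmt"

// perCPUValue returns the value for entry i on CPU cpu from a flat batch
// slice laid out as batch0cpu0, batch0cpu1, ..., batch0cpuN, batch1cpu0, ...
func perCPUValue(values []uint32, numCPU, i, cpu int) uint32 {
	return values[i*numCPU+cpu]
}

// checkPerCPULen mirrors the strictness described above: the flat slice must
// hold exactly numKeys*numCPU values, with no padding and no partial results.
func checkPerCPULen(numValues, numKeys, numCPU int) error {
	if want := numKeys * numCPU; numValues != want {
		return fmt.Errorf("expected %d values (%d keys x %d CPUs), got %d",
			want, numKeys, numCPU, numValues)
	}
	return nil
}

// kernelStride is the size of one per-CPU value slot in the kernel's
// buffer: the value size rounded up to a multiple of 8 bytes.
func kernelStride(valueSize int) int {
	return (valueSize + 7) &^ 7
}

// nestedView reslices the flat slice into one inner slice per key. The inner
// slices alias the flat backing array; the three-index slice caps each one
// so an append cannot overwrite the next key's values.
func nestedView(values []uint32, numCPU int) [][]uint32 {
	out := make([][]uint32, 0, len(values)/numCPU)
	for i := 0; i+numCPU <= len(values); i += numCPU {
		out = append(out, values[i:i+numCPU:i+numCPU])
	}
	return out
}

func main() {
	const numCPU = 2
	// Three keys on two CPUs: batch0cpu0, batch0cpu1, batch1cpu0, ...
	values := []uint32{10, 11, 20, 21, 30, 31}
	if err := checkPerCPULen(len(values), 3, numCPU); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(perCPUValue(values, numCPU, 1, 1)) // 21
	fmt.Println(nestedView(values, numCPU))        // [[10 11] [20 21] [30 31]]
	fmt.Println(kernelStride(4))                   // 8
}
```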
lmb pushed a commit to alxn/ebpf that referenced this pull request Dec 15, 2023
lmb pushed a commit to alxn/ebpf that referenced this pull request Dec 15, 2023
lmb pushed a commit that referenced this pull request Dec 15, 2023