Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf2go: recognise go package paths in C includes #1473

Closed
wants to merge 1 commit into from

Conversation

mejedi
Copy link
Contributor

@mejedi mejedi commented May 28, 2024

Recognise go package paths in C includes when -gopkg-includes flag is passed, e.g. #include "github.com/cilium/ebpf/foo/bar.h".

It is handy for sharing code between multiple ebpf blobs withing a project. (The alternative is "../.." paths which are brittle, or configuring include paths via CFLAGS in a Makefile which is something an average Golang programmer is not comfortable with.)

Even better, it enables sharing ebpf code between multiple projects using go modules as delivery vehicle.

@mejedi mejedi requested a review from a team as a code owner May 28, 2024 10:43
@mejedi mejedi force-pushed the bpf2go-gopkg-includes branch 3 times, most recently from 439e852 to 86b6d31 Compare May 28, 2024 11:01
@mejedi
Copy link
Contributor Author

mejedi commented May 28, 2024

Some context #1320

Sharing code between projects is for reusable custom helpers (e.g. XDP needs something to fill in MAC addresses, and bpf_fib_lookup alone is generally not enough), NOT for “standard” headers such as libbpf and Linux.

СС @ti-mo @lmb

@mejedi mejedi force-pushed the bpf2go-gopkg-includes branch from 86b6d31 to 927d330 Compare May 29, 2024 08:26
@mejedi
Copy link
Contributor Author

mejedi commented Jun 11, 2024

Ping @lmb @ti-mo

Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way you went about tying the go modules into the search path is very neat. I was worried that we'd have to resort to parsing the C somehow, this is much nicer. Adding all direct dependencies to the search path is also smart because it means we don't have to come up with a way of maintaining the dependencies separate from go.mod.

  • How would one go about adding a dependency for the purpose of pulling in C code? Imagine a situation where the go module only used from the C, and so isn't naturally part of go.mod. Something like https://go.dev/wiki/Modules#how-can-i-track-tool-dependencies-for-a-module?
  • What will the modules look like? Just C files in a go module wrapper? C + Go code?
  • If I store my C headers / sources in bpf/includes/ or something like that, the include path would be "example.com/foo/bpf/includes/bar.h". Not the end of the world, but also seems like a bit of a missed opportunity? Reminds me a bit of pkg-config.

Overall this seems like a good idea, I'm a bit worried that this is a very broad / powerful feature which means there are edge cases we don't know about yet. I wonder if we should make this opt-in from the depended module's perspective? Something like a bpf2go.json in the root of the module which contains some metadata, for example which subdirectory to expose to clang. That way we could extend the module-as-source-code concept bit by bit.

People have also asked about being able to put more code in .c files and then do actual linking, which feels similar to this. At least we'd need to be able to refer to headers for forward declarations. If i can reference a header file from another module, should I be able to compile a c file from the other module (i think not, but what is the exact rationale?)

Once we've figured out whether to make this opt-in this also needs tests and some documentation.

// either "file" (a reference to file) or "directory-remap" (a reference
// to directory).
//
// https://github.com/llvm/llvm-project/blob/llvmorg-18.1.0/llvm/include/llvm/Support/VirtualFileSystem.h#L637
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appeared in LLVM 18?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It dates back to at least LLVM 10.

}

func vfsAdd2(vi *vfsItem, vpath, path string, typ vfsItemType, specificity int, fs fs.FS) error {
for _, name := range strings.Split(vpath, "/") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: filepath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is Golang module URL being split, it only ever contains forward slashes. Maybe path instead of filepath? Unfortunately, I can't find anything helpful in there either. (Besides a function to extract a single component at a time, starting from the right side.)

return os.Stat(name)
}

func vfsAdd2(vi *vfsItem, vpath, path string, typ vfsItemType, specificity int, fs fs.FS) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a method on vfsItem?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is quite a bit of complexity here, with symlink handling and so on. Would it be possible to just create a single directory-remap entry per module in vfs.Roots, without descending into the dir itself?

Kinda like this:

{"roots": [
	{
		"type": "directory-remap",
		"name": "github.com/foo/bar/baz",
		"external-contents": "/home/foo/.cache/..."
	}
]}

See https://github.com/llvm/llvm-project/blob/461274b81d8641eab64d494accddc81d7db8a09e/llvm/include/llvm/Support/VirtualFileSystem.h#L699-L712

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a method on vfsItem?

Fixed.

... single directory-remap entry per module in vfs.Roots, without descending into the dir itself?

A comment slightly below the linked lines say

/// For 'file', 'directory' and 'directory-remap' entries the 'name' field may
/// contain multiple path components (e.g. /path/to/file). However, any
/// directory in such a path that contains more than one child must be uniquely
/// represented by a 'directory' entry.

Which rules out a flat list of entries. Further, golang module trees might "overlap", e.g. github.com/aws/aws-sdk-go-v2 and github.com/aws/aws-sdk-go-v2/internal/endpoints/v2. I can't add both to vfs unless I do some merging, which requires enumerating files in the directory.


// persistVfs stores vfs in user cache dir under a path derived from
// outputDir; the file should stay around for the benefit of a language
// server / IDE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you configure your IDE to access this? What happens if I don't have it configured, code completion doesn't work?

Copy link
Contributor Author

@mejedi mejedi Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other people on my team use it. I think there's a tool you can use for launching other programs, bpf2go included. The tool does some LD_PRELOAD hackery to capture arguments passed to a compiler. This is later used by clangd to configure include paths, amongst other things. I will share the tool name shortly.

Basically, it is autocompletion that stops working, plus plenty of bogus compile errors are shown.

Vfsoverlay file has to stay around, otherwise clangd will have a very different idea about include paths than the actual compiler used to build bpf code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}

const vfsOverlayDir = "/.vfsoverlay.d"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading the code this is used to make llvm search the vfs? Could use a doc comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"strings"
)

// vfs is LLVM virtual file system parsed from YAML file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: seems like we're doing JSON (thankfully)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON is the proper subset of YAML. Would the following wording work?

vfs is LLVM virtual file system parsed from a config file

roots := [1]vfsItem{{Name: vfsOverlayDir, Type: vfsDirectory}}
for _, m := range mods {
if m.Dir == "" {
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this ignores all indirect dependencies, which is really hard to discern from just reading the code. Maybe it makes more sense to do the filtering in parseMods?

Copy link
Contributor Author

@mejedi mejedi Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is highly confusing, agreed. Actually, Dir is empty if you don't have the module downloaded for whatever reason. There's also Indirect field. Changing to

mods, err := listMods(b2g.outputDir, "all")
if err != nil {
        return fmt.Errorf("listing go modules: %w", err)
}
var mainModAndDirectDeps []mod
for _, m := range mods {
       if m.Indirect {
                continue
       }
        if m.Dir == "" {
                return fmt.Errorf("%s is missing locally: consider 'go mod download'", m.Path)
        }
        mainModAndDirectDeps = append(mainModAndDirectDeps, m)
}
vfs, err := createVfs(mainModAndDirectDeps)

Sidenote: it is actually a happy coincidence that we only go after direct dependencies. go mod download (and probably go build as well) may skip fetching some indirect dependencies:

go mod download

If the main module’s go.mod file specifies go 1.17 or higher, go mod download without arguments now downloads source code for only the modules explicitly required in the main module’s go.mod file. (In a go 1.17 or higher module, that set already includes all dependencies needed to build the packages and tests in the main module.) To also download source code for transitive dependencies, use go mod download all.go mod download

if err != nil {
return fmt.Errorf("creating LLVM vfsoverlay: %w", err)
}
b2g.vfsoverlay, err = persistVfs(b2g.outputDir, vfs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to write this into the output directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually stored in user cache dir under sha256 of the output path as the name. Changing slightly so that the code doesn't suggest that something is being written to outputDir.

vfsID := sha256.Sum256([]byte(b2g.outputDir))
b2g.vfsoverlay, err = persistVfs(vfsID, vfs)

(This is being done to reduce the number of files that linger in user cache dir. For the rationale to keep these files around see sibling thread.)

@mejedi mejedi force-pushed the bpf2go-gopkg-includes branch 3 times, most recently from 0941dc8 to 3750dbd Compare June 14, 2024 15:35
Recognise go package paths in C includes when -gopkg-includes flag is
passed, e.g. #include "github.com/cilium/ebpf/foo/bar.h"

It is handy for sharing code between multiple ebpf blobs withing a
project. (The alternative is "../.." paths which are brittle, or
configuring include paths via CFLAGS in a Makefile which is something an
average Golang programmer is not comfortable with.)

Even better, it enables sharing ebpf code between multiple projects
using go modules as delivery vehicle.

Signed-off-by: Nick Zavaritsky <mejedi@gmail.com>
@mejedi mejedi force-pushed the bpf2go-gopkg-includes branch from 3750dbd to d3daf1c Compare June 14, 2024 15:39
@lmb
Copy link
Collaborator

lmb commented Jun 21, 2024

@mejedi I've thought a lot about what to do with this PR, I think you have a good idea here with a neat implementation! Unfortunately I'm not in a position to figure out if the implementation is right because I only use bpf2go in some really basic scenarios. My worry is that this feature will lead to a high maintenance burden, for which I don't have the time at the moment.

As an alternative, I've just created #1494 which should hopefully allow you to build your own tool with little effort. I think it will allow you to implement your idea by just passing another flag to clang. Please let me know if anything is missing.

@lmb lmb closed this Jun 21, 2024
@mejedi
Copy link
Contributor Author

mejedi commented Jun 21, 2024

I’m rather unhappy that you decided to end the discussion abruptly. We as the community would be in a better position if we figured features such as code reuse for ebpf together instead of encouraging forks. Please reconsider.

@lmb
Copy link
Collaborator

lmb commented Jun 24, 2024

hey nick, i understand your frustration, sorry for not communicating with you directly before hand. you wrote that we would be in a better place if code reuse worked better: i agree. you've got a good idea here: extend the go module / package system to C code. using the clang vfs abstraction is neat.

the problem is that i'm the only person maintaining bpf2go. adding code always means that i take on an additional burden, and often times i have a good idea how big that burden will be. then it's easy for me to make a decision. for this PR i do not have that, and this worries me. one reason is that i'm not really using this system myself, so my judgement isn't going to be as good. the other reason is that we're trying to emulate the module system without really using it for it's intended purpose.

  • by relying on go.mod for C dependencies we make many assumptions about Go toolchain behaviour. The go toolchain has changed what is considered a direct and indirect dependency multiple times, because it only cares about Go code. How are users going to add a dependency on a module that is only used for C, so that it continues to be present in the future? I would feel much better if we actually used Go somehow to express dependencies via API.

  • by making all direct dependencies available to the C code we lose the ability to have some sort of contract between bpf2go and the included C code. This worries me because it makes backwards compatible changes a lot harder: if everything is allowed / accessible we can't know how something is used. At the same time it's also not possible to scope dependencies to just a package, which feels more natural to me.

i also have a somewhat philosophical argument: bpf2go suffers from being very useful in a narrow use case and at the same time not being flexible enough. yet making it more flexible is really onerous because CLI tools inevitably end up mired in the combinatorial explosion of their five hundred flags (looking at ffmpeg here). programming languages are the best tool we have to deal with this. for this reason i'm convinced that exposing bpf2go as a Go API is the right way forward.

i made the following mistakes:

  1. I assumed that you'd be fine with rebasing your implementation out of tree, while in reality you want to improve the experience for everyone in the ecosystem.
  2. I put off making a decision / giving clear guidelines for too long, because I was unsure what to do. I wasted your time and enthusiasm because of this.

i'll try to do better next time! it would be interesting to explore how code reuse could fit into the bpf2go Go API, if you are up for it.

@mejedi
Copy link
Contributor Author

mejedi commented Jun 24, 2024

@lmb
Hi Lorenz, thank you for a thoughtful and kind response. I kind of come to terms with it now that the dust has settled. I also appreciate your input; I will be implementing something for a module to restrict the header files it is willing to provide.
I might have misinterpreted the inputs; I was under an impression that cilium/ebpf was aiming to serve a broader community besides just Cilium.
I think that bpf2go can be improved. Maybe I’m biased and my favorite features are of a little use for everyone else. Ebpf code reuse, separate compilation and linking fall into this folder.
There are also features that serve a broader audience. It would be nice if one didn’t have to setup Docker wrappers around bpf2go to have reproducible builds/control toolchain and headers versions. Probably every user would prefer if the tool handled it internally. This certainly add complexity and if you are the sole maintainer of the tool I totally understand your reluctance. But it kind of prevents the community from having nice things.
Is there a chance you could use external co-maintainers for bpf2go?

@janca-ucdavis
Copy link

Hi, I don't want to veer too far off course from what's already been said here, but I'd like to add my take. I've been working with bitwise math in bpf (and before that, dtrace) and it's been extremely helpful to be able to do more sophisticated math in-kernel even for things as humble as high resolution histograms and heat maps. I want to expand it to incorporate vector math, clustering, FFTs and the like.

The thing is, I would like to build a library for general re-use, and it's not immediately obvious how to do this in a way that will benefit the most people. These are functions that likely do not belong as helpers but there is a high need for re-use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants