
Discussion: BuildKit network dependencies #4099

Open
tonistiigi opened this issue Aug 3, 2023 · 3 comments


tonistiigi commented Aug 3, 2023

ref #1337
ref #3960

message ExecOp {
  Meta meta = 1;
  repeated Mount mounts = 2;
  NetMode network = 3;
  SecurityMode security = 4;
  repeated SecretEnv secretenv = 5;
  
  repeated NetworkInterface networks = 6;

  // note that the Inputs array for this ExecOp also needs to define all
  // the inputs used by any mount of the ServicePeers this ExecOp has access to
}

message NetworkInterface {
  string name = 1; // unique per execop
  string IP = 2; // eg. 10.0.0.1/24
  int32 mtu = 3; // probably could be just fixed
  repeated Peer peers = 4;
}

message Peer {
  string IP = 1;
  oneof peer {
    SessionPeer session = 2;
    ServicePeer service = 3;
    Link link = 4;
  }
}

message SessionPeer {
  string sessionID = 1;
  string peerID = 2; // client needs to configure handler for such ID
}

message ServicePeer {
  string id = 1; // Same string with same Job group means same process (assuming that they also have equal inputs)
  NewContainerRequest container = 2;
  InitMessage init = 3;

  // stopsignal?
}

// Link can be used to link two containers already defined under an ExecOp.
// The peer's definition already exists, so only a reference to an existing
// ServicePeer ID is needed. If no such ID is found, it is a runtime error.
message Link {
  string ID = 1;
}

message NewContainerRequest {
  string ContainerID = 1;
  // NEW: these mounts can use input indexes from the ExecOp inputs array
  repeated pb.Mount Mounts = 2;
  pb.NetMode Network = 3;
  pb.Platform platform = 4;
  pb.WorkerConstraints constraints = 5;
  repeated pb.HostIP extraHosts = 6;
  string hostname = 7;

  repeated NetworkInterface networks = 8;
}

Network dependencies can be created between:

  • ExecOp and ServicePeer container started by buildkit (and controlled by Job lifecycle)
  • ExecOp or ServicePeer container and buildkit client that has dialed into Session API
  • Interactive container and buildkit client that has dialed into Session API
  • (an ExecOp talking to an interactive container is theoretically possible, but let's leave this out for now until there is a use case)

All communication happens via WireGuard packets. There is no support for, or dependency on, any custom CNI backend or special setup.

The implementation is based on the wireguard-go netstack library https://pkg.go.dev/golang.zx2c4.com/wireguard@v0.0.0-20230704135630-469159ecf7d1/tun/netstack, which is imported in both the client and buildkitd. The buildkitd side can optionally create a native WireGuard interface instead, but that would require the two BuildKit containers to have a discoverable endpoint between them (the simplest option is a CNI bridge).

// example connection logic

tunDev, gNet, err := netstack.CreateNetTUN([]netip.Addr{myIP}, []netip.Addr{}, mtu)
if err != nil {
	return errors.Wrap(err, "failed to create tun device")
}

wgDev := device.NewDevice(tunDev, conn.NewDefaultBind(), device.NewLogger(device.LogLevelVerbose, "wireguard: "))

// keys are auto-generated by the library; public keys are shared via the session API when needed
for _, peer := range me.Peers {
	wgConf := bytes.NewBuffer(nil)
	fmt.Fprintf(wgConf, "private_key=%s\n", hex.EncodeToString(me.PrivateKey[:]))
	fmt.Fprintf(wgConf, "public_key=%s\n", hex.EncodeToString(peer.PublicKey[:]))
	if peer.Endpoint != nil {
		fmt.Fprintf(wgConf, "endpoint=%s\n", peer.Endpoint) // Endpoint is a dummy unique value for the custom bind
	}

	ips := make([]string, len(peer.AllowedIPs))
	for i, ip := range peer.AllowedIPs {
		ips[i] = ip.String()
	}

	fmt.Fprintf(wgConf, "allowed_ip=%s\n", strings.Join(ips, ","))
	fmt.Fprintf(wgConf, "persistent_keepalive_interval=%d\n", 10)

	if err := wgDev.IpcSetOperation(bufio.NewReader(wgConf)); err != nil {
		return errors.Wrap(err, "failed to set wg device config")
	}
}
if err := wgDev.Up(); err != nil {
	return errors.Wrap(err, "failed to bring wg device up")
}

There are no open ports or extra communication channels for network traffic; everything goes over the gRPC connection from the session endpoint. Daemon-side communication is either between buildkitd (running netstack) and a tuntap device, or directly between two WireGuard interfaces.

For that, instead of the conn.NewDefaultBind() call above, a custom implementation of conn.Bind is needed: https://pkg.go.dev/golang.zx2c4.com/wireguard@v0.0.0-20230704135630-469159ecf7d1/conn#Bind

type Bind interface {
	// Open puts the Bind into a listening state on a given port and reports the actual
	// port that it bound to. Passing zero results in a random selection.
	// fns is the set of functions that will be called to receive packets.
	Open(port uint16) (fns []ReceiveFunc, actualPort uint16, err error)

	// Close closes the Bind listener.
	// All fns returned by Open must return net.ErrClosed after a call to Close.
	Close() error

	// SetMark sets the mark for each packet sent through this Bind.
	// This mark is passed to the kernel as the socket option SO_MARK.
	SetMark(mark uint32) error

	// Send writes one or more packets in bufs to address ep. The length of
	// bufs must not exceed BatchSize().
	Send(bufs [][]byte, ep Endpoint) error

	// ParseEndpoint creates a new endpoint from a string.
	ParseEndpoint(s string) (Endpoint, error)

	// BatchSize is the number of buffers expected to be passed to
	// the ReceiveFuncs, and the maximum expected to be passed to SendBatch.
	BatchSize() int
}

E.g. a new API registered on the Session endpoint defines a streaming endpoint that sends init/data/close etc. packets and implements this interface.

netstack.CreateNetTUN() https://pkg.go.dev/golang.zx2c4.com/wireguard@v0.0.0-20230704135630-469159ecf7d1/tun/netstack#CreateNetTUN

@vito @pchico83


sipsma commented Aug 3, 2023

I like this idea a lot; a nice side-effect is that it seems like it would probably work even if buildkitd is not root and is spinning up rootless containers via the oci worker, thanks to netstack running in userspace?

A mild implementation-detail concern is how much a buildkit client (i.e. buildx, the dagger cli, etc.) is going to get bloated by importing+running wireguard's netstack, both in terms of binary size and performance overhead.

If the client is using the docker-container connhelper and buildkitd->client connections are via session attachable (w/ grpc-in-grpc), I think at that point we'd be tunneling wireguard-over-grpc-over-grpc-over-stdio-pipes. Entirely possible that works great and doesn't matter, just seems like something to look out for.

tonistiigi (Member, Author) commented:

spinning up rootless container via oci worker, thanks to running netstack in userspace?

Possibly, yes. I guess that in that case creating a native wg interface would be too privileged.

A mild implementation-detail concern is how much a buildkit client (i.e. buildx, the dagger cli, etc.) is going to get bloated by importing+running wireguard's netstack, both in terms of binary size and performance overhead.

I haven't measured how big these imports are. On the buildkitd side, it is needed. On the client side, it would probably be a good idea to use a separate import that enables the SessionPeer handler support.


jedevc commented Aug 16, 2023

If the client is using the docker-container connhelper and buildkitd->client connections are via session attachable (w/ grpc-in-grpc), I think at that point we'd be tunneling wireguard-over-grpc-over-grpc-over-stdio-pipes. Entirely possible that works great and doesn't matter, just seems like something to look out for.

Potentially, for the moby and dagger engine case, we could have a new ClientOpt WithNetworkDialer (bad name), similar to how today we have a WithSessionDialer. The client could open a direct connection to a reachable endpoint, and then perform WireGuard connections through there, without an intermediate gRPC tunneling layer.


3 participants