
Dynamic network configuration via session attachable #3960

Closed
vito wants to merge 3 commits

Conversation

@vito (Contributor) commented Jun 21, 2023

Adds support for passing network config (DNS + ExtraHosts) to llb.Exec, llb.Git, and llb.HTTP via a session.Attachable instead of baking it directly into LLB.

Using a session.Attachable lets you configure ephemeral IPs and DNS search domains without busting the cache: the LLB carries only a stable ID value, which is resolved at build time, just like secrets.

The end goal is to use this in Dagger, where we're switching service containers to run via the gateway NewContainer API instead of Solve. We want to use DNS search domains to keep service addresses namespaced to Dagger sessions running against the same Buildkit. With this approach we need a way to pass the search domain to anything that can reach a service, which means not only containers but HTTP and Git sources too.

I'm also interested in expanding this in the future to support TLS configuration, and to support configuring it for llb.Image too. (Softly related: #3366)
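
To make the client-side wiring concrete before the implementation notes below, here is a rough sketch of what attaching such a provider could look like, modeled on how secret providers are attached today. The networkConfigProvider type, its gRPC service, and the LLB-side option are hypothetical stand-ins; the PR's actual names are not shown here.

```go
package netcfg

import (
	"context"

	"github.com/moby/buildkit/client"
	"github.com/moby/buildkit/client/llb"
	"github.com/moby/buildkit/session"
	"google.golang.org/grpc"
)

// resolvedNetConfig holds the ephemeral settings looked up at build time;
// only a stable ID for it is referenced from LLB, so the cache key stays stable.
type resolvedNetConfig struct {
	Nameservers   []string
	SearchDomains []string
	ExtraHosts    map[string]string // hostname -> IP
}

// networkConfigProvider is a hypothetical session.Attachable that answers
// "resolve this config ID" requests from buildkitd, much like the secrets
// provider answers secret-ID lookups.
type networkConfigProvider struct {
	configs map[string]resolvedNetConfig
}

// Register implements session.Attachable. A real implementation registers a
// gRPC service here; the service definition from the PR is omitted.
func (p *networkConfigProvider) Register(srv *grpc.Server) {
	// e.g. netconfig.RegisterNetworkConfigServer(srv, p) // hypothetical
}

func buildWithNetConfig(ctx context.Context, c *client.Client) error {
	// The PR also adds an LLB-side reference to the stable config ID; that
	// option is not shown here since its exact name is part of the PR.
	st := llb.Image("alpine").Run(llb.Shlex("wget http://deadbeef")).Root()

	def, err := st.Marshal(ctx)
	if err != nil {
		return err
	}

	_, err = c.Solve(ctx, def, client.SolveOpt{
		Session: []session.Attachable{
			&networkConfigProvider{configs: map[string]resolvedNetConfig{
				"my-session": {SearchDomains: []string{"sdfkhjkjshkjhsf.dagger.local"}},
			}},
		},
	}, nil)
	return err
}
```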

Implementation notes:

  • Exec support works by merging the provided config into the container's /etc/hosts and /etc/resolv.conf, very similar to how llb.ExtraHosts currently works.
  • HTTP works using a custom DNS resolver at the HTTP transport layer (see the resolver sketch after this list).
    • The transport code has been extracted so it can be re-used for llb.Image in the future.
  • Git works by unsharing the mount namespace and mounting over /etc/hosts and /etc/resolv.conf every time we shell out to git.
    • The overrides are created once rather than on each git call.
  • Tests have been added.
  • Commits squashed.
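
The "custom DNS resolver at the HTTP transport layer" bullet boils down to the standard-library pattern below. This is an illustrative sketch, not the code from this PR, and it does not handle search domains, which would need extra logic on top:

```go
package netcfg

import (
	"context"
	"net"
	"net/http"
	"time"
)

// newTransportWithResolver returns an http.Transport whose dialer sends all
// DNS lookups to the given nameserver instead of the host's /etc/resolv.conf.
func newTransportWithResolver(nameserver string) *http.Transport {
	resolver := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			// Ignore the upstream address Go picked and dial our nameserver.
			return d.DialContext(ctx, network, net.JoinHostPort(nameserver, "53"))
		},
	}
	dialer := &net.Dialer{Resolver: resolver}
	return &http.Transport{DialContext: dialer.DialContext}
}
```

An HTTP source would then use `&http.Client{Transport: newTransportWithResolver("10.87.0.1")}`; the nameserver address here is of course made up.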

@vito vito force-pushed the networks branch 2 times, most recently from 3dd739f to 49cff4f Compare June 21, 2023 23:28
@tonistiigi (Member)

A couple of questions about this.

The first is how this configuration should participate in the cache and in the portability/reproducibility guarantees (or whether it should be for interactive containers only). Is it for global hostnames/IPs or for deployment-specific overrides? If it is global, then it should be in the build cache. If not, then it is kind of weird: the client (a remote machine) is configuring the network per request via the session, but the IPs it provides require a special configuration of buildkitd.

In that case, it should either be configured from the buildkitd config, or the client should share primitives for actual network forwarding instead of preconfigured IPs. The API could be for backing a network (tun/tap) interface, or maybe for proxying WireGuard packets and exposing the WireGuard interface in the container. Then the client side would have full control over tweaking the network properties of a step, transfers between steps or between the client and a step, etc. Once we have these primitives, it should be possible to have network dependencies in LLB.

If the client provides (preconfigured) IPs, there is also the question of how this works with the existing NetworkMode/Entitlements configurations that put things in different network namespaces, meaning their routing is different.

@vito (Contributor Author) commented Jun 22, 2023

Some important context: in Dagger, services are content-addressed. Their hostname is a hash of their recipe, kind of like an LLB digest but shorter. This allows client calls like wget http://deadbeef to be cached globally, so that when a service runs again and its IP changes, the client calls don't all have to re-build. If the service itself changes (different config, etc.), its hostname will be different, so different client calls will run. Services run within Buildkit and have ephemeral IPs.
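
To make the content-addressing idea concrete, here is a minimal sketch; the exact hashing scheme and hostname length are assumptions for illustration, not Dagger's actual implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// serviceHostname derives a short, stable hostname from a service's recipe,
// so the same recipe always yields the same address and cached client calls
// like `wget http://deadbeef` keep hitting the cache across runs.
func serviceHostname(recipe []byte) string {
	sum := sha256.Sum256(recipe)
	return hex.EncodeToString(sum[:])[:8] // e.g. a "deadbeef"-style short digest
}

func main() {
	fmt.Println(serviceHostname([]byte(`{"image":"postgres:15","args":["-p","5432"]}`)))
}
```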

There are two sides of this to explain, since these changes correspond to two different approaches to container-to-container networking. I started with ExtraHosts hoping it would be simpler than maintaining a DNS stack, but switched back to DNS to handle tricky cases like Dagger-in-Dagger.

ExtraHosts

To save some time: we don't need this anymore, and I realized while writing this comment that it's not very useful in its current form. I can just remove it if you want to skip to the next point, since it seems to be the source of most of your concerns so far.

To explain anyway: the IP value being passed to ExtraHosts is the IP of a gateway container, fetched from Buildkit client-side via a tricky hack (long story...). It's not a global value, and it's not the client/remote machine deciding the IP; it was allocated via Buildkit's configuration, so it seems OK to me to pass it back in.

This implementation seems hard to use though, because you would need dynamic network IDs to handle each client's hosts config, and you would need the NetworkID to be deterministic but unique to the client somehow to avoid cache busting. A better approach might be to implement a hostname => IP resolver session.Attachable and do something like llb.DynamicExtraHosts("deadbeef") instead. That approach would work fine for a content-addressed services setup like ours, or for limited cases where human-readable hostnames like postgres will only ever resolve to one IP.
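
A rough sketch of that alternative, as I picture it; llb.DynamicExtraHosts does not exist in BuildKit, and the resolver interface here is invented purely for illustration:

```go
package netcfg

import (
	"context"
	"net"
)

// HostResolver is a hypothetical hostname => IP resolver exposed over the
// session: buildkitd would call it at build time, so the ephemeral IP never
// appears in LLB and cannot bust the cache.
type HostResolver interface {
	ResolveHost(ctx context.Context, hostname string) (net.IP, error)
}

// A client-side implementation could simply look the hostname up in the
// session's table of running services.
type mapResolver map[string]net.IP

func (m mapResolver) ResolveHost(_ context.Context, hostname string) (net.IP, error) {
	if ip, ok := m[hostname]; ok {
		return ip, nil
	}
	return nil, &net.DNSError{Err: "unknown service", Name: hostname, IsNotFound: true}
}
```

On the LLB side, the step would then reference only the stable hostname, e.g. via a hypothetical `llb.DynamicExtraHosts("deadbeef")` run option.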

DNS

The DNS use case is similar to #3044. buildkitd is configured with CNI, which is configured with a bridge network plugin and a DNS plugin that registers container hostnames with a local dnsmasq. Buildkit is then configured with dnsmasq as its DNS server, so all containers can reach each other via their hostnames.

The trick is that we want clients to always resolve hostnames to a service running in the same Dagger session. For example, if you're running two test suites concurrently on the same Buildkit, we don't want them accidentally reaching each other's services. With ExtraHosts this was easy since it's all IPs local to each session, but with DNS the client could resolve deadbeef to any instance. To solve this, we randomly generate a session domain like sdfkhjkjshkjhsf.dagger.local, append it to all service hostnames, and configure the search domain using this PR. And to handle nesting, we just pass the parent Dagger's session domain along to the child, which appends it to their network config's SearchDomains.
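
A small sketch of that flow, generating a per-session domain and composing the resolv.conf override; the names, nameserver address, and exact format are illustrative, not Dagger's code:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// newSessionDomain generates a random, throwaway search domain such as
// "sdfkhjkjshkjhsf.dagger.local" that namespaces service hostnames to one
// Dagger session.
func newSessionDomain() string {
	buf := make([]byte, 8)
	_, _ = rand.Read(buf)
	return hex.EncodeToString(buf) + ".dagger.local"
}

// resolvConf renders the override mounted into containers. Nesting works by
// listing the child's own domain first and appending the parent's domains,
// so a nested Dagger still resolves its parent's services.
func resolvConf(nameserver string, parentDomains []string) string {
	search := append([]string{newSessionDomain()}, parentDomains...)
	return fmt.Sprintf("nameserver %s\nsearch %s\n", nameserver, strings.Join(search, " "))
}

func main() {
	fmt.Print(resolvConf("10.87.0.1", []string{"parentsession.dagger.local"}))
}
```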


I hope that makes sense; there are a lot of things that have to align for this all to work. Thanks for the review!

@vito (Contributor Author) commented Jun 22, 2023

Follow-up: to use ExtraHosts for content-addressed services, I could configure NetworkID like dagger:abcdef,deadbeef (so just list each service dependency's hostname) and have the config provider resolve abcdef and deadbeef to IPs. Don't know why I didn't think of that before. So it could still be useful, but I don't mind removing it either way.
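
For illustration, parsing such an ID on the provider side might look like the sketch below; the "dagger:" prefix scheme is just the idea from this comment, not an existing convention:

```go
package netcfg

import (
	"net"
	"strings"
)

// hostsForNetworkID expands an ID like "dagger:abcdef,deadbeef" into the
// extra-hosts entries for each listed service dependency, using a
// session-local resolver such as the one sketched earlier.
func hostsForNetworkID(networkID string, resolve func(host string) (net.IP, error)) (map[string]net.IP, error) {
	list := strings.TrimPrefix(networkID, "dagger:")
	hosts := map[string]net.IP{}
	for _, h := range strings.Split(list, ",") {
		ip, err := resolve(h)
		if err != nil {
			return nil, err
		}
		hosts[h] = ip
	}
	return hosts, nil
}
```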

Also didn't mean to ignore this:

In that case, it should either be configured from the buildkitd config, or the client should share primitives for actual network forwarding instead of preconfigured IPs. The API could be for backing a network (tun/tap) interface, or maybe for proxying WireGuard packets and exposing the WireGuard interface in the container. Then the client side would have full control over tweaking the network properties of a step, transfers between steps or between the client and a step, etc. Once we have these primitives, it should be possible to have network dependencies in LLB.

I don't have enough experience with the tech mentioned here to fully picture the proposal, but I'm very interested in expanding Buildkit's network capabilities so that it feels more integrated compared to my pile of hacks. 😄

@vito vito force-pushed the networks branch 6 times, most recently from bfdcefa to a7fc1e4 Compare June 23, 2023 22:45
@vito vito marked this pull request as ready for review June 23, 2023 23:02
@vito (Contributor Author) commented Jun 24, 2023

Added integration tests for Git/HTTP + dynamic DNS config. Not sure why https://github.com/moby/buildkit/actions/runs/5361203707/jobs/9726967877?pr=3960 failed. Maybe a flake?

@vito vito force-pushed the networks branch 7 times, most recently from eb4bf0d to 4711702 Compare July 1, 2023 02:29
@vito (Contributor Author) commented Jul 1, 2023

Rebased and squashed all the commits to reduce noise.

vito added 2 commits July 10, 2023 17:12
Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
still not great, but slightly better

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
@vito (Contributor Author) commented Jul 10, 2023

Sorry to bump. Is there anything I can do, or any changes I need to make, to help this along? Thanks!

Current state: all tests are passing, or were before the most recent run, which seems to have hit a few flakes. Tests have been added, and caps have been added.

going with the networks package definition. will revert if requested,
but seems worth trying.

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
@d3rp3tt3

Will this be fixed soon?

"github.com/pkg/errors"
)

// gitCLI carries config to pass to the git CLI to make running multiple
Member

@vito any chance you could pull this git refactor into a separate PR? I was looking at unifying the buildx and buildkit git helpers into one place, and this would be a good start 🎉

If you're too busy, I'm happy to pick it up 😄

@vito (Contributor Author)

Sorry, pretty in the weeds at the moment but I'll let you know if I find time! Feel free to pick this up in the meantime 🙏

Member

Started hacking on this in #4106 🎉 (thanks!)

@tonistiigi (Member)

Looking at this again, I'll try to clear up what is problematic and what is not.

The new networkConfigID field in the API for ExecOp/Container looks too hacky and specific to the internals of Dagger. Since the API has to remain backward compatible, this is an issue. I've opened #4099 for discussion of general-purpose network dependencies in BuildKit. LMK what you think.

The new Session API itself does not need to be defined upstream if it is too Dagger-specific. You could probably hide the networkConfigID in some metadata that your session implementation could catch, or we could add some general-purpose metadata array for that. But this is probably what you are already doing and wished to clean up.

Adding the extra DNS info/types directly into LLB is not an issue AFAICS; it is really the session callback that is super specific.

Adding new options to the Go library (or config file) that are unused by the upstream use cases is not that problematic either. It is really the proto library that we need to be careful about.

@vito (Contributor Author) commented Aug 3, 2023

Thanks for the feedback! We can just close this anyway. A session.Attachable ended up not being the right layer of abstraction for our new architecture, since we need each level of Dagger-in-Dagger nesting to have different network config (one additional DNS search domain at each level), but all of the nested instances run within one long-running session.

Adding the extra DNS info/types directly into LLB is not an issue AFAICS; it is really the session callback that is super specific.

Yeah, if I were to start over I would probably just add LLB-level DNS config to Exec + HTTP + Git + other sources and ignore it in CacheKey/CacheMap, just like ExtraHosts. At the time this PR was opened we relied on LLB digests themselves being "stable" (i.e. not changing based on ephemeral network config across runs), which meant that approach wouldn't work. But now we do an extra pass to manually strip ephemeral info from LLB and compute a stable digest of that instead, which would make regular old LLB config a simpler option.
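
For what it's worth, that "extra pass" amounts to something like the sketch below: unmarshal each op, zero out the ephemeral fields, and digest the sanitized bytes. This is an illustrative approximation (treating ExtraHosts as the ephemeral field and concatenating ops for the digest are assumptions), not Dagger's actual code:

```go
package netcfg

import (
	"github.com/moby/buildkit/client/llb"
	"github.com/moby/buildkit/solver/pb"
	"github.com/opencontainers/go-digest"
)

// stableDigest re-digests an LLB definition with ephemeral network details
// removed, so two builds that differ only in runtime IPs/DNS hash the same.
func stableDigest(def *llb.Definition) (digest.Digest, error) {
	var sanitized []byte
	for _, dt := range def.Def {
		var op pb.Op
		if err := op.Unmarshal(dt); err != nil {
			return "", err
		}
		if exec := op.GetExec(); exec != nil && exec.Meta != nil {
			exec.Meta.ExtraHosts = nil // drop ephemeral host/IP pairs (example field)
		}
		clean, err := op.Marshal()
		if err != nil {
			return "", err
		}
		sanitized = append(sanitized, clean...)
	}
	return digest.FromBytes(sanitized), nil
}
```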

@vito vito closed this Aug 3, 2023