add ability to specify a CALLBACK_URI env var (so packer can be used in kubernetes based pipelines) #13201

Open
lknite opened this issue Nov 5, 2024 · 11 comments

Comments

@lknite

lknite commented Nov 5, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

If the environment variable CALLBACK_URI exists, use that once a VM is complete instead of the local ip automatically detected when the bind was performed.

Use Case(s)

In my environment I run pipelines within kubernetes. When running image-builder, which uses packer, this means once the VM is ready it can't call back on the ip it was given. Things running in kubernetes are not reachable by default. They need to be exposed via a service. In my case I am easily able to expose the IP needed to be used using a LoadBalancer IP, or specify an FQDN (which is registered in DNS automatically), but via image-builder & packer there seems to be no way to specify the value.

In the past I used Jenkins with the kubernetes plugin, running things via Kubernetes. Currently my pipelines run using gitlab-runner (via Kubernetes), which is used by my gitlab pipelines.

Potential configuration

env var CALLBACK_URI or similar
@lknite lknite changed the title add ability to specify a CALLBACK_URI (so packer can be used in kubernetes based pipelines) add ability to specify a CALLBACK_URI env var (so packer can be used in kubernetes based pipelines) Nov 5, 2024
@lbajolet-hashicorp
Contributor

Hey @lknite,

I'm not sure I understand what it is you're looking for with this issue to be honest, I am not familiar with image-builder, and I've never tried to run Packer in a k8s environment, so please bear with me.
What is that CALLBACK_URI variable you want defined for? You mention packer using a local IP (I imagine that's for communicating with the guest VM? Or something else?), I'm not sure at which step we're talking, do you have an example that we can work with that you can share, and traces/logs that we can start looking into?
I'm mostly having a hard time understanding what's broken with the current setup, or how to address that lack. The envvar is meant to be used by whom?

@lknite
Author

lknite commented Nov 19, 2024

You mentioned that you've never tried to run packer in a k8s environment; does that imply you have some experience working with kubernetes? If not, I think I'd have to teach you kubernetes before this question would make sense ... and that's a big ask. I'd recommend a course on udemy that runs about $29 with a new email. It changed my life, so I'm not joking here. It's called "Certified Kubernetes Administrator (CKA) with Practice Tests".

I can try though. With kubernetes you create a 'deployment' resource, which is a bunch of yaml that specifies a docker container image to use. This runs in a "hidden network" that kubernetes provides. The "hidden network" spans multiple PCs/servers. Via the "hidden network", which kubernetes calls the ClusterIP network, a docker container like wordpress can run on one server (worker node) and talk with another container, like a redis database, running on another worker node. From the perspective of the containers, it's just like they are plugged into the same switch. This lets kubernetes deploy containers to any worker node and makes them highly available: if a worker node goes down because of a power outage, kubernetes will move the container to another worker node.

From the container perspective, everything works via ClusterIP ... but the outside world can't see anything, because the ClusterIP network is a kind of secret network among the worker nodes. To interact with the world the deployment must be 'exposed', either using a LoadBalancer IP (a typical ipv4 ip) or using an ingress (which uses a reverse proxy and lets you use a URL such as https://travisloyd.xyz/media).
In a way kubernetes can be seen like an awesome place with tons of cpu and memory, like a mainframe, or a datacenter.

Since there is so much cpu and memory there, it's nice to run pipelines in there. You can see how it might be nice to run packer in there; once you are familiar with it you pretty much want to run everything in there.

So in my case I run imagebuilder, which is used to build images for use with kubernetes. Imagebuilder uses packer. Packer gets the "hidden, secret, IP from the ClusterIP network" and runs an http server there, or something; it's the callback used once the image is complete to signal everything is done. But the VM can't reach the hidden network, it can only reach an 'exposed' ip, set up for that purpose. So how can I tell packer not to hand the "secret ip" to the vm as the ip to call when it's finished, and to use the ip that I give it instead?

I'm proposing the idea that if the environment variable CALLBACK_URI is set, to give that to the vm instead of just using the local ip which was automatically detected when the 'bind' occurred when the http server was stood up.

(high level overview of kubernetes if interested: https://www.youtube.com/watch?v=5g1k3D11V-8&t=8s)

@lbajolet-hashicorp
Contributor

I have worked with k8s in the past yes, but not for the past ~3 years, I assume the basics haven't changed too much. Deployment/StatefulSet and Service are concepts I do remember from that time.
As for what you're trying to do, I can understand why you'd want to run builds within your cluster, now the question becomes who'll talk with whom, and understanding what fails to execute properly.

Judging from what you describe; packer runs on a pod, and builds a VM on another? Which plugin are you using for the builder? I am very unfamiliar with imagebuilder, is this it?
Not all plugins support http_server, those that do generally run locally.

Looking at the code, it seems individual plugins are responsible for providing an address for connecting to the HTTP server, along with inferring which address to listen on if relevant. If an option like CALLBACK_URI or anything else is to be added, plugins would need to be updated so they support it, unless we can find a place to support it directly in the SDK, but that seems unlikely.

To explore possible solutions, could you provide some usage/configuration example (and if possible verbose logs) to help me understand what the issue is, and what we can do to fix it, provided this is a Packer issue. It looks like some of the configurations for local hypervisors like qemu do use HTTPIP and HTTPPort for the boot commands, so I assume they configured their tool to support it, how do they manage to make it work?

@lknite
Author

lknite commented Nov 27, 2024

My current use case is:

  • working with proxmox at the moment, so the vm being built with packer is being built there
  • i have a kubernetes cluster running, which lives in proxmox, it was stood up using cluster-api with the proxmox provider
  • my pipelines run in gitlab, using 'gitlab-runners', ... the gitlab-runners run onsite and that's where image-builder gets run

What currently happens is that almost everything works: the gitlab pipeline executes, for each step in the pipeline a container is spun up on-prem (via gitlab-runner inside the kubernetes cluster), the image is built in proxmox, and I end up with a template ... but the vm is never able to communicate back to the pipeline that it's finished, so the pipeline just waits a very long time before giving up ... or maybe gitlab kills it, I forget; I think it waits something like 20 or 30 minutes.

The helm chart for gitlab-runner has a place to expose things via a LoadBalancer ip or ingress, so I have that and I can pass it into image-builder/packer if there was a way to.

Based on your previous comment I submitted an issue to the proxmox packer plugin, hashicorp/packer-plugin-proxmox#304, and I can help to get things working there. It's just that this seems like it should end up in all the plugins, otherwise only some will work via kubernetes-based pipelines, and I'm not sure how to make that happen. The packer-plugin-proxmox uses the packer-plugin-sdk, and I think maybe the functionality we are talking about is in the SDK?

found a log...

2024/10/16 01:54:59 packer-plugin-goss_v3.2.12_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 13). Ignoring.
2024/10/16 01:54:59 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 13). Ignoring.
2024/10/16 01:54:59 packer-post-processor-shell-local plugin: Received interrupt signal (count: 13). Ignoring.
2024/10/16 01:54:59 packer-plugin-proxmox_v1.2.1_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 13). Ignoring.
2024/10/16 01:54:59 packer-provisioner-shell plugin: Received interrupt signal (count: 14). Ignoring.
2024/10/16 01:54:59 packer-plugin-goss_v3.2.12_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 14). Ignoring.
2024/10/16 01:54:59 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 14). Ignoring.
2024/10/16 01:54:59 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 14). Ignoring.
2024/10/16 01:54:59 packer-provisioner-shell plugin: Received interrupt signal (count: 14). Ignoring.
2024/10/16 01:54:59 packer-post-processor-shell-local plugin: Received interrupt signal (count: 14). Ignoring.
2024/10/16 01:54:59 packer-plugin-proxmox_v1.2.1_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 14). Ignoring.
2024/10/16 01:54:59 packer-provisioner-shell plugin: Received interrupt signal (count: 15). Ignoring.
2024/10/16 01:54:59 packer-post-processor-shell-local plugin: Received interrupt signal (count: 15). Ignoring.
2024/10/16 01:54:59 packer-plugin-goss_v3.2.12_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 15). Ignoring.
2024/10/16 01:54:59 packer-plugin-proxmox_v1.2.1_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 15). Ignoring.
2024/10/16 01:54:59 packer-provisioner-shell plugin: Received interrupt signal (count: 15). Ignoring.
2024/10/16 01:54:59 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 15). Ignoring.
2024/10/16 01:54:59 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 15). Ignoring.
2024/10/16 01:54:59 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 16). Ignoring.
2024/10/16 01:54:59 packer-plugin-proxmox_v1.2.1_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 16). Ignoring.
2024/10/16 01:54:59 packer-provisioner-shell plugin: Received interrupt signal (count: 16). Ignoring.
2024/10/16 01:54:59 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 16). Ignoring.
2024/10/16 01:54:59 packer-plugin-goss_v3.2.12_x5.0_linux_amd64 plugin: 2024/10/16 01:54:59 Received interrupt signal (count: 16). Ignoring.
2024/10/16 01:54:59 packer-provisioner-shell plugin: Received interrupt signal (count: 16). Ignoring.
2024/10/16 01:54:59 packer-post-processor-shell-local plugin: Received interrupt signal (count: 16). Ignoring.
Build 'proxmox-iso.ubuntu-2204' errored after 46 minutes 1 second: build was cancelled

@lknite
Author

lknite commented Nov 27, 2024

In this function, within the packer-plugin-proxmox, it seems like if c.HTTPAddress is defined it would just use that, so it would seem I could specify my loadbalancer ip there somehow. But when I was digging into this before, I think it tried to bind to what we specify here, next ... which wouldn't work with a loadbalancer ip. I'll try to figure it out again ...

step_type_boot_command.go

        if c.HTTPAddress != "0.0.0.0" {
                httpIP = c.HTTPAddress
        } else {
                httpIP, err = hostIP(c.HTTPInterface)
                if err != nil {
                        err := fmt.Errorf("Failed to determine host IP: %s", err)
                        state.Put("error", err)
                        ui.Error(err.Error())
                        return multistep.ActionHalt
                }
        }

@lknite
Author

lknite commented Nov 27, 2024

In builder.go, in the packer-plugin-proxmox, the http server is stood up and then the information is passed to the vm:

                commonsteps.HTTPServerFromHTTPConfig(&b.config.HTTPConfig),
                &stepTypeBootCommand{
                        BootConfig: b.config.BootConfig,
                        Ctx:        b.config.Ctx,
                },

The step to set up the http server is in step_http_server.go in packer-plugin-sdk. It's using the HTTPAddress:

func HTTPServerFromHTTPConfig(cfg *HTTPConfig) *StepHTTPServer {
        return &StepHTTPServer{
                HTTPDir:     cfg.HTTPDir,
                HTTPContent: cfg.HTTPContent,
                HTTPPortMin: cfg.HTTPPortMin,
                HTTPPortMax: cfg.HTTPPortMax,
                HTTPAddress: cfg.HTTPAddress,
        }
}

So, we can't set the HTTPAddress, or it will use that when standing up the http server.
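To illustrate why the bind address and the address handed to the vm need to be separate values, here is a rough sketch. The function names listenAndAdvertise and advertisedURL are hypothetical and not part of packer-plugin-sdk, and the addresses are made-up examples:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
)

// advertisedURL builds the URL that would be handed to the guest VM.
func advertisedURL(host string, port int) string {
	return "http://" + net.JoinHostPort(host, strconv.Itoa(port))
}

// listenAndAdvertise binds to bindAddr on an OS-chosen port, but returns a
// URL built from advertiseHost. The two values are deliberately independent:
// a LoadBalancer ip is not assigned to any interface inside the pod, so it
// cannot be bound to, yet it is the only address the vm can reach.
func listenAndAdvertise(bindAddr, advertiseHost string) (net.Listener, string, error) {
	ln, err := net.Listen("tcp", net.JoinHostPort(bindAddr, "0"))
	if err != nil {
		return nil, "", err
	}
	go http.Serve(ln, http.NotFoundHandler())
	port := ln.Addr().(*net.TCPAddr).Port
	return ln, advertisedURL(advertiseHost, port), nil
}

func main() {
	ln, url, err := listenAndAdvertise("127.0.0.1", "203.0.113.10")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	fmt.Println("bound to", ln.Addr(), "but advertising", url)
}
```

This sketch assumes the exposed service forwards the same port number that was bound locally; if the LoadBalancer remaps ports, the advertised port would need its own setting as well.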

Maybe I could modify the code in the previous comment in step_type_boot_command.go to check for an environment variable and use that if it exists:

        // HTTPCallbackAddress is useful when running within a kubernetes environment using an exposed LoadBalancer ip
        if c.HTTPCallbackAddress != "" {
                httpIP = c.HTTPCallbackAddress
        } else if c.HTTPAddress != "0.0.0.0" {
                httpIP = c.HTTPAddress
        } else {
                httpIP, err = hostIP(c.HTTPInterface)
                if err != nil {
                        err := fmt.Errorf("Failed to determine host IP: %s", err)
                        state.Put("error", err)
                        ui.Error(err.Error())
                        return multistep.ActionHalt
                }
        }

and

packer-plugin-sdk

multistep/commonsteps/http_config.go

	HTTPAddress string `mapstructure:"http_bind_address"`
	// Used to specify a specific ip/fqdn a vm should use to reach the callback http server upon completion.
	// This is required when running via workflows/pipelines which are running within a kubernetes cluster.
	HTTPCallbackAddress string `mapstructure:"http_callback_address"`

@lknite
Author

lknite commented Nov 27, 2024

Ok, I think I've figured it out. Need to add a field to the SDK, which will make it available for all the providers, they'll just need to add it. If there's a common packer provider template, then I'll see if I can update that also.

@lknite
Author

lknite commented Nov 27, 2024

SDK pull request: hashicorp/packer-plugin-sdk#268

@lknite
Author

lknite commented Nov 27, 2024

Proxmox provider pull request: hashicorp/packer-plugin-proxmox#305

@lknite
Author

lknite commented Nov 27, 2024

Image-builder pull request: kubernetes-sigs/image-builder#1637

@lknite
Author

lknite commented Dec 7, 2024

@lbajolet-hashicorp I wonder if you might be able to help my pull requests to get some attention internally?

... Since the fix requires changes in 3 git repos, though the changes are small, it might be tricky to get the maintainers to all understand how the parts work together, mainly that the folks involved would need to know enough kubernetes to understand that the ClusterIP is unreachable and that it needs to be possible to specify a loadbalancer ip or similar.

The image-builder folks are ready to merge the pull request but are waiting on the other two. I'm hoping to avoid this being dragged out and forgotten cause folks without any kubernetes background skip over the pull requests.

Since you've got some kubernetes background, maybe you can help to stitch the teams together?
