
Remove internal registry #45

Closed
gianarb opened this issue Jan 26, 2021 · 23 comments · Fixed by #154
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@gianarb
Contributor

gianarb commented Jan 26, 2021

We are not sure if the internal registry is something we want or if it is just a complication.

We enabled #41, so now the workers have access to the internet. That means the whole registry setup, and the certificate needed to keep it secure, is not strictly required anymore.

BUT we will have to figure out by ourselves how to get our actions to run on the worker. This is not a problem when the action is public, but for private repositories it is a bit more complicated, and the operating system installation environment (osie or tinkie) will have to give us a way to inject authentication (or we have to make tink-worker good enough to authenticate, I suppose).

@mmlb
Contributor

mmlb commented Jan 26, 2021

Yep, private repos/images are something we should think about, but I think the way to do it is to force it? I'm in favor of dropping the registry from sandbox.

@gianarb
Contributor Author

gianarb commented Jan 26, 2021

> I think the way to do it is to force it?

What do you mean?

@mmlb
Contributor

mmlb commented Jan 26, 2021

Drop the registry so that we can have a discussion about this. Force the discussion.

@stolsma

stolsma commented Jan 26, 2021

I agree with @mmlb, just drop it from the sandbox for now. It is easy to add again later, and most people/orgs that need a private repo already have a policy and implementation, or know how to set it up. Just my 2 cents... :-)

@gianarb
Contributor Author

gianarb commented Jan 26, 2021

> need a private repo already have a policy and implementation or know how to set it up.... Just my 2 cents... :-)

Yeah, the problem is not how you set it up for yourself, but how you can use it in the installation environment if it has passwords or things like that. Anyway, I am always up for removing things 👍 I will leave time for more people to leave a comment, and at some point I will move forward accordingly.

@stolsma

stolsma commented Jan 26, 2021

> Yeah, the problem is not how you set it up for yourself, but how you can use it in the installation environment if it has passwords or things like that

If you are running a private repo you always have that problem... :-( But this problem is also related to the encryption of the metadata path from provisioner to local executor. That's something we need to make easy and safe so all secrets can be transported in a secure way.

@mrchrd
Contributor

mrchrd commented Jan 26, 2021

I think I'd like to keep it for now for consistency with k8s-sandbox. @detiber how do you push freshly built dev images to k8s with tilt? Is the registry required/useful?

@fransvanberckel

fransvanberckel commented Jan 26, 2021

Beware, I use a private repo registry for Docker images with the Ansible playbook ...

@gianarb
Contributor Author

gianarb commented Jan 26, 2021

@fransvanberckel can you tell me a bit more?! Do you rely on the private registry that sandbox provides, or do you just provision a registry as part of the toolchain that ansible-role-tinkerbell ships?

If it's the second, no problem, you can keep shipping it. If it's the first, removing it from the sandbox will require some work on the ansible role.

@fransvanberckel

fransvanberckel commented Jan 26, 2021

@gianarb That's right, it's more or less the second. For more details, take a look at the role ...
https://github.com/fransvanberckel/ansible-role-tinkerbell/tree/master/roles/tinkerbell

@nicklasfrahm

I think removing it is a good idea because it simplifies things. If you need private images or work in an air-gapped environment, there are other ways to do it, like a private Docker registry with credentials or pre-pulling images.

@gianarb
Contributor Author

gianarb commented Jan 27, 2021

I have started to work on this, and the problem is in OSIE (and tinkie @thebsdbox).

OSIE: https://github.com/tinkerbell/osie/blob/master/apps/workflow-helper.sh#L81
Tinkie: https://github.com/gianarb/tinkie/blob/master/bootkit/main.go#L64

We use the internal registry as a way to select the version of tink-worker we want to run as part of the installation environment.

Currently, it works this way:

  1. Sandbox has a file called ./current_versions.sh that contains the publicly available tink-worker image
  2. It gets moved to .env as part of the ./generate-envrc.sh script
  3. setup.sh picks it up and mirrors it to the private registry, always tagged as latest
  4. In this way the various OSIE implementations can pull and run it.
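The naming logic in step 3 can be sketched in Go. This is illustrative only: the actual mirroring in setup.sh is a docker pull/tag/push, the `mirrorRef` helper is hypothetical, and the registry host in the example is a placeholder, not a value from this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// mirrorRef rewrites a public image reference so it points at the
// internal registry, always tagged "latest", as step 3 describes.
// Hypothetical helper; edge cases (digests, registry ports) ignored.
func mirrorRef(publicImage, registryHost string) string {
	// Drop the original registry/org prefix, keep only the image name.
	parts := strings.Split(publicImage, "/")
	name := parts[len(parts)-1]
	// Strip any existing tag, then re-tag as latest.
	if i := strings.Index(name, ":"); i != -1 {
		name = name[:i]
	}
	return fmt.Sprintf("%s/%s:latest", registryHost, name)
}

func main() {
	// Placeholder registry host for illustration.
	fmt.Println(mirrorRef("quay.io/tinkerbell/tink-worker:v0.5.0", "192.168.1.1"))
}
```

Because the mirror is always tagged latest, the installation environments never learn which upstream version they are actually running, which is exactly the coupling this issue wants to remove.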

We can make it more general by writing a new RULE for OSIE:

"OSIE expects a kernel command line parameter called tink-worker-image (or whatever) and uses its content to run tink-worker. In this way, we can even override it via metadata"

If we all agree, I can open PRs against osie, tinkie, and sandbox to bring all of them in line.

(At some point we should write the OSIE RULES down as documentation, but we don't know them all yet! :P )

@thebsdbox

We can easily do this now: default to latest and override in the facility metadata!

    "facility": {
      "facility_code": "onprem tink-worker-tag=abc123",

How does that sound?

@nicklasfrahm

nicklasfrahm commented Jan 27, 2021

@thebsdbox I am not too much into the details, but wouldn't it be better to create a new field in the facility object? This could break applications that use facility_code and build on top of the metadata.

Maybe something like facility.cmdline?

@thebsdbox

Ah, so this is an undocumented thing :| Things in the facility code are passed to the cmdline. They're then presented as environment variables or in /proc/cmdline inside of OSIE, and we can process them any way we like. It won't break any existing functionality.
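As a rough sketch of the mechanism described here, assuming the tink-worker-tag parameter name proposed above: the installation environment reads /proc/cmdline and picks the value out of the space-separated key=value pairs. The parsing code is illustrative, not OSIE's or bootkit's actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cmdlineValue scans a kernel command line (the contents of
// /proc/cmdline) for a key=value pair and returns the value.
// Hypothetical helper for illustration.
func cmdlineValue(cmdline, key string) (string, bool) {
	for _, field := range strings.Fields(cmdline) {
		if v, ok := strings.CutPrefix(field, key+"="); ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	raw, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read cmdline:", err)
		return
	}
	if tag, ok := cmdlineValue(string(raw), "tink-worker-tag"); ok {
		fmt.Println("tink-worker tag:", tag)
	}
}
```

For input like `console=ttyS0 facility=onprem tink-worker-tag=abc123`, the lookup for `tink-worker-tag` yields `abc123`.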

@gianarb
Contributor Author

gianarb commented Jan 27, 2021

Yep @thebsdbox, I love that. Let's see what @mmlb says, and I can proceed when we agree.

@nicklasfrahm yeah, it is unfriendly and undocumented :) I think at some point we will figure out a better metadata structure for all of that. I think something is part of a proposal @mmlb started, but I'm not totally sure: https://github.com/tinkerbell/proposals/pull/25/files

@mmlb
Contributor

mmlb commented Jan 27, 2021

Yep, I think this should be done in a metadata fashion, but not like that @thebsdbox. Boots already supports this in "cacher mode": we have "custom services versions" that we use to boot different versions of osie (https://github.com/tinkerbell/boots/blob/master/packet/models_cacher.go#L319). Custom services is meant for this in the legacy EM stack, and we should probably make use of that in tinkerbell too. I don't like the idea of injecting it through the facility param; that is just layering bad UX on top of the bad stuff already :D.

I don't think this should be part of the Instance proposal @gianarb; that's the wrong layer for this, I think.

The other option is to just pin it in the osie "package", which is kind of the other side of doing things. I'm not too keen on this actually.

There are some runtime params that we should have a way to pass down to tink-worker. Its version seems like one; centralized logging config is another. I think hardware.metadata.services is probably a better place? We can do something like:

```json
{
  "tink-worker": {
    "log": {
      "driver": "..."
    },
    "tag": "some-tag",
    "repo": [
      { "url": "...", "user": "someuser", "password": "s3cr3t" }
    ]
  }
}
```

edited to add:
And boots adds them to the command line with some transformation (maybe gron-like)? Some base64 JSON is also an option, but I don't quite like this :D. That does seem like it can get a bit out of hand, and we could end up running up against the kernel's cmdline limit. Maybe we just embed grpcurl or a similar binary to fetch the services data from hegel?

@gianarb
Contributor Author

gianarb commented Jan 27, 2021

Let's summarise:

  1. One option is to build the operating system installation environment with the right binary/container. I think we do not have to discuss this scenario because it requires very little coordination; whoever wants to do it, can do it. I understand the benefit (self-contained), but I would like to offer an easy-to-swap mechanism, because I see the benefit of swapping the version from the outside without having to recompile and distribute an OS.
  2. I proposed cmdline because right now it is the best-structured and most well-known way we have to inject something, but I understand what you are saying @mmlb, and yes, we can get that information from the metadata. Tinkie can extend bootkit to fetch values from there; for osie you have to tell me. Metadata is so unstructured and obscure that I try to avoid it as much as I can xD

Let's evaluate option 2. When you describe services, do you mean this proposal https://github.com/tinkerbell/proposals/blob/master/proposals/0014/README.md ? Probably not.

At this point we have to figure out a structure and where we want to place that configuration (not in cmdline).

@mmlb
Contributor

mmlb commented Jan 27, 2021

Hmm, yeah, we can go with option 2, cmdline, maybe actually as append_cmdline? We already have the osie section in

network.interfaces[].netboot.osie | OSIE details

where we can add append_cmdline. https://docs.tinkerbell.org/hardware-data/

@gianarb
Contributor Author

gianarb commented Jan 27, 2021

I am confused, you said that cmdline was a bad idea, right? :D

@mmlb
Contributor

mmlb commented Jan 27, 2021

The services proposal (workflow services, actually) is not what I meant when I mentioned services... naming x)

@mmlb
Contributor

mmlb commented Jan 27, 2021

cmdline is definitely an option, but it feels hacky (though it's my favorite of the hacks). Having a proper way to get some hardware data from tink-server (through hegel, probably?) is, I think, the best option, but not as quick :D

@gianarb
Contributor Author

gianarb commented Jan 27, 2021

Ok, we (@mmlb) moved our conversation over to private Slack because here it was too hard, and we found a common proposal!

We are going to teach Hegel a new endpoint, /worker, that will expose the content of hardware.metadata.worker. That field specifies what the worker that OSIE will start looks like.

The format is not yet defined, but ideally something like:

```json
{
  "worker": {
    "log": {
      "driver": "..."
    },
    "image": "quay.io/tinkerbell/tink-worker:something",
    ...
  }
}
```

Unknown: tink-worker today relies on environment variables, and I am not sure how and if we can make it work with this proposal.

@tstromberg tstromberg added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Aug 27, 2021
@jacobweinstock jacobweinstock mentioned this issue Oct 5, 2022
@jacobweinstock jacobweinstock linked a pull request Oct 5, 2022 that will close this issue
@mergify mergify bot closed this as completed in #154 Oct 25, 2022
mergify bot added a commit that referenced this issue Oct 25, 2022
## Description


This PR brings up the sandbox via Docker Compose using the Kubernetes backend for all services. This does not completely remove the postgres backend setup, but moves all the compose with postgres into an isolated directory (deploy/compose/postgres) that can be removed when we're ready.

> I did not touch the terraform setup. I need some help validating that one. please and thank you. CC @mmlb @displague

## Why is this needed



Fixes: #142 #45 #118 #131 #133 #145 #148 
- This "fixes" quite a few issues related to TLS cert generation. This is the case because we are not using TLS in this deployment. Also see, tinkerbell/tink#555.
- This also "fixes" any issues related to the internal registry as that is removed as the default.

## How Has This Been Tested?



Manually tested vagrant with virtualbox (on a Mac), vagrant with libvirt (on Ubuntu 22.04), and docker-compose (on Ubuntu 22.04).


## How are existing users impacted? What migration steps/scripts do we need?
There is no migration support. Users will need to follow a quick start guide to get started.





## Checklist:

I have:

- [x] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
ttwd80 pushed a commit to ttwd80/tinkerbell-playground that referenced this issue Sep 7, 2024