Elemental hosts should run elemental-register at each boot #434

Closed
fgiudici opened this issue Apr 28, 2023 · 7 comments · Fixed by #470, rancher/elemental#890 or rancher/elemental-docs#163

@fgiudici
Member

After registration and provisioning, each host should re-register at each reboot.
This would ensure:

  1. the host is tracked by a MachineInventory in the Elemental Operator
  2. labels and annotations (in particular the IP address annotation and the SMBIOS and hardware labels) are updated

Note that point 1 is useful only if the host has not yet been made part of a cluster. If the host is already part of a cluster (provisioned with a Kubernetes deployment), the tracked state is trickier to handle, because it cannot be completely re-synced.
That case is not part of this card and should be tracked in a follow-up one.

@fgiudici fgiudici moved this to 🗳️ To Do in Elemental Apr 28, 2023
@fgiudici
Member Author

Note that re-registering at each boot is incompatible with emulate-tpm when a random emulated-tpm-seed (equal to -1) is used.
Hosts configured with a random emulated-tpm-seed should be prevented from re-registering.
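
A minimal sketch of such a guard is shown below. It rests on assumptions: the `registrationConfig` type and its field names are hypothetical and do not mirror the actual elemental-register code. The only detail taken from this thread is that a seed of -1 means "random". Presumably a random seed gives the emulated TPM a different identity on every run, so the host could not authenticate as the same machine again.

```go
package main

import (
	"errors"
	"fmt"
)

// randomSeed is the emulated-tpm-seed value that selects a random seed,
// as stated in the comment above.
const randomSeed int64 = -1

// registrationConfig is a hypothetical, trimmed-down view of the
// registration settings. The field names are assumptions made for this
// sketch only.
type registrationConfig struct {
	EmulateTPM      bool
	EmulatedTPMSeed int64
}

var errRandomSeed = errors.New("re-registration skipped: emulated TPM uses a random seed")

// checkReRegister returns an error when the host must not re-register,
// that is, when the TPM is emulated with a random seed.
func checkReRegister(cfg registrationConfig) error {
	if cfg.EmulateTPM && cfg.EmulatedTPMSeed == randomSeed {
		return errRandomSeed
	}
	return nil
}

func main() {
	cfgs := []registrationConfig{
		{EmulateTPM: false},                     // hardware TPM
		{EmulateTPM: true, EmulatedTPMSeed: 42}, // emulated, fixed seed
		{EmulateTPM: true, EmulatedTPMSeed: -1}, // emulated, random seed
	}
	for _, c := range cfgs {
		fmt.Printf("%+v -> %v\n", c, checkReRegister(c))
	}
}
```

Only the last configuration is rejected; a hardware TPM or an emulated TPM with a fixed seed can still re-register.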

@kkaempf
Contributor

kkaempf commented Apr 28, 2023

I don't think we should enable re-registration generally. It's a scalability/bandwidth challenge.

@fgiudici
Member Author

fgiudici commented Apr 28, 2023

> I don't think we should enable re-registration generally. It's a scalability/bandwidth challenge.

The registration itself, without the Elemental OS provisioning, is really fast and lightweight:

m-baf32526-d5ec-40d6-9eb9-3a5c3a928fb5:~ # time elemental-register 
I0428 08:13:04.849710   15110 log.go:42] Register version 1.2.2, commit 06c41fac, commit date 20230413
I0428 08:13:04.850246   15110 log.go:42] reading config file config.yaml
I0428 08:13:04.850483   15110 log.go:42] Connect to https://172.16.200.2.sslip.io/elemental/registration/b9rhfx8dvxq6jppzjjfmpx74gb9fgvp92zbzz49c9k8fz8fmlvcrzg
I0428 08:13:05.019142   15110 log.go:42] Using TPM Auth with Hash 54a3ac73b0933fd61fdb3a9cf1df2d2331a734e64a846f18919c0f1a4d26a1e2 to dial wss://172.16.200.2.sslip.io/elemental/registration/b9rhfx8dvxq6jppzjjfmpx74gb9fgvp92zbzz49c9k8fz8fmlvcrzg
I0428 08:13:05.028597   15110 log.go:42] Local Address: 172.16.200.3:49616
I0428 08:13:05.111195   15110 log.go:42] TPM authentication completed
I0428 08:13:05.111985   15110 log.go:42] Negotiated protocol version: 9
I0428 08:13:05.112111   15110 log.go:42] Send SMBIOS data
I0428 08:13:05.121369   15110 log.go:42] Send system data
I0428 08:13:05.184428   15110 log.go:46] Send elemental annotations
I0428 08:13:05.185175   15110 log.go:46] Get elemental configuration

real	0m0.370s
user	0m0.062s
sys	0m0.090s

The whole TCP exchange was 69 kB (dumped full traffic with tcpdump).

It just updates the registration data.
Also, this only happens after a node reboot, which is quite a rare event; many nodes rebooting at the very same time, which is what could pose a scalability problem, is rarer still.
The benefit would be up-to-date information, such as the registering IP address and labels (and I think it would also be really good to record the OS version in the annotations).
It would also enable the operator to detect hosts not tracked in the MachineInventories and trigger a reset (once we have that feature), or even help in tracking the update procedure.
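
To put the 69 kB figure in perspective, here is a rough back-of-the-envelope sketch. It assumes 1 kB = 1000 bytes, one registration per host, and traffic spread evenly over a window; the host counts and the 60-second window are made-up inputs, not measurements. It covers bandwidth only, not the cost of TPM authentication or of updating resources on the server side.

```go
package main

import "fmt"

// bytesPerRegistration is the full TCP exchange size measured above
// (69 kB, taken here as 69,000 bytes).
const bytesPerRegistration int64 = 69_000

// aggregateBytes returns the total traffic when each host registers once.
func aggregateBytes(hosts int64) int64 {
	return hosts * bytesPerRegistration
}

// avgBytesPerSecond spreads that traffic evenly over windowSeconds.
func avgBytesPerSecond(hosts, windowSeconds int64) float64 {
	return float64(aggregateBytes(hosts)) / float64(windowSeconds)
}

func main() {
	const window = 60 // seconds; hypothetical boot window
	for _, hosts := range []int64{100, 1_000, 10_000} {
		total := float64(aggregateBytes(hosts)) / 1e6
		rate := avgBytesPerSecond(hosts, window) / 1e6
		fmt.Printf("%6d hosts: %8.1f MB total, %6.2f MB/s over %d s\n",
			hosts, total, rate, window)
	}
}
```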

@anmazzotti anmazzotti self-assigned this Jun 30, 2023
@anmazzotti anmazzotti moved this from 🗳️ To Do to 🏃🏼‍♂️ In Progress in Elemental Jun 30, 2023
@kkaempf
Contributor

kkaempf commented Jun 30, 2023

It's still a scalability challenge. Think of thousands of cash registers being turned on when shops open.

@anmazzotti
Contributor

> It's still a scalability challenge. Think of thousands of cash registers being turned on when shops open.

I can add a randomized sleep interval so that we can stagger the registration attempts.
I don't believe it's a proper solution, but it may be a good mitigation.

@anmazzotti
Contributor

FYI, I opened a new follow-up issue: #471
Feel free to add notes in case I missed anything.

@anmazzotti anmazzotti moved this from 🏃🏼‍♂️ In Progress to 👀 Needs review in Elemental Jul 3, 2023
@fgiudici
Member Author

fgiudici commented Jul 4, 2023

Agreed that dealing with re-registration timing deserves a separate discussion / issue: #471 is perfect for it.
There is one important missing scenario that we should deal with here anyway: deleted MachineInventories that tracked hosts which will re-register.
Re-registration would cause the MachineInventory to be recreated as a "clean" machine, which is wrong if the machine has already been provisioned as part of a Cluster.
It looks to me like re-registering should work differently from registering; we likely need to extend the registration protocol to be safe. Further discussion needed.
