Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

engine: fix provider config precedence #1032

Merged
merged 2 commits into from
Jul 16, 2020
Merged

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Jul 16, 2020

Regression from #958. We switched the list of providers from an array to
a map. But iteration order through a map is undefined, so we lost the
precedence of providers.

I think this is the cause behind a lot of the FCOS installer test
timeouts, such as:

coreos/coreos-assembler#1597

There, we pass the Ignition config for the PXE boot via
ignition.config.url, but if the metal (no-op) fetcher appears earlier
than the cmdline fetcher, we get no config. And similarly for the
installed system when the no-op fetcher appears before the system
fetcher (which coreos-installer's --ignition-file leverages).

The likelihood of this happening increased in the v2.4.0 release due to
#1002, which only gave us one try
to iterate over the correct provider first (at the fetch stage),
rather than every stage having a go at it.

Closes: coreos/coreos-assembler#1597

@sohankunkerkar
Copy link
Member

sohankunkerkar commented Jul 16, 2020

Regression from #958. We switched the list of providers from an array to
a map. But iteration order through a map is undefined, so we lost the
precedence of providers.

I think this is the cause behind a lot of the FCOS installer test
timeouts, such as:

coreos/coreos-assembler#1597

@jlebon Nice catch!

@lucab
Copy link
Contributor

lucab commented Jul 16, 2020

I totally see us regressing this again the next time we touch this logic. Could we quickly throw a unit test on top of it (possibly as a followup PR)?

@dustymabe
Copy link
Member

I totally see us regressing this again the next time we touch this logic. Could we quickly throw a unit test on top of it (possibly as a followup PR)?

also maybe a comment in the code (comments are free 😄)

jlebon added 2 commits July 16, 2020 10:18
Regression from coreos#958. We switched the list of providers from an array to
a map. But iteration order through a map is undefined, so we lost the
precedence of providers.

I think this is the cause behind a lot of the FCOS installer test
timeouts, such as:

coreos/coreos-assembler#1597

There, we pass the Ignition config for the PXE boot via
`ignition.config.url`, but if the metal (no-op) fetcher appears earlier
than the `cmdline` fetcher, we get no config. And similarly for the
installed system when the no-op fetcher appears before the `system`
fetcher (which coreos-installer's `--ignition-file` leverages).

The likelihood of this happening increased in the v2.4.0 release due to
coreos#1002, which only gave us one try
to iterate over the correct provider first (at the `fetch` stage),
rather than every stage having a go at it.

Closes: coreos/coreos-assembler#1597
Let's run install tests on this repo since Ignition is key to automating
them.
@jlebon jlebon force-pushed the pr/fix-ordering branch from ca24503 to 2ac42aa Compare July 16, 2020 14:20
@jlebon
Copy link
Member Author

jlebon commented Jul 16, 2020

I totally see us regressing this again the next time we touch this logic. Could we quickly throw a unit test on top of it (possibly as a followup PR)?

I looked at this a bit, but it's tricky. I think we'd need to mock the cmdline and system providers? I turned on testiso in CI at least. :) (Not saying it's not worth doing, but yeah let's not block on it?)

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bgilbert
Copy link
Contributor

Great catch! We have blackbox tests for this already and they've been flaking for a while. That explains that. 🎉

@jlebon
Copy link
Member Author

jlebon commented Jul 16, 2020

+ kola testiso -S --output-dir tmp/kola-testiso-metal
Testing scenarios: [pxe-install iso-offline-install]
Successfully tested scenario pxe-install for 32.20200716.dev.0 on bios (metal)
Successfully tested scenario iso-offline-install for 32.20200716.dev.0 on bios (metal)

🎉

@jlebon jlebon merged commit 1f0b2c0 into coreos:master Jul 16, 2020
@jlebon jlebon deleted the pr/fix-ordering branch July 16, 2020 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kola testiso --scenarios pxe-install is flaky
6 participants