-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/exec/engine: write empty cache config when not provided #1002
Conversation
I'm not wild about special-casing this at such a high level. Would it make more sense for The other issue is that #950 calls for something a bit more general than what's done here: refactoring so that the non-fetch stages can only read from cache, and will fail outright if the cache isn't available. |
9faafb6
to
20b19c0
Compare
Updated. Split the internals of Moved the writing of an empty config to after Fedora infra died while I was trying to build images for testing so latest set isn't tested yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if it's possible to address https://github.com/coreos/ignition/pull/1002/files#diff-e3491b7c0e160811ce6761d5a1117558R96-R99 comment too in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For #950, one thing I originally imagined was literally moving all the code related to fetching configs from engine.go
into fetch/fetch.go
. I guess this is a little complicated now with the new fetch-offline
stage... Still probably worth doing some refactoring eventually, though definitely not a blocker!
internal/exec/engine.go
Outdated
func (e *Engine) acquireConfig(stageName string) (cfg types.Config, err error) { | ||
cfg, err = e.acquireCachedConfig() | ||
if err != nil { | ||
if os.IsNotExist(err) && strings.HasPrefix(stageName, "fetch") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for fetch
/fetch-offline
, we shouldn't try to even use the cache. Not sure whether we want to take that further and also error out if it's present. But it feels misleading to not actually fetch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, I've updated it to only attempt to fetch from the provider config in fetch*
stages.
4748735
to
b862614
Compare
Yeah, I'd rather punt that out for now and keep this simple. Change pushed addressing comment & fixing blackbox tests (we were previously relying on the old functionality and not actually running the fetch stage in blackbox tests). |
Hmm, the
|
Okay, I think I've tracked down the bug. I'm not sure how we're avoiding this at the moment though. Essentially what seems to be happening is that the compressed tests work for the initial fetch (and are able to add the sources). However, we're re-writing the CA cert in place here which is resulting in the CA source now being a simple base64 encoded data URL but still is specifying the previous compression method ( This causes subsequent reads of the config from the cache to fail when attempting to read a base64 string via a gzip reader. A simple solution is to just nuke the Compression for the CA at the time of rewrite but I'm wondering if we should be preserving the compression type specified to the cached config. cc @bgilbert |
Compression support for CAs was added as part of resource consolidation in spec 3.1.0 and this interaction was missed. I don't think it's important to preserve the original compression field in the cached config. We're already clearing the HTTP headers. |
Tests now passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two nits, LGTM otherwise!
internal/exec/engine.go
Outdated
if err == errors.ErrEmpty { | ||
e.Logger.Info("%v: provider config was empty, writing empty cache config", err) | ||
if err = renameio.WriteFile(e.ConfigCache, []byte("{}"), 0640); err != nil { | ||
e.Logger.Crit("failed to empty write cached config: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.Logger.Crit("failed to empty write cached config: %v", err) | |
e.Logger.Crit("failed to write empty cached config: %v", err) |
internal/exec/engine.go
Outdated
func (e *Engine) acquireConfig() (cfg types.Config, err error) { | ||
// acquireConfig will perform differently based on the stage it is being | ||
// called from. In fetch stages it will attempt to fetch the provider | ||
// config (writing an empty provider config if it is empty. In all other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// config (writing an empty provider config if it is empty. In all other | |
// config (writing an empty provider config if it is empty). In all other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in commit message: interal
.
Updated |
When the user-provided config is empty write an empty cache config. Additionally restrict fetching configs to only `fetch` stages.
CAs are re-written into the fetched config as a base64 string to allow cached reads to not require an additional fetch. When compression support was added during the resource consolidation refactor for the 3.1.0 spec this interaction was missed. The blackbox tests did not catch this as they were always clearing the config cache on each run.
pushed |
We shouldn't use `/run/ignition.json` to determine whether a user config was provided since it's implementation details. Instead, use the new official journal messages that Ignition emits. This is complicated by the fact that we need to support RHCOS, where the journal messages haven't been backported. Use the fact that we always have a base config to key off of whether to use the old behaviour vs the new one. (More accurately, we'd want to check for coreos/ignition#1002, but there's no easy way to do this from the outside. Alternatively we can check the Ignition version, though that's deeply nested under `/usr/lib/dracut/...`). Anyway, this should be temporary until RHCOS moves to spec v3. Closes: coreos#514
We shouldn't use `/run/ignition.json` to determine whether a user config was provided since it's implementation details. Instead, use the new official journal messages that Ignition emits. This is complicated by the fact that we need to support RHCOS, where the journal messages haven't been backported. Use the fact that we always have a base config to key off of whether to use the old behaviour vs the new one. (More accurately, we'd want to check for coreos/ignition#1002, but there's no easy way to do this from the outside. Alternatively we can check the Ignition version, though that's deeply nested under `/usr/lib/dracut/...`). Anyway, this should be temporary until RHCOS moves to spec v3. Closes: #514
Regression from coreos#958. We switched the list of providers from an array to a map. But iteration order through a map is undefined, so we lost the precedence of providers. I think this is the cause behind a lot of the FCOS installer test timeouts, such as: coreos/coreos-assembler#1597 There, we pass the Ignition config for the PXE boot via `ignition.config.url`, but if the metal (no-op) fetcher appears earlier than the `cmdline` fetcher, we get no config. And similarly for the installed system when the no-op fetcher appears before the `system` fetcher (which coreos-installer's `--ignition-file` leverages). The likelihood of this happening increased in the v2.4.0 release due to coreos#1002, which only gave us one try to iterate over the correct provider first (at the `fetch` stage), rather than every stage having a go at it. Closes: coreos/coreos-assembler#1597
Regression from coreos#958. We switched the list of providers from an array to a map. But iteration order through a map is undefined, so we lost the precedence of providers. I think this is the cause behind a lot of the FCOS installer test timeouts, such as: coreos/coreos-assembler#1597 There, we pass the Ignition config for the PXE boot via `ignition.config.url`, but if the metal (no-op) fetcher appears earlier than the `cmdline` fetcher, we get no config. And similarly for the installed system when the no-op fetcher appears before the `system` fetcher (which coreos-installer's `--ignition-file` leverages). The likelihood of this happening increased in the v2.4.0 release due to coreos#1002, which only gave us one try to iterate over the correct provider first (at the `fetch` stage), rather than every stage having a go at it. Closes: coreos/coreos-assembler#1597
When the user-provided config is empty write an empty cache config to
prevent re-fetching it in every stage.
Closes #950