Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

treefile: Add new add-commit-metadata key #1865

Closed
wants to merge 7 commits into from

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Jul 8, 2019

Add support for a new add-commit-metadata key in the treefile so that
we can directly specify commit metadata we want to inject from there.

This will be useful in Fedora CoreOS, where we'll have separate
treefiles for each streams, each with stream-specific metadata values
required.

Instead of relying on `rpmostree_composeutil_read_json_metadata` to
initialize the metadata hash table, initialize it explicitly in
`context_new()` function and only call the util function if we were
passed a file with `--add-metadata-from-json`.

Accordingly rename the function
`rpmostree_composeutil_read_json_metadata_from_file`.
Split out from `rpmostree_composeutil_read_json_metadata_from_file` the
part that actually converts to `GVariant` and inserts into the hash
table.
Move up the setting of the treefile JSON object to right after parsing,
and move down the populating of the metadata hash table to after setting
the treefile JSON object. This is pure code block moves; there's no
functional change otherwise.

Prep for future patch.
@jlebon
Copy link
Member Author

jlebon commented Jul 8, 2019

See coreos/fedora-coreos-config#110 for example usage.

rust/src/treefile.rs Outdated Show resolved Hide resolved
@jlebon
Copy link
Member Author

jlebon commented Jul 8, 2019

Hmm OK and the Rust compile errors are the same as the ones in #1863 (comment). Will dig into this.

@jlebon
Copy link
Member Author

jlebon commented Jul 8, 2019

So the issue was in fact that we didn't track Cargo.lock for bindgen. Took me a while to figure out it wasn't the main librpmostree_rust library that was failing to build (even though looking closely at the logfile now, it seems pretty obvious).

Anyway, as mentioned in the commit message, we should be able to update to 1.35.0 soon too.

Copy link
Contributor

@rfairley rfairley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, otherwise LGTM

src/app/rpmostree-compose-builtin-tree.c Outdated Show resolved Hide resolved
@jlebon
Copy link
Member Author

jlebon commented Jul 8, 2019

File 'out.txt' doesn't match fixed string list 'No apparent changes since previous commit'

Hmm, I'm not hitting this locally. Still trying to figure out what's going on here.

@rfairley
Copy link
Contributor

rfairley commented Jul 8, 2019

Also not seeing this locally, running the CI command . CI is also showing

error: No such metadata key 'exampleos.gitrepo'

in f29-compose1 (also not seeing this one locally).

@jlebon
Copy link
Member Author

jlebon commented Jul 8, 2019

Yeah, the issue is that we'll get different input hashes based on how serde decides to serialize the hash table (and so the ordering of keys can change on each iteration). I'm working on adding checksumming over all the file inputs to fix this (incidentally also fixing: https://github.com/projectatomic/rpm-ostree/blob/c94bd08b02499a3f2a5804f5c2ba207a0afbda18/src/app/rpmostree-composeutil.c#L90).

Add support for a new `add-commit-metadata` key in the treefile so that
we can directly specify commit metadata we want to inject from there.

This will be useful in Fedora CoreOS, where we'll have separate
treefiles for each streams, each with stream-specific metadata values
required.
We were hitting the classic "negative test passes for the wrong reason".
It was failing not because it didn't have a parent, but because we
didn't pass `--repo`. Fix this and also explicitly check for the error
message we expect.
For the same reasons we started doing it for the main app:
coreos#1719

This time, it's `getrand` that broke us.
rust-random/getrandom#15

We should be able to update to 1.35.0 soon, which will unblock this.
@jlebon
Copy link
Member Author

jlebon commented Jul 9, 2019

Added docs and fixed tests! ⬆️

and so the ordering of keys can change on each iteration

In fact, the only change necessary for this was to swap out HashMap for a BTreeMap, since those keys are ordered. But I ended up also enhancing the way we hash inputs anyway in the process. We can split out that last commit into a separate patch if wanted. It's not strictly needed for FCOS.

@lucab
Copy link
Contributor

lucab commented Jul 9, 2019

Notice we hash the reserialization of the final flattened treefile only so that e.g.
comments/whitespace/hash table key reorderings don't trigger a respin.

This is an high-level pattern which always causes me troubles, as it is hashing some input state that doesn't really exist and is not persisted. Would you consider just hashing the raw input bytes instead?

@jlebon
Copy link
Member Author

jlebon commented Jul 9, 2019

Would you consider just hashing the raw input bytes instead?

Yeah, that's in fact what I initially had. I then went back to just covering the final flattened treefile (which is the status quo) when I realized it regressed on detecting what is essentially no-ops. Is the argument for it in that quote not motivation enough in your opinion?

as it is hashing some input state that doesn't really exist and is not persisted.

This is not strictly true; the flattened treefile is written to /usr/share/rpm-ostree/treefile.json in the OSTree. :) So in theory, we could still retrieve the treefile hash for a previous commit if needed.

ISTM like hashing all the treefiles is more of an escape hatch for being unable to easily deal with some tricky situations. Seems like it's worth trying the "smart approach" first before resorting to the easier one?

That said, I'm not strongly against it either. Just slightly more leaning towards maintaining the status quo until there's an obvious problem with that (or we discover some subtleties/interactions with how this behaviour integrates with e.g. COSA or the FCOS/RHCOS pipelines in a negative way).

@lucab
Copy link
Contributor

lucab commented Jul 9, 2019

I don't grasp the full context of the effects of those no-ops, so I am up to your call here.

@cgwalters
Copy link
Member

Another approach is to hash the JSON canonical form - also to write that by default.

rust/src/treefile.rs Outdated Show resolved Hide resolved
@cgwalters
Copy link
Member

Any time we change the checksum algorithm, it will trigger a "pointless" build for users. This isn't a serious problem IMO, but we shouldn't do it often. Concretely for example, after this lands in Fedora, depending on whether or not there are AH/Silverblue updates pending, users may get a "no-op" update.

IOW, if we're going to change it now we should feel it's pretty future proof.

I like the direction the code is going in. However, it looks like we're relying on the Serde serialization, and I suspect that's not defined to be stable - they would feel free to e.g. add whitespace? It looks like that canonical JSON library I linked above is abandoned though 😦

I dunno. Maybe it's unlikely Serde will change and we can just live with that.

Copy link
Contributor

@rfairley rfairley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Docs look good to me.

After thinking/reading a bit, the reasoning for using BTreeMap (or any ordered datatype) makes sense to me now for the fields where ordering matters (e.g. the add-commit-metadata key), vs. relying on hashing the content of the externals to detect changes in ordering. Nice that the issue of hashing the external files could also be addressed!

@jlebon
Copy link
Member Author

jlebon commented Jul 9, 2019

Any time we change the checksum algorithm, it will trigger a "pointless" build for users. This isn't a serious problem IMO, but we shouldn't do it often.

Yeah, agreed on both counts.

It looks like that canonical JSON library I linked above is abandoned though

Ohh, that would've been perfect for us. I might create an RFE for serde_json to add a canonical mode to see what they think. I guess one thing we could do at least in the mean time is checksum in non-pretty mode.

WRT checksumming all treefiles vs just the final flattened one, I think it comes down to which one will be the most "unstable", serde's format vs humans/robots making inconsequential changes to treefiles... I'd think it'd be more of the latter :)

Note that the way we checksum things now is purely additive (i.e. we were already checksumming the flattened treefile before; we're just adding more things to the checksum). So at the very least, it's not going to be less stable than it already was.

@cgwalters
Copy link
Member

I guess one thing we could do at least in the mean time is checksum in non-pretty mode.

Oh this definitely; the pretty mode seems much more likely to change.

So at the very least, it's not going to be less stable than it already was.

Right; as you noted in the commit message the original core issue was using HashMap instead of BTreeMap which you fixed.

Anyways...from my PoV we don't need to keep rehashing this more (so to speak), I'm good with the PR with the addition of using non-pretty mode per above.

Move hashing to the Rust side so that we can easily hash over the final
set of inputs after parsing. This means that we now hash over all the
externals, like `add-files` references, any `postprocess-script` script,
and `passwd` and `group` files.

The original motivation for this was that hashing over a reserialized
version of the treefile was not deterministic now that treefiles include
hash tables (i.e. `add-commit-metadata`). So I initially included each
individual treefile as part of the hash.

I realized afterwards that just switching to `BTreeMap` fixes this, so
we can keep hashing only the final flattened reserialized treefile so we
ignore comments and whitespace too. But since I already wrote the patch,
and it fixes a real issue today... here we are.

One notable change though is that we now hash the treefile in non-pretty
mode to increase the chances that the serialized form remains stable.
Ironically, this change is likely to cause a no-op commit once it gets
to pipelines which iterate quickly. All for the greater good though.
@jlebon
Copy link
Member Author

jlebon commented Jul 9, 2019

OK updated! I also added to the commit message:

One notable change though is that we now hash the treefile in non-pretty
mode to increase the chances that the serialized form remains stable.
Ironically, this change is likely to cause a no-op commit once it gets
to pipelines which iterate quickly. All for the greater good though.

😄

@cgwalters
Copy link
Member

@rh-atomic-bot r+ 8b3753f

@rh-atomic-bot
Copy link

⌛ Testing commit 8b3753f with merge 42e1da2...

rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
Instead of relying on `rpmostree_composeutil_read_json_metadata` to
initialize the metadata hash table, initialize it explicitly in
`context_new()` function and only call the util function if we were
passed a file with `--add-metadata-from-json`.

Accordingly rename the function
`rpmostree_composeutil_read_json_metadata_from_file`.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
Split out from `rpmostree_composeutil_read_json_metadata_from_file` the
part that actually converts to `GVariant` and inserts into the hash
table.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
Move up the setting of the treefile JSON object to right after parsing,
and move down the populating of the metadata hash table to after setting
the treefile JSON object. This is pure code block moves; there's no
functional change otherwise.

Prep for future patch.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
Add support for a new `add-commit-metadata` key in the treefile so that
we can directly specify commit metadata we want to inject from there.

This will be useful in Fedora CoreOS, where we'll have separate
treefiles for each streams, each with stream-specific metadata values
required.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
We were hitting the classic "negative test passes for the wrong reason".
It was failing not because it didn't have a parent, but because we
didn't pass `--repo`. Fix this and also explicitly check for the error
message we expect.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
For the same reasons we started doing it for the main app:
#1719

This time, it's `getrand` that broke us.
rust-random/getrandom#15

We should be able to update to 1.35.0 soon, which will unblock this.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
Move hashing to the Rust side so that we can easily hash over the final
set of inputs after parsing. This means that we now hash over all the
externals, like `add-files` references, any `postprocess-script` script,
and `passwd` and `group` files.

The original motivation for this was that hashing over a reserialized
version of the treefile was not deterministic now that treefiles include
hash tables (i.e. `add-commit-metadata`). So I initially included each
individual treefile as part of the hash.

I realized afterwards that just switching to `BTreeMap` fixes this, so
we can keep hashing only the final flattened reserialized treefile so we
ignore comments and whitespace too. But since I already wrote the patch,
and it fixes a real issue today... here we are.

One notable change though is that we now hash the treefile in non-pretty
mode to increase the chances that the serialized form remains stable.
Ironically, this change is likely to cause a no-op commit once it gets
to pipelines which iterate quickly. All for the greater good though.

Closes: #1865
Approved by: cgwalters
@rh-atomic-bot
Copy link

💔 Test failed - status-atomicjenkins

@jlebon
Copy link
Member Author

jlebon commented Jul 9, 2019

+ coreos-assembler init --force https://github.com/coreos/fedora-coreos-config
stat: cannot stat '/dev/kvm': No such file or directory

Hmm, were we scheduled on a node that doesn't have nested virt or something? This definitely works usually.

@rh-atomic-bot retry

@lucab
Copy link
Contributor

lucab commented Jul 9, 2019

Relevant to the "serde canonical json" conversation: serde-rs/json#309

@rh-atomic-bot
Copy link

⌛ Testing commit 8b3753f with merge b381e02...

rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
Split out from `rpmostree_composeutil_read_json_metadata_from_file` the
part that actually converts to `GVariant` and inserts into the hash
table.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
Move up the setting of the treefile JSON object to right after parsing,
and move down the populating of the metadata hash table to after setting
the treefile JSON object. This is pure code block moves; there's no
functional change otherwise.

Prep for future patch.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
Add support for a new `add-commit-metadata` key in the treefile so that
we can directly specify commit metadata we want to inject from there.

This will be useful in Fedora CoreOS, where we'll have separate
treefiles for each streams, each with stream-specific metadata values
required.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
We were hitting the classic "negative test passes for the wrong reason".
It was failing not because it didn't have a parent, but because we
didn't pass `--repo`. Fix this and also explicitly check for the error
message we expect.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
For the same reasons we started doing it for the main app:
#1719

This time, it's `getrand` that broke us.
rust-random/getrandom#15

We should be able to update to 1.35.0 soon, which will unblock this.

Closes: #1865
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Jul 9, 2019
Move hashing to the Rust side so that we can easily hash over the final
set of inputs after parsing. This means that we now hash over all the
externals, like `add-files` references, any `postprocess-script` script,
and `passwd` and `group` files.

The original motivation for this was that hashing over a reserialized
version of the treefile was not deterministic now that treefiles include
hash tables (i.e. `add-commit-metadata`). So I initially included each
individual treefile as part of the hash.

I realized afterwards that just switching to `BTreeMap` fixes this, so
we can keep hashing only the final flattened reserialized treefile so we
ignore comments and whitespace too. But since I already wrote the patch,
and it fixes a real issue today... here we are.

One notable change though is that we now hash the treefile in non-pretty
mode to increase the chances that the serialized form remains stable.
Ironically, this change is likely to cause a no-op commit once it gets
to pipelines which iterate quickly. All for the greater good though.

Closes: #1865
Approved by: cgwalters
@rh-atomic-bot
Copy link

☀️ Test successful - status-atomicjenkins
Approved by: cgwalters
Pushing b381e02 to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants