-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
treefile: Add new add-commit-metadata
key
#1865
Conversation
Instead of relying on `rpmostree_composeutil_read_json_metadata` to initialize the metadata hash table, initialize it explicitly in `context_new()` function and only call the util function if we were passed a file with `--add-metadata-from-json`. Accordingly rename the function `rpmostree_composeutil_read_json_metadata_from_file`.
Split out from `rpmostree_composeutil_read_json_metadata_from_file` the part that actually converts to `GVariant` and inserts into the hash table.
Move up the setting of the treefile JSON object to right after parsing, and move down the populating of the metadata hash table to after setting the treefile JSON object. This is pure code block moves; there's no functional change otherwise. Prep for future patch.
See coreos/fedora-coreos-config#110 for example usage. |
Hmm OK and the Rust compile errors are the same as the ones in #1863 (comment). Will dig into this. |
So the issue was in fact that we didn't track Anyway, as mentioned in the commit message, we should be able to update to 1.35.0 soon too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit, otherwise LGTM
Hmm, I'm not hitting this locally. Still trying to figure out what's going on here. |
Also not seeing this locally, running the CI command . CI is also showing
in |
Yeah, the issue is that we'll get different input hashes based on how serde decides to serialize the hash table (and so the ordering of keys can change on each iteration). I'm working on adding checksumming over all the file inputs to fix this (incidentally also fixing: https://github.com/projectatomic/rpm-ostree/blob/c94bd08b02499a3f2a5804f5c2ba207a0afbda18/src/app/rpmostree-composeutil.c#L90). |
Add support for a new `add-commit-metadata` key in the treefile so that we can directly specify commit metadata we want to inject from there. This will be useful in Fedora CoreOS, where we'll have separate treefiles for each streams, each with stream-specific metadata values required.
We were hitting the classic "negative test passes for the wrong reason". It was failing not because it didn't have a parent, but because we didn't pass `--repo`. Fix this and also explicitly check for the error message we expect.
For the same reasons we started doing it for the main app: coreos#1719 This time, it's `getrand` that broke us. rust-random/getrandom#15 We should be able to update to 1.35.0 soon, which will unblock this.
Added docs and fixed tests! ⬆️
In fact, the only change necessary for this was to swap out |
This is an high-level pattern which always causes me troubles, as it is hashing some input state that doesn't really exist and is not persisted. Would you consider just hashing the raw input bytes instead? |
Yeah, that's in fact what I initially had. I then went back to just covering the final flattened treefile (which is the status quo) when I realized it regressed on detecting what is essentially no-ops. Is the argument for it in that quote not motivation enough in your opinion?
This is not strictly true; the flattened treefile is written to ISTM like hashing all the treefiles is more of an escape hatch for being unable to easily deal with some tricky situations. Seems like it's worth trying the "smart approach" first before resorting to the easier one? That said, I'm not strongly against it either. Just slightly more leaning towards maintaining the status quo until there's an obvious problem with that (or we discover some subtleties/interactions with how this behaviour integrates with e.g. COSA or the FCOS/RHCOS pipelines in a negative way). |
I don't grasp the full context of the effects of those no-ops, so I am up to your call here. |
Another approach is to hash the JSON canonical form - also to write that by default. |
Any time we change the checksum algorithm, it will trigger a "pointless" build for users. This isn't a serious problem IMO, but we shouldn't do it often. Concretely for example, after this lands in Fedora, depending on whether or not there are AH/Silverblue updates pending, users may get a "no-op" update. IOW, if we're going to change it now we should feel it's pretty future proof. I like the direction the code is going in. However, it looks like we're relying on the Serde serialization, and I suspect that's not defined to be stable - they would feel free to e.g. add whitespace? It looks like that canonical JSON library I linked above is abandoned though 😦 I dunno. Maybe it's unlikely Serde will change and we can just live with that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Docs look good to me.
After thinking/reading a bit, the reasoning for using BTreeMap
(or any ordered datatype) makes sense to me now for the fields where ordering matters (e.g. the add-commit-metadata
key), vs. relying on hashing the content of the externals to detect changes in ordering. Nice that the issue of hashing the external files could also be addressed!
Yeah, agreed on both counts.
Ohh, that would've been perfect for us. I might create an RFE for serde_json to add a canonical mode to see what they think. I guess one thing we could do at least in the mean time is checksum in non-pretty mode. WRT checksumming all treefiles vs just the final flattened one, I think it comes down to which one will be the most "unstable", serde's format vs humans/robots making inconsequential changes to treefiles... I'd think it'd be more of the latter :) Note that the way we checksum things now is purely additive (i.e. we were already checksumming the flattened treefile before; we're just adding more things to the checksum). So at the very least, it's not going to be less stable than it already was. |
Oh this definitely; the pretty mode seems much more likely to change.
Right; as you noted in the commit message the original core issue was using Anyways...from my PoV we don't need to keep rehashing this more (so to speak), I'm good with the PR with the addition of using non-pretty mode per above. |
Move hashing to the Rust side so that we can easily hash over the final set of inputs after parsing. This means that we now hash over all the externals, like `add-files` references, any `postprocess-script` script, and `passwd` and `group` files. The original motivation for this was that hashing over a reserialized version of the treefile was not deterministic now that treefiles include hash tables (i.e. `add-commit-metadata`). So I initially included each individual treefile as part of the hash. I realized afterwards that just switching to `BTreeMap` fixes this, so we can keep hashing only the final flattened reserialized treefile so we ignore comments and whitespace too. But since I already wrote the patch, and it fixes a real issue today... here we are. One notable change though is that we now hash the treefile in non-pretty mode to increase the chances that the serialized form remains stable. Ironically, this change is likely to cause a no-op commit once it gets to pipelines which iterate quickly. All for the greater good though.
OK updated! I also added to the commit message:
😄 |
Instead of relying on `rpmostree_composeutil_read_json_metadata` to initialize the metadata hash table, initialize it explicitly in `context_new()` function and only call the util function if we were passed a file with `--add-metadata-from-json`. Accordingly rename the function `rpmostree_composeutil_read_json_metadata_from_file`. Closes: #1865 Approved by: cgwalters
Split out from `rpmostree_composeutil_read_json_metadata_from_file` the part that actually converts to `GVariant` and inserts into the hash table. Closes: #1865 Approved by: cgwalters
Move up the setting of the treefile JSON object to right after parsing, and move down the populating of the metadata hash table to after setting the treefile JSON object. This is pure code block moves; there's no functional change otherwise. Prep for future patch. Closes: #1865 Approved by: cgwalters
Add support for a new `add-commit-metadata` key in the treefile so that we can directly specify commit metadata we want to inject from there. This will be useful in Fedora CoreOS, where we'll have separate treefiles for each streams, each with stream-specific metadata values required. Closes: #1865 Approved by: cgwalters
We were hitting the classic "negative test passes for the wrong reason". It was failing not because it didn't have a parent, but because we didn't pass `--repo`. Fix this and also explicitly check for the error message we expect. Closes: #1865 Approved by: cgwalters
For the same reasons we started doing it for the main app: #1719 This time, it's `getrand` that broke us. rust-random/getrandom#15 We should be able to update to 1.35.0 soon, which will unblock this. Closes: #1865 Approved by: cgwalters
Move hashing to the Rust side so that we can easily hash over the final set of inputs after parsing. This means that we now hash over all the externals, like `add-files` references, any `postprocess-script` script, and `passwd` and `group` files. The original motivation for this was that hashing over a reserialized version of the treefile was not deterministic now that treefiles include hash tables (i.e. `add-commit-metadata`). So I initially included each individual treefile as part of the hash. I realized afterwards that just switching to `BTreeMap` fixes this, so we can keep hashing only the final flattened reserialized treefile so we ignore comments and whitespace too. But since I already wrote the patch, and it fixes a real issue today... here we are. One notable change though is that we now hash the treefile in non-pretty mode to increase the chances that the serialized form remains stable. Ironically, this change is likely to cause a no-op commit once it gets to pipelines which iterate quickly. All for the greater good though. Closes: #1865 Approved by: cgwalters
💔 Test failed - status-atomicjenkins |
Hmm, were we scheduled on a node that doesn't have nested virt or something? This definitely works usually. @rh-atomic-bot retry |
Relevant to the "serde canonical json" conversation: serde-rs/json#309 |
Split out from `rpmostree_composeutil_read_json_metadata_from_file` the part that actually converts to `GVariant` and inserts into the hash table. Closes: #1865 Approved by: cgwalters
Move up the setting of the treefile JSON object to right after parsing, and move down the populating of the metadata hash table to after setting the treefile JSON object. This is pure code block moves; there's no functional change otherwise. Prep for future patch. Closes: #1865 Approved by: cgwalters
Add support for a new `add-commit-metadata` key in the treefile so that we can directly specify commit metadata we want to inject from there. This will be useful in Fedora CoreOS, where we'll have separate treefiles for each streams, each with stream-specific metadata values required. Closes: #1865 Approved by: cgwalters
We were hitting the classic "negative test passes for the wrong reason". It was failing not because it didn't have a parent, but because we didn't pass `--repo`. Fix this and also explicitly check for the error message we expect. Closes: #1865 Approved by: cgwalters
For the same reasons we started doing it for the main app: #1719 This time, it's `getrand` that broke us. rust-random/getrandom#15 We should be able to update to 1.35.0 soon, which will unblock this. Closes: #1865 Approved by: cgwalters
Move hashing to the Rust side so that we can easily hash over the final set of inputs after parsing. This means that we now hash over all the externals, like `add-files` references, any `postprocess-script` script, and `passwd` and `group` files. The original motivation for this was that hashing over a reserialized version of the treefile was not deterministic now that treefiles include hash tables (i.e. `add-commit-metadata`). So I initially included each individual treefile as part of the hash. I realized afterwards that just switching to `BTreeMap` fixes this, so we can keep hashing only the final flattened reserialized treefile so we ignore comments and whitespace too. But since I already wrote the patch, and it fixes a real issue today... here we are. One notable change though is that we now hash the treefile in non-pretty mode to increase the chances that the serialized form remains stable. Ironically, this change is likely to cause a no-op commit once it gets to pipelines which iterate quickly. All for the greater good though. Closes: #1865 Approved by: cgwalters
☀️ Test successful - status-atomicjenkins |
Add support for a new
add-commit-metadata
key in the treefile so thatwe can directly specify commit metadata we want to inject from there.
This will be useful in Fedora CoreOS, where we'll have separate
treefiles for each streams, each with stream-specific metadata values
required.