-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config: provide more complete explanation of ChainID #586
Conversation
config.md
Outdated
The `ChainID` of an applied set of layers is defined with the following recursion: | ||
|
||
``` | ||
ChainID(L<sub>0</sub>) = DiffID(L<sub>0</sub>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subscript does not show up in fenced code block
4bfe1a9
to
624316b
Compare
config.md
Outdated
|
||
``` | ||
ChainID(L₀) = DiffID(L₀) | ||
ChainID(L₀|...|Lₙ₋₁|Lₙ) = SHA256(ChainID(L₀|...|Lₙ₋₁) + " " + DiffID(Lₙ)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced with Digest
, which is exactly that.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
624316b
to
0fa3cc5
Compare
👍 LGTM |
@@ -25,19 +25,61 @@ Changing it means creating a new derived image, instead of changing the existing | |||
|
|||
### Layer DiffID | |||
|
|||
A layer DiffID is a SHA256 digest over the layer's uncompressed tar archive and serialized in the descriptor digest format, e.g., `sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9`. | |||
A layer DiffID is the digest over the layer's uncompressed tar archive and serialized in the descriptor digest format, e.g., `sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it's lifting the requirement for DiffIDs to be SHA-256. That might be useful, but seems orthogonal enough to the ChainID definition that it should be part of a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not a requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not a requirement.
With the old wording (“A layer DiffID is a SHA256 digest”) it certainly seems like using a sha256
digest is a requirement. Regardless of the original intent, clarifying this point seems orthogonal to documenting the chain ID. But shrug, I'm not a maintainer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHA256 was just inherited from the original description of Docker's use. Best to not mention specific hashes except where explicitly required to avoid this type of uncertainty for future readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're making (or clarifying) that the algorithm isn't fixed, then I'd also recommend using “a digest” instead of “the digest”, because the different algorithm choices allow multiple valid digests for the same content (e.g. sha256:2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
and sha512:f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc6638326e282c41be5e4254d8820772c5518a2c5a8c0c7f7eda19594a7eb539453e1ed7
are both valid digests for foo
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the digest" refers to "the digest" of the layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the digest" refers to "the digest" of the layer.
A given layer can have multiple digests (see my foo example). That means that an indefinite article is more appropriate than a definite article. Wikipedia gives the following example:
Likewise,
Give me the book.
refers to a specific book whose identity is known or obvious to the listener; as such it has a markedly different meaning from
Give me a book.
which uses an indefinite article, which does not specify what book is to be given.
In this case, the lack of an algorithm requirement means that there are several potential digests (one for each possible algorithm identifier), and we are not specifying which of those digests is intended (all are equally legal).
|
||
``` | ||
ChainID(L₀) = DiffID(L₀) | ||
ChainID(L₀|...|Lₙ₋₁|Lₙ) = Digest(ChainID(L₀|...|Lₙ₋₁) + " " + DiffID(Lₙ)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digest(content)
is not well defined. I think you need the signature to be Digest(algorithm, content)
. For example:
Digest("sha256", ChainID(L₀|...|Lₙ₋₁) + " " + DiffID(Lₙ))
Otherwise the ChainID could be the result of different algorithm choices at each stage, and the final ChainID digest only identifies the algorithm for the final digest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is well-defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is well-defined.
Here's an explicit example. Consider the following DiffID array:
[
"sha256:b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c",
"sha256:7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730",
"sha256:bf07a7fbb825fc0aae7bf4a1177b2b31fcf8a3feeaf7092761e18c859ee52a9c"
]
Your PR makes it very clear that ChainID(L₀)
is sha256:b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c
. Good.
Now say the chain ID implementation decides to take a sha256
digest to get ChainID(L₀|L₁)
They get:
Digest("sha256", "sha256:b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c sha256:7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730")
= sha256:edf2182426048661521fe12414e289f2187c9cb6df93bd05c17bb424b7c40804
Good. Say they decided to use sha512
:
Digest("sha512", "sha256:b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c sha256:7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730")
= sha512:edf2182426048661521fe12414e289f2187c9cb6df93bd05c17bb424b7c40804
Also good. Clearly not the same as the sha256 version, but the algorithm identifier makes that clear.
Moving on to ChainID(L₀|L₁|L₂)
, say both parties decide to use sha256. The fellow who used sha256 for ChainID(L₀|L₁)
gets:
Digest("sha256", "sha256:edf2182426048661521fe12414e289f2187c9cb6df93bd05c17bb424b7c40804 sha256:bf07a7fbb825fc0aae7bf4a1177b2b31fcf8a3feeaf7092761e18c859ee52a9c")
= sha256:d4fb205067a0e6ccee30f4c08bcd7e7a7b2f60f49efcdab0f00139198e51eea1
And the fellow who used sha512
for ChainID(L₀|L₁)
gets:
Digest("sha256", "sha512:edf2182426048661521fe12414e289f2187c9cb6df93bd05c17bb424b7c40804 sha256:bf07a7fbb825fc0aae7bf4a1177b2b31fcf8a3feeaf7092761e18c859ee52a9c")
= sha256:d58dac7b666551a5ddaa5598e792035607a2611f29ad37082881a5bd1f462f49
That's two possible sha256:…
values for ChainID(L₀|L₁|L₂)
(sha256:d4…
and sha256:d5…
), so the chain ID is not well defined. In order to get a well-defined ChainID
, you need to either:
a. Require a particular algorithm identifier for all chainID digests (this is independent of the algorithm identifier used for the diffIDs themselves) or
b. Require chainID implementations to not swap algorithms as they process additional entries.
(a) could be phrased as:
ChainID(L₀|...|Lₙ₋₁|Lₙ) = Digest("sha256", ChainID(L₀|...|Lₙ₋₁) + " " + DiffID(Lₙ))
and (b) could be phrased as:
ChainID(algo, L₀) = DiffID(L₀)
ChainID(algo, L₀|...|Lₙ₋₁|Lₙ) = Digest(algo, ChainID(algo, L₀|...|Lₙ₋₁) + " " + DiffID(Lₙ))
where the algo
argument has no effect on the L₀ case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking this is defining the mathematical formula for a digest and reasoning behind it. Defining the algorithm or making requirements about algorithms matching is unrelated to the definition of a chain ID.
This is a perfectly valid chain id as the properties are reserved.
Digest("sha256", "sha512:edf2182426048661521fe12414e289f2187c9cb6df93bd05c17bb424b7c40804 sha256:bf07a7fbb825fc0aae7bf4a1177b2b31fcf8a3feeaf7092761e18c859ee52a9c")
The processing creating the chain ID is also responsible for providing the diff IDs, the algorithms of those diff digests are completely irrelevant to the chain id computation and the algorithm used for the end chain ID does not need to match the diff IDs. The chain ID algorithm can be set by the implementation of the chain ID and as with all cases where a digest is used, no algorithm should be preferred by specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking this is defining the mathematical formula for a digest and reasoning behind it.
Agreed.
Defining the algorithm or making requirements about algorithms matching is unrelated to the definition of a chain ID.
I agree that defining SHA-256 is unrelated, but I don't agree that algorithm selection is unrelated to the chainID definition.
This is a perfectly valid chain id as the properties are reserved.
Digest("sha256", "sha512:edf2182426048661521fe12414e289f2187c9cb6df93bd05c17bb424b7c40804 sha256:bf07a7fbb825fc0aae7bf4a1177b2b31fcf8a3feeaf7092761e18c859ee52a9c")
I'm unclear on what you mean by “the properties are reserved”.
The processing creating the chain ID is also responsible for providing the diff IDs…
This is not necessarily true. DiffIDs are part of the distributed config, so they are created by the config author. ChainIDs may be created by the config author (in which case your statement is true), but they may also be created by the config consumer (e.g. for identifying directories with unpacked/applied stacks of layers) and in that case your statement is false.
…the algorithms of those diff digests are completely irrelevant to the chain id computation and the algorithm used for the end chain ID does not need to match the diff IDs.
Agreed.
The chain ID algorithm can be set by the implementation of the chain ID and as with all cases where a digest is used, no algorithm should be preferred by specification.
That sounds like an argument for the ChainID(algo, L₀|...|Lₙ₋₁|Lₙ)
phrasing I recommended. Without that, there is no way to consistently check a claimed chainID
. Using my example values, it's not clear how, you could verify that a ChainID(L₀|L₁|L₂)
of sha256:d58dac7b666551a5ddaa5598e792035607a2611f29ad37082881a5bd1f462f49
was accurate unless you guessed (somehow) that the creator had used sha512
for the ChainID(L₀|L₁)
digest.
Leaving the algorithm choice unrestricted (and thus allowing algorithm swaps), means that there is no guess-free way of validating ChainID(L₀|…|Lₙ)
for n > 1
. You could define ChainID
that way, but when ChainID(algo, L₀|...|Lₙ₋₁|Lₙ)
avoids that limitation so easily, I don't see why we wouldn't want to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking You are assigning uncertainty where there is none. I please ask that you take time to understand the subject before commenting with authority. If there are questions you have, please don't be afraid to ask, but please pose them as questions.
I also believe that you are misunderstanding the notation. First off, the expression L₀|…|Lₙ
abstracts the subsequent application of layers. In other words, the file system state. In this context, we actually define the DiffID
in terms of the changeset, rather the contents of diff_ids
, which may be used but is not required. How you arrive at the result of the DiffID
is part of the implementation. The implementation provided here operate directly on an analogous array, but making this work in practice does require this.
Either way, the ChainID
isn't transmitted, so local implementations can choose specifics, while still allowing implementations that select DiffID
, ChainID
and Digest
carefully to yield compatible results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How you arrive at the result of the DiffID is part of the implementation.
…
Either way, the ChainID isn't transmitted, so local implementations can choose specifics…
Then why bother defining it at all? Can't we just point out that using a hash of the final layer is (obviously?) not unique to one particular array of layers and is a property shared by all arrays ending with that layer. I don't see the point of spending ~50 lines defining and explaining ChainID
if the end result is “but you're free to pick a lot of the unspecified parameters, including the DiffID(Lᵢ)
algorithm and digest hash, so don't expect your ChainID
s to match anyone else's”.
|
||
#### Explanation | ||
|
||
Let's say we have layers A, B, C, ordered from bottom to top, where A is the base and C is the top. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Switching from Lᵢ
to A/B/C seems unnecessary. I'd just stick with the subscripted L
throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a specific example verses a generic case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“A, B, C” is not a specific example. A specific example would be “the array of strings ["a", "b", "c"]
or (if you prefer restricting ChainID to layers) “the array of layers ["<inline tar-a>", "<inline tar-b>", "<inline tar-c>"]
”. If you go with the latter, I recommend using very tiny tarballs ;).
For example, given base layer `A` and a changeset `B`, we refer to the result of applying `B` to `A` as `A|B`. | ||
|
||
Above, we define the `ChainID` for a single layer (`L₀`) as equivalent to the `DiffID` for that layer. | ||
Otherwise, the `ChainID` for `L₀|...|Lₙ₋₁|Lₙ` is defined as recursion `Digest(ChainID(L₀|...|Lₙ₋₁) + " " + DiffID(Lₙ))`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph doesn't add anything to the earlier definition that I can see. There's already enough going on here that I don't think repeating that definition here adds anything, so I'd like to drop this paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is called adding context. For those unfamiliar with recursion, or who practice literate reading, this is very helpful. It helps to assign an english meaning to an otherwise obtuse syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It helps to assign an english meaning to an otherwise obtuse syntax.
So keep this version and drop the earlier version? The earlier version is not much more compact than this paragraph, so if it is obtuse in any way I don't see the point of including it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different people understand things it different ways.
``` | ||
|
||
For this, we define the binary `|` operation to be the result of applying the right operand to the left operand. | ||
For example, given base layer `A` and a changeset `B`, we refer to the result of applying `B` to `A` as `A|B`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid defining |
and just define ChainID to act on an array of strings. In the common case, these strings will be digests over the uncompressed layer (i.e. DiffIDs), but there's no reason to limit the chain ID to those cases when it only complicates the definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid defining | and just define ChainID to act on an array of strings.
But that is not the definition of a ChainID
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid defining | and just define ChainID to act on an array of strings.
But that is not the definition of a ChainID.
Not as your PR stands with 0fa3cc5, but with a string-based ChainID definition, it's easy enough to couple back in your choice of DiffID(…)
implementation:
ChainID([DiffID(L₀), DiffID(L₁), …, DiffID(Lₙ)])
On the other hand, it's clearly impossible to decouple the ChainID
from layers if you require the arguments to be layers ;).
Your implementation seems to agree by accepting Digest
s and then treating them as opaque strings. There is nothing layer-specific in the current identity/chainid.go
code, and I like that, and think the Markdown definition should follow that example.
While it is implied that `C` is only useful when applied to `A|B`, the identifier `C` is insufficient to identify this result, as we'd have the equality `C = A|B|C`, which isn't true. | ||
|
||
The main issue is when we have two definitions of `C`, `C = C` and `C = A|B|C`. If this is true (with some handwaving), `C = x|C` where `x = any application` must be true. | ||
This means that if an attacker can define `x`, relying on `C` provides no guarantee that the layers were applied in any order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two paragraphs seem like they could be clarified. How about something like:
The final string
Lₙ
in an array of strings[L₀, L₁, …, Lₙ]
is not sufficient to uniquely identify the array, because many arrays might end in the same value. For example["a", "b"]
, and["a", "c", "b"]
both end inb
.
Then follow up with your next paragraph reiterating that the ChainID
depends on all of the strings in the array, so the ChainID
s of ["a", "b"]
, and ["a", "c", "b"]
will be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You keep trying to model this as an array of strings. It is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You keep trying to model this as an array of strings. It is not.
Hashing anything is an operation on byte strings. If you want to keep the DiffID(Lᵢ)
algorithm unspecified, you can accomplish that in the array-of-strings context by using ChainID([DiffID(L₀), DiffID(L₁), …, DiffID(Lₙ)])
, which I think is better separation of concerns than baking an undefined DiffID(Lᵢ)
algorithm into the ChainID
definition.
ChainID(A|B|C) = Digest(Digest(DiffID(A) + " " + DiffID(B)) + " " + DiffID(C)) | ||
``` | ||
|
||
Hopefully, the above is illustrative of the _actual_ contents of the `ChainID`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully? If there is any doubt, can we just use real values. With the array-of-strings formulation, you can use:
ChainID(["a", "b", "c"]) = Digest(Digest("a" + " " + "b") + " " + "c")
= Digest("sha256:01186fcf04b4b447f393e552964c08c7b419c1ad7a25c342a0b631b1967d3a27" + " " + "c")
= "sha256:638db252c0d891abd09b138672c11c4b4fb984121019bf1d526e200d704325d0"
Which has the additional benefit of being an easy unit test for any ChainID
implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are being hopeful about understanding of the reader, not the result.
It is clear that "hopeful" is the right sentiment, as I know of at least one reader who is struggling here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are being hopeful about understanding of the reader, not the result.
Right. That's a sign that we doubt the clarity of the current wording, so we should hunt for wording that is clear enough to make us not doubt the reader's understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking Everyone else I have shown this to seems to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everyone else I have shown this to seems to understand.
I'd still prefer real values. But if many people understand, and only one does not understand, and you feel like there is no room for clarification to help that remaining person, I think that's enough reason to drop this sentence. Letting them know that you hope they will understand is unlikely to impact their understanding. And while we hope readers understand everything in the spec, I think it's appropriate that we don't burn a line saying “We hope you understand this specification” in spec.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that I don't feel there is no room for clarification. It's that within a day of submitting this PR, which is not contentious at all, you have made upwards of 30 comments on it.
I can't tell if you are trying to slow OCI down for nefarious purposes or genuinely think your helping.
``` | ||
|
||
Hopefully, the above is illustrative of the _actual_ contents of the `ChainID`. | ||
Most importantly, we can easily see that `ChainID(C) != ChainID(A|B|C)`, otherwise, `ChainID(C) = DiffID(C)`, which is the base case, could not be true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think otherwise, …
adds anything and would like to drop it. It looks like it is attempting to argue “if ChainID(C) = ChainID(A|B|C)
, then ChainID(C) = DiffID(C)
, so… something”. However, ChainID(C) = DiffID(C)
is true (it's the L₀
case). Seeing the (true) ChainID(C) = DiffID(C)
right next to “could not be true” is just confusing. What is true is that, baring some unlikely hash collisions, ChainID(C) != ChainID(A|B|C)
. And I think the earlier ChainID(A|B|C)
expansion into constituent digests makes that clear enough on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It mirrors the structure of the definition, the prose description and
You need to differentiate between "I don't understand the implications of this equality" from "I don't understand the strength or reasoning behind these implications".
The whole point here is that DiffID(C)
is insufficient to identify the application of A|B|C
, and thus ChainID(C)
is insufficient. So, we have defined ChainID(A|B|C)
to allow this.
However, ChainID(C) = DiffID(C) is true (it's the L₀ case).
This is not the L₀ case. This is why we revert the to lettering to make this clear. C
represents an independent change set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole point here is that DiffID(C) is insufficient to identify the application of A|B|C, and thus ChainID(C) is insufficient.
ChainID(C)
is sufficient to identify the layer array [C]
(and in that case it is clear that ChainID(C) = DiffID(C)
. ChainID(A|B|C)
is sufficient to identify the layer array [A, B, C]
, and it's not clear to me why anyone would attempt to identify that array with ChainID(C)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not clear to me why anyone would attempt to identify that array with ChainID(C).
In fact, ChainID(C) = ChainID(∅|C)
. It is the chain id for identifying the application of C to the empty file system.
If these aspects aren't clear to you, why are commenting with such authority? It seems like you don't have a basic understand of this problem or its solution, yet you are attempting to comment on it as if you have a full understanding or even expertise on the matter. I don't mind further clarifying this, but it makes it hard to have a conversation when you pose your attempt to understand as contrarian statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not clear to me why anyone would attempt to identify that array with ChainID(C).
In fact, ChainID(C) = ChainID(∅|C)…
To have ∅|C
you'd need an empty-tar layer. And ChainID(C) = DiffID(C) ≠ Digest(DiffID(∅) + " " + DiffID(C)) = ChainID(∅|C)
. There may be some benefit to a ChainID
definition that is based on the assembled filesystem (you get the same ChainID
for A|B|C
and D|E
if the resulting filesystems happen to be identical), but any ChainID
that is defined recursively on the constituent layers is not actually based on the resulting filesystem.
It is the chain id for identifying the application of C to the empty file system.
This is true for ChainID(C)
, but that's not what ChainID(∅|C)
is (ChainID(∅|C)
identifies applying the ∅ layer to an empty filesystem, and then applying C to that) . And either way, I still don't see why anyone would be using this to identify the layer array [A, B, C]
.
If these aspects aren't clear to you, why are commenting with such authority?
Because I think these aspects are clear to me?
I don't mind further clarifying this, but it makes it hard to have a conversation when you pose your attempt to understand as contrarian statements.
I think pushing back on “this doesn't make sense to me” is a good way to have a discussion that gets us to agreement. There are alternatives though. For example, I could file a PR with my own take on clarifying ChainID
(although it would be very similar to my December comment). If that seems like it would be more productive, let me know. But unless we find a way focus on the places where we aren't currently seeing eye to eye, I don't know how we'll resolve those differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
∅|C = C
On Wed, Mar 01, 2017 at 05:04:01PM -0800, Stephen Day wrote [1]:
I can't tell if you are trying to slow OCI down for nefarious
purposes or genuinely think your helping.
I'm trying to be helpful, but having more trouble than I expected :p.
Pulling this reply back out into the main thread to try to collect and
summarize the inline sub-threads, here are what I see as the main
issues being raised by this PR:
1. Are non-sha256 digests allowed in the diff_ids field? Folks seem
on board with “yes” [2]. The only push-back has been me with “I
think this change is orthogonal”, but I'm in favor of not requiring
sha256 however that gets to master.
2. Given some content, is there one obvious digest? I.e. is the
signature for the function:
a. Digest(content)
b. Digest(algo, content)
I think the answer is clearly (b), else why include the algorithm
identifier in the digest format [3,4]. But as of 0fa3cc5 this PR
is adding a number of calls with the (a) form and resistance to
using (b) seems to be grounded in issue 3 [5].
In the (b) case, I think we should be using “a digest” and not “the
digest” [6] in the absence of context which makes the choice of
algorithm clear.
3. Should the ChainID(…) for a given set of arguments be uniquely
determined by the arguments and the spec's ChainID definition? I
think it should, because it's easy to define (and test!)
deterministic definitions [6]. There seems to be resistance to
that on the grounds that ChainIDs aren't transmitted [7], but if we
aren't transmitting ChainIDs, why bother spec'ing them out [8]?
Just remind readers that the final element in an array doesn't
uniquely identify the full array [9] and be done with it.
4. What are the arguments to ChainID(…)?
There seem to be several options on the table here:
a. A single filesytem: ChainID(filesystemPath) [10],
b. An array of layers: ChainID([LBytes₀, …, LBytesₙ]) [11],
b. An array of layer digests: ChainID([LDigest₀, …, LDigestₙ]) [12], or
c. An array of strings: ChainID([s₀, …, sₙ]) [13]. And
d. Any of the above, with an additional algorithm identifier [6]:
i. ChainID(algo, filesystemPath)
ii. ChainID(algo, [LBytes₀, …, LBytesₙ])
iii. ChainID(algo, [LDigest₀, …, LDigestₙ])
iv. ChainID(algo, [s₀, …, sₙ])
(b) is what is in master, and as of 0fa3cc5 this PR is aiming for
some non-existent target between a and b [14]. I think the
clearest position is (d), to uniquely determine the ChainID(…) for
a given set of arguments [6] (issue 3). And within (d), I think
the clearest position is (d.iv), because it decouples the ChainID
algorithm from the “layer” concept. Folks interesting in coupling
layers back in can easily use:
ChainID(algo, [DiffID(LBytes₀), …, DiffID(LBytesₙ)])
or similar. (d.i) has some nice properties, like having the same
value for different layer arrays as long as they all result in the
same filesystem, but hashing a filesystem is a lot more complicated
than hashing a string.
[1]: https://github.com/opencontainers/image-spec/pull/586/files#r103829714
[2]: https://github.com/opencontainers/image-spec/pull/586/files#r103591296
[3]: https://github.com/opencontainers/image-spec/blame/v1.0.0-rc5/descriptor.md#L72-L73
[4]: https://github.com/opencontainers/image-spec/pull/586/files#r103591613
[5]: https://github.com/opencontainers/image-spec/pull/586/files#r103798102
[6]: https://github.com/opencontainers/image-spec/pull/586/files#r103773769
[7]: https://github.com/opencontainers/image-spec/pull/586/files#r103780739
[8]: 0fa3cc5#r103815082
[9]: 0fa3cc5#r103799840
[10]: 0fa3cc5#diff-c9c91c29b41257aea3a3403cc606ad99R50
[11]: https://github.com/opencontainers/image-spec/pull/586/files#r103800516
[12]: https://github.com/opencontainers/image-spec/blob/v1.0.0-rc5/identity/chainid.go#L30
[13]: https://github.com/opencontainers/image-spec/pull/586/files#r103792591
[14]: https://github.com/opencontainers/image-spec/pull/586/files#r103814966
|
@wking As I've said, increasing citations and verbosity does not increase the effectiveness of your point. Furthermore, continuing to ignore my refutations of your points does not make them true. To put this in context, I have already answered each point, in detail, identified in the previous comment, both here and in IRC. The level of disrespect exhibited in continuing to push these points, even after they have been addressed is both insulting and exhausting to the maintainers and contributors of this project. At this point, you have received several warnings for communicating in this manner. Continuing to pretend that communicating in this manner is helpful does not make it helpful. |
This look good to me. |
dammit, I missed getting my review in. suppose I will follow up |
With opencontainers#586 there are math spec characters introduced. This required a few new latex packages and specifying xelatex. Fixes opencontainers#596 Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Signed-off-by: Stephen J Day stephen.day@docker.com
Closes #482.
cc @dmcgowan @opencontainers/image-spec-maintainers