First write-up for the new "offline attestation" functionality #76
Conversation
Force-pushed from 89854bd to 60bdccd
There are two additional things that I would like to see as part of this proposal.
@THS-on regarding the first point, yes, I don't see the inclusion of "our" (IBM) adapter as a problem. For the second point, I also do not have a fundamental disagreement with the proposal: the aforementioned separation is logical and needed, and in my mind it amounts to extra PRs as part of this very same enhancement.
We should test it in the CI if possible. I would disable artifact collection by default, but making your implementation an official part of Keylime is probably a good idea. In my team we also discussed keeping artifacts, and we are pretty open about which stack we use, so sharing that code is definitely beneficial. It would be interesting to hear which stack SUSE (@aplanas) and Red Hat prefer.
Yes, that makes sense. Ideally this change is done before the artifact collection is implemented, so that the plugin API can be cleanly implemented.
Forgive me if I get the context wrong:
(@THS-on)
IMHO this is highly dependent on how the feature is implemented. Keylime + datastore (like Redis), as implied in the conversation, should ultimately be tested via an integration test, and this indeed makes the deployment more cumbersome. We try to use openQA for those tests, and this will imply making a full Keylime deployment. It can be done, and will be done if / when we support the data storage use case. But most of our tests will reside in the
Big +1 from me. I am very supportive of this; @lkatalin and I had been researching this with no knowledge that others had the same idea (which is a nice validation to see). We also worked on the premise of using Rekor, as it's an OSS project with a vendor-diverse community and stewardship under the OpenSSF. It's also an incredibly good fit: it can accept customisable attestation manifests (JSON), will validate signatures (for non-repudiation) and has an inbuilt time-stamping service. Also, being tamper-resistant and auditable, it makes a really good candidate for offline attestations. Not sure I see testing as highly problematic. I would be happy to help with this.
into a JSON file, sign it (using the private key generated as part of the
certificates for mTLS interaction with both `tenant` and `registrar`) and then
Would it not be better to use a TPM-resident key to sign this, which in turn can be chained back to the EK / TPM itself? I am not sure what value there is in having a non-TPM-resident private key sign attestations.
Is it a case of the private key providing non-repudiation over which registrar posted to the datastore?
I strongly favor a change in core Keylime in order to use TPM-resident keys instead. However, for the moment, I advocate we separate this change from the whole "offline attestation" enhancement and simply revisit it, with the appropriate modifications, if (when?) we decide on such a fundamental change.
I strongly favor a change in core Keylime in order to use TPM-resident keys instead.
I think for the TLS connections it does not work, because of how the SSL library in Python works. Ideally we would reverse-proxy the registrar and verifier through nginx, haproxy or similar for TLS instead of using the limited Python implementation.
As for the signing key, I think it should be a separate one.
This is the reason why I would like to have a separate discussion on which kinds of signing keys we should use in Keylime.
I am not following the logic here. Why even bother signing the attestations in the first place, especially with a file-resident key? What guarantees does this give us? Is this to somehow map the measured machine to the registrar / verifier that requested the attestation?
73-offline-attestation.md
make a record of it on a tamper-resistant transparency log (e.g., Rekor). In
addition to that, it will store the JSON file, the signature, and the public
key on the time-series persistent datastore. This should allow an external
So if we were to use Rekor, it includes a time-stamping service; this would also remove any risk of timing-based attacks from using two separate stores.
It is my understanding that the most recent Rekor no longer includes a time-stamping authority API, but in my prototype I am interacting with freeTSA.
you're right, I stand corrected.
- The first PR will provide the "persistent datastore" plugin capability, to be
called from within both `registrar` and `verifier` code. It will include a
It would be good to better understand the plugin architecture. Is this something that canonicalises materials into JSON and makes some sort of POST to the datastore?
At this point, I can experimentally demonstrate that I can simply "jsonify" everything from within the `registrar`/`verifier` code (basically the `agent` and `json_response` python dictionaries) and push it into a datastore.
I agree with OP; IMHO there is more value in defining the API (open_datastore(), close_datastore(), is_valid(), insert_line(), etc.) and the security guarantees.
The JSON serialization is fine, but it is the API (and the parameters it requires) that will make sure all the data is still stored in future versions. Otherwise there is a risk that one of the internal state objects we are storing changes, making the attestation impossible (or worse, fixing the internal representation forever).
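To make the proposal concrete, here is a minimal sketch of what such a driver API might look like, in Python (Keylime's language). The function names come from the comment above; the signatures and the plain-text backend are assumptions, not the actual enhancement API:

```python
# Hypothetical sketch of the proposed datastore driver API; function names
# are from the review comment, everything else is illustrative.
import json
import time
from abc import ABC, abstractmethod

class PersistentStoreDriver(ABC):
    """Interface a datastore backend (plain text, sqlite, redis, ...) implements."""

    @abstractmethod
    def open_datastore(self, url: str) -> None:
        """Connect to the backend and verify it has not been tampered with."""

    @abstractmethod
    def close_datastore(self) -> None:
        """Flush (and optionally sign) the full contents before closing."""

    @abstractmethod
    def is_valid(self) -> bool:
        """Return True if the stored records pass integrity checks."""

    @abstractmethod
    def insert_line(self, agent_uuid: str, record: dict) -> None:
        """Append one attestation record, serialized as JSON."""

class PlainTextDriver(PersistentStoreDriver):
    """Trivial append-only file backend, useful mostly for CI testing."""

    def open_datastore(self, url: str) -> None:
        self._fh = open(url.removeprefix("file://"), "a+", encoding="utf-8")

    def close_datastore(self) -> None:
        self._fh.close()

    def is_valid(self) -> bool:
        return True  # a real driver would verify per-entry signatures here

    def insert_line(self, agent_uuid: str, record: dict) -> None:
        entry = {"time": time.time(), "uuid": agent_uuid, **record}
        self._fh.write(json.dumps(entry, sort_keys=True) + "\n")
```

Pinning the API surface this way means the stored schema can survive even if Keylime's internal state objects change between releases.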
@aplanas very well, I will add a proto-API to the enhancement text now.
@aplanas just did
73-offline-attestation.md
- Add functionality on the `verifier` to record (in a time-series persistent
datastore) all the information needed to perform attestation standalone (i.e.,
quotes and MB/IMA logs)
By quotes, would this be both the event log and PCRs? It would be nice to know what sort of information could be garnered from the elements we store for offline attestations.
It would be pretty cool if we were able to follow this course of action:
- acme firmware is exploited by some nasty trojan.
- acme firmware with the exploit has the following digest
- search the transparency log for instances of the exploited firmware and have guarantees around machines that booted with the acme firmware.
Currently, I am storing "everything" that is sent by the `agent`, which does include quotes, the measured boot log and the IMA log. With very few modifications (currently the `keylime_attest` command-line uses the very same policy as defined in `keylime.conf`) I see no impediments to implementing your scenario.
NOTE: at this point, the transparency log is written only by the `registrar` and its sole use is to record the "moment" when the EK and AIK were associated (i.e., when `tpm2_makecredential` was run by the `registrar`), so technically one just has to traverse the "time-series-like persistent datastore", with a newly defined policy in hand, to check for instances of the exploited firmware (which is covered by this new policy).
If the goal is to assert that "system X was valid in this time period", shouldn't it be enough to store the validation result of the verifier for this machine?
I mean, the good thing about your proposal is that the attestation can be replicated, as we store all the data. I guess this stores the quotes, the nonces and all that is required to re-create the same OK / BAD result. But this storage needs to be "secured", as was already mentioned, meaning that we need to assure that no data for any time window has been removed, nor data replicated many times, etc.
Those guarantees about non-modification of the data would make storing a triplet like "{date, machine UUID, status}" feature-wise similar.
Or am I missing something?
@aplanas this is the fundamental aspect, and I believe it can be summarized in two main premises (@mpeters @lukehinds @lkatalin @THS-on @galmasi among others, please speak up if you disagree):
P1: If we are certain that a given AIK was generated by a given TPM (and thus tied to a given node), we are able to answer the key question "node N was attested at time T" as long as we have quotes (signed by the AIK tied to the TPM) and logs (i.e., the "attestation artifacts"). There is no intrinsic need for additional signing, encryption, or even "security" for this long chain of artifacts.
P2: We trust the `registrar` to "associate" the EK with the AIK, and to store a record of this association on a transparency log, thus ensuring the "certainty" required by P1.
Some artifacts might be missing, and one could be unable to answer the "key question" for a specific point in time, but for those time windows where the artifacts are recorded, we can answer with the same level of certainty that is currently provided by the "online" `verifier`.
Please do note that we don't even have to actually trust the `verifier`.
I do not see the need for a transparency log entry for every "artifact attestation package" collected by the `verifier` for every `agent` at every `quote_interval`, but I am willing to discuss it further.
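For concreteness, premise P1 boils down to being able to re-check a recorded quote offline. A sketch of that check using tpm2-tools (the `tpm2_checkquote` options are standard; all file names and the record layout are hypothetical):

```python
# Sketch: offline re-verification of a stored quote with tpm2_checkquote.
# The AK public key comes from the datastore, tied to the TPM via the
# registrar's logged EK/AK association (premise P2).
import subprocess

def verify_stored_quote(ak_pub_pem, quote_msg, quote_sig, pcr_file, nonce_hex):
    result = subprocess.run(
        ["tpm2_checkquote",
         "--public", ak_pub_pem,         # AK public key recorded at registration
         "--message", quote_msg,         # TPMS_ATTEST blob recorded at time T
         "--signature", quote_sig,       # quote signature recorded at time T
         "--pcr", pcr_file,              # PCR values recorded at time T
         "--qualification", nonce_hex],  # nonce the verifier used at time T
        capture_output=True)
    return result.returncode == 0        # node N attested successfully at time T
```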
There is no intrinsic need for additional signing, encryption, or even "security" for this long chain of artifacts.
Uhm... I am missing something.
If the stored timeline log is something like this:
Time UUID AIK Quote
==== ==== === =====
T1 ID1 K1 Q1
T2 ID1 K1 Q2
T3 ID1 K1 Q3
For the system ID1 we have 3 entries; we can use the AIK K1 to validate the quotes Q1, Q2, Q3 offline. Imagine that somehow Q2 and Q3 produce an invalid attestation.
Later someone changes the stored logs to:
Time UUID AIK Quote
==== ==== === =====
T1 ID1 K1 Q1
T2 ID1 K1 Q1
T3 ID1 K1 Q1
Now all the times produce a valid attestation.
To avoid that, you are pointing to a "tamper-resistant transparency log (e.g., Rekor)". I do not know Rekor, but sqlite, plain text, and redis are not tamper-resistant, so this modification scenario can happen.
That is why I see that:
- We need to implement some anti-tamper mechanism at the driver level for some storages. One option is signing the entries, producing some Merkle tree or something; another is signing the full storage once it is closed, and validating the signature on the open operation (see the sketch after this comment).
- If we already implement an anti-tamper mechanism (or one is available natively in the log storage, as seems to be the case for Rekor), maybe we do not need to store any quote information, and can directly store the result of the verifier, in a fashion like:
Time UUID Result
==== ==== ======
T1 ID1 OK
T2 ID1 OK
T3 ID1 OK
That is not offline attestation, but if I am not missing something it is equivalent: we are able to answer whether a system was valid in a period of time. For that we first need to validate that the storage has not been tampered with, and later check the content without reproducing the attestations.
Sorry if I am missing the point. Maybe the core point is that for a proper audit we must replicate the attestation via a 3rd party, but in that case we should not use any code from Keylime to perform the audit.
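A minimal sketch of the "anti-tamper mechanism at the driver level" idea (illustrative only, not Keylime code): chain every entry to the hash of the previous one, so the Q1-over-Q2 replay in the example above breaks the chain:

```python
# Hash-chained append-only log: altering, duplicating, or removing an
# entry invalidates every subsequent hash.
import hashlib
import json

def _digest(prev_hash: str, payload: dict) -> str:
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()

def append_entry(log: list, payload: dict) -> None:
    prev_hash = log[-1]["hash"] if log else "0" * 64
    log.append({"payload": payload, "hash": _digest(prev_hash, payload)})

def verify_chain(log: list) -> bool:
    prev_hash = "0" * 64
    for entry in log:
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False  # tampering detected
        prev_hash = entry["hash"]
    return True

log: list = []
append_entry(log, {"time": "T1", "uuid": "ID1", "quote": "Q1"})
append_entry(log, {"time": "T2", "uuid": "ID1", "quote": "Q2"})
log[1]["payload"]["quote"] = "Q1"  # replay Q1 over Q2, as in the example
assert verify_chain(log) is False  # the modification is caught
```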
We need to define a clear threat model, so that we can discuss which parts we need to trust and which parts need to be secured.
I can think of three main scenarios:
1. The registrar and verifier are always trustworthy. We only assume that an attacker can modify the agents, therefore no other tampering protection is needed.
2. The registrar and verifier are generally trustworthy, but might get compromised.
   - Here we need to have append-only logs, so that old data cannot be removed.
3. The registrar and verifier are generally untrusted.
   - All data needs to be verifiable after the fact.
   - Timestamps need to be enforced.
Thanks, the discussion only makes sense in light of some threat model. I believe the scheme proposed here would allow us to have "untrusted" `verifiers`, but - unless we change the keys used to sign quotes - we will have to trust the `registrar`.
There is also the question of to whom we need to prove this. If it is just ourselves, then there is probably no need for a transparency log. If it is to a third party, a transparency log is helpful.
Right, "our" (IBM) use case is, a customer of our cloud hires an external auditor to check the state of the nodes.
Using 3. as the threat model is probably not possible, because I don't think we can find a way to ensure that the AK and EK belong to the same TPM. For make/activate credential we will always require some trust in the registrar. (One could work around it by provisioning the AK outside of Keylime, but only the rust agent has partial support for that.)
I concur; I don't see a way to use 3 given the current implementation of Keylime.
I am not sure that Keylime is tracking this invariant [of the TPM time]
It is currently not tracking it. The TPM clock validation is part of #59. We could move it to a separate proposal.
I strongly favor a smaller PR where we extract the timestamp from the quote (I have a way of doing it with `tpm2_print`), add a new column in the DB for this, and actually use the delta between the "current" one (from the quote) and the one stored in the DB to decide whether to reject a quote (if such delta is negative). I can open such a PR as part of this enhancement.
Hmmmm... but in this case, the verifier already uses a nonce to avoid "replay" from a malicious agent, so this problem is already solved for the "online" attestation.
I was always talking about the offline attestation, and the modification of the stored log.
Hmmmm... but in this case, the verifier already uses a nonce to avoid "replay" from a malicious agent, so this problem is already solved for the "online" attestation.
I was always talking about the offline attestation, and the modification of the stored log.
I see, right, right: an attacker that compromises the "time-series datastore" (in my mind, this was always orthogonal to the threat model for Keylime). Be that as it may, do you agree that relying on the timestamp of the quotes should cover this gap?
do you agree that relying on the timestamp of the quotes should cover this gap?
That was also commented on before. IMHO no, at least not directly.
In the trivial case:
add a new column in the DB for this, and actually use the delta between the "current" one (from the quote) and the one stored on the DB to decide to reject a quote (if such delta is negative)
In the first example, when Q1 is replicated in the log, the delta would be 0, not negative. So this invariant of negative deltas does not catch this issue.
But what worries me more (and here I am talking from inexperience, as I did not dig enough into the details) is that the section of the specification referred to before (10.11.1 TPMS_CLOCK_INFO) does not state that the clock is monotonically increasing, as it is cleared during the tpm2_clear operation:
This value is reset to zero when the Storage Primary Seed is changed (TPM2_Clear()).
So at least the condition seems to be that, for the same AIK (because I guess that after a tpm2_clear we will have a new one), the clock delta should be > 0. When the AIK is different, this delta may be < 0.
But the rule can be more complex, as the clock value is persisted only about once per hour. This suggests that if there is a reset during the same hour, we will have valid quotes with valid clocks that do not keep the invariant of delta > 0 (the clock will repeat values after the reset). So we need to add into the equation some "reset" counter that I am not aware of.
Edit: this same struct has resetCount and restartCount, which seem like they can help.
My mistake, it has to be non-zero, evidently. Good point (`tpm2_print` will extract all three already). So, for a given AIK (changed whenever the `agent` restarts), within a given `resetCount` and `restartCount`, we can assume that the clock will monotonically increase, and only accept quotes with a positive non-zero delta (from the prior one).
This could be encoded in the "online" `verifier` code (this is my preference) or we can do it directly in the new `keylime_attest` CLI.
For reference, here is an example output of `tpm2_print`:
magic: ff544347
type: 8018
qualifiedSigner: 000b3e403b6694301264d3865771f07416228cc51fe86009a189a8cd27c4d3879979
extraData: 6e33545a583939614a35744b4f36544364424c48
clockInfo
clock: 685618775
resetCount: 0
restartCount: 0
safe: 1
firmwareVersion: 3636160023101920
attested
quote:
pcrSelect
count: 1
pcrSelections:
0:
hash: 11 (sha256)
sizeofSelect: 3
pcrSelect: 010001
pcrDigest: 3725b2fe95da5d7674076e709dec0cb8641f406a863759afab6d0dad73bcdb19
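A sketch of the acceptance rule just described; the field names mirror the clockInfo block above, while the record layout and the `ak_name` field are assumptions:

```python
# Hypothetical quote-acceptance check based on TPM clockInfo monotonicity.
from typing import Optional

def accept_quote(prev: Optional[dict], cur: dict) -> bool:
    """Accept a quote only if the TPM clock moved strictly forward within
    the same AIK and the same resetCount/restartCount epoch."""
    if prev is None or cur["ak_name"] != prev["ak_name"]:
        return True  # first quote for this AIK: no baseline to compare with
    cur_epoch = (cur["resetCount"], cur["restartCount"])
    prev_epoch = (prev["resetCount"], prev["restartCount"])
    if cur_epoch != prev_epoch:
        return cur_epoch > prev_epoch  # counters may advance, never go back
    return cur["clock"] > prev["clock"]  # require a positive, non-zero delta
```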
Good to know that we weren't the only ones envisioning the need and willing to work on this feature @lukehinds! My initial prototype does indeed use
73-offline-attestation.md
This should also include some proof that a given AIK created on a TPM by an
`agent` was indeed tied to a given EK, a process that is done by the
`registrar` and whose responsibility is to store it on a tamper-resistant
metadatastore (e.g. transparency log)
Naive question: can we provide a simple implementation in something self-contained, like SQLite, or even better, plain text?
My point is that maybe this RFC should be about defining an API that Keylime expects a driver to implement, and providing some drivers for plain text, sqlite, redis or rekor. In this RFC maybe we need to describe the security guarantees, like that the `open` operation should validate that no change has been done to the storage.
Sometimes those guarantees can be delegated to the database itself, but others should be implemented in the driver (like signing each entry line, or during the `close` signing the full content and the metadata of the file ...)
I believe having a "plain text" implementation is a most welcome suggestion, and should go a long way in making it more testable. I will include that in the PR.
73-offline-attestation.md
dictionary) from the `agent` - which will include both quotes and logs (MB
and IMA) - `agent` data (python dictionary) from the SQL database (internal to
Keylime) and the `agentAttestState` python object, combine it into a single
record and store it on the time-series persistent datastore.
IIUC this proposal will store marshaled Python objects? IMHO we should make the effort (if we indeed decide to store this data) to serialize into a more agnostic format (JSON, YAML), so that 3rd-party software can still read it and replicate the attestation.
I will update it :-) I've studied the code further and simple `json` is enough, no need to marshal/pickle anymore.
Signed-off-by: Marcio Silva <marcio.a.silva@ibm.com>
Signed-off-by: Marcio Silva <marcio.a.silva@ibm.com>
Force-pushed from 0221544 to eb388dc
Thanks for the write-up @maugustosilva and sorry I wasn't able to comment before now. @lukehinds is right, there is a high degree of intersection here with our team's work. For context, I have been thinking of this from the Rekor side, asking whether it makes sense to have a new type in Rekor to store information about Keylime attestations.
I have some questions about the enhancement and to check that I'm understanding:
1. As this offline attestation may occur after the nodes in question no longer exist, is the idea that the external auditor would want to find all attestations related to a specific TPM/EK? (How do they decide which EKs are relevant to audit? Just curious about this.)
2. I'm trying to understand what is pluggable vs. not vs. out of scope for the new API and CLI, and which components are talking to which others. Is this an accurate overview?
   user submits EK ==> new `keylime_attest` CLI <==> pluggable API supplied in config by user as Python modules <==> log, TSDB backends
   So the CLI is user-facing, and the logic for the API between Keylime and the log/TSDB backends is user-supplied?
3. There is a clear separation of duties in this enhancement between the tamper-evident log for AIK/EK association and the time-series database for everything else. I have instead been thinking of storing all data/metadata in a log (specifically Rekor), and based on the discussion comments I also see the need for having all data in a tamper-evident, append-only log for at least one threat model. Is there any reason that all the data in question here (e.g. quote, boot logs, IMA logs, etc.) couldn't go into a transparency log along with the AIK/EK mapping if the user wanted? Admittedly I don't know much at all about TSDBs, but I believe Rekor could support having a type or types that store all of this data and are queryable on something like the EK. This means the tamper-evident, immutable, etc. guarantees of Rekor would apply to all the data. For example, the data mentioned to check the AIK/EK association (with the signature and pubkey used by the registrar) is going into a TSDB in this enhancement, but this sounds like exactly the type of data Rekor stores. Could / should the data just go into Rekor, or at least should we allow for this option?
4. I am trying to get a good overview of the totality of the data that should be stored in the log and/or TSDB (this is the data that would be stored in the potential new Rekor "Keylime type" if it makes sense to create this type). Is this an accurate list?
   - Association between AIK/EK
   - The association from above plus the registrar's signature on it and the pubkey used to sign (such that the signature can be checked)
   - Quote, measured boot logs, IMA logs, and nonce for an offline attestation
   - Time info related to the Quote (ideally embedded in the Quote?) along with `resetCount` and `restartCount` values as discussed in the comments
   - Anything else?
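Regarding question 2, the "pluggable API supplied in config by user as Python modules" could plausibly be wired up as below; the parameter name `durable_attestation_import` appears in the enhancement text, while the adapter module and the attributes it exposes are hypothetical:

```python
# Sketch: dynamically load the user-supplied backend adapter named in the
# durable_attestation_import configuration option.
import importlib

def load_durable_attestation_backend(module_path: str):
    """Import the adapter module and return the entry points Keylime
    would call (the driver API discussed earlier in this thread)."""
    mod = importlib.import_module(module_path)
    return mod.open_datastore, mod.insert_line, mod.close_datastore

# e.g., with durable_attestation_import = "mycompany.rekor_adapter":
#   open_ds, insert_line, close_ds = \
#       load_durable_attestation_backend("mycompany.rekor_adapter")
```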
73-offline-attestation.md
record generated by the `registrar` to indicate the association between EK
and AIK is enough. Once this is done, offline attestation has basically the
Suggested change:
record generated by the `registrar` is enough to prove the association between EK
and AIK. Once this is done, offline attestation has basically the
Because I found myself asking: "Enough to do what?" with the original wording.
73-offline-attestation.md
Provide Keylime with the ability to store all the required information to
perform a full attestation, in a persistent external time-series datastore.
This should also include some proof that a given AIK created on a TPM by an
`agent` was indeed tied to a given EK (from the TPM locate at the node where
Suggested change:
`agent` was indeed tied to a given EK (from the TPM located at the node where
73-offline-attestation.md
This should also include some proof that a given AIK created on a TPM by an
`agent` was indeed tied to a given EK (from the TPM locate at the node where
it was running). This AIK/EK association process is done by the `registrar`
it will its responsibility to store a record of such association on a tamper-resistant
Suggested change:
it will be its responsibility to store a record of such association on a tamper-resistant
73-offline-attestation.md
The main motivation for adding this functionality is to give auditors and other
compliance officers the ability to answer, with a proper degree of certainty
and trust the following question: did node N had its software stack fully
Suggested change:
and trust the following question: did node N have its software stack fully
73-offline-attestation.md
The main motivation for adding this functionality is to give auditors and other
compliance officers the ability to answer, with a proper degree of certainty
and trust the following question: did node N had its software stack fully
attested at date D? Being date "D" a point time that could be well in the
Suggested change:
attested at date D? Being date "D" a point in time that could be well in the
73-offline-attestation.md
every time an `agent` is restarted on Keylime.
- The `verifier` will be modified to take the `json_response` (python
dictionary) from the `agent` - which will include both quotes and logs (MB
and IMA) - `agent` data (python dictionary) from the SQL database (internal to
Is `agent` data just the agent's UUID, or is there other info to be included here?
73-offline-attestation.md
start with the high-level sections and fill out details incrementally in
subsequent PRs.
-->
# enhancement-#40: TPM 2.0 Pre-Boot Event log support
Update title?
@lkatalin sorry for the delay in answering, I have been unwell for the past few days. I will try to address all the points.
Could you outline 'optional' and how this would be presented, something like a bool config value? I agree that pushing data to a tlog every time a quote is polled would be a huge volume. I instead imagine there could be a less frequent push than the standard quote frequency Keylime has (every 10 minutes instead of every 0.5 seconds), or perhaps an initial bundle pushed to show the state at first boot. One factor to consider is that rekor (as you mention that solution) supports sharding, so the tlog won't perpetually grow and slow down over time. @lkatalin developed that feature and so knows it well.
Signed-off-by: Marcio Silva <marcio.a.silva@ibm.com>
@lukehinds @aplanas @lkatalin @THS-on Updated and refined the document a little bit more, hopefully addressing all the concerns, questions and comments thus far.
73-offline-attestation.md
The goal of this enhancement is to provide Keylime with the ability to store
all the required information to perform a full attestation, in a persistent
external time-series like datastore.
This should also include some proof that a given AIK created on a TPM by an
nit: AIK is a pre-2.0 key AFAIK; the name since 2.0 is AK. It also seems that an AIK can sign only a subset of the objects the new AK can sign. (Feel free to ignore this comment, as it is very clear what the document is referring to.)
Yeah we should make this consistent with the TPM 2.0 spec.
Certainly, shall fix it in the text.
73-offline-attestation.md
-->
- The `registrar` will be modified to, upon initial `agent` registration -
which includes the execution of `tpm2_makecredential` - record the EK, AIK
I could be wrong here (please forgive me if I am writing nonsense), but if we store the AK name, the public EK and the secret, the auditor still cannot validate the relation between the EK and AK, as it would still require a connection to the node to recover the secret. The issue is that by that time it is fairly possible that the AK is no longer valid for this agent.
You are correct. The (only) way you could verify that the TPM2_ActivateCredential was not fraudulent is by attesting the AK again. No audit trail of the operations is useful when the agent and registrar are compromised.
Requirements: the TPM, the private and public portions of the AK, the EK_pub, and the secret.
Steps:
- Generate the MakeCredential blob EK_pub(secret, AK_name)
- Load the AK again on the TPM
- Let the TPM decrypt the MakeCredential blob to retrieve the secret
- Check if the secret is correct.
Storing the original secret and the MakeCredential blob is fine, but is this reproducible, or does this structure contain some nonces?
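A sketch of those steps driven from Python via tpm2-tools (the tool options are standard; the context and file names are hypothetical):

```python
# Re-check the AK/EK binding by re-running make/activate credential.
import subprocess

def recheck_ak_ek_binding(ek_pub, ak_name, ak_ctx, ek_ctx, secret_file):
    # 1. Re-generate the MakeCredential blob EK_pub(secret, AK_name)
    subprocess.run(["tpm2_makecredential", "-e", ek_pub, "-s", secret_file,
                    "-n", ak_name, "-o", "mkcred.out"], check=True)
    # 2+3. Load the AK and let the TPM decrypt the blob to recover the secret
    done = subprocess.run(["tpm2_activatecredential", "-c", ak_ctx,
                           "-C", ek_ctx, "-i", "mkcred.out",
                           "-o", "actcred.out"])
    if done.returncode != 0:
        return False  # the TPM could not prove it holds both keys
    # 4. Check that the recovered secret matches the original
    with open("actcred.out", "rb") as got, open(secret_file, "rb") as want:
        return got.read() == want.read()
```

As noted above, this only works with access to the original TPM, so it cannot replace trust in the `registrar` for after-the-fact audits.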
This is a crucial aspect of the whole endeavor, and I do thank you both for giving me the opportunity to clarify. No, it is not possible to re-execute the `tpm2_makecredential`. The (now I see, insufficiently clear) main point here is: "everybody" involved will trust the `registrar` enough to consider a JSON file with the AK, EK and EK cert, signed by this component, as proof of the association between AK and EK. While I freely grant that this might not be acceptable in all cases, it is certainly so for our current Production deployment (which provides a "proof of existence" of its eventual sufficiency).
73-offline-attestation.md
dictionary) from the `agent` - which will include both TPM quotes and logs (MB
and IMA) - plus `agent` data (a python dictionary which includes all columns
stored in the SQL database, internal to Keylime), combine it into a single
(python dictionary) record and store it on a time-series datastore.
I still do not like this: storing a Python dictionary directly. I guess it will be serialized to something like JSON at the end (if so, that should be explicit).
But the main problem I see is that we are storing internal state of Keylime. This internal state can change in the future, so maybe the next Keylime version will store something a little different, and over many releases we will have different formats.
IMHO we should only list the fields that are useful for auditing, without relating them to any internal state (in this case the `json_response`).
The "internal state" of Keylime is needed only in case an external third-party auditor wants to use the verifier
code unmodified to perform offline attestation. Again, there is a real use case for this with our Production customers, but the fact we are storing more, not less data should not prevent other parties to write offline attestation code from scratch, simply ignoring any "Keylime-specific state"
73-offline-attestation.md
stores) and the URLs for these new proposed stores will be
supplied by the user as parameters under the `[general]` section:
`durable_attestation_import`, `persistent_store_url` and
`transparency_log_url`. The URL format is similar to the one already
What is the difference between a persistent store and a transparency log? Somehow in my brain both were used to store the attestation data.
Ah, very good question. I will leave it to others to refine and present formal definitions according to well-founded terminology and epistemology :-) but for now I will summarize it as: a transparency log stores records of signatures based on the contents of an object, while a persistent store holds the contents of the actual object.
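For illustration, the split might look like this in `keylime.conf`; the three parameter names come from the enhancement text, while all values are hypothetical:

```ini
[general]
# python module implementing the adapters for both stores (hypothetical name)
durable_attestation_import = mycompany.da_adapter
# persistent store: holds the actual attestation artifacts (quotes, logs)
persistent_store_url = redis://localhost:6379/0
# transparency log: holds only signed records over those artifacts
transparency_log_url = https://rekor.example.com
```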
signing also stored on the tamper-resistant transparency log. The value of
this attribute - `signed_attributes` - could be set to `all`, which will
result in the `verifier` signing **everything**, including the
`json_response` data package from the `agent`, and storing a record on the
IMHO we should not make that optional. There is only a fixed set of fields that are of utility for offline attestation; we should store that and no more.
The risks are: we can leak data, we can store different things at different times (potentially breaking the audit code), and we can end up storing less data than is required.
Signing the contents of every single piece of data required by attestation would imply trust in the `verifier`, something that we are trying to avoid. Making it optional does not prevent a deployment in a scenario where this (trust in the `verifier`) is the case, while still accommodating "our" (IBM cloud) production scenario.
73-offline-attestation.md
start with the high-level sections and fill out details incrementally in
subsequent PRs.
-->
# enhancement-#73: Durable (Offline) Attestation suppor in Keylime
Suggested change:
# enhancement-#73: Durable (Offline) Attestation support in Keylime
73-offline-attestation.md
- Three new parameters: the name of a python module to be dynamically
imported (which will contain code used to interact with these new proposed
stores) and the URLs for these new proposed stores will be
supplied by the user as parameters under the `[general]` section:
Those options should be in the respective registrar and verifier sections; we are trying to remove the general section. `persistent_store_url` and `transparency_log_url` should be in a section defined by the plugin, not in general.
Fair enough, will modify both the code and the description.
Signed-off-by: Marcio Silva <marcio.a.silva@ibm.com>