First write-up for the new "offline attestation" functionality #76

Merged
merged 4 commits into from
Jul 21, 2022

Conversation

maugustosilva
Contributor

No description provided.

@maugustosilva maugustosilva requested review from mpeters and THS-on June 8, 2022 17:38
@maugustosilva maugustosilva force-pushed the offline_attestation branch from 89854bd to 60bdccd Compare June 8, 2022 17:39
@THS-on
Member

THS-on commented Jun 9, 2022

There are two additional things that I would like to see being part of this proposal.

  1. The addition of one official plugin for the datastore and transparency log (e.g. Redis and Rekor, or whatever IBM chooses to use in production). With the measured boot policy it does not make sense to maintain an official one (besides the example policy) because everything before the shim is highly environment specific. This is not necessarily the case here (e.g. new users just need to deploy Redis and Rekor if they want this feature). My fear is that, by adding another plugin API, we end up with parts of Keylime that are unusable because no public implementations are available.

  2. As I already mentioned yesterday, we should introduce a clean separation between data collection, agent state and actual attestation. I think this proposal is the right place to put it, and doing the separation is also beneficial for the push model idea.

@maugustosilva
Contributor Author

@THS-on regarding the first point, yes, I don't see the inclusion of "our" plugin (the IBM adapter to Redis and Rekor) as a problem or blocker. However, it is not clear to me that others will be as enthusiastic about it, and with good reason: it will make the testing of it far more complex.

For the second point, I also do not have a fundamental disagreement with the proposal: the aforementioned separation is just logical and needed, and in my mind I see it as basically extra PRs as part of this very same enhancement.

@THS-on
Member

THS-on commented Jun 9, 2022

However, it is not clear to me that others will be as enthusiastic about it, and with good reason: it will make the testing of it far more complex.

We should test it in the CI if possible. I would disable artifact collection by default, but making your implementation an official part of Keylime is probably a good idea. In my team we also discussed keeping artifacts, and we are pretty open about which stack we use, so sharing that code is definitely beneficial. It would be interesting to hear what stack SUSE (@aplanas) and RedHat prefer.

for the second point, I also do not have a fundamental disagreement with the proposal: the aforementioned separation is just logical and needed, and in my mind I see it as basically extra PRs as part of this very same enhancement.

Yes, makes sense. Ideally this change is done before the artifact collection is implemented, so that the plugin API can be cleanly implemented.

@aplanas

aplanas commented Jun 10, 2022

Forgive me if I get the context wrong:

(@maugustosilva)

it will make the testing of it far more complex

(@THS-on)

It would be interesting to hear what stack SUSE

IMHO this is highly dependent on how the feature is implemented. Keylime + datastore (like Redis), as implied in the conversation, should ultimately be tested via an integration test, and this indeed makes the deployment more cumbersome. We try to use openQA for those tests, and this will imply making a full Keylime deployment. It can be done, and will be done if / when we support the data storage use case.

But most of our tests will reside in the %check section of the package. This place is more suitable for unit tests, and I think there is room for all the Keylime features to be tested in this way. For this one, a mock data storage can take note of the insertions and queries to, at least, evaluate the API. It is a very limited test, but still helpful by itself when we backport patches from master to an already released version of Keylime.

@lukehinds lukehinds requested review from lukehinds and lkatalin June 16, 2022 06:47
@lukehinds
Member

lukehinds commented Jun 16, 2022

big +1 from me. I am very supportive of this and myself and @lkatalin had been researching this with no knowledge that others had also had the same idea (which is a nice validation to see).

We also worked on the premise of using rekor, as it's an OSS project with a vendor-diverse community and stewardship under the OpenSSF. It's also such an incredibly good fit. It can accept customisable attestation manifests (json), will validate signatures (for non-repudiation) and has an inbuilt time stamping service. Also, with it being tamper resistant and auditable, it makes a really good candidate for offline attestations.

Not sure I see testing as highly problematic.

I would be happy to help with this.

Comment on lines +164 to +168
into a JSON file, sign it (using the private key generated as part of the
certificates for mTLS interaction with both `tenant` and `registrar`) and then
Member

@lukehinds lukehinds Jun 16, 2022

Would it not be better to use a TPM-resident key to sign this, which in turn can be chained back to the EK / TPM itself? I am not sure what value there is in having a non-TPM-resident private key sign attestations.

Is it that the private key provides non-repudiation on which registrar posted to the datastore?

Contributor Author

I strongly favor a change in the core keylime in order to use TPM resident keys instead. However, for the moment, I advocate we separate this change from the whole "offline attestation" enhancement, and simply revisit it, with the appropriate modifications if (when?) we decide on such fundamental change.

Member

I strongly favor a change in the core keylime in order to use TPM resident keys instead.

I think for the TLS connections it does not work, because of how the SSL library in Python works. Ideally we would reverse proxy the registrar and verifier through nginx, haproxy or similar for TLS instead of using the limited Python implementation.

For the used signing key I think it should be a separate one.

Contributor Author

This is the reason why I would like to have a separate discussion on which kinds of signing keys we should use on Keylime

Member

I am not following the logic here. Why even bother signing the attestations in the first place, especially with a file resident key. What guarantees does this get us, is this to somehow map the machine measured to the registrar / verifier who requested the attestation?

Comment on lines 166 to 171
make a record of it on a tamper-resistant transparency log (e.g., Rekor). In
addition to that, it will store the JSON file, the signature, and the public
key on the time-series persistent datastore. This should allow an external
Member

so if we were to use rekor, it includes a time-stamping service, this would also remove any risk of timing based attacks from using two separate stores.

Contributor Author

It is my understanding that the most recent rekor no longer includes a time-stamping authority API, but in my prototype I am interacting with freeTSA

Member

you're right, I stand corrected.

Comment on lines +235 to +242
- The first PR will provide the "persistent datastore" plugin capability, to be
called from with both `registrar` and `verifier` code. It will include a
Member

It would be good to better understand the plugin architecture. Is this something that canonicalises materials into json and makes some sort of POST to the datastore.

Contributor Author

At this point, I can experimentally demonstrate that I can simply "jsonify" everything from within the registrar/verifier code (basically the agent and json_response python dictionaries) and push it into a datastore.

Copy link

I agree with OP, IMHO there is more value of defining the API (open_datastore(), close_datastore(), is_valid(), insert_line(), etc) and the security guarantees.

The json serialization sure is OK, but it is the API (and the parameters required) the one that will make sure that all the data will be stored also in future versions. If not there is a risk that one of those internal state objects that we are storing change, and make the attestation impossible (or worse, fix the internal representation for ever).
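
To make the API point above more concrete, here is a minimal sketch of what such a driver interface could look like, together with a plain-text backend. The method names `open_datastore()`, `close_datastore()`, `is_valid()` and `insert_line()` come from the comment above; the class names, the `read_lines()` helper and the JSON-lines file layout are assumptions made for illustration and are not part of Keylime.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, Iterator, Optional, TextIO


class Datastore(ABC):
    """Hypothetical driver interface; only the method names come from the thread."""

    @abstractmethod
    def open_datastore(self) -> None: ...

    @abstractmethod
    def close_datastore(self) -> None: ...

    @abstractmethod
    def is_valid(self) -> bool: ...

    @abstractmethod
    def insert_line(self, record: Dict[str, Any]) -> None: ...


class PlainTextDatastore(Datastore):
    """Append-only JSON-lines file. Offers no tamper protection by itself."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self._fh: Optional[TextIO] = None

    def open_datastore(self) -> None:
        # Append mode, so existing records are never rewritten by this driver.
        self._fh = self.path.open("a", encoding="utf-8")

    def close_datastore(self) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None

    def is_valid(self) -> bool:
        # Structural check only: every non-empty line must be a JSON object.
        try:
            with self.path.open(encoding="utf-8") as fh:
                return all(
                    isinstance(json.loads(line), dict) for line in fh if line.strip()
                )
        except (OSError, json.JSONDecodeError):
            return False

    def insert_line(self, record: Dict[str, Any]) -> None:
        if self._fh is None:
            raise RuntimeError("datastore is not open")
        self._fh.write(json.dumps(record, sort_keys=True) + "\n")
        self._fh.flush()

    def read_lines(self) -> Iterator[Dict[str, Any]]:
        # Assumed helper for an offline attestation tool to replay records.
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)
```

A backend like this would mostly be useful for unit tests; the security guarantees discussed in this thread would still have to be provided by the driver or by the storage itself.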

Contributor Author

@aplanas very well, I will add a proto-API to the enhancement text now.

Contributor Author

@aplanas just did

Comment on lines 131 to 136
- Add functionality on the `verifier` to record (in a time-series persistent
datastore) all the information needed to perform attestation standlone (i.e.,
quotes and MB/IMA logs)
Member

By quotes would this be both the event log and PCRs? It would be nice to know what sort of information could be garnered by the elements we store for offline attestations.

It would be pretty cool if we were able to follow this course of action:

  1. acme firmware is exploited by some nasty trojan.
  2. acme firmware with exploit has the following digest
  3. search transparency log for instances of exploited firmware and have guarantees around machines that booted with the acme firmware.

Contributor Author

Currently, I am storing "everything" that is sent by the agent which does include quotes, measured boot log and IMA log. With very few modifications (currently the keylime_attest command-line uses the very same policy as defined in keylime.conf) I see no impediments to implement your scenario.

NOTE: at this point, the transparency log is written only by the registrar and its sole use is to record the "moment" when the EK and AIK were associated (i.e., when tpm2_makecredential was run by the registrar), so technically one just has to traverse the "time-series-like persistent datastore", with a new defined policy in hand, to check for the instances of the exploited firmware (which is covered by this new policy)

Copy link

If the goal is to assert that "system X was valid during this time period", shouldn't it be enough to store the validation result of the verifier for this machine?

I mean, the good thing about your proposal is that the attestation can be replicated, as we store all the data. I guess that this stores the quotes, the nonces and all that is required to re-create the same OK / BAD result. But this storage needs to be "secured", as was already mentioned. Meaning that we need to ensure that no data for any time window has been removed, that no data has been replicated multiple times, etc.

Those guarantees about data not being modified would make storing a triplet like "{date, machine UUID, status}" feature-wise similar.

Or I am missing something?

Contributor Author

@aplanas this is the fundamental aspect, and I believe it can be summarized in two main premises ( @mpeters @lukehinds @lkatalin @THS-on @galmasi among others, please speak up if you disagree):

P1 : If we are certain that a given AIK was generated by a given TPM (and thus to a given node), we are able to answer the key question "node N was attested at time T" as long as we have quotes (signed by the AIK tied to the TPM) and logs (i.e., the "attestation artifacts"). There is no intrinsic need for additional signing, nor encryption or even "security" for this long chain artifacts.

P2: We trust the registrar to "associate" the EK to AIK, and store a record of this association on a transparency log, thus ensuring the "certainty" required by P1.

Some artifacts might be missing, and one could be unable to answer the "key question" for a specific point in time, but for those time windows where the artifacts are recorded, we can answer with the same level of certainty that is currently provided by the "online" verifier.

Please do note that we don't even have to actually trust the verifier.

I do not see the need for a transparency log entry for every "artifact attestation package" collected by the verifier for every agent at every quote_interval, but I am willing to discuss it further.

Copy link

There is no intrinsic need for additional signing, nor encryption or even "security" for this long chain artifacts.

Uhm .. I am missing something.

If the stored timeline log is something like that:

Time UUID AIK Quote
==== ==== === =====
T1   ID1  K1  Q1
T2   ID1  K1  Q2
T3   ID1  K1  Q3

For the system ID1 we have 3 entries, we can use the AIK K1 to validate the quotes Q1, Q2, Q3 offline. Imagine that somehow Q2 and Q3 produces an invalid attestation.

Later someone change the stored logs to:

Time UUID AIK Quote
==== ==== === =====
T1   ID1  K1  Q1
T2   ID1  K1  Q1
T3   ID1  K1  Q1

Now all the times produce valid attestation.

To avoid that you are pointing to a "tamper-resistant transparency log (e.g., Rekor)". I do not know Rekor, but neither sqlite, plain text, nor redis is tamper-resistant, so the modification scenario can happen.

That is why I see that:

  1. We need to implement some anti-tamper mechanism at the driver level for some storage backends. One option is to sign the entries, producing a Merkle tree or something similar. Another is to sign the full storage once it is closed, and validate the signature on the open operation.
  2. If we already implement an anti-tamper mechanism (or one is available natively in the log storage, as seems to be the case for Rekor), maybe we do not need to store any quote information, and can store the result of the verifier directly, in a fashion like:
Time UUID Result
==== ==== ======
T1   ID1  OK
T2   ID1  OK
T3   ID1  OK

That is not offline attestation, but if I am not missing something it is equivalent: we are able to answer if a system was valid in a period of time. For that we need first to validate that the storage has not been tampered, and later check the content without reproducing the attestations.

Sorry if I am missing the point. Maybe the core point is that for a proper audit we must replicate the attestation by a 3rd party, but in that case we should not use any code from Keylime to perform the audit.
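
As an illustration of the driver-level anti-tamper idea above, the following is a minimal sketch of a hash chain over the stored entries (a simpler relative of the Merkle tree mentioned in the comment). Each row commits to the previous one, so replacing Q2 with Q1 after the fact breaks the chain. All names here (`append`, `verify`, the `chain` field) are hypothetical, and the sketch assumes the latest chain head is anchored somewhere the attacker cannot rewrite, for example a transparency log; without that anchor an attacker who controls the whole store could simply recompute the chain.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # chain value used before the first entry


def entry_hash(prev_hash: str, entry: Dict[str, Any]) -> str:
    """Hash of the previous chain value plus the canonical JSON of the entry."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(log: List[Dict[str, Any]], entry: Dict[str, Any]) -> str:
    """Append an entry and return the new chain head."""
    prev = log[-1]["chain"] if log else GENESIS
    head = entry_hash(prev, entry)
    log.append({"entry": entry, "chain": head})
    return head


def verify(log: List[Dict[str, Any]], expected_head: str) -> bool:
    """Recompute the chain and compare it with an externally anchored head."""
    prev = GENESIS
    for row in log:
        if row["chain"] != entry_hash(prev, row["entry"]):
            return False
        prev = row["chain"]
    return prev == expected_head


log: List[Dict[str, Any]] = []
for t, q in [("T1", "Q1"), ("T2", "Q2"), ("T3", "Q3")]:
    head = append(log, {"time": t, "uuid": "ID1", "aik": "K1", "quote": q})

print(verify(log, head))           # chain is intact
log[1]["entry"]["quote"] = "Q1"    # the tampering scenario from the tables above
print(verify(log, head))           # the modified row no longer matches its chain value
```

Truncating the log is caught the same way, because the recomputed head no longer equals the anchored one.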

Contributor Author

We need to define a clear threat model, so that we can discuss which parts we need to trust and what parts need to be secured.

I can think of three main scenarios.

  1. The registrar and verifier are always trustworthy. We only assume that an attacker can modify the agents, therefore no other tampering protection is needed.

  2. The registrar and verifier are generally trustworthy, but might get compromised.
    • Here we need to have append-only logs, so that old data cannot be removed.

  3. The registrar and verifier are generally untrusted.
    • All data needs to be verifiable after the fact.
    • Timestamps need to be enforced.

Thanks, the discussion only makes sense in light of some threat model. I believe the scheme proposed here would allow us to have "untrusted" verifiers, but - unless we change the keys used to sign quotes - we will have to trust the registrar.

There is also the question of to whom we need to prove this. If it is just ourselves then there is probably no need for a transparency log. If it is to a third party, a transparency log is helpful.

Right, "our" (IBM) use case is, a customer of our cloud hires an external auditor to check the state of the nodes.

Using 3. as the threat model is probably not possible, because I don't think we can find a way to ensure that the AK and EK belong to the same TPM. For make/activate credential we will always require some trust in the registrar. (One could work around it by provisioning the AK outside of Keylime, but only the rust agent has partial support for that.)

I don't see a way to use 3 given the current implementation of Keylime, I concur

I am not sure that Keylime is tracking this invariant [of the TPM time]

It is currently not tracking it. The TPM clock validation is part of #59. We could move it to a separate proposal.

I strongly favor a smaller PR where we extract the timestamp from the quote (I have a way of doing it with tpm2_print), add a new column in the DB for this, and actually use the delta between the "current" one (from the quote) and the one stored on the DB to decide to reject a quote (if such delta is negative). I can open such PR as part of this enhancement.

Copy link

Hmmmm... but in this case, the verifier already uses a nonce to avoid "replay" from a malicious agent, so this problem is already solved for the "online" attestation.

I was always talking about the offline attestation, and the modification of the stored log.

Contributor Author

Hmmmm... but in this case, the verifier already uses a nonce to avoid "replay" from a malicious agent, so this problem is already solved for the "online" attestation.

I was always talking about the offline attestation, and the modification of the stored log.

I see, right right, an attacker that compromises the "time-series datastore" (in my mind, it was always orthogonal to the threat model for Keylime). Be that as it may, do you agree that relying on the timestamp of the quotes should cover this gap?

@aplanas aplanas Jun 22, 2022

do you agree that relying on the timestamp of the quotes should cover this gap?

That was also commented on before. IMHO no, at least not directly.

In the trivial case:

add a new column in the DB for this, and actually use the delta between the "current" one (from the quote) and the one stored on the DB to decide to reject a quote (if such delta is negative)

In the first example, when Q1 is replicated in the log, the delta would be 0 and not negative, so the invariant of rejecting negative deltas does not catch this issue.

But what worries me more (and here I am speaking from inexperience, as I did not dig enough into the details) is that the section of the specification referred to before (10.11.1 TPMS_CLOCK_INFO) does not state that the clock is monotonically increasing, since it is cleared during the tpm2_clear operation:

This value is reset to zero when the Storage Primary Seed is changed (TPM2_Clear()).

So at least the condition seems to be that, for the same AIK (because I guess that after a tpm2_clear we will have a new one), the clock delta should be > 0. When the AIK is different, this delta should be < 0.

But the rule can be more complex, as the clock value is persisted once per ~1hr. This suggests that if there is a reset during the same hour, we will have valid quotes with valid clocks that do not keep the invariant of delta > 0 (the clock value will repeat values after the reset). So we need to add into the equation some "reset" counter, which I am not aware of.

Edit: this same struct has resetCount and restartCount, that seems that can help.

Contributor Author

My mistake, it has to be non-zero, evidently. Good point (tpm2_print will extract all three already). So, for a given AIK (changed whenever the agent restarts), within a given resetCount and restartCount, we can assume that clock will monotonically increase, and only accept quotes with a positive non-zero delta (from the prior).

This could be encoded on the "online" verifier code (this is my preference) or we can do it directly on the new keylime_attest CLI.

For reference, here is an example output of tpm2_print

magic: ff544347
type: 8018
qualifiedSigner: 000b3e403b6694301264d3865771f07416228cc51fe86009a189a8cd27c4d3879979
extraData: 6e33545a583939614a35744b4f36544364424c48
clockInfo
    clock: 685618775
    resetCount: 0
    restartCount: 0
    safe: 1
firmwareVersion: 3636160023101920
attested
    quote:
        pcrSelect
            count: 1
            pcrSelections:
                0:
                    hash: 11 (sha256)
                    sizeofSelect: 3
                    pcrSelect: 010001
            pcrDigest: 3725b2fe95da5d7674076e709dec0cb8641f406a863759afab6d0dad73bcdb19
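
The acceptance rule discussed above (same AIK, same `resetCount` and `restartCount`, strictly positive clock delta) could be sketched roughly as follows. This is only an illustration: it parses the `clockInfo` fields out of text shaped like the `tpm2_print` output shown above, and the function names are hypothetical. What to do when the counters differ between two quotes was not settled in this discussion; the sketch assumes the rule simply does not apply in that case and accepts the quote.

```python
import re
from typing import Dict, Optional


def parse_clock_info(text: str) -> Dict[str, int]:
    """Extract clock, resetCount, restartCount and safe from tpm2_print-style text."""
    fields: Dict[str, int] = {}
    for name in ("clock", "resetCount", "restartCount", "safe"):
        match = re.search(rf"^\s*{name}:\s*(\d+)\s*$", text, re.MULTILINE)
        if match is None:
            raise ValueError(f"missing field: {name}")
        fields[name] = int(match.group(1))
    return fields


def accept_quote(prev: Optional[Dict[str, int]], cur: Dict[str, int]) -> bool:
    """Apply the positive, non-zero clock delta rule for quotes from the same AIK."""
    if prev is None:
        return True  # first quote recorded for this AIK
    same_counters = (prev["resetCount"], prev["restartCount"]) == (
        cur["resetCount"],
        cur["restartCount"],
    )
    if not same_counters:
        # Assumption: the rule only applies within one (resetCount, restartCount)
        # pair, so a change in the counters is not judged here.
        return True
    return cur["clock"] > prev["clock"]


sample = """clockInfo
    clock: 685618775
    resetCount: 0
    restartCount: 0
    safe: 1
"""
first = parse_clock_info(sample)
replayed = dict(first)                          # same quote stored twice, delta == 0
later = dict(first, clock=first["clock"] + 2000)

print(accept_quote(None, first))
print(accept_quote(first, replayed))
print(accept_quote(first, later))
```

Because a replayed quote has a delta of exactly zero, requiring a strictly positive delta is what covers the replicated Q1 example raised earlier in the thread.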

@maugustosilva
Contributor Author

big +1 from me. I am very supportive of this and myself and @lkatalin had been researching this with no knowledge that others had also had the same idea (which is a nice validation to see).

We also worked on the premise of using rekor, as it's an OSS project with a vendor-diverse community and stewardship under the OpenSSF. It's also such an incredibly good fit. It can accept customisable attestation manifests (json), will validate signatures (for non-repudiation) and has an inbuilt time stamping service. Also, with it being tamper resistant and auditable, it makes a really good candidate for offline attestations.

Not sure I see testing as highly problematic.

I would be happy to help with this.

Good to know that we weren't the only ones envisioning the need and willing to work on this feature @lukehinds! My initial prototype does indeed use rekor as a transparency log to store the association between the EK and AIK. We can discuss further, or I can make it available for others to test (I am still refining before starting the PRs, as the implementation is functional but crude at the moment).

This should also include some proof that a given AIK created on a TPM by an
`agent` was indeed tied to a given EK, a process that is done by the
`registrar` and whose responsibility is to store it on a tamper-resistant
metadatastore (e.g. transparency log)
Copy link

Naive question: can we provide a simple implementation in something self-contained, like SQLite, or even better, plain text?

My point is that maybe this RFC should be about defining an API that Keylime expects a driver to implement, and providing some drivers for plain text, sqlite, redis or rekor. In this RFC maybe we need to describe the security guarantees, for example that the open operation should validate that no change has been made to the storage.

Sometimes those guarantees should be delegated to the database itself, but others should be implemented in the driver (like signing each entry line, or signing the full content and the metadata of the file during the close ...)

Contributor Author

I believe having a "plain text" implementation is a most welcome suggestion, and should go a long way in making it more testable. I will include that in the PR.

dictionary) from the `agent` - which will include both quotes and logs (MB
and IMA) - `agent` data (python dictionary) from the SQL database (internal to
Keylime) and the `agentAttestState` python object, combine it into a single
record and store it on the time-series persistent datastore.
Copy link

IIUC this proposal will store marshaled Python objects? IMHO we should make the effort (if we indeed decide to store this data) to serialize in a more agnostic format (json, yaml). So a 3rd party software can still read it and replicate the attestation.

Contributor Author

I will update it :-) I've studied the code further and simple json is enough, no need to marshal/pickle anymore.
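
As a rough illustration of the "simple json" approach, the sketch below combines the three inputs named in the quoted proposal text into one JSON record. The function name, the field names and the assumption that the `agentAttestState` object has already been reduced to a plain dictionary are all hypothetical; the real Keylime structures may contain values (such as bytes) that need explicit conversion before `json.dumps` will accept them.

```python
import json
import time
from typing import Any, Dict, Optional


def build_record(
    agent: Dict[str, Any],
    json_response: Dict[str, Any],
    attest_state: Dict[str, Any],
    timestamp: Optional[int] = None,
) -> str:
    """Serialize one attestation round as a single, language-agnostic JSON line."""
    record = {
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "agent": agent,                  # agent data from the SQL database
        "json_response": json_response,  # quotes and MB/IMA logs sent by the agent
        "attest_state": attest_state,    # assumed dict form of agentAttestState
    }
    # Sorted keys and fixed separators give a stable byte representation, which
    # matters if the record is later hashed or signed.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


line = build_record(
    {"agent_id": "ID1"},
    {"quote": "Q1", "ima_measurement_list": ""},
    {"next_ima_line": 0},
    timestamp=1655800000,
)
print(line)
```

A third-party tool written in any language can then read the record back with a standard JSON parser.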

Marcio Silva added 2 commits June 21, 2022 17:14
Signed-off-by: Marcio Silva <marcio.a.silva@ibm.com>
Signed-off-by: Marcio Silva <marcio.a.silva@ibm.com>
@lkatalin lkatalin left a comment

Thanks for the write-up @maugustosilva and sorry I wasn't able to comment before now. @lukehinds is right, there is a high degree of intersection here with our team's work. For context, I have been thinking of this from the Rekor side, asking whether it makes sense to have a new type in Rekor to store information about Keylime attestations.

I have some questions about the enhancement and to check that I'm understanding:

  1. As this offline attestation may occur after the nodes in question no longer exist, is the idea that the external auditor would want to find all attestations related to a specific TPM/EK? (How do they decide which EKs are relevant to audit? Just curious about this.)

  2. I'm trying to understand what is pluggable vs. not vs. out of scope for the new API and CLI, and which components are talking to which others. Is this an accurate overview?

user submits EK <==> new `keylime_attest` CLI <==> pluggable API supplied in config by user as Python modules <==> log, TSDB backends

So the CLI is user-facing and the logic for the API between Keylime and the log/TSDB backends are user-supplied?

  3. There is a clear separation of duties in this enhancement between the tamper-evident log for AIK/EK association and the time-series database for everything else. I have instead been thinking of storing all data/metadata in a log (specifically Rekor), and based on the discussion comments I also see a need for having all data in a tamper-evident, append-only log for at least one threat model. Is there any reason that all the data in question here (e.g. quote, boot logs, IMA logs, etc.) couldn't go into a transparency log along with the AIK/EK mapping if the user wanted? Admittedly I don't know much at all about TSDBs, but I believe Rekor could support having a type or types that store all of this data and are queryable on something like the EK. This means the tamper-evident, immutable, etc. guarantees of Rekor would apply to all the data. For example, the data mentioned to check the AIK/EK association (with the signature and pubkey used by the registrar) is going into a TSDB in this enhancement, but this sounds like exactly the type of data Rekor stores. Could / should the data just go into Rekor, or at least should we allow for this option?

  4. I am trying to get a good overview of the totality of the data that should be stored in the log and/or TSDB (this is the data that would be stored in the potential new Rekor "Keylime type" if it makes sense to create this type). Is this an accurate list?

  • Association between AIK/EK
  • The association from above plus the registrar's signature on it and the pubkey used to sign (such that signature can be checked)
  • Quote, measured boot logs, IMA logs, and nonce for an offline attestation
  • Time info related to the Quote (ideally embedded in the Quote?) along with resetCount and restartCount values as discussed in comments
  • Anything else?

Comment on lines 223 to 224
record generated by the `registrar` to indicate the association between EK
and AIK is enough. Once this is done, offline attestation has basically the


Suggested change
- record generated by the `registrar` to indicate the association between EK
- and AIK is enough. Once this is done, offline attestation has basically the
+ record generated by the `registrar` is enough to prove the association between EK
+ and AIK. Once this is done, offline attestation has basically the

Because I found myself asking: "Enough to do what?" with the original wording.

Provide Keylime with the ability to store all the required information to
perform a full attestation, in a persistent external time-series datastore.
This should also include some proof that a given AIK created on a TPM by an
`agent` was indeed tied to a given EK (from the TPM locate at the node where

Suggested change
- `agent` was indeed tied to a given EK (from the TPM locate at the node where
+ `agent` was indeed tied to a given EK (from the TPM located at the node where

This should also include some proof that a given AIK created on a TPM by an
`agent` was indeed tied to a given EK (from the TPM locate at the node where
it was running). This AIK/EK association process is done by the `registrar`
it will its responsibility to store a record of such association on a tamper-resistant

Suggested change
- it will its responsibility to store a record of such association on a tamper-resistant
+ it will be its responsibility to store a record of such association on a tamper-resistant


The main motivation for adding this functionality is to give auditors and other
compliance officers the ability to answer, with a proper degree of certainty
and trust the following question: did node N had its software stack fully

Suggested change
- and trust the following question: did node N had its software stack fully
+ and trust the following question: did node N have its software stack fully

The main motivation for adding this functionality is to give auditors and other
compliance officers the ability to answer, with a proper degree of certainty
and trust the following question: did node N had its software stack fully
attested at date D? Being date "D" a point time that could be well in the

Suggested change
- attested at date D? Being date "D" a point time that could be well in the
+ attested at date D? Being date "D" a point in time that could be well in the

every time an `agent` is restarted on Keylime.
- The `verifier` will be modified to take the `json_response` (python
dictionary) from the `agent` - which will include both quotes and logs (MB
and IMA) - `agent` data (python dictionary) from the SQL database (internal to

Is agent data just the agent's UUID or is there other info to be included here?

start with the high-level sections and fill out details incrementally in
subsequent PRs.
-->
# enhancement-#40: TPM 2.0 Pre-Boot Event log support

Update title?

@maugustosilva
Contributor Author

@lkatalin sorry for the delay in answering; I have been unwell for the past few days. I will try to address all points:

  1. The offline attestation could occur years after a node has been decommissioned. It is my argument here that the public EK, the EK certificate (signed by the manufacturer) and a record of the linking between the EK and AIK (which was created by the no longer existing registrar) are enough to proceed with the attestation, in a way that is no less trustworthy than the "online" Keylime.

  2. More like this:

                                                     EK, EKCert, AIK
T1: new `keylime_attest` CLI <============= Datastore/Transparency Log

                                                    "Attestation Artifacts"
T2: new `keylime_attest` CLI <================== Datastore

T3: new `keylime_attest` CLI ==================> PASS/FAIL
  1. Very good question. I have been thinking about it myself, especially after my interactions with @aplanas. I am not sure if the transparency log is the right place for this massive amount of data (just imagine 10K nodes, each with its TPM2 quotes, measured boot logs and IMA logs being written on the transparency log every 10 seconds), but I am amending my proposal to make the storing of all the "attestation artifacts" on the transparency log optional. The crux of the matter is: **I have no idea if Rekor can operationally support the amount of data produced by Keylime.**

  2. You got it, this is exactly what has been stored in my prototype implementation. The timing info comes straight from the quotes. I plan to make the TPM clock info on the quotes accessible by the "online" verifier too.

@lukehinds
Member

Very good question. I have been thinking about it myself, especially after my interactions with @aplanas. I am not sure if the transparency log is the right place for this massive amount of data (just imagine 10K nodes, each with its TPM2 quotes, measured boot logs and IMA logs being written on the transparency log every 10 seconds), but I am amending my proposal to make the storing of all the "attestation artifacts" on the transparency log optional. The crux of the matter is: **I have no idea if Rekor can operationally support the amount of data produced by Keylime.**

Could you outline what 'optional' means and how this would be presented, e.g. as a bool config value?

I agree that pushing data to a tlog every time a quote is polled would be a huge volume. I instead imagine there could be a less frequent push than the standard quote frequency Keylime has (every 10 minutes instead of every 0.5 seconds), or perhaps an initial bundle is pushed to show the state at first boot.

One factor to consider is that Rekor (since you mention that solution) supports sharding, so the tlog won't perpetually grow and slow down over time. @lkatalin developed that feature and so knows it well.
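
To make the "optional, less frequent push" idea concrete, here is a minimal sketch of a gate placed in front of a transparency-log writer. Everything in it is an assumption for illustration: `TlogPushGate`, `enabled` and `min_interval_s` are hypothetical names, not existing Keylime options, and the clock is injected only so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable, Optional


class TlogPushGate:
    """Decide whether an attestation bundle should go to the transparency log.

    Hypothetical helper: `enabled` stands in for a boolean config value and
    `min_interval_s` for a push period longer than the quote polling period.
    """

    def __init__(self, enabled: bool, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.enabled = enabled
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last_push: Optional[float] = None

    def should_push(self) -> bool:
        if not self.enabled:
            return False
        now = self._clock()
        # The first bundle (e.g. the state at first boot) is always pushed.
        if self._last_push is None or now - self._last_push >= self.min_interval_s:
            self._last_push = now
            return True
        return False
```

With `min_interval_s=600`, a verifier polling every half second would still hand only one bundle per ten minutes to the log writer; with `enabled=False` nothing is pushed at all.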

Signed-off-by: Marcio Silva <marcio.a.silva@ibm.com>
@maugustosilva
Contributor Author

@lukehinds @aplanas @lkatalin @THS-on Updated and refined the document a little bit more, hopefully addressing all the concerns, questions and comments thus far.

The goal of this enhancement is to provide Keylime with the ability to store
all the required information to perform a full attestation, in a persistent
external time-series like datastore.
This should also include some proof that a given AIK created on a TPM by an

nit: AIK is a pre-2.0 key AFAIK; the name since 2.0 is AK. It also seems that an AIK can sign only a subset of the objects that the new AK can sign. (Feel free to ignore this comment, as it is very clear what the document is referring to.)

Member

Yeah we should make this consistent with the TPM 2.0 spec.

Contributor Author

Certainly, I shall fix it in the text.

-->

- The `registrar` will be modified to, upon initial `agent` registration -
which includes the execution of `tpm2_makecredential` - record the EK, AIK

I may be wrong here (please forgive me if I am writing nonsense), but if we store the AK name, the public EK and the secret, the auditor still cannot validate the relation between the EK and the AK, as that would still require a connection to the node to recover the secret. The issue is that by that time it is quite possible that the AK is no longer valid for this agent.

Member

You are correct. The (only) way you could verify that the TPM2_ActivateCredential was not fraudulent is by attesting the AK again. No audit trail of the operations is useful when the agent and registrar are compromised.

Requirements: the TPM, the private and public portions of the AK, the EK_pub, and the secret.
Steps:

  1. Generate MakeCredential blob EK_pub(secret, AK_name)
  2. Load the AK again on the TPM
  3. Let the TPM decrypt the MakeCredential blob to retrieve the secret
  4. Check if the secret is correct.

Storing the original secret and the MakeCredential blob is fine, but is this reproducible, or does this structure contain some nonces?

Contributor Author

This is a crucial aspect of the whole endeavor, and I thank you both for giving me the opportunity to clarify. No, it is not possible to re-execute `tpm2_makecredential`. The main point here (which, I now see, was insufficiently clear) is: "everybody" involved will trust the registrar enough to consider a JSON file containing the AK, EK and EK cert, signed by this component, to be proof of the association between AK and EK. While I freely grant that this might not be acceptable in all cases, it certainly is for our current Production deployment (which provides a "proof of existence" of its eventual sufficiency).

dictionary) from the `agent` - which will include both TPM quotes and logs (MB
and IMA) - plus `agent` data (python dictionary which includes all columns stored)
from the SQL database, internal to Keylime), combine it into a single (python
dictionary record) and store it on a time-series datastore.

I still do not like this: storing a Python dictionary directly. I guess it will be serialized to something like JSON in the end (if so, that should be explicit).

But the main problem I see is that we are storing an internal state of Keylime. This internal state can change in the future, so the next Keylime version may store something a little different, and over many releases we will end up with different formats.

IMHO only the fields that are useful for auditing should be listed, without relating them to any internal state (in this case the json_response).

Contributor Author

The "internal state" of Keylime is needed only in case an external third-party auditor wants to use the verifier code unmodified to perform offline attestation. Again, there is a real use case for this with our Production customers, but the fact that we are storing more, not less, data should not prevent other parties from writing offline attestation code from scratch, simply ignoring any "Keylime-specific state".
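
For reference, the reviewer's alternative (an explicit, versioned list of audit fields serialized as JSON) could look roughly like the sketch below. The field names in `AUDIT_FIELDS`, the `schema_version` key and the function name are all hypothetical; the proposal does not fix this list, and the sketch only shows the shape of the idea.

```python
import json
from typing import Any, Dict

# Hypothetical field names; the proposal does not define this list.
AUDIT_FIELDS = ("agent_id", "quote", "nonce", "mb_log", "ima_log")
SCHEMA_VERSION = 1


def build_audit_record(json_response: Dict[str, Any],
                       agent_data: Dict[str, Any]) -> str:
    """Serialize only the audit-relevant fields as stable JSON text."""
    merged = {**agent_data, **json_response}
    missing = [f for f in AUDIT_FIELDS if f not in merged]
    if missing:
        raise KeyError(f"missing audit fields: {missing}")
    record = {f: merged[f] for f in AUDIT_FIELDS}
    # An explicit version lets audit code detect format changes across releases.
    record["schema_version"] = SCHEMA_VERSION
    # sort_keys plus compact separators give a stable byte representation.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))
```

Anything not named in `AUDIT_FIELDS` (for example extra database columns) is dropped, so a change in Keylime's internal state would not silently change the stored format. Storing the full internal state in addition, as the author describes, is not excluded by this.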

stores) and the URLs for these these new proposed stores will be
supplied by the user as parameters under the `[general]` section:
`durable_attestation_import`, `persistent_store_url` and
`transparency_log_url`. The URL format is similar to the one already

What is the difference between a persistent store and a transparency log? Somehow in my brain both were used to store the attestation data.

Contributor Author

Ah, very good question. I will leave it to others to refine and present formal definitions according to well-founded terminology and epistemology :-) but for now I will summarize it as: a transparency log stores records of signatures based on the contents of an object, while a persistent store holds the contents of the actual object.
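
The distinction can be illustrated with a toy model. This is only a sketch under loud assumptions: both classes are hypothetical in-memory stand-ins (not Redis, not Rekor, not the plugin API), and a plain SHA-256 digest stands in for the signature that a real transparency log entry would carry.

```python
import hashlib
from typing import Dict, List


class PersistentStore:
    """Toy content store: keeps the full object, keyed by an identifier."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, content: bytes) -> None:
        self._objects[key] = content

    def get(self, key: str) -> bytes:
        return self._objects[key]


class TransparencyLog:
    """Toy append-only log: keeps only digests, each chained to the previous entry."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def append(self, key: str, content: bytes) -> None:
        prev = self.entries[-1]["entry_hash"] if self.entries else ""
        digest = hashlib.sha256(content).hexdigest()
        entry_hash = hashlib.sha256((prev + key + digest).encode()).hexdigest()
        self.entries.append({"key": key, "digest": digest, "entry_hash": entry_hash})

    def matches(self, key: str, content: bytes) -> bool:
        """True if some entry for `key` records the digest of `content`."""
        digest = hashlib.sha256(content).hexdigest()
        return any(e["key"] == key and e["digest"] == digest for e in self.entries)

    def chain_is_intact(self) -> bool:
        """Recompute every entry hash to detect edits to earlier entries."""
        prev = ""
        for e in self.entries:
            expected = hashlib.sha256((prev + e["key"] + e["digest"]).encode()).hexdigest()
            if e["entry_hash"] != expected:
                return False
            prev = e["entry_hash"]
        return True
```

An auditor would fetch the object from the store and then ask the log whether it ever recorded that exact content; the log never needs to hold the (potentially large) object itself.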

signing also stored on tamper-resistant transparency log. The value of
this attribute - `signed_attributes` - could be set to `all`, which will
result in the `verifier` signing **everything**, including the
`json_response` data package from the `agent`, and storing a record on the

IMHO we should not make that optional. There is only a fixed set of fields that are useful for offline attestation. We should store that and no more.

The risks are: we can leak data, we can store different things at different times (potentially breaking the audit code), and we risk storing less data than is required.

Contributor Author

Signing the contents of every single piece of data required for attestation would imply trust in the verifier, something that we are trying to avoid. Making it optional does not prevent a deployment in a scenario where trust in the verifier is acceptable, while still accommodating "our" (IBM Cloud) production scenario.

start with the high-level sections and fill out details incrementally in
subsequent PRs.
-->
# enhancement-#73: Durable (Offline) Attestation suppor in Keylime

Member

Suggested change
- # enhancement-#73: Durable (Offline) Attestation suppor in Keylime
+ # enhancement-#73: Durable (Offline) Attestation support in Keylime

- Three new parameters: the name of a python module to be dynamically
imported (which will contain code used to interact with these new proposed
stores) and the URLs for these these new proposed stores will be
supplied by the user as parameters under the `[general]` section:

Member

Those options should be in the respective registrar and verifier sections. We are trying to remove the general section.

persistent_store_url and transparency_log_url should be in a section defined by the plugin and not in general.

Contributor Author

Fair enough, will modify both the code and the description.
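
A possible reading of the requested layout, sketched with the standard `configparser` module. The option names `durable_attestation_import`, `persistent_store_url` and `transparency_log_url` come from the proposal text; the plugin section name `my_plugin_module`, the URLs and the helper function are hypothetical and only illustrate moving the options out of `[general]`.

```python
import configparser

# Hypothetical layout: section name of the plugin and the URLs are illustrative only.
EXAMPLE_CONF = """
[verifier]
durable_attestation_import = my_plugin_module

[registrar]
durable_attestation_import = my_plugin_module

[my_plugin_module]
persistent_store_url = redis://localhost:6379
transparency_log_url = http://localhost:3000
"""


def load_durable_attestation_settings(text: str, component: str) -> dict:
    """Return the plugin name and its store URLs for one component."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    plugin = cfg.get(component, "durable_attestation_import", fallback="")
    if not plugin:
        return {}  # feature not enabled for this component
    return {
        "plugin": plugin,
        # The URLs live in a section named after the plugin, not in [general].
        "persistent_store_url": cfg.get(plugin, "persistent_store_url"),
        "transparency_log_url": cfg.get(plugin, "transparency_log_url"),
    }
```

Calling `load_durable_attestation_settings(EXAMPLE_CONF, "verifier")` returns the plugin name together with both URLs, while a component without `durable_attestation_import` yields an empty dict.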

Signed-off-by: Marcio Silva <marcio.a.silva@ibm.com>
@THS-on THS-on mentioned this pull request Jul 6, 2022
6 tasks
@mpeters mpeters merged commit 7a4e4b4 into keylime:master Jul 21, 2022
6 participants