Commit: Update README and documentation

Signed-off-by: Mihai Maruseac <mihaimaruseac@google.com>
mihaimaruseac committed Oct 24, 2023
1 parent 857e2af commit 79bd3d3
Showing 4 changed files with 146 additions and 16 deletions.
32 changes: 30 additions & 2 deletions CONTRIBUTING.md
@@ -1,5 +1,33 @@
# Contributing

Want to contribute? Great! First, read this page (including the small print at
the end).

### Before you contribute

Before we can use your code, you must sign the [Google Individual Contributor
License Agreement](https://cla.developers.google.com/about/google-individual)
(CLA), which you can do online. The CLA is necessary mainly because you own the
copyright to your changes, even after your contribution becomes part of our
codebase, so we need your permission to use and distribute your code. We also
need to be sure of various other things: for instance that you'll tell us if you
know that your code infringes on other people's patents. You don't have to sign
the CLA until after you've submitted your code for review and a member has
approved it, but you must do it before we can put your code into our codebase.

Before you start working on a larger contribution, you should get in touch with
us first through the issue tracker with your idea so that we can help out and
possibly guide you. Coordinating up front makes it much easier to avoid
frustration later on.

### Code reviews

All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose.

### The small print

Contributions made by corporations are covered by a different agreement than the
one above, the [Software Grant and Corporate Contributor License
Agreement](https://cla.developers.google.com/about/google-corporate).

If you have a question or a feature request, please open an issue on the repository.
75 changes: 62 additions & 13 deletions README.md
@@ -7,40 +7,89 @@
<!-- toc -->

- [Overview](#overview)
- [Status](#status)
- [Projects](#projects)
- [Model Signing](#model-signing)
- [SLSA for ML](#slsa-for-ml)
- [Status](#status)
- [Contributing](#contributing)

<!-- tocstop -->

## Overview

The number of ML-powered applications is growing rapidly. However, this also
gives attackers grounds to exploit unsuspecting ML users. This is why Google
launched the [Secure AI Framework (SAIF)][saif] to help chart a path towards
creating trustworthy AI applications. The first principle of SAIF is

> Expand strong security foundations to the AI ecosystem

Building on the work with the [Open Source Security Foundation][openssf], we are
creating this repository to show how the ML supply chain can be strengthened in
_the same way_ as the traditional software supply chain.

This repository hosts a collection of utilities and examples related to the
security of machine learning pipelines. The focus is on providing *verifiable*
claims about the integrity and provenance of the resulting models, meaning users
can check for themselves that these claims are true rather than having to just
trust the model trainer.

## Projects

Currently, there are two main projects in the repository: model signing (to
prevent tampering with models after publication to ML model hubs) and
[SLSA](https://slsa.dev/) (to prevent tampering with models during the build
process).

### Model Signing

This project demonstrates how to protect the integrity of a model by signing it
with [Sigstore](https://www.sigstore.dev/), a tool for making code signatures
transparent without requiring key maintenance.

When users download a given version of a signed model they can check that the
signature comes from a known or trusted identity and thus that the model hasn't
been tampered with after training.
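
As an illustration, the following is a minimal sketch of this flow using the
`sigstore` Python client's command line (the model file name and the expected
identity are hypothetical; the repository's own tooling wraps equivalent
steps):

```python
# Minimal sketch: sign a serialized model and verify the signer's identity
# with the sigstore Python client. Paths and identity values are placeholders.
import subprocess

MODEL = "model.bin"  # hypothetical serialized model file

# Sign: runs an OIDC flow and writes signature material next to the file.
subprocess.run(["python", "-m", "sigstore", "sign", MODEL], check=True)

# Verify: checks the signature, the certificate identity, and the proof of
# inclusion in the transparency log.
subprocess.run(
    [
        "python", "-m", "sigstore", "verify", "identity", MODEL,
        "--cert-identity", "user@example.com",  # expected signer identity
        "--cert-oidc-issuer", "https://accounts.google.com",  # expected issuer
    ],
    check=True,
)
```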

We are able to sign large models in seconds, as the following table shows:

| Model | Size | Sign Time | Verify Time |
|--------------------|-------|:----------:|:-----------:|
| roberta-base-11 | 8K | 1s | 0.6s |
| hustvl/YOLOP | 215M | 1s | 1s |
| bertseq2seq | 2.8G | 1.9s | 1.4s |
| bert-base-uncased | 3.3G | 1.6s | 1.1s |
| tiiuae/falcon-7b   | 14G   | 2.1s       | 1.8s        |

See [model_signing/README.md](model_signing/README.md) for more information.

### SLSA for ML

To protect the supply chain of traditional software against tampering (as in
the [SolarWinds attack][solarwinds]), we can generate SLSA provenance, for
example by using the [SLSA L3 GitHub generator][slsa-generator].

This project shows how we can use the same generator when training models via
GitHub Actions. While most ML models are too expensive to train in such a
fashion, this is a proof of concept showing that _the same traditional software
supply chain protections can be applied to ML_. Future work will cover training
ML models that require access to accelerators (e.g., GPUs, TPUs) or that
require multiple hours of training.

See [slsa_for_models/README.md](slsa_for_models/README.md) for more information.

## Status

This is not an officially supported Google product. The project is currently in
alpha and we may make breaking changes until the first official release. All
code should be viewed as experimental and should not be used in any production
environment.

## Contributing

Please see the [Contributor Guide](CONTRIBUTING.md) for more information.

[saif]: https://blog.google/technology/safety-security/introducing-googles-secure-ai-framework/
[openssf]: https://openssf.org/
[slsa-generator]: https://github.com/slsa-framework/slsa-github-generator
[solarwinds]: https://www.techtarget.com/whatis/feature/SolarWinds-hack-explained-Everything-you-need-to-know
18 changes: 17 additions & 1 deletion model_signing/README.md
@@ -1,6 +1,22 @@
# Model Signing

This project demonstrates how to protect the integrity of a model by signing it
with [Sigstore](https://www.sigstore.dev/), a tool for making code signatures
transparent without requiring key maintenance.

When users download a given version of a signed model they can check that the
signature comes from a known or trusted identity and thus that the model hasn't
been tampered with after training.

Signing events are recorded to Sigstore's append-only transparency log.
Transparency logs make signing events discoverable, so that model signers can
monitor the logs and determine if any signing events are unexpected. During
verification, model verifiers will verify a proof of inclusion from the log,
which is handled by the model signing library.

Model signers should monitor for occurrences of their signing identity in the
log. Sigstore is actively developing a [log
monitor](https://github.com/sigstore/rekor-monitor) that runs on GitHub Actions.
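
Until that monitor is broadly available, a signer could periodically search the
public Rekor log for their identity themselves. The sketch below uses the
`rekor-cli` tool; the identity and the handling of "expected" entries are
illustrative assumptions:

```python
# Illustrative sketch: list Rekor transparency log entries that mention a
# signing identity, using rekor-cli. The identity is a placeholder, and any
# alerting logic is left to the signer.
import subprocess

IDENTITY = "user@example.com"  # hypothetical signing identity

result = subprocess.run(
    ["rekor-cli", "search", "--email", IDENTITY],
    capture_output=True, text=True, check=True,
)
# rekor-cli prints a short header followed by one entry UUID per line.
uuids = [line for line in result.stdout.splitlines() if line and " " not in line]
print(f"Found {len(uuids)} log entries for {IDENTITY}")
# Any entry the signer does not recognize warrants investigation.
```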

## Installation and usage

37 changes: 37 additions & 0 deletions slsa_for_models/README.md
@@ -0,0 +1,37 @@
# SLSA for Models

To protect the supply chain of traditional software against tampering (as in
the [SolarWinds attack][solarwinds]), we can generate [SLSA][slsa] provenance,
for example by using the [SLSA L3 GitHub generator][slsa-generator].

This project shows how we can use the same generator when training models via
GitHub Actions. While most ML models are too expensive to train in such a
fashion, this is a proof of concept showing that _the same traditional software
supply chain protections can be applied to ML_. Future work will cover training
ML models that require access to accelerators (e.g., GPUs, TPUs) or that
require multiple hours of training.

When users download a given version of a model they can also check its
provenance using [the SLSA verifier][slsa-verifier]. This can be done
automatically: for example, a model serving pipeline could validate the
provenance of every new model before serving it, as sketched below. The
verification can also be done manually, on demand.
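
A minimal sketch of that automated check, assuming hypothetical file names and
source repository, could look like:

```python
# Sketch: verify a model's SLSA provenance with slsa-verifier before serving
# it. The artifact, provenance file, and source repository are placeholders.
import subprocess

MODEL = "model.tar.gz"             # artifact downloaded with the model release
PROVENANCE = "model.intoto.jsonl"  # provenance published alongside it
SOURCE = "github.com/example-org/example-model"  # expected source repository

subprocess.run(
    [
        "slsa-verifier", "verify-artifact", MODEL,
        "--provenance-path", PROVENANCE,
        "--source-uri", SOURCE,
    ],
    check=True,  # a non-zero exit means verification failed: do not serve
)
```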

As an additional benefit, having provenance for a model allows users to react
to vulnerabilities in a training framework: they can quickly determine whether a
model needs to be retrained because it was created using the vulnerable version.
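
As a sketch of how this could work: SLSA v0.2-style provenance is a DSSE
envelope whose payload lists the materials that went into the build. Assuming
the training workflow records the framework among the materials (the generator
records at least the source repository; richer materials are an assumption
here), a check could look like:

```python
# Sketch: scan a model's provenance for a known-vulnerable dependency.
# Assumes SLSA v0.2-style provenance in a DSSE envelope, one JSON object per
# line in a .intoto.jsonl file. All names and versions are placeholders.
import base64
import json

VULNERABLE = "pkg:pypi/tensorflow@2.12.0"  # hypothetical vulnerable version

with open("model.intoto.jsonl") as f:
    envelope = json.loads(f.readline())

# The in-toto statement is base64-encoded inside the envelope payload.
statement = json.loads(base64.b64decode(envelope["payload"]))
materials = statement.get("predicate", {}).get("materials", [])

if any(VULNERABLE in material.get("uri", "") for material in materials):
    print("Model was built with the vulnerable version: retrain it.")
else:
    print("Provenance does not list the vulnerable version.")
```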

## Usage

TODO: Show how to run the action in the repo, show an example with images of
how to trigger the workflow, and show how to run the verifier manually

## Benchmarking

TODO: Table discussing the performance of generating provenance for models in
various formats, based on running the GitHub Actions

[slsa-generator]: https://github.com/slsa-framework/slsa-github-generator
[solarwinds]: https://www.techtarget.com/whatis/feature/SolarWinds-hack-explained-Everything-you-need-to-know
[slsa]: https://slsa.dev
[slsa-verifier]: https://github.com/slsa-framework/slsa-verifier/
