Commit: Update README and documentation

Signed-off-by: Mihai Maruseac <mihaimaruseac@google.com>
mihaimaruseac committed Oct 24, 2023
1 parent 857e2af commit 79bd3d3
Showing 4 changed files with 146 additions and 16 deletions.
32 changes: 30 additions & 2 deletions CONTRIBUTING.md
@@ -1,5 +1,33 @@
# Contributing

Want to contribute? Great! First, read this page (including the small print at
the end).

### Before you contribute

Before we can use your code, you must sign the [Google Individual Contributor
License Agreement](https://cla.developers.google.com/about/google-individual)
(CLA), which you can do online. The CLA is necessary mainly because you own the
copyright to your changes, even after your contribution becomes part of our
codebase, so we need your permission to use and distribute your code. We also
need to be sure of various other things: for instance that you'll tell us if you
know that your code infringes on other people's patents. You don't have to sign
the CLA until after you've submitted your code for review and a member has
approved it, but you must do it before we can put your code into our codebase.

Before you start working on a larger contribution, you should get in touch with
us first through the issue tracker with your idea so that we can help out and
possibly guide you. Coordinating up front makes it much easier to avoid
frustration later on.

### Code reviews

All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose.

### The small print

Contributions made by corporations are covered by a different agreement than the
one above, the [Software Grant and Corporate Contributor License
Agreement](https://cla.developers.google.com/about/google-corporate).

If you have a question or a feature request, please open an issue on the repository.
75 changes: 62 additions & 13 deletions README.md
@@ -7,40 +7,89 @@
<!-- toc -->

- [Overview](#overview)
- [Status](#status)
- [Projects](#projects)
- [Model Signing](#model-signing)
- [SLSA for ML](#slsa-for-ml)
- [Status](#status)
- [Contributing](#contributing)

<!-- tocstop -->

## Overview

The number of ML-powered applications is growing rapidly. However, this also
gives attackers grounds to exploit unsuspecting ML users. This is why Google
launched the [Secure AI Framework (SAIF)][saif] to help chart a path towards
creating trustworthy AI applications. The first principle of SAIF is

> Expand strong security foundations to the AI ecosystem

Building on the work with the [Open Source Security Foundation][openssf], we are
creating this repository to show how the ML supply chain can be strengthened in
_the same way_ as the traditional software supply chain.

This repository hosts a collection of utilities and examples related to the
security of machine learning pipelines. The focus is on providing *verifiable*
claims about the integrity and provenance of the resulting models, meaning users
can check for themselves that these claims are true rather than having to just
trust the model trainer.

## Projects

Currently, there are two main projects in the repository: model signing (to
prevent tampering with models after publication to ML model hubs) and
[SLSA](https://slsa.dev/) (to prevent tampering with models during the build
process).

### Model Signing

This project demonstrates how to protect the integrity of a model by signing it
with [Sigstore](https://www.sigstore.dev/), a tool for making code signatures
transparent without requiring key maintenance.

When users download a given version of a signed model they can check that the
signature comes from a known or trusted identity and thus that the model hasn't
been tampered with after training.
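
As an illustration, the following is a minimal sketch of this flow using the
`sigstore` Python client's command line (the model file name and the expected
identity are hypothetical; the repository's own tooling wraps equivalent
steps):

```python
# Minimal sketch: sign a serialized model and verify the signer's identity
# with the sigstore Python client. Paths and identity values are placeholders.
import subprocess

MODEL = "model.bin"  # hypothetical serialized model file

# Sign: runs an OIDC flow and writes signature material next to the file.
subprocess.run(["python", "-m", "sigstore", "sign", MODEL], check=True)

# Verify: checks the signature, the certificate identity, and the proof of
# inclusion in the transparency log.
subprocess.run(
    [
        "python", "-m", "sigstore", "verify", "identity", MODEL,
        "--cert-identity", "user@example.com",  # expected signer identity
        "--cert-oidc-issuer", "https://accounts.google.com",  # expected issuer
    ],
    check=True,
)
```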

We are able to sign large models in seconds, as the following table shows:

| Model | Size | Sign Time | Verify Time |
|--------------------|-------|:----------:|:-----------:|
| roberta-base-11 | 8K | 1s | 0.6s |
| hustvl/YOLOP | 215M | 1s | 1s |
| bertseq2seq | 2.8G | 1.9s | 1.4s |
| bert-base-uncased | 3.3G | 1.6s | 1.1s |
| tiiuae/falcon-7b   | 14G   | 2.1s       | 1.8s        |

See [model_signing/README.md](model_signing/README.md) for more information.

### SLSA for ML

To protect the supply chain of traditional software against tampering (as in
the [SolarWinds attack][solarwinds]), we can generate SLSA provenance, for
example by using the [SLSA L3 GitHub generator][slsa-generator].

This project shows how we can use the same generator when training models via
GitHub Actions. While most ML models are too expensive to train in such a
fashion, this is a proof of concept showing that _the same traditional software
supply chain protections can be applied to ML_. Future work will cover training
ML models that require access to accelerators (e.g., GPUs, TPUs) or that
require multiple hours of training.

See [slsa_for_models/README.md](slsa_for_models/README.md) for more information.

## Status

This is not an officially supported Google product. The project is currently in
alpha and we may make breaking changes until the first official release. All
code should be viewed as experimental and should not be used in any production
environment.

## Contributing

Please see the [Contributor Guide](CONTRIBUTING.md) for more information.

[saif]: https://blog.google/technology/safety-security/introducing-googles-secure-ai-framework/
[openssf]: https://openssf.org/
[slsa-generator]: https://github.com/slsa-framework/slsa-github-generator
[solarwinds]: https://www.techtarget.com/whatis/feature/SolarWinds-hack-explained-Everything-you-need-to-know
18 changes: 17 additions & 1 deletion model_signing/README.md
@@ -1,6 +1,22 @@
# Model Signing

This project demonstrates how to protect the integrity of a model by signing it
with [Sigstore](https://www.sigstore.dev/), a tool for making code signatures
transparent without requiring key maintenance.

When users download a given version of a signed model they can check that the
signature comes from a known or trusted identity and thus that the model hasn't
been tampered with after training.

Signing events are recorded to Sigstore's append-only transparency log.
Transparency logs make signing events discoverable, so that model signers can
monitor the logs and determine if any signing events are unexpected. During
verification, model verifiers will verify a proof of inclusion from the log,
which is handled by the model signing library.

Model signers should monitor for occurrences of their signing identity in the
log. Sigstore is actively developing a [log
monitor](https://github.com/sigstore/rekor-monitor) that runs on GitHub Actions.
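
Until that monitor is broadly available, a signer could periodically search the
public Rekor log for their identity themselves. The sketch below uses the
`rekor-cli` tool; the identity and the handling of "expected" entries are
illustrative assumptions:

```python
# Illustrative sketch: list Rekor transparency log entries that mention a
# signing identity, using rekor-cli. The identity is a placeholder, and any
# alerting logic is left to the signer.
import subprocess

IDENTITY = "user@example.com"  # hypothetical signing identity

result = subprocess.run(
    ["rekor-cli", "search", "--email", IDENTITY],
    capture_output=True, text=True, check=True,
)
# rekor-cli prints a short header followed by one entry UUID per line.
uuids = [line for line in result.stdout.splitlines() if line and " " not in line]
print(f"Found {len(uuids)} log entries for {IDENTITY}")
# Any entry the signer does not recognize warrants investigation.
```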

## Installation and usage

37 changes: 37 additions & 0 deletions slsa_for_models/README.md
@@ -0,0 +1,37 @@
# SLSA for Models

To protect the supply chain of traditional software against tampering (as in
the [SolarWinds attack][solarwinds]), we can generate [SLSA][slsa] provenance,
for example by using the [SLSA L3 GitHub generator][slsa-generator].

This project shows how we can use the same generator when training models via
GitHub Actions. While most ML models are too expensive to train in such a
fashion, this is a proof of concept showing that _the same traditional software
supply chain protections can be applied to ML_. Future work will cover training
ML models that require access to accelerators (e.g., GPUs, TPUs) or that
require multiple hours of training.

When users download a given version of a model they can also check its
provenance using [the SLSA verifier][slsa-verifier]. This can be done
automatically: for example, a model serving pipeline could validate the
provenance of every new model before serving it, as sketched below. The
verification can also be done manually, on demand.
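
A minimal sketch of that automated check, assuming hypothetical file names and
source repository, could look like:

```python
# Sketch: verify a model's SLSA provenance with slsa-verifier before serving
# it. The artifact, provenance file, and source repository are placeholders.
import subprocess

MODEL = "model.tar.gz"             # artifact downloaded with the model release
PROVENANCE = "model.intoto.jsonl"  # provenance published alongside it
SOURCE = "github.com/example-org/example-model"  # expected source repository

subprocess.run(
    [
        "slsa-verifier", "verify-artifact", MODEL,
        "--provenance-path", PROVENANCE,
        "--source-uri", SOURCE,
    ],
    check=True,  # a non-zero exit means verification failed: do not serve
)
```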

As an additional benefit, having provenance for a model allows users to react
to vulnerabilities in a training framework: they can quickly determine whether a
model needs to be retrained because it was created using the vulnerable version.
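
As a sketch of how this could work: SLSA v0.2-style provenance is a DSSE
envelope whose payload lists the materials that went into the build. Assuming
the training workflow records the framework among the materials (the generator
records at least the source repository; richer materials are an assumption
here), a check could look like:

```python
# Sketch: scan a model's provenance for a known-vulnerable dependency.
# Assumes SLSA v0.2-style provenance in a DSSE envelope, one JSON object per
# line in a .intoto.jsonl file. All names and versions are placeholders.
import base64
import json

VULNERABLE = "pkg:pypi/tensorflow@2.12.0"  # hypothetical vulnerable version

with open("model.intoto.jsonl") as f:
    envelope = json.loads(f.readline())

# The in-toto statement is base64-encoded inside the envelope payload.
statement = json.loads(base64.b64decode(envelope["payload"]))
materials = statement.get("predicate", {}).get("materials", [])

if any(VULNERABLE in material.get("uri", "") for material in materials):
    print("Model was built with the vulnerable version: retrain it.")
else:
    print("Provenance does not list the vulnerable version.")
```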

## Usage

TODO: Show how to run the action in the repo, show an example with images of
how to trigger the workflow, and show how to run the verifier manually

## Benchmarking

TODO: Table discussing the performance of generating provenance for models in
various formats, based on running the GitHub Actions

[slsa-generator]: https://github.com/slsa-framework/slsa-github-generator
[solarwinds]: https://www.techtarget.com/whatis/feature/SolarWinds-hack-explained-Everything-you-need-to-know
[slsa]: https://slsa.dev
[slsa-verifier]: https://github.com/slsa-framework/slsa-verifier/
