From 12b8a5e03700ebc9d2d2cde6dadbee5c703aefd2 Mon Sep 17 00:00:00 2001
From: marinamoore
Date: Wed, 20 Nov 2019 10:21:28 -0800
Subject: [PATCH 1/3] remove sha256 and redo metadata overhead calculations

---
 pep-0458.txt | 93 +++++++++++++++++++++++++---------------------------
 1 file changed, 44 insertions(+), 49 deletions(-)

diff --git a/pep-0458.txt b/pep-0458.txt
index b53d39e3e45..4375669bd99 100644
--- a/pep-0458.txt
+++ b/pep-0458.txt
@@ -307,9 +307,9 @@ kinds of metadata RECOMMENDED for PyPI.
 
 __ https://github.com/theupdateframework/tuf/blob/v0.11.1/docs/METADATA.md
 
-In addition, all target files SHOULD be available on disk at least three times.
+In addition, all target files SHOULD be available on disk at least two times.
 Once under their original filename, to provide backwards compatibility, and
-twice with their SHA-256 and SHA-512 hash respectively included in their
+twice with their SHA-512 hash included in their
 filename. This is required to produce `Consistent Snapshots`_.
 
 Depending on the used file system different data deduplication mechanisms MAY
@@ -321,7 +321,7 @@ PyPI and TUF Metadata
 
 TUF metadata provides information that clients can use to make update
 decisions. For example, a *targets* metadata lists the available target files
-on PyPI and includes the required signatures, cryptographic hashes, and
+on PyPI and includes the required signatures, cryptographic hash, and
 file sizes for each. Different metadata files provide different information, which
 are signed by separate roles. The *root* role indicates what metadata belongs to
 each role. The concept of roles allows TUF to delegate responsibilities
@@ -345,20 +345,19 @@ roles used in TUF.
 Figure 1: An overview of the TUF roles.
 
 Unless otherwise specified, this PEP RECOMMENDS that every metadata or
-target file be hashed using both the SHA2-256 and SHA2-512 functions of
+target file be hashed using the SHA2-512 function of
 the `SHA-2`__ family. SHA-2 has native and well-tested Python 2 and 3
 support (allowing for verification of these hashes without additional,
-non-Python dependencies), and using both functions should provide
-sufficient protection against `collision attacks`__ for the foreseeable
-future. However, this assumes that a collision attack for SHA2-256 does
-not easily translate to SHA2-512. If stronger security guarantees are
-required, then SHA2-256 and `SHA3-256`__ MAY be used instead, since they
-are based on very different designs from each other. However, SHA-3
+non-Python dependencies). If stronger security guarantees are
+required, then both SHA2-256 and SHA2-512 or both SHA2-256 and `SHA3-256`__
+MAY be used instead. SHA2-256 and SHA3-256
+are based on very different designs from each other, providing extra protection
+against `collision attacks`__. However, SHA-3
 requires installing additional, non-Python dependencies for `Python 2`__.
 
 __ https://en.wikipedia.org/wiki/SHA-2
-__ https://en.wikipedia.org/wiki/Collision_attack
 __ https://en.wikipedia.org/wiki/SHA-3
+__ https://en.wikipedia.org/wiki/Collision_attack
 __ https://pip.pypa.io/en/latest/development/release-process/#python-2-support
 
 
@@ -509,13 +508,13 @@ __ https://github.com/theupdateframework/tuf/blob/v0.11.1/docs/TUTORIAL.md#deleg
 Based on our findings as of the time this document was updated for
 implementation (Nov 7 2019), summarized in Tables 1-2, PyPI SHOULD
 split all targets in the *bins* role by delegating them to 16,384
-*bin-n* roles (see C11 in Table 1). Each *bin-n* role would sign
-for the PyPI targets whose SHA2-256 hashes fall into that bin
+*bin-n* roles (see C10 in Table 1). Each *bin-n* role would sign
+for the PyPI targets whose SHA2-512 hashes fall into that bin
 (see and Figure 2 and `Consistent Snapshots`_). It was found that this
 number of bins would result in a 6-10% metadata overhead
-(relative to the average size of downloaded distribution files; see V14 and
-V16 in Table 2) for returning users, and a 70% overhead for new
-users who are installing pip for the first time (see V18 in Table 2).
+(relative to the average size of downloaded distribution files; see V13 and
+V15 in Table 2) for returning users, and a 70% overhead for new
+users who are installing pip for the first time (see V17 in Table 2).
 
 A few assumptions used in calculating these metadata overhead percentages:
 
@@ -526,31 +525,29 @@ A few assumptions used in calculating these metadata overhead percentages:
 +------+--------------------------------------------------+-----------+
 | Name | Description                                      | Value     |
 +------+--------------------------------------------------+-----------+
-| C1   | # of bytes in a SHA2-256 hexadecimal digest      | 64        |
+| C1   | # of bytes in a SHA2-512 hexadecimal digest      | 128       |
 +------+--------------------------------------------------+-----------+
-| C2   | # of bytes in a SHA2-512 hexadecimal digest      | 128       |
+| C2   | # of bytes for a SHA2-512 public key ID          | 64        |
 +------+--------------------------------------------------+-----------+
-| C3   | # of bytes for a SHA2-256 public key ID          | 64        |
+| C3   | # of bytes for an Ed25519 signature              | 128       |
 +------+--------------------------------------------------+-----------+
-| C4   | # of bytes for an Ed25519 signature              | 128       |
+| C4   | # of bytes for an Ed25519 public key             | 64        |
 +------+--------------------------------------------------+-----------+
-| C5   | # of bytes for an Ed25519 public key             | 64        |
+| C5   | # of bytes for a target relative file path       | 256       |
 +------+--------------------------------------------------+-----------+
-| C6   | # of bytes for a target relative file path       | 256       |
+| C6   | # of bytes to encode a target file size          | 7         |
 +------+--------------------------------------------------+-----------+
-| C7   | # of bytes to encode a target file size          | 7         |
+| C7   | # of bytes to encode a version number            | 6         |
 +------+--------------------------------------------------+-----------+
-| C8   | # of bytes to encode a version number            | 6         |
+| C8   | # of targets (simple indices and distributions)  | 2,273,539 |
 +------+--------------------------------------------------+-----------+
-| C9   | # of targets (simple indices and distributions)  | 2,273,539 |
+| C9   | Average # of bytes for a downloaded distribution | 2,184,393 |
 +------+--------------------------------------------------+-----------+
-| C10  | Average # of bytes for a downloaded distribution | 2,184,393 |
-+------+--------------------------------------------------+-----------+
-| C11  | # of bins                                        | 16,384    |
+| C10  | # of bins                                        | 16,384    |
 +------+--------------------------------------------------+-----------+
 
-C9 by computed querying the number of release files.
-C10 was derived by taking the average between a rough estimate of the average
+C8 by computed querying the number of release files.
+C9 was derived by taking the average between a rough estimate of the average
 size of release files *downloaded* over the past 31 days (1,628,321 bytes),
 and the average size of releases files on disk (2,740,465 bytes).
 Ernest W. Durbin III helped to provide these numbers on November 7, 2019.
@@ -560,41 +557,39 @@ Table 1: A list of constants used to calculate metadata overhead.
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
 | Name | Description                                                                        | Formula                      | Value     |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V1   | Length of a path hash prefix                                                       | math.ceil(math.log(C11, 16)) | 4         |
+| V1   | Length of a path hash prefix                                                       | math.ceil(math.log(C10, 16)) | 4         |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
 | V2   | Total # of path hash prefixes                                                      | 16**V1                       | 65,536    |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V3   | Avg # of targets per bin                                                           | math.ceil(C9/C11)            | 139       |
-+------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V4   | Avg size of SHA-256 hashes per bin                                                 | V3*C1                        | 8,896     |
+| V3   | Avg # of targets per bin                                                           | math.ceil(C8/C10)            | 139       |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V5   | Avg size of SHA-512 hashes per bin                                                 | V3*C2                        | 17,792    |
+| V4   | Avg size of SHA-512 hashes per bin                                                 | V3*C1                        | 17,792    |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V6   | Avg size of target paths per bin                                                   | V3*C6                        | 35,584    |
+| V5   | Avg size of target paths per bin                                                   | V3*C5                        | 35,584    |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V7   | Avg size of lengths per bin                                                        | V3*C7                        | 973       |
+| V6   | Avg size of lengths per bin                                                        | V3*C6                        | 973       |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V8   | Avg size of bin-n metadata (bytes)                                                 | V4+V5+V6+V7                  | 63,245    |
+| V7   | Avg size of bin-n metadata (bytes)                                                 | V4+V5+V76                    | 54,349    |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V9   | Total size of public key IDs in bins                                               | C11*C3                       | 1,048,576 |
+| V8   | Total size of public key IDs in bins                                               | C10*C2                       | 1,048,576 |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V10  | Total size of path hash prefixes in bins                                           | V1*V2                        | 262,144   |
+| V9   | Total size of path hash prefixes in bins                                           | V1*V2                        | 262,144   |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V11  | Est. size of bins metadata (bytes)                                                 | V9+V10                       | 1,310,720 |
+| V10  | Est. size of bins metadata (bytes)                                                 | V8+V9                        | 1,310,720 |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V12  | Est. size of snapshot metadata (bytes)                                             | C11*C8                       | 98,304    |
+| V11  | Est. size of snapshot metadata (bytes)                                             | C10*C7                       | 98,304    |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V13  | Est. size of metadata overhead per distribution per returning user (same snapshot) | 2*V8                         | 126,490   |
+| V12  | Est. size of metadata overhead per distribution per returning user (same snapshot) | 2*V7                         | 108,698   |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V14  | Est. metadata overhead per distribution per returning user (same snapshot)         | round((V13/C10)*100)         | 6%        |
+| V13  | Est. metadata overhead per distribution per returning user (same snapshot)         | round((V12/C9)*100)          | 5%        |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V15  | Est. size of metadata overhead per distribution per returning user (diff snapshot) | V13+V12                      | 224,794   |
+| V14  | Est. size of metadata overhead per distribution per returning user (diff snapshot) | V12+V11                      | 207,002   |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V16  | Est. metadata overhead per distribution per returning user (diff snapshot)         | round((V15/C10)*100)         | 10%       |
+| V15  | Est. metadata overhead per distribution per returning user (diff snapshot)         | round((V14/C9)*100)          | 9%        |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V17  | Est. size of metadata overhead per distribution per new user                       | V15+V11                      | 1,535,514 |
+| V16  | Est. size of metadata overhead per distribution per new user                       | V14+V10                      | 1,517,722 |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V18  | Est. metadata overhead per distribution per new user                               | round((V17/C10)*100)         | 70%       |
+| V17  | Est. metadata overhead per distribution per new user                               | round((V16/C9)*100)          | 69%       |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
 
 Table 2: Estimated metadata overheads for new and returning users.
@@ -829,7 +824,7 @@ version of the *snapshot* metadata, which in turn lists the versions of the
 snapshot.
 
 The *targets* or delegated targets metadata refer to the actual target
-files, including all of their cryptographic hashes as specified above.
+files, including their cryptographic hashes as specified above.
 Thus, to mark a target file as part of a consistent snapshot it MUST, when
 written to disk, include its hash in its filename:
 

From 534f946d2b620dfd835cf34e93437c5a0ba7b79f Mon Sep 17 00:00:00 2001
From: mnm678
Date: Wed, 20 Nov 2019 14:37:44 -0500
Subject: [PATCH 2/3] Update pep-0458.txt

Co-Authored-By: Joshua Lock
---
 pep-0458.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0458.txt b/pep-0458.txt
index 4375669bd99..a2b4e57e1ff 100644
--- a/pep-0458.txt
+++ b/pep-0458.txt
@@ -309,7 +309,7 @@ __ https://github.com/theupdateframework/tuf/blob/v0.11.1/docs/METADATA.md
 
 In addition, all target files SHOULD be available on disk at least two times.
 Once under their original filename, to provide backwards compatibility, and
-twice with their SHA-512 hash included in their
+once with their SHA-512 hash included in their
 filename. This is required to produce `Consistent Snapshots`_.
 
 Depending on the used file system different data deduplication mechanisms MAY

From 9c12c3cb7f85351ea52409746b1f802d419c5f5d Mon Sep 17 00:00:00 2001
From: mnm678
Date: Thu, 21 Nov 2019 11:57:42 -0500
Subject: [PATCH 3/3] Update pep-0458.txt

Co-Authored-By: lukpueh
---
 pep-0458.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0458.txt b/pep-0458.txt
index a2b4e57e1ff..18d00bd2a9a 100644
--- a/pep-0458.txt
+++ b/pep-0458.txt
@@ -569,7 +569,7 @@ Table 1: A list of constants used to calculate metadata overhead.
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
 | V6   | Avg size of lengths per bin                                                        | V3*C6                        | 973       |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
-| V7   | Avg size of bin-n metadata (bytes)                                                 | V4+V5+V76                    | 54,349    |
+| V7   | Avg size of bin-n metadata (bytes)                                                 | V4+V5+V6                     | 54,349    |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
 | V8   | Total size of public key IDs in bins                                               | C10*C2                       | 1,048,576 |
 +------+------------------------------------------------------------------------------------+------------------------------+-----------+
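
The arithmetic behind the renumbered Table 2 can be rechecked with a short script. This is a minimal sketch and not part of the patch series: the helper name `overhead_table` and the dictionary layout are illustrative assumptions, while the constants and formulas are copied from Tables 1 and 2 as they read once PATCH 3/3 is applied (C3 and C4 are omitted because no Table 2 formula uses them).

```python
import math

# Constants copied from Table 1 as it reads after this patch series.
C1 = 128        # bytes in a SHA2-512 hexadecimal digest
C2 = 64         # bytes for a public key ID
C5 = 256        # bytes for a target relative file path
C6 = 7          # bytes to encode a target file size
C7 = 6          # bytes to encode a version number
C8 = 2_273_539  # targets (simple indices and distributions)
C10 = 16_384    # bins

# C9 is the mean of the two averages quoted under Table 1:
# files downloaded over 31 days, and files on disk.
C9 = (1_628_321 + 2_740_465) // 2


def overhead_table():
    """Recompute V1..V17 with the Table 2 formulas (hypothetical helper)."""
    v = {}
    v["V1"] = math.ceil(math.log(C10, 16))  # length of a path hash prefix
    v["V2"] = 16 ** v["V1"]                 # total path hash prefixes
    v["V3"] = math.ceil(C8 / C10)           # avg targets per bin
    v["V4"] = v["V3"] * C1                  # SHA-512 hashes per bin
    v["V5"] = v["V3"] * C5                  # target paths per bin
    v["V6"] = v["V3"] * C6                  # lengths per bin
    v["V7"] = v["V4"] + v["V5"] + v["V6"]   # avg bin-n metadata size
    v["V8"] = C10 * C2                      # public key IDs in bins
    v["V9"] = v["V1"] * v["V2"]             # path hash prefixes in bins
    v["V10"] = v["V8"] + v["V9"]            # bins metadata size
    v["V11"] = C10 * C7                     # snapshot metadata size
    v["V12"] = 2 * v["V7"]                  # returning user, same snapshot
    v["V13"] = round(v["V12"] / C9 * 100)   # ... as a percentage
    v["V14"] = v["V12"] + v["V11"]          # returning user, diff snapshot
    v["V15"] = round(v["V14"] / C9 * 100)   # ... as a percentage
    v["V16"] = v["V14"] + v["V10"]          # new user
    v["V17"] = round(v["V16"] / C9 * 100)   # ... as a percentage
    return v


if __name__ == "__main__":
    for name, value in overhead_table().items():
        print(f"{name:>3}: {value:,}")
```

Run as a script, it prints the seventeen derived values, which can be compared line by line with the Value column of the updated Table 2.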