Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content hashing of archives #3482

Merged
merged 57 commits into from
Oct 18, 2022
Merged

Content hashing of archives #3482

merged 57 commits into from
Oct 18, 2022

Conversation

sigurdm
Copy link
Contributor

@sigurdm sigurdm commented Jul 5, 2022

Fix of #2462

  • Compute a hash of each downloaded archive and store it in: $PUB_CACHE/hosted/<hosted-url>/.hashes/<package>-<version>.sha256 (details here still subject to change)

  • New optional field in the package listing api for the server to provide the content-hash. If that is provided - it is verified against the downloaded archive.

  • When writing a pubspec.lock file, the sha256 is included in the description of each hosted package.

  • On pub get If the description of a package from pubspec.lock doesn't match the one in the cache, the archive is redownloaded - if the hash still doesn't match, the resolution fails with an error.

  • Has been moved to a follow-up PR Introduce a new option dart pub get --enforce-lockfile A mode that will NOT modify pubspec.lock. That means:

    • won't add hashes if missing,
    • will refuse to resolve if pubspec.yaml isn't satisfied,
    • will refuse to resolve if hashes don't match cached hashes.
    • will refuse to resolve if pubspec.lock is missing
    • will verify that the extracted package content matches the contents of the original archive.

    This is useful when deploying to production.
    Fixes: dart pub get --pristine/--locked #2890 and locked option in pubspec.yaml #2905

An unfortunate side-effect of this change is that all already downloaded packages will be re-downloaded (because we don't store the archives, only the extracted files) to compute their hashes.

@sigurdm
Copy link
Contributor Author

sigurdm commented Jul 5, 2022

The basic functionality is here.

We could use some work on the error-messages. Maybe other details need to be fixed.

@sigurdm sigurdm requested review from jonasfj and szakarias July 5, 2022 12:39
@sigurdm
Copy link
Contributor Author

sigurdm commented Jul 5, 2022

From the test failures and a bit of debugging it seems that gzip.encode from dart:io is not stable across platforms.

import 'dart:io';

main() {
  print(gzip.encode([
    112,
    117,
    98,
    115,
    112,
    101,
    99,
    46,
    121,
    97,
    109,
    108,
    0,
    0,
  ]));
}

Prints:

[31, 139, 8, 0, 0, 0, 0, 0, 0, 3, 43, 40, 77, 42, 46, 72, 77, 214, 171, 76, 204, 205, 97, 96, 0, 0, 244, 120, 3, 30, 14, 0, 0, 0] # on my linux workstation
[31, 139, 8, 0, 0, 0, 0, 0, 0, 19, 43, 40, 77, 42, 46, 72, 77, 214, 171, 76, 204, 205, 97, 96, 0, 0, 244, 120, 3, 30, 14, 0, 0, 0] # on my mac

This makes testing harder. We'll have to consider how best to test this. Maybe we can find a gzip-encoder written in dart to use for testing...

@sigurdm
Copy link
Contributor Author

sigurdm commented Jul 5, 2022

Reading https://www.rfc-editor.org/rfc/rfc1952.html it seems there is a OS field in gzip. We might be able to just filter that away. Also there is an mtime stamp, that luckily it seems dart:io doesn't fill in.

@jonasfj
Copy link
Member

jonasfj commented Jul 12, 2022

On pub get If the description of a package from pubspec.lock doesn't match the one in the cache, the archive is redownloaded - if the hash still doesn't match, the resolution fails with an error.

I haven't verified if the code actually does this, but as recall we wanted it to update pubspec.lock (and print a warning) if the server version listing returns a different sha256.

update: Ah, maybe I'm reading it wrong. If the pubspec.lock has the same hash as version listing and the downloaded archives doesn't then yes we should fail.

Copy link
Member

@jonasfj jonasfj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial comments, I'm not all done. Let's walk through it when you get back.


final _hexTable2 = Map.fromIterables(_hexTable, List.generate(16, (i) => i));

String hexEncode(List<int> bytes) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hex encode/decode is already in package:convert which is in DEPS, so I think we can just use that without too many complications.

See hex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

lib/src/command/get.dart Outdated Show resolved Hide resolved
Comment on lines -144 to +145
PackageId id,
PackageRef ref,
Version version,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We changed this from PackageId to PackageRef + Version, because the sha256 is part of the description that is necessary to create the PackageId, correct?

Should we instead fix the root of this problem and make doGetVersions return a PackageVersionEntry or something else?

I guess I'm trying to ask if maybe we should do away with PackageStatus as a per-version entity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't see any places where this is called without using an id.toRef() and id.version as arguments, am I'm missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - this is a bit annoying - let's discuss it when you're back.

lib/src/source/cached.dart Show resolved Hide resolved
lib/src/lock_file.dart Show resolved Hide resolved
lib/src/lock_file.dart Outdated Show resolved Hide resolved
log.fine('No hash of downloaded archive found');
} else if (!bytesEquals(cachedHash, description.sha256)) {
dataError(
'Cache entry for $name-${package.version} does not have content-hash matching lockfile.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we suggest running dart pub cache repair

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I added a couple of suggestions here - but let's go over all the messaging together when you're back.

lib/src/lock_file.dart Outdated Show resolved Hide resolved
lib/src/lock_file.dart Outdated Show resolved Hide resolved
lib/src/entrypoint.dart Outdated Show resolved Hide resolved
doc/repository-spec-v2.md Outdated Show resolved Hide resolved
Comment on lines 60 to 62
if (argResults['dry-run'] && argResults['enforce-lockfile']) {
fail('Cannot do a dry-run with enforce-lockfile.');
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--dry-run with --enforce-lockfile is fine. In this case it should do a resolution and check that nothing in the lock file changed.
We won't need to download archives, but we could opt to check that content-hashes match what is one the server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was disallowed because we can only validate the content-hashes after downloading the archives, and dry-run did not download the archives. This has changed now, so now we can indeed have the two options together.

@@ -28,6 +29,9 @@ class GetCommand extends PubCommand {
negatable: false,
help: "Report what dependencies would change but don't change any.");

argParser.addFlag('enforce-lockfile',
negatable: false, help: 'Only use resolution from existing pubspec.lock.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
negatable: false, help: 'Only use resolution from existing pubspec.lock.');
negatable: false, help: 'Fail resolution if pubspec.lock does not satisfy pubspec.yaml.');

or

Suggested change
negatable: false, help: 'Only use resolution from existing pubspec.lock.');
negatable: false, help: 'Enforce pubspec.lock, and fail resolution if it does not satisfy pubspec.yaml.');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better - thanks!

},
);
}

/// Replaces the 9th entry in [stream] with a 0. This replaces the os entry
/// of a gzip stream, giving us the same stream on all platforms.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point to spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

},
);
}

/// Replaces the 9th entry in [stream] with a 0. This replaces the os entry
/// of a gzip stream, giving us the same stream on all platforms.
Stream<List<int>> replaceOs(Stream<List<int>> stream) async* {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.rfc-editor.org/rfc/rfc1952.html

And maybe explain a little... perhaps make this simpler by converting everyting to a single Uint8List and then modify it... and return a Stream.single()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

test/package_server.dart Outdated Show resolved Hide resolved
test/package_server.dart Outdated Show resolved Hide resolved
Comment on lines 32 to 33
// Setting this to true will make automatic calculation of content-hashes.
bool serveContentHashes = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this by default, should we? if not maybe leave a comment explaining why we need to enable/disable this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is done by default - seems to cause no problems.

lib/src/source/hosted.dart Outdated Show resolved Hide resolved
Comment on lines 77 to 81
// Deleting the version-listing cache will cause it to be refetched, and the error will happen.
File(p.join(globalServer.cachingPath, '.cache', 'foo-versions.json'))
.deleteSync();

await pubGet(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, do pub upgrade instead

sigurdm and others added 4 commits October 6, 2022 15:42
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
Comment on lines 1028 to 1034
throw FormatException('''
Downloaded archive for ${id.name}-${id.version} had wrong content-hash.

This indicates a problem on the package repository: `${description.url}`.

See $contentHashesDocumentationUrl.
''');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only thing I'm unsure of...

Options:

  • (A) Retry the request, this makes sense if the response didn't have a crc32 checksum.
  • (B) Don't retry the request, this makes sense if the response had a crc32 checksum.

But I don't like making logic that depends on whether we have crc32 checksum or not.

(C) If response carries a header *-pub-content-sha256: <hex sha256> then we abort fetching if the header doesn't match the archive_sha256 from the version-listing API. If the downloaded content doesn't match the value of *-pub-content-sha256 header, then we retry downloading.
If there is no *-pub-content-sha256 and archive_sha256 mismatches downloaded content, then we retry.

This is a bit of a hack, because the header will have to be anything that ends with -pub-content-sha256. Because on GCS we can attach custom headers, they are just prefixed x-goog-meta-, so our custom headers would be x-goog-meta-pub-content-sha256 and if hosted on S3 a package repository would have x-amx-meta-pub-content-sha256 or something like that.

See:


But option (C) might also be overkill. Certainly it's something we should do in a follow up PR.


Back on topic, it's tempting to retry, because:

  • Most likely it's a network issues -- rarely should there be disagreement between version-listing API and the archive stored. So if there is disagreement, it's very possibly a network issue.
  • In cases where there is disagreement, its possible that it would resolve from retrying -- it's likely a temporary inconsistency in cloud APIs -- like updating a GCS object cached in a CDN might take some time to propagate.
  • If the disagreement persist, then it's a bug in the package repository, if the user have to wait for retries to take place before being told that the content-hash of the downloaded archive is wrong, then maybe that's not the end of the world. It's not like the user is waiting for a positive outcome -- it's not a common scenario.

So when in doubt, maybe it's better to retry :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to retrying

@sigurdm sigurdm merged commit 4c9ebd0 into dart-lang:master Oct 18, 2022
erdemyerebasmaz added a commit to breez/c-breez that referenced this pull request Jan 25, 2023
Content hashing of archives by @sigurdm in dart-lang/pub#3482
erdemyerebasmaz added a commit to breez/c-breez that referenced this pull request Jan 25, 2023
Content hashing of archives by @sigurdm in dart-lang/pub#3482
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dart pub get --pristine/--locked
3 participants