-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Content hashing of archives #3482
Conversation
The basic functionality is here. We could use some work on the error-messages. Maybe other details need to be fixed. |
From the test failures and a bit of debugging it seems that import 'dart:io';
main() {
print(gzip.encode([
112,
117,
98,
115,
112,
101,
99,
46,
121,
97,
109,
108,
0,
0,
]));
}
Prints:
This makes testing harder. We'll have to consider how best to test this. Maybe we can find a gzip-encoder written in dart to use for testing... |
Reading https://www.rfc-editor.org/rfc/rfc1952.html it seems there is a OS field in gzip. We might be able to just filter that away. Also there is an mtime stamp, that luckily it seems dart:io doesn't fill in. |
I haven't verified if the code actually does this, but as recall we wanted it to update update: Ah, maybe I'm reading it wrong. If the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some initial comments, I'm not all done. Let's walk through it when you get back.
lib/src/utils.dart
Outdated
|
||
final _hexTable2 = Map.fromIterables(_hexTable, List.generate(16, (i) => i)); | ||
|
||
String hexEncode(List<int> bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
PackageId id, | ||
PackageRef ref, | ||
Version version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We changed this from PackageId
to PackageRef
+ Version
, because the sha256 is part of the description that is necessary to create the PackageId
, correct?
Should we instead fix the root of this problem and make doGetVersions
return a PackageVersionEntry
or something else?
I guess I'm trying to ask if maybe we should do away with PackageStatus
as a per-version entity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't see any places where this is called without using an id.toRef()
and id.version
as arguments, am I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - this is a bit annoying - let's discuss it when you're back.
lib/src/lock_file.dart
Outdated
log.fine('No hash of downloaded archive found'); | ||
} else if (!bytesEquals(cachedHash, description.sha256)) { | ||
dataError( | ||
'Cache entry for $name-${package.version} does not have content-hash matching lockfile.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we suggest running dart pub cache repair
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I added a couple of suggestions here - but let's go over all the messaging together when you're back.
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
lib/src/command/get.dart
Outdated
if (argResults['dry-run'] && argResults['enforce-lockfile']) { | ||
fail('Cannot do a dry-run with enforce-lockfile.'); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--dry-run
with --enforce-lockfile
is fine. In this case it should do a resolution and check that nothing in the lock file changed.
We won't need to download archives, but we could opt to check that content-hashes match what is one the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was disallowed because we can only validate the content-hashes after downloading the archives, and dry-run did not download the archives. This has changed now, so now we can indeed have the two options together.
lib/src/command/get.dart
Outdated
@@ -28,6 +29,9 @@ class GetCommand extends PubCommand { | |||
negatable: false, | |||
help: "Report what dependencies would change but don't change any."); | |||
|
|||
argParser.addFlag('enforce-lockfile', | |||
negatable: false, help: 'Only use resolution from existing pubspec.lock.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
negatable: false, help: 'Only use resolution from existing pubspec.lock.'); | |
negatable: false, help: 'Fail resolution if pubspec.lock does not satisfy pubspec.yaml.'); |
or
negatable: false, help: 'Only use resolution from existing pubspec.lock.'); | |
negatable: false, help: 'Enforce pubspec.lock, and fail resolution if it does not satisfy pubspec.yaml.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better - thanks!
test/package_server.dart
Outdated
}, | ||
); | ||
} | ||
|
||
/// Replaces the 9th entry in [stream] with a 0. This replaces the os entry | ||
/// of a gzip stream, giving us the same stream on all platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point to spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
test/package_server.dart
Outdated
}, | ||
); | ||
} | ||
|
||
/// Replaces the 9th entry in [stream] with a 0. This replaces the os entry | ||
/// of a gzip stream, giving us the same stream on all platforms. | ||
Stream<List<int>> replaceOs(Stream<List<int>> stream) async* { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.rfc-editor.org/rfc/rfc1952.html
And maybe explain a little... perhaps make this simpler by converting everyting to a single Uint8List and then modify it... and return a Stream.single()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make it private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
test/package_server.dart
Outdated
// Setting this to true will make automatic calculation of content-hashes. | ||
bool serveContentHashes = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this by default, should we? if not maybe leave a comment explaining why we need to enable/disable this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it is done by default - seems to cause no problems.
test/content_hash_test.dart
Outdated
// Deleting the version-listing cache will cause it to be refetched, and the error will happen. | ||
File(p.join(globalServer.cachingPath, '.cache', 'foo-versions.json')) | ||
.deleteSync(); | ||
|
||
await pubGet( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, do pub upgrade
instead
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
lib/src/source/hosted.dart
Outdated
throw FormatException(''' | ||
Downloaded archive for ${id.name}-${id.version} had wrong content-hash. | ||
|
||
This indicates a problem on the package repository: `${description.url}`. | ||
|
||
See $contentHashesDocumentationUrl. | ||
'''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only thing I'm unsure of...
Options:
- (A) Retry the request, this makes sense if the response didn't have a crc32 checksum.
- (B) Don't retry the request, this makes sense if the response had a crc32 checksum.
But I don't like making logic that depends on whether we have crc32 checksum or not.
(C) If response carries a header *-pub-content-sha256: <hex sha256>
then we abort fetching if the header doesn't match the archive_sha256
from the version-listing API. If the downloaded content doesn't match the value of *-pub-content-sha256
header, then we retry downloading.
If there is no *-pub-content-sha256
and archive_sha256
mismatches downloaded content, then we retry.
This is a bit of a hack, because the header will have to be anything that ends with -pub-content-sha256
. Because on GCS we can attach custom headers, they are just prefixed x-goog-meta-
, so our custom headers would be x-goog-meta-pub-content-sha256
and if hosted on S3 a package repository would have x-amx-meta-pub-content-sha256
or something like that.
See:
- https://cloud.google.com/storage/docs/xml-api/reference-headers#xgoogmeta
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html
But option (C) might also be overkill. Certainly it's something we should do in a follow up PR.
Back on topic, it's tempting to retry, because:
- Most likely it's a network issues -- rarely should there be disagreement between version-listing API and the archive stored. So if there is disagreement, it's very possibly a network issue.
- In cases where there is disagreement, its possible that it would resolve from retrying -- it's likely a temporary inconsistency in cloud APIs -- like updating a GCS object cached in a CDN might take some time to propagate.
- If the disagreement persist, then it's a bug in the package repository, if the user have to wait for retries to take place before being told that the content-hash of the downloaded archive is wrong, then maybe that's not the end of the world. It's not like the user is waiting for a positive outcome -- it's not a common scenario.
So when in doubt, maybe it's better to retry :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to retrying
Co-authored-by: Jonas Finnemann Jensen <jopsen@gmail.com>
This reverts commit aedd523.
Content hashing of archives by @sigurdm in dart-lang/pub#3482
Content hashing of archives by @sigurdm in dart-lang/pub#3482
Fix of #2462
Compute a hash of each downloaded archive and store it in:
$PUB_CACHE/hosted/<hosted-url>/.hashes/<package>-<version>.sha256
(details here still subject to change)New optional field in the package listing api for the server to provide the content-hash. If that is provided - it is verified against the downloaded archive.
When writing a pubspec.lock file, the sha256 is included in the description of each hosted package.
On
pub get
If the description of a package from pubspec.lock doesn't match the one in the cache, the archive is redownloaded - if the hash still doesn't match, the resolution fails with an error.Has been moved to a follow-up PR Introduce a new option
dart pub get --enforce-lockfile
A mode that will NOT modify pubspec.lock. That means:This is useful when deploying to production.
Fixes: dart pub get --pristine/--locked #2890 and
locked
option inpubspec.yaml
#2905An unfortunate side-effect of this change is that all already downloaded packages will be re-downloaded (because we don't store the archives, only the extracted files) to compute their hashes.