-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(toolkit): ensure consistent zip files #2931
Conversation
Not 100% sure how to correctly unit test this... but I can confirm that I finally have the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this on!
packages/aws-cdk/lib/archive.ts
Outdated
output.on('open', async () => { | ||
archive.pipe(output); | ||
|
||
const contents = await Promise.all(files.map(async (file) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will load all files into memory concurrently. Not sure I like that for very big zips (tens to hundreds of megabytes).
Also, don't see the use of doing this in parallel since this task should be I/O bound, as far as I can estimate.
Can you test and verify whether the speed makes a difference? If it does, then I would prefer some kind of batching (chop the list into batches of 10 and do those in parallel) to keep memory consumption under control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understand, will have a look at it
@@ -2,7 +2,7 @@ | |||
"compilerOptions": { | |||
"target":"ES2018", | |||
"module": "commonjs", | |||
"lib": ["es2016", "es2017.object", "es2017.string"], | |||
"lib": ["es2016", "es2017.object", "es2017.string", "dom"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed for @types/jszip
const stream = fs.createReadStream(path.join(directory, file)); | ||
archive.append(stream, { | ||
name: file, | ||
date: new Date('1980-01-01T00:00:00.000Z'), // reset dates to get the same hash for the same content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on my system specifying new Date(0)
here seems to be translated to 1980-01-01T00:00:00.000Z
for the files in the zip... this means that I cannot write a unit test for the date reset => setting to 1980-01-01T00:00:00.000Z
packages/aws-cdk/lib/archive.ts
Outdated
|
||
files.forEach(file => { // Append files serially to ensure file order | ||
const stream = fs.createReadStream(path.join(directory, file)); | ||
archive.append(stream, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't NodeJS streams async? Are we 100% positive that after this statement, the stream has been fully read and added? Or are we sure that .finalize()
will synchronously read and flush all streams? Do we need to await finalize()
?
It concerns me that I don't understand the API fully, and the documentation doesn't seem to be very explicit about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the example in the doc: finalize the archive (ie we are done appending files but streams have to finish yet)
From the doc: Finalizes the instance and prevents further appending to the archive structure (queue will continue til drained).
I've successfully created zips containing multiple files of more than 50MB with this code (resulting in zip files of several hundreds of MB).
Looks like finalize()
is a blocking statement.
zipDirectory
returns a promise that is resolved only once we get the close
event on output
, this is fired only when archiver has been finalized and the output file descriptor has closed (output.once('close', () => ok());
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also possible to call archive.file()
and let the library handle the read stream creation (need to check if the sort order is preserved across zips in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested archive.file()
and file order is not preserved across zips. If think that the only way to go here is to use archive.append()
with fs.createReadStream()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, yes you are right, I had missed this:
output.once('close', () => ok());
I'm just very scared of accidental asyncness where you're not expecting it :(
Thanks, will merge when the validation build finishes.
Zip files were not consistent across deploys resulting in unnecessary S3 uploads and stack updates.
Ensure consistency by appending files in series (guarantees file ordering in the zip) and reseting
dates (guarantees same hash for same content).
Closes #1997, Closes #2759
Pull Request Checklist
design
folderBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license.