
[MR-2875] Bulk Downloads 502 Error #1373

Merged
merged 9 commits into from
Dec 13, 2022

Conversation

@macrael (Contributor) commented Dec 8, 2022

Summary

This PR resolves our issue with bulk downloads not working.

The bug was ultimately due to the way we were invoking the zip archive, triggered by AWS SDK v3 returning a stream instead of a buffer (I'm pretty sure it was a buffer before, but it was definitely different). I'm still a little confused by this, because we appear to be handling the archive exactly as documented in their examples, but as written it was crashing the bulk download lambda every time.

The zip archive was configured inside of a promise, writing to a stream called streamPassThrough. The promise was set to resolve when that stream emitted the close or end event. After the promise resolved, we passed that stream to s3.PutObject to be uploaded. With the internal AWS v3 change to pass back a stream instead of a buffer, it appears that the close and end events are no longer emitted during the zip process, so I think the lambda was killed for being promise-locked. I'm still not 100% clear on this, because however the lambda was dying, it was not getting caught by any of our catch clauses; it just died mid-stream as far as I could tell. My main fix here was to take the zip out of the promise and just await the s3 upload at the end to sync all the stream processing.
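The shape of the fix can be sketched with only Node built-ins (the real code uses the archiver library and an S3 upload; the producer/consumer names here are illustrative, not our actual code): instead of resolving a promise on the PassThrough's close/end events, kick off the producer and await the consumer.

```typescript
import { PassThrough, Readable } from "stream";

// Stand-in for the zip step: pipe source data into a PassThrough.
// Note there is no promise resolved on 'close'/'end' of the stream --
// those events may never fire until something downstream consumes it.
function produceArchive(sources: Readable[]): PassThrough {
    const streamPassThrough = new PassThrough();
    (async () => {
        for (const src of sources) {
            for await (const chunk of src) {
                streamPassThrough.write(chunk);
            }
        }
        streamPassThrough.end();
    })();
    return streamPassThrough;
}

// Stand-in for the upload step: awaiting the *consumer* is what syncs
// all the stream processing, as the PR does with the S3 upload.
async function consumeToString(stream: PassThrough): Promise<string> {
    const chunks: Buffer[] = [];
    for await (const chunk of stream) {
        chunks.push(Buffer.from(chunk));
    }
    return Buffer.concat(chunks).toString("utf8");
}

async function main() {
    const archive = produceArchive([
        Readable.from(["hello, "]),
        Readable.from(["world"]),
    ]);
    console.log(await consumeToString(archive)); // "hello, world"
}
main();
```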

The s3.PutObject command was not built to work with streams; when passed an incomplete stream, it errored. So I imported the s3 Upload command, a higher-level command built to work with streams. It can also manage tags, which consolidated some of our s3 activity.
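For reference, the higher-level `Upload` class ships in `@aws-sdk/lib-storage`. Since that SDK isn't available here, its use is shown as a comment following its documented interface, and the only live code is a small hypothetical helper (`toTagSet` and the tag values are illustrative, not the PR's actual code) that builds the tag shape it accepts:

```typescript
// Hypothetical helper: build the { Key, Value } tag array that the
// SDK's tagging APIs accept from a plain record.
type S3Tag = { Key: string; Value: string };

function toTagSet(tags: Record<string, string>): S3Tag[] {
    return Object.entries(tags).map(([Key, Value]) => ({ Key, Value }));
}

// Sketch of the streaming upload, per @aws-sdk/lib-storage's documented
// interface (commented out because the SDK isn't installed here):
//
//   import { Upload } from "@aws-sdk/lib-storage";
//   const upload = new Upload({
//       client: s3Client,                      // an S3Client instance
//       params: { Bucket, Key, Body: streamPassThrough },
//       tags: toTagSet({ example: "TRUE" }),   // illustrative tag
//   });
//   await upload.done();  // resolves once the stream is fully uploaded

console.log(toTagSet({ env: "prod" }));
```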

Finally, the first thing I did when I was looking at this was to try to straighten out the type of what we were getting back from our initial s3.GetObject calls. This was frustrating and ultimately fruitless: the type was in fact a Readable, which is what we were coercing the return type to anyway. I assumed that since the docs said s3 was now returning a stream, it would in fact be returning the ReadableStream type to us, and trying to get that type to fit into a Readable, which is what the zip archiver wanted, was a real pain. Here's what I learned:

  • ReadableStream is a browser standard that a bunch of tooling uses
  • Readable is a node type
  • (There used to be a node ReadableStream type which some of the Readable functions can still interact with, unrelated to the browser type)
  • node implements a set of the browser types itself; Readable.fromWeb() takes the new node ReadableStream type.
  • That node ReadableStream is actually slightly incompatible with the ReadableStream type implemented by browsers: it has some extra functions that are part of the standard but are not implemented by any browsers. This prevents you from passing the ReadableStream that AWS returns into the Readable.fromWeb() function that would get it into node land. 😡
  • Since AWS is returning a browser standard type, it expects the “dom” lib to be loaded in typescript. Without that it just returns a stub of ReadableStream that has no members.
  • Since it was such a pain to figure this all out, and I did get the right types connected in the end, I left all those changes in, even though in my testing that code path is never called b/c AWS always returns a Readable instance.
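The bridge between the two stream worlds can be sketched with Node built-ins alone (the function name is illustrative, not the PR's actual code); the instanceof guard mirrors the finding that in a Node lambda the SDK always hands back a node Readable in practice:

```typescript
import { Readable } from "stream";
import { ReadableStream } from "stream/web";

function toNodeReadable(body: Readable | ReadableStream): Readable {
    if (body instanceof Readable) {
        return body; // the path AWS actually takes in Node
    }
    // Readable.fromWeb() wants *Node's* ReadableStream type; a value typed
    // with the browser/"dom" ReadableStream needs a cast to satisfy it.
    return Readable.fromWeb(body);
}

async function readAll(stream: Readable): Promise<string> {
    const chunks: Buffer[] = [];
    for await (const chunk of stream) {
        chunks.push(Buffer.from(chunk));
    }
    return Buffer.concat(chunks).toString("utf8");
}

// Usage: wrap a web-style stream and read it with node APIs.
const webStream = new ReadableStream<Uint8Array>({
    start(controller) {
        controller.enqueue(new TextEncoder().encode("zipped bytes"));
        controller.close();
    },
});
readAll(toNodeReadable(webStream)).then((s) => console.log(s)); // "zipped bytes"
```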

Related issues

https://qmacbis.atlassian.net/browse/MR-2875

@macrael macrael changed the title Bulk Downloads 502 Error [MR-2875] Bulk Downloads 502 Error Dec 8, 2022
}
})
)
} catch (e) {
@haworku (Contributor) commented Dec 8, 2022

😞 I wonder if this is a change we should make everywhere (catch errors, log for CloudWatch, and return a more specific error). Idk why I didn't notice we weren't try/catching our async actions in handlers; that should probably be our standard.

Also, eslint should warn on this too? Gonna check that out.
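The pattern suggested here can be sketched like so (handler and helper names are illustrative, not our actual code): wrap the async work, log for CloudWatch, and return a specific error response instead of letting the lambda die silently.

```typescript
type APIGatewayResult = { statusCode: number; body: string };

// Illustrative handler: takes the async work as a parameter so the
// try/catch shape is easy to see in isolation.
async function bulkDownloadHandler(
    doWork: () => Promise<string>
): Promise<APIGatewayResult> {
    try {
        const url = await doWork();
        return { statusCode: 200, body: JSON.stringify({ url }) };
    } catch (e) {
        // Surfaces in CloudWatch logs instead of an opaque 502.
        console.error("bulk download failed:", e);
        return {
            statusCode: 500,
            body: JSON.stringify({ error: "Bulk download failed" }),
        };
    }
}
```

On the eslint point: rules like `@typescript-eslint/no-floating-promises` can flag un-awaited promises, though a rejection inside an awaited call still needs a try/catch like the one above to be logged.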

@@ -2,7 +2,7 @@
"extends": "../../tsconfig.json",
"compilerOptions": {
"module": "commonjs",
- "lib": ["es2021"],
+ "lib": ["es2021", "DOM"],
Contributor

This feels... unexpected? Is this still needed, or is it left over from previous debugging?

Contributor Author (@macrael)

The s3 APIs return a ReadableStream, which is a browser type. Without this, the types from the AWS API were coming back without the right information. In the end we're not actually using that type, but I think we should keep this so the types stay valid.

@macrael macrael marked this pull request as ready for review December 12, 2022 17:05
@haworku (Contributor) left a comment

Looks good, I see things working.

Not directly related, but just sharing a finding from manual QA: the multi-rate behavior for the "download all rate documents" button just groups everything in the same folder. That surprised me; I was expecting folders or some kind of renaming of the files to group things a bit more. Not sure if we intentionally designed things that way or just never actively re-designed this feature with multi-rates.

@macrael (Contributor Author) commented Dec 12, 2022

I think we never re-designed it; that's a good question to bring up with the whole team.

@mojotalantikite (Contributor)

Cool, thanks for writing out that summary! Code looks good and things are working for me as expected in the review app.

@macrael macrael merged commit 28fd843 into main Dec 13, 2022
@macrael macrael deleted the wml-download-all-502 branch December 13, 2022 03:47