Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster downloads #798

Merged
merged 8 commits into from
Oct 20, 2023
Merged

Faster downloads #798

merged 8 commits into from
Oct 20, 2023

Conversation

BryceStevenWilley
Copy link
Collaborator

Speeds up the downloads by directly downloading the file instead of trying to rely on response callbacks, which almost always timeout.

I made things a bit resilient; I think things work fine without all of the failed_to_download stuff (see https://github.com/SuffolkLITLab/ALKiln/actions/runs/6556325525/job/17806135932, an earlier test run without that code), but figured I'd keep the existing code in, in case there are situations we couldn't foresee. Happy to rip out the old code if we want to though.

I tried a lot of different options, none of them seemed to work particularly well:

  • https.get doesn't work with http://localhost (might be a me problem, but IMO, it's frustrating), the workaround is hacky and uses both https.get and http.get.
    • I also had issues getting the file to download still? Just kept getting File Not Found. I think it's because DA by default checks the cookies / logged in status of who's downloading a file (that's why you need to use url_for(private=False)). So in general, trying to access the file from outside of the browser session is going to be tricky, if not impossible.
  • You can't pass back a blob of buffer object directly from the browser (Cannot return blob or arrayBuffer from page.evaluate puppeteer/puppeteer#3722). I took the suggested code from that issue and used it here.

In this PR, I have:

Links to any solved or related issues

#492

Any manual testing I have done to ensure my PR is working

Running a test that downloads a PDF (@o17) runs in 3.4 seconds now, as opposed to 33 seconds on v5.

Copy link
Collaborator

@plocket plocket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speeding this up will be a huge win. I think we may need to check on some stuff first as this may be a bit more complex than it would first appear, but maybe I'm off - I feel like I'm missing some of this picture. Anyway, some notes here and some in the review comments. At least some of them are relevant.

Numbered for easier referencing:

  1. Do we really need to update shrinkwrap?
  2. Added some comments about adding messages to let the author know what's going on.
  3. Should we add something in the report when the download has succeeded? Before we didn't know if it had succeeded, but now we should.
  4. I think this would be especially good to add to the changelog as an internal change.
  5. Does this change handle the timeout running out? Previously, we'd go till the timeout and just continue the tests regardless of what happened (since we couldn't tell what happened). If the files didn't download, there wasn't anything to do about it. Now, though, we may need to handle the timeout if our await goes too long. Not sure if we'd want to error or warn at that point. We may want to add a test for whatever behavior we choose - make a huge image file or something to download.
  6. When comparing downloads, did we make a provision for if/when the download doesn't exist (since it may not complete in time)? I'll go check when I have a mo. It looks like we didn't, so we might need to take care of that in another PR fairly soon.

lib/scope.js Outdated Show resolved Hide resolved
lib/scope.js Outdated Show resolved Hide resolved
lib/scope.js Show resolved Hide resolved
lib/scope.js Show resolved Hide resolved
@BryceStevenWilley
Copy link
Collaborator Author

Do we really need to update shrinkwrap?

That happens when I run npm install, which I needed to do after #755 merged.

Should we add something in the report when the download has succeeded? Before we didn't know if it had succeeded, but now we should.
I think this would be especially good to add to the changelog as an internal change.

👍 (I've been waiting for changelog stuff when I put out multiple PRs b/c I don't know which will go out first)

Does this change handle the timeout running out? Previously, we'd go till the timeout and just continue the tests regardless of what happened (since we couldn't tell what happened). If the files didn't download, there wasn't anything to do about it. Now, though, we may need to handle the timeout if our await goes too long. Not sure if we'd want to error or warn at that point. We may want to add a test for whatever behavior we choose - make a huge image file or something to download.

Good points. IMO I'd rather error, since if we tried to download something and didn't download it, something's likely wrong with the server (testing might be hard; even really big stuff will download quickly on the same machine).

When comparing downloads, did we make a provision for if/when the download doesn't exist (since it may not complete in time)?

This will exist though? It completes in this step (unless there's a timeout, and I that case I think we should error, as mentioned above).

@plocket
Copy link
Collaborator

plocket commented Oct 19, 2023

  1. shrinkwrap

I think we could start using npm ci for this. It's meant for pipelines, but I think it would serve the purpose we're looking for - installing a consistent version based on the lockfile each time. When we need to update for security or for new packages we would, of course, just npm install.

Another option is to use npm install --no-save in future to avoid updating shrinkwrap, but that unfortunately doesn't really make things stable - the shrinkwrap file doesn't change, but your local environment is still different than what the shrinkwrap uses. It looks like we all end up using different shrinkwraps.

I wonder if item 4 here indicates that what happened to your shrinkwrap is a bug and may be worth reporting.

IMO I'd rather error

Sounds good to me 👍

testing might be hard

Yeah, we probably won't be able to write tests that will consistently pass (by failing) until we do some research about implementing throttling.

[The download] completes in this step [unless it errors]

Good point. I can still see a situation where someone would forget to include a download step before comparing. We might handle that in a different issue, though.

@plocket
Copy link
Collaborator

plocket commented Oct 19, 2023

Not sure if you saw that note about a success message, so just repeating that here.

@BryceStevenWilley
Copy link
Collaborator Author

I've been a bit lenient about shrinkwrap changes. I'll avoid them more in the future (though this time it's because the version in package.json on main is still 5.1.7-sources-7, like I pointed out in #755 (comment) 👀 , not a bug).

lib/scope.js Outdated Show resolved Hide resolved
@plocket
Copy link
Collaborator

plocket commented Oct 19, 2023

testing might be hard; even really big stuff will download quickly on the same machine

It occurred to me that we can decrease the max timeout for the Step to a fraction of a second and thus trigger a timeout. Would that work for our purposes? It's only one type of error, but that may still be worth testing.

@BryceStevenWilley
Copy link
Collaborator Author

It occurred to me that we can decrease the max timeout for the Step to a fraction of a second and thus trigger a timeout.

Good point, I can try that! IMO, we can try to manually and see how it does. I don't think we need to add it to the test suite though.

lib/scope.js Outdated Show resolved Hide resolved
Copy link
Collaborator

@plocket plocket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good to me! I think once you feel comfortable with it, we can send it out into the wild!

@BryceStevenWilley
Copy link
Collaborator Author

It occurred to me that we can decrease the max timeout for the Step to a fraction of a second and thus trigger a timeout.

I did this, and got the following printout in the cucumber report:

1) Scenario: I compare the same PDFs (attempt 2) # docassemble/ALKilnTests/data/sources/observation_steps.feature:220
   ✔ Before # lib/steps.js:104
   ✔ Given I start the interview at "test_pdf" # lib/steps.js:165
   ✔ Then the question id should be "proxy vars" # lib/steps.js:453
   ✔ When I set the var "x[i].name.first" to "Proxyname1" # lib/steps.js:939
   ✔ And I tap to continue # lib/steps.js:740
   ✔ Then I sign # lib/steps.js:984
   ✔ And I tap to continue # lib/steps.js:740
   ✔ Then the question id should be "simple doc" # lib/steps.js:453
   ✔ Given the max seconds for each step in this scenario is 0.1 # lib/steps.js:279
   ✖ When I download "simple-doc.pdf" # lib/steps.js:999
       Error: Action did not complete within 0 milliseconds
           at Timeout.<anonymous> (/home/brycew/Developer/LITLab/code/ALKiln/lib/cucumber_8_shim.js:18:14)
           at listOnTimeout (node:internal/timers:569:17)
           at process.processTimers (node:internal/timers:512:7)
   - And I expect the baseline PDF "simple-doc-Baseline.pdf" and the new PDF "simple-doc.pdf" to be the same # lib/steps.js:708
   ✔ After # lib/steps.js:1113

The PDF file did actually download fine, because I guess the download had enough time to start. The conflicting states isn't great, but I'm assuming that the presence of a clear "Action did not complete within " message is good enough for people. I tried uploading a larger PDF to do this with, but ran into issues (DA doesn't like too big of files, and it doesn't like PDFs without fields, all of the big PDFs I have don't have fields and it'd be difficult to add them to it).

I'll add to the changelog now and merge!

BryceStevenWilley and others added 8 commits October 20, 2023 15:58
The existing `content-disposition` doesn't work because there are no
content disposition headers.

From https://thom4.net/2018/puppeteer-download-file/
* revert shrinkwrap changes
* get back more detailed error info from browser download
* write both success and failure to download with new method to report
Co-authored-by: plocket <52798256+plocket@users.noreply.github.com>
Co-authored-by: plocket <52798256+plocket@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants