Browser caching does not work for the first range request on Chrome and Safari #11624

alexandre-sarfati · 2020-02-24T12:32:18Z

Dear community,

When one activates range requests with disableAutoFetch: true and disableStream: true and http server configured to cache contents with public, max-age=31536000, immutable, browser caching does work on Chrome, nor on Safari.

But the first range request return a code 200 and is not cached by the browser, until all the ranges has been download.

Attach (recommended) or Link to PDF file here: not file specific, but one can use https://public.fays.io/public/580af700-5f79-44c4-8a0c-f7e976ee528f.pdf

The server configuration is:

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, HEAD, PUT, POST
Access-Control-Expose-Headers: Accept-Ranges, Content-Range, Content-Length, x-amz-request-id, x-amz-id-2, ETag
Access-Control-Max-Age: 31536000
Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
Cache-Control: public, max-age=31536000, immutable

Configuration:

Web browser and its version: Chrome Version 80.0.3987.116. Firefox does not yet cache range requests, so it is out of scope.
Operating system and its version: mac os x 10.15.3 
PDF.js version: 2.2.228
Is a browser extension: no

Steps to reproduce the problem:

Clear the browser cache
Go to https://app.fays.io/bf205bf6-43d2-4140-8140-ab4b6f48c2a5 and open the developper tool. This viewer should work like the integrated viewer of PDF.JS
Refresh the tab

What is the expected behavior?
The expected behavior is that the browser caches all the HTTP requests, including the first one with response code 200.

What went wrong?
After step 3., one can see in the developer tool that the range requests have been cached but not the first request.

The first request has specific header for CORS:

:authority: public.fays.io
:method: GET
:path: /public/2ce2766d-b491-43a0-a0a9-96c56c2bd667.pdf
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7,de;q=0.6
if-modified-since: Sun, 23 Feb 2020 23:43:01 GMT
if-none-match: "dd09a498518c55f91fe48b29bec2ffde"
origin: https://app.fays.io
range: bytes=0-196607
referer: https://app.fays.io/bf205bf6-43d2-4140-8140-ab4b6f48c2a5
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36

The request is not cached by the browser since PDF.JS cancels the first request.

pdf.js/src/display/fetch_stream.js

Lines 155 to 157 in 93aa613

    
           if (!this._isStreamingSupported && this._isRangeSupported) { 
        
             this.cancel(new AbortException("Streaming is disabled.")); 
        
           }

**Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension): **
https://app.fays.io/bf205bf6-43d2-4140-8140-ab4b6f48c2a5

Best, A.

The text was updated successfully, but these errors were encountered:

Snuffleupagus · 2020-02-24T12:49:59Z

The request is not cached by the browser since PDF.JS cancels the first request ?

Well, if you set disableStream = true then you're explicitly telling the PDF.js code to abort the initial (200) request and instead fetch the PDF document using range (206) requests. Quite frankly, if the initial request wasn't aborted there'd be no point in actually having the disableStream option.

Still, when the browser has downloaded all the ranges (step 4.), then the request is finally cached On Chrome (but not on Safari):

It seems that, again, your problem is related to the differences in how various browsers have chosen to implement caching of network requests.

All-in-all, I'm afraid that it does not seem to be much (if anything) actionable here from a PDF.js perspective.

alexandre-sarfati · 2020-02-24T12:58:45Z

Well, if you set disableStream = true then you're explicitly telling the PDF.js code to abort the initial (200) request and instead fetch the PDF document using range (206) requests. Quite frankly, if the initial request wasn't aborted there'd be no point in actually having the disableStream option.

Yes this is the expected behavior of disableStream in this scenario. I am wondering if this is possible to replace the first request by another type of request.

It seems that, again, your problem is related to the differences in how various browsers have chosen to implement caching of network requests.

Yes

Snuffleupagus · 2020-02-24T13:25:06Z

I am wondering if this is possible to replace the first request by another type of request.

First of all, you need to dispatch a regular (200) request to check if range requests are actually supported and correctly implemented by the server, see

pdf.js/src/display/fetch_stream.js

Lines 137 to 147 in dd893d5

    
           const { 
        
             allowRangeRequests, 
        
             suggestedLength, 
        
           } = validateRangeRequestCapabilities({ 
        
             getResponseHeader, 
        
             isHttp: this._stream.isHttp, 
        
             rangeChunkSize: this._rangeChunkSize, 
        
             disableRange: this._disableRange, 
        
           }); 
        
           this._isRangeSupported = allowRangeRequests;

, or

pdf.js/src/display/network.js

Lines 320 to 332 in dd893d5

    
           const { 
        
             allowRangeRequests, 
        
             suggestedLength, 
        
           } = validateRangeRequestCapabilities({ 
        
             getResponseHeader, 
        
             isHttp: this._manager.isHttp, 
        
             rangeChunkSize: this._rangeChunkSize, 
        
             disableRange: this._disableRange, 
        
           }); 
        
           if (allowRangeRequests) { 
        
             this._isRangeSupported = true; 
        
           }

, and also

pdf.js/src/display/network_utils.js

Lines 23 to 62 in dd893d5

    
           function validateRangeRequestCapabilities({ 
        
             getResponseHeader, 
        
             isHttp, 
        
             rangeChunkSize, 
        
             disableRange, 
        
           }) { 
        
             assert(rangeChunkSize > 0, "Range chunk size must be larger than zero"); 
        
             const returnValues = { 
        
               allowRangeRequests: false, 
        
               suggestedLength: undefined, 
        
             }; 
        
             const length = parseInt(getResponseHeader("Content-Length"), 10); 
        
             if (!Number.isInteger(length)) { 
        
               return returnValues; 
        
             } 
        
             returnValues.suggestedLength = length; 
        
             if (length <= 2 * rangeChunkSize) { 
        
               // The file size is smaller than the size of two chunks, so it does not 
        
               // make any sense to abort the request and retry with a range request. 
        
               return returnValues; 
        
             } 
        
             if (disableRange || !isHttp) { 
        
               return returnValues; 
        
             } 
        
             if (getResponseHeader("Accept-Ranges") !== "bytes") { 
        
               return returnValues; 
        
             } 
        
             const contentEncoding = getResponseHeader("Content-Encoding") || "identity"; 
        
             if (contentEncoding !== "identity") { 
        
               return returnValues; 
        
             } 
        
             returnValues.allowRangeRequests = true; 
        
             return returnValues; 
        
           }

Second of all, in the default case where streaming is used the initial (200) request is simply allowed to continue. Not only is that the most efficient solution, but it also avoids errors with servers that don't allow a PDF document to be fetched more than once.

alexandre-sarfati · 2020-02-24T16:58:20Z

I found that the following is not mandatory on my computer. I skip this part with an if:

pdf.js/src/display/fetch_stream.js

Lines 119 to 159 in dd893d5

    
           fetch( 
        
             url, 
        
             createFetchOptions( 
        
               this._headers, 
        
               this._withCredentials, 
        
               this._abortController 
        
             ) 
        
           ) 
        
             .then(response => { 
        
               if (!validateResponseStatus(response.status)) { 
        
                 throw createResponseStatusError(response.status, url); 
        
               } 
        
               this._reader = response.body.getReader(); 
        
               this._headersCapability.resolve(); 
        
               const getResponseHeader = name => { 
        
                 return response.headers.get(name); 
        
               }; 
        
               const { 
        
                 allowRangeRequests, 
        
                 suggestedLength, 
        
               } = validateRangeRequestCapabilities({ 
        
                 getResponseHeader, 
        
                 isHttp: this._stream.isHttp, 
        
                 rangeChunkSize: this._rangeChunkSize, 
        
                 disableRange: this._disableRange, 
        
               }); 
        
               this._isRangeSupported = allowRangeRequests; 
        
               // Setting right content length. 
        
               this._contentLength = suggestedLength || this._contentLength; 
        
               this._filename = extractFilenameFromHeader(getResponseHeader); 
        
               // We need to stop reading when range is supported and streaming is 
        
               // disabled. 
        
               if (!this._isStreamingSupported && this._isRangeSupported) { 
        
                 this.cancel(new AbortException("Streaming is disabled.")); 
        
               } 
        
             }) 
        
             .catch(this._headersCapability.reject);

And an else condition this._headersCapability.resolve();
It works if the parameter "length" (e.g. { url: URL, length: 1000 }) is set to the contentLength of the file.

Snuffleupagus · 2020-02-24T19:32:06Z

I don't really understand what #11624 (comment) even suggests removing/skipping, but it's obviously not appropriate to simply remove a fetch call (assuming that's what was actually suggested).

Furthermore, as was clearly mentioned in #11624 (comment) and is hopefully evident if you carefully read through the validateRangeRequestCapabilities function, there's a fair amount of validation of the server response that needs to happen before range requests can actually be used.

I found that the following is not mandatory on my computer.

It's hopefully easy to see why a single example doesn't really mean anything in general, and why you cannot assume that random servers on the internet are always correctly configured.

And an else condition this._headersCapability.resolve();

It's not really clear what you're suggesting here, but it's obviously not a good idea to purposely add options that would allow users to essentially "shot themselves in the foot" by disabling validation which is necessary for the correct function of the PDF.js library.

All-in-all, I'm really not seeing anything that can be fixed here (from a PDF.js perspective) and would thus suggest that the issue should be closed.

alexandre-sarfati · 2020-02-24T20:32:08Z

@Snuffleupagus I totally agree with you. I am looking for a specific solution for a specific server setting to cache this preflight 200 and be able to load it offline!

Snuffleupagus · 2020-02-24T21:38:36Z

I am looking for a specific solution for a specific server setting to cache this preflight 200 and be able to load it offline!

Given that caching of range requests are not well supported across browsers, have you instead considered not depending so heavily on that and simply load the PDF document with default values for the relevant disable... options.
With modern browsers (i.e. those supporting the Fetch API), that would thus imply that most (if not all) of the PDF document is fetched with the initial (200) request, which browsers should have no trouble caching. Note also that, generally speaking, streaming is usually faster than range requests too.

alexandre-sarfati · 2020-02-24T22:18:20Z

@Snuffleupagus In fact, I have successfully tested offline with #11624 (comment) for the following document and range request activated: https://public.fays.io/public/2ce2766d-b491-43a0-a0a9-96c56c2bd667.pdf 💯

I try with the PDF.JS integrated viewer. When one turns off Wifi and refreshes the tab 10 times it still loads the document and one may consult the previously requested pages. 🥇

Still, I did not yet manage to make it works when one shutdowns the browser. 👎 One may try to use a production environment and see what happens with the ServiceWorker.

Of course, as you mentionned, this is not a general solution at all. Just a try. This feature can not be defaulted for every PDF.JS users.

But I find out that the first request (code 200) is not mandatory because the preflight CORS check is also performed by the browser if the request has a "range" header (for all browser, this is in the specs, who knows ?).

As you said:

if you carefully read through the validateRangeRequestCapabilities function, there's a fair amount of validation of the server response that needs to happen before range requests can actually be used.

The first request is mandatory when one does not know if the server accept ranges and to get the size of the document, this is specified on validateRangeRequestCapabilities

Still, the first request is an option if the user can ensure by itself that validateRangeRequestCapabilities is always true.
For example, one can access the size using an alternative strategy (not related to PDF.JS) and if one wants to force range requests and set disableStream and disableAutoFetch.

But all of this requires more investigation 🔢

💯 Therefore, it will be nice to have a way to override this part easily outside PDF.JS but I don't know how to do it properly. Can one extends PDFFetchStreamReader ? 💯

have you instead considered not depending so heavily on that and simply load the PDF document with default values for the relevant disable... options.

I confirm that offline already works when disableStream and disableAutoFetch are not set. 👍

Given that caching of range requests are not well supported across browsers

In fact, it works well for range requests on Safari, Chrome, Opera (including mobile?), I have not tested it on Edge or any alternatives and they plan to implement the feature on Firefox.

Snuffleupagus · 2020-02-24T23:37:37Z

Still, the first request is an option if the user can ensure by itself that validateRangeRequestCapabilities is always true.

You want to trust the server, rather than the user here[1], to avoid a support nightmare later on.
As was clearly mentioned in #11624 (comment) we definitely do not want any foot guns in the options here, hence that won't happen!

Besides, it's really not at all correct to make the changes you're suggesting since it'd make PDFFetchStreamReader unable to read any data (which is bound to break lots of other things).

Therefore, it will be nice to have a way to override this part easily outside PDF.JS but I don't know how to do it properly. Can one extends PDFFetchStreamReader?

No, that's considered internal functionality and it's consequently not exposed in the public API. (You'd need to fork the code, but as I've tried to explain multiple times the changes you're suggesting aren't really a good idea.)

The "correct" way of handling special cases, with regards to fetching of data, would be to utilize PDFDataRangeTransport instead. That allows completely custom data delivery, that you thus can implement in what ever way you want/need in your case (it's being used in the PDF Viewer that's built-in to the Firefox browser); see #11453 (comment) for some relevant links.

This issue really ought to remain closed. /cc @timvandermeij

[1] Previous experiences have shown that users may, more often than you'd like, set certain options without always fully understanding the ramifications of doing so.

alexandre-sarfati · 2020-02-25T09:21:19Z

But all of this requires more investigation 🔢

The "correct" way of handling special cases, with regards to fetching of data, would be to utilize PDFDataRangeTransport instead. That allows completely custom data delivery, that you thus can implement in what ever way you want/need in your case (it's being used in the PDF Viewer that's built-in to the Firefox browser); see #11453 (comment) for some relevant links.

Thank you for this @Snuffleupagus
Can one extends PDFDataRangeTransport in Typescript ?

timvandermeij · 2020-02-25T22:05:24Z

Yes, it's part of the public API; see also

pdf.js/src/display/api.js

Line 177 in 1240048

* @param {string|TypedArray|DocumentInitParameters|PDFDataRangeTransport} src

Closing as answered since there is not really anything we can/should do here.

alexandre-sarfati mentioned this issue Feb 24, 2020

Browser caching does not work for the first range request on Chrome and Safari whatwg/fetch#1003

Closed

alexandre-sarfati closed this as completed Feb 24, 2020

alexandre-sarfati reopened this Feb 24, 2020

timvandermeij closed this as completed Feb 25, 2020

snyk-bot mentioned this issue Mar 7, 2022

[Snyk] Security upgrade react-scripts from 3.2.0 to 5.0.0 yoonpsu/pdf.js#20

Open

yoonpsu mentioned this issue Apr 20, 2022

[Snyk] Security upgrade react-scripts from 3.2.0 to 5.0.0 yoonpsu/pdf.js#21

Open

yoonpsu mentioned this issue Oct 25, 2022

[Snyk] Security upgrade react-scripts from 3.2.0 to 5.0.0 yoonpsu/pdf.js#22

Open

snyk-bot mentioned this issue Nov 5, 2022

[Snyk] Security upgrade react-scripts from 3.2.0 to 5.0.0 yoonpsu/pdf.js#23

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Browser caching does not work for the first range request on Chrome and Safari #11624

Browser caching does not work for the first range request on Chrome and Safari #11624

alexandre-sarfati commented Feb 24, 2020 •

edited

Loading

Snuffleupagus commented Feb 24, 2020

alexandre-sarfati commented Feb 24, 2020 •

edited

Loading

Snuffleupagus commented Feb 24, 2020 •

edited

Loading

alexandre-sarfati commented Feb 24, 2020 •

edited

Loading

Snuffleupagus commented Feb 24, 2020 •

edited

Loading

alexandre-sarfati commented Feb 24, 2020

Snuffleupagus commented Feb 24, 2020 •

edited

Loading

alexandre-sarfati commented Feb 24, 2020 •

edited

Loading

Snuffleupagus commented Feb 24, 2020

alexandre-sarfati commented Feb 25, 2020 •

edited

Loading

timvandermeij commented Feb 25, 2020 •

edited

Loading

Browser caching does not work for the first range request on Chrome and Safari #11624

Browser caching does not work for the first range request on Chrome and Safari #11624

Comments

alexandre-sarfati commented Feb 24, 2020 • edited Loading

Snuffleupagus commented Feb 24, 2020

alexandre-sarfati commented Feb 24, 2020 • edited Loading

Snuffleupagus commented Feb 24, 2020 • edited Loading

alexandre-sarfati commented Feb 24, 2020 • edited Loading

Snuffleupagus commented Feb 24, 2020 • edited Loading

alexandre-sarfati commented Feb 24, 2020

Snuffleupagus commented Feb 24, 2020 • edited Loading

alexandre-sarfati commented Feb 24, 2020 • edited Loading

Snuffleupagus commented Feb 24, 2020

alexandre-sarfati commented Feb 25, 2020 • edited Loading

timvandermeij commented Feb 25, 2020 • edited Loading

alexandre-sarfati commented Feb 24, 2020 •

edited

Loading

alexandre-sarfati commented Feb 24, 2020 •

edited

Loading

Snuffleupagus commented Feb 24, 2020 •

edited

Loading

alexandre-sarfati commented Feb 24, 2020 •

edited

Loading

Snuffleupagus commented Feb 24, 2020 •

edited

Loading

Snuffleupagus commented Feb 24, 2020 •

edited

Loading

alexandre-sarfati commented Feb 24, 2020 •

edited

Loading

alexandre-sarfati commented Feb 25, 2020 •

edited

Loading

timvandermeij commented Feb 25, 2020 •

edited

Loading