Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle multi-step chapter load process more reliably in VitalSource integration #4145

Merged
merged 7 commits into from
Jan 26, 2022

Conversation

robertknight
Copy link
Member

@robertknight robertknight commented Jan 24, 2022

Injection of the client into a new chapter/section after navigating chapters/pages in VitalSource did not always work reliably. The issue is that the real VS viewer loads new chapters/pages in multiple steps:

  1. It removes the iframe for the previous chapter/page
  2. It creates an iframe for the new chapter/page. The iframe URL that is initially loaded returns a "blank" page containing some encrypted data for the book. You can see this if you look at the document request made after a navigation in devtools: Initial content request
  3. Finally it makes a form submission using the content from the initial request as a payload. This returns the actual HTML: Decoded content request

The client previously tried to inject itself as soon as it detected the new iframe, and did not react if that iframe was later navigated. As a result, it would sometimes inject itself after stage 2 and failed to re-inject itself after stage 3. As a result, the new chapter content could not be annotated.

Summary of changes:

  1. The first commit updates the test page at http://localhost:3000/document/vitalsource-epub to emulate this behavior of the real VS viewer
  2. The the second commit updates the VitalSource integration to handle this behavior more reliably, by avoiding injecting the client into the initial "blank" frame and listening for load events on the new chapter's iframe to respond to the real content loading

The VitalSource EPUB test page was missing an important behavior of the
real VS viewer, where the content loads in multiple steps - see code
comments for details. The client does not load reliably into the new
chapter after a navigation due to this behavior of the VS viewer.
Navigations between sections of a book in VitalSource happen in several
steps:

 1. The iframe for the previous chapter is removed
 2. An iframe is created for the new chapter. The initial load of this
    iframe contains a "blank" page with the chapter content in an
    encoded/encrypted form as hidden text in the page.
 3. The chapter content data is submitted to the VS backend via a form
    POST request, which returns the decoded content

Previously the Hypothesis client would sometimes detect the new iframe
after step 2 and inject the client before step 3 had completed. As a
result the client would be injected into the blank frame, then
immediately unloaded, and not re-loaded into the "real" content frame.

Make injection more reliable by:

 1. Checking whether the content frame is a "blank" frame or not before
    injecting the client.
 2. Listening for `load` events on the frame and re-injecting the client
    if the iframe is navigated
I don't know if these scenarios can actually happen in the real VS viewer, but
this feels like appropriate defensive coding.
@codecov
Copy link

codecov bot commented Jan 24, 2022

Codecov Report

Merging #4145 (dab139a) into master (b1b5204) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4145   +/-   ##
=======================================
  Coverage   99.08%   99.08%           
=======================================
  Files         217      217           
  Lines        8175     8184    +9     
  Branches     1917     1918    +1     
=======================================
+ Hits         8100     8109    +9     
  Misses         75       75           
Impacted Files Coverage Δ
src/annotator/integrations/vitalsource.js 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1b5204...dab139a. Read the comment docs.

@robertknight robertknight requested a review from esanzgar January 24, 2022 16:14
Copy link
Contributor

@esanzgar esanzgar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested the PR and it effectively injects the Hypothesis client into the VS iframe on chapter navigation.

I have a concern about checking the iframe's document before it is ready. I would suggest a revision of that.

@@ -27,7 +27,7 @@

let chapterIndex = 0;

const setChapter = index => {
const setChapter = (index, initialLoad = false) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would suggest to pass an optional object as the second argument:

Suggested change
const setChapter = (index, initialLoad = false) => {
const setChapter = (index, {initialLoad = false} = {}) => {

That way the function is called either: setChapter(index) or setChapter(index, {initialLoad: true}). In the second case, there is no need to comment what it means the second argument (setChapter(0, true /* initialLoad */))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could simplify and always go through the chapter navigations step, even for the initial loading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would suggest to pass an optional object as the second argument:

Done!

Comment on lines 61 to 64
this.contentFrame.src = 'about:blank';
setTimeout(() => {
this.contentFrame.src = chapterURL;
}, 50);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simulation mimics the step #3 in a slightly different way. It changes the src attribute of the iframe instead of loading a new document follow by an internal navigation. However, both actions trigger the loading of a new document.

Maybe a more faithful reproduction would have been:

Suggested change
this.contentFrame.src = 'about:blank';
setTimeout(() => {
this.contentFrame.src = chapterURL;
}, 50);
this.contentFrame.srcdoc = `<script>setTimeout(() => location = '${chapterURL}', 50)<\/script>`;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Since this is a same-origin iframe, this can be done without a script: contentFrame.contentWindow.location.href = <new URL>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the way the final document loads so that it doesn't modify the src attribute, which better matches the real VitalSource viewer.

Comment on lines 63 to 66
// Check if this frame contains ebook content, as opposed to being a
// "blank" frame containing encrypted book content, as hidden text,
// which is created initially after a chapter navigation. These "blank"
// pages are replaced with the real content after a form submission.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest avoiding the term blank so it is not confused with about:blank from the mustache example:

Suggested change
// Check if this frame contains ebook content, as opposed to being a
// "blank" frame containing encrypted book content, as hidden text,
// which is created initially after a chapter navigation. These "blank"
// pages are replaced with the real content after a form submission.
// Check if this ebook document contains visible content, as opposed to being a
// page containing encrypted book content, as hidden text,
// which is created initially after a chapter navigation. These
// pages with invisible content are replaced with the real content after a form submission.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I have revised this comment to avoid using the term "blank" to describe the initial page that we receive after a navigation.

//
// The format of the decoded HTML can vary, but as a simple heuristic,
// we look for a text paragraph.
const isBookContent = frame.contentDocument?.querySelector('p');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The document could be not ready yet. I suggest you wait for the readiness of the document before making the query. annotator.injectClient does that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the document is not ready yet, this check may fail. However in that case we'll retry when the load event is emitted after the document finishes loading.

@@ -55,12 +55,36 @@ export class VitalSourceContainerIntegration {
throw new Error('Book container element not found');
}

/** @type {WeakSet<HTMLIFrameElement>} */
const contentFrames = new WeakSet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is to track if the Hypothesis client has already been injected into this iframe's document. However, it stores the iframe element.

Is re-injection of the client possible?

I think this check is unnecessary if hasHypothesis function worked.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is to track if the Hypothesis client has already been injected into this iframe's document.

This is used to track whether we have set up the load event listener on the frame.

The real VitalSource viewer loads the final URL with a form submission,
which does not update the `src` attribute of the iframe. Make the demo
page also load the app in a way that doesn't modify the parent frame's
DOM, to ensure the client is not relying on that.
Clarify what happens if we try to test whether a VitalSource frame
contains "real content", before it has finished loading.
@robertknight
Copy link
Member Author

Thanks for the feedback. I updated various comments and made some changes to the VitalSource EPUB test page as suggested.

@robertknight robertknight merged commit 40d3fa8 into master Jan 26, 2022
@robertknight robertknight deleted the handle-vs-two-step-load branch January 26, 2022 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants