Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: add timeout to all protocol commands #6347

Merged
merged 5 commits into from
Oct 24, 2018

Conversation

connorjclark
Copy link
Collaborator

Fixes #6296

}

const value = response.result.value;
async _evaluateInContext(expression, contextId) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VSCode suggested this could be async, so I let it :) Thoughts on changing asynchronous syntax? FWIW, V8 just improved the performance of await to match Promises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on changing asynchronous syntax?

Totally fine. Since we adopted Node 8 as the base requirement we've mostly just followed the rule of only changing from Promises to async/await when we're touching the code for some other reason (see initial discussion in #3742), but to convert (or not) at will if that's the case.

We have some tricky uses of promises (see #6343 :) that need real review, which mass changes make difficult, especially since the indentation level changes and the git diffs end up not always the best, even when ignoring whitespace. We (I?) also end up using git blame a lot, and random changes in unrelated PRs make that annoying :)

It's definitely a fuzzy line, though, and sometimes drive by fixes are great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This case is totally different, though, because it's fundamentally changing anyways, so it's a prime candidate for fixing.)

@connorjclark connorjclark changed the title core: add timeout to all protocol commands (#6296) core: add timeout to all protocol commands Oct 19, 2018
Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so simple and clean I love it nice work!

I am slightly worried about the commands that are going to come out of the woodwork failing though. I would want to trying running LH on some crappier hardware on some gnarly URLs (@paulirish has a ton of phones :)) and find the longest commands we have right now.

});
});
// Encoding issues may lead to hanging getResponseBody calls: https://github.com/GoogleChrome/lighthouse/pull/4718
// driver.sendCommand will handle timeout after 5s.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this at 1s? expanding to 5 could dramatically slow down the failure case when 3 of these go wrong :/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I put it back to 1s

lighthouse-core/gather/driver.js Show resolved Hide resolved
* @param {number} timeout
*/
setProtocolTimeout(timeout) {
this._protocolTimeout = timeout;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I afraid we'll need to make timeout a setting on sendCommand with a global default :/

Copy link
Collaborator Author

@connorjclark connorjclark Oct 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move to that if it's really needed. It could complicate usage quite a bit, that + the unique typing on the method, I think the best option would be this:

/**
 * Call protocol methods.
 * @template {keyof LH.CrdpCommands} C
 * @param {C|{method: C, timeout: number}} methodOrOpts
 * @param {LH.CrdpCommands[C]['paramsType']} params,
 * @return {Promise<LH.CrdpCommands[C]['returnType']>}
 */
sendCommand(methodOrOpts, ...params) {
  const {method, timeout} = typeof methodOrOpts === 'object' ? methodOrOpts : {
    method: methodOrOpts,
    timeout: DEFAULT_PROTOCOL_TIMEOUT
  }
}

That way, the calling code for most use cases can remain the same.

proto/lighthouse-result.proto Outdated Show resolved Hide resolved
// Used when security error prevents page load.
INSECURE_DOCUMENT_REQUEST = 17;
// Used when security error prevents page load.
PROTOCOL_TIMEOUT = 18;
Copy link
Collaborator Author

@connorjclark connorjclark Oct 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed REQUEST_CONTENT_TIMEOUT ... instead of making a new enum value for PROTOCOL_TIMEOUT, could we just reuse REQUEST_CONTENT_TIMEOUT's enum (rename it)? Would that be OK for proto compat?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, can't rename.

But actually looks like we never exposed REQUEST_CONTENT_TIMEOUT in the proto. So I think we're OK. :)

lighthouse-core/lib/lh-error.js Show resolved Hide resolved
clearTimeout(asyncTimeout);
reject(err);
});
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe drop keep the promise just to the thing that has to be a promise? e.g.

return new Promise(async (resolve, reject) => {
  const asyncTimeout = setTimeout((_ => {
    const err = new LHError(LHError.errors.PROTOCOL_TIMEOUT);
    err.message += ` Method: ${method}`;
    reject(err);
  }), this._protocolTimeout);

  try {
    const result = await this._connection.sendCommand(method, ...params);
    clearTimeout(asyncTimeout);
    resolve(result);
  } catch (err) {
    clearTimeout(asyncTimeout);
    reject(err);
  }
});

@brendankenny
Copy link
Member

brendankenny commented Oct 19, 2018

ok, options:

  1. sendCommand(methodOrOpts, ...params)
  2. call setNextProtocolTimeout(60000) right before sendCommand() and have it reset itself after that next sendCommand call
  3. have sendCommand() have a 5s default, but have a second unboundedSendCommand() that can be called if needed (e.g. by evaluateAsync)
  4. some kind of sneaky currying that we haven't figured out yet
  5. wait for the typescript team to support appending to tuple types/parameters

other options? (1) worries me about exposing complexity for the 5% case that may be with us forever (see: every js API that takes as its first parameter a string, or if it's a number it acts like this, or if it's an object it's an options object, or if it's a stream it...). OTOH it is overhead when writing code, but is fairly self explanatory when reading it, and I take @hoten's point that it's basically equivalent to (2)

lhrRuntimeError: true,
},

// Hey! When adding a new error type, update lighthouse-result.proto too.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@connorjclark
Copy link
Collaborator Author

@patrickhulce I'll test it out on some slower devices today.

@connorjclark
Copy link
Collaborator Author

Tested on an old Nexus 4 device. Didn't come across any protocol timeouts.

@connorjclark
Copy link
Collaborator Author

Got some timings. https://gist.github.com/Hoten/de38179b2d941b6d195c3c4383d0bd8d

Runtime.evaluate took ~9s on Nexus 4 (this command has an explicit 60s timeout). The next slowest was ~4.1s (brushing up on the 5s timeout). Everything that took longer than 1s seems related to font stuff (CSS.getComputedStyleForNode), which I think we are getting rid of soon? If not, we might consider 1) increasing the timeout for those commands or 2) instead of sending them all at once, send them in batches.

@paulirish
Copy link
Member

Everything that took longer than 1s seems related to font stuff (CSS.getComputedStyleForNode)

Yup looks like it's the font-size gatherer:

async function fetchComputedFontSize(driver, textNode) {
try {
const {computedStyle} = await driver.sendCommand('CSS.getComputedStyleForNode', {
nodeId: textNode.parentId,
});

I was hoping it was the fonts one that was removed this morning. but alas.

TBH I think that situation is OK. I don't really want the font-size gatherer taking >5s.

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

thanks so much for jumping on that slow device timeout investigation @hoten! sorry I spaced out hearing the first half of your update 😳

@@ -240,23 +248,48 @@ class Driver {
}
}

/**
* NOTE: This can eventually be replaced when TypeScript
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true now? Maybe add a bit more to the comment then because I haven't the faintest clue how we can remove the method and the variadic types header at the top of that issue just makes me dizzy 🤣

Copy link
Collaborator Author

@connorjclark connorjclark Oct 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is referring to sendCommand. That might be the confusion...

My understanding is that the linked issue will eventually allow types such as:

type TypedArray = [string, number];
type ExtendedTypedArray = TypedArray + [string];
const value: ExtendedTypedArray = ['', 12, ''];

If so, we'd be able to type that variadic parameter as LH.CrdpCommands[C]['paramsType'] + [?{timeout: number}]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooooh, that makes much more sense. Move the comments down there then near that parameter? :)

@@ -860,27 +878,17 @@ class Driver {
* Return the body of the response with the given ID. Rejects if getting the
* body times out.
* @param {string} requestId
* @param {number} [timeout]
* @param {number} timeout
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this param still optional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. how do you mark optional? Are the brackets "clojure style"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{number=} is closure style :)

brackets are jsdoc

Copy link
Member

@paulirish paulirish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Nice job handling some really core driver internals. :D

// Used when security error prevents page load.
INSECURE_DOCUMENT_REQUEST = 17;
// Used when security error prevents page load.
PROTOCOL_TIMEOUT = 18;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, can't rename.

But actually looks like we never exposed REQUEST_CONTENT_TIMEOUT in the proto. So I think we're OK. :)

@paulirish
Copy link
Member

@hoten proto conflict for ya

@connorjclark
Copy link
Collaborator Author

@paulirish resolved

@paulirish paulirish merged commit 9f15c6c into master Oct 24, 2018
@paulirish paulirish deleted the issue-6296-protocol-timeout branch October 24, 2018 22:13
@brendankenny
Copy link
Member

Yup looks like it's the font-size gatherer

That gatherer sends all those CSS.getComputedStyleForNode commands at once, up to 500, depending on the page's node count. It's possible command overhead/contention causes issues, especially on an older phone. We could try doing them in batches instead of all at once to see if that's a solution.

It's also remotely possible we hit some gross style-recalc issue, depending on the page and how the commands are coming in. It's possible it would be more efficient to do all the lookups in the page at once with evaluateAsync, since we could guarantee running them all in the same turn of the event loop/without reforcing layout. We'd have to resync node info and debugger nodeIds, though. Maybe there's a similar coalesced solution, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants