-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to best handle "child cannot end after parent" errors? #8
Comments
Hello |
Our thinking was that if this has happened, all our assumptions about trace processing might not hold so it's better to alert the consumer than provide potentially false numbers. Anything extra about the type of activity you're recording you can share? |
Hi @patrickhulce I have next html file: <!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Estimo Template</title>
</head>
<body>
<script src="https://unpkg.com/react@16/umd/react.development.js"></script>
<h1>Estimo</h1>
</body>
</html> And next code to capture Chrome trace: ...
await page.tracing.start({ path: traceFile })
try {
await page.goto(getUrlToHtmlFile(lib.html), { timeout: options.timeout })
} catch (err) {
handleSessionError(err, browser)
}
await page.tracing.stop()
... Then i'm trying to parse it with async function generateTasksReport(pathToTraceFile) {
const content = JSON.parse(await readFile(pathToTraceFile))
const tasks = tracium.computeMainThreadTasks(content, {
flatten: true,
})
return tasks
} On my local machine it works perfectly. /home/travis/build/mbalabash/estimo/node_modules/tracium/lib/main-thread-tasks.js:155
Rejected promise returned by test. Reason:
Error {
message: 'Fatal trace logic error - child cannot end after parent',
}
Function._computeRecursiveSelfTime (node_modules/tracium/lib/main-thread-tasks.js:155:13)
node_modules/tracium/lib/main-thread-tasks.js:159:37
Function._computeRecursiveSelfTime (node_modules/tracium/lib/main-thread-tasks.js:159:8)
node_modules/tracium/lib/main-thread-tasks.js:159:37
Function._computeRecursiveSelfTime (node_modules/tracium/lib/main-thread-tasks.js:159:8)
Function.getMainThreadTasks (node_modules/tracium/lib/main-thread-tasks.js:244:23)
Function.compute (node_modules/tracium/lib/main-thread-tasks.js:272:28)
Object.computeMainThreadTasks (node_modules/tracium/index.js:13:36) This may be due to the fact that |
From my side I think it's content related somehow? It happens sometimes when I test cnet.com, I test the same way all the time, but sometimes the trace fails. Since it's CNET I was thinking there's one of the ads or something that is acting up somehow? |
Hm interesting, thanks for the repro case @mbalabash. @soulgalore are your failures also on a linux box or CI environment only? |
@patrickhulce I could reproduce today on my Mac, else we are running in Docker on Linux (Ubuntu) where we also get it. I've also got reports from users that got the same error but on other pages, but they haven't shared their URLs. We used to use our own parsing lib but wanna move over to Tracium asap. And we got the same problem with events in the wrong order. |
Had a go and tried to reproduce with puppeteer (I use Chromedriver normally). I didn't get the same error but the same behaviour that one out of three/four runs fails. The one failing are waiting a long time on some content (I'm running without any throttling) and the runs that are ok finish within a couple of seconds, but when it fails it takes 15-20s (maybe). And when it fails I get LHError: NO_TRACING_STARTED. So I think there's some content that somehow creates the problem?
|
Thanks @soulgalore! NO_TRACING_STARTED is a different problem that's particularly prevalent in m74+. We've got a fix for it in LH master but it hasn't been ported over here yet. I'll have to dig in to those repro cases you've provided when I've got time. I haven't seen those cases locally before, but I'll try with your settings :) |
Thanks! @patrickhulce
No. I got same error when used local scripts. |
Yeah, I understand that it's another error but I thought I saw the same kind of behaviour. I've seen this on another URL (https://www.aftonbladet.se), it's a Swedish newspaper with a lot of ads. Let me see if I can find a way to reproduce. |
Hi @patrickhulce |
So far the only case where I've seen this happen where the trace isn't totally useless/non-sensical is when the ending trace event of the child that is B/E is missing and it was nested under an X. Available options I see:
I tend to support option 4 in Lighthouse, but I see the value of 1/2 in the tracium case where other consumers might be ready to be more forgiving (maybe it's even a setting to pass in |
If that's ok for you all that would be great. I've been testing Tracium in different scenarios the last days and have gotten others errors too. I wonder if it could be because of that I use Chromedriver and somehow some events is dropped? I also do other scenarios, like testing a SPA. I've got cases where one event don't get a duration/selfTime so that the https://github.com/aslushnikov/tracium/blob/master/lib/main-thread-tasks.js#L258 check fails. Also got traces where I get end event without a start as in https://github.com/aslushnikov/tracium/blob/master/lib/main-thread-tasks.js#L111 Also using the debug log in those places would help. |
Interesting, yeah our trace processing up until tracium has always been cold load of page with raw protocol so it's possible these other scenarios increase the likelihood of these otherwise unlikely events from LH perspective. @soulgalore if you could get us copies of any of these traces you're talking about I'd love to add some test cases and make sure we can handle them appropriately :) |
I can collect a couple traces today/tomorrow. For SPA I'm injecting a nav start event for it to work. However I think the problem is on my side since I use the Chromedriver. Checking the docs it says "...will continue tracing until Chrome closes.". Without checking the code, I guess when we get the trace, the driver has not stopped the trace and it doesn't wait on Tracing.tracingComplete, so we probably don't get all the events. I'll try to switch to use CDP directly instead, that will minimise the risk (and work) for me :) |
I made a quick POC and it works much better using CDP directly, sorry my fault, I'll try to push it to our test servers later today and then I will know for sure. |
Ok been running these for a while and it works much better, so it was on my side! I've got one use case where Tracium sometimes breaks using a SPA. The use case is like this: Hit a menu item to open up a menu, wait for it to appear, start a trace, click on the new menu item and measure that request and load. Sometimes (not always) I run into Fatal trace logic error - unexpected end event with an event that looks like this:
Maybe I can fix that on my side, not sure though. I can post the full trace log if it helps! |
Thanks @soulgalore that's good to hear many of the issues sorted themselves out!
Ah I can see this scenario if we start a trace in the middle of some event and we only get the end part of it which isn't the case in normal LH operations. I think turning this into a flag to drop these invalid cases (maybe stick the events into an array so the user can fail on their own if there were lots of them?) seems like the best way to go. |
Oh and a copy of that full trace would be awesome for test case prepping if you're willing :) |
Sharing is caring: https://gist.github.com/soulgalore/6b5d13a485334afe009eab3866379e5f |
Hello @patrickhulce @aslushnikov I added docker support for reproducting Travis CI errors in estimo. And it seems like strange chrome / puppeteer behavior. I run puppeteer like this: const browser = await puppeteer.launch({ headless, args: ['--no-sandbox', '--disable-setuid-sandbox'] }) All code for work with puppeteer here. I'm still do stuff like this (to analyze only js parse/execution time): <!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Estimo Template</title>
</head>
<body>
<h1>Estimo</h1>
<script src="https://unpkg.com/react@16/umd/react.development.js"></script>
</body>
</html> And i got errors only in Docker / CI (on my host machine all works perfectly). Thanks |
Ah thanks so much for the docker repro @mbalabash! There's been movement on getting tracium publishable from within LH itself, so once that is done I'll work on getting a version published that can better handle these cases. |
Trace events are not sorted by default. Proper sorting not only uses timestamps, but also accounts for trace event duration. Reference #8
Trace events are not sorted by default. Proper sorting not only uses timestamps, but also accounts for trace event duration. Reference #8
Trace events are not sorted by default. Proper sorting not only uses timestamps, but also accounts for trace event duration. Reference #8
I pushed the 0.2.1 version that sorts out some of the errors here that relate to trace event sorting. |
Thanks @aslushnikov! Right now I'm using a modified version of Tracium, where we don't throw the errors. I've been continuously running Tracium and got a couple of more errors (that happens sometimes, not all the time). I get them all on a single navigation on "normal" pages.
I can revert and run them on another server just to collect the trace log. But I wonder if you don't also get them in Lighthouse? We run the same trace config and since I skipping enabling trace using Chromedriver and instead using CDP I'm 99.9% sure the trace log that we collect is correct in the sense it's what we expect out of Chrome. |
I've been digging more into these and there's a major root issue we've identified that Chrome's invariants for trace event ordering have drifted slightly from what was expected when the original code was written ~2 years ago. To be fair to our Chromium colleagues, there was a document that notes order could be different in the future, but it was difficult to find these bugs without any examples of how it could be different. Now that we have them it'll just take time to sort through :) I'm working on some LH PRs that should hopefully help a lot with this class of error. |
Hi @aslushnikov
when I test cnet.com sometimes my trace logs is broken with the child cannot end after parent error. I've pushed an example trace here: https://github.com/soulgalore/tracium/blob/failing-cnet/examples/cnet.json
I think this started to happen 1-2 Chrome releases ago. I guess it should be fixed upstream but I wonder if Tracium should just swallow the error and continue?
Best
Peter
The text was updated successfully, but these errors were encountered: