
Inconsistent reporting in PageSpeed Insights #9902

Closed
Kvabber opened this issue Oct 30, 2019 · 30 comments
Labels
PSI/LR PageSpeed Insights and Lightrider

Comments

@Kvabber

Kvabber commented Oct 30, 2019

Does anyone know why PageSpeed Insights is so inconsistent in its reports? I often get a gap of 10+ points between analyses, and sometimes I get a huge gap when it doesn't load the page properly on one of the devices (it never loads perfectly on both devices simultaneously).

[Screenshots: two PageSpeed Insights results from 2019-10-30, captured a couple of minutes apart]
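
For anyone trying to put numbers on the variance, here is a minimal sketch (assuming a valid PSI API key and jq installed; the response field paths are those exposed by the PSI v5 API) that runs the same URL several times and prints each run's performance score:

```sh
# Sketch: run the same page through the PSI v5 API several times and print
# the performance score of each run, to see how much it varies.
# Assumes YOUR_API_KEY is a valid PageSpeed Insights API key and jq is installed.
URL="https://www.smarterlife.dk"   # page from this thread, as an example
for i in 1 2 3 4 5; do
  curl -sG 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed' \
    --data-urlencode "url=${URL}" \
    --data 'strategy=mobile' \
    --data 'key=YOUR_API_KEY' \
    | jq '.lighthouseResult.categories.performance.score * 100'
  sleep 30   # space the runs out a little
done
```

Back-to-back runs 10+ points apart, as described above, show up directly in this output.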

@murali-krishnamoorthy

Yes, I am able to see the same issue, and it started to happen around Oct 17th. I can see it both in the API and in PageSpeed Insights.

@Kvabber
Author

Kvabber commented Oct 30, 2019

@murali-krishnamoorthy yeah, it's been happening for the past two weeks for me as well. Some days it differs by 10+ points within minutes of testing. Is there another way to look for API updates?

@murali-krishnamoorthy

Just to add, the resource data that the API provides in its response does not include all the resources on the page when it reports a high score, so maybe it's calculating the score before all the resources are downloaded in some instances.
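
One way to check this is to compare the score against the number of requests reported in the same response. A rough sketch against the PSI v5 API (the `network-requests` audit lists every request Lighthouse observed; treat the exact field paths as an assumption to verify against your own responses):

```sh
# Sketch: compare the performance score with the number of network requests
# Lighthouse observed in the same run. Runs with a suspiciously high score
# should show noticeably fewer requests if resources were dropped.
curl -sG 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed' \
  --data-urlencode 'url=https://www.thermofisher.com/us/en/home.html' \
  --data 'strategy=mobile' \
  --data 'key=YOUR_API_KEY' \
  | jq '{score: (.lighthouseResult.categories.performance.score * 100),
         requests: (.lighthouseResult.audits["network-requests"].details.items | length)}'
```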

@patrickhulce
Collaborator

The screenshot is a giveaway that some resources were not loaded, so @murali-krishnamoorthy's analysis is correct: when some resources are not present, the score will go up, because loading a page with lots of missing resources is much faster than loading the full, heavy page.

@patrickhulce added the needs-priority and PSI/LR PageSpeed Insights and Lightrider labels Oct 30, 2019
@murali-krishnamoorthy

Hi, I am not sure how soon this will be prioritized, but this impacts nearly all users.

@midudev

midudev commented Oct 31, 2019

I can confirm this is happening on the PageSpeed Insights API as well. We're getting far better scores because some resources are not loaded at all. It's as if the test is being stopped early, or it decides too early that the load is done.

@exterkamp
Member

started to happen around Oct 17th

This is probably unrelated, from what I can tell nothing seems to have changed on our end on the 17th.

+1 to @patrickhulce & @murali-krishnamoorthy that this is because not all the content was loaded. As to why it wasn't loaded I'm not sure. @Kvabber / @murali-krishnamoorthy / @midudev do you have some URLs I could test against to look into this?

@midudev

midudev commented Nov 1, 2019

@exterkamp I can reproduce the problem with this one: https://www.fotocasa.es/es/comprar/viviendas/madrid-provincia/madrid-zona-de/l?combinedLocationIds=724,14,28,173,0,0,0,0,0

In the image, you can see the content is still not completely loaded (that's the SSR with critical CSS):
[screenshot]

It seems like it's not completely mounting the app on the client. Might there be some kind of signal that is making Lighthouse think the load is over? Some different threshold from previous versions?

If I use the Audit tab with Chrome 78.0.3904.70 I get this result:
[screenshot]

I see Chrome is using Lighthouse 5.2.0 while PageSpeed Insights is using 5.6.0

@murali-krishnamoorthy

Here is another example with two runs, a few seconds apart, that produce very different scores.
URL: https://www.thermofisher.com/us/en/home.html

[Screenshots: two PSI runs on Nov 1, 2019, about 30 seconds apart]

@Kvabber
Author

Kvabber commented Nov 1, 2019

do you have some URLs I could test against to look into this?

This happens across all of our clients, actually. It hadn't been an issue before, as far as I recall, but lately the results haven't been very trustworthy due to the inconsistency between tests run seconds apart.

The URL I showed in the first example is https://www.smarterlife.dk
I've tried different landing pages on the same domain and, of course, different domains. All Chrome extensions are off, or I run in incognito, so there's no interference.

@murali-krishnamoorthy

Hi, any update on this issue?

@exterkamp
Member

Any update on this issue

I started to look into this. All I can say for now is that I can repro this kind of inconsistent behavior in PSI. Haven't been able to dig into the specifics of why it's cutting off in PSI.

Are you seeing this kind of behavior across other channels, e.g. Chrome DevTools or the CLI?

https://www.fotocasa.es/es/comprar/viviendas/madrid-provincia/madrid-zona-de/l?combinedLocationIds=724,14,28,173,0,0,0,0,0
https://www.thermofisher.com/us/en/home.html
https://www.smarterlife.dk

These examples all appear to be mid-load when we finish the run, which is definitely odd. @patrickhulce, these sites all seem to be loading content late or mid-load; this looks like a case of us thinking the site is done mid-load...

Might there be some kind of signal that is making Lighthouse think the load is over? Some different threshold from previous versions?

@patrickhulce we didn't change any loading signals from 5.2->5.6 did we?

@patrickhulce
Collaborator

Not that I'm aware of. #7356 was the last change to loading.

These examples all appear to be mid-load when we finish the run.

I don't observe this behavior locally from CLI, so perhaps there was something else that changed in LR that started this?

Seems like we need #9878 but in PSI :)
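
For anyone who wants to cross-check the CLI channel mentioned above, here is a rough sketch (assuming the lighthouse npm CLI and jq are installed; the flags are the standard CLI options) that runs one of the affected URLs a few times locally and prints the scores:

```sh
# Sketch: run Lighthouse locally a few times against one of the affected URLs
# and print the performance score, to compare local variance with PSI's.
for i in 1 2 3; do
  lighthouse 'https://www.fotocasa.es/es/comprar/viviendas/madrid-provincia/madrid-zona-de/l?combinedLocationIds=724,14,28,173,0,0,0,0,0' \
    --only-categories=performance \
    --output=json \
    --output-path="run-${i}.json" \
    --chrome-flags='--headless' \
    --quiet
  jq '.categories.performance.score * 100' "run-${i}.json"
done
```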

@Lofesa

Lofesa commented Nov 7, 2019

I'm facing this or a similar issue.
In PSI and web.dev, the request sometimes fails with a net::ERR_ACCESS_DENIED message; at other times some resources are not loaded. In PSI this can be seen in the screenshot, where the image is not fully rendered; in web.dev the Best Practices score drops because of console messages, and when you go to those messages, some resources failed to load with the same net::ERR_ACCESS_DENIED.
In PSI, when I get the error, it is always on mobile and never on desktop, which makes me think it may be throttling related.
In Chrome, with LH 5.2.0, it never fails.

[Screenshots: PSI and web.dev results showing the errors]

@bwheatley

bwheatley commented Nov 7, 2019

We're seeing the same issues over the same time frame, on multiple stacks. Whether we hit via CDN, GCLB, or a direct server connection, we see it about 5% of the time. It seems like JavaScript is not loaded on the "first" hit. Then the problem goes away for a few minutes and comes back.

https://www.bexrealty.com/real-estate/Boca-Raton/
https://www.bexrealty.com/Georgia/Atlanta/

You can see that it's missing the JavaScript files; it's not even requesting them from the server. There are no requests for them at the server level, as if it is specifically skipping those resources.

@hazemhagrass

I have exactly the same issue! Any updates on it?

@midudev

midudev commented Nov 11, 2019

I'm guessing this is not getting movement because of #ChromeWebSummit. After the event, I think we could see some updates.

@midudev

midudev commented Nov 18, 2019

Bumping this. I'll be glad to help if you could point me in the right direction or assist in anything you might need. 🙇

@hagen-gloetter

I'm facing this or a similar issue here in Germany.
In particular, the TTI fluctuates extremely, which has a strong effect on the score.
The picture shows the course of the TTI over the last 6 months:
[chart: TTI over the last 6 months]
Of course we made changes to our site during this time, but the site is, all in all, rather static. So there is no explanation for why the TTI jumps from 2 to 7 seconds between two measurements taken 6 hours apart. After that there are again stretches showing the normal 4.8-5 seconds. Unfortunately these normal stretches are getting shorter and shorter, although the page has remained the same.

To verify the problem I set the monitoring interval to 30 minutes and measured the following with the official Google Lighthouse API:
1: one completely static page
2: a WordPress page, and
3: our homepage (a lot of JS and CSS)
The result is as follows:
[chart: TTI for the three pages, measured every 30 minutes]
As you can see, the static page (white curve) is actually very stable at 2.5 s but suddenly jumps to less than 1 s. WordPress (orange) sometimes jumps wildly, from an average of 5.5 s to less than 2 s, which cannot be right. And on our homepage (blue) there is almost no discernible trend anymore.

[chart: Total Byte Weight over the last 2 days]
It gets totally crazy when you look at the Total Byte Weight (TBW) over the last 2 days: the static page is 9.3 MB; LH always reported only 8.5 MB, but at least that was stable. Now, for stretches of up to 2 hours, we get values of 401, 314, 385, and 437 KB in succession. The WordPress page, which has not changed during this period, also fluctuates between measurements of 1985 KB and 2793 KB, which is almost one MB.
Currently you can't rely on the LH API.
"Hey Google", please fix the issue.

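A minimal sketch of this kind of monitoring, assuming a valid PSI API key and jq installed: it pulls TTI and Total Byte Weight out of the PSI v5 response (the audit IDs `interactive` and `total-byte-weight` are those used by Lighthouse; treat the exact paths as an assumption to verify) and appends them to a CSV, to be run from cron at whatever interval you monitor at:

```sh
# Sketch: log TTI (milliseconds) and Total Byte Weight (bytes) for one page over time.
# Each invocation appends one CSV line: timestamp,tti_ms,tbw_bytes
URL="https://www.example.com/"   # hypothetical monitored page; substitute your own
curl -sG 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed' \
  --data-urlencode "url=${URL}" \
  --data 'strategy=mobile' \
  --data 'key=YOUR_API_KEY' \
  | jq -r '[(now | todate),
            .lighthouseResult.audits["interactive"].numericValue,
            .lighthouseResult.audits["total-byte-weight"].numericValue] | @csv' \
  >> psi-monitoring.csv
```
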
@midudev

midudev commented Nov 20, 2019

It seems the problem has been fixed as of a few hours ago.

[screenshot]

@exterkamp
Member

Post Mortem: ERR_ACCESS_DENIED and PSI measurement inaccuracy

In the past month, we have had an increase in error rates inside PageSpeed Insights. This presented as an increase in fatal FAILED_DOCUMENT_REQUESTs and fatal NO_FCP errors, as well as non-fatal loss of network requests via net::ERR_ACCESS_DENIED errors.

These errors resulted in:

  • A large increase in performance metric/score variance due to random network requests being denied.
  • Some pages failing completely due to the main document request being denied.
  • A 4x rise in FAILED_DOCUMENT_REQUEST errors.
  • Large increases in overall performance scores due to random network requests being denied.
  • A 3x rise in NO_FCP errors due to critical requests being denied.

What Happened?

On October 16th, the infrastructure that makes outgoing network requests for PageSpeed Insights began to sporadically deny network requests due to a concurrent-requests quota issue. This happened stochastically and silently, which made it hard to reproduce and diagnose.

These errors began to happen slowly and rose over the period of 1 month, tripping none of our alerting infrastructure.

Additionally, these errors began to present midway through our upgrade of PSI from Lighthouse 5.4 to 5.6, along with additional rendering upgrades. This disguised the gradual change in error rates behind the usual fluctuations that accompany an upgrade.

Once the elevated global error rate was noticed, we updated and redeployed all of our rendering and Lighthouse infrastructure, to no effect.

On November 19th, upon investigating our networking infrastructure more closely, we found that our outgoing network requests were hitting a concurrent-requests quota. These denials presented to end users as net::ERR_ACCESS_DENIED, which was the fallback Chrome network error code; it was not due to the page's access being denied, but to us being denied access to the network.

We requested more quota immediately. It was approved hours later and the deployment began November 19th around 6PM PST. We saw error rates begin to decline and we expect this variability noise to resolve as well.

Important Dates

  • October 16th, 4am PST: Networking error rates begin to climb; quota ~4% over-utilized.
  • October 25th, 4am PST: Networking error rates climb again; quota ~6% over-utilized.
  • October 27th, 9pm PST: Networking error rates climb again; quota ~9% over-utilized.
  • November 15th: Global error rate elevation noted.
  • November 19th, 6pm PST: Fix applied; error rates drop to pre-issue levels.

Results

Once the fix was applied, error rates dropped:

  • NO_FCP errors dropped 66%.
  • FAILED_DOCUMENT_REQUEST errors dropped 75%.

Additionally, latency and performance scores returned to pre-October 16th levels.

Special Thanks

Thank you to everyone who reported this issue initially and continued to look into it and share their experiences, especially the graphs of performance shifts; they were super helpful! ❤️

@Kvabber, @murali-krishnamoorthy, @midudev, @Lofesa, @bwheatley, @hazemhagrass, @hagen-gloetter, @TDurrr1, @naveedahmed1, @JustPlainHigh, @jmartinezpoq, @rootman, @darxide-pl

@naveedahmed1

@exterkamp thanks for sharing the details, great job 👍

@midudev

midudev commented Nov 20, 2019

Thanks @exterkamp & team for fixing it and giving such a detailed post mortem comment. Much appreciated! 🙇

@hagen-gloetter

Hi exterkamp,
thank you for the fix, the detailed post mortem note, and the very nice credits.
For me too, the scores now look normal and consistent again.
Every now and then I still see a spike or outlier in my measurements, but that was happening before the bug as well. Here are my current charts:
[chart]
[chart]
and the most telling one:
[chart]
Great job!

@Lofesa

Lofesa commented Nov 21, 2019

Hi to all,
PSI and web.dev are much more stable than in recent days, but the issue is not 100% solved.
Now PSI and web.dev don't throw any errors like net::ERR_ACCESS_DENIED, but some files are not downloaded, and this, I think, alters the score and timings.
For example, two consecutive runs on web.dev:

[Screenshots: two consecutive web.dev runs]

In the first, you can see four file execution profiles in the JavaScript execution time section under passed audits, and in the second you can see only three.
The same happens in PSI, but there I never get all four files, only one or two.

@patrickhulce
Collaborator

patrickhulce commented Nov 21, 2019

@Lofesa that audit doesn't highlight all downloaded scripts, just the ones where the CPU execution time is above 50 ms. The script that's missing was at 59 ms previously, so it probably just dropped below 50 ms and was hidden.
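
A quick way to tell "not downloaded" apart from "just hidden by the 50 ms display threshold" is to list the script requests from the raw report rather than from that audit's table. A rough sketch against the PSI v5 response (the `network-requests` audit and its `resourceType` field are as they appear in current Lighthouse JSON, but treat the exact paths as an assumption to verify):

```sh
# Sketch: list every script Lighthouse actually downloaded in the run,
# regardless of its CPU execution time, by reading the network-requests audit.
curl -sG 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed' \
  --data-urlencode 'url=https://your-page.example/' \
  --data 'strategy=mobile' \
  --data 'key=YOUR_API_KEY' \
  | jq -r '.lighthouseResult.audits["network-requests"].details.items[]
           | select(.resourceType == "Script") | .url'
```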

@Lofesa

Lofesa commented Nov 21, 2019

Thx @patrickhulce
Sorry for the noise.

@exterkamp
Member

This seems sorted out now 😄 Thanks for all the comments and feedback! ❤️

@AlexChipev

AlexChipev commented Jan 15, 2020

I think it is happening again. When we add a page to a domain, e.g. https://some-domain/some-page, we get the errors. Some of our domains are working and others are not. And when we do get a response, the metrics are completely different compared to running Lighthouse in DevTools or the terminal.
working:
curl 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=http%3A%2F%2Fcobra-responsive.sbtech.com%2Fsuper-fast-page%2F%3FshowLandingPage%3Dpreview%26lang%3Den&strategy=desktop&key=[YOUR_API_KEY]' \
  --header 'Accept: application/json' \
  --compressed

not working:
curl 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=http%3A%2F%2Fonlinedemo-stg2.staging.sbtech.com%2Fsuper-fast-page%2F%3FshowLandingPage%3Dpreview%26lang%3Den&strategy=desktop&key=[YOUR_API_KEY]' \
  --header 'Accept: application/json' \
  --compressed

Please check.
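
When the API and a local DevTools/terminal run disagree, it can help to pull the Lighthouse version, run warnings, and score straight out of the API response for comparison. A rough sketch against the same page as the working request above (field names as exposed by the PSI v5 lighthouseResult; worth verifying against your own response):

```sh
# Sketch: print the Lighthouse version, any run warnings, and the performance
# score from a PSI v5 response, to compare against a local DevTools/CLI run.
curl -sG 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed' \
  --data-urlencode 'url=http://cobra-responsive.sbtech.com/super-fast-page/?showLandingPage=preview&lang=en' \
  --data 'strategy=desktop' \
  --data 'key=YOUR_API_KEY' \
  | jq '{version: .lighthouseResult.lighthouseVersion,
         warnings: .lighthouseResult.runWarnings,
         score: (.lighthouseResult.categories.performance.score * 100)}'
```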

@DorianAtGCG

I'm also seeing it happening with the online tool.

Works fine in the local DevTools audit panel.
