google.com search results are linking doc pages without stylesheets #16432
Comments
You don't want to use If you search for Partials are not supposed to have CSS. |
@frederikprijck is right. How did you get the partial url? |
@frederikprijck The CSS loads if I access it with that link. @Narretz I just searched "angular.element docs" on Google and that was that first link that popped up. |
@Narretz @hayleytom Wow that's correct and that's NOT good. So despite the fact CSS is not expected to be loaded on that page, it shouldn't be indexed by google! 😟 |
I see. Something went wrong with the robots.txt during migration I assume |
Weird the robots.txt is:
which looks correct to me. Any ideas? |
Perhaps there was a random crawling during the switchover to Firebase Hosting that missed the robots file? Can we trigger a new crawl @Narretz - I don't have access to the Search console for this site. |
I just pushed an update that added this robots.txt. 😊 Before there was actually none. I'll look if I have access. |
@petebacondarwin I don't have docs.angularjs.org in the search console either. Probably @IgorMinar must trigger the re-crawl. |
It's strange that the crawler seems not to be honoring the #! contract. We are correctly setting the meta tag. I don't know if this is related, but I just saw a notification that https://code.angularjs.org/snapshot/docs/js/all-versions-data.js is being banned via robots.txt and this prevents the crawler that indexes the site after the js executed from working. |
some other issues I found:
@Narretz can you please take a look? |
except for robots.txt the other two issues still seem unresolved. @Narretz I thought you said that the sitemap was fixed a few days ago, did I misunderstand you? |
@IgorMinar I did, and when I checked it was there. Not sure what's going on. The sitemap is visible on the docs folders for the snapshots. (Not that we need them there, just saying that the build produces them) |
Ah I see, it was NOT copied to the deploy folder. Fixing right now ... |
@IgorMinar the site map is up: https://docs.angularjs.org/sitemap.xml The link to the ajax crawling scheme above says this is deprecated. I assume as long as we allow the js crawler access we don't need this. And it looks like the crawler hasn't reindexed the site at all. Because at least the partial url should not be in the results but it still is. |
I requested a recrawl using the sitemap via search console. the issue with partials is that we used to use the fragment scheme before google had a js-enabled crawler, now that js-enabled crawler is a thing the fragment scheme is in the way... :-( we spent a ton of time fixing js-crawler related issues for angular.io and I'm not sure if we want to go through the same effort on docs.angularjs.org. it would be better if we could just make the fragment scheme work well enough and not touch it any more. |
Okay, I actually wasn't aware that we used the escaped fragment rule, and so there is no rewrite rule (yet) for serving the escaped fragment. Shouldn't take too long to add it. However, the site also says the ajax schemes will be discontinued in summer 2018, so I think we still need the site to be crawlable by the JS bot. |
right, but I think we are now in the state where the escaped fragments are considered as results to be served to users, that's why we see bad search results. We should restore the escaped fragments functionality so that the results go back to normal - even though this functionality has been deprecated by google's crawler team. supporting the js-crawler is a big undertaking especially compared to restoring the escaped fragment route. |
Okay, so here's what we need to do afaict:
Sound good? |
Clarification:
Not sure what the better approach is. |
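yeah. that sounds good, I think, with a few comments: I don't remember if google requests https://docs.angularjs.org/api/ng/function/angular.element?_escaped_fragment_= or if the query param has some value. I think you got it right. There was some weirdness about this because of our use of "html5" urls rather than hashbang urls. Can you look at the old server config to confirm that this is right? With regards to serving: since there is a finite number of urls to serve, wouldn't it be simpler to have dgeni generate the firebase rewrites and then we don't need to deal with serving these files ourselves via functions or what not. Dgeni already does something very very similar when it generates the sitemap. I'm just looking for the most reliable and low maintenance solution... |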
If there's no hash in the url there is no value in the _escaped_fragment_
parameter. There is a section in the guide for that:
https://developers.google.com/webmasters/ajax-crawling/docs/getting-started#3-handle-pages-without-hash-fragments
The old setup has "capture" for the parameter value, but like the guide
says, it doesn't seem to be necessary.
You mean dgeni should create a rewrite source / destination pair for every
piece of content we have which we will then insert into the firebase.json?
Unfortunately, Firebase rewrites do not support query parameters.
Everything after ? was ignored in my tests. So we probably need a cloud
function that handles all content sites that checks if the parameter is in
the url. Less than ideal of course.
Igor Minar <notifications@github.com> wrote on Sat, Feb 10, 2018, 15:19:
… yeah. that sounds good, I think.
I don't remember if google requests:
https://docs.angularjs.org/api/ng/function/angular.element?_escaped_fragment_=
or if the query param has some value. I think you got it right. There was
some weirdness about this because of our use of "html5" urls rather than
hashbang urls. Can you look at the old server config to confirm that this
is right?
with regards to serving - since there is a finite number of urls to serve,
wouldn't it be simpler to have dgeni generate the firebase rewrites and
then we don't need to deal with serving these files ourselves via functions
or what not. Dgeni already does something very very similar when it
generates the sitemap. I'm just looking for the most reliable and low
maintenance solution...
|
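A minimal sketch of the parameter check discussed above, assuming the crawler requests html5-style pages with an empty `_escaped_fragment_=` value as described in the linked guide. The function name `has_escaped_fragment` is hypothetical and is not taken from the AngularJS repository. It shows one pitfall only: the parameter has to be detected by presence, because its value is empty.

```python
from urllib.parse import parse_qs, urlsplit

ESCAPED_FRAGMENT_PARAM = "_escaped_fragment_"


def has_escaped_fragment(url: str) -> bool:
    """Return True if the URL carries the _escaped_fragment_ parameter.

    The parameter counts even when its value is empty, which is the case
    for pages that have no hash fragment.
    """
    query = urlsplit(url).query
    # keep_blank_values=True matters here: by default parse_qs drops
    # parameters with an empty value, so the check would always fail.
    params = parse_qs(query, keep_blank_values=True)
    return ESCAPED_FRAGMENT_PARAM in params
```

A handler would call this on every incoming docs URL and fall through to the regular app when it returns `False`.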
Okay, so ? is a glob wildcard for a single character. However, it doesn't seem possible to escape it. Actually, since ? matches any single character, it should still match ? in the url. I guess that means the query parameters are not matched to the rewrite after all. |
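The reasoning about `?` can be illustrated with Python's `fnmatch`, which also treats `?` as "any single character". This is only an analogy: Firebase Hosting uses its own glob matcher, and the pattern below is a hypothetical rewrite source. If the matcher saw the full URL, the pattern would match it, so a failed match suggests the query string never reaches the matcher.

```python
from fnmatch import fnmatchcase

# Hypothetical rewrite source that tries to include the query string.
PATTERN = "/api/ng/directive?_escaped_fragment_="


def matches(candidate: str) -> bool:
    """Check a candidate string against the glob pattern."""
    return fnmatchcase(candidate, PATTERN)


# The wildcard matches a literal "?" as well as any other single character.
with_query = matches("/api/ng/directive?_escaped_fragment_=")
other_char = matches("/api/ng/directiveX_escaped_fragment_=")
# If only the path is handed to the matcher, the pattern cannot match.
path_only = matches("/api/ng/directive")
```

Here `with_query` and `other_char` are both true and `path_only` is false, which is consistent with the conclusion that the rewrite is evaluated against the path alone.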
OK. So let's go with the cloud function. Can we get this fixed asap. We are starting to see increased number of people being affected by this. See: https://news.ycombinator.com/item?id=16353676 |
This commit restores serving the plain partials (content) when a docs page is accessed with ?_escaped_fragment_=. The Google Ajax Crawler accesses these urls when the page has `<meta type="fragment" content="!">` set. During the migration to Firebase, this was lost, which resulted in Google dropping the docs almost completely from the index. We are using a Firebase cloud function to serve the partials. Since we cannot access the static hosted files from the function, we have to deploy them as part of the function directory instead, from which they can be read. Related to angular#16432 Related to angular#16417
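The path mapping such a function needs can be sketched as follows. This is not the code from the commit, which is a Firebase cloud function. It only shows, in Python, how a docs page path could be turned into the partial file bundled with the function. The mapping is inferred from the two URLs quoted in this thread (`/api/ng/function/angular.element` and `/partials/api/ng/function/angular.element.html`); `PARTIALS_ROOT` and `partial_path_for` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical directory, relative to the function, that holds the partials.
PARTIALS_ROOT = PurePosixPath("partials")


def partial_path_for(request_path: str) -> Optional[PurePosixPath]:
    """Map a docs page path to the partial file that holds its content.

    Returns None for paths that cannot correspond to a partial.
    """
    parts = [p for p in request_path.split("/") if p]
    # Reject empty paths and anything that could escape the partials directory.
    if not parts or any(p in (".", "..") for p in parts):
        return None
    # The last segment becomes the file name with an .html suffix appended.
    return PARTIALS_ROOT.joinpath(*parts[:-1], parts[-1] + ".html")
```

For example, `partial_path_for("/guide/databinding")` gives `partials/guide/databinding.html`; the function would then read that file and send it as the response body.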
#16452 should fix it. |
The firebase deployment failed? (Because the job passed). Anyway let's call it a night. I'll take a look tomorrow |
It failed, because the firebase functions dependencies are not installed (and are apparently necessary). |
Thanks to @gkalpak the snapshots are here! https://docs.angularjs.org/guide/databinding?_escaped_fragment_= |
Finally, deployed: https://travis-ci.org/angular/angular.js/builds/340741409#L735 🎉 |
yay! awesome! thanks @gkalpak I tried to confirm that it actually worked, but the search console ui is confusing for the urls crawled via "escaped fragment" method. This is what I get: Note that the UI actually renders fine via the js-enabled-crawler, except that the crawler is not allowed to index that view because several URLs are being black listed in the robots.txt. @Narretz can you please fix them? This is the list: |
If I'm testing it correctly, then I think the fix in prod is working well. Example: url: https://docs.angularjs.org/api/ng/directive both work! |
I would have been devastated if only one of these worked after all this :> @IgorMinar I've updated the robots.txt to allow access to the js and images that are used by the docs app. (96bee0c) I've tested it with this tool: https://technicalseo.com/seo-tools/robots-txt/ That's not google specific though. Can you also please check what the crawler sees for the direct partials/ urls like https://docs.angularjs.org/partials/api/ng/function/angular.element.html ? Because the current robots.txt excludes them, but the site is still indexed ... |
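A robots.txt change like this can also be checked locally. The sketch below uses Python's standard `urllib.robotparser`; like the tool linked above, it is not Google-specific and may differ from Googlebot in edge cases. The robots.txt content and the `/js/docs.js` URL are hypothetical, since the real file is not quoted in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: partials are blocked, everything else is allowed.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /partials/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


partial_allowed = is_allowed(
    HYPOTHETICAL_ROBOTS_TXT,
    "Googlebot",
    "https://docs.angularjs.org/partials/api/ng/function/angular.element.html",
)
script_allowed = is_allowed(
    HYPOTHETICAL_ROBOTS_TXT,
    "Googlebot",
    "https://docs.angularjs.org/js/docs.js",  # hypothetical asset URL
)
```

With these rules `partial_allowed` is false and `script_allowed` is true, so the js-enabled crawler could load the scripts while the partials stay blocked.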
Martin, I sent you invite to search console so that you can take a look at
what it looks like there.
|
@Narretz I think it's fine to unblock the partial html. Since we provide a sitemap, the crawler should understand that that html is not a url we want to publicize. |
@IgorMinar good to see the traffic is recovering. For the partials I will add a noindex header which should complement the sitemap |
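One way to express a noindex header is the standard `X-Robots-Tag: noindex` HTTP response header. The thread does not show how the header was actually configured, so the sketch below is only an assumption about the intent: responses under `/partials/` get the header, other paths do not. `response_headers_for` is a hypothetical helper.

```python
from typing import Dict


def response_headers_for(path: str) -> Dict[str, str]:
    """Build response headers, marking partials as not indexable."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    # Crawlers may still fetch the partial, but are asked not to index it.
    if path.startswith("/partials/"):
        headers["X-Robots-Tag"] = "noindex"
    return headers
```

Unlike a robots.txt `Disallow`, this only works if the crawler is allowed to fetch the URL, because it has to see the response header.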
sounds good
|
…ials/ The sitemap.xml might also prevent the indexing, as the partials are not listed. Related to angular#16432
The original search for angular.element docs now returns a good result: unblocking the partials from the crawler has done the trick I think. I have requested a re-index of docs.angularjs.org/api and its direct links. Remaining issues:
|
I don't think that there is any action we need to take. Let's just monitor this further. |
The sitemap.xml might also prevent the indexing, as the partials are not listed. Related to angular#16432 Closes angular#16457 Closes angular#16446
This is resolved - the numbers are still lower than before the migration, but that is possibly an effect of the LTS announcement. |
I'm submitting a ...
Current behavior:
The CSS on the docs page for
angular.element
doesn't seem to be loading: https://docs.angularjs.org/partials/api/ng/function/angular.element.html
Expected / new behavior:
The page would have CSS applied as usual.
Minimal reproduction of the problem with instructions:
Go to https://docs.angularjs.org/partials/api/ng/function/angular.element.html
AngularJS version: 1.x.y
stable
Browser: [all | Chrome XX | Firefox XX | Edge XX | IE XX | Safari XX | Mobile Chrome XX | Android X.X Web Browser | iOS XX Safari | iOS XX UIWebView | iOS XX WKWebView ]
Chrome 63 in macOS
Anything else:
These are the only network requests I'm seeing: