Extremely poor docs.python.org SEO performance. #1691
Comments
@JulienPalard Any ideas here? Do you have/need access to the google search console for docs? |
I do have access to the search console, but I don't think I can be of much help from an SEO point of view. The search console is telling us our "mobile ergonomy is bad", so it's probably just that? I heard Google ranks mobile pages first. We have had a PR open since May to make the docs responsive: python/python-docs-theme#46. I don't know if it could help. |
Today, Google, for the Unlike many spammy websites, docs.python.org does not have a page dedicated to those topics, so they probably win because it's in the title of their page, and the more specific we go, the more they'll win. For example, for On the other hand, for An obvious way would be to write a page for all those topics, but user-generated content is equally good at this job (Stack Overflow typically), with less contributor bottleneck. I'd leave this question to the to-be-created new doc SIG. Sadly, if we don't do it, spammers will always come first, with bad quality or outdated content, just so they can display their ads. |
Another issue is that search engines often seem to prefer docs for older releases over docs for newer releases, e.g.: The missing description is probably also hurting us. The 'learn why' link goes to https://support.google.com/webmasters/answer/7489871?hl=en |
That other issue you mentioned, @di, is something I've noticed about the Django docs as well -- Google often seems to rank older versions much higher. I wonder if that problem could be solved with a rel=canonical? |
FWIW the Rust packages docs also had this problem, and seemed to have solved it, but I can't remember how (and ironically, googling it is useless) -- I don't see rel=canonical on docs.rs pages so there may be another tactic in addition. |
It looks like Python is indexing old versions, and they are disallowed in the robots.txt: https://docs.python.org/robots.txt -- at least the link that @di is pointing at. That is probably the first thing I'd try to fix them, but agreed that canonicalization can sometimes help. Google is quite fickle though, and hard to understand how to fix this. We've tried a number of different things. We have lots more tips here: https://docs.readthedocs.io/en/stable/guides/technical-docs-seo-guide.html. I'm guessing the Python docs history of non-mobile friendly design has probably hurt it a lot over time. I believe that's fixed now though. The first step I'd do is probably add canonical links to The next step is definitely diving into Google Search Console for what it says there. |
Just to echo Eric, this should definitely be the next step. Whoever has access to y'alls search console will get a lot of details about what Google is doing. For example, there may be something Google sees as spam or duplicated and they've taken some downranking action against the domain. I looked at the A larger (and harder to fix) issue is that a lot of the Python documentation isn't written with search engines in mind. I would tackle issues with robots, sitemaps, and search console first, but this might be worth a look afterwards. Just to give a couple concrete examples:
|
Probably this ticket: rust-lang/rust#12466 |
Looks like we have canonical links to
It can probably be fixed with some proper sed-fu, but is it worth it, as previous versions are denied by robots.txt:
$ curl https://docs.python.org/robots.txt
Sitemap: https://docs.python.org/sitemap.xml
# Prevent development and old documentation from showing up in search results.
User-agent: *
Disallow: /dev
Disallow: /release
# Disallow EOL versions
Disallow: /2/
Disallow: /2.0/
Disallow: /2.1/
Disallow: /2.2/
Disallow: /2.3/
Disallow: /2.4/
Disallow: /2.5/
Disallow: /2.6/
Disallow: /2.7/
Disallow: /3.0/
Disallow: /3.1/
Disallow: /3.2/
Disallow: /3.3/
Disallow: /3.4/
|
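To see what the rules quoted above mean for a crawler, here is a minimal sketch using the standard library's `urllib.robotparser`. The embedded rules are an abbreviated copy of that output, and `is_crawlable` is a hypothetical helper name, not something from the docs build scripts.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the robots.txt quoted in the thread (assumption:
# only a few of the EOL versions are kept to keep the example short).
ROBOTS_TXT = """\
Sitemap: https://docs.python.org/sitemap.xml
# Prevent development and old documentation from showing up in search results.
User-agent: *
Disallow: /dev
Disallow: /release
# Disallow EOL versions
Disallow: /2/
Disallow: /2.7/
Disallow: /3.3/
Disallow: /3.4/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a generic crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    for url in (
        "https://docs.python.org/3/library/stdtypes.html",
        "https://docs.python.org/3.4/library/stdtypes.html",
        "https://docs.python.org/2/library/sets.html",
    ):
        print(url, is_crawlable(url))
```

A page that is disallowed here is never fetched by a compliant crawler, so any canonical tag added to it cannot be read. That is the tension between the robots.txt rules and the canonical-link fix discussed in the following comments.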
I totally agree, I don't think we can beat those, to the point that I wonder if we should do the same: for the most searched functions, build a dedicated "howto" or "tutorial", with up-to-date good practices, examples, and so on. But I don't feel my English level is good enough to start this kind of project ☹ |
I would definitely fix it. This will stop version 3.4 from showing up in Google's results. You may have to open up the As to whether to take on a huge docs reformatting/rework project, it's a terrifying never-ending project of incremental improvement. I'd fix all the concrete easy things (like 3.4 docs showing up in search engines) first. |
Any update on stuffing the older docs that lack rel=canonical information with canonical tags? 3.3 and such are still showing up on top in many searches such as Googling for |
@JulienPalard Please could you use your sed-fu? |
I can propose:
Followed by:
to clean the cache. I just ran it for 3.4; tell me if I should go ahead on 3.0, 3.1, 3.2, and 3.3, or if you see an issue.
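For readers who prefer Python to sed, the kind of rewrite being discussed could be sketched as follows. This is not the command proposed above. The helper names are hypothetical, and mapping every old page to the same path under `/3/` is an assumption that only holds when the page still exists there.

```python
import re

# Matches an existing <link ... rel="canonical" ...> tag, in either quote style.
CANONICAL_RE = re.compile(r'<link[^>]*rel=["\']canonical["\']', re.IGNORECASE)


def canonical_url(path: str, version: str, target: str = "3") -> str:
    """Map a path such as '/3.4/library/os.html' to the `target` version."""
    prefix = f"/{version}/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} does not start with {prefix!r}")
    return f"https://docs.python.org/{target}/{path[len(prefix):]}"


def add_canonical(html: str, url: str) -> str:
    """Insert a canonical <link> before </head> unless one already exists."""
    if CANONICAL_RE.search(html):
        return html
    tag = f'<link rel="canonical" href="{url}" />\n'
    # A function replacement avoids backslash-escape surprises in `tag`.
    return re.sub(
        r"</head>",
        lambda match: tag + match.group(0),
        html,
        count=1,
        flags=re.IGNORECASE,
    )


if __name__ == "__main__":
    page = "<html><head><title>os</title></head><body></body></html>"
    print(add_canonical(page, canonical_url("/3.4/library/os.html", "3.4")))
```

The function leaves pages that already carry a canonical link untouched, so it can be run repeatedly over the same tree.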
|
Canonical looks good at https://docs.python.org/3.4/library/, as does https://developers.facebook.com/tools/debug/?q=https%3A%2F%2Fdocs.python.org%2F3.4%2Flibrary%2F What do others say? Good to do 3.0 - 3.3? |
Makes sense, go ahead for the earlier 3s as well. |
|
This is definitely a step in the right direction, but Google hasn't indexed it yet. I'm not sure whether you need to open up the I did verify that the canonical tag is on that page so it should be picked up eventually. |
I agree they should be temporarily removed from That said, I'm not seeing how the current |
Never mind, I see that it's here: https://github.com/python/docsbuild-scripts/blob/3a75c4dcac91e25d6188b750b7beb0546d40eb90/templates/robots.txt#L8-L22 |
@di I just removed them: python/docsbuild-scripts@c49181f |
@JulienPalard Thanks! Let me know when that's deployed and I can submit them for reindexing! |
I see that the This is because the sitemap provides a URL like I think we probably need to a) make sure the canonical URLs are in the sitemap and b) put many more URLs into the sitemap (possibly, every URL we have). Right now, the sitemap only includes:
And doesn't include any older Python versions or any sub-pages. |
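As a rough illustration of putting many more URLs into the sitemap, here is a minimal sketch that builds a sitemap document from a list of page URLs with the standard library. `build_sitemap` is a hypothetical helper, not part of docsbuild-scripts, and where the URL list comes from is left open.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a sitemap XML document listing each distinct URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(
        build_sitemap(
            [
                "https://docs.python.org/3/library/stdtypes.html",
                "https://docs.python.org/3/library/random.html",
            ]
        )
    )
```

The sitemap protocol caps a single file at 50,000 URLs, so listing every page for every version would likely need several files plus a sitemap index.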
Also, to resolve the issue in OP, where https://docs.python.org/2/library/sets.html is the 2nd result for https://www.google.com/search?q=python+set, I think we probably need to update canonical tags and remove |
Another thing: it seems like for translations, our canonical URLs should be pointing to the translated versions of these pages with:
https://developers.google.com/search/blog/2010/09/unifying-content-under-multilingual |
It seems like many 3.x pages are still missing canonical tags as well:
|
Ohhh interesting! Those lost their So we have to find all pages like this...
and fix them manually... at the end some will still not have a 'canonical', or at least not one to |
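Finding the affected pages could be automated along these lines. This is a sketch that assumes the built HTML is available as a local directory tree. `find_canonical` and `pages_missing_canonical` are hypothetical names.

```python
from html.parser import HTMLParser
from pathlib import Path


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in `html`, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def pages_missing_canonical(root):
    """List the .html files under `root` that declare no canonical link."""
    return sorted(
        path
        for path in Path(root).rglob("*.html")
        if find_canonical(path.read_text(encoding="utf-8", errors="replace")) is None
    )
```

Comparing the value returned by `find_canonical` with the expected target would also catch pages whose canonical link exists but points somewhere unexpected.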
I don't believe in a Google conspiracy - they're lower ranked simply because they're BAD and thus people don't want to use them. I felt this way when I first started learning Python, and I still do even though I know most of the language basics. Take the It's 2024 and it still says the very unfriendly "No Information Available", while linking to 3.4 docs. So I go to https://docs.python.org/3.4/library/stdtypes.html and what do I get? A gigantic page titled "4. Built-in Types". Wait, I thought I was looking for strip! So how do I get there? Well first I need to know that the function will be under string methods and is actually called If you want to be user-friendly, you should have one page per TYPE, per CLASS, and in the best case per METHOD. If I google I see this term is maybe overused on Stack Overflow, but this issue is an "XY Problem". The issue asks why the docs SEO is atrocious, when it should be asking why the docs themselves are atrocious. (There's speculation from large YouTube creators that its infamous "algorithm" is the same way - if you want to be consistently recommended, you have to be worthwhile and quality first.) It's mind-boggling to me that a language as massive and successful as Python would never improve its languishing documentation. |
Instead of disallowing docs for old versions in For example, how Go does it:
docs.rs uses essentially the same approach. But instead of a noindex tag they add |
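The noindex approach mentioned above could be sketched like this. The set of end-of-life versions is taken from the robots.txt quoted earlier in the thread, and `robots_meta` is a hypothetical helper that a page template might call. How the real theme would wire this in is not covered here.

```python
# Assumption: the EOL versions listed in the robots.txt quoted in this thread.
EOL_VERSIONS = frozenset({"2.7", "3.0", "3.1", "3.2", "3.3", "3.4"})


def robots_meta(version: str, eol_versions=EOL_VERSIONS) -> str:
    """Return a noindex meta tag for EOL versions, or '' for supported ones."""
    if version in eol_versions:
        return '<meta name="robots" content="noindex">'
    return ""


if __name__ == "__main__":
    for version in ("3.4", "3.12"):
        print(version, repr(robots_meta(version)))
```

A noindex tag only takes effect if the crawler is allowed to fetch the page, so this approach replaces the robots.txt Disallow rules rather than adding to them.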
Describe the bug
docs.python.org has atrocious search performance on Google. It's so bad that I suspect Google is actively downranking it for some reason.
To Reproduce
Search Google for virtually any Python documentation topic. The ones that drove me here were searches for [python set] and [python shuffle list].
Expected behavior
docs.python.org is the authoritative source for Python documentation on the web. I expect to find relevant results on docs.python.org somewhere on the first page of Google search results.
Instead, I find the opposite. For [python set] I expect to find https://docs.python.org/3/library/stdtypes.html#set somewhere on the first page of results, but instead, the only python.org result I see is for the long-deprecated Python 2 sets module -- https://docs.python.org/2/library/sets.html
For [python shuffle list], neither https://docs.python.org/3/library/random.html nor any other python.org result shows up anywhere on the first page.
Screenshots
Additional context
These results are egregious enough to make me suspect you're being actively downranked for some reason. This isn't a request for general SEO optimization -- although that'd be a great project if someone has the interest -- but for a domain admin to try to use Google's search console (https://developers.google.com/search) to investigate if there's something egregiously wrong with an easy fix.