Update robots.txt for every stable release #17

Closed · 2 tasks
est31 opened this issue Mar 3, 2018 · 5 comments
Labels: A-doc (Issue that affects doc.rust-lang.org) · C-enhancement (Enhancement of an existing feature)

est31 (Member) commented Mar 3, 2018

I'd heard that @imperio and @QuietMisdreavus from the docs team wanted to improve SEO for the official Rust documentation. So I dug around a little and found that doc.rust-lang.org has a robots.txt. If we could list all stable releases in robots.txt, we could achieve the SEO improvements that the docs team wanted. Some things I found out:

  • Apparently you can't use regexes for paths in robots.txt, so you need to list all stable releases explicitly (according to my googling).
  • All doc.rlo content, including the robots.txt, is hosted on S3.
  • rust-central-station seems to be what uploads the docs to that S3 storage.

It would be awesome if:

  • robots.txt got amended with the past releases
  • rust-central-station updated robots.txt automatically on every release (see the sketch below)
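
For illustration, here's a minimal sketch of the kind of automation meant here (hypothetical: the version list and the rendering step are assumptions, not rust-central-station's actual pipeline, and the S3 upload is omitted):

def stable_versions():
    # Placeholder: in practice this would come from the release manifest.
    return ["1.0.0", "1.1.0", "1.2.0"]  # ...and so on up to the new release

def render_robots_txt(versions):
    # One Disallow line per published stable release, as proposed above.
    lines = ["User-agent: *"]
    lines += ["Disallow: /%s/" % v for v in versions]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(render_robots_txt(stable_versions()), end="")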
follower commented Oct 5, 2020

"Disallow" considered harmful. :)

FYI, Rust docs SEO is probably being hurt by the use of "disallow" in the current robots.txt:

User-agent: *
Disallow: /1.
Disallow: /0.
Disallow: /book/first-edition/
Disallow: /book/second-edition/
Disallow: /stable/book/first-edition/
Disallow: /stable/book/second-edition/
Disallow: /beta/book/first-edition/
Disallow: /beta/book/second-edition/
Disallow: /nightly/book/first-edition/
Disallow: /nightly/book/second-edition/

It's also the cause of the "No information is available for this page" (or similar) message shown for some top search results on Google.

How Godot handled its doc SEO problem

Until recently, Godot (like many other Read the Docs-based projects) had a similar SEO problem, exacerbated by multiple language support.

If I recall correctly, the proper way to handle older versions of docs, so that their "Google juice" isn't lost while the current docs still rank highest, is to use canonical link tags, e.g. (from the outdated Godot 3.1 source):

<link rel="canonical" href="https://docs.godotengine.org/en/stable/" />

or for a specific page:

<link rel="canonical" href="https://docs.godotengine.org/en/stable/getting_started/step_by_step/" />

I did a lot of research into the issue for Godot and this issue has some additional details that may also apply to the Rust docs: godotengine/godot-docs#3262 (Especially the "Cause: robots.txt related." section.)
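
To make the canonical-tag approach concrete, here is a minimal sketch of post-processing already-generated versioned pages (hypothetical: the docs/1.20.0 layout, the /stable/ base URL mapping, and the naive string insertion are all assumptions, not the actual Rust docs pipeline):

from pathlib import Path

VERSIONED_ROOT = Path("docs/1.20.0")
STABLE_BASE = "https://doc.rust-lang.org/stable"

for page in VERSIONED_ROOT.rglob("*.html"):
    # Point each versioned page at the corresponding page under /stable/.
    rel = page.relative_to(VERSIONED_ROOT).as_posix()
    tag = '<link rel="canonical" href="%s/%s" />' % (STABLE_BASE, rel)
    html = page.read_text(encoding="utf-8")
    if 'rel="canonical"' not in html:
        # Insert just before </head>; a real tool should use an HTML parser
        # rather than string replacement.
        page.write_text(html.replace("</head>", tag + "\n</head>", 1),
                        encoding="utf-8")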

Tool for checking robots.txt

In addition, whoever has admin capability can apparently run the existing robots.txt through a robots.txt checking tool for more feedback.

pietroalbini transferred this issue from rust-lang/rust-central-station on Oct 26, 2020
pietroalbini added the A-doc (Issue that affects doc.rust-lang.org) and C-enhancement (Enhancement of an existing feature) labels on Oct 28, 2020
jyn514 (Member) commented Feb 14, 2023

cc @jsha, this seems like something you'd be interested in

jsha commented Feb 14, 2023

Thanks for the tag @jyn514! I'd be curious what the original problems were with SEO for doc.rust-lang.org. From context I'm guessing it was a problem of Google choosing the wrong canonical URL, like we're trying to solve for docs.rs at rust-lang/docs.rs#1438? Perhaps in 2018 it was the case that each release was published at a versioned URL and there was no "current" URL like https://doc.rust-lang.org/std/?

Also, just to confirm: the current contents of https://doc.rust-lang.org/robots.txt are the same as shown in @follower's comment. The current robots.txt actually does block the versioned URL for each stable release, by disallowing the /1. prefix (robots.txt rules match path prefixes).

I have access to the Google Search Console for rust-lang.org. Quantitatively, of the top 1000 pages Google sends people to, 754 are on doc.rust-lang.org. Of those:

  • 21 have /beta/ in the URL (mostly rust-by-example pages)
  • 6 have /nightly/ in the URL
  • 327 have /std/ in the URL
  • 5 have a version number (e.g. /1.0.0/) in the URL
  • the rest, roughly speaking, are pages from the various books

So, overall, if the problem is "Google sends people to a versioned page and that version isn't the latest," I think this SEO problem has been solved.

Also, qualitatively, my experience has been that search results for pages on doc.rust-lang.org usually point to the right page.

So I propose to close this issue unless there are specific queries that someone can cite as giving wrong results.

jyn514 (Member) commented Feb 14, 2023

amazing, thanks @jsha! going to close this unless someone runs into trouble again :)

jyn514 closed this as completed Feb 14, 2023
est31 (Member, Author) commented Feb 14, 2023

I opened this five years ago, but I think the problem was that when you searched for some Rust concept, Google would link to a random version's rustdoc, say 1.20.0, which would then surprise users because not everything was available there.

Ultimately, I have no idea how or when this got resolved. I opened this back then to organize discussion on the issue, which wasn't successful, but at least the underlying problem got fixed. Shrug, closing is okay I guess.
