Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Doc] Host different released versions of the documentation + version switcher #28941

Closed
7 tasks done
asfimport opened this issue Jul 5, 2021 · 11 comments
Closed
7 tasks done

Comments

@asfimport
Copy link
Collaborator

asfimport commented Jul 5, 2021

Related to ARROW-1299 for hosting the nightly docs, we could also keep the built docs for older versions instead of overwriting them at each release.

Currently, the built docs live in the apache/arrow-site repo (asf-site branch) in the "/docs/" subdirectory.

When releasing, we could add the newly built docs for that release in a subdirectory instead. Something like "/docs/5.0/" or "/docs/version/5.0".
(And we could retroactively add some docs of previous releases, if we want)

To make this useful for the user, we need a version switcher in the sphinx theme layout. There are other projects that use the same sphinx theme that have added this (see eg https://mne.tools/stable/index.html), and there is some work to upstream this to the base theme (but on the short term we could also copy such a custom implementation).

For the "stable" docs (latest release), we could either 1) keep a duplicated version of the latest built docs at "/docs/", or 2) symlink "/docs/" to "/docs/5.0/" (and update this for each release; although I am not sure this is possible since it's the other is a child directory).
We could also add a "/docs/stable/" and make this the default url.

cc @amol- @kszucs

Reporter: Joris Van den Bossche / @jorisvandenbossche

Subtasks:

Related issues:

Note: This issue was originally created as ARROW-13260. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

Ian Cook / @ianmcook:
+1, many mature Apache projects version their docs like this and we should do this for Arrow.

@asfimport
Copy link
Collaborator Author

Nicola Crane / @thisisnic:
In terms of the R pkgdown docs, there is a ticket on the pkgdown repo open about implementing functionality for this, but it's not been done: r-lib/pkgdown#1373

@asfimport
Copy link
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
Copying from our meeting notes: we should take steps to avoid Google sending users to anything but the latest version (unless perhaps they are explicitly searching for an older version). https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls might have some pointers on how to do this.

As a first step, I created a PR to add the existing releases in subdirs: apache/arrow-site#139

One point for discussion (or bike shedding :)): what url do we want to use for the "current release" docs?
Readthedocs (and therefore quite some python packages) use the "stable" and "latest" for latest release and dev version, respectively (but I think it's not uncommon that people interpret "latest" as latest release and not latest development version ...). We could also use "stable" and "dev" to make this clearer.
Another option is to not use a tag for the current docs, so just use eg /docs/format/Columnar.html for the current release and then eg /docs/4.0/format/Columnnar.html for an older release. (this might also solve the issue about how to redirect/link the current pages to something with /stable/..)

@asfimport
Copy link
Collaborator Author

Alessandro Molina / @amol-:
One note, I think we want to make sure that /docs is a permanent redirect to /docs/stable|latest, that way we would make sure that we don't end up with the documentation indexed twice in search engines. Same way, we probably want /docs/5.0.0 to redirect to /docs/stable when 5.0.0 is the current stable version and not viceversa because we probably want to avoid people bookmarking a specific version and search engines indexing a specific version as the current stable one.

@asfimport
Copy link
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
It's also an option to keep "/docs" for the current release as is, without a redirect to "/docs/stable" (and thus without adding a "/docs/stable" url).

We should indeed make that we don't add unnecessary duplication (like have the current release twice), but of course we will end up with a large amount of duplication anyway, because the content of the 4.0 docs or 5.0 docs are largely the same. I mentioned a link above (https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls ) that describes some strategies to help with this. It seems that adding some "canonical" tag in the html can help, and https://docs.readthedocs.io/en/stable/guides/canonical.html has some information about doing that with sphinx.

@asfimport
Copy link
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
Also, some review of apache/arrow-site#139 is welcome. It's adding the old docs to arrow-site, but the docs are also quite large, so not sure how many of those we will be able to keep (as long as we are using github pages, "Published GitHub Pages sites may be no larger than 1 GB." on https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages)

@asfimport
Copy link
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
I have been researching the canonical url option a bit (or in general ways to avoid issues with duplicated content).

So Google mentions using a <link rel="canonical" href="..." /> link tag for this. It seems that this is actually supported by sphinx when using the html_baseurl configuration option. We don't use that yet, but adding it to conf.py and setting it to "https://arrow.apache.org/docs/" adds proper canonical links tags.
So we could start using this now, and then for the already released versions of the docs add this manually (I think writing a small script for this should be doable).

But, one possible drawback with canonical urls is that those can get out of date. For example, if a page is removed or renamed, the older already existing versions of the docs will still point to that url, so effectively pointing to a non-existing page. I don't know whether this is actually a problem, though?

According to https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls, adding a sitemap is also an alternative way to give Google a hint which page to use in search results. There is a sphinx extension that can generate this (https://github.com/jdillard/sphinx-sitemap, didn't yet check it out). So this could be an easier option instead of the canonical urls, in case those canonical urls need to be kept updated in older docs (the sitemap would only list the pages for the stable release, and thus is easy to keep up to date by regenerating it with sphinx when adding the built docs with a new release).

(I am no web expert, though, so not sure what option is best)

@asfimport
Copy link
Collaborator Author

Dominik Moritz / @domoritz:
It looks like there is still some work to do. We are blocking on this issue for the 6.0.1 release. Can we move it to 6.0.2?

@asfimport
Copy link
Collaborator Author

Nicola Crane / @thisisnic:
@domoritz I think that should be fine - the R version switching sub-tasks are not preventing other things from working.

@asfimport
Copy link
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
Yes, it's only some of the sub-tasks that are tagged with 6.0.1, this parent issues doesn't need that tag.

@asfimport
Copy link
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
All subtasks are now resolved or closed, so I think we can close this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant