Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demos in readthedocs have broken links #1292

Closed
kjun9 opened this issue Apr 21, 2020 · 2 comments
Closed

Demos in readthedocs have broken links #1292

kjun9 opened this issue Apr 21, 2020 · 2 comments
Assignees
Labels
bug Something isn't working doc Issue relates to documentation sg-library

Comments

@kjun9
Copy link
Contributor

kjun9 commented Apr 21, 2020

Describe the bug

Follow-up from #1279

Notebooks were added to our readthedocs documentation by using nbsphinx, but seems like quite a number of links got broken. For example in loading-saving-neo4j:
image.png

@kjun9 kjun9 added bug Something isn't working sg-library labels Apr 21, 2020
kjun9 added a commit that referenced this issue Apr 21, 2020
This adds our demo notebooks to readthedocs documentation using `nbsphinx` and `nbsphinx-link`.

There are a variety of follow-ups mostly to improve formatting and make the automation more robust:
* messy sections #1294
* binder/colab buttons #1293
* broken links #1292
* readme as indices #1298
* validate script in CI #1299

See #1159
@huonw huonw added the doc Issue relates to documentation label Apr 21, 2020
@huonw huonw self-assigned this Apr 27, 2020
@huonw
Copy link
Member

huonw commented Apr 27, 2020

I think there's two causes here:

huonw added a commit that referenced this issue Apr 28, 2020
Our notebooks are designed to be narrated examples, which means they contain a
bunch of text and headings. It's helpful to have consistent formatting for these
for a few reasons:

- our "cloud-runner" cells look best if they're immediately after the notebook
  title, so the first cell should be only the title

- that title should be a H1 element (`# ...`), since it's semantically the main
  title and it ends up as the page title in the HTML docs (such as that shown on
  Read the Docs)

- there should only be one H1 element (`# ...`), since the others end up being
  considered siblings to the main title and thus appear in the HTML docs tables
  of contents. Also, it's better practice: for instance, MDN
  <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements#Usage_notes>:

  > You should only use one `<h1>` per page. Using more than one will not result
  > in an error, but using only one is seen as a best practice. It makes logical
  > sense — `<h1>` is the most important heading, and tells you what the purpose
  > of the overall page is. You wouldn't have a book with more than one title,
  > or a movie with more than one name! Having a single top-level title is also
  > arguably better for screenreader users, and SEO.

- headings shouldn't skip, meaning `## ...` (`<h2>` in HTML) should be followed
  by `### ...` (`<h3>`, or anything higher like `## ...`), not by `#### ...`
  (`<h4>`). For instance, reStructured Text requires "consistent" heading levels
  <https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#sections>
  without skipping:

  > However, a document must be consistent in its use of section titles: once a
  > hierarchy of title styles is established, sections must use that hierarchy.

  <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements#Usage_notes>:

   > Avoid skipping heading levels: always start from `<h1>`, next use `<h2>`
   > and so on.

This pull request adds a script that validates these rules, uses it on CI, and
thus fixes all of notebooks that violate these guidelines. In order to keep this
PR focused, it tries to _only_ fix the violations of the guidelines and not fix
other typos and so on.

The script has specific output about the problem, tries to have guidance for how
one might fix the problems and includes links to the NBViewer-rendered
notebooks. For instance,
https://buildkite.com/stellar/stellargraph-public/builds/3448 shows all of the
violations in our current `develop` notebooks.

For the the changes to notebooks themselves:

- GitHub comparison:
  - Before:
    https://github.com/stellargraph/stellargraph/blob/a310143ff/demos/embeddings/stellargraph-metapath2vec.ipynb
  - After:
    https://github.com/stellargraph/stellargraph/blob/9a1fd36/demos/embeddings/stellargraph-metapath2vec.ipynb

- HTML docs table-of-contents comparison (see #1363 for screenshots):
  - Before: https://stellargraph.readthedocs.io/en/v1.0.0rc1/demos/index.html
    many of the links are links to internal sections of the main demos
  - After:
    https://stellargraph--1363.org.readthedocs.build/en/1363/demos/index.html

This PR also resolves 13 of the 75 warnings in #1360 (all of the "Title level
inconsistent" ones).

There's potential future work we can do here, like:

- validating links to other notebooks are relative (this might play into #1292
  too)
- ensuring there's a "conclusion" section at the end of each notebook
- adding spellchecking (possibly overly annoying): #1368

See: #1294
huonw added a commit that referenced this issue Apr 30, 2020
This is a building block for #1292, since part of that will require validating
the contents of link elements (see: #1394), which will be much more reliable
with a real parser.

I chose the commonmark library https://github.com/readthedocs/commonmark.py
because it (a) provides easy access to an AST node type, and (b) that AST
records the source positions of block elements (in the form of the `sourcepos`
property, which is a list like `[start, end]`, where `start` and `end` are lists
like `[line_number, column_number]` (one-based)). The source positions make it
easy to report nice error messages with lines from the cells, without having to
reconstruct/guess at which part of the code was broken.

This does increase the amount of code. Most of that is in support code, while
the `@checker`s themselves and even error message construction get (arguably)
simpler. The code can now reason in terms of actual markdown entities, instead
of having to do regex parsing/guessing.

This also catches one instance of a heading that the regexes did not, in the
PPNP demo.

Validation: I added some broken-on-purpose notebooks and built them in:
https://buildkite.com/stellar/stellargraph-public/builds/3541

See: #1292
huonw added a commit that referenced this issue Apr 30, 2020
Our demo notebooks contain links to READMEs. These links don't work on Read the Docs/in Sphinx, because we have `index` (per the web server convention) files for each directory, not READMEs.

This PR writes a basic sphinx extension that rewrites any relative links to a file called "README.md" to instead be a link to "index.txt", matching our Sphinx index files. I modelled this after:

- https://www.sphinx-doc.org/en/master/extdev/index.html#dev-extensions
- https://github.com/spatialaudio/nbsphinx/blob/93398b9d8ba3922e860486d7eacf9eeb3b036580/src/nbsphinx.py#L1527-L1586

This approach allows us to retain links to READMEs in the notebooks, so that they work locally and on Github. These links are probably useful even with the move to Read the Docs being the canonical display of demos (#1391), due to being useful locally. (A simpler approach that breaks this would be to change the links to be correct for Read the Docs within each notebook.)

This also adjusts the case of one of the `connector/neo4j` README file, so that it's consistent with the rest of the repo, to make this code simpler.

This fixes 12 out of 66 of the remaining warnings in #1360; all of the ones like:

```
WARNING: File not found: 'demos/README.md'
WARNING: File not found: 'demos/node-classification/README.md'
```

Example: in the GCN node classification notebook (https://github.com/stellargraph/stellargraph/blob/1cca6b6c6f4cd1fea7c6932918cba18d15eef6d4/demos/node-classification/gcn/gcn-cora-node-classification-example.ipynb), there's a conclusion that includes links to both the parent node-classification README and the all notebooks README. Its Markdown source is:

```markdown
StellarGraph includes [other algorithms for node classification](../README.md) and [algorithms and demos for other tasks](../../README.md).
```

Before: https://stellargraph.readthedocs.io/en/latest/demos/node-classification/gcn/gcn-cora-node-classification-example.html#Conclusion These links stayed pointing to `README.md` in the respective parent directories, which don't exist in the Sphinx docs/on Read the Docs, and so they're dead links. Generated HTML (note the `external` CSS class, and the `href=.../README.md`):

```html
StellarGraph includes <a class="reference external" href="../README.md">other algorithms for node classification</a> and <a class="reference external" href="../../README.md">algorithms and demos for other tasks</a>.
```

After: https://stellargraph--1401.org.readthedocs.build/en/1401/demos/node-classification/gcn/gcn-cora-node-classification-example.html#Conclusion The links were rewritten and now work. Generated HTML (note the `internal` CSS class, and the `href=.../index.html`):

```html
StellarGraph includes <a class="reference internal" href="../index.html"><span class="doc">other algorithms for node classification</span></a> and <a class="reference internal" href="../../index.html"><span class="doc">algorithms and demos for other tasks</span></a>.
```

See: #1292
huonw added a commit that referenced this issue Apr 30, 2020
This moves us towards using Read the Docs as the main source of demos, by duplicating the indexing information from READMEs into the Sphinx docs.

We've decided to move to using Read the Docs like this because:

- it gives us more control over how things are shown (e.g. #1386)
- the display is faster
- it allows us to version properly, including with internal links to documentation of the same version as the demo

I used [m2r](https://miyakogi.github.io/m2r/index.html) to convert each of our demo READMEs to a reStructuredText `index.txt` file, and then fixed them up by hand. For each index, I've gone with:

```rst
Title
======

... description with subsections etc ...

Table of contents
------------------

... full ToC listing of subfolders/demos of this index ...
```

(Where the description is empty in some cases, for directories without READMEs: #1139.)

The thinking is we'll replace the README.md files with a much simplified set of information, potentially just a link to the corresponding Read the Docs. However, the READMEs are still useful as an index if someone has a copy of the demos locally in Jupyter lab. Thus, this PR also extends the `demo_table.py` script to render the demo indexing table to both a raw-HTML markdown version into `demos/README.md` and a rST `list-table` version into `docs/demos/index.txt`. This means we can still retain that demo table in an otherwise reduced/not-human-maintained `demos/README.md`.

Other than extending `demo_table.py`, this tries to do the minimal work to get this working. Future work:

- optimising the display of each of the tables (at the moment they scroll sideways a lot): #1418
- handling abbreviations in the main indexing table, because it doesn't seem easy to get hover text in rST: #1417
- validating links (likely covered by #1360)
- updating the demos to have relative links pointing to the index files, not README.mds (likely covered by #1292 and #1360)
- updating the README files to remove their content: #1419
- updating any links to the demos on GitHub to point to Read the Docs: #1420

Rendered demo landing page: https://stellargraph--1391.org.readthedocs.build/en/1391/demos/index.html

See: #1298
huonw added a commit that referenced this issue May 1, 2020
This means that all instances of nested formatting in our notebooks don't work
(spatialaudio/nbsphinx#301), including examples like
the above and also, more often, formatting within or around links:

```markdown
[some `code` in a link](url)
***[bold italic link](url)***
```

This PR removes all instances of this, in two ways:

| description                        | before                        | after                        |
|------------------------------------|-------------------------------|------------------------------|
| removing formatting                | ***text*** ***[link](#url)*** | **text**<br>[link](#url)     |
| moving link (especially doc links) | [some `method`](#url)         | some `method` ([docs](#url)) |

This removal was driven by extending the `notebook_text_checker` script to find
instances of this, using a real Markdown parser (#1393). The script will ensure
that we don't have more instances come into our notebooks in future.

Example of failure on CI (with the notebooks that were updated here):
https://buildkite.com/stellar/stellargraph-public/builds/3610#annotation-notebook_text_checker

I checked some of the instances I fixed here in the existing documentation, and
all of them were rendered incorrectly, validating that this is useful.

Examples:

- links in GCN node classification:
  Before:
  https://stellargraph.readthedocs.io/en/v1.0.0rc1/demos/node-classification/gcn/gcn-cora-node-classification-example.html#Loading-the-CORA-network
  After:
  https://stellargraph--1394.org.readthedocs.build/en/1394/demos/node-classification/gcn/gcn-cora-node-classification-example.html#Loading-the-CORA-network
- paper reference in Attri2Vec node classification
  Before:
  https://stellargraph.readthedocs.io/en/v1.0.0rc1/demos/node-classification/attri2vec/attri2vec-citeseer-node-classification-example.html
  After:
  https://stellargraph--1394.org.readthedocs.build/en/1394/demos/node-classification/attri2vec/attri2vec-citeseer-node-classification-example.html

All new docs:
https://stellargraph--1394.org.readthedocs.build/en/1394/demos/index.html

See: #1292
@huonw
Copy link
Member

huonw commented May 1, 2020

Now that #1394 and #1401 have landed, I think most of these are fixed. Scrolling through some random notebooks, they all seem to look good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working doc Issue relates to documentation sg-library
Projects
None yet
Development

No branches or pull requests

3 participants