hyperlinks in jupyter notebook are missing when converted to html with nbsphinx+sphinx #468

Waerden001 · 2020-06-04T00:45:08Z

I use Sphinx with nbsphinx to generate HTML files from Jupyter Notebook files, but hyperlinks in the notebook don't show up in the converted HTML file. More precisely:

I have an index.ipynb which contains a cell with an HTML-style hyperlink <a href="https://google.com">Google</a> to the Google website.
I use the Sphinx command make html, with nbsphinx as an extension, to generate the documentation.
In the output index.html, the hyperlink turns into unformatted text: only the plain text Google appears in the source code of index.html, and the <a href="https://www.google.com"></a> part just disappears.

Does nbsphinx keep the hyperlinks in the notebook when used in sphinx as an extension?


mgeier · 2020-06-10T16:13:16Z

HTML-style hyperlinks are currently not supported by nbsphinx.

You can use one of these instead:

https://google.com

<https://google.com>

[Google](https://google.com)

[Google][1]

[1]: https://google.com

[Google]

[Google]: https://google.com
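If a notebook already contains many HTML-style links, they can be rewritten into one of the Markdown forms above before running Sphinx. The following is a minimal sketch, not part of nbsphinx: the helper names are hypothetical, it only handles simple <a href="...">text</a> tags without nested markup, and it edits the .ipynb JSON directly with the standard library.

```python
import json
import re

# Matches simple <a ... href="URL" ...>TEXT</a> tags (single or double quotes).
_A_TAG = re.compile(
    r"""<a\s+[^>]*?href\s*=\s*(["'])(?P<url>.*?)\1[^>]*>(?P<text>.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)


def html_links_to_markdown(source):
    """Rewrite simple HTML hyperlinks as Markdown [text](url) links."""
    return _A_TAG.sub(lambda m: f"[{m.group('text')}]({m.group('url')})", source)


def convert_notebook(path_in, path_out):
    """Apply html_links_to_markdown() to every Markdown cell of a notebook file."""
    with open(path_in, encoding="utf-8") as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        src = cell.get("source", "")
        # In .ipynb files, "source" may be a single string or a list of lines.
        text = "".join(src) if isinstance(src, list) else src
        cell["source"] = html_links_to_markdown(text)
    with open(path_out, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

For example, convert_notebook("index.ipynb", "index.ipynb") would turn the link from the original report into [Google](https://google.com). Attributes such as class are dropped, since plain Markdown links cannot carry them.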

MiniXC · 2020-06-11T11:03:27Z

Why is this the case, when regular Markdown supports html tags like <a>?
This can be useful when for example wanting to use a hyperlink with a class.
I noticed many other tags are stripped as well, e.g. <em>, <strong>, <article>...
And writing <p>test</p> results in <p></p><p>test</p><p></p> which is unexpected.
Is there a workaround for this other than using raw NBConvert html cells?

From the documentation:

https://nbsphinx.readthedocs.io/en/0.4.1/markdown-cells.html#HTML-Elements-(HTML-only)

https://nbsphinx.readthedocs.io/en/0.4.1/raw-cells.html#HTML

Maybe these parts of the documentation should clarify that certain HTML tags are stripped.

mgeier · 2020-06-12T07:30:39Z

Why is this the case, when regular Markdown supports html tags like <a>?

Simply because nobody has implemented it yet.
And until this very issue, nobody has requested it either.

This is quite easy to implement if you just want to convert Markdown to HTML and nothing else.

In the case of nbsphinx it is a bit more complicated, though.
The Markdown content is first converted (by pandoc plus some AST manipulations) to reStructuredText which is then converted to the internal representation of Sphinx/docutils.
From this internal representation, Sphinx can generate HTML and LaTeX (and EPUB, and ...) output files (involving some further custom manipulations).

Raw HTML snippets which are just passed through will be missing in the LaTeX output.

There are already two special cases implemented which also work with LaTeX output: <img> and <div class="alert alert-...">.

Theoretically, a third special case for <a> could be added.
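For illustration only, the first step of such a special case would be recognizing a raw <a> tag and reading its attributes. The sketch below does that with the standard library's html.parser; the class and function names are hypothetical and not nbsphinx internals, and turning the result into a docutils node (so that it also works for LaTeX) is left out.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect href, CSS classes and link text for each <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the <a> element currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            self._current = {
                "href": attributes.get("href"),
                "classes": (attributes.get("class") or "").split(),
                "text": "",
            }

    def handle_data(self, data):
        # Only text inside an open <a> element is collected.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def extract_anchors(snippet):
    """Return a list of dicts describing the <a> elements in an HTML snippet."""
    parser = AnchorExtractor()
    parser.feed(snippet)
    parser.close()
    return parser.links
```

An actual implementation would then presumably build something like a docutils reference node with its refuri set to the extracted href, which Sphinx can render for both HTML and LaTeX.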

This can be useful when for example wanting to use a hyperlink with a class.

I guess this could be implemented. Do you want to make a PR?

I noticed many other tags are stripped as well, e.g. <em>, <strong>, <article>...

I guess they get lost in the conversion from Markdown to reStructuredText.

I think they are swallowed by pandoc. I don't know if it's possible to avoid that.

In the long term, I'd like to avoid the intermediate reStructuredText representation (and the use of pandoc), see #36 (but this might still take quite a while). Once that's done, it might be easier to fix this.

And writing <p>test</p> results in <p></p><p>test</p><p></p> which is unexpected.

OK, that's strange; it's probably an artifact caused by the various tools mentioned above.

Is there a workaround for this other than using raw NBConvert html cells?

You can write something like this in your Markdown cell:

<div class="my-class">

[Google](https://google.com)

</div>

The <div> tags will survive the conversion and then you should be able to use a CSS selector like .my-class a to select the link.
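To actually style the link, the CSS rule has to reach the generated pages. One way, assuming Sphinx 1.8 or newer, is to register a stylesheet in conf.py; the file name custom.css and the class name my-class are just placeholders.

```python
# conf.py (fragment)
# Assumes a file _static/custom.css containing, for example:
#     .my-class a { font-weight: bold; }
# so that links inside <div class="my-class"> ... </div> get that style.
extensions = ["nbsphinx"]

# Directories (relative to conf.py) whose files are copied into the HTML output.
html_static_path = ["_static"]

# Stylesheets (relative to html_static_path) added to every HTML page.
html_css_files = ["custom.css"]
```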

Alternatively, you could try if MyST-NB handles this situation more to your liking.

You can also try RunNotebook (which uses a more direct Markdown-to-HTML conversion) or any of the alternatives mentioned in https://nbsphinx.readthedocs.io/en/0.7.0/links.html.

Maybe these parts of the documentation should clarify that certain HTML tags are stripped.

Yes, definitely, the documentation is missing some important information here!

Would you like to make a PR to fix this?

MiniXC · 2020-06-12T13:40:34Z

I will look into pandoc and see if there are options for converting html tags, that might be the cleanest solution.
Regarding the documentation: not just div seems to be supported, but audio and some others as well. If you know by any chance where these special html tags are converted to rst, that would be a great help. Happy to make a PR for the docs, not sure if making an exception for just <a> tags would be worth it though.

MiniXC · 2020-06-12T14:12:30Z

Not sure if that might be out-of-scope for this issue, but my original use-case for <a> tags was that I wanted to replicate automatically linking to classes generated with autodoc as is possible in rst, e.g.:

:class:`.SomeClass`

My specific problem was that I could not reproduce in Markdown the HTML that the above line generates. Long story short, pandoc actually extends Markdown and accommodates this case with

`.SomeClass`{.interpreted-text role="class"}

This won't be nicely displayed in a notebook, but that would have been a long shot either way.
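If there are many such roles, the rewriting can be automated. The following is a hypothetical helper (not part of nbsphinx or pandoc) that turns reST role syntax such as :class:`.SomeClass` into the pandoc attribute syntax quoted above; it assumes role targets contain no backticks and, like the hand-written version, only renders properly after the pandoc conversion.

```python
import re

# Matches reST roles like :class:`.SomeClass` or :py:func:`foo`.
_ROLE = re.compile(r":(?P<role>[\w:+-]+):`(?P<target>[^`]+)`")


def rst_roles_to_pandoc(text):
    """Rewrite :role:`target` as `target`{.interpreted-text role="role"}."""

    def replace(match):
        role, target = match.group("role"), match.group("target")
        # Doubled braces produce literal { and } in the f-string.
        return f'`{target}`{{.interpreted-text role="{role}"}}'

    return _ROLE.sub(replace, text)
```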

I think the documentation should more clearly say that Markdown cells are treated as pandoc markdown, I will submit a PR for that later.

Interestingly enough, any pandoc Markdown that involves div does not seem to be supported by nbsphinx (maybe there is custom code for div in place?)

If I'm not mistaken, one could even add autodoc using the following:

<div class="automodule" data-members="" data-undoc-members="" data-show-inheritance="">

some_module.submodule

</div>

mgeier · 2020-06-14T12:14:46Z

Regarding the documentation: not just div seems to be supported, but audio and some others as well.

Yes, I think <audio> and <video> are the most relevant, that's why I'm showing them in https://nbsphinx.readthedocs.io/en/0.7.0/markdown-cells.html#HTML-Elements-(HTML-only).

If you know by any chance where these special html tags are converted to rst that would be a great help.

The pandoc options are here:

nbsphinx/src/nbsphinx.py

Lines 1362 to 1366 in 992d555

input_format = 'markdown'
input_format += '-implicit_figures'
v = nbconvert.utils.pandoc.get_pandoc_version()
if nbconvert.utils.version.check_version(v, '1.13'):
    input_format += '-native_divs+raw_html'

The +raw_html setting passes some HTML tags (but apparently not all?) through.

Then there is some special handling for citations and <img> tags, but <audio> and <video> don't need special handling.
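The format string built by the quoted snippet uses pandoc's extension syntax: a base format followed by +name to enable and -name to disable an extension. Below is a small sketch that mirrors the snippet and splits the result back into its parts. It substitutes a plain version-tuple comparison for nbconvert's check_version() (assumed here to mean "at least 1.13"), and both function names are hypothetical.

```python
import re


def build_input_format(pandoc_version):
    """Mirror of the quoted nbsphinx snippet; pandoc_version is a tuple like (2, 9)."""
    input_format = "markdown"
    input_format += "-implicit_figures"
    if pandoc_version >= (1, 13):
        input_format += "-native_divs+raw_html"
    return input_format


def parse_format(spec):
    """Split a pandoc format spec into (base, enabled extensions, disabled extensions)."""
    base, rest = re.match(r"([^+-]+)(.*)", spec).groups()
    enabled, disabled = set(), set()
    for sign, name in re.findall(r"([+-])([^+-]+)", rest):
        (enabled if sign == "+" else disabled).add(name)
    return base, enabled, disabled
```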

You can check pandoc's behavior like this:

$ pandoc -f markdown-native_divs -t rst
<div>bla</div>
^D
.. raw:: html

   <div>

bla

.. raw:: html

   </div>
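The same check can be scripted from Python, which is handy for comparing several input formats at once. This is a minimal sketch assuming a pandoc executable on the PATH; it passes the same -f/-t options as the shell session above, and the function names are hypothetical.

```python
import shutil
import subprocess


def pandoc_command(input_format, output_format="rst"):
    """Build the argument list for: pandoc -f INPUT -t OUTPUT."""
    return ["pandoc", "-f", input_format, "-t", output_format]


def convert_with_pandoc(text, input_format="markdown-native_divs", output_format="rst"):
    """Pipe text through pandoc and return its output (raises if pandoc is missing or fails)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    result = subprocess.run(
        pandoc_command(input_format, output_format),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, convert_with_pandoc("<div>bla</div>\n") should give output like the session above, although the exact reST may differ between pandoc versions.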

Note that for (future) CommonMark compatibility, blank lines should be used inside the <div> tags:

$ pandoc -f commonmark -t rst
<div>bla</div>
^D
.. raw:: html

   <div>bla</div>

vs.

$ pandoc -f commonmark -t rst
<div>

bla

</div>
^D
.. raw:: html

   <div>

bla

.. raw:: html

   </div>
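Existing cells can be brought into that shape automatically. Here is a rough sketch with a hypothetical helper name. It is regex-based, so it will also touch <div> text inside code spans or code blocks, and it collapses any run of several blank lines into a single one.

```python
import re

# An opening or closing <div ...> tag together with the whitespace around it.
_DIV_TAG = re.compile(r"[ \t]*\n*[ \t]*(</?div\b[^>]*>)[ \t]*\n*")


def pad_div_tags(markdown):
    """Put each <div ...> / </div> tag on its own line, surrounded by blank lines."""
    padded = _DIV_TAG.sub(r"\n\n\1\n\n", markdown)
    # Collapse runs of three or more newlines to a single blank line.
    padded = re.sub(r"\n{3,}", "\n\n", padded)
    return padded.strip("\n") + "\n"
```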

Happy to make a PR for the docs,

That would be great!

not sure if making an exception for just <a> tags would be worth it though.

I don't know, probably not.

Not sure if that might be out-of-scope for this issue, but my original use-case for <a> tags was that I wanted to replicate automatically linking to classes generated with autodoc as is possible in rst, e.g.:
:class:`.SomeClass`

My work-around for autodoc links is https://nbsphinx.readthedocs.io/en/0.7.0/markdown-cells.html#Links-to-Domain-Objects.

This is of course not as simple as :class:`SomeClass`, but the advantage is that the links also look somewhat reasonable in JupyterLab/nbviewer/GitHub.

I think the documentation should more clearly say that Markdown cells are treated as pandoc markdown, I will submit a PR for that later.

I would prefer not mentioning pandoc, because it is just an implementation detail which will be removed in the (rather far) future.

I think it would be better to mention a few tags that work (e.g. <div>, <audio>) and vaguely mention that not all tags work.

This way we are open for future changes in behavior.

Interestingly enough, any pandoc Markdown that involves div does not seem to be supported by nbsphinx (maybe there is custom code for div in place?)

nbsphinx uses the -native_divs option, maybe that's the culprit?

The raw <div> tags are parsed in the ReplaceAlertDivs transform, in order to find "alert" divs which are turned into "notes"/"warnings".

But all other <div> elements should be passed through?
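To make the distinction concrete, here is a rough stand-in written with the standard library only. It is not the actual ReplaceAlertDivs code: the regex, the function name and in particular the class-to-admonition mapping (alert-info to note, alert-warning to warning) are assumptions for illustration. Any other <div> yields None, i.e. it would be left alone and passed through.

```python
import re

# Opening tag of the form <div class="..."> (single or double quotes).
_DIV_OPEN = re.compile(
    r"""\s*<div\s+class\s*=\s*(["'])(?P<classes>[^"']*)\1\s*>\s*""", re.IGNORECASE
)

# Assumed mapping for illustration only; not taken from nbsphinx.
ALERT_TO_ADMONITION = {"alert-info": "note", "alert-warning": "warning"}


def admonition_for(raw_html):
    """Return the admonition type for an "alert" div tag, or None for any other HTML."""
    match = _DIV_OPEN.fullmatch(raw_html)
    if match is None:
        return None
    classes = match.group("classes").split()
    if "alert" not in classes:
        return None
    for cls in classes:
        if cls in ALERT_TO_ADMONITION:
            return ALERT_TO_ADMONITION[cls]
    return None
```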

If I'm not mistaken, one could even add autodoc using the following [...]

You mean instead of using the automodule directive?

Why not just use a raw reST cell (or a separate reST source file) for that?
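For completeness, such a raw reST cell can also be added programmatically. This is a sketch using only the json and uuid modules. It assumes nbsphinx recognizes raw cells whose raw_mimetype metadata is text/restructuredtext (the value Jupyter's "Raw Cell Format" selector writes for reST), and the helper names are hypothetical.

```python
import json
import uuid


def rest_raw_cell(rst_source):
    """A raw notebook cell marked as reST via its raw_mimetype metadata."""
    return {
        "cell_type": "raw",
        "metadata": {"raw_mimetype": "text/restructuredtext"},
        "source": rst_source,
    }


def automodule_cell(module_name):
    """Raw reST cell with the same options as the <div class="automodule"> example."""
    return rest_raw_cell(
        f".. automodule:: {module_name}\n"
        "   :members:\n"
        "   :undoc-members:\n"
        "   :show-inheritance:\n"
    )


def append_cell(path, cell):
    """Append a cell to the notebook stored at path (edits the file in place)."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    # Cell ids are part of the notebook format from nbformat 4.5 on.
    if (nb.get("nbformat", 4), nb.get("nbformat_minor", 0)) >= (4, 5):
        cell = dict(cell, id=uuid.uuid4().hex[:8])
    nb["cells"].append(cell)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
```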

Waerden001 · 2020-06-14T21:26:37Z

Regarding the documentation: not just div seems to be supported, but audio and some others as well.

Yes, I think <audio> and <video> are the most relevant, that's why I'm showing them in https://nbsphinx.readthedocs.io/en/0.7.0/markdown-cells.html#HTML-Elements-(HTML-only).

If you know by any chance where these special html tags are converted to rst that would be a great help.

The pandoc options are here:

nbsphinx/src/nbsphinx.py

Lines 1362 to 1366 in 992d555

input_format = 'markdown'
input_format += '-implicit_figures'
v = nbconvert.utils.pandoc.get_pandoc_version()
if nbconvert.utils.version.check_version(v, '1.13'):
    input_format += '-native_divs+raw_html'

The +raw_html setting passes some HTML tags (but apparently not all?) through.

My Markdown cells usually contain just a mixture of plain text, HTML tags, images and LaTeX code. nbsphinx + Sphinx handle everything smoothly except those few HTML tags, so is it possible to support more HTML tags like <a> just by tweaking the +raw_html setting a little?

mgeier · 2020-06-15T08:23:55Z

I don't know. Probably. How would you modify them?

MiniXC · 2020-06-15T15:32:42Z

My work-around for autodoc links is https://nbsphinx.readthedocs.io/en/0.7.0/markdown-cells.html#Links-to-Domain-Objects.

I saw that workaround, unfortunately it does not replicate the styling that is applied when linking to domain objects in sphinx. Functionally it does the same though, so it is a solution.

mgeier · 2020-06-15T15:50:43Z

I saw that workaround, unfortunately it does not replicate the styling that is applied when linking to domain objects in sphinx.

Yeah, I know, the problem is that reST doesn't allow nested markup, see #301.
This will hopefully become possible when #36 is solved, but this might take some more time ...


maartenbreddels mentioned this issue Jun 26, 2020

Launch Binder button does not work. vaexio/vaex#856

Open

This was referenced Aug 16, 2020

'Run in Colab' buttons not rendered on readthedocs QData/TextAttack#244

Closed

Replace HTML hyperlinks with Markdown image hyperlinks in ipynb docs QData/TextAttack#248

Merged

kysrpex mentioned this issue Apr 7, 2021

Make tutorials interactive with binder simphony/docs#115

Merged

mgeier mentioned this issue Apr 25, 2021

Force loading MathJax on HTML pages generated from notebooks #551

Merged

manoelmarques mentioned this issue Jun 21, 2021

Tutorial references do not come out as clickable links qiskit-community/qiskit-finance#60

Closed

sgbaird added a commit to sparks-baird/xtal2png that referenced this issue Jul 8, 2022

convert colab badges from html to markdown for nbsphinx compatibility

cbf5328

spatialaudio/nbsphinx#468

sgbaird mentioned this issue Jul 8, 2022

convert colab badges from html to markdown for nbsphinx compatibility sparks-baird/xtal2png#173

Merged

Oscilloscope98 mentioned this issue Aug 18, 2022

Fix "Open in Colab" button in notebooks intel-analytics/ipex-llm#5452

Merged

1 task

michaeldenes mentioned this issue Sep 24, 2024

Updating documentation in example notebooks OceanParcels/plasticparcels#56

Merged
