Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the API script to handle | characters #718

Merged
merged 1 commit into from
Jan 30, 2024

Conversation

arnaucasau
Copy link
Collaborator

Summary

This PR changes how we handle pipes inside math expressions when we convert sphinx HTML into markdown. Some math expressions use pipes to define quantum states using the Dirac notation, and we need to escape those characters to avoid breaking the page when the pipe characters are used inside a markdown table.

Details

One solution could only handle the | characters used inside a table, but given that the math expressions could be used in nested tags (e.g <td> <p> <span class="math"> SOME_EXPRESSION </span></p></td>), and would need to make the script more complex without ensuring we fix all the cases where we could make the page to fail to render, I decided to handle that character differently in all math expressions.

The PR replaces the | character with \vert which will represent the same character. \vert needs extra space at the end to handle cases where the pipe was next to a non-numerical character. In those cases, we should avoid converting |x to \vertx given that the latter is not a valid command (\vert x is the correct conversion).

We also need to take into account that, in some cases, we still need to use the | characters because when escaped (\|), it represents a double pipe (||), which could be used in different mathematical expressions like the length of a vector.

This is the regex used, which only matches pipe characters not preceded by a backslash:

/(?<!\\)\|/gm

A new test was added to verify different cases where we can find a pipe character. The tests checks | characters outside math expressions, in math expressions inside a table, and in math expressions outside the table. We can also find a case where we intentionally want to use \| to create a double pipe. In the following screenshot, we can see the rendered result of the test.

test-example

Closes #488

Copy link
Member

@frankharkins frankharkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I've re-generated the docs with the new script and it all LGTM 🚀

@arnaucasau arnaucasau added this pull request to the merge queue Jan 30, 2024
Merged via the queue into Qiskit:main with commit f6d67a1 Jan 30, 2024
2 checks passed
@arnaucasau arnaucasau deleted the handle-pipes branch January 30, 2024 15:05
@Eric-Arellano
Copy link
Collaborator

Unfortunately, this resulted in some pages crashing, per #673 (comment). This is because we broke column alignment settings for matrices, like this from PauliTable.md:

\begin{split}\left(\begin{array}{ccc|ccc}

If you use \vert for {ccc|ccc}, you get:

ParseError: KaTeX parse error: Unknown column alignment: \vert at position 37: …egin{array}{ccc\̲v̲e̲r̲t̲ ̲ccc}
    x_{0,0…

Maybe we were over-zealous with always replacing | with \vert. What about we fix #488 solely be fixing the problematic line? It's the only instance of this problem we've found! We actually could now even fix this by changing the HTML directly to use \vert. That works because these are historical API docs so we can modify the input HTML thanks to #737. The APIs were removed from modern Qiskit, so no worries about fixing qiskit/qiskit.

Eric-Arellano added a commit that referenced this pull request Feb 6, 2024
Eric-Arellano added a commit that referenced this pull request Feb 6, 2024
Reverts #718. Turns out that
breaks certain pages, as explained at
#718 (comment).

Instead, we fix the problematic pages by changing the original Sphinx
HTML in Box directly in this PR's last commit.

We have to manually revert the 0.45 docs due to
#755.
frankharkins pushed a commit to frankharkins/documentation that referenced this pull request Jul 22, 2024
### Summary

This PR changes how we handle pipes inside math expressions when we
convert sphinx HTML into markdown. Some math expressions use pipes to
define quantum states using the Dirac notation, and we need to escape
those characters to avoid breaking the page when the pipe characters are
used inside a markdown table.

### Details

One solution could only handle the `|` characters used inside a table,
but given that the math expressions could be used in nested tags (e.g
`<td> <p> <span class="math"> SOME_EXPRESSION </span></p></td>`), and
would need to make the script more complex without ensuring we fix all
the cases where we could make the page to fail to render, I decided to
handle that character differently in all math expressions.

The PR replaces the `|` character with `\vert ` which will represent the
same character. `\vert` needs extra space at the end to handle cases
where the pipe was next to a non-numerical character. In those cases, we
should avoid converting `|x` to `\vertx` given that the latter is not a
valid command (`\vert x` is the correct conversion).

We also need to take into account that, in some cases, we still need to
use the `|` characters because when escaped (`\|`), it represents a
double pipe (`||`), which could be used in different mathematical
expressions like the length of a vector.

This is the regex used, which only matches pipe characters not preceded
by a backslash:
```ts
/(?<!\\)\|/gm
```

A new test was added to verify different cases where we can find a pipe
character. The tests checks `|` characters outside math expressions, in
math expressions inside a table, and in math expressions outside the
table. We can also find a case where we intentionally want to use `\|`
to create a double pipe. In the following screenshot, we can see the
rendered result of the test.


![test-example](https://github.com/Qiskit/documentation/assets/47946624/9603819c-4fb6-4d6a-997d-dc4fdcf1c3cd)

Closes Qiskit#488
frankharkins pushed a commit to frankharkins/documentation that referenced this pull request Jul 22, 2024
Reverts Qiskit#718. Turns out that
breaks certain pages, as explained at
Qiskit#718 (comment).

Instead, we fix the problematic pages by changing the original Sphinx
HTML in Box directly in this PR's last commit.

We have to manually revert the 0.45 docs due to
Qiskit#755.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Improve API script to handle | inside table element values
3 participants