adoc escaping problems in source code: angle brackets #763

Open
anarthal opened this issue Dec 6, 2024 · 3 comments · May be fixed by #767

Comments

@anarthal

anarthal commented Dec 6, 2024

The following code causes problems:

```cpp
#include <type_traits>

namespace repro {

/// A type trait.
template <class T>
struct trait
{
    /// The actual value.
    static constexpr bool value = std::is_same_v<T, int> || std::is_same_v<T, float>;
};

}  // namespace repro

int main() {}
```

For `repro/trait/value.adoc`, it generates:

```adoc
= Reference
:mrdocs:

[#repro-trait-value]
== xref:../../repro.adoc[repro]::xref:../trait.adoc[trait]::value


The actual value.


=== Synopsis

Declared in `<pass:[main.cpp]>`
[source,cpp,subs="verbatim,macros,-callouts"]
----
constexpr
static
bool const value = pass:[std::is_same_v<T, int> || std::is_same_v<T, float>];
----



[.small]#Created with https://www.mrdocs.com[MrDocs]#
```

Which renders as:

[Screenshot of the rendered synopsis]

I'm worried that we're trying to solve an unsolvable problem, as the lack of a robust escaping strategy has been an open issue in asciidoc for 10 years (asciidoctor/asciidoctor#901).

@alandefreitas
Collaborator

Here's the relevant code:

```cpp
void
AdocGenerator::
escape(OutputRef& os, std::string_view str) const
{
    static constexpr std::string_view formattingChars = "\\`*_{}[]()#+-.!|";
    bool const needsEscape = str.find_first_of(formattingChars) != std::string_view::npos;
    if (needsEscape)
    {
        os << "pass:[";
        // Using passthroughs to pass content (without substitutions) can couple
        // your content to a specific output format, such as HTML.
        // In these cases, you should use conditional preprocessor directives
        // to route passthrough content for different output formats based on
        // the current backend.
        // If we would like to couple passthrough content to an HTML format,
        // then we'd use `HTMLEscape(os, str)` instead of `os << str`.
        os << str;
        os << "]";
    }
    else
    {
        os << str;
    }
}
```

The only way I see to solve this problem consistently is to HTML-escape whatever goes inside pass. This has the obvious disadvantage of coupling the AsciiDoc output to HTML.
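A minimal sketch of that option, written as a standalone function so it can run outside MrDocs (the name `escapeWithHtmlPass` and the `std::string` return value are hypothetical; the real `AdocGenerator::escape` writes to an `OutputRef`). It keeps the same `formattingChars` check but HTML-escapes the text inside `pass:[...]`, so `<T, int>` is displayed instead of being parsed as a tag. A `]` is written as `&#93;` because a literal `]` would end the macro early.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical standalone variant of AdocGenerator::escape (not MrDocs API).
// It returns the escaped text instead of writing to an OutputRef.
std::string escapeWithHtmlPass(std::string_view str)
{
    static constexpr std::string_view formattingChars = "\\`*_{}[]()#+-.!|";
    if (str.find_first_of(formattingChars) == std::string_view::npos)
    {
        // Nothing AsciiDoc would interpret: leave it to the normal substitutions.
        return std::string(str);
    }

    std::string out = "pass:[";
    for (char const c : str)
    {
        // pass:[] applies no substitutions, so HTML special characters must be
        // escaped here. This ties the generated AsciiDoc to the HTML backend.
        switch (c)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case ']': out += "&#93;"; break; // a literal ']' would close pass:[ early
        default:  out += c; break;
        }
    }
    out += ']';
    return out;
}
```

For the repro above, the synopsis line would become `pass:[std::is_same_v&lt;T, int&gt; || std::is_same_v&lt;T, float&gt;]`, which only renders correctly with an HTML backend.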

An alternative would be to escape as much as possible with `\` and use a pass block only as a last resort. The difficulty is that command characters must be escaped only in their opening tags, and only when there is a corresponding closing tag. For instance:

  • `_a_` renders as `<em>a</em>` (the usual case)
  • `\_a_` renders as `_a_` (usual escaping)
  • `_a\_` renders as `<em>a\</em>` (escaping the closing tag has no effect)
  • `_a` renders as `_a` (command not applied because there's no closing tag)
  • `\_a` renders as `\_a` (the backslash is kept because there's no closing tag)
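A rough sketch of the flat cases in that list: prefix `\` to a mark only when it would open formatting, i.e. when a matching closing mark follows. The names `escapeMarks`, `findClosingMark` and `isWordChar` are hypothetical. The boundary checks only approximate AsciiDoc's constrained-formatting rules (so the `_` inside an identifier like `is_same_v` is left alone), only the single-character marks `_`, `*`, `#` and `` ` `` are handled, and nesting is ignored entirely.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: true for characters treated as part of a word when
// checking whether a mark sits on a word boundary.
bool isWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Returns the index of a closing mark that pairs with the opening mark at
// str[open], or npos if there is none.
std::size_t findClosingMark(std::string_view str, std::size_t open)
{
    assert(open < str.size());
    char const mark = str[open];
    for (std::size_t j = open + 2; j < str.size(); ++j)
    {
        bool const boundaryAfter = j + 1 == str.size() || !isWordChar(str[j + 1]);
        if (str[j] == mark && str[j - 1] != ' ' && boundaryAfter)
            return j;
    }
    return std::string_view::npos;
}

// Escapes only the marks that would open formatting. Other marks are copied
// unchanged, because escaping them would leave a visible backslash.
std::string escapeMarks(std::string_view str)
{
    constexpr std::string_view marks = "_*#`";
    std::string out;
    for (std::size_t i = 0; i < str.size(); ++i)
    {
        char const c = str[i];
        bool const canOpen = marks.find(c) != std::string_view::npos
            && (i == 0 || !isWordChar(str[i - 1]))
            && i + 1 < str.size() && str[i + 1] != ' ';
        if (canOpen && findClosingMark(str, i) != std::string_view::npos)
            out += '\\';
        out += c;
    }
    return out;
}
```

With this, `_a_` becomes `\_a_` while `_a` is left untouched, matching the first rows of the list. The output is not re-scanned, so a mark nested inside an escaped span can still take effect.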

With this approach, the algorithm would have to be more complex and keep track of context, so that we know which kind of command we are escaping. There is also an odd precedence problem between commands:

  • `*_a_*` renders as `<strong><em>a</em></strong>` (the usual case)
  • `_*a*_` renders as `<em>*a*</em>` (`*` is not effective because it's inside `_`)
  • `\*_a_*` renders as `*<em>a</em>*` (`*` is properly escaped and `_` is applied)
  • `\_*a*_` renders as `\_*a*_` (`_` is escaped, and `*` is not applied because it's inside `_`, even though `_` is escaped)
  • `_\*a*_` renders as `<em>\*a*</em>` (no escaping, because `*` is not effective inside `_`)
  • `*\_a_*` renders as `<strong>_a_</strong>` (escaping works, because `_` is effective inside `*`)

I don't even know how to produce `_<strong>a</strong>_`, because in `_*a*_` the `*` passes through whether or not the `_` is escaped.

So we must identify "effective" opening tags by examining the current context and looking ahead to make sure the tag is closed somewhere. While inside a command context, we cannot try to escape anything else.

Such an algorithm would be complex and would have to evolve over time. However, now that I think more about it, HTML-escaping the content inside pass is not such a bad idea: by using pass, the content is coupled to the output format anyway. It's just coupled in a way that's wrong for every output format (such as literal `<` characters in HTML). So the problem is not that we should avoid HTML-escaping the content in pass; the problem is that we should try never to use pass at all.

@alandefreitas
Collaborator

Another thing we could implement here is applying the pass block only to the runs of consecutive characters that need it. For now, we wrap the whole string in pass because the output looked cleaner, but being more precise would give more correct output.
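A sketch of that idea, assuming the same `formattingChars` set as the current `escape`: split the string into maximal runs of formatting and non-formatting characters and wrap only the former in `pass:[...]`. Text outside the runs, including `<` and `>`, then goes through AsciiDoc's normal special-character substitution instead of reaching the HTML raw. `escapeRuns` is a hypothetical name; a `]` inside a run is still written as an HTML character reference (so that part remains HTML-coupled), and a run ending in `\` may not be handled correctly by the pass macro.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Same character set as the current AdocGenerator::escape.
constexpr std::string_view formattingChars = "\\`*_{}[]()#+-.!|";

bool isFormattingChar(char c)
{
    return formattingChars.find(c) != std::string_view::npos;
}

// Hypothetical: wraps only the runs of formatting characters in pass:[...],
// leaving everything else to AsciiDoc's normal substitutions.
std::string escapeRuns(std::string_view str)
{
    std::string out;
    std::size_t i = 0;
    while (i < str.size())
    {
        // Extend the run while characters stay on the same side of the test.
        bool const special = isFormattingChar(str[i]);
        std::size_t j = i;
        while (j < str.size() && isFormattingChar(str[j]) == special)
            ++j;
        std::string_view const run = str.substr(i, j - i);
        if (!special)
        {
            out += run;
        }
        else
        {
            out += "pass:[";
            for (char const c : run)
            {
                // A literal ']' would close the macro early.
                if (c == ']')
                    out += "&#93;";
                else
                    out += c;
            }
            out += ']';
        }
        i = j;
    }
    return out;
}
```

For example, `a<b> || c` becomes `a<b> pass:[||] c`: only the `||` bypasses substitutions, and the angle brackets are escaped by AsciiDoc itself. The price is noisier output, which is the trade-off described above.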

@alandefreitas
Collaborator

I just did a small experiment with nested commands. The results look almost random: sometimes `\` escapes the nested command, and sometimes it doesn't. This is probably an unintended effect of the way the regexes are applied.

But at least we can identify what should be escaped based on the context.

Nested formatters

```adoc
_a_, _#a#_, _*a*_, _`a`_, _~a~_, _^a^_, _[[a]]_, _{a}_, _<<a>>_

#_a_#, #a#, #*a*#, #`a`#, #~a~#, #^a^#, #[[a]]#, #{a}#, #<<a>>#

*_a_*, *#a#*, *a*, *`a`*, *~a~*, *^a^*, *[[a]]*, *{a}*, *<<a>>*

`_a_`, `#a#`, `*a*`, `a`, `~a~`, `^a^`, `[[a]]`, `{a}`, `<<a>>`

~_a_~, ~#a#~, ~*a*~, ~`a`~, ~a~, ~^a^~, ~[[a]]~, ~{a}~, ~<<a>>~

^_a_^, ^#a#^, ^*a*^, ^`a`^, ^~a~^, ^a^, ^[[a]]^, ^{a}^, ^<<a>>^

[[_a_]], [[#a#]], [[*a*]], [[`a`]], [[~a~]], [[^a^]], [[a]], [[{a}]], [[<<a>>]]

{_a_}, {#a#}, {*a*}, {`a`}, {~a~}, {^a^}, {[[a]]}, {a}, {<<a>>}

<<_a_>>, <<#a#>>, <<*a*>>, <<`a`>>, <<~a~>>, <<^a^>>, <<[[a]]>>, <<{a}>>, <<a>>
```

Escape outer

```adoc
\_a_, \_#a#_, \_*a*_, \_`a`_, \_~a~_, \_^a^_, \_[[a]]_, \_{a}_, \_<<a>>_

\#_a_#, \#a#, \#*a*#, \#`a`#, \#~a~#, \#^a^#, \#[[a]]#, \#{a}#, \#<<a>>#

\*_a_*, \*#a#*, \*a*, \*`a`*, \*~a~*, \*^a^*, \*[[a]]*, \*{a}*, \*<<a>>*

\`_a_`, \`#a#`, \`*a*`, \`a`, \`~a~`, \`^a^`, \`[[a]]`, \`{a}`, \`<<a>>`

\~_a_~, \~#a#~, \~*a*~, \~`a`~, \~a~, \~^a^~, \~[[a]]~, \~{a}~, \~<<a>>~

\^_a_^, \^#a#^, \^*a*^, \^`a`^, \^~a~^, \^a^, \^[[a]]^, \^{a}^, \^<<a>>^

\[[_a_]], \[[#a#]], \[[*a*]], \[[`a`]], \[[~a~]], \[[^a^]], \[[a]], \[[{a}]], \[[<<a>>]]

\{_a_}, \{#a#}, \{*a*}, \{`a`}, \{~a~}, \{^a^}, \{[[a]]}, \{a}, \{<<a>>}

\<<_a_>>, \<<#a#>>, \<<*a*>>, \<<`a`>>, \<<~a~>>, \<<^a^>>, \<<[[a]]>>, \<<{a}>>, \<<a>>
```

Escape inner

```adoc
_a_, _\#a#_, _\*a*_, _\`a`_, _\~a~_, _\^a^_, _\[[a]]_, _\{a}_, _\<<a>>_

#\_a_#, #a#, #\*a*#, #\`a`#, #\~a~#, #\^a^#, #\[[a]]#, #\{a}#, #\<<a>>#

*\_a_*, *\#a#*, *a*, *\`a`*, *\~a~*, *\^a^*, *\[[a]]*, *\{a}*, *\<<a>>*

`\_a_`, `\#a#`, `\*a*`, `a`, `\~a~`, `\^a^`, `\[[a]]`, `\{a}`, `\<<a>>`

~\_a_~, ~\#a#~, ~\*a*~, ~\`a`~, ~a~, ~\^a^~, ~\[[a]]~, ~\{a}~, ~\<<a>>~

^\_a_^, ^\#a#^, ^\*a*^, ^\`a`^, ^\~a~^, ^a^, ^\[[a]]^, ^\{a}^, ^\<<a>>^

[[\_a_]], [[\#a#]], [[\*a*]], [[\`a`]], [[\~a~]], [[\^a^]], [[a]], [[\{a}]], [[\<<a>>]]

{\_a_}, {\#a#}, {\*a*}, {\`a`}, {\~a~}, {\^a^}, {\[[a]]}, {a}, {\<<a>>}

<<\_a_>>, <<\#a#>>, <<\*a*>>, <<\`a`>>, <<\~a~>>, <<\^a^>>, <<\[[a]]>>, <<\{a}>>, <<a>>
```

Escape both

```adoc
\_a_, \_\#a#_, \_\*a*_, \_\`a`_, \_\~a~_, \_\^a^_, \_\[[a]]_, \_\{a}_, \_\<<a>>_

\#\_a_#, \#a#, \#\*a*#, \#\`a`#, \#\~a~#, \#\^a^#, \#\[[a]]#, \#\{a}#, \#\<<a>>#

\*\_a_*, \*\#a#*, \*a*, \*\`a`*, \*\~a~*, \*\^a^*, \*\[[a]]*, \*\{a}*, \*\<<a>>*

\`\_a_`, \`\#a#`, \`\*a*`, \`a`, \`\~a~`, \`\^a^`, \`\[[a]]`, \`\{a}`, \`\<<a>>`

\~\_a_~, \~\#a#~, \~\*a*~, \~\`a`~, \~a~, \~\^a^~, \~\[[a]]~, \~\{a}~, \~\<<a>>~

\^\_a_^, \^\#a#^, \^\*a*^, \^\`a`^, \^\~a~^, \^a^, \^\[[a]]^, \^\{a}^, \^\<<a>>^

\[[\_a_]], \[[\#a#]], \[[\*a*]], \[[\`a`]], \[[\~a~]], \[[\^a^]], \[[a]], \[[\{a}]], \[[\<<a>>]]

\{\_a_}, \{\#a#}, \{\*a*}, \{\`a`}, \{\~a~}, \{\^a^}, \{\[[a]]}, \{a}, \{\<<a>>}

\<<\_a_>>, \<<\#a#>>, \<<\*a*>>, \<<\`a`>>, \<<\~a~>>, \<<\^a^>>, \<<\[[a]]>>, \<<\{a}>>, \<<a>>
```

And here are the results:

Nested formatters

```text
a, a, *a*, `a`, a, a, , {a}, [a]

a, a, a, a, a, a, , {a}, [a]

a, a, a, a, a, a, , {a}, [a]

a, a, a, a, a, a, , {a}, [a]

a, a, a, a, a, a, , {a}, [a]

a, a, a, a, a, a, , {a}, [a]

[[a]], [[a]], [[a]], [[a]], [[a]], [[a]], , [[{a}]], [[[a]]]

{a}, {a}, {a}, {a}, {a}, {a}, {}, {a}, {[a]}

[_a_], [a#], <<*a*>>, <<`a`>>, <<a>>, <<a>>, <<>>, [{a}], [a]
```

Escape outer

```text
_a_, _#a#_, _*a*_, _`a`_, _a_, _a_, __, _{a}_, _[a]_

#a#, #a#, #a#, #a#, #a#, #a#, ##, #{a}#, #[a]#

*a*, *a*, *a*, *a*, *a*, *a*, **, *{a}*, *[a]*

`a`, `a`, `a`, `a`, `a`, `a`, ``, `{a}`, `[a]`

~a~, ~a~, ~a~, ~a~, ~a~, ~a~, ~~, ~{a}~, ~[a]~

^a^, ^a^, ^a^, ^a^, ^a^, ^a^, ^^, ^{a}^, ^[a]^

\[[a]], \[[a]], \[[a]], \[[a]], \[[a]], \[[a]], [[a]], \[[{a}]], \[[[a]]]

\{a}, \{a}, \{a}, \{a}, \{a}, \{a}, \{}, {a}, \{[a]}

<<_a_>>, <<#a#>>, \<<*a*>>, \<<`a`>>, \<<a>>, \<<a>>, \<<>>, <<{a}>>, <<a>>
```

Escape inner

```text
a, #a#, \*a*, \`a`, ~a~, ^a^, [[a]], {a}, <<a>>

_a_, a, *a*, `a`, ~a~, ^a^, [[a]], {a}, <<a>>

_a_, #a#, a, `a`, ~a~, ^a^, [[a]], {a}, <<a>>

_a_, #a#, *a*, a, ~a~, ^a^, [[a]], {a}, <<a>>

_a_, #a#, *a*, `a`, a, ^a^, [[a]], {a}, <<a>>

_a_, #a#, *a*, `a`, ~a~, a, [[a]], {a}, <<a>>

, [[#a#]], [[*a*]], [[`a`]], [[~a~]], [[^a^]], , [[{a}]], [[<<a>>]]

{_a_}, {#a#}, {*a*}, {`a`}, {~a~}, {^a^}, {[[a]]}, {a}, {<<a>>}

[_a_], [a#], <<*a*>>, <<`a`>>, <<~a~>>, <<^a^>>, <<[[a]]>>, [{a}], [a]
```

Escape both

```text
_a_, _\#a#_, _\*a*_, _\`a`_, _~a~_, _^a^_, _[[a]]_, _{a}_, _<<a>>_

#_a_#, #a#, #*a*#, #`a`#, #~a~#, #^a^#, #[[a]]#, #{a}#, #<<a>>#

*_a_*, *#a#*, *a*, *`a`*, *~a~*, *^a^*, *[[a]]*, *{a}*, *<<a>>*

`_a_`, `#a#`, `*a*`, `a`, `~a~`, `^a^`, `[[a]]`, `{a}`, `<<a>>`

~_a_~, ~#a#~, ~*a*~, ~`a`~, ~a~, ~^a^~, ~[[a]]~, ~{a}~, ~<<a>>~

^_a_^, ^#a#^, ^*a*^, ^`a`^, ^~a~^, ^a^, ^[[a]]^, ^{a}^, ^<<a>>^

[[_a_]], \[[#a#]], \[[*a*]], \[[`a`]], \[[~a~]], \[[^a^]], [[a]], \[[{a}]], \[[<<a>>]]

{_a_}, \{#a#}, \{*a*}, \{`a`}, \{~a~}, \{^a^}, \{[[a]]}, {a}, \{<<a>>}

<<_a_>>, <<#a#>>, \<<*a*>>, \<<`a`>>, \<<~a~>>, \<<^a^>>, \<<[[a]]>>, <<{a}>>, <<a>>
```

alandefreitas added a commit to alandefreitas/mrdocs that referenced this issue Dec 12, 2024
Replace escaping based on passthroughs without universal escaping based on character substitutions: https://docs.asciidoctor.org/asciidoc/latest/subs/replacements/

fix cppalliance#763
@alandefreitas alandefreitas linked a pull request Dec 12, 2024 that will close this issue

2 participants