Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nested inline styles turn into extra RST text #4368

Closed
danse opened this issue Feb 14, 2018 · 8 comments
Closed

nested inline styles turn into extra RST text #4368

danse opened this issue Feb 14, 2018 · 8 comments

Comments

@danse
Copy link
Contributor

danse commented Feb 14, 2018

a block like [Emph [Str "first", Strong [Str "second"]]] gets translated by pandoc into *first\ **second*** which is not an accurate RST translation because RST does not support nested inlines

this does not represent a syntax error but it causes extra text to be displayed to the user when RST is converted to other formats like HTML

we propose to ignore the innermost styles, in order to preserve the style given to a broader section of the text

danse added a commit to italia/pandoc that referenced this issue Feb 14, 2018
@danse
Copy link
Contributor Author

danse commented Feb 14, 2018

here is an unit test italia@185309e

@danse
Copy link
Contributor Author

danse commented Feb 15, 2018

i'm using writer.rst as a guide to assess the desired behaviour here. in the case of *An `emphasized link </url>`__*, i argue that it's preferable to drop the style and keep the link even when the style is the outermost inline. the reason is that dropping the link would cause a bigger loss of information

@jgm
Copy link
Owner

jgm commented Feb 16, 2018

See #4339 for an alternative approach?

@danse
Copy link
Contributor Author

danse commented Feb 18, 2018

that approach uses substitutions to produce nested inline markup, which as far as i understand are not supported in RST by design. to me it looks like an hack which relies on the implementation of RST parsers. as far as i understand the logic needed to adopt the hack would feature:

  • render RST substitutions
  • manage incrementally named substitutions to be generated programmatically
  • detect links which would benefit from the hack
  • turn links into suitable substitutions

i'm not sure that the complexity added to the code would be worth the value added by styled links at this stage. the pull request we propose is already bringing some clear advantages without trying to work around RST limitations, further improvements could be discussed separately

@alterat
Copy link

alterat commented Feb 19, 2018

An alternative approach to selecting the outermost format would be to always give priority to bold over italics.

The rationale behind this is that bold formatting is used to highlight or give emphasis to something. Removing bold altogether could be a loss of useful information for the reader, probably more than losing an italics along the way.

Proposed approach:

- **[_Italics_] inside bold**  - would become - **[Italics] inside bold**
- _[**Bold**] inside italics_ - would become - _[_**Bold**_] inside italics_

@danse
Copy link
Contributor Author

danse commented Feb 20, 2018

i like how @atorin's solution looks like even though at the moment i'm not sure how to write it, i'll try some approaches to it

danse added a commit to italia/pandoc that referenced this issue Feb 22, 2018
danse added a commit to italia/pandoc that referenced this issue Feb 22, 2018
danse added a commit to italia/pandoc that referenced this issue Feb 22, 2018
@danse
Copy link
Contributor Author

danse commented Feb 22, 2018

done, i rebased the branch with a version that does what described by
@atorin. the logic is rather sophisticated at this point so i
refactored the code a couple of times trying to improve its
readability as much as possible. i also added plenty of comments

i have some doubts about setInlines and getInlines, they
seem verbose and there is probably some way to abstract that kind of
logic. it looks like we might want to have maybeGetInlines in
Text.Pandoc.Walk because the logic of walkInline and queryInline
there seems to be based on a similar pattern match

the handling described above for strong markup is now also given to
nested links, so for example italics link italics is turned into
italics link italics, which is not what is proposed in #4339
but it keeps more information than the behaviour i proposed earlier

@danse
Copy link
Contributor Author

danse commented Mar 9, 2018

i went through all of this again while investigating other issues related to inlines. the solution i proposed was a pure [Inline] -> [Inline] function containing its own recursive visiting logic, while the RST writer visits nested inlines through the mutual recursion of inlineToRST and inlineListToRST.

the logic proposed by @atorin produces visually appealing documents and we can keep using it through this filter. integrating it in the writer is complex though, so before working on it i would like to know whether we all agree about the direction.

the implementation of inlineToRST is straightforward now and it could get less readable if we added flattening logic. in general the whole feature would be spread throughout multiple functions making it harder to follow and test.

until now inline processing logic did not require to navigate the nested structure so it could be cleanly contained in inlineListToRST, but this logic cannot as far as i understand. there might be some advanced abstraction enabling us to apply this transformation as a cohese function, but nothing comes to my mind at the moment

jgm pushed a commit that referenced this issue Mar 18, 2018
danse added a commit to italia/pandoc that referenced this issue Apr 17, 2018
nested inlines are not valid RST syntax, so we flatten them following
some readability criteria discussed in jgm#4368.
danse added a commit to italia/pandoc that referenced this issue Apr 17, 2018
nested inlines are not valid RST syntax, so we flatten them following
some readability criteria discussed in jgm#4368.
danse added a commit to italia/pandoc that referenced this issue Apr 17, 2018
nested inlines are not valid RST syntax, so we flatten them following
some readability criteria discussed in jgm#4368.
@jgm jgm closed this as completed in eef1c21 Apr 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants