Support Underline #6277

Merged
merged 6 commits into from
Apr 28, 2020

vaibhavsagar
Contributor

This PR depends on jgm/pandoc-types#68.

@vaibhavsagar
Contributor Author

References #6255.

@mb21
Collaborator

mb21 commented Apr 12, 2020

Hey Vaibhav, how are you?

Cool that you're contributing to pandoc! Some time off in Singapore? ;-)

Just wanted to make sure you had seen my comment here: #5825 (comment)

So while we all agree that the Underline element in the pandoc AST is a good thing, I guess the discussion about how to render it in HTML is still open?

Cheers,
Mauro

@vaibhavsagar
Contributor Author

Hi Mauro! It's great to hear from you 😄. You're right that I have some downtime, and I thought this might be a good opportunity to contribute. It seems like the latest guidance is that <u> is okay? I'm happy to leave this open until that discussion is concluded.

@vaibhavsagar
Contributor Author

I'm encountering a test failure that isn't making any sense to me:

    Docx
      inlines
        font formatting:                                             FAIL (0.07s)
          Non-matching xml in word/document.xml:
          + 101           <w:u w:val="single" />
          - 102           <w:u w:val="single" />

Any idea what is going on here?

@mb21
Collaborator

mb21 commented Apr 13, 2020

don't know about the test failure... can't see how it could be whitespace...? but maybe have a look yourself, along the lines of:

    pandoc -f native input.native -o output.docx
    unzip -d outdir output.docx
    diff outdir/word/document.xml testdir/word/document.xml
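
The same comparison can be scripted without unpacking anything to disk. This is only a rough sketch, not part of pandoc's test suite: the function names are made up for illustration, and the paths passed in are whatever two .docx files you want to compare.

```python
import difflib
import zipfile
from xml.dom import minidom


def document_xml(docx_path):
    """Read word/document.xml from a .docx archive and pretty-print it,
    so that each element ends up on its own line."""
    with zipfile.ZipFile(docx_path) as archive:
        raw = archive.read("word/document.xml")
    return minidom.parseString(raw).toprettyxml(indent="  ")


def diff_docx(generated_path, expected_path):
    """Return unified-diff lines between the document.xml of two .docx files.
    An empty list means the two pretty-printed documents are identical."""
    generated = document_xml(generated_path).splitlines()
    expected = document_xml(expected_path).splitlines()
    return list(difflib.unified_diff(expected, generated,
                                     fromfile=expected_path,
                                     tofile=generated_path,
                                     lineterm=""))
```

Printing the result of `diff_docx("output.docx", "golden.docx")` line by line gives roughly what the `unzip` + `diff` steps above produce, with the added benefit that the XML is reformatted first.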

@alerque
Contributor

alerque commented Apr 13, 2020

GitHub has a useful feature for better handling of WIP pull requests called Draft PRs. You can create them as drafts when you open them, or you can now convert existing ones to drafts. This does a number of useful things, and I suggest converting this to a Draft until you, as the submitter, consider it ready for merging. PR reviews can happen during both phases.

@vaibhavsagar vaibhavsagar marked this pull request as draft April 13, 2020 09:20
@vaibhavsagar
Contributor Author

Thanks, I've made this a draft PR. @mb21 I tried to follow your instructions but wasn't sure what input and testdir are meant to be in this case. I converted inline_formatting.docx and even regenerated it, but the regenerated version caused more tests to fail (for the reader this time), which seems to indicate that the parsing logic for .docx files needs to be updated as well.

@mb21
Collaborator

mb21 commented Apr 13, 2020

Hm, I cannot reproduce that. I added this to stack.yaml:

    extra-deps:
    - github: vaibhavsagar/pandoc-types
      commit: 89bf303f11af19177948d9b6f24ef3e6a09a8009

Then stack test reports that 2 out of 2325 tests failed:

    Docx
      document
        allow different document.xml file as defined in _rel...:     FAIL
          test/Tests/Helpers.hs:51:
          
          -------------------------------------------------------------- input ---
          [Header 1 ("test",[],[]) [Str "Test"]
          ,Para [Str "This",Space,Str "is",Space,Emph [Str "italic"],Str ",",Space,Strong [Str "bold"],Str ",",Space,Underline [Str "underlined"],Str ",",Space,Emph [Underline [Str "italic",Space,Str "underlined"]],Str ",",Space,Strong [Underline [Str "bold",Space,Str "underlined"]],Str ",",Space,Emph [Strong [Underline [Str "bold",Space,Str "italic",Space,Str "underlined"]]],Str "."]]
          ------------------------------------------------------------- result ---
            [Header 1 ("test",[],[]) [Str "Test"]
          - ,Para [Str "This",Space,Str "is",Space,Emph [Str "italic"],Str ",",Space,Strong [Str "bold"],Str ",",Space,Span ("",["underline"],[]) [Str "underlined"],Str ",",Space,Emph [Span ("",["underline"],[]) [Str "italic",Space,Str "underlined"]],Str ",",Space,Strong [Span ("",["underline"],[]) [Str "bold",Space,Str "underlined"]],Str ",",Space,Emph [Strong [Span ("",["underline"],[]) [Str "bold",Space,Str "italic",Space,Str "underlined"]]],Str "."]]
          + ,Para [Str "This",Space,Str "is",Space,Emph [Str "italic"],Str ",",Space,Strong [Str "bold"],Str ",",Space,Underline [Str "underlined"],Str ",",Space,Emph [Underline [Str "italic",Space,Str "underlined"]],Str ",",Space,Strong [Underline [Str "bold",Space,Str "underlined"]],Str ",",Space,Emph [Strong [Underline [Str "bold",Space,Str "italic",Space,Str "underlined"]]],Str "."]]
          ------------------------------------------------------------------------
      inlines
        font formatting:                                             FAIL
          test/Tests/Helpers.hs:51:
          
          -------------------------------------------------------------- input ---
          [Para [Str "Regular",Space,Str "text",Space,Emph [Str "italics"],Space,Strong [Str "bold",Space,Emph [Str "bold",Space,Str "italics"]],Str "."]
          ,Para [Str "This",Space,Str "is",Space,SmallCaps [Str "Small",Space,Str "Caps"],Str ",",Space,Str "and",Space,Str "this",Space,Str "is",Space,Strikeout [Str "strikethrough"],Str "."]
          ,Para [Str "Some",Space,Str "people",Space,Str "use",Space,Underline [Str "single",Space,Str "underlines",Space,Str "for",Space],Emph [Underline [Str "emphasis"]],Str "."]
          ,Para [Str "Above",Space,Str "the",Space,Str "line",Space,Str "is",Space,Superscript [Str "superscript"],Space,Str "and",Space,Str "below",Space,Str "the",Space,Str "line",Space,Str "is",Space,Subscript [Str "subscript"],Str "."]
          ,Para [Str "A",Space,Str "line",LineBreak,Str "break."]]
          ------------------------------------------------------------- result ---
            [Para [Str "Regular",Space,Str "text",Space,Emph [Str "italics"],Space,Strong [Str "bold",Space,Emph [Str "bold",Space,Str "italics"]],Str "."]
            ,Para [Str "This",Space,Str "is",Space,SmallCaps [Str "Small",Space,Str "Caps"],Str ",",Space,Str "and",Space,Str "this",Space,Str "is",Space,Strikeout [Str "strikethrough"],Str "."]
          - ,Para [Str "Some",Space,Str "people",Space,Str "use",Space,Span ("",["underline"],[]) [Str "single",Space,Str "underlines",Space,Str "for",Space,Emph [Str "emphasis"]],Str "."]
          + ,Para [Str "Some",Space,Str "people",Space,Str "use",Space,Underline [Str "single",Space,Str "underlines",Space,Str "for",Space],Emph [Underline [Str "emphasis"]],Str "."]
            ,Para [Str "Above",Space,Str "the",Space,Str "line",Space,Str "is",Space,Superscript [Str "superscript"],Space,Str "and",Space,Str "below",Space,Str "the",Space,Str "line",Space,Str "is",Space,Subscript [Str "subscript"],Str "."]
            ,Para [Str "A",Space,Str "line",LineBreak,Str "break."]]
          ------------------------------------------------------------------------

@vaibhavsagar
Contributor Author

Oops, I forgot to push my latest changes. Can you try again?

@mb21
Collaborator

mb21 commented Apr 14, 2020

Okay, so the failing test is test/Tests/Writers/Docx.hs -> search for "font formatting"

to see what's going on:

    stack exec pandoc -- -f native test/docx/inline_formatting.native -o test.docx
    unzip -d test test.docx
    unzip -d golden test/docx/golden/inline_formatting.docx
    xmllint --format test/word/document.xml > test.xml
    xmllint --format golden/word/document.xml > golden.xml
    diff test.xml golden.xml

ha, indeed, your writer produces:

    <w:rPr>
      <w:u w:val="single"/>
      <w:i/>
    </w:rPr>

while the test says it should be:

    <w:rPr>
      <w:i/>
      <w:u w:val="single"/>
    </w:rPr>

I'm not sure why the order differs or which one is right, but that's why the test fails...
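
To make the point concrete: the two fragments contain the same elements and differ only in child order. A small sketch of an order-sensitive versus order-insensitive comparison follows. The namespace URI is a placeholder chosen for this example (a real document.xml declares the WordprocessingML namespace on its root), and the helper names are hypothetical. It says nothing about which order Word itself expects.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace declaration so the fragments parse on their own.
NS = 'xmlns:w="urn:example:w"'

WRITER_OUTPUT = f'<w:rPr {NS}><w:u w:val="single"/><w:i/></w:rPr>'
GOLDEN = f'<w:rPr {NS}><w:i/><w:u w:val="single"/></w:rPr>'


def children(xml_text):
    """Return (tag, sorted attributes) for each child, in document order."""
    root = ET.fromstring(xml_text)
    return [(child.tag, tuple(sorted(child.attrib.items()))) for child in root]


def same_in_order(a, b):
    """True only if both fragments list the same children in the same order."""
    return children(a) == children(b)


def same_ignoring_order(a, b):
    """True if both fragments contain the same children in any order."""
    return sorted(children(a)) == sorted(children(b))
```

Here `same_in_order(WRITER_OUTPUT, GOLDEN)` is false while `same_ignoring_order(WRITER_OUTPUT, GOLDEN)` is true, which matches the test failure: a line-by-line comparison flags the swap even though nothing is missing.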

@vaibhavsagar
Contributor Author

Thanks for finding that! I think my changes to inline_formatting.native are to blame. I modified it slightly, and now I'm getting this error in the reader:

      inlines
        font formatting:                                             FAIL
          test/Tests/Helpers.hs:51:
          
          -------------------------------------------------------------- input ---
          [Para [Str "Regular",Space,Str "text",Space,Emph [Str "italics"],Space,Strong [Str "bold",Space,Emph [Str "bold",Space,Str "italics"]],Str "."]
          ,Para [Str "This",Space,Str "is",Space,SmallCaps [Str "Small",Space,Str "Caps"],Str ",",Space,Str "and",Space,Str "this",Space,Str "is",Space,Strikeout [Str "strikethrough"],Str "."]
          ,Para [Str "Some",Space,Str "people",Space,Str "use",Space,Underline [Str "single",Space,Str "underlines",Space,Str "for",Space],Emph [Underline [Str "emphasis"]],Str "."]
          ,Para [Str "Above",Space,Str "the",Space,Str "line",Space,Str "is",Space,Superscript [Str "superscript"],Space,Str "and",Space,Str "below",Space,Str "the",Space,Str "line",Space,Str "is",Space,Subscript [Str "subscript"],Str "."]
          ,Para [Str "A",Space,Str "line",LineBreak,Str "break."]]
          ------------------------------------------------------------- result ---
            [Para [Str "Regular",Space,Str "text",Space,Emph [Str "italics"],Space,Strong [Str "bold",Space,Emph [Str "bold",Space,Str "italics"]],Str "."]
            ,Para [Str "This",Space,Str "is",Space,SmallCaps [Str "Small",Space,Str "Caps"],Str ",",Space,Str "and",Space,Str "this",Space,Str "is",Space,Strikeout [Str "strikethrough"],Str "."]
          - ,Para [Str "Some",Space,Str "people",Space,Str "use",Space,Underline [Str "single",Space,Str "underlines",Space,Str "for",Space,Emph [Str "emphasis"]],Str "."]
          + ,Para [Str "Some",Space,Str "people",Space,Str "use",Space,Underline [Str "single",Space,Str "underlines",Space,Str "for",Space],Emph [Underline [Str "emphasis"]],Str "."]
            ,Para [Str "Above",Space,Str "the",Space,Str "line",Space,Str "is",Space,Superscript [Str "superscript"],Space,Str "and",Space,Str "below",Space,Str "the",Space,Str "line",Space,Str "is",Space,Subscript [Str "subscript"],Str "."]
            ,Para [Str "A",Space,Str "line",LineBreak,Str "break."]]
          ------------------------------------------------------------------------

For some reason part of the document is being parsed as Emph [Underline [Str "emphasis"]] even though it's already contained in an Underline element. I think this might be a change that needs to be made to the reader, what do you think?

@mb21
Collaborator

mb21 commented Apr 14, 2020

well... I don't think pandoc does any kind of normalization on nestings of inline elements like strong/emph etc. but just preserves that, which is not perfect, but keeps the code simple:

    echo '_**_foo_**_' | pandoc
    <p><em><strong><em>foo</em></strong></em></p>

So I guess just do the same with Underline?
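
A toy illustration of why the two nestings in the failing test are different trees that nonetheless describe the same formatting. This uses a made-up miniature AST (plain strings for text, `(kind, children)` tuples for wrappers), not pandoc's real types, and `runs` is a hypothetical helper.

```python
def runs(inlines, active=frozenset()):
    """Flatten nested inlines into (text, set-of-active-formats) pairs."""
    out = []
    for node in inlines:
        if isinstance(node, str):
            out.append((node, active))
        else:
            kind, kids = node
            out.extend(runs(kids, active | {kind}))
    return out


# Emph nested inside one Underline, as in the old expected output.
NESTED = [("Underline", ["single underlines for ", ("Emph", ["emphasis"])])]

# Underline closed and reopened inside Emph, as the docx reader produces.
SPLIT = [("Underline", ["single underlines for "]),
         ("Emph", [("Underline", ["emphasis"])])]
```

`NESTED != SPLIT` as trees, but `runs(NESTED) == runs(SPLIT)`. Since pandoc preserves the nesting it finds rather than normalizing, the test expectation has to match the reader's shape exactly.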

@vaibhavsagar
Contributor Author

Ah, thanks for clarifying that, I assumed that this was happening for the old Span that was being used.

@vaibhavsagar
Contributor Author

vaibhavsagar commented Apr 16, 2020

I've regenerated inline_formatting.docx and now I'm running into an issue where w:val="single" isn't being output correctly, even though I think I've accounted for it in the writer:

  Writers
    Docx
      inlines
        font formatting:                       FAIL (0.05s)
          Non-matching xml in word/document.xml:
          +  89           <w:u w:val="single" />
          -  89           <w:u />
          +  95           <w:u w:val="single" />
          -  95           <w:u />
          + 101           <w:u w:val="single" />
          - 101           <w:u />

I think that if I'm able to fix this the PR should be ready to review.

Edit: actually, I misread this output. It seems like I'm always emitting w:val="single", even when it isn't necessary!

@vaibhavsagar
Contributor Author

Seems to work now!

@vaibhavsagar vaibhavsagar marked this pull request as ready for review April 16, 2020 12:20
@vaibhavsagar vaibhavsagar changed the title from "[WIP] Support Underline" to "Support Underline" on Apr 16, 2020
@jgm
Owner

jgm commented Apr 18, 2020

The CI tests are still failing...

@vaibhavsagar
Contributor Author

@jgm that's because this PR relies on jgm/pandoc-types#68

@jgm
Owner

jgm commented Apr 19, 2020

> @jgm that's because this PR relies on jgm/pandoc-types#68

You can add something to stack.yaml and cabal.project telling it to use a particular commit from your PR. Indeed, pandoc currently has such a stanza because we're depending on some table changes. (You'll need to rebase your pandoc-types changes on top of the commit pandoc is currently depending on, and then change stack.yaml and cabal.project to point to your version.)

@vaibhavsagar
Contributor Author

Thanks, I did that and now all tests are passing except on macOS, where there is a GHC panic that I don't think I'm responsible for.

@vaibhavsagar
Contributor Author

My guess for why the error is occurring is that you are caching .stack-work, which I don't think is a good idea.

@jgm
Owner

jgm commented Apr 19, 2020

Can you say more about why caching .stack-work is a bad idea?
It has dramatically cut down compile times on CI.

@jgm
Owner

jgm commented Apr 19, 2020

I've just pushed a commit removing the caching of .stack-work; we can see if that helps.
(You'd have to rebase on this.)

@vaibhavsagar
Contributor Author

It might be partly superstition, but in the past I remember encountering weirdness in the way stack tries to incrementally rebuild, because caching intermediate build artifacts is a hard problem that I don't expect them to have necessarily solved in full generality. I would expect the same issues if we were caching dist-newstyle. The error I saw would be symptomatic of this issue, because it seems like it's looking for a symbol that doesn't exist yet as it's using a stale intermediate build artifact instead of compiling from scratch (which is slower but always correct).

This is only a guess on my part, though, and I do understand that caching .stack-work makes CI significantly faster. For the project I maintain (https://github.com/gibiansky/IHaskell) I've settled on caching ~/.stack which I think is more likely to be correct since it only contains finished build outputs.

@vaibhavsagar
Contributor Author

Hmm, this still seems to be happening without .stack-work so it appears my guess was wrong. I have no idea what the real cause of the error is.

@jgm
Owner

jgm commented Apr 19, 2020

I tried building your branch locally on my macOS computer, and got the same error we get in CI.

@jgm
Owner

jgm commented Apr 19, 2020

Unfortunately, this is a blocker. It's probably worth reporting it as a bug to ghc, as the message requests, since it's reproducible.

@jgm
Owner

jgm commented Apr 19, 2020

For reference, here's the error I get on my machine:

<no location info>: error:
    ghc: panic! (the 'impossible' happened)
  (GHC version 8.6.5 for x86_64-apple-darwin):
        Loading temp shared object failed: dlopen(/var/folders/l0/2t_cldbj26j_vsd9_q2tsf400000gn/T/ghc54764_0/libghc_107.dylib, 5): Symbol not found: _pandoczmtypeszm1zi20zmBUPf2Tq5atP8Lb94Ld23Jj_TextziPandocziBuilder_underline_closure
  Referenced from: /var/folders/l0/2t_cldbj26j_vsd9_q2tsf400000gn/T/ghc54764_0/libghc_107.dylib
  Expected in: flat namespace
 in /var/folders/l0/2t_cldbj26j_vsd9_q2tsf400000gn/T/ghc54764_0/libghc_107.dylib
                        
Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug                    

One more idea: try incrementing the version number in your pandoc-types. Eventually the version will need to go to 1.21 because of API changes (underline, table), but if you change that you'd have to change texmath and other dependencies, so to keep it simple you could try 1.20.1.
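
As an aside, the missing symbol in the panic is Z-encoded, and decoding it shows which package version and function the linker was looking for. The sketch below handles only a small, assumed subset of GHC's escape table (the full scheme has more entries, including numeric escapes), and the leading underscore from the error message is dropped on the assumption that it is the usual macOS symbol prefix.

```python
# Partial table of GHC Z-encoding escapes; an assumed subset for this sketch.
Z_ESCAPES = {
    "zm": "-", "zi": ".", "zd": "$", "zu": "_", "zz": "z", "ZZ": "Z",
    "zh": "#", "zq": "'", "ZC": ":",
}


def z_decode(symbol):
    """Decode the two-character escapes listed in Z_ESCAPES, left to right."""
    out = []
    i = 0
    while i < len(symbol):
        pair = symbol[i:i + 2]
        if pair in Z_ESCAPES:
            out.append(Z_ESCAPES[pair])
            i += 2
        else:
            out.append(symbol[i])
            i += 1
    return "".join(out)


# Symbol copied from the panic message above, minus the leading underscore.
SYMBOL = ("pandoczmtypeszm1zi20zmBUPf2Tq5atP8Lb94Ld23Jj"
          "_TextziPandocziBuilder_underline_closure")
```

`z_decode(SYMBOL)` gives `pandoc-types-1.20-BUPf2Tq5atP8Lb94Ld23Jj_Text.Pandoc.Builder_underline_closure`, i.e. the `underline` function in `Text.Pandoc.Builder` from a build identified as pandoc-types 1.20. That is at least consistent with the idea of bumping the version number so the modified package is not confused with another 1.20 build, though it does not prove that this is the cause.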

@jgm
Owner

jgm commented Apr 21, 2020

See the approach in Tests.Readers.HTML. Before running the round-trip (rt) tests, we purify the AST by removing elements that can't round-trip. You could add something similar to the Muse tests.
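
A rough sketch of what such a purification step could look like, on a made-up miniature AST (strings for text, `(kind, children)` tuples for wrappers). This is not the code in Tests.Readers.HTML: the `purify` helper and the choice to unwrap a node into its children, rather than drop it entirely, are assumptions made for illustration.

```python
def purify(inlines, unsupported=("Underline",)):
    """Replace wrapper nodes the target format cannot represent with their
    children, recursively, so a round-trip comparison ignores them."""
    out = []
    for node in inlines:
        if isinstance(node, str):
            out.append(node)
            continue
        kind, kids = node
        kids = purify(kids, unsupported)
        if kind in unsupported:
            out.extend(kids)          # unwrap: keep the text, lose the wrapper
        else:
            out.append((kind, kids))
    return out
```

For example, `purify([("Emph", [("Underline", ["a", "b"])]), "c"])` yields `[("Emph", ["a", "b"]), "c"]`, which is what a writer without underline support would be expected to round-trip.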

@vaibhavsagar
Contributor Author

I took a closer look and learned that underlines are actually supported in Emacs Muse, so I updated the writer accordingly and this test run seems to be going much better.

@vaibhavsagar
Contributor Author

Is there anything further you need from me on this PR?

@jgm
Owner

jgm commented Apr 22, 2020

It looks good on first glance; I just need to make time for a more careful review.

@mb21
Collaborator

mb21 commented Apr 26, 2020

For the record: after switching back to the master branch from this branch, I couldn't build anymore either. I got:

ghc: panic! (the 'impossible' happened)
  (GHC version 8.6.5 for x86_64-apple-darwin):
	Loading temp shared object failed: dlopen(/var/folders/55/rrvbjfdn5fd0wf7p8r8spc_m0000gn/T/ghc53115_0/libghc_214.dylib, 5): Symbol not found: _pandoczmtypeszm1zi21zmLb7evYKD6jv46JnhbWrkOK_TextziPandocziDefinition_zdfDataInline60_closure
  Referenced from: /var/folders/55/rrvbjfdn5fd0wf7p8r8spc_m0000gn/T/ghc53115_0/libghc_214.dylib
  Expected in: flat namespace
 in /var/folders/55/rrvbjfdn5fd0wf7p8r8spc_m0000gn/T/ghc53115_0/libghc_214.dylib

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug


--  While building package pandoc-2.9.2.1 using:
      /Users/maurobieg/.stack/setup-exe-cache/x86_64-osx/Cabal-simple_mPHDZzAJ_2.4.0.1_ghc-8.6.5 --builddir=.stack-work/dist/x86_64-osx/Cabal-2.4.0.1 build lib:pandoc exe:pandoc --ghc-options " -fdiagnostics-color=always"
    Process exited with code: ExitFailure 1

@jgm
Owner

jgm commented Apr 26, 2020

Probably worth reporting this as a bug.
Deleting the stack caches seems to fix it, but that's not pleasant.

Owner

@jgm jgm left a comment

This all looks good; thank you very much. I've made a few comments in the code.

What about the readers, though? This only changes the writers. It would be good to support the readers too; otherwise we'll never actually get Underline elements to render!

In the case of Markdown I think that [span]{.ul} should go to Underline.
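
A sketch of that mapping on a made-up miniature AST, where a span is `("Span", (id, classes, attributes), children)` and other wrappers are `(kind, children)`. The class names come from this thread (`ul` here, `underline` in the earlier test output). The rule of converting only spans that carry nothing but that one class is an assumption of this sketch, not necessarily what the Markdown reader should do.

```python
def spans_to_underline(inlines, classes=("ul", "underline")):
    """Turn spans whose only attribute is an underline class into Underline."""
    out = []
    for node in inlines:
        if isinstance(node, str):
            out.append(node)
        elif node[0] == "Span":
            _, (ident, cls, attrs), kids = node
            kids = spans_to_underline(kids, classes)
            if ident == "" and not attrs and len(cls) == 1 and cls[0] in classes:
                out.append(("Underline", kids))
            else:
                # Spans with an id, extra classes, or attributes are kept as-is.
                out.append(("Span", (ident, cls, attrs), kids))
        else:
            kind, kids = node
            out.append((kind, spans_to_underline(kids, classes)))
    return out
```

With this, `[("Span", ("", ["ul"], []), ["span"])]` becomes `[("Underline", ["span"])]`, while a span that also has an identifier is left untouched.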

src/Text/Pandoc/Writers/CommonMark.hs
src/Text/Pandoc/Writers/Haddock.hs
src/Text/Pandoc/Writers/Jira.hs
src/Text/Pandoc/Writers/Man.hs
src/Text/Pandoc/Writers/Markdown.hs
src/Text/Pandoc/Writers/MediaWiki.hs
src/Text/Pandoc/Writers/Ms.hs
src/Text/Pandoc/Writers/Muse.hs
src/Text/Pandoc/Writers/RST.hs
src/Text/Pandoc/Writers/Texinfo.hs
@vaibhavsagar
Contributor Author

Most of the readers were already generating underlineSpan in case of an underline, so I changed that to output Underline instead of a span as part of my earlier changes. I think that takes care of most of the readers, but I'm not sure if I got them all.

@jgm
Owner

jgm commented Apr 27, 2020

Oh right, you handle the readers already via the change to underlineSpan. I'd missed that.
I think it would be cleaner to deprecate underlineSpan and just use B.underline directly in the readers instead of underlineSpan.

@jgm
Owner

jgm commented Apr 27, 2020

Otherwise this is looking good. I'll go ahead and merge the pandoc-types changes, and then you can also change to depend on master in pandoc-types.

@vaibhavsagar
Contributor Author

Updated, thanks!

@jgm jgm merged commit 9c2b659 into jgm:master Apr 28, 2020
@jgm
Owner

jgm commented Apr 28, 2020

Thanks!

@vaibhavsagar vaibhavsagar deleted the support-underline branch April 29, 2020 00:59
@nackd

nackd commented May 14, 2020

I'm looking into converting HTML with underlines to RTF and got here. It doesn't seem to work, or at least nothing is underlined when I open the output files in LibreOffice. This is using the \pnul control word. I don't know much about the RTF format, but \pn* control words seem to be related to paragraph numbering. When I replace \pnul with \ul, the files look as expected in LibreOffice. Can someone confirm it should be \ul?

@jgm
Owner

jgm commented May 14, 2020

Note that this change has been merged into master but it's not yet in any released version.

However, in all likelihood we just haven't supported underline yet in RTF. If you can confirm that (using a nightly) then please submit a new issue for that.

@jgm
Owner

jgm commented May 14, 2020

Ah, I see -- \pnul is for paragraph number styling. Yes, it should be \ul. I can make that change.
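
For illustration, here is a minimal sketch of emitting underlined text with \ul inside a brace group, so the formatting ends when the group closes. The helper names are hypothetical, this is not pandoc's RTF writer, and only the three RTF special characters are escaped (non-ASCII text is not handled).

```python
def rtf_escape(text):
    """Escape backslash and braces, which are special in RTF."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def rtf_underline(text):
    """Wrap text in a group that switches underlining on with \\ul."""
    return "{\\ul " + rtf_escape(text) + "}"
```

`rtf_underline("hello")` produces `{\ul hello}`, with no \pnul involved.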

@nackd

nackd commented May 14, 2020

Great, thank you!

@jgm
Owner

jgm commented May 14, 2020

Done

sergiocorreia added a commit to sergiocorreia/panflute that referenced this pull request Nov 5, 2020
Adds Underline() object.

Markdown text set as [here]{.ul} will be underlined

See: jgm/pandoc#6277