Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipynb writer: handle cell output with raw block of markdown #7563

Merged
merged 2 commits into from
Dec 10, 2021
Merged

ipynb writer: handle cell output with raw block of markdown #7563

merged 2 commits into from
Dec 10, 2021

Conversation

ickc
Copy link
Contributor

@ickc ickc commented Sep 10, 2021

feat(ipynb writer): write RawBlock of markdown in code-cell output
#7561 makes the ipynb reader reads code-cell output with mime "text/markdown" to a RawBlock of markdown
This commit makes the ipynb writer writes this RawBlock of markdown back inside a code-cell output with the same mime, preserving this information in round-trip

test(ipynb): "text/markdown" mime type in code-cell output
This add tests on ipynb reader (#7561) and ipynb writer (#7563)'s ability to handle a "text/markdown" mime type in a code-cell output

@ickc
Copy link
Contributor Author

ickc commented Sep 10, 2021

@jgm, this is a draft and perhaps needs more discussion.

#7561 added support of text/markdown in cell output in ipynb reader.

This one concerns the ipynb writer, specifically how to have some sort of round-trip identity.

This patch so far only map a raw-block of markdown back to text/markdown in a cell output. In a naive test pandoc texp_repr_mimebundle.ipynb -s --to ipynb this would work.

But if the ipynb has ever been converted to markdown, the markdown raw-block will be written as is, without the markdown raw-block syntax (i.e. {=markdown})

How should this be resolved? Should the AST be aggressively (after not matching other cases) converted to a markdown raw-block?

@jgm
Copy link
Owner

jgm commented Sep 10, 2021

Maybe we should render a RawBlock (Format "markdown") in markdown as

  • a raw markdown block (```{=markdown}...) if it occurs in an output cell
  • otherwise, as its contents (as we'd usually do)

Would this make sense?

@ickc
Copy link
Contributor Author

ickc commented Sep 11, 2021

@jgm, I found that the same problem exists when there's a raw cell with mime-type text/markdown:

  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/markdown",
    "tags": []
   },
   "source": [
    "some markdown here"
   ]
  }

This markdown raw-cell is a strange thing as there's already a markdown cell type. But nonetheless it exists and is valid (although in Jupyter lab/notebook it won't be rendered.)

Then I remember commit 48fb6d9. So may be the solution is to enhance raw_markdown such that when this is specified (signaling people want to have round trip identity), not only will the markdown be written as is, but also enclosed in a markdown raw-block? What do you think?

Edit: I mean the proposal is that if raw_markdown is specified, any markdown will be enclosed as markdown raw-block, which should includes the 2 cases we mentioned in this thread.

@ickc ickc marked this pull request as ready for review December 3, 2021 09:26
@ickc
Copy link
Contributor Author

ickc commented Dec 3, 2021

@jgm, may be merging this first?

This patch already solve the roundtrip idempotence with ipynb to ipynb conversion.

The discussion above is about ipynb to markdown conversion is lossy (so that markdown back to ipynb doesn't have that markdown RawBlock.)

@jgm
Copy link
Owner

jgm commented Dec 6, 2021

I'm sorry, I can't really understand the issue, reading the comments above. Can you explain it more clearly, with an example, including what pandoc does and what you think it should do?

@ickc
Copy link
Contributor Author

ickc commented Dec 6, 2021

In short, this complements #7561, which implements reading raw markdown from ipynb output cell. This PR writes raw markdown block as ipynb output cell.

I think the motivation of having a raw markdown in output cell is already gone through in #7561. This issue is only to support writing what #7561 knows how to read already.

@jgm
Copy link
Owner

jgm commented Dec 6, 2021

I still need examples to understand.

@ickc
Copy link
Contributor Author

ickc commented Dec 6, 2021

@jgm, I added a test file. See if it helps clarifying.

@ickc
Copy link
Contributor Author

ickc commented Dec 7, 2021

Some of these tests are failing because the ordering of dict within JSON are different. Is there a better way to test this?

@jgm
Copy link
Owner

jgm commented Dec 7, 2021

Looks like in the output the keys are in alphabetical order -- so try making the input that way, too?

@ickc
Copy link
Contributor Author

ickc commented Dec 7, 2021

But the orders seem to depend on the platform? Also, I can't control the order of the output cell from jupyter (although for testing purposes we could change the ordering ourselves.) Edit: the ordering problem happens from native to ipynb, and seems to depends on platforms. So there's nothing than can be done by modifying the test files.

Also, does the test file address your question?

@ickc
Copy link
Contributor Author

ickc commented Dec 7, 2021

The test fails because of #7733. Should this test be temporarily disabled and merge first? Regardless testing, I guess the test files illustrates the reasoning behind this PR?

@jgm
Copy link
Owner

jgm commented Dec 7, 2021

Try rebasing against current master. I think some changes in upstream ipynb will give us deterministic order of keys (but this needs confirmation).

@ickc
Copy link
Contributor Author

ickc commented Dec 7, 2021

From the CI test, it seems that it doesn't help.

I can see that the id is now included.

But it doesn't change the ordering of the ipynb writer, and is still platform/compiler dependent.

@jgm
Copy link
Owner

jgm commented Dec 8, 2021

Try rebasing over commit 8215ca0
I think this might do the trick for a deterministic (sorted) output for mime bundles.

@jgm
Copy link
Owner

jgm commented Dec 8, 2021

(Existing tests may need to change, but at least you should get the same result on all platforms.)

@ickc
Copy link
Contributor Author

ickc commented Dec 8, 2021

I rebased and now it includes 8215ca0. But strangely the output ipynb doesn't change after this commit. (I just use the compiled pandoc itself to generate the ipynb/mime.out.ipynb:

pandoc -t native -f ipynb --ipynb-output=all ipynb/mime.ipynb -o ipynb/mime.native
pandoc -f native -t ipynb --wrap=preserve ipynb/mime.native -o ipynb/mime.out.ipynb

)

I.e. it generates the key order exactly the same as before commit 8215ca0.

Also, looking at the diff between ipynb/mime.ipynb & ipynb/mime.out.ipynb, the key isn't ordered in ipynb/mime.out.ipynb.

Also, 8215ca0 correctly points to jgm/ipynb@00246af

Another observation is that all the UNIX CI have the html and markdown key reversed, while the windows CI and my local macOS shares the same key ordering.

@jgm
Copy link
Owner

jgm commented Dec 9, 2021

We're focusing on the difference in platforms. Is there also a difference in the aeson versions they're compiled with? This should be in the build logs.

@ickc
Copy link
Contributor Author

ickc commented Dec 9, 2021

Unfortunately the log doesn't have the information of which aeson version it is using.
I downloaded the log archive and

find .  -type f -name '*.txt' -exec grep --color=auto -iHnE 'aeson' {} + 

and has no results.

Randomly selecting one of the build and search aeson using the web interface also has no results.

@jgm
Copy link
Owner

jgm commented Dec 9, 2021

Okay, I think 9cbea69 may help.

@jgm
Copy link
Owner

jgm commented Dec 9, 2021

Now that we have deterministic ordering, can you do these two things?

  1. remove the third (temporary) commit (fast-fail)
  2. add a fuller commit message to the first commit, describing how this changes pandoc's behavior

otherwise, looks good to me!

ickc added 2 commits December 9, 2021 15:41
#7561 makes the ipynb reader reads code-cell output with mime "text/markdown" to a RawBlock of markdown
This commit makes the ipynb writer writes this RawBlock of markdown back inside a code-cell output with the same mime, preserving this information in round-trip
This add tests on ipynb reader (#7561) and ipynb writer (#7563)'s ability to handle a "text/markdown" mime type in a code-cell output
@ickc
Copy link
Contributor Author

ickc commented Dec 10, 2021

@jgm, ready to merge.

@jgm jgm merged commit 20eb8ac into jgm:master Dec 10, 2021
@jgm
Copy link
Owner

jgm commented Dec 10, 2021

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants