[ipynb reader] smart extension does not parse ASCII en-dash correctly #7928
Comments
A little easier to see the issue with:
We're getting |
Notes:
Conclusion: the root problem is with the markdown writer. The markdown writer should render However, it's hard to see how we can fix this with current architecture. The function that builds our yaml metadata block gets passed something like |
Sorry don't quite follow here, (Context for others, markdown has the |
The problem arises because of the MetaString. With |
Well, it's a complex problem. Writing the MetaString gives us |
This example might be slightly clearer. Input:
{
"cells": [],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"title": "Number range 5--6"
}
}
Each pipe below is broken down into a single conversion to/from the AST. The only problematic one involves
❯ < pandoc-bug.ipynb | pandoc -s -f ipynb -t native
Pandoc
Meta
{ unMeta =
fromList
[ ( "jupyter"
, MetaMap
(fromList
[ ( "nbformat" , MetaString "4" )
, ( "nbformat_minor" , MetaString "5" )
, ( "title" , MetaString "Number range 5--6" )
])
)
]
}
[]
❯ < pandoc-bug.ipynb | pandoc -s -f ipynb -t native | pandoc -s -f native -t ipynb
{
"cells": [],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"title": "Number range 5--6"
}
}
❯ < pandoc-bug.ipynb | pandoc -s -f ipynb -t native | pandoc -s -f native -t ipynb+smart
{
"cells": [],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"title": "Number range 5\\--6"
}
}
❯ < pandoc-bug.ipynb | pandoc -s -f ipynb -t native | pandoc -s -f native -t markdown
---
jupyter:
nbformat: 4
nbformat_minor: 5
title: Number range 5\--6
---
❯ < pandoc-bug.ipynb | pandoc -s -f ipynb -t native | pandoc -s -f native -t markdown-smart
---
jupyter:
nbformat: 4
nbformat_minor: 5
title: Number range 5--6
---
❯ < pandoc-bug.ipynb | pandoc -s -f ipynb -t native | pandoc -s -f native -t markdown | pandoc -f markdown -t native -s
Pandoc
Meta
{ unMeta =
fromList
[ ( "jupyter"
, MetaMap
(fromList
[ ( "nbformat" , MetaInlines [ Str "4" ] )
, ( "nbformat_minor" , MetaInlines [ Str "5" ] )
, ( "title"
, MetaInlines
[ Str "Number"
, Space
, Str "range"
, Space
, Str "5--6"
]
)
])
)
]
}
[]
❯ < pandoc-bug.ipynb | pandoc -s -f ipynb -t native | pandoc -s -f native -t markdown-smart | pandoc -f markdown-smart -t native -s
Pandoc
Meta
{ unMeta =
fromList
[ ( "jupyter"
, MetaMap
(fromList
[ ( "nbformat" , MetaInlines [ Str "4" ] )
, ( "nbformat_minor" , MetaInlines [ Str "5" ] )
, ( "title"
, MetaInlines
[ Str "Number"
, Space
, Str "range"
, Space
, Str "5--6"
]
)
])
)
]
}
[] |
I think the problem is that the ipynb reader didn't convert this to an en-dash? Input:
{
"cells": [],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"title": "Number range 5--6"
}
}
❯ < pandoc-bug.ipynb | pandoc -s -f ipynb+smart -t native
Pandoc
Meta
{ unMeta =
fromList
[ ( "jupyter"
, MetaMap
(fromList
[ ( "nbformat" , MetaString "4" )
, ( "nbformat_minor" , MetaString "5" )
, ( "title" , MetaString "Number range 5--6" )
])
)
]
}
[]
Edit: the direct comparison to markdown would be the following. Input:
---
jupyter:
nbformat: 4
nbformat_minor: 5
title: Number range 5--6
---
❯ < pandoc-bug.md | pandoc -s -f markdown -t native
Pandoc
Meta
{ unMeta =
fromList
[ ( "jupyter"
, MetaMap
(fromList
[ ( "nbformat" , MetaInlines [ Str "4" ] )
, ( "nbformat_minor" , MetaInlines [ Str "5" ] )
, ( "title"
, MetaInlines
[ Str "Number"
, Space
, Str "range"
, Space
, Str "5\8211\&6"
]
)
])
)
]
}
[]
❯ < pandoc-bug.md | pandoc -s -f markdown-smart -t native
Pandoc
Meta
{ unMeta =
fromList
[ ( "jupyter"
, MetaMap
(fromList
[ ( "nbformat" , MetaInlines [ Str "4" ] )
, ( "nbformat_minor" , MetaInlines [ Str "5" ] )
, ( "title"
, MetaInlines
[ Str "Number"
, Space
, Str "range"
, Space
, Str "5--6"
]
)
])
)
]
}
[]
Here the markdown reader correctly converts `--` to an en-dash when smart is enabled. |
Yes - as I was trying to explain, this is because it's in the context of a MetaString, so the content is treated as plain text and not parsed as markdown. |
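The distinction described above can be sketched with a toy model. This is not pandoc's code: `MetaString` and `MetaInlines` here are simplified stand-ins for the AST constructors shown in the native output, and `smart_parse` and `read_title` are hypothetical helpers that model only the `--` to en-dash rule.

```python
from dataclasses import dataclass

EN_DASH = "\u2013"


@dataclass
class MetaString:
    """Stand-in for a plain-text metadata value (never parsed as markdown)."""
    text: str


@dataclass
class MetaInlines:
    """Stand-in for a markdown-parsed metadata value.

    Simplified: pandoc stores a list of inline elements, not one string.
    """
    text: str


def smart_parse(text: str) -> str:
    # Toy version of the smart extension: only "--" becomes an en-dash.
    # The real extension also handles em-dashes, quotes, and ellipses.
    return text.replace("--", EN_DASH)


def read_title(raw: str, parsed_as_markdown: bool, smart: bool):
    """Hypothetical reader step for a metadata title."""
    if not parsed_as_markdown:
        # Plain-text context: smart has nothing to act on.
        return MetaString(raw)
    return MetaInlines(smart_parse(raw) if smart else raw)


if __name__ == "__main__":
    print(read_title("Number range 5--6", parsed_as_markdown=False, smart=True))
    print(read_title("Number range 5--6", parsed_as_markdown=True, smart=True))
```

In this model the ipynb title stays `5--6` regardless of `smart`, while the markdown YAML title becomes an en-dash only when `smart` is on, matching the native outputs quoted in the thread.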
Two possible fixes were mentioned above:
|
This doesn't say that the title field is supposed to be parsed as markdown, does it? |
I tested, and the result is that markdown in title/authors is not interpreted by nbconvert (i.e. nbconvert does not parse title or authors in metadata as markdown). |
Then that argues against the second solution above. Unfortunately, the first one would be quite difficult given how things are currently arranged... |
Actually, maybe this is simpler than I thought. |
Previously we used the markdown writer to render metadata. This had some undesirable consequences (e.g. en dash expanded to `--` when `smart` enabled), so now we use the plain writer. This addresses #7928, but I think a more elegant fix is possible.
I'm going to call this closed by my last commit. |
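To illustrate why switching the metadata writer matters, here is a minimal sketch under loud assumptions: `write_meta_markdown`, `write_meta_plain`, and `round_trips` are hypothetical names, and the model covers only the `--` escape seen in the outputs in this thread, not pandoc's full escaping rules.

```python
from typing import Callable


def write_meta_markdown(text: str, smart: bool) -> str:
    # Toy model: with smart enabled, a literal "--" is escaped so that a
    # smart markdown reader would not turn it into an en-dash.
    # Assumption: only this one escape is modeled.
    return text.replace("--", "\\--") if smart else text


def write_meta_plain(text: str) -> str:
    # Toy model of the plain writer: the string passes through unchanged.
    return text


def round_trips(title: str, writer: Callable[[str], str], n: int) -> str:
    # The ipynb reader keeps the title as plain text, so whatever the
    # writer emitted is read back verbatim on the next trip.
    for _ in range(n):
        title = writer(title)
    return title


if __name__ == "__main__":
    def with_smart(t: str) -> str:
        return write_meta_markdown(t, smart=True)

    print(round_trips("Number range 5--6", with_smart, 1))
    print(round_trips("Number range 5--6", with_smart, 2))
    print(round_trips("Number range 5--6", write_meta_plain, 2))
```

In this model the markdown-style writer adds one backslash per round trip, while the plain writer leaves the title stable, which is the behavior the fix aims for.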
Thanks. I tried the latest nightly and ran:
$ < pandoc-bug.ipynb | pandoc-nightly-macos-2022-02-24/pandoc -s -f ipynb+smart -t ipynb+smart | pandoc -s -f ipynb+smart -t ipynb+smart
{
"cells": [],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"title": "Number range 5\\--6"
}
} |
The problem is that your second pandoc in the pipe is the system one, not the nightly. |
Oh, right. How silly of me. |
MWE with pandoc 2.17.1.1 on macOS 12.2.1.
In pandoc-bug.ipynb:
Running:
< pandoc-bug.ipynb | pandoc -s -f ipynb+smart -t ipynb+smart
Running it twice:
< pandoc-bug.ipynb | pandoc -s -f ipynb+smart -t ipynb+smart | pandoc -s -f ipynb+smart -t ipynb+smart
You can see that each round trip adds a \\.
A similar construct in markdown doesn't result in this behavior, e.g.
echo '5–6' | pandoc -f markdown -t markdown | pandoc -f markdown -t markdown