Markdown reader: taking an unusually long time during a certain kind of input #5166

Open
DaedlyKitten opened this issue Dec 19, 2018 · 8 comments

@DaedlyKitten

The following is a 104 KB .txt file that simply repeats the same paragraph. After some trial and error, I isolated it from a long novel I originally wanted to convert to EPUB. Converting this sample .txt to EPUB takes an extremely long time (several minutes on my PC), whereas converting any other ~100 KB .txt file usually takes less than a second.

This is the file and command I used:

pandoc.exe "%homepath%\Downloads\sample.txt" -o %homepath%\downloads\11111.epub

sample.txt

version info:
pandoc.exe 2.5
Compiled with pandoc-types 1.17.5.4, texmath 0.11.1.2, skylighting 0.7.4
Default user data directory: C:\Users\konpo\AppData\Roaming\pandoc
Copyright (C) 2006-2018 John MacFarlane

@DaedlyKitten DaedlyKitten changed the title Taking a unusually long time during a certain kind of epub conversion (sample.txt included) Taking an unusually long time during a certain kind of epub conversion (sample.txt included) Dec 19, 2018
@DaedlyKitten
Author

Some further tries seem to suggest that it has something to do with the paragraph's particular length rather than with specific characters, since the conversion time returns to normal when I insert random characters into the paragraph.

@mb21
Collaborator

mb21 commented Dec 19, 2018

I can confirm that even this takes far too long with pandoc 2.5:

pandoc -f markdown -t native https://github.com/jgm/pandoc/files/2693913/sample.txt

Yet it works fine with pandoc 2.4.

@mb21 mb21 changed the title Taking an unusually long time during a certain kind of epub conversion (sample.txt included) Markdown reader: taking an unusually long time during a certain kind of input Dec 19, 2018
@jgm
Owner

jgm commented Dec 19, 2018 via email

@mb21
Collaborator

mb21 commented Jan 1, 2019

@jgm, you are right. Reverting edc6510 fixes the performance degradation:

git revert edc651059ee617cf36b511de080a646d2e6513a4

@jgm
Owner

jgm commented Jan 1, 2019

@DaedlyKitten if you use -f markdown-smart, you will not experience any performance degradation.
It happens because pandoc is trying to match up quotes and things like

'去岁嵩山之会

are causing the problem (the Chinese fragment means roughly "last year's gathering at Mount Song"). It would be good to fix things so you don't get the slowdown even with +smart, but this workaround should be fine for your purposes.
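The following Python sketch is a simplified model, under stated assumptions, of why quote matching can blow up on this kind of input. It is not pandoc's code, since pandoc is written in Haskell and its real rules for opening and closing quotes are more involved. The rules used here are hypothetical: a `'` may open a quote when followed by a non-space character and may close one only when not followed by a letter or digit. The function names `naive_smart_quotes` and `linear_smart_quotes` are also hypothetical. If every `'` in a repeated paragraph can open a quote but none can close it, each opener scans to the end of the input before giving up, so the total work grows with the square of the input length.

```python
def is_closer(text: str, j: int) -> bool:
    """Hypothetical rule: a ' closes a quote if not followed by a letter or digit."""
    return text[j] == "'" and (j + 1 == len(text) or not text[j + 1].isalnum())


def naive_smart_quotes(text: str) -> tuple[str, int]:
    """Scan forward from every possible opening ' looking for a closer.

    Returns the converted text and the number of characters examined
    during those forward scans (a rough measure of the work done).
    """
    out: list[str] = []
    steps = 0
    i, n = 0, len(text)
    while i < n:
        if text[i] == "'" and i + 1 < n and not text[i + 1].isspace():
            close = -1
            for j in range(i + 1, n):
                steps += 1
                if is_closer(text, j):
                    close = j
                    break
            if close != -1:
                out.append("\u2018" + text[i + 1:close] + "\u2019")
                i = close + 1
                continue
            # No closer anywhere to the right: fall back to a plain apostrophe.
        out.append(text[i])
        i += 1
    return "".join(out), steps


def linear_smart_quotes(text: str) -> str:
    """Same output as the naive version, using a precomputed next-closer table."""
    n = len(text)
    next_closer = [-1] * (n + 1)  # next_closer[j]: first closer at index >= j, or -1
    for j in range(n - 1, -1, -1):
        next_closer[j] = j if is_closer(text, j) else next_closer[j + 1]
    out: list[str] = []
    i = 0
    while i < n:
        if text[i] == "'" and i + 1 < n and not text[i + 1].isspace():
            close = next_closer[i + 1]
            if close != -1:
                out.append("\u2018" + text[i + 1:close] + "\u2019")
                i = close + 1
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    unit = "'去岁嵩山之会 "  # an opening quote that this model can never close
    for k in (100, 400):
        _, steps = naive_smart_quotes(unit * k)
        print(k, steps)
```

In this model, quadrupling the input from 100 to 400 repetitions multiplies the characters examined by roughly 16, from 40,300 to 641,200. The table-based variant produces the same output using one right-to-left pass and one left-to-right pass. It shows one possible direction for avoiding the slowdown with +smart and does not describe any change actually made in pandoc.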

@DaedlyKitten
Author

@jgm thank you :)
I did the conversion with an earlier version of pandoc.
Thanks again for this awesome software!

@jgm jgm closed this as completed Feb 9, 2019
@mb21
Collaborator

mb21 commented Feb 10, 2019

It would be good to fix things so you don't get the slowdown even with +smart

therefore reopening.

@jgm
Owner

jgm commented May 23, 2021

A similar example from #7306, using pandoc -f textile:

<config>
    <section1>
        <a>
                <format identifier="1" offset="0" length="22" pn_format="%u-%u-%u" sn_format="%u-%u-%u-%u">%*02u%04u%04u%04u%02u%02u%03u%03u</format>
        </a>
    </section1>
    <section2>
        <b>
                <format identifier="2" offset="0" length="17" pn_format="%u-%u-%u" sn_format="%u-%u-%u-%u">%02u%03u%03u%02u%02u%02u%03u</format>
        </b>
    </section2>
    <section3>
        <c>
                <format identifier="2" offset="0" length="17" pn_format="%u-%u-%u" sn_format="%u-%u-%u-%u">%02u%03u%03u%02u%02u%02u%03u</format>
        </c>
    </section3>
</config>
