feature request: leave temporary file behind with --verbose #2288

Closed
pauloney opened this issue Jul 9, 2015 · 33 comments
Comments

@pauloney

pauloney commented Jul 9, 2015

The conversion to PDF could leave the temporary files behind when called with the option --verbose. They are displayed on the screen, but many of the files produced can be images, etc ... whose display on the terminal window is not optimal.

Paulo Ney

@jgm
Owner

jgm commented Jul 9, 2015

No, I don't want to do this. --verbose should do what its
name suggests, not litter your file system with temp files.
I don't see why this is needed, since --verbose gives you
full information about the latex command used and the full
text of the source file.


@KurtPfeifle

Couldn't it be implemented by using a separate switch, explicitly, like --keep-tempfiles, or --do-not-cleanup ?

@jgm
Owner

jgm commented Jul 9, 2015

That would be the way to do it, but I still don't see a
compelling reason.

@hftf
Contributor

hftf commented Jul 9, 2015

This option makes debugging easier. For example, in my own Makefiles, I have:

%.tex: %.html
    pandoc -o $@ $< ...

%.pdf: %.tex
    xelatex $< ...

because even though the intermediate .tex file is useful only for occasional debugging, it's easier to always build it than to fiddle with the one-target Makefile:

%.pdf: %.html
    pandoc -o $@ $< ...

@KurtPfeifle

The background of Paul's feature request is this thread

which I just now transformed into


A "compelling" reason is to get easier access to the intermediate LaTeX files. In the current case it was relatively easy to copy and paste 81 lines of LaTeX code from the terminal (after scrolling about 1000 lines back). But imagine a LaTeX file 810 or 8100 lines long, and scrolling back 10,000 lines in the terminal...
(OK, I know how to use tee or how to redirect stderr/stdout into different files, but still...)

It's a good thing we now have at least --verbose :)

Without it, it would have been impossible (for me!) to pinpoint the line that caused the pdflatex run to fail.
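
The redirection workaround mentioned in this comment could be sketched as a small shell helper. `run_logged` is a hypothetical name, not something pandoc provides. It captures both stdout and stderr, because the thread does not say which stream --verbose writes to, and it assumes the real output goes to a file via -o.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND and sends its stdout and stderr to
# LOGFILE, so long --verbose output can be searched in an editor instead
# of by scrolling back in the terminal. Returns COMMAND's exit status.
run_logged() {
  log=$1
  shift
  "$@" >"$log" 2>&1
}

# Example (not executed here):
#   run_logged pandoc.log pandoc --verbose input.html -o output.pdf
#   grep -n '^!' pandoc.log   # LaTeX error lines typically start with "!"
```

This keeps the intermediate LaTeX source only as text inside the log; it does nothing for binary intermediates such as images.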

@pauloney
Author

And it is not just a question of a long TeX file. In my case the pandoc run
produces a few thousand PNGs (the HTML comes from InDesign) and there is
no good way to handle the images in the terminal window.

I agree that this would be best implemented as a flag separate from --verbose;
leaving files behind with --verbose was just a compromise suggestion. In my
specific case many of these PNGs are the same image, and I want to be able
to order them by size and check what they look like, so having them
left behind would be really nice.

Paulo Ney


@jgm
Owner

jgm commented Jul 10, 2015

There's probably an indirect way you could get the images.
I don't know whether this would work or not. But you could
try converting to epub or docx, then using --extract-media to get
the images out, then converting the epub or docx to latex. Worth
a try - I'd have to look at the code to see if it would
work, but trying it is probably easier.
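
The indirect route described above might look like the following shell sketch. The function name `extract_via_docx`, the `PANDOC` override variable, and the intermediate file name are illustrative assumptions; the only pandoc options used are -o and --extract-media.

```shell
# Assumption: PANDOC may be overridden; it defaults to the pandoc on PATH.
PANDOC=${PANDOC:-pandoc}

# extract_via_docx INPUT MEDIADIR OUTPUT.tex
# Step 1: convert INPUT to an intermediate docx, which embeds the images.
# Step 2: convert that docx to LaTeX, using --extract-media to write the
#         embedded images into MEDIADIR.
extract_via_docx() {
  input=$1
  mediadir=$2
  output=$3
  intermediate="${output%.tex}.intermediate.docx"
  "$PANDOC" "$input" -o "$intermediate" &&
    "$PANDOC" "$intermediate" --extract-media="$mediadir" -o "$output"
}

# Example (not executed here):
#   extract_via_docx book.html media book.tex
```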


@pauloney
Author

Yes! That works nicely. Thanks! ... and now that I understand the meaning
of "--extract-media" that is indeed the ideal name for the flag in the html
case as well -- since HTML may contain embedded media.

PN


@jgm jgm added the enhancement label Aug 8, 2015
@tonk

tonk commented Jul 18, 2017

Just curious: is this still alive?

I would love to keep the LaTeX index file (*.ind), because we have a large project where the total index is compiled from the idx files of all chapters. It would be nice not to have to run pandoc first and LaTeX afterwards.

@mb21
Collaborator

mb21 commented Jul 19, 2017

See @hftf's comment. Just create a Bash script or Makefile that uses pandoc to generate the .tex file, then calls LaTeX separately.
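
A minimal shell sketch of that two-step approach follows. `build_two_step`, `PANDOC` and `LATEX_ENGINE` are hypothetical names chosen for illustration, xelatex is only an example default, and the input file is assumed to have an extension.

```shell
# Assumptions: both programs can be overridden through these variables.
PANDOC=${PANDOC:-pandoc}
LATEX_ENGINE=${LATEX_ENGINE:-xelatex}

# build_two_step INPUT
# Writes a standalone .tex file named after INPUT, then runs the TeX
# engine on it. Because the engine is run directly, its by-products
# (.aux, .log, and .idx when the document uses an index) stay on disk
# instead of being removed along with a temporary directory.
build_two_step() {
  input=$1
  tex="${input%.*}.tex"
  "$PANDOC" --standalone "$input" -o "$tex" &&
    "$LATEX_ENGINE" "$tex"
}

# Example (not executed here):
#   build_two_step chapter1.md
```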

@tonk

tonk commented Jul 19, 2017

I know I can do it that way, but I would really like to just run it through pandoc. At the moment I'm running it through pandoc and then through LaTeX.

I think it would be a great help if Pandoc just had an option like --keep-temp and a way to find out where they are. I can then copy the file(s) I need and clean up.
This way I keep a clean working directory without all the extra mess of LaTeX.

@jgm
Owner

jgm commented Jul 23, 2017 via email

@tonk

tonk commented Jul 23, 2017

But that is not what I mean. During the LaTeX run the \index{...} commands result in a .idx file.
After running pandoc in all our directories I need to generate the index. So (now) I take all the idx files, run them through a script and generate the .ind file.

Running pandoc alone won't do, because the *.idx files are removed. So, for now, I need to run pandoc to get the .tex file and then run LaTeX to get the PDF and the idx.

It would be very nice if Pandoc would just leave the idx so I can process them later.

@per-review

I have another use case where this feature would be really useful.

My markdown source has a lot of SVG images. Thanks to #265, these are converted to PNG when the output format is PDF. In LaTeX output, however, they are not converted and can't be processed. I also use biblatex for citations, which is not supported in pandoc's PDF output. So I have to decide between converting all the SVG images manually (if I choose .tex output) and giving up on biblatex (if I choose .pdf output).

Running pandoc with --extract-media doesn't help because SVG images are not converted.

@jgm
Owner

jgm commented Jun 13, 2018 via email

@per-review

I was referring to the original feature request (allowing the temporary files to be left behind), nothing more. This is not so much about expanding the feature set of pandoc as making it easier for advanced users to build upon what pandoc already does.

Being able to use the temporary files pandoc generates with PDF output (but then deletes) would allow users to take advantage of the fact that pandoc automatically converts images where necessary and rewrites the paths to them in the .tex file it generates.

None of this is possible with LaTeX output, where SVG files are not converted. In this case the --extract-media option is of limited use. In fact, apart from PDF output SVG files are never converted (I believe). And even here they exist only as temporary files and are not accessible with --extract-media.

@davidar
Contributor

davidar commented Jan 8, 2021

My markdown source has a lot of SVG images. Thanks to #265, these are converted to PNG with PDF as output format. But in LaTeX output they are not and can't be processed.

It's not too hard to create a Lua filter that automatically converts SVGs to PDFs:

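-- Convert each image to PDF with rsvg-convert and point the element at the result.
-- Note: every image is treated as an SVG.
function Image (elem)
  local cmd = 'rsvg-convert -f pdf -a -o "' .. elem.src .. '.pdf" "' .. elem.src .. '"'
  print(cmd)       -- show the command being run
  os.execute(cmd)  -- write <src>.pdf next to the original image
  elem.src = elem.src .. '.pdf'
  return elem
end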

This could be improved a bit (e.g. it currently assumes that all images are SVGs), but it works fairly well as a drop-in replacement for pandoc's automatic image conversion otherwise.

@jgm
Owner

jgm commented Jan 8, 2021

Note: if you use --pdf-engine=latexmk --pdf-engine-opt=-outdir=foo then foo will be used as the latex build directory and will persist.

@tarleb
Collaborator

tarleb commented May 21, 2021

It seems like @jgm's last comment provides a sufficient solution. Can this be closed?

@aubertc

aubertc commented Nov 3, 2021

I believe it should be closed, indeed, as the latex directory can be "saved" using latexmk. This is the way I use it, and it works just fine.

@jgm jgm closed this as completed Nov 3, 2021
@ras52

ras52 commented Apr 14, 2022

I think this issue needs to be reopened as there is still no general solution to keeping the intermediate files, which is critical for easy debugging and for certain other tasks (such as the examples involving indexes, given above).

Running pandoc with --pdf-engine=latexmk --pdf-engine-opt=-outdir=foo isn't an acceptable solution if you need functionality from a different TeX engine, for example if you need some of the advanced font support in xelatex, and most other TeX engines don't have an -outdir option.

Also, it's not just the TeX engine that creates intermediate files that you might want to keep. As an example, if you are using pandoc to convert HTML to PDF and the HTML includes SVG files, pandoc will run rsvg-convert to generate PDF versions of the images and reference these in the TeX file. Those PDF versions of the images are temporary files which are deleted when pandoc terminates, but if you're trying to debug the layout, you could very well want them.

@aubertc

aubertc commented Apr 14, 2022

Running pandoc with --pdf-engine=latexmk --pdf-engine-opt=-outdir=foo isn't an acceptable solution if you need functionality from a different TeX engine, for example if you need some of the advanced font support in xelatex

I don't think this is a valid criticism: you can ask latexmk to run xelatex for you, using

latexmk -xelatex file.tex

or

latexmk -pdfxe file.tex

and you can pass that option to pandoc using

pandoc --pdf-engine=latexmk --pdf-engine-opt=-pdfxe file.md

Your other points ("most other TeX engines don't have an -outdir option" + temporary images) remain valid, as far as I can tell.

@ras52

ras52 commented Apr 14, 2022

you can ask latexmk to run xelatex for you, using

latexmk -xelatex file.tex

Point taken, and thank you for the correction. Nevertheless, if you're trying to debug a layout issue while using (say) xelatex, the last thing you want to do is port the build environment to latexmk in the hope that it will help debug the problem (not least because latexmk often needs separate installation).

@aubertc

aubertc commented Apr 14, 2022

I generally agree that leaving temporary files behind could be an extremely useful option; I was simply pointing this out in case you needed a workaround.

@hftf
Contributor

hftf commented Apr 30, 2022

After rereading the thread due to recent contributions, I had a thought about how to design this feature and break down its implementation into small actionable steps.

The main issue is that it would add a lot of complexity. We'd need to figure out when runs of biber or biblatex are needed. We'd need a way to select between them. And then, once we've implemented this, people will ask, "what about mkindex/xindy?" Before long, we're reimplementing latexmk. My thought was that we should cut this off at the start. People who want to use native LaTeX features can create their own pipelines.

If we assume that by implementing this enhancement Pandoc must deal with every temporary file generated by every workflow, then sure, this black-and-white thinking applies. But we can also start at a much more basic level of controlling what Pandoc already does. If a line of code in Pandoc is for deleting a file, then simply wrapping an if-statement around it could be the first step to implementing this feature. The if-statement checks if the value of an option, something like --preserve-intermediate-files, is none or pandoc-generated-only; if at some later date someone wanted to put in the work to expand the feature to more situations, then new values for that option can be created in a future-proof way. Abstracting the if-statement and delete operation into a single utility function would also work.
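
The "if-statement around the delete" idea can be illustrated with a short shell sketch. This is not pandoc's code (pandoc is written in Haskell); `remove_intermediate` is a hypothetical utility, and the policy names are the option values proposed in the comment above.

```shell
# remove_intermediate POLICY PATH
# Hypothetical sketch: a single utility that either deletes an
# intermediate file or leaves it in place, depending on POLICY.
#   none                   preserve nothing, so PATH is deleted
#   pandoc-generated-only  preserve pandoc-generated files, so PATH is kept
remove_intermediate() {
  policy=$1
  path=$2
  case $policy in
    none)
      rm -rf -- "$path"
      ;;
    pandoc-generated-only)
      printf 'keeping intermediate file: %s\n' "$path" >&2
      ;;
    *)
      # New policy values could be added here later.
      printf 'unknown policy: %s\n' "$policy" >&2
      return 2
      ;;
  esac
}
```

Routing every deletion through one function like this means a later policy value only has to be handled in one place.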

@jgm
Owner

jgm commented May 1, 2022

Sorry, I don't understand why a new feature is needed, given the possibility of doing

pandoc --pdf-engine=latexmk --pdf-engine-opt=-pdfxe --pdf-engine-opt=-outdir=foo file.md
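
For repeated use, the command above could be wrapped in a shell function. `pdf_keep_build` and the `PANDOC` override variable are hypothetical names; the pandoc options are exactly the ones quoted in this thread, plus -o for the output file.

```shell
# Assumption: PANDOC may be overridden; it defaults to the pandoc on PATH.
PANDOC=${PANDOC:-pandoc}

# pdf_keep_build INPUT OUTPUT.pdf BUILDDIR
# Builds OUTPUT.pdf through latexmk (which runs xelatex because of
# -pdfxe) and passes -outdir so that BUILDDIR is used as the LaTeX
# build directory and persists after the run.
pdf_keep_build() {
  input=$1
  output=$2
  builddir=$3
  "$PANDOC" "$input" -o "$output" \
    --pdf-engine=latexmk \
    --pdf-engine-opt=-pdfxe \
    --pdf-engine-opt=-outdir="$builddir"
}

# Example (not executed here):
#   pdf_keep_build file.md file.pdf foo
```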

@ras52

ras52 commented May 1, 2022

Sorry, I don't understand why a new feature is needed, given the possibility of doing

pandoc --pdf-engine=latexmk --pdf-engine-opt=-pdfxe --pdf-engine-opt=-outdir=foo file.md

This doesn't work when there are intermediate files generated by Pandoc, as in the case when the HTML contains SVG images. In this case, Pandoc runs rsvg-convert to create PDFs for each SVG image, which it deletes again before exiting. The command above leaves a LaTeX build tree in foo/ that won't build because the LaTeX references the PDF versions of the images which no longer exist.

@sboukortt

I seem to be encountering the opposite problem with pandoc 3.1.11.1 on Windows (MSYS2). pandoc doc.md -o doc.pdf, if doc.md happens to include other PDF files as images, leaves behind a /tmp/tex2pdf.-7223ab5e94ab234a/ directory, which I don’t want. (Breaks my tup build which had been working fine up to this point, because I can’t declare all those now-persisting output files.)

tup error: File 'C:/msys64/tmp/tex2pdf.-1427b74b505a6df9/bayes/distributions.pdf' was written to, but is not in .tup/db. You probably should specify it as an output

Adding && rm -r /tmp/tex2pdf* to my rule fixes the build, but I would rather not have to do that if possible. Is there an option I’m missing?

@jgm
Owner

jgm commented Feb 13, 2024

@sboukortt not sure. This shouldn't happen; we're using withSystemTempDirectory which is supposed to clean up after itself. Is this a new issue with 3.1.11.1?

@sboukortt

I’m not sure it was introduced in 3.1.11.1 specifically; perhaps I should try to bisect. But I don’t remember it being an issue before a few weeks (months?) ago.

@jgm
Owner

jgm commented Feb 13, 2024

If you can bisect, that would really help. There are no recent changes that seem relevant.

@hftf
Contributor

hftf commented Feb 13, 2024

I seem to be encountering the opposite problem

Sorry, but this sounds like a separate issue from this feature request. Could it move to another venue (a new issue or the mailing list)?

@ZoomRmc

ZoomRmc commented Oct 21, 2024

Here's a short program to act as a pad to inspect the parameters the intended executable was invoked with and save the temporary file passed as one of the parameters:

https://gist.github.com/ZoomRmc/91599eb180f534be5c8ecde0ca11ab4b
