Proposal of a generic "article" template #430

Open
dmkaplan2000 opened this issue Jul 28, 2021 · 17 comments
Labels
next to consider for next release

Comments

@dmkaplan2000
Contributor

I recently created a generic "article" format that has a number of useful features for creating vendor-neutral articles and is largely compatible with several other rticles formats. This lets one start writing an article in the neutral format and quickly switch to a specific journal's format once a choice is made. I am using it to write working documents for regional fisheries management organizations (which often eventually become peer-reviewed publications), but it is quite generic.

I am wondering if this sort of "vendor neutral" format could be integrated into rticles. I think it is valuable to include pertinent formats that are not linked to specific publishers, but want to make sure there is interest before trying to do so.

I am adding a link to the format below. If there is interest, I can easily adapt it to rticles and create a pull request for the format. Suggestions on what to call it would be appreciated ("generic_article"???).

Link to directory containing format files. Only rfmo_template.tex and rfmo_skeleton.Rmd are essential.


By filing an issue to this repo, I promise that

  • [X ] I have fully read the issue guide at https://yihui.name/issue/.
  • [X ] I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('rticles'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('rstudio/rticles').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • [X ] I have learned the Github Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

@cderv
Collaborator

cderv commented Jul 28, 2021

That is an interesting idea!

To better understand, what exactly is the main difference from using pdf_document() with the right configuration?
That format already uses a generic template (the one provided with Pandoc), which uses documentclass: article.

This package offers other format functions because they use a specific class and usually need to follow templates provided by the journal.

Adding a generic template to this package would mean maintaining another template, and this is known to be a hard task. I would like to understand how it differs from the default template, and what can't be done with the default template directly.

Have you compared them already?

Thanks!

@dmkaplan2000
Contributor Author

> That is an interesting idea!
>
> To better understand, what exactly is the main difference from using pdf_document() with the right configuration?
> That format already uses a generic template (the one provided with Pandoc), which uses documentclass: article.

The main contributions of this new format would be that it handles multiple authors, author affiliations and author footnotes as one would want for a scientific article, as well as keywords and multi-lingual abstracts, and it places some of the most useful options for scientific articles in the YAML header (endfloat, numbering lines, fancyhdr). I imagine in principle you could do this with pdf_document(), but as far as I know you would be writing quite a bit of LaTeX to the point where you might as well just write the article in LaTeX.

> This package offers other format functions because they use a specific class and usually need to follow templates provided by the journal.
>
> Adding a generic template to this package would mean maintaining another template, and this is known to be a hard task. I would like to understand how it differs from the default template, and what can't be done with the default template directly.
>
> Have you compared them already?

Yes, I started my template by copying the default latex template from the pandoc code repository and modifying it. My template has essentially everything that is in the default template (no beamer of course), but also a lot more. Most of it is related to the handling of the title page for which I had to write lots of horrifying latex, something I hope to never have to do again, which is why I want this to be a template. I think the best demonstration would be to just play with some of the options in the YAML header that are not in the standard template: fancyhdr, authblk, endfloat, numberlines, abstract, etc.

> Thanks!

@dmkaplan2000
Contributor Author

I should also say that I would modify the format somewhat if it is included in rticles, so that some of the more obscure options, such as authblk, have better names and are, where possible, arguments of the rendering function. That would make it clear that they are specific to this format rather than generic, base-level YAML entries.

@cderv
Collaborator

cderv commented Jul 28, 2021

I see. Thanks for the clarification.

I can understand the need for scientific additions to the default Pandoc template. I would love to find a way to

  1. Base our template on Pandoc's template
  2. Patch the template so that we add what is missing for scientific usage
  3. Easily redo 1 and 2 as soon as a new template is out (almost at each Pandoc version)

This is not just for this template but for all templates.
I need to look deeper into the differences between templates to find the least costly way for us to maintain such a template. There are a few ways we could tweak the default one we get from Pandoc, either using configuration or patching the template on the fly. This would avoid maintaining a copy.

On the Pandoc side, they do that for highlighting, for example. They have a highlighting-macros variable in the template, which is filled at render time by their highlighter, skylighting, depending on the theme used.

Maybe we can think of something similar to add the scientific parts needed.

I am just sharing thoughts here, as I won't have time to look into this further right now because I am taking time off, but I'll have a closer look in September if that is OK.

Pinging @yihui as he may be interested and have thoughts on this.

@dmkaplan2000
Contributor Author

This may be possible, though on a technical level I am not 100% sure how you are thinking of going about it. To create my format, I first took the pandoc default latex template, then deleted all the beamer stuff, as I wasn't interested in that and wanted to reduce noise in the template, and then added my parts. Looking over template.tex, the parts I added are, in order:

  • line numbering
  • endfloat
  • fancyhdr - I left in the default template's "pagestyle" stuff, but you wouldn't want to use both pagestyle and my fancyhdr stuff
  • I had to modify the default multi-lingual stuff for what I wanted to do with multi-lingual abstracts. This was quite unsatisfactory as the default pandoc language stuff is in BCP47 with some internal translation of that into babel or polyglossia language specifications, but for abstracts you have to specify the language in either babel or polyglossia format as I saw no way to access the translation they use. I have been thinking of suggesting to pandoc that they turn the BCP47 translation into a pipe so you can do $lang/babel$ instead of $babel-lang$.
  • Then I have a long section dealing with author affiliations and footnotes, with affiliations as either footnotes or in authblk format. The major issue here is we would need to have a way to detect if you are using just a simple author field as in the default format or the much more complicated format I am using.
  • After \begin{document}, I have some finishing up for the author footnotes, more fancyhdr stuff, multi-lingual abstract stuff and keywords. I have some code for acknowledgements, but that should be deleted as it is better to just use an unnumbered section in the Rmd template.

@cderv
Collaborator

cderv commented Jul 29, 2021

What I am thinking is that:

  • For all the additional stuff, we could do as $header-includes$. This mechanism allows any users to add custom content

    • In rmarkdown, we use that to add more in the template based on what we need.
    • If we can use it to add special content in this place, that would be interesting.
    • Otherwise, we could patch the default template with similar variables (like $fancyhdr-includes$). Keep your custom content in a separate file, modify the default template only by adding the variable (I think this could be easier to maintain), and insert the content when rendering. This would modularize the template.
  • Or, similarly but leveraging a newer Pandoc feature, we could use partials (https://pandoc.org/MANUAL.html#partials)

    • Patch the default template with one line partial
    • Keep the partials content in separate file

My idea is to modularize the content so that it could be easier to maintain and update based on default template change.
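
As a rough illustration of the "patch the default template with a one-line partial" idea, the Python sketch below inserts a single hook line just before \begin{document} in a template string. The template shown is a tiny stand-in, not Pandoc's real default.latex, and both the function name add_partial_hook and the partial file name fancyhdr.latex are hypothetical. rmarkdown itself would do this kind of patching in R.

```python
def add_partial_hook(template, anchor, hook):
    """Insert `hook` on its own line just before the first line containing `anchor`."""
    lines = template.splitlines()
    for i, line in enumerate(lines):
        if anchor in line:
            return "\n".join(lines[:i] + [hook] + lines[i:]) + "\n"
    # Fail loudly if upstream changed and the anchor is gone.
    raise ValueError(f"anchor {anchor!r} not found in template")


# Tiny stand-in for Pandoc's default LaTeX template (NOT the real file).
default_template = "\n".join([
    r"\documentclass{article}",
    "$for(header-includes)$",
    "$header-includes$",
    "$endfor$",
    r"\begin{document}",
    "$body$",
    r"\end{document}",
]) + "\n"

# "fancyhdr.latex" is a hypothetical partial file holding the custom content.
patched = add_partial_hook(default_template, r"\begin{document}", "$fancyhdr.latex()$")
print(patched)
```

Because the custom LaTeX would live in the partial file, only this one-line hook has to be reapplied when the upstream template changes.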

The challenge in all this is how to keep the custom template up to date with the default template in Pandoc, with as little friction as possible. Currently, we are not good at that, and I need to improve this for all templates.

Another idea: Leverage git features.

  • We could generate some .patch files from our modifications to one default template, then apply those .patch files to any new Pandoc template.
  • Git should be clever enough to apply them correctly. Looking at the diff after patching would clearly help us see what will change, and would reveal any conflicts.

Example of such usage: the way bslib maintains its special tweaks across Bootstrap updates using patches

Solutions aside, my concerns regarding a custom template for Pandoc are that

  • Each Pandoc version includes a new version of the default template that works with that version
  • Either our custom template needs to work with different Pandoc versions, so it needs to be kept up to date
  • Or the changes we want to make are within the bounds of what the default template allows you to configure
  • Or, as we are using rmarkdown and not Pandoc directly, we find a way to hot patch the template included with the Pandoc version in use

I hope it clarifies my current thinking. That is an interesting topic I need to think more about.

(Regarding the format you suggest, maybe it should be in rmarkdown directly, as a new format such as scientific_pdf_document(), as a feature added to the default format such as pdf_document(science_addition = TRUE), or something else.)

Thanks for bringing all this up, I like the way it "shakes" my thinking.

@dmkaplan2000
Contributor Author

Think away, there is no rush. I more or less get what you are proposing, though I might need help with the mechanics when the time comes to implement it (if indeed you are proposing that I do it; my time on this is not unlimited, but I am happy to lend a hand if I can get some direction). Once we have a strategy, I can move things around or cut things out of my template as needed, but I don't want to put in much effort until we have a clear strategy.

$header-includes$ could be used for getting in a lot of the code, but I also have some code that goes after \begin{document}. The placement of some of that code relative to \maketitle is essential. $include-before$ seems to be well placed for this use, but we would need to make sure that nothing was ever placed between \maketitle and $include-before$.

Regarding partials, I saw that in the pandoc documentation, but didn't really understand what they were about (the documentation is a bit terse...). It is another option.

Placing this within the rmarkdown package also seems reasonable. As you say, there is no dependence on a specific CLS file, so rmarkdown makes more sense than rticles. Perhaps a new function scientific_pdf_document or article_document with some pushing of file contents to $header-includes$ and $include-before$ is largely sufficient, but there is the issue of the multi-lingual abstracts that required modifying the multi-lingual code in the default template. This could be fixed by pushing the abstract languages into the otherlangs variables in BCP47 format in much the same way that divs and spans with other language specifications become otherlangs, but this would need to happen before pandoc touches it I think. Perhaps there is a way to tack some empty spans with the appropriate languages onto the .md document before that is fed to pandoc.
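
The last idea, tacking empty spans onto the .md file before Pandoc sees it, could look roughly like the sketch below. It is written in Python purely for illustration (an rmarkdown pre-processor would do this in R), the function name append_lang_spans is hypothetical, and it assumes without testing that Pandoc would register a language from an empty span such as []{lang=fr}.

```python
def append_lang_spans(markdown, langs):
    """Append one empty span per distinct language code to the end of the document."""
    seen = []
    for lang in langs:
        if lang not in seen:  # keep first occurrence, preserve order
            seen.append(lang)
    if not seen:
        return markdown
    # Pandoc's bracketed-span syntax: [text]{attributes}; the text is empty here.
    spans = " ".join(f"[]{{lang={lang}}}" for lang in seen)
    return markdown.rstrip("\n") + "\n\n" + spans + "\n"


doc = "# Title\n\nBody text.\n"
print(append_lang_spans(doc, ["fr", "es", "fr"]))
```

The language codes would come from the abstract entries in the YAML header, so this step would have to run after the header is parsed and before the .md file is handed to Pandoc.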

@cderv
Collaborator

cderv commented Jul 29, 2021

> Think away, there is no rush. I more or less get what you are proposing, though I might need help with the mechanics when the time comes to implement it (if indeed you are proposing that I do it; my time on this is not unlimited, but I am happy to lend a hand if I can get some direction). Once we have a strategy, I can move things around or cut things out of my template as needed, but I don't want to put in much effort until we have a clear strategy.

The minimum I need from you is to have your files in a repo of yours so that we can retrieve them easily with git. A PR is also an option for sharing those files with us. Beyond that, the more you manage to do, the better for me. The hard part is the updating mechanism; we need at least one mechanism, even if imperfect, so that we have a first solution. Then we (I) could try different approaches. Once I have a good grasp of what should be inserted and what the best option is, we'll be able to work on the definitive solution (from the same PR, or using the repo with the files and tests).

In a nutshell, a minimum viable product would be great as a base to work on. This does not require changing your template, as it can already be used as is.

> $header-includes$ could be used for getting in a lot of the code, but I also have some code that goes after \begin{document}.

That is good to know.

> The placement of some of that code relative to \maketitle is essential. $include-before$ seems to be well placed for this use, but we would need to make sure that nothing was ever placed between \maketitle and $include-before$

It may not be the cleanest, but for this kind of "placement-specific" content, a hot patch that searches for \maketitle and then replaces it or inserts content there is often a good working solution (replace one line with more lines).
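
A minimal sketch of such a hot patch, in Python for illustration only (rmarkdown would do this in R). The function name replace_maketitle, the stand-in template, and the partial name authorblock.latex are all hypothetical. The sketch refuses to patch unless it finds exactly one \maketitle line, so an unexpected upstream change fails loudly rather than silently.

```python
def replace_maketitle(template, replacement_lines):
    """Replace the single line that is exactly \\maketitle with several lines."""
    lines = template.split("\n")
    hits = [i for i, line in enumerate(lines) if line.strip() == r"\maketitle"]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one \\maketitle line, found {len(hits)}")
    i = hits[0]
    return "\n".join(lines[:i] + list(replacement_lines) + lines[i + 1:])


# Tiny stand-in for the relevant part of a LaTeX template (NOT the real file).
template = "\n".join([
    r"\begin{document}",
    "$if(title)$",
    r"\maketitle",
    "$endif$",
    "$body$",
    r"\end{document}",
])

# Keep \maketitle and add a hypothetical partial hook right after it.
patched = replace_maketitle(template, [r"\maketitle", "$authorblock.latex()$"])
print(patched)
```

Since the inserted lines sit immediately after \maketitle, nothing else can end up between the two, which is the placement guarantee discussed above.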

> Regarding partials, I saw that in the pandoc documentation, but didn't really understand what they were about (the documentation is a bit terse...). It is another option.

Basically, they introduced this to share common elements between templates. The best example is the CSS that is shared across all the HTML templates in Pandoc. There is a styles.html file, which is inserted in each template using the partials syntax.

  <style type="text/css">
    $styles.html()$
  </style>

So this is mainly a mechanism to reduce duplication. It would be useful for rticles, but it would raise the minimum Pandoc version required above the current one. I don't want to do it yet for the whole package (for a specific format, we could).

> Placing this within the rmarkdown package also seems reasonable. As you say, there is no dependence on a specific CLS file, so rmarkdown makes more sense than rticles. Perhaps a new function scientific_pdf_document or article_document with some pushing of file contents to $header-includes$ and $include-before$ is largely sufficient, but there is the issue of the multi-lingual abstracts that required modifying the multi-lingual code in the default template. This could be fixed by pushing the abstract languages into the otherlangs variables in BCP47 format in much the same way that divs and spans with other language specifications become otherlangs, but this would need to happen before pandoc touches it I think. Perhaps there is a way to tack some empty spans with the appropriate languages onto the .md document before that is fed to pandoc.

Great, I like that you like the idea. The absence of a CLS dependency is exactly what makes me think that!

I need to have a deeper look at this multi-lingual thing. This is new to me. Maybe we can find a solution that leverages the fact that we render with Pandoc after pre-processing in R. We already do some hot patching of the template, as well as pre-processing of the md file and post-processing of the tex file.
If you think there is an improvement to make in Pandoc, you should suggest it there (pandoc-discuss is a place where you can share thoughts like this, I believe).

Just so you know, I will take time off for the coming weeks, but I'll keep thinking about this. I probably won't be able to work on rticles again for at least a month.

@yihui
Member

yihui commented Jul 29, 2021

The key issue here is long-term maintenance. I'd suggest that we start with using $header-includes$ and $include-before$ and see what else needs to be done. If possible, I wish to avoid maintaining a Pandoc LaTeX template by ourselves, but try to extend/patch it instead. @dmkaplan2000 If you are able to commit to the maintenance of this output format, then I wouldn't worry too much about how the work should be done (you can choose the most convenient way by yourself).

@dmkaplan2000
Contributor Author

Perhaps I am wrong, but I think we are on the wrong track with something here. $header-includes$ only allows including raw LaTeX code, but I have code that includes pandoc variables and conditional statements. For that, I don't think I can use $header-includes$ or in_header: file.tex. This is largely my fault, as I hadn't thought through what using $header-includes$ implies.
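
The limitation can be pictured with a toy single-pass substitution in Python. This is only a simplified model and not Pandoc's template engine; render_once and the linenumbers variable are made up for the illustration. The point it shows is that a variable's value is pasted in as is, so any template syntax inside it is never evaluated.

```python
import re


def render_once(template, variables):
    """Replace each $name$ with its value in one pass; inserted values are not re-scanned."""
    return re.sub(
        r"\$([A-Za-z][\w-]*)\$",
        lambda m: variables.get(m.group(1), ""),
        template,
    )


template = "$header-includes$\n\\title{$title$}\n"
variables = {
    "title": "My article",
    # Template syntax placed inside a variable value:
    "header-includes": "$if(linenumbers)$\\usepackage{lineno}$endif$",
}

# The $if(...)$ ... $endif$ text survives literally in the output.
print(render_once(template, variables))
```

A partial, by contrast, is itself template text, which is why it can contain variables and conditionals.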

Presuming I have correctly understood things, it seems like partials are the only option if we want to do things the "pandoc way", but then I think we would need hooks in the default pandoc latex template for this to work. As such, I don't see a way to directly implement the "patching" solution. What I propose to do instead is reorganize my template so that all my bits are in a few specific places and then we can see what is the best way to patch those into the pandoc default latex template.

@dmkaplan2000
Contributor Author

Also, I ran the idea of a babel pipe for pandoc up the flagpole on the pandoc GitHub page and they immediately shot it down... I have an ugly workaround in mind, but I will need to test it.

@yihui
Member

yihui commented Jul 29, 2021

> $header-includes$ only allows including raw LaTeX code, but I have code that includes pandoc variables and conditional statements.

Oh I see. That sounds tricky indeed...

@dmkaplan2000
Contributor Author

I created a repository with my template where I have taken the default latex template for pandoc, very slightly modified it to add hooks for partials and then placed all of my new code into those partial files. There were a few lines in the default template that I had to delete as these were replaced by the contents of the partial files, but other than that it is essentially the default pandoc template with partials hooks. This provides a clear path for us to implement what I am trying to do just using a few much more manageable files.

The one thing that is clearly not working as I would like is multi-lingual abstracts. I tried using spans to push the abstract languages into otherlangs, but this produced lots of problems. First, this only works if the spans are in the main body of the .Rmd file, not if they are in the $abstract.text$ YAML entry. Second, for some reason when I used Spanish as one of the languages, it didn't create the spanish environment in the $babel-newcommands$ variable, though it would do this for French or Portuguese. In addition, the texlive-lang-french package seems to have some sort of bug on my Ubuntu system, so that using French would cause the output .tex file not to compile. Instead, I just used Portuguese as the declared language (though I can't write two words of Portuguese, so the text is actually French). All in all a train wreck, with seemingly multiple bugs in different places, none of which I can easily fix. We would need to find some solution to this for the template to work.

The repository is https://github.com/dmkaplan2000/generic_rmarkdown_article

@dmkaplan2000
Contributor Author

In case it is pertinent, the pandoc template I started with is:

https://github.com/jgm/pandoc/blob/master/data/templates/default.latex

If you diff mine with theirs you should be able to see where I made modifications and deletions.
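
For readers without a diff tool at hand, a comparison of this kind can be sketched with Python's standard difflib module. The two template strings below are tiny stand-ins rather than the real files, the partial name linenumbers.latex is hypothetical, and the labels default.latex and template.tex are only used as headers in the diff output.

```python
import difflib


def template_diff(upstream, modified):
    """Return a unified diff between two template strings."""
    return "".join(difflib.unified_diff(
        upstream.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile="default.latex",
        tofile="template.tex",
    ))


# Stand-in contents (NOT the real templates).
upstream = "\\documentclass{article}\n\\begin{document}\n$body$\n\\end{document}\n"
modified = (
    "\\documentclass{article}\n"
    "$linenumbers.latex()$\n"
    "\\begin{document}\n$body$\n\\end{document}\n"
)

print(template_diff(upstream, modified))
```

Lines starting with + are additions relative to the upstream template and lines starting with - are deletions, which makes it easy to see how small the set of hooks is.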

@dmkaplan2000
Contributor Author

I have now fixed essentially all the issues I had with multi-lingual abstracts and even improved on how it worked before by using divs with .abstract as the class name. I also defined a couple of special environments, keywords and renameableabstract, that allow me to insert keywords and abstracts with different titles. This is a much cleaner solution than what I initially had and improves functionality.

Setting a div to lang=es still doesn't work, but I think this is a bug in the way $babel-newcommands$ works.

Also, to get French to work, I had to force it to use the footnote package instead of footnotehyper. This seems to be a bug related to a clash between babel French and footnotehyper.

@dmkaplan2000
Contributor Author

Hope everyone had a nice summer. I wanted to let you know that I now have a worked example that uses this template (with a few small unimportant modifications to fit the specific format demanded by the target audience):
2021-wp-fishing-spatial-distribution-statistics.pdf.

@dmkaplan2000
Contributor Author

FYI, I have thrown my latest version of this template into a small R package:

https://github.com/dmkaplan2000/starticles

cderv moved this to Backlog in R Markdown Team Projects Jun 3, 2022
cderv moved this from Backlog to To discuss in R Markdown Team Projects Jun 3, 2022
cderv added the next to consider for next release label Jun 3, 2022
cderv moved this from To discuss to Backlog in R Markdown Team Projects Jul 25, 2022