Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH pandoc parser minor improvements and support older pandoc #284

Merged

Conversation

BenjaminBossan
Copy link
Collaborator

@BenjaminBossan BenjaminBossan commented Jan 25, 2023

Description

Add support for LineBreak item

This one was a bit obscure, it requires the model card to contain a line break with trailing whitespace to be triggered.

Add support for older pandoc versions

This one came up when working on a HF space. There, non-Python dependencies can be installed, but only as debian packages via apt. However, the apt repo only has an old pandoc version, which doesn't work with the current implementation.

The solutions are either to support older pandoc versions or to use a Docker space, which seems like overkill. Fortunately, supporting older pandoc versions wasn't as hard as expected. The main change was introduced with (afaict) pandoc 2.5, which changed handling of tables. Now there are two table implementations, old and new, which differ slightly.

Tested and works with the following pandoc versions:

  • 2.0
  • 2.2
  • 2.5
  • 2.19
  • 3.0

Minimum pandoc version has been decreased to 2.0. CI has been adopted to install pandoc via apt, resulting in version 2.9.2.1.

Tested and works with the following pandoc versions:

- 2.0
- 2.2
- 2.5
- 2.19
- 3.0

Main change was with (afaict) pandoc 2.5, which changed handling of
tables. Now there are two table implementations, old and new, which
differ slightly.

Minimum pandoc version has been decreased to 2.0.
@BenjaminBossan
Copy link
Collaborator Author

Ready for review @skops-dev/maintainers

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -63,7 +63,7 @@ jobs:
else pip install "scikit-learn~=${{ matrix.sklearn_version }}";
fi
if [ ${{ matrix.os }} == "ubuntu-latest" ];
then wget -q https://github.com/jgm/pandoc/releases/download/2.19.2/pandoc-2.19.2-1-amd64.deb && sudo dpkg -i pandoc-2.19.2-1-amd64.deb;
then sudo apt install pandoc && pandoc --version;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woohoo :D

@@ -185,7 +189,16 @@ def _code(item: tuple[Any, str]) -> str:
_, txt = item
return f"`{txt}`"

def _table_cols(self, items) -> list[str]:
def _table_cols_old(self, items) -> list[str]: # pragma: no cover
# pandoc < 2.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea when we can drop support for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess as soon as apt updates to a more recent pandoc version. I have no idea when that will be.


columns = self._table_cols(thead_body)

columns = self._table_cols_new(thead_body)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected these lines to be covered.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the table change actually happens after 2.9, not 2.5 as I first suspected. Therefore, given that we now install pandoc via apt on CI, which uses 2.9, the old table implementation is covered, not the new one. I thus moved around the pragmas to the new implementation.

So the table change actually happens after 2.9, not 2.5 as I first
expected. Therefore, given that we now install pandoc via apt on CI,
which uses 2.9, the old table implementation is covered, not the new
one. I thus moved around the pragmas to the new implementation.
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we basically tested the new stuff, and now also the old stuff. This is good to go. Thanks @BenjaminBossan

@adrinjalali adrinjalali changed the title ENH pandoc parser minor improvements ENH pandoc parser minor improvements and support older pandoc Jan 25, 2023
@adrinjalali adrinjalali merged commit d9b7c36 into skops-dev:main Jan 25, 2023
@BenjaminBossan BenjaminBossan deleted the ENH-pandoc-parser-minor-improvements branch January 25, 2023 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants