Fix regex to catch HTML tags #398

Merged: 5 commits merged on Sep 21, 2023

Wauplin (Contributor) commented Sep 20, 2023

This PR fixes the CI for https://github.com/huggingface/course and in particular ./es/chapter5/5.mdx.
Related to this Slack thread (internal) and this failing CI.

The failing doc had this form:

10K<n<100K

<Tip>

This is a tip.

</Tip>

Here the `_re_lt_html` regex detected `<n<100K \n<Tip>` as a single HTML tag. `_re_lt_html` is meant to detect the `<` characters that are not part of an HTML tag.
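To see how a tag-matching pattern can swallow the stray `<`, here is a minimal Python sketch. The pattern `naive_tag` is hypothetical and is not doc-builder's actual `_re_lt_html`; it only reproduces the failure mode. Because `[^>]` also matches `<` and newlines, a match that starts at the first `<` keeps going across lines until it reaches the `>` of `<Tip>`.

```python
import re

# Hypothetical, deliberately naive "HTML tag" pattern (NOT doc-builder's regex):
# "<", then any run of characters other than ">", then ">".
naive_tag = re.compile(r"<[^>]*>")

# The failing snippet from ./es/chapter5/5.mdx, as a single string.
doc = "10K<n<100K\n\n<Tip>\n\nThis is a tip.\n\n</Tip>\n"

# [^>]* consumes "n<100K\n\n<Tip", so the first "tag" spans two lines.
first = naive_tag.search(doc)
print(repr(first.group(0)))  # '<n<100K\n\n<Tip>'
```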

I fixed the regex for this use case and added a regression test for it. All previous unit tests still pass, so hopefully it doesn't break anything in the wild. I also took the liberty of adding verbose mode to explain in more detail what the regex is doing (it took me a bit of time to remember 🙄).
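The sketch below illustrates the two ideas in this paragraph, a regex documented with `re.VERBOSE` and a regression test for the failing snippet, under stated assumptions: `LT_NOT_TAG`, `escape_lone_lt` and `test_lt_before_tip_block` are hypothetical names, the pattern is not the one merged in this PR, and replacing each stray `<` with `&lt;` is an assumption about what happens to the detected characters. The key idea is that the tag body may not contain `<`, `>` or a newline, so a "tag" can no longer run from `<n` to `<Tip>`.

```python
import re

# Hypothetical sketch, NOT the regex merged in this PR: match a "<" that does
# not open an HTML tag. re.VERBOSE ignores whitespace and allows # comments.
LT_NOT_TAG = re.compile(
    r"""
    <               # a literal "<" ...
    (?!             # ... NOT followed by a tag body:
        /?          #   optional "/" for closing tags such as </Tip>
        [A-Za-z]    #   tag names start with a letter
        [^<>\n]*    #   rest of name/attributes: no "<", ">" or newline,
                    #   so a "tag" can no longer span several lines
        >           #   the closing ">"
    )
    """,
    re.VERBOSE,
)


def escape_lone_lt(text: str) -> str:
    """Replace each "<" that does not start a tag with "&lt;" (assumed behavior)."""
    return LT_NOT_TAG.sub("&lt;", text)


def test_lt_before_tip_block() -> None:
    # Regression case modeled on the failing ./es/chapter5/5.mdx snippet.
    doc = "10K<n<100K\n\n<Tip>\n\nThis is a tip.\n\n</Tip>\n"
    expected = "10K&lt;n&lt;100K\n\n<Tip>\n\nThis is a tip.\n\n</Tip>\n"
    assert escape_lone_lt(doc) == expected


test_lt_before_tip_block()
print(escape_lone_lt("a < b and <br/>"))  # a &lt; b and <br/>
```

Because this sketch excludes newlines from the tag body, a tag whose attributes are split over several lines would be treated as a stray `<`; a real implementation has to decide how to handle that case.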

cc @mishig25 @MKhalusova @xenova

(also related to #373 and #394 which introduced and modified this regex)

Wauplin (Contributor, Author) commented Sep 20, 2023

CI was failing on an unrelated test:

tests/test_autodoc.py:283: AssertionError
=========================== short test summary info ============================
FAILED tests/test_autodoc.py::AutodocTester::test_document_object - AssertionError: '\n<d[221 chars]ters>[{"name": "*args", "val": ""}, {"name": "[462 chars]\n\n' != '\n<d[221 chars]ters>""</parameters></docstring>\n\nBase class[401 chars]\n\n'
Diff is 1304 characters long. Set self.maxDiff to None to see it.
========================= 1 failed, 83 passed in 4.21s =========================

It is due to this commit on transformers that added arguments to the ModelInfo object.

I fixed the expected result in ec14a49.

MKhalusova commented:

Thank you for fixing this!

mishig25 (Contributor) commented:

Don't forget to revert this dev change: 8c6163b

xenova (Contributor) left a comment:

Green means go! 🚀🚢 (🟢 for Transformers.js)

Just a reminder to remove that dev change.

This reverts commit 8c6163b.
mishig25 (Contributor) left a comment:

LGTM!

mishig25 merged commit daaaf9a into main Sep 21, 2023
4 checks passed
mishig25 deleted the fix-lt-html-regex branch September 21, 2023 08:21