fix: Allow different types of EOL in note model template regex #53

dominik-oktav · 2024-03-24T14:03:28Z

As reported in #51, the current regex doesn't work on Windows. I adapted the regex to allow \r\n, \n and \r.

ohare93

Thanks for the efforts mate. Just one comment 👍

ohare93 · 2024-03-28T08:49:13Z

brain_brew/representation/yaml/note_model_template.py

@@ -23,7 +23,7 @@
 HTML_FILE = AnkiField("html_file")
 BROWSER_HTML_FILE = AnkiField("browser_html_file", default_value=None)

-html_separator_regex = r'[\n]{1,}[-]{1,}[\n]{1,}'
+html_separator_regex = r'[(\r\n|\r|\n)]{1,}[-]{1,}[(\r\n|\r|\n)]{1,}'


Suggested change

-html_separator_regex = r'[(\r\n|\r|\n)]{1,}[-]{1,}[(\r\n|\r|\n)]{1,}'
+html_separator_regex = r'(\r\n|\r|\n){1,}[-]{1,}(\r\n|\r|\n){1,}'

Shouldn't we remove the square brackets here, replacing them with parentheses for a capture group? Playing around with this on regex101.com, it seems that [(\r\n|\r|\n)] simply matches any individual character listed inside it (as square brackets do in regex), including the opening and closing parentheses 😅

Removing the square brackets seems to achieve the desired effect 😄
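To illustrate the point about square brackets, here is a minimal sketch using Python's `re` module. The variable names and the sample string are made up for illustration; only the two patterns come from the discussion above.

```python
import re

# Pattern from the first commit: the square brackets form a character class,
# so '(', ')', '|', '\r' and '\n' are each accepted as single characters.
char_class = re.compile(r'[(\r\n|\r|\n)]{1,}[-]{1,}[(\r\n|\r|\n)]{1,}')

# Suggested pattern: the parentheses form a group of alternatives.
group = re.compile(r'(\r\n|\r|\n){1,}[-]{1,}(\r\n|\r|\n){1,}')

# Hypothetical sample with no newline at all around the dashes.
sample = "a(|--|)b"

class_match = char_class.search(sample)
group_match = group.search(sample)

print(class_match.group(0) if class_match else None)  # the class matches "(|--|)"
print(group_match)                                    # the group needs a real newline: None
```

The character class still matches real line endings, which is likely why the problem was easy to overlook; it just also accepts parentheses and pipes as "newlines".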

dominik-oktav · 2024-03-28

Oh, seems like I overlooked that. You are right, of course. However, the groups need to be declared non-capturing to not get more than 1 match. I added these changes accordingly.

aplaice · 2024-03-29

I haven't tested and I haven't used regexes for a while, but I believe that capturing vs. non-capturing only affects whether the contents of the group can be accessed later.

Edit: looking further, the groups do indeed need to be non-capturing, because otherwise re.split (which is what uses the regex in question) would also return the text matched by each group (the newlines) as extra items in the resulting list:

https://docs.python.org/3/library/re.html#re.split
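The `re.split` behaviour described here can be checked with a short sketch. The sample text and variable names are hypothetical, and the non-capturing pattern is an assumption about what the follow-up commit contains, since that line is not shown in this thread.

```python
import re

# Hypothetical template content with Windows line endings.
text = "front\r\n--\r\nback"

# Capturing groups: re.split also returns what each group matched.
capturing = r'(\r\n|\r|\n){1,}[-]{1,}(\r\n|\r|\n){1,}'

# Non-capturing groups (assumed form of the fixed pattern).
non_capturing = r'(?:\r\n|\r|\n){1,}[-]{1,}(?:\r\n|\r|\n){1,}'

with_groups = re.split(capturing, text)
without_groups = re.split(non_capturing, text)

print(with_groups)     # ['front', '\r\n', '\r\n', 'back']
print(without_groups)  # ['front', 'back']
```

Any code that expects exactly two pieces from the split would therefore see four with the capturing version.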

helitopia · 2024-08-19T16:47:41Z

Hi everyone,

Any updates on the PR?

Also, is there a reason the regex (?:\r?\n){1,}[-]{1,}(?:\r?\n){1,} is not used? (It is either \n for Linux or \r\n for Windows)

And in general it would be great to merge any solution that works as the issue may potentially block eager Windows-using contributors to Ultimate Geography repo (and any other using brain_brew). Not everyone is able to debug Python code and figure out the root cause of the issue (let alone to fix it locally).

aplaice · 2024-08-19T19:33:54Z

> Also, is there a reason the regex (?:\r?\n){1,}[-]{1,}(?:\r?\n){1,} is not used? (It is either \n for Linux or \r\n for Windows)

I suspect that the slightly more general regex was chosen because technically, there are/were systems where just \r is/was used. (Classic MacOS if I remember correctly.) This is probably more relevant in situations where one has to handle old text files, though; here, I expect we will never encounter such files since Anki itself is younger than the unix-based MacOS. OTOH it doesn't really hurt.
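The difference between the two candidate patterns only shows up with bare \r line endings. A minimal sketch, with made-up sample strings and names, and assuming the general pattern uses non-capturing groups as discussed earlier in the thread:

```python
import re

# Pattern proposed by helitopia: handles \n and \r\n only.
narrow = r'(?:\r?\n){1,}[-]{1,}(?:\r?\n){1,}'

# More general pattern: additionally handles a lone \r.
general = r'(?:\r\n|\r|\n){1,}[-]{1,}(?:\r\n|\r|\n){1,}'

# Hypothetical samples, one per line-ending convention.
samples = {
    "lf": "front\n--\nback",
    "crlf": "front\r\n--\r\nback",
    "cr": "front\r--\rback",
}

results = {
    name: (re.split(narrow, text), re.split(general, text))
    for name, text in samples.items()
}

for name, (narrow_parts, general_parts) in results.items():
    print(name, narrow_parts, general_parts)
```

Both patterns split the "lf" and "crlf" samples into two parts; only the general one splits the "cr" sample, which matches the reasoning that it costs nothing and covers one extra legacy case.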

aplaice

Testing on the AUG repo on Linux (I don't have easy access to Windows):

with normal \n newlines the patch doesn't break anything.
with manually added \r\n newlines the patch allows BB to run successfully (without the patch \r\n newlines cause BB to crash).

(IMO can/should be merged! :))
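The two test observations can be reproduced at the regex level with a small sketch. It only exercises `re.split` on hypothetical strings, not brain_brew itself, and the patched pattern is assumed to be the non-capturing form discussed above.

```python
import re

# Original pattern from master (left side of the diff).
old_regex = r'[\n]{1,}[-]{1,}[\n]{1,}'

# Assumed patched pattern with non-capturing groups.
patched_regex = r'(?:\r\n|\r|\n){1,}[-]{1,}(?:\r\n|\r|\n){1,}'

lf_text = "front\n--\nback"
crlf_text = "front\r\n--\r\nback"

old_lf = re.split(old_regex, lf_text)
old_crlf = re.split(old_regex, crlf_text)
new_lf = re.split(patched_regex, lf_text)
new_crlf = re.split(patched_regex, crlf_text)

print(old_lf)    # ['front', 'back']
print(old_crlf)  # ['front\r\n--\r\nback'] (separator not recognised)
print(new_lf)    # ['front', 'back']
print(new_crlf)  # ['front', 'back']
```

With the old pattern the \r\n input comes back as a single unsplit piece, which is consistent with the failure reported on Windows; how brain_brew reacts to that single piece is not shown here.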

ohare93 · 2024-08-20T04:35:23Z

Lovely, thanks all. I’ll give it a quick test on Windows tomorrow night, then make a release shortly thereafter.

fix: Allow different types of EOL in note model template regex

3f71d47

ohare93 reviewed Mar 28, 2024


Fix regex, declare groups non-capturing

1e55ebd

aplaice approved these changes Aug 19, 2024


ohare93 approved these changes Aug 24, 2024


ohare93 merged commit ab4c47b into ohare93:master Aug 24, 2024
