Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Allow different types of EOL in note model template regex #53

Merged
merged 2 commits into from
Aug 24, 2024

Conversation

dominik-oktav
Copy link
Contributor

@dominik-oktav dominik-oktav commented Mar 24, 2024

As reported in #51, the current regex doesn't work on Windows. I adapted the regex to allow \r\n, \n and \r.

Copy link
Owner

@ohare93 ohare93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the efforts mate. Just one comment 👍

@@ -23,7 +23,7 @@
HTML_FILE = AnkiField("html_file")
BROWSER_HTML_FILE = AnkiField("browser_html_file", default_value=None)

html_separator_regex = r'[\n]{1,}[-]{1,}[\n]{1,}'
html_separator_regex = r'[(\r\n|\r|\n)]{1,}[-]{1,}[(\r\n|\r|\n)]{1,}'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
html_separator_regex = r'[(\r\n|\r|\n)]{1,}[-]{1,}[(\r\n|\r|\n)]{1,}'
html_separator_regex = r'(\r\n|\r|\n){1,}[-]{1,}(\r\n|\r|\n){1,}'

Shouldn't we remove the square brackets here, replacing the with braces for a capture group? Playing around with this on regex101.com it seems that [(\r\n|\r|\n)] is simple matching any individual character (as square brackets do in regex) including opening and closing braces 😅

image

Removing the square brackets seems to achieve the desired effect 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, seems like I overlooked that. You are right of course. However, the groups need to be declared non-capturing to not get more than 1 match. I added these changes accordingly.

Copy link
Contributor

@aplaice aplaice Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested and I haven't used regexes for a while, but I believe that capturing vs. non-capturing only affects whether the contents of the group can be accessed later.

Edit: looking further, the groups do indeed need to be non-capturing, because otherwise re.split (which is what uses the regex in question) would return the contents of the newlines as part of the list of lines of text:

https://docs.python.org/3/library/re.html#re.split

@helitopia
Copy link

helitopia commented Aug 19, 2024

Hi everyone,

Any updates on the PR?

Also, is there a reason the regex (?:\r?\n){1,}[-]{1,}(?:\r?\n){1,} is not used? (It is either \n for Linux or \r\n for Windows)

And in general it would be great to merge any solution that works as the issue may potentially block eager Windows-using contributors to Ultimate Geography repo (and any other using brain_brew). Not everyone is able to debug Python code and figure out the root cause of the issue (let alone to fix it locally).

@aplaice
Copy link
Contributor

aplaice commented Aug 19, 2024

Also, is there a reason the regex (?:\r?\n){1,}[-]{1,}(?:\r?\n){1,} is not used? (It is either \n for Linux or \r\n for Windows)

I suspect that the slightly more general regex was chosen because technically, there are/were systems where just \r is/was used. (Classic MacOS if I remember correctly.) This is probably more relevant in situations where one has to handle old text files, though; here, I expect we will never encounter such files since Anki itself is younger than the unix-based MacOS. OTOH it doesn't really hurt.

Copy link
Contributor

@aplaice aplaice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing on the AUG repo on Linux (I don't have easy access to Windows):

  1. with normal \n newlines the patch doesn't break anything.
  2. with manually added \r\n newlines the patch allows BB to run successfully (without the patch \r\n newlines cause BB to crash).

(IMO can/should be merged! :))

@ohare93
Copy link
Owner

ohare93 commented Aug 20, 2024

Lovely, thanks all. I’ll give it a quick test on Windows tomorrow night, then make a release shortly thereafter.

@ohare93 ohare93 merged commit ab4c47b into ohare93:master Aug 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants