Add test cases for "UnicodeToLatex" and "LatexToUnicode" #11061
Maybe this could help: https://www.unicode.org/faq/char_combmark.html
I think all Unicode input should be normalized before conversion. Use the formatter introduced in #11056.
src/test/java/org/jabref/logic/layout/format/LatexToUnicodeFormatterTest.java
My comment above, in other words: we need to rely on the normal form NFC and base our internal maps on that. Do not introduce other maps.
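To make the NFC suggestion concrete, here is a minimal sketch using the JDK's `java.text.Normalizer`. The class `NfcDemo` and the helper `toNfc` are hypothetical names for illustration only; this is not the formatter from #11056.

```java
import java.text.Normalizer;

public class NfcDemo {
    // Hypothetical helper (not JabRef code): bring input into normal form NFC
    // so that base letter + combining mark collapses to the precomposed
    // character whenever Unicode defines one.
    static String toNfc(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "a\u0304"; // 'a' followed by COMBINING MACRON
        String composed = toNfc(decomposed);

        // After NFC the two code units become the single character U+0101 (ā),
        // so a map keyed on precomposed characters can match it directly.
        System.out.println(composed.length());          // 1
        System.out.println(composed.equals("\u0101")); // true
    }
}
```

With input normalized this way, the conversion maps only need entries for the NFC form of each character.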
Normalize unicode before conversion, remove the new mapping, add one new test case
Thank you for the quick action taken. Just minor comments.
Future work: Will the part at "// Combining accents" ever be reached? One should check this with code coverage.
src/main/java/org/jabref/logic/formatter/bibtexfields/UnicodeToLatexFormatter.java
src/main/java/org/jabref/logic/util/strings/HTMLUnicodeConversionMaps.java
src/test/java/org/jabref/logic/layout/format/LatexToUnicodeFormatterTest.java
- Removed unnecessary line
- Renamed the `normalizer` variable to `UNICODE_NORMALIZER`
- Added link to the issue
I added the test case mentioned in #5547 and other test cases to ensure reliability.
The problem with the test case was that `ı̄` is not one character; it is a combination of `ı` + `̄`, unlike `ā`. What was happening is that `ı` was being converted to `\i` and then `̄` was being converted to `{\={}}`, so the result would be `{\i{\={}}}`.
I changed it so that characters that cause such conflicts are handled after combining accents. They are stored in a new variable called `UNICODE_LATEX_CONVERSION_MAP_AFTER_COMBINING_ACCENTS`. I think this will be the base for dealing with such cases.
I also added some of the underdot characters, as the previous implementation didn't handle them.
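The dotless-i case described above can be reproduced with a small sketch. It shows that NFC leaves `ı` + combining macron as two code points (Unicode has no precomposed form for that pair), so the sequence has to be matched as a unit. The class name, the `SEQUENCES` map, and the LaTeX output `{\={\i}}` are assumptions for illustration; they are not the contents of JabRef's conversion maps.

```java
import java.text.Normalizer;
import java.util.Map;

public class CombiningSequenceDemo {
    // Hypothetical map (not JabRef's): whole base+accent sequences that
    // have no precomposed Unicode character, mapped to an assumed LaTeX form.
    static final Map<String, String> SEQUENCES =
            Map.of("\u0131\u0304", "{\\={\\i}}"); // dotless i + combining macron

    // Normalize first, then replace each listed sequence as a unit so the
    // base letter and its accent are never converted independently.
    static String convertSequences(String input) {
        String result = Normalizer.normalize(input, Normalizer.Form.NFC);
        for (Map.Entry<String, String> entry : SEQUENCES.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String dotlessIMacron = "\u0131\u0304";

        // NFC cannot compose this pair, so it is still two code units.
        System.out.println(
                Normalizer.normalize(dotlessIMacron, Normalizer.Form.NFC).length()); // 2

        System.out.println(convertSequences(dotlessIMacron)); // {\={\i}}
    }
}
```

Converting the pair as one unit avoids the per-character result `{\i{\={}}}` reported above.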
Mandatory checks
- `CHANGELOG.md` described in a way that is understandable for the average user (if applicable)