Add test cases for "UnicodeToLatex" and "LatexToUnicode" #11061

Merged
merged 3 commits into from
Mar 21, 2024

Conversation

AbdAlRahmanGad
Contributor

I added the test case mentioned in #5547 and other test cases to ensure reliability.

The problem with the test case was that ı̄ is not a single character; it is a combination of ı + ̄ (a combining macron), unlike ā. What was happening is that ı was converted to \i and then ̄ was converted to {\={}}, so the result was {\i{\={}}}.
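To make the difference concrete, here is a minimal sketch that lists the code points of both strings. The class and method names are made up for illustration and are not part of JabRef.

```java
import java.util.stream.Collectors;

public class CombiningMarkDemo {

    // Number of Unicode code points in the string (not UTF-16 chars).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Space-separated list of code points, e.g. "U+0131 U+0304".
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String dotlessIWithMacron = "\u0131\u0304"; // dotless i followed by combining macron
        String aWithMacron = "\u0101";              // precomposed a with macron

        System.out.println(describe(dotlessIWithMacron) + " -> " + codePoints(dotlessIWithMacron) + " code points");
        System.out.println(describe(aWithMacron) + " -> " + codePoints(aWithMacron) + " code point");
    }
}
```

A converter that walks the input one code point at a time sees the dotless i and the macron separately, which is how the two independent replacements described above can arise.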

I made it so that characters that cause such conflicts are dealt with after combining accents. They are stored in a new variable called UNICODE_LATEX_CONVERSION_MAP_AFTER_COMBINING_ACCENTS. I think this will be the base for dealing with such cases.

I also added some of the underdot characters as the previous implementation didn't handle them.

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@calixtus
Member

Maybe this could help: https://www.unicode.org/faq/char_combmark.html

@koppor
Member

koppor commented Mar 20, 2024

I think all Unicode should be normalized before conversion. Use the formatter introduced in #11056.

@koppor
Member

koppor commented Mar 20, 2024

> I made it that we deal with such characters that cause conflict after combining accents.

My comment above, in other words: we need to rely on the normal form NFC and base our internal maps on that. Do not introduce any other maps.
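As a rough illustration of this suggestion, the sketch below normalizes the input to NFC with `java.text.Normalizer` and then looks it up in a single map keyed on NFC forms. The map contents, the LaTeX output strings, and all class and method names are hypothetical; this is not JabRef's conversion table or the formatter from #11056.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

public class NfcBeforeConversion {

    // Hypothetical, tiny mapping keyed on NFC forms; not JabRef's actual table.
    // Longer keys are inserted first so that a base letter plus combining mark
    // is replaced before the base letter alone.
    private static final Map<String, String> NFC_TO_LATEX = new LinkedHashMap<>();

    static {
        NFC_TO_LATEX.put("\u0131\u0304", "{\\={\\i}}"); // dotless i + combining macron
        NFC_TO_LATEX.put("\u0101", "{\\=a}");           // precomposed a with macron
        NFC_TO_LATEX.put("\u0131", "{\\i}");            // dotless i alone
    }

    // Canonical composition: combining sequences become precomposed
    // characters wherever Unicode defines one.
    static String toNfc(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFC);
    }

    // Normalize first, then apply the replacements in insertion order.
    // The replacement values are pure ASCII, so they are never matched again.
    static String convert(String input) {
        String result = toNfc(input);
        for (Map.Entry<String, String> entry : NFC_TO_LATEX.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // "a" + combining macron composes to the single precomposed character.
        System.out.println(convert("a\u0304"));
        // Dotless i + combining macron has no precomposed form, so the
        // two-code-point key in the map is what matches here.
        System.out.println(convert("\u0131\u0304"));
    }
}
```

One thing the sketch shows: NFC composes a + ̄ into ā, but Unicode has no precomposed dotless i with macron, so that sequence stays as two code points after normalization. A map based on NFC would therefore presumably still need an entry keyed on the two-code-point sequence.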

Normalize unicode before conversion,
remove the new mapping,
add one new test case
Member

@koppor koppor left a comment


Thank you for the quick action taken. Just minor comments.


Future work: Is the part at "// Combining accents" ever reached? One should check with a code coverage run.

- Removed unnecessary line
- Renamed the `normalizer` variable to `UNICODE_NORMALIZER`
- Added link to the issue
@koppor koppor added this pull request to the merge queue Mar 21, 2024
Merged via the queue into JabRef:main with commit 7bb9339 Mar 21, 2024
20 checks passed
@AbdAlRahmanGad AbdAlRahmanGad deleted the test_case branch April 13, 2024 07:39