-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to Latex2unicode #2532
Switch to Latex2unicode #2532
Conversation
# Conflicts: # CHANGELOG.md
Please remove the old latex2unicode formatter code, otherwise good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, LGTM....just some minor remarks.
*/ | ||
public class LatexToUnicodeAdapter { | ||
|
||
public static String format(String inField) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would merge this class into the LatexToUnicodeFormatter
. I don't see any advantage of having two classes that do essentially the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not possible, since we need the unicode conversion in the model to internally convert fields of BibEntries
. If we just had the formatter, we would have a dependency from model to logic, which is forbidden in the architecture.
CHANGELOG.md
Outdated
@@ -22,6 +22,7 @@ We refer to [GitHub issues](https://github.com/JabRef/jabref/issues) by using `# | |||
- Redesigned journal abbreviations dialog. | |||
- Redesigned error console. | |||
- All file dialogs now use the native file selector of the OS. [#1711](https://github.com/JabRef/jabref/issues/1711) | |||
- Switch to the latex2unicode library https://github.com/tomtung/latex2unicode for converting LaTeX to unicode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use markdown links
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -28,9 +29,10 @@ public void testFormatStripLatexCommands() { | |||
@Test | |||
public void testFormatTextit() { | |||
// See #1464 | |||
assertEquals("text", formatter.format("\\textit{text}")); | |||
assertEquals("\uD835\uDC61\uD835\uDC52\uD835\uDC65\uD835\uDC61", formatter.format("\\textit{text}")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d835 is the symbol for an invalid character. Is this conversation a bug or indeed the expected outcome?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be used as an escaping symbol for the following hex. I arrived at this sequence by copying the italicized text in Intellij into the code which transformed it into these hexes.
If you remove the first hex, then the first letter of the expected output turns into a ?
for Intellij, so this really seems to be the escape sequence.
I have to admit, though, that my knowledge on the correct hexes of italicized text in unicode is slim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect anyone to ever use italicized text in their entries - after all, why would you? But isn't it cool to know that with the converter we could theoretically support it? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more details here https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates :)
Since the build is passing and there are two approved reviews, I will merge this into master now and close the related issues. |
* upstream/master: Switch to Latex2unicode (#2532)
Motivation is discussed in #2465. Fixes many conversion problems, such as those discussed in #2063 and #207.