Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to Latex2unicode #2532

Merged
merged 13 commits into from
Feb 10, 2017
Merged

Switch to Latex2unicode #2532

merged 13 commits into from
Feb 10, 2017

Conversation

lenhard
Copy link
Member

@lenhard lenhard commented Feb 9, 2017

Motivation is discussed in #2465. Fixes many conversion problems, such as those discussed in #2063 and #207.

  • Change in CHANGELOG.md described
  • Tests created for changes
  • Manually tested changed features in running JabRef

@lenhard lenhard added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Feb 9, 2017
# Conflicts:
#	CHANGELOG.md
@stefan-kolb
Copy link
Member

Please remove the old latex2unicode formatter code, otherwise good to go.

Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, LGTM....just some minor remarks.

*/
public class LatexToUnicodeAdapter {

public static String format(String inField) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would merge this class into the LatexToUnicodeFormatter. I don't see any advantage of having two classes that do essentially the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not possible, since we need the unicode conversion in the model to internally convert fields of BibEntries. If we just had the formatter, we would have a dependency from model to logic, which is forbidden in the architecture.

CHANGELOG.md Outdated
@@ -22,6 +22,7 @@ We refer to [GitHub issues](https://github.com/JabRef/jabref/issues) by using `#
- Redesigned journal abbreviations dialog.
- Redesigned error console.
- All file dialogs now use the native file selector of the OS. [#1711](https://github.com/JabRef/jabref/issues/1711)
- Switch to the latex2unicode library https://github.com/tomtung/latex2unicode for converting LaTeX to unicode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use markdown links

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -28,9 +29,10 @@ public void testFormatStripLatexCommands() {
@Test
public void testFormatTextit() {
// See #1464
assertEquals("text", formatter.format("\\textit{text}"));
assertEquals("\uD835\uDC61\uD835\uDC52\uD835\uDC65\uD835\uDC61", formatter.format("\\textit{text}"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d835 is the symbol for an invalid character. Is this conversation a bug or indeed the expected outcome?

Copy link
Member Author

@lenhard lenhard Feb 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be used as an escaping symbol for the following hex. I arrived at this sequence by copying the italicized text in Intellij into the code which transformed it into these hexes.

If you remove the first hex, then the first letter of the expected output turns into a ? for Intellij, so this really seems to be the escape sequence.

I have to admit, though, that my knowledge on the correct hexes of italicized text in unicode is slim.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

italics

Okay... I really kinda works - Although is looks strange ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect anyone to ever use italicized text in their entries - after all, why would you? But isn't it cool to know that with the converter we could theoretically support it? :-)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lenhard
Copy link
Member Author

lenhard commented Feb 10, 2017

Since the build is passing and there are two approved reviews, I will merge this into master now and close the related issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants