Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid conversion of single underscores #2711

Merged
merged 8 commits into from
Apr 6, 2017
Merged

Avoid conversion of single underscores #2711

merged 8 commits into from
Apr 6, 2017

Conversation

lenhard
Copy link
Member

@lenhard lenhard commented Apr 5, 2017

Fixes #2664

Single underscores that are not followed by { are not converted to unicode during the LaTeX to unicode conversion. This is not actually correct according to the rules of LaTeX, but seems to be an important exception required by our users.

This is achieved by the following:

  1. Replace all single underscores with JABREFUNDERSCORE
  2. Do the unicode conversion
  3. Replace JABREFUNDERSCORE with an underscore

TODO: We should find a better magic word instead of JABREFUNDERSCORE, because every character will be processed by latex2unicode and that costs performance. However, it should still be long and unique enough so that we avoid accidental conversions. Suggestions anyone?

  • Change in CHANGELOG.md described
  • Tests created for changes
  • Manually tested changed features in running JabRef

@lenhard lenhard added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Apr 5, 2017
@Siedlerchr
Copy link
Member

What about the Unicode Replacement Char?
http://www.fileformat.info/info/unicode/char/fffd/index.htm

CHANGELOG.md Outdated
@@ -64,6 +64,7 @@ We refer to [GitHub issues](https://github.com/JabRef/jabref/issues) by using `#
- Entries with a single corporate author are now correclty exported to the corresponding `corporate` author field in MS-Office XML. [#1497](https://github.com/JabRef/jabref/issues/1497)
- Improved author handling in MS-Office Import/Export
- The `day` part of the biblatex `date` field is now exported to the corresponding `day` field in MS-Office XML. [#2691](https://github.com/JabRef/jabref/issues/2691)
- Single underscores are not converted to during the LaTeX to unicode conversion, which does not follow the rules of LaTeX, but is what users requre. [#2664](https://github.com/JabRef/jabref/issues/2664)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: to Latex during the Latex to Unicode cconversion ;)

Copy link
Member Author

@lenhard lenhard Apr 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small issue in the changelog, otherwise LGTM

@Test
public void testCustomUnderscoreConversion() {
// our custom version which should preserve the _
assertEquals("Lorem ipsum_lorem ipsum", formatter.format("Lorem ipsum_lorem ipsum"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please split the test into two tests and convert your comments into nice test names 😸

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course I can ;-)

@lenhard
Copy link
Member Author

lenhard commented Apr 5, 2017

Excellent idea @Siedlerchr I have implemented this using the replacement char (and it still works).

lenhard added 2 commits April 5, 2017 15:48
Essentially one level deeper in the hierarchy to make sure that BibEntry itself leverages the functionality
@lenhard
Copy link
Member Author

lenhard commented Apr 6, 2017

As discussed in the devcall: This can be merged aber fixing checkstyle

@lenhard lenhard merged commit 0187e9a into master Apr 6, 2017
@lenhard lenhard deleted the unicode-underscore branch April 6, 2017 13:48
Siedlerchr added a commit that referenced this pull request Apr 6, 2017
* upstream/master:
  Implement #1359: collect telemetry (#2283)
  Write groups under tag `group` instead of `groupstree` to enhance compatibility with previous versions (#2704)
  Avoid conversion of single underscores (#2711)
  Fix #628: implement hierarchical keywords (#2703)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants