epsilon symbol in unicode-to-latex not supported #7387

Closed
salmma opened this issue Jan 26, 2021 · 17 comments · Fixed by #7419

Comments

@salmma

salmma commented Jan 26, 2021

JabRef version 5.1 on macOS 11.1

If 𝜖 appears in the title of a reference, it is not converted to \epsilon when selecting "cleanup" and "unicode-to-latex".

Steps to reproduce the behavior:

  1. Get a reference from the DOI fetcher
  2. cleanup
  3. unicode-to-latex

@k3KAW8Pnf7mkmdSMPHz27
Sponsor Member

Does anything happen? (I run into problems when I try to copy-paste the 𝜖 into JabRef, so I can't test it myself.)

@Siedlerchr
Member

Hm, is your epsilon somehow the wrong character, or is it written in italics?
Greek Small Letter Epsilon ε is converted to $\epsilon$.

@Siedlerchr Siedlerchr added the status: waiting-for-feedback The submitter or other users need to provide more information about the issue label Jan 29, 2021
@k3KAW8Pnf7mkmdSMPHz27
Sponsor Member

I believe it is the Mathematical Italic Epsilon Symbol 𝜖 (U+1D716). I get display glitches when I copy it into JabRef, and nonsense when I try to convert it to LaTeX.

@salmma
Author

salmma commented Jan 30, 2021

Yes, it is originally the $\epsilon$ symbol from LaTeX, see

https://www.overleaf.com/learn/latex/List_of_Greek_letters_and_math_symbols

@Siedlerchr
Member

@k3KAW8Pnf7mkmdSMPHz27 Yes, macOS has a problem displaying italic characters; I see the same issue.
@salmma You said you imported it via DOI; can you paste the DOI here?

@salmma
Author

salmma commented Jan 30, 2021

@Siedlerchr Siedlerchr removed the status: waiting-for-feedback The submitter or other users need to provide more information about the issue label Jan 30, 2021
@tmrd993
Contributor

tmrd993 commented Feb 1, 2021

For me, it converts

About a criterion of successfully executing a circuit in the NISQ era: what wd ≪ 1/𝜖 eff really means

to:

About a criterion of successfully executing a circuit in the NISQ era: what wd $\ll$ 1/휖� eff really means

Is this the behaviour you are seeing? I think this is caused by the following code in the UnicodeToLatexFormatter:

int cpCurrent = result.codePointAt(i);
Integer cpNext = result.codePointAt(i + 1);
String code = HTMLUnicodeConversionMaps.ESCAPED_ACCENTS.get(cpNext);
if (code == null) {
    // the cast keeps only the low 16 bits of the code point
    sb.append((char) cpCurrent);
} else {

As you can see, we are explicitly casting the code point to a char. Italic epsilon, however, has a code point of (decimal) 120598 (U+1D716), which can't be represented by a Java char: a char is a 16-bit UTF-16 code unit, so the cast keeps only the low 16 bits and turns U+1D716 into U+D716, a Hangul syllable. Another problem is that italic epsilon is not a BMP code point. It is represented by a surrogate pair, so its length in UTF-16 code units is 2, which is why it is converted into two separate characters above (the second one is the lone low surrogate, shown as �).
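To make both effects concrete, here is a small stand-alone sketch. `castToChar` is a hypothetical, simplified stand-in for the old loop (the accent lookup is left out); it only reproduces the `(char)` cast applied at every UTF-16 index, so it is an illustration of the bug, not the actual formatter code.

```java
public class SurrogateDemo {

    // Simplified stand-in for the old loop: casts the code point at every
    // UTF-16 index to a char (accent handling omitted)
    static String castToChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) s.codePointAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eps = "\uD835\uDF16"; // U+1D716 MATHEMATICAL ITALIC EPSILON SYMBOL
        int cp = eps.codePointAt(0);

        System.out.println(cp);                                  // 120598
        System.out.println(eps.length());                        // 2 UTF-16 code units
        System.out.println(eps.codePointCount(0, eps.length())); // 1 code point
        System.out.println(Character.isBmpCodePoint(cp));        // false

        // Prints d716 (low 16 bits of 0x1D716), then df16 (the lone low surrogate)
        castToChar(eps).chars()
                .forEach(c -> System.out.println(Integer.toHexString(c)));
    }
}
```

The two printed hex values are exactly the two garbled characters in the converted title above.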

This fixes the garbled output:

int cpCurrent = result.codePointAt(i);
Integer cpNext = result.codePointAt(i + 1);
String code = HTMLUnicodeConversionMaps.ESCAPED_ACCENTS.get(cpNext);
if (code == null) {
    // skip the low surrogate so it is not read as a separate char
    if (!Character.isBmpCodePoint(cpCurrent)) {
        i++;
    }
    sb.append(new String(Character.toChars(cpCurrent)));
} else {

This doesn't fix the actual issue, though, since italic epsilon can't be converted to LaTeX without the amsmath package. I am not too sure about that, since I am not familiar with LaTeX.

@k3KAW8Pnf7mkmdSMPHz27
Sponsor Member

> Is this the behaviour you are seeing? I think this is caused by the UnicodeToLatexFormatter.

Yup, that is what I am getting ^^
I am not completely sure what the original intent of the cast to char is. Does replacing the casts with StringBuilder's appendCodePoint also work? It would be good to get that fixed as well.

> ... since italic epsilon can't be converted to LaTeX without the amsmath package. I am not too sure about that, since I am not familiar with LaTeX.

I don't know what the correct conversion is. Other conversions already rely on AMS font symbols, so that in itself should not be an issue; see

{"8450", "complexes", "$\\mathbb{C}$"}, // double struck capital C -- requires e.g. amsfonts

@tmrd993
Contributor

tmrd993 commented Feb 1, 2021

> I am not completely sure what the original intent of the cast to char is. Does replacing them with StringBuilder's appendCodePoint also work?

Yep! That works. Good catch.
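A minimal sketch of the `appendCodePoint` variant, assuming a simplified loop rather than the actual JabRef source: `ESCAPED_ACCENTS` here is a hypothetical one-entry stand-in for `HTMLUnicodeConversionMaps.ESCAPED_ACCENTS`, and the `{\'e}` output format for accents is an assumption. Advancing by `Character.charCount(cpCurrent)` also makes the accent lookup inspect the next code point instead of the low surrogate of the current one.

```java
import java.util.Map;

public class AppendCodePointSketch {

    // Hypothetical one-entry stand-in for HTMLUnicodeConversionMaps.ESCAPED_ACCENTS:
    // U+0301 COMBINING ACUTE ACCENT -> LaTeX accent command \'
    static final Map<Integer, String> ESCAPED_ACCENTS = Map.of(0x0301, "'");

    static String convert(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cpCurrent = s.codePointAt(i);
            // Index of the next code point: +1 for BMP characters, +2 for surrogate pairs
            int next = i + Character.charCount(cpCurrent);
            String code = next < s.length() ? ESCAPED_ACCENTS.get(s.codePointAt(next)) : null;
            if (code == null) {
                sb.appendCodePoint(cpCurrent); // writes both surrogates, no (char) cast
                i = next;
            } else {
                // Assumed output format for accents, e.g. "e" + U+0301 -> {\'e}
                sb.append("{\\").append(code).appendCodePoint(cpCurrent).append('}');
                i = next + Character.charCount(s.codePointAt(next));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("wd \u226A 1/\uD835\uDF16 eff")); // epsilon survives intact
        System.out.println(convert("e\u0301"));                       // {\'e}
    }
}
```

Compared with the `isBmpCodePoint` + `i++` version, this keeps the index bookkeeping in one place; both produce the same output for characters that are not followed by an accent.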

> I don't know what the correct conversion is. Other conversions already rely on AMS font symbols, so that in itself should not be an issue; see
>
> {"8450", "complexes", "$\\mathbb{C}$"}, // double struck capital C -- requires e.g. amsfonts

Alright, I guess it's ok to just add another entry for the epsilon (U+1D716) in that case?

@k3KAW8Pnf7mkmdSMPHz27
Sponsor Member

k3KAW8Pnf7mkmdSMPHz27 commented Feb 1, 2021

> Alright, I guess it's ok to just add another entry for the epsilon (U+1D716) in that case?

I am having trouble reproducing the correct character; what do you convert it to in LaTeX?

@tmrd993
Contributor

tmrd993 commented Feb 1, 2021

> Alright, I guess it's ok to just add another entry for the epsilon (U+1D716) in that case?
>
> I am having trouble reproducing the correct character; what do you convert it to in LaTeX?

$\mathit{\epsilon}$

EDIT: This works, but the dependency used for converting LaTeX to Unicode (https://github.com/tomtung/latex2unicode) can't handle that expression for some reason. It converts it to the "SMALL ELEMENT OF" character (U+220A): https://www.fileformat.info/info/unicode/char/220a/index.htm.

@k3KAW8Pnf7mkmdSMPHz27
Sponsor Member

k3KAW8Pnf7mkmdSMPHz27 commented Feb 1, 2021

I am not sure what the right answer is here.
I'd argue that it is better for it not to remain a Unicode character, even if the conversion doesn't strictly produce the right character in all circumstances. Otherwise, I'd guess it can cause problems wherever Unicode characters are not supported.
But converting it will also prevent the user from reverting to Unicode at a later point. Any idea, @Siedlerchr?

> $\mathit{\epsilon}$

Does AMS math convert it to U+1D716 and not U+1D700?

@tmrd993
Contributor

tmrd993 commented Feb 1, 2021

> $\mathit{\epsilon}$
>
> Does AMS math convert it to U+1D716 and not U+1D700?

Yes, U+1D716 is the correct character.
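For reference, the epsilons discussed in this thread can be told apart by code point and Unicode name. This sketch uses `Character.getName`, which reads the Unicode database bundled with the JDK; the class and method names are just for illustration.

```java
public class EpsilonNames {

    // Formats a code point as "U+XXXX NAME" using the JDK's Unicode database
    static String describe(int cp) {
        return String.format("U+%04X %s", cp, Character.getName(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x03B5));  // U+03B5 GREEK SMALL LETTER EPSILON
        System.out.println(describe(0x1D700)); // U+1D700 MATHEMATICAL ITALIC SMALL EPSILON
        System.out.println(describe(0x1D716)); // U+1D716 MATHEMATICAL ITALIC EPSILON SYMBOL
    }
}
```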

@koppor
Member

koppor commented Feb 1, 2021

Refs #3644 and #6155

@k3KAW8Pnf7mkmdSMPHz27
Sponsor Member

k3KAW8Pnf7mkmdSMPHz27 commented Feb 2, 2021

> Yes, U+1D716 is the correct character.

I am sorry, but could you give me some more information on how you produce this? I am having trouble reproducing it outside of an ACM template, where $\epsilon$ is sufficient. Perhaps it is related to my using Overleaf, but on both Mac OS X and Windows 10, amsmath + $\mathit{\epsilon}$ outside of the previously mentioned template doesn't give me the correct result.
What packages are you including, and is $\epsilon$ sufficient?
Perhaps it would be a good idea to go ahead with this anyway. The only alternative I can think of is to display a warning/fix dialog to the user when an error of this type occurs. And the fact that I am having trouble producing this particular symbol might not matter.

@tmrd993
Contributor

tmrd993 commented Feb 2, 2021

> Yes, U+1D716 is the correct character.
>
> I am sorry, but could you give me some more information on how you produce this? I am having trouble reproducing it outside of an ACM template, where $\epsilon$ is sufficient. Perhaps it is related to my using Overleaf, but on both Mac OS X and Windows 10, amsmath + $\mathit{\epsilon}$ outside of the previously mentioned template doesn't give me the correct result.
> What packages are you including, and is $\epsilon$ sufficient?
> Perhaps it would be a good idea to go ahead with this anyway. The only alternative I can think of is to display a warning/fix dialog to the user when an error of this type occurs. And the fact that I am having trouble producing this particular symbol might not matter.

I am also using Overleaf on Windows 10. I just include amsmath and use $\mathit{\epsilon}$, but $\epsilon$ produces the same result for some reason.

@k3KAW8Pnf7mkmdSMPHz27
Sponsor Member

> but $\epsilon$ produces the same result for some reason.

I'd say $\epsilon$ seems to be the way to go in that case.
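If `$\epsilon$` is the target, a new row in the conversion table could look roughly like this sketch. It assumes the `{decimal code point, HTML entity name, LaTeX}` row layout of the `complexes` entry quoted earlier; the empty entity name and the names `CONVERSION_LIST`, `UNICODE_LATEX_MAP` and `toLatex` are hypothetical, not taken from JabRef.

```java
import java.util.HashMap;
import java.util.Map;

public class EpsilonEntrySketch {

    // Rows in the {decimal code point, HTML entity name, LaTeX} layout.
    // The second row is a hypothetical addition; the entity name is left empty.
    static final String[][] CONVERSION_LIST = {
        {"8450", "complexes", "$\\mathbb{C}$"}, // double struck capital C -- requires e.g. amsfonts
        {"120598", "", "$\\epsilon$"}           // mathematical italic epsilon symbol (U+1D716)
    };

    static final Map<Integer, String> UNICODE_LATEX_MAP = new HashMap<>();

    static {
        for (String[] row : CONVERSION_LIST) {
            UNICODE_LATEX_MAP.put(Integer.parseInt(row[0]), row[2]);
        }
    }

    // Replaces every mapped code point; everything else is copied as-is
    static String toLatex(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            String latex = UNICODE_LATEX_MAP.get(cp);
            if (latex != null) {
                sb.append(latex);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLatex("wd 1/\uD835\uDF16 eff")); // wd 1/$\epsilon$ eff
    }
}
```

Iterating with `codePoints()` sidesteps the surrogate-pair problem entirely, since each supplementary character arrives as one `int`.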
