"\L\k{e}ski" is displayed after 'Z' #2460

Closed
pirlite2 opened this issue Jan 12, 2017 · 15 comments

@pirlite2

JabRef 3.8.2 + Linux Mint 17.3

The author name "\L\k{e}ski" is sorted out of order. It appears after the last entry beginning with 'Z'. I believe this is incorrect behaviour as "\L" is listed in the Unicode charts as an "L with stroke".

Peter

@lenhard
Member

lenhard commented Jan 13, 2017

Hi Peter,

thanks for the report. I think this is more of a feature request than a bug report, since JabRef really is not able to do this so far. JabRef relies on Java string sorting internally, and that does not take accented characters into account.

What you are asking for is sorting that depends on the locale and takes non-English characters into account. This is possible to implement, but it is really a new feature, or rather an enhancement of the current sorting feature.

Some context: http://www.javapractices.com/topic/TopicAction.do?Id=207
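
To make the point about plain Java string sorting concrete, here is a minimal sketch. It assumes the sort falls back to `String.compareTo`, which compares UTF-16 code units; whether JabRef's table does exactly this is not confirmed here. The class name and the sample surnames other than the one from the report are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of JabRef.
final class CodePointOrderDemo {

    // Sorts a copy of the list with String's natural order (String.compareTo).
    static List<String> sortNaturally(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // "\u0141\u0119ski" is the surname from the report:
        // L with stroke (U+0141), e with ogonek (U+0119), then "ski".
        List<String> names = Arrays.asList("Zimmer", "\u0141\u0119ski", "Lenhard", "Mann");

        // U+0141 is numerically larger than 'Z' (U+005A),
        // so the name ends up after every entry starting with A-Z.
        System.out.println(sortNaturally(names));
    }
}
```

Under that assumption the name lands last, after "Zimmer", which matches the behaviour described in the report.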

@pirlite2
Author

Ummm... Well, names beginning with '\v{S}' sort correctly, for example, and that is an obscure, Eastern European, diacritically decorated letter. So do many others. The only deviation from correct sorting I can identify is '\L'. So if it all works fine apart from one unusual case, my simple-minded view is that that is a bug rather than a feature. :-)

(I am curious why the Java string sorting routine correctly strips the diacritic stuff from everything other than '\L'. JabRef must ignore/remove leading escaped characters somehow before sorting. But whereas most diacritics are specified in LaTeX as '\*', 'L with stroke' seems to be a unique pattern. Hence my reasoning of bug - I suspect it is not stripping the leading '\' correctly in the case of '\L' but is doing everything else right.)
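
One possible explanation, offered only as an assumption since JabRef's actual code path is not shown in this thread: if diacritics are removed via Unicode canonical decomposition, 'Š' splits into 'S' plus a combining caron, while 'Ł' has no canonical decomposition and therefore survives unchanged. The sketch below illustrates that difference with `java.text.Normalizer`; the class and method names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of JabRef.
final class DiacriticStripDemo {

    // Decomposes to NFD, then drops all combining marks (Unicode category M).
    static String stripCombiningMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // S with caron (U+0160) decomposes to 'S' + U+030C, so the mark can be stripped.
        System.out.println(stripCombiningMarks("\u0160imek"));

        // L with stroke (U+0141) has no decomposition and stays as it is;
        // only the ogonek on the e (U+0119) is removed.
        System.out.println(stripCombiningMarks("\u0141\u0119ski"));
    }
}
```

If something along these lines is happening, '\v{S}' would sort as a plain 'S' while '\L' would keep its code point above 'Z'.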

And I don't think I am asking for anything locale related. Unless you are proposing I only cite work published by people with proper English names! Unfortunately, large numbers of 'foreign' people make important contributions to my field.

@lenhard
Member

lenhard commented Jan 13, 2017

Very interesting, I guess we will have to look into this more deeply. My prior comment was just based on my knowledge of the code and I did not actually look at it, which is why I might be wrong in my assumptions.

Regarding the Locale: I was not suggesting anything of the kind. I have a "foreign" first name and I really hope that this does not prevent people from citing my work ;-) My Locale comment was purely technical and related to the implementation in Java.

@pirlite2
Author

Just to add to this, the entry preview for my Polish friend renders as:

Article (Lkeski2003)
Łęski, J.

which I think I would describe as a dog's breakfast! In the BibTeX key, the escape character has been correctly stripped from the 'L', but the '\k' that produces the ogonek on the 'e' has been mangled.

Curiously, in the process of posting this, I have discovered that the literal text from the entry preview renders here as:

Article (Lkeski2003)
Łęski, J.

So we have some strange hybrid of Markdown(-compatible) rendering in JabRef :-)

@lenhard
Member

lenhard commented Feb 10, 2017

We have now replaced our conversion code in #2532 with an external library: latex2unicode.

I have tested the sequence described above in the UI and it now renders as it should. Hence, I am closing this issue. Feel free to reopen it if the problem reappears.

@lenhard lenhard closed this as completed Feb 10, 2017
@lenhard
Member

lenhard commented Feb 10, 2017

And revisiting this issue, I just realized that this is not only about rendering (which works now, also in the preview), but also about the sorting.

Sorting is done by the glazedlists library, so this is really outside of JabRef at the moment. I guess it would be possible to reimplement the sorting functionality to place selected diacritics in between the "normal" letters of the alphabet, so in case anybody volunteers for contributing a PR, we would be happy to integrate it.

@lenhard lenhard reopened this Feb 10, 2017
@lenhard
Member

lenhard commented Feb 10, 2017

The rendering is tested now: 9eef09c

@tobiasdiez
Member

tobiasdiez commented Jan 30, 2018

I think it is not possible to have a meaningful sorting across languages. There are always cases where the order of words differs between languages. For example, the order chalina, curioso is correct in English but would be reversed in Spanish (because ch has a special meaning and is not just c followed by some other letter). So this phenomenon already appears for normal letters, and of course for special ones too. For example, the German ä is viewed as ae for sorting. I guess the Java sorting implementation has a similar replacement for Ł, which places it after Z.

In view of these problems, I would vote to close this issue as won't fix and stick to the Java implementation.
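
The chalina/curioso example can be reproduced with a rule-based collator. This is only an illustrative sketch: it takes the English rules from `java.text.Collator` and appends the contraction rule `& C < ch, cH, Ch, CH` (the same rule used as an example in the `RuleBasedCollator` Javadoc) to mimic the traditional Spanish treatment of ch. The class and method names are hypothetical, and nothing here is JabRef code.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical demo class, not part of JabRef.
final class ChContractionSketch {

    // The JDK's collator for English; in OpenJDK this is a RuleBasedCollator.
    static RuleBasedCollator englishCollator() {
        return (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);
    }

    // English rules plus "ch" as a separate letter sorted after every plain c.
    static RuleBasedCollator chAsOwnLetter() {
        try {
            String rules = englishCollator().getRules() + "& C < ch, cH, Ch, CH";
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalStateException("Unexpected rule syntax error", e);
        }
    }

    public static void main(String[] args) {
        // English order: "chalina" before "curioso" (h sorts before u).
        System.out.println(englishCollator().compare("chalina", "curioso") < 0);

        // With the contraction rule the order flips: "curioso" comes first.
        System.out.println(chAsOwnLetter().compare("curioso", "chalina") < 0);
    }
}
```

The same two words end up in opposite orders depending on the rule set, which is the cross-language conflict described above.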

@lenhard
Member

lenhard commented Jan 30, 2018

@tobiasdiez I mostly agree with you, but there is actually support for localized String sorting in Java, see: https://docs.oracle.com/javase/tutorial/i18n/text/locale.html

Previously, we could not apply this because of glazedlists (I think). Any chance of getting this into the new main table implementation?

@tobiasdiez
Member

If there exists a nice cross-language sorting algorithm, then this is very easy to implement. The StringTableColumn class just needs to call setComparator with the desired comparator.
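
As a rough sketch of what such a comparator could look like, assuming the JDK's `java.text.Collator` is used as the sorting algorithm: the helper below wraps a collator in a `Comparator<String>` that could then be handed to `setComparator`. The helper and class names are hypothetical, the table code itself is left out, and where 'Ł' ends up depends on the collation data shipped with the JDK for the chosen locale, so that is not claimed here.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical helper, not part of JabRef.
final class CollatorComparatorSketch {

    // Builds a String comparator backed by the JDK collator for the given locale.
    static Comparator<String> localeAwareComparator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // TERTIARY keeps case and accent differences as tie-breakers.
        collator.setStrength(Collator.TERTIARY);
        return collator::compare;
    }

    public static void main(String[] args) {
        Comparator<String> byLocale = localeAwareComparator(Locale.ENGLISH);

        // The collator compares letters first, so "apple" precedes "Banana".
        System.out.println(byLocale.compare("apple", "Banana") < 0);

        // Plain String.compareTo puts all uppercase letters before lowercase ones.
        System.out.println("apple".compareTo("Banana") > 0);
    }
}
```

Passing the result to the column would then be a single call, for example `column.setComparator(localeAwareComparator(Locale.getDefault()))`, where `column` stands for whatever table column object is in use.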

@pirlite2
Author

pirlite2 commented Jan 30, 2018 via email

@lenhard
Member

lenhard commented Jan 30, 2018

@tobiasdiez Let us look into this at some point. If it is easy to achieve and the Java-based localized String sorting works, we can go for it. Just no customized hacks for certain letters or the like. So if it doesn't work out of the box with Java, we can close this issue as won't fix.

I do not think this issue is important for the migration of the main table right now, so you can leave it off the list in #3621

@pirlite2
Author

pirlite2 commented Jan 30, 2018 via email

@stefan-kolb
Member

Not sure if this is easily doable with the standard Java sorting behavior.

@pirlite2
Author

pirlite2 commented Feb 6, 2018 via email
