"\L\k{e}ski" is displayed after 'Z' #2460

pirlite2 · 2017-01-12T20:11:13Z

JabRef 3.8.2 + Linux Mint 17.3

The author name "\L\k{e}ski" is sorted out of order. It appears after the last entry beginning with 'Z'. I believe this is incorrect behaviour as "\L" is listed in the Unicode charts as an "L with stroke".

Peter

lenhard · 2017-01-13T09:35:13Z

Hi Peter,

thanks for the report. I think this is a feature request rather than a bug report, since JabRef really is not able to do this so far. JabRef relies on Java string sorting internally, and that does not take accented characters into account.

What you are asking for is a sorting that depends on the locale and takes non-English characters into account. This is possible to implement, but it is really a new feature, or rather an enhancement of the current sorting feature.

Some context: http://www.javapractices.com/topic/TopicAction.do?Id=207
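To make the point about Java string sorting concrete: `String.compareTo` compares UTF-16 code units, so a name lands wherever its first differing character's code point falls. Ł is U+0141, far above Z (U+005A), and the backslash of a name still in raw LaTeX form is U+005C, which is also above Z. The sketch below uses hypothetical names and is not JabRef code; it only shows the natural ordering Java applies when no other comparator is supplied.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointOrder {
    // Sorts a copy of the list with String's natural ordering,
    // i.e. by comparing UTF-16 code unit values.
    static List<String> naturalSort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(null); // null comparator means natural ordering (String.compareTo)
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical author names: one converted to Unicode, one still in raw LaTeX.
        List<String> names = List.of("Zeller", "\u0141\u0119ski", "Lenhard", "\\L\\k{e}ski", "Mazur");
        // Lenhard, Mazur, Zeller, then the raw LaTeX name, then the Unicode name
        System.out.println(naturalSort(names));
        // Z = U+005A, backslash = U+005C, L with stroke = U+0141
        System.out.printf("Z = U+%04X, backslash = U+%04X, L with stroke = U+%04X%n",
                (int) 'Z', (int) '\\', (int) '\u0141');
    }
}
```

Either form of the name therefore sorts after every plain A to Z name under code-point ordering.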

pirlite2 · 2017-01-13T10:14:29Z

Ummm... Well, names beginning with '\v{S}' sort correctly, for example. And that is an obscure, Eastern European, diacritically decorated letter. As do many others. The only deviation from correct sorting I can identify is '\L'. So if it all works fine apart from one unusual case, my simple-minded view is that that is a bug rather than a feature. :-)

(I am curious why the Java string sorting routine correctly strips the diacritic stuff from everything other than '\L'. JabRef must ignore/remove leading escaped characters somehow before sorting. But whereas most diacritics are specified in LaTeX as '\*', 'L with stroke' seems to be a unique pattern. Hence my reasoning that it is a bug: I suspect it is not stripping the leading '\' correctly in the case of '\L' but is doing everything else right.)

And I don't think I am asking for anything locale related. Unless you are proposing I only cite work published by people with proper English names! Unfortunately, large numbers of 'foreign' people make important contributions to my field.
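The question of why every diacritic except '\L' seems to be handled has a plausible Unicode-level explanation, although nothing in this thread confirms how JabRef actually normalised names at the time. Assume a comparator folds accents by decomposing to NFD and deleting combining marks: Š (U+0160) then splits into S plus a combining caron, and ę (U+0119) into e plus a combining ogonek, but Ł (U+0141) has no canonical decomposition, so it survives unchanged and still sorts after Z. The helper `foldAccents` is hypothetical.

```java
import java.text.Normalizer;

public class AccentFolding {
    // Hypothetical accent-folding step: decompose to NFD, then delete every
    // combining mark (Unicode general category M).
    static String foldAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // S with caron (U+0160) decomposes to S + U+030C, so the mark is removed.
        System.out.println(foldAccents("\u0160imek"));      // Simek
        // L with stroke (U+0141) has no decomposition and survives folding,
        // while e with ogonek (U+0119) is reduced to a plain e.
        String folded = foldAccents("\u0141\u0119ski");
        System.out.println(folded.charAt(0) == '\u0141');   // true
        // After folding, the name still compares greater than "Zeller".
        System.out.println(folded.compareTo("Zeller") > 0); // true
    }
}
```

Other letters with a stroke or bar, such as Ø and Đ, would behave the same way under this kind of folding.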

lenhard · 2017-01-13T10:31:30Z

Very interesting, I guess we will have to look into this deeper. My prior comment was just based on my knowledge of the code and I did not actually look at it, which is why I might be wrong in my assumptions.

Regarding the Locale: I was not suggesting anything of the like. I have a "foreign" first name and I really hope that this does not prevent people from citing my work ;-) My Locale comment was purely technical and related to the implementation in Java.

pirlite2 · 2017-01-13T17:01:06Z

Just to add to this, the entry preview for my Polish friend renders as:

Article (Lkeski2003)
&Lstrok;&eogon;ski, J.

which I think I would describe as a dog's breakfast! In the bibtexkey, the escape character has been correctly stripped from the 'L' but the '\k' that produces the ogonek on the 'e' has been mangled.

Curiously, in the process of posting this, I have discovered that the literal text from the entry preview renders here as:

Article (Lkeski2003)
Łęski, J.

So we have some strange hybrid of Markdown(-compatible) rendering in JabRef :-)

lenhard · 2017-02-10T09:16:16Z

We have now replaced our conversion code in #2532 with an external library: latex2unicode.

I have tested the sequence described above in the UI and it now renders as it should. Hence, I am closing this issue. Feel free to reopen it if the problem reappears.

lenhard · 2017-02-10T10:37:26Z

And revisiting this issue, I just realized that this is not only about rendering (which works now, also in the preview), but also about the sorting.

Sorting is done by the glazedlists library, so this is really outside of JabRef at the moment. I guess it would be possible to reimplement the sorting functionality to place selected diacritics in between the "normal" letters of the alphabet, so in case anybody volunteers for contributing a PR, we would be happy to integrate it.

lenhard · 2017-02-10T10:52:20Z

The rendering is tested now: 9eef09c

tobiasdiez · 2018-01-30T12:54:00Z

I think it is not possible to have a meaningful sorting across languages. There are always cases where the order of words changes between languages. For example, "chalina, curioso" is the correct order in English, but would be sorted the other way around in Spanish (because ch has a special meaning and is not just c followed by some other letters). So this phenomenon already appears for normal letters, and of course also for special ones. For example, the German ä is treated as ae for sorting. I guess the Java sorting implementation has a similar replacement for Ł, which places it after Z.

In view of these problems, I would vote to close this issue as wont-fix and stick to the Java implementation.
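The Spanish example can be reproduced with `java.text.RuleBasedCollator`. Appending the rule `& C < ch, cH, Ch, CH` to the English rules (the same pattern the `RuleBasedCollator` Javadoc uses for this case) makes "ch" a letter of its own between c and d, as in traditional Spanish alphabetization. This is a sketch; the class and helper names are hypothetical, and the word pair is the one from the comment.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TraditionalSpanish {
    // English collation: "ch" is simply 'c' followed by 'h'.
    static final Collator ENGLISH = Collator.getInstance(Locale.ENGLISH);

    // Extends the English rules so that "ch" sorts as a single letter
    // placed after c and before d.
    static Collator traditionalSpanish() throws ParseException {
        String baseRules = ((RuleBasedCollator) ENGLISH).getRules();
        return new RuleBasedCollator(baseRules + "& C < ch, cH, Ch, CH");
    }

    // Returns a sorted copy; Collator implements Comparator<Object>.
    static List<String> sorted(List<String> words, Collator collator) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) throws ParseException {
        List<String> words = List.of("curioso", "chalina");
        System.out.println(sorted(words, ENGLISH));              // [chalina, curioso]
        System.out.println(sorted(words, traditionalSpanish())); // [curioso, chalina]
    }
}
```

The same two words swap places depending only on the collation rules, which is the cross-language problem described above.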

lenhard · 2018-01-30T13:06:59Z

@tobiasdiez I mostly agree with you, but there is actually support for localized String sorting in Java, see: https://docs.oracle.com/javase/tutorial/i18n/text/locale.html

Previously, we could not apply this because of the glazed lists (I think). Any chance of getting this into the new maintable implementation?
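For reference, the localized sorting in the linked tutorial comes down to `java.text.Collator.getInstance(Locale)`. The sketch below assumes the runtime's locale data includes Polish collation rules (in current JDKs these live in the `jdk.localedata` module, which a trimmed runtime image may omit). With such rules one would expect Ł to land between L and M, which is where pirlite2 later says Polish puts it, but no expected output is given for the Polish line because it depends on the runtime. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LocalizedSort {
    // Sorts a copy of the list with a locale-sensitive Collator instead of String.compareTo.
    static List<String> sortFor(Locale locale, List<String> names) {
        Collator collator = Collator.getInstance(locale);
        // SECONDARY strength: accents still matter, case differences do not.
        collator.setStrength(Collator.SECONDARY);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical list containing the name from this issue (U+0141 U+0119 "ski").
        List<String> names = List.of("Zeller", "\u0141\u0119ski", "Lenhard", "Mazur");

        List<String> natural = new ArrayList<>(names);
        natural.sort(null); // code-point order, as String.compareTo would give
        System.out.println("natural: " + natural);

        // Order depends on the Polish collation rules shipped with the runtime.
        System.out.println("pl-PL:   " + sortFor(Locale.forLanguageTag("pl-PL"), names));
    }
}
```

The choice of locale is the crux: a single table column has to pick one, which is why the later comments question whether any one ordering suits every user.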

tobiasdiez · 2018-01-30T13:25:02Z

If there exists a nice cross-language sorting algorithm, then this is very easy to implement. The StringTableColumn class just needs to call setComparator with the desired comparator.

pirlite2 · 2018-01-30T13:25:39Z

Except we are talking about names here. I am not sure how sorting "curious scarf" (?) in Spanish is relevant. "{\L}" is listed in the Unicode docs as "Latin capital letter L with stroke". If a Polish person would sort "\L\k{e}ski" after 'Z' then I agree with Tobias's view. Otherwise, I persist with the view that it's a bug! I suspect the placement of this name after 'Z' results from some unintended behaviour in the sorting algorithm rather than any specific policy. Can any Polish person arbitrate here? Where would "\L\k{e}ski" appear in the Warsaw telephone directory? AFAIU, a letter with a diacritical mark is just that - a letter with a qualifying mark on it. Peter


lenhard · 2018-01-30T13:36:47Z

@tobiasdiez Let us look into this at some point. If it is easy to achieve and the Java-based localized String sorting works, we can go for it. Just no customized hacks for certain letters or the like. Thus if it doesn't work out-of-the-box with Java, then we can close this issue as won't fix.

I do not think this issue is important for the migration of the main table right now, so you can leave it off the list in #3621.

pirlite2 · 2018-01-30T13:37:36Z

A lookup table? From my brief digging, the purpose of a diacritical mark in the Latin alphabet is to modify the pronunciation. This implies that an 'L' with a diacritic should be sorted after 'K' but before 'M'. What I can't answer is which should be sorted first: "L" or "\L". I suggest the advice of a linguistics person is sought on this. Otherwise, I think it's what Hunt & Thomas would term a "broken window". Peter


stefan-kolb · 2018-02-06T20:13:39Z

Not sure if this is easily doable with the standard Java sorting behavior.

pirlite2 · 2018-02-06T21:46:06Z

For the record, I have had a closer look at this lexical ordering issue and it turns out to be far more complicated than I thought. 'L with stroke' (\L) in Polish should indeed be sorted after 'L'. But there seems to be a wide variety of language-specific practices. So, for example, to implement all of these correctly, a German with a Hungarian name might fall in a different sort order compared to a Hungarian with a German name!

(Incidentally, taking the example quoted of 'ch' in Spanish, according to my digging the Royal Spanish Academy decreed in 1994 that this should not be treated as a single letter but as a 'c' followed by an 'h', i.e. two letters. I infer the practice still persists in Spain informally.)

So. There is no uniform sorting method that will please everybody. The best that seems on offer would be a consensus hybrid ordering in which a letter with a diacritic is sorted after the undecorated letter. Letters with different diacritics could be sorted according to Unicode code point. (But this may hack off the Estonian and Slovak user base! And also Germans, I think.)

Perhaps a more rational sort method could be added to the wish list for the future. I still maintain that '\L' appearing after 'Z' is an unintended side effect of the sorting algorithm used; sorting a diacritic-decorated 'L' after 'Z' does not conform to /any/ language-specific sort order and so should hack off everybody!

P.

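The consensus hybrid ordering proposed above can be sketched as a `Comparator<String>`, under one reading of the proposal: compare names on their undecorated base letters (ignoring case) first, and only on a tie fall back to raw code points, so that "L" precedes "Ł" and differently decorated letters fall into code-point order. Because Ł, Ø and Đ have no Unicode decomposition, NFD alone cannot reduce them, so the sketch needs a small hand-written table. That table is exactly the kind of per-letter special-casing lenhard wanted to avoid, and its contents, like the names `STROKE_LETTERS`, `baseLetters` and `HYBRID`, are assumptions for illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class HybridOrder {
    // Hypothetical table for letters with no Unicode decomposition,
    // which NFD cannot split into base letter + combining mark.
    private static final Map<Character, Character> STROKE_LETTERS = Map.of(
            '\u0141', 'L', '\u0142', 'l',  // L with stroke
            '\u00D8', 'O', '\u00F8', 'o',  // O with stroke
            '\u0110', 'D', '\u0111', 'd'); // D with stroke

    // Reduces a name to its undecorated base letters.
    static String baseLetters(String s) {
        String stripped = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        StringBuilder sb = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            sb.append(STROKE_LETTERS.getOrDefault(c, c));
        }
        return sb.toString();
    }

    // Primary key: base letters, ignoring case. Tie-break: raw code points,
    // so an undecorated letter (lower code point) precedes its decorated forms.
    static final Comparator<String> HYBRID =
            Comparator.comparing(HybridOrder::baseLetters, String.CASE_INSENSITIVE_ORDER)
                      .thenComparing(Comparator.naturalOrder());

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(
                List.of("Zeller", "\u0141\u0119ski", "Mazur", "Leski", "Lenhard"));
        names.sort(HYBRID);
        // Lenhard, Leski, then the L-stroke name, then Mazur, Zeller
        System.out.println(names);
    }
}
```

Note that in this reading the diacritic only matters on a full tie, so it is not the Polish rule where Ł is a separate letter after every L-word; getting that right is the language-specific problem the comment describes.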

This was referenced Jan 13, 2017

"\L{}\k{e}ski" sorted after 'Z' #2461

Closed

Author name starting with '\L' sorted out of order #2462

Closed

lenhard added [outdated] type: enhancement component: ui labels Jan 13, 2017

This was referenced Jan 30, 2017

"\~" Escape Sequence Does Not Display Correctly in Entry Table #2458

Closed

"\'{}" Escape Sequence Does Not Display Correctly in Entry Preview #2498

Closed

lenhard closed this as completed Feb 10, 2017

lenhard reopened this Feb 10, 2017

lenhard added the help-wanted label Feb 10, 2017

tobiasdiez added the component: maintable label May 14, 2017

stefan-kolb closed this as completed Feb 6, 2018
