"\L\k{e}ski" is displayed after 'Z' #2460
Comments
Hi Peter, thanks for the report. I think this is a feature request rather than a bug report, since JabRef is not able to do this so far. JabRef relies on Java string sorting internally, and this does not take accented characters into account. What you are asking for is a sorting that depends on the locale and takes non-English characters into account. This is possible to implement, but it is really a new feature, or rather an enhancement of the current sorting feature. Some context: http://www.javapractices.com/topic/TopicAction.do?Id=207
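To make the mechanism concrete, here is a minimal Java sketch, assuming the column ends up comparing the displayed names with plain `String.compareTo` (the natural ordering). That method compares UTF-16 code units one at a time, and `Ł` is U+0141 while `Z` is U+005A, so any name starting with `Ł` lands after every name starting with an unaccented capital letter. The class name and sample names are made up for illustration, and the sketch does not by itself explain why `\v{S}` names sorted correctly in Peter's list; that depends on what the table actually compares.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: sorts names by String's natural ordering.
final class NaturalOrderDemo {
    // List.sort(null) uses String.compareTo, which compares UTF-16 code units.
    static List<String> sortNatural(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(null);
        return sorted;
    }

    public static void main(String[] args) {
        // Escapes keep the source ASCII-only; the second name is "Łęski".
        List<String> names = List.of("Zhang", "\u0141\u0119ski", "Kowalski", "Mazur");
        // Prints: L with stroke = U+0141, Z = U+005A
        System.out.printf("L with stroke = U+%04X, Z = U+%04X%n", (int) '\u0141', (int) 'Z');
        // Order: Kowalski, Mazur, Zhang, Łęski
        System.out.println(sortNatural(names));
    }
}
```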
Ummm... Well, names beginning with '\v{S}' sort correctly, for example. And that is an obscure, Eastern European, diacritically decorated letter. Many others do as well. The only deviation from correct sorting I can identify is '\L'. So if it all works fine apart from one unusual case, my simple-minded view is that this is a bug rather than a feature. :-) (I am curious why the Java string sorting routine correctly strips the diacritic stuff from everything other than '\L'. JabRef must ignore or remove leading escaped characters somehow before sorting. But whereas most diacritics are specified in LaTeX as '\*', 'L with stroke' seems to be a unique pattern. Hence my reasoning that it is a bug: I suspect it is not stripping the leading '\' correctly in the case of '\L' but is doing everything else right.) And I don't think I am asking for anything locale-related. Unless you are proposing I only cite work published by people with proper English names! Unfortunately, large numbers of 'foreign' people make important contributions to my field.
Very interesting; I guess we will have to look into this more deeply. My prior comment was based only on my knowledge of the code, and I did not actually look at it, which is why my assumptions might be wrong. Regarding the Locale: I was not suggesting anything of the kind. I have a "foreign" first name and I really hope that this does not prevent people from citing my work ;-) My Locale comment was purely technical and related to the implementation in Java.
Just to add to this, the entry preview for my Polish friend renders as: Article (Lkeski2003), which I would describe as a dog's breakfast! In the bibtexkey, the escape character has been correctly stripped from the 'L', but the '\k' that produces the ogonek on the 'e' has been mangled. Curiously, in the process of posting this, I have discovered that the literal text from the entry preview renders here as: Article (Lkeski2003). So we have some strange hybrid of Markdown(-compatible) rendering in JabRef :-)
We have now replaced our conversion code in #2532 with an external library: latex2unicode. I have tested the sequence described above in the UI and it now renders as it should. Hence, I am closing this issue. Feel free to reopen it if the problem reappears.
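For illustration only, a hand-rolled converter for the commands discussed in this thread (`\L`, `\k{e}`, `\v{S}`) might look like the sketch below. It is hypothetical and is not the latex2unicode API: it maps `\L`/`\l` to `Ł`/`ł`, turns a braced one-letter accent argument into base letter plus Unicode combining mark, and then applies NFC so that `e` + U+0328 (combining ogonek) becomes the precomposed `ę`.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter for a tiny LaTeX subset; NOT the latex2unicode API.
final class TinyLatexToUnicode {
    // Accent commands with a braced one-letter argument -> Unicode combining marks.
    private static final Map<String, String> ACCENTS = Map.of(
            "k", "\u0328",  // combining ogonek, as in \k{e}
            "v", "\u030C"); // combining caron, as in \v{S}

    private static final Pattern ACCENT = Pattern.compile("\\\\([kv])\\{(\\p{L})\\}");
    // {\L}, {\l}, \L or \l; the lookahead mirrors TeX, where a control word
    // ends at the first non-letter.
    private static final Pattern STROKE_L =
            Pattern.compile("\\{\\\\([Ll])\\}|\\\\([Ll])(?![A-Za-z])");

    static String convert(String latex) {
        // 1. Stroke L first, while the character after \L is still the original one.
        String s = STROKE_L.matcher(latex).replaceAll(m -> {
            String letter = m.group(1) != null ? m.group(1) : m.group(2);
            return letter.equals("L") ? "\u0141" : "\u0142"; // Ł or ł
        });
        // 2. \k{e} -> "e" + combining ogonek, \v{S} -> "S" + combining caron.
        s = ACCENT.matcher(s).replaceAll(m ->
                Matcher.quoteReplacement(m.group(2) + ACCENTS.get(m.group(1))));
        // 3. NFC composes base letter + mark into one code point where one exists (e.g. ę).
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }
}
```

The stroke-L pass runs first on purpose: once `\k{e}` has been expanded, `\L` in `\L\k{e}ski` would be followed by a letter and would no longer look like a complete control word.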
Revisiting this issue, I just realized that it is not only about rendering (which works now, also in the preview) but also about sorting. Sorting is done by the glazedlists library, so this is really outside of JabRef's control at the moment. I guess it would be possible to reimplement the sorting functionality to place selected diacritics in between the "normal" letters of the alphabet, so if anybody volunteers to contribute a PR, we would be happy to integrate it.
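One generic way to place decorated letters among the plain ones is to sort by a "folded" key: decompose the name (Unicode NFD) and drop the combining marks. The sketch below is hypothetical and is not how glazedlists or JabRef sort. It shows the limit of this approach: Unicode gives `Š` a canonical decomposition (`S` + U+030C caron) but treats `Ł` as a letter of its own with no decomposition, so the folded key for `Łęski` still starts with `Ł`.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sort key: drop diacritics that Unicode encodes as combining marks.
final class DiacriticFoldingSort {
    // Unicode category M: combining marks such as U+030C (caron) and U+0328 (ogonek).
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // NFD splits "Š" into "S" + caron; removing the mark leaves "S".
    // "Ł" has no decomposition, so it passes through unchanged.
    static String fold(String s) {
        return MARKS.matcher(Normalizer.normalize(s, Normalizer.Form.NFD)).replaceAll("");
    }

    static List<String> sortByFoldedKey(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparing(DiacriticFoldingSort::fold));
        return sorted;
    }
}
```

With this key, `Šimek` sorts between `Kowalski` and `Zhang`, while `Łęski` still ends up after `Zhang`. That is one mechanism that would produce exactly the reported pattern, though nothing in this thread confirms that the table works this way.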
The rendering is tested now: 9eef09c
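I think it is not possible to have a meaningful sorting across languages. There are always problems where the order of words changes for different languages. For example, `chalina curioso` is correct in English but would be sorted the other way around in Spanish (because `ch` has a special meaning and is not just `c` followed by some other letters). So this phenomenon already appears for normal letters, but of course also for special ones. For example, the German `ä` is viewed as `ae` for sorting. I guess the Java sorting implementation has a similar replacement for `Ł`, which places it after `Z`. In view of these problems, I would vote to close this issue as `wont-fix` and stick to the Java implementation.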
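Java's own `RuleBasedCollator` can express the traditional Spanish `ch` rule. A minimal sketch, modelled on the tailoring example in the `RuleBasedCollator` Javadoc: append `& C < ch, cH, Ch, CH` to the US-English rules so that `ch` becomes one letter sorted after `c`. The class and method names are hypothetical, and the cast assumes `Collator.getInstance` returns a `RuleBasedCollator`, as the Javadoc example also does.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper: US-English collation plus a traditional Spanish "ch" letter.
final class TraditionalSpanishCh {
    static Collator traditionalSpanish() {
        RuleBasedCollator english = (RuleBasedCollator) Collator.getInstance(Locale.US);
        try {
            // "& C" resets to C; "< ch" adds ch as a new letter (primary difference) after c.
            return new RuleBasedCollator(english.getRules() + "& C < ch, cH, Ch, CH");
        } catch (ParseException e) {
            throw new IllegalStateException("invalid collation rule", e);
        }
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.US);
        Collator spanish = traditionalSpanish();
        System.out.println(english.compare("chalina", "curioso") < 0); // true: 'h' < 'u'
        System.out.println(spanish.compare("curioso", "chalina") < 0); // true: c < ch
    }
}
```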
@tobiasdiez I mostly agree with you, but there is actually support for localized String sorting in Java; see: https://docs.oracle.com/javase/tutorial/i18n/text/locale.html Previously, we could not apply this because of glazedlists (I think). Any chance of getting this into the new maintable implementation?
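A minimal sketch of that API, assuming the names have already been converted from LaTeX to Unicode: `Collator.getInstance(locale)` is itself a `Comparator` and can be passed straight to `List.sort`. The helper name is hypothetical. Where exactly a given JDK's collation data places `Ł` for a given locale is not assumed here; that is worth checking on the target JDK before relying on it.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: locale-sensitive sorting with java.text.Collator.
final class LocalizedSort {
    static List<String> sort(List<String> names, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // Compare "Š" as "S" + combining caron, so it sorts among the S names.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Taylor", "\u0160imek", "Ryan", "anderson");
        // Collator order: anderson, Ryan, Šimek, Taylor
        // (String.compareTo would give Ryan, Taylor, anderson, Šimek)
        System.out.println(sort(names, Locale.ENGLISH));
    }
}
```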
If there exists a nice cross-language sorting algorithm, then this is very easy to implement. The `StringTableColumn` class just needs to call `setComparator` with the desired comparator.
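Assuming the column accepts a `Comparator<String>` (JavaFX's `TableColumn` exposes `setComparator` for this; `StringTableColumn` is JabRef's own class and is not reproduced here), one way to build that comparator from a `Collator` is sketched below. At `PRIMARY` strength the collator treats names that differ only in case or accents as equal, so a code-point tie-break is added to keep the order deterministic. The class name is hypothetical.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical factory for a name-column comparator.
final class NameColumnComparator {
    static Comparator<String> forLocale(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // PRIMARY: differences in case or accents alone do not count.
        collator.setStrength(Collator.PRIMARY);
        Comparator<String> byCollator = collator::compare;
        // Names the collator considers equal fall back to code-point order,
        // so the resulting order does not depend on the input order.
        return byCollator.thenComparing(Comparator.<String>naturalOrder());
    }
}
```

For example, sorting `smith`, `Smith`, `Jones` with `forLocale(Locale.ENGLISH)` yields `Jones`, `Smith`, `smith`: the collator alone cannot tell the two Smiths apart, so the tie-break decides.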
Except we are talking about names here. I am not sure how sorting
"curious scarf" (?) in Spanish is relevant.
"{\L}" is listed in the Unicode docs as "Latin capital letter L with
stroke". If a Polish person would sort "\L\k{e}ski" after 'Z', then I
agree with Tobias's view. Otherwise, I persist with the view that it's a
bug!
I suspect the placement of this name after 'Z' results from some
unintended behaviour in the sorting algorithm rather than any specific
policy.
Can any Polish person arbitrate here? Where would "\L\k{e}ski" appear in
the Warsaw telephone directory? AFAIU, a letter with a diacritical mark
is just that - a letter with a qualifying mark on it.
Peter
On 30/01/18 12:54, Tobias Diez wrote:
> I think it is not possible to have a meaningful sorting across
> languages. There are always problems where the order of words changes
> for different languages. For example, `chalina curioso` is correct in
> English but would be sorted the other way around in Spanish (because
> `ch` has a special meaning and not just `c` followed by some other
> letters). So this phenomenon already appears for normal letters, but
> of course also for special ones. For example, the German `ä` is viewed
> as `ae` for sorting. I guess the Java sorting implementation has a
> similar replacement for `Ł`, which places it after `Z`.
>
> In view of these problems, I would vote to close this issue as
> `wont-fix` and stick to the Java implementation.
@tobiasdiez Let us look into this at some point. If it is easy to achieve and the Java-based localized String sorting works, we can go for it. Just no customized hacks for certain letters or the like. Thus, if it doesn't work out of the box with Java, we can close this issue as won't fix. I do not think this issue is important for the migration of the main table right now, so you can leave it off the list in #3621.
A lookup table?
From my brief digging, the purpose of a diacritical mark in the Latin
alphabet is to modify the pronunciation. This implies that an 'L' with a
diacritic should be sorted after 'K' but before 'M'. What I can't answer
is which should be sorted first: "L" or "\L". I suggest seeking the advice
of a linguist on this. Otherwise, I think it's what Hunt & Thomas
would term a "broken window".
Peter
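Whether `L` or `Ł` comes first can be decided per locale with a collation rule. A minimal sketch, assuming the Polish convention (`Ł` is a separate letter between `L` and `M`) and building on the JDK's US-English rules: the rule `& L < ł, Ł` inserts `ł`/`Ł` right after `L` with a primary difference. This is the kind of per-letter customization the maintainers said they would prefer to avoid, so it only illustrates the ordering; the class name is hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical tailoring: treat ł/Ł as a letter of its own between L and M (Polish order).
final class StrokeLTailoring {
    static Collator withStrokeL() {
        RuleBasedCollator english = (RuleBasedCollator) Collator.getInstance(Locale.US);
        try {
            // "& L" resets to L; "< ł" adds a primary difference; ", Ł" is its uppercase (tertiary).
            return new RuleBasedCollator(english.getRules() + "& L < \u0142, \u0141");
        } catch (ParseException e) {
            throw new IllegalStateException("invalid collation rule", e);
        }
    }

    static List<String> sort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(withStrokeL());
        return sorted;
    }
}
```

With this rule, plain `L` sorts first: `Kowalski`, `Leski`, `Łęski`, `Mazur`, `Zhang`.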
On 30/01/18 13:25, Tobias Diez wrote:
> If there exists a nice cross-language sorting algorithm, then this
> is very easy to implement. The `StringTableColumn` class just needs to
> call `setComparator` with the desired comparator.
Not sure if this is easily doable with the standard Java sorting behavior.
For the record, I have had a closer look at this lexical ordering issue
and it turns out to be far more complicated than I thought.
'L with stroke' (\L) in Polish should indeed be sorted after 'L'. But
there seem to be a wide variety of language specific practices. So, for
example, to implement all of these correctly, a German with a Hungarian
name might fall in a different sort order compared to a Hungarian with a
German name! (Incidentally, taking the example quoted of 'ch' in
Spanish, according to my digging the Royal Spanish Academy decreed in
1994 that this should not be treated as a single letter but a 'c'
followed by an 'h' - i.e. two letters. I infer the practice still
persists in Spain informally.)
So. There is no uniform sorting method that will please everybody. The
best that seems on offer would be a consensus hybrid ordering in which a
letter with a diacritic should be sorted after the undecorated letter.
Letters with different diacritics could be sorted according to Unicode
code point. (But this may hack off the Estonian and Slovak user base!
And also Germans, I think.) Perhaps a more rational sort method could be
added to the wish list for future - I still maintain '\L' appearing
after 'Z' is an unintended side effect of the sorting algorithm used;
sorting a diacritic-decorated 'L' after 'Z' does not conform to *any*
language-specific sort order and so should hack off everybody!
P.
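The consensus ordering proposed above (undecorated letter first, then decorated variants ordered by code point) can be sketched as a two-level comparator. Assumptions: names are already converted to Unicode; the base key strips combining marks after NFD decomposition, and a small hypothetical table maps letters that Unicode does not decompose (`Ł`, `Ø`, `Đ` and their lowercase forms) to their base letter; ties are broken by the original string's UTF-16 code units, which is one reading of "sorted according to Unicode code point".

```java
import java.text.Normalizer;
import java.util.Comparator;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical two-level name order: base letters first, then code points.
final class HybridNameOrder {
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Hypothetical table for letters Unicode does not decompose into base + mark.
    private static final Map<Character, Character> NO_DECOMPOSITION = Map.of(
            '\u0141', 'L', '\u0142', 'l',  // Ł ł
            '\u00D8', 'O', '\u00F8', 'o',  // Ø ø
            '\u0110', 'D', '\u0111', 'd'); // Đ đ

    // "Łęski" -> NFD -> strip marks -> "Łeski" -> table -> "Leski"
    static String baseKey(String s) {
        String stripped = MARKS.matcher(Normalizer.normalize(s, Normalizer.Form.NFD)).replaceAll("");
        StringBuilder sb = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            char mapped = NO_DECOMPOSITION.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    // Level 1: base letters. Level 2 (ties only): original UTF-16 code units,
    // so the undecorated spelling comes first ("Leski" before "Łęski").
    static final Comparator<String> ORDER =
            Comparator.<String, String>comparing(HybridNameOrder::baseKey)
                    .thenComparing(Comparator.<String>naturalOrder());
}
```

With `ORDER`, `Leski` sorts before `Łęski`, and both fall between `Kowalski` and `Mazur`. Like `String.compareTo`, this sketch is still case-sensitive, so a lowercase name would sort after all capitalized ones.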
On 06/02/18 20:13, Stefan Kolb wrote:
> Not sure if this is easily doable with the standard Java sorting behavior.
JabRef 3.8.2 + Linux Mint 17.3
The author name "\L\k{e}ski" is sorted out of order. It appears after the last entry beginning with 'Z'. I believe this is incorrect behaviour as "\L" is listed in the Unicode charts as an "L with stroke".
Peter