
The way concepts with notation codes are sorted should be described in config.ttl #937

Closed
kouralex opened this issue Feb 24, 2020 · 6 comments · Fixed by #1205

@kouralex (Contributor)

At the moment, there are different types of classification systems published on finto.fi; however, they must all follow the same sorting setting for notation codes, namely the strnatcasecmp method. This causes issues when sorting, e.g., the related concepts of http://finto.fi/ykl/en/page/49.262, which has related concepts such as

30.1 Sociology
30.8 Good manners and etiquette
30.11 Social systems
30.12 Sociology of lifestyles. Cultural sociology
30.81 Arranging parties and celebrations
(in this particular order).

Note that this ordering does not suit the decimal classification system used by this vocabulary (PCL), for which the wanted ordering of related concepts would look like this:

30.1 Sociology
30.11 Social systems
30.12 Sociology of lifestyles. Cultural sociology
30.8 Good manners and etiquette
30.81 Arranging parties and celebrations
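As a minimal sketch, the wanted (decimal) ordering can be approximated by plain string comparison of the codes. This is the same `<`/`>` comparison that the jsTree sort function already uses. The helper name `decimalCompare` is hypothetical:

```javascript
// Hypothetical helper: decimal ordering of notation codes, approximated by
// plain string comparison (UTF-16 code-unit order), so "30.11" < "30.8".
function decimalCompare(a, b) {
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

// The related concepts from the example above, in arbitrary input order.
const notations = ["30.81", "30.12", "30.1", "30.8", "30.11"];
const sorted = [...notations].sort(decimalCompare);
console.log(sorted.join(", "));
// prints "30.1, 30.11, 30.12, 30.8, 30.81"
```

Because the codes are compared character by character, 30.8 lands after 30.12. This matches the decimal reading of the codes, since 0.8 > 0.12.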

That said, there should be a way to specify how Skosmos sorts notation codes, as some classification systems prefer natural sorting and others do not. There may also be other cases (not documented here) that should be taken into consideration, so the implementation should allow user-defined sorting.

Please have a look at the related issues (#265, #556, #737 and #889) for background details.

kouralex added this to the Next Tasks milestone Feb 24, 2020
joelit added the size-medium (2 hours to 2 days) label Feb 26, 2020
joelit unassigned osma May 20, 2021
osma assigned osma and unassigned osma Sep 9, 2021
@osma (Member)

osma commented Sep 14, 2021

There are two reasonable ways of sorting by notation that I can think of: decimal (where 30.10 comes before 30.2, as in decimal classifications such as UDC, DDC and YKL/PLC; this is the most common case) and natural (where 30.10 comes after 30.9, as in the PTVL classification). We already have a configuration setting, skosmos:sortByNotation, which takes a boolean value. I think we could extend it so that the value can also specify the sorting strategy, like this:

skosmos:sortByNotation "false";    # disabled, i.e. sort by labels, not notations
skosmos:sortByNotation "decimal";  # decimal sorting (see above)
skosmos:sortByNotation "natural";  # natural sorting (see above)
skosmos:sortByNotation "true";     # default sorting i.e. decimal, as it's the most common
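Here is a minimal sketch of how the extended setting could be interpreted. It assumes the literal values proposed above and assumes that an absent setting keeps today's label-based sorting. `parseSortByNotation` and the strategy names are hypothetical:

```javascript
// Hypothetical mapping from a skosmos:sortByNotation value to a strategy
// name: "label", "decimal" or "natural".
function parseSortByNotation(value) {
  // Assumption: a missing setting behaves like "false" (sort by labels).
  if (value === undefined || value === null) return "label";
  switch (String(value).trim().toLowerCase()) {
    case "false":
      return "label"; // disabled: sort by labels, not notations
    case "true": // plain "true" selects the default, decimal
    case "decimal":
      return "decimal";
    case "natural":
      return "natural";
    default:
      throw new Error(`Unknown skosmos:sortByNotation value: ${value}`);
  }
}

console.log(parseSortByNotation("true"));
// prints "decimal"
```

Accepting "true" and "false" keeps existing boolean configurations valid. Under this proposal, though, "true" would select decimal rather than the current natural ordering.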

I looked at locations in the code where sorting by notation currently happens. I think these are the important ones:

  1. In ConceptProperty.sortValues, notations are sorted in natural order if the sortByNotation setting is true.
  2. In ConceptPropertyValue.__toString, the notation is prepended to the returned label if the sortByNotation setting is true. I assume the idea is that this label is used for sorting elsewhere in the code, but I didn't look closer. I wonder whether this is necessary at all.
  3. On the JS side, in the sort function passed to jsTree, notations are used for sorting (using < and > string comparisons, which correspond to the decimal strategy) if they are available. The comment is misleading: it says "sort on notation if requested", but the code actually only checks window.showNotation, which contains the value of the skosmos:showNotation configuration setting, not sortByNotation. This is probably a bug. The sortByNotation setting is not passed to JavaScript code at all AFAICT.
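Below is a sketch of a tree comparator that checks the sorting strategy explicitly instead of window.showNotation. It assumes nodes shaped like `{ text, notation }` and a strategy value ("label", "decimal" or "natural") that has somehow been passed from PHP to JS; both are hypothetical. A real jsTree sort callback would first look up the nodes, e.g. via `this.get_node`:

```javascript
// Hypothetical comparator for tree nodes of the form { text, notation }.
// strategy is assumed to be "label", "decimal" or "natural".
function compareTreeNodes(a, b, strategy) {
  const hasNotations = a.notation && b.notation;
  if (strategy !== "label" && hasNotations && a.notation !== b.notation) {
    if (strategy === "natural") {
      // numeric collation: digit runs compare by value, so 30.8 < 30.11
      return a.notation.localeCompare(b.notation, undefined, { numeric: true });
    }
    // decimal: plain string comparison, like the existing < and > checks
    return a.notation < b.notation ? -1 : 1;
  }
  // fall back to labels when notation sorting is off, or notations are missing/equal
  return a.text.localeCompare(b.text);
}

const example = [
  { text: "Good manners and etiquette", notation: "30.8" },
  { text: "Social systems", notation: "30.11" },
  { text: "Sociology", notation: "30.1" },
];
const order = [...example].sort((x, y) => compareTreeNodes(x, y, "decimal"));
console.log(order.map((n) => n.notation).join(", "));
// prints "30.1, 30.11, 30.8"
```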

To me it seems that 1. and 3. are the important cases; 2. should be investigated further to see what (if any) effect the prepending of notation codes has.

Making the notation code sorting strategy configurable means that the setting must be passed both to PHP and JS code, and the code in the above locations must be adjusted so that it can handle both decimal and natural sorting of notation codes.

@kouralex (Contributor, Author)

kouralex commented Sep 14, 2021

Very interesting idea @osma !

I think your approach might work very well!

If I recall correctly, step 2. is also important, as it decides how the values are ordered on the concept page. Of course, in all cases, the ordered listing should be the same for a given skosmos:sortByNotation value.

That said, implementing the above solution could give us consistent values between the PHP and JS sorting mechanisms, if done well. I am quite sure some work is needed to handle "extra/special" characters consistently, not to mention the content language sorting use case. However, SPARQL queries returning ordered lists (based on labels) won't be fixed by this in any case, but that can be raised as a separate issue later on (actually, #1190 already covers this).

P.S. Could you please use permalinks instead of code links to master? 🙂 GitHub displays them differently (i.e., directly here), and the lines they point to cannot change (that is what the "perma", for "permanent", is there for), whereas code in the master branch can change and the lines can shift.

@osma (Member)

osma commented Sep 14, 2021

> If I recall correctly, step 2. is also important, as it decides how the values are ordered on the concept page. Of course, in all cases, the ordered listing should be the same for a given skosmos:sortByNotation value.

This doesn't seem to be the case: the values on the concept page are sorted by the code in 1. (I just tested this.)
This could be a remnant of some old sorting mechanism that is no longer used, but I may be overlooking something.

> That said, implementing the above solution could give us consistent values between the PHP and JS sorting mechanisms, if done well. I am quite sure some work is needed to handle "extra/special" characters consistently, not to mention the content language sorting use case.

Yes, the sort order in both PHP and JS should be consistent at least in the common cases (alphanumeric notation codes with a few commonly occurring special characters such as .-/).

On the PHP side, we have strcoll for decimal sorting and strnatcasecmp for natural sorting.
On the JS side, I think normal string comparison (< and >) works well enough for decimal sorting. For natural sorting, it might be possible to use localeCompare, but there are other options as well.
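For natural sorting, one option on the JS side is `Intl.Collator` with numeric collation. The sketch below uses it as a rough analogue of PHP's strnatcasecmp, made case-insensitive via `sensitivity: "base"`. This is an assumption, not a verified match: edge cases such as leading zeros or punctuation may sort differently than in PHP.

```javascript
// Natural, case-insensitive comparison (rough analogue of strnatcasecmp):
// numeric: true compares digit runs by value; sensitivity "base" ignores case.
const naturalCollator = new Intl.Collator("en", { numeric: true, sensitivity: "base" });

const related = ["30.81", "30.11", "30.1", "30.12", "30.8"];
const naturalOrder = [...related].sort(naturalCollator.compare);
console.log(naturalOrder.join(", "));
// prints "30.1, 30.8, 30.11, 30.12, 30.81"
```

This reproduces the ordering from the original report: the right behaviour for PTVL-style codes, but not for YKL.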

> However, SPARQL queries returning ordered lists (based on labels) won't be fixed by this in any case

Out of scope for this issue.

> P.S. Could you please use permalinks instead of code links to master?

Good point. I changed to permalinks by editing the above comment. The display is still the same, though (probably because I used Markdown links, not plain URLs).

@osma (Member)

osma commented Sep 14, 2021

> For natural sorting, it might be possible to use localeCompare, but there are other options as well.

Oh, I forgot that we already have a naturalCompare function in JS so let's just use it here:

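// Natural sort from: http://stackoverflow.com/a/15479354/3894569
// Note: relies on a global `lang` variable, assumed to hold the current UI language code.
function naturalCompare(a, b) {
    var ax = [], bx = [];
    // Split each string into chunks: [digits, ""] or [Infinity, non-digits]
    a.replace(/(\d+)|(\D+)/g, function(_, $1, $2) { ax.push([$1 || Infinity, $2 || ""]); });
    b.replace(/(\d+)|(\D+)/g, function(_, $1, $2) { bx.push([$1 || Infinity, $2 || ""]); });
    while (ax.length && bx.length) {
        var an = ax.shift();
        var bn = bx.shift();
        // Digit chunks compare by numeric value; otherwise compare text chunks by locale
        var nn = (an[0] - bn[0]) || an[1].localeCompare(bn[1], lang);
        if (nn) return nn;
    }
    // All shared chunks are equal: the string with fewer chunks sorts first
    return ax.length - bx.length;
}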

@kouralex (Contributor, Author)

kouralex commented Sep 14, 2021

> If I recall correctly, step 2. is also important, as it decides how the values are ordered on the concept page. Of course, in all cases, the ordered listing should be the same for a given skosmos:sortByNotation value.
>
> This doesn't seem to be the case: the values on the concept page are sorted by the code in 1. (I just tested this.)
> This could be a remnant of some old sorting mechanism that is no longer used, but I may be overlooking something.

I bet there are some tests that do use this feature. Following their definitions should unveil the mystery.

Edit: maybe git blame can also tell us something?

@osma (Member)

osma commented Sep 14, 2021

> I bet there are some tests that do use this feature. Following their definitions should unveil the mystery.

Nope, I just removed it in PR #1205 and nothing happened.

> Edit: maybe git blame can also tell us something?

Didn't check, but my strong suspicion at the moment is that this is just dead code.

osma modified the milestones: Next Tasks, 2.12 Mar 10, 2022