Language labels matching MARC standard #2933

damien-git · 2023-06-07T16:43:32Z

TODO

When merging, add changelog note about need to reindex.
Open followup PR adding support for indexing ISO codes as discussed below.

demiankatz · 2023-06-07T16:50:19Z

Thanks, @damien-git, this is very helpful! I'm requesting a couple of other reviews -- I'm interested in @EreMaijala's opinion about this (particularly with regard to how it might impact RecordManager), and @lahmann's input on whether this might cause his team any problems (since I know he's been interested in translation of language names for a long time).

There's also a question of whether this should be done as part of the 9.1 release or the 10.0 release -- introducing this will require a full reindex to ensure consistency of language facets, and it also offers a bit of a backward compatibility break in the sense that saved search links containing changed facet values will no longer work (though I suspect this is a rare/insignificant situation).

In any case, I'll be happy to merge and approve this once we have some consensus on timing, etc. Thanks again!

damien-git · 2023-06-07T16:54:22Z

introducing this will require a full reindex to ensure consistency of language facets

Good point, it only takes effect after re-importing data into Solr, and a partial import would create inconsistent results.

demiankatz · 2023-06-07T16:55:38Z

Good point, it only takes effect after re-importing data into Solr, and a partial import would create inconsistent results.

I've added a TODO checkbox to the top of the PR as a reminder to put a changelog note about this into whichever release we end up merging it into. :-)

EreMaijala · 2023-06-09T07:29:05Z

@demiankatz Well, I'd prefer indexing codes instead of translations. They're shorter, and you can change the translations (like "Ancient Greek" => "Greek, Ancient") without affecting searches, and of course display them in the UI language. The downside is that sorting in index order won't match the translations.

RecordManager comes with a sample mapping that normalizes language codes to ISO 639-3 as far as possible, so the missing piece would be translations for them in VuFind, and probably a mechanism to sort them properly if displayed in advanced search form. I can provide English, Finnish and Swedish translations, if deemed useful.

demiankatz · 2023-06-09T10:15:49Z

@EreMaijala, maybe the best way to move forward with this is to add support for ISO codes without changing the existing behavior (at least, not yet). That is, keep the current language field, but add something like iso639_3_str_mv for ISO codes. Then we can index into both fields, and end users can choose whether they want the flexibility of using codes or the advantages (like backward compatibility and index sortability) of the "legacy" approach. This would also allow us to potentially change the language display logic so that if ISO codes are found in the index, we display language by translating the codes... but if they are not, we fall back to displaying the legacy language field. This would add flexibility for displaying unusual languages if there is a need to represent something that does not correspond with a code. What do you think? If this seems like a potentially valuable approach, maybe we can start a separate PR to begin implementing it, and we can keep this one focused on simply bringing the legacy strings up to date.

EreMaijala · 2023-06-09T10:21:36Z

@demiankatz Makes sense to me!

demiankatz · 2023-06-09T11:16:47Z

Thanks, @EreMaijala. If you have a chance to start a PR with your translation files, I can then work on turning it into a replacement for #413, and perhaps we can finally close our oldest open PR. :-)

demiankatz · 2023-06-12T12:21:24Z

Thanks, @EreMaijala -- at this point, I'll just wait for @lahmann's input to see if that impacts the timing of the merge for these updates.

damien-git · 2023-06-15T18:29:25Z

There is a text encoding issue: I used UTF-8, but apparently it needs to be in ISO-8859-1 to be imported correctly into Solr (I guess it's still the default encoding for properties files in Java). Unfortunately one character does not appear in ISO-8859-1, the ã in Lahndā (U+0101). I think I will just use "a" to replace it. In the long run it would be nice to use UTF-8 for these files.

demiankatz · 2023-06-15T18:44:24Z

@damien-git, which version of the code are you using? The most recent release of SolrMarc (which I updated in the dev branch fairly recently) changes the encoding requirement from ISO-8859-1 to UTF-8, which should solve your problems. See #2942.

damien-git · 2023-06-15T18:49:07Z

I was trying with VuFind 8.1. Will this be in VuFind 9 or 10 ? Either way, I will revert to using UTF-8 for this PR.

This reverts commit a007483.

demiankatz · 2023-06-15T19:11:52Z

I was trying with VuFind 8.1. Will this be in VuFind 9 or 10 ? Either way, I will revert to using UTF-8 for this PR.

The SolrMarc upgrade will be in 9.1.

demiankatz · 2023-06-21T10:53:28Z

@damien-git, I have taken care of merging #2942 into the dev branch. Do you mind taking care of conflict resolution here when time permits?

demiankatz · 2023-06-28T18:41:25Z

Thanks for the conflict resolution, @damien-git. I have conservatively pinned this PR to the 10.0 milestone, but if @lahmann or others feel it's worth merging sooner in spite of the minor BC implications, I'm not opposed to including it in 9.1 instead. I just want to be sure we don't lose track of this work. I've also added a TODO checkbox as a reminder to work on the ISO code indexing project discussed with @EreMaijala above.

demiankatz · 2023-11-07T18:07:50Z

@EreMaijala, I suspect our earlier conversation about opening a PR with ISO code translation mappings may have gotten lost in the shuffle somewhere. Is this something you're still willing/able to do? If we can get that started, we might also be able to start incorporating @ThoWagen's ideas about translation aliases discussed in #413, and move that project forward a bit.

EreMaijala · 2023-11-08T11:13:21Z

@demiankatz Indeed, I completely forgot about this one. I'll see to it.

EreMaijala · 2023-11-08T11:28:57Z

@demiankatz #3200.

demiankatz

Thanks again, @damien-git! I'm merging this now and will move remaining TODO items over to #3200, which contains a start on the ISO code translation work.

Fixes issues with encapsulated records, popups and embedded favorite lists.

Language labels matching MARC standard

43a81d4

demiankatz requested review from EreMaijala and lahmann June 7, 2023 16:48

EreMaijala approved these changes Jun 12, 2023

View reviewed changes

Fixed encoding for language_map.properties

a007483

Revert "Fixed encoding for language_map.properties"

d0fb822

This reverts commit a007483.

demiankatz mentioned this pull request Jun 21, 2023

Convert translation maps to UTF-8. #2942

Merged

1 task

Merge branch 'dev' into language_map

3d43085

demiankatz added this to the 10.0 milestone Jun 28, 2023

sturkel89 added the improvement label Oct 26, 2023

demiankatz self-assigned this Nov 7, 2023

demiankatz added the small Minor changes to relatively few files label Nov 7, 2023

demiankatz approved these changes Nov 8, 2023

View reviewed changes

demiankatz mentioned this pull request Nov 8, 2023

Add ISO 639-3 language code translations. #3200

Merged

7 tasks

sturkel89 merged commit 1e899ad into vufind-org:dev Nov 13, 2023

EreMaijala pushed a commit to EreMaijala/vufind that referenced this pull request Aug 27, 2024

[FINNA-2192] Fix unsupported pseudo selectors (vufind-org#2933)

5b8669a

Fixes issues with encapsulated records, popups and embedded favorite lists.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Language labels matching MARC standard #2933

Language labels matching MARC standard #2933

damien-git commented Jun 7, 2023 •

edited by demiankatz

Loading

demiankatz commented Jun 7, 2023

damien-git commented Jun 7, 2023

demiankatz commented Jun 7, 2023

EreMaijala commented Jun 9, 2023

demiankatz commented Jun 9, 2023

EreMaijala commented Jun 9, 2023

demiankatz commented Jun 9, 2023

demiankatz commented Jun 12, 2023

damien-git commented Jun 15, 2023

demiankatz commented Jun 15, 2023 •

edited

Loading

damien-git commented Jun 15, 2023

demiankatz commented Jun 15, 2023

demiankatz commented Jun 21, 2023

demiankatz commented Jun 28, 2023

demiankatz commented Nov 7, 2023

EreMaijala commented Nov 8, 2023

EreMaijala commented Nov 8, 2023

demiankatz left a comment

Language labels matching MARC standard #2933

Language labels matching MARC standard #2933

Conversation

damien-git commented Jun 7, 2023 • edited by demiankatz Loading

demiankatz commented Jun 7, 2023

damien-git commented Jun 7, 2023

demiankatz commented Jun 7, 2023

EreMaijala commented Jun 9, 2023

demiankatz commented Jun 9, 2023

EreMaijala commented Jun 9, 2023

demiankatz commented Jun 9, 2023

demiankatz commented Jun 12, 2023

damien-git commented Jun 15, 2023

demiankatz commented Jun 15, 2023 • edited Loading

damien-git commented Jun 15, 2023

demiankatz commented Jun 15, 2023

demiankatz commented Jun 21, 2023

demiankatz commented Jun 28, 2023

demiankatz commented Nov 7, 2023

EreMaijala commented Nov 8, 2023

EreMaijala commented Nov 8, 2023

demiankatz left a comment

Choose a reason for hiding this comment

damien-git commented Jun 7, 2023 •

edited by demiankatz

Loading

demiankatz commented Jun 15, 2023 •

edited

Loading