Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Language labels matching MARC standard #2933

Merged
merged 4 commits into from
Nov 13, 2023

Conversation

damien-git
Copy link
Contributor

@damien-git damien-git commented Jun 7, 2023

See #2928.

TODO

  • When merging, add changelog note about need to reindex.
  • Open followup PR adding support for indexing ISO codes as discussed below.

@demiankatz demiankatz requested review from EreMaijala and lahmann June 7, 2023 16:48
@demiankatz
Copy link
Member

Thanks, @damien-git, this is very helpful! I'm requesting a couple of other reviews -- I'm interested in @EreMaijala's opinion about this (particularly with regard to how it might impact RecordManager), and @lahmann's input on whether this might cause his team any problems (since I know he's been interested in translation of language names for a long time).

There's also a question of whether this should be done as part of the 9.1 release or the 10.0 release -- introducing this will require a full reindex to ensure consistency of language facets, and it also offers a bit of a backward compatibility break in the sense that saved search links containing changed facet values will no longer work (though I suspect this is a rare/insignificant situation).

In any case, I'll be happy to merge and approve this once we have some consensus on timing, etc. Thanks again!

@damien-git
Copy link
Contributor Author

introducing this will require a full reindex to ensure consistency of language facets

Good point, it only takes effect after re-importing data into Solr, and a partial import would create inconsistent results.

@demiankatz
Copy link
Member

Good point, it only takes effect after re-importing data into Solr, and a partial import would create inconsistent results.

I've added a TODO checkbox to the top of the PR as a reminder to put a changelog note about this into whichever release we end up merging it into. :-)

@EreMaijala
Copy link
Contributor

@demiankatz Well, I'd prefer indexing codes instead of translations. They're shorter, and you can change the translations (like "Ancient Greek" => "Greek, Ancient") without affecting searches, and of course display them in the UI language. The downside is that sorting in index order won't match the translations.

RecordManager comes with a sample mapping that normalizes language codes to ISO 639-3 as far as possible, so the missing piece would be translations for them in VuFind, and probably a mechanism to sort them properly if displayed in advanced search form. I can provide English, Finnish and Swedish translations, if deemed useful.

@demiankatz
Copy link
Member

@EreMaijala, maybe the best way to move forward with this is to add support for ISO codes without changing the existing behavior (at least, not yet). That is, keep the current language field, but add something like iso639_3_str_mv for ISO codes. Then we can index into both fields, and end users can choose whether they want the flexibility of using codes or the advantages (like backward compatibility and index sortability) of the "legacy" approach. This would also allow us to potentially change the language display logic so that if ISO codes are found in the index, we display language by translating the codes... but if they are not, we fall back to displaying the legacy language field. This would add flexibility for displaying unusual languages if there is a need to represent something that does not correspond with a code. What do you think? If this seems like a potentially valuable approach, maybe we can start a separate PR to begin implementing it, and we can keep this one focused on simply bringing the legacy strings up to date.

@EreMaijala
Copy link
Contributor

@demiankatz Makes sense to me!

@demiankatz
Copy link
Member

Thanks, @EreMaijala. If you have a chance to start a PR with your translation files, I can then work on turning it into a replacement for #413, and perhaps we can finally close our oldest open PR. :-)

@demiankatz
Copy link
Member

Thanks, @EreMaijala -- at this point, I'll just wait for @lahmann's input to see if that impacts the timing of the merge for these updates.

@damien-git
Copy link
Contributor Author

There is a text encoding issue: I used UTF-8, but apparently it needs to be in ISO-8859-1 to be imported correctly into Solr (I guess it's still the default encoding for properties files in Java). Unfortunately one character does not appear in ISO-8859-1, the ã in Lahndā (U+0101). I think I will just use "a" to replace it. In the long run it would be nice to use UTF-8 for these files.

@demiankatz
Copy link
Member

demiankatz commented Jun 15, 2023

@damien-git, which version of the code are you using? The most recent release of SolrMarc (which I updated in the dev branch fairly recently) changes the encoding requirement from ISO-8859-1 to UTF-8, which should solve your problems. See #2942.

@damien-git
Copy link
Contributor Author

I was trying with VuFind 8.1. Will this be in VuFind 9 or 10 ? Either way, I will revert to using UTF-8 for this PR.

@demiankatz
Copy link
Member

I was trying with VuFind 8.1. Will this be in VuFind 9 or 10 ? Either way, I will revert to using UTF-8 for this PR.

The SolrMarc upgrade will be in 9.1.

@demiankatz
Copy link
Member

@damien-git, I have taken care of merging #2942 into the dev branch. Do you mind taking care of conflict resolution here when time permits?

@demiankatz demiankatz added this to the 10.0 milestone Jun 28, 2023
@demiankatz
Copy link
Member

Thanks for the conflict resolution, @damien-git. I have conservatively pinned this PR to the 10.0 milestone, but if @lahmann or others feel it's worth merging sooner in spite of the minor BC implications, I'm not opposed to including it in 9.1 instead. I just want to be sure we don't lose track of this work. I've also added a TODO checkbox as a reminder to work on the ISO code indexing project discussed with @EreMaijala above.

@demiankatz demiankatz self-assigned this Nov 7, 2023
@demiankatz demiankatz added the small Minor changes to relatively few files label Nov 7, 2023
@demiankatz
Copy link
Member

@EreMaijala, I suspect our earlier conversation about opening a PR with ISO code translation mappings may have gotten lost in the shuffle somewhere. Is this something you're still willing/able to do? If we can get that started, we might also be able to start incorporating @ThoWagen's ideas about translation aliases discussed in #413, and move that project forward a bit.

@EreMaijala
Copy link
Contributor

@demiankatz Indeed, I completely forgot about this one. I'll see to it.

@EreMaijala
Copy link
Contributor

@demiankatz #3200.

Copy link
Member

@demiankatz demiankatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, @damien-git! I'm merging this now and will move remaining TODO items over to #3200, which contains a start on the ISO code translation work.

@sturkel89 sturkel89 merged commit 1e899ad into vufind-org:dev Nov 13, 2023
EreMaijala pushed a commit to EreMaijala/vufind that referenced this pull request Aug 27, 2024
Fixes issues with encapsulated records, popups and embedded favorite lists.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement small Minor changes to relatively few files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants