-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Language labels matching MARC standard #2933
Conversation
Thanks, @damien-git, this is very helpful! I'm requesting a couple of other reviews -- I'm interested in @EreMaijala's opinion about this (particularly with regard to how it might impact RecordManager), and @lahmann's input on whether this might cause his team any problems (since I know he's been interested in translation of language names for a long time). There's also a question of whether this should be done as part of the 9.1 release or the 10.0 release -- introducing this will require a full reindex to ensure consistency of language facets, and it also offers a bit of a backward compatibility break in the sense that saved search links containing changed facet values will no longer work (though I suspect this is a rare/insignificant situation). In any case, I'll be happy to merge and approve this once we have some consensus on timing, etc. Thanks again! |
Good point, it only takes effect after re-importing data into Solr, and a partial import would create inconsistent results. |
I've added a TODO checkbox to the top of the PR as a reminder to put a changelog note about this into whichever release we end up merging it into. :-) |
@demiankatz Well, I'd prefer indexing codes instead of translations. They're shorter, and you can change the translations (like "Ancient Greek" => "Greek, Ancient") without affecting searches, and of course display them in the UI language. The downside is that sorting in index order won't match the translations. RecordManager comes with a sample mapping that normalizes language codes to ISO 639-3 as far as possible, so the missing piece would be translations for them in VuFind, and probably a mechanism to sort them properly if displayed in advanced search form. I can provide English, Finnish and Swedish translations, if deemed useful. |
@EreMaijala, maybe the best way to move forward with this is to add support for ISO codes without changing the existing behavior (at least, not yet). That is, keep the current language field, but add something like iso639_3_str_mv for ISO codes. Then we can index into both fields, and end users can choose whether they want the flexibility of using codes or the advantages (like backward compatibility and index sortability) of the "legacy" approach. This would also allow us to potentially change the language display logic so that if ISO codes are found in the index, we display language by translating the codes... but if they are not, we fall back to displaying the legacy language field. This would add flexibility for displaying unusual languages if there is a need to represent something that does not correspond with a code. What do you think? If this seems like a potentially valuable approach, maybe we can start a separate PR to begin implementing it, and we can keep this one focused on simply bringing the legacy strings up to date. |
@demiankatz Makes sense to me! |
Thanks, @EreMaijala. If you have a chance to start a PR with your translation files, I can then work on turning it into a replacement for #413, and perhaps we can finally close our oldest open PR. :-) |
Thanks, @EreMaijala -- at this point, I'll just wait for @lahmann's input to see if that impacts the timing of the merge for these updates. |
There is a text encoding issue: I used UTF-8, but apparently it needs to be in ISO-8859-1 to be imported correctly into Solr (I guess it's still the default encoding for properties files in Java). Unfortunately one character does not appear in ISO-8859-1, the ã in Lahndā (U+0101). I think I will just use "a" to replace it. In the long run it would be nice to use UTF-8 for these files. |
@damien-git, which version of the code are you using? The most recent release of SolrMarc (which I updated in the dev branch fairly recently) changes the encoding requirement from ISO-8859-1 to UTF-8, which should solve your problems. See #2942. |
I was trying with VuFind 8.1. Will this be in VuFind 9 or 10 ? Either way, I will revert to using UTF-8 for this PR. |
This reverts commit a007483.
The SolrMarc upgrade will be in 9.1. |
@damien-git, I have taken care of merging #2942 into the dev branch. Do you mind taking care of conflict resolution here when time permits? |
Thanks for the conflict resolution, @damien-git. I have conservatively pinned this PR to the 10.0 milestone, but if @lahmann or others feel it's worth merging sooner in spite of the minor BC implications, I'm not opposed to including it in 9.1 instead. I just want to be sure we don't lose track of this work. I've also added a TODO checkbox as a reminder to work on the ISO code indexing project discussed with @EreMaijala above. |
@EreMaijala, I suspect our earlier conversation about opening a PR with ISO code translation mappings may have gotten lost in the shuffle somewhere. Is this something you're still willing/able to do? If we can get that started, we might also be able to start incorporating @ThoWagen's ideas about translation aliases discussed in #413, and move that project forward a bit. |
@demiankatz Indeed, I completely forgot about this one. I'll see to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again, @damien-git! I'm merging this now and will move remaining TODO items over to #3200, which contains a start on the ISO code translation work.
Fixes issues with encapsulated records, popups and embedded favorite lists.
See #2928.
TODO