Search API: facets issue on translated custom fields #8287

Open
bappun opened this issue Dec 7, 2021 · 4 comments
@bappun

bappun commented Dec 7, 2021

What steps does it take to reproduce the issue?
In a TSV file I have a controlled vocabulary with the value Politics.Elections for Topic Classification Term. This value is then translated into two languages using Java properties files:

  • EN: Elections
  • FR: Élections

When I query the Search API with facets enabled (show_facets=true), the label for this field is taken from the English translation. I get this:

"topicClassValue_ss": {
    "friendly": "Topic Classification Term",
    "labels": [
        {"Elections": 2657}
    ]
}

This becomes an issue when I try to search Dataverse using this facet. When I search for topicClassValue_ss:"Elections" I get no results, because the search needs the untranslated value: topicClassValue_ss:"Politics.Elections". However, there is no way to get that untranslated value from the API.
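To make the mismatch concrete, here is a minimal sketch that builds the two Search API URLs side by side without sending any request. The base URL is a placeholder and the helper names (build_search_url, facet_filter) are hypothetical; only the /api/search endpoint and its q, fq and show_facets parameters come from the Search API itself.

```python
from urllib.parse import urlencode

# Assumption: placeholder host, replace with your own installation.
BASE_URL = "https://dataverse.example.org"


def facet_filter(field, value):
    """Build a filter query that matches the facet value as an exact phrase."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def build_search_url(base_url, query, filter_query=None, show_facets=True):
    """Build a Search API URL (hypothetical helper, performs no request)."""
    params = [("q", query), ("show_facets", "true" if show_facets else "false")]
    if filter_query is not None:
        params.append(("fq", filter_query))
    return f"{base_url}/api/search?{urlencode(params)}"


# Facet label as returned by the API (English translation): 0 results reported.
translated = build_search_url(
    BASE_URL, "*", facet_filter("topicClassValue_ss", "Elections")
)

# Untranslated value from the TSV file: the form the index matches.
untranslated = build_search_url(
    BASE_URL, "*", facet_filter("topicClassValue_ss", "Politics.Elections")
)

print(translated)
print(untranslated)
```

The two URLs differ only in the fq value, which is why a client that only sees the facet label cannot construct the working query.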

  • When does this issue occur?
    When querying the Search API with a translated facet where the English translation is different from the value in the TSV file.

  • Which page(s) does it occur on?
    In the search API.

  • What happens?
    The query returns 0 elements.

  • To whom does it occur (all users, curators, superusers)?
    Tested only on public API (with no token), but it should occur for any user.

  • What did you expect to happen?
    To be able to filter on the translated facet value, or to have the API return the untranslated value to use in queries as an additional field.

Which version of Dataverse are you using?
5.8

Any related open or closed issues to this bug report?
#8286

@pdurbin
Member

pdurbin commented Dec 8, 2021

@bappun I just saw you announce https://cdsp-scpo.github.io/dataverse-feed/build/ at https://dataversecommunity.slack.com/archives/C5V66TV6Y/p1638987754055400

I expected to see 127 results when checking the box for "Elections" under "Topic Classification Term" but I got 0 items (screenshots below). Is this because of this issue you're reporting?

Screen Shot 2021-12-08 at 1 35 29 PM

Screen Shot 2021-12-08 at 1 35 34 PM

@bappun
Author

bappun commented Dec 10, 2021

@pdurbin Yes! I noticed the issue while working on this prototype. The online demo is using Dataverse v4.20 but I also tried with a pre-production instance using v5.8 and noticed the same problem on my implementation.

I would like to reproduce the same behavior as in the Dataverse UI where selecting the Elections facet adds the Politics.Elections filter:
image

@qqmyers
Member

qqmyers commented Jan 7, 2022

FWIW: My guess is that it is just a bug that the filter shown above the results uses the base term. For the external vocab mechanism, both the facets and the filter are translated in the UI - I think as requested in review.

I'll also note that this issue makes even a simple search for a controlled vocabulary value (CVV) hard: if the translation of Politics.Elections were 'Voting' (or anything else without the words 'politics' or 'elections' in it), a simple search for the term visible on the page wouldn't return any results either. So, it isn't just an API issue.

After some discussion, it sounds like indexing the CVV values for all configured languages could be a reasonable way to solve this. (I think this can be done so the facets aren't affected but filtering for the base term or any translation would get a hit.) Unless there are concerns/somebody can see a problem with this approach, I'll look into it on Sciences PO's behalf.
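As a rough illustration of that idea (not Dataverse's actual indexing code, which lives in Java and Solr), the sketch below keeps a single facet value per term while making the base term and every configured translation searchable. The function names and the document shape are assumptions made up for this example.

```python
def build_index_doc(base_term, translations):
    """Return a toy search document (hypothetical shape).

    facet_value stays the base term, so facet counts are not split by language.
    searchable_values holds the base term plus every translation, deduplicated
    while preserving order.
    """
    values = [base_term] + [translations[lang] for lang in sorted(translations)]
    return {
        "facet_value": base_term,
        "searchable_values": list(dict.fromkeys(values)),
    }


def matches(doc, term):
    """Case-insensitive exact match against any indexed value."""
    wanted = term.casefold()
    return any(wanted == value.casefold() for value in doc["searchable_values"])


doc = build_index_doc(
    "Politics.Elections",
    {"en": "Elections", "fr": "Élections"},
)

# The base term and both translations hit; an unrelated term does not.
for term in ("Politics.Elections", "Elections", "Élections", "Voting"):
    print(term, matches(doc, term))
```

With this layout a filter on either the label shown to the user or the base term finds the document, while the facet itself is still computed from one value.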

qqmyers added the "GDCC: SciencesPO" label (related to GDCC work for Sciences PO) on Jan 21, 2022

@pdurbin
Member

pdurbin commented Oct 10, 2022

> it sounds like indexing the CVV values for all configured languages could be a reasonable way to solve this

Sure, I think that approach is worth exploring, at least.
