Add Harvesting Source to search facets #10298

Closed
Tracked by #10195
DS-INRAE opened this issue Feb 6, 2024 · 9 comments · Fixed by #10464
Labels: Size: 30 (a percentage of a sprint, 21 hours; formerly size:33) · Type: Feature (a feature request)

Comments

@DS-INRAE
Member

DS-INRAE commented Feb 6, 2024

Overview of the Feature Request
Following the addition of a source name for harvesting clients :

What kind of user is the feature intended for?
(Example users roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)
API User, Guests

What inspired the request?
Needed for our harvested repositories.

What existing behavior do you want changed?
Modify the current search facet "Metadata Source" to include the list of Sources from harvesting clients.

Any open or closed issues related to this feature request?

@pdurbin
Member

pdurbin commented Feb 6, 2024

I remember suggesting this back when we added that facet.

@DS-INRAE are you thinking this would be a system-wide setting? And we'd keep the default as-is but installations could opt in to it? Some installations might like the current behavior.

@DS-INRAE
Member Author

DS-INRAE commented Feb 6, 2024

> @DS-INRAE are you thinking this would be a system-wide setting? And we'd keep the default as-is but installations could opt in to it? Some installations might like the current behavior.

Good question. I'll post it to the group with mockups to see what the other installations think.
Another thought: simply not setting a source name for any of the clients might be the easier solution for installations that do not want to distinguish sources. I don't know whether that would work with the facet mechanism.

@gwendoux
Contributor

gwendoux commented Feb 8, 2024

It could be interesting to include a feature that displays the data sources in search facets.

@DS-INRAE, what kind of source information are you considering for the harvesting client to display? Would it be the server URL, nickname, or Dataverse?

@cmbz cmbz moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Feb 20, 2024
@cmbz cmbz added the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Mar 12, 2024
@cmbz cmbz moved this from SPRINT- NEEDS SIZING to SPRINT READY in IQSS Dataverse Project Mar 12, 2024
@jp-tosca jp-tosca self-assigned this Mar 14, 2024
@jp-tosca jp-tosca moved this from SPRINT READY to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Mar 14, 2024
@jp-tosca jp-tosca self-assigned this Mar 29, 2024
@jp-tosca jp-tosca moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project Mar 29, 2024
@jp-tosca
Contributor

jp-tosca commented Apr 3, 2024

Hi @DS-INRAE 👋🏼, quick question: What would you expect to see on the Metadata Source? As of now, the facet shows the name that was given to the Harvesting Client.

As an example, here are my clients:

image

And here is how it looks:

image

A couple of things come to my mind:

  • The Harvesting Client name can't have spaces so it may not look great for clients with more than 1 word.
  • If we use the name of the original Dataverse we can encounter issues where the same name is used on different sources and would be grouped by the same name.
  • Would it still make sense to include the root between all the other sources?

@DS-INRAE
Member Author

DS-INRAE commented Apr 8, 2024

Hello,
Sorry for not detailing this earlier. I'm lagging behind on the issue descriptions, so the details stayed split between the two issues and incomplete.

> What would you expect to see on the Metadata Source?

What we want to see is the harvested repository's name.

> If we use the name of the original Dataverse we can encounter issues where the same name is used on different sources and would be grouped by the same name.

The case where two OAI sets from the same repository get the same "Source/Repository Name" is actually the expected behavior.
For example, take a repository with two OAI sets (and therefore two clients): one set from institution A going into collection A' and one set from institution B going into collection B'. We would still want the same "Source/Repository Name" for both.
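
The grouping described here can be sketched as follows. This is only an illustration, not Dataverse code: the `HarvestingClient` class, its `source_name` field, and all sample values are hypothetical. It compares facet counts keyed on the client nickname with counts keyed on a shared repository name.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HarvestingClient:
    nickname: str           # unique per client, no spaces
    source_name: str        # hypothetical descriptive repository name
    oai_set: str            # OAI set harvested by this client
    target_collection: str  # local collection receiving the datasets


def facet_counts(dataset_clients, label_of):
    """Count datasets per facet label; label_of maps a client to its label."""
    return Counter(label_of(client) for client in dataset_clients)


# Two clients harvesting two sets from the same (hypothetical) repository.
client_a = HarvestingClient("repoX_setA", "Repository X", "institution_A", "Collection A'")
client_b = HarvestingClient("repoX_setB", "Repository X", "institution_B", "Collection B'")

# One entry per harvested dataset, pointing at the client that harvested it.
dataset_clients = [client_a, client_a, client_b]

by_nickname = facet_counts(dataset_clients, lambda c: c.nickname)
by_source_name = facet_counts(dataset_clients, lambda c: c.source_name)
```

Keyed on the nickname, the facet splits the repository into one entry per client; keyed on the source name, both sets land under a single entry, which is the behavior requested above.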

> Would it still make sense to include the root between all the other sources?

Yes, I think so: it is useful if you specifically want datasets from the current repository, and it gives Dataverse collection admins a quick count.

@landreev
Contributor

landreev commented Apr 8, 2024

@DS-INRAE Hi, I suggested an alternative implementation in the PR earlier today. Specifically, rather than using either the nickname of the Harvesting Client or the descriptive label for the remote repository (still to be added, per #10217), just use the name of the local collection into which the client is harvesting. My comment there: #10464 (comment)

The potential advantage of this solution: makes #10217 unnecessary, while still providing a descriptive, user-friendly facet label.
A disadvantage: it's not going to cover the scenario you just described - multiple harvesting clients harvesting different sets from the same archive, into different local collections, that the local admin may want to group under the same facet.

I'm generally happy to implement it the way you prefer, just figured I'd ask.

@DS-INRAE
Member Author

DS-INRAE commented Apr 9, 2024

Hi @landreev, unfortunately there are two additional limits to this approach, even though it would have been great to avoid adding a new field :)

  • when a harvested repository matches a target collection 1:1, the repository name does not necessarily match the collection name
  • in our case, a Dataverse collection is usually the target of more than one harvesting client/set, coming from different repositories
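
The second limit can be illustrated with a small sketch. The repository and collection names below are made up: when the facet label is the target collection, every repository harvested into that collection collapses into one facet value.

```python
from collections import defaultdict

# (source repository, target collection) per harvesting client; all hypothetical.
clients = [
    ("Repository X", "Partner data"),
    ("Repository Y", "Partner data"),
    ("Repository Z", "Z archive mirror"),
]


def repositories_per_collection(pairs):
    """Group source repositories by the local collection they are harvested into."""
    grouped = defaultdict(set)
    for repository, collection in pairs:
        grouped[collection].add(repository)
    return dict(grouped)


grouped = repositories_per_collection(clients)

# Collections whose facet value would hide more than one source repository.
ambiguous = {name for name, repos in grouped.items() if len(repos) > 1}
```

Here "Partner data" would appear as a single facet entry even though it mixes two repositories, so a collection-based label cannot tell them apart.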

@landreev
Contributor

landreev commented Apr 9, 2024

@DS-INRAE Sure. So, just to confirm, our plan then is to merge the linked PR #10464 as is, with the client nickname used for the facet (for now). Then, when the descriptive label is added, we'll switch to using that?
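
One way to picture this two-step plan is a label function that falls back to the nickname until a descriptive label exists. This is a minimal sketch under assumptions: `facet_label` and its `source_name` parameter are hypothetical names, and the fallback rule is one possible reading of the plan, not the implementation in #10464.

```python
from typing import Optional


def facet_label(nickname: str, source_name: Optional[str] = None) -> str:
    """Return the facet label for a harvesting client.

    Prefer the descriptive source name when one is set (hypothetical field);
    otherwise fall back to the client nickname.
    """
    if source_name and source_name.strip():
        return source_name.strip()
    return nickname


# Before the descriptive label exists, only the nickname is available.
label_now = facet_label("repoX_setA")

# Once a label is set, it takes precedence over the nickname.
label_later = facet_label("repoX_setA", "Repository X")
```

With such a fallback, clients that never receive a descriptive label would keep showing their nickname.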

@DS-INRAE
Member Author

I'm okay with this approach, as discussed with @jp-tosca (thanks for the short summary :) )

Projects
Status: Done
6 participants