Document.pid mapping adds the prefix pid #356

Closed
antolinos opened this issue May 9, 2022 · 2 comments

@antolinos
Collaborator

This is more a question than an issue. Why is the pid field prefixed with pid: when the value is an integer?

Description:

I am using the default search_api_mapping.json mapping file where the pid of a document is defined as:

  "Document": {
    "base_icat_entity": "Investigation",
    "pid": ["doi", "id"],

If I do:

http://127.0.0.1:5000/search-api/documents

The returned pid values look like:

pid | "10.15151/ESRF-ES-94520246"
pid | "pid:13175377"

I see that it is not an error as it is forced in the code:

return f"pid:{value}" if isinstance(value, int) else value
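
For illustration, the quoted line can be wrapped in a standalone function to show both cases. This is only a sketch: the name format_pid is hypothetical and the surrounding code in the project may look different.

```python
def format_pid(value):
    """Hypothetical wrapper around the line quoted above."""
    # Integers (ICAT ids) get the "pid:" prefix; anything else,
    # such as a DOI string, is returned unchanged.
    return f"pid:{value}" if isinstance(value, int) else value


print(format_pid("10.15151/ESRF-ES-94520246"))  # DOI string, left as-is
print(format_pid(13175377))                     # integer id, gets prefixed
```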

However, looking at the Document definition given in the search-api implementation for sci-cat, I did not see that it was mandatory:
https://github.com/panosc-eu/search-api/blob/master/doc/data-model.md#document

So, I was wondering the reason why there is a prefix.

Thanks!

@VKTB
Contributor

VKTB commented May 9, 2022

Hi @antolinos, thank you for your question.

The prefix is not mandated by the Data Model guide; it is a best-effort solution to make the identifiers persistent. The ICAT columns that map to the PaNOSC pid fields are nullable (optional), so their values can be NULL for some entity instances. At ISIS, we discovered that the pid values of some ICAT Samples are NULL, so we decided that in such cases the Sample's id is used instead and the PaNOSC pid field is set in the pid:<id> format. At the same time, we found that the corresponding pid columns of the other ICAT entities are also nullable, so we applied the same solution to the rest of the PaNOSC entities, not just Sample. The decision to add the prefix at the Search API level was based on the suggestions in icatproject/icat.server#231.

Considering the above, your Document with pid pid:13175377 is prefixed with pid: because the doi value of the corresponding investigation in ICAT is NULL, so its id (the alternative ICAT mapping field) is used instead and the pid is set in the pid:<id> format. I hope this makes sense.
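
The fallback described above can be sketched roughly as follows. This is an illustration only, not the actual Search API code: resolve_pid, the dict-based record, and the id value 1 in the first example are all assumptions made for the sketch. Only the ["doi", "id"] mapping order and the pid:<id> format come from this thread.

```python
def resolve_pid(icat_record, mapped_fields=("doi", "id")):
    """Hypothetical helper: pick the first non-NULL mapped field as the pid."""
    for field in mapped_fields:
        value = icat_record.get(field)
        if value is not None:
            # An integer means we fell back to the ICAT id, so prefix it.
            return f"pid:{value}" if isinstance(value, int) else value
    # Every mapped field was NULL (None) or missing.
    return None


# Investigation with a DOI: the DOI is used unchanged (the id here is made up).
print(resolve_pid({"doi": "10.15151/ESRF-ES-94520246", "id": 1}))
# Investigation whose doi is NULL: the id is used in the pid:<id> format.
print(resolve_pid({"doi": None, "id": 13175377}))
```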

Relevant issue to the work mentioned above - #314
Relevant PR to the work mentioned above - #324

@antolinos
Collaborator Author

Dear @VKTB. Thanks for your complete explanation.

Currently, the format of the results looks a little heterogeneous. I am not sure whether the PaNOSC clients will be able to handle this when there is no DOI.
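
As a rough sketch of what handling both forms could look like on the client side, the two formats can be told apart by their prefix. The function classify_pid and its return labels are hypothetical and not part of any PaNOSC client; the only assumptions are that fallback values start with pid: and that DOIs start with 10.

```python
def classify_pid(pid: str) -> str:
    """Hypothetical client-side check for the two pid formats seen above."""
    if pid.startswith("pid:"):
        # Fallback form built from the ICAT id, e.g. "pid:13175377".
        return "icat-id"
    if pid.startswith("10."):
        # DOI names begin with the "10." directory indicator.
        return "doi"
    return "unknown"


for pid in ("10.15151/ESRF-ES-94520246", "pid:13175377"):
    print(pid, classify_pid(pid))
```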

Thanks!
