-
Notifications
You must be signed in to change notification settings - Fork 495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search API results payload extension #10811
Conversation
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
Does this PR also resolved/includes the requested changes of #9788? |
@johannes-darms No. This PR adds new fields to the Search API result, without changing existing ones. The scopes are different. I think what is proposed in #9788 requires further discussion, in particular for maintaining backward compatibility in the endpoint, since existing fields would be redefined / renamed. |
This comment has been minimized.
This comment has been minimized.
2 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Partly the issue list difference between list/search and get, but also the missing items needed to recreate the dataverse/collection cards based on the search response like logo, affiliation and description. Some of these are already implemented in this PR. Could we add the rest in this PR as well?
Only suggestion two of the problem changes a property, the rest are just additions. Phil has already proposed a solution that would work. I think this is a great PR to fix the problems listed in #9788 and #9713. |
This comment has been minimized.
This comment has been minimized.
I am working on adding your requested properties. I see that the linked icon depends on the If we want to expose the flag as it is used by the JSF UI, we need to move the calculation logic out of What is already indexed and therefore straightforward to expose is I'm still looking into this, but I think determining the linked status may require exploring a solution that would go beyond the scope of this PR / issue. |
Many thanks!
Bummer, I totally agree with you that this refactoring should be a separated PR. Especially as there are two for loops to determine the harvested and linked status of an item, refactoring could improve the performance. |
8d048f3
to
29056fb
Compare
…-api-payload-extensions
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks great. I'm asking some questions in this review.
@@ -151,6 +173,8 @@ https://demo.dataverse.org/api/search?q=trees | |||
} | |||
} | |||
|
|||
Note that the image_url field, if exists, will be returned as a regular URL for Datasets, while for Files and Dataverses, it will be returned as a Base64 URL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels really weird. image_url
is sometimes a URL (good!) and sometimes a base64 value (weird!)? Can we pick one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for keeping track in the PR. I repost the same info I sent to you on Slack.
For the SPA, using an absolute URL is more suitable than Base64, as it allows the browser to automatically cache the resources. However, I believe there might be performance-related reasons for using Base64. For reference, see: https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/search/SolrSearchResult.java#L42.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Browser caching should work for the current UI as well, so I'm doubtful that there's a big performance gain embedding base64.
], | ||
"affiliation": "Dataverse.org", | ||
"parentDataverseName": "Root", | ||
"parentDataverseIdentifier": "root" | ||
}, | ||
{ | ||
"name":"Darwin's Finches", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Datasets can have identifier_of_dataverse
but not parentDataverseIdentifier
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. parentDataverseIdentifier is only for Dataverses.
"publicationStatuses": [ | ||
"Published" | ||
], | ||
"releaseOrCreateDate": "2016-05-10T12:53:39Z" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Files do not get parentDataverseIdentifier
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! Just for Dataverses
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
This comment has been minimized.
This comment has been minimized.
…/dataverse into 10810-search-api-payload-extensions
📦 Pushed preview images as
🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this PR does / why we need it:
Currently, the search API does not return all the attributes needed for files and collections, required to display the file and collection item cards.
I have extended the Search API result payloads to include:
The schema.xml file for Solr has been updated to include a new field called dvParentAlias for the parentDataverseIdentifier. This field is now indexed in the IndexServiceBean.
The
image_url
field was already included in the SolrSearchResult JSON, but it wasn’t returned by the API because it was appended only after the Solr query was processed in the SearchIncludeFragment of JSF. Now, the field is set in SearchServiceBean, ensuring it is always returned by the API when an image is available.However, in the
search.rst
documentation, the field was incorrectly listed as being returned by the API, even though it never was since it was only set in JSF.I have reused the same URL generation logic from JSF. But currently, URL generation differs between datasets and dataverses&files. For datasets, an absolute URL is generated, while for files and dataverses, a Base64 URL is generated. In a future iteration, it would be beneficial to standardize the URL format so that the same type of URL is returned for all entities. I have mentioned this in the docs.
Which issue(s) this PR closes:
Special notes for your reviewer:
For the next released version, a Solr reindex will be necessary to apply the new schema.xml version. (Mentioned in the release notes)
Suggestions on how to test this:
Create dataverses, datasets and files and test the search endpoint to verify that new params are correctly retrieved.
curl "http://localhost:8080/api/search?q=*&sort=date&order=desc" -H "X-Dataverse-Key: <YOUR_API_KEY>"
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No
Is there a release notes update needed for this change?:
Yes
Additional documentation:
No
Preview docs at https://dataverse-guide--10811.org.readthedocs.build/en/10811/api/search.html
Related issues:
image_url
from Search API results no longer yields a downloadable image #3616