8941 adding file count in solr (v2) #10598

Merged

Conversation

luddaniel
Contributor

@luddaniel luddaniel commented May 30, 2024

This PR follows up on the closed PR #9823, with an up-to-date branch and more details.

What this PR does / why we need it:

This PR handles two subjects in one go:

  • adding the file count of a dataset version directly to the Solr document (useful for Search API use cases, e.g. searching for datasets without files: http://localhost:8080/api/search?q=fileCount:0)
  • moving the calculation of a dataset version's file count to the indexing step instead of the result JSON building, which speeds up the response when requesting a large number of items per page (e.g. &per_page=1000)
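
As a rough illustration of the first point, here is a minimal Java sketch that builds such a Search API URL. The class and method names (FileCountSearchUrl, buildSearchUrl) are hypothetical and not part of this PR; only the fileCount field, the /api/search endpoint, and the per_page parameter come from the description above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FileCountSearchUrl {

    // Hypothetical helper: builds a Search API URL for a query such as "fileCount:0".
    // baseUrl and perPage are illustrative parameters, not something this PR defines.
    static String buildSearchUrl(String baseUrl, String query, int perPage) {
        // URLEncoder percent-encodes the colon, so "fileCount:0" becomes "fileCount%3A0".
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/api/search?q=" + encodedQuery + "&per_page=" + perPage;
    }

    public static void main(String[] args) {
        // Prints: http://localhost:8080/api/search?q=fileCount%3A0&per_page=1000
        System.out.println(buildSearchUrl("http://localhost:8080", "fileCount:0", 1000));
    }
}
```

The query is encoded before it is appended so that the same helper also works for queries containing spaces or other reserved characters.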

two birds one stone

Which issue(s) this PR closes:

Special notes for your reviewer:

As the original PR was not merged after 9 months in an open state, some competing developments were made on the same lines of code. I just finished the rework after merging develop a0028c3 and noticed that @GPortas was involved in those developments. @GPortas, could you please be one of the reviewers and approve this PR?

Suggestions on how to test this:

Check the Search API results for dataset versions with files and without files.
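
The idea behind this check can be sketched as a toy model, without a running installation. This is not the actual indexing code: the class, the methods, and the "title" key are hypothetical stand-ins, and a plain Map plays the role of the Solr document. Only the fileCount field name comes from this PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FileCountIndexSketch {

    // Hypothetical stand-in for the indexing step: the file count is computed
    // once here and stored in the document under "fileCount".
    static Map<String, Object> indexDatasetVersion(String title, List<String> fileNames) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", title); // illustrative key, not the real schema field
        doc.put("fileCount", (long) fileNames.size());
        return doc;
    }

    // Hypothetical stand-in for result building: it only reads the stored
    // value and does not count files again for each item on the page.
    static long fileCountForResult(Map<String, Object> doc) {
        return (Long) doc.get("fileCount");
    }

    public static void main(String[] args) {
        Map<String, Object> withFiles = indexDatasetVersion("With files", List.of("a.csv", "b.csv"));
        Map<String, Object> withoutFiles = indexDatasetVersion("No files", List.of());
        System.out.println(fileCountForResult(withFiles));
        System.out.println(fileCountForResult(withoutFiles));
    }
}
```

The two cases mirror the suggestion above: one dataset version with files and one without, where the second should match a query on fileCount:0.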

Is there a release notes update needed for this change?:

Maybe not required; let me know.

Demo:

filecount_demo.mp4

Screenshot of API Search result of the same dataset after publishing

Screenshot from 2024-05-30 15-14-31

@pdurbin pdurbin requested a review from GPortas May 30, 2024 14:49
Member

@pdurbin pdurbin left a comment

@luddaniel overall, this looks great. Nice feature. I didn't run the code but it makes sense from a quick look.

I did leave some comments.

@GPortas you might want to review this PR because it touches code you added in this PR:

conf/solr/9.3.0/schema.xml (outdated)
@@ -51,6 +51,7 @@ public void setUp() {
indexService.settingsService = Mockito.mock(SettingsServiceBean.class);
indexService.dataverseService = Mockito.mock(DataverseServiceBean.class);
indexService.datasetFieldService = Mockito.mock(DatasetFieldServiceBean.class);
indexService.datasetVersionFilesServiceBean = Mockito.mock(DatasetVersionFilesServiceBean.class);
BrandingUtil.injectServices(indexService.dataverseService, indexService.settingsService);

Mockito.when(indexService.dataverseService.findRootDataverse()).thenReturn(dataverse);
Member

Is it worth adding a test to SearchIT.java?

Contributor Author

}
if (!JvmSettings.UI_SHOW_VALIDITY_LABEL_WHEN_PUBLISHED.lookupOptional(Boolean.class).orElse(true)) {
return true;
}
Member

Are these tabs? Can they please be reverted to spaces?

Contributor Author

@luddaniel luddaniel May 30, 2024

My IntelliJ IDEA says the indentation is set to 4 spaces. Anyway, I fixed this because the indentation was broken inside public boolean isValid(Predicate<SolrSearchResult> canUpdateDataset) {

Contributor

@GPortas GPortas left a comment

Looks good to me!

@luddaniel luddaniel requested a review from pdurbin June 5, 2024 12:51
@luddaniel
Contributor Author

@pdurbin it will be a forsaken PR soon ;)

@pdurbin
Member

pdurbin commented Sep 19, 2024

@luddaniel hmm? Sorry, what do you mean? 😅

@pdurbin pdurbin added the Size: 3 A percentage of a sprint. 2.1 hours. label Sep 27, 2024
@coveralls

Coverage Status

coverage: 20.876% (-0.3%) from 21.137%
when pulling aa3f855 on Recherche-Data-Gouv:8941-adding-fileCount-in-solr
into 45bdf5a on IQSS:develop.

@luddaniel
Contributor Author

@pdurbin this PR has been updated with develop. I tested it again and it looks good to me.

@pdurbin pdurbin added the Type: Feature a feature request label Oct 9, 2024
@luddaniel
Contributor Author

@pdurbin Is this in scope for 6.5?

@pdurbin
Member

pdurbin commented Nov 7, 2024

@cmbz @scolapasta what do you think?

@cmbz cmbz added the FY25 Sprint 10 FY25 Sprint 10 (2024-11-06 - 2024-11-20) label Nov 7, 2024
@cmbz cmbz added this to the 6.5 milestone Nov 7, 2024
@stevenwinship stevenwinship self-assigned this Nov 13, 2024
Contributor

@stevenwinship stevenwinship left a comment

Looks good to me. I don't see any negative comments from anyone about this change so I will be approving.

@stevenwinship stevenwinship removed their assignment Nov 13, 2024
@ofahimIQSS ofahimIQSS self-assigned this Nov 19, 2024
@cmbz cmbz added the FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) label Nov 21, 2024
@ofahimIQSS
Contributor

No issues observed during testing. Merging PR.

Screen.Recording.2024-11-22.at.3.24.47.PM.mov
Screen.Recording.2024-11-22.at.3.20.41.PM.mov

@ofahimIQSS ofahimIQSS merged commit f95c1a0 into IQSS:develop Nov 22, 2024
14 checks passed
@ofahimIQSS ofahimIQSS removed their assignment Nov 22, 2024
@jeromeroucou jeromeroucou deleted the 8941-adding-fileCount-in-solr branch November 25, 2024 07:50
@pdurbin
Member

pdurbin commented Dec 9, 2024

I just asked @donsizemore to reindex the beta server (thanks!) and now we can see this feature working.

I went to https://beta.dataverse.org/dataverse/root/?q=fileCount:25 and found a dataset with 25 files (cat pictures, it seems).

Screenshot 2024-12-09 at 3 08 57 PM

Screenshot 2024-12-09 at 3 09 11 PM

Labels
FY25 Sprint 10 (2024-11-06 - 2024-11-20)
FY25 Sprint 11 (2024-11-20 - 2024-12-04)
Size: 3 (A percentage of a sprint. 2.1 hours.)
Type: Feature (a feature request)
Projects
Status: Done 🧹
Status: 🚀 Done (Recherche Data Gouv)
Development

Successfully merging this pull request may close these issues.

fileCount in Solr Documents and API performance
8 participants