Resolve conflicting version of MIME4J #10301

Merged
merged 3 commits into from
Oct 24, 2024

Conversation

poikilotherm
Contributor

What this PR does / why we need it:

  • Apache Abdera Parser, Apache Tika, and RESTeasy (testing) all use MIME4J.
  • Tika and RESTeasy use newer APIs that are only present in v0.8 and later.
  • Abdera is an abandoned project that uses v0.7.2; it is hopefully compatible with newer releases.
  • v0.8.4, the version brought in by Apache Tika, relies on the vulnerable Apache Commons IO 2.6, while we want 2.11 per our dependency management. This PR upgrades to v0.8.7, the earliest version with a 2.11 dependency.
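
For readers unfamiliar with how such a conflict is usually resolved in Maven, the following is a minimal sketch of pinning the version through `dependencyManagement`. It is not the actual diff of this PR: the placement in the parent POM and the choice to pin only `apache-mime4j-core` are assumptions made for illustration. The group ID `org.apache.james` is the published coordinate for MIME4J.

```xml
<!-- Sketch only, not the PR's actual change. -->
<!-- Pinning the version here makes every transitive user (Abdera, Tika, RESTeasy) -->
<!-- resolve the same MIME4J release instead of whichever one Maven finds first. -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.apache.james</groupId>
            <artifactId>apache-mime4j-core</artifactId>
            <!-- v0.8.7 is the version named in this PR description -->
            <version>0.8.7</version>
        </dependency>
        <!-- Assumption: other MIME4J artifacts on the classpath -->
        <!-- would need the same pin to stay in sync. -->
    </dependencies>
</dependencyManagement>
```

Running `mvn dependency:tree -Dincludes=org.apache.james` afterwards is one way to check which MIME4J version each module actually resolves.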

Which issue(s) this PR closes:

Closes #9077

Special notes for your reviewer:
None

Suggestions on how to test this:
Let Jenkins run the SWORD2 tests. Maybe @qqmyers can tell us how to run tests for full text indexing?

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope

Is there a release notes update needed for this change?:
Nope

Additional documentation:
None

@coveralls

coveralls commented Feb 6, 2024

Coverage Status

coverage: 20.869%. remained the same
when pulling 535d531 on 9077-fix-mime4j
into a0cb73d on develop.

@qqmyers
Member

qqmyers commented Feb 6, 2024

@poikilotherm Tika has a v2.9.1 (we're at 2.4.1), which I think includes the MIME4J v0.8.7 you want. Should we upgrade Tika in addition, or instead? At QDR, 2.9.1 looks like it works as well as or better than the earlier version.

Re: testing - we don't have a suite of files to test all of full-text indexing so the basic test would be to configure full-text indexing (":SolrFullTextIndexing":"true"), reindex a dataset with test file(s) of various types, and see if they appear in search results for a term in the text (and don't appear in search when full-text is off).

poikilotherm added the Feature: Indexing and Size: 3 labels on Feb 6, 2024
@poikilotherm
Contributor Author

I agree - we should upgrade Tika. Let me check whether I can provide a Testcontainers-based integration test; it would be good to have this use case properly covered.

qqmyers added the GDCC: DANS and GDCC: QDR labels on Feb 7, 2024
qqmyers added this to the 6.4 milestone on Jul 10, 2024
qqmyers added the Consider For Next Release label on Jul 10, 2024
qqmyers removed this from the 6.4 milestone on Jul 10, 2024
@pdurbin
Member

pdurbin commented Sep 11, 2024

@poikilotherm can you please resolve merge conflicts? ❤️

cmbz added the FY25 Sprint 7 (2024-09-25 - 2024-10-09) label on Sep 25, 2024
cmbz added this to the 6.5 milestone on Sep 30, 2024
Member

@pdurbin pdurbin left a comment

@poikilotherm never mind, I resolved them. I didn't run the code myself, but I'm approving this.


github-actions bot commented Oct 2, 2024

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9077-fix-mime4j
ghcr.io/gdcc/configbaker:9077-fix-mime4j

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

cmbz added the FY25 Sprint 8 (2024-10-09 - 2024-10-23) label on Oct 9, 2024
pdurbin added the Type: Bug label on Oct 9, 2024
cmbz added the FY25 Sprint 9 (2024-10-23 - 2024-11-06) label on Oct 23, 2024
ofahimIQSS self-assigned this on Oct 24, 2024
@ofahimIQSS
Contributor

Spoke to Jim about testing this. Essentially, I performed a smoke test with SolrFullTextIndexing set to true. No issues found during testing. Merging PR.
Testing of 10301.docx

ofahimIQSS merged commit e2d1b6a into develop on Oct 24, 2024
21 checks passed
ofahimIQSS deleted the 9077-fix-mime4j branch on October 24, 2024 19:55
ofahimIQSS removed their assignment on Oct 24, 2024
Labels
  • Consider For Next Release - A simple change (eg bug fix) that would be good to prioritize since it has been seen in the wild
  • Feature: Indexing
  • FY25 Sprint 7 (2024-09-25 - 2024-10-09)
  • FY25 Sprint 8 (2024-10-09 - 2024-10-23)
  • FY25 Sprint 9 (2024-10-23 - 2024-11-06)
  • GDCC: DANS - related to GDCC work for DANS
  • GDCC: QDR - of interest to QDR
  • Size: 3 - A percentage of a sprint. 2.1 hours.
  • Type: Bug - a defect
Projects
Status: Done 🧹
Development

Successfully merging this pull request may close these issues.

sword2-server library overrides tika's apache-mime4j-core dependency with older version
6 participants