sword2-server library overrides tika's apache-mime4j-core dependency with older version #9077

Closed
janvanmansum opened this issue Oct 18, 2022 · 6 comments · Fixed by #10301
Labels
Component: Code Infrastructure (formerly "Feature: Code Infrastructure") · Feature: API · Feature: Search/Browse · Type: Suggestion (an idea) · User Role: Sysadmin (installs, upgrades, and configures the system, connects via ssh)

Comments

@janvanmansum
Contributor

What steps does it take to reproduce the issue?

  1. Turn full text indexing on: curl -X PUT -d true http://localhost:8080/api/admin/settings/:SolrFullTextIndexing

  2. Create a dataset

  3. Upload an e-mail file, for example

     From: A
     To: B
     Subject: C
    
     An infrequent word: peripatetic
    

    Attached here: email.txt

An error is displayed, even though the file is added. This is because full text indexing fails. The following error is found in the logs:

[2022-10-18T13:36:42.882+0200] [Payara 5.2022.3] [SEVERE] [] [edu.harvard.iq.dataverse.api.errorhandlers.ThrowableHandler] [tid: _ThreadID=108 _ThreadName=http-thread-pool::http-listener-1(16)] [timeMillis: 1666093002882] [levelValue: 1000] [[
  javax.ejb.EJBException: org/apache/james/mime4j/stream/MimeConfig$Builder
        at com.sun.ejb.containers.EJBContainerTransactionManager.processSystemException(EJBContainerTransactionManager.java:723)
        at com.sun.ejb.containers.EJBContainerTransactionManager.completeNewTx(EJBContainerTransactionManager.java:652)
        at com.sun.ejb.containers.EJBContainerTransactionManager.postInvokeTx(EJBContainerTransactionManager.java:482)
        at com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:4601)
        at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2134)
        at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2104)
        at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:220)
        at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
        at com.sun.proxy.$Proxy325.indexDataset(Unknown Source)
        at edu.harvard.iq.dataverse.search.__EJB31_Generated__IndexServiceBean__Intf____Bean__.indexDataset(Unknown Source)
        at edu.harvard.iq.dataverse.api.Index.indexDatasetByPersistentId(Index.java:319)
        at jdk.internal.reflect.GeneratedMethodAccessor1916.invoke(Unknown Source)
....
Caused by: java.lang.NoClassDefFoundError: org/apache/james/mime4j/stream/MimeConfig$Builder
        at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:74)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:185)
        at edu.harvard.iq.dataverse.search.IndexServiceBean.toSolrDocs(IndexServiceBean.java:1054)
        at edu.harvard.iq.dataverse.search.IndexServiceBean.addOrUpdateDataset(IndexServiceBean.java:1307)
        at edu.harvard.iq.dataverse.search.IndexServiceBean.addOrUpdateDataset(IndexServiceBean.java:731)
        at edu.harvard.iq.dataverse.search.IndexServiceBean.indexDataset(IndexServiceBean.java:599)
        at jdk.internal.reflect.GeneratedMethodAccessor1747.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:588)
        at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:408)
        at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4835)
        at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:665)
        at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:834)
        at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
        at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
        at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
        at jdk.internal.reflect.GeneratedMethodAccessor151.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
        at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
        at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
        at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:72)
        at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
        at jdk.internal.reflect.GeneratedMethodAccessor146.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
        at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
        at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:375)
        at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4807)
        at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4795)
        at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
        ... 78 more
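A NoClassDefFoundError like this usually means some version of the library is on the classpath, just not the one the caller was compiled against. To check which JAR actually supplies a class, a small standalone diagnostic can help. This is a sketch, not part of Dataverse: the class name `WhichJar` is hypothetical, and the default class names it probes are taken from the stack trace. If the older mime4j from the Abdera side is the one that won, one would presumably expect the outer `MimeConfig` to be found while the nested `MimeConfig$Builder` (one of the 0.8+ APIs Tika relies on) is missing.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic: reports whether a class can be loaded and,
 * if so, which JAR or directory it was loaded from.
 */
public class WhichJar {

    /** Returns the code-source location, "bootstrap/JDK" for JDK classes, or null if the class cannot be loaded. */
    static String locate(String binaryName) {
        try {
            // initialize=false: only load the class, do not run its static initializers
            Class<?> c = Class.forName(binaryName, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader (e.g. java.lang.String) have no CodeSource
            return src == null ? "bootstrap/JDK" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Nested classes use the binary name with '$', exactly as in the stack trace.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.james.mime4j.stream.MimeConfig",
            "org.apache.james.mime4j.stream.MimeConfig$Builder",
            "org.apache.tika.parser.mail.RFC822Parser"
        };
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

To be meaningful it has to run against the same libraries the application uses, for example by putting the deployed WAR's `WEB-INF/lib/*` on the classpath (the exact path depends on your installation).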

  • When does this issue occur?
    When an e-mail file is indexed.

  • Which page(s) does it occur on?
    Not really a user interface issue, but an error is displayed when you upload the file.

  • What happens?
    See above

  • To whom does it occur (all users, curators, superusers)?
    all users

  • What did you expect to happen?
    The file should be indexed correctly

Which version of Dataverse are you using?

  • v5.11.1
  • develop

Any related open or closed issues to this bug report?
The problem is a recurring one: dependencies needed by Tika are overridden by older versions, so the required classes or methods are not found at runtime. In this case sword2-server is the culprit. The quick fix is to exclude apache-mime4j-core from the sword2-server dependency (thanks to @qqmyers for the suggestion). A more robust fix might be to introduce Java modules into Dataverse, so that the transitive dependencies of primary dependencies don't interfere with one another.
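The "overridden by older versions" effect comes from Maven's dependency mediation: only one version of each artifact ends up on the classpath, namely the one nearest to the project in the dependency tree, and when two are equally near, the one whose parent dependency is declared first wins. An exclusion declared on a dependency removes the artifact from that dependency's whole subtree. The toy model below illustrates both rules; it is not Maven's real resolver. The tree is hypothetical and simplified (both mime4j copies are placed at the same depth, so declaration order decides), and only the versions 0.7.2 (via Abdera) and 0.8.4 (via Tika) come from the commit message further down in this thread.

```java
import java.util.*;

/** Toy model of Maven's "nearest wins, then first declared" mediation plus <exclusion>. */
public class NearestWins {

    static final class Node {
        final String artifact;
        final String version; // "?" marks a placeholder version
        final List<Node> children;

        Node(String artifact, String version, Node... children) {
            this.artifact = artifact;
            this.version = version;
            this.children = Arrays.asList(children);
        }
    }

    /** A queued node together with the exclusions inherited from its ancestors. */
    static final class Pending {
        final Node node;
        final Set<String> excluded;

        Pending(Node node, Set<String> excluded) {
            this.node = node;
            this.excluded = excluded;
        }
    }

    /** Breadth-first walk: the first version seen for an artifact (shallowest, then first declared) wins. */
    static Map<String, String> resolve(Node project, Map<String, Set<String>> exclusions) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Pending> queue = new ArrayDeque<>();
        for (Node direct : project.children) {
            // Exclusions declared on a direct dependency apply to its whole subtree
            queue.add(new Pending(direct, exclusions.getOrDefault(direct.artifact, Set.of())));
        }
        while (!queue.isEmpty()) {
            Pending p = queue.poll();
            if (p.excluded.contains(p.node.artifact) || chosen.containsKey(p.node.artifact)) {
                continue; // excluded, or a nearer/earlier declaration already won
            }
            chosen.put(p.node.artifact, p.node.version);
            for (Node child : p.node.children) {
                queue.add(new Pending(child, p.excluded));
            }
        }
        return chosen;
    }

    /** Hypothetical graph; intermediate artifact names are illustrative only. */
    static Node sampleProject() {
        return new Node("dataverse", "?",
            new Node("sword2-server", "?",
                new Node("abdera-parser", "?", new Node("apache-mime4j-core", "0.7.2"))),
            new Node("tika-parsers", "?",
                new Node("tika-mail-module", "?", new Node("apache-mime4j-core", "0.8.4"))));
    }

    public static void main(String[] args) {
        Node project = sampleProject();
        System.out.println("default:  " + resolve(project, Map.of()).get("apache-mime4j-core"));
        Map<String, Set<String>> quickFix = Map.of("sword2-server", Set.of("apache-mime4j-core"));
        System.out.println("excluded: " + resolve(project, quickFix).get("apache-mime4j-core"));
    }
}
```

With the exclusion in place, Abdera has to run against Tika's newer mime4j, which only works if the newer release stays compatible with what Abdera uses.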

@poikilotherm
Contributor

poikilotherm commented Oct 18, 2022

@janvanmansum I'm sorry, but Java modules are unlikely to help here. The SWORD library requires an RFC 5023 (Atom Publishing Protocol) implementation, the only one around is the dead Apache Abdera project (and I guess no one wants to reimplement or fork that lib), and these dependencies are not Java 9+ module-enabled, so I'm not sure modules would solve our problem.

Usually, the way to deal with such conflicts is a proper entry in the <dependencyManagement> section of Dataverse's POM. See also the dev guide, where I wrote a few words about this.

but

As we control the lib (please find it at https://github.com/gdcc/sword2-server), we might choose a different path here. As the Abdera lib is no longer updated, it might be preferable to freeze its dependencies in time instead of keeping our fingers crossed on every update of transitive dependencies. How about shading the complete Abdera lib and its dependencies into the SWORD lib JAR?
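For comparison with the shading idea, the <dependencyManagement> route mentioned above can be modelled in a few lines: a managed version for an artifact replaces whatever version mediation would otherwise pick for that transitive dependency. This is a toy sketch, not Maven's resolver. The route labels and depths are hypothetical; 0.7.2 and 0.8.4 are the versions named in the commit message further down, and 0.8.7 is the version that commit settles on. In a real POM the managed entry would be a <dependency> on org.apache.james:apache-mime4j-core inside <dependencyManagement>.

```java
import java.util.List;
import java.util.Map;

/** Toy model: a managed version overrides the version mediation would pick. */
public class ManagedVersion {

    /** One transitive route to an artifact: a label, its depth, and the version it requests. */
    static final class Route {
        final String via;
        final int depth;
        final String version;

        Route(String via, int depth, String version) {
            this.via = via;
            this.depth = depth;
            this.version = version;
        }
    }

    /** Mediation: the shallowest route wins; on a tie the first-declared route is kept. */
    static String mediate(List<Route> routes) {
        Route best = routes.get(0);
        for (Route r : routes) {
            if (r.depth < best.depth) { // strict '<' keeps the earlier route on ties
                best = r;
            }
        }
        return best.version;
    }

    /** A managed version, if one exists for the artifact, replaces the mediated one. */
    static String resolve(String artifact, List<Route> routes, Map<String, String> managed) {
        String managedVersion = managed.get(artifact);
        return managedVersion != null ? managedVersion : mediate(routes);
    }

    public static void main(String[] args) {
        List<Route> mime4j = List.of(
            new Route("sword2-server > abdera-parser", 3, "0.7.2"),
            new Route("tika (simplified)", 3, "0.8.4"));
        System.out.println("unmanaged: " + resolve("apache-mime4j-core", mime4j, Map.of()));
        System.out.println("managed:   " + resolve("apache-mime4j-core", mime4j,
            Map.of("apache-mime4j-core", "0.8.7")));
    }
}
```

The trade-off versus shading: a managed version keeps a single mime4j on the classpath and must suit every consumer, whereas shading would give Abdera its own relocated copy.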

@pdurbin
Member

pdurbin commented Oct 9, 2023

@poikilotherm do you think we should do what @PaulBoon did in the following pull request?

@PaulBoon
Contributor

@pdurbin Is there any reason not to apply the suggested fix? It would be great if we could get rid of this issue in the next release.

@pdurbin
Member

pdurbin commented Jan 30, 2024

@PaulBoon you're talking about DANS-KNAW#173, right, not the fix @poikilotherm suggested above?

@PaulBoon
Contributor

@pdurbin I am talking about the solution we have working. If a better solution becomes available soon, that would be nice; meanwhile we can use the fix we have.

qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue Jan 30, 2024
@pdurbin
Member

pdurbin commented Jan 31, 2024

@PaulBoon gotcha, thanks.

From the commit above, it looks like @qqmyers picked it up for QDR. 😄

qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue Feb 1, 2024
poikilotherm added a commit that referenced this issue Feb 6, 2024
- Apache Abdera Parser, Apache Tika and RESTeasy (Testing) use MIME4J
- Tika and RESTeasy use newer APIs only present since v0.8+
- Abdera is an abandoned project, uses v0.7.2 and is hopefully
  compatible with newer releases
- v0.8.4 given by Apache Tika relies on vulnerable Apache Commons IO
  2.6, we want 2.11 per dependency management. Upgrading to v0.8.7 as
  earliest version with 2.11 dependency
pdurbin added a commit that referenced this issue Oct 2, 2024
@pdurbin pdurbin added this to the 6.5 milestone Oct 24, 2024

4 participants