
Upgrading to RDF4J 2.2.4 produces integration test failures. #221

Open
emetsger opened this issue Feb 2, 2018 · 2 comments
emetsger commented Feb 2, 2018

Upgrading to RDF4J 2.2.4 (from 2.2.2) produces a number of test failures. At least one of the IT failures has to do with calculating the number of statements present in an RDF4J context.

An example failure:

[INFO] Running info.rmapproject.indexing.kafka.SaveOffsetOnRebalanceIT
[WARNING] [talledLocalContainer] 13:50:30.652 [2-thread-1] WARN  [IndexingConsume] - Unable to index event ORMapObject{id=rmap:rmd18m7msr, typeStatement=(rmap:rmd18m7msr, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://purl.org/ontology/rmap#Event) [rmap:rmd18m7msr], context=rmap:rmd18m7msr}: Timeout after 10 attempts, 75116 ms: failed to index ORMapObject{id=rmap:rmd18m7msr, typeStatement=(rmap:rmd18m7msr, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://purl.org/ontology/rmap#Event) [rmap:rmd18m7msr], context=rmap:rmd18m7msr}: No DiSCO with id rmap:rmd18m7mr7
[INFO] [talledLocalContainer] info.rmapproject.indexing.IndexingTimeoutException: Timeout after 10 attempts, 75116 ms: failed to index ORMapObject{id=rmap:rmd18m7msr, typeStatement=(rmap:rmd18m7msr, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://purl.org/ontology/rmap#Event) [rmap:rmd18m7msr], context=rmap:rmd18m7msr}: No DiSCO with id rmap:rmd18m7mr7
[INFO] [talledLocalContainer] 	at info.rmapproject.indexing.kafka.DefaultIndexRetryHandler.retry(DefaultIndexRetryHandler.java:233)
[INFO] [talledLocalContainer] 	at info.rmapproject.indexing.kafka.IndexingConsumer.indexEvent(IndexingConsumer.java:169)
[INFO] [talledLocalContainer] 	at info.rmapproject.indexing.kafka.IndexingConsumer.lambda$processRecords$15(IndexingConsumer.java:136)
[INFO] [talledLocalContainer] 	at java.lang.Iterable.forEach(Iterable.java:75)
[INFO] [talledLocalContainer] 	at info.rmapproject.indexing.kafka.IndexingConsumer.processRecords(IndexingConsumer.java:119)
[INFO] [talledLocalContainer] 	at info.rmapproject.indexing.kafka.IndexingConsumer.consume(IndexingConsumer.java:98)
[INFO] [talledLocalContainer] 	at info.rmapproject.indexing.kafka.SpringAwareConsumerInitializer.lambda$startConsumers$26(SpringAwareConsumerInitializer.java:204)
[INFO] [talledLocalContainer] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[INFO] [talledLocalContainer] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[INFO] [talledLocalContainer] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[INFO] [talledLocalContainer] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[INFO] [talledLocalContainer] 	at java.lang.Thread.run(Thread.java:745)
[INFO] [talledLocalContainer] Caused by: info.rmapproject.core.exception.RMapDiSCONotFoundException: No DiSCO with id rmap:rmd18m7mr7
[INFO] [talledLocalContainer] 	at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapDiSCOMgr.readDiSCO(ORMapDiSCOMgr.java:140)
[INFO] [talledLocalContainer] 	at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapDiSCOMgr.readDiSCO(ORMapDiSCOMgr.java:114)
[INFO] [talledLocalContainer] 	at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapService.readDiSCO(ORMapService.java:353)
[INFO] [talledLocalContainer] 	at info.rmapproject.indexing.kafka.DefaultIndexRetryHandler.getDisco(DefaultIndexRetryHandler.java:249)
[INFO] [talledLocalContainer] 	at info.rmapproject.indexing.kafka.DefaultIndexRetryHandler.composeDTO(DefaultIndexRetryHandler.java:240)
[INFO] [talledLocalContainer] 	at info.rmapproject.indexing.kafka.DefaultIndexRetryHandler.retry(DefaultIndexRetryHandler.java:192)
[INFO] [talledLocalContainer] 	... 11 common frames omitted

The source of this issue is in ORMapObjectMgr:

	/**
	 * Looks up an IRI's type in the database to see if it matches the type IRI provided
	 *
	 * @param ts the triplestore instance
	 * @param id the IRI to be checked
	 * @param typeIRI the type IRI to be checked against
	 * @return true, if the id parameter has the type specified in typeIRI
	 * @throws RMapException the RMap exception
	 */
	public boolean isRMapType(Rdf4jTriplestore ts, IRI id, IRI typeIRI) throws RMapException {
		if (ts==null || id==null || typeIRI==null){
			throw new RMapException("Null parameter passed");
		}
		boolean isRmapType = false;
		try {
			if (ts.getConnection().size(id)>0) {
				//resource exists somewhere, lets find out where
				if (ts.getConnection().hasStatement(id, RDF.TYPE, typeIRI, false, id)) {
					//it is of defined type!
					return true;
				} 
			} else if (typeIRI.equals(RMAP.DISCO)) {
				//check events to see if it's a deleted DiSCO
				Set<Statement> stmts = ts.getStatements(null, RMAP.DELETEDOBJECT, id);
				for (Statement stmt : stmts) {
					IRI subject = (IRI) stmt.getSubject();
					IRI context = (IRI) stmt.getContext();
					if (subject.equals(context) && this.isRMapType(ts, subject, RMAP.EVENT)) {
						return true;
					}						
				}		
			}
		} catch (Exception e) {
			throw new RMapException ("Exception thrown searching for object " + id.stringValue(), e);
		}		
		return isRmapType;
	}

Specifically here:

			if (ts.getConnection().size(id)>0) {
				//resource exists somewhere, lets find out where
				if (ts.getConnection().hasStatement(id, RDF.TYPE, typeIRI, false, id)) {
					//it is of defined type!
					return true;
				} 
			}

With 2.2.4, ts.getConnection().size(id) returns 0 (where id is treated as the context, i.e. the graph URI). With 2.2.2, this method behaves as expected, returning a positive count of the statements present in the triplestore for that graph. When ts.getConnection().size(id) returns 0, isRMapType(Rdf4jTriplestore ts, IRI id, IRI typeIRI) returns false, ultimately producing the info.rmapproject.core.exception.RMapDiSCONotFoundException: No DiSCO with id rmap:rmd18m7mr7 shown above.
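For what it's worth, the server's view of a graph can be checked independently of the client library by hitting the RDF4J REST size endpoint (GET /repositories/{id}/size, with an optional context parameter passed as an N-Triples-encoded IRI). A minimal sketch of building that request URL; the server base URL, repository id, and graph IRI below are placeholders, not values from this project:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SizeEndpointCheck {

    // Builds the URL for the RDF4J REST "size" endpoint, restricted to one
    // named graph. Contexts are passed N-Triples-encoded, i.e. wrapped in <>.
    static String sizeUrl(String serverBase, String repoId, String contextIri) {
        try {
            String encoded = URLEncoder.encode("<" + contextIri + ">", "UTF-8");
            return serverBase + "/repositories/" + repoId + "/size?context=" + encoded;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // Placeholder server/repo/graph values for illustration only
        System.out.println(sizeUrl("http://localhost:8080/rdf4j-server",
                "rmap", "http://example.org/context"));
    }
}
```

Comparing the number this endpoint reports against what size(id) returns through the client would show whether the regression is in the server, the HTTP client, or the RMap wrapper.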

@emetsger emetsger added the bug label Feb 2, 2018
emetsger added a commit to emetsger/rmap that referenced this issue Feb 2, 2018
karenhanson pushed a commit that referenced this issue Feb 5, 2018
* Use an empty base url; none of the URIs are relative in the data.

* Share HTTP-based methods for depositing and updating discos in BaseHttpIT.

* Prepare BaseHttpIT for additional integration tests
- Utility classes with methods for manipulating RDF and working with OkHttp
- Additional properties for working with the RDF4J server

* IT which queries the RDF4J "size" endpoint, which counts the number of statements in a context.

Related to #221
karenhanson (Contributor) commented:
There seems to be more to it than this one problem. I attempted to update to both 2.2.4 and 2.4.2. For 2.2.4, I circumvented the issue described above by adding a SPARQL query to check for a context (see my branch here). The tests in that version all pass, but when deployed, the first DiSCO posted through the API after deploying a local instance often does not get created properly in the triplestore, though it will sometimes get indexed. While navigating, you will run into a mixture of errors at different points.
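For reference, a context-existence check along the lines described above can be expressed as a SPARQL ASK over the named graph. The helper below is a hypothetical illustration (not the code from the branch), and only builds the query string:

```java
public class ContextExistsQuery {

    // Hypothetical helper: builds a SPARQL ASK query that evaluates to true
    // iff the named graph identified by graphIri contains at least one
    // statement. graphIri must be an absolute IRI.
    static String askContextNonEmpty(String graphIri) {
        return "ASK { GRAPH <" + graphIri + "> { ?s ?p ?o } }";
    }

    public static void main(String[] args) {
        // Placeholder graph IRI for illustration only
        System.out.println(askContextNonEmpty("http://example.org/context"));
    }
}
```

Running such a query through a SPARQL endpoint avoids the size(context) code path entirely, which is presumably why it sidesteps the 2.2.4 regression.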

Most common:

java.lang.NullPointerException
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapQueriesLineage.getLineageMembersWithDates(ORMapQueriesLineage.java:139) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapQueriesLineage.getLineageMembers(ORMapQueriesLineage.java:158) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapService.getDiSCODVersionsAndDerivatives(ORMapService.java:543) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.webapp.service.DataDisplayServiceImpl.getDiSCODTO(DataDisplayServiceImpl.java:173) ~[classes/:?]
[INFO] [talledLocalContainer]   at info.rmapproject.webapp.controllers.DiSCODataController.discoTableData(DiSCODataController.java:202) ~[classes/:?]
[INFO] [talledLocalContainer]   ... 42 more

Further inspection finds that progenitor is null.
Here is a rarer one:

[INFO] [talledLocalContainer] Caused by: org.apache.http.impl.conn.ConnectionShutdownException
[INFO] [talledLocalContainer]   at org.apache.http.impl.conn.CPoolProxy.getValidConnection(CPoolProxy.java:79) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.conn.CPoolProxy.getSSLSession(CPoolProxy.java:147) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.client.DefaultUserTokenHandler.getUserToken(DefaultUserTokenHandler.java:81) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:327) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1113) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeOK(SPARQLProtocolSession.java:1066) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendGraphQueryViaHttp(SPARQLProtocolSession.java:972) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:938) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.RDF4JProtocolSession.getStatements(RDF4JProtocolSession.java:492) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.repository.http.HTTPRepositoryConnection.exportStatements(HTTPRepositoryConnection.java:284) ~[rdf4j-repository-http-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.repository.http.HTTPRepositoryConnection.getStatements(HTTPRepositoryConnection.java:269) ~[rdf4j-repository-http-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.triplestore.Rdf4jTriplestore.getStatements(Rdf4jTriplestore.java:231) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.triplestore.Rdf4jTriplestore.getStatements(Rdf4jTriplestore.java:255) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapResourceMgr.getResourceRdfTypes(ORMapResourceMgr.java:443) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapService.getResourceRdfTypesInContext(ORMapService.java:279) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.webapp.service.DataDisplayServiceImpl.getDiSCOGraph(DataDisplayServiceImpl.java:202) ~[classes/:?]
[INFO] [talledLocalContainer]   at info.rmapproject.webapp.controllers.DiSCODataController.discoGraphData(DiSCODataController.java:164) ~[classes/:?]
[INFO] [talledLocalContainer]   ... 42 more

With 2.4.2, all of these problems occur, and in addition the webapp hangs without error after a few minutes of browsing, so that you can no longer navigate DiSCOs. Often the last activity before the hang is an attempt to retrieve the DiSCO versions. The way the UI is designed, it fires two similar requests in parallel, which may be causing a problem (sometimes you will see the two version requests fire off in tandem before it hangs).

The tandem firing of requests suggests the problem might be thread related, but it's not clear why the new rdf4j would cause, or perhaps just expose, an existing issue. I note that the lineage methods are the only static methods in RMapService; not sure if that could be related as well.
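On the threading angle: RDF4J's RepositoryConnection is documented as not thread-safe, so if the two parallel version requests ever share a connection, hangs like this are plausible. Below is a stdlib-only sketch of the one-connection-per-task discipline; openConnection() is a stand-in stub, not the RMap or rdf4j API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PerTaskConnection {

    static final AtomicInteger opened = new AtomicInteger();

    // Stand-in for opening a fresh triplestore connection. The point of the
    // pattern is that each concurrent task opens (and closes) its own
    // connection instead of sharing one across threads.
    static AutoCloseable openConnection() {
        opened.incrementAndGet();
        return () -> { };
    }

    // Fires n parallel "requests", each with its own connection, and returns
    // the number of connections that were opened.
    static int runParallelRequests(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            pool.submit(() -> {
                try (AutoCloseable conn = openConnection()) {
                    // ... perform one version query with this connection ...
                } catch (Exception e) {
                    throw new RuntimeException(e);
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return opened.get();
    }
}
```

If the webapp's two tandem version requests instead go through a single shared connection, that would be the first place to look for the hang.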

karenhanson (Contributor) commented:
PR #255 also has a little more information about this issue.

A few thoughts related to this that I wanted to capture:
(1) I think it might be helpful to look into refactoring the rdf4j connection slice of RMap. The original design is less than ideal, with RMap adding an unnecessary layer on top of the connection object in places. rdf4j has changed quite a lot since RMap was created; if we take advantage of the changes to the connection object, some of these issues might go away.
(2) On the back burner, when I have time between things, I've been working to refactor out unnecessary rdf4j-specific logic in anticipation that we may want to move away from rdf4j as the sole option in the future. For example, there is no reason to use rdf4j in model objects or vocabularies.
(3) GraphDB has had many releases since the version we are running. Perhaps the connection issues we see in the live environment but not locally could be resolved by updating GraphDB.
