Merge pull request #209 from JULIELab/aggregation_accuracy
Version 0.11.2. Fixes #207,#208.
khituras authored Dec 22, 2022
2 parents 55e6bc2 + f657825 commit b8d6180
Showing 28 changed files with 110 additions and 46 deletions.
13 changes: 7 additions & 6 deletions gepi/README.md
@@ -19,8 +19,8 @@ The `production` stage expects that the complete GePI project has been built in
Run the following commands to create a `development` container:

```bash
DOCKER_BUILDKIT=1 docker build -t gepi:0.11.1 --target development .
docker run -dp 8080:8080 -v {/path/to/gepi/directory}:/var/gepi/dev -e GEPI_CONFIGURATION=<path to config file> gepi:0.11.1
DOCKER_BUILDKIT=1 docker build -t gepi:0.11.2 --target development .
docker run -dp 8080:8080 -v {/path/to/gepi/directory}:/var/gepi/dev -e GEPI_CONFIGURATION=<path to config file> gepi:0.11.2
```

The first command builds an image of the `development` stage. This will also build the `dependencies` stage where all the Java dependencies of the GePI application are downloaded and cached. This will take a while on the first execution but should be faster afterwards thanks to caching.
@@ -38,8 +38,8 @@ To run the `production` container, run

```bash
mvn clean package --projects gepi-webapp --also-make
DOCKER_BUILDKIT=1 docker build -t gepi:0.11.1 --target production .
docker run -dp 8080:8080 --name gepi gepi:0.11.1
DOCKER_BUILDKIT=1 docker build -t gepi:0.11.2 --target production .
docker run -dp 8080:8080 --name gepi gepi:0.11.2
```

These commands
@@ -68,7 +68,7 @@ gepi.neo4j.bolt.url=bolt://<host>:<port>

A production environment has a few requirements that are of lesser importance during development. This section explains requirements and solutions that may come up during GePI deployment with the Docker container. While detailed explanations come below, the full Docker `run` command we use for deployment looks like the following:
```
docker run -dp 80:8080 -p 443:8443 -v /host/path/to/certificate.p12:/var/lib/jetty/etc/keystore.p12 -v /host/path/to/configuration.properties:/gepi-webapp-configuration.properties --add-host=host.docker.internal:host-gateway --name gepi -e GEPI_CONFIGURATION=/gepi-webapp-configuration.properties gepi:0.11.1 jetty.sslContext.keyStorePassword=<changeit>
docker run -dp 80:8080 -p 443:8443 -v /host/path/to/certificate.p12:/var/lib/jetty/etc/keystore.p12 -v /host/path/to/configuration.properties:/gepi-webapp-configuration.properties --add-host=host.docker.internal:host-gateway --name gepi -e GEPI_CONFIGURATION=/gepi-webapp-configuration.properties gepi:0.11.2 jetty.sslContext.keyStorePassword=<changeit>
```
Alternatively, the `docker-compose-webapp.yml` file can be used with a few additions.

@@ -103,7 +103,8 @@ Update the new version number in the following places:
* `pom.xml` files (tip: use `mvn versions:set -DnewVersion=<new version>` and `mvn versions:commit` to remove the backup files)
* `README.md` (by executing `mvn clean package -DskipTests=true` to filter the `readme-raw/README.md` file to automatically set the current version to the `README.md` file)
* `AppModule.java` in `gepi-webapp`
* set `PRODUCTION_MODE` to true for releases
* the Docker image version in the `docker-compose.yml`
* the DB version in `gene-database.xml` in the `gepi-concept-database` module
* in `gepi-indexing-base` execute `python ../../../../jcore-misc/jcore-scripts/createMetaDescriptors.py -c -i -r manual -v 1.0 .` given that `jcore-misc` has been cloned to the same directory as GePI
* in `gepi`-code base directory, execute `python ../../jcore-misc/jcore-scripts/createMetaDescriptors.py -c -i -r manual -v 1.0 gepi-indexing/gepi-indexing-base` given that `jcore-misc` has been cloned to the same directory as GePI
* this updates the description file for the use with the JCoRe pipeline builder
2 changes: 1 addition & 1 deletion gepi/docker-compose-webapp.yml
@@ -1,7 +1,7 @@
version: "3.2"
services:
gepi:
image: gepi:0.11.1
image: gepi:0.11.2
container_name: gepi
ports:
- 0.0.0.0:80:8080
2 changes: 1 addition & 1 deletion gepi/docker-compose-with-es.yml
@@ -1,6 +1,6 @@
services:
gepi:
image: gepi:0.11.1
image: gepi:0.11.2
container_name: gepi
ports:
- 0.0.0.0:80:8080
2 changes: 1 addition & 1 deletion gepi/gepi-concept-database/pom.xml
@@ -4,7 +4,7 @@
<parent>
<groupId>de.julielab</groupId>
<artifactId>gepi</artifactId>
<version>0.11.1</version>
<version>0.11.2</version>
</parent>
<artifactId>gepi-concept-database</artifactId>
<name>GePi Concept Database</name>
@@ -5,7 +5,7 @@
http://www.julielab.de/conceptdb/facets/default http://www.julielab.de/conceptdb/facets/defaultfacet-1.1.0.xsd
http://julielab.de/conceptdb/concepts/bioportal http://www.julielab.de/conceptdb/concepts/bioportalconcepts-1.1.0.xsd">
<versioning>
<version>0.11.1</version>
<version>0.11.2</version>
</versioning>
<connection>
<uri>http://localhost:7474</uri>
2 changes: 1 addition & 1 deletion gepi/gepi-core/pom.xml
@@ -7,7 +7,7 @@
<parent>
<groupId>de.julielab</groupId>
<artifactId>gepi</artifactId>
<version>0.11.1</version>
<version>0.11.2</version>
<relativePath>../pom.xml</relativePath>
</parent>
<dependencies>
@@ -18,14 +18,19 @@ public class GepiRequestData implements Cloneable {
private String filterFieldsConnectionOperator = "AND";
private EnumSet<InputMode> inputMode;
private String docId;
private int eventRetrievalLimitForAggregations;
private long dataSessionId;
private boolean includeUnary;
private int eventLikelihood;
private String[] taxId;
private String sectionNameFilterString;
private int pageSize = 10;

public GepiRequestData(List<String> eventTypes, boolean includeUnary, int eventLikelihood, Future<IdConversionResult> listAGePiIds, Future<IdConversionResult> listBGePiIds, String[] taxId, String sentenceFilterString, String paragraphFilterString, String filterFieldsConnectionOperator, String sectionNameFilterString, EnumSet<InputMode> inputMode, String docId, long dataSessionId) {
public int getEventRetrievalLimitForAggregations() {
return eventRetrievalLimitForAggregations;
}

public GepiRequestData(List<String> eventTypes, boolean includeUnary, int eventLikelihood, Future<IdConversionResult> listAGePiIds, Future<IdConversionResult> listBGePiIds, String[] taxId, String sentenceFilterString, String paragraphFilterString, String filterFieldsConnectionOperator, String sectionNameFilterString, EnumSet<InputMode> inputMode, String docId, int eventRetrievalLimitForAggregations, long dataSessionId) {
this.includeUnary = includeUnary;
this.eventLikelihood = eventLikelihood;
this.taxId = taxId;
@@ -38,6 +43,7 @@ public GepiRequestData(List<String> eventTypes, boolean includeUnary, int eventL
this.filterFieldsConnectionOperator = filterFieldsConnectionOperator;
this.inputMode = inputMode;
this.docId = docId;
this.eventRetrievalLimitForAggregations = eventRetrievalLimitForAggregations;
this.dataSessionId = dataSessionId;
}

@@ -147,7 +147,7 @@ public class EventRetrievalService implements IEventRetrievalService {
FIELD_EVENT_ARG_HOMOLOGY_PREFERRED_NAME,
FIELD_NUM_ARGUMENTS
);
private static final int SCROLL_SIZE = 200;
private static final int SCROLL_SIZE = 2000;
private Logger log;
private ISearchServerComponent searchServerComponent;
private String documentIndex;
@@ -218,8 +218,8 @@ public CompletableFuture<EventRetrievalResult> closedSearch(GepiRequestData requ
serverRqst.index = documentIndex;
serverRqst.start = from;
serverRqst.rows = numRows;
serverRqst.requestTimeout = "10m";
configureDeepPaging(serverRqst, downloadAll, forCharts);
// serverRqst.requestTimeout = "10m";
configureDeepPaging(serverRqst, downloadAll, forCharts, requestData.getEventRetrievalLimitForAggregations());
if (!downloadAll) {
addHighlighting(serverRqst);
}
@@ -230,7 +230,7 @@ public CompletableFuture<EventRetrievalResult> closedSearch(GepiRequestData requ
log.debug("Sent closed search server request");
searchServerComponent.process(carrier);
if (log.isDebugEnabled())
log.debug("Server answered after {} seconds. Reading results.", (System.currentTimeMillis()-time) / 1000);
log.debug("Server answered after {} seconds. Reading results.", (System.currentTimeMillis() - time) / 1000);

EventRetrievalResult eventResult = eventResponseProcessingService
.getEventRetrievalResult(carrier.getSingleSearchServerResponse());
@@ -303,7 +303,7 @@ public CompletableFuture<EventRetrievalResult> openSearch(GepiRequestData gepiRe
log.debug("Sent open search server request");
searchServerComponent.process(carrier);
if (log.isDebugEnabled())
log.debug("Server answered after {} seconds. Reading results.", (System.currentTimeMillis()-time) / 1000);
log.debug("Server answered after {} seconds. Reading results.", (System.currentTimeMillis() - time) / 1000);


EventRetrievalResult eventResult = eventResponseProcessingService
@@ -342,23 +342,24 @@ public SearchServerRequest getOpenSearchRequest(GepiRequestData requestData, int
serverRqst.index = documentIndex;
serverRqst.start = from;
serverRqst.rows = numRows;
serverRqst.requestTimeout = "10m";
configureDeepPaging(serverRqst, downloadAll, forCharts);
// serverRqst.requestTimeout = "10m";
configureDeepPaging(serverRqst, downloadAll, forCharts, requestData.getEventRetrievalLimitForAggregations());
if (!downloadAll) {
addHighlighting(serverRqst);
}
return serverRqst;
}

private void configureDeepPaging(SearchServerRequest serverRqst, boolean downloadAll, boolean forCharts) {
private void configureDeepPaging(SearchServerRequest serverRqst, boolean downloadAll, boolean forCharts, int interactionRetrievalLimit) {
if (downloadAll)
serverRqst.rows = SCROLL_SIZE;
serverRqst.rows = Math.min(SCROLL_SIZE, interactionRetrievalLimit);
serverRqst.fieldsToReturn = forCharts ? FIELDS_FOR_CHARTS : FIELDS_FOR_TABLE;
serverRqst.downloadCompleteResults = downloadAll;
serverRqst.downloadCompleteResults = downloadAll && interactionRetrievalLimit > 0;
serverRqst.downloadCompleteResultsMethod = "searchAfter";
serverRqst.downloadCompleteResultMethodKeepAlive = "5m";
if (downloadAll) {
serverRqst.downloadCompleteResultsLimit = 200;
if (interactionRetrievalLimit < Integer.MAX_VALUE)
serverRqst.downloadCompleteResultsLimit = interactionRetrievalLimit;
serverRqst.addSortCommand("_shard_doc", SortOrder.ASCENDING);
}
}
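The updated `configureDeepPaging` derives the page size and the complete-download settings from the new `eventRetrievalLimitForAggregations` value instead of the former hard-coded limit of 200. The Java sketch below restates that decision logic outside of GePI and adds a toy, in-memory `search_after` loop to illustrate why raising `SCROLL_SIZE` from 200 to 2000 cuts the number of round trips. `PagingParams`, `ScanResult` and the list of numeric sort keys (a stand-in for the `_shard_doc` sort value) are hypothetical; the loop only assumes that the search server component pages through results in a comparable way.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: PagingParams, ScanResult and the in-memory key list are hypothetical
// stand-ins, not GePI or Elasticsearch classes.
class DeepPagingSketch {
    static final int SCROLL_SIZE = 2000;

    // Mirrors the SearchServerRequest fields that configureDeepPaging(...) touches.
    static final class PagingParams {
        final int rows;
        final boolean downloadCompleteResults;
        final Integer downloadCompleteResultsLimit; // null = left unset

        PagingParams(int rows, boolean downloadCompleteResults, Integer limit) {
            this.rows = rows;
            this.downloadCompleteResults = downloadCompleteResults;
            this.downloadCompleteResultsLimit = limit;
        }
    }

    static PagingParams configure(int numRows, boolean downloadAll, int interactionRetrievalLimit) {
        // When downloading everything, a page is never larger than the retrieval limit.
        int rows = downloadAll ? Math.min(SCROLL_SIZE, interactionRetrievalLimit) : numRows;
        // A limit of 0 (or less) switches the complete download off.
        boolean complete = downloadAll && interactionRetrievalLimit > 0;
        // Integer.MAX_VALUE acts as "no limit": the limit is only set for smaller values.
        Integer limit = downloadAll && interactionRetrievalLimit < Integer.MAX_VALUE
                ? interactionRetrievalLimit : null;
        return new PagingParams(rows, complete, limit);
    }

    static final class ScanResult {
        final List<Long> hits = new ArrayList<>();
        int requests = 0;
    }

    // Toy search_after pagination: each "request" returns up to pageSize keys that sort
    // strictly after the last key of the previous page; collection stops at the limit.
    static ScanResult searchAfter(List<Long> sortedKeys, int pageSize, int limit) {
        ScanResult result = new ScanResult();
        Long after = null;
        while (result.hits.size() < limit) {
            List<Long> page = new ArrayList<>();
            for (Long key : sortedKeys) {
                if (page.size() == pageSize) break;
                if (after == null || key > after) page.add(key);
            }
            result.requests++;
            if (page.isEmpty()) break; // no more results
            for (Long key : page) {
                if (result.hits.size() == limit) break;
                result.hits.add(key);
            }
            after = page.get(page.size() - 1); // the "search_after" value for the next request
        }
        return result;
    }
}
```

For instance, collecting 4,500 hits from 5,000 sorted keys takes 23 requests with a page size of 200 but only 3 with a page size of 2,000 in this model.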
@@ -424,8 +425,8 @@ public CompletableFuture<EventRetrievalResult> getFulltextFilteredEvents(GepiReq
serverRqst.index = documentIndex;
serverRqst.start = from;
serverRqst.rows = numRows;
serverRqst.requestTimeout = "10m";
configureDeepPaging(serverRqst, downloadAll, forCharts);
// serverRqst.requestTimeout = "10m";
configureDeepPaging(serverRqst, downloadAll, forCharts, requestData.getEventRetrievalLimitForAggregations());
if (!downloadAll) {
addHighlighting(serverRqst);
}
@@ -436,7 +437,7 @@ public CompletableFuture<EventRetrievalResult> getFulltextFilteredEvents(GepiReq
log.debug("Sent full-text search server request");
searchServerComponent.process(carrier);
if (log.isDebugEnabled())
log.debug("Server answered after {} seconds. Reading results.", (System.currentTimeMillis()-time) / 1000);
log.debug("Server answered after {} seconds. Reading results.", (System.currentTimeMillis() - time) / 1000);

EventRetrievalResult eventResult = eventResponseProcessingService
.getEventRetrievalResult(carrier.getSingleSearchServerResponse());
2 changes: 1 addition & 1 deletion gepi/gepi-indexing/gepi-indexing-base/component.meta
@@ -18,7 +18,7 @@
"maven-artifact": {
"artifactId": "gepi-indexing-base",
"groupId": "de.julielab",
"version": "0.11.1"
"version": "0.11.2"
},
"name": "GePi Indexing Base"
}
2 changes: 1 addition & 1 deletion gepi/gepi-indexing/gepi-indexing-base/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>gepi-indexing</artifactId>
<groupId>de.julielab</groupId>
<version>0.11.1</version>
<version>0.11.2</version>
</parent>
<modelVersion>4.0.0</modelVersion>

@@ -194,6 +194,7 @@ else if (j > i)
// document.addField("argument1prefname", createRawFieldValueForAnnotation(argPair[0], arg1EntryIdPath, geneFb.egid2prefNameReplaceFilter));
final IFieldValue arg1HomoPrefNameValue = createRawFieldValueForAnnotation(argPair[0], arg1EntryIdPath, geneFb.orgid2topaggprefname);
// document.addField("argument1homoprefname", arg1HomoPrefNameValue);
document.addField("argument1homoprefnameaggvalue", arg1HomoPrefNameValue);
// final IFieldValue arg1GoPrefnames = createRawFieldValueForAnnotation(argPair[0], arg1EntryIdPath, geneFb.eg2goprefnameFilter);
// document.addField("argument1goprefnames", arg1GoPrefnames);
// document.addField("argument1matchtype", Stream.of(argPair).map(ArgumentMention.class::cast).map(ArgumentMention::getRef).map(ConceptMention.class::cast).map(cm -> cm.getResourceEntryList(0).getConfidence() == null || cm.getResourceEntryList(0).getConfidence().contains("9999") ? "exact" : "fuzzy").toArray());
@@ -211,6 +212,7 @@ else if (j > i)
// document.addField("argument2prefname", createRawFieldValueForFieldValue(document.getAsRawToken("argument2conceptid"), geneFb.conceptid2prefNameFilter));
final IFieldValue arg2HomoPrefNameValue = createRawFieldValueForAnnotation(argPair[1], arg2EntryIdPath, geneFb.orgid2topaggprefname);
// document.addField("argument2homoprefname", arg2HomoPrefNameValue);
document.addField("argument2homoprefnameaggvalue", arg2HomoPrefNameValue);
// final IFieldValue arg2GoPrefnames = createRawFieldValueForAnnotation(argPair[1], arg2EntryIdPath, geneFb.eg2goprefnameFilter);
// document.addField("argument2goprefnames", arg2GoPrefnames);
// document.addField("argument2matchtype", Stream.of(argPair).map(ArgumentMention.class::cast).map(ArgumentMention::getRef).map(ConceptMention.class::cast).map(cm -> cm.getResourceEntryList(0).getConfidence() == null || cm.getResourceEntryList(0).getConfidence().contains("9999") ? "exact" : "fuzzy").toArray());
@@ -255,7 +257,7 @@ else if (j > i)
document.addField("ARGUMENT_FS", argPair);
// For ElasticSearch aggregations, we create terms in the form 'symbol1---symbol2'. We also sort the symbols so that the same pair of symbols is always stored in the same order.
// Then we can use ElasticSearch aggregations to count interactions occurrences instead of retrieving all documents and counting ourselves.
// document.addField("aggregationvalue", document.getAsArrayFieldValue("argumenthomoprefnames").stream().map(IFieldValue::toString).sorted().collect(Collectors.joining("---")));
document.addField("aggregationvalue", document.getAsArrayFieldValue("argumenthomoprefnames").stream().map(IFieldValue::toString).sorted().collect(Collectors.joining("---")));
// final ArrayFieldValue go1Values = new ArrayFieldValue(arg1GoPrefnames);
// final ArrayFieldValue go2Values = new ArrayFieldValue(arg2GoPrefnames);
// for (IFieldValue go1 : go1Values) {
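The indexing comment above describes storing one `symbol1---symbol2` term per argument pair, with the two symbols sorted, so that Elasticsearch aggregations can count interaction occurrences instead of the application retrieving and counting every document. A minimal Java sketch of that key construction, together with the client-side counting that a terms aggregation over such a field would replace. Plain strings stand in for the `IFieldValue` entries of `argumenthomoprefnames`, and the gene symbols are example values only.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: strings replace IFieldValue; this is not the GePI indexing code.
class AggregationKeySketch {
    static final String SEPARATOR = "---";

    // Sorting the two symbols makes (A, B) and (B, A) produce the same term.
    static String aggregationValue(String symbol1, String symbol2) {
        return Stream.of(symbol1, symbol2).sorted().collect(Collectors.joining(SEPARATOR));
    }

    // What a terms aggregation on the keyword field computes server-side:
    // the number of documents (here: argument pairs) per distinct term.
    static Map<String, Long> countPairs(String[][] argumentPairs) {
        return Arrays.stream(argumentPairs)
                .map(pair -> aggregationValue(pair[0], pair[1]))
                .collect(Collectors.groupingBy(key -> key, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String[][] pairs = { {"TP53", "MDM2"}, {"MDM2", "TP53"}, {"AKT1", "MTOR"} };
        System.out.println(countPairs(pairs)); // {AKT1---MTOR=1, MDM2---TP53=2}
    }
}
```

Because both orderings of a pair collapse into one term, the per-term counts are interaction counts regardless of which argument was mentioned first.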
@@ -103,6 +103,9 @@
"store": true,
"norms": false
},
"argument1homoprefnameaggvalue": {
"type": "keyword"
},
"argument1conceptid": {
"type": "keyword",
"store": true
@@ -158,6 +161,9 @@
"store": true,
"norms": false
},
"argument2homoprefnameaggvalue": {
"type": "keyword"
},
"argument2conceptid": {
"type": "keyword",
"store": true
4 changes: 2 additions & 2 deletions gepi/gepi-indexing/gepi-indexing-debug/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>gepi-indexing</artifactId>
<groupId>de.julielab</groupId>
<version>0.11.1</version>
<version>0.11.2</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>4.0.0</modelVersion>
@@ -15,7 +15,7 @@
<dependency>
<groupId>de.julielab</groupId>
<artifactId>gepi-indexing-base</artifactId>
<version>0.11.1</version>
<version>0.11.2</version>
</dependency>
</dependencies>
</project>