OLS 4 Executive overview

henrietteharmse edited this page Jan 24, 2023 · 9 revisions

OLS 4 is available at www.ebi.ac.uk/ols4/. Note that the trailing slash is required.

Why

Key reasons for the redesign of OLS are:

  1. Data releases took longer and longer. A few years ago we could publish a new data release every 24 hours, but releases increasingly ran for 72 hours or even more than a week. When a release runs for more than a week, the cluster terminates the job, which corrupts the database. As a result, we could not consider any new use cases that might require re-indexing all the ontologies hosted on OLS.
  2. Even when the data release job ran for less than a week, the database was frequently corrupted. In most cases we caught these corruptions before they propagated to OLS production, so our users experienced few outages. However, our users were directly affected by delays in updates, and indirectly affected by the team doing busy work (that is, firefighting to keep the service up) rather than implementing useful new use cases.
  3. OLS 3 did not index all the information available in the .owl file of an ontology. One example of information that OLS 3 does not index is annotations on annotations: for a synonym, for instance, you may want to capture additional metadata stating its citation source.
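To make "annotations on annotations" concrete, here is a minimal sketch of the data shape involved. It is a hypothetical Python model, not the OLS data model: the `Annotation` class and `flatten` function are our own illustrative names, while `oboInOwl:hasExactSynonym` and `oboInOwl:hasDbXref` are real OBO-in-OWL annotation properties. The citation value `PMID:0000000` is a placeholder. An indexer that only reads the top-level synonym would miss the nested citation; walking the nesting, as `flatten` does, captures both.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Tuple


@dataclass
class Annotation:
    """Hypothetical model: a property/value pair that may carry its own annotations."""
    property: str
    value: str
    annotations: list[Annotation] = field(default_factory=list)


def flatten(ann: Annotation, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (property path, value) for an annotation and every nested annotation."""
    path = prefix + (ann.property,)
    yield path, ann.value
    for nested in ann.annotations:
        # Recurse so that annotations on annotations are not dropped.
        yield from flatten(nested, path)


# A synonym annotated with its citation source (placeholder identifier).
synonym = Annotation(
    property="oboInOwl:hasExactSynonym",
    value="heart attack",
    annotations=[Annotation(property="oboInOwl:hasDbXref", value="PMID:0000000")],
)
```

Calling `list(flatten(synonym))` returns both the synonym and, under a longer property path, its citation.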

How

OLS 4 implements a number of technical improvements. Here we highlight only the key changes from a user perspective.

  1. The root cause of the increasingly long data releases was the use of a reasoner and storing the complete ontology in memory. As a result, OLS 3 required 150GB of memory on the cluster for indexing. This large memory requirement often meant that the OLS indexing job waited for days before a node with that much memory became available. OLS 4 assumes that ontologies are pre-reasoned, so it does no reasoning on the ontologies it indexes, and there is therefore no need to load the complete ontology into memory. This allows OLS 4 to use streaming, and hence the **memory footprint of OLS 4 is small**.

  2. OLS 4 uses an external database instead of an embedded one. The embedded database was the root cause of many of the data corruption issues we experienced.

  3. We understand that many of our users rely on the OLS API to implement their pipelines. For this reason we aimed for full backward compatibility of the OLS 4 API with OLS 3, to limit the effect that migrating from OLS 3 to OLS 4 will have on our users.
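The streaming approach in point 1 can be illustrated with a minimal sketch. This is not OLS 4 code: it is a hypothetical Python example (the names `stream_labels` and `RDFS_LABEL` are our own) that assumes a simplified N-Triples input with one triple per line and plain quoted literals without escape sequences. It reads the input one line at a time and yields each `rdfs:label` as soon as it is found, so memory use does not grow with the size of the ontology. This only works because no reasoner needs the whole ontology at once.

```python
from typing import Iterable, Iterator, Tuple

# Full IRI of rdfs:label as it appears in N-Triples.
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def stream_labels(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (subject, label) pairs while holding only the current line in memory.

    Simplified parser (assumption): one triple per line, labels are plain
    quoted literals, optionally followed by a language tag such as @en.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, predicate, obj = line.split(" ", 2)
        if predicate != RDFS_LABEL:
            continue  # only labels are of interest in this sketch
        obj = obj.rstrip(" .")  # drop the terminating " ."
        # Keep the text between the opening and closing quotes, dropping any language tag.
        literal = obj[1:obj.rindex('"')]
        yield subject, literal
```

With a real file, passing an open file object (`with open(path) as f: ...`) keeps the same property, because Python reads file lines lazily. By contrast, an approach that runs a reasoner must first load the entire ontology into memory.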

Impact

The benefits that OLS 4 (once stabilised) will bring our users are the following:

  1. Daily data releases. A new version of your ontology should be available on OLS within 24 to 48 hours.

  2. Because we can now index all ontologies in a few hours, we can consider more interesting use cases that may affect how all the ontologies are indexed. Once OLS 4 is stabilised, we will hold an OLS user day to discuss the new use cases planned for implementation.

When

Here is our tentative roadmap. We will keep it updated as the rollout of OLS 4 progresses.

Figure: Roadmap