
max_depth metric fails to calculate with AllegroGraph backend for large UMLS ontologies #181

Closed
alexskr opened this issue Jan 23, 2024 · 4 comments


alexskr commented Jan 23, 2024

Ontology metrics calculation fails for large UMLS ontologies such as SNOMEDCT and NCBITAXON with the AllegroGraph 7.3.1 backend (with patches).

I, [2024-01-20T21:43:41.700507 #19673]  INFO -- : ["metrics_for_submission start"]
I, [2024-01-20T21:43:41.701395 #19673]  INFO -- : ["Unable to find metrics providing max_depth in file for submission http://data.bioontology.org/ontologies/SNOMEDCT/submissions/28.  Using ruby calculation of max_depth."]
E, [2024-01-21T01:41:55.762358 #19673] ERROR -- : ["too many connection resets (due to Net::ReadTimeout with #<TCPSocket:(closed)> - Net::ReadTimeout) after 7702 requests on 9800, last used 10000.043899019 seconds ago"]
E, [2024-01-21T01:41:55.763227 #19673] ERROR -- : [#<Net::HTTP::Persistent::Error: too many connection resets (due to Net::ReadTimeout with #<TCPSocket:(closed)> - Net::ReadTimeout) after 7702 requests on 9800, last used 10000.043899019 seconds ago>]
E, [2024-01-21T01:41:55.764110 #19673] ERROR -- : ["NoMethodError: undefined method `id=' for nil:NilClass\n/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-e716a6d41088/lib/ontologies_linked_data/models/ontology_submission.rb:1186:in `process_metrics'\n\t/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-e716a6d41088/lib/ontologies_linked_data/models/ontology_submission.rb:1118:in `process_submission'\n\t/srv/ncbo/ncbo_cron/lib/ncbo_cron/ontology_submission_parser.rb:171:in `process_submission'\n\t/srv/ncbo/ncbo_cron/lib/ncbo_cron/ontology_submission_parser.rb:45:in `block in process_queue_submissions'\n\t/srv/ncbo/ncbo_cron/lib/ncbo_cron/ontology_submission_parser.rb:25:in `each'\n\t/srv/ncbo/ncbo_cron/lib/ncbo_cron/ontology_submission_parser.rb:25:in `process_queue_submissions'\n\t/srv/ncbo/ncbo_cron/bin/ncbo_cron:252:in `block (3 levels) in <main>'\n\t/srv/ncbo/ncbo_cron/lib/ncbo_cron/scheduler.rb:65:in `block (3 levels) in scheduled_locking_job'\n\t/srv/ncbo/ncbo_cron/lib/ncbo_cron/scheduler.rb:51:in `fork'\n\t/srv/ncbo/ncbo_cron/lib/ncbo_cron/scheduler.rb:51:in `block (2 levels) in scheduled_locking_job'\n\t/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/gems/mlanett-redis-lock-0.2.7/lib/redis-lock.rb:43:in `lock'\n\t/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/gems/mlanett-redis-lock-0.2.7/lib/redis-lock.rb:234:in `lock'\n\t/srv/ncbo/ncbo_cron/lib/ncbo_cron/scheduler.rb:50:in `block in scheduled_locking_job'\n\t/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/gems/rufus-scheduler-2.0.24/lib/rufus/sc/jobs.rb:230:in `trigger_block'\n\t/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/gems/rufus-scheduler-2.0.24/lib/rufus/sc/jobs.rb:204:in `block in trigger'\n\t/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/gems/rufus-scheduler-2.0.24/lib/rufus/sc/scheduler.rb:430:in `block in trigger_job'"]

alexskr commented Jan 25, 2024

Metrics are calculated by owlapi_wrapper for all ontologies except UMLS. For UMLS, the parsing process falls back to ruby/sparql code for calculating metrics, which doesn't work well with AllegroGraph.

if self.hasOntologyLanguage.umls?
  triples_file_path = self.triples_file_path
  logger.info("Using UMLS turtle file found, skipping OWLAPI parse")
  logger.flush
  mime_type = LinkedData::MediaTypes.media_type_from_base(LinkedData::MediaTypes::TURTLE)
  generate_umls_metrics_file(triples_file_path)

The logs state that the parsing process for UMLS ontologies skips the OWLAPI parse, but the repository directory contains an owlapi.xrdf file, which indicates that the owlapi_wrapper was invoked.

I, [2024-01-20T17:52:00.002129 #19673]  INFO -- : ["Starting to process http://data.bioontology.org/ontologies/SNOMEDCT/submissions/28"]
I, [2024-01-20T17:52:00.004475 #19673]  INFO -- : ["Starting to process SNOMEDCT/submissions/28"]
I, [2024-01-20T17:52:00.230010 #19673]  INFO -- : ["Using UMLS turtle file found, skipping OWLAPI parse"]

owlapi_wrapper is invoked when new UMLS ontology submissions are created, so we should use those metrics instead of the metrics generated by

def self.max_depth_fn(submission, logger, is_flat, rdfsSC)

jvendetti commented:

I wrote a couple of simple unit tests in the owlapi_wrapper project in my local dev environment to test metrics generation, e.g.:

@Test
public void parse_OntologySNOMEDCT() throws Exception {
    ParserInvocation pi = new ParserInvocation("./src/test/resources/repo/input/snomedct",
        "./src/test/resources/repo/output/snomedct", "SNOMEDCT.ttl", true);
    OntologyParser parser = new OntologyParser(pi);
    assertTrue(parser.parse());
}

The max depth metric is successfully calculated for both the SNOMEDCT and NCBITAXON TTL files, in 5 and 8 seconds respectively:

[main] DEBUG o.s.n.owlapi.wrapper.metrics.Graph - depth for owl:Thing is 30
[main] INFO  o.s.n.o.w.metrics.OntologyMetrics - Finished metrics calculation for SNOMEDCT.ttl in 5047 milliseconds
[main] INFO  o.s.n.o.w.metrics.OntologyMetrics - Generated metrics CSV file for SNOMEDCT.ttl
[main] DEBUG o.s.n.owlapi.wrapper.metrics.Graph - depth for owl:Thing is 37
[main] INFO  o.s.n.o.w.metrics.OntologyMetrics - Finished metrics calculation for NCBITAXON.ttl in 7583 milliseconds
[main] INFO  o.s.n.o.w.metrics.OntologyMetrics - Generated metrics CSV file for NCBITAXON.ttl

It should be relatively straightforward to modify the REST API to first check for the max depth in metrics.csv files. We're already doing this for classes, properties, etc.:

def self.number_classes(logger, submission)
  class_count = 0
  m_from_file = submission.metrics_from_file(logger)
  if m_from_file && m_from_file.length == 2
    class_count = m_from_file[1][0].to_i
  else
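A max_depth counterpart could follow the same pattern. A minimal sketch, assuming metrics_from_file returns a [header_row, value_row] pair as in number_classes above (the method name and the MAX_DEPTH_COLUMN index are hypothetical; the real column position depends on the owlapi_wrapper metrics.csv layout):

# Hypothetical sketch, not the committed fix: read max_depth from the
# owlapi_wrapper-generated metrics.csv before falling back to ruby/sparql.
MAX_DEPTH_COLUMN = 4  # assumption: index of the max_depth column in metrics.csv

def self.max_depth(logger, submission, is_flat, rdfsSC)
  m_from_file = submission.metrics_from_file(logger)
  if m_from_file && m_from_file.length == 2
    # metrics_from_file returns [header_row, value_row], as in number_classes
    m_from_file[1][MAX_DEPTH_COLUMN].to_i
  else
    # fall back to the existing ruby/sparql traversal
    max_depth_fn(submission, logger, is_flat, rdfsSC)
  end
end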


alexskr commented Jan 26, 2024

The max depth calculated by owlapi_wrapper is off by one compared to the max depth calculated by ruby/sparql.

Ontology    Ruby  owlapi_wrapper
STY            7               8
SNOMEDCT      29              30
NCBITAXON     36              37

This needs to be looked into.

jvendetti commented:

Max depth calculated by the owlapi_wrapper starts from owl:Thing, which serves as the root class for all other classes in the ontology. It's making this calculation during the initial step of our ontology ingestion process where the ontology is loaded into memory by the OWL API, regardless of the format. The STY ontology is sufficiently small that I was able to verify this manually. I suppose you could debate which methodology is correct.
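In other words, measuring from owl:Thing adds one edge above the ontology's own root classes, which accounts for the off-by-one. A toy illustration in Ruby (hypothetical hierarchy, not from either codebase):

# Toy class hierarchy, parent links only. owl:Thing is the implicit root
# the OWL API places above the ontology's own root classes.
parents = {
  "Root"       => "owl:Thing",
  "Child"      => "Root",
  "Grandchild" => "Child"
}

# Number of edges from a class up to the given stop node.
def edges_to(node, parents, stop)
  node == stop ? 0 : 1 + edges_to(parents[node], parents, stop)
end

puts edges_to("Grandchild", parents, "owl:Thing")  # => 3 (owlapi_wrapper-style)
puts edges_to("Grandchild", parents, "Root")       # => 2 (ruby/sparql-style)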

alexskr added a commit that referenced this issue Jan 27, 2024
Get max depth from the metrics.csv file which is already generated
by owlapi_wrapper when new submission of UMLS ontology is created.
Ruby code/sparql for calculating max_depth fails for large UMLS
ontologies with AllegroGraph backend

Addresses #181
alexskr closed this as completed Feb 2, 2024