Add CURIE parse benchmark (#499)
* Add CURIE parse benchmark

* Update pages

* Add more epochs to benchmark

* Update README.md

* Pass mypy
cthoyt authored Aug 11, 2022
1 parent 6de5b56 commit 005a836
Showing 9 changed files with 7,580 additions and 582 deletions.
9 changes: 9 additions & 0 deletions exports/benchmarks/README.md
@@ -2,3 +2,12 @@

1. The [`uri_parsing`](uri_parsing) benchmark checks the `bioregistry.parse_iri`
function. See also https://github.com/biopragmatics/bioregistry/pull/481.
2. The [`curie_parsing`](curie_parsing) benchmark checks
the `bioregistry.parse_curie`
function.

## Overview

| URI Parsing | CURIE Parsing |
|------------------------------|--------------------------------|
| ![](uri_parsing/results.svg) | ![](curie_parsing/results.svg) |
32 changes: 32 additions & 0 deletions exports/benchmarks/curie_parsing/README.md
@@ -0,0 +1,32 @@
# CURIE Parsing Benchmark

This benchmark checks the `bioregistry.parse_curie` function.

## Dataset

The benchmarking dataset is available in [`data.tsv`](data.tsv). It contains
5 columns:

1. `prefix` - a canonical Bioregistry prefix
2. `identifier` - a local unique identifier in the prefix's semantic space
3. `prefix_synonym` - the prefix synonym used in the CURIE (possibly the canonical prefix itself)
4. `banana` - the banana, if any (i.e., a redundant prefix embedded at the start of the local identifier)
5. `curie` - the CURIE built from the `prefix_synonym`, `banana`, and `identifier` columns
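
Every example row in this README is consistent with the `curie` column being `prefix_synonym`, a colon, then the `banana` (empty when absent) followed by the `identifier`. Below is a minimal Python sketch for checking that relationship across a file. It assumes `data.tsv` has a header row with the column names above. The helper names `build_curie` and `inconsistent_rows` are hypothetical and not part of Bioregistry.

```python
import csv
import io


def build_curie(prefix_synonym: str, banana: str, identifier: str) -> str:
    """Rebuild a CURIE from one dataset row (hypothetical helper).

    Assumes the layout ``<prefix_synonym>:<banana><identifier>``, where the
    banana is the empty string when the identifier has no redundant prefix.
    """
    return f"{prefix_synonym}:{banana}{identifier}"


def inconsistent_rows(tsv_text: str) -> list:
    """Return the rows whose ``curie`` column differs from the rebuilt CURIE."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if build_curie(row["prefix_synonym"], row["banana"], row["identifier"])
        != row["curie"]
    ]


# Two rows copied from the example data (assumed header row included).
SAMPLE = (
    "prefix\tidentifier\tprefix_synonym\tbanana\tcurie\n"
    "cco\t0000003\tcco\tCCO:\tcco:CCO:0000003\n"
    "4dn.biosource\t4DNSR73BT2A2\t4DN\t\t4DN:4DNSR73BT2A2\n"
)
assert inconsistent_rows(SAMPLE) == []
```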

Example data:

| prefix | identifier | prefix_synonym | banana | curie |
|---------------|--------------|----------------|--------|----------------------------|
| 3dmet | B00162 | 3dmet | | 3dmet:B00162 |
| 4dn.biosource | 4DNSR73BT2A2 | 4DN | | 4DN:4DNSR73BT2A2 |
| 4dn.biosource | 4DNSR73BT2A2 | 4dn.biosource | | 4dn.biosource:4DNSR73BT2A2 |
| 4dn.replicate | 4DNESWX1J3QU | 4dn.replicate | | 4dn.replicate:4DNESWX1J3QU |
| abcd | AD834 | abcd | | abcd:AD834 |
| cco | 0000003 | cco | | cco:0000003 |
| cco | 0000003 | cco | CCO: | cco:CCO:0000003 |
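
Read this way, each row presumably pairs an input (`curie`) with an expected output (`prefix`, `identifier`). A parser has to resolve the prefix synonym to its canonical prefix and drop any banana. The toy parser below illustrates that task on the example rows only. It is not how `bioregistry.parse_curie` is implemented, and its hard-coded `SYNONYMS` and `BANANAS` tables (like all function names here) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup tables covering only the example rows above.
SYNONYMS: Dict[str, str] = {
    "3dmet": "3dmet",
    "4dn": "4dn.biosource",
    "4dn.biosource": "4dn.biosource",
    "4dn.replicate": "4dn.replicate",
    "abcd": "abcd",
    "cco": "cco",
}
BANANAS: Dict[str, str] = {"cco": "CCO:"}


def toy_parse_curie(curie: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a CURIE into (canonical prefix, local identifier).

    Splits on the first colon, looks the prefix up case-insensitively, and
    strips a known banana from the start of the identifier.
    Returns (None, None) when the CURIE cannot be parsed.
    """
    prefix_synonym, sep, identifier = curie.partition(":")
    if not sep:
        return None, None
    prefix = SYNONYMS.get(prefix_synonym.lower())
    if prefix is None:
        return None, None
    banana = BANANAS.get(prefix)
    if banana and identifier.startswith(banana):
        identifier = identifier[len(banana):]
    return prefix, identifier


# (prefix, identifier, curie) triples from the example data.
EXAMPLE_ROWS: List[Tuple[str, str, str]] = [
    ("3dmet", "B00162", "3dmet:B00162"),
    ("4dn.biosource", "4DNSR73BT2A2", "4DN:4DNSR73BT2A2"),
    ("4dn.biosource", "4DNSR73BT2A2", "4dn.biosource:4DNSR73BT2A2"),
    ("4dn.replicate", "4DNESWX1J3QU", "4dn.replicate:4DNESWX1J3QU"),
    ("abcd", "AD834", "abcd:AD834"),
    ("cco", "0000003", "cco:0000003"),
    ("cco", "0000003", "cco:CCO:0000003"),
]


def accuracy(rows: List[Tuple[str, str, str]]) -> float:
    """Fraction of rows whose parsed CURIE equals the expected pair."""
    hits = sum(toy_parse_curie(curie) == (prefix, identifier) for prefix, identifier, curie in rows)
    return hits / len(rows)
```

Scoring against the `(prefix, identifier)` pair rather than the raw CURIE means both synonym resolution and banana removal must succeed for a row to count.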

## Results

Most parsing is fast (on average, around 4,000 CURIEs per second).
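
A CURIEs-per-second figure is the number of CURIEs parsed divided by the elapsed wall-clock time. Below is a minimal sketch of such a measurement, repeated over several epochs (the commit message mentions adding more epochs to the benchmark). The benchmark's actual harness may differ. `curies_per_second` is a hypothetical name, and `parse` stands in for `bioregistry.parse_curie`.

```python
import statistics
import time
from typing import Callable, List, Sequence


def curies_per_second(
    parse: Callable[[str], object],
    curies: Sequence[str],
    epochs: int = 5,
) -> List[float]:
    """Time ``parse`` over every CURIE once per epoch; return one rate per epoch."""
    rates = []
    for _ in range(epochs):
        start = time.perf_counter()
        for curie in curies:
            parse(curie)
        elapsed = time.perf_counter() - start
        rates.append(len(curies) / elapsed)
    return rates


# Usage with a trivial stand-in parser (splitting on the first colon).
rates = curies_per_second(lambda c: c.partition(":"), ["cco:CCO:0000003"] * 1000)
print(f"mean: {statistics.mean(rates):,.0f} CURIEs/second")
```

Keeping one rate per epoch, rather than a single total, makes run-to-run variance visible.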

![](results.svg)
2,149 changes: 2,149 additions & 0 deletions exports/benchmarks/curie_parsing/data.tsv
