Add index wide tests #225

Closed
wants to merge 11 commits into from

gaurav (Collaborator) commented Jan 20, 2024

WIP

Should be merged after PR #221.

gaurav changed the base branch from master to generate-curie-breakdown-report January 20, 2024 05:14

gaurav (Collaborator, Author) commented Jan 23, 2024

The tests seem to be working, but the error reported on duplication is unclear:

INFO:root:(151168,) CURIEs loaded into babel_outputs/reports/duplication/synonyms.sqlite3
INFO:root:Reading synonyms file babel_outputs/synonyms/CellularComponent.txt (3/23)
INFO:root:Read 12132 entries from babel_outputs/synonyms/CellularComponent.txt.
INFO:root:(163300,) CURIEs loaded into babel_outputs/reports/duplication/synonyms.sqlite3
INFO:root:Reading synonyms file babel_outputs/synonyms/GrossAnatomicalStructure.txt (4/23)
INFO:root:Read 10321 entries from babel_outputs/synonyms/GrossAnatomicalStructure.txt.
INFO:root:(173621,) CURIEs loaded into babel_outputs/reports/duplication/synonyms.sqlite3
INFO:root:Reading synonyms file babel_outputs/synonyms/Gene.txt (5/23)
INFO:root:Read 49501597 entries from babel_outputs/synonyms/Gene.txt.
INFO:root:(49675218,) CURIEs loaded into babel_outputs/reports/duplication/synonyms.sqlite3
INFO:root:Reading synonyms file babel_outputs/synonyms/Protein.txt (6/23)
INFO:root:Read 251982847 entries from babel_outputs/synonyms/Protein.txt.
INFO:root:(301658065,) CURIEs loaded into babel_outputs/reports/duplication/synonyms.sqlite3
INFO:root:Reading synonyms file babel_outputs/synonyms/Disease.txt (7/23)
INFO:root:Read 338024 entries from babel_outputs/synonyms/Disease.txt.
INFO:root:(301996089,) CURIEs loaded into babel_outputs/reports/duplication/synonyms.sqlite3
INFO:root:Reading synonyms file babel_outputs/synonyms/PhenotypicFeature.txt (8/23)
[Sat Jan 20 18:08:36 2024]
Error in rule test_synonyms_for_duplication:
    jobid: 0
    input: babel_outputs/synonyms/AnatomicalEntity.txt, babel_outputs/synonyms/Cell.txt, babel_outputs/synonyms/CellularComponent.txt, babel_outputs/synonyms/GrossAnatomicalStructure.txt, babel_outputs/synonyms/Gene.txt, babel_outputs/synonyms/Protein.txt, babel_outputs/synonyms/Disease.txt, babel_outputs/synonyms/PhenotypicFeature.txt, babel_outputs/synonyms/Pathway.txt, babel_outputs/synonyms/BiologicalProcess.txt, babel_outputs/synonyms/MolecularActivity.txt, babel_outputs/synonyms/MolecularMixture.txt, babel_outputs/synonyms/SmallMolecule.txt, babel_outputs/synonyms/Polypeptide.txt, babel_outputs/synonyms/ComplexMolecularMixture.txt, babel_outputs/synonyms/ChemicalEntity.txt, babel_outputs/synonyms/ChemicalMixture.txt, babel_outputs/synonyms/Drug.txt, babel_outputs/synonyms/OrganismTaxon.txt, babel_outputs/synonyms/GeneFamily.txt, babel_outputs/synonyms/DrugChemicalConflated.txt, babel_outputs/synonyms/umls.txt, babel_outputs/synonyms/MacromolecularComplex.txt
    output: babel_outputs/reports/duplication/synonyms.sqlite3, babel_outputs/reports/duplication/synonym_duplication_report.json

RuleException:
IntegrityError in file /code/babel/src/snakefiles/reports.snakefile, line 102:
UNIQUE constraint failed: synonyms.curie
  File "/code/babel/src/snakefiles/reports.snakefile", line 102, in __rule_test_synonyms_for_duplication
  File "/code/babel/src/reports/index_wide_synonym_tests.py", line 50, in report_on_index_wide_synonym_tests
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run

At a minimum, we need to know which file triggered this error. Ideally we would also know which identifier triggered it, since that would solve many of our problems at once.
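
One way to make this failure self-explanatory is sketched below. It assumes a hypothetical `synonyms` table keyed on `curie` with a `filename` column; the actual schema in `src/reports/index_wide_synonym_tests.py` is not shown here, and `SCHEMA` and `load_curies` are illustrative names. Rows are inserted one at a time, and on a `sqlite3.IntegrityError` the code looks up which file already claimed that CURIE, so the report can name both files and the identifier. It also unpacks the `COUNT(*)` row, since the `(151168,)` in the log above looks like a `fetchone()` tuple being logged directly.

```python
import logging
import sqlite3

# Hypothetical schema: one row per CURIE plus the synonyms file it first came from.
# The PRIMARY KEY on a TEXT column enforces uniqueness, like synonyms.curie in the log.
SCHEMA = "CREATE TABLE IF NOT EXISTS synonyms (curie TEXT PRIMARY KEY, filename TEXT NOT NULL)"


def load_curies(conn: sqlite3.Connection, filename: str, curies) -> list:
    """Insert the CURIEs from one synonyms file.

    Returns a list of (curie, first_file, this_file) tuples, one per CURIE that was
    already present, instead of letting the first IntegrityError abort the run.
    """
    duplicates = []
    with conn:  # commits if the block finishes, rolls back on any other exception
        for curie in curies:
            try:
                conn.execute(
                    "INSERT INTO synonyms (curie, filename) VALUES (?, ?)",
                    (curie, filename),
                )
            except sqlite3.IntegrityError:
                # Only the failed INSERT is backed out; the surrounding transaction
                # stays open, so we can look up which file already claimed this CURIE.
                (first_file,) = conn.execute(
                    "SELECT filename FROM synonyms WHERE curie = ?", (curie,)
                ).fetchone()
                duplicates.append((curie, first_file, filename))
    # Unpack the single-column row so the log shows "151168", not "(151168,)".
    (count,) = conn.execute("SELECT COUNT(*) FROM synonyms").fetchone()
    logging.info("%d CURIEs loaded after reading %s", count, filename)
    for curie, first_file, this_file in duplicates:
        logging.error("Duplicate CURIE %s in %s (already loaded from %s)", curie, this_file, first_file)
    return duplicates
```

Inserting row by row adds Python overhead compared with a single `executemany`, which may matter for files the size of Protein.txt (251,982,847 entries in the log above); the trade-off buys an error message that names the identifier and both files.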

I initially assumed this error was caused by loading both DrugChemicalConflated.txt and its constituent files (e.g. Drug.txt) into SQLite, which I do need to fix. However, the logs show that the failure occurred while reading PhenotypicFeature.txt (8/23), before either Drug.txt or DrugChemicalConflated.txt would have been loaded. It therefore seems more likely that a disease and a phenotypic feature share the same identifier, which would suggest this PR is already paying its way.
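
Alternatively, rather than relying on a UNIQUE constraint that aborts on the first collision, the test could load every (CURIE, file) pair without a constraint and report all cross-file duplicates at the end with a `GROUP BY`. This is a minimal sketch, assuming a hypothetical `synonym_curies` table, hypothetical function names, and a hypothetical `CONFLATED_FILES` set used to skip DrugChemicalConflated.txt so that its CURIEs are not reported against Drug.txt; whether skipping the conflated file is the right fix is itself an assumption.

```python
import sqlite3

# Hypothetical: conflated outputs repeat CURIEs from their constituent files
# (e.g. Drug.txt), so this sketch skips them instead of reporting them as duplicates.
CONFLATED_FILES = frozenset({"DrugChemicalConflated.txt"})


def create_table(conn: sqlite3.Connection) -> None:
    # No UNIQUE constraint: duplicates are allowed in and reported afterwards.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS synonym_curies (curie TEXT NOT NULL, filename TEXT NOT NULL)"
    )


def load_file(conn: sqlite3.Connection, filename: str, curies) -> None:
    if filename in CONFLATED_FILES:
        return
    with conn:  # one transaction per file
        conn.executemany(
            "INSERT INTO synonym_curies (curie, filename) VALUES (?, ?)",
            ((curie, filename) for curie in curies),
        )


def find_cross_file_duplicates(conn: sqlite3.Connection) -> list:
    """Return (curie, [files]) for every CURIE found in more than one synonyms file.

    Repeats within a single file are ignored because only distinct filenames are counted.
    """
    # Index after loading, so the bulk inserts don't pay for index maintenance.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_synonym_curies_curie ON synonym_curies (curie)")
    rows = conn.execute(
        """
        SELECT curie, GROUP_CONCAT(DISTINCT filename)
        FROM synonym_curies
        GROUP BY curie
        HAVING COUNT(DISTINCT filename) > 1
        ORDER BY curie
        """
    ).fetchall()
    # GROUP_CONCAT order is unspecified, so sort; assumes filenames contain no commas.
    return [(curie, sorted(files.split(","))) for curie, files in rows]
```

Deferring the check to a single query means one run lists every colliding identifier and the files involved, rather than stopping at the first one.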

Base automatically changed from generate-curie-breakdown-report to master January 25, 2024 06:25
gaurav mentioned this pull request Aug 1, 2024

gaurav (Collaborator, Author) commented Sep 30, 2024

We've standardized on DuckDB for index-wide tests (#290), so we'll close this PR.

gaurav closed this Sep 30, 2024