**DON'T MERGE -- ARCHIVE ** Build overhaul v4.0.0 #122

Draft: wants to merge 115 commits into base: master

Commits
9badd2a
adding metadata tracking doc
callahantiff Dec 7, 2021
d26e83d
adding chemical-disease metadata
callahantiff Dec 7, 2021
281f43f
adding chemical-phenotype metadata
callahantiff Dec 7, 2021
1a2b7dc
formatting update
callahantiff Dec 7, 2021
ba2f783
updating chemical-disease and chemical-phenotype filtering
callahantiff Dec 7, 2021
271f40e
adding back chemical-rna edges
callahantiff Dec 7, 2021
ce57453
adding chemical-gene metadata
callahantiff Dec 7, 2021
d755d91
updating filtering criteria
callahantiff Dec 7, 2021
4edd87c
fixing identifier columns
callahantiff Dec 7, 2021
6f49944
fixing columns
callahantiff Dec 7, 2021
932a43e
chemical-gene, chemical-protein, chemical-protein
callahantiff Dec 7, 2021
661d952
chemi-gene/protein/rna
callahantiff Dec 7, 2021
489848d
chemical-pathway
callahantiff Dec 7, 2021
bf08827
chemical-go
callahantiff Dec 7, 2021
882c38b
adding reverse edges
callahantiff Dec 7, 2021
062cf8c
updating evidence
callahantiff Dec 7, 2021
9af301f
disease-phenotype
callahantiff Dec 7, 2021
4b337e7
updating criteria
callahantiff Dec 7, 2021
01b0626
gene-disease/phenotype
callahantiff Dec 7, 2021
194ec56
adding gene prefix
callahantiff Dec 7, 2021
bb04d89
gene-gene
callahantiff Dec 7, 2021
df519ac
gene-pathway
callahantiff Dec 8, 2021
c09e703
updates for universal gene identifier mapping
callahantiff Dec 8, 2021
c7f9d49
gene-protein/rna
callahantiff Dec 8, 2021
1ee7d74
gobp-pathway, pathway-gocc/gomf
callahantiff Dec 8, 2021
928e873
updated processing code
callahantiff Dec 10, 2021
75ee527
updating processing criteria
callahantiff Dec 10, 2021
635121f
refining processing pipeline
callahantiff Dec 10, 2021
f611035
protein-anatomy/cell and rna-anatomy/cell
callahantiff Dec 10, 2021
c352c6a
fixing referenced columns
callahantiff Dec 10, 2021
c15fcd6
cleaning up file processing to add metadata
callahantiff Dec 10, 2021
028ebe5
protein-catalyst/cofactor
callahantiff Dec 10, 2021
dda4471
protein-catalyst/cofactor
callahantiff Dec 10, 2021
8672c82
light reformatting and re-org
callahantiff Dec 14, 2021
a879a9d
protein-gobp/gocc/gomf
callahantiff Dec 14, 2021
160ff8a
protein-pathway
callahantiff Dec 14, 2021
4618728
protein-protein
callahantiff Dec 14, 2021
8977722
rna-protein
callahantiff Dec 14, 2021
eaa97e3
adding code to ignore file header metadata
callahantiff Dec 14, 2021
8a44ad4
updating clinvar data file
callahantiff Dec 14, 2021
9d70981
addressing unionOf bug
callahantiff Dec 23, 2021
0911f0e
improved testing dependencies
callahantiff Dec 24, 2021
7e5a1cb
improving logic
callahantiff Dec 24, 2021
f0a196d
partially complete clinvar changes
callahantiff Dec 24, 2021
7d1560f
adding clinvar datasets
callahantiff Dec 27, 2021
c59bb1c
adding file header and editing content
callahantiff Dec 27, 2021
6d80538
extended example
callahantiff Dec 27, 2021
77207f6
updated script
callahantiff Dec 27, 2021
4d4fab3
improved spacing
callahantiff Dec 27, 2021
6202f1f
making all dependencies "|" delimited
callahantiff Dec 27, 2021
ba1ffa3
updating testing data
callahantiff Dec 27, 2021
e947ab6
fixing language
callahantiff Dec 27, 2021
8b4e8f8
adding header and making "|"-delimited
callahantiff Dec 27, 2021
adb82df
fixed typos and updated delimiter
callahantiff Dec 27, 2021
ebbe889
improving def
callahantiff Dec 27, 2021
8dec76f
improving column def
callahantiff Dec 27, 2021
90df2ec
adding bioregistry function
callahantiff Dec 27, 2021
a424c21
place holder for ontology dbxref retrieval
callahantiff Dec 27, 2021
ef36bce
adding pyyaml to requirements
callahantiff Dec 27, 2021
58adf49
adding biolink functionality
callahantiff Dec 28, 2021
5099cd6
delaying build further
callahantiff Dec 30, 2021
c40096b
bumping lxml to match
callahantiff Dec 30, 2021
7d3b387
removed additional parameter
callahantiff Jan 2, 2022
5810379
lowering the case of the variables
callahantiff Jan 3, 2022
9c96330
extending bioregistry functionality
callahantiff Jan 4, 2022
76af5a6
finalizing data
callahantiff Jan 4, 2022
0c27d3c
updating clinvar resource filenames
callahantiff Jan 4, 2022
2ebfbd9
modifying header and updating maps
callahantiff Jan 4, 2022
286daff
better logic for ids missing from bioregistry
callahantiff Jan 4, 2022
da0569c
removing unused arg
callahantiff Jan 4, 2022
431ab11
adding jsonl functions
callahantiff Jan 5, 2022
2d26f0d
updating definitions
callahantiff Jan 6, 2022
b813068
updating variable names
callahantiff Jan 6, 2022
4b83deb
renaming node metadata directory and files
callahantiff Jan 8, 2022
24a1ac5
fixing typo
callahantiff Jan 8, 2022
9b96017
pulling better entity description
callahantiff Jan 19, 2022
deb2a43
adding pubmed ids
callahantiff Jan 19, 2022
12bfa9e
improved explanation
callahantiff Jan 20, 2022
a157da2
fixed identifier
callahantiff Jan 25, 2022
3722191
complete overhaul and update
callahantiff Jan 25, 2022
2730a1e
moving metadata file
callahantiff Jan 28, 2022
2d50148
updating pathway to metadata file
callahantiff Jan 28, 2022
53f7054
fixing edge type errors
callahantiff Jan 28, 2022
51df055
cleaned up file
callahantiff Jan 29, 2022
2face46
committing nearly complete data
callahantiff Jan 29, 2022
ad0fa93
adding better tqdm
callahantiff Apr 1, 2022
6fc6a6e
adding ipywidgets
callahantiff Apr 1, 2022
5be8037
alphabetizing
callahantiff Apr 1, 2022
e0f95f0
cleaning up linting errors
callahantiff Apr 1, 2022
fdac874
sprucing
callahantiff Apr 1, 2022
57dbf85
light overhaul
callahantiff Apr 1, 2022
cfb11f0
adding analytic search examples
callahantiff Apr 1, 2022
1d99d04
Merge branch 'build_overhaul_v4.0.0' of github.com:callahantiff/PheKn…
callahantiff Apr 1, 2022
facfc77
suppressing biolink functions
callahantiff Apr 1, 2022
e3790b4
adding back org test data
callahantiff Apr 1, 2022
4fe4f3b
suppressing changes for testing
callahantiff Apr 1, 2022
9978637
suppressing for testing
callahantiff Apr 1, 2022
1a5bebd
testing rollback
callahantiff Apr 1, 2022
acc5731
rollback for testing
callahantiff Apr 1, 2022
125d243
fixed import error
callahantiff Apr 1, 2022
525d574
addressing typing error
callahantiff Apr 1, 2022
bdd7d89
rollback for testing
callahantiff Apr 1, 2022
7c942f8
rollback for testing
callahantiff Apr 1, 2022
32ab2d6
rollback
callahantiff Apr 1, 2022
4b34596
resolving typing errors
callahantiff Apr 1, 2022
e838d77
Merge branch 'master' into build_overhaul_v4.0.0
callahantiff Apr 1, 2022
b092588
fixing parsing error
callahantiff Apr 1, 2022
ff1e693
adding footnote
callahantiff Apr 4, 2022
21d3bdb
fixing header
callahantiff Apr 4, 2022
9529d9c
Merge branch 'master' into build_overhaul_v4.0.0
callahantiff Apr 4, 2022
1bcf69e
bumping numpy
callahantiff Apr 4, 2022
484e9b5
bumping numpy version
callahantiff Apr 4, 2022
5446b23
adding entity search helper functions
callahantiff Apr 4, 2022
3815e08
fixing numpy version error
callahantiff Apr 4, 2022
ca16527
Update build_requirements.txt
callahantiff Apr 4, 2022
2 changes: 1 addition & 1 deletion .github/workflows/kg-build-part2.yml
@@ -1,7 +1,7 @@
 name: KG Build - Part 2 (Construct Knowledge Graphs)
 on:
   schedule:
-    - cron: '0 0 25 * *' # runs at 00:00:00 UTC on the second day of each month
+    - cron: '0 0 29 * *' # runs at 00:00:00 UTC on the second day of each month
 env:
   PROJECT_ID: ${{ secrets.GCE_PROJECT }}
   GCS_SERVICE_ACCOUNT: ${{ secrets.GCE_SA_KEY }}
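For reference, the five fields of the cron expression in this hunk are minute, hour, day of month, month, and day of week, so the value that changes is the day-of-month field (25 before, 29 after). The sketch below is illustrative only and is not part of the PR: `describe_cron` and `months_with_day` are hypothetical helper names, and the code only splits the fields and checks calendar lengths rather than implementing a full cron parser.

```python
import calendar

# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expr):
    """Split a five-field cron expression into a dict of named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


def months_with_day(year, day):
    """Return the month numbers of `year` that contain the given day of month."""
    return [m for m in range(1, 13) if calendar.monthrange(year, m)[1] >= day]


old = describe_cron("0 0 25 * *")
new = describe_cron("0 0 29 * *")
print("old day of month:", old["day_of_month"])
print("new day of month:", new["day_of_month"])
print("months in 2022 with a 29th:", months_with_day(2022, 29))
```

Under these assumptions, a day-of-month value of 29 has no matching date in February of a non-leap year, so a schedule like this one would likely be skipped in that month.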
6 changes: 4 additions & 2 deletions .gitignore
@@ -35,6 +35,7 @@ builds/temp/*
 #### External Libraries
 pkt_kg/libs/deepwalk_c_master/*
 pkt_kg/libs/walking-rdf-and-owl-master/*
+pkt_kg/libs/pylucene*

 #### Scripts
 pkt_kg/kg_embedding_visualizer.py
@@ -48,7 +49,8 @@ scratch*.py
 /resources/embeddings/*
 /resources/knowledge_graphs/
 /resources/kr_model/
-/resources/node_data/*
+/resources/metadata/*
+!/resources/metadata/pheknowlator_source_metadata.xlsx
 /resources/ontologies/*
 /resources/owl_decoding/*
 /resources/processed_data/*
@@ -60,7 +62,7 @@ scratch*.py
 !/resources/edge_data/README.md
 !/resources/embeddings/README.md
 !/resources/knowledge_graphs/README.md
-!/resources/node_data/README.md
+!/resources/metadata/README.md
 !/resources/ontologies/ontology_source_metadata.txt
 !/resources/ontologies/README.md
 !/resources/owl_decoding/README.md
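The new `.gitignore` entries pair a broad ignore rule (`/resources/metadata/*`) with `!` rules that re-include specific files. The sketch below illustrates the "last matching rule wins" behaviour that makes this work. It is a simplified model, not git's actual matcher: `is_ignored` is a hypothetical helper, and `fnmatchcase` lets `*` match `/`, which real gitignore patterns do not.

```python
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """Apply patterns in order; the last pattern that matches decides the result."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if fnmatchcase(path, body):
            # A '!' rule re-includes the path, any other rule ignores it.
            ignored = not negated
    return ignored


# Rules copied from the diff, in the order they appear in the file.
patterns = [
    "/resources/metadata/*",
    "!/resources/metadata/pheknowlator_source_metadata.xlsx",
    "!/resources/metadata/README.md",
]

for path in (
    "/resources/metadata/entity_metadata_dict.pkl",
    "/resources/metadata/pheknowlator_source_metadata.xlsx",
    "/resources/metadata/README.md",
):
    print(path, "ignored" if is_ignored(path, patterns) else "tracked")
```

In this model, generated metadata files stay untracked while the source metadata spreadsheet and the README remain under version control.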
8 changes: 4 additions & 4 deletions Main.py
@@ -26,7 +26,7 @@ def main():
     parser.add_argument('-b', '--kg', help='build type: "partial", "full", or "post-closure"', required=True)
     parser.add_argument('-r', '--rel', help='yes/no - adding inverse relations to knowledge graph', required=True)
     parser.add_argument('-s', '--owl', help='yes/no - removing OWL Semantics from knowledge graph', required=True)
-    parser.add_argument('-m', '--nde', help='yes/no - adding node metadata to knowledge graph', required=True)
+    parser.add_argument('-m', '--mta', help='yes/no - adding entity metadata to knowledge graph', required=True)
     parser.add_argument('-o', '--out', help='name/path to directory where to write knowledge graph', required=True)
     args = parser.parse_args()

@@ -85,21 +85,21 @@ def main():

     if args.kg == 'partial':
         kg = PartialBuild(construction=args.app,
-                          node_data=args.nde,
+                          node_data=args.mta,
                           inverse_relations=args.rel,
                           decode_owl=args.owl,
                           cpus=cpus,
                           write_location=args.out)
     elif args.kg == 'post-closure':
         kg = PostClosureBuild(construction=args.app,
-                              node_data=args.nde,
+                              node_data=args.mta,
                               inverse_relations=args.rel,
                               decode_owl=args.owl,
                               cpus=cpus,
                               write_location=args.out)
     else:
         kg = FullBuild(construction=args.app,
-                       node_data=args.nde,
+                       node_data=args.mta,
                        inverse_relations=args.rel,
                        decode_owl=args.owl,
                        cpus=cpus,
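The Main.py change renames the command-line flag from `--nde` to `--mta` while still passing the value to the builders through the `node_data` keyword. The sketch below reproduces that flow in isolation so the renaming can be checked without installing `pkt_kg`. It is an assumption-laden illustration: `make_parser` and `build_kwargs` are hypothetical names, the spelling of the `--app` flag is guessed from the `args.app` reference in the diff, and the builder classes are replaced by a plain dictionary of keyword arguments.

```python
import argparse


def make_parser():
    """Build a parser with the flags visible in the diff (plus an assumed --app)."""
    parser = argparse.ArgumentParser(description="sketch of the knowledge graph build CLI")
    # Assumption: the diff only shows args.app, not how the flag is declared.
    parser.add_argument('--app', help='construction approach', required=True)
    parser.add_argument('-b', '--kg', help='build type: "partial", "full", or "post-closure"', required=True)
    parser.add_argument('-r', '--rel', help='yes/no - adding inverse relations', required=True)
    parser.add_argument('-s', '--owl', help='yes/no - removing OWL Semantics', required=True)
    parser.add_argument('-m', '--mta', help='yes/no - adding entity metadata', required=True)
    parser.add_argument('-o', '--out', help='directory where the knowledge graph is written', required=True)
    return parser


def build_kwargs(args, cpus=1):
    """Map parsed arguments to the keyword arguments used in the diff."""
    return {
        "construction": args.app,
        "node_data": args.mta,  # keyword name unchanged, source attribute renamed
        "inverse_relations": args.rel,
        "decode_owl": args.owl,
        "cpus": cpus,
        "write_location": args.out,
    }


example_argv = ['--app', 'subclass', '-b', 'partial', '-r', 'yes', '-s', 'yes',
                '-m', 'yes', '-o', './resources/knowledge_graphs']
example_args = make_parser().parse_args(example_argv)
print(build_kwargs(example_args))
```

Any script or workflow that still passes `--nde` would be rejected by a parser defined this way, so callers of the CLI would need the same rename.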
11 changes: 7 additions & 4 deletions README.rst
@@ -129,7 +129,7 @@ The ``pkt_kg`` library requires a specific project directory structure.
 | |
 | knowledge_graphs/
 | |
-| node_data/
+| metadata/
 | |
 | ontologies/
 | |
@@ -165,7 +165,9 @@ The `KG Construction`_ Wiki page provides a detailed description of the knowledg

 * `resources/construction_approach/subclass_construction_map.pkl`_
 * `resources/Master_Edge_List_Dict.json`_ ➞ *automatically created after edge list construction*
-* `resources/node_data/node_metadata_dict.pkl <https://github.com/callahantiff/PheKnowLator/blob/master/resources/node_data/README.md>`__ ➞ *if adding metadata for new edges to the knowledge graph*
+* `resources/metadata/entity_metadata_dict.pkl <https://github
+.com/callahantiff/PheKnowLator/blob/master/resources/metadata/README.md>`__ ➞ *if adding metadata for new edges to the
+knowledge graph*
 * `resources/knowledge_graphs/PheKnowLator_MergedOntologies*.owl`_ ➞ *see* `ontology README`_ *for information*
 * `resources/relations_data/RELATIONS_LABELS.txt`_
 * `resources/relations_data/INVERSE_RELATIONS.txt`_ ➞ *if including inverse relations*
@@ -221,7 +223,7 @@ The program can be run locally using the `main.py`_ script or using the `main.ip
 kg = PartialBuild(kg_version='v2.0.0',
 write_location='./resources/knowledge_graphs',
 construction='subclass,
-node_data='yes,
+metadata='yes,
 inverse_relations='yes',
 cpus=available_cpus,
 decode_owl='yes')
@@ -437,7 +439,8 @@ Callahan TJ, Tripodi IJ, Hunter LE, Baumgartner WA. `A Framework for Automated C

 .. _`resources/Master_Edge_List_Dict.json`: https://www.dropbox.com/s/t8sgzd847t1rof4/Master_Edge_List_Dict.json?dl=1

-.. _`resources/node_data/node_metadata_dict.pkl`: https://github.com/callahantiff/PheKnowLator/blob/master/resources/node_data/README.md
+.. _`resources/metadata/entity_metadata_dict.pkl`: https://github
+.com/callahantiff/PheKnowLator/blob/master/resources/metadata/README.md

 .. _`resources/knowledge_graphs/PheKnowLator_MergedOntologies*.owl`: https://www.dropbox.com/s/75lkod7vzpgjdaq/PheKnowLator_MergedOntologiesGeneID_Normalized_Cleaned.owl?dl=1
5 changes: 3 additions & 2 deletions builds/build_requirements.txt
@@ -4,14 +4,15 @@ google==1.9.3
 google-api-core==1.24.1
 google-api-python-client~=1.7.9
 google-cloud-storage==1.28.0
-lxml==4.6.5
+lxml>=4.6.5
 networkx==2.4
-numpy==1.21.0
+numpy==1.19.5
 openpyxl==3.0.3
 oauth2client~=4.1.3
 Owlready2==0.25
 pandas==1.0.5
 python-json-logger==2.0.1
+pyyaml
 ray~=1.1.0
 rdflib==4.2.2
 reactome2py==0.0.8