What's Changed (Aug 31, 2024)
Data Updates in AlzKB:
- DrugBank: Updated to version 5.1.12 (2024-03-14)
- NCBI Gene: Updated to V2024-05-13
- Gene Ontology: Updated to V2024-04-24*
- MeSH: Updated to V2023-12*
- Uberon: Updated to V2024-03-22*
- DrugCentral: Updated to V2023-11-01*
- BindingDB: Updated to V2024-05*
- MEDLINE: Updated to V2024-05-02*
*Updates based on Hetionet. See the alzkb-updates GitHub repository for more details.
Enhancements:
- Added TranscriptionFactor nodes and TRANSCRIPTIONFACTORINTERACTSWITHGENE relationships.
- Added chromosome number as a property to gene nodes.
- Added sourcedatabase as a property to nodes.
- Added correlation, score, p_fisher, z_score, affinity_nm, confidence, sourcedatabase, and unbiased as properties to relationships, sourced from Hetionet, DisGeNET, and DoRothEA.
Instructions for adding new data resources and importing data into the Memgraph graph database are available in the alzkb GitHub repository.
Data Quality Improvements:
- Removed the mapping between Creutzfeldt-Jakob disease (CJD) and Familial Alzheimer Disease (FAD). CJD and FAD are different diseases, but they were merged into the same node in AlzKB because the DisGeNET “disease_mappings.tsv” file maps CJD to FAD.
- Filtered genes to keep human genes only (tax-id = 9606).
- Implemented case-insensitive matching when extracting Alzheimer’s data from DisGeNET to include disease names that are in all caps.
- Consolidated pathways with the same names but different values of pathwayid and sourcedatabase.
- Removed duplicated pathways from AOP-DB that have “Homo sapiens (human)” in their names.
- Removed 21,724 Drug nodes from AOP-DB that had only xrefmesh values, a NULL commonName, and no connections to any other nodes.
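
The human-gene filter and the case-insensitive disease match listed above can be sketched roughly as follows. This is a minimal illustration, not the AlzKB build code: the record layout, the field name `tax_id`, the helper names, and the sample records are all assumptions made for the example.

```python
HUMAN_TAX_ID = "9606"  # NCBI Taxonomy ID for Homo sapiens

def keep_human_genes(genes):
    """Return only gene records whose (hypothetical) tax_id field is 9606."""
    return [g for g in genes if str(g["tax_id"]) == HUMAN_TAX_ID]

def is_alzheimer_disease(name):
    """Case-insensitive substring match, so all-caps names are also kept."""
    return "alzheimer" in name.casefold()

# Made-up sample records, for illustration only.
genes = [
    {"symbol": "APOE", "tax_id": 9606},
    {"symbol": "Apoe", "tax_id": 10090},  # non-human record, dropped
]
diseases = [
    "Alzheimer's Disease",
    "ALZHEIMER DISEASE 2",
    "Creutzfeldt-Jakob disease",
]

human_genes = keep_human_genes(genes)
alz_names = [d for d in diseases if is_alzheimer_disease(d)]
print(human_genes)
print(alz_names)
```

Comparing on a case-folded copy of the name leaves the stored disease name unchanged, so only the matching step becomes case-insensitive.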
Summary of the changes in nodes and relationships
Nodes:
Label | NodeCount | NodeCount previous version | NumChanges |
---|---|---|---|
BiologicalProcess | 12322 | 11381 | 941 |
BodyPart | 652 | 402 | 250 |
CellularComponent | 1695 | 1391 | 304 |
Disease | 34 | 20 | 14 |
Drug | 16581 | 36959 | -20378 |
DrugClass | 474 | 345 | 129 |
Gene | 193279 | 193313 | -34 |
MolecularFunction | 3460 | 2884 | 576 |
Pathway | 4516 | 4570 | -54 |
Symptom | 505 | 438 | 67 |
TranscriptionFactor | 519 | | 519 |
Total | 234037 | 251703 | -17666 |
Relationships:
Type | RelCount | RelCount previous version | NumChanges |
---|---|---|---|
BODYPARTOVEREXPRESSESGENE | 97772 | 97772 | 0 |
BODYPARTUNDEREXPRESSESGENE | 102185 | 102185 | 0 |
CHEMICALBINDSGENE | 25726 | 11531 | 14195 |
CHEMICALDECREASESEXPRESSION | 21051 | 21051 | 0 |
CHEMICALINCREASESEXPRESSION | 18713 | 18713 | 0 |
DISEASELOCALIZESTOANATOMY | 33 | 29 | 4 |
DRUGCAUSESEFFECT | 2 | 2 | 0 |
DRUGINCLASS | 1945 | 1029 | 916 |
DRUGTREATSDISEASE | 9 | 9 | 0 |
GENEASSOCIATEDWITHCELLULARCOMPONENT | 88880 | 73553 | 15327 |
GENEASSOCIATESWITHDISEASE | 508 | 502 | 6 |
GENECOVARIESWITHGENE | 61606 | 61606 | 0 |
GENEHASMOLECULARFUNCTION | 104752 | 97191 | 7561 |
GENEINPATHWAY | 178991 | 179433 | -442 |
GENEINTERACTSWITHGENE | 147088 | 147001 | 87 |
GENEPARTICIPATESINBIOLOGICALPROCESS | 548285 | 559385 | -11100 |
GENEREGULATESGENE | 263978 | 265667 | -1689 |
SYMPTOMMANIFESTATIONOFDISEASE | 53 | 79 | -26 |
TRANSCRIPTIONFACTORINTERACTSWITHGENE | 6910 | | 6910 |
Total | 1668487 | 1636738 | 31749 |
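
In both tables, NumChanges is the current count minus the previous-version count. As a quick sanity check, the sketch below recomputes that difference and the totals for the node table. The counts are copied from the table; the function and variable names are hypothetical, and treating the newly added TranscriptionFactor label as having a previous count of 0 is an assumption.

```python
# (current, previous) node counts copied from the Nodes table.
# TranscriptionFactor is new in this release, so its previous count is taken as 0.
node_counts = {
    "BiologicalProcess": (12322, 11381),
    "BodyPart": (652, 402),
    "CellularComponent": (1695, 1391),
    "Disease": (34, 20),
    "Drug": (16581, 36959),
    "DrugClass": (474, 345),
    "Gene": (193279, 193313),
    "MolecularFunction": (3460, 2884),
    "Pathway": (4516, 4570),
    "Symptom": (505, 438),
    "TranscriptionFactor": (519, 0),
}

def count_changes(counts):
    """Map each label to (current count - previous count)."""
    return {label: cur - prev for label, (cur, prev) in counts.items()}

changes = count_changes(node_counts)
total_current = sum(cur for cur, _ in node_counts.values())
total_previous = sum(prev for _, prev in node_counts.values())
total_change = sum(changes.values())

print(total_current, total_previous, total_change)  # 234037 251703 -17666
```

The same check can be applied to the relationship table by swapping in its counts.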
The full database dump can be downloaded from the following link: https://cedars.box.com/v/alzkb-v2-0-0
Instructions for installing from the CYPHERL file can be found here.