
Version 2.0.0

xixilili released this on 03 Sep 17:27

What's Changed (Aug 31, 2024)

  1. Data Updates in AlzKB:

    • DrugBank: Updated to version 5.1.12 (2024-03-14)
    • NCBI Gene: Updated to V2024-05-13
    • Gene Ontology: Updated to V2024-04-24*
    • MeSH: Updated to V2023-12*
    • Uberon: Updated to V2024-03-22*
    • DrugCentral: Updated to V2023-11-01*
    • BindingDB: Updated to V2024-05*
    • MEDLINE: Updated to V2024-05-02*
      *Updates based on Hetionet. See the alzkb-updates GitHub repository for more details.
  2. Enhancements:

    • Added TranscriptionFactor nodes and TRANSCRIPTIONFACTORINTERACTSWITHGENE relationships.
    • Added chromosome number as a property to gene nodes.
    • Added sourcedatabase as a property to nodes.
    • Added correlation, score, p_fisher, z_score, affinity_nm, confidence, sourcedatabase, and unbiased as relationship properties, drawn from Hetionet, DisGeNET, and DoRothEA.
      Instructions for adding new data resources and importing data into the Memgraph graph database are available in the alzkb GitHub repository.
  3. Data Quality Improvements:

    • Removed the mapping between Creutzfeldt-Jakob disease (CJD) and Familial Alzheimer Disease (FAD). CJD and FAD are different diseases, but they had been merged into the same AlzKB node because the DisGeNET “disease_mappings.tsv” file maps CJD to FAD.
    • Filtered genes to keep human genes only (tax-id = 9606).
    • Implemented case-insensitive matching when extracting Alzheimer’s data from DisGeNET, so that disease names written in all caps are included.
    • Consolidated pathways that share a name but have different pathwayid and sourcedatabase values.
    • Removed duplicate AOP-DB pathways whose names contain “Homo sapiens (human)”.
    • Removed 21,724 Drug nodes from AOP-DB that had only xrefmesh values, a NULL commonName, and no connections to any other node.
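
The gene and disease filters above could be expressed roughly as follows. This is a minimal Python sketch, not AlzKB's actual build code: the record layout, the field names `tax_id` and `disease_name`, and the matching keyword are assumptions made for illustration.

```python
# Hypothetical sketch of two extraction-time filters; the field names
# (tax_id, disease_name) and the keyword are illustrative assumptions.

HUMAN_TAX_ID = 9606  # NCBI Taxonomy ID for Homo sapiens


def keep_human_genes(genes):
    """Keep only gene records whose NCBI taxonomy ID is 9606."""
    return [g for g in genes if int(g["tax_id"]) == HUMAN_TAX_ID]


def is_alzheimer_related(disease_name, keyword="alzheimer"):
    """Case-insensitive substring match, so all-caps names also match."""
    return keyword.casefold() in disease_name.casefold()


def select_alzheimer_rows(rows):
    """Keep rows whose disease name mentions Alzheimer's in any letter case."""
    return [r for r in rows if is_alzheimer_related(r["disease_name"])]


# Example with made-up records (10090 is the mouse taxonomy ID).
genes = [{"symbol": "APP", "tax_id": "9606"}, {"symbol": "App", "tax_id": "10090"}]
rows = [
    {"disease_name": "Alzheimer's Disease"},
    {"disease_name": "ALZHEIMER DISEASE, FAMILIAL, 1"},
    {"disease_name": "Parkinson Disease"},
]
print([g["symbol"] for g in keep_human_genes(genes)])  # ['APP']
print(len(select_alzheimer_rows(rows)))  # 2
```

A case-sensitive match on "Alzheimer" would miss the second example row, which is the kind of gap case-insensitive matching closes.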

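The pathway consolidation and the Drug-node cleanup can be sketched the same way. Again, this is a hypothetical Python illustration rather than the AlzKB code: it assumes pathways are merged on an exact name match, that drugs are plain dictionaries with `id`, `commonName`, and `xrefmesh` keys, and that relationships are given as `(source_id, target_id)` pairs.

```python
# Hypothetical sketch; record shapes and the merge policy are assumptions,
# not the AlzKB implementation.
from collections import defaultdict


def consolidate_pathways(pathways):
    """Merge pathways sharing a name, keeping all pathwayid/sourcedatabase values."""
    merged = defaultdict(lambda: {"pathwayid": set(), "sourcedatabase": set()})
    for p in pathways:
        entry = merged[p["name"]]
        entry["pathwayid"].add(p["pathwayid"])
        entry["sourcedatabase"].add(p["sourcedatabase"])
    # Sort the collected values so the output is deterministic.
    return {name: {k: sorted(v) for k, v in e.items()} for name, e in merged.items()}


def is_orphan_mesh_only_drug(drug, connected_ids):
    """True if the drug has no commonName, only an xrefmesh value, and no edges."""
    populated = {k for k, v in drug.items() if v is not None and k != "id"}
    return (
        drug.get("commonName") is None
        and populated == {"xrefmesh"}
        and drug["id"] not in connected_ids
    )


def drop_orphan_drugs(drugs, edges):
    """Remove Drug records that meet all three orphan criteria."""
    connected = {node for edge in edges for node in edge}
    return [d for d in drugs if not is_orphan_mesh_only_drug(d, connected)]


pathways = [
    {"name": "Apoptosis", "pathwayid": "P1", "sourcedatabase": "KEGG"},
    {"name": "Apoptosis", "pathwayid": "P2", "sourcedatabase": "WikiPathways"},
]
drugs = [
    {"id": "D1", "commonName": None, "xrefmesh": "M1"},  # orphan: removed
    {"id": "D2", "commonName": None, "xrefmesh": "M2"},  # has an edge: kept
    {"id": "D3", "commonName": "Donepezil", "xrefmesh": "M3"},  # named: kept
]
print(consolidate_pathways(pathways))
print([d["id"] for d in drop_orphan_drugs(drugs, [("D2", "G1")])])  # ['D2', 'D3']
```

Requiring all three conditions keeps unnamed drugs that still participate in relationships, which matches the "not connected to any other nodes" criterion.
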
Summary of the changes in nodes and relationships
Nodes:

Label                NodeCount  NodeCount (previous)  NumChanges
BiologicalProcess        12322                 11381         941
BodyPart                   652                   402         250
CellularComponent         1695                  1391         304
Disease                     34                    20          14
Drug                     16581                 36959      -20378
DrugClass                  474                   345         129
Gene                    193279                193313         -34
MolecularFunction         3460                  2884         576
Pathway                   4516                  4570         -54
Symptom                    505                   438          67
TranscriptionFactor        519                     0         519
Total                   234037                251703      -17666
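
NumChanges is the current count minus the previous count, and the Total row is the column sum. The short Python check below reproduces the node table from those two rules. It treats the previous TranscriptionFactor count as 0, since that label is new in this release; this is an assumption, but it is consistent with the totals.

```python
# Node counts from the table above: label -> (current, previous).
# TranscriptionFactor is new in this release, so its previous count is 0.
NODE_COUNTS = {
    "BiologicalProcess": (12322, 11381),
    "BodyPart": (652, 402),
    "CellularComponent": (1695, 1391),
    "Disease": (34, 20),
    "Drug": (16581, 36959),
    "DrugClass": (474, 345),
    "Gene": (193279, 193313),
    "MolecularFunction": (3460, 2884),
    "Pathway": (4516, 4570),
    "Symptom": (505, 438),
    "TranscriptionFactor": (519, 0),
}


def num_changes(counts):
    """NumChanges per label: current count minus previous count."""
    return {label: cur - prev for label, (cur, prev) in counts.items()}


def totals(counts):
    """Column sums for the Total row: (current, previous, change)."""
    current = sum(cur for cur, _ in counts.values())
    previous = sum(prev for _, prev in counts.values())
    return current, previous, current - previous


print(num_changes(NODE_COUNTS)["Drug"])  # -20378
print(totals(NODE_COUNTS))  # (234037, 251703, -17666)
```

The same two functions apply unchanged to the relationship counts that follow.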

Relationships:

Type                                  RelCount  RelCount (previous)  NumChanges
BODYPARTOVEREXPRESSESGENE                97772                97772           0
BODYPARTUNDEREXPRESSESGENE              102185               102185           0
CHEMICALBINDSGENE                        25726                11531       14195
CHEMICALDECREASESEXPRESSION              21051                21051           0
CHEMICALINCREASESEXPRESSION              18713                18713           0
DISEASELOCALIZESTOANATOMY                   33                   29           4
DRUGCAUSESEFFECT                             2                    2           0
DRUGINCLASS                               1945                 1029         916
DRUGTREATSDISEASE                            9                    9           0
GENEASSOCIATEDWITHCELLULARCOMPONENT      88880                73553       15327
GENEASSOCIATESWITHDISEASE                  508                  502           6
GENECOVARIESWITHGENE                     61606                61606           0
GENEHASMOLECULARFUNCTION                104752                97191        7561
GENEINPATHWAY                           178991               179433        -442
GENEINTERACTSWITHGENE                   147088               147001          87
GENEPARTICIPATESINBIOLOGICALPROCESS     548285               559385      -11100
GENEREGULATESGENE                       263978               265667       -1689
SYMPTOMMANIFESTATIONOFDISEASE               53                   79         -26
TRANSCRIPTIONFACTORINTERACTSWITHGENE      6910                    0        6910
Total                                  1668487              1636738       31749

The full database dump can be downloaded from the following link: https://cedars.box.com/v/alzkb-v2-0-0

Instructions for installing from the CYPHERL file can be found here.
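
In Memgraph's CYPHERL format, each line is a single Cypher query, so a client-side import amounts to sending each non-blank line as its own query. The Python sketch below illustrates only that idea. The `execute` callback is a hypothetical placeholder for whatever database client you use, and the sample queries are made up.

```python
# Hypothetical sketch: stream a .cypherl dump and hand each query to a
# caller-supplied `execute` function (for example a database session's run
# method). It assumes one complete query per line.
from typing import Callable, Iterable, Iterator


def iter_cypherl(lines: Iterable[str]) -> Iterator[str]:
    """Yield each non-blank line, stripped, as one Cypher query."""
    for line in lines:
        query = line.strip()
        if query:
            yield query


def load_cypherl(path: str, execute: Callable[[str], object]) -> int:
    """Run every query in the file through `execute`; return how many ran."""
    count = 0
    with open(path, encoding="utf-8") as fh:
        for query in iter_cypherl(fh):
            execute(query)
            count += 1
    return count


sample = ["CREATE (:Gene {id: 1});\n", "\n", "CREATE (:Drug {id: 2});\n"]
print(list(iter_cypherl(sample)))
```

With the `neo4j` Python driver, which can connect to Memgraph over Bolt, `session.run` could be passed as `execute`. Sending a dump of this size one statement at a time is likely to be slow, so the documented installation route is probably preferable for the full database.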