TRAPI 1.5: support source_record_urls #803

Closed
colleenXu opened this issue Mar 26, 2024 · 12 comments


@colleenXu
Collaborator

colleenXu commented Mar 26, 2024

TRAPI 1.5 updated its documentation for "references". It now says that "urls to an external page for the association" should be put into the KG Edge's sources (in the source_record_urls field of one of the source objects), rather than in the biolink:publications edge-attribute.

We've gotten confirmation from the UI team that they're aware of this and plan to add support for it (Translator Slack link).

My implementation idea: use a different keyword in the response-mapping

  • right now, we use ref_url to tell BTE to add the field's values to the biolink:publications edge-attribute (previous issue)
  • maybe we could use source_url to tell BTE to use the field's values in the sources
    • add to the primary-knowledge-source's object? ("resource_role": "primary_knowledge_source")
    • add a key:value pair, where key = source_record_urls and value is an array of strings (make 1-element arrays for single urls) from the mapped response field.
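
The idea in the list above could be sketched roughly like this. This is only an illustration in Python, not BTE code: the function names `as_url_list` and `add_source_record_urls` are hypothetical, and the example URL is a placeholder. It assumes the mapped response field is either a single string or a list of strings.

```python
from copy import deepcopy


def as_url_list(value):
    """Normalize a mapped response field into a list of URL strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]  # single url -> 1-element array
    return [item for item in value if isinstance(item, str)]


def add_source_record_urls(sources, mapped_value):
    """Return a copy of `sources` with the URLs set on the primary knowledge source."""
    urls = as_url_list(mapped_value)
    updated = deepcopy(sources)
    if not urls:
        return updated  # nothing to add, so the field is left out
    for source in updated:
        if source.get("resource_role") == "primary_knowledge_source":
            existing = source.get("source_record_urls", [])
            # dict.fromkeys keeps first-seen order while dropping duplicates
            source["source_record_urls"] = list(dict.fromkeys(existing + urls))
            break
    return updated


# Hypothetical input: a sources array plus one mapped URL (placeholder value)
sources = [
    {"resource_id": "infores:bindingdb", "resource_role": "primary_knowledge_source"},
    {
        "resource_id": "infores:biothings-bindingdb",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:bindingdb"],
    },
]
updated = add_source_record_urls(sources, "https://example.org/record/1")
print(updated[0])
```

Only the primary knowledge source object is touched; the aggregator objects and the input list stay as they were.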

We'd do this for:

  • BioThings BindingDB (bindingdb webpages)
  • BioThings rare-source (rare-source webpages; do special reverses so we can retrieve urls in both directions?)
  • MyGene clingen operations (clingen webpages, do special reverses so we can retrieve urls in both directions?)
@colleenXu
Collaborator Author

colleenXu commented Mar 27, 2024

For development, use these yamls:

We'll add overrides to these yamls when we deploy this feature.


And WE ARE WAITING AND WON'T MERGE the SmartAPI yaml PRs until AFTER this feature is deployed to Prod:

After these SmartAPI yaml PRs are merged, overrides to this branch's yamls can be removed from BTE.

@tokebe
Member

tokebe commented Mar 28, 2024

@colleenXu Do you have any example queries that would hit BindingDB and likely return source_url with the above override?

@colleenXu
Collaborator Author

colleenXu commented Mar 28, 2024

@tokebe Sorry for seeing this late >.<

Query for "protein-ligand" operation

Start with CD47 (aka UniProtKB:Q08722 / NCBIGene:961)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Gene"],
                    "ids": ["UniProtKB:Q08722"]
                },
                "n1": {
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:physically_interacts_with"]
                }
            }
        }
    }
}

Currently this is one of the edges:

                "627d6da60b47a3585f493321cb491e82": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "NCBIGene:961",
                    "object": "PUBCHEM.COMPOUND:155537282",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:publications",
                            "value": [
                                "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=50530406&enzyme=Leukocyte+surface+antigen+CD47&column=ki&startPg=0&Increment=50&submit=Search",
                                "PMID:31403795",
                                "doi:10.1021/acs.jmedchem.9b00024"
                            ],
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:bindingdb",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-bindingdb",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:bindingdb"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-bindingdb"
                            ]
                        }
                    ]
                },

With the different handling for source_url, the long url should be in the primary-source object instead:

                "627d6da60b47a3585f493321cb491e82": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "NCBIGene:961",
                    "object": "PUBCHEM.COMPOUND:155537282",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:publications",
                            "value": [
                                "PMID:31403795",
                                "doi:10.1021/acs.jmedchem.9b00024"
                            ],
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:bindingdb",
                            "resource_role": "primary_knowledge_source",
                            "source_record_urls":  [
                                "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=50530406&enzyme=Leukocyte+surface+antigen+CD47&column=ki&startPg=0&Increment=50&submit=Search"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-bindingdb",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:bindingdb"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-bindingdb"
                            ]
                        }
                    ]
                },

Query for "ligand-protein" operation

Start with chemical INCHIKEY:NZLXOECMDRNZDN-GUYCJALGSA-N

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:SmallMolecule"],
                    "ids": ["INCHIKEY:NZLXOECMDRNZDN-GUYCJALGSA-N"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:physically_interacts_with"]
                }
            }
        }
    }
}

Currently there's only 1 edge

            "edges": {
                "00c844be0dd7da974fcb364b5bc9c1e0": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:134553288",
                    "object": "NCBIGene:187",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:publications",
                            "value": [
                                "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=456871&enzyme=Apelin+receptor&column=ki&startPg=0&Increment=50&submit=Search"
                            ],
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:bindingdb",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-bindingdb",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:bindingdb"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-bindingdb"
                            ]
                        }
                    ]
                }
            }
        },

With the different handling for source_url, the long url should be in the primary-source object instead, so there'll be no edge attributes:

            "edges": {
                "00c844be0dd7da974fcb364b5bc9c1e0": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:134553288",
                    "object": "NCBIGene:187",
                    "attributes": [],
                    "sources": [
                        {
                            "resource_id": "infores:bindingdb",
                            "resource_role": "primary_knowledge_source",
                            "source_record_urls":  [
                                "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=456871&enzyme=Apelin+receptor&column=ki&startPg=0&Increment=50&submit=Search"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-bindingdb",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:bindingdb"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-bindingdb"
                            ]
                        }
                    ]
                }
            }
        },
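
For testing, a small check like the following could confirm that an edge matches the expected layout shown in these BindingDB examples. It is a sketch in Python under a few assumptions: `check_edge_layout` is a hypothetical helper, the edges below are abbreviated with a placeholder URL, and the "no URLs in biolink:publications" rule is only meant for operations that switched to source_url (operations still using ref_url would legitimately keep URLs there).

```python
def check_edge_layout(edge):
    """Return a list of problems; an empty list means the edge has the expected layout."""
    problems = []
    for attribute in edge.get("attributes", []):
        if attribute.get("attribute_type_id") != "biolink:publications":
            continue
        values = attribute.get("value", [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            if isinstance(value, str) and value.startswith(("http://", "https://")):
                problems.append(f"URL still in biolink:publications: {value}")
    primaries = [
        source
        for source in edge.get("sources", [])
        if source.get("resource_role") == "primary_knowledge_source"
    ]
    if not any(source.get("source_record_urls") for source in primaries):
        problems.append("no source_record_urls on the primary knowledge source")
    return problems


# Abbreviated edges with a placeholder URL (not real BindingDB links)
old_edge = {
    "attributes": [
        {
            "attribute_type_id": "biolink:publications",
            "value": ["https://example.org/record/1", "PMID:31403795"],
        }
    ],
    "sources": [
        {"resource_id": "infores:bindingdb", "resource_role": "primary_knowledge_source"}
    ],
}
new_edge = {
    "attributes": [
        {"attribute_type_id": "biolink:publications", "value": ["PMID:31403795"]}
    ],
    "sources": [
        {
            "resource_id": "infores:bindingdb",
            "resource_role": "primary_knowledge_source",
            "source_record_urls": ["https://example.org/record/1"],
        }
    ],
}
print(check_edge_layout(old_edge))
print(check_edge_layout(new_edge))
```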

@colleenXu
Collaborator Author

colleenXu commented Apr 2, 2024

I've updated my comment above with all of the adjusted SmartAPI yamls for this feature. We'll add overrides to these yamls for this feature later.

I didn't adjust the MyGene reverse operation (geneToDisease) to retrieve the url, because I encountered an issue with jmespath (comment, issue). I didn't encounter this issue when adjusting the BioThings rare source operation - probably because the value of raresource.disease field is always an array (even if there's only 1 element).

I don't think this is a blocking issue for this feature though.

@colleenXu
Collaborator Author

@tokebe

I've added a PR in bte-server to add the needed overrides: biothings/bte-server#25

Also: could we remove the source_record_urls field when it has no value? Right now it's in every primary knowledge source object, when it'll only be filled in a few cases.

(based on a quick look)
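
Roughly what I mean, as a Python sketch (the helper name `drop_empty_source_record_urls` is made up for illustration, and this is not how BTE is actually written): drop the key whenever it is missing a value or holds an empty array.

```python
def drop_empty_source_record_urls(sources):
    """Return copies of the source objects without empty source_record_urls fields."""
    cleaned = []
    for source in sources:
        source = dict(source)  # shallow copy so the input is not modified
        if not source.get("source_record_urls"):
            # covers None and [] (pop is a no-op if the key is absent)
            source.pop("source_record_urls", None)
        cleaned.append(source)
    return cleaned


# Hypothetical sources array with an empty field on the primary source
sources = [
    {
        "resource_id": "infores:bindingdb",
        "resource_role": "primary_knowledge_source",
        "source_record_urls": [],
    },
    {
        "resource_id": "infores:biothings-bindingdb",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:bindingdb"],
    },
]
cleaned = drop_empty_source_record_urls(sources)
print(cleaned[0])
```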

@tokebe
Member

tokebe commented Apr 12, 2024

Latest commit should fix this.

@tokebe added the "On CI" label (Related changes are deployed to CI server) on Apr 12, 2024
@colleenXu
Collaborator Author

Note: we can keep the override to BioThings rare-source, but I reverted the branch's yaml so it's using ref_url again (commit), i.e. it's not using this feature anymore. These links provide related literature info, but don't explain where the original association came from (GARD), so they should be edge-attributes.

@colleenXu added the "On CI -> Test" label and removed the "On CI" label (Related changes are deployed to CI server) on May 3, 2024
@tokebe added the "On Test" label (Related changes are deployed to Test server) and removed the "On CI -> Test" label on May 9, 2024
@colleenXu
Collaborator Author

colleenXu commented Jun 4, 2024

We'd like to use source_url for BioThings PFOCR's pfocrUrl field (Ref Slack discussion with @AlexanderPico and NCATS-Tangerine/translator-api-registry#132 (comment)).

However, we needed to adjust our "unique edge hashing" so a TRAPI edge could have multiple values in the source_record_urls array. I think this is needed for our edges from BioThings PFOCR - since I don't think we want a separate edge for every subject/object/figure combo. Currently, we merge records so an edge can contain info from multiple figures with the same triple (subject/object entities).
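
To illustrate the merging behavior described here, a minimal Python sketch follows. It is not BTE's edge-hashing code: the record shape, the `merge_records_by_triple` name, the predicate, the IDs, and the URLs are all hypothetical, and the real hash likely covers more fields than the triple used as the key below.

```python
def merge_records_by_triple(records):
    """Merge records that share a triple, collecting their URLs into one array."""
    merged = {}
    for record in records:
        key = (record["subject"], record["predicate"], record["object"])
        edge = merged.setdefault(
            key,
            {
                "subject": record["subject"],
                "predicate": record["predicate"],
                "object": record["object"],
                "source_record_urls": [],
            },
        )
        for url in record.get("source_record_urls", []):
            if url not in edge["source_record_urls"]:  # skip duplicate URLs
                edge["source_record_urls"].append(url)
    return list(merged.values())


# Hypothetical records: two figures for the same triple, one for another triple
records = [
    {"subject": "NCBIGene:1", "predicate": "biolink:related_to", "object": "NCBIGene:2",
     "source_record_urls": ["https://example.org/figure/A"]},
    {"subject": "NCBIGene:1", "predicate": "biolink:related_to", "object": "NCBIGene:2",
     "source_record_urls": ["https://example.org/figure/B"]},
    {"subject": "NCBIGene:1", "predicate": "biolink:related_to", "object": "NCBIGene:3",
     "source_record_urls": ["https://example.org/figure/A"]},
]
edges = merge_records_by_triple(records)
print(len(edges), edges[0]["source_record_urls"])
```

With this approach, the two figure records for the same triple end up as one edge carrying both URLs, instead of one edge per subject/object/figure combo.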


We have two code changes that have been deployed to dev/CI, and we want to patch them to Test, where the rest of the code for this feature is:

@colleenXu
Collaborator Author

colleenXu commented Jun 14, 2024

The BTE code + old overrides (having ncats_rare_source rather than pfocr) were deployed to Prod as part of the Octopus release. I tested and it's live.

However, the latest PFOCR stuff is only on dev/CI right now. I think we want to get this in as a patch to Test/Prod.


So I haven't merged the SmartAPI yaml PR yet, or added the override removal to the chore.

@colleenXu removed the "On Test" label (Related changes are deployed to Test server) on Jun 14, 2024
@colleenXu
Collaborator Author

See #811 (comment) for details on how we're going to remove the overrides/deploy the PFOCR stuff.

@colleenXu added the "On Test" label (Related changes are deployed to Test server) and removed the "On CI" (Related changes are deployed to CI server) and "needs discussion" labels on Jun 24, 2024
@colleenXu added the "On Test -> Prod" label and removed the "On Test" label (Related changes are deployed to Test server) on Jul 19, 2024
@tokebe
Member

tokebe commented Jul 26, 2024

Related PRs deployed to Prod.

@tokebe closed this as completed on Jul 26, 2024
@colleenXu
Collaborator Author

I've updated BioThings PFOCR x-bte annotation to use this feature.
