
Synonym lookup super slow? How to fix? #2367

Closed
edeutsch opened this issue Sep 7, 2024 · 7 comments
edeutsch (Collaborator) commented Sep 7, 2024

I've noticed this for a while but am only posting now: has anyone else noticed that the Synonym lookup through the ARAX GUI is super slow? Try searching for metformin or ibuprofen or anything reasonably common; my CPU fans start groaning and it takes 15+ seconds for anything to appear. I assume this is either because so much data is returned or because rendering the graph is so expensive, or something else? Does anyone else have this issue, and does anyone have ideas on how best to solve it? Return less data? Don't render the graph unless asked? This service was great when answers came back within a second, but now it's painful to use.

ideas?

amykglen (Member) commented Sep 8, 2024

yes, this started happening after we started using the SRI Node Normalizer's drug_chemical_conflate parameter, which made the clusters for certain drugs really big.

I definitely think it's the 'match graph' that's causing the issue (I think the acetaminophen graph has tens of thousands of edges now) - I wonder if we could just not display the graph if it has more than some reasonable number of edges? not sure if there's an existing way to determine the number of edges without actually having to load all of them...

isbluis (Member) commented Sep 17, 2024

As a quick test in devLM, looking up metformin results in the following rough timings:

  • 13 seconds to receive the JSON response (>77 MB)
  • 35 seconds to render the full table, without displaying the Concept Graph
  • 55 seconds to display everything, including the Concept Graph

edeutsch (Collaborator, Author) commented:

oof, thanks. Yeah, I think we should put some effort into slimming down the response first somehow. And then maybe something on the front end.

amykglen (Member) commented:

ok, per discussion with @edeutsch and others today - I've added an optional max_synonyms parameter to the NodeSynonymizer's get_normalizer_results() (in master), which you can use like this:

synonymizer.get_normalizer_results(entities="DOID:14330", max_synonyms=2)

and which produces a truncated cluster like this one (I haven't shown the full knowledge_graph below, but it is also truncated to two nodes and only edges that connect those two nodes):

{
  "DOID:14330": {
    "id": {
      "identifier": "MONDO:0005180",
      "name": "Parkinson disease",
      "category": "biolink:Disease",
      "SRI_normalizer_name": "Parkinson disease",
      "SRI_normalizer_category": "biolink:Disease",
      "SRI_normalizer_curie": "MONDO:0005180"
    },
    "total_synonyms": 18,
    "categories": {
      "biolink:Disease": 18
    },
    "nodes": [
      {
        "identifier": "DOID:14330",
        "category": "biolink:Disease",
        "label": "Parkinson's disease",
        "major_branch": "DiseaseOrPhenotypicFeature",
        "in_sri": true,
        "name_sri": "Parkinson's disease",
        "category_sri": "biolink:Disease",
        "in_kg2pre": true,
        "name_kg2pre": "Parkinson's disease",
        "category_kg2pre": "biolink:Disease"
      },
      {
        "identifier": "MONDO:0005180",
        "category": "biolink:Disease",
        "label": "Parkinson disease",
        "major_branch": "DiseaseOrPhenotypicFeature",
        "in_sri": true,
        "name_sri": "Parkinson disease",
        "category_sri": "biolink:Disease",
        "in_kg2pre": true,
        "name_kg2pre": "Parkinson disease",
        "category_kg2pre": "biolink:Disease"
      }
    ],
    "knowledge_graph": {
      "nodes": {
        ...

so we were thinking the UI could decide how many nodes it is reasonable to display in one cluster (e.g., 200?) and then call get_normalizer_results() with that number as max_synonyms. and maybe also provide a dropdown or the like that lets a user increase max_synonyms.

note that the top-level "categories" slot shown above reports node counts by category for the full (untruncated) cluster, and I also added a top-level "total_synonyms" slot to make it easy to report how many nodes are in the full cluster.
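
For illustration, a minimal sketch of how a caller might combine max_synonyms with the new total_synonyms slot to detect truncation (this assumes the import path below and that the call returns the dict shown above, keyed by the input curie; only get_normalizer_results(), max_synonyms, total_synonyms, and nodes are confirmed here):

from node_synonymizer import NodeSynonymizer  # assumed import path

MAX_DISPLAY_NODES = 200  # the UI's chosen cap on nodes per cluster

synonymizer = NodeSynonymizer()
results = synonymizer.get_normalizer_results(entities="DOID:14330",
                                             max_synonyms=MAX_DISPLAY_NODES)

cluster = results["DOID:14330"]
num_shown = len(cluster["nodes"])        # truncated to at most MAX_DISPLAY_NODES
num_total = cluster["total_synonyms"]    # size of the full, untruncated cluster

if num_shown < num_total:
    # e.g., show a warning and offer a control that raises max_synonyms
    print(f"Showing {num_shown} of {num_total} synonyms; increase max_synonyms to see more.")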

let me know if I can do anything else!

edeutsch (Collaborator, Author) commented:

back end now supports max_synonyms=N in a POSTed dict.
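
For illustration only, a hedged sketch of such a POST: the endpoint URL and the name of the field carrying the search term are placeholders, not confirmed in this thread; only the max_synonyms key in the POSTed dict is confirmed above.

import requests

# Placeholder: the actual entity-lookup endpoint path is not given in this thread.
ENTITY_ENDPOINT = "https://arax.ncats.io/devLM/api/<entity-endpoint>"

payload = {
    "term": "ibuprofen",   # assumed field name for the lookup term
    "max_synonyms": 50,    # confirmed: cap on synonyms returned per cluster
}

response = requests.post(ENTITY_ENDPOINT, json=payload, timeout=60)
response.raise_for_status()
results = response.json()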

isbluis (Member) commented Oct 31, 2024

The parameter has been added to the Settings pane of the UI (in devLM), and a warning is given when the output is truncated:
https://arax.ncats.io/devLM/index.html?term=aspirin

[screenshot of the Settings pane and truncation warning]

isbluis added a commit that referenced this issue Nov 1, 2024
…[q] (#2400)

- Add new user setting for max number of nodes to list in Synonyms [default=50] (#2367)
- Tidy layout of Settings
- Add "months" to text for long-running Queries in System Activity
- Update ARAXi DSL helper JSON
edeutsch (Collaborator, Author) commented Nov 1, 2024

Fixed in master by limiting the number of nodes.

Although I now belatedly wonder if that was really the right way to fix it. What if I really want to see the 300 synonyms of ibuprofen without crashing my browser? Surely a table of 300 things isn't so bad. Maybe it would have been better to just suppress the knowledge graph; would that have fixed it?
