
Issue with Race variable when used to create cohort and then applied at multivariate endpoint #308

Closed
karafecho opened this issue Mar 19, 2024 · 6 comments

@karafecho
Contributor

karafecho commented Mar 19, 2024

This issue reports an apparent bug that Brenna identified while developing multivariate queries. She retested after our last two bug fixes and redeployments, and the issue persists. She has also reproduced it with command-line curl requests. I'm not sure what's going on, but it appears to be related to the "in" operator, so the root cause may be the same as in issue #305.

Cohort 2 (restrict Race, COHORT:14 in curl requests)

Cohort definition:

Table generation:

Adding more variables and/or using curl requests yields the same result.

Cohort 4 (restrict Race and TotalEDInpatientVisits, COHORT:15 in curl requests)
Cohort definition:

Table generation:

@karafecho
Contributor Author

karafecho commented Mar 20, 2024

More strange results from Brenna:

05H_Summary_03.20.2024.pdf

I'm not sure what's going on, but I'm not able to replicate Brenna's results, at least for the few I tested. For example:

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"year":{"operator":"=","value":"2010"}}'

Same output with or without quotation marks around "2010".

  "return value": {
    "cohort_id": "COHORT:3",
    "size": 1311
  }
}

Another example:

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"year":{"operator":"=","value":"2010"}, "TotalEDInpatientVisits":{"operator":"<=","value":9}}'

Same output with or without quotation marks around "9".

  "return value": {
    "cohort_id": "COHORT:17",
    "size": 1311
  }
}

@karafecho
Contributor Author

karafecho commented Mar 20, 2024

Please disregard the prior post. I used the wrong year. Nonetheless, I still can't reproduce the results.

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"year":{"operator":"=","value":2020}}'

  "return value": {
    "cohort_id": "COHORT:19",
    "size": 4753
  }
}

Brenna's query yielded N=4168 patients.

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"year":{"operator":"=","value":2020}, "TotalEDInpatientVisits":{"operator":"<=","value":9}}'

  "return value": {
    "cohort_id": "COHORT:20",
    "size": 4569
  }
}

Brenna's query yielded N=4341 patients.

@karafecho
Contributor Author

karafecho commented Mar 21, 2024

FYI: Querying the year=2020 CSV file yields sample sizes for the two cohorts above that match those returned by the API. I think the issue is on Brenna's end, although I have not done extensive testing.
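The CSV cross-check described above can be scripted so the counts are easy to rerun after each redeployment. This is a minimal sketch, assuming the export is comma-separated with a header row containing columns named exactly `year` and `TotalEDInpatientVisits` and no embedded commas in any field; the function name `count_cohort` is hypothetical, and rows with an empty visit count are treated as not matching a `<=` constraint (an assumption about how the API handles missing values).

```shell
#!/usr/bin/env bash
# Sketch only: count rows in a comma-separated export with year == YEAR and,
# optionally, TotalEDInpatientVisits <= MAX_VISITS.
# Assumptions (hypothetical, not confirmed by the ICEES docs):
#   - the first row is a header naming "year" and "TotalEDInpatientVisits";
#   - no field contains an embedded comma;
#   - rows with an empty TotalEDInpatientVisits are excluded when MAX is given.
# usage: count_cohort FILE YEAR [MAX_VISITS]
count_cohort() {
  local csv="$1" year="$2" max_visits="${3:-}"
  awk -F',' -v year="$year" -v maxv="$max_visits" '
    # Strip double quotes and carriage returns before comparing values.
    { gsub(/["\r]/, "") }
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "year") ycol = i
        if ($i == "TotalEDInpatientVisits") vcol = i
      }
      # Exit with status 2 if a required column is missing from the header.
      if (!ycol || (maxv != "" && !vcol)) { bad = 1; exit 2 }
      next
    }
    $ycol == year && (maxv == "" || ($vcol != "" && $vcol + 0 <= maxv + 0)) { n++ }
    END { if (bad) exit 2; print n + 0 }
  ' "$csv"
}
```

For example, `count_cohort pcd_2020.csv 2020 9` (file name hypothetical) would give the figure to compare against the size reported for the COHORT:20 request, and `count_cohort pcd_2020.csv 2020` the figure for COHORT:19.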

@karafecho
Contributor Author

Post-fix tests on dev:

All passed except this one:

curl -X 'POST' \
  'https://icees-pcd-dev.apps.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"Race":{"operator":"in","values":["African American","Caucasian","Asian"]}}'

"return value": "Input features invalid or cohort ≤10 patients. Please try again."
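The error message above does not say whether the input was rejected or the cohort really has ≤10 patients. One way to tell the two apart is to count the rows an "in" constraint should select directly from the CSV. This is a minimal sketch, assuming a header row that uses the same column name as the API request (`Race`), values spelled exactly as in the request, no embedded commas, and no tab characters inside values; the function name `count_in` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: count rows whose COLUMN value is one of the given values,
# i.e. what an "in" constraint is expected to select.
# Assumptions (hypothetical): header row uses the API column name (e.g. "Race"),
# values match the request spelling exactly, no embedded commas or tabs.
# usage: count_in FILE COLUMN VALUE...
count_in() {
  local csv="$1" col="$2"
  shift 2
  # Join the remaining arguments with tabs so values containing spaces
  # (e.g. "African American") reach awk intact.
  local IFS=$'\t'
  awk -F',' -v col="$col" -v vals="$*" '
    BEGIN {
      m = split(vals, v, "\t")
      for (i = 1; i <= m; i++) allowed[v[i]] = 1
    }
    # Strip double quotes and carriage returns before comparing values.
    { gsub(/["\r]/, "") }
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      # Exit with status 2 if the column is not in the header.
      if (!c) { bad = 1; exit 2 }
      next
    }
    ($c in allowed) { k++ }
    END { if (bad) exit 2; print k + 0 }
  ' "$csv"
}
```

A call such as `count_in pcd_2010.csv Race "African American" Caucasian Asian` (file name hypothetical) returning well over 10 would point to a problem with the "in" operator rather than a genuinely small cohort.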

Also, I ran this query with COHORT:7 (year=2010):

curl -X 'POST' \
  'https://icees-pcd-dev.apps.renci.org/cohort/COHORT%3A7/multivariate_feature_analysis' \
  -H 'accept: text/tabular' \
  -H 'Content-Type: application/json' \
  -d '[
  "TotalEDInpatientVisits",
  "Sex2",
  "Race_UNC"
]'

It ran without error, but it returned no results. Shouldn't it have returned an error, given that Race_UNC doesn't exist in the dataset?
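Until the endpoint rejects unknown feature names itself, a client-side guard can catch typos like `Race_UNC` before the request is sent. This is a minimal sketch, not the server-side fix: `KNOWN_FEATURES` is a hypothetical, partial list built from feature names used elsewhere in this thread, and in practice it would need to come from the dataset's actual feature list; `check_features` is likewise a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical, partial list of valid feature names (taken from names used in
# this thread); a real check would load the dataset's full feature list.
KNOWN_FEATURES=(year Sex2 Race TotalEDInpatientVisits)

# Print each requested feature that is not in KNOWN_FEATURES to stderr.
# Returns 1 if any feature is unknown, 0 if all are known.
check_features() {
  local f k found missing=0
  for f in "$@"; do
    found=0
    for k in "${KNOWN_FEATURES[@]}"; do
      if [[ "$f" == "$k" ]]; then found=1; break; fi
    done
    if (( ! found )); then
      echo "unknown feature: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Chaining it in front of the request, as in `check_features TotalEDInpatientVisits Sex2 Race_UNC && curl ...`, would stop before the multivariate call and print `unknown feature: Race_UNC`.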

@karafecho
Contributor Author

All bugs have been fixed in the code supporting the PCD and DILI instances, deployed to DEV, tested, deployed to PROD, and retested. Note that the asthma and COVID instances will need to be updated at some point in the future.

karafecho pushed a commit that referenced this issue Mar 27, 2024
…in features endpoint (#305) (#309)

* fixed the in operator

* addressed issue #305 with cohort constraint for features endpoint

* fixing test

* fixed test

* revert previous syntax in test

* addressed a couple of more issues discovered from bug fix testing
@karafecho
Contributor Author

Bug fixed and tested, so closing ticket ...
