Add offsides data from nsides #110
data/nsides/offsides/meta.yaml
Outdated
description: Standard error of the PRR estimate
type: continuous
names:
- Proportional reporting ratio error
do you think this is something the model should be able to predict? (I'm just curious)
Overall, looks quite good to me. Your PR does not cause the pre-commit errors.
No, I don't think the error can be predicted, but it can be calculated from other columns in the dataset, namely columns A, B, C, and D.
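As a rough illustration of the point above, here is a minimal sketch of how such an error could be derived from a 2x2 contingency table. It assumes that A, B, C, and D are the usual cell counts (drug and side effect, drug without side effect, no drug with side effect, no drug without side effect) and that the error column is the standard error of ln(PRR); neither assumption is confirmed in this thread, and the function name `prr_and_error` is hypothetical.

```python
import math
from typing import Tuple


def prr_and_error(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (PRR, standard error of ln(PRR)) for a 2x2 contingency table.

    Assumed cell meanings (not confirmed by the PR discussion):
      a: reports with the drug and with the side effect
      b: reports with the drug, without the side effect
      c: reports without the drug, with the side effect
      d: reports without the drug, without the side effect
    """
    if a <= 0 or c <= 0 or b < 0 or d < 0:
        raise ValueError("a and c must be positive; b and d must be non-negative")
    # PRR: reporting rate among drug reports divided by the rate among all other reports.
    prr = (a / (a + b)) / (c / (c + d))
    # Standard large-sample approximation for the standard error of ln(PRR).
    se_log_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return prr, se_log_prr


# Made-up counts, for illustration only.
prr, err = prr_and_error(a=10, b=90, c=20, d=880)
print(f"PRR = {prr:.2f}, SE(ln PRR) = {err:.4f}")
```

Because the error is a deterministic function of the counts, it carries no information beyond what those columns already provide.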
Great! Lmk if something doesn't work. I have another similar dataset in the works that I will commit once this is merged.
In this case, I would consider removing this column, as it might add more confusion than signal to the model.
Has this dataset been used in some benchmarks/papers? Thanks again for your contribution 💯
Thanks again for your contribution; it would be great if we could address the comments/clarifications. Let me know if you want a discussion or need a hand
@kjappelbaum I've removed those columns and yes, the dataset has been used in the following paper: https://pubmed.ncbi.nlm.nih.gov/22422992/ pls lmk if anything is unclear or if you'd like me to make further changes
Thanks a lot! Can you still remove the column from the target list in the
…emnlp into dataset_drugxdrug
Done! Lmk if I missed anything else.
data/nsides/offsides/meta.yaml
Outdated
- id: mean_reporting_frequency
description: Proportion of reports for the drug that report the side effect
type: continuous
names:
- mean reporting frequency
is the absolute number something a model should be able to predict?
yeah, according to the paper, a model should be able to predict it
data/nsides/offsides/meta.yaml
Outdated
- id: drug_concept_name
description: RxNorm name string for the drug
type: categorical
- id: condition_concept_name
description: MedDRA identifier for the side effect
type: categorical
Do we need both of them simultaneously for the ratio to be meaningful?
That is, a correct prompt would ask the model something like
"What is the proportional reporting ratio for <condition_concept_name> for <drug_concept_name>?"
Yeah, so the prompt would be "What is the PRR of <condition_concept_name> for <drug_concept_name>?". A higher PRR means the side effect is reported disproportionately often for that particular drug.
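A minimal sketch of how such a prompt could be filled from a single row, so that both columns always appear together. The template wording, the `build_prompt` helper, and the example row are hypothetical and only illustrate the idea; they are not the project's actual templating mechanism.

```python
# Hypothetical template: both identifiers are required for the PRR to be meaningful.
PROMPT_TEMPLATE = (
    "What is the proportional reporting ratio (PRR) of "
    "{condition_concept_name} for {drug_concept_name}?"
)


def build_prompt(row: dict) -> str:
    """Fill the template from one dataset row; raises KeyError if a column is missing."""
    return PROMPT_TEMPLATE.format(
        condition_concept_name=row["condition_concept_name"],
        drug_concept_name=row["drug_concept_name"],
    )


# Made-up row, for illustration only.
example_row = {"drug_concept_name": "ibuprofen", "condition_concept_name": "nausea"}
prompt = build_prompt(example_row)
print(prompt)
```

Looking up both keys explicitly means a row that lacks either column fails loudly instead of producing a half-filled question.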
data/nsides/offsides/meta.yaml
Outdated
bibtex: "\n @article{Tatonetti2012,\n author = {Tatonetti, Nicholas P. and Ye, Peter P. and Daneshjou, Roxana and Altman, Russ B.},\n \
\ title = {Data-driven prediction of drug effects and interactions},\n journal = {Sci Transl Med},\n volume = {4},\n number\
\ = {125},\n pages = {125ra31},\n year = {2012},\n doi = {10.1126/scitranslmed.3003377},\n pmid = {22422992},\n pmcid\
\ = {PMC3382018}\n }\n "
can we remove those newlines somehow and just have a multiline string? I can help with that
Sure! I'll let you take care of it if that's ok.
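One way to get rid of the escaped newlines is to store the entry as a YAML literal block scalar (`|-`). The sketch below, using only the Python standard library, shows what that conversion could look like. The `to_block_scalar` helper is hypothetical, and the original indentation of the BibTeX fields is not visible here, so each line is simply stripped and re-indented by two spaces.

```python
# The BibTeX string as it appears in the diff, with "\n" escapes.
ESCAPED_BIBTEX = (
    "\n @article{Tatonetti2012,\n"
    " author = {Tatonetti, Nicholas P. and Ye, Peter P. and Daneshjou, Roxana and Altman, Russ B.},\n"
    " title = {Data-driven prediction of drug effects and interactions},\n"
    " journal = {Sci Transl Med},\n volume = {4},\n number = {125},\n"
    " pages = {125ra31},\n year = {2012},\n"
    " doi = {10.1126/scitranslmed.3003377},\n pmid = {22422992},\n"
    " pmcid = {PMC3382018}\n }\n "
)


def to_block_scalar(key: str, text: str, indent: int = 2) -> str:
    """Render `text` under `key` as a YAML literal block scalar (`key: |-`)."""
    pad = " " * indent
    # Drop surrounding blank space, then normalize each line's leading whitespace.
    lines = [line.strip() for line in text.strip().splitlines()]
    body = "\n".join(pad + line if line else "" for line in lines)
    return f"{key}: |-\n{body}"


block = to_block_scalar("bibtex", ESCAPED_BIBTEX)
print(block)
```

The `|-` indicator keeps the line breaks inside the entry and drops the trailing newline, so the value reads as plain multi-line text in the file.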
Sorry for being unclear with my last reviews. I have more suggestions, but I can also take care of them if you prefer. Thanks for your contribution!
Co-authored-by: Kevin M Jablonka <32935233+kjappelbaum@users.noreply.github.com>
lgtm, I just added some minor changes to the text
- id: drug_concept_name
description: RxNorm name string for the drug
type: categorical
- id: condition_concept_name
description: MedDRA identifier for the side effect
type: categorical
this will need to use prompt templates, I guess, because they are not independent.