Dose response benchmarking data #208

Closed
annaquaglieri16 opened this issue Aug 30, 2023 · 4 comments

@annaquaglieri16

Hi there,

Thanks a lot for developing the protti package and for putting together the R workflow in this document: https://jpquast.github.io/protti/articles/data_analysis_dose_response_workflow.html

I'm interested in the dose-response work and I am running your package (and therefore the drc package and estimates) on different operating systems in production on our web platform.

One thing I noticed is that, because drc fits the model by optimisation with optim, the model estimates can differ slightly, which makes it hard for me to benchmark results. I'm therefore looking for a ground-truth dataset against which I can compare results and make sure I always get what I expect. Would you be able to share a csv with the results shown in the all hits table (top rows below), so that I can compare the actual, unrounded EC50 values in your table with what I obtain? I noticed that the EC50 estimate can vary quite a bit between OSs.

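| rank | score | eg_precursor_id | pg_protein_accessions | anova_adj_pval | correlation | ec_50 |
|------|-------|-----------------|-----------------------|----------------|-------------|-------|
| 1 | 0.919 | VFDVELLKLE.2 | P62942 | 3.98e-13 | 0.967 | 3.6e+06 |
| 2 | 0.914 | RGQTC[Carbamidomethyl (C)]VVHYTGMLEDGK.3 | P62942 | 4.73e-14 | 0.947 | 3.0e+05 |
| 3 | 0.888 | GWEEGVAQMSVGQR.2 | P62942 | 1.08e-13 | 0.947 | 4.7e+05 |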
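
One way to make this kind of cross-platform check concrete is to compare EC50 estimates with a relative tolerance instead of testing for exact equality. The following is a minimal Python sketch, not part of protti or drc. The reference values are the three ec_50 entries from the table rows above; the `observed_ec50` values, the `compare_ec50` helper, and the 5% tolerance are all hypothetical and chosen only for illustration.

```python
import math

# EC50 values copied from the three table rows above (reference run).
reference_ec50 = {
    "VFDVELLKLE.2": 3.6e06,
    "RGQTC[Carbamidomethyl (C)]VVHYTGMLEDGK.3": 3.0e05,
    "GWEEGVAQMSVGQR.2": 4.7e05,
}

# Hypothetical estimates from a second machine, invented for illustration.
observed_ec50 = {
    "VFDVELLKLE.2": 3.61e06,
    "RGQTC[Carbamidomethyl (C)]VVHYTGMLEDGK.3": 2.4e05,
    "GWEEGVAQMSVGQR.2": 4.7e05,
}


def compare_ec50(reference, observed, rel_tol=0.05):
    """Return {precursor: (ref, obs, within_tolerance)} for shared precursors."""
    report = {}
    for precursor, ref in reference.items():
        if precursor not in observed:
            continue  # skip precursors missing from the second run
        obs = observed[precursor]
        report[precursor] = (ref, obs, math.isclose(ref, obs, rel_tol=rel_tol))
    return report


report = compare_ec50(reference_ec50, observed_ec50)
for precursor, (ref, obs, ok) in report.items():
    status = "OK" if ok else "DIFFERS"
    print(f"{precursor}: ref={ref:.2e} obs={obs:.2e} {status}")
```

Because the table shows EC50 values rounded to two significant figures, a tolerance much tighter than a few percent would flag differences that come from the rounding alone; the unrounded csv values would allow a stricter threshold.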

Also, do you have other datasets with known dose-response curves and effects that users could use for benchmarking?

Thanks a lot for this!

Anna

@dschust-r
Collaborator

Dear Anna,

Here is a csv file of the fit table (not filtered for hits only, and without the plot_points and plot_curve columns).
Is that what you were asking for? I am running the analysis on Windows, in case you were wondering.

fit.csv

You should be able to run all of these analyses yourself with the datasets we provide in protti. For this case we use the rapamycin_dose_response data set from this publication.

The rapamycin data set is a good one for benchmarking, as in this experiment we expect very few off-target hits or hits that are difficult to explain (secondary effects). We also reduced the size of the original data set by only including some random proteins and the target.

You could create an artificial data set with known ground truth using the protti function create_synthetic_data() with method = "dose_response". (This might be the cleanest solution to your problem/question.)
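
To illustrate the idea of a synthetic data set with a known EC50, here is a minimal Python sketch. It does not call or reproduce protti's create_synthetic_data(); it only simulates noisy responses from a four-parameter log-logistic curve (the parameterisation that drc calls LL.4) and then recovers the EC50 with a coarse grid search, which stands in for drc's optimiser. All parameter values, the dose range, the noise level, and the function names are assumptions made for this example.

```python
import math
import random


def ll4(dose, hill, lower, upper, ec50):
    """Four-parameter log-logistic curve: lower + (upper - lower) / (1 + exp(hill * (log(dose) - log(ec50))))."""
    return lower + (upper - lower) / (
        1.0 + math.exp(hill * (math.log(dose) - math.log(ec50)))
    )


# Assumed ground-truth parameters (arbitrary units, illustration only).
TRUE_EC50 = 1e6
HILL, LOWER, UPPER = -1.5, 0.0, 100.0  # negative hill gives an increasing curve
NOISE_SD = 2.0

rng = random.Random(42)  # fixed seed so every platform draws the same noise
doses = [10.0 ** k for k in range(2, 9)]  # 1e2 ... 1e8
synthetic = [
    (dose, ll4(dose, HILL, LOWER, UPPER, TRUE_EC50) + rng.gauss(0.0, NOISE_SD))
    for dose in doses
    for _ in range(3)  # three replicates per dose
]


def sse(ec50):
    """Sum of squared errors for a candidate EC50, other parameters held fixed."""
    return sum(
        (resp - ll4(dose, HILL, LOWER, UPPER, ec50)) ** 2 for dose, resp in synthetic
    )


# Coarse grid over log10(EC50) from 4 to 8 in steps of 0.01.
candidates = [10.0 ** (4.0 + 0.01 * i) for i in range(401)]
estimated_ec50 = min(candidates, key=sse)

print(f"true EC50 = {TRUE_EC50:.2e}, estimated EC50 = {estimated_ec50:.2e}")
```

Since the true EC50 is known by construction, any fitting pipeline can be scored by how far its estimate lands from that value, independently of the operating system it runs on.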

You could otherwise also try this data set here: https://www.ebi.ac.uk/pride/archive/projects/PXD038768
You should find some code for the analysis with protti, as well as all other relevant files in there.
In this experiment we were identifying calmodulin binding sites on the retinal CNG channel with LiP-MS. This data set is different from the other one - here we are studying protein-protein interactions instead of protein-drug interactions.
As calmodulin also binds to other proteins, we also identify other proteins as targets, some of which we cannot explain.

I hope this helps!

All the best,
Dina

@annaquaglieri16
Author

Thanks a lot for this thorough reply Dina! I really appreciate it.

I'm going to use the protti internal data for my benchmarking tests for the moment. The main issue I noticed with the drc package is that the results, especially the EC50, can differ slightly depending on the operating system on which the package is run. Usually it's a matter of rounding, but it can be tricky to get good reproducibility.

These good hits from your dataset are really helpful as I can at least check that the dose-response curve makes sense for the benchmarking data.

Thanks a lot again!

Anna

@annaquaglieri16
Author

Hi Dina,

I'm re-opening this issue with a quick question about the rapamycin_dose_response data.

Can you confirm that the rapamycin_dose_response data in the protti package is the same data used to produce the dose-response curves in Fig 1d of Piazza 2020?

I'm asking because, in the protti analysis workflow, the example dose-response curve for VFDVELLKLE.2 uses data that reach a dose concentration of 10^8

[Screenshot 2023-10-30 at 12 52 56 pm]

but in Fig 1d the dose only reaches 10^4.

[Screenshot 2023-10-30 at 12 53 13 pm]

Thanks for your help!

Anna

@jpquast
Owner

jpquast commented Oct 31, 2023

Hi Anna,
The answer is that the unit is different. As you can see, the protti plot is in pM while the plot in the paper is in nM. We chose pM so that none of the concentration points have decimal places, but technically it should also work with decimal concentrations.

The highest concentration in the paper is 10^4 which is 1e+05 nM while ours is 1e+08 pM, which is also 1e+05 nM.
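
The pM to nM conversion behind this comparison can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `pm_to_nm` and `nm_to_pm` are hypothetical and not part of protti.

```python
PM_PER_NM = 1_000  # 1 nM = 1,000 pM


def pm_to_nm(conc_pm):
    """Convert a concentration from picomolar to nanomolar."""
    return conc_pm / PM_PER_NM


def nm_to_pm(conc_nm):
    """Convert a concentration from nanomolar to picomolar."""
    return conc_nm * PM_PER_NM


highest_protti_pm = 1e08  # highest concentration on the protti plot, in pM
print(pm_to_nm(highest_protti_pm))  # 100000.0, i.e. 1e+05 nM
```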

So the plot should be the same between paper and protti, but the dataset rapamycin_dose_response only contains a subset of the whole dataset. It contains all peptides from FKBP1A (the target protein) and 39 more randomly sampled proteins from the dataset including their peptides. The reason is the dataset size.
