Dose response benchmarking data #208

annaquaglieri16 · 2023-08-30T07:45:00Z

Hi there,

Thanks a lot for developing the protti package and for putting together the R workflow in this document: https://jpquast.github.io/protti/articles/data_analysis_dose_response_workflow.html

I'm interested in the dose-response work and I am running your package (and therefore the drc package and estimates) on different operating systems in production on our web platform.

One thing I noticed is that, because of the optimisation with optim, the model estimates from drc can differ slightly, which makes it hard for me to benchmark results. I'm therefore looking for a ground-truth dataset against which I can compare results and make sure I always get what I expect. Are you able to share a csv with the results that you show in the all hits table (top rows below), so that I can compare the actual EC50 values in your table, without the approximation, with what I obtain? I noticed that the estimate of the EC50 can vary quite a bit between OSs.

rank	score	eg_precursor_id	pg_protein_accessions	anova_adj_pval	correlation	ec_50
1	0.919	VFDVELLKLE.2	P62942	3.98e-13	0.967	3.6e+06
2	0.914	RGQTC[Carbamidomethyl (C)]VVHYTGMLEDGK.3	P62942	4.73e-14	0.947	3.0e+05
3	0.888	GWEEGVAQMSVGQR.2	P62942	1.08e-13	0.947	4.7e+05

Also, do you have other benchmarking data for which the dose-response curves and effects are known and that users could use for benchmarking?

Thanks a lot for this!

Anna

dschust-r · 2023-09-04T13:21:00Z

Dear Anna,

Here is a csv file of the fit table (not filtered for only hits, and without the plot_points and plot_curve columns).
Is that what you were asking for? I am running the analysis on Windows (in case you were wondering).

fit.csv

You should be able to run all of these analyses yourself with the datasets we provide in protti. For this case we use the rapamycin_dose_response data set from this publication.

The rapamycin data set is a good one for benchmarking, as in this experiment we expect very few off-target hits or hits that are difficult to explain (secondary effects). We also reduced the size of the original data set by only including some random proteins and the target.

You could create an artificial data set with known ground truth using the protti function create_synthetic_data() with method = "dose_response". (This might be the cleanest solution to your problem/question.)
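
A minimal sketch of what such a ground-truth data set could look like, written in plain Python rather than with protti. It simulates replicate intensities from a four-parameter log-logistic curve (the form drc calls LL.4), so the true EC50 is known by construction. The function names, parameter values and noise level are illustrative assumptions, and this is not how create_synthetic_data() is implemented.

```python
import math
import random


def ll4(dose, b, c, d, e):
    """Four-parameter log-logistic curve (LL.4 parameterisation).

    b: slope, c: lower limit, d: upper limit, e: EC50. Requires dose > 0.
    """
    return c + (d - c) / (1.0 + math.exp(b * (math.log(dose) - math.log(e))))


def simulate_precursor(doses, b, c, d, e, sd, n_replicates, seed):
    """Return (dose, replicate, intensity) rows with Gaussian noise added."""
    rng = random.Random(seed)  # seeded so reruns draw the same noise
    rows = []
    for dose in doses:
        for replicate in range(1, n_replicates + 1):
            intensity = ll4(dose, b, c, d, e) + rng.gauss(0.0, sd)
            rows.append((dose, replicate, intensity))
    return rows


# Hypothetical settings: doses in pM, true EC50 of 1e6 pM, increasing curve (b < 0).
doses = [10.0 ** k for k in range(2, 9)]
rows = simulate_precursor(doses, b=-1.0, c=10.0, d=20.0, e=1e6,
                          sd=0.2, n_replicates=3, seed=1)
print(len(rows), "simulated measurements")
```

Fitting rows like these on each operating system and comparing the estimated EC50 with the value used in the simulation gives a check that does not depend on any reference machine.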

You could otherwise also try this data set here: https://www.ebi.ac.uk/pride/archive/projects/PXD038768
You should find some code for the analysis with protti, as well as all other relevant files in there.
In this experiment we were identifying calmodulin binding sites on the retinal CNG channel with LiP-MS. This data set is different from the other one - here we are studying protein-protein interactions instead of protein-drug interactions.
As calmodulin also binds to other proteins, we also identify other proteins as targets, some of which we cannot explain.

I hope this helps!

All the best,
Dina

annaquaglieri16 · 2023-09-05T14:52:22Z

Thanks a lot for this thorough reply Dina! I really appreciate it.

I'm going to use the protti internal data for my benchmarking tests for the moment. The main issue that I noticed with the drc package is that the results, especially the EC50, can differ slightly depending on the operating system on which the package is run. Usually it's a matter of rounding, but it can be tricky to get good reproducibility.
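
One way to make such a benchmark robust to small numerical differences is to compare EC50 values with a relative tolerance instead of testing for exact equality. The sketch below is only an illustration: the reference values are the rounded ec_50 entries from the table at the top of this issue, while the "observed" values, the function name and the 5% tolerance are hypothetical.

```python
import math


def compare_ec50(reference, observed, rel_tol=0.05):
    """Return {id: (reference, observed)} for EC50 values outside the tolerance.

    An id missing from `observed` is reported with observed = None.
    """
    mismatches = {}
    for precursor_id, ref in reference.items():
        obs = observed.get(precursor_id)
        if obs is None or not math.isclose(obs, ref, rel_tol=rel_tol):
            mismatches[precursor_id] = (ref, obs)
    return mismatches


# Rounded reference values copied from the table in the first comment.
reference = {
    "VFDVELLKLE.2": 3.6e6,
    "GWEEGVAQMSVGQR.2": 4.7e5,
}
# Hypothetical values, standing in for a fit obtained on another OS.
observed = {
    "VFDVELLKLE.2": 3.61e6,
    "GWEEGVAQMSVGQR.2": 6.0e5,
}
print(compare_ec50(reference, observed))
```

The tolerance has to be chosen per use case; the rounded table values above only carry two significant digits, so a very tight tolerance would flag rounding rather than real differences.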

These good hits from your dataset are really helpful as I can at least check that the dose-response curve makes sense for the benchmarking data.

Thanks a lot again!

Anna

annaquaglieri16 · 2023-10-30T01:54:24Z

Hi Dina,

I'm re-opening this issue with a quick question about the rapamycin_dose_response data.

Can you confirm that the rapamycin_dose_response data in the protti package is the same data used to produce the dose-response curves in Fig 1d of Piazza 2020?

I'm asking because, in the protti analysis workflow, I see that the example dose-response curve for VFDVELLKLE.2 uses data that reach a dose concentration of 10^8, but in Fig 1d the dose only reaches 10^4.

Thanks for your help!

Anna

jpquast · 2023-10-31T07:49:58Z

Hi Anna,
The answer is that the unit is different. As you can see, the protti plot is in pM while the plot in the paper is in nM. We chose to use pM so that none of the concentration points have decimal places. But technically it should also work with decimal concentrations.

The highest concentration in the paper is 10^4 which is 1e+05 nM while ours is 1e+08 pM, which is also 1e+05 nM.
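
The unit conversion can be checked with a short sketch (the helper name is made up for illustration). Since 1 nM equals 1000 pM, converting from pM to nM shifts every point by three units on a log10 axis.

```python
import math


def pm_to_nm(concentration_pm):
    """Convert a concentration from picomolar to nanomolar (1 nM = 1000 pM)."""
    return concentration_pm / 1000.0


highest_pm = 1e8  # highest concentration in the protti data set, in pM
highest_nm = pm_to_nm(highest_pm)

print(highest_nm)                                        # 100000.0
print(math.log10(highest_pm) - math.log10(highest_nm))   # 3.0
```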

So the plot should be the same between the paper and protti, but the rapamycin_dose_response dataset only contains a subset of the whole dataset. It contains all peptides from FKBP1A (the target protein) and 39 more randomly sampled proteins from the dataset, including their peptides. The reason for subsetting is the size of the full dataset.

dschust-r closed this as completed Sep 27, 2023
