LFQ MBR FDR algorithm needed. #303

Open
3 tasks
ypriverol opened this issue Oct 13, 2023 · 8 comments
Labels: enhancement, help wanted, high-priority

Comments

@ypriverol
Member

ypriverol commented Oct 13, 2023

Description of the Feature

During the benchmark of quantms using LFQ and MBR (issues #300, #301, #287), we developed a new probabilistic algorithm based on an SVM that controls the number of false positives better than the previous ProteomicsLFQ algorithm (which is based on the number of samples in which the feature is found).

However, although the current algorithm produces more reliable results (issues #301, #287), we should aim for a better FDR control algorithm in ProteomicsLFQ that uses only one parameter. In addition, it would be great to improve the algorithm and feature detection. From my point of view, these are the priorities for that algorithm:

  • Implement an FDR-based approach for MBR, reducing the number of parameters.
  • Improve the feature detection, including the possibility of transferring features across any msrun in the experiment. I think OpenMS only transfers features across samples in the same condition, whereas MQ uses all msruns in the experiment, which may be the source of the differences between the tools.
  • Implement MBR for TMT datasets, similar to the following manuscript: https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00209
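
To make the first priority concrete, here is a minimal sketch of a target-decoy filter whose only user-facing parameter is the FDR threshold. The function names, the `max_fdr` parameter, and the assumption that every transferred feature carries a score and a decoy flag are mine for illustration; this is not ProteomicsLFQ code.

```python
from typing import List


def transfer_q_values(scores: List[float], is_decoy: List[bool]) -> List[float]:
    """Target-decoy q-values for transferred features (higher score = better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if we accepted everything down to this score.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR reachable at this score or any lower cutoff.
    for k in range(len(fdrs) - 2, -1, -1):
        fdrs[k] = min(fdrs[k], fdrs[k + 1])
    q = [0.0] * len(scores)
    for rank, i in enumerate(order):
        q[i] = fdrs[rank]
    return q


def accept_transfers(scores: List[float], is_decoy: List[bool],
                     max_fdr: float = 0.01) -> List[int]:
    """Indices of target transfers kept at the requested FDR (the one parameter)."""
    q = transfer_q_values(scores, is_decoy)
    return [i for i in range(len(scores)) if not is_decoy[i] and q[i] <= max_fdr]
```

With this shape, the user would only choose `max_fdr`; everything else (how the score is produced, how decoy transfers are generated) stays internal.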

We can discuss the details @timosachsenberg @jpfeuffer @daichengxin.

Command used and terminal output

No response

Relevant files

No response

System information

No response

@ypriverol added the bug label Oct 13, 2023
@ypriverol added enhancement, help wanted, high-priority, release 1.3 and removed bug labels Oct 13, 2023
@timosachsenberg

timosachsenberg commented Oct 13, 2023

I think it should transfer between all files of the same fraction number already.
I think our settings are a bit conservative to not inflate the transfer FDR too much. A more data driven approach would be great here.

E.g., I could imagine that we could

  • determine the most similar runs (e.g., as in MapAlignerTreeGuided)
  • train a classifier on identified target and decoy (mass offset) features to model correct transfer and wrong transfer (to an offset feature)
  • use the classifier in FeatureLinkerUnlabeledQT to annotate linking p-values
  • figure out how to filter those to attain a global transfer FDR
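
A rough sketch of the last two steps, assuming the classifier already gives one score per candidate link and that scores of the mass-offset decoy links can serve as a null distribution. The empirical p-value and the Benjamini-Hochberg filter are choices swapped in for illustration, not something OpenMS does today, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List


def empirical_p_values(target_scores: List[float],
                       decoy_scores: List[float]) -> List[float]:
    """p = (1 + number of decoy scores >= s) / (1 + number of decoys)."""
    decoys = sorted(decoy_scores)
    n = len(decoys)
    pvals = []
    for s in target_scores:
        n_ge = n - bisect_left(decoys, s)  # decoys scoring at least as high
        pvals.append((1 + n_ge) / (1 + n))
    return pvals


def benjamini_hochberg(pvals: List[float], alpha: float) -> List[int]:
    """Indices of links accepted at a global FDR of alpha (BH step-up)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank passing its threshold
    return sorted(order[:cutoff_rank])
```

Note that Benjamini-Hochberg assumes valid and roughly independent p-values, which would need checking for linked features; a plain target-decoy count would be the alternative.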

@jpfeuffer and @cbielow what do you think?

More scalable alternatives would be approaches like IonQuant or Sage.

@ypriverol
Member Author

> I think it should transfer between all files of the same fraction number already. I think our settings are a bit conservative to not inflate the transfer FDR too much. A more data driven approach would be great here.

I'm probably wrong, but MQ does not care much about fraction identifiers; it also transfers across fractions. My guess is based on the assumption that MQ does not know which raw file belongs to which fraction.

@cbielow

cbielow commented Oct 13, 2023

> I'm probably wrong, but MQ does not care much about fraction identifiers; it also transfers across fractions. My guess is based on the assumption that MQ does not know which raw file belongs to which fraction.

Actually, MQ only transfers IDs across fractions that are at most 1 fraction apart. Hence you also have to tell MQ about the fraction number in the experimental design.
Of course, if you simply "forget" to annotate fractions in MQ, then it will transfer whatever it can across all runs (and incur a massive false positive rate...)
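
As a small illustration of that rule (as described here, not checked against MaxQuant's code), the sketch below lists which run pairs would be eligible for transfer. The function name, the `max_gap` parameter, and the use of `None` for an unannotated fraction are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def transfer_pairs(run_fraction: Dict[str, Optional[int]],
                   max_gap: int = 1) -> List[Tuple[str, str]]:
    """Run pairs between which IDs may be transferred."""
    pairs = []
    for a, b in combinations(sorted(run_fraction), 2):
        fa, fb = run_fraction[a], run_fraction[b]
        # Without fraction annotation nothing restricts the transfer,
        # so every pair involving such a run is eligible.
        if fa is None or fb is None or abs(fa - fb) <= max_gap:
            pairs.append((a, b))
    return pairs
```

Setting `max_gap=0` would correspond to transferring only between files of the same fraction number.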

@ypriverol
Member Author

This is interesting, @cbielow. Nice discussion. I have seen a lot of experiments that do not provide fraction information. Does the FDR algorithm of MQ correct for that, @cbielow, or will the FDR be inflated? (If that is the case, do you have a paper reference or some data to show that?)

@cbielow

cbielow commented Oct 13, 2023

I only have very old data (and I would need to dig a lot to find it) and anecdotal evidence.

There is https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7346880/, which does not discuss fractions but shows that the MQ FDR is not kept at bay unless you enable the MQ LFQ algorithm.

There is also a discussion on the MQ mailing list on this: https://groups.google.com/g/maxquant-list/c/a9bZMUeSE7Y/m/J6Rw174oCAAJ
Even in newer MQ versions, the XML config still has <matchBetweenRunsFdr>False</matchBetweenRunsFdr> by default, with no way of enabling it in the GUI, and it's hard to find any documentation on the topic. So it seems MQ is not very confident about this and disables it.
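
For anyone who wants to check their own parameter file, a minimal read-only sketch. Only the `matchBetweenRunsFdr` element name comes from the config quoted above; the function name and the surrounding XML in the usage example are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def mbr_fdr_enabled(mqpar_xml: str) -> Optional[bool]:
    """Return the matchBetweenRunsFdr setting, or None if it is absent."""
    root = ET.fromstring(mqpar_xml)
    # Search the whole tree so the exact nesting does not matter.
    node = next(root.iter("matchBetweenRunsFdr"), None)
    if node is None or node.text is None:
        return None
    return node.text.strip().lower() == "true"


# Hypothetical, heavily shortened parameter file for demonstration only.
example = (
    "<MaxQuantParams>"
    "<matchBetweenRunsFdr>False</matchBetweenRunsFdr>"
    "</MaxQuantParams>"
)
print(mbr_fdr_enabled(example))
```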

There is also https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8131922/ which describes an FDR method, but which can be augmented with more data to make it better IMHO. The paper also uses MQ 1.6 which is rather old.

@jpfeuffer
Collaborator

jpfeuffer commented Oct 13, 2023

Good ideas. The problem with the last approach is that it is very costly with our current data structures.
I think we would need a binned and indexed representation of an experiment to make this viable (see flashlfq or sage).
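
To illustrate what I mean by binned and indexed, a toy sketch of an m/z-binned peak index that allows looking up a single feature region on demand. The class name, the bin width, and the `(mz, rt, intensity)` tuple layout are assumptions, and this is not how FlashLFQ or Sage implement it internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[float, float, float]  # (mz, rt, intensity)


class BinnedPeakIndex:
    """Groups peaks into fixed-width m/z bins for fast window queries."""

    def __init__(self, peaks: Iterable[Peak], bin_width: float = 0.01):
        self.bin_width = bin_width
        self.bins: Dict[int, List[Peak]] = defaultdict(list)
        for p in peaks:
            self.bins[self._bin(p[0])].append(p)

    def _bin(self, mz: float) -> int:
        return int(mz / self.bin_width)

    def query(self, mz: float, ppm: float,
              rt_min: float, rt_max: float) -> List[Peak]:
        """Peaks within +/- ppm of mz and inside the RT window."""
        tol = mz * ppm * 1e-6
        lo, hi = mz - tol, mz + tol
        hits = []
        # Only the few bins overlapping the m/z window are touched.
        for b in range(self._bin(lo), self._bin(hi) + 1):
            for p in self.bins.get(b, ()):
                if lo <= p[0] <= hi and rt_min <= p[1] <= rt_max:
                    hits.append(p)
        return hits
```

The point is that a transfer candidate costs a handful of bin lookups instead of a pass over the whole run.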

And I think we might need to dissect the FFID API to be able to extract single features on demand. Currently it is very focussed on processing a full set of predefined IDs.

@jpfeuffer
Collaborator

Not saying it can't be done. @timosachsenberg and I were just thinking about potentially faster or easier-to-implement ways.

@timosachsenberg

Btw, it is interesting that the LFQ algorithm (I did not look into the details, but I think it is MaxLFQ) seems to correct for some wrong linking. It can probably be seen as a robust summarization method.
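
A toy example of why a median-based summary could absorb a wrong link, assuming the algorithm in question summarizes peptide ratios between runs by their median (MaxLFQ does use median pairwise peptide ratios, but this sketch shows only that one ingredient). The peptide names and intensities are made up.

```python
import math
from statistics import mean, median
from typing import Dict, List


def log2_ratios(run_a: Dict[str, float], run_b: Dict[str, float]) -> List[float]:
    """log2(B/A) for every peptide quantified in both runs."""
    shared = sorted(set(run_a) & set(run_b))
    return [math.log2(run_b[p] / run_a[p]) for p in shared]


# Made-up intensities: P3 in run B is a wrongly linked (transferred) feature.
run_a = {"P1": 100.0, "P2": 200.0, "P3": 400.0}
run_b = {"P1": 200.0, "P2": 400.0, "P3": 51200.0}

ratios = log2_ratios(run_a, run_b)
print("median log2 ratio:", median(ratios))  # barely affected by the outlier
print("mean log2 ratio:  ", mean(ratios))    # pulled up by the wrong link
```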

4 participants