usi supported as id #26

Closed
ypriverol opened this issue Oct 26, 2022 · 7 comments

Comments

@ypriverol

@percolator @MatthewThe

I'm trying to use MaRaCluster to cluster billions of spectra. Because the spectra come from multiple files, we would like to use the USI (https://www.nature.com/articles/s41592-021-01184-6) as the spectrum identifier in the MGF, and then have MaRaCluster report that USI in its output instead of the spectrum index.

This is what a USI looks like in an MGF:

BEGIN IONS
TITLE=id=mzspec:PXD001924:20140106_52_mlplus_tm3:index:10371,sequence=KWDLGDIVAAR/2
PEPMASS=622.344787597656
CHARGE=2.0+
   595.570	2949.085
   645.291	527.688
   369.346	276.108
   276.277	94.888
  1059.633	35.399
   525.239	212.923
   621.109	8.745
   185.405	132.694
   609.847	2439.161
   800.585	620.666
  1104.713	49.088
   924.684	24.219
   388.354	269.499
   668.601	448.725
   451.616	441.908
   191.218	12.017
   824.108	1919.122
  1083.648	38.183
   444.346	111.252
   744.399	1488.798
  1028.557	160.071
   782.752	429.679
   458.326	405.505
   775.522	269.681
   760.675	420.993
   347.277	213.037
   190.150	55.043
   499.230	1344.165
   509.365	933.466
  1045.479	50.976
   405.413	178.342
   891.637	108.553
   642.355	436.559
   518.248	287.477
   837.390	624.936
   447.333	761.800
   474.377	386.830
   702.498	1712.772
   520.684	696.569
   783.498	811.340
   311.261	85.109
   911.617	517.818
   588.286	3341.009

The id for this spectrum would be mzspec:PXD001924:20140106_52_mlplus_tm3:index:10371.
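As a rough illustration of how the id relates to the TITLE line in the example above, the following Python sketch pulls the USI out of a title. It assumes the title always has the shape `id=<USI>,sequence=<peptide>/<charge>` seen in that single example; the function names are hypothetical and not part of MaRaCluster.

```python
import re

# Assumed TITLE layout, inferred only from the example spectrum:
#   id=<USI>,sequence=<peptide>/<charge>
TITLE_RE = re.compile(r"^id=(?P<usi>mzspec:[^,]+)(?:,.*)?$")


def usi_from_title(title):
    """Return the USI embedded in a title, or None if the layout differs."""
    match = TITLE_RE.match(title.strip())
    return match.group("usi") if match else None


def usis_from_mgf_lines(lines):
    """Collect the USI of every TITLE= line in an iterable of MGF lines."""
    usis = []
    for line in lines:
        if line.startswith("TITLE="):
            usi = usi_from_title(line[len("TITLE="):])
            if usi is not None:
                usis.append(usi)
    return usis
```

Titles that do not start with `id=mzspec:` are skipped rather than guessed at, so a file with mixed title styles would need its own handling.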

Do you think you can support this in MaraCluster?

@MatthewThe
Contributor

Thanks for the suggestion!

It would indeed be a nice addition to propagate spectrum identifiers to the output format.
I think it shouldn't be too difficult; I will have a look.

@MatthewThe
Contributor

Hi Yasset,

I have started working on this. Would it be fine for you if we just return the entire title, e.g. id=mzspec:PXD001924:20140106_52_mlplus_tm3:index:10371,sequence=KWDLGDIVAAR/2?
Implementation-wise, this is a bit easier.

@ypriverol
Author

go for it.

@MatthewThe
Contributor

I added a command line argument --addSpecIds, which now adds the spectrum id/title as a fourth column to the clustering output. I will create a new release once all the builds have passed.
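For readers who want the USIs back per cluster, here is a minimal Python sketch of post-processing output produced with `--addSpecIds`. Only the fourth column (spectrum id/title) is stated in this thread; that the output is tab-separated, that the cluster id sits in the third column, and that blank lines may separate clusters are assumptions to check against your own output. The function names are hypothetical.

```python
import csv
from collections import defaultdict


def title_to_usi(title):
    """Strip the assumed 'id=' prefix and ',sequence=...' suffix from a title."""
    first_field = title.split(",", 1)[0]
    if first_field.startswith("id="):
        return first_field[len("id="):]
    return title  # fall back to the raw title if the layout differs


def group_by_cluster(lines):
    """Map cluster id -> list of spectrum ids from the assumed 4-column TSV."""
    clusters = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank lines and rows without a spectrum id
        # Assumption: column 3 is the cluster id, column 4 the title.
        cluster_id, title = row[2], row[3]
        clusters[cluster_id].append(title_to_usi(title))
    return dict(clusters)
```

Passing an open file handle as `lines` should work, since `csv.reader` accepts any iterable of strings.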

@ypriverol
Author

@MatthewThe @percolator let me know when the release is done.

@MatthewThe
Contributor

Sorry for the delay. I have now released version 1.04, which includes the new feature.

@ypriverol
Author

Thanks, I will give it a try!
