Valid row/colnames after reading mzTab #501

Closed
sneumann opened this issue Apr 13, 2020 · 3 comments

@sneumann
Contributor

Unusual bug report: when reading mzTab as in https://github.com/lgatto/MSnbase/blob/master/tests/testthat/test_MzTab.R#L9, I am getting syntactically valid R names (not the original mzTab names), e.g.

> colnames(xx@SmallMolecules)
 [1] "SML_ID"                                "SMF_ID_REFS"                          
 [3] "database_identifier"                   "chemical_formula"                     
...
[17] "abundance_assay.4."                    "abundance_assay.5."                   
[19] "abundance_assay.6."                    "abundance_study_variable.1."          
[21] "abundance_variation_study_variable.1." "abundance_study_variable.2."          
[23] "abundance_variation_study_variable.2." "opt_global_Progenesis_identifier"     

> length(xx@SmallMolecules[,"abundance_assay.3."])  ## Works with [-operator
[1] 55

> length(xx@SmallMolecules$abundance_assay.3.)      ## Also works with $-operator
[1] 55

Which means I can't use them as-is when writing mzTab back out to a file.
I'd have to do some string-fu and search-and-replace the numbers enclosed in '.',
and even that could go wrong under some conditions.
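
The search-and-replace described above could look roughly like the following sketch. `restore_mztab_names` is a hypothetical helper, not part of MSnbase, and it assumes that every `.N.` run in a column name came from an original `[N]`. The last lines show one case where that assumption breaks.

```r
# Hypothetical helper (not part of MSnbase): turn ".N." back into "[N]".
# Assumes every dot-digits-dot run originated from a bracketed index.
restore_mztab_names <- function(x) {
  gsub("\\.([0-9]+)\\.", "[\\1]", x)
}

mangled <- c("SML_ID", "abundance_assay.3.", "abundance_study_variable.1.")
restored <- restore_mztab_names(mangled)
print(restored)

# The mapping is not reversible in general: make.names() turns both of
# these originals into the same name, so the helper cannot tell which
# one was in the file.
ambiguous <- make.names(c("opt_global_ratio[1]", "opt_global_ratio.1."))
print(ambiguous)
```

The `opt_global_ratio` names are made up for illustration; any optional column that already contains a literal `.N.` would collide in the same way.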

OTOH the metadata names are fine as far as mzTab is concerned (but actually illegal in R):

> names(xx@Metadata)[1:6]
[1] "mzTab-ID"                   "software[1]"                "ms_run[1]-location"        
[4] "ms_run[1]-scan_polarity[1]" "ms_run[1]-format"           "ms_run[1]-id_format"  ...

> xx@Metadata$'assays[1]'   ## Can't access a list element by illegal name with $-operator:
NULL

> xx@Metadata[['assay[1]']] ## Need to access a list element by [[-operator:
[1] "3injections_inj1_POS"

=> I'd like to propose to accept and use the illegal row/col names when it comes to mzTab,
and deal with it in the downstream code where that happens.

Objections ?

Yours, Steffen

@lgatto
Owner

lgatto commented Apr 13, 2020

The naming in mzTab is asking for trouble - that [1]... really?

The reason the names in the metadata are the way they are (illegal) is that they don't stem from a data frame; they are cast to a list and reshaped, which preserves their illegality.

How would you go about using illegal column names?

@sneumann
Contributor Author

As shown above, the [[ operator works; it needs quoting anyway: xx@Metadata[['assay[1]']]
What doesn't work is the $ operator, even with quoting. In the current MSnbase I couldn't find an example where the "fixed" colnames are required when creating MSnbase objects.
What do the assay names look like if I load an example file? Or are they just overwritten
with the correct names from the MTD section? Yours, Steffen

@lgatto
Owner

lgatto commented Apr 13, 2020

You would need to use backticks for the irregular names:

> xx@Metadata$`assay[1]`
[1] "3injections_inj1_POS"
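
A minimal sketch of the access forms, using a toy list as a stand-in for `xx@Metadata` (the `mzTab-ID` value is made up). In base R the parser also accepts a quoted string after `$`, so on a plain list a `NULL` result more likely points to a name that does not exist than to the quoting style; that is an observation about base R lists, not something verified against the object in this issue.

```r
# Toy stand-in for xx@Metadata: a plain list with non-syntactic names.
md <- list("mzTab-ID" = "ID1", "assay[1]" = "3injections_inj1_POS")

by_backtick <- md$`assay[1]`     # backticks quote a non-syntactic name
by_brackets <- md[["assay[1]"]]  # [[ takes the name as a string
by_string   <- md$"assay[1]"     # the parser also accepts a string after $

# A name that is not in the list yields NULL without any warning.
absent <- md$`assays[1]`

print(c(by_backtick, by_brackets, by_string))
print(is.null(absent))
```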

The reason the colnames are "fixed" for the SmallMolecules (as well as Proteins, Peptides, PSMs) is that these are tabular data created by read.delim and adjusted by make.names:

> make.names("assay[1]")
[1] "assay.1."

It would be possible to disable this by setting check.names = FALSE, which would return:

> xx <- MzTab(fl) ## updated code
> xx
Object of class "MzTab".
 Description:
 Mode: 
 Type: 
 Available data: SmallMolecules  
> names(xx@SmallMolecules)
 [1] "SML_ID"                               
 [2] "SMF_ID_REFS"                          
 [3] "database_identifier"                  
 [4] "chemical_formula"                     
 [5] "smiles"                               
 [6] "inchi"                                
 [7] "chemical_name"                        
 [8] "uri"                                  
 [9] "theoretical_neutral_mass"             
[10] "adduct_ions"                          
[11] "reliability"                          
[12] "best_id_confidence_measure"           
[13] "best_id_confidence_value"             
[14] "abundance_assay[1]"                   
[15] "abundance_assay[2]"                   
[16] "abundance_assay[3]"                   
[17] "abundance_assay[4]"                   
[18] "abundance_assay[5]"                   
[19] "abundance_assay[6]"                   
[20] "abundance_study_variable[1]"          
[21] "abundance_variation_study_variable[1]"
[22] "abundance_study_variable[2]"          
[23] "abundance_variation_study_variable[2]"
[24] "opt_global_Progenesis_identifier"     

As above, the $ operator would need to be used with backticks.
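
The effect of `check.names` can be reproduced outside MSnbase with a small sketch. The table below is toy data, and this is not necessarily how `MzTab()` itself was changed; it only uses base R's `read.delim` and `write.table`.

```r
# Toy mzTab-like table; the values are made up.
tsv <- paste(
  "SML_ID\tabundance_assay[1]\tabundance_assay[2]",
  "1\t10.5\t11.2",
  "2\t20.1\t19.8",
  sep = "\n"
)

fixed <- read.delim(text = tsv)                       # default: make.names() applied
raw   <- read.delim(text = tsv, check.names = FALSE)  # header kept verbatim

print(names(fixed))
print(names(raw))

# Non-syntactic columns need backticks with $ (or a string with [[ / [).
first_assay <- raw$`abundance_assay[1]`

# Writing back with quote = FALSE keeps the bracketed header untouched.
out <- capture.output(
  write.table(raw, sep = "\t", quote = FALSE, row.names = FALSE)
)
print(out[1])
```

With the original names preserved on read, no reverse mapping is needed when the table is written back out.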

Is that what you are looking for to make it easier to write back MzTab data?

sneumann added a commit to sneumann/MSnbase that referenced this issue Apr 13, 2020
@lgatto lgatto closed this as completed in 7924063 Apr 13, 2020