feat: new sets of id for reactions and metabolites #174

haowang-bioinfo · 2020-06-01T19:13:41Z

Description of the issue:

Here a new set of reaction id is proposed, to promote the application of Human-GEM as a template for generic usage
This new set of rxn ids will be added as a new field to the annotation file humanGEMRxnAssoc.JSON, without affecting the model

Planned implementation:

Append a new field rxnMAID to ~~humanGEMRxnAssoc.JSON~~reactions.tsv
The element ids in rxnMAID will be the same as those in rxns, except that the prefix HMR_ will be systematically replaced with MAR (abbreviation of Metabolic Atlas reactions)
Append a new field metMAID to metabolites.tsv

I hereby confirm that I have:

Done this analysis in the master branch of the repository
Checked that a similar issue does not exist already

The text was updated successfully, but these errors were encountered:

JonathanRob · 2020-06-02T06:54:46Z

Is the plan to eventually assign an MA_ ID for non-HMR_ reactions as well?

haowang-bioinfo · 2020-06-02T07:10:32Z

Yes @JonathanRob, this plan aims toward assigning every rxns id with a unique MA_ identifier eventually.

Given that the whole process may take some time, it might be better to begin with a simple implementation, just replacing HMR_ with MA_ at first.

mihai-sysbio · 2020-11-10T15:07:28Z

Seeing that this change has been in (slow) progress for some time, I'm wondering when it would be a good time to make this change in the entire model. I think the merging of PR #203 would make it easier to implement the adoption of new identifiers.
Another question outside the scope of this issue is if metabolites should also use MA_ identifiers.

haowang-bioinfo · 2020-11-10T15:10:22Z

@mihai-sysbio honestly, myself look forward to a quick adaption to entire model.

mihai-sysbio · 2020-11-11T15:47:35Z

I can see some of the benefits of this transition, but I'm having a hard time spotting downsides besides the obvious backwards compatibility, which can be preserved by moving previous identifiers to the annotation. What other downsides are there?

haowang-bioinfo · 2020-11-25T09:12:07Z

After discussing with @mihai-sysbio and @JonathanRob, the following consensus was reached regarding this transition:

Expand the scope of MA ids from reactions to metabolites, and probably also compartments
The MA id is formatted as concatenation of two parts: prefix letters and suffix digits
The prefix are: MAR for reactions, MAM for metabolites, MAC for compartments
The suffix digits are numbers, with fixed width, in sorted (or random?) order

Any comments/suggestions to this proposed naming convention are welcome.

mihai-sysbio · 2020-11-25T09:16:57Z

The letters M and R are in line with the BiGG recommendations, which is good. I think it's preferable to not have any _ in the identifier though. Should we ignore the special naming scheme for pseudoreactions?
Are there any thoughts of the compartmentalized metabolites, which is the pairing of a compartment and a metabolite?

haowang-bioinfo · 2020-11-25T09:24:20Z

@mihai-sysbio agree to take away underscore _.

Now it's probably better to ignore the special naming for pseudoreactions.

I'm not certain about the exact usage of compartment id, which can be implemented "slowly" than the other two.

haowang-bioinfo · 2020-12-16T08:45:11Z

Due to the breaking of backwards compatibility from this transition, its implementation should be well-planned, i.e. probably associating with a major release.

haowang-bioinfo · 2021-02-06T09:03:18Z

The current implementation of rxnMAID is incomplete - not all reactions have such a id. This caused confusion during the transition and alredy impeded the downstream development in Metabolic Atlas.

Here is a proposal to systemically assign an rxnMAID for each reaction in Human-GEM:

Format change: from MA_DDDDD to MARDDDDD, where D represents one digit
Retain the existing ids and follow the new format (replacing underscore with R)
Add values to empty ones by picking ids in the range from MAR00001 to MAR09999
An easy implementation is to make an orderly filling and avoid duplication

JonathanRob · 2021-02-06T09:39:26Z

@Hao-Chalmers From you comment above, I thought we had agreed to MARDDDDD for reactions?

The prefix are: MAR for reactions, MAM for metabolites, MAC for compartments

haowang-bioinfo · 2021-02-06T09:42:47Z

@JonathanRob yes, the proposal was updated as agreed.

mihai-sysbio · 2021-02-06T10:01:51Z

Thinking about these IDs, they appear to be encouraging a cross-model approach (ie the same identifier in multiple models). However, this creates the difficulty of maintaining consistency of these IDs between models, which shouldn't be taken lightly.
Another approach would be to create IDs within the scope of a model (ie MA:Human-GEM.R0001). The downside here is that mapping across models is more difficult.
I have some thoughts on how we can ensure the consistency so that we can pursue the first option with confidence, but I would value a discussion with you before starting anything.

mihai-sysbio · 2021-02-06T10:04:28Z

Specifically about the ID format, I would encourage the separation of the MA part from the rest of the ID. Also, having looked around on identifiers.org, the _ separator seems a bit uncommon. My preference at the moment is a ., which would result in something like MA.R00001.

haowang-bioinfo · 2021-02-13T09:51:56Z

IMHO, the cross-model approach (using the same identifiers in multiple models) would actually facilitate in maintaining consistency between models. While the approach of creating IDs within the scope of a model does make the mapping a difficult task.

haowang-bioinfo · 2021-02-18T10:55:40Z

It seems we've had a consensus on the MA rxn id format as MARDDDDD. I plan to implement as such. @mihai-sysbio @JonathanRob what do you think?

mihai-sysbio · 2021-02-22T08:59:27Z

I've had another look on identifiers.org at BiGG reactions, VMH reactions, MNX reactions and MetaCyc reactions. It seems okay, and very much in line with MNX, to follow the format MARDDDDD proposed above by @Hao-Chalmers, even though my preference is still to separate the domain space like MA.RDDDDD.

haowang-bioinfo · 2021-02-22T09:41:32Z

@mihai-sysbio what would be the usage of separating an id into domains?

mihai-sysbio · 2021-02-22T09:46:57Z

To me, separating the domain space makes for an easier recognition of what the letters mean, especially down the line with MAM for metabolites and who knows what else (say MAG for gene reaction rules or MAC for complexes).

haowang-bioinfo · 2021-02-22T10:39:39Z

The advantage of an additional dot is not so clear, because they are mainly recognised by code. In addition, this additional delimiter may cause unexpected effect since these ids could be processed by various packages that treat ids with different regular expression patterns (. is a special character).

mihai-sysbio · 2021-02-22T14:55:44Z

The other option is to completely skip the MA letters in the identifier name, since identifiers.org has a dedicated prefix. The result would be something like https://identifiers.org/metabolic.atlas:R00001 and other website would refer to it as Metabolic Atlas ID: R00001.

haowang-bioinfo · 2021-02-22T15:06:51Z

it seems KEGG took this R00001 format already.

JonathanRob · 2021-02-22T17:01:13Z

Yes, to avoid confusion with other database IDs, I would keep the MA prefix within the identifier.

As for the period (MA.R vs. MAR), I think that it makes it more clear by separating the MA prefix from the rest of the identifier, but to be honest I don't have a strong preference either way. I don't believe it should cause parsing issues to have the . present, since many ID types already include special characters.

cfrainay · 2021-02-23T10:50:53Z

Dear all,

Thanks for bringing this up, having a model-specific prefix to avoid confusion and conflicts between models built from same sources sounds like a very good idea.

Regarding the special characters, since the "." is interpreted in regex and sh commands and can add confusion when present in a URI, I guess it might have more inconvenient side effects than _ or nothing.

Sure, it can easily be escaped, but doesn't seem that necessary. The alphabetic characters prefix followed by digit only suffix might already be enough to separate both, if that is ever needed.

mihai-sysbio · 2021-02-28T20:40:43Z

Thank you all for the valuable input. For the looks I still prefer . but even though, like @JonathanRob said, with proper escaping it shouldn't be a problem, let's remove the possibility of situations as described by @cfrainay.

Let's go for MA-R0001.

The separation is meant increased legibility and to avoid potential confusion (if we imagine a contrived example of a website called Metabolic Atlas Reactants then R in a reaction with id MAR0001 can have double meaning: is it Reactants or reactions?).

haowang-bioinfo · 2021-03-02T12:59:19Z

Okay, we end up with the format of MA-RDDDDD, where D represents a digit.

mihai-sysbio · 2021-04-06T09:43:19Z

Similarly, I would propose MA-MDDDDD for the metabolite IDs.

haowang-bioinfo · 2021-04-06T10:24:42Z

It does appear to be reasonable to standardize MA met ids as such.

To make a convenient transition from the previous HMR format, I would recommend to continue appending compartment id as suffix: MA-MDDDDDC, where C represent the abbreviation of corresponding compartment. @mihai-sysbio @JonathanRob what do you think?

mihai-sysbio · 2021-04-06T10:47:42Z

appending compartment id as suffix: MA-MDDDDDC, where C represent the abbreviation of corresponding compartment

I see that for reactions there is no such last letter C for the implemented reaction IDs:

rxns	rxnKEGGID	rxnBiGGID	rxnEHMNID	rxnHepatoNET1ID	rxnREACTOMEID	rxnRecon3DID	rxnMetaNetXID	rxnHMR2ID	rxnRatconID	rxnTCDBID	spontaneous	rxnMAID
"FAOXC101C8m"	""	"FAOXC101C8m"	""	""	""	"FAOXC101C8m"	""	""	""	""	0	"MA-R04967"
"FAOXC101C8x"	""	"FAOXC101C8x"	""	""	""	"FAOXC101C8x"	""	""	""	""	0	"MA-R04968"

haowang-bioinfo · 2021-04-06T12:50:06Z

@mihai-sysbio having compartment id as suffix is a common practice for naming met ids, such as in BiGG (ATP). Another advantage is to retain the consistency and inheritance (at least to some extent) with HMR ids as below.

rxns	rxnMAID
"HMR_1967"	"MA-R01967"

mets	metMAID
"m01967c"	"MA-M01967c"

mihai-sysbio · 2021-04-06T15:24:23Z

True!
The advantage of the other approach is to enable a section like Present in compartments as here https://identifiers.org/vmhmetabolite:h2o .

haowang-bioinfo · 2021-04-11T10:01:28Z

Implementation of MA met ids (cdd63f3):

Convert HMR ids to MA format as proposed (e.g. from m01967c to MA-M01967c).
Replace non-HMR ids with unused values in the range of MA-M00001 to MA-M09999, appending with compartment id.
Ensure an ordered filling and all ids are unique.

@mihai-sysbio @JonathanRob what do you think about this plan?

JonathanRob · 2021-04-11T14:18:37Z

@mihai-sysbio I just wanted to comment on your note that compartment abbreviations are used in metabolite IDs but not reaction IDs.

Ideally, the reaction and metabolite IDs should not be embedded with compartment information because it implicitly enforces a specific set of compartment abbreviations (i.e., "s" is extracellular, which differs from the COBRA style of using "e" for extracellular). In practice, however, it's more complicated.

I think it makes sense to use different IDs for the same reaction in different compartments (rather than using the same base ID with different compartment abbreviations) because some of the reaction properties may differ. For example, the same reaction may be catalyzed by a different enzyme depending on the compartment, meaning that the gene-reaction rule will differ between the two cases. Similarly, the same reaction taking place in a different species will be catalyzed by a different enzyme (unless it's spontaneous). Finally, the environment of a different compartment (or species) may be such that the reaction reversibility is affected, so the associated lower and/or upper bounds would be affected.

For a metabolite, however, it is the same compound with the same annotations, properties, and associations regardless of what compartment or species it's in. It may change its protonation state due to a pH difference, but then it's treated as a different metabolite with a different ID. So unlike a reaction, a metabolite can be treated as (effectively) identical everywhere it exists. In fact, a metabolite should have the same exact ID even in different compartments, but since GEM standards require that all metabolites in a GEM must have a unique ID, we are forced to differentiate them somehow.

So this is my long way of saying that I'm not really happy with the idea of embedding the compartment abbreviation within the metabolite ID, but to me it still seems (very, very slightly) better than using a completely different ID for different compartment versions of a metabolite.

mihai-sysbio · 2021-04-11T14:45:01Z

Thank you both for the ideas and explanations.

In Metabolic Atlas we do distinguish between a Metabolite and a CompartmentalizedMetabolite. The ID format suggested by @Hao-Chalmers would then map directly to the CompartmentalizedMetabolite. I'm still thinking of ways to associate the two.

haowang-bioinfo · 2021-04-12T06:53:56Z

@JonathanRob nice comments that are very good considerations in identifier formatting, which has been and will be a long-term process.

On the other hand, to have a short-term progress, my feeling is we've reached a consensus with the proposed plan. Am I right? @mihai-sysbio

mihai-sysbio · 2021-04-12T07:00:12Z

a consensus with proposed plan

@Hao-Chalmers indeed!

I was wondering if it would make sense to have another column in the metabolites.tsv that maps to the Metabolite, so that each CompartmentalizedMetabolite row is obviously mapped to the Metabolite. In our case, the compartmentalized ID will be with the compartment letter as you said @Hao-Chalmers (even though it is non-standard as @JonathanRob mentioned), and the non-compartmentalized one will simply be missing that letter. It looks really redundant, but it's also very clear and would require no custom regex parsing.

haowang-bioinfo · 2021-04-12T07:07:52Z

if it would make sense to have another column with non-compartmentalized ids that will simply be missing that compartment letter

@mihai-sysbio in metabolites.tsv there is an existing column "metsNoComp" for serving this purpose.

mihai-sysbio · 2021-04-12T08:12:32Z

@Hao-Chalmers is the plan to place these new IDs in:

mets and metsNoComp columns, or
just the metMAID column, or
metMAID and metNoCompMAID columns?

haowang-bioinfo · 2021-04-12T08:22:09Z

for now, let's focus on adding metMAID column first.

mihai-sysbio · 2021-04-12T08:26:28Z

I see, so the metsNoComp then isn't directly useful for the purpose of mapping non-compartmentalized MA ids.

mihai-sysbio · 2021-04-14T08:23:19Z

@Hao-Chalmers what are your thoughts on another column metsNoCompMAID?

haowang-bioinfo · 2021-04-14T08:36:26Z

@Hao-Chalmers what are your thoughts on another column metsNoCompMAID?

My thoughts are:

avoid implementations for uncertain requirements (Occam's Razor)
prioritize to certain things
metsNoCompMAID is super easy to obtain (just trimming the last character)
metsNoCompMAIDwould end up as identical column to metsNoComp once MA ids are used

mihai-sysbio · 2021-04-14T08:49:07Z

avoid implementations for uncertain requirements (Occam's Razor)

The requirement is for a person to link to all compartmentalized metabolites. This cannot be done at the moment, and it would need a metabolite id, similar to metsNoComp.

prioritize to certain things

I'm not sure what this means.

metsNoCompMAID is super easy to obtain (just trimming the last character)

Parsing/regex on identifiers is not a good practice, even if it's easy.

metsNoCompMAID would end up as identical column to metsNoComp once MA ids are used

Totally agree. Let's then implement this in 2 stages - only mets now, and automatically obtain metsNoComp later.

haowang-bioinfo · 2021-04-14T08:53:45Z

prioritize to certain things

I'm not sure what this means.

This means that the consensus to metMAID is certain and can be prioritized.

mihai-sysbio · 2021-04-20T09:29:12Z

I believe this issue can be closed.

haowang-bioinfo added the discussion label Jun 1, 2020

haowang-bioinfo self-assigned this Jun 1, 2020

haowang-bioinfo added the enhancement label Jun 2, 2020

haowang-bioinfo mentioned this issue Jun 2, 2020

feat: add rxnMAIDfield to JSON file #175

Merged

2 tasks

haowang-bioinfo mentioned this issue Jun 11, 2020

Human 1.4 #180

Merged

haowang-bioinfo added the wip label Jul 29, 2020

JonathanRob mentioned this issue Dec 17, 2020

Fix/duplicated components #213

Merged

2 tasks

haowang-bioinfo mentioned this issue Feb 22, 2021

feat: complete MAR ids by filling blank fields #234

Merged

1 task

haowang-bioinfo closed this as completed Mar 3, 2021

haowang-bioinfo reopened this Mar 30, 2021

haowang-bioinfo changed the title ~~feat: a new set of id for reactions~~ feat: new sets of id for reactions and metabolites Mar 30, 2021

haowang-bioinfo mentioned this issue Apr 12, 2021

feat: add MA met ids to metabolites.tsv #245

Merged

2 tasks

haowang-bioinfo mentioned this issue Apr 16, 2021

Human 1.7 #247

Merged

haowang-bioinfo closed this as completed Apr 20, 2021

haowang-bioinfo mentioned this issue Jun 11, 2021

feat: replace HMR ids with MA ids in Human-GEM #265

Closed

2 tasks

feat: new sets of id for reactions and metabolites #174

feat: new sets of id for reactions and metabolites #174

Comments

haowang-bioinfo commented Jun 1, 2020 • edited Loading

Description of the issue:

Planned implementation:

JonathanRob commented Jun 2, 2020

haowang-bioinfo commented Jun 2, 2020 • edited Loading

mihai-sysbio commented Nov 10, 2020

haowang-bioinfo commented Nov 10, 2020

mihai-sysbio commented Nov 11, 2020

haowang-bioinfo commented Nov 25, 2020 • edited Loading

mihai-sysbio commented Nov 25, 2020 • edited Loading

haowang-bioinfo commented Nov 25, 2020 • edited Loading

haowang-bioinfo commented Dec 16, 2020

haowang-bioinfo commented Feb 6, 2021 • edited Loading

JonathanRob commented Feb 6, 2021

haowang-bioinfo commented Feb 6, 2021

mihai-sysbio commented Feb 6, 2021 • edited Loading

mihai-sysbio commented Feb 6, 2021

haowang-bioinfo commented Feb 13, 2021 • edited Loading

haowang-bioinfo commented Feb 18, 2021

mihai-sysbio commented Feb 22, 2021

haowang-bioinfo commented Feb 22, 2021

mihai-sysbio commented Feb 22, 2021

haowang-bioinfo commented Feb 22, 2021 • edited Loading

mihai-sysbio commented Feb 22, 2021

haowang-bioinfo commented Feb 22, 2021

JonathanRob commented Feb 22, 2021

cfrainay commented Feb 23, 2021

mihai-sysbio commented Feb 28, 2021 • edited Loading

haowang-bioinfo commented Mar 2, 2021 • edited Loading

mihai-sysbio commented Apr 6, 2021

haowang-bioinfo commented Apr 6, 2021

mihai-sysbio commented Apr 6, 2021

haowang-bioinfo commented Apr 6, 2021 • edited Loading

mihai-sysbio commented Apr 6, 2021

haowang-bioinfo commented Apr 11, 2021 • edited Loading

JonathanRob commented Apr 11, 2021

mihai-sysbio commented Apr 11, 2021 • edited Loading

haowang-bioinfo commented Apr 12, 2021 • edited Loading

mihai-sysbio commented Apr 12, 2021

haowang-bioinfo commented Apr 12, 2021 • edited Loading

mihai-sysbio commented Apr 12, 2021

haowang-bioinfo commented Apr 12, 2021 • edited Loading

mihai-sysbio commented Apr 12, 2021

mihai-sysbio commented Apr 14, 2021

haowang-bioinfo commented Apr 14, 2021

mihai-sysbio commented Apr 14, 2021

haowang-bioinfo commented Apr 14, 2021

mihai-sysbio commented Apr 20, 2021

haowang-bioinfo commented Jun 1, 2020 •

edited

Loading

haowang-bioinfo commented Jun 2, 2020 •

edited

Loading

haowang-bioinfo commented Nov 25, 2020 •

edited

Loading

mihai-sysbio commented Nov 25, 2020 •

edited

Loading

haowang-bioinfo commented Nov 25, 2020 •

edited

Loading

haowang-bioinfo commented Feb 6, 2021 •

edited

Loading

mihai-sysbio commented Feb 6, 2021 •

edited

Loading

haowang-bioinfo commented Feb 13, 2021 •

edited

Loading

haowang-bioinfo commented Feb 22, 2021 •

edited

Loading

mihai-sysbio commented Feb 28, 2021 •

edited

Loading

haowang-bioinfo commented Mar 2, 2021 •

edited

Loading

haowang-bioinfo commented Apr 6, 2021 •

edited

Loading

haowang-bioinfo commented Apr 11, 2021 •

edited

Loading

mihai-sysbio commented Apr 11, 2021 •

edited

Loading

haowang-bioinfo commented Apr 12, 2021 •

edited

Loading

haowang-bioinfo commented Apr 12, 2021 •

edited

Loading

haowang-bioinfo commented Apr 12, 2021 •

edited

Loading