feat: new sets of id for reactions and metabolites #174
Comments
Is the plan to eventually assign an |
Yes @JonathanRob, this plan aims toward assigning every Given that the whole process may take some time, it might be better to begin with a simple implementation, just replacing |
Seeing that this change has been in (slow) progress for some time, I'm wondering when it would be a good time to make this change in the entire model. I think the merging of PR #203 would make it easier to implement the adoption of new identifiers. |
@mihai-sysbio honestly, I look forward to a quick adoption across the entire model. |
I can see some of the benefits of this transition, but I'm having a hard time spotting downsides besides the obvious backwards compatibility, which can be preserved by moving previous identifiers to the annotation. What other downsides are there? |
After discussing with @mihai-sysbio and @JonathanRob, the following consensus was reached regarding this transition:
Any comments/suggestions to this proposed naming convention are welcome. |
The letters |
@mihai-sysbio I agree to take away the underscore. For now, it's probably better to ignore the special naming for pseudoreactions. I'm not certain about the exact usage of the compartment id, which can be implemented more slowly than the other two. |
Due to the breaking of backwards compatibility from this transition, its implementation should be well-planned, i.e. probably associating with a major release. |
The current implementation of Here is a proposal to systematically assign an
|
@Hao-Chalmers From your comment above, I thought we had agreed to
|
@JonathanRob yes, the proposal was updated as agreed. |
Thinking about these IDs, they appear to be encouraging a cross-model approach (ie the same identifier in multiple models). However, this creates the difficulty of maintaining consistency of these IDs between models, which shouldn't be taken lightly. |
Specifically about the ID format, I would encourage the separation of the |
IMHO, the cross-model approach (using the same identifiers in multiple models) would actually help maintain consistency between models, whereas creating IDs within the scope of a single model makes the mapping a difficult task. |
It seems we've had a consensus on the MA rxn id format as |
I've had another look on identifiers.org at BiGG reactions, VMH reactions, MNX reactions and MetaCyc reactions. It seems okay, and very much in line with MNX, to follow the format |
@mihai-sysbio what would be the usage of separating an id into domains? |
To me, separating the domain space makes for an easier recognition of what the letters mean, especially down the line with |
The advantage of an additional dot is not so clear, because the ids are mainly recognised by code. In addition, this extra delimiter may cause unexpected effects, since these ids could be processed by various packages that treat ids with different regular expression patterns ( |
The other option is to completely skip the |
It seems KEGG has already taken this R00001 format. |
Yes, to avoid confusion with other database IDs, I would keep the As for the period ( |
Dear all, Thanks for bringing this up. Having a model-specific prefix to avoid confusion and conflicts between models built from the same sources sounds like a very good idea. Regarding the special characters: since the "." is interpreted in regex and sh commands and can add confusion when present in a URI, I guess it might have more inconvenient side effects than _ or nothing. Sure, it can easily be escaped, but it doesn't seem that necessary. An alphabetic prefix followed by a digit-only suffix might already be enough to separate the two, if that is ever needed. |
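To illustrate the regex concern raised above, here is a minimal Python sketch. The id strings `MA.R00001` and `MAXR00001` are hypothetical and chosen only to show the effect; they are not the format agreed in this thread.

```python
import re

# Hypothetical ids, used only to illustrate the delimiter issue.
dotted_id = "MA.R00001"
lookalike = "MAXR00001"  # differs only at the position of the dot

# Unescaped, "." is a wildcard, so the id used as a pattern
# also matches the lookalike string.
unescaped_hit = re.fullmatch(dotted_id, lookalike) is not None

# Escaping the id makes the dot literal again.
escaped_hit = re.fullmatch(re.escape(dotted_id), lookalike) is not None
exact_hit = re.fullmatch(re.escape(dotted_id), dotted_id) is not None

print(unescaped_hit, escaped_hit, exact_hit)  # True False True
```

Any tool that builds patterns from ids would need to escape them consistently, which is the extra burden a "." delimiter introduces.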
Thank you all for the valuable input. For the looks I still prefer Let's go for The separation is meant to increase legibility and to avoid potential confusion (if we imagine a contrived example of a website called |
Okay, we end up with the format of |
Similarly, I would propose |
It does appear to be reasonable to standardize MA met ids as such. To make a convenient transition from the previous HMR format, I would recommend continuing to append the compartment id as a suffix: |
I see that for reactions there is no such last letter
|
@mihai-sysbio having compartment id as suffix is a common practice for naming met ids, such as in BiGG (ATP). Another advantage is to retain the consistency and inheritance (at least to some extent) with HMR ids as below.
|
True! |
Implementation of MA met ids (cdd63f3):
@mihai-sysbio @JonathanRob what do you think about this plan? |
@mihai-sysbio I just wanted to comment on your note that compartment abbreviations are used in metabolite IDs but not reaction IDs. Ideally, the reaction and metabolite IDs should not be embedded with compartment information because it implicitly enforces a specific set of compartment abbreviations (i.e., "s" is extracellular, which differs from the COBRA style of using "e" for extracellular). In practice, however, it's more complicated. I think it makes sense to use different IDs for the same reaction in different compartments (rather than using the same base ID with different compartment abbreviations) because some of the reaction properties may differ. For example, the same reaction may be catalyzed by a different enzyme depending on the compartment, meaning that the gene-reaction rule will differ between the two cases. Similarly, the same reaction taking place in a different species will be catalyzed by a different enzyme (unless it's spontaneous). Finally, the environment of a different compartment (or species) may be such that the reaction reversibility is affected, so the associated lower and/or upper bounds would be affected. For a metabolite, however, it is the same compound with the same annotations, properties, and associations regardless of what compartment or species it's in. It may change its protonation state due to a pH difference, but then it's treated as a different metabolite with a different ID. So unlike a reaction, a metabolite can be treated as (effectively) identical everywhere it exists. In fact, a metabolite should have the same exact ID even in different compartments, but since GEM standards require that all metabolites in a GEM must have a unique ID, we are forced to differentiate them somehow. 
So this is my long way of saying that I'm not really happy with the idea of embedding the compartment abbreviation within the metabolite ID, but to me it still seems (very, very slightly) better than using a completely different ID for different compartment versions of a metabolite. |
Thank you both for the ideas and explanations. In Metabolic Atlas we do distinguish between a Metabolite and a CompartmentalizedMetabolite. The ID format suggested by @Hao-Chalmers would then map directly to the CompartmentalizedMetabolite. I'm still thinking of ways to associate the two. |
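One way to associate the two entity types without parsing identifiers is an explicit link stored on each compartmentalized record. The sketch below is only an illustration: the class fields, the ids (`EX0001`, `EX0001c`, ...) and the helper `group_by_metabolite` are all hypothetical and do not reflect the actual Metabolic Atlas schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Metabolite:
    id: str
    name: str


@dataclass(frozen=True)
class CompartmentalizedMetabolite:
    id: str             # id including the compartment suffix
    metabolite_id: str  # explicit link to the parent Metabolite
    compartment: str


# Hypothetical records; none of these ids come from the model.
metabolites = {"EX0001": Metabolite("EX0001", "example compound A")}
records = [
    CompartmentalizedMetabolite("EX0001c", "EX0001", "c"),
    CompartmentalizedMetabolite("EX0001s", "EX0001", "s"),
    CompartmentalizedMetabolite("EX0002c", "EX0002", "c"),
]


def group_by_metabolite(items):
    """Group compartmentalized ids under their parent metabolite id."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.metabolite_id].append(item.id)
    return dict(grouped)


by_metabolite = group_by_metabolite(records)
print(by_metabolite["EX0001"])  # ['EX0001c', 'EX0001s']
```

Because the link is stored as data, the grouping does not depend on the id format or on any particular set of compartment abbreviations.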
@JonathanRob nice comments, and very good considerations for identifier formatting, which has been and will be a long-term process. On the other hand, to make short-term progress, my feeling is that we've reached a consensus on the proposed plan. Am I right? @mihai-sysbio |
@Hao-Chalmers indeed! I was wondering if it would make sense to have another column in the |
@mihai-sysbio in |
@Hao-Chalmers is the plan to place these new IDs in:
|
for now, let's focus on adding |
I see, so the |
@Hao-Chalmers what are your thoughts on another column |
My thoughts are:
|
The requirement is for a person to link to all compartmentalized metabolites. This cannot be done at the moment, and it would need a metabolite id, similar to
I'm not sure what this means.
Parsing/regex on identifiers is not a good practice, even if it's easy.
Totally agree. Let's then implement this in 2 stages - only |
This means that the consensus to |
I believe this issue can be closed. |
Description of the issue:
Human-GEM as a template for generic usage
humanGEMRxnAssoc.JSON, without affecting the model
Planned implementation:
rxnMAID to humanGEMRxnAssoc.JSON
reactions.tsv
rxnMAID will be the same as those in rxns, except that the prefix HMR_ will be systematically replaced with MAR (abbreviation of Metabolic Atlas reactions)
metMAID to metabolites.tsv
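The prefix substitution described in the planned implementation could be sketched as follows. This is a minimal illustration, not the repository's actual script: the function name `to_rxn_maid` and the example ids are hypothetical, and ids without the `HMR_` prefix are assumed to pass through unchanged.

```python
OLD_PREFIX = "HMR_"
NEW_PREFIX = "MAR"


def to_rxn_maid(rxn_id: str) -> str:
    """Replace a leading HMR_ with MAR; leave other ids untouched (assumption)."""
    if rxn_id.startswith(OLD_PREFIX):
        return NEW_PREFIX + rxn_id[len(OLD_PREFIX):]
    return rxn_id


# Hypothetical reaction ids for illustration only.
example_rxns = ["HMR_0001", "HMR_1234", "EX_example"]
rxn_maids = [to_rxn_maid(r) for r in example_rxns]
print(rxn_maids)  # ['MAR0001', 'MAR1234', 'EX_example']
```

Only a leading prefix is replaced, so an id that merely contains `HMR_` somewhere in the middle would not be altered.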
I hereby confirm that I have:
master
branch of the repository