Fix/duplicated components #213

Merged
merged 25 commits into from
Dec 17, 2020

@JonathanRob (Collaborator) commented Dec 1, 2020

Main improvements in this PR:

This PR primarily addresses Issue #184, which identified several metabolites with identical names except for a difference in capitalization.
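As an illustration of the problem described in Issue #184, the following hypothetical Python sketch shows one way to flag metabolites whose names are identical apart from capitalization. It is not the tooling used in this PR. It assumes metabolites are available as (ID, name, compartment) tuples, and all IDs in the example are made up. Grouping is done per compartment, because the same name legitimately appears once in every compartment a metabolite occupies.

```python
from collections import defaultdict


def find_case_duplicates(mets):
    """Return groups of metabolite IDs whose names match only when case is ignored.

    mets: iterable of (met_id, name, compartment) tuples (an assumed input shape).
    """
    groups = defaultdict(list)
    for met_id, name, comp in mets:
        # casefold() is the str method intended for caseless comparison
        groups[(name.casefold(), comp)].append((met_id, name))
    duplicates = []
    for members in groups.values():
        exact_names = {name for _, name in members}
        # Flag only groups containing at least two differently capitalized names
        if len(exact_names) > 1:
            duplicates.append(sorted(met_id for met_id, _ in members))
    return sorted(duplicates)


# Hypothetical example data, not taken from Human-GEM
mets = [
    ("metA_c", "Maltohexaose", "c"),
    ("metB_c", "maltohexaose", "c"),
    ("metA_s", "Maltohexaose", "s"),
    ("metC_c", "Glucose", "c"),
]
print(find_case_duplicates(mets))  # [['metA_c', 'metB_c']]
```

Groups where every member has exactly the same name are deliberately not reported, since those are not the case-only duplicates the issue describes.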

Each set of duplicated metabolites was merged, and the duplicate entry (or entries) removed. Since merging a metabolite set often caused one or more reactions to become identical, all reactions involving the affected metabolites were manually inspected to ensure that the metabolites were properly replaced and that any newly duplicated reactions were removed. Each duplicate metabolite set was addressed in a separate commit, to make clear which changes to the GEM resulted from each merger. In addition, several reaction and metabolite annotations (i.e., external identifiers) were updated during the process.
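The merge step above, and the way it can make reactions identical, can be sketched in Python as follows. This is a hypothetical aid for spotting candidates, not a replacement for the manual inspection described in the PR. It assumes each reaction is a plain dict of metabolite ID to stoichiometric coefficient (not any specific toolbox's model structure), and all IDs are made up.

```python
from collections import defaultdict


def merge_metabolite(reactions, duplicate_id, kept_id):
    """Replace duplicate_id with kept_id in every reaction.

    reactions: dict of reaction ID -> {met_id: coefficient} (assumed shape).
    Coefficients are summed if a reaction already contained kept_id, and
    metabolites whose net coefficient becomes zero are dropped.
    """
    merged = {}
    for rxn_id, stoich in reactions.items():
        new_stoich = defaultdict(float)
        for met_id, coef in stoich.items():
            new_stoich[kept_id if met_id == duplicate_id else met_id] += coef
        merged[rxn_id] = {m: c for m, c in new_stoich.items() if c != 0}
    return merged


def find_identical_reactions(reactions):
    """Group reaction IDs that have exactly the same stoichiometry."""
    groups = defaultdict(list)
    for rxn_id, stoich in reactions.items():
        groups[frozenset(stoich.items())].append(rxn_id)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


# Hypothetical reactions: R1 and R2 differ only by the duplicated metabolite
reactions = {
    "R1": {"metA_c": -1, "glcA_c": 6},
    "R2": {"metB_c": -1, "glcA_c": 6},
    "R3": {"metA_c": -1, "metB_c": 1},
}
merged = merge_metabolite(reactions, "metB_c", "metA_c")
print(find_identical_reactions(merged))  # [['R1', 'R2']]
```

Note that R3, which only interconverted the two duplicates, ends up with an empty stoichiometry and is another candidate for removal. The sketch also does not detect reactions that are identical but written in the opposite direction; those would still need a manual check.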

Additional changes:

The "others" reaction and metabolite were removed from Human-GEM (addresses Issue #197), along with any metabolites that occurred solely in that reaction. This was a pool reaction originating from HMR2 that no longer served any foreseeable purpose and could not carry flux anyway (dead-end).
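Removing a reaction together with the metabolites that occurred only in it can be sketched as below. This is a hypothetical Python illustration, assuming the same dict-of-dicts reaction representation and a set of metabolite IDs; the reaction and metabolite IDs are invented and do not reflect the actual contents of the "others" reaction.

```python
def remove_reaction(reactions, mets, rxn_id):
    """Remove rxn_id and any metabolites that occurred only in that reaction.

    reactions: dict of reaction ID -> {met_id: coefficient} (assumed shape)
    mets: set of metabolite IDs in the model
    Returns new (reactions, mets) without mutating the inputs.
    """
    remaining = {r: s for r, s in reactions.items() if r != rxn_id}
    # Every metabolite still referenced by at least one remaining reaction
    still_used = {m for stoich in remaining.values() for m in stoich}
    removed_rxn_mets = set(reactions.get(rxn_id, {}))
    # Only drop metabolites from the removed reaction that nothing else uses
    orphans = removed_rxn_mets - still_used
    return remaining, mets - orphans


# Hypothetical pool reaction and model contents
reactions = {
    "poolRxn": {"poolX_c": -1, "metA_c": -1, "pool_c": 1},
    "R1": {"metA_c": -1, "metB_c": 1},
}
mets = {"poolX_c", "metA_c", "pool_c", "metB_c"}
rxns2, mets2 = remove_reaction(reactions, mets, "poolRxn")
print(sorted(rxns2), sorted(mets2))  # ['R1'] ['metA_c', 'metB_c']
```

Here metA_c survives because R1 still uses it, while poolX_c and pool_c are dropped because they appeared only in the removed reaction.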

The GEM annotation files (reactions.tsv, metabolites.tsv, and genes.tsv) were moved from the data/annotation/ subdirectory to the model/ directory for convenience (suggested in comments on PR #212). All functions and documentation relying on the location of these files were also updated.

I hereby confirm that I have:

  • Tested my code on my own computer for running the model
  • Selected devel as a target branch

@@ -451,7 +451,7 @@ mets metsNoComp metBiGGID metKEGGID metHMDBID metChEBIID metPubChemID metLipidMa
 "m00242s" "m00242" "" "C11088" "" "" "" "" "" "" "M00242" "MNXM6026" "m00242s"
 "m00243c" "m00243" "" "C06205" "" "CHEBI:28516" "" "" "" "" "M00243" "MNXM114239" "m00243c"
 "m00244c" "m00244" "" "C14784" "" "CHEBI:34048" "" "" "" "" "M00244" "MNXM9529" "m00244c"
-"m00245c" "m00245" "" "C15606" "" "CHEBI:49252" "" "" "" "" "M00245" "MNXM494" "m00245c"
+"m00245c" "m00245" "dhmtp" "C15606" "" "CHEBI:49252" "" "" "" "" "M00245;dhmtp_c" "MNXM494" "m00245c"

Member

I couldn't find metabolite dhmtp_c in Recon3D.

@@ -999,7 +999,7 @@ mets metsNoComp metBiGGID metKEGGID metHMDBID metChEBIID metPubChemID metLipidMa
 "m00602r" "m00602" "" "" "" "" "53481513" "" "CE3554" "" "CE3554" "MNXM35505" "m00602r"
 "m00603c" "m00603" "" "C05497" "" "" "" "" "" "" "M00603" "MNXM3472" "m00603c"
 "m00604c" "m00604" "" "C13713" "HMDB00879" "CHEBI:805752" "101771" "" "CE5072" "" "CE5072" "MNXM8277" "m00604c"
-"m00605c" "m00605" "" "C05485" "" "CHEBI:28043" "" "LMST02030167" "" "" "M00605" "MNXM4305" "m00605c"
+"m00605c" "m00605" "21hprgnlone" "C05485" "" "CHEBI:28043" "" "LMST02030167" "" "" "M00605;21hprgnlone_c" "MNXM735991" "m00605c"

@mihai-sysbio (Member), Dec 1, 2020

Same here with this other Recon3D metabolite ID. I've seen the same issue in most of the other commits, but I haven't checked them all.

@JonathanRob (Collaborator Author), Dec 1, 2020

@mihai-sysbio This and the other ID are present in the Recon3D model (and "reconstruction"). The problem is that VMH strips off the compartment abbreviation (_c). The original aim of these Recon3D fields was to help maintain compatibility with the Recon3D GEM. However, since these Recon3D IDs are supposed to link to VMH, maybe we should consider stripping off the compartment abbreviation.

Member

I see. How likely is it that someone needs to map to Recon3D specifically, rather than to VMH? If the Recon3D IDs are needed, I would suggest renaming the current column to VMH and adding a new Recon3D column, which would not be expected to resolve to a valid link.

Collaborator Author

I think it's probably better to remove the compartment abbreviations and have them properly link to VMH. If someone is interested in re-mapping to the Recon3D model, they could simply append the compartment abbreviation to all the "Recon3D" (VMH) IDs.
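The re-mapping described above could look roughly like the following hypothetical Python sketch. It assumes that a Human-GEM metabolite ID ends in a one-letter compartment (e.g. m00245c, as in the diff above) and that Recon3D IDs carry an "_<comp>" suffix (e.g. dhmtp_c). The compartment translation table is an assumption, not something stated in this PR, and should be verified against both models before use.

```python
# ASSUMPTION: Human-GEM and Recon3D use different letters for some compartments
# (extracellular s -> e, peroxisome p -> x); unlisted letters are assumed identical.
HUMANGEM_TO_RECON3D_COMP = {"s": "e", "p": "x"}


def to_recon3d_id(vmh_id, humangem_met_id):
    """Rebuild a compartment-specific Recon3D ID from a compartment-free VMH ID.

    Assumes humangem_met_id ends in a one-letter compartment abbreviation.
    """
    comp = humangem_met_id[-1]
    return f"{vmh_id}_{HUMANGEM_TO_RECON3D_COMP.get(comp, comp)}"


print(to_recon3d_id("dhmtp", "m00245c"))  # dhmtp_c
```

Keeping the compartment translation in a single table makes it easy to correct if the assumed letter mapping turns out to differ.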

@JonathanRob (Collaborator Author)

Based on @mihai-sysbio's good suggestion, I have now removed the compartment abbreviations from all metRecon3DID identifiers in the metabolites.tsv annotation file. This should fix a lot of currently broken links to VMH.
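Stripping the compartment suffix from a field such as "M00245;dhmtp_c" (as seen in the diff above) can be sketched with a regular expression. This is a hypothetical Python illustration, not the change actually committed; the set of one-letter Recon3D compartment abbreviations is an assumption, and only a trailing "_<letter>" from that set is removed so that other underscores in an ID are left alone.

```python
import re

# ASSUMPTION: one-letter Recon3D compartment abbreviations
RECON3D_COMPS = "cegilmnrx"
_COMP_SUFFIX = re.compile(rf"_[{RECON3D_COMPS}]$")


def strip_compartment(field, sep=";"):
    """Strip a trailing '_<comp>' suffix from each ID in a separator-joined field."""
    if not field:
        return field
    return sep.join(_COMP_SUFFIX.sub("", part) for part in field.split(sep))


print(strip_compartment("M00245;dhmtp_c"))  # M00245;dhmtp
print(strip_compartment("21hprgnlone_c"))   # 21hprgnlone
```

In practice this would be applied to every value of the metRecon3DID column in metabolites.tsv; IDs ending in an underscore plus a letter outside the assumed set are left unchanged.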

@mihai-sysbio self-requested a review December 1, 2020 18:31
- Provide high confidence scores for reactions HMR_5387, HMR_5389, HMR_8798 that had been curated with new literature
@haowang-bioinfo (Member)

@JonathanRob very nice work! Fixing the duplicated metabolites also involved merging duplicated reactions, including the following 4 metabolite replacements:

maltohexaose: malthx was replaced with m02447s
maltopentaose: maltpt was replaced with m02449s
maltotetraose: maltttr was replaced with m02451
thromboxane B2: txb2 was replaced with m02995

To be consistent with the above replacements, how about also renaming the corresponding exchange reactions as follows:

from EX_malthx[e] to EX_m02447[e]
from EX_maltpt[e] to EX_m02449[e]
from EX_maltttr[e] to EX_m02451[e]
from EX_txb2[e] to EX_m02995c[e]

@JonathanRob (Collaborator Author)

Thank you, @Hao-Chalmers.

...how about also renaming the corresponding exchange reactions

Since we are planning to completely rename the reaction IDs anyway (#174), I don't think it is worth renaming these reactions at this point. Renaming them now would only disrupt backward compatibility.

@haowang-bioinfo (Member)

ah yes, can't believe I forgot this.

@haowang-bioinfo merged commit 1e07fc4 into devel Dec 17, 2020
@haowang-bioinfo linked an issue Dec 17, 2020 that may be closed by this pull request
@JonathanRob mentioned this pull request Dec 18, 2020
@haowang-bioinfo deleted the fix/duplicatedComponents branch January 22, 2021 10:37
Development

Successfully merging this pull request may close these issues.

Identical metabolites with slightly different names
3 participants