fix: rxnGeneMat is consistent with standardized grRules for all cases
IVANDOMENZAIN committed Mar 29, 2018
1 parent a24209e commit 8d6a909
Showing 2 changed files with 43 additions and 36 deletions.
47 changes: 29 additions & 18 deletions core/standardizeGeneRules.m → core/standardizeGrRules.m
@@ -1,34 +1,44 @@
function newModel = standardizeGeneRules(model)
% standardizeGeneRules
% Standardizes gene-rxn rules in a model and modifies rxnGeneMat for
% providing consistency with the grRules field
function [grRules,rxnGeneMat] = standardizeGrRules(model)
% standardizeGrRules
% Standardizes gene-rxn rules in a model according to the following
% - No overall containing brackets
% - Just enzyme complexes are enclosed into brackets
% - ' and ' & ' or ' strings are strictly set to lowercase
%
% A rxnGeneMat matrix consistent with the standardized grRules is created
%
% model a model structure
%
% newModel an updated model structure
% grRules [nRxns x 1] cell array with the standardized grRules
% rxnGeneMat [nRxns x nGenes] sparse matrix consistent with the
% standardized grRules
%
% Usage: newModel = standardizeGeneRules(model)
% Usage: [grRules,rxnGeneMat] = standardizeGrRules(model)
%
% Ivan Domenzain, 2018-03-28
% Ivan Domenzain, 2018-03-29
%
newModel = model;
% Preallocate rxnGeneMat
[~,n] = size(model.S);
[g,~] = size(model.genes);
RGMat = sparse(n,g);

% Preallocate fields
[~,n] = size(model.S);
[g,~] = size(model.genes);
rxnGeneMat = sparse(n,g);
grRules = cell(n,1);

if isfield(model,'grRules')
% Search logical errors in the grRules field
findLogicalErrors(model)
findLogicalErrors(model)

for i=1:length(model.grRules)
originalSTR = model.grRules{i};
grRules{i} = originalSTR;
newSTR = [];
% Non-empty grRules are split into all their different isoenzymes
genesSets = getSimpleGeneSets(originalSTR);

if ~isempty(genesSets)
for j=1:length(genesSets)
simpleSet = genesSets{j};
RGMat = modifyRxnGeneMat(simpleSet,model.genes,RGMat,i);
rxnGeneMat = modifyRxnGeneMat(simpleSet,model.genes,rxnGeneMat,i);
% Enclose simpleSet in brackets
if length(genesSets)>1
if ~isempty(strfind(simpleSet,' and '))
@@ -39,17 +49,18 @@
% isoenzymes)
if j<length(genesSets)
newSTR = [newSTR, simpleSet, ' or '];
% Add the last simpleSet
% Add the last simpleSet
else
newSTR = [newSTR, simpleSet];
end

end
newModel.grRules{i} = newSTR;
grRules{i} = newSTR;
end

end
end
newModel.rxnGeneMat = RGMat;

end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% function that gets a cell array with all the simple geneSets in a given
@@ -101,4 +112,4 @@ function findLogicalErrors(model)
end
error('Logical errors found on grRules')
end
end
end
32 changes: 14 additions & 18 deletions struct_conversion/ravenCobraWrapper.m
@@ -157,12 +157,15 @@
% It seems that grRules, rxnGeneMat and rev are disposable fields in COBRA
% version, but we export them to make things faster, when converting
% COBRA structure back to RAVEN;
if isfield(model,'grRules')
newModel.grRules=model.grRules;
end;
if isfield(model,'rxnGeneMat')
newModel.rxnGeneMat=model.rxnGeneMat;
end;
if isfield(model,'grRules')
[grRules, rxnGeneMat] = standardizeGrRules(model);
newModel.grRules = grRules;
%Incorporate a rxnGeneMat consistent with standardized grRules
newModel.rxnGeneMat = rxnGeneMat;
end
newModel.rev=model.rev;
else
fprintf('Converting COBRA structure to RAVEN..\n');
Expand Down Expand Up @@ -229,15 +232,14 @@
newModel.rxnNames=model.rxnNames;
end;
if isfield(model,'grRules')
newModel.grRules=model.grRules;
[grRules,rxnGeneMat] = standardizeGrRules(model);
newModel.grRules = grRules;
newModel.rxnGeneMat = rxnGeneMat;
else
model.grRules=rulesTogrrules(model);
newModel.grRules=model.grRules;
end;
if isfield(model,'rxnGeneMat')
newModel.rxnGeneMat=model.rxnGeneMat;
elseif isfield(model,'grRules')
newModel.rxnGeneMat=getRxnGeneMat(model);
model.grRules = rulesTogrrules(model);
[grRules,rxnGeneMat] = standardizeGrRules(model);
newModel.grRules = grRules;
newModel.rxnGeneMat = rxnGeneMat;
end;
if isfield(model,'subSystems')
newModel.subSystems=model.subSystems;
Expand Down Expand Up @@ -410,10 +412,4 @@
grRules = strrep(grRules,' )',')');
grRules = regexprep(grRules,'^(',''); %rules that start with a "("
grRules = regexprep(grRules,')$',''); %rules that end with a ")"
end

function rxnGeneMat=getRxnGeneMat(model)
%Check gene association for each reaction and populate rxnGeneMat
modelTemp = standardizeGeneRules(model);
rxnGeneMat = modelTemp.rxnGeneMat;
end
end

7 comments on commit 8d6a909

@edkerk
Member

@edkerk edkerk commented on 8d6a909 Apr 9, 2018

@IVANDOMENZAIN I'm not sure what findLogicalErrors exactly does (the comment is not correct). I have a model that gives errors, but I'm not sure why.

@IVANDOMENZAIN
Member Author

@edkerk, I rechecked it now and it is indeed not properly commented. The function raises an error when it finds the pattern ") and (" in a given grRule. After discussing it with @BenjaSanchez we decided to raise an error because this pattern may lead to ambiguous or wrong grRules. It can indicate either a complex of complexes (in which case the internal brackets are redundant) or a complex of isoenzymes, which is a complicated case that requires manual revision by the user/modeller.

But thanks for pointing this out. It should be modified, either by giving a much more descriptive error or by issuing a warning instead, because the function is now called by several others in the RAVEN toolbox. We want to give the user the chance to analyze these cases manually, to avoid odd modifications of the grRule by the standardization algorithm. Do you have any suggestions?
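
The check described in this comment can be illustrated with a minimal sketch. It is not the body of findLogicalErrors (which the diff does not show in full); the rules below are hypothetical and only demonstrate that a plain "and" is left alone while ") and (" is flagged.

```matlab
% Hypothetical rules, not taken from any real model
grRules = {'G1 and G2'; ...
           '(G1 and G2) or G3'; ...
           '(G1 or G2) and (G2 or G3)'};

% strfind on a cell array returns one cell of match positions per rule
hits    = strfind(grRules, ') and (');
flagged = ~cellfun(@isempty, hits);

% Report only the rules that contain the ") and (" pattern
for i = find(flagged)'
    fprintf('grRule %d needs manual review: %s\n', i, grRules{i});
end
```

Only the third rule is flagged here; the first two contain "and" but not between two bracketed groups.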

@edkerk
Member

@edkerk edkerk commented on 8d6a909 Apr 10, 2018

But then it will always give an error if there is an "and" relationship, and you can never run any function that includes standardizeGrRules if your model has complexes? That seems counterproductive...

First, I thought that your approach of making a logical structure was written specifically to deal with "and" relationships? Please clarify this, otherwise it's unclear to me what the novelty is.

Now I'm no longer 100% certain what the point of this function is, but if it always gives error messages when there is an "and" relationship, maybe the function should have a silent option. Users should be encouraged to first run standardizeGrRules stand-alone on their model, to make sure that it complies. When doing this, you'd want useful output, perhaps also instructions on what should be checked if an 'error' occurs. Then, when standardizeGrRules is run as part of other functions, it should maybe be silent and not complain about "and" relationships, but rather assume that those have been fixed already?

@IVANDOMENZAIN
Member Author

The purpose of the function is to standardize the grRules strings, avoiding unbalanced and redundant brackets and mixed upper- and lowercase in "and" and "or", which may cause problems with other functions or toolboxes that use this field as input (such as GECKO). We had this idea because, after dealing with models constructed by different people, I found various errors, different formats, unbalanced brackets and ambiguity in the grRules field.

Regarding how it works: it does not give an error when it simply finds the pattern "and" (any complex); it does so only when it finds the pattern ") and (". This is a potential sign of error, and the only justified reason for it would be a complex of isoenzymes, which, due to its complexity, requires some manual revision to be sure that the grRule properly represents the gene association.

The function is able to deal with complexes and even with complexes of isoenzymes, "((G1 or G2) and (G2 or G3))", which are present in several models. However, I thought that these require further attention because they are rare cases, hence the error. But I agree that the current state is impractical for RAVEN. I will incorporate your suggestion of a silent option, suppress the output when the function is called from other internal functions, and give useful, descriptive error or warning output otherwise. Thanks for the feedback!
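
As a rough illustration of the standardization described in this comment (lowercase operators, no redundant brackets, only complexes enclosed), here is a minimal sketch. It assumes a flat rule, that is, isoenzyme sets joined by "or", and does not reproduce getSimpleGeneSets or the nested cases that the real function handles; the input rule is hypothetical.

```matlab
% Hypothetical input: a flat rule (isoenzyme sets joined by "or") with
% uppercase operators and a redundant pair of outer brackets
rule = '((G1 AND G2) OR G3)';

% Force the operators to lowercase
rule = regexprep(rule, '\s+(and|or)\s+', ' ${lower($1)} ', 'ignorecase');

% Split into isoenzyme sets and drop every bracket; in a flat rule the
% brackets carry no information that the "or" split does not already give
sets = strtrim(regexprep(strsplit(rule, ' or '), '[()]', ''));

% Re-enclose only the enzyme complexes, and only when there are alternatives
isComplex = ~cellfun(@isempty, strfind(sets, ' and '));
if numel(sets) > 1
    sets(isComplex) = cellfun(@(s) ['(' s ')'], sets(isComplex), ...
        'UniformOutput', false);
end

standardized = strjoin(sets, ' or ');
disp(standardized)
```

For the input above this is intended to yield (G1 and G2) or G3. A rule such as "(G1 or G2) and (G2 or G3)" is outside the scope of this sketch, which is exactly the case the thread proposes to flag for manual review.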

@haowang-bioinfo
Member

  1. Make the silent option the default;
  2. Provide an example of recommended grRules in help message.

@edkerk
Member

@edkerk edkerk commented on 8d6a909 Apr 10, 2018

@IVANDOMENZAIN Thanks for the clarification, the confusion was about the difference between "and" and ") and (".

At the same time, making it completely silent might not be ideal, because the user may not have run standardizeGrRules stand-alone on their model beforehand. So what about the following:

  • If run stand-alone, give a list similar to the one it gives now, plus instructions on how to curate.
  • If run as part of another function and it encounters ") and (" relationships, make it print a message like: '") and (" relationships were found in the model. Ensure that these are correct by running standardizeGrRules. If the ") and (" relationships in grRules have already been curated, please ignore this message.'

A problem with that approach arises if standardizeGrRules is run repeatedly, since the functions that call it might themselves be run repeatedly as part of some other function (some reconstruction pipeline?).

Please let me know your thoughts :)

@IVANDOMENZAIN
Member Author

@edkerk, that's a good solution. Ideally the function should be called just once in any reconstruction (this can be included as a suggestion in the header comments). In RAVEN, however, that may not be the case, because it has already been incorporated in several functions and I'm not sure whether it appears several times in a given pipeline. If the output is given as a final warning rather than as an error or a plain message, the problem is not so bad: if the user has already checked the flagged grRules manually, it is just a matter of ignoring the warning each time it shows up again.
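
The behaviour converged on in this thread (a detailed list when run stand-alone, a single warning when run inside another function) might look roughly like the sketch below. The flag name `embedded` and the warning identifier are assumptions made for illustration; neither exists in RAVEN as far as this page shows.

```matlab
% Hypothetical inputs; the flag name and warning ID are illustrative only
grRules  = {'G1 and G2'; '(G1 or G2) and (G2 or G3)'};
embedded = true;   % true when called from inside another function

% Indices of the rules that contain the ") and (" pattern
flagged = find(~cellfun(@isempty, strfind(grRules, ') and (')));

if ~isempty(flagged)
    if embedded
        % One short warning; it does not interrupt the calling pipeline
        warning('sketch:grRulesUnchecked', ...
            ['") and (" relationships were found in %d grRule(s). ' ...
             'If these have already been curated, ignore this message.'], ...
            numel(flagged));
    else
        % Stand-alone run: list every rule that needs a manual check
        for k = 1:numel(flagged)
            fprintf('rxn %d: %s\n', flagged(k), grRules{flagged(k)});
        end
    end
end
```

Because the warning carries an identifier, a user who has already curated the flagged rules could silence repeated occurrences with `warning('off','sketch:grRulesUnchecked')`, which would soften the repeated-call problem raised above.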