First version of pLDDT and LUR module #11
Merged
Changes from 7 commits (12 commits in total)
Commits
4578c43 Add cli option for plddt and LUR module (bordin89)
3e13468 Add minimum length for LUR as constant (bordin89)
63f5699 Add plddt summary writer (bordin89)
d20661d Add pLDDTSummary as dataclass (bordin89)
8878e56 Remove DEFAULTs for sse lengths, move to models (bordin89)
367cd32 Add pLDDT and LUR module from CIF files to summary (bordin89)
e822950 move domain_length out of loop to solve average (bordin89)
1b2c5f2 Add LUR residues and total_residues (bordin89)
d96a511 Add gzip open to deal with gzipped cifs (bordin89)
86816d4 Add unit tests for pLDDT and LUR summaries. (bordin89)
2c2b577 add LUR summary as model, correct tests (sillitoe)
f974aeb Change segment_plddt to chopping_plddt as length (bordin89)
@@ -0,0 +1,122 @@
from pathlib import Path
from Bio.PDB import MMCIF2Dict
import logging
import click
from cath_alphaflow.io_utils import (
    yield_first_col,
    get_plddt_summary_writer,
)
from cath_alphaflow.models import pLDDTSummary
from cath_alphaflow.constants import MIN_LENGTH_LUR

LOG = logging.getLogger()


@click.command()
@click.option(
    "--cif_in_dir",
    type=click.Path(exists=True, file_okay=False, dir_okay=True, resolve_path=True),
    required=True,
    help="Input: directory of CIF files",
)
@click.option(
    "--id_file",
    type=click.File("rt"),
    required=True,
    help="Input: CSV file containing list of ids to process from CIF to pLDDT",
)
@click.option(
    "--plddt_stats_file",
    type=click.File("wt"),
    required=True,
    help="Output: pLDDT and LUR output file",
)
@click.option(
    "--cif_suffix",
    type=str,
    default=".cif",
    help="Option: suffix to use for mmCIF files (default: .cif)",
)
def convert_cif_to_plddt_summary(
    cif_in_dir,
    id_file,
    plddt_stats_file,
    cif_suffix,
):
    "Creates summary of pLDDT and LUR statistics from CIF files"

    plddt_out_writer = get_plddt_summary_writer(plddt_stats_file)

    for file_stub in yield_first_col(id_file):
        cif_path = Path(cif_in_dir) / f"{file_stub}{cif_suffix}"
        if not cif_path.exists():
            msg = f"failed to locate CIF input file {cif_path}"
            LOG.error(msg)
            raise FileNotFoundError(msg)

        avg_plddt = get_average_plddt_from_plddt_string(cif_path)
        perc_LUR = get_LUR_residues_percentage(cif_path)
        plddt_stats = pLDDTSummary(
            af_domain_id=file_stub, avg_plddt=avg_plddt, perc_LUR=perc_LUR
        )
        plddt_out_writer.writerow(plddt_stats.__dict__)

    click.echo("DONE")


def get_average_plddt_from_plddt_string(
    cif_path: Path, *, chopping=None, acc_id=None
) -> float:
    if acc_id is None:
        acc_id = cif_path.stem
    mmcif_dict = MMCIF2Dict.MMCIF2Dict(cif_path)
    # MMCIF2Dict returns values as strings, so convert before doing arithmetic
    chain_plddt = float(mmcif_dict["_ma_qa_metric_global.metric_value"][0])
    plddt_string = mmcif_dict["_ma_qa_metric_local.metric_value"]
    if chopping:
        # Collect the per-residue pLDDT values of every segment as floats
        segment_plddt = []
        for segment in chopping.segments:
            segment_plddt.extend(
                float(plddt)
                for plddt in plddt_string[(segment.start - 1) : segment.end]
            )
        domain_length = len(segment_plddt)
        # Kept on the same scale as the global metric returned below, so no rescaling
        average_plddt = round(sum(segment_plddt) / domain_length, 2)
    else:
        average_plddt = chain_plddt
    return average_plddt


def get_LUR_residues_percentage(
    cif_path: Path, *, chopping=None, acc_id=None
) -> float:
    if acc_id is None:
        acc_id = cif_path.stem
    mmcif_dict = MMCIF2Dict.MMCIF2Dict(cif_path)
    plddt_string = mmcif_dict["_ma_qa_metric_local.metric_value"]
    if chopping:
        # Accumulate per-residue values in a list rather than a string
        segment_plddt = []
        for segment in chopping.segments:
            segment_plddt.extend(plddt_string[(segment.start - 1) : segment.end])
    else:
        segment_plddt = plddt_string
    # Calculate LUR: count residues in runs of at least MIN_LENGTH_LUR
    # consecutive residues with pLDDT below 90
    LUR_total = 0
    LUR_res = 0
    LUR_stretch = False
    min_res_lur = MIN_LENGTH_LUR
    for residue in segment_plddt:
        plddt_res = float(residue)
        if plddt_res < 90:
            LUR_res += 1
            if LUR_stretch:
                LUR_total += 1

            if LUR_res == min_res_lur and not LUR_stretch:
                # The run has just reached the minimum length: count all of it
                LUR_stretch = True
                LUR_total += min_res_lur

        else:
            LUR_stretch = False
            LUR_res = 0
    LUR_perc = round(LUR_total / len(segment_plddt) * 100, 2)

    return LUR_perc
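
The LUR counting loop above can also be written in terms of runs of consecutive low-pLDDT residues. The following is a minimal sketch, not the implementation in this PR: the function name `lur_percentage` and its parameters are hypothetical, while the cutoff of 90 and the minimum run length of 5 are taken from the diff.

```python
from itertools import groupby
from typing import Sequence


def lur_percentage(
    plddt_values: Sequence[float],
    cutoff: float = 90.0,  # same threshold as the loop in the diff
    min_length: int = 5,  # mirrors MIN_LENGTH_LUR
) -> float:
    """Percentage of residues in runs of >= min_length residues below cutoff."""
    if not plddt_values:
        return 0.0
    lur_total = 0
    # groupby yields maximal runs of consecutive residues sharing the same flag
    for is_low, run in groupby(plddt_values, key=lambda value: value < cutoff):
        run_length = sum(1 for _ in run)
        if is_low and run_length >= min_length:
            lur_total += run_length
    return round(lur_total / len(plddt_values) * 100, 2)
```

For example, `lur_percentage([95.0] * 3 + [50.0] * 5 + [95.0] * 2)` returns `50.0`, because one run of five low-pLDDT residues covers half of the ten residues. Unlike the loop in the diff, this sketch returns 0.0 for an empty input instead of dividing by zero; that guard is an assumption about the desired behaviour.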
@@ -2,3 +2,4 @@
DEFAULT_DSSP_SUFFIX = ".dssp"
DEFAULT_HELIX_MIN_LENGTH = 3
DEFAULT_STRAND_MIN_LENGTH = 2
MIN_LENGTH_LUR = 5
Should this be initialised as an array (of `float`) that gets appended to in the following loop?
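
One way to act on this comment is to collect the per-residue values into a list of floats before averaging. This is a minimal sketch, assuming segments expose 1-based inclusive `start` and `end` attributes as in the diff; the `Segment` dataclass and the `average_segment_plddt` function are hypothetical stand-ins, not names from the repository.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """Hypothetical stand-in for one chopping segment (1-based, inclusive)."""

    start: int
    end: int


def average_segment_plddt(
    plddt_strings: Sequence[str], segments: Sequence[Segment]
) -> float:
    """Average pLDDT over the residues covered by the given segments."""
    segment_plddt: List[float] = []
    for segment in segments:
        # The mmCIF parser returns values as strings, so convert while appending
        segment_plddt.extend(
            float(value) for value in plddt_strings[segment.start - 1 : segment.end]
        )
    if not segment_plddt:
        raise ValueError("segments do not cover any residues")
    return round(sum(segment_plddt) / len(segment_plddt), 2)
```

With `plddt_strings = ["90.0", "80.0", "70.0", "60.0", "50.0"]` and segments `Segment(1, 2)` and `Segment(4, 5)`, the function averages 90, 80, 60 and 50 and returns `70.0`. Starting from an empty list avoids the `TypeError` that concatenating a list slice onto an empty string would raise.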