AB Hash IDs
Rename every sequence ID to a randomly generated hash string (ASCII letters and digits).
A hash-map table is written to stderr, above the alignments, which are written to stdout; the table can be silenced with the -q flag. Note that every sequence gets a unique hash, even if the same original ID is used in multiple alignments. The order of the hash-map matches the order in which sequences appear in the output.
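The uniqueness and ordering behaviour described above can be sketched in plain Python. This is a minimal illustration, not AlignBuddy's own implementation: the helper name `make_hash_map`, the alphabet (ASCII letters plus digits, chosen to match the example output), and the redraw-on-collision loop are all assumptions.

```python
import random
import string
from collections import OrderedDict

# Assumed alphabet: letters plus digits, as seen in the example hashes.
ALPHABET = string.ascii_letters + string.digits


def make_hash_map(seq_ids, hash_length=10, seed=None):
    """Return an OrderedDict {hash: original_id}, one unique hash per sequence.

    Hypothetical helper for illustration only. Repeated IDs still get
    separate hashes, and insertion order follows the input order.
    """
    rng = random.Random(seed)
    hash_map = OrderedDict()
    for seq_id in seq_ids:
        new_hash = "".join(rng.choice(ALPHABET) for _ in range(hash_length))
        while new_hash in hash_map:  # redraw on the (rare) collision
            new_hash = "".join(rng.choice(ALPHABET) for _ in range(hash_length))
        hash_map[new_hash] = seq_id
    return hash_map


ids = ["Bfo-Panxα1", "Hca-Panxα1", "Bfo-Panxα1"]  # same ID used twice
hash_map = make_hash_map(ids, seed=1)
for new_id, old_id in hash_map.items():
    print("%s,%s" % (new_id, old_id))
```

Because the hash is the dictionary key, two sequences that share an original ID still occupy two separate entries, and iterating over the map reproduces the input order.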
For developers: an attribute named hash_map is appended to the SeqBuddy object. It is an OrderedDict() of the form {hash: original_id}.
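As a rough sketch of how a mapping of the form {hash: original_id} might be used to recover the original names, the snippet below builds a stand-in OrderedDict by hand from hashes in the example output. The `restore_ids` helper is hypothetical and is not part of the BuddySuite API.

```python
from collections import OrderedDict

# Stand-in for the hash_map attribute, using hashes from the example output.
hash_map = OrderedDict([
    ("4CKTDKOBu3", "Bfo-Panxα1"),
    ("KXz9xKCs46", "Hca-Panxα1"),
    ("3XQEvSkBZo", "Mle-Panxα1"),
])


def restore_ids(hashed_ids, hash_map):
    """Map hashed IDs back to their originals (hypothetical helper)."""
    return [hash_map[hashed] for hashed in hashed_ids]


restored = restore_ids(["KXz9xKCs46", "4CKTDKOBu3"], hash_map)
print(restored)
```

Keying the map on the hash rather than on the original ID keeps the lookup unambiguous when the same original ID occurs more than once.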
Optional. Specify the length of the new hash IDs (default = 10). If the number of possible hashes is smaller than twice the total number of sequences, a warning is printed to stderr and the hash length is increased automatically until it meets this criterion.
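The length adjustment can be pictured as a small loop. The sketch below follows the rule as worded here (grow the length until the number of possible hashes is at least twice the number of sequences). The function name `adjusted_hash_length` and the default `alphabet_size` of 62 are assumptions; the tool may count possible hashes differently, so the exact threshold at which it warns may not match.

```python
def adjusted_hash_length(num_seqs, hash_length=10, alphabet_size=62):
    """Return hash_length, increased until the hash space is large enough.

    Hypothetical helper: alphabet_size is an assumed value, not taken
    from the tool itself.
    """
    if hash_length < 1 or alphabet_size < 2:
        raise ValueError("hash_length must be >= 1 and alphabet_size >= 2")
    # Number of possible hashes is alphabet_size ** hash_length.
    while alphabet_size ** hash_length < 2 * num_seqs:
        hash_length += 1
    return hash_length


# The default length is already ample for a handful of sequences.
small = adjusted_hash_length(9)
# A length of 1 cannot cover 100 sequences under the assumed alphabet.
grown = adjusted_hash_length(100, hash_length=1)
print(small, grown)
```

Requiring roughly twice as many possible hashes as sequences leaves headroom so that random draws rarely collide with hashes already assigned.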
3 62
Bfo-Panxα1 DPHYKKVYYKIGTSGRVILNVLASSISPACFQEIMNNVCPRLIRAHVSRKGRNLGDDPNL--
Hca-Panxα1 --HYKKVYYKIGTSGRVILNVIASSIAPSAFQEIMNNVCPRLIRTHVSKKGRNLIDDPDLIS
Mle-Panxα1 DPHYKKVYYKIGTSGRVILNMLAASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDDPLL--
3 68
Bfo-Panxα4 -----EIIQVMTDNTNPLFFSKIFNELTNLLIETSSDQAGKVVENLAMQG-DEDTIVDLDTSSSRT--
Hca-Panxα4 -------LQVLMANTHPVIFTRIFDELTFRLVTKASMD-CEAVKNLQAEGQIGETAIDLEPNLGKAVG
Mle-Panxα4 GAGGREIVQILTDNSNPLLFSKIFDDLTNLLITTSKN--ADVIENLSKL---DSSVIELGSKDSI---
3 61
Bfo-Panxα8 GDSKLKYIYFNCGTTGRTYLHLIAKNINPRIFEQLIIKLKNDLVEEKNKQHLKQTK-EMPV
Hca-Panxα8 -DNKLKYIYFNCGTTGRTYLHLIANNVNPRVFEQLVIRLSKDLVEEKNKAHLKKAEGEANV
Mle-Panxα8 ENSKLKFIYFNCGTTGRTYLHLIAKNVNPRIFEQLIIKLSADLVEEKNKQHLKGSK-DILV
$: alb Panx_C-term.physr -hi
# Hash table
4CKTDKOBu3,Bfo-Panxα1
KXz9xKCs46,Hca-Panxα1
3XQEvSkBZo,Mle-Panxα1
ru9eV9aFaW,Bfo-Panxα4
tP1nsbNt35,Hca-Panxα4
wSKIW6vQpX,Mle-Panxα4
JnCuUwvDHe,Bfo-Panxα8
1PKGTIaOsD,Hca-Panxα8
x4B8BdeTRW,Mle-Panxα8
3 62
4CKTDKOBu3 DPHYKKVYYKIGTSGRVILNVLASSISPACFQEIMNNVCPRLIRAHVSRKGRNLGDDPNL--
KXz9xKCs46 --HYKKVYYKIGTSGRVILNVIASSIAPSAFQEIMNNVCPRLIRTHVSKKGRNLIDDPDLIS
3XQEvSkBZo DPHYKKVYYKIGTSGRVILNMLAASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDDPLL--
3 68
ru9eV9aFaW -----EIIQVMTDNTNPLFFSKIFNELTNLLIETSSDQAGKVVENLAMQG-DEDTIVDLDTSSSRT--
tP1nsbNt35 -------LQVLMANTHPVIFTRIFDELTFRLVTKASMD-CEAVKNLQAEGQIGETAIDLEPNLGKAVG
wSKIW6vQpX GAGGREIVQILTDNSNPLLFSKIFDDLTNLLITTSKN--ADVIENLSKL---DSSVIELGSKDSI---
3 61
JnCuUwvDHe GDSKLKYIYFNCGTTGRTYLHLIAKNINPRIFEQLIIKLKNDLVEEKNKQHLKQTK-EMPV
1PKGTIaOsD -DNKLKYIYFNCGTTGRTYLHLIANNVNPRVFEQLVIRLSKDLVEEKNKAHLKKAEGEANV
x4B8BdeTRW ENSKLKFIYFNCGTTGRTYLHLIAKNVNPRIFEQLIIKLSADLVEEKNKQHLKGSK-DILV
$: alb Panx_C-term.physr Panx_C-term.physr Panx_C-term.physr -hi 1
Warning: The hash_length parameter was passed in with the value 1. This is too small to properly cover all sequences, so it has been increased to 2.
# Hash table
bV,Bfo-Panxα1
R6,Hca-Panxα1
A6,Mle-Panxα1
........
3 62
bV DPHYKKVYYKIGTSGRVILNVLASSISPACFQEIMNNVCPRLIRAHVSRKGRNLGDDPNL--
R6 --HYKKVYYKIGTSGRVILNVIASSIAPSAFQEIMNNVCPRLIRTHVSKKGRNLIDDPDLIS
A6 DPHYKKVYYKIGTSGRVILNMLAASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDDPLL--
3 68
BV -----EIIQVMTDNTNPLFFSKIFNELTNLLIETSSDQAGKVVENLAMQG-DEDTIVDLDTSSSRT--
1x -------LQVLMANTHPVIFTRIFDELTFRLVTKASMD-CEAVKNLQAEGQIGETAIDLEPNLGKAVG
Ly GAGGREIVQILTDNSNPLLFSKIFDDLTNLLITTSKN--ADVIENLSKL---DSSVIELGSKDSI---
3 61
WA GDSKLKYIYFNCGTTGRTYLHLIAKNINPRIFEQLIIKLKNDLVEEKNKQHLKQTK-EMPV
SY -DNKLKYIYFNCGTTGRTYLHLIANNVNPRVFEQLVIRLSKDLVEEKNKAHLKKAEGEANV
ya ENSKLKFIYFNCGTTGRTYLHLIAKNVNPRIFEQLIIKLSADLVEEKNKQHLKGSK-DILV
........