AB Clean sequence
Remove all non-alignment characters from the input. This includes spaces, numbers, and stop characters (e.g., '*'), but not dash gap characters ('-'). Passing the word 'strict' also replaces ambiguous/degenerate characters in nucleotide sequences with 'N'.
Nucleotide sequences: ATGCURYWSMKHBVDNX will be retained. If 'strict' is specified, only ATGCXNU will be retained.
Protein sequences: ACDEFGHIKLMNPQRSTVWXY will be retained. The 'strict' option has no effect on protein sequences.
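The retention rule above can be illustrated with a short Python sketch. This is not the tool's own code: the function name clean_aligned and the constants are hypothetical, and two behaviours are assumptions. Stop characters are turned into gaps (as in the protein example on this page) and every other disallowed character is dropped. Matching is case-insensitive, which is also an assumption.

```python
# Alphabets copied from the description above.
NUCLEOTIDE = frozenset("ATGCURYWSMKHBVDNX")
PROTEIN = frozenset("ACDEFGHIKLMNPQRSTVWXY")
GAP = "-"
STOP = "*"


def clean_aligned(seq, alphabet):
    """Return seq with only alphabet characters and gaps kept.

    Assumption: a stop character becomes a gap so the alignment keeps
    its column count; any other disallowed character is removed.
    """
    cleaned = []
    for ch in seq:
        if ch == STOP:
            cleaned.append(GAP)
        elif ch == GAP or ch.upper() in alphabet:
            cleaned.append(ch)
        # Anything else (spaces, digits, punctuation) is skipped.
    return "".join(cleaned)


print(clean_aligned("KKVRFLV*", PROTEIN))
print(clean_aligned("AT G1C-", NUCLEOTIDE))
```

The first call mirrors the end of the Mle-Panxα1 record, where the trailing '*' ends up as '-'.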
Optional. By default, ambiguous nucleotide characters (i.e., the degenerate alphabet) are retained, but these can cause problems in some downstream analyses. Include the word 'strict' to replace them with a single character ('N' by default).
Optional. If 'N' is not the desired replacement character for degenerate residues, specify a different one.
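As a rough illustration of the two options above, the following Python sketch replaces every nucleotide character other than A, T, G, C, U, and the gap with a replacement character. The function name strict_clean and its rep_char parameter are hypothetical. The choice to also replace 'N' and 'X' is inferred from the example output on this page (where a leading 'X' becomes 'N' by default, and a leading 'N' becomes 'X' when 'X' is requested), not from the tool's source.

```python
# Characters left untouched in strict mode (assumption, see lead-in).
UNAMBIGUOUS = frozenset("ATGCU")
GAP = "-"


def strict_clean(seq, rep_char="N"):
    """Replace ambiguous nucleotide characters in seq with rep_char.

    Gaps are preserved so the alignment length does not change.
    """
    return "".join(
        ch if ch == GAP or ch.upper() in UNAMBIGUOUS else rep_char
        for ch in seq
    )


print(strict_clean("RRRRCAAC"))       # default replacement character
print(strict_clean("YCTG--", "X"))    # custom replacement character
```

Because each ambiguous character is swapped one-for-one, the cleaned sequence always has the same length as the input.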
>Mle-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFMFGSNISCIGF
EKLERNFVEEYCWTQGIYTSKAAYNMP-LHTPYPGIAPCVPEYDPVTQKYWLPCG----V
EEEDKAYHLWYQWVPFYFLAVAVGYYLPFLILKGSKLHQVKPLITYLMNQRNLETDPNHL
VGKLSHWIFRQLVYSRFAATSTIRMYWHDWGLVLLVCSVKILYLTVSLIHLFATAKMFHI
GNWFTYGIMFARR---SNSHTTHVKDVFFPKMVACKIETWSFTGKNHLHGMCVLALNVMN
QYLFLIVWYVNVIIIFLNSISCIYTIVKFCSPNIVHHRIVNSSSLDDHHDFTRMFGYVGP
SGRIILAKMSEHMPGYMLKQVAKKVTEKIDIENEKNRGRAPTIKFTKVNGQPSELARQPL
MHLNALMLGMVPQNLPEPKIQNIQRSQKKVRFLV*
>Mle-Panxα11
M--LISSLVQFSRLSPFKEITIDDGWDQLNRSFMFVLMVICGTIVTVRQHTGNIISCNGF
TKYDGSFSEDYCWTQGLYTIREAYHVSDVNVPYPGV---IPEEIPLCLGDNC---DKLAN
SNTTRVYHLWYQWIPFYFWLASAAFFLPYLIYKRYGFGDIKPLIHMLYNPLDGDEGVKAD
SEKASIWLYHRFS-IYMNEHSMYANFMERHGIGILVIAIKVMYLIISVLLMVMTAMMFEL
ADFKQYGIVWAQQWPDPPANVTGIKDLLFPKMVACEIKRWGPTGLEDENGMCVLAPNVIN
QYIFLILWWALVFTIVSNVFNVLAGVIRIVFIYGSYRRMLASAFLRDDPHYKKVYYKIGT
SGRVILNMLAASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDD----------------
------------PLL*-------------------
Convert protein stop characters into gaps
$: alb Mle-Panx_align.fa -cs
>Mle-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFMFGSNISCIGF
EKLERNFVEEYCWTQGIYTSKAAYNMP-LHTPYPGIAPCVPEYDPVTQKYWLPCG----V
EEEDKAYHLWYQWVPFYFLAVAVGYYLPFLILKGSKLHQVKPLITYLMNQRNLETDPNHL
VGKLSHWIFRQLVYSRFAATSTIRMYWHDWGLVLLVCSVKILYLTVSLIHLFATAKMFHI
GNWFTYGIMFARR---SNSHTTHVKDVFFPKMVACKIETWSFTGKNHLHGMCVLALNVMN
QYLFLIVWYVNVIIIFLNSISCIYTIVKFCSPNIVHHRIVNSSSLDDHHDFTRMFGYVGP
SGRIILAKMSEHMPGYMLKQVAKKVTEKIDIENEKNRGRAPTIKFTKVNGQPSELARQPL
MHLNALMLGMVPQNLPEPKIQNIQRSQKKVRFLV-
>Mle-Panxα11
M--LISSLVQFSRLSPFKEITIDDGWDQLNRSFMFVLMVICGTIVTVRQHTGNIISCNGF
TKYDGSFSEDYCWTQGLYTIREAYHVSDVNVPYPGV---IPEEIPLCLGDNC---DKLAN
SNTTRVYHLWYQWIPFYFWLASAAFFLPYLIYKRYGFGDIKPLIHMLYNPLDGDEGVKAD
SEKASIWLYHRFS-IYMNEHSMYANFMERHGIGILVIAIKVMYLIISVLLMVMTAMMFEL
ADFKQYGIVWAQQWPDPPANVTGIKDLLFPKMVACEIKRWGPTGLEDENGMCVLAPNVIN
QYIFLILWWALVFTIVSNVFNVLAGVIRIVFIYGSYRRMLASAFLRDDPHYKKVYYKIGT
SGRVILNMLAASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDD----------------
------------PLL--------------------
>ML47742a.
ATGTTAGACATACTTTCAAAGTTTTGCTGAGTTACTCCTTTTAAAGGTATAACGATAGAT
RRRRRRRRRRRRCAACTCAATCGGAGTTTTATGTTCGTCCTGCTCGTTGTCATGGGAACG
YCTGTCACTGTCCGGCAATACACCGGCAGTGTCATCAGTTGTGACGGCTTCAAAAAGTTT
WGATCCACTTTTGCGGAGGATTACTGTTGTCCCCAGGGACTGTACACAGTTTTAGAAGGA
SATGAACCAGTCAGACTCAAGTTCCCTTACCCAGGCCTCCTTCCAGACGAGGCACCACCC
MGTACGACGGTACGAGGTTAAAGT------------------CCAGACCCTGATCAGTTG
KTGTCACCGACGCGGATATCCCACCTATGGTACCAGTGGGTCCCTTTTTACTTCTGGTTG
HCGGCTGCTGCCTTCTTCATGCCCTACCTTCTGTACA------TTGGCATGGGAGATATC
BAGCCTCTCGTGAG------ACACAATCCAGTAGAATCAGACCAGGAGTTAAAGAAGATG
VCAGACAAGGCTGCAACATGGCTGTTCTACAAGTTTGACCTGTACATGAGCGAACAGTCG
DTCCTAGCAAGTCTCACCAGAAAACACGGTCTTGGTCTATCCATGGTCTTTGTAAAGATC
NTATACGCCGCAGTGTCGTTCGGGTGTTTCCTCCTGACCGCTGAGATGTTCTCAATTGGA
XATTTTAAAACCTATGGATCAGAATGGATCAAGAAGTTAAAGTTGGAAGATAATCTAGCT
TAG---------------------------------------------------------
Restrict alignment characters to the unambiguous character set and 'N'
$: alb ambiguous_cds.fa -cs strict
>ML47742a.
ATGTTAGACATACTTTCAAAGTTTTGCTGAGTTACTCCTTTTAAAGGTATAACGATAGAT
NNNNNNNNNNNNCAACTCAATCGGAGTTTTATGTTCGTCCTGCTCGTTGTCATGGGAACG
NCTGTCACTGTCCGGCAATACACCGGCAGTGTCATCAGTTGTGACGGCTTCAAAAAGTTT
NGATCCACTTTTGCGGAGGATTACTGTTGTCCCCAGGGACTGTACACAGTTTTAGAAGGA
NATGAACCAGTCAGACTCAAGTTCCCTTACCCAGGCCTCCTTCCAGACGAGGCACCACCC
NGTACGACGGTACGAGGTTAAAGT------------------CCAGACCCTGATCAGTTG
NTGTCACCGACGCGGATATCCCACCTATGGTACCAGTGGGTCCCTTTTTACTTCTGGTTG
NCGGCTGCTGCCTTCTTCATGCCCTACCTTCTGTACA------TTGGCATGGGAGATATC
NAGCCTCTCGTGAG------ACACAATCCAGTAGAATCAGACCAGGAGTTAAAGAAGATG
NCAGACAAGGCTGCAACATGGCTGTTCTACAAGTTTGACCTGTACATGAGCGAACAGTCG
NTCCTAGCAAGTCTCACCAGAAAACACGGTCTTGGTCTATCCATGGTCTTTGTAAAGATC
NTATACGCCGCAGTGTCGTTCGGGTGTTTCCTCCTGACCGCTGAGATGTTCTCAATTGGA
NATTTTAAAACCTATGGATCAGAATGGATCAAGAAGTTAAAGTTGGAAGATAATCTAGCT
TAG---------------------------------------------------------
Replace ambiguous characters with 'X' instead of 'N'
$: alb ambiguous_cds.fa -cs strict X
>ML47742a.
ATGTTAGACATACTTTCAAAGTTTTGCTGAGTTACTCCTTTTAAAGGTATAACGATAGAT
XXXXXXXXXXXXCAACTCAATCGGAGTTTTATGTTCGTCCTGCTCGTTGTCATGGGAACG
XCTGTCACTGTCCGGCAATACACCGGCAGTGTCATCAGTTGTGACGGCTTCAAAAAGTTT
XGATCCACTTTTGCGGAGGATTACTGTTGTCCCCAGGGACTGTACACAGTTTTAGAAGGA
XATGAACCAGTCAGACTCAAGTTCCCTTACCCAGGCCTCCTTCCAGACGAGGCACCACCC
XGTACGACGGTACGAGGTTAAAGT------------------CCAGACCCTGATCAGTTG
XTGTCACCGACGCGGATATCCCACCTATGGTACCAGTGGGTCCCTTTTTACTTCTGGTTG
XCGGCTGCTGCCTTCTTCATGCCCTACCTTCTGTACA------TTGGCATGGGAGATATC
XAGCCTCTCGTGAG------ACACAATCCAGTAGAATCAGACCAGGAGTTAAAGAAGATG
XCAGACAAGGCTGCAACATGGCTGTTCTACAAGTTTGACCTGTACATGAGCGAACAGTCG
XTCCTAGCAAGTCTCACCAGAAAACACGGTCTTGGTCTATCCATGGTCTTTGTAAAGATC
XTATACGCCGCAGTGTCGTTCGGGTGTTTCCTCCTGACCGCTGAGATGTTCTCAATTGGA
XATTTTAAAACCTATGGATCAGAATGGATCAAGAAGTTAAAGTTGGAAGATAATCTAGCT
TAG---------------------------------------------------------