SB Clean sequence
Remove all non-sequence characters from input. This will include any spaces, numbers, gap characters (e.g. '-'), stop characters (e.g. '*'), etc. Passing in the word 'strict' will also replace ambiguous/degenerate characters in nucleotide sequences with 'N'.
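Nucleotide sequences: ATGCURYWSMKHBVDNX will be retained. If 'strict' is specified, only ATGCU are left unchanged; every other retained character is converted to the replacement character (the examples below show 'X' becoming 'N' under 'strict', and 'N' becoming 'X' under 'strict X').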
Protein sequences: ACDEFGHIKLMNPQRSTVWXY will be retained. The 'strict' argument has no effect on protein sequences.
Optional. By default, ambiguous nucleotide characters (i.e., the degenerate alphabet) will be retained, but these can cause issues for some downstream analyses. Include the word 'strict' to replace ambiguous characters with a single replacement character ('N' by default).
Optional. If 'N' is not the desired replacement character for degenerate residues, specify a different one.
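The behaviour described above can be approximated in a few lines of Python. This is only an illustrative sketch, not SeqBuddy's own code: the function name `clean_sequence` and its parameters are hypothetical. The retained alphabets are copied from the description. Treating only A, T, G, C and U as unambiguous under 'strict' is an assumption inferred from the example output further down, where 'X' and 'N' are also swapped for the replacement character.

```python
import re

# Alphabets copied from the description above.
NUCLEOTIDE_KEEP = "ATGCURYWSMKHBVDNX"
PROTEIN_KEEP = "ACDEFGHIKLMNPQRSTVWXY"
# Assumption (inferred from the example output): under 'strict',
# only these characters are left untouched.
UNAMBIGUOUS = "ATGCU"


def _not_in(alphabet):
    """Build a regex matching any character outside the alphabet (either case)."""
    return "[^%s%s]" % (alphabet, alphabet.lower())


def clean_sequence(seq, is_protein=False, strict=False, rep_char="N"):
    """Hypothetical helper mimicking the documented cleaning rules."""
    keep = PROTEIN_KEEP if is_protein else NUCLEOTIDE_KEEP
    # Step 1: drop gaps, stop characters, digits, whitespace, etc.
    cleaned = re.sub(_not_in(keep), "", seq)
    # Step 2: 'strict' only applies to nucleotide sequences.
    if strict and not is_protein:
        cleaned = re.sub(_not_in(UNAMBIGUOUS), lambda _m: rep_char, cleaned)
    return cleaned


if __name__ == "__main__":
    print(clean_sequence("ATG-RY*N 12X"))
    print(clean_sequence("ATG-RY*N 12X", strict=True))
    print(clean_sequence("ATG-RY*N 12X", strict=True, rep_char="X"))
    print(clean_sequence("MP-LHT*", is_protein=True, strict=True))
```

The two-pass design mirrors the description: non-sequence characters are always removed first, and the optional replacement step then runs only on what is left, so gaps and stop characters are never turned into the replacement character.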
>Mle-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFMFGSNISCIGF
EKLERNFVEEYCWTQGIYTSKAAYNMP-LHTPYPGIAPCVPEYDPVTQKYWLPCG----V
EEEDKAYHLWYQWVPFYFLAVAVGYYLPFLILKGSKLHQVKPLITYLMNQRNLETDPNHL
VGKLSHWIFRQLVYSRFAATSTIRMYWHDWGLVLLVCSVKILYLTVSLIHLFATAKMFHI
GNWFTYGIMFARR---SNSHTTHVKDVFFPKMVACKIETWSFTGKNHLHGMCVLALNVMN
QYLFLIVWYVNVIIIFLNSISCIYTIVKFCSPNIVHHRIVNSSSLDDHHDFTRMFGYVGP
SGRIILAKMSEHMPGYMLKQVAKKVTEKIDIENEKNRGRAPTIKFTKVNGQPSELARQPL
MHLNALMLGMVPQNLPEPKIQNIQRSQKKVRFLV*
>Mle-Panxα11
M--LISSLVQFSRLSPFKEITIDDGWDQLNRSFMFVLMVICGTIVTVRQHTGNIISCNGF
TKYDGSFSEDYCWTQGLYTIREAYHVSDVNVPYPGV---IPEEIPLCLGDNC---DKLAN
SNTTRVYHLWYQWIPFYFWLASAAFFLPYLIYKRYGFGDIKPLIHMLYNPLDGDEGVKAD
SEKASIWLYHRFS-IYMNEHSMYANFMERHGIGILVIAIKVMYLIISVLLMVMTAMMFEL
ADFKQYGIVWAQQWPDPPANVTGIKDLLFPKMVACEIKRWGPTGLEDENGMCVLAPNVIN
QYIFLILWWALVFTIVSNVFNVLAGVIRIVFIYGSYRRMLASAFLRDDPHYKKVYYKIGT
SGRVILNMLAASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDD----------------
------------PLL*-------------------
$: sb Mle-Panx_align.fa -cs
>Mle-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFMFGSNISCIGF
EKLERNFVEEYCWTQGIYTSKAAYNMPLHTPYPGIAPCVPEYDPVTQKYWLPCGVEEEDK
AYHLWYQWVPFYFLAVAVGYYLPFLILKGSKLHQVKPLITYLMNQRNLETDPNHLVGKLS
HWIFRQLVYSRFAATSTIRMYWHDWGLVLLVCSVKILYLTVSLIHLFATAKMFHIGNWFT
YGIMFARRSNSHTTHVKDVFFPKMVACKIETWSFTGKNHLHGMCVLALNVMNQYLFLIVW
YVNVIIIFLNSISCIYTIVKFCSPNIVHHRIVNSSSLDDHHDFTRMFGYVGPSGRIILAK
MSEHMPGYMLKQVAKKVTEKIDIENEKNRGRAPTIKFTKVNGQPSELARQPLMHLNALML
GMVPQNLPEPKIQNIQRSQKKVRFLV
>Mle-Panxα11
MLISSLVQFSRLSPFKEITIDDGWDQLNRSFMFVLMVICGTIVTVRQHTGNIISCNGFTK
YDGSFSEDYCWTQGLYTIREAYHVSDVNVPYPGVIPEEIPLCLGDNCDKLANSNTTRVYH
LWYQWIPFYFWLASAAFFLPYLIYKRYGFGDIKPLIHMLYNPLDGDEGVKADSEKASIWL
YHRFSIYMNEHSMYANFMERHGIGILVIAIKVMYLIISVLLMVMTAMMFELADFKQYGIV
WAQQWPDPPANVTGIKDLLFPKMVACEIKRWGPTGLEDENGMCVLAPNVINQYIFLILWW
ALVFTIVSNVFNVLAGVIRIVFIYGSYRRMLASAFLRDDPHYKKVYYKIGTSGRVILNML
AASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDDPLL
>ML47742a.
ATGTTAGACATACTTTCAAAGTTTTGCTGAGTTACTCCTTTTAAAGGTATAACGATAGAT
RRRRRRRRRRRRCAACTCAATCGGAGTTTTATGTTCGTCCTGCTCGTTGTCATGGGAACG
YCTGTCACTGTCCGGCAATACACCGGCAGTGTCATCAGTTGTGACGGCTTCAAAAAGTTT
WGATCCACTTTTGCGGAGGATTACTGTTGTCCCCAGGGACTGTACACAGTTTTAGAAGGA
SATGAACCAGTCAGACTCAAGTTCCCTTACCCAGGCCTCCTTCCAGACGAGGCACCACCC
MGTACGACGGTACGAGGTTAAAGT------------------CCAGACCCTGATCAGTTG
KTGTCACCGACGCGGATATCCCACCTATGGTACCAGTGGGTCCCTTTTTACTTCTGGTTG
HCGGCTGCTGCCTTCTTCATGCCCTACCTTCTGTACA------TTGGCATGGGAGATATC
BAGCCTCTCGTGAG------ACACAATCCAGTAGAATCAGACCAGGAGTTAAAGAAGATG
VCAGACAAGGCTGCAACATGGCTGTTCTACAAGTTTGACCTGTACATGAGCGAACAGTCG
DTCCTAGCAAGTCTCACCAGAAAACACGGTCTTGGTCTATCCATGGTCTTTGTAAAGATC
NTATACGCCGCAGTGTCGTTCGGGTGTTTCCTCCTGACCGCTGAGATGTTCTCAATTGGA
XATTTTAAAACCTATGGATCAGAATGGATCAAGAAGTTAAAGTTGGAAGATAATCTAGCT
TAG---------------------------------------------------------
$: sb ambiguous_cds.fa -cs
>ML47742a.
ATGTTAGACATACTTTCAAAGTTTTGCTGAGTTACTCCTTTTAAAGGTATAACGATAGAT
RRRRRRRRRRRRCAACTCAATCGGAGTTTTATGTTCGTCCTGCTCGTTGTCATGGGAACG
YCTGTCACTGTCCGGCAATACACCGGCAGTGTCATCAGTTGTGACGGCTTCAAAAAGTTT
WGATCCACTTTTGCGGAGGATTACTGTTGTCCCCAGGGACTGTACACAGTTTTAGAAGGA
SATGAACCAGTCAGACTCAAGTTCCCTTACCCAGGCCTCCTTCCAGACGAGGCACCACCC
MGTACGACGGTACGAGGTTAAAGTCCAGACCCTGATCAGTTGKTGTCACCGACGCGGATA
TCCCACCTATGGTACCAGTGGGTCCCTTTTTACTTCTGGTTGHCGGCTGCTGCCTTCTTC
ATGCCCTACCTTCTGTACATTGGCATGGGAGATATCBAGCCTCTCGTGAGACACAATCCA
GTAGAATCAGACCAGGAGTTAAAGAAGATGVCAGACAAGGCTGCAACATGGCTGTTCTAC
AAGTTTGACCTGTACATGAGCGAACAGTCGDTCCTAGCAAGTCTCACCAGAAAACACGGT
CTTGGTCTATCCATGGTCTTTGTAAAGATCNTATACGCCGCAGTGTCGTTCGGGTGTTTC
CTCCTGACCGCTGAGATGTTCTCAATTGGAXATTTTAAAACCTATGGATCAGAATGGATC
AAGAAGTTAAAGTTGGAAGATAATCTAGCTTAG
$: sb ambiguous_cds.fa -cs strict
>ML47742a.
ATGTTAGACATACTTTCAAAGTTTTGCTGAGTTACTCCTTTTAAAGGTATAACGATAGAT
NNNNNNNNNNNNCAACTCAATCGGAGTTTTATGTTCGTCCTGCTCGTTGTCATGGGAACG
NCTGTCACTGTCCGGCAATACACCGGCAGTGTCATCAGTTGTGACGGCTTCAAAAAGTTT
NGATCCACTTTTGCGGAGGATTACTGTTGTCCCCAGGGACTGTACACAGTTTTAGAAGGA
NATGAACCAGTCAGACTCAAGTTCCCTTACCCAGGCCTCCTTCCAGACGAGGCACCACCC
NGTACGACGGTACGAGGTTAAAGTCCAGACCCTGATCAGTTGNTGTCACCGACGCGGATA
TCCCACCTATGGTACCAGTGGGTCCCTTTTTACTTCTGGTTGNCGGCTGCTGCCTTCTTC
ATGCCCTACCTTCTGTACATTGGCATGGGAGATATCNAGCCTCTCGTGAGACACAATCCA
GTAGAATCAGACCAGGAGTTAAAGAAGATGNCAGACAAGGCTGCAACATGGCTGTTCTAC
AAGTTTGACCTGTACATGAGCGAACAGTCGNTCCTAGCAAGTCTCACCAGAAAACACGGT
CTTGGTCTATCCATGGTCTTTGTAAAGATCNTATACGCCGCAGTGTCGTTCGGGTGTTTC
CTCCTGACCGCTGAGATGTTCTCAATTGGANATTTTAAAACCTATGGATCAGAATGGATC
AAGAAGTTAAAGTTGGAAGATAATCTAGCTTAG
$: sb ambiguous_cds.fa -cs strict X
>ML47742a.
ATGTTAGACATACTTTCAAAGTTTTGCTGAGTTACTCCTTTTAAAGGTATAACGATAGAT
XXXXXXXXXXXXCAACTCAATCGGAGTTTTATGTTCGTCCTGCTCGTTGTCATGGGAACG
XCTGTCACTGTCCGGCAATACACCGGCAGTGTCATCAGTTGTGACGGCTTCAAAAAGTTT
XGATCCACTTTTGCGGAGGATTACTGTTGTCCCCAGGGACTGTACACAGTTTTAGAAGGA
XATGAACCAGTCAGACTCAAGTTCCCTTACCCAGGCCTCCTTCCAGACGAGGCACCACCC
XGTACGACGGTACGAGGTTAAAGTCCAGACCCTGATCAGTTGXTGTCACCGACGCGGATA
TCCCACCTATGGTACCAGTGGGTCCCTTTTTACTTCTGGTTGXCGGCTGCTGCCTTCTTC
ATGCCCTACCTTCTGTACATTGGCATGGGAGATATCXAGCCTCTCGTGAGACACAATCCA
GTAGAATCAGACCAGGAGTTAAAGAAGATGXCAGACAAGGCTGCAACATGGCTGTTCTAC
AAGTTTGACCTGTACATGAGCGAACAGTCGXTCCTAGCAAGTCTCACCAGAAAACACGGT
CTTGGTCTATCCATGGTCTTTGTAAAGATCXTATACGCCGCAGTGTCGTTCGGGTGTTTC
CTCCTGACCGCTGAGATGTTCTCAATTGGAXATTTTAAAACCTATGGATCAGAATGGATC
AAGAAGTTAAAGTTGGAAGATAATCTAGCTTAG