File Handling
This script takes a multi-FASTA file as input and writes all the accession numbers to a new file (accession_no.txt)
$ python extract_accession_no.py <Multi_FASTA_File>
This script takes a (multi-)FASTA file as input and writes the sequence headers to a new file (fasta_headers.txt)
$ python extract_fasta_headers.py <Multi_FASTA_File>
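The header and accession extraction described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names are hypothetical, and it assumes the accession number is the first whitespace-delimited token of each header line.

```python
def read_headers(lines):
    """Yield each FASTA header line without the leading '>'."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].strip()


def accession_of(header):
    """Return the accession number of a header.

    Assumption: the accession is the first whitespace-delimited token,
    e.g. 'NC_000913.3' in 'NC_000913.3 Escherichia coli'.
    """
    parts = header.split()
    return parts[0] if parts else ""


def extract_accessions(lines):
    """Return the accession numbers of all records, in file order."""
    return [accession_of(header) for header in read_headers(lines)]
```

Passing an open file handle works the same way as passing a list of lines, since both are iterables of strings.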
This script extracts the FASTA records from a multi-FASTA file whose accession numbers are listed in the accession IDs file
$ python extract_fasta_records.py <Multi_FASTA_File> <Accession_IDs_File>
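One way to filter records by accession ID is sketched below. The parser and the `select_records` helper are hypothetical names, and the sketch assumes each ID matches the first whitespace-delimited token of a header.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(fasta_lines, wanted_ids):
    """Return the records whose accession is in wanted_ids.

    Assumption: the accession is the first token of the header.
    """
    wanted = set(wanted_ids)
    selected = []
    for header, sequence in parse_fasta(fasta_lines):
        tokens = header.split()
        if tokens and tokens[0] in wanted:
            selected.append((header, sequence))
    return selected
```

Storing the IDs in a set keeps each lookup constant-time, which matters when the ID list is long.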
This script extracts the FASTA record whose accession number is entered by the user from a multi-FASTA file and writes the record to a new file (NC_XXXXXX.fasta)
$ python fasta_record_finder.py <Multi_FASTA_File>
This script merges all the files with the .fasta extension and creates a new file (multi_fasta)
$ python fasta_concatenator.py
This script splits a multi-FASTA file into individual FASTA files
$ python multi_fasta_deconcatenator.py <Multi_FASTA_File>
This script compares two files and returns the elements present in one file but not in the other
$ python file_comparison.py -f1 <File_1> -f2 <File_2>
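The comparison above amounts to a set difference in each direction. A minimal sketch, assuming one element per line and ignoring blank lines (the function names are hypothetical):

```python
def read_elements(lines):
    """Collect the non-empty, whitespace-stripped lines into a set."""
    return {line.strip() for line in lines if line.strip()}


def compare(lines_1, lines_2):
    """Return (only_in_first, only_in_second) as sorted lists."""
    first, second = read_elements(lines_1), read_elements(lines_2)
    return sorted(first - second), sorted(second - first)
```

Because sets are used, duplicate lines within a file are collapsed and the original line order is not preserved.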
This script downloads all the files whose FTP addresses are listed in the ftpfilepaths file
$ python ftp_download.py <ftpfilepaths>
This script takes a multi-FASTA file with gene sequences and concatenates them according to the accession ID (as shown below)
Multi-Fasta file [ INPUT ]
>ECO_1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>ECO_2
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
>ECO_3
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
>SAL_1
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
>SAL_2
GCGCGCGGGCGCGCGCGCGCGCGCGCGCGCGC
>SAL_3
TATATTATATATTATATATTTATATAATAATA
concatenated_seq.fasta file [ OUTPUT ]
>ECO
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
>SAL
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GCGCGCGGGCGCGCGCGCGCGCGCGCGCGCGC
TATATTATATATTATATATTTATATAATAATA
$ python seq_concatenator.py <Multi-Fasta>
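The grouping shown in the example can be sketched as below. This is an illustration under an assumption, not the script's actual logic: the group key is taken to be the header text before the last underscore (so `ECO_1` and `ECO_2` both map to `ECO`). All function names are hypothetical.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_prefix(lines):
    """Map each ID prefix to its sequences, in order of appearance.

    Assumption: the prefix is everything before the last underscore.
    """
    groups = {}
    for header, sequence in parse_fasta(lines):
        key = header.rsplit("_", 1)[0]
        groups.setdefault(key, []).append(sequence)
    return groups


def format_groups(groups):
    """Render one record per prefix, one member sequence per line."""
    out = []
    for key, sequences in groups.items():
        out.append(">" + key)
        out.extend(sequences)
    return "\n".join(out) + "\n"
```

Writing each member sequence on its own line mirrors the output shown above; since FASTA readers join the lines of a record, the result is read back as one concatenated sequence per prefix.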
Program to extract the nucleotide or protein subsequence at a given index range (e.g. 200...300) from a FASTA file
$ python extract_seq.py <file.fasta>
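Extracting a range comes down to a string slice once the coordinate convention is fixed. The sketch below assumes 1-based, inclusive positions, which is common in sequence databases; the script itself may use a different convention, and `extract_range` is a hypothetical name.

```python
def extract_range(sequence, start, end):
    """Return sequence[start..end] using 1-based, inclusive positions.

    Assumption: coordinates are 1-based and inclusive on both ends.
    """
    if start < 1 or end < start or end > len(sequence):
        raise ValueError("range %d...%d is outside 1...%d" % (start, end, len(sequence)))
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]
```

Validating the range up front avoids the silent truncation that a plain Python slice would perform when `end` exceeds the sequence length.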
Compare two BED files for sequence overlaps
$ python compare_bed.py file1.bed file2.bed
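A rough sketch of an overlap check between two BED files follows. BED intervals are 0-based and half-open, so two intervals on the same chromosome overlap when each starts before the other ends. The function names are hypothetical, and the nested loop is chosen for clarity rather than speed.

```python
def parse_bed(lines):
    """Return (chrom, start, end) tuples from BED lines, skipping headers."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def overlaps(bed_a, bed_b):
    """Return the overlapping region of every overlapping interval pair."""
    hits = []
    for chrom_a, start_a, end_a in bed_a:
        for chrom_b, start_b, end_b in bed_b:
            # Half-open intervals: touching ends (end == start) do not overlap.
            if chrom_a == chrom_b and start_a < end_b and start_b < end_a:
                hits.append((chrom_a, max(start_a, start_b), min(end_a, end_b)))
    return hits
```

For large files, sorting both interval lists and sweeping through them once would be considerably faster than comparing every pair.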
Download all the files whose IDs are listed in a gdc_manifest file (downloaded from the TCGA GDC portal)
$ python gdc_download.py <gdc_manifest.txt>
Convert a Multiple Sequence Alignment (MSA) file in Clustal Omega format to FASTA format
$ python clustal_to_fasta.py <file.clustal_num> <file.fasta>
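The conversion can be sketched as below. The sketch makes several assumptions about the input layout: a first line starting with `CLUSTAL`, alignment rows of the form `name sequence [count]`, and conservation lines that begin with whitespace. `clustal_to_fasta` is a hypothetical name, and gap characters are kept so the output remains an aligned FASTA.

```python
def clustal_to_fasta(lines):
    """Return {name: aligned_sequence} assembled from Clustal blocks."""
    chunks = {}
    for line in lines:
        # Skip blank lines, the CLUSTAL header, and conservation lines
        # (assumed to start with whitespace).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, piece = fields[0], fields[1]
        chunks.setdefault(name, []).append(piece)
    return {name: "".join(pieces) for name, pieces in chunks.items()}


def to_fasta_text(records):
    """Render {name: sequence} as FASTA text, one sequence line per record."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in records.items())
```

For production use, a tested parser such as the one in Biopython's `AlignIO` module is likely a safer choice than hand-rolled parsing.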
Convert a Multiple Sequence Alignment (MSA) file in Clustal Omega format to .tsv format
$ python clustal_to_tsv.py <file.clustal_num>
Feed sequence data as a hash into a MySQL database using the Python connector
$ python fasta2db_feed.py <sequence.fasta>
Feed a hash into a MySQL database using the Python connector
$ python mysqldb_find.py
Convert sequences in FASTQ format to FASTA format
$ python fastq2fasta.py <seq.fastq>
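A minimal sketch of the FASTQ to FASTA conversion follows. It assumes strict four-line FASTQ records (header, sequence, `+` separator, quality) with no line wrapping; `fastq_to_fasta` is a hypothetical name and may differ from what the script does.

```python
def fastq_to_fasta(lines):
    """Return FASTA lines built from four-line FASTQ records.

    Assumption: every record is exactly four lines and unwrapped.
    """
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    fasta = []
    for i in range(0, len(rows) - 3, 4):
        header, sequence, separator, _quality = rows[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (i + 1))
        # Swap the '@' marker for '>' and drop the quality information.
        fasta.append(">" + header[1:])
        fasta.append(sequence)
    return fasta
```

The quality string is discarded, so the conversion cannot be reversed without supplying new quality values.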
$ python fasta2fastq.py -f <sequence.fasta> -l <read_length> -x <coverage> -o <sequence.fastq>