The following is the stepwise procedure used to work through the shared programming exercise.
- Create your project directory, named Project_dengue, with the subdirectories needed for a bioinformatics project:
mkdir -p Project_dengue/{raw_data/compressed,processed_data,scripts}
- Put the downloaded zip file into the compressed subdirectory of raw_data and unzip it.
- First move into the compressed directory:
cd Project_dengue/raw_data/compressed/
- Then copy the file using the cp command, as below:
cp /mnt/c/Users/micro/Desktop/Msc/lisso/dengue.zip .
- Extract it with the unzip command, which can be installed with sudo apt install unzip if it is not already installed.
- Use unzip with the -d option to specify the output directory:
unzip dengue.zip -d /home/genomics/Project_dengue/raw_data/
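The setup steps above can also be sketched without brace expansion, which some shells (for example dash) do not support. This is a minimal sketch, assuming a scratch directory from mktemp stands in for the real working location; check_unzip is a hypothetical helper name introduced here for illustration only.

```shell
# Work in a scratch directory (assumption) so the real project is untouched.
scratch=$(mktemp -d)
cd "$scratch"

# Same layout as the brace-expansion command, with each path spelled out.
mkdir -p Project_dengue/raw_data/compressed \
         Project_dengue/processed_data \
         Project_dengue/scripts

# List the directories that were created.
find Project_dengue -type d | sort

# check_unzip is a hypothetical helper: report whether unzip is on the PATH.
check_unzip() {
    if command -v unzip >/dev/null 2>&1; then
        echo "unzip is installed"
    else
        echo "unzip is missing; install it with: sudo apt install unzip"
    fi
}
check_unzip
```

Running the check first avoids calling sudo apt install when unzip is already available.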
Use the unzip command with the -l option to list the contents of the zipped file without extracting it.
unzip -l dengue.zip
Output
Archive:  ./dengue.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     2795  2024-11-19 09:23   dengueseq1.fasta
     8006  2024-11-19 09:23   dengueseq2.fasta
     6476  2024-11-19 09:24   dengueseq3.fasta
     7844  2024-11-19 09:24   dengueseq4.fasta
    10936  2024-11-19 09:20   dengueseq5.fasta
---------                     -------
    36057                     5 files
Use wc with the -l option to count only the lines in each file with the .fasta extension in the raw_data directory, where the extracted files were placed.
wc -l *.fasta
Output
42 dengueseq1.fasta
115 dengueseq2.fasta
94 dengueseq3.fasta
113 dengueseq4.fasta
157 dengueseq5.fasta
521 total
Use cat to combine all the files and pipe (|) the output to wc -l to count the lines of all files combined.
cat *.fasta | wc -l
Output
521
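The two counting steps above can be tried on toy data first. The sketch below is illustrative only: the file names toy1.fasta and toy2.fasta and their contents are made up for the example, and the scratch directory from mktemp is an assumption.

```shell
# Create two small toy FASTA files in a scratch directory (made-up data).
demo=$(mktemp -d)
cd "$demo"
printf '>seq1 toy virus A\nACGT\nACGT\n' > toy1.fasta
printf '>seq2 toy virus B\nGGCC\n' > toy2.fasta

# Per-file line counts, followed by a "total" line.
wc -l *.fasta

# Combined count through a pipe; tr strips the padding some wc versions add.
total=$(cat *.fasta | wc -l | tr -d ' ')
echo "total lines: $total"
```

The piped count should match the total line reported by wc -l *.fasta, which is a quick consistency check before merging.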
Use the cat command and redirect (>) the output to a new file, dengue_merged.fasta.
cat *.fasta > dengue_merged.fasta
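Use the grep command with the -c flag to count the header lines (those containing >), which gives the number of sequences in the merged file.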
grep -c ">" dengue_merged.fasta
Output
5
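The merge-and-count steps can be sketched on toy data as follows. The file names and sequences are made up for illustration. Two small variations from the commands above are assumptions of this sketch, not part of the exercise: the inputs are named explicitly, because on a re-run the glob *.fasta would also match the merged file, and the pattern is anchored with ^ so that only lines beginning with > are counted.

```shell
# Toy input files in a scratch directory (made-up data).
demo=$(mktemp -d)
cd "$demo"
printf '>ID001.1 Toy virus 1, complete genome\nACGT\nGGCA\n' > toy1.fasta
printf '>ID002.1 Toy virus 2, complete genome\nTTAG\n' > toy2.fasta

# Name the inputs explicitly so a re-run never reads the merged file itself.
cat toy1.fasta toy2.fasta > toy_merged.fasta

# Count header lines; each FASTA record has exactly one.
n_seqs=$(grep -c '^>' toy_merged.fasta)
echo "sequences: $n_seqs"
```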
Use the grep command and redirect the output to a new file. You can use cat to view the contents of the new file, dengue_headers.txt.
grep ">" dengue_merged.fasta > dengue_headers.txt
Use awk to select the header text before the first comma, and pipe the output to sed to remove the identifiers, leaving only the virus names.
awk -F '[>,]' '{print $2}' dengue_headers.txt | sed 's/^[^ ]* //' > viruses.txt
Use the awk command to extract the first column of each header, which is the sequence identifier.
awk -F '>' '{print $2}' dengue_headers.txt | awk '{print $1}' > identifiers.txt
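To see what the two pipelines above extract, they can be run on a pair of toy headers. This is a sketch under a stated assumption: the headers are taken to look like ">IDENTIFIER Virus name, description", which is what the field separators in the commands imply. The identifiers and names below are made up.

```shell
# Toy header file in a scratch directory (made-up identifiers and names).
demo=$(mktemp -d)
cd "$demo"
printf '>ID001.1 Toy virus 1, complete genome\n' > toy_headers.txt
printf '>ID002.1 Toy virus 2, complete genome\n' >> toy_headers.txt

# Field 2 is the text between ">" and the first comma;
# sed then drops the first space-delimited word (the identifier).
awk -F '[>,]' '{print $2}' toy_headers.txt | sed 's/^[^ ]* //' > toy_viruses.txt

# Split on ">", then keep the first whitespace-delimited word (the identifier).
awk -F '>' '{print $2}' toy_headers.txt | awk '{print $1}' > toy_identifiers.txt

cat toy_viruses.txt
cat toy_identifiers.txt
```

With these toy headers, toy_viruses.txt holds "Toy virus 1" and "Toy virus 2", and toy_identifiers.txt holds "ID001.1" and "ID002.1". Headers that do not follow the assumed layout would need different separators.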
Use grep -v to invert the match, or sed with the d command, to delete all header lines.
grep -v '^>' dengue_merged.fasta > dengue_seq.txt
or
sed '/^>/d' <dengue_merged.fasta > dengue_seq.txt
Use the tr command to convert the sequences from uppercase to lowercase.
tr '[:upper:]' '[:lower:]' < dengue_seq.txt > dengue_seq_lowercase.txt
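The header-removal and lowercase steps can be checked on toy data. The sketch below uses made-up sequences and file names, and a scratch directory from mktemp is an assumption. It confirms that the grep -v and sed approaches give the same result before converting the case.

```shell
# Toy merged FASTA file in a scratch directory (made-up data).
demo=$(mktemp -d)
cd "$demo"
printf '>ID001.1 Toy virus 1\nACGT\nGGCA\n>ID002.1 Toy virus 2\nTTAG\n' > toy_merged.fasta

# Two equivalent ways to drop the header lines.
grep -v '^>' toy_merged.fasta > seq_grep.txt
sed '/^>/d' toy_merged.fasta > seq_sed.txt

# cmp exits with status 0 only when the two files are identical.
cmp seq_grep.txt seq_sed.txt && echo "grep -v and sed agree"

# Convert the remaining sequence lines to lowercase.
tr '[:upper:]' '[:lower:]' < seq_grep.txt > seq_lower.txt
cat seq_lower.txt
```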