Lab session 2

This lab focuses mainly on regular expressions and MEME.

It is important to learn basic Bash (mainly the usage of grep, awk, and sed) to use regular expressions efficiently.

task 1 : Quickly watch this video at 1.5x speed and understand how shell scripting works and how to write functions and loops

To learn Bash, go through this video (3 hrs), which starts from the basics: Bash scripting video link

While watching the video mainly focus on

task 2 : Examples for regex patterns

To learn regular expressions with grep, awk, and sed: grep, awk and sed video link

To learn specialised regex pattern examples: Video

From the videos above, you have learnt:

Bash scripting
Regex patterns

Now we will apply this knowledge to life science problems.

task 3 : Understanding the importance of grep with regex using bioinformatics examples link

task 4 : Understanding the importance of sed with regex using bioinformatics examples link

task 5 : Understanding the importance of awk with regex using bioinformatics examples link

After understanding the grep, sed, and awk examples above, visit this cheat sheet: Regex cheat sheet. Check whether you are familiar with all of the following:

Anchors
Character class
Quantifiers
Escape characters
String replacements
Groups and ranges
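As a quick self-check, the categories above can be tried on a few toy sequences. This is only a minimal sketch: the sequences and variable names below are made up for illustration and do not come from the cheat sheet.

```shell
# Hypothetical sample data: three short DNA strings, one per line.
seqs=$(printf 'ATGAAATAG\nCCGAATTCGG\nATGNNNTGA\n')

# Anchor: count lines that start with the start codon ATG.
anchored=$(printf '%s\n' "$seqs" | grep -c '^ATG')

# Character class + quantifier: lines made up only of A, C, G or T.
clean=$(printf '%s\n' "$seqs" | grep -cE '^[ACGT]+$')

# Group with alternation: lines that end with a stop codon.
stops=$(printf '%s\n' "$seqs" | grep -cE '(TAA|TAG|TGA)$')

# Escape character: \. matches a literal dot, not "any character".
escaped=$(printf 'LR794089.1\nLR794089x1\n' | grep -c 'LR794089\.1')

# String replacement: sed rewrites every T as U (DNA to RNA).
rna=$(printf 'ATGTTT\n' | sed 's/T/U/g')

echo "anchored=$anchored clean=$clean stops=$stops escaped=$escaped rna=$rna"
```

If any of these counts surprises you, revisit that category in the cheat sheet before moving on to the practice questions.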

Practice questions

Search for a restriction enzyme, pick its recognition site sequence, and find how many times that site is present in the E. coli genome. Example: EcoRI, GAATTC

Hint

Download the E. coli genome in FASTA format here
Search using grep
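One possible way to follow this hint is sketched below. It uses a tiny made-up FASTA file in place of the real genome, so the file contents and variable names are assumptions for illustration only. The sketch shows why `grep -c` alone can undercount: it counts matching lines, not matches, and it cannot see a site that is split across two sequence lines.

```shell
# Hypothetical stand-in for the downloaded genome; replace with your own file.
fasta=$(mktemp)
printf '>demo\nAAGAATTCAAGAA\nTTCGGGAATTCGAATTC\n' > "$fasta"

# grep -c counts matching LINES, so two sites on one line count only once.
line_count=$(grep -v '^>' "$fasta" | grep -c 'GAATTC')

# grep -o prints every match on its own line; wc -l then counts the sites.
site_count=$(grep -v '^>' "$fasta" | grep -o 'GAATTC' | wc -l | tr -d ' ')

# Joining the sequence lines first also catches sites split by a line break.
joined_count=$(grep -v '^>' "$fasta" | tr -d '\n' | grep -o 'GAATTC' | wc -l | tr -d ' ')

echo "lines: $line_count, sites: $site_count, sites after joining: $joined_count"
rm -f "$fasta"
```

On the toy file the three numbers differ (2, 3, and 4), so for a real genome the last form is the safest of the three. Because GAATTC is its own reverse complement, counting on one strand is enough for EcoRI.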

Regular expressions for biologists
Assume a biologist comes to you with a file of more than 1000 coding sequences from a prokaryote and asks you to pick the ORF region for each gene. How do you pick the ORF regions?

The file consists of sequences like:

>lcl|LR794089.1_cds_CAB3563250.1_1 [gene=mutS] [protein=Methyl-directed mismatch repair] [frame=2] [protein_id=CAB3563250.1] [location=<1..>498] [gbkey=CDS]
CGCCATCCGGTGGTTGAACAGGTACTGAACGAGCCATTTATCGCCAACCCGCTGAACCTGTCGCCGCAGC
GTCGCATGTTGATCATTACCGGTCCGAATATGGGCGGTAAAAGTACCTATATGCGCCAGACCGCACTGAT
TTGTTTGCTACCCATTATTTCGAGCTGACCCAGTTACCGGAGAAAATGGAAGGCGTGGCTAACGTGCATC
TCGATGC

>lcl|LR794088.1_cds_CAB3563248.1_1 [gene=mutS] [protein=Methyl-directed mismatch repair] [frame=2] [protein_id=CAB3563248.1] [location=<1..>498] [gbkey=CDS]
CGCCATCCGGTAGTTGAACAAGTACTGAATGAGCCATTTATCGCTAACCCGCTGAATCTGTCGCCGCAGC
GCCGTATGTTGATCATCACCGGTCCGAACATGGGCGGTAAAAGTACCTATATGCGCCAGACCGCGTTGAT
CTGTTTGCCACCCACTATTTCGAGCTGACACAGTTACCGGAGAAAATGGAAGGCGTCGCCAACGTGCATC
TCGATGC

Hint

First linearize the FASTA file, which means printing each header on one line, followed by its whole sequence on one line:

>lcl|LR794089........
CGCCATCCGGTGGTTGAA......
>lcl|LR794088.1_.......
CGCCATCCGGTAGTT.........
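A minimal sketch of the linearization step with awk follows. The two records are invented toy data, and the variable names are placeholders; the idea is to buffer sequence lines and print them as one line whenever the next header (or the end of the file) is reached.

```shell
# Hypothetical two-record FASTA file with wrapped sequence lines.
fasta=$(mktemp)
printf '>seq1 demo\nATGAAA\nCCCTAG\n>seq2 demo\nGGGATG\nTTTTGA\n' > "$fasta"

linear=$(awk '
  # Header line: flush the buffered sequence, print the header, reset the buffer.
  /^>/ { if (seq != "") print seq; print; seq = ""; next }
  # Sequence line: append it to the buffer without a newline.
  { seq = seq $0 }
  # End of input: flush the last sequence.
  END { if (seq != "") print seq }
' "$fasta")

printf '%s\n' "$linear"
rm -f "$fasta"
```

For a real file, the same awk program can be run directly on it and redirected to a new file, so that each record occupies exactly two lines.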

Use the logic that a coding sequence starts with the start codon ATG and ends with a stop codon (TAA, TAG, or TGA). Try to pick the region between the start codon and the stop codon using grep.
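One way to express this logic is sketched below, assuming GNU grep built with PCRE support (the `-P` option is not available in every grep, for example the default one on macOS). The sequence is a made-up single line, not one of the records above. The pattern reads codon by codon, so it stops at the first in-frame stop codon rather than at any TAA, TAG, or TGA.

```shell
# Hypothetical linearized sequence line (toy data).
seq='CCATGAAACCCTAGGGTGATT'

# ATG, then as few whole codons as possible (lazy *?), then a stop codon.
# -o prints only the matched region; head keeps the first ORF found.
orf=$(printf '%s\n' "$seq" | grep -oP 'ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)' | head -n 1)

echo "$orf"
```

Note that the TGA directly overlapping the start codon (in ATGA) is skipped because it is out of frame. With plain `grep -E` there is no lazy quantifier, so the same pattern without `?` would extend to the last in-frame stop codon instead of the first.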
