SOC2020 - Sofia Robb

Sequencing of new genomes has become commonplace. In this episode of SOC, Sofia Robb will discuss available open source methods for sharing genome-scale data if it is not feasible to share it through standard databases such as Ensembl. She will also demonstrate helpful ways to mine Ensembl data with Biomart and useful UNIX command line tricks for sorting, searching, and reformatting text data files.

Talks

Genome Tools Talk
Career Path Talk

Hands on Workshop: Let's LARP

Live Action Role Playing

You work with chickens and have completed an RNAseq experiment. You have two conditions,

condition 1: g1 = 'h3.3a-/-, h3.3b-/-'
condition 2: g2 = 'wild type genotype'

You performed differential expression analysis, perhaps with cuffdiff.

Part 1: You are going to download your data.
Part 2: You are going to create up- and down-regulated gene lists.
Part 3: You are going to find out more information about what genes are in your gene lists.
Part 4: You are going to search your gene list for genes involved in your favorite processes.

Part 1: Get expression data

Let's get the expression data from the EBI Expression Atlas.

Part 1 Tasks:

Go to EBI Expression Atlas
Select chicken
Check box to download the first experiment, "RNA-seq of H3.3 knockout and wild type chicken DT40 cells". If you get lost, directly download here
Click the download link at the top of the last column.
Navigate to E-MTAB-2754 directory
Check out the contents of E-MTAB-2754-analytics.tsv

Contents of E-MTAB-2754-analytics.tsv:

$ head E-MTAB-2754-analytics.tsv
Gene ID	Gene Name	g1_g2.p-value	g1_g2.log2foldchange
ENSGALG00000000003	PANX2	0.100242375805959	-0.4
ENSGALG00000000011	C10orf88	0.0802046773105167	0.2
ENSGALG00000000038	CTRB2	NA	0.2
ENSGALG00000000044	WFIKKN1	NA	0
ENSGALG00000000048		0.288103121752422	0.4
ENSGALG00000000055	LAMTOR3	0.529728058895927	0.1
ENSGALG00000000059	TUBB3	0.228430079834946	-0.2
ENSGALG00000000067	SPR	0.0560358954256604	-0.4
ENSGALG00000000071		0.878861305389193	0

Part 2: Create lists of up- and down-regulated genes

What do people usually want to do with differential expression data?
They usually want to find the top up-regulated genes and the top down-regulated genes.

Let's do it!!

Where do we start?

Part 2 Tasks:

We want to make sure we are only looking at data points that are statistically significant, p-value < 0.001.
a. Sort expression file by p-value
b. Keep only the lines that have a p-value < 0.001.
Now let's find our most up- and down-regulated genes, which means we need to sort by the log2foldchange column (4th column).
a. Sort file by log2foldchange
b. Get the top 100 up/down-regulated genes
c. Get a list of all the genes with the most significant changes
d. Do it a different way
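
One possible way to work through the Part 2 tasks with standard UNIX tools is sketched below. It is not necessarily the workshop's own solution. It runs on a tiny invented table with the same four columns as E-MTAB-2754-analytics.tsv, treats "significant" as a p-value below 0.001, and uses N=2 instead of 100 so the result is easy to check by eye. The gene IDs, values, and the `toy-analytics.tsv`, `top_up_regulated.tsv`, `top_dn_regulated.tsv`, and `up_ids.txt` file names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of Part 2 on toy data. All gene IDs and values are invented.
set -eu
export LC_ALL=C                 # make sort behave the same everywhere
tab=$(printf '\t')              # a literal tab, used as the sort delimiter
workdir=$(mktemp -d)

# Toy stand-in for E-MTAB-2754-analytics.tsv (same four columns).
{
  printf 'Gene ID\tGene Name\tg1_g2.p-value\tg1_g2.log2foldchange\n'
  printf 'GENE_A\tAAA\t0.0001\t2.5\n'
  printf 'GENE_B\tBBB\t0.5\t3.0\n'
  printf 'GENE_C\tCCC\tNA\t0.2\n'
  printf 'GENE_D\tDDD\t0.00002\t-1.8\n'
  printf 'GENE_E\t\t0.0009\t0.7\n'
  printf 'GENE_F\tFFF\t1e-05\t-0.4\n'
} > "$workdir/toy-analytics.tsv"

# 1a. Drop the header, then sort by p-value (column 3).
#     -g is a general numeric sort that also understands values like 1e-05.
tail -n +2 "$workdir/toy-analytics.tsv" \
  | sort -t "$tab" -k3,3g > "$workdir/pvalue_sorted.tsv"

# 1b. Keep only rows with a real (non-NA) p-value below 0.001.
awk -F'\t' '$3 != "NA" && $3 + 0 < 0.001' \
  "$workdir/pvalue_sorted.tsv" > "$workdir/pvalue_sorted_significant_only.tsv"

# 2a. Sort the significant rows by log2foldchange (column 4), largest first.
sort -t "$tab" -k4,4gr \
  "$workdir/pvalue_sorted_significant_only.tsv" > "$workdir/logfold_sorted.tsv"

# 2b. The top N rows are the most up-regulated, the bottom N the most
#     down-regulated. Use n=100 for the real file.
n=2
head -n "$n" "$workdir/logfold_sorted.tsv" > "$workdir/top_up_regulated.tsv"
tail -n "$n" "$workdir/logfold_sorted.tsv" > "$workdir/top_dn_regulated.tsv"

# 2c. Pull out just the gene IDs (column 1) for later lookups.
cut -f1 "$workdir/top_up_regulated.tsv" > "$workdir/up_ids.txt"

cat "$workdir/top_up_regulated.tsv"
```

Using `-F'\t'` in `awk` and `-t` with a tab in `sort` matters here because some rows have an empty Gene Name, and whitespace splitting would shift the columns on those rows. If fewer than 2N genes pass the p-value filter, the head and tail lists can overlap or contain fold changes of the opposite sign, so it is worth glancing at the fourth column of each list.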

Part 3: Find out more about your up- and down-regulated genes.

Now what are these genes?
We are going to mine gene info data from Ensembl BioMart. BioMart is a SUPER handy tool (if your organism is in Ensembl).

Ensembl has 6 different sites for different groups of organisms:
Ensembl (vertebrates)
Ensembl Plants
Ensembl Fungi
Ensembl Bacteria
Ensembl Metazoa
Ensembl Protists

Let's find out more about chicken genes using Ensembl's BioMart tool.

Part 3 Tasks:

Retrieve the gene ID, gene name, and gene description, plus the InterPro ID, InterPro short description, and InterPro description, for every chicken gene. Need Help?
Find the gene information about our up-regulated genes.
Find the gene information about our down-regulated genes.
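
Once the BioMart export is on disk, looking up the genes in a list comes down to keeping the export rows whose first column appears in that list. The sketch below shows one way to do that with `awk`. It uses invented data, and the column headers, gene IDs, and the `up_ids.txt` and `gene_info_upregulated.tsv` file names are assumptions for illustration. A real export may label its columns differently and often has several rows per gene, one per InterPro entry.

```shell
#!/bin/sh
# Sketch: pull BioMart rows for a list of gene IDs. Toy data only.
set -eu
workdir=$(mktemp -d)

# Toy stand-in for mart_export.txt (tab-separated, gene ID in column 1).
{
  printf 'Gene stable ID\tGene name\tGene description\tInterPro ID\n'
  printf 'GENE_A\tAAA\ttoy description A\tTOY_IPR_1\n'
  printf 'GENE_A\tAAA\ttoy description A\tTOY_IPR_2\n'
  printf 'GENE_B\tBBB\ttoy description B\tTOY_IPR_3\n'
  printf 'GENE_E\t\ttoy description E\t\n'
} > "$workdir/mart_export.txt"

# Toy list of up-regulated gene IDs, one per line.
printf 'GENE_A\nGENE_E\n' > "$workdir/up_ids.txt"

# First file: remember every wanted ID.
# Second file: print the header line plus any row whose column 1 is wanted.
awk -F'\t' '
  NR == FNR { want[$1]; next }
  FNR == 1 || ($1 in want)
' "$workdir/up_ids.txt" "$workdir/mart_export.txt" \
  > "$workdir/gene_info_upregulated.tsv"

cat "$workdir/gene_info_upregulated.tsv"
```

Matching on column 1 only, rather than searching the whole line, avoids accidental hits where an ID happens to appear inside a description. The same command works for the down-regulated list by swapping in a different ID file.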

Part 4: Searching for Genes with GO terms

Are any of our up- or down-regulated genes involved in a process you are super interested in?

Of our most significant up- and down-regulated genes, are any involved in stem cell proliferation (GO:0072089) or pigmentation (GO:0043473)?

Part 4 Tasks:

Get a list of genes involved in stem cell proliferation (GO:0072089). Need Help?
Are any of our up-regulated genes also in our list of genes involved in stem cell proliferation? Need Help?
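
Checking whether two gene lists overlap is a set intersection, and `comm` handles it once both lists are sorted the same way. The sketch below is a minimal example on invented data. The gene IDs, the export's column headers, and the `up_ids.txt` and `shared.txt` file names are assumptions for illustration, with `proliferation_mart_export.txt` standing in for a BioMart export filtered on GO:0072089.

```shell
#!/bin/sh
# Sketch: intersect up-regulated gene IDs with a GO-term gene list. Toy data only.
set -eu
export LC_ALL=C                 # comm needs both inputs sorted in the same order
workdir=$(mktemp -d)

# Toy list of up-regulated gene IDs, one per line.
printf 'GENE_G\nGENE_A\nGENE_E\n' > "$workdir/up_ids.txt"

# Toy stand-in for a BioMart export of genes annotated with GO:0072089.
{
  printf 'Gene stable ID\tGene name\tGO term accession\n'
  printf 'GENE_E\tEEE\tGO:0072089\n'
  printf 'GENE_Z\tZZZ\tGO:0072089\n'
  printf 'GENE_A\tAAA\tGO:0072089\n'
} > "$workdir/proliferation_mart_export.txt"

# Reduce each input to a sorted, de-duplicated list of IDs.
sort -u "$workdir/up_ids.txt" > "$workdir/up.sorted"
tail -n +2 "$workdir/proliferation_mart_export.txt" \
  | cut -f1 | sort -u > "$workdir/proliferation.sorted"

# comm -12 suppresses lines unique to either file, leaving only shared IDs.
comm -12 "$workdir/up.sorted" "$workdir/proliferation.sorted" \
  > "$workdir/shared.txt"

cat "$workdir/shared.txt"
```

An empty `shared.txt` means none of the up-regulated genes carry the GO term. Repeating the steps with the down-regulated IDs, or with a pigmentation (GO:0043473) export, answers the other versions of the question.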
