David DeTomaso edited this page Mar 1, 2016 · 2 revisions

What is a Signature?

A Signature is a set of genes with some common characteristic or function. One example is the set of genes involved in coagulation.

Alternately, signatures can describe changes in gene expression between two conditions. For example, a signature could describe "When comparing dendritic cells 4 hours post LPS stimulation, Genes A, B, C, D ... are up-regulated and Genes X, Y, Z ... are down-regulated."

Where can I find signatures?

A great resource for signatures is MSigDB, maintained by the Broad Institute.

By using the "Search Gene Sets" dialog, many signatures can be downloaded at once in the .gmt file format, which can be input directly into FastProject without modification.

Can I create my own signatures?

Yes, you can create your own signatures. See the section below for information on correctly formatting the signature file.

Acceptable Signature File Formats (file extension matters, see below)

List of gene signatures

Text, tab-delimited. Genes should match the row labels in the input data matrix (matching is case-insensitive).
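As an illustration of what case-insensitive matching means in practice, here is a minimal Python sketch. It is not FastProject's own code: the function name `match_genes` and the sample row labels are hypothetical, and upper-casing both sides is just one simple way to ignore case.

```python
def match_genes(signature_genes, row_labels):
    """Return the row labels that match signature genes, ignoring case."""
    # Map the upper-cased form of each label to the label as written in the matrix.
    label_lookup = {label.upper(): label for label in row_labels}
    matched = []
    for gene in signature_genes:
        key = gene.upper()
        if key in label_lookup:
            matched.append(label_lookup[key])
    return matched


# Hypothetical row labels from an input data matrix (mixed case on purpose).
row_labels = ["Tek", "Agpat5", "Ubr1", "Actb"]
signature_genes = ["TEK", "AGPAT5", "HSPA4L", "UBR1"]

print(match_genes(signature_genes, row_labels))  # ['Tek', 'Agpat5', 'Ubr1']
```

Signature genes with no counterpart in the matrix (HSPA4L here) are simply left out of the result.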

Format A (File ends in .txt extension)

One gene per line (each signature has many lines)

<Signature Name> TAB <Signature Sign> TAB <Gene Name>

<Signature Sign> must be "plus", "minus", or "both" (use "both" for unsigned genes).

Alternately, you can just omit the second column and all genes will be treated as unsigned.

Example:

Lin-neg_cell_vs_NKT_cell plus TEK
Lin-neg_cell_vs_NKT_cell plus AGPAT5
Lin-neg_cell_vs_NKT_cell plus HSPA4L
Lin-neg_cell_vs_NKT_cell plus FAM126A
Lin-neg_cell_vs_NKT_cell minus UBR1
Lin-neg_cell_vs_NKT_cell minus CYSLTR2
Lin-neg_cell_vs_NKT_cell minus ZNF205
Lin-neg_cell_vs_NKT_cell minus UBXN11
B_cell_vs_CD4T_cell plus RAB39
B_cell_vs_CD4T_cell plus EVI2B
B_cell_vs_CD4T_cell plus GALNS
B_cell_vs_CD4T_cell plus CRIP3
B_cell_vs_CD4T_cell plus HES6
B_cell_vs_CD4T_cell plus HMGXB4
... ... ...
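To check a Format A file before running FastProject, it can help to read it back yourself. The following Python sketch parses the two- and three-column layouts described above. It is an assumption-laden illustration, not FastProject's reader: the function name `parse_signature_txt` and the returned dictionary layout are hypothetical.

```python
import io

VALID_SIGNS = {"plus", "minus", "both"}


def parse_signature_txt(handle):
    """Parse Format A lines into {signature name: {gene: sign}}."""
    signatures = {}
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) == 3:
            name, sign, gene = fields
            sign = sign.lower()
        elif len(fields) == 2:
            # Second column omitted: the gene is treated as unsigned.
            name, gene = fields
            sign = "both"
        else:
            raise ValueError(f"line {line_number}: expected 2 or 3 tab-separated fields")
        if sign not in VALID_SIGNS:
            raise ValueError(f"line {line_number}: unknown sign {sign!r}")
        signatures.setdefault(name, {})[gene] = sign
    return signatures


# A few lines taken from the example above, with explicit tab characters.
example = io.StringIO(
    "Lin-neg_cell_vs_NKT_cell\tplus\tTEK\n"
    "Lin-neg_cell_vs_NKT_cell\tminus\tUBR1\n"
    "B_cell_vs_CD4T_cell\tplus\tRAB39\n"
)
sigs = parse_signature_txt(example)
print(sigs["Lin-neg_cell_vs_NKT_cell"])  # {'TEK': 'plus', 'UBR1': 'minus'}
```

For a file on disk, pass an open file object instead of the `io.StringIO` stand-in, for example `open("my_signatures.txt")` (a hypothetical file name).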

Format B (File ends in .gmt)

One signature per line (however, "plus" and "minus" genes are split into two lines). Each line:

<Signature Name> TAB <Signature Description> TAB <Gene1> TAB <Gene2> … (etc) 

The signature description is ignored by FastProject; that column exists only so that the file conforms to the standard .gmt format.

To denote signature signs, use two lines to show the signature, with the "plus" genes in one and the "minus" genes in the other. Add "_plus" to the signature name on the line with the "plus" genes and "_minus" to the signature name on the line with the "minus" genes.

Example:

MEMORY_VS_NAIVE_CD8_TCELL_plus GSE16522 RHOC OFD1 MLF1 ...
MEMORY_VS_NAIVE_CD8_TCELL_minus GSE16522 PTPRK S100A5 IL1A ...
BCELL_VS_LUPUS_BCELL_plus GSE10325 OXT KCNH2 BTBD7 ...
BCELL_VS_LUPUS_BCELL_minus GSE10325 VAMP5 WSB2 CCR2 ...
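The "_plus" / "_minus" naming convention can be sketched in Python as follows. This is a hedged illustration of how the two lines of a signed signature might be merged back into one signature, not FastProject's actual parser: the function name `parse_signature_gmt` and the returned dictionary layout are hypothetical, and treating lines without either suffix as unsigned ("both") is an assumption.

```python
import io


def parse_signature_gmt(handle):
    """Parse .gmt lines into {signature name: {gene: sign}}."""
    signatures = {}
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected name, description, and at least one gene")
        # The description column (fields[1]) is read but not used.
        name, genes = fields[0], fields[2:]
        if name.endswith("_plus"):
            name, sign = name[: -len("_plus")], "plus"
        elif name.endswith("_minus"):
            name, sign = name[: -len("_minus")], "minus"
        else:
            sign = "both"  # assumption: no suffix means unsigned
        signature = signatures.setdefault(name, {})
        for gene in genes:
            if gene:  # ignore empty fields left by trailing tabs
                signature[gene] = sign
    return signatures


# Two lines from the example above, truncated to the genes shown there.
example = io.StringIO(
    "BCELL_VS_LUPUS_BCELL_plus\tGSE10325\tOXT\tKCNH2\tBTBD7\n"
    "BCELL_VS_LUPUS_BCELL_minus\tGSE10325\tVAMP5\tWSB2\tCCR2\n"
)
sigs = parse_signature_gmt(example)
print(sorted(sigs))  # ['BCELL_VS_LUPUS_BCELL']
```

Both lines end up under the single name BCELL_VS_LUPUS_BCELL, with each gene tagged "plus" or "minus" according to the line it came from.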