solo7773 edited this page Oct 26, 2017 · 12 revisions

visnormsc user manual

Description

Version: 17.0.1

Created: August 4, 2017

Updated: August 18, 2017

Author: Nan Zhou

Maintainer: Nan Zhou

Introduction

visnormsc is a graphical user interface (GUI) for normalizing single-cell RNA sequencing (RNA-seq) data. It is written in Python, so it runs on the major operating systems, including Windows™, GNU/Linux, and macOS™.

Install Python

The easiest way to install Python and the dependencies visnormsc requires is to install the latest version of Anaconda for your platform. Installing Anaconda on Windows™, GNU/Linux, and macOS™ is described below.

Windows™

  1. Download the graphical installer of the latest Anaconda distribution of Python >= 3.5 for Windows™
  2. Install Anaconda on Windows
  3. In the last few installer steps, be sure to tick the option that reads something like
    • Register Anaconda as my default Python 3.6

GNU/Linux

  1. Download the GNU/Linux installer of the latest Anaconda distribution of Python >= 3.5
  2. Install Anaconda on GNU/Linux
  3. If Anaconda was installed in ~/anaconda3, you can use the default way to run visnormsc

macOS™

  1. Download the graphical installer of the latest Anaconda distribution of Python >= 3.5 for macOS
  2. Install Anaconda on macOS
  3. If Anaconda was installed in ~/anaconda3, you can use the default way to run visnormsc
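
After installation, it can help to confirm that the interpreter you are about to use meets the Python >= 3.5 requirement stated above. The following is a minimal sketch; the function name meets_requirement is hypothetical and not part of visnormsc.

```python
import sys


def meets_requirement(version_info, minimum=(3, 5)):
    """Return True if the (major, minor) version is at least `minimum`."""
    # Compare only the major and minor components, e.g. (3, 6) >= (3, 5).
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_requirement(sys.version_info):
        print("This Python interpreter is recent enough.")
    else:
        print("Python >= 3.5 is required; found %d.%d" % tuple(sys.version_info[:2]))
```

Saving this to a file and running it with /path/to/python reports whether that interpreter is recent enough.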

Run visnormsc

If the default way does not work for you, use the custom way described below.

Download or clone visnormsc from the GitHub page to a local directory.

Run on Windows™

  1. The default way
    1. In Windows Explorer, go to the visnormsc directory, i.e., "/path/to/visnormsc/"
    2. Go to the sub-directory "bin" in the "visnormsc" directory
    3. Double-click "onWindows.bat" to run visnormsc
  2. The custom way
    1. Open the CMD prompt using "Start -> All Programs -> Accessories -> Command Prompt"
    2. Type the command /path/to/python /path/to/visnormsc/visnormscGUI.py to run

Run on GNU/Linux and macOS™

  1. The default way
    1. Open Terminal
    2. Go to the directory of visnormsc using command cd /path/to/visnormsc
    3. Run command bash onLinuxAndmacOS.sh
  2. The custom way
    1. Open Terminal
    2. Type the command /path/to/python /path/to/visnormsc/visnormscGUI.py to run

Use visnormsc

Prepare input data

The input data should be in CSV (comma-separated values) format. CSV files can be edited and saved using a plain text editor, Microsoft Office Excel™, LibreOffice Calc, or other spreadsheet software.

  • Single-cell RNA-seq data
    • A CSV file. The first row holds the column names and the first column holds the row names. Each column represents a cell and each row represents a gene. The remaining values form a G-by-S matrix, where G (should be > 100) is the number of genes and S is the number of single cells. This matrix should contain estimates of gene expression. Counts of this nature may be obtained from RSEM, HTSeq, Cufflinks, Salmon, or a similar approach.
    • A_example_input_data.csv of a 7-by-5 value matrix:
        Cell_1  Cell_2  Cell_3  Cell_4  Cell_5
Gene_1  9       2       16      0       4
Gene_2  4.98    2.99    2.28    0       3.2
Gene_3  0       0       0       0       0
Gene_4  4       11      1       1       2
Gene_5  82      65      110     308.52  71
Gene_6  0       3.72    4.53    0       0
Gene_7  9       0       0       0       0
  • Cell condition data
    • This is also a CSV file, but it contains only a single unnamed column. It gives the condition that each cell in the input data belongs to. Each row corresponds to one cell in the input data (e.g., the column names in the example above), in the same order. Generally, the definition of a condition will be obvious from the experimental setup.
    • A_example_condition_file.csv corresponding to A_example_input_data.csv:
cond1
cond1
cond2
cond3
cond3

    where Cell_1 and Cell_2 belong to cond1, Cell_3 belongs to cond2, and Cell_4 and Cell_5 belong to cond3.
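
The layout described above can be checked before loading the files into visnormsc. The sketch below is only an illustration: check_inputs is a hypothetical helper, not part of visnormsc, and it assumes that the first header cell (above the gene names) is empty and that the condition file has no header row.

```python
import csv
import io

# The example files from this manual, written out as CSV text.
# Assumption: the header cell above the row names is left empty.
DATA_CSV = """,Cell_1,Cell_2,Cell_3,Cell_4,Cell_5
Gene_1,9,2,16,0,4
Gene_2,4.98,2.99,2.28,0,3.2
Gene_3,0,0,0,0,0
Gene_4,4,11,1,1,2
Gene_5,82,65,110,308.52,71
Gene_6,0,3.72,4.53,0,0
Gene_7,9,0,0,0,0
"""
CONDITIONS_CSV = "cond1\ncond1\ncond2\ncond3\ncond3\n"


def check_inputs(data_text, conditions_text):
    """Return (gene names, {cell name: condition}) after basic shape checks."""
    rows = [r for r in csv.reader(io.StringIO(data_text)) if r]
    header, body = rows[0], rows[1:]
    cells = header[1:]  # skip the row-name column
    genes = []
    for row in body:
        if len(row) != len(header):
            raise ValueError("row %r has %d fields, expected %d"
                             % (row[0], len(row), len(header)))
        for value in row[1:]:
            float(value)  # raises ValueError if a value is not numeric
        genes.append(row[0])
    # One condition per row, no header, in the same order as the cell columns.
    conditions = [r[0] for r in csv.reader(io.StringIO(conditions_text)) if r]
    if len(conditions) != len(cells):
        raise ValueError("%d conditions given for %d cells"
                         % (len(conditions), len(cells)))
    return genes, dict(zip(cells, conditions))


genes, cell_to_condition = check_inputs(DATA_CSV, CONDITIONS_CSV)
print(len(genes), "genes;", cell_to_condition)
```

The example matrix has only 7 genes, so it illustrates the layout but does not satisfy the G > 100 recommendation; the sketch deliberately does not enforce that limit.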

Check count-depth relationship in RNA-seq data

Parameters for the Check operation are:
Data: A single-cell RNA-seq data file in csv format.
Normalized data: Default NO. If YES, the input data should have been normalized either by visnormsc or other methods.
Conditions: A csv file. Conditions of cells in the data file.
Tau: The quantile for quantile regression. 0 < float < 1
Filter cell proportion: The proportion of non-zero expression estimates required to include the genes into the evaluation. 0 <= float <= 1
Filter expression: Exclude genes having median of non-zero expression below this threshold. A real number.
Number of expression groups: Split the RNA-seq data into this number of equally sized groups. An integer > 0
NCores: Number of CPU cores to be used. None or an integer > 0
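
To make the two filter parameters concrete, the sketch below applies them to a single gene's expression values. It is an illustration of the descriptions above, not visnormsc's actual code: keep_gene is a hypothetical name, the inclusive comparisons (>=) are an assumption, and dropping genes with no non-zero values is a choice made here because their non-zero median is undefined.

```python
from statistics import median


def keep_gene(values, filter_cell_proportion, filter_expression):
    """Decide whether one gene passes both filters.

    values: expression estimates of the gene across all cells.
    filter_cell_proportion: required proportion of non-zero values (0..1).
    filter_expression: required median of the non-zero values.
    """
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        # No non-zero values, so the non-zero median is undefined: drop the gene.
        return False
    proportion = len(nonzero) / len(values)
    return (proportion >= filter_cell_proportion
            and median(nonzero) >= filter_expression)


# Gene_1 and Gene_7 from the example input data above.
print(keep_gene([9, 2, 16, 0, 4], 0.5, 2.0))
print(keep_gene([9, 0, 0, 0, 0], 0.5, 2.0))
```

Gene_1 has 4 of 5 non-zero values with a non-zero median of 6.5, so it passes these thresholds; Gene_7 has only 1 of 5 non-zero values and is excluded.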

Normalize single-cell RNA-seq data

Parameters for the Normalize operation are:
Data: A single-cell RNA-seq data file in csv format.
Conditions: A csv file. Conditions of cells in the data file.
Proportion of genes: The proportion of genes closest to the slope mode used for the group fitting. 0 < float < 1
Tau: The quantile for quantile regression. 0 < float < 1
Filter cell number: The number of non-zero expression values required to include the genes into model fitting. An integer > 0
K: The number of gene groups in cells of the same condition. The default is None, which means visnormsc will automatically search for the best value, starting from 1. K can also be set by the user in the form condition name: integer > 0. The condition names should match those given in the condition file. For the example RNA-seq data and conditions above, K can be set using cond1: 10, cond2: 6, cond3: 8, which means cond1 has 10 groups, cond2 has 6 groups, and cond3 has 8 groups.
Save evaluation plots: Save figures of evaluating K.
NCores: Number of CPU cores to be used. None or an integer > 0
Filter expression: Genes having median of non-zero expression below this threshold will be excluded from the model fitting. A real number.
Thresh: A threshold used for evaluating K. A real number.
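
The format of the K setting described above can be illustrated with a small parser. This is a sketch under stated assumptions: parse_k is a hypothetical helper, and how visnormsc itself reads the field is not documented here.

```python
def parse_k(text, known_conditions):
    """Parse a K setting such as 'cond1: 10, cond2: 6, cond3: 8'.

    Returns None for an empty field or the literal 'None' (automatic search),
    otherwise a dict mapping each condition name to a positive integer.
    """
    text = text.strip()
    if not text or text == "None":
        return None
    result = {}
    for item in text.split(","):
        name, sep, value = item.partition(":")
        name = name.strip()
        if not sep:
            raise ValueError("expected 'condition: integer', got %r" % item)
        if name not in known_conditions:
            raise ValueError("unknown condition %r" % name)
        k = int(value)  # int() tolerates surrounding whitespace
        if k <= 0:
            raise ValueError("K for %r must be > 0" % name)
        result[name] = k
    return result


conditions = {"cond1", "cond2", "cond3"}
print(parse_k("cond1: 10, cond2: 6, cond3: 8", conditions))
```

Checking the names against the conditions found in the condition file catches typos early, since the names must match those in that file.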

Demo

The demo data are in "/path/to/visnormsc/test/testData". This directory contains simulated single-cell RNA-seq data (exampleData.csv) with its cell conditions (exampleDataConditions.csv), and real single-cell RNA-seq data (scH1data.csv) with its cell conditions (scH1dataConditions.csv). These data come from the study by Bacher and colleagues (Bacher et al., 2017).

The demo for the simulated data can be reproduced by selecting the data file exampleData.csv and the condition file exampleDataConditions.csv, and keeping the other settings unchanged.

The demo for the real data can be reproduced by selecting the data file scH1data.csv and the condition file scH1dataConditions.csv, and keeping the other settings unchanged.

References

Bacher, Rhonda, et al. "SCnorm: robust normalization of single-cell RNA-seq data." Nature Methods 14.6 (2017): 584-586.
