Commit
Merge pull request #2 from databio/dev
Development changes into master
Showing 20 changed files with 522 additions and 181 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
*.pyc | ||
.~lock* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Change log | ||
All notable changes to this project will be documented in this file. | ||
|
||
## [0.2.0] | ||
### Added | ||
- FRiP can now be calculated based on reference peaks | ||
- Pipeline now reports Picard estimated library size statistic | ||
- Added option for pyadapt trimming | ||
- Added example project using 'gold standard' data | ||
- Added new resource package grades | ||
- Added preliminary 'exact cuts' scripts, but they are not yet used | ||
|
||
### Changed | ||
- Improved README | ||
- Changed filename of the TSS file | ||
- Reorganized structure of alignment code | ||
|
||
## [0.1.0] | ||
### Added | ||
- First release of ATAC-seq pypiper pipeline |
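The changelog entry about FRiP (fraction of reads in peaks) against reference peaks can be illustrated with a minimal sketch. This is not the pipeline's implementation: the function names, the single-chromosome simplification, and the toy coordinates are all assumptions made for illustration.

```python
from bisect import bisect_right


def merge_intervals(peaks):
    """Merge overlapping half-open (start, end) intervals so no read is counted twice."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak (one chromosome only)."""
    if not read_positions:
        return 0.0
    merged = merge_intervals(peaks)
    starts = [start for start, _ in merged]
    in_peaks = 0
    for pos in read_positions:
        # Find the last merged peak that starts at or before this read.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions)


# Hypothetical reference peaks and read positions on a single chromosome.
reference_peaks = [(100, 200), (150, 300), (1000, 1100)]
reads = [50, 120, 250, 299, 300, 1050, 5000, 7000]
print(frip(reads, reference_peaks))
```

Using a fixed reference peak set rather than peaks called on the sample itself keeps the denominator and the regions comparable across samples, which is presumably the motivation for the new option.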
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
# using pre-fix of fastq file | ||
#python pipelines/ATACseq.py -P 3 -M 100 -O test_out -R -S liver -G mm9 -Q paired -C ATACseq.yaml -gs mm -I test_data/liver-CD31_test_R1.fastq.gz -I2 test_data/liver-CD31_test_R2.fastq.gz | ||
python pipelines/ATACseq.py -P 3 -M 100 -O test_out -R -S liver -G hg19 -Q paired -C ATACseq.yaml -gs mm -I test_data/liver-CD31_test_R1.fastq.gz -I2 test_data/liver-CD31_test_R2.fastq.gz | ||
# using pre-fix of fastq file | ||
#python pipelines/ATACseq.py -P 3 -M 100 -O test_out -R -S liver -G mm9 -Q paired -C ATACseq.yaml -gs mm -I test_data/liver-CD31_test_R1.fastq.gz -I2 test_data/liver-CD31_test_R2.fastq.gz | ||
python pipelines/ATACseq.py -P 3 -M 100 -O test_out -R -S liver -G hg19 -Q paired -C ATACseq.yaml -gs mm -I examples/test_data/liver-CD31_test_R1.fastq.gz -I2 examples/test_data/liver-CD31_test_R2.fastq.gz |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
ATAC: ATACseq.py | ||
ATAC: ATACseq.py | ||
ATAC-SEQ: ATACseq.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
|
||
# Gold ATAC | ||
|
||
Testing ATAC-seq pipeline on gold standard public ATAC-seq data. | ||
|
||
## Grab data, project setup | ||
|
||
Download raw `fastq.gz` files (use `fastq-dump` from SRA). You may also use `get_geo.py` to download raw ATAC-seq reads from SRA and metadata from GEO: | ||
|
||
``` | ||
python get_geo.py -i ~/code/ATACseq/examples/gold_atac/metadata/gold_atac_gse.csv -r --fastq | ||
``` | ||
|
||
I used the resulting file [metadata/annocomb_gold_atac_gse.csv](metadata/annocomb_gold_atac_gse.csv) to create the looper metadata sheet, [metadata/gold_atac_annotation.csv](metadata/gold_atac_annotation.csv). | ||
|
||
I created a project config file and sampled test data. Store the SRA fastq files in the folder `${SRAFQ}`; the project then runs with looper without additional changes. | ||
|
||
## Run pipeline | ||
|
||
``` | ||
looper run ${CODE}ATACseq/examples/gold_atac/metadata/project_config.yaml -d | ||
``` |
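For the `fastq-dump` route mentioned in the README, a small sketch of how the download commands could be generated from the annotation sheet's `SRR` column. The helper name, the output directory, and the trimmed-down CSV text are assumptions; the commands are only built and printed here, not executed.

```python
import csv
import io
import shlex


def fastq_dump_commands(annotation_csv_text, out_dir):
    """Build one fastq-dump command per sample from the SRR column of an annotation sheet."""
    reader = csv.DictReader(io.StringIO(annotation_csv_text))
    commands = []
    for row in reader:
        # --split-files writes paired reads to <SRR>_1 and <SRR>_2 files;
        # --gzip compresses the output to .fastq.gz.
        commands.append(
            ["fastq-dump", "--split-files", "--gzip", "--outdir", out_dir, row["SRR"]]
        )
    return commands


# Hypothetical two-column excerpt of the annotation sheet.
sheet = "sample_name,SRR\ngold1,SRR5210416\ngold2,SRR5210450\n"
for cmd in fastq_dump_commands(sheet, "/data/sra_fastq"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

With `--split-files` and `--gzip`, the resulting file names follow the `{SRR}_1.fastq.gz` / `{SRR}_2.fastq.gz` pattern that the project config expects under `${SRAFQ}`.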
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
sample_name,Sample_title,Sample_source_name_ch1,organism,Sample_organism_ch1,library,Sample_library_selection,Sample_library_strategy,data_source,Sample_type,SRR,SRX,Sample_geo_accession,Sample_series_id,single_or_paired,Sample_instrument_model | ||
ATAC-seq_from_dendritic_cell_(ENCLB065VMV),ATAC-seq from dendritic cell (ENCLB065VMV),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,Homo sapiens,Homo sapiens,,other,ATAC-seq,SRA,SRA,SRR5210416,SRX2523872,GSM2471255,GSE94182,PAIRED,Illumina HiSeq 2000 | ||
ATAC-seq_from_dendritic_cell_(ENCLB811FLK),ATAC-seq from dendritic cell (ENCLB811FLK),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,Homo sapiens,Homo sapiens,,other,ATAC-seq,SRA,SRA,SRR5210450,SRX2523906,GSM2471300,GSE94222,PAIRED,Illumina HiSeq 2000 | ||
ATAC-seq_from_dendritic_cell_(ENCLB887PKE),ATAC-seq from dendritic cell (ENCLB887PKE),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,Homo sapiens,Homo sapiens,,other,ATAC-seq,SRA,SRA,SRR5210398,SRX2523862,GSM2471249,GSE94177,PAIRED,Illumina NextSeq 500 | ||
ATAC-seq_from_dendritic_cell_(ENCLB586KIS),ATAC-seq from dendritic cell (ENCLB586KIS),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,Homo sapiens,Homo sapiens,,other,ATAC-seq,SRA,SRA,SRR5210428,SRX2523884,GSM2471269,GSE94196,PAIRED,Illumina HiSeq 2000 | ||
ATAC-seq_from_dendritic_cell_(ENCLB384NOX),ATAC-seq from dendritic cell (ENCLB384NOX),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,Homo sapiens,Homo sapiens,,other,ATAC-seq,SRA,SRA,SRR5210390,SRX2523854,GSM2471245,GSE94173,PAIRED,Illumina HiSeq 2000 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
sample_name,sample_description,treatment_description,organism,library,data_source,SRR,SRX,Sample_geo_accession,Sample_series_id,single_or_paired,Sample_instrument_model,read1,read2 | ||
test1,ATAC-seq from dendritic cell (ENCLB065VMV),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210416,SRX2523872,GSM2471255,GSE94182,PAIRED,Illumina HiSeq 2000,TEST_1,TEST_2 | ||
gold1,ATAC-seq from dendritic cell (ENCLB065VMV),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210416,SRX2523872,GSM2471255,GSE94182,PAIRED,Illumina HiSeq 2000,SRA_1,SRA_2 | ||
gold2,ATAC-seq from dendritic cell (ENCLB811FLK),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210450,SRX2523906,GSM2471300,GSE94222,PAIRED,Illumina HiSeq 2000,SRA_1,SRA_2 | ||
gold3,ATAC-seq from dendritic cell (ENCLB887PKE),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210398,SRX2523862,GSM2471249,GSE94177,PAIRED,Illumina NextSeq 500,SRA_1,SRA_2 | ||
gold4,ATAC-seq from dendritic cell (ENCLB586KIS),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210428,SRX2523884,GSM2471269,GSE94196,PAIRED,Illumina HiSeq 2000,SRA_1,SRA_2 | ||
gold5,ATAC-seq from dendritic cell (ENCLB384NOX),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210390,SRX2523854,GSM2471245,GSE94173,PAIRED,Illumina HiSeq 2000,SRA_1,SRA_2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
GSE94182 | ||
GSE94222 | ||
GSE94177 | ||
GSE94196 | ||
GSE94173 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# This project config file describes your project. See looper docs for details. | ||
|
||
metadata: # relative paths are relative to this config file | ||
sample_annotation: gold_atac_annotation.csv # sheet listing all samples in the project | ||
output_dir: ${PROCESSED}gold_atac # ABSOLUTE PATH to the parent, shared space where project results go | ||
pipelines_dir: "${CODEBASE}ATACseq" # ABSOLUTE PATH to the directory where looper will find the pipeline repository | ||
|
||
# in your sample_annotation, columns with these names will be populated as described | ||
# in the data_sources section below | ||
derived_columns: [read1, read2] | ||
|
||
data_sources: # This section describes paths to your data | ||
# specify the ABSOLUTE PATH of input files using variable path expressions | ||
# These keys then correspond to values in your sample annotation columns. | ||
# Variables specified using brackets are populated from sample_annotation columns. | ||
# Variable syntax: {column_name}. For example, use {sample_name} to populate | ||
# the file name with the value in the sample_name column for each sample. | ||
# example_data_source: "/path/to/data/{sample_name}_R1.fastq.gz" | ||
SRA: "${SRABAM}{SRR}.bam" | ||
SRA_1: "${SRAFQ}{SRR}_1.fastq.gz" | ||
SRA_2: "${SRAFQ}{SRR}_2.fastq.gz" | ||
TEST_1: "${CODEBASE}ATACseq/examples/test_data/{sample_name}_r1.fastq.gz" | ||
TEST_2: "${CODEBASE}ATACseq/examples/test_data/{sample_name}_r2.fastq.gz" | ||
|
||
genomes: | ||
human: hg38 | ||
mouse: mm10 |
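The interplay between `derived_columns` and `data_sources` in the config above can be sketched as follows: the value in a derived column (for example `SRA_1`) selects a template, environment variables such as `${SRAFQ}` are expanded, and `{column_name}` placeholders are filled from the sample's row. This is an illustrative approximation, not looper's actual code; the function name and the environment values are hypothetical.

```python
from string import Template

# Templates copied from the data_sources section of the project config.
DATA_SOURCES = {
    "SRA_1": "${SRAFQ}{SRR}_1.fastq.gz",
    "SRA_2": "${SRAFQ}{SRR}_2.fastq.gz",
    "TEST_1": "${CODEBASE}ATACseq/examples/test_data/{sample_name}_r1.fastq.gz",
    "TEST_2": "${CODEBASE}ATACseq/examples/test_data/{sample_name}_r2.fastq.gz",
}


def resolve_derived_column(sample, column, data_sources, env):
    """Turn a derived column's key (e.g. 'SRA_1') into a concrete file path."""
    template = data_sources[sample[column]]
    # Step 1: expand ${VAR} environment-style variables.
    with_env = Template(template).substitute(env)
    # Step 2: fill {column_name} placeholders from the sample's annotation row.
    return with_env.format(**sample)


# Hypothetical environment values; real ones would come from the user's shell.
env = {"SRAFQ": "/data/sra_fastq/", "CODEBASE": "/code/"}
gold1 = {"sample_name": "gold1", "SRR": "SRR5210416", "read1": "SRA_1", "read2": "SRA_2"}
test1 = {"sample_name": "test1", "SRR": "SRR5210416", "read1": "TEST_1", "read2": "TEST_2"}

print(resolve_derived_column(gold1, "read1", DATA_SOURCES, env))
print(resolve_derived_column(test1, "read2", DATA_SOURCES, env))
```

Because the templates concatenate the variable and the file name directly, the environment values need a trailing slash for the paths to come out well-formed.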
Binary file not shown.
Binary file not shown.