sarekPrio

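Shortcut for making the tsv file

The Sarek tsv file should have 7 tab-separated columns and no header line. The Col1-Col7 row in the example below only labels the columns and must not be included in the file.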

Col1	Col2	Col3	Col4	Col5	Col6	Col7
Patient 1	XX	1	Patient1-Tumor	Patient1-Tumor-L001	p1-T_R1.fastq.gz	p1-T_R2.fastq.gz
Patient 1	XX	0	Patient1-Normal	Patient1-Normal-L001	p1-N_R1.fastq.gz	p1-N_R2.fastq.gz

Col1 is the patient ID, col2 is the gender, col3 is the status (1 for tumor, 0 for normal), col4 is the sample name, col5 is the sample name plus lane (if sequenced in multiple lanes), col6 is the path to the R1 fastq file, and col7 is the path to the R2 fastq file.
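
A finished tsv file can be checked against the layout described above before starting Sarek. The following is a minimal sketch, not part of this repository: the function name check_sarek_tsv is hypothetical, and it only tests the column count and the 0/1 value in col3, not whether the fastq paths exist.

```shell
# Hypothetical helper: report lines that do not have exactly 7 tab-separated
# columns, or whose third column is not 0 or 1. Returns non-zero on any problem.
check_sarek_tsv() {
  awk -F'\t' '
    NF != 7 {
      printf "line %d: expected 7 columns, found %d\n", NR, NF
      bad = 1
      next
    }
    $3 != "0" && $3 != "1" {
      printf "line %d: col3 must be 0 or 1, found \"%s\"\n", NR, $3
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

Usage: check_sarek_tsv Patient1.tsv && echo "layout looks fine"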

Start by automatically generating col6 and col7 for all samples using the code snippet below, then edit the resulting file and type in col1-5 manually.

Go to a parent folder of the fastq files and list all R1 fastq.gz files, then all R2 files. pwd gives the path of the current directory; keep it in the command so that find returns full paths.

find "$(pwd)" -name '*R1*.fastq.gz' | sort > r1files.txt
find "$(pwd)" -name '*R2*.fastq.gz' | sort > r2files.txt
paste r1files.txt r2files.txt > all.tsv
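
Because paste joins the two lists line by line, a single missing R1 or R2 file shifts every following row. As a hedged sanity check (the function name check_pairs is hypothetical, and it assumes the two paths of a pair differ only in the 1 versus 2 of R1/R2), something like this flags rows that do not look like a pair:

```shell
# Hypothetical helper: in a two-column file of R1/R2 paths, a correct pair is
# assumed to differ in exactly one character, "1" in col1 and "2" in col2.
check_pairs() {
  awk -F'\t' '
    {
      diffs = 0
      if (length($1) != length($2)) {
        diffs = 2                      # different lengths can never be a pair
      } else {
        for (i = 1; i <= length($1); i++) {
          a = substr($1, i, 1)
          b = substr($2, i, 1)
          if (a != b) {
            diffs++
            if (a != "1" || b != "2") diffs++   # any other difference is an error
          }
        }
      }
      if (diffs != 1) {
        printf "line %d: R1 and R2 do not look like a pair\n", NR
        bad = 1
      }
    }
    END { exit bad + 0 }
  ' "$1"
}
```

Usage: check_pairs all.tsv (run it before adding col1-5, while the file still has only the two path columns).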

Open all.tsv in a text editor and add the five missing columns (col1-5) in front of the two path columns.

It is recommended to start Sarek separately for each patient. The code below takes all.tsv and splits it into one tsv file per patient.

cut -f1 all.tsv | sort | uniq | while IFS= read -r name; do grep -F -- "$name" all.tsv > "$name.tsv"; done
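
The grep in the loop above matches the patient ID anywhere in a line, so an ID such as P1 would also pull in the rows of P10. A possible alternative, sketched here with a hypothetical function name split_by_patient, is to let awk compare col1 exactly and write each row to a file named after it. It assumes patient IDs contain no slash and that there are few enough patients to keep one output file open for each.

```shell
# Hypothetical helper: write every row of the given tsv to <col1>.tsv in the
# current directory, matching the patient ID in col1 exactly.
split_by_patient() {
  awk -F'\t' '{ out = $1 ".tsv"; print > out }' "$1"
}
```

Usage: split_by_patient all.tsv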
