This repository has been archived by the owner on Nov 18, 2020. It is now read-only.

Rapid linking of long reads to a reference genome


giesselmann/selectION


selectION

Dependencies

selectION is written in C++14 and requires gcc > 5 as well as the following libraries:

  • boost filesystem
  • boost program_options
  • boost system
  • libhdf5

Boost is available from standard package repositories. The install script downloads and builds libhdf5. Index building uses SSE3 acceleration.
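Since the build needs a C++14 compiler and index building relies on SSE3, it can help to confirm both before starting a long build. The sketch below is a hypothetical helper, not part of selectION; it only uses the predefined macros `__cplusplus` and `__GNUC__` and the GCC/Clang x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`.

```cpp
// Hypothetical pre-build check, not part of selectION. It reports what the
// compiler and CPU provide; the requirements themselves are in the README.
#include <cassert>

// True when this translation unit is compiled as C++14 or later.
constexpr bool is_cpp14_or_later() { return __cplusplus >= 201402L; }

// GCC major version, or 0 for other compilers. Clang also defines __GNUC__,
// so it is excluded explicitly.
constexpr int gcc_major_version() {
#if defined(__GNUC__) && !defined(__clang__)
    return __GNUC__;
#else
    return 0;
#endif
}

// Runtime SSE3 detection via the GCC/Clang x86 builtins; returns false on
// non-x86 targets, where SSE3 does not exist.
bool cpu_has_sse3() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse3") != 0;
#else
    return false;
#endif
}
```

Calling these from a small main() compiled with the same g++ that cmake will pick up gives a quick answer; compare `gcc_major_version()` against the gcc requirement stated above.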

Installation

Linux

In order to download, build and install selectION, execute the following commands:

git clone https://github.com/PayGiesselmann/selectION
cd selectION
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
make install

To install the software into a directory other than the default, use the following cmake command instead:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=<install_prefix> ..

If everything went well, running 'selection' on the command line should print output like this:

Program:    SelectION
Usage:      selection <command> [options]
Commands:   index : Build FM-Index for reference sequence
            scan : Scan input for reads matching specified positions

Others

Not documented yet. selectION is cross-platform software and, once its dependencies are resolved, will build on any x86/x64 system.

Usage

Index

Before you can use selectION, you have to build an index for the reference genome. The input can be any FASTA file containing one or more sequences; multiple input files are not supported yet.

selection index -t 8 ref.fa

This builds the index using eight threads and creates the file ref.fa.h5 in the same directory as ref.fa. Additional options are available, but for human genome applications the defaults should work fine.
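The index command builds an FM-index of the reference. For readers new to the data structure, here is a minimal textbook FM-index: a naively sorted suffix array, the Burrows-Wheeler transform and backward search for exact matches. It is an illustrative sketch, not selectION's implementation, and it leaves out everything that makes a genome-scale index practical (sampled occurrence tables, compact suffix arrays, fast construction). The class name `FmIndex` and its methods are hypothetical.

```cpp
// Textbook FM-index sketch, NOT selectION's implementation: naive suffix
// array, Burrows-Wheeler transform (BWT) and backward search.
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

class FmIndex {
public:
    // '$' is the end-of-text sentinel; it must sort below every text
    // character, which holds for upper-case DNA letters.
    explicit FmIndex(std::string text) {
        assert(text.find('$') == std::string::npos);
        text.push_back('$');
        const std::size_t n = text.size();

        // Suffix array by sorting all suffixes: O(n^2 log n), demo only.
        sa_.resize(n);
        std::iota(sa_.begin(), sa_.end(), std::size_t{0});
        std::sort(sa_.begin(), sa_.end(), [&text](std::size_t a, std::size_t b) {
            return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
        });

        // BWT[i] is the character preceding suffix sa_[i] (cyclically).
        bwt_.resize(n);
        for (std::size_t i = 0; i < n; ++i) bwt_[i] = text[(sa_[i] + n - 1) % n];

        // c_[c] = number of text characters that sort strictly below c.
        std::array<std::size_t, 256> counts{};
        for (unsigned char ch : text) ++counts[ch];
        std::size_t below = 0;
        for (std::size_t c = 0; c < 256; ++c) {
            c_[c] = below;
            below += counts[c];
        }

        // occ_[i][c] = occurrences of c in bwt_[0, i); a full table, which
        // real indexes replace with sampled checkpoints.
        occ_.assign(n + 1, std::array<std::size_t, 256>{});
        for (std::size_t i = 0; i < n; ++i) {
            occ_[i + 1] = occ_[i];
            ++occ_[i + 1][static_cast<unsigned char>(bwt_[i])];
        }
    }

    // Backward search: suffix-array interval [lo, hi) of suffixes that start
    // with `pattern`, processing the pattern from its last character.
    std::pair<std::size_t, std::size_t> interval(const std::string& pattern) const {
        std::size_t lo = 0, hi = bwt_.size();
        for (auto it = pattern.rbegin(); it != pattern.rend() && lo < hi; ++it) {
            const unsigned char c = static_cast<unsigned char>(*it);
            lo = c_[c] + occ_[lo][c];
            hi = c_[c] + occ_[hi][c];
        }
        return {lo, hi};
    }

    // Number of exact occurrences of `pattern`.
    std::size_t count(const std::string& pattern) const {
        const auto r = interval(pattern);
        return r.second - r.first;
    }

    // 0-based start positions of all exact occurrences, ascending.
    std::vector<std::size_t> locate(const std::string& pattern) const {
        const auto r = interval(pattern);
        std::vector<std::size_t> hits(sa_.begin() + r.first, sa_.begin() + r.second);
        std::sort(hits.begin(), hits.end());
        return hits;
    }

private:
    std::vector<std::size_t> sa_;
    std::string bwt_;
    std::array<std::size_t, 256> c_{};
    std::vector<std::array<std::size_t, 256>> occ_;
};
```

For example, indexing "GATTACA" and searching "TA" yields a single hit at 0-based position 3, while "A" occurs three times. Backward search touches only the C and Occ tables, which is why FM-index lookups cost time proportional to the pattern length rather than the genome size.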

Scan

Estimate positions for all reads in input.fq and write the results to out.sam in the current directory. Note that lines are appended to existing output files rather than overwriting them.

selection scan -t 8 ref.fa input.fq ./ --sam ./out.sam
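Because scan appends to an existing out.sam, it can be useful to inspect the file programmatically, for example to spot reads listed twice after a re-run. The reader below is a hypothetical helper, not part of selectION. It assumes the output follows the SAM specification: header lines start with '@', and the first four tab-separated columns of a record are QNAME, FLAG, RNAME and the 1-based POS. Which fields selectION actually fills is not documented here, so treat the `SamHit` struct as an assumption.

```cpp
// Hypothetical reader for the --sam output, assuming standard SAM records
// (tab-separated; header lines start with '@').
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

struct SamHit {
    std::string qname;      // column 1: read name
    unsigned flag = 0;      // column 2: bitwise FLAG
    std::string rname;      // column 3: reference sequence name
    std::uint64_t pos = 0;  // column 4: 1-based leftmost position (0 = none)
};

// Parses one line into `hit`; returns false for header lines and for
// records with fewer than four columns or non-numeric FLAG/POS.
bool parse_sam_line(const std::string& line, SamHit& hit) {
    if (line.empty() || line[0] == '@') return false;
    std::istringstream fields(line);
    std::string flag, pos;
    if (!std::getline(fields, hit.qname, '\t') ||
        !std::getline(fields, flag, '\t') ||
        !std::getline(fields, hit.rname, '\t') ||
        !std::getline(fields, pos, '\t'))
        return false;
    try {
        hit.flag = static_cast<unsigned>(std::stoul(flag));
        hit.pos = std::stoull(pos);
    } catch (const std::exception&) {
        return false;
    }
    return true;
}

// FLAG bit 0x4 marks an unmapped segment in the SAM specification.
bool is_unmapped(const SamHit& hit) { return (hit.flag & 0x4u) != 0; }
```

Reading out.sam line by line with std::getline and collecting `qname` values into a set is one way to detect duplicates left by earlier runs.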

Estimate positions for all reads in input.fq and write the reads matching the regions of interest listed in roi.txt to the current directory:

selection scan -t 8 ref.fa input.fq ./ --filter ./roi.txt

The syntax of roi.txt is as follows. You can specify as many selectors as you want.

# chromosome
X
# single spot
X;67542032
# region of interest
X;67542032;67732619
# with custom name (default: X_146993569_146993569)
X;146993569;146993569;FMR1

Each selector covers either a complete chromosome, a single spot, or a region defined by start and stop. The optional last column sets a custom name for the output files. Chromosome names must match the spelling in the reference exactly, e.g. chrX is not equal to X!
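To catch selector mistakes before a long scan, the sketch below parses roi.txt lines of the form chrom[;start[;end[;name]]] and collects the sequence names from the reference FASTA, so a chrX-versus-X mismatch can be flagged early. It is hypothetical code, not selectION's parser. Treating '#' lines as comments, storing a single spot as start == end, using the chromosome name as the default for whole-chromosome selectors, and taking the first whitespace-delimited word of a FASTA header as the sequence name are all assumptions; only the chrom_start_end default comes from the example above.

```cpp
// Hypothetical roi.txt checker, not selectION's own parser. '#' comments,
// the single-spot representation and the whole-chromosome default name
// are assumptions.
#include <cassert>
#include <cstdint>
#include <istream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Selector {
    std::string chrom;        // must match a reference sequence name exactly
    std::uint64_t begin = 0;  // 0 means "whole chromosome" here (assumption)
    std::uint64_t end = 0;
    std::string name;         // used for output file naming
};

// Splits a line on the given separator character.
std::vector<std::string> split(const std::string& line, char sep) {
    std::vector<std::string> parts;
    std::istringstream in(line);
    std::string part;
    while (std::getline(in, part, sep)) parts.push_back(part);
    return parts;
}

// Parses one selector line; returns false for comment lines, blank lines
// and malformed selectors.
bool parse_selector(const std::string& line, Selector& sel) {
    if (line.empty() || line[0] == '#') return false;
    const std::vector<std::string> f = split(line, ';');
    if (f.empty() || f.size() > 4 || f[0].empty()) return false;
    sel = Selector{};
    sel.chrom = f[0];
    try {
        if (f.size() >= 2) sel.begin = sel.end = std::stoull(f[1]);  // single spot
        if (f.size() >= 3) sel.end = std::stoull(f[2]);              // region
    } catch (const std::exception&) {
        return false;  // non-numeric or out-of-range coordinate
    }
    if (sel.end < sel.begin) return false;
    if (f.size() == 4 && !f[3].empty()) {
        sel.name = f[3];  // custom name from the last column
    } else if (f.size() == 1) {
        sel.name = sel.chrom;  // assumed default for a whole chromosome
    } else {
        // Default pattern shown in the example: <chrom>_<start>_<end>
        sel.name = sel.chrom + "_" + std::to_string(sel.begin) + "_" +
                   std::to_string(sel.end);
    }
    return true;
}

// Collects reference sequence names from FASTA headers, assuming the name is
// the first whitespace-delimited word after '>'.
std::set<std::string> fasta_names(std::istream& fasta) {
    std::set<std::string> names;
    std::string line;
    while (std::getline(fasta, line)) {
        if (line.empty() || line[0] != '>') continue;
        std::istringstream header(line.substr(1));
        std::string name;
        if (header >> name) names.insert(name);
    }
    return names;
}
```

A selector whose `chrom` has a count of zero in the set returned by `fasta_names` is exactly the naming mismatch warned about above.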

Support for fast5 input files is coming soon; for the moment, we recommend using poretools to extract basecalled sequences from ONT fast5 files.
