Releases · jsh58/Genrich

17 Mar 12:13

jsh58

v0.6.1

8558e92

Version 0.6.1 Latest

Latest

Two bugs fixed:

analyzing singleton alignments with no alignment scores (#30)
removing PCR duplicates when one of multiple samples has no unpaired alignments (#72)

Assets 2

04 Aug 14:45

jsh58

v0.6

d896ab1

Version 0.6

New default peak-calling parameters: -p 0.01 -a 200.0. This also means that, by default, q-values are not calculated.
In ATAC-seq mode, the ends of fragments are adjusted by +5/-5 to account for Tn5 transposase occupancy, when determining the centers of cut site intervals. This adjustment can be avoided with -D.
The number of warning messages of reads extending off the ends of chromosomes has been limited to 128 (per input file).
A convenience option (-S) was added, which prevents Genrich from checking the sort order of the input SAM/BAM file. The input file is still parsed under the assumption that it is sorted by queryname, so this option should be used judiciously.

Assets 2

14 Jan 15:55

jsh58

v0.5

7e4ef84

Version 0.5

The null statistical model was changed to the log-normal distribution. This was determined to be a more appropriate choice than other distributions (including the previously used exponential) from examining ChIP-seq control samples.
The minimum length (-l) is now a regular peak-calling parameter, not an alternative peak-calling method. The default value is 0bp. Those who prefer peak-calling by minimum length and not AUC can run -a 0 -l <int>.
Two convenience options were added:
- -X: option to skip peak-calling, but perform all alignment parsing (including identifying PCR duplicates)
- -P: option to call peaks directly from a bedgraph-ish log file (-f) produced by a previous Genrich run

Assets 2

17 Dec 15:01

jsh58

v0.4

14cfb31

Version 0.4

Option to remove PCR duplicates (-r). The procedure is explained here. Incorporates ideas from findDups9.py, in particular the process of evaluating multimapping reads/fragments.
- A log file listing duplicates can be produced via -R <file>.
A reasonably complete README was added to the homepage.
Input SAM/BAM files are now required to be queryname-sorted (samtools sort -n).
A bug in alignment parsing (paired secondary alignments with matching positions) was fixed.

Assets 2

04 Oct 14:23

jsh58

v0.3

f1a39ce

Version 0.3

New default peak-calling method: area under the curve (AUC). For a peak to be called, the total significance of the region must exceed a minimum value (-a <float>, default 20.0).
- The total significance is calculated as the sum of the -log(q) values above the -q threshold over the length of the region (i.e. the area under the -log(q) "curve"). If a -p threshold is specified, the area under the -log(p) curve is calculated.
- The maximum gap parameter (-g <int>) still allows multiple regions to be linked.
- No minimum length is required for a peak to be called.
- Can be overridden by specifying -l <int>, in which case peak-calling reverts to the previous method, with the given minimum length for peaks.
Option to provide a BED file of genomic regions to exclude from analysis (-E <file>).
- The regions will affect peak calls, such that no peak may extend into or around an excluded region.
- The regions' lengths will be subtracted from the genome length calculated by the program.
- In the output log files, excluded regions will have treatment/control pileup values of 0.0 and p-/q-values of NA.
- Multiple BED files can be specified, comma-separated (or space-separated, in quotes).
Accessory script findNs.py to produce a BED file of 'N' homopolymers from a fasta file (e.g. a reference genome). The output can (and should) be given to Genrich via -E (above).
Option to keep unpaired alignments, with lengths changed to a given value, has been changed to -w <int> (formerly -a <int>).

Assets 2

24 Sep 13:56

jsh58

v0.2

e19b55f

Version 0.2

Multiple replicates are now analyzed separately, with p-values calculated for each. At each position, the multiple p-values are then combined by Fisher's method, before conversion to q-values.
Three optional output log files:
- -f <file>: with one replicate, it lists treatment/control pileups, p- and q-values, and significance for each interval; with multiple replicates, it lists p-values of each replicate, combined p-value, q-value, and significance for each interval.
- -k <file>: for each replicate, sequentially, it lists a header line (# treatment file: <name>; control file: <name>), followed by treatment/control pileups and a p-value for each interval. This is the way to examine pileup values with multiple replicates, since the -f file will not supply them.
- -b <file>: an unsorted BED file of the reads/fragments analyzed. The 4th column gives the read name, number of alignments, 'T'reatment or 'C'ontrol, and sample number (0-based), e.g. SRR5427885.57_2_T_0.
Option to analyze reads in ATAC-seq mode (-j). Instead of analyzing full fragments, the program uses "cut site intervals" centered on the ends of fragments. The interval lengths are determined by the -d parameter (def. 100bp).
Control sample pileups are now scaled to match the treatment, based on fragment/interval lengths.
Fixed bug related to sorting alignments by scores.
Fixed bugs related to reference sequences (chromosomes) missing from a SAM/BAM file. Reads aligning to chromosomes either on the "skipped" list (-e argument) or not appearing in the header of the treatment SAM/BAM file will not be analyzed.

Assets 2

31 Aug 19:54

jsh58

v0.1

61f502b

Version 0.1

Analyzes secondary alignments:
- Keeps alignments whose alignment scores (AS) are within a specified value (-s <float>) of the best alignment. The default value of -s 0 means that only alignments judged as equivalent by the aligner will be kept.
- Each of the n alignments for a read/fragment is counted as 1/n for the pileup.
- To avoid excessive memory usage and the imprecision inherent in floating-point values, a maximum of 10 alignments per read is analyzed. Reads with more than 10 alignments will be subsampled based on the best alignment scores; in the case of ties, alignments appearing first in the SAM/BAM are favored.
Input SAM/BAM files are required to be name sorted (samtools sort -n)
A simple hashtable is implemented for efficiently compiling p-values to calculate q-values. The q-value calculation is still based on the Benjamini-Hochberg procedure.

Assets 2

13 Aug 14:08

jsh58

v0.0

e3c71c3

Version 0.0

Peak-calling for genomic enrichment assays of treatment sample(s) (-t [<file>]+), with or without control sample(s) (-c [<file>]+). Input files must be in SAM/BAM format. Incorporates ideas from SAMtoBED, removeChrom, and MACS2 v2.1.2_dev.

Control of analysis of alignments:
- Properly paired alignments only (default); fragments are inferred appropriately
- Also keeping unpaired alignments:
  - As they appear in the SAM/BAM file (-y)
  - Length increased to specified value (-a <int>)
  - Length increased to average value of paired alignments (-x)
Filtering options:
- Chromosomes (reference sequences) to ignore (-e <arg>)
- Minimum mapping quality (-m <int>)
Peak-calling options:
- Maximum q-value (-q <float>, def. 0.05)
- Maximum p-value (-p <float>)
- Minimum length of a peak (-l <int>, def. 100bp)
- Maximum distance between significant sites (-g <int>, def. 100bp)
Output options:
- Output peak file, in ENCODE narrowPeak format (-o <file)
- Output bedgraph file listing treatment/control pileups and p/q values (-b <file>)

Assets 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Releases: jsh58/Genrich

Version 0.6.1

Version 0.6

Version 0.5

Version 0.4

Version 0.3

Version 0.2

Version 0.1

Version 0.0