Commit
update filterdup, hmmratac, pileup md
Showing 4 changed files with 91 additions and 27 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,37 +1,73 @@ | ||
# Pileup | ||
# pileup | ||
|
||
## Overview | ||
The `pileup` command is part of the MACS3 suite of tools and is used to pile up alignment files. It is particularly useful in ChIP-Seq analysis where summarizing the read depth at each genomic location is required. | ||
The `pileup` command is part of the MACS3 suite of tools and is used | ||
to pile up alignment files. It is a fast algorithm for generating a | ||
coverage track from an alignment file containing either single-end or | ||
paired-end data. | ||
|
||
## Detailed Description | ||
|
||
The `pileup` command takes in one or multiple input files and produces an output file with the piled-up alignments. It uses an efficient algorithm to pile up the alignments, improving the quality of your data for further analysis. | ||
The `pileup` command takes in one or multiple input files and produces | ||
an output file with the piled-up genomic coverage. It uses an | ||
efficient algorithm to pile up the alignments. | ||
|
||
Pileup aligned reads with a given extension size (fragment size or d in MACS language). Note there will be no step for duplicate reads filtering or sequencing depth scaling, so you may need to do certain pre/post-processing. | ||
![Pileup Algorithm](./pileup.jpeg) | ||
|
||
`pileup` piles up aligned reads with a given extension size (the | ||
fragment size, or d in MACS terminology). Note that there is no step | ||
for duplicate read filtering or sequencing depth scaling, so you may | ||
need to do some pre- or post-processing. | ||
|
||
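The extend-then-pile-up idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MACS3 implementation: the function names `extend_read` and `pileup` are hypothetical, reads are reduced to a 5' position plus a strand, and the handling of minus-strand reads and of `both=True` (mirroring `-B`) follows the usual reading of "downstream" and "both directions".

```python
from typing import Iterable, List, Tuple


def extend_read(five_prime: int, strand: str, extsize: int,
                both: bool = False) -> Tuple[int, int]:
    """Return the half-open [start, end) fragment for one read (hypothetical helper).

    Assumption: "downstream" means toward larger coordinates for '+' reads
    and toward smaller coordinates for '-' reads.
    """
    if both:
        start, end = five_prime - extsize, five_prime + extsize
    elif strand == "+":
        start, end = five_prime, five_prime + extsize
    else:
        start, end = five_prime - extsize, five_prime
    # Clip at the chromosome start; clipping at the chromosome end is omitted.
    return max(start, 0), max(end, 0)


def pileup(fragments: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Return bedGraph-like (start, end, depth) runs for one chromosome."""
    events = []
    for start, end in fragments:
        if end > start:
            events.append((start, 1))   # coverage goes up at a fragment start
            events.append((end, -1))    # and down at a fragment end
    events.sort()

    runs: List[Tuple[int, int, int]] = []
    depth = 0
    prev = None
    for pos, delta in events:
        if prev is not None and pos > prev and depth > 0:
            if runs and runs[-1][1] == prev and runs[-1][2] == depth:
                # Same depth as the adjacent run: merge into one interval.
                runs[-1] = (runs[-1][0], pos, depth)
            else:
                runs.append((prev, pos, depth))
        depth += delta
        prev = pos
    return runs


reads = [(100, "+"), (150, "+"), (400, "-")]
fragments = [extend_read(pos, strand, extsize=100) for pos, strand in reads]
print(pileup(fragments))
# [(100, 150, 1), (150, 200, 2), (200, 250, 1), (300, 400, 1)]
```

As in the real command, nothing here removes duplicate reads or rescales by sequencing depth; regions with zero coverage are simply left out of the output in this sketch.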
## Command Line Options | ||
|
||
The command line options for `pileup` are defined in `/MACS3/Commands/pileup_cmd.py` and `/bin/macs3` files. Here is a brief overview of these options: | ||
Here is a brief overview of the command line options for `pileup`: | ||
|
||
- `-i` or `--ifile`: Alignment file. If multiple files are given as '-t A B C', then they will all be read and combined. Note that pair-end data is not supposed to work with this command. REQUIRED. | ||
- `-o` or `--ofile`: Output bedGraph file name. If not specified, will write to standard output. REQUIRED. | ||
- `--outdir`: If specified, all output files will be written to that directory. Default: the current working directory | ||
- `-i` or `--ifile`: Alignment file. If multiple files are given as | ||
'-i A B C', then they will all be read and combined. REQUIRED. | ||
- `-o` or `--ofile`: Output bedGraph file name. If not specified, will | ||
write to standard output. REQUIRED. | ||
- `--outdir`: If specified, all output files will be written to that | ||
directory. Default: the current working directory | ||
- `-f` or `--format`: Format of the tag file. | ||
- `AUTO`: MACS3 will pick a format from "AUTO", "BED", "ELAND", "ELANDMULTI", "ELANDEXPORT", "SAM", "BAM", and "BOWTIE". If the format is BAMPE or BEDPE, please specify it explicitly. | ||
- `BAMPE` or `BEDPE`: When the format is BAMPE or BEDPE, the -B and --extsize options would be ignored. | ||
- Other options correspond to specific formats. | ||
- `-B` or `--both-direction`: By default, any read will be extended towards the downstream direction by the extension size. If this option is set, aligned reads will be extended in both upstream and downstream directions by the extension size. This option will be ignored when the format is set as BAMPE or BEDPE. DEFAULT: False | ||
- `--extsize`: The extension size in bps. Each alignment read will become an EXTSIZE of the fragment, then be piled up. Check description for -B for details. This option will be ignored when the format is set as BAMPE or BEDPE. DEFAULT: 200 | ||
- `--buffer-size`: Buffer size for incrementally increasing the internal array size to store read alignment information. In most cases, you don't have to change this parameter. However, if there are a large number of chromosomes/contigs/scaffolds in your alignment, it's recommended to specify a smaller buffer size in order to decrease memory usage (but it will take longer time to read alignment files). Minimum memory requested for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE * 8 Bytes. DEFAULT: 100000 | ||
- `--verbose`: Set verbose level. 0: only show critical messages, 1: show additional warning messages, 2: show process information, 3: show debug messages. If you want to know where are the duplicate reads, use 3. DEFAULT: 2 | ||
|
||
- `AUTO`: MACS3 will pick a format from "AUTO", "BED", "ELAND", | ||
"ELANDMULTI", "ELANDEXPORT", "SAM", "BAM", and "BOWTIE". If the | ||
format is BAMPE or BEDPE, please specify it explicitly. | ||
- `BAMPE` or `BEDPE`: When the format is BAMPE or BEDPE, the -B and | ||
--extsize options would be ignored. | ||
- Other options correspond to specific formats. | ||
- `-B` or `--both-direction`: By default, any read will be extended | ||
towards the downstream direction by the extension size. If this | ||
option is set, aligned reads will be extended in both upstream and | ||
downstream directions by the extension size. This option will be | ||
ignored when the format is set as BAMPE or BEDPE. DEFAULT: False | ||
- `--extsize`: The extension size in bps. Each aligned read will be | ||
extended into a fragment of EXTSIZE bps, then piled up. See the | ||
description of -B for details. This option will be ignored when the | ||
format is set as BAMPE or BEDPE. DEFAULT: 200 | ||
- `--buffer-size`: Buffer size for incrementally increasing the | ||
internal array size to store read alignment information. In most | ||
cases, you don't have to change this parameter. However, if there | ||
are a large number of chromosomes/contigs/scaffolds in your | ||
alignment, it's recommended to specify a smaller buffer size in | ||
order to decrease memory usage (but reading the alignment files will | ||
take longer). The minimum memory requested for reading an alignment | ||
file is about (number of chromosomes) * BUFFER_SIZE * 8 bytes. | ||
DEFAULT: 100000 | ||
- `--verbose`: Set verbose level. 0: only show critical messages, 1: | ||
show additional warning messages, 2: show process information, 3: | ||
show debug messages. If you want to know where the duplicate reads | ||
are, use 3. DEFAULT: 2 | ||
|
||
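To make the `--buffer-size` memory note concrete, here is a small worked calculation in Python. It only evaluates the rough formula quoted in the option description (number of chromosomes * BUFFER_SIZE * 8 bytes); the helper name `min_read_memory_bytes` and the example chromosome counts are illustrative assumptions, and actual memory use will differ.

```python
def min_read_memory_bytes(n_chromosomes: int, buffer_size: int = 100000) -> int:
    """Rough lower bound from the option text: chromosomes * buffer size * 8 bytes."""
    return n_chromosomes * buffer_size * 8


# A genome with 25 sequences at the default buffer size: about 20 MB.
print(min_read_memory_bytes(25) / 1e6, "MB")                      # 20.0 MB

# A hypothetical draft assembly with 50,000 scaffolds: about 40 GB at the default,
print(min_read_memory_bytes(50_000) / 1e9, "GB")                  # 40.0 GB

# but only about 400 MB with --buffer-size 1000.
print(min_read_memory_bytes(50_000, buffer_size=1_000) / 1e6, "MB")  # 400.0 MB
```

This is why a smaller buffer is suggested for alignments with many contigs or scaffolds: the estimate scales linearly with both the number of sequences and the buffer size.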
## Example Usage | ||
|
||
Here is an example of how to use the `pileup` command: | ||
|
||
```bash | ||
macs3 pileup -i treatment.bam -o piledup.bedGraph -f BAM -g hs -n experiment1 | ||
macs3 pileup -i treatment.bam -o piledup.bedGraph -f BAM --extsize 147 | ||
``` | ||
|
||
In this example, the program will pile up the alignments in the `treatment.bam` file and write the result to `piledup.bedGraph`. The input file is in BAM format, the genome size is set to 'hs' (human), and the name of the experiment is 'experiment1'. | ||
In this example, the program will pile up the alignments in the
`treatment.bam` file and write the result to `piledup.bedGraph`. The
input file is in BAM format, and each sequencing tag is extended into
a 147 bp fragment before pileup.