This tool converts multiple QC JSON objects into a spreadsheet (TSV/CSV), which is useful for building a QC spreadsheet organized by experiment and replicate.
It flattens each JSON object into a single spreadsheet row and then splits that row into multiple rows according to split rules. This gives a separate row to each JSON object whose key name matches a rule (e.g. each biological replicate).
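The flatten-then-split idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not qc2tsv's actual implementation: the helper names `flatten` and `split_rows`, the dotted column names, and the choice to copy shared fields into every row are all hypothetical, and the real tool's output layout may differ.

```python
import re


def flatten(obj, path=()):
    """Yield (path_tuple, value) pairs for every leaf of a nested dict."""
    for key, value in obj.items():
        if isinstance(value, dict):
            yield from flatten(value, path + (key,))
        else:
            yield path + (key,), value


def split_rows(obj, name, pattern):
    """Hypothetical helper: one flat row per key that matches `pattern`.

    The matching key is removed from the column path and stored in the
    column `name`. Leaves with no matching key are treated as shared and
    copied into every row (a simplification made for this sketch).
    """
    regex = re.compile(pattern)
    rows = {}
    for path, value in flatten(obj):
        matched = [k for k in path if regex.match(k)]
        group = matched[0] if matched else None
        column = ".".join(k for k in path if k != group)
        rows.setdefault(group, {})[column] = value
    shared = rows.pop(None, {})
    return [{name: g, **shared, **cols} for g, cols in rows.items()] or [shared]


# Made-up QC object, only for illustration.
qc = {
    "general": {"title": "demo"},
    "align": {"rep1": {"mapped": 100}, "rep2": {"mapped": 90}},
}

for row in split_rows(qc, "replicate", r"^(rep|ctl)\d+$"):
    print(row)
# {'replicate': 'rep1', 'general.title': 'demo', 'align.mapped': 100}
# {'replicate': 'rep2', 'general.title': 'demo', 'align.mapped': 90}
```

Each printed dict corresponds to one spreadsheet row, with the dict keys acting as column headers.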
This tool can read directly from various types of URIs (gs://, s3://, http://, https:// and local paths). To access private cloud buckets (gs:// and s3://), make sure that you are authenticated in your shell environment. To access private URLs, use a ~/.netrc file.
Make sure that you have python3 (>= 3.6) installed on your system. Use pip to install qc2tsv.
$ pip install qc2tsv
We will demonstrate how to make a QC spreadsheet from two qc.json files, QC_SE and QC_PE, which were generated by the ENCODE ChIP-seq pipeline. You can use any URIs in the command line arguments.
$ QC_SE=https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ref_output/v1.3.0/ENCSR000DYI_subsampled_chr19_only/qc.json
$ QC_PE=https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ref_output/v1.3.0/ENCSR936XTK_subsampled_chr19_only/qc.json
$ qc2tsv $QC_SE $QC_PE
See output.
$ qc2tsv [JSON_FILE1] [JSON_FILE2] ...
Read QC JSON file URIs from a text file TXT.
$ qc2tsv --file [TXT]
Define a split rule (NAME:REGEX) with a regular expression to split a row into multiple rows. This is useful to have a new row for each biological replicate in a genomic pipeline's QC JSON output. Make sure that backslashes in REGEX are correctly escaped. You can also define multiple split rules.
$ qc2tsv ... --regex-split-rule "replicate:^(rep|ctl)\\d+$" --regex-split-rule "[RULE_NAME:REGEX]" ...
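Before passing a split rule on the command line, it can help to check which key names the regular expression actually matches. The sketch below does this with Python's `re` module. The `parse_rule` helper, which splits `NAME:REGEX` at the first colon, is a hypothetical illustration and is not taken from qc2tsv's source. The sample key names are made up. Note that inside double quotes the shell turns `\\d` into `\d`, which is the form the regular expression engine needs.

```python
import re


def parse_rule(rule):
    """Hypothetical helper: split 'NAME:REGEX' at the first colon."""
    name, regex = rule.split(":", 1)
    return name, re.compile(regex)


# The rule as the program would receive it after the shell has
# reduced "\\d" (inside double quotes) to "\d".
name, pattern = parse_rule(r"replicate:^(rep|ctl)\d+$")

# Made-up key names to test against the rule.
keys = ["rep1", "ctl2", "rep", "rep1_pr", "pooled"]
matching = [key for key in keys if pattern.match(key)]

print(name)      # replicate
print(matching)  # ['rep1', 'ctl2']
```

Only keys that consist of `rep` or `ctl` followed by digits match here, so keys such as `rep1_pr` would not trigger a split under this rule.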