Releases: bcbio/bcbio-nextgen
v1.2.9
- Fix vcf header bug: T/N SAMPLE lines are back - needed for import to SolveBio
- add strandedness: auto for -l A option in salmon
- report 10x more peaks in ChIP/ATAC-seq - use 0.05 qvalue
- fix misleading RNA-seq duplicated reads statistics: thanks @sib-bcf
- reorganize conda environments
- snpEff 5.0
- strandedness: auto
- document WGBS pipeline steps
- make --local an option, not default in bismark alignment - too slow
- bcbioRNASeq update to 0.3.44
- pureCN update to 2.0.1
- octopus update to 0.7.4
v1.2.8
- Set ENCODE library complexity flags properly for ChIP-seq. Thanks to @mistrm82.
- Fix greylisted peaks not being propagated to the output directory. Thanks to @mistrm82.
- Better error message when no sample barcodes are found for single-cell RNA-seq.
- Better trimming for 2 wgbs kits
- enable setting parameters for deduplicate_bismark
- custom threading for bismark via yaml
- reproducible WGBS user story with the data from Encode
- During consensus peak calling, keep the highest scoring peak instead of calling the summit for the highest scoring peak and expanding the peak to 250 bases.
- Enable consensus peak calling for broad peaks. Thanks to @mistrm82 and @yoonsquared for pointing out this was missing.
- Re-enable ATAC-seq tests, they work now.
- svprioritize for mm10
- purecn_Dx.R - mutational signatures - still requires a manual update of deconstructsigs or release of it
- make sure purecn uses sv_regions bed to call variants
- fix misleading disambiguation fastqc read statistics (total, hg38, mm10)
- wgbs: nebemseq kit: add --maxins 1000 and --local to bismark align
- WGBS: sorted indexed deduplicated bam for ready.bam
- print error message when aligner: false and hla typing is on
- make sure that mark_duplicates is false with collapsed UMI input
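
The "keep the highest scoring peak" behaviour described for consensus peak calling in this release can be illustrated with a small sketch. This is not bcbio's implementation: the function name `keep_best_overlapping`, the tuple layout, and the simple chaining of overlapping intervals are all assumptions made for illustration.

```python
def keep_best_overlapping(peaks):
    """Collapse overlapping peaks, keeping the highest scoring one per cluster.

    peaks: iterable of (chrom, start, end, score) tuples with half-open
    coordinates. This tuple layout is hypothetical, not bcbio's data model.
    """
    best = []           # one surviving peak per cluster of overlapping peaks
    cluster_end = None  # right-most end seen so far in the current cluster
    for peak in sorted(peaks, key=lambda p: (p[0], p[1], p[2])):
        chrom, start, end, score = peak
        if best and best[-1][0] == chrom and start < cluster_end:
            # Peak overlaps the current cluster: extend the cluster and
            # keep whichever peak has the higher score.
            cluster_end = max(cluster_end, end)
            if score > best[-1][3]:
                best[-1] = peak
        else:
            # No overlap (or a new chromosome): start a new cluster.
            best.append(peak)
            cluster_end = end
    return best


example = [
    ("chr1", 100, 200, 5.0),
    ("chr1", 150, 300, 9.0),
    ("chr1", 290, 400, 7.0),
    ("chr1", 500, 600, 3.0),
    ("chr2", 100, 200, 1.0),
]
print(keep_best_overlapping(example))
```

The surviving peak keeps its original coordinates instead of being re-centred on a summit and padded, which is the difference the changelog entry describes.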
v1.2.7
- RNASeq: Add gene body coverage plots to multiqc report.
- Restore ability to opt out of contamination checking via tools_off.
- Properly invoke threading for `verifybamid2`.
- Fix circular import issue when using bcbio functions outside of the main bcbio script.
- Enable setting custom PureCN options via YAML file.
v1.2.6
- RNASeq: Fail more gracefully if SummarizedExperiment object cannot be created.
- Fixes to handle DRAGEN BAM files from the first stage of UMI processing.
- Fix issue with double-annotating with dbSNP. Separating out somatic variant annotation into its own vcfanno configuration.
v1.2.5
1.2.5 (01 January 2021)
- Joint calling for RNA-seq variant calling requires setting `jointcaller` to bring it in line with the configuration options for variant calling.
- Allow pre-aligned BAMs and gVCFs for RNA-seq joint variant calling. Thanks to @WimSpree for the feature.
- Allow `CollectSequencingArtifacts` to be turned off via `tools_off: [collectsequencingartifacts]`.
- Fix getiterator -> iter deprecation in ElementTree. Thanks to @smoe.
- Add SummarizedExperiment object from RNA-seq runs, a simplified version of the bcbioRNASeq object.
- Add `umi_type: dragen`. This enables bcbio to run with first-pass, pre-consensus called UMI BAM files from DRAGEN.
- Turn off inferential replicate loading when creating the gene x sample RNA-seq count matrix. This allows loading of thousands of RNA-seq samples.
- Only make isoform to gene file from express if we have run express.
- Allow "no consensus peaks found" as a valid endpoint of a ChIP-seq analysis.
- Allow `BCBIO_TEST_DIR` environment variable to control where tests end up.
- Collect OxoG and other sequencing artifacts due to damage.
- Round tximport estimated counts.
- Turn off consensus peak calling for broad peaks. Thanks to @lbeltrame and @LMannarino for diagnosing the broad-peaks-run-forever bug.
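
The ElementTree entry in this release refers to a change in the Python standard library: `Element.getiterator()` was deprecated in favour of `Element.iter()` and removed in Python 3.9. A minimal sketch of the replacement call follows. The XML snippet and the `sample_names` helper are hypothetical and are not taken from bcbio.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML used only for illustration; it is not a bcbio file format.
XML = "<run><sample name='a'/><sample name='b'/></run>"


def sample_names(xml_text):
    """Return the name attribute of every <sample> element in document order."""
    root = ET.fromstring(xml_text)
    # Element.iter() replaces the deprecated Element.getiterator().
    return [element.get("name") for element in root.iter("sample")]


print(sample_names(XML))
```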
v1.2.4
1.2.4 (21 September 2020)
- Remove deprecated `--genomicsdb-use-vcf-codec` option as this is now the default.
- Add bismark output to MultiQC.
- Fix PS genotype field from octopus to have the correct type.
- Edit VarDict headers to report VCFv4.2, since htsjdk does not fully support VCFv4.3 yet.
- Attempt to speed up bismark by implementing the parallelization strategy suggested here: FelixKrueger/Bismark#96
- Add `--enumerate` option to OptiType to report the top 10 calls and scores, to make it easier to decide how confident we are in an HLA call.
- Performance improvements when HLA calling during panel sequencing. This skips running bwa-kit during the initial mapping for consensus UMI detection, greatly speeding up panel sequencing runs.
- Allow custom options to be passed to `featureCounts`.
- Fix race condition when running tests.
- Add TOPMed as a `datatarget`.
- Add predicted transcript and peptide output to arriba.
- Add mm10 as a supported genome for arriba.
- Skip `bcbioRNASeq` for more than 100 samples.
- Add `rRNA_pseudogene` as an rRNA biotype.
- Add `--genomicsdb-use-vcf-codec` when running GenotypeGVCF. See https://gatk.broadinstitute.org/hc/en-us/articles/360040509751-GenotypeGVCFs#--genomicsdb-use-vcf-codec for a discussion. Thanks to @amizeranschi for finding the issue and posting the solution.
- update VEP to v100
- Add consensus peak calling using https://bedops.readthedocs.io/en/latest/content/usage-examples/master-list.html to collapse overlapping peaks.
- Pre-filter consensus peaks by removing peaks with FDR > 0.05 before performing consensus peak calling.
- Add support for Qiagen's Qiaseq UPX 3' transcriptome kit for DGE. Support for 96 and 384 well configurations by specifying `umi_type: qiagen-upx-96` or `umi_type: qiagen-upx-384`.
- Add consensus peak counting using featureCounts.
- Skip using autosomal-reference when calling ataqv for mouse/human, as this has a problem with ataqv (see ParkerLab/ataqv#10 for discussion and followup).
- Add pre-generated ataqv HTML report to upload directory.
- Support single-end reads for ATAC-seq.
- Move featureCount output files to featureCounts directory in project directory.
- Remove RNA and reads in peak stats from MultiQC table when they are not calculated for a pipeline.
- Only show somatic variant counts in the general stats table, if germline variants are calculated.
- Add `kit` parameter for setting options for pipelines via just listing the kit. Currently only implemented for WGBS.
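
The FDR pre-filter mentioned in this release (dropping peaks with FDR > 0.05 before consensus calling) can be sketched as below. This assumes peaks are in narrowPeak format, where column 9 holds -log10(q-value). The helper names `passes_fdr` and `filter_peaks` are hypothetical, and this is an illustration and not the code bcbio runs.

```python
import math

Q_THRESHOLD = 0.05  # FDR cutoff named in the changelog entry


def passes_fdr(narrowpeak_line, q_threshold=Q_THRESHOLD):
    """Return True if the peak's q-value is at or below the threshold.

    Assumes tab-separated narrowPeak input with -log10(q-value) in column 9.
    """
    fields = narrowpeak_line.rstrip("\n").split("\t")
    neg_log10_q = float(fields[8])
    # q <= threshold is equivalent to -log10(q) >= -log10(threshold).
    return neg_log10_q >= -math.log10(q_threshold)


def filter_peaks(lines, q_threshold=Q_THRESHOLD):
    """Keep only the narrowPeak lines that pass the FDR cutoff."""
    return [line for line in lines if passes_fdr(line, q_threshold)]


# Made-up peaks: the first has q = 0.01, the second q = 0.1.
peaks = [
    "chr1\t100\t200\tpeak1\t500\t.\t8.1\t5.2\t2.0\t50",
    "chr1\t300\t400\tpeak2\t100\t.\t2.3\t1.5\t1.0\t40",
]
print(filter_peaks(peaks))
```

Comparing on the -log10 scale avoids converting every q-value back with a power of ten.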
v1.2.3
v.1.2.2
- Fix for not properly looking up R environment variables in the base environment.
- Remove --use-new-qual-calculator which was eliminated in GATK 4.1.5.0.
- Ensure header is not written for a Series. In pandas 0.24.0 the default for header was changed from False to True so we have to set it explicitly now.
- Remove unused Dockerfile. Thanks to @matthdsm.
- ATAC-seq: Skip peak-calling on fractions with < 1000 reads.
v1.2.1
- Update ChIP and ATAC bowtie2 runs to use `--very-sensitive`.
- Properly pad TSS BED file for ataqv TSS enrichment metrics.
- Skip bcbioRNASeq if there are less than three samples.
- Run joint-calling with single cores to save resources.
- Re-support PureCN.
- Skip segments with no informative SNPs when creating the LOH VCF file from PureCN output.
- Fix for duplicated output for mosdepth in quality control report.
- Fix for missing rRNA statistics.
v1.2.0
- Fix for bismark not being a supported aligner.
- Run ataqv (https://github.com/ParkerLab/ataqv) to calculate additional ATAC-seq quality control metrics.
- Workaround for some bcbioRNASeq plots failing with many samples when `interesting_groups` is not set.
- Add `known_fusions` parameter for passing in known fusions to arriba.
- Fix for tx2gene not working properly on some GTF files.
- Sort MACS2 output with UNIX sort to avoid memory issues.
- Run RiP on full peak file for ATAC-seq.
- Run ataqv on unfiltered BAM file with the full peak file.
- Run peddy on the population variant file, not the individual sample level file if joint calling was done.
- Add STAR to MultiQC metrics.
- Throw an error if STAR is run on a genome with alts.
- Don't run bcbioRNASeq if there is only one sample. Thanks to @kmendler for the suggestion.
- Improve arriba sensitivity by setting `--peOverlapNbasesMin 10` and `--alignSplicedMateMapLminOverLmate 0.5` when running STAR (see suhrig/arriba#41).
- Make TPM and counts files from tximport automatically.
- Use --keepDuplicates when making the Salmon index. This keeps transcripts that are identical in the index instead of randomly choosing one. This helps when comparing to other ways of quantifying the transcripts, ensuring all of the transcripts are represented.
- Remove unnecessary "quant" subdirectory for Salmon runs. This allows MultiQC to properly name the samples.
- Ensure STAR log file is propagated to the upload directory.
- Fix issue with memory not being specified properly when running `bcbio_prepare_samples.py`.
- Run tximport automatically and store TPM in `project/date/tpm` and counts in `project/date/counts`.
- Calculate ENCODE quality flags for ATAC-seq. See https://www.encodeproject.org/data-standards/terms/#library for a description of what the metrics mean.
- Fix for command line being too long while joint genotyping thousands of samples.
- Fix for command line being too long when running the CWL workflow with cromwell.