IlluminaLogVision: Extended Epigenomic Analytics for NovaSeq 6000

IlluminaLogVision is a Java-based toolkit for parsing and interpreting Illumina NovaSeq 6000 log files. It extracts detailed fields such as HPC node usage, system serial numbers, Q30 statistics, and pass-filter counts. Originally designed for epigenomic research pipelines, IlluminaLogVision uses these metrics to help optimize library preparation and HPC scheduling while monitoring data quality.

Background

Comprehensive Error Analytics: By capturing and standardizing error-rate measurements, we offer in-depth insight into basecalling quality, which is crucial for sensitive epigenetic assays like WGBS (whole-genome bisulfite sequencing) or histone ChIP-seq.
HPC Load Balancing: Multi-node HPC infrastructures often run demultiplexing or alignment tasks in parallel. Tracking HPC node usage and run distribution helps researchers identify bottlenecks and optimize resource allocation across large-scale epigenomic projects.
Q30 and Pass-Filter Metrics: The proportion of bases above Q30 and clusters passing filter are established indicators for run success. By aggregating these measures per lane, researchers can more quickly refine library prep conditions or revisit experimental design.
Yield and Cluster Density: Understanding how yield in gigabases correlates with cluster density is essential for tuning loading concentrations, which is especially beneficial for epigenetic workflows that rely on high coverage.
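One simple way to quantify how yield tracks cluster density is Pearson's correlation coefficient over per-lane values. The sketch below is illustrative only: the class name DensityYieldCorrelation and the sample values are hypothetical, and Pearson's r is a choice made here, not a metric documented for IlluminaLogVision.

```java
/**
 * Sketch: Pearson correlation between per-lane cluster density and yield (Gb).
 * Hypothetical class, not part of IlluminaLogVision; assumes values are already parsed.
 */
public class DensityYieldCorrelation {

    /** Returns Pearson's r for paired samples x and y (equal length, n >= 2). */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with n >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Accumulate covariance and variances around the means.
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) {
            throw new IllegalArgumentException("correlation undefined for constant input");
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical per-lane values for illustration, not real run data.
        double[] clusterDensity = {280, 300, 315, 330};
        double[] yieldGb = {34.0, 36.5, 38.2, 39.9};
        System.out.printf("r = %.3f%n", pearson(clusterDensity, yieldGb));
    }
}
```

A value of r near 1 means yield rises roughly linearly with density across lanes; it says nothing about overclustering effects that are nonlinear, so it is best read alongside the raw values.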

Features

Extended Parsing Logic: Reads HPC node details, run folder paths, Q30 figures, indexing barcodes, pipeline versions, and additional fields beyond basic logs.
Rich Analytics: Computes average error rate, error-rate standard deviation, lane-specific HPC usage frequency, total yield, and more.
Multi-File Compatibility: Accepts varying log formats, from minimal 7-field lines to larger lines featuring HPC node and pass-filter references.
Epigenetic Application: Designed to fit into bioinformatics pipelines for genomic and methylation assays, with detailed reporting on HPC usage and read quality.
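As a rough illustration of the analytics listed above, the sketch below computes the mean error rate, its standard deviation, and how often each HPC node appears per lane. The Entry record and RunStats class are hypothetical stand-ins, not the repository's own types, and the sample (n - 1) standard deviation is an assumption; the toolkit may use the population form. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RunStats {

    /** Minimal view of one log line; a hypothetical type for this sketch. */
    record Entry(String lane, String hpcNode, double errorRate) {}

    /** Mean error rate across all entries; NaN if the list is empty. */
    static double mean(List<Entry> entries) {
        return entries.stream().mapToDouble(Entry::errorRate).average().orElse(Double.NaN);
    }

    /** Sample standard deviation (n - 1 denominator); NaN for fewer than two entries. */
    static double sampleStdDev(List<Entry> entries) {
        int n = entries.size();
        if (n < 2) {
            return Double.NaN;
        }
        double m = mean(entries);
        double sumSq = 0;
        for (Entry e : entries) {
            double d = e.errorRate() - m;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / (n - 1));
    }

    /** Counts how often each HPC node appears, grouped by lane (sorted for stable output). */
    static Map<String, Map<String, Integer>> nodeUsageByLane(List<Entry> entries) {
        Map<String, Map<String, Integer>> usage = new TreeMap<>();
        for (Entry e : entries) {
            usage.computeIfAbsent(e.lane(), k -> new TreeMap<>())
                 .merge(e.hpcNode(), 1, Integer::sum);
        }
        return usage;
    }

    public static void main(String[] args) {
        // Hypothetical entries for illustration only.
        List<Entry> entries = List.of(
            new Entry("Lane1", "HPC-Node4", 0.0030),
            new Entry("Lane1", "HPC-Node4", 0.0034),
            new Entry("Lane2", "HPC-Node2", 0.0028));
        System.out.println("mean error rate = " + mean(entries));
        System.out.println("sample std dev  = " + sampleStdDev(entries));
        System.out.println("node usage      = " + nodeUsageByLane(entries));
    }
}
```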

Usage

Clone or download this repository, then build and run either via Gradle or directly using the Java command line. Ensure your logs are stored in assets/.

gradle build
gradle run --args="real_runA.txt"

Alternatively, compile and run manually:

javac *.java
java Main real_runA.txt

If no command-line argument is provided, the default file is real_runA.txt.
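The fallback behaviour described above could look roughly like the following. This is a sketch under assumptions: the class name LogFileResolver and the resolveLogPath helper are hypothetical, and it assumes file names are resolved relative to an assets/ directory in the working directory, which may differ from how the repository's Main actually locates logs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LogFileResolver {

    static final String DEFAULT_LOG = "real_runA.txt";
    // Assumption: logs live in ./assets relative to the working directory.
    static final Path ASSETS_DIR = Path.of("assets");

    /** Uses the first argument if present and non-blank, otherwise the default log name. */
    static Path resolveLogPath(String[] args) {
        String name = (args.length > 0 && !args[0].isBlank()) ? args[0] : DEFAULT_LOG;
        return ASSETS_DIR.resolve(name);
    }

    public static void main(String[] args) throws IOException {
        Path logPath = resolveLogPath(args);
        if (!Files.isRegularFile(logPath)) {
            System.err.println("Log file not found: " + logPath.toAbsolutePath());
            System.exit(1);
        }
        List<String> lines = Files.readAllLines(logPath);
        System.out.println("Read " + lines.size() + " lines from " + logPath);
    }
}
```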

Extended Log Example

RUN-20250903 Lane1 HPC-Node4 SN3000123456 /seqdata/210801_M04281_0123_000000000-A1B2C 2025-09-03T09:10:22Z 42000000 0.0030 315 38.2 91.5 Q30=88.9 Index=ACTG NGS-v2.2.1 bcl2fastq2.20

In order, the fields are Run ID, Lane, HPC Node, Machine Serial, Run Folder Path, Timestamp, Read Count, Error Rate, Cluster Density, Yield (Gb), Pass-Filter Count, Q30, Index, Pipeline Version, and Analysis Software.
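A whitespace-separated line like the example above can be mapped onto these fifteen fields with a small parser. The sketch below is hypothetical (ExtendedLogParser and LogEntry are not the repository's classes). It assumes fields never contain spaces, strips the Q30= and Index= prefixes, and parses the pass-filter and cluster-density values as doubles because the example shows 91.5 for pass-filter. Lines with a different field count, such as the minimal 7-field format whose layout is not documented here, are simply rejected; malformed numbers throw the usual parse exceptions.

```java
import java.time.Instant;
import java.util.Optional;

public class ExtendedLogParser {

    /** One parsed extended log line; field order follows the README list. */
    record LogEntry(String runId, String lane, String hpcNode, String machineSerial,
                    String runFolder, Instant timestamp, long readCount, double errorRate,
                    double clusterDensity, double yieldGb, double passFilter, double q30,
                    String index, String pipelineVersion, String analysisSoftware) {}

    static final int EXTENDED_FIELD_COUNT = 15;

    /** Parses one extended line; returns empty if the field count does not match. */
    static Optional<LogEntry> parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != EXTENDED_FIELD_COUNT) {
            return Optional.empty();
        }
        return Optional.of(new LogEntry(
            f[0], f[1], f[2], f[3], f[4],
            Instant.parse(f[5]),                          // ISO-8601 UTC timestamp
            Long.parseLong(f[6]),                         // read count
            Double.parseDouble(f[7]),                     // error rate
            Double.parseDouble(f[8]),                     // cluster density
            Double.parseDouble(f[9]),                     // yield (Gb)
            Double.parseDouble(f[10]),                    // pass-filter value
            Double.parseDouble(stripPrefix(f[11], "Q30=")),
            stripPrefix(f[12], "Index="),
            f[13], f[14]));
    }

    /** Removes a "key=" prefix if present, e.g. "Q30=88.9" becomes "88.9". */
    static String stripPrefix(String token, String prefix) {
        return token.startsWith(prefix) ? token.substring(prefix.length()) : token;
    }

    public static void main(String[] args) {
        String example = "RUN-20250903 Lane1 HPC-Node4 SN3000123456 "
            + "/seqdata/210801_M04281_0123_000000000-A1B2C 2025-09-03T09:10:22Z "
            + "42000000 0.0030 315 38.2 91.5 Q30=88.9 Index=ACTG NGS-v2.2.1 bcl2fastq2.20";
        parse(example).ifPresentOrElse(
            e -> System.out.println(e.runId() + " " + e.lane() + " Q30=" + e.q30()),
            () -> System.out.println("not an extended log line"));
    }
}
```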

Planned Research-Focused Updates

Dynamic Epigenomic Reporting: Automated generation of QC charts for methylation coverage vs. read error distribution, enabling real-time assessment of CpG-specific data quality.
Integrative HPC Metrics: Collect HPC node performance stats (CPU load, memory usage) to refine scheduling across batch-based or containerized workflows.
Hybrid Cloud Support: Real-time synchronization with off-site analysis clusters for massive epigenome projects.

License

MIT License

VerisimilitudeX/IlluminaLogVision
