BoaG is a domain-specific language and infrastructure for genomics data, built on top of Hadoop. The infrastructure is publicly available at http://boa.cs.iastate.edu/boag/index.php
http://boa.cs.iastate.edu/examples/boag/index.php
The Protocol Buffers schema and the step-by-step data generation process are shown here.
BoaG can be integrated with Jupyter Notebook. The most time-consuming part of the analysis runs on the BoaG infrastructure, and its output can then be analyzed further with R, Python, Bash, Perl, etc. More examples can be found in the jupyter notebooks folder.
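As a rough illustration of that post-processing step, the sketch below parses query output in Python. It assumes the output is plain text with one `name[key] = value` entry per line; that line shape, the function name `parse_boag_output`, and the sample values are all assumptions made for illustration, not taken from the BoaG documentation or from real query results.

```python
import re
from collections import defaultdict

# Assumed line shape (not confirmed by the BoaG docs): name[key1][key2] = value
LINE_RE = re.compile(r"^(\w+)((?:\[[^\]]*\])*)\s*=\s*(.*)$")


def parse_boag_output(text):
    """Group output lines by variable name, keyed by a tuple of their indices.

    Hypothetical helper: lines that do not match the assumed shape are skipped.
    """
    results = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        name, raw_keys, value = match.groups()
        # "[a][b]" -> ("a", "b"); an empty index "[]" becomes ("",)
        keys = tuple(re.findall(r"\[([^\]]*)\]", raw_keys))
        results[name][keys] = value.strip()
    return dict(results)


# Made-up sample text, used only to exercise the parser.
SAMPLE = """\
counts[Bacteria] = 120
counts[Archaea] = 15
total[] = 135
"""

parsed = parse_boag_output(SAMPLE)
for keys, value in parsed["counts"].items():
    print(keys[0], int(value))
```

Values are kept as strings so that the caller decides how to convert them; from here the dictionary can be handed to whatever analysis library the notebook already uses.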
The BoaG compiler is written in Java, and the source code is available here.
- This video gives step-by-step instructions for setting up a programming environment in Eclipse for the Boa compiler. link