Detecting and correcting misclassified sequences in the large-scale public databases

Dataset: Non-Redundant (NR) and CD-HIT clustering information


Boag: Boa for genomics

Boag is a domain-specific language and infrastructure on top of Hadoop for genomics data. Website: https://boalang.github.io/bio/

Boag example on the infrastructure: http://boa.cs.iastate.edu/examples/boag/index.php

Prerequisites

Install Java before running Boag; the Boag compiler is written in Java. It can be downloaded here.
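
One way to confirm that the Java prerequisite is met before going further is a small check like the sketch below. It is not part of Boag; the helper name `java_version` is hypothetical, and the script only assumes that a working installation exposes a `java` executable on the `PATH`.

```python
import shutil
import subprocess


def java_version():
    """Return the first line of the `java -version` banner, or None if unavailable."""
    java = shutil.which("java")  # look up the executable on PATH
    if java is None:
        return None
    # `java -version` normally writes its banner to stderr, so check that first.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = java_version()
    if version:
        print(version)
    else:
        print("Java not found on PATH; install it before running Boag.")
```

If the script reports that Java is missing, install it and rerun the check before trying any of the run options.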

Run Boag

These instructions cover running Boag from the command line, in a Jupyter notebook, in a Docker container, and on Hadoop. You can also set up a programming environment in Eclipse.

From Jupyter notebook

From command line

On a Docker container

On Hadoop

Boag Compiler source code

  • The Boag compiler is written in Java. See the source code.
  • This video gives step-by-step instructions for setting up a programming environment in Eclipse for the Boa compiler: link

Boag Query Script examples:

Download dataset and VirtualBox

  • Google Drive Link
  • A web interface is also implemented on Ubuntu Linux and can be viewed in VirtualBox.
