This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

Roadmap

A. R. Shajii edited this page Jan 23, 2020 · 2 revisions

This document gives a high-level view of our plans for Seq 1.0, along with the main large-scale changes and additions we'd like to make. We will update this document regularly as we make progress and/or update our plans!

What we want before 1.0

There are several things we aim to do before 1.0; here are the big ones.

(1) Parser re-work and full bi-directional type deduction

We currently have an OCaml-based parser built with Menhir, but type deduction happens separately, in C++, just before code generation. This split makes some things difficult, such as implementing bi-directional type deduction that can handle Python constructs like empty collection literals and lambda functions. A more principled approach is to perform all type deduction and checking while parsing, then generate code from a fully typed AST; this is what we are currently implementing. We are also in the process of moving large parts of the parser from OCaml to C++.
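To make the empty-literal problem concrete: in `xs = []` followed by `xs.append(42)`, nothing at the literal itself says what the list holds, and the element type is only known from a later use. The sketch below is a toy written in Python that handles this one case with a placeholder type, which is filled in by unification when the later use is seen. It is an illustration under our own assumptions, not Seq's actual type checker: the names `TVar`, `TCon`, `unify` and `infer`, and the tuple-based "program" format, are all hypothetical, and unification is a simpler swapped-in technique rather than full bi-directional deduction. It also omits the occurs check a real implementation would need.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(eq=False)
class TVar:
    """Hypothetical placeholder type, e.g. the element type of an empty `[]`."""
    ref: Optional[Ty] = None  # filled in once a later use fixes the type


@dataclass(frozen=True)
class TCon:
    """Hypothetical concrete type such as int, str or list[T]."""
    name: str
    args: tuple = ()


Ty = Union[TVar, TCon]
INT, STR = TCon("int"), TCon("str")
LITERAL_TYPES = {int: INT, str: STR}


def resolve(t: Ty) -> Ty:
    """Follow placeholder links to the most specific known type."""
    while isinstance(t, TVar) and t.ref is not None:
        t = t.ref
    return t


def unify(a: Ty, b: Ty) -> None:
    """Make `a` and `b` the same type, binding placeholders as needed (no occurs check)."""
    a, b = resolve(a), resolve(b)
    if a is b:
        return
    if isinstance(a, TVar):
        a.ref = b
    elif isinstance(b, TVar):
        b.ref = a
    elif a.name != b.name or len(a.args) != len(b.args):
        raise TypeError(f"cannot unify {show(a)} with {show(b)}")
    else:
        for x, y in zip(a.args, b.args):
            unify(x, y)


def show(t: Ty) -> str:
    """Render a type; unresolved placeholders print as '?'."""
    t = resolve(t)
    if isinstance(t, TVar):
        return "?"
    if not t.args:
        return t.name
    return f"{t.name}[{', '.join(show(x) for x in t.args)}]"


def infer(program: list) -> dict:
    """Type a toy program of ("empty_list", name) and ("append", name, literal) steps."""
    env: dict = {}
    for op, name, *rest in program:
        if op == "empty_list":      # name = []
            env[name] = TCon("list", (TVar(),))
        elif op == "append":        # name.append(literal)
            unify(env[name], TCon("list", (LITERAL_TYPES[type(rest[0])],)))
    return {k: show(v) for k, v in env.items()}
```

For example, `infer([("empty_list", "xs"), ("append", "xs", 42)])` resolves `xs` to `list[int]`, a list that is never used stays `list[?]`, and appending a string after an int raises `TypeError`. The point of the design is that the literal is typed with an open placeholder instead of failing immediately, so a later statement can decide the type.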

(2) Seq IR

Instead of compiling directly to LLVM, we want to design a new IR that sits between Seq source and LLVM IR. This will enable us to do both Python-specific and bio-specific optimizations in a much more systematic way. For example, there are many common patterns in Python that we can potentially catch and optimize with such an IR (e.g. ''.join(a for a in b)), and the same goes for bioinformatics. Comparable IRs would be Rust's MIR and Swift's SIL.
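As an illustration of the kind of pattern such a pass could catch, the sketch below rewrites `''.join(a for a in b)` into the equivalent `''.join(b)`, dropping a generator that only re-yields its input. Because Seq IR is still being designed, this is written against Python's own `ast` module (`ast.NodeTransformer`, `ast.unparse`, Python 3.9+) as a stand-in; the class name `IdentityGenJoin` and the helper `rewrite` are hypothetical, and a real IR pass would match on lowered instructions rather than on syntax trees.

```python
import ast


class IdentityGenJoin(ast.NodeTransformer):
    """Hypothetical pass: rewrite `"<sep>".join(x for x in xs)` to `"<sep>".join(xs)`.

    Only the exact identity-generator shape is matched: a string-literal
    receiver, a single `for` clause with a plain name target, no `if`
    filters, and an element expression that is just that same name.
    """

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        if not (
            isinstance(func, ast.Attribute)
            and func.attr == "join"
            and isinstance(func.value, ast.Constant)
            and isinstance(func.value.value, str)
            and len(node.args) == 1
            and not node.keywords
        ):
            return node
        gen = node.args[0]
        if not isinstance(gen, ast.GeneratorExp) or len(gen.generators) != 1:
            return node
        comp = gen.generators[0]
        if (
            isinstance(comp.target, ast.Name)
            and isinstance(gen.elt, ast.Name)
            and gen.elt.id == comp.target.id
            and not comp.ifs
            and not comp.is_async
        ):
            node.args[0] = comp.iter  # keep the iterable, drop the generator
        return node


def rewrite(src: str) -> str:
    """Parse `src`, apply the pass, and return the rewritten source."""
    tree = IdentityGenJoin().visit(ast.parse(src))
    return ast.unparse(ast.fix_missing_locations(tree))
```

For example, `rewrite("s = ''.join(a for a in b)")` returns source equivalent to `s = ''.join(b)`, while `''.join(a.upper() for a in b)` is left alone because its generator does real work. Restricting the receiver to a string literal keeps the rewrite safe: for an arbitrary object with a `join` method, passing the iterable instead of a generator might change behavior.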

(3) REPL and Jupyter support

We already have much of the machinery in place to add a REPL and implement Jupyter support; we are actively working on this!

(4) A few more bioinformatics features

For example, we want to support a few more formats natively, such as VCF. We use htslib internally, so much of this will simply be a matter of interfacing with that library.

(5) Documentation

In short, we want to document the Seq standard library, various APIs, and the compiler itself in more depth.

(6) Package management

A system for managing packages, dependencies, and compiler versions.

Python interoperability

The parser re-work described in (1) above is also intended to further close the language gap between Seq and Python. As for the standard library, we have already implemented several Python standard library modules (see stdlib/) and will very likely add more. Here are the modules we currently have (note that some functions in these modules may not be implemented yet; documenting this in detail falls under (5) above):

  • bisect
  • collections
  • getopt
  • gzip
  • heapq
  • itertools
  • math
  • pickle
  • random
  • statistics
  • sys
  • threading
  • time

Python has many standard library modules; those that are not immediately relevant to bioinformatics are low-priority for us. Examples would be smtplib, email, html, http, urllib and the like, although contributions pertaining to one of these modules are certainly still welcome!