Roadmap
This document gives a high-level view of our plans for Seq 1.0, along with the main large-scale changes and additions we'd like to make. We will update this document regularly as we make progress and/or update our plans!
There are several things we aim to do before 1.0; here are the big ones.
1. Parser and type checking: We currently have an OCaml-based parser that uses Menhir. However, type deduction is done in C++ just before code generation. This makes some things difficult, such as implementing bidirectional type deduction that can handle Python constructs like empty collection literals and lambda functions. A more principled approach, which we are currently implementing, is to perform all type deduction and checking during parsing and then generate code from a fully typed AST. We are also in the process of moving large parts of the parser from OCaml to C++.
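Empty collection literals are hard because their type depends on code that comes later. The sketch below shows one common way to let that information flow backwards: type variables plus unification, a standard Hindley-Milner-style technique. It is only an illustration, not Seq's actual type checker, and the names `TVar`, `Concrete`, `unify` and `show` are hypothetical. The empty list `[]` gets an unknown element type `?T`, and a later `append` call fixes it:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class TVar:
    """Hypothetical: an unresolved type, e.g. the element type of `[]`."""
    name: str
    bound: Optional[Type] = None  # set once later code pins the type down


@dataclass(frozen=True)
class Concrete:
    """Hypothetical: a known type constructor with arguments, e.g. list[int]."""
    name: str
    args: Tuple[Type, ...] = ()


Type = Union[TVar, Concrete]


def resolve(t: Type) -> Type:
    """Follow bindings until reaching a concrete type or an unbound variable."""
    while isinstance(t, TVar) and t.bound is not None:
        t = t.bound
    return t


def unify(a: Type, b: Type) -> None:
    """Force a and b to be the same type, binding variables as needed.

    Simplified: there is no occurs check, so cyclic types are not rejected.
    """
    a, b = resolve(a), resolve(b)
    if a is b:
        return
    if isinstance(a, TVar):
        a.bound = b
    elif isinstance(b, TVar):
        b.bound = a
    elif a.name != b.name or len(a.args) != len(b.args):
        raise TypeError(f"cannot unify {show(a)} with {show(b)}")
    else:
        for x, y in zip(a.args, b.args):
            unify(x, y)


def show(t: Type) -> str:
    """Render a type, printing unresolved variables as ?name."""
    t = resolve(t)
    if isinstance(t, TVar):
        return f"?{t.name}"
    if not t.args:
        return t.name
    return f"{t.name}[{', '.join(show(x) for x in t.args)}]"


# x = []          -> x : list[?T]; the element type is not known yet
elem = TVar("T")
x_type = Concrete("list", (elem,))
print(show(x_type))  # list[?T]

# x.append(42)    -> checking the call flows int back into ?T
unify(elem, Concrete("int"))
print(show(x_type))  # list[int]

# x.append("hi")  -> with ?T now fixed to int, this is a genuine type error
try:
    unify(elem, Concrete("str"))
except TypeError as err:
    print(err)  # cannot unify int with str
```

Doing this during checking, rather than just before code generation, means every expression already has a fully resolved type by the time the AST reaches codegen.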
2. New intermediate representation (IR): Instead of compiling directly to LLVM, we want to design a new IR that sits between Seq source and LLVM IR. This will let us perform both Python-specific and bio-specific optimizations much more systematically. For example, there are many common Python patterns that such an IR could catch and optimize (e.g. `''.join(a for a in b)`), and the same goes for bioinformatics. Comparable IRs include Rust's MIR and Swift's SIL.
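Here is a flavor of the pattern-directed rewriting such an IR would make routine. This sketch works on Python's own `ast` module as a stand-in, because Seq's IR is not specified here. The pass `JoinGenexprSimplifier` and the helper `simplify` are hypothetical names. The pass rewrites the identity generator `''.join(a for a in b)` into `''.join(b)`, which skips building a generator object. It only fires when the generator yields its loop variable unchanged, with no filters:

```python
import ast


class JoinGenexprSimplifier(ast.NodeTransformer):
    """Hypothetical pass: rewrite ''.join(a for a in b) into ''.join(b)."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # simplify nested calls first
        func = node.func
        is_str_join = (
            isinstance(func, ast.Attribute)
            and func.attr == "join"
            and isinstance(func.value, ast.Constant)
            and isinstance(func.value.value, str)
            and len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.GeneratorExp)
        )
        if not is_str_join:
            return node
        gen = node.args[0]
        if len(gen.generators) != 1:
            return node
        comp = gen.generators[0]
        # Only the identity generator qualifies: the element is exactly the
        # loop variable, with no `if` filters and no `async for`.
        if (
            isinstance(gen.elt, ast.Name)
            and isinstance(comp.target, ast.Name)
            and gen.elt.id == comp.target.id
            and not comp.ifs
            and not comp.is_async
        ):
            node.args = [comp.iter]
        return node


def simplify(source: str) -> str:
    """Parse, apply the pass, and turn the tree back into source (Python 3.9+)."""
    tree = JoinGenexprSimplifier().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(simplify("s = ''.join(a for a in b)"))  # s = ''.join(b)
```

The rewrite is safe because `str.join` accepts any iterable. A generator that transforms or filters its elements, such as `''.join(a.upper() for a in b)`, is left unchanged.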
3. REPL and Jupyter support: We already have much of the machinery in place to add a REPL and implement Jupyter support, and we are actively working on this!
4. Additional bioinformatics functionality: for example, natively supporting a few more file formats, such as VCF. We use htslib internally, so much of this will just be a matter of interfacing with that library.
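To show what "native" VCF support would need to expose, here is a minimal pure-Python sketch of the eight fixed columns of a VCF data line: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. It is an illustration only. As noted above, Seq itself would go through htslib's parser rather than anything like this. The `VcfRecord`, `parse_vcf_line` and `read_vcf` names are hypothetical, and the FORMAT and per-sample columns are ignored:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class VcfRecord:
    """Hypothetical record type for the fixed VCF columns."""
    chrom: str
    pos: int                         # 1-based position, as written in the file
    id: Optional[str]
    ref: str
    alt: List[str]
    qual: Optional[float]
    filter: List[str]
    info: Dict[str, Optional[str]]   # flag entries (no '=') map to None


def parse_vcf_line(line: str) -> VcfRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("VCF data lines need at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]  # FORMAT/samples ignored
    info_map: Dict[str, Optional[str]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_map[key] = value if sep else None
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=None if vid == "." else vid,
        ref=ref,
        alt=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filter=[] if flt == "." else flt.split(";"),
        info=info_map,
    )


def read_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield records, skipping '##' meta lines, the '#CHROM' header and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\n",
]
for rec in read_vcf(example):
    print(rec.chrom, rec.pos, rec.alt, rec.info)
    # 20 14370 ['A'] {'NS': '3', 'DP': '14', 'DB': None}
```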
5. Documentation: documenting the Seq standard library, its various APIs, and the compiler itself in more depth.
6. Package management: some system to manage packages, dependencies, and compiler versions.
7. Python compatibility: The parser rework described in (1) above is also intended to further close the language gap between Seq and Python. As for the standard library, we have already implemented several Python standard library modules (see `stdlib/`) and will likely add more. Here are the modules we currently have (note that some functions in these modules may not yet be implemented; documenting this in detail falls under (5) above):
- bisect
- collections
- getopt
- gzip
- heapq
- itertools
- math
- pickle
- random
- statistics
- sys
- threading
- time
Python has many standard library modules; those that are not immediately relevant to bioinformatics are low-priority for us. Examples include `smtplib`, `email`, `html`, `http`, `urllib` and the like, although contributions pertaining to any of these modules are certainly still welcome!