Name | ID | Email |
---|---|---|
Markus Ankenbrand | MA | markus.ankenbrand@uni-wuerzburg.de |
Thomas Hackl | TH | thomas.hackl@uni-wuerzburg.de |
Frank Foerster | FF | frank.foerster@biozentrum.uni-wuerzburg.de |
Whole genome alignment and visualization of homologous regions are essential tools in comparative genomics. However, currently available software either performs poorly on large genomes (>100 Mbp) when it is based directly on alignments, or is aimed at synteny visualization that depends on comprehensive gene prediction/annotation prior to visualization. Additionally, visualization is usually integrated into some form of interactive viewer that carries a lot of meta information. The actual graphics, however, are mostly “ugly”.
AliTV objectives:
- generation of whole genome alignments
  - using established methods (lastz, mummer)
  - using alternative methods (e.g. daligner)
- conversion of alignment data and visualization
  - using Circos (circular, 2 genomes)
  - using d3.js
    - circular, multiple genomes
    - linear, multiple genomes
A script implementing the git-ff-merge strategy for pushing/pulling is available at the binf git base (132.187.22.105:common/git-scripts).
Current suggestions - entirely open to discussion.
general tasks:
coding tasks:
;;; TH's TODO color scheme
(setq org-todo-keyword-faces
      '(("TODO" . "red1")
        ("BUGF" . "red1")
        ("FEAT" . "orange1")
        ("INPG" . "orange1")
        ("UINV" . "orange1")
        ("DISC" . "CornflowerBlue")
        ("HOLD" . "CornflowerBlue")
        ("DONE" . "ForestGreen")
        ("FIXD" . "ForestGreen")))
Column view gives a good overview of outline properties and easy access for modifying them.
- on
  - C-c C-x C-c (for subtree)
- off
  - q (on highlighted entry)
- navigate
  - arrow keys
- modify
  - S-arrow key or e
http://orgmode.org/worg/org-tutorials/org-column-view-tutorial.html
Actual dev on pipeline source code - features, bugfixes etc, goes here
Ideas, brainstorming, experimenting, etc …
- d3js
- Parallel linear diagrams
- Sankey diagrams
- Tree layout
- simple tab separated format, defined columns
Columns: SID = sequence id (chromosome); GID = genome id (to which of multiple genomes this sequence belongs); LEN = length of this sequence; SEQ = sequence as text (optional).
Row layout: SID GID LEN [SEQ]
- simple
- not standardized
- not flexible
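
As an illustration of the sequence table described above, here is a minimal Python sketch of a reader. The names `SeqRecord` and `parse_sequences` are hypothetical, and skipping blank lines and `#` comment lines is an assumption, not part of the format as noted here.

```python
from typing import NamedTuple, Optional


class SeqRecord(NamedTuple):
    sid: str             # SID: sequence id (chromosome)
    gid: str             # GID: genome id
    length: int          # LEN: length of this sequence
    seq: Optional[str]   # SEQ: sequence as text, None if the column is absent


def parse_sequences(lines):
    """Parse tab-separated 'SID GID LEN [SEQ]' rows into SeqRecord tuples."""
    records = []
    for n, raw in enumerate(lines, 1):
        line = raw.rstrip("\n")
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 3:
            raise ValueError(f"line {n}: expected SID, GID and LEN")
        # The fourth column is optional; an empty value counts as absent.
        seq = cols[3] if len(cols) > 3 and cols[3] else None
        records.append(SeqRecord(cols[0], cols[1], int(cols[2]), seq))
    return records


example = "chr1\tgenomeA\t8\tACGTACGT\nchr1\tgenomeB\t1200\n"
records = parse_sequences(example.splitlines())
```

Keeping SEQ optional means the same reader could handle both a lightweight length table and a table that carries the sequences themselves.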
Columns: SID = sequence id (chromosome); FID = feature id.
Row layout: SID FROM TO FID ...
- simple, standardized tsv format, with comprehensive tool box (bedtools) and conversion scripts to other formats
- existing data sets of arbitrary feature annotations can usually be converted to bed very easily (gff, blast, sam …)
- To use the features for links, the fourth column (feature id) has to be mandatory, in contrast to the bed specification, where it is optional.
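
The stricter rule on the fourth column could be enforced at load time. The following Python sketch is one way to do that, with some assumptions: `Feature` and `parse_features` are hypothetical names, extra bed columns are ignored, and rejecting duplicate feature ids is an added assumption (it seems necessary if links refer to features by id, but it is not stated in these notes).

```python
from typing import NamedTuple


class Feature(NamedTuple):
    sid: str    # sequence id (chromosome)
    start: int  # FROM column
    end: int    # TO column
    fid: str    # feature id, mandatory here


def parse_features(lines):
    """Read bed-like rows and index them by feature id."""
    features = {}
    for n, raw in enumerate(lines, 1):
        line = raw.rstrip("\n")
        # Skip blank lines, comments, and bed 'track'/'browser' lines.
        if not line or line.startswith(("#", "track ", "browser ")):
            continue
        cols = line.split("\t")
        if len(cols) < 4 or not cols[3]:
            raise ValueError(f"line {n}: feature id (column 4) is required")
        fid = cols[3]
        # Assumption: ids must be unique so that links resolve unambiguously.
        if fid in features:
            raise ValueError(f"line {n}: duplicate feature id {fid!r}")
        features[fid] = Feature(cols[0], int(cols[1]), int(cols[2]), fid)
    return features


features = parse_features(["chr1\t100\t200\tgeneA\n", "chr2\t50\t80\tgeneB\n"])
```

Returning a dictionary keyed by feature id makes the later lookup from a link's feature id to its coordinates a single step.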
Columns: FID_[AB] = feature id in set A/B; LTYPE = link type.
Row layout: FID_A LTYPE FID_B
- simple tsv, compatible with Cytoscape
- no link attributes, e.g. identity, score, etc.
- to add those attributes either an additional file is needed or the “link type” has to be abused
- simple tsv with header line
- mandatory columns are fida and fidb, all other columns are (named) link properties
- the header starts with a hash sign (#)
- if no header is present, “#fida type fidb” is assumed, which also supports the .sif format
Columns: FID_[AB] = feature id in set A/B; LTYPE = link type; IDY = link identity.
Header line: #fida type fidb identity
Row layout: FID_A LTYPE FID_B IDY
- flexible
- extensible
- can be imported into Cytoscape (as edge properties)
- not standardized
- (useful) header lines have to be documented
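
The header rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `parse_links` is a hypothetical name, header fields are assumed to be tab separated like the data rows, any `#` line after the first is treated as a comment, and property values are kept as strings.

```python
DEFAULT_HEADER = ["fida", "type", "fidb"]  # assumed when no header line is present


def parse_links(lines):
    """Parse a link table into a list of dicts keyed by column name."""
    header = None
    links = []
    for n, raw in enumerate(lines, 1):
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # The first '#' line seen before any data is the header;
            # later '#' lines are skipped as comments (assumption).
            if header is None:
                header = [name.strip() for name in line[1:].split("\t")]
                missing = {"fida", "fidb"} - set(header)
                if missing:
                    raise ValueError(f"header lacks columns: {sorted(missing)}")
            continue
        if header is None:
            header = list(DEFAULT_HEADER)  # headerless file, .sif-style columns
        cols = line.split("\t")
        if len(cols) != len(header):
            raise ValueError(f"line {n}: expected {len(header)} columns")
        links.append(dict(zip(header, cols)))
    return links


with_header = parse_links([
    "#fida\ttype\tfidb\tidentity\n",
    "geneA\thomolog\tgeneB\t97.5\n",
])
without_header = parse_links(["geneA\thomolog\tgeneB\n"])
```

Because every column other than fida and fidb is just a named property, a consumer can pick up new attributes such as a score column without any change to the reader.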
Test data sets etc.
The ultimate goal.