As sequencing technologies improve, we are able to produce a larger amount
of digitized genetic data. A reference genome is a structure which collects this information to form a representative sample of the genome for a
given species. To account for the variation which appears as the amount
of data increases, new models for representing reference genomes are being proposed. Graphs can express non-linear relationships between elements, a property which naturally accommodates
variation. Newer reference genomes already incorporate graph-like features through the introduction of alternate paths through specific regions.
Methods created for interacting with the existing structures are traditionally centered around linear data representations, realized as a set of text
string operations. In order to allow a complete transition, these methods
must be adapted to fit the domain of graphs.
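To make the idea of alternate paths concrete, the following is a minimal sketch of a toy sequence graph and a naive exact search over it. It is only an illustration under stated assumptions, not the method presented in this thesis: the graph layout, the node sequences, and the names GRAPH, paths, spell and find_read are all hypothetical, and enumerating every path is only practical for very small acyclic graphs.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: node id -> (sequence, successor ids).
# Nodes 2 and 3 are alternate paths (a variant) between nodes 1 and 4.
GRAPH: Dict[int, Tuple[str, List[int]]] = {
    1: ("ACG", [2, 3]),
    2: ("T", [4]),
    3: ("G", [4]),
    4: ("CCA", []),
}


def paths(graph: Dict[int, Tuple[str, List[int]]], start: int) -> List[List[int]]:
    """Enumerate every path from start to a node without successors."""
    _, successors = graph[start]
    if not successors:
        return [[start]]
    return [[start] + rest for nxt in successors for rest in paths(graph, nxt)]


def spell(graph: Dict[int, Tuple[str, List[int]]], path: List[int]) -> str:
    """Concatenate the node sequences along a path."""
    return "".join(graph[node][0] for node in path)


def find_read(
    graph: Dict[int, Tuple[str, List[int]]], start: int, read: str
) -> List[Tuple[List[int], int]]:
    """Return (path, offset) for each path whose spelled string contains read."""
    hits = []
    for path in paths(graph, start):
        offset = spell(graph, path).find(read)
        if offset != -1:
            hits.append((path, offset))
    return hits
```

In this sketch a read that spans the variant, such as "GTC", matches only the path through node 2, whereas a linear reference would have to commit to one of the two alleles in advance.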
In this thesis, we present a new method for aligning text strings against
graph-based reference genomes. The method is based on the concept of
context-based mapping, a technique proposed to standardize uniqueness in
structures which do not have an innate coordinate system. We have made
the method accessible through a tool which is available online.
We test the feasibility of our approach by doing performance comparisons
with existing methods, examining both accuracy and efficiency. The results
show that the approach outperforms the other proposed
solutions in several respects. We argue that the method provides a viable solution to the most
general version of the problem, which provides a basis for more specific biological applications.