-
Notifications
You must be signed in to change notification settings - Fork 22
/
RELEASE-NOTES
163 lines (148 loc) · 7.76 KB
/
RELEASE-NOTES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
RepeatModeler 2.0.6
===================
- Updated to RepeatScout 1.0.7 which has the capability of reporting the
extended coordinates of the seeds. This reduces the complexity searching
for examplars using the consensus sequence.
- Reduced the minimum family instances for RepeatScout families.
- Fixed a bug that caused RepeatClassifier to run with only one thread in the
last release.
- Fixed a bug that caused some RepeatScout derived exemplars to be reported
on the wrong strand.
- Isolate long sequence identifiers from LTR_retriever which will alter
names longer than 13 characters.
- Fixes for LTR_retriever changes in TRF parameter.
- MultAln - new API for calculating Kimura divergence
- new API for slicing operations on MSA
- switch to Smitten V2 for default stk output
- improvements to the Linup reporting format
- Refiner reorganization for future extension algorithm
RepeatModeler 2.0.5
===================
- Fixed a bug that caused failures in "absolute" reproducibility. Prior
to this release use of the "-srand" would not gaurantee that the ouputs
consensi.fa.classified and families-classified.stk were exactly the same
in sequence and sequence order. It did gaurantee that the same samples
were drawn from the genome, and that equivalent scoring families were
derived at at each step. In this release secondary sorts were added
to gaurantee a fixed sort order among equally scoring results, generating
exactly the same output files each time the random number generator seed
is used. NOTE: This change only applies to results generated with
this version and future releases.
- Added "-long" option to faToTwoBit to support larger genomes.
RepeatModeler 2.0.4
====================
- Improved the gathering of RepeatScout examplars for building
seed alignments.
- Parallelized and improved the masking between rounds for faster
runs and fewer redicoveries.
- The 'pa' (parallel batches) option has been replaced with a new
'threads' option which maps directly to the maximum number of
threads the program will attempt to use.
- Takes advantage of RepeatMasker 4.1.4 and RMBlast 2.13.0 parallel
query feature.
- Larger default sample sizes are now possible with speed improvements.
The original sampling strategy can be selected with the new 'quick'
option.
- ABBlast is no longer supported.
- Fixed a few visual artifacts in the viewMSA.pl html
visualization.
RepeatModeler 2.0.3
===================
- The program now generates a logfile in the working directory named
<seqfile>-rmod.log. This file contains the random seed number used
and some high level stats on the run for use with reporting problems
with the program.
- Fixed a problem with the orientation/coordinates provided in the
Stockholm output format that affected a subset of the sequences.
- Fixed a bug affecting the trim functions of the Linup tool.
RepeatModeler 2.0.2
===================
- First release of a set of manual curation tools for use with
de-novo generated TE libraries.
- Added generateSeedAlignments.pl to generate Dfam compatible
seed alignments given a consensus based TE library and
RepeatMasker output.
- Fixed bug in N50 calculation.
- Fixed Ruzzo-Tompa maximal scoring subsequences implementation.
- lots of small bugfixes.
RepeatModeler 2.0.1
===================
- Work around a bug introduced in the new NCBI Blast 2.10.0 with
version 5 databases.
- RepeatScout in some rare cases will generate models for very
long-period satellites. This can cause Refiner to go crazy creating
tons of off-diagonal alignments. This version filters out
these rare cases.
- This version now prints out the version of each dependency in the log.
RepeatModeler 2.0
=================
- Many RepeatModeler seed alignments and consensus sequences contained
stretch of N’s due masking performed by TRF. Instead of using the masked
sequences in the final seed alignment the original genomic sequences
are used, preserving low-complexity regions in the final consensus called.
- The batching system was refactored to improve handling of short input
sequences often found in early assemblies or low-coverage genome projects.
- The interface to TRF was inefficient and a major bottleneck in the
RepeatModeler pipeline. The simple/tandem repeat masking step was
refactored to improve performance.
- Added code to remove excess temporary files from the RM_ directory
after they are no longer needed.
- Container releases ( Docker and Singularity )
- Minor Bugfixes:
- Stockholm sequence coordinates did not consistently show
strandedness using coordinate reversal.
- Temporary file generation in BuildDatabase did not use
a unique filename. This could cause race conditions in
environments where two BuildDatabase processes ran in the
same directory at the same time.
- Classification improvements:
o Fixed a bug that caused some TE families to be labeled with the
RepeatMasker internal-use label “buffer”.
o Fixed problem with TRF masking the input sequences that caused some
low-complexity sequences incorrectly match TE library sequences.
o Fixed problem with conflict resolution that often let a lower scoring
match define the class for the family.
RepeatModeler 1.0.11
====================
- Bugfix release. Avoid bailing if Refiner could not produce a consensus
from the family sequence set.
RepeatModeler 1.0.10
====================
- Bugfix release. Fixes a bug that will cause RepeatModeler to exit
if no new repeats are detected after round-2. This may happen if the
sample size is small enough or contains a repeat-poor set of sequences
and may not reflect the chance that repeats will be found in later rounds.
Typically new repeats are found in all rounds of RepeatModeler.
RepeatModeler 1.0.9
===================
- RepeatModeler employs a genome sampling approach that is based
on a random number generator. In this release of RepeatModeler
we print out the random number generator seed at the start of
a run. This number can be used with the "-srand ####" flag in future
runs to exactly reproduce the samples taken from a given database.
- The final output files are now placed in the same directory as
the input database.
- An additional output file is now generated containing the seed
alignment for each discovered family. This alignment is the source
of the final consnesus and is stored in a Dfam compatible Stockholm
file. The new output files are named <database_name>-families.fa and
<database_name>-families.stk.
- Support for Dfam_consensus has been built into this release. Two
utilities dfamConsensusTool.pl and renameIds.pl can be found in the
RepeatModeler util/ directory. The dfamConsensusTool script enables
one to upload curated seed alignments to the open Dfam_consensus
database from the command line. The renameIds script simplifies
the process of coming up with unique identifiers for a set of
RepeatModeler generated families given a naming template.
RepeatModeler 1.0.8
===================
- New "-recoverDir <dir>" option. This allows RepeatModeler to
restart where it left off after a failure occurs.
- BLAST searches can now be run in parallel on a shared memory
multiprocessor machine. The new "-pa #" option is used to
specify how many parallel searches to start up at any one time.
RepeatModeler 1.0.7
===================
- Added support for new RMBlastn version 2.2.27
- TRFMask updated to work with the new RepeatMasker 4.0 release.
- New viewMSA.pl utility