2024-10-08 Tao Liu <vladimir.liu@gmail.com>
MACS 3.0.3b
* Features added
1) We implemented an IO module, `Parser.FragParser`, for reading the
fragment files commonly used in single-cell ATAC-seq experiments.
We also implemented a new `PairedEndTrack.PETrackII` class to store
the data in a fragment file, including the barcode and count
information. With the `PETrackII` class, we can extract a subset
using a list of barcodes, which enables us to call peaks on only a
pool (pseudo-bulk) of cells.
2) We extensively rewrote the `pyx` code into `py` code. In
other words, we now apply the 'pure Python style' with PEP-484
type annotations to our previous Cython-style code, so that the
source code is more compatible with Python programming tools
such as `flake8`. During the rewrite, we cleaned up the source code
further and removed unnecessary dependencies during
compilation.
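A minimal sketch of the pseudo-bulk idea in item 1, assuming a
tab-separated fragment file with chromosome, start, end, barcode and
count columns. It does not use `Parser.FragParser` or `PETrackII`; the
names `Fragment`, `read_fragments` and `subset_by_barcodes` are
hypothetical and only illustrate the filtering step.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple


class Fragment(NamedTuple):
    chrom: str
    start: int
    end: int
    barcode: str
    count: int


def read_fragments(handle: Iterable[str]) -> Iterator[Fragment]:
    """Yield fragments from tab-separated lines, skipping comments and blanks."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, start, end, barcode, count = line.split("\t")[:5]
        yield Fragment(chrom, int(start), int(end), barcode, int(count))


def subset_by_barcodes(fragments: Iterable[Fragment],
                       barcodes: Iterable[str]) -> List[Fragment]:
    """Keep only the fragments whose barcode is in the given list."""
    keep = set(barcodes)
    return [f for f in fragments if f.barcode in keep]


# Toy input standing in for a real fragment file (hypothetical values).
example = io.StringIO(
    "chr1\t100\t250\tAAAC\t2\n"
    "chr1\t300\t420\tTTTG\t1\n"
    "chr2\t50\t180\tAAAC\t1\n"
)
pool = subset_by_barcodes(read_fragments(example), ["AAAC"])
```

The resulting `pool` holds only the fragments from the selected cells,
which is the input a pseudo-bulk peak call would work on.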
* Bug fixed
1) Fix issues on big-endian systems in the `Parser.py` code. Enable
big-endian support in the `BAM.py` code for accessing certain
alignment records that overlap with given genomic
coordinates using BAM/BAI files.
* Doc
1) Explanation on the filtering criteria on SAM/BAM/BAMPE files.
2024-09-06 Tao Liu <vladimir.liu@gmail.com>
MACS 3.0.2
* Features added
1) Introduce a new emission model for the `hmmratac` function. Now
users can choose the simpler Poisson emission `--hmm-type poisson`
instead of the default Gaussian emission. As a consequence, the
saved HMM model file in json will include the hmm-type information
as well. Note that in order to be compatible with the HMM model
file from previous version, if there is no hmm-type information in
the model file, the hmm-type will be assigned as gaussian. #635
2) `hmmratac` now outputs narrowPeak format. The summit
position and the peak score columns reported in the narrowPeak
output represent the position with the highest fold-change value
(pileup vs average background).
3) Add `--cutoff-analysis-steps` and `--cutoff-analysis-max` for
`callpeak`, `bdgpeakcall`, and `hmmratac` so that we can
have finer resolution of the cutoff analysis report. #636 #642
4) Reduce memory usage of `hmmratac` during decoding step, by
writing decoding results to a temporary file on disk (file
location depends on the environmental TEMP setting), then loading
it back while identifying state paths. This change will decrease
the memory usage dramatically. #628 #640
5) Fix instructions for preparing narrowPeak files for uploading
to UCSC browser, with the `--trackline` option in `callpeak`. #653
6) For gappedPeak output, set thickStart and thickEnd columns as
0, according to UCSC definition.
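The backward-compatibility rule in item 1 (a model file without
hmm-type information is treated as Gaussian) can be sketched as
follows. The JSON key name `"hmm-type"` and the helper `read_hmm_type`
are assumptions for illustration, not the actual layout of the saved
model file.

```python
import json


def read_hmm_type(model_json: str) -> str:
    """Return the emission type stored in a model, defaulting to 'gaussian'."""
    model = json.loads(model_json)
    # "hmm-type" is an assumed key name; the real model file may differ.
    return model.get("hmm-type", "gaussian")


# A model saved by an older version carries no emission type at all.
old_model = json.dumps({"n_states": 3})
new_model = json.dumps({"n_states": 3, "hmm-type": "poisson"})
```

Defaulting on read keeps old model files usable without rewriting them.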
* Bugs fixed
1) Use `-O3` instead of `-Ofast` for compatibility. #637
* Documentation
1) Update instruction to install macs3 through conda/bioconda
2) Reorganize MACS3 docs and publish through
https://macs3-project.github.io/MACS
3) Description on various file formats used in MACS3.
2024-02-19 Tao Liu <vladimir.liu@gmail.com>
MACS 3.0.1
* Bugs fixed
1) Fixed a bug where `hmmratac` couldn't correctly save the
digested signal files. #605 #611
2) Applied a patch to remove cython requirement from the installed
system. (it's needed for building the package). #606 #612
3) Relax the testing script while comparing the peaks called from
current codes and the standard peaks. To implement this, we added
'intersection' function to 'Regions' class to find the
intersecting regions of two Regions object (similar to PeakIO but
only recording chromosome, start and end positions). And we
updated the unit test 'test_Region.py' then implemented a script
'jaccard.py' to compute the Jaccard Index of two peak files. If
the JI > 0.99, we consider the called peaks and the standard
peaks similar. This avoids the problem caused by different
NumPy/SciPy/scikit-learn library versions, where certain peak
coordinates may differ by 10 bp. #615 #619
4) Due to the changes in scikit-learn 1.3.0:
https://scikit-learn.org/1.3/whats_new/v1.3.html: The way hmmlearn
0.3 uses Kmeans will end up with inconsistent results between
sklearn <1.3 and sklearn >=1.3. Therefore, we patched the class
hmm.GaussianHMM and adjusted the standard output from `hmmratac`
subcommand. The change is based on
https://github.com/hmmlearn/hmmlearn/pull/545. The idea is to do
the random seeding of KMeans 10 times. Now the `hmmratac` results
should be more consistent (at least JI>0.99). #615 #620
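For reference, the Jaccard Index mentioned in items 3 and 4 can be
computed on intervals as in the sketch below. This is a base-pair-level
version written from scratch; whether 'jaccard.py' counts base pairs or
regions is not stated here, so treat the definition and all function
names as assumptions.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or touching ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals: List[Interval]) -> int:
    """Total length of a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def intersect_length(a: List[Interval], b: List[Interval]) -> int:
    """Number of base pairs shared by two merged interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(peaks_a: Dict[str, List[Interval]],
            peaks_b: Dict[str, List[Interval]]) -> float:
    """Jaccard Index = shared base pairs / base pairs covered by either set."""
    inter = union = 0
    for chrom in set(peaks_a) | set(peaks_b):
        a = merge(peaks_a.get(chrom, []))
        b = merge(peaks_b.get(chrom, []))
        shared = intersect_length(a, b)
        inter += shared
        union += covered(a) + covered(b) - shared
    return inter / union if union else 1.0


called = {"chr1": [(100, 200), (500, 600)]}
standard = {"chr1": [(110, 200), (500, 600)]}
ji = jaccard(called, standard)
```

In this toy case a single 10 bp boundary difference on short peaks
already lowers the index to 0.95; on real peak sets, which cover far
more bases, the same difference has a much smaller effect.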
* Other
1) We added some dependencies to MACS3. The `hmmratac` subcommand
needs the `hmmlearn` library, `hmmlearn` needs `scikit-learn`, and
`scikit-learn` needs `scipy`. Since major releases have happened
for both `scipy` and `scikit-learn`, we have to set specific
version requirements for them in order to make sure the output
results from `hmmratac` are consistent.
2) We updated our documentation website using
Sphinx. https://macs3-project.github.io/MACS/
2023-11-15 Tao Liu <vladimir.liu@gmail.com>
MACS 3.0.0
1) Call variants in peak regions directly from BAM files. The
function was originally developed under code name SAPPER. Now
SAPPER has been merged into MACS as the `callvar` command. It can
be used to call SNVs and small INDELs directly from alignment
files for ChIP-seq or ATAC-seq. We call `fermi-lite` to assemble
the DNA sequence at the enriched genomic regions (binding sites or
accessible DNA) and to refine the alignment when necessary. We
added `simde` as a submodule in order to support fermi-lite
library under non-x64 architectures.
2) HMMRATAC module is added as subcommand `hmmratac`. HMMRATAC is
a dedicated software to analyze ATAC-seq data. The basic idea
behind HMMRATAC is to digest ATAC-seq data according to the
fragment length of read pairs into four signal tracks: short
fragments, mono-nucleosomal fragments, di-nucleosomal fragments
and tri-nucleosomal fragments. Then integrate the four tracks
again using Hidden Markov Model to consider three hidden states:
open region, nucleosomal region, and background region. The
original HMMRATAC, written in Java by Evan Tarbell, was published
in 2019. We implemented it in Python/Cython and optimized the whole
process using existing MACS functions and hmmlearn. Now it can run
much faster than the original Java version. Note: evaluation of
the peak calling results is still underway.
3) Speed/memory optimization. Use the cykhash to replace python
dictionary. Use buffer (10MB) to read and parse input file (not
available for BAM file parser). And many optimization tweaks. We
added memory monitoring to the runtime messages.
4) R wrappers for MACS -- MACSr for bioconductor.
5) Code cleanup. Reorganize source codes.
6) Unit testing.
7) Switch to Github Action for CI, support multi-arch testing
including x64, armv7, aarch64, s390x and ppc64le. We also test on
Mac OS 12.
8) MACS tag-shifting model has been refined. Now it will use a
naive peak calling approach to find ALL possible paired peaks at +
and - strand, then use all of them to calculate the
cross-correlation. (A related bug has been fixed:
[#442](https://github.com/macs3-project/MACS/issues/442))
9) BAI index and random access to BAM file now is
supported. [#449](https://github.com/macs3-project/MACS/issues/449).
10) Support of Python > 3.10
[#498](https://github.com/macs3-project/MACS/issues/498)
11) The effective genome size parameters have been updated
according to
deeptools. [#508](https://github.com/macs3-project/MACS/issues/508)
12) Multiple updates regarding dependencies, anaconda built, CI/CD
process.
13) Cython 3 is supported.
14) Documentations for each subcommand can be found under /docs
*Other*
1) Missing header line while no peaks can be called
[#501](https://github.com/macs3-project/MACS/issues/501)
[#502](https://github.com/macs3-project/MACS/issues/502)
2) Note: different numpy, scipy, sklearn may give slightly
different results for hmmratac results. The current standard
results for automated testing in `/test` directory are from Numpy
1.25.1, Scipy 1.11.1, and sklearn 1.3.0.
2020-04-11 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.2.7.1
* hotfix:
Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
work.
2020-04-10 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.2.7
* Bugs fixed
1) MACS2 has been tested on multiple architectures to make sure it
can successfully generate consistent results. Currently the
supported architectures are: AMD64, ARM64, i386, PPC64LE, and
S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issue
#340, #349, #351, and #359; to PR #348, #350, #360, #361, #367,
and #370. The lesson is that if the project is built on Cython and
is aimed at memory efficiency, we should specifically define all
int/float types in pyx files such as int8_t or uint32_t using
either libc or numpy (c version) instead of relying on Cython
types such as short, long, double.
2) MACS2 setup script will check numpy and install numpy if
necessary. PR #378, issue #364
3) `bdgbroadcall` command will correctly add the score column (5th
column). The score (5th) column contains 10 times of the average
score in the broad region. PR #373, issue #362
4) The missing test on `bdgopt` subcommand has been added. PR #363
5) The obsolete option `--ratio` from `callpeak` subcommand has
been removed. PR #369, issue #366
6) Fixed the incorrect description in README on the 'maximum
length of broad region is 4 times of d' to 'maximum gap for
merging broad regions is 4 times of tag size by default'. PR #380,
issue #365.
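The lesson in item 1 about fixed-width types can be seen from Python
with the standard `ctypes` module. This is only an illustration of why
`short` or `long` are risky across architectures; it is not taken from
the MACS2 `pyx` files.

```python
import ctypes

# Platform-dependent C types: their sizes may differ between
# architectures and operating systems.
platform_sizes = {
    "short": ctypes.sizeof(ctypes.c_short),
    "long": ctypes.sizeof(ctypes.c_long),
}

# Fixed-width types: the size is part of the name and never changes.
fixed_sizes = {
    "int8_t": ctypes.sizeof(ctypes.c_int8),
    "uint32_t": ctypes.sizeof(ctypes.c_uint32),
    "int64_t": ctypes.sizeof(ctypes.c_int64),
}
```

Only the entries in `fixed_sizes` are guaranteed to be identical on
every supported architecture.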
* Other
1) CODE OF CONDUCT document has been added to MACS2 github
repository. PR #358
2019-12-12 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.2.6
* New Features
1) Speed up MACS2. Some programming tricks and code cleanup. The
filter_dup function replaces separate_dups. The latter one was
implemented for potentially putting back duplicate reads in
certain downstream analysis. However such analysis hasn't been
implemented. Optimize the speed of writing bedGraph
files. Optimize BAM and BAMPE parsing with pointer casting instead
of python unpack.
2) The comment lines in the headers of BED or SAM files will be
correctly skipped. However, MACS2 won't check comment lines in the
middle of the file.
* Bugs fixed
1) Cutoff-analysis in callpeak command. #341
2) Issues related to SAMParser and three ELAND Parsers are
fixed. #347
* Other
1) cmdlinetest script in test/ folder has been updated to: 1. test
cutoff-analysis with callpeak cmd; 2. output the 2 lines before
and after the error or warning message during tests; 3. output
only the first 10 lines if the difference between test result and
standard result can be found; 4. prockreport monitor CPU time and
memory usage in 1 sec interval -- a bit more accurate.
2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
2019-10-31 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.2.5 (Py3 speed up)
* Features added
1) *Github code only and Not included in MACS2 release* New
testing data for performance tests. A subsampled ENCODE2 CTCF
ChIP-seq dataset, including 5 million ChIP reads and 5 million
control reads, has been included in the test folder for testing
CPU and memory usage (i.e. the 5M test). Several related scripts,
including `prockreport` for reporting CPU and memory usage, and
`pyprofile` and `pyprofile_stat` for debugging and profiling MACS2
code, have
been included.
2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
The old hashtable.pyx implementation copied from Pandas (very old
version) doesn't work well in Python3+Cython. It slows down the
pqtable checkup using the identical Cython codes as in
v2.1.4. While running 5M test, the `__getitem__` function in the
hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
148.6s with the same number of calls in MACS2 v2.2.4. As a
consequence, the standard python dictionary implementation has
replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
faster than py2 version, but uses a bit more memory. In general,
v2.2.5 can finish 5M reads test in 20% less time than MACS2
v2.1.4, but use 15% more memory.
* Bug fixed
1) More Python3 related fixes, e.g. the return value of keys from
py3 dict. #333 #337
2019-10-01 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.2.4 (Python3)
* Features added
1) First Python3 version MACS2 released.
2) Version number 2.2.X will be used for MACS2 in Python3, in
parallel to 2.1.X.
3) More comprehensive test.sh script to check the consistency of
results from Python2 version and Python3 version.
4) Simplify setup.py script since the newest version transparently
supports cython. And when cython is not installed by the user,
setup.py can still compile using only C codes.
5) Fix Signal.pyx to use np.array instead of np.mat.
2019-09-30 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.4
* Features added
Github Actions is used together with Travis CI for testing and
deployment.
* Bugs fixed
PR #322:
1) #318 Random score in bdgdiff output. It turns out the sum_v is
not initialized as 0 before adding. Potential bugs are fixed in
other functions in ScoreTrack and CallPeakUnit codes.
2) #321 Cython dependency in setup.py script is removed, and the
'cythonize' call is placed in the correct position.
3) A typo is fixed in Github Actions script.
2019-09-19 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.3.3
* Features added
1) Support Docker auto-deploy. PR #309
2) Support Travis CI auto-testing, update unit-testing
scripts, and enable subcommand testing on small datasets.
3) Update README documents. #297 PR #306
4) `cmbreps` supports more than 2 replicates. Merged from PR #304
@Maarten-vd-Sande and PR #307 (our own chi-sq test code)
5) `--d-min` option is added in `callpeak` and `predictd`, to
exclude predictions of fragment size smaller than the given
value. Merged from PR #267 @shouldsee.
6) `--buffer-size` option is added in `predictd`, `filterdup`,
`pileup` and `refinepeak` subcommands. Users can use this option
to decrease memory usage while there are a large number of contigs
in the data. Also, now `callpeak`, `predictd`, `filterdup`,
`pileup` and `refinepeak` will suggest users to tweak
`--buffer-size` while catching a MemoryError. #313 PR #314
* Bugs fixed
1) #265 Fixed a bug where the pseudocount hasn't been applied
while calculating p-value score in ScoreTrack object.
2) Fixed bdgbroadcall so that it will report those broad peaks
without strong peak inside, a consistent behavior as `callpeak
--broad`.
3) Rename COPYING to LICENSE.
2018-10-17 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.2
* New features
1) Added missing BEDPE support. And enable the support for BAMPE
and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
subcommands. When format is BAMPE or BEDPE, The 'pileup' command
will pile up the whole fragment defined by mapping locations of
the left end and right end of each read pair. Thank @purcaro
2) Added options to callpeak command for tweaking max-gap and
min-len during peak calling. Thank @jsh58!
3) The callpeak option "--to-large" is replaced with
"--scale-to large".
4) The randsample option "-t" has been replaced with "-i".
* Bug fixes
1) Fixed memory issue related to #122 and #146
2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
3) Fixed a bug while setting commandline qvalue cutoff.
4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
5) Fixed the calculation of average fragment length for paired-end
data. Thank @jsh58
6) Fixed bugs caused by khash while computing p/q-value and log
likelihood ratios. Thank @jsh58
7) More spelling tweaks in source code. Thank @mr-c
2016-03-09 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.1 20160309
* Retire the tag:rc.
* Fixed spelling. Merged pull request #120. Thank @mr-c!
* Change filtering criteria for reading BAM/SAM files
Related to callpeak and filterdup commands. Now the
reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
still be read although MACS2 may decide them as duplicates
later. Related to old issue #33. Sorry I forgot to address it for
years!
2016-02-26 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.1 20160226 (tag:rc Zhengyue)
* Bug fixes
1) Now "-Ofast" has been replaced by "-O3 -ffast-math", because
the former option is not supported by older GCC. Related to issues
#91, #109.
2) Issue #108 is fixed. If no peak can be found in a chromosome,
the PeakIO won't throw an error.
* New features
1) callpeak
a) A more flexible format, BEDPE, is supported. Now users can
define the left and right position of the ChIPed fragment, and
MACS2 will skip model building and directly pileup the
fragments. Related to issue #112.
b) The 'tempdir' can be specified, to save cached pileup
tracks. Originally, the temporary files were stored in
/tmp. Thank @daler! Related to issues #97 and #105.
2) bdgopt
New operations are added, to calculate the maximum or minimum value between
values in BEDGRAPH and given value.
3) bdgcmp
New method is added, to calculate the maximum value between values
defined in two BEDGRAPH files.
2015-12-22 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.0 20151222 (tag:rc Dongzhi)
* Bug fixes
1) Fix a bug while dealing with some chromosomes only containing
one read (pair). The size of dup_plus/dup_minus arrays after
filtering dups should +1.
2) Fix a bug related to the broad peak calling function in
previous versions. The gaps were miscalculated, so segmented weak
broad calls may be reported, and sometimes you would see peaks
with lower than cutoff values in the output files.
3) "Potentially" Fixed issue #105 on temporary cache files, need
further followup.
2015-07-31 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.0 20150731 (tag:rc)
* Bug fixes
1) Fixed issue #76: information about broad/narrow cutoff will be
correctly displayed.
2) Fixed issue #79: bdgopt extparam option is fixed.
3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
for filterdup command.
4) Fixed issue #78, #88 and similar issue reported in MACS google
group: MACS2 now can correctly deal with multiple alignment files
for -t or -c. The 'finalize' function will be correctly
called. Multiple files option is enabled for filterdup,
randsample, predictd, pileup and refinepeak commands.
5) A related issue to #88, when BAMPE mode is used, PE pairs will
be sorted by leftmost then rightmost ends.
6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
array. This will cause 'callpeak --nolambda' hang forever while
calculating pvalues and qvalues.
2015-04-20 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.0 20150420 (tag:rc)
* New commands
1) bdgopt: some convenient functions to modify bedGraph files.
2) cmbreps: Combine scores from two replicates. Including three
methods: 1. take the maximum; 2. take the average; 3. use Fisher's
method to combine two p-value scores. After that, user can use
bdgpeakcall to call peaks on combined scores.
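For two replicates, Fisher's method (method 3 of `cmbreps`) has a
closed form, shown in the sketch below. It works on plain p-values; how
`cmbreps` maps its bedGraph scores to and from p-values is not covered,
and the function name is hypothetical.

```python
import math


def fisher_combine_two(p1: float, p2: float) -> float:
    """Combine two independent p-values with Fisher's method.

    X = -2 * (ln p1 + ln p2) follows a chi-square distribution with
    4 degrees of freedom under the null hypothesis, and its survival
    function is exp(-x / 2) * (1 + x / 2).
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p-values must be in (0, 1]")
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


combined = fisher_combine_two(0.05, 0.05)
```

The same expression can be written as p1 * p2 * (1 - ln(p1 * p2)), so
two p-values of 0.05 combine to roughly 0.0175.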
* New features
1) callpeak and bdgpeakcall now can try to analyze the
relationship between p-values and number/length of peaks then
generate a summary to help users decide an appropriate cutoff.
2) callpeak now can accept fold-enrichment cutoff as a filter for
final peak calls.
* Performance
Now MACS2 runs about 3X as fast as previous version. Trade
clean python codes for speed... Now while processing 50M ChIP vs
50M control, it will take only 10 minutes.
* Bug fixes
1) Sampling function in BAMPE mode.
2) Callpeak while there are >= 2 input files for -t or -c.
3) While reading BAM/SAM, those secondary or supplementary
alignments will be correctly skipped.
4) Fixed issue #33: Explanation is added to callpeak --keep-dup
option that MACS2 will discard those SAM/BAM alignments with bit
1024 no matter how --keep-dup is set.
5) Fixed issue #49: setuptools is used instead of distutils
6) Fixed issue #51: fix the problem when using --trackline
argument when control file is absent.
7) Fixed issue #53: Use SAM/BAM CIGAR to find the 5' end of a
read mapped to the minus strand. The previous implementation would
find an incorrect 5' end if there was an indel in the alignment.
8) Fixed issue #56: An incorrect sorting method used for BAMPE
mode which will cause incorrect filtering of duplicated reads. Now
fixed.
9) Issue #63: Merged from jayhesselberth@github, extsize now can
be 1.
10) Issue #71: Merged from aertslab@github, close file descriptor
after creating them with mkstemp().
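The fix in item 7 relies on the fact that only some CIGAR operations
consume reference bases. A minimal sketch, assuming a 0-based leftmost
alignment position and returning the half-open right boundary for
minus-strand reads (the function names are hypothetical, not the MACS2
parser API):

```python
import re

# CIGAR operations that consume reference bases (per the SAM format).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_length(cigar: str) -> int:
    """Number of reference bases spanned by an alignment."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar)
               if op in REF_CONSUMING)


def five_prime_end(pos: int, cigar: str, is_reverse: bool) -> int:
    """Return the 5' end of a read given its 0-based leftmost position.

    For a plus-strand read this is the leftmost position itself; for a
    minus-strand read it is the right boundary of the aligned span.
    """
    if is_reverse:
        return pos + reference_length(cigar)
    return pos
```

With a deletion (`10M2D10M`) the read is 20 bases long but spans 22
reference bases, so adding the read length instead of the reference
length would place the minus-strand 5' end 2 bp too far left.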
2014-06-16 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.0 20140616 (tag:rc)
* callpeak module
"--ratio" is added to manually assign the scaling factor of ChIP
vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
implementing the patch file!
"--shift" is added to move cutting ends (5' end of reads) around,
in order to process DNAse-Seq data, e.g., use "--shift -100
--extsize 200" to get 200bps fragments around 5' ends. For general
ChIP-Seq data analysis, this option should be always set as
0. Thank Xi Chen and Anshul Kundaje for the discussions in user
group!
** Do not output negative fragment size from cross-correlation
analysis. Thank Alvin Qin for the feedback!
** --half-ext and --control-shift are removed. For complex read
shifting and extending, combine '--shift' and '--extsize'
options. For comparing two conditions, use 'bdgdiff' module
instead.
** a bug is fixed to output the last pileup value in bdg file
correctly.
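One way to read the "--shift -100 --extsize 200" example above is
sketched below: the 5' end is first moved by the shift along the read's
own 5'->3' direction (a negative value moves it backwards), then
extended by the extension size in the 5'->3' direction. This is an
interpretation of the description, not the callpeak implementation, and
`extend_read` is a hypothetical name.

```python
from typing import Tuple


def extend_read(five_prime: int, strand: str,
                shift: int, extsize: int) -> Tuple[int, int]:
    """Return the (start, end) of the fragment derived from one read."""
    if strand == "+":
        start = five_prime + shift
        return start, start + extsize
    if strand == "-":
        # On the minus strand, 5'->3' runs towards smaller coordinates.
        end = five_prime - shift
        return end - extsize, end
    raise ValueError("strand must be '+' or '-'")


plus_fragment = extend_read(1000, "+", shift=-100, extsize=200)
minus_fragment = extend_read(1000, "-", shift=-100, extsize=200)
```

Both reads give the interval 900 to 1100, i.e. a 200 bp fragment
centered on the cutting end at position 1000.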
* filterdup
A 'dry-run' option is added to only output numbers, including the
number of allowed duplicates, the total number of reads before and
after filtering duplicates and the estimated duplication
rate. Thank John Urban for the suggestion!
2013-12-16 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20131216 (tag:alpha)
bug fixes and tweaks
* We changed license from Artistic License to 3-clause BSD license.
Yes, the simpler the better.
* Process paired-end data with "-f BAMPE" without control
* GappedPeak output for --broad option has been fixed again to be
consistent with official UCSC format. We add 1bp pseudo-block to
left and/or right of broad region when necessary, so that you can
visualize the regions without strong enrichment inside
successfully. In downstream analysis other than visualization,
you may need to remove all 1bp blocks from the gappedPeak file.
* diffpeak subcommand is temporarily disabled. Till we
re-implement it.
2013-10-28 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20131028 (tag:alpha)
* callpeak --call-summits improvement
The smoothing window length has been fixed as fragment length
instead of short read length. The larger smoothing window will
grant better smoothing results and better sub-peak summits
detection.
* --outdir and --ofile options for almost all commands
Thank Björn Grüning for initially implementing these options!
Now, MACS2 will save results into a specified
directory by '--outdir' option, and/or save result into a
specified file by '--ofile' option. Note, in case '--ofile' is
available for a subcommand, '-o' now has been adjusted to be the
same as '--ofile' instead of '--o-prefix'.
Here is the list of changes. For more detail, use 'macs2 xxx -h'
for each subcommand:
** callpeak: --outdir
** diffpeak: Not implemented
** bdgpeakcall: --outdir and --ofile
** bdgbroadcall: --outdir and --ofile
** bdgcmp: --outdir and --ofile. While --ofile is used, the number
and the order of arguments for --ofile must be the same as for -m.
** bdgdiff: --outdir and --ofile
** filterdup: --outdir
** pileup: --outdir
** randsample: --outdir
** refinepeak: --outdir and --ofile
2013-09-15 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130915 (tag:alpha)
* callpeak Added a new option --buffer-size
This option is to tweak a previously hidden parameter that
controls the steps to increase array size for storing alignment
information. While in some rare cases, the number of
chromosomes/contigs/scaffolds is huge, the original default
setting will cause a huge memory waste. In these cases, we
recommend to decrease --buffer-size (e.g., 1000) to save memory,
although the decrease will slow process to read alignment files.
* an optimization to speed up pvalue-qvalue statistics
Previously, it took an hour to prepare p-q-table for 65M vs 65M
human TF library, and now it will take 10 minutes. It was due to a
single line of code to get a value from a numpy array ...
* fixed logLR bugs.
2013-07-31 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130731 (tag:alpha)
* callpeak --call-summits
Fix bugs causing callpeak --call-summits option generating extra
number of peaks and inconsistent peak boundaries comparing to
default option. Thank Ben Levinson!
* bdgcmp output
Fix bugs causing bdgcmp output logLR all in positive values. Now
'depletion' can be correctly represented as negative values.
* bdgdiff
Fix the behavior of bdgdiff module. Now it can take four
bedGraph files, then use logLR as cutoff to call differential
regions. Check command line of bdgdiff for detail.
2013-07-13 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130713 (tag:alpha)
* fix bugs while output broadPeak and gappedPeak.
Note. Those weak broad regions without any strong enrichment
regions inside won't be saved in gappedPeak file.
* bdgcmp -T and -C are merged into -S and description is updated.
Now, you can use it to override SPMR values in your input for
bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
statistics will cause weird results (in most cases, lower
significance), and won't be consistent with MACS2 callpeak
behavior. So if you have SPMR bedGraphs, input the smaller/larger
sample size in MILLION according to 'callpeak --to-large' option.
2013-07-10 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130710 (tag:alpha)
* fix BED style output format of callpeak module:
1) without --broad: narrowPeak (BED6+4) and BED for summit will be
the output. Old BED format file won't be saved.
2) with --broad: broadPeak (BED6+3) for broad region and
gappedPeak (BED12+3) for chained enriched regions will be the
output. Old BED format, narrowPeak format, summit file won't be
saved.
* bdgcmp now can accept list of methods to calculate scores. So
you can run it once to generate multiple types of scores. Thank
Jon Urban for this suggestion!
* C codes are re-generated through Cython 0.19.1.
2013-05-21 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130520 (tag:alpha)
* broad peak calling modules are modified in order to report all
relaxed regions even if there is no strong enrichment inside.
2013-05-01 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130501 (tag:alpha)
* Memory usage is decreased to about 1/4-1/5 of previous usage
Now, the internal data structure and algorithm are both
re-organized, so that intermediate data wouldn't be saved in
memory. Instead, they will be calculated on the fly. New MACS2 will
spend longer time (1.5 to 2 times) however it will use less memory
so can be more usable on small mem servers.
* --seed option is added to callpeak and randsample commands
Thank Mathieu Gineste for this suggestion!
2013-03-05 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130306 (tag:alpha)
* diffpeak module New module to detect differential binding sites
with more statistics.
* Introduced --refine-peaks
Calculates reads balancing to refine peak summits
* Output file name prefix
Correct encodePeak to narrowPeak, broadPeak to bed12.
2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.10 (tag:alpha not released)
* Introduced BAMPEParser
Reads PE data directly, requires bedtools for now
* Introduced --call-summits
Uses signal processing methods to call overlapping peaks
* Added --no-trackline
By default, files have descriptive tracklines now
* new refinepeak command (experimental)
This new function will use a similar method in SPP (wtd), to
analyze raw tag distribution in peak region, then redefine the
peak summit where plus and minus tags are evenly distributed
around.
* Changes to output *
cPeakDetect.pyx has full support for new print/write methods and
--call-peaks, BAMPEParser, and use of paired-end data
* Parser optimization
cParser.pyx is rewritten to use io.BufferedReader to speed
up. Speed is doubled.
Code is reorganized -- most functions are inherited from the
GenericParser class.
* Use cross-correlation to calculate fragment size
First, all pairs will be used in prediction for fragment
size. Previously, no more than 1000 pairs were used. Second,
cross-correlation is used to find the best phase difference
between + and - tag pileups.
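The cross-correlation idea can be sketched as below. This is a minimal
illustration, not the MACS code: the function name best_shift, the toy
pileup lists and the max_shift parameter are all assumptions made for
the example.

```python
def best_shift(plus, minus, max_shift):
    """Return the shift that best aligns the minus pileup with the plus pileup.

    For each candidate shift, the minus-strand pileup is moved left by
    that many positions and multiplied point-wise with the plus-strand
    pileup. The shift with the largest sum is the best phase difference.
    """
    best, best_score = 0, float("-inf")
    for shift in range(max_shift + 1):
        # zip() stops at the shorter sequence, so no bounds checks are needed.
        score = sum(p * m for p, m in zip(plus, minus[shift:]))
        if score > best_score:
            best, best_score = shift, score
    return best


# Toy pileups (hypothetical data): plus peak at index 3, minus peak at index 8.
plus = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0, 0, 0]
minus = [0, 0, 0, 0, 0, 0, 0, 5, 9, 5, 0, 0]
shift = best_shift(plus, minus, 8)
```

In this toy case the best shift equals the distance between the two
peaks, which plays the role of the fragment size estimate.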
* Speed up p-value and q-value calculation
This part is ten times faster now. I am using a dictionary to
cache p-value results from the Poisson CDF function. A bit more
memory will be used to increase speed. I hope this dictionary will
not explode, since the possible pairs of ChIP signal and control
lambda are hugely redundant. Also, I rewrote part of the q-value
calculation.
* Speed up peak detection
This part is about a hundred times faster now. Optimizations
include using Numpy functions as much as possible, and making the
loop body as small as possible.
* Post-processing on differential calls
After macs2diff finds differential binding sites between two
conditions, it will try to annotate the peak calls from one of two
conditions, describe the changes ...
* Fragment size prediction in macs2diff
Now by default, macs2diff will try to use the average fragment
size from both condition 1 and condition 2 for tag extension and
peak calling. Previously, by default, it would use different sizes
unless --nomodel was specified.
Technically, I separate model building processes out. So macs2diff
will build fragment sizes for condition 1 and 2 in parallel (2
processes maximum), then perform 4-way comparisons in parallel (4
processes maximum).
* Diff score
Combine two p/qscore tracks together. At regions where condition 1
is higher than condition 2, score would be positive, otherwise,
negative.
* SAMParser and BAMParser
Bug fixed for paired-end sequencing data.
* BedGraph.pyx
Fixed a bug while calling peaks from BedGraph file. It previously
mistakenly output same peaks multiple times at the end of
chromosome.
2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.9 (tag:alpha)
* Auto fixation on predicted d is turned off by default!
Previous --off-auto is now default. MACS will not automatically
fix d less than 2 times of tag size according to
--shiftsize. While tag size is getting longer nowadays, it would
be easier to have d less than 2 times of tag size, however d may
still be meaningful and useful. Please judge it using your own
wisdom.
* Scaling issue
Now, the default scaling while treatment and input are unbalanced
has been adjusted. By default, larger sample will be scaled down
linearly to match the smaller sample. In this way, background
noise will be reduced more than real signals, so we expect to have
more specific results than the other way around (i.e. --to-large
is set).
Also, an alternative option to randomly sample larger data
(--down-sample) is provided to replace default linear
scaling. However, this option will make results irreproducible,
so be careful.
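The default linear scaling can be illustrated as follows. This is only
a sketch of the arithmetic: the function scale_factors and its to_large
argument (mirroring --to-large) are hypothetical names, not MACS
internals.

```python
def scale_factors(n_treatment, n_control, to_large=False):
    """Return (treatment_scale, control_scale) for unbalanced samples.

    By default the larger sample is scaled down to the size of the
    smaller one. With to_large=True the smaller sample is scaled up.
    """
    if n_treatment <= 0 or n_control <= 0:
        raise ValueError("tag counts must be positive")
    if to_large:
        target = max(n_treatment, n_control)
    else:
        target = min(n_treatment, n_control)
    return target / n_treatment, target / n_control


# Hypothetical tag counts: 10 million treatment tags, 40 million control tags.
treatment_scale, control_scale = scale_factors(10_000_000, 40_000_000)
```

Here the control pileup would be multiplied by 0.25 while the treatment
pileup is left unchanged.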
* randsample script
A new script 'randsample' is added, which can randomly sample
certain percentage or number of tags.
* Peak summit
Now, MACS will decide peak summits according to pileup height
instead of qvalue scores. In this way, the summit may be more
accurate.
* Diff score
MACS calculates qvalue scores as differential scores. When comparing
two conditions (say A and B), the maximum qscore for comparing
A to B -- maxqscore_a2b, and for comparing B to A -- maxqscore_b2a
will be computed. If maxqscore_a2b is bigger, the diff score is
+maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
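The sign rule above can be written as a small function. The function
name diff_score is an assumption for this sketch. The two argument
names come from the entry itself.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Combine the two directional qscores into one signed diff score."""
    if maxqscore_a2b > maxqscore_b2a:
        # Condition A is stronger: report a positive score.
        return maxqscore_a2b
    # Otherwise (including ties) report the B-to-A score as negative.
    return -1 * maxqscore_b2a
```

For example, diff_score(5.0, 2.0) gives 5.0 and diff_score(2.0, 5.0)
gives -5.0.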
2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.8 (tag:alpha)
* bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
New script bdgbroadcall and the extra option '--broad' for macs2
script, can be used to call broad regions with a loose cutoff to
link nearby significant regions. The output is represented as
BED12 format.
* MACS2/IO/cScoreTrack.pyx
Fix q-value calculation to generate forcefully monotonic values.
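One common way to force monotonic q-values is the Benjamini-Hochberg
procedure with a running minimum, sketched below. This is a generic
illustration on a plain list of p-values, not the code in
cScoreTrack.pyx. The name bh_qvalues is hypothetical.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg q-values, forced to be monotonic.

    The raw estimate p * n / rank can decrease as p grows. Walking from
    the largest p-value down and keeping a running minimum guarantees
    that a smaller p-value never gets a larger q-value.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values: the raw estimate for 0.03 (0.06) exceeds the
# one for 0.04 (about 0.053), so it is pulled down to match.
q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
```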
* bin/eland*2bed, bin/sam2bed and bin/filterdup
They are combined into one more powerful script called
"filterdup". The script filterdup can filter duplicated reads
according to sequencing depth and genome size. The script can also
convert any format supported by MACS to BED format.
2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.7 (tag:alpha)
* bin/macsdiff renamed to bin/bdgdiff
Now this script will work as a low-level fine-tuning tool like
bdgcmp and bdgpeakcall.
* bin/macs2diff
A new script to take treatment and control files from two
conditions, calculate fragment size, use local Poisson to get
pvalues and the BH procedure to get qvalues, then combine the
4-way results to call differential sites.
This script can use up to 4 CPUs to speed up the 4-way
calculation. (I am trying multiprocessing in Python.)
* MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
All above files are modified for the new macs2diff script.
* bin/macs2, bin/macs2diff, MACS2/OptValidator.py
Now q-value 0.01 is the default cutoff. If -p is specified,
p-value cutoff will be used instead.
2011-07-25 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.6 (tag:alpha)
* bin/macsdiff
A script to call differential regions. A naive way is introduced
to find the regions where:
1. signal from condition 1 is larger than input 1 and condition 2 --
unique region in condition 1;
2. signal from condition 2 is larger than input 2 and condition 1
-- unique region in condition 2;
3. signal from condition 1 is larger than input 1, signal from
condition 2 is larger than input 2, however either signal from
condition 1 or 2 is not larger than the other.
Here 'larger' means the pvalue or qvalue from a Poisson test is
under certain cutoff.
(I will make another script to wrap up multiple scripts for
differential calling)
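The three cases above can be sketched as a decision function. The
function name, the boolean argument names and the returned labels are
all hypothetical. Each boolean stands for "the Poisson test passed the
cutoff" for that comparison.

```python
def classify_region(t1_over_c1, t2_over_c2, t1_over_t2, t2_over_t1):
    """Classify a region from four pairwise 'larger than' test results.

    t1_over_c1: condition 1 signal is larger than input 1
    t2_over_c2: condition 2 signal is larger than input 2
    t1_over_t2: condition 1 signal is larger than condition 2 signal
    t2_over_t1: condition 2 signal is larger than condition 1 signal
    """
    if t1_over_c1 and t1_over_t2:
        return "unique_to_condition1"   # case 1
    if t2_over_c2 and t2_over_t1:
        return "unique_to_condition2"   # case 2
    if t1_over_c1 and t2_over_c2:
        return "common"                 # case 3: neither condition wins
    return None                         # not enriched in this scheme
```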
2011-07-07 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.5 (tag:alpha)
* bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
MACS2/IO/cPeakIO.pyx
Use hash to store peak information. Add back the feature to deal
with data without control.
Fix bug which incorrectly allows small peaks at the end of
chromosomes.
* bin/bdgpeakcall, bin/bdgcmp
Fix bugs. bdgpeakcall can output encodePeak format.