Merge pull request #576 from macs3-project/feat/macs3/hmmratac
MACS3 b3 update `hmmratac` and `Cython` issue
taoliu authored Jul 30, 2023
2 parents 0c161d5 + d734ebd commit 076ee83
Showing 37 changed files with 203,035 additions and 199,879 deletions.
57 changes: 57 additions & 0 deletions .github/workflows/build-and-test-MACS3-macos12.yml
@@ -0,0 +1,57 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: MACS3 CI MacOS 12

on: [push, pull_request]

jobs:
  build:
    runs-on: macos-12
    name: Build on MacOS 12 with Python 3.11
    steps:
      - name: Checkout MACS
        uses: actions/checkout@v3
        with:
          submodules: 'true'
      - name: Cache pip
        uses: actions/cache@v3
        with:
          # This path is specific to Ubuntu
          path: ~/.cache/pip
          # Look to see if there is a cache hit for the corresponding requirements file
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
            ${{ runner.os }}-
      - name: Setup venv
        run: |
          # create venv
          python3 -m venv macs3venv
      - name: Install dependencies
        run: |
          # make sure pip is the latest
          source macs3venv/bin/activate
          python3 -m pip install --upgrade pip
          pip3 install pytest
          if [ -f requirements.txt ]; then pip3 install --upgrade -r requirements.txt; fi
      - name: Install MACS
        run: |
          # we use pip install instead of the old-fashioned setup.py install
          source macs3venv/bin/activate
          pip install .
      - name: Test with pytest
        run: |
          source macs3venv/bin/activate
          pytest --runxfail && cd test && ./cmdlinetest-nohmmratac macs3
      - name: Build sdist
        run: |
          source macs3venv/bin/activate
          python setup.py sdist
      - name: Archive sdist
        uses: actions/upload-artifact@v3
        with:
          name: sdist file
          path: |
            dist/*.tar.gz
7 changes: 4 additions & 3 deletions .github/workflows/build-and-test-MACS3-non-x64.yml
@@ -1,7 +1,7 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: CI non x64
name: MACS3 CI non x64

on: [push, pull_request]

@@ -14,11 +14,11 @@ jobs:
include:
- arch: aarch64
distro: bullseye
- arch: armv7 # scipy installation issue before, let's retest...
- arch: armv7
distro: bullseye
- arch: ppc64le
distro: bullseye
- arch: s390x # scipy installation issue before, let's retest...
- arch: s390x
distro: bullseye
steps:
- name: Checkout MACS
@@ -57,6 +57,7 @@ jobs:
apt-get update -qq -y
apt-get install apt-utils
apt-get install -yq libblas3 liblapack3 libblas-dev liblapack-dev
# we rely on Debian/Linux python packages for numpy/scipy/sklearn
apt-get install -yq python3 python3-pip python3-virtualenv python3-wheel python3-numpy python3-scipy python3-sklearn cython3
apt-get install -yq procps zlib1g zlib1g-dev gfortran
run: |
4 changes: 1 addition & 3 deletions .github/workflows/build-and-test-MACS3-x64.yml
@@ -1,7 +1,7 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: CI x64
name: MACS3 CI x64

on: [push, pull_request]

@@ -38,8 +38,6 @@ jobs:
# make sure pip is the latest
python -m pip install --upgrade pip
pip install pytest
# install cython first...
pip install cython
if [ -f requirements.txt ]; then pip install --upgrade -r requirements.txt; fi
- name: Install MACS
run: |
4 changes: 2 additions & 2 deletions .github/workflows/publish-to-anaconda.yml
@@ -1,7 +1,7 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

name: Publish to Anaconda/macs3
name: MACS3 Publish to Anaconda/macs3

on:
release:
@@ -12,7 +12,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.9, 3.10, 3.11]
python-version: ['3.11']
arch: ['x64']
name: Build conda package in x64 under Python ${{ matrix.python-version }}
steps:
6 changes: 2 additions & 4 deletions .github/workflows/publish-to-pypi.yml
@@ -1,7 +1,7 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

name: Upload to PYPI
name: MACS3 Upload to PYPI

on:
release:
@@ -19,12 +19,10 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.9'
python-version: '3.11'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
# install cython first...
pip install cython
if [ -f requirements.txt ]; then pip install --upgrade -r requirements.txt; fi
# setuptools, wheel and twine are needed to upload to pypi
pip install setuptools wheel twine
73 changes: 40 additions & 33 deletions ChangeLog
@@ -1,39 +1,20 @@
2023-06-23 Tao Liu <vladimir.liu@gmail.com>
MACS 3.0.0b2
2023-07-28 Tao Liu <vladimir.liu@gmail.com>
MACS 3.0.0b3

* New features in MACS3:

1) Speed/memory optimization. Use the cykhash to replace python
dictionary. Use buffer (10MB) to read and parse input file (not
available for BAM file parser). And many optimization tweaks. We
added memory monitoring to the runtime messages.

2) Code cleanup. Reorganize source codes.

3) Unit testing.

4) R wrappers for MACS -- MACSr

5) Switch to Github Action for CI, support multi-arch testing
including x64, armv7, aarch64, s390x and ppc64le.
2) Call variants in peak regions directly from BAM files. The
function was originally developed under code name SAPPER. Now
SAPPER has been merged into MACS. Also, `simde` has been added as
a submodule in order to support fermi-lite library under non-x64
architectures.

6) MACS tag-shifting model has been refined. Now it will use a
naive peak calling approach to find ALL possible paired peaks at +
and - strand, then use all of them to calculate the
cross-correlation. (a related bug has been fix #442)

7) Call variants in peak regions directly from BAM files. The
function was originally developed under code name SAPPER. Now
SAPPER has been merged into MACS. Also, `simde` has been added as
a submodule in order to support fermi-lite library under non-x64
architectures.

8) BAI index and random access to BAM file now is supported. #449
And user can use original BAM file (instead of the subset of BAM
file as in SAPPER) in the `callvar` command.

9) Support of Python > 3.10 #497 #498

10) HMMRATAC module is added. HMMRATAC is a dedicated software to
3) HMMRATAC module is added. HMMRATAC is a dedicated software to
analyze ATAC-seq data. The basic idea behind HMMRATAC is to digest
ATAC-seq data according to the fragment length of read pairs into
four signal tracks: short fragments, mononucleosomal fragments,
@@ -44,19 +25,45 @@
in JAVA, by Evan Tarbell. We implemented it in Python/Cython and
optimize the whole process using existing MACS functions and
hmmlearn. Now it can run much faster than the original JAVA
version. Note: evaluation of the peak calling results is underway.
version. Note: evaluation of the peak calling results is underway.

4) Code cleanup. Reorganize source codes.

5) Unit testing.

6) R wrappers for MACS -- MACSr

7) Switch to Github Action for CI, support multi-arch testing
including x64, armv7, aarch64, s390x and ppc64le. We also test on
Mac OS 12.

8) MACS tag-shifting model has been refined. Now it will use a
naive peak calling approach to find ALL possible paired peaks on the +
and - strands, then use all of them to calculate the
cross-correlation. (a related bug has been fixed #442)

9) BAI index and random access to BAM files are now supported. #449
Users can use the original BAM file (instead of a subset of the BAM
file, as in SAPPER) in the `callvar` command.

10) Support of Python > 3.10 #497 #498

11) The effective genome size parameters have been updated
according to deeptools. #508

12) Memory usage now will be displayed in the runtime message.

13) Multiple updates regarding dependencies, anaconda built, CI/CD
12) Multiple updates regarding dependencies, anaconda build, CI/CD
process.


13) Cython is supported up to ~0.29. Cython 3 is not supported yet.

* Other:
1) Missing header line when no peaks can be called #501 #502

2) Note: different numpy, scipy, and sklearn versions may give
slightly different hmmratac results. The current reference results
for automated testing in the `/test` directory were generated with
Numpy 1.25.1, Scipy 1.11.1, and sklearn 1.3.0.

2020-04-11 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.2.7.1
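
ChangeLog item 3 above describes splitting ATAC-seq read pairs by fragment length into four signal tracks. A minimal Python sketch of that idea follows; the length boundaries are hypothetical, since the real hmmratac command weights each fragment length against EM-fitted distributions rather than using hard cutoffs.

# Minimal sketch of the four-track idea from ChangeLog item 3.
# The boundaries below are illustrative only, not the MACS3 defaults.
from collections import Counter

def assign_track(fragment_length: int) -> str:
    """Assign a read-pair fragment to one of the four signal tracks."""
    if fragment_length < 100:      # hypothetical boundary for short fragments
        return "short"
    elif fragment_length < 300:    # hypothetical mono-nucleosomal range
        return "mono"
    elif fragment_length < 500:    # hypothetical di-nucleosomal range
        return "di"
    else:
        return "tri"

fragment_lengths = [60, 85, 180, 220, 410, 610]   # toy data
print(Counter(assign_track(fl) for fl in fragment_lengths))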

26 changes: 22 additions & 4 deletions MACS3/Commands/hmmratac_cmd.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Time-stamp: <2023-06-08 11:03:46 Tao Liu>
# Time-stamp: <2023-07-28 12:10:12 Tao Liu>

"""Description: Main HMMR command
@@ -143,7 +143,7 @@ def run( args ):

# now we will prepare the weights for each fragment length for
# each of the four distributions based on the EM results
weight_mapping = generate_weight_mapping( fl_list, em_means, em_stddevs )
weight_mapping = generate_weight_mapping( fl_list, em_means, em_stddevs, min_frag_p = options.min_frag_p )

options.info( f"# Generate short, mono-, di-, and tri-nucleosomal signals")
digested_atac_signals = generate_digested_signals( petrack, weight_mapping )
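
The comment above says the weight of each fragment length across the four distributions comes from the EM results, and the updated call also passes min_frag_p. A hedged sketch of one plausible reading is below; the function name, the normal-density weighting, and the exact role of min_frag_p are assumptions for illustration, not the actual MACS3 implementation.

# Sketch only: per-fragment-length weights from four EM-fitted normal
# components, dropping lengths whose total density falls below a floor
# loosely analogous to min_frag_p (assumed behavior, not MACS3 code).
import numpy as np
from scipy.stats import norm

def sketch_weight_mapping(fl_list, em_means, em_stddevs, min_frag_p=0.001):
    mapping = {}
    for fl in fl_list:
        dens = np.array([norm.pdf(fl, m, s) for m, s in zip(em_means, em_stddevs)])
        total = dens.sum()
        if total < min_frag_p:
            continue                   # fragment length unlikely under every component
        mapping[fl] = dens / total     # relative weight of short/mono/di/tri at this length
    return mapping

weights = sketch_weight_mapping(range(50, 701), [75, 200, 400, 600], [30, 40, 50, 60])
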
@@ -189,7 +189,7 @@ def run( args ):

# Let MACS3 do the cutoff analysis to help decide the lower and upper cutoffs
with open(cutoffanalysis_file, "w") as ofhd_cutoff:
ofhd_cutoff.write( fc_bdg.cutoff_analysis( min_length=minlen, max_gap=options.hmm_training_flanking ) )
ofhd_cutoff.write( fc_bdg.cutoff_analysis( min_length=minlen, max_gap=options.hmm_training_flanking, max_score = 1000 ) )
#raise Exception("Cutoff analysis only.")
sys.exit(1)

@@ -201,6 +201,19 @@ def run( args ):
    if options.hmm_file:
        # skip this step if hmm_file is given
        options.info( f"#3 Skip this step of looking for training set since a Hidden Markov Model file has been provided!")
    elif options.hmm_training_regions:
        # if a training region file is provided - need to read in the bedfile and skip the peak calling step
        options.info(f"#3 Read training regions from BED file: {options.hmm_training_regions}")
        # from refinepeak_cmd.py:
        peakio = open(options.hmm_training_regions,"rb")
        peaks = PeakIO()
        for l in peakio:
            fs = l.rstrip().split()
            peaks.add( chromosome=fs[0], start=int(fs[1]), end=int(fs[2])) #change based on what expected input file should contain
        peakio.close()
        training_regions = Regions()
        training_regions.init_from_PeakIO( peaks )
        options.info("# Training regions have been read from bedfile")
    else:
        # Find regions with fold change within determined range to use as training sites.
        # Find regions with zscore values above certain cutoff to exclude from viterbi.
@@ -223,7 +236,7 @@ def run( args ):

# Let MACS3 do the cutoff analysis to help decide the lower and upper cutoffs
with open(cutoffanalysis_file, "w") as ofhd_cutoff:
ofhd_cutoff.write( fc_bdg.cutoff_analysis( min_length=minlen, max_gap=options.hmm_training_flanking ) )
ofhd_cutoff.write( fc_bdg.cutoff_analysis( min_length=minlen, max_gap=options.hmm_training_flanking, max_score = 1000 ) )

# we will check if anything left after filtering
if peaks.total > options.hmm_maxTrain:
@@ -299,6 +312,11 @@ def run( args ):
    # write hmm into model file
    options.info( f"# Write HMM parameters into JSON: {hmm_modelfile}")
    hmm_model_save( hmm_modelfile, hmm_model, options.hmm_binsize, i_open_region, i_nucleosomal_region, i_background_region )

    # if --modelonly option provided, exit script after hmm model is saved
    if options.hmm_modelonly:
        options.info( f"# Complete - HMM model was saved, program exited (--modelonly option was provided) ")
        sys.exit()

    # Now tell users the parameters of the HMM
    assignments = [ "", "", "" ]
10 changes: 5 additions & 5 deletions MACS3/Signal/BedGraph.pyx
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# cython: language_level=3
# cython: profile=True
# Time-stamp: <2023-06-08 10:52:59 Tao Liu>
# Time-stamp: <2023-07-28 12:06:48 Tao Liu>

"""Module for BedGraph data class.
@@ -1131,7 +1131,7 @@ cdef class bedGraphTrackI:
#ret.merge_regions()
return ret

cpdef str cutoff_analysis ( self, int32_t max_gap, int32_t min_length, int32_t steps = 100 ):
cpdef str cutoff_analysis ( self, int32_t max_gap, int32_t min_length, int32_t steps = 100, float32_t max_score = 1000 ):
"""
Cutoff analysis function for bedGraphTrackI object.
@@ -1171,8 +1171,8 @@

#midvalue = self.minvalue/2 + self.maxvalue/2
#s = float(self.minvalue - midvalue)/steps
minv = self.minvalue
maxv = self.maxvalue
minv = max( 0, self.minvalue )
maxv = min( self.maxvalue, max_score )

s = float(maxv - minv)/steps

@@ -1265,4 +1265,4 @@ cdef np.ndarray calculate_elbows( np.ndarray values, float32_t threshold=0.01):
# Find all points where the change in slope is significantly larger than the average
elbows = np.where(delta_slopes > avg_delta_slope + threshold)[0]

return elbows
return elbows
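
Only the tail of calculate_elbows is visible in this hunk, so the sketch below assumes the earlier steps (first differences as slopes, then the change in slope); only the final selection line is taken from the code above.

import numpy as np

def sketch_calculate_elbows(values, threshold=0.01):
    slopes = np.diff(values)                  # assumed: first differences
    delta_slopes = np.abs(np.diff(slopes))    # assumed: change in slope between steps
    avg_delta_slope = delta_slopes.mean()
    # Find all points where the change in slope is significantly larger than the average
    return np.where(delta_slopes > avg_delta_slope + threshold)[0]

print(sketch_calculate_elbows(np.array([0.0, 0.1, 0.2, 1.5, 1.6, 1.7])))   # -> [1 2]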
