Commit

Merge pull request #2 from CyrilJl/dev
1.3
CyrilJl authored Aug 1, 2024
2 parents 4acba6e + d08bb63 commit 70c08c3
Showing 13 changed files with 403 additions and 287 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/pytest.yml
@@ -19,11 +19,10 @@ jobs:
 - name: Install dependencies
   run: |
     python -m pip install --upgrade pip
-    pip install pytest pandas numpy cython
+    pip install pytest pandas numpy numba
 - name: Install optimask
   run: |
-    cython optimask/optimask_cython.pyx
     python setup.py install
 - name: Run Pytest
3 changes: 1 addition & 2 deletions .github/workflows/python-publish.yml
@@ -22,13 +22,12 @@ jobs:
 - name: Install dependencies
   run: |
     python -m pip install --upgrade pip
-    pip install setuptools wheel twine numpy cython
+    pip install setuptools wheel twine numpy numba
 - name: Build and publish
   env:
     TWINE_USERNAME: __token__
     TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
   run: |
-    cython optimask/optimask_cython.pyx
     python setup.py sdist
     twine upload dist/*
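Both workflows drop the `cython optimask/optimask_cython.pyx` step because Numba compiles decorated functions just-in-time, on their first call, so nothing has to be pre-built in CI. The sketch below only illustrates that style of kernel; `count_per_index` is a hypothetical helper, not OptiMask's actual code. It counts NaNs per row or column from index arrays, and falls back to plain Python if Numba is not installed (an assumption made so the snippet runs anywhere).

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the function on its first call
except ImportError:
    def njit(func):
        # Fallback so the sketch still runs (slowly) without Numba installed
        return func


@njit
def count_per_index(indices, size):
    """Count how many times each integer in [0, size) appears in `indices`."""
    counts = np.zeros(size, dtype=np.int64)
    for i in indices:
        counts[i] += 1
    return counts


x = np.array([[1.0, np.nan, 3.0],
              [np.nan, np.nan, 6.0]])
rows, cols = np.unravel_index(np.flatnonzero(np.isnan(x)), x.shape)
row_counts = count_per_index(rows, x.shape[0])  # NaNs per row: [1 2]
col_counts = count_per_index(cols, x.shape[1])  # NaNs per column: [1 2 0]
```

In plain NumPy the same counts come from `np.bincount(rows, minlength=x.shape[0])`; the explicit loop is shown only because loop-heavy code of this kind is what Numba is meant to accelerate.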
2 changes: 0 additions & 2 deletions MANIFEST.in

This file was deleted.

27 changes: 27 additions & 0 deletions README.md
@@ -67,6 +67,33 @@ OptiMask’s algorithm is useful for handling unstructured NaN patterns, as show

<img src="https://github.com/CyrilJl/OptiMask/blob/main/docs/source/_static/example2.png?raw=true" width="400">

## Performance
``OptiMask`` handles large matrices efficiently. For example, on a 100,000 x 1,000 array with 2% of its entries missing at random, it finds a large NaN-free submatrix in about 0.4 s of wall time:

```python
from optimask import OptiMask
import numpy as np

def generate_random(m, n, ratio):
    """Return an m x n array of zeros in which a fraction `ratio` of the
    entries, chosen uniformly at random, is set to NaN."""
    arr = np.zeros((m, n))
    nan_count = int(ratio * m * n)
    indices = np.random.choice(m * n, nan_count, replace=False)
    arr.flat[indices] = np.nan
    return arr

x = generate_random(m=100_000, n=1_000, ratio=0.02)
# %time is an IPython/Jupyter magic: run this line in a notebook or IPython shell
%time rows, cols = OptiMask(verbose=True).solve(x)
# Trial 1 : submatrix of size 37094x49 (1817606 elements) found.
# Trial 2 : submatrix of size 35667x51 (1819017 elements) found.
# Trial 3 : submatrix of size 37908x48 (1819584 elements) found.
# Trial 4 : submatrix of size 37047x49 (1815303 elements) found.
# Trial 5 : submatrix of size 37895x48 (1818960 elements) found.
# Result: the largest submatrix found is of size 37908x48 (1819584 elements) found.
# CPU times: total: 172 ms
# Wall time: 435 ms
```
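A quick way to sanity-check such a result is to confirm that the selected rows and columns contain no NaN. The sketch assumes `rows` and `cols` are integer index arrays, as the `rows, cols = ...solve(x)` call above suggests; `is_nan_free` is a hypothetical helper, not part of OptiMask's API, and is shown here on a small hand-made array.

```python
import numpy as np

def is_nan_free(x, rows, cols):
    """Hypothetical helper: True if x restricted to `rows` x `cols` has no NaN."""
    # np.ix_ builds an open mesh so that x[...] selects the full submatrix
    return not np.isnan(x[np.ix_(rows, cols)]).any()

x = np.array([[1.0, 2.0, np.nan],
              [4.0, 5.0, 6.0],
              [np.nan, 8.0, 9.0]])
print(is_nan_free(x, [0, 1], [0, 1]))     # True
print(is_nan_free(x, [0, 1, 2], [1, 2]))  # False: x[0, 2] is NaN
```

On the benchmark above, `is_nan_free(x, rows, cols)` would be the corresponding check, and `len(rows) * len(cols)` gives the element count reported in the log.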

## Documentation

For detailed documentation, including installation instructions, API usage, and examples, visit [OptiMask Documentation](https://optimask.readthedocs.io/en/latest/index.html).
2 changes: 1 addition & 1 deletion docs/environment.yml
@@ -11,4 +11,4 @@ dependencies:
   - pandoc
   - numpy
   - pandas
-  - cython
+  - numba
6 changes: 6 additions & 0 deletions docs/source/future.rst
@@ -3,6 +3,12 @@
What's New?
###########


Version 1.3 (July 31, 2024)
~~~~~~~~~~~~~~~~~~~~~~~~~~
- Cython dropped in favor of Numba, plus various speed and memory optimizations
- special cases where all NaNs lie in a single row or a single column are detected for faster processing

Version 1.2 (June 19, 2024)
~~~~~~~~~~~~~~~~~~~~~~~~~~
- ``np.isnan(x).nonzero()`` replaced by ``np.unravel_index(np.flatnonzero(np.isnan(x)), x.shape)``, 2x faster
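The 1.2 entry swaps one way of locating NaNs for another. The snippet below only checks that the two expressions return the same row and column indices (both in row-major order) on a small example array; it does not measure the reported 2x speedup.

```python
import numpy as np

x = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, np.nan]])

# Approach used before 1.2: row and column indices of the NaN entries
rows_old, cols_old = np.isnan(x).nonzero()

# Approach used from 1.2: flat indices of the NaNs, then converted to (row, col)
rows_new, cols_new = np.unravel_index(np.flatnonzero(np.isnan(x)), x.shape)

assert np.array_equal(rows_old, rows_new) and np.array_equal(cols_old, cols_new)
print(rows_new, cols_new)  # [0 1 1] [1 0 2]
```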
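The 1.3 changelog mentions detecting NaNs confined to one row or one column. Assuming that means all NaNs share a single row (or a single column), the case has a closed-form answer: any NaN-free submatrix must either exclude that row or exclude every column holding one of its NaNs, so the best choice is the larger of those two options. The sketch below illustrates that reasoning; `solve_single_row_or_column` is a hypothetical helper, not OptiMask's implementation.

```python
import numpy as np

def solve_single_row_or_column(x):
    """Hypothetical helper: return (rows, cols) of the largest NaN-free
    submatrix when all NaNs lie in one row or one column, else None."""
    m, n = x.shape
    nan_rows, nan_cols = np.unravel_index(np.flatnonzero(np.isnan(x)), x.shape)
    all_rows, all_cols = np.arange(m), np.arange(n)
    if nan_rows.size == 0:
        return all_rows, all_cols  # nothing to remove
    if np.all(nan_rows == nan_rows[0]):
        bad_cols = np.unique(nan_cols)
        # Option A: drop the NaN row; option B: drop the columns holding NaNs
        if (m - 1) * n >= m * (n - bad_cols.size):
            return np.delete(all_rows, nan_rows[0]), all_cols
        return all_rows, np.setdiff1d(all_cols, bad_cols)
    if np.all(nan_cols == nan_cols[0]):
        bad_rows = np.unique(nan_rows)
        # Option A: drop the NaN column; option B: drop the rows holding NaNs
        if m * (n - 1) >= (m - bad_rows.size) * n:
            return all_rows, np.delete(all_cols, nan_cols[0])
        return np.setdiff1d(all_rows, bad_rows), all_cols
    return None  # general case: needs the full algorithm

x = np.ones((4, 5))
x[2, [0, 3]] = np.nan
rows, cols = solve_single_row_or_column(x)
print(rows, cols)  # [0 1 3] [0 1 2 3 4]
```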
333 changes: 171 additions & 162 deletions notebooks/Optimask.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion optimask/__init__.py
@@ -7,4 +7,4 @@

__all__ = ['OptiMask']

-__version__ = '1.2.5'
+__version__ = '1.3'
12 changes: 12 additions & 0 deletions optimask/_misc.py
@@ -28,3 +28,15 @@ def check_params(param, params=None, types=None):
def warning(msg):
    # Trigger a warning with the provided message
    return warnings.warn(msg)


class EmptyInputError(ValueError):
    """Raised when the input array or DataFrame is empty."""


class InvalidDimensionError(ValueError):
    """Raised when the input numpy array does not have exactly 2 dimensions."""


class OptiMaskAlgorithmError(ValueError):
    """Raised when the OptiMask algorithm encounters an error during optimization."""
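Because the new exceptions subclass `ValueError`, callers can catch them specifically or generically. The sketch below re-declares two of them so it is self-contained; `check_input` is a hypothetical validator showing where they could be raised, not the code OptiMask actually uses.

```python
import numpy as np


class EmptyInputError(ValueError):
    """Raised when the input array or DataFrame is empty."""


class InvalidDimensionError(ValueError):
    """Raised when the input numpy array does not have exactly 2 dimensions."""


def check_input(x):
    """Hypothetical validator: reject empty or non-2D inputs."""
    x = np.asarray(x)
    if x.size == 0:
        raise EmptyInputError("input is empty")
    if x.ndim != 2:
        raise InvalidDimensionError(f"expected 2 dimensions, got {x.ndim}")
    return x


try:
    check_input(np.zeros(3))
except ValueError as err:  # the subclasses are also ValueErrors
    print(type(err).__name__, err)  # InvalidDimensionError expected 2 dimensions, got 1
```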