[WIP] - Feature/numba parallel #43

Open: wants to merge 7 commits into base dev
Conversation

@vc1492a (Owner) commented Sep 17, 2020

This feature addresses #36 and adds parallelization of the distance calculation between observations through the optional Numba library, which JIT compiles the code for faster run times. While parallelization is confirmed through testing with htop (see the screenshots below), some further testing is needed before merging into the dev branch and later into main for public use.

No Parallelization, only Numba JIT
[htop screenshot, 2020-09-17 8:41 AM]

Numba JIT with Parallelization
[htop screenshot, 2020-09-17 8:42 AM]

It should be noted that any speedup from parallelization will not be realized if a pre-existing distance matrix is provided for the calculation of local outlier probability scores (which PyNomaly supports). This has been noted in readme.md shortly after the parallelization option is introduced.

Note that in order to function on both an Intel Core Atom (circa 2015, 2 cores) and an Intel Core i9 (circa 2019, 8 cores), a newer version of numba was required, moving from version 0.45.x to 0.51.2. Speed improvements, as a percentage of the original run time, were greater on the Atom processor than on the Core i9. Testing on x86 CPU architectures has so far been successful, but Numba seems unable to JIT compile the code on IBM Power8 CPUs (>= 16 cores).

The code will now be tested in several different environments prior to merging, with any issues and successes reported here.

@vc1492a vc1492a added the enhancement (New feature or request) and in progress (This issue is being actively worked on) labels Sep 17, 2020
@vc1492a vc1492a self-assigned this Sep 17, 2020
@coveralls commented Sep 17, 2020

Pull Request Test Coverage Report for Build 142

  • 32 of 44 (72.73%) changed or added relevant lines in 1 file are covered.
  • 11 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-6.2%) to 93.188%

Changes Missing Coverage:
  PyNomaly/loop.py: 32 of 44 changed/added lines covered (72.73%)

Files with Coverage Reduction:
  PyNomaly/loop.py: 11 new missed lines (93.19%)

Totals (change from base Build 126: -6.2%):
  Covered Lines: 342
  Relevant Lines: 367

💛 - Coveralls

@vc1492a (Owner, Author) commented Sep 17, 2020

On IBM Power8:

(venv-pynomaly) vconstan@SNA-MINSKY-N03:~/projects/PyNomaly$ python examples/numba_speed_diff.py
/home/vconstan/projects/PyNomaly/PyNomaly/loop.py:518: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function _compute_distance_and_neighbor_matrix failed at nopython mode lowering due to: scipy 0.16+ is required for linear algebra

File "PyNomaly/loop.py", line 537:
    def _compute_distance_and_neighbor_matrix(
        <source elided>
                diff = clust_points_vector[p[0]] - clust_points_vector[p[1]]
                d = np.dot(diff, diff) ** 0.5
                ^

During: lowering "$88call_method.23 = call $82load_method.20(diff, diff, func=$82load_method.20, args=[Var(diff, loop.py:536), Var(diff, loop.py:536)], kws=(), vararg=None)" at /home/vconstan/projects/PyNomaly/PyNomaly/loop.py (537)
  @staticmethod
/home/vconstan/.conda/envs/venv-pynomaly/lib/python3.8/site-packages/numba/core/object_mode_passes.py:177: NumbaWarning: Function "_compute_distance_and_neighbor_matrix" was compiled in object mode without forceobj=True.

File "PyNomaly/loop.py", line 519:
    @staticmethod
    def _compute_distance_and_neighbor_matrix(
    ^

  warnings.warn(errors.NumbaWarning(warn_msg,
/home/vconstan/.conda/envs/venv-pynomaly/lib/python3.8/site-packages/numba/core/object_mode_passes.py:187: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "PyNomaly/loop.py", line 519:
    @staticmethod
    def _compute_distance_and_neighbor_matrix(
    ^

  warnings.warn(errors.NumbaDeprecationWarning(msg,

@vc1492a (Owner, Author) commented Sep 17, 2020

The above issue on IBM Power8 was related to an environment error (scipy was not installed). Since scipy is needed for numba's linear algebra support, this has now been reflected as an optional requirement in readme.md.
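Because the failure mode is an object-mode fallback deep inside JIT compilation, a small guard at import time can surface the missing dependency earlier. A sketch (not PyNomaly's actual code): numba lowers numpy linear algebra such as np.dot through scipy's Cython BLAS bindings, so that is the module to probe for.

```python
def check_numba_linear_algebra():
    """Return True if the BLAS-backed module numba's np.dot lowering needs is present."""
    try:
        import scipy.linalg.cython_blas  # noqa: F401  # used internally by numba
        return True
    except ImportError:
        return False

print(check_numba_linear_algebra())
```

A False result means njit-compiled functions using np.dot will fall back to object mode with a NumbaWarning, exactly as in the Power8 log above.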

No Parallelization, only Numba JIT
[htop screenshot, 2020-09-17 9:14 AM]

Numba JIT with Parallelization
[htop screenshot, 2020-09-17 9:14 AM]

🚀 🚀 🚀

@vc1492a (Owner, Author) commented Sep 17, 2020

Given the trade-off between the number of cores used in parallel computation and the communication overhead between parallel threads, it may be nice to allow users to set the number of concurrent threads to execute in parallel.

This can be set through a Numba environment variable, and may be worth exposing as an additional, optional parameter when executing distance calculations in parallel: https://numba.pydata.org/numba-doc/latest/user/threading-layer.html#setting-the-number-of-threads

@vc1492a vc1492a mentioned this pull request Sep 17, 2020
@vc1492a (Owner, Author) commented Sep 17, 2020

Added a num_threads parameter for specifying the number of threads. So far, adding more threads, at least with how the parallelism is currently implemented, seems to slow down computation when processing 25,000 values.

[ ================================================================================ ] 100.00%
Computation took 94.4145040512085 seconds with Numba JIT with parallel processing, using 1 thread.
[ ================================================================================ ] 100.00%
Computation took 114.98689579963684 seconds with Numba JIT with parallel processing, using 2 thread.
[ ================================================================================ ] 100.00%
Computation took 139.79329085350037 seconds with Numba JIT with parallel processing, using 3 thread.
[ ================================================================================ ] 100.00%
Computation took 168.51009488105774 seconds with Numba JIT with parallel processing, using 4 thread.

More investigation is needed to determine whether the above behavior is machine-specific or code-related, but we now have the ability to parallelize distinct portions of the code and to set the number of threads when using numba.

@vc1492a (Owner, Author) commented Sep 18, 2020

Results from another machine:

[ ================================================================================ ] 100.00%
Computation took 34.91723585128784 seconds with Numba JIT with parallel processing, using 1 thread(s).
[ ================================================================================ ] 100.00%
Computation took 32.24922227859497 seconds with Numba JIT with parallel processing, using 2 thread(s).
[ ================================================================================ ] 100.00%
Computation took 30.427764892578125 seconds with Numba JIT with parallel processing, using 3 thread(s).
[ ================================================================================ ] 100.00%
Computation took 30.22746515274048 seconds with Numba JIT with parallel processing, using 4 thread(s).

@vc1492a vc1492a added the help wanted Extra attention is needed label Sep 18, 2020
@vc1492a (Owner, Author) commented Oct 1, 2020

[ ================================================================================ ] 100.00%
Computation took 50.41339111328125 seconds with Numba JIT with parallel processing, using 1 thread(s).
[ ================================================================================ ] 100.00%
Computation took 64.93466305732727 seconds with Numba JIT with parallel processing, using 2 thread(s).
[ ================================================================================ ] 100.00%
Computation took 59.55153703689575 seconds with Numba JIT with parallel processing, using 3 thread(s).
[ ================================================================================ ] 100.00%
Computation took 60.493231773376465 seconds with Numba JIT with parallel processing, using 4 thread(s).
[ ================================================================================ ] 100.00%
Computation took 62.03501510620117 seconds with Numba JIT with parallel processing, using 5 thread(s).
[ ================================================================================ ] 100.00%
Computation took 62.178765058517456 seconds with Numba JIT with parallel processing, using 6 thread(s).
[ ================================================================================ ] 100.00%
Computation took 65.13408589363098 seconds with Numba JIT with parallel processing, using 7 thread(s).
[ ================================================================================ ] 100.00%
Computation took 65.27309513092041 seconds with Numba JIT with parallel processing, using 8 thread(s).
[ ================================================================================ ] 100.00%
Computation took 62.19127082824707 seconds with Numba JIT with parallel processing, using 9 thread(s).
[ ================================================================================ ] 100.00%
Computation took 59.75213074684143 seconds with Numba JIT with parallel processing, using 10 thread(s).
[ ================================================================================ ] 100.00%
Computation took 57.64805293083191 seconds with Numba JIT with parallel processing, using 11 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.80255579948425 seconds with Numba JIT with parallel processing, using 12 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.80128788948059 seconds with Numba JIT with parallel processing, using 13 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.00968599319458 seconds with Numba JIT with parallel processing, using 14 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.198336124420166 seconds with Numba JIT with parallel processing, using 15 thread(s).
[ ================================================================================ ] 100.00%
Computation took 57.532896995544434 seconds with Numba JIT with parallel processing, using 16 thread(s).

Results from another run.

@medvidov commented Oct 3, 2020

Results from another machine (4 core CPU, running from WSL):

[ ================================================================================ ] 100.00%
Computation took 51.52172231674194 seconds with Numba JIT with parallel processing, using 1 thread(s).
[ ================================================================================ ] 100.00%
Computation took 54.880839347839355 seconds with Numba JIT with parallel processing, using 2 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.5437228679657 seconds with Numba JIT with parallel processing, using 3 thread(s).
[ ================================================================================ ] 100.00%
Computation took 54.710304260253906 seconds with Numba JIT with parallel processing, using 4 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.60258507728577 seconds with Numba JIT with parallel processing, using 5 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.15400314331055 seconds with Numba JIT with parallel processing, using 6 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.54375123977661 seconds with Numba JIT with parallel processing, using 7 thread(s).
[ ================================================================================ ] 100.00%
Computation took 54.39351201057434 seconds with Numba JIT with parallel processing, using 8 thread(s).

@vc1492a (Owner, Author) commented Feb 3, 2021

Refactored how the processing is handled so that we now see a speed improvement when using Numba and increasing the number of threads. Once I handle the issue below, I'll report back with some numbers on computation speed.

Accomplishing multi-core processing necessitated changes to the progress bar, which is still a work in progress. One of the key challenges is flushing stdout in a way that is compatible with Numba: while print statements are supported in Numba-compiled functions, sys.stdout.flush() does not appear to be.

@vc1492a vc1492a added on hold This issue to be resolved at a later time and removed in progress This issue is being actively worked on labels Apr 29, 2024
@vc1492a (Owner, Author) commented Apr 29, 2024

Placing this issue on hold while other repository issues are resolved - this is low priority and can be resolved at a later time.

@vc1492a vc1492a added the low priority This issue is a lower priority relative to other open issues label Apr 29, 2024
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed) · low priority (This issue is a lower priority relative to other open issues) · on hold (This issue to be resolved at a later time)
3 participants