Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pool decorator to functions #489

Merged
merged 23 commits into from
Feb 7, 2024
Merged

Add pool decorator to functions #489

merged 23 commits into from
Feb 7, 2024

Conversation

Yaoyx
Copy link
Collaborator

@Yaoyx Yaoyx commented Jan 22, 2024

No description provided.

cooltools/lib/common.py Outdated Show resolved Hide resolved
cooltools/cli/sample.py Outdated Show resolved Hide resolved
@gfudenberg
Copy link
Member

the decorator does nicely streamline the code! any ideas on a potential tests to add for the pool_decorator function @Yaoyx ?

tests/test_virtual4c.py Outdated Show resolved Hide resolved
tests/test_snipping.py Outdated Show resolved Hide resolved
tests/test_snipping.py Outdated Show resolved Hide resolved
@gfudenberg
Copy link
Member

I think the new tests can be made more concise-- e.g. for each tool can just refactor to keep the insulation.equals(insulation_pooled2) inside of existing earlier test to minmize the boilerplate code

@gfudenberg
Copy link
Member

note this would close #431

@gfudenberg
Copy link
Member

hi @sergpolly & @nvictus -- could either of you check out this PR? Thanks!

cooltools/lib/common.py Outdated Show resolved Hide resolved
Copy link
Member

@nvictus nvictus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pool decorator is a great idea and this overall looks good to me! Just one important thing missing.

As it stands right now, applying @pool_decorator to a function removes the flexibility that the map= parameter provides in the first place, which is to support alternative and third party map functors (e.g. from concurrent.futures or loky). This decorator forces the function to always use multiprocess.Pool.imap or builtins.map when nproc is 1.

The decorator should provide a sane default, like imap (or could even be parameterized in the future @pool_decorator("imap_unordered")). But it should allow the client to explicitly pass in map=<my_custom_functor> which will override the defaults and ignore nproc.

… 3. add explicit docstring to pool_decorator 4. all pool.terminate() 5. support third-party map functor
@gfudenberg
Copy link
Member

these changes look great @Yaoyx !

two qs:

  • we currently let the user pass their own map to map_functor when nproc=1. However, this map could use more than one process. Would it be more clear to require nproc=None for the case when the user wants to pass their own map function ? cc @nvictus @sergpolly

e.g.

@pool_decorator
def cooltools_function_to_decorate(
    clr, ...
    nproc=None,
    map_functor=my_custom_map_func,
):
  • seems that the user-facing dots() function could be modified to take the map_functor & the inner scoring_and_histogramming etc could take the default nproc=1 as the kwarg? @Yaoyx

@sergpolly
Copy link
Member

  • Would it be more clear to require nproc=None for the case when the user wants to pass their own map function

... no strong opinion ... I think, good docs would be sufficient - sort of like - "as soon as you start messing with the map_functor you - the user - are on your own - i.e reponsible for all of the boiler plate code around starting and terminating processing pool and the nproc doesn't affect anything in this case" Or in other words - "if you (the user) want an "easy" multiprocessing -> just modify nproc cooltools/decorator will take of the rest, otherwise if you want something custom - modify map_functor and ignore nproc.

seems that the user-facing dots() function could be modified to take the map_functor & the inner scoring_and_histogramming etc could take the default nproc=1 as the kwarg

in the same vein - I'd say, ultimately, provide the convenience of nproc only for the user-facing functions .. Inner functions can receive map_functor argument only. Unless I'm missing something ... Also, I think dots is the only place where this would matter for now - i.e. the case where multiple inner functions are using parallel-map_functor

@sergpolly
Copy link
Member

sergpolly commented Feb 6, 2024

oh , i just looked at the overall code - and there is 1 more tiny thing - I think it would be also nice to clean up all the remaining import multiprocess/int statements from the modules/files the rely on pool_decorator now - perhaps even black/ruff code formatter would take care of that - i.e. the imports that aren't being used inside functions
otherwise looks great

Copy link
Member

@gfudenberg gfudenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try to have the same defnition for the nproc & map_functor arguments across functions. I propose:

nproc : int, optional
How many processes to use for calculation. Ignored if map_functor is passed.

map_functor : callable, optional
    Map function to dispatch the matrix chunks to workers. If left unspecified, pool_decorator applies the following defaults: if nproc>1 this defaults to multiprocess.Pool; If nproc=1 this defaults  the builtin map. 

does this clear @sergpolly ?

@sergpolly
Copy link
Member

yes, I think, it is clear

The last thing on my side, @Yaoyx - i've noticed few remaining import multiprocess in some CLI files - "sample" and "coverage" at least I didn't check them all -> otherwise this looks complete - great stuff !

@gfudenberg
Copy link
Member

ok to merge now @nvictus ?

cooltools/lib/common.py Outdated Show resolved Hide resolved
Copy link
Member

@nvictus nvictus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the docstring comment, this LGTM. One issue is that the nproc parameters lingering in the wrapped functions aren't necessary because nproc is provided by the decorator and doesn't get passed down to the wrapped function. We can consider removing them in the future, but maybe retaining them in docstrings.

cooltools/lib/common.py Outdated Show resolved Hide resolved
@nvictus nvictus merged commit 41340d4 into open2c:master Feb 7, 2024
4 checks passed
@gfudenberg
Copy link
Member

how would you then suggest CLI compatibility @nvictus ? could the map_functor argument be an int in which case the decorator interprets it as nprop=my_int?

@nvictus
Copy link
Member

nvictus commented Feb 7, 2024

Yeah, assuming the CLI is not customizing the map implementation, you only need to expose a parameter to set the number of processes.

@gfudenberg
Copy link
Member

gfudenberg commented Feb 7, 2024

I like it! That would make the arguments a little cleaner by not having 2 arguments to control 1 behavior... Though providing an int to an argument named map_functor could be a bit confusing...

any suggestions?

@sergpolly
Copy link
Member

@gfudenberg a function wrapped into pool_decorator - has both nproc and map-functor, but the function being wrapped does not have to have nproc as a kwargs, only map_functor - I think this was Nezar point ... At least this is how understand it now - took time to understand the interplay of the wrapper and the wrapped

@gfudenberg
Copy link
Member

gfudenberg commented Feb 7, 2024

yes, this is now my understanding too @sergpolly -- so I'm wondering if

@pool_decorator
def cooltools_function_to_decorate(
    clr, 
    ... other args,
    map_functor=1 or int or my_custom_map_func(),
):

would be clear, or if this would make the map_functor argument confusing (or better to rename it something else)

@sergpolly
Copy link
Member

Hm , I don't know if passing map_functor=8 or int would improve readability... IMHO. I very much like the current state + Nezars suggestion - the way I understand it is

  • make cooltools_func receive only map_functor kwarg
  • wrapping cooltool_func in a pool_decorator makes it seem like the function has nproc kwarg now (pool decorator - organizes the interplay between nproc and map_functor internally)
  • inside cli we use decorated cooltool_func with nproc only without exposing/touching map_functor

The only part missing for me now - is how do you organize docstrings... - cooltools_func drops nproc as kwarg , together with the docstrings for it - ok, and then what ? Do we specify docstring for nproc in the wrapper ? Can one combine the docstrings of wrapper the wrapped up in a nice way ?

@sergpolly
Copy link
Member

@gfudenberg I'm still a bit confused about this nproc/map_functor thing ... am I missing something obvious ? or do you agree with the previous comment ?

@gfudenberg
Copy link
Member

sorry for the confusion @sergpolly !
your comment makes perfect sense & I agree with the following:

The only part missing for me now - is how do you organize docstrings... - cooltools_func drops nproc as kwarg , together with the docstrings for it - ok, and then what ? Do we specify docstring for nproc in the wrapper ? Can one combine the docstrings of wrapper the wrapped up in a nice way ?

@sergpolly
Copy link
Member

perfect ! it is nice to have a consensus ! i'm sure there is some python/decorator/wrapping magic that would enable docstring combination

agalitsyna pushed a commit that referenced this pull request Feb 12, 2024
* add pool decorator to previous functions

* Fix pool_decorator

* add pool_decorator to dotfinder.py and expected.py

* combined import command for pool_decorator with others

* add tests for pooled functions to check if outputs are the same as using regular map

* changed lambda func to a def func

* try to set start method for multiprocessing

* test pool.map insteand of pool.imap

* switch multiprocessing to multiprocess

* swith to  multiprocess

* change job back to lambda

* change transforms back to lambda function

* streamlined tests for multiprocess

* move multiprocess code from cli/coverage and cli/sample to api

* add nproc to test_sample and reduce the random sampling size

* decrease test_sample sampling value

* change frac to 0.1 in test_sample

* regulate tolerance in test_sample

* 1.move logging into decorator 2. use map_functor as the argument name 3. add explicit docstring to pool_decorator 4. all pool.terminate() 5. support third-party map functor

* remove unused imports

* remove unused imports in cli

* unified definitions of nproc and map_functor

* Update pool_decorator docstring

---------

Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants