LazyProxy and LazyProxyMultiton patterns #269

Merged: 12 commits, Feb 3, 2022

sjperkins
Member

@sjperkins sjperkins commented Jan 28, 2022

  • Tests added / passed

    $ py.test -v -s africanus

    If the pep8 tests fail, the quickest way to correct
    this is to run autopep8 and then flake8 and
    pycodestyle to fix the remaining issues.

    $ pip install -U autopep8 flake8 pycodestyle
    $ autopep8 -r -i africanus
    $ flake8 africanus
    $ pycodestyle africanus
    
  • Fully documented, including HISTORY.rst for all changes
    and one of the docs/*-api.rst files for new API

    To build the docs locally:

    pip install -r requirements.readthedocs.txt
    cd docs
    READTHEDOCS=True make html
    

@sjperkins
Member Author

@JSKenyon @bennahugo @o-smirnov

The purpose of this PR is to demonstrate two useful patterns for managing resources in a dask context. The first is the LazyProxy, which is constructed from a function or class together with the arguments needed to create an instance.

The instance is only created when an attribute is first accessed on it through the LazyProxy. LazyProxies are also lightweight to pickle, provided that the supplied arguments are lightweight. This means they can be embedded in dask graphs and transferred to dask workers without creating expensive resources such as files or database connections up front.

f = LazyProxy(open, "test.txt", mode="w")
f.write("Hello World")
f.close()
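
To make the mechanics concrete, here is a minimal sketch of how such a proxy could behave. `SketchLazyProxy` and `_rebuild` are hypothetical names invented for illustration, not the classes added in this PR, and the sketch assumes that forwarding via `__getattr__` and pickling only the callable and its arguments is enough to show the idea.

```python
import os
import pickle
import tempfile


def _rebuild(fn, args, kwargs):
    # Hypothetical helper: recreate an (uninstantiated) proxy when unpickling.
    return SketchLazyProxy(fn, *args, **kwargs)


class SketchLazyProxy:
    """Illustrative stand-in only, not the LazyProxy from this PR."""

    def __init__(self, fn, *args, **kwargs):
        self._fn = fn
        self._args = args
        self._kwargs = kwargs
        self._instance = None

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for attributes
        # that live on the proxied instance rather than on the proxy.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._instance is None:
            # First real attribute access: create the resource now.
            self._instance = self._fn(*self._args, **self._kwargs)
        return getattr(self._instance, name)

    def __reduce__(self):
        # Pickle the recipe (callable + arguments), never the instance.
        return (_rebuild, (self._fn, self._args, self._kwargs))


path = os.path.join(tempfile.mkdtemp(), "test.txt")
f = SketchLazyProxy(open, path, mode="w")
created_before_use = f._instance is not None

# Round-tripping through pickle does not open the file.
clone = pickle.loads(pickle.dumps(f))

f.write("Hello World")
f.close()

with open(path) as fh:
    contents = fh.read()
```

Note that `__getattr__` does not intercept special methods looked up on the type (for example `with f:`), so a fuller proxy would need to forward those explicitly.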

It's also possible to supply a finaliser to the LazyProxy, which is called to finalise the instance when the LazyProxy is garbage collected. For example:

def finalise(file_):
  file_.close()

f = LazyProxy((open, finalise), "test.txt", mode="w")
f.write("Hello World")
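
One way such a finaliser could be wired up is with `weakref.finalize` from the standard library. The sketch below is an assumption about a possible mechanism, not a description of this PR's code; `FinalisedProxy` and `Resource` are hypothetical names. It accepts either a callable or a `(callable, finaliser)` tuple, mirroring the call shown above.

```python
import gc
import weakref


class FinalisedProxy:
    """Illustrative stand-in only, not the LazyProxy from this PR."""

    def __init__(self, fn, *args, **kwargs):
        # Accept either `fn` or `(fn, finaliser)`.
        if isinstance(fn, tuple):
            fn, finaliser = fn
        else:
            finaliser = None
        self._fn = fn
        self._finaliser = finaliser
        self._args = args
        self._kwargs = kwargs
        self._instance = None

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        if self._instance is None:
            self._instance = self._fn(*self._args, **self._kwargs)
            if self._finaliser is not None:
                # The callback receives the instance, not the proxy, so it
                # holds no reference that would keep the proxy alive.
                weakref.finalize(self, self._finaliser, self._instance)
        return getattr(self._instance, name)


closed = []


class Resource:
    # Hypothetical resource that records when it is closed.
    def write(self, data):
        return len(data)

    def close(self):
        closed.append(True)


def finalise(resource):
    resource.close()


proxy = FinalisedProxy((Resource, finalise))
written = proxy.write("Hello World")
del proxy      # drop the last reference to the proxy
gc.collect()   # make collection explicit on non-refcounting interpreters
```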

LazyProxyMultiton takes this one step further by always returning the same instance if the arguments are the same:

assert LazyProxy(open, "test.txt", mode="w") is not LazyProxy(open, "test.txt", mode="w")
assert LazyProxyMultiton(open, "test.txt", mode="w") is LazyProxyMultiton(open, "test.txt", mode="w")
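
The multiton behaviour can be sketched with a metaclass that caches instances by constructor arguments. This is a generic illustration under stated assumptions: `Multiton` and `Connection` are hypothetical names, the arguments are assumed to be hashable, and the cache holds strong references, which a real implementation may well handle differently.

```python
import threading


class Multiton(type):
    """Metaclass returning one instance per distinct argument set."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._cache = {}
        cls._lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # Arguments must be hashable to serve as a cache key.
        key = (args, frozenset(kwargs.items()))
        with cls._lock:
            try:
                return cls._cache[key]
            except KeyError:
                instance = super().__call__(*args, **kwargs)
                cls._cache[key] = instance
                return instance


class Connection(metaclass=Multiton):
    # Hypothetical resource class used only for the demonstration.
    def __init__(self, host, port=5432):
        self.host = host
        self.port = port


a = Connection("db.example", port=1)
b = Connection("db.example", port=1)
c = Connection("db.example", port=2)
```

In this sketch the key is built from the arguments exactly as passed, so `Connection("db.example", 1)` and `Connection("db.example", port=1)` would be cached separately.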

I would like to discuss these patterns over the next couple of weeks in a meeting.

@sjperkins
Member Author

It works for submitting parallactic angle computations to "spawn" process pools.

@JSKenyon
Collaborator

Thanks @sjperkins for taking the time to demonstrate this pattern. I can confirm that this works, with the caveat that in the distributed case it is still necessary to set dask.config.set({"distributed.worker.daemon": False}). Dask does seem to detect more leaked semaphores in this case (if the code is interrupted) but seems to clean them up without issue.

Functions like dask.blockwise rely on duck typing to make decisions
about the objects that they are inserting into the graph. This is
undesirable in the case of LazyProxies, as they generally represent
heavy resource objects like Files, Sockets or Database Connections.

This commit ensures that LazyObject creation does not take place
within a call context that would unnecessarily incur LazyObject
creation, like dask.array.blockwise.
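
The problem described in this commit message can be reproduced without dask. The sketch below is not the commit's implementation: `NaiveProxy`, `GuardedProxy`, `PROBING_CALLERS` and `graph_builder` are hypothetical names, and walking the call stack is only one conceivable way to detect a probing caller. It shows that a `hasattr` check on a naive proxy creates the resource, while a guarded proxy refuses to when called from a listed function.

```python
import sys

created = []


def make_resource():
    # Stand-in for an expensive resource such as a file or connection.
    created.append("resource")
    return object()


class NaiveProxy:
    def __init__(self, fn):
        self._fn = fn
        self._instance = None

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        if self._instance is None:
            self._instance = self._fn()
        return getattr(self._instance, name)


# Hypothetical names of functions that merely probe their arguments.
PROBING_CALLERS = {"graph_builder"}


class GuardedProxy(NaiveProxy):
    def __getattr__(self, name):
        # Walk up the call stack; refuse to create the instance if any
        # calling function is known to only duck-type its arguments.
        frame = sys._getframe(1)
        while frame is not None:
            if frame.f_code.co_name in PROBING_CALLERS:
                raise AttributeError(name)
            frame = frame.f_back
        return super().__getattr__(name)


def graph_builder(arg):
    # Duck-typing probe, similar in spirit to checks made while
    # building a graph.
    return hasattr(arg, "dtype")


naive_result = graph_builder(NaiveProxy(make_resource))
created_after_naive = len(created)

guarded_result = graph_builder(GuardedProxy(make_resource))
created_after_guarded = len(created)
```

`sys._getframe` is a CPython implementation detail, and a production version would probably bound the number of frames it inspects.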
@sjperkins sjperkins merged commit 6ac51e4 into master Feb 3, 2022
@sjperkins sjperkins deleted the lazy-proxy-multiton branch February 3, 2022 07:49

Successfully merging this pull request may close these issues.

Using casacore measures for computing parallactic angles acquires/drops the GIL excessively
2 participants