
Panel adds considerable overhead to longer running async tasks #4239

Closed
MarcSkovMadsen opened this issue Dec 26, 2022 · 2 comments
Labels
type: bug Something isn't correct or isn't working
Milestone

Comments

@MarcSkovMadsen
Collaborator

MarcSkovMadsen commented Dec 26, 2022

I'm trying to write a how-to guide for Dask. The Dask efficiency docs state that submitting a single task to Dask should add roughly one millisecond of overhead.

From my experiments, this does not seem to hold when Dask is used with Panel. The minimum overhead seems to be around 7 ms. And what is worse, as the task duration grows, so does the overhead. And it's not insignificant.

cluster.py

# cluster.py
from dask.distributed import LocalCluster

DASK_SCHEDULER_PORT = 64719
DASK_SCHEDULER_ADDRESS = f"tcp://127.0.0.1:{DASK_SCHEDULER_PORT}"

if __name__ == '__main__':
    cluster = LocalCluster(scheduler_port=DASK_SCHEDULER_PORT, n_workers=4)
    print(cluster.scheduler_address)
    input()

tasks.py

def _fib(n):
    if n < 2:
        return n
    else:
        return _fib(n - 1) + _fib(n - 2)

def func(value):
    return _fib(value)

value = 36
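As a side note on the benchmark itself (my addition, not part of the original report): the local durations grow so quickly because naive recursive Fibonacci makes an exponentially growing number of calls, multiplying the work by roughly the golden ratio (~1.618) for each increment of n. A small sketch to verify that:

```python
def fib_calls(n):
    """Count how many times _fib(n) would be invoked, including itself."""
    if n < 2:
        return 1
    return 1 + fib_calls(n - 1) + fib_calls(n - 2)

# Each successive ratio approaches the golden ratio ~1.618,
# i.e. every increment of n multiplies the task duration by ~1.6.
ratios = [fib_calls(n + 1) / fib_calls(n) for n in range(20, 25)]
print(ratios)
```

So between n = 30 and n = 40 the task duration spans roughly two orders of magnitude, which is exactly the range where the overhead growth shows up.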

app.py

import asyncio
import time
import panel as pn
from dask.distributed import Client
from cluster import DASK_SCHEDULER_ADDRESS
import tasks
import hvplot.pandas
import pandas as pd

pn.extension("terminal", sizing_mode="stretch_width", template="fast")

results = []
N_MIN = 0
N_MAX = 40

submit_button = pn.widgets.Button(name="Submit")
overhead_plot = pn.pane.HoloViews(height=400)
duration_plot = pn.pane.HoloViews(height=400)

def update_results(n, local_duration, cluster_duration):
    results.append(dict(n=n, local_duration=local_duration, cluster_duration=cluster_duration, overhead=cluster_duration-local_duration))

    df = pd.DataFrame(results).groupby("n").mean()
    
    overhead_plot.object = df.hvplot.line(y="overhead")
    duration_plot.object = df.hvplot.line(y=["local_duration", "cluster_duration"], ylabel="duration", xlim=(N_MIN, N_MAX))


async def get_client():
    if "dask-client" not in pn.state.cache:
        pn.state.cache["dask-client"] = await Client(
            DASK_SCHEDULER_ADDRESS, asynchronous=True
        )
    return pn.state.cache["dask-client"]

@pn.depends(submit_button, watch=True)
async def _click(_):
    submit_button.disabled = True
    results.clear()  # reset the module-level list; assignment would only create a local
    
    for n in range(N_MIN, N_MAX):
        start = time.time()
        tasks.func(n)
        local_duration = time.time()-start

        client = await get_client()
        start = time.time()
        await client.submit(tasks.func, n)
        cluster_duration = time.time()-start
        
        update_results(n, local_duration, cluster_duration)
        
    submit_button.disabled = False

component = pn.Column(
    submit_button, overhead_plot, duration_plot
).servable()

I would have expected the overhead to stay constant, but it looks like it does not.

[screenshot: overhead and duration plots]

Maybe this is just a Dask thing. But I've not been able to find any statements that indicate this.

I've asked the question in the Dask Discourse forum. See https://dask.discourse.group/t/why-does-submit-overhead-increase-exponentially-for-fibonacci-example/1420.

@MarcSkovMadsen MarcSkovMadsen added the type: bug Something isn't correct or isn't working label Dec 26, 2022
@MarcSkovMadsen MarcSkovMadsen added this to the v0.14.3 milestone Dec 26, 2022
@MarcSkovMadsen
Collaborator Author

If I change func to a simple sleep function, the overhead does not seem to grow.

import time

def func(value):
    return time.sleep(float(value) / 10)

[screenshot: overhead plot with the sleep function]

@MarcSkovMadsen
Collaborator Author

It seems that Dask needs to be configured to handle long-running Python functions that do not release the GIL.

https://dask.discourse.group/t/why-does-submit-overhead-increase-exponentially-for-fibonacci-example/1420/5?u=marcskovmadsen
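For reference, a sketch of what such a configuration might look like (my assumption based on the linked thread, not a verified fix): run process-based workers with a single thread each, so a GIL-holding task cannot starve a worker's communication loop.

```python
# cluster.py variant -- a sketch, assuming the remedy is process-based
# workers with one thread each. All parameters are standard
# dask.distributed.LocalCluster options.
from dask.distributed import LocalCluster

if __name__ == "__main__":
    cluster = LocalCluster(
        scheduler_port=64719,
        n_workers=4,
        processes=True,        # one worker per process, not per thread
        threads_per_worker=1,  # no GIL contention inside a worker
    )
    print(cluster.scheduler_address)
    input()
```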
