Prefect Memory Leak #5327

Closed
BitTheByte opened this issue Jan 17, 2022 · 1 comment
Comments

@BitTheByte
Contributor

BitTheByte commented Jan 17, 2022

Description

Running Prefect with Dask on Kubernetes results in huge memory usage. As reported by Dask, the list object is only ~47 MB; however, when it is submitted to the workers it grows by roughly 300x, resulting in a 13.7 GB object. Needless to say, this is expensive and doesn't scale well. Please note that I tried the methods mentioned in #3966 and dask/distributed#4091.
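A quick back-of-the-envelope check of the figures above may help frame the problem. This is only a sketch: it assumes decimal units (1 MB = 10^6 bytes) and takes the element count from the `list(range(1, 9999999))` call in the reproduction; the variable names are illustrative.

```python
# Figures as reported in the description; decimal units are an assumption.
list_size_bytes = 47e6      # ~47 MB reported by Dask for the list
worker_size_bytes = 13.7e9  # ~13.7 GB observed after submission to workers

# Number of elements produced by list(range(1, 9999999)) in the reproduction.
n_elements = len(range(1, 9999999))

growth_factor = worker_size_bytes / list_size_bytes
bytes_per_element = worker_size_bytes / n_elements

print(f"elements: {n_elements}")
print(f"growth factor: ~{growth_factor:.0f}x")
print(f"memory per mapped element: ~{bytes_per_element:.0f} bytes")
```

Under these assumptions the growth is about 291x, or roughly 1.4 KB per list element. That figure by itself does not identify the cause, but it gives a per-item number to compare against when profiling.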

Expected Behavior

The Dask scheduler suggested using the scatter method. I'm not really familiar with how Prefect handles the DAG, so I don't think it's just a matter of calling scatter before submitting at https://github.com/PrefectHQ/prefect/blob/master/src/prefect/executors/dask.py#L421
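For context, the scatter pattern that Dask refers to looks roughly like the following when `dask.distributed` is used directly. This is a minimal sketch outside Prefect and is not a description of what Prefect's `DaskExecutor` does; `process_chunk` and `run_with_scatter` are hypothetical names, and `client` is assumed to be a `dask.distributed.Client` connected to the cluster.

```python
def process_chunk(chunk):
    """Hypothetical per-chunk work; stands in for the real task body."""
    return [x * 2 for x in chunk]


def run_with_scatter(client, data):
    """Ship `data` to the cluster once, then submit work that references it.

    `client` is assumed to be a dask.distributed.Client.
    """
    # Wrapping `data` in a list makes scatter treat it as a single object;
    # a bare list would be scattered element by element.
    [remote_data] = client.scatter([data])
    # The submitted task now carries a small future instead of the payload.
    future = client.submit(process_chunk, remote_data)
    return future.result()
```

With a live cluster this would be called as `run_with_scatter(Client(address), data)`, where `address` is the scheduler address. Whether the same approach can be applied inside a mapped Prefect task is exactly the open question here.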

Reproduction

from prefect import Flow, task

@task
def mytask():
    return list(range(1, 9999999))

@task
def othertask(data):
    import time
    time.sleep(0.2)
    return data ** data

with Flow(name="memory-leak") as flow:
    a = mytask()
    b = othertask.map(a)

Environment

{
  "config_overrides": {},
  "env_vars": [],
  "system_information": {
    "platform": "Windows-10-10.0.19043-SP0",
    "prefect_backend": "server",
    "prefect_version": "0.15.12",
    "python_version": "3.9.6"
  }
}
@zanieb
Contributor

zanieb commented Jan 17, 2022

Hi, sorry to hear you're having issues with this. I suspect this has less to do with the size of your input object and more to do with the overhead of creating a million orchestrated tasks. This is on our roadmap to investigate, but it's likely to require some low-level optimizations. Can you share more information about what objects are consuming the memory?
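One possible way to collect that information is Python's built-in `tracemalloc` module. The sketch below is an assumption about how one might approach it, not a Prefect or Dask feature; `top_allocations` is a hypothetical helper, and the lambda is a stand-in workload.

```python
import tracemalloc


def top_allocations(func, limit=5):
    """Run `func` and return (result, largest allocation sites by size)."""
    tracemalloc.start()
    try:
        result = func()
        # Take the snapshot while `result` is still alive so it is counted.
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Statistics grouped by source line, sorted from largest to smallest.
    return result, snapshot.statistics("lineno")[:limit]


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with the code under suspicion.
    _, stats = top_allocations(lambda: list(range(100_000)))
    for stat in stats:
        print(stat)
```

Note that `tracemalloc` only sees allocations in the process it runs in, so on a Dask cluster it would need to run inside the scheduler or worker process that shows the growth.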


2 participants