Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eager server-side existence check of Blocks is incompatible with flow decorators and pytest #16641

Open
mikeoconnor0308 opened this issue Jan 8, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@mikeoconnor0308
Copy link

mikeoconnor0308 commented Jan 8, 2025

Bug summary

When defining a flow, it's common to specify a custom result store. However, in Prefect V3 (and not V2), there is a check that the result store block has been saved that is executed in the decorator: code.

This makes it difficult to work with, as flow decorators are executed at import time, and there are several situations where one may not be in a good state to communicate with a Prefect server.

A simple and blocking example of such a state is when running tests with pytest. The following code would work fine in V2:

from prefect import flow
from prefect.results import LocalFileSystem
from prefect.testing.utilities import prefect_test_harness

# The result store has not been saved, and that cannot be done at import time as we're not yet connected to a prefect server
@flow(result_storage=LocalFileSystem())
def my_flow():
    return "hello"

def test_my_flow():
    with prefect_test_harness():
        result = my_flow()
        assert result == "hello"

If you are not logged into prefect, running pytest will fail, with the following stack trace:

_______________________________________________________________________________________________________________________________________ ERROR collecting test_block_import.py _______________________________________________________________________________________________________________________________________
test_block_import.py:7: in <module>
    @flow(result_storage=LocalFileSystem())
/opt/miniconda3/envs/prefect-v3/lib/python3.13/site-packages/prefect/flows.py:1604: in __call__
    return Flow(
/opt/miniconda3/envs/prefect-v3/lib/python3.13/site-packages/prefect/flows.py:348: in __init__
    raise TypeError(
E   TypeError: Result storage configuration must be persisted server-side. Please call `.save()` on your block before passing it in.
============================================================================================================================================== short test summary info ==============================================================================================================================================
ERROR test_block_import.py - TypeError: Result storage configuration must be persisted server-side. Please call `.save()` on your block before passing it in.

This is problematic, as there is no simple way (that I'm aware of) to create flows in decorators and refer to file systems that are not yet saved.

In my opinion, the existence / save check should be run at flow evaluation / deployment time. Or, it should be documented more clearly that one should refer to a storage block by name in decorators and ensure they are saved.

Version info

Version:             3.1.11
API version:         0.8.4
Python version:      3.13.1
Git commit:          e448bd34
Built:               Thu, Jan 2, 2025 1:11 PM
OS/Arch:             darwin/arm64
Profile:             default
Server type:         unconfigured
Pydantic version:    2.10.4

Additional context

No response

@mikeoconnor0308 mikeoconnor0308 added the bug Something isn't working label Jan 8, 2025
@zzstoatzz
Copy link
Collaborator

zzstoatzz commented Jan 8, 2025

hi @mikeoconnor0308 - thanks for the issue!

this makes sense to me, although we already should have an escape hatch for this. consider the following example:

from prefect import flow

@flow(result_storage="local-file-system/test")
def my_flow():
    print("Hello, world!")

print("end of import time")
if __name__ == "__main__":
    print("RUNNING FLOW")
    my_flow()
» python repros/16641.py
end of import time
RUNNING FLOW

...RESOLVING RESULT STORAGE...

11:41:23.171 | INFO    | Flow run 'delectable-mastodon' - Beginning flow run 'delectable-mastodon' for flow 'my-flow'
Hello, world!
11:41:23.299 | INFO    | Flow run 'delectable-mastodon' - Finished in state Completed()

this way, we can defer the resolution of the result storage until the flow runs. would this work for your use case?

@mikeoconnor0308
Copy link
Author

Thanks - that is near enough what we ended up doing, but it took us a while to figure it out.

I think the interface or documentation should be more clear or express more clearly that in non-trivial examples one should refer to result stores by name.

I'd be happy to contribute a PR with a docs change.

@desertaxle
Copy link
Member

@mikeoconnor0308 A docs PR would be awesome! I think it makes sense to add a callout to this page: https://docs.prefect.io/v3/develop/results#configuring-result-persistence and potentially update the docstring for the @flow and @task decorators, but if you see any other places where improved documentation would be beneficial, have at it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants