Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat[next]: Add memory and disk-based caching to more workflow steps #1690

Merged
merged 67 commits into from
Nov 7, 2024

Conversation

SF-N
Copy link
Contributor

@SF-N SF-N commented Oct 15, 2024

Add memory and disk-based caching to other workflow steps and, therefore, removing unnecessary overhead of Program calls and significantly improving time to first computed value.

Changes:

  • setting cached = True for func_to_past_factory
  • wrapping the GTFN code generation into a CachedStep (using Diskcache) which is activated when setting otf_workflow__cached_translation=True, similar as in PR#1474 (without CachedStep)
  • Fixing hash function of ProgramDefinition
  • New tests for added functionality

This leads to a runtime decrease of about 25% for PMAP-G in the advect-uniform testcase (5 hours) after caches are populated.

TODOs:

  • improving hash functions of fingerprint_stage

@SF-N SF-N requested a review from egparedes November 4, 2024 11:59
@@ -73,6 +73,9 @@ def env_flag_to_bool(name: str, default: bool) -> bool:
)


GTFN_SOURCE_CACHE_DIR: str = os.environ.get(f"{_PREFIX}_GTFN_SOURCE_CACHE_DIR", "gtfn_cache")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to avoid exposing a config option that is special to a certain backend. Is it necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We introduced this since we wanted to avoid hardcoding it. In almost all cases, this should not change but e.g. in the tests, where the cache should be deleted afterwards, specifying a different directory would make sense.
Do you have a suggestion on how to make this configurable which you would prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DaCe has also a cache directory, maybe we could use a common config option for compiled backends?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds good to me. Do you agree @havogt?
And do you have a name in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally proposed this, but after thinking about it again, and together with my other suggestion about how to deal with this in tests, I think it's not needed at least for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed it and hardcoded it in gtfn.py.

Copy link
Contributor

@egparedes egparedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final comments.

src/gt4py/next/program_processors/runners/gtfn.py Outdated Show resolved Hide resolved
src/gt4py/next/ffront/stages.py Show resolved Hide resolved
@@ -73,6 +73,9 @@ def env_flag_to_bool(name: str, default: bool) -> bool:
)


GTFN_SOURCE_CACHE_DIR: str = os.environ.get(f"{_PREFIX}_GTFN_SOURCE_CACHE_DIR", "gtfn_cache")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally proposed this, but after thinking about it again, and together with my other suggestion about how to deal with this in tests, I think it's not needed at least for this PR.

@@ -148,4 +153,3 @@ def add_foast_located_node_to_fingerprint(
) -> None:
add_content_to_fingerprint(obj.location, hasher)
add_content_to_fingerprint(str(obj), hasher)
add_content_to_fingerprint(str(obj), hasher)
Copy link
Contributor Author

@SF-N SF-N Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, why this was done twice, does removing it make sense @egparedes / @DropD ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a bug to me and I agree with your change.

@egparedes
Copy link
Contributor

cscs-ci run

Copy link
Contributor

@egparedes egparedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed the updated requirements and it looks ready to me. Let's just wait for @DropD lightweight review and we are good to merge from my point of view.

@egparedes egparedes requested a review from DropD November 5, 2024 17:26
@egparedes egparedes changed the title feat[next]: Remove unnecessary overhead of Program calls feat[next]: Add memory and disk-based caching to more workflow steps Nov 6, 2024
@SF-N
Copy link
Contributor Author

SF-N commented Nov 7, 2024

Thanks for highlighting the unused imports, I removed all of them. Applying ruff to the tests as well would help to avoid this in the future.

@SF-N SF-N merged commit 1b9eb5c into GridTools:main Nov 7, 2024
51 checks passed
tehrengruber added a commit that referenced this pull request Nov 14, 2024
#1690 included a change to make the hash of an `itir.FencilDefinition`
stable across multiple runs. This PR adopts the same change to an
`itir.Program`,
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants