feature[next]: toolchain configuration interfaces #1438

DropD · 2024-01-31T16:06:54Z

Description

Simplify workflow configuration / creation by providing composable factories (in this case using factoryboy).

I recommend starting your assessment by comparing how the various run_gtfn_XX backends are created in src/gt4py/next/program_processors/runners/gtfn.py. Then look at the factories and optionally think about providing the same or better usage with a different library or pattern.

Requirements

All fixes and/or new features come with corresponding tests.
Important design decisions have been documented in the appropriate ADR inside the docs/development/ADRs/ folder.

If this PR contains code authored by new contributors please make sure:

All the authors are covered by a valid contributor assignment agreement provided to ETH Zurich and signed by the employer if needed.
The PR contains an updated version of the AUTHORS.md file adding the names of all the new contributors.

edopao · 2024-02-01T08:05:48Z

src/gt4py/next/program_processors/runners/gtfn.py

-    cache_strategy=cache.Strategy.SESSION, builder_factory=compiledb.CompiledbFactory()
-)
+    translation = factory.SubFactory(
+        gtfn_module.GTFNTranslationStepFactory, device_type=factory.SelfAttribute("..device_type")


What is the meaning of 2 dots ..device_type?

In this context it means "look up the attribute on the parent of the SubFactory": https://factoryboy.readthedocs.io/en/stable/reference.html#parents

edopao · 2024-02-01T08:08:37Z

src/gt4py/next/program_processors/runners/gtfn.py

-    ),
-    allocator=next_allocators.StandardCPUFieldBufferAllocator(),
+run_gtfn_imperative = GTFNBackendFactory(
+    otf_workflow__translation__use_imperative_backend=True, **__user_defaults


Is there a way to shorten these names? (otf_workflow__translation__use_imperative_backend)

It's not a name, it's a path equivalent to otf_workflow.translation.use_imperative_backend. However, that would be invalid python syntax in a keyword arg, so the authors of factoryboy went with __ instead.

The answer is still yes though, we can shorten them in two ways:

make the individual attribute names shorter

provide a parameter (under class Params:) in the parent factory with a shorter name and pass that on to the subfactory. (The full path attribute would still be available though.)

edopao · 2024-02-01T08:09:35Z

src/gt4py/next/program_processors/runners/gtfn.py

 )

+run_gtfn_gpu = GTFNBackendFactory(**__user_defaults | {"gpu": True})


I like this syntax.
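For readers unfamiliar with the syntax: the dict union operator `|` (Python 3.9+) is evaluated before the `**` unpacking, so the defaults are merged with the override first and then passed as keyword arguments. A small sketch, where `make_backend` and the option names are hypothetical stand-ins for the factory call:

```python
def make_backend(**options):
    """Stand-in for a factory call: returns the options it received."""
    return options


user_defaults = {"cached": True, "gpu": False}

# Equivalent to make_backend(**(user_defaults | {"gpu": True})).
# The right-hand dict wins on duplicate keys; user_defaults is not modified.
run_cpu = make_backend(**user_defaults)
run_gpu = make_backend(**user_defaults | {"gpu": True})
```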

edopao · 2024-02-01T08:13:26Z

src/gt4py/next/program_processors/runners/gtfn.py

+        otf_workflow = factory.SubFactory(
+            GTFNCompileWorkflowFactory, device_type=factory.SelfAttribute("..device_type")
+        )
+        name = factory.LazyAttribute(lambda o: f"run_gtfn_{o.device_name}{o.cached_name}")


Maybe we do not want to construct a name, do we? We should just use the object created by the factory.

The factory has to give a name to the object it creates. In practice the name attribute of the executor is mainly used in pytest to see at a glance which test was parametrized with what type of gtfn backend.

havogt

First round of comments. Didn't look at the factories yet.

havogt · 2024-02-14T10:54:02Z

docs/development/ADRs/0017-Toolchain-Configuration.md

+
+### Limit the times when configuration can change
+
+By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed). Of course there are ways around it but they should be considered unsupported as they are difficult to make reliable (consider monkey patching as an example).


You can always do gt4py.next.config.DEBUG=True at any point? I don't understand the paragraph...

I found the following comment in the test

Because monkey patching the config variables is not enough, as
other variables are computed at import time based on them.

which explains it partly. Basically, whether changing is limited depends on how the variable is used. That's potentially dangerous.

Also a question: do you consider changing module level variables monkey patching?

That depends a bit. If I am changing something that is assumed to change throughout the code and I want it to stay changed, probably not. If, like in this case changing it might leave things in an inconsistent state (basically undefined behaviour) and I want the change to only happen locally and revert afterwards, I would describe it as monkey patching.
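The inconsistency can be reproduced in miniature. The sketch below uses a namespace object as a hypothetical stand-in for a config module; `BUILD_TYPE` is an invented derived setting, computed once from `DEBUG` the way an import-time computation would be:

```python
import types

# Hypothetical stand-in for a config module.
config = types.SimpleNamespace(DEBUG=False)

# Derived once, at "import time", from the value DEBUG had then.
config.BUILD_TYPE = "debug" if config.DEBUG else "release"

# A later change to DEBUG does not recompute the derived value,
# so the two settings now disagree.
config.DEBUG = True
```

After the last line `DEBUG` is true while `BUILD_TYPE` still says `"release"`, which is the kind of inconsistent state the comment above refers to.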

I reworded this to be more clear (hopefully)

havogt · 2024-02-14T10:55:02Z

docs/development/ADRs/0017-Toolchain-Configuration.md

+
+By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed). Of course there are ways around it but they should be considered unsupported as they are difficult to make reliable (consider monkey patching as an example).
+
+### Environment variables are the primary end user interface


Is it needed? See my comment above. Some options might be changed programmatically, and if it makes sense, why not?

All of these are choices, nothing is needed. The only choice which is fairly well justified is to not try to warn users of "rogue" toolchains, because most of the cycle went into experimentation for that.

I have hopefully made it more clear that this is not a requirement, but a decision that was made for now. I have outlined a sketch of what the alternative could look like (in the alternatives section).
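As an illustration of the environment-variable-first approach, here is a minimal sketch of reading a boolean option with a common prefix and a fallback default. The helper `env_flag`, the option names, and the accepted truthy spellings are assumptions made for this example, not the implementation in `gt4py.next.config`:

```python
import os


def env_flag(name: str, default: bool, prefix: str = "GT4PY_") -> bool:
    """Read a boolean option from the environment, falling back to a default."""
    raw = os.environ.get(prefix + name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Simulate an end user exporting one variable and leaving another unset.
os.environ["GT4PY_EXAMPLE_FLAG"] = "1"
os.environ.pop("GT4PY_EXAMPLE_UNSET_FLAG", None)

EXAMPLE_FLAG = env_flag("EXAMPLE_FLAG", default=False)
EXAMPLE_UNSET_FLAG = env_flag("EXAMPLE_UNSET_FLAG", default=True)
```

Evaluating such calls once at module level is what ties the configuration to the state of the environment at import time.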

havogt · 2024-02-14T10:56:26Z

docs/development/ADRs/0017-Toolchain-Configuration.md

+---
+tags: [backend, otf, workflows, toolchain]
+---
+


Maybe state somewhere that this is a minimal solution which brings great usability improvements, but could be replaced with a more involved technique later.

tehrengruber · 2024-02-14T12:43:38Z

src/gt4py/next/config.py

+import tempfile
+
+
+class BuildCacheLifetime(enum.Enum):


I don't like that these enums moved up here, but I don't have another or better proposal.

Perhaps, if they start accumulating they should get a place of their own.

tehrengruber · 2024-02-14T12:55:00Z

docs/development/ADRs/0017-Toolchain-Configuration.md

+- an external name used to load from environment variables (possibly with a common prefix)
+- a fallback default value in case no environment variable is defined
+
+Any other toolchain option is considered an implementation detail.


Suggested change

Any other toolchain option is considered an implementation detail.

I don't agree with this sentence and don't see the need for this requirement. This one never needs to appear in the global config module nor an environment variable, but is explicitly meant for a user to change (and thus not an implementation detail).

It is not a requirement, simply a description of the status quo. The options not exposed to the end user are implementation details from the point of view of the end user. The end user is someone who runs code that uses GT4Py internally but may not be aware of that.

But the user can change the option I linked (see here for an actual case), so in that sense it is not a proper description of the status quo, in my opinion.

Since your example is in code, it can by definition not have been done by an end user. By the definition earlier in the text.

tehrengruber · 2024-02-14T13:04:11Z

docs/development/ADRs/0017-Toolchain-Configuration.md

+
+### Limit the times when configuration can change
+
+By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed). Of course there are ways around it but they should be considered unsupported as they are difficult to make reliable (consider monkey patching as an example).


Suggested change

By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed). Of course there are ways around it but they should be considered unsupported as they are difficult to make reliable (consider monkey patching as an example).

By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed).

I don't agree with this sentence. This doesn't ensure anything. I do agree with the unsupported phrasing later on though.

I guess the suggestion got a bit jumbled up then?

It does indeed not ensure anything, except that changing configuration during a python interpreter run might leave the configuration in an inconsistent state.

Wasn't meant to be a suggestion, but a quote.

I have changed the wording to be more careful and precise.

tehrengruber · 2024-02-14T13:16:34Z

docs/development/ADRs/0017-Toolchain-Configuration.md

@@ -0,0 +1,101 @@
+---


I find a good part of this document not useful to read, but also I don't care too much.

The usefulness of this document depends heavily on whether one is thinking about changing any of this design without re-covering the same ground.

Is the lack of interest and perceived lack of usefulness because you have no desire to do so? Or do you think it is useless under the premise of wanting to change the design? Or does the writing style distract from the actual information?

No, it's not a lack of interest. I just don't feel addressed, neither as a user trying to understand how things work (not really the purpose of an ADR, sure), nor as someone trying to understand design decisions (e.g. why would I care about the definition of the term toolchain, to mention just one thing).

The definition is provided so that it is clear what the design decisions have been taken for. It is meant to be skipped but then double checked before taking misguided action based on something one has read in the document. Same as all other definitions.

If we had clearly defined terms for components of GT4Py this would not be necessary as much.

tehrengruber · 2024-02-14T13:21:15Z

src/gt4py/next/program_processors/runners/gtfn.py

-    ),
-    allocator=next_allocators.StandardCPUFieldBufferAllocator(),
+run_gtfn_with_temporaries = GTFNBackendFactory(
+    name_postfix="_with_temporaries",


Let's make this a trait.

Either the linked line slipped somehow or there is a misunderstanding about what factoryboy traits are.

Assuming the former:
You might be talking about switching lift_mode and temporary_extraction_heuristics based on a trait?

Assuming the latter, you would be talking about something like:

class GTFNBackendFactory(factory.Factory):
    ...
    class Params:
        name_postfix = factory.Trait( ... and now what? )
    ...

# usage
GTFNBackendFactory(name_postfix=True)

The effect is to negate the purpose of the name_postfix parameter, which is to give custom name postfixes. In either case, this would be an easy post-project cooldown PR. The scope of this PR is now fixed.

Not sure why you mention the name_postfix parameter. I want something like this:

run_gtfn_with_temporaries = GTFNBackendFactory(use_temporaries=True)

because for many purposes the lift_mode and heuristics are low-level details I don't care about as a library user.

The github view makes it look like you are talking specifically about the line containing name_postfix. As I thought, this turns out to be accidental.

The need to change "lift_mode" and "heuristics" in sync arose only as a result of merging with main after the end of the project. Therefore it remains out of scope for this PR.

tehrengruber · 2024-02-14T13:26:17Z

docs/development/ADRs/0017-Toolchain-Configuration.md

+- such a toolchain may have been created but not used to do work for the end user
+- without proper tracking of where configuration comes from, false positives as well as false negatives could not be eliminated.
+
+Implementing tracking was briefly considered but looked like it would be too heavy weight to justify the maintenance burden.


This paragraph doesn't really sum up the experiment we did. Also where is the

class MyStep:
    config_flag = True

os.environ["GT4PY_MYSTEP_CONFIG_FLAG"] = "True"
my_backend = MyStepFactory(mystep_config_flag=False)

part or the sketch of the merging algorithm?

In general there should perhaps be code examples, as I can not put it more obviously into text form.

Also, adding a sketch of the merging algorithm would distract from the fact that the problem can not be solved by another merging algorithm either (because what is needed is trackability).

The merging algorithm I proposed has traceability:

env_defined_vars: dict[str, Any]
config_file_defined_vars: dict[str, Any]
code_defined_vars: dict[str, Any]

var_sources = {
    "env": env_defined_vars,
    "config": config_file_defined_vars,
    "in-code": code_defined_vars,
}

effects_per_var_source: dict[str, dict[str, list[str]]] = {}
for var_source, vars in var_sources.items():
    effects_per_var_source[var_source] = {}
    for var, value in vars.items():
        res = factory(**{var: value})
        # compute what attributes are different in res from default, i.e. factory()
        effect = ...
        effects_per_var_source[var_source][var] = effect

for (source_a, effects_a), (source_b, effects_b) in itertools.combinations(
    effects_per_var_source.items(), 2
):
    for name_outer, effect_outer in effects_a.items():
        for name_inner, effect_inner in effects_b.items():
            if intersection(effect_outer, effect_inner):
                print(f"{source_a}:{name_outer} conflicts with {source_b}:{name_inner}")

It prints both sources of the conflict and can straightforwardly be extended to give even more (e.g. line numbers if the sources are from config files). But my main point is I don't see this paragraph to reflect what we concluded the experiment with:

Algorithmically, warnings are generally possible, but not with FactoryBoy for Defaults. Extensions to multiple SOT are generally also possible and in the simple version also feasible, but whether that is still satisfactory is questionable (because other projects also do not do that at all).

Ah, I will add another paragraph about black box analysis of the effects of different configuration sources and clarify that this one is about tracking where which value comes from.

However, I don't understand why you would mention factory-boy as blocking all warning implementations. The black-box analysis is agnostic, it can always only easily track to the highest level of toolchain building logic. As soon as the user configuration does not apply directly anymore (but only indirectly via toolchain building logic), it would involve replicating some of that logic, or tracking (as in annotating any configuration with its source and keeping and updating that annotation at every level of logic).

factory-boy does not implement proper tracking for us but nothing else would, either. I will mention that.
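To make the black-box comparison discussed in this thread concrete, here is a small runnable sketch in plain Python. The `build` function is a hypothetical stand-in for a toolchain factory, and the option names are invented. It only compares the effect of each option against the all-defaults result, so it illustrates the black-box analysis and not value-level tracking of where a setting came from:

```python
import itertools
from typing import Any, Callable


def build(**options: Any) -> dict[str, Any]:
    """Hypothetical toolchain factory: 'debug' indirectly decides 'build_type'."""
    debug = options.get("debug", False)
    return {
        "build_type": options.get("build_type", "debug" if debug else "release"),
        "cache": options.get("cache", "session"),
    }


def effects(factory: Callable[..., dict], options: dict[str, Any]) -> set[str]:
    """Names of result attributes that differ from the all-defaults result."""
    baseline = factory()
    result = factory(**options)
    return {key for key in result if result[key] != baseline[key]}


def find_conflicts(factory: Callable[..., dict], sources: dict[str, dict]) -> list[str]:
    """Report options from different sources that affect the same attribute."""
    per_source = {
        source: {name: effects(factory, {name: value}) for name, value in opts.items()}
        for source, opts in sources.items()
    }
    conflicts = []
    for source_a, source_b in itertools.combinations(per_source, 2):
        for name_a, effect_a in per_source[source_a].items():
            for name_b, effect_b in per_source[source_b].items():
                if effect_a & effect_b:
                    conflicts.append(
                        f"{source_a}:{name_a} conflicts with {source_b}:{name_b}"
                    )
    return conflicts


sources = {
    "env": {"debug": True},
    "in-code": {"build_type": "relwithdebinfo", "cache": "persistent"},
}
conflicts = find_conflicts(build, sources)
```

Here the environment's `debug` and the in-code `build_type` both change the resulting `build_type`, so they are reported as conflicting, while `cache` is not. An option whose effect only shows up in combination with others would go unnoticed by this per-option comparison.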

factoryboy workflow config prototype

8b54f4f

DropD requested review from tehrengruber, havogt, egparedes and edopao January 31, 2024 16:06

edopao reviewed Feb 1, 2024

Rico Häuselmann added 2 commits February 1, 2024 10:20

cosmetic changes and factory-boy dependency

7c9e306

Merge branch 'main' into c19-backend-config

cfbef72

DropD force-pushed the c19-backend-config branch from d34e6c6 to cfbef72 Compare February 9, 2024 13:02

explicit globals for config, expose build cache and type

6c78404

havogt reviewed Feb 12, 2024

Rico Häuselmann and others added 7 commits February 13, 2024 14:08

add design decisions in an ADR

de2ef32

add backend factory tests

78a6426

Apply suggestions from code review

e18a7b1

Co-authored-by: Hannes Vogt <hannes@havogt.de>

fix suggestions

3cb9cca

Merge branch 'main' into c19-backend-config

b824abf

document missing (infeasible) tests.

dc0346f

add sphinx docstrings to config variables

0d7c30d

DropD requested review from edopao and havogt February 13, 2024 14:30

fix gtfn formatter

880c8b5

havogt reviewed Feb 14, 2024

tehrengruber requested changes Feb 14, 2024

improvements based on review comments

953c5a0

havogt mentioned this pull request Feb 15, 2024

fix[next]: Default to Release in CMake #1420

Closed

Rico Häuselmann added 4 commits February 15, 2024 10:24

ADR clarifications

3cdb6ea

fix gtfn factory and tests

2701234

minor fixes

d71ae30

fox factory and tests

b4c1b1a

DropD changed the title ~~feature[next]: factoryboy workflow config prototype~~ feature[next]: toolchain configuration interfaces Feb 16, 2024

DropD requested review from tehrengruber and havogt February 19, 2024 09:04

tehrengruber approved these changes Feb 19, 2024

DropD merged commit e631c7f into GridTools:main Feb 19, 2024
51 checks passed

DropD deleted the c19-backend-config branch February 19, 2024 09:49

tehrengruber mentioned this pull request Feb 29, 2024

feature[next]: Make gtfn cache path configurable #1406

Closed
