Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature[next]: toolchain configuration interfaces #1438

Merged
merged 17 commits into from
Feb 19, 2024

Conversation

DropD
Copy link
Contributor

@DropD DropD commented Jan 31, 2024

Description

Simplify workflow configuration / creation by providing composable factories (in this case using factoryboy).

I recomend starting your assessment by comparing how the various run_gtfn_XX backends are created in src/gt4py/next/program_processors/runners/gtfn.py. Then look at the factories and optionally think about providing the same or better usage with a different library or pattern.

Requirements

  • All fixes and/or new features come with corresponding tests.
  • Important design decisions have been documented in the approriate ADR inside the docs/development/ADRs/ folder.

If this PR contains code authored by new contributors please make sure:

  • All the authors are covered by a valid contributor assignment agreement provided to ETH Zurich and signed by the employer if needed.
  • The PR contains an updated version of the AUTHORS.md file adding the names of all the new contributors.

cache_strategy=cache.Strategy.SESSION, builder_factory=compiledb.CompiledbFactory()
)
translation = factory.SubFactory(
gtfn_module.GTFNTranslationStepFactory, device_type=factory.SelfAttribute("..device_type")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the meaning of 2 dots ..device_type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context it means "look up the attribute on the parent of the SubFactory": https://factoryboy.readthedocs.io/en/stable/reference.html#parents

),
allocator=next_allocators.StandardCPUFieldBufferAllocator(),
run_gtfn_imperative = GTFNBackendFactory(
otf_workflow__translation__use_imperative_backend=True, **__user_defaults
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to shorten these names? (otf_workflow__translation__use_imperative_backend)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a name, it's a path equivalent to otf_workflow.translation.use_imperative_backend. However, that would be invalid python syntax in a keyword arg, so the authors of factoryboy went with __ instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer is still yes though, we can shorten them in two ways:

  • make the individual attribute names shorter
  • provide a parameter (under class Params:) in the parent factory with a shorter name and pass that on to the subfactory. (The full path attribute would still be available though.)

)

run_gtfn_gpu = GTFNBackendFactory(**__user_defaults | {"gpu": True})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this syntax.

otf_workflow = factory.SubFactory(
GTFNCompileWorkflowFactory, device_type=factory.SelfAttribute("..device_type")
)
name = factory.LazyAttribute(lambda o: f"run_gtfn_{o.device_name}{o.cached_name}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we do not want to construct a name, do we? We should just use the object created by the factory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The factory has to give a name to the object it creates. In practice the name attribute of the executor is mainly used in pytest to see at a glance which test was parametrized with what type of gtfn backend.

@DropD DropD force-pushed the c19-backend-config branch from d34e6c6 to cfbef72 Compare February 9, 2024 13:02
src/gt4py/next/config.py Outdated Show resolved Hide resolved
src/gt4py/next/config.py Show resolved Hide resolved
src/gt4py/next/config.py Outdated Show resolved Hide resolved
@DropD DropD requested review from edopao and havogt February 13, 2024 14:30
Copy link
Contributor

@havogt havogt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of comments. Didn't look at the factories yet.


### Limit the times when configuration can change

By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed). Of course there are ways around it but they should be considered unsupported as they are difficult to make reliable (consider monkey patching as an example).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always do gt4py.next.config.DEBUG=True at any point? I don't understand the paragraph...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the following comment in the test

Because monkey patching the config variables is not enough, as
other variables are computed at import time based on them.

which explains it partly. Basically if changing is limited depends on how it's used. That's potentially dangerous.

Also a question: do you consider changing module level variables monkey patching?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends a bit. If I am changing something that is assumed to change throughout the code and I want it to stay changed, probably not. If, like in this case changing it might leave things in an inconsistent state (basically undefined behaviour) and I want the change to only happen locally and revert afterwards, I would describe it as monkey patching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworded this to be more clear (hopefully)


By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed). Of course there are ways around it but they should be considered unsupported as they are difficult to make reliable (consider monkey patching as an example).

### Environment variables are the primary end user interface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed? See my comment above. Some options might be changed programmatically, and if it makes sense , whynot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these are choices, nothing is needed. The only choice which is fairly well justified is to not try to warn users of "rogue" toolchains, because most of the cycle went into experimentation for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have hopefully made it more clear that this is not a requirement, but a decision that was made for now. I have outlined a sketch of what the alternative could look like (in the alternatives section).

---
tags: [backend, otf, workflows, toolchain]
---

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe state somewhere that this is a minimal solution which brings great usability improvements, but could be replaced with a more involved technique later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/gt4py/next/config.py Outdated Show resolved Hide resolved
src/gt4py/next/config.py Show resolved Hide resolved
src/gt4py/next/config.py Outdated Show resolved Hide resolved
import tempfile


class BuildCacheLifetime(enum.Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like these enums move up here, but I don't have another or better proposal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, if they start accumulating they should get a place of their own.

- an external name used to load from environment variables (possibly with a common prefix)
- a fallback default value in case no environment variable is defined

Any other toolchain option is considered an implementation detail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Any other toolchain option is considered an implementation detail.

I don't agree with this sentence and don't see the need for this requirement. This one never needs to appear in the global config module nor an environment variable, but is explicitly meant for a user to change (and thus not an implementation detail).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a requirement, simply a description of the status quo. The options not exposed to the end user are implementation details from the point of view of the end user. The end user is someone who runs code that uses GT4Py internally but may not be aware of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the user can change the option I linked see here for an actual case, so in that sense it is not a proper description of the status quo in my opinion.

Copy link
Contributor Author

@DropD DropD Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since your example is in code, it can by definition not have been done by an end user. By the definition earlier in the text.


### Limit the times when configuration can change

By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed). Of course there are ways around it but they should be considered unsupported as they are difficult to make reliable (consider monkey patching as an example).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed). Of course there are ways around it but they should be considered unsupported as they are difficult to make reliable (consider monkey patching as an example).
By making the `gt4py.next.config` module contain module level variables with user configuration, we ensure user configuration can only be changed between python interpreter runs (after `from gt4py import next` the configuration is fixed).

I don't agree with this sentence. This doesn't ensure anything. I do agree with the unsupported phrasing later on though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the suggestion got a bit jumbled up then?

It does indeed not ensure anything, except that changing configuration during a python interpreter run might leave the configuration in an inconsistent state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't meant to be a suggestion, but a quote.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the wording to be more careful and precise.

@@ -0,0 +1,101 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find a good part of this document not useful to read, but also I don't care too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usefulness of this document depends heavily on whether one is thinking about changing any of this design without re-covering the same ground.

Is the lack of interest and perceived usefulness due to that you have no desire to do so? Or do you think it is useless under the premise of wanting to change the design. Or does the writing style distract from the actual information?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not a lack of interest I just don't feel addressed neither as a user trying to understand how things work (not really the purpose of an ADR sure), but also not as someone trying to understand design decisions (e.g. why would I care about the definition of the term toolchain to mention just one thing).

Copy link
Contributor Author

@DropD DropD Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition is provided so that it is clear what the design decisions have been taken for. It is meant to be skipped but then double checked before taking misguided action based on something one has read in the document. Same as all other definitions.

If we had clearly defined terms for components of GT4Py this would not be necessary as much.

),
allocator=next_allocators.StandardCPUFieldBufferAllocator(),
run_gtfn_with_temporaries = GTFNBackendFactory(
name_postfix="_with_temporaries",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make this a trait.

Copy link
Contributor Author

@DropD DropD Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either the linked line slipped somehow or there is a misunderstanding about what factoryboy traits are.

Assuming the former:
You might be talking about switching lift_mode and temporary_extraction_heuristics based on a trait?

Assuming the latter, you would be talking about something like:

class GTFNBackendFactory(factory.Factory):
    ...
    class Params:
        name_postfix = factory.Trait(
            ... and now what?
        )
        ...
        
# usage
GTFNBackendFactory(name_postfix=True)

The effect is to negate the purpose of the name_postfix parameter, which is to give custom name postfixes. In either case, this would be an easy post-project cooldown PR. The scope of this PR is now fixed.

Copy link
Contributor

@tehrengruber tehrengruber Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you mention the name_postfix parameter. I want something like this:

run_gtfn_with_temporaries = GTFNBackendFactory(use_temporaries=True)

because for many purposes the lift_mode and heuristics is a low-levle detail I don't care about as a library user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The github view makes it look like you are talking specifically about the line containing name_postfix. As I thought, this turns out to be accidental.

The need to change "lift_mode" and "heuristics" in synch arose only as a result of merging with main after the end of the project. Therefore it remains out of scope for this PR.

- such a toolchain may have been created but not used to do work for the end user
- without proper tracking of where configuration comes from, false positives as well as false negatives could not be eliminated.

Implementing tracking was briefly considered but looked like it would be too heavy weight to justify the maintenance burden.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph doesn't really sum up the experiment we did. Also where is the

class MyStep:
    config_flag = True

os.environ["GT4PY_MYSTEP_CONFIG_FLAG"] = True

my_backend = MyStepFactory(mystep_config_flag=False)

part or the sketch of the merging algorithm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general there should perhaps be code examples, as I can not put it more obviously into text form.

Also, adding a sketch of the merging algorithm would distract from the fact that the problem can not be solved by another merging algorithm either (because what is needed is trackability).

Copy link
Contributor

@tehrengruber tehrengruber Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The merging algorithm I proposed has traceability:

env_defined_vars: dict[str, Any]
config_file_defined_vars: dict[str, Any]
code_defined_vars: dict[str, Any]

var_sources = {
  "env": env_defined_vars,
  "config": config_file_defined_vars,
  "in-code": code_defined_vars
}

effects_per_var_source: dict[str, dict[str, list[str]]] = []

for var_source, vars in var_sources.items():
  res = factory(**vars)
  # compute what attributes are different in res from default, i.e. factory()
  effect = ...
  effects_per_var_source[var_source][var] = effect

for (source_a, effect_a), (source_b, effect_b) in itertools.product(effects_per_var_source.items(), effects_per_var_source.items())
  for name_outer, effect_outer in user_defined_vars_effects.items():
    for name_inner, effect_inner in code_defined_vars_effects.items():
      if intersection(effect_outer, effect_inner):
        print("{source_a}:{name_outer} conflicts with {source_b}:{name_inner}")

It prints both source of the conflict and can straightforward be extended to give even more (e.g. line numbers if the sources are from config files). But my main point is I don't see this paragraph to reflect what we concluded the experiment with:

Algorithmically, warnings are generally possible, but not with FactoryBoy for Defaults. Extensions to multiple SOT are generally also possible and in the simple version also feasible, but whether that is still satisfactory is questionable (because other projects also do not do that at all).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I will add another paragraph about black box analysis of the effects of different configuration sources and clarify that this one is about tracking where which value comes from.

Copy link
Contributor Author

@DropD DropD Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I don't undestand why you would mention factory-boy as blocking all warning implementations. The black-box analysis is agnostic, it can always only easily track to the highest level of toolchain building logic. As soon as the user configuration does not apply directly anymore (but only indirectly via toolchain building logic), it would involve replicating some of that logic, or tracking (as in annotating any configuration with it's source and keeping and updating that annotation at every level of logic).

factory-boy does not implement proper tracking for us but nothing else would, either. I will mention that.

@DropD DropD changed the title feature[next]: factoryboy workflow config prototype feature[next]: toolchain configuration interfaces Feb 16, 2024
@DropD DropD requested review from tehrengruber and havogt February 19, 2024 09:04
@DropD DropD merged commit e631c7f into GridTools:main Feb 19, 2024
51 checks passed
@DropD DropD deleted the c19-backend-config branch February 19, 2024 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants