feat[next][dace]: GTIR-to-DaCe lowering of map-reduce (only full connectivity) #1683
Conversation
There are some points that need further clarification, but it does not look that bad.
There are also some suggestions that are not directly related to the PR per se.
[gtx_common.Dimension("")] if isinstance(arg.data_type.dtype, gtir_ts.ListType) else []
[gtx_common.Dimension("")] if isinstance(arg.data_type.dtype, itir_ts.ListType) else []
Could it happen that you have two anonymous dimensions in a field?
I mean something like Field(KDim, "", EdgeDim, "") — wouldn't this be a problem?
I would propose to give these dimensions a name like __anonymous_dim_{UNIQUE_STUFF}.
In theory it can happen, but the current IR does not support it. I agree that we should give it a name, not just leave an empty string. However, the name of the dimension does not have to be unique. My idea is to use the offset name (e.g. V2E). So you could have something like Field(KDim, "V2E", EdgeDim, "V2E"), and this should not be a problem (I would have to ensure that map indices are unique, though). This will be addressed in the next iteration (I was thinking about a separate PR).
Okay, it is a bit off topic, but Rule 11 implies that the name of a dimension and the iteration variable name are linked.
I think it is not a problem, but we should probably clarify the rule a bit.
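As a sketch of the two naming schemes discussed in this thread (hypothetical helper names, not the actual gt4py code): unique anonymous names avoid collisions outright, while reusing the offset name keeps names non-unique and instead requires unique map index variables.

```python
import itertools

_anon_counter = itertools.count()

def anonymous_dim_name() -> str:
    """Generate a unique name for an anonymous (local) dimension."""
    return f"__anonymous_dim_{next(_anon_counter)}"

def offset_dim_name(offset: str) -> str:
    """Name the local dimension after the offset that created it (e.g. 'V2E').

    Such names are not unique, so map index variables must be made unique
    separately.
    """
    return offset

# Two anonymous dimensions in one field no longer collide:
dims = [anonymous_dim_name(), anonymous_dim_name()]
```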
_, _, reduce_identity = gtir_to_tasklet.get_reduce_params(stencil_expr.expr)
reduce_identity_for_args = reduce_identity
_, _, reduce_identity = gtir_to_tasklet.get_reduce_params(stencil_expr.expr)
reduce_identity_for_args = reduce_identity
_, _, reduce_identity_for_args = gtir_to_tasklet.get_reduce_params(stencil_expr.expr)
I also need to set reduce_identity, because it is passed to LambdaToTasklet to lower the current node. Note that reduce_identity_for_args and reduce_identity could be different (e.g. the first None, the latter not None). That is what I tried to explain below in the branch for neighbors nodes: we do not support nested reductions, so the reduction identity value is used (consumed) by the current neighbors expression to fill the skip values.
Okay, I would make this aspect clearer.
Thank you @philip-paul-mueller for the review comments. I have pushed a commit that should clarify how
Implements some refactoring changes discussed during review of #1683.
There is nothing serious, but some aspects need a bit more work.
I expect that we can merge in the next round.
There are some things that need some discussion, but they are rather minor points.
# list (see the second if-branch), we stop carrying the reduce identity further
# (`reduce_identity_for_args = None`) because it is not needed anymore;
# the reason being that reduce operates along a single axis, which in this case
# corresponds to the local dimension of the list of neighbor values.
Excuse my ignorance and slow understanding, but why? You still have to initialize the reduction.
You are right, I do not need to initialize reduce_identity in this case.
I think the whole problem is that there are different reduce identities:
self.reduce_identity
reduce_identity_for_args
I do not get why they are split.
Also, I do not fully understand the explanation.
I think it is that if you hit the neighbors builtin, then you would reduce over the list of neighborhood values of a point, and because you are currently ignoring skip values, i.e. assume that all are defined, you do not have to fill anything.
So technically speaking it is not needed, but not wrong to do.
If there is no reason to treat it as a special case, then I would not treat it as a special case.
Except if you want to do some prep work for the skip-value case, but then you have to clearly indicate this.
self.reduce_identity is defined in LambdaToDataflow, not here. Here reduce_identity is an input to the function. By visiting the node arguments with reduce_identity_for_args = None, their reduce_identity will be None.

Setting reduce_identity_for_args = None will also work in the case of skip values. The reduce_identity value is filled in in place of skip values in the context of neighbors itself, not in the arguments context.

You are right that it would not be wrong to set reduce_identity_for_args also in the case of neighbors arguments. I do it because it enables a sanity check: the sequence reduce(V2E) -> neighbors(V2E) -> reduce(C2E) -> neighbors(C2E) is accepted, while the sequence reduce(V2E) -> reduce(C2E) -> neighbors(V2E) -> neighbors(C2E) is not. The latter sequence would raise the NotImplementedError("nested reductions not supported.") exception.

Side note: there is a simple alternative design where I can avoid passing reduce_identity around. It would be to write a dummy value for skip values instead of doing the array access (for neighbors) or executing the map operation (for map - I cannot safely apply the map operation in case, for example, of division by 0). Then, in the lowering of reduce, I would put -- before the reduce library node -- another local map that fills the skip values with the reduce identity. I really like this approach (separate PR), but it will result in one extra write, replacing the dummy value with the reduce identity.
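The propagation rule and the sanity check described in this thread can be sketched on a toy expression tree (illustrative only, not the actual lowering code; the tuple encoding and the identity value are assumptions):

```python
from typing import Optional

def visit(expr, reduce_identity: Optional[float] = None) -> None:
    """Walk a toy expression tree, propagating the reduce identity downward.

    An expression is a tuple (node_kind, *args); leaves are strings.
    `neighbors` consumes the identity (it would fill skip values), so its
    arguments are visited with reduce_identity_for_args = None; a `reduce`
    that still sees a pending identity is a nested reduction and is rejected.
    """
    if isinstance(expr, str):
        return
    kind, *args = expr
    if kind == "reduce":
        if reduce_identity is not None:
            raise NotImplementedError("nested reductions not supported.")
        reduce_identity_for_args = 0.0  # identity of the toy reduce op
    elif kind == "neighbors":
        reduce_identity_for_args = None  # identity consumed here
    else:
        reduce_identity_for_args = reduce_identity
    for arg in args:
        visit(arg, reduce_identity_for_args)

# accepted: reduce -> neighbors -> reduce -> neighbors
visit(("reduce", ("neighbors", ("reduce", ("neighbors", "x")))))

# rejected: reduce -> reduce (no neighbors consumed the identity in between)
try:
    visit(("reduce", ("reduce", ("neighbors", ("neighbors", "x")))))
    nested_error = None
except NotImplementedError as err:
    nested_error = str(err)
```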
I would add part of the explanation you wrote here as comments, because you can tell from my question that this is really hard to understand.
def make_const_list(arg: str) -> str:
    return arg
So you can say it is like a "lazy fill"?
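If so, the "lazy fill" can be illustrated with a minimal stand-in (a toy sketch, not the actual lowering; the class name is made up): the scalar lives in a shape-(1,) buffer and every neighbor index reads element 0, so the list is never materialized.

```python
class ConstList:
    """A scalar broadcast over a local dimension, stored as a 1-element buffer.

    Reading any neighbor index returns the same value, so the full list is
    never materialized ("lazy fill"), matching a shape=(1,) representation.
    """

    def __init__(self, value: float) -> None:
        self._buf = [value]  # shape=(1,)

    def __getitem__(self, neighbor_index: int) -> float:
        return self._buf[0]  # broadcast: the index is ignored

ones = ConstList(1.0)
values = [ones[i] for i in range(4)]  # four reads, one stored element
```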
There are some smallish things, but it generally looks good.
@@ -145,7 +145,7 @@ class DataflowOutputEdge:
"""

state: dace.SDFGState
expr: DataExpr
result is probably a better name than expr, but I suggest making it more consistent with the other two classes, especially MemletInputEdge, and calling it destination.
It is not really a destination; it is rather the sink node of the dataflow. What about sink?
sink has a meaning in DaCe as "node with zero output degree", so it is a bit misleading. How about target?
In this case, expr is a sink node according to the DaCe definition: a data node the dataflow writes to.
But will it never be read from?
The sink node (typically a scalar or a local field) of the dataflow will actually be removed from the SDFG. The edge writing to the sink node will be replaced with an edge passing through the map exit node and writing to a new node, the transient field node (a multi-dimensional array with shape equal to the field operator domain).
The `plus` operation is lowered to a tasklet inside a map that computes
the domain of the local dimension (in this example, the number of neighbors).

The result is a 1D local field, with same size as the input local dimension.
I would also mention what the value is.
Now that I think about it, what is the size of the local field anyway?
I would say it has size 1, because
The subscript is the letter 'O' and it stands for offset. I will remove it from the example in order to avoid confusion. In this example, the local size will be V2E.max_neighbors.
Again, I would add what value the result has.
I have some strong suggestions to improve the documentation of the implementation and the behaviour of internals.
The PR kind of looks good, but some documentation aspects should be improved.
elif cpm.is_call_to(stencil_expr.expr, "neighbors"):
    reduce_identity_for_args = None
I would add the explanation you just gave me here.
@@ -566,7 +640,7 @@ def translate_symbol_ref(
if TYPE_CHECKING:
    # Use type-checking to assert that all translator functions implement the `PrimitiveTranslator` protocol
    __primitive_translators: list[PrimitiveTranslator] = [
This conditional declaration does not make any sense to me.
What does it give you that a good unit test does not give you already?
(#1694) This PR extends the solution for map-reduce provided in #1683 with support for connectivity tables with skip values. The field definition is extended with a `local_offset` attribute that stores the offset provider used to build the values in the local dimension. In case the local dimension is built by the `neighbors` expression, the `local_offset` corresponds to the offset provider used to access the neighbor dimension. Since this information is carried along with the data itself, whenever the data is accessed it is also possible to access the corresponding offset provider and check whether the neighbor index is valid or whether there is a skip value. For local dimensions already present in the program arguments, this information is retrieved from the field domain (enabled in a new test case).

The data is accessed in the `map_` and `reduce` expressions, where it is now possible to check for skip values. Therefore, the main objective of this PR is the lowering of map-reduce with skip values. A secondary objective is to pave the road to simplifying the lowering logic by getting rid of the `reduce_identity` value. The current approach is to propagate the `reduce_identity` value while visiting the arguments of `reduce` expressions. By introducing `local_offset`, the argument visitor will return the information needed to implement `reduce` of local values in the presence of skip values.
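The skip-value handling described above can be sketched with a toy reduction (illustrative only, not the gt4py implementation; the -1 sentinel for skip values in the connectivity table is an assumption):

```python
from typing import Callable, Sequence

def reduce_with_skip_values(
    op: Callable[[float, float], float],
    identity: float,
    values: Sequence[float],
    neighbor_indices: Sequence[int],
) -> float:
    """Toy reduction over a neighbor list that may contain skip values.

    A skip value is marked by neighbor index -1; it contributes the reduce
    identity instead of the (undefined) data at that slot.
    """
    acc = identity
    for value, index in zip(values, neighbor_indices):
        acc = op(acc, identity if index == -1 else value)
    return acc

# sum over 4 neighbor slots where the last slot is a skip value:
total = reduce_with_skip_values(
    lambda a, b: a + b, 0.0, [1.0, 2.0, 3.0, 9.9], [0, 1, 2, -1]
)
```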
This PR adds support for lowering of the `map_` and `make_const_list` builtin functions. However, the current implementation only supports neighbor tables with full connectivity (no skip values). The support for skip values will be added in a next PR.

To be noted:

`input_connections` is extended to contain a `TaskletConnection` variant, which is lowered to an empty edge from the map entry node to the tasklet node.

`make_const_list` is a scalar value to be broadcast on a local field. However, in order to keep the lowering simple, this value is represented as a 1D 1-element array (`shape=(1,)`).
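As an illustration of this shape=(1,) const-list representation (a toy stand-in, not the actual lowering; the function name is made up), a 1-element sequence can be treated as a broadcast scalar inside an elementwise map:

```python
from typing import Sequence

def map_plus(a: Sequence[float], b: Sequence[float], size: int) -> list[float]:
    """Toy lowering of `map_(plus)`: elementwise sum over the local dimension.

    A 1-element sequence stands for a const list (broadcast scalar); a full
    sequence is indexed per neighbor slot.
    """
    def at(lst: Sequence[float], i: int) -> float:
        return lst[0] if len(lst) == 1 else lst[i]
    return [at(a, i) + at(b, i) for i in range(size)]

# neighbor values plus a const list of ones (stored as a single element):
summed = map_plus([10.0, 20.0, 30.0], [1.0], size=3)
```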