WIP: IterDomain Graphs #32
Helps if I include the right files.
Started reviewing. I haven't gone through all of buildLoopPromotionMap yet.
csrc/id_graphs.h
Outdated
// (1) The disjoint set of the provided Iter Domain if it exists,
// otherwise a null shared ptr
// (2) If the disjoint set of the provided Iter Domain exists
How does this differ from returning a std::optional<IdGroup>?
Returning an optional probably makes a lot of sense, but in most instances I actually just want to assert that there is an id set. So I'll probably just add an alias.
I just stole the pattern from https://cplusplus.com/reference/unordered_set/unordered_set/emplace/ for development; no reason not to switch to optional.
Ah OK. Makes sense. I think std::optional was introduced in C++17, but std::unordered_set has been around since C++11, so they had to use the explicit pair<..., bool>. From side discussion it looks like we might be OK with using C++17 going forward on NVFuser. Either way, not a big deal of course.
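To make the two return styles concrete, here is a minimal sketch using toy stand-ins. `ToyId`, `ToyIdGroup`, and `ToyDisjointSets` are hypothetical names for illustration and are not the types in csrc/id_graphs.h. It shows the C++11-style (value, found) pair, the C++17 std::optional equivalent, and an asserting accessor for call sites that expect the group to exist.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical stand-ins for illustration only; not the nvFuser types.
using ToyId = std::string;
using ToyIdGroup = std::shared_ptr<std::unordered_set<ToyId>>;

class ToyDisjointSets {
 public:
  // Registers all of `ids` as one group (simplified: no merging of groups).
  void addGroup(std::initializer_list<ToyId> ids) {
    auto group = std::make_shared<std::unordered_set<ToyId>>(ids);
    for (const auto& id : ids) {
      id_to_group_[id] = group;
    }
  }

  // C++11 style, like unordered_set::emplace: (value, found flag).
  std::pair<ToyIdGroup, bool> findPair(const ToyId& id) const {
    auto it = id_to_group_.find(id);
    if (it == id_to_group_.end()) {
      return {ToyIdGroup{}, false};
    }
    return {it->second, true};
  }

  // C++17 style: an empty optional means "no group for this id".
  std::optional<ToyIdGroup> findOptional(const ToyId& id) const {
    auto it = id_to_group_.find(id);
    if (it == id_to_group_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

  // For call sites that only want to assert the group exists.
  const ToyIdGroup& group(const ToyId& id) const {
    auto it = id_to_group_.find(id);
    assert(it != id_to_group_.end() && "id has no disjoint set");
    return it->second;
  }

 private:
  std::unordered_map<ToyId, ToyIdGroup> id_to_group_;
};
```

With the optional version, a caller writes `if (auto g = sets.findOptional(id))` instead of unpacking `.first` and `.second`.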
csrc/id_graphs.h
Outdated
//! all IterDomains in the disjoint set to that PType.
void validateAndPropagatePType() const;

void buildLoopPromotionMap(const std::vector<Expr*>& exprs);
Basic question, but I see "promotion" mentioned many times here. In this context what does it mean to promote an ID?
Promotion is the concept that an iteration domain that a TensorView has in its root->domain might not really be what's required for the generated kernel to index into that TensorView. For example:
- Producer has a broadcast merged with an iteration domain
- Consumer has a (mapped to the producer) iteration domain merged with an iteration domain
- Based on other transformations the producer might have to "promote" its broadcast domain to the iteration domain in its consumer
- If there's a producer of that producer, then we still might need the "broadcast promotion" but there isn't a broadcast that maps in that producer's producer
The "end goal" of promotion in this pass (still very WIP) is that each leaf iter domain of a tensor view might be "promoted" to a larger iteration domain representative of the for loops. That larger iter domain still needs connections that we can traverse to index into the tensor view's buffer.
Revisiting this after some thought and going through some of the machinery. Just to try and reiterate the example above:
t0[(b0*i1), i2] // producer
t1[(i3 * i4)] = f(t0) // consumer, through expression f
i3 is defined by an expression in the variables i1, i2.
We may need to alter b0 to match i3 if i4 matches i1.
This might cascade, since (b0*i1) might match another merged broadcast in its producer.
End goal of promotion: map each leaf IterDomain in each TensorView (importantly including bcast domains and transforms thereof) to an Iteration IterDomain that is written as a transform of that TV's root_domain, so that we can index into it.
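A single step of the example above can be sketched as a toy. This assumes a very simplified model: `ToyId`, `ToyMerge`, and `promoteBroadcast` are hypothetical names, the "matches" relation is passed in as an explicit set of name pairs, and no cascading to further producers is modeled. It is not what buildLoopPromotionMap does, only an illustration of "alter b0 to match i3 if i4 matches i1".

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Toy model for illustration only; not the nvFuser IR.
struct ToyId {
  std::string name;
  bool is_broadcast;
};

// A merge of two ids, e.g. (b0*i1) in the producer or (i3*i4) in the consumer.
struct ToyMerge {
  ToyId outer;
  ToyId inner;
};

using ToyIdPairs = std::set<std::pair<std::string, std::string>>;

// Returns a map from a producer broadcast id to the consumer iteration id it
// would be promoted to. `mapped` lists (producer id, consumer id) pairs that
// are assumed to already match.
std::map<std::string, std::string> promoteBroadcast(
    const ToyMerge& producer,
    const ToyMerge& consumer,
    const ToyIdPairs& mapped) {
  std::map<std::string, std::string> promotion;
  const bool inner_mapped =
      mapped.count(std::make_pair(producer.inner.name, consumer.inner.name)) >
      0;
  // b0 is a broadcast, i3 is not, and the inner ids (i1, i4) match.
  if (producer.outer.is_broadcast && !consumer.outer.is_broadcast &&
      inner_mapped) {
    promotion[producer.outer.name] = consumer.outer.name;
  }
  return promotion;
}
```

For the producer merge (b0*i1) and consumer merge (i3*i4) with i1 matched to i4, the toy promotes b0 to i3; without that match it returns no promotion.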
Took a quick look, but that's not enough to make any meaningful feedback. I'd need to spend a significant amount of time (i.e., a few days). Should I wait? Is it ready?
@naoyam I think it's worth trying to read through, I wouldn't worry about really nailing down interfaces, but the building and relationship of an Iter Domain Graph, and how we operate on a collection of Iter Domain Graphs. The infrastructure is here, but
i.e.
(I'm sorry I accidentally clicked the close button)
Good to know. That's where I intended to focus, as I think that's one of the main tasks in this PR. I'll wait for further updates to the function. Will look through the rest.
@naoyam you could start looking at I still need to make a backward replay before we try to perform indexing, as I don't want to index all the tensor views one by one, so instead I intend to build a graph that we can naively traverse all at once to get consumer indices.
(Sorry again accidentally clicked the close button)
All these tests resulted in a failure. Is it expected?
It seems expected as there's
Comments on IdGraph
Yes, indexing is not hooked up, so they're not yet supported. I'm throwing an error where I leave off in the analysis.
Comments so far on buildLoopPromotionMap
csrc/id_graphs.cpp
Outdated
VectorOfUniqueEntries<IterDomain*> all_producer_ca_deps;
{
  auto ca_dep_vals = DependencyCheck::getAllValsBetween(
      {producer_root.begin(), producer_root.end()},
      {producer_domain.begin(),
       producer_domain.begin() + producer->getComputeAtPosition()});
  auto ca_deps_filter = ir_utils::filterByType<IterDomain>(ca_dep_vals);

  all_producer_ca_deps.insert(
      ca_deps_filter.begin(), ca_deps_filter.end());
}
nit: This pattern appears quite often. We should create a utility function.
Probably. Something that goes directly from...
DependencyCheck::getAllValsBetween(
{producer_root.begin(), producer_root.end()},
{producer_domain.begin(),
producer_domain.begin() + producer->getComputeAtPosition()})
To:
VectorOfUniqueEntries<IterDomain*> all_producer_ca_deps;
?
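A rough sketch of what such a helper could look like, written against toy types so that it compiles on its own. `ToyVal`, `ToyIterDomain`, `ToyScalar`, and `uniqueEntriesOfType` are hypothetical names; the real helper would presumably take the values returned by DependencyCheck::getAllValsBetween and return a VectorOfUniqueEntries<IterDomain*>, and a plain std::vector with a linear duplicate check stands in for that container here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy IR node types for illustration only; not the nvFuser classes.
struct ToyVal {
  std::string name;
  explicit ToyVal(std::string n) : name(std::move(n)) {}
  virtual ~ToyVal() = default;
};
struct ToyIterDomain : ToyVal {
  using ToyVal::ToyVal;
};
struct ToyScalar : ToyVal {
  using ToyVal::ToyVal;
};

// Keeps only the entries of `vals` that are of type T, in first-seen order
// and without duplicates. This folds the "filter by type, then insert into a
// unique vector" pattern into one call.
template <typename T>
std::vector<T*> uniqueEntriesOfType(const std::vector<ToyVal*>& vals) {
  std::vector<T*> result;
  for (ToyVal* val : vals) {
    assert(val != nullptr);
    auto* typed = dynamic_cast<T*>(val);
    if (typed == nullptr) {
      continue;  // not a T, skip it
    }
    if (std::find(result.begin(), result.end(), typed) == result.end()) {
      result.push_back(typed);
    }
  }
  return result;
}
```

A call site would then shrink to a single line, e.g. `auto deps = uniqueEntriesOfType<ToyIterDomain>(vals);`.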
csrc/id_graphs.cpp
Outdated
}
}

const IdGraph& IterDomainGraphs::idGraph(IdMappingMode mode) const {
Can this assert that the returned graph has already been constructed? For example, the LOOP map is not available before lowering, so it would be nice if we could assert when it's accidentally queried.
Sure. Do you think the check in replay, as is, is enough to do this effectively, or do you think we should have some flag in IterDomainGraphs marking when each mode is initialized?
The latter seems to make more sense to me.
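A minimal sketch of the per-mode flag idea, assuming toy types. `ToyMappingMode`, `ToyIdGraph`, and `ToyIterDomainGraphs` are hypothetical stand-ins, and the enum values are guessed from the maps mentioned in this thread. The accessor throws when a mode is queried before it was built.

```cpp
#include <map>
#include <set>
#include <stdexcept>

// Toy stand-ins for illustration only; not the nvFuser types.
enum class ToyMappingMode { EXACT, ALMOSTEXACT, PERMISSIVE, LOOP };

struct ToyIdGraph {
  int num_groups = 0;
};

class ToyIterDomainGraphs {
 public:
  // Builds (here: default-constructs) the graph for `mode` and records that
  // it is now safe to query.
  void build(ToyMappingMode mode) {
    graphs_[mode] = ToyIdGraph{};
    initialized_.insert(mode);
  }

  bool isInitialized(ToyMappingMode mode) const {
    return initialized_.count(mode) > 0;
  }

  // Fails loudly if the graph was never built, e.g. LOOP before lowering.
  const ToyIdGraph& idGraph(ToyMappingMode mode) const {
    if (!isInitialized(mode)) {
      throw std::logic_error("IdGraph queried before it was initialized");
    }
    return graphs_.at(mode);
  }

 private:
  std::map<ToyMappingMode, ToyIdGraph> graphs_;
  std::set<ToyMappingMode> initialized_;
};
```

The exception is only a placeholder; the real code would more likely use the project's own assert macro.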
csrc/id_graphs.cpp
Outdated
buildPermissiveMap(tv_exprs);

// Only build loop map during lowering
We may also sometimes just want to build only an exact and/or permissive map. The scheduler is one example where we sometimes use an exact map only. Maybe a permissive map is also used anyway?
I hope it's not too expensive, but I'd be happy to have finer granularity on the building of the graphs. Being able to specify which are needed, or generating the maps lazily could also be cool. Leaving to future work for now.
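As a sketch of the lazy option, assuming each graph can be built independently (which may not hold if, say, one map is derived from another), the accessor could build on first request and cache the result. `LazyMode`, `LazyGraph`, and `LazyIterDomainGraphs` are hypothetical names; the build step is a placeholder that only counts how often it runs.

```cpp
#include <map>

// Toy stand-ins for illustration only; not the nvFuser types.
enum class LazyMode { EXACT, PERMISSIVE, LOOP };

struct LazyGraph {
  LazyMode mode;
};

class LazyIterDomainGraphs {
 public:
  // Returns the graph for `mode`, building it on the first request only.
  const LazyGraph& idGraph(LazyMode mode) {
    auto it = graphs_.find(mode);
    if (it == graphs_.end()) {
      ++build_count_;  // placeholder for the (possibly expensive) build
      it = graphs_.emplace(mode, LazyGraph{mode}).first;
    }
    return it->second;
  }

  // Number of builds performed so far, to show that results are cached.
  int buildCount() const {
    return build_count_;
  }

 private:
  std::map<LazyMode, LazyGraph> graphs_;
  int build_count_ = 0;
};
```

A scheduler that only needs the exact map would then pay for exactly one build.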
csrc/id_graphs.cpp
Outdated
    idGraph(IdMappingMode::LOOP).disjointIdSets().disjointSets()) {
  if (group->size() == 1) {
    p2c_ca_terminal_loop_ids.pushBack(group->front());
    id_consumer_terminal_loop_ids.pushBack(group->front());
nit: insert continue
Yeah, I need to clean this up.
csrc/id_graphs.cpp
Outdated
// T4 = T1[i0, b1] + T3[i0, i1]
// T6 = T2[i0, b1] + T5[i0, i2]
//
// The almost exact map will map T1's and T2's b1 together, but they're being
Can you elaborate a little more on why they are mapped? Are these domains merged?
Yes, thank you, this assumes merge(i0, b1), merge(i0, i1), and merge(i0, i2).
Then merge(i0, b1) of T1 and merge(i0, b1) of T2 are almost exact mapped together from id graph propagation.
Let me rewrite the expressions as below. I think this is more accurate.
T1[i0, b1] = T0[i0]
T2[i0, b2] = T0[i0] // Not T2[i0, b1]
T4 = T1[i0, b1] + T3[i0, i1]
T6 = T2[i0, b2] + T5[i0, i2]
Then merge(0, 1) with all tensors except for T0.
The almost-exact map would map i0 and i0*b1 and i0*b2. Does it also map b1 and b2?
I'm not sure if it maps b1 and b2 together. It should be benign if so.
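For reference, the rule discussed here (a merge with a broadcast maps its output to the non-broadcast input) can be sketched with a toy union-find. `ToyDisjointSets`, `ToyMerge`, and `mapTrivialMerge` are hypothetical names, a single "i0" stands in for the already-mapped i0 of every tensor, and no propagation through expressions is modeled. Because of that last simplification the toy keeps b1 and b2 apart, which says nothing about what the real almost-exact map does.

```cpp
#include <string>
#include <unordered_map>

// Toy union-find over id names; for illustration only.
class ToyDisjointSets {
 public:
  // Returns the representative of `id`, registering it if unseen.
  std::string find(const std::string& id) {
    auto it = parent_.find(id);
    if (it == parent_.end()) {
      parent_[id] = id;
      return id;
    }
    if (it->second == id) {
      return id;
    }
    const std::string parent = it->second;  // copy before recursing
    const std::string root = find(parent);
    parent_[id] = root;  // path compression
    return root;
  }

  void unite(const std::string& a, const std::string& b) {
    const std::string root_a = find(a);
    const std::string root_b = find(b);
    if (root_a != root_b) {
      parent_[root_a] = root_b;
    }
  }

  bool mapped(const std::string& a, const std::string& b) {
    return find(a) == find(b);
  }

 private:
  std::unordered_map<std::string, std::string> parent_;
};

struct ToyMerge {
  std::string outer;
  std::string inner;
  std::string out;
  bool inner_is_broadcast;
};

// Toy rule: merging with a broadcast does not change the extent, so the
// output is mapped to the non-broadcast (outer) input.
void mapTrivialMerge(ToyDisjointSets& sets, const ToyMerge& merge) {
  if (merge.inner_is_broadcast) {
    sets.unite(merge.out, merge.outer);
  }
}
```

Applied to merge(i0, b1), merge(i0, b2), and merge(i0, i1), this puts i0, i0*b1, and i0*b2 in one set and leaves i0*i1 on its own.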
!build
Closing this PR. Most of the code was already merged into main, and the remaining code is currently not used and will be revisited if necessary. The next work is indexing, which is tracked in #2238.
Renamed test_gpu_indexing.cpp to test_indexing_advanced.cpp. Changed the tests to exercise both the legacy and new indexers. Added several tests originally developed for IdModel (#32). Some of them are disabled as they are not yet supported.
Build out of Iter domain graphs as infrastructure. I kept these concepts separate from the compute_at_map as that may need to be reimplemented later based on this new concept/infrastructure.
This new concept of IterDomainGraphs will eventually replace all our index and parallelization logic. This infrastructure is to make it easier to work with iter domain graphs for processes like accurate broadcast resolution/promotion. IdGraph or IterDomainGraphs could also directly replace BestEffortReplay and similar mappings across producer-consumers.
I added a couple of interesting tests that I want to make work but that still don't pass.