-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full indexing rework #260
Full indexing rework #260
Conversation
…issue mentioned in comments of: getBCastMergedIndices in index_compute.cpp
… into contiguity_cherry_pick
… into contiguity_cherry_pick
…ons we will use for index_compute.
…nst/non-const tensor dims.
… into contiguity_v3
I'm merging #277. which temporarily hides all the problem this PR is supposed to fix. We need to revert/undo the changes when testing the correctness of this PR. Sorry for the inconvenience. |
… into contiguity_merged
|
||
at::Tensor cg_output = at::empty({x, y, z}, options); | ||
void testGPU_FusionComplexBCast() { | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we split this into separate test functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, would you please push a commit for it to this pr?
// | ||
// The reason we need both TensorView and TensorDomain is that we need to have a | ||
// record of both what is being computed and how it is being computed. For | ||
// Example we may have the operation: TV3[I, J, K] = TV2[I, J, K] + TV1[I, J, K] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: example
return rfactor_domain_; | ||
}; | ||
// If rfactor domain exists in domain() return it, otherwise return root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add empty line
@@ -342,8 +342,18 @@ void IRPrinter::handle(const kir::NamedScalar* i) { | |||
os << i->name(); | |||
} | |||
|
|||
void IRPrinter::handle(const kir::IterDomain*) { | |||
TORCH_INTERNAL_ASSERT(false, "Unreachable"); | |||
void IRPrinter::handle(const kir::IterDomain* id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this needed? I hit this a few times while working on the refactoring, but every time it indicated that something else needed to be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debugging matching between the IterDomain of kir and fusion was pretty challenging when I couldn't print what they were. No remaining code relies on it, but I rely on it for debugging. I had to update Type.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should now be able to fill the rest out marked as Unreachable
} | ||
|
||
std::vector<Expr*> UnsortedExprs::getFrom(std::vector<Val*> outputs) { | ||
if (outputs.empty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ ..}
// the life of this context guard. | ||
class TVDomainGuard { | ||
public: | ||
TensorView* tv_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- if public, no need for _ suffix
- = nullptr (same for prev_domain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be private, thanks.
@@ -25,18 +29,21 @@ std::vector<kir::Bool*> PredicateCompute::computePredicates( | |||
return {}; | |||
} | |||
|
|||
std::vector<kir::Bool*> preds(root.size(), new kir::Bool(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to pre-initialize before the loop below.
const std::vector<kir::ForLoop*>& loops, | ||
kir::Bool* thread_pred) { | ||
if (loops.empty()) | ||
return new kir::Bool(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ .. }
|
||
void openLoop(kir::ForLoop*); | ||
|
||
std::unordered_map<IterDomain*, kir::Bool*> predicates; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private:
to separate private methods from private state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're ordered, that may be the best you get from me.
@@ -64,6 +64,16 @@ static const char* val_type2string(ValType t) { | |||
return "Scalar"; | |||
case ValType::NamedScalar: | |||
return "NamedScalar"; | |||
case ValType::KirIterDomain: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for ir_iostream, an IR that isn't printable isn't debug-able.
// Simply grabs all exprs needed to produce provided outputs, only in dependency | ||
// order, not computeAtOrder like ExprSort in fusion.h | ||
class UnsortedExprs : public IterVisitor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExprSort
doesn't do anything for computeAt anymore as the logic is moved to lower_loops.cpp. Perhaps, we don't need this class and can just use ExprSort
.
@@ -98,7 +100,6 @@ class TORCH_CUDA_API UnrollPass : public OptOutDispatch { | |||
static std::vector<Expr*> runPass( | |||
Fusion* fusion, | |||
const std::vector<Expr*>& exprs, | |||
const std::unordered_set<Expr*>& init_exprs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can re-enable that fix without carrying init_exprs. We can detect initialization expressions by the consumer pattern and the current loop nest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you open a new issue and I can do a follow up PR? I believe we still have correctness now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. The previous failing test still works fine, but I'm not sure whether it's intended behavior. I re-opened the issue (#64).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @csarofeen mentioned, predicate generation for initializing reduction buffers seems broken again. The original fix consists of #246 and #255.
I did a quick review of all changes and left two comments. One is regarding the predicate generation for reduction buffers. Another is the newly added class, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me for now. #64 may be an issue again.
Doing a lot of indexing rework.
Right now we'd say this is contiguous, but linear indexing on this is not the same as:
The latter should be contiguous merges, the former not.