-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AutoDiff] Automatically determine AdStack's size #2438
Conversation
/format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't carefully checked ControlFlowGraph::determine_ad_stack_size
, but I believe this is great progress and a very useful feature in AutoDiff :-)
At a high level, I think we should add a few test cases using the CHI IR builder. A subtle error in this inference pass may result in undefined behavior (since stack overflows) instead of a laud error message. It would also make sense to add a runtime check of stack overflowing in debug mode.
TODOs:
|
Co-authored-by: Ye Kuang <k-ye@users.noreply.github.com>
Co-authored-by: Yuanming Hu <yuanming-hu@users.noreply.github.com>
verify -> verify_ir_structure |
I think it's better to put this refactoring in another PR. WDYT? |
Sounds good! |
taichi/ir/control_flow_graph.cpp
Outdated
for (auto &stack : oversized_stacks) { | ||
oversized_stacks_name.push_back(stack->name()); | ||
} | ||
TI_WARN( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How serious it is if the necessary AD stack size overflows max_ad_stack_size
? If this would directly result in wrong results, we should better report an error and stop immediately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not that serious -- IMHO in most cases, it's the control-flow graph that unable to determine the max capacity. For example:
s = stack()
for i in range(10):
s.push(i)
The control-flow graph does not have the range 10
, and thus cannot determine the capacity.
From another perspective, the current codebase uses a fixed capacity of 16, and it works fine.
On the other hand, I don't see any warnings in the current tests, so maybe even the above case (a loop with #pushes > #pops) doesn't exist, and the control-flow graph is able to determine all maximum capacities. Whether the above case exists depends on the usage of AD-stack in auto_diff.cpp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the code implementation looks like it is able to figure out the necessary stack size, which overflows max_ad_stack_size
. But the interpretation is "unable to determine the max capacity". Is it possible to distinguish these two cases, or maybe I'm misunderstanding something ? 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I was using the terms "size" and "capacity" interchangeably... It should be the necessary stack size, not the max capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If auto_diff.cpp is bug-free, then #pushes should be always equal to #pops?
I still don't see why the fact that an undetermined stack would not lead to a bad result. IIUC, this is serious, but there are cases where this stack size just cannot be determined statically? (Like the range(10)
example you gave).
I don't see any warnings in the current tests
Were you referring to the python tests, or the new CPP tests? If the former, i think that's because the output are not shown in pytest
by default..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. Let's add this comment to the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I think maybe it's better to use the Bellman-Ford algorithm -- when we are "able to figure out all stack sizes statically", we are guaranteed to figure them out even if a stack needs a large size (and the algorithm's running time is approximately the same); when we are unable to figure out at least one stack size, the algorithm will run slower but we will be sure that we are unable to figure out the stack size statically (instead of a warning about possible overflow) (So, in this case, I think the message should not be a TI_WARN
-- maybe a TI_INFO
or TI_DEBUG
). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I think maybe it's better to use the Bellman-Ford algorithm
Sounds great!
So, in this case, I think the message should not be a TI_WARN -- maybe a TI_INFO or TI_DEBUG. WDYT?
IIUC, we cannot determine the stack size as soon as the kernel has a loop in it? If so, yeah, maybe it's better to make this TI_DEBUG to reduce the noise..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, we cannot determine the stack size as soon as the kernel has a loop in it? If so, yeah, maybe it's better to make this TI_DEBUG to reduce the noise..
Right...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ljcc0930 will help review Bellman-Ford XD
taichi/ir/control_flow_graph.cpp
Outdated
for (auto &stack : oversized_stacks) { | ||
oversized_stacks_name.push_back(stack->name()); | ||
} | ||
TI_WARN( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the code implementation looks like it is able to figure out the necessary stack size, which overflows max_ad_stack_size
. But the interpretation is "unable to determine the max capacity". Is it possible to distinguish these two cases, or maybe I'm misunderstanding something ? 🤣
Co-authored-by: Ye Kuang <k-ye@users.noreply.github.com>
taichi/ir/control_flow_graph.cpp
Outdated
for (auto &stack : oversized_stacks) { | ||
oversized_stacks_name.push_back(stack->name()); | ||
} | ||
TI_WARN( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://stackoverflow.com/questions/14405063/how-can-i-see-normal-print-output-created-during-pytest-run ? (You may have to tweak ti
to make this work 🤣 )
taichi/ir/control_flow_graph.cpp
Outdated
for (auto &stack : oversized_stacks) { | ||
oversized_stacks_name.push_back(stack->name()); | ||
} | ||
TI_WARN( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I see, but I thought oversized_stacks
means a different thing? I.e., "I am able to figure out the stack size statically, and it definitely overflows the configured max". So capping it to max_ad_stack_size
will lead to wrong result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Related issue = #656
This PR uses the control-flow graph to compute the necessary size for each Autodiff stack.
In case there may be a loop in which there is a stack push, a limitconstexpr int kMaxAdStackSize = 32;
is set to prevent infinite loops.We use the Bellman-Ford algorithm to compute the sizes. When there is a positive loop (#pushes > #pops in a loop) for an Autodiff stack, we cannot determine the size, and a default value
CompileConfig::default_ad_stack_size = 32
is used.