compiler stack usage improvements #6239

abadams · 2021-09-10T01:56:41Z

I found some code in the wild that needs 9 MB of stack to lower. It's a
pain to even diagnose the problem definitively, because it requires
plumbing platform-specific linker flags through the build system to
grant more stack to rule out infinite recursion.

This PR:

- Reduces peak stack usage of similar code in the open source repo
- Increases the stack size for lowering and codegen to 32 MB on all
  platforms, using stack switching techniques. We started doing this on
  Windows a while ago and it hasn't bitten us, so let's try on more
  platforms.
- Gives user control over the amount of stack used for lowering and
  codegen. It shouldn't be necessary except when diagnosing problems like
  this in future.
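
As a rough illustration of what "stack switching" can look like on POSIX systems, here is a minimal sketch built on `getcontext`/`makecontext`/`swapcontext`. It is not the code from this PR: `run_with_stack`, `trampoline`, `pending_work`, and `recurse` are hypothetical names, the stack is a plain heap allocation with no guard page, and the work item is handed over through a thread-local rather than through `makecontext`'s variadic arguments.

```cpp
#include <ucontext.h>

#include <cstddef>
#include <functional>
#include <memory>

namespace {

// Hypothetical helper state. The work item is handed over through a
// thread-local rather than through makecontext's variadic int arguments.
thread_local const std::function<void()> *pending_work = nullptr;

void trampoline() {
    // The work must not throw: there is no caller frame on this stack to unwind into.
    (*pending_work)();
}

}  // namespace

// Runs `work` to completion on a temporary stack of `stack_bytes` bytes.
// Returns false if the context could not be set up or switched to.
bool run_with_stack(std::size_t stack_bytes, const std::function<void()> &work) {
    std::unique_ptr<char[]> stack(new char[stack_bytes]);

    ucontext_t caller, callee;
    if (getcontext(&callee) != 0) {
        return false;
    }
    callee.uc_stack.ss_sp = stack.get();
    callee.uc_stack.ss_size = stack_bytes;
    callee.uc_link = &caller;  // Resume the caller when trampoline returns.

    pending_work = &work;
    makecontext(&callee, trampoline, 0);
    const bool ok = swapcontext(&caller, &callee) == 0;
    pending_work = nullptr;
    return ok;
}

// Deliberately stack-hungry recursion: roughly 1 KB of frame per level.
int recurse(int depth) {
    volatile char pad[1024];
    pad[0] = static_cast<char>(depth);
    if (depth == 0) {
        return 0;
    }
    // Reading pad after the call keeps the frame live across the recursion.
    return 1 + recurse(depth - 1) + (pad[0] - pad[0]);
}
```

Calling `run_with_stack(32u << 20, [&] { result = recurse(10000); })` runs a recursion that needs on the order of 10 MB of stack, more than a typical 8 MB default, on a 32 MB stack regardless of the process limit.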

Using the control I was able to determine that the correctness tests all
pass with 500k of stack, and the apps all pass with 2MB, so 32MB ought
to be enough for anybody.

I found a never-checked-in test for the mux helper which uses 10MB of
stack and really shouldn't need to, so I added that (and opened an
issue) as an example of how to grant more stack when necessary, even
though 10MB is less than our default now.

Also fixed an incorrect comment on the Block node.

I found some code in the wild that needs 9mb of stack to lower. It's a pain to even diagnose the problem definitively, because it requires plumbing platform-specific linker flags to grant more stack.

This commit:

- Reduces peak stack usage of similar code in the repo (the FFT)
- Increases the stack size for lowering and codegen to 32mb on all platforms, using stack switching techniques. We started doing this on Windows a while ago and it hasn't bitten us, so let's try on more platforms.
- Gives user control over the amount of stack used for lowering and codegen. It shouldn't be necessary except when diagnosing problems like this in future.

Using the control I was able to determine that the correctness tests all pass with 500k of stack, and the apps all pass with 1MB, so 32MB ought to be enough for anybody.

I found a never-checked-in test for the mux helper which uses 10MB of stack and really shouldn't need to, so I added that (and opened an issue) as an example of how to grant more stack when necessary, even though 10MB is less than our default now.

Also fixed an incorrect comment on the Block node.

abadams · 2021-09-10T16:38:34Z

For the pathological case found in the wild, this reduces stack usage to 3-4 MB, so this PR is not just about increasing the stack size.

alexreinking · 2021-09-10T19:33:49Z

src/AsyncProducers.cpp

@@ -457,28 +457,51 @@ class TightenProducerConsumerNodes : public IRMutator {

    Stmt make_producer_consumer(const string &name, bool is_producer, Stmt body, const Scope<int> &scope) {
        if (const LetStmt *let = body.as<LetStmt>()) {
-            if (expr_uses_vars(let->value, scope)) {
-                return ProducerConsumer::make(name, is_producer, body);
+            Stmt orig = body;  // Only used to keep a reference to the let chain in scope.


but why is that necessary?

Updated comment. It's because 'body' may be the only reference-counted Stmt keeping the first LetStmt alive, but we're mutating body to point to its innards. We don't want that first LetStmt to have its refcount hit zero and make our pointer dangle.
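
The lifetime issue described above can be sketched outside Halide with `std::shared_ptr` standing in for the reference-counted `Stmt`. This is an illustration under assumptions, not the code in `AsyncProducers.cpp`: `Let`, `make_let`, and `let_names` are hypothetical, and the sketch assumes the function collects raw pointers to the nodes while walking down the chain.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted chain of LetStmt nodes.
struct Let {
    std::string name;
    std::shared_ptr<const Let> body;
};
using Stmt = std::shared_ptr<const Let>;

Stmt make_let(std::string name, Stmt body) {
    return std::make_shared<const Let>(Let{std::move(name), std::move(body)});
}

// Collects raw pointers to every node while advancing `body` inward,
// then reads through those pointers afterwards.
std::vector<std::string> let_names(Stmt body) {
    // Only used to keep the head of the chain alive. Without it, `body`
    // may be the sole owner of the first node, and reassigning `body`
    // below would free that node while `frames` still points at it.
    Stmt orig = body;

    std::vector<const Let *> frames;
    while (body) {
        frames.push_back(body.get());
        body = body->body;  // Drops this handle's reference to the outer node.
    }

    std::vector<std::string> names;
    for (const Let *let : frames) {
        names.push_back(let->name);  // Safe only because `orig` pins the chain.
    }
    return names;
}
```

Because each node owns its body, pinning the head with `orig` is enough to keep every inner node alive for the duration of the walk.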

src/Util.cpp

alexreinking · 2021-09-10T19:46:04Z

> I found some code in the wild that needs 9mb of stack to lower. It's a
> pain to even diagnose the problem definitively, because it requires
> plumbing platform-specific linker flags though the build system to
> grant more stack to rule out infinite recursion.

Can this not be diagnosed with ulimit on Linux?

abadams · 2021-09-11T04:55:52Z

> > I found some code in the wild that needs 9mb of stack to lower. It's a
> > pain to even diagnose the problem definitively, because it requires
> > plumbing platform-specific linker flags though the build system to
> > grant more stack to rule out infinite recursion.
>
> Can this not be diagnosed with ulimit on Linux?

It depends. For the person with the problem it couldn't be done without first messing with system settings to give the initial shell more than 8MB of stack to begin with, because ulimit only goes down.

alexreinking · 2021-09-11T17:42:51Z

> It depends. For the person with the problem it couldn't be done without first messing with system settings to give the initial shell more than 8MB of stack to begin with, because ulimit only goes down.

On my system at least I start with an 8MB stack, but I can increase it via ulimit -s unlimited. This was necessary for several of the Haskell benchmarks in the Perceus paper since it would routinely use up too much stack space.
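
Whether `ulimit -s` can go up depends on the gap between the soft and hard limits: an unprivileged process may raise its soft limit only as far as the hard limit. A small sketch for inspecting both from C++ on a POSIX system follows; `StackLimits`, `query_stack_limits`, and `describe_limit` are hypothetical helper names, not part of Halide.

```cpp
#include <sys/resource.h>

#include <optional>
#include <string>

// Hypothetical helper type: the soft (current) and hard (maximum) stack limits.
struct StackLimits {
    rlim_t soft;
    rlim_t hard;
};

// Returns the process's stack limits, or nullopt if the query fails.
std::optional<StackLimits> query_stack_limits() {
    rlimit rl{};
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return std::nullopt;
    }
    return StackLimits{rl.rlim_cur, rl.rlim_max};
}

// Formats a limit the way a shell might: a size in KB, or "unlimited".
std::string describe_limit(rlim_t value) {
    if (value == RLIM_INFINITY) {
        return "unlimited";
    }
    return std::to_string(static_cast<unsigned long long>(value) / 1024) + " KB";
}
```

If the hard limit reports as "unlimited", raising the soft limit from the shell should work. If the hard limit equals the soft limit, the limit can only be lowered without elevated privileges.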

abadams · 2021-09-11T17:48:17Z

IIRC that did not work for the reporter and me (possibly because their build system complicates the issue by having an already-launched persistent daemon). Either way, it's good to have a mechanism to let the compiler use more stack that works on every platform.

alexreinking · 2021-09-12T05:22:05Z

> Either way, it's good to have a mechanism to let the compiler use more stack that works on every platform.

Sorry, I didn't mean to argue against the PR with that question. It was just for clarification.

alexreinking · 2021-09-14T04:14:36Z

Hmm, looks like we're seeing a real failure on the new test with LLVM 14 on Windows.

The macOS failure is unrelated.

abadams · 2021-09-14T17:19:26Z

Maybe 10 MB is too tight for Windows. I bumped it to 12 and we'll see if it passes.

steven-johnson · 2021-09-14T18:38:15Z

src/SkipStages.cpp

@@ -78,37 +78,48 @@ class PredicateFinder : public IRVisitor {
        op->body.accept(this);
        if (should_pop) {
            varying.pop(op->name);
-            //internal_assert(!expr_uses_var(predicate, op->name));
+            // internal_assert(!expr_uses_var(predicate, op->name));


We should either remove this line or add a comment about why we aren't doing this check.

steven-johnson · 2021-09-14T18:38:48Z

src/UnrollLoops.cpp

@@ -83,6 +83,11 @@ class UnrollLoops : public IRMutator {
            Stmt iters;
            for (int i = e->value - 1; i >= 0; i--) {
                Stmt iter = substitute(for_loop->name, for_loop->min + i, body);
+                // It's necessary to simplify eagerly this iteration


nit: "eagerly simplify"

steven-johnson · 2021-09-14T18:41:48Z

src/Util.cpp

+    void *stack = mmap(nullptr, stack_size.size + guard_band, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+    internal_assert(stack);
+
+    mprotect((char *)stack + stack_size.size, guard_band, PROT_NONE);


Shouldn't we check the return value?
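
For reference, `mmap` reports failure by returning `MAP_FAILED` rather than a null pointer, and `mprotect` returns -1 on failure, so a null check alone would not catch an allocation failure. A minimal sketch of the checked version is below. It follows the layout in the diff (guard band after the usable region), while `GuardedRegion`, `allocate_guarded`, and `release_guarded` are hypothetical names, and it assumes the usable size is a multiple of the page size.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>

// Hypothetical description of a mapped region followed by a guard band.
struct GuardedRegion {
    void *base;
    std::size_t usable;
    std::size_t guard;
};

// Maps `usable` read/write bytes followed by one inaccessible guard page.
// Returns false (and maps nothing) if any step fails.
bool allocate_guarded(std::size_t usable, GuardedRegion *out) {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || usable == 0 || usable % static_cast<std::size_t>(page) != 0) {
        return false;  // mprotect needs a page-aligned address.
    }
    const std::size_t guard = static_cast<std::size_t>(page);

    void *base = mmap(nullptr, usable + guard, PROT_READ | PROT_WRITE,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED) {  // Not nullptr: mmap signals failure with MAP_FAILED.
        return false;
    }
    if (mprotect(static_cast<char *>(base) + usable, guard, PROT_NONE) != 0) {
        munmap(base, usable + guard);
        return false;
    }
    out->base = base;
    out->usable = usable;
    out->guard = guard;
    return true;
}

// Unmaps the whole region, guard band included. Returns true on success.
bool release_guarded(const GuardedRegion &region) {
    return munmap(region.base, region.usable + region.guard) == 0;
}
```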

steven-johnson · 2021-09-15T20:37:39Z

Looks like we're getting segfaults on osx-arm64

abadams · 2021-09-16T18:18:58Z

On Arm macOS, the dubious function pointer casting that caused the deprecation of these routines was indeed invalid: only the bottom 32 bits made it through the variadic call. I switched it to pass the args in a thread-local instead, which should be safer.
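
To make the width problem concrete: `makecontext` passes its extra arguments as `int`, so a 64-bit pointer does not fit in one of them. The sketch below is illustrative only and uses hypothetical helpers (`through_one_int`, `split_pointer`, `join_pointer`). It shows what survives a single 32-bit slot, plus the traditional split-into-two-halves workaround, which is not what this PR does.

```cpp
#include <cstdint>

// What remains of a pointer after being squeezed through one 32-bit slot.
void *through_one_int(const void *p) {
    const auto bits = reinterpret_cast<std::uintptr_t>(p);
    const auto low = static_cast<std::uint32_t>(bits);  // High bits are dropped here.
    return reinterpret_cast<void *>(static_cast<std::uintptr_t>(low));
}

// The traditional workaround: carry the pointer as two 32-bit halves.
struct PointerHalves {
    std::uint32_t lo;
    std::uint32_t hi;
};

PointerHalves split_pointer(const void *p) {
    const std::uint64_t bits = reinterpret_cast<std::uintptr_t>(p);
    return PointerHalves{static_cast<std::uint32_t>(bits & 0xffffffffu),
                         static_cast<std::uint32_t>(bits >> 32)};
}

void *join_pointer(PointerHalves halves) {
    const std::uint64_t bits = (static_cast<std::uint64_t>(halves.hi) << 32) | halves.lo;
    return reinterpret_cast<void *>(static_cast<std::uintptr_t>(bits));
}
```

A thread-local avoids the question entirely, since nothing pointer-sized has to pass through the variadic call.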

steven-johnson · 2021-09-17T18:10:30Z

Looks like correctness_unroll_huge_mux is segfaulting on llvm14-x86-64-windows

alexreinking · 2021-09-30T19:46:08Z

I think it's time to CRD into the windows buildbot and just binary search our way via the env-var to the right stack size for the mux test case.

abadams · 2021-09-30T19:54:37Z

Windows is currently passing.

alexreinking · 2021-09-30T20:18:21Z

> Windows is currently passing.

Oh, good. I saw a Windows failure earlier, but maybe it was re-run?

abadams added 6 commits September 9, 2021 16:32

Fixes for macos

773ecc4

Add test to cmake

325ccf8

Fix type of temporary

39cec7d

Reduce number of exprs in the mux

2f0749a

Fix quadratic memory usage in new test

b03a347

abadams mentioned this pull request Sep 10, 2021

unroll_huge_mux test uses more than 8MB of stack #6238

Open

abadams requested a review from alexreinking September 10, 2021 16:07

alexreinking requested changes Sep 10, 2021

abadams added 2 commits September 12, 2021 10:57

Better comment

0e24ddb

Variable name fix

ea35bee

alexreinking approved these changes Sep 12, 2021

alexreinking added the release_notes label Sep 12, 2021

Try giving windows a little more stack

0715979

Clarify why we want a live Stmt in scope

1adb652

steven-johnson reviewed Sep 14, 2021

Review comments

32d4953

steven-johnson reviewed Sep 14, 2021

abadams and others added 2 commits September 14, 2021 11:47

Check some return values

6fe568e

tickle buildbots

76e020a

abadams added 2 commits September 16, 2021 11:10

Fixes for arm macos

72fc46c

Remove stray character

748f87c

clang-tidy had some reasonable concerns

46854ab

abadams added 3 commits September 17, 2021 11:28

Comment fix

429c0e3

Maybe windows needs yet more stack

083d389

Merge remote-tracking branch 'origin/master' into abadams/stack_size

d87c943

abadams merged commit 81ad45e into master Oct 1, 2021

abadams deleted the abadams/stack_size branch October 1, 2021 21:41

alexreinking removed the release_notes label Apr 6, 2022
