Rewrite the skip stages lowering pass #8115

Merged (15 commits, Feb 27, 2024)

Conversation

@abadams (Member) commented Feb 21, 2024

Skip stages was slow due to poor computational complexity (quadratic?).

I reworked it into a two-pass linear-time algorithm. The first pass
records which pieces of IR are actually relevant to the task, and the
second pass performs the task using a bounds-inference-like algorithm.

On main, resnet50 spends 519 ms in this pass; this commit reduces it to
40 ms. Local laplacian with 100 pyramid levels spends 7.4 seconds in
this pass; this commit reduces it to ~3 ms.

This commit also moves the cache store for memoized Funcs into the
produce node, instead of at the top of the consume node, because it
naturally places it inside a condition you inject into the produce node.

Built on #8103, so don't bother reviewing until that's merged. I just want to get it tested on the bots.

This pattern has been bugging me for a long time:

```
if (scope.contains(key)) {  // first lookup
  Foo f = scope.get(key);   // second lookup of the same key
}
```

This redundantly looks up the key in the scope twice. I've finally
gotten around to fixing it. I've introduced a find method that returns
a const pointer to the value if it exists, or null otherwise. It also
searches any containing scopes, which are held by const pointer, so the
method has to return a const pointer.

```
if (const Foo *f = scope.find(key)) {  // single lookup; f is null on a miss
}
```

For cases where you want to get and then mutate, I added shallow_find,
which doesn't search enclosing scopes, but returns a mutable pointer.

We were also doing redundant scope lookups in ScopedBinding. We stored
the key in the helper object, and then did a pop on that key in the
ScopedBinding destructor. This commit changes Scope so that Scope::push
returns an opaque token that you can pass to Scope::pop to have it
remove that element without doing a fresh lookup. ScopedBinding now uses
this. Under the hood it's just an iterator on the underlying map (map
iterators are not invalidated on inserting or removing other stuff).

The net effect is to speed up local laplacian lowering by about 5%.

I also considered making it look more like an STL class, and having find
return an iterator, but that doesn't really work: the iterator it returns
might point to an entry in an enclosing scope, in which case you can't
compare it to the .end() of the scope you have. Scopes are different
enough from maps that the interface really needs to be distinct.
@zvookin (Member) commented Feb 21, 2024

Is there any chance this changes the values of, or the values available to, the cache key computation? (The latter would likely be a compile-time error. The former could introduce a change, likely a bug, in the computation.)

@abadams (Member, Author) commented Feb 21, 2024

I don't believe so. The change was from this:

```
compute cache key
perform cache lookup
realize Foo {
  produce Foo {
    if (cache miss) {
      compute Foo
    }
  }
  consume Foo {
    if (cache miss) {
      cache store
    }
    ...
  }
}
```

to this:

```
compute cache key
perform cache lookup
realize Foo {
  produce Foo {
    if (cache miss) {
      compute Foo
      cache store
    }
  }
  consume Foo {
    ...
  }
}
```

The cache lookup and key computation are in the same place as they were before.

@steven-johnson (Contributor) commented:

Ready to land pending green

@steven-johnson steven-johnson merged commit 36d74a8 into main Feb 27, 2024
3 checks passed
@steven-johnson steven-johnson deleted the abadams/rewrite_skip_stages branch February 27, 2024 01:57