Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid redundant scope lookups #8103

Merged
merged 5 commits into from
Feb 22, 2024
Merged

Avoid redundant scope lookups #8103

merged 5 commits into from
Feb 22, 2024

Conversation

abadams
Copy link
Member

@abadams abadams commented Feb 18, 2024

This pattern has been bugging me for a long time:

if (scope.contains(key)) {
  Foo f = scope.get(key);
}

This redundantly looks up the key in the scope twice. I've finally gotten around to fixing it. I've introduced a find method that either returns a const pointer to the value, if it exists, or null. It also searches any containing scopes, which are held by const pointer, so the method has to return a const pointer.

if (const Foo *f = scope.find(key)) {
}

For cases where you want to get and then mutate, I added shallow_find, which doesn't search enclosing scopes, but returns a mutable pointer.

We were also doing redundant scope lookups in ScopedBinding. We stored the key in the helper object, and then did a pop on that key in the ScopedBinding destructor. This commit changes Scope so that Scope::push returns an opaque token that you can pass to Scope::pop to have it remove that element without doing a fresh lookup. ScopedBinding now uses this. Under the hood it's just an iterator on the underlying map (map iterators are not invalidated on inserting or removing other stuff).

The net effect is to speed up local laplacian lowering by about 5%

I also considered making it look more like an stl class, and having find return an iterator, but it doesn't really work. The iterator it returns might point to an entry in an enclosing scope, in which case you can't compare it to the .end() method of the scope you have. Scopes are different enough from maps that the interface really needs to be distinct.

This pattern has been bugging me for a long time:

```
if (scope.contains(key)) {
  Foo f = scope.get(key);
}
```

This redundantly looks up the key in the scope twice. I've finally
gotten around to fixing it. I've introduced a find method that either
returns a const pointer to the value, if it exists, or null. It also
searches any containing scopes, which are held by const pointer, so the
method has to return a const pointer.

```
if (const Foo *f = scope.find(key)) {
}
```

For cases where you want to get and then mutate, I added shallow_find,
which doesn't search enclosing scopes, but returns a mutable pointer.

We were also doing redundant scope lookups in ScopedBinding. We stored
the key in the helper object, and then did a pop on that key in the
ScopedBinding destructor. This commit changes Scope so that Scope::push
returns an opaque token that you can pass to Scope::pop to have it
remove that element without doing a fresh lookup. ScopedBinding now uses
this. Under the hood it's just an iterator on the underlying map (map
iterators are not invalidated on inserting or removing other stuff).

The net effect is to speed up local laplacian lowering by about 5%

I also considered making it look more like an stl class, and having find
return an iterator, but it doesn't really work. The iterator it returns
might point to an entry in an enclosing scope, in which case you can't
compare it to the .end() method of the scope you have. Scopes are
different enough from maps that the interface really needs to be
distinct.
abadams added a commit that referenced this pull request Feb 18, 2024
For O(n) nested allocate nodes, this pass was quadratic in n, even if
there was no use of atomics. This commit rewrites it to use a
linear-time algorithm, and skips it entirely after the first validation
pass if there aren't any atomic nodes.

It also needlessly used IRGraphMutators, which slowed things down,
didn't handle LargeBuffers (could overflow in the allocation),
incorrectly thought every producer/consumer node was associated with an
output buffer, and didn't print the realization name when printing the
atomic node (the body of an atomic node is only atomic w.r.t. a specific
realization).

I noticed all this because it stuck out in a profile. For resnet 50, the
rewrite that changed to a linear algorithm took this stage from 185ms
down to 6.7ms, and then skipping it entirely when it doesn't find any
atomic nodes added 1.5 for the single IRVisitor check.

For local laplacian with 100 pyramid levels (which contains many nested
allocate nodes due to a large number of skip connections), the times are
5846 ms -> 16 ms -> 4.6 ms

This is built on top of #8103
@abadams abadams merged commit 57164df into main Feb 22, 2024
19 checks passed
abadams added a commit that referenced this pull request Mar 12, 2024
* Avoid redundant scope lookups

This pattern has been bugging me for a long time:

```
if (scope.contains(key)) {
  Foo f = scope.get(key);
}
```

This redundantly looks up the key in the scope twice. I've finally
gotten around to fixing it. I've introduced a find method that either
returns a const pointer to the value, if it exists, or null. It also
searches any containing scopes, which are held by const pointer, so the
method has to return a const pointer.

```
if (const Foo *f = scope.find(key)) {
}
```

For cases where you want to get and then mutate, I added shallow_find,
which doesn't search enclosing scopes, but returns a mutable pointer.

We were also doing redundant scope lookups in ScopedBinding. We stored
the key in the helper object, and then did a pop on that key in the
ScopedBinding destructor. This commit changes Scope so that Scope::push
returns an opaque token that you can pass to Scope::pop to have it
remove that element without doing a fresh lookup. ScopedBinding now uses
this. Under the hood it's just an iterator on the underlying map (map
iterators are not invalidated on inserting or removing other stuff).

The net effect is to speed up local laplacian lowering by about 5%

I also considered making it look more like an stl class, and having find
return an iterator, but it doesn't really work. The iterator it returns
might point to an entry in an enclosing scope, in which case you can't
compare it to the .end() method of the scope you have. Scopes are
different enough from maps that the interface really needs to be
distinct.

* Pacify clang-tidy

* Rewrite the pass that injects mutexes to support atomics

For O(n) nested allocate nodes, this pass was quadratic in n, even if
there was no use of atomics. This commit rewrites it to use a
linear-time algorithm, and skips it entirely after the first validation
pass if there aren't any atomic nodes.

It also needlessly used IRGraphMutators, which slowed things down,
didn't handle LargeBuffers (could overflow in the allocation),
incorrectly thought every producer/consumer node was associated with an
output buffer, and didn't print the realization name when printing the
atomic node (the body of an atomic node is only atomic w.r.t. a specific
realization).

I noticed all this because it stuck out in a profile. For resnet 50, the
rewrite that changed to a linear algorithm took this stage from 185ms
down to 6.7ms, and then skipping it entirely when it doesn't find any
atomic nodes added 1.5 for the single IRVisitor check.

For local laplacian with 100 pyramid levels (which contains many nested
allocate nodes due to a large number of skip connections), the times are
5846 ms -> 16 ms -> 4.6 ms

This is built on top of #8103

* Fix unintentional mutation of interval in scope

---------

Co-authored-by: Steven Johnson <srj@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants