Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect world age range expansion #26514

Merged
merged 1 commit into from
Mar 19, 2018
Merged

Fix incorrect world age range expansion #26514

merged 1 commit into from
Mar 19, 2018

Conversation

Keno
Copy link
Member

@Keno Keno commented Mar 19, 2018

This is a bit of a tricky one, so let me start at the beginning. In
PR #25506, I pushed my WIP of a new inlining passes. However, one of
the things it temporarily doesn't do is turn call sites to :invoke,
causing significantly more jl_apply_generic calls to be codegened.
On CI for that PR (as well as locally), we non-deterministically see
errors like:

LoadError("sysimg.jl", 213, KeyError(LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433]))

Some investigation revealed that what happened here is that a PkgId
got placed in a dictionary, but was later failed to be retrieved
because its hash had supposedly changes. Further investigation reveals
that while the hash function used to place the item in the dictionary
was hash(x::Union{String,SubString{String}}, h::UInt64) in hashing2.jl,
the hash function being used to retrieve it is hash(@nospecialize(x), h::UInt64)
in hashing.jl. The former is loaded later than the latter and should thus invalidate
all specializations of the latter, making them ineligible for selection by
jl_apply_generic. However, looking at the appropriate age ranges showed that
typeof(hash).name.mt.cache had entries for both hash functions with overlapping
age ranges (which is obviously incorrect).

Tracking age range updates, it turns out the world age range for the specialization
of hash(@nospecialize(x), h::UInt64) was expanded by invalidate_backedges called
upon the definition of a hash function inside LibGit2. This seems wrong. I believe
invalidate_backedges should only ever truncate world age ranges rather than expanding
them. This patch simply does that.

The non-determinism of the original issue appears to have to do with which of the
specializations happen to be in the jl_lookup_generic_ call_cache fast-path.

This issue is fairly hard to reproduce because it requires a very specific confluence
of circumstances. Since the range of the captured specialization only gets extended
to before the min_world age of the new definition, it is never visible in the latest
world (e.g. at the REPL). The included test case demonstrates the issue by capturing
the world age with a task. Commenting out the f(x::Float64) definition makes
the test pass, because it is that definition that causes the expansion of the original
specialization of f.

This is a bit of a tricky one, so let me start at the beginning. In
PR #25506, I pushed my WIP of a new inlining passes. However, one of
the things it temporarily doesn't do is turn call sites to :invoke,
causing significantly more jl_apply_generic calls to be codegened.
On CI for that PR (as well as locally), we non-deterministically see
errors like:
```
LoadError("sysimg.jl", 213, KeyError(LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433]))
```
Some investigation revealed that what happened here is that a `PkgId`
got placed in a dictionary, but was later failed to be retrieved
because its hash had supposedly changes. Further investigation reveals
that while the hash function used to place the item in the dictionary
was `hash(x::Union{String,SubString{String}}, h::UInt64)` in hashing2.jl,
the hash function being used to retrieve it is `hash(@nospecialize(x), h::UInt64)`
in hashing.jl. The former is loaded later than the latter and should thus invalidate
all specializations of the latter, making them ineligible for selection by
jl_apply_generic. However, looking at the appropriate age ranges showed that
`typeof(hash).name.mt.cache` had entries for both hash functions with overlapping
age ranges (which is obviously incorrect).

Tracking age range updates, it turns out the world age range for the specialization
of `hash(@nospecialize(x), h::UInt64)` was expanded by invalidate_backedges called
upon the definition of a hash function inside `LibGit2`. This seems wrong. I believe
invalidate_backedges should only ever truncate world age ranges rather than expanding
them. This patch simply does that.

The non-determinism of the original issue appears to have to do with which of the
specializations happen to be in the `jl_lookup_generic_` `call_cache` fast-path.

This issue is fairly hard to reproduce because it requires a very specific confluence
of circumstances. Since the range of the captured specialization only gets extended
to before the min_world age of the new definition, it is never visible in the latest
world (e.g. at the REPL). The included test case demonstrates the issue by capturing
the world age with a task. Commenting out the `f(x::Float64)` definition makes
the test pass, because it is that definition that causes the expansion of the original
specialization of `f`.
@Keno Keno merged commit 8812682 into master Mar 19, 2018
@martinholters martinholters deleted the kf/invalidageupdate branch March 19, 2018 14:34
@tkoolen
Copy link
Contributor

tkoolen commented Mar 19, 2018

Maybe this will fix #21653? I'll make a note to test the example from #21653 (comment) before and after this commit once I get things working on master.

@Keno
Copy link
Member Author

Keno commented Mar 19, 2018

It's possible since the age range invariants were being violated by this bug, which can easily trigger more problems down the line. I can't promise it though.

ararslan pushed a commit that referenced this pull request Apr 26, 2018
This is a bit of a tricky one, so let me start at the beginning. In
PR #25506, I pushed my WIP of a new inlining passes. However, one of
the things it temporarily doesn't do is turn call sites to :invoke,
causing significantly more jl_apply_generic calls to be codegened.
On CI for that PR (as well as locally), we non-deterministically see
errors like:
```
LoadError("sysimg.jl", 213, KeyError(LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433]))
```
Some investigation revealed that what happened here is that a `PkgId`
got placed in a dictionary, but was later failed to be retrieved
because its hash had supposedly changes. Further investigation reveals
that while the hash function used to place the item in the dictionary
was `hash(x::Union{String,SubString{String}}, h::UInt64)` in hashing2.jl,
the hash function being used to retrieve it is `hash(@nospecialize(x), h::UInt64)`
in hashing.jl. The former is loaded later than the latter and should thus invalidate
all specializations of the latter, making them ineligible for selection by
jl_apply_generic. However, looking at the appropriate age ranges showed that
`typeof(hash).name.mt.cache` had entries for both hash functions with overlapping
age ranges (which is obviously incorrect).

Tracking age range updates, it turns out the world age range for the specialization
of `hash(@nospecialize(x), h::UInt64)` was expanded by invalidate_backedges called
upon the definition of a hash function inside `LibGit2`. This seems wrong. I believe
invalidate_backedges should only ever truncate world age ranges rather than expanding
them. This patch simply does that.

The non-determinism of the original issue appears to have to do with which of the
specializations happen to be in the `jl_lookup_generic_` `call_cache` fast-path.

This issue is fairly hard to reproduce because it requires a very specific confluence
of circumstances. Since the range of the captured specialization only gets extended
to before the min_world age of the new definition, it is never visible in the latest
world (e.g. at the REPL). The included test case demonstrates the issue by capturing
the world age with a task. Commenting out the `f(x::Float64)` definition makes
the test pass, because it is that definition that causes the expansion of the original
specialization of `f`.

Ref #26514
(cherry picked from commit 90a2162)
ararslan pushed a commit that referenced this pull request Apr 26, 2018
This is a bit of a tricky one, so let me start at the beginning. In
PR #25506, I pushed my WIP of a new inlining passes. However, one of
the things it temporarily doesn't do is turn call sites to :invoke,
causing significantly more jl_apply_generic calls to be codegened.
On CI for that PR (as well as locally), we non-deterministically see
errors like:
```
LoadError("sysimg.jl", 213, KeyError(LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433]))
```
Some investigation revealed that what happened here is that a `PkgId`
got placed in a dictionary, but was later failed to be retrieved
because its hash had supposedly changes. Further investigation reveals
that while the hash function used to place the item in the dictionary
was `hash(x::Union{String,SubString{String}}, h::UInt64)` in hashing2.jl,
the hash function being used to retrieve it is `hash(@nospecialize(x), h::UInt64)`
in hashing.jl. The former is loaded later than the latter and should thus invalidate
all specializations of the latter, making them ineligible for selection by
jl_apply_generic. However, looking at the appropriate age ranges showed that
`typeof(hash).name.mt.cache` had entries for both hash functions with overlapping
age ranges (which is obviously incorrect).

Tracking age range updates, it turns out the world age range for the specialization
of `hash(@nospecialize(x), h::UInt64)` was expanded by invalidate_backedges called
upon the definition of a hash function inside `LibGit2`. This seems wrong. I believe
invalidate_backedges should only ever truncate world age ranges rather than expanding
them. This patch simply does that.

The non-determinism of the original issue appears to have to do with which of the
specializations happen to be in the `jl_lookup_generic_` `call_cache` fast-path.

This issue is fairly hard to reproduce because it requires a very specific confluence
of circumstances. Since the range of the captured specialization only gets extended
to before the min_world age of the new definition, it is never visible in the latest
world (e.g. at the REPL). The included test case demonstrates the issue by capturing
the world age with a task. Commenting out the `f(x::Float64)` definition makes
the test pass, because it is that definition that causes the expansion of the original
specialization of `f`.

Ref #26514
(cherry picked from commit 90a2162)
ararslan pushed a commit that referenced this pull request Apr 27, 2018
This is a bit of a tricky one, so let me start at the beginning. In
PR #25506, I pushed my WIP of a new inlining passes. However, one of
the things it temporarily doesn't do is turn call sites to :invoke,
causing significantly more jl_apply_generic calls to be codegened.
On CI for that PR (as well as locally), we non-deterministically see
errors like:
```
LoadError("sysimg.jl", 213, KeyError(LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433]))
```
Some investigation revealed that what happened here is that a `PkgId`
got placed in a dictionary, but was later failed to be retrieved
because its hash had supposedly changes. Further investigation reveals
that while the hash function used to place the item in the dictionary
was `hash(x::Union{String,SubString{String}}, h::UInt64)` in hashing2.jl,
the hash function being used to retrieve it is `hash(@nospecialize(x), h::UInt64)`
in hashing.jl. The former is loaded later than the latter and should thus invalidate
all specializations of the latter, making them ineligible for selection by
jl_apply_generic. However, looking at the appropriate age ranges showed that
`typeof(hash).name.mt.cache` had entries for both hash functions with overlapping
age ranges (which is obviously incorrect).

Tracking age range updates, it turns out the world age range for the specialization
of `hash(@nospecialize(x), h::UInt64)` was expanded by invalidate_backedges called
upon the definition of a hash function inside `LibGit2`. This seems wrong. I believe
invalidate_backedges should only ever truncate world age ranges rather than expanding
them. This patch simply does that.

The non-determinism of the original issue appears to have to do with which of the
specializations happen to be in the `jl_lookup_generic_` `call_cache` fast-path.

This issue is fairly hard to reproduce because it requires a very specific confluence
of circumstances. Since the range of the captured specialization only gets extended
to before the min_world age of the new definition, it is never visible in the latest
world (e.g. at the REPL). The included test case demonstrates the issue by capturing
the world age with a task. Commenting out the `f(x::Float64)` definition makes
the test pass, because it is that definition that causes the expansion of the original
specialization of `f`.

Ref #26514
(cherry picked from commit 90a2162)
@ararslan ararslan mentioned this pull request May 7, 2018
7 tasks
@ararslan ararslan mentioned this pull request Jun 17, 2018
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants