-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect world age range expansion #26514
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is a bit of a tricky one, so let me start at the beginning. In PR #25506, I pushed my WIP of a new inlining passes. However, one of the things it temporarily doesn't do is turn call sites to :invoke, causing significantly more jl_apply_generic calls to be codegened. On CI for that PR (as well as locally), we non-deterministically see errors like: ``` LoadError("sysimg.jl", 213, KeyError(LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433])) ``` Some investigation revealed that what happened here is that a `PkgId` got placed in a dictionary, but was later failed to be retrieved because its hash had supposedly changes. Further investigation reveals that while the hash function used to place the item in the dictionary was `hash(x::Union{String,SubString{String}}, h::UInt64)` in hashing2.jl, the hash function being used to retrieve it is `hash(@nospecialize(x), h::UInt64)` in hashing.jl. The former is loaded later than the latter and should thus invalidate all specializations of the latter, making them ineligible for selection by jl_apply_generic. However, looking at the appropriate age ranges showed that `typeof(hash).name.mt.cache` had entries for both hash functions with overlapping age ranges (which is obviously incorrect). Tracking age range updates, it turns out the world age range for the specialization of `hash(@nospecialize(x), h::UInt64)` was expanded by invalidate_backedges called upon the definition of a hash function inside `LibGit2`. This seems wrong. I believe invalidate_backedges should only ever truncate world age ranges rather than expanding them. This patch simply does that. The non-determinism of the original issue appears to have to do with which of the specializations happen to be in the `jl_lookup_generic_` `call_cache` fast-path. This issue is fairly hard to reproduce because it requires a very specific confluence of circumstances. Since the range of the captured specialization only gets extended to before the min_world age of the new definition, it is never visible in the latest world (e.g. at the REPL). The included test case demonstrates the issue by capturing the world age with a task. Commenting out the `f(x::Float64)` definition makes the test pass, because it is that definition that causes the expansion of the original specialization of `f`.
Keno
force-pushed
the
kf/invalidageupdate
branch
from
March 19, 2018 03:17
0726273
to
90a2162
Compare
vtjnash
approved these changes
Mar 19, 2018
Maybe this will fix #21653? I'll make a note to test the example from #21653 (comment) before and after this commit once I get things working on master. |
It's possible since the age range invariants were being violated by this bug, which can easily trigger more problems down the line. I can't promise it though. |
ararslan
pushed a commit
that referenced
this pull request
Apr 26, 2018
This is a bit of a tricky one, so let me start at the beginning. In PR #25506, I pushed my WIP of a new inlining passes. However, one of the things it temporarily doesn't do is turn call sites to :invoke, causing significantly more jl_apply_generic calls to be codegened. On CI for that PR (as well as locally), we non-deterministically see errors like: ``` LoadError("sysimg.jl", 213, KeyError(LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433])) ``` Some investigation revealed that what happened here is that a `PkgId` got placed in a dictionary, but was later failed to be retrieved because its hash had supposedly changes. Further investigation reveals that while the hash function used to place the item in the dictionary was `hash(x::Union{String,SubString{String}}, h::UInt64)` in hashing2.jl, the hash function being used to retrieve it is `hash(@nospecialize(x), h::UInt64)` in hashing.jl. The former is loaded later than the latter and should thus invalidate all specializations of the latter, making them ineligible for selection by jl_apply_generic. However, looking at the appropriate age ranges showed that `typeof(hash).name.mt.cache` had entries for both hash functions with overlapping age ranges (which is obviously incorrect). Tracking age range updates, it turns out the world age range for the specialization of `hash(@nospecialize(x), h::UInt64)` was expanded by invalidate_backedges called upon the definition of a hash function inside `LibGit2`. This seems wrong. I believe invalidate_backedges should only ever truncate world age ranges rather than expanding them. This patch simply does that. The non-determinism of the original issue appears to have to do with which of the specializations happen to be in the `jl_lookup_generic_` `call_cache` fast-path. This issue is fairly hard to reproduce because it requires a very specific confluence of circumstances. Since the range of the captured specialization only gets extended to before the min_world age of the new definition, it is never visible in the latest world (e.g. at the REPL). The included test case demonstrates the issue by capturing the world age with a task. Commenting out the `f(x::Float64)` definition makes the test pass, because it is that definition that causes the expansion of the original specialization of `f`. Ref #26514 (cherry picked from commit 90a2162)
ararslan
pushed a commit
that referenced
this pull request
Apr 26, 2018
This is a bit of a tricky one, so let me start at the beginning. In PR #25506, I pushed my WIP of a new inlining passes. However, one of the things it temporarily doesn't do is turn call sites to :invoke, causing significantly more jl_apply_generic calls to be codegened. On CI for that PR (as well as locally), we non-deterministically see errors like: ``` LoadError("sysimg.jl", 213, KeyError(LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433])) ``` Some investigation revealed that what happened here is that a `PkgId` got placed in a dictionary, but was later failed to be retrieved because its hash had supposedly changes. Further investigation reveals that while the hash function used to place the item in the dictionary was `hash(x::Union{String,SubString{String}}, h::UInt64)` in hashing2.jl, the hash function being used to retrieve it is `hash(@nospecialize(x), h::UInt64)` in hashing.jl. The former is loaded later than the latter and should thus invalidate all specializations of the latter, making them ineligible for selection by jl_apply_generic. However, looking at the appropriate age ranges showed that `typeof(hash).name.mt.cache` had entries for both hash functions with overlapping age ranges (which is obviously incorrect). Tracking age range updates, it turns out the world age range for the specialization of `hash(@nospecialize(x), h::UInt64)` was expanded by invalidate_backedges called upon the definition of a hash function inside `LibGit2`. This seems wrong. I believe invalidate_backedges should only ever truncate world age ranges rather than expanding them. This patch simply does that. The non-determinism of the original issue appears to have to do with which of the specializations happen to be in the `jl_lookup_generic_` `call_cache` fast-path. This issue is fairly hard to reproduce because it requires a very specific confluence of circumstances. Since the range of the captured specialization only gets extended to before the min_world age of the new definition, it is never visible in the latest world (e.g. at the REPL). The included test case demonstrates the issue by capturing the world age with a task. Commenting out the `f(x::Float64)` definition makes the test pass, because it is that definition that causes the expansion of the original specialization of `f`. Ref #26514 (cherry picked from commit 90a2162)
ararslan
pushed a commit
that referenced
this pull request
Apr 27, 2018
This is a bit of a tricky one, so let me start at the beginning. In PR #25506, I pushed my WIP of a new inlining passes. However, one of the things it temporarily doesn't do is turn call sites to :invoke, causing significantly more jl_apply_generic calls to be codegened. On CI for that PR (as well as locally), we non-deterministically see errors like: ``` LoadError("sysimg.jl", 213, KeyError(LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433])) ``` Some investigation revealed that what happened here is that a `PkgId` got placed in a dictionary, but was later failed to be retrieved because its hash had supposedly changes. Further investigation reveals that while the hash function used to place the item in the dictionary was `hash(x::Union{String,SubString{String}}, h::UInt64)` in hashing2.jl, the hash function being used to retrieve it is `hash(@nospecialize(x), h::UInt64)` in hashing.jl. The former is loaded later than the latter and should thus invalidate all specializations of the latter, making them ineligible for selection by jl_apply_generic. However, looking at the appropriate age ranges showed that `typeof(hash).name.mt.cache` had entries for both hash functions with overlapping age ranges (which is obviously incorrect). Tracking age range updates, it turns out the world age range for the specialization of `hash(@nospecialize(x), h::UInt64)` was expanded by invalidate_backedges called upon the definition of a hash function inside `LibGit2`. This seems wrong. I believe invalidate_backedges should only ever truncate world age ranges rather than expanding them. This patch simply does that. The non-determinism of the original issue appears to have to do with which of the specializations happen to be in the `jl_lookup_generic_` `call_cache` fast-path. This issue is fairly hard to reproduce because it requires a very specific confluence of circumstances. Since the range of the captured specialization only gets extended to before the min_world age of the new definition, it is never visible in the latest world (e.g. at the REPL). The included test case demonstrates the issue by capturing the world age with a task. Commenting out the `f(x::Float64)` definition makes the test pass, because it is that definition that causes the expansion of the original specialization of `f`. Ref #26514 (cherry picked from commit 90a2162)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a bit of a tricky one, so let me start at the beginning. In
PR #25506, I pushed my WIP of a new inlining passes. However, one of
the things it temporarily doesn't do is turn call sites to :invoke,
causing significantly more jl_apply_generic calls to be codegened.
On CI for that PR (as well as locally), we non-deterministically see
errors like:
Some investigation revealed that what happened here is that a
PkgId
got placed in a dictionary, but was later failed to be retrieved
because its hash had supposedly changes. Further investigation reveals
that while the hash function used to place the item in the dictionary
was
hash(x::Union{String,SubString{String}}, h::UInt64)
in hashing2.jl,the hash function being used to retrieve it is
hash(@nospecialize(x), h::UInt64)
in hashing.jl. The former is loaded later than the latter and should thus invalidate
all specializations of the latter, making them ineligible for selection by
jl_apply_generic. However, looking at the appropriate age ranges showed that
typeof(hash).name.mt.cache
had entries for both hash functions with overlappingage ranges (which is obviously incorrect).
Tracking age range updates, it turns out the world age range for the specialization
of
hash(@nospecialize(x), h::UInt64)
was expanded by invalidate_backedges calledupon the definition of a hash function inside
LibGit2
. This seems wrong. I believeinvalidate_backedges should only ever truncate world age ranges rather than expanding
them. This patch simply does that.
The non-determinism of the original issue appears to have to do with which of the
specializations happen to be in the
jl_lookup_generic_
call_cache
fast-path.This issue is fairly hard to reproduce because it requires a very specific confluence
of circumstances. Since the range of the captured specialization only gets extended
to before the min_world age of the new definition, it is never visible in the latest
world (e.g. at the REPL). The included test case demonstrates the issue by capturing
the world age with a task. Commenting out the
f(x::Float64)
definition makesthe test pass, because it is that definition that causes the expansion of the original
specialization of
f
.