Segmentation fault in GraphPPL.jl tests on 1.11, works fine in debugger #56459
Comments
The error
|
Can we get a self-contained MWE instead of referencing code on another repository? I'm asking this also because
is very much not true, as one also needs to copy a bunch of definitions from https://github.com/ReactiveBayes/GraphPPL.jl/blob/c97718a10bcf035cff093acf52ee9fe30f225b35/test/testutils.jl, and tracking down all the missing imports (of which there are a lot) isn't fun. Side note, sometimes starting julia with |
Ok, the problem with
Here is the most minimal example I could come up with, @giordano. However, the issue appears in the repository and I cannot create an MWE without the package:
using GraphPPL, Distributions
import GraphPPL: @model
@model function gcv(κ, ω, z, x, y)
    log_σ := κ * z + ω
    y ~ Normal(x, exp(log_σ))
end
@model function gcv_lm(y, x_prev, x_next, z, ω, κ)
    x_next ~ gcv(x = x_prev, z = z, ω = ω, κ = κ)
    y ~ Normal(x_next, 1)
end
@model function hgf(y)
    # Specify priors
    ξ ~ Gamma(1, 1)
    ω_1 ~ Normal(0, 1)
    ω_2 ~ Normal(0, 1)
    κ_1 ~ Normal(0, 1)
    κ_2 ~ Normal(0, 1)
    x_1[1] ~ Normal(0, 1)
    x_2[1] ~ Normal(0, 1)
    x_3[1] ~ Normal(0, 1)
    # Specify generative model
    for i in 2:(length(y) + 1)
        x_3[i] ~ Normal(x_3[i - 1], ξ)
        x_2[i] ~ gcv(x = x_2[i - 1], z = x_3[i], ω = ω_2, κ = κ_2)
        x_1[i] ~ gcv_lm(x_prev = x_1[i - 1], z = x_2[i], ω = ω_1, κ = κ_1, y = y[i - 1])
    end
end
function mwe()
    model = GraphPPL.Model(identity, GraphPPL.PluginsCollection(), GraphPPL.DefaultBackend())
    ctx = GraphPPL.getcontext(model)
    y = nothing
    for i in 1:10
        y = GraphPPL.getorcreate!(model, ctx, :y, i)
    end
    GraphPPL.add_terminated_submodel!(model, ctx, GraphPPL.NodeCreationOptions(), hgf, (y = y,), GraphPPL.static(1))
    return model
end
mwe() isa GraphPPL.Model

This code segfaults in 1.11. I also tried to manually debug it with no success. I also dev-ed all the dependencies and removed all the

for variable_node in variable_nodes
    add_edge!(model, factor_node_id, factor_node_propeties, variable_node, interface_name, index)
    index += increase_index(variable_node)
end

to

foreach(variable_nodes) do variable_node
    add_edge!(model, factor_node_id, factor_node_propeties, variable_node, interface_name, index)
    index += increase_index(variable_node)
end

fixes the problem and there is no segmentation fault. My CS expertise is not good enough to track down segmentation faults. |
While a reproducer should preferably be as small as possible (crafting a minimal reproducer, for example by binary search if you have no other clue, is already a large chunk of the work of hunting down a bug), saying "go and copy some code from somewhere else" doesn't work very well. I tried for like 10 minutes to build the example by copying the code piece by piece from the tests but gave up out of frustration because I'm not familiar with the codebase and didn't know what to do exactly. That said, the segfault doesn't seem to reproduce on |
Point taken. Indeed, I thought it would be easier; sorry for not preparing a better MWE. Nice to hear that it is fixed on master. I can try to run the bisection. Is there a script that simplifies this process? |
I usually use a variation of the following script with
#!/bin/bash
make cleanall || true
make -j60 USECCACHE=1 || exit 125
./usr/bin/julia --startup-file=no my_reproducer.jl
EXIT_CODE=$?
if [[ "${EXIT_CODE}" -eq 139 ]]; then
    # For git bisect we need to return an exit status less than 128, but if a
    # program segfaults with exit code 128+11=139 we return 11. Don't change
    # all other cases.
    exit 11
else
    exit "${EXIT_CODE}"
fi
|
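The exit-code remapping in the script above exists because `git bisect run` only interprets a limited range of exit statuses. As a rough sketch of that convention (the helper name `classify_bisect_status` is made up here for illustration and is not part of git or Julia), the rules from the git-bisect documentation can be written down like this:

```shell
# Hypothetical helper: report how `git bisect run` interprets an exit status.
# Convention from the git-bisect documentation:
#   0                      -> the commit is good
#   125                    -> the commit cannot be tested, skip it
#   1-127 (other than 125) -> the commit is bad
#   anything else          -> the bisection is aborted
classify_bisect_status() {
    local status="$1"
    if [ "$status" -eq 0 ]; then
        echo "good"
    elif [ "$status" -eq 125 ]; then
        echo "skip"
    elif [ "$status" -ge 1 ] && [ "$status" -le 127 ]; then
        echo "bad"
    else
        echo "abort"
    fi
}

classify_bisect_status 139   # raw segfault status (128 + SIGSEGV), prints "abort"
classify_bisect_status 11    # status after the remapping, prints "bad"
```

This is why a failed build exits with 125 (skip) and a segfault is reported as 11 rather than 139. Assuming the script is saved as `bisect.sh` (a placeholder name), a typical session would be `git bisect start`, `git bisect bad <bad-commit>`, `git bisect good <good-commit>`, and then `git bisect run ./bisect.sh`.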
Well I tried for quite some time to run git bisect (for a couple of hours given the compilation time), but it either says |
Releases are cut from branches, not from master. Find the first commit in the release 1.11 branch since the branching out; its parent will be on master. Also, check whether you can reproduce the bug on 1.11 alpha 0, 1 or whatever that's called; that gives you an idea of which direction to look in.
That's what I'm struggling with, I'm not sure how to do it |
From the GitHub web interface: go to https://github.com/JuliaLang/julia and choose the release-1.11 branch, which takes you to https://github.com/JuliaLang/julia/tree/release-1.11. Click on 448 commits ahead of to get to master...release-1.11. The top commit (7dad444) is the first one since branching out; its parent aecd8fd is on master.
From the command line, you can probably do something like git log master...release-1.11, or something like that (I can't check it on the phone). Edit: you can use |
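A minimal command-line sketch of the same lookup, assuming a local clone in which both `master` and `release-1.11` exist as local branches. The function names are made up for illustration. `git merge-base` prints the best common ancestor of two branches, and `git rev-list --reverse master..release-1.11` lists the commits that are only on the release branch, oldest first. If master was ever merged back into the release branch, the result may differ from the branching point shown in the web interface.

```shell
# Hypothetical helpers for locating where a release branch forked from its base.

# Print the best common ancestor of <base> and <branch> in the repository at <repo>.
fork_point() {
    local repo="$1" base="$2" branch="$3"
    git -C "$repo" merge-base "$base" "$branch"
}

# Print the oldest commit that is reachable from <branch> but not from <base>.
first_commit_since_fork() {
    local repo="$1" base="$2" branch="$3"
    git -C "$repo" rev-list --reverse "$base..$branch" | head -n 1
}
```

From inside a Julia checkout this would be called as `fork_point . master release-1.11` and `first_commit_since_fork . master release-1.11`.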
Couple of comments:
I'd say this is the range to look into for the fix: aecd8fd...ee09ae7 (first is bad, last is good). Edit: for the record, it reproduces also on a06a801 but not 4b27a16 |
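Because this range starts at a bad commit and ends at a good one, the search is for the commit that fixed the bug, which is the reverse of a regular `git bisect`. One possible sketch uses git's custom bisect terms. The term names `broken` and `fixed`, the helper name `fix_hunt_status`, and the script name `find_fix.sh` are choices made here for illustration. With custom terms, `git bisect run` treats exit status 0 as the old term and 1 to 127 (other than 125) as the new term, so the reproducer's status has to be inverted:

```shell
# Hypothetical helper: translate the reproducer's exit status into the status
# that `git bisect run` should see when hunting for a fix.
#   old term = "broken" (bug still present) -> must exit 0
#   new term = "fixed"  (bug is gone)       -> must exit 1-127, except 125
fix_hunt_status() {
    local status="$1"
    case "$status" in
        0)   echo 1 ;;    # reproducer passed: mark the commit as "fixed"
        125) echo 125 ;;  # commit could not be tested: skip it
        *)   echo 0 ;;    # crash or failure: mark the commit as "broken"
    esac
}

# Possible usage, with the commits quoted in the comment above:
#   git bisect start --term-old=broken --term-new=fixed
#   git bisect broken aecd8fd
#   git bisect fixed ee09ae7
#   git bisect run ./find_fix.sh
# where find_fix.sh builds julia (exiting 125 if the build fails), runs the
# reproducer, and ends with:  exit "$(fix_hunt_status "$?")"
```

At the end of such a run, git reports the first commit marked `fixed`, which is the candidate for backporting.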
That does at least give us a pretty good idea of what kind of issue it is likely to be. It is somewhat hard to be sure whether it is better just to backport that (lots of lines, but a very low-risk, internal-only change which only helps Enzyme support this version more easily, even though it also breaks Enzyme) or to investigate whether a more specific fix is possible |
Segfault first appeared in #52405 (corresponding change in our fork of llvm: JuliaLang/llvm-project#23)
but this looks unhelpful, since it was backported to julia v1.10 (1e66ce2) and llvm 15 isn't in julia v1.11 |
This test in GraphPPL.jl causes a segmentation fault. The segmentation fault can be reproduced by copy-pasting the content of the test (plus the necessary imports) into the REPL. Interestingly enough, the test passes normally while debugging.
So the notable thing is that this line
should return a fully initialized y, but on 1.11 it returns an array of #undef values. The code in the loop uses isassigned under the hood to initialize the elements of y, and the check works correctly during debugging and in 1.10, e.g. in the VSCode debugger view I get
The fact that debugging works normally does not really allow us to narrow down the scope of the issue. It also doesn't seem to happen in real code that relies on this functionality, only in tests. Julia shouldn't really segfault, so it might indicate deeper problems somewhere else.
The code that segfaults is on the main branch