DataFrame conversion from more complex posteriors fails in below MWE. Simple posteriors work. #37
It seems this error only occurs if StanSample (I assume the master branch) is loaded:
julia> using DimensionalData, DataFrames
julia> mu = DimArray(randn(1000, 4), (Dim{:draw}(1:1000), Dim{:chain}(1:4)); name=:mu);
julia> DataFrame(mu);
julia> using StanSample
julia> DataFrame(mu);
ERROR: ArgumentError: Some dims were not found in object
Stacktrace:
[1] _errorextradims()
@ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/K9D4P/src/Dimensions/primitives.jl:646
[2] dimnum
@ ~/.julia/packages/DimensionalData/K9D4P/src/Dimensions/primitives.jl:201 [inlined]
[3] getcolumn
@ ~/.julia/packages/DimensionalData/K9D4P/src/tables.jl:196 [inlined]
[4] fromcolumns(x::DimTable{(:draw, :chain, :mu), DimStack{NamedTuple{(:mu,), Tuple{Matrix{Float64}}}, Tuple{Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:chain, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, NamedTuple{(:mu,), Tuple{Tuple{Dim{:draw, Colon}, Dim{:chain, Colon}}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata, NamedTuple{(:mu,), Tuple{DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{DimensionalData.DimColumn{Int64, Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, DimensionalData.DimColumn{Int64, Dim{:chain, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}}, NamedTuple{(:mu,), Tuple{DimensionalData.DimArrayColumn{Float64, DimArray{Float64, 2, Tuple{Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:chain, 
DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, Matrix{Float64}, Symbol, DimensionalData.Dimensions.LookupArrays.NoMetadata}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64}}}}, names::Vector{Symbol}; copycols::Nothing)
@ DataFrames ~/.julia/packages/DataFrames/KKiZW/src/other/tables.jl:36
[5] DataFrame(x::DimArray{Float64, 2, Tuple{Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:chain, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, Matrix{Float64}, Symbol, DimensionalData.Dimensions.LookupArrays.NoMetadata}; copycols::Nothing)
@ DataFrames ~/.julia/packages/DataFrames/KKiZW/src/other/tables.jl:59
[6] DataFrame(x::DimArray{Float64, 2, Tuple{Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:chain, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, Matrix{Float64}, Symbol, DimensionalData.Dimensions.LookupArrays.NoMetadata})
@ DataFrames ~/.julia/packages/DataFrames/KKiZW/src/other/tables.jl:48
[7] top-level scope
@ REPL[7]:1
(jl_vCqw12) pkg> st
Status `/tmp/jl_vCqw12/Project.toml`
[a93c6f00] DataFrames v1.4.3
[0703355e] DimensionalData v0.23.0
[c1514b29] StanSample v6.12.0 `https://github.com/StanJulia/StanSample.jl.git#master`
I suspect the issue is caused by this line: https://github.com/StanJulia/StanSample.jl/blob/209a5bf4d0a6e15a316b46407c194a808389c506/src/utils/dimarray.jl#L7
If we simply use a different chain key, all is well:
julia> mu2 = DimArray(randn(1000, 4), (Dim{:draw}(1:1000), Dim{:foo}(1:4)); name=:mu);
julia> DataFrame(mu2);
I suspect the issue is that by declaring these dimensions with
julia> mu = DimArray(randn(1000, 4), (draw=1:1000, chain=1:4); name=:mu)
1000×4 DimArray{Float64,2} mu with dimensions:
Dim{:draw} Sampled{Int64} 1:1000 ForwardOrdered Regular Points,
chain Sampled{Int64} 1:4 ForwardOrdered Regular Points
1 2 3 4
1 0.783298 0.0830577 0.597823 1.02076
2 0.353957 -0.207669 1.06499 0.466934
⋮
998 0.590198 -1.0169 0.660785 -0.0630021
999 -0.44761 -2.01247 1.34981 1.58058
1000 -0.273417 -0.769404 0.140136 0.174886
Since the original error occurs independently of InferenceObjects, I'd suggest opening an issue on DimensionalData. But I'd also suggest reconsidering whether you really need to define these dimensions using |
Hi Seth, thanks a lot for looking into this. Removing the original :dimarray option in StanSample solves the issue. As Inference[Object/Data] is a much better solution I'm leaning towards dropping dimarray altogether. I introduced it primarily because I liked the display of :dimarray based chains but that is even better supported by InferenceData. I'll think about it a bit more over the next few days, make a decision and close this issue then. Thanks again for your help! |
Hi @goedman, I gave this a little more thought, and actually, I think this is an InferenceObjects issue after all. DimensionalData itself has the special dims
julia> DimArray(randn(2, 3), (X=1:2, Y=1:3); name=:x)
2×3 DimArray{Float64,2} x with dimensions:
X Sampled{Int64} 1:2 ForwardOrdered Regular Points,
Y Sampled{Int64} 1:3 ForwardOrdered Regular Points
1 2 3
1 -1.40336 0.254703 0.598698
2 1.61594 0.220872 1.39326
I think the problem is this line of code: InferenceObjects.jl/src/dimensions.jl Line 60 in 84a5aa8
basedims is not part of DimensionalData's API, and if it is passed a Symbol, it will assign your specialized chain type if defined. It's a fairly easy fix to not use basedims here, and then other packages won't be able to break our code like this in the future.
No problem! One thing your |
Hi @sethaxen, StanSample.jl v6.13.0 for now has disabled :dimarray[s] as an option in
As I tend to use DataFrames for further work with posterior values, including stacked values, I'll see what it takes to expand what is currently in ./util/dataframes.jl for that purpose. Looking at the 8_schools Pluto notebook (e.g. in Stan.jl v9.10.1, ./Examples_Notebooks/inferencedata.jl), the inferencedata-based posterior_schools has 256000 rows and post_schools (a simple read_samples(..., :dataframe)) has 8000 rows. |
This makes sense. There's more than one way one might flatten multidimensional arrays into a tabular structure. The "widest" possible way is to make each marginal parameter its own column, so one would have
The "tallest" possible way effectively
julia> DataFrame(inferencedata(m_schools).posterior) |> size
(256000, 8)
julia> DataFrame(inferencedata(m_schools; dims=(theta=[:school], theta_tilde=[:school])).posterior) |> size
32000
Since each of these Tables has its uses for downstream analysis and plotting, the end goal is to have convenience functions so the user can convert between them easily.
What kinds of downstream analyses do you do with the dataframes? I'm wondering how we should document this behavior. |
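To make the row counts above concrete, here is a minimal Base-Julia sketch of the three flattenings. It does not call InferenceObjects or DataFrames. The sizes (1000 draws, 4 chains, 8 schools) and the field names `theta_dim` and `theta_tilde_dim` are assumptions picked so the counts match the 256000 and 32000 quoted in the thread; it is one plausible reading of those numbers, not a description of the packages' internals.

```julia
# Assumed sizes (not taken from the notebook): 1000 draws x 4 chains x 8 schools.
ndraws, nchains, nschools = 1000, 4, 8

theta = randn(ndraws, nchains, nschools)
theta_tilde = randn(ndraws, nchains, nschools)

# "Wide": one row per (draw, chain); each marginal parameter would be its own column.
wide = vec([(draw=d, chain=c, theta=theta[d, c, :], theta_tilde=theta_tilde[d, c, :])
            for d in 1:ndraws, c in 1:nchains])

# "Tall" with a shared :school dimension: one row per (draw, chain, school).
tall_shared = vec([(draw=d, chain=c, school=s,
                    theta=theta[d, c, s], theta_tilde=theta_tilde[d, c, s])
                   for d in 1:ndraws, c in 1:nchains, s in 1:nschools])

# "Tall" when theta and theta_tilde each get their own (hypothetically named)
# dimension: one row per (draw, chain, theta_dim, theta_tilde_dim).
tall_separate = vec([(draw=d, chain=c, theta_dim=i, theta_tilde_dim=j,
                      theta=theta[d, c, i], theta_tilde=theta_tilde[d, c, j])
                     for d in 1:ndraws, c in 1:nchains,
                         i in 1:nschools, j in 1:nschools])

println((length(wide), length(tall_shared), length(tall_separate)))
```

Under these assumptions, sharing the `school` dimension reduces the tall table by a factor of 8, which is the same ratio as between the two sizes shown in the thread.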
Just a small comment: do you use dataframes because the Stan output is given in a wide-format table? In Python we have had this wide vs long-format discussion before, and that is why having an nD structure (Dataset) is better. Btw, I think R tools (tidy-bayes?) use long-format. |
Typically I've been trying "more advanced" formats (such as DimensionalData and AxisKeys) and I certainly like aspects of them. But they also come with a learning (and re-learning) curve. As an end-user I (personally) always seem to gravitate back to DataFrames for my own use (Statistical Rethinking and Regression and Other Stories related projects). For Stan.jl this is a bit different. If possible I like to support important new work such as C++ threads, BridgeStan, ODE improvements, InferenceObjects and PosteriorDB. As an additional benefit, InferenceObjects show/display works very well in Pluto so I'm quite motivated to support conversion to that format. For the above-mentioned projects, plotting and summarizing are how I use chain-stacked, wide-format DataFrames. Seth's example above:
is a strong argument to learn this "dims" DSL! |
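For readers unfamiliar with the "chain-stacked, wide-format" layout mentioned here, the following is a small Base-Julia sketch of the idea. It is not StanSample's implementation: the array sizes are assumed, and the `theta.1`, `theta.2`, ... column names simply follow the naming mentioned elsewhere in this issue.

```julia
# Assumed layout: draws x chains x parameters.
ndraws, nchains, nparams = 1000, 4, 8
draws = randn(ndraws, nchains, nparams)

# Stack the chains: Julia arrays are column-major, so this reshape places
# chain 2 directly below chain 1, chain 3 below chain 2, and so on.
stacked = reshape(draws, ndraws * nchains, nparams)

# Wide format: one column per marginal parameter, named theta.1, theta.2, ...
colnames = Tuple(Symbol("theta.$i") for i in 1:nparams)
wide = NamedTuple{colnames}(Tuple(eachcol(stacked)))

println(size(stacked))
println(keys(wide))
```

A NamedTuple of equal-length vectors like `wide` is a column table in the Tables.jl sense, so it should be accepted by `DataFrame(wide)` when DataFrames is loaded.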
Thanks Seth! I've had some issues installing the last 2 versions of InferenceObjects.jl:
This is in a pretty empty environment:
Same on J1.8 and J1.9-beta4. InferenceObjects v0.3.4 is pretty fresh; sometimes (not very often though) it takes a bit longer before it is visible, but v0.3.3 should certainly be visible. Will keep on checking. |
That's really strange, I have no problem installing on v1.9 and v1.9-beta4. I wonder if there's a way to manually trigger your local copy of the registry to update. |
Ok, this morning no problem upgrading to v0.3.4 anymore! I've occasionally seen these kinds of delays. |
In this PR I am trying to go back to having both InferenceObjects (v0.3.4) and DimensionalData (v0.28.2) as extensions to StanSample.jl. The updates still create an issue:
There is more info in the StanSample CI action logs on Github (Inferencedata dimarray tests). |
I wonder if this is something I'm doing wrong or if it is a bug. All goes well until the last call to convert to a DataFrame. MWE:
I don't think it is caused by the draw indices, warmup_posterior and posterior_predictive also fail. They all have set variables, e.g.
like theta.1, theta.2, etc.