Is there a convenient way to convert a column of a DataFrame from a DataArray to an Array? #1022
Comments
I also like to keep Arrays in DataFrames. DataFramesMeta includes a PassThrough wrapper, P, for this:
julia> using DataFramesMeta, DataFrames
julia> d = DataFrame(a = rand(5), b = rand(5))
5x2 DataFrames.DataFrame
│ Row │ a │ b │
┝━━━━━┿━━━━━━━━━━━┿━━━━━━━━━━┥
│ 1 │ 0.313971 │ 0.951179 │
│ 2 │ 0.516104 │ 0.751373 │
│ 3 │ 0.542247 │ 0.66035 │
│ 4 │ 0.0334964 │ 0.4959 │
│ 5 │ 0.409064 │ 0.403159 │
julia> dump(d)
DataFrames.DataFrame 5 observations of 2 variables
a: DataArrays.DataArray{Float64,1}(5) [0.3139714349302918,0.5161037634090151,0.5422474568330207,0.03349641975260931]
b: DataArrays.DataArray{Float64,1}(5) [0.9511785348243882,0.751373142960055,0.6603496186975484,0.49590041613904745]
julia> d[:a] = P(d[:a].data)
5-element DataFramesMeta.PassThrough{Float64}:
0.313971
0.516104
0.542247
0.0334964
0.409064
julia> dump(d)
DataFrames.DataFrame 5 observations of 2 variables
a: Array(Float64,(5,)) [0.3139714349302918,0.5161037634090151,0.5422474568330207,0.03349641975260931,0.4090640141506947]
b: DataArrays.DataArray{Float64,1}(5) [0.9511785348243882,0.751373142960055,0.6603496186975484,0.49590041613904745]
julia> d = DataFrame(a = P(rand(5)), b = P(rand(5)))
5x2 DataFrames.DataFrame
│ Row │ a │ b │
┝━━━━━┿━━━━━━━━━━┿━━━━━━━━━━━┥
│ 1 │ 0.315305 │ 0.953487 │
│ 2 │ 0.708053 │ 0.540331 │
│ 3 │ 0.242269 │ 0.0708566 │
│ 4 │ 0.299632 │ 0.540411 │
│ 5 │ 0.851317 │ 0.800461 │
julia> dump(d)
DataFrames.DataFrame 5 observations of 2 variables
a: Array(Float64,(5,)) [0.3153053741658196,0.708053239906731,0.24226912151035518,0.29963228587977286,0.8513172034688896]
b: Array(Float64,(5,)) [0.9534867384419063,0.5403313871557482,0.07085658962059216,0.5404111641067515,0.8004607739998972]
I should document this...
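For comparison, here is a minimal sketch of doing the conversion without DataFramesMeta, assuming the DataArrays-era convert methods; it also shows how assigning the result back re-wraps it in a DataArray, which is the behavior this issue asks to bypass:

using DataFrames, DataArrays

d = DataFrame(a = rand(5))            # the column is stored as a DataArray
v = convert(Vector{Float64}, d[:a])   # plain Array{Float64,1}; errors if the column contains NAs
d[:a] = v                             # setindex! promotes the Vector back to a DataArray
dump(d)                               # :a is a DataArrays.DataArray{Float64,1} again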
I face the same problem. It seems to me that arbitrarily changing the type in an assignment is a poor design.
I like the sound of this ^. I think in general, we need to move away from assuming we know the type of a "column" and better define the expected column interface, so that things like SparseVector, NDSparseArray, any of Tim Holy's special array types, etc. can all be used and "just work" in a DataFrame.
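(Purely as an illustration of that column-interface idea, and not part of the original discussion: in recent versions of DataFrames.jl, which no longer depend on DataArrays, any AbstractVector is accepted as a column and stored without conversion.)

using DataFrames, SparseArrays

df = DataFrame(x = sparsevec([1, 4], [1.0, 2.0], 5))
typeof(df.x)   # SparseVector{Float64, Int64} -- the sparse column is kept as-is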
I agree we should experiment with no longer converting columns automatically on construction/assignment. Anyway, in practice, real data will most often come from importing files/DBs and from doing computations on existing columns, so the type decision will have to happen there.
Closing in favor of #1119.
The obvious approach of checking that there are no NAs and, if so, assigning the underlying data back to the column is just an expensive no-op: the act of assigning a column converts Arrays to DataArrays. Is there a way to bypass this other than creating a DataFrame from a vector of columns and a vector of names?
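A minimal sketch of the bypass mentioned above, assuming the old constructor that takes a vector of columns and a vector of names (per the issue text, this path stores the Vectors as-is instead of wrapping them in DataArrays):

using DataFrames

cols     = Any[rand(5), rand(5)]
colnames = [:a, :b]
d = DataFrame(cols, colnames)   # columns stay Array{Float64,1}
dump(d)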