Is there a convenient way to convert a column of a DataFrame from a DataArray to an Array? #1022

dmbates · 2016-07-30T18:12:07Z

The obvious approach of checking that there are no NA's and, if so, assigning

df[colnm] = Array(df[colnm])

is just an expensive no-op, because assigning a column converts Arrays to DataArrays. Is there a way to bypass this other than creating a DataFrame from a vector of columns and a vector of names?


tshort · 2016-07-31T23:53:55Z

I also like to keep Arrays in DataFrames.

DataFramesMeta includes a PassThrough type with an alias P. This prevents conversion to DataArrays in constructors and column assignments. Here's an example of use:

julia> using DataFramesMeta, DataFrames

julia> d = DataFrame(a = rand(5), b = rand(5))
5x2 DataFrames.DataFrame
│ Row │ a         │ b        │
┝━━━━━┿━━━━━━━━━━━┿━━━━━━━━━━┥
│ 1   │ 0.313971  │ 0.951179 │
│ 2   │ 0.516104  │ 0.751373 │
│ 3   │ 0.542247  │ 0.66035  │
│ 4   │ 0.0334964 │ 0.4959   │
│ 5   │ 0.409064  │ 0.403159 │

julia> dump(d)
DataFrames.DataFrame  5 observations of 2 variables
  a: DataArrays.DataArray{Float64,1}(5) [0.3139714349302918,0.5161037634090151,0.5422474568330207,0.03349641975260931]
  b: DataArrays.DataArray{Float64,1}(5) [0.9511785348243882,0.751373142960055,0.6603496186975484,0.49590041613904745]

julia> d[:a] = P(d[:a].data)
5-element DataFramesMeta.PassThrough{Float64}:
 0.313971
 0.516104
 0.542247
 0.0334964
 0.409064

julia> dump(d)
DataFrames.DataFrame  5 observations of 2 variables
  a: Array(Float64,(5,)) [0.3139714349302918,0.5161037634090151,0.5422474568330207,0.03349641975260931,0.4090640141506947]
  b: DataArrays.DataArray{Float64,1}(5) [0.9511785348243882,0.751373142960055,0.6603496186975484,0.49590041613904745]

julia> d = DataFrame(a = P(rand(5)), b = P(rand(5)))
5x2 DataFrames.DataFrame
│ Row │ a        │ b         │
┝━━━━━┿━━━━━━━━━━┿━━━━━━━━━━━┥
│ 1   │ 0.315305 │ 0.953487  │
│ 2   │ 0.708053 │ 0.540331  │
│ 3   │ 0.242269 │ 0.0708566 │
│ 4   │ 0.299632 │ 0.540411  │
│ 5   │ 0.851317 │ 0.800461  │

julia> dump(d)
DataFrames.DataFrame  5 observations of 2 variables
  a: Array(Float64,(5,)) [0.3153053741658196,0.708053239906731,0.24226912151035518,0.29963228587977286,0.8513172034688896]
  b: Array(Float64,(5,)) [0.9534867384419063,0.5403313871557482,0.07085658962059216,0.5404111641067515,0.8004607739998972]

I should document this...

dmbates · 2016-09-26T15:31:36Z

I face the same problem in the nl/nullable branch. Can we allow assignment of an Array as a column in a data frame to, well, assign the array and not convert the array to something else which is much more difficult to work with?

It seems to me that arbitrarily changing the type in an assignment is a poor design.

quinnj · 2016-09-26T15:38:55Z

I like the sound of this ^. I think in general, we need to move away from assuming we know the type of a "column" and better define the expected column interface, so that things like SparseVector, NDSparseArray, any of Tim Holy's special array types, etc. can all be used and "just work" when used in a DataFrame.

nalimilan · 2016-09-26T15:45:59Z

I agree we should experiment with no longer converting columns automatically on construction/assignment. Anyway, in practice, real data will most often come from importing files/DBs and from doing computations on existing columns, so the type decision will have to happen there.

nalimilan · 2017-01-25T09:02:02Z

Closing in favor of #1119.

ararslan added the question label Sep 5, 2016

nalimilan closed this as completed Jan 25, 2017
