Is there a convenient way to convert a column of a DataFrame from a DataArray to an Array? #1022

Closed
dmbates opened this issue Jul 30, 2016 · 5 comments

@dmbates
Contributor

dmbates commented Jul 30, 2016

The obvious approach of checking that there are no NAs and, if there are none, assigning

df[colnm] = Array(df[colnm])

is just an expensive no-op, because assigning a column converts Arrays to DataArrays. Is there a way to bypass this conversion other than creating a DataFrame from a vector of columns and a vector of names?

@tshort
Contributor

tshort commented Jul 31, 2016

I also like to keep Arrays in DataFrames.

DataFramesMeta includes a PassThrough type with an alias P. This prevents conversion to DataArrays in constructors and column assignments. Here's an example of use:

julia> using DataFramesMeta, DataFrames

julia> d = DataFrame(a = rand(5), b = rand(5))
5x2 DataFrames.DataFrame
│ Row │ a         │ b        │
┝━━━━━┿━━━━━━━━━━━┿━━━━━━━━━━┥
│ 1   │ 0.313971  │ 0.951179 │
│ 2   │ 0.516104  │ 0.751373 │
│ 3   │ 0.542247  │ 0.66035  │
│ 4   │ 0.0334964 │ 0.4959   │
│ 5   │ 0.409064  │ 0.403159 │

julia> dump(d)
DataFrames.DataFrame  5 observations of 2 variables
  a: DataArrays.DataArray{Float64,1}(5) [0.3139714349302918,0.5161037634090151,0.5422474568330207,0.03349641975260931]
  b: DataArrays.DataArray{Float64,1}(5) [0.9511785348243882,0.751373142960055,0.6603496186975484,0.49590041613904745]

julia> d[:a] = P(d[:a].data)
5-element DataFramesMeta.PassThrough{Float64}:
 0.313971
 0.516104
 0.542247
 0.0334964
 0.409064

julia> dump(d)
DataFrames.DataFrame  5 observations of 2 variables
  a: Array(Float64,(5,)) [0.3139714349302918,0.5161037634090151,0.5422474568330207,0.03349641975260931,0.4090640141506947]
  b: DataArrays.DataArray{Float64,1}(5) [0.9511785348243882,0.751373142960055,0.6603496186975484,0.49590041613904745]

julia> d = DataFrame(a = P(rand(5)), b = P(rand(5)))
5x2 DataFrames.DataFrame
│ Row │ a        │ b         │
┝━━━━━┿━━━━━━━━━━┿━━━━━━━━━━━┥
│ 1   │ 0.315305 │ 0.953487  │
│ 2   │ 0.708053 │ 0.540331  │
│ 3   │ 0.242269 │ 0.0708566 │
│ 4   │ 0.299632 │ 0.540411  │
│ 5   │ 0.851317 │ 0.800461  │

julia> dump(d)
DataFrames.DataFrame  5 observations of 2 variables
  a: Array(Float64,(5,)) [0.3153053741658196,0.708053239906731,0.24226912151035518,0.29963228587977286,0.8513172034688896]
  b: Array(Float64,(5,)) [0.9534867384419063,0.5403313871557482,0.07085658962059216,0.5404111641067515,0.8004607739998972]

I should document this...

@dmbates
Contributor Author

dmbates commented Sep 26, 2016

I face the same problem in the nl/nullable branch. Can we allow assignment of an Array as a column in a data frame to, well, assign the array and not convert the array to something else which is much more difficult to work with?

It seems to me that arbitrarily changing the type of a value on assignment is poor design.

@quinnj
Member

quinnj commented Sep 26, 2016

I like the sound of this ^. I think in general, we need to move away from assuming we know the type of a "column" and better define the expected column interface, so that things like SparseVector, NDSparseArray, any of Tim Holy's special array types, etc. can all be used and "just work" when used in a DataFrame.

@nalimilan
Member

I agree we should experiment with no longer converting columns automatically on construction/assignment. Anyway, in practice, real data will most often come from importing files/DBs and from doing computations on existing columns, so the type decision will have to happen there.

@nalimilan
Member

Closing in favor of #1119.
