This repository has been archived by the owner on May 5, 2019. It is now read-only.

Overwrites DataFrames describe function #33

Open
davidanthoff opened this issue Mar 15, 2017 · 11 comments


@davidanthoff
Contributor

I have a lot of situations where I need both DataFrames and DataTables loaded at the same time, e.g. I start out with:

using DataFrames, DataTables

Right now I always get a warning that DataTables overwrites describe from DataFrames, which is not ideal.

I guess the solution is to move the function definition into some common base package, so that DataFrames and DataTables each just add a method to it? Would that be AbstractTables? If so, could we start with a really bare-bones AbstractTables now that holds only that one definition, and add more to it later?
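The proposal above can be sketched in a few lines. This is only an illustration under assumptions: `TableBase`, `FrameLike`, and `TableLike` are hypothetical stand-ins for the shared base package, DataFrames, and DataTables, and the code uses current Julia syntax rather than the 0.5-era syntax quoted elsewhere in this thread.

```julia
# Hypothetical shared base package: it owns the generic function and nothing else.
module TableBase
    export describe
    function describe end   # zero-method function for other modules to extend
end

# Hypothetical stand-in for DataFrames: adds a method for its own type only.
module FrameLike
    import ..TableBase: describe
    struct Frame
        ncols::Int
    end
    describe(f::Frame) = "Frame with $(f.ncols) columns"
end

# Hypothetical stand-in for DataTables: same function, different argument type.
module TableLike
    import ..TableBase: describe
    struct Table
        ncols::Int
    end
    describe(t::Table) = "Table with $(t.ncols) columns"
end

using .TableBase

frame_summary = describe(FrameLike.Frame(3))
table_summary = describe(TableLike.Table(2))
```

Because each module only adds a method for a type it owns, loading both leaves a single `describe` function with two non-conflicting methods.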

@ararslan
Member

Ideally describe would be removed from one or both packages, as it's more of a statistical function than a tabular data function. Maybe that could live in StatsModels at some point?

@kleinschmidt
Contributor

It's from StatsBase, right?

StatsBase.describe(dt::AbstractDataTable) = describe(STDOUT, dt)

@ararslan
Member

Yes, but unless we want a dependency on AbstractTables in StatsBase (which I don't think we should do), we'd still have to define the generic describe method on tables elsewhere. That's why I suggested StatsModels.

@kleinschmidt
Contributor

I'm confused: why does using DataFrames and DataTables result in one's describe overwriting the other if they're both extending the method from StatsBase?

@ararslan
Member

Ohhhhhhhhhhhhhhhhhhhhhhhh heh, DataFrames and DataTables both @reexport StatsBase. I bet that's it.

@davidanthoff
Contributor Author

Both have this:

StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv)

That is the first of three overwriting messages I'm getting.

@davidanthoff
Contributor Author

And then there is:

function StatsBase.describe{T<:Number}(io, dv::AbstractArray{T})
function StatsBase.describe{T}(io, dv::AbstractArray{T})

in both. I guess those three methods should just move to StatsBase, right?
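A minimal sketch of why the overwrite happens, assuming hypothetical modules `StatsLike`, `FramesLike`, and `TablesLike` in place of StatsBase, DataFrames, and DataTables, and written in current Julia syntax. Both downstream modules define a method with the identical signature on a type neither of them owns, so the second definition replaces the first instead of adding to it. Whether a warning is printed depends on the Julia version and startup flags, so none is shown here.

```julia
# Hypothetical stand-in for StatsBase: owns the generic function.
module StatsLike
    function describe end
end

# Hypothetical stand-in for DataFrames: defines a method on AbstractArray.
module FramesLike
    import ..StatsLike
    StatsLike.describe(x::AbstractArray) = "FramesLike summary of $(length(x)) values"
end

# Hypothetical stand-in for DataTables: same signature, so this replaces the method above.
module TablesLike
    import ..StatsLike
    StatsLike.describe(x::AbstractArray) = "TablesLike summary of $(length(x)) values"
end

# Only one method survives: the one defined last.
surviving_methods = length(methods(StatsLike.describe))
result = StatsLike.describe([1, 2, 3])
```

Here `result` comes from `TablesLike`, and the `FramesLike` definition is no longer reachable.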

@ararslan
Member

Assuming they don't contain code specific to Nullables and/or NA, yes, those methods should live in StatsBase. Good catch!

@davidanthoff
Contributor Author

Well, they actually contain code that is Nullable and DataArray specific :) So I guess they really should dispatch on fewer types?

@kleinschmidt
Contributor

Maybe replace those abstract array methods with a non-exported method for single columns?

@davidanthoff
Contributor Author

I think StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv) should just move to StatsBase as is.

A version of function StatsBase.describe{T<:Number}(io, nv::AbstractArray{T}) that doesn't handle missing values should also move to StatsBase. In DataTables there should be function StatsBase.describe{T<:Number}(io, nv::NullableArray{T}), and in DataFrames function StatsBase.describe{T<:Number}(io, nv::DataArray{T}).

The same goes for function StatsBase.describe{T}(io, nv::AbstractArray{T}).
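The split proposed in this comment could look roughly like the sketch below. It rests on several assumptions: `StatsLike` and `TablesLike` are hypothetical stand-ins for StatsBase and DataTables, `MaskedVector` is an invented toy type standing in for NullableArray, the methods return a named tuple instead of printing to an `io` argument, and the syntax is current Julia rather than the 0.5-era form quoted above.

```julia
# Hypothetical stand-in for StatsBase: the generic method ignores missing values.
module StatsLike
    describe(x::AbstractArray{T}) where {T<:Number} =
        (n = length(x), min = minimum(x), max = maximum(x), nmissing = 0)
end

# Hypothetical stand-in for DataTables.
module TablesLike
    import ..StatsLike

    # Toy replacement for NullableArray: values plus a mask marking missing entries.
    struct MaskedVector{T} <: AbstractVector{T}
        values::Vector{T}
        isnull::Vector{Bool}
    end
    Base.size(v::MaskedVector) = size(v.values)
    Base.getindex(v::MaskedVector, i::Int) = v.values[i]

    # More specific method on a type this module owns: skips masked entries.
    function StatsLike.describe(v::MaskedVector{T}) where {T<:Number}
        present = v.values[.!v.isnull]
        (n = length(v), min = minimum(present), max = maximum(present),
         nmissing = count(v.isnull))
    end
end

plain = StatsLike.describe([1, 2, 3])
masked = StatsLike.describe(TablesLike.MaskedVector([1, 5, 9], [false, true, false]))
```

Dispatch picks the `MaskedVector` method for the package's own type and falls back to the generic one for ordinary arrays, so nothing is overwritten when several packages are loaded together.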
