Skip to content
This repository has been archived by the owner on Jul 17, 2019. It is now read-only.

Add new branch for DataTables? #27

Closed
cjprybol opened this issue Feb 14, 2017 · 8 comments
Closed

Add new branch for DataTables? #27

cjprybol opened this issue Feb 14, 2017 · 8 comments

Comments

@cjprybol
Copy link

I did a DataFrame -> DataTable find and replace and everything seems to be working. I tried to open a pull request but the pull request menu doesn't offer the option to open a pull request against a new branch (that I could find), only to make a pull request against existing branches master & gh-pages. Any chance someone with write access could open a new branch for DataTables? Thanks!

have one ready for CSV.jl too.

@quinnj
Copy link
Member

quinnj commented Feb 15, 2017

Thanks for looking into this! Yeah, let me look into the write access for htis.

@cjprybol
Copy link
Author

I think we'll need to add a version release against the branch too. There's no way to specify a branch in REQUIRE files, is there? but we can tag a release like v0.1.2-DataTables and specify that in the REQUIRE file?

@davidanthoff
Copy link

I guess ideally the whole DataStreams universe would support both DataFrames and DataTables? Now that we have them as separate packages, that would certainly make things a lot simpler, right?

@cjprybol
Copy link
Author

cjprybol commented Feb 28, 2017

the whole DataStreams universe

could you be specific about which packages you're referring to?

Now that we have them as separate packages, that would certainly make things a lot simpler, right?

It very well could be. One way we could support both would be to remove DataFrames from the REQUIRE file in this package. In the changes to deprecate readtable/writetable in DataTables we would add CSV & DataStreams to the REQUIRE file there, in effect flipping the dependency direction. We'd need to make reciprocal changes in DataFrames, but it sounds like the change would be welcomed (JuliaData/DataFrames.jl#1071)? Those changes sound like a generally good idea, but we'd have to figure out a way to inform CSV and DataStreams which data table package they should be talking to.

I'll represent (DataTables|DataFrames) as P. If a user installs either package with Pkg.add("$P") then that command will automatically install all packages in the REQUIRE file including DataStreams and CSV, correct? using P will then call using DataStreams and using CSV. If anyone could suggest an idea for how package P can instruct CSV.read which type of P-implemented data table it should be returning then I think that's a workable solution. CSV.write could detect which type of table is being passed to it. Thoughts?

@cjprybol
Copy link
Author

Functions that currently specify ::DataFrame could be changed to ::AbstractTable where DataTable <: AbstractTable and DataFrame <: AbstractTable? Or if that function definition is too broad then Union{DataTable,DataFrame} could work.

@nalimilan
Copy link
Member

I don't understand what prevents you from opening a PR. Is that still an issue?

I think for now the packages should just be moved to DataTables, since they don't play well with DataArrays-based DataFrames. AbstractTables isn't ready yet.

@quinnj
Copy link
Member

quinnj commented Feb 28, 2017

I'm diving into this currently. I'll start making PRs over the next few days.

@cjprybol
Copy link
Author

cjprybol commented Mar 1, 2017

@nalimilan I hadn't opened the pull request because I was only offered to open the PR against master (or gh-pages). I thought that might be a bad idea since master might need to stay on DataFrames until the current trio (DataFrames & DataFrames-hardcoded CSV and DataStreams) is 0.6 ready. Currently using DataStreams master on 0.6 throws these deprecations and errors

julia> using DataStreams

WARNING: deprecated syntax "abstract StreamType" at /Users/Cameron/.julia/v0.6/DataStreams/src/DataStreams.jl:9.
Use "abstract type StreamType end" instead.

WARNING: deprecated syntax "abstract Source" at /Users/Cameron/.julia/v0.6/DataStreams/src/DataStreams.jl:84.
Use "abstract type Source end" instead.

WARNING: deprecated syntax "abstract Sink" at /Users/Cameron/.julia/v0.6/DataStreams/src/DataStreams.jl:116.
Use "abstract type Sink end" instead.
INFO: Precompiling module DataStreams.

WARNING: deprecated syntax "abstract StreamType" at /Users/Cameron/.julia/v0.6/DataStreams/src/DataStreams.jl:9.
Use "abstract type StreamType end" instead.

WARNING: deprecated syntax "abstract Source" at /Users/Cameron/.julia/v0.6/DataStreams/src/DataStreams.jl:84.
Use "abstract type Source end" instead.

WARNING: deprecated syntax "abstract Sink" at /Users/Cameron/.julia/v0.6/DataStreams/src/DataStreams.jl:116.
Use "abstract type Sink end" instead.

WARNING: deprecated syntax "abstract AbstractIndex" at /Users/Cameron/.julia/v0.6/DataFrames/src/other/index.jl:7.
Use "abstract type AbstractIndex end" instead.

WARNING: deprecated syntax "abstract AbstractDataFrame" at /Users/Cameron/.julia/v0.6/DataFrames/src/abstractdataframe/abstractdataframe.jl:63.
Use "abstract type AbstractDataFrame end" instead.
WARNING: Array{T}(::Type{T}, m::Int) is deprecated, use Array{T}(m) instead.
Stacktrace:
 [1] depwarn(::String, ::Symbol) at ./deprecated.jl:64
 [2] Array(::Type{Void}, ::Int64) at ./deprecated.jl:51
 [3] DataArrays.DataArray(::Type{T} where T, ::Int64) at /Users/Cameron/.julia/v0.6/DataArrays/src/dataarray.jl:111
 [4] include_from_node1(::String) at ./loading.jl:539
 [5] include(::String) at ./sysimg.jl:14
 [6] macro expansion at /Users/Cameron/.julia/v0.6/DataFrames/src/DataFrames.jl:132 [inlined]
 [7] anonymous at ./<missing>:?
 [8] include_from_node1(::String) at ./loading.jl:539
 [9] include(::String) at ./sysimg.jl:14
 [10] anonymous at ./<missing>:2
 [11] eval(::Module, ::Any) at ./boot.jl:235
 [12] process_options(::Base.JLOptions) at ./client.jl:286
 [13] _start() at ./client.jl:371
while loading /Users/Cameron/.julia/v0.6/DataFrames/src/abstractdataframe/abstractdataframe.jl, in expression starting on line 719

WARNING: deprecated syntax "typealias ColumnIndex (macrocall @compat (curly Union Real Symbol))" at /Users/Cameron/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:225.
Use "const ColumnIndex = (macrocall @compat (curly Union Real Symbol))" instead.

WARNING: deprecated syntax "inner constructor SubDataFrame(...) around /Users/Cameron/.julia/v0.6/DataFrames/src/subdataframe/subdataframe.jl:55".
Use "SubDataFrame{T}(...) where T" instead.
ERROR: LoadError: LoadError: UndefVarError: sub not defined
Stacktrace:
 [1] include_from_node1(::String) at ./loading.jl:539
 [2] include(::String) at ./sysimg.jl:14
 [3] macro expansion at /Users/Cameron/.julia/v0.6/DataFrames/src/DataFrames.jl:132 [inlined]
 [4] anonymous at ./<missing>:?
 [5] include_from_node1(::String) at ./loading.jl:539
 [6] include(::String) at ./sysimg.jl:14
 [7] anonymous at ./<missing>:2
while loading /Users/Cameron/.julia/v0.6/DataFrames/src/subdataframe/subdataframe.jl, in expression starting on line 78
while loading /Users/Cameron/.julia/v0.6/DataFrames/src/DataFrames.jl, in expression starting on line 101
ERROR: LoadError: Failed to precompile DataFrames to /Users/Cameron/.julia/lib/v0.6/DataFrames.ji.
Stacktrace:
 [1] compilecache(::String) at ./loading.jl:673
 [2] require(::Symbol) at ./loading.jl:431
 [3] include_from_node1(::String) at ./loading.jl:539
 [4] include(::String) at ./sysimg.jl:14
 [5] anonymous at ./<missing>:2
while loading /Users/Cameron/.julia/v0.6/DataStreams/src/DataStreams.jl, in expression starting on line 246
ERROR: Failed to precompile DataStreams to /Users/Cameron/.julia/lib/v0.6/DataStreams.ji.
Stacktrace:
 [1] compilecache(::String) at ./loading.jl:673
 [2] require(::Symbol) at ./loading.jl:460

I would like to propose updating master branch to work on 0.6, then branching that to create DataTables branch which I could then open a PR against. I think maintaining two branches and versions of CSV.jl and DataStreams.jl should be manageable for the v0.6 release and until DataTables is ready for people to start migrating from DataFrames. I'm open to a different solution, and excited to see what @quinnj is working on!

EDIT: I see now that those errors are DataFrames related, and I'll start opening PR's. Whatever isn't needed can just be closed

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants