Skip to content
This repository has been archived by the owner on May 5, 2019. It is now read-only.

Move DataTable sink for CSV/DataStreams to DataTables #35

Merged
merged 4 commits into from
Apr 7, 2017
Merged

Move DataTable sink for CSV/DataStreams to DataTables #35

merged 4 commits into from
Apr 7, 2017

Conversation

cjprybol
Copy link
Contributor

No description provided.

@cjprybol cjprybol changed the title Move DataTable sink for CSV/DataStreams to DataTable Move DataTable sink for CSV/DataStreams to DataTables Mar 17, 2017
@nalimilan nalimilan requested a review from quinnj March 17, 2017 07:55
REQUIRE Outdated
@@ -5,3 +5,6 @@ StatsBase 0.11.0
SortingAlgorithms
Reexport
Compat 0.19.0
WeakRefStrings
CSV
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually needed?

@@ -17,7 +17,7 @@ import NullableArrays: dropnull, dropnull!
@reexport using CategoricalArrays
using GZip
using SortingAlgorithms

using WeakRefStrings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to io.jl so that it's clear where it's used?


##############################################################################
#
# CSV IO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only CSV, DataStreams.

Data.streamtype(::Type{DataTable}, ::Type{Data.Column}) = true
Data.streamtype(::Type{DataTable}, ::Type{Data.Field}) = true

Data.streamfrom{T <: AbstractVector}(source::DataTable, ::Type{Data.Column}, ::Type{T}, col) = (@inbounds A = source.columns[col]::T; return A)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please wrap long lines.

# DataTables DataStreams implementation
function Data.schema(df::DataTable, ::Type{Data.Column})
return Data.Schema(map(string, names(df)),
DataType[typeof(A) for A in df.columns], size(df, 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect alignment.

allocate{S,R}(::Type{Nullable{CategoricalArrays.CategoricalValue{S,R}}}, rows, ref) = NullableCategoricalArray{S,1,R}(rows)
allocate{S,R}(::Type{NullableCategoricalVector{S,R}}, rows, ref) = NullableCategoricalArray{S,1,R}(rows)

if isdefined(Main, :DataArray)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this won't play well with precompilation?

@cjprybol
Copy link
Contributor Author

cjprybol commented Mar 22, 2017

These changes alone would be sufficient to support DataTables without making any changes in DataStreams, CSV, or DataFrames, right? I can use CSV.read(file, DataTable, args...) to read into a DataTable with this while using master/release branches of the other 3, although I'm getting segfaults when I try and work with the WeakRefStrings. I'll need to figure that out, and we'd need to fix this too. We could drop the functions that don't have anything DataTables specific, which would be the allocate functions and WeakRefString-specific append! and bigger decisions about whether functionality should move to DataTables & DataFrames, or directly to AbstractTables with DataTables and DataFrames inheriting from AbstractTables could be made later?

@quinnj
Copy link
Member

quinnj commented Mar 28, 2017

FYI, definitely still on board w/ the DataStreams refactoring, I'm just trying to get new tags out for packages before merging all the refactorings (that way they all have at least one tag that works for 0.6 before more major upheaval).

@quinnj quinnj merged commit 807f23e into JuliaData:master Apr 7, 2017
@quinnj
Copy link
Member

quinnj commented Apr 7, 2017

Pulled this locally to try out and kick the tires. It's actually pretty straightforward, just porting the code over from DataStreams and updating for DataTables. It's nice too because this doesn't require any changes to DataStreams or DataFrames. I did throw in a quick test and updates the REQUIRES and test/REQUIRES.

@cjprybol
Copy link
Contributor Author

cjprybol commented Apr 7, 2017

Thanks Jacob!

@cjprybol cjprybol deleted the cjp/CSV branch April 7, 2017 08:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants