Skip to content
This repository has been archived by the owner on Jul 17, 2019. It is now read-only.

Move data tables dependencies to respective packages #28

Closed
wants to merge 9 commits into from
Closed

Move data tables dependencies to respective packages #28

wants to merge 9 commits into from

Conversation

cjprybol
Copy link

@cjprybol cjprybol commented Mar 1, 2017

#27

@codecov-io
Copy link

codecov-io commented Mar 1, 2017

Codecov Report

Merging #28 into master will increase coverage by 7.38%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master      #28      +/-   ##
==========================================
+ Coverage   12.42%   19.81%   +7.38%     
==========================================
  Files           1        1              
  Lines         169      106      -63     
==========================================
  Hits           21       21              
+ Misses        148       85      -63
Impacted Files Coverage Δ
src/DataStreams.jl 19.81% <ø> (+7.38%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06d5e99...be9732b. Read the comment docs.

@cjprybol
Copy link
Author

cjprybol commented Mar 7, 2017

@ararslan
Copy link
Member

ararslan commented Mar 7, 2017

I'm hesitant to go ahead with this change until we have a proper table abstraction that both DataFrames and DataTables adhere to.

@davidanthoff
Copy link

Couldn't there be a version of DataStreams that just works with DataTables and DataFrames at the same time? I guess I'm in favor of adding DataTables compat, but not in favor of removing DataFrames compat.

@nalimilan
Copy link
Member

As I noted at JuliaData/CSV.jl#63, I think the abstraction already exists, as we have Source.Sink. But what needs to be changed is that the implementation of that interface should be moved to DataFrames and DataTables, so that DataStreams does not need to depend on them. That way both can be supported at the same time, which is the only way forward if we want to allow for more table types in the future.

As a second step, we should consider whether Source.Sink and AbstractTable could share some methods, the latter inheriting from the former. That way we wouldn't need to reinvent the wheel for AbstractTable.

@cjprybol
Copy link
Author

cjprybol commented Mar 17, 2017

the implementation of that interface should be moved to DataFrames and DataTables, so that DataStreams does not need to depend on them. That way both can be supported at the same time, which is the only way forward if we want to allow for more table types in the future.

I tried this and opened/updated PRs for all 4 packages. Let's keep the main conversation about whether the design is correct here, if possible.

JuliaData/DataFrames.jl#1174 moves the abstraction of writing to a DataFrame from here to DataFrames, and JuliaData/DataTables.jl#35 moves the abstraction for DataTables to DataTables. @davidanthoff this should let you modify the CSV.read return behavior from within DataFrames to return the column types you desired #29.

I did run into issues leaving CSV pointing to DataFrames, so I have not reverted my DataFrames -> DataTables switch back to DataFrames. This would also make DataTable the default sink for CSV.read. In DataFrames, you can read files as

readtable(...)
CSV.read(::file, DataFrame, ...)

and in DataTables

CSV.read(::file, ...)

I also changed the default return type from WeakRefStrings to Strings because I had issues with those and DataFrames too, but also because of opinions expressed in JuliaData/CSV.jl#37.

CSV.read can now work with DataFrames and DataTables at the same time and everything passes tests when loaded together

julia> DataFrames.head(CSV.read("/users/cameron/Repos/Cardiac-Enhancers/GENERATED/enhancer_transcript_associations.tsv", DataFrame, delim='\t'))
6×6 DataFrames.DataFrame
│ Row │ chr    │ start  │ stop   │ score │ source     │ transcript_id       │
├─────┼────────┼────────┼────────┼───────┼────────────┼─────────────────────┤
│ 1"chr1"9063339073180.033"putative""MSTRG.336.1"       │
│ 2"chr1"9418459457640.041"putative""ENST00000466827.1" │
│ 3"chr1"9418459457640.041"putative""MSTRG.339.21"      │
│ 4"chr1"9418459457640.041"putative""ENST00000464948.1" │
│ 5"chr1"9418459457640.041"putative""MSTRG.340.6"       │
│ 6"chr1"9418459457640.041"putative""ENST00000496938.1" │

julia> DataTables.head(CSV.read("/users/cameron/Repos/Cardiac-Enhancers/GENERATED/enhancer_transcript_associations.tsv", delim='\t'))
6×6 DataTables.DataTable
│ Row │ chr  │ start  │ stop   │ score │ source   │ transcript_id     │
├─────┼──────┼────────┼────────┼───────┼──────────┼───────────────────┤
│ 1   │ chr1 │ 9063339073180.033 │ putative │ MSTRG.336.1       │
│ 2   │ chr1 │ 9418459457640.041 │ putative │ ENST00000466827.1 │
│ 3   │ chr1 │ 9418459457640.041 │ putative │ MSTRG.339.21      │
│ 4   │ chr1 │ 9418459457640.041 │ putative │ ENST00000464948.1 │
│ 5   │ chr1 │ 9418459457640.041 │ putative │ MSTRG.340.6       │
│ 6   │ chr1 │ 9418459457640.041 │ putative │ ENST00000496938.1 │

julia> Pkg.test("DataTables", "DataFrames", "CSV", "DataStreams")
INFO: Computing test dependencies for DataTables...
INFO: Installing Atom v0.5.9
INFO: Installing Blink v0.5.1
INFO: Installing CodeTools v0.4.3
INFO: Installing Codecs v0.3.0
INFO: Installing Hiccup v0.1.1
INFO: Installing HttpCommon v0.2.6
INFO: Installing HttpParser v0.2.0
INFO: Installing HttpServer v0.1.7
INFO: Installing Juno v0.2.7
INFO: Installing LNR v0.0.2
INFO: Installing Lazy v0.11.5
INFO: Installing MbedTLS v0.4.4
INFO: Installing Media v0.2.5
INFO: Installing Mustache v0.1.3
INFO: Installing Mux v0.2.3
INFO: Installing RData v0.0.4
INFO: Installing WebSockets v0.2.1
INFO: Building HttpParser
INFO: Building Homebrew
Updated 1 tap (homebrew/core).
==> Updated Formulae
haskell-stack
INFO: Building MbedTLS
Using system libraries...
INFO: Testing DataTables
Running tests:
	PASSED: utils.jl
	PASSED: cat.jl
WARNING: using DataTables.sub in module TestData conflicts with an existing identifier.
	PASSED: data.jl
	PASSED: index.jl
	PASSED: datatable.jl
	PASSED: datatablerow.jl
	PASSED: io.jl
	PASSED: constructors.jl
	PASSED: conversions.jl
	PASSED: sort.jl
Test Summary: | Pass  Total
  colwise     |   32     32
	PASSED: grouping.jl
	PASSED: join.jl
	PASSED: iteration.jl
	PASSED: duplicates.jl
	PASSED: show.jl
INFO: DataTables tests passed
INFO: Removing Atom v0.5.9
INFO: Removing Blink v0.5.1
INFO: Removing CodeTools v0.4.3
INFO: Removing Codecs v0.3.0
INFO: Removing Hiccup v0.1.1
INFO: Removing HttpCommon v0.2.6
INFO: Removing HttpParser v0.2.0
INFO: Removing HttpServer v0.1.7
INFO: Removing Juno v0.2.7
INFO: Removing LNR v0.0.2
INFO: Removing Lazy v0.11.5
INFO: Removing MbedTLS v0.4.4
INFO: Removing Media v0.2.5
INFO: Removing Mustache v0.1.3
INFO: Removing Mux v0.2.3
INFO: Removing RData v0.0.4
INFO: Removing WebSockets v0.2.1
INFO: Computing test dependencies for DataFrames...
INFO: Installing Atom v0.5.9
INFO: Installing Blink v0.5.1
INFO: Installing CodeTools v0.4.3
INFO: Installing Codecs v0.3.0
INFO: Installing Hiccup v0.1.1
INFO: Installing HttpCommon v0.2.6
INFO: Installing HttpParser v0.2.0
INFO: Installing HttpServer v0.1.7
INFO: Installing Juno v0.2.7
INFO: Installing LNR v0.0.2
INFO: Installing Lazy v0.11.5
INFO: Installing MbedTLS v0.4.4
INFO: Installing Media v0.2.5
INFO: Installing Mustache v0.1.3
INFO: Installing Mux v0.2.3
INFO: Installing RData v0.0.4
INFO: Installing RDatasets v0.2.0
INFO: Installing WebSockets v0.2.1
INFO: Building HttpParser
INFO: Building Homebrew
Already up-to-date.
INFO: Building MbedTLS
Using system libraries...
INFO: Testing DataFrames
Running tests:
	PASSED: utils.jl
	PASSED: cat.jl
	PASSED: data.jl
	PASSED: index.jl
	PASSED: dataframe.jl
	PASSED: dataframerow.jl
	PASSED: io.jl
	PASSED: formula.jl
	PASSED: constructors.jl
	PASSED: conversions.jl
	PASSED: sort.jl
	PASSED: grouping.jl
	PASSED: join.jl
	PASSED: iteration.jl
	PASSED: duplicates.jl
	PASSED: show.jl
	PASSED: statsmodel.jl
	PASSED: contrasts.jl
INFO: DataFrames tests passed
INFO: Removing Atom v0.5.9
INFO: Removing Blink v0.5.1
INFO: Removing CodeTools v0.4.3
INFO: Removing Codecs v0.3.0
INFO: Removing Hiccup v0.1.1
INFO: Removing HttpCommon v0.2.6
INFO: Removing HttpParser v0.2.0
INFO: Removing HttpServer v0.1.7
INFO: Removing Juno v0.2.7
INFO: Removing LNR v0.0.2
INFO: Removing Lazy v0.11.5
INFO: Removing MbedTLS v0.4.4
INFO: Removing Media v0.2.5
INFO: Removing Mustache v0.1.3
INFO: Removing Mux v0.2.3
INFO: Removing RData v0.0.4
INFO: Removing RDatasets v0.2.0
INFO: Removing WebSockets v0.2.1
INFO: Computing test dependencies for CSV...
INFO: Installing BufferedStreams v0.3.2
INFO: Installing DataStreamsIntegrationTests v0.0.1
INFO: Installing DecFP v0.1.5
INFO: Installing Libz v0.2.4
INFO: Building DecFP
INFO: Testing CSV
  1.225617 seconds (24.95 M allocations: 652.548 MB, 11.92% gc time)
f = CSV.Source: /Users/Cameron/.julia/v0.5/CSV/test/test_files/test_basic_pipe.csv
    CSV.Options:
        delim: '|'
        quotechar: '"'
        escapechar: '\\'
        null: ""
        dateformat: Base.Dates.DateFormat(Base.Dates.Slot[Base.Dates.DelimitedSlot{Base.Dates.Year}(Base.Dates.Year,'y',4,"-"),Base.Dates.DelimitedSlot{Base.Dates.Month}(Base.Dates.Month,'m',2,"-"),Base.Dates.DelimitedSlot{Base.Dates.Day}(Base.Dates.Day,'d',2,r"(?=\s|$)")],"","english")
Data.Schema{true}:
rows: 2	cols: 3
Columns:
 "col1"  Nullable{Int64}
 "col2"  Nullable{Int64}
 "col3"  Nullable{Int64}
f = CSV.Sink(    CSV.Options:
        delim: ','
        quotechar: '"'
        escapechar: '\\'
        null: ""
        dateformat: Base.Dates.DateFormat(Base.Dates.Slot[Base.Dates.DelimitedSlot{Base.Dates.Year}(Base.Dates.Year,'y',4,"-"),Base.Dates.DelimitedSlot{Base.Dates.Month}(Base.Dates.Month,'m',2,"-"),Base.Dates.DelimitedSlot{Base.Dates.Day}(Base.Dates.Day,'d',2,r"(?=\s|$)")],"","english"),IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1),"/var/folders/p5/_x1sy3yx0cnd35c_r9zzgz840000gn/T/juliaU0Bu36",0,false,String[],false)
WARNING: Method definition (::Type{DataTables.DataTable})(Any, DataStreams.Data.Schema, Type{DataStreams.Data.Field}, Bool, Array{UInt8, 1}) in module DataTables at /Users/Cameron/.julia/v0.5/DataTables/src/abstractdatatable/io.jl:255 overwritten in module Main at /Users/Cameron/.julia/v0.5/CSV/test/datastreams.jl:15.
[2017-03-16T20:52:45.044]: Test high-level to sink from source; e.g. CSV.write
[2017-03-16T20:52:45.083]: Sink: CSV.Sink args => Source: DataTable args
[2017-03-16T20:52:45.804]: Sink: CSV.Sink args => Source: DataTable args + append
[2017-03-16T20:52:45.988]: Sink: CSV.Sink args => Source: DataTable args + transforms
[2017-03-16T20:52:46.249]: Sink: CSV.Sink args => Source: DataTable args + append + transforms
[2017-03-16T20:52:46.374]: Sink: CSV.Sink => Source: DataTable args
[2017-03-16T20:52:46.543]: Sink: CSV.Sink => Source: DataTable args + append
[2017-03-16T20:52:46.734]: Sink: CSV.Sink => Source: DataTable args + transforms
[2017-03-16T20:52:46.87]: Sink: CSV.Sink => Source: DataTable args + append + transforms`
[2017-03-16T20:52:46.984]: Sink: CSV.Sink args => Source: DataTable
[2017-03-16T20:52:47.158]: Sink: CSV.Sink args => Source: DataTable + append
[2017-03-16T20:52:47.349]: Sink: CSV.Sink args => Source: DataTable + transforms
[2017-03-16T20:52:47.484]: Sink: CSV.Sink args => Source: DataTable + append + transforms
[2017-03-16T20:52:47.589]: Sink: CSV.Sink => Source: DataTable
[2017-03-16T20:52:47.752]: Sink: CSV.Sink => Source: DataTable + append
[2017-03-16T20:52:47.936]: Sink: CSV.Sink => Source: DataTable + transforms
[2017-03-16T20:52:48.054]: Sink: CSV.Sink => Source: DataTable + append + transforms
[2017-03-16T20:52:48.178]: finished...
[2017-03-16T20:52:48.18]: Test high-level from source to sink; e.g. CSV.read
[2017-03-16T20:52:48.18]: Source: CSV.Source args => Sink: DataTable args
[2017-03-16T20:52:48.525]: Source: CSV.Source args => Sink: DataTable args + append
[2017-03-16T20:52:48.661]: Source: CSV.Source args => Sink: DataTable args + transforms
[2017-03-16T20:52:48.841]: Source: CSV.Source args => Sink: DataTable args + append + transforms
[2017-03-16T20:52:48.949]: Source: CSV.Source args => Sink: DataTable
[2017-03-16T20:52:49.109]: Source: CSV.Source args => Sink: DataTable + append
[2017-03-16T20:52:49.273]: Source: CSV.Source args => Sink: DataTable + transforms
[2017-03-16T20:52:49.412]: Source: CSV.Source args => Sink: DataTable + append + transforms`
[2017-03-16T20:52:49.531]: Source: CSV.Source => Sink: DataTable args
[2017-03-16T20:52:49.63]: Source: CSV.Source => Sink: DataTable args + append
[2017-03-16T20:52:49.745]: Source: CSV.Source => Sink: DataTable args + transforms
[2017-03-16T20:52:49.844]: Source: CSV.Source => Sink: DataTable args + append + transforms
[2017-03-16T20:52:49.944]: Source: CSV.Source => Sink: DataTable
[2017-03-16T20:52:50.053]: Source: CSV.Source => Sink: DataTable + append
[2017-03-16T20:52:50.163]: Source: CSV.Source => Sink: DataTable + transforms
[2017-03-16T20:52:50.26]: Source: CSV.Source => Sink: DataTable + append + transforms
[2017-03-16T20:52:50.357]: Test high-level from source to sink; e.g. CSV.read
[2017-03-16T20:52:50.357]: Source: CSV.Source args => Sink: CSV.Sink args
[2017-03-16T20:52:50.66]: Source: CSV.Source args => Sink: CSV.Sink args + append
[2017-03-16T20:52:50.952]: Source: CSV.Source args => Sink: CSV.Sink args + transforms
[2017-03-16T20:52:51.223]: Source: CSV.Source args => Sink: CSV.Sink args + append + transforms
[2017-03-16T20:52:51.446]: Source: CSV.Source args => Sink: CSV.Sink
[2017-03-16T20:52:51.753]: Source: CSV.Source args => Sink: CSV.Sink + append
[2017-03-16T20:52:52.068]: Source: CSV.Source args => Sink: CSV.Sink + transforms
[2017-03-16T20:52:52.324]: Source: CSV.Source args => Sink: CSV.Sink + append + transforms`
[2017-03-16T20:52:52.572]: Source: CSV.Source => Sink: CSV.Sink args
[2017-03-16T20:52:52.835]: Source: CSV.Source => Sink: CSV.Sink args + append
[2017-03-16T20:52:53.101]: Source: CSV.Source => Sink: CSV.Sink args + transforms
[2017-03-16T20:52:53.328]: Source: CSV.Source => Sink: CSV.Sink args + append + transforms
[2017-03-16T20:52:53.546]: Source: CSV.Source => Sink: CSV.Sink
[2017-03-16T20:52:53.784]: Source: CSV.Source => Sink: CSV.Sink + append
[2017-03-16T20:52:54.05]: Source: CSV.Source => Sink: CSV.Sink + transforms
[2017-03-16T20:52:54.251]: Source: CSV.Source => Sink: CSV.Sink + append + transforms
[2017-03-16T20:52:54.453]: Test high-level to sink from source; e.g. CSV.write
[2017-03-16T20:52:54.454]: Sink: CSV.Sink args => Source: CSV.Source args
[2017-03-16T20:52:54.728]: Sink: CSV.Sink args => Source: CSV.Source args + append
[2017-03-16T20:52:55.005]: Sink: CSV.Sink args => Source: CSV.Source args + transforms
[2017-03-16T20:52:55.251]: Sink: CSV.Sink args => Source: CSV.Source args + append + transforms
[2017-03-16T20:52:55.471]: Sink: CSV.Sink => Source: CSV.Source args
[2017-03-16T20:52:55.747]: Sink: CSV.Sink => Source: CSV.Source args + append
[2017-03-16T20:52:56.047]: Sink: CSV.Sink => Source: CSV.Source args + transforms
[2017-03-16T20:52:56.285]: Sink: CSV.Sink => Source: CSV.Source args + append + transforms`
[2017-03-16T20:52:56.526]: Sink: CSV.Sink args => Source: CSV.Source
[2017-03-16T20:52:56.783]: Sink: CSV.Sink args => Source: CSV.Source + append
[2017-03-16T20:52:57.041]: Sink: CSV.Sink args => Source: CSV.Source + transforms
[2017-03-16T20:52:57.271]: Sink: CSV.Sink args => Source: CSV.Source + append + transforms
[2017-03-16T20:52:57.493]: Sink: CSV.Sink => Source: CSV.Source
[2017-03-16T20:52:57.735]: Sink: CSV.Sink => Source: CSV.Source + append
[2017-03-16T20:52:58.004]: Sink: CSV.Sink => Source: CSV.Source + transforms
[2017-03-16T20:52:58.222]: Sink: CSV.Sink => Source: CSV.Source + append + transforms
[2017-03-16T20:52:58.438]: finished...
INFO: CSV tests passed
INFO: Removing BufferedStreams v0.3.2
INFO: Removing DataStreamsIntegrationTests v0.0.1
INFO: Removing DecFP v0.1.5
INFO: Removing Libz v0.2.4
INFO: Computing test dependencies for DataStreams...
INFO: No packages to install, update or remove
INFO: Testing DataStreams
INFO: DataStreams tests passed
INFO: No packages to install, update or remove

probably not perfect but I'm happy to fix any mistakes anyone catches and make requested changes.

@@ -10,12 +10,15 @@ Packages can have a single julia type implement both the `Data.Source` and `Data
* [`CSV.Source`](https://github.com/JuliaData/CSV.jl/blob/master/src/Source.jl)
* [`SQLite.Source`](https://github.com/JuliaDB/SQLite.jl/blob/master/src/Source.jl)
* [`DataFrame`](https://github.com/JuliaData/DataStreams.jl/blob/master/src/DataStreams.jl#L251)
* [`DataTables`](https://github.com/JuliaData/DataStreams.jl/blob/master/src/DataStreams.jl#L251)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will be wrong until we have a new url to link to

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use a less specific link when pointing to a different repository, else it's going to be outdated quite fast. Just point to the package homepage? Or maybe to the file?

@cjprybol cjprybol changed the title Add DataTables.jl compatibility Move data tables dependencies to respective packages Mar 17, 2017
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me once support is merged in DataTables.

@@ -10,12 +10,15 @@ Packages can have a single julia type implement both the `Data.Source` and `Data
* [`CSV.Source`](https://github.com/JuliaData/CSV.jl/blob/master/src/Source.jl)
* [`SQLite.Source`](https://github.com/JuliaDB/SQLite.jl/blob/master/src/Source.jl)
* [`DataFrame`](https://github.com/JuliaData/DataStreams.jl/blob/master/src/DataStreams.jl#L251)
* [`DataTables`](https://github.com/JuliaData/DataStreams.jl/blob/master/src/DataStreams.jl#L251)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use a less specific link when pointing to a different repository, else it's going to be outdated quite fast. Just point to the package homepage? Or maybe to the file?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants