-
Notifications
You must be signed in to change notification settings - Fork 17
Move data tables dependencies to respective packages #28
Conversation
Codecov Report
@@ Coverage Diff @@
## master #28 +/- ##
==========================================
+ Coverage 12.42% 19.81% +7.38%
==========================================
Files 1 1
Lines 169 106 -63
==========================================
Hits 21 21
+ Misses 148 85 -63
Continue to review full report at Codecov.
|
I'm hesitant to go ahead with this change until we have a proper table abstraction that both |
Couldn't there be a version of DataStreams that just works with DataTables and DataFrames at the same time? I guess I'm in favor of adding DataTables compat, but not in favor of removing DataFrames compat. |
As I noted at JuliaData/CSV.jl#63, I think the abstraction already exists, as we have As a second step, we should consider whether |
I tried this and opened/updated PRs for all 4 packages. Let's keep the main conversation about whether the design is correct here, if possible. JuliaData/DataFrames.jl#1174 moves the abstraction of writing to a DataFrame from here to DataFrames, and JuliaData/DataTables.jl#35 moves the abstraction for DataTables to DataTables. @davidanthoff this should let you modify the CSV.read return behavior from within DataFrames to return the column types you desired #29. I did run into issues leaving CSV pointing to DataFrames, so I have not reverted my DataFrames -> DataTables switch back to DataFrames. This would also make DataTable the default sink for CSV.read. In DataFrames, you can read files as readtable(...)
CSV.read(::file, DataFrame, ...) and in DataTables CSV.read(::file, ...) I also changed the default return type from WeakRefStrings to Strings because I had issues with those and DataFrames too, but also because of opinions expressed in JuliaData/CSV.jl#37. CSV.read can now work with DataFrames and DataTables at the same time and everything passes tests when loaded together julia> DataFrames.head(CSV.read("/users/cameron/Repos/Cardiac-Enhancers/GENERATED/enhancer_transcript_associations.tsv", DataFrame, delim='\t'))
6×6 DataFrames.DataFrame
│ Row │ chr │ start │ stop │ score │ source │ transcript_id │
├─────┼────────┼────────┼────────┼───────┼────────────┼─────────────────────┤
│ 1 │ "chr1" │ 906333 │ 907318 │ 0.033 │ "putative" │ "MSTRG.336.1" │
│ 2 │ "chr1" │ 941845 │ 945764 │ 0.041 │ "putative" │ "ENST00000466827.1" │
│ 3 │ "chr1" │ 941845 │ 945764 │ 0.041 │ "putative" │ "MSTRG.339.21" │
│ 4 │ "chr1" │ 941845 │ 945764 │ 0.041 │ "putative" │ "ENST00000464948.1" │
│ 5 │ "chr1" │ 941845 │ 945764 │ 0.041 │ "putative" │ "MSTRG.340.6" │
│ 6 │ "chr1" │ 941845 │ 945764 │ 0.041 │ "putative" │ "ENST00000496938.1" │
julia> DataTables.head(CSV.read("/users/cameron/Repos/Cardiac-Enhancers/GENERATED/enhancer_transcript_associations.tsv", delim='\t'))
6×6 DataTables.DataTable
│ Row │ chr │ start │ stop │ score │ source │ transcript_id │
├─────┼──────┼────────┼────────┼───────┼──────────┼───────────────────┤
│ 1 │ chr1 │ 906333 │ 907318 │ 0.033 │ putative │ MSTRG.336.1 │
│ 2 │ chr1 │ 941845 │ 945764 │ 0.041 │ putative │ ENST00000466827.1 │
│ 3 │ chr1 │ 941845 │ 945764 │ 0.041 │ putative │ MSTRG.339.21 │
│ 4 │ chr1 │ 941845 │ 945764 │ 0.041 │ putative │ ENST00000464948.1 │
│ 5 │ chr1 │ 941845 │ 945764 │ 0.041 │ putative │ MSTRG.340.6 │
│ 6 │ chr1 │ 941845 │ 945764 │ 0.041 │ putative │ ENST00000496938.1 │
julia> Pkg.test("DataTables", "DataFrames", "CSV", "DataStreams")
INFO: Computing test dependencies for DataTables...
INFO: Installing Atom v0.5.9
INFO: Installing Blink v0.5.1
INFO: Installing CodeTools v0.4.3
INFO: Installing Codecs v0.3.0
INFO: Installing Hiccup v0.1.1
INFO: Installing HttpCommon v0.2.6
INFO: Installing HttpParser v0.2.0
INFO: Installing HttpServer v0.1.7
INFO: Installing Juno v0.2.7
INFO: Installing LNR v0.0.2
INFO: Installing Lazy v0.11.5
INFO: Installing MbedTLS v0.4.4
INFO: Installing Media v0.2.5
INFO: Installing Mustache v0.1.3
INFO: Installing Mux v0.2.3
INFO: Installing RData v0.0.4
INFO: Installing WebSockets v0.2.1
INFO: Building HttpParser
INFO: Building Homebrew
Updated 1 tap (homebrew/core).
==> Updated Formulae
haskell-stack
INFO: Building MbedTLS
Using system libraries...
INFO: Testing DataTables
Running tests:
PASSED: utils.jl
PASSED: cat.jl
WARNING: using DataTables.sub in module TestData conflicts with an existing identifier.
PASSED: data.jl
PASSED: index.jl
PASSED: datatable.jl
PASSED: datatablerow.jl
PASSED: io.jl
PASSED: constructors.jl
PASSED: conversions.jl
PASSED: sort.jl
Test Summary: | Pass Total
colwise | 32 32
PASSED: grouping.jl
PASSED: join.jl
PASSED: iteration.jl
PASSED: duplicates.jl
PASSED: show.jl
INFO: DataTables tests passed
INFO: Removing Atom v0.5.9
INFO: Removing Blink v0.5.1
INFO: Removing CodeTools v0.4.3
INFO: Removing Codecs v0.3.0
INFO: Removing Hiccup v0.1.1
INFO: Removing HttpCommon v0.2.6
INFO: Removing HttpParser v0.2.0
INFO: Removing HttpServer v0.1.7
INFO: Removing Juno v0.2.7
INFO: Removing LNR v0.0.2
INFO: Removing Lazy v0.11.5
INFO: Removing MbedTLS v0.4.4
INFO: Removing Media v0.2.5
INFO: Removing Mustache v0.1.3
INFO: Removing Mux v0.2.3
INFO: Removing RData v0.0.4
INFO: Removing WebSockets v0.2.1
INFO: Computing test dependencies for DataFrames...
INFO: Installing Atom v0.5.9
INFO: Installing Blink v0.5.1
INFO: Installing CodeTools v0.4.3
INFO: Installing Codecs v0.3.0
INFO: Installing Hiccup v0.1.1
INFO: Installing HttpCommon v0.2.6
INFO: Installing HttpParser v0.2.0
INFO: Installing HttpServer v0.1.7
INFO: Installing Juno v0.2.7
INFO: Installing LNR v0.0.2
INFO: Installing Lazy v0.11.5
INFO: Installing MbedTLS v0.4.4
INFO: Installing Media v0.2.5
INFO: Installing Mustache v0.1.3
INFO: Installing Mux v0.2.3
INFO: Installing RData v0.0.4
INFO: Installing RDatasets v0.2.0
INFO: Installing WebSockets v0.2.1
INFO: Building HttpParser
INFO: Building Homebrew
Already up-to-date.
INFO: Building MbedTLS
Using system libraries...
INFO: Testing DataFrames
Running tests:
PASSED: utils.jl
PASSED: cat.jl
PASSED: data.jl
PASSED: index.jl
PASSED: dataframe.jl
PASSED: dataframerow.jl
PASSED: io.jl
PASSED: formula.jl
PASSED: constructors.jl
PASSED: conversions.jl
PASSED: sort.jl
PASSED: grouping.jl
PASSED: join.jl
PASSED: iteration.jl
PASSED: duplicates.jl
PASSED: show.jl
PASSED: statsmodel.jl
PASSED: contrasts.jl
INFO: DataFrames tests passed
INFO: Removing Atom v0.5.9
INFO: Removing Blink v0.5.1
INFO: Removing CodeTools v0.4.3
INFO: Removing Codecs v0.3.0
INFO: Removing Hiccup v0.1.1
INFO: Removing HttpCommon v0.2.6
INFO: Removing HttpParser v0.2.0
INFO: Removing HttpServer v0.1.7
INFO: Removing Juno v0.2.7
INFO: Removing LNR v0.0.2
INFO: Removing Lazy v0.11.5
INFO: Removing MbedTLS v0.4.4
INFO: Removing Media v0.2.5
INFO: Removing Mustache v0.1.3
INFO: Removing Mux v0.2.3
INFO: Removing RData v0.0.4
INFO: Removing RDatasets v0.2.0
INFO: Removing WebSockets v0.2.1
INFO: Computing test dependencies for CSV...
INFO: Installing BufferedStreams v0.3.2
INFO: Installing DataStreamsIntegrationTests v0.0.1
INFO: Installing DecFP v0.1.5
INFO: Installing Libz v0.2.4
INFO: Building DecFP
INFO: Testing CSV
1.225617 seconds (24.95 M allocations: 652.548 MB, 11.92% gc time)
f = CSV.Source: /Users/Cameron/.julia/v0.5/CSV/test/test_files/test_basic_pipe.csv
CSV.Options:
delim: '|'
quotechar: '"'
escapechar: '\\'
null: ""
dateformat: Base.Dates.DateFormat(Base.Dates.Slot[Base.Dates.DelimitedSlot{Base.Dates.Year}(Base.Dates.Year,'y',4,"-"),Base.Dates.DelimitedSlot{Base.Dates.Month}(Base.Dates.Month,'m',2,"-"),Base.Dates.DelimitedSlot{Base.Dates.Day}(Base.Dates.Day,'d',2,r"(?=\s|$)")],"","english")
Data.Schema{true}:
rows: 2 cols: 3
Columns:
"col1" Nullable{Int64}
"col2" Nullable{Int64}
"col3" Nullable{Int64}
f = CSV.Sink( CSV.Options:
delim: ','
quotechar: '"'
escapechar: '\\'
null: ""
dateformat: Base.Dates.DateFormat(Base.Dates.Slot[Base.Dates.DelimitedSlot{Base.Dates.Year}(Base.Dates.Year,'y',4,"-"),Base.Dates.DelimitedSlot{Base.Dates.Month}(Base.Dates.Month,'m',2,"-"),Base.Dates.DelimitedSlot{Base.Dates.Day}(Base.Dates.Day,'d',2,r"(?=\s|$)")],"","english"),IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1),"/var/folders/p5/_x1sy3yx0cnd35c_r9zzgz840000gn/T/juliaU0Bu36",0,false,String[],false)
WARNING: Method definition (::Type{DataTables.DataTable})(Any, DataStreams.Data.Schema, Type{DataStreams.Data.Field}, Bool, Array{UInt8, 1}) in module DataTables at /Users/Cameron/.julia/v0.5/DataTables/src/abstractdatatable/io.jl:255 overwritten in module Main at /Users/Cameron/.julia/v0.5/CSV/test/datastreams.jl:15.
[2017-03-16T20:52:45.044]: Test high-level to sink from source; e.g. CSV.write
[2017-03-16T20:52:45.083]: Sink: CSV.Sink args => Source: DataTable args
[2017-03-16T20:52:45.804]: Sink: CSV.Sink args => Source: DataTable args + append
[2017-03-16T20:52:45.988]: Sink: CSV.Sink args => Source: DataTable args + transforms
[2017-03-16T20:52:46.249]: Sink: CSV.Sink args => Source: DataTable args + append + transforms
[2017-03-16T20:52:46.374]: Sink: CSV.Sink => Source: DataTable args
[2017-03-16T20:52:46.543]: Sink: CSV.Sink => Source: DataTable args + append
[2017-03-16T20:52:46.734]: Sink: CSV.Sink => Source: DataTable args + transforms
[2017-03-16T20:52:46.87]: Sink: CSV.Sink => Source: DataTable args + append + transforms`
[2017-03-16T20:52:46.984]: Sink: CSV.Sink args => Source: DataTable
[2017-03-16T20:52:47.158]: Sink: CSV.Sink args => Source: DataTable + append
[2017-03-16T20:52:47.349]: Sink: CSV.Sink args => Source: DataTable + transforms
[2017-03-16T20:52:47.484]: Sink: CSV.Sink args => Source: DataTable + append + transforms
[2017-03-16T20:52:47.589]: Sink: CSV.Sink => Source: DataTable
[2017-03-16T20:52:47.752]: Sink: CSV.Sink => Source: DataTable + append
[2017-03-16T20:52:47.936]: Sink: CSV.Sink => Source: DataTable + transforms
[2017-03-16T20:52:48.054]: Sink: CSV.Sink => Source: DataTable + append + transforms
[2017-03-16T20:52:48.178]: finished...
[2017-03-16T20:52:48.18]: Test high-level from source to sink; e.g. CSV.read
[2017-03-16T20:52:48.18]: Source: CSV.Source args => Sink: DataTable args
[2017-03-16T20:52:48.525]: Source: CSV.Source args => Sink: DataTable args + append
[2017-03-16T20:52:48.661]: Source: CSV.Source args => Sink: DataTable args + transforms
[2017-03-16T20:52:48.841]: Source: CSV.Source args => Sink: DataTable args + append + transforms
[2017-03-16T20:52:48.949]: Source: CSV.Source args => Sink: DataTable
[2017-03-16T20:52:49.109]: Source: CSV.Source args => Sink: DataTable + append
[2017-03-16T20:52:49.273]: Source: CSV.Source args => Sink: DataTable + transforms
[2017-03-16T20:52:49.412]: Source: CSV.Source args => Sink: DataTable + append + transforms`
[2017-03-16T20:52:49.531]: Source: CSV.Source => Sink: DataTable args
[2017-03-16T20:52:49.63]: Source: CSV.Source => Sink: DataTable args + append
[2017-03-16T20:52:49.745]: Source: CSV.Source => Sink: DataTable args + transforms
[2017-03-16T20:52:49.844]: Source: CSV.Source => Sink: DataTable args + append + transforms
[2017-03-16T20:52:49.944]: Source: CSV.Source => Sink: DataTable
[2017-03-16T20:52:50.053]: Source: CSV.Source => Sink: DataTable + append
[2017-03-16T20:52:50.163]: Source: CSV.Source => Sink: DataTable + transforms
[2017-03-16T20:52:50.26]: Source: CSV.Source => Sink: DataTable + append + transforms
[2017-03-16T20:52:50.357]: Test high-level from source to sink; e.g. CSV.read
[2017-03-16T20:52:50.357]: Source: CSV.Source args => Sink: CSV.Sink args
[2017-03-16T20:52:50.66]: Source: CSV.Source args => Sink: CSV.Sink args + append
[2017-03-16T20:52:50.952]: Source: CSV.Source args => Sink: CSV.Sink args + transforms
[2017-03-16T20:52:51.223]: Source: CSV.Source args => Sink: CSV.Sink args + append + transforms
[2017-03-16T20:52:51.446]: Source: CSV.Source args => Sink: CSV.Sink
[2017-03-16T20:52:51.753]: Source: CSV.Source args => Sink: CSV.Sink + append
[2017-03-16T20:52:52.068]: Source: CSV.Source args => Sink: CSV.Sink + transforms
[2017-03-16T20:52:52.324]: Source: CSV.Source args => Sink: CSV.Sink + append + transforms`
[2017-03-16T20:52:52.572]: Source: CSV.Source => Sink: CSV.Sink args
[2017-03-16T20:52:52.835]: Source: CSV.Source => Sink: CSV.Sink args + append
[2017-03-16T20:52:53.101]: Source: CSV.Source => Sink: CSV.Sink args + transforms
[2017-03-16T20:52:53.328]: Source: CSV.Source => Sink: CSV.Sink args + append + transforms
[2017-03-16T20:52:53.546]: Source: CSV.Source => Sink: CSV.Sink
[2017-03-16T20:52:53.784]: Source: CSV.Source => Sink: CSV.Sink + append
[2017-03-16T20:52:54.05]: Source: CSV.Source => Sink: CSV.Sink + transforms
[2017-03-16T20:52:54.251]: Source: CSV.Source => Sink: CSV.Sink + append + transforms
[2017-03-16T20:52:54.453]: Test high-level to sink from source; e.g. CSV.write
[2017-03-16T20:52:54.454]: Sink: CSV.Sink args => Source: CSV.Source args
[2017-03-16T20:52:54.728]: Sink: CSV.Sink args => Source: CSV.Source args + append
[2017-03-16T20:52:55.005]: Sink: CSV.Sink args => Source: CSV.Source args + transforms
[2017-03-16T20:52:55.251]: Sink: CSV.Sink args => Source: CSV.Source args + append + transforms
[2017-03-16T20:52:55.471]: Sink: CSV.Sink => Source: CSV.Source args
[2017-03-16T20:52:55.747]: Sink: CSV.Sink => Source: CSV.Source args + append
[2017-03-16T20:52:56.047]: Sink: CSV.Sink => Source: CSV.Source args + transforms
[2017-03-16T20:52:56.285]: Sink: CSV.Sink => Source: CSV.Source args + append + transforms`
[2017-03-16T20:52:56.526]: Sink: CSV.Sink args => Source: CSV.Source
[2017-03-16T20:52:56.783]: Sink: CSV.Sink args => Source: CSV.Source + append
[2017-03-16T20:52:57.041]: Sink: CSV.Sink args => Source: CSV.Source + transforms
[2017-03-16T20:52:57.271]: Sink: CSV.Sink args => Source: CSV.Source + append + transforms
[2017-03-16T20:52:57.493]: Sink: CSV.Sink => Source: CSV.Source
[2017-03-16T20:52:57.735]: Sink: CSV.Sink => Source: CSV.Source + append
[2017-03-16T20:52:58.004]: Sink: CSV.Sink => Source: CSV.Source + transforms
[2017-03-16T20:52:58.222]: Sink: CSV.Sink => Source: CSV.Source + append + transforms
[2017-03-16T20:52:58.438]: finished...
INFO: CSV tests passed
INFO: Removing BufferedStreams v0.3.2
INFO: Removing DataStreamsIntegrationTests v0.0.1
INFO: Removing DecFP v0.1.5
INFO: Removing Libz v0.2.4
INFO: Computing test dependencies for DataStreams...
INFO: No packages to install, update or remove
INFO: Testing DataStreams
INFO: DataStreams tests passed
INFO: No packages to install, update or remove probably not perfect but I'm happy to fix any mistakes anyone catches and make requested changes. |
docs/src/index.md
Outdated
@@ -10,12 +10,15 @@ Packages can have a single julia type implement both the `Data.Source` and `Data | |||
* [`CSV.Source`](https://github.com/JuliaData/CSV.jl/blob/master/src/Source.jl) | |||
* [`SQLite.Source`](https://github.com/JuliaDB/SQLite.jl/blob/master/src/Source.jl) | |||
* [`DataFrame`](https://github.com/JuliaData/DataStreams.jl/blob/master/src/DataStreams.jl#L251) | |||
* [`DataTables`](https://github.com/JuliaData/DataStreams.jl/blob/master/src/DataStreams.jl#L251) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These will be wrong until we have a new url to link to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use a less specific link when pointing to a different repository, else it's going to be outdated quite fast. Just point to the package homepage? Or maybe to the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me once support is merged in DataTables.
docs/src/index.md
Outdated
@@ -10,12 +10,15 @@ Packages can have a single julia type implement both the `Data.Source` and `Data | |||
* [`CSV.Source`](https://github.com/JuliaData/CSV.jl/blob/master/src/Source.jl) | |||
* [`SQLite.Source`](https://github.com/JuliaDB/SQLite.jl/blob/master/src/Source.jl) | |||
* [`DataFrame`](https://github.com/JuliaData/DataStreams.jl/blob/master/src/DataStreams.jl#L251) | |||
* [`DataTables`](https://github.com/JuliaData/DataStreams.jl/blob/master/src/DataStreams.jl#L251) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use a less specific link when pointing to a different repository, else it's going to be outdated quite fast. Just point to the package homepage? Or maybe to the file?
#27