add DataTables.jl compatibility #63
This is another case where I'm hesitant to do this until we have a generic table abstraction. Otherwise DataFrames will be SOL.
Similar sentiments were shared in JuliaData/DataStreams.jl#27 by myself and others. I'm not sure what the best way forward is myself. Ideas so far are:
Actually, the abstraction we need here already exists: it's CSV.jl doesn't work very well for Ideally at some point the default sink type will depend on which package is loaded, but for now specifying that you want a
Yes, @nalimilan is pretty dead on here. A big part of my "next phase" plans for DataStreams was taking most of
Hey @cjprybol, why change the default
I had issues getting WeakRefStrings read into DataFrames. After flipping the DataFrames code to read into DataArrays rather than NullableArrays, I hit errors I couldn't figure out how to resolve. If I could get some help reading WeakRefStrings into DataArrays, I'd be happy to flip that back.
I'm confused. The whole idea here is that we're switching to DataTables, which uses NullableArrays by default, right? WeakRefStrings will certainly have problems w/ DataArrays, but that should only be a DataFrames problem (not a DataTables one).
Yes, sorry, I wasn't sure how best to communicate these changes, as they're now fragmented across 4 PRs. I wrote a brief summary in JuliaData/DataStreams.jl#28 (comment), but I should summarize everything again for clarity. I've removed the DataFrames-specific code from DataStreams in JuliaData/DataStreams.jl#28. Because the DataStreams code that's currently in master reads very nicely into NullableArrays, CategoricalArrays, and WeakRefStrings, that code from DataStreams has been pushed to DataTables https://github.com/JuliaData/DataTables.jl/pull/35/files. A subset of that code is also in DataFrames https://github.com/JuliaStats/DataFrames.jl/pull/1174/files. Now those packages each depend on DataStreams and implement their own DataStreams code. I first pushed the code to DataFrames with NullableArrays included, but was asked to remove the NullableArrays addition and convert the behavior to return DataArrays. After that, I could no longer get WeakRefStrings to work when using CSV.read with DataFrames, so I changed the default weakrefstrings setting to false and made the requested changes in DataFrames. We can keep weakrefstrings=true, but then every call to CSV.read by DataFrames users would require that keyword to be set to false. Since DataFrames now supports only a subset of the full CSV.read behavior, I thought it would be better to keep this CSV PR pointing at DataTables rather than DataFrames, because DataTables supports the full range of features and DataFrames doesn't. Should we keep weakrefstrings=true and ask DataFrames users to always call CSV.read with weakrefstrings=false?
Or we could just fix whatever is preventing DataArrays and WeakRefStrings from playing nice together.
True, it would just involve adding an extra field to the
I think I'm going to hold off on this for a while. It's not strictly necessary for the DataStreams -> DataTables/DataFrames code migration, and I want to keep the number of simultaneous changes as small as possible. CSV can continue to work w/ DataFrames by default, and we can switch to DataTables later.
Actually, the issue is not with adding support for DataTables: it's with changing DataFrames to using
Implemented in #95
See JuliaData/DataStreams.jl#27