Support for the Feather format #343
Comments
Great idea!
It would be cool to reuse your FlatBuffer project to automatically map .NET's primitive types and structs (and maybe POCOs) to the Arrow format, since it uses FlatBuffers as well, and then to keep .NET<->Arrow as a reusable module and build Feather on top of it. Feather is too specific to data frames, while the Arrow format could be used for chunk/block storage. I have been investigating for a while how to adapt it for the Spreads library, and I am very interested in a .NET port. Which do you think would be easier or more feasible: a C interface with P/Invoke, or a native rewrite in F#/C#?
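To make the "map primitive types to Arrow" idea a bit more concrete, here is a minimal sketch of how a nullable 32-bit integer column is laid out in the Arrow columnar format, as far as I understand the spec: a validity bitmap (one bit per slot, least-significant bit first, 1 meaning "valid") plus a contiguous little-endian values buffer. The function names are hypothetical, Python is used only for brevity, and the sketch leaves out buffer padding/alignment and the FlatBuffers metadata that a real Arrow or Feather file carries.

```python
import struct


def to_arrow_like_int32(values):
    """Encode a list of optional ints as (validity_bitmap, values_buffer, null_count).

    Hypothetical helper: it mimics the layout of an Arrow primitive array
    but omits padding, alignment and schema metadata.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    data = bytearray()
    null_count = 0
    for i, v in enumerate(values):
        if v is None:
            null_count += 1
            data += struct.pack("<i", 0)  # slot still occupies 4 bytes
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # LSB-first bit numbering
            data += struct.pack("<i", v)
    return bytes(bitmap), bytes(data), null_count


def from_arrow_like_int32(bitmap, data, length):
    """Decode the buffers produced by to_arrow_like_int32 back into a list."""
    out = []
    for i in range(length):
        valid = (bitmap[i // 8] >> (i % 8)) & 1
        out.append(struct.unpack_from("<i", data, i * 4)[0] if valid else None)
    return out
```

A .NET port would do the same thing over `Span<byte>` or a pinned array; the point is only that fixed-width primitives map to flat buffers with no per-element objects.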
@buybackoff
@pkese I'm not the one to talk about Deedle internals, but
My current take is that the physical binary layout doesn't matter much; there is no silver bullet, but I'm biased. I'm doing well with SQLite and LMDB and store data blocks as just shuffled+compressed blobs. SQLite is damn fast: SSD write speed is the limit when writing moderate-size chunks. LMDB is much faster for reads. Zstd compression often makes IO faster: the savings on data size and read/write time are bigger than the CPU time spent on (de)compression. In the end it is just blobs with headers laid out sequentially with some indexing. Anything will do many orders of magnitude better than
Please sign up for announcements here if you are interested in very fast persistence for real-time data streams, series, matrices and frames. I have it partially working in a private repo and hope to release it soon for a general use case. I will implement ML.NET's
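For readers wondering what "shuffled+compressed blobs with headers" might look like, here is a rough sketch under several assumptions; it is not the code from the private repo mentioned above. The header layout, table schema and function names are all hypothetical, and `zlib` from the Python standard library stands in for Zstd so that the example has no external dependencies. "Shuffle" here means byte transposition: byte 0 of every value is stored first, then byte 1, and so on, which tends to make numeric data more compressible.

```python
import sqlite3
import struct
import zlib

HEADER = struct.Struct("<II")  # hypothetical header: item size, item count


def shuffle(raw: bytes, itemsize: int) -> bytes:
    """Byte-transpose: all first bytes, then all second bytes, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def unshuffle(shuffled: bytes, itemsize: int) -> bytes:
    """Inverse of shuffle."""
    n = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for i in range(itemsize):
        out[i::itemsize] = shuffled[i * n:(i + 1) * n]
    return bytes(out)


def pack_block(values):
    """Serialize floats as header + compressed, shuffled little-endian doubles."""
    raw = struct.pack(f"<{len(values)}d", *values)
    # zlib is a stand-in for Zstd here (assumption, stdlib only).
    return HEADER.pack(8, len(values)) + zlib.compress(shuffle(raw, 8))


def unpack_block(blob):
    """Read the header, then decompress and unshuffle the payload."""
    itemsize, count = HEADER.unpack_from(blob)
    raw = unshuffle(zlib.decompress(blob[HEADER.size:]), itemsize)
    return list(struct.unpack(f"<{count}d", raw))


def demo():
    """Round-trip one block through an in-memory SQLite table."""
    values = [100.0 + 0.25 * i for i in range(1000)]
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE blocks (series_id INTEGER, first_key INTEGER, "
        "data BLOB, PRIMARY KEY (series_id, first_key))"
    )
    db.execute("INSERT INTO blocks VALUES (?, ?, ?)", (1, 0, pack_block(values)))
    (blob,) = db.execute(
        "SELECT data FROM blocks WHERE series_id = ? AND first_key = ?", (1, 0)
    ).fetchone()
    db.close()
    return unpack_block(blob) == values, len(blob)
```

The `(series_id, first_key)` primary key plays the role of the "some indexing" part: finding a block is a single B-tree lookup, and everything inside the block is an opaque blob as far as SQLite is concerned.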
Relevant issue in ML.NET: dotnet/machinelearning#1860. ML.NET already has a Parquet loader: https://github.com/dotnet/machinelearning/tree/master/src/Microsoft.ML.Parquet. And now we have Feather, Arrow and Parquet; go figure out how they differ, or whether they are just names/implementations of the same thing... And then comes IDataView, which promises to standardize all the standards. The xkcd link above is so relevant here :)
Feather is a recently introduced fast binary format for storing data frames. It is language agnostic and can currently be used to load data frames into R and Python. It would be great to have support for this format in Deedle as well, to allow exchanging data with R and Python code.
For more information see: blog.rstudio.org/2016/03/29/feather
Feather source code: github.com/wesm/feather