Support for the Feather format #343

evelinag · 2016-03-29T20:33:47Z

Feather is a recently introduced fast binary format for storing data frames. It is language-agnostic and can currently be used to load data frames into R and Python. It would be great to have support for this format in Deedle as well, to allow exchanging data with R and Python code.

For more information see: blog.rstudio.org/2016/03/29/feather
Feather source code: github.com/wesm/feather

adamklein · 2016-04-05T14:08:36Z

Great idea!

buybackoff · 2016-04-05T17:11:49Z

It would be cool to reuse your FlatBuffer project to automatically map .NET's primitive types and structs (and maybe POCOs) to the Arrow format, since it uses FlatBuffers as well. Then .NET<->Arrow could be kept as a reusable module, with Feather built on top of it. Feather is too specific to data frames, while the Arrow format could be used for chunk/block storage. I have been investigating for a while how to adapt it for the Spreads library, and I am very interested in a .NET port. Which do you think would be easier or more feasible: a C interface with P/Invoke or a native rewrite in F#/C#?

pkese · 2019-02-11T18:05:15Z

@buybackoff
Do you have any idea how https://github.com/kevin-montrose/FeatherDotNet would fit into Deedle's internals?

buybackoff · 2019-02-11T19:06:45Z

@pkese I'm not the one to talk about Deedle internals, but

Apache Arrow is the standard in that world; Feather is one of its implementations. Arrow has a .NET implementation: https://github.com/apache/arrow/tree/master/csharp
ML.NET took the other way and defined interfaces: https://github.com/dotnet/machinelearning/tree/master/src/Microsoft.Data.DataView
ML.NET has an initial file format, but nothing stops one from developing their own that exposes data via the same interface.

My current take is that the physical binary layout doesn't matter much and there is no silver bullet, but I'm biased. I'm doing well with SQLite and LMDB, storing data blocks as just shuffled+compressed blobs. SQLite is damn fast; SSD write speed is the limit when writing moderate-size chunks. LMDB is much faster for reads. Zstd compression often makes IO faster: the savings on data size and read/write time are bigger than the CPU time spent on (de)compression.
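As a rough illustration of the "shuffled+compressed blobs" idea, here is a minimal Python sketch. It is not the code used in Spreads or any other library mentioned in this thread: the function names `shuffle`, `unshuffle`, `pack_block`, and `unpack_block` are hypothetical, and the standard-library `zlib` stands in for Zstd so the example has no dependencies. Byte shuffling groups the first byte of every value together, then the second byte, and so on, which tends to produce long runs of similar bytes for slowly changing numeric data.

```python
import struct
import zlib


def shuffle(raw: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    if len(raw) % itemsize:
        raise ValueError("buffer length must be a multiple of itemsize")
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def unshuffle(shuffled: bytes, itemsize: int) -> bytes:
    """Inverse of shuffle(): interleave the byte planes back into items."""
    if len(shuffled) % itemsize:
        raise ValueError("buffer length must be a multiple of itemsize")
    count = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for i in range(itemsize):
        out[i::itemsize] = shuffled[i * count:(i + 1) * count]
    return bytes(out)


def pack_block(values, fmt="<q"):
    """Serialize values with struct format `fmt`, shuffle, then compress.

    zlib is an assumption made for this sketch; the comment above uses Zstd.
    """
    itemsize = struct.calcsize(fmt)
    raw = b"".join(struct.pack(fmt, v) for v in values)
    return zlib.compress(shuffle(raw, itemsize))


def unpack_block(blob, fmt="<q"):
    """Decompress, unshuffle, and decode a block written by pack_block()."""
    itemsize = struct.calcsize(fmt)
    raw = unshuffle(zlib.decompress(blob), itemsize)
    return [item[0] for item in struct.iter_unpack(fmt, raw)]
```

A block round-trips with `unpack_block(pack_block(values)) == values`; whether shuffling actually improves the compression ratio depends on the data and should be measured rather than assumed.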

In the end it is just blobs with headers laid out sequentially with some indexing. Anything will do many orders of magnitude better than csv/json. Arrow is more like well-specified common sense and not something unique. Uniqueness is that the very big Apache ecosystem has agreed upon that standard.
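The "blobs with headers laid out sequentially with some indexing" description can be sketched in a few lines of Python. This is only an illustration of the general idea, not the Arrow, Feather, or Spreads layout: the header fields, the `BLK0` magic value, and the names `write_blocks` and `read_block` are all assumptions made up for the example.

```python
import io
import struct

# Hypothetical fixed-size header: 4-byte magic, column id, payload length.
MAGIC = b"BLK0"
HEADER = struct.Struct("<4sIQ")


def write_blocks(stream, blocks):
    """Append (column_id, payload) pairs sequentially.

    Returns an index mapping each column_id to the offset of its header.
    """
    index = {}
    for column_id, payload in blocks:
        index[column_id] = stream.tell()
        stream.write(HEADER.pack(MAGIC, column_id, len(payload)))
        stream.write(payload)
    return index


def read_block(stream, offset):
    """Seek to a header found via the index and read back its payload."""
    stream.seek(offset)
    magic, column_id, length = HEADER.unpack(stream.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("no block header at offset %d" % offset)
    return column_id, stream.read(length)


# Usage: two blocks written to an in-memory stream, one read back by offset.
buf = io.BytesIO()
index = write_blocks(buf, [(1, b"abc"), (2, b"defgh")])
second = read_block(buf, index[2])
```

In a real store the index would itself be persisted, for example in a footer or in a separate key-value store; that part is left out here.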

Please sign up for announcements here if you are interested in very fast persistence for real-time data streams, series, matrices and frames. I have it partially working in a private repo and hope to release it soon for general use. I will implement ML.NET's IDataView rather than Arrow on top of my very simple physical layout, which conceptually resembles Arrow a lot.

buybackoff · 2019-02-11T19:33:07Z

Relevant issue in ML.NET: dotnet/machinelearning#1860

ML.NET already has a Parquet loader: https://github.com/dotnet/machinelearning/tree/master/src/Microsoft.ML.Parquet

And now we have Feather, Arrow, and Parquet, and one has to learn how they differ, or whether they are just names/implementations of the same thing... And then comes IDataView, which promises to standardize all the standards. Xkcd link above is so relevant here :)

adamklein added type-feature type-proposal labels Apr 5, 2016

totalgit74 mentioned this issue Jul 8, 2020

Support for parquet/arrow format #504

Closed

HLWeil mentioned this issue Apr 20, 2024

Implement IDataView interface #563

Open
