Fugue is built on top of the DataFrame concept. Although a collection of arbitrary objects can be converted to a DataFrame in order to be distributed in Fugue, doing so is not always efficient or intuitive. Spark (RDD), Dask (Bag) and even Ray all have separate ways to handle a distributed collection of arbitrary objects, so Fugue should have a corresponding concept. An immediate benefit is in distributing a collection of tasks: we no longer need to think about it in a DataFrame way.
Regarding the name, bag is a really nice name and a well-defined term in mathematics (see https://en.wikipedia.org/wiki/Multiset). It is unordered and platform/scale agnostic, which matches Fugue's design philosophy. This is also why Dask uses this name.
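To make the multiset meaning concrete, here is a small sketch using only the Python standard library. The helper name `same_bag` is made up for illustration and is not part of Fugue or Dask, and the sketch assumes the items are hashable, which arbitrary objects in a real bag need not be.

```python
from collections import Counter


def same_bag(left, right):
    """Compare two collections as multisets.

    Order is ignored, but the number of times each item occurs is not.
    Assumes items are hashable (an assumption of this sketch only).
    """
    return Counter(left) == Counter(right)


a = ["x", "y", "y"]
print(same_bag(a, ["y", "x", "y"]))  # same items in a different order
print(same_bag(a, ["x", "y"]))       # one "y" is missing
```

The first comparison holds and the second does not: a bag cares about multiplicity but not about position.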
As an initial version, we don't plan to add as many features as RDD has. One major feature that v1 will NOT have is partitioning and shuffling; these operations require a DataFrame.
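As a rough illustration of that v1 scope, the sketch below shows what a bag limited to element-wise operations could look like. Everything here is hypothetical: `LocalBag`, `map`, `filter` and `as_list` are names invented for this example and are not Fugue's API, and a local thread pool stands in for a real distributed backend.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


class LocalBag:
    """Hypothetical in-memory bag: an unordered collection of arbitrary objects."""

    def __init__(self, items: Iterable[Any]):
        self._items = list(items)

    def map(self, func: Callable[[Any], Any], workers: int = 4) -> "LocalBag":
        # Apply func to every item independently. There is deliberately
        # no partitioning or shuffling step in this sketch.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return LocalBag(pool.map(func, self._items))

    def filter(self, predicate: Callable[[Any], bool]) -> "LocalBag":
        # Keep only the items for which predicate returns True.
        return LocalBag(x for x in self._items if predicate(x))

    def as_list(self) -> List[Any]:
        # Callers should not rely on the order of the returned items.
        return list(self._items)


# A collection of tasks described by plain dicts, with no DataFrame schema.
tasks = LocalBag([{"n": 3}, {"n": 10}, {"n": 7}])
results = tasks.map(lambda t: t["n"] ** 2).filter(lambda v: v > 10)
print(sorted(results.as_list()))
```

Operations such as grouping by key or repartitioning are intentionally absent here; under this proposal they would go through a DataFrame.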