[FEATURE] Create bag #377

goodwanghan · 2022-10-22T05:14:59Z

Fugue is built on top of the DataFrame concept. Although a collection of arbitrary objects can be converted to a DataFrame and distributed in Fugue, doing so is not always efficient or intuitive. Spark (RDD), Dask (Bag) and even Ray all have separate ways to handle a distributed collection of arbitrary objects, so Fugue should have a corresponding concept. An immediate benefit is in distributing a collection of tasks: we no longer need to think about it in a DataFrame way.
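To make the motivation concrete, here is a minimal sketch in plain Python of handling a collection of arbitrary task objects directly, with no DataFrame or schema involved. It is not Fugue's API: `Task` and `run_task` are hypothetical names, and a local thread pool only stands in for a distributed backend.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task object; any Python object could play this role.
    name: str
    n: int


def run_task(task: Task) -> int:
    # Placeholder workload: sum of squares of 0..n-1.
    return sum(i * i for i in range(task.n))


tasks = [Task("a", 3), Task("b", 4), Task("c", 5)]

# Assumption: a thread pool is used here purely as a stand-in for a
# distributed engine; the objects are mapped over as-is.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_task, tasks))
```

Expressing the same thing through a DataFrame would first require turning each `Task` into a row with a declared schema, which is the overhead this proposal aims to avoid.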

Regarding the name, bag is a really nice name and a well-defined term in mathematics; see https://en.wikipedia.org/wiki/Multiset. A bag is unordered and platform/scale agnostic, which matches Fugue's design philosophy. This is also why Dask uses the name.
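The multiset semantics can be sketched with Python's standard `collections.Counter`. This only illustrates the mathematical idea (order is irrelevant, duplicates are kept); it says nothing about how a Fugue bag would be implemented, and the variable names are illustrative.

```python
from collections import Counter

items = ["x", "y", "x", "z"]

# Two bags built from the same elements in different orders.
bag_a = Counter(items)
bag_b = Counter(["z", "x", "y", "x"])

# Order does not matter: the bags compare equal.
same = bag_a == bag_b

# Unlike a set, a bag keeps duplicates: "x" occurs twice.
count_x = bag_a["x"]
distinct = len(set(items))
total = sum(bag_a.values())
```

Here `same` is true, `count_x` is 2, and `total` (4) exceeds `distinct` (3) because the duplicate is retained.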

As an initial version, we don't plan to add as many features as RDD has. One major feature NOT to have in v1 is partitioning and shuffling; a DataFrame is required to do these.


goodwanghan added the enhancement (New feature or request), high priority, programming interface, core feature, and bag labels Oct 22, 2022

goodwanghan added this to the 0.7.4 milestone Oct 22, 2022

goodwanghan modified the milestones: 0.7.4, 0.7.5 Nov 7, 2022

goodwanghan linked a pull request Nov 15, 2022 that will close this issue

Multiple breaking changes #383

Merged

goodwanghan modified the milestones: 0.7.5, 0.8.0 Nov 17, 2022

goodwanghan closed this as completed in #383 Nov 17, 2022
