[FEATURE] Create bag #377

Closed
goodwanghan opened this issue Oct 22, 2022 · 0 comments · Fixed by #383

goodwanghan (Collaborator) commented Oct 22, 2022

Fugue has been built on top of the DataFrame concept. Although a collection of arbitrary objects can be converted to a DataFrame to be distributed in Fugue, doing so is not always efficient or intuitive. Spark (RDD), Dask (Bag), and even Ray each have a separate way to handle a distributed collection of arbitrary objects, so Fugue should have a corresponding concept. An immediate benefit is for distributing a collection of tasks: we would no longer need to think about them in a DataFrame way.

Regarding the name, bag is a really nice name and a well-defined term in mathematics (see https://en.wikipedia.org/wiki/Multiset). A bag is unordered and platform/scale agnostic, which matches Fugue's design philosophy. This is also why Dask uses this name.
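To make the multiset semantics concrete, here is a small illustration using Python's standard `collections.Counter`, which behaves like a multiset. It is only a sketch of the mathematical concept, not of any Fugue code: two bags are equal when they hold the same elements with the same counts, regardless of order.

```python
from collections import Counter

# A bag (multiset) ignores order but keeps multiplicity.
a = ["x", "y", "x"]
b = ["y", "x", "x"]
c = ["x", "y"]

print(a == b)                    # False: lists compare order
print(Counter(a) == Counter(b))  # True: same elements, same counts
print(Counter(a) == Counter(c))  # False: "x" appears twice in a, once in c
```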

As an initial version, we don't plan to add as many features as RDD has. One major feature NOT in v1 is partitioning and shuffling; doing these requires a DataFrame.
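A minimal sketch of what a v1 bag could look like, assuming a purely local, in-memory implementation. `LocalBag`, `map`, `count`, and `as_list` are hypothetical names chosen for illustration and are not Fugue's API. Note what is deliberately missing: there is no partition or shuffle method, matching the v1 scope described above.

```python
from typing import Any, Callable, Iterable, List


class LocalBag:
    """Hypothetical single-machine bag: an unordered collection of arbitrary objects.

    Not Fugue's API. It intentionally has no partitioning or shuffling.
    """

    def __init__(self, items: Iterable[Any]):
        # Store items in a list; callers must not rely on the order.
        self._items: List[Any] = list(items)

    def map(self, func: Callable[[Any], Any]) -> "LocalBag":
        # Apply func to every element independently, the kind of
        # per-element work a distributed backend could parallelize.
        return LocalBag(func(x) for x in self._items)

    def count(self) -> int:
        return len(self._items)

    def as_list(self) -> List[Any]:
        # Materialize the elements; order is an implementation detail.
        return list(self._items)


# Treat each dict as a "task" and run it without building a DataFrame.
tasks = LocalBag([{"id": 1}, {"id": 2}, {"id": 3}])
results = tasks.map(lambda t: t["id"] * 10)
print(sorted(results.as_list()))  # [10, 20, 30]
```

Sorting before printing reflects the bag contract: only the elements and their counts are meaningful, not their order.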

goodwanghan added this to the 0.7.4 milestone Oct 22, 2022
goodwanghan modified the milestones: 0.7.4, 0.7.5 Nov 7, 2022
goodwanghan linked a pull request Nov 15, 2022 that will close this issue
goodwanghan modified the milestones: 0.7.5, 0.8.0 Nov 17, 2022