[Core feature] Convert List[Any] to a single pickle file #3207

Open
2 tasks done
pingsutw opened this issue Dec 31, 2022 · 4 comments
Labels
enhancement (New feature or request), flytekit (FlyteKit Python related issue)

Comments

@pingsutw
Member

pingsutw commented Dec 31, 2022

Motivation: Why do you think this is important?

Currently, Flyte creates N pickle files (one per list element) when a task's output type is List[Any]. This slows down serialization: uploading the pickles to S3 takes more than 15 minutes for a list of 1,000 items.

Users don't care how we serialize List[Any]. We can convert the entire list into a single pickle file, which reduces the time required for serialization.
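A minimal sketch of the difference between the two strategies. It assumes plain `pickle` and in-memory bytes rather than flytekit's actual type transformer or any S3 upload; the function names `pickle_per_item` and `pickle_whole_list` are hypothetical and exist only for illustration.

```python
import pickle
from typing import Any, List


def pickle_per_item(items: List[Any]) -> List[bytes]:
    """One pickle blob per element: N blobs, hence N separate uploads."""
    return [pickle.dumps(item) for item in items]


def pickle_whole_list(items: List[Any]) -> bytes:
    """A single pickle blob for the entire list: one upload."""
    return pickle.dumps(items)


items = [{"id": i} for i in range(1000)]
per_item = pickle_per_item(items)   # 1000 blobs
single = pickle_whole_list(items)   # 1 blob

# Both strategies round-trip to the same list.
restored_from_single = pickle.loads(single)
restored_from_items = [pickle.loads(blob) for blob in per_item]
```

The payload is essentially the same either way; what changes is the number of objects that have to be written and uploaded.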

Goal: What should the final outcome look like, ideally?

Serialization of List[Any] outputs becomes faster.

Describe alternatives you've considered

  • Raise an error when a large list is used
  • Add a detailed warning

Propose: Link/Inline OR Additional context

Slack Thread

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@pingsutw pingsutw added enhancement New feature or request flytekit FlyteKit Python related issue labels Dec 31, 2022
@LukasBommes

Does this scale well? What if the list contains millions of items? Or what if each list item has a large size?

Would it not be a better approach to take batches of list items and upload each batch as a separate pickle file? There could be a setting for the desired upper file size of each pickle file.
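The batching idea could be sketched roughly as follows. This is an assumption-laden illustration, not flytekit code: `batch_by_size`, `load_batches`, and the `max_bytes` setting are hypothetical names, and the batch size is only estimated by summing the pickled size of each item, so a finished blob can deviate somewhat from the limit.

```python
import pickle
from typing import Any, Iterable, Iterator, List


def batch_by_size(items: List[Any], max_bytes: int) -> Iterator[bytes]:
    """Yield pickled batches whose estimated size stays within max_bytes.

    An item that is larger than max_bytes on its own still gets a batch
    to itself, so no item is ever dropped.
    """
    batch: List[Any] = []
    batch_size = 0
    for item in items:
        item_size = len(pickle.dumps(item))
        # Close the current batch before it would exceed the limit.
        if batch and batch_size + item_size > max_bytes:
            yield pickle.dumps(batch)
            batch, batch_size = [], 0
        batch.append(item)
        batch_size += item_size
    if batch:
        yield pickle.dumps(batch)


def load_batches(blobs: Iterable[bytes]) -> List[Any]:
    """Reassemble the original list from the pickled batches, in order."""
    restored: List[Any] = []
    for blob in blobs:
        restored.extend(pickle.loads(blob))
    return restored


items = list(range(1000))
blobs = list(batch_by_size(items, max_bytes=256))
restored = load_batches(blobs)
```

Pickling each item once to measure it doubles the serialization work; a real implementation might estimate sizes more cheaply or batch by item count instead.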

@pingsutw
Member Author

pingsutw commented Jan 3, 2023

Yup, good idea. This is one of the options. We can parse the Annotated type hint to get the number of items to save in each pickle file.

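from typing import Annotated, Any, List
from flytekit import task

@task
def t1() -> Annotated[List[Any], 100]:  # proposed: 100 items per pickle file
    ...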

The default behavior could write all the data to one pickle file.
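Reading the batch size out of the annotation could look something like the sketch below. It uses only the standard `typing` helpers (Python 3.9+) and assumes the convention from the snippet above, where a bare int in the `Annotated` metadata means "items per pickle file"; `batch_size_from_hint` and `split_into_batches` are hypothetical helpers, not part of flytekit.

```python
from typing import Annotated, Any, List, Optional, get_args, get_origin


def batch_size_from_hint(hint: Any) -> Optional[int]:
    """Return the first int in the Annotated metadata, or None if absent."""
    if get_origin(hint) is Annotated:
        _, *metadata = get_args(hint)
        for meta in metadata:
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(meta, int) and not isinstance(meta, bool):
                return meta
    return None


def split_into_batches(items: List[Any], batch_size: Optional[int]) -> List[List[Any]]:
    """Split items into batches; with no batch size, keep one single batch."""
    if batch_size is None:
        return [list(items)]
    if batch_size <= 0:
        raise ValueError("batch size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


annotated_size = batch_size_from_hint(Annotated[List[Any], 100])
default_size = batch_size_from_hint(List[Any])
batches = split_into_batches(list(range(5)), 2)
```

With no annotation the helper returns None, which maps onto the default of writing everything to one pickle file.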

@github-actions

github-actions bot commented Oct 1, 2023

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Oct 1, 2023
@github-actions

github-actions bot commented Oct 9, 2023

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot closed this as not planned Oct 9, 2023
@eapolinario eapolinario reopened this Nov 2, 2023
@github-actions github-actions bot removed the stale label Nov 4, 2023
3 participants