Add support to create a Dataset from spark dataframe #5678
Comments
If I read a Spark DataFrame, I get an error on a multi-node Spark cluster. Error:
How can I perform predictions on a Dataset object in Spark with multi-node cluster parallelism?
Addressed in #5701.
Hi! For your information, we are working on more documentation on how to use Spark with HF Datasets repositories (without the need for the
Sorry, wrong link: huggingface/hub-docs#1392
Feature request
Add a new API, `Dataset.from_spark`, to create a Dataset from a Spark DataFrame.
Motivation
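In the released `datasets` library the call is simply `Dataset.from_spark(df)`. The snippet below is not that implementation. It is a dependency-free model of the idea behind the request: each partition of a row-oriented DataFrame is converted into a column-oriented shard concurrently, and the shards are then concatenated in partition order. The names `partition_to_columns` and `from_partitions` are hypothetical, and the thread pool only stands in for Spark executors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]
Columns = Dict[str, List[Any]]


def partition_to_columns(columns: Sequence[str], rows: List[Row]) -> Columns:
    """Convert one partition of row tuples into a column-oriented shard."""
    shard: Columns = {name: [] for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            shard[name].append(value)
    return shard


def from_partitions(
    columns: Sequence[str], partitions: List[List[Row]], max_workers: int = 4
) -> Columns:
    """Hypothetical stand-in for Dataset.from_spark: convert partitions
    concurrently (threads model Spark executors), then concatenate shards."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so row order is preserved.
        shards = list(pool.map(lambda part: partition_to_columns(columns, part), partitions))
    table: Columns = {name: [] for name in columns}
    for shard in shards:
        for name in columns:
            table[name].extend(shard[name])
    return table


if __name__ == "__main__":
    partitions = [[(0, "hello"), (1, "world")], [(2, "spark")]]
    print(from_partitions(["id", "text"], partitions))
    # {'id': [0, 1, 2], 'text': ['hello', 'world', 'spark']}
```

Converting per partition lets the expensive work happen where the data already lives. Only the finished columnar shards need to be combined. In CPython, threads do not run CPU-bound Python code in parallel, so the pool here only illustrates the data flow, not real speedups.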
Spark is a distributed computing framework that can handle large datasets. Supporting direct loading of Spark DataFrames into Hugging Face Datasets would let users take advantage of Spark to process the data in parallel.
By providing a seamless integration between these two frameworks, we make it easier for data scientists and developers to work with both Spark and Hugging Face in the same workflow.
Your contribution
We can discuss the ideas, and I can help prepare a PR for this feature.