From b8dab8eb40426994a8347e19ec29904ccff6a466 Mon Sep 17 00:00:00 2001
From: Adam Narozniak
Date: Thu, 28 Nov 2024 13:40:38 +0100
Subject: [PATCH 1/8] Add dataset contribution guide

---
 .../contributor-how-to-contribute-dataset.rst | 51 +++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 datasets/doc/source/contributor-how-to-contribute-dataset.rst

diff --git a/datasets/doc/source/contributor-how-to-contribute-dataset.rst b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
new file mode 100644
index 00000000000..e5fee8bb88e
--- /dev/null
+++ b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
@@ -0,0 +1,51 @@
+How to contribute a dataset
+===========================
+
+To make a dataset available in Flower Dataset (`flwr-datasets`), you need to add the dataset to `HuggingFace Hub `_ .
+
+This guide will explain the best practices we found when adding datasets ourselves and point to the HFs guides. To see the datasets added by Flower, visit https://huggingface.co/flwrlabs.
+
+Dataset contribution process
+----------------------------
+The contribution contains two steps:
+1. Create the dataset locally.
+We recommend that you do not upload custom scripts to HuggingFace Hub; instead, create the dataset locally and upload the data, which will speed up the processing time each time the data set is downloaded.
+2. Contribute to HuggingFace Hub.
+Each dataset in the HF Hub is a Git repository with a specific structure and readme file, and HuggingFace provides an API to push the dataset and, alternatively, a user interface directly in the website to populate the information in the readme file.
+
+
+
+Creating a dataset locally
+==========================
+You can create a local dataset directly using the `datasets` library or load it in any custom way and transform it to the `datasets.Dataset` from other Python objects.
+To complete this step, we recommend reading our guide available here: :doc:`how-to-use-with-local-data` or/and reading the guide from HF `Create a dataset `_.
+
+Contribution to the HuggingFace Hub
+===================================
+Contributions to the HuggingFace Hub come down to.
+1. creating an HF repository for the dataset.
+2. uploading the dataset.
+3. filling in the information in the readme file.
+
+To complete this step, follow this HF's guide `Share dataset to the Hub `_.
+
+Note that the push of the dataset is straightforward, and here's what it could look like:
+.. code-block:: python
+
+    from datasets import Dataset
+
+    # Example dataset
+    data = {
+        'column1': [1, 2, 3],
+        'column2': ['a', 'b', 'c']
+    }
+
+    # Create a Dataset object
+    dataset = Dataset.from_dict(data)
+
+    # Push the dataset to the HuggingFace Hub
+    dataset.push_to_hub("you-hf-username/your-ds-name")
+
+To make the dataset easily accessible in FL we recommend adding the "Use in FL" section. Here's an example of how it is done in `one of our reps ` for the cinic10 dataset.
+
+That's it! You have successfully contributed a dataset to the HuggingFace Hub. If you want the dataset listed in our `recommended FL dataset list `_ , please send a PR or ping us in `Slack _ `#contributions` channel.
\ No newline at end of file
From 70f2585585a3fb713f2e16a4ff3fe4726f82abe9 Mon Sep 17 00:00:00 2001
From: Adam Narozniak
Date: Thu, 28 Nov 2024 14:01:08 +0100
Subject: [PATCH 2/8] Fix formatting

---
 .../contributor-how-to-contribute-dataset.rst | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/datasets/doc/source/contributor-how-to-contribute-dataset.rst b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
index e5fee8bb88e..b29db0497da 100644
--- a/datasets/doc/source/contributor-how-to-contribute-dataset.rst
+++ b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
@@ -8,21 +8,26 @@ This guide will explain the best practices we found when adding datasets ourselv
 Dataset contribution process
 ----------------------------
 The contribution contains two steps:
+
 1. Create the dataset locally.
+
 We recommend that you do not upload custom scripts to HuggingFace Hub; instead, create the dataset locally and upload the data, which will speed up the processing time each time the data set is downloaded.
+
 2. Contribute to HuggingFace Hub.
+
 Each dataset in the HF Hub is a Git repository with a specific structure and readme file, and HuggingFace provides an API to push the dataset and, alternatively, a user interface directly in the website to populate the information in the readme file.
 
 
 
 Creating a dataset locally
-==========================
+^^^^^^^^^^^^^^^^^^^^^^^^^^
 You can create a local dataset directly using the `datasets` library or load it in any custom way and transform it to the `datasets.Dataset` from other Python objects.
 To complete this step, we recommend reading our guide available here: :doc:`how-to-use-with-local-data` or/and reading the guide from HF `Create a dataset `_.
 
 Contribution to the HuggingFace Hub
-===================================
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Contributions to the HuggingFace Hub come down to.
+Contributions to the HuggingFace Hub come down to:
+
 1. creating an HF repository for the dataset.
 2. uploading the dataset.
 3. filling in the information in the readme file.
@@ -30,6 +35,7 @@ Contributions to the HuggingFace Hub come down to.
 To complete this step, follow this HF's guide `Share dataset to the Hub `_.
 
 Note that the push of the dataset is straightforward, and here's what it could look like:
+
 .. code-block:: python
 
     from datasets import Dataset
@@ -46,6 +52,6 @@ Note that the push of the dataset is straightforward, and here's what it could l
     # Push the dataset to the HuggingFace Hub
     dataset.push_to_hub("you-hf-username/your-ds-name")
 
-To make the dataset easily accessible in FL we recommend adding the "Use in FL" section. Here's an example of how it is done in `one of our reps ` for the cinic10 dataset.
+To make the dataset easily accessible in FL we recommend adding the "Use in FL" section. Here's an example of how it is done in `one of our reps `_ for the cinic10 dataset.
 
-That's it! You have successfully contributed a dataset to the HuggingFace Hub. If you want the dataset listed in our `recommended FL dataset list `_ , please send a PR or ping us in `Slack _ `#contributions` channel.
\ No newline at end of file
+That's it! You have successfully contributed a dataset to the HuggingFace Hub. If you want the dataset listed in our `recommended FL dataset list `_ , please send a PR or ping us in `Slack `_ #contributions channel.
\ No newline at end of file
From 3d353658f774f2970e9996f57a96870414f237cb Mon Sep 17 00:00:00 2001
From: Adam Narozniak
Date: Thu, 28 Nov 2024 14:01:22 +0100
Subject: [PATCH 3/8] Add reference to index

---
 datasets/doc/source/index.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/datasets/doc/source/index.rst b/datasets/doc/source/index.rst
index 6f7c47bf241..422d93582a0 100644
--- a/datasets/doc/source/index.rst
+++ b/datasets/doc/source/index.rst
@@ -66,6 +66,13 @@ Information-oriented API reference and other reference material.
    recommended-fl-datasets
    ref-telemetry
 
+.. toctree::
+   :maxdepth: 1
+   :caption: Contributor tutorials
+
+   contributor-how-to-contribute-dataset
+
+
 Main features
 -------------
 Flower Datasets library supports:
From e3e568a7cdf3e5a5abcb0e9506e21790422548f9 Mon Sep 17 00:00:00 2001
From: Adam Narozniak <51029327+adam-narozniak@users.noreply.github.com>
Date: Fri, 29 Nov 2024 10:09:48 +0100
Subject: [PATCH 4/8] Apply suggestions from code review

Co-authored-by: Javier
---
 datasets/doc/source/contributor-how-to-contribute-dataset.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/datasets/doc/source/contributor-how-to-contribute-dataset.rst b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
index b29db0497da..b70f7d8d4ed 100644
--- a/datasets/doc/source/contributor-how-to-contribute-dataset.rst
+++ b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
@@ -1,13 +1,13 @@
 How to contribute a dataset
 ===========================
 
-To make a dataset available in Flower Dataset (`flwr-datasets`), you need to add the dataset to `HuggingFace Hub `_ .
+To make a dataset available in Flower Dataset (`flwr-datasets`) for other in the community to use, it needs to be added to `HuggingFace Hub `_ .
 
 This guide will explain the best practices we found when adding datasets ourselves and point to the HFs guides. To see the datasets added by Flower, visit https://huggingface.co/flwrlabs.
 
 Dataset contribution process
 ----------------------------
-The contribution contains two steps:
+The contribution contains three steps: first, on your development machine transform your dataset into a ``datasets.Dataset`` object, the preferred format for datasets in HF Hub; second, upload the dataset to HuggingFace Hub and detail it its readme how can be used with Flower Dataset; third, share your dataset with us and we will add it to the `recommended FL dataset list `_```
 
 1. Create the dataset locally.
 
From f5cef20094f8b00a154a6c38f49fd9572a697417 Mon Sep 17 00:00:00 2001
From: Adam Narozniak
Date: Fri, 29 Nov 2024 10:22:31 +0100
Subject: [PATCH 5/8] Change list into description in the sections

---
 .../contributor-how-to-contribute-dataset.rst | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/datasets/doc/source/contributor-how-to-contribute-dataset.rst b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
index b70f7d8d4ed..9d1cc4f0195 100644
--- a/datasets/doc/source/contributor-how-to-contribute-dataset.rst
+++ b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
@@ -7,25 +7,20 @@ This guide will explain the best practices we found when adding datasets ourselv
 
 Dataset contribution process
 ----------------------------
-The contribution contains three steps: first, on your development machine transform your dataset into a ``datasets.Dataset`` object, the preferred format for datasets in HF Hub; second, upload the dataset to HuggingFace Hub and detail it its readme how can be used with Flower Dataset; third, share your dataset with us and we will add it to the `recommended FL dataset list `_```
-
-1. Create the dataset locally.
-
-We recommend that you do not upload custom scripts to HuggingFace Hub; instead, create the dataset locally and upload the data, which will speed up the processing time each time the data set is downloaded.
-
-2. Contribute to HuggingFace Hub.
-
-Each dataset in the HF Hub is a Git repository with a specific structure and readme file, and HuggingFace provides an API to push the dataset and, alternatively, a user interface directly in the website to populate the information in the readme file.
-
-
+The contribution contains three steps: first, on your development machine transform your dataset into a ``datasets.Dataset`` object, the preferred format for datasets in HF Hub; second, upload the dataset to HuggingFace Hub and detail it its readme how can be used with Flower Dataset; third, share your dataset with us and we will add it to the `recommended FL dataset list `_
 
 Creating a dataset locally
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 You can create a local dataset directly using the `datasets` library or load it in any custom way and transform it to the `datasets.Dataset` from other Python objects.
 To complete this step, we recommend reading our guide available here: :doc:`how-to-use-with-local-data` or/and reading the guide from HF `Create a dataset `_.
 
+.. tip::
+    We recommend that you do not upload custom scripts to HuggingFace Hub; instead, create the dataset locally and upload the data, which will speed up the processing time each time the data set is downloaded.
+
 Contribution to the HuggingFace Hub
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Each dataset in the HF Hub is a Git repository with a specific structure and readme file, and HuggingFace provides an API to push the dataset and, alternatively, a user interface directly in the website to populate the information in the readme file.
+
 Contributions to the HuggingFace Hub come down to:
 
 1. creating an HF repository for the dataset.
From c001125e6fb11a8a8464fe3784e554d0efcf0bbe Mon Sep 17 00:00:00 2001
From: Adam Narozniak
Date: Fri, 29 Nov 2024 10:30:20 +0100
Subject: [PATCH 6/8] Add a new section

---
 .../doc/source/contributor-how-to-contribute-dataset.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/datasets/doc/source/contributor-how-to-contribute-dataset.rst b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
index 9d1cc4f0195..5708307d9fb 100644
--- a/datasets/doc/source/contributor-how-to-contribute-dataset.rst
+++ b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
@@ -49,4 +49,8 @@ Note that the push of the dataset is straightforward, and here's what it could l
 
 To make the dataset easily accessible in FL we recommend adding the "Use in FL" section. Here's an example of how it is done in `one of our reps `_ for the cinic10 dataset.
 
-That's it! You have successfully contributed a dataset to the HuggingFace Hub. If you want the dataset listed in our `recommended FL dataset list `_ , please send a PR or ping us in `Slack `_ #contributions channel.
\ No newline at end of file
+Increasing visibility of the dataset
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+If you want the dataset listed in our `recommended FL dataset list `_ , please send a PR or ping us in `Slack `_ #contributions channel.
+
+That's it! You have successfully contributed a dataset to the HuggingFace Hub and made it available for FL community. Thank you for your contribution!
\ No newline at end of file
From 621e7357f521cc2cea415a5010a31bb1dfe9fb5d Mon Sep 17 00:00:00 2001
From: Adam Narozniak
Date: Fri, 29 Nov 2024 10:30:45 +0100
Subject: [PATCH 7/8] Change the first sentence

---
 datasets/doc/source/contributor-how-to-contribute-dataset.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/datasets/doc/source/contributor-how-to-contribute-dataset.rst b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
index 5708307d9fb..7f393e590db 100644
--- a/datasets/doc/source/contributor-how-to-contribute-dataset.rst
+++ b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
@@ -1,7 +1,7 @@
 How to contribute a dataset
 ===========================
 
-To make a dataset available in Flower Dataset (`flwr-datasets`) for other in the community to use, it needs to be added to `HuggingFace Hub `_ .
+To make a dataset available in Flower Dataset (`flwr-datasets`), you need to add the dataset to `HuggingFace Hub `_ .
 
 This guide will explain the best practices we found when adding datasets ourselves and point to the HFs guides. To see the datasets added by Flower, visit https://huggingface.co/flwrlabs.
 
From da5b94da84f827c9ccaa66a678c4e48e43af6fb7 Mon Sep 17 00:00:00 2001
From: Adam Narozniak <51029327+adam-narozniak@users.noreply.github.com>
Date: Fri, 29 Nov 2024 14:59:54 +0100
Subject: [PATCH 8/8] Apply suggestions from code review

Co-authored-by: Javier
---
 .../doc/source/contributor-how-to-contribute-dataset.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/datasets/doc/source/contributor-how-to-contribute-dataset.rst b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
index 7f393e590db..07a6ba6378f 100644
--- a/datasets/doc/source/contributor-how-to-contribute-dataset.rst
+++ b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
@@ -1,7 +1,7 @@
 How to contribute a dataset
 ===========================
 
-To make a dataset available in Flower Dataset (`flwr-datasets`), you need to add the dataset to `HuggingFace Hub `_ .
+To make a dataset available in Flower Dataset (`flwr-datasets`), you need to add the dataset to `HuggingFace Hub `_ .
 
 This guide will explain the best practices we found when adding datasets ourselves and point to the HFs guides. To see the datasets added by Flower, visit https://huggingface.co/flwrlabs.
 
@@ -12,7 +12,7 @@ The contribution contains three steps: first, on your development machine transf
 Creating a dataset locally
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 You can create a local dataset directly using the `datasets` library or load it in any custom way and transform it to the `datasets.Dataset` from other Python objects.
-To complete this step, we recommend reading our guide available here: :doc:`how-to-use-with-local-data` or/and reading the guide from HF `Create a dataset `_.
+To complete this step, we recommend reading our :doc:`how-to-use-with-local-data` guide or/and the `Create a dataset `_ guide from HF.
 
 .. tip::
     We recommend that you do not upload custom scripts to HuggingFace Hub; instead, create the dataset locally and upload the data, which will speed up the processing time each time the data set is downloaded.
@@ -47,7 +47,7 @@ Note that the push of the dataset is straightforward, and here's what it could l
     # Push the dataset to the HuggingFace Hub
     dataset.push_to_hub("you-hf-username/your-ds-name")
 
-To make the dataset easily accessible in FL we recommend adding the "Use in FL" section. Here's an example of how it is done in `one of our reps `_ for the cinic10 dataset.
+To make the dataset easily accessible in FL we recommend adding the "Use in FL" section. Here's an example of how it is done in `one of our repos `_ for the cinic10 dataset.
 
 Increasing visibility of the dataset
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^