From 2943fd105d89e844b6ccf488520b265340409b33 Mon Sep 17 00:00:00 2001
From: Ziyue Xu
Date: Mon, 16 Sep 2024 12:35:32 -0400
Subject: [PATCH] update higgs data link (#2941)

---
 examples/advanced/random_forest/README.md | 2 +-
 examples/advanced/random_forest/random_forest.ipynb | 2 +-
 examples/advanced/sklearn-linear/README.md | 2 +-
 examples/advanced/sklearn-linear/sklearn_linear_higgs.ipynb | 2 +-
 examples/advanced/vertical_xgboost/README.md | 2 +-
 examples/advanced/xgboost/README.md | 2 +-
 examples/advanced/xgboost/data_job_setup.ipynb | 2 +-
 examples/hello-world/step-by-step/higgs/README.md | 4 ++--
 8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/examples/advanced/random_forest/README.md b/examples/advanced/random_forest/README.md
index 541e72017e..1476455099 100644
--- a/examples/advanced/random_forest/README.md
+++ b/examples/advanced/random_forest/README.md
@@ -11,7 +11,7 @@ which is an optimized distributed gradient boosting library covering random fore
 Follow along in this [notebook](./random_forest.ipynb) for an interactive experience.
 
 ### Dataset
-This example illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).
+This example illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).
 This dataset contains 11 million instances, each with 28 attributes.
 Please note that the UCI's website may experience occasional downtime.
 
diff --git a/examples/advanced/random_forest/random_forest.ipynb b/examples/advanced/random_forest/random_forest.ipynb
index cd38c942d4..5fd4792fc7 100644
--- a/examples/advanced/random_forest/random_forest.ipynb
+++ b/examples/advanced/random_forest/random_forest.ipynb
@@ -22,7 +22,7 @@
  "which is an optimized distributed gradient boosting library, also covering random forest. In this example, we illustrate the use of NVFlare to carry out *horizontal* federated learning with tree-based collaboration - forming a random forest.\n",
  "\n",
  "### Dataset\n",
- "This example illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).\n",
+ "This example illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).\n",
  "This dataset contains 11 million instances, each with 28 attributes.\n",
  "\n",
  "### Horizontal Federated Learning\n",
diff --git a/examples/advanced/sklearn-linear/README.md b/examples/advanced/sklearn-linear/README.md
index 54b9905fe9..0a962730a4 100644
--- a/examples/advanced/sklearn-linear/README.md
+++ b/examples/advanced/sklearn-linear/README.md
@@ -30,7 +30,7 @@ This can be achieved by setting the `warm_start` flag of SGDClassifier to `True`
 in order to allow repeated fitting of the classifiers to the local data.
 
 ## Data preparation
-The examples illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).
+The examples illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).
 This dataset contains 11 million instances, each with 28 attributes.
 Download the dataset from the HIGGS link above, containing a single `.csv` file.
 By default, we assume the dataset is downloaded, uncompressed, and stored in `DATASET_ROOT/HIGGS.csv`.
diff --git a/examples/advanced/sklearn-linear/sklearn_linear_higgs.ipynb b/examples/advanced/sklearn-linear/sklearn_linear_higgs.ipynb
index 0e5fadc02a..ab58b75b7e 100644
--- a/examples/advanced/sklearn-linear/sklearn_linear_higgs.ipynb
+++ b/examples/advanced/sklearn-linear/sklearn_linear_higgs.ipynb
@@ -77,7 +77,7 @@
  "metadata": {},
  "source": [
  "## 2. Data preparation \n",
- "The examples illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).\n",
+ "The examples illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).\n",
  "This dataset contains 11 million instances, each with 28 attributes.\n",
  "By default, we assume the dataset is downloaded, uncompressed, and stored in `DATASET_ROOT/HIGGS.csv`.\n",
  "Please note that the UCI's website may experience occasional downtime.\n"
diff --git a/examples/advanced/vertical_xgboost/README.md b/examples/advanced/vertical_xgboost/README.md
index 9330f8c0e7..bddf82f2a6 100644
--- a/examples/advanced/vertical_xgboost/README.md
+++ b/examples/advanced/vertical_xgboost/README.md
@@ -8,7 +8,7 @@ python3 -m pip install -r requirements.txt
 ```
 
 ## Preparing HIGGS Data
-In this example we showcase a binary classification task based on the [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs), which contains 11 million instances, each with 28 features and 1 class label.
+In this example we showcase a binary classification task based on the [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/), which contains 11 million instances, each with 28 features and 1 class label.
 
 ### Download and Store Dataset
 First download the dataset from the HIGGS link above, which is a single zipped `.csv` file.
diff --git a/examples/advanced/xgboost/README.md b/examples/advanced/xgboost/README.md
index 8f101e502f..207e94334b 100644
--- a/examples/advanced/xgboost/README.md
+++ b/examples/advanced/xgboost/README.md
@@ -12,7 +12,7 @@ They use [XGBoost](https://github.com/dmlc/xgboost),
 which is an optimized distributed gradient boosting library.
 
 ### HIGGS
-The examples illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).
+The examples illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).
 This dataset contains 11 million instances, each with 28 attributes.
 Please note that the UCI's website may experience occasional downtime.
 
diff --git a/examples/advanced/xgboost/data_job_setup.ipynb b/examples/advanced/xgboost/data_job_setup.ipynb
index 18d04d366a..b70df6b386 100644
--- a/examples/advanced/xgboost/data_job_setup.ipynb
+++ b/examples/advanced/xgboost/data_job_setup.ipynb
@@ -21,7 +21,7 @@
  "which is an optimized distributed gradient boosting library.\n",
  "\n",
  "### HIGGS\n",
- "The examples illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).\n",
+ "The examples illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).\n",
  "This dataset contains 11 million instances, each with 28 attributes.\n",
  "Please note that the UCI's website may experience occasional downtime.\n",
  "\n",
diff --git a/examples/hello-world/step-by-step/higgs/README.md b/examples/hello-world/step-by-step/higgs/README.md
index 9f344b43d3..bf2315c948 100644
--- a/examples/hello-world/step-by-step/higgs/README.md
+++ b/examples/hello-world/step-by-step/higgs/README.md
@@ -1,8 +1,8 @@
 # Training traditional ML classifiers with HIGGS dataset
 
-The [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs) contains 11 million instances, each with 28 attributes, for binary classification to predict whether an event corresponds to the decayment of a Higgs boson or not. Follow the [prepare_data.ipynb](prepare_data.ipynb) notebook to download the HIGGS dataset and prepare the data splits.
-(Please note that the [UCI's website](https://archive.ics.uci.edu/dataset/280/higgs) may experience occasional downtime)
+The [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/) contains 11 million instances, each with 28 attributes, for binary classification to predict whether an event corresponds to the decayment of a Higgs boson or not. Follow the [prepare_data.ipynb](prepare_data.ipynb) notebook to download the HIGGS dataset and prepare the data splits.
+(Please note that the [UCI's website](https://mlphysics.ics.uci.edu/data/higgs/) may experience occasional downtime)
 
 The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator.
 The data has been produced using Monte Carlo simulations. The first 21 features are kinematic properties measured by the particle detectors in the accelerator. The last 7 features are functions of the first 21 features; these are high-level features derived by physicists to help discriminate between the two classes.