[2.5] Update higgs data link #2945

Merged: 1 commit, Sep 16, 2024
2 changes: 1 addition & 1 deletion examples/advanced/random_forest/README.md
@@ -11,7 +11,7 @@ which is an optimized distributed gradient boosting library covering random fore
Follow along in this [notebook](./random_forest.ipynb) for an interactive experience.

### Dataset
-This example illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).
+This example illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).
This dataset contains 11 million instances, each with 28 attributes.

Please note that the UCI's website may experience occasional downtime.
2 changes: 1 addition & 1 deletion examples/advanced/random_forest/random_forest.ipynb
@@ -22,7 +22,7 @@
"which is an optimized distributed gradient boosting library, also covering random forest. In this example, we illustrate the use of NVFlare to carry out *horizontal* federated learning with tree-based collaboration - forming a random forest.\n",
"\n",
"### Dataset\n",
-"This example illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).\n",
+"This example illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).\n",
"This dataset contains 11 million instances, each with 28 attributes.\n",
"\n",
"### Horizontal Federated Learning\n",
2 changes: 1 addition & 1 deletion examples/advanced/sklearn-linear/README.md
@@ -30,7 +30,7 @@ This can be achieved by setting the `warm_start` flag of SGDClassifier to
`True` in order to allow repeated fitting of the classifiers to the local data.

## Data preparation
-The examples illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).
+The examples illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).
This dataset contains 11 million instances, each with 28 attributes. Download the dataset from the HIGGS link above, containing a single `.csv` file.
By default, we assume the dataset is downloaded, uncompressed, and stored
in `DATASET_ROOT/HIGGS.csv`.
@@ -77,7 +77,7 @@
"metadata": {},
"source": [
"## 2. Data preparation \n",
-"The examples illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).\n",
+"The examples illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).\n",
"This dataset contains 11 million instances, each with 28 attributes.\n",
"By default, we assume the dataset is downloaded, uncompressed, and stored in `DATASET_ROOT/HIGGS.csv`.\n",
"Please note that the UCI's website may experience occasional downtime.\n"
2 changes: 1 addition & 1 deletion examples/advanced/vertical_xgboost/README.md
@@ -8,7 +8,7 @@ python3 -m pip install -r requirements.txt
```

## Preparing HIGGS Data
-In this example we showcase a binary classification task based on the [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs), which contains 11 million instances, each with 28 features and 1 class label.
+In this example we showcase a binary classification task based on the [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/), which contains 11 million instances, each with 28 features and 1 class label.

### Download and Store Dataset
First download the dataset from the HIGGS link above, which is a single zipped `.csv` file.
2 changes: 1 addition & 1 deletion examples/advanced/xgboost/README.md
@@ -12,7 +12,7 @@ They use [XGBoost](https://github.com/dmlc/xgboost),
which is an optimized distributed gradient boosting library.

### HIGGS
-The examples illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).
+The examples illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).
This dataset contains 11 million instances, each with 28 attributes.

Please note that the UCI's website may experience occasional downtime.
2 changes: 1 addition & 1 deletion examples/advanced/xgboost/data_job_setup.ipynb
@@ -21,7 +21,7 @@
"which is an optimized distributed gradient boosting library.\n",
"\n",
"### HIGGS\n",
-"The examples illustrate a binary classification task based on [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs).\n",
+"The examples illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).\n",
"This dataset contains 11 million instances, each with 28 attributes.\n",
"Please note that the UCI's website may experience occasional downtime.\n",
"\n",
4 changes: 2 additions & 2 deletions examples/hello-world/step-by-step/higgs/README.md
@@ -1,8 +1,8 @@

# Training traditional ML classifiers with HIGGS dataset

-The [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs) contains 11 million instances, each with 28 attributes, for binary classification to predict whether an event corresponds to the decayment of a Higgs boson or not. Follow the [prepare_data.ipynb](prepare_data.ipynb) notebook to download the HIGGS dataset and prepare the data splits.
-(Please note that the [UCI's website](https://archive.ics.uci.edu/dataset/280/higgs) may experience occasional downtime)
+The [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/) contains 11 million instances, each with 28 attributes, for binary classification to predict whether an event corresponds to the decayment of a Higgs boson or not. Follow the [prepare_data.ipynb](prepare_data.ipynb) notebook to download the HIGGS dataset and prepare the data splits.
+(Please note that the [UCI's website](https://mlphysics.ics.uci.edu/data/higgs/) may experience occasional downtime)

The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator.
The data has been produced using Monte Carlo simulations. The first 21 features are kinematic properties measured by the particle detectors in the accelerator. The last 7 features are functions of the first 21 features; these are high-level features derived by physicists to help discriminate between the two classes.
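
Note (not part of this PR): the column layout described in the README text above can be sketched as follows. This is a minimal illustration, not code from the repository. It assumes that column 1 holds the class label (implied by the features starting at column 2) and that the CSV has no header row. The helper names `split_higgs_row` and `read_higgs` are hypothetical.

```python
import csv
import io

N_LOW_LEVEL = 21   # kinematic features (columns 2-22, 1-indexed)
N_HIGH_LEVEL = 7   # derived features (assumed columns 23-29, 1-indexed)


def split_higgs_row(row):
    """Split one CSV row into (label, low_level, high_level).

    Assumption: column 1 is the class label, followed by 28 features.
    """
    values = [float(v) for v in row]
    expected = 1 + N_LOW_LEVEL + N_HIGH_LEVEL
    if len(values) != expected:
        raise ValueError(f"expected {expected} columns, got {len(values)}")
    label = int(values[0])
    low_level = values[1:1 + N_LOW_LEVEL]
    high_level = values[1 + N_LOW_LEVEL:]
    return label, low_level, high_level


def read_higgs(handle, limit=None):
    """Read up to `limit` rows from an open text handle (all rows if None)."""
    rows = []
    for i, row in enumerate(csv.reader(handle)):
        if limit is not None and i >= limit:
            break
        rows.append(split_higgs_row(row))
    return rows
```

Since the full file has 11 million rows, passing a small `limit` (for example `read_higgs(open(path, newline=""), limit=1000)`, with `path` pointing at the uncompressed CSV) is a cheap way to check the layout before running a full data split.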