# HSTU Training Example

We support both retrieval and ranking models whose backbones are HSTU layers. In this example collection, users can specify the model structure via a gin-config file. Supported datasets are listed below. For the gin-config interface, please refer to the [inline comments](../utils/gin_config_args.py).
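A gin-config file binds dataset, model, and trainer arguments by name. As a rough illustration of the format only (the binding names below are hypothetical placeholders; the authoritative argument names live in `gin_config_args.py` and the shipped `.gin` files under `./training/configs/`):

```
# Hypothetical bindings -- check gin_config_args.py for the real names.
DatasetArgs.dataset_name = "ml-20m"
NetworkArgs.hidden_size = 512
NetworkArgs.num_layers = 8
TrainerArgs.train_batch_size = 128
```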

## Parallelism Introduction
To support large embedding tables and the scaling laws of the dense HSTU layers, this example integrates **[TorchRec](https://github.com/pytorch/torchrec)**, which shards the embedding tables, and **[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)**, which enables dense parallelism (e.g., Data, Tensor, Sequence, Pipeline, and Context parallelism).
This integration ensures efficient training by coordinating sparse (embedding) and dense (context/data) parallelisms within a single model.


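The division of labor can be pictured with a small framework-free sketch: the embedding table is sharded row-wise across ranks (in the spirit of TorchRec's row-wise sharding), and each lookup is routed to the rank that owns the row. This is a conceptual toy only, not the TorchRec or Megatron-LM API:

```python
import numpy as np

def shard_rows(table: np.ndarray, num_ranks: int) -> list:
    """Row-wise shard an embedding table across ranks (toy model of
    row-wise sharding; a real planner also balances memory and load)."""
    return np.array_split(table, num_ranks, axis=0)

def sharded_lookup(shards: list, ids: np.ndarray) -> np.ndarray:
    """Emulate a sharded lookup: each id is routed to the rank owning
    that row, then results are gathered (an all-to-all in a real system)."""
    boundaries = np.cumsum([s.shape[0] for s in shards])
    out = np.empty((len(ids), shards[0].shape[1]))
    for i, idx in enumerate(ids):
        rank = int(np.searchsorted(boundaries, idx, side="right"))
        local = idx - (boundaries[rank - 1] if rank > 0 else 0)
        out[i] = shards[rank][local]
    return out

rng = np.random.default_rng(0)
table = rng.normal(size=(10, 4))          # 10 item embeddings, dim 4
shards = shard_rows(table, num_ranks=3)   # ranks own 4 / 3 / 3 rows
ids = np.array([0, 7, 9])
# The sharded lookup must match indexing the unsharded table.
assert np.allclose(sharded_lookup(shards, ids), table[ids])
```

The dense HSTU parameters, by contrast, are replicated (data parallel) or partitioned (tensor/sequence/context parallel) by Megatron-LM, so the sparse and dense halves of the model scale independently.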
## Environment Setup
### Start from dockerfile

We provide a [dockerfile](../../../docker/Dockerfile) for users to build the environment:
```bash
git clone https://github.com/NVIDIA/recsys-examples.git && cd recsys-examples
docker build -f docker/Dockerfile --platform linux/amd64 -t recsys-examples:latest .
```
If you want to build an image for Grace, you can use:
```bash
git clone https://github.com/NVIDIA/recsys-examples.git && cd recsys-examples
docker build -f docker/Dockerfile --platform linux/arm64 -t recsys-examples:latest .
```
You can also set your own base image with `--build-arg BASE_IMAGE=<image>`.

### Start from source file
Before running the examples, build and install the libs under `corelib` following the instructions in their documentation:
- [HSTU attention documentation](../../../corelib/hstu/README.md)
- [Dynamic Embeddings documentation](../../../corelib/dynamicemb/README.md)

On top of those two core libs, Megatron-Core along with other libs is required. You can install them via pypi packages:

```bash
pip install torchx gin-config torchmetrics==1.0.3 typing-extensions iopath megatron-core==0.9.0
```

If the megatron-core package fails to install, usually due to Python version incompatibility, try cloning the source code and installing from it:

```bash
git clone -b core_r0.9.0 https://github.com/NVIDIA/Megatron-LM.git megatron-lm && \
pip install -e ./megatron-lm
```

We provide custom HSTU CUDA operators for enhanced performance. Install them with the following command:

```bash
cd /workspace/recsys-examples/examples/hstu && \
python setup.py install
```
### Dataset Introduction

The following datasets are supported:

#### **MovieLens**
Refer to [MovieLens 1M](https://grouplens.org/datasets/movielens/1m/) and [MovieLens 20M](https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset) for details.
#### **KuaiRand**

| dataset       | # users | seqlen max | seqlen min | seqlen mean | seqlen median | # items  |
|---------------|---------|------------|------------|-------------|---------------|----------|
| kuairand_pure | 27285   | 910        | 1          | 1           | 39            | 7551     |
| kuairand_1k   | 1000    | 49332      | 10         | 5038        | 3379          | 4369953  |
| kuairand_27k  | 27285   | 228000     | 100        | 11796       | 8591          | 32038725 |

Refer to [KuaiRand](https://kuairand.com/) for details.

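Sequence-length statistics like those in the table above come from per-user interaction counts. A minimal sketch of how such numbers can be computed from a flat interaction log (the input layout here is a toy assumption, not the actual KuaiRand schema):

```python
import numpy as np

def seqlen_stats(user_ids: list) -> dict:
    """Compute sequence-length statistics from a flat list of per-event
    user ids (one entry per interaction)."""
    _, counts = np.unique(user_ids, return_counts=True)
    return {
        "# users": len(counts),
        "seqlen max": int(counts.max()),
        "seqlen min": int(counts.min()),
        "seqlen mean": int(counts.mean()),
        "seqlen median": int(np.median(counts)),
    }

# Toy log: user 0 has 3 events, user 1 has 1, user 2 has 2.
stats = seqlen_stats([0, 0, 0, 1, 2, 2])
assert stats == {"# users": 3, "seqlen max": 3, "seqlen min": 1,
                 "seqlen mean": 2, "seqlen median": 2}
```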
## Running the examples

Before getting started, please make sure that all prerequisites are fulfilled. You can refer to the [Get Started](../../../README) section in the root directory of the repo to set up the environment.


### Dataset preprocessing

To prepare a dataset for training, use `preprocessor.py` under the hstu example folder of the project:

```bash
cd <root-to-repo>/examples/hstu &&
mkdir -p ./tmp_data && python3 ./preprocessor.py --dataset_name <"ml-1m"|"ml-20m"|"kuairand-pure"|"kuairand-1k"|"kuairand-27k">
```

### Start training
The entrypoints for training are `pretrain_gr_retrieval.py` and `pretrain_gr_ranking.py`. We use gin-config to specify the model structure, training arguments, hyper-parameters, etc.

Command to run the retrieval task with the `MovieLens 20m` dataset:

```bash
# Before running `pretrain_gr_retrieval.py`, make sure that the current working directory is `hstu`
cd <root-to-project>/examples/hstu
PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000 ./training/pretrain_gr_retrieval.py --gin-config-file ./training/configs/movielen_retrieval.gin
```

To run the ranking task with the `MovieLens 20m` dataset:
```bash
# Before running `pretrain_gr_ranking.py`, make sure that the current working directory is `hstu`
cd <root-to-project>/examples/hstu
PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000 ./training/pretrain_gr_ranking.py --gin-config-file ./training/configs/movielen_ranking.gin
```