
Commit 45e700a

update hstu example readme

1 parent 3c4b347


examples/hstu/README.md

Lines changed: 4 additions & 62 deletions
@@ -2,7 +2,8 @@
 
 ## Generative Recommender Introduction
 Meta's paper ["Actions Speak Louder Than Words"](https://arxiv.org/abs/2402.17152) introduces a novel paradigm for recommendation systems called **Generative Recommenders (GRs)**, which reformulates recommendation tasks as generative modeling problems. The work introduced Hierarchical Sequential Transduction Units (HSTU), a novel architecture designed to handle high-cardinality, non-stationary data streams in large-scale recommendation systems. HSTU enables both retrieval and ranking tasks. As noted in the paper, “HSTU-based GRs, with 1.5 trillion parameters, improve metrics in online A/B tests by 12.4% and have been deployed on multiple surfaces of a large internet platform with billions of users.”
-While **distributed-recommender** supports both retrieval and ranking use cases, in the following sections, we will guide you through the process of building a generative recommender for ranking tasks.
+
+In this example, we introduce the model architecture, training, and inference processes of HSTU. For more details, refer to the [training](./training/) and [inference](./inference/) entry folders, which include comprehensive guides and benchmark results.
 
 ## Ranking Model Introduction
 The model structure of the generative ranking model is depicted in the following picture.
@@ -31,69 +32,10 @@ The HSTU block is a core component of the architecture, which modifies tradition
 ### Prediction Head
 The prediction head of the HSTU model employs an MLP network structure, enabling multi-task predictions.
 
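As a rough illustration only (the class and dimensions below are invented rather than taken from this repository), such a multi-task MLP head can be sketched in PyTorch as:

```python
import torch
import torch.nn as nn

class MultiTaskMLPHead(nn.Module):
    """Minimal sketch: a shared MLP that emits one logit per task."""

    def __init__(self, embedding_dim: int, hidden_dim: int, num_tasks: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_tasks),  # e.g. click / like / follow logits
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch, seq_len, embedding_dim] produced by the HSTU blocks
        return self.mlp(hidden_states)

head = MultiTaskMLPHead(embedding_dim=512, hidden_dim=1024, num_tasks=3)
logits = head(torch.randn(8, 200, 512))  # -> shape [8, 200, 3]
```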
-## Parallelism for HSTU-based Generative Recommender
-Scaling is a crucial factor for HSTU-based GRs due to their demonstrated superior scalability compared to traditional Deep Learning Recommendation Models (DLRMs). According to the paper, while DLRMs plateau at around 200 billion parameters, GRs can scale up to 1.5 trillion parameters, resulting in improved model accuracy.
-
-However, achieving efficient scaling for GRs presents unique challenges. Existing libraries designed for large-scale training of LLMs or recommendation systems often fail to meet the specific needs of GRs:
-* **[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)**, which supports advanced parallelism (e.g., data, tensor, sequence, pipeline, and context parallelism), is not well suited for recommendation systems, whose massive embedding tables cannot be handled effectively by those forms of parallelism.
-* **[TorchRec](https://github.com/pytorch/torchrec)**, while providing solutions for sharding large embedding tables across GPUs, lacks robust support for dense model parallelism. This makes it difficult for users to combine embedding and dense parallelism without significant design effort.
-
-To address these limitations, a hybrid approach combining sparse and dense parallelism is introduced, as the picture below shows.
-**TorchRec** is employed to shard the large embedding tables effectively.
-**Megatron-Core** is used to support data and context parallelism for the dense components of the model. Please note that context parallelism is planned as part of future development.
-This integration ensures efficient training by coordinating sparse (embedding) and dense (context/data) parallelisms within a single model.
-![parallelism](./figs/parallelism.png)
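To make the sparse half of this hybrid concrete, here is a minimal sketch of how TorchRec shards a large embedding table. The table name and sizes are invented for illustration, and this is not the repository's actual wiring:

```python
import torch
import torchrec
from torchrec.distributed import DistributedModelParallel

# Sparse side: a large item-embedding table defined on the "meta" device,
# so no memory is allocated until TorchRec decides where each shard lives.
ebc = torchrec.EmbeddingBagCollection(
    device=torch.device("meta"),
    tables=[
        torchrec.EmbeddingBagConfig(
            name="item_table",          # hypothetical table name
            embedding_dim=512,
            num_embeddings=32_000_000,  # far too large for a single GPU
            feature_names=["item_id"],
        )
    ],
)

# DistributedModelParallel computes a sharding plan and scatters the table
# across the GPUs of the job (assumes torch.distributed is initialized).
# The dense HSTU blocks would be replicated data-parallel around it.
sharded_ebc = DistributedModelParallel(ebc)
```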
-
-## Dataset Introduction
-The following datasets are supported:
-
-### Dataset Information
-#### **MovieLens**
-Refer to [MovieLens 1M](https://grouplens.org/datasets/movielens/1m/) and [MovieLens 20M](https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset) for details.
-#### **KuaiRand**
-| dataset | # users | seqlen max | seqlen min | seqlen mean | seqlen median | # items |
-|---------------|---------|------------|------------|-------------|---------------|------------|
-| kuairand_pure | 27285 | 910 | 1 | 1 | 39 | 7551 |
-| kuairand_1k | 1000 | 49332 | 10 | 5038 | 3379 | 4369953 |
-| kuairand_27k | 27285 | 228000 | 100 | 11796 | 8591 | 32038725 |
-
-Refer to [KuaiRand](https://kuairand.com/) for details.
 ## Running the examples
 
-Before getting started, please make sure that all prerequisites are fulfilled. You can refer to the [Get Started](../../README.md) section in the root directory of the repo to set up the environment.
-
-### Dataset Preprocessing
-We provide a preprocessor script that downloads the raw data if it is not already present and processes it into CSV files:
-```bash
-mkdir -p ./tmp_data && python3 preprocessor.py --dataset_name <dataset-name>
-```
-The following values of `<dataset-name>` are supported (see the example after this list):
-* ml-1m
-* ml-20m
-* kuairand-pure
-* kuairand-1k
-* kuairand-27k
-* all: preprocess all of the above datasets
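For example, to fetch and preprocess just the MovieLens 1M dataset, the command above becomes `mkdir -p ./tmp_data && python3 preprocessor.py --dataset_name ml-1m`.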
-
-### Start training
-The entrypoints for training are `pretrain_gr_retrieval.py` and `pretrain_gr_ranking.py`. We use gin-config to specify the model structure, training arguments, hyperparameters, etc.
-To run the retrieval task with the `MovieLens 20M` dataset:
-```bash
-# Before running `pretrain_gr_retrieval.py`, make sure that the current working directory is `hstu`
-PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000 pretrain_gr_retrieval.py --gin-config-file movielen_retrieval.gin
-```
-
-To run the ranking task with the `MovieLens 20M` dataset:
-```bash
-# Before running `pretrain_gr_ranking.py`, make sure that the current working directory is `hstu`
-PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000 pretrain_gr_ranking.py --gin-config-file movielen_ranking.gin
-```
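Both commands launch a single-process run; on a multi-GPU machine, raising torchrun's `--nproc_per_node` to the number of available GPUs is the usual way to scale out the data-parallel and sharded-embedding dimensions described above.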
+* [HSTU training example](./training/)
+* [HSTU inference example](./inference/)
 
 # Acknowledgements
 