Commit: Update README.
AFDWang committed Nov 30, 2023
1 parent: efd5789 · commit: 00b8abc
Showing 1 changed file with 14 additions and 14 deletions: galvatron/models/README.md

```bash
sh scripts/search_dist.sh
```

Users can customize multiple parallelism optimization options:

### Model Configuration
Users can set `model_size` to easily get a pre-defined model configuration. Users can also customize the model configuration: set `set_model_config_manually` to `1` and specify the model configs manually, or set `set_layernum_manually` to `1` and specify only the layer numbers manually.
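
As a rough sketch, the model-configuration arguments might look like the following (argument names come from this README; the example `model_size` value and the surrounding script structure are assumptions):

```bash
# Hypothetical excerpt of scripts/search_dist.sh -- illustrative values only.
MODEL_ARGS=(
    --model_size llama-7b          # example pre-defined configuration name
    --set_model_config_manually 0  # set to 1 to supply the whole model config manually
    --set_layernum_manually 1      # set to 1 to override only the layer numbers
)
```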

### Cluster Size & Memory Constraint
Galvatron can perform the search over multiple nodes with the same number of GPUs. Users should set `num_nodes`, `num_gpus_per_node`, and `memory_constraint` (the memory budget for each GPU).
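
A similar sketch for the cluster and memory arguments (names from this README; the values and the unit of `memory_constraint` are assumptions):

```bash
# Hypothetical excerpt of scripts/search_dist.sh -- illustrative values only.
CLUSTER_ARGS=(
    --num_nodes 2           # search over 2 nodes (example value)
    --num_gpus_per_node 8   # 8 GPUs on each node (example value)
    --memory_constraint 24  # per-GPU memory budget, assumed to be in GB
)
```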

```bash
sh scripts/train_dist.sh
```

Users can customize multiple training options:

### Model Configuration
Users can set `model_size` to easily get a pre-defined model configuration. Users can also customize the model configuration: set `set_model_config_manually` to `1` and specify the model configs manually, or set `set_layernum_manually` to `1` and specify only the layer numbers manually.

### Cluster Environment
Galvatron can perform training over multiple nodes with the same number of GPUs. Users should set `NUM_NODES`, `NUM_GPUS_PER_NODE`, `MASTER_ADDR`, `MASTER_PORT`, and `NODE_RANK` according to the environment.
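
For example, a minimal sketch of these variables for a 2-node, 8-GPU-per-node cluster (the variable names come from this README; the values and the use of `export` are assumptions):

```bash
# Hypothetical cluster environment settings for scripts/train_dist.sh.
export NUM_NODES=2               # total number of nodes (example value)
export NUM_GPUS_PER_NODE=8       # GPUs on each node (example value)
export MASTER_ADDR=192.168.0.1   # IP address of the rank-0 node (example value)
export MASTER_PORT=9999          # an open port on the rank-0 node (example value)
export NODE_RANK=0               # this node's rank, from 0 to NUM_NODES-1
```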

### Parallelism Strategy

In distributed training with Galvatron, users can either train models with the optimal parallelism strategy found by the parallelism optimization to obtain the best throughput, or specify the hybrid parallelism strategies as they like.

#### JSON Config Mode [Recommended]
JSON config mode is a **recommended** layerwise hybrid parallel training mode, activated by setting the argument `galvatron_config_path` to a config path in the `configs` directory. In JSON config mode, users don't need to be aware of the details of the searched parallelism strategies and don't need to tune any parallelism strategies or hyper-parameters. Users can simply apply the optimal parallelism strategy saved in the `configs` directory after parallelism optimization by setting `galvatron_config_path` to `./configs/galvatron_config_xxx.json`. For advanced users, JSON config mode also provides a more fine-grained approach to parallelism tuning.
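
A sketch of activating JSON config mode (the flag name and the config file pattern come from this README; passing it as a command-line flag inside the script is an assumption):

```bash
# Hypothetical excerpt of scripts/train_dist.sh: run with a searched strategy.
PARALLEL_ARGS=(
    --galvatron_config_path ./configs/galvatron_config_xxx.json  # xxx: placeholder name
)
```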

#### GLOBAL Config Mode
GLOBAL config mode is a global hybrid parallel training mode, activated by setting the argument `galvatron_config_path` to `None`. In this mode, users can specify `pp_deg`, `global_tp_deg`, `global_tp_consec`, `sdp`, `global_train_batch_size`, `chunks`, `global_checkpoint`, and `pipeline_type` to determine the global parallelism strategy, and all layers of the Transformer model use the same hybrid parallelism strategy assigned by the user (just as in Megatron-LM).

### Arguments
1. JSON Config Mode
- `galvatron_config_path`: str, path to a JSON config; setting it activates JSON config mode. If activated, the arguments of GLOBAL config mode are ignored and overwritten by the JSON config.
2. GLOBAL Config Mode
- `global_train_batch_size`: Integer, global batch size of distributed training.
- `pp_deg`: Integer, pipeline parallel (PP) degree.
- `global_tp_deg`: Integer, tensor parallel (TP) degree.
- `global_tp_consec`: `0`/`1`, whether the communication group of TP is consecutive (e.g., `[0,1,2,3]` is consecutive while `[0,2,4,6]` is not).
- `sdp`: `0`/`1`, whether to use SDP instead of DP.
- `chunks`: Integer, number of microbatches of PP.
- `global_checkpoint`: `0`/`1`, whether to turn on activation checkpointing for the whole model.
- `pipeline_type`: `gpipe` or `pipedream_flush`, the pipeline schedule to use.
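
Putting the GLOBAL config mode arguments together, a hedged sketch (flag names are from the list above; the example values and the exact command-line syntax inside `scripts/train_dist.sh` are assumptions):

```bash
# Hypothetical excerpt of scripts/train_dist.sh: GLOBAL config mode, i.e. one
# hybrid parallelism strategy applied to every layer.
PARALLEL_ARGS=(
    --pp_deg 2                       # pipeline parallel degree (example value)
    --global_tp_deg 2                # tensor parallel degree (example value)
    --global_tp_consec 1             # TP groups use consecutive ranks, e.g. [0,1]
    --sdp 1                          # use SDP instead of plain DP
    --global_train_batch_size 32     # global batch size (example value)
    --chunks 4                       # number of pipeline microbatches (example value)
    --global_checkpoint 0            # no whole-model activation checkpointing
    --pipeline_type pipedream_flush  # or gpipe
)
```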

### Other Training Optimizations
Set `mixed_precision` to enable mixed-precision training (e.g., `bf16`). Set `use-flash-attn` to enable [FlashAttention-2](https://github.com/Dao-AILab/flash-attention) features.
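
For example (flag names come from this README; whether they take these exact forms on the command line is an assumption):

```bash
# Hypothetical excerpt of scripts/train_dist.sh: extra training optimizations.
OPT_ARGS=(
    --mixed_precision bf16  # mixed-precision training in bf16 (example value)
    --use-flash-attn        # enable FlashAttention-2 features
)
```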
