# Update Log 0.2.3

**Full Changelog**: https://github.com/OpenBMB/BMTrain/compare/0.2.0...0.2.3

## What's New

### 1. Get rid of the torch C++ extension when compiling

Before 0.2.3, installing BMTrain required building the torch C++ extension, which was unfriendly to some users because the local CUDA runtime had to match the one torch was built with. Since 0.2.3, BMTrain no longer uses the torch C++ extension when compiling, which makes installing BMTrain from source much more convenient.
Just run `pip install .` to install BMTrain from source.

### 2. CI/CD

In 0.2.3, we bring GitHub Actions CI/CD to BMTrain. The CI/CD pipeline now runs on GitHub to ensure code quality: it runs the test cases and compiles the source code into wheel packages.

### 3. Loss scale management

In 0.2.3, we add minimum and maximum bounds for the loss scale to the loss scale manager. The manager still adjusts the loss scale dynamically, but now keeps it within the configured min and max values, which helps users avoid the loss scale becoming too large or too small.
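
For illustration, the bounds would typically be set when constructing the optimizer manager. The snippet below is a minimal sketch only: the keyword names `min_loss_scale` and `max_loss_scale`, the `loss_scale` argument, and the `add_optimizer` call are assumptions about the interface rather than a confirmed API, and the tiny model is purely a placeholder.

```python
import torch
import bmtrain as bmt

bmt.init_distributed()

# A tiny placeholder model, just so the optimizer has parameters to manage.
model = torch.nn.Linear(128, 128).cuda()

optimizer = bmt.optim.AdamOffloadOptimizer(model.parameters(), lr=1e-3)

# Sketch only: the min_loss_scale / max_loss_scale keyword names are
# assumptions about the loss scale manager's interface in this release.
optim_manager = bmt.optim.OptimManager(
    loss_scale=2**20,      # initial dynamic loss scale
    min_loss_scale=1,      # lower bound enforced by the manager
    max_loss_scale=2**24,  # upper bound enforced by the manager
)
optim_manager.add_optimizer(optimizer)
```

With such bounds in place, the dynamically adjusted loss scale always stays within the configured range.
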
### 4. Others

* Fix `bmt.load(model)` OOM when using torch >= 1.12
* `AdamOffloadOptimizer` now chooses the AVX flag automatically at runtime
* BMTrain is now fully compatible with torch 2.0

# Update Log 1.0.0

**Full Changelog**: https://github.com/OpenBMB/BMTrain/compare/0.2.3...1.0.0

## What's New

### 1. Refactor ZeRO, checkpointing, pipeline, and communication using PyTorch's hook mechanism

Users can now specify the ZeRO level of each `bmt.CheckpointBlock`.

**======= Before 1.0.0 =======**

```python
import bmtrain as bmt

# Before 1.0.0, the ZeRO level could only be set globally at init time.
bmt.init_distributed(zero_level=3)
```

The ZeRO level could only be set globally, and computation checkpointing could not be disabled. A `bmt.TransformerBlockList` had to be called through a single blocklist forward rather than by looping over its blocks.

**======= After 1.0.0 =======**

```python
import torch
import bmtrain as bmt

bmt.init_distributed()

# Construct blocks; TransformerEncoder is assumed to be a user-defined layer.
class Transformer(bmt.DistributedModule):
    def __init__(self, num_layers: int, dim_model: int, dim_head: int,
                 num_heads: int, dim_ff: int, bias: bool, dtype: torch.dtype) -> None:
        super().__init__()

        self.transformers = bmt.TransformerBlockList([
            bmt.Block(
                TransformerEncoder(
                    dim_model, dim_head, num_heads, dim_ff, bias, dtype
                ),
                use_checkpoint=True,  # checkpointing can now be toggled per block
                zero_level=3,         # ZeRO level is now set per block
            )
            for _ in range(num_layers)
        ])

    def forward(self, x):
        # return self.transformers(x)  # in v0.2.3 this was the only way to forward
        for block in self.transformers:
            x = block(x)
        return x
```

You can now specify the ZeRO level of each `bmt.CheckpointBlock` (an alias of `bmt.Block`), and computation checkpointing can be disabled by setting `use_checkpoint=False`. A `bmt.TransformerBlockList` can now also be called block by block in a loop.

### 2. Add BF16 support

BMTrain now supports BF16 training. Simply pass `dtype=torch.bfloat16` when constructing your model, and BMTrain will handle the rest.

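As an illustration, here is a minimal sketch that reuses the hypothetical `Transformer` module from the example in section 1 above; the concrete constructor arguments are made up, and the only point is that the dtype is passed through to the submodules.

```python
import torch
import bmtrain as bmt

bmt.init_distributed()

# Sketch: reuse the hypothetical Transformer module from the example above;
# passing torch.bfloat16 as the dtype is all that is needed to train in BF16.
model = Transformer(
    num_layers=8, dim_model=1024, dim_head=64, num_heads=16,
    dim_ff=4096, bias=False, dtype=torch.bfloat16,
)
```
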
### 3. Tensor parallel implementation

For this part, BMTrain only provides a set of parallel ops for building tensor parallel implementations, including `bmt.nn.OpParallelLinear` and `bmt.nn.VPEmbedding`. We also provide a tensor parallel training example among our training examples. You can simply use `bmt.init_distributed(tp_size=4)` to enable 4-way tensor parallel training.

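A minimal sketch of the entry point is shown below; only the `tp_size` argument comes from the release notes, and the exact signatures of the parallel ops are deliberately not shown here since they should be taken from the tensor parallel training example.

```python
import bmtrain as bmt

# Enable 4-way tensor parallelism at initialization time.
bmt.init_distributed(tp_size=4)

# The tensor-parallel building blocks live under bmt.nn
# (e.g. bmt.nn.OpParallelLinear, bmt.nn.VPEmbedding); see the
# tensor parallel training example for how they are wired into a model.
```
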
### 4. `AdamOffloadOptimizer` can save the whole gathered state

`AdamOffloadOptimizer` can now save its whole gathered state. This lets users checkpoint the complete, gathered optimizer state and resume training from it. For better performance, an asynchronous way of saving the state dict is provided to overlap I/O with computation.

```python
import bmtrain as bmt

# You can enable this feature in two ways: through the OptimManager's
# interface or through the optimizer's interface.
# (`optim_manager` and `optimizer` are assumed to have been created earlier.)
global_ckpt = optim_manager.state_dict(gather_opt=True)  # via the OptimManager
global_ckpt = optimizer.state_dict(gather=True)          # via the optimizer
```

### Others

* New tests for the new version of BMTrain