Commit
Add whiten ops before computing advantages (#887)
* Add whiten ops before computing advantages

1. The LLaMA 2 paper says:
```
We also find it important to whiten the final linear scores (shown here by reversing the sigmoid with the logit function) in order to increase stability and balance properly with the KL penalty term (β) above.
```
2. This function is taken from [alpaca_farm](https://github.com/tatsu-lab/alpaca_farm/blob/64e489c67ea502ab5fa944bebde3078c9722f6ee/src/alpaca_farm/rl/ppo_trainer.py#L86)

* Fix type def of self

---------

Co-authored-by: Lin Junpeng <linjunpeng@sensetime.com>
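
For readers unfamiliar with the term, whitening here means rescaling a batch of scores to zero mean and unit variance before they are used to compute advantages. The following is a minimal pure-Python sketch of that idea, not the code added by this commit or the alpaca_farm function it references. The name `whiten`, the `shift_mean` and `eps` parameters, and the sample `rewards` are all assumptions made for illustration.

```python
import math
from typing import List


def whiten(values: List[float], shift_mean: bool = True, eps: float = 1e-8) -> List[float]:
    """Rescale values to unit variance; by default also center them at zero.

    Illustrative sketch only: the signature and defaults are assumptions,
    not the implementation referenced in the commit message.
    """
    n = len(values)
    mean = sum(values) / n
    # Population (biased) variance; eps guards against division by zero.
    var = sum((v - mean) ** 2 for v in values) / n
    scale = 1.0 / math.sqrt(var + eps)
    whitened = [(v - mean) * scale for v in values]
    if not shift_mean:
        # Normalize the spread only and restore the original mean.
        whitened = [w + mean for w in whitened]
    return whitened


# Hypothetical reward scores for one batch.
rewards = [0.5, 2.0, -1.0, 3.5]
whitened_rewards = whiten(rewards)
```

With `shift_mean=False` the scores keep their original mean and only the scale changes, which may be preferable when the absolute level of the reward should still be compared against a KL penalty.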