* **Definite Articles:** Removed definite articles where possible to streamline language (e.g., changed "The string to replicate" to "String to replicate").
* **Type Annotations:**
* Always include type definitions, indicating whether a parameter is optional and specifying its default value.
* Note that `Optional` means the value can be `None`, while `*optional*` means the user is not required to pass a value.
E.g., for arguments that can't be `None` and aren't required:
```txt
foo (`int`, *optional*, defaults to `4`):
```
For arguments that can be `None` and are required:
```txt
foo (`Optional[int]`):
```
For arguments that can be `None` and aren't required (in this case, if the default value is `None`, you can omit it):
```txt
foo (`Optional[int]`, *optional*):
```
* **String Defaults:**
* Ensured that default string values are wrapped in double quotes:
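For example (a hypothetical `foo` parameter with an illustrative default value):

```txt
foo (`str`, *optional*, defaults to `"auto"`):
```

Taken together, these conventions might look as follows in a full docstring. This is a sketch built around a hypothetical `replicate` function echoing the example above, not an API from the codebase:

```python
from typing import Optional

def replicate(string: str, count: int = 4, sep: Optional[str] = None) -> str:
    r"""
    Replicate a string a given number of times.

    Args:
        string (`str`):
            String to replicate.
        count (`int`, *optional*, defaults to `4`):
            Number of copies to produce.
        sep (`Optional[str]`, *optional*):
            Separator inserted between copies. If `None`, copies are joined directly.
    """
    return (sep or "").join([string] * count)
```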
In on-policy distillation, the student model generates rollouts for each batch of training data. We then obtain the probability distributions for each token of the rollouts from both the student and the teacher model, and optimize the student to minimize the Kullback-Leibler (KL) divergence between its own token distributions and those of the teacher.
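To make the objective concrete, here is a minimal sketch of the token-level loss described above. The function name and tensor shapes are hypothetical; TRL's actual trainer code handles generation, batching, and masking differently:

```python
import torch
import torch.nn.functional as F

def on_policy_distillation_loss(
    student_logits: torch.Tensor,    # (batch, seq_len, vocab) student scores on its own rollouts
    teacher_logits: torch.Tensor,    # (batch, seq_len, vocab) teacher scores on the same rollouts
    completion_mask: torch.Tensor,   # (batch, seq_len) 1 for generated tokens, 0 for prompt/padding
) -> torch.Tensor:
    # Token-level distributions from both models.
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # Per-token KL(student || teacher) = sum_v p_s(v) * (log p_s(v) - log p_t(v)).
    per_token_kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    # Average over rollout tokens only; minimizing this pulls the student toward the teacher.
    return (per_token_kl * completion_mask).sum() / completion_mask.sum().clamp(min=1)
```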
> Packing may cause batch contamination, where adjacent sequences influence one another. This can be problematic for some applications. For more details, see [#1230](https://github.com/huggingface/trl/issues/1230).
## Liger for reducing peak memory usage
> [Liger Kernel](https://github.com/linkedin/Liger-Kernel) is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%.
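As a usage sketch, Liger can typically be switched on through the `use_liger_kernel` flag that `SFTConfig` inherits from `transformers.TrainingArguments` (after `pip install liger-kernel`). The model and dataset names below are placeholders:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# `use_liger_kernel` patches supported model architectures with Liger's
# Triton kernels before training starts, lowering peak memory usage.
training_args = SFTConfig(output_dir="sft-liger", use_liger_kernel=True)

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",  # example model; any Liger-supported causal LM works
    args=training_args,
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),  # example dataset
)
trainer.train()
```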