Commit a83c49d

Provide guidance for specifying values
Refer to NVIDIA-NeMo#966.

Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com>
1 parent 7af6a51 commit a83c49d

1 file changed: 8 additions, 0 deletions

docs/user-guides/configuration-guide.md

Lines changed: 8 additions & 0 deletions
@@ -697,10 +697,18 @@ The following table describes the subfields for the `streaming` field:
 * - streaming.chunk_size
 - Specifies the number of tokens for each chunk.
 The toolkit applies output guardrails on each chunk of tokens.
+
+Larger values provide more meaningful information for the rail to assess,
+but can add latency while accumulating tokens for a full chunk.
+The risk of higher latency is especially true if you specify `stream_first: False`.
 - `200`
 
 * - streaming.context_size
 - Specifies the number of tokens to keep from the previous chunk to provide context and continuity in processing.
+
+Larger values provide continuity across chunks with minimal impact on latency.
+Small values might fail to detect cross-chunk violations.
+Specifying approximately 25% of `chunk_size` provides a good compromise.
 - `50`
 
 * - streaming.stream_first
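
To make the cross-chunk guidance in the added text concrete, the following is a minimal Python sketch of how `chunk_size` and `context_size` might interact. It is a simplified model and not the toolkit's implementation: the helpers `rail_windows` and `phrase_visible` are hypothetical names, tokens are plain strings, and the "rail" is reduced to an exact phrase match. The sketch assumes each rail check sees the current chunk plus the last `context_size` tokens of the preceding text.

```python
from typing import Iterator, List, Tuple


def rail_windows(
    tokens: List[str], chunk_size: int = 200, context_size: int = 50
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (context, chunk) pairs for a token stream.

    Hypothetical helper: assumes each chunk is checked together with the
    last `context_size` tokens that precede it.
    """
    if chunk_size <= 0 or context_size < 0:
        raise ValueError("chunk_size must be > 0 and context_size must be >= 0")
    for start in range(0, len(tokens), chunk_size):
        context = tokens[max(0, start - context_size):start]
        chunk = tokens[start:start + chunk_size]
        yield context, chunk


def phrase_visible(
    tokens: List[str], phrase: List[str], chunk_size: int, context_size: int
) -> bool:
    """Return True if any single window (context + chunk) contains the phrase."""
    n = len(phrase)
    for context, chunk in rail_windows(tokens, chunk_size, context_size):
        window = context + chunk
        if any(window[i:i + n] == phrase for i in range(len(window) - n + 1)):
            return True
    return False


# A 4-token phrase that straddles the boundary between two 8-token chunks.
tokens = [f"t{i}" for i in range(16)]
phrase = tokens[6:10]  # two tokens at the end of chunk 0, two at the start of chunk 1

print(phrase_visible(tokens, phrase, chunk_size=8, context_size=0))  # False
print(phrase_visible(tokens, phrase, chunk_size=8, context_size=1))  # False
print(phrase_visible(tokens, phrase, chunk_size=8, context_size=2))  # True
```

In this toy model, no window contains the whole phrase until `context_size` reaches 2, which is 25% of the 8-token `chunk_size`. That threshold depends on where the phrase falls relative to the chunk boundary, so it illustrates the guidance rather than proving it. The documented defaults follow the same ratio: `50` is 25% of `200`.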
