Provide guidance for specifying values

mikemckiernan · mikemckiernan · commit a83c49dbadce · 2025-02-04T15:30:56.000-05:00
Refer to NVIDIA-NeMo#966. Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com>
diff --git a/docs/user-guides/configuration-guide.md b/docs/user-guides/configuration-guide.md
@@ -697,10 +697,18 @@ The following table describes the subfields for the `streaming` field:
 * - streaming.chunk_size
   - Specifies the number of tokens for each chunk.
     The toolkit applies output guardrails on each chunk of tokens.
+
+    Larger values provide more meaningful information for the rail to assess,
+    but can add latency while accumulating tokens for a full chunk.
+    The added latency is most noticeable if you specify `stream_first: False`.
   - `200`
 
 * - streaming.context_size
   - Specifies the number of tokens to keep from the previous chunk to provide context and continuity in processing.
+
+    Larger values provide continuity across chunks with minimal impact on latency.
+    With smaller values, the rail might fail to detect violations that span chunks.
+    Specifying approximately 25% of `chunk_size` provides a good compromise.
   - `50`
 
 * - streaming.stream_first