Skip to content

Commit 1f92c6e

Browse files
committed
updated README
1 parent dec0124 commit 1f92c6e

File tree

1 file changed

+4
-0
lines changed

1 file changed

+4
-0
lines changed

benchmarks/data_generator/README.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,10 +26,14 @@ The following tools help analyze and synthesize new data based on the [mooncake
2626

2727
**Hash ID Generation:** Each new hash ID is the next consecutive integer after the last one used. Two `hash_ids` sharing the same integers represents the prefix overlap. To generate these increasing hash IDs from a list of texts, we provide the `texts_to_hashes` function in `hasher.py`.
2828

29+
> [!note]The `hashes_to_texts` function can then be used to generate back random texts from these hash IDs sampling from Lorem Ipsum.
30+
2931
**Timestamp:** The arrival time (in milliseconds) of the request since the first request, which can be the same for multiple requests arriving simultaneously.
3032

3133
**Block Size and Hash IDs:** In this example, the `block_size` (the page size of the KV cache) is assumed to be 512. The length of the `hash_ids` array equals `input_length // block_size`.
3234

35+
A general workflow can use `texts_to_hashes` to convert texts to hashes, then use `datagen synthesize` to generate new hashes, then use `hashes_to_texts` to convert them back to random texts.
36+
3337
## Prefix Analyzer
3438

3539
The Prefix Analyzer provides statistics on a trace file, such as Input Sequence Length (ISL), Output Sequence Length (OSL), and theoretical cache hit rate.

0 commit comments

Comments
 (0)