Commit
Updated DAPT readme
Signed-off-by: Janaki <jvamaraju@nvidia.com>
jvamaraju committed Oct 2, 2024
1 parent bd23c35 commit aab9c96
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion tutorials/llm/llama/domain-adaptive-pretraining/README.md
@@ -11,6 +11,6 @@ Here, we share a tutorial with best practices on custom tokenization + DAPT (dom

* `./code/data` should contain curated data from the chip domain after processing with NeMo Curator. The playbook for DAPT data curation can be found [here](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/dapt-curation)

- * `./code/general_data should contain open-source general-purpose data that the llama-2 was trained on. This data will help identify token/vocabulary differences between general-purpose and domain-specific datasets. This data can be downloaded from [Wikipedia](https://huggingface.co/datasets/legacy-datasets/wikipedia), [CommonCrawl](https://data.commoncrawl.org/) etc., and curated with [NeMo Curator](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/single_node_tutorial)
+ * `./code/general_data` should contain open-source general-purpose data that llama-2 was trained on. This data will help identify token/vocabulary differences between general-purpose and domain-specific datasets. Data can be downloaded from [Wikipedia](https://huggingface.co/datasets/legacy-datasets/wikipedia), [CommonCrawl](https://data.commoncrawl.org/) etc., and curated with [NeMo Curator](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/single_node_tutorial)

* `./code/custom_tokenization.ipynb` walks through the custom tokenization workflow required for DAPT (Domain Adaptive Pre-training)
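
As an aside (not part of the commit or the tutorial code), the following is a minimal sketch of the comparison the `general_data` bullet above motivates: tokenize samples from the domain corpus and the general-purpose corpus with the base Llama-2 tokenizer and look for vocabulary that only the domain data exercises. The directory layout, plain-text file format, and the `meta-llama/Llama-2-7b-hf` tokenizer ID are illustrative assumptions, not taken from the tutorial.

```python
from collections import Counter
from pathlib import Path

from transformers import AutoTokenizer

# Assumption: the base Llama-2 tokenizer (gated on the Hugging Face Hub,
# so access must be requested and an auth token configured).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def token_counts(data_dir: str, limit: int = 1000) -> Counter:
    """Count token-ID occurrences across up to `limit` plain-text files."""
    counts: Counter = Counter()
    for path in sorted(Path(data_dir).glob("**/*.txt"))[:limit]:
        ids = tokenizer(path.read_text(), add_special_tokens=False)["input_ids"]
        counts.update(ids)
    return counts

# Hypothetical layout: curated domain data and downloaded general-purpose data
# stored as plain-text files under the directories the README describes.
domain_counts = token_counts("./code/data")
general_counts = token_counts("./code/general_data")

# Token IDs that the domain sample uses but the general-purpose sample never
# does hint at vocabulary a custom tokenizer could cover more efficiently.
domain_only = {tok for tok in domain_counts if general_counts[tok] == 0}
print(f"{len(domain_only)} token IDs appear only in the domain sample")
```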
