diff --git a/README.md b/README.md
index 805376773..d5dada153 100644
--- a/README.md
+++ b/README.md
@@ -243,7 +243,7 @@ New options to note:
 In LLaVA-1.5, we evaluate models on a diverse set of 12 benchmarks. To ensure reproducibility, we evaluate the models with greedy decoding. We do not evaluate using beam search, so that the inference process is consistent with the real-time outputs of the chat demo.
-Detailed evaluation scripts coming soon.
+See [Evaluation.md](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md).
 
 ### GPT-assisted Evaluation
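The paragraph in the diff above notes that evaluation uses greedy decoding (not beam search or sampling) so results are reproducible. A minimal sketch of why greedy decoding is deterministic, using a hypothetical stand-in for a model's next-token logits (the `toy_logits` function and the 5-token vocabulary are illustrative assumptions, not part of LLaVA):

```python
import math

def toy_logits(prefix):
    # Hypothetical stand-in for a language model: a deterministic
    # function mapping the token prefix to next-token logits over a
    # 5-token vocabulary.
    return [math.sin(t + sum(prefix)) for t in range(5)]

def greedy_decode(prompt, steps):
    # Greedy decoding: at every step, pick the argmax token. No random
    # sampling is involved, so repeated runs on the same prompt always
    # produce the identical token sequence.
    tokens = list(prompt)
    for _ in range(steps):
        logits = toy_logits(tokens)
        tokens.append(max(range(len(logits)), key=lambda t: logits[t]))
    return tokens

out1 = greedy_decode([1, 2], steps=4)
out2 = greedy_decode([1, 2], steps=4)
print(out1 == out2)  # identical on every run: decoding is reproducible
```

With a real model, the analogous setting is disabling sampling and beam search at generation time so that decoding is a pure argmax at each step.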