ymcui edited this page May 8, 2024 · 6 revisions

Frequently Asked Questions

Question 1: Why didn't you expand the vocabulary as in the first and second generations of this project?

Answer: For several reasons: 1) The Llama-3 vocabulary has already been expanded to 128K tokens, a significant increase over its predecessors; 2) Encoding-efficiency tests on Chinese Wikipedia show that the Llama-3 vocabulary is roughly on par with that of our Chinese-LLaMA-2 (about 95% of its efficiency); 3) Our work on Chinese-Mixtral demonstrates that vocabulary expansion is not a necessary condition for language transfer in large models. For the relevant experiments, see our paper: https://arxiv.org/abs/2403.01851
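The encoding-efficiency comparison mentioned above can be sketched as a characters-per-token measurement. The helper below is illustrative only: the whitespace "tokenizer" is a stand-in (in practice you would pass the `encode` method of the Llama-3 and Chinese-LLaMA-2 tokenizers and run it over Chinese Wikipedia text; the 95% figure comes from the answer above, not from this code).

```python
def encoding_efficiency(text: str, encode) -> float:
    """Characters encoded per token: a higher value means a more
    efficient vocabulary for the given text."""
    tokens = encode(text)
    return len(text) / max(len(tokens), 1)

# Stand-in tokenizer for illustration; a real comparison would use
# real tokenizers, e.g. AutoTokenizer.from_pretrained(...).encode.
naive_encode = str.split

sample = "the quick brown fox jumps over the lazy dog"
print(round(encoding_efficiency(sample, naive_encode), 2))  # chars per "token"
```

Comparing this ratio for two tokenizers on the same corpus gives the relative efficiency figure cited in the answer.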

Question 2: Will there be a 70B version released?

Answer: There is no commitment at present. Training may be considered if a suitable opportunity arises.

Question 3: Why is the instruction model not called Alpaca anymore?

Answer: Due to the naming and redistribution requirements described in the Llama-3 license.

Question 4: Can the models in this repository be used commercially?

Answer: Yes, but please carefully read the commercial licensing terms of the original Llama-3 first. Developers are responsible for compliance when using these models and should seek legal advice if necessary. This project is not responsible for any outcomes or consequential losses arising from the use of these models.

Question 5: Why not perform full-parameter training instead of using LoRA?

Answer: For several reasons: 1) Training cost and efficiency; 2) Llama has evolved through three generations, and its Chinese capability has gradually improved with each, so LoRA is sufficient to quickly strengthen Chinese understanding and generation; 3) Benchmark tests of some open-source Chinese Llama-3 models show that full-parameter training does not outperform PEFT-based training. Using LoRA in this project is therefore the result of balancing these factors.

Question 6: Why are the conversational capabilities of Llama-3-Chinese not good?

Answer: Llama-3-Chinese is a foundation model, not a conversational model; it is primarily intended as a base for further fine-tuning. For conversational or question-answering use, please use the Instruct version.

Question 7: Why does the instruction model claim to be ChatGPT?

Answer: We did not include any identity data when training the instruction model, so the model's self-identification depends mainly on the data used during the SFT stage. Because the SFT data contains a substantial amount of data derived from ChatGPT, the model tends to mimic ChatGPT's behavior and may therefore claim to be ChatGPT. Users need not be concerned about this. If needed, you can: 1) Write a system instruction that assigns identity information to the model; 2) Construct identity instruction data and further fine-tune on top of our model.
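For option 1 above, identity can be injected via a system message. The sketch below assembles a single-turn prompt in the Llama-3 chat format by hand, using the format's special tokens; the identity text is a hypothetical example, and in practice you would normally let the tokenizer's `apply_chat_template` build this string for you.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama-3 chat prompt with a system message."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Hypothetical identity instruction -- adjust to your own project.
system_msg = ("You are Llama-3-Chinese-8B-Instruct, a Chinese assistant. "
              "Never claim to be ChatGPT.")
prompt = build_llama3_prompt(system_msg, "你是谁？")
print(prompt)
```

The model then generates its reply after the trailing assistant header, with the system message steering how it describes itself.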

Question 8: What are the differences between v1 and v2 of the Instruct model?

Answer: Llama-3-Chinese-8B-Instruct (v1) is based on the original Meta-Llama-3-8B: it was first pre-trained on 120 GB of Chinese corpus and then fine-tuned on 5 million instruction samples. Llama-3-Chinese-8B-Instruct-v2 is based on the original Meta-Llama-3-8B-Instruct and was directly fine-tuned on 5 million instruction samples. Since the original Meta-Llama-3-8B-Instruct outperforms Meta-Llama-3-8B on downstream tasks, Llama-3-Chinese-8B-Instruct-v2 also performs somewhat better on downstream tasks than Llama-3-Chinese-8B-Instruct. However, downstream benchmarks are only one aspect; subjective quality varies, so choose the model that fits your scenario.
