When setting GLM-130B's model hyperparameters, you chose hidden_size=12288, attention_heads=96, layers=70, whereas LLaMA-65B uses hidden_size=8192, attention_heads=64, layers=80. GLM-130B appears to be wider, while mainstream models in the field tend to be deeper. What considerations went into choosing these hyperparameters when designing GLM-130B? Thanks!
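For context, here is a rough back-of-the-envelope sketch (my own illustration, not from the GLM-130B repo) of how these width/depth choices translate into parameter counts. The FFN inner sizes are assumptions based on the published configs (GLM-130B's GeGLU uses ~32768, LLaMA-65B's SwiGLU uses 22016), and embeddings, biases, and layer norms are ignored:

```python
# Approximate parameter count of a transformer block stack, assuming
# gated FFNs (GeGLU/SwiGLU, i.e. three weight matrices per FFN) and
# ignoring biases, layer norms, and embeddings.
def approx_block_params(hidden: int, ffn_hidden: int, layers: int) -> int:
    attn = 4 * hidden * hidden       # Q, K, V, and output projections
    ffn = 3 * hidden * ffn_hidden    # gate, up, and down projections
    return layers * (attn + ffn)

# FFN inner sizes below are assumptions taken from the released configs.
glm = approx_block_params(12288, 32768, 70)   # wider, shallower
llama = approx_block_params(8192, 22016, 80)  # narrower, deeper
print(f"GLM-130B  blocks: ~{glm / 1e9:.0f}B parameters")   # ~127B
print(f"LLaMA-65B blocks: ~{llama / 1e9:.0f}B parameters") # ~65B
```

This only shows that the two configurations land near their stated sizes; it doesn't by itself answer why width was favored over depth, which is the question above.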