Hi,
As stated in both the paper and the webpage, DeepSeek-V2-Lite has 15.7B total parameters and 2.4B active parameters. However, my quick calculation gives a little over 2.6B active parameters, while my total matches the reported figure when the word embedding and lm_head are included. Please point out any mistake I may have made in calculating the active parameters.
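
To make the arithmetic concrete, here is a minimal sketch of this kind of count. Every value in `ASSUMED_CFG` is an assumption that should be checked against the model's published `config.json`, and the function names are hypothetical. It assumes MLA attention without query compression, no bias terms, one dense first layer, and untied embedding and lm_head matrices.

```python
# Rough parameter count for a DeepSeek-V2-Lite-style MoE model.
# All config values below are ASSUMPTIONS; verify them against config.json.
ASSUMED_CFG = {
    "vocab_size": 102400,
    "hidden_size": 2048,
    "num_hidden_layers": 27,
    "num_attention_heads": 16,
    "kv_lora_rank": 512,
    "qk_nope_head_dim": 128,
    "qk_rope_head_dim": 64,
    "v_head_dim": 128,
    "intermediate_size": 10944,      # dense MLP width
    "moe_intermediate_size": 1408,   # width of one expert
    "n_routed_experts": 64,
    "n_shared_experts": 2,
    "num_experts_per_tok": 6,
    "first_k_dense_replace": 1,      # number of leading dense layers
}


def attention_params(c):
    """MLA attention weights, assuming no q compression and no biases."""
    h, n = c["hidden_size"], c["num_attention_heads"]
    nope, rope, v = c["qk_nope_head_dim"], c["qk_rope_head_dim"], c["v_head_dim"]
    r = c["kv_lora_rank"]
    q_proj = h * n * (nope + rope)
    kv_a_proj = h * (r + rope)       # compressed KV plus shared rope key
    kv_a_norm = r
    kv_b_proj = r * n * (nope + v)
    o_proj = n * v * h
    return q_proj + kv_a_proj + kv_a_norm + kv_b_proj + o_proj


def mlp_params(hidden, inter):
    """Gated MLP: gate, up, and down projections."""
    return 3 * hidden * inter


def count_params(c, active_only, embedding_matrices=2):
    """Total or per-token active parameters.

    embedding_matrices: how many of (word embedding, lm_head) to count.
    """
    h = c["hidden_size"]
    n_layers = c["num_hidden_layers"]
    n_dense = c["first_k_dense_replace"]
    n_moe = n_layers - n_dense

    expert = mlp_params(h, c["moe_intermediate_size"])
    router = c["n_routed_experts"] * h
    n_routed = c["num_experts_per_tok"] if active_only else c["n_routed_experts"]
    moe_layer = (n_routed + c["n_shared_experts"]) * expert + router

    per_layer_common = attention_params(c) + 2 * h   # attention + two norms
    return (
        embedding_matrices * c["vocab_size"] * h
        + n_layers * per_layer_common
        + n_dense * mlp_params(h, c["intermediate_size"])
        + n_moe * moe_layer
        + h                                           # final norm
    )


total = count_params(ASSUMED_CFG, active_only=False)
active_both = count_params(ASSUMED_CFG, active_only=True)
active_one = count_params(ASSUMED_CFG, active_only=True, embedding_matrices=1)

for name, value in [("total", total), ("active, embedding + lm_head", active_both),
                    ("active, one of the two", active_one)]:
    print(f"{name}: {value / 1e9:.2f}B")
```

Under these assumed values the sketch gives roughly 15.7B total parameters and about 2.66B active parameters when both the word embedding and lm_head are counted. Counting only one of the two matrices gives about 2.45B. This shows how sensitive the active figure is to the embedding convention; it does not establish how the published 2.4B figure was computed.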
Thanks!