When I used build.py to build a Code Llama model, I got an error while processing the word-embedding layer.
The cause is that Code Llama's vocabulary size is 32016, which is not divisible by 64. The script padded its newly initialized word-embedding matrix up to the next multiple of 64 (32064), but it did not pad the original word-embedding matrix. As a result, the script tried to fit a matrix of shape (32064, hidden_size) into one of shape (32016, hidden_size).
I believe there is a similar bug when building lm_head for Code Llama in build.py.
I worked around it by expanding Code Llama's word embedding to 32064 rows and saving it in advance, then processing the padded checkpoint with build.py.
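For reference, the padding step can be sketched roughly as below. This is a minimal NumPy illustration, not the actual build.py code; the function name and shapes are my own, and in practice the weights would come from the model checkpoint:

```python
import numpy as np

def pad_vocab(embedding: np.ndarray, multiple: int = 64) -> np.ndarray:
    """Pad the vocab dimension of a (vocab_size, hidden_size) matrix
    up to the next multiple of `multiple`, appending zero rows."""
    vocab_size, hidden_size = embedding.shape
    padded_vocab = ((vocab_size + multiple - 1) // multiple) * multiple
    if padded_vocab == vocab_size:
        return embedding
    pad_rows = np.zeros((padded_vocab - vocab_size, hidden_size),
                        dtype=embedding.dtype)
    return np.concatenate([embedding, pad_rows], axis=0)

# Code Llama's vocab is 32016; padding to a multiple of 64 gives 32064.
emb = np.zeros((32016, 8), dtype=np.float16)  # toy hidden_size for illustration
print(pad_vocab(emb).shape)  # (32064, 8)
```

The padded rows are zeros, so they never affect real tokens; lm_head would need the same treatment on its vocab dimension.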
Everything works perfectly after that.