When I used build.py to build a Code Llama model, I got an error while processing the word-embedding layer.
The cause is that Code Llama's vocabulary size is 32016, which is not divisible by 64. The script padded its newly initialized word-embedding matrix up to the next multiple of 64 (32064), but it did not pad the original word-embedding matrix. As a result, the script tried to fit a matrix of shape (32064, hidden_size) into one of shape (32016, hidden_size).
I believe there is a similar bug when building lm_head for Code Llama in build.py.
I worked around it by expanding Code Llama's word embedding to 32064 rows and saving it in advance, then processing the padded checkpoint with build.py.
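For reference, the padding step can be sketched roughly as below. This is a minimal NumPy illustration, not the actual build.py code; the function name and shapes are my own, and in practice the weights would come from the model checkpoint:

```python
import numpy as np

def pad_vocab(embedding: np.ndarray, multiple: int = 64) -> np.ndarray:
    """Pad the vocab dimension of a (vocab_size, hidden_size) matrix
    up to the next multiple of `multiple`, appending zero rows."""
    vocab_size, hidden_size = embedding.shape
    padded_vocab = ((vocab_size + multiple - 1) // multiple) * multiple
    if padded_vocab == vocab_size:
        return embedding
    pad_rows = np.zeros((padded_vocab - vocab_size, hidden_size),
                        dtype=embedding.dtype)
    return np.concatenate([embedding, pad_rows], axis=0)

# Code Llama's vocab is 32016; padding to a multiple of 64 gives 32064.
emb = np.zeros((32016, 8), dtype=np.float16)  # toy hidden_size for illustration
print(pad_vocab(emb).shape)  # (32064, 8)
```

The padded rows are zeros, so they never affect real tokens; lm_head would need the same treatment on its vocab dimension.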
Everything works perfectly after that.