Build error in codellama model's word embedding layer (32016 not divisible by 64) #301

@CaesarWWK

Description

When I use build.py to build a Code Llama model, I get an error while processing the word-embedding layer.

The cause is that Code Llama's word-embedding size is 32016, which is not divisible by 64. The script pads its newly initialized word-embedding matrix up to a multiple of 64, but it does not pad the original word-embedding matrix. As a result, the script tries to fit a matrix of shape (32064, hidden_size) into one of shape (32016, hidden_size).
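For concreteness, here is the arithmetic behind the mismatch, assuming the script rounds the vocab dimension up to the next multiple of 64 (consistent with the 32064 figure above):

```python
vocab_size = 32016                         # Code Llama's vocabulary size
multiple = 64
padded = ((vocab_size + multiple - 1) // multiple) * multiple  # round up
print(vocab_size % multiple)  # 16 -> not divisible by 64
print(padded)                 # 32064 -> shape of the new matrix,
                              # which no longer matches the checkpoint's 32016 rows
```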

I believe there is a similar bug when building lm_head for Code Llama in build.py as well.

I worked around the bug by padding Code Llama's word-embedding matrix to 32064 and saving it in advance, then processing the padded checkpoint with build.py. Everything works after that.
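The workaround can be sketched roughly as follows. This is not the actual build.py code: `pad_vocab` is a hypothetical helper, and NumPy stands in for whatever tensor type the checkpoint actually uses. The extra rows are zero-initialized, matching tokens the model was never trained on:

```python
import numpy as np

def pad_vocab(weight: np.ndarray, multiple: int = 64) -> np.ndarray:
    """Pad the vocab (first) dimension of an embedding or lm_head
    weight matrix with zero rows up to the next multiple of `multiple`."""
    vocab_size, hidden_size = weight.shape
    pad = (-vocab_size) % multiple          # rows needed to reach the next multiple
    if pad == 0:
        return weight
    padding = np.zeros((pad, hidden_size), dtype=weight.dtype)
    return np.concatenate([weight, padding], axis=0)

# Code Llama's embedding: 32016 rows -> padded to 32064
embedding = np.random.randn(32016, 4096).astype(np.float32)
padded = pad_vocab(embedding)
print(padded.shape)  # (32064, 4096)
```

Saving the padded matrix back into the checkpoint before running build.py avoids the shape mismatch; the same padding would need to be applied to lm_head.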

Metadata

Labels

bug (Something isn't working), triaged (Issue has been triaged by maintainers)
