-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gguf: Add types for LLM architectures #640
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool work @ngxson
I've launched the CI run, let me know if any help is needed, LGTM otherwise
@ngxson locally you can check the errors since the CI uses same commands:
|
@julien-c @mishig25 Regarding how to actually use these types, I'm thinking about having a list of functions like this: getArchLlama(metadata: GGUFMetadata): ArchLlama?
getArchFalcon(metadata: GGUFMetadata): ArchFalcon?
... Which returns The idea is to prevent user from doing something like Do you have any other ideas? Thanks. P/s: I'm not planning to do that in this PR, it should be in another PR with more considerations before actually implement it. |
the proposal sounds good to me.
You can start with a draft PR and we can iterate if needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can write a small section on readme about this change, that would be great.
I've left one last nit here #640 (comment)
LGTM! Unless @julien-c comments in the next few hours, I will merge this PR soon
Thanks for the careful review. For now it's not very clear how to use these types, so I don't think I can add new section to readme right now. Maybe with the next PR (the proposal that you mentioned), I will add an example to readme. |
"whisper", | ||
] as const; | ||
|
||
const ARCHITECTURES = [...LLM_ARCHITECTURES, "rwkv", "whisper"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const ARCHITECTURES = [...LLM_ARCHITECTURES, "rwkv", "whisper"]; | |
const ARCHITECTURES = [...LLM_ARCHITECTURES, "rwkv", "whisper"] as const; |
I think you need to add this here otherwise the type inference is string[]
– unless we're not using the same version of typescript?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re. your proposal, please note that I am not super good with TS type inference, but IMO the ideal way we would use this is to enable like this:
if (metadata["general.architecture"] === "grok") {
const x = metadata; // type = TransformerLLMBase<"grok">
const foo = x["grok.attention.alibi_bias_max"]; // type = number
// and acess all other grok specific properties, with the right typing
}
do you think it's possible at all?
checked it manually. As of right now, ts server is just giving generic cc: @coyotte508 @ngxson do you know if above request is possible #640 (comment) ? |
maybe by doing some generics stuff in export type GGUFMetadata<K extends Architecture = Architecture> = {
version: Version;
tensor_count: bigint;
kv_count: bigint;
} & Partial<General<K>> &
Partial<ModelBase<K>>; maybe too complex though |
> = { [K in `${TArchitecture}.layer_count`]: number } & { [K in `${TArchitecture}.feed_forward_length`]: number } & { | ||
[K in `${TArchitecture}.context_length`]: number; | ||
} & { [K in `${TArchitecture}.embedding_length`]: number } & { [K in `${TArchitecture}.block_count`]: number }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's maybe more readable to have:
> = { [K in `${TArchitecture}.layer_count`]: number } & { [K in `${TArchitecture}.feed_forward_length`]: number } & { | |
[K in `${TArchitecture}.context_length`]: number; | |
} & { [K in `${TArchitecture}.embedding_length`]: number } & { [K in `${TArchitecture}.block_count`]: number }; | |
> = Record< | |
| `${TArchitecture}.layer_count` | |
| `${TArchitecture}.feed_forward_length` | |
| `${TArchitecture}.context_length` | |
| `${TArchitecture}.embedding_length` | |
| `${TArchitecture}.block_count`, | |
number | |
> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it's being formatted by pnpm run format
. Can we maybe ignore specific lines from the formatter?
Sorry I misunderstood the proposal. Yeah seems like a good idea, I'll try that (and also take a bit more time to clean up other parts of the PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we merge this one and improve in later PRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think we can merge now. I'm not having much time for doing code atm, but I'll open another pull to resolve this and the also the proposal related to usage of these types (hopefully I'll have time this weekend).
You can do this: const x: TransformerLLM = {} as any;
if ("grok.layer_count" in x) {
// below is typed correctly
console.log(x["grok.attention.layer_norm_epsilon"])
}
Yes it's possible probably You would need a mapped type: type Archs = {
"grok": ArchGrok,
...
} satisfies Record<Architecture, unknown> And then call Since we have code generation it's also possible to generate every possible This can totally be a follow up PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, although would appreciate #640 (comment) :)
i tend to favor verbose in those situations:) |
Let's merge ? |
@ngxson thanks a lot for the contribution ! |
Follow up #640 Ref comments: - #640 (review) by @julien-c suggests using a check `metadata["general.architecture"] === ...` to select the correct type - #640 (comment) by @coyotte508 suggests using less generic but more verbose code The type system introduce in this PR allows type-checking at both compile time & runtime: ```ts const model: GGUFMetadata<GGUFType.STRICT> = null as any; if (model["general.architecture"] === "whisper") { model["encoder.whisper.block_count"] = 0; // @ts-expect-error because it must be a number model["encoder.whisper.block_count"] = "abc"; } if (model["tokenizer.ggml.model"] === undefined) { // @ts-expect-error because it's undefined model["tokenizer.ggml.eos_token_id"] = 1; } if (model["tokenizer.ggml.model"] === "gpt2") { // @ts-expect-error because it must be a number model["tokenizer.ggml.eos_token_id"] = undefined; model["tokenizer.ggml.eos_token_id"] = 1; } if (model["general.architecture"] === "mamba") { model["mamba.ssm.conv_kernel"] = 0; // @ts-expect-error because it must be a number model["mamba.ssm.conv_kernel"] = "abc"; } if (model["general.architecture"] === "llama") { // @ts-expect-error llama does not have ssm.* keys model["mamba.ssm.conv_kernel"] = 0; } ``` Type checks can be disable with `GGUFMetadata<GGUFType.NON_STRICT>`
Resolve #566
This PR introduces a generator script that pulls
llama.cpp
file, then use regex to extract needed information.It also addresses some smaller issues:
Attention
,Rope
,MOE
,...) use type intersection while it should be unionTokenizerModel
should be limited to 4 models, seellm_load_vocab()
: https://github.com/ggerganov/llama.cpp/blob/928e0b7013c862cf10701957b3d654aa70f11bd8/llama.cpp#L4198While it's fun to have the script & these type definitions, I think it would be nice to have some clearer directions on how to use them in the future.
Demo: