
feat: Automatic fallback to CPU in case of GPU loading failure #805

Closed
@the-vindicar

Description


Problem
If GPU acceleration is enabled, Jan appears to follow an "all or nothing" strategy: the model fails to load entirely if there is not enough VRAM, for example.

Success Criteria
A much better approach would be "graceful degradation": the model loads on the CPU instead, perhaps with a UI warning to notify the user of what happened. That way the model would still respond, even if more slowly. Additionally, it would allow accelerating small models while still working with larger ones.
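The fallback described above could be sketched roughly as follows. This is a minimal illustration, not Jan's actual loading code: `load_model` stands in for a hypothetical loader callable (e.g. a wrapper around llama.cpp's model load), and it is assumed to raise `RuntimeError` when VRAM is exhausted.

```python
def load_with_cpu_fallback(load_model, n_gpu_layers):
    """Try GPU offload first; fall back to CPU-only on failure.

    `load_model` is a hypothetical loader callable assumed to raise
    RuntimeError when there is not enough VRAM.
    Returns (model, used_fallback) so the UI can show a warning
    when the fallback path was taken.
    """
    try:
        return load_model(n_gpu_layers=n_gpu_layers), False
    except RuntimeError:
        # Not enough VRAM: degrade gracefully to CPU instead of
        # failing outright, and let the caller notify the user.
        return load_model(n_gpu_layers=0), True
```

The `used_fallback` flag is what would drive the suggested UI warning.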

An ideal approach would be to implement partial model offloading. That way it would be possible to estimate how many layers can safely be offloaded into VRAM, so the model is accelerated as much as possible on the given hardware.
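Such an estimate could look something like the sketch below. It assumes the model's layers are roughly equal in size and reserves a fixed amount of VRAM for the KV cache and scratch buffers; the `reserve_bytes` default is a made-up placeholder, not a measured figure.

```python
def estimate_gpu_layers(free_vram_bytes, model_size_bytes, n_layers,
                        reserve_bytes=512 * 1024**2):
    """Rough guess at how many layers fit in free VRAM.

    Assumes layers are roughly uniform in size; reserve_bytes is
    held back for the KV cache and scratch buffers (placeholder
    value, would need tuning against real hardware).
    """
    if n_layers <= 0 or model_size_bytes <= 0:
        return 0
    per_layer = model_size_bytes / n_layers
    usable = max(0, free_vram_bytes - reserve_bytes)
    return min(n_layers, int(usable // per_layer))
```

For example, with 8 GiB of free VRAM and a 7 GiB, 32-layer model, all 32 layers fit; with only 4 GiB free, the guess drops to 16 layers, and the rest would run on the CPU.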

Additional context
I think LM Studio and GPT4All implement partial model offloading, so it is clearly possible. However, they just put a slider in the UI and leave it to the user to find out how many layers actually fit into VRAM.

Metadata

Status: Icebox