Description
Problem
If GPU acceleration is enabled, Jan appears to follow an "all or nothing" strategy: the model fails to load entirely if, for example, there is not enough VRAM.
Success Criteria
A much better approach would be "graceful degradation": if the model cannot fit in VRAM, load it on the CPU instead, perhaps with a UI warning to notify the user of what has happened. That way the model would at least still respond, even if more slowly. It would also allow accelerating small models while still being able to work with larger ones.
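To illustrate the idea, here is a minimal sketch of what graceful degradation could look like, using llama-cpp-python as a stand-in for Jan's actual backend. The warn_user() helper is a hypothetical placeholder for whatever notification Jan's UI would show, and the sketch assumes the backend reports the VRAM failure as a catchable error rather than crashing the process:

```python
from llama_cpp import Llama


def warn_user(message: str) -> None:
    # Hypothetical placeholder for a UI notification in Jan.
    print(f"[warning] {message}")


def load_model(model_path: str) -> Llama:
    try:
        # First attempt: offload every layer to the GPU.
        return Llama(model_path=model_path, n_gpu_layers=-1)
    except Exception:
        # e.g. the backend could not allocate enough VRAM during load.
        warn_user(
            "Not enough VRAM to accelerate this model; "
            "falling back to CPU. Responses will be slower."
        )
        # Second attempt: keep every layer on the CPU.
        return Llama(model_path=model_path, n_gpu_layers=0)
```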
An ideal approach would be to implement partial model offloading. That way it would be possible to estimate how many layers can be safely offloaded to VRAM, so the model is accelerated as much as possible on the given hardware.
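A rough sketch of the "guess the layer count" part, assuming an NVIDIA GPU queried through NVML (pynvml) and a llama.cpp-style n_gpu_layers setting; the per-layer size estimate and the 0.9 safety margin are illustrative guesses, not measured values:

```python
import os

import pynvml


def estimate_gpu_layers(model_path: str, total_layers: int,
                        safety_margin: float = 0.9) -> int:
    """Guess how many layers fit in the currently free VRAM."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        free_vram = pynvml.nvmlDeviceGetMemoryInfo(handle).free
    finally:
        pynvml.nvmlShutdown()

    # Crude approximation: treat layers as roughly equal in size and use
    # the file size on disk as a proxy for the in-memory footprint.
    bytes_per_layer = os.path.getsize(model_path) / total_layers
    usable_vram = free_vram * safety_margin
    return min(total_layers, int(usable_vram // bytes_per_layer))
```

The result could then be passed as n_gpu_layers when loading the model, with the CPU fallback above as a safety net if the guess turns out to be too optimistic.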
Additional context
I think LMStudio and GPT4All implement partial model offloading, so it is clearly feasible. However, they just expose a slider in the UI and leave it to the user to figure out how many layers can fit in VRAM.