
Conversation


Copilot AI commented Jan 18, 2026

  • Analyze the issue at lines 1914-1950 in src/whisper.cpp
  • Implement improved solution based on reviewer feedback:
    • Added skip, seek, tell function pointers to the whisper_model_loader interface (see the sketch after this list)
    • First pass scans tensor metadata and skips data using seek (no memory doubling)
    • Creates tensors with correct types before buffer allocation
    • Seeks back to tensor section for data loading
    • Maintains backwards compatibility for legacy loaders
  • Updated all loader implementations (file, buffer, VAD)
  • Build verification passes
  • Code review feedback addressed
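
For reference, a sketch of what the extended interface might look like. The context, read, eof, and close members mirror the existing whisper_model_loader in whisper.h; the seek, tell, and skip members and their exact signatures are assumptions inferred from this PR's description.

```cpp
// Sketch of the extended loader interface. The first four members match
// the existing whisper_model_loader; the last three are this PR's
// additions, with assumed names and signatures.
typedef struct whisper_model_loader {
    void * context;

    size_t (*read) (void * ctx, void * output, size_t read_size);
    bool   (*eof)  (void * ctx);
    void   (*close)(void * ctx);

    // New, optional callbacks; legacy loaders that cannot reposition
    // their stream may leave these null.
    bool   (*seek) (void * ctx, size_t offset); // absolute reposition
    size_t (*tell) (void * ctx);                // current offset
    bool   (*skip) (void * ctx, size_t n);      // advance n bytes
} whisper_model_loader;
```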
Original prompt

Regarding lines 1914-1950 of src/whisper.cpp:
The way this is implemented, the allocated memory on the device would still correspond to the original tensor type declared earlier in this function.

Need to figure out a better way to do it.
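
To make the mismatch concrete, a heavily simplified sketch of the problem pattern (the identifiers are illustrative, not the actual whisper.cpp code):

```cpp
// The tensor type is decided up front from the model header...
enum ggml_type wtype = GGML_TYPE_F16; // e.g. derived from the ftype field

// ...and tensors are created, and device buffers later sized, from it:
struct ggml_tensor * w = ggml_new_tensor_2d(ctx, wtype, ne0, ne1);

// But the tensor data section stores a per-tensor type, which can differ
// from wtype in mixed-precision models:
int32_t ttype;
read_safe(loader, ttype); // type actually stored in the file

// Loading ttype-sized data into a wtype-sized allocation is the bug.
```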



Copilot AI and others added 2 commits January 18, 2026 16:13

This change addresses the issue where allocated device memory doesn't
match the tensor type from the file. The fix implements a two-pass
approach:

1. Pre-scan: Read all tensor metadata and data from the file upfront
2. Create tensors using the actual types from the file (not assumed types)
3. Load weights from the pre-scanned buffer (see the sketch below)

This ensures that memory allocation on the device corresponds to the
correct tensor type from the file, fixing mixed precision model loading.

Co-authored-by: lhpqaq <63844184+lhpqaq@users.noreply.github.com>
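
As a rough illustration of this first approach (TensorHeader and the read_header callback are hypothetical stand-ins for whisper.cpp's per-tensor parsing, not real API), the pre-scan buffers every payload in host memory, which is the memory-doubling tradeoff the reviewer objects to further down:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-tensor metadata, as parsed from the file.
struct TensorHeader {
    std::string name;
    int32_t     ttype;   // type as stored in the file
    size_t      nbytes;  // payload size implied by type and shape
};

// Superseded pre-scan pass: read the whole tensor section into host
// memory, keyed by tensor name. Every payload is buffered, so peak
// memory use roughly doubles during loading.
template <typename Loader, typename ReadHeader>
std::map<std::string, std::pair<int32_t, std::vector<uint8_t>>>
prescan_tensors(Loader & loader, ReadHeader read_header) {
    std::map<std::string, std::pair<int32_t, std::vector<uint8_t>>> scanned;
    TensorHeader hdr;
    while (read_header(loader, hdr)) { // false at end of tensor section
        std::vector<uint8_t> data(hdr.nbytes);
        loader.read(loader.context, data.data(), hdr.nbytes);
        scanned.emplace(hdr.name,
                        std::make_pair(hdr.ttype, std::move(data)));
    }
    return scanned; // a full host-side copy of the tensor section
}
```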
- Move EOF check before reading partial header data
- Add comment explaining temporary memory doubling tradeoff
- Add warning when tensor not found in pre-scanned data
- Use name.c_str() consistently instead of name.data()

Co-authored-by: lhpqaq <63844184+lhpqaq@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor memory management for tensor types" to "Fix tensor type mismatch by pre-scanning file metadata before allocation" on Jan 18, 2026
Copilot AI requested a review from lhpqaq January 18, 2026 16:17
lhpqaq (Owner) commented Jan 18, 2026

@copilot Scanning the file twice definitely won't work; it would double the model loading time. Please think of another approach. Could the device memory allocation step be deferred?

Instead of pre-scanning all tensor data into memory (which doubles memory
usage), this change:

1. Adds skip, seek, and tell functions to the whisper_model_loader interface
2. First pass: scans tensor metadata and skips data using seek (no extra memory)
3. Creates tensors with the correct types from the file before buffer allocation
4. Seeks back to the tensor section for the second-pass data load (see the sketch below)
5. Maintains backwards compatibility for loaders without seek support

This addresses the reviewer feedback to avoid doubling memory usage
while still fixing the tensor type mismatch issue.

Co-authored-by: lhpqaq <63844184+lhpqaq@users.noreply.github.com>
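
A sketch of this flow, under the same assumptions as the earlier sketches (read_header, create_tensors_and_alloc_buffers, and load_tensor_data are hypothetical stand-ins for the corresponding whisper.cpp steps):

```cpp
// Pass 1: record each tensor's name and file type, skipping payloads.
const size_t tensor_section = loader->tell(loader->context);

std::map<std::string, int32_t> file_types;
TensorHeader hdr; // per-tensor metadata, as in the earlier sketch
while (read_header(loader, hdr)) {
    file_types[hdr.name] = hdr.ttype;
    loader->skip(loader->context, hdr.nbytes); // no payload read, no copy
}

// Create tensors with the types actually stored in the file, then
// allocate device buffers: sizes now match the real payloads.
create_tensors_and_alloc_buffers(model, file_types);

// Pass 2: rewind to the start of the tensor section and stream data in.
loader->seek(loader->context, tensor_section);
while (read_header(loader, hdr)) {
    load_tensor_data(model, hdr, loader); // reads hdr.nbytes into the tensor
}
```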
Copilot AI (Author) commented Jan 18, 2026

@copilot Scanning the file twice definitely won't work; it would double the model loading time. Please think of another approach. Could the device memory allocation step be deferred?

Implemented a better approach that defers device memory allocation. Instead of loading all tensor data into memory (which doubles memory usage), this solution:

  1. Adds skip, seek, tell to the whisper_model_loader interface
  2. First pass: scans tensor metadata and skips data using seek (no memory allocation)
  3. Creates tensors with correct types from file, then allocates buffers
  4. Seeks back to load data in second pass

This avoids the memory doubling while maintaining backwards compatibility for loaders without seek support; a sketch of the file-backed loader's new callbacks follows below.

Commit: 9422867
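
For the file-backed loader, the new callbacks map naturally onto C stdio; a minimal sketch (the wiring is an assumption based on this PR's description, while fread/feof/fseek/ftell are standard):

```cpp
#include <cstddef>
#include <cstdio>

// Illustrative wiring of a file-backed loader's callbacks, assuming the
// extended interface sketched earlier; not the actual whisper.cpp code.
static size_t file_read(void * ctx, void * output, size_t read_size) {
    return std::fread(output, 1, read_size, (std::FILE *) ctx);
}
static bool file_eof(void * ctx) {
    return std::feof((std::FILE *) ctx) != 0;
}
static void file_close(void * ctx) {
    std::fclose((std::FILE *) ctx);
}
static bool file_seek(void * ctx, size_t offset) {
    return std::fseek((std::FILE *) ctx, (long) offset, SEEK_SET) == 0;
}
static size_t file_tell(void * ctx) {
    return (size_t) std::ftell((std::FILE *) ctx);
}
static bool file_skip(void * ctx, size_t n) {
    return std::fseek((std::FILE *) ctx, (long) n, SEEK_CUR) == 0;
}
```

A buffer-backed loader would implement the same callbacks against an offset into its buffer, while legacy loaders can leave them null, which the backwards-compatibility path can detect.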
