BUG: fix embedding token calculation & optimize memory #2221

qinxuye · 2024-09-03T12:40:50Z

This PR does a few things:

The previous token count was wrong; this PR fixes it.
Clear the cache after creating embeddings, not only every few calls but also when the number of input tokens is large (with long inputs, memory grows very quickly).
Support torch_dtype for models other than gte-Qwen2 as well.
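
As a rough illustration of the first two points (not the code in this PR), the sketch below counts input tokens by summing attention-mask entries so that padding is excluded, and clears a cache either every few calls or right after a large input. The names `count_input_tokens` and `CacheClearPolicy`, the default thresholds, and the use of `gc.collect` as the clearing function are all assumptions made for this example.

```python
import gc
from typing import Callable, Sequence


def count_input_tokens(attention_masks: Sequence[Sequence[int]]) -> int:
    """Count real (non-padding) tokens in a padded batch.

    Each mask holds 1 for a real token and 0 for padding, so summing
    the entries avoids counting the padding added to shorter inputs.
    """
    return sum(sum(mask) for mask in attention_masks)


class CacheClearPolicy:
    """Hypothetical helper: decide when to clear caches after an embedding call.

    The thresholds are illustrative defaults, not values taken from the PR.
    """

    def __init__(
        self,
        clear_fn: Callable[[], object] = gc.collect,
        every_n_calls: int = 10,
        large_input_tokens: int = 8192,
    ) -> None:
        # clear_fn could also release GPU memory, e.g. torch.cuda.empty_cache
        self.clear_fn = clear_fn
        self.every_n_calls = every_n_calls
        self.large_input_tokens = large_input_tokens
        self._calls = 0

    def after_call(self, input_tokens: int) -> bool:
        """Record one embedding call; clear the cache if either rule fires."""
        self._calls += 1
        periodic = self._calls >= self.every_n_calls
        large = input_tokens >= self.large_input_tokens
        if periodic or large:
            self.clear_fn()
            self._calls = 0  # restart the periodic counter after any clear
            return True
        return False
```

A caller would invoke `policy.after_call(count_input_tokens(masks))` once per embedding request. Taking the clearing function as a parameter keeps the policy independent of any particular backend.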

Fixes #2000

codingl2k1 · 2024-09-03T21:17:53Z

The CI tests have failed.

qinxuye · 2024-09-04T04:19:41Z

> The CI tests have failed.

Fixed.

xinference/model/embedding/core.py

BUG: fix embedding token calculation & optimize memory

88e2dba

XprobeBot added the bug Something isn't working label Sep 3, 2024

XprobeBot modified the milestones: v0.14, v0.15 Sep 3, 2024

fix ut

27db3f2

qinxuye requested a review from codingl2k1 September 4, 2024 04:19

amumu96 reviewed Sep 6, 2024

xinference/model/embedding/core.py

qinxuye commented Sep 6, 2024

xinference/model/embedding/core.py

Update xinference/model/embedding/core.py

6a7b62d

amumu96 approved these changes Sep 6, 2024

qinxuye merged commit 2198965 into xorbitsai:main Sep 6, 2024
7 of 13 checks passed

qinxuye deleted the bug/embedding branch September 6, 2024 05:14
