[wenet/utils] add kmeans in torch with wenet #2545

Draft
wants to merge 1 commit into base: main

Mddct (Collaborator) commented May 31, 2024

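k-means is a commonly used tool. This PR implements it with online feature extraction on top of wenet's speech models.
Use cases:
  • Clustering speech encoder outputs into discrete ids -> speech understanding with an LLM (ASR, etc.)
  • Clustering hubert/w2vbert features into discrete semantic tokens -> semantic tokens for TTS, etc.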
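
The discretization step in the use cases above can be sketched as a nearest-centroid lookup. This is a minimal illustration, not the PR's implementation: the function name `assign_cluster_ids` and the tensor shapes are assumptions made for the example.

```python
import torch


def assign_cluster_ids(features: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
    """Map each feature frame to the index of its nearest centroid.

    features:  (num_frames, dim) encoder outputs (assumed shape)
    centroids: (num_clusters, dim) k-means centers (assumed shape)
    returns:   (num_frames,) integer cluster ids
    """
    # Pairwise Euclidean distances, shape (num_frames, num_clusters).
    distances = torch.cdist(features, centroids)
    # The discrete token of a frame is the index of the closest center.
    return distances.argmin(dim=-1)


# Toy data: two well-separated centers and three frames.
centroids = torch.tensor([[0.0, 0.0], [10.0, 10.0]])
features = torch.tensor([[0.5, -0.2], [9.0, 11.0], [1.0, 1.0]])
ids = assign_cluster_ids(features, centroids)
print(ids.tolist())  # [0, 1, 0]
```

The resulting id sequence is what a downstream LLM or TTS model would consume as discrete tokens.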

TODO:

  • Move the kmeans class into a separate file, because kmeans will also be used for initialization when training some codec/SSL models

Mddct (Collaborator, Author) commented May 31, 2024

It works! Run on aishell with 8 GPUs.

[Screenshot 2024-05-31 19:36:01]

Mddct (Collaborator, Author) commented Jun 4, 2024

Encoding and saving the results to a file works.

[Screenshot 2024-06-04 20:00:21] [Screenshot 2024-06-04 20:11:57]

means = distributed_sample_vectors(input, self.num_clusters)
self.means.copy_(means)
self.is_initialized = True
inertia = self._one_step(input.unsqueeze(0))
Contributor

When groups > 1, input needs to be expanded: input.expand(0).expand(self.groups, -1, -1)
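
A minimal sketch of the expansion the reviewer describes. It rests on two assumptions: that the first `expand(0)` in the comment is shorthand for `unsqueeze(0)`, since `expand` alone cannot add a dimension, and that `_one_step` expects a `(groups, num_samples, dim)` tensor. The shapes and variable names are illustrative only.

```python
import torch

groups = 2                      # stands in for self.groups (assumed value)
x = torch.randn(5, 4)           # (num_samples, dim), assumed input shape

# unsqueeze(0) adds a leading group axis of size 1.
single = x.unsqueeze(0)                     # (1, 5, 4)
# expand broadcasts that axis to `groups` without copying memory;
# -1 keeps the remaining dimensions unchanged.
grouped = single.expand(groups, -1, -1)     # (2, 5, 4)

print(single.shape, grouped.shape)
```

Because `expand` returns a view, every group sees the same underlying samples. If later code writes into the expanded tensor in place, a `.contiguous()` or `.clone()` call would be needed first.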

2 participants