[new_io] An IO pipeline that looks simple but is actually quite complex, for reference only #2107

Closed
Mddct opened this issue Nov 3, 2023 · 3 comments
Labels
not in the plan (New feature but not in the plan), Stale

Comments

@Mddct
Collaborator

Mddct commented Nov 3, 2023

https://gist.github.com/Mddct/ca449217164a704641b742cb7fae2b18
This brings in tf.data while still running in a torch distributed environment. For reference only.

@Mddct Mddct added the "not in the plan" label Nov 3, 2023
@Mddct
Collaborator Author

Mddct commented Nov 4, 2023

Here is a preliminary benchmark:

[Image: Screenshot 2023-11-05 at 00 17 12]
  • Features match, and results match
  • It is faster than wenet (tf has a producer-consumer mechanism, so at test time the GPU basically never waits)
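
The producer-consumer effect mentioned above can be sketched in plain Python. This is only an illustration of the idea, not how tf.data implements it and not the code in the gist; `prefetch` and `slow_batches` are hypothetical names, and a bounded queue with one background thread is assumed for simplicity.

```python
import queue
import threading
import time

_SENTINEL = object()  # marks the end of the stream


def prefetch(iterable, buffer_size=4):
    """Yield items from `iterable` while a background thread keeps a
    bounded buffer filled (hypothetical helper, for illustration only)."""
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in iterable:
                buf.put(item)  # blocks while the buffer is full
        finally:
            buf.put(_SENTINEL)  # always signal completion

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buf.get()  # blocks only if the producer has fallen behind
        if item is _SENTINEL:
            return
        yield item


def slow_batches(n, delay=0.01):
    """Stand-in for decoding and feature extraction on the CPU."""
    for i in range(n):
        time.sleep(delay)
        yield i


if __name__ == "__main__":
    for batch in prefetch(slow_batches(5)):
        # The consumer (e.g. the GPU step) works on `batch` while the
        # producer thread is already preparing the next ones.
        print(batch)
```

As long as the producer is on average at least as fast as the consumer, the consumer finds a ready item in the buffer and does not wait on data preparation.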

@Mddct Mddct closed this as not planned Nov 4, 2023
@Mddct Mddct reopened this Nov 5, 2023
@Mddct Mddct closed this as not planned Nov 6, 2023
@xingchensong
Member

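One idea: in the old IO, shuffle and sort happen at the very end of the pipeline (after all the data processing is done). If they were moved to the front, as in the new IO, would the old IO also get faster?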

@xingchensong
Member

For raw, it is fairly easy to move shuffle ahead of decode_wav, but shard seems harder to handle: there it has to happen after decode_wav finishes.
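
The two orderings discussed for the raw case can be sketched as follows. This is a minimal illustration under stated assumptions, not the wenet code: `decode_wav` here is a hypothetical stand-in that fabricates samples, and `shuffle_buffer`, `shuffle_then_decode`, and `decode_then_shuffle` are made-up names.

```python
import random


def decode_wav(path):
    """Hypothetical stand-in for the expensive decode step."""
    return {"path": path, "samples": [0.0] * 16000}


def shuffle_buffer(items, buffer_size, rng):
    """Collect up to `buffer_size` items, shuffle them, and emit them."""
    buf = []
    for item in items:
        buf.append(item)
        if len(buf) >= buffer_size:
            rng.shuffle(buf)
            yield from buf
            buf = []
    rng.shuffle(buf)  # flush whatever is left
    yield from buf


def shuffle_then_decode(paths, buffer_size=1000, seed=0):
    """Shuffle first: the buffer only holds short path strings."""
    rng = random.Random(seed)
    for path in shuffle_buffer(paths, buffer_size, rng):
        yield decode_wav(path)


def decode_then_shuffle(paths, buffer_size=1000, seed=0):
    """Shuffle last: the buffer holds fully decoded audio."""
    rng = random.Random(seed)
    decoded = (decode_wav(p) for p in paths)
    yield from shuffle_buffer(decoded, buffer_size, rng)


if __name__ == "__main__":
    paths = [f"utt{i}.wav" for i in range(10)]
    early = [d["path"] for d in shuffle_then_decode(paths, buffer_size=4)]
    late = [d["path"] for d in decode_then_shuffle(paths, buffer_size=4)]
    print(early == late)
```

With the same seed, both variants emit the utterances in the same order. The difference is what sits in the shuffle buffer: path strings in the first case, decoded audio in the second, and in the second case nothing is emitted until a full buffer has been decoded.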
