Supporting RWKV (an RNN that can match transformer LM & zero-shot performance at 1B+ params) #48

BlinkDL · 2022-07-18T13:43:55Z

Hi guys. I am working on RWKV, which might be the only RNN (no attention!) that can match transformer LM & zero-shot performance at 1B+ params:
https://www.reddit.com/r/MachineLearning/comments/vzr6ie/r_rwkv3_scaling_rnn_to_15b_and_reach_transformer/

I am using some CUDA in my project too. Perhaps we can collaborate to promote RNNs and scale them to 100B+ params :)
