# RWKV-howto

Possibly useful materials and tutorials for learning [RWKV](https://www.rwkv.com).

> RWKV: Parallelizable RNN with Transformer-level LLM Performance.

### Relevant Papers

- :star2:(2023-05) RWKV: Reinventing RNNs for the Transformer Era [arxiv](https://arxiv.org/abs/2305.13048)
- (2023-03) Resurrecting Recurrent Neural Networks for Long Sequences [arxiv](https://arxiv.org/abs/2303.06349)
- (2023-02) SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks [arxiv](https://arxiv.org/abs/2302.13939)
- (2022-08) Simplified State Space Layers for Sequence Modeling [ICLR2023](https://openreview.net/forum?id=Ai8Hw3AXqks)
- (2021-10) Efficiently Modeling Long Sequences with Structured State Spaces [ICLR2022](https://arxiv.org/abs/2111.00396)
- :star2:(2021-05) An Attention Free Transformer [arxiv](https://arxiv.org/abs/2105.14103)
- (2020-08) Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [ICML2020](https://arxiv.org/abs/2006.16236)
- (2018) Parallelizing Linear Recurrent Neural Nets Over Sequence Length [ICLR2018](https://openreview.net/forum?id=HyUNwulC-)
- (2017-10) MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks [NeurIPS2017](https://arxiv.org/abs/1711.06788)
- (2017-09) Simple Recurrent Units for Highly Parallelizable Recurrence [EMNLP2018](https://arxiv.org/abs/1709.02755)
- (2017-06) Attention Is All You Need [NeurIPS2017](https://arxiv.org/abs/1706.03762)
- (2016-11) Quasi-Recurrent Neural Networks [ICLR2017](https://arxiv.org/abs/1611.01576)
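The "Transformers are RNNs" paper above captures the key idea behind parallelizable RNNs like RWKV: once attention is linearized with a kernel feature map, causal attention can be computed as a recurrence over a running state instead of over the whole history. A minimal NumPy sketch of that recurrent view (function and variable names are my own, not from any listed codebase):

```python
import numpy as np

def linear_attention_rnn(qs, ks, vs):
    """Causal linear attention computed as an RNN over a running state.

    qs, ks: (T, d_k) queries and keys; vs: (T, d_v) values.
    phi is the feature map; elu(x) + 1 as in Katharopoulos et al. (2020).
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    d_k, d_v = ks.shape[1], vs.shape[1]
    S = np.zeros((d_k, d_v))  # running sum of outer(phi(k_i), v_i)
    z = np.zeros(d_k)         # running sum of phi(k_i), for normalization
    out = []
    for q, k, v in zip(qs, ks, vs):
        S += np.outer(phi(k), v)
        z += phi(k)
        out.append(phi(q) @ S / (phi(q) @ z))  # O(d_k * d_v) per step
    return np.array(out)
```

Because the state `(S, z)` is fixed-size, inference cost per token is constant in sequence length, which is exactly the property RWKV inherits.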

### Resources

- Introducing RWKV - An RNN with the advantages of a transformer [Hugging Face](https://huggingface.co/blog/rwkv)
- Now that we have the Transformer, can RNNs be discarded entirely? [Zhihu](https://www.zhihu.com/question/302392659/answer/2954997969)
- What is the simplest effective form of an RNN? [Zhihu](https://zhuanlan.zhihu.com/p/616357772)
- :star2:The RNN/CNN duality of RWKV [Zhihu](https://zhuanlan.zhihu.com/p/614311961)
- Does an RNN's hidden layer need nonlinearity? [Zhihu](https://zhuanlan.zhihu.com/p/615672175)
- Google's new work tries to "resurrect" RNNs: can RNNs shine again? [Su Jianlin](https://kexue.fm/archives/9554)
- :star2:How the RWKV language model works [Johan Sokrates Wind](https://www.mn.uio.no/math/english/people/aca/johanswi/index.html)
- :star2:The RWKV language model: An RNN with the advantages of a transformer [Johan Sokrates Wind](https://johanwind.github.io/2023/03/23/rwkv_overview.html)
- The Unreasonable Effectiveness of Recurrent Neural Networks [Andrej Karpathy blog](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

### Code

- [RWKV-LM](https://github.com/BlinkDL/RWKV-LM)
- [ChatRWKV](https://github.com/BlinkDL/ChatRWKV)
- [RWKV_in_150_lines](https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_in_150_lines.py)
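As a taste of what these codebases compute, here is a sketch of the WKV recurrence from the RWKV paper for a single channel. This is the naive form with raw exponentials; real implementations (including `RWKV_in_150_lines`) subtract a running maximum for numerical stability, and the variable names here are my own:

```python
import math

def wkv_recurrence(w, u, ks, vs):
    """Naive single-channel WKV recurrence (RWKV time-mixing core).

    w: per-channel decay (>= 0), u: "bonus" applied to the current token,
    ks, vs: per-step keys and values as lists of floats.
    Returns the wkv output at every step.
    """
    num, den = 0.0, 0.0  # running exp-weighted sums of values and weights
    out = []
    for k, v in zip(ks, vs):
        e_now = math.exp(u + k)  # current token gets the u bonus
        out.append((num + e_now * v) / (den + e_now))
        # Fold the current token into the state, decaying the past by exp(-w).
        decay = math.exp(-w)
        num = decay * num + math.exp(k) * v
        den = decay * den + math.exp(k)
    return out
```

With `w = 0` and `u = 0` every past token is weighted equally, so the output is a running average of the values; larger `w` makes the model forget old tokens faster.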