Add SpikeGPT model #21875

gsarti · 2023-03-01T13:43:38Z

Model description

Abstract:

As large language models continue to scale, so do the computational resources required to run them. Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven more challenging to train. As a result, their performance lags behind modern deep learning, and the effectiveness of SNNs in language generation has yet to be demonstrated. In this paper, inspired by the RWKV language model, we successfully implement "SpikeGPT", a generative language model with pure binary, event-driven spiking activation units. We train three variants of the proposed model: 45M, 125M and 260M parameters. To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date. We achieve this by modifying the transformer block to replace multi-head self-attention, reducing the quadratic computational complexity to linear in sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while consuming 5x less energy when processed on neuromorphic hardware that can leverage sparse, event-driven activations.
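The thread does not spell out SpikeGPT's neuron model. As a rough illustration of "pure binary, event-driven spiking activation units" fed one token at a time, here is a minimal sketch of a generic leaky integrate-and-fire (LIF) neuron in plain Python. The decay `beta`, the firing `threshold`, and the fast-sigmoid surrogate gradient are assumptions chosen for illustration; SpikeGPT's actual neuron and training trick may differ.

```python
from typing import List


def heaviside(x: float) -> int:
    """Binary spike: 1 if the input is at or above zero, else 0."""
    return 1 if x >= 0.0 else 0


def fast_sigmoid_grad(x: float, slope: float = 10.0) -> float:
    """Surrogate derivative (assumed choice) used in place of Heaviside's
    zero-almost-everywhere gradient so the unit can be trained with backprop."""
    return 1.0 / (1.0 + slope * abs(x)) ** 2


def lif_forward(inputs: List[float], beta: float = 0.9, threshold: float = 1.0) -> List[int]:
    """Run one LIF neuron over a sequence, one step per streamed token.

    beta: membrane decay factor (hypothetical value).
    threshold: firing threshold (hypothetical value).
    Returns a binary spike train with one entry per input step.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        membrane = beta * membrane + current      # leaky integration of input current
        spike = heaviside(membrane - threshold)   # binary, event-driven output
        membrane -= spike * threshold             # soft reset by subtraction after a spike
        spikes.append(spike)
    return spikes
```

For example, `lif_forward([0.5, 0.5, 0.5, 0.0, 2.0])` returns `[0, 0, 1, 0, 1]`: the membrane accumulates sub-threshold input over three steps before firing, and downstream layers only do work on the steps where a spike occurs.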

Concretely, SpikeGPT is a GPT model that uses Receptance Weighted Key Value (RWKV) in place of regular attention, together with an adapted FFN layer.
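To show why an RWKV-style token mixer scales linearly with sequence length, the sketch below implements a single-channel WKV operator following the publicly described RWKV-4 formulation, once as an O(T) recurrence and once as the explicit O(T^2) weighted sum it replaces. This is an assumption about the general mechanism only: SpikeGPT's spiking variant may differ, and the names `wkv_recurrent`, `wkv_quadratic`, `receptance_gate`, `w` (per-channel decay) and `u` (current-token bonus) are illustrative.

```python
import math
from typing import List


def wkv_recurrent(k: List[float], v: List[float], w: float, u: float) -> List[float]:
    """RWKV-4-style WKV for one channel, computed token by token in O(T).

    w: decay (w >= 0); u: bonus weight for the current token.
    Omits the max-subtraction trick real implementations use for numerical stability.
    """
    a, b = 0.0, 0.0  # running decayed numerator and denominator over past tokens
    out = []
    for k_t, v_t in zip(k, v):
        cur = math.exp(u + k_t)
        out.append((a + cur * v_t) / (b + cur))
        # Decay the state, then fold in the current token for future steps.
        a = math.exp(-w) * a + math.exp(k_t) * v_t
        b = math.exp(-w) * b + math.exp(k_t)
    return out


def wkv_quadratic(k: List[float], v: List[float], w: float, u: float) -> List[float]:
    """Same quantity via the explicit O(T^2) sum over past tokens, for checking."""
    out = []
    for t in range(len(k)):
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def receptance_gate(r: List[float], wkv: List[float]) -> List[float]:
    """Gate each WKV output with sigmoid(receptance), as in RWKV time mixing."""
    return [x / (1.0 + math.exp(-r_t)) for r_t, x in zip(r, wkv)]
```

Both functions return the same values, but `wkv_recurrent` keeps only two running scalars per channel, which is what allows tokens to be streamed in sequentially rather than attended to all at once.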

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

Paper | Code

Author: @ridgerchu

ridgerchu · 2023-03-01T16:57:10Z

Thanks for your interest in our work! The checkpoint weights of the 120M SpikeGPT are available now, but they are intended only for debugging and playing with the model.

julien-c · 2023-03-06T13:04:42Z

I've read the paper, this model looks really cool 👍

gsarti added the New model label Mar 1, 2023

fblgit mentioned this issue on Apr 11, 2023 in Add RWKV2 (fast) #17230 (closed)