deep-learning sequence-modeling attention attention-mechanism nlp transformer keras
This is an implementation of the Transformer for music generation. The music is represented in ABC notation, which consists of a song's metadata, e.g. the BPM, and its notes. The model takes the metadata as input, creates an encoding from it, and predicts notes given the context of this encoding and previous notes.
-
In traditional sequence models, each point in the sequence is a bottleneck to all previous representations learned. The transformer avoids this problem by processing all tokens of the sequence simultaneously.
-
Attention mechanism makes sure that at each point of the output sequence, the token is generated only from parts of the input that are relevant.
-
With positional encoding the model is able to tell where a token lies in the sequence, despite the input being processed in parallel.
I chose subword tokenization over character or word tokenization to reduce the size of the vocabulary to be learned.
With a dataset size of 1000, a vocabulary size of 1000, an embedding size of 128, 4 encoder/decoder layers (Nx), 8 heads, and a batch size of 32, the model can take about 2000 iterations to converge, depending on the complexity of the dataset.
Below are some of the songs generated by the model.
This model has been deployed to a mobile application. Feel free to use it if you do not have the python environment needed to run the source code.