
Project 3 Generative Audio

Zizhen Wang, ziw142@ucsd.edu

Abstract

Different types of music each have a specific rhythm, melody, and set of instruments, and people can recognize a style from these components. But if we use machine learning to generate new music that changes the number of instruments, the melody, and the rhythm, can we still recognize the style? I will use GANSynth, MusicVAE, and Music Transformer to change the number of instruments, the melody, and the rhythm in different orders and examine the results.

Model/Data

  • La campanella
  • "La campanella" (Italian for "The little bell") is the nickname given to the third of Franz Liszt's six Grandes études de Paganini, S. 141 (1851). It is in the key of G-sharp minor. This piece is a revision of an earlier version from 1838, the Études d'exécution transcendente d'après Paganini, S. 140. Its melody comes from the final movement of Niccolò Paganini's Violin Concerto No. 2 in B minor, where the tune was reinforced metaphorically by a 'little handbell'. This is portrayed by the top note jumps that need to be played within the timeframe of a 16th note.
  • It is a solo piano piece, representing the classical style with a single instrument.

Code

Jupyter notebooks:

Results

  • Firstly, I used the GANSynth generator to create new music from 16 random instrument samples. GANSynth is a neural-network approach to audio synthesis, released together with a playable set of neural synthesizer instruments. The underlying NSynth dataset contains many instrument samples with qualities such as bright or dark, and the model uses an encoder and decoder to render new instruments into a new waveform. I generated a custom interpolation over instruments [0, 3, 6, 0] at times [0, 0.3, 0.6, 1.0]; a code sketch of this step follows below.

  • 03 - La Campanella.mp3

  • generated_clip.wav You can also download it from GitHub if you do not have permission to play it here.

  • The first clip is the original music and the second is the generated music. The waveform shapes along the x-axis are almost the same, which means the rhythm and melody are nearly unchanged, and listening to the two clips confirms this. The interesting thing is that the generated music still keeps the classical style, but it now sounds like a symphony or opera played by many instruments.
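
As a rough illustration of this step, here is a minimal sketch of the instrument interpolation, modeled on Magenta's GANSynth demo notebook. The checkpoint path and MIDI filename are assumptions, and the `generate_util` helper names follow that notebook, so they may differ across Magenta versions.

```python
import numpy as np
import magenta.models.gansynth.lib.flags as lib_flags
import magenta.models.gansynth.lib.generate_util as gu
import magenta.models.gansynth.lib.model as lib_model

CKPT_DIR = 'gansynth/acoustic_only'  # assumed pretrained checkpoint path
MIDI_PATH = 'la_campanella.mid'      # assumed MIDI transcription of the piece

# Load the pretrained model and the note sequence to re-synthesize.
flags = lib_flags.Flags({'batch_size_schedule': [16]})
model = lib_model.Model.load_from_path(CKPT_DIR, flags)
ns, notes = gu.load_midi(MIDI_PATH)
total_time = notes['end_times'][-1]

# Sample 16 random instrument latents, then build the custom
# interpolation: instruments [0, 3, 6, 0] at times [0, 0.3, 0.6, 1.0].
z_all, _ = gu.get_random_instruments(model, total_time,
                                     secs_per_instrument=total_time / 16)
z_instruments = np.array([z_all[i] for i in [0, 3, 6, 0]])
t_instruments = np.array([t * total_time for t in [0, 0.3, 0.6, 1.0]])

# Give each note the latent interpolated at its onset time, synthesize,
# and overlap-add the notes back into one waveform.
z_notes = gu.get_z_notes(notes['start_times'], z_instruments, t_instruments)
audio_notes = model.generate_samples_from_z(z_notes, notes['pitches'])
audio = gu.combine_notes(audio_notes, notes['start_times'],
                         notes['end_times'], notes['velocities'])
gu.save_wav(audio, 'generated_clip.wav')
```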

  • Secondly, I used MusicVAE's 16-bar melody models. MusicVAE is a hierarchical recurrent variational autoencoder for learning latent spaces of musical scores. As inputs I used both the original music and the music generated by GANSynth, feeding each into the 16-bar melody model with temperature 0.5; a sketch of this step follows below.

  • hierdec_mel_16bar_mean.mp3

  • hierdec_mel_16bar_mean2.mp3 You can also download it from GitHub if you do not have permission to play it here.

  • The first clip was generated from the original music and the second from the GANSynth output. Both changed the melody. Since the provided samples are all piano, both generated outputs are piano music. However, the first waveform is more discrete and the second more continuous; on listening, the first sounds more staccato while the second has more prolonged, bass-heavy notes. My assumption is that more instruments cover a wider range of waveforms, so when the model encodes and decodes, the output comes out smoother and more continuous.
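
A minimal sketch of this step, following the pattern of Magenta's MusicVAE colab. The checkpoint path and MIDI filenames are assumptions, and it presumes the inputs have already been trimmed to 16-bar melodies that the model's data converter accepts; decoding the mean of the two latent codes mirrors the colab's "mean" output.

```python
import note_seq
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel

# Load the pretrained hierarchical 16-bar melody model; the checkpoint
# path below is an assumption (downloaded from Magenta's releases).
config = configs.CONFIG_MAP['hierdec-mel_16bar']
model = TrainedModel(config, batch_size=4,
                     checkpoint_dir_or_path='checkpoints/mel_16bar_hierdec.ckpt')

# Two 16-bar melodies: one from the original recording and one from the
# GANSynth output (both converted to MIDI beforehand).
seq_a = note_seq.midi_file_to_note_sequence('la_campanella_16bar.mid')
seq_b = note_seq.midi_file_to_note_sequence('generated_clip_16bar.mid')

# Encode both, then decode the mean of their latent codes at
# temperature 0.5 (256 steps = 16 bars of sixteenth notes).
z, _, _ = model.encode([seq_a, seq_b])
mean_z = (z[0] + z[1]) / 2.0
out = model.decode(mean_z[None, :], length=256, temperature=0.5)[0]
note_seq.sequence_proto_to_midi_file(out, 'hierdec_mel_16bar_mean.mid')
```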

  • Thirdly, I used the melody-conditioned piano performance model from Generating Piano Music with Transformer. Music Transformer is an attention-based neural network that can generate music with improved long-term coherence. As inputs I used the two pieces generated by MusicVAE in the previous step, again with temperature 0.5 (see the note on temperature after this step).

  • accompaniment.mp3

  • accompaniment2.mp3

  • The first clip was generated from the mean1 music and the second from the mean2 music above. Both changed the rhythm much more than the melody, so we can still hear a melody similar to the previous part. Since the provided samples are all piano, both generated outputs are again piano music. However, the first one is rapid, with a faster rhythm, while the second has more parts. My assumption is that more instruments cover a wider range of waveforms, so after encoding and decoding the output becomes multi-voice music with more parts.
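
Both MusicVAE and the Transformer were sampled here with temperature 0.5. As an illustration only (this is not Magenta's actual decoding code), temperature rescales the model's logits before sampling, so values below 1.0 concentrate probability on the most likely notes:

```python
import numpy as np

def sample_with_temperature(logits, temperature=0.5,
                            rng=np.random.default_rng()):
    """Sample one event id from logits after temperature scaling.

    temperature < 1.0 sharpens the distribution (safer, more repetitive
    music); temperature > 1.0 flattens it (more surprising music).
    """
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# At 0.5 the most likely event is picked noticeably more often than at 1.0.
print(sample_with_temperature(np.array([2.0, 1.0, 0.5]), temperature=0.5))
```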

  • Lastly, I used the GANSynth generator on accompaniment2, with the same procedure as in the first part, to compare with the music generated there.

  • generated_clip2.wav

In this final piece all of the components have been changed, and we can no longer recognize the classical style. It sounds like a new kind of electronic music, which is very interesting.

Technical Notes

Some resources could not be used directly as training data, so we needed to convert WAV files to MIDI with online converters, and then transform the generated MIDI back to WAV for the results; a sketch of the MIDI-to-WAV direction follows below.
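
For the MIDI-to-WAV direction, a small sketch using pretty_midi's built-in synthesizer (the filename is a placeholder; the project itself used online converters):

```python
import pretty_midi
from scipy.io import wavfile

SAMPLE_RATE = 44100

# Render a generated MIDI file back into a waveform. pretty_midi's
# synthesize() uses simple sine waves; rendering through fluidsynth()
# with a SoundFont would sound closer to a real piano.
pm = pretty_midi.PrettyMIDI('accompaniment2.mid')  # placeholder filename
audio = pm.synthesize(fs=SAMPLE_RATE)
wavfile.write('accompaniment2.wav', SAMPLE_RATE, audio.astype('float32'))
```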

