Incorrect outputs for predominant melody algorithm using essentia #42
Comments
Have you looked at the vector of confidence values? If the values are negative, it means the algorithm has estimated these frames as unvoiced and their pitch values should be discarded.
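For anyone following along, the filtering step described here can be sketched in a few lines. This is a minimal illustration, not part of Essentia: the helper name `discard_unvoiced` is hypothetical, and it assumes (as stated above) that a negative confidence marks an unvoiced frame.

```python
def discard_unvoiced(pitch, confidence):
    """Return pitch values with unvoiced frames replaced by None.

    Assumption taken from the comment above: a negative confidence value
    means the algorithm considers that frame unvoiced.
    """
    if len(pitch) != len(confidence):
        raise ValueError("pitch and confidence must have the same length")
    return [f0 if c >= 0 else None for f0, c in zip(pitch, confidence)]


# Made-up example values, only to show the behaviour.
pitch = [220.0, 221.5, 980.0, 0.0]
confidence = [0.8, 0.7, -0.3, -0.1]
voiced = discard_unvoiced(pitch, confidence)
print(voiced)  # [220.0, 221.5, None, None]
```

Using `None` rather than deleting frames keeps the time axis intact, which matters when plotting the contour against time.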
Hi Justin Salamon, Thanks for your reply. I have tried what you suggested. Some of the overshoot portions were eliminated by discarding the frames with negative pitch confidence, but some remain, as shown in the plot below (with reference to the first figure in this thread). When I analyzed further, those seem to be silent portions that nevertheless return positive pitch confidence values. I also tried applying a threshold to the pitch confidence, but that affects some of the voiced contours. Once again, thanks for the suggestions.
Actually, a likely cause is that the algorithm was originally designed for polyphonic music (not a monophonic melody with silences), so the salience function generates some fake contours from the background noise (even if it is very low) during silent segments of the recording. You could try the voicing tolerance parameter (setting it lower than the default of 0.2) to see if that helps. If you know the expected frequency range of the melody, you can limit the min/max frequencies as well. I don't think vibrato detection is the cause: the segments you've indicated don't show a stable vibrato, and vibrato detection is deactivated by default in the implementation unless you enable it manually. Since quite a few people are using the algorithm for f0 estimation of monophonic sources in addition to polyphonic sources, we may look into adding functionality to handle this case specifically, but it's not implemented at the moment.
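A rough sketch of the two suggestions above (lower voicing tolerance, restricted frequency range) follows. It assumes a recent Essentia build in which the algorithm is exposed in Python as `PredominantPitchMelodia` with `voicingTolerance`, `minFrequency` and `maxFrequency` parameters (older releases used the name `PredominantMelody`). The helper names and the specific values (0.1, 80 Hz, 1000 Hz) are illustrative assumptions, not recommended settings.

```python
def melodia_params(voicing_tolerance=0.1, min_frequency=80.0, max_frequency=1000.0):
    """Build keyword arguments for the melody extractor.

    The default values here are illustrative only; pick a range that
    matches the voice or instrument being analysed.
    """
    if min_frequency >= max_frequency:
        raise ValueError("min_frequency must be below max_frequency")
    return {
        "frameSize": 2048,
        "hopSize": 128,
        "voicingTolerance": voicing_tolerance,  # below the default of 0.2
        "minFrequency": min_frequency,
        "maxFrequency": max_frequency,
    }


def extract_melody(path, **overrides):
    """Run melody extraction on an audio file (requires Essentia)."""
    # Imported lazily so melodia_params() works without Essentia installed.
    import essentia.standard as es

    audio = es.MonoLoader(filename=path, sampleRate=44100)()
    audio = es.EqualLoudness()(audio)
    extractor = es.PredominantPitchMelodia(**melodia_params(**overrides))
    pitch, confidence = extractor(audio)
    return pitch, confidence
```

A call such as `extract_melody("voice.wav", max_frequency=600.0)` (hypothetical file name) would then exclude the 800-1400 Hz region from the search entirely.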
Hi Justin Salamon, Thanks again for your inputs. Changing the voicing tolerance did not give any improvement. :-( After reading the reference paper ([1] J. Salamon and E. Gómez, "Melody extraction from polyphonic music signals using pitch contour characteristics," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, 2012.), I observed the following things:
Could this be an issue with the filtering of non-melody contours in the final melody selection, or with the silence detection? Thanks for your inputs.
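Independently of the algorithm's internals, spurious pitch values in silent stretches can be suppressed afterwards with a simple energy gate. The sketch below is a generic workaround, not something Essentia or MELODIA does; the function names, the -50 dBFS threshold, and the assumption that pitch frames line up one-to-one with non-overlapping hops of the signal are all illustrative.

```python
import math


def frame_rms(samples, hop_size):
    """RMS energy of consecutive, non-overlapping hops of `samples`."""
    rms = []
    for start in range(0, len(samples), hop_size):
        hop = samples[start:start + hop_size]
        rms.append(math.sqrt(sum(x * x for x in hop) / len(hop)))
    return rms


def gate_silence(pitch, rms, threshold_db=-50.0):
    """Replace pitch with None where the frame energy is below threshold_db.

    threshold_db is relative to full scale (1.0); -50 dB is an arbitrary
    starting point and should be tuned to the noise floor of the recording.
    """
    floor = 10.0 ** (threshold_db / 20.0)
    return [f0 if e >= floor else None for f0, e in zip(pitch, rms)]


# Made-up signal: one loud hop followed by one near-silent hop.
samples = [0.5] * 4 + [0.0001] * 4
gated = gate_silence([200.0, 900.0], frame_rms(samples, hop_size=4))
print(gated)  # [200.0, None]
```

In practice the hop size would match the one used for melody extraction, and the frame alignment should be checked, since analysis frames are often centred rather than starting at sample zero.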
Hi, I have also been trying this with polyphonic songs, and I got lots of overshoots there too, which do not look correct to me. It seems there is some problem with melody contour creation in essentia. Thanks
@Philipsciby regarding your observations:
@camillussmith remember that melody extraction is still an open research problem and you can't expect perfect results for every file. The MELODIA algorithm is state-of-the-art, but it will still make some estimation errors (e.g. the overshoots may be octave errors).
Thanks for your inputs. I have now tried the values "pitch continuity cue" = 13.5625 and "minDuration" = 50 ms. After discarding the frames with negative pitch confidence, I get an almost satisfactory contour for the monophonic song I tried. I am not sure whether this is the correct way to handle the issue. I hope essentia will soon come up with an implementation similar to that of melodia.dll (vamp), where the melody extraction seems most correct. Thanks
I am still getting issues for some other inputs with the above configuration.
While essentia allows it, changing the pitch continuity cue and minDuration parameters is not recommended, as these are internal to the algorithm and changing them can lead to undesired effects. Again, the algorithm is designed for polyphonic signals, and while it often works well for monophonic signals, these require slightly different processing which, as you have noted, is included in the vamp plug-in but not yet in the essentia implementation. You could also try the PitchYinFFT algorithm in essentia, which is designed for monophonic signals. Anyway, while this functionality may be added in the future, it's more of a feature request than an actual issue (the code works fine). The issue has been labeled as 'enhancement'.
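For reference, a frame-by-frame use of PitchYinFFT might look like the sketch below. It assumes the Essentia Python bindings with the `MonoLoader`, `FrameGenerator`, `Windowing`, `Spectrum` and `PitchYinFFT` algorithms; the function names `yin_pitch` and `frame_times`, and the frame and hop sizes, are illustrative choices rather than anything prescribed in this thread.

```python
def frame_times(n_frames, hop_size=128, sample_rate=44100):
    """Approximate time in seconds associated with each analysis frame."""
    return [i * hop_size / sample_rate for i in range(n_frames)]


def yin_pitch(path, frame_size=2048, hop_size=128, sample_rate=44100):
    """Estimate a monophonic pitch track with PitchYinFFT (requires Essentia)."""
    # Imported lazily so frame_times() works without Essentia installed.
    import essentia.standard as es

    audio = es.MonoLoader(filename=path, sampleRate=sample_rate)()
    window = es.Windowing(type="hann")
    spectrum = es.Spectrum()
    yin = es.PitchYinFFT(frameSize=frame_size, sampleRate=sample_rate)

    pitch, confidence = [], []
    for frame in es.FrameGenerator(audio, frameSize=frame_size, hopSize=hop_size):
        # PitchYinFFT works on the magnitude spectrum of a windowed frame.
        f0, conf = yin(spectrum(window(frame)))
        pitch.append(f0)
        confidence.append(conf)
    return frame_times(len(pitch), hop_size, sample_rate), pitch, confidence
```

The returned confidence can be thresholded in the same way as for the melody extractor, although the two confidence measures are computed differently and are not directly comparable.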
Hi Justin Salamon, I understand. What I observed was that, for the same monophonic song recorded through a microphone and given to both the melodia vamp plug-in and essentia, the vamp plug-in gives very good results while the essentia output seems a little corrupted. My point was that the melodia vamp plug-in implemented by you is correct, and something may have been missed in the essentia implementation. Thanks
Hi,
I have been trying out a few algorithms in the essentia package. While trying out the predominant melody algorithm, I gave it a song sung by me without any instrumental background as input. I observed that the output pitch values have some abnormalities, with sudden overshoots to the 800-1400 Hz range at several portions, while the rest seems very reasonable, with average pitch values around 150-280 Hz. The same happened for many songs. I believe the overshoot frequencies (800-1400 Hz) are beyond the normal range for singing, and I suspect the predominant melody extraction is failing. I tried changing the input parameter values, but it didn't improve. :-(
Am I missing some initialization of the parameters here? Or does it have something to do with the internals of the predominant melody algorithm?
Following is a snapshot of my output.