Currently the de-duplication mechanism is quite stupid and error-prone. It treats each distinct (artist, track) combination as a unique song, so a cover by a different artist is counted as a separate song. It gives no consideration to cover songs, etc.
A good example of this is the song "Emily" by Frank Sinatra. It has been covered many dozens of times, and the covers rarely identify themselves as such. Further, its title is very generic, so filtering on the title alone is impossible. The only real way forward I see is to run similarity comparisons between every pair of lyrics (perhaps limited to songs with identical titles?) using fuzzywuzzy or a similar library.
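A minimal sketch of the title-limited pairwise comparison, assuming each song is available as a hypothetical `Song` record with full lyrics. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy's `fuzz.ratio`. Note that the score here is on a 0 to 1 scale rather than fuzzywuzzy's 0 to 100, and the 0.9 threshold is an arbitrary placeholder that would need tuning:

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class Song:
    # Hypothetical record; field names are illustrative only.
    artist: str
    title: str
    lyrics: str


def normalize(text: str) -> str:
    # Lowercase and collapse all whitespace so formatting differences don't count.
    return " ".join(text.lower().split())


def find_lyric_duplicates(songs, threshold=0.9):
    """Return index pairs (i, j) whose lyrics are at least `threshold` similar.

    Only songs with the same normalized title are compared, which keeps the
    number of comparisons well below one per pair of songs in the library.
    """
    by_title = defaultdict(list)
    for idx, song in enumerate(songs):
        by_title[normalize(song.title)].append(idx)

    pairs = []
    for indices in by_title.values():
        for i, j in combinations(indices, 2):
            a = normalize(songs[i].lyrics)
            b = normalize(songs[j].lyrics)
            # SequenceMatcher.ratio() returns a similarity score in [0, 1].
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((i, j))
    return pairs
```

Grouping by normalized title first means the quadratic pairwise cost applies only within each title group rather than across the whole library, at the price of missing covers that were retitled.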
I think we could then grab the publication date of every song in a duplicate group and keep only the earliest one.
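Assuming the similarity pass yields pairs of duplicate songs, keeping only the earliest version could look like the sketch below. `Track`, `keep_earliest`, and the `published` field are hypothetical names, and where the publication date comes from is left open. Pairs are merged transitively with a small union-find, so if A matches B and B matches C, all three end up in one group:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Track:
    # Hypothetical record; the source of `published` is an open question.
    artist: str
    title: str
    published: date


def keep_earliest(tracks, duplicate_pairs):
    """Collapse duplicate groups and keep the earliest-published track of each.

    `duplicate_pairs` is an iterable of (i, j) index pairs judged to be the
    same song.
    """
    parent = list(range(len(tracks)))

    def find(x):
        # Walk up to the group's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in duplicate_pairs:
        parent[find(i)] = find(j)

    # Map each group root to the index of its earliest-published track.
    earliest = {}
    for idx, track in enumerate(tracks):
        root = find(idx)
        if root not in earliest or track.published < tracks[earliest[root]].published:
            earliest[root] = idx

    # Return the surviving tracks in their original order.
    return [tracks[i] for i in sorted(earliest.values())]
```

Ties on the publication date keep whichever track appears first in the input list.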
This won't be possible until I can get access to the full MusixMatch API, or find a different API that returns full lyrics. The current API only returns snippets, and not necessarily the same "slice" of the lyrics for each song, so comparisons between them are meaningless.