Enhanced duplicate song detection via lyric similarity comparisons #1

tchamberlin · 2021-01-27T23:44:26Z

Currently the de-duplication mechanism is quite naive and error-prone. It treats each (artist, track) combination as unique, so cover songs and the like are counted as separate songs.

A good example of this is the song "Emily" by Frank Sinatra. It has been covered dozens of times, and the covers rarely identify themselves as such. Further, it has a very generic name, so filtering on the title alone is impossible. The only real way forward I see is to perform similarity comparisons between every pair of lyrics (perhaps limited to songs with identical names?), via fuzzywuzzy or similar.
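A minimal sketch of that comparison, assuming full lyrics are available for every track. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy, not fuzzywuzzy itself. The `Track` record, the `find_cover_pairs` helper, and the 0.85 threshold are all hypothetical and would need tuning against real data.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Track:
    # Hypothetical record; these field names are assumptions, not the project's schema.
    artist: str
    title: str
    lyrics: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't affect the score."""
    return " ".join(text.lower().split())


def find_cover_pairs(tracks, threshold=0.85):
    """Return (track_a, track_b, score) for same-titled tracks whose lyrics are similar."""
    by_title = defaultdict(list)
    for track in tracks:
        by_title[normalize(track.title)].append(track)

    pairs = []
    for group in by_title.values():
        # Compare only within a title group, which avoids an all-pairs scan of the library.
        for a, b in combinations(group, 2):
            matcher = SequenceMatcher(
                None, normalize(a.lyrics), normalize(b.lyrics), autojunk=False
            )
            score = matcher.ratio()  # 0.0 (nothing in common) to 1.0 (identical)
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs
```

Grouping by normalized title first keeps the number of comparisons manageable, at the cost of missing covers released under a different title.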

I think we could then grab the publication date for all duplicates and use only the earliest one.
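As a rough sketch of that last step, assuming each duplicate is available as a `(track_id, release_date)` pair (a hypothetical shape, as is the `pick_original` helper), the earliest dated entry can be selected like this:

```python
from datetime import date


def pick_original(duplicates):
    """Return the (track_id, release_date) pair with the earliest date, or None.

    Entries with an unknown date (None) are ignored, since they cannot be ordered.
    """
    dated = [entry for entry in duplicates if entry[1] is not None]
    if not dated:
        return None
    return min(dated, key=lambda entry: entry[1])


# Placeholder IDs and dates, for illustration only.
group = [
    ("track-b", date(1998, 5, 1)),
    ("track-a", date(1964, 1, 1)),
    ("track-c", None),
]
original = pick_original(group)
```

This treats the earliest known release as the original, which is only as reliable as the publication dates the API returns.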


tchamberlin · 2021-05-14T00:21:42Z

This won't be possible until I can get access to the full MusixMatch API, or find a different API that gives full lyrics. The current API only gives me snippets, and not necessarily the same "slice" of the lyrics, so the comparisons are meaningless.

But maybe one day.

tchamberlin added the "enhancement" label (New feature or request) on Jan 27, 2021
