Enhanced duplicate song detection via lyric similarity comparisons #1

Open
tchamberlin opened this issue Jan 27, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@tchamberlin
Owner

Currently the de-duplication mechanism is naive and error-prone. It treats every (artist, track) combination as unique, so it gives no consideration to cover songs, etc.

A good example of this is the song "Emily" by Frank Sinatra. This has been covered many dozens of times, and the covers rarely identify themselves as such. Further, it has a very generic name, so filtering on the title alone won't work. The only real way forward I see is to run pairwise similarity comparisons between the lyrics of all songs (perhaps limited to songs with identical names?), via fuzzywuzzy or similar.
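A rough sketch of what that comparison might look like, limited to songs that share a title. This is only an illustration: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy, and the song dict keys (`artist`, `track`, `lyrics`), the function names, and the `0.85` threshold are all assumptions rather than anything in the project.

```python
from difflib import SequenceMatcher
from itertools import combinations
import re


def normalize(lyrics: str) -> str:
    """Lowercase the lyrics, drop punctuation, and collapse whitespace."""
    return " ".join(re.findall(r"[a-z0-9']+", lyrics.lower()))


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two lyric strings."""
    # autojunk=False avoids difflib's "popular element" heuristic on long texts.
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


def find_duplicate_pairs(songs, threshold=0.85):
    """Return (artist_a, artist_b, track) for same-title songs with similar lyrics.

    `songs` is a hypothetical list of dicts with 'artist', 'track', 'lyrics'.
    """
    # Group by title first so we only compare songs with identical names.
    by_title = {}
    for song in songs:
        by_title.setdefault(song["track"].strip().lower(), []).append(song)

    pairs = []
    for group in by_title.values():
        for a, b in combinations(group, 2):
            if similarity(a["lyrics"], b["lyrics"]) >= threshold:
                pairs.append((a["artist"], b["artist"], a["track"]))
    return pairs
```

Grouping by title keeps the number of comparisons manageable, since comparing every song against every other song grows quadratically. The threshold would need tuning against real data.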

I think we could then grab the publication date for all duplicates and use only the earliest one.
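Picking the earliest one could be as simple as the sketch below. It assumes each duplicate carries a `published` date (a hypothetical field; where the date would come from is not settled), and it ignores entries with no date unless none of them have one.

```python
from datetime import date


def keep_earliest(duplicates):
    """Return the duplicate with the earliest 'published' date.

    `duplicates` is a hypothetical non-empty list of song dicts. Entries whose
    'published' value is missing or None are skipped; if no entry has a date,
    the first entry is returned unchanged.
    """
    dated = [s for s in duplicates if s.get("published") is not None]
    if not dated:
        return duplicates[0]
    return min(dated, key=lambda s: s["published"])


# Made-up artists and dates, purely for illustration.
group = [
    {"artist": "Cover Artist", "track": "Emily", "published": date(1990, 5, 1)},
    {"artist": "Original Artist", "track": "Emily", "published": date(1964, 1, 1)},
    {"artist": "Unknown Date", "track": "Emily", "published": None},
]
original = keep_earliest(group)
```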

tchamberlin added the enhancement label on Jan 27, 2021
@tchamberlin
Owner Author

This won't be possible until I can get access to the full MusixMatch API, or find a different API that gives full lyrics. The current API only gives me snippets, and not necessarily the same "slice" of the lyrics, so the comparisons are meaningless.

But maybe one day.
