Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Controversial Freedb support #138

Closed
wants to merge 2 commits into from
Closed

Conversation

ubitux
Copy link
Contributor

@ubitux ubitux commented Mar 5, 2017

OK so I understand you probably don't want this, but I'm submitting anyway. I'll maintain my crap in my fork in case it's rejected.

The python-cddb dependency drop is controversial because I'm actually adding the support to CDDB natively. The main benefit aside from having one less dependency is to have proper unicode support. Note: I should probably move the main() to the test suite.

The second commit introduces the --allow-freedb option in which we craft fake MusicBrainz metadata. Note that this option is NOT the default, and that it also warns the user about how evil this is.

@MerlijnWajer
Copy link
Collaborator

Thanks for working on this. I have no opinion, but wanted to let you know it's appreciated regardless. Let's see what other think, or perhaps have already said.

@JoeLametta JoeLametta self-requested a review April 3, 2017 17:31
self.parser.add_argument('-U', '--unknown',
action="store_true", dest="unknown",
help="whether to continue ripping if the CD is unknown",
default=False)
self.parser.add_argument('--allow-freedb',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the short option '-F' for FreeDB Fallback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not make using this option easy and didn't want to reserve a 1-letter option for a somehow "legacy" mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All options should have a short form.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many don't (--cdr, --track-template, --disc-template in this module)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's something we're planning on fixing shortly, although I understand that you may not be privy to transient information on IRC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also can't argue there's a consensus on this topic, I just strongly prefer BSD-style interfaces; so do what you want.


allow_freedb = getattr(self.options, 'allow_freedb', False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a default, so I think allow_freedb = self.options.allow_freedb is appropriate. Better yet I'd ditch the temporary variable and just use if self.options.allow_freedb: on line 147.

Copy link
Contributor Author

@ubitux ubitux Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I have no idea what pattern I was trying to copy here...

Edit: ah, that's copied from the "unknown" code below.

freedb_data = sfdb.read(match['category'], match['discid'])
self.program.metadata = self.program.craftMusicBrainzFromFreeDB(freedb_data)

if not self.program.metadata:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the convention of failing early:

if matches:
    ...
    if not self.options.allow_freedb and not self.options.unknown:
        logger.critical(...)
        return -1
    logger.warning(...)
    freedb_data ...
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the diff as small as possible, and I'm not sure I'm following what you are trying to restructure here. I'd rather have such change on top of the proposed change if you don't mind.

sfdb = simplefreedb.SimpleFreeDB()
matches = sfdb.query(cddbid[0], cddbid[1], cddbid[2:-1], cddbid[-1])
if matches:
match = matches[0] # TODO: honor --prompt
Copy link
Contributor

@RecursiveForest RecursiveForest Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take a gander at the work I did to rewrite AccurateRip, I factored out a generic prompt function you might find useful here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git grep -i prompt doesn't seem to raise anything related. Can you point out where I can find it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@ubitux ubitux Sep 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not mainline, or is it?

import urllib
import morituri

class SimpleFreeDB:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know morituri is organised this way, but since you're not storing any dynamic data on the class, I think this entire module might be a good candidate for a more functional style. It'll read pretty much the same way, but indented 4 characters less and with selfless signatures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I think I prefer using the class system for at least namespace. Python is way too much a free4all language already, so it helps with the scoping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand-- modules provide namespacing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends how you import it. Anyway, I'm just trying to be consistent with the common way of writing Python. Do you think it's really a problem to wrap an API in a class?


def _craft_match(self, categ, discid_str, dtitle):
artist, title = self._split_dtitle(dtitle)
return {'category': categ,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just spell out 'category'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

allow_freedb = getattr(self.options, 'allow_freedb', False)

sfdb = simplefreedb.SimpleFreeDB()
matches = sfdb.query(cddbid[0], cddbid[1], cddbid[2:-1], cddbid[-1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the cddbid parsing logic should be done inside of sfdb.query(), because this is pretty opaque. If you know the sfdb.query() signature, sure you can work it out, but outside of your test cases it doesn't seem like you're ever calling sfdb.query() without having an already composed cddbid.

Copy link
Contributor Author

@ubitux ubitux Sep 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have getCDDBValues() not build a custom array of stuff. It would indeed be simpler here to just pass cddbid, but the way result is crafted in getCDDBValues() makes me believe it was originally written by someone who had no clue about data structures. If you consider SimpleFreeDB could live outside the whipper project, it is saner to have this prototype, because the parameters are constructed separately: there is no such thing as "grab all the info and pack them in an array that way" thing outside of whipper.

Edit: I split the unpacking of the values earlier in this code to clarifies the call.

@@ -279,15 +295,20 @@ def add_arguments(self):
self.parser.add_argument('--track-template',
action="store", dest="track_template",
default=DEFAULT_TRACK_TEMPLATE,
help="template for track file naming (default default)")
help="template for track file naming")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these (and the above whitespace change in getPath) are both good changes, but I think don't belong in this PR.

Copy link
Contributor Author

@ubitux ubitux Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whitespace change seems to have been included already by some other change. I'll take out the default-default patch out of it and make a new pull request in a moment.

Edit: PR @ #188

def getCDDB(self, cddbdiscid):
"""
@param cddbdiscid: list of id, tracks, offsets, seconds
def craftMusicBrainzFromFreeDB(self, freedb_data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A short docstring explaining that this is creating musicbrainz-sytle metadata from a freedb entry would make this function perfect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

if number > 0:
has_mbinfo = mbidTrack and mbidTrackArtist and mbidAlbum \
and mbidTrackAlbum and mbDiscId
if number > 0 and has_mbinfo:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth assigning tags['...TRACKID'] et al. within individual if branches instead of all at once? I'm not sure if it's useful or not to have, say, ARTISTID and not ALBUMID, or if that would ever happen, but it might be worth it? shrugs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this if is protecting the insert of MusicBrainZ tags, and you absolutely don't want them when using --allow-freedb. This code basically checks for every field that is supposed to be present with MusicBrainZ; if any is missing, that's because we're hacking FreeDB stuff in it and so you don't want to end up with broken MusicBrainZ references obtained from dubious sources.


# Copyright (C) 2017 Clément Bœsch <u@pkh.me>

# This file is part of morituri.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whipper!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'm in the process of rebasing that stuff, it was morituri when I wrote that patchset. I won't forget to fix that :)

# -*- Mode: Python; test-case-name: morituri.test.test_common_program -*-
# vi:si:et:sw=4:sts=4:ts=4

# Morituri - for those about to RIP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RIP morituri-- it's whipper now >:D

import re
import socket
import getpass
import urllib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it requires another dependency (although one that we may introduce for ARv2 support regardless), but it might be worth using the 'requests' library in lieu of urllib here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? urllib is builtin. I got so many issues with requests...

return data

def main():
test_queries = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be preferable to decouple testing from the actual code by creating an accompanying test module, like the rest of the application.

Copy link
Contributor Author

@ubitux ubitux Sep 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the little assert, this is not actually a test but more a debug tool (it's not testing output, it's dumping data). I'd prefer to keep tests depending on external websites such as freedb.org outside of the test suite.

@RecursiveForest
Copy link
Contributor

RecursiveForest commented Sep 6, 2017

Overall this is pretty solid. I left a few notes regarding style and organisation, but the core functionality looks pretty good.

I'm not convinced yet it's worth the potential feature creep, especially because we want to encourage MusicBrainz contributions, but I can see a case being made for its inclusion. You could also consider writing a small tagging tool on top of simplefreedb that applies freedb-sourced metadata to flacs that were ripped by whipper with the --unknown flag.

Would you consider making simplefreedb into its own independent python module that could be developed independently of whipper, which we would then import? The two existent cddb modules for python are not particularly well packaged (and we know at least one is lacking in UTF-8 support, apparently), so there's definitely a place for it. This is pretty clean code.

@ubitux
Copy link
Contributor Author

ubitux commented Sep 8, 2017

Would you consider making simplefreedb into its own independent python module that could be developed independently of whipper, which we would then import?

I don't think I'll be motivated enough to do all the packaging effort, I'm sorry. Feel free to do so though.

Thanks for the long review, I'll make the adjustments you requested soon.

@ubitux ubitux changed the title Controversial Freedb support :) Controversial Freedb support Sep 9, 2017
@ubitux ubitux force-pushed the freedb branch 6 times, most recently from cb0550c to d24f918 Compare September 16, 2017 13:57
@anarcat
Copy link
Contributor

anarcat commented Oct 4, 2018

i would argue for this even if only for archival purposes. here i'm using whipper to rescue an old personal CD collection and i happened to have tagged some of those disks on freeDB. they have very low distribution (this is probably the only copy left in the universe) so there's little point in repopulating musicbrainz with this stuff...

@JoeLametta
Copy link
Collaborator

JoeLametta commented Oct 6, 2018

@anarcat Well, that's a pretty niche use case, isn't it? (considering that you can also tag those files after whipper has finished its job)

Anyway, right now we aren't using the python-cddb module anymore (see #276 for more details), so what's left relevant in this pull request is the second commit (cd: add --allow-freedb option). That one may be a worthful addition: let's keep discussing the pros and cons of this proposal...

@anarcat
Copy link
Contributor

anarcat commented Oct 10, 2018

just to weigh another point in - what i ended up doing here is to feed the data back into musicbrainz. there are two ways of viewing this:

  1. yay, people are contributing
  2. crap, people are feeding useless old, bad data from freedb into musicbrainz still?

i'd rather not encourage the second pattern, but it's true that we might not encourage people to contribute to musicbrainz if data is only on freedb....

@JoeLametta
Copy link
Collaborator

Related to #44.

@Freso
Copy link
Member

Freso commented Oct 28, 2018

they have very low distribution (this is probably the only copy left in the universe) so there's little point in repopulating musicbrainz

I'd argue that that means that there's even more point in feeding that data into MusicBrainz. High distribution stuff will likely make its way there eventually anyway, but MusicBrainz wants information on all music. If you're the only one left who can "tell" us about a given release or maybe even artist, then we'd be all the more appreciative of you doing so.

(Keep in mind that MusicBrainz is a database/encyclopedia first, any uses of the data is secondary. There are people using the MB data from cleaning up or expanding on other datasets (one of the common use cases for the streaming services that are MetaBrainz supporters), and maybe there's a copy of some of the music you have access to floating around on YouTube or somewhere that would benefit from being linked up to MBIDs. There are a bunch of researchers around the world using MusicBrainz to look into various questions related to music information, needless to say, the more information there is, the better/more accurate results they'll get. Just because there may not be another person to rip the disc anytime in the future, doesn't mean that the value of the disc being in MusicBrainz is zero or even insignificant.)

@peioe
Copy link

peioe commented Dec 9, 2019

It would be really nice to add an option to allow grabbing metadata from freedb. It's completely ok if it doesn't by default but it would really help with the adoption of whipper imo.

@JoeLametta
Copy link
Collaborator

@Freso @MerlijnWajer What's your opinion about this?

@Freso
Copy link
Member

Freso commented Dec 15, 2019

Considering that freedb.org is sunsetting and seems like it will be shut down in a few months, I vote to reject this PR and any other attempt of freedb compatibility. Even if it gets resurrected (on the same domain or elsewhere), the technology behind it is ancient and largely in maintenance-only mode (if even that much!). Let’s move on.

billede

@JoeLametta
Copy link
Collaborator

@Freso Thanks for the reply: didn't know about the imminent shutdown! For this reason I'm closing the pull request.

@JoeLametta JoeLametta closed this Dec 15, 2019
@peioe
Copy link

peioe commented Dec 15, 2019

That's sad, there are a lot more releases on freedb than MB, and if someone hosts a mirror (and they already exist) it could still be used. In my experience having only MB is pretty much equivalent to having no metadata at all unfortunately.

@ubitux
Copy link
Contributor Author

ubitux commented Dec 15, 2019

That's sad, there are a lot more releases on freedb than MB, and if someone hosts a mirror (and they already exist) it could still be used. In my experience having only MB is pretty much equivalent to having no metadata at all unfortunately.

That was pretty much my experience and the reason I hacked this branch. Out of all the CDs I tried to RIP with whipper (about 20), almost none where found on MB, while almost all of them were on freedb.

@anarcat
Copy link
Contributor

anarcat commented Dec 15, 2019

Same here: what brought me here was the daunting task of ripping my old CD collection out of fear it would dissolve itself in the mists of time (which it partly did). Because many of those releases predate the very existence of MB, a lot of those releases are only on FreeDB.org...

But I guess if the site dies, it's not worth keeping this implementation around.

@peioe which mirror are you aware of?

@MerlijnWajer
Copy link
Collaborator

Here's another controversial thought - what about adding gracenote support? They pretty much "inherited" their data from FreeDB.org anyway. (I might already have code for this)

@Freso
Copy link
Member

Freso commented Dec 15, 2019

I’m principally opposed to supporting a closed source, proprietary data source. (Also, Gracenote were the ones responsible for shutting off contributors’ access to CDDB 20 years ago, leading to the creation of FreeDB, MusicBrainz, et al.)

@peioe
Copy link

peioe commented Dec 16, 2019

@peioe which mirror are you aware of?

http://www.freedb.org/en/faq.3.html#18
http://www.freedb.org/en/faq.3.html#112
http://www.freedb.org/en/download__server_software.4.html
http://www.freedb.org/en/download__database.10.html

EAC for example allows you to specify the url to a freedb server, so you can just host your own.

Even if freedb goes down tomorrow, the data it collected over the years is still extremely useful, and if it's possible to host a mirror it would be great to be able to use it with whipper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants