Controversial Freedb support #138

ubitux · 2017-03-05T12:36:08Z

OK so I understand you probably don't want this, but I'm submitting anyway. I'll maintain my crap in my fork in case it's rejected.

The python-cddb dependency drop is controversial because I'm actually adding the support to CDDB natively. The main benefit aside from having one less dependency is to have proper unicode support. Note: I should probably move the main() to the test suite.

The second commit introduces the --allow-freedb option in which we craft fake MusicBrainz metadata. Note that this option is NOT the default, and that it also warns the user about how evil this is.

MerlijnWajer · 2017-03-11T01:24:37Z

Thanks for working on this. I have no opinion, but wanted to let you know it's appreciated regardless. Let's see what other think, or perhaps have already said.

RecursiveForest · 2017-09-06T01:29:01Z

morituri/command/cd.py

        self.parser.add_argument('-U', '--unknown',
            action="store_true", dest="unknown",
            help="whether to continue ripping if the CD is unknown",
            default=False)
+        self.parser.add_argument('--allow-freedb',


How about the short option '-F' for FreeDB Fallback?

I'd rather not make using this option easy and didn't want to reserve a 1-letter option for a somehow "legacy" mode.

All options should have a short form.

Many don't (--cdr, --track-template, --disc-template in this module)

It's something we're planning on fixing shortly, although I understand that you may not be privy to transient information on IRC.

I also can't argue there's a consensus on this topic, I just strongly prefer BSD-style interfaces; so do what you want.

RecursiveForest · 2017-09-06T01:32:05Z

morituri/command/cd.py


+            allow_freedb = getattr(self.options, 'allow_freedb', False)


there's a default, so I think allow_freedb = self.options.allow_freedb is appropriate. Better yet I'd ditch the temporary variable and just use if self.options.allow_freedb: on line 147.

Good point, I have no idea what pattern I was trying to copy here...

Edit: ah, that's copied from the "unknown" code below.

RecursiveForest · 2017-09-06T01:35:36Z

morituri/command/cd.py

+                    freedb_data = sfdb.read(match['category'], match['discid'])
+                    self.program.metadata = self.program.craftMusicBrainzFromFreeDB(freedb_data)
+
+        if not self.program.metadata:


Following the convention of failing early:

if matches: ... if not self.options.allow_freedb and not self.options.unknown: logger.critical(...) return -1 logger.warning(...) freedb_data ... ...

I'd like to keep the diff as small as possible, and I'm not sure I'm following what you are trying to restructure here. I'd rather have such change on top of the proposed change if you don't mind.

RecursiveForest · 2017-09-06T01:39:39Z

morituri/command/cd.py

+            sfdb = simplefreedb.SimpleFreeDB()
+            matches = sfdb.query(cddbid[0], cddbid[1], cddbid[2:-1], cddbid[-1])
+            if matches:
+                match = matches[0] # TODO: honor --prompt


If you take a gander at the work I did to rewrite AccurateRip, I factored out a generic prompt function you might find useful here.

git grep -i prompt doesn't seem to raise anything related. Can you point out where I can find it?

Oh, it was @ RecursiveForest@ca33dcf#diff-aa28dbdd5d3fa03298c07270deac1ec1R40 , not part of the AR PR.

That's not mainline, or is it?

RecursiveForest · 2017-09-06T01:41:56Z

morituri/common/simplefreedb.py

+import urllib
+import morituri
+
+class SimpleFreeDB:


I know morituri is organised this way, but since you're not storing any dynamic data on the class, I think this entire module might be a good candidate for a more functional style. It'll read pretty much the same way, but indented 4 characters less and with selfless signatures.

I'm sorry, I think I prefer using the class system for at least namespace. Python is way too much a free4all language already, so it helps with the scoping.

I don't understand-- modules provide namespacing.

Depends how you import it. Anyway, I'm just trying to be consistent with the common way of writing Python. Do you think it's really a problem to wrap an API in a class?

RecursiveForest · 2017-09-06T01:47:31Z

morituri/common/simplefreedb.py

+
+    def _craft_match(self, categ, discid_str, dtitle):
+        artist, title = self._split_dtitle(dtitle)
+        return {'category': categ,


just spell out 'category'

Sure, will do.

RecursiveForest · 2017-09-06T02:02:20Z

morituri/command/cd.py

+            allow_freedb = getattr(self.options, 'allow_freedb', False)
+
+            sfdb = simplefreedb.SimpleFreeDB()
+            matches = sfdb.query(cddbid[0], cddbid[1], cddbid[2:-1], cddbid[-1])


Maybe the cddbid parsing logic should be done inside of sfdb.query(), because this is pretty opaque. If you know the sfdb.query() signature, sure you can work it out, but outside of your test cases it doesn't seem like you're ever calling sfdb.query() without having an already composed cddbid.

I'd rather have getCDDBValues() not build a custom array of stuff. It would indeed be simpler here to just pass cddbid, but the way result is crafted in getCDDBValues() makes me believe it was originally written by someone who had no clue about data structures. If you consider SimpleFreeDB could live outside the whipper project, it is saner to have this prototype, because the parameters are constructed separately: there is no such thing as "grab all the info and pack them in an array that way" thing outside of whipper.

Edit: I split the unpacking of the values earlier in this code to clarifies the call.

RecursiveForest · 2017-09-06T02:03:02Z

morituri/command/cd.py

@@ -279,15 +295,20 @@ def add_arguments(self):
        self.parser.add_argument('--track-template',
            action="store", dest="track_template",
            default=DEFAULT_TRACK_TEMPLATE,
-            help="template for track file naming (default default)")
+            help="template for track file naming")


these (and the above whitespace change in getPath) are both good changes, but I think don't belong in this PR.

The whitespace change seems to have been included already by some other change. I'll take out the default-default patch out of it and make a new pull request in a moment.

Edit: PR @ #188

RecursiveForest · 2017-09-06T02:09:30Z

morituri/common/program.py

-    def getCDDB(self, cddbdiscid):
-        """
-        @param cddbdiscid: list of id, tracks, offsets, seconds
+    def craftMusicBrainzFromFreeDB(self, freedb_data):


A short docstring explaining that this is creating musicbrainz-sytle metadata from a freedb entry would make this function perfect.

Sure, will do.

RecursiveForest · 2017-09-06T02:11:06Z

morituri/common/program.py

-            if number > 0:
+            has_mbinfo = mbidTrack and mbidTrackArtist and mbidAlbum \
+                    and mbidTrackAlbum and mbDiscId
+            if number > 0 and has_mbinfo:


Would it be worth assigning tags['...TRACKID'] et al. within individual if branches instead of all at once? I'm not sure if it's useful or not to have, say, ARTISTID and not ALBUMID, or if that would ever happen, but it might be worth it? shrugs

Nope, this if is protecting the insert of MusicBrainZ tags, and you absolutely don't want them when using --allow-freedb. This code basically checks for every field that is supposed to be present with MusicBrainZ; if any is missing, that's because we're hacking FreeDB stuff in it and so you don't want to end up with broken MusicBrainZ references obtained from dubious sources.

RecursiveForest · 2017-09-06T02:11:20Z

morituri/common/simplefreedb.py

+
+# Copyright (C) 2017 Clément Bœsch <u@pkh.me>
+
+# This file is part of morituri.


Yep, I'm in the process of rebasing that stuff, it was morituri when I wrote that patchset. I won't forget to fix that :)

RecursiveForest · 2017-09-06T02:11:29Z

morituri/common/simplefreedb.py

+# -*- Mode: Python; test-case-name: morituri.test.test_common_program -*-
+# vi:si:et:sw=4:sts=4:ts=4
+
+# Morituri - for those about to RIP


RIP morituri-- it's whipper now >:D

RecursiveForest · 2017-09-06T02:12:31Z

morituri/common/simplefreedb.py

+import re
+import socket
+import getpass
+import urllib


I know it requires another dependency (although one that we may introduce for ARv2 support regardless), but it might be worth using the 'requests' library in lieu of urllib here.

Why? urllib is builtin. I got so many issues with requests...

RecursiveForest · 2017-09-06T02:14:58Z

morituri/common/simplefreedb.py

+        return data
+
+def main():
+    test_queries = (


I think it would be preferable to decouple testing from the actual code by creating an accompanying test module, like the rest of the application.

Aside from the little assert, this is not actually a test but more a debug tool (it's not testing output, it's dumping data). I'd prefer to keep tests depending on external websites such as freedb.org outside of the test suite.

RecursiveForest · 2017-09-06T02:21:45Z

Overall this is pretty solid. I left a few notes regarding style and organisation, but the core functionality looks pretty good.

I'm not convinced yet it's worth the potential feature creep, especially because we want to encourage MusicBrainz contributions, but I can see a case being made for its inclusion. You could also consider writing a small tagging tool on top of simplefreedb that applies freedb-sourced metadata to flacs that were ripped by whipper with the --unknown flag.

Would you consider making simplefreedb into its own independent python module that could be developed independently of whipper, which we would then import? The two existent cddb modules for python are not particularly well packaged (and we know at least one is lacking in UTF-8 support, apparently), so there's definitely a place for it. This is pretty clean code.

ubitux · 2017-09-08T18:05:56Z

Would you consider making simplefreedb into its own independent python module that could be developed independently of whipper, which we would then import?

I don't think I'll be motivated enough to do all the packaging effort, I'm sorry. Feel free to do so though.

Thanks for the long review, I'll make the adjustments you requested soon.

anarcat · 2018-10-04T20:46:08Z

i would argue for this even if only for archival purposes. here i'm using whipper to rescue an old personal CD collection and i happened to have tagged some of those disks on freeDB. they have very low distribution (this is probably the only copy left in the universe) so there's little point in repopulating musicbrainz with this stuff...

JoeLametta · 2018-10-06T10:01:30Z

@anarcat Well, that's a pretty niche use case, isn't it? (considering that you can also tag those files after whipper has finished its job)

Anyway, right now we aren't using the python-cddb module anymore (see #276 for more details), so what's left relevant in this pull request is the second commit (cd: add --allow-freedb option). That one may be a worthful addition: let's keep discussing the pros and cons of this proposal...

anarcat · 2018-10-10T13:47:48Z

just to weigh another point in - what i ended up doing here is to feed the data back into musicbrainz. there are two ways of viewing this:

yay, people are contributing
crap, people are feeding useless old, bad data from freedb into musicbrainz still?

i'd rather not encourage the second pattern, but it's true that we might not encourage people to contribute to musicbrainz if data is only on freedb....

JoeLametta · 2018-10-28T16:01:34Z

Related to #44.

Freso · 2018-10-28T17:23:18Z

they have very low distribution (this is probably the only copy left in the universe) so there's little point in repopulating musicbrainz

I'd argue that that means that there's even more point in feeding that data into MusicBrainz. High distribution stuff will likely make its way there eventually anyway, but MusicBrainz wants information on all music. If you're the only one left who can "tell" us about a given release or maybe even artist, then we'd be all the more appreciative of you doing so.

(Keep in mind that MusicBrainz is a database/encyclopedia first, any uses of the data is secondary. There are people using the MB data from cleaning up or expanding on other datasets (one of the common use cases for the streaming services that are MetaBrainz supporters), and maybe there's a copy of some of the music you have access to floating around on YouTube or somewhere that would benefit from being linked up to MBIDs. There are a bunch of researchers around the world using MusicBrainz to look into various questions related to music information, needless to say, the more information there is, the better/more accurate results they'll get. Just because there may not be another person to rip the disc anytime in the future, doesn't mean that the value of the disc being in MusicBrainz is zero or even insignificant.)

peioe · 2019-12-09T15:34:01Z

It would be really nice to add an option to allow grabbing metadata from freedb. It's completely ok if it doesn't by default but it would really help with the adoption of whipper imo.

JoeLametta · 2019-12-12T11:06:37Z

@Freso @MerlijnWajer What's your opinion about this?

Freso · 2019-12-15T00:17:29Z

Considering that freedb.org is sunsetting and seems like it will be shut down in a few months, I vote to reject this PR and any other attempt of freedb compatibility. Even if it gets resurrected (on the same domain or elsewhere), the technology behind it is ancient and largely in maintenance-only mode (if even that much!). Let’s move on.

JoeLametta · 2019-12-15T15:59:24Z

@Freso Thanks for the reply: didn't know about the imminent shutdown! For this reason I'm closing the pull request.

peioe · 2019-12-15T16:02:20Z

That's sad, there are a lot more releases on freedb than MB, and if someone hosts a mirror (and they already exist) it could still be used. In my experience having only MB is pretty much equivalent to having no metadata at all unfortunately.

ubitux · 2019-12-15T16:25:45Z

That's sad, there are a lot more releases on freedb than MB, and if someone hosts a mirror (and they already exist) it could still be used. In my experience having only MB is pretty much equivalent to having no metadata at all unfortunately.

That was pretty much my experience and the reason I hacked this branch. Out of all the CDs I tried to RIP with whipper (about 20), almost none where found on MB, while almost all of them were on freedb.

anarcat · 2019-12-15T22:34:51Z

Same here: what brought me here was the daunting task of ripping my old CD collection out of fear it would dissolve itself in the mists of time (which it partly did). Because many of those releases predate the very existence of MB, a lot of those releases are only on FreeDB.org...

But I guess if the site dies, it's not worth keeping this implementation around.

@peioe which mirror are you aware of?

MerlijnWajer · 2019-12-15T22:45:58Z

Here's another controversial thought - what about adding gracenote support? They pretty much "inherited" their data from FreeDB.org anyway. (I might already have code for this)

Freso · 2019-12-15T22:55:08Z

I’m principally opposed to supporting a closed source, proprietary data source. (Also, Gracenote were the ones responsible for shutting off contributors’ access to CDDB 20 years ago, leading to the creation of FreeDB, MusicBrainz, et al.)

peioe · 2019-12-16T06:42:55Z

@peioe which mirror are you aware of?

http://www.freedb.org/en/faq.3.html#18
http://www.freedb.org/en/faq.3.html#112
http://www.freedb.org/en/download__server_software.4.html
http://www.freedb.org/en/download__database.10.html

EAC for example allows you to specify the url to a freedb server, so you can just host your own.

Even if freedb goes down tomorrow, the data it collected over the years is still extremely useful, and if it's possible to host a mirror it would be great to be able to use it with whipper.

ubitux force-pushed the freedb branch from 1db7a50 to 73d6581 Compare March 6, 2017 22:29

JoeLametta self-requested a review April 3, 2017 17:31

RecursiveForest reviewed Sep 6, 2017

View reviewed changes

ubitux changed the title ~~Controversial Freedb support :)~~ Controversial Freedb support Sep 9, 2017

ubitux force-pushed the freedb branch 6 times, most recently from cb0550c to d24f918 Compare September 16, 2017 13:57

ubitux force-pushed the freedb branch from d24f918 to 331d15c Compare December 16, 2017 12:41

ubitux added 2 commits December 16, 2017 13:41

project: drop dependency to python-cddb

ace0381

cd: add --allow-freedb option

4d582d3

ubitux force-pushed the freedb branch from 331d15c to 4d582d3 Compare December 16, 2017 12:41

JoeLametta closed this Dec 15, 2019


		# Copyright (C) 2017 Clément Bœsch <u@pkh.me>

		# This file is part of morituri.

Controversial Freedb support #138

Controversial Freedb support #138

Conversation

ubitux commented Mar 5, 2017 • edited Loading

MerlijnWajer commented Mar 11, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ubitux Sep 8, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RecursiveForest Sep 6, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ubitux Sep 9, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ubitux Sep 9, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ubitux Sep 8, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ubitux Sep 9, 2017 • edited Loading

Choose a reason for hiding this comment

RecursiveForest commented Sep 6, 2017 • edited Loading

ubitux commented Sep 8, 2017

anarcat commented Oct 4, 2018

JoeLametta commented Oct 6, 2018 • edited Loading

anarcat commented Oct 10, 2018

JoeLametta commented Oct 28, 2018

Freso commented Oct 28, 2018 • edited Loading

peioe commented Dec 9, 2019

JoeLametta commented Dec 12, 2019

Freso commented Dec 15, 2019

JoeLametta commented Dec 15, 2019

peioe commented Dec 15, 2019

ubitux commented Dec 15, 2019 • edited Loading

anarcat commented Dec 15, 2019

MerlijnWajer commented Dec 15, 2019

Freso commented Dec 15, 2019

peioe commented Dec 16, 2019

ubitux commented Mar 5, 2017 •

edited

Loading

ubitux Sep 8, 2017 •

edited

Loading

RecursiveForest Sep 6, 2017 •

edited

Loading

ubitux Sep 9, 2017 •

edited

Loading

ubitux Sep 9, 2017 •

edited

Loading

ubitux Sep 8, 2017 •

edited

Loading

ubitux Sep 9, 2017 •

edited

Loading

RecursiveForest commented Sep 6, 2017 •

edited

Loading

JoeLametta commented Oct 6, 2018 •

edited

Loading

Freso commented Oct 28, 2018 •

edited

Loading

ubitux commented Dec 15, 2019 •

edited

Loading