Issues while importing the November 2020 dump to PostgreSQL #131
Could you try with the October file and see whether you get the same issues, please? If it's not too much trouble, could you also test the October files with the changes from your PR?
The results are similar when running the process with the October 2020 dumps. No errors are reported while importing with the database created by
A zero
Trying to visit the page for artist 118760 on Discogs directly results in "This artist is used as a placeholder entry and does not link to any artist." The artist name for that ID is "No Artist", so having null in this place actually makes sense. I have no idea what to do about that bogus
Since the import script takes the list of columns from the CSV file header, the list of columns is set to
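A minimal C# sketch of the mechanism described above, assuming the importer builds a PostgreSQL COPY command whose column list is copied verbatim from the CSV header. The helper BuildCopyCommand is hypothetical and not part of the project. Any table column that is missing from the header (for example label_id in the release_label file written by the converter before the patch) is left out of the COPY list and is never loaded.

```csharp
using System;
using System.Linq;

// Hypothetical helper: derive the COPY column list from the CSV header line.
// Assumes a plain comma-separated header with no quoted column names.
static string BuildCopyCommand(string table, string headerLine)
{
    string[] columns = headerLine
        .Split(',')
        .Select(c => c.Trim())
        .Where(c => c.Length > 0)
        .ToArray();
    // Table columns absent from this list receive their column default
    // during COPY FROM, which is NULL unless a default is defined.
    return $"COPY {table} ({string.Join(", ", columns)}) FROM STDIN WITH (FORMAT csv, HEADER true)";
}

// Header as written by the converter before the patch: no label_id.
string before = BuildCopyCommand("release_label", "release_id,label_name,catno");
// Header as written after the patch: label_id is included.
string after = BuildCopyCommand("release_label", "release_id,label_id,label_name,catno");

Console.WriteLine(before);
Console.WriteLine(after);
```

With the first header, label_id never appears in the COPY list, which matches the symptom of label_id being null for every row.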
Superb analysis, thank you. So what do you think we should do?
Something like this for starters; it also includes a fix for the missing values in the
diff --git a/alternatives/dotnet/discogs/DiscogsRelease.cs b/alternatives/dotnet/discogs/DiscogsRelease.cs
index d554759..75d0b54 100644
--- a/alternatives/dotnet/discogs/DiscogsRelease.cs
+++ b/alternatives/dotnet/discogs/DiscogsRelease.cs
@@ -10,7 +10,7 @@ namespace discogs.Releases
{
{ "release", "id title released country notes data_quality master_id status".Split(" ") },
{ "release_genre", "release_id genre".Split(" ") },
- { "release_label", "release_id label_name catno".Split(" ") },
+ { "release_label", "release_id label_id label_name catno".Split(" ") },
{ "release_style", "release_id style".Split(" ") },
{ "release_image", "release_id type width height".Split(" ") },
{ "release_format", "release_id name qty text_string descriptions".Split(" ") },
@@ -22,6 +22,16 @@ namespace discogs.Releases
{ "release_track_artist", "release_id track_sequence track_id artist_id artist_name extra anv position join_string role tracks".Split(" ") },
};
+ private static string SanitizeArtistId(string artistId)
+ {
+ if (artistId == "0" /* non-linked credits in credit lists have this set to zero */
+ || artistId == "118760" /* No Artist */)
+ {
+ return "";
+ }
+ return artistId;
+ }
+
[XmlAttribute]
public string id { get; set; }
@@ -82,7 +92,7 @@ namespace discogs.Releases
foreach (var l in labels)
{
if (l == null) continue;
- yield return ("release_label", new[] { id, l.name, l.catno });
+ yield return ("release_label", new[] { id, l.id, l.name, l.catno });
}
}
if (styles?.Length > 0)
@@ -138,7 +148,7 @@ namespace discogs.Releases
foreach (var a in artists)
{
if (a == null) continue;
- yield return ("release_artist", new[] { id, a.id, a.name, "0", a.anv, (position++).ToString(), a.join, a.role, a.tracks });
+ yield return ("release_artist", new[] { id, SanitizeArtistId(a.id), a.name, "0", a.anv, (position++).ToString(), a.join, a.role, a.tracks });
}
}
if (extraartists?.Length > 0)
@@ -147,7 +157,7 @@ namespace discogs.Releases
foreach (var a in extraartists)
{
if (a == null) continue;
- yield return ("release_artist", new[] { id, a.id, a.name, "1", a.anv, (position++).ToString(), a.join, a.role, a.tracks });
+ yield return ("release_artist", new[] { id, SanitizeArtistId(a.id), a.name, "1", a.anv, (position++).ToString(), a.join, a.role, a.tracks });
}
}
int seq = 0;
@@ -159,13 +169,13 @@ namespace discogs.Releases
foreach (var a in (t.artists ?? System.Array.Empty<artist>())) {
if (a == null) continue;
artistSeq += 1;
- yield return ("release_track_artist", new[] { id, t.position, t.track_id, a.id, a.name, "0", a.anv, artistSeq.ToString(), a.join, a.role, a.tracks });
+ yield return ("release_track_artist", new[] { id, t.position, t.track_id, SanitizeArtistId(a.id), a.name, "0", a.anv, artistSeq.ToString(), a.join, a.role, a.tracks });
}
artistSeq = 0;
foreach (var a in (t.extraartists ?? System.Array.Empty<artist>())) {
if (a == null) continue;
artistSeq += 1;
- yield return ("release_track_artist", new[] { id, t.position, t.track_id, a.id, a.name, "1", a.anv, artistSeq.ToString(), a.join, a.role, a.tracks });
+ yield return ("release_track_artist", new[] { id, t.position, t.track_id, SanitizeArtistId(a.id), a.name, "1", a.anv, artistSeq.ToString(), a.join, a.role, a.tracks });
}
}
}
C# is not my native language, so I'm sure this can be expressed better, but the intent should be clear. The real question is how many "magic" artist IDs there are: I'm not aware of any list of "artists who have their IDs but aren't artists who have a page and thus an entry in
The project I've used a few times before actually commented out the creation of FK constraints in its database initialisation script, possibly because they just weren't worth the trouble. I'm not experienced enough with databases to say for sure what benefits they bring to the engine, as opposed to the human, who can figure out the relations more easily. Perhaps this really is the best way to go.
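One possible way to express the SanitizeArtistId check from the patch more compactly is to keep the known placeholder IDs in a single set, so any further "magic" IDs only need to be added in one place. This is a sketch, not the project's code: only 0 and 118760 come from this thread, and the set is not known to be complete. Returning an empty string assumes the importer loads the files with COPY ... CSV, where an unquoted empty field is read as NULL by default.

```csharp
using System;
using System.Collections.Generic;

// Placeholder artist IDs that should not become FK references.
// Only "0" (non-linked credits) and "118760" ("No Artist") are known from the
// discussion; this list may well be incomplete.
var placeholderArtistIds = new HashSet<string> { "0", "118760" };

// Same behaviour as the patch: placeholder IDs become an empty CSV field,
// every other ID is passed through unchanged.
string SanitizeArtistId(string artistId) =>
    placeholderArtistIds.Contains(artistId) ? "" : artistId;

// "194" is just an arbitrary non-placeholder ID for illustration.
foreach (var id in new[] { "0", "118760", "194" })
{
    Console.WriteLine($"{id} -> \"{SanitizeArtistId(id)}\"");
}
```

A NULL artist_id is not checked by a default (MATCH SIMPLE) foreign key, so mapping placeholders to NULL would let the FK constraints be created without dropping those credit rows.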
Hi there. I wanted to use the importer with this month's dump and ran into some problems along the way.
The CSV files were generated by the C# converter. PostgreSQL version is 12.5. Running on Arch Linux.
While importing artist.csv:
While importing artist_url.csv:
While importing release_track_artist.csv:
While importing release_track.csv:
parent should be text, since track_id is text as well. Or is the CSV output wrong, and this should just be 11 instead of 723.11?
While creating FK constraints:
While creating indices:
Furthermore, the label_id column in the release_label table is null for all records: