The `dotnet_parser` branch has a dumb, unoptimized .NET Core-based parser, and the results are promising, to say the least: on labels, it's roughly 3.4x faster than the Python version (22s vs 75s wall-clock).
$ time python3 run.py --export=label /Users/af59986/Dev/tmp/discogs /Users/af59986/Dev/tmp/discogs/csv-dir
Processing labels: 1571873labels [01:14, 21041.64labels/s]
python3 run.py --export=label /Users/af59986/Dev/tmp/discogs 74.51s user 0.44s system 99% cpu 1:15.05 total
$ time dotnet run bin/Release/netcoreapp3.1/discogs.dll -- ~/Dev/tmp/discogs/discogs_20200806_labels.xml.gz
0 - bin/Release/netcoreapp3.1/discogs.dll; 1 - /Users/af59986/Dev/tmp/discogs/discogs_20200806_labels.xml.gz
Variant2: /Users/af59986/Dev/tmp/discogs/discogs_20200806_labels.xml.gz
Found 1,571,873 label. Wrote them to /Users/af59986/Dev/tmp/discogs/label.csv; /Users/af59986/Dev/tmp/discogs/label_url.csv; /Users/af59986/Dev/tmp/discogs/label_image.csv.
dotnet run bin/Release/netcoreapp3.1/discogs.dll -- 27.20s user 2.34s system 131% cpu 22.435 total
On releases.xml.gz (the .NET run does not export track artists yet): 42:38 for .NET vs 1:45:16 for Python.
$ time python3 run.py --export=release /Users/af59986/Dev/tmp/discogs /Users/af59986/Dev/tmp/discogs/csv-dir
Processing releases: 12867980releases [1:45:15, 2037.40releases/s]
python3 run.py --export=release /Users/af59986/Dev/tmp/discogs 5753.31s user 48.92s system 91% cpu 1:45:16.46 total
$ time dotnet run bin/Release/netcoreapp3.1/discogs.dll -- ~/Dev/tmp/discogs/discogs_20200806_releases.xml.gz
0 - bin/Release/netcoreapp3.1/discogs.dll; 1 - /Users/af59986/Dev/tmp/discogs/discogs_20200806_releases.xml.gz
Variant2: /Users/af59986/Dev/tmp/discogs/discogs_20200806_releases.xml.gz
Parsing done. Writing streams.
Found 12,867,980 releases. Wrote them to.....
dotnet run bin/Release/netcoreapp3.1/discogs.dll -- 3182.56s user 240.38s system 133% cpu 42:38.93 total
Performance Numbers
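Note: test conditions (same OS, same files, etc.) are consistent only within a single file, so compare the Python and C# timings for the same file rather than across files.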
| File | Record Count | Python (wall-clock) | C# (wall-clock) |
| --- | ---: | ---: | ---: |
| discogs_20200806_artists.xml.gz | 7,046,615 | 6:22 | 2:35 |
| discogs_20200806_labels.xml.gz | 1,571,873 | 1:15 | 0:22 |
| discogs_20200806_masters.xml.gz | 1,734,371 | 3:56 | 1:57 |
| discogs_20200806_releases.xml.gz | 12,867,980 | 1:45:16 | 42:38 |
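To put the timings above on a common scale, here is a small sketch that converts the clock strings to seconds and computes the Python-to-C# speedup per file. The helper names (`to_seconds`, `speedup`) are made up for illustration; the numbers are copied from the table.

```python
def to_seconds(clock: str) -> int:
    """Convert an 'M:SS' or 'H:MM:SS' clock string to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def speedup(python_time: str, dotnet_time: str) -> float:
    """How many times faster the .NET run was than the Python run."""
    return to_seconds(python_time) / to_seconds(dotnet_time)


# (file, Python wall-clock, C# wall-clock), copied from the table above
TIMINGS = [
    ("artists", "6:22", "2:35"),
    ("labels", "1:15", "0:22"),
    ("masters", "3:56", "1:57"),
    ("releases", "1:45:16", "42:38"),
]

for name, python_time, dotnet_time in TIMINGS:
    print(f"{name:9s} {speedup(python_time, dotnet_time):.2f}x")
```

By this measure the speedup ranges from about 2x (masters) to about 3.4x (labels), with artists and releases both near 2.5x. Keep in mind that the releases run skips track artists, so that row is not a like-for-like comparison.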
TODO:
labels parser (smallest file) that creates equivalent files to the python parser
releases parser (largest file)
compare times with python parser
compare release csv files with python csv files
artists
masters
compressed csv files
progress bar indicating conversion status
"accurate" API counts - might provide in a patch, if requested.
tests
GH build actions
binary production for major platforms
changes to database
command line arguments
verbose flag - might provide in a patch, if we can figure out what information counts as verbose
dry-run flag
provide platform builds; can get them from the built artifacts and attach them to the release
update README with running instructions
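For the "compare release csv files with python csv files" item, one possible approach is an order-insensitive row comparison. This is only a sketch, not something that exists in either parser: `load_rows` and `csv_diff` are hypothetical names, and it assumes both parsers write UTF-8 CSVs with the same column order.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a CSV file into a multiset of rows (the header row is included)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return Counter(tuple(row) for row in csv.reader(handle))


def csv_diff(python_csv, dotnet_csv):
    """Return (rows only in the first file, rows only in the second file).

    Row order is ignored; duplicate rows are counted, so a row that appears
    twice in one file and once in the other still shows up as a difference.
    """
    left = load_rows(python_csv)
    right = load_rows(dotnet_csv)
    return left - right, right - left
```

Usage would look like `only_python, only_dotnet = csv_diff(python_path, dotnet_path)`; two empty counters mean the files hold the same rows. This loads both files into memory, which should be fine for labels but is probably too heavy for the releases CSVs, where sorting both files and streaming through them would be a better fit.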