musescore-dataset

🚨 The dataset has been left unmaintained since Sep 30, 2021.
Help is appreciated, if you are willing to risk becoming a victim of personal harassment.

The unofficial dataset of all music sheets and users on musescore.com, dedicated to big data analytics / data science / machine learning.

All data is collected by iterating through musescore.com's public API.

The jsonl files are in the Newline-delimited JSON (JSON Lines) format.
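Because every line of a JSON Lines file is one complete JSON document, the files can be inspected with ordinary line-oriented tools without loading them into memory. The following is a minimal sketch; the helper names `count_records` and `nth_record` are illustrative and not part of this project.

```shell
#!/bin/bash
# Count the records in a JSON Lines file (one JSON document per non-empty line).
count_records() {
    grep -c . "$1"
}

# Print the N-th record (1-based) of a JSON Lines file.
nth_record() {
    sed -n "${2}p" "$1"
}

# Usage, once score.jsonl has been downloaded:
#   count_records score.jsonl
#   nth_record score.jsonl 1
```

Each printed record can then be passed to any JSON parser of your choice.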

Only need the sheet files to learn music? Try musescore-downloader.

View/Query in Google BigQuery

User Data

Updated manually.
Last Updated: Nov 9, 2020

https://musescore-dataset.xmader.com/user.jsonl

Music Sheet Metadata

Last Updated: Sep 30, 2021

https://musescore-dataset.xmader.com/score.jsonl

All mscz files

Last Updated: Sep 30, 2021

https://musescore-dataset.xmader.com/mscz-files.csv

# The CSV file itself is hosted on IPFS
# ipns://QmSdXtvzC8v8iTTZuj5cVmiugnzbR1QATYRcGix4bBsioP
# Look up the current 46-character CID, then fetch the CSV through a gateway
cid=$(curl https://musescore-dataset.xmader.com/csv-ipfs-ref | grep -o '\w\{46\}')
wget -O mscz-files.csv "https://ipfs.io/ipfs/${cid}/mscz-files.csv"

This CSV file maps each score ID (id) to the IPFS reference (ref) of the corresponding mscz file.
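As a small illustration of the two-column layout, the sketch below looks up the IPFS reference for a single score ID. It assumes the first row is a header (the download scripts in this README skip it with `sed '1d'`) and that the columns appear in the order id, ref; the function name `lookup_ref` is hypothetical.

```shell
#!/bin/bash
# Print the IPFS reference for one score id, or nothing if the id is not listed.
#   $1 = score id, $2 = path to the CSV file
lookup_ref() {
    # NR > 1 skips the header row; exit stops at the first match
    awk -F, -v id="$1" 'NR > 1 && $1 == id { print $2; exit }' "$2"
}

# Usage (the id is whichever score you are interested in):
#   lookup_ref "$id" mscz-files.csv
```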

All files are available on IPFS.
NO ONE CAN TAKE IT DOWN NOW!

Bulk Download

We (the LibreScore team) don't condone mass downloads using regular methods.
USE AT YOUR OWN RISK

See https://discord.com/channels/774491656643674122/777457743983411221/1032054445422420039

(You must join the LibreScore Community Discord first to see the message.)

Download mscz files via IPFS HTTP Gateways

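#!/bin/bash
# Download every mscz file listed in mscz-files.csv, skipping files that already exist
while IFS=, read -r id ref
do
    if [ -f "$id.mscz" ]; then
        echo "$id.mscz exists."
    else
        echo "$id.mscz does not exist."
        # wget -O creates the output file even on failure, so remove it to allow a retry on the next run
        wget -nv --read-timeout=3 "https://ipfs.io$ref" -O "$id.mscz" || rm -f "$id.mscz"
    fi
done < <(sed '1d' mscz-files.csv)  # sed '1d' drops the CSV header row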

Using curl

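#!/bin/bash
# Same loop as above, using curl instead of wget
while IFS=, read -r id ref
do
    if [ -f "$id.mscz" ]; then
        echo "$id.mscz exists."
    else
        echo "$id.mscz does not exist."
        # -m 3 aborts any transfer that takes longer than 3 seconds;
        # remove a partial file so the next run retries it
        curl -\# -f "https://ipfs.io$ref" -o "$id.mscz" -m 3 || rm -f "$id.mscz"
    fi
done < <(sed '1d' mscz-files.csv)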
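
Both gateway scripts hard-code https://ipfs.io. Since the files are addressed by content, any HTTP gateway that serves the same path should return the same data. The sketch below builds the download URL from a configurable base; the `GATEWAY` variable and the `gateway_url` function are hypothetical names, and it assumes each ref starts with `/ipfs/`, as the scripts above imply by concatenating it directly onto the host.

```shell
#!/bin/bash
# Build the download URL for a ref, using $GATEWAY if set and https://ipfs.io otherwise.
gateway_url() {
    local base="${GATEWAY:-https://ipfs.io}"
    # Strip one trailing slash from the base so the result has no double slash
    printf '%s%s\n' "${base%/}" "$1"
}

# Usage inside either download loop:
#   wget -nv --read-timeout=3 "$(gateway_url "$ref")" -O "$id.mscz"
```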

Or using local IPFS daemon

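#!/bin/bash

# Install IPFS https://docs.ipfs.io/how-to/command-line-quick-start/#install-ipfs

# Start the daemon in the background, initializing the repository if needed
ipfs daemon --init &

# Wait until the daemon is online before requesting files
until ipfs swarm peers > /dev/null 2>&1; do sleep 1; done

while IFS=, read -r id ref
do
    ipfs get "$ref" -o "$id.mscz"
done < <(sed '1d' mscz-files.csv)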
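
After running any of the download loops above, it can be useful to see which scores are still missing before starting another pass. The sketch below lists every id from the CSV that has no non-empty `<id>.mscz` file in a given directory; the function name `missing_ids` is hypothetical, and the CSV is assumed to have the same header row and id,ref column order used throughout this README.

```shell
#!/bin/bash
# List ids from the CSV whose <id>.mscz file is absent or empty.
#   $1 = path to the CSV file, $2 = download directory (default: current directory)
missing_ids() {
    local csv="$1" dir="${2:-.}"
    sed '1d' "$csv" | while IFS=, read -r id ref
    do
        # -s is true only for files that exist and are larger than zero bytes
        [ -s "$dir/$id.mscz" ] || echo "$id"
    done
}

# Usage:
#   missing_ids mscz-files.csv . | wc -l
```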

Help hosting files

You could help musescore-dataset become more accessible by:

  • Hosting (ipfs pin) the mscz files on your own IPFS nodes

    #!/bin/bash
    while IFS=, read -r id ref
    do
        ipfs pin add -r --progress "$ref"
    done < <(sed '1d' mscz-files.csv)

    or,

  • Periodically requesting the files from a public IPFS gateway so that it fetches and caches them

    #!/bin/bash
    # run in a cron job
    while IFS=, read -r id ref
    do
        echo "fetching $id.mscz"
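        # -m 0.5 aborts after half a second; the goal is only to make the gateway fetch the file,
        # so the response is discarded instead of being saved and deleted
        curl -\# -f "https://ipfs.io$ref" -o /dev/null -m 0.5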
    done < <(sed '1d' mscz-files.csv | shuf)

Contact me if you have any questions.

The purpose of this project is to make the data of musescore.com accessible to anyone in need, and to bring a clean, high-quality music dataset to the world of computer science. It is not meant for individuals who only want to hoard the data pointlessly.