datalad dropbox

Dorota Jarecka edited this page Nov 14, 2017 · 2 revisions

Creating datalad repository from data stored in dropbox

Getting a list of shared links from dropbox using python api

  • Installing dropbox for python - https://github.com/dropbox/dropbox-sdk-python (note that the current version of the library uses Dropbox API v2; many examples you will find online are written for v1 and will not work!)

  • Creating your dropbox application: https://www.dropbox.com/developers/apps

  • Once you create the application, you can check the App key and App secret, and generate an access token. If you have the App key and App secret you can also create a token using the API, but I had some problems with that and prefer to generate it directly from the application site in my account.

  • Python script to generate the list:

from dropbox import Dropbox
from dropbox.files import FileMetadata

access_token = input("Enter the access token as string here: ")
dbx = Dropbox(access_token)

# `recursive=True` also lists the contents of subdirectories, but the results come in pages
res = dbx.files_list_folder(path='/ds000114-nipype_tutorial_output', recursive=True)
entries = list(res.entries)
# while `res.has_more` is `True`, pass `res.cursor` to get the next page
while res.has_more:
    res = dbx.files_list_folder_continue(res.cursor)
    entries.extend(res.entries)
# `entries` now has all paths (to your files and to directories), so you can write the correct paths to a file

with open('url_all.txt', 'w') as file:
    for en in entries:
        if isinstance(en, FileMetadata):  # to eliminate paths to directories
            # en.path_lower[1:] eliminates the first `/` from the path given from dropbox
            # dbx.sharing_create_shared_link creates (or just returns if already exists) a shared link to a file
            # but you will get links that start with "https://www.dropbox.com" and for some reason git annex is not
            # able to get files using such a link (but it doesn't give an error when you add the file to repo),
            # so your link has to start with "https://dl.dropbox.com"
            url = dbx.sharing_create_shared_link(en.path_lower).url
            file.write(en.path_lower[1:] + ", " + url.replace("https://www.dropbox.com", "https://dl.dropbox.com", 1) + "\n")
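
The link rewriting done in the last line of the script can be isolated in a small helper. The sketch below is only an illustration and is not part of the Dropbox SDK: the name `to_direct_link` and the example URL are made up, and it assumes that shared links start with `https://www.dropbox.com`. It rewrites only that prefix, so a `www` that appears elsewhere in the URL (for example in a file name) is left untouched.

```python
SHARED_PREFIX = "https://www.dropbox.com"
DIRECT_PREFIX = "https://dl.dropbox.com"


def to_direct_link(url):
    """Swap the www.dropbox.com prefix for dl.dropbox.com.

    Hypothetical helper: a URL that does not start with the expected
    prefix is returned unchanged.
    """
    if url.startswith(SHARED_PREFIX):
        return DIRECT_PREFIX + url[len(SHARED_PREFIX):]
    return url


# Made-up link, used only to show the rewrite
print(to_direct_link("https://www.dropbox.com/s/abc123/www_notes.txt?dl=0"))
```

The helper could be called on the `.url` of each shared link before the line is written to the list file.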

Creating datalad repo and publishing on github:

  • creating repo:

    • datalad create data_repo
    • cd data_repo
  • adding all urls from the list, e.g. using python script:

import subprocess

with open('/path/to/the/list/url_all.txt') as file:
    for ll in file:
        # each line is "path, url"; split on the last comma and strip the trailing newline from the url
        path, url = ll.rsplit(',', 1)
        # `--fast` is really faster and doesn't download the data
        command = ["git", "annex", "addurl", "--fast", "--file", path, url.strip()]
        subprocess.call(command)
  • creating a github repo

    • datalad create-sibling-github --github-login YOUR_LOGIN data_repo_github
  • saving and publishing the files on github

    • datalad add --to-git --nosave *
    • datalad save -S -m "adding files"
    • datalad publish --to github *
    • you should see all files (i.e. links to them) in your github repo
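
For the step that adds all urls from the list, it can help to check how each line is parsed before calling `git annex`. The following is a minimal sketch under assumptions: the helper names `parse_line` and `addurl_command` and the example line are hypothetical, and the URL is assumed to contain no comma. It splits on the last comma so that a comma inside a file path survives, and it only builds the command without running it.

```python
def parse_line(line):
    """Split one 'path, url' line of the list into (path, url).

    Splits on the last comma, so a comma inside the file path is kept.
    Assumes the URL itself contains no comma.
    """
    path, url = line.rsplit(",", 1)
    return path, url.strip()


def addurl_command(path, url):
    """Build the `git annex addurl` call for one file (not executed here)."""
    return ["git", "annex", "addurl", "--fast", "--file", path, url]


# Made-up line in the same "path, url" format as the list file
example = "sub-01/anat/t1,w.nii.gz, https://dl.dropbox.com/s/abc123/t1w.nii.gz?dl=0\n"
print(addurl_command(*parse_line(example)))
```

Printing the commands first gives a dry run; passing each list to `subprocess.call` inside the datalad repo would then add the files.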

How to install the created repo and download data using datalad

  • installing repo in a new location

    • datalad install -s https://github.com/djarecka/data_repo_github data_new_location
    • cd data_new_location
    • You can check the content using ls but the data is not there yet.
  • you can check where your file is stored

    • git annex whereis path/to/file
  • you can also download a specific file or all the data

    • datalad get path/to/file
    • datalad get *
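
To see from Python whether the content of a file has already been downloaded, one option is to look at the symlinks. This is a rough sketch and rests on an assumption: annexed files are symlinks into `.git/annex/objects` (the git-annex default), so a dangling symlink means the content is not there yet. The function name `annex_status` is hypothetical, and the demo uses a throwaway directory that only mimics that layout.

```python
import os
import tempfile


def annex_status(path):
    """Classify a path as 'present', 'not downloaded', 'regular file' or 'missing'.

    Assumption: annexed files are symlinks; a dangling symlink means
    the content has not been fetched yet.
    """
    if os.path.islink(path):
        # os.path.exists follows the symlink, so it is False when the target is absent
        return "present" if os.path.exists(path) else "not downloaded"
    return "regular file" if os.path.exists(path) else "missing"


# Demo on a temporary directory, not a real datalad repo
with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "data.nii.gz")
    os.symlink(os.path.join(tmp, "objects", "KEY"), link)  # target does not exist
    print(annex_status(link))
```

In a repo that uses unlocked files or an adjusted branch the files are not symlinks, so this check would not apply there.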

TEMP:

    • datalad create new
    • cd new
    • mkdir inputs
    • datalad install -d . -s my_repo to ...new/inputs