datalad dropbox
-
Installing dropbox for python - https://github.com/dropbox/dropbox-sdk-python (note that the current version of the library uses Dropbox API v2, but it's very easy to find examples for v1 that will not work!)
-
Creating your dropbox application: https://www.dropbox.com/developers/apps
-
Once you create the application, you can check the App key and App secret, and generate an access token. If you have the App key and App secret you can create a token using the API, but I had some problems and prefer to generate it directly from the application site in my account.
-
python script to generate the list:
from dropbox import Dropbox
from dropbox.files import FileMetadata

access_token = input("Enter the access token as string here: ")
dbx = Dropbox(access_token)
# `recursive=True` also lists the content of all subdirectories
res = dbx.files_list_folder(path='/ds000114-nipype_tutorial_output', recursive=True)
entries = list(res.entries)
# the listing is returned in batches: as long as `res.has_more` is `True`,
# pass `res.cursor` to `files_list_folder_continue` to get the next batch
while res.has_more:
    res = dbx.files_list_folder_continue(res.cursor)
    entries.extend(res.entries)
# `entries` now has all paths (to your files and to directories), so you can write the correct paths to a file
with open('url_list.txt', 'w') as file:
    for en in entries:
        if isinstance(en, FileMetadata):  # to eliminate paths to directories
            # en.path_lower[1:] eliminates the first `/` from the path given by dropbox
            # dbx.sharing_create_shared_link creates (or just returns if it already exists) a shared link to a file,
            # but you will get links that start with "https://www.dropbox.com" and for some reason git annex is not
            # able to get files using such a link (although it doesn't give an error when you add the file to the repo),
            # so your link has to start with "https://dl.dropbox.com"
            link = dbx.sharing_create_shared_link(en.path_lower).url.replace("www.dropbox.com", "dl.dropbox.com")
            file.write(en.path_lower[1:] + ", " + link + "\n")
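The link rewrite in the script above is a plain string replacement. A slightly more defensive variant, sketched below, swaps only the host name and leaves the rest of the URL untouched, so a file name that happens to contain "www" is not altered. The helper name `to_direct_link` and the sample URL are made up for illustration; only the two host names come from the notes above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_direct_link(shared_url):
    """Return shared_url with the host www.dropbox.com swapped for dl.dropbox.com.

    Hypothetical helper: URLs on any other host are returned unchanged.
    """
    parts = urlsplit(shared_url)
    if parts.netloc == "www.dropbox.com":
        # SplitResult is a namedtuple, so _replace returns a modified copy
        parts = parts._replace(netloc="dl.dropbox.com")
    return urlunsplit(parts)


# made-up example link, only to show the effect on the host name
print(to_direct_link("https://www.dropbox.com/s/abc123/www_notes.txt?dl=0"))
```

In the script, the result of `dbx.sharing_create_shared_link(...).url` could be passed through this helper before it is written to the list.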
-
creating repo:
datalad create data_repo
cd data_repo
-
adding all urls from the list, e.g. using python script:
import subprocess
with open('/path/to/the/list/url_list.txt') as file:
    for ll in file:
        # each line has the form "path, url"; strip() also removes the trailing newline from the url
        path, url = [el.strip() for el in ll.split(',', 1)]
        # `--fast` is really faster and doesn't download the data
        command = ["git", "annex", "addurl", "--fast", "--file", path, url]
        subprocess.call(command)
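Before running the loop above on a long list, it can help to check that every line parses and that every link already points at "https://dl.dropbox.com". The following is a minimal sketch of such a dry run; the function names `parse_url_list` and `addurl_command` and the sample line are assumptions for illustration, and nothing here calls git annex.

```python
def parse_url_list(lines):
    """Yield (path, url) pairs from lines of the form 'path, url'."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split on the last ", " because a file path may itself contain commas
        path, sep, url = line.rpartition(", ")
        if not sep or not url.startswith("https://dl.dropbox.com"):
            raise ValueError(
                f"line {number}: expected 'path, https://dl.dropbox.com/...', got {line!r}"
            )
        yield path, url


def addurl_command(path, url):
    """Build the same git annex call as the loop above, without running it."""
    return ["git", "annex", "addurl", "--fast", "--file", path, url]


# made-up sample line in the format written by the listing script
sample = ["sub-01/anat/t1w.nii.gz, https://dl.dropbox.com/s/abc123/t1w.nii.gz?dl=0\n"]
for path, url in parse_url_list(sample):
    print(" ".join(addurl_command(path, url)))
```

Once the dry run passes, each command can be handed to `subprocess.call` as in the loop above.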
-
creating a github repo
datalad create-sibling-github --github-login YOUR_LOGIN data_repo_github
-
saving and publishing the files on github
datalad add --to-git --nosave *
datalad save -S -m "adding files"
datalad publish --to github *
- you should see all files (i.e. links to them) in your github repo
-
installing repo in a new location
datalad install -s https://github.com/djarecka/data_repo_github data_new_location
cd data_new_location
- You can check the content using
ls
but the data is not there yet.
-
you can check where your file is
git annex whereis path/to/file
-
you can also download a specific file or all the data
datalad get *
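To see which files still lack their content before or after a `datalad get`, one option is to look for dangling symlinks. This sketch assumes the repository uses git-annex's default locked mode, where annexed files are symlinks into `.git/annex/objects` and a file whose content has not been fetched is a link to a missing target; unlocked files or adjusted branches behave differently. The function name `missing_content` is made up for illustration.

```python
import os


def missing_content(root):
    """Return relative paths of dangling symlinks under root, skipping .git."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # prune .git so the annex object store itself is not scanned
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            # dangling symlink: the link exists, but its target does not
            if os.path.islink(full) and not os.path.exists(full):
                missing.append(os.path.relpath(full, root))
    return sorted(missing)
```

Calling `missing_content(".")` from the top of the installed repo should list every file right after installation, and fewer files as data is fetched.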
TEMP:
datalad create new
cd new
mkdir inputs
datalad install -d . -s my_repo to ...new/inputs