Tutorial: Upload a dataset to VALERIA
The University provides access to cutting-edge data storage and transfer solutions through the VALERIA service. Most notably, professors have access to 6 TB of Amazon S3-compatible storage through VALERIA. This storage is ideal for managing datasets. In this tutorial, we cover how to get access to this storage, upload data, and share datasets with the community.
This guide was written by Dominic Baril, 2022.
Updated by Maxime Vaidis, September 2022.
- You will first need to create a VALERIA account and ask your supervisor to give you access to the platform.
- Once your account is created, you need to ask for permission to access the storage. For example, François has a `/norlab` repository.
- You then need to configure your access to the storage. If you are on Ubuntu, the easiest client to configure is rclone.
- Install rclone using `sudo apt install rclone` (to ensure you get a stable version).
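A minimal install and version check on Ubuntu might look like the following; this assumes rclone is available from your distribution's package repositories.

```sh
# Install rclone from the Ubuntu package repositories and confirm it works:
sudo apt update
sudo apt install rclone
rclone version
```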
- Follow the steps shown on this page, starting at step 2.1, to configure it on your local computer. You will need to enter your access keys, which can be found on your VALERIA dashboard. Once rclone is configured with access to VALERIA's S3 storage, you should see the following output when running `rclone config`:
```
Current remotes:

Name                 Type
====                 ====
VALERIAS3            s3
```
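For reference, the remote that `rclone config` writes to your configuration file (typically `~/.config/rclone/rclone.conf`) should look roughly like the sketch below. This is an assumption based on rclone's standard S3 backend: the exact fields depend on your rclone version and the answers you gave, and the endpoint is taken from the `host_base` value used later in this guide.

```
# ~/.config/rclone/rclone.conf -- sketch only; your generated file may differ
[VALERIAS3]
type = s3
provider = Other
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
endpoint = s3.valeria.science
```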
You can then use `rclone` commands to interact with the S3 storage. Refer to the rclone documentation for details on all available commands. Here are a few simple, useful ones:
- `rclone ls VALERIAS3:path`: List all directories/files in the "path" folder of your VALERIAS3 remote storage. Here, "path" is the name of the repository on the VALERIAS3 storage (example: `VALERIAS3:/norlab`). Note that if a folder is shared with you, auto-completion might not work; in this case, just manually type the path you want to access.
- `rclone copy sourcepath VALERIAS3:destpath --progress`: Copy the local directory/file "sourcepath" into "destpath" on the S3 storage.
- `rclone move VALERIAS3:sourcepath VALERIAS3:destpath`: Move the contents of one directory to another on the S3 storage.
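As an illustration, a typical upload session could look like the sketch below. The local folder `~/datasets/my_dataset` and the destination `VALERIAS3:/norlab/my_dataset` are hypothetical; substitute your own paths.

```sh
# Check what is already in the norlab repository:
rclone ls VALERIAS3:/norlab

# Upload a local dataset folder (hypothetical path), showing progress:
rclone copy ~/datasets/my_dataset VALERIAS3:/norlab/my_dataset --progress

# Verify the upload by listing the destination:
rclone ls VALERIAS3:/norlab/my_dataset
```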
- Once your files are uploaded to the S3 storage, you might want to create a link for quick remote access in order to share your dataset with the community. To do so, you will need to use the `s3cmd` client instead. You can access it through a terminal via the JupyterHub server on VALERIA.
- Once on JupyterHub, you will first need to configure the S3 command-line tool. Open a terminal and create a `.s3cfg` file:

```sh
touch .s3cfg
```
- Then, using your favorite command-line text editor, paste the following lines into the `.s3cfg` file and adjust the values of the `access_key` and `secret_key` parameters:
```
[default]
# YOUR IDUL
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
access_token =
add_encoding_exts =
add_headers =
bucket_location = US
ca_certs_file =
cache_file =
check_ssl_certificate = True
check_ssl_hostname = True
cloudfront_host = cloudfront.amazonaws.com
content_disposition =
content_type =
default_mime_type = binary/octet-stream
delay_updates = False
delete_after = False
delete_after_fetch = False
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
expiry_date =
expiry_days =
expiry_prefix =
follow_symlinks = False
force = False
get_continue = False
gpg_command = /usr/bin/gpg
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase =
guess_mime_type = True
host_base = s3.valeria.science
host_bucket = %(bucket)s.s3.valeria.science
human_readable_sizes = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
invalidate_on_cf = False
kms_key =
limit = -1
limitrate = 0
list_md5 = False
log_target_prefix =
long_listing = False
max_delete = -1
mime_type =
multipart_chunk_size_mb = 15
multipart_max_chunks = 10000
preserve_attrs = True
progress_meter = True
proxy_host =
proxy_port = 0
```
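Before generating a link, you can check that `s3cmd` is configured correctly by listing your storage; the `norlab` bucket name below comes from the earlier example.

```sh
# List the buckets you have access to:
s3cmd ls

# List the contents of the norlab bucket from the earlier example:
s3cmd ls s3://norlab
```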
- Finally, generate the URL to the dataset:

```sh
s3cmd signurl s3://norlab/fr2021_dataset/winter_dataset.zip/winter.zip $(echo "`date +%s` + 3600 * 24 * 7 * 1000" | bc)
```
This command returns a URL to download the dataset, which is valid for 1000 weeks (the expiry is the current Unix time, `date +%s`, plus 3600 × 24 × 7 × 1000 seconds). It is up to you to define the expiry date for your URL, but beware that an expired URL will prevent other people in the scientific community from accessing your dataset.
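If you prefer a shorter validity period, the same pattern can be adapted. The sketch below signs a hypothetical object for 4 weeks; replace the path with your own.

```sh
# Sign a (hypothetical) object for 4 weeks instead of 1000:
s3cmd signurl s3://norlab/my_dataset/my_dataset.zip $(echo "`date +%s` + 3600 * 24 * 7 * 4" | bc)
```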
- You can then create a page to share the link to your dataset. We suggest you use this Wiki to share your dataset. Here is the example for the Kilometer-scale autonomous navigation in subarctic forests: challenges and lessons learned dataset.