🎉 First off, thank you for considering contributing to our project! 🎉
This is a community-driven project, so it's people like you who make it useful and successful.
If you get stuck at any point, you can create an issue on GitHub (look for the Issues tab in the repository) or contact us through one of the other channels mentioned below.
For general information about contributing to open-source and the Fatiando a Terra projects, please refer to our standard Contributing Guide.
The sections below contain guidelines specific to this repository.
The goal is to maintain a diverse community that's pleasant for everyone. Please be considerate and respectful of others. Everyone must abide by our Code of Conduct and we encourage all to read it carefully.
The following are the requirements that datasets need to meet in order to be considered for this project.
Definitions:
- Source dataset: the original data as distributed by the data owners/creators.
- Output dataset: the modified/repackaged version that we distribute.
- FAIR data: data that follows the FAIR principles (Findable, Accessible, Interoperable, Reusable).
Source datasets must:
- Be FAIR data: either in the public domain or distributed under an open license that places no restrictions on reuse beyond attribution or share-alike (same license) terms. For example, CC-BY and CC-BY-SA are acceptable but CC-BY-NC is not.
- Represent a common real-world application.
- Contain interesting features that lead to teachable moments for tutorials. For example, anomalies that are easily associated with geology, large gaps in bathymetry coverage that lead to interesting interpolation problems, etc.
Output datasets should:
- Contain standard and descriptive variable names. For example, "longitude" instead of "LON", "gravity_disturbance_mgal" instead of "FAA", "easting_m" instead of "x".
- Include associated metadata (datum, license, source, etc.) if supported by the format. For example, netCDF metadata following CF conventions through the `.attrs` attributes in xarray (see the sketch after this list).
- Specify units through appropriate metadata (CF conventions in netCDF or column names in CSV, like `gravity_disturbance_mgal`). Exceptions are longitude and latitude coordinates, which are always in decimal degrees.
- Strive to be under 10 MB in size, if possible. This keeps downloads fast, particularly when building documentation and testing on CI. Use compression when appropriate and only if it doesn't add difficult-to-install dependencies. Larger files may be considered but should not be used in code that runs on CI to avoid long build times and overloading the data servers.
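As an illustration of these conventions, here is a minimal sketch using xarray. The data, variable names, and attribute values are made up for the example and should be adapted to the actual dataset:

```python
import numpy as np
import xarray as xr

# Made-up global gravity grid; values and names are purely illustrative.
grid = xr.DataArray(
    np.zeros((181, 361)),
    coords={
        "latitude": np.linspace(-90, 90, 181),
        "longitude": np.linspace(-180, 180, 361),
    },
    dims=("latitude", "longitude"),
    name="gravity_disturbance",
)
# Metadata through .attrs, following CF conventions where possible.
grid.attrs = {
    "long_name": "gravity disturbance",
    "units": "mGal",
    "license": "CC-BY-4.0",
    "source": "Name and DOI of the source dataset",
}
grid.longitude.attrs = {"long_name": "longitude", "units": "degrees_east"}
grid.latitude.attrs = {"long_name": "latitude", "units": "degrees_north"}

# zlib compression ships with the netCDF libraries, so it adds no new dependencies.
grid.to_netcdf(
    "location_gravity.nc",
    encoding={"gravity_disturbance": {"zlib": True, "complevel": 9}},
)
```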
- Propose a new dataset: First, open an Issue in the repository with information about the proposed dataset for discussion.
Follow these guidelines to prepare the dataset:
- See our standard Contributing Guide for instructions on creating pull requests and setting up your environment.
- Create a folder following the naming convention `location_datatype` (all lower case and separated by `_`).
- Inside that folder, create a Jupyter notebook called `prepare.ipynb` with the code for downloading (using Pooch), formatting (cleaning, slicing, datum conversion, etc.), and exporting the new dataset. Follow the conventions in the other notebooks; a minimal sketch follows this list.
- If any new dependencies are required to prepare the dataset, add them to the `environment.yml` file.
- The output dataset should follow the same naming convention as the folder: `location_datatype.extension`.
- The notebook should create a `preview.jpg` image with a plot of the output dataset for easy inspection.
- If the original data can't be automatically downloaded in the notebook and it is under 50 MB, you may include it in the repository. Feel free to use compression to reduce the size of the file(s).
- Include the information about the new dataset in the `README.md` file.
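To make these steps concrete, here is a minimal sketch of what the body of a `prepare.ipynb` could look like. The URL, hash handling, and variable names are hypothetical placeholders, not a real dataset:

```python
import matplotlib.pyplot as plt
import pooch
import xarray as xr

# Download the source dataset (the URL below is a hypothetical placeholder).
# Passing known_hash=None makes Pooch print the hash so it can be pinned afterwards.
fname = pooch.retrieve(
    url="https://example.com/source/original_data.nc",
    known_hash=None,
)
data = xr.open_dataset(fname)

# Formatting: rename cryptic variables to standard, descriptive names,
# slice to the region of interest, convert datums, etc.
data = data.rename(
    {"LON": "longitude", "LAT": "latitude", "FAA": "gravity_disturbance"}
)

# Export following the same naming convention as the folder.
data.to_netcdf(
    "location_datatype.nc",
    encoding={"gravity_disturbance": {"zlib": True, "complevel": 9}},
)

# Create the preview image for easy inspection.
data.gravity_disturbance.plot()
plt.savefig("preview.jpg", dpi=120)
plt.close()
```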