
brianrice2/ds-project-template


Data Science Project Template

An example repo structure for data science projects.

So what does using this template offer?


Directory structure 🗺️

├── README.md                         <- You are here!
│
├── .github/                          <- GitHub configuration, e.g. Actions workflows
│
├── config/                           <- Configuration files
│   ├── local/                        <- Private configuration files and environment variable
│   │                                      settings (not tracked)
│   └── logging/                      <- Configuration of Python loggers
│
├── data/                             <- Data files used for analysis or by the app itself
│   ├── cleaned/                      <- Processed data
│   └── raw/                          <- Raw data
│
├── deliverables/                     <- Final presentations, white papers, etc. for
│                                          stakeholders
│
├── docs/                             <- Sphinx documentation generated from Python docstrings
│
├── notebooks/
│   ├── archive/                      <- Development notebooks no longer being used
│   ├── deliver/                      <- Notebooks shared with others / in final state
│   └── develop/                      <- Current notebooks being used in development
│
├── scripts/                          <- Standalone Python/bash/other scripts
│
├── src/                              <- Source code for the project
│
├── tests/                            <- Pytest unit tests
│
├── environment.yml                   <- Environment specs for conda
├── mypy.ini                          <- Mypy configuration
├── pylintrc                          <- Pylint configuration
├── Makefile                          <- Defines handy make targets for automation
└── setup.py                          <- Make the repo into a package for easier imports
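The config/logging/ folder holds Python logger configuration. One way an entry point in scripts/ or src/ might load it is sketched below, assuming an INI-style file read by the standard library's logging.config.fileConfig. The file name local.conf and the helper setup_logging are hypothetical; the template does not specify either.

```python
import logging
import logging.config
from pathlib import Path

# Hypothetical default path: the template only says that config/logging/
# holds logger configuration, not what the file is called.
DEFAULT_CONF = Path("config") / "logging" / "local.conf"


def setup_logging(conf_path: Path = DEFAULT_CONF) -> None:
    """Configure Python loggers from an INI-style file if it exists."""
    if conf_path.is_file():
        # fileConfig reads the [loggers], [handlers] and [formatters] sections.
        logging.config.fileConfig(str(conf_path), disable_existing_loggers=False)
    else:
        # Fall back to a simple console setup so scripts still log something.
        logging.basicConfig(level=logging.INFO)
        logging.getLogger(__name__).warning(
            "No logging config at %s; using defaults", conf_path
        )
```

Calling setup_logging() once at the top of an entry point keeps logger setup out of individual modules, which can then simply use logging.getLogger(__name__).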

Setup ⛺

Python virtual environment

First, edit environment.yml to set your desired environment name and Python version. Be sure to use the same Python version in .github/workflows/tests.yml so that CI tests run on it too!
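Because the Python version lives in two files, the two can drift apart. A small check like the one below could catch that. It is a sketch that assumes environment.yml pins Python as a dependency line such as `- python=3.10` and that tests.yml passes a `python-version` input to actions/setup-python; the helper names are hypothetical, and the regexes will not understand every YAML layout.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed formats: a conda pin like "- python=3.10" in environment.yml and an
# actions/setup-python input like "python-version: '3.10'" in the workflow.
ENV_PYTHON = re.compile(r"^\s*-\s*python\s*=+\s*([0-9]+(?:\.[0-9]+)*)", re.MULTILINE)
CI_PYTHON = re.compile(r"python-version:\s*['\"\[]*\s*([0-9]+(?:\.[0-9]+)*)")


def _major_minor(pattern: re.Pattern, text: str) -> Optional[Tuple[str, ...]]:
    """Return the (major, minor) part of the first version matched, if any."""
    match = pattern.search(text)
    return tuple(match.group(1).split(".")[:2]) if match else None


def python_versions_match(env_text: str, workflow_text: str) -> bool:
    """True when both files pin the same Python major.minor version."""
    env_version = _major_minor(ENV_PYTHON, env_text)
    return env_version is not None and env_version == _major_minor(CI_PYTHON, workflow_text)


def check_repo(root: Path = Path(".")) -> bool:
    """Compare the two files at their locations in this template."""
    env_text = (root / "environment.yml").read_text()
    ci_text = (root / ".github" / "workflows" / "tests.yml").read_text()
    return python_versions_match(env_text, ci_text)
```

Only major.minor is compared, so a patch-level pin such as 3.10.4 in environment.yml still matches 3.10 in CI.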

Then, create the new environment:

make setup

Finally, activate your environment (replace myenv with the name you set in environment.yml):

conda activate myenv
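If you are unsure which environment is active, conda records its name in the CONDA_DEFAULT_ENV environment variable. The hypothetical helper below compares that variable with the `name:` line in environment.yml. It assumes the environment was activated by name; activating by path stores the path instead.

```python
import os
import re
from typing import Mapping, Optional

# Assumes environment.yml has a top-level line such as "name: myenv".
NAME_LINE = re.compile(r"^name:\s*(\S+)", re.MULTILINE)


def expected_env_name(env_yml_text: str) -> Optional[str]:
    """Read the environment name declared in environment.yml, if any."""
    match = NAME_LINE.search(env_yml_text)
    return match.group(1).strip("'\"") if match else None


def env_is_active(env_yml_text: str, environ: Mapping[str, str] = os.environ) -> bool:
    """True if conda reports the declared environment as the active one."""
    name = expected_env_name(env_yml_text)
    # `conda activate <name>` sets CONDA_DEFAULT_ENV to the environment's name.
    return name is not None and environ.get("CONDA_DEFAULT_ENV") == name
```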

If you install new dependencies, either add them to environment.yml by hand or run make conda-export to save a full snapshot of the current environment. Happy coding!

Install your package

To make the modules in src/ importable from any directory in the project, the project is set up as a (minimal) package. Install it in editable mode:

pip install -e .
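To confirm the editable install worked, you can check whether the package is importable from outside the repository root. This sketch uses the standard library's importlib.util.find_spec; the top-level name src is an assumption, so substitute whatever name setup.py actually installs.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found on the current import path."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec(module_name) is not None
```

For example, call is_importable("src") from a notebook in notebooks/develop/. Running it from the repository root can give a false positive, since the current directory is on sys.path there and src/ may be found as a namespace package even without an install.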

GitHub Actions

This repo uses GitHub Actions to run workflows automatically. To trigger these actions, you'll need to configure a personal access token, as described in the TOC Generator docs:

  1. Generate a personal access token with the public_repo or repo scope (repo is required for private repositories).
  2. Save it as ACCESS_TOKEN in this repository's "Secrets" settings.
