Skip to content

Latest commit

 

History

History
96 lines (74 loc) · 4.51 KB

README.md

File metadata and controls

96 lines (74 loc) · 4.51 KB

ABS labour force report automation

This repository contains a minimal viable example of an R data visualisation and report generation workflow using ABS labour force open data.

The contents of this repository have been created to support the Automating R Markdown report generation - Part 2 tutorial in my r_tips repository.

Rmd tips

  • As referenced in this GitHub issue, path handling by rmarkdown::render() is currently not ideal as the output_dir argument creates an absolute path for rendered figures. This can be resolved by using xfun::in_dir("code", ...) to render inside .\code and then moving the outputs into .\output.

CI/CD automation tips

  • Use renv to manage package version and commit your renv.lock file with your repository. The renv package will automatically create a second .gitignore file in ~/renv, which prevents the private project library ~/renv/library from being committed.

  • Load the minimum set of packages required i.e. load dplyr instead of tidyverse if you are just performing simple data transformations and avoid using pacman::p_load().

  • The package renv uses static analysis to determine which packages are used i.e. by scanning your code for calls to library(pkg), require(pkg) or pkg::. Due to this functionality, avoid mapping package loading with lapply(packages, library, character.only = TRUE) as described here.

    # Recommended due to renv static analysis approach 
    library("here")  
    library("readr")  
    
    # Also recommmended for extra code reproducibility
    here::here(...)
    readr::read_csv(...)
    
    # Not recommended 
    packages <- c("here", "readr")
    invisible(lapply(packages, library, character.only = TRUE))
    
  • The pandoc package is not bundled with the rmarkdown package (pandoc is provided by RStudio) so the correct version of pandoc needs to be manually specified in the YAML pipeline.

    steps:
      # Checks out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2
    
      # Sets up pandoc which is required for knitting HTML reports  
      - uses: r-lib/actions/setup-pandoc@v2
        with:
          pandoc-version: '2.17.1' 
    
  • A virtual R environment needs to first be set up.

    steps:
      - name: Setup R version 4.1.2
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: '4.1.2' 
    
  • The template CI/CD code for using renv to install R package dependencies is found here, based on a GitHub actions renv cache issue recorded here.

    env:
        RENV_PATHS_ROOT: ~/.local/share/renv
    
    steps:
      # Set up R packages cache for workflow reruns 
      - name: Cache R packages
        uses: actions/cache@v1
        with:
           path: ${{ env.RENV_PATHS_ROOT }}
           key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
           restore-keys: |-
              ${{ runner.os }}-renv-
    
      # Install cURL to transfer data to virtual environment
      - run: sudo apt-get install -y --no-install-recommends libcurl4-openssl-dev
    
      # Install renv and project specific R packages 
      - name: Restore R packages
        shell: Rscript {0}
        run: |
          if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
          renv::restore()
    
  • Write scripts that are self-contained. This means using one script to separately load all R libraries should be avoided, to minimise errors in case one job cannot access the outputs of another job.

  • I personally prefer running scripts as separate steps, for better job progress monitoring.

      # Execute R scripts
      - name: Extract data from ABS labour force data API
        run: Rscript code/01_extract_data.R
    
      - name: Clean raw labour force data
        run: Rscript code/02_clean_data.R