This script preprocesses Backblaze drive_stats CSV files before database insertion. It merges, cleans, and standardizes the data to match the target schema.

kennel-org/backblaze-preprocess-drive-stats-csv

Backblaze Drive Stats CSV Preprocessing

This repository contains an R script designed to preprocess Backblaze drive_stats CSV files. The script handles several tasks including merging specific files, cleaning invalid data, removing empty files, and generating consolidated monthly CSV files with a standardized schema.
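Two of the tasks listed above, removing empty files and forcing every file onto one schema, can be sketched in base R as follows. This is a minimal illustration, not the repository's actual code: the column list in `target_cols` and the helper names `drop_empty_files`, `read_standardized`, and `combine_files` are assumptions made for the example, and the script's own rules for what counts as invalid data are not reproduced here.

```r
# Hypothetical target schema; the real script's column list is not shown in this README.
target_cols <- c("date", "serial_number", "model", "capacity_bytes", "failure")

# Drop zero-byte files so they never reach read.csv().
drop_empty_files <- function(files) {
  files[file.size(files) > 0]
}

# Read one CSV and force it onto the target schema:
# columns missing from the file are filled with NA, extra columns are dropped.
read_standardized <- function(path, cols = target_cols) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA, nrow(df))
  }
  df[, cols, drop = FALSE]
}

# Combine several daily files into a single data frame with identical columns.
combine_files <- function(files, cols = target_cols) {
  files <- drop_empty_files(files)
  do.call(rbind, lapply(files, read_standardized, cols = cols))
}
```

Because every data frame has the same columns in the same order before `rbind()` is called, files with differing column sets can still be stacked into one consolidated table.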

Data Source

The original CSV files can be downloaded from the Backblaze Hard Drive Test Data page. After downloading, please extract the CSV files into the infile/csv directory.
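As a rough sketch of how the extracted files might be collected for monthly consolidation, the helper below lists the CSV files in a directory and groups them by month. It assumes the daily files are named `YYYY-MM-DD.csv`; that naming pattern and the helper name `group_files_by_month` are assumptions for illustration, not details taken from the script.

```r
# Hypothetical helper: group daily CSV files by month.
# Assumes file names of the form YYYY-MM-DD.csv (an assumption, not confirmed by this README).
group_files_by_month <- function(csv_dir = "infile/csv") {
  files <- list.files(
    csv_dir,
    pattern = "^[0-9]{4}-[0-9]{2}-[0-9]{2}\\.csv$",
    full.names = TRUE
  )
  # The first seven characters of the file name give the "YYYY-MM" key.
  months <- substr(basename(files), 1, 7)
  split(files, months)
}
```

Calling `group_files_by_month()` returns a named list with one entry per month, each holding the paths of that month's daily files; files that do not match the pattern are ignored.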

Prerequisites

The script requires the following R packages (stringr, purrr, dplyr, and lubridate are also installed as dependencies of tidyverse):

  • RMariaDB
  • tictoc
  • stringr
  • purrr
  • dplyr
  • lubridate
  • tidyverse

You can install these packages using the following command in R:

install.packages(c("RMariaDB", "tictoc", "stringr", "purrr", "dplyr", "lubridate", "tidyverse"))
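If some of these packages are already installed, a small check can avoid reinstalling them. The sketch below is one possible approach; the helper names `missing_packages` and `install_missing` are hypothetical and not part of the repository.

```r
# Package names taken from the prerequisites list.
required <- c("RMariaDB", "tictoc", "stringr", "purrr", "dplyr", "lubridate", "tidyverse")

# Hypothetical helper: return the packages that are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only what is missing, and do nothing otherwise.
install_missing <- function(pkgs = required) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Usage (left commented out so that sourcing this file has no side effects):
# install_missing()
```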

License

This project is licensed under the MIT License.
