This repository contains an R script designed to preprocess Backblaze drive_stats CSV files. The script handles several tasks including merging specific files, cleaning invalid data, removing empty files, and generating consolidated monthly CSV files with a standardized schema.
The original CSV files can be downloaded from the Backblaze Hard Drive Test Data page. After downloading, please extract the CSV files into the infile/csv directory.
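The following is a minimal sketch of the kind of consolidation described above. It is not the repository's actual script. It assumes the daily files are named like 2023-01-15.csv, and the function name consolidate_monthly and the output directory are hypothetical. It skips zero-byte files, aligns columns across files by filling missing ones with NA, and writes one CSV per month using base R only.

```r
# Hypothetical helper (not the repository's script): merge daily CSV files
# into one file per month. Assumes file names like "2023-01-15.csv".
consolidate_monthly <- function(in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir,
                      pattern = "^[0-9]{4}-[0-9]{2}-[0-9]{2}\\.csv$",
                      full.names = TRUE)
  # Drop zero-byte files before trying to read them
  files <- files[file.size(files) > 0]
  months <- substr(basename(files), 1, 7)  # "YYYY-MM" prefix of each name
  written <- character(0)
  for (m in unique(months)) {
    daily <- lapply(files[months == m], read.csv, stringsAsFactors = FALSE)
    # Ignore files that contain a header but no rows
    daily <- Filter(function(d) nrow(d) > 0, daily)
    if (length(daily) == 0) next
    # Standardize the schema: use the union of all columns, fill gaps with NA
    cols <- unique(unlist(lapply(daily, names)))
    daily <- lapply(daily, function(d) {
      for (col in setdiff(cols, names(d))) d[[col]] <- NA
      d[cols]
    })
    write.csv(do.call(rbind, daily),
              file.path(out_dir, paste0(m, ".csv")),
              row.names = FALSE)
    written <- c(written, m)
  }
  invisible(written)  # months for which a file was written
}
```

A call such as consolidate_monthly("infile/csv", "outfile/csv") would read from the directory named above; the output path here is only an example.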
The script requires the following R packages:
RMariaDB
tictoc
stringr
purrr
dplyr
lubridate
tidyverse
You can install these packages using the following command in R:
install.packages(c("RMariaDB", "tictoc", "stringr", "purrr", "dplyr", "lubridate", "tidyverse"))
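If some of these packages are already installed, a small check can avoid reinstalling them. This is a sketch only; missing_packages is a hypothetical helper name, not part of the repository.

```r
# Packages listed in this README
required <- c("RMariaDB", "tictoc", "stringr", "purrr",
              "dplyr", "lubridate", "tidyverse")

# Hypothetical helper: return the subset of pkgs that cannot be loaded
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}
```

The result of missing_packages(required) can then be passed to install.packages() when it is non-empty.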
This project is licensed under the MIT License.