---
title: "SD_Flat_File_April23"
author: "Holly Kundel"
date: "`r Sys.Date()`"
output: html_document
---
# SD Flat File work

- Able to read in all files and rename columns.

## Load libraries
```{r, warning = FALSE}
library(readr)
library(readxl)
library(dplyr)
library(stringr)
library(arrow)
library(data.table)
library(googledrive)
library(janitor)
library(tidyr)
```
# Testing Mike's WI code on SD data
```{r}
# find files on Google Drive Desktop
sd_dir <- "G:/Shared drives/Hansen Lab/RESEARCH PROJECTS/Fish Survey Data/SD_Data/sd_raw_disaggregated_data"
SD_files_list <- list.files(path = sd_dir, pattern = "\\.csv$") # grabs only .csv files
SD_files_list # check that file names look correct

# NOTE: `cde` (the data explainer) is not created in this file; it must already be loaded.
for (i in seq_along(SD_files_list)) {
  # i = 1
  file_id <- sub("\\.csv$", "", SD_files_list[i]) # file name without the extension

  # read the file and add it to the workspace under its own name
  dat <- data.table(read_csv_arrow(file.path(sd_dir, SD_files_list[i])))
  assign(file_id, dat)

  # consider moving renaming into here!
  # note we want to review a sorted list of column names to check misspelling etc.

  # build the old-name/new-name lookup for this file from the data explainer
  # (called name_map rather than names, so it does not mask base::names)
  name_map <- cde %>%
    filter(new_file_name == file_id) %>% # keep only the row relevant to this file
    select(where(~ !anyNA(.x))) %>% # drop explainer columns that are empty for this file
    as.data.table() %>%
    data.table::transpose(keep.names = "newname")
  setnames(name_map, "V1", "oldname")
  rename_map <- name_map[!str_detect(newname, "unique_row_key")]

  # see if any column names will not have a match!
  # If any show FALSE, stop and revisit the data explainer
  # - e.g., named something "total catch" when the actual column name was "total_catch"
  matched <- colnames(dat) %in% rename_map$oldname
  print(cbind(column = colnames(dat), in_explainer = matched))

  # break the loop if the current file has column names not in the data explainer
  if (!all(matched)) break

  # now rename that file's columns
  # (setnames() works by reference, so the workspace copy is renamed too)
  setnames(dat, colnames(dat), rename_map$newname[match(colnames(dat), rename_map$oldname)])

  # confirm import of files:
  print(paste(file_id, "added to workspace"))
}
```
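
The check-then-rename step in the loop above can be tried on toy data before pointing it at the shared drive. The sketch below is only an illustration: the table `raw`, the lookup `lookup`, and their column names are made up for this example and are not taken from the SD files or the data explainer. It is written as a plain `r` block so it is shown but not run when the document is knit.

```r
library(data.table)

# Toy stand-ins (hypothetical): one raw survey table and one name lookup
raw <- data.table(LakeName = c("A", "B"), `total catch` = c(3L, 7L))
lookup <- data.table(
  oldname = c("LakeName", "total catch"),
  newname = c("lake_name", "total_catch")
)

# Columns of the raw table that the lookup does not cover
unmatched <- setdiff(colnames(raw), lookup$oldname)

# Stop early instead of renaming only part of the table
if (length(unmatched) > 0) {
  stop("Columns missing from lookup: ", paste(unmatched, collapse = ", "))
}

# Rename by reference: each old name is replaced by its paired new name
setnames(raw, lookup$oldname, lookup$newname)
print(colnames(raw))
```

Using `stop()` here rather than `break` is a design choice for the toy case: it reports which columns are missing, whereas `break` in the loop only halts at the offending file.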