forked from AlanBerger/Practice-programming-exercises-for-R
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathfirst-R-programming-exercise-column-number-from-Excel-column-letter-markdown-UTF8.Rmd
221 lines (168 loc) · 10.4 KB
/
first-R-programming-exercise-column-number-from-Excel-column-letter-markdown-UTF8.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
---
output: md_document
---
## "First programming exercise"
### Alan Berger Aug 20, 2020 minor edits Jan 18, 2021
### additional R code, run the functions, and some text edits Aug 22, 2020
## Introduction
This is the first in a sequence of programming exercises intended to fill the gap between learning the correct
syntax of basic R commands and the programming assignments in the R Programming course in the Johns Hopkins University
Data Science Specialization on Coursera. In this sequence of exercises in "composing" an R function to carry out a particular task,
the idea is to practice correct use of R constructs and built in functions (functions the "come with" the basic R installation), while
learning how to "put together" a correct sequence of blocks of commands that will obtain the desired result.
In these exercises, there will be a statement of what your function should do (what are the input variables and what the function
should return) and a sequence of "hints". To get the most out of these exercises, try to write your function using as few hints as possible.
Note there are often several ways to write a function that will obtain the correct result. For these exercises the directions and
hints may point toward a particular approach intended to practice particular constructs in R and a particular line of reasoning.
There may be an existing R function or package that will do what is stated for a given practice exercise, but here
(unlike other aspects of the R Programming course) the point is to practice formulating a logical sequence of steps,
with each step a section of code, to obtain a working function, not to find an existing solution or a quick solution using a
more powerful R construct that is better addressed later on.
## Motivation for this exercise
The input data for many functions will be or include a tab delimited text file that comes
from or is naturally viewed as an Excel spreadsheet. For example, the last programming assignment in the R Programming
course uses a .csv (comma separated variables) text file appropriately read in using R's read.csv function. It has a total of 46 columns,
only 5 of which are relevant to the assignment (Excel columns B, G, K, Q and W). For this and many other situations,
it would be convenient to have an R function that would take as its input an Excel column name (as a character variable)
and output the corresponding column number.
### Instruction for this function
For the first version of your function, just have it work for a column letter between a
and z and only take as input a lower case letter.
The skeleton of your function should "look like"
```
colnameToNumber <- function(colname) {
# convert an Excel column name
# to the corresponding column number;
# colname should be a lower case letter in the
# form of a character string, for example "a" or "r"
# "between" a and z (including a and z)
# coding lines
return(colnumber)
}
```
Directions: do NOT use a whole bunch of if statements (yes, one could do 26 if statements of the form
`if(colname == "a") colnumber = 1`
and so on, but, among other things, it would be easy to have typographic
mistakes, and writing all those out would get pretty boring).
Note that the built in R variable (R object) **letters** is
the character vector with entries "a", "b", ... , "z" and note the built in R function **which** takes as input a logical
vector (often defined by whether some condition is or is not true) and outputs the integer vector of the
_indices_ of the input vector for which the logical value is **TRUE**, so for example `which(1:4 < 3)` is equal `c(1, 2)`,
and `which(sqrt(1:4) == 2)` is the single value 4 (since `sqrt(1:4)` is the vector c(1, $\sqrt 2$, $\sqrt 3$, 2) which is
`c(1, 1.414214, 1.732051, 2)` (to the number of digits printed).
Your function, when applied to b, g, k, q and w (as character variables) should return 2, 7, 11, 17 and 23,
respectively, and `colnameToNumber("zz")` should
"throw an error" along the lines of
Error in colnameToNumber("zz") : no match to colname
This comes from the R statement: `stop("no match for colname")` which you should have occur in your function if it is the case that the input was not a lower case letter between a and z.
Try writing your program now before going to the additional hints below (the more you do "on your own" the faster you will gain skill at programming).
### Further hints
You can use the **which** function to "pick out" the index k (an integer) of the **letters** vector for which `letters[k]` equals colname, this is the number you want to return.
How might you check for whether colname was a letter between "a" and "z"? What will the **which** function return if colname was not a "valid" value for
the colnameToNumber function as currently constructed (colname is not equal any entry of **letters**)? (Try it out in an R session.)
Things to think about: how would you "extend" your function to treat some additional Excel columns, say through column "dz" (check out the **paste** function, you could use
it along with **letters** to construct vectors containing additional column names and concatenate them together). You could do this with several lines of
code to get the vector of "a" through "dz" (a through z; aa through az, ba through bz; ca through cz; da through dz; then concatenate them into 1 vector).
To be able to find the column number for many more Excel columns you would want to use a for loop to construct and concatenate blocks of column names.
How would you extend your function to easily deal with upper case letters or a "mix" of upper and lower case letters,
as in the column name "Cz" (some people like me are sloppy typists, or will forget about a lower case restriction and use capital letters as Excel does).
Now is the time to make use of Google to search for a built in R function that will convert any upper case letters in a character variable to lower case (and leave other characters as is).
How would you extend your function to treat a vector (of length > 1) of column names?
A working version of this function is given below, but try to do your own function before looking.
### colnameToNumber function
```{r}
colnameToNumber <- function(colname){
# convert an Excel column name to the
# corresponding column number;
# should not depend on whether the
# letter(s) in colname are upper or
# lower case or a mixture of upper and
# lower case
# colname should be a character string such as "k" or "dz"
colname <- tolower(colname)
# so only need to deal with lower case
# arrange to treat column names from "a" up through "dz"
az <- letters # c("a", ... , "z")
aaz <- paste("a", az, sep = "")
# c("aa", "ab", ... , "az")
# additional column names
baz <- paste("b", az, sep = "")
caz <- paste("c", az, sep = "")
daz <- paste("d", az, sep = "")
# concatenate the column names
colnames <- c(az, aaz, baz, caz, daz)
# get the number corresponding to the input column name
# using the which function
colnumber <- which(colnames == colname)
# test for having found a (unique) match
if(length(colnumber) != 1) stop("no match to colname")
return(colnumber)
}
```
Do check runs.
```{r}
colnameToNumber("a")
colnameToNumber("z")
colnameToNumber("aa")
colnameToNumber("az")
colnameToNumber("ba")
colnameToNumber("bz")
colnameToNumber("ca")
colnameToNumber("cz")
colnameToNumber("da")
colnameToNumber("dz")
# colnameToNumber("zz") # should get error message
```
Note because we used the built in **letters** character vector, the calls
to colnameToNumber above should be sufficient to check that it has been coded correctly. Contrast that
with the checks that would be needed if we had used if tests for each column name between a and dz.
For the last part of this exercise, write a function that uses the colnameToNumber function to
convert a character _vector_ of column names to the corresponding vector of column numbers. One could modify
the colnameToNumber function to do this using a for loop, but here write a separate function that uses
the colnameToNumber function (this set up will be used as practice with the **sapply** function later on).
Hints: The output of this function, call the function colvecToNumbers will be an integer vector, call it colnumbers,
having the same **length** as the input character vector, call it colvec. One can initialize colnumbers using the **integer**
function, and then "fill in" its entries in a for loop using the colnameToNumber function.
A working version of this function is given below, but try to do your own function before looking.
### colvecToNumbers function
```{r}
colvecToNumbers <- function(colvec){
# convert a character vector of Excel column names
# to the integer vector of corresponding column numbers
# each entry of colvec should be a column name between a and dz
# check the input is a non-empty character vector
# note a single character string is a character vector of length 1
if(length(colvec) < 1) stop("colvec input to colvecToNumbers is empty")
if(!is.character(colvec)) stop("colvec input to colvecToNumbers is not character")
# create the integer vector to be output
n <- length(colvec)
colnumbers <- integer(n)
# use colnameToNumber to get each entry of colnumbers
for (i in 1:n) {
colname <- colvec[i]
colnumbers[i] <- colnameToNumber(colname)
}
return(colnumbers)
}
```
### check runs for colvecToNumbers
```{r}
colvec <- c("b", "g", "k", "q", "w") # the hospital data relevant columns
colvecToNumbers(colvec)
colvec <- c("a", "aa", "ba", "ca", "da") # start of groups of 26 columns
colvecToNumbers(colvec)
colvec <- c("z", "az", "bz", "cz", "dz") # end of groups of 26 columns
colvecToNumbers(colvec)
```
Agrees with what it should be.
Hope this programming exercise was informative and good practice with
writing a function.
Things to look forward to: later in the R Programming course one will learn
the "apply family" of powerful built in R functions. Not counting the lines
that check whether the input value of colvec is valid, one can write a version
of colvecToNumbers using the one!! line
sapply(colvec, colnameToNumber)
Nice and concise and only one line needs to be checked for any mistakes - **sapply** does
the for loop and creates the vector to be returned without any more lines of code needed.
Fewer lines of code (as long as they are reasonably "readable" - not a birds nest of
parentheses and brackets) mean fewer chances to make a mistake and fewer places for bugs to hide.