-
Notifications
You must be signed in to change notification settings - Fork 2
/
operators-data-types.qmd
491 lines (334 loc) · 14 KB
/
operators-data-types.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
# Operators & Data Types {#sec-operators-data-types}
```{r}
#| echo: false
#| message: false
source("_common.R")
```
> ### Learning Objectives {.unnumbered}
>
> * Be able to use R as a calculator.
> * Be able to compare values in R.
> * Know the distinctions between how R handles different types of data types (numbers, strings, and logicals).
>
> ### Suggested Readings {.unnumbered}
>
> * Chapter 3 of Danielle Navarro's book ["Learning Statistics With R"](https://learningstatisticswithr.com/book/introR.html)
## R as a calculator
You can do a ton of things with R, but at its core it's basically a fancy calculator. Let's get started with some basic arithmetic!
### Doing basic math
R handles simple arithmetic using the following **arithmetic** operators:
<div style="width:500px">
```{r}
#| label: 'arithmetic'
#| echo: false
knitr::kable(rbind(
c("addition", "`+`", "`10 + 2`", "`12`"),
c("subtraction", "`-`", "`9 - 3`", "`6`"),
c("multiplication", "`*`", "`5 * 5`", "`25`"),
c("division", "`/`", "`9 / 3`", "`3`"),
c("power", "`^`", "`5 ^ 2`", "`25`")
),
col.names = c(
"operation", "operator", "example input" , "example output"),
align = "lccc"
)
```
</div>
The first four basic operators (`+`, `-`, `*`, `/`) are pretty straightforward and behave as expected:
```{r}
7 + 5 # Addition
7 - 5 # Subtraction
7 * 5 # Multiplication
7 / 5 # Division
```
Not a lot of surprises (you can ignore the `[1]` you see in the returned values...that's just R saying there's only one value to return).
Powers (i.e. $x^n$) are represented using the `^` symbol. For example, to calculate $5^4$ in R, we would type:
```{r}
5^4
```
### Slightly more tricky math
There are two other operators that are not typically as well-known as the first five but are quite common in programming:
<div style="width:500px">
```{r}
#| label: 'mod'
#| echo: false
knitr::kable(
rbind(
c("integer division", "`%/%`", "`4 %/% 3`", "`1`"),
c("modulus", "`%%`", "`8 %% 3`", "`2`")),
col.names = c(
"operation", "operator", "example input" ,
"example output"),
align="lccc")
```
</div>
#### Integer division
Integer division is division in which the remainder is discarded. Note the difference between regular (`/`) and integer (`%/%`) division:
```{r}
4 / 3 # Regular division
```
```{r}
4 %/% 3 # Integer division
```
With integer division, 3 can only go into 4 once, so `4 %/% 3` returns `1`.
With integer division, dividing a number by a larger number will always produce `0` (because the larger number cannot go into the smaller number):
```{r}
4 %/% 5 # Will return 0
```
#### The Modulus operator
The **modulus** (aka "mod" operator) returns the remainder after doing integer division. For example:
```{r}
17 %% 3
```
This returns `2` because because 17 / 3 is equal to 5 with a remainder of 2. The modulus returns any remainder, including decimals:
```{r}
3.1415 %% 3
```
If you mod a number by itself, you'll get `0` (because there's no remainder):
```{r}
17 %% 17 # Will return 0
```
Finally, if you mod a number by a larger number, you'll get the smaller number back since it's the remainder:
```{r}
17 %% 20 # Will return 17
```
### Tricks with `%%` and `%/%`
The `%%` and `%/%` operators can be really handy. Here are a few tricks.
#### Odds and evens with `n %% 2`
You can tell if an integer `n` is even or odd by using `m %% 2`. If the result is `0`, `n` must be even (because 2 goes in _evenly_ to even numbers with no remainder). If `n` is odd, you'll get a remainder of `1`. Here's an example:
```{r}
10 %% 2 # Even
11 %% 2 # Odd
```
This trick also works with negative numbers!
```{r}
-42 %% 2 # Even
-43 %% 2 # Odd
```
### Number "chopping" with 10s
When you use the mod operator `%%` on a **positive** number with factors of 10, it "chops" the number and returns everything to the _right_ of the "chop" point:
```{r}
123456 %% 1 # Chops to the right of the *ones* digit
123456 %% 10 # Chops to the right of the *tens* digit
123456 %% 100 # Chops to the right of the *hundreds* digit
```
Integer division `%/%` works the same way, except it returns everything to the _left_ of the "chop" point:
```{r}
123456 %/% 1 # "Chops to the right of the ones digit
123456 %/% 10 # "Chops to the right of the tens digit
123456 %/% 100 # "Chops to the right of the hundreds digit
```
This trick works with non-integers too!
```{r}
3.1415 %% 1
3.1415 %/% 1
```
But **be careful** - this "trick" only works with _positive_ numbers:
```{r}
-123.456 %% 10
-123.456 %/% 10
```
Here's some mental notes to remember how this works:
- `%%` returns everything to the _right_ (`<chop> ->`)
- `%/%` returns everything to the _left_ (`<- <chop>`)
- The "chop" point is always just to the _right_ of the chopping digit:
<div style="width:500px">
Example | "Chop" point | "Chop" point description
----------------|--------------|----------------------------
`1234 %% 1` | `1234 | ` | Right of the `1`'s digit
`1234 %% 10` | `123 | 4` | Right of the `10`'s digit
`1234 %% 100` | `12 | 34` | Right of the `100`'s digit
`1234 %% 1000` | `1 | 234` | Right of the `1,000`'s digit
`1234 %% 10000` | ` | 1234` | Right of the `10,000`'s digit
</div>
## Comparing things in R
Other than simple arithmetic, another common programming task is to compare different values to see if one is greater than, less than, or equal to the other. R handles comparisons with **relational** and **logical** operators.
## Comparing two things
To compare two things, use the following **relational** operators:
- Less than: `<`
- Less than or equal to : `<=`
- Greater than or equal to: `>=`
- Greater than: `>`
- Equal: `==`
- Not equal: `!=`
The *less than* operator `<` can be used to test whether one number is smaller than another number:
```{r}
2 < 5
```
If the two values are equal, the `<` operator will return `FALSE`, while the `<=` operator will return `TRUE`: :
```{r}
2 < 2
2 <= 2
```
The "greater than" (`>`) and "greater than or equal to" (`>=`) operators work the same way but in reverse:
```{r}
2 > 5
2 > 2
2 >= 2
```
To assess whether two values are equal, we have to use a double equal sign (`==`):
```{r}
(2 + 2) == 4
(2 + 2) == 5
```
To assess whether two values are _not_ equal, we have to use an exclamation point sign with an equal sign (`!=`):
```{r}
(2 + 2) != 4
(2 + 2) != 5
```
It's worth noting that you can also apply equality operations to "strings," which is the general word to describe character values (i.e. not numbers). For example, R understands that a `"penguin"` is a `"penguin"` so you get this:
```{r}
"penguin" == "penguin"
```
However, R is very particular about what counts as equality. For two pieces of text to be equal, they must be _precisely_ the same:
```{r}
"penguin" == "PENGUIN" # FALSE because the case is different
"penguin" == "p e n g u i n" # FALSE because the spacing is different
"penguin" == "penguin " # FALSE because there's an extra space on the second string
```
## Making multiple comparisons
To make a more complex comparison of more than just two things, use the following **logical** operators:
- And: `&`
- Or: `|`
- Not: `!`
**And**:
A logical expression `x & y` is `TRUE` only if *both* `x` and `y` are `TRUE`.
```{r}
(2 == 2) & (2 == 3) # FALSE because the second comparison if not TRUE
```
```{r}
(2 == 2) & (3 == 3) # TRUE because both comparisons are TRUE
```
**Or**:
A logical expression `x | y` is `TRUE` if *either* `x` or `y` are `TRUE`.
```{r}
(2 == 2) | (2 == 3) # TRUE because the first comparison is TRUE
```
**Not**:
The `!` operator behaves like the word *"not"* in everyday language. If a statement is "not true", then it must be "false". Perhaps the simplest example is
```{r}
!TRUE
```
It is good practice to include parentheses to clarify the statement or comparison being made. Consider the following example:
```{r}
!3 == 5
```
This returns `TRUE`, but it's a bit confusing. Reading from left to right, you start by saying "not 3"...what does that mean?
What is really going on here is R first evaluates whether 3 is equal to 5 (`3 == 5`), and then returns the "not" (`!`) of that. A better version of the same thing would be:
```{r}
!(3 == 5)
```
### Order of operations
R follows the typical BEDMAS order of operations. That is, R evaluates statements in this order^[For a more precise statement, see the [operator precedence](http://stat.ethz.ch/R-manual/R-devel/library/base/html/Syntax.html) for R.]:
1. **B**rackets
2. **E**xponents
3. **D**ivision
4. **M**ultiplication
5. **A**ddition
6. **S**ubtraction
For example, if I type:
```{r}
1 + 2 * 4
```
R first computes `2 * 4` and then adds `1`. If what you actually wanted was for R to first add `2` to `1`, then you should have added parentheses around `1` and `2`:
```{r}
(1 + 2) * 4
```
A helpful rule of thumb to remember is that **brackets always come first**. So, if you're ever unsure about what order R will do things in, an easy solution is to enclose the thing you want it to do first in brackets.
> Note that for logical operators, the order precedence is `! > & > |`
For example, consider the following statement:
```{r}
TRUE | FALSE & FALSE
```
This returns `TRUE` because the `&` statement (`FALSE & FALSE`) is evaluated first, so the whole statement simplifies to `TRUE | FALSE`, which returns `TRUE`. If you put parentheses around the `|` statement, it would evaluate first and the whole statement would return `FALSE`:
```{r}
(TRUE | FALSE) & FALSE
```
Similarly, consider the following statement:
```{r}
! TRUE | TRUE
```
This returns `TRUE` because the `!` statement is evaluated first (`! TRUE` is `FALSE`), and the simplified statement `FALSE | TRUE` returns `TRUE`. Again, if you put parentheses around the `|` statement the whole statement becomes `FALSE`:
```{r}
! (TRUE | TRUE)
```
## Data types
Every programming language has the ability to store data of different types. R recognizes several important basic data types (there are others, but these cover most cases):
Type | Description | Example
----------|-----------------------------|----------
`double` | Number with a decimal place (aka "float") | `3.14`, `1.61803398875`
`integer` | Number without a decimal place | `1`, `42`
`character` | Text in quotes (aka "string") | `"this is some text"`, `"3.14"`
`logical` | True or False (for comparing things) | `TRUE`, `FALSE`
If you want to check with type a value is, you can use the function `typeof()`. For example:
```{r}
typeof("hello")
```
### Numeric types
Numbers in R have the `numeric` data type, which is also the default computational type. There are two types of numbers:
- **Integers**
- **Non-integers** (aka "double" or "float")
The difference is that integers don't have decimal values. A non-integer in R has the type "`double`":
```{r}
typeof(3.14)
```
By default, R assumes all numbers have a decimal place, even if it _looks_ like an integer:
```{r}
typeof(3)
```
In this case, R assumes that `3` is really `3.0`. To make sure R knows you really do mean to create an integer, you have to add an `L` to the end of the number^[Why `L`? Well, it's a bit [complicated](https://stackoverflow.com/questions/24350733/why-would-r-use-the-l-suffix-to-denote-an-integer), but R supports complex numbers which are denoted by `i`, so `i` was already taken. A quick answer is that R uses 32-bit _long_ integers, so `L` for "long".]:
```{r}
typeof(3L)
```
### Character types
A character value is used to represent string values in R. Anything put between single quotes (`''`) or double quotes (`""`) will be stored as a character. For example:
```{r}
typeof('3')
```
Notice that even though the value _looks_ like a number, because it is inside quotes R interprets it as a character. If you mistakenly thought it was a a number, R will gladly return an error when you try to do a numerical operation with it:
```{r}
#| error: true
'3' + 7
```
It doesn't mattef if you use single or double quotes to create a character. The only time is _does_ matter is if the character is a quote symbole itself. For example, if you wanted to type the word `"don't"`, you should use double quotes so that R knows the single quote is part of the character:
```{r}
typeof("don't")
```
If you used single quotes, you'll get an error because R reads `'don'` as a character:
```{r}
#| error: true
typeof('don't')
```
We will go into much more detail about working with character values later on in [Week 7](L7-strings.html).
### Logical types
Logical data only have two values: `TRUE` or `FALSE`. Note that these are not in quotes and are in all caps.
```{r}
typeof(TRUE)
typeof(FALSE)
```
R uses these two special values to help answer questions about logical statements. For example, let's compare whether `1` is greater than `2`:
```{r}
1 > 2
```
R returns the values `FALSE` because 1 is not greater than 2. If I flip the question to whether `1` is _less_ than `2`, I'll get `TRUE`:
```{r}
1 < 2
```
### Special values
In addition to the four main data types mentioned, there are a few additional "special" types: `Inf`, `NaN`, `NA` and `NULL`.
**Infinity**: `Inf` corresponds to a value that is infinitely large (or infinitely small with `-Inf`). The easiest way to get `Inf` is to divide a positive number by 0:
```{r}
1/0
```
**Not a Number**: `NaN` is short for "not a number", and it's basically a reserved keyword that means "there isn't a mathematically defined number for this." For example:
```{r}
0/0
```
**Not available**: `NA` indicates that the value that is "supposed" to be stored here is missing. We'll see these much more when we start getting into data structures like vectors and data frames.
**No value**: `NULL` asserts that the variable genuinely has no value whatsoever, or does not even exist.
## Page sources {.unnumbered}
Some content on this page has been modified from other courses, including:
- Danielle Navarro's book ["Learning Statistics With R"](https://learningstatisticswithr.com/book/introR.html)
- Jenny Bryan's [STAT 545 Course](http://stat545.com/)
- [RStudio primers](https://rstudio.cloud/learn/primers/1.2)
- Xiao Ping Song's [Intro2R crash course](https://github.com/xp-song/Intro2R)