<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>CS 2150: 03-numbers slide set</title>
<meta name="description" content="A set of slides for a course on Program and Data Representation">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="../slides/reveal.js/css/reveal.css">
<link rel="stylesheet" href="../slides/reveal.js/css/theme/black.css" id="theme">
<link rel="stylesheet" href="../slides/css/pdr.css">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="../slides/reveal.js/lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../slides/reveal.js/css/print/pdf.css' : '../slides/reveal.js/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="../slides/reveal.js/lib/js/html5shiv.js"></script>
<![endif]-->
<script type="text/javascript" src="../slides/js/dhtmlwindow.js"></script>
<script type="text/javascript" src="../slides/js/canvas.js"></script>
<link rel="stylesheet" href="../slides/css/dhtmlwindow.css" type="text/css">
<style>.reveal li { font-size:93%; line-height:120%; }</style>
</head>
<body onload="canvasinit()">
<div id="dhtmlwindowholder"><span style="display:none"></span></div>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section data-markdown id="cover"><script type="text/template">
# CS 2150
### Program and Data Representation
<center><small><a href="http://www.cs.virginia.edu/~asb">Aaron Bloomfield</a> (aaron@virginia.edu)<br><a href="http://www.cs.virginia.edu/~nn4pj">Rich Nguyen</a> (nn4pj@virginia.edu)<br><a href="http://www.cs.virginia.edu/~mrf8t">Mark Floryan</a> (mrf8t@virginia.edu)</small></center>
<center><small><a href="http://@github/uva-cs/pdr">@github</a> | <a href="index.html">↑</a> | <a href="daily-announcements.html?print-pdf"><img class="print" width="20" src="../slides/images/print-icon.png"></a></small></center>
## Number Representation
</script></section>
<section>
<h2>CS 2150 Roadmap</h2>
<table class="wide">
<tr><td colspan="3"><p class="center">Data Representation</p></td><td></td><td colspan="3"><p class="center">Program Representation</p></td></tr>
<tr>
<td class="top"><small> <br> <br>string<br> <br> <br> <br>int x[3]<br> <br> <br> <br>char x<br> <br> <br> <br>0x9cd0f0ad<br> <br> <br> <br>01101011</small></td>
<!-- image adapted from http://openclipart.org/detail/3677/arrow-left-right-by-torfnase -->
<td><img class="noborder" src="images/red-double-arrow.png" height="500" alt="vertical red double arrow"></td>
<td class="top"> <br>Objects<br> <br>Arrays<br> <br>Primitive types<br> <br>Addresses<br> <br>bits</td>
<td> </td>
<td class="top"><small> <br> <br>Java code<br> <br> <br>C++ code<br> <br> <br>C code<br> <br> <br>x86 code<br> <br> <br>IBCM<br> <br> <br>hexadecimal</small></td>
<!-- image adapted from http://openclipart.org/detail/3677/arrow-left-right-by-torfnase -->
<td><img class="noborder" src="images/green-double-arrow.png" height="500" alt="vertical green double arrow"></td>
<td class="top"> <br>High-level language<br> <br>Low-level language<br> <br>Assembly language<br> <br>Machine code</td>
</tr>
</table>
</section>
<section data-markdown><script type="text/template">
# Contents
[Introduction](#/introduction)
[Radix Conversion](#/radix)
[Machine Representation](#/machinerep)
[Endian-ness](#/endian)
[Integer Representation](#/integers)
[Real Representation](#/reals)
</script></section>
<section>
<section id="introduction" data-markdown><script type="text/template">
# Introduction
</script></section>
<section data-markdown><script type="text/template">
## Numbers vs. Numerals
- Which is bigger?
- 5 or 8 or 12
- What if they are sorted alphabetically?
- Which is "five"?
- five or V or cinq or 101
- We use numerals to represent numbers
</script></section>
<section>
<h2>Positional Number Systems</h2>
<ul>
<li>Integers
<ul>
<li>346 = 3*10<sup>2</sup> + 4*10<sup>1</sup> + 6*10<sup>0</sup></li>
<li>346 = 2<sup>8</sup> + 2<sup>6</sup> + 2<sup>4</sup> + 2<sup>3</sup> + 2<sup>1</sup>
<ul><li>=1*2<sup>8</sup>+0*2<sup>7</sup>+1*2<sup>6</sup>+0*2<sup>5</sup>+1*2<sup>4</sup>+1*2<sup>3</sup>+0*2<sup>2</sup>+1*2<sup>1</sup>+0*2<sup>0</sup></li></ul></li>
<li>\( d_{n} d_{n-1} \ldots d_{0} = \sum_{i=0}^{n} d_{i} \cdot R^{i} \)</li>
</ul>
</li>
<li>Reals
<ul>
<li>\( d_{n} d_{n-1} \ldots d_{0} . d_{-1} d_{-2} \ldots d_{-m} = \sum_{i=-m}^{n} d_{i} \cdot R^{i} \)</li>
</ul>
</li>
</ul>
</section>
<section data-markdown><script type="text/template">
## Examples
- Binary (base 2): 1111<sub>2</sub>
- Ternary (base 3): 120<sub>3</sub>
- Octal (base 8): 17<sub>8</sub>
- Hexadecimal (base 16): F<sub>16</sub>
</script></section>
</section>
<section>
<section id="radix" data-markdown><script type="text/template">
# Radix Conversion
</script></section>
<section>
<h2>Conversion between bases</h2>
<p>Radix <i>R</i> to decimal:</p>
<p>\( n = d_n R^n + \ldots + d_0 R^0 \)</p>
<p> </p>
<p>Decimal to radix <i>R</i>:</p>
<p>\( \frac{n}{R} = d_n R^{n-1} + \ldots + d_1 R^0 \), remainder \( d_0 \)</p>
</section>
<section>
<h2>Radix to Decimal</h2>
<p>\( 42_5 \)</p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p>\( 121_3 \)</p>
<p> </p>
<p> </p>
<script type="text/javascript">insertCanvas();</script>
</section>
<section>
<h2>Decimal to Radix</h2>
<p>\( 42_{10} \) to radix 5</p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p>\( 121_{10} \) to radix 11</p>
<p> </p>
<p> </p>
<script type="text/javascript">insertCanvas();</script>
</section>
<section data-markdown><script type="text/template">
## What is the value of "101"?
- In binary: 5
- In octal (base 8): 65
- In decimal: 101
- In hexadecimal: 257
</script></section>
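<section data-markdown><script type="text/template">
## Radix conversion in code
The two conversion formulas can be sketched in C++. This is a minimal sketch, not course-provided code: the names `kDigits`, `to_value` and `to_radix` are made up for illustration, and it assumes lowercase digits and a radix between 2 and 36.

```cpp
#include <cassert>
#include <string>

// Digits for radices up to 36: 0-9 then a-z (illustrative helper).
const std::string kDigits = "0123456789abcdefghijklmnopqrstuvwxyz";

// Radix R to value: n = d_n R^n + ... + d_0 R^0, evaluated left to right.
unsigned long to_value(const std::string& numeral, unsigned radix) {
    assert(radix >= 2 && radix <= 36);
    unsigned long n = 0;
    for (char c : numeral) {
        std::size_t d = kDigits.find(c);
        assert(d != std::string::npos && d < radix);  // digit must be valid in this radix
        n = n * radix + d;
    }
    return n;
}

// Value to radix R: repeatedly divide by R; each remainder is the next digit,
// least significant first, so it is prepended to the result.
std::string to_radix(unsigned long n, unsigned radix) {
    assert(radix >= 2 && radix <= 36);
    if (n == 0) return "0";
    std::string numeral;
    while (n > 0) {
        numeral.insert(numeral.begin(), kDigits[n % radix]);
        n /= radix;
    }
    return numeral;
}
```

For example, `to_value("101", 8)` evaluates to 65 and `to_radix(121, 11)` evaluates to `"100"`.
</script></section>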
<section data-markdown><script type="text/template">
## Specifying numbers in other bases
- In C/C++, any integer literal that begins with '0' is interpreted as octal
- 073 = 73<sub>8</sub> = 59<sub>10</sub>
- And any integer literal that begins with '0x' is interpreted as hexadecimal
- 0x73 = 73<sub>16</sub> = 115<sub>10</sub>
- 0x3f = 3f<sub>16</sub> = 63<sub>10</sub>
- Case does not matter (0xFF == 0xff)
- Older C and C++ standards have no way to specify a number in binary, so binary values are almost always written in hex; C++14 added a '0b' prefix for binary literals
</script></section>
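<section data-markdown><script type="text/template">
## Literal prefixes in code
The octal and hexadecimal prefixes can be checked by the compiler itself. A small sketch; the constant names are invented for illustration.

```cpp
#include <cassert>

// The same literals as on the slide; the values in the comments are decimal.
constexpr int octal_literal = 073;   // 7*8 + 3   = 59
constexpr int hex_literal   = 0x73;  // 7*16 + 3  = 115
constexpr int hex_lower     = 0x3f;  // 3*16 + 15 = 63
constexpr int hex_upper     = 0x3F;  // same value: case does not matter

// Checked at compile time, so a wrong claim here would not even compile.
static_assert(octal_literal == 59, "073 is octal");
static_assert(hex_literal == 115, "0x73 is hexadecimal");
static_assert(hex_lower == hex_upper, "hex digits are case-insensitive");
```

A leading zero is easy to add by accident: `010` is eight, not ten.
</script></section>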
<section data-markdown><script type="text/template">
## Converting between binary and hexadecimal
- Split the binary number into 4 bit chunks ('nibbles')
- Convert each one to a ***single*** hexadecimal digit
- Converting through decimal, if necessary
- Example: 0100 1010 1000 1101<sub>b</sub> => 0x4a8d
- Likewise, hexadecimal to binary is just converting each hex digit to 4 bits
- Example: 0x13ac => 0001 0011 1010 1100<sub>b</sub>
- Keep in mind that it takes ***two*** hex digits to make a byte
</script></section>
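<section data-markdown><script type="text/template">
## Nibble conversion in code
The nibble-by-nibble procedure can be sketched as follows. The function names are hypothetical, and the sketch assumes lowercase hex digits and input strings that contain only valid digits (no spaces or prefixes).

```cpp
#include <cassert>
#include <string>

// Binary to hex: pad to a multiple of 4 bits, then map each nibble to one hex digit.
std::string binary_to_hex(std::string bits) {
    const std::string hex_digits = "0123456789abcdef";
    while (bits.size() % 4 != 0) bits.insert(bits.begin(), '0');  // left-pad with zeros
    std::string hex;
    for (std::size_t i = 0; i < bits.size(); i += 4) {
        int nibble = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            assert(bits[i + j] == '0' || bits[i + j] == '1');
            nibble = nibble * 2 + (bits[i + j] - '0');
        }
        hex += hex_digits[nibble];
    }
    return hex;
}

// Hex to binary: each hex digit expands to exactly 4 bits.
std::string hex_to_binary(const std::string& hex) {
    const std::string hex_digits = "0123456789abcdef";
    std::string bits;
    for (char c : hex) {
        std::size_t value = hex_digits.find(c);
        assert(value != std::string::npos);  // must be a lowercase hex digit
        for (int shift = 3; shift >= 0; --shift)
            bits += ((value >> shift) & 1) ? '1' : '0';
    }
    return bits;
}
```

With the slide's examples, `binary_to_hex("0100101010001101")` gives `"4a8d"` and `hex_to_binary("13ac")` gives `"0001001110101100"`.
</script></section>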
</section>
<section>
<section id="machinerep" data-markdown><script type="text/template">
# Machine<br>Representation
</script></section>
<section data-markdown><script type="text/template">
## ENIAC
- Started 1943 - early electronic programmable computer
- Operational in 1946, and computed ballistics tables
- 17,468 vacuum tubes and used 150 kW of power
- Earlier computers: Z3 (Konrad Zuse) in 1941; Colossus in 1943
![ENIAC](images/03-numbers/eniac.png)
</script></section>
<section>
<h2>Directions for getting 6</h2>
<ol style="font-size:55%;line-height:120%;font-variant:normal">
<li>Choose any regular accumulator (e.g. Accumulator #9).</li>
<li>Direct the Initiating Pulse to terminal 5i.</li>
<li>The initiating pulse is produced by the initiating unit's Io terminal (usually plugged into Program Line 1-1) each time the Eniac is started. Simply connect a program cable from Program Line 1-1 to terminal 5i on this Accumulator.</li>
<li>Set the Repeat Switch for Program Control 5 to 6.</li>
<li>Set the Operation Switch for Program Control 5 to ADD.</li>
<li>Set the Clear-Correct switch to C.</li>
<li>Turn on and clear the Eniac. </li>
<li>If there are random neons illuminated in the accumulators, press the "Initial Clear" button of the Initiating device</li>
<li>Press the "Initiating Pulse Switch" that is located on the Initiating device.</li>
<li><b>Stand back.</b></li>
</ol>
</section>
<section data-markdown><script type="text/template">
## ENIAC Number Representation
- Decimal system
- Ring of 36 vacuum tubes to store one digit (10 flip-flops to store 0-9)
- Designed to emulate a mechanical adding machine electronically
- 20 accumulators (~registers), each storing 10 digits
- 5,000 cycles per second
- Perform addition/subtraction between 2 accumulators each cycle
</script></section>
<section data-markdown><script type="text/template">
## Binary Number Representations
- First presented by Gottfried Leibniz, 1705 ("Explication de l'Arithmetique Binaire")
- George Boole ("Boolean" logic), 1854
- Claude Shannon's 1937 Master's thesis: implemented Boolean algebra with switches and relays
- Used by Atanasoff-Berry Computer, Colossus and Z3
</script></section>
<section>
<h2>Binary Representation</h2>
<p><i>n</i>-bit binary number: \( b_{n-1} b_{n-2} b_{n-3} \ldots b_2 b_1 b_0 \)</p>
<p>Value = \( \sum_{i=0}^{n-1} b_i \cdot 2^i \)</p>
<p> </p>
<p>Boolean arithmetic:</p>
<ul>
<li>0 + 0 = 0</li>
<li>0 + 1 = 1</li>
<li>1 + 0 = 1</li>
<li>1 + 1 = 0 carry 1</li>
</ul>
<p>Maximum value is \( 2^n-1 \) -- but what should \( n \) be?</p>
</section>
<section data-markdown><script type="text/template">
## What is *n*?
- Java:
- byte = 8 bits
- short, char = 16 bits
- int = 32 bits
- long = 64 bits
- C: implementation-defined
- `unsigned int`: can hold between 0 and `UINT_MAX`
- `UINT_MAX` must be at least 65535
- `UINT_MAX` is in `<climits>`
- n >= 16, typical current machines *n* = 32 or 64
- `sizeof(int)` will evaluate to the byte size of an int in C/C++
</script></section>
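<section data-markdown><script type="text/template">
## Finding *n* in code
Since the width of `unsigned int` is implementation-defined, a program can ask the compiler rather than assume. A minimal sketch with invented function names; it uses only `sizeof`, `CHAR_BIT` and `UINT_MAX` from `<climits>`.

```cpp
#include <cassert>
#include <climits>

// Number of bits in the storage of an unsigned int on this platform.
constexpr unsigned bits_in_unsigned_int() {
    return sizeof(unsigned int) * CHAR_BIT;
}

// Count the value bits by shifting UINT_MAX right until it reaches zero.
unsigned value_bits_in_unsigned_int() {
    unsigned count = 0;
    for (unsigned int v = UINT_MAX; v != 0; v >>= 1) ++count;
    return count;
}
```

Both functions must return at least 16 on any conforming implementation; on most current desktop machines they would be expected to return 32, but that is an assumption about the platform, not a guarantee.
</script></section>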
</section>
<section>
<section id="endian" data-markdown><script type="text/template">
# Endian-ness
</script></section>
<section data-markdown><script type="text/template">
## The Great Debate
- "Big-endian": most significant ***first*** (lowest address)
- 1000 0000 0000 0000 = 2<sup>15</sup> = 32768
- "Little-endian": most significant ***last*** (highest address)
- 1000 0000 0000 0000 = 2<sup>0</sup> = 1
- Which is better?
- Note that although all the *bits* are reversed, usually it is displayed with just the *bytes* reversed
</script></section>
<section data-markdown><script type="text/template">
## More on Endian-ness
- Often refers to *byte* ordering, rather than *bit* ordering
- Consider 0xdeadbeef
- On a big-endian machine, its bytes are stored in memory in the order de ad be ef
- On a little-endian machine, they are stored in the order ef be ad de (which reads as 0xefbeadde)
- 0xdeadbeef is used as a memory allocation pattern by some OSes
</script></section>
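<section data-markdown><script type="text/template">
## Byte order in code
The 0xdeadbeef example can be reproduced in C++. This is a sketch under the assumption that `std::uint32_t` exists on the platform; `byte_swap32` and `host_is_little_endian` are illustrative names, not library functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverse the four bytes of a 32-bit value: 0xdeadbeef <-> 0xefbeadde.
std::uint32_t byte_swap32(std::uint32_t v) {
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

// Report whether this machine stores the least significant byte at the lowest address.
bool host_is_little_endian() {
    const std::uint32_t probe = 0xdeadbeef;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);  // inspect the bytes as laid out in memory
    return bytes[0] == 0xef;
}
```

Copying the bytes out with `memcpy` avoids reading an inactive union member, which C++ does not permit. Note that the shifts in `byte_swap32` operate on values, so the function behaves the same on either kind of machine.
</script></section>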
<section data-markdown><script type="text/template">
## Endianness
- It's a "religious" argument: names taken from Big-Endians and Little-Endians in *Gulliver's Travels* who argued over which end of an egg to crack
- Different orderings are problematic
- Consider what << means in C
- big-endian ~ multiply by 2
- little-endian ~ divide by 2
- Some architectures support both ("bi-endian"): PowerPC, DEC Alpha, IA-64
- There were even some middle-endian machines once upon a time
- Most Internet standards: big-endian
</script></section>
<!--
<section data-markdown><script type="text/template">
## Endian checking, 1 of 2
(no external source code)
```
void CheckEndian () {
static int firsttime = 1;
if (firsttime) {
union {
char charword[4];
unsigned int intword;
} check;
check.charword[0] = 1; check.charword[1] = 2;
check.charword[2] = 3; check.charword[3] = 4;
// continued on next slide...
```
</script></section>
<section data-markdown><script type="text/template">
## Endian checking, 2 of 2
(no external source code)
```
#ifdef IS_BIG_ENDIAN
if (check.intword != 0x01020304) { /* big */
cerr << "ERROR: Host machine is not Big-endian.\n"
<< "Exiting." << endl;
Exit (205); }
#else
#ifdef IS_LITTLE_ENDIAN
if (check.intword != 0x04030201) { /* little */
cerr << "ERROR: Host machine is not Little-endian.\n"
<< "Exiting." << endl;
Exit (206); }
#else
cerr << "ERROR: Host machine not defined as Big or "
<< "Little-endian.\nExiting." << endl;
Exit (207);
#endif // IS_LITTLE_ENDIAN
#endif // IS_BIG_ENDIAN
firsttime = 0;
}
}
```
</script></section>
<section data-markdown><script type="text/template">
## Always writing a little-endian file
```
void Image::WriteInt (int value, FILE * fp) {
union {
int intvalue;
struct {
#ifdef IS_LITTLE_ENDIAN
char a, b, c, d;
#else
#ifdef IS_BIG_ENDIAN
char d, c, b, a;
#else
#error Must define IS_BIG_ENDIAN or IS_LITTLE_ENDIAN
#endif // IS_BIG_ENDIAN
#endif // IS_LITTLE_ENDIAN
} endian;
} e;
e.intvalue = value;
fputc (e.endian.a, fp);
fputc (e.endian.b, fp);
fputc (e.endian.c, fp);
fputc (e.endian.d, fp);
}
```
</script></section>
-->
<section data-markdown><script type="text/template">
## More on Endianness
- Little vs. big-endian deals with the *byte* order, not the *bit* order
![Endian-ness](images/03-numbers/endian-ness.png)
</script></section>
<section data-markdown><script type="text/template">
## Another way to think of Endianness
- Big-endian:
- the quick brown fox jumped over the lazy dog
- Little-endian:
- dog lazy the over jumped fox brown quick the
</script></section>
</section>
<section>
<section id="integers" data-markdown><script type="text/template">
# Integer<br>Representation
</script></section>
<section data-markdown><script type="text/template">
## Sign-and-magnitude
- Sign Bit, sign-and-magnitude
![sign and magnitude](images/03-numbers/sign-and-magnitude.png)
- Algorithm to encode:
- Encode absolute value of the number using *n*-1 bits
- First bit is 1 if the number is < 0
- Problem!
- Two representations for 0
</script></section>
<section data-markdown><script type="text/template">
## One's complement
- Sign Bit, One's Complement
![one's complement](images/03-numbers/ones-complement.png)
- Algorithm to encode:
- Encode absolute value of the number using *n*-1 bits
- If negative, flip all the bits
- Problem!
- Still two representations for 0
</script></section>
<section data-markdown><script type="text/template">
## Two's complement
- Avoids two representations for zero
![two's complement](images/03-numbers/twos-complement.png)
- A negative number has its bits flipped, and then you add 1
- This shifts all the numbers by one, avoiding two representations for zero
- Most common means of representing integers in computers
</script></section>
<section data-markdown><script type="text/template">
## Two's complement
- Algorithm for an *n*-bit memory space:
- Zero is *n* 0's
- For positive numbers, encode normally in *n*-1 bits
- Maximum value is 2<sup>*n*-1</sup>-1
- Sign bit is zero
- Thus, zero is a "positive" number! (and is all zeros)
- For negative numbers, take the absolute value
- Then subtract that from 2<sup>*n*</sup>, and encode that value
- Minimum (most negative) value is -2<sup>*n*-1</sup>
- Alternatively, encode the absolute value, flip the bits, and add 1
</script></section>
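<section data-markdown><script type="text/template">
## Two's complement encoding in code
The encoding algorithm can be sketched in C++. The function names are hypothetical, and the sketch assumes 2 to 32 bits so that the arithmetic fits in a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode value as an n-bit two's complement bit pattern (2 <= n <= 32).
// Negative values use the slide's rule: subtract the absolute value from 2^n.
std::uint32_t twos_complement(std::int64_t value, unsigned n) {
    assert(n >= 2 && n <= 32);
    const std::int64_t two_to_n = std::int64_t{1} << n;
    assert(value >= -(two_to_n / 2) && value <= two_to_n / 2 - 1);  // must fit in n bits
    if (value >= 0) return static_cast<std::uint32_t>(value);
    return static_cast<std::uint32_t>(two_to_n - (-value));
}

// Render the low n bits as a string of '0' and '1', most significant bit first.
std::string to_bits(std::uint32_t pattern, unsigned n) {
    std::string bits;
    for (unsigned i = n; i-- > 0;)
        bits += ((pattern >> i) & 1u) ? '1' : '0';
    return bits;
}
```

For *n* = 8, `to_bits(twos_complement(-100, 8), 8)` gives `"10011100"`, since 256 - 100 = 156.
</script></section>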
<section>
<h2>Two's complement (n=8)</h2>
<table class="transparent"><tr><td class="top">
<ul>
<li>0<ul class="fragment"><li>0<sub>d</sub> = 00000000<sub>b</sub></li></ul></li>
<li>1<ul class="fragment"><li>1<sub>d</sub> = 00000001<sub>b</sub></li></ul></li>
<li>10 = 8+2<ul class="fragment"><li>10<sub>d</sub> = 00001010<sub>b</sub></li></ul></li>
<li>100 = 64+32+4<ul class="fragment"><li>100<sub>d</sub> = 01100100<sub>b</sub></li></ul></li>
<li>127 = 64+32+16+8+4+2+1<ul class="fragment"><li>127<sub>d</sub> = 01111111<sub>b</sub></li></ul></li>
</ul>
</td><td> </td><td class="top">
<ul>
<li>-1<ul class="fragment"><li>+1<sub>d</sub> = 00000001<sub>b</sub></li><li>-1<sub>d</sub> = 11111111<sub>b</sub></li></ul></li>
<li>-10<ul class="fragment"><li>+10<sub>d</sub> = 00001010<sub>b</sub></li><li>-10<sub>d</sub> = 11110110<sub>b</sub></li></ul></li>
<li>-100<ul class="fragment"><li>+100<sub>d</sub> = 01100100<sub>b</sub></li><li>-100<sub>d</sub> = 10011100<sub>b</sub></li></ul></li>
<li>-128<ul class="fragment"><li>+128<sub>d</sub> = 10000000<sub>b</sub></li><li>-128<sub>d</sub> = 10000000<sub>b</sub></li></ul></li>
</ul>
</td></tr></table>
</section>
<section data-markdown><script type="text/template">
## Two's complement (*n*=8)
| sign | msb | | | | | | lsb | | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | = | 127 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | = | 2 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | = | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | = | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | = | -1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | = | -2 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | = | -127 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | = | -128 |
</script></section>
<section data-markdown><script type="text/template">
## Integer overflow
- Given a signed 8-bit integer: 127 = 0111 1111
- The maximum value for that data type
- If you add 1 to that, you get 1000 0000 (= -128<sub>d</sub>)
- This is called *overflow* (or *integer overflow*)
- Other common overflows:
- 16-bits: 32,767 + 1 = -32,768
- 32-bits: 2,147,483,647 + 1 = -2,147,483,648
- 64-bits: 9,223,372,036,854,775,807 + 1 = -9,223,372,036,854,775,808
- A real-world application: [*Gangnam Style* overflows INT_MAX, forces YouTube to go 64-bit](http://arstechnica.com/business/2014/12/gangnam-style-overflows-int_max-forces-youtube-to-go-64-bit/) (Dec 3, 2014)
- If you care, it's now at 3.2 billion views (as of Sep 2018)
</script></section>
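<section data-markdown><script type="text/template">
## Detecting overflow in code
In C++, overflowing a signed `int` is undefined behavior, so code that cares usually checks before the result is stored. A minimal sketch for 8-bit values; `checked_add` and `wrapped_bits` are invented names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Add two signed 8-bit values, reporting overflow instead of wrapping.
// Returns true and stores the sum when it fits; returns false otherwise.
bool checked_add(std::int8_t a, std::int8_t b, std::int8_t& sum) {
    // int8_t operands are promoted to int, so this addition itself cannot overflow.
    const int wide = a + b;
    if (wide > std::numeric_limits<std::int8_t>::max() ||
        wide < std::numeric_limits<std::int8_t>::min())
        return false;
    sum = static_cast<std::int8_t>(wide);
    return true;
}

// The wrapped bit pattern from the slide: keep only the low 8 bits of the sum.
std::uint8_t wrapped_bits(std::int8_t a, std::int8_t b) {
    return static_cast<std::uint8_t>(a + b);  // conversion to unsigned is modulo 2^8
}
```

`checked_add(127, 1, sum)` returns false, while `wrapped_bits(127, 1)` is 0x80, the pattern 1000 0000 that reads as -128 in two's complement.
</script></section>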
<section data-markdown><script type="text/template">
## unsigned types
- C/C++ has *unsigned* types
- A 32-bit `int` has a range from $-2^{31} \rightarrow 2^{31}-1$
- That's -2.15 billion to +2.15 billion
- or maybe it's 64 bits...
- An `unsigned int` has a range from $0 \rightarrow 2^{32}-1$
- That's zero to +4.3 billion
- Still $2^{32}$ values, but all positive (and zero)
- Typically used for returning sizes of strings, vectors, etc.
</script></section>
<section>
<h2>A bit of an aside...</h2>
<p>This is an <i>actual</i> comment left by a graduating SEAS computer science major:</p>
<p> </p>
<blockquote class="fragment">
<p>I can't count the number of times I was taught how to count in binary</p>
</blockquote>
<p> </p>
<p class="fragment">So this material may appear again in the CS curriculum. Hopefully everybody will (eventually) learn it...</p>
</section>
</section>
<section>
<section id="reals" data-markdown><script type="text/template">
# Real<br>Representation
</script></section>
<section data-markdown><script type="text/template">
## Real Numbers
- 1/3
- π
- 0.1
- 3.333333333333 * 10<sup>-1</sup>
- √<span style="text-decoration:overline;"> 2 </span>
</script></section>
<section data-markdown><script type="text/template">
## Fixed Point
- Radix point is fixed at some position
![fixed point](images/03-numbers/fixed-point.png)
- Pros:
- Less computationally demanding
- Good for CPUs that don't have an FPU
- Number representation space is uniform
- Cons:
- Less flexible
- Rarely used on general-purpose hardware today
- Small range of values
</script></section>
<section data-markdown><script type="text/template">
## Floating Point, part 1
- Each number in scientific notation has four parts: -3.24 \* 10<sup>6</sup>
- Sign bit (1 is negative)
- Mantissa (the "value" of the number): 3.24 in the example above
- Always in the range 1.0 ≤ *m* < 10.0 for decimal numbers in scientific notation
- Always in the range 1.0 ≤ *m* < 2.0 in the binary representation
</script></section>
<section data-markdown><script type="text/template">
## Floating Point, part 2
- Each number in scientific notation has four parts: -3.24 \* 10<sup>6</sup>
- Base
- This is 10 for typical scientific notation, but 2 for computers
- As this is always known for floats and doubles, it is omitted
- Exponent (power of the base to multiply mantissa by): 6 in the example above
</script></section>
<section data-markdown><script type="text/template">
## Conversion overview
- We are going to convert a number such as -3.24 \* 10<sup>6</sup> to a format where:
- The mantissa is between 1 and 2 (not 1 and 10)
- The base is 2 (not 10)
- Example: 106 = 1.06 \* 10<sup>2</sup> = 1.65625 * 2<sup>6</sup>
- Once in that format, it is viable to encode in binary
</script></section>
<section>
<h2>IEEE 754 Floating Point<br>Single Precision (32 bits)</h2>
<ul>
<li>32 bits are split as follows:
<ul>
<li>bit 1: sign bit, 1 means negative (1 bit)</li>
<li>bits 2-9: exponent (8 bits)</li>
<li>bits 10-32: mantissa (23 bits)</li>
</ul>
</li>
<li>Exponent values:
<ul>
<li>0: zeros (and denormalized numbers)</li>
<li>1-254: exponent+127<ul>
<li>The value of 127 is called the <i>exponent offset</i> or the <i>bias</i></li></ul></li>
<li>255: infinities (including overflow results) and NaN</li>
</ul>
</li>
</ul>
<p> </p>
<p>\( \text{value} = (1-2 \cdot \text{sign}) \cdot \text{mantissa} \cdot 2^{\text{exponent}-127} \), where the mantissa includes the implicit leading 1</p>
</section>
<section>
<h2>Mantissa</h2>
<p class="center" style="font-size:90%">\( b_{1}b_{2}b_{3}b_{4}b_{5}b_{6}b_{7}b_{8}b_{9}b_{10}b_{11}b_{12}b_{13}b_{14}b_{15}b_{16}b_{17}b_{18}b_{19}b_{20}b_{21}b_{22}b_{23} \)</p>
<p> </p>
<p class="center">\( \text{mantissa} = 1.0 + \sum^{23}_{i=1} \frac{b_i}{2^i} \)</p>
<p> </p>
<ul>
<li>If the bits are all 1, what is the mantissa value?
<ul><li>1 + (almost) 1 = (almost) 2: the maximum value for the mantissa</li>
<li>Actually 2 - 2<sup>-23</sup>, about 1.99999988</li></ul></li>
<li>Minimum value is all 0's
<ul><li>That's a mantissa of 1.0</li></ul></li>
</ul>
</section>
<section data-markdown><script type="text/template">
## Converting a float from binary to decimal
- Example taken from [Wikipedia](http://en.wikipedia.org/wiki/Single_precision)
- 0x41c80000 (big-endian) = 0100 0001 1100 1000 0000 0000 0000 0000
- Sign bit: 0 (means it's positive)
- Exponent: 1000 0011<sub>b</sub> = 0x83 = 131<sub>d</sub>
- Exponent offset (aka bias): 2<sup>n-1</sup>-1 = 2<sup>(8-1)</sup>-1 = 127
- Subtracting 127 from the exponent value yields 4
- Which means multiply the mantissa by 2<sup>4</sup>=16
- Mantissa: 100 1000 0000 0000 0000 0000
</script></section>
<section data-markdown><script type="text/template">
## Converting a float from binary to decimal
- Mantissa: 100 1000 0000 0000 0000 0000
- Each bit represents (1/2)<sup>*n*</sup> for *n* from 1 on up
- Not 0 on up!
- This mantissa has the first and fourth bits set
- The '1' bits in positions 1 and 4 represent (1/2)<sup>1</sup> and (1/2)<sup>4</sup>
- That's 0.5 + 0.0625 = 0.5625
- Then add 1 to that to yield 1.5625
- Now multiply by 2<sup>*exponent*</sup>
- 1.5625 \* 2<sup>4</sup> = 1.5625 \* 16 = 25
- Thus, 0x41c80000 = 25.0<sub>d</sub>
</script></section>
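<section data-markdown><script type="text/template">
## Decoding a float by hand in code
The decoding steps above can be sketched in C++. The function name is hypothetical, and the sketch only handles normalized values (stored exponent 1 to 254).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode a single-precision bit pattern by hand, following the slide's steps.
// Only normalized values (stored exponent 1..254) are handled in this sketch.
double decode_float_bits(std::uint32_t bits) {
    const std::uint32_t sign     = bits >> 31;            // bit 1
    const std::uint32_t exponent = (bits >> 23) & 0xffu;  // bits 2-9
    const std::uint32_t fraction = bits & 0x7fffffu;      // bits 10-32
    assert(exponent >= 1 && exponent <= 254);

    // Mantissa = 1 + sum of b_i / 2^i, which equals 1 + fraction / 2^23.
    const double mantissa = 1.0 + static_cast<double>(fraction) / 8388608.0;
    const double magnitude = std::ldexp(mantissa, static_cast<int>(exponent) - 127);
    return sign ? -magnitude : magnitude;
}
```

`decode_float_bits(0x41c80000)` gives 25.0: the stored exponent is 131, the mantissa is 1.5625, and 1.5625 times 2 to the 4th is 25.
</script></section>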
<section data-markdown id="maxfloatvalue"><script type="text/template">
## IEEE floating point maximum<br>(finite) positive value
- The largest float has:
- 0 as the sign bit (it's positive)
- 254 as the exponent (1111 1110)
- 255 is reserved for infinities and overflows
- That exponent is 254-127 = 127
- All 1's for the mantissa
- Which yields *almost* 2
- (2 - 2<sup>-23</sup>) \* 2<sup>127</sup> ≈ 3.402823 \* 10<sup>38</sup>, just under 2<sup>128</sup>
</script></section>
<section data-markdown><script type="text/template">
## IEEE floating point minimum<br>(normalized) positive value
- The smallest normalized float has:
- 0 as the sign bit (it's positive)
- Binary 1 as the exponent (0000 0001)
- 0 is reserved for zeros (and denormalized numbers)
- That exponent is 1-127 = -126
- All 0's for the mantissa
- Which yields 1.0 as the mantissa
- 1 \* 2<sup>-126</sup> = 2<sup>-126</sup> ≈ 1.175494 \* 10<sup>-38</sup>
</script></section>
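<section data-markdown><script type="text/template">
## Float limits in code
The two extreme values can be rebuilt from their mantissa and exponent and compared with the constants in `<cfloat>`. A sketch that assumes `float` is an IEEE 754 single-precision type; the function names are invented.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Largest finite float: mantissa of all 1s (2 - 2^-23) times 2^127.
float largest_float() {
    return std::ldexp(2.0f - std::ldexp(1.0f, -23), 127);
}

// Smallest float with stored exponent 1: mantissa 1.0 times 2^-126.
float smallest_exponent_one_float() {
    return std::ldexp(1.0f, -126);
}
```

On an IEEE 754 platform these are expected to equal `FLT_MAX` and `FLT_MIN` respectively.
</script></section>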
<section data-markdown><script type="text/template">
## Not spatially uniform!
- Consider a numerical type that holds only 3 decimal digits
- Examples: 1.23, 12.3, 123, 1,230
- In scientific notation, respectively: 1.23\*10<sup>0</sup>, 1.23\*10<sup>1</sup>, 1.23\*10<sup>2</sup>, 1.23\*10<sup>3</sup>
- Consider the next highest number for each of them:
- Next highest in this numerical type, not of all real numbers
- 1.23 => 1.24; difference is 0.01
- 12.3 => 12.4; difference is 0.1
- 123 => 124; difference is 1
- 1,230 => 1,240; difference is 10
- Depending on the exponent, the difference between two successive numbers is not the same!
</script></section>
<section data-markdown><script type="text/template">
## Floating point numbers are<br>not spatially uniform
- Consider two positive floats
- Both have a mantissa with just the last bit set
- One has an exponent of -126 (stored as 1), the other 127 (stored as 254)
- Now flip the second to last mantissa bit in both numbers
- The number with the higher exponent will "gain" more than the number with a lower exponent
</script></section>
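<section data-markdown><script type="text/template">
## Measuring the gaps in code
The uneven spacing can be measured with `std::nextafter` from `<cmath>`, which returns the next representable value in a given direction. A minimal sketch, assuming IEEE 754 single precision; `gap_above` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>

// Gap between x and the next representable float above it.
float gap_above(float x) {
    return std::nextafter(x, INFINITY) - x;
}
```

Near 1.0 the gap is 2 to the power -23, while at 16777216 (2 to the power 24) the gap is already 2, so odd integers above that point cannot be stored in a float.
</script></section>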
<section data-markdown><script type="text/template">
## Converting a float<br>from decimal to binary
- Sign: 1 if negative, 0 if non-negative
- Exponent (*e*): find what power of 2 is required to bring the number to 1.0 <= *x* < 2.0
- If you have to multiply, then it's negative
- If you have to divide, then it's positive
- Add 127 to that value
- The mantissa is *f/2<sup>e</sup>*, where *f* is the absolute value of the number being converted
- Note that *e* is the exponent before adding 127 to it
- Subtract 1 from the mantissa (so that 0.0 ≤ *m* < 1.0)
- Convert to closest representation using powers of ½
- Similar to converting a base *n* number to base 10
</script></section>
<section data-markdown><script type="text/template">
## Converting a float
Source code: [float_to_hex.cpp](code/03-numbers/float_to_hex.cpp.html) ([src](code/03-numbers/float_to_hex.cpp)) (we want 32-bit pointers, so we compile with `-m32`)
```
#include <iostream>
using namespace std;
union foo {
float f;
int *x;
} bar;
int main() {
bar.f = 42.125;
cout << bar.x << endl; // prints the bits as hex, most significant digit first
return 0;
}
```
Output: 0x42288000
</script></section>
<section data-markdown><script type="text/template">
## Converting a float<br>from decimal to binary
- Take 42.125 (see [float_to_hex.cpp](code/03-numbers/float_to_hex.cpp.html) ([src](code/03-numbers/float_to_hex.cpp)))
- Sign is 0 (it's positive)
- Convert the float to a sum of powers of 2
- 42.125 = 32 + 8 + 2 + 1/8
- = 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup> + 2<sup>-3</sup>
- (details on the next slides)
- Convert to binary (base 2)
- 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup> + 2<sup>-3</sup> = 101010.001<sub>b</sub>
- Move the binary point to put the number in scientific notation
- 101010.001<sub>b</sub> = 1.01010001<sub>b</sub> * 2<sup>5</sup>
- It slides over 5 spots, so the exponent is 5
</script></section>
<section data-markdown><script type="text/template">
## Converting a number to powers of 2
- Consider 42.125
- For the integer portion (42):
- Repeatedly subtract the highest power of 2 that is less than or equal to the number
- 42 - 32 => 10 - 8 => 2 - 2 => 0
- Thus, 42 = 32 + 8 + 2 = 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup>
</script></section>
<section data-markdown><script type="text/template">
## Converting a number to powers of 2
- Consider 42.65625 (NOT the number on the previous slide)
- For the fractional portion (0.65625):
- Repeatedly subtract the *largest* power of 1/2 (the one with the smallest exponent) that is less than or equal to the number
- Often easiest done in rational form: 0.65625 = 21/32
- 21/32 - 16/32 (aka 1/2) => 5/32 - 4/32 (aka 1/8) => 1/32 - 1/32 => 0
- Thus, 0.65625 = 21/32 = 1/2 + 1/8 + 1/32 = 2<sup>-1</sup> + 2<sup>-3</sup> + 2<sup>-5</sup>
- If we consider 42.125 (the number on the previous slide), the fractional portion is just 0.125 = 1/8 = 2<sup>-3</sup>
</script></section>
<section data-markdown><script type="text/template">
## Converting a float<br>from decimal to binary (again)
- Take 42.125 (see [float_to_hex.cpp](code/03-numbers/float_to_hex.cpp.html) ([src](code/03-numbers/float_to_hex.cpp)))
- Sign is 0 (it's positive)
- Convert the float to a sum of powers of 2
- 42.125 = 32 + 8 + 2 + 1/8
- = 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup> + 2<sup>-3</sup>
- (details on the previous slides)
- Convert to binary (base 2)
- 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup> + 2<sup>-3</sup> = 101010.001<sub>b</sub>
- Move the binary point to put it in scientific notation
- 101010.001<sub>b</sub> = 1.01010001<sub>b</sub> * 2<sup>5</sup>
- It slides over 5 spots, so the exponent is 5
</script></section>
<section data-markdown><script type="text/template">
## Converting a float<br>from decimal to binary
- Take 1.<span class='skyblue'>01010001</span><sub>b</sub> * 2<sup><span class='red'>5</span></sup>
- Mantissa is the bits after the binary point
- Ignore the leading 1. Why?
- Append as many trailing zeros as necessary
- Exponent is <span class='red'>5</span>+bias = <span class='red'>5</span>+127 = 132 (<span class='green'>1000 0100</span><sub>b</sub>)
- In binary, then, it's:
- <span class='pink'>0</span><span class='green'>100 0010 0</span><span class="skyblue">010 1000 1000 0000 0000 0000</span>
- The <span class='pink'>sign</span>, <span class='green'>exponent</span> and <span class='skyblue'>mantissa</span> are colored differently for clarity
- In hex: 0x42288000
</script></section>
<section data-markdown><script type="text/template">
## Example
- 1/10 = 0.1 (Decimal)
- What is this in binary?
- 1/10 ≈ 1/16 + 1/32
- 1/16 + 1/32 = 3/32
- 3/32 is off from 1/10 by 0.2/32
- 0.2/32 = 2/320 ≈ 1/256 + 1/512
- But this does not exactly equal 0.1 yet...
</script></section>
<section data-markdown><script type="text/template">
## Dividing one by ten in binary
![long division](images/03-numbers/long-division.png)
Even common decimals like 0.1 cannot be represented exactly!
</script></section>
<section data-markdown><script type="text/template">
## Floating point precision in Java
What gets printed? (source: [FloatTest.java](code/03-numbers/FloatTest.java.html) ([src](code/03-numbers/FloatTest.java)))
```
class FloatTest {
    public static void main (String[] args) {
        // There are 10 0.1's in the next statement
        double y = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 +
                   0.1 + 0.1 + 0.1 + 0.1 + 0.1;
        System.out.println (y);
    }
}
```
Hint: it's not 1.0!
</script></section>
<section data-markdown><script type="text/template">
## Comparing floats & doubles
- Consider, in Java:
```
double a = 1;
double b = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
+ 0.1 + 0.1 + 0.1 + 0.1;
double c = .9999999999999999;
```
- Two true expressions!
- `c == b`
- `b != a`
- Two false expressions!
- `a == b`
- `b != c`
- Problem is the finite precision of floating-point types
- Instead, test for closeness with the ordering operators (`<`, `>`)
</script></section>
<section data-markdown><script type="text/template">
## How to solve this
- Don't compare floating-point values for exact equality if you can help it!
- Both doubles and floats
- Need to test if the two doubles are "close" in value
```
// Java code
final double EPSILON = 0.000001;
boolean foo = Math.abs (a-b) < EPSILON;
```
```
// C++: #include <cmath>
const double EPSILON = 0.000001;
bool foo = std::fabs (a-b) < EPSILON;
```
- A float has about 7 significant decimal digits of (printed) accuracy
- Two floats may still differ in the 8th digit
- Such a value prints as 1.0, but is really 0.9999999999
</script></section>
<section data-markdown><script type="text/template">
## Floating point rounding errors
- Add 0.33333333333 three times, and you don't get 1.0
- 1/3 cannot be represented via a (finite) decimal number
- The same holds for any denominator with prime factors that are not factors of the radix
- Decimal radix factors: 2 and 5
- Good: 1/25 (factors: 5, 5), 1/100 (factors: 2, 2, 5, 5), etc.
- Bad: 1/3, 1/7, 1/11, 1/12
- Same thing with 0.1 (and many others!) in binary
- Any denominator that has factors that are not factors of the radix (which is just 2!)
- I.e. 1/3, 1/5, 1/7, etc. - anything whose denominator is not a power of 2
</script></section>
<section>
<h2>Floating point rounding errors</h2>
<ul>
<li>0.1 is stored (as a 32 bit float) as
<ul><li>mantissa = 100 1100 1100 1100 1100 1101</li></ul></li>
<li>Or: \( \frac{1+\frac{1}{2^1}+\frac{1}{2^4}+\frac{1}{2^5}+\frac{1}{2^8}+\frac{1}{2^9}+\frac{1}{2^{12}}+\frac{1}{2^{13}}+\frac{1}{2^{16}}+\frac{1}{2^{17}}+\frac{1}{2^{20}}+\frac{1}{2^{21}}+\frac{1}{2^{23}}}{2^4} \)<br> </li>
<li>Which is exactly equal to:
<ul><li> 0.100000001490116119384765625</li>
<li>But only the first 7 digits, .1000000, are printed</li></ul></li>
<li>0.1 (exactly) is a finite number in decimal, but repeating in binary</li>
</ul>
</section>
<section data-markdown><script type="text/template">
## How to solve this
- Store number as a rational number, if possible
- Works for 1/3, 1/7, etc.
- Use more digits
- A "bigger" floating point number
- BigFloat (analogous to BigInteger)
- But these still have a finite number of digits!
</script></section>
<section data-markdown><script type="text/template">
## Real example: Patriot Missile
- Gulf War I (1990-1991)
- Failed to intercept incoming Iraqi Scud missile (Feb 25, 1991); both travel at about Mach 5
- 28 American soldiers killed
![patriot missile launch](images/03-numbers/patriot-missile.jpg)
- GAO report [here](http://www.fas.org/spp/starwars/gao/im92026.htm); image from [Wikipedia](http://en.wikipedia.org/wiki/File:Patriot_missile_launch_b.jpg)
</script></section>
<section data-markdown><script type="text/template">
## Patriot Design
- Intended to operate only for a few hours at a time
- But was left running for ~100 hours prior to incident
- Designed to defend Europe from Soviet weapons
- Four 24-bit fixed-point registers (1970s design)
- Meaning there were about 6 digits of precision after the decimal point
- Although a `float` has somewhat similar precision
- Kept time with integer counter: incremented every 1/10 sec
- To calculate speed of incoming missile to predict future positions:
- velocity = (loc<sub>1</sub> - loc<sub>0</sub>) / ((count<sub>1</sub> - count<sub>0</sub>) * 0.1)
- But cannot represent 0.1 exactly!
</script></section>
<section>
<h2>Floating Imprecision</h2>
<ul>
<li>24 bits in a fixed-point register:<br>
\( 0.1 \Rightarrow \frac{1+\frac{1}{2^1}+\frac{1}{2^4}+\frac{1}{2^5}+\frac{1}{2^8}+\frac{1}{2^9}+\frac{1}{2^{12}}+\frac{1}{2^{13}}+\frac{1}{2^{16}}+\frac{1}{2^{17}}}{2^4} \\
= 209715 / 2097152 \\
= 0.099999904632568359375 \)
<ul><li>Error is 0.2/2097152 = 1/10485760</li></ul></li>
<li>One hour = 3,600 seconds = 36,000 tenths of a second<ul>
<li>At 1/10485760 time error per tenth of a second, that yields 0.0034 seconds of time error per hour</li>
<li>100 hours: 0.34s</li></ul></li>
<li>The Scud missile was traveling at 1,676 m/s (Mach 4.93)<ul>
<li>1,676 m/s * 0.34 seconds = 570 meters (1,870 feet)</li>
<li>The Patriot thus missed the target by over a half a km</li></ul></li>
</ul>
</section>
<section data-markdown><script type="text/template">
## The bug fix...
> Two weeks before the incident, Army officials received Israeli data indicating some loss in accuracy after the system had been running for 8 consecutive hours.
> Consequently, Army officials modified the software to improve the system's accuracy.
> However, the modified software did not reach Dhahran until February 26, 1991 -- the day after the Scud incident.
>
> \- GAO Report
</script></section>
<section data-markdown><script type="text/template">
## Better Floats: More Bits
- IEEE 754 Double Precision (64 bits)
- A `float` has about 7 decimal places of accuracy, and a `double` has about 15
- 64 bits are split as follows:
- bit 1: sign bit, 1 means negative (1 bit)
- bits 2-12: exponent (11 bits)
- bits 13-64: mantissa (52 bits)
- Exponent offset (aka bias) is 2<sup>*e*-1</sup>-1 = 1023, where *e* = 11 is the number of exponent bits
- Single precision: 0.1 = 209715 / 2097152
- Error = 9.5 \* 10<sup>-8</sup> (at least 20 hours to miss target)
- Double precision:
- 0.1 = 56294995342131 / 562949953421312
- Error = 3.608 \* 10<sup>-16</sup> (2,172,375,450 ***years*** to miss)
</script></section>
<section data-markdown><script type="text/template">
## Other Floating Points
- IEEE 754r quad-precision format
- 128 bits (1 sign, 15 exponent, 112 mantissa)
- About 34 decimal places of accuracy
- IBM Floating Point ("Hexadecimal")
- Use more bits in fraction, fewer in exponent (7/24 and 7/56 instead of 8/23 and 11/52)
- Decimal Formats (IEEE 754d)
- 1 decimal digit into 4 binary digits
- Cowlishaw encoding:
- Exact representation of decimals (e.g., 0.1)
- 3 decimal digits (0-999) into 10 binary digits (0-1023) (24 wasted out of 1024)
</script></section>
<section data-markdown><script type="text/template">
## Smaller Floating Point