\chapter{Basic Probability Models}
\label{probcalc}
This chapter will introduce the general notions of probability. Most of
it will seem intuitive to you, but pay careful attention to the general
principles which are developed; in more complex settings intuition may
not be enough, and the tools discussed here will be very useful.
\section{ALOHA Network Example}
\label{aloha}
Throughout this book, we will be discussing both ``classical''
probability examples involving coins, cards and dice, and also examples
involving applications to computer science. The latter will involve
diverse fields such as data mining, machine learning, computer networks,
software engineering and bioinformatics.
In this section, an example from computer networks is presented which
will be used at a number of points in this chapter. Probability analysis
is used extensively in the development of new, faster types of networks.
Today's Ethernet evolved from an experimental network developed at the
University of Hawaii, called ALOHA. A number of network nodes would
occasionally try to use the same radio channel to communicate with a
central computer. The nodes couldn't hear each other, due to the
obstruction of mountains between them. If only one of them made an
attempt to send, it would be successful, and it would receive an
acknowledgement message in response from the central computer. But if
more than one node were to transmit, a {\bf collision} would occur,
garbling all the messages. The sending nodes would timeout after
waiting for an acknowledgement which never came, and try sending again
later. To avoid having too many collisions, nodes would engage in
random {\bf backoff}, meaning that they would refrain from sending for a
while even though they had something to send.
One variation is {\bf slotted} ALOHA, which divides time into intervals
which I will call ``epochs.'' Each epoch will have duration 1.0, so
epoch 1 extends from time 0.0 to 1.0, epoch 2 extends from 1.0 to 2.0
and so on. In the version we will consider here, in each epoch, if a
node is {\bf active}, i.e. has a message to send, it will either send or
refrain from sending, with probabilities p and 1-p, respectively. The value of p is set
by the designer of the network. (Real Ethernet hardware does something
like this, using a random number generator inside the chip.)
The other parameter q in our model is the probability that a node which
had been inactive generates a message during an epoch, i.e. the
probability that the user hits a key, and thus becomes ``active.''
Think of what happens when you are at a computer. You are not typing
constantly, and when you are not typing, the time until you hit a key
again will be random. Our parameter q models that randomness.
Let n be the number of nodes, which we'll assume for simplicity is two.
Assume also for simplicity that the timing is as follows. Arrival of a
new message happens in the middle of an epoch, and the decision as to
whether to send versus back off is made near the end of an epoch, say
90\% into the epoch.
For example, say that at the beginning of the epoch which extends from
time 15.0 to 16.0, node A has something to send but node B does not. At
time 15.5, node B will either generate a message to send or not, with
probability q and 1-q, respectively. Suppose B does generate a new
message. At time 15.9, node A will either try to send or refrain, with
probability p and 1-p, and node B will do the same. Suppose A refrains
but B sends. Then B's transmission will be successful, and at the start
of epoch 16 B will be inactive, while node A will still be active. On
the other hand, suppose both A and B try to send at time 15.9; both will
fail, and thus both will be active at time 16.0, and so on.
Be sure to keep in mind that in our simple model here, during the time a
node is active, it won't generate any additional new messages.
(Note: The definition of this ALOHA model is summarized concisely on
page \pageref{basicprobcomp}.)
Let's observe the network for two epochs, epoch 1 and epoch 2. Assume
that the network consists of just two nodes, called node 1 and node 2,
both of which start out active. Let $X_1$ and $X_2$ denote the numbers
of active nodes at the {\it very end} of epochs 1 and 2, {\it after possible
transmissions}. We'll take p to be 0.4 and q to be 0.8 in this example.
Let's find $P(X_1 = 2)$, the probability that $X_1 = 2$, and then get to
the main point, which is to ask what we really mean by this probability.
How could $X_1 = 2$ occur? There are two possibilities:
\begin{itemize}
\item both nodes try to send; this has probability $p^2$
\item neither node tries to send; this has probability $(1-p)^2$
\end{itemize}
Thus
\begin{equation}
\label{x12}
P(X_1 = 2) = p^2 + (1-p)^2 = 0.52
\end{equation}
\section{A ``Notebook'' View: the Notion of a Repeatable Experiment}
\label{repeatexpt}
It's crucial to understand what that 0.52 figure really means in a
practical sense. To this end, let's put the ALOHA example aside for a
moment, and consider the ``experiment'' consisting of rolling two dice,
say a blue one and a yellow one. Let X and Y denote the number of dots
we get on the blue and yellow dice, respectively, and consider the
meaning of $P(X+Y = 6) = \frac{5}{36}$.
\begin{table}
\begin{center}
\vskip 0.5in
\begin{tabular}{|c|c|c|c|c|c|}
\hline
1,1 & 1,2 & 1,3 & 1,4 & 1,5 & 1,6 \\
\hline
2,1 & 2,2 & 2,3 & 2,4 & 2,5 & 2,6 \\
\hline
3,1 & 3,2 & 3,3 & 3,4 & 3,5 & 3,6 \\
\hline
4,1 & 4,2 & 4,3 & 4,4 & 4,5 & 4,6 \\
\hline
5,1 & 5,2 & 5,3 & 5,4 & 5,5 & 5,6 \\
\hline
6,1 & 6,2 & 6,3 & 6,4 & 6,5 & 6,6 \\
\hline
\end{tabular}
\end{center}
\caption{Sample Space for the Dice Example}
\label{dicesamplespace}
\end{table}
In the mathematical theory of probability, we talk of a {\bf sample
space}, which (in simple cases) consists of the possible outcomes
$(X,Y)$, seen in Table \ref{dicesamplespace}. In a theoretical
treatment, we place weights of 1/36 on each of the points in the space,
reflecting the fact that each of the 36 points is equally likely, and
then say, ``What we mean by $P(X+Y = 6) = \frac{5}{36}$ is that the
outcomes (1,5), (2,4), (3,3), (4,2), (5,1) have total weight 5/36.''
Unfortunately, the notion of sample space becomes mathematically tricky
when developed for more complex probability models. Indeed, it requires
graduate-level math, called {\bf measure theory}.
And much worse, under the sample space approach, one loses all the
intuition. In particular, {\bf there is no good way using set theory to
convey the intuition underlying conditional probability} (to be
introduced in Section \ref{probdefs}). The same is true for expected
value, a central topic to be introduced in Section \ref{expval}.
In any case, most probability computations do not rely on explicitly
writing down a sample space. In this particular example it is useful
for us as a vehicle for explaining the concepts, but we will NOT use it
much. Those who wish to get a more theoretical grounding can get a
start in Section \ref{reconcil}.
But the intuitive notion---which is FAR more important---of what $P(X+Y
= 6) = \frac{5}{36}$ means is the following. Imagine doing the
experiment many, many times, recording the results in a large notebook:
\begin{itemize}
\item Roll the dice the first time, and write the outcome on the first
line of the notebook.
\item Roll the dice the second time, and write the outcome on the second
line of the notebook.
\item Roll the dice the third time, and write the outcome on the third
line of the notebook.
\item Roll the dice the fourth time, and write the outcome on the fourth
line of the notebook.
\item Imagine you keep doing this, thousands of times, filling
thousands of lines in the notebook.
\end{itemize}
\begin{table}
\begin{center}
\vskip 0.5in
\begin{tabular}{|l|l|l|}
\hline
notebook line & outcome & blue+yellow = 6? \\ \hline
\hline
1 & blue 2, yellow 6 & No \\ \hline
2 & blue 3, yellow 1 & No \\ \hline
3 & blue 1, yellow 1 & No \\ \hline
4 & blue 4, yellow 2 & Yes \\ \hline
5 & blue 1, yellow 1 & No \\ \hline
6 & blue 3, yellow 4 & No \\ \hline
7 & blue 5, yellow 1 & Yes \\ \hline
8 & blue 3, yellow 6 & No \\ \hline
9 & blue 2, yellow 5 & No \\ \hline
\end{tabular}
\end{center}
\caption{Notebook for the Dice Problem}
\label{dicenotebook}
\end{table}
The first 9 lines of the notebook might look like Table
\ref{dicenotebook}. Here 2/9 of these lines say Yes. But after many,
many repetitions, approximately 5/36 of the lines will say Yes. For
example, after doing the experiment 720 times, approximately
$\frac{5}{36} \times 720 = 100$ lines will say Yes.
This is what probability really is: In what fraction of the lines does
the event of interest happen? {\bf It sounds simple, but if you always
think about this ``lines in the notebook'' idea, probability problems
are a lot easier to solve.} And it is the fundamental basis of computer
simulation.
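
In fact, the notebook view translates directly into simulation code. Here
is a minimal sketch, in Python (the language choice is incidental, purely
for illustration); each iteration of the loop plays the role of one line
of the notebook:

\begin{verbatim}
import random

nreps = 100000  # number of "notebook lines"
yes = 0         # how many lines say Yes in the blue+yellow = 6 column
for rep in range(nreps):
    blue = random.randint(1, 6)    # roll the blue die
    yellow = random.randint(1, 6)  # roll the yellow die
    if blue + yellow == 6:
        yes += 1
print(yes / nreps)  # fraction of Yes lines; near 5/36 = 0.139 for large nreps
\end{verbatim}

The printed value is exactly the long-run fraction of Yes lines discussed
above.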
\section{Our Definitions}
\label{probdefs}
These definitions are intuitive, rather than rigorous math, but
intuition is what we need. Keep in mind that we are making
\underline{definitions} below, not listing properties.
\begin{itemize}
\item We assume an ``experiment'' which is (at least in concept)
repeatable. The experiment of rolling two dice is repeatable, and even
the ALOHA experiment is so. (We simply watch the network for a long
time, collecting data on pairs of consecutive epochs in which there are
two active stations at the beginning.) On the other hand, the
econometricians, in forecasting 2009, cannot ``repeat'' 2008. Yet all
of the econometricians' tools assume that events in 2008 were affected
by various sorts of randomness, and we think of repeating the experiment
in a conceptual sense.
\item We imagine performing the experiment a large number of times,
recording the result of each repetition on a separate line in a
notebook.
\item We say A is an {\bf event} for this experiment if it is a possible
boolean (i.e. yes-or-no) outcome of the experiment. In the above
example, here are some events:
\begin{itemize}
\item[*] X+Y = 6
\item[*] X = 1
\item[*] Y = 3
\item[*] X-Y = 4
\end{itemize}
\item A {\bf random variable} is a numerical outcome of the experiment,
such as X and Y here, as well as X+Y, 2XY and even sin(XY).
\item For any event of interest A, imagine a column for A in the
notebook. The $k^{th}$ line in the notebook, k = 1,2,3,..., will
say Yes or No, depending on whether A occurred or not during the
$k^{th}$ repetition of the experiment. For instance, we have such a
column in our table above, for the event A = \{blue+yellow = 6\}.
\item For any event of interest A, we define P(A) to be the long-run
fraction of lines with Yes entries.
\item For any events A, B, imagine a new column in our notebook,
labeled ``A and B.'' In each line, this column will say Yes if and
only if there are Yes entries for both A and B. P(A and B) is
then the long-run fraction of lines with Yes entries in the
new column labeled ``A and B.''\footnote{In most textbooks, what we call
``A and B'' here is written A$\cap$B, indicating the intersection of two
sets in the sample space. But again, we do not take a sample space
point of view here.}
\item For any events A, B, imagine a new column in our notebook,
labeled ``A or B.'' In each line, this column will say Yes if and
only if at least one of the entries for A and B says Yes.\footnote{In
the sample space approach, this is written A $\cup$ B.}
\item For any events A, B, imagine a new column in our notebook, labeled
``A $|$ B'' and pronounced ``A given B.'' In each line:
\begin{itemize}
\item[*] This new column will say ``NA'' (``not applicable'') if the B
entry is No.
\item[*] If it is a line in which the B column says Yes, then this
new column will say Yes or No, depending on whether the A column
says Yes or No.
\end{itemize}
\end{itemize}
Think of probabilities in this ``notebook'' context:
\begin{itemize}
\item P(A) means the long-run fraction of lines in the notebook in
which the A column says Yes.
\item P(A or B) means the long-run fraction of lines in the notebook in
which the A-or-B column says Yes.
\item P(A and B) means the long-run fraction of lines in the notebook in
which the A-and-B column says Yes.
\item P(A $|$ B) means the long-run fraction of lines in the notebook in
which the A $|$ B column says Yes---{\bf \Large among the lines which do
NOT say NA.}
\end{itemize}
{\bf A hugely common mistake is to confuse P(A and B) and P(A $|$ B).}
This is where the notebook view becomes so important. In the dice
example, compare the quantities $P(X = 1 {\rm ~and~} S = 6) =
\frac{1}{36}$ and $P(X = 1 | S = 6) = \frac{1}{5}$, where S =
X+Y:\footnote{Think of adding an S column to the notebook too.}
\begin{itemize}
\item After a large number of repetitions of the experiment,
approximately 1/36 of the lines of the notebook will have the property
that both X = 1 and S = 6 (since X = 1 and S = 6 is equivalent to X = 1
and Y = 5).
\item After a large number of repetitions of the experiment, if {\bf we
look only at the lines in which S = 6}, then {\bf among those lines},
approximately 1/5 of {\bf those lines} will show X = 1.
\end{itemize}
The quantity P(A$|$B) is called the {\bf conditional probability of A,
given B}.
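
To see the distinction concretely, here is a simulation sketch (Python
again, purely illustrative) that estimates both quantities from the same
notebook lines. Note that the conditional probability is computed only
over the non-NA lines, i.e. those in which S = 6:

\begin{verbatim}
import random

nreps = 100000
count_and = 0   # lines with X = 1 and S = 6
count_s6 = 0    # lines with S = 6 (the non-NA lines)
count_cond = 0  # among the S = 6 lines, those with X = 1
for rep in range(nreps):
    x = random.randint(1, 6)
    y = random.randint(1, 6)
    s = x + y
    if x == 1 and s == 6:
        count_and += 1
    if s == 6:
        count_s6 += 1
        if x == 1:
            count_cond += 1
print(count_and / nreps)      # P(X = 1 and S = 6), near 1/36 = 0.028
print(count_cond / count_s6)  # P(X = 1 | S = 6), near 1/5 = 0.200
\end{verbatim}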
Note that {\it and} has higher logical precedence than {\it or}. For
example, P(A and B or C) means P[(A and B) or C]. Also, {\it not} has
higher precedence than {\it and}.
Here are some more very important definitions and properties:
\begin{itemize}
\item
\begin{definition}
Suppose A and B are events such that it is impossible for them to
occur in the same line of the notebook. They are said to be {\bf
disjoint} events.
\end{definition}
\item If A and B are disjoint events, then
\begin{equation}
\label{disjointor}
P(A \textrm{ or } B) = P(A) + P(B)
\end{equation}
Again, this terminology {\it disjoint} stems from the set-theoretic
sample space approach, where it means that A $\cap$ B = $\emptyset$. That
mathematical terminology works fine for our dice example, but in my
experience people have major difficulty applying it correctly in more
complicated problems. This is another illustration of why I put so much
emphasis on the ``notebook'' framework.
By writing
\begin{equation}
\{ A \textrm{ or } B \textrm{ or } C \} =
\{ A \textrm{ or } [B \textrm{ or } C ] \}
\end{equation}
(\ref{disjointor}) can be iterated, e.g.
\begin{equation}
P(A \textrm{ or } B \textrm{ or } C) = P(A) + P(B) + P(C)
\end{equation}
\item If A and B are not disjoint, then
\begin{equation}
\label{genor}
P(A \textrm{ or } B) = P(A) + P(B) - P(A \textrm{ and } B)
\end{equation}
In the disjoint case, that subtracted term is 0, so (\ref{genor})
reduces to (\ref{disjointor}).
\item
\begin{definition}
Events A and B are said to be {\bf stochastically independent},
usually just stated as {\bf independent},\footnote{The term {\it
stochastic} is just a fancy synonym for {\it random}.} if
\end{definition}
\begin{equation}
\label{indepand}
P(A \textrm{ and } B) = P(A) \cdot P(B)
\end{equation}
\item In calculating an ``and'' probability, how does one know whether
the events are independent? The answer is that this will typically be
clear from the problem. If we toss the blue and yellow dice, for
instance, it is clear that one die has no impact on the other, so events
involving the blue die are independent of events involving the yellow
die. On the other hand, in the ALOHA example, it's clear that events
involving $X_1$ are NOT independent of those involving $X_2$.
\item If A and B are not independent, the equation (\ref{indepand})
generalizes to
\begin{equation}
\label{genand}
P(A \textrm{ and } B) = P(A) P(B|A)
\end{equation}
This should make sense to you. Suppose 30\% of all UC Davis students
are in engineering, and 20\% of all engineering majors are female. That
would imply that $0.30 \times 0.20 = 0.06$, i.e. 6\% of all UCD students are
female engineers.
Note that if A and B actually are independent, then $P(B|A) = P(B)$, and
(\ref{genand}) reduces to (\ref{indepand}).
Note too that (\ref{genand}) implies
\begin{equation}
\label{condba}
P(B|A) =
\frac
{ P(A \textrm{ and } B) }
{ P(A) }
\end{equation}
\end{itemize}
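
Since the dice sample space consists of just 36 equally likely points,
properties such as (\ref{indepand}) and (\ref{condba}) can also be checked
by direct enumeration rather than simulation. Here is a sketch in Python
(illustrative only), taking A to be the event $\{X = 1\}$ and B the event
$\{S = 6\}$:

\begin{verbatim}
from fractions import Fraction

# the 36 equally likely (blue, yellow) outcomes
outcomes = [(x, y) for x in range(1, 7) for y in range(1, 7)]

def prob(event):
    # exact probability: fraction of outcomes satisfying the event
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 1         # X = 1
B = lambda o: o[0] + o[1] == 6  # S = 6
pA, pB = prob(A), prob(B)
pAB = prob(lambda o: A(o) and B(o))
print(pAB)              # P(A and B) = 1/36
print(pAB / pA)         # P(B | A) = 1/6, the quotient P(A and B)/P(A)
print(pA * pB == pAB)   # False: here A and B are NOT independent
\end{verbatim}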
\section{``Mailing Tubes''}
\label{mailingtubes1}
{\it If I ever need to buy some mailing tubes, I can come here}---friend
of the author's, while browsing through an office supplies store
\bigskip
Examples of the above properties, e.g. (\ref{indepand}) and
(\ref{genand}), will be given starting in Section \ref{basicprobcomp}.
But first, a crucial strategic point in learning probability must be
addressed.
Some years ago, a friend of mine was in an office supplies store, and he
noticed a rack of mailing tubes. My friend made the remark shown above.
Well, (\ref{indepand}) and (\ref{genand}) are ``mailing tubes''---make a
mental note to yourself saying, ``If I ever need to find a probability
involving {\it and}, one thing I can try is (\ref{indepand}) and
(\ref{genand}).'' {\bf \LARGE Be ready for this!}
This mailing tube metaphor will be mentioned often, such as in Section
\ref{mailingtubes2}.
\section{Example: ALOHA Network}
\label{basicprobcomp}
Please keep in mind that the notebook idea is simply a vehicle to
help you understand what the concepts really mean. This is crucial for
your intuition and your ability to apply this material in the real
world. But the notebook idea is NOT for the purpose of calculating
probabilities. Instead, we use the properties of probability, as seen
in the following.
Let's look at all of this in the ALOHA context. Here's a summary:
\begin{itemize}
\item We have n network nodes, sharing a common communications channel.
\item Time is divided into epochs. $X_k$ denotes the number of active nodes at
the end of epoch k, which we will sometimes refer to as the {\bf state}
of the system in epoch k.
\item If two or more nodes try to send in an epoch, they collide, and
the message doesn't get through.
\item We say a node is active if it has a message to send.
\item If a node is active near the end of an epoch, it tries to
send with probability p.
\item If a node is inactive at the beginning of an epoch, then at the
middle of the epoch it will generate a message to send with probability
q.
\item In our examples here, we have n = 2 and $X_0 = 2$, i.e. both nodes
start out active.
\end{itemize}
Let's say the two nodes consist of terminals, at which John and Mary are
typing. Call the nodes Node John and Node Mary.
Now, in Equation (\ref{x12}) we found that
\begin{equation}
P(X_1 = 2) = p^2 + (1-p)^2 = 0.52
\end{equation}
How did we get this? Let $C_i$ denote the event that node i
tries to send, i = 1,2. Then using the definitions above, our steps
would be
\begin{eqnarray}
P(X_1 = 2)
&=& P(\underbrace{ C_1 {\rm ~and~} C_2 } {\rm ~or~ }
\underbrace{ {\rm ~not~} C_1 {\rm ~and~} ~{\rm not}~ C_2 } ) \label{a1} \\
&=& P(C_1 {\rm ~and~} C_2) + P(~{\rm not}~ C_1 {\rm ~and~} ~{\rm not}~ C_2)
\textrm{ (from (\ref{disjointor}))} \label{a2} \\
&=& P(C_1) P(C_2) + P(~{\rm not}~ C_1) P(~{\rm not}~ C_2)
\textrm{ (from (\ref{indepand}))} \label{a3} \\
&=& p^2 + (1-p)^2
\end{eqnarray}
(The underbraces in (\ref{a1}) do not represent some esoteric
mathematical operation. They are there simply to make the grouping
clearer, corresponding to events G and H defined below.)
Here are the reasons for these steps:
\begin{itemize}
\item[(\ref{a1}):] We listed the ways in which the event $\{X_1 = 2\}$
could occur.
\item[(\ref{a2}):] Write $G = {C_1 {\rm ~and~} C_2}$, $H = {D_1 {\rm
~and~} D_2}$, where $D_i = \textrm{not } C_i$, i = 1,2. Then the events G
and H are clearly disjoint; if in a given line of our notebook there is
a Yes for G, then definitely there will be a No for H, and vice versa.
\item[(\ref{a3}):] The two nodes act physically independently of each
other. Thus the events $C_1$ and $C_2$ are stochastically independent,
so we applied (\ref{indepand}). Then we did the same for $D_1$ and
$D_2$.
\end{itemize}
Now, what about $P(X_2 = 2)$? Again, we break big events down into
small events, in this case according to the value of $X_1$:
\begin{eqnarray}
\label{x2}
P(X_2 = 2)
&=& P(X_1 = 0 {\rm ~and~} X_2 = 2 \textrm{ or }
X_1 = 1 {\rm ~and~} X_2 = 2 \textrm{ or }
X_1 = 2 {\rm ~and~} X_2 = 2) \nonumber \\
&=& P(X_1 = 0 {\rm ~and~} X_2 = 2) \\
&+& P(X_1 = 1 {\rm ~and~} X_2 = 2) \nonumber \\
&+& P(X_1 = 2 {\rm ~and~} X_2 = 2) \nonumber
\end{eqnarray}
Since $X_1$ cannot be 0, that first term, $P(X_1 = 0 {\rm ~and~} X_2 =
2)$ is 0. To deal with the second term, $P(X_1 = 1 {\rm ~and~} X_2 =
2)$, we'll use (\ref{genand}). Due to the time-sequential nature of our
experiment here, it is natural (but certainly not ``mandated,'' as we'll
often see situations to the contrary) to take A and B to be $\{X_1 =
1\}$ and $\{X_2 = 2\}$, respectively. So, we write
\begin{equation}
\label{x1x2}
P(X_1 = 1 {\rm ~and~} X_2 = 2) = P(X_1 = 1) P(X_2 = 2 | X_1 = 1)
\end{equation}
To calculate $P(X_1 = 1)$, we use the same kind of reasoning as in
Equation (\ref{x12}). For the event in question to occur, either Node
John would send and Node Mary wouldn't, or John would refrain from sending
and Node Mary would send. Thus
\begin{equation}
\label{px11}
P(X_1 = 1) = 2p(1-p) = 0.48
\end{equation}
Now we need to find $P(X_2 = 2 | X_1 = 1)$. This again involves
breaking big events down into small ones. If $X_1 = 1$, then $X_2 = 2$
can occur only if {\it both} of the following occur:
\begin{itemize}
\item Event A: Whichever node was the one to successfully transmit
during epoch 1---and we are given that there indeed was one, since $X_1
= 1$---now generates a new message.
\item Event B: During epoch 2, no successful transmission occurs, i.e.
either they both try to send or neither tries to send.
\end{itemize}
Recalling the definitions of p and q in Section \ref{aloha},
we have that
\begin{equation}
P(X_2 = 2 | X_1 = 1) = q[p^2+(1-p)^2] = 0.41
\end{equation}
Thus $P(X_1 = 1 {\rm ~and~} X_2 = 2) = 0.48 \times 0.41 = 0.20$.
We go through a similar analysis for $P(X_1 = 2 {\rm ~and~} X_2 = 2)$:
We recall that $P(X_1 = 2) = 0.52$ from before, and find that $P(X_2 = 2
| X_1 = 2) = 0.52$ as well. So we find $P(X_1 = 2 {\rm ~and~} X_2 = 2)$
to be $0.52^2 = 0.27$. Putting all this together, we find that $P(X_2 =
2) = 0.47$.
Let's do another; let's find $P(X_1 = 1 | X_2 = 2)$. [Pause a minute
here to make sure you understand that this is quite different from
$P(X_2 = 2 | X_1 = 1)$.] From (\ref{condba}), we know that
\begin{equation}
\label{x1givx2}
P(X_1 = 1 | X_2 = 2) = \frac
{P(X_1 = 1 \textrm{ and } X_2 = 2)}
{P(X_2 = 2)}
\end{equation}
We computed both numerator and denominator here before, in Equations
(\ref{x1x2}) and (\ref{x2}), so we see that $P(X_1 = 1 | X_2 = 2) =
0.20/0.47 = 0.43$.
So, in our notebook view, if we were to look only at lines in the
notebook for which $X_2 = 2$, a fraction 0.43 of {\it those lines}
would have $X_1 = 1$.
You might be bothered that we are looking ``backwards in time'' in
(\ref{x1givx2}), kind of guessing the past from the present. There is
nothing wrong or unnatural about that. Jurors in court trials do it all
the time, though presumably not with formal probability calculation.
And evolutionary biologists do use formal probability models to guess
the past.
And one more calculation: $P(X_1 = 2 \textrm{ or } X_2 = 2)$. From
(\ref{genor}),
\begin{equation}
P(X_1 = 2 \textrm{ or } X_2 = 2) =
P(X_1 = 2) +
P(X_2 = 2) -
P(X_1 = 2 \textrm{ and } X_2 = 2)
\end{equation}
Luckily, we've already calculated all three probabilities on the
right-hand side to be 0.52, 0.47 and 0.27, respectively. Thus
$P(X_1 = 2 \textrm{ or } X_2 = 2) = 0.72$.
Note by the way that events involving $X_2$ are NOT independent of those
involving $X_1$. For instance, we found in (\ref{x1givx2}) that
\begin{equation}
P(X_1 = 1 | X_2 = 2) = 0.43
\end{equation}
yet from (\ref{px11}) we have
\begin{equation}
P(X_1 = 1) = 0.48.
\end{equation}
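
All of the probabilities computed in this section can be checked with a
notebook-style simulation. Here is a minimal sketch in Python (illustrative
only; the epoch dynamics below are my encoding of the model summary at the
start of this section, with p = 0.4 and q = 0.8):

\begin{verbatim}
import random

p, q = 0.4, 0.8

def epoch(active):
    # one epoch; active is a list of two booleans, the node states
    # mid-epoch: each inactive node generates a message with probability q
    active = [a or random.random() < q for a in active]
    # near the end: each active node attempts to send with probability p
    sends = [a and random.random() < p for a in active]
    if sum(sends) == 1:                  # exactly one sender: success
        active[sends.index(True)] = False
    return active                        # collision or silence: no change

nreps = 100000
n1 = n2 = nand = ncond = nor = 0
for rep in range(nreps):
    a = epoch([True, True])  # both nodes start out active
    x1 = sum(a)              # X1, number active at the end of epoch 1
    x2 = sum(epoch(a))       # X2, number active at the end of epoch 2
    n1 += (x1 == 2)
    n2 += (x2 == 2)
    nand += (x1 == 2 and x2 == 2)
    ncond += (x1 == 1 and x2 == 2)
    nor += (x1 == 2 or x2 == 2)
print(n1 / nreps)    # P(X1 = 2), near 0.52
print(n2 / nreps)    # P(X2 = 2), near 0.47
print(nand / nreps)  # P(X1 = 2 and X2 = 2), near 0.27
print(ncond / n2)    # P(X1 = 1 | X2 = 2), near 0.43
print(nor / nreps)   # P(X1 = 2 or X2 = 2), near 0.72
\end{verbatim}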
\section{Example: Dice and Conditional Probability}
Note in particular that $P(B | A)$ and $P(A | B)$ are completely
different quantities. The first restricts attention to lines of the
notebook in which A occurs, while for the second we look at lines in
which B occurs.
Suppose two dice are rolled, resulting in the random variables X and Y.
Let S be the sum X+Y, and let T denote the number of dice having an
even number of dots, 0, 1 or 2. Suppose we hear that S = 12, and we know
nothing else. Then we know that T must be 2. In other words,
\begin{equation}
P(T = 2 ~|~ S = 12) = 1
\end{equation}
On the other hand, if we hear instead that T = 2, that does {\it not}
imply that S is 12, i.e.
\begin{equation}
P(S = 12 ~|~ T = 2) < 1
\end{equation}
In other words
\begin{equation}
P(S = 12 ~|~ T = 2) \neq
P(T = 2 ~|~ S = 12)
\end{equation}
(In fact the reader should show that $P(S = 12 ~|~ T = 2) = 1/9$.)
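
Here is an outline of that computation, as a quick application of
(\ref{condba}): the event T = 2 comprises the nine equally likely outcomes
in which both dice are even, and only (6,6) among them yields S = 12, so

\begin{equation}
P(S = 12 ~|~ T = 2) = \frac
{P(T = 2 \textrm{ and } S = 12)}
{P(T = 2)}
= \frac{1/36}{9/36} = \frac{1}{9}
\end{equation}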
\section{ALOHA in the Notebook Context}
Think of doing the ALOHA ``experiment'' many, many times.
\begin{itemize}
\item Run the network for two epochs, starting with both nodes active,
the first time, and write the outcome on the first
line of the notebook.
\item Run the network for two epochs, starting with both nodes active,
the second time, and write the outcome on the second line of the
notebook.
\item Run the network for two epochs, starting with both nodes active,
the third time, and write the outcome on the third line of the notebook.
\item Run the network for two epochs, starting with both nodes active,
the fourth time, and write the outcome on the fourth line of the
notebook.
\item Imagine you keep doing this, thousands of times, filling
thousands of lines in the notebook.
\end{itemize}
The first seven lines of the notebook might look like Table
\ref{alohanotebook}. We see that:
\begin{table}
\begin{center}
\vskip 0.5in
\begin{tabular}{|l|l|l|l|l|}
\hline
notebook line &
$X_1 = 2$ &
$X_2 = 2$ &
$X_1 = 2$ and $X_2 = 2$ &
$X_2 = 2 | X_1 = 2$ \\ \hline
\hline
1 & Yes & No & No & No \\ \hline
2 & No & No & No & NA \\ \hline
3 & Yes & Yes & Yes & Yes \\ \hline
4 & Yes & No & No & No \\ \hline
5 & Yes & Yes & Yes & Yes \\ \hline
6 & No & No & No & NA \\ \hline
7 & No & Yes & No & NA \\ \hline
\end{tabular}
\end{center}
\caption{Top of Notebook for Two-Epoch ALOHA Experiment}
\label{alohanotebook}
\end{table}
\begin{itemize}
\item Among those first seven lines in the notebook, 4/7 of them have
$X_1 = 2$. After many, many lines, this fraction will be
approximately 0.52.
\item Among those first seven lines in the notebook, 3/7 of them have
$X_2 = 2$. After many, many lines, this fraction will be
approximately 0.47.\footnote{Don't make anything of the fact that these
probabilities nearly add up to 1.}
\item Among those first seven lines in the notebook, 2/7 of them have
$X_1 = 2 {\rm ~and~ } X_2 = 2$. After many, many lines, this fraction
will be approximately 0.27.
\item Among the first seven lines in the notebook, four of them do not
say NA in the $X_2 = 2 | X_1 = 2$ column. {\bf Among these four lines},
two say Yes, a fraction of 2/4. After many, many lines, this
fraction will be approximately 0.52.
\end{itemize}
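
A simulation can even generate such a notebook table itself. The following
sketch (Python, illustrative only; epoch() encodes the same one-epoch
dynamics as the sketch at the end of Section \ref{basicprobcomp}) prints
lines in the format of Table \ref{alohanotebook}, NA entries included:

\begin{verbatim}
import random

p, q = 0.4, 0.8

def epoch(active):
    # one epoch of the two-node model, as in the earlier sketch
    active = [a or random.random() < q for a in active]
    sends = [a and random.random() < p for a in active]
    if sum(sends) == 1:
        active[sends.index(True)] = False
    return active

def yn(b):
    return "Yes" if b else "No"

for line in range(1, 8):
    a = epoch([True, True])
    x1 = sum(a)
    x2 = sum(epoch(a))
    cond = yn(x2 == 2) if x1 == 2 else "NA"  # the X2 = 2 | X1 = 2 column
    print(line, yn(x1 == 2), yn(x2 == 2), yn(x1 == 2 and x2 == 2), cond)
\end{verbatim}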
\section{A Note on Modeling}
Here is a question to test your understanding of the ALOHA model---not
the calculation of probabilities, but rather the meaning of the model
itself. What kinds of properties are we trying to capture in the model?
So, consider this question:
\begin{quote}
Consider the ALOHA network model. Say we have two such networks, A and
B. In A, the network typically is used for keyboard input, such as a
user typing e-mail or editing a file. But in B, users tend to do a lot
of file uploading, not just typing. Fill in the blanks: In B, the
model parameter \_\_\_\_\_\_\_ is \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ than in
A, and in order to accommodate this, we should set the parameter
\_\_\_\_\_\_\_\_ to be relatively \_\_\_\_\_\_\_\_\_\_ in B.
\end{quote}
In network B we have heavy traffic. Thus, when a node becomes idle, it
is quite likely to have a new message to send right away.\footnote{The
file is likely read in chunks called disk {\it sectors}, so there may be
a slight pause between the uploading of chunks. Our model here is too
coarse to reflect such things.} Thus q is large.
That means we need to be especially worried about collisions, so we
probably should set p to a low value.
\section{Solution Strategies}
The example in Section \ref{basicprobcomp} shows typical strategies in
exploring solutions to probability problems, such as:
\begin{itemize}
\item Name what seem to be the important variables and events, in this case
$X_1$, $X_2$, $C_1$, $C_2$ and so on.
\item Write the given probability in terms of those named variables,
e.g.
\begin{equation}
P(X_1 = 2)
= P(\underbrace{ C_1 {\rm ~and~} C_2 } {\rm ~or~ }
\underbrace{ {\rm ~not~} C_1 {\rm ~and~} ~{\rm not}~ C_2 } )
\end{equation}
above.
\item Ask the famous question, ``How can it happen?''
Break big events down into small events; in the above case
the event $X_1 = 2$ can happen if $C_1
\textrm{ and }
C_2
\textrm{ or }
\textrm{ not }
C_1
\textrm{ and }
\textrm{ not }
C_2$.
\item But when you do break things down like this, make sure to neither
expand nor contract the scope of the probability. Say you write
something like
\begin{equation}
P(A) = P(B)
\end{equation}
where B might be some complicated event expression such as in the
right-hand side of (\ref{a1}). Make SURE that A and B are
equivalent---meaning that in every notebook line in which A occurs, then
B also occurs, and {\it vice versa}.
\item Do not write/think nonsense. For example: the expression ``P(A)
or P(B)'' is nonsense---do you see why? Probabilities are numbers, not
boolean expressions, so ``P(A) or P(B)'' is like saying, ``0.2 or
0.5''---meaningless!
Similarly, say we have a random variable X. The ``probability'' P(X) is
invalid. Say X is the number of dots we get when we roll a single die.
Then P(X) would mean ``the probability that the number of dots,'' which
is nonsense English! P(X = 3) is valid, but P(X) is meaningless.
Please note that = is not like a comma, or equivalent to the English
word {\it therefore}. It needs a left side and a right side; ``a = b''
makes sense, but ``= b'' doesn't.
\item Similarly, don't use ``formulas'' that you didn't learn and that
are in fact false. For example, in an expression involving a random
variable X, one can NOT replace X by its mean. (How would you like it
if your professor were to lose your exam, and then tell you, ``Well,
I'll just assign you a score that is equal to the class mean''?)
\item Adhere to convention! Use capital letters for random variables
and names of events. Use P() notation, not p() (which will mean
something else in this book).
\item In the beginning of your learning probability methods,
meticulously write down all your steps, with reasons, as in the
computation of $P(X_1 = 2)$ in Equations (\ref{a1})ff. After you gain
more experience, you can start skipping steps, but not in the initial
learning period.
\item Solving probability problems---and even more so, building useful
probability models---is like computer programming: It's a creative
process.
One can NOT---repeat, NOT---teach someone how to write programs. All
one can do is show the person how the basic building blocks work, such
as loops, if-else and arrays, then show a number of examples. But the
actual writing of a program is a creative act, not formula-based. The
programmer must creatively combine the various building blocks to
produce the desired result. The teacher cannot teach the student how to
do this.
The same is true for solving probability problems. The basic building
blocks were presented above in Section \ref{basicprobcomp}, and many
more ``mailing tubes'' will be presented in the rest of this book. But
it is up to the student to try using the various building blocks in a
way that solves the problem. Sometimes use of one block may prove to be
unfruitful, in which case one must try other blocks.
For instance, in using probability formulas like P(A and B) = P(A)
P(B$|$A), there is no magic rule as to how to choose A and B.
Moreover, if you need P(B$|$A), there is no magic rule on how to find
it. On the one hand, you might calculate it from (\ref{condba}), as we
did in (\ref{x1givx2}), but on the other hand you may be able to reason
out the value of P(B$|$A), as we did following (\ref{px11}). Just try
some cases until you find one that works, in the sense that you can
evaluate both factors. It's the same as trying various programming
ideas until you find one that works.
\end{itemize}
% \section{Example: Divisibility of Random Integers}
%
% Suppose at step i we generate a random integer between 1 and 1000, and
% check whether it's evenly divisible by i, i = 5,4,3,2,1. Let N denote
% the number of steps needed to reach an evenly divisible number.
%
% Let's find P(N = 2). Let q(i) denote the fraction of numbers in
% 1...,1000 that are evenly divisible by i, so that for instance q(5) =
% 200/1000 = 1/5 while q(3) = 333/1000. Let's label the steps 5,4,..., so
% that the first step is number 5. Then since the random numbers are
% independent from step to step, we have
%
% \begin{eqnarray}
% \label{divis}
% P(N = 2) &=& P(\textrm{fail in step 5 and succeed in step 4}) ~~~~
% \textrm{(``How can it happen?'')}\\
% &=& P(\textrm{fail in step 5) ~ P(succeed in step 4 $|$ fail in step 5)} ~~~~
% ((\ref{genand})) \\
% &=& [1-q(5)] q(4) \\
% &=& \frac{4}{5} \cdot \frac{1}{4} \\
% &=& \frac{1}{5}
% \end{eqnarray}
%
% But there's more.
%
% First, note that q(i) is either equal or approximately equal to 1/i.
% Then following the derivation in (\ref{divis}), you'll find that
%
% \begin{equation}
% P(N = j) \approx \frac{1}{5}
% \end{equation}
%
% for ALL j in 1,...,5.
%
% That may seem counterintuitive. Yet the example here is in essence the
% same as one found as an exercise in many textbooks on probability:
%
% \begin{quote}
% A man has five keys. He knows one of them opens a given lock, but he
% doesn't know which. So he tries the keys one at a time until he finds
% the right one. Find P(N = j), j = 1,...,5, where N is the number of
% keys he tries until he succeeds.
% \end{quote}
%
% Here too the answer is 1/5 for all j. But this one makes intuitive
% sense: Each of the keys has chance 1/5 of being the right key, so each
% of the values 1,...,5 is equally likely for N.
%
% This is then an example of the fact that sometimes we can gain insight
% into one problem by considering a mathematically equivalent problem in a
% quite different setting.
%
\section{Example: A Simple Board Game}
\label{boardgame}
Consider a board game, which for simplicity we'll assume consists of two
squares per side, on four sides. A player's token advances around the
board. The squares are numbered 0-7, and play begins at square 0. The
token advances according to the roll of a single die. If a player lands
on square 3, he/she gets a bonus turn.
In most problems like this, {\bf it is greatly helpful to first name the
quantities or events involved}. Toward that end, let R denote the
player's first roll, and let B be his bonus if there is one, with B
being set to 0 if there is no bonus.
Let's find the probability that a player has yet to make a complete
circuit of the board---i.e. has not yet reached or passed 0---after the
first turn (including the bonus, if any).
\begin{eqnarray}
P(\textrm{doesn't reach or pass 0}) &=& P(R+B \leq 7) \\
&=& P(R \leq 6, R \neq 3 \textrm{ or } R = 3, B \leq 4) \\
&=& P(R \leq 6, R \neq 3) + P(R = 3, B \leq 4) \\
&=& P(R \leq 6, R \neq 3) + P(R = 3) ~ P(B \leq 4 ~|~ R = 3) \\
&=& \frac{5}{6} + \frac{1}{6} \cdot \frac{4}{6} \\
&=& \frac{17}{18}
\end{eqnarray}
(Above we have written commas as a shorthand notation for {\it and}, a
common abbreviation.)
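
As usual, the result can be checked against the notebook view by
simulation. A minimal sketch in Python (illustrative only):

\begin{verbatim}
import random

def first_turn():
    # one first turn: the roll R, plus a bonus roll B if the token
    # lands on square 3, i.e. if R = 3; B = 0 means no bonus
    r = random.randint(1, 6)
    b = random.randint(1, 6) if r == 3 else 0
    return r + b

nreps = 100000
count = sum(first_turn() <= 7 for rep in range(nreps))
print(count / nreps)  # P(R + B <= 7), near 17/18 = 0.944
\end{verbatim}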
Now here is a very subtle issue. The events $R = 3$ and $B \leq 4$
are \textit{not} independent. At first it would seem that they're
independent, as we are rolling the die twice, and successive rolls are
independent. That's true, the rolls are independent, but remember, $B$ can
be 0 (no bonus), and the event $B \leq 4$ includes that 0 case. And if
we know $R = 3$, then we know for sure that $B$ cannot be 0, so we see