forked from kodecocodes/swift-algorithm-club
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
6 changed files
with
321 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,172 @@ | ||
# Selection Sampling | ||
|
||
Goal: Select *k* items at random from a collection of *n* items. | ||
|
||
Let's say you have a deck of 52 playing cards and you need to draw 10 cards at random. This algorithm lets you do that. | ||
|
||
Here's a very fast version: | ||
|
||
```swift | ||
func select<T>(from a: [T], count k: Int) -> [T] { | ||
var a = a | ||
for i in 0..<k { | ||
let r = random(min: i, max: a.count - 1) | ||
if i != r { | ||
swap(&a[i], &a[r]) | ||
} | ||
} | ||
return Array(a[0..<k]) | ||
} | ||
``` | ||
|
||
As often happens with these [kinds of algorithms](../Shuffle/), it divides the array into two regions. The first region is the selected items; the second region is all the remaining items. | ||
|
||
Here's an example. Let's say the array is: | ||
|
||
[ "a", "b", "c", "d", "e", "f", "g" ] | ||
|
||
We want to select 3 items, so `k = 3`. In the loop, `i` is initially 0, so it points at `"a"`. | ||
|
||
[ "a", "b", "c", "d", "e", "f", "g" ] | ||
i | ||
|
||
We calculate a random number between `i` and `a.count`, the size of the array. Let's say this is 4. Now we swap `"a"` with `"e"`, the element at index 4, and move `i` forward: | ||
|
||
[ "e" | "b", "c", "d", "a", "f", "g" ] | ||
i | ||
|
||
The `|` bar shows the split between the two regions. `"e"` is the first element we've selected. Everything to the right of the bar we still need to look at. | ||
|
||
Again, we ask for a random number between `i` and `a.count`, but because `i` has shifted, the random number can never be less than 1. So we'll never again swap `"e"` with anything. | ||
|
||
Let's say the random number is 6 and we swap `"b"` with `"g"`: | ||
|
||
[ "e" , "g" | "c", "d", "a", "f", "b" ] | ||
i | ||
|
||
One more random number to pick, let's say it is 4 again. We swap `"c"` with `"a"` to get the final selection on the left: | ||
|
||
[ "e", "g", "a" | "d", "c", "f", "b" ] | ||
|
||
And that's it. Easy peasy. The performance of this function is **O(k)** because as soon as we've selected *k* elements, we're done. | ||
|
||
However, there is one downside: this algorithm does not keep the elements in the original order. In the input array `"a"` came before `"e"` but now it's the other way around. If that is an issue for your app, you can't use this particular method. | ||
|
||
Here is an alternative approach that does keep the original order intact, but is a little more involved: | ||
|
||
```swift | ||
func select<T>(from a: [T], count requested: Int) -> [T] { | ||
var examined = 0 | ||
var selected = 0 | ||
var b = [T]() | ||
|
||
while selected < requested { // 1 | ||
examined += 1 | ||
|
||
let r = Double(arc4random()) / 0x100000000 // 2 | ||
|
||
let leftToExamine = a.count - examined + 1 // 3 | ||
let leftToAdd = requested - selected | ||
|
||
if Double(leftToExamine) * r < Double(leftToAdd) { // 4 | ||
selected += 1 | ||
b.append(a[examined - 1]) | ||
} | ||
} | ||
return b | ||
} | ||
``` | ||
|
||
This algorithm uses probability to decide whether to include a number in the selection or not. | ||
|
||
1. The loop steps through the array from beginning to end. It keeps going until we've selected *k* items from our set of *n*. Here, *k* is called `requested` and *n* is `a.count`. | ||
|
||
2. Calculate a random number between 0 and 1. We want `0.0 <= r < 1.0`. The higher bound is exclusive; we never want it to be exactly 1. That's why we divide the result from `arc4random()` by `0x100000000` instead of the more usual `0xffffffff`. | ||
|
||
3. `leftToExamine` is how many items we still haven't looked at. `leftToAdd` is how many items we still need to select before we're done. | ||
|
||
4. This is where the magic happens. Basically, we're flipping a coin. If it was heads, we add the current array element to the selection; if it was tails, we skip it. | ||
|
||
Interestingly enough, even though we use probability, this approach always guarantees that we end up with exactly *k* items in the output array. | ||
|
||
Let's walk through the same example again. The input array is: | ||
|
||
[ "a", "b", "c", "d", "e", "f", "g" ] | ||
|
||
The loop looks at each element in turn, so we start at `"a"`. We get a random number between 0 and 1, let's say it is 0.841. The formula at `// 4` multiplies the number of items left to examine with this random number. There are still 7 elements left to examine, so the result is: | ||
|
||
7 * 0.841 = 5.887 | ||
|
||
We compare this to 3 because we wanted to select 3 items. Since 5.887 is greater than 3, we skip `"a"` and move on to `"b"`. | ||
|
||
Again, we get a random number, let's say 0.212. Now there are only 6 elements left to examine, so the formula gives: | ||
|
||
6 * 0.212 = 1.272 | ||
|
||
This *is* less than 3 and we add `"b"` to the selection. This is the first item we've selected, so two left to go. | ||
|
||
On to the next element, `"c"`. The random number is 0.264, giving the result: | ||
|
||
5 * 0.264 = 1.32 | ||
|
||
There are only 2 elements left to select, so this number must be less than 2. It is, and we also add `"c"` to the selection. The total selection is `[ "b", "c" ]`. | ||
|
||
Only one item left to select but there are still 4 candidates to look at. Suppose the next random number is 0.718. The formula now gives: | ||
|
||
4 * 0.718 = 2.872 | ||
|
||
For this element to be selected the number has to be less than 1, since there is only 1 element left to be picked. It isn't, so we skip `"d"`. Only three possibilities left -- will we make it? | ||
|
||
The random number is 0.346. The formula gives: | ||
|
||
3 * 0.346 = 1.038 | ||
|
||
Just a tiny bit too high. We skip `"e"`. Only two candidates left... | ||
|
||
Note that now literally we're dealing with a toin coss: if the random number is less than 0.5 we select `"f"` and we're done. If it's greater than 0.5, we go on to the final element. Let's say we get 0.583: | ||
|
||
2 * 0.583 = 1.166 | ||
|
||
We skip `"f"` and look at the very last element. Whatever random number we get here, it should always select `"g"` or we won't have selected enough elements and the algorithm doesn't work! | ||
|
||
Let's say our final random number is 0.999 (remember, it can never be 1.0 or higher). Actually, no matter what we choose here, the formula will always give a value less than 1: | ||
|
||
1 * 0.999 = 0.999 | ||
|
||
And so the last element will always be chosen if we didn't have a big enough selection yet. The final selection is `[ "b", "c", "g" ]`. Notice that the elements are still in their original order, because we examined the array from left to right. | ||
|
||
Maybe you're not convinced yet... What if we always got 0.999 as the random value, would that still select 3 items? Well, let's do the math: | ||
|
||
7 * 0.999 = 6.993 is this less than 3? no | ||
6 * 0.999 = 5.994 is this less than 3? no | ||
5 * 0.999 = 4.995 is this less than 3? no | ||
4 * 0.999 = 3.996 is this less than 3? no | ||
3 * 0.999 = 2.997 is this less than 3? YES | ||
2 * 0.999 = 1.998 is this less than 2? YES | ||
1 * 0.999 = 0.999 is this less than 1? YES | ||
|
||
It always works! But does this mean that elements closer to the end of the array have a higher probability of being chosen than those in the beginning? Nope, all elements are equally likely to be selected. (Don't take my word for it: see the playground for a quick test that shows this in practice.) | ||
|
||
Here's an example of how to test this algorithm: | ||
|
||
```swift | ||
let input = [ | ||
"there", "once", "was", "a", "man", "from", "nantucket", | ||
"who", "kept", "all", "of", "his", "cash", "in", "a", "bucket", | ||
"his", "daughter", "named", "nan", | ||
"ran", "off", "with", "a", "man", | ||
"and", "as", "for", "the", "bucket", "nan", "took", "it", | ||
] | ||
|
||
let output = select(from: input, count: 10) | ||
print(output) | ||
print(output.count) | ||
``` | ||
|
||
The performance of this second algorithm is **O(n)** as it may require a pass through the entire input array. | ||
|
||
> **Note:** If `k > n/2`, then it's more efficient to do it the other way around and choose `k` items to remove. | ||
Based on code from Algorithm Alley, Dr. Dobb's Magazine, October 1993. | ||
|
||
*Written for Swift Algorithm Club by Matthijs Hollemans* |
82 changes: 82 additions & 0 deletions
82
Selection Sampling/SelectionSampling.playground/Contents.swift
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
//: Playground - noun: a place where people can play | ||
|
||
import Foundation | ||
|
||
/* Returns a random integer in the range min...max, inclusive. */ | ||
public func random(min min: Int, max: Int) -> Int { | ||
assert(min < max) | ||
return min + Int(arc4random_uniform(UInt32(max - min + 1))) | ||
} | ||
|
||
/* | ||
func select<T>(from a: [T], count k: Int) -> [T] { | ||
var a = a | ||
for i in 0..<k { | ||
let r = random(min: i, max: a.count - 1) | ||
if i != r { | ||
swap(&a[i], &a[r]) | ||
} | ||
} | ||
return Array(a[0..<k]) | ||
} | ||
*/ | ||
|
||
func select<T>(from a: [T], count requested: Int) -> [T] { | ||
var examined = 0 | ||
var selected = 0 | ||
var b = [T]() | ||
|
||
while selected < requested { | ||
examined += 1 | ||
|
||
// Calculate random variable 0.0 <= r < 1.0 (exclusive!). | ||
let r = Double(arc4random()) / 0x100000000 | ||
|
||
let leftToExamine = a.count - examined + 1 | ||
let leftToAdd = requested - selected | ||
|
||
// Decide whether to use the next record from the input. | ||
if Double(leftToExamine) * r < Double(leftToAdd) { | ||
selected += 1 | ||
b.append(a[examined - 1]) | ||
} | ||
} | ||
return b | ||
} | ||
|
||
|
||
|
||
let poem = [ | ||
"there", "once", "was", "a", "man", "from", "nantucket", | ||
"who", "kept", "all", "of", "his", "cash", "in", "a", "bucket", | ||
"his", "daughter", "named", "nan", | ||
"ran", "off", "with", "a", "man", | ||
"and", "as", "for", "the", "bucket", "nan", "took", "it", | ||
] | ||
|
||
let output = select(from: poem, count: 10) | ||
print(output) | ||
output.count | ||
|
||
|
||
|
||
// Use this to verify that all input elements have the same probability | ||
// of being chosen. The "counts" dictionary should have a roughly equal | ||
// count for each input element. | ||
|
||
/* | ||
let input = [ "a", "b", "c", "d", "e", "f", "g" ] | ||
var counts = [String: Int]() | ||
for x in input { | ||
counts[x] = 0 | ||
} | ||
|
||
for _ in 0...1000 { | ||
let output = select(from: input, count: 3) | ||
for x in output { | ||
counts[x] = counts[x]! + 1 | ||
} | ||
} | ||
|
||
print(counts) | ||
*/ |
4 changes: 4 additions & 0 deletions
4
Selection Sampling/SelectionSampling.playground/contents.xcplayground
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> | ||
<playground version='5.0' target-platform='osx'> | ||
<timeline fileName='timeline.xctimeline'/> | ||
</playground> |
6 changes: 6 additions & 0 deletions
6
Selection Sampling/SelectionSampling.playground/timeline.xctimeline
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<Timeline | ||
version = "3.0"> | ||
<TimelineItems> | ||
</TimelineItems> | ||
</Timeline> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
//: Playground - noun: a place where people can play | ||
|
||
import Foundation | ||
|
||
/* Returns a random integer in the range min...max, inclusive. */ | ||
public func random(min min: Int, max: Int) -> Int { | ||
assert(min < max) | ||
return min + Int(arc4random_uniform(UInt32(max - min + 1))) | ||
} | ||
|
||
/* | ||
Selects k items at random from an array of size n. Does not keep the elements | ||
in the original order. Performance: O(k). | ||
*/ | ||
func select<T>(from a: [T], count k: Int) -> [T] { | ||
var a = a | ||
for i in 0..<k { | ||
let r = random(min: i, max: a.count - 1) | ||
if i != r { | ||
swap(&a[i], &a[r]) | ||
} | ||
} | ||
return Array(a[0..<k]) | ||
} | ||
|
||
/* | ||
Selects `count` items at random from an array. Respects the original order of | ||
the elements. Performance: O(n). | ||
|
||
Note: if `count > size/2`, then it's more efficient to do it the other way | ||
around and choose `count` items to remove. | ||
|
||
Based on code from Algorithm Alley, Dr. Dobb's Magazine, October 1993. | ||
*/ | ||
func select<T>(from a: [T], count requested: Int) -> [T] { | ||
var examined = 0 | ||
var selected = 0 | ||
var b = [T]() | ||
|
||
while selected < requested { | ||
examined += 1 | ||
|
||
// Calculate random variable 0.0 <= r < 1.0 (exclusive!). | ||
let r = Double(arc4random()) / 0x100000000 | ||
|
||
let leftToExamine = a.count - examined + 1 | ||
let leftToAdd = requested - selected | ||
|
||
// Decide whether to use the next record from the input. | ||
if Double(leftToExamine) * r < Double(leftToAdd) { | ||
selected += 1 | ||
b.append(a[examined - 1]) | ||
} | ||
} | ||
return b | ||
} |