layout | title | math |
---|---|---|
default | Algorithms | true |
- Job $$j$$ starts at $$s_j$$ and finishes at $$f_j$$.
- Two jobs are compatible if they don't overlap.
- Goal: find the maximum subset of mutually compatible jobs.
Inputs: a sequence of jobs $$(s_1, f_1), \dots, (s_n, f_n)$$

Outputs: a maximum subset of mutually compatible jobs.

Sort the jobs by finish time: $$f_1 \leq f_2 \leq \dots \leq f_n$$
$$S \leftarrow \emptyset$$
for $$j = 1$$ to $$n$$
  if (job $$j$$ is compatible with $$S$$)
    $$S \leftarrow S \cup \{j\}$$
return $$S$$
Proposition: earliest-finish-time-first can be implemented in $$O(n \log n)$$ time:
- Keep track of the job $$j^*$$ that was added last to $$S$$.
- Job $$j$$ is compatible with $$S$$ iff $$s_j \geq f_{j^*}$$.
- Sorting by finish times takes $$O(n \log n)$$.
- The body of the for loop can be completed in constant time.
- The algorithm creates a set $$S$$ of selected jobs.
- When we are considering a new job $$j$$, we know its start and finish times.
- Because jobs are processed in order of finish time, $$j$$'s finish time is after the finish time of $$j^*$$; all we need to verify is whether the start time of $$j$$ is greater than or equal to the finish time of job $$j^*$$.
- By storing the last finish time in a variable, this check can be done in $$O(1)$$ time.
Theorem: the earliest-finish-time-first algorithm computes an optimal (maximum-size) compatible set of jobs.
Proof - by contradiction.
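The earliest-finish-time-first rule above can be sketched in a few lines of Python. This is an illustrative sketch (the function name and the example jobs are my own); it sorts by finish time and keeps the last finish time in a variable so the compatibility check is $$O(1)$$:

```python
# Sketch of earliest-finish-time-first interval scheduling.
# Jobs are (start, finish) pairs; names and example data are illustrative.

def earliest_finish_first(jobs):
    """Return a maximum set of mutually compatible jobs."""
    selected = []
    last_finish = float("-inf")  # finish time f_{j*} of the job added last
    for s, f in sorted(jobs, key=lambda job: job[1]):  # sort by finish time
        if s >= last_finish:     # compatible iff s_j >= f_{j*}, checked in O(1)
            selected.append((s, f))
            last_finish = f
    return selected

jobs = [(0, 6), (1, 4), (3, 5), (3, 8), (4, 7), (5, 9), (6, 10), (8, 11)]
print(earliest_finish_first(jobs))  # [(1, 4), (4, 7), (8, 11)]
```

The sort dominates the running time, giving $$O(n \log n)$$ overall.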
- Lecture $$j$$ starts at $$s_j$$ and finishes at $$f_j$$.
- Goal: find the minimum number of classrooms to schedule all lectures so that no two lectures occur at the same time in the same room.

This is a greedy algorithm: each lecture is allocated, by a simple rule, to one of the classrooms opened so far, and that decision is never revisited.
Inputs: a sequence of lectures $$(s_1, f_1), \dots, (s_n, f_n)$$

**Outputs:** a minimum number of classrooms, each with the list of lectures scheduled in it.

Sort the lectures by start time: $$s_1 \leq s_2 \leq \dots \leq s_n$$
$$d \leftarrow 0$$
for $$j = 1$$ to $$n$$
  if (lecture $$j$$ is compatible with some classroom $$k \leq d$$)
    schedule lecture $$j$$ in classroom $$k$$
  else
    allocate a new classroom $$d + 1$$
    schedule lecture $$j$$ in classroom $$d + 1$$
    $$d \leftarrow d + 1$$
return the schedule
- Maintain a priority queue of the classrooms considered so far, keyed by the finish time of the last lecture scheduled in each classroom. The find-min operation returns the classroom whose last lecture $$i^*$$ has the smallest finish time $$f_{i^*}$$ among all lectures that are currently last in their classroom.
- To decide whether the next lecture $$j$$ is compatible with some classroom, it suffices to check $$f_{i^*} \leq s_j$$.
- When $$s_j < f_{i^*}$$, lecture $$j$$ is incompatible with all $$d$$ classrooms opened so far, and must be allocated to a new classroom $$d + 1$$: the last lecture $$i$$ in each classroom satisfies $$s_i \leq s_j < f_{i^*} \leq f_i$$, so there is a time $$s_j + \varepsilon$$ with $$s_j < s_j + \varepsilon < f_{i^*}$$ at which all the lectures that are last in classrooms $$1$$ to $$d$$ are taking place, as well as lecture $$j$$.
- The time complexity depends on how the classroom allocation is implemented.
- With a suitable data structure (a PQ), the allocation can be maintained efficiently across decisions and iterations.
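The PQ idea above can be sketched with Python's `heapq`, keeping one heap entry per classroom (its last finish time). This is an illustrative sketch; the function name and example lectures are assumptions:

```python
import heapq

# Sketch of interval partitioning with a min-heap keyed on the finish
# time of the last lecture in each classroom (illustrative names).

def interval_partition(lectures):
    """Return the number of classrooms needed; lectures are (start, finish)."""
    rooms = []  # min-heap of finish times, one entry per classroom
    for s, f in sorted(lectures):           # process lectures by start time
        if rooms and rooms[0] <= s:         # earliest-finishing room is free
            heapq.heapreplace(rooms, f)     # reuse that classroom
        else:
            heapq.heappush(rooms, f)        # open a new classroom
    return len(rooms)

print(interval_partition([(0, 3), (1, 4), (2, 6), (3, 7), (4, 8)]))  # 3
```

Each lecture does one $$O(\log d)$$ heap operation, so the whole pass runs in $$O(n \log n)$$ after sorting.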
Interval partitioning: lower bound on optimal solution:
- DEF The depth of a set of open intervals is the maximum number of intervals that contain any given point.
- Key observation: number of classrooms $$\geq$$ depth.
- (In the lecture example shown in the figure, the depth is 3.)
Question: is the depth also an upper bound?
- Yes. Moreover, the earliest-start-time-first algorithm always finds a schedule whose number of classrooms equals the depth.
Begin Proof
- Let $$d =$$ number of classrooms that the algorithm allocates.
- Classroom $$d$$ is opened because we need to schedule a lecture, say $$j$$, that is incompatible with a lecture in each of the $$d - 1$$ other classrooms.
- Therefore, these $$d - 1$$ lectures each end after $$s_j$$.
- Since we sorted by start time, each of the incompatible lectures starts no later than $$s_j$$.
- Thus, we have $$d$$ lectures overlapping at time $$s_j + \varepsilon$$.
- Key observation $$\Rightarrow$$ all schedules use $$\geq d$$ classrooms.
- Whenever the algorithm opens a new classroom, it exhibits a witness that the depth is at least $$d$$.
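The depth lower bound can be checked directly with a sweep over the sorted start/finish events. A minimal sketch (function name and example intervals are my own; finishes are processed before starts at equal times, since intervals are open):

```python
# Illustrative sketch: compute the depth of a set of open intervals by
# sweeping over start/finish events in time order.

def depth(intervals):
    events = []
    for s, f in intervals:
        events.append((s, 1))   # interval opens
        events.append((f, -1))  # interval closes
    # sort by time; at equal times process closings (-1) first (open intervals)
    events.sort(key=lambda e: (e[0], e[1]))
    best = cur = 0
    for _, delta in events:
        cur += delta
        best = max(best, cur)
    return best

print(depth([(0, 3), (1, 4), (2, 6), (3, 7), (4, 8)]))  # 3
```

Running this on the same input as the partitioning sketch shows the two quantities agree, as the theorem asserts.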
- Single resource processes one job at a time.
- Job $$j$$ requires $$t_j$$ units of processing time and is due at time $$d_j$$.
- If $$j$$ starts at time $$s_j$$, it finishes at time $$f_j = s_j + t_j$$.
- Lateness: $$l_j = \max\{0, f_j - d_j\}$$ - if positive, this is the amount of time by which the job misses its deadline.
- Goal: schedule all jobs to minimize the maximum lateness $$L = \max_j l_j$$.
Output: a sequence of intervals, one per job.

Sort the jobs by due time: $$d_1 \leq d_2 \leq \dots \leq d_n$$
$$t \leftarrow 0$$
for $$j = 1$$ to $$n$$
  Assign job $$j$$ to the interval $$[t, t + t_j]$$: $$s_j \leftarrow t$$, $$f_j \leftarrow t + t_j$$
  $$t \leftarrow t + t_j$$
return intervals $$[s_1, f_1], \dots, [s_n, f_n]$$
No Idle Time
- Observation 1 - there exists an optimal schedule with no idle time.
- Observation 2 - the EDF schedule has no idle time.
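The earliest-deadline-first schedule is short to sketch in Python. The function name and example jobs are illustrative; jobs are given as $$(t_j, d_j)$$ pairs:

```python
# Sketch of the earliest-deadline-first schedule for minimizing the
# maximum lateness; jobs are (processing_time, deadline) pairs.

def edf_schedule(jobs):
    """Return (intervals, max_lateness) for jobs run in deadline order."""
    t = 0
    intervals = []
    max_lateness = 0
    for t_j, d_j in sorted(jobs, key=lambda job: job[1]):  # by deadline d_j
        s_j, f_j = t, t + t_j                # run next, no idle time
        intervals.append((s_j, f_j))
        max_lateness = max(max_lateness, f_j - d_j)  # l_j = max{0, f_j - d_j}
        t = f_j
    return intervals, max_lateness

jobs = [(3, 6), (2, 8), (1, 9), (4, 9), (3, 14), (2, 15)]  # (t_j, d_j)
print(edf_schedule(jobs))  # max lateness 1 for this input
```

Since `max_lateness` starts at 0, negative slacks never lower it, matching $$l_j = \max\{0, f_j - d_j\}$$.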
Inversions
- Definition: given a schedule $$S$$, an inversion is a pair of jobs $$i, j$$ with $$i < j$$ but $$j$$ scheduled before $$i$$.
- We assume the jobs are numbered so that $$d_1 \leq d_2 \leq \dots \leq d_n$$.
- Observation 3 - the EDF schedule is the unique idle-free schedule with no inversions.
  - Any other idle-free schedule must therefore contain an inversion.
- Observation 4 - if an idle-free schedule has an inversion, then it has an adjacent inversion.
- Proof:
  - Let $$i$$-$$j$$ be a closest inversion.
  - Let $$k$$ be the element immediately to the right of $$j$$.
  - Case 1. $$[j > k]$$: then $$j$$-$$k$$ is an adjacent inversion.
  - Case 2. $$[j < k]$$: then $$i$$-$$k$$ is a closer inversion, since $$i < j < k$$. $$\square$$
- Key Claim: exchanging two adjacent, inverted jobs $$i, j$$ reduces the number of inversions by 1 and does not increase the max lateness.
  - The lateness of the new $$j$$ is upper-bounded by the lateness of $$i$$ in the old schedule.
- Proof:
  - $$l'_k = l_k \ \forall k \neq i, j$$
  - $$l'_i \leq l_i$$
  - If job $$j$$ is late:
    - $$l'_j = f'_j - d_j$$ $$\longleftarrow$$ definition
    - $$= f_i - d_j$$ $$\longleftarrow$$ $$j$$ now finishes at time $$f_i$$
    - $$\leq f_i - d_i$$ $$\longleftarrow$$ $$i < j \Rightarrow d_i \leq d_j$$
    - $$\leq l_i$$ $$\longleftarrow$$ definition
Theorem: the earliest-deadline-first schedule $$S$$ minimizes the maximum lateness.

Proof - by contradiction.

Define $$S^*$$ to be an optimal schedule with the fewest inversions.
- Can assume $$S^*$$ has no idle time. $$\longleftarrow$$ Observation 1
- Case 1. $$[S^*$$ has no inversions$$]$$: then $$S = S^*$$. $$\longleftarrow$$ Observation 3
- Case 2. $$[S^*$$ has an inversion$$]$$:
  - Let $$i$$-$$j$$ be an adjacent inversion. $$\longleftarrow$$ Observation 4
  - Exchanging jobs $$i, j$$ decreases the number of inversions by $$1$$ without increasing the max lateness. $$\longleftarrow$$ Key Claim
  - This contradicts the choice of $$S^*$$ as an optimal schedule with the fewest inversions. $$\square$$
Greedy algorithm stays ahead Show that after each step of the greedy algorithm, its solution is at least as good as any other algorithm’s.
Structural Discover a simple “structural” bound asserting that every possible solution must have a certain value. Then show that the algorithm always achieves this bound.
Exchange Argument Gradually transform any solution to the one found by the greedy algorithm without hurting its quality.
Other greedy algorithms. Gale-Shapley, Kruskal, Prim, Dijkstra, Huffman, …
- Given a set of preferences among employers and applicants, we assign applicants to employers so that for every employer $$E$$ and every applicant $$A$$ who is not scheduled to work for $$E$$, at least one of the following is true:
  - $$E$$ prefers every one of its accepted applicants to $$A$$, or
  - $$A$$ prefers their current situation over working for employer $$E$$.
- If this holds, the outcome is stable (no one will want to change).
Formulating the problem - whenever trying to get to the essence of a problem, making it as "clean" as possible helps.

In this case:
- Every applicant is looking for one company.
- Every company is looking for many applicants.
- There may be more or fewer applicants than positions.
- Applicants may or may not apply to every company.

To eliminate these complications, assume each applicant applies to every company, and each company wants only a single applicant.
- This, in effect, leaves us with two sides of equal size, traditionally called men ($$M$$) and women ($$W$$).

Therefore, consider a set $$M = \{m_1, \dots, m_n\}$$ of men and a set $$W = \{w_1, \dots, w_n\}$$ of women.
- $$M \times W$$ is the set of all ordered pairs of the form $$(m, w)$$ where $$m \in M$$ and $$w \in W$$.
- A matching $$S$$ is a set of ordered pairs from $$M \times W$$ where each member of $$M$$ and each member of $$W$$ appears in at most one pair in $$S$$.
- A perfect matching is a matching with the property that each member of $$M$$ and each member of $$W$$ appears in exactly one pair in $$S$$.
- After this, the notion of preference can be added: each man ranks all women, and vice versa.
- Given a perfect matching $$S$$ containing two pairs $$(m, w)$$ and $$(m', w')$$, if $$m$$ prefers $$w'$$ to $$w$$ and $$w'$$ prefers $$m$$ to $$m'$$, then there is nothing stopping them from abandoning their partners for each other.
  - This is an instability.
- This creates the goal of finding a stable set of pairings.
- A matching $$S$$ is stable if:
  - it is perfect, and
  - there is no instability.
- Two Questions:
- Does there exist a stable matching for every set of preference lists?
- Given a set of preference lists, can we efficiently construct a stable matching (if it exists)?
Some basic ideas motivating the algorithm:
- Initially everyone is unmarried. Suppose an unmarried man $$m$$ chooses the woman $$w$$ who ranks highest on his preference list and proposes. They should not instantly pair: in the future, someone who ranks higher on $$w$$'s list may propose. But $$w$$ should not instantly reject either: this may be the highest-ranked proposal that $$w$$ gets.
- Add an engagement state.
- Now there is a state where some men and women are engaged and some are not. The next step could be as follows.
- An arbitrary free man $$m$$ chooses the highest-ranked woman $$w$$ to whom he has not yet proposed. If $$w$$ is not engaged, they get engaged. Else, $$w$$ determines which of $$m$$ or her current partner $$m'$$ ranks higher.
- Finally, the algorithm terminates when no one is free - the perfect matching is returned.
Initially all $$m \in M$$ and $$w \in W$$ are free
while (some man $$m$$ is free and hasn't proposed to every woman)
  Choose such a man $$m$$
  Let $$w$$ be the highest-ranked woman in $$m$$'s preference list to whom $$m$$ has not yet proposed
  if ($$w$$ is free)
    $$(m, w)$$ get engaged
  else $$w$$ is engaged to $$m'$$
    if ($$w$$ prefers $$m'$$ to $$m$$)
      $$m$$ remains free
    else
      $$(m, w)$$ become engaged
      $$m'$$ becomes free
return the set $$S$$ of engaged pairs
Note This algorithm returns a set that is a stable matching.
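The proposal loop translates almost line for line into Python. This is an illustrative sketch (the input format, function name, and tiny example preference lists are assumptions); it precomputes each woman's ranks so the "who does $$w$$ prefer" comparison is $$O(1)$$:

```python
from collections import deque

# Sketch of the Gale-Shapley algorithm. Preference lists are dicts mapping
# each man/woman to a ranked list of the other side (illustrative input).

def gale_shapley(men_prefs, women_prefs):
    # rank[w][m] = position of m on w's list, for O(1) comparisons
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    free = deque(men_prefs)                 # all men start free
    next_proposal = {m: 0 for m in men_prefs}
    fiance = {}                             # w -> current partner m
    while free:
        m = free.popleft()
        w = men_prefs[m][next_proposal[m]]  # highest-ranked not yet proposed to
        next_proposal[m] += 1
        if w not in fiance:
            fiance[w] = m                   # w is free: they get engaged
        elif rank[w][m] < rank[w][fiance[w]]:
            free.append(fiance[w])          # w prefers m: old partner freed
            fiance[w] = m
        else:
            free.append(m)                  # w rejects m; m stays free
    return {m: w for w, m in fiance.items()}

men = {"x": ["a", "b"], "y": ["b", "a"]}
women = {"a": ["y", "x"], "b": ["x", "y"]}
print(gale_shapley(men, women))  # {'x': 'a', 'y': 'b'}
```

Each man proposes to each woman at most once, so the loop runs at most $$n^2$$ times.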
- Single-Pair Shortest Path Problem
  - Given a digraph $$G = (V, E)$$, edge lengths $$l_e \geq 0$$, source $$s \in V$$ and destination $$t \in V$$, find the shortest path from $$s$$ to $$t$$.
- Single-Source Shortest Paths Problem
  - Given a digraph $$G = (V, E)$$, edge lengths $$l_e \geq 0$$, and source $$s \in V$$, find the shortest path from $$s$$ to every node.
  - These paths form a tree rooted at $$s$$.
The setup of the graph is as follows:
- Given a directed graph $$G = (V, E)$$ with a designated start node $$s$$.
  - We assume that $$s$$ has a path to every other node in $$G$$. Each edge $$e$$ has length $$l_e \geq 0$$, indicating the time or cost to traverse that edge.
- For a path $$P$$, the length of $$P$$ is denoted $$l(P)$$ and is the sum of the lengths of all the edges in that path.
- Our Goal: find the shortest path from $$s$$ to every other node in the graph $$G$$.
- Note - this is meant for directed graphs, but it also works with undirected ones, where each edge $$e = (u, v)$$ must be replaced with the two directed edges $$(u, v), (v, u)$$ of the same length.
- We begin by describing an algorithm that determines the length of the shortest path from $$s$$ to each other node in the graph. (This makes it easy to produce the paths as well.)
- The algorithm maintains a set $$S$$ of vertices $$u$$ for which we have determined a shortest-path distance $$d(u)$$ from $$s$$ - this is the explored part of the graph.
- Initially $$S = \{s\}$$ and $$d(s) = 0$$.
- For each node $$v \in V - S$$, we determine the shortest path that can be constructed by travelling along a path through the explored part $$S$$ to some $$u \in S$$, followed by the single edge $$(u, v)$$.
  - Essentially we consider the quantity $$\pi'(v) = \min_{e=(u,v): u \in S} d(u) + l_e$$.
It is simple to produce the $$s$$-$$v$$ paths corresponding to these distances:
- As each node $$v$$ is added to the set, we record the edge $$(u, v)$$ on which it achieved the value $$d'(v) = \min_{e=(u,v): u \in S} d(u) + l_e$$.
- The path $$P_v$$ is implicitly represented by these edges:
  - If $$(u, v)$$ is the edge stored for $$v$$, then $$P_v$$ is (recursively) the path $$P_u$$ followed by the single edge $$(u, v)$$.
  - Essentially: to construct $$P_v$$, we start at $$v$$, follow the edge stored for $$v$$ in the reverse direction to $$u$$; then follow the edge stored for $$u$$ in the reverse direction to its predecessor, and so on, until $$s$$ is reached.
  - Note: $$s$$ must be reached, as this is a backward walk along a path that originated from $$s$$.
- Essentially, the next path taken will be the one with the smallest total distance from the origin node $$s$$.
Dijkstra's Algorithm is greedy in the sense that we always form the shortest new $$s$$-$$v$$ path we can make from a path within $$S$$ followed by a single edge.
- Stays ahead - each time it selects a path to a node, that path is shorter than every other possible path.
- Consider the set $$S$$ at any point in the algorithm's execution. For each $$u \in S$$, the path $$P_u$$ is a shortest $$s$$-$$u$$ path.
This establishes correctness, since it can be applied at the algorithm's termination, when $$S = V$$.

Invariant: for each node $$u \in S$$, $$d(u)$$ is the length of a shortest $$s$$-$$u$$ path.

Induction on $$|S|$$.

Base Case: $$|S| = 1$$ is trivial, since $$S = \{s\}$$ and $$d(s) = 0$$.

Inductive Hypothesis: assume the invariant holds for $$|S| = k \geq 1$$.
- Let $$v$$ be the next node added to $$S$$, and let $$(u, v)$$ be the final edge.
- A shortest $$s \rightarrow u$$ path plus $$(u, v)$$ is an $$s \rightarrow v$$ path of length $$\pi(v)$$.
- Consider any other $$s \rightarrow v$$ path $$P$$. We show that it is no shorter than $$\pi(v)$$.
- Let $$e = (x, y)$$ be the first edge in $$P$$ that leaves $$S$$, and let $$P'$$ be the subpath from $$s$$ to $$x$$.
- The length of $$P$$ is already $$\geq \pi(v)$$ as soon as it reaches $$y$$:
  $$l(P) \geq l(P') + l_e \geq d[x] + l_e \geq \pi(y) \geq \pi(v)$$
  1. non-negative lengths, 2. inductive hypothesis, 3. definition of $$\pi(y)$$, 4. Dijkstra chose $$v$$ instead of $$y$$.
Induction on the size of $$S$$.

The case $$|S| = 1$$ is easy:
- $$S = \{s\}$$ and $$d(s) = 0$$.

Suppose the claim holds when $$|S| = k$$ for some $$k \geq 1$$; the set grows to size $$k + 1$$ by adding a node $$v$$.
- Let $$(u, v)$$ be the final edge on our $$s$$-$$v$$ path $$P_v$$.
- By the induction hypothesis, $$P_u$$ is the shortest $$s$$-$$u$$ path for each $$u \in S$$.
- Now consider any other $$s$$-$$v$$ path $$P$$; this path must leave the set $$S$$ somewhere.
- Let $$y$$ be the first node on $$P$$ that is not in $$S$$, and let $$x \in S$$ be the node just before $$y$$.
- $$P$$ cannot be shorter than $$P_v$$, because it is already at least as long as $$P_v$$ by the time it has left the set $$S$$:
  - In iteration $$k + 1$$, the algorithm must have considered adding node $$y$$ to the set $$S$$ via the edge $$(x, y)$$ and rejected this option in favour of adding $$v$$.
  - This means that there is no path from $$s$$ to $$y$$ through $$x$$ that is shorter than $$P_v$$; but the subpath of $$P$$ up to $$y$$ is such a path, so this subpath is at least as long as $$P_v$$.
  - Lengths are non-negative, therefore the full path $$P$$ is at least as long as $$P_v$$. $$\square$$
Notes on the proof
- The algorithm does not find shortest paths if some of the edges can have negative lengths.
- Dijkstra’s Algorithm is essentially a wave front reaching the shortest path to each node. (See page 140 Algorithm Design)
When considering its running time, there are $$n - 1$$ iterations of the main loop for a graph with $$n$$ nodes.
- Initially, it appears that one must consider every node $$v \notin S$$ and go through all edges between $$S$$ and $$v$$ to compute the minimum $$\min_{e=(u,v): u \in S} d(u) + l_e$$, then select the node $$v$$ for which this minimum is smallest.
- For a graph with $$m$$ edges, this takes $$O(m)$$ time per iteration, giving the algorithm a complexity of $$O(mn)$$.
However, using the correct data structure can drastically improve this.
- First, we explicitly maintain the values of the minima $$d'(v) = \min_{e=(u,v): u \in S} d(u) + l_e$$ for each node $$v \in V - S$$, rather than recalculating them in every iteration.
- Second, the nodes of $$V - S$$ can be kept in a PQ with $$d'(v)$$ as their keys. (PQs are discussed in Chapter 2 of the book and in CS126 last year.)
- A PQ can efficiently insert and delete elements, change an element's key and extract the minimum element.
- For this, we require the $$ChangeKey$$ and $$ExtractMin$$ operations.
These operations are used as follows:
- Put the nodes of $$V$$ in a PQ with $$d'(v)$$ as the key for $$v \in V$$.
- To select which node should be added to $$S$$, we use $$ExtractMin$$.
- To update the keys, consider an iteration where:
  - node $$v$$ is added to $$S$$, and
  - $$w \notin S$$ is a node that remains in the PQ.
  - To update the value of $$d'(w)$$:
    - If $$(v, w)$$ is not an edge, nothing needs to be done: the set of edges considered in the minimum $$\min_{e=(u,w): u \in S} d(u) + l_e$$ is exactly the same before and after adding $$v$$ to $$S$$.
    - If $$e' = (v, w) \in E$$, then the new value for the key is $$\min(d'(w), d(v) + l_{e'})$$. If $$d'(w) > d(v) + l_{e'}$$, then the $$ChangeKey$$ operation is needed to decrease the key of node $$w$$ appropriately.
    - This can happen at most once per edge, when the tail of $$e'$$ is added to $$S$$.
This leaves us with:
- Using a PQ, the algorithm can be implemented on a graph with $$n$$ nodes and $$m$$ edges to run in $$O(m)$$ time, plus the time for $$n$$ $$ExtractMin$$ and $$m$$ $$ChangeKey$$ operations.
- Using a heap-based PQ, each PQ operation runs in $$O(\log n)$$ time. Therefore, the overall running time is $$O(m \log n)$$.
foreach $$v \neq s$$: $$d[v] \leftarrow \infty$$; $$d[s] \leftarrow 0$$
Create an empty priority queue pq
foreach $$v \in V$$: Insert(pq, $$v$$, $$d[v]$$)
while (pq is not empty)
  $$u \leftarrow$$ ExtractMin(pq)
  foreach edge $$e = (u, v)$$ leaving $$u$$:
    if ($$d[v] > d[u] + l_e$$)
      DecreaseKey(pq, $$v$$, $$d[u] + l_e$$); $$d[v] \leftarrow d[u] + l_e$$
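A runnable sketch of Dijkstra's algorithm in Python follows. Since `heapq` has no `DecreaseKey`, this sketch substitutes the standard lazy-deletion trick (stale heap entries are skipped on pop); the function name, input format, and example graph are illustrative:

```python
import heapq

# Sketch of Dijkstra's algorithm using a lazy-deletion binary heap
# (Python's heapq lacks DecreaseKey, so stale entries are skipped instead).

def dijkstra(graph, s):
    """graph: {u: [(v, length), ...]} with non-negative edge lengths."""
    d = {s: 0}
    pq = [(0, s)]
    while pq:
        du, u = heapq.heappop(pq)          # ExtractMin
        if du > d.get(u, float("inf")):
            continue                       # stale entry; u already settled
        for v, le in graph.get(u, []):
            if du + le < d.get(v, float("inf")):
                d[v] = du + le             # relax edge (u, v)
                heapq.heappush(pq, (d[v], v))
    return d

graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 3)]}
print(dijkstra(graph, "s"))  # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```

Lazy deletion keeps the same $$O(m \log n)$$ bound, at the cost of up to $$m$$ heap entries instead of $$n$$.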
Performance: Depends on PQ:
- Array implementation optimal for dense graphs: $$\Theta(n^2)$$ edges.
- Binary heap much faster for sparse graphs: $$\Theta(n)$$ edges.
- 4-way heap worth the trouble in performance-critical situations.
Priority Queue | Insert | DeleteMin() | DecreaseKey() | Total |
---|---|---|---|---|
node-indexed array | $$O(1)$$ | $$O(n)$$ | $$O(1)$$ | $$O(n^2)$$ |
binary heap | $$O(\log n)$$ | $$O(\log n)$$ | $$O(\log n)$$ | $$O(m \log n)$$ |
d-way heap | $$O(\log_d n)$$ | $$O(d \log_d n)$$ | $$O(\log_d n)$$ | $$O(m \log_{m/n} n)$$ |
Fibonacci heap | $$O(1)$$ | $$O(\log n)$$ amortized | $$O(1)$$ amortized | $$O(m + n \log n)$$ |
integer PQ | | | | |
A cycle is a path with no repeated nodes or edges other than the starting and ending nodes.

A cut is a partition of the nodes into two nonempty subsets $$S$$ and $$V - S$$.

A cutset of a cut $$S$$ is the set of edges with exactly one endpoint in $$S$$.

Proposition: a cycle and a cutset intersect in an even number of edges.
- (The example figure shows this for the cut $$S = \{4, 5, 8\}$$.)

Consider a cycle and a cut $$S$$. If no edge of the cycle crosses the cut, the intersection is empty, hence even. If, however, some edge of the cycle goes out of the cut, then there must be an edge which goes back into the cut, as the cycle must come back to the original vertex without repeating any other edges or vertices; exits and entries therefore pair up, so the intersection is even.
Proposition: let $$H = (V, T)$$ be a subgraph of the undirected graph $$G = (V, E)$$. Then the following are equivalent:
- $$H$$ is a spanning tree of $$G$$
- $$H$$ is acyclic and connected
- $$H$$ is connected and has $$|V| - 1$$ edges
- $$H$$ is acyclic and has $$|V| - 1$$ edges
- $$H$$ is minimally connected: removal of any edge disconnects it
- $$H$$ is maximally acyclic: addition of any edge creates a cycle
Definition: given a connected, undirected graph $$G = (V, E)$$ with edge costs $$c_e$$, a minimum spanning tree (MST) is a spanning tree $$T$$ whose total edge cost is as small as possible.

Cayley's theorem: the complete graph on $$n$$ nodes has $$n^{n-2}$$ spanning trees.

Very useful for approximation algorithms for NP-hard problems.
Property (fundamental cycle): let $$H = (V, T)$$ be a spanning tree of $$G = (V, E)$$.
- For any non-tree edge $$e \in E$$: $$T \cup \{e\}$$ contains a unique cycle, say $$C$$.
- For any edge $$f \in C$$: $$(V, T \cup \{e\} - \{f\})$$ is a spanning tree.

Observation: if $$c_e < c_f$$, then $$(V, T)$$ is not an MST.
Property (fundamental cutset): let $$H = (V, T)$$ be a spanning tree of $$G = (V, E)$$.
- For any tree edge $$f \in T$$: $$(V, T - \{f\})$$ has two connected components.
- Let $$D$$ denote the corresponding cutset.
- For any edge $$e \in D$$: $$(V, T - \{f\} \cup \{e\})$$ is a spanning tree.

Observation: if $$c_e < c_f$$, then $$(V, T)$$ is not an MST.
Red Rule
- Let $$C$$ be a cycle with no red edges.
- Select an uncoloured edge of $$C$$ of max cost and colour it red.

Blue Rule
- Let $$D$$ be a cutset with no blue edges.
- Select an uncoloured edge in $$D$$ of min cost and colour it blue.

Greedy Algorithm
- Apply the red and blue rules (nondeterministically!) until all edges are coloured.
- The blue edges form an MST.
- Note: can stop once $$n - 1$$ edges are coloured blue.
Colour Invariant: there exists an MST $$T^*$$ containing every blue edge and no red edge.

Proof: induction on the number of iterations.

Base Case: no edges are coloured, so every MST satisfies the invariant.
Induction step - blue rule: suppose the colour invariant is true before the blue rule.
- Let $$D$$ be the chosen cutset, and let $$f$$ be the edge coloured blue.
- If $$f \in T^*$$, then $$T^*$$ still satisfies the invariant.
- Otherwise, consider the fundamental cycle $$C$$ formed by adding $$f$$ to $$T^*$$.
- Let $$e \in C$$ be another edge in $$D$$.
- $$e$$ is uncoloured and $$c_e \geq c_f$$, since:
  - $$e \in T^* \Rightarrow e$$ not red
  - blue rule $$\Rightarrow e$$ not blue and $$c_e \geq c_f$$
- Thus, $$T^* \cup \{f\} - \{e\}$$ satisfies the invariant.
Induction step - red rule: suppose the colour invariant is true before the red rule.
- Let $$C$$ be the chosen cycle, and let $$e$$ be the edge coloured red.
- If $$e \notin T^*$$, then $$T^*$$ still satisfies the invariant.
- Otherwise, consider the fundamental cutset $$D$$ formed by deleting $$e$$ from $$T^*$$.
- Let $$f \in D$$ be another edge in $$C$$.
- $$f$$ is uncoloured and $$c_e \geq c_f$$, since:
  - $$f \notin T^* \Rightarrow f$$ not blue
  - red rule $$\Rightarrow f$$ not red and $$c_e \geq c_f$$
- Thus, $$T^* \cup \{f\} - \{e\}$$ satisfies the invariant. $$\blacksquare$$
Theorem: the greedy algorithm terminates, and the blue edges form an MST.

Proof: we need to show that either the red or blue rule (or both) applies.
- Suppose edge $$e$$ is uncoloured.
- The blue edges form a forest.
- Case 1: both endpoints of $$e$$ are in the same blue tree. $$\Rightarrow$$ apply the red rule to the cycle formed by adding $$e$$ to the blue forest.
- Case 2: both endpoints of $$e$$ are in different blue trees. $$\Rightarrow$$ apply the blue rule to the cutset induced by either of the two blue trees. $$\blacksquare$$
Initialise $$S = \{s\}$$ for any node $$s$$, and $$T = \emptyset$$.

Repeat $$n - 1$$ times:
- Add to $$T$$ a min-cost edge with exactly one endpoint in $$S$$.
- Add the other endpoint to $$S$$.

Theorem: Prim's algorithm computes an MST.

Proof: special case of the greedy algorithm (blue rule repeatedly applied to the cutset induced by $$S$$). $$\blacksquare$$
$$S \leftarrow \emptyset$$; $$T \leftarrow \emptyset$$; $$s \leftarrow$$ any node
foreach $$v \neq s$$: $$d[v] \leftarrow \infty$$; $$d[s] \leftarrow 0$$
Create an empty priority queue pq
foreach $$v \in V$$: Insert(pq, $$v$$, $$d[v]$$)
while (pq is not empty)
  $$u \leftarrow$$ ExtractMin(pq); add $$u$$ to $$S$$; add pred[$$u$$] to $$T$$ (if $$u \neq s$$)
  foreach edge $$e = (u, v)$$ with $$v \notin S$$:
    if ($$c_e < d[v]$$)
      DecreaseKey(pq, $$v$$, $$c_e$$); $$d[v] \leftarrow c_e$$; pred[$$v$$] $$\leftarrow e$$
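Prim's algorithm can be sketched in Python much like Dijkstra's. As before, `heapq` has no `DecreaseKey`, so this sketch uses lazy deletion instead; the function name, input format, and example graph are illustrative:

```python
import heapq

# Sketch of Prim's algorithm with a lazy heap over attachment costs.
# graph is an undirected adjacency list {u: [(v, cost), ...]}.

def prim(graph, s):
    """Return (total cost, edge list) of an MST of s's component."""
    in_tree = set()
    pq = [(0, s, None)]              # (attachment cost, node, tree edge)
    total, tree = 0, []
    while pq:
        cost, v, edge = heapq.heappop(pq)
        if v in in_tree:
            continue                 # stale entry: v already attached
        in_tree.add(v)
        total += cost
        if edge is not None:
            tree.append(edge)
        for w, c in graph[v]:
            if w not in in_tree:     # candidate attachment edge
                heapq.heappush(pq, (c, w, (v, w)))
    return total, tree

graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2)],
    "c": [("a", 4), ("b", 2)],
}
print(prim(graph, "a"))  # (3, [('a', 'b'), ('b', 'c')])
```

With distinct edge costs the heap order is deterministic, so the returned tree is the unique MST.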
Consider edges in ascending order of cost:
- Add to tree unless it would create a cycle.
Theorem: Kruskal’s algorithm computes an MST
Proof: Special case of greedy algorithm:
- Case 1: both endpoints of $$e$$ are in the same blue tree. $$\Rightarrow$$ colour $$e$$ red by applying the red rule to the unique cycle (all other edges in the cycle are blue).
- Case 2: both endpoints of $$e$$ are in different blue trees. $$\Rightarrow$$ colour $$e$$ blue by applying the blue rule to the cutset defined by either tree. $$\blacksquare$$
Theorem: Kruskal's algorithm can be implemented in $$O(m \log m)$$ time.
- Sort edges by cost.
- Use a union-find data structure to dynamically maintain connected components.

Sort the edges by cost: $$c_{e_1} \leq c_{e_2} \leq \dots \leq c_{e_m}$$
$$T \leftarrow \emptyset$$
foreach $$v \in V$$: MakeSet($$v$$)
for $$i = 1$$ to $$m$$
  $$(u, v) \leftarrow e_i$$
  if (FindSet($$u$$) $$\neq$$ FindSet($$v$$))
    $$T \leftarrow T \cup \{e_i\}$$; Union($$u$$, $$v$$)
return $$T$$
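The sort-then-union-find structure above can be sketched in Python as follows. This is an illustrative sketch (names, the node encoding as integers `0..n-1`, and the example edges are my own); the union-find uses path compression and union by size:

```python
# Sketch of Kruskal's algorithm with a union-find structure using
# path compression and union by size (illustrative, not tuned).

def kruskal(n, edges):
    """edges: list of (cost, u, v) over nodes 0..n-1. Returns MST edges."""
    parent = list(range(n))
    size = [1] * n

    def find(x):                      # find root with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):  # ascending order of cost
        ru, rv = find(u), find(v)
        if ru != rv:                  # skip edges that would create a cycle
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru           # union by size
            size[ru] += size[rv]
            tree.append((cost, u, v))
    return tree

edges = [(1, 0, 1), (2, 1, 2), (4, 0, 2), (3, 2, 3)]
print(kruskal(4, edges))  # [(1, 0, 1), (2, 1, 2), (3, 2, 3)]
```

The sort costs $$O(m \log m)$$ and dominates; the near-constant amortized union-find operations cover the rest.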
Start with all edges in $$T$$ and consider them in descending order of cost:
- Delete an edge from $$T$$ unless doing so would disconnect $$T$$.
Theorem the reverse delete algorithm computes an MST
Proof Special case of greedy algorithm
- Case 1: (deleting edge $$e$$ does not disconnect $$T$$) $$\Rightarrow$$ apply the red rule to the cycle $$C$$ formed by $$e$$ together with another path in $$T$$ between its two endpoints.
- Case 2: (deleting edge $$e$$ disconnects $$T$$) $$\Rightarrow$$ apply the blue rule to the cutset $$D$$ induced by either component. $$\blacksquare$$
Fact: the reverse-delete algorithm can be implemented in $$O(m \log n (\log \log n)^3)$$ time.
- Suppose we have a set of locations $$V = \{v_1, v_2, \dots, v_n\}$$, and we want to build a communication network on top of them. The network should be connected - there should be a path between every pair of nodes - but this should be done as cheaply as possible.
- For some pairs $$(v_i, v_j)$$, we may build a link directly between them for a certain cost $$c(v_i, v_j) > 0$$.
  - Therefore, we can represent the set of possible links using a graph $$G = (V, E)$$ with a positive cost associated with each edge.
  - The problem is to find a subset of the edges $$T \subseteq E$$ so that the graph $$(V, T)$$ is connected and the total cost of the edges is as small as possible.
    - Assumption: the graph $$G$$ is connected, otherwise there is no possible solution.
- Observation: let $$T$$ be a minimum-cost solution to the network design problem. Then $$(V, T)$$ is a tree.
- Proof: by definition, $$(V, T)$$ must be connected; we show it also contains no cycles.
  - Suppose it contained a cycle $$C$$, and let $$e$$ be any edge on $$C$$.
  - $$(V, T - \{e\})$$ is still connected, since any path that used $$e$$ can go the "long" way round $$C$$.
  - Therefore, $$(V, T - \{e\})$$ is also a valid solution, and it is cheaper since costs are positive - contradicting the minimality of $$T$$. $$\square$$
- If there are any edges with cost 0, then a minimum-cost solution may have extra edges while still being minimum cost. Despite this, there is still a minimum solution that is a tree:
  - starting from any minimum-cost solution, we could keep deleting edges on cycles until a tree remains.
- We call a subset $$T \subseteq E$$ a spanning tree of $$G$$ if $$(V, T)$$ is a tree. By the statement "let $$T$$ be a minimum-cost solution to the network design problem; then $$(V, T)$$ is a tree":
  - the goal of the network design problem can be rephrased as finding the cheapest spanning tree of the graph - called the minimum spanning tree problem.
- Unless $$G$$ is very simple, it will have exponentially many different spanning trees, so it is initially very unclear how to approach the problem.
Once again, many greedy algorithms could potentially solve the problem. Here are three examples:
- Kruskal's Algorithm - starts without any edges at all and builds a spanning tree by successively inserting edges from $$E$$ in order of increasing cost. As we move through the edges in this order, we insert each edge $$e$$ as long as it does not create a cycle when added to the edges already inserted; if it would create a cycle, $$e$$ is discarded and we continue.
- Prim's Algorithm - designed by analogy with Dijkstra's algorithm for paths.
  - Start with a root node $$s$$ and try to greedily grow a tree from $$s$$ outward. At each step, we simply add the node that can be attached as cheaply as possible to the partial tree we already have:
  - we maintain a set $$S \subseteq V$$ on which a spanning tree has been constructed so far. Initially $$S = \{s\}$$. In each iteration we grow $$S$$ by one node, adding the node $$v$$ that minimises the attachment cost $$\min_{e=(u,v): u \in S} c_e$$, and including the edge $$e = (u, v)$$ that achieves this minimum.
- Reverse-Delete Algorithm - a "backward version of Kruskal's": we start with the full graph $$(V, E)$$ and delete edges in order of decreasing cost. As we get to each edge $$e$$ - starting with the most expensive - we delete it so long as this does not disconnect the graph.
- We assume that all edge costs are distinct from each other.
When is it safe to include an edge in the minimum spanning tree?

Statement (4.17): assume that all edge costs are distinct. Let $$S$$ be any subset of nodes that is neither empty nor equal to all of $$V$$, and let edge $$e = (v, w)$$ be the minimum-cost edge with one end in $$S$$ and the other in $$V - S$$. Then every minimum spanning tree contains the edge $$e$$.

Proof: let $$T$$ be a spanning tree that does not contain $$e$$; we show that $$T$$ does not have minimum cost, using an exchange argument.
- The ends of $$e$$ are $$v, w$$. $$T$$ is a spanning tree, so there must be a path $$P$$ in $$T$$ from $$v$$ to $$w$$. Starting at $$v$$, suppose we follow the nodes of $$P$$ in sequence; there is a first node $$w'$$ on $$P$$ that is in $$V - S$$.
- Let $$v' \in S$$ be the node just before $$w'$$ on $$P$$, and let $$e' = (v', w')$$ be the edge joining them.
- Therefore, $$e'$$ is an edge of $$T$$ with one end in $$S$$ and the other in $$V - S$$.
- If we exchange $$e$$ for $$e'$$, we get the edge set $$T' = T - \{e'\} \cup \{e\}$$; we claim that $$T'$$ is a spanning tree. Clearly $$(V, T')$$ is connected, since $$(V, T)$$ is connected, and any path in $$(V, T)$$ that used edge $$e' = (v', w')$$ can now be rerouted in $$(V, T')$$ to follow the portion of $$P$$ from $$v'$$ to $$v$$, then the edge $$e$$, and then the portion of $$P$$ from $$w$$ to $$w'$$. To see that $$(V, T')$$ is also acyclic, note that the only cycle in $$(V, T' \cup \{e'\})$$ is the one composed of $$e$$ and the path $$P$$, and this cycle is not present in $$(V, T')$$ due to the deletion of $$e'$$.
- We noted above that the edge $$e'$$ has one end in $$S$$ and the other in $$V - S$$. But $$e$$ is the cheapest edge with this property, so $$c_e < c_{e'}$$ (the inequality is strict, as no two edges have the same cost). Thus the total cost of $$T'$$ is less than that of $$T$$. $$\blacksquare$$
Kruskal's Algorithm produces a minimum spanning tree of G
Proof:
- Consider any edge $$e = (v, w)$$ added by Kruskal's algorithm, and let $$S$$ be the set of all nodes to which $$v$$ has a path at the moment just before $$e$$ is added. Clearly $$v \in S$$, but $$w \notin S$$, since adding $$e$$ does not create a cycle. Moreover, no edge from $$S$$ to $$V - S$$ has been encountered yet, since any such edge could have been added without creating a cycle, and hence would have been added by Kruskal's algorithm. Therefore $$e$$ is the cheapest edge with one end in $$S$$ and the other in $$V - S$$, and by (4.17) it belongs to every minimum spanning tree.
- Clearly $$(V, T)$$ contains no cycles, since the algorithm is explicitly designed to avoid creating cycles. Further, if $$(V, T)$$ were not connected, then there would exist a nonempty subset of nodes $$S$$ (not equal to $$V$$) such that there is no edge from $$S$$ to $$V - S$$. But this contradicts the algorithm's behaviour: since $$G$$ is connected, there is at least one edge between $$S$$ and $$V - S$$, and the algorithm will add the first of these that it encounters. $$\blacksquare$$
Prim's Algorithm produces a minimum spanning tree of $$G$$.

Proof
- In each iteration there is a set $$S \subseteq V$$ on which a partial spanning tree has been constructed, and a node $$v$$ and edge $$e$$ are added that minimise the quantity $$\min_{e=(u,v): u \in S} c_e$$. By definition, $$e$$ is the cheapest edge with one end in $$S$$ and the other end in $$V - S$$, and so by the cut property (4.17) it is in every minimum spanning tree.
- It is also straightforward to show that Prim's Algorithm produces a spanning tree of $$G$$, and hence it produces a minimum spanning tree. $$\blacksquare$$
Divide-and-Conquer
- Divide the problem into several subproblems of the same kind.
- Solve each subproblem recursively.
- Combine the solutions to the subproblems into an overall solution.

Most Common Use
- Divide a problem of size $$n$$ into subproblems of size $$n/2$$ - $$O(n)$$ time.
- Solve the two subproblems recursively.
- Combine the two solutions into an overall solution - $$O(n)$$ time.

Consequence
- Brute force: $$\Theta(n^2)$$.
- Divide-and-conquer: $$O(n \log n)$$.
- Given a list of $$n$$ elements from a totally ordered universe, rearrange them in ascending order.
- Google page rank results
- Identify statistical outliers
- Binary search in a DB
- Remove duplicates in a mailing list
- Convex hull
- Closest pair of points
- Interval scheduling
- Minimizing lateness
- Minimum spanning trees.
- Recursively sort left half
- Recursively sort right half
- Merge two halves to make the sorted whole
- Scan $$A$$ and $$B$$ from left to right.
- Compare $$a_i$$ and $$b_j$$.
- If $$a_i \leq b_j$$, append $$a_i$$ to $$C$$ - it is no larger than any remaining element in $$B$$.
- If $$a_i > b_j$$, append $$b_j$$ to $$C$$ - it is smaller than every remaining element in $$A$$.
- Moving the pointers from left to right shows that there can be a maximum of $$n$$ comparisons.
Input: a list $$L$$ of $$n$$ elements.

Output: the $$n$$ elements in ascending order.

MergeSort($$L$$):
  if (list $$L$$ has one element)
    return $$L$$
  Divide the list into two halves $$A$$ and $$B$$
  $$A \leftarrow$$ MergeSort($$A$$); $$B \leftarrow$$ MergeSort($$B$$)
  return Merge($$A$$, $$B$$)
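The recursive structure can be sketched directly in Python (function names and the example list are illustrative; `merge` follows the left-to-right scan described above):

```python
# Illustrative mergesort; merge scans A and B left to right.

def merge(a, b):
    c, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:           # a[i] no larger than anything left in b
            c.append(a[i]); i += 1
        else:                      # b[j] smaller than everything left in a
            c.append(b[j]); j += 1
    return c + a[i:] + b[j:]       # at most n comparisons in total

def mergesort(xs):
    if len(xs) <= 1:               # base case: already sorted
        return xs
    mid = len(xs) // 2
    return merge(mergesort(xs[:mid]), mergesort(xs[mid:]))

print(mergesort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```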
- $$T(n) =$$ max number of compares to mergesort a list of length $$n$$.

Recurrence: $$T(n) \leq T(\lceil n/2 \rceil) + T(\lfloor n/2 \rfloor) + n$$ for $$n > 1$$, with $$T(1) = 0$$.
- The solution is $$O(n \log n)$$.

Proofs
- We describe several ways to solve the recurrence; initially assume $$n$$ is a power of 2 and replace $$\leq$$ with $$=$$ in the recurrence:
  - If $$n = 1$$, $$T(n) = 0$$.
  - If $$n > 1$$, $$T(n) = 2T(n/2) + n$$.
The recursion tree below can be grouped into levels, where level $$\ell$$ (0-indexed) contains $$2^\ell$$ nodes:
- Level 0 contributes $$1 \cdot n$$ - the denominators cancel.
- Level 1 contributes $$2 \cdot n/2$$.
- Level 2 contributes $$2^2 \cdot n/4$$.

Each level contributes $$n$$, and the number of levels is $$\log_2 n$$; therefore, the total running time is $$O(n \log n)$$.
If $$T(n)$$ satisfies the following recurrence, then $$T(n) = n \log_2 n$$ (for $$n$$ a power of 2):
- If $$n = 1$$, $$T(n) = 0$$.
- If $$n > 1$$, $$T(n) = 2T(n/2) + n$$.
Given $$n$$ points in the plane, find a pair of points with the smallest Euclidean distance between them.
Fundamental geometric primitive:
- Graphics, computer vision, geographic information systems, molecular modelling, air traffic control.
- Special case of nearest neighbour, Euclidean MST, Voronoi diagrams.
Brute force - check all $$\Theta(n^2)$$ pairs of points.

1D version - easy: $$O(n \log n)$$ by sorting.

Non-degeneracy assumption: no two points have the same $$x$$ coordinate.
- Sort by $$x$$ coordinate and consider nearby points.
- Sort by $$y$$ coordinate and consider nearby points.

This does not work:
- In the counterexample figure, the closest two points are far apart in both sorted orders compared to the rest, and therefore will not be identified as the closest pair.

- Divide the plane into 4 quadrants - ideally with a similar number of points in each; however, this is often impossible to guarantee.
Divide - draw vertical line
Conquer - find the closest pair in each side recursively
Combine - find the closest pair with one point in each side
Return - the best of the three solutions
Find the closest pair with one point in each side, assuming that the distance is less than $$\delta$$
- Observation - it suffices to consider only those points within $$\delta$$ of the line $$L$$
  - $$\delta = \min(\delta_1, \delta_2)$$, the smaller of the closest-pair distances found on the two sides - only consider points within $$\delta$$ of the dividing line
  - Any point further than $$\delta$$ from the line cannot belong to the closest pair, as some pair at distance $$\delta$$ already exists
- Sort the points in the $$2\delta$$-strip by their $$y$$-coordinate
- Check distances of only those points within 7 positions of each other in the sorted list
  - Note the pair of points that were closest to each other
  - Repeat this step with the other points, record the smallest of those distances, and update the smallest distance found so far.
  - A naive check would compare all pairs in the strip; since we only consider 7 points each time, the check is linear in the number of points.
- Let $$s_i$$ be the point in the $$2\delta$$-strip with the $$i^{th}$$ smallest $$y$$-coordinate
- Claim - if $$|j - i| > 7$$, then the distance between $$s_i$$ and $$s_j$$ is at least $$\delta$$
- Consider a $$2\delta$$-by-$$\delta$$ rectangle $$R$$ in the strip whose min $$y$$-coordinate is the $$y$$-coordinate of $$s_i$$
- Distance between $$s_i$$ and any point $$s_j$$ above $$R$$ is $$\geq \delta$$
- Subdivide $$R$$ into 8 squares
- At most 1 point per square - the largest distance between two points in one of the squares is the diagonal, $$\delta / \sqrt{2} < \delta$$
  - Each of those squares lies entirely to the left or to the right of the dividing line $$L$$, so two points in one square would be on the same side and less than $$\delta$$ apart - a contradiction
- At most 7 other points can be in $$R$$ - this constant can be refined with a geometric packing argument
  - Any point whose index differs by more than 7 lies at height at least $$\delta$$ above $$s_i$$, so its distance cannot be smaller than $$\delta$$
Compute the vertical line $$L$$ through the median $$x$$-coordinate
$$\delta \leftarrow \min(\delta_1, \delta_2)$$, the best of the two recursive calls
Delete all points further than $$\delta$$ from line $$L$$
Sort the remaining points by $$y$$-coordinate
Scan the points in $$y$$-order, compare each point to the next 7 in the order, and if any distance is smaller,
then update $$\delta$$
return $$\delta$$
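The whole divide-and-conquer routine can be sketched as follows (an illustrative Python version; it re-sorts the strip in each call, giving $$O(n \log^2 n)$$ rather than the fully optimized $$O(n \log n)$$):

```python
import math

def closest_pair(points):
    """Divide-and-conquer closest pair; returns the smallest pairwise distance."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def rec(px):  # px: points sorted by x-coordinate
        n = len(px)
        if n <= 3:  # brute force tiny cases
            return min(dist(px[i], px[j]) for i in range(n) for j in range(i + 1, n))
        mid = n // 2
        x_line = px[mid][0]                        # vertical dividing line L
        delta = min(rec(px[:mid]), rec(px[mid:]))  # best of the two sides
        # Keep only points within delta of L, sorted by y-coordinate.
        strip = sorted((p for p in px if abs(p[0] - x_line) < delta),
                       key=lambda p: p[1])
        # Compare each strip point with the next 7 in y-order.
        for i, p in enumerate(strip):
            for q in strip[i + 1:i + 8]:
                delta = min(delta, dist(p, q))
        return delta

    return rec(sorted(points))
```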
- Recipe for solving common divide-and-conquer recurrences of the form $$T(n) = a \, T(n/b) + f(n)$$
Terms
- $$a \geq 1$$ is the number of subproblems
- $$b \geq 2$$ is the factor by which the subproblem size decreases
- $$f(n) \geq 0$$ is the work to divide and combine subproblems
- $$a =$$ branching factor
- $$a^i =$$ number of subproblems at level $$i$$
- $$1 + \log_b n$$ levels
- $$n/b^i =$$ size of subproblems at level $$i$$
A useful identity for the number of leaves:
$$a^{\log_b n} = (b^{\log_b a})^{\log_b n} = b^{\log_b a \cdot \log_b n} = (b^{\log_b n})^{\log_b a} = n^{\log_b a}$$
Let $$r = a/b^c$$ - summing the work $$\Theta(n^c)$$ over the levels gives a geometric series in $$r$$
- Geometric series
  - If $$0 < r < 1$$, then $$1 + r + r^2 + ... + r^k \leq 1/(1-r)$$
  - If $$r = 1$$, then $$1 + r + r^2 + ... + r^k = k + 1$$
  - If $$r > 1$$, then $$1 + r + r^2 + ... + r^k = (r^{k+1} - 1)/(r - 1)$$
- $$\frac{r^{k+1} - 1}{r - 1} = O(r^k)$$ as $$r$$ is a constant
Therefore, the total work is dominated according to the three cases below.
- Recipe for solving common DC recurrences of the form $$T(n) = a \, T(n/b) + \Theta(n^c)$$
3 Cases:
- If $$c < \log_b a$$, then $$T(n) = \Theta(n^{\log_b a})$$ - the solution is the bigger of the two exponents
- If $$c = \log_b a$$, then $$T(n) = \Theta(n^c \log n)$$
- If $$c > \log_b a$$, then $$T(n) = \Theta(n^c)$$ - the solution is the bigger of the two exponents
Extensions
- Can replace $$\Theta$$ with $$O$$ everywhere
- Can replace $$\Theta$$ with $$\Omega$$ everywhere
- Can replace the initial conditions with $$T(n) = \Theta(1) \; \forall n \leq n_0$$ and require the recurrence to hold only $$\forall n \in \Z : n > n_0$$
$$T(n) = 3T(\lfloor n/2 \rfloor) + 5n$$: $$a = 3, b = 2, c = 1$$, $$\log_b a = \log_2 3 \approx 1.58$$, so case 1 gives $$T(n) = \Theta(n^{\log_2 3}) = O(n^{1.585})$$
$$T(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + 17n$$: $$a = 2, b = 2, c = 1$$, $$\log_b a = 1$$, so case 2 gives $$T(n) = \Theta(n \log n)$$
$$T(n) = 48T(\lfloor n/4 \rfloor) + n^3$$: $$a = 48, b = 4, c = 3$$, $$\log_b a = \log_4 48 \approx 2.79 < 3$$, so case 3 gives $$T(n) = \Theta(n^3)$$
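The three examples can be classified mechanically by comparing $$c$$ with $$\log_b a$$ (a small helper sketch; the returned string labels are illustrative):

```python
import math

def master_theorem(a, b, c):
    """Classify T(n) = a*T(n/b) + Theta(n^c) by comparing c with log_b(a)."""
    crit = math.log(a, b)                  # critical exponent log_b a
    if abs(c - crit) < 1e-9:
        return f"Theta(n^{c} log n)"       # case 2: work balanced across levels
    if c < crit:
        return f"Theta(n^{crit:.3f})"      # case 1: leaves dominate
    return f"Theta(n^{c})"                 # case 3: root dominates

print(master_theorem(3, 2, 1))   # leaves dominate, exponent log_2 3 ~ 1.585
print(master_theorem(2, 2, 1))   # balanced: Theta(n log n)
print(master_theorem(48, 4, 3))  # root dominates: Theta(n^3)
```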
The master theorem does not apply when:
- The number of subproblems is not a constant: $$T(n) = n \cdot T(n/2) + n^2$$
- The number of subproblems is less than $$1$$: $$T(n) = \frac{1}{2} \cdot T(n/2) + n^2$$
- The work to divide and combine subproblems is not $$\Theta(n^c)$$: $$T(n) = 2 \cdot T(n/2) + n \log n$$
- Addition - given two $$n$$-bit numbers $$a, b$$, compute $$a + b$$
- Subtraction - given two $$n$$-bit numbers $$a, b$$, compute $$a - b$$
- Standard algorithms: $$\Theta(n)$$ bit operations
Multiplication:
- Given two $$n$$-bit numbers $$a, b$$, compute $$a \cdot b$$
- Typical algorithm - long multiplication: $$\Theta(n^2)$$ bit operations
- Divide $$x, y$$ into low- and high-order bits
- Multiply four $$\frac{1}{2}n$$-bit integers recursively
- Add and shift to obtain the result: $$m = \lceil n/2 \rceil$$
  - $$a = \lfloor x/2^m \rfloor$$, $$b = x \bmod 2^m$$
  - $$c = \lfloor y/2^m \rfloor$$, $$d = y \bmod 2^m$$
- $$xy = (2^m a + b)(2^m c + d) = 2^{2m}ac + 2^m(bc + ad) + bd$$
- Multiplication by a power of two can be done via a bit shift
If $$(n = 1)$$
Return $$x \cdot y$$
Else
Return $$2^{2m} \cdot Multiply(a, c) + 2^m \cdot (Multiply(b, c) + Multiply(a, d)) + Multiply(b, d)$$
- Divide $$x$$ and $$y$$ into low- and high-order bits
- Compute the middle term $$bc + ad$$ using the identity: $$bc + ad = ac + bd - (a-b)(c-d)$$
- Multiply only three $$\frac{1}{2}n$$-bit integers recursively: $$m = \lceil n/2 \rceil$$
  - $$a = \lfloor x/2^m \rfloor$$, $$b = x \bmod 2^m$$
  - $$c = \lfloor y/2^m \rfloor$$, $$d = y \bmod 2^m$$
- $$xy = (2^m a + b)(2^m c + d) = 2^{2m}ac + 2^m(bc + ad) + bd = 2^{2m}ac + 2^m(ac + bd - (a-b)(c-d)) + bd$$
  - This means only three products need be calculated - the middle term can be derived from the other parts
If $$(n = 1)$$
Return $$x \cdot y$$
Else
Compute the three products $$ac$$, $$bd$$, and $$|a-b| \cdot |c-d|$$ recursively
Flip the sign of the third product if $$a-b$$ and $$c-d$$ have opposite signs
Return $$2^{2m} \cdot ac + 2^m \cdot (ac + bd - (a-b)(c-d)) + bd$$
- The recurrence is $$T(n) = 3T(n/2) + \Theta(n)$$; applying case 1 of the master theorem: $$T(n) = \Theta(n^{\log_2 3}) = O(n^{1.585})$$
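Karatsuba's three-product scheme can be sketched in Python (illustrative; Python's arbitrary-precision integers stand in for the bit-level shifts, and small or negative operands fall back to direct multiplication):

```python
def karatsuba(x, y):
    """Multiply nonnegative integers with three recursive half-size products."""
    if x < 10 or y < 10:                    # base case: small (or negative) operands
        return x * y
    m = max(x.bit_length(), y.bit_length()) // 2
    a, b = x >> m, x & ((1 << m) - 1)       # a = floor(x / 2^m), b = x mod 2^m
    c, d = y >> m, y & ((1 << m) - 1)       # c = floor(y / 2^m), d = y mod 2^m
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # bc + ad via the identity ac + bd - (a-b)(c-d); the base case absorbs negatives.
    middle = ac + bd - karatsuba(a - b, c - d)
    return (ac << (2 * m)) + (middle << m) + bd
```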
Arithmetic Problem | Formula | Bit Complexity |
---|---|---|
Integer Multiplication | $$a \cdot b$$ | $$M(n)$$ |
Integer Square | $$a^2$$ | $$\Theta(M(n))$$ |
Integer Division | $$\lfloor a/b \rfloor,\; a \bmod b$$ | $$\Theta(M(n))$$ |
Integer Square Root | $$\lfloor \sqrt{a} \rfloor$$ | $$\Theta(M(n))$$ |

where $$M(n)$$ denotes the bit complexity of multiplying two $$n$$-bit integers.
Algorithmic paradigms:
- Greedy - process the input in some order, making irreversible decisions.
- Divide and conquer - break up a problem into independent subproblems, solve each subproblem, and combine the solutions to form a solution to the original problem.
- Dynamic programming - break up a problem into a series of overlapping subproblems, and combine solutions to smaller subproblems to form a solution to a larger subproblem.
Bellman pioneered the systematic study of dynamic programming in the 1950s
- Dynamic programming = planning over time
- The Secretary of Defense had a pathological fear of mathematical research
- Bellman sought a "dynamic" adjective to avoid conflict.
- AI, compilers, systems, graphics
- Operations research
- Information theory
- Control theory
- Bioinformatics
- Seam carving
- Comparing two files
- Hidden Markov Models
- Evaluating spline curves
- Shortest path working with negative lengths
- Word wrapping in TeX
- Parsing context free grammars
- Sequence alignment
Instead of maximizing the number of intervals, the idea is to maximize the total weight of the chosen intervals - a strict generalization of the problem.
The unweighted interval scheduling problem is the special case of this problem in which every interval has the same weight.
- Job $$j$$ starts at $$s_j$$, finishes at $$f_j$$, and has weight $$w_j > 0$$
- Two jobs are compatible if they don't overlap
- Find the max weight subset of mutually compatible jobs.
- Consider the jobs in ascending order of finish time
- Add job to the subset if it is compatible with previously chosen jobs.
The greedy algorithm only works when all weights are 1.
In the example below, the EFTF algorithm selects jobs a and c, since b is not compatible with them - even though b alone may have larger total weight.
Jobs are in ascending order of finish time:
- $$p(j) =$$ largest index $$i < j$$ such that $$i, j$$ are compatible - $$i$$ is the rightmost interval that ends before $$j$$
Example:
- $$p(8) = 1, p(7) = 3, p(2) = 0$$ - these values can be computed efficiently; for now, assume they are known.
  - All the jobs whose finish times are larger than the start time of 8 are not compatible with 8 - each overlaps some moment while 8 is ongoing.
- 1 is the only job compatible with 8.
- For job 7, the largest compatible job number is 3
Definition $$OPT(j) =$$ max weight of any subset of mutually compatible jobs for the subproblem consisting of only jobs $$1, 2, …, j$$
Goal $$OPT(n)$$
Case 1: $$OPT(j)$$ does not select job $$j$$
- Must be an optimal solution to the smaller problem consisting of the remaining jobs $$1, 2, …, j-1$$
  - Suppose it were not optimal for that subproblem - then there is a better one. Since the better one only uses jobs up to $$j-1$$, it is also a solution to the problem with $$j$$ jobs, contradicting the optimality of the original solution.
Case 2: $$OPT(j)$$ selects job $$j$$
- Collect profit $$w_j$$ - the solution includes the weight of $$j$$ plus the weights of the other jobs in it
- Can't use the incompatible jobs $$\{p(j)+1, p(j)+2, …, j-1\}$$
  - All the jobs between $$p(j)$$ and $$j-1$$ are definitely not compatible with job $$j$$
- Must include an optimal solution to the problem consisting of the remaining compatible jobs $$1, 2, …, p(j)$$
  - If it didn't, we could take the better solution of the subproblem $$1, ..., p(j)$$ and add the $$j$$^th^ job to it - all jobs up to $$p(j)$$ are compatible with $$j$$ - contradicting the assumption that the original solution was optimal.
Bellman Equation:
- $$OPT(j) = 0$$ when $$j = 0$$
- $$OPT(j) = \max\{OPT(j-1),\; w_j + OPT(p(j))\}$$ when $$j > 0$$
  - Initially, we do not know whether job $$j$$ is included; all we know is that the optimal value is the maximum of the two cases.
Sort jobs by finish time and renumber so that $$f_1 \leq f_2 \leq … \leq f_n$$
Compute $$p[1], p[2], …, p[n]$$
Return $$ComputeOpt(n)$$

$$ComputeOpt(j)$$:
if $$j = 0$$
return 0
else
return $$\max\{ComputeOpt(j-1),\; w_j + ComputeOpt(p[j])\}$$
ComputeOpt has exponential worst-case running time
- Slow because of repeated overlapping subproblems
- The number of recursive calls for a family of layered instances grows like a Fibonacci sequence.
- Grows at the rate of the 'golden ratio'
Top-down dynamic programming
- Cache the result of subproblem $$j$$ in $$M[j]$$
- Use $$M[j]$$ to avoid solving subproblem $$j$$ more than once
Sort jobs by finish time and renumber so that $$f_1 \leq … \leq f_n$$
Compute $$p[1], …, p[n]$$; initialize $$M[0] = 0$$ and leave $$M[1], …, M[n]$$ uninitialized
return $$MComputeOpt(n)$$

$$MComputeOpt(j)$$:
if $$M[j]$$ is uninitialized
$$M[j] = \max\{MComputeOpt(j-1),\; w_j + MComputeOpt(p[j])\}$$
return $$M[j]$$
The memoized algorithm takes $$O(n \log n)$$ time
Proof
- Sorting by finish time takes $$O(n \log n)$$ with mergesort
- Computing $$p[j]$$ for each $$j$$: $$O(n \log n)$$ with binary search
- $$MComputeOpt(j)$$: each invocation takes $$O(1)$$ time and either
  - (1) returns an initialized value $$M[j]$$
  - (2) initializes $$M[j]$$ and makes two recursive calls, increasing the number of initialized entries
    - Case (2) can execute at most $$n$$ times; since each such execution makes two recursive calls, there are at most $$2n$$ calls in total
- Progress measure $$\Phi = \#$$ initialized entries among $$M[1, .., n]$$
  - Initially $$\Phi = 0$$; throughout, $$\Phi \leq n$$
  - (2) increases $$\Phi$$ by 1 $$\Longrightarrow \leq 2n$$ recursive calls
- The overall running time of $$MComputeOpt(n)$$ is $$O(n)$$
- To recover the solution itself, make a second pass by calling $$FindSolution(n)$$:

if $$j = 0$$
return $$\emptyset$$
else if $$w_j + M[p[j]] > M[j-1]$$
return $$\{j\} \cup FindSolution(p[j])$$
else
return $$FindSolution(j-1)$$

- The number of recursive calls is $$\leq n \Longrightarrow O(n)$$ time
Sort jobs by finish time and renumber so that $$f_1 \leq … \leq f_n$$
Compute $$p[1], …, p[n]$$; set $$M[0] = 0$$
for $$j = 1$$ to $$n$$: $$M[j] = \max\{M[j-1],\; w_j + M[p[j]]\}$$
Takes $$O(n \log n)$$ time overall
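The bottom-up algorithm plus traceback can be sketched in Python (illustrative; `p` is computed with binary search over finish times, as described above):

```python
import bisect

def weighted_interval_scheduling(jobs):
    """jobs: list of (start, finish, weight). Returns (max total weight, chosen jobs)."""
    jobs = sorted(jobs, key=lambda j: j[1])         # sort by finish time
    finishes = [j[1] for j in jobs]
    n = len(jobs)
    # p[j] = number of jobs whose finish time is <= job j's start time
    p = [bisect.bisect_right(finishes, jobs[j][0]) for j in range(n)]
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        w = jobs[j - 1][2]
        M[j] = max(M[j - 1], w + M[p[j - 1]])       # Bellman equation
    # Trace back to recover the chosen jobs.
    chosen, j = [], n
    while j > 0:
        w = jobs[j - 1][2]
        if w + M[p[j - 1]] > M[j - 1]:              # job j is in the optimum
            chosen.append(jobs[j - 1])
            j = p[j - 1]
        else:
            j -= 1
    return M[n], chosen[::-1]
```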
Goal - pack a knapsack so as to maximise the total value of items taken
- There are $$n$$ items; item $$i$$ provides value $$v_i > 0$$ and weighs $$w_i > 0$$
- Value of a subset of items = sum of the values of the individual items
- The knapsack has a weight limit of $$W$$
Assumption: all values and weights are integral
Definition $$OPT(i, w) =$$ max value of any subset of items $$1, …, i$$ subject to weight limit $$w$$
Goal $$OPT(n, W)$$
Case 1: $$OPT(i, w)$$ does not select item $$i$$
- $$OPT(i, w)$$ selects the best of $$\{1, 2, …, i-1\}$$ subject to weight limit $$w$$
Case 2: $$OPT(i, w)$$ selects item $$i$$
- Collect value $$v_i$$
- New weight limit = $$w - w_i$$
- $$OPT(i, w)$$ selects the best of $$\{1, 2, …, i-1\}$$ subject to the new weight limit
Bellman Equation
- $$OPT(i, w) = 0$$ if $$i = 0$$
- $$OPT(i, w) = OPT(i-1, w)$$ if $$w_i > w$$
- $$OPT(i, w) = \max\{OPT(i-1, w),\; v_i + OPT(i-1, w-w_i)\}$$ otherwise

for $$w = 0$$ to $$W$$: $$M[0, w] = 0$$
for $$i = 1$$ to $$n$$
for $$w = 0$$ to $$W$$
if $$w_i > w$$: $$M[i, w] = M[i-1, w]$$
else: $$M[i, w] = \max\{M[i-1, w],\; v_i + M[i-1, w-w_i]\}$$
return $$M[n, W]$$
Theorem The algorithm solves the knapsack problem in $$\Theta(n \cdot W)$$ time and space
Proof
- Takes $$O(1)$$ time per entry
- There are $$\Theta(n \cdot W)$$ entries
- After computing the optimal values, we can trace back to find a solution: $$OPT(i, w)$$ takes item $$i$$ iff $$M[i, w] > M[i-1, w]$$
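A sketch of the table-filling algorithm with the traceback rule $$M[i, w] > M[i-1, w]$$ (illustrative Python, not the notes' own code):

```python
def knapsack(values, weights, W):
    """0/1 knapsack: returns (max value, chosen item indices) in Theta(n*W) time."""
    n = len(values)
    # M[i][w] = max value using items 1..i with weight limit w
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            if weights[i - 1] > w:                   # item i does not fit
                M[i][w] = M[i - 1][w]
            else:
                M[i][w] = max(M[i - 1][w],
                              values[i - 1] + M[i - 1][w - weights[i - 1]])
    # Trace back: item i is taken iff M[i][w] > M[i-1][w].
    chosen, w = [], W
    for i in range(n, 0, -1):
        if M[i][w] > M[i - 1][w]:
            chosen.append(i - 1)                     # 0-based index of taken item
            w -= weights[i - 1]
    return M[n][W], chosen[::-1]
```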
Originates from speech recognition and natural language processing - how similar are two strings?
- There are different ways of determining this
- Gap penalty $$\delta$$; mismatch penalty $$\alpha$$
- Cost = sum of gap and mismatch penalties
- There are also applications in computational biology, in sequence alignment of DNA and proteins
- Given two strings $$x_1, …, x_m$$ and $$y_1, …, y_n$$, find a minimum cost alignment
- An alignment $$M$$ is a set of ordered pairs $$x_i - y_j$$ such that each character appears in at most one pair and there are no crossings
  - $$x_i - y_j$$ and $$x_{i'} - y_{j'}$$ cross if $$i < i'$$ but $$j > j'$$
- The cost of alignment $$M$$ is $$cost(M) = \sum_{(x_i, y_j) \in M} \alpha_{x_i y_j} + \sum_{\text{unmatched } x_i} \delta + \sum_{\text{unmatched } y_j} \delta$$
  - some of the characters may be aligned to gaps
- $$OPT(i, j) =$$ min cost of aligning the prefix strings $$x_1, …, x_i$$ and $$y_1, …, y_j$$
- Goal: $$OPT(m, n)$$
Case 1: $$OPT(i, j)$$ matches $$x_i - y_j$$
- Pay the mismatch cost for $$x_i - y_j$$ plus the min cost of aligning $$x_1, …, x_{i-1}$$ and $$y_1, ..., y_{j-1}$$
  - The rest of the alignment must be an optimal alignment of the two prefixes: if it were not, we could improve the cost of the whole alignment by substituting an optimal alignment of the prefixes.
Case 2a: $$OPT(i, j)$$ leaves $$x_i$$ unmatched
- Pay the gap cost for $$x_i$$ plus the min cost of aligning $$x_1, …, x_{i-1}$$ and $$y_1, ..., y_j$$
  - The last letter of $$x$$ is matched to a gap
  - The corresponding prefixes must again be matched by an optimal alignment
Case 2b: $$OPT(i, j)$$ leaves $$y_j$$ unmatched
- Pay the gap cost for $$y_j$$ plus the min cost of aligning $$x_1, …, x_i$$ and $$y_1, ..., y_{j-1}$$
  - The last letter of $$y$$ is matched to a gap
- The number of subproblems is $$O(mn)$$
- Fill a 2D array with $$m$$ and $$n$$ as the dimensions
  - The value for cell $$(i, j)$$ relies on the values one to the left, one above, and one above-left
Theorem The dynamic programming algorithm computes the edit distance and an optimal alignment of two strings of lengths $$m$$ and $$n$$ in $$\Theta(mn)$$ time and space
Proof
- The algorithm computes the edit distance
- Can trace back to extract the optimal alignment itself
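A sketch of the table fill (the gap and mismatch penalties are illustrative parameters, not values from the notes):

```python
def alignment_cost(x, y, gap=2, mismatch=1):
    """Min cost alignment of strings x and y with gap and mismatch penalties."""
    m, n = len(x), len(y)
    # M[i][j] = min cost of aligning x[:i] and y[:j]
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        M[i][0] = i * gap                  # align x[:i] against all gaps
    for j in range(1, n + 1):
        M[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 0 if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = min(match + M[i - 1][j - 1],   # case 1: match x_i with y_j
                          gap + M[i - 1][j],         # case 2a: x_i unmatched
                          gap + M[i][j - 1])         # case 2b: y_j unmatched
    return M[m][n]
```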
- Given a digraph $$G = (V, E)$$ with arbitrary edge lengths, find the shortest path from source node $$s$$ to destination node $$t$$
Dijkstra may not find the optimal solution when edge weights are negative. Adding a constant to every edge does not necessarily work either, as it penalizes paths with more edges.
A negative cycle is a directed cycle for which the sum of its edge lengths is negative
- A critical issue: shortest paths need not exist when negative cycles are present
Lemma 1 If some $$v \leadsto t$$ path contains a negative cycle, then there does not exist a shortest $$v \leadsto t$$ path
- If there is a negative cycle, we can loop around it arbitrarily many times, decreasing the length each time.
Lemma 2 If $$G$$ has no negative cycles, then there exists a shortest $$v \leadsto t$$ path that is simple, with $$\leq n-1$$ edges
- Among shortest paths, consider the one that uses the fewest edges
- If that path $$P$$ contained a directed cycle $$W$$, we could remove the portion of $$P$$ corresponding to $$W$$ without increasing its length.
Single destination shortest paths problem Given a digraph $$G = (V, E)$$ with edge lengths $$l_{vw}$$ but no negative cycles, and a destination node $$t$$, find a shortest $$v \leadsto t$$ path for every node $$v$$
Negative cycle problem Given a digraph $$G = (V, E)$$ with edge lengths $$l_{vw}$$, find a negative cycle if one exists
Def $$OPT(i, v) =$$ length of a shortest $$v \leadsto t$$ path using $$\leq i$$ edges
Goal - $$OPT(n-1, v)$$ for each node $$v$$ (by Lemma 2, shortest paths are simple)
Case 1: Shortest $$v \leadsto t$$ path uses $$\leq i-1$$ edges
- $$OPT(i, v) = OPT(i-1, v)$$
Case 2: Shortest $$v \leadsto t$$ path uses exactly $$i$$ edges
- If $$(v, w)$$ is the first edge in the shortest such path, incur a cost of $$l_{vw}$$
- Then select the best $$w \leadsto t$$ path using $$\leq i-1$$ edges
Time complexity is $$\Theta(mn)$$
Theorem Given a digraph $$G$$ with no negative cycles, the dynamic programming algorithm computes the lengths of the shortest $$v \leadsto t$$ paths for all nodes $$v$$ in $$\Theta(mn)$$ time and $$\Theta(n^2)$$ space
Proof
- The table requires $$\Theta(n^2)$$ space
- Each iteration $$i$$ takes $$\Theta(m)$$ time since we examine each edge once.
Finding the shortest paths
- Approach 1: maintain $$successor[i, v]$$ that points to the next node on a shortest $$v \leadsto t$$ path using $$\leq i$$ edges
- Approach 2: compute the optimal lengths $$M[i, v]$$ and consider only edges with $$M[i, v] = M[i-1, w] + l_{vw}$$. Any directed path to $$t$$ in this subgraph is a shortest path.
- Maintain two 1D arrays, indexed by the vertices of the graph:
  - $$d[v] =$$ length of the shortest $$v \leadsto t$$ path that we have found so far
  - $$successor[v] =$$ next node on that $$v \leadsto t$$ path
- They are initialised so that the $$d$$ values are $$\infty$$ and the successor values are $$null$$, with $$d[t] = 0$$
- Performance optimization:
  - If $$d[w]$$ was not updated in iteration $$i-1$$, there is no reason to consider edges entering $$w$$ in iteration $$i$$
  - This does not lead to a worst-case asymptotic improvement
- If a pass makes no changes, later passes will not change anything either, so the algorithm can stop early.
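The array-based Bellman-Ford with early stopping can be sketched as follows (illustrative; it assumes no negative cycle is reachable on any $$v \leadsto t$$ path):

```python
def bellman_ford(n, edges, t):
    """Single-destination Bellman-Ford.
    n: number of nodes (0..n-1); edges: list of (v, w, length); t: destination.
    Returns (d, successor)."""
    INF = float('inf')
    d = [INF] * n                        # d[v] = best v~>t length found so far
    successor = [None] * n
    d[t] = 0
    for _ in range(n - 1):               # at most n-1 passes (Lemma 2)
        changed = False
        for v, w, length in edges:
            if d[w] + length < d[v]:     # relax edge (v, w)
                d[v] = d[w] + length
                successor[v] = w
                changed = True
        if not changed:                  # early stop: later passes change nothing
            break
    return d, successor
```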
Lemma 3 For each node $$v$$, $$d[v]$$ is the length of some $$v \leadsto t$$ path
- Every time $$d[v]$$ is updated, we find a new path which, instead of following some edge from $$v$$, now follows another edge which gives a better estimate.
- By the induction hypothesis applied to the destination of that edge, there is some path from that vertex to $$t$$; the first edge plus that path gives a $$v \leadsto t$$ path whose length equals the new estimate.
Lemma 4 For each node $$v$$, $$d[v]$$ is monotonically non-increasing
- The value of $$d[v]$$ cannot increase for any vertex
- Whenever $$d[v]$$ is changed, it is only to a value strictly smaller than its current value
Lemma 5 After pass $$i$$, $$d[v] \leq$$ length of a shortest $$v \leadsto t$$ path using $$\leq i$$ edges
Proof - induction on $$i$$:
- Base case: $$i = 0$$ - $$d[t] = 0$$ and every other $$d[v] = \infty$$
- Assume the claim is true after pass $$i$$
- Let $$P$$ be any $$v \leadsto t$$ path with $$\leq i+1$$ edges
- Let $$(v, w)$$ be the first edge in $$P$$ and let $$P'$$ be the subpath from $$w$$ to $$t$$
- By the induction hypothesis, at the end of pass $$i$$, $$d[w] \leq l(P')$$ because $$P'$$ is a $$w \leadsto t$$ path with $$\leq i$$ edges
- After considering edge $$(v, w)$$ in pass $$i+1$$ (and by Lemma 4, $$d[v]$$ does not increase afterwards):
$$d[v] \leq l_{vw} + d[w] \leq l_{vw} + l(P') = l(P)$$
Theorem Assuming no negative cycles, Bellman-Ford computes the lengths of the shortest $$v \leadsto t$$ paths in $$O(mn)$$ time and $$\Theta(n)$$ extra space
- Lemma 2 + Lemma 3 + Lemma 5
- Shortest paths exist with at most $$n-1$$ edges (Lemma 2); after $$i$$ passes, $$d[v] \leq$$ the length of a shortest path using at most $$i$$ edges (Lemma 5); and $$d[v]$$ is the length of some $$v \leadsto t$$ path (Lemma 3)
Typically faster in practice:
- Edge $$(v, w)$$ is considered in pass $$i+1$$ only if $$d[w]$$ was updated in pass $$i$$
- If the shortest path has $$k$$ edges, then the algorithm finds it after $$\leq k$$ passes
Claim Throughout Bellman-Ford, following the $$successor[v]$$ pointers gives a directed $$v \leadsto t$$ path of length $$d[v]$$
- This claim is false
- The length of the successor path may be strictly shorter than $$d[v]$$
- If there are negative cycles, the situation may become worse - the successor graph may contain directed cycles
The sum of the edge lengths in a successor cycle is negative
- Every directed cycle $$W$$ in the successor graph is a negative cycle
Proof
- If $$successor[v] = w$$, we must have $$d[v] \geq d[w] + l_{vw}$$
  - LHS and RHS are equal when $$successor[v]$$ is set; afterwards $$d[w]$$ can only decrease
  - $$d[v]$$ decreases only when $$successor[v]$$ is reset
- Let $$v_1 \rightarrow v_2 \rightarrow … \rightarrow v_k \rightarrow v_1$$ be the sequence of nodes in a directed cycle $$W$$
- Assume that $$(v_k, v_1)$$ is the last edge in $$W$$ added to the successor graph
- Just prior to that: $$d[v_i] \geq d[v_{i+1}] + l(v_i, v_{i+1})$$ for $$i = 1, …, k-1$$, and $$d[v_k] > d[v_1] + l(v_k, v_1)$$ (strict, since $$d[v_k]$$ is about to decrease)
- Adding the inequalities yields $$0 > l(v_1, v_2) + l(v_2, v_3) + … + l(v_{k-1}, v_k) + l(v_k, v_1)$$
- Hence $$W$$ is a negative cycle
Theorem If we assume there are no negative cycles, Bellman-Ford finds the shortest $$v \leadsto t$$ paths for every node $$v$$ in $$O(mn)$$ time and $$\Theta(n)$$ extra space
Proof
- The successor graph cannot have a directed cycle (any such cycle would be a negative cycle)
- Thus, following the successor pointers from $$v$$ yields a directed path to $$t$$
- Let $$v = v_1 \rightarrow v_2 \rightarrow … \rightarrow v_k = t$$ be the nodes along this path $$P$$
- Upon termination, if $$successor[v] = w$$, we must have $$d[v] = d[w] + l_{vw}$$
  - LHS and RHS are equal when $$successor[v]$$ is set, and $$d[\cdot]$$ did not change afterwards
- Therefore, adding the equations along $$P$$ gives $$d[v] = d[t] + l(v_1, v_2) + l(v_2, v_3) + … + l(v_{k-1}, v_k)$$
  - $$d[v]$$ is at most the min length of any $$v \leadsto t$$ path, $$d[t] = 0$$, and the rest is the length of path $$P$$ - so $$P$$ is a shortest path
Algorithm design antipatterns
- NP-completeness -
$$O(n^k)$$ algorithm unlikely - PSPACE-completeness -
$$O(n^k)$$ certification algorithm unlikely - Undecidability - No algorithm possible
What can be solved in practice?
A working definition - the problems with poly-time algorithms
Theorem Definition is broad and robust - Turing machine, word RAM, uniform circuits - concept of a polynomial time algorithm is independent of the machine models adopted
Practice Poly-time algorithms scale to huge problems
- In theory we disregard the multiplicative constant and the degree of the polynomial, though either could be impractically large in practice
Yes | Probably no |
---|---|
Shortest-path | Longest path |
Min cut | Max cut |
2-satisfiability | 3-satisfiability |
planar 4-colorability | planar 3-colorability |
bipartite vertex cover | vertex cover |
matching | 3-d matching |
primality testing | factoring |
Linear programming | integer linear programming |
Desiderata - Classify problems according to those that can be solved in polynomial time and those that cannot
Probably requires exponential time
- Given a constant-size program, does it halt in at most $$k$$ steps? - the input size is $$c + \log k$$
- Given a board position in an $$n$$-by-$$n$$ generalisation of checkers, can black guarantee a win?
A huge number of fundamental problems have defied classification
Goal: classify problems according to their relative difficulty
- Suppose we could solve problem $$Y$$ in polynomial time - what else could we solve in polynomial time?
Reduction Problem $$X$$ polynomial-time reduces to problem $$Y$$ if arbitrary instances of $$X$$ can be solved using:
- A polynomial number of standard computational steps, plus
- A polynomial number of calls to an oracle that solves problem $$Y$$
  - Oracle: a computational model supplemented by a special piece of hardware that solves instances of $$Y$$ in a single step
Notation $$X \leq_p Y$$
Note - we pay for the time to write down the instances of $$Y$$ sent to the oracle, so those instances must be of polynomial size
Design algorithms - If $$X \leq_p Y$$ and $$Y$$ can be solved in polynomial time, then $$X$$ can be solved in polynomial time
- Establish intractability - If $$X \leq_p Y$$ and $$X$$ cannot be solved in polynomial time, then $$Y$$ cannot be solved in polynomial time
- Establish equivalence - If $$X \leq_p Y \wedge Y \leq_p X$$, we use the notation $$X \equiv_p Y$$; in this case, $$X$$ can be solved in polynomial time iff $$Y$$ can be
Classify problems by relative difficulty
An example of a packing problem.
- Given a graph
$$G=(V,E)$$ and an integer$$k$$ , is there a subset of$$k$$ or more vertices such that no two are adjacent
An example of a covering problem
- Given a graph
$$G=(V,E)$$ and an integer$$k$$ , is there a subset of$$\leq k$$ vertices such that each edge is incident to at least one vertex in the subset.
Theorem IndependentSet $$\equiv_p$$ VertexCover
- We show $$S$$ is an independent set of size $$k$$ iff $$V - S$$ is a vertex cover of size $$n - k$$
Polytime Reduction:
- Input: $$(G, k)$$
- Output: the result of calling the oracle with input $$(G, n-k)$$ - an algorithm for solving one problem using instances of the other
- Let $$S$$ be any independent set of size $$k$$
- $$V-S$$ is of size $$n-k$$
- Consider an arbitrary edge $$(u, v) \in E$$
- $$S$$ independent $$\implies$$ either $$u \notin S$$ or $$v \notin S$$ or both $$\implies$$ either $$u \in V-S$$ or $$v \in V-S$$ or both
- Therefore $$V-S$$ covers $$(u, v)$$
- Let $$V-S$$ be any vertex cover of size $$n-k$$
- $$S$$ is of size $$k$$
- Consider an arbitrary edge $$(u, v) \in E$$
- $$V-S$$ is a vertex cover $$\implies$$ either $$u \in V-S$$ or $$v \in V-S$$ or both $$\implies$$ either $$u \notin S$$ or $$v \notin S$$ or both
- Therefore $$S$$ is an independent set.
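The equivalence can be verified exhaustively on a small graph (a sanity-check sketch, not part of the proof; the example graph is arbitrary):

```python
from itertools import combinations

def is_independent_set(edges, S):
    """No edge has both endpoints in S."""
    return all(not (u in S and v in S) for u, v in edges)

def is_vertex_cover(edges, C):
    """Every edge has at least one endpoint in C."""
    return all(u in C or v in C for u, v in edges)

# For every subset S of V: S is independent iff V - S is a vertex cover.
V = {0, 1, 2, 3}
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
for r in range(len(V) + 1):
    for S in map(set, combinations(V, r)):
        assert is_independent_set(edges, S) == is_vertex_cover(edges, V - S)
```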
- Given a set $$U$$ of elements, a collection $$S$$ of subsets of $$U$$, and an integer $$k$$, are there $$\leq k$$ of these subsets whose union is equal to $$U$$?
Application:
- $$m$$ available pieces of software
- A set $$U$$ of $$n$$ capabilities that we would like our system to have
- The $$i^{th}$$ piece of software provides the set $$S_i \subseteq U$$ of capabilities
- Goal - achieve all $$n$$ capabilities using the fewest pieces of software
Theorem VertexCover $$\leq_p$$ SetCover
Proof Given a VertexCover instance $$(G = (V, E), k)$$, we construct a SetCover instance $$(U, S, k)$$, of size polynomial in the input, that has a set cover of size $$k$$ iff $$G$$ has a vertex cover of size $$k$$
Construction:
- Universe $$U = E$$
- Include one subset for each node $$v \in V$$: $$S_v = \{e \in E : e \text{ incident to } v\}$$
Lemma $$G = (V, E)$$ contains a vertex cover of size $$k$$ iff $$(U, S, k)$$ contains a set cover of size $$k$$
Proof
- ($$\Longrightarrow$$) Let $$X$$ be a vertex cover of size $$k$$ in $$G$$
  - Then $$Y = \{S_v : v \in X\}$$ is a set cover of size $$k$$
- ($$\Longleftarrow$$) Let $$Y$$ be a set cover of size $$k$$
  - Then $$X = \{v : S_v \in Y\}$$ is a vertex cover of size $$k$$ in $$G$$
Literal A boolean variable or its negation: $$x_i$$ or $$\neg x_i$$
Clause A disjunction of literals, e.g. $$C_j = x_1 \vee \neg x_2 \vee x_3$$
Conjunctive normal form (CNF) A propositional formula $$\Phi$$ that is a conjunction of clauses
SAT Given a CNF formula $$\Phi$$, does it have a satisfying truth assignment?
3-SAT SAT where each clause contains exactly 3 literals, each corresponding to a different variable
Application:
- Electronic design automation - EDA
- Hypotheses: there does not exist a poly-time algorithm for 3-SAT
Theorem 3SAT $$\leq_p$$ IndependentSet
Proof Given an instance $$\Phi$$ of 3SAT, we construct an instance $$(G, k)$$ of IndependentSet that has an independent set of size $$k$$ iff $$\Phi$$ is satisfiable
Construction
- $$G$$ contains 3 nodes for each clause, one for each literal
- Connect the 3 literals in a clause in a triangle
- Connect each literal to each of its negations
($$\Longrightarrow$$) Given a satisfying assignment, select one true literal from each clause/triangle
- This is an independent set of size $$k = |\Phi|$$
- None of the selected nodes are labelled with literals that are negations of each other
($$\Longleftarrow$$) Given an independent set $$S$$ of size $$k$$:
- $$S$$ must contain exactly one node in each triangle
- Set these literals to $$true$$ (and the remaining literals consistently)
- All clauses in $$\Phi$$ are satisfied
- Simple equivalence - IndependentSet $$\equiv_p$$ VertexCover
- Special case to general case - VertexCover $$\leq_p$$ SetCover
- Encoding with gadgets - 3-SAT $$\leq_p$$ IndependentSet
Transitivity $$X \leq_p Y \wedge Y \leq_p Z \implies X \leq_p Z$$
- Proof idea - compose the two algorithms
3-SAT is widely believed to be a hard problem - therefore IndependentSet is likely hard as well
- IndependentSet in turn reduces to VertexCover and then SetCover, so those are likely hard too.
SubsetSum Given $$n$$ natural numbers $$w_1, …, w_n$$ and an integer target $$W$$, is there a subset that adds up to exactly $$W$$?
- With arithmetic problems, the input integers are encoded in binary - a polytime reduction must be polynomial in the binary encoding
Theorem 3SAT $$\leq_p$$ SubsetSum
Proof
Given an instance $$\Phi$$ of 3SAT, we construct an instance of SubsetSum that has a solution iff $$\Phi$$ is satisfiable
- Given a 3SAT instance $$\Phi$$ with $$n$$ variables and $$k$$ clauses, form $$2n + 2k$$ decimal integers, each having $$n + k$$ digits
  - Include one digit for each variable $$x_i$$ and one digit for each clause $$C_j$$
  - Include two numbers for each variable $$x_i$$ (one for $$x_i$$, one for $$\neg x_i$$)
  - Include two dummy numbers for each clause $$C_j$$
  - In the target, the sum of each $$x_i$$ digit is 1, and the sum of each $$C_j$$ digit is 4
- No carries are possible - each digit yields one equation
Subset Sum
Lemma $$\Phi$$ is satisfiable iff there exists a subset of the integers that sums to the target
Proof
- ($$\Longrightarrow$$) Suppose $$\Phi$$ is satisfiable with assignment $$x^*$$:
  - If $$x_i^* = true$$, select the integer in row $$x_i$$; otherwise, select the integer in row $$\neg x_i$$
  - Each $$x_i$$ digit sums to 1
  - Since $$\Phi$$ is satisfiable, each $$C_j$$ digit sums to at least 1 from the $$x_i$$ and $$\neg x_i$$ rows
    - The digit sum is 1, 2, or 3 - never 0
  - Select dummy integers to make the $$C_j$$ digits sum to 4
- ($$\Longleftarrow$$) Suppose a subset $$S^*$$ sums to the target:
  - Digit $$x_i$$ forces $$S^*$$ to select either row $$x_i$$ or row $$\neg x_i$$, but not both
  - If row $$x_i$$ is selected, assign $$x_i^* = true$$; otherwise, assign $$x_i^* = false$$
  - Digit $$C_j$$ forces $$S^*$$ to select at least one literal in each clause
SubsetSum Given natural numbers and a target, is there a subset that adds up to exactly the target? (3SAT $$\leq_p$$ SubsetSum, as shown above)
Knapsack Given a set of items with weights and values, a weight limit, and a target value, is there a subset within the weight limit whose total value meets the target? (SubsetSum $$\leq_p$$ Knapsack)
- A problem $$X$$ is a set of strings
- An instance $$s$$ is one string
- Algorithm $$A$$ solves problem $$X$$: $$A(s) = yes$$ iff $$s \in X$$
- An algorithm runs in polynomial time if for every string $$s$$, $$A(s)$$ terminates in $$\leq p(|s|)$$ steps, where $$p(\cdot)$$ is some polynomial function
- $$P$$ = set of decision problems for which there exists a poly-time algorithm
- Algorithm $$C(s, t)$$ is a certifier for problem $$X$$ if for every string $$s$$: $$s \in X$$ iff there exists a string $$t$$ such that $$C(s, t) = yes$$
- NP = set of decision problems for which there exists a poly-time certifier:
  - $$C(s, t)$$ is a poly-time algorithm
  - The certificate $$t$$ is of polynomial size: $$|t| \leq p(|s|)$$ for some polynomial $$p(\cdot)$$
- $$C(s, t)$$ is a certifier if:
  - It can be convinced:
    - If $$s \in X$$, then there is a $$t$$ such that $$C(s, t) = yes$$
  - It cannot be fooled:
    - If $$s \notin X$$, then for all $$t$$, we have $$C(s, t) = no$$
NP is related to non-determinism (more on this later)
Composites - all positive integers which are not prime
- Instance $$s$$: 437669 - if the number is composite, grade-school division can be convinced; if prime, no certificate can fool the certifier
- Certificate $$t$$: 541 (a nontrivial factor: $$437669 = 541 \times 809$$)
- Certifier $$C(s, t)$$: grade-school division
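The certifier itself is one line (a sketch; the instance and certificate are the ones above):

```python
def certify_composite(s, t):
    """Certifier C(s, t): accept iff t is a nontrivial divisor of s."""
    return 1 < t < s and s % t == 0
```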
SAT Given a CNF formula $$\Phi$$, does it have a satisfying truth assignment?
3-SAT SAT where each clause contains exactly 3 literals
Certificate An assignment of truth values to the boolean variables
Certifier Check that each clause in $$\Phi$$ has at least one true literal
Conclusion SAT $$\in$$ NP and 3-SAT $$\in$$ NP
- Given an undirected graph $$G = (V, E)$$, does there exist a simple path $$P$$ that visits every node?
- A certificate is a permutation $$\pi$$ of the $$n$$ nodes
- Check that $$\pi$$ contains each node in $$V$$ exactly once, and that $$G$$ contains an edge between each pair of nodes adjacent in $$\pi$$
Class of decision problems for which there exists a poly-time certifier
- NP captures vast domains of computational, scientific and mathematical endeavour, and seems to roughly delimit what mathematicians and scientists have been aspiring to compute feasibly
- P decision problems for which there exists a poly-time algorithm
- NP Decision problems for which there exists a poly-time certifier
- EXP Decision problems for which there exists an exponential time algorithm
Proposition P $$\subseteq$$ NP
Proof Consider any problem $$X \in$$ P
- By definition, there exists a poly-time algorithm $$A(s)$$ that solves $$X$$
- Certificate: $$t = \varepsilon$$ (the empty string); certifier: $$C(s, t) = A(s)$$
Proposition NP $$\subseteq$$ EXP
Proof Consider any problem $$X \in$$ NP
- By definition, there exists a poly-time certifier $$C(s, t)$$ for $$X$$, where the certificate $$t$$ satisfies $$|t| \leq p(|s|)$$ for some polynomial $$p(\cdot)$$
- To solve instance $$s$$, run $$C(s, t)$$ on all strings $$t$$ with $$|t| \leq p(|s|)$$
- Return $$yes$$ iff $$C(s, t)$$ returns $$yes$$ for any of these potential certificates
**Fact** P $$\neq$$ EXP - therefore either P $$\neq$$ NP, or NP $$\neq$$ EXP, or both
P = NP? Is the decision problem as easy as the certification problem?
If yes - efficient algorithms for 3-SAT, TSP, VertexCover, Factor
If no - no efficient algorithms possible for 3-SAT, TSP, VertexCover
- An open question that has not been proven - one of the most important open problems in computer science
NP-complete - a problem $$Y \in NP$$ with the property that for every problem $$X \in NP$$, $$X \leq_p Y$$
- A problem satisfying only the second condition (every problem in NP reduces to it) is called NP-hard
Proposition Suppose $$Y$$ is NP-complete; then $$Y$$ can be solved in polynomial time iff $$P = NP$$
Proof ($$\Longleftarrow$$) If $$P = NP$$, then $$Y$$ can be solved in polynomial time since $$Y \in NP$$
Proof ($$\Longrightarrow$$) Suppose $$Y$$ can be solved in polynomial time:
- Consider any problem $$X \in NP$$. Since $$X \leq_p Y$$, we have $$X \in P$$
- This implies $$NP \subseteq P$$
- We know that $$P \subseteq NP$$, therefore $$P = NP$$
Are there any natural NP-complete problems?
- Once we establish the first natural NP-complete problem, others follow
Recipe to show that a problem $$Y$$ is NP-complete:
- Show that $$Y \in NP$$
- Choose an NP-complete problem $$X$$
- Prove that $$X \leq_p Y$$
Proposition
- If $$X$$ is NP-complete, $$Y \in NP$$, and $$X \leq_p Y$$, then $$Y$$ is NP-complete
Proof Consider any problem $$W \in NP$$; since $$X$$ is NP-complete, $$W \leq_p X$$
- By transitivity, $$W \leq_p Y$$
- Hence $$Y$$ is NP-complete
A flow network is a tuple consisting of:
- A digraph $(V,E)$ with source $s$ and sink $t$
- Capacities $c(e) \geq 0$ $\forall e \in E$
An st-flow is a function $f$ that satisfies:
- Capacity - the flow along every edge is between 0 and its capacity: $0 \leq f(e) \leq c(e)$
- Flow conservation - at every node other than $s$ and $t$, the flow in must equal the flow out
- The value of a flow is the total flow leaving $s$ minus the flow entering $s$
Max-flow problem - find a flow of maximum value
- Start with
$f(e)=0$ for each edge e - Find an st path
$P$ where each edge has$f(e)<c(e)$ - augment flow along path
$P$ - Repeat until you get stuck
- Original edge $e = (u,v)$ with flow $f(e)$ and capacity $c(e)$
- Reverse edge $e^{reverse} = (v,u)$ - allows us to undo flow already sent
- Residual capacity: $c_f(e) = c(e) - f(e)$ for a forward edge, and $f(e)$ for a reverse edge
- Residual network - $G_f=(V,E_f,s,t,c_f)$, containing only the edges with positive residual capacity
- An augmenting path is an $s \leadsto t$ path in the residual network $G_f$
- The bottleneck capacity of an augmenting path $P$ is the minimum residual capacity of any edge in $P$
Property:
- Let $f$ be a flow and $P$ be an augmenting path in $G_f$
- Then, after calling $f' \leftarrow Augment(f,c,P)$, the resulting $f'$ is a flow with $val(f') = val(f) + bottleneck(G_f,P)$
- Start with $f(e)=0$ for every edge $e\in E$
- Find an $s \leadsto t$ path $P$ in the residual network $G_f$
- Augment flow along path $P$
- Repeat until you get stuck
- An st-cut is a partition $(A,B)$ of the nodes with $s\in A, t\in B$
- Its capacity is the sum of the capacities of the edges from $A$ to $B$
Flow value lemma
- Let $f$ be any flow and let $(A,B)$ be any cut. Then the value of the flow $f$ equals the net flow across the cut $(A,B)$
Weak Duality
- Let $f$ be any flow and $(A,B)$ be any cut. Then $val(f)\leq cap(A,B)$
Proof
Corollary
- Let
$f$ be a flow and let$(A,B)$ be any cut - If
$val(f)=cap(A,B)$ then$f$ is a max flow and$(A,B)$ is a min cut
Proof
- for any flow
$f':val(f')\leq cap(A,B)= val(f)$ - For any cut
$(A',B'): cap(A',B')\geq val(f) = cap(A,B)$
Max-flow min-cut theorem
- The value of a max flow = the capacity of a min cut
Augmenting path theorem
- a flow f is a max flow iff there are no aug paths
Proof
-
The following three conditions are equivalent for any flow
$f$
1. There exists a cut $(A,B)$ such that $cap(A,B)=val(f)$
1. $f$ is a max flow
1. There is no aug path with respect to $f$
- (1 $\implies$ 2) This is the weak duality corollary
$\blacksquare$
- (2 $\implies$ 3) By contrapositive (not 3 implies not 2)
- Suppose that there is an aug path with respect to
$f$ - Can improve the flow
$f$ by sending a flow along this path - Therefore
$f$ is not a max flow
- (3 $\implies$ 1) Let
$f$ be a flow with no aug paths - Let
$A$ be the set of nodes reachable from$s$ in the residual network$G_f$ - By definition of A:
$s\in A$ - Since there is no aug path:
$t\notin A$
Since the implications form a cycle (1 $\implies$ 2 $\implies$ 3 $\implies$ 1), the three conditions are equivalent, and we can conclude:
Theorem
- Given a max flow
$f$ , can compute a min cut from$(A,B)$ in$O(m)$ time
Proof
- Let $A=$ the set of nodes reachable from $s$ in the residual network
$G_f$
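The proof above is directly executable; a minimal Python sketch (assumed representation: `cf[u][v]` holds the residual capacities of a max flow):

```python
from collections import deque

def min_cut_side(cf, s):
    """Return the s-side A of a min cut, given the residual capacities
    cf[u][v] of a MAX flow: A = nodes reachable from s in G_f.
    A single BFS over the residual network, hence O(m) time.
    """
    A, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v, r in cf.get(u, {}).items():
            if r > 0 and v not in A:   # only edges with residual capacity left
                A.add(v)
                queue.append(v)
    return A
```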
- Every edge capacity
$c(e)$ is an integer between 1 and$C$ -
Integrality Invariant Throughout FF, every edge flow
$f(e)$ and residual capacity $c_f(e)$ is an integer - By induction on the number of aug paths
Theorem
- FF terminates after at most $val(f^*)\leq nC$ aug paths, where $f^*$ is a max flow
Proof
- Each augmentation increases the value of the flow by at least 1
Corollary
- The running time of FF is
$O(mnC)$
Proof
- Can either use BFS or DFS to find an aug path in
$O(m)$ time
Integrality theorem
- There exists an integral max flow
$f^*$
Proof
- Since
$FF$ terminates, theorem follows from integrality invariant and aug path theorem.
- When edge capacities can be irrational, there is no guarantee that FF terminates, or even converges to a max flow
Goal
- Can find aug paths effectively
- Few iterations
- Max bottleneck capacity - fattest path
- Sufficiently large bottleneck capacity
- Fewest edges
- Choose aug paths with a large bottleneck capacity
- Maintain scaling parameter
$\Delta$ - Let
$G_f(\Delta)$ be the part of the residual network containing only those edges with residual capacity$\geq \Delta$ - Any augmenting path in
$G_f(\Delta)$ has bottleneck capacity$\geq \Delta$
-
After every scaling phase, halve $\Delta$
-
Initially, $\Delta$ is the largest power of 2 that is at most the largest capacity in the network
-
Every phase runs a version of FF, but augmenting paths are found in the residual graph restricted by the scaling parameter, $G_f(\Delta)$, so the bottleneck of every path is at least
$\Delta$
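The phases above can be sketched as follows (my own Python, not from the lecture; `scaling_max_flow` and the edge-list input format are assumed names):

```python
from collections import defaultdict, deque

def scaling_max_flow(edges, s, t):
    """Capacity-scaling Ford-Fulkerson: within a Delta-phase, only augment
    along paths whose edges all have residual capacity >= Delta, then halve
    Delta and repeat until the Delta = 1 phase finishes."""
    cf = defaultdict(lambda: defaultdict(int))   # residual capacities
    C = 0
    for u, v, c in edges:
        cf[u][v] += c
        C = max(C, c)
    delta = 1
    while delta * 2 <= C:              # largest power of 2 that is <= C
        delta *= 2
    value = 0
    while delta >= 1:
        while True:
            # BFS in G_f(delta): keep only edges with residual capacity >= delta.
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, r in cf[u].items():
                    if r >= delta and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:        # phase over: no aug path in G_f(delta)
                break
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            b = min(cf[u][v] for u, v in path)   # bottleneck, >= delta
            for u, v in path:
                cf[u][v] -= b
                cf[v][u] += b
            value += b
        delta //= 2
    return value
```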
Assumption - all edge capacities are integers between 1 and $C$
Invariant - the scaling parameter $\Delta$ is a power of 2
Proof - initially a power of 2; each phase divides it by 2
Integrality Invariant - throughout the algorithm, every edge flow $f(e)$ and residual capacity $c_f(e)$ is an integer
Proof - same as for generic FF
Theorem
- If capacity scaling algorithm terminates, then
$f$ is a max flow
Proof
-
By the integrality invariant, when
$\Delta=1\implies G_f(\Delta)=G_f$ - this is the full residual graph, with no edges filtered out -
Upon termination of the
$\Delta=1$ phase, there are no augmenting paths -
The result follows from the augmenting path theorem
Lemma 1
- There are
$1+\lfloor \log C\rfloor$ scaling phases
Proof
- Initially
$C/2<\Delta \leq C$ ;$\Delta$ decreases by a factor of 2 in each phase
Lemma 2
- Let
$f$ be the flow at the end of a$\Delta$ scaling phase - Then, max flow value
$\leq val(f)+m\Delta$ - equivalently $val(f)\geq val(f^*)-m\Delta$ , so as$\Delta$ gets smaller, the bound gets tighter.
Proof
- Show there exists a cut
$(A,B)$ such that$cap(A,B)\leq val(f)+m\Delta$ - Choose
$A$ to be the set of nodes reachable from$s$ in$G_f(\Delta)$ - By definition of
$A$ ,$s\in A$ - Since there is no augmenting path in $G_f(\Delta)$ at the end of the phase:
$t\notin A$
Lemma 3
- There are
$\leq 2m$ augs per scaling phase
Proof
- Let
$f$ be the flow at the beginning of a$\Delta$ scaling phase - Lemma 2
$\implies$ max flow value$\leq val(f)+m(2\Delta)$ - Each augmentation in a
$\Delta$ phase increases$val(f)$ by at least$\Delta$
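Written out, the counting in this proof is:

$$
(\text{number of augmentations})\cdot\Delta \;\leq\; val(f^*)-val(f) \;\leq\; m(2\Delta) \;\implies\; \text{number of augmentations} \leq 2m
$$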
Theorem
- The capacity scaling algorithm takes
$O(m^2\log C)$ time
Proof
- Lemma 1 + Lemma 3
$\implies O(m\log C)$ augmentations - Finding an aug path takes
$O(m)$ time
Def
- a Graph
$G$ is bipartite if the nodes can be partitioned into two subsets$L,R$ such that every edge connects a node in$L$ with a node in$R$
Bipartite matching
- Given a bipartite graph
$G=(L\cup R,E)$ , find a matching of maximum cardinality (a set $M\subseteq E$ in which each node appears in at most one edge)
- Create a digraph
$G'=(L\cup R\cup \set{s,t},E')$ - Direct all edges from
$L$ to$R$ and assign infinite (or unit) capacity - Add unit acpacity edges from
$s$ to each node in$L$ - Add unit capacity edges from each node in
$R$ to$t$
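The construction above is mechanical; a minimal sketch (hypothetical helper, not from the lecture; unit capacities on the $L\to R$ edges are enough for the correspondence):

```python
def matching_network(L, R, edges):
    """Build the digraph G' for the max-flow formulation of bipartite matching.

    Adds nodes 's' and 't'; returns a dict of capacities cap[(u, v)] = c(u, v),
    with every capacity set to 1.
    """
    cap = {}
    for u in L:
        cap[('s', u)] = 1            # unit capacity edge s -> each node in L
    for u, v in edges:
        cap[(u, v)] = 1              # direct all original edges from L to R
    for v in R:
        cap[(v, 't')] = 1            # unit capacity edge each node in R -> t
    return cap
```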
-
$1-1$ correspondence between matchings of cardinality$k$ in$G$ and integral flows of value$k$ in$G'$
Proof
- Let
$M$ be a matching in$G$ of cardinality$k$ - Consider flow
$f$ that sends 1 unit on each of the$k$ corresponding paths -
$f$ is a flow of value$k$
**Proof** (reverse direction)
- Let
$f$ be an integral flow in$G'$ of value$k$ - Consider
$M=$ set of edges from$L$ to$R$ with$f(e)=1$ - Each node in
$L$ and$R$ participates in at most one edge in$M$ -
$|M|=k$ apply flow-value lemma to cut$(L\cup \set s,R\cup \set t)$
Corollary
- Can solve bipartite matching via a max flow formulation
Proof
- Integrality theorem
$\implies$ exists a max flow$f^*$ in$G'$ that is integral - 1-1 Correspondence
$\implies f^*$ corresponds to a max cardinality matching
Def - a matching
$M$ is perfect if every node appears in exactly one edge in
$M$ -
Clearly need
$|L|=|R|$ -
Let
$S$ be a subset of nodes, let$N(S)$ be the set of nodes adjacent to nodes in$S$ -
If a bipartite graph has a perfect matching, then
$|N(S)|\geq |S|\ \forall S\subseteq L$ - the condition is also sufficient (Hall's theorem, below) -
Proof
- Each node in
$S$ has to be matched to a different node in$N(S)$
- Let
$G=(L\cup R,E)$ be a bipartite graph with$|L|=|R|$ ; then$G$ has a perfect matching$\iff |N(S)|\geq |S|\ \forall S \subseteq L$ - The
$\implies$ direction follows from the previous observation
Proof of $\impliedby$ , by contrapositive
- Suppose
$G$ does not have a perfect matching - Formulate as a max flow problem and let
$(A,B)$ be a min cut of$G'$ - By max flow, min-cut theorem,
$cap(A,B)<|L|$ $cap(A,B)=|L_B|+|R_A|\implies |R_A|<|L_A|$ , since $|R_A|=cap(A,B)-|L_B|<|L|-|L_B|=|L_A|$ - A min cut can't use
$\infty$ edges,$\implies N(L_A)\subseteq R_A$ so $|N(L_A)|\leq |R_A|<|L_A|$ - Choose
$S=L_A$ - This means we have found a set
$S$ whose neighbourhood is smaller than the set itself, violating Hall's condition $\blacksquare$
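Hall's condition can be checked directly on tiny graphs; a brute-force sketch (hypothetical helper, exponential in $|L|$, so a sanity check that mirrors the theorem rather than a practical algorithm — in practice one uses the max-flow formulation above):

```python
from itertools import combinations

def hall_condition_holds(L, adj):
    """Check Hall's condition |N(S)| >= |S| for every non-empty S ⊆ L.

    adj[u] is the set of neighbours of u in R.
    Returns False iff some subset S of L has a neighbourhood smaller than S.
    """
    for k in range(1, len(L) + 1):
        for S in combinations(L, k):
            N = set().union(*(adj[u] for u in S))
            if len(N) < len(S):
                return False         # found S with |N(S)| < |S|
    return True
```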