title | filename | chapternum |
---|---|---|
Introduction | lec_01_introduction | 0 |
- Introduce and motivate the study of computation for its own sake, irrespective of particular implementations.
- The notion of an algorithm and some of its history.
- Algorithms as not just tools, but also ways of thinking and understanding.
- A taste of Big-$O$ analysis and the surprising creativity in the design of efficient algorithms.
"Computer Science is no more about computers than astronomy is about telescopes", attributed to Edsger Dijkstra.^[This quote is typically read as disparaging the importance of actual physical computers in Computer Science, but note that telescopes are absolutely essential to astronomy as they provide us with the means to connect theoretical predictions with actual experimental observations.]
"Hackers need to understand the theory of computation about as much as painters need to understand paint chemistry.", Paul Graham 2003.^[To be fair, in the following sentence Graham says "you need to know how to calculate time and space complexity and about Turing completeness". This book includes these topics, as well as others such as NP-hardness, randomization, cryptography, quantum computing, and more.]
"The subject of my talk is perhaps most directly indicated by simply asking two questions: first, is it harder to multiply than to add? and second, why?...I (would like to) show that there is no algorithm for multiplication computationally as simple as that for addition, and this proves something of a stumbling block.", Alan Cobham, 1964
One of the ancient Babylonians' greatest innovations is the place-value number system. The place-value system represents numbers as sequences of digits where the position of each digit determines its value.
This is opposed to a system like Roman numerals, where every digit has a fixed value regardless of position. For example, the average distance to the moon is approximately 259,956 Roman miles. In standard Roman numerals, that would be
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMDCCCCLVI
Writing the distance to the sun in Roman numerals would require about 100,000 symbols; it would take a 50-page book to contain this single number!
For someone who thinks of numbers in an additive system like Roman numerals, quantities like the distance to the moon or sun are not merely large---they are unspeakable: they cannot be expressed or even grasped. It's no wonder that Eratosthenes, the first to calculate the earth's diameter (up to about ten percent error), and Hipparchus, the first to calculate the distance to the moon, used not a Roman-numeral type system but the Babylonian sexagesimal (base 60) place-value system.
In the language of Computer Science, the place-value system for representing numbers is known as a data structure: a set of instructions, or "recipe", for representing objects as symbols. An algorithm is a set of instructions, or "recipe", for performing operations on such representations. Data structures and algorithms have enabled amazing applications that have transformed human society, but their importance goes beyond their practical utility. Structures from computer science, such as bits, strings, graphs, and even the notion of a program itself, as well as concepts such as universality and replication, have not just found (many) practical uses but contributed a new language and a new way to view the world.
In addition to coming up with the place-value system, the Babylonians also invented the "standard algorithms" that we were all taught in elementary school for adding and multiplying numbers. These algorithms have been essential throughout the ages for people using abaci, papyrus, or pencil and paper, but in our computer age, do they still serve any purpose beyond torturing third-graders? To see why these algorithms are still very much relevant, let us compare the Babylonian digit-by-digit multiplication algorithm ("grade-school multiplication") with the naive algorithm that multiplies numbers through repeated addition. We start by formally describing both algorithms, see naivemultalg{.ref} and gradeschoolalg{.ref}.
INPUT: Non-negative integers $x,y$
OUTPUT: Product $x\cdot y$
Let $result \leftarrow 0$.
For{$i=1,\ldots,y$}
$result \leftarrow result + x$
endfor
return $result$
INPUT: Non-negative integers $x,y$
OUTPUT: Product $x\cdot y$
Write $x=x_{n-1}x_{n-2}\cdots x_0$ and $y = y_{m-1}y_{m-2}\cdots y_0$ in decimal place-value notation. # $x_0$ is the ones digit of $x$, $x_1$ is the tens digit, etc.
Let $result \leftarrow 0$
For{$i=0,\ldots,n-1$}
For{$j=0,\ldots,m-1$}
$result \leftarrow result + 10^{i+j}\cdot x_i \cdot y_j$
endfor
endfor
return $result$
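To make the two procedures concrete, here is a short Python rendering of both (an illustrative sketch, not part of the original text; it uses Python's built-in integers):

```python
def naive_multiply(x: int, y: int) -> int:
    """Multiplication via repeated addition (naivemultalg): add x to itself y times."""
    result = 0
    for _ in range(y):
        result += x
    return result

def gradeschool_multiply(x: int, y: int) -> int:
    """Grade-school multiplication (gradeschoolalg): digit by digit using place value."""
    xdigits = [int(d) for d in reversed(str(x))]  # xdigits[i] is the 10^i digit of x
    ydigits = [int(d) for d in reversed(str(y))]
    result = 0
    for i, xi in enumerate(xdigits):
        for j, yj in enumerate(ydigits):
            result += 10 ** (i + j) * xi * yj
    return result

assert naive_multiply(57, 43) == gradeschool_multiply(57, 43) == 57 * 43
```

Already for ten-digit inputs, `naive_multiply` performs billions of additions, while `gradeschool_multiply` performs only about a hundred single-digit multiplications.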
Both naivemultalg{.ref} and gradeschoolalg{.ref} assume that we already know how to add numbers, and gradeschoolalg{.ref} also assumes that we can multiply a number by a power of $10$ (which is, after all, a simple "shift"). Suppose that $x$ and $y$ are numbers of $n$ decimal digits each. Then naivemultalg{.ref} needs to perform roughly $y$ additions, which for an $n$-digit $y$ can be as many as about $10^{n}$ operations, while gradeschoolalg{.ref} performs only $n^2$ single-digit multiplications (plus a comparable number of additions and shifts). Already for $n=20$ this is the difference between a few hundred operations and a computation that would take thousands of years on a modern computer.
Computers have not made algorithms obsolete. On the contrary, the vast increase in our ability to measure, store, and communicate data has led to much higher demand for developing better and more sophisticated algorithms that empower us to make better decisions based on these data. We also see that to a large extent the notion of an algorithm is independent of the actual computing device that executes it. The digit-by-digit multiplication algorithm is vastly better than iterated addition, regardless of whether the technology we use to implement it is a silicon-based chip or a third-grader with pen and paper.
Theoretical computer science is concerned with the inherent properties of algorithms and computation; namely, those properties that are independent of current technology. We ask some questions that were already pondered by the Babylonians, such as "what is the best way to multiply two numbers?", but also questions that rely on cutting-edge science such as "could we use the effects of quantum entanglement to factor numbers faster?".
::: {.remark title="Specification, implementation, and analysis of algorithms." #implspecanarem} A full description of an algorithm has three components:
- Specification: What is the task that the algorithm performs (e.g., multiplication in the case of naivemultalg{.ref} and gradeschoolalg{.ref}).

- Implementation: How is the task accomplished: what is the sequence of instructions to be performed. Even though naivemultalg{.ref} and gradeschoolalg{.ref} perform the same computational task (i.e., they have the same specification), they do it in different ways (i.e., they have different implementations).

- Analysis: Why does this sequence of instructions achieve the desired task. A full description of naivemultalg{.ref} and gradeschoolalg{.ref} will include a proof for each one of these algorithms that on input $x,y$, the algorithm does indeed output $x\cdot y$.
Often as part of the analysis we show that the algorithm is not only correct but also efficient; that is, we want to show not only that the algorithm computes the desired task, but that it does so within a prescribed number of operations. For example, gradeschoolalg{.ref} computes the multiplication function on inputs of at most $n$ digits using $O(n^2)$ single-digit operations, while Karatsuba's algorithm (described below) does so using only $O(n^{1.6})$ operations.
:::
Once you think of the standard digit-by-digit multiplication algorithm, it seems like the "obviously best" way to multiply numbers.
In 1960, the famous mathematician Andrey Kolmogorov organized a seminar at Moscow State University in which he conjectured that every algorithm for multiplying two $n$-digit numbers would require a number of basic operations proportional to $n^2$; in other words, that in any multiplication algorithm, doubling the number of digits must quadruple the number of operations. A young student named Anatoly Karatsuba was in the audience and, within a week, he disproved Kolmogorov's conjecture by discovering an algorithm that requires only about $Cn^{1.6}$ operations for some constant $C$. For large enough $n$ this is much smaller than $n^2$, and so Karatsuba's algorithm is far more efficient than the grade-school one.
Karatsuba's algorithm is based on a faster way to multiply two-digit numbers.
Suppose that $x$ and $y$ are two-digit numbers (i.e., numbers between $0$ and $99$). Let's write $\overline{x}$ for the "tens" digit of $x$ and $\underline{x}$ for the "ones" digit, so that $x = 10\overline{x} + \underline{x}$, and similarly write $y = 10\overline{y} + \underline{y}$.

The grade-school algorithm can be thought of as transforming the task of multiplying a pair of two-digit numbers into four single-digit multiplications via the formula

$$
x \cdot y = 100\overline{x}\overline{y} + 10(\overline{x}\underline{y} + \underline{x}\overline{y}) + \underline{x}\underline{y} \;.
$$

Generally, in the grade-school algorithm doubling the number of digits in the input results in quadrupling the number of operations, leading to an $O(n^2)$-time algorithm. In contrast, Karatsuba's algorithm is based on the observation that we can also express $x\cdot y$ as

$$
x\cdot y = (100-10)\overline{x}\overline{y} + 10\left[(\overline{x}+\underline{x})(\overline{y}+\underline{y})\right] - (10-1)\underline{x}\underline{y} \;,
$$

which reduces multiplying the two-digit numbers $x$ and $y$ to computing the following three simpler products: $\overline{x}\overline{y}$, $\underline{x}\underline{y}$, and $(\overline{x}+\underline{x})(\overline{y}+\underline{y})$.
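As a quick sanity check (not part of the original text), the following Python loop verifies this identity for every pair of two-digit numbers:

```python
# Check Karatsuba's identity for two-digit numbers:
#   x*y = (100-10)*xh*yh + 10*(xh+xl)*(yh+yl) - (10-1)*xl*yl
for x in range(100):
    for y in range(100):
        xh, xl = divmod(x, 10)  # tens and ones digits of x
        yh, yl = divmod(y, 10)  # tens and ones digits of y
        rhs = 90 * xh * yh + 10 * (xh + xl) * (yh + yl) - 9 * xl * yl
        assert rhs == x * y
print("identity verified for all pairs of two-digit numbers")
```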
The above is the intuitive idea behind Karatsuba's algorithm, but is not enough to fully specify it. A complete description of an algorithm entails a precise specification of its operations together with its analysis: proof that the algorithm does in fact do what it's supposed to do. The operations of Karatsuba's algorithm are detailed in karatsubaalg{.ref}, while the analysis is given in karatsubacorrect{.ref} and karatsubaefficient{.ref}.
INPUT: non-negative integers $x,y$ each of at most $n$ digits
OUTPUT: $x\cdot y$
procedure{Karatsuba}{$x$,$y$}
lif {$n \leq 4$} return $x\cdot y$ lendif
Let $m = \floor{n/2}$
Write $x= 10^{m}\overline{x} + \underline{x}$ and $y= 10^{m}\overline{y}+ \underline{y}$
$A \leftarrow Karatsuba(\overline{x},\overline{y})$
$B \leftarrow Karatsuba(\overline{x}+\underline{x},\overline{y}+\underline{y})$
$C \leftarrow Karatsuba(\underline{x},\underline{y})$
Return $(10^{2m}-10^m)\cdot A + 10^m \cdot B + (1-10^m)\cdot C$
endprocedure
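The pseudocode above translates almost line by line into Python. The following is a minimal sketch (not from the original text; it relies on Python's arbitrary-precision integers for the base case and for the additions and shifts):

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply non-negative integers x and y using Karatsuba's recursion."""
    n = max(len(str(x)), len(str(y)))   # number of decimal digits
    if n <= 4:
        return x * y                    # base case: small numbers multiplied directly
    m = n // 2
    x_hi, x_lo = divmod(x, 10 ** m)     # x = 10^m * x_hi + x_lo
    y_hi, y_lo = divmod(y, 10 ** m)     # y = 10^m * y_hi + y_lo
    A = karatsuba(x_hi, y_hi)
    B = karatsuba(x_hi + x_lo, y_hi + y_lo)
    C = karatsuba(x_lo, y_lo)
    # x*y = 10^(2m)*A + 10^m*(B - A - C) + C = (10^(2m) - 10^m)*A + 10^m*B + (1 - 10^m)*C
    return (10 ** (2 * m) - 10 ** m) * A + 10 ** m * B + (1 - 10 ** m) * C

# quick check against Python's built-in multiplication
assert karatsuba(123456789, 987654321) == 123456789 * 987654321
```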
karatsubaalg{.ref} is only half of the full description of Karatsuba's algorithm.
The other half is the analysis, which entails proving that (1) karatsubaalg{.ref} indeed computes the multiplication operation and (2) it does so using $O(n^{1.6})$ operations. We now turn to showing both facts:
::: {.theorem title="Karatsuba's algorithm is correct" #karatsubacorrect}
For every pair of non-negative integers $x,y$, when given input $x,y$, karatsubaalg{.ref} will output $x\cdot y$.
:::
::: {.proof data-ref="karatsubacorrect"}
Let $n$ be the maximum number of digits of $x$ and $y$. We prove the theorem by induction on $n$. The base case is $n \leq 4$, where the algorithm returns $x \cdot y$ by definition. Otherwise, if $n > 4$, we define $m = \floor{n/2}$ and write $x= 10^{m}\overline{x} + \underline{x}$ and $y= 10^{m}\overline{y}+ \underline{y}$.

Plugging this into the product $x\cdot y$, we get

$$
x\cdot y = 10^{2m}\overline{x}\overline{y} + 10^{m}(\overline{x}\underline{y} + \underline{x}\overline{y}) + \underline{x}\underline{y} \;.
$$

Rearranging the terms we see that

$$
x\cdot y = 10^{2m}\overline{x}\overline{y} + 10^{m}\left[ (\overline{x}+\underline{x})(\overline{y}+\underline{y}) - \underline{x}\underline{y} - \overline{x}\overline{y} \right] + \underline{x}\underline{y} \;.
\label{eqkarastubatwo}
$$
Since the numbers $\overline{x}$, $\underline{x}$, $\overline{y}$, $\underline{y}$ (and hence also $\overline{x}+\underline{x}$ and $\overline{y}+\underline{y}$) all have fewer than $n$ digits, the induction hypothesis implies that the values $A$, $B$, $C$ computed by the recursive calls satisfy $A=\overline{x}\overline{y}$, $B=(\overline{x}+\underline{x})(\overline{y}+\underline{y})$, and $C=\underline{x}\underline{y}$. Plugging this into eqkarastubatwo{.eqref}, we see that $x\cdot y$ equals the value $(10^{2m}-10^{m})\cdot A + 10^{m}\cdot B + (1-10^{m})\cdot C$ returned by karatsubaalg{.ref}.
:::
::: {.theorem title="Karatsuba's algorithm is efficient" #karatsubaefficient}
If $x,y$ are integers of at most $n$ digits, then karatsubaalg{.ref} computes $x\cdot y$ using $O(n^{\log_2 3})$ single-digit operations.
:::
::: {.proof data-ref="karatsubaefficient"}
karatsubafig{.ref} illustrates the idea behind the proof, which we only sketch here, leaving filling out the details as karatsuba-ex{.ref}.
The proof is again by induction. We define $T(n)$ to be the maximum number of operations that karatsubaalg{.ref} performs on inputs of at most $n$ digits. In the base case $n \leq 4$ the algorithm performs a constant number of operations; otherwise it makes three recursive calls on numbers of at most $\floor{n/2}+1$ digits each, plus a constant number of additions and shifts, which take $O(n)$ operations. Hence we obtain the recursive equation

$$
T(n) \leq 3T(\floor{n/2}+1) + O(n) \;.
\label{eqkaratsubarecursion}
$$

The recursive equation eqkaratsubarecursion{.eqref} solves to $T(n) = O(n^{\log_2 3})$, which is $O(n^{1.585})$: at each level of the recursion the number of subproblems grows by a factor of three while the number of digits roughly halves, and so after about $\log_2 n$ levels we reach the base case with about $3^{\log_2 n} = n^{\log_2 3}$ subproblems, each requiring a constant number of operations.
:::
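To get a feel for this bound, one can unroll the recursive equation numerically and compare it with $n^{\log_2 3}$. The following sketch (not from the original text; the constant $10$ standing in for the base case and for the $O(n)$ term is an arbitrary choice) does exactly that:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n: int) -> int:
    """Unroll T(n) <= 3*T(n//2 + 1) + C*n with C = 10 and T(n) = 10 for n <= 4."""
    if n <= 4:
        return 10
    return 3 * T(n // 2 + 1) + 10 * n

for n in [10, 100, 1000, 10000, 100000]:
    print(n, round(T(n) / n ** math.log2(3), 2))  # this ratio stays bounded as n grows
```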
Karatsuba's algorithm is by no means the end of the line for multiplication algorithms.
In the 1960's, Toom and Cook extended Karatsuba's ideas to obtain, for every constant $k$, an algorithm that multiplies $n$-digit numbers using $O(n^{\log_k (2k-1)})$ operations. In 1971, Schönhage and Strassen used the Fast Fourier Transform to do even better, and in 2019 Harvey and van der Hoeven obtained an $O(n \log n)$-time multiplication algorithm (though the latter only starts to pay off for astronomically large numbers); see the bibliographical notes.
::: {.remark title="Matrix Multiplication (advanced note)" #matrixmult} (This book contains many "advanced" or "optional" notes and sections. These may assume background that not every student has, and can be safely skipped over as none of the future parts depends on them.)
Ideas similar to Karatsuba's can be used to speed up matrix multiplications as well. Matrices are a powerful way to represent linear equations and operations, widely used in numerous applications of scientific computing, graphics, machine learning, and many other fields.
One of the basic operations one can do with two matrices is to multiply them.
For example, if $x = \begin{pmatrix} x_{0,0} & x_{0,1}\\ x_{1,0} & x_{1,1} \end{pmatrix}$ and $y = \begin{pmatrix} y_{0,0} & y_{0,1}\\ y_{1,0} & y_{1,1} \end{pmatrix}$ then the product of $x$ and $y$ is the matrix $\begin{pmatrix} x_{0,0}y_{0,0} + x_{0,1}y_{1,0} & x_{0,0}y_{0,1} + x_{0,1}y_{1,1} \\ x_{1,0}y_{0,0} + x_{1,1}y_{1,0} & x_{1,0}y_{0,1} + x_{1,1}y_{1,1} \end{pmatrix}$. Computing this product in the obvious way requires eight multiplications of numbers.
Now suppose that $n$ is even and $x$ and $y$ are a pair of $n\times n$ matrices, each of which we can think of as composed of four $(n/2)\times(n/2)$ blocks $x_{0,0},x_{0,1},x_{1,0},x_{1,1}$ and $y_{0,0},y_{0,1},y_{1,0},y_{1,1}$. Then the same formula as above describes the product of $x$ and $y$, with the products $x_{a,b}y_{c,d}$ now being matrix products and the additions being matrix additions. In other words, multiplying a pair of $n\times n$ matrices reduces to eight multiplications of $(n/2)\times(n/2)$ matrices plus some additions, which yields the usual $n^3$-operation algorithm.
In 1969 Volker Strassen noted that we can compute the product of a pair of two-by-two matrices using only seven products of numbers, by observing that each entry of the matrix $xy$ can be obtained by adding and subtracting a fixed collection of seven products of sums and differences of the entries of $x$ and $y$.
Using this observation, we can obtain an algorithm such that doubling the dimension of the matrices results in increasing the number of operations by a factor of seven rather than eight, which solves to an algorithm that multiplies $n\times n$ matrices using $O(n^{\log_2 7}) \approx O(n^{2.81})$ operations rather than $n^3$. A long sequence of subsequent works has improved the exponent further; see the bibliographical notes.
:::
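For the curious reader, here is one standard choice of Strassen's seven products, together with a numerical check (an illustrative sketch; these particular formulas are not spelled out in the text above):

```python
import random

def strassen_2x2(x, y):
    """Multiply two 2x2 matrices (nested lists) using only seven products of numbers."""
    t1 = (x[0][0] + x[1][1]) * (y[0][0] + y[1][1])
    t2 = (x[1][0] + x[1][1]) * y[0][0]
    t3 = x[0][0] * (y[0][1] - y[1][1])
    t4 = x[1][1] * (y[1][0] - y[0][0])
    t5 = (x[0][0] + x[0][1]) * y[1][1]
    t6 = (x[1][0] - x[0][0]) * (y[0][0] + y[0][1])
    t7 = (x[0][1] - x[1][1]) * (y[1][0] + y[1][1])
    return [[t1 + t4 - t5 + t7, t3 + t5],
            [t2 + t4, t1 - t2 + t3 + t6]]

def usual_2x2(x, y):
    """The usual eight-product formula, for comparison."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

for _ in range(1000):
    x = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    y = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    assert strassen_2x2(x, y) == usual_2x2(x, y)
```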
The quest for better algorithms is by no means restricted to arithmetic tasks such as adding, multiplying or solving equations. Many graph algorithms, including algorithms for finding paths, matchings, spanning trees, cuts, and flows, have been discovered in the last several decades, and this is still an intensive area of research. (For example, the last few years saw many advances in algorithms for the maximum flow problem, borne out of unexpected connections with electrical circuits and linear equation solvers.) These algorithms are being used not just for the "natural" applications of routing network traffic or GPS-based navigation, but also for applications as varied as drug discovery through searching for structures in gene-interaction graphs to computing risks from correlations in financial investments.
Google was founded based on the PageRank algorithm, which is an efficient algorithm to approximate the "principal eigenvector" of (a dampened version of) the adjacency matrix of the web graph.
The Akamai company was founded based on a new data structure, known as consistent hashing, for a hash table where buckets are stored at different servers.
The backpropagation algorithm, which computes partial derivatives of a neural network in $O(n)$ instead of $O(n^2)$ time, underlies many of the recent phenomenal successes of deep learning.
Even for classical questions, studied through the ages, new discoveries are still being made. For example, for the question of determining whether a given integer is prime or composite, which has been studied since the days of Pythagoras, efficient probabilistic algorithms were only discovered in the 1970s, while the first deterministic polynomial-time algorithm was only found in 2002. For the related problem of actually finding the factors of a composite number, new algorithms were found in the 1980s, and (as we'll see later in this course) discoveries in the 1990s raised the tantalizing prospect of obtaining faster algorithms through the use of quantum mechanical effects.
Despite all this progress, there are still many more questions than answers in the world of algorithms. For almost all natural problems, we do not know whether the current algorithm is the "best", or whether a significantly better one is still waiting to be discovered. As alluded to in Cobham's opening quote for this chapter, even for the basic problem of multiplying numbers we have not yet answered the question of whether there is a multiplication algorithm that is as efficient as our algorithms for addition. But at least we now know the right way to ask it.
Finding better algorithms for problems such as multiplication, solving equations, graph problems, or fitting neural networks to data, is undoubtedly a worthwhile endeavor. But why is it important to prove that such algorithms don't exist? One motivation is pure intellectual curiosity. Another reason to study impossibility results is that they correspond to the fundamental limits of our world. In other words, impossibility results are laws of nature.
Here are some examples of impossibility results outside computer science (see bnotesintrosec{.ref} for more about these). In physics, the impossibility of building a perpetual motion machine corresponds to the law of conservation of energy. The impossibility of building a heat engine beating Carnot's bound corresponds to the second law of thermodynamics, while the impossibility of faster-than-light information transmission is a cornerstone of special relativity. In mathematics, while we all learned the formula for solving quadratic equations in high school, the impossibility of generalizing this formula to equations of degree five or more gave birth to group theory. The impossibility of proving Euclid's fifth axiom from the first four gave rise to non-Euclidean geometries, which ended up crucial for the theory of general relativity.
In an analogous way, impossibility results for computation correspond to "computational laws of nature" that tell us about the fundamental limits of any information processing apparatus, whether based on silicon, neurons, or quantum particles. Moreover, computer scientists found creative approaches to apply computational limitations to achieve certain useful tasks. For example, much of modern Internet traffic is encrypted using the RSA encryption scheme, the security of which relies on the (conjectured) impossibility of efficiently factoring large integers. More recently, the Bitcoin system uses a digital analog of the "gold standard" where, instead of using a precious metal, new currency is obtained by "mining" solutions for computationally difficult problems.
- The history of algorithms goes back thousands of years; they have been essential to much of human progress and these days form the basis of multi-billion dollar industries, as well as life-saving technologies.
- There is often more than one algorithm to achieve the same computational task. Finding a faster algorithm can often make a much bigger difference than improving computing hardware.
- Better algorithms and data structures don't just speed up calculations, but can yield new qualitative insights.
- One question we will study is: what is the most efficient algorithm for a given problem?
- To show that an algorithm is the most efficient one for a given problem, we need to be able to prove that it is impossible to solve the problem using a smaller amount of computational resources.
Often, when we try to solve a computational problem, whether it is solving a system of linear equations, finding the top eigenvector of a matrix, or trying to rank Internet search results, it is enough to use the "I know it when I see it" standard for describing algorithms.
As long as we find some way to solve the problem, we are happy and might not care much about the exact mathematical model for our algorithm.
But when we want to answer a question such as "does there exist an algorithm to solve the problem $P$?" we need to be much more precise.
In particular, we will need to (1) define exactly what it means to solve $P$, and (2) define exactly what an algorithm is.
Once we have these formal models of computation, we can try to obtain impossibility results for computational tasks, showing that some problems can not be solved (or perhaps can not be solved within the resources of our universe). Archimedes once said that given a fulcrum and a long enough lever, he could move the world. We will see how reductions allow us to leverage one hardness result into a great many others, illuminating the boundaries between the computable and uncomputable (or tractable and intractable) problems.
Later in this book we will go back to examining our models of computation, and see how resources such as randomness or quantum entanglement could potentially change the power of our model. In the context of probabilistic algorithms, we will see a glimpse of how randomness has become an indispensable tool for understanding computation, information, and communication. We will also see how computational difficulty can be an asset rather than a hindrance, and be used for the "derandomization" of probabilistic algorithms. The same ideas also show up in cryptography, which has undergone not just a technological but also an intellectual revolution in the last few decades, much of it building on the foundations that we explore in this course.
Theoretical Computer Science is a vast topic, branching out and touching upon many scientific and engineering disciplines. This book provides a very partial (and biased) sample of this area. More than anything, I hope I will manage to "infect" you with at least some of my love for this field, which is inspired and enriched by the connection to practice, but is also deep and beautiful regardless of applications.
This book is divided into the following parts, see dependencystructurefig{.ref}.
- Preliminaries: Introduction, mathematical background, and representing objects as strings.

- Part I: Finite computation (Boolean circuits): Equivalence of circuits and straight-line programs. Universal gate sets. Existence of a circuit for every function, representing circuits as strings, universal circuit, lower bound on circuit size using the counting argument.

- Part II: Uniform computation (Turing machines): Equivalence of Turing machines and programs with loops. Equivalence of models (including RAM machines, $\lambda$ calculus, and cellular automata), configurations of Turing machines, existence of a universal Turing machine, uncomputable functions (including the Halting problem and Rice's Theorem), Gödel's incompleteness theorem, restricted computational models (regular and context free languages).

- Part III: Efficient computation: Definition of running time, time hierarchy theorem, $\mathbf{P}$ and $\mathbf{NP}$, $\mathbf{P_{/poly}}$, $\mathbf{NP}$ completeness and the Cook-Levin Theorem, space bounded computation.

- Part IV: Randomized computation: Probability, randomized algorithms, $\mathbf{BPP}$, amplification, $\mathbf{BPP} \subseteq \mathbf{P}_{/poly}$, pseudorandom generators and derandomization.

- Part V: Advanced topics: Cryptography, proofs and algorithms (interactive and zero knowledge proofs, Curry-Howard correspondence), quantum computing.
The book largely proceeds in linear order, with each chapter building on the previous ones, with the following exceptions:
- The topics of the $\lambda$ calculus (lambdacalculussec{.ref} and lambdacalculussec{.ref}), Gödel's incompleteness theorem (godelchap{.ref}), automata/regular expressions and context-free grammars (restrictedchap{.ref}), and space-bounded computation (spacechap{.ref}) are not used in the following chapters. Hence you can choose whether to cover or skip any subset of them.

- Part II (Uniform computation / Turing machines) does not have a strong dependency on Part I (Finite computation / Boolean circuits), and it should be possible to teach them in the reverse order with minor modifications. Boolean circuits are used in Part III (Efficient computation) for results such as $\mathbf{P} \subseteq \mathbf{P_{/poly}}$ and the Cook-Levin Theorem, as well as in Part IV (for $\mathbf{BPP} \subseteq \mathbf{P_{/poly}}$ and derandomization) and Part V (specifically in cryptography and quantum computing).

- All chapters in advancedpart{.ref} (Advanced topics) are independent of one another and can be covered in any order.
A course based on this book can use all of Parts I, II, and III (possibly skipping over some or all of the optional topics mentioned above, such as the $\lambda$ calculus, Gödel's incompleteness theorem, automata, and space-bounded computation), and then cover some or all of Part IV (randomized computation), adding a selection of advanced topics from Part V as time and interest permit.
::: {.exercise } Rank the significance of the following inventions in speeding up the multiplication of large (that is 100-digit or more) numbers. That is, use "back of the envelope" estimates to order them in terms of the speedup factor they offered over the previous state of affairs.
a. Discovery of the grade-school digit by digit algorithm (improving upon repeated addition).
b. Discovery of Karatsuba's algorithm (improving upon the digit by digit algorithm).
c. Invention of modern electronic computers (improving upon calculations with pen and paper). :::
::: {.exercise}
The 1977 Apple II personal computer had a processor speed of 1.023 Mhz, or about $10^6$ operations per second, while a present-day supercomputer performs on the order of $10^{18}$ basic operations per second. For each one of the following running times (as a function of the input length $n$), compute for both computers how large an input they could handle in a week of computation, if they run an algorithm that has this running time:

a. $n$ operations.

b. $n^2$ operations.

c. $n \log n$ operations.

d. $2^n$ operations.

e. $n!$ operations.
:::
::: {.exercise title="Usefulness of algorithmic non-existence"} In this chapter we mentioned several companies that were founded based on the discovery of new algorithms. Can you give an example for a company that was founded based on the non-existence of an algorithm? See footnote for hint.^[As we will see in Chapter chapcryptography{.ref}, almost any company relying on cryptography needs to assume the non-existence of certain algorithms. In particular, RSA Security was founded based on the security of the RSA cryptosystem, which presumes the non-existence of an efficient algorithm to compute the prime factorization of large integers.] :::
::: {.exercise title="Analysis of Karatsuba's Algorithm" #karatsuba-ex}
a. Suppose that $T_1,T_2,T_3,\ldots$ is a sequence of numbers satisfying $T_n \leq 10$ for every $n \leq 4$ and $T_n \leq 3T_{\floor{n/2}+1} + Cn$ for every $n > 4$, where $C \geq 1$ is some constant (this is the form of the recursive equation eqkaratsubarecursion{.eqref}). Prove that $T_n = O(n^{\log_2 3})$.

b. Conclude that the number of single-digit operations that Karatsuba's algorithm takes to multiply two $n$-digit numbers is $O(n^{\log_2 3})$.
:::
::: {.exercise }
Implement in the programming language of your choice functions `Gradeschool_multiply(x,y)` and `Karatsuba_multiply(x,y)` that take two arrays of digits `x` and `y` and return an array representing the product of `x` and `y` (where `x` is identified with the number `x[0]+10*x[1]+100*x[2]+...` etc.) using the grade-school algorithm and the Karatsuba algorithm respectively. At what number of digits does the Karatsuba algorithm beat the grade-school one?
:::
::: {.exercise title="Matrix Multiplication (optional, advanced)" #matrixex}
In this exercise, we show that if for some
To make this precise, we need to make some notation that is unfortunately somewhat cumbersome. Assume that there is some
For a brief overview of what we'll see in this book, you could do far worse than read Bernard Chazelle's wonderful essay on the Algorithm as an Idiom of modern science. The book of Moore and Mertens [@MooreMertens11] gives a wonderful and comprehensive overview of the theory of computation, including much of the content discussed in this chapter and the rest of this book. Aaronson's book [@Aaronson13democritus] is another great read that touches upon many of the same themes.
For more on the algorithms the Babylonians used, see Knuth's paper and Neugebauer's classic book.
Many of the algorithms we mention in this chapter are covered in algorithms textbooks such as those by Cormen, Leiserson, Rivest, and Stein [@CLRS], Kleinberg and Tardos [@KleinbergTardos06], and Dasgupta, Papadimitriou and Vazirani [@DasguptaPV08], as well as Jeff Erickson's textbook. Erickson's book is freely available online and contains a great exposition of recursive algorithms in general and Karatsuba's algorithm in particular.
The story of Karatsuba's discovery of his multiplication algorithm is recounted by him in [@Karatsuba95]. As mentioned above, further improvements were made by Toom and Cook [@Toom63, @Cook66], Schönhage and Strassen [@SchonhageStrassen71], Fürer [@Furer07], and recently by Harvey and van der Hoeven [@HarveyvdHoeven2019]; see this article for a nice overview. The latter papers crucially rely on the Fast Fourier Transform algorithm. The fascinating story of the (re)discovery of this algorithm by John Tukey in the context of the cold war is recounted in [@Cooley87FFTdiscovery]. (We say re-discovery because it later turned out that the algorithm dates back to Gauss [@heideman1985gauss].) The Fast Fourier Transform is covered in some of the books mentioned above, and there are also lectures available online, such as Jeff Erickson's. See also this popular article by David Austin. Fast matrix multiplication was discovered by Strassen [@Strassen69], and since then this has been an active area of research. [@Blaser13] is a recommended self-contained survey of this area.
The Backpropagation algorithm for fast differentiation of neural networks was invented by Werbos [@Werbos74]. The Pagerank algorithm was invented by Larry Page and Sergey Brin [@pagerank99]. It is closely related to the HITS algorithm of Kleinberg [@Kleinber99]. The Akamai company was founded based on the consistent hashing data structure described in [@Akamai97]. Compressed sensing has a long history but two foundational papers are [@CandesRombergTao06, @Donoho2006compressed]. [@compressedmri08] gives a survey of applications of compressed sensing to MRI; see also this popular article by Ellenberg [@Ellenberg10wired]. The deterministic polynomial-time algorithm for testing primality was given by Agrawal, Kayal, and Saxena [@AgrawalKayalSaxena04].
We alluded briefly to classical impossibility results in mathematics, including the impossibility of proving Euclid's fifth postulate from the other four, the impossibility of trisecting an angle with a straightedge and compass, and the impossibility of solving a quintic equation via radicals. A geometric proof of the impossibility of angle trisection (one of the three geometric problems of antiquity, going back to the ancient Greeks) is given in this blog post of Tao. The book of Mario Livio [@Livio05] covers some of the background and ideas behind these impossibility results. Some exciting recent research is focused on trying to use computational complexity to shed light on fundamental questions in physics, such as understanding black holes and reconciling general relativity with quantum mechanics.