Edits

alan-turing-institute · Apr 23, 2024 · 1975109 · 1975109
1 parent 87000d2
commit 1975109

Showing 3 changed files with 85 additions and 68 deletions.
diff --git a/notes/bilinear-form.asy b/notes/bilinear-form.asy
@@ -26,11 +26,11 @@ draw(mapcurve, margin = DotMargins, Arrow(size=4pt));
 label("$V$", vt * (0.5, 1), 2 * N);
 label("$V^*$", xt * vt * (0.5, 1), 2 * N); 
 
-label("$C$", midpoint(mapcurve), N);
+label("$\bm{C}$", midpoint(mapcurve), N);
 // label("$C^{-1}$", reflect((0,0),E) * midpoint(mapcurve), S);
 
-dot("$x$", vt * (0.7, 0.7), W);
-dot("$C(x)$", xt * vt * (0.3, 0.7), E);
+dot("$v$", vt * (0.7, 0.7), W);
+dot("$\bm{C}(v)$", xt * vt * (0.3, 0.7), E);
 
 // dot("$\tilde{b}$", xt * vt * (0.3, 0.2), E);
 // dot("$C^{-1}(\tilde{b})$", vt * (0.7, 0.2), W);
diff --git a/notes/bilinear-form2.asy b/notes/bilinear-form2.asy
@@ -26,11 +26,11 @@ draw(reflect((0,0),E) * mapcurve, margin = DotMargins, BeginArrow(size = 4pt));
 label("$V$", vt * (0.5, 1), 2 * N);
 label("$V^*$", xt * vt * (0.5, 1), 2 * N); 
 
-label("$C$", midpoint(mapcurve), N);
-label("$C^{-1}$", reflect((0,0),E) * midpoint(mapcurve), S);
+label("$\bm{C}$", midpoint(mapcurve), N);
+label("$\bm{C}^{-1}$", reflect((0,0),E) * midpoint(mapcurve), S);
 
-dot("$x$", vt * (0.7, 0.7), W);
-dot("$C(x)$", xt * vt * (0.3, 0.7), E);
+dot("$v$", vt * (0.7, 0.7), W);
+dot("$\bm{C}(v)$", xt * vt * (0.3, 0.7), E);
 
 dot("$\tilde{b}$", xt * vt * (0.3, 0.2), E);
-dot("$C^{-1}(\tilde{b})$", vt * (0.7, 0.2), W);
+dot("$\bm{C}^{-1}(\tilde{b})$", vt * (0.7, 0.2), W);
diff --git a/notes/optimisation.tex b/notes/optimisation.tex
@@ -22,6 +22,7 @@
 \date{\today}
 %%
 \DeclareBoldMathCommand{\setR}{R}
+\DeclareBoldMathCommand{\bfC}{C}
 \DeclareMathOperator*{\argmin}{arg\,min}
 \newcommand{\eg}{\emph{Example:}}
 \newcommand{\ie}{\emph{i.e.}}
@@ -37,7 +38,7 @@
 prevent an analytic solution and impede a numerical one.
 
 The general problem is this. Suppose $X$ is some set, possibly with
-additional structure, and $f\colon \setR \to \setR$ a real-valued function
+additional structure, and $f\colon X \to \setR$ a real-valued function
 on~$X$. We are to find $x_\text{min}\in X$ (if one exists) such that
 \[
   f(x_\text{min}) \leq f(x) \quad\text{for all $x\in X$}.  
@@ -147,67 +148,69 @@
 with $v$, somehow “carry it across” to $V^*$, and then act with the
 result on~$w$.
 
-Thus, let $C\colon V\to V^*$ be a linear map from $V$ to its dual. For
-any vector $v\in V$, we obtain $C(v)\in V^*$, a linear map from $V$
+Thus, let $\bfC\colon V\to V^*$ be a linear map from $V$ to its dual. For
+any vector $v\in V$, we obtain $\bfC(v)\in V^*$, a linear map from $V$
 to~$\setR$ (see figure~\ref{fig:bilinear-form}). Since an element of $V^*$ is a linear map from $V$
-to~$\setR$, we may apply $C(v)$ to $w\in V$ and thereby obtain a number,
-$(C(v))(w)$. 
+to~$\setR$, we may apply $\bfC(v)$ to $w\in V$ and thereby obtain a number,
+$(\bfC(v))(w)$. 
 
 \begin{marginfigure}
   \begin{center}
     \asyinclude[width=5cm]{bilinear-form.asy}
   \end{center}
   \caption{A vector space $V$ and its dual $V^*$, showing: an element
-    $x\in V$; a linear map $C\colon V\to V^*$; and the image of $x$ in
-    $V^*$ under $C$.\label{fig:bilinear-form}}
+    $v\in V$; a linear map $\bfC\colon V\to V^*$; and the image of $v$ in
+    $V^*$ under $\bfC$.\label{fig:bilinear-form}}
 \end{marginfigure}
-In a sense, one may think of $C$ as a map, from pairs
+In a sense, one may think of $\bfC$ as a map, from pairs
 $(v,w)\in V\times V$ to the reals, which is “linear in both $v$ and
 $w$.” This view suggests a less cumbersome notation: instead of
-$(C(v))(w)$ we shall write $C(v,w)$. Thus, by $C(v, w)$ we shall mean,
-“apply $C$ to $v$, obtaining an elemement of $V^*$, and apply this
-element to $w$, obtaining a number.” When $C$ is viewed from this
+$(\bfC(v))(w)$ we shall write $\bfC(v,w)$. Thus, by $\bfC(v, w)$ we shall mean,
+“apply $\bfC$ to $v$, obtaining an element of $V^*$, and apply this
+element to $w$, obtaining a number.” When $\bfC$ is viewed from this
 perspective, it is known as a \emph{bilinear form}.
 
-\eg{} For $C$ any bilinear form,
-$C(\alpha v, \beta w) = \alpha\beta C(v,w)$ (which very much gives $C$ the flavour of a
+\eg{} For $\bfC$ any bilinear form,
+$\bfC(\alpha v, \beta w) = \alpha\beta \bfC(v,w)$ (which very much gives $\bfC$ the flavour of a
 product).
 
 We can now say roughly what is meant by a “quadratic term:” it is an
-expression of the form $C(v,v)$ for some bilinear form~$C$.
+expression of the form $\bfC(v,v)$ for some bilinear form~$\bfC$.
 
-Notice, however, that in this expression $C$ is applied to a single
+Notice, however, that in this expression $\bfC$ is applied to a single
 $v$ (twice); whereas more generally a bilinear form may be applied to
 two different vectors. Is there some redundancy in this definition?
-Let $A$ be any bilinear form such that $A(v,w)=-A(w,v)$ and consider
-the bilinear form $C+A$. By linearity, we have $(C+A)(v, v) =
-C(v,v)+A(v,v)$. However, $A(v,v)=-A(v,v)$ (by assumption), whence
-$A(v,v)=0$. Thus $(C+A)(v,v)=C(v,v)$; that is, $C+A$ gives rise to the
-same quadratic form as~$C$.
-
-A bilinear form $A$ for which $A(v,w)=-A(w,v)$ is said to be
-\emph{antisymmetric}. Conversely, a bilinear form $S$ for which
-$S(v,w)=S(w,v)$ is said to be \emph{symmetric}. Let $C$ be any
-bilinear form and consider the identity:
+Let $\bm{A}$ be any bilinear form such that $\bm{A}(v,w)=-\bm{A}(w,v)$ and consider
+the bilinear form $\bfC+\bm{A}$. By linearity, we have $(\bfC+\bm{A})(v, v) =
+\bfC(v,v)+\bm{A}(v,v)$. However, $\bm{A}(v,v)=-\bm{A}(v,v)$ (by assumption), whence
+$\bm{A}(v,v)=0$. Thus $(\bfC+\bm{A})(v,v)=\bfC(v,v)$; that is, $\bfC+\bm{A}$ gives rise to the
+same quadratic form as~$\bfC$.
+
+A bilinear form $\bm{A}$ for which $\bm{A}(v,w)=-\bm{A}(w,v)$ is said to be
+\emph{antisymmetric}. Conversely, a bilinear form $\bm{S}$ for which
+$\bm{S}(v,w)=\bm{S}(w,v)$ is said to be \emph{symmetric}. From the foregoing, we
+may add to $\bfC$ any antisymmetric bilinear form without affecting the
+value of $\bfC(v,v)$. Now consider the identity (for any bilinear form):
 \[
-  C(v, w) = \frac{1}{2}\bigl[C(v,w) + C(w,v)\bigr]
-  + \frac{1}{2}\bigl[C(v,w) - C(w,v)\bigr].
+  \bfC(v, w) = \frac{1}{2}\bigl[\bfC(v,w) + \bfC(w,v)\bigr]
+  + \frac{1}{2}\bigl[\bfC(v,w) - \bfC(w,v)\bigr].
 \]
 The first term on the right-hand side is symmetric whereas the second
 is antisymmetric. Since the antisymmetric term vanishes when both
 arguments are the same, we may, without loss of generality, assume
-that $C$ is symmetric when evaluting~$C(v,v)$.\sidenote{Well, we have
-  to show that every symmetric bilinear form arises in this way.}
+that $\bfC$ is symmetric when evaluating~$\bfC(v,v)$.\sidenote{It is also
+  true, though we do not show it, that there is no further
+  redundancy.}
 
 We are now in a position to say what we mean by a quadratic function on
 a vector space. It is a function of the form:
 \begin{equation}
-  f(v) = a - 2\tilde{b}(v) + C(v, v).
+  f(v) = a - 2\tilde{b}(v) + \bfC(v, v),
   \label{eq:quadratic-function}
 \end{equation}
-In this expression, $a$ is a number, $\tilde{b}$ is an element of the
-dual space and $C$ is a symmetric bilinear form. (The factor of $-2$
-is conventional as it simplifies certain calculations.)
+where, in this expression, $a$ is a number, $\tilde{b}$ is an element
+of the dual space and $\bfC$ is a symmetric bilinear form. (The factor of
+$-2$ is conventional as it simplifies certain calculations.)
 
 Having written down a function on $V$, we return to the problem of
 finding the location of its minimum.
@@ -216,79 +219,93 @@
 eq.~\eqref{eq:completing-the-square}, we might attempt to rewrite
 eq.~\eqref{eq:quadratic-function} as:
 \begin{equation}
-  f(v) = \kappa + \Gamma(v - \xi, v - \xi),
+  f(v) = \kappa + \bm{\Gamma}(v - \xi, v - \xi),
   \label{eq:vector-square}
 \end{equation}
 where now $\kappa$ is a number, $\xi$ is a vector (which we hope will turn
-out to be the minimiser of $f$!), and $\Gamma$ is a symmetric bilinear
+out to be the minimiser of $f$!), and $\bm{\Gamma}$ is a symmetric bilinear
 form. (Note that, previously, the last term on the right-hand side
 involved the expression ${(x-\xi)}^2$; here, a symmetric bilinear form
 is required to effect the square.)
 
 In the one-dimensional case we next expanded the term in ${(x-\xi)}^2$
 and equated coefficients of each power of $x$. To do the same thing
-here, we shall have to expand the term in $\Gamma$. Recall the meaning of
-$\Gamma(v-\xi, v-\xi)$: $\Gamma$ is applied to $v-\xi$ to obtain an element
+here, we shall have to expand the term in $\bm{\Gamma}$. Recall the meaning of
+$\bm{\Gamma}(v-\xi, v-\xi)$: $\bm{\Gamma}$ is applied to $v-\xi$ to obtain an element
 of~$V^*$; this element is then applied to $v-\xi$. Both of these
 applications are linear, and so
 \[
   \begin{aligned}
-  \Gamma(v-\xi,v-\xi) & = \Gamma(v-\xi,v)-\Gamma(v-\xi, \xi) \\
-  & = \Gamma(v,v)-\Gamma(v, \xi) - \Gamma(\xi,v) + \Gamma(\xi, \xi) \\
-  & = \Gamma(v,v)-2\Gamma(\xi,v)+\Gamma(\xi,\xi).
+  \bm{\Gamma}(v-\xi,v-\xi) & = \bm{\Gamma}(v-\xi,v)-\bm{\Gamma}(v-\xi, \xi) \\
+  & = \bm{\Gamma}(v,v)-\bm{\Gamma}(v, \xi) - \bm{\Gamma}(\xi,v) + \bm{\Gamma}(\xi, \xi) \\
+  & = \bm{\Gamma}(v,v)-2\bm{\Gamma}(\xi,v)+\bm{\Gamma}(\xi,\xi).
   \end{aligned}
 \]
-Replacing $\Gamma$ in eq.~\eqref{eq:vector-square} with this expansion, we
+Replacing $\bm{\Gamma}$ in eq.~\eqref{eq:vector-square} with this expansion, we
 obtain
 \[
-  a -2\tilde{b}(v)+C(v,v) = \bigl[\kappa+\Gamma(\xi,\xi)\bigr] -2\Gamma(\xi,v) + \Gamma(v,v)
+  a -2\tilde{b}(v)+\bfC(v,v) = \bigl[\kappa+\bm{\Gamma}(\xi,\xi)\bigr] -2\bm{\Gamma}(\xi,v) + \bm{\Gamma}(v,v)
 \]
-from which we conclude: $\Gamma(v,v) = C(v,v)$ (from the
-terms “quadratric in $v$”); $\Gamma(\xi, v) = \tilde{b}(v)$ (from the terms
-linear in $v$); and $\kappa+\Gamma(\xi,\xi)=a$ (from the constant terms).
+from which we conclude: $\bm{\Gamma}(v,v) = \bfC(v,v)$ (from the
+terms “quadratic in $v$”); $\bm{\Gamma}(\xi, v) = \tilde{b}(v)$ (from the terms
+linear in $v$); and $\kappa+\bm{\Gamma}(\xi,\xi)=a$ (from the constant terms).
 
 The first and third of these identifications are clear. We should
-choose $\Gamma=C$ and therefore $\kappa=a-C(\xi,\xi)$. The second term is less
-obvious. Replacing $\Gamma$ with $C$, it is
-$C(\xi, v) = \tilde{b}(v)$. What meaning should we ascribe to this?
-Recall the meaning of $C(\xi,v)$: it is notation for $(C(\xi))(v)$, or
-“$C$ applied first to $\xi$, and the result is applied to~$v$ (see
+choose $\bm{\Gamma}=\bfC$ and therefore $\kappa=a-\bfC(\xi,\xi)$. The second term is less
+obvious. Replacing $\bm{\Gamma}$ with $\bfC$, it is
+$\bfC(\xi, v) = \tilde{b}(v)$. What meaning should we ascribe to this?
+Recall the meaning of $\bfC(\xi,v)$: it is notation for $(\bfC(\xi))(v)$, or
+“$\bfC$ applied first to $\xi$, and the result is applied to~$v$” (see
 figure~\ref{fig:bilinear-form2}).
 \begin{marginfigure}
   \begin{center}
     \asyinclude[width=5cm]{bilinear-form2.asy}
   \end{center}
 \caption{A vector space $V$ and its dual $V^*$, showing an element $v\in
-  V$ and its image in $V^*$ under $C$, as well as an element
-  $\tilde{b}\in V^*$ and its image in $V$ under~$C^{-1}$.\label{fig:bilinear-form2}}
+  V$ and its image in $V^*$ under $\bfC$, as well as an element
+  $\tilde{b}\in V^*$ and its image in $V$ under~$\bfC^{-1}$.\label{fig:bilinear-form2}}
 \end{marginfigure}
-That is, $C(\xi)$ is an element of~$V^*$, as is $\tilde{b}$. Moreover,
+That is, $\bfC(\xi)$ is an element of~$V^*$, as is $\tilde{b}$. Moreover,
 both of these give the same result when acting on any $v\in V$ and,
-hence, are the same element of~$V^*$. That is, $C(\xi) = \tilde{b}$. 
+hence, are the same element of~$V^*$. That is, $\bfC(\xi) = \tilde{b}$. 
 
 A candidate answer for the minimiser of $f(v)$ is therefore
 \begin{equation}
   \label{eq:minimiser}
-  \xi = C^{-1}(\tilde{b}).
+  \xi = \bfC^{-1}(\tilde{b}).
 \end{equation}
 Unfortunately, we are not yet done. To conclude that this is a
-minimiser we must in addition show two things: first, that $C$
+minimiser we must in addition show two things: first, that $\bfC$
 \emph{has} an inverse; and second that $f(v)$ is indeed a minimum at
 this value.
 
 It is convenient to tackle the second condition first. Assume, for the
-moment, that $C$ is invertible and that $\xi$ is given by
+moment, that $\bfC$ is invertible and that $\xi$ is given by
 eq.~\eqref{eq:minimiser}. We must show that $f(v)>f(\xi)$ for all
 $v\neq\xi$ which, from eq.~\eqref{eq:vector-square}, is equivalent to
-requiring $C(v-\xi,v-\xi)>C(0,0)$. Since $v$ is arbitrary and
-$C(0,0)=0$ this condition is equivalent to
+requiring $\bfC(v-\xi,v-\xi)>\bfC(0,0)$. Since $v$ is arbitrary and
+$\bfC(0,0)=0$ this condition is equivalent to
 \begin{equation}
   \label{eq:positive-definite}
-  C(x,x) > 0 \quad\text{for all $x\in V$ such that $x\neq0$}.
+  \bfC(x,x) > 0 \quad\text{for all $x\in V$ such that $x\neq0$}.
 \end{equation}
 A symmetric bilinear form for which eq.~\eqref{eq:positive-definite}
 holds is said to be \emph{positive definite}. 
 
+Now we return to the issue of whether $\bfC$ is invertible. In fact,
+we have:
+
+\emph{Theorem}: Any positive-definite, symmetric bilinear form on a
+finite-dimensional vector space is invertible.
+
+\emph{Proof}: Suppose $\bfC$ is a positive-definite, symmetric
+bilinear form. $\bfC$ is invertible if it is injective and
+surjective. To show injectivity suppose, for contradiction, that there
+is some $u\neq\bm{0}$ in $V$ such that $\bfC(u)=\bm{0}$ (with the
+right-hand side being the zero element of~$V^*$; for a linear map, the
+absence of such a $u$ is equivalent to injectivity). Then we would have
+$\bfC(u, x) =0$ for any $x$ and in particular $\bfC(u,u)=0$,
+contradicting the assumed positive-definiteness of~$\bfC$.
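
The argument in notes/optimisation.tex can be spot-checked numerically. What follows is a minimal sketch, not part of this commit, assuming $V=\mathbb{R}^2$ with each bilinear form represented by a 2-by-2 matrix. The helper names (`bilinear`, `solve_2x2`, `f`) and all numerical values are hypothetical, chosen only for illustration. It checks three claims from the notes: adding an antisymmetric form leaves $\bfC(v,v)$ unchanged; $f(v)=\kappa+\bfC(v-\xi,v-\xi)$ with $\xi=\bfC^{-1}(\tilde{b})$ and $\kappa=a-\bfC(\xi,\xi)$; and $f(v)>f(\xi)$ at a few sample points when $\bfC$ is positive definite.

```python
# Illustrative check on V = R^2; every number below is made up.

def bilinear(C, v, w):
    """Return C(v, w) = sum_ij v_i C[i][j] w_j for a 2x2 matrix C."""
    return sum(v[i] * C[i][j] * w[j] for i in range(2) for j in range(2))


def solve_2x2(C, b):
    """Solve C(xi) = b by Cramer's rule; C is assumed symmetric."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    if det == 0:
        raise ValueError("C is not invertible")
    return [(b[0] * C[1][1] - C[0][1] * b[1]) / det,
            (C[0][0] * b[1] - C[1][0] * b[0]) / det]


def f(v, a, b, C):
    """Quadratic function f(v) = a - 2 b(v) + C(v, v)."""
    return a - 2 * (b[0] * v[0] + b[1] * v[1]) + bilinear(C, v, v)


a = 4.0
b = [1.0, 2.0]                  # components of the dual vector b-tilde
C = [[2.0, 1.0], [1.0, 3.0]]    # symmetric and positive definite
A = [[0.0, 5.0], [-5.0, 0.0]]   # antisymmetric

xi = solve_2x2(C, b)            # candidate minimiser, xi = C^{-1}(b)
kappa = a - bilinear(C, xi, xi)

# C + A, formed entrywise
CA = [[C[i][j] + A[i][j] for j in range(2)] for i in range(2)]

for v in ([3.0, -1.0], [0.0, 0.0], [-2.0, 5.0]):
    d = [v[0] - xi[0], v[1] - xi[1]]
    # The antisymmetric part does not change the quadratic form.
    assert abs(bilinear(CA, v, v) - bilinear(C, v, v)) < 1e-9
    # Completing the square reproduces f.
    assert abs(f(v, a, b, C) - (kappa + bilinear(C, d, d))) < 1e-9
    # f is strictly larger away from xi.
    assert f(v, a, b, C) > f(xi, a, b, C)
```

Sampling a handful of points is of course no proof; the sketch only guards against sign or factor-of-two slips in the derivation.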