\documentclass[11pt]{article}
\usepackage{amssymb,amsmath,tikz-cd,xspace, natbib,verse}
\usepackage[T1]{fontenc}
\begin{document}
\input{tex.macros}
\title{Category theory notes}
\maketitle
They say you don't understand something until you can explain it, so here is my exercise
in explaining category theory. I make no claims to any authority on this subject and the
document is incomplete in proportion to my incomplete understanding.
\section{Why}
The first question I have had to explain, to myself and every single person I've spoken
to about these things: why study category theory at all? This is forward looking,
but here are my impressions halfway through a few textbooks:
\items{
\item By {\em category}, we mean {\em context}, and we're looking for patterns across
contexts. When two mathematical pursuits do the same thing, can
we formalize exactly how they are or are not identical?
There is a promise that even theoretical work in real-world
contexts (i.e., my actual job) will benefit from better formalization of how
contexts relate.
\item I'm hoping it'll be a shortcut to learning more math. If I get this, then
algebraic topology should be metaphorical cake.
\item There is a trend in programming languages toward languages whose theory is based on
a category of types. Out-of-fashion languages
are written in terms of procedures: plug 4 into the square function and we do some
things and return 16. Languages oriented toward mathematical functions encourage a
thinking that is far more still: the square of 4 is 16 and always has been and always
will be.
\item From what I've read, a lot of philosophers work on it---there's some presumption that
it's not just about symbol shunting, but about finding `natural' and `universal'
properties of those patterns across disparate things. The relationship
between these categories/contexts has always been there and we only need a way to see them.
Here's Robert Penn Warren on the matter:
}
\begin{verse}
Here is the shadow of truth, for only the shadow is true.\\
And the line where the incoming swell from the sunset Pacific\\
First leans and staggers to break will tell all you need to know\\
About submarine geography, and your father's death rattle\\
Provides all biographical data required for the {\em Who's Who} of the dead.\\
\dots In the distance, in {\em plaza, piazza, place, platz,} and square,\\
Boot heels, like history being born, on cobbles bang.\\
Everything seems an echo of something else.\\
\dots I watched the sheep huddling. Their eyes\\
Stared into nothingness. In that mist-diffused light their eyes\\
Were stupid and round like the eyes of fat fish in muddy water,\\
Or of a scholar who has lost faith in his calling. \dots\\
You would think that nothing would ever again happen.\\
That may be a way to love God.
\end{verse}
OK, now that we've covered the {\em why} question, let's start from simple element-to-element
mappings and work up.
\section{Leveling up}\label{levelsec}
A function $f:A\to B$, aka a mapping, assigns to every element of the set (or group of
elements) $A$ some element of the set $B$. Note that if $A=\emptyset$ then you still have a function, though
it is vacuous.
\paragraph{Level 1: Objects=Elements}
We can draw this function via arrows, maybe
\begin{equation}
\begin{tikzcd}[row sep=tiny]
a_1 \ar[rd] & b_1\\
a_2 \ar[r] & b_2\\
a_3 \ar[r] & b_3
\end{tikzcd}\tag{Map}
\end{equation}
Here, the nodes in the graph are single elements in the sets, and the arrows describe
where each element goes.
Because diagrams at every level look alike, I'll mark what the arrows mean in any given diagram
here with a side-note. In this case, the arrows are mappings; in the next they will be
functions, and so on down the line.
\paragraph{Level 2: Objects=Sets}
Level 1 showed elements internal to the sets; at the next level we'll look externally,
treating each set as a single object.
The function notation $f:A\to B$ is itself a tiny graph. If there were also a function
$g:C\to B$, we could construct a diagram that looks like the above element-by-element
graph, but with sets at the nodes instead of single elements:
\Comdiag[Fn]{A \ar[r,"f"]\& B\\
C \ar[ru, "g"]
}
So we've gone up a level: each arrow at this level could be cracked open to reveal
a bundle of arrows, its own element-by-element diagram like the one we had above.
A function mapping from a set to itself (aka an {\em endomorphism}, $f:A\to A$) would be drawn as a loop:
\Comdiag[Fn]{A\ar[loop right]}
Basically all the math people use day to day (trigonometry, calculus, \dots) is at this level.
\comment{
One of the big sticking points was Russell's paradox: given $S$ = the set of all sets that
do not include themselves, is $S$ an element of $S$? This showed that we need more
structure than just sets and mappings, and is one of the things that led to category
theory. More below.
}
\paragraph{Level 3, Categories: Bundling functions} \label{homsetsec}
But we can bundle further. In the last
diagram we had a single function. But consider a collection of mappings (aka homomorphisms)
from set $A$ to set $B$, $\Hom{set}(A, B)$.
The first diagram with three elements in $A$ and
three in $B$ could be rewired $3^3=27$ different ways, but it's our choice and we can include
as many or as few of those functions as desired. Generally, if you start with $X$ elements and each
points to one of $Y$ elements in the target set, you can have $Y^X$ combinations, so
$\Hom{set}(X, Y)$ is sometimes written as $Y^X$. We could gather these bundles into one
arrow:
\Comdiag[Hom]{A \ar[r,"\Hom{set}"]\& B}
I think of this arrow as a bundle of quivering wires, each with its own power. If this were anime
they'd be tentacles. But there is indeed a lot of power there: with \Re\ representing the
real numbers, we can now summarize all real one-variable functions with the diagram
\Comdiag[Hom]{\Re\ar[loop right]}
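To make the counting rule concrete, here is a minimal Haskell sketch (the type and names are
mine, purely illustrative) that enumerates every function from a three-element set to itself
as a lookup table:
\begin{verbatim}
-- A sanity check of the Y^X counting rule: there are 3^3 = 27 functions
-- from a three-element type to itself.
data Three = T1 | T2 | T3 deriving (Eq, Show, Enum, Bounded)

allFunctions :: [[(Three, Three)]]
allFunctions = [ [(T1, a), (T2, b), (T3, c)]
               | a <- elems, b <- elems, c <- elems ]
  where elems = [minBound .. maxBound]

-- length allFunctions == 27
\end{verbatim}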
\paragraph{Level 4, Functors: Objects=hom sets}
But we can go one step
further, and put a bundle of objects plus bundle of homomorphisms into a box as well, and
now it is a single node in our graph. Let the above loop of all functions $f:\Re\to\Re$
be $\bf R$, and let's keep it company with the set of all functions on the natural
numbers, $g:\Nat\to\Nat$, notated {\bf N}.
\Comdiag[Fnctr]{{\bf N}\ar[r, "F"] \& {\bf R}}
Here, the arrow is a {\em functor}, which is a bundle of two sets of arrows at once:
one to map elements of $\Nat$ to elements of $\Re$, and one to map the functions
$\Nat\to \Nat$ to functions $\Re\to \Re$. [There are additional restrictions; see
below.] Custom is to write these with capital letters, to contrast with typically
lower-case functions.
\paragraph{Level 5, Natural transformations: Objects=categories}
It's natural to ask, then, why not bundle further and consider mappings from functors to functors?
It only makes sense to map functors if you are bringing the sets of objects with them, so natural
transformations are across category diagrams.
\Comdiag[Nat]{A \ar[d, "F"]\ar[r] \& A' \ar[d, "F'"] \\
B\ar[r]\& B'
}
This is called a 2-category.
We could follow the pattern further, up to 3-Categories \&c., but it seems most people stop at
natural transformations.
All these diagrams look alike, so for any diagram you come across in a textbook, it's
worth checking where we are in the pyramid:
\items{
\item Node is a single element; arrow represents a mapping from one element to another
\item Node is a set; arrow represents a set of mappings from one set to another, aka a function
\item Node is a set; arrow represents a set of functions from one set to another
\item Node is a set of objects with their own sets of functions; arrow represents a functor from one context to another
\item Node is a category diagram; arrow represents a mapping from one category to another.
}
Anything we could say about a hom set, we could say about the near-trivial hom
set containing a single function, so Fn-based diagrams can be read as Hom-based diagrams.
\subsection{Simpler categories}
Some of the examples above were for rather large categories, such as all functions on all
reals, but bear in mind that the same machinery works for categories of only a few objects
and arrows. Say we have a square of four elements and arrows forming the square (however
you want them), and a functor maps it to all reals and all functions on reals. That map
lands on four objects and four arrows in the target category of reals, maybe
\Comdiag{A \ar[r] \& B \ar[d] \ar[rr,Rightarrow] \& \& 1 \ar[r,"2x"] \& 2 \ar[d, "x+3"]\\
C\ar[u]\& D\ar[l] \&\& 25\ar[u,"\sqrt{x-9}-3"]\& 5\ar[l,"x^2"]
}
We can't select at random: we will have a rule that if we have an arrow $A\to B$, and
select $A\to 1$ and $B\to 2$, the arrow will have to be something in the set of arrows
of the form $1\to 2$. We've
selected four elements and the same number of arrows indicating the same relations.
\citet{leinster:basic} describes this as assigning data to the relation. A simple category
used for this sort of relation is formally called an {\em index category} or just a {\em diagram}. \label{diagramref}
\paragraph{The selector}
If you had a category with a single element, and a second with who knows how many points, then a
function from the one-element category to the other is a selector.
The hom set from the one-element category is then the set of element selectors.
Going beyond diagrams, we can also select characteristics from infinite sets,
such as the sorts of restrictions presented in Section
\ref{bundlesec}, or requiring retracts and sections as in Section \ref{injectivesec}.
\section{Bundles}\label{bundlesec}
We're used to thinking in terms of sets comprising atomic elements, but mathematical
objects are typically a bundle of parts. Non-mathematicians think about the non-negative real numbers
$\Re^+$, but we have to start thinking about ($\Re^+$ paired with addition) and ($\Re^+$
paired with multiplication) as different things. Some bundles:
\items{
\item {\em Magma}: A set $S$ and a binary function $f:S\times S\to S$.
\item {\em Preorder}: A set $S$ and a binary relation $\leq$. (See flash cards for
the formal definition)
\item {\em Graph}: A collection $(V, A, src, tgt)$:
\items{
\item $V$: a set (vertices)
\item $A$: a set (arrows)
\item $src:A\to V$ a function giving arrow sources
\item $tgt:A\to V$ a function giving arrow targets
}
}
Authors who write in object-oriented programming languages will recognize this
use of {\em object} to mean a structure of elements, some of which could be verbs.
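As a sketch of that object-oriented reading, here is the graph bundle written as a Haskell
record; the field names follow the list above, and the representation is one illustrative
choice among many:
\begin{verbatim}
-- The (V, A, src, tgt) bundle as a record of data plus verbs.
data Graph v a = Graph
  { vertices :: [v]       -- V: the set of vertices
  , arrows   :: [a]       -- A: the set of arrows
  , src      :: a -> v    -- source of each arrow
  , tgt      :: a -> v    -- target of each arrow
  }
\end{verbatim}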
\subsection{Constraints}
These objects often have constraints attached. For example, what we name the pair $(S, f)$ depends on
how restrictive we want to be about the function:
\items{
\item {\em Magma}: A set $S$ and a binary function $f:S\times S\to S$.
\item {\em Semigroup}: A magma where $f$ is associative: $f(a, f(b,c))=f(f(a, b),c)$.
\item {\em Monoid}: A semigroup with an identity element for which $f(id, s)=s,$ for all $s\in S$.
\item {\em Group}: A monoid where every element has an inverse: $\forall x\in S$,
$\exists x_{inv}$ such that $f(x, x_{inv}) = id$.
}
As an example, if we allow zero in $\Re^+$, then $(\Re^+, +)$ is a monoid (no
negative numbers, so no inverses); excluding zero, $(\Re^+, \cdot)$ is a group.
Associativity and identity are going to be part of the definition of categories,
so there won't be a lot of discussion of things on the list before monoids.
Groups are reserved for contexts where we want a sense of reversibility or symmetry.
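Here is a minimal Haskell sketch of the monoid bundle, using a wrapper type of my own for
$(\Re^+, +)$; the associativity and identity constraints are exactly Haskell's \texttt{Monoid} laws:
\begin{verbatim}
-- Nonnegative reals (with zero) under addition, as a monoid.
newtype SumPlus = SumPlus Double deriving Show

instance Semigroup SumPlus where
  SumPlus a <> SumPlus b = SumPlus (a + b)  -- associative f

instance Monoid SumPlus where
  mempty = SumPlus 0                        -- identity: f(id, s) = s
\end{verbatim}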
\subsection{Completing a group; freedom} Say we have an element $a$ and some function $f$. If
$f(a,a)=a$, then we have a complete group or monoid, with $a$ the identity and its own
inverse. But say that this is not the case, and $f(a,a)=b$. Then we can incrementally fill in what is
needed: the value $f(a,b)=c$, an identity where $f(a, id)=a$, $f(b,c)=d$, and so on.
For example, given an element $1$ and the operation $+$, completing the monoid generates
the natural numbers: $1, 1+1, \dots$, plus the identity $0$.
In these parts, the word {\em free} has informal, not-quite-defined usage
\citep{leinster:basic}, but my impression is that it generally refers to the collection
of the fewest parts needed to complete or generate the system. The definition of
{\em group} above is a template; the free group is a minimal exemplar that fits that
template. The word {\em the} is often used, because for many forms $F$, one free $F$
is isomorphic to any other free $F$.
The {\em free monoid} refers to lists. Elements $\{a, b, c\}$, the empty list in the r\^ole
of identity, and the operation
of concatenation generate lists like $[a]$, $[a,b]$, $[a,b,b,a,c,c]$, \dots. It's a
record of a walk from element to element. The free monoid with initial alphabet $\{1\}$
gets its own notation, $\underline 1$, and is isomorphic to the natural numbers under addition.
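Haskell makes this concrete: its lists are exactly this free monoid, with the empty list as
the identity and concatenation as the operation. A small illustrative snippet:
\begin{verbatim}
-- A list as the record of a walk from element to element.
walk :: String
walk = [] <> "a" <> "ab" <> "bacc"   -- "aabbacc"
\end{verbatim}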
\subsection{Category constraints}
A category is a bundle of elements as per Level 3 of the chain of arrow abstractions
in Section \ref{levelsec}. Bundle together some things that will be nodes (herein,
objects), and the sets of mappings between some or all of them.
You get to pick which mappings you will put in the set of mappings, with a few simple conditions.
You don't have to include every possible mapping (herein, morphism) from node to node; as above, a
single arrow may already cover all of one-variable high school algebra. But for every object we do have to include
the identity morphism $id_A:A\to A$ mapping every element of an object to itself.
A category also has a composition function, so given $F:A\to B$
and $G:B\to C$, we have to include $G\circ F:A\to C$. If the composition function
is the usual $G\circ F(a) = G(F(a))$, this is not difficult, but in case you get
creative, the textbooks also require that $id_B\circ f=f$ and $g\circ id_B
= g$.\footnote{Every author makes a big deal of how $G\circ F$ looks backward,
but all you have to do is read $\circ$ as {\em of}, as in {\em $g$ of $f$}, which is
exactly how people read $g(f(x))$. Really not a big deal at all.}
Having these additional rules means we can't just throw together any old set of nodes and edges
and call it a category, but given any old set it's not difficult to generate a complete
category by adding identities and chains of functions like $h\circ g\circ f$ as needed,
to the point that people usually don't even bother drawing the loops to mark the
identities.
Functors, as above, are a bundle of mappings of objects to objects plus mappings from
functions to functions. We want them to represent a sensible transformation from one
context to another, and chaining functors is going to be central here on in, so a
functor $F$ has to have these constraints on how it maps functions and objects:
\items{
\item If $f:A\to B$, then $F(f): F(A)\to F(B)$
\item $F(id_A) = id_{F(A)}$
\item $F(f\circ g) = F(f)\circ F(g)$
}
This set of constraints is what we'll need for Level 5 below.
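For comparison, these constraints are exactly Haskell's \texttt{Functor} laws; a small sketch,
with the laws stated as comments because the compiler leaves checking them to the instance author:
\begin{verbatim}
-- The functor constraints as Haskell's Functor laws:
--   fmap :: Functor f => (a -> b) -> f a -> f b   -- maps arrows
--   fmap id == id                                  -- identities
--   fmap (g . h) == fmap g . fmap h                -- composition
example :: Maybe Int
example = fmap (+ 1) (Just 41)   -- Just 42: the Maybe shape is preserved
\end{verbatim}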
\section{Starting to map}
The nodes in all the graphs above could be anything, including the bundles of elements from
Section \ref{bundlesec}. From here on, famous categories will be in boldface.
Let ${\bf PrO}$ be the set of all preorders and ${\bf Graph}$ be the set of graphs. A
single preorder is a set of relations of the form $a \leq b$, and we could replace the $\leq$
symbol with an arrow, like $a\leftarrow b$, and we have another little inline graph.
Doing the same for all of the elements in the preorder would generate a list of
$($nodes, arrows, sources, targets$)$ in ${\bf Graph}$.
That is, $f:{\bf PrO}\to {\bf Graph}$ makes as much sense as $f:\Re \to \Re$. Set up such
a mapping for every element of ${\bf PrO}$, and we have a category diagram with two
objects and one arrow representing all homomorphisms mapping all preorders to graphs.
An arrow in a partial order maps from a single element to a single element; a monoid's
function maps from two input elements to one output. One way to do this, currying, will be
covered later, but our definition of category already does have one binary operation
built in: the composition rule. This gives us the first example of a category that
doesn't fit the intuitive approach of having a set of elements, some functions between
them, and the usual function composition. Here are the components of a category built
from a monoid $(S, id, f)$:
\items{
\item Objects: following the name {\em monoid}, this category has exactly one object.
Call it \S.
\item Morphisms: For every monoid element $M$, including $id$, define a morphism $M:\S\to\S$.
That is, the same mapping, $f(\S)=\S$,
appears several times in the same category, with each arrow distinguished by different labels.
\item Composition: $M_1\circ M_2 \equiv f(M_1, M_2)$.
}
A category requires a composition function that is associative and correctly handles the
identity morphism, and we get exactly that from the monoid requirements on $f$. More on
this below.
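Here is a minimal Haskell sketch of this one-object category (names mine); the point is only
that composition is the monoid operation and the identity morphism is the monoid identity:
\begin{verbatim}
-- Every monoid element is a morphism S -> S; composition is <>.
newtype Morphism m = Morphism m deriving Show

compose :: Monoid m => Morphism m -> Morphism m -> Morphism m
compose (Morphism a) (Morphism b) = Morphism (a <> b)

identityM :: Monoid m => Morphism m
identityM = Morphism mempty
\end{verbatim}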
\paragraph{Level 5: the category of categories}
Categories are themselves a bundle like the ones from Section \ref{bundlesec}: a set
of objects paired with a set of morphisms (at this level, functors) between them. So
they too could be a node in one of the above diagrams, giving us the category of
categories, ${\bf Cat}$. Notice that the rules for functors, having an identity and
associativity, are exactly the requirements we put on the set of homomorphisms inside
of a single category.
Things are already getting self-referential: ${\bf PrO}\to {\bf Graph}$ is a
node-and-arrow graph representing an operation involving the set of node-and-arrow
graphs. For ${\bf Cat}$, we can show that ${\bf Cat}\to {\bf Graph}$, thus drawing a
graph for two categories to represent how categories and graphs relate. Mathematicians
love this stuff.
\subsection{Types} Consider the sets $\Re$, $\Nat$, and strings of letters A-Z, and the
category of mappings between these three objects. This is the beginning of a description
of a programming language. C basically runs entirely on these three types. But just as a
free monoid is a path along a set of elements, a program is a path selecting some
functions from the category listing all options. This seems like a nice way of thinking
about a lot of different types of categories.
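A tiny illustrative sketch of a program as such a path: each arrow below is a function between
those types, and the program is their composite.
\begin{verbatim}
-- A path through the category of types:
-- String -> Double -> Double -> String
pipeline :: String -> String
pipeline = show . sqrt . (read :: String -> Double)
-- pipeline "16.0" == "4.0"
\end{verbatim}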
\subsection{Examples} We already mapped ${\bf PrO}$ and ${\bf Graph}$, and we can go in the other
direction from ${\bf Graph}$ to ${\bf PrO}$ using similar thinking.
A {\em partial order} is a preorder that also has the condition that if $a\leq b$ then
$b\not\leq a$ (when $a\neq b$). Because $\leq$ is transitive, that means we can't
develop a sequence where $a\leq b$, $b\leq c$, $c\leq a$.
A {\em tree} is a graph with no cycles, meaning exactly the same no-cycle property holds for
a tree's arrows. We therefore have the same mapping from partial orders to trees and
back.
We'd like to touch every node in the tree (span the tree), by starting at a set of
nodes, then following arrows to hit the next node, following all arrows from that node, and so on.
Will this work starting at a single node; will it always be possible? Maybe you have an
image of a tree in your head and the answer is obvious to you: no to both questions.
But proving this using the four-part definition of a graph above may be a pain.
Define a minimum in a partial order to be a value $a$ such that there is no $x\neq a$ such that
$x\leq a$, and to add jargon, let an $\omega$-chain be a sequence $a\leq b\leq
c\leq\dots$. With an infinite number of elements (like the number line), a chain may not have
a minimum, and because we have a partial order, there may be multiple chains that don't
intersect. Every element in a partial order is on some $\omega$-chain, though it may be a
single element by itself, or a chain with no minimum. So if there's a chain with no
minimum, we can't span it. If there are only chains with minima, we can span the full
set by starting at all minima, then walking up their chains.
Now use the functor mapping the category of partial orders to the category of trees,
and we have set the procedure to span any given tree and given conditions for when it will work,
without dealing with the graph elements $V$, $A$, $src$, and $tgt$ at all.
I initially got this wrong with trees, because I did a proof-by-picture using
a finite tree with minimum elements, but from the perspective of a partial order the
full proof came more easily.
\subsection{Functions of two variables} All of our arrows map from one object to one
object, so what do we do with functions with two inputs, $f(x_1, x_2)=y$? There are two
approaches: cross-products and currying.
The first is to define an object which is the cross-product of other objects:
\Comdiag[Hom]{
a\ar[rd]\\
\& a\times b \ar[r,"{f(a,b)}"] \& y\\
b\ar[ru]
}
Here we have a grid of values, and each maps to $y$.
This diagram has to live at the hom-set level, because if there are three values of $b$,
for example, then each value of $a$ points to three values of $a\times b$.
The second alternative is {\em currying}, in which we treat functions as objects
themselves. Before we get to that, we'll need exponentiation and an evaluation function.
As per Section \ref{homsetsec}, there are $B^A$ possible arrows in $\Hom{set}(A, B)$, so
the hom set is often written using that notation. If you gave me an arrow from that set and a
specific element of $A$, I could follow the arrow to an element of $B$.
This is the {\em evaluation function}:
\Comdiag[Hom]{
B^A\times A \ar[r, "ev"] \& B
}
A category {\em has exponentiation} if for any $A, B$, we have $B^A$ and there exists
an evaluation function, and for every object $C$ and morphism
$g:C\times A \to B$, there is a unique $\hat g:C\to B^A$ so this commutes:
\Comdiag[Hom]{
B^A\times A \ar[dr, "ev"] \\
\& B\\
C\times A \ar[uu, dashrightarrow, "\hat g\times {\bf 1}_A"] \ar[ur, "g"]
}
This is a {\em limit}; see Section \ref{universalsec}. It
says that if you have whatever $C\times A$ that maps to $B$, then $C$ implicitly defines a
function in $\Hom{set}(A,B)$. This drives home the point that mappings can be expressed as
objects like any other.
That lets us break down $f(a,b)=c$ into components.
Define mappings that select from a set of functions, say $A\to C^B$,
which selects a function $f:B\to C$ for a given value of $A$. Then the full chain is:
\Comdiag[Hom]{
A \ar[r] \& C^B \ar[r] \& C^B \times B \ar[r, "ev"] \& C\\
B \ar[urr]
}
In fact, we have two such paths, which are equivalent; more on this below.
\Comdiag[Hom]{
A \ar[r] \ar[drr] \& C^B \ar[r] \& C^B \times B \ar[r, "ev"] \& C\\
B \ar[urr] \ar[r] \& C^A \ar[r] \& C^A \times A \ar[ru, "ev"']
}
This rendition explicitly shows that marginal functions exist. Though, we write
our own categories, so I suppose we could write one where $f:A\times B\to C$ exists,
but we don't bother to include the marginal functions $C^B$ or $C^A$ as elements.
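As a sketch of all this in Haskell, where the exponential $C^B$ is the function type and the
evaluation function is plain application, \texttt{curry} and \texttt{uncurry} witness the
two-way correspondence between functions on pairs and function-selecting functions:
\begin{verbatim}
f :: (Int, Int) -> Int            -- a function on the cross-product
f (x, y) = x + 10 * y

fCurried :: Int -> (Int -> Int)   -- the corresponding \hat g
fCurried = curry f
-- uncurry fCurried == f, and curry (uncurry h) == h for any h
\end{verbatim}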
\section{Where commuting gets us} It's already potentially interesting that we can
express mappings between different concepts like orderings and graphs, but things get more
interesting when we have multiple paths to the same place.
For example,
\citet{spivak:category} describes multisets as sets of elements $E$ with a set $B$ of
symbol names, and a map $\pi:E\to B$. So, for example, we could replace all the words in a
book with numbers, $e_1$={\em the}, $e_2$={\em and}, $e_{250}$={\em dog}, \dots, and thus
turn a text into a sequence like $e_1, e_{250}, e_2, e_1, \dots$, then have a lookup
function $\pi$ which tells us that $e_{250}$ maps to {\em dog}. We
may have another set of elements and symbols in another context, maybe a corpus in
another language, so we'll need an element mapping placeholders $f_1:E\to E'$ and a symbol mapping
$f_2:B\to B'$. See the flash cards for formalization.
Here is a diagram including both
the functions to map a multiset $(E, B, \pi)$ to another, and
the $\pi$ functions inside the multisets:
\Comdiag[Fn]{E \ar[r, "f_1"] \ar[d, "\pi"'] \& E' \ar[d, "\pi'"]\\
B \ar[r, "f_2"'] \& B'
}
The dictionary definition of {\em
commute} is {\em to exchange}; in math circles, {\em commutativity} typically refers to
reversing the order of things, like $x+y = y+x$. In this case, we can
get from the original element set to the primed symbol set in two ways:
either apply the symbol map $\pi$ first to get to $B$ and then step from the unprimed to
primed corpus (calculating $f_2\circ\pi(e)$), or first step to the primed corpus via $f_1$ and then
map from $E'$ to $B'$ on that side (calculating $\pi'\circ f_1(e)$).
When are both routes identical? This seems to me an interesting question in its own right.
For books and word lists in Portuguese and English, I'd guess there's a natural
way to map simple nouns: finding that an element in the $P$ corpus is {\em cão},
then using the Portuguese-to-English dictionary to find that it means {\em dog}
is likely equivalent to the commuted procedure of first finding the element in the
Portuguese corpus matching our element of interest, then looking up that that element in
the $E'$ corpus maps to {\em dog}. This is going to be easy and natural.
Doing the same with {\em saudade} is going to take some forcing.
We're going to generate a list of
categories that all behave similarly and relate nicely, but what about those things
that have to be thrown out to fit the categorical mold? The textbooks that focus on
that mold naturally exclude them, but by definition, standardization requires jettisoning
the unique.
Indeed, one way to nail down a relationship is to just cull down what is in your sets.
If you need a functor mapping from preorders to partial orders, use the identity functor,
and just ignore everything in the set of preorders that isn't also a partial order. Now you
have a lot to say about a set of measure zero within the space of preorders.
The set of books, $B$, can be cataloged either via the Library of Congress classification system
(a set of call numbers $L$), or the Dewey decimal system (another set of call numbers
$D$). Let us pair together call numbers referring to the same book, in the cross-product
space $D\times L$. The full cross product matching every Dewey number with every LoC
number makes no sense, and not all books have a classification in every system,
so out of the full cross of all pairs $(d, l)$, we will write down the relatively
small list of pairs that are about the same book, herein $D\times_B L$. Finally,
we have commutativity:
\Comdiag[Fn]{D\times_B L \ar[r] \ar[d] \& L \ar[d]\\
D \ar[r] \& B
}
This is a {\em pullback}. There is information in the pairing, and maybe even
information in the list of books and call numbers that can't be found from $D\times_B L$
and is now lost in the stacks.
This example may seem obvious, but in Section \ref{universalsec} we'll generalize this
to other contexts.
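A minimal sketch of this pullback, representing each catalog as an association list (an
illustrative choice, not anyone's library): keep exactly the pairs of call numbers that agree
on the book.
\begin{verbatim}
-- D x_B L: pairs of call numbers that point at the same book.
pullback :: Eq b => [(d, b)] -> [(l, b)] -> [(d, l)]
pullback deweys locs =
  [ (d, l) | (d, b) <- deweys, (l, b') <- locs, b == b' ]
\end{verbatim}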
\subsection{Natural transformations}
If there is a mapping from one context to another where (apply internal function,
then jump) produces the same outcome as (jump, then apply internal function), we call
the jump across categories a {\em natural transformation}. We had a set-level example
above where (find English lookup, translate lookup to Portuguese) and (find Portuguese
match to element in English corpus, lookup in Portuguese) had the same outcome. We can do
the same at the category level.
Given two categories {\bf C} and {\bf D}, we can define ${\rm Fun}({\bf C}, {\bf D})$ as
the category of functors between the two categories, and the set of arrows in that category
will be the natural transformations. So defining and identifying natural
transformations lets us bring the category of functors into the categorical club.
Start with two categories and two functors, $F$ and $G$, each independently mapping the same first
category to the same second category. Almost every element in the discussion
to follow ($\alpha_a, \alpha_b, F(a), F(h), G(b), \dots$ below) lives in the second category.
This is different from the English/Portuguese translation, where there was no specified
space in which everything lived, and $f_1$ and $f_2$ lived in an {\em ad hoc} space of
mappings between multisets.
With that caveat and context in mind, pick two elements from the first category, $a$
and $b$, and a mapping $h:a\to b$. We want mappings $\alpha_a$ and $\alpha_b$ in the
second category so that
\Comdiag[Funct]{F(a) \ar[r, "\alpha_a"] \ar[d, "F(h)"] \& G(a) \ar[d, "G(h)"]\\
F(b) \ar[r, "\alpha_b"] \& G(b)
}
The family of $\alpha$s is a natural transformation if, for any two elements of the first category,
the $\alpha$s make this square commute. We could summarize this with
\Comdiag[Nat]{F \ar[r]\& G}
and thus we reach the fifth level in Section \ref{levelsec}: just as
functors include mappings of functions to functions, natural transformations include
mappings of functors to functors. Objects that aren't maps are {\em $0$-categories};
maps between $0$-categories are {\em $1$-categories}, making these maps between
$1$-categories {\em $2$-categories}.
If we have all this, then we have the sort of naturalness we seek: you can
start in the $F$-transformed version and walk from $F(a)$ to $F(b)$ and then walk over to
the $G$-transformed version and get to $G(b)$; or you can commute the order and
$\alpha$-walk over to the $G$-transformed version first and then from $G(a)$ to $G(b)$.
Just because you have $X$ and $Y$ and a nice mapping $X_F\to Y_F$ in the $F$-space doesn't mean there's a corresponding
mapping in $G$-space $X_G\to Y_G$.
Not all categories have all mappings, so one must verify that all the parts are present.
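A standard small example, as a Haskell sketch: \texttt{safeHead} is a natural transformation
from the list functor to the \texttt{Maybe} functor, and the commented equation is the
naturality square.
\begin{verbatim}
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x
-- naturality (spot check, not a proof):
--   fmap f . safeHead == safeHead . fmap f
--   fmap (+1) (safeHead [1,2,3]) == safeHead (fmap (+1) [1,2,3])
\end{verbatim}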
\subsection{Chaining}
Further, all these commutative diagrams, for functions and functors, chain together. If
we had English ($E$), Portuguese ($E'$), and Russian ($E''$) corpora, we could draw:
\Comdiag[Fn]{E \ar[r, "f_1"] \ar[d, "\pi"'] \& E' \ar[r, "f_3"] \ar[d, "\pi'"] \& E'' \ar[d, "\pi''"]\\
B \ar[r, "f_2"'] \& B' \ar[r, "f_4"'] \&B''
}
It is self-evident that if the left square and the right square commute, then any flow
from upper left to lower right will be equivalent. This becomes especially interesting
when we have mappings across different contexts: if we can map from multisets to preorders
and preorders to graphs, we just got a multiset-to-graph generator for free.
Natural transformations also chain: if there exists an n.t. from functor $F$ to functor
$G$, and from functor $G$ to functor $H$, then we have a natural transformation from $F$
to $H$. If we have three categories, functors $F$ and $G$ from first to
second with a natural transformation between them, and likewise $H$ and $I$ from second
to third, then $H\circ F$ must have a natural transformation to $I\circ G$.
\subsection{Initial, final, terminal}
An {\em initial object} is one such that there exists exactly one morphism from this object to any other object.
A {\em final object} is one such that there exists exactly one morphism to this object from any other object.
Or if you're feeling vague, you can call these {\em terminal objects} without specifying direction.
For example, if we have the ordering of the positive integers $\leq$, then one is a final object
and there is no initial object.
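In Haskell's category of types (glossing over laziness fine print), \texttt{Void} plays the
initial r\^ole and \texttt{()} the final one; a short sketch:
\begin{verbatim}
import Data.Void (Void, absurd)

fromInitial :: Void -> a   -- the unique arrow out of the initial object
fromInitial = absurd

toFinal :: a -> ()         -- the unique arrow into the final object
toFinal _ = ()
\end{verbatim}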
\subsection{Universal properties}\label{universalsec}
\citet{aluffi:zero} points out that universal properties are terminal objects, in an appropriate
space of shapes.
Those shapes are {\em cones}, constructed in a few steps (I'm following \citet{milewski:pigs} ch 12).
Start with a diagram: a category with a few representative elements, like maybe two points; call them
$x_1, x_2, \dots$.
Then pick a point $c$ in the target category, which is intended to be much larger than the diagram.
Now map each element of the diagram both to a respective point in the target space $x_1^F, x_2^F, \dots$, and to $c$. For this to be a
natural transformation, you'll also need the arrows $c\to x_1^F, c\to x_2^F, \dots$.
This is where the name {\em cone} comes in, as we now have a point $c$ at the top of the cone, lines
from that top to all the images of the diagram, and, as typically drawn, a pyramid floor made of the
image of the diagram itself.
The question, though, is whether there is, within the space of valid cone-tops $c$, a terminal value.
For the empty diagram, with no arrows coming into the target category, $c$ is a terminal cone-top
iff it is a terminal point.
E.g., as above, in positive integers, one is a valid option for $c$.
With two points coming in, the limit is the product; see below.
A colimit and co-cone goes in the other direction:
now run arrows $c\leftarrow x_1^F, c\leftarrow x_2^F, \dots$.
These terminal nodes are often referred to as {\em limits}, in a reference to the topological
origins of category theory.
They are also referred to as {\em universal properties}, implying that they're tapping into
something fundamental and immutable.
On top of this, the same diagram will form a
limit in multiple categories, providing another level of ostensible unification.
Consider the cross product of two sets, $X\times Y$. There is an obvious mapping from
$X\times Y \to X$ and $X\times Y \to Y$---just drop the unneeded $Y$ or $X$ part. Here
is the shape of this as a diagram, with $\pi_1$ and $\pi_2$ being the {\em drop one
part} functions:
\Comdiag[Hom]{
\& X\times Y \ar[dl,"\pi_1"'] \ar[dr,"\pi_2"] \\
X \& \& Y
}
This diagram (in the formal indexing set sense) is called a span and can also be turned into a little inline diagram as
$X \leftarrow (X\times Y) \rightarrow Y$.
Of course, there are other things we could put at the top of the fork, some other
set $A$ that maps both ways: $X \leftarrow A \rightarrow Y$.
But $X\times Y$ is unique in this capacity because every $A$ and its span can be expressed
via a morphism to $X\times Y$. Here is the commutative diagram:
\Comdiag[Hom]{
\& A \ar[dddl, "\forall f"'] \ar[dddr, "\forall g"] \ar[dd, dashrightarrow, "f\times g"]\\
\\
\& X\times Y \ar[dl,"\pi_1"'] \ar[dr,"\pi_2"] \\
X \& \& Y
}
As per the annotation in the diagram, the $A\to (X\times Y)$ morphism is trivial to generate as
the ordered pair $(f(a), g(a))$. For any $f$ and $g$, you could go the short way to
map $A$ to $X$ or $Y$, but you are guaranteed that there is also a pair of mappings
$A\to (X\times Y) \to X$ and $A\to (X\times Y) \to Y$ that are the long commute to the
same point. Put differently, you can have an arbitrary span composed of any object and
mappings iff you can write down a map $A\to (X\times Y)$. That any span can be expressed
by going through $X\times Y$ makes that object a sort of universal spanner. Returning
to the problem of combining Dewey and LoC numbers, you could come up with all sorts
of other clever combination schemes besides the ordered pair $($Dewey, LoC$)$, but
you are guaranteed that all of them must reduce to this simpler scheme. This sounds
obvious, but people put a lot of time into trying to be clever.
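Here is the same universal property sketched for Haskell's pair types: given any span $f$ and
$g$ out of a common source, the mediating map into the pair is forced to be the one below (the
library combinator \texttt{\&\&\&} in \texttt{Control.Arrow} is this same map).
\begin{verbatim}
mediate :: (a -> x) -> (a -> y) -> (a -> (x, y))
mediate f g = \a -> (f a, g a)
-- commuting conditions: fst . mediate f g == f
--                       snd . mediate f g == g
\end{verbatim}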
You could do the same in other categories. The product of two graphs could be similarly
constructed, by generating the grid of all vertex pairs $X\times Y$, then including an
arrow from one pair to another if both base graphs have an arrow for their corresponding
vertices. Projecting back to the original graphs would again just drop the other
dimension.
Groups reveal how the $X\times Y \to X$ homomorphism isn't
necessarily so neat. For $(\Re, \cdot)$ and the span $3 \leftarrow 3\times 4 \rightarrow 4$, the
first arrow is $f_1(x)=x/4$ and the second $f_2(x)=x/3$, which is not especially general.
But then, recall that even the identity homomorphism has item-specific constituents, like the arrow $3\to 3$ and $4\to
4$.
Anyway, does it work? For any $a$, the arrow $a\to 3$ must be $f_a(a)=3/a$, and the arrow to
$3\times 4$ must be $f_{x}(a)=3\cdot 4/a$. Commutativity works: $f_1\circ f_x(a) = 3\cdot
4/a\cdot 1/4 = 3/a = f_a(a)$, and the same for the other leg pointing to $4$.
This gives us a nice coincidence between the set concept of $X\times Y$ and the arithmetic
concept. I'd always learned it as a sort of metaphor, where if you have three dots on the
$x$ axis and twelve dots on the $y$ then you'd have thirty-six $(x,y)$ pairs. When the counts
go to infinity, well, we've already got the pattern down for finite elements so don't
worry about it. Here we have a more formal relationship, that any transformations into $X$
and into $Y$ have to have a unique corresponding transformation into $X\times Y$.
For a partially ordered set, the only thing we have to work with is the $\leq$ relation.
If $x\leftarrow (x\times y) \rightarrow y$ means $x\geq (x\times y) \leq y$, and we want
the unique $(x\times y)$ that meets this requirement, then we need
$(x\times y)\leq \min(x,y)$ to meet the requirements at all, and
$\min(x,y)\leq (x\times y)$ because otherwise if $A = \min(x,y)$ then $A\leq x$ and $A\leq
y$ but the unique map $A\leq (x\times y)$ won't exist.
The product notation $x\times y$ doesn't seem far-fetched for $\min(x,y)$. We could extend
it to multiple elements, like $x\times y\times z = \min(x,y,z)$. We would want a
multiplicative identity, which for a finite set $S$ would be $\max(S)$, because
$\min(x,\max(S)) = x$. The multiplicative identity is what you might get if you
went the other way and had zero terms in your product, so the commutative diagram above
reduces to {\em there exists a unique mapping $A\to X\times Y$} with no additional
conditions. This brings us to terminal sets.
\paragraph{Terminal sets} A terminal set $T$ is another universal property of a
category. Here, there exists a unique morphism $A\to T$, for any $A$. Generally, this
means that the terminal set has one element, so every element of $A$ has to map to
that single element.
\items{
\item Groups: for a group with only one element, that element has to be the multiplicative identity, $1$.
\item Sets: the name of the element doesn't matter, but as above, there can be only one.
\item Posets: let the single element in the terminal set be $t$. For any $a$, $a\leq t$. So $t$
is $\max(A)$; if $A$ has no maximum (possible when $A$ is infinite), the terminal set doesn't exist, it seems to me.
\item Graphs: we need one vertex and one arrow, and the arrow must therefore have source and
target equal to that one vertex. A point pointing to itself.
}
Back to products, all of these are indeed the multiplicative identity for the given
concept of product.
We can't quite define the limit as the multiplicative identity, though that seems to be
the sort of promise the authors make about the universality of universals. For example,
it doesn't work for rings, for which the terminal set is $\{0\}$. There's probably a way
to make this work, but it characterizes $+$, not $\times$ as the product in that context.
One way to define a thing is to just declare it: $a$ times $b$ is what you get when you
add $a$ items $b$ times over. Another way is to describe the properties that the thing
has. $a$ times $b$ is the unique thing such that the above diagram commutes, and any other
transformation that hits both $a$ and $b$ has to go through $a\times b$. Also, it has to
have a multiplicative identity, which is the thing such that $a\times id$ is isomorphic
to $a$.
\section{The injective retraction epic}\label{injectivesec}
These are some themes that seemed to repeat at
different levels. I am not sure how to fit them into the overall narrative.
Given a function $f:A\to B$, could we back-trace the arrows to generate a function in the opposite direction $B\to A$?
If two elements $a_1$ and $a_2$ have the same target $b_1$, reversing the arrows
won't give us a function, because a function maps each element to only one target.
If $f(a_1)=f(a_2) \Rightarrow a_1=a_2$, the function is {\em injective}, and we have our
condition of each reversed arrow hitting only one target. Note that we must have the size
of $B$ greater than or equal to the size of $A$ for this one-to-one property to hold.
\citet{aluffi:zero} reminds us that we can think of functions as a grid: rows are the source points
in the domain, columns are the destination points in the codomain.
To be a function, there is exactly one dot in each row.
To be injective, there is at most one dot in each column.
Injections have a relabeling character to them, where if $f(a_1)=b_1$, then $b_1$ is
effectively just a renaming of $a_1$. No information about $A$ is lost or merged with
other information, though some labels in $B$ may not be used and their information
disappears.
If there is an element of $B$ that has no arrows pointing to it, then we can't reverse the
arrows in $f:A\to B$ to get a function $B\to A$, because every element in the source set
must have some target. If for all elements $b$, there exist some $a$ such that $f(a)=b$,
then $f$ is {\em surjective}, aka {\em onto}. We must have the size of $A$ greater than or equal
to $B$ for this to work.
In \citet{aluffi:zero}'s grid, to be surjective, there is at least one dot in each column.
Surjections have a sectioning character. Because we know that every $b$ back-maps to some $a$ and some $b$s
can back-map to multiple $a$s, we could partition the set $A$ into elements that map to $b_1$,
elements that map to $b_2$, \dots, thus producing a sectioning of $A$. Mapping from $A$ to
$B$ loses information about what lives in each section; all information about $B$ is
accounted for.
If $f$ is both injective and surjective, then we can generate a proper function
$f^{-1}:B\to A$. I.e., $f$ defines an isomorphism between $A$ and $B$, which must have
the same number of elements. Note that in the case of infinite sets, we can effectively
use this as the definition of `same number'.
Injectivity requires that each element of $B$ have at {\em most} one arrow pointing
to it; surjectivity requires that each element of $B$ have at {\em least} one arrow
pointing to it. So given both, the back-arrows form a function where every element in
$B$ has exactly one arrow attached. We lose no information in either domain or codomain,
and there is a one-to-one mapping in both directions---an isomorphism.
\subsection{Retract/section}
A {\em retraction} is a function $r:B\to A$ such that $r\circ f=1_A$. Injectivity is
necessary. Consider near-injectivity: if both $a_1\to b$ and $a_2\to b$ and every other
$a_n\to b_n$, then we can assign $r(b_n)=a_n$ and arbitrarily assign $r(b)=a_1$,
but for what $b$ is $r(b)=a_2$? We don't need surjectivity of $f$, because $b$s that
don't enter into the condition that $r(f(a_n))=a_n$ can point to wherever they please
and we still have our retraction.
A {\em section} is a function $s:B\to A$ such that $f\circ s=1_B$. We need surjectivity of
$f$ (the one with a sectioning character), because each element of $B$ needs to
be accounted for. We don't need injectivity, because if $a_1\to b$ and $a_2\to b$,
arbitrarily pick $s(b)=a_1$ and the condition that $f(s(b))=b$ works.
So we have a retraction iff $f$ is injective; a section iff $f$ is surjective.
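A small Haskell sketch of both facts, with illustrative functions: an injective $f$ with a
retraction $r$, and a surjective $g$ with a section $s$.
\begin{verbatim}
f :: Bool -> Int   -- injective: a relabeling into Int
f False = 0
f True  = 1

r :: Int -> Bool   -- retraction, r . f == id; unused points go anywhere
r 1 = True
r _ = False

g :: Int -> Bool   -- surjective: sections Int into evens and odds
g = even

s :: Bool -> Int   -- section, g . s == id: one representative per target
s True  = 0
s False = 1
\end{verbatim}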
\subsection{Determination and choice}
\cite{lawvere:conceptual} recommend drawing the retraction as
\Comdiag[Fn]{ \& B\ar[dr,dashrightarrow,"r"]\\
A\ar[ur,"f"]\ar[rr,"id_A"]\&\&A
}
because now we can generalize to the determination or extension problem: does there exist
a function $g:B\to C$ to fill in this diagram:
\Comdiag[Fn]{ \& B\ar[dr,dashrightarrow,"g"]\\
A\ar[ur,"f"]\ar[rr,"h"]\&\&C
}
It's called a determination problem because $f(a)$ needs to be sufficient to derive
$h(a)$---$g(\cdot)$ can't add information or recover information lost. So $h(a)$ is
determined by $f(a)$---they're relabelings. If $h(a_1)\neq h(a_2)$ but $f(a_1)=f(a_2)$, $g(\cdot)$ can't
help us.
If $f$ does have a retraction $r$, then $g=h\circ r$ and the triangle commutes: $g\circ f = h\circ r\circ f = h\circ 1_A$.
Similarly, draw the section as
\Comdiag[Fn]{ \& B\ar[dr,"f"]\\
A\ar[ur,dashrightarrow,"s"]\ar[rr,"id_A"]\&\&A
}
because now we can generalize to the choice or lifting problem: does there exist
a function $g:A\to B$ to fill in this diagram:
\Comdiag[Fn]{ \& B\ar[dr,"f"]\\
A\ar[ur,dashrightarrow,"g"]\ar[rr,"h"]\&\&C
}
The same sectioning story holds in the general case. There could be multiple elements of
$B$ each pointing to $h(a)$, and $g(a)$ has to choose one of those.
If $f$ has a section $s$, then $g=s\circ h$ and the triangle commutes: $f\circ g = f\circ s\circ h = 1_B\circ h$.
\subsection{Epimorphisms and monomorphisms, limits}
An epimorphism is a function $f$ such that, given this diagram,
\Comdiag[Hom]{
X \arrow[r,"f"] \&Y \arrow[r, shift left,"g_1"] \arrow[r,"g_2"'] \& Z
}
$g_1\circ f = g_2\circ f$ $\Rightarrow g_1 = g_2$. If $f$ is not surjective, then
this breaks: non-surjectivity means we have some $y$ such that there is no $x$ with $f(x)=y$,
and we can construct $g_1$ and $g_2$ agreeing everywhere except $g_1(y)=z_1\neq z_2=g_2(y)$. So surjectivity is necessary, and
it seems obvious that surjectivity is sufficient. We could have many sections in $X$, but
once we've reduced it to one representative in $Y$ per section, we won't need to recover
the details of each section.
$f$ is {\em monic} or {\em mono} iff $f \circ g_1 = f \circ g_2 \Rightarrow g_1 = g_2$ in:
\Comdiag[Hom]{
Z \arrow[r, shift left,"g_1"] \arrow[r,"g_2"'] \&Y \arrow[r,"f"] \& X
}
Now a failure of injectivity kills the requirement, because if $f(y_1)=f(y_2)$ for
$y_1\neq y_2$, it is easy to construct $g_1(z_1)=y_1$ and $g_2(z_1)=y_2$ and the monic
condition fails. Injectivity is sufficient because $f$ injective means the reachable
elements of $X$ are just relabelings of $Y$.
Monos are limits and epis are colimits, as per the discussion of limits above. The diagram
that is the basis of the cone is $\cdot \rightrightarrows \cdot$, and the pair of cones is
\Comdiag[Hom]{
X \arrow[r,"f"] \&Y \arrow[r, shift left,"g_1"] \arrow[r,"g_2"'] \& Z\\
A \ar[u,dashrightarrow,"j"] \ar[ur, "h"]
}
So if you have some other mapping from $A$ such that $g_1\circ h\cong g_2\circ h$,
there has to be a way to get there through $f$, $g_1\circ f\circ j \cong g_2\circ f\circ
j$. That is, however you sectioned $X$, you'll have to section $Y$ in an analogous way and
can't be either more or less clever about it.
Epis as colimits tell a similar story, with the diagram appropriately reversed.
\subsection{Summary}
Injective:
\items{
\item Every element of the codomain has at most one back-arrow to the domain.
\item Drawing a function as a grid with domain in rows and codomain in columns, there is at most one dot in each column.
\item Those elements in the codomain that are pointed to are effectively re-labels of
the points in the domain.
\item The codomain must have as many or more elements than the domain.
\item All information in domain is retained; information could be lost in codomain.
\item We can write a retraction such that we can get back to the original elements of
$A$, meaning $r\circ f=1_A$, by just following the arrows back.
\item Gives us the monomorphism because an initial relabeling won't affect whether $g_1$ and $g_2$ match or don't.
}
Surjective:
\items{
\item Every element of the codomain has at least one back-arrow to the domain.
\item Drawing a function as a grid, there is at least one dot in each column.
\item The domain may have multiple arrows pointing to the same target, thus splitting the domain
into sections based on target elements.
\item The domain must have as many or more elements than the codomain.
\item All information in codomain is retained; information may be lost from the domain.
\item We can write a section such that $f\circ s = 1_B$, because surjectivity gives us that each element has an arrow, and if $f(a_1)=f(a_2)=b$,
just pick one.
\item The epimorphism condition is met because every element of $Y$ is hit, so $g_1$ and $g_2$ cannot differ on unreached elements.
}
Injective + Surjective = Isomorphism:
\items{
\item Every element of codomain has exactly one back-arrow
\item Codomain elements are unique relabels of domain elements; all codomain labels are used.
\item Domain and codomain are the same size (for most authors, by definition).
\item No information in domain or codomain is lost.
\item We can write $f^{-1}$ such that $f^{-1}\circ f = 1_A$ and $f\circ f^{-1} = 1_B$.
\item The function $f$ is both epic and monic, as no information is lost on either end.
}
\nocite{awodey:category}
\nocite{riehl:category}
\nocite{lawvere:conceptual}
\nocite{leinster:basic}
\bibliographystyle{plainnat}
\bibliography{cat_notes}
\end{document}