Starting with the linear system $A\bar{x} = \bar{b}$ (the setting for the normal equation):
- $\bar{x}$ is the vector of scaling factors corresponding to the columns of $A$; in general it does not live in either the row space or the column space (it lives in $\mathbb{R}^d$)
- HOWEVER, the useful components of $\bar{x}$ that help define $\bar{b}$ do live in the row space (and the lost inputs reside in the (right) null space); see the sketch after this list
- the columns of $A$ represent the directions that span the column space (independent columns form a basis for it)
- $\bar{b} $ lives in the column space (if the system is consistent)
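A minimal NumPy sketch of the row-space/null-space split above, using an arbitrary illustrative matrix: projecting $\bar{x}$ onto the row space keeps exactly the part that contributes to $A\bar{x}$.

```python
import numpy as np

# Illustrative wide matrix: 2 equations, 3 unknowns, so a nontrivial null space
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, -2.0, 0.5])

P_row = np.linalg.pinv(A) @ A   # orthogonal projector onto the row space of A
x_row = P_row @ x               # the "useful" component of x
x_null = x - x_row              # the component lost in the (right) null space

print(np.allclose(A @ x, A @ x_row))  # True: only the row-space part matters
print(np.allclose(A @ x_null, 0))     # True: the null-space part is annihilated
```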
In many cases:
- $\bar{x}$ is some input data (features)
- $\bar{b}$ is some output data (labels or predictions)
- $A $ may not be known
When $A$ is not known, we seek to learn the values of $A$ that map the input data $\bar{x}$ into the space where $\bar{b}$ resides. The structure of $A$ is partially informed by $\bar{b}$, because we seek a consistent system in which $\bar{b}$ resides in the column space of $A$.
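A sketch of this learning setup, assuming synthetic data and hypothetical names (`A_true`, `A_hat`): given many input/output pairs, $A$ can be estimated by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: n pairs (x, b) generated by a hidden map A_true
n, d, m = 100, 3, 2
A_true = rng.standard_normal((m, d))
X = rng.standard_normal((n, d))                        # rows are inputs x
B = X @ A_true.T + 0.01 * rng.standard_normal((n, m))  # rows are noisy outputs b

# Least-squares estimate: solve X @ A^T ≈ B for A
A_hat = np.linalg.lstsq(X, B, rcond=None)[0].T

print(np.allclose(A_hat, A_true, atol=0.1))  # recovered map is close to A_true
```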
- $A$ has an equal number of rows and columns
  - square system
  - each input $\bar{x}$ has a direct and unique correspondence to the output $\bar{b}$
  - exactly one solution exists, provided $A$ is invertible (see the sketch below)
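A minimal sketch of the square case, with an arbitrary invertible $A$:

```python
import numpy as np

# Square, invertible A: exactly one x satisfies A x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

x = np.linalg.solve(A, b)     # the unique solution
print(np.allclose(A @ x, b))  # True
```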
- $A$ has many more rows than columns
  - over-determined system
  - typically inconsistent (unless $\bar{b}$ lies exactly in the column space)
  - a best-fit solution can be reached even when the system is inconsistent, by minimizing the error $\|A\bar{x} - \bar{b}\|$ (least squares; see the sketch below)
  - we have less information in the inputs $\bar{x}$ than is required to match the outputs $\bar{b}$
  - each row of $A$ (each entry of $\bar{b}$) corresponds to an "equation"
  - there are more "equations" aka "constraints" than unknowns in $\bar{x}$
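A sketch of the over-determined case (random illustrative data), showing that the least-squares fit agrees with solving the normal equation $A^{T}A\bar{x} = A^{T}\bar{b}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tall A: 50 equations, 2 unknowns; b will almost surely not be exactly reachable
A = rng.standard_normal((50, 2))
b = rng.standard_normal(50)

# Least squares minimizes ||A x - b||; it is equivalent to solving
# the normal equation A^T A x = A^T b
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

print(np.allclose(x_ls, x_ne))  # True: both give the same best-fit x
```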
- $A$ has many more columns than rows
  - under-determined system
  - there are more unknowns in $\bar{x}$ than "equations" aka "constraints"
  - we have more information in the inputs $\bar{x}$ than is required for the outputs $\bar{b}$
  - potentially an infinite number of solutions, because there are many ways to reach $\bar{b}$ from $\bar{x}$
  - here, we constrain the solution via regularization (see the sketch below)
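A sketch of two common ways to constrain an under-determined system (random illustrative data): the minimum-norm solution via the pseudoinverse, and ridge (Tikhonov) regularization.

```python
import numpy as np

rng = np.random.default_rng(2)

# Wide A: 2 equations, 5 unknowns; infinitely many exact solutions (if consistent)
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)

# One way to pick a single solution: the minimum-norm solution via the pseudoinverse
x_min = np.linalg.pinv(A) @ b

# Another: ridge (Tikhonov) regularization, x = A^T (A A^T + lam I)^{-1} b
lam = 1e-2
x_ridge = A.T @ np.linalg.solve(A @ A.T + lam * np.eye(2), b)

print(np.allclose(A @ x_min, b))                         # exact fit, smallest norm
print(np.linalg.norm(x_ridge) <= np.linalg.norm(x_min))  # ridge shrinks x
```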