From fb07f6985a35f6df7214c36445c050d73fa5a29e Mon Sep 17 00:00:00 2001
From: eugenekim
Date: Sun, 26 Nov 2023 23:57:10 +0900
Subject: [PATCH] f

---
 .gitignore                    |   1 +
 content/math/2023-11-21-la.md | 109 ++++++++++++++++------------------
 2 files changed, 51 insertions(+), 59 deletions(-)

diff --git a/.gitignore b/.gitignore
index f823311..bb1002a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,4 @@
 tmp/
 public/
 resources/
+static/ltximg

diff --git a/content/math/2023-11-21-la.md b/content/math/2023-11-21-la.md
index bfc7625..baf4736 100644
--- a/content/math/2023-11-21-la.md
+++ b/content/math/2023-11-21-la.md
@@ -1,8 +1,8 @@
 +++
 title = "Matrix methods in data analysis"
-author = ["littlehome"]
+author = ["eugenekim"]
 date = 2023-11-24
-lastmod = 2023-11-25T00:11:11+09:00
+lastmod = 2023-11-26T23:52:49+09:00
 draft = false
 +++
@@ -32,18 +32,19 @@ A = \\begin{bmatrix}
 \\end{bmatrix}
 \\]

-\\begin{aligned}
-A = CR =
-\\begin{bmatrix}
-2 & 3 \\\\
+\\[ \\begin{aligned}
+A = CR =\\begin{bmatrix}
+2 & 1 \\\\
 3 & 1 \\\\
 5 & 7 \\\\
-\\end{bmatrix}
-\\begin{bmatrix}
+\\end{bmatrix}\\begin{bmatrix}
 1 & 0 & 1 \\\\
 0 & 1 & 1 \\\\
 \\end{bmatrix}
-\\end{aligned}
+\\end{aligned} \\]
+
+- each column of \\( A \\) is a linear combination of the columns of \\( C \\)
+- each row of \\( A \\) is a linear combination of the rows of \\( R \\)


 ### Col rank = Row Rank {#col-rank-row-rank}
@@ -67,83 +68,67 @@ The linear combinations are the \\( R \\)

 ## LU factorization {#lu-factorization}

-\\begin{aligned}
-A =
-\\begin{bmatrix}
+\\[ \\begin{aligned}
+A = \\begin{bmatrix}
 2 & 3 \\\\
 4 & 7 \\\\
-\\end{bmatrix}
-\\text{ With elimination, we get: -> }
-\\begin{bmatrix}
+\\end{bmatrix} \\text{ With elimination, we get: -> } \\begin{bmatrix}
 2 & 3 \\\\
 0 & 1 \\\\
 \\end{bmatrix}
-\\end{aligned}
+\\end{aligned} \\]

 What we have done is the LU factorization.


 ### as LU factorization {#as-lu-factorization}

-\\begin{aligned}
-A =
-\\begin{bmatrix}
+\\[ \\begin{aligned}
+A = \\begin{bmatrix}
 1 & 0 \\\\
 2 & 1 \\\\
-\\end{bmatrix}
-\\begin{bmatrix}
+\\end{bmatrix} \\begin{bmatrix}
 2 & 3 \\\\
 0 & 1 \\\\
 \\end{bmatrix}
-\\end{aligned}
+\\end{aligned} \\]


 ### LU factorization can be viewed as a sum of rank-1 matrices {#lu-factorization-can-be-viewed-as-sum-of-rank-1-matrixs}

-\\begin{aligned}
-A &=
-\\begin{bmatrix}
+\\[ \\begin{aligned}
+A &= \\begin{bmatrix}
 1 & 0 \\\\
 2 & 1 \\\\
-\\end{bmatrix}
-\\begin{bmatrix}
+\\end{bmatrix} \\begin{bmatrix}
 2 & 3 \\\\
 0 & 1 \\\\
 \\end{bmatrix} \\\\
-&=
-\\begin{bmatrix}
+&= \\begin{bmatrix}
 1 \\\\
 2 \\\\
-\\end{bmatrix}
-\\begin{bmatrix}
+\\end{bmatrix} \\begin{bmatrix}
 2 & 3 \\\\
-\\end{bmatrix}
- +
-\\begin{bmatrix}
+\\end{bmatrix} + \\begin{bmatrix}
 0 \\\\
 1 \\\\
-\\end{bmatrix}
-\\begin{bmatrix}
+\\end{bmatrix} \\begin{bmatrix}
 0 & 1 \\\\
 \\end{bmatrix} \\\\
-&=
-\\begin{bmatrix}
+&= \\begin{bmatrix}
 2 & 3 \\\\
 4 & 6 \\\\
-\\end{bmatrix}
-i+
-\\begin{bmatrix}
+\\end{bmatrix} + \\begin{bmatrix}
 0 & 0 \\\\
 0 & 1 \\\\
-\\end{bmatrix} =
-\\begin{bmatrix}
+\\end{bmatrix} = \\begin{bmatrix}
 col_1 row_1 \\\\
-\\end{bmatrix} +
-\\begin{bmatrix}
+\\end{bmatrix} + \\begin{bmatrix}
 0 & 0 \\\\
 0 & A_2 \\\\
 \\end{bmatrix}
 \\end{aligned}
+\\]

 Where \\(A_2\\) is the remaining entries of \\(A\\)
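
A minimal `numpy` sketch of the rank-1 view above (an added illustration, not part of the notes; the matrix and the \\(L\\), \\(U\\) factors are the ones from the elimination example):

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, 7.0]])

# L and U from the elimination worked out above
L = np.array([[1.0, 0.0],
              [2.0, 1.0]])
U = np.array([[2.0, 3.0],
              [0.0, 1.0]])

# LU as a sum of rank-1 pieces: (column k of L) times (row k of U)
rank1_sum = sum(np.outer(L[:, k], U[k, :]) for k in range(L.shape[1]))

assert np.allclose(L @ U, A)
assert np.allclose(rank1_sum, A)
print(rank1_sum)  # [[2. 3.] [4. 7.]]
```

The first outer product is exactly the `col_1 row_1` block from the derivation, and the second one carries the remaining \\(A_2\\) entry.
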
@@ -295,7 +280,7 @@ Property 4 is about matrix multiplication, how it should behave when our linear
 The matrix norm is about how much the linear map stretches vectors.


-#### QUESTION So here we have a property (of matrix) which gives us different answer when we think a unified transformation or separate sequential transformation. {#so-here-we-have-a-property--of-matrix--which-gives-us-different-answer-when-we-think-a-unified-transformation-or-separate-sequential-transformation-dot}
+#### QUESTION So here we have a property (of a matrix) which gives a different answer depending on whether we view it as one unified transformation or as separate sequential transformations. {#question-so-here-we-have-a-property--of-matrix--which-gives-us-different-answer-when-we-think-a-unified-transformation-or-separate-sequential-transformation-dot}

 It's a weird property.
 Which of the norms (L1, L2 and so on) have the \\( = \\) equality property?
@@ -306,7 +291,7 @@ Which of the norms (L1, L2 and so on) have the \\( = \\) equality property?

 There are several different ways to define a matrix norm, but some of the most common include:

-#### ****Induced (or Operator) Norm****: Defined based on vector norms. For a given vector norm \\( \\Vert \\cdot \\Vert_v \\), the corresponding induced matrix norm is defined as {#induced--or-operator--norm-defined-based-on-vector-norms-dot-for-a-given-vector-norm--vert-cdot-vert-v--the-corresponding-induced-matrix-norm-is-defined-as}
+#### ****Induced (or Operator) Norm****: {#induced--or-operator--norm}

 \\[ \\Vert A \\Vert = \\max_{{x} \\neq 0} \\frac{\\Vert A x \\Vert_v}{\\Vert x \\Vert_v} \\]

@@ -321,27 +306,31 @@ It happens to be \\( x \\) is the \\( v_1 \\), the first eigenvector of \\( A^T A \\)

 let's prove it


-#### ****Frobenius Norm****: Defined as {#frobenius-norm-defined-as}
+#### ****Frobenius Norm****: {#frobenius-norm}

 \\[ \\Vert A \\Vert_F = \\sqrt{\\sum_{i,j} |a_{ij}|^2} \\]

 where \\( a_{ij} \\) are the elements of \\( A \\). This is analogous to the Euclidean norm for vectors.


-#### ****Nuclear Norm****: This is the maximum absolute value of the elements of the matrix. {#nuclear-norm-this-is-the-maximum-absolute-value-of-the-elements-of-the-matrix-dot}
+#### ****Nuclear Norm****: {#nuclear-norm}
+
+\\[ \\Vert A \\Vert = \\sigma_1 + \\sigma_2 + \\cdots + \\sigma_r \\text{ (the sum of the singular values)} \\]

 Netflix competition winner used this norm.
 There's a hypothesis that deep learning finds the nuclear norm.

-- QUESTION Doesn't deep learning's loss function specify the which norm to use? what does he mean when he hypothesize?
+- QUESTION Doesn't deep learning's loss function specify which norm to use? What does he mean by this hypothesis?

 The important distinction is that we have many more possible solutions, because # of parameters > # of data


 ### Eckart-Young {#eckart-young}

-Given a matrix A, there's a unique matrix B with rank \\(r\\) such that \\( \\Vert A-C \\Vert \\geq \\Vert A-B \\Vert \\) for any mattrix C with rank \\( r \\)
+Given a matrix \\(A\\), there's a unique matrix \\(B\\) with rank \\(r\\) such that \\( \\Vert A-C \\Vert \\geq \\Vert A-B \\Vert \\) for any matrix \\(C\\) with rank \\( r \\)
+
+\\( B \\) is the sum of the first \\(r\\) rank-1 pieces of the SVD.


 #### proof {#proof}
@@ -350,7 +339,7 @@ Given a matrix A, there's a unique matrix B with rank \\(r\\) such that \\( \\Ve



-- QUESTION why is he looking at \\( A-CR\\)?
+- QUESTION Why is he looking at \\( A-CR \\)?

   He must be assuming any matrix can be represented by \\(CR\\)?

@@ -358,10 +347,10 @@
 ### PCA {#pca}

 Pick the first \\( r \\) of the \\( v \\)s to best represent the data, after singular value decomposition.
-Eckart-Young says they are the best.
+Eckart-Young says they are the best, in the sense that no other choice of \\( r \\) vectors gives a smaller error.


-#### QUESTION What is the different from MSE method? {#what-is-the-different-from-mse-method}
+#### QUESTION What is the difference from the MSE method? {#question-what-is-the-different-from-mse-method}

 (ANSWER) PCA measures the orthogonal distance, and MSE measures the distance using the space's basis.
 But I can't see why.
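
A small `numpy` sketch of the Eckart-Young statement (my own illustration; the random test matrix and the choice \\( r = 2 \\) are arbitrary, not from the lecture). \\( B \\), built from the first \\( r \\) rank-1 pieces of the SVD, should never lose to any other rank-\\( r \\) matrix \\( C \\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
r = 2

# B = first r rank-1 pieces of the SVD (the claimed best rank-r approximation)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# C = some other rank-r matrix, here just a random one
C = rng.standard_normal((6, r)) @ rng.standard_normal((r, 5))

err_B = np.linalg.norm(A - B)  # Frobenius norm by default
err_C = np.linalg.norm(A - C)
print(err_B, err_C)
assert err_B <= err_C
```
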
@@ -407,7 +396,9 @@ We find the closest vector in the column space to the \\( b \\): by projecting b

 The vector \\(\\mathbf{A}\\hat{\\mathbf{x}}\\) is the projection of \\(\\mathbf{b}\\) onto Col(\\(\\mathbf{A}\\)). The difference \\(\\mathbf{b} - \\mathbf{A}\\hat{\\mathbf{x}}\\) is orthogonal to every column of \\(\\mathbf{A}\\), which leads to the equation \\( \\mathbf{A}^{\\mathsf{T}}(\\mathbf{b} - \\mathbf{A}\\hat{\\mathbf{x}}) = \\mathbf{0} \\)


-#### QUESTION So MSE (when compared to PCA), seemed to be using a direction which has no 'perpendicular' sense to it, but apparently it is, so can you visualize PCA which direction it's trying to minimize? {#so-mse--when-compared-to-pca--seemed-to-be-using-a-direction-which-has-no-perpendicular-sense-to-it-but-apparently-it-is-so-can-you-visualize-pca-which-direction-it-s-trying-to-minimize}
+#### QUESTION MSE and PCA? {#question-mse-and-pca}
+
+MSE (when compared to PCA) seemed to be using a direction which has no 'perpendicular' sense to it, but apparently it does; can you visualize which direction PCA is trying to minimize?


 #### When does it work? {#when-does-it-work}
@@ -440,7 +431,7 @@ Why? can be viewed in multiple ways



-- QUESTION Think of \\( A^\* \\)
+- QUESTION Think of \\( A^\* \\)


 #### can we not use pseudo inverse formula even when \\( A \\) is invertible? {#can-we-not-use-pseudo-inverse-formula-even-when--a--is-invertible}
@@ -452,7 +443,7 @@
 I guess yes we can use it.

 But, does that also mean that the pseudo inverse generates the L_2 minimum solution when \\( A \\) is not invertible?

-#### QUESTION I.e. Does pseudo inverse gives \\( \\min_x \\|b - Ax\\|_2^2\\) ? {#i-dot-e-dot-does-pseudo-inverse-gives--min-x-b-ax-2-2}
+#### QUESTION I.e. Does the pseudo inverse give \\( \\min_x \\|b - Ax\\|_2^2 \\)? {#question-i-dot-e-dot-does-pseudo-inverse-gives--min-x-b-ax-2-2}

 I think it does.
@@ -538,7 +529,7 @@ Think 1d case where \\( A = [ \\sigma ] \\)



-- QUESTION So what's the point of the statement ?
+- QUESTION So what's the point of the statement?

   Does it have any practical value? Do we decrease \\( \\delta \\) and see if we can pluck out some variables?

@@ -561,7 +552,7 @@ When matrix is 1000 ish big..



- - QUESTION why is it a good basis?
+ - QUESTION Why is it a good basis?

   Maybe because we are using \\(A\\) to form the basis, but I can't tell.
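
A `numpy` sketch for the pseudo inverse questions above (an added check, not from the lecture; the random matrices are arbitrary). In the overdetermined case the normal equations and the pseudo inverse agree; in the underdetermined case the pseudo inverse picks the minimum \\( L_2 \\) norm solution among all exact solutions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined: Ax = b has no exact solution, so minimize ||b - Ax||_2
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x_hat = np.linalg.solve(A.T @ A, A.T @ b)  # normal equations A^T A x = A^T b
x_pinv = np.linalg.pinv(A) @ b             # pseudo inverse gives the same x here
assert np.allclose(x_hat, x_pinv)
# the residual b - A x_hat is orthogonal to every column of A
assert np.allclose(A.T @ (b - A @ x_hat), 0)

# Underdetermined: many exact solutions; pinv returns the minimum-norm one
A2 = rng.standard_normal((3, 8))
b2 = rng.standard_normal(3)
x_min = np.linalg.pinv(A2) @ b2
v = rng.standard_normal(8)
v_null = v - np.linalg.pinv(A2) @ (A2 @ v)  # component of v in the null space of A2
x_other = x_min + v_null                    # another exact solution
assert np.allclose(A2 @ x_other, b2)
assert np.linalg.norm(x_min) <= np.linalg.norm(x_other)
```
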