FIZ353 - Numerical Analysis | 06/11/2020
Emre S. Tasci emre.tasci@hacettepe.edu.tr
Any set of linear relations involving variables can be represented in different formats and, most of the time, can be solved via a number of methods. Before we delve into some of these methods, a little revision might be useful.
Any equation that contains at most the 1st power of its variables is called a linear equation (e.g., $3x + 5y = 1$ is a linear equation whereas $3x^2 + 5\sqrt y = 1$ isn't; it is a non-linear equation). If we have more than one linear equation, then we are working on a linear system. Depending on the number of independent equations ($n_e$) and the number of variables ($n_v$), our linear system will belong to one of these three classes: underdetermined ($n_e < n_v$: infinitely many solutions, in general), exactly determined ($n_e = n_v$: a unique solution, provided the equations are independent) or overdetermined ($n_e > n_v$: no exact solution, in general).
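Which class a given system belongs to can also be checked numerically by comparing matrix ranks (the Rouché–Capelli criterion). A minimal sketch, assuming NumPy; the `classify` helper below is our own, not a library function:

```python
import numpy as np

def classify(A, b):
    """Classify the linear system A x = b via ranks (Rouche-Capelli)."""
    Ab = np.hstack((A, b))   # augmented matrix [A|b]
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(Ab)
    n_v = A.shape[1]
    if rA < rAb:
        return "inconsistent (no solution)"
    return "unique solution" if rA == n_v else "infinitely many solutions"

A = np.array([[10., 2., -1.], [-3., -5., 2.], [1., 1., 6.]])
b = np.array([[27.], [-61.5], [-21.5]])
print(classify(A, b))   # the case study below has a unique solution
```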
Usually, the physical problem at hand will present itself as a set of variables collected in a number of equations. Before proceeding to solve them, one must rearrange them so that the variables appear in the same order in each equation and the numerical value is written on the right-hand side, e.g.:
\begin{align} x - 3 &= y\\ y - 2x + 2 &= 11 \end{align}can be rearranged into:
\begin{align} x - y &= 3\\ -2x + y &= 9 \end{align}The most popular method for solving a linear system is Gaussian elimination: one equation is picked, multiplied by a suitable scalar so that its coefficient cancels that of the variable to be eliminated in another equation, and then subtracted from that equation. Take this linear system:
\begin{align} 10x_1 + 2x_2 - x_3 &= 27\quad\quad[1]\\ -3x_1 - 5x_2 + 2x_3 &= -61.5\quad[2]\\ x_1 + x_2 + 6x_3 &= -21.5\quad[3] \end{align}To eliminate $x_1$ in the 2nd equation, we multiply the 1st equation by $-3/10$ and subtract it from the 2nd. (Since an equation, in essence, has the same meaning as "2 = 2" or "5 = 5", adding or subtracting it affects both sides of the target equation equally; even though the result is a different equation, the equality still holds, e.g., "4 = 4" + "5 = 5" => "9 = 9", still valid.) We do a similar thing to eliminate $x_1$ in the 3rd equation as well (multiply the 1st by $1/10$ and subtract it from the 3rd).
After this modification, the equations become: \begin{align} [1]:\quad10x_1 + 2x_2 - x_3 &= 27\\ \left[[2] - \left(-{\tfrac{3} {10}}\right)\cdot[1]\right]:\quad 0 - 4.4x_2 + 1.7x_3&=-53.4\\ \left[[3] - \left({\tfrac{1} {10}}\right)\cdot[1]\right]:\quad 0 + 0.8x_2 + 6.1x_3 &= -24.2 \end{align}
Now we will use the 2nd equation to eliminate $x_2$ in the 3rd equation. If we multiply the 2nd equation by $\left(\frac{0.8}{-4.4}\right)$ and subtract it from the 3rd equation:
\begin{align} [1]:\quad10x_1 + 2x_2 - x_3 &= 27\\ [2]:\quad 0 - 4.4x_2 + 1.7x_3&=-53.4\\ \left[[3] - \left(-{\tfrac{0.8} {4.4}}\right)\cdot[2]\right]:\quad 0 + 0 + 6.41x_3 &= -33.91 \end{align}The 3rd equation only contains $x_3$ so we can immediately calculate it:
$x_3 = \frac{-33.91}{6.41} = -5.29$
Now that we know $x_3$, substituting this value into the 2nd equation, we can find $x_2$:
$-4.4x_2 + 1.7\cdot(-5.29) = -53.4 \Rightarrow x_2 = \frac{-53.4 + 8.99}{(-4.4)} = 10.09$
and finally, substituting the newly found values of $x_2$ and $x_3$ into the 1st equation:
$10x_1 + 2\cdot(10.09) - (-5.29) = 27 \Rightarrow x_1 = 0.15$
Going downward, eliminating one of the variables from the equations below on each pass, is called forward elimination. Once we reached the bottom, we had a direct equality for a single variable, and we then worked our way back to the top, substituting the values found so far to obtain the remaining variables. This recovery of all the unknowns is called backward substitution.
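The whole procedure can be written as a short routine. This is a minimal sketch (without pivoting, assuming no zero ever lands on the diagonal); `gauss_solve` is our own helper name:

```python
import numpy as np

def gauss_solve(A, b):
    """Naive Gaussian elimination + backward substitution (no pivoting)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # forward elimination
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = A[i, k] / A[k, k]
            A[i, k:] -= factor * A[k, k:]
            b[i] -= factor * b[k]
    # backward substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(A[i, i+1:], x[i+1:])) / A[i, i]
    return x

A = np.array([[10., 2., -1.], [-3., -5., 2.], [1., 1., 6.]])
b = np.array([27., -61.5, -21.5])
x = gauss_solve(A, b)
print(x)   # approx. [0.1525, 10.0922, -5.2908]
```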
We can speed up the analysis by representing our linear system as a matrix multiplication. Continuing with the same case, i.e.,\begin{align} 10x_1 + 2x_2 - x_3 &= 27\\ -3x_1 - 5x_2 + 2x_3 &= -61.5\\ x_1 + x_2 + 6x_3 &= -21.5 \end{align}
we can represent it, without any loss of information, as:
$$\begin{bmatrix} 10 & 2 & -1\\ -3 & -5 & 2\\ 1 & 1 & 6 \end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\begin{bmatrix}27\\-61.5\\-21.5\end{bmatrix}$$Therefore it is now a matrix multiplication, i.e., $A\cdot x=b$, where:
$$A=\begin{bmatrix} 10 & 2 & -1\\ -3 & -5 & 2\\ 1 & 1 & 6 \end{bmatrix} \quad x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} \quad b=\begin{bmatrix}27\\-61.5\\-21.5\end{bmatrix}$$
import numpy as np
A = np.array([[10., 2., -1.],[-3., -5., 2.],[1., 1., 6.]])
print("A:\n",A,"\n")
b = np.array([[27.],[-61.5],[-21.5]])
print("b:\n",b,"\n")
# Define a new matrix to store the modified values
# (copy, so that the original A is left intact):
A2 = A.copy()
# Multiply the first row by (-3/10) and subtract it from the second:
A2[1,:] = A[1,:] - (-3/10)*A[0,:]
# Multiply the first row by (1/10) and subtract it from the third:
A2[2,:] = A[2,:] - (1/10)*A[0,:]
# Here's the current situation:
print("A2:\n",A2,"\n")
# We need to apply the same operations to the right hand side
# of the equation, as well:
b2 = b.copy()
b2[1] -= (-3./10.)*b2[0]
b2[2] -= ( 1./10.)*b2[0]
print("b2:\n",b2,"\n")
# In the next iteration, we multiply the second row by -0.8/4.4
# and subtract it from the third to eliminate x_2:
A2[2,:] -= (-0.8/4.4)*A2[1,:]
b2[2] -= (-0.8/4.4)*b2[1]
print("A2:\n",A2,"\n")
print("b2:\n",b2,"\n")
# Now we can back-substitute (using the eliminated A2, not A):
x3 = b2[2] / A2[2,2]
x2 = (b2[1] - A2[1,2]*x3) / A2[1,1]
x1 = (b2[0] - A2[0,2]*x3 - A2[0,1]*x2) / A2[0,0]
print ("x1: %.5f\tx2: %.5f\tx3: %.5f\n"%(x1,x2,x3))
Modifying the rows of both A and b at every step is a little tedious, so for practical purposes we can "merge" the two into an augmented matrix:
$$\left[ \begin{array}{ccc|c} 10 & 2 & -1 & 27\\ -3 & -5 & 2 & -61.5\\ 1 & 1 & 6 & -21.5 \end{array} \right]$$and take it from there all at once:
import numpy as np
AA = np.array([[10., 2., -1., 27.],[-3., -5., 2., -61.5],[1., 1., 6.,-21.5]])
print("Initial AA:\n",AA,"\n")
AA[1,:] -= -3./10.*AA[0,:]
AA[2,:] -= 1./10.*AA[0,:]
AA[2,:] -= -0.8/4.4*AA[1,:]
print("Eliminated AA:\n",AA,"\n")
From precision's point of view, it's a good strategy to reorder the rows such that each diagonal element is the one with the greatest absolute value in its column.
To pivot our case example, let's assume this time that we had started with the rows interchanged: $$\left[ \begin{array}{ccc|c} 1 & 1 & 6 & -21.5\\ -3 & -5 & 2 & -61.5\\ 10 & 2 & -1 & 27 \end{array} \right]$$
After pivoting (moving the row with the largest leading coefficient, 10, to the top) and carrying out the elimination as before, we obtain:
$$\left[ \begin{array}{ccc|c} 10 & 2 & -1 & 27\\ 0 & -4.4 & 1.7 & -53.4\\ 0 & 0 & 6.41 & -33.9 \end{array} \right]$$...and we have arrived at the same (and optimal) ordering that we had taken for granted in the previous section.
But what can change by simply reordering the equations before proceeding with the elimination? To illustrate the importance of pivoting, consider the following case:
Example Use Gauss elimination with and without pivoting to solve the following linear system:
$$\begin{align} 0.0003x_1 + 3.0000x_2 &= 2.0001\\ 1.0000x_1 + 1.0000x_2 &= 1.0000 \end{align}$$Without pivoting: To eliminate $x_1$ in the second equation, we need to multiply the 1st equation by $1.0000/0.0003$ and then subtract it:
$$1.0000/0.0003 \times \text{[1st equation]: } x_1 + 10000x_2 = 6667$$subtract this from the 2nd equation:
$$-9999x_2 = -6666\\ x_2 = \frac{6666}{9999} = 2/3$$Substitute this value into the 1st equation: $$\Rightarrow x_1 = \frac{2.0001 - 3x_2}{0.0003} = \frac{2.0001 - 3(2/3)}{0.0003}$$
The obtained values for $x_1$ will be dependent on the number of significant figures taken for $x_2 = 2/3$:
x1_true = 1/3
print("#SF\tx2\t\t x1\t\tErr%")
for sigfig_no in range(3,8):
    formatstr = "%."+"%d"%sigfig_no+"f"
    numstr = formatstr%(2/3)
    x2 = float(numstr)
    x1 = (2.0001 - 3*x2) / 0.0003
    abs_per_rel_err_x1 = np.abs(x1_true-x1) / (x1_true) * 100
    print("%2d\t%.9f\t%12.9f\t%7.2f"%(sigfig_no,x2,x1,abs_per_rel_err_x1))
With pivoting: This time, let's proceed with pivoting first. In the first column, the second row has the greatest value (1 > 0.0003), so let's rearrange:
$$\begin{align} 1.0000x_1 + 1.0000x_2 &= 1.0000\\ 0.0003x_1 + 3.0000x_2 &= 2.0001 \end{align}$$Solving once again, we multiply the 1st equation by 0.0003 and subtract it from the 2nd:
$$0.0003 \times \text{[1st equation]: } 0.0003x_1 + 0.0003x_2 = 0.0003$$subtract this from the 2nd equation:
$$2.9997x_2 = 1.9998\\ x_2 = \frac{1.9998}{2.9997} = 2/3$$We still get the same value for $x_2$; however, this time, the equation into which we'll substitute this value is "a little bit" different from the previous one:
Substitute this value into the 1st equation: $$\Rightarrow x_1 = 1.0000 - x_2 = 1 - (2/3)$$
The obtained values for $x_1$ will still be dependent on the number of significant figures taken for $x_2 = 2/3$, but let's see what happens this time:
x1_true = 1/3
print("#SF\tx2\t\t x1\t\t Err%")
for sigfig_no in range(3,8):
    formatstr = "%."+"%d"%sigfig_no+"f"
    numstr = formatstr%(2/3)
    x2 = float(numstr)
    x1 = (1 - x2)
    abs_per_rel_err_x1 = np.abs(x1_true-x1) / (x1_true) * 100
    print("%2d\t%.9f\t%12.9f\t%7.2f"%(sigfig_no,x2,x1,abs_per_rel_err_x1))
Thus, the precision has been greatly increased!
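The pivoting strategy can be folded into a general elimination routine: at each column, swap in the row holding the largest absolute coefficient before eliminating. A minimal sketch (our own `gauss_solve_pivot` helper, assuming a non-singular A):

```python
import numpy as np

def gauss_solve_pivot(A, b):
    """Gaussian elimination with partial (row) pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        # pick the row with the largest |coefficient| in column k...
        p = k + np.argmax(np.abs(A[k:, k]))
        # ...and swap it into the pivot position:
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            factor = A[i, k] / A[k, k]
            A[i, k:] -= factor * A[k, k:]
            b[i] -= factor * b[k]
    # backward substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(A[i, i+1:], x[i+1:])) / A[i, i]
    return x

A = np.array([[0.0003, 3.0000], [1.0000, 1.0000]])
b = np.array([2.0001, 1.0000])
print(gauss_solve_pivot(A, b))   # close to [1/3, 2/3]
```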
Suppose that in the $A\cdot x=b$ linear system, $A$ remains fixed but $b$ keeps changing. This can be a situation where the medium changes but the equipment remains the same, e.g., measuring the magnetic field in different parts of a room. To avoid redoing all the forward elimination & backward substitution every time $b$ changes, we apply the LU decomposition, introduced to numerical analysis by none other than Alan Turing himself! In other words, we decompose our A matrix into the product of two other matrices L and U, such that L is a lower triangular matrix and U is an upper triangular matrix:
$$ A = L \cdot U\\ L = \begin{bmatrix} 1 & 0 & 0\\ l_{21} & 1 & 0\\ l_{31} & l_{32} & 1 \end{bmatrix},\quad U = \begin{bmatrix} u_{11} & u_{12} & u_{13}\\ 0 & u_{22} & u_{23}\\ 0 & 0 & u_{33} \end{bmatrix} $$When we were applying the Gauss elimination method, we were already working on this decomposition. Remember the coefficients we multiplied the rows by before subtracting them from the other rows? Those were the elements of our L matrix, and the transformed A was the U matrix!
In our case study, we had: \begin{align}10x_1 + 2x_2 - x_3 &= 27\\ -3x_1 - 5x_2 + 2x_3 &= -61.5\\ x_1 + x_2 + 6x_3 &= -21.5 \end{align}
$$\begin{bmatrix} 10 & 2 & -1\\ -3 & -5 & 2\\ 1 & 1 & 6 \end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\begin{bmatrix}27\\-61.5\\-21.5\end{bmatrix}$$The multipliers we used in the elimination were $-3/10$ (clearing $x_1$ from the 2nd row), $1/10$ (clearing $x_1$ from the 3rd row) and $-0.8/4.4$ (clearing $x_2$ from the 3rd row).
These coefficients yield the L matrix: $$l_{21} = -3/10 = -0.3\\ l_{31} = 1/10 = 0.1\quad l_{32} = -0.8 / 4.4 = -0.18\\\\ L = \begin{bmatrix} 1 & 0 & 0\\ -3/10 & 1 & 0\\ 1/10 & -0.8/4.4 & 1 \end{bmatrix} $$
After the eliminations we had A transformed into: $$\begin{bmatrix} 10 & 2 & -1\\ 0 & -4.4 & 1.7\\ 0 & 0 & 6.41 \end{bmatrix}$$
which is actually our U matrix. Let's test if their multiplication indeed yields A:
import numpy as np
L = np.array([[1., 0., 0.],[-3./10., 1., 0.],[1./10.,-0.8/4.4,1.]])
U = np.array([[10., 2., -1.],[0.,-4.4, 1.7],[0.,0.,6.41]])
LU = np.dot(L,U)
print("LU:\n",LU,"\n")
Thus we have verified that $A = L\cdot U$
To proceed, we first solve $L\cdot z = b$ where $z$ is a column vector of intermediate unknowns. Continuing with our case study: $$ \begin{bmatrix} 1 & 0 & 0\\ l_{21} & 1 & 0\\ l_{31} & l_{32} & 1 \end{bmatrix}\begin{bmatrix}z_1\\z_2\\z_3\end{bmatrix}=\begin{bmatrix}27\\-61.5\\-21.5\end{bmatrix}$$
$z_1$ is directly available, and forward substituting, we also find $z_2$ and $z_3$:
$$\begin{bmatrix}z_1\\z_2\\z_3\end{bmatrix} = \begin{bmatrix}27\\-53.4\\-33.91\end{bmatrix}$$Now we solve the $U\cdot x = z$ system: $$\begin{bmatrix} 10 & 2 & -1\\ 0 & -4.4 & 1.7\\ 0 & 0 & 6.41 \end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\begin{bmatrix}z_1\\z_2\\z_3\end{bmatrix}$$
This time, $x_3$ is readily available, and backward substituting, we derive $x_2$ and $x_1$:
$$\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\begin{bmatrix}0.15\\10.09\\-5.29\end{bmatrix}$$Now that we have decomposed our system, even if the results ($b$) change, all we have to do is direct backward/forward substitution!
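As a sketch of that payoff, we can reuse the L and U from above on a brand-new right-hand side (the `b_new` vector below is made up purely for illustration):

```python
import numpy as np

# L and U from the case study (U's last entry kept as the rounded 6.41):
L = np.array([[1., 0., 0.], [-3./10., 1., 0.], [1./10., -0.8/4.4, 1.]])
U = np.array([[10., 2., -1.], [0., -4.4, 1.7], [0., 0., 6.41]])

def solve_via_LU(L, U, b):
    n = len(b)
    # forward substitution: L z = b (diagonal of L is 1)
    z = np.zeros(n)
    for i in range(n):
        z[i] = b[i] - np.dot(L[i, :i], z[:i])
    # backward substitution: U x = z
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - np.dot(U[i, i+1:], x[i+1:])) / U[i, i]
    return x

b_new = np.array([5., -10., 3.])     # hypothetical new measurement
x = solve_via_LU(L, U, b_new)
A = np.dot(L, U)
print(np.allclose(np.dot(A, x), b_new))   # True
```

No elimination was repeated: only the two cheap substitution passes.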
The inverse of a matrix $A$ can be calculated in various ways.
The systematic way
Example: Calculate the inverse of $A=\begin{bmatrix} 10 & 2 & -1\\ -3 & -5 & 2\\ 1 & 1 & 6 \end{bmatrix}$
import numpy as np
def minor(arr,i,j):
    # ith row, jth column removed
    # Source: https://stackoverflow.com/a/3858333
    return arr[np.array(list(range(i))+list(range(i+1,arr.shape[0])))[:,np.newaxis],
               np.array(list(range(j))+list(range(j+1,arr.shape[1])))]
A = np.mat([[10., 2., -1.],[-3., -5., 2.],[1., 1., 6.]])
print("A:\n",A,"\n")
# Cofactor matrix (same shape as A):
C = np.zeros_like(A)
for i in range(0,3):
    for j in range(0,3):
        C[i,j] = (-1)**(i+j)*np.linalg.det(minor(A,i,j))
print("C:\n",C,"\n")
CT = C.H
print("C^*T:\n",CT,"\n")
detA = np.linalg.det(A)
print("detA: ",detA,"\n")
# or det is also:
#print(np.dot(A[0,:],np.squeeze(np.asarray(C[0,:]))))
invA = CT / detA
print("invA:\n",invA,"\n")
As we now have $A^{-1}$, we can directly multiply from left of both sides of the $A\cdot x = b$ equation to get: \begin{align}A^{-1} (A\cdot x) = (A^{-1}A)x &=A^{-1}b\\ x &= A^{-1}b\end{align}
# Continuing from the previous code where invA was calculated
print("invA:\n",invA,"\n")
b = np.array([[27.],[-61.5],[-21.5]])
print("b\n",b,"\n")
x = invA * b
print(x)
In this method (Cramer's rule), we substitute the right-hand-side column vector $b$ into the relevant column of the coefficient matrix $A$ to form the $D_i$ matrix. The unknown $x_i$ is then found by calculating: $$x_i = \frac{\det (D_i)}{\det (A)}$$
$$\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}$$$$x_1 = \frac{\det\begin{bmatrix} y_1 & a_{12} & a_{13}\\ y_2 & a_{22} & a_{23}\\ y_3 & a_{32} & a_{33} \end{bmatrix} }{\det \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}} ,\quad x_2 = \frac{\det\begin{bmatrix} a_{11} & y_1 & a_{13}\\ a_{21} & y_2 & a_{23}\\ a_{31} & y_3 & a_{33} \end{bmatrix} }{\det \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}},\quad ... $$This method is used when one needs to find not all the unknowns, but some of them.
Example \begin{align} 10x_1 + 2x_2 - x_3 &= 27\\ -3x_1 - 5x_2 + 2x_3 &= -61.5\\ x_1 + x_2 + 6x_3 &= -21.5 \end{align}
$$A=\begin{bmatrix} 10 & 2 & -1\\ -3 & -5 & 2\\ 1 & 1 & 6 \end{bmatrix} \quad x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} \quad b=\begin{bmatrix}27\\-61.5\\-21.5\end{bmatrix}$$
import numpy as np
A = np.array([[10., 2., -1.],[-3., -5., 2.],[1., 1., 6.]])
print("A:\n",A,"\n")
b = np.array([[27.],[-61.5],[-21.5]])
print("b:\n",b,"\n")
D1 = np.copy(A)
D2 = np.copy(A)
D3 = np.copy(A)
D1[:,0] = np.transpose(b)
print("D1:\n",D1,"\n")
det_D1 = np.linalg.det(D1)
D2[:,1] = np.transpose(b)
print("D2:\n",D2,"\n")
det_D2 = np.linalg.det(D2)
D3[:,2] = np.transpose(b)
print("D3:\n",D3,"\n")
det_D3 = np.linalg.det(D3)
det_A = np.linalg.det(A)
x1 = det_D1 / det_A
x2 = det_D2 / det_A
x3 = det_D3 / det_A
print ("x1: %.5f\tx2: %.5f\tx3: %.5f\n"%(x1,x2,x3))
The solve(A,b) function from the numpy.linalg package directly solves the $A\cdot x = b$ system of linear equations:
import numpy as np
A = np.array([[10., 2., -1.],[-3., -5., 2.],[1., 1., 6.]])
print("A:\n",A,"\n")
b = np.array([[27.],[-61.5],[-21.5]])
print("b:\n",b,"\n")
x = np.linalg.solve(A,b)
print("x:\n",x,"\n")
The lu(A) function from scipy.linalg gives the pivoted LU decomposition of A, returning a permutation matrix P along with L and U such that $A = P\cdot L\cdot U$ (for our A, no row swaps are needed, so P is the identity and we may solve with L and U directly):
import numpy as np
import scipy.linalg as la
A = np.mat([[10., 2., -1.],[-3., -5., 2.],[1., 1., 6.]])
print("A:\n",A,"\n")
b = np.mat([[27.],[-61.5],[-21.5]])
print("b:\n",b,"\n")
(P,L,U)= la.lu(A)
print("L\n",L,"\n")
print("U\n",U,"\n")
z = np.linalg.solve(L,b)
print("z\n",z,"\n")
x = np.linalg.solve(U,z)
print("x\n",x,"\n")
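scipy.linalg also packages this factor-once / solve-many pattern as lu_factor and lu_solve, which take care of the pivoting bookkeeping for us:

```python
import numpy as np
import scipy.linalg as la

A = np.array([[10., 2., -1.], [-3., -5., 2.], [1., 1., 6.]])

# factor once...
lu_piv = la.lu_factor(A)        # returns (combined LU matrix, pivot indices)

# ...then solve cheaply for as many right-hand sides as needed:
for b in (np.array([27., -61.5, -21.5]), np.array([1., 0., 0.])):
    x = la.lu_solve(lu_piv, b)
    print(x, np.allclose(np.dot(A, x), b))
```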
We have already seen that for a square matrix $A$, we could find a special vector $\vec{u}$ that, when multiplied by $A$, returns the vector itself, scaled by a scalar $\lambda$:
$$A\cdot\vec{u} = \lambda \vec{u}$$For such a case, the scalar $\lambda$ is called the eigenvalue and the vector $\vec{u}$ is called the eigenvector. Eigenvalues and eigenvectors play a very special part in all branches of physics, most notably in Quantum Mechanics.
Now, we will treat them as a way to decompose our matrix into a product of 3 matrices: a diagonal matrix holding the eigenvalues, sandwiched between a matrix of eigenvectors and its inverse. So:
$$ U^{-1}\cdot A \cdot U = D$$and hence, equivalently:
$$ U\cdot D \cdot U^{-1} = A$$where $U = \begin{bmatrix}\rule[-1ex]{0.5pt}{2.5ex} & \rule[-1ex]{0.5pt}{2.5ex} & &\rule[-1ex]{0.5pt}{2.5ex}\\ \vec{u}_1 & \vec{u}_2&\cdots&\vec{u}_n\\\rule[-1ex]{0.5pt}{2.5ex} & \rule[-1ex]{0.5pt}{2.5ex} & &\rule[-1ex]{0.5pt}{2.5ex} \end{bmatrix}$ and $D=\begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix}$ such that:
$$A\cdot \vec{u}_i = \lambda_i\, \vec{u}_i$$Example
Consider:
$$A=\begin{bmatrix}2&3\\4&5\end{bmatrix}$$
A = np.array([[2,3],[4,5]])
[lambdas, U] = np.linalg.eig(A)
print("Eigenvalues: ",lambdas,"\n-----\nEigenvectors Matrix:\n",U)
D = np.diag(lambdas)
D
# U.D.(inv(U)) ==> A
np.linalg.multi_dot((U,D,np.linalg.inv(U)))
#(inv(U)).A.U ==> D
np.linalg.multi_dot((np.linalg.inv(U),A,U))
Consider a zombie outbreak (with a cure): every hour, 20% of the humans turn into zombies while 10% of the zombies recover and once again become human. So if the current number of humans is $h_0$ and the number of zombies is $z_0$, after 1 hour these numbers will be:
$$\begin{align*} h_1 &= h_0\times 80\% + z_0\times 10\%\\ z_1 &= z_0\times 90\% + h_0\times 20\% \end{align*}$$or, rearranging the above equations into a matrix multiplication:
$$\begin{pmatrix}h_1 \\ z_1 \end{pmatrix} = \begin{bmatrix}0.80 & 0.10 \\ 0.20 & 0.90\end{bmatrix}\begin{pmatrix}h_0 \\ z_0 \end{pmatrix}$$Let's start with a total number of 120 humans and 60 zombies:
h0 = 120
z0 = 60
hz0 = np.array([[h0],[z0]])
print("Hour: {:2d} | Humans # {:8.4f} | Zombies # {:8.4f}"
.format(0,h0,z0))
trmat = np.array([[0.8, 0.1],[0.2, 0.9]])
for hour in range(1,200):
    hz1 = np.dot(trmat,hz0)
    print("Hour: {:2d} | Humans # {:8.4f} | Zombies # {:8.4f}"
          .format(hour,hz1[0,0],hz1[1,0]))
    # check if hz1 & hz0 are parallel
    cross_p = np.cross(hz0[:,0],hz1[:,0])
    if(np.linalg.norm(cross_p) < 1E-2):
        # They are parallel
        print("Found it!\n",hz1)
        break
    else:
        # continue the "evolution"
        hz0 = hz1.copy()
So, it means that around the 37th hour, the human and zombie populations will reach an equilibrium.
What we did here was to interpret the transformation matrix as a rotation matrix: we kept rotating an initial vector accordingly until it no longer rotated, meaning that it had converged to an eigenvector.
As opposed to the conventional analytic solution, here we first find the eigenvector, and then we can proceed to calculate the corresponding eigenvalue:
hz1
Au = np.dot(trmat,hz1)
meaning that the eigenvalue is equal to 1 (as $A\cdot \vec u = (1)\,\vec u$)
# Verification using built-in function:
[l,u] = np.linalg.eig(trmat)
print(l,"\n",u)
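The whole hunt can be packaged as a generic power-iteration routine with normalization at each step (a sketch; the `power_iteration` name, the tolerance and the iteration cap are our own arbitrary choices):

```python
import numpy as np

def power_iteration(A, tol=1e-10, max_iter=1000):
    """Return the dominant eigenvalue and its (unit) eigenvector."""
    v = np.ones(A.shape[0])
    for _ in range(max_iter):
        w = np.dot(A, v)
        w /= np.linalg.norm(w)            # keep the vector normalized
        if np.linalg.norm(w - v) < tol:   # direction no longer changes
            break
        v = w
    lam = np.dot(v, np.dot(A, v))          # Rayleigh quotient (v is unit)
    return lam, v

trmat = np.array([[0.8, 0.1], [0.2, 0.9]])
lam, v = power_iteration(trmat)
print(lam)          # close to 1
print(v / v[0])     # direction ~ (1, 2): twice as many zombies as humans
```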
One of the most powerful decompositions is the SVD (Singular Value Decomposition), where an $A$ matrix (not necessarily square) is decomposed into the singular vector matrices $U$, $V$ along with a diagonal matrix $S$ holding the singular values $\sigma_i$:
$$A = U\cdot S\cdot V^T$$There are many magical things involved. 8) The symmetric matrices $AA^T$ and $A^TA$ share the same non-zero eigenvalues $\lambda_i = \sigma_i^2$, and the columns of $U$ and $V$ are each orthonormal. $\{u_i\} \in U$ are the eigenvectors of $AA^T$, while $\{v_i\} \in V$ are the eigenvectors of $A^TA$. Also:
$$A\cdot v_i = \sigma_i u_i$$and
$$A=\sigma_1 u_1 v_1^T + \dots+ \sigma_r u_r v_r^T$$where each $\sigma$ reports the "importance" of every term, so a controlled approximation is possible.
AS=np.array([[3,2,2],[2,3,-2]])
AS
[u,s,v]=np.linalg.svd(AS)
print(u,"\n",s,"\n",v)
# 0.707106781 : 1/\sqrt(2)
# 0.235702260 : 1/\sqrt(18)
# 0.942809042 : 4/\sqrt(18)
AAT =np.dot(AS,AS.T)
AAT
np.dot(AAT,u[:,0])/25
u[:,0]
np.dot(AAT,u[:,1])/9
u[:,1]
c = []
tot = np.zeros((2,3))
for i in [0,1]:
    k = s[i]*np.dot(np.array([u[:,i]]).T,np.array([v[i,:]]))
    print(k)
    c.append(k)
    tot += k
c
tot
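Since AS has rank 2, summing both rank-1 terms must reproduce it exactly; a quick self-contained check (using np.outer for the $u_i v_i^T$ products):

```python
import numpy as np

AS = np.array([[3, 2, 2], [2, 3, -2]])
u, s, v = np.linalg.svd(AS)
tot = np.zeros((2, 3))
for i in range(len(s)):              # sigma_i * u_i * v_i^T, summed
    tot += s[i] * np.outer(u[:, i], v[i, :])
print(np.allclose(tot, AS))          # True
```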
# pip install Pillow
from PIL import Image
image = Image.open("data/Solvay_conference_1927.jpg")
data = np.asarray(image)
print(data.shape)
print(data[0:10,0:10])
image2 = Image.fromarray(data)
im = image2.save('../Scratch_and_Demo/saved.jpg')
image.mode # 'L' for greyscale
[u,s,vh]=np.linalg.svd(data)
s.shape
s
np.sum(s>100)
np.sum(s)
tot = np.sum(s)
x = 70
tot_xp = tot*x/100
tot_aux = 0
for i in range(s.shape[0]):
    tot_aux += s[i]
    if(tot_aux>tot_xp):
        print(i)
        break
threshold = i
# accumulate the first `threshold` rank-1 terms
# (start from zeros -- np.empty would contain garbage!):
c = np.zeros(data.shape)
for i in range(threshold):
    c += s[i]*np.dot(np.array([u[:,i]]).T,np.array([vh[i,:]]))
c.shape
image2 = Image.fromarray(c)
image2 = image2.convert("L")
im = image2.save('../Scratch_and_Demo/compressed_x.jpg')
597.8/745.2*100
533.4/745.2*100
SVD can also be used to calculate the pseudo-inverse of a matrix which is not necessarily square. Given that $A$ is an $(m\times n)$ matrix, its pseudo-inverse counterpart $A^+$ will be an $(n\times m)$ matrix such that:
$$A\cdot A^+ \approx \mathbb{1}$$where $\mathbb{1}$ is the $(m\times m)$ identity matrix (right inverse). A similar definition is also present for a left inverse where $A^+_{(n\times m)} \cdot A_{(m\times n)} \approx \mathbb{1}_{(n\times n)}$.
The pseudo-inverse can be calculated from:
$$A^+ = VD^+U^T$$where $D^+$ holds the reciprocals $\{\tfrac{1}{\sigma_i}\}$ of the non-zero elements of $D$ (transposed to the appropriate shape). This construction is guaranteed to minimize the least squares error $|AA^+ - \mathbb{1}_n|_2$.
Example
Pseudo-inverse of $A=\begin{bmatrix}1 & 1\\1 & 1\end{bmatrix}$
A = np.array([[1,1],[1,1]])
[u,s,vh] = np.linalg.svd(A)
s[s!=0] = 1/s[s!=0]
D_p = np.diag(s)
D_p
A_pinv = np.linalg.multi_dot((vh.T,D_p.T,u.T))
A_pinv
np.dot(A,A_pinv)
Example
Pseudo-inverse of $B=\begin{bmatrix}1&2&3\\4&5&6\end{bmatrix}$
B = np.arange(1,7).reshape(2,3)
B
[u,s,vh] = np.linalg.svd(B)
print(u,"\n",s,"\n",vh)
s[s!=0] = 1/s[s!=0]
# Build the reciprocal-singular-value matrix: np.diag(s,-1) puts the
# values on a sub-diagonal of a 3x3 matrix; keeping rows 1 & 2 yields
# a 2x3 matrix whose transpose is the (3x2) D+ used below
D_p = np.diag(s,-1)
D_p = D_p[[1,2],:]
print(D_p)
B_pinv = np.linalg.multi_dot((vh.T,D_p.T,u.T))
B_pinv
np.dot(B,B_pinv)
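NumPy packages this whole construction as np.linalg.pinv, which we can use to cross-check the manual result:

```python
import numpy as np

B = np.arange(1, 7).reshape(2, 3).astype(float)
B_pinv = np.linalg.pinv(B)
print(B_pinv)
# B has full row rank, so B.B+ is the (2x2) identity (right inverse):
print(np.allclose(np.dot(B, B_pinv), np.eye(2)))   # True
```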
data = np.array([[ 1. , 4.9142],
[ 2. , 7.1201],
[ 3. , 8.8456],
[ 4. , 10.8113],
[ 5. , 13.2231]])
import matplotlib.pyplot as plt
plt.plot(data[:,0],data[:,1],"ro")
plt.show()
m = np.mean(data,0)
m
data_2 = data - m
S = np.dot(data_2.T,data_2)/4 # divide by n-1 = 4 (sample covariance)
plt.plot(data_2[:,0],data_2[:,1],"ro")
print(S)
print("------------")
Sigma = np.cov(data_2[:,0],data_2[:,1])
print(Sigma)
[u,s,vh]=np.linalg.svd(S)
print(u,"\n\n",s,"\n\n",vh)
s[1]/np.sum(s)*100
np.linalg.norm(u[:,1])
u1=u[:,0]
u2=u[:,1]
np.dot(u1,u2)
u1 # principal axis
m1=u1[1]/u1[0]
m2=u2[1]/u2[0]
print(m1,m2)
data_2
x=np.linspace(-4,4,30)
y1 = m1*x
y2 = m2*x
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_aspect('equal','datalim')
plt.plot(data_2[:,0],data_2[:,1],"ro",x,y1,"b-",x,y2,"g-")
plt.show()
fig,ax1 = plt.subplots(1,1)
ax1.set_aspect('equal','datalim')
plt.plot(data[:,0],data[:,1],"ro",x+m[0],y1+m[1],"b-",
x+m[0],y2+m[1],"g-")
plt.show()
# s1 contribution: percentage of the variance captured by the principal axis
s[0]/np.sum(s)*100
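As a final sketch (recomputing the arrays above so that it stands alone), we can project the centered data onto the principal axis u1 to obtain a rank-1 approximation of the data set, which is the essence of dimensionality reduction via PCA:

```python
import numpy as np

data = np.array([[1., 4.9142], [2., 7.1201], [3., 8.8456],
                 [4., 10.8113], [5., 13.2231]])
m = np.mean(data, 0)
data_2 = data - m                      # centered data
S = np.dot(data_2.T, data_2) / 4       # sample covariance (n-1 = 4)
u, s, vh = np.linalg.svd(S)
u1 = u[:, 0]                           # principal axis

# project each centered point onto u1, then map back to 2D:
coords = np.dot(data_2, u1)            # 1D coordinates along u1
approx = np.outer(coords, u1) + m      # rank-1 reconstruction
print(approx)
# the residual is small, since the 2nd component carries only a
# tiny fraction of the total variance:
print(np.linalg.norm(approx - data) / np.linalg.norm(data_2))
```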