Matrix Decompositions
SVD, Cholesky, QR: How Matrices Get 'Taken Apart'
Matrix decompositions are the core tools of applied computational linear algebra. Almost every numerical algorithm that works with matrices uses one or more decompositions:
- SVD: PCA, recommender systems (collaborative filtering), image compression, pseudoinverse, low-rank approximation, natural language processing (LSA)
- QR: numerically stable OLS (why lm() beats manual inversion), constrained least squares, Gram-Schmidt orthogonalization
- Cholesky: multivariate normal simulation in Monte Carlo, MCMC sampling, efficient solving of SPD systems (Bayesian posterior covariance), Kalman filter
- LU: general linear system solving, determinant computation
If you have ever wondered why lm() in R is more numerically stable than solve(t(X)%*%X) %*% t(X) %*% y, the answer is: lm() uses the QR decomposition, which avoids "squaring the condition number" of \(X^TX\).
1. Singular Value Decomposition (SVD)
The SVD is the most general and most powerful decomposition. It applies to every matrix: no need to be square, no need to be symmetric.
For a matrix \(A \in \mathbb{R}^{m \times n}\), the SVD is:
\[A = U\Sigma V^T\]
where:
- \(U \in \mathbb{R}^{m \times m}\): left singular vectors, with orthonormal columns (\(U^TU = UU^T = I_m\))
- \(\Sigma \in \mathbb{R}^{m \times n}\): a “diagonal” matrix with singular values \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0\) on the diagonal and zeros elsewhere
- \(V \in \mathbb{R}^{n \times n}\): right singular vectors, with orthonormal columns (\(V^TV = VV^T = I_n\))

and \(r = \text{rank}(A)\).

Properties:
- Singular values: \(\sigma_i = \sqrt{\lambda_i(A^TA)} = \sqrt{\lambda_i(AA^T)}\)
- \(\text{rank}(A)\) = number of nonzero singular values
- \(\|A\|_F^2 = \sum_i \sigma_i^2\) (Frobenius norm)
- \(\|A\|_2 = \sigma_1\) (spectral norm = largest singular value)

Geometric interpretation: every linear transformation \(A\) can be decomposed into three steps, verified numerically in the sketch below:
1. Rotation/reflection \(V^T\) in the input space \(\mathbb{R}^n\)
2. Scaling by \(\Sigma\) (stretch/shrink along the coordinate axes)
3. Rotation/reflection \(U\) in the output space \(\mathbb{R}^m\)
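A minimal numerical check of this three-step picture (the matrix and vector here are arbitrary choices for illustration):
# Rotate-scale-rotate: apply the three SVD steps one at a time
A <- matrix(c(2, 1, 0, 1), 2, 2)
s <- svd(A)
x <- c(1, 1)
step1 <- t(s$v) %*% x # rotation/reflection in input space
step2 <- diag(s$d) %*% step1 # scaling along coordinate axes
step3 <- s$u %*% step2 # rotation/reflection in output space
max(abs(A %*% x - step3)) # ~0: same result as applying A directly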
1.1 Thin SVD
For computation it is often more practical to use the thin (economy) SVD: \[A = U_r \Sigma_r V_r^T\]
where \(U_r \in \mathbb{R}^{m \times r}\), \(\Sigma_r \in \mathbb{R}^{r \times r}\), \(V_r \in \mathbb{R}^{n \times r}\), and \(r = \text{rank}(A)\).
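In R, svd() returns the economy version by default, with \(\min(m,n)\) columns; a quick sketch on an arbitrary tall matrix:
# Thin SVD dimensions for a tall 6 x 3 matrix
B <- matrix(rnorm(6 * 3), 6, 3)
s <- svd(B)
dim(s$u) # 6 3 (not 6 x 6)
dim(s$v) # 3 3
length(s$d) # 3
# Truncating further to rank(A) columns would drop only zero singular values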
1.2 Best Rank-\(k\) Approximation (Eckart-Young Theorem)
This is one of the most useful results in linear algebra:
\[A_k = U_k\Sigma_kV_k^T = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T\]
is the best rank-\(k\) approximation of \(A\), in the sense that: \[A_k = \underset{\text{rank}(B) \leq k}{\arg\min} \|A - B\|_F\]
The truncated SVD gives the optimal approximation! This is the foundation of PCA, image compression, and collaborative filtering.
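A small sketch of the theorem in action (the random matrix and \(k = 3\) are arbitrary): the truncated SVD's Frobenius error equals \(\sqrt{\sum_{i>k}\sigma_i^2}\), and no other rank-\(k\) matrix, e.g. a random one, can beat it.
set.seed(1)
A <- matrix(rnorm(20 * 10), 20, 10)
s <- svd(A)
k <- 3
A_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
err_svd <- sqrt(sum((A - A_k)^2)) # Frobenius error of truncated SVD
err_svd - sqrt(sum(s$d[(k+1):10]^2)) # ~0: equals the discarded singular values
# A random rank-k competitor does worse
B <- matrix(rnorm(20 * k), 20, k) %*% matrix(rnorm(k * 10), k, 10)
sqrt(sum((A - B)^2)) > err_svd # TRUE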
2. SVD and PCA
If \(X \in \mathbb{R}^{n \times p}\) is a data matrix that has been centered (column means = 0):
SVD of \(X\): \(X = U\Sigma V^T\)
Connection to the eigendecomposition:
- \(X^TX = V\Sigma^T U^T U \Sigma V^T = V(\Sigma^T\Sigma)V^T = V\Lambda V^T\), where \(\Lambda = \text{diag}(\sigma_1^2, \ldots, \sigma_p^2)\). This is exactly the eigendecomposition of \(X^TX\)!
- Eigenvalues of \(X^TX\): \(\lambda_i = \sigma_i^2\)
- Eigenvectors of \(X^TX\) (= PCA loadings): the columns of \(V\)

Covariance matrix \(\hat{\Sigma} = \frac{1}{n-1}X^TX = \frac{1}{n-1}V\Lambda V^T\):
- Eigenvalues: \(\sigma_i^2/(n-1)\) = variance explained by PC \(i\)
- Proportion of variance explained: \(\sigma_i^2 / \sum_j \sigma_j^2\)
PC scores (the coordinates of the data in the principal-component basis): \[Z = XV = U\Sigma V^T V = U\Sigma\]
The \(k\)-th column of \(Z\) contains the scores on the \(k\)-th PC.
# PCA via SVD: connecting the two approaches
set.seed(42)
n <- 200; p <- 5
# Generate correlated data
Sigma_true <- matrix(0.5, p, p)
diag(Sigma_true) <- 1
L <- chol(Sigma_true)
X_raw <- matrix(rnorm(n*p), n, p) %*% L
# Center data
X <- scale(X_raw, center=TRUE, scale=FALSE)
# SVD approach
svd_result <- svd(X)
sigma <- svd_result$d # singular values
U <- svd_result$u
V <- svd_result$v
# Variance explained
var_explained <- sigma^2 / sum(sigma^2)
cat("Proportion of variance explained:\n")
cat(round(var_explained, 3), "\n")
cat("Cumulative:", round(cumsum(var_explained), 3), "\n")
# PC scores
scores_svd <- U %*% diag(sigma) # = X %*% V
# Compare with prcomp()
pca <- prcomp(X)
# pca$sdev^2 = sigma^2 / (n-1) (not sigma^2, due to scaling)
# pca$rotation = V (loadings, possibly sign-flipped)
# pca$x = scores (= X %*% V)
cat("\nSVD singular values^2:", round(sigma^2, 2), "\n")
cat("prcomp eigenvalues * (n-1):", round(pca$sdev^2 * (n-1), 2), "\n")
# Should be equal

3. Worked Example: SVD Computation
Compute the SVD of \(A = \begin{bmatrix}3 & 1 \\ 1 & 3\end{bmatrix}\).
3.1 Step 1: Compute \(A^TA\) and its eigendecomposition
\[A^TA = \begin{bmatrix}3&1\\1&3\end{bmatrix}\begin{bmatrix}3&1\\1&3\end{bmatrix} = \begin{bmatrix}10&6\\6&10\end{bmatrix}\]
Eigenvalues: \(\det(A^TA - \lambda I) = (10-\lambda)^2 - 36 = 0\) \[(10-\lambda)^2 = 36 \Rightarrow 10-\lambda = \pm 6\] \[\lambda_1 = 16, \quad \lambda_2 = 4\]
Singular values: \(\sigma_1 = \sqrt{16} = 4\), \(\sigma_2 = \sqrt{4} = 2\).
3.2 Step 2: Right singular vectors \(V\) (eigenvectors of \(A^TA\))
For \(\lambda_1 = 16\): \((A^TA - 16I)\mathbf{v} = 0\) \[\begin{bmatrix}-6&6\\6&-6\end{bmatrix}\mathbf{v} = 0 \Rightarrow v_1 = v_2 \Rightarrow \mathbf{v}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix}\]
For \(\lambda_2 = 4\): \((A^TA - 4I)\mathbf{v} = 0\) \[\begin{bmatrix}6&6\\6&6\end{bmatrix}\mathbf{v} = 0 \Rightarrow v_1 = -v_2 \Rightarrow \mathbf{v}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\-1\end{bmatrix}\]
\[V = \frac{1}{\sqrt{2}}\begin{bmatrix}1&1\\1&-1\end{bmatrix}\]
3.3 Step 3: Left singular vectors \(U\) (from \(\mathbf{u}_i = \frac{1}{\sigma_i}A\mathbf{v}_i\))
\[\mathbf{u}_1 = \frac{1}{4}A\mathbf{v}_1 = \frac{1}{4}\begin{bmatrix}3&1\\1&3\end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix} = \frac{1}{4\sqrt{2}}\begin{bmatrix}4\\4\end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix}\]
\[\mathbf{u}_2 = \frac{1}{2}A\mathbf{v}_2 = \frac{1}{2\sqrt{2}}\begin{bmatrix}3&1\\1&3\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix} = \frac{1}{2\sqrt{2}}\begin{bmatrix}2\\-2\end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\-1\end{bmatrix}\]
\[U = \frac{1}{\sqrt{2}}\begin{bmatrix}1&1\\1&-1\end{bmatrix}\]
3.4 Step 4: Reconstruction
\[A = U\Sigma V^T = \frac{1}{\sqrt{2}}\begin{bmatrix}1&1\\1&-1\end{bmatrix}\begin{bmatrix}4&0\\0&2\end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix}1&1\\1&-1\end{bmatrix}\]
(Here \(U = V\) because \(A\) is symmetric; for symmetric PSD matrices, \(U = V\).)
A <- matrix(c(3, 1, 1, 3), nrow=2)
svd_result <- svd(A)
cat("Singular values:", svd_result$d, "\n") # 4, 2
cat("U:\n"); print(svd_result$u)
cat("V:\n"); print(svd_result$v)
# Reconstruct A
U <- svd_result$u; D <- diag(svd_result$d); V <- svd_result$v
reconstructed <- U %*% D %*% t(V)
cat("Reconstructed A:\n"); print(round(reconstructed, 10))
cat("Max error:", max(abs(A - reconstructed)), "\n") # ~04 4. Low-Rank Approximation
A demonstration of the Eckart-Young theorem: approximating a matrix with one of lower rank.
# Generate a "true" rank-3 matrix with noise
set.seed(42)
n <- 50; p <- 40
# True signal: rank 3
U_true <- matrix(rnorm(n*3), n, 3)
V_true <- matrix(rnorm(p*3), p, 3)
A_signal <- U_true %*% t(V_true) # rank 3
# Add noise
A_noisy <- A_signal + matrix(rnorm(n*p, sd=0.5), n, p)
# SVD of noisy matrix
svd_noisy <- svd(A_noisy)
# Low-rank approximations
reconstruct <- function(svd_result, k) {
U_k <- svd_result$u[, 1:k, drop=FALSE]
d_k <- svd_result$d[1:k]
V_k <- svd_result$v[, 1:k, drop=FALSE]
U_k %*% diag(d_k) %*% t(V_k)
}
# Approximation errors vs true signal
errors <- sapply(1:15, function(k) {
A_k <- reconstruct(svd_noisy, k)
frobenius_err <- sqrt(sum((A_signal - A_k)^2))
frobenius_err
})
cat("Approximation error vs true signal (rank 1 to 15):\n")
cat(round(errors, 2), "\n")
# Should decrease up to rank ~3, then start increasing (overfitting noise)
# Scree plot
plot(1:15, errors, type="b",
xlab="Rank k", ylab="||A_signal - A_k||_F",
main="Low-Rank Approximation Error")
abline(v=3, col="red", lty=2)
# Variance explained (scree plot)
var_exp <- svd_noisy$d^2 / sum(svd_noisy$d^2)
cumulative_var <- cumsum(var_exp)
cat("Cumulative variance explained (first 5 SVs):", round(cumulative_var[1:5], 3), "\n")Interpretasi: elbow di scree plot menunjukkan rank “optimal”. Ini sama persis dengan prinsip yang digunakan di PCA untuk menentukan jumlah components yang dipertahankan.
5. QR Decomposition
Every matrix \(A \in \mathbb{R}^{m \times n}\) with \(m \geq n\) has a QR decomposition:
\[A = QR\]
where:
- \(Q \in \mathbb{R}^{m \times n}\): orthonormal columns (\(Q^TQ = I_n\))
- \(R \in \mathbb{R}^{n \times n}\): upper triangular with positive diagonal

Algorithm: conceptually, Gram-Schmidt orthogonalization of the columns of \(A\) (numerical libraries typically use Householder reflections instead, for better stability).
5.1 QR for Numerically Stable OLS
Naive OLS: \(\hat{\boldsymbol{\beta}} = (X^TX)^{-1}X^T\mathbf{y}\)
Problem: \(\kappa(X^TX) = \kappa(X)^2\), so \(X^TX\) is far worse conditioned than \(X\).
OLS via QR: if \(X = QR\), then: \[X^TX = R^TQ^TQR = R^TR\] \[X^T\mathbf{y} = R^TQ^T\mathbf{y}\]
The normal equations \(X^TX\hat{\boldsymbol{\beta}} = X^T\mathbf{y}\) become: \[R^TR\hat{\boldsymbol{\beta}} = R^TQ^T\mathbf{y}\] \[R\hat{\boldsymbol{\beta}} = Q^T\mathbf{y}\]
Solve this by back substitution (since \(R\) is upper triangular), which is far more stable!
Condition number: \(\kappa(R) = \kappa(X)\), not \(\kappa(X)^2\); this is checked in the code below.
# OLS via QR vs naive inversion — numerical stability comparison
set.seed(42)
n <- 100; p <- 10
# Ill-conditioned X (near multicollinearity)
X <- matrix(rnorm(n*p), n, p)
X[, p] <- X[, 1] + rnorm(n, sd=1e-6) # nearly collinear
y <- rnorm(n)
cat("Condition number of X:", kappa(X), "\n")
cat("Condition number of X^TX:", kappa(t(X)%*%X), "\n")
# Method 1: Naive (numerically unstable for ill-conditioned X)
beta_naive <- solve(t(X) %*% X) %*% t(X) %*% y
# Method 2: QR (numerically stable)
qr_result <- qr(X)
beta_qr <- qr.coef(qr_result, y)
# Method 3: lm() (also uses QR internally)
beta_lm <- coef(lm(y ~ X - 1))
cat("Max diff naive vs QR:", max(abs(beta_naive - beta_qr)), "\n")
cat("Max diff lm vs QR:", max(abs(beta_lm - beta_qr)), "\n")
# For very ill-conditioned X, naive and QR may differ significantly

6. Cholesky Decomposition
For a symmetric positive definite (SPD) matrix \(A \in \mathbb{R}^{n \times n}\), the Cholesky decomposition gives:
\[A = LL^T\]
where \(L\) is lower triangular with positive diagonal entries.
Equivalently: \(A = R^TR\) with \(R\) upper triangular (this is R's convention: chol() returns the upper triangular factor).
Existence and uniqueness: \(A\) has a Cholesky decomposition \(\iff\) \(A\) is positive definite, and the decomposition is unique.
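A quick check of the positive-definiteness requirement (the indefinite matrix here is an arbitrary example):
# chol() refuses a symmetric but indefinite matrix
M <- matrix(c(1, 2, 2, 1), 2, 2) # eigenvalues 3 and -1
res <- try(chol(M), silent = TRUE)
inherits(res, "try-error") # TRUE: not positive definite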
Efficiency: \(O(n^3/3)\) flops, twice as fast as LU decomposition.
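A rough timing sketch of that speed claim (exact ratios depend on your BLAS; the size \(n = 2000\) is arbitrary):
# Solving an SPD system: Cholesky + triangular solves vs general solve()
n <- 2000
A_big <- crossprod(matrix(rnorm(n * n), n, n)) + n * diag(n) # SPD
b <- rnorm(n)
system.time({R_c <- chol(A_big); x1 <- backsolve(R_c, forwardsolve(t(R_c), b))})
system.time(x2 <- solve(A_big, b)) # LU-based general solver
max(abs(x1 - x2)) # ~0: same solution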
6.1 The Cholesky Algorithm
For \(A = \begin{bmatrix}a_{11} & \mathbf{a}_{12}^T \\ \mathbf{a}_{12} & A_{22}\end{bmatrix}\):
\[L_{11} = \sqrt{a_{11}}, \quad L_{12} = \mathbf{0}, \quad \mathbf{l}_{21} = \mathbf{a}_{12}/L_{11}, \quad L_{22} = \text{chol}(A_{22} - \mathbf{l}_{21}\mathbf{l}_{21}^T)\]
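A minimal from-scratch implementation of this recursion, written as a column-by-column loop (unpivoted, no error checks; chol_scratch is a hypothetical helper for illustration):
chol_scratch <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (j in 1:n) {
    # diagonal entry: subtract the squares of entries already in row j
    L[j, j] <- sqrt(A[j, j] - sum(L[j, seq_len(j - 1)]^2))
    # entries below the diagonal in column j
    for (i in seq_len(n - j) + j) {
      L[i, j] <- (A[i, j] - sum(L[i, seq_len(j - 1)] * L[j, seq_len(j - 1)])) / L[j, j]
    }
  }
  L
}
A <- matrix(c(4, 2, 2, 3), 2, 2)
max(abs(chol_scratch(A) - t(chol(A)))) # ~0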
6.2 Cholesky for Multivariate Normal Simulation
We want to simulate \(\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)\).
Idea: if \(\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I)\) and \(\Sigma = LL^T\) (Cholesky), then: \[\mathbf{x} = \boldsymbol{\mu} + L\mathbf{z} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)\]
Proof: \(\text{Cov}(L\mathbf{z}) = L\text{Cov}(\mathbf{z})L^T = LIL^T = LL^T = \Sigma\) ✓
Example: \[\Sigma = \begin{bmatrix}4 & 2 \\ 2 & 3\end{bmatrix}\]
Compute the Cholesky factor \(\Sigma = LL^T\):
\(L_{11} = \sqrt{\sigma_{11}} = \sqrt{4} = 2\)
\(L_{21} = \sigma_{21}/L_{11} = 2/2 = 1\)
\(L_{22} = \sqrt{\sigma_{22} - L_{21}^2} = \sqrt{3 - 1} = \sqrt{2}\)
\[L = \begin{bmatrix}2 & 0 \\ 1 & \sqrt{2}\end{bmatrix}\]
Verification: \(LL^T = \begin{bmatrix}2&0\\1&\sqrt{2}\end{bmatrix}\begin{bmatrix}2&1\\0&\sqrt{2}\end{bmatrix} = \begin{bmatrix}4&2\\2&1+2\end{bmatrix} = \begin{bmatrix}4&2\\2&3\end{bmatrix} = \Sigma\) ✓
# Cholesky decomposition
Sigma <- matrix(c(4, 2, 2, 3), nrow=2)
mu <- c(1, 2)
# R's chol() returns upper triangular R where Sigma = R^T R
R <- chol(Sigma) # upper triangular
cat("R (upper triangular):\n"); print(R)
cat("R^T R (should = Sigma):\n"); print(round(t(R) %*% R, 10))
# For simulation, need L = t(R) (lower triangular)
L <- t(R)
cat("L (lower triangular):\n"); print(L)
# Simulate N(mu, Sigma) using Cholesky
set.seed(42)
n_sim <- 10000
Z <- matrix(rnorm(2 * n_sim), 2, n_sim) # 2 x n_sim matrix of N(0,I) samples
X_sim <- L %*% Z + mu # 2 x n_sim
# Verify
cat("Sample mean (should ≈ [1,2]):", rowMeans(X_sim), "\n")
cat("Sample covariance (should ≈ Sigma):\n")
print(round(var(t(X_sim)), 2))
# Application: Bayesian posterior simulation
# If posterior Sigma_post = L_post L_post^T, then
# beta_draw = beta_hat + L_post %*% rnorm(p)

7. LU Decomposition
For a general (not necessarily SPD) square matrix \(A \in \mathbb{R}^{n \times n}\):
\[PA = LU\]
where:
- \(P\): permutation matrix (row swaps for numerical stability)
- \(L\): lower triangular with 1s on the diagonal
- \(U\): upper triangular

LU is essentially Gaussian elimination in matrix form. After factoring, solve \(A\mathbf{x} = \mathbf{b}\) in two steps:
1. \(L\mathbf{y} = P\mathbf{b}\) (forward substitution)
2. \(U\mathbf{x} = \mathbf{y}\) (back substitution)

This is efficient for solving multiple systems with the same matrix \(A\); the two-step solve is demonstrated at the end of the code below.
# LU decomposition in R (via Matrix::lu() and expand())
# install.packages("Matrix")
library(Matrix)
A <- matrix(c(2,1,-1,1,3,2,3,2,1), nrow=3, byrow=TRUE)
# lu() from the Matrix package (LAPACK-based for dense matrices)
lu_result <- lu(Matrix(A))
elu <- expand(lu_result)
cat("L:\n"); print(elu$L)
cat("U:\n"); print(elu$U)
cat("P:\n"); print(elu$P)
# Verify: Matrix's expand() returns P such that A = P %*% L %*% U
# (its P is the inverse of the textbook permutation in PA = LU)
cat("P L U (should = A):\n"); print(as.matrix(elu$P %*% elu$L %*% elu$U))
cat("L U (should = P^T A):\n"); print(as.matrix(elu$L %*% elu$U))
cat("LU:\n"); print(as.matrix(elu$L %*% elu$U))8 8. Perbandingan Decompositions
| Decomposition | Requirements | Complexity | Main Use Case |
|---|---|---|---|
| SVD (\(U\Sigma V^T\)) | Any matrix | \(O(\min(m,n) \cdot mn)\) | PCA, low-rank approx, pseudoinverse |
| QR (\(QR\)) | \(m \geq n\) | \(O(mn^2)\) | Numerically stable OLS, least squares |
| Cholesky (\(LL^T\)) | SPD | \(O(n^3/3)\) | Efficient solves, MVN simulation |
| LU (\(PLU\)) | Square, invertible | \(O(2n^3/3)\) | General linear systems |
| Eigendecomp (\(PDP^{-1}\)) | Square, diagonalizable | \(O(n^3)\) | PCA (symmetric), AR stability |

Rules of thumb:
- Need to solve a linear system → LU, or Cholesky if SPD
- Need dimensionality reduction / a low-rank approximation → SVD
- Need a stable OLS computation → QR
- Have a covariance matrix and need to simulate → Cholesky
- Have a symmetric matrix and need eigenvalues → eigendecomposition (more intuitive), though SVD also works
9. Connection: Recommender Systems
Imagine a matrix \(R \in \mathbb{R}^{m \times n}\) where rows = users, columns = items (films, products), and \(R_{ij}\) = user \(i\)'s rating of item \(j\) (with many missing values).
Matrix factorization (SVD-inspired): approximate \(R \approx U\Sigma V^T\) with rank \(k\):
- \(U \in \mathbb{R}^{m \times k}\): user embeddings (latent user preferences)
- \(V \in \mathbb{R}^{n \times k}\): item embeddings (latent item characteristics)
- \(\sigma_i\): strength of the \(i\)-th latent factor

Predicted rating: \(\hat{R}_{ij} = \mathbf{u}_i^T \mathbf{v}_j\) (dot product of the user and item embeddings).
Truncated SVD keeping only the top \(k\) components provides natural regularization: we keep only the "important" latent factors, avoiding overfitting on sparse data.
This is the core idea behind the Netflix Prize and many modern recommender systems; a toy sketch follows.
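A toy sketch of this idea (the rating matrix here is made up and fully observed; real systems fit the factors with ALS or SGD precisely because most entries are missing):
set.seed(1)
Ratings <- matrix(sample(1:5, 6 * 4, replace = TRUE), 6, 4) # 6 users, 4 items
s <- svd(Ratings)
k <- 2
U_emb <- s$u[, 1:k] %*% diag(sqrt(s$d[1:k])) # user embeddings
V_emb <- s$v[, 1:k] %*% diag(sqrt(s$d[1:k])) # item embeddings
R_hat <- U_emb %*% t(V_emb) # predicted rating = u_i . v_j
round(R_hat[1, ], 2) # rank-2 reconstruction of user 1's ratings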
10. Comprehensive R Code
# ============================================
# SVD
# ============================================
A <- matrix(c(3,1,1,3), nrow=2)
svd_result <- svd(A)
svd_result$d # singular values: 4, 2
svd_result$u # left singular vectors
svd_result$v # right singular vectors
# Verify reconstruction
U <- svd_result$u; D <- diag(svd_result$d); V <- svd_result$v
reconstructed <- U %*% D %*% t(V)
max(abs(A - reconstructed)) # ~0
# Thin SVD for non-square matrix
B <- matrix(rnorm(4*3), 4, 3)
svd_B <- svd(B)
length(svd_B$d) # 3 (= min(4,3))
# Pseudoinverse via SVD (Moore-Penrose); B has full column rank here,
# so all singular values are nonzero
B_pinv <- svd_B$v %*% diag(1 / svd_B$d) %*% t(svd_B$u)
max(abs(B_pinv %*% B - diag(3))) # ~0 (left inverse of B)
# For rank-deficient matrices: invert only the nonzero singular values
# ============================================
# QR
# ============================================
X <- matrix(rnorm(50*5), 50, 5)
qr_result <- qr(X)
Q <- qr.Q(qr_result)
R <- qr.R(qr_result)
cat("Q^T Q (should = I):\n"); print(round(t(Q)%*%Q, 10))
cat("Q R (should = X):\n"); print(max(abs(Q%*%R - X)))
# OLS via QR
y <- rnorm(50)
beta_qr <- qr.coef(qr_result, y)
beta_naive <- solve(t(X)%*%X) %*% t(X) %*% y
max(abs(beta_qr - beta_naive)) # should be small
# ============================================
# Cholesky
# ============================================
Sigma <- matrix(c(4,2,2,3), nrow=2)
R_chol <- chol(Sigma) # chol() returns the upper triangular factor!
cat("chol(Sigma):\n"); print(R_chol)
# Simulate multivariate normal (mean zero here)
set.seed(42)
n_sim <- 1000
Z <- matrix(rnorm(2*n_sim), 2, n_sim)
X_mvn <- t(R_chol) %*% Z # t(R_chol) = lower triangular L
cat("Sample cov:\n"); print(round(var(t(X_mvn)), 2))
# Solve the SPD system Sigma x = b via two triangular solves
b <- c(1, 2)
x_sol <- backsolve(R_chol, forwardsolve(t(R_chol), b))
cat("Solution:", x_sol, "\n")
cat("Verify:", Sigma %*% x_sol, "\n") # should = b
# Cholesky-based determinant (numerically stable)
log_det <- 2 * sum(log(diag(R_chol))) # log|Sigma| = 2 * sum(log(diag(R)))
cat("log|Sigma|:", log_det, "\n")
cat("Direct:", log(det(Sigma)), "\n")

11. Practice Problems
Problem 1: SVD Properties
For the matrix \(A = \begin{bmatrix}1&0\\0&2\\0&0\end{bmatrix} \in \mathbb{R}^{3 \times 2}\):
- Find the SVD by hand.
- What is the rank of \(A\)?
- What are its singular values?
Solution:
\(A\) is already in a very simple form. \(A^TA = \begin{bmatrix}1&0\\0&4\end{bmatrix}\), with eigenvalues 1 and 4.
Singular values: \(\sigma_1 = 2\), \(\sigma_2 = 1\).
Right singular vectors (eigenvectors of \(A^TA\)): \(\mathbf{v}_1 = \begin{bmatrix}0\\1\end{bmatrix}\), \(\mathbf{v}_2 = \begin{bmatrix}1\\0\end{bmatrix}\).
Left singular vectors: \(\mathbf{u}_1 = \frac{1}{2}A\mathbf{v}_1 = \frac{1}{2}\begin{bmatrix}0\\2\\0\end{bmatrix} = \begin{bmatrix}0\\1\\0\end{bmatrix}\), \(\mathbf{u}_2 = A\mathbf{v}_2 = \begin{bmatrix}1\\0\\0\end{bmatrix}\).
We must add one more vector \(\mathbf{u}_3\) orthogonal to the others to complete \(U\): \(\mathbf{u}_3 = \begin{bmatrix}0\\0\\1\end{bmatrix}\).
SVD: \(A = \begin{bmatrix}0&1&0\\1&0&0\\0&0&1\end{bmatrix}\begin{bmatrix}2&0\\0&1\\0&0\end{bmatrix}\begin{bmatrix}0&1\\1&0\end{bmatrix}\)
Rank = 2 (two nonzero singular values).
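A quick numerical check in R (signs of the singular vectors may be flipped):
A <- matrix(c(1, 0, 0, 0, 2, 0), nrow = 3) # columns (1,0,0) and (0,2,0)
svd(A)$d # 2 1
qr(A)$rank # 2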
Problem 2: Cholesky from Scratch
Compute the Cholesky factor of \(A = \begin{bmatrix}9&3&3\\3&5&1\\3&1&6\end{bmatrix}\).
Solution:
\(L_{11} = \sqrt{9} = 3\)
\(L_{21} = 3/3 = 1\), \(L_{31} = 3/3 = 1\)
\(L_{22} = \sqrt{5 - 1^2} = \sqrt{4} = 2\)
\(L_{32} = (1 - 1\cdot 1)/2 = 0/2 = 0\)
\(L_{33} = \sqrt{6 - 1^2 - 0^2} = \sqrt{5}\)
\[L = \begin{bmatrix}3&0&0\\1&2&0\\1&0&\sqrt{5}\end{bmatrix}\]
Verify with R:
A <- matrix(c(9,3,3,3,5,1,3,1,6), nrow=3, byrow=TRUE)
chol(A) # upper triangular = L^T
t(chol(A)) # lower triangular = L

Problem 3: QR and OLS
Explain why QR gives a more numerically stable OLS solution than direct inversion of \((X^TX)\).
Solution:
The numerical conditioning of a system is measured by its condition number \(\kappa\). If \(\kappa(X) = c\), then:
- Direct method: \(X^TX\) has \(\kappa(X^TX) = c^2\) → the numerical error is squared!
- QR method: solve \(R\hat{\boldsymbol{\beta}} = Q^T\mathbf{y}\). \(R\) has \(\kappa(R) = \kappa(X) = c\) → no squaring.

Concrete example: if \(X\) has \(\kappa = 10^6\) (not uncommon with many predictors), then \(X^TX\) has \(\kappa = 10^{12}\). With float64 precision (~16 digits), we lose roughly 12 digits of precision, leaving the result practically meaningless!
QR keeps the conditioning at \(\kappa(X)\), giving a far more accurate solution.
This is why lm() is safer than solve(t(X)%*%X)%*%t(X)%*%y for collinear data.