Quadratic Forms & Positive Definiteness

Quadratic Forms and Optimization Conditions

Note: Why This Matters for Your Work

Quadratic forms and positive definiteness are the bridge between linear algebra and optimization, and optimization is everywhere:

  • OLS: minimizes \(\|\mathbf{y} - X\boldsymbol{\beta}\|^2 = (\mathbf{y}-X\boldsymbol{\beta})^T(\mathbf{y}-X\boldsymbol{\beta})\), a quadratic form in \(\boldsymbol{\beta}\). The minimum is unique when the Hessian \(2X^TX\) is PD, i.e. when \(X\) has full column rank.
  • MLE: the maximum of the log-likelihood occurs where the Hessian is negative definite (the point is a maximum, not a saddle point).
  • Mahalanobis distance: \(d^2 = (\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\), a quadratic form that is always \(\geq 0\) because \(\Sigma^{-1}\) is PSD.
  • Covariance matrices: always PSD, because variances cannot be negative. PD iff there is no perfect multicollinearity.
  • Second-order conditions: in ridge regression, GMM, and the Kalman filter, positive definiteness is what guarantees a unique optimum.

Understand this, and conditions like “the Hessian must be negative definite” and “the covariance matrix is PSD” stop being magic incantations; they have a clear geometric meaning.


1. Quadratic Forms: Definition

Definition: Quadratic Form

For a symmetric matrix \(A \in \mathbb{R}^{n \times n}\) and a vector \(\mathbf{x} \in \mathbb{R}^n\), the quadratic form is the scalar function:

\[Q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j\]

We always assume \(A\) is symmetric, because every quadratic form can be written with a symmetric matrix (replace \(a_{ij}\) with \((a_{ij} + a_{ji})/2\)).
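
A quick numeric check of the symmetrization trick (a minimal sketch; the nonsymmetric matrix below is an arbitrary example):

B <- matrix(c(1, 4, 0, 3), nrow=2)  # nonsymmetric example (arbitrary choice)
A_sym <- (B + t(B)) / 2             # symmetric part of B
x <- c(1.5, -2)
c(t(x) %*% B %*% x)      # 2.25
c(t(x) %*% A_sym %*% x)  # 2.25, the same quadratic form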

Case \(n = 2\): for \(A = \begin{bmatrix}a_{11} & a_{12} \\ a_{12} & a_{22}\end{bmatrix}\):

\[Q(x_1, x_2) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2\]

Three terms: the \(x_1\) square, the \(x_1x_2\) cross term, and the \(x_2\) square.

Case \(n = 3\): for \(A = \begin{bmatrix}a&b&c\\b&d&e\\c&e&f\end{bmatrix}\):

\[Q = ax_1^2 + dx_2^2 + fx_3^2 + 2bx_1x_2 + 2cx_1x_3 + 2ex_2x_3\]

Why “quadratic”? Every term has degree 2. Quadratic forms are the multivariable generalization of the univariate quadratic \(q(x) = ax^2\).
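
A small check (an illustrative sketch with an arbitrary symmetric matrix) that the matrix form and the expanded polynomial agree:

A <- matrix(c(3, -1, -1, 2), nrow=2)  # arbitrary symmetric example
x <- c(2, 5)
c(t(x) %*% A %*% x)                                  # 42, matrix form
A[1,1]*x[1]^2 + 2*A[1,2]*x[1]*x[2] + A[2,2]*x[2]^2   # 42, expanded form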


2. Classifying Quadratic Forms

Definition: Positive Definite and Friends

The quadratic form \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) (or, equivalently, the matrix \(A\)) is classified by its sign:

  • Positive definite (PD): \(\mathbf{x}^TA\mathbf{x} > 0\) for all \(\mathbf{x} \neq \mathbf{0}\). Always positive except at the origin.
  • Positive semidefinite (PSD): \(\mathbf{x}^TA\mathbf{x} \geq 0\) for all \(\mathbf{x}\). Never negative (can be zero).
  • Negative definite (ND): \(\mathbf{x}^TA\mathbf{x} < 0\) for all \(\mathbf{x} \neq \mathbf{0}\). Always negative.
  • Negative semidefinite (NSD): \(\mathbf{x}^TA\mathbf{x} \leq 0\) for all \(\mathbf{x}\). Never positive.
  • Indefinite: there exist \(\mathbf{x}\), \(\mathbf{y}\) such that \(\mathbf{x}^TA\mathbf{x} > 0\) and \(\mathbf{y}^TA\mathbf{y} < 0\). Takes both signs.

Notation: \(A \succ 0\) (PD), \(A \succeq 0\) (PSD), \(A \prec 0\) (ND), \(A \preceq 0\) (NSD).

Visual intuition: the quadratic form \(Q(x_1, x_2)\) can be visualized as a 3D surface:

  • PD: a “bowl”, opening upward, minimum at the origin
  • ND: a “dome”, opening downward, maximum at the origin
  • Indefinite: a “saddle”, like a horse's saddle: a minimum in one direction, a maximum in another
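
To make the classification concrete, here is a minimal sketch probing an indefinite matrix (the same A4 that appears in Section 8) along the coordinate axes:

A_ind <- matrix(c(1, 2, 2, -1), nrow=2)
Q <- function(x, A) c(t(x) %*% A %*% x)
Q(c(1, 0), A_ind)  #  1 > 0 along e1
Q(c(0, 1), A_ind)  # -1 < 0 along e2 → both signs occur → indefinite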


3. The Eigenvalue Criterion

Definition: Eigenvalue Criterion for Definiteness

For a symmetric matrix \(A\) with eigenvalues \(\lambda_1, \ldots, \lambda_n\):

  • \(A\) PD \(\iff\) all \(\lambda_i > 0\)
  • \(A\) PSD \(\iff\) all \(\lambda_i \geq 0\)
  • \(A\) ND \(\iff\) all \(\lambda_i < 0\)
  • \(A\) NSD \(\iff\) all \(\lambda_i \leq 0\)
  • \(A\) indefinite \(\iff\) some \(\lambda_i > 0\) and some \(\lambda_j < 0\)

Proof sketch (for PD): by the spectral theorem, \(A = Q\Lambda Q^T\) with \(Q\) orthogonal. Then: \[\mathbf{x}^TA\mathbf{x} = \mathbf{x}^TQ\Lambda Q^T\mathbf{x} = \mathbf{y}^T\Lambda\mathbf{y} = \sum_i \lambda_i y_i^2\]

where \(\mathbf{y} = Q^T\mathbf{x}\). If all \(\lambda_i > 0\), then \(\sum_i \lambda_i y_i^2 > 0\) for \(\mathbf{y} \neq \mathbf{0}\) (equivalent to \(\mathbf{x} \neq \mathbf{0}\), since \(Q\) is invertible). \(\square\)
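
The change of variables in this proof can be checked numerically. A minimal sketch, reusing the matrix from the worked example in Section 5:

A <- matrix(c(2, 1, 1, 2), nrow=2)
eig <- eigen(A, symmetric=TRUE)   # A = Q Lambda Q^T
x <- c(1, -3)
y <- t(eig$vectors) %*% x         # y = Q^T x
c(t(x) %*% A %*% x)               # 14, quadratic form directly
sum(eig$values * y^2)             # 14, via sum_i lambda_i y_i^2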


4. Sylvester’s Criterion

Definition: Sylvester’s Criterion

For a symmetric matrix \(A \in \mathbb{R}^{n \times n}\), define the leading principal minors:

\[\Delta_k = \det(A_k) = \det\begin{bmatrix}a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk}\end{bmatrix}\]

Sylvester’s Criterion: \[A \text{ positive definite} \iff \Delta_1 > 0, \Delta_2 > 0, \ldots, \Delta_n > 0\]

For PD: all \(\Delta_k > 0\).

For ND: the \(\Delta_k\) alternate in sign, starting negative: \(\Delta_1 < 0, \Delta_2 > 0, \Delta_3 < 0, \ldots\)

For PSD and indefinite matrices, Sylvester’s criterion does not apply directly: nonnegative leading principal minors are not sufficient for PSD (all principal minors, not just the leading ones, must be \(\geq 0\)), so use the eigenvalue criterion instead.

Sylvester’s criterion is useful because it avoids computing eigenvalues; a handful of smaller determinants suffice.
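
A short sketch of Sylvester’s criterion on a 3×3 matrix (an arbitrary tridiagonal example); Section 8 wraps the same idea into a reusable function:

A <- matrix(c( 2, -1,  0,
              -1,  2, -1,
               0, -1,  2), nrow=3)
Delta <- sapply(1:3, function(k) det(A[1:k, 1:k, drop=FALSE]))
Delta            # 2, 3, 4: all positive → PD
all(Delta > 0)   # TRUE
eigen(A)$values  # cross-check: all eigenvalues positive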


5. Worked Example: Definiteness Check

Determine the definiteness of \(A = \begin{bmatrix}2 & 1 \\ 1 & 2\end{bmatrix}\).

5.1 Method 1: Eigenvalue Criterion

Characteristic equation: \((2-\lambda)^2 - 1 = 0 \Rightarrow (2-\lambda)^2 = 1 \Rightarrow \lambda = 1\) or \(\lambda = 3\).

Both eigenvalues \(\lambda_1 = 3 > 0\) and \(\lambda_2 = 1 > 0\)\(A\) is positive definite.

5.2 Method 2: Sylvester’s Criterion

\(\Delta_1 = a_{11} = 2 > 0\)

\(\Delta_2 = \det(A) = (2)(2) - (1)(1) = 3 > 0\)

Both leading principal minors are positive → \(A\) is positive definite.

5.3 Geometric Interpretation

Quadratic form: \(Q(x_1, x_2) = 2x_1^2 + 2x_1x_2 + 2x_2^2\)

This is a paraboloid “bowl” in \(\mathbb{R}^3\), opening upward with its minimum at \((0,0)\). In other words: the function has a unique global minimum.

A <- matrix(c(2, 1, 1, 2), nrow=2)

# Eigenvalue check
eigenvalues <- eigen(A)$values
cat("Eigenvalues:", eigenvalues, "\n")  # 3, 1 — both positive → PD
cat("All positive:", all(eigenvalues > 0), "\n")

# Sylvester's criterion
cat("Delta_1:", A[1,1], "\n")          # 2 > 0
cat("Delta_2:", det(A), "\n")          # 3 > 0

# Cholesky (exists iff PD)
tryCatch(
  {L <- chol(A); cat("Cholesky exists → PD ✓\n"); print(L)},
  error = function(e) cat("Cholesky failed → NOT PD\n")
)

# Visualize the quadratic form
x1 <- seq(-2, 2, by=0.1)
x2 <- seq(-2, 2, by=0.1)
Q_vals <- outer(x1, x2, function(a, b) {
  sapply(seq_along(a), function(i) c(a[i], b[i]) %*% A %*% c(a[i], b[i]))
})

# persp(x1, x2, Q_vals, main="Quadratic Form Q(x) = x^T A x",
#       xlab="x1", ylab="x2", zlab="Q(x)", theta=30, phi=20)
# Should show a "bowl" shape → PD

5.4 Worked Example: OLS Convexity

Show that the OLS objective function \(f(\boldsymbol{\beta}) = \|\mathbf{y} - X\boldsymbol{\beta}\|^2\) is convex in \(\boldsymbol{\beta}\), with a unique global minimum.

Step 1: Write it as a quadratic form.

\[f(\boldsymbol{\beta}) = (\mathbf{y} - X\boldsymbol{\beta})^T(\mathbf{y} - X\boldsymbol{\beta})\] \[= \mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^TX^T\mathbf{y} + \boldsymbol{\beta}^TX^TX\boldsymbol{\beta}\]

Step 2: Compute the Hessian.

\[H = \frac{\partial^2 f}{\partial\boldsymbol{\beta}\partial\boldsymbol{\beta}^T} = 2X^TX\]

Step 3: Check the definiteness of the Hessian.

\(X^TX\) is always positive semidefinite because for any \(\boldsymbol{\beta}\): \[\boldsymbol{\beta}^T(X^TX)\boldsymbol{\beta} = (X\boldsymbol{\beta})^T(X\boldsymbol{\beta}) = \|X\boldsymbol{\beta}\|^2 \geq 0\]

So \(H = 2X^TX \succeq 0\)\(f\) is convex (a PSD Hessian means a convex function).

Step 4: When is the minimum unique?

The minimum is unique iff \(H \succ 0\) (strict convexity), i.e. \(X^TX \succ 0\), i.e. \(X\) has full column rank. If \(X\) is not full column rank, \(f\) is still convex but not strictly convex → infinitely many minimizers (the set of minimizers is an affine subspace).

Implication: the OLS loss has no local minima that are not global; every stationary point is a global minimum. The same holds for ridge regression with \(\lambda > 0\), which makes the Hessian strictly PD; see the sketch below.
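
A minimal sketch of the rank-deficient case (the duplicated column is a constructed example): with perfect collinearity the smallest eigenvalue of \(X^TX\) drops to zero, and adding a ridge penalty restores strict positive definiteness:

set.seed(1)
x <- rnorm(30)
X_bad <- cbind(1, x, x)            # third column duplicates the second → rank 2, not 3
eigen(t(X_bad) %*% X_bad)$values   # smallest eigenvalue ≈ 0 → PSD, not PD
lambda <- 0.1
eigen(t(X_bad) %*% X_bad + lambda * diag(3))$values  # all > 0 → PD → unique minimizer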

# Visualize OLS loss landscape
n <- 50
x <- rnorm(n)
X <- cbind(1, x)  # with intercept
y <- 2 + 3*x + rnorm(n)

# Hessian of OLS loss = 2 X^T X
H <- 2 * t(X) %*% X
cat("Hessian:\n"); print(H)
cat("Eigenvalues of Hessian:", eigen(H)$values, "\n")  # both positive → PD
cat("Loss function is strictly convex → unique global minimum\n")

# OLS solution
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
cat("OLS solution:", beta_hat, "\n")

# Plot loss surface
beta0 <- seq(beta_hat[1]-2, beta_hat[1]+2, by=0.1)
beta1 <- seq(beta_hat[2]-1, beta_hat[2]+1, by=0.1)
loss <- outer(beta0, beta1, function(b0, b1) {
  apply(cbind(b0, b1), 1, function(b) sum((y - X %*% b)^2))
})
# This should show a convex bowl (elliptic paraboloid)

6. Applications: Mahalanobis Distance

Connection: Mahalanobis Distance and Positive Definiteness

The Mahalanobis distance from \(\mathbf{x}\) to the distribution \(\mathcal{N}(\boldsymbol{\mu}, \Sigma)\): \[d^2(\mathbf{x}, \boldsymbol{\mu}) = (\mathbf{x} - \boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\]

This is the quadratic form \(Q(\mathbf{z}) = \mathbf{z}^T\Sigma^{-1}\mathbf{z}\) with \(\mathbf{z} = \mathbf{x} - \boldsymbol{\mu}\).

Why always ≥ 0? Because \(\Sigma\) is a covariance matrix → PSD; if it is also invertible it is PD, so \(\Sigma^{-1}\) is PD and the quadratic form is \(> 0\) for \(\mathbf{z} \neq \mathbf{0}\) (and \(\geq 0\) everywhere).

Geometry: the level sets \(\{\mathbf{x} : d^2(\mathbf{x}, \boldsymbol{\mu}) = c\}\) are ellipsoids oriented along the eigenvectors of \(\Sigma\), with semi-axis length \(\sqrt{c\lambda_i}\) in the \(i\)-th direction (\(\lambda_i\) = the \(i\)-th eigenvalue of \(\Sigma\)).

Mahalanobis distance “standardizes” Euclidean distance by accounting for variances and correlations. A point that is far away in a direction with large variance is not considered that far.

# Mahalanobis distance example
mu <- c(0, 0)
Sigma <- matrix(c(9, 4.5, 4.5, 4), nrow=2)  # correlated

# Two points: x1 is farther in Euclidean terms, but lies along the high-variance axis
x1 <- c(3, 0)  # along x1 axis
x2 <- c(0, 2)  # along x2 axis

cat("Euclidean dist x1:", sqrt(sum(x1^2)), "\n")
cat("Euclidean dist x2:", sqrt(sum(x2^2)), "\n")

cat("Mahalanobis dist x1:", mahalanobis(x1, mu, Sigma), "\n")
cat("Mahalanobis dist x2:", mahalanobis(x2, mu, Sigma), "\n")
# x1 has more variance in x1 direction → smaller Mahalanobis dist

# PD check for Sigma
cat("Eigenvalues of Sigma:", eigen(Sigma)$values, "\n")  # both > 0 → PD
cat("Sigma is PD:", all(eigen(Sigma)$values > 0), "\n")

# Chi-squared distribution of Mahalanobis distances
set.seed(42)
n <- 1000
L <- chol(Sigma)
Z <- matrix(rnorm(2*n), 2, n)
X_sim <- t(L) %*% Z

# Mahalanobis distances should follow chi-squared(p) distribution
mah_dist <- mahalanobis(t(X_sim), mu, Sigma)
cat("Mean (should ≈ p=2):", mean(mah_dist), "\n")  # E[chi^2(p)] = p

7. Applications: The Hessian and Second-Order Conditions

Connection: The Hessian and MLE Optimization

In MLE, we maximize the log-likelihood \(\ell(\boldsymbol{\theta})\).

First-order condition: \(\nabla_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta}^*) = \mathbf{0}\) (the score is 0 at the MLE).

Second-order condition: the Hessian \(H = \nabla^2_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta}^*)\) being negative definite is the sufficient condition for \(\boldsymbol{\theta}^*\) to be a strict local maximum (a maximum rather than a saddle point).

Why? Taylor expansion around \(\boldsymbol{\theta}^*\): \[\ell(\boldsymbol{\theta}) \approx \ell(\boldsymbol{\theta}^*) + \underbrace{(\boldsymbol{\theta}-\boldsymbol{\theta}^*)^T\nabla\ell}_{\approx 0} + \frac{1}{2}(\boldsymbol{\theta}-\boldsymbol{\theta}^*)^TH(\boldsymbol{\theta}-\boldsymbol{\theta}^*)\]

The last term is a quadratic form in \(\boldsymbol{\delta} = \boldsymbol{\theta}-\boldsymbol{\theta}^*\). For \(\ell(\boldsymbol{\theta}) < \ell(\boldsymbol{\theta}^*)\) to hold for every small perturbation \(\boldsymbol{\delta}\) (i.e. for \(\boldsymbol{\theta}^*\) to be a maximum):

\[\frac{1}{2}\boldsymbol{\delta}^TH\boldsymbol{\delta} < 0 \text{ untuk semua } \boldsymbol{\delta} \neq \mathbf{0}\]

This means \(H\) must be negative definite (\(H \prec 0\)).

Fisher Information Matrix: \(\mathcal{I}(\boldsymbol{\theta}) = -\mathbb{E}[H]\) is the expected negative Hessian. It is PD exactly when the model is (locally) identifiable (the parameters can be learned from the data).

\(\text{Var}(\hat{\boldsymbol{\theta}}_{MLE}) \approx \mathcal{I}(\boldsymbol{\theta})^{-1}\), the Cramér-Rao bound.

# MLE example: logistic regression
# log-likelihood: l(beta) = sum_i [y_i log(p_i) + (1-y_i) log(1-p_i)]
# where p_i = sigmoid(x_i^T beta)

sigmoid <- function(x) 1 / (1 + exp(-x))

loglik <- function(beta, X, y) {
  p <- sigmoid(X %*% beta)
  sum(y * log(p + 1e-10) + (1-y) * log(1 - p + 1e-10))
}

# Hessian of logistic regression log-likelihood = -X^T W X
# where W = diag(p_i (1-p_i))
hessian_logistic <- function(beta, X) {
  p <- sigmoid(X %*% beta)
  W <- diag(as.vector(p * (1 - p)))
  -t(X) %*% W %*% X  # negative semidefinite!
}

# Check: at MLE, Hessian should be negative definite
n <- 200; p <- 3
X <- cbind(1, matrix(rnorm(n*(p-1)), n, p-1))
beta_true <- c(0.5, 1, -0.5)
y <- rbinom(n, 1, sigmoid(X %*% beta_true))

fit <- glm(y ~ X[,-1], family=binomial)
beta_hat <- coef(fit)

H <- hessian_logistic(beta_hat, X)
cat("Eigenvalues of Hessian at MLE:", eigen(H)$values, "\n")
# All should be negative (ND) → MLE is a maximum, not saddle point

# Fisher information (negative expected Hessian)
p_hat <- sigmoid(X %*% beta_hat)
W_hat <- diag(as.vector(p_hat * (1 - p_hat)))
FI <- t(X) %*% W_hat %*% X
cat("Eigenvalues of Fisher Information:", eigen(FI)$values, "\n")
# All positive (PD) → model is identifiable

# Standard errors from FI
se <- sqrt(diag(solve(FI)))
cat("Standard errors:", se, "\n")
cat("From glm:", summary(fit)$coef[,2], "\n")

8. Checking Positive Definiteness in R

# === Method 1: Eigenvalues ===
is_pd <- function(A) {
  all(eigen(A, symmetric=TRUE)$values > 0)
}

is_psd <- function(A) {
  all(eigen(A, symmetric=TRUE)$values >= -1e-10)  # numerical tolerance
}

# === Method 2: Cholesky (exists iff PD) ===
is_pd_chol <- function(A) {
  tryCatch({chol(A); TRUE}, error=function(e) FALSE)
}

# === Method 3: Sylvester's criterion ===
sylvester <- function(A) {
  n <- nrow(A)
  minors <- sapply(1:n, function(k) det(A[1:k, 1:k, drop=FALSE]))
  cat("Leading principal minors:", round(minors, 4), "\n")
  all(minors > 0)
}

# === Test cases ===
A1 <- matrix(c(4,2,2,3), nrow=2)    # PD
A2 <- matrix(c(1,2,2,4), nrow=2)    # PSD (singular, rank 1)
A3 <- matrix(c(-2,1,1,-3), nrow=2)  # ND
A4 <- matrix(c(1,2,2,-1), nrow=2)   # Indefinite

for (name in c("A1","A2","A3","A4")) {
  A <- get(name)
  ev <- eigen(A, symmetric=TRUE)$values
  cat(name, "- eigenvalues:", round(ev, 3),
      "- classification:",
      ifelse(all(ev > 1e-10), "PD",
      ifelse(all(ev >= -1e-10), "PSD",
      ifelse(all(ev < -1e-10), "ND",
      ifelse(all(ev <= 1e-10), "NSD", "Indefinite")))), "\n")
}

# === Covariance matrix ===
n <- 100
X <- matrix(rnorm(n*5), n, 5)
S <- var(X)  # sample covariance matrix
cat("\nSample covariance matrix eigenvalues:", round(eigen(S)$values, 4), "\n")
cat("All >= 0:", all(eigen(S)$values >= -1e-10), "\n")  # PSD

# Theoretical: with a full-rank sample, the covariance should be PD
# The sample covariance has rank <= n-1, so if n <= p it is PSD but not PD

9. Example: Ellipsoids and Mahalanobis

Quadratic forms with a PD matrix define ellipsoids in \(\mathbb{R}^n\). The level set \(\{\mathbf{x} : \mathbf{x}^TA\mathbf{x} = 1\}\) is an ellipsoid.

The ellipsoid’s axes: the eigenvectors of \(A\). Semi-axis lengths: \(1/\sqrt{\lambda_i}\) (the larger the eigenvalue, the shorter the axis, i.e. the tighter the ellipsoid in that direction).

library(ggplot2)

# 2D ellipsoid visualization
A_pd <- matrix(c(4, 2, 2, 2), nrow=2)  # PD
ev <- eigen(A_pd)
lambdas <- ev$values; V <- ev$vectors

cat("Eigenvalues:", lambdas, "\n")
cat("Semi-axes length (1/sqrt(lambda)):", 1/sqrt(lambdas), "\n")

# Generate ellipsoid: x^T A x = 1
# Parametrically: x(t) = V D^{-1/2} [cos(t), sin(t)]
theta <- seq(0, 2*pi, length.out=200)
unit_circle <- rbind(cos(theta), sin(theta))
ellipsoid_pts <- V %*% diag(1/sqrt(lambdas)) %*% unit_circle

df_ellipse <- data.frame(x=ellipsoid_pts[1,], y=ellipsoid_pts[2,])
df_axes <- data.frame(
  x0=c(0, 0),
  y0=c(0, 0),
  x1=V[1,] / sqrt(lambdas),  # scaled by 1/sqrt(lambda)
  y1=V[2,] / sqrt(lambdas)
)

ggplot(df_ellipse, aes(x, y)) +
  geom_path(color="blue", linewidth=1.2) +
  geom_segment(data=df_axes, aes(x=x0,y=y0,xend=x1,yend=y1),
               color="red", arrow=arrow(length=unit(0.3,"cm")), linewidth=1) +
  coord_equal() +
  labs(title="Ellipsoid: {x : x^T A x = 1}",
       subtitle="Red arrows: eigenvectors scaled by 1/sqrt(lambda)") +
  theme_minimal()

# For Mahalanobis distance: A = Sigma^{-1}
# Level sets of Mahalanobis distance are ellipsoids aligned with eigenvectors of Sigma
Sigma <- solve(A_pd)  # covariance = A^{-1}
cat("Sigma eigenvalues:", eigen(Sigma)$values, "\n")
# Ellipsoid aligned with eigenvectors of Sigma, semi-axes = sqrt(lambda_i of Sigma)

10. Summary: Definiteness in Stat/ML Contexts

  • OLS: Hessian \(2X^TX\) must be PSD (PD iff \(X\) full rank); convex loss → global minimum.
  • MLE: Hessian \(\nabla^2\ell\) must be ND at the MLE; \(\theta^*\) is a maximum.
  • Covariance matrix \(\Sigma\): must be PSD; variances are never negative.
  • Mahalanobis distance: \(\Sigma^{-1}\) must be PSD; distances are never negative.
  • Fisher information \(\mathcal{I}\): must be PD; the model is identifiable.
  • Ridge: Hessian \(2(X^TX+\lambda I)\) must be PD (for \(\lambda>0\)); unique minimum.

11. Practice Problems

Problem 1: Classify definiteness

For each of the following matrices, determine whether it is PD, PSD, ND, NSD, or indefinite:

  1. \(A = \begin{bmatrix}3 & -1 \\ -1 & 2\end{bmatrix}\)

  2. \(B = \begin{bmatrix}1 & 2 \\ 2 & 4\end{bmatrix}\)

  3. \(C = \begin{bmatrix}1 & 3 \\ 3 & 2\end{bmatrix}\)

Solution:

  1. \(\Delta_1 = 3 > 0\), \(\Delta_2 = 6-1 = 5 > 0\)PD. Eigenvalues: \((5 \pm \sqrt{5})/2 \approx 3.62, 1.38\), both positive ✓

  2. \(\Delta_1 = 1 > 0\), \(\Delta_2 = 4-4 = 0\). Eigenvalues: 0 and 5 → PSD (singular, rank 1). The second column = 2 × the first column.

  3. \(\Delta_2 = 2-9 = -7 < 0\). Eigenvalues: \((3 \pm \sqrt{37})/2 \approx 4.54, -1.54\), one positive, one negative → Indefinite.
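
A quick numeric cross-check of these classifications (a sketch; eigenvalues computed directly in R):

A <- matrix(c(3, -1, -1, 2), nrow=2)
B <- matrix(c(1, 2, 2, 4), nrow=2)
C <- matrix(c(1, 3, 3, 2), nrow=2)
eigen(A)$values  # ≈ 3.62, 1.38 → PD
eigen(B)$values  # 5, 0 → PSD
eigen(C)$values  # ≈ 4.54, -1.54 → indefinite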


Problem 2: Ridge regression convexity

Consider ridge regression: \(f_\lambda(\boldsymbol{\beta}) = \|\mathbf{y} - X\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|^2\).

  1. Write \(f_\lambda\) as a quadratic form.
  2. Compute the Hessian.
  3. Show that for \(\lambda > 0\), the Hessian is always PD.
  4. What does this imply for the uniqueness of the solution?

Solution:

  1. \(f_\lambda(\boldsymbol{\beta}) = \boldsymbol{\beta}^TX^TX\boldsymbol{\beta} - 2\boldsymbol{\beta}^TX^T\mathbf{y} + \mathbf{y}^T\mathbf{y} + \lambda\boldsymbol{\beta}^T\boldsymbol{\beta}\) \(= \boldsymbol{\beta}^T(X^TX + \lambda I)\boldsymbol{\beta} - 2\boldsymbol{\beta}^TX^T\mathbf{y} + \text{const}\)

  2. Hessian: \(H = 2(X^TX + \lambda I)\)

  3. For \(\lambda > 0\) and any \(\boldsymbol{\beta} \neq \mathbf{0}\): \[\boldsymbol{\beta}^T(X^TX + \lambda I)\boldsymbol{\beta} = \underbrace{\|X\boldsymbol{\beta}\|^2}_{\geq 0} + \underbrace{\lambda\|\boldsymbol{\beta}\|^2}_{> 0} > 0\]

So \(X^TX + \lambda I \succ 0\)\(H = 2(X^TX + \lambda I) \succ 0\)PD. ✓

  4. A PD Hessian → strictly convex loss → unique global minimum → unique \(\hat{\boldsymbol{\beta}}_\lambda = (X^TX + \lambda I)^{-1}X^T\mathbf{y}\), even when \(X^TX\) is singular (perfect multicollinearity). See the sketch below.
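
A minimal sketch of point 4 (the perfectly collinear design below is a constructed example): even with singular \(X^TX\), the ridge solution exists and is unique for \(\lambda > 0\):

set.seed(2)
x <- rnorm(40)
X <- cbind(x, 2*x)                        # perfectly collinear columns
y <- x + rnorm(40)
XtX <- t(X) %*% X
det(XtX)                                  # ≈ 0 → singular, OLS solution not unique
solve(XtX + 0.5 * diag(2)) %*% t(X) %*% y # ridge solution: well-defined and unique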

Problem 3 (R): Definiteness investigation

# Check that the covariance matrix of a real dataset is PSD
data(mtcars)
X <- scale(mtcars)  # standardize
S <- var(X)         # sample covariance

# Eigenvalues
ev <- eigen(S)$values
cat("Eigenvalues:\n"); print(round(ev, 4))
cat("All >= 0:", all(ev >= -1e-10), "\n")  # should be TRUE (PSD)
cat("Min eigenvalue:", min(ev), "\n")  # > 0 → actually PD for this data

# Check: Mahalanobis distance should always be non-negative
mu_hat <- colMeans(X)
d2 <- mahalanobis(X, mu_hat, S)
cat("All Mahalanobis distances >= 0:", all(d2 >= 0), "\n")
cat("Min:", min(d2), "Max:", max(d2), "\n")

# Under multivariate normality: d2 ~ chi^2(p)
p <- ncol(X)
cat("Expected mean (=p):", p, "\n")
cat("Sample mean:", mean(d2), "\n")  # should be ≈ p