Theory of Estimation
MLE, OLS, GMM — and the Ideal Properties of Estimators
1 Why Does This Matter?
An estimator is not just a "formula". When you write \(\hat{\beta} = (X^TX)^{-1}X^Ty\), you are making a choice among infinitely many ways to estimate \(\beta\). Why is this a good choice?
Understanding why OLS works, when MLE is better, and what ‘efficient’ means separates good researchers from great ones.
Some questions you should be able to answer after this chapter:

- Why does OLS give the BLUE (Best Linear Unbiased Estimator)?
- Why is MLE asymptotically efficient (no other estimator is more efficient in large samples)?
- When can the bootstrap be better than analytic formulas?
- What exactly is the Generalized Method of Moments, and why can almost every estimator in econometrics be written as GMM?
2 1. Ideal Properties of Estimators
Suppose we want to estimate a parameter \(\theta\) of a distribution \(f(x|\theta)\) using a sample \(X_1, \ldots, X_n\). An estimator \(\hat{\theta} = T(X_1, \ldots, X_n)\) is a function of the data.
2.1 Unbiasedness
An estimator \(\hat{\theta}\) is unbiased if: \[E[\hat{\theta}] = \theta \quad \text{for all } \theta\]
Bias: \(\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta\)
Asymptotically unbiased: \(E[\hat{\theta}_n] \to \theta\) as \(n \to \infty\) (but possibly biased for fixed \(n\)).
Examples:

- \(\bar{X} = \frac{1}{n}\sum X_i\): unbiased estimator of \(\mu = E[X]\)
- \(S^2 = \frac{1}{n-1}\sum(X_i-\bar{X})^2\): unbiased for \(\sigma^2\)
- \(\hat{\sigma}^2 = \frac{1}{n}\sum(X_i-\bar{X})^2\): biased for \(\sigma^2\) (but asymptotically unbiased)
Important note: unbiasedness alone does not make an estimator good! Mean Squared Error (MSE) = Variance + Bias\(^2\). An estimator with a small bias but much lower variance can be better overall (the bias-variance tradeoff).
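The tradeoff is easy to see in a quick simulation (an illustrative sketch, not part of the derivation): for normal data, the \(1/n\) variance estimator has lower MSE than the unbiased \(1/(n-1)\) version despite its downward bias.

```r
# Sketch: bias-variance tradeoff for variance estimators.
# Simulation settings (n = 10, sigma^2 = 4) are illustrative assumptions.
set.seed(1)
n <- 10; sigma2 <- 4; reps <- 50000
s2_unbiased <- replicate(reps, var(rnorm(n, sd = 2)))  # 1/(n-1) divisor
s2_mle      <- s2_unbiased * (n - 1) / n               # 1/n divisor
mse <- function(est, truth) mean((est - truth)^2)
cat("MSE of unbiased S^2:", mse(s2_unbiased, sigma2), "\n")
cat("MSE of MLE (1/n):   ", mse(s2_mle, sigma2), "\n")
# For normal data the 1/n version wins on MSE despite its bias.
```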
2.2 Consistency
An estimator \(\hat{\theta}_n\) is consistent if: \[\hat{\theta}_n \xrightarrow{p} \theta \quad \text{as } n \to \infty\]
Sufficient condition: \(E[\hat{\theta}_n] \to \theta\) and \(\text{Var}(\hat{\theta}_n) \to 0\).
Consistency is more fundamental than unbiasedness for large-sample analysis. OLS can be consistent even when the errors are not normal — what matters is the exogeneity assumption.
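A minimal simulation sketch (illustrative settings, not from the text) shows \(\bar{X}\) concentrating around the true mean as \(n\) grows, even for a skewed distribution:

```r
# Sketch: consistency of the sample mean for skewed (exponential) data.
# The Exponential(rate = 1/3) choice is an illustrative assumption.
set.seed(2)
true_mean <- 3                         # Exponential(rate = 1/3) has mean 3
for (n in c(10, 100, 10000)) {
  xbar <- mean(rexp(n, rate = 1/3))
  cat(sprintf("n = %5d  Xbar = %.4f\n", n, xbar))
}
```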
2.3 Efficiency (within a given class of estimators)
For an unbiased estimator \(\hat{\theta}\) of a scalar parameter \(\theta\):
\[\text{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}\]
dimana Fisher Information adalah: \[I(\theta) = E\left[\left(\frac{\partial \log f(X|\theta)}{\partial \theta}\right)^2\right] = -E\left[\frac{\partial^2 \log f(X|\theta)}{\partial \theta^2}\right]\]
An estimator that attains this bound is called efficient (within the class of unbiased estimators).
Intuition: Fisher Information measures the "amount of information" about \(\theta\) contained in a single observation. The sharper the likelihood function is around the true value, the larger \(I(\theta)\), and the smaller the minimum attainable variance.
Multiparameter Cramer-Rao: for a parameter vector \(\boldsymbol{\theta}\): \[\text{Var}(\hat{\boldsymbol{\theta}}) \geq I(\boldsymbol{\theta})^{-1}\]
where the inequality is a matrix inequality: \(\text{Var}(\hat{\boldsymbol{\theta}}) - I(\boldsymbol{\theta})^{-1}\) is positive semi-definite.
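Both expressions for \(I(\theta)\) can be checked numerically; here is a sketch (my own illustrative example, not from the text) for \(X \sim \text{Poisson}(\lambda)\), where \(I(\lambda) = 1/\lambda\):

```r
# Sketch: the two Fisher information formulas agree for Poisson(lambda).
# log f(x|lambda) = x*log(lambda) - lambda - log(x!)
set.seed(3)
lambda <- 4
x <- rpois(1e6, lambda)
score   <- x / lambda - 1              # d/dlambda of log f
hessian <- -x / lambda^2               # d^2/dlambda^2 of log f
cat("E[score^2]  :", mean(score^2), "\n")    # should be near 1/lambda = 0.25
cat("-E[hessian] :", -mean(hessian), "\n")   # should also be near 0.25
```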
2.4 Asymptotic Normality
Estimator \(\hat{\theta}_n\) asymptotically normal jika: \[\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, V)\]
The asymptotic variance \(V\) depends on the estimator and the model. For efficient estimators, \(V = I(\theta)^{-1}\).
3 2. Method of Moments (MOM)
Method of Moments is the most intuitive approach to estimation: set sample moments equal to population moments.
Procedure:

1. Express \(k\) population moments as functions of the parameters \(\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)\): \(\mu_r = E[X^r] = g_r(\boldsymbol{\theta})\)
2. Compute the sample moments: \(\hat{\mu}_r = \frac{1}{n}\sum_{i=1}^n X_i^r\)
3. Solve the system of equations \(\hat{\mu}_r = g_r(\hat{\boldsymbol{\theta}})\)
Example: normal distribution \(N(\mu, \sigma^2)\), two parameters:

- \(\mu_1 = \mu \Rightarrow \hat{\mu} = \bar{X}\)
- \(\mu_2 = \mu^2 + \sigma^2 \Rightarrow \hat{\sigma}^2 = \hat{\mu}_2 - \hat{\mu}^2 = \frac{1}{n}\sum X_i^2 - \bar{X}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2\)
Drawback: MOM estimators are not always efficient. MLE is usually more efficient asymptotically.
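The normal example above translates directly into code (a minimal sketch; the simulation settings are my own illustrative choices):

```r
# Sketch: two-moment MOM solution for N(mu, sigma^2) on simulated data.
set.seed(4)
x <- rnorm(500, mean = 1.5, sd = 2)
m1 <- mean(x)                  # first sample moment
m2 <- mean(x^2)                # second sample moment
mu_hat     <- m1
sigma2_hat <- m2 - m1^2        # identical to mean((x - mean(x))^2)
cat("MOM mu_hat     :", mu_hat, "\n")
cat("MOM sigma2_hat :", sigma2_hat, "\n")
```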
4 3. Maximum Likelihood Estimation (MLE)
MLE is the most widely used method in parametric statistics.
Likelihood function: \(L(\theta; x_1,\ldots,x_n) = \prod_{i=1}^n f(x_i|\theta)\)
Log-likelihood: \(\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i|\theta)\)
MLE: \(\hat{\theta}_{MLE} = \arg\max_\theta \ell(\theta)\)
Score function: \(s(\theta) = \frac{\partial \ell(\theta)}{\partial \theta}\)
First Order Condition (FOC): \(s(\hat{\theta}_{MLE}) = 0\)
4.1 Properties of MLE
Consistent: \(\hat{\theta}_{MLE} \xrightarrow{p} \theta_0\) (the true parameter) under regularity conditions
Asymptotically efficient: MLE achieves Cramer-Rao bound asymptotically: \[\sqrt{n}(\hat{\theta}_{MLE} - \theta_0) \xrightarrow{d} N\left(0, I(\theta_0)^{-1}\right)\]
Invariant: if \(\hat{\theta}\) is the MLE of \(\theta\), then \(g(\hat{\theta})\) is the MLE of \(g(\theta)\) for any function \(g\).
Asymptotically normal: follows from the limit distribution in property 2
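The invariance property is easy to illustrate (a sketch with assumed simulation settings, not from the text):

```r
# Sketch: invariance of the MLE. The MLE of sigma for normal data is
# simply the square root of the MLE of sigma^2.
set.seed(5)
x <- rnorm(200, mean = 0, sd = 3)
sigma2_mle <- mean((x - mean(x))^2)   # MLE of sigma^2 (1/n divisor)
sigma_mle  <- sqrt(sigma2_mle)        # MLE of sigma, by invariance
cat("MLE of sigma^2:", sigma2_mle, "\n")
cat("MLE of sigma  :", sigma_mle, "\n")
```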
5 4. Worked Example: MLE for the Normal Distribution
Data: \(X_1, \ldots, X_n \sim N(\mu, \sigma^2)\) iid. Derive the MLEs.
Log-likelihood: \[\ell(\mu, \sigma^2) = \sum_{i=1}^n \log\left[\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\right]\] \[= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2\]
FOC for \(\mu\): \[\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n(x_i-\mu) = 0 \Rightarrow \hat{\mu}_{MLE} = \bar{x} = \frac{1}{n}\sum x_i\]
FOC for \(\sigma^2\) (let \(v = \sigma^2\)): \[\frac{\partial \ell}{\partial v} = -\frac{n}{2v} + \frac{1}{2v^2}\sum_{i=1}^n(x_i-\mu)^2 = 0\] \[\Rightarrow \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^n(x_i-\hat{\mu})^2\]
Note: \(\hat{\sigma}^2_{MLE}\) uses \(1/n\) (not \(1/(n-1)\)), so it is biased downward for finite \(n\), but it is asymptotically unbiased and consistent.
Fisher Information for the normal model (per observation): \[I(\mu, \sigma^2) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{pmatrix}\]
Asymptotic variances: \(\text{Var}(\hat{\mu}) \approx \sigma^2/n\) and \(\text{Var}(\hat{\sigma}^2) \approx 2\sigma^4/n\).
# MLE for Normal distribution using optim()
set.seed(2024)
n <- 100
true_mu <- 5; true_sigma <- 2
x <- rnorm(n, mean=true_mu, sd=true_sigma)
# Log-likelihood function (return negative for minimization)
neg_loglik <- function(params, data) {
mu <- params[1]
sigma2 <- params[2]
if(sigma2 <= 0) return(Inf)
n <- length(data)
-(-n/2 * log(2*pi) - n/2 * log(sigma2) -
1/(2*sigma2) * sum((data - mu)^2))
}
# Optimization
result <- optim(
par = c(mean(x), var(x)), # Initial values
fn = neg_loglik,
data = x,
method = "L-BFGS-B",
lower = c(-Inf, 1e-6) # sigma2 > 0 constraint
)
cat("MLE mu:", result$par[1], "(true:", true_mu, ")\n")
cat("MLE sigma^2:", result$par[2], "(true:", true_sigma^2, ")\n")
cat("Analytical MLE mu:", mean(x), "\n")
cat("Analytical MLE sigma^2:", mean((x-mean(x))^2), "\n")

6 5. Generalized Method of Moments (GMM)
GMM is a very general framework — OLS, IV, and MLE (for many models) can all be written as GMM.
Suppose there are \(r\) moment conditions: \[E[g(x_i, \theta)] = \mathbf{0}, \quad g: \mathbb{R}^k \times \Theta \to \mathbb{R}^r\]
where \(r \geq \dim(\theta)\) (at least as many conditions as parameters).
Just-identified (\(r = \dim(\theta)\)): Solve \(\frac{1}{n}\sum_i g(x_i, \hat{\theta}) = \mathbf{0}\) directly.
Over-identified (\(r > \dim(\theta)\)): Minimize: \[J(\theta) = \left[\frac{1}{n}\sum_i g(x_i, \theta)\right]^T W \left[\frac{1}{n}\sum_i g(x_i, \theta)\right]\]
where \(W\) is a positive definite weight matrix.
Optimal GMM: \(W = \left[E[g(x_i,\theta_0)g(x_i,\theta_0)^T]\right]^{-1}\) minimizes asymptotic variance.
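A sketch of two-step GMM for an over-identified linear IV model (all simulation settings and the helper `gmm_beta` are my own illustrative assumptions, not from the text). Step 1 uses the 2SLS weight \((Z^TZ/n)^{-1}\); step 2 plugs in \(\hat{S}^{-1}\):

```r
# Sketch: two-step GMM with one endogenous regressor and two instruments.
# For linear moments there is a closed form:
#   b(W) = (X'Z W Z'X)^{-1} X'Z W Z'y
set.seed(6)
n <- 2000
u  <- rnorm(n)                          # common shock -> x is endogenous
z1 <- rnorm(n); z2 <- rnorm(n)          # two instruments, one parameter
x  <- 0.8*z1 + 0.5*z2 + u + rnorm(n)
y  <- 1.5*x + u + rnorm(n)              # true beta = 1.5
X <- cbind(x); Z <- cbind(z1, z2)
gmm_beta <- function(W) {
  A <- t(X) %*% Z
  as.numeric(solve(A %*% W %*% t(A)) %*% A %*% W %*% t(Z) %*% y)
}
b1 <- gmm_beta(solve(t(Z) %*% Z / n))   # step 1: 2SLS weight matrix
g  <- Z * as.vector(y - X %*% b1)       # moment contributions z_i * e_i
S  <- t(g) %*% g / n                    # estimate of E[g g']
b2 <- gmm_beta(solve(S))                # step 2: optimal weight S^{-1}
cat("OLS (inconsistent):", coef(lm(y ~ x - 1)), "\n")
cat("GMM step 1:        ", b1, "\n")
cat("GMM step 2:        ", b2, "\n")
```

OLS on these data is inconsistent because \(x\) is correlated with the error through the shock \(u\); both GMM steps recover the true coefficient.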
6.1 All the Classical Estimators as GMM
| Estimator | Moment Conditions |
|---|---|
| OLS | \(E[x_i(y_i - x_i^T\beta)] = 0\) |
| IV/2SLS | \(E[z_i(y_i - x_i^T\beta)] = 0\) |
| MLE (normal) | Score equations: \(E[\partial \log f / \partial \theta] = 0\) |
| Method of Moments | \(E[X^r] - \mu_r(\theta) = 0\) |
6.2 Asymptotic Theory of GMM
Under regularity conditions: \[\sqrt{n}(\hat{\theta}_{GMM} - \theta_0) \xrightarrow{d} N(\mathbf{0}, V_{GMM})\]
where, for optimal GMM: \[V_{GMM} = (G^T S^{-1} G)^{-1}\]
with \(G = E[\partial g / \partial \theta^T]\) and \(S = E[g g^T]\) (the variance of the moment conditions).
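For OLS-as-GMM this formula can be evaluated directly; the sketch below (simulation settings are my own illustrative choices) reproduces heteroskedasticity-robust (HC0) standard errors:

```r
# Sketch: (G' S^{-1} G)^{-1} evaluated for OLS, where
#   G = -E[x x']  and  S = E[e^2 x x'].
set.seed(7)
n <- 1000
X <- cbind(1, rnorm(n))
e <- rnorm(n, sd = 1 + 0.5 * abs(X[, 2]))   # heteroskedastic errors
y <- X %*% c(1, 2) + e
b <- solve(t(X) %*% X, t(X) %*% y)          # OLS (just-identified GMM)
res <- as.vector(y - X %*% b)
G <- -t(X) %*% X / n                        # sample analog of E[dg/dbeta']
S <- t(X * res^2) %*% X / n                 # sample analog of E[e^2 x x']
V <- solve(t(G) %*% solve(S) %*% G) / n     # asymptotic variance / n
cat("GMM (robust) standard errors:", sqrt(diag(V)), "\n")
```

Algebraically, \((G^TS^{-1}G)^{-1}/n\) here equals the familiar sandwich \((X^TX)^{-1}\left(\sum e_i^2 x_i x_i^T\right)(X^TX)^{-1}\).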
7 6. Connections to Econometrics
OLS as GMM: the moment condition \(E[x_i(y_i - x_i^T\beta)] = 0\) is precisely the statement \(E[x_i\varepsilon_i] = 0\) (exogeneity). OLS solves \(\frac{1}{n}X^T(y-X\hat{\beta}) = 0\), which is exactly the sample analog.
IV estimation: when \(x_i\) is endogenous, we need an instrument \(z_i\) with \(E[z_i\varepsilon_i]=0\). IV is GMM with moment conditions \(E[z_i(y_i-x_i^T\beta)]=0\).
Efficient vs consistent: OLS is consistent and BLUE (Best Linear Unbiased) under the Gauss-Markov assumptions. Under heteroskedasticity, OLS is still consistent but no longer efficient — GLS (Generalized Least Squares) is more efficient.
ML for nonlinear models: logit, probit, and Poisson regression all use MLE because there is no "natural" linear estimator. MLE is asymptotically efficient.
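As a quick sketch (the simulated data and the coefficients \(-0.5, 1.2\) are my own illustrative assumptions), a logit fit via `glm()` is exactly an MLE of this kind:

```r
# Sketch: logit MLE via glm(), which maximizes the Bernoulli
# log-likelihood by iteratively reweighted least squares.
set.seed(8)
n <- 5000
x <- rnorm(n)
p <- plogis(-0.5 + 1.2 * x)                 # P(y = 1 | x)
y <- rbinom(n, size = 1, prob = p)
fit <- glm(y ~ x, family = binomial(link = "logit"))
print(coef(fit))                            # close to (-0.5, 1.2)
```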
8 7. R Code: Estimation Methods
library(stats4)
library(bbmle)
# ============================================================
# METHOD OF MOMENTS vs MLE
# ============================================================
set.seed(42)
n <- 200
# Gamma(shape=3, rate=2): mean=1.5, var=0.75
x <- rgamma(n, shape=3, rate=2)
# Method of Moments for Gamma(alpha, beta):
#   E[X] = alpha/beta and Var[X] = alpha/beta^2
#   => alpha_hat = mean^2/var, beta_hat = mean/var
mean_x <- mean(x); var_x <- var(x)
alpha_mom <- mean_x^2 / var_x
beta_mom <- mean_x / var_x
cat("=== METHOD OF MOMENTS ===\n")
cat(sprintf("alpha_hat = %.4f (true: 3)\n", alpha_mom))
cat(sprintf("beta_hat = %.4f (true: 2)\n", beta_mom))
# MLE for Gamma
neg_loglik_gamma <- function(alpha, beta) {
if(alpha <= 0 || beta <= 0) return(Inf)
-sum(dgamma(x, shape=alpha, rate=beta, log=TRUE))
}
mle_result <- mle2(
neg_loglik_gamma,
start=list(alpha=alpha_mom, beta=beta_mom),
method="L-BFGS-B",
lower=c(alpha=1e-4, beta=1e-4)
)
cat("\n=== MLE ===\n")
print(summary(mle_result)@coef[, c("Estimate", "Std. Error")])
# ============================================================
# FISHER INFORMATION and CRAMER-RAO BOUND
# ============================================================
# For Normal(mu, sigma^2 known):
# I(mu) = n/sigma^2
# CR bound: Var(mu_hat) >= sigma^2/n
sigma2_known <- 4 # sigma^2 = 4
n_vals <- c(10, 50, 100, 500, 1000)
cr_bounds <- sigma2_known / n_vals
cat("\n=== CRAMER-RAO BOUND FOR NORMAL MEAN ===\n")
cat(sprintf("%-6s %-12s %-12s\n", "n", "CR Bound", "Var(Xbar)"))
for(i in seq_along(n_vals)) {
n_i <- n_vals[i]
sims_var <- var(replicate(2000, {
xi <- rnorm(n_i, mean=0, sd=sqrt(sigma2_known))
mean(xi)
}))
cat(sprintf("%-6d %-12.6f %-12.6f\n", n_i, cr_bounds[i], sims_var))
}
# ============================================================
# GMM: OLS as a special case
# ============================================================
set.seed(123)
n <- 300
X <- cbind(1, rnorm(n), rnorm(n))
beta_true <- c(2, 1.5, -0.8)
y <- X %*% beta_true + rnorm(n, sd=2)
# Moment conditions for OLS: g(beta) = X'(y - X*beta)
# Set sample analog to 0: X'(y - X*beta_hat) = 0
# Solution: beta_hat = (X'X)^{-1} X'y
beta_ols <- solve(t(X) %*% X) %*% t(X) %*% y
cat("\n=== OLS (GMM just-identified) ===\n")
cat("Estimated beta:", round(beta_ols, 4), "\n")
cat("True beta: ", beta_true, "\n")
# Verify moment conditions are satisfied (should be ~0)
moment_conditions <- t(X) %*% (y - X %*% beta_ols)
cat("Moment conditions (should be ~0):", round(moment_conditions, 8), "\n")

9 Practice Problems
Problem 1: Unbiasedness check.
Prove that \(S^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2\) is an unbiased estimator of \(\sigma^2\).
Hint: write \(\sum(X_i-\bar{X})^2 = \sum X_i^2 - n\bar{X}^2\), then take expectations.
Problem 2: Fisher Information.
For \(X \sim \text{Bernoulli}(p)\):

- Compute the Fisher Information \(I(p)\)
- Find the Cramer-Rao lower bound for estimators \(\hat{p}\)
- Show that \(\hat{p} = \bar{X}\) is efficient
Answer: \(\log f(x|p) = x\log p + (1-x)\log(1-p)\). \(\frac{\partial \log f}{\partial p} = x/p - (1-x)/(1-p)\). \(I(p) = E\left[(x/p - (1-x)/(1-p))^2\right] = 1/(p(1-p))\). For a sample of size \(n\), the CR bound is \(p(1-p)/n\). Since \(\text{Var}(\bar{X}) = p(1-p)/n\) equals the bound, \(\bar{X}\) is efficient.
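A simulation sketch (illustrative settings, not part of the answer) confirms that \(\text{Var}(\bar{X})\) sits right at the bound:

```r
# Sketch: Monte Carlo check of the Cramer-Rao bound for Bernoulli(p).
set.seed(9)
p <- 0.3; n <- 50
cr_bound <- p * (1 - p) / n
phat_var <- var(replicate(20000, mean(rbinom(n, 1, p))))
cat("CR bound  :", cr_bound, "\n")
cat("Var(p_hat):", phat_var, "\n")   # should match the bound closely
```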
Problem 3: Invariance of MLE.
Suppose \(X_i \sim N(\mu, \sigma^2)\) with MLEs \(\hat{\mu} = \bar{X}\) and \(\hat{\sigma}^2 = \frac{1}{n}\sum(X_i-\bar{X})^2\).
- What is the MLE of \(\sigma\) (not \(\sigma^2\))?
- What is the MLE of \(P(X > 3)\)?
Answer: by the invariance property:

- the MLE of \(\sigma\) is \(\sqrt{\hat{\sigma}^2}\)
- the MLE of \(P(X>3)\) is \(1 - \Phi((3-\hat{\mu})/\hat{\sigma})\)
Problem 4: GMM intuition.
Explain why, in IV estimation, the moment condition \(E[z_i(y_i - x_i^T\beta)] = 0\) is "the right" moment condition. What does it mean economically?
Answer: the condition states that the instrument \(z_i\) is uncorrelated with the structural error \(\varepsilon_i = y_i - x_i^T\beta\) at the true \(\beta\). Economically: a valid instrument must affect the outcome only through the endogenous regressor \(x_i\), with no direct channel.