Theory of Estimation

MLE, OLS, GMM, and the Ideal Properties of Estimators

statistics
estimation
MLE
GMM
Understanding MLE, OLS, GMM, and the ideal properties of estimators: unbiasedness, consistency, efficiency, and the Cramer-Rao bound.

1 Why Does This Matter?

Note: Why This Matters for Your Work

An estimator is not just a "formula". When you write \(\hat{\beta} = (X^TX)^{-1}X^Ty\), you are choosing one of infinitely many ways to estimate \(\beta\). Why is this a good choice?

Understanding why OLS works, when MLE is better, and what ‘efficient’ means separates good researchers from great ones.

Some questions you should be able to answer after this chapter:

- Why does OLS give the BLUE (Best Linear Unbiased Estimator)?
- Why is MLE asymptotically efficient (no other estimator is more efficient in large samples)?
- When can the bootstrap beat analytic formulas?
- What exactly is the Generalized Method of Moments, and why can almost every estimator in econometrics be written as GMM?


2 1. Ideal Properties of Estimators

Suppose we want to estimate a parameter \(\theta\) of a distribution \(f(x|\theta)\) using a sample \(X_1, \ldots, X_n\). An estimator \(\hat{\theta} = T(X_1, \ldots, X_n)\) is a function of the data.

2.1 Unbiasedness

Important: Definition (Unbiasedness)

The estimator \(\hat{\theta}\) is unbiased if: \[E[\hat{\theta}] = \theta \quad \text{for all } \theta\]

Bias: \(\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta\)

Asymptotically unbiased: \(E[\hat{\theta}_n] \to \theta\) as \(n \to \infty\) (but possibly biased for fixed \(n\)).

Examples:

- \(\bar{X} = \frac{1}{n}\sum X_i\): unbiased estimator of \(\mu = E[X]\)
- \(S^2 = \frac{1}{n-1}\sum(X_i-\bar{X})^2\): unbiased for \(\sigma^2\)
- \(\hat{\sigma}^2 = \frac{1}{n}\sum(X_i-\bar{X})^2\): biased for \(\sigma^2\) (but asymptotically unbiased)

Important note: unbiasedness alone does not make an estimator good! Mean Squared Error (MSE) = Variance + Bias\(^2\). A slightly biased estimator with much lower variance can be better (the bias-variance tradeoff).
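To make the tradeoff concrete, here is a minimal Monte Carlo sketch (simulated normal data; the values of `n` and `sigma2` are arbitrary choices) comparing the biased \(1/n\) and unbiased \(1/(n-1)\) variance estimators:

```r
# Monte Carlo sketch: bias-variance tradeoff for variance estimators.
# The 1/n estimator is biased downward but has lower variance; for
# normal data its MSE is smaller than that of the unbiased S^2.
set.seed(1)
n <- 10; sigma2 <- 4; reps <- 20000
est <- replicate(reps, {
  x <- rnorm(n, sd = sqrt(sigma2))
  c(biased = sum((x - mean(x))^2) / n,   # 1/n estimator
    unbiased = var(x))                   # 1/(n-1) estimator
})
bias <- rowMeans(est) - sigma2
mse  <- rowMeans((est - sigma2)^2)
round(rbind(bias = bias, mse = mse), 3)
```

Here the small downward bias of the \(1/n\) estimator buys a larger reduction in variance, so its MSE comes out lower.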

2.2 Consistency

Important: Definition (Consistency)

The estimator \(\hat{\theta}_n\) is consistent if: \[\hat{\theta}_n \xrightarrow{p} \theta \quad \text{as } n \to \infty\]

A sufficient condition: \(E[\hat{\theta}_n] \to \theta\) and \(\text{Var}(\hat{\theta}_n) \to 0\).

Consistency is more fundamental than unbiasedness for large-sample analysis. OLS can be consistent even when the errors are not normal; what matters is the exogeneity assumption.
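As a quick sketch (assumed Exponential data, chosen for illustration), one can watch \(P(|\bar{X}_n - \mu| > \varepsilon)\) shrink as \(n\) grows:

```r
# Sketch: consistency of the sample mean for Exponential(rate = 1),
# whose mean is 1. The exceedance probability shrinks as n grows.
set.seed(7)
eps <- 0.1
n_vals <- c(10, 100, 1000)
probs <- sapply(n_vals, function(n) {
  xbar <- replicate(4000, mean(rexp(n, rate = 1)))
  mean(abs(xbar - 1) > eps)
})
data.frame(n = n_vals, exceed_prob = probs)
```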

2.3 Efficiency (within a given class of estimators)

Important: Definition (Cramer-Rao Lower Bound)

For an unbiased estimator \(\hat{\theta}\) of a scalar parameter \(\theta\), based on an iid sample of size \(n\):

\[\text{Var}(\hat{\theta}) \geq \frac{1}{n\,I(\theta)}\]

where the per-observation Fisher Information is: \[I(\theta) = E\left[\left(\frac{\partial \log f(X|\theta)}{\partial \theta}\right)^2\right] = -E\left[\frac{\partial^2 \log f(X|\theta)}{\partial \theta^2}\right]\]

An estimator that attains this bound is called efficient (within the class of unbiased estimators).

Intuition: Fisher Information measures "how much information" about \(\theta\) is contained in a single observation. The sharper the likelihood function around the true value, the larger \(I(\theta)\), and the smaller the minimum attainable variance.

Multiparameter Cramer-Rao: for a vector parameter \(\boldsymbol{\theta}\) and a sample of size \(n\): \[\text{Var}(\hat{\boldsymbol{\theta}}) \geq \frac{1}{n}I(\boldsymbol{\theta})^{-1}\]

where the inequality is in the matrix sense (the difference is positive semi-definite).

2.4 Asymptotic Normality

Important: Definition (Asymptotic Normality)

The estimator \(\hat{\theta}_n\) is asymptotically normal if: \[\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, V)\]

The asymptotic variance \(V\) depends on the estimator and the model. For efficient estimators, \(V = I(\theta)^{-1}\).


3 2. Method of Moments (MOM)

The Method of Moments is the most intuitive approach to estimation: set sample moments equal to population moments.

Procedure:

1. Express \(k\) population moments as functions of the parameters \(\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)\): \(\mu_r = E[X^r] = g_r(\boldsymbol{\theta})\)
2. Compute the sample moments: \(\hat{\mu}_r = \frac{1}{n}\sum_{i=1}^n X_i^r\)
3. Solve the system of equations: \(\hat{\mu}_r = g_r(\hat{\boldsymbol{\theta}})\)

Example: the Normal distribution \(N(\mu, \sigma^2)\), two parameters:

- \(\mu_1 = \mu \Rightarrow \hat{\mu} = \bar{X}\)
- \(\mu_2 = \mu^2 + \sigma^2 \Rightarrow \hat{\sigma}^2 = \hat{\mu}_2 - \hat{\mu}^2 = \frac{1}{n}\sum X_i^2 - \bar{X}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2\)

Drawback: MOM estimators are not always efficient; MLE is usually more efficient asymptotically.
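The three-step procedure above, applied to the normal example, can be sketched on simulated data (the true parameter values here are arbitrary choices):

```r
# Method of moments for N(mu, sigma^2) via the first two sample moments.
set.seed(11)
x <- rnorm(500, mean = 1.5, sd = 2)   # true mu = 1.5, sigma^2 = 4
m1 <- mean(x)      # sample analog of mu_1 = mu
m2 <- mean(x^2)    # sample analog of mu_2 = mu^2 + sigma^2
mu_hat     <- m1
sigma2_hat <- m2 - m1^2   # algebraically equals mean((x - mean(x))^2)
c(mu_hat = mu_hat, sigma2_hat = sigma2_hat)
```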


4 3. Maximum Likelihood Estimation (MLE)

MLE is the most widely used method in parametric statistics.

Important: Definition (Maximum Likelihood Estimation)

Likelihood function: \(L(\theta; x_1,\ldots,x_n) = \prod_{i=1}^n f(x_i|\theta)\)

Log-likelihood: \(\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i|\theta)\)

MLE: \(\hat{\theta}_{MLE} = \arg\max_\theta \ell(\theta)\)

Score function: \(s(\theta) = \frac{\partial \ell(\theta)}{\partial \theta}\)

First Order Condition (FOC): \(s(\hat{\theta}_{MLE}) = 0\)

4.1 Properties of MLE

  1. Consistent: \(\hat{\theta}_{MLE} \xrightarrow{p} \theta_0\) (true parameter) under regularity conditions

  2. Asymptotically efficient: MLE achieves Cramer-Rao bound asymptotically: \[\sqrt{n}(\hat{\theta}_{MLE} - \theta_0) \xrightarrow{d} N\left(0, I(\theta_0)^{-1}\right)\]

  3. Invariance: if \(\hat{\theta}\) is the MLE of \(\theta\), then \(g(\hat{\theta})\) is the MLE of \(g(\theta)\) for any function \(g\).

  4. Asymptotically normal: (from property 2)
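The invariance property can be sketched in a few lines on simulated normal data (the sample size and true parameters are illustrative assumptions):

```r
# Invariance of MLE: the MLE of g(theta) is g(theta_hat).
set.seed(3)
x <- rnorm(200, mean = 0, sd = 2)
sigma2_mle <- mean((x - mean(x))^2)    # MLE of sigma^2
sigma_mle  <- sqrt(sigma2_mle)         # by invariance, MLE of sigma
p_gt3_mle  <- 1 - pnorm(3, mean = mean(x), sd = sigma_mle)  # MLE of P(X > 3)
c(sigma = sigma_mle, p_gt3 = p_gt3_mle)
```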


5 4. Worked Example: MLE for the Normal Distribution

Data: \(X_1, \ldots, X_n \sim N(\mu, \sigma^2)\) iid. Derive MLEs.

Log-likelihood: \[\ell(\mu, \sigma^2) = \sum_{i=1}^n \log\left[\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\right]\] \[= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2\]

FOC for \(\mu\): \[\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n(x_i-\mu) = 0 \Rightarrow \hat{\mu}_{MLE} = \bar{x} = \frac{1}{n}\sum x_i\]

FOC for \(\sigma^2\) (writing \(v = \sigma^2\)): \[\frac{\partial \ell}{\partial v} = -\frac{n}{2v} + \frac{1}{2v^2}\sum_{i=1}^n(x_i-\mu)^2 = 0\] \[\Rightarrow \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^n(x_i-\hat{\mu})^2\]

Note: \(\hat{\sigma}^2_{MLE}\) uses \(1/n\) (not \(1/(n-1)\)), so it is biased downward for finite \(n\), but it is asymptotically unbiased and consistent.

Fisher Information for the normal distribution: \[I(\mu, \sigma^2) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{pmatrix}\]

Asymptotic variances: \(\text{Var}(\hat{\mu}) \approx \sigma^2/n\) and \(\text{Var}(\hat{\sigma}^2) \approx 2\sigma^4/n\).

# MLE for Normal distribution using optim()
set.seed(2024)
n <- 100
true_mu <- 5; true_sigma <- 2
x <- rnorm(n, mean=true_mu, sd=true_sigma)

# Log-likelihood function (return negative for minimization)
neg_loglik <- function(params, data) {
  mu <- params[1]
  sigma2 <- params[2]
  if(sigma2 <= 0) return(Inf)
  n <- length(data)
  -(-n/2 * log(2*pi) - n/2 * log(sigma2) -
      1/(2*sigma2) * sum((data - mu)^2))
}

# Optimization
result <- optim(
  par = c(mean(x), var(x)),          # Initial values
  fn = neg_loglik,
  data = x,
  method = "L-BFGS-B",
  lower = c(-Inf, 1e-6)              # sigma2 > 0 constraint
)

cat("MLE mu:", result$par[1], "(true:", true_mu, ")\n")
cat("MLE sigma^2:", result$par[2], "(true:", true_sigma^2, ")\n")
cat("Analytical MLE mu:", mean(x), "\n")
cat("Analytical MLE sigma^2:", mean((x-mean(x))^2), "\n")

6 5. Generalized Method of Moments (GMM)

GMM is an extremely general framework: OLS, IV, and MLE (for many models) can all be written as GMM estimators.

Important: Definition (GMM)

Suppose there are \(r\) moment conditions: \[E[g(x_i, \theta)] = \mathbf{0}, \quad g: \mathbb{R}^k \times \Theta \to \mathbb{R}^r\]

where \(r \geq \dim(\theta)\) (at least as many conditions as parameters).

Just-identified (\(r = \dim(\theta)\)): Solve \(\frac{1}{n}\sum_i g(x_i, \hat{\theta}) = \mathbf{0}\) directly.

Over-identified (\(r > \dim(\theta)\)): Minimize: \[J(\theta) = \left[\frac{1}{n}\sum_i g(x_i, \theta)\right]^T W \left[\frac{1}{n}\sum_i g(x_i, \theta)\right]\]

where \(W\) is a positive definite weight matrix.

Optimal GMM: the choice \(W = \left[E[g(x_i,\theta_0)g(x_i,\theta_0)^T]\right]^{-1}\) minimizes the asymptotic variance.
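A minimal sketch of an over-identified GMM fit (a hypothetical Exponential example with mean \(\theta\), using the simple choice \(W = I\) rather than the optimal weight matrix):

```r
# Over-identified GMM: two moment conditions, one parameter.
# For Exponential data with mean theta: E[X] - theta = 0 and
# E[X^2] - 2*theta^2 = 0.
set.seed(9)
x <- rexp(500, rate = 1/3)   # true theta = 3
g_bar <- function(theta) {
  c(mean(x) - theta,          # first moment condition
    mean(x^2) - 2 * theta^2)  # second moment condition
}
J <- function(theta) { g <- g_bar(theta); sum(g^2) }  # W = identity
theta_gmm <- optimize(J, interval = c(0.1, 20))$minimum
theta_gmm
```

With \(r > \dim(\theta)\), the sample moments cannot all be set to zero exactly; minimizing \(J\) balances them according to \(W\).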

6.1 All the Classical Estimators as GMM

| Estimator | Moment Conditions |
|-----------|-------------------|
| OLS | \(E[x_i(y_i - x_i^T\beta)] = 0\) |
| IV/2SLS | \(E[z_i(y_i - x_i^T\beta)] = 0\) |
| MLE (normal) | Score equations: \(E[\partial \log f / \partial \theta] = 0\) |
| Method of Moments | \(E[X^r] - \mu_r(\theta) = 0\) |

6.2 Asymptotic Theory of GMM

Under regularity conditions: \[\sqrt{n}(\hat{\theta}_{GMM} - \theta_0) \xrightarrow{d} N(\mathbf{0}, V_{GMM})\]

where, for optimal GMM: \[V_{GMM} = (G^T S^{-1} G)^{-1}\]

with \(G = E[\partial g / \partial \theta^T]\) and \(S = E[g g^T]\) (the variance of the moment conditions).


7 6. Connections to Econometrics

Caution: Connection (Estimation in Econometrics)

OLS as GMM: the moment condition \(E[x_i(y_i - x_i^T\beta)] = 0\) is precisely the statement \(E[x_i\varepsilon_i] = 0\) (exogeneity). OLS solves \(\frac{1}{n}X^T(y-X\hat{\beta}) = 0\), which is exactly the sample analog.

IV estimation: when \(x_i\) is endogenous, we need an instrument \(z_i\) with \(E[z_i\varepsilon_i]=0\). IV is GMM with moment conditions \(E[z_i(y_i-x_i^T\beta)]=0\).

Efficient vs consistent: OLS is consistent and BLUE (Best Linear Unbiased) under the Gauss-Markov assumptions. Under heteroskedasticity, OLS remains consistent but is no longer efficient; GLS (Generalized Least Squares) is more efficient.

ML for nonlinear models: logit, probit, and Poisson regression all use MLE because no "natural" linear estimator exists. MLE is asymptotically efficient.
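A minimal simulated sketch (a hypothetical data-generating process, not from the text) contrasting OLS and IV when the regressor is endogenous:

```r
# IV as just-identified GMM: solve the sample analog of
# E[z * (y - a - b*x)] = 0. Here x is endogenous (correlated with u).
set.seed(21)
n <- 2000
z <- rnorm(n)                       # instrument
u <- rnorm(n)                       # structural error
x <- 0.8 * z + 0.5 * u + rnorm(n)   # endogenous regressor
y <- 1 + 2 * x + u                  # true slope = 2
b_ols <- unname(coef(lm(y ~ x))[2])   # biased upward in this setup
b_iv  <- cov(z, y) / cov(z, x)        # just-identified IV slope
c(OLS = b_ols, IV = b_iv)
```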


8 7. R Code: Estimation Methods

library(stats4)
library(bbmle)

# ============================================================
# METHOD OF MOMENTS vs MLE
# ============================================================
set.seed(42)
n <- 200
# Gamma(shape=3, rate=2): mean=1.5, var=0.75
x <- rgamma(n, shape=3, rate=2)

# Method of Moments for Gamma(alpha, beta):
# E[X] = alpha/beta => alpha_hat = mean^2/var
# Var[X] = alpha/beta^2 => beta_hat = mean/var
mean_x <- mean(x); var_x <- var(x)
alpha_mom <- mean_x^2 / var_x
beta_mom  <- mean_x / var_x
cat("=== METHOD OF MOMENTS ===\n")
cat(sprintf("alpha_hat = %.4f (true: 3)\n", alpha_mom))
cat(sprintf("beta_hat  = %.4f (true: 2)\n", beta_mom))

# MLE for Gamma
neg_loglik_gamma <- function(alpha, beta) {
  if(alpha <= 0 || beta <= 0) return(Inf)
  -sum(dgamma(x, shape=alpha, rate=beta, log=TRUE))
}

mle_result <- mle2(
  neg_loglik_gamma,
  start=list(alpha=alpha_mom, beta=beta_mom),
  method="L-BFGS-B",
  lower=c(alpha=1e-4, beta=1e-4)
)
cat("\n=== MLE ===\n")
print(summary(mle_result)@coef[, c("Estimate", "Std. Error")])

# ============================================================
# FISHER INFORMATION and CRAMER-RAO BOUND
# ============================================================
# For Normal(mu, sigma^2 known):
# I(mu) = n/sigma^2
# CR bound: Var(mu_hat) >= sigma^2/n

sigma2_known <- 4  # sigma^2 = 4
n_vals <- c(10, 50, 100, 500, 1000)
cr_bounds <- sigma2_known / n_vals

cat("\n=== CRAMER-RAO BOUND FOR NORMAL MEAN ===\n")
cat(sprintf("%-6s %-12s %-12s\n", "n", "CR Bound", "Var(Xbar)"))
for(i in seq_along(n_vals)) {
  n_i <- n_vals[i]
  sims_var <- var(replicate(2000, {
    xi <- rnorm(n_i, mean=0, sd=sqrt(sigma2_known))
    mean(xi)
  }))
  cat(sprintf("%-6d %-12.6f %-12.6f\n", n_i, cr_bounds[i], sims_var))
}

# ============================================================
# GMM: OLS as a special case
# ============================================================
set.seed(123)
n <- 300
X <- cbind(1, rnorm(n), rnorm(n))
beta_true <- c(2, 1.5, -0.8)
y <- X %*% beta_true + rnorm(n, sd=2)

# Moment conditions for OLS: g(beta) = X'(y - X*beta)
# Set sample analog to 0: X'(y - X*beta_hat) = 0
# Solution: beta_hat = (X'X)^{-1} X'y

beta_ols <- solve(t(X) %*% X) %*% t(X) %*% y
cat("\n=== OLS (GMM just-identified) ===\n")
cat("Estimated beta:", round(beta_ols, 4), "\n")
cat("True beta:     ", beta_true, "\n")

# Verify moment conditions are satisfied (should be ~0)
moment_conditions <- t(X) %*% (y - X %*% beta_ols)
cat("Moment conditions (should be ~0):", round(moment_conditions, 8), "\n")

9 Practice Problems

Problem 1: Unbiasedness check.

Prove that \(S^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2\) is an unbiased estimator of \(\sigma^2\).

Hint: write \(\sum(X_i-\bar{X})^2 = \sum X_i^2 - n\bar{X}^2\), then take expectations.
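A solution sketch following the hint (using \(E[X_i^2] = \sigma^2 + \mu^2\) and \(E[\bar{X}^2] = \sigma^2/n + \mu^2\)):

\[E\left[\sum(X_i-\bar{X})^2\right] = n(\sigma^2+\mu^2) - n\left(\frac{\sigma^2}{n}+\mu^2\right) = (n-1)\sigma^2\]

so \(E[S^2] = \frac{(n-1)\sigma^2}{n-1} = \sigma^2\).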

Problem 2: Fisher Information.

For \(X \sim \text{Bernoulli}(p)\):

- Compute the Fisher Information \(I(p)\)
- Find the Cramer-Rao lower bound for an estimator \(\hat{p}\) based on \(n\) observations
- Show that \(\hat{p} = \bar{X}\) is efficient

Answer: \(\log f(x|p) = x\log p + (1-x)\log(1-p)\). \(\frac{\partial \log f}{\partial p} = x/p - (1-x)/(1-p)\). \(I(p) = E\left[(x/p - (1-x)/(1-p))^2\right] = 1/(p(1-p))\). CR bound = \(p(1-p)/n\). \(\text{Var}(\bar{X}) = p(1-p)/n\) = CR bound, so \(\bar{X}\) is efficient!
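A quick simulation check of this answer (the values of \(p\) and \(n\) are arbitrary choices):

```r
# Check: Var(p_hat) for p_hat = Xbar matches the CR bound p(1-p)/n.
set.seed(5)
p <- 0.3; n <- 50
p_hat <- replicate(20000, mean(rbinom(n, size = 1, prob = p)))
c(simulated_var = var(p_hat), cr_bound = p * (1 - p) / n)
```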

Problem 3: Invariance of MLE.

Suppose \(X_i \sim N(\mu, \sigma^2)\) with MLEs \(\hat{\mu} = \bar{X}\) and \(\hat{\sigma}^2 = \frac{1}{n}\sum(X_i-\bar{X})^2\).

  • What is the MLE of \(\sigma\) (not \(\sigma^2\))?
  • What is the MLE of \(P(X > 3)\)?

Answer: by the invariance property:

- The MLE of \(\sigma\) is \(\sqrt{\hat{\sigma}^2}\)
- The MLE of \(P(X>3)\) is \(1 - \Phi((3-\hat{\mu})/\hat{\sigma})\)

Problem 4: GMM intuition.

Explain why, in IV estimation, the moment condition \(E[z_i(y_i - x_i^T\beta)] = 0\) is "the right" moment condition. What does it mean economically?

Answer: the condition states that the instrument \(z_i\) is uncorrelated with the error term at the true \(\beta\). Economically: a valid instrument must affect the outcome only through the endogenous regressor \(x_i\), not directly.