Asymptotic Theory

Why Large-Sample Inference Works

statistics
asymptotics
OLS
econometrics
The mathematical foundations of asymptotic econometrics: OLS consistency, asymptotic normality, the sandwich estimator, HC standard errors, and Newey-West.

Why Does This Matter?

Why This Matters for Your Work

Asymptotic theory is the mathematical foundation of econometrics. Without it, we have no justification for using t-stats and F-stats except when the errors are exactly normal (which rarely happens in real data).

Whenever you see in a paper:

  • “Under mild regularity conditions, the estimator is consistent…”
  • “Asymptotically, the test statistic follows a chi-squared distribution…”
  • “Using heteroskedasticity-robust standard errors…”

all of that is asymptotic theory. This chapter explains why these statements are true and when the approximation is good.

Questions you will be able to answer:

  • Why is OLS consistent? What assumptions are needed?
  • Why do t-stats “work” in large samples even with non-normal errors?
  • What exactly are “robust standard errors,” and when do you need them?
  • How does asymptotic efficiency differ from finite-sample efficiency?


1. OLS Consistency

Definition: OLS Consistency

Model: \(y = X\beta + \varepsilon\), where \(y\) and \(\varepsilon\) are \(n \times 1\) vectors and \(X\) is an \(n \times k\) matrix.

OLS estimator: \(\hat{\beta} = (X^TX)^{-1}X^Ty\)

Theorem: Under the conditions

  1. Exogeneity: \(E[\varepsilon_i | x_i] = 0\) (or the weaker \(E[x_i \varepsilon_i] = 0\))
  2. LLN for the design: \(\frac{1}{n}X^TX \xrightarrow{p} Q_{XX} = E[x_i x_i^T]\) (finite, positive definite)
  3. LLN for the score: \(\frac{1}{n}X^T\varepsilon \xrightarrow{p} 0\)

we have: \(\hat{\beta} \xrightarrow{p} \beta\)

Proof sketch:

\[\hat{\beta} = (X^TX)^{-1}X^Ty = (X^TX)^{-1}X^T(X\beta + \varepsilon) = \beta + (X^TX)^{-1}X^T\varepsilon\]

\[\hat{\beta} - \beta = \underbrace{\left(\frac{X^TX}{n}\right)^{-1}}_{\xrightarrow{p} Q_{XX}^{-1}} \cdot \underbrace{\frac{X^T\varepsilon}{n}}_{\xrightarrow{p} 0}\]

By Slutsky’s theorem (the plim of a product is the product of the plims):

\[\hat{\beta} - \beta \xrightarrow{p} Q_{XX}^{-1} \cdot 0 = 0 \Rightarrow \hat{\beta} \xrightarrow{p} \beta\]
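The two plims driving this argument can be checked numerically. Below is a standalone Python/numpy sketch (illustrative only, separate from the chapter's R code) using the same DGP, y = 2 + 1.5x + e, with exogenous errors:

```python
import numpy as np

rng = np.random.default_rng(0)

def plim_components(n):
    # Simulate y = 2 + 1.5*x + e and return the two sample moments
    # that drive the consistency proof, plus the OLS estimate.
    x = rng.standard_normal(n)
    e = rng.normal(scale=2.0, size=n)          # independent of x
    X = np.column_stack([np.ones(n), x])
    y = 2 + 1.5 * x + e
    XtX_n = X.T @ X / n                        # -> Q_XX (here the identity)
    Xte_n = X.T @ e / n                        # -> zero vector
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    return XtX_n, Xte_n, beta_hat

for n in (100, 100_000):
    XtX_n, Xte_n, beta_hat = plim_components(n)
    print(n, np.round(Xte_n, 4), np.round(beta_hat, 4))
```

As n grows, \(X^T\varepsilon/n\) collapses toward zero and \(\hat{\beta}\) settles near (2, 1.5), which is exactly the Slutsky argument above.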

Assumptions Behind Conditions 2 and 3

Condition 2 holds if \(\{x_i x_i^T\}\) is stationary and ergodic with finite second moments.

Condition 3 holds if \(E[x_i \varepsilon_i] = 0\) (exogeneity) and \(\{x_i\varepsilon_i\}\) satisfies an LLN.

What is NOT required: normality of the errors! OLS is consistent without any normality assumption. This is a major strength of asymptotic theory.


2. Asymptotic Normality of OLS

Definition: Asymptotic Normality of OLS

Under the consistency conditions plus a fourth condition:

  4. CLT for the score: \(\frac{1}{\sqrt{n}}X^T\varepsilon \xrightarrow{d} N(\mathbf{0}, S)\), where \(S = E[\varepsilon_i^2 x_i x_i^T]\)

we have: \[\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N\left(\mathbf{0},\ Q_{XX}^{-1} S Q_{XX}^{-1}\right)\]

Homoskedastic case (\(\text{Var}(\varepsilon_i|x_i) = \sigma^2\)): \(S = \sigma^2 Q_{XX}\), so: \[\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N(\mathbf{0},\ \sigma^2 Q_{XX}^{-1})\]

Proof sketch (homoskedastic case):

\[\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{X^TX}{n}\right)^{-1} \frac{1}{\sqrt{n}}X^T\varepsilon\]

By the CLT: \(\frac{1}{\sqrt{n}}\sum_i x_i\varepsilon_i \xrightarrow{d} N(0, E[x_i x_i^T \varepsilon_i^2]) = N(0, \sigma^2 Q_{XX})\) (under homoskedasticity)

By Slutsky (\((X^TX/n)^{-1} \xrightarrow{p} Q_{XX}^{-1}\)): \[\sqrt{n}(\hat{\beta}-\beta) \xrightarrow{d} Q_{XX}^{-1} \cdot N(0, \sigma^2 Q_{XX}) = N(0, \sigma^2 Q_{XX}^{-1})\]


3. The Sandwich Estimator (HC Standard Errors)

When the errors are heteroskedastic (\(\text{Var}(\varepsilon_i|x_i) = \sigma_i^2\)), the standard OLS SEs are not valid.

Definition: Sandwich Estimator (HC SE)

Estimate \(\text{Avar}(\hat{\beta}) = Q_{XX}^{-1} S Q_{XX}^{-1}\) using:

\[\hat{V}_{HC} = (X^TX)^{-1}\left(\sum_{i=1}^n \hat{\varepsilon}_i^2 x_i x_i^T\right)(X^TX)^{-1}\]

The “sandwich” form: \(\underbrace{(X^TX)^{-1}}_{\hat{Q}_{XX}^{-1}} \underbrace{\left(\sum \hat{\varepsilon}_i^2 x_i x_i^T\right)}_{\hat{S}} \underbrace{(X^TX)^{-1}}_{\hat{Q}_{XX}^{-1}}\)

Variants (finite-sample corrections):

  • HC0 (White 1980): \(\hat{S} = \sum_i \hat{\varepsilon}_i^2 x_i x_i^T\)
  • HC1: \(\frac{n}{n-k} \hat{S}_{HC0}\)
  • HC3: \(\sum_i \frac{\hat{\varepsilon}_i^2}{(1-h_{ii})^2} x_i x_i^T\), best for small \(n\); \(h_{ii}\) is the leverage

When should you use HC SEs?

  • Always with cross-sectional data (more conservative, and there is no cost if the errors turn out homoskedastic)
  • With panel data, use cluster-robust SEs
  • With time series, use HAC (Newey-West) SEs

Sandwich intuition: \((X^TX)^{-1}\) is the “bread” and \(\sum \hat{\varepsilon}_i^2 x_i x_i^T\) is the “meat”; sandwich = bread-meat-bread. The meat uses squared residuals to estimate the heteroskedasticity.
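The HC formulas can be implemented in a few lines. Here is an illustrative from-scratch Python/numpy sketch (separate from the R `sandwich` package used later) on a heteroskedastic design; the only difference between the HC variants is the weight on each squared residual in the meat:

```python
import numpy as np

rng = np.random.default_rng(2024)
n, k = 500, 2
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
e = rng.normal(scale=0.5 + 0.5 * np.abs(x))   # heteroskedastic errors
y = 2 + 1.5 * x + e

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
resid = y - X @ beta_hat
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii

def sandwich(meat_weights):
    """Bread-meat-bread: (X'X)^{-1} [sum_i w_i x_i x_i'] (X'X)^{-1}."""
    meat = (X * meat_weights[:, None]).T @ X
    return XtX_inv @ meat @ XtX_inv

def se(V):
    return np.sqrt(np.diag(V))

se_hc0 = se(sandwich(resid**2))                   # White (1980)
se_hc1 = se(sandwich(resid**2) * n / (n - k))     # d.f. correction
se_hc3 = se(sandwich(resid**2 / (1 - h)**2))      # leverage-adjusted
se_ols = se(resid @ resid / (n - k) * XtX_inv)    # classical, for contrast
print(se_ols, se_hc0, se_hc1, se_hc3)
```

With this DGP the robust slope SEs come out noticeably larger than the classical OLS SE, which is exactly the underestimation the sandwich corrects.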


4. Newey-West HAC Standard Errors

For time-series data with serial correlation:

Definition: HAC (Newey-West) Estimator

When \(\{x_t \varepsilon_t\}\) is autocorrelated (time series), we must estimate the long-run variance. The Newey-West estimator:

\[\hat{S}_{NW} = \hat{\Gamma}_0 + \sum_{j=1}^L \left(1 - \frac{j}{L+1}\right)(\hat{\Gamma}_j + \hat{\Gamma}_j^T)\]

where \(\hat{\Gamma}_j = \frac{1}{n}\sum_{t=j+1}^n \hat{\varepsilon}_t \hat{\varepsilon}_{t-j} x_t x_{t-j}^T\) is the lag-\(j\) autocovariance estimate.

\(L\) is the bandwidth (number of lags); a common automatic rule is \(L = \lfloor 4(n/100)^{2/9} \rfloor\).
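The formula translates directly into code. An illustrative from-scratch Python/numpy sketch (mirroring the estimator above, with AR(1) errors and the automatic bandwidth rule; the chapter's R code later does the same via `NeweyWest()`):

```python
import numpy as np

rng = np.random.default_rng(42)
n, phi = 300, 0.5
e = np.empty(n)                               # AR(1) errors
e[0] = rng.standard_normal()
for t in range(1, n):
    e[t] = phi * e[t - 1] + rng.standard_normal()
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 2 + 1.5 * x + e

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
u = (y - X @ beta_hat)[:, None] * X           # scores x_t * resid_t

L = int(4 * (n / 100) ** (2 / 9))             # automatic bandwidth rule
S = u.T @ u / n                               # Gamma_0
for j in range(1, L + 1):
    Gj = u[j:].T @ u[:-j] / n                 # lag-j autocovariance
    S += (1 - j / (L + 1)) * (Gj + Gj.T)      # Bartlett weight

V_nw = n * XtX_inv @ S @ XtX_inv              # (X'X/n)^{-1} S (X'X/n)^{-1} / n
se_nw = np.sqrt(np.diag(V_nw))
print(L, se_nw)
```

The Bartlett weights \(1 - j/(L+1)\) are what guarantee a positive semidefinite \(\hat{S}_{NW}\).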


5. Consistency vs. Efficiency

Definition: Consistency vs. Efficiency Trade-off

Consistency: \(\hat{\theta} \xrightarrow{p} \theta\); the estimator converges to the true value.

Efficiency: the estimator has minimum asymptotic variance within a given class of estimators.

Key distinctions:

  • OLS can be consistent but inefficient (under heteroskedasticity)
  • GLS is efficient but requires knowing \(\Omega = \text{Cov}(\varepsilon)\)
  • Feasible GLS (FGLS) is consistent and asymptotically efficient

Example: under heteroskedasticity, OLS is consistent and unbiased, but WLS (weighted least squares, a form of GLS) is more efficient: smaller SEs and narrower confidence intervals.
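The efficiency gap can be made concrete with a small Monte Carlo. An illustrative Python/numpy sketch, assuming a known skedastic function \(\sigma_i = 1 + |x_i|\) so the optimal WLS weights \(1/\sigma_i^2\) are available:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 2000
b_ols = np.empty(reps)
b_wls = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)
    sig = 1 + np.abs(x)                      # known skedastic function (assumed)
    y = 2 + 1.5 * x + rng.normal(scale=sig)
    X = np.column_stack([np.ones(n), x])
    b_ols[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    w = 1 / sig                              # sqrt of the weight 1/sigma_i^2
    b_wls[r] = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0][1]

print(b_ols.mean(), b_wls.mean())            # both near 1.5: consistent
print(b_ols.var(), b_wls.var())              # WLS sampling variance is smaller
```

In practice the skedastic function is unknown and must be estimated first, which is exactly what FGLS does.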


6. Asymptotic Efficiency: Cramér-Rao in Large Samples

Definition: Asymptotic Efficiency

An estimator \(\hat{\theta}_n\) is asymptotically efficient if, among all consistent, asymptotically normal estimators, it has the minimum asymptotic variance.

Key result: the MLE is asymptotically efficient; it attains the inverse Fisher information: \[\sqrt{n}(\hat{\theta}_{MLE} - \theta) \xrightarrow{d} N(0, I(\theta)^{-1})\]

OLS is asymptotically efficient only within the class of linear estimators under homoskedasticity (an asymptotic version of Gauss-Markov).
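A quick sanity check of the information bound, using the Bernoulli(\(p\)) model, where the MLE is the sample mean and \(I(p) = 1/[p(1-p)]\) (an illustrative Python sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 400, 20_000
p_hat = rng.binomial(n, p, size=reps) / n    # Bernoulli MLE: the sample mean
emp_var = p_hat.var()                        # empirical sampling variance
crlb = p * (1 - p) / n                       # I(p)^{-1} / n, the CR bound
print(emp_var, crlb)                         # should be close
```

The empirical variance of the MLE sits right at the Cramér-Rao bound, as the asymptotic theory predicts.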


7. Worked Example: OLS Consistency via Simulation

We simulate to show that:

  1. OLS is consistent (bias and variance shrink to 0 as \(n\) grows)
  2. Under endogeneity, OLS is not consistent

set.seed(2024)

# ============================================================
# CASE 1: OLS consistent (exogeneity holds)
# ============================================================
simulate_ols_consistency <- function(n, n_sims=1000) {
  betas <- replicate(n_sims, {
    x <- rnorm(n)
    e <- rnorm(n, sd=2)  # Independent of x
    y <- 2 + 1.5*x + e   # True beta = 1.5
    lm(y~x)$coef[2]
  })
  return(betas)
}

ns <- c(30, 100, 500, 2000)
cat("=== OLS CONSISTENCY UNDER EXOGENEITY ===\n")
cat(sprintf("%-6s %-10s %-10s %-10s\n", "n", "Bias", "Var", "MSE"))
for(n in ns) {
  betas <- simulate_ols_consistency(n)
  bias_n <- mean(betas) - 1.5
  var_n <- var(betas)
  mse_n <- bias_n^2 + var_n
  cat(sprintf("%-6d %-10.5f %-10.5f %-10.5f\n", n, bias_n, var_n, mse_n))
}

# ============================================================
# CASE 2: OLS inconsistent (endogeneity)
# ============================================================
simulate_ols_endogeneity <- function(n, n_sims=1000) {
  betas <- replicate(n_sims, {
    u <- rnorm(n)          # Unobserved confounder
    x <- 0.5*u + rnorm(n) # x correlated with error!
    e <- 0.7*u + rnorm(n) # error contains u
    y <- 2 + 1.5*x + e    # True beta = 1.5 still
    lm(y~x)$coef[2]
  })
  return(betas)
}

cat("\n=== OLS INCONSISTENCY UNDER ENDOGENEITY ===\n")
cat(sprintf("%-6s %-10s (True beta = 1.5)\n", "n", "Mean(beta_hat)"))
for(n in ns) {
  betas <- simulate_ols_endogeneity(n)
  cat(sprintf("%-6d %-10.5f\n", n, mean(betas)))
}
cat("OLS does not converge to 1.5! plim(beta_hat) = 1.5 + Cov(x,e)/Var(x) = 1.5 + 0.35/1.25 = 1.78.\n")

# ============================================================
# CASE 3: Asymptotic normality — verify CLT applies
# ============================================================
n <- 200; n_sims <- 5000; true_beta <- 1.5
betas <- replicate(n_sims, {
  x <- rnorm(n)
  e <- rexp(n) - 1  # Non-normal! But CLT should still hold
  y <- 2 + true_beta*x + e
  coef(lm(y~x))[2]
})

# Standardize: (beta_hat - beta) / [sigma_e / (sigma_x * sqrt(n))]
sigma_e <- 1  # Var(Exp(1) - 1) = Var(Exp(1)) = 1
sigma_x <- 1  # Var(N(0,1)) = 1
asymp_sd <- sigma_e / (sigma_x * sqrt(n))

z_standardized <- (betas - true_beta) / asymp_sd

# Test normality of standardized betas
cat("\n=== ASYMPTOTIC NORMALITY CHECK ===\n")
cat("Error distribution: Exponential (non-normal!)\n")
cat(sprintf("Mean of z:  %.4f (expect 0)\n", mean(z_standardized)))
cat(sprintf("Var of z:   %.4f (expect 1)\n", var(z_standardized)))
cat(sprintf("Skewness:   %.4f (expect 0 for normal)\n",
            mean(z_standardized^3)))
cat(sprintf("Shapiro-Wilk p-value: %.4f\n",
            shapiro.test(sample(z_standardized, 5000))$p.value))

hist(z_standardized, probability=TRUE, breaks=50,
     main="Standardized OLS Beta: Non-Normal Errors, n=200",
     xlab="z = sqrt(n)*(b_hat - b) / sigma")
curve(dnorm(x), add=TRUE, col="red", lwd=2)

8. Asymptotics in Practice: FAQ

Q: How large must \(n\) be for asymptotic approximations to work?

It depends on the problem:

  • Simple regression with roughly normal data: \(n \geq 30\) is usually enough
  • With severe heteroskedasticity or skewed distributions: \(n \geq 100\)
  • Complex nonlinear models (probit, Poisson): \(n \geq 200\)
  • When in doubt, run a simulation to verify!

Q: If \(n\) is small, what should you do?

  • Exact methods (exact t-test, Fisher's exact test for 2x2 tables)
  • Bootstrap for CIs
  • Bayesian inference with informative priors

Q: When should you use robust SEs?

  • Cross-sectional data: always use HC3
  • Panel data: cluster-robust (cluster on the individual)
  • Time series: HAC (Newey-West)
  • RCTs with clustering: cluster-robust, clustered at the unit-of-randomization level

9. Connections to Applied Econometrics

Connection: Asymptotic Theory in Econometrics

Robust inference in Stata/R: robust, vce(cluster), vcov=vcovHC() all implement the sandwich estimator.

Asymptotic test equivalence: the LR, Wald, and Score tests are asymptotically equivalent (same power against local alternatives). In finite samples they can differ; the LR test is usually the most reliable.

Panel data: consistency of the FE estimator requires \(T \to \infty\) (or \(n \to \infty\) with \(T\) fixed under strict exogeneity). The “incidental parameters problem” arises for nonlinear FE models when \(T\) is fixed and \(n \to \infty\).

Instrumental variables: IV consistency requires \(E[z_i \varepsilon_i] = 0\) (instrument exogeneity) and \(E[z_i x_i] \neq 0\) (instrument relevance). Weak instruments (\(E[z_i x_i] \approx 0\)) cause poor finite-sample performance even when the estimator is asymptotically consistent.

Nonparametric/semiparametric: convergence rates are often \(n^{-r}\) with \(r < 1/2\), slower than the parametric \(n^{-1/2}\) rate. Bandwidth selection is crucial.
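The IV conditions above can be illustrated with a small simulated example (a Python sketch with a hypothetical DGP; the IV estimator is the just-identified Cov(z, y)/Cov(z, x) form):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
z = rng.standard_normal(n)                  # instrument: exogenous, relevant
u = rng.standard_normal(n)                  # unobserved confounder
x = 0.8 * z + 0.5 * u + rng.standard_normal(n)
e = 0.7 * u + rng.standard_normal(n)        # error correlated with x via u
y = 2 + 1.5 * x + e                         # true beta = 1.5

b_ols = np.cov(x, y)[0, 1] / x.var()        # inconsistent: picks up Cov(x,e)
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(b_ols, b_iv)                          # OLS biased upward, IV near 1.5
```

With a weak instrument (the coefficient on z near 0), the denominator Cov(z, x) approaches zero and the IV estimate becomes erratic in finite samples, even though the estimator remains consistent.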


10. R Code: HC and HAC Standard Errors

library(sandwich)
library(lmtest)

# ============================================================
# COMPARE: OLS, HC, Cluster-Robust SEs
# ============================================================
set.seed(2024)
n <- 500

# Generate heteroskedastic data
x <- rnorm(n)
sigma_i <- 0.5 + 0.5*abs(x)  # Heteroskedasticity
e <- rnorm(n, sd=sigma_i)
y <- 2 + 1.5*x + e

model <- lm(y ~ x)

# Different SE estimates
se_ols  <- sqrt(diag(vcov(model)))                       # Standard OLS SE
se_hc0  <- sqrt(diag(vcovHC(model, type="HC0")))         # White 1980
se_hc1  <- sqrt(diag(vcovHC(model, type="HC1")))         # With n/(n-k)
se_hc3  <- sqrt(diag(vcovHC(model, type="HC3")))         # Recommended

cat("=== COMPARISON OF STANDARD ERRORS ===\n")
cat(sprintf("%-12s %-12s %-12s\n", "Type", "SE(intercept)", "SE(slope)"))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "OLS",  se_ols[1],  se_ols[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "HC0",  se_hc0[1],  se_hc0[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "HC1",  se_hc1[1],  se_hc1[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "HC3",  se_hc3[1],  se_hc3[2]))

# ============================================================
# HAC STANDARD ERRORS FOR TIME SERIES
# ============================================================
# Generate AR(1) process
set.seed(42)
n_ts <- 300
phi <- 0.5  # Autocorrelation

# AR(1) error
e_ts <- numeric(n_ts)
e_ts[1] <- rnorm(1)
for(t in 2:n_ts) e_ts[t] <- phi*e_ts[t-1] + rnorm(1)

x_ts <- rnorm(n_ts)
y_ts <- 2 + 1.5*x_ts + e_ts

model_ts <- lm(y_ts ~ x_ts)

# Standard vs HAC SEs
se_standard_ts <- sqrt(diag(vcov(model_ts)))
se_hac <- sqrt(diag(vcovHAC(model_ts)))  # Automatic bandwidth
se_nw  <- sqrt(diag(NeweyWest(model_ts))) # Newey-West

cat("\n=== TIME SERIES: OLS vs HAC SEs ===\n")
cat("(True beta_x = 1.5, AR(1) errors with phi =", phi, ")\n")
cat(sprintf("%-12s %-12s %-12s\n", "Type", "SE(intercept)", "SE(slope)"))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "OLS Standard", se_standard_ts[1], se_standard_ts[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "HAC (auto)",   se_hac[1], se_hac[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "Newey-West",   se_nw[1], se_nw[2]))

# ============================================================
# MONTE CARLO: Coverage of different CIs
# ============================================================
n_mc <- 2000; n <- 100; true_b <- 1.5
coverage_ols <- 0; coverage_hc <- 0

for(i in 1:n_mc) {
  x_mc <- rnorm(n)
  sigma_mc <- 0.5 + 0.5*abs(x_mc)
  e_mc <- rnorm(n, sd=sigma_mc)
  y_mc <- 2 + true_b*x_mc + e_mc
  m <- lm(y_mc ~ x_mc)

  # OLS CI
  ci_ols <- confint(m)[2,]
  coverage_ols <- coverage_ols + (ci_ols[1] <= true_b & true_b <= ci_ols[2])

  # HC3 CI
  se_mc <- sqrt(vcovHC(m, type="HC3")[2,2])
  b_mc <- coef(m)[2]
  ci_hc3 <- b_mc + c(-1,1) * qt(0.975, n-2) * se_mc
  coverage_hc <- coverage_hc + (ci_hc3[1] <= true_b & true_b <= ci_hc3[2])
}

cat("\n=== COVERAGE UNDER HETEROSKEDASTICITY ===\n")
cat(sprintf("OLS CI coverage:     %.1f%% (nominal: 95%%)\n",
            100*coverage_ols/n_mc))
cat(sprintf("HC3 CI coverage:     %.1f%% (nominal: 95%%)\n",
            100*coverage_hc/n_mc))

Practice Problems

Problem 1: Consistency check.

Suppose the true model is \(y_i = \alpha + \beta x_i + \gamma z_i + \varepsilon_i\) but we run \(y_i = \alpha + \beta x_i + \eta_i\) (omitting \(z_i\)).

  • What is the plim of \(\hat{\beta}_{OLS}\) from the misspecified model?
  • When is \(\hat{\beta}\) still consistent for \(\beta\)?

Answer: \(\text{plim}(\hat{\beta}) = \beta + \gamma \cdot \frac{\text{Cov}(x_i, z_i)}{\text{Var}(x_i)}\)

It is consistent only if \(\gamma = 0\) (the omitted variable does not affect \(y\)) or \(\text{Cov}(x_i, z_i) = 0\) (the omitted variable is uncorrelated with the included one).
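The plim formula in the answer can be verified numerically. An illustrative Python sketch with an assumed DGP in which \(\text{Cov}(x, z) = 0.6\) and \(\text{Var}(x) = 1.36\):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
beta, gamma = 2.0, 0.8
z = rng.standard_normal(n)
x = 0.6 * z + rng.standard_normal(n)    # Cov(x, z) = 0.6, Var(x) = 1.36
y = 1.0 + beta * x + gamma * z + rng.standard_normal(n)

M = np.cov(x, y)
b_short = M[0, 1] / M[0, 0]             # slope from the short regression
plim_b = beta + gamma * 0.6 / 1.36      # the omitted-variable-bias formula
print(b_short, plim_b)
```

The short-regression slope lands on the formula's value, about 2.353 here, not on the true \(\beta = 2\).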

Problem 2: Sandwich estimator derivation.

Show that under homoskedasticity (\(\text{Var}(\varepsilon_i) = \sigma^2\)), the sandwich estimator simplifies to the standard OLS variance.

Hint: \(\sum_i \hat{\varepsilon}_i^2 x_i x_i^T \approx \sigma^2 X^TX\) for large \(n\).

Problem 3: Asymptotic distribution under misspecification.

OLS with heteroskedasticity: \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\) where \(E[\varepsilon_i^2 | x_i] = \sigma_i^2 = x_i^2\).

  • Write down the asymptotic variance of \(\hat{\beta}_1\) (sandwich form)
  • What goes wrong if we use the standard OLS SE?
  • Verify numerically via simulation

Problem 4: HAC bandwidth selection.

For the AR(1) process \(\varepsilon_t = \phi\varepsilon_{t-1} + u_t\) with \(\phi = 0.8\), compute the optimal HAC bandwidth using Andrews' (1991) automatic selection formula. Compare Newey-West SEs with bandwidths \(L = 1, 5, 10\).