Asymptotic Theory
Why Large-Sample Inference Works
1 Why Does This Matter?
Asymptotic theory is the mathematical foundation of econometrics. Without it, we have no justification for using t-stats and F-stats except when the errors are exactly normal (which rarely happens in real data).
Every time you see in a paper:
- “Under mild regularity conditions, the estimator is consistent…”
- “Asymptotically, the test statistic follows a chi-squared distribution…”
- “Using heteroskedasticity-robust standard errors…”
All of that is asymptotic theory. This chapter gives you an understanding of why it holds and when the approximation is good.
Questions you will be able to answer:
- Why is OLS consistent? What assumptions are needed?
- Why do t-stats “work” in large samples even with non-normal errors?
- What exactly are “robust standard errors,” and when do you need them?
- What is the difference between asymptotic efficiency and finite-sample efficiency?
2 1. Consistency of OLS
Model: \(y = X\beta + \varepsilon\), where \(y\) and \(\varepsilon\) are \(n \times 1\) vectors and \(X\) is an \(n \times k\) matrix.
OLS estimator: \(\hat{\beta} = (X^TX)^{-1}X^Ty\)
Theorem: Under the conditions:
1. Exogeneity: \(E[\varepsilon_i | x_i] = 0\) (or the weak form \(E[x_i \varepsilon_i] = 0\))
2. LLN applies: \(\frac{1}{n}X^TX \xrightarrow{p} Q_{XX} = E[x_i x_i^T]\) (finite, positive definite)
3. LLN applies: \(\frac{1}{n}X^T\varepsilon \xrightarrow{p} 0\)
Then: \(\hat{\beta} \xrightarrow{p} \beta\)
Proof sketch:
\[\hat{\beta} = (X^TX)^{-1}X^Ty = (X^TX)^{-1}X^T(X\beta + \varepsilon) = \beta + (X^TX)^{-1}X^T\varepsilon\]
\[\hat{\beta} - \beta = \underbrace{\left(\frac{X^TX}{n}\right)^{-1}}_{\xrightarrow{p} Q_{XX}^{-1}} \cdot \underbrace{\frac{X^T\varepsilon}{n}}_{\xrightarrow{p} 0}\]
By Slutsky’s theorem (product rule for plims):
\[\hat{\beta} - \beta \xrightarrow{p} Q_{XX}^{-1} \cdot 0 = 0 \Rightarrow \hat{\beta} \xrightarrow{p} \beta\]
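The decomposition above is easy to check numerically. Here is a minimal base-R sketch on simulated data (the function name and all numbers are my own, illustrative choices): \((X^TX/n)\) stabilizes while \((X^T\varepsilon/n)\) shrinks, so the estimation error vanishes as \(n\) grows.

```r
# Check beta_hat - beta = (X'X/n)^{-1} (X'e/n) on simulated data
set.seed(1)
check_decomposition <- function(n) {
  X    <- cbind(1, rnorm(n))       # intercept + one regressor
  eps  <- rt(n, df = 5)            # non-normal, mean-zero errors
  beta <- c(2, 1.5)
  y    <- X %*% beta + eps
  XtX_n <- crossprod(X) / n        # -> Q_XX (finite, positive definite)
  Xte_n <- crossprod(X, eps) / n   # -> 0 by the LLN
  bhat  <- solve(XtX_n, crossprod(X, y) / n)
  max(abs(bhat - beta))            # estimation error
}
sapply(c(100, 10000, 1000000), check_decomposition)
# the error shrinks roughly like 1/sqrt(n)
```

Note that the errors are deliberately t-distributed: consistency does not need normality, as the next subsection stresses.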
2.1 Assumptions behind Conditions 2 and 3
Condition 2 holds if \(\{x_i x_i^T\}\) is stationary and ergodic with finite second moments.
Condition 3 holds if \(E[x_i \varepsilon_i] = 0\) (exogeneity) and \(\{x_i\varepsilon_i\}\) satisfies an LLN.
What is NOT needed: normality of the errors! OLS is consistent without any normality assumption. This is a major strength of asymptotic theory.
3 2. Asymptotic Normality of OLS
In addition to the consistency conditions, assume:
4. CLT applies: \(\frac{1}{\sqrt{n}}X^T\varepsilon \xrightarrow{d} N(\mathbf{0}, S)\), where \(S = E[\varepsilon_i^2 x_i x_i^T]\)
Then: \[\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N\left(\mathbf{0},\ Q_{XX}^{-1} S Q_{XX}^{-1}\right)\]
Homoskedastic case (\(\text{Var}(\varepsilon_i|x_i) = \sigma^2\)): here \(S = \sigma^2 Q_{XX}\), so: \[\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N(\mathbf{0},\ \sigma^2 Q_{XX}^{-1})\]
Proof sketch (homoskedastic case):
\[\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{X^TX}{n}\right)^{-1} \frac{1}{\sqrt{n}}X^T\varepsilon\]
By CLT: \(\frac{1}{\sqrt{n}}\sum_i x_i\varepsilon_i \xrightarrow{d} N(0, E[x_i x_i^T \varepsilon_i^2]) = N(0, \sigma^2 Q_{XX})\) (under homoskedasticity)
By Slutsky (\((X^TX/n)^{-1} \xrightarrow{p} Q_{XX}^{-1}\)): \[\sqrt{n}(\hat{\beta}-\beta) \xrightarrow{d} Q_{XX}^{-1} \cdot N(0, \sigma^2 Q_{XX}) = N(0, \sigma^2 Q_{XX}^{-1})\]
4 3. The Sandwich Estimator (HC Standard Errors)
When the errors are heteroskedastic (\(\text{Var}(\varepsilon_i|x_i) = \sigma_i^2\)), the standard OLS SEs are not valid.
Estimate \(\text{Avar}(\hat{\beta}) = Q_{XX}^{-1} S Q_{XX}^{-1}\) (scaled by \(1/n\) to get \(\text{Var}(\hat{\beta})\)) using:
\[\hat{V}_{HC} = (X^TX)^{-1}\left(\sum_{i=1}^n \hat{\varepsilon}_i^2 x_i x_i^T\right)(X^TX)^{-1}\]
Bentuk “sandwich”: \(\underbrace{(X^TX)^{-1}}_{\hat{Q}_{XX}^{-1}} \underbrace{\left(\sum \hat{\varepsilon}_i^2 x_i x_i^T\right)}_{\hat{S}} \underbrace{(X^TX)^{-1}}_{\hat{Q}_{XX}^{-1}}\)
Variants (finite-sample corrections):
- HC0 (White 1980): \(\hat{S} = \sum_i \hat{\varepsilon}_i^2 x_i x_i^T\)
- HC1: \(\frac{n}{n-k} \hat{S}_{HC0}\)
- HC3: \(\sum_i \frac{\hat{\varepsilon}_i^2}{(1-h_{ii})^2} x_i x_i^T\), best for small \(n\), where \(h_{ii}\) is the leverage of observation \(i\)
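To make the formulas concrete, here is a hand-rolled version of HC0/HC1/HC3 in base R on simulated data (variable names are mine; the results should agree with sandwich::vcovHC with the corresponding type argument):

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 + abs(x))   # heteroskedastic errors
fit  <- lm(y ~ x)
X    <- model.matrix(fit)
k    <- ncol(X)
ehat <- residuals(fit)
bread <- solve(crossprod(X))                   # (X'X)^{-1}
h     <- diag(X %*% bread %*% t(X))            # leverages h_ii
meat_hc0 <- crossprod(X * ehat)                # sum e_i^2 x_i x_i'
meat_hc3 <- crossprod(X * (ehat / (1 - h)))    # sum e_i^2/(1-h_ii)^2 x_i x_i'
V_hc0 <- bread %*% meat_hc0 %*% bread
V_hc1 <- n / (n - k) * V_hc0
V_hc3 <- bread %*% meat_hc3 %*% bread
rbind(HC0 = sqrt(diag(V_hc0)),
      HC1 = sqrt(diag(V_hc1)),
      HC3 = sqrt(diag(V_hc3)))
```

By construction HC1 and HC3 inflate HC0, so their SEs are (weakly) larger; the differences matter most in small samples.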
When should you use HC SEs?
- Always in cross-sectional data (more conservative, and essentially costless if the errors turn out to be homoskedastic)
- In panel data, use cluster-robust SEs
- In time series, use HAC (Newey-West) SEs
Sandwich intuition: \((X^TX)^{-1}\) is the “bread” and \(\sum \hat{\varepsilon}_i^2 x_i x_i^T\) is the “meat”; sandwich = bread-meat-bread. The meat uses squared residuals to estimate the heteroskedasticity.
5 4. Newey-West HAC Standard Errors
For time series data with serial correlation:
When \(\{x_i \varepsilon_i\}\) is autocorrelated (time series), we need to estimate the long-run variance. The Newey-West estimator:
\[\hat{S}_{NW} = \hat{\Gamma}_0 + \sum_{j=1}^L \left(1 - \frac{j}{L+1}\right)(\hat{\Gamma}_j + \hat{\Gamma}_j^T)\]
where \(\hat{\Gamma}_j = \frac{1}{n}\sum_{i=j+1}^n \hat{\varepsilon}_i \hat{\varepsilon}_{i-j} x_i x_{i-j}^T\) is the lag-\(j\) autocovariance estimate.
\(L\) is the bandwidth (number of lags); a typical automatic rule is \(L = \lfloor 4(n/100)^{2/9} \rfloor\).
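The formula translates almost line-for-line into base R. Below is a sketch of my own helper (not a library function; for real work use sandwich::NeweyWest), applied to simulated data where both the regressor and the errors follow AR(1) processes:

```r
# Newey-West covariance for an lm fit: Bartlett weights, bandwidth L
newey_west_vcov <- function(fit, L) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  n <- nrow(X)
  g <- X * u                        # row t is x_t * e_t
  S <- crossprod(g) / n             # Gamma_0
  for (j in seq_len(L)) {
    Gj <- crossprod(g[(j + 1):n, , drop = FALSE],
                    g[1:(n - j), , drop = FALSE]) / n   # lag-j autocovariance
    S  <- S + (1 - j / (L + 1)) * (Gj + t(Gj))          # Bartlett weight
  }
  Q_inv <- solve(crossprod(X) / n)
  Q_inv %*% S %*% Q_inv / n         # Var(beta_hat) = Avar / n
}

set.seed(1)
n <- 500
x <- as.numeric(stats::filter(rnorm(n), 0.8, method = "recursive"))  # AR(1) x
e <- as.numeric(stats::filter(rnorm(n), 0.8, method = "recursive"))  # AR(1) errors
y <- 2 + 1.5 * x + e
fit <- lm(y ~ x)
L <- floor(4 * (n / 100)^(2/9))          # automatic rule above: L = 5 here
sqrt(diag(newey_west_vcov(fit, L)))      # HAC SEs; typically exceed the OLS SEs
```

Note that serial correlation in \(x_t \varepsilon_t\) needs autocorrelation in both \(x_t\) and \(\varepsilon_t\); with iid regressors the naive SEs would not be badly off even under AR(1) errors.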
6 5. Consistency vs Efficiency
Consistency: \(\hat{\theta} \xrightarrow{p} \theta\), the estimator converges to the true value.
Efficiency: having minimum asymptotic variance within a given class of estimators.
Key distinctions:
- OLS can be consistent but inefficient (under heteroskedasticity)
- GLS is efficient but requires knowing \(\Omega = \text{Cov}(\varepsilon)\)
- Feasible GLS (FGLS) is consistent and asymptotically efficient
Example: under heteroskedasticity, OLS is consistent and unbiased, but WLS (Weighted Least Squares, a form of GLS) is more efficient: smaller SEs and narrower confidence intervals.
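A quick Monte Carlo sketch of that claim, with a skedastic function I made up and treat as known (in practice the weights must be estimated, which is what FGLS does):

```r
set.seed(1)
one_draw <- function(n = 200) {
  x <- rnorm(n)
  s <- 0.5 + abs(x)                        # sd depends on x (known here)
  y <- 1 + 2 * x + rnorm(n, sd = s)
  c(coef(lm(y ~ x))[2],                    # OLS slope
    coef(lm(y ~ x, weights = 1 / s^2))[2]) # WLS slope, weights 1/sigma_i^2
}
draws <- replicate(2000, one_draw())
rowMeans(draws)          # both close to the true slope 2 (consistent)
apply(draws, 1, var)     # WLS sampling variance is smaller (more efficient)
```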
7 6. Asymptotic Efficiency: Cramer-Rao in Large Samples
An estimator \(\hat{\theta}_n\) is asymptotically efficient if, among all consistent and asymptotically normal estimators, it has the minimum asymptotic variance.
Key result: the MLE is asymptotically efficient; it attains the inverse Fisher information: \[\sqrt{n}(\hat{\theta}_{MLE} - \theta) \xrightarrow{d} N(0, I(\theta)^{-1})\]
OLS is asymptotically efficient only within the class of linear estimators under homoskedasticity (the asymptotic version of Gauss-Markov).
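A classic illustration: for \(N(\mu, 1)\) data the sample mean is the MLE and attains the Cramer-Rao bound \(\text{Var} = 1/n\), while the sample median is also consistent and asymptotically normal but has asymptotic variance \(\pi/(2n)\), about 57% larger. A quick simulation (numbers here are illustrative):

```r
set.seed(1)
n <- 400
ests <- replicate(5000, {
  x <- rnorm(n, mean = 3)
  c(mean(x), median(x))    # both consistent for mu = 3
})
n * apply(ests, 1, var)    # approx 1 (mean, the bound) and pi/2 = 1.57 (median)
```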
8 7. Worked Example: OLS Consistency via Simulation
We simulate to show:
1. OLS is consistent (bias and variance go to 0 as \(n\) increases)
2. With endogeneity, OLS is not consistent
set.seed(2024)
# ============================================================
# CASE 1: OLS consistent (exogeneity holds)
# ============================================================
simulate_ols_consistency <- function(n, n_sims=1000) {
betas <- replicate(n_sims, {
x <- rnorm(n)
e <- rnorm(n, sd=2) # Independent of x
y <- 2 + 1.5*x + e # True beta = 1.5
lm(y~x)$coef[2]
})
return(betas)
}
ns <- c(30, 100, 500, 2000)
cat("=== OLS CONSISTENCY UNDER EXOGENEITY ===\n")
cat(sprintf("%-6s %-10s %-10s %-10s\n", "n", "Bias", "Var", "MSE"))
for(n in ns) {
betas <- simulate_ols_consistency(n)
bias_n <- mean(betas) - 1.5
var_n <- var(betas)
mse_n <- bias_n^2 + var_n
cat(sprintf("%-6d %-10.5f %-10.5f %-10.5f\n", n, bias_n, var_n, mse_n))
}
# ============================================================
# CASE 2: OLS inconsistent (endogeneity)
# ============================================================
simulate_ols_endogeneity <- function(n, n_sims=1000) {
betas <- replicate(n_sims, {
u <- rnorm(n) # Unobserved confounder
x <- 0.5*u + rnorm(n) # x correlated with error!
e <- 0.7*u + rnorm(n) # error contains u
y <- 2 + 1.5*x + e # True beta = 1.5 still
lm(y~x)$coef[2]
})
return(betas)
}
cat("\n=== OLS INCONSISTENCY UNDER ENDOGENEITY ===\n")
cat(sprintf("%-6s %-10s (True beta = 1.5)\n", "n", "Mean(beta_hat)"))
for(n in ns) {
betas <- simulate_ols_endogeneity(n)
cat(sprintf("%-6d %-10.5f\n", n, mean(betas)))
}
cat("OLS does not converge to 1.5! plim = 1.5 + Cov(x,e)/Var(x) = 1.5 + 0.35/1.25 = 1.78\n")
# ============================================================
# CASE 3: Asymptotic normality — verify CLT applies
# ============================================================
n <- 200; n_sims <- 5000; true_beta <- 1.5
betas <- replicate(n_sims, {
x <- rnorm(n)
e <- rexp(n) - 1 # Non-normal! But CLT should still hold
y <- 2 + true_beta*x + e
coef(lm(y~x))[2]
})
# Standardize: sqrt(n) * (beta_hat - beta) / asymptotic_sd
sigma_e <- 1 # Var(Exp(1) - 1) = Var(Exp(1)) = 1
sigma_x <- 1 # Var(N(0,1)) = 1
asymp_sd <- sigma_e / (sigma_x * sqrt(n))
z_standardized <- (betas - true_beta) / asymp_sd
# Test normality of standardized betas
cat("\n=== ASYMPTOTIC NORMALITY CHECK ===\n")
cat("Error distribution: Exponential (non-normal!)\n")
cat(sprintf("Mean of z: %.4f (expect 0)\n", mean(z_standardized)))
cat(sprintf("Var of z: %.4f (expect 1)\n", var(z_standardized)))
cat(sprintf("Skewness: %.4f (expect 0 for normal)\n",
mean(z_standardized^3)))
cat(sprintf("Shapiro-Wilk p-value: %.4f\n",
shapiro.test(sample(z_standardized, 5000))$p.value))
hist(z_standardized, probability=TRUE, breaks=50,
main="Standardized OLS Beta: Non-Normal Errors, n=200",
xlab="z = sqrt(n)*(b_hat - b) / sigma")
curve(dnorm(x), add=TRUE, col="red", lwd=2)
9 8. Asymptotics in Practice: FAQ
Q: How large must \(n\) be for asymptotic approximations to be reliable?
It depends on the problem:
- Simple regression with roughly normal data: \(n \geq 30\) is usually enough
- With severe heteroskedasticity or skewed distributions: \(n \geq 100\)
- Complex nonlinear models (probit, Poisson): \(n \geq 200\)
- When in doubt, run a simulation to verify!
Q: What should you do when \(n\) is small?
- Exact methods (exact t-test, Fisher’s exact test for 2x2 tables)
- Bootstrap CIs
- Bayesian inference with informative priors
Q: When should you use robust SEs?
- Cross-sectional data: always use HC3
- Panel data: cluster-robust (clustered on individual)
- Time series: HAC (Newey-West)
- RCTs with clustering: cluster-robust, clustered at the unit-of-randomization level
10 9. Connections to Applied Econometrics
Robust inference in Stata/R: robust, vce(cluster), and vcovHC() all implement the sandwich estimator.
Asymptotically equivalent tests: LR, Wald, and Score tests are asymptotically equivalent (same power against local alternatives). In finite samples they can differ; LR is usually the most reliable.
Panel data: FE estimator consistency requires \(T \to \infty\) (or \(n \to \infty\) with \(T\) fixed under strict exogeneity). The “incidental parameters problem” arises when \(T\) is fixed and \(n \to \infty\) in nonlinear FE models.
Instrumental variables: IV consistency requires \(E[z_i \varepsilon_i] = 0\) (instrument exogeneity) and \(E[z_i x_i] \neq 0\) (instrument relevance). Weak instruments (\(E[z_i x_i] \approx 0\)) cause poor finite-sample performance even though the estimator is consistent asymptotically.
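The weak-instrument point is easy to see with the simple (Wald) IV ratio \(\text{Cov}(z,y)/\text{Cov}(z,x)\) on simulated data. A sketch with made-up coefficients (the helper name is mine):

```r
set.seed(1)
n <- 5000
u <- rnorm(n)                               # unobserved confounder
iv_demo <- function(strength) {
  z <- rnorm(n)                             # instrument, independent of u
  x <- strength * z + 0.5 * u + rnorm(n)    # relevance controlled by strength
  y <- 2 + 1.5 * x + 0.7 * u + rnorm(n)     # true beta = 1.5
  c(ols = cov(x, y) / var(x),               # biased: x is correlated with u
    iv  = cov(z, y) / cov(z, x))            # consistent if strength != 0
}
iv_demo(1.0)    # strong instrument: iv near 1.5, ols biased upward
iv_demo(0.02)   # weak instrument: iv is erratic even at n = 5000
```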
Nonparametric/semiparametric methods: convergence rates are often \(n^{-r}\) with \(r < 1/2\), slower than the parametric \(n^{-1/2}\) rate. Bandwidth selection is crucial.
11 10. R Code: HC and HAC Standard Errors
library(sandwich)
library(lmtest)
# ============================================================
# COMPARE: OLS, HC, Cluster-Robust SEs
# ============================================================
set.seed(2024)
n <- 500
# Generate heteroskedastic data
x <- rnorm(n)
sigma_i <- 0.5 + 0.5*abs(x) # Heteroskedasticity
e <- rnorm(n, sd=sigma_i)
y <- 2 + 1.5*x + e
model <- lm(y ~ x)
# Different SE estimates
se_ols <- sqrt(diag(vcov(model))) # Standard OLS SE
se_hc0 <- sqrt(diag(vcovHC(model, type="HC0"))) # White 1980
se_hc1 <- sqrt(diag(vcovHC(model, type="HC1"))) # With n/(n-k)
se_hc3 <- sqrt(diag(vcovHC(model, type="HC3"))) # Recommended
cat("=== COMPARISON OF STANDARD ERRORS ===\n")
cat(sprintf("%-12s %-12s %-12s\n", "Type", "SE(intercept)", "SE(slope)"))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "OLS", se_ols[1], se_ols[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "HC0", se_hc0[1], se_hc0[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "HC1", se_hc1[1], se_hc1[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "HC3", se_hc3[1], se_hc3[2]))
# ============================================================
# HAC STANDARD ERRORS FOR TIME SERIES
# ============================================================
# Generate AR(1) process
set.seed(42)
n_ts <- 300
phi <- 0.5 # Autocorrelation
# AR(1) error
e_ts <- numeric(n_ts)
e_ts[1] <- rnorm(1)
for(t in 2:n_ts) e_ts[t] <- phi*e_ts[t-1] + rnorm(1)
x_ts <- rnorm(n_ts)
y_ts <- 2 + 1.5*x_ts + e_ts
model_ts <- lm(y_ts ~ x_ts)
# Standard vs HAC SEs
se_standard_ts <- sqrt(diag(vcov(model_ts)))
se_hac <- sqrt(diag(vcovHAC(model_ts))) # Automatic bandwidth
se_nw <- sqrt(diag(NeweyWest(model_ts))) # Newey-West
cat("\n=== TIME SERIES: OLS vs HAC SEs ===\n")
cat("(True beta_x = 1.5, AR(1) errors with phi =", phi, ")\n")
cat(sprintf("%-12s %-12s %-12s\n", "Type", "SE(intercept)", "SE(slope)"))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "OLS Standard", se_standard_ts[1], se_standard_ts[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "HAC (auto)", se_hac[1], se_hac[2]))
cat(sprintf("%-12s %-12.5f %-12.5f\n", "Newey-West", se_nw[1], se_nw[2]))
# ============================================================
# MONTE CARLO: Coverage of different CIs
# ============================================================
n_mc <- 2000; n <- 100; true_b <- 1.5
coverage_ols <- 0; coverage_hc <- 0
for(i in 1:n_mc) {
x_mc <- rnorm(n)
sigma_mc <- 0.5 + 0.5*abs(x_mc)
e_mc <- rnorm(n, sd=sigma_mc)
y_mc <- 2 + true_b*x_mc + e_mc
m <- lm(y_mc ~ x_mc)
# OLS CI
ci_ols <- confint(m)[2,]
coverage_ols <- coverage_ols + (ci_ols[1] <= true_b & true_b <= ci_ols[2])
# HC3 CI
se_mc <- sqrt(vcovHC(m, type="HC3")[2,2])
b_mc <- coef(m)[2]
ci_hc3 <- b_mc + c(-1,1) * qt(0.975, n-2) * se_mc
coverage_hc <- coverage_hc + (ci_hc3[1] <= true_b & true_b <= ci_hc3[2])
}
cat("\n=== COVERAGE UNDER HETEROSKEDASTICITY ===\n")
cat(sprintf("OLS CI coverage: %.1f%% (nominal: 95%%)\n",
100*coverage_ols/n_mc))
cat(sprintf("HC3 CI coverage: %.1f%% (nominal: 95%%)\n",
100*coverage_hc/n_mc))
12 Practice Problems
Problem 1: Consistency check.
Suppose the true model is \(y_i = \alpha + \beta x_i + \gamma z_i + \varepsilon_i\) but we run \(y_i = \alpha + \beta x_i + \eta_i\) (omitting \(z_i\)).
- What is the plim of \(\hat{\beta}_{OLS}\) from the misspecified model?
- When is \(\hat{\beta}\) still consistent for \(\beta\)?
Answer: \(\text{plim}(\hat{\beta}) = \beta + \gamma \cdot \frac{\text{Cov}(x_i, z_i)}{\text{Var}(x_i)}\)
It is consistent only if \(\gamma = 0\) (the omitted variable does not affect \(y\)) or \(\text{Cov}(x_i, z_i) = 0\) (the omitted variable is uncorrelated with the included one).
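You can verify the plim formula by simulation. With the made-up coefficients below, \(\gamma = 0.5\) and \(\text{Cov}(x,z)/\text{Var}(x) = 0.6\), so the short regression should converge to \(2 + 0.3 = 2.3\) rather than the true \(\beta = 2\):

```r
set.seed(1)
n <- 100000
x <- rnorm(n)
z <- 0.6 * x + rnorm(n)                  # Cov(x, z) = 0.6, Var(x) = 1
y <- 1 + 2 * x + 0.5 * z + rnorm(n)      # beta = 2, gamma = 0.5
coef(lm(y ~ x))[2]                       # approx 2 + 0.5 * 0.6 / 1 = 2.3
```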
Problem 2: Sandwich estimator derivation.
Show that under homoskedasticity (\(\text{Var}(\varepsilon_i) = \sigma^2\)), the sandwich estimator simplifies to the standard OLS variance.
Hint: \(\sum_i \hat{\varepsilon}_i^2 x_i x_i^T \approx \sigma^2 X^TX\) for large \(n\).
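A sketch of the argument, following the hint:

\[\hat{V}_{HC} = (X^TX)^{-1}\Big(\sum_i \hat{\varepsilon}_i^2 x_i x_i^T\Big)(X^TX)^{-1} \approx (X^TX)^{-1}\,\big(\sigma^2 X^TX\big)\,(X^TX)^{-1} = \sigma^2 (X^TX)^{-1},\]

which is the usual OLS variance once \(\sigma^2\) is replaced by \(s^2 = \sum_i \hat{\varepsilon}_i^2/(n-k)\).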
Problem 3: Asymptotic distribution under misspecification.
OLS with heteroskedasticity: \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\), where \(E[\varepsilon_i^2 | x_i] = \sigma_i^2 = x_i^2\).
- Write down the asymptotic variance of \(\hat{\beta}_1\) (sandwich form)
- What goes wrong if we use the standard OLS SE?
- Numerically verify via simulation
Problem 4: HAC bandwidth selection.
For the AR(1) process \(\varepsilon_t = \phi\varepsilon_{t-1} + u_t\) with \(\phi = 0.8\), compute the optimal HAC bandwidth using Andrews’ (1991) automatic selection formula. Compare Newey-West SEs with bandwidths \(L = 1, 5, 10\).