Convergence, LLN & CLT
Why Large Samples Can Be Trusted
1 Why Does This Matter?
The two theorems in this chapter are the bedrock of econometrics:
- The LLN justifies why sample averages are good estimators of population means. The consistency of OLS rests entirely on the LLN.
- The CLT justifies why we can use the normal distribution for inference even when the data are not normal. It is the reason t-tests and F-tests work well in large samples.
Almost all of asymptotic econometrics rests on these two theorems. When someone says "under mild regularity conditions, this estimator is asymptotically normal", they are invoking the CLT. When someone says "consistent estimator", they are invoking the LLN.
If you have ever wondered, "Why do we care about large n? My model only has 200 observations," the answer is that exact finite-sample theory is often unavailable, so we rely on the asymptotic approximations guaranteed by the LLN and the CLT.
2 Types of Convergence
There are several notions of convergence for random variables. The distinction matters because each notion has a different strength.
1. Convergence in Probability: \(X_n \xrightarrow{p} X\)
\[\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0 \quad \text{for all } \varepsilon > 0\]
Meaning: for large \(n\), \(X_n\) is close to \(X\) with high probability.
2. Almost Sure Convergence (Strong): \(X_n \xrightarrow{a.s.} X\)
\[P\left(\lim_{n\to\infty} X_n = X\right) = 1\]
Meaning: with probability 1, the realized sequence \(X_n(\omega)\) converges to \(X(\omega)\).
3. Convergence in Distribution: \(X_n \xrightarrow{d} X\)
\[\lim_{n\to\infty} F_{X_n}(t) = F_X(t) \quad \text{for all continuity points } t \text{ of } F_X\]
Meaning: the distribution of \(X_n\) approaches the distribution of \(X\), not necessarily the values themselves.
4. \(L^r\) Convergence (\(r\)-th mean convergence):
\[E[|X_n - X|^r] \to 0\]
\(L^2\) convergence (mean-square convergence) is the most commonly used.
2.1 The Convergence Hierarchy
Almost Sure (a.s.)
↓
In Probability (p)
↓
In Distribution (d)
- \(X_n \xrightarrow{a.s.} X \Rightarrow X_n \xrightarrow{p} X\) (but not conversely)
- \(X_n \xrightarrow{p} X \Rightarrow X_n \xrightarrow{d} X\) (but not conversely)
- \(L^2 \Rightarrow L^1 \Rightarrow p\)
Partial converse: if \(X\) is a constant \(c\), then \(X_n \xrightarrow{p} c \Leftrightarrow X_n \xrightarrow{d} c\).
3 Weak Law of Large Numbers (WLLN)
Let \(X_1, X_2, \ldots\) be iid with \(E[X_i] = \mu < \infty\). Define the sample mean: \[\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\]
Then: \[\bar{X}_n \xrightarrow{p} \mu \quad \text{as } n \to \infty\]
Proof sketch (using Chebyshev's inequality, under the additional assumption \(\text{Var}(X_i) = \sigma^2 < \infty\)): \[P(|\bar{X}_n - \mu| > \varepsilon) \leq \frac{\text{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2/n}{\varepsilon^2} \xrightarrow{} 0\]
Practical interpretation: the larger the sample, the closer the sample mean tends to be to the population mean. This is the foundation for using \(\bar{X}\) as an estimator of \(\mu\).
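The Chebyshev bound from the proof sketch can be checked by simulation. A minimal sketch, assuming Exponential(1) data (so \(\mu = 1\), \(\sigma^2 = 1\)) and an arbitrary \(\varepsilon = 0.1\):

```r
# Compare empirical P(|X_bar - mu| > eps) to the Chebyshev bound sigma^2/(n eps^2)
set.seed(1)
eps <- 0.1
for (n in c(100, 1000, 10000)) {
  xbar  <- replicate(2000, mean(rexp(n, rate = 1)))   # 2000 sample means of size n
  emp   <- mean(abs(xbar - 1) > eps)                  # empirical exceedance probability
  bound <- 1 / (n * eps^2)                            # Chebyshev bound with sigma^2 = 1
  cat(sprintf("n=%5d  empirical=%.4f  Chebyshev bound=%.4f\n", n, emp, bound))
}
```

Both columns shrink toward zero as \(n\) grows; Chebyshev is a loose upper bound, so the empirical probability is typically much smaller.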
4 Strong Law of Large Numbers (SLLN)
Let \(X_1, X_2, \ldots\) be iid with \(E[|X_i|] < \infty\) (finite first moment). Then: \[\bar{X}_n \xrightarrow{a.s.} \mu\]
This is stronger than the WLLN: almost every sample path eventually converges.
WLLN vs. SLLN: the WLLN says "at any fixed large \(n\), the probability of a large error is small." The SLLN says "almost surely, the sequence itself converges." The SLLN is stronger but requires a more involved proof.
5 Central Limit Theorem (CLT)
Let \(X_1, X_2, \ldots\) be iid with \(E[X_i] = \mu\) and \(\text{Var}(X_i) = \sigma^2 < \infty\). Then:
\[\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)\]
Equivalently: \[\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)\]
Or, in practical notation (for large \(n\)): \[\bar{X}_n \overset{\text{approx}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)\]
What is remarkable about the CLT?
The CLT holds for any distribution with finite variance. Your data can be Exponential, Uniform, Binomial: the sample mean will be approximately normal for sufficiently large \(n\). This is why the normal distribution is so central in statistics.
How large is "large enough"? It depends on the parent distribution:
- Symmetric, unimodal: \(n \geq 15\) is usually enough
- Moderately skewed: \(n \geq 30\)
- Highly skewed or heavy-tailed: \(n \geq 100\) or more
6 Delta Method
Suppose \(\sqrt{n}(X_n - \mu) \xrightarrow{d} N(0, \sigma^2)\) and \(g\) is differentiable at \(\mu\) with \(g'(\mu) \neq 0\).
Then: \[\sqrt{n}(g(X_n) - g(\mu)) \xrightarrow{d} N(0, \sigma^2 [g'(\mu)]^2)\]
Multivariate Delta Method: if \(\sqrt{n}(\mathbf{X}_n - \boldsymbol{\mu}) \xrightarrow{d} N(\mathbf{0}, \Sigma)\) and \(g: \mathbb{R}^p \to \mathbb{R}\) is differentiable, then: \[\sqrt{n}(g(\mathbf{X}_n) - g(\boldsymbol{\mu})) \xrightarrow{d} N\left(0, \nabla g(\boldsymbol{\mu})^T \Sigma \nabla g(\boldsymbol{\mu})\right)\]
Intuition: the delta method is a first-order Taylor approximation combined with the CLT. If \(X_n \approx \mu\) for large \(n\), then \(g(X_n) \approx g(\mu) + g'(\mu)(X_n - \mu)\).
Example applications:
- Estimating log-odds: \(p\) is a population proportion and \(\hat{p}\) is the sample proportion. Want inference for \(\log(\hat{p}/(1-\hat{p}))\)?
- \(g(p) = \log(p/(1-p))\), \(g'(p) = 1/(p(1-p))\)
- \(\text{Var}(\hat{p}) = p(1-p)/n\)
- Delta method: \(\sqrt{n}(\log(\hat{p}/(1-\hat{p})) - \log(p_0/(1-p_0))) \xrightarrow{d} N(0, 1/(p(1-p)))\)
- Nonlinear transformations of OLS estimates: for example, marginal effects in probit/logit models.
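The log-odds example can be checked numerically. A minimal sketch, with illustrative choices \(p = 0.3\) and \(n = 2000\):

```r
# Delta-method SE for the log-odds: g(p) = log(p/(1-p)), g'(p) = 1/(p(1-p))
set.seed(123)
p <- 0.3; n <- 2000
logit <- function(q) log(q / (1 - q))
log_odds <- replicate(5000, logit(mean(rbinom(n, 1, p))))  # simulated log-odds
se_delta <- sqrt(1 / (n * p * (1 - p)))                    # SE implied by the delta method
cat("simulated SE:", sd(log_odds), "  delta-method SE:", se_delta, "\n")
```

The two standard errors should agree closely, because \(n = 2000\) is comfortably in the asymptotic regime.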
7 Slutsky's Theorem
If \(X_n \xrightarrow{d} X\) and \(Y_n \xrightarrow{p} c\) (a constant), then:
\[X_n + Y_n \xrightarrow{d} X + c\] \[X_n \cdot Y_n \xrightarrow{d} c \cdot X\] \[X_n / Y_n \xrightarrow{d} X / c \quad (\text{if } c \neq 0)\]
Why does this matter? In econometrics we often have:
\[T_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n}\]
where \(S_n\) is the sample standard deviation. We know:
- Numerator: \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)\) by the CLT
- Denominator: \(S_n \xrightarrow{p} \sigma\) by the LLN (consistency of the sample variance)
Slutsky \(\Rightarrow\) \(T_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} \xrightarrow{d} N(0,1)\).
This is the justification for t-statistics being asymptotically standard normal!
8 Continuous Mapping Theorem
If \(X_n \xrightarrow{p} X\) and \(g\) is a continuous function, then: \[g(X_n) \xrightarrow{p} g(X)\]
Version for convergence in distribution: if \(X_n \xrightarrow{d} X\) and \(g\) is continuous at almost every point of the support of \(X\), then \(g(X_n) \xrightarrow{d} g(X)\).
Direct application: if \(Z_n \xrightarrow{d} N(0,1)\), then \(Z_n^2 \xrightarrow{d} \chi^2(1)\) because \(g(z) = z^2\) is continuous.
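A quick simulation of this application (the Uniform parent distribution and sample sizes are arbitrary choices): standardized sample means are approximately \(N(0,1)\) by the CLT, so by the CMT their squares should be approximately \(\chi^2(1)\).

```r
# CMT check: squares of (approximately) standard normal statistics are approx chi-square(1)
set.seed(7)
n <- 200
z <- replicate(10000, {
  x <- runif(n)                              # non-normal parent distribution
  sqrt(n) * (mean(x) - 0.5) / sqrt(1/12)     # standardized mean, approx N(0,1) by CLT
})
z2 <- z^2                                    # approx chi-square(1) by the CMT
probs <- c(0.5, 0.9, 0.95)
rbind(simulated = quantile(z2, probs), theory = qchisq(probs, df = 1))
```

The simulated quantiles should line up with the theoretical \(\chi^2(1)\) quantiles.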
9 Worked Example: CLT Simulation
We roll \(n\) fair dice and take the average. The CLT says the distribution of the average approaches a normal distribution as \(n\) grows.
- Die: \(X \sim \text{Uniform}\{1, 2, 3, 4, 5, 6\}\)
- \(E[X] = 3.5\), \(\text{Var}(X) = 35/12 \approx 2.917\)
- CLT: \(\bar{X}_n \overset{\text{approx}}{\sim} N(3.5, 2.917/n)\)
set.seed(2024)
n_sims <- 10000
# Simulate for different n
par(mfrow=c(2,2))
ns <- c(1, 5, 30, 100)
for(n in ns) {
  # Simulate n_sims samples of size n
  sample_means <- replicate(n_sims, mean(sample(1:6, n, replace=TRUE)))
  # Theoretical parameters for CLT approximation
  mu_theory <- 3.5
  sigma_theory <- sqrt(35/12 / n)
  # Plot histogram vs theoretical normal
  hist(sample_means, probability=TRUE, breaks=30,
       main=paste0("n = ", n),
       xlab="Sample Mean", col="lightblue", border="white")
  # Overlay CLT normal
  curve(dnorm(x, mean=mu_theory, sd=sigma_theory),
        add=TRUE, col="red", lwd=2)
  # Print summary statistics vs. theory
  cat(sprintf("n=%3d: Mean=%.4f (theory=3.5), SD=%.4f (theory=%.4f)\n",
              n, mean(sample_means), sd(sample_means), sigma_theory))
}
The output shows: for \(n=1\), the distribution is uniform (discrete). For \(n=5\), a bell shape starts to form, though some discrete structure remains. For \(n=30\), it is already very close to normal. For \(n=100\), it is nearly perfectly normal.
10 Multivariate CLT and Application to OLS
If \(\mathbf{x}_1, \ldots, \mathbf{x}_n\) are iid with \(E[\mathbf{x}_i] = \boldsymbol{\mu}\) and \(\text{Cov}(\mathbf{x}_i) = \Sigma\), then:
\[\sqrt{n}(\bar{\mathbf{X}}_n - \boldsymbol{\mu}) \xrightarrow{d} N(\mathbf{0}, \Sigma)\]
Connection to OLS asymptotics:
OLS estimator: \(\hat{\beta} = (X^TX)^{-1}X^Ty = \beta + (X^TX)^{-1}X^T\varepsilon\)
Write: \[\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{X^TX}{n}\right)^{-1} \frac{X^T\varepsilon}{\sqrt{n}}\]
- \(\frac{X^TX}{n} = \frac{1}{n}\sum_i x_i x_i^T \xrightarrow{p} Q_{XX} = E[x_i x_i^T]\) (by the LLN)
- \(\frac{X^T\varepsilon}{\sqrt{n}} = \frac{1}{\sqrt{n}}\sum_i x_i\varepsilon_i \xrightarrow{d} N(\mathbf{0}, \sigma^2 Q_{XX})\) (by the CLT, since \(E[x_i\varepsilon_i] = 0\) under exogeneity; the variance is \(\sigma^2 Q_{XX}\) under homoskedasticity)
Then by Slutsky + CMT: \[\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N(\mathbf{0}, \sigma^2 Q_{XX}^{-1})\]
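This result can be verified by simulation. A minimal sketch with a single regressor and no intercept; all parameter values (\(\beta = 2\), \(\sigma = 1.5\), the regressor design) are illustrative:

```r
# Check: sqrt(n)(beta_hat - beta) is approx N(0, sigma^2 / Q_xx) for one regressor
set.seed(2024)
n <- 500; beta <- 2; sigma <- 1.5
draws <- replicate(5000, {
  x <- rnorm(n, mean = 1, sd = 1)            # Q_xx = E[x^2] = 1^2 + 1 = 2
  y <- beta * x + rnorm(n, sd = sigma)       # homoskedastic, exogenous errors
  b <- sum(x * y) / sum(x^2)                 # OLS slope (no intercept)
  sqrt(n) * (b - beta)
})
avar_theory <- sigma^2 / 2                   # sigma^2 * Q_xx^{-1} = 2.25 / 2
cat("simulated var:", var(draws), "  theory:", avar_theory, "\n")
```

The simulated variance of \(\sqrt{n}(\hat{\beta} - \beta)\) should be close to \(\sigma^2 Q_{XX}^{-1}\), and a histogram of `draws` would look normal.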
11 Connections to Applications
OLS consistency: the LLN guarantees \(\frac{1}{n}X^TX \xrightarrow{p} Q_{XX}\) (the population moment matrix) and \(\frac{1}{n}X^T\varepsilon \xrightarrow{p} 0\) (the exogeneity condition). Together these make OLS consistent.
OLS asymptotic normality: the CLT guarantees \(\frac{1}{\sqrt{n}}X^T\varepsilon \xrightarrow{d} N(0, \sigma^2 Q_{XX})\). Combined with Slutsky, this gives the asymptotic normality of \(\hat{\beta}\).
t-statistics and F-statistics: in large samples, t-statistics are asymptotically \(N(0,1)\) and F-statistics are asymptotically \(\chi^2(m)/m\). This justifies using critical values from these distributions even when the errors are not exactly normal.
Bootstrap validity: the bootstrap works because resampling from the empirical distribution mimics the LLN/CLT mechanism of the original sample.
GMM estimation: the sample moment conditions satisfy \(\frac{1}{n}\sum_i g(x_i, \theta) \xrightarrow{p} E[g(x_i, \theta)]\) by the LLN, and the population moments equal zero at the true \(\theta\). The GMM objective function exploits this convergence.
12 R Code: LLN and CLT Simulations
library(ggplot2)
library(dplyr)
# ============================================================
# LLN: Running average converges to true mean
# ============================================================
set.seed(2024)
n_total <- 5000
# Simulate from Exponential(lambda=2), true mean = 0.5
lambda <- 2
x <- rexp(n_total, rate=lambda)
true_mean <- 1/lambda
# Running average
running_avg <- cumsum(x) / seq_along(x)
# Plot
df_lln <- data.frame(n=1:n_total, running_avg=running_avg)
ggplot(df_lln, aes(x=n, y=running_avg)) +
  geom_line(color="steelblue", linewidth=0.5) +
  geom_hline(yintercept=true_mean, color="red", linewidth=1, linetype="dashed") +
  scale_x_log10() +
  labs(title="LLN: Running Average of Exponential(2) Samples",
       subtitle="Red dashed line = True mean (0.5)",
       x="n (log scale)", y="Running Average") +
  theme_minimal()
# ============================================================
# CLT: Distribution of sample means
# ============================================================
simulate_clt <- function(n, n_sims=5000, dist="exponential") {
  if(dist == "exponential") {
    sample_means <- replicate(n_sims, mean(rexp(n, rate=1)))
    true_mean <- 1; true_var <- 1/n
  } else if(dist == "uniform") {
    sample_means <- replicate(n_sims, mean(runif(n, 0, 1)))
    true_mean <- 0.5; true_var <- (1/12)/n
  } else if(dist == "bernoulli") {
    p <- 0.3
    sample_means <- replicate(n_sims, mean(rbinom(n, 1, p)))
    true_mean <- p; true_var <- p*(1-p)/n
  }
  return(list(means=sample_means, true_mean=true_mean, true_var=true_var))
}
# Show CLT for exponential (highly skewed!) distribution
par(mfrow=c(1,3))
for(n in c(5, 30, 100)) {
  res <- simulate_clt(n, dist="exponential")
  hist(res$means, probability=TRUE, breaks=40,
       main=paste("Exponential, n =", n),
       xlab="Sample Mean", col="lightcoral", border="white")
  curve(dnorm(x, mean=res$true_mean, sd=sqrt(res$true_var)),
        add=TRUE, col="darkblue", lwd=2)
}
# ============================================================
# DELTA METHOD: Variance of log(X_bar)
# ============================================================
# Exponential(1): E[X]=1, Var(X)=1
# Sample mean: X_bar ~ N(1, 1/n) approx for large n
# g(x) = log(x), g'(x) = 1/x, g'(1) = 1
# Delta method: log(X_bar) ~ N(log(1), 1/n * (1/1)^2) = N(0, 1/n)
n <- 500; n_sims <- 10000
log_means <- replicate(n_sims, log(mean(rexp(n, rate=1))))
# Compare to delta method approximation
cat("Sample mean of log(X_bar):", mean(log_means), "(theory: 0)\n")
cat("Sample SD of log(X_bar):", sd(log_means),
    "(theory:", 1/sqrt(n), ")\n")
hist(log_means, probability=TRUE, breaks=40,
     main="Delta Method: Distribution of log(X_bar)",
     xlab="log(X_bar)")
curve(dnorm(x, mean=0, sd=1/sqrt(n)), add=TRUE, col="red", lwd=2)
# ============================================================
# SLUTSKY: t-statistic is asymptotically N(0,1)
# ============================================================
set.seed(42)
n <- 50; n_sims <- 5000
mu_true <- 5; sigma_true <- 3
t_stats <- replicate(n_sims, {
  x <- rnorm(n, mean=mu_true, sd=sigma_true)
  s <- sd(x)
  t <- sqrt(n) * (mean(x) - mu_true) / s
  t
})
# Should be approximately N(0,1) by Slutsky
hist(t_stats, probability=TRUE, breaks=40,
     main="Slutsky: t-statistic is Asymptotically N(0,1)",
     xlab="t-statistic")
curve(dnorm(x), add=TRUE, col="red", lwd=2) # N(0,1)
curve(dt(x, df=n-1), add=TRUE, col="blue", lwd=2, lty=2) # exact t
legend("topright", c("N(0,1) [Slutsky approx]", "t(n-1) [Exact]"),
       col=c("red","blue"), lwd=2, lty=1:2)
13 Practice Problems
Problem 1: LLN application.
You flip a biased coin (\(p = 0.6\)) \(n\) times. Let \(\hat{p}_n\) = the proportion of heads.
- Does \(\hat{p}_n \xrightarrow{p} 0.6\)? Justify.
- Use Chebyshev to find how large \(n\) must be so that \(P(|\hat{p}_n - 0.6| > 0.01) < 0.05\).
Answer:
- Yes, because \(\hat{p}_n = \bar{X}_n\) where the \(X_i \sim \text{Bernoulli}(0.6)\) are iid, so the LLN applies.
- \(P(|\hat{p}_n - 0.6| > 0.01) \leq \frac{0.6(0.4)/n}{0.01^2} = \frac{0.24}{0.0001 n} = \frac{2400}{n}\). We need \(2400/n < 0.05 \Rightarrow n > 48{,}000\).
Problem 2: CLT application.
Data: \(X_1, \ldots, X_n \sim \text{Uniform}(0,10)\), so \(\mu=5\), \(\sigma^2=100/12=25/3\).
- For \(n=36\), approximate \(P(\bar{X}_{36} > 5.5)\)
- Approximate \(P(4.8 < \bar{X}_{100} < 5.3)\)
Answer:
- \(\text{Var}(\bar{X}_{36}) = (25/3)/36 = 25/108\), \(\text{SD} = 5/\sqrt{108} \approx 0.481\). \(P(\bar{X}_{36} > 5.5) = P(Z > (5.5-5)/0.481) = P(Z > 1.04) = 1 - \Phi(1.04) \approx 0.149\)
- \(\text{SD}(\bar{X}_{100}) = \sqrt{(25/3)/100} = \sqrt{1/12} \approx 0.289\). \(P(4.8 < \bar{X}_{100} < 5.3) = P(-0.69 < Z < 1.04) = \Phi(1.04) - \Phi(-0.69) \approx 0.851 - 0.245 = 0.606\)
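The two answers can be checked directly with pnorm:

```r
# Problem 2 via the normal approximation for the sample mean
sd36  <- sqrt((25/3) / 36)                         # SD of X_bar for n = 36
sd100 <- sqrt((25/3) / 100)                        # SD of X_bar for n = 100
p1 <- 1 - pnorm(5.5, mean = 5, sd = sd36)          # P(X_bar_36 > 5.5)
p2 <- pnorm(5.3, 5, sd100) - pnorm(4.8, 5, sd100)  # P(4.8 < X_bar_100 < 5.3)
round(c(p1, p2), 3)                                # approx 0.149 and 0.606
```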
Problem 3: Delta method.
\(X_1, \ldots, X_n \sim \text{Exponential}(\lambda)\), so \(\bar{X}_n \overset{\text{approx}}{\sim} N(\lambda^{-1}, \lambda^{-2}/n)\) for large \(n\).
Suppose we want to estimate \(\lambda\) with \(\hat{\lambda} = 1/\bar{X}_n\).
- Use the delta method to find the asymptotic distribution of \(\hat{\lambda}\).
- What is the asymptotic SE of \(\hat{\lambda}\)?
Answer: \(g(x) = 1/x\), \(g'(x) = -1/x^2\), \(g'(\mu) = g'(1/\lambda) = -\lambda^2\).
\(\sqrt{n}(\hat{\lambda} - \lambda) \xrightarrow{d} N(0, \lambda^{-2}(-\lambda^2)^2) = N(0, \lambda^2)\)
Asymptotic SE: \(\lambda/\sqrt{n}\), estimated as \(\hat{\lambda}/\sqrt{n}\).
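A quick simulation check of this answer (\(\lambda = 2\) and \(n = 1000\) are arbitrary choices):

```r
# Delta method check for lambda_hat = 1/X_bar with Exponential(lambda) data
set.seed(99)
lambda <- 2; n <- 1000
lam_hat  <- replicate(5000, 1 / mean(rexp(n, rate = lambda)))
se_delta <- lambda / sqrt(n)                 # asymptotic SE from the delta method
cat("simulated SE:", sd(lam_hat), "  delta-method SE:", se_delta, "\n")
```

The simulated SE of \(\hat{\lambda}\) should be close to \(\lambda/\sqrt{n}\).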
Problem 4: Simulation exercise.
Verify Slutsky's theorem via simulation. Take \(X_n \sim N(0, 1/n)\) and \(Y_n \sim N(0, 1/n^2)\) (independent). Show that \(X_n + Y_n \xrightarrow{d} N(0,1)\)… wait, is that actually true? Explore numerically.
Hint: \(X_n \xrightarrow{d} 0\) and \(Y_n \xrightarrow{p} 0\) (the constant 0), so Slutsky says \(X_n + Y_n \xrightarrow{d} 0 + 0 = 0\).
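A minimal sketch for the numerical exploration (the sample sizes and number of draws are arbitrary):

```r
# X_n ~ N(0, 1/n) and Y_n ~ N(0, 1/n^2): both collapse toward the constant 0
set.seed(11)
for (n in c(10, 100, 1000)) {
  s <- rnorm(5000, 0, sqrt(1/n)) + rnorm(5000, 0, sqrt(1/n^2))
  cat(sprintf("n=%4d  sd(X_n + Y_n) = %.4f\n", n, sd(s)))
}
# The SD shrinks toward 0, consistent with the hint: the limit is the
# constant 0, not N(0,1).
```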