Confidence Intervals & Bootstrap
Quantified Uncertainty
1 Why Does This Matter?
A correct confidence interval gives a range of plausible values for an unknown parameter, not a "probability that the parameter is here". Misinterpreting CIs is one of the most common errors in applied research, even in top journals.
In a survey of researchers and students, Hoekstra et al. (2014) found that a majority endorsed at least one incorrect interpretation of CIs.
Common WRONG interpretations:

- "There is a 95% probability that the true parameter lies in this interval". WRONG: a frequentist CI is not a probabilistic statement about the parameter.
- "If we replicate the study, there is a 95% chance the parameter is the same". WRONG.
- "95% of the observations lie within the CI". That describes a prediction interval, not a CI.
Correct interpretation: if we repeat the sampling procedure many times and compute a CI each time, 95% of the resulting CIs will contain the true parameter.
This is not just a technicality. Understanding it matters for:

- Reporting results correctly
- Not over-claiming from the data
- Distinguishing CIs from prediction intervals and Bayesian credible intervals
2 1. Formal Definition

An interval \([\hat{\theta}_L, \hat{\theta}_U]\) (both endpoints functions of the data) is a \((1-\alpha)\) confidence interval for \(\theta\) if:

\[P(\hat{\theta}_L \leq \theta \leq \hat{\theta}_U) \geq 1-\alpha \quad \text{for all } \theta\]

The confidence level \(1-\alpha\) is usually 90%, 95%, or 99%.

Key point: the random variables here are the endpoints \(\hat{\theta}_L\) and \(\hat{\theta}_U\), not \(\theta\). In the frequentist paradigm, the parameter \(\theta\) is a fixed, unknown constant.

A useful analogy: think of a CI as a fishing net. Sample 100 times and build 100 CIs; about 95 of the nets will "catch" the fish (the true parameter). It is not a 95% probability that the fish is in one particular net: the fish is already wherever it is.
3 2. Pivotal Method
A statistic \(Q = Q(\hat{\theta}, \theta)\) is a pivot if its distribution does not depend on \(\theta\) (or on any other unknown parameter).

Procedure: if \(P(a \leq Q \leq b) = 1-\alpha\), algebraic manipulation to isolate \(\theta\) yields a CI.

Example: normal mean with unknown variance
Pivot: \(T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)\)
The \(t(n-1)\) distribution depends on neither \(\mu\) nor \(\sigma^2\).
\[P\left(-t_{n-1,\alpha/2} \leq \frac{\bar{X}-\mu}{S/\sqrt{n}} \leq t_{n-1,\alpha/2}\right) = 1-\alpha\]
Rearranging: \(P\left(\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} \leq \mu \leq \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right) = 1-\alpha\)
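The pivotal construction above is easy to check numerically. A minimal sketch with illustrative simulated data, comparing the hand-built interval against `t.test()`:

```r
set.seed(1)
x <- rnorm(25, mean = 10, sd = 3)   # illustrative sample
n <- length(x); alpha <- 0.05

# CI from the pivot T = (xbar - mu) / (S / sqrt(n)) ~ t(n-1)
ci_manual <- mean(x) + c(-1, 1) * qt(1 - alpha/2, df = n - 1) * sd(x) / sqrt(n)

# Built-in interval for comparison
ci_ttest <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)

# The two agree to numerical precision
all.equal(ci_manual, ci_ttest)
```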
4 3. Normal-Based CI
If the estimator \(\hat{\theta}\) is asymptotically normal with standard error \(\hat{\sigma}_{\hat{\theta}}\):
\[\hat{\theta} \pm z_{\alpha/2} \cdot \text{SE}(\hat{\theta})\]
where \(z_{\alpha/2} = \Phi^{-1}(1-\alpha/2)\).

Common values:

- 90% CI: \(z_{0.05} = 1.645\)
- 95% CI: \(z_{0.025} = 1.960\)
- 99% CI: \(z_{0.005} = 2.576\)
Kapan gunakan z vs t?
- t distribution: when \(\sigma^2\) is unknown and estimated, for small samples, or for exact inference
- z (normal): when \(\sigma^2\) is known (rare) or for large samples (\(n > 100\)), where \(t \approx z\)
- In practice: always use t for means (it is conservative, and the difference is minor for \(n > 30\))
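A quick sketch of how the t critical value shrinks toward the normal z as the degrees of freedom grow:

```r
# 97.5th percentile of t for increasing df, vs the normal quantile
t_crit <- sapply(c(10, 30, 100, 1000), function(df) qt(0.975, df))
z_crit <- qnorm(0.975)

round(t_crit, 3)  # roughly 2.228, 2.042, 1.984, 1.962
round(z_crit, 3)  # 1.96
```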
5 4. t-Based CI for Regression Coefficients

For the OLS model \(y = X\beta + \varepsilon\) under the normality assumption:
\[\hat{\beta}_j \pm t_{n-k, \alpha/2} \cdot \hat{s.e.}(\hat{\beta}_j)\]
where:

- \(\hat{s.e.}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2[(X^TX)^{-1}]_{jj}}\)
- \(\hat{\sigma}^2 = SSE/(n-k)\)
- \(n-k\) = residual degrees of freedom

Asymptotically: for large \(n\), \(t_{n-k} \approx N(0,1)\) and we get: \[\hat{\beta}_j \pm z_{\alpha/2} \cdot \hat{s.e.}(\hat{\beta}_j)\]
CI vs Hypothesis Test — Duality:
The value 0 lies outside the 95% CI for \(\beta_j\) \(\Leftrightarrow\) reject \(H_0: \beta_j = 0\) at \(\alpha = 0.05\) (two-sided).

This is called the duality between CIs and hypothesis tests. A CI carries more information: not just "significant or not" but also magnitude and precision.
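The duality can be verified directly in R. A small sketch with simulated data (the equivalence holds by construction, whatever the data):

```r
set.seed(7)
n <- 60
x <- rnorm(n)
y <- 0.4 * x + rnorm(n)           # illustrative data
fit <- lm(y ~ x)

ci <- confint(fit, "x", level = 0.95)
p  <- summary(fit)$coefficients["x", "Pr(>|t|)"]

# 0 outside the 95% CI  <=>  two-sided p-value below 0.05
excludes_zero <- (ci[1] > 0) || (ci[2] < 0)
excludes_zero == (p < 0.05)       # TRUE, by the duality
```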
6 5. CI for the Difference of Two Means

With equal variances (pooled): \[(\bar{X}_1 - \bar{X}_2) \pm t_{n_1+n_2-2, \alpha/2} \cdot S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\]
Welch’s CI (unequal variances, preferred): \[(\bar{X}_1 - \bar{X}_2) \pm t_{\nu, \alpha/2} \cdot \sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}\]
where \(\nu\) is the Welch-Satterthwaite effective degrees of freedom.
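In R, both intervals come from `t.test()`; a sketch with illustrative data (Welch is the default):

```r
set.seed(3)
x1 <- rnorm(40, mean = 5, sd = 1)
x2 <- rnorm(25, mean = 4, sd = 3)           # unequal variances

welch  <- t.test(x1, x2)                    # Welch, the default
pooled <- t.test(x1, x2, var.equal = TRUE)  # pooled, df = n1 + n2 - 2

welch$conf.int; pooled$conf.int
welch$parameter   # fractional Welch-Satterthwaite df, below 63
```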
7 6. Delta Method CI
Recall the Delta Method: if \(\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, V)\) and \(g\) is differentiable: \[\sqrt{n}(g(\hat{\theta}) - g(\theta)) \xrightarrow{d} N(0, V[g'(\theta)]^2)\]

CI for \(g(\theta)\): \[g(\hat{\theta}) \pm z_{\alpha/2} \cdot \sqrt{\hat{V}/n} \cdot |g'(\hat{\theta})|\]

Example: CI for the odds ratio \(OR = e^\beta\) from a logistic regression.

If \(\hat{\beta} \pm z_{\alpha/2} \cdot SE(\hat{\beta})\) is a CI for \(\beta\), then: \[[e^{\hat{\beta} - z_{\alpha/2} \cdot SE(\hat{\beta})},\ e^{\hat{\beta} + z_{\alpha/2} \cdot SE(\hat{\beta})}]\] is a CI for \(OR = e^\beta\). This works because \(g(x) = e^x\) is monotone increasing; the delta method with \(g'(x) = e^x\) instead gives the symmetric interval \(e^{\hat{\beta}} \pm z_{\alpha/2} \cdot e^{\hat{\beta}} \cdot SE(\hat{\beta})\), which is asymptotically equivalent.
8 7. Bootstrap
The bootstrap is a computationally intensive approach to constructing CIs without parametric assumptions.

Bootstrap algorithm:

1. From the sample \(\{x_1, \ldots, x_n\}\), compute the statistic \(\hat{\theta}\) of interest
2. For \(b = 1, \ldots, B\):
   a. Draw a bootstrap sample \(x^{*(b)} = \{x_1^*, \ldots, x_n^*\}\) with replacement from \(\{x_1,\ldots,x_n\}\)
   b. Compute \(\hat{\theta}^{*(b)}\) from the bootstrap sample
3. Use the distribution of \(\{\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}\}\) for inference
Bootstrap SE: \(\hat{SE}_{boot} = \text{SD}(\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)})\)
Percentile Bootstrap CI: \([\hat{\theta}^{*(\alpha/2)}, \hat{\theta}^{*(1-\alpha/2)}]\), the quantiles of the bootstrap distribution.

BCa (Bias-Corrected and Accelerated) CI: more accurate but more complex.
8.1 Bootstrap SE vs Analytical SE
Bootstrap SE and analytical SE should be very similar for smooth statistics (means, regression coefficients). A large discrepancy is a warning signal:

- the bootstrap may be failing (heavy tails, small \(n\))
- the analytical formula may be assuming something that does not hold
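A minimal sketch of this check for the sample mean, where the analytical SE is \(s/\sqrt{n}\):

```r
set.seed(11)
x <- rnorm(200)                      # illustrative sample
B <- 2000

# Bootstrap SE of the mean
se_boot <- sd(replicate(B, mean(sample(x, replace = TRUE))))

# Analytical SE
se_analytical <- sd(x) / sqrt(length(x))

se_boot / se_analytical              # should be close to 1
```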
8.2 When Is the Bootstrap Better?

- Complex statistics without analytical formulas (e.g., median, quantiles, ratios)
- Small samples where asymptotic approximations are poor
- As a validation check on analytical SEs
8.3 When Does the Bootstrap Fail?

- Heavy-tailed distributions (infinite variance)
- Non-smooth statistics (e.g., maximum, minimum)
- Dependent data (time series): requires a block bootstrap
- Very small samples
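The failure for the maximum is easy to demonstrate: the probability that a bootstrap resample contains the observed maximum is \(1-(1-1/n)^n \to 1 - 1/e \approx 0.632\), so the bootstrap distribution piles a large point mass on \(\max(x)\) instead of approximating the true sampling distribution. A sketch:

```r
set.seed(5)
n <- 100
x <- runif(n)                        # max sits near the boundary
B <- 2000

max_boot <- replicate(B, max(sample(x, n, replace = TRUE)))

# Share of bootstrap maxima exactly equal to the observed maximum
mean(max_boot == max(x))             # close to 1 - 1/e ≈ 0.632
```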
9 8. Worked Example: Bootstrap vs Analytical CI
Problem: 250 out of 1000 students pass an exam. Compute a 95% CI for the true pass rate \(p\).
Analytical CI (normal approximation): \[\hat{p} = 0.25, \quad SE = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.25 \times 0.75/1000} = 0.0137\] \[\text{CI: } [0.25 - 1.96(0.0137),\ 0.25 + 1.96(0.0137)] = [0.223, 0.277]\]
Exact Clopper-Pearson CI (based on Beta quantiles): \[[B(\alpha/2;\ k,\ n-k+1),\ B(1-\alpha/2;\ k+1,\ n-k)] = [0.224, 0.278]\] where \(B(q; a, b)\) denotes the \(q\)-quantile of the Beta\((a, b)\) distribution.
Bootstrap CI:
set.seed(2024)
n <- 1000; k <- 250 # 250 successes out of 1000
p_hat <- k/n # 0.25
# Create binary data
x <- c(rep(1, k), rep(0, n-k))
# Bootstrap
B <- 5000
p_boot <- replicate(B, mean(sample(x, n, replace=TRUE)))
# Percentile Bootstrap CI
ci_boot <- quantile(p_boot, c(0.025, 0.975))
# Analytical CI
se_analytical <- sqrt(p_hat * (1-p_hat) / n)
ci_analytical <- c(p_hat - 1.96*se_analytical, p_hat + 1.96*se_analytical)
# Exact CI
ci_exact <- c(qbeta(0.025, k, n-k+1), qbeta(0.975, k+1, n-k))
cat("=== COMPARISON OF 95% CIs ===\n")
cat(sprintf("Analytical: [%.4f, %.4f]\n", ci_analytical[1], ci_analytical[2]))
cat(sprintf("Exact: [%.4f, %.4f]\n", ci_exact[1], ci_exact[2]))
cat(sprintf("Bootstrap: [%.4f, %.4f]\n", ci_boot[1], ci_boot[2]))
# Plot bootstrap distribution
hist(p_boot, probability=TRUE, breaks=50,
main="Bootstrap Distribution of p_hat",
xlab="Bootstrap p*")
abline(v=ci_boot, col="red", lwd=2, lty=2)
abline(v=p_hat, col="blue", lwd=2)
With \(n=1000\), the three methods are nearly identical. The bootstrap is most useful when parametric methods are unclear or \(n\) is small.
10 9. Sandwich Estimator and Robust SEs

In OLS regression with heteroskedasticity (\(\text{Var}(\varepsilon_i|x_i) = \sigma_i^2\)), the usual SEs are invalid. HC (Heteroskedasticity-Consistent) or "robust" SEs use the sandwich estimator:
\[\hat{V}_{HC} = (X^TX)^{-1}\left(\sum_i \hat{\varepsilon}_i^2 x_ix_i^T\right)(X^TX)^{-1}\]
The sandwich form: \(A^{-1}BA^{-1}\) where \(A = X^TX/n\) and \(B = \sum_i \hat{\varepsilon}_i^2 x_ix_i^T/n\).

Variants:

- HC0: \(\hat{\varepsilon}_i^2\) (original White 1980)
- HC1: \(\frac{n}{n-k} \cdot \hat{\varepsilon}_i^2\) (finite-sample correction)
- HC3: \(\hat{\varepsilon}_i^2/(1-h_{ii})^2\) (leverage-adjusted, recommended for small \(n\))

CI using robust SEs: \[\hat{\beta}_j \pm t_{n-k,\alpha/2} \cdot \hat{se}_{HC}(\hat{\beta}_j)\]
11 10. Connections to Econometrics

Standard OLS CIs: valid only under homoskedasticity plus normality. In practice, robust SEs are almost always used.

Cluster-robust SEs: when observations within a cluster are correlated (e.g., students within a school, firm-level data), the cluster-robust SE uses: \[\hat{V}_{cluster} = (X^TX)^{-1}\left(\sum_g X_g^T\hat{E}_g\hat{E}_g^TX_g\right)(X^TX)^{-1}\]

where \(g\) indexes clusters.

HAC (Newey-West) SEs: for time-series data with serial correlation and/or heteroskedasticity. CIs based on HAC SEs are valid asymptotically.

Bootstrap CIs in Difference-in-Differences: for complex treatment-assignment designs, the block bootstrap (resampling whole clusters) gives valid CIs.

Comparing two specifications: if the CIs from two alternative specifications barely overlap, the effect estimate is not robust to the choice of specification.
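A hand-rolled sketch of this estimator in base R on simulated clustered data (in practice one would use `sandwich::vcovCL`); the cluster shock and all sizes below are assumptions for illustration:

```r
set.seed(9)
G <- 30; m <- 10                     # 30 clusters of size 10 (assumed)
g <- rep(1:G, each = m)
u <- rnorm(G)[g]                     # shared within-cluster shock
x <- rnorm(G * m)
y <- 1 + 0.5 * x + u + rnorm(G * m)

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)

bread <- solve(crossprod(X))         # (X'X)^{-1}
meat  <- matrix(0, ncol(X), ncol(X))
for (gg in 1:G) {                    # sum of outer products of cluster scores
  s <- crossprod(X[g == gg, , drop = FALSE], e[g == gg])
  meat <- meat + s %*% t(s)
}
V_cluster  <- bread %*% meat %*% bread
se_cluster <- sqrt(diag(V_cluster))

# With correlated errors, the cluster SE for the intercept is well above
# the homoskedastic OLS SE reported by summary(fit)
cbind(cluster = se_cluster, ols = summary(fit)$coefficients[, "Std. Error"])
```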
12 11. R Code: Comprehensive CI Examples
library(sandwich)
library(lmtest)
library(boot)
# ============================================================
# COMPARISON: Different types of CIs for regression
# ============================================================
set.seed(2024)
n <- 100
x <- rnorm(n)
sigma_true <- 0.5 + abs(x) # Heteroskedastic!
y <- 2 + 1.5*x + rnorm(n, sd=sigma_true)
model <- lm(y ~ x)
# Standard CI (assumes homoskedasticity)
ci_standard <- confint(model, level=0.95)
# HC-robust CI
coeftest_hc <- coeftest(model, vcov=vcovHC(model, type="HC3"))
se_hc <- coeftest_hc[2, "Std. Error"]
b <- coef(model)[2]
ci_hc <- b + c(-1,1) * qt(0.975, df=n-2) * se_hc
cat("=== CIs for slope coefficient ===\n")
cat(sprintf("Standard: [%.4f, %.4f]\n", ci_standard[2,1], ci_standard[2,2]))
cat(sprintf("HC-Robust: [%.4f, %.4f]\n", ci_hc[1], ci_hc[2]))
# ============================================================
# BOOTSTRAP CI for median (no analytical formula)
# ============================================================
set.seed(42)
x_skew <- rexp(50, rate=1) # True median = log(2) ≈ 0.693
true_median <- log(2)
# Bootstrap function for median
boot_median <- function(data, indices) {
median(data[indices])
}
boot_result <- boot(data=x_skew, statistic=boot_median, R=2000)
# Three types of bootstrap CI
ci_types <- boot.ci(boot_result, type=c("norm", "basic", "perc", "bca"),
conf=0.95)
cat("\n=== Bootstrap CIs for Median (true =", round(true_median, 4), ") ===\n")
cat("Sample median:", median(x_skew), "\n")
cat(sprintf("Normal: [%.4f, %.4f]\n",
ci_types$norm[2], ci_types$norm[3]))
cat(sprintf("Basic: [%.4f, %.4f]\n",
ci_types$basic[4], ci_types$basic[5]))
cat(sprintf("Percentile: [%.4f, %.4f]\n",
ci_types$percent[4], ci_types$percent[5]))
cat(sprintf("BCa: [%.4f, %.4f]\n",
ci_types$bca[4], ci_types$bca[5]))
# ============================================================
# DELTA METHOD CI: for nonlinear function of parameters
# ============================================================
# Example: CI for ratio beta2/beta1 (elasticity type)
n <- 200
x1 <- rnorm(n, mean=5, sd=1)
x2 <- rnorm(n, mean=3, sd=1.5)
y <- 10 + 2*x1 + 4*x2 + rnorm(n, sd=2)
model2 <- lm(y ~ x1 + x2)
b1 <- coef(model2)["x1"]
b2 <- coef(model2)["x2"]
ratio_hat <- b2/b1 # Estimate of beta2/beta1
# Delta method: g(b1, b2) = b2/b1
# grad g = [-b2/b1^2, 1/b1]
Sigma_beta <- vcov(model2)[2:3, 2:3] # 2x2 cov matrix for [b1, b2]
grad_g <- c(-b2/b1^2, 1/b1)
# Asymptotic variance of ratio
var_ratio <- as.numeric(t(grad_g) %*% Sigma_beta %*% grad_g)
se_ratio <- sqrt(var_ratio)
ci_ratio_delta <- ratio_hat + c(-1,1) * qnorm(0.975) * se_ratio
cat("\n=== Delta Method CI for beta2/beta1 ===\n")
cat(sprintf("Estimate: %.4f (true: 2.0)\n", ratio_hat))
cat(sprintf("Delta Method CI: [%.4f, %.4f]\n",
ci_ratio_delta[1], ci_ratio_delta[2]))
# ============================================================
# VISUALIZE: What 95% CI means via simulation
# ============================================================
set.seed(2024)
n_sims <- 100; n <- 50; mu_true <- 5; sigma <- 2
# Compute 100 CIs
ci_results <- replicate(n_sims, {
x <- rnorm(n, mu_true, sigma)
c(mean(x) - qt(0.975, n-1)*sd(x)/sqrt(n),
mean(x) + qt(0.975, n-1)*sd(x)/sqrt(n))
})
# Which CIs contain true mu?
contains_true <- ci_results[1,] <= mu_true & ci_results[2,] >= mu_true
coverage <- mean(contains_true)
# Plot
plot(NULL, xlim=c(3, 7), ylim=c(0, n_sims+1),
main=sprintf("95%% CIs: %d/%d contain true mu=5", sum(contains_true), n_sims),
xlab="mu", ylab="Simulation")
abline(v=mu_true, col="red", lwd=2)
for(i in 1:n_sims) {
col <- ifelse(contains_true[i], "steelblue", "firebrick")
lines(ci_results[,i], c(i,i), col=col, lwd=0.8)
}
cat(sprintf("\nEmpirical coverage: %.1f%% (expected: 95%%)\n", coverage*100))
13 Practice Problems
Problem 1: CI interpretation.

You compute a 95% CI for mean income: [42.3, 51.7] million. Which interpretation is correct?

- (a) There is a 95% probability that the mean income is between 42.3 and 51.7
- (b) 95% of individual incomes fall within this range
- (c) If we repeat the sampling many times, 95% of the resulting CIs will contain the true mean
- (d) The mean income is certainly in this range, with 95% confidence

Answer: (c) is the only correct interpretation in the frequentist framework.
Problem 2: Bootstrap CI calculation.

Data: \(\{3, 7, 2, 9, 5, 1, 8, 4, 6, 10\}\). Statistic: median.

- Carry out 5 bootstrap iterations by hand (or using R)
- Compute the percentile bootstrap 80% CI

Problem 3: Delta method.

\(\hat{p}_1 = 0.6\) (n=100) and \(\hat{p}_2 = 0.4\) (n=150) from two independent samples. Compute a 95% CI for the log odds ratio \(\log(OR) = \log\frac{p_1/(1-p_1)}{p_2/(1-p_2)}\).

Hint: \(\log(OR) = \log(\hat{p}_1/(1-\hat{p}_1)) - \log(\hat{p}_2/(1-\hat{p}_2))\). The delta-method SE of \(\log(\hat{p}/(1-\hat{p}))\) is \(1/\sqrt{n\hat{p}(1-\hat{p})}\).

Answer:

- \(\log(OR) = \log(0.6/0.4) - \log(0.4/0.6) = 0.405 - (-0.405) = 0.811\)
- \(SE^2 = 1/(100 \times 0.6 \times 0.4) + 1/(150 \times 0.4 \times 0.6) = 1/24 + 1/36 = 5/72\)
- \(SE = \sqrt{5/72} = 0.264\)
- 95% CI: \([0.811 - 1.96(0.264),\ 0.811 + 1.96(0.264)] = [0.294, 1.327]\)
- On the OR scale: \([e^{0.294}, e^{1.327}] = [1.34, 3.77]\)
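The hand calculation for Problem 3 can be reproduced in R (any difference in the last digit comes from rounding the SE before multiplying):

```r
p1 <- 0.6; n1 <- 100
p2 <- 0.4; n2 <- 150

log_or <- log(p1 / (1 - p1)) - log(p2 / (1 - p2))
se     <- sqrt(1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2)))

ci_log <- log_or + c(-1, 1) * qnorm(0.975) * se
ci_or  <- exp(ci_log)

round(c(log_or = log_or, se = se), 3)   # 0.811, 0.264
round(ci_or, 2)                          # about [1.34, 3.77]
```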
Problem 4: Coverage simulation.

Write R code to check whether the t-based CI attains its nominal coverage (approximately 95%) for data from a skewed distribution (e.g., Exponential). Test n = 10, 30, 100.
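One possible solution sketch for Problem 4 (the Exponential(1) choice and the simulation sizes are assumptions):

```r
set.seed(123)

coverage <- function(n, n_sims = 5000, mu = 1) {
  hits <- replicate(n_sims, {
    x <- rexp(n, rate = 1)                     # true mean = 1
    half <- qt(0.975, n - 1) * sd(x) / sqrt(n)
    (mean(x) - half <= mu) && (mu <= mean(x) + half)
  })
  mean(hits)
}

sapply(c(10, 30, 100), coverage)
# Coverage runs somewhat below 0.95 for small n (skewness hurts the
# t approximation) and approaches 0.95 as n grows
```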