Joint, Marginal & Conditional Distributions
When Variables Don't Live in Isolation
Why Does This Matter?
In the real world, variables are never truly independent. GDP growth correlates with investment. Education correlates with income. Residuals in spatial data correlate with those of their neighbors.
Covariance matrices, correlation, conditional expectations: all of them are rooted in joint distributions. Multivariate statistics starts here. Without understanding joint distributions, you will not see why:
- OLS mengeksploitasi \(E[y|X] = X\beta\)
- GLS mengoreksi untuk \(\text{Cov}(\varepsilon) = \sigma^2\Omega \neq \sigma^2 I\)
- Omitted variable bias arises when the omitted variable is correlated with the included variables
- Principal Component Analysis looks for directions of maximum variance in the data
1. Joint Distribution
For continuous random variables \((X, Y)\), the joint PDF \(f_{X,Y}(x,y)\) satisfies: \[P(a \leq X \leq b, c \leq Y \leq d) = \int_a^b\int_c^d f_{X,Y}(x,y)\,dy\,dx\]
with \(f_{X,Y}(x,y) \geq 0\) and \(\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy\,dx = 1\).
For discrete variables, the joint PMF is \(p_{X,Y}(x,y) = P(X=x, Y=y)\), with \(\sum_x\sum_y p(x,y) = 1\).
A simple example is the discrete joint PMF for two fair coins:
| | \(Y=0\) | \(Y=1\) |
|---|---|---|
| \(X=0\) | 1/4 | 1/4 |
| \(X=1\) | 1/4 | 1/4 |
This is the independent case. Later we will see cases that are not independent.
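As a quick sanity check in R (the same style as the code later in these notes), we can verify that this table is a valid PMF and that the product rule for independence holds exactly:

```r
# Joint PMF of two fair coins as a 2x2 matrix
joint <- matrix(1/4, nrow = 2, ncol = 2,
                dimnames = list(c("X=0", "X=1"), c("Y=0", "Y=1")))

# Valid PMF: non-negative entries that sum to 1
stopifnot(all(joint >= 0), abs(sum(joint) - 1) < 1e-12)

# Independence: the joint equals the outer product of the marginals
p_X <- rowSums(joint)
p_Y <- colSums(joint)
max(abs(joint - outer(p_X, p_Y)))   # 0
```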
2. Marginal Distributions
A marginal distribution is the distribution of a single variable, “compressed” out of the joint distribution.
From the joint distribution we can recover the marginal distributions:
Continuous: \[f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy\] \[f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\]
Discrete: \[p_X(x) = \sum_y p_{X,Y}(x,y) \quad \text{(sum over all y)}\]
Intuition: the marginal \(f_X(x)\) collects all the probability at \(X=x\), whatever the value of \(Y\). In a discrete table, the marginals are the row sums and column sums.
3. Conditional Distribution
A conditional distribution is the distribution of one variable given the value of another.
\[f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}, \quad \text{provided } f_X(x) > 0\]
This follows directly from the definition of conditional probability: \(P(A|B) = P(A \cap B)/P(B)\).
Conditional expectation: \[E[Y|X=x] = \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y|x)\,dy\]
Note: \(E[Y|X=x]\) is a function of \(x\), not a constant.
An important example: in linear regression we assume \(E[y|X] = X\beta\). This is a statement about the conditional expectation function: given the predictors \(X\), the expected value of \(y\) is a linear function of \(X\).
4. Independence
\(X\) and \(Y\) are independent if and only if: \[f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) \quad \text{for all } x, y\]
Equivalently: \(f_{Y|X}(y|x) = f_Y(y)\) (knowing \(X\) does not change the distribution of \(Y\)).
Implications of independence:
- \(E[XY] = E[X]E[Y]\)
- \(\text{Cov}(X,Y) = 0\)
- \(\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)\)
CAREFUL: \(\text{Cov}(X,Y) = 0\) (uncorrelated) does NOT imply independence, except in the multivariate normal case!
5. Covariance
\[\text{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - \mu_X\mu_Y\]
The second form, \(\text{Cov}(X,Y) = E[XY] - E[X]E[Y]\), is the computational formula.
Properties:
- \(\text{Cov}(X,X) = \text{Var}(X)\)
- \(\text{Cov}(X,Y) = \text{Cov}(Y,X)\) (symmetric)
- \(\text{Cov}(aX+b, cY+d) = ac \cdot \text{Cov}(X,Y)\)
- \(\text{Cov}(X+Y, Z) = \text{Cov}(X,Z) + \text{Cov}(Y,Z)\) (bilinear)
Variance of a sum: \[\text{Var}(X+Y) = \text{Var}(X) + 2\text{Cov}(X,Y) + \text{Var}(Y)\]
More generally: \[\text{Var}\left(\sum_i a_i X_i\right) = \sum_i a_i^2 \text{Var}(X_i) + 2\sum_{i<j} a_i a_j \text{Cov}(X_i, X_j)\]
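The two-variable case of this identity holds exactly for sample variances and covariances as well, which makes it easy to verify on simulated data (the numbers below are arbitrary):

```r
# Check Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X,Y)
set.seed(7)
x <- rnorm(1e4)
y <- 0.5 * x + rnorm(1e4)   # a correlated pair
a <- 2; b <- -3
lhs <- var(a * x + b * y)
rhs <- a^2 * var(x) + b^2 * var(y) + 2 * a * b * cov(x, y)
all.equal(lhs, rhs)   # TRUE: the identity is exact, not just approximate
```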
The problem with covariance: it is unit-dependent. \(\text{Cov}(\text{income in IDR}, \text{years edu})\) is far larger than \(\text{Cov}(\text{income in million IDR}, \text{years edu})\) even though both measure the same relationship. The solution: correlation.
6. Correlation
\[\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}, \quad \sigma_X = \sqrt{\text{Var}(X)}, \sigma_Y = \sqrt{\text{Var}(Y)}\]
Properties:
- \(-1 \leq \rho \leq 1\) (Cauchy-Schwarz inequality)
- \(\rho = 1\): perfect positive linear relationship
- \(\rho = -1\): perfect negative linear relationship
- \(\rho = 0\): uncorrelated (not necessarily independent!)
- Scale-invariant: \(\text{Corr}(aX+b, cY+d) = \text{sign}(ac) \cdot \text{Corr}(X,Y)\) for \(ac \neq 0\)
Important note: correlation measures linear association. Take \(Y = X^2\) with \(X \sim N(0,1)\): \(\text{Corr}(X,Y) = 0\) even though \(Y\) is completely determined by \(X\)!
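A simulation makes this warning concrete: the sample correlation is near zero even though \(Y\) is a deterministic function of \(X\):

```r
# Zero correlation without independence: Y = X^2 with X ~ N(0,1)
set.seed(1)
x <- rnorm(1e5)
y <- x^2
cor(x, y)        # close to 0 (the population correlation is exactly 0)
cor(abs(x), y)   # strongly positive: the dependence is nonlinear, not absent
```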
7. Covariance Matrix
For a random vector \(\mathbf{X} = (X_1, \ldots, X_p)^T\), the covariance matrix (or variance-covariance matrix) \(\Sigma\) is the \(p \times p\) matrix:
\[\Sigma_{ij} = \text{Cov}(X_i, X_j)\]
In matrix form: \[\Sigma = E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T]\]
Properties:
- Symmetric: \(\Sigma = \Sigma^T\)
- Positive semi-definite (PSD): \(\mathbf{a}^T\Sigma\mathbf{a} \geq 0\) for all \(\mathbf{a} \in \mathbb{R}^p\)
- Diagonal elements are the variances: \(\Sigma_{ii} = \text{Var}(X_i)\)
The matrix expression for the variance of a linear combination: \[\text{Var}(\mathbf{a}^T\mathbf{X}) = \mathbf{a}^T\Sigma\mathbf{a}\]
This is crucial for OLS: with homoskedastic, uncorrelated errors, \(\text{Cov}(\varepsilon) = \sigma^2 I\), the variance of the estimator \(\hat{\beta} = (X^TX)^{-1}X^Ty\) is: \[\text{Var}(\hat{\beta}) = (X^TX)^{-1}X^T \cdot (\sigma^2 I) \cdot X(X^TX)^{-1} = \sigma^2(X^TX)^{-1}\]
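The quadratic form \(\text{Var}(\mathbf{a}^T\mathbf{X}) = \mathbf{a}^T\Sigma\mathbf{a}\) is easy to check numerically. A minimal sketch using MASS::mvrnorm (already used later in these notes), with an arbitrary \(\Sigma\) and weight vector:

```r
library(MASS)  # for mvrnorm

# Var(a'X) = a' Sigma a
set.seed(3)
Sigma <- matrix(c(2.0, 0.8,
                  0.8, 1.0), nrow = 2)
X <- mvrnorm(1e5, mu = c(0, 0), Sigma = Sigma)
a <- c(1, -2)

var(as.vector(X %*% a))     # sample variance of the linear combination
drop(t(a) %*% Sigma %*% a)  # theoretical value: 2 + 4 - 2*1.6 = 2.8
```

With the sample covariance matrix in place of \(\Sigma\), the identity holds exactly; with the true \(\Sigma\), it holds up to sampling error.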
8. Bivariate Normal Distribution
\((X,Y) \sim N(\boldsymbol{\mu}, \Sigma)\) with \(\boldsymbol{\mu} = (\mu_X, \mu_Y)^T\) and: \[\Sigma = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}\]
PDF: \[f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)\]
Key properties of the bivariate normal:
1. Marginals: \(X \sim N(\mu_X, \sigma_X^2)\), \(Y \sim N(\mu_Y, \sigma_Y^2)\)
2. The conditional distribution is also normal: \[Y|X=x \sim N\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\ \sigma_Y^2(1-\rho^2)\right)\]
3. If \(\rho = 0\), then \(X\) and \(Y\) are independent (this holds only for the normal!)
Implication for regression: the conditional mean \(E[Y|X=x] = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X)\) is a linear function of \(x\). This provides a theoretical justification for linear regression when the data are jointly normal.
The slope of the regression of \(Y\) on \(X\) is \(b = \rho\frac{\sigma_Y}{\sigma_X} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}\).
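This slope identity also holds exactly in-sample, so lm() must reproduce cov(x, y)/var(x); the simulated numbers below are arbitrary:

```r
# OLS slope of Y on X equals cov(X, Y) / var(X)
set.seed(5)
x <- rnorm(200, mean = 2, sd = 1.5)
y <- 5 + 0.9 * x + rnorm(200)
b_formula <- cov(x, y) / var(x)
b_lm <- unname(coef(lm(y ~ x))["x"])
c(b_formula, b_lm)   # identical up to floating point
```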
9. Law of Iterated Expectations
\[E[Y] = E_X[E[Y|X]] = E[E[Y|X]]\]
In words: to get the unconditional expectation of \(Y\), we can:
1. Compute the conditional expectation \(E[Y|X=x]\) for each \(x\)
2. Average over the distribution of \(X\)
In integral notation: \[E[Y] = \int E[Y|X=x] \cdot f_X(x)\,dx\]
Why does this matter? The LIE is the foundation of:
Causal inference: \(E[Y^{(1)} - Y^{(0)}]\) can be computed as \(E_X[E[Y|X, D=1] - E[Y|X, D=0]]\) under conditional independence.
Panel data: \(E[y_{it}] = E_i[E[y_{it}|i]]\) — decomposing into between and within variation.
Prediction: Best predictor of \(Y\) given \(X\) (minimizing MSE) is \(E[Y|X]\).
An intuitive example: the average exam score in a school is the size-weighted average of the per-class average scores. The LIE states this formally.
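The class-average example can be sketched in a few lines (the scores below are made up); note the weighting by class size, which is the discrete version of averaging over the distribution of \(X\):

```r
# LIE on grouped data: overall mean = probability-weighted average of group means
scores <- list(classA = c(70, 80, 90),   # hypothetical exam scores
               classB = c(60, 65))
sizes <- sapply(scores, length)
class_means <- sapply(scores, mean)      # E[Y | X = class]
class_probs <- sizes / sum(sizes)        # P(X = class)

sum(class_means * class_probs)           # 73
mean(unlist(scores))                     # 73, the same by the LIE
```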
10. Conditional Variance Formula (Eve’s Law)
\[\text{Var}(Y) = E[\text{Var}(Y|X)] + \text{Var}(E[Y|X])\]
Decomposition:
- \(E[\text{Var}(Y|X)]\) = within-group variance (the average variance within each group defined by \(X\))
- \(\text{Var}(E[Y|X])\) = between-group variance (the variance of the group means)
Analogy with ANOVA: this is exactly SST = SSW + SSB!
Application in regression: \(R^2 = \text{SSR}/\text{SST} = \text{Var}(E[Y|X])/\text{Var}(Y)\) (with SSR the regression sum of squares) is the fraction of variance explained by between-group variation in the conditional means.
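Eve's law can be checked directly on grouped data. The sketch below uses population (divide-by-\(n\)) variances, for which the within + between decomposition is exact:

```r
# Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) on simulated groups
set.seed(9)
n <- 1e4
g <- sample(1:3, n, replace = TRUE)                  # group label X
y <- c(0, 2, 5)[g] + rnorm(n, sd = c(1, 1, 2)[g])    # group-specific mean and sd

pop_var <- function(v) mean((v - mean(v))^2)         # divide by n, not n - 1
w <- table(g) / n                                    # group proportions
within  <- sum(tapply(y, g, pop_var) * w)            # E[Var(Y|X)]
between <- sum((tapply(y, g, mean) - mean(y))^2 * w) # Var(E[Y|X])

c(within + between, pop_var(y))                      # equal
```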
11. Worked Example: Bivariate Covariance
Problem: suppose the joint PMF of \((X, Y)\) is:
| | \(Y=0\) | \(Y=1\) | \(Y=2\) |
|---|---|---|---|
| \(X=0\) | 0.1 | 0.2 | 0.1 |
| \(X=1\) | 0.2 | 0.3 | 0.1 |
Step 1: Marginal distributions. \[p_X(0) = 0.1+0.2+0.1 = 0.4, \quad p_X(1) = 0.2+0.3+0.1 = 0.6\] \[p_Y(0) = 0.1+0.2 = 0.3, \quad p_Y(1) = 0.2+0.3 = 0.5, \quad p_Y(2) = 0.1+0.1 = 0.2\]
Step 2: Means. \[E[X] = 0(0.4) + 1(0.6) = 0.6\] \[E[Y] = 0(0.3) + 1(0.5) + 2(0.2) = 0 + 0.5 + 0.4 = 0.9\]
Step 3: \(E[XY]\). \[E[XY] = \sum_{x,y} xy \cdot p(x,y)\] \[= 0\cdot0\cdot0.1 + 0\cdot1\cdot0.2 + 0\cdot2\cdot0.1 + 1\cdot0\cdot0.2 + 1\cdot1\cdot0.3 + 1\cdot2\cdot0.1\] \[= 0 + 0 + 0 + 0 + 0.3 + 0.2 = 0.5\]
Step 4: Covariance and correlation. \[\text{Cov}(X,Y) = E[XY] - E[X]E[Y] = 0.5 - (0.6)(0.9) = 0.5 - 0.54 = -0.04\]
\[\text{Var}(X) = E[X^2] - (E[X])^2 = 0.6 - 0.36 = 0.24\] \[\text{Var}(Y) = E[Y^2] - (E[Y])^2 = (0+0.5+0.8) - 0.81 = 1.3 - 0.81 = 0.49\]
\[\rho = \frac{-0.04}{\sqrt{0.24 \times 0.49}} = \frac{-0.04}{\sqrt{0.1176}} = \frac{-0.04}{0.343} = -0.117\]
Step 5: Conditional expectation \(E[Y|X=1]\). \[P(Y=0|X=1) = 0.2/0.6 = 1/3, \quad P(Y=1|X=1) = 0.3/0.6 = 1/2, \quad P(Y=2|X=1) = 0.1/0.6 = 1/6\] \[E[Y|X=1] = 0(1/3) + 1(1/2) + 2(1/6) = 0 + 0.5 + 0.333 = 0.833\]
Verify LIE: \[E[E[Y|X]] = E[Y|X=0]\cdot 0.4 + E[Y|X=1]\cdot 0.6\] \[E[Y|X=0] = 0(1/4) + 1(1/2) + 2(1/4) = 1\] \[= 1(0.4) + 0.833(0.6) = 0.4 + 0.5 = 0.9 = E[Y] \checkmark\]
# Define joint PMF as matrix
joint_pmf <- matrix(c(0.1, 0.2, 0.1,
0.2, 0.3, 0.1), nrow=2, byrow=TRUE)
rownames(joint_pmf) <- c("X=0", "X=1")
colnames(joint_pmf) <- c("Y=0", "Y=1", "Y=2")
# Marginals
p_X <- rowSums(joint_pmf) # c(0.4, 0.6)
p_Y <- colSums(joint_pmf) # c(0.3, 0.5, 0.2)
# Means
x_vals <- c(0, 1); y_vals <- c(0, 1, 2)
E_X <- sum(x_vals * p_X) # 0.6
E_Y <- sum(y_vals * p_Y) # 0.9
# E[XY]
E_XY <- sum(outer(x_vals, y_vals) * joint_pmf) # 0.5
# Covariance
cov_XY <- E_XY - E_X * E_Y # -0.04
# Variances and correlation
var_X <- sum(x_vals^2 * p_X) - E_X^2 # 0.24
var_Y <- sum(y_vals^2 * p_Y) - E_Y^2 # 0.49
rho <- cov_XY / sqrt(var_X * var_Y) # -0.117
cat("Cov(X,Y) =", cov_XY, "\nCorr(X,Y) =", rho, "\n")
12. R Code: Working with Joint Distributions
library(MASS)
library(ggplot2)
# ============================================================
# BIVARIATE NORMAL: Simulate and visualize
# ============================================================
set.seed(2024)
n <- 1000
# Parameters
mu <- c(2, 5)
sigma_x <- 1.5
sigma_y <- 2.0
rho <- 0.7
# Covariance matrix
Sigma <- matrix(c(sigma_x^2,
rho * sigma_x * sigma_y,
rho * sigma_x * sigma_y,
sigma_y^2), nrow=2)
# Simulate
data <- mvrnorm(n, mu, Sigma)
X <- data[, 1]; Y <- data[, 2]
# Verify
cat("Sample correlation:", cor(X, Y), "(True:", rho, ")\n")
cat("Sample cov matrix:\n"); print(cov(data))
cat("True cov matrix:\n"); print(Sigma)
# ============================================================
# CONDITIONAL EXPECTATION
# ============================================================
# Theoretical E[Y|X=x]
# E[Y|X=x] = mu_Y + rho * (sigma_Y/sigma_X) * (x - mu_X)
cond_mean_fn <- function(x) {
mu[2] + rho * (sigma_y/sigma_x) * (x - mu[1])
}
# Empirical: bin X and compute average Y in each bin
x_breaks <- quantile(X, probs=seq(0, 1, by=0.2))
X_bin <- cut(X, breaks=x_breaks, include.lowest=TRUE)
cond_means <- tapply(Y, X_bin, mean)
x_midpoints <- (x_breaks[-length(x_breaks)] + x_breaks[-1]) / 2
# Plot
plot(X, Y, col=rgb(0,0,1,0.2), pch=16, cex=0.5,
main="Bivariate Normal with Conditional Mean",
xlab="X", ylab="Y")
curve(cond_mean_fn(x), add=TRUE, col="red", lwd=2)
points(x_midpoints, cond_means, col="orange", pch=17, cex=1.5)
legend("topleft", c("Data", "True E[Y|X]", "Empirical E[Y|X]"),
col=c(rgb(0,0,1,0.5), "red", "orange"),
pch=c(16, NA, 17), lty=c(NA, 1, NA), lwd=c(NA, 2, NA))
# ============================================================
# LAW OF ITERATED EXPECTATIONS: verify numerically
# ============================================================
# Marginal E[Y] should equal E[E[Y|X]]
n_x_points <- 100
x_range <- seq(min(X), max(X), length.out=n_x_points)
cond_means_at_x <- cond_mean_fn(x_range)
density_at_x <- dnorm(x_range, mean=mu[1], sd=sigma_x)
# Numerical integral: E_X[E[Y|X]]
LIE_integral <- sum(cond_means_at_x * density_at_x) * diff(x_range)[1]
cat("\nLIE check:\n")
cat("E[Y] (marginal) =", mu[2], "\n")
cat("E[E[Y|X]] (integral) =", LIE_integral, "\n")
# ============================================================
# COVARIANCE MATRIX IN OLS CONTEXT
# ============================================================
set.seed(42)
n <- 200
X_mat <- cbind(1, rnorm(n), rnorm(n)) # Design matrix [1, x2, x3]
beta_true <- c(1, 2, -1)
sigma_sq <- 4
y <- X_mat %*% beta_true + rnorm(n, sd=sqrt(sigma_sq))
# OLS
XtX_inv <- solve(t(X_mat) %*% X_mat)
beta_hat <- XtX_inv %*% t(X_mat) %*% y
e_hat <- y - X_mat %*% beta_hat
sigma_sq_hat <- sum(e_hat^2) / (n - 3)
# Estimated covariance matrix of beta_hat
Var_beta_hat <- sigma_sq_hat * XtX_inv
cat("\nEstimated Cov(beta_hat):\n")
print(Var_beta_hat)
cat("\nSE of beta_hat:", sqrt(diag(Var_beta_hat)), "\n")
# Verify with lm()
model <- lm(y ~ X_mat[,2] + X_mat[,3])
cat("\nFrom lm():", coef(summary(model))[, "Std. Error"], "\n")
13. Connections to Econometrics and ML
OLS as a conditional expectation model: we assume \(E[\varepsilon|X] = 0\), which implies \(E[y|X] = X\beta\). This is a statement about the conditional distribution: given the covariates, the expected error is zero.
When this assumption fails (endogeneity), the OLS estimator is inconsistent, because \(E[\varepsilon|X] \neq 0\), or equivalently \(\text{Cov}(X, \varepsilon) \neq 0\).
GLS for correlated errors: when \(\text{Cov}(\varepsilon) = \sigma^2\Omega \neq \sigma^2 I\) (serial correlation or heteroskedasticity), we need GLS, which accounts for the joint distribution of the error terms.
Principal Component Analysis: PCA finds the eigenvectors of the covariance matrix \(\Sigma\). This is a direct application of the linear algebra of the covariance matrix.
Omitted variable bias: if the true model is \(y = X_1\beta_1 + X_2\beta_2 + \varepsilon\) but we run \(y = X_1\gamma + \eta\), then \(\hat{\gamma} \to \beta_1 + \frac{\text{Cov}(X_1, X_2)}{\text{Var}(X_1)}\beta_2\). The size of the bias depends on \(\text{Cov}(X_1, X_2)\), a joint-distribution quantity!
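A quick simulation shows the bias formula at work; the coefficients and the strength of the correlation below are made up for illustration:

```r
# Omitted variable bias: short regression -> beta1 + [Cov(x1,x2)/Var(x1)] * beta2
set.seed(11)
n <- 1e5
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)          # x2 is correlated with x1
y  <- 2 * x1 + 3 * x2 + rnorm(n)   # true model: beta1 = 2, beta2 = 3

gamma_hat <- unname(coef(lm(y ~ x1))["x1"])   # short regression, x2 omitted
bias_pred <- 2 + cov(x1, x2) / var(x1) * 3    # formula prediction
c(gamma_hat, bias_pred)   # both close to 2 + 0.6 * 3 = 3.8
```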
Practice Problems
Problem 1: Marginals and conditionals.
Joint PDF: \(f(x,y) = 2\) for \(0 \leq x \leq y \leq 1\), zero otherwise.
- Compute \(f_X(x)\) and \(f_Y(y)\)
- Compute \(f_{Y|X}(y|x)\)
- Are \(X\) and \(Y\) independent?
- Compute \(E[Y|X=0.3]\)
Answer:
- \(f_X(x) = \int_x^1 2\,dy = 2(1-x)\) for \(0 \leq x \leq 1\)
- \(f_Y(y) = \int_0^y 2\,dx = 2y\) for \(0 \leq y \leq 1\)
- \(f_{Y|X}(y|x) = 2/(2(1-x)) = 1/(1-x)\) for \(x \leq y \leq 1\), i.e. Uniform\([x,1]\)
- No, because \(f_{X,Y} \neq f_X \cdot f_Y\)
- \(E[Y|X=0.3] = (0.3+1)/2 = 0.65\) (the mean of Uniform\([0.3, 1]\))
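These answers can be cross-checked by Monte Carlo. Since \(f(x,y)=2\) on \(\{0 \le x \le y \le 1\}\) is the uniform density on a triangle, the pair \((\min(U,V), \max(U,V))\) with independent \(U, V \sim \text{Uniform}(0,1)\) has exactly this distribution:

```r
# Monte Carlo check of Problem 1
set.seed(13)
u <- runif(1e6); v <- runif(1e6)
x <- pmin(u, v); y <- pmax(u, v)   # (x, y) ~ f(x,y) = 2 on 0 <= x <= y <= 1

mean(x <= 0.5)                     # ~0.75 = F_X(0.5), since f_X(x) = 2(1-x)
mean(y[abs(x - 0.3) < 0.01])       # ~0.65 = E[Y | X = 0.3]
```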
Problem 2: Covariance and correlation.
\(X \sim \text{Uniform}(0,1)\) and \(Y = X + \varepsilon\), where \(\varepsilon \sim N(0, \sigma^2)\) is independent of \(X\).
- Compute \(\text{Cov}(X,Y)\)
- Compute \(\text{Var}(Y)\)
- Compute \(\text{Corr}(X,Y)\)
- As \(\sigma^2 \to 0\), what happens to the correlation?
Answer:
- \(\text{Cov}(X,Y) = \text{Cov}(X, X+\varepsilon) = \text{Var}(X) = 1/12\)
- \(\text{Var}(Y) = \text{Var}(X) + \text{Var}(\varepsilon) = 1/12 + \sigma^2\)
- \(\rho = (1/12)/\sqrt{(1/12)(1/12+\sigma^2)}\)
- As \(\sigma^2 \to 0\): \(\rho \to 1\) (perfect correlation)
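A simulation check for one specific noise level (the choice σ = 0.2 is arbitrary):

```r
# Problem 2 by simulation: Corr(X, Y) with Y = X + eps
set.seed(17)
sigma <- 0.2
x <- runif(1e6)
y <- x + rnorm(1e6, sd = sigma)

rho_sim  <- cor(x, y)                                  # sample correlation
rho_theo <- (1/12) / sqrt((1/12) * (1/12 + sigma^2))   # value from the formula
c(rho_sim, rho_theo)
```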
Problem 3: Law of Iterated Expectations.
A student's exam score \(Y\) has a different distribution depending on the school type \(X\):
- \(X=1\) (public school, 60% of students): \(Y|X=1 \sim N(75, 100)\)
- \(X=2\) (private school, 40% of students): \(Y|X=2 \sim N(85, 64)\)
Compute:
- \(E[Y]\) (the overall average score)
- \(\text{Var}(Y)\) (the overall variance; use the Law of Total Variance)
Answer:
- \(E[Y] = E[E[Y|X]] = 75(0.6) + 85(0.4) = 45 + 34 = 79\)
- \(E[\text{Var}(Y|X)] = 100(0.6) + 64(0.4) = 60 + 25.6 = 85.6\)
- \(\text{Var}(E[Y|X]) = (75-79)^2(0.6) + (85-79)^2(0.4) = 16(0.6) + 36(0.4) = 9.6 + 14.4 = 24\)
- \(\text{Var}(Y) = 85.6 + 24 = 109.6\)
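And a simulation cross-check of the mixture calculation:

```r
# Problem 3 by simulation: mixture of two school types
set.seed(19)
n <- 1e6
public <- runif(n) < 0.6                # 60% public-school students
y <- ifelse(public,
            rnorm(n, 75, 10),           # Y | X=1 ~ N(75, 100)
            rnorm(n, 85, 8))            # Y | X=2 ~ N(85, 64)
mean(y)   # ~79
var(y)    # ~109.6
```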