Joint, Marginal & Conditional Distributions

When Variables Don't Live Alone

probability
joint-distributions
multivariate
Understanding joint distributions, covariance, correlation, and the Law of Iterated Expectations as the foundation of multivariate statistics and econometrics.

1 Why This Matters

Note: Why This Matters for Your Work

In the real world, variables are never truly independent. GDP growth correlates with investment. Education correlates with income. Residuals in spatial data correlate with their neighbors.

Covariance matrices, correlation, conditional expectations: all of these are rooted in joint distributions. Multivariate statistics starts here. Without understanding joint distributions, you won't understand why:

  • OLS exploits \(E[y|X] = X\beta\)
  • GLS corrects for \(\text{Cov}(\varepsilon) = \sigma^2\Omega \neq \sigma^2 I\)
  • Omitted variable bias occurs when the omitted variable correlates with the included variables
  • Principal Component Analysis seeks directions of maximum variance in the data

2 1. Joint Distribution

Important (Definition): Joint PDF and PMF

For continuous random variables \((X, Y)\), the joint PDF \(f_{X,Y}(x,y)\) satisfies: \[P(a \leq X \leq b, c \leq Y \leq d) = \int_a^b\int_c^d f_{X,Y}(x,y)\,dy\,dx\]

with \(f_{X,Y}(x,y) \geq 0\) and \(\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy\,dx = 1\).

For discrete variables, the joint PMF is \(p_{X,Y}(x,y) = P(X=x, Y=y)\) with \(\sum_x\sum_y p(x,y) = 1\).

A simple example: the discrete joint PMF for two fair coins:

|         | \(Y=0\) | \(Y=1\) |
|---------|---------|---------|
| \(X=0\) | 1/4     | 1/4     |
| \(X=1\) | 1/4     | 1/4     |

This is the independent case. Later we will see a case that is not independent.
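The factorization behind independence can be checked mechanically. A minimal R sketch (not part of the original example): for the two-coin table, the joint PMF should equal the outer product of its marginals, cell by cell.

```r
# Joint PMF for two fair coins (rows: X, columns: Y)
joint <- matrix(1/4, nrow = 2, ncol = 2,
                dimnames = list(c("X=0", "X=1"), c("Y=0", "Y=1")))

p_X <- rowSums(joint)  # marginal PMF of X: (1/2, 1/2)
p_Y <- colSums(joint)  # marginal PMF of Y: (1/2, 1/2)

# Independence: joint = outer product of marginals
independent <- all(abs(joint - outer(p_X, p_Y)) < 1e-12)
independent  # TRUE
```

Changing any cell (while keeping the total at 1) generally breaks this factorization, which is what a dependent joint PMF looks like.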


3 2. Marginal Distributions

A marginal distribution is the distribution of a single variable, "compressed" out of the joint distribution.

Important (Definition): Marginal Distribution

From the joint distribution, we can recover the marginal distributions:

Continuous: \[f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy\] \[f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\]

Discrete: \[p_X(x) = \sum_y p_{X,Y}(x,y) \quad \text{(sum over all y)}\]

Intuition: the marginal \(f_X(x)\) is the total probability mass at \(X=x\), ignoring the value of \(Y\). In a discrete table, this is a row sum or a column sum.


4 3. Conditional Distribution

A conditional distribution is the distribution of one variable given the value of another.

Important (Definition): Conditional Distribution

\[f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}, \quad \text{provided } f_X(x) > 0\]

This follows directly from the definition of conditional probability: \(P(A|B) = P(A \cap B)/P(B)\).

Conditional expectation: \[E[Y|X=x] = \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y|x)\,dy\]

Note that \(E[Y|X=x]\) is a function of \(x\), not a constant.

An important example: in linear regression we assume \(E[y|X] = X\beta\). This is a statement about the conditional expectation function: given the predictors \(X\), the expected value of \(y\) is a linear function of \(X\).


5 4. Independence

Important (Definition): Independence

\(X\) and \(Y\) are independent if and only if: \[f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) \quad \text{for all } x, y\]

Equivalently: \(f_{Y|X}(y|x) = f_Y(y)\) (knowing \(X\) does not change the distribution of \(Y\)).

Implications of independence:

  • \(E[XY] = E[X]E[Y]\)
  • \(\text{Cov}(X,Y) = 0\)
  • \(\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)\)

CAREFUL: \(\text{Cov}(X,Y) = 0\) (uncorrelated) does NOT imply independence, except for the multivariate normal!


6 5. Covariance

Important (Definition): Covariance

\[\text{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]\]

Computational formula: \(\text{Cov}(X,Y) = E[XY] - E[X]E[Y] = E[XY] - \mu_X\mu_Y\)

Properties:

  • \(\text{Cov}(X,X) = \text{Var}(X)\)
  • \(\text{Cov}(X,Y) = \text{Cov}(Y,X)\) (symmetric)
  • \(\text{Cov}(aX+b, cY+d) = ac \cdot \text{Cov}(X,Y)\)
  • \(\text{Cov}(X+Y, Z) = \text{Cov}(X,Z) + \text{Cov}(Y,Z)\) (bilinear)

Variance of a sum: \[\text{Var}(X+Y) = \text{Var}(X) + 2\text{Cov}(X,Y) + \text{Var}(Y)\]

More generally: \[\text{Var}\left(\sum_i a_i X_i\right) = \sum_i a_i^2 \text{Var}(X_i) + 2\sum_{i<j} a_i a_j \text{Cov}(X_i, X_j)\]
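The general formula holds exactly for sample moments as well, which makes it easy to sanity-check in R (a small sketch with simulated data, not from the original):

```r
set.seed(1)
n <- 10000
# Two correlated variables built from a shared component (illustrative)
z  <- rnorm(n)
x1 <- z + rnorm(n)
x2 <- 0.5 * z + rnorm(n)
a  <- c(2, -3)

lhs <- var(a[1] * x1 + a[2] * x2)
rhs <- a[1]^2 * var(x1) + a[2]^2 * var(x2) + 2 * a[1] * a[2] * cov(x1, x2)
all.equal(lhs, rhs)  # TRUE: the identity is exact, not just approximate
```

The check passes exactly (up to floating-point rounding) because sample variance and covariance satisfy the same bilinearity as their population counterparts.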

The problem with covariance: it is unit-dependent. \(\text{Cov}(\text{income in IDR}, \text{years edu})\) will be far larger than \(\text{Cov}(\text{income in million IDR}, \text{years edu})\) even though both measure the same thing. The solution: correlation.
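The unit dependence is easy to demonstrate with simulated data (the income and education numbers below are made up for the sketch): rescaling income from IDR to millions of IDR divides the covariance by \(10^6\) but leaves the correlation untouched.

```r
set.seed(7)
edu        <- rnorm(1000, mean = 12, sd = 3)           # years of education
income_idr <- 5e7 + 4e6 * edu + rnorm(1000, sd = 2e7)  # income in IDR

cov_idr <- cov(income_idr, edu)         # covariance in IDR-years units
cov_mil <- cov(income_idr / 1e6, edu)   # same data, income in millions of IDR

cov_idr / cov_mil                       # 1e6: covariance scales with the units
all.equal(cor(income_idr, edu),
          cor(income_idr / 1e6, edu))   # TRUE: correlation is scale-free
```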


7 6. Correlation

Important (Definition): Correlation (Pearson)

\[\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}, \quad \sigma_X = \sqrt{\text{Var}(X)}, \sigma_Y = \sqrt{\text{Var}(Y)}\]

Properties:

  • \(-1 \leq \rho \leq 1\) (Cauchy-Schwarz inequality)
  • \(\rho = 1\): perfect positive linear relationship
  • \(\rho = -1\): perfect negative linear relationship
  • \(\rho = 0\): uncorrelated (not necessarily independent!)
  • Scale-invariant: \(\text{Corr}(aX+b, cY+d) = \text{sign}(ac) \cdot \text{Corr}(X,Y)\)

An important caveat: correlation measures linear association. Take \(Y = X^2\) with \(X \sim N(0,1)\): \(\text{Corr}(X,Y) = 0\) even though \(Y\) is fully determined by \(X\)!
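A quick simulation makes the point (the seed and sample size are arbitrary choices for the sketch):

```r
set.seed(123)
x <- rnorm(1e5)
y <- x^2  # fully determined by x, but not linearly

cor(x, y)    # close to 0 despite perfect dependence
cor(x^2, y)  # 1: the association is quadratic, not linear
```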


8 7. Covariance Matrix

Important (Definition): Covariance Matrix

For a random vector \(\mathbf{X} = (X_1, \ldots, X_p)^T\), the covariance matrix (or variance-covariance matrix) \(\Sigma\) is the \(p \times p\) matrix:

\[\Sigma_{ij} = \text{Cov}(X_i, X_j)\]

In matrix form: \[\Sigma = E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T]\]

Properties:

  • Symmetric: \(\Sigma = \Sigma^T\)
  • Positive semi-definite (PSD): \(\mathbf{a}^T\Sigma\mathbf{a} \geq 0\) for all \(\mathbf{a} \in \mathbb{R}^p\)
  • Diagonal elements are variances: \(\Sigma_{ii} = \text{Var}(X_i)\)

The matrix expression for the variance of a linear combination: \[\text{Var}(\mathbf{a}^T\mathbf{X}) = \mathbf{a}^T\Sigma\mathbf{a}\]
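This quadratic-form identity also holds exactly for the sample covariance matrix, so it can be verified directly (the mixing matrix below is an arbitrary choice for the sketch):

```r
set.seed(99)
n <- 5000
# Three correlated columns via a lower-triangular mix (illustrative)
Z <- matrix(rnorm(n * 3), ncol = 3)
A <- matrix(c(1,   0,   0,
              0.5, 1,   0,
              0.2, 0.4, 1), nrow = 3, byrow = TRUE)
X <- Z %*% t(A)

a <- c(1, -2, 0.5)
S <- cov(X)  # sample covariance matrix (3 x 3)

lhs <- var(as.vector(X %*% a))        # Var(a'X) computed directly
rhs <- as.numeric(t(a) %*% S %*% a)   # quadratic form a' S a
all.equal(lhs, rhs)  # TRUE
```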

This is central to OLS: under \(\text{Cov}(\varepsilon) = \sigma^2 I\), the variance of the estimator \(\hat{\beta} = (X^TX)^{-1}X^Ty\) is: \[\text{Var}(\hat{\beta}) = (X^TX)^{-1}X^T \cdot (\sigma^2 I) \cdot X(X^TX)^{-1} = \sigma^2(X^TX)^{-1}\]


9 8. Bivariate Normal Distribution

Important (Definition): Bivariate Normal

\((X,Y) \sim N(\boldsymbol{\mu}, \Sigma)\) with \(\boldsymbol{\mu} = (\mu_X, \mu_Y)^T\) and: \[\Sigma = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}\]

PDF: \[f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)\]

Key properties of the bivariate normal:

  1. Marginals: \(X \sim N(\mu_X, \sigma_X^2)\), \(Y \sim N(\mu_Y, \sigma_Y^2)\)

  2. The conditional distribution is also normal: \[Y|X=x \sim N\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\ \sigma_Y^2(1-\rho^2)\right)\]

  3. If \(\rho = 0\), then \(X\) and \(Y\) are independent (this holds only for the normal!)

Implication for regression: the conditional mean \(E[Y|X=x] = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X)\) is a linear function of \(x\). This provides theoretical justification for linear regression when the data are jointly normal.

The slope of the regression of \(Y\) on \(X\) is \(b = \rho\frac{\sigma_Y}{\sigma_X} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}\).
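In sample terms, this slope identity is exactly what `lm()` computes, which is easy to confirm (the parameter values below are assumptions of the sketch):

```r
set.seed(11)
n   <- 10000
rho <- 0.7; s_x <- 1.5; s_y <- 2.0
x <- rnorm(n, mean = 2, sd = s_x)
# Conditional-normal construction: E[Y|X=x] linear, Var(Y|X) = s_y^2 (1 - rho^2)
y <- 5 + rho * (s_y / s_x) * (x - 2) + rnorm(n, sd = s_y * sqrt(1 - rho^2))

b_formula <- cov(x, y) / var(x)        # Cov(X,Y) / Var(X)
b_lm      <- unname(coef(lm(y ~ x))[2])
all.equal(b_formula, b_lm)             # TRUE: the OLS slope is exactly this ratio
```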


10 9. Law of Iterated Expectations

Important (Definition): Law of Iterated Expectations (LIE)

\[E[Y] = E_X[E[Y|X]] = E[E[Y|X]]\]

In words: to obtain the unconditional expectation of \(Y\), we can:

  1. Compute the conditional expectation \(E[Y|X=x]\) for every \(x\)

  2. Average over the distribution of \(X\)

In integral notation: \[E[Y] = \int E[Y|X=x] \cdot f_X(x)\,dx\]

Why does this matter? LIE is the foundation of:

  1. Causal inference: \(E[Y^{(1)} - Y^{(0)}]\) can be computed as \(E_X[E[Y|X, D=1] - E[Y|X, D=0]]\) under conditional independence.

  2. Panel data: \(E[y_{it}] = E_i[E[y_{it}|i]]\) — decomposing into between and within variation.

  3. Prediction: Best predictor of \(Y\) given \(X\) (minimizing MSE) is \(E[Y|X]\).

An intuitive example: the average exam score in a school = the average of the per-class average scores, weighted by class size. LIE states this formally.
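The school example can be verified numerically; the class sizes and class means below are made up for the sketch:

```r
set.seed(5)
class_sizes <- c(20, 35, 45)
class_means <- c(70, 78, 85)
# Simulate scores class by class, then pool them
scores <- unlist(mapply(function(n, m) rnorm(n, mean = m, sd = 5),
                        class_sizes, class_means, SIMPLIFY = FALSE))
class_id <- rep(seq_along(class_sizes), class_sizes)

overall  <- mean(scores)                    # E[Y]
by_class <- tapply(scores, class_id, mean)  # E[Y | class]
weights  <- class_sizes / sum(class_sizes)  # P(class)
iterated <- sum(by_class * weights)         # E[E[Y | class]]

all.equal(overall, iterated)  # TRUE: LIE with class-share weights
```

Note that the unweighted average of class means would not match unless every class had the same size.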


11 10. Conditional Variance Formula (Eve’s Law)

Important (Definition): Law of Total Variance

\[\text{Var}(Y) = E[\text{Var}(Y|X)] + \text{Var}(E[Y|X])\]

Decomposition:

  • \(E[\text{Var}(Y|X)]\) = within-group variance (the average variance within each group \(X\))
  • \(\text{Var}(E[Y|X])\) = between-group variance (the variance of the group means)

Analogy to ANOVA: this is exactly SST = SSW + SSB in ANOVA!

Application in regression: \(R^2 = \text{SSR}/\text{SST} = \text{Var}(E[Y|X])/\text{Var}(Y)\) (with SSR the regression sum of squares) is the fraction of variance explained by between-group variation in the conditional means.
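The decomposition holds exactly in-sample when every variance uses the \(1/N\) convention. A simulation sketch (group shares, means, and spreads are assumed values):

```r
set.seed(8)
n <- 10000
g <- sample(1:3, n, replace = TRUE, prob = c(0.5, 0.3, 0.2))
y <- rnorm(n, mean = c(0, 2, 5)[g], sd = c(1, 2, 1.5)[g])

pop_var <- function(v) mean((v - mean(v))^2)  # variance with 1/N convention

total   <- pop_var(y)
p_g     <- as.numeric(table(g)) / n                      # group shares
within  <- sum(tapply(y, g, pop_var) * p_g)              # E[Var(Y|X)]
between <- sum((tapply(y, g, mean) - mean(y))^2 * p_g)   # Var(E[Y|X])

all.equal(total, within + between)  # TRUE: Eve's law, exact in-sample
```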


12 11. Worked Example: Bivariate Covariance

Problem: suppose the joint PMF of \((X, Y)\) is:

|         | \(Y=0\) | \(Y=1\) | \(Y=2\) |
|---------|---------|---------|---------|
| \(X=0\) | 0.1     | 0.2     | 0.1     |
| \(X=1\) | 0.2     | 0.3     | 0.1     |

Step 1: Marginal distributions. \[p_X(0) = 0.1+0.2+0.1 = 0.4, \quad p_X(1) = 0.2+0.3+0.1 = 0.6\] \[p_Y(0) = 0.1+0.2 = 0.3, \quad p_Y(1) = 0.2+0.3 = 0.5, \quad p_Y(2) = 0.1+0.1 = 0.2\]

Step 2: Means. \[E[X] = 0(0.4) + 1(0.6) = 0.6\] \[E[Y] = 0(0.3) + 1(0.5) + 2(0.2) = 0 + 0.5 + 0.4 = 0.9\]

Step 3: \(E[XY]\). \[E[XY] = \sum_{x,y} xy \cdot p(x,y)\] \[= 0\cdot0\cdot0.1 + 0\cdot1\cdot0.2 + 0\cdot2\cdot0.1 + 1\cdot0\cdot0.2 + 1\cdot1\cdot0.3 + 1\cdot2\cdot0.1\] \[= 0 + 0 + 0 + 0 + 0.3 + 0.2 = 0.5\]

Step 4: Covariance and correlation. \[\text{Cov}(X,Y) = E[XY] - E[X]E[Y] = 0.5 - (0.6)(0.9) = 0.5 - 0.54 = -0.04\]

\[\text{Var}(X) = E[X^2] - (E[X])^2 = 0.6 - 0.36 = 0.24\] \[\text{Var}(Y) = E[Y^2] - (E[Y])^2 = (0+0.5+0.8) - 0.81 = 1.3 - 0.81 = 0.49\]

\[\rho = \frac{-0.04}{\sqrt{0.24 \times 0.49}} = \frac{-0.04}{\sqrt{0.1176}} = \frac{-0.04}{0.343} = -0.117\]

Step 5: Conditional expectation \(E[Y|X=1]\). \[P(Y=0|X=1) = 0.2/0.6 = 1/3, \quad P(Y=1|X=1) = 0.3/0.6 = 1/2, \quad P(Y=2|X=1) = 0.1/0.6 = 1/6\] \[E[Y|X=1] = 0(1/3) + 1(1/2) + 2(1/6) = 0 + 0.5 + 0.333 = 0.833\]

Verify LIE: \[E[E[Y|X]] = E[Y|X=0]\cdot 0.4 + E[Y|X=1]\cdot 0.6\] \[E[Y|X=0] = 0(1/4) + 1(1/2) + 2(1/4) = 1\] \[= 1(0.4) + 0.833(0.6) = 0.4 + 0.5 = 0.9 = E[Y] \checkmark\]

# Define joint PMF as matrix
joint_pmf <- matrix(c(0.1, 0.2, 0.1,
                      0.2, 0.3, 0.1), nrow=2, byrow=TRUE)
rownames(joint_pmf) <- c("X=0", "X=1")
colnames(joint_pmf) <- c("Y=0", "Y=1", "Y=2")

# Marginals
p_X <- rowSums(joint_pmf)  # c(0.4, 0.6)
p_Y <- colSums(joint_pmf)  # c(0.3, 0.5, 0.2)

# Means
x_vals <- c(0, 1); y_vals <- c(0, 1, 2)
E_X <- sum(x_vals * p_X)  # 0.6
E_Y <- sum(y_vals * p_Y)  # 0.9

# E[XY]
E_XY <- sum(outer(x_vals, y_vals) * joint_pmf)  # 0.5

# Covariance
cov_XY <- E_XY - E_X * E_Y  # -0.04

# Variances and correlation
var_X <- sum(x_vals^2 * p_X) - E_X^2  # 0.24
var_Y <- sum(y_vals^2 * p_Y) - E_Y^2  # 0.49
rho <- cov_XY / sqrt(var_X * var_Y)    # -0.117

cat("Cov(X,Y) =", cov_XY, "\nCorr(X,Y) =", rho, "\n")

13 12. R Code: Working with Joint Distributions

library(MASS)
library(ggplot2)

# ============================================================
# BIVARIATE NORMAL: Simulate and visualize
# ============================================================
set.seed(2024)
n <- 1000

# Parameters
mu <- c(2, 5)
sigma_x <- 1.5
sigma_y <- 2.0
rho <- 0.7

# Covariance matrix
Sigma <- matrix(c(sigma_x^2,
                  rho * sigma_x * sigma_y,
                  rho * sigma_x * sigma_y,
                  sigma_y^2), nrow=2)

# Simulate
data <- mvrnorm(n, mu, Sigma)
X <- data[, 1]; Y <- data[, 2]

# Verify
cat("Sample correlation:", cor(X, Y), "(True:", rho, ")\n")
cat("Sample cov matrix:\n"); print(cov(data))
cat("True cov matrix:\n"); print(Sigma)

# ============================================================
# CONDITIONAL EXPECTATION
# ============================================================
# Theoretical E[Y|X=x]
# E[Y|X=x] = mu_Y + rho * (sigma_Y/sigma_X) * (x - mu_X)
cond_mean_fn <- function(x) {
  mu[2] + rho * (sigma_y/sigma_x) * (x - mu[1])
}

# Empirical: bin X and compute average Y in each bin
x_breaks <- quantile(X, probs=seq(0, 1, by=0.2))
X_bin <- cut(X, breaks=x_breaks, include.lowest=TRUE)
cond_means <- tapply(Y, X_bin, mean)
x_midpoints <- (x_breaks[-length(x_breaks)] + x_breaks[-1]) / 2

# Plot
plot(X, Y, col=rgb(0,0,1,0.2), pch=16, cex=0.5,
     main="Bivariate Normal with Conditional Mean",
     xlab="X", ylab="Y")
curve(cond_mean_fn(x), add=TRUE, col="red", lwd=2)
points(x_midpoints, cond_means, col="orange", pch=17, cex=1.5)
legend("topleft", c("Data", "True E[Y|X]", "Empirical E[Y|X]"),
       col=c(rgb(0,0,1,0.5), "red", "orange"),
       pch=c(16, NA, 17), lty=c(NA, 1, NA), lwd=c(NA, 2, NA))

# ============================================================
# LAW OF ITERATED EXPECTATIONS: verify numerically
# ============================================================
# Marginal E[Y] should equal E[E[Y|X]]
n_x_points <- 100
x_range <- seq(min(X), max(X), length.out=n_x_points)
cond_means_at_x <- cond_mean_fn(x_range)
density_at_x <- dnorm(x_range, mean=mu[1], sd=sigma_x)

# Numerical integral: E_X[E[Y|X]]
LIE_integral <- sum(cond_means_at_x * density_at_x) * diff(x_range)[1]
cat("\nLIE check:\n")
cat("E[Y] (marginal) =", mu[2], "\n")
cat("E[E[Y|X]] (integral) =", LIE_integral, "\n")

# ============================================================
# COVARIANCE MATRIX IN OLS CONTEXT
# ============================================================
set.seed(42)
n <- 200
X_mat <- cbind(1, rnorm(n), rnorm(n))  # Design matrix [1, x2, x3]
beta_true <- c(1, 2, -1)
sigma_sq <- 4

y <- X_mat %*% beta_true + rnorm(n, sd=sqrt(sigma_sq))

# OLS
XtX_inv <- solve(t(X_mat) %*% X_mat)
beta_hat <- XtX_inv %*% t(X_mat) %*% y
e_hat <- y - X_mat %*% beta_hat
sigma_sq_hat <- sum(e_hat^2) / (n - 3)

# Estimated covariance matrix of beta_hat
Var_beta_hat <- sigma_sq_hat * XtX_inv
cat("\nEstimated Cov(beta_hat):\n")
print(Var_beta_hat)
cat("\nSE of beta_hat:", sqrt(diag(Var_beta_hat)), "\n")

# Verify with lm()
model <- lm(y ~ X_mat[,2] + X_mat[,3])
cat("\nFrom lm():", coef(summary(model))[, "Std. Error"], "\n")

14 13. Connections to Econometrics and ML

Caution (Connection): Joint Distributions in Practice

OLS as a conditional expectation model: we assume \(E[\varepsilon|X] = 0\), which implies \(E[y|X] = X\beta\). This is a statement about the conditional distribution: given the covariates, the expected error is zero.

When this assumption is violated (endogeneity), the OLS estimator is inconsistent because \(E[\varepsilon|X] \neq 0\), or equivalently \(\text{Cov}(X, \varepsilon) \neq 0\).

GLS for correlated errors: when \(\text{Cov}(\varepsilon) = \sigma^2\Omega \neq \sigma^2 I\) (serial correlation or heteroskedasticity), we need GLS, which accounts for the joint distribution of the error terms.

Principal Component Analysis: PCA finds the eigenvectors of the covariance matrix \(\Sigma\). This is a direct application of the linear algebra of covariance matrices.

Omitted variable bias: if the true model is \(y = X_1\beta_1 + X_2\beta_2 + \varepsilon\) but we run \(y = X_1\gamma + \eta\), then \(\hat{\gamma} \to \beta_1 + \frac{\text{Cov}(X_1, X_2)}{\text{Var}(X_1)}\beta_2\). The size of the bias depends on \(\text{Cov}(X_1, X_2)\), a joint distribution quantity!
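The OVB formula can be watched in action with a short simulation (the coefficients and the dependence between the regressors are assumptions of the sketch):

```r
set.seed(3)
n  <- 50000
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n)         # included regressor, correlated with x2
y  <- 1 * x1 + 2 * x2 + rnorm(n)  # true model: beta1 = 1, beta2 = 2

gamma_hat <- unname(coef(lm(y ~ x1))[2])    # short regression omits x2
predicted <- 1 + cov(x1, x2) / var(x1) * 2  # beta1 + Cov/Var * beta2

c(gamma_hat = gamma_hat, predicted = predicted)
# both close to 1 + 0.8/1.64 * 2, roughly 1.98, far from beta1 = 1
```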


15 Practice Problems

Problem 1: Marginals and conditionals.

Joint PDF: \(f(x,y) = 2\) for \(0 \leq x \leq y \leq 1\), zero otherwise.

  • Compute \(f_X(x)\) and \(f_Y(y)\)
  • Compute \(f_{Y|X}(y|x)\)
  • Are \(X\) and \(Y\) independent?
  • Compute \(E[Y|X=0.3]\)

Answers:

  • \(f_X(x) = \int_x^1 2\,dy = 2(1-x)\) for \(0 \leq x \leq 1\)
  • \(f_Y(y) = \int_0^y 2\,dx = 2y\) for \(0 \leq y \leq 1\)
  • \(f_{Y|X}(y|x) = 2/(2(1-x)) = 1/(1-x)\) for \(x \leq y \leq 1\) (Uniform[x, 1])
  • No, because \(f_{X,Y} \neq f_X \cdot f_Y\)
  • \(E[Y|X=0.3] = (0.3+1)/2 = 0.65\) (mean of Uniform[0.3, 1])
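These answers can be cross-checked by Monte Carlo, sampling \(X\) by inverse CDF from \(f_X(x) = 2(1-x)\) and then \(Y \mid X \sim \text{Uniform}(X, 1)\) (a sketch; the bin width around 0.3 is arbitrary):

```r
set.seed(21)
n <- 2e5
# F_X(x) = 2x - x^2, so inverse CDF gives X = 1 - sqrt(1 - U)
u <- runif(n)
x <- 1 - sqrt(1 - u)
y <- runif(n, min = x, max = 1)  # conditional draw: Uniform(x, 1)

mean(y[abs(x - 0.3) < 0.02])  # close to 0.65 = E[Y | X = 0.3]
```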

Problem 2: Covariance and correlation.

\(X \sim \text{Uniform}(0,1)\) and \(Y = X + \varepsilon\), where \(\varepsilon \sim N(0, \sigma^2)\) is independent of \(X\).

  • Compute \(\text{Cov}(X,Y)\)
  • Compute \(\text{Var}(Y)\)
  • Compute \(\text{Corr}(X,Y)\)
  • As \(\sigma^2 \to 0\), what happens to the correlation?

Answers:

  • \(\text{Cov}(X,Y) = \text{Cov}(X, X+\varepsilon) = \text{Var}(X) = 1/12\)
  • \(\text{Var}(Y) = \text{Var}(X) + \text{Var}(\varepsilon) = 1/12 + \sigma^2\)
  • \(\rho = \frac{1/12}{\sqrt{(1/12)(1/12+\sigma^2)}}\)
  • As \(\sigma^2 \to 0\): \(\rho \to 1\) (perfect correlation)
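A simulation check of these formulas (the value \(\sigma = 0.2\) and the seed are assumptions of the sketch):

```r
set.seed(42)
n     <- 2e5
sigma <- 0.2
x <- runif(n)
y <- x + rnorm(n, sd = sigma)

rho_theory <- (1/12) / sqrt((1/12) * (1/12 + sigma^2))

c(cov_hat = cov(x, y), cov_theory = 1/12)       # both near 1/12
c(cor_hat = cor(x, y), cor_theory = rho_theory) # both near 0.82
```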

Problem 3: Law of Iterated Expectations.

Student exam scores \(Y\) follow different distributions depending on school type \(X\):

  • \(X=1\) (public school, 60% of students): \(Y|X=1 \sim N(75, 100)\)
  • \(X=2\) (private school, 40% of students): \(Y|X=2 \sim N(85, 64)\)

Compute:

  • \(E[Y]\) (the overall average score)
  • \(\text{Var}(Y)\) (the overall variance; use the Law of Total Variance)

Answers:

  • \(E[Y] = E[E[Y|X]] = 75(0.6) + 85(0.4) = 45 + 34 = 79\)
  • \(E[\text{Var}(Y|X)] = 100(0.6) + 64(0.4) = 60 + 25.6 = 85.6\)
  • \(\text{Var}(E[Y|X]) = (75-79)^2(0.6) + (85-79)^2(0.4) = 16(0.6) + 36(0.4) = 9.6 + 14.4 = 24\)
  • \(\text{Var}(Y) = 85.6 + 24 = 109.6\)
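Both answers are easy to confirm by simulating the mixture (seed and sample size are assumed for the sketch):

```r
set.seed(99)
n <- 5e5
school <- sample(1:2, n, replace = TRUE, prob = c(0.6, 0.4))
score  <- ifelse(school == 1, rnorm(n, 75, 10), rnorm(n, 85, 8))

c(mean_hat = mean(score), mean_theory = 79)   # close to 79
c(var_hat = var(score), var_theory = 109.6)   # close to 109.6
```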