Joint, Marginal & Conditional Distributions
When Variables Don't Live in Isolation
Why Does This Matter?
In the real world, variables are never truly independent. GDP growth correlates with investment. Education correlates with income. Residuals in spatial data correlate with those of their neighbors.
Covariance matrices, correlation, conditional expectations: all of them are rooted in joint distributions. Multivariate statistics starts here. Without understanding joint distributions, you will not see why:
- OLS mengeksploitasi \(E[y|X] = X\beta\)
- GLS mengoreksi untuk \(\text{Cov}(\varepsilon) = \sigma^2\Omega \neq \sigma^2 I\)
- Omitted variable bias arises when the omitted variable is correlated with the included variables
- Principal Component Analysis looks for directions of maximum variance in the data
1. Joint Distribution
For continuous random variables \((X, Y)\), the joint PDF \(f_{X,Y}(x,y)\) satisfies: \[P(a \leq X \leq b, c \leq Y \leq d) = \int_a^b\int_c^d f_{X,Y}(x,y)\,dy\,dx\]
with \(f_{X,Y}(x,y) \geq 0\) and \(\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy\,dx = 1\).
For discrete variables, the joint PMF is \(p_{X,Y}(x,y) = P(X=x, Y=y)\), with \(\sum_x\sum_y p(x,y) = 1\).
A simple example is the discrete joint PMF for two fair coins:
| | \(Y=0\) | \(Y=1\) |
|---|---|---|
| \(X=0\) | 1/4 | 1/4 |
| \(X=1\) | 1/4 | 1/4 |
This is the independent case. Later we will see cases that are not independent.
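As a quick sanity check in R (the same style as the code later in these notes), we can verify that this table is a valid PMF and that the product rule for independence holds exactly:

```r
# Joint PMF of two fair coins as a 2x2 matrix
joint <- matrix(1/4, nrow = 2, ncol = 2,
                dimnames = list(c("X=0", "X=1"), c("Y=0", "Y=1")))

# Valid PMF: non-negative entries that sum to 1
stopifnot(all(joint >= 0), abs(sum(joint) - 1) < 1e-12)

# Independence: the joint equals the outer product of the marginals
p_X <- rowSums(joint)
p_Y <- colSums(joint)
max(abs(joint - outer(p_X, p_Y)))   # 0
```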
2. Marginal Distributions
A marginal distribution is the distribution of a single variable, “compressed” out of the joint distribution.
From the joint distribution we can recover the marginal distributions:
Continuous: \[f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy\] \[f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\]
Discrete: \[p_X(x) = \sum_y p_{X,Y}(x,y) \quad \text{(sum over all y)}\]
Intuition: the marginal \(f_X(x)\) collects all the probability at \(X=x\), whatever the value of \(Y\). In a discrete table, the marginals are the row sums and column sums.
3. Conditional Distribution
A conditional distribution is the distribution of one variable given the value of another.
\[f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}, \quad \text{provided } f_X(x) > 0\]
This follows directly from the definition of conditional probability: \(P(A|B) = P(A \cap B)/P(B)\).
Conditional expectation: \[E[Y|X=x] = \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y|x)\,dy\]
Note: \(E[Y|X=x]\) is a function of \(x\), not a constant.
An important example: in linear regression we assume \(E[y|X] = X\beta\). This is a statement about the conditional expectation function: given the predictors \(X\), the expected value of \(y\) is a linear function of \(X\).
4. Independence
\(X\) and \(Y\) are independent if and only if: \[f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) \quad \text{for all } x, y\]
Equivalently: \(f_{Y|X}(y|x) = f_Y(y)\) (knowing \(X\) does not change the distribution of \(Y\)).
Implications of independence:
- \(E[XY] = E[X]E[Y]\)
- \(\text{Cov}(X,Y) = 0\)
- \(\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)\)
CAREFUL: \(\text{Cov}(X,Y) = 0\) (uncorrelated) does NOT imply independence, except in the multivariate normal case!
5. Covariance
\[\text{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - \mu_X\mu_Y\]
The second form, \(\text{Cov}(X,Y) = E[XY] - E[X]E[Y]\), is the computational formula.
Properties:
- \(\text{Cov}(X,X) = \text{Var}(X)\)
- \(\text{Cov}(X,Y) = \text{Cov}(Y,X)\) (symmetric)
- \(\text{Cov}(aX+b, cY+d) = ac \cdot \text{Cov}(X,Y)\)
- \(\text{Cov}(X+Y, Z) = \text{Cov}(X,Z) + \text{Cov}(Y,Z)\) (bilinear)
Variance of a sum: \[\text{Var}(X+Y) = \text{Var}(X) + 2\text{Cov}(X,Y) + \text{Var}(Y)\]
More generally: \[\text{Var}\left(\sum_i a_i X_i\right) = \sum_i a_i^2 \text{Var}(X_i) + 2\sum_{i<j} a_i a_j \text{Cov}(X_i, X_j)\]
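The two-variable case of this identity holds exactly for sample variances and covariances as well, which makes it easy to verify on simulated data (the numbers below are arbitrary):

```r
# Check Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X,Y)
set.seed(7)
x <- rnorm(1e4)
y <- 0.5 * x + rnorm(1e4)   # a correlated pair
a <- 2; b <- -3
lhs <- var(a * x + b * y)
rhs <- a^2 * var(x) + b^2 * var(y) + 2 * a * b * cov(x, y)
all.equal(lhs, rhs)   # TRUE: the identity is exact, not just approximate
```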
The problem with covariance: it is unit-dependent. \(\text{Cov}(\text{income in IDR}, \text{years edu})\) is far larger than \(\text{Cov}(\text{income in million IDR}, \text{years edu})\) even though both measure the same relationship. The solution: correlation.
6. Correlation
\[\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}, \quad \sigma_X = \sqrt{\text{Var}(X)}, \sigma_Y = \sqrt{\text{Var}(Y)}\]
Properties:
- \(-1 \leq \rho \leq 1\) (Cauchy-Schwarz inequality)
- \(\rho = 1\): perfect positive linear relationship
- \(\rho = -1\): perfect negative linear relationship
- \(\rho = 0\): uncorrelated (not necessarily independent!)
- Scale-invariant: \(\text{Corr}(aX+b, cY+d) = \text{sign}(ac) \cdot \text{Corr}(X,Y)\) for \(ac \neq 0\)
Important note: correlation measures linear association. Take \(Y = X^2\) with \(X \sim N(0,1)\): \(\text{Corr}(X,Y) = 0\) even though \(Y\) is completely determined by \(X\)!
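A simulation makes this warning concrete: the sample correlation is near zero even though \(Y\) is a deterministic function of \(X\):

```r
# Zero correlation without independence: Y = X^2 with X ~ N(0,1)
set.seed(1)
x <- rnorm(1e5)
y <- x^2
cor(x, y)        # close to 0 (the population correlation is exactly 0)
cor(abs(x), y)   # strongly positive: the dependence is nonlinear, not absent
```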
7. Covariance Matrix
For a random vector \(\mathbf{X} = (X_1, \ldots, X_p)^T\), the covariance matrix (or variance-covariance matrix) \(\Sigma\) is the \(p \times p\) matrix:
\[\Sigma_{ij} = \text{Cov}(X_i, X_j)\]
In matrix form: \[\Sigma = E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T]\]
Properties:
- Symmetric: \(\Sigma = \Sigma^T\)
- Positive semi-definite (PSD): \(\mathbf{a}^T\Sigma\mathbf{a} \geq 0\) for all \(\mathbf{a} \in \mathbb{R}^p\)
- Diagonal elements are the variances: \(\Sigma_{ii} = \text{Var}(X_i)\)
The matrix expression for the variance of a linear combination: \[\text{Var}(\mathbf{a}^T\mathbf{X}) = \mathbf{a}^T\Sigma\mathbf{a}\]
This is crucial for OLS: with homoskedastic, uncorrelated errors, \(\text{Cov}(\varepsilon) = \sigma^2 I\), the variance of the estimator \(\hat{\beta} = (X^TX)^{-1}X^Ty\) is: \[\text{Var}(\hat{\beta}) = (X^TX)^{-1}X^T \cdot (\sigma^2 I) \cdot X(X^TX)^{-1} = \sigma^2(X^TX)^{-1}\]
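The quadratic form \(\text{Var}(\mathbf{a}^T\mathbf{X}) = \mathbf{a}^T\Sigma\mathbf{a}\) is easy to check numerically. A minimal sketch using MASS::mvrnorm (already used later in these notes), with an arbitrary \(\Sigma\) and weight vector:

```r
library(MASS)  # for mvrnorm

# Var(a'X) = a' Sigma a
set.seed(3)
Sigma <- matrix(c(2.0, 0.8,
                  0.8, 1.0), nrow = 2)
X <- mvrnorm(1e5, mu = c(0, 0), Sigma = Sigma)
a <- c(1, -2)

var(as.vector(X %*% a))     # sample variance of the linear combination
drop(t(a) %*% Sigma %*% a)  # theoretical value: 2 + 4 - 2*1.6 = 2.8
```

With the sample covariance matrix in place of \(\Sigma\), the identity holds exactly; with the true \(\Sigma\), it holds up to sampling error.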
8. Bivariate Normal Distribution
\((X,Y) \sim N(\boldsymbol{\mu}, \Sigma)\) with \(\boldsymbol{\mu} = (\mu_X, \mu_Y)^T\) and: \[\Sigma = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}\]
PDF: \[f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)\]
Key properties of the bivariate normal:
1. Marginals: \(X \sim N(\mu_X, \sigma_X^2)\), \(Y \sim N(\mu_Y, \sigma_Y^2)\)
2. The conditional distribution is also normal: \[Y|X=x \sim N\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\ \sigma_Y^2(1-\rho^2)\right)\]
3. If \(\rho = 0\), then \(X\) and \(Y\) are independent (this holds only for the normal!)
Implication for regression: the conditional mean \(E[Y|X=x] = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X)\) is a linear function of \(x\). This provides a theoretical justification for linear regression when the data are jointly normal.
The slope of the regression of \(Y\) on \(X\) is \(b = \rho\frac{\sigma_Y}{\sigma_X} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}\).
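This slope identity also holds exactly in-sample, so lm() must reproduce cov(x, y)/var(x); the simulated numbers below are arbitrary:

```r
# OLS slope of Y on X equals cov(X, Y) / var(X)
set.seed(5)
x <- rnorm(200, mean = 2, sd = 1.5)
y <- 5 + 0.9 * x + rnorm(200)
b_formula <- cov(x, y) / var(x)
b_lm <- unname(coef(lm(y ~ x))["x"])
c(b_formula, b_lm)   # identical up to floating point
```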
9. Law of Iterated Expectations
\[E[Y] = E_X[E[Y|X]] = E[E[Y|X]]\]
In words: to get the unconditional expectation of \(Y\), we can:
1. Compute the conditional expectation \(E[Y|X=x]\) for each \(x\)
2. Average over the distribution of \(X\)
In integral notation: \[E[Y] = \int E[Y|X=x] \cdot f_X(x)\,dx\]
Why does this matter? The LIE is the foundation of:
Causal inference: \(E[Y^{(1)} - Y^{(0)}]\) can be computed as \(E_X[E[Y|X, D=1] - E[Y|X, D=0]]\) under conditional independence.
Panel data: \(E[y_{it}] = E_i[E[y_{it}|i]]\) — decomposing into between and within variation.
Prediction: Best predictor of \(Y\) given \(X\) (minimizing MSE) is \(E[Y|X]\).
An intuitive example: the average exam score in a school is the size-weighted average of the per-class average scores. The LIE states this formally.
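The class-average example can be sketched in a few lines (the scores below are made up); note the weighting by class size, which is the discrete version of averaging over the distribution of \(X\):

```r
# LIE on grouped data: overall mean = probability-weighted average of group means
scores <- list(classA = c(70, 80, 90),   # hypothetical exam scores
               classB = c(60, 65))
sizes <- sapply(scores, length)
class_means <- sapply(scores, mean)      # E[Y | X = class]
class_probs <- sizes / sum(sizes)        # P(X = class)

sum(class_means * class_probs)           # 73
mean(unlist(scores))                     # 73, the same by the LIE
```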
10. Conditional Variance Formula (Eve’s Law)
\[\text{Var}(Y) = E[\text{Var}(Y|X)] + \text{Var}(E[Y|X])\]
Decomposition:
- \(E[\text{Var}(Y|X)]\) = within-group variance (the average variance within each group defined by \(X\))
- \(\text{Var}(E[Y|X])\) = between-group variance (the variance of the group means)
Analogy with ANOVA: this is exactly SST = SSW + SSB!
Application in regression: \(R^2 = \text{SSR}/\text{SST} = \text{Var}(E[Y|X])/\text{Var}(Y)\) (with SSR the regression sum of squares) is the fraction of variance explained by between-group variation in the conditional means.
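Eve's law can be checked directly on grouped data. The sketch below uses population (divide-by-\(n\)) variances, for which the within + between decomposition is exact:

```r
# Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) on simulated groups
set.seed(9)
n <- 1e4
g <- sample(1:3, n, replace = TRUE)                  # group label X
y <- c(0, 2, 5)[g] + rnorm(n, sd = c(1, 1, 2)[g])    # group-specific mean and sd

pop_var <- function(v) mean((v - mean(v))^2)         # divide by n, not n - 1
w <- table(g) / n                                    # group proportions
within  <- sum(tapply(y, g, pop_var) * w)            # E[Var(Y|X)]
between <- sum((tapply(y, g, mean) - mean(y))^2 * w) # Var(E[Y|X])

c(within + between, pop_var(y))                      # equal
```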
11. Worked Example: Bivariate Covariance
Problem: suppose the joint PMF of \((X, Y)\) is:
| | \(Y=0\) | \(Y=1\) | \(Y=2\) |
|---|---|---|---|
| \(X=0\) | 0.1 | 0.2 | 0.1 |
| \(X=1\) | 0.2 | 0.3 | 0.1 |
Step 1: Marginal distributions. \[p_X(0) = 0.1+0.2+0.1 = 0.4, \quad p_X(1) = 0.2+0.3+0.1 = 0.6\] \[p_Y(0) = 0.1+0.2 = 0.3, \quad p_Y(1) = 0.2+0.3 = 0.5, \quad p_Y(2) = 0.1+0.1 = 0.2\]
Step 2: Means. \[E[X] = 0(0.4) + 1(0.6) = 0.6\] \[E[Y] = 0(0.3) + 1(0.5) + 2(0.2) = 0 + 0.5 + 0.4 = 0.9\]
Step 3: \(E[XY]\). \[E[XY] = \sum_{x,y} xy \cdot p(x,y)\] \[= 0\cdot0\cdot0.1 + 0\cdot1\cdot0.2 + 0\cdot2\cdot0.1 + 1\cdot0\cdot0.2 + 1\cdot1\cdot0.3 + 1\cdot2\cdot0.1\] \[= 0 + 0 + 0 + 0 + 0.3 + 0.2 = 0.5\]
Step 4: Covariance and correlation. \[\text{Cov}(X,Y) = E[XY] - E[X]E[Y] = 0.5 - (0.6)(0.9) = 0.5 - 0.54 = -0.04\]
\[\text{Var}(X) = E[X^2] - (E[X])^2 = 0.6 - 0.36 = 0.24\] \[\text{Var}(Y) = E[Y^2] - (E[Y])^2 = (0+0.5+0.8) - 0.81 = 1.3 - 0.81 = 0.49\]
\[\rho = \frac{-0.04}{\sqrt{0.24 \times 0.49}} = \frac{-0.04}{\sqrt{0.1176}} = \frac{-0.04}{0.343} = -0.117\]
Step 5: Conditional expectation \(E[Y|X=1]\). \[P(Y=0|X=1) = 0.2/0.6 = 1/3, \quad P(Y=1|X=1) = 0.3/0.6 = 1/2, \quad P(Y=2|X=1) = 0.1/0.6 = 1/6\] \[E[Y|X=1] = 0(1/3) + 1(1/2) + 2(1/6) = 0 + 0.5 + 0.333 = 0.833\]
Verify LIE: \[E[E[Y|X]] = E[Y|X=0]\cdot 0.4 + E[Y|X=1]\cdot 0.6\] \[E[Y|X=0] = 0(1/4) + 1(1/2) + 2(1/4) = 1\] \[= 1(0.4) + 0.833(0.6) = 0.4 + 0.5 = 0.9 = E[Y] \checkmark\]
# Define joint PMF as matrix
joint_pmf <- matrix(c(0.1, 0.2, 0.1,
0.2, 0.3, 0.1), nrow=2, byrow=TRUE)
rownames(joint_pmf) <- c("X=0", "X=1")
colnames(joint_pmf) <- c("Y=0", "Y=1", "Y=2")
# Marginals
p_X <- rowSums(joint_pmf) # c(0.4, 0.6)
p_Y <- colSums(joint_pmf) # c(0.3, 0.5, 0.2)
# Means
x_vals <- c(0, 1); y_vals <- c(0, 1, 2)
E_X <- sum(x_vals * p_X) # 0.6
E_Y <- sum(y_vals * p_Y) # 0.9
# E[XY]
E_XY <- sum(outer(x_vals, y_vals) * joint_pmf) # 0.5
# Covariance
cov_XY <- E_XY - E_X * E_Y # -0.04
# Variances and correlation
var_X <- sum(x_vals^2 * p_X) - E_X^2 # 0.24
var_Y <- sum(y_vals^2 * p_Y) - E_Y^2 # 0.49
rho <- cov_XY / sqrt(var_X * var_Y) # -0.117
cat("Cov(X,Y) =", cov_XY, "\nCorr(X,Y) =", rho, "\n")
12. R Code: Working with Joint Distributions
library(MASS)
library(ggplot2)
# ============================================================
# BIVARIATE NORMAL: Simulate and visualize
# ============================================================
set.seed(2024)
n <- 1000
# Parameters
mu <- c(2, 5)
sigma_x <- 1.5
sigma_y <- 2.0
rho <- 0.7
# Covariance matrix
Sigma <- matrix(c(sigma_x^2,
rho * sigma_x * sigma_y,
rho * sigma_x * sigma_y,
sigma_y^2), nrow=2)
# Simulate
data <- mvrnorm(n, mu, Sigma)
X <- data[, 1]; Y <- data[, 2]
# Verify
cat("Sample correlation:", cor(X, Y), "(True:", rho, ")\n")
cat("Sample cov matrix:\n"); print(cov(data))
cat("True cov matrix:\n"); print(Sigma)
# ============================================================
# CONDITIONAL EXPECTATION
# ============================================================
# Theoretical E[Y|X=x]
# E[Y|X=x] = mu_Y + rho * (sigma_Y/sigma_X) * (x - mu_X)
cond_mean_fn <- function(x) {
mu[2] + rho * (sigma_y/sigma_x) * (x - mu[1])
}
# Empirical: bin X and compute average Y in each bin
x_breaks <- quantile(X, probs=seq(0, 1, by=0.2))
X_bin <- cut(X, breaks=x_breaks, include.lowest=TRUE)
cond_means <- tapply(Y, X_bin, mean)
x_midpoints <- (x_breaks[-length(x_breaks)] + x_breaks[-1]) / 2
# Plot
plot(X, Y, col=rgb(0,0,1,0.2), pch=16, cex=0.5,
main="Bivariate Normal with Conditional Mean",
xlab="X", ylab="Y")
curve(cond_mean_fn(x), add=TRUE, col="red", lwd=2)
points(x_midpoints, cond_means, col="orange", pch=17, cex=1.5)
legend("topleft", c("Data", "True E[Y|X]", "Empirical E[Y|X]"),
col=c(rgb(0,0,1,0.5), "red", "orange"),
pch=c(16, NA, 17), lty=c(NA, 1, NA), lwd=c(NA, 2, NA))
# ============================================================
# LAW OF ITERATED EXPECTATIONS: verify numerically
# ============================================================
# Marginal E[Y] should equal E[E[Y|X]]
n_x_points <- 100
x_range <- seq(min(X), max(X), length.out=n_x_points)
cond_means_at_x <- cond_mean_fn(x_range)
density_at_x <- dnorm(x_range, mean=mu[1], sd=sigma_x)
# Numerical integral: E_X[E[Y|X]]
LIE_integral <- sum(cond_means_at_x * density_at_x) * diff(x_range)[1]
cat("\nLIE check:\n")
cat("E[Y] (marginal) =", mu[2], "\n")
cat("E[E[Y|X]] (integral) =", LIE_integral, "\n")
# ============================================================
# COVARIANCE MATRIX IN OLS CONTEXT
# ============================================================
set.seed(42)
n <- 200
X_mat <- cbind(1, rnorm(n), rnorm(n)) # Design matrix [1, x2, x3]
beta_true <- c(1, 2, -1)
sigma_sq <- 4
y <- X_mat %*% beta_true + rnorm(n, sd=sqrt(sigma_sq))
# OLS
XtX_inv <- solve(t(X_mat) %*% X_mat)
beta_hat <- XtX_inv %*% t(X_mat) %*% y
e_hat <- y - X_mat %*% beta_hat
sigma_sq_hat <- sum(e_hat^2) / (n - 3)
# Estimated covariance matrix of beta_hat
Var_beta_hat <- sigma_sq_hat * XtX_inv
cat("\nEstimated Cov(beta_hat):\n")
print(Var_beta_hat)
cat("\nSE of beta_hat:", sqrt(diag(Var_beta_hat)), "\n")
# Verify with lm()
model <- lm(y ~ X_mat[,2] + X_mat[,3])
cat("\nFrom lm():", coef(summary(model))[, "Std. Error"], "\n")
13. Connections to Econometrics and ML
OLS as a conditional expectation model: we assume \(E[\varepsilon|X] = 0\), which implies \(E[y|X] = X\beta\). This is a statement about the conditional distribution: given the covariates, the expected error is zero.
When this assumption fails (endogeneity), the OLS estimator is inconsistent, because \(E[\varepsilon|X] \neq 0\), or equivalently \(\text{Cov}(X, \varepsilon) \neq 0\).
GLS for correlated errors: when \(\text{Cov}(\varepsilon) = \sigma^2\Omega \neq \sigma^2 I\) (serial correlation or heteroskedasticity), we need GLS, which accounts for the joint distribution of the error terms.
Principal Component Analysis: PCA finds the eigenvectors of the covariance matrix \(\Sigma\). This is a direct application of the linear algebra of the covariance matrix.
Omitted variable bias: if the true model is \(y = X_1\beta_1 + X_2\beta_2 + \varepsilon\) but we run \(y = X_1\gamma + \eta\), then \(\hat{\gamma} \to \beta_1 + \frac{\text{Cov}(X_1, X_2)}{\text{Var}(X_1)}\beta_2\). The size of the bias depends on \(\text{Cov}(X_1, X_2)\), a joint-distribution quantity!
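A quick simulation shows the bias formula at work; the coefficients and the strength of the correlation below are made up for illustration:

```r
# Omitted variable bias: short regression -> beta1 + [Cov(x1,x2)/Var(x1)] * beta2
set.seed(11)
n <- 1e5
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)          # x2 is correlated with x1
y  <- 2 * x1 + 3 * x2 + rnorm(n)   # true model: beta1 = 2, beta2 = 3

gamma_hat <- unname(coef(lm(y ~ x1))["x1"])   # short regression, x2 omitted
bias_pred <- 2 + cov(x1, x2) / var(x1) * 3    # formula prediction
c(gamma_hat, bias_pred)   # both close to 2 + 0.6 * 3 = 3.8
```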
Practice Problems
Problem 1: Marginals and conditionals.
Joint PDF: \(f(x,y) = 2\) for \(0 \leq x \leq y \leq 1\), zero otherwise.
- Compute \(f_X(x)\) and \(f_Y(y)\)
- Compute \(f_{Y|X}(y|x)\)
- Are \(X\) and \(Y\) independent?
- Compute \(E[Y|X=0.3]\)
Answer:
- \(f_X(x) = \int_x^1 2\,dy = 2(1-x)\) for \(0 \leq x \leq 1\)
- \(f_Y(y) = \int_0^y 2\,dx = 2y\) for \(0 \leq y \leq 1\)
- \(f_{Y|X}(y|x) = 2/(2(1-x)) = 1/(1-x)\) for \(x \leq y \leq 1\), i.e. Uniform\([x,1]\)
- No, because \(f_{X,Y} \neq f_X \cdot f_Y\)
- \(E[Y|X=0.3] = (0.3+1)/2 = 0.65\) (the mean of Uniform\([0.3, 1]\))
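These answers can be cross-checked by Monte Carlo. Since \(f(x,y)=2\) on \(\{0 \le x \le y \le 1\}\) is the uniform density on a triangle, the pair \((\min(U,V), \max(U,V))\) with independent \(U, V \sim \text{Uniform}(0,1)\) has exactly this distribution:

```r
# Monte Carlo check of Problem 1
set.seed(13)
u <- runif(1e6); v <- runif(1e6)
x <- pmin(u, v); y <- pmax(u, v)   # (x, y) ~ f(x,y) = 2 on 0 <= x <= y <= 1

mean(x <= 0.5)                     # ~0.75 = F_X(0.5), since f_X(x) = 2(1-x)
mean(y[abs(x - 0.3) < 0.01])       # ~0.65 = E[Y | X = 0.3]
```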
Problem 2: Covariance and correlation.
\(X \sim \text{Uniform}(0,1)\) and \(Y = X + \varepsilon\), where \(\varepsilon \sim N(0, \sigma^2)\) is independent of \(X\).
- Compute \(\text{Cov}(X,Y)\)
- Compute \(\text{Var}(Y)\)
- Compute \(\text{Corr}(X,Y)\)
- As \(\sigma^2 \to 0\), what happens to the correlation?
Answer:
- \(\text{Cov}(X,Y) = \text{Cov}(X, X+\varepsilon) = \text{Var}(X) = 1/12\)
- \(\text{Var}(Y) = \text{Var}(X) + \text{Var}(\varepsilon) = 1/12 + \sigma^2\)
- \(\rho = (1/12)/\sqrt{(1/12)(1/12+\sigma^2)}\)
- As \(\sigma^2 \to 0\): \(\rho \to 1\) (perfect correlation)
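A simulation check for one specific noise level (the choice σ = 0.2 is arbitrary):

```r
# Problem 2 by simulation: Corr(X, Y) with Y = X + eps
set.seed(17)
sigma <- 0.2
x <- runif(1e6)
y <- x + rnorm(1e6, sd = sigma)

rho_sim  <- cor(x, y)                                  # sample correlation
rho_theo <- (1/12) / sqrt((1/12) * (1/12 + sigma^2))   # value from the formula
c(rho_sim, rho_theo)
```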
Problem 3: Law of Iterated Expectations.
A student's exam score \(Y\) has a different distribution depending on the school type \(X\):
- \(X=1\) (public school, 60% of students): \(Y|X=1 \sim N(75, 100)\)
- \(X=2\) (private school, 40% of students): \(Y|X=2 \sim N(85, 64)\)
Compute:
- \(E[Y]\) (the overall average score)
- \(\text{Var}(Y)\) (the overall variance; use the Law of Total Variance)
Answer:
- \(E[Y] = E[E[Y|X]] = 75(0.6) + 85(0.4) = 45 + 34 = 79\)
- \(E[\text{Var}(Y|X)] = 100(0.6) + 64(0.4) = 60 + 25.6 = 85.6\)
- \(\text{Var}(E[Y|X]) = (75-79)^2(0.6) + (85-79)^2(0.4) = 16(0.6) + 36(0.4) = 9.6 + 14.4 = 24\)
- \(\text{Var}(Y) = 85.6 + 24 = 109.6\)
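And a simulation cross-check of the mixture calculation:

```r
# Problem 3 by simulation: mixture of two school types
set.seed(19)
n <- 1e6
public <- runif(n) < 0.6                # 60% public-school students
y <- ifelse(public,
            rnorm(n, 75, 10),           # Y | X=1 ~ N(75, 100)
            rnorm(n, 85, 8))            # Y | X=2 ~ N(85, 64)
mean(y)   # ~79
var(y)    # ~109.6
```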