Mathematical Theory

1. The Smoothing Problem

Given noisy observations \((x_i, y_i)\) where:

\[ y_i = f(x_i) + \varepsilon_i \]

the goal is to estimate the unknown function \(f\) at an arbitrary query point \(x_0\).

2. Kernel Smoothing Approach

Kernel smoothers estimate \(f(x_0)\) using a weighted average of nearby points:

\[ \hat{f}(x_0) = \frac{\sum_i K\left(\frac{x_i - x_0}{h}\right) y_i}{\sum_i K\left(\frac{x_i - x_0}{h}\right)} \]

where \(K(\cdot)\) is the kernel function and \(h\) is the bandwidth.
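As a minimal sketch, the weighted-average estimator above can be written directly in NumPy; the data values and the Gaussian kernel used here are illustrative choices, not part of the definition:

```python
import numpy as np

def kernel_smooth(x0, x, y, h, K):
    """Kernel-weighted average of y_i, with weights K((x_i - x0) / h)."""
    w = K((x - x0) / h)
    total = w.sum()
    if total == 0:
        return np.nan  # no observations receive positive weight
    return (w * y).sum() / total

# Gaussian kernel as one simple choice of K
gauss = lambda u: np.exp(-0.5 * u**2)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])
print(kernel_smooth(1.5, x, y, h=1.0, K=gauss))  # ≈ 3.04
```

The estimate at \(x_0 = 1.5\) lands between the neighbouring responses \(y = 1\) and \(y = 4\), as a local weighted average should.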

3. Epanechnikov Kernel

The Epanechnikov kernel is defined as:

\[ K(u) = \frac{3}{4}(1 - u^2) \cdot \mathbf{1}_{|u| \leq 1} \]

Why Epanechnikov? It is the optimal kernel in the sense of minimizing the asymptotic mean integrated squared error (AMISE) among all second-order kernels.

Properties:

- Symmetric: \(K(-u) = K(u)\)
- Nonnegative and bounded: \(0 \leq K(u) \leq 3/4\)
- Integrates to 1: \(\int_{-1}^{1} K(u)\,du = 1\)
- Compact support: \(K(u) = 0\) for \(|u| > 1\)

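A minimal implementation of the Epanechnikov kernel, with a numerical check that it integrates to 1 over its support:

```python
import numpy as np

def epanechnikov(u):
    """K(u) = (3/4)(1 - u^2) for |u| <= 1, and 0 outside that interval."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

# Riemann-sum check that the kernel integrates to 1 on [-1, 1]
grid = np.linspace(-1, 1, 200001)
area = epanechnikov(grid).sum() * (grid[1] - grid[0])
print(area)  # ≈ 1.0
```

The compact support means points farther than one bandwidth from \(x_0\) receive exactly zero weight, which also lets an implementation skip them entirely.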
4. Weighted Median vs Weighted Mean

The standard Nadaraya-Watson estimator uses the weighted mean:

\[ \hat{f}_{\text{mean}}(x_0) = \frac{\sum_i w_i y_i}{\sum_i w_i} \]

Our approach uses the weighted median:

\[ \hat{f}_{\text{median}}(x_0) = \text{argmin}_m \sum_i w_i |y_i - m| \]

Robustness Comparison:

| Property | Weighted Mean | Weighted Median |
|---|---|---|
| Breakdown point | 0% | 50% |
| Influence function | Unbounded | Bounded |
| Effect of a single outlier | Can shift the estimate arbitrarily | Limited effect |
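A sketch of the weighted median as the minimizer of \(\sum_i w_i |y_i - m|\): it is the point where the cumulative weight first reaches half the total weight. The demo data, with one gross outlier, illustrates the robustness contrast with the weighted mean:

```python
import numpy as np

def weighted_median(y, w):
    """Minimizer of sum_i w_i |y_i - m|: sort y, then take the first value
    at which the cumulative weight reaches half the total weight."""
    order = np.argsort(y)
    y_sorted = np.asarray(y, dtype=float)[order]
    w_sorted = np.asarray(w, dtype=float)[order]
    cum = np.cumsum(w_sorted)
    return y_sorted[np.searchsorted(cum, 0.5 * cum[-1])]

y = np.array([1.0, 2.0, 3.0, 1000.0])  # one gross outlier
w = np.ones(4)                          # equal weights
print(np.average(y, weights=w))  # weighted mean: 251.5, dragged by the outlier
print(weighted_median(y, w))     # weighted median: 2.0, stays with the bulk
```

A single outlier moves the weighted mean arbitrarily far, while the weighted median remains at a value from the bulk of the data, consistent with the 50% breakdown point.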

5. Bandwidth Selection

Silverman's rule of thumb provides automatic bandwidth selection:

\[ h = 0.9 \cdot \min\left(\hat{\sigma}, \frac{\text{IQR}}{1.34}\right) \cdot n^{-1/5} \]

where \(\hat{\sigma}\) is the sample standard deviation and IQR is the interquartile range.
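Silverman's rule translates directly into code; the standard-normal test sample below is only an illustration:

```python
import numpy as np

def silverman_bandwidth(x):
    """h = 0.9 * min(sigma_hat, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = x.std(ddof=1)                    # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25                          # interquartile range
    return 0.9 * min(sigma, iqr / 1.34) * n ** (-0.2)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(silverman_bandwidth(x))  # roughly 0.9 * 500^(-1/5) ≈ 0.26 for N(0, 1) data
```

The \(\min(\hat{\sigma}, \text{IQR}/1.34)\) term uses the more robust of two scale estimates, so a few outliers cannot inflate the bandwidth through \(\hat{\sigma}\) alone.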

6. Bias-Variance Tradeoff