Linear Predictor

It sometimes happens that the joint probability distribution of X and Y is not completely known; or, if it is known, the calculation of E[Y|X=x] is mathematically intractable. If, however, the means and variances of X and Y and their correlation are known, then we can at least determine the best linear predictor of Y with respect to X.

To obtain the best linear predictor of Y with respect to X, we need to choose a and b so as to minimize $E[(Y-(a+bX))^2]$. Now,

$\begin{array}{rcl}
E[(Y-(a+bX))^2]&=&E[Y^2-2aY-2bXY+a^2+2abX+b^2X^2] \\ \\
&=&E[Y^2]-2aE[Y]-2bE[XY]+a^2+2abE[X]+b^2E[X^2]
\end{array}$
Taking partial derivatives with respect to a and b, we obtain
$\begin{array}{rcl}
\displaystyle\frac{\partial}{\partial a}E[(Y-a-bX)^2]&=&-2E[Y]+2a+2bE[X] \\ \\
\displaystyle\frac{\partial}{\partial b}E[(Y-a-bX)^2]&=&-2E[XY]+2aE[X]+2bE[X^2]\qquad(5.3)
\end{array}$
Setting the partial derivatives in Equation (5.3) equal to 0 and solving for a and b yields the solutions
$\begin{array}{rcl}
b&=&\displaystyle\frac{E[XY]-E[X]E[Y]}{E[X^2]-(E[X])^2}=\frac{\mbox{Cov}(X,Y)}{\sigma_x^2}=\frac{\rho\sigma_y}{\sigma_x} \\ \\
a&=&E[Y]-bE[X]=E[Y]-\displaystyle\frac{\rho\sigma_yE[X]}{\sigma_x}\qquad (5.4)
\end{array}$
where $\rho=\mbox{ Correlation }(X,Y)$, $\sigma_y^2=\mbox{Var}(Y)$, and $\sigma_x^2=\mbox{Var}(X)$. It is easy to verify that the values of a and b from Equation (5.4) minimize $E[(Y-a-bX)^2]$, and thus the best (in the sense of mean square error) linear predictor of Y with respect to X is $\mu_y+\displaystyle\frac{\rho\sigma_y}{\sigma_x}(X-\mu_x)$, where $\mu_y=E[Y]$ and $\mu_x=E[X]$.
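As a check on this algebra, here is a minimal SymPy sketch (not part of the text): it treats the moments $E[X]$, $E[Y]$, $E[XY]$, $E[X^2]$ as free symbols, differentiates the expanded mean square error, and solves the system in Equation (5.3). The symbol names EX, EY, EXY, EY2, EX2 are my own choices for illustration.

```python
import sympy as sp

a, b = sp.symbols('a b')
EY, EX, EXY, EY2, EX2 = sp.symbols('EY EX EXY EY2 EX2')

# Expanded objective from the display above: E[(Y - (a + bX))^2]
mse = EY2 - 2*a*EY - 2*b*EXY + a**2 + 2*a*b*EX + b**2*EX2

# Equations (5.3): set both partial derivatives to zero and solve for a, b
sol = sp.solve([sp.diff(mse, a), sp.diff(mse, b)], [a, b], dict=True)[0]

print(sp.simplify(sol[b]))  # (EXY - EX*EY)/(EX2 - EX**2), i.e. Cov(X,Y)/Var(X)
print(sp.simplify(sol[a]))  # EY - b*EX with b as above, matching Equation (5.4)
```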

The mean square error of this predictor is given by

$\begin{array}{l}
E\left[\left(Y-\mu_y-\rho\displaystyle\frac{\sigma_y}{\sigma_x}(X-\mu_x)\right)^2\right] \\ \\
=E[(Y-\mu_y)^2]+\rho^2\displaystyle\frac{\sigma_y^2}{\sigma_x^2}E[(X-\mu_x)^2]-2\rho\displaystyle\frac{\sigma_y}{\sigma_x}E[(X-\mu_x)(Y-\mu_y)] \\ \\
=\sigma_y^2+\rho^2\sigma_y^2-2\rho^2\sigma_y^2 \\ \\
=\sigma_y^2(1-\rho^2)\qquad\qquad (5.5)
\end{array}$
We note from Equation (5.5) that if $\rho$ is near +1 or -1, then the mean square error of the best linear predictor is near 0; whereas if $\rho=0$, the predictor reduces to the constant $\mu_y$ and the mean square error is $\sigma_y^2$.
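To make Equation (5.5) concrete, the following sketch (not part of the text) simulates a bivariate normal pair with arbitrarily chosen moments, forms the predictor $\mu_y+\frac{\rho\sigma_y}{\sigma_x}(X-\mu_x)$, and compares its empirical mean square error with $\sigma_y^2(1-\rho^2)$. All parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_x, mu_y = 1.0, 2.0          # assumed means
sigma_x, sigma_y = 1.5, 0.8    # assumed standard deviations
rho = 0.9                      # assumed correlation

# Covariance matrix of (X, Y): off-diagonal entry is Cov(X,Y) = rho*sigma_x*sigma_y
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=1_000_000).T

# Best linear predictor of Y with respect to X
y_hat = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)

print(np.mean((y - y_hat)**2))    # empirical mean square error
print(sigma_y**2 * (1 - rho**2))  # sigma_y^2 (1 - rho^2), Equation (5.5)
```

With these values, both printed numbers should come out near $0.8^2(1-0.9^2)=0.1216$, and pushing $\rho$ closer to 1 drives both toward 0, as the remark above predicts.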