# Probability: Covariance

Just as mean and variance are summary statistics for the distribution of a single random variable, *covariance* is useful for summarizing how a pair of random variables $(X, Y)$ is jointly distributed.

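The **covariance** of two random variables $X$ and $Y$ is defined to be the expected product of their deviations from their respective means:

$$\operatorname{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])].$$

The covariance of two independent random variables is zero, because the expectation of a product of independent random variables factors into the product of their expectations: $\mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] = \mathbb{E}[X - \mathbb{E}[X]]\,\mathbb{E}[Y - \mathbb{E}[Y]] = 0$.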
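The definition can be evaluated directly for a discrete joint distribution. The following is a minimal sketch, assuming the joint distribution is stored as a dictionary from points $(x, y)$ to probabilities; the function name `covariance` and both example distributions are illustrative choices, not part of the text above.

```python
from fractions import Fraction

def covariance(pmf):
    """Covariance of (X, Y) for a joint pmf given as {(x, y): probability}."""
    mean_x = sum(p * x for (x, y), p in pmf.items())
    mean_y = sum(p * y for (x, y), p in pmf.items())
    # Expected product of the deviations from the respective means.
    return sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in pmf.items())

# Hypothetical example: two independent fair coin flips with values 0 and 1.
quarter = Fraction(1, 4)
independent_pmf = {(x, y): quarter for x in (0, 1) for y in (0, 1)}

# Hypothetical example: Y is an exact copy of X, so they always deviate together.
half = Fraction(1, 2)
copy_pmf = {(0, 0): half, (1, 1): half}

print(covariance(independent_pmf))  # 0
print(covariance(copy_pmf))         # 1/4
```

Exact fractions are used so that the zero covariance of the independent pair is not blurred by floating-point rounding.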

**Exercise**

Identify each of the following joint distributions as representing positive covariance, zero covariance, or negative covariance. The size of a dot at $(x, y)$ represents the probability that $X = x$ and $Y = y$.

*Solution.* The first graph shows negative covariance, since $X - \mathbb{E}[X]$ and $Y - \mathbb{E}[Y]$ have opposite signs for the top-left mass and for the bottom-right mass, and the contributions of the other two points are smaller since these points are close to the mean $(\mathbb{E}[X], \mathbb{E}[Y])$.

The second graph shows positive covariance, since the top right and bottom left points contribute positively, and the middle point contributes much less.

The third graph shows zero covariance, since the points contribute to the sum defining $\mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$ in two cancelling pairs.

**Exercise**

Does $\operatorname{Cov}(X, Y) = 0$ imply that $X$ and $Y$ are independent?

Hint: consider the previous exercise. Alternatively, consider a random variable $X$ which is uniformly distributed on $\{1, 2, 3\}$ and an independent random variable $Z$ which is uniformly distributed on $\{-1, 1\}$. Set $Y = ZX$. Consider the pair $(X, Y)$.

*Solution.* No. The third example in the previous exercise shows a non-independent pair of random variables which has zero covariance.

Alternatively, the suggested random variables $X$ and $Y$ have zero covariance, but they are not independent since, for example, $\mathbb{P}(\{X = 2\} \cap \{Y = 1\}) = 0$ even though $\mathbb{P}(X = 2)$ and $\mathbb{P}(Y = 1)$ are both positive.
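The second counterexample can be checked by brute-force enumeration. This sketch assumes one concrete instance of the hint's construction: $X$ uniform on $\{1, 2, 3\}$, $Z$ uniform on $\{-1, 1\}$ and independent of $X$, and $Y = ZX$. Treat those specific sets, and all variable names, as assumptions of the example.

```python
from fractions import Fraction
from itertools import product

# Assumed concrete choice: X uniform on {1, 2, 3}, Z uniform on {-1, 1}, Y = Z * X.
x_values = (1, 2, 3)
z_values = (-1, 1)
# X and Z are independent, so every (x, z) pair has the same probability.
p = Fraction(1, len(x_values) * len(z_values))

# Build the joint pmf of (X, Y).
joint = {}
for x, z in product(x_values, z_values):
    joint[(x, z * x)] = joint.get((x, z * x), 0) + p

mean_x = sum(q * x for (x, y), q in joint.items())
mean_y = sum(q * y for (x, y), q in joint.items())
cov = sum(q * (x - mean_x) * (y - mean_y) for (x, y), q in joint.items())

# Independence would require P(X = 2, Y = 1) = P(X = 2) * P(Y = 1).
p_x2 = sum(q for (x, y), q in joint.items() if x == 2)
p_y1 = sum(q for (x, y), q in joint.items() if y == 1)
p_both = joint.get((2, 1), Fraction(0))

print(cov)                  # 0
print(p_both, p_x2 * p_y1)  # 0 1/18
```

The covariance is exactly zero, yet the joint probability differs from the product of the marginals, so the pair is not independent.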

**Exercise**

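The **correlation** of two random variables $X$ and $Y$ is defined to be their covariance normalized by the product of their standard deviations:

$$\operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma(X)\,\sigma(Y)}.$$

In this problem, we will show that the correlation of two random variables is always between $-1$ and $1$. Let $\mu_X = \mathbb{E}[X]$, and let $\mu_Y = \mathbb{E}[Y]$.

Consider the following quadratic polynomial in $t$:

$$\mathbb{E}[((X - \mu_X) + (Y - \mu_Y)t)^2] = \mathbb{E}[(X - \mu_X)^2] + 2t\,\mathbb{E}[(X - \mu_X)(Y - \mu_Y)] + t^2\,\mathbb{E}[(Y - \mu_Y)^2],$$

where $t$ is a variable. Explain why this polynomial is nonnegative for all $t$.

Recall that a polynomial $at^2 + bt + c$ with $a > 0$ is nonnegative for all $t$ if and only if the discriminant $b^2 - 4ac$ is nonpositive (this follows from the quadratic formula). Use this fact to show that

$$\mathbb{E}[(X - \mu_X)(Y - \mu_Y)]^2 \le \operatorname{Var}(X)\operatorname{Var}(Y).$$

Conclude that $-1 \le \operatorname{Corr}(X, Y) \le 1$.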

*Solution.*

The polynomial is nonnegative because the left-hand side of the given equation is the expectation of a nonnegative random variable.

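Substituting $\mathbb{E}[(Y - \mu_Y)^2]$ for $a$, $2\,\mathbb{E}[(X - \mu_X)(Y - \mu_Y)]$ for $b$, and $\mathbb{E}[(X - \mu_X)^2]$ for $c$, the inequality $b^2 - 4ac \le 0$ implies

$$4\,\mathbb{E}[(X - \mu_X)(Y - \mu_Y)]^2 - 4\,\mathbb{E}[(X - \mu_X)^2]\,\mathbb{E}[(Y - \mu_Y)^2] \le 0,$$

which implies the desired inequality, since $\mathbb{E}[(X - \mu_X)^2] = \operatorname{Var}(X)$ and $\mathbb{E}[(Y - \mu_Y)^2] = \operatorname{Var}(Y)$.

Dividing both sides of the preceding inequality by $\operatorname{Var}(X)\operatorname{Var}(Y)$ and taking the square root of both sides, we find that $|\operatorname{Corr}(X, Y)| \le 1$, which implies $-1 \le \operatorname{Corr}(X, Y) \le 1$.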
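As a numerical sanity check on the bound (not a substitute for the proof), the sketch below computes the correlation for many randomly generated joint distributions. The helper `random_pmf`, the 3-by-3 grid, and the weight range are hypothetical choices made so that both variances are strictly positive.

```python
import math
import random

def correlation(pmf):
    """Correlation of (X, Y) for a joint pmf given as {(x, y): probability}."""
    mean_x = sum(p * x for (x, y), p in pmf.items())
    mean_y = sum(p * y for (x, y), p in pmf.items())
    var_x = sum(p * (x - mean_x) ** 2 for (x, y), p in pmf.items())
    var_y = sum(p * (y - mean_y) ** 2 for (x, y), p in pmf.items())
    cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in pmf.items())
    # Covariance normalized by the product of the standard deviations.
    return cov / math.sqrt(var_x * var_y)

def random_pmf(rng):
    """Hypothetical helper: random pmf with positive weight on every point of a 3x3 grid."""
    weights = {(x, y): rng.uniform(0.1, 1.0) for x in range(3) for y in range(3)}
    total = sum(weights.values())
    return {point: w / total for point, w in weights.items()}

rng = random.Random(0)  # fixed seed so the run is repeatable
correlations = [correlation(random_pmf(rng)) for _ in range(1000)]
print(min(correlations), max(correlations))

# Extreme case: Y = 2X + 1 is an increasing linear function of X.
linear_pmf = {(0, 1): 0.5, (1, 3): 0.5}
print(correlation(linear_pmf))
```

Every sampled correlation lies in $[-1, 1]$, and the linear example sits at the upper end of that range.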

**Exercise**

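Show that

$$\operatorname{Var}(X_1 + X_2 + \cdots + X_n) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \cdots + \operatorname{Var}(X_n)$$

if $X_1, \ldots, X_n$ are independent random variables.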

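*Solution.* Recall that $\operatorname{Var}(S) = \mathbb{E}[S^2] - \mathbb{E}[S]^2$, and take $S = X_1 + \cdots + X_n$. The expectation of $(X_1 + \cdots + X_n)^2$ is the sum of the values in this table:

$$\begin{array}{cccc}
\mathbb{E}[X_1^2] & \mathbb{E}[X_1 X_2] & \cdots & \mathbb{E}[X_1 X_n] \\
\mathbb{E}[X_2 X_1] & \mathbb{E}[X_2^2] & \cdots & \mathbb{E}[X_2 X_n] \\
\vdots & \vdots & \ddots & \vdots \\
\mathbb{E}[X_n X_1] & \mathbb{E}[X_n X_2] & \cdots & \mathbb{E}[X_n^2]
\end{array}$$

The square of the expectation of $X_1 + \cdots + X_n$ is the sum of the values in this table:

$$\begin{array}{cccc}
\mathbb{E}[X_1]^2 & \mathbb{E}[X_1]\mathbb{E}[X_2] & \cdots & \mathbb{E}[X_1]\mathbb{E}[X_n] \\
\mathbb{E}[X_2]\mathbb{E}[X_1] & \mathbb{E}[X_2]^2 & \cdots & \mathbb{E}[X_2]\mathbb{E}[X_n] \\
\vdots & \vdots & \ddots & \vdots \\
\mathbb{E}[X_n]\mathbb{E}[X_1] & \mathbb{E}[X_n]\mathbb{E}[X_2] & \cdots & \mathbb{E}[X_n]^2
\end{array}$$

Subtracting the second table from the first entry-by-entry, we get the variances on the right-hand side from the diagonal terms, and all of the off-diagonal terms cancel, by the product formula $\mathbb{E}[X_i X_j] = \mathbb{E}[X_i]\,\mathbb{E}[X_j]$ for independent random variables.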
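The identity can be confirmed exactly on a small example by building the distribution of a sum of independent random variables and comparing variances. The three marginal distributions below (`die`, `coin`, `biased`) are hypothetical; any finite marginals would do.

```python
from fractions import Fraction
from itertools import product

def variance(pmf):
    """Variance of a random variable with pmf given as {value: probability}."""
    mean = sum(p * v for v, p in pmf.items())
    return sum(p * (v - mean) ** 2 for v, p in pmf.items())

# Hypothetical marginal distributions of three independent random variables.
die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
biased = {-1: Fraction(1, 4), 2: Fraction(3, 4)}
marginals = [die, coin, biased]

# Under independence, the probability of a joint outcome is the product
# of the marginal probabilities. Accumulate the pmf of the sum.
sum_pmf = {}
for outcome in product(*(m.items() for m in marginals)):
    total = sum(value for value, _ in outcome)
    prob = Fraction(1)
    for _, p in outcome:
        prob *= p
    sum_pmf[total] = sum_pmf.get(total, 0) + prob

var_of_sum = variance(sum_pmf)
sum_of_vars = sum(variance(m) for m in marginals)
print(var_of_sum, sum_of_vars)  # 233/48 233/48
```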

**Exercise** (Mean and variance of the sample mean)

Suppose that $X_1, \ldots, X_n$ are independent random variables with the same distribution. Find the mean and variance of

$$\frac{X_1 + \cdots + X_n}{n}.$$

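*Solution.* By linearity of expectation, we have

$$\mathbb{E}\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{1}{n}\left(\mathbb{E}[X_1] + \cdots + \mathbb{E}[X_n]\right) = \frac{1}{n}\,n\,\mathbb{E}[X_1] = \mathbb{E}[X_1].$$

Then, since the variance of a sum of independent random variables is the sum of their variances and $\operatorname{Var}(cX) = c^2\operatorname{Var}(X)$ for any constant $c$,

$$\operatorname{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{1}{n^2}\,n\operatorname{Var}(X_1) = \frac{\operatorname{Var}(X_1)}{n}.$$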
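For a concrete check, the sketch below enumerates every outcome of $n = 4$ independent fair six-sided dice and computes the mean and variance of their average exactly. The choice of dice and of $n = 4$ is an assumption made only for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
n = 4  # hypothetical sample size, small enough to enumerate all 6**4 outcomes
p = Fraction(1, 6 ** n)  # each sequence of rolls is equally likely

# Sample mean for every possible sequence of n rolls.
means = [Fraction(sum(rolls), n) for rolls in product(faces, repeat=n)]

mean_of_sample_mean = sum(p * m for m in means)
var_of_sample_mean = sum(p * (m - mean_of_sample_mean) ** 2 for m in means)

# Mean and variance of a single fair die, for comparison.
single_mean = sum(Fraction(k, 6) for k in faces)
single_var = sum(Fraction(1, 6) * (k - single_mean) ** 2 for k in faces)

print(mean_of_sample_mean, single_mean)     # 7/2 7/2
print(var_of_sample_mean, single_var / n)   # 35/48 35/48
```

Averaging leaves the mean unchanged and divides the variance by $n$.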

**Exercise**

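The **covariance matrix** of a vector $\mathbf{X} = [X_1, \ldots, X_n]$ of random variables defined on the same probability space is defined to be the $n \times n$ matrix $\Sigma$ whose $(i,j)$th entry is equal to $\operatorname{Cov}(X_i, X_j)$.

Show that $\Sigma = \mathbb{E}[\mathbf{X}\mathbf{X}^T]$, where $\mathbf{X}$ is regarded as a column vector, if all of the random variables $X_1, \ldots, X_n$ have mean zero. (Note: expectation operates on a matrix or vector of random variables entry-by-entry.)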

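*Solution.* The definition of matrix multiplication implies that the $(i,j)$th entry of $\mathbf{X}\mathbf{X}^T$ is equal to $X_i X_j$. Therefore, the $(i,j)$th entry of $\mathbb{E}[\mathbf{X}\mathbf{X}^T]$ is equal to $\mathbb{E}[X_i X_j]$, which in turn is equal to $\operatorname{Cov}(X_i, X_j)$ since the random variables have zero mean.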
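The entrywise argument can be mirrored in code. This sketch assumes a small hypothetical joint distribution for a three-dimensional random vector whose coordinates all have mean zero, and compares the matrix of covariances with the matrix of second moments.

```python
from fractions import Fraction

# Hypothetical joint pmf for a vector (X1, X2, X3); every coordinate has mean zero.
quarter = Fraction(1, 4)
pmf = {
    (1, -1, 0): quarter,
    (-1, 1, 0): quarter,
    (0, 0, 2): quarter,
    (0, 0, -2): quarter,
}
n = 3

mean = [sum(p * x[i] for x, p in pmf.items()) for i in range(n)]

# Entry (i, j) of the covariance matrix is Cov(X_i, X_j).
sigma = [[sum(p * (x[i] - mean[i]) * (x[j] - mean[j]) for x, p in pmf.items())
          for j in range(n)] for i in range(n)]

# Entry (i, j) of E[X X^T] is E[X_i X_j], taking the expectation entry by entry.
second_moment = [[sum(p * x[i] * x[j] for x, p in pmf.items())
                  for j in range(n)] for i in range(n)]

print(sigma == second_moment)  # True
```

If the means were not zero, the two matrices would differ by the matrix whose $(i,j)$th entry is $\mathbb{E}[X_i]\,\mathbb{E}[X_j]$.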