Schwarz inequality in linear algebra and probability theory
Linear algebra states the Schwarz inequality as
$$\lvert\mathbf x^\mathrm T\mathbf y\rvert\le\lVert\mathbf x\rVert\,\lVert\mathbf y\rVert\tag 1$$
However, probability theory states it as
$$(\mathbf E[XY])^2\le\mathbf E[X^2]\,\mathbf E[Y^2]\tag 2$$
By comparing $\lvert\sum_i x_iy_i\rvert\le\sqrt{\sum_i x_i^2\sum_i y_i^2}$ with $\lvert\sum_y\sum_x xy\,p_{X,Y}(x,y)\rvert\le\sqrt{\sum_x x^2p_X(x)\sum_y y^2p_Y(y)}$, we see that $(1)$ and $(2)$ are equivalent when $p_{X,Y}(x,y)=\begin{cases}\frac1n&\text{if $x=x_i$ and $y=y_i$ for some $i\in\{1,2,\cdots,n\}$}\\0&\text{otherwise,}\end{cases}$ that is, when $(X,Y)$ is uniform on the $n$ pairs $(x_i,y_i)$. Thus, $(2)$ can be thought of as a more general form of the inequality.
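A quick numerical sanity check, sketched in NumPy with arbitrary made-up vectors (the variable names are only for illustration): under this choice of $p_{X,Y}$, each side of $(2)$ is the square of the corresponding side of $(1)$ divided by $n^2$, so the two inequalities say the same thing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=n)   # arbitrary vector; also the possible values of X
y = rng.normal(size=n)   # arbitrary vector; also the possible values of Y

# Linear-algebra form (1): |x^T y| <= ||x|| ||y||
lhs1 = abs(x @ y)
rhs1 = np.linalg.norm(x) * np.linalg.norm(y)
assert lhs1 <= rhs1

# Probabilistic form (2) under p_{X,Y}(x_i, y_i) = 1/n:
E_xy = np.mean(x * y)        # E[XY]  = x^T y / n
E_x2 = np.mean(x ** 2)       # E[X^2] = ||x||^2 / n
E_y2 = np.mean(y ** 2)       # E[Y^2] = ||y||^2 / n
assert E_xy ** 2 <= E_x2 * E_y2

# Both sides of (2) are the squared sides of (1) divided by n^2,
# so for this distribution the two inequalities coincide.
assert np.isclose(E_xy ** 2, (lhs1 / n) ** 2)
assert np.isclose(E_x2 * E_y2, (rhs1 / n) ** 2)
```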
Another way to think about this is to compare $\lvert\cos\theta\rvert=\frac{\lvert\mathbf x^\mathrm T\mathbf y\rvert}{\lVert\mathbf x\rVert\lVert\mathbf y\rVert}\le1$ with $\lvert\rho\rvert=\frac{\lvert\mathbf{cov}(X,Y)\rvert}{\sqrt{\mathbf{var}(X)\,\mathbf{var}(Y)}}\le1$. The former is exactly $(1)$, while the latter becomes $(2)$ only when $\mathbf E[X]=\mathbf E[Y]=0$. In some sense, we can view $\mathbf x^\mathrm T\mathbf y$ as a special form of $\mathbf{cov}(X,Y)$. Then it follows that $\mathbf x^\mathrm T\mathbf x$ is a form of $\mathbf{var}(X)$ and $\lVert\mathbf x\rVert$ is a form of $\sqrt{\mathbf{var}(X)}$.
What is the special form of $\mathbf E[X]$, and how do we understand $\mathbf E[X]=\mathbf E[Y]=0$ in linear algebra? With $p_{X,Y}$ defined above, we have $\mathbf E[XY]=\frac{\mathbf x^\mathrm T\mathbf y}n$, but $\mathbf{cov}(X,Y)\ne\mathbf E[XY]$ unless $\mathbf E[X]=0$ or $\mathbf E[Y]=0$. How can we obtain a relation between $\mathbf{cov}(X,Y)$ and $\mathbf x^\mathrm T\mathbf y$?
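To make the discrepancy concrete, here is a rough sketch with arbitrary uncentered vectors (again purely illustrative): $\mathbf E[XY]=\frac{\mathbf x^\mathrm T\mathbf y}n$ and $\mathbf{cov}(X,Y)$ differ, and they agree once both vectors are centered.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
x = rng.normal(size=n) + 2.0   # deliberately non-zero mean
y = rng.normal(size=n) - 1.0   # deliberately non-zero mean

E_xy = np.mean(x * y)                            # E[XY] = x^T y / n
cov = np.mean((x - x.mean()) * (y - y.mean()))   # cov(X,Y) = E[XY] - E[X]E[Y]
assert np.isclose(E_xy, x @ y / n)
assert not np.isclose(cov, E_xy)                 # they differ when the means are non-zero

# Centering both vectors removes the discrepancy:
xc, yc = x - x.mean(), y - y.mean()
assert np.isclose(cov, xc @ yc / n)              # cov(X,Y) is a dot product of centered vectors
```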
linear-algebra probability-theory cauchy-schwarz-inequality
Not "the same as", rather "a particular case of" (can you spot how?).
– Did
Jan 13 at 13:01
@Did The two inequalities are equivalent when $p_{X,Y}(x,y)=\begin{cases}\frac1n&\text{if $x=x_i$ and $y=y_i$ for $i\in\{1,2,\cdots,n\}$}\\0&\text{otherwise}\end{cases}$!
– W. Zhu
Jan 14 at 2:59
Thus, question solved?
– Did
Jan 14 at 11:26
@Did I have one more question. If we write $\mathbf{cov}(X,Y)$ as $\mathbf x^\mathrm T\mathbf y$, then $\lvert\rho\rvert\le1$ becomes $\lvert\cos\theta\rvert\le1$. But we need to set $\mathbf E[X]=\mathbf E[Y]=0$, which means that the components of each of $\mathbf x$ and $\mathbf y$ average to zero. Shouldn't the inequality hold for all vectors $\mathbf x$ and $\mathbf y$?
– W. Zhu
Jan 14 at 15:07
I don't understand the downvote, as is often the case when there's no comment accompanying it. Anyway, there's a recent question on the covariance which addresses exactly the doubts of this post.
– Giuseppe Negro
Jan 15 at 13:32
1 Answer
After reading J.G.'s answer and thinking it over, I have arrived at a satisfactory answer. I will post my thoughts below.
Let $\mathbf x\in\Bbb R^n$ denote a discrete uniform random variable, with each component corresponding to one outcome. Then $\mathbf E[\mathbf x]$ is the average of the components, and $\mathbf E[\mathbf x]=0$ means that the components sum to zero. Thus, for zero-mean random variables, we can choose $n-1$ components freely and set the last component to $-\sum_{i=1}^{n-1}x_i$. These vectors form an $(n-1)$-dimensional subspace. We can bring any vector into this centered subspace $C$ by subtracting from each component the average of all the components.
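A small sketch of this centering step (illustrative only, with an arbitrary vector): subtracting the component average amounts to applying the projection matrix $I-\frac1n\mathbf 1\mathbf 1^\mathrm T$, whose rank is $n-1$, matching $\dim C=n-1$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
x = rng.normal(size=n)          # an arbitrary vector

xc = x - x.mean()               # subtract the average of the components
assert np.isclose(xc.sum(), 0)  # xc lies in the centered subspace C

# Centering is multiplication by the projection matrix I - (1/n) 1 1^T,
# whose rank is n - 1, matching dim C = n - 1.
P = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(P @ x, xc)
assert np.linalg.matrix_rank(P) == n - 1
```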
Now consider two vectors $\mathbf x$ and $\mathbf y$ in $C$. We can use a matrix to represent the joint distribution: put the $x_i$'s along the rows and the $y_i$'s along the columns, and consider this joint distribution matrix:
$$D=
\begin{bmatrix}
\frac1n&0&0&\cdots&0\\
0&\frac1n&0&\cdots&0\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
0&0&0&\cdots&\frac1n
\end{bmatrix}$$
This distribution is special because it puts equal weight on the diagonal entries and zero weight on the off-diagonal entries; we may call it the discrete uniform diagonal joint distribution. It is easy to see that $\mathbf x$ and $\mathbf y$ are each discrete uniform but not independent ($\mathbf x$ taking the value $x_i$ forces $\mathbf y$ to take the value $y_i$).
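A small illustrative check, with arbitrary centered vectors, that $D$ has uniform marginals and that $\mathbf E[XY]=\sum_{i,j}x_iy_jD_{ij}$ collapses to $\frac{\mathbf x^\mathrm T\mathbf y}n$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
x = rng.normal(size=n); x -= x.mean()   # centered values of X
y = rng.normal(size=n); y -= y.mean()   # centered values of Y

D = np.eye(n) / n                       # mass 1/n on each pair (x_i, y_i), zero elsewhere

assert np.allclose(D.sum(axis=1), 1 / n)   # marginal of X is uniform
assert np.allclose(D.sum(axis=0), 1 / n)   # marginal of Y is uniform

# E[XY] = sum_{i,j} x_i y_j D_{ij} keeps only the diagonal terms, i.e. x^T y / n.
assert np.isclose(x @ D @ y, x @ y / n)
```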
Under these assumptions, $\mathbf{cov}(\mathbf x,\mathbf y)=\frac{\mathbf x^\mathrm T\mathbf y}n$, $\mathbf{var}(\mathbf x)=\frac{\mathbf x^\mathrm T\mathbf x}n$, $\sigma_{\mathbf x}=\frac{\lVert\mathbf x\rVert}{\sqrt n}$ and $\rho=\frac{\mathbf{cov}(\mathbf x,\mathbf y)}{\sigma_{\mathbf x}\sigma_{\mathbf y}}=\frac{\mathbf x^\mathrm T\mathbf y}{\lVert\mathbf x\rVert\,\lVert\mathbf y\rVert}=\cos\theta$. When $\mathbf x$ and $\mathbf y$ are orthogonal vectors, they are uncorrelated random variables. Although they are linearly independent vectors, they are not independent random variables.
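These identities are easy to confirm numerically; here is a rough sketch continuing the same setup with arbitrary centered vectors:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 7
x = rng.normal(size=n); x -= x.mean()   # x in the centered subspace C
y = rng.normal(size=n); y -= y.mean()   # y in the centered subspace C

cov = np.mean(x * y)                    # covariance under the diagonal distribution
var_x, var_y = np.mean(x ** 2), np.mean(y ** 2)
rho = cov / np.sqrt(var_x * var_y)
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

assert np.isclose(cov, x @ y / n)       # covariance   <->  dot product / n
assert np.isclose(var_x, x @ x / n)     # variance     <->  squared length / n
assert np.isclose(rho, cos_theta)       # correlation  <->  cos(theta)
assert abs(rho) <= 1                    # the Schwarz inequality itself
```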
Now we have a correspondence between covariance and the dot product, standard deviation and length, the correlation coefficient and the cosine of the angle between two vectors, and uncorrelatedness and orthogonality. Thus, the Schwarz inequality $\lvert\cos\theta\rvert\le1$ matches $\lvert\rho\rvert\le1$.
Let us look at three more examples that connect linear algebra to probability theory (a numerical sanity check of all three follows the list):

- The triangle inequality $\lVert\mathbf x+\mathbf y\rVert\le\lVert\mathbf x\rVert+\lVert\mathbf y\rVert$ matches $\sigma_{X+Y}\le\sigma_X+\sigma_Y$.
- The expansion $(\mathbf x+\mathbf y)^\mathrm T(\mathbf x+\mathbf y)=\mathbf x^\mathrm T\mathbf x+\mathbf y^\mathrm T\mathbf y+2\mathbf x^\mathrm T\mathbf y$ matches $\mathbf{var}(X+Y)=\mathbf{var}(X)+\mathbf{var}(Y)+2\,\mathbf{cov}(X,Y)$.
- The Pythagorean theorem $\lVert\mathbf b\rVert^2=\lVert\mathbf p\rVert^2+\lVert\mathbf e\rVert^2$, with orthogonal projection $\mathbf p$ and error $\mathbf e=\mathbf b-\mathbf p$, matches $\mathbf{var}(\Theta)=\mathbf{var}(\hat\Theta)+\mathbf{var}(\tilde\Theta)$, where the estimator $\hat\Theta$ and the estimation error $\tilde\Theta=\Theta-\hat\Theta$ are uncorrelated. In fact, this is just the law of total variance $\mathbf{var}(\Theta)=\mathbf{var}(\mathbf E[\Theta\mid X])+\mathbf E[\mathbf{var}(\Theta\mid X)]$ with $\hat\Theta=\mathbf E[\Theta\mid X]$.
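Here is the sanity check promised above (a rough sketch; the projection example and the small discrete table for the law of total variance are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
x = rng.normal(size=n); x -= x.mean()
y = rng.normal(size=n); y -= y.mean()

# 1. Triangle inequality  <->  sigma_{X+Y} <= sigma_X + sigma_Y
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)
assert np.std(x + y) <= np.std(x) + np.std(y)

# 2. Expanding ||x + y||^2  <->  var(X+Y) = var(X) + var(Y) + 2 cov(X,Y)
assert np.isclose((x + y) @ (x + y), x @ x + y @ y + 2 * (x @ y))

# 3a. Pythagoras: project b onto the line spanned by a.
b, a = rng.normal(size=n), rng.normal(size=n)
p = (a @ b) / (a @ a) * a               # orthogonal projection of b onto span{a}
e = b - p                               # error vector, orthogonal to p
assert np.isclose(b @ b, p @ p + e @ e)

# 3b. Law of total variance for a toy model: X uniform on {0, 1},
#     Theta | X = k uniform on the k-th row of the table below.
theta = np.array([[1.0, 2.0],
                  [4.0, 8.0]])
var_theta = theta.ravel().var()         # all four outcomes have probability 1/4
cond_mean = theta.mean(axis=1)          # E[Theta | X]
cond_var = theta.var(axis=1)            # var(Theta | X)
assert np.isclose(var_theta, cond_mean.var() + cond_var.mean())
```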