Relation between Shannon Entropy and Total Variation distance
Let $p_1(\cdot), p_2(\cdot)$ be two discrete distributions on $\mathbb{Z}$. The total variation distance is defined as $d_{TV}(p_1,p_2)= \frac{1}{2} \displaystyle\sum_{k \in \mathbb{Z}}|p_1(k)-p_2(k)|$ and Shannon entropy is defined the usual way, i.e.
$$
H(p_1)=\sum_k p_1(k) \log\frac{1}{p_1(k)}.
$$
The binary entropy function $h(\cdot)$ is defined by $h(x)=x \log(1/x)+(1-x)\log(1/(1-x)), \ \forall x \in (0,1)$.
I am trying to prove that $H\left(\frac{p_1+p_2}{2}\right)-\frac{1}{2}H(p_1)-\frac{1}{2}H(p_2) \leq h\left(d_{TV}(p_1,p_2)/2\right)$. Can anyone guide me in this direction?
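For concreteness, here is a small Python sketch that computes both sides of the claimed inequality for two distributions on a finite support (the two distributions below are arbitrary examples I picked, not part of the problem):

```python
# Quick numerical check of the claimed inequality for two example distributions.
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0 * log(1/0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def binary_entropy(x):
    return entropy([x, 1.0 - x])

p1 = np.array([0.5, 0.3, 0.2, 0.0])   # arbitrary example distributions
p2 = np.array([0.1, 0.2, 0.3, 0.4])

lhs = entropy((p1 + p2) / 2) - 0.5 * entropy(p1) - 0.5 * entropy(p2)
d_tv = 0.5 * np.sum(np.abs(p1 - p2))
rhs = binary_entropy(d_tv / 2)

print(lhs, rhs, lhs <= rhs)   # roughly 0.2215  0.5623  True
```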
probability-theory measure-theory information-theory
asked Oct 15 '15 at 23:27 by pikachuchameleon, edited Oct 17 '15 at 2:41
Out of curiosity, where did that question arise?
– Clement C.
Oct 15 '15 at 23:47
I would write of a function $h$ rather than of a function $h(\cdot)$, reserving the parentheses to express a value of the function at some argument, as in "$h(x) = \text{some expression depending on } x$". However, if you feel strongly that you need the parentheses, the proper notation is $h(\cdot)$ rather than $h(.)$. I edited accordingly.
– Michael Hardy
Oct 15 '15 at 23:53
@ClementC.: The exact problem statement is as follows: $X \sim \mathrm{Bern}(0.5)$, $\mathbb{P}(Y=k \mid X=0)=p_1(k)$, $\mathbb{P}(Y=k \mid X=1)=p_2(k)$. I am trying to prove $I(X;Y) \leq h(d_{TV}(p_1,p_2))$.
– pikachuchameleon
Oct 16 '15 at 2:37
@AshokVardhan I am deleting my previous comments, since they are no longer relevant to the question after the correction/edit you made. On a side note, I wonder if looking at the other expression of TV, namely $\sup_S (p_1(S) - p_2(S))$, would help as a first step.
– Clement C.
Oct 17 '15 at 14:22
Without some assumptions on the entropies of $p_1,p_2$, it seems that what you are trying to prove may lead into trouble, because the right-hand side of the inequality, namely $h(d_{TV}(p_1,p_2)/2)$, is always finite (clearly $d_{TV}(p_1,p_2)\leq 1$), but the left-hand side can be infinite. For example, take $p_1$ with $H(p_1)=\infty$ and a second distribution $p_2$ for which $H(p_2)$ is finite. You get $$\infty-\frac{1}{2}\infty-\frac{1}{2}H(p_2)\leq C$$ for some positive number $C>0$, which does not make much sense.
– uniquesolution
Oct 23 '15 at 21:44
1 Answer
The second part below is rather inelegant, and I think it can probably be improved. Suggestions are welcome.
Note that the LHS is the Jensen-Shannon divergence ($\mathrm{JSD}$) between $P_1$ and $P_2$, and that $\mathrm{JSD}$ is an $f$-divergence. For $f$-divergences generated by $f$ and $g$, the joint ranges of $D_f, D_g$ are defined as
\begin{align}
\mathbb{R}^2 \supset \mathcal{R} :=&\ \{ (D_f(P\|Q), D_g(P\|Q)) : P, Q \textrm{ are distributions on some measurable space} \}, \\
\mathcal{R}_k :=&\ \{ (D_f(P\|Q), D_g(P\|Q)) : P, Q \textrm{ are distributions on } ([1:k], 2^{[1:k]}) \}.
\end{align}
A remarkable theorem of Harremoës & Vajda (see also these notes by Wu) states that for any pair of $f$-divergences, $$\mathcal{R} = \mathrm{co}(\mathcal{R}_2),$$ where $\mathrm{co}$ is the convex hull operator.
Now, we want to show the relation $\mathrm{JSD} \le h(d_{TV}/2)$. Since both $\mathrm{JSD}$ and $d_{TV}$ are $f$-divergences, and since the set $\mathcal{S} := \{(x,y) : y - h(x/2) \le 0\}$ is convex in $\mathbb{R}^2$ (it is the hypograph of the concave function $x \mapsto h(x/2)$), it suffices to show this inequality for distributions on $2$ symbols: by convexity, $\mathcal{R}_2 \subset \mathcal{S} \implies \mathrm{co}(\mathcal{R}_2) \subset \mathcal{S}$, as the convex hull of a set is the intersection of all convex sets containing it. The remainder of this answer will thus concentrate on showing $\mathcal{R}_2 \subset \mathcal{S}$.
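As a numerical sanity check of this reduction (not part of the argument), here is a short Python sketch that samples random pairs of distributions on alphabets of several sizes (the Dirichlet sampling and alphabet sizes are my own choices) and verifies that every resulting point $(d_{TV}, \mathrm{JSD})$ lies in $\mathcal{S}$:

```python
# Sample random distribution pairs on alphabets of size 2..6 and check
# that JSD <= h(d_TV / 2) in every case (illustration only).
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))   # natural log, 0 log 0 := 0

def h(x):
    return entropy(np.array([x, 1.0 - x]))

def jsd(p, q):
    return entropy((p + q) / 2) - 0.5 * entropy(p) - 0.5 * entropy(q)

for k in range(2, 7):
    for _ in range(2000):
        p, q = rng.dirichlet(np.ones(k)), rng.dirichlet(np.ones(k))
        d_tv = 0.5 * np.sum(np.abs(p - q))
        assert jsd(p, q) <= h(d_tv / 2) + 1e-12
print("all sampled (d_TV, JSD) points satisfy JSD <= h(d_TV/2)")
```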
Let $p := \pi + \delta$, $q := \pi - \delta$, where $\delta \in [0,1/2]$ and $\pi \in [\delta, 1-\delta]$. We will show that $$ \mathrm{JSD}(\mathrm{Bern}(p)\|\mathrm{Bern}(q)) \le h\left(\tfrac{1}{2}\,d_{TV}(\mathrm{Bern}(p)\|\mathrm{Bern}(q))\right) = h(\delta), \tag{1}$$ which suffices to show the relation on $2$-letter distributions. Note that above $p \ge q$ always, but this doesn't matter since both $\mathrm{JSD}$ and $d_{TV}$ are symmetric in their arguments.
For conciseness I'll represent the $\mathrm{JSD}$ above by $J$. All '$\log$'s in the following are natural, and we will make use of the simple identities, for $p \in (0,1)$, $$ \frac{\mathrm{d}}{\mathrm{d}p} h(p) = \log \frac{1-p}{p}, \qquad \frac{\mathrm{d}^2}{\mathrm{d}p^2} h(p) = -\frac{1}{p} - \frac{1}{1-p}. $$
By the expansion in the question, $$J(\pi, \delta) = h(\pi) - \frac{1}{2} h(\pi + \delta) - \frac{1}{2} h(\pi - \delta).$$
It is trivial to see that the relation $(1)$ holds if $\delta = 0$. Let us thus assume that $\delta > 0$. For $\pi \in (\delta, 1-\delta)$, we have
\begin{align} \frac{\partial}{\partial \pi} J &= \log \frac{1-\pi}{\pi} - \frac{1}{2} \left( \log \frac{1 - \pi - \delta}{\pi + \delta} + \log \frac{1 - \pi + \delta}{\pi - \delta}\right) \end{align} and \begin{align} \frac{\partial^2}{\partial \pi^2} J &= \frac{1}{2} \left( \frac{1}{\pi + \delta} + \frac{1}{\pi - \delta} + \frac{1}{1 - \pi - \delta} + \frac{1}{1 - \pi + \delta} \right) - \frac{1}{\pi} - \frac{1}{1-\pi} \\
&= \frac{\pi}{\pi^2 - \delta^2} - \frac{1}{\pi} + \frac{1 - \pi}{(1-\pi)^2 - \delta^2} - \frac{1}{1-\pi} \\
&= \frac{\delta^2}{\pi(\pi^2 - \delta^2)} + \frac{\delta^2}{(1-\pi)\left((1-\pi)^2 - \delta^2\right)} > 0,
\end{align}
where the final inequality uses $\delta > 0$ and that $\pi \in (\delta, 1-\delta)$.
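For what it's worth, the algebra above can be double-checked symbolically. The following sketch (assuming sympy is available) verifies that $\partial^2 J/\partial \pi^2$ indeed simplifies to the last expression:

```python
# Symbolic check that the second derivative of J in pi equals the claimed form.
import sympy as sp

pi_, delta = sp.symbols('pi delta', positive=True)

def h(x):
    return x * sp.log(1 / x) + (1 - x) * sp.log(1 / (1 - x))

J = h(pi_) - sp.Rational(1, 2) * h(pi_ + delta) - sp.Rational(1, 2) * h(pi_ - delta)

d2 = sp.diff(J, pi_, 2)
claimed = delta**2 / (pi_ * (pi_**2 - delta**2)) \
        + delta**2 / ((1 - pi_) * ((1 - pi_)**2 - delta**2))

print(sp.simplify(d2 - claimed))   # expected output: 0
```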
As a consequence, for every fixed $\delta > 0$, $J$ is strictly convex in $\pi$ on $(\delta, 1-\delta)$. Since a convex function on an interval attains its maximum at an endpoint (here using that $J$ extends continuously to $[\delta, 1-\delta]$), we have $$ J(\pi, \delta) \le \max\left( J(\delta, \delta), J(1-\delta, \delta) \right).$$
But $$J(\delta, \delta) = h(\delta) - \frac{1}{2} \left(h(2\delta) + h(0)\right) = h(\delta) - \frac{1}{2} h(2\delta),$$ and similarly $$J(1-\delta, \delta) = h(1-\delta) - \frac{1}{2} h(1-2\delta) = h(\delta) - \frac{1}{2} h(2\delta)$$ by the symmetry of $h$ (and since $h(0) = h(1) = 0$). We immediately get that for every $\delta \in [0,1/2]$ and $\pi \in [\delta, 1-\delta]$, $$J(\pi, \delta) \le h(\delta) - \frac{1}{2} h(2\delta) \le h(\delta),$$ finishing the argument.
Note that the last line gives something stronger for $2$-symbol distributions: $J(\pi, \delta) \le h(\delta) - h(2\delta)/2$. Unfortunately the RHS is a convex function of $\delta$, so this doesn't directly extend to all alphabets via the joint-range argument. It would be interesting to know whether a bound with this advantage can be shown in general.
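A brute-force check of this stronger two-symbol bound (again just an illustration; the grid resolution is arbitrary): the sketch below evaluates $J(\pi,\delta)$ on a grid and confirms it never exceeds $h(\delta) - h(2\delta)/2$.

```python
# Grid check of J(pi, delta) <= h(delta) - h(2*delta)/2 for two-symbol distributions.
import numpy as np

def h(x):
    """Binary entropy in nats with the convention 0 * log(1/0) = 0."""
    terms = [t for t in (x, 1.0 - x) if t > 0]
    return -sum(t * np.log(t) for t in terms)

def J(pi_, delta):
    return h(pi_) - 0.5 * h(pi_ + delta) - 0.5 * h(pi_ - delta)

worst = -np.inf
for delta in np.linspace(0.0, 0.5, 101):
    for pi_ in np.linspace(delta, 1.0 - delta, 101):
        worst = max(worst, J(pi_, delta) - (h(delta) - 0.5 * h(2 * delta)))
print("max gap over the grid:", worst)   # ~0 up to floating-point noise, i.e. the bound holds
```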
answered Dec 10 '18 at 20:45 by stochasticboy321, edited Dec 10 '18 at 21:50