Bias Variance Decomposition for KL Divergence
While studying a slide deck, I encountered the following exercise:

We are to give a bias-variance decomposition when the prediction is given as a probability distribution over $C$ classes.

Let $P = [P_1, \dots, P_C]$ be the ground-truth class distribution associated with a particular input pattern, and let $\hat{P} = [\hat{P}_1, \dots, \hat{P}_C]$ be a random estimator of the class probabilities for the same input pattern. The error function is the KL divergence between the ground truth and the estimated distribution:
$$\text{Error} = E[D_{KL}(P \,\|\, \hat{P})].$$
First, we would like to determine the mean of the class distribution estimator $\hat{P}$. We define the mean as the distribution that minimizes its expected KL divergence from the estimator, i.e. the distribution $R$ that solves
$$\min_{R}\; E[D_{KL}(R \,\|\, \hat{P})].$$
I have found a way to prove that the minimizer is
$$R = [R_1, \dots, R_C], \qquad R_i = \frac{\exp E[\log \hat{P}_i]}{\sum_j \exp E[\log \hat{P}_j]} \quad \forall\, 1 \le i \le C.$$
We are now asked to prove that
$$\text{Error}(\hat{P}) = \text{Bias}(\hat{P}) + \text{Var}(\hat{P}),$$
where
$$\text{Error}(\hat{P}) = E[D_{KL}(P \,\|\, \hat{P})],$$
$$\text{Bias}(\hat{P}) = D_{KL}(P \,\|\, R),$$
$$\text{Var}(\hat{P}) = E[D_{KL}(R \,\|\, \hat{P})].$$
I started by writing out the two KL divergences:
$$\text{Bias}(\hat{P}) + \text{Var}(\hat{P}) = \sum_{i=1}^{C} P_i \log\frac{P_i}{R_i} + E\left[\sum_{i=1}^{C} R_i \log\frac{R_i}{\hat{P}_i}\right],$$
but I don't know how to continue from here.
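Not needed for the proof, but here is a small numerical sanity check of the formula for $R$ (the Dirichlet sampling model for $\hat{P}$ below is an arbitrary illustrative choice of mine, not from the slides): since $R$ is the exact minimizer of the empirical objective built from the sampled $\log \hat{P}$, it should never be beaten by nearby distributions on the simplex.

    import numpy as np

    rng = np.random.default_rng(0)
    C = 4
    n = 100_000

    # Hypothetical random estimator: samples of P_hat drawn from a Dirichlet
    # (any distribution over the probability simplex would do for this check).
    p_hat = rng.dirichlet([2.0, 3.0, 1.5, 4.0], size=n)      # shape (n, C)

    def expected_kl(r):
        """Monte-Carlo estimate of E[D_KL(r || P_hat)]."""
        return float(np.mean(np.sum(r * np.log(r / p_hat), axis=1)))

    # Claimed mean: normalized geometric mean, R_i proportional to exp(E[log P_hat_i]).
    log_mean = np.log(p_hat).mean(axis=0)
    R = np.exp(log_mean) / np.exp(log_mean).sum()

    # R minimizes the empirical objective exactly, so random nearby
    # distributions on the simplex should never do better.
    base = expected_kl(R)
    for _ in range(5):
        q = R * np.exp(0.05 * rng.normal(size=C))
        q /= q.sum()
        assert expected_kl(q) >= base
    print("E[D_KL(R || P_hat)] at the claimed minimizer:", base)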
Tags: statistics, variance
asked Nov 28 at 23:39
BlockchainDieter
1 Answer
We want to prove the following statement:
$$\text{Error}(\hat{P}) = E[D_{KL}(P \,\|\, \hat{P})] = \text{Bias}(\hat{P}) + \text{Var}(\hat{P}).$$
Write $E_Q[f] = \sum_{i=1}^{C} Q_i f_i$ for the average of a vector $f = [f_1, \dots, f_C]$ under a distribution $Q$, and let the outer $E[\cdot]$ denote expectation over the randomness of the estimator $\hat{P}$. Then
$$\text{Error}(\hat{P}) = E[D_{KL}(P \,\|\, \hat{P})]$$
$$= E\left[\sum_{i=1}^{C} P_i \log\frac{P_i}{\hat{P}_i}\right]$$
$$= E\big[E_P[\log P - \log \hat{P}]\big]$$
$$= E\big[E_P[\log P - \log R] + E_P[\log R - \log \hat{P}]\big]$$
$$= E_P[\log P - \log R] + E\big[E_P[\log R - \log \hat{P}]\big],$$
since the first term does not depend on $\hat{P}$.
The key step is that in the remaining term the weights $P$ may be replaced by $R$. By the definition of $R$ we have $E[\log \hat{P}_i] = \log R_i + \log Z$ with $Z = \sum_j \exp E[\log \hat{P}_j]$, so for any distribution $Q$
$$E\big[E_Q[\log R - \log \hat{P}]\big] = \sum_i Q_i\big(\log R_i - E[\log \hat{P}_i]\big) = -\log Z,$$
which does not depend on $Q$; in particular
$$E\big[E_P[\log R - \log \hat{P}]\big] = E\big[E_R[\log R - \log \hat{P}]\big].$$
Therefore
$$\text{Error}(\hat{P}) = \sum_{i=1}^{C} P_i \log\frac{P_i}{R_i} + E\left[\sum_{i=1}^{C} R_i \log\frac{R_i}{\hat{P}_i}\right] = D_{KL}(P \,\|\, R) + E[D_{KL}(R \,\|\, \hat{P})] = \text{Bias}(\hat{P}) + \text{Var}(\hat{P}).$$
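As a quick numerical check of the identity (not part of the proof; the Dirichlet model for $\hat{P}$ below is only an illustrative assumption), all three terms can be estimated by Monte Carlo. Because $R$ is computed from the same samples, the two sides should agree to floating-point precision:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    P = np.array([0.1, 0.2, 0.3, 0.4])                      # ground-truth distribution
    p_hat = rng.dirichlet([5.0, 2.0, 3.0, 4.0], size=n)     # hypothetical estimator samples

    # Mean in the KL sense: R_i proportional to exp(E[log P_hat_i]).
    log_mean = np.log(p_hat).mean(axis=0)
    R = np.exp(log_mean) / np.exp(log_mean).sum()

    def kl(p, q):
        """D_KL(p || q); q may be a matrix whose rows are sampled distributions."""
        return np.sum(p * np.log(p / q), axis=-1)

    error = kl(P, p_hat).mean()        # E[D_KL(P || P_hat)]
    bias = kl(P, R)                    # D_KL(P || R)
    var = kl(R, p_hat).mean()          # E[D_KL(R || P_hat)]

    # With R computed from the same samples, the identity holds exactly,
    # so the two numbers agree up to floating-point error.
    print(error, bias + var)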
answered Dec 2 at 13:32
arsaljalib