What is the motivation of proximal mapping?

For a convex function $h$, its proximal operator is defined as:
$$operatorname{prox}_h(x)=argmin_u Big(h(u)+frac{1}{2}|u-x|^2Big)$$
Can anyone provide an intuitive explanation/motivation of proximal mapping?

edited Dec 17 '18 at 12:45

littleO

29.6k645109

asked Jul 8 '12 at 6:40

mining

465513

$begingroup$
Not an answer, but... If you let $g(x) = frac{1}{2} ||x||^2$, then the function $h square g (x) = inf_{u} h(u)+g(x-u)$ is called the infimal convolution and corresponds to the function you get from the set $mathbb{epi}, h + mathbb{epi} , g$. So there is a nice geometrical interpretation as a 'smoothing' of sorts. Rockafellar's classic Convex Analysis is a good read & reference. Predates the current popularization of convex analysis.
$endgroup$
– copper.hat
Jul 8 '12 at 10:01

$begingroup$
Oops, I should have been more explicit, $mathbb{epi} , h square g = mathbb{epi} ,h + mathbb{epi} , g$. The epigraph defines the function.
$endgroup$
– copper.hat
Jul 8 '12 at 16:10

$begingroup$
Thank you very much. I have ordered one copy.
$endgroup$
– mining
Jul 8 '12 at 23:59

$begingroup$
I also don't have a definite answer, but one way to think of it is that you want a point that minimizes $h(u)$ but is also close to $x$ (the argument of prox). There is usually also a constant that dictates the tradeoff between these two components of the objective. The prox operator is very important in convex analysis and its properties are well studied. Take a look at this monograph by Boyd for more details.
$endgroup$
– rnegrinho
Jun 12 '14 at 11:42

add a comment |

edited Dec 17 '18 at 12:45

littleO

29.6k645109

asked Jul 8 '12 at 6:40

mining

465513

$begingroup$
Not an answer, but... If you let $g(x) = frac{1}{2} ||x||^2$, then the function $h square g (x) = inf_{u} h(u)+g(x-u)$ is called the infimal convolution and corresponds to the function you get from the set $mathbb{epi}, h + mathbb{epi} , g$. So there is a nice geometrical interpretation as a 'smoothing' of sorts. Rockafellar's classic Convex Analysis is a good read & reference. Predates the current popularization of convex analysis.
$endgroup$
– copper.hat
Jul 8 '12 at 10:01

$begingroup$
Oops, I should have been more explicit, $mathbb{epi} , h square g = mathbb{epi} ,h + mathbb{epi} , g$. The epigraph defines the function.
$endgroup$
– copper.hat
Jul 8 '12 at 16:10

$begingroup$
Thank you very much. I have ordered one copy.
$endgroup$
– mining
Jul 8 '12 at 23:59

$begingroup$
I also don't have a definite answer, but one way to think of it is that you want a point that minimizes $h(u)$ but is also close to $x$ (the argument of prox). There is usually also a constant that dictates the tradeoff between these two components of the objective. The prox operator is very important in convex analysis and its properties are well studied. Take a look at this monograph by Boyd for more details.
$endgroup$
– rnegrinho
Jun 12 '14 at 11:42

add a comment |

edited Dec 17 '18 at 12:45

littleO

29.6k645109

asked Jul 8 '12 at 6:40

mining

465513

convex-analysis convex-optimization

edited Dec 17 '18 at 12:45

littleO

29.6k645109

asked Jul 8 '12 at 6:40

mining

465513

edited Dec 17 '18 at 12:45

littleO

29.6k645109

asked Jul 8 '12 at 6:40

mining

465513

edited Dec 17 '18 at 12:45

littleO

29.6k645109

edited Dec 17 '18 at 12:45

littleO

29.6k645109

edited Dec 17 '18 at 12:45

littleO

29.6k645109

asked Jul 8 '12 at 6:40

mining

465513

asked Jul 8 '12 at 6:40

mining

465513

asked Jul 8 '12 at 6:40

mining

465513

$begingroup$
Not an answer, but... If you let $g(x) = frac{1}{2} ||x||^2$, then the function $h square g (x) = inf_{u} h(u)+g(x-u)$ is called the infimal convolution and corresponds to the function you get from the set $mathbb{epi}, h + mathbb{epi} , g$. So there is a nice geometrical interpretation as a 'smoothing' of sorts. Rockafellar's classic Convex Analysis is a good read & reference. Predates the current popularization of convex analysis.
$endgroup$
– copper.hat
Jul 8 '12 at 10:01

$begingroup$
Oops, I should have been more explicit, $mathbb{epi} , h square g = mathbb{epi} ,h + mathbb{epi} , g$. The epigraph defines the function.
$endgroup$
– copper.hat
Jul 8 '12 at 16:10

$begingroup$
Thank you very much. I have ordered one copy.
$endgroup$
– mining
Jul 8 '12 at 23:59

$begingroup$
I also don't have a definite answer, but one way to think of it is that you want a point that minimizes $h(u)$ but is also close to $x$ (the argument of prox). There is usually also a constant that dictates the tradeoff between these two components of the objective. The prox operator is very important in convex analysis and its properties are well studied. Take a look at this monograph by Boyd for more details.
$endgroup$
– rnegrinho
Jun 12 '14 at 11:42

add a comment |

$begingroup$
Not an answer, but... If you let $g(x) = frac{1}{2} ||x||^2$, then the function $h square g (x) = inf_{u} h(u)+g(x-u)$ is called the infimal convolution and corresponds to the function you get from the set $mathbb{epi}, h + mathbb{epi} , g$. So there is a nice geometrical interpretation as a 'smoothing' of sorts. Rockafellar's classic Convex Analysis is a good read & reference. Predates the current popularization of convex analysis.
$endgroup$
– copper.hat
Jul 8 '12 at 10:01

$begingroup$
Oops, I should have been more explicit, $mathbb{epi} , h square g = mathbb{epi} ,h + mathbb{epi} , g$. The epigraph defines the function.
$endgroup$
– copper.hat
Jul 8 '12 at 16:10

$begingroup$
Thank you very much. I have ordered one copy.
$endgroup$
– mining
Jul 8 '12 at 23:59

$begingroup$
I also don't have a definite answer, but one way to think of it is that you want a point that minimizes $h(u)$ but is also close to $x$ (the argument of prox). There is usually also a constant that dictates the tradeoff between these two components of the objective. The prox operator is very important in convex analysis and its properties are well studied. Take a look at this monograph by Boyd for more details.
$endgroup$
– rnegrinho
Jun 12 '14 at 11:42

Not an answer, but... If you let $g(x) = frac{1}{2} ||x||^2$, then the function $h square g (x) = inf_{u} h(u)+g(x-u)$ is called the infimal convolution and corresponds to the function you get from the set $mathbb{epi}, h + mathbb{epi} , g$. So there is a nice geometrical interpretation as a 'smoothing' of sorts. Rockafellar's classic Convex Analysis is a good read & reference. Predates the current popularization of convex analysis.

– copper.hat
Jul 8 '12 at 10:01

Oops, I should have been more explicit, $mathbb{epi} , h square g = mathbb{epi} ,h + mathbb{epi} , g$. The epigraph defines the function.

– copper.hat
Jul 8 '12 at 16:10

Thank you very much. I have ordered one copy.

– mining
Jul 8 '12 at 23:59

I also don't have a definite answer, but one way to think of it is that you want a point that minimizes $h(u)$ but is also close to $x$ (the argument of prox). There is usually also a constant that dictates the tradeoff between these two components of the objective. The prox operator is very important in convex analysis and its properties are well studied. Take a look at this monograph by Boyd for more details.

– rnegrinho
Jun 12 '14 at 11:42

add a comment |

2 Answers
2

active

oldest

votes

Here are two interpretations which can lend useful intuition in some circumstances:

Prox is a generalized projection

Consider the indicator of some nonempty convex set $X$: $$i_X(x) = begin{cases}0 & text{if } x in X \ infty & text{if } x not in Xend{cases}.$$ Then we find that the proximal mapping is the projection onto $X$: $$begin{align}text{prox}_{alpha i_X}(y) &= underset{x}{text{argmin}}(alpha i_X(x)+frac{1}{2}|x - y|^2) \ &= underset{x in X}{text{argmin}}(frac{1}{2}|x - y|^2) \ &= text{proj}_X(y).end{align}$$

Prox is backward Euler

Consider the backward Euler update: $x^{k+1} = x^k - alpha nabla f(x^{k+1})$ for some differentiable $f$. This update can be viewed as a proximal point update $x^{k+1} = text{prox}_{alpha f}(x^k)$, since for differentiable convex functions the optimality condition is $0 = nabla (alpha f(x^star) + frac{1}{2}|x^star - x^k|^2) = alpha nabla f(x^star) + x^star - x^k$.

answered Nov 2 '18 at 20:10

Moormanly

1314

add a comment |

If $h$ is a closed convex function, then the proximal operator of $h$ (with parameter $t > 0$) is defined by
$$
text{prox}_{th}(hat x) = arg min_x , h(x) + frac{1}{2t} |x - hat x |_2^2.
$$
When we evaluate the prox-operator of $h$, we are attempting to reduce the value of $h$, but we are penalized for straying too far from $hat x$. (The parameter $t$ is like a "step size" that controls how much we are penalized for moving away from $hat x$.)

You can see that evaluating a prox-operator might be a useful sub-step in an optimization algorithm. For example, suppose that we want to solve the optimization problem
$$
text{minimize} quad g(x) + h(x)
$$
where $g$ and $h$ are convex, $g$ is differentiable, and $h$ is "simple" in the sense that its prox-operator can be evaluated efficiently. (Importantly, $h$ is not required to be differentiable.) A natural strategy is to first reduce the value of $g$ by taking a step in the negative gradient direction, then reduce the value of $h$ by applying the prox-operator of $h$, and repeat. This strategy yields the following iteration:
$$
x^{k+1} = text{prox}_{th}(x^k - t nabla g(x^k)).
$$
(This iteration is known as the proximal gradient method. You could implement it right now in Python to solve famous optimization problems such as the Lasso problem from statistics.)

Here is an alternative way to derive the proximal gradient method and, in the process, discover the proximal operator. Suppose again that we want to minimize $g(x) + h(x)$, with $g$ and $h$ as given above. A natural strategy is to replace $g$ with a local linear approximation to $g$ (this is the fundamental strategy of calculus), and also to add a penalty term to the objective function which penalizes us from straying too far from the current iterate (this ensures that the local linear approximation to $g$ remains accurate). This strategy yields the following iteration:
$$
x^{k+1} = arg min_x quad h(x) + underbrace{g(x^k) + nabla g(x^k)^T(x - x^k)}_{approx , g(x)} + frac{1}{2t} |x - x^k |_2^2.
$$
Now if we combine the two terms on the right by completing the square, and then omit terms that do not depend on $x$, we find that
begin{align}
x^{k+1} &= arg min_x , h(x) + frac{1}{2t} |x - (x^k - tnabla g(x^k)) |_2^2 \
&= text{prox}_{th}(x^k - tnabla g(x^k)).
end{align}
This is the proximal gradient iteration. We see that the proximal operator of $h$ has appeared out of thin air. To me this seems like the most likely way that someone would discover the idea of a proximal operator.

Another motivation for the proximal operator comes from the viewpoint of monotone operator theory. Minimizing a closed convex function $h$ is equivalent to finding a point $x$ such that
$$
tag{1} 0 in partial h(x).
$$
Here $partial h(x)$ is the subdifferential of $h$ at $x$. The set-valued mapping $x mapsto partial h(x)$ is the best example of a "monotone operator", and problem (1) is called a "monotone inclusion problem". Multiplying both sides by $t$ and then adding $x$ to both sides, we find that
begin{align}
& 0 in partial h(x) \
iff & x in (I + t partial h)(x) \
iff & x in (I + t partial h)^{-1}(x).
end{align}
The operator $(I + t partial h)^{-1}$ is called the "resolvent" of the operator $partial h$, and it has nice properties such as the following: if $x in mathbb R^n$, then $(I + t partial h)^{-1}(x)$ is a singleton. Thus, $(I + t partial h)^{-1}$ can be viewed as a function (rather than a set-valued mapping), and solving $0 in partial h(x)$ is equivalent to solving

$$
x = (I + t partial h)^{-1}(x).
$$
A natural idea is to solve for $x$ using fixed point iteration. The resulting iteration is
$$
x^{k+1} = (I + t partial h)^{-1}(x^k).
$$

Punch line: The function $(I + t partial h)^{-1}$ is none other than the proximal operator of $h$, and this iteration is known as the "proximal point algorithm".

edited Dec 17 '18 at 12:47

answered Dec 17 '18 at 12:37

littleO

29.6k645109

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f168159%2fwhat-is-the-motivation-of-proximal-mapping%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Here are two interpretations which can lend useful intuition in some circumstances:

Prox is a generalized projection

Prox is backward Euler

answered Nov 2 '18 at 20:10

Moormanly

1314

add a comment |

Here are two interpretations which can lend useful intuition in some circumstances:

Prox is a generalized projection

Prox is backward Euler

answered Nov 2 '18 at 20:10

Moormanly

1314

add a comment |

Here are two interpretations which can lend useful intuition in some circumstances:

Prox is a generalized projection

Prox is backward Euler

answered Nov 2 '18 at 20:10

Moormanly

1314

Here are two interpretations which can lend useful intuition in some circumstances:

Prox is a generalized projection

Prox is backward Euler

answered Nov 2 '18 at 20:10

Moormanly

1314

answered Nov 2 '18 at 20:10

Moormanly

1314

answered Nov 2 '18 at 20:10

Moormanly

1314

answered Nov 2 '18 at 20:10

Moormanly

1314

add a comment |

$$
x = (I + t partial h)^{-1}(x).
$$
A natural idea is to solve for $x$ using fixed point iteration. The resulting iteration is
$$
x^{k+1} = (I + t partial h)^{-1}(x^k).
$$

Punch line: The function $(I + t partial h)^{-1}$ is none other than the proximal operator of $h$, and this iteration is known as the "proximal point algorithm".

edited Dec 17 '18 at 12:47

answered Dec 17 '18 at 12:37

littleO

29.6k645109

add a comment |

$$
x = (I + t partial h)^{-1}(x).
$$
A natural idea is to solve for $x$ using fixed point iteration. The resulting iteration is
$$
x^{k+1} = (I + t partial h)^{-1}(x^k).
$$

Punch line: The function $(I + t partial h)^{-1}$ is none other than the proximal operator of $h$, and this iteration is known as the "proximal point algorithm".

edited Dec 17 '18 at 12:47

answered Dec 17 '18 at 12:37

littleO

29.6k645109

add a comment |

$$
x = (I + t partial h)^{-1}(x).
$$
A natural idea is to solve for $x$ using fixed point iteration. The resulting iteration is
$$
x^{k+1} = (I + t partial h)^{-1}(x^k).
$$

Punch line: The function $(I + t partial h)^{-1}$ is none other than the proximal operator of $h$, and this iteration is known as the "proximal point algorithm".

edited Dec 17 '18 at 12:47

answered Dec 17 '18 at 12:37

littleO

29.6k645109

$$
x = (I + t partial h)^{-1}(x).
$$
A natural idea is to solve for $x$ using fixed point iteration. The resulting iteration is
$$
x^{k+1} = (I + t partial h)^{-1}(x^k).
$$

Punch line: The function $(I + t partial h)^{-1}$ is none other than the proximal operator of $h$, and this iteration is known as the "proximal point algorithm".

edited Dec 17 '18 at 12:47

answered Dec 17 '18 at 12:37

littleO

29.6k645109

edited Dec 17 '18 at 12:47

answered Dec 17 '18 at 12:37

littleO

29.6k645109

answered Dec 17 '18 at 12:37

littleO

29.6k645109

answered Dec 17 '18 at 12:37

littleO

29.6k645109

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Mathematics Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Xrfgtjtk