How relevant are theoretical convex optimization convergence rates in practice, when parameters are unknown...
Many theoretical results are known on convergence rates for various (possibly stochastic) convex optimization problems. For example, the popular survey on optimization algorithms for machine learning [https://arxiv.org/abs/1405.4980] gives many such rigorous results.
Concretely, say we have a constraint set with diameter $R$, and we have stochastic access to an objective function with gradient estimates bounded in norm by $G$. Then one can rigorously prove that projected SGD, with properly chosen step sizes, converges at a rate $O(RG/\sqrt{T})$.
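To make the setup concrete, here is a minimal sketch of projected SGD with the theory-prescribed constant step size $\eta = R/(G\sqrt{T})$ and iterate averaging, on a toy problem of my own construction (the objective, the constants, and all names are illustrative assumptions, not taken from the survey):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: minimize f(x) = E[||x - z||^2 / 2] over the Euclidean
# ball of radius 1, where z has mean z_bar. The optimum is the
# projection of z_bar onto the ball, i.e. (1, 0).
z_bar = np.array([2.0, 0.0])
radius = 1.0                      # constraint set: ||x|| <= radius

def stochastic_grad(x):
    """Unbiased gradient estimate of f at x (noisy sample of x - z)."""
    z = z_bar + rng.normal(scale=0.1, size=2)
    return x - z

def project(x):
    """Euclidean projection onto the ball of radius `radius`."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def projected_sgd(T, R, G):
    # Theory-prescribed constant step size eta = R / (G * sqrt(T)),
    # which yields the O(R G / sqrt(T)) rate for the averaged iterate.
    eta = R / (G * np.sqrt(T))
    x = np.zeros(2)
    avg = np.zeros(2)
    for _ in range(T):
        x = project(x - eta * stochastic_grad(x))
        avg += x
    return avg / T                # the average iterate carries the guarantee

# Diameter of the unit ball is 2; gradient norms here stay below ~3.
x_hat = projected_sgd(T=20_000, R=2.0, G=3.0)
x_star = project(z_bar)           # true optimum: (1, 0)
print(np.round(x_hat, 2))
```

Note that the step size uses both $R$ and $G$ up front, which is exactly the practical difficulty this question is about.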
I am wondering how relevant theoretical results such as this are in practice, where the relevant parameters may be unknown. In the above example, the step sizes of SGD must be chosen wisely (as functions of $R$ and $G$) to attain the provable convergence rate. But say that my optimization problem is actually complicated and nonconvex, and I run some SGD algorithm on it. Say that I start SGD in some convex region containing a local minimum, at distance $R$ from that optimum, but my algorithm does not know this distance. I also might not know a priori a bound $G$ on the gradient estimates I receive. In practice, can I still run some SGD algorithm and converge to the optimum at a rate $O(RG/\sqrt{T})$?
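One partial answer to the unknown-$G$ part is adaptive step sizes: AdaGrad-style rules replace the a priori bound $G$ with the accumulated gradient norms, and are known to recover comparable rates up to logarithmic factors, though they typically still need a diameter-like constant $D$. A hedged sketch of the scalar (norm) variant, where the toy objective and all constants are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def stochastic_grad(x):
    # Noisy gradient of f(x) = ||x||^2 / 2 (toy objective, for illustration).
    return x + rng.normal(scale=0.1, size=x.shape)

def adagrad_norm(x0, T, D=1.0):
    """Scalar AdaGrad: eta_t = D / sqrt(sum of squared gradient norms).

    No bound G on the gradients is supplied in advance; the accumulated
    squared norms play that role, which is the sense in which such rules
    adapt to an unknown G. The constant D still has to be chosen.
    """
    x = x0.copy()
    sq_sum = 1e-12                       # avoid division by zero
    for _ in range(T):
        g = stochastic_grad(x)
        sq_sum += float(g @ g)
        x -= (D / np.sqrt(sq_sum)) * g
    return x

x = adagrad_norm(np.array([3.0, -2.0]), T=50_000)
print(np.round(x, 2))
```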
The above paragraph is a specific example, but I am generally curious about how well provable convergence rates carry over in practice to complicated nonconvex optimization problems (where, say, one is satisfied to converge to any local minimum to high precision).
optimization convex-optimization gradient-descent non-convex-optimization
For nonconvex problems, you might want to look at this paper.
– VHarisop, Dec 21 '18 at 19:40
Convergence proofs have no value in practice, even for convex functions. The bounds are too weak, so implementations rarely follow theory.
– LinAlg, Dec 21 '18 at 20:30
Thanks for the comments. @LinAlg, if convergence proofs have no value in practice, what is their purpose? Do you truly mean they have no value, or just that there is often a large difference between theory and practice? Does the theory not at least guide our expectations for practical applications?
– M. Eieals, Dec 21 '18 at 23:23
@M.Eieals I am sure some people can appreciate convergence proofs. Let me give an example. For linear optimization there are the simplex method and interior point methods. Only the latter have proven polynomial complexity, but implementations often do not follow the analyzed version. In practice the simplex method can be much faster or much slower, depending on the linear optimization model and data.
– LinAlg, Dec 21 '18 at 23:31
The major problem with convergence rate proofs is that the theory has not yet evolved enough to provide useful bounds. For example, many first-order methods are proved to converge at an $O(1/k^2)$ rate, where $k$ is the number of iterations, but in practice linear convergence is observed in many cases. Moreover, the constants in the bounds are far from tight and have no practical value. Still, a rate-of-convergence proof is useful for intuition: a method with a substantially faster proven rate than a competitor is, in many cases, indeed faster in practice when implemented well.
– Alex Shtof, Dec 22 '18 at 10:57
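The gap between worst-case rates and observed behavior is easy to see numerically: the generic worst-case bound for gradient descent on smooth convex functions is $O(1/k)$, yet on a strongly convex instance the observed decay is geometric ("linear"). A small self-contained demonstration, where the matrix and constants are my own toy choices:

```python
import numpy as np

# Gradient descent on f(x) = x^T A x / 2 with A positive definite.
# The generic smooth-convex analysis only guarantees O(1/k) suboptimality,
# but on this strongly convex problem the decay is geometric, illustrating
# how loose the generic rate can be on a specific instance.
A = np.diag([1.0, 10.0])          # condition number 10
L_smooth = 10.0                   # largest eigenvalue = smoothness constant

x = np.array([1.0, 1.0])
errors = []
for k in range(50):
    errors.append(0.5 * x @ A @ x)        # f(x_k) - f*, with f* = 0
    x = x - (1.0 / L_smooth) * (A @ x)    # standard 1/L step size

# Successive error ratios: a constant ratio below 1 means geometric decay.
ratios = [errors[k + 1] / errors[k] for k in range(10, 40)]
print(round(max(ratios), 2))   # ≈ 0.81, i.e. linear convergence
```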
asked Dec 21 '18 at 19:25 by M. Eieals