Why are $\nabla f(x^*) = 0$ and $x^T H x > 0$ necessary conditions for $x^*$ to be a minimum?














See the book *Optimization: Algorithms and Applications* by Rajesh Kumar Arora, page 56.




... ...



The necessary conditions for $x^*$ to be a minimum are that
$$\nabla f(x^*) = 0 \qquad (3.1)$$

and $x^T H x$ is positive definite ($x^T H x > 0$). To ensure this, the
eigenvalues of $H$ are to be positive. Consider a two-variable
function $$f(x) = x_1^2 + x_2^2 - 2x_1 \qquad (3.2)$$



... ...




I have three questions:




1. What is $Hx$?

2. What does the term "positive-definite" mean?

3. Why are
   (a) $\nabla f(x^*) = 0$, and
   (b) $x^T H x > 0$
   necessary conditions for $x^*$ to be a minimum?
calculus · optimization

asked Jan 3 at 7:27, edited Jan 3 at 8:03 · user366312
1 Answer


1. What is $H$?




The matrix $H$ is referred to as the Hessian matrix, i.e.

$$H = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\[2.2ex]
\dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\[2.2ex]
\vdots & \vdots & \ddots & \vdots \\[2.2ex]
\dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$
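
For the concrete function in equation (3.2), you can check this numerically. The following is a minimal sketch (my own, not from the book or the original answer), assuming NumPy; the helper name `hessian_fd` is introduced here purely for illustration:

```python
import numpy as np

# f from equation (3.2): f(x) = x1^2 + x2^2 - 2*x1
def f(x):
    return x[0]**2 + x[1]**2 - 2.0*x[0]

def hessian_fd(func, x, h=1e-5):
    """Approximate the Hessian of func at x by central finite differences."""
    n = x.size
    I = np.eye(n)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (func(x + h*I[i] + h*I[j]) - func(x + h*I[i] - h*I[j])
                       - func(x - h*I[i] + h*I[j]) + func(x - h*I[i] - h*I[j])) / (4.0*h*h)
    return H

print(hessian_fd(f, np.array([1.0, 0.0])))  # approximately [[2, 0], [0, 2]]
```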




2. What does the term "positive-definite" mean?




Positive definite means that for any non-zero vector $z$, the scalar $z^T H z$ is strictly positive. Informally speaking, it is the matrix analogue of being "positive": a $1 \times 1$ matrix is positive definite exactly when its single entry, a scalar, is positive.
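
The book's remark that the "eigenvalues of $H$ are to be positive" is the standard eigenvalue test for positive definiteness: a symmetric matrix is positive definite exactly when all of its eigenvalues are strictly positive. A quick numerical check (again my own sketch, assuming NumPy):

```python
import numpy as np

def is_positive_definite(H):
    """Eigenvalue test: a symmetric matrix is positive definite
    iff all of its eigenvalues are strictly positive."""
    return bool(np.all(np.linalg.eigvalsh(H) > 0.0))

H = np.array([[2.0, 0.0],
              [0.0, 2.0]])      # Hessian of f in equation (3.2)
print(np.linalg.eigvalsh(H))    # [2. 2.] -- both positive
print(is_positive_definite(H))  # True
```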




3. Why are (a) $\nabla f(x^*) = 0$, and (b) $x^T H x > 0$ necessary conditions for $x^*$ to be a minimum?




In one dimension, if $x^*$ is a local minimum, then wiggling away from $x^*$ should make the function larger. This means that for all sufficiently small $\epsilon \neq 0$ we must have
$$f(x^*+\epsilon) > f(x^*) \Rightarrow f(x^*+\epsilon) - f(x^*) > 0$$
The Taylor series tells us that
$$f(x^*+\epsilon) \approx f(x^*) + \left.\frac{df}{dx}\right|_{x^*}\epsilon + \frac{1}{2}\left.\frac{d^2f}{dx^2}\right|_{x^*}\epsilon^2 + O(\epsilon^3)$$
Substituting this approximation of $f(x^*+\epsilon)$ into the definition of a minimum gives
$$f(x^*+\epsilon) - f(x^*) = f(x^*) + \left.\frac{df}{dx}\right|_{x^*}\epsilon + \frac{1}{2}\left.\frac{d^2f}{dx^2}\right|_{x^*}\epsilon^2 - f(x^*) > 0$$
$$\left.\frac{df}{dx}\right|_{x^*}\epsilon + \frac{1}{2}\left.\frac{d^2f}{dx^2}\right|_{x^*}\epsilon^2 > 0$$
Since $\epsilon$ can have either sign and the first-order term dominates for small $\epsilon$, this forces $\left.\frac{df}{dx}\right|_{x^*} = 0$; what remains is the requirement that the second-order term be positive.



Now in $n$ dimensions, which is your case, the same expansion applies (note that $\epsilon$ is now a vector):
$$\nabla f(x^*)^T \epsilon + \frac{1}{2}\epsilon^T \nabla^2 f(x^*)\,\epsilon > 0$$
By the same first-order argument, $\nabla f(x^*) = 0$, which leaves
$$\epsilon^T \nabla^2 f(x^*)\,\epsilon > 0$$
Since this must hold for every non-zero $\epsilon$, the Hessian $H = \nabla^2 f(x^*)$ must be positive definite, which is written compactly as
$$\nabla^2 f(x^*) \succ 0$$
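
As a final sanity check (my own sketch, not part of the original answer), both conditions can be verified for the example in equation (3.2), whose minimum is at $x^* = (1, 0)$:

```python
import numpy as np

# Hand-computed derivatives of f(x) = x1^2 + x2^2 - 2*x1.
f = lambda x: x[0]**2 + x[1]**2 - 2.0*x[0]
grad = lambda x: np.array([2.0*x[0] - 2.0, 2.0*x[1]])
H = np.array([[2.0, 0.0], [0.0, 2.0]])

x_star = np.array([1.0, 0.0])
print(grad(x_star))           # [0. 0.]  -> condition (3.1) holds
print(np.linalg.eigvalsh(H))  # [2. 2.]  -> Hessian is positive definite

# Wiggling away from x* in random directions increases f, as the argument predicts.
rng = np.random.default_rng(0)
steps = 1e-3 * rng.standard_normal((1000, 2))
print(all(f(x_star + eps) > f(x_star) for eps in steps))  # True
```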






answered Jan 3 at 7:51, edited Jan 3 at 8:10 · Ahmad Bazzi
• What is $Hx$ or $Hz$? – user366312, Jan 3 at 8:03










• @user366312 It is the vector obtained by multiplying the matrix $H$ with the vector $x$ (giving $Hx$) or the vector $z$ (giving $Hz$). See mathinsight.org/matrix_vector_multiplication for more. – Ahmad Bazzi, Jan 3 at 8:05










