IMPORTANT: The end-semester exam for the course is scheduled on the morning of Saturday, 29 April. The material for the exam will be finished by Tuesday, 25 April, and there will be a revision class only, on Thursday, 27 April.
These notes finish some basic material on necessary and sufficient conditions for constrained optima. The remaining part of the course will consist of Quadratic Programming, the use of QP to solve general nonlinear constrained problems, and finally the use of some non-traditional techniques to solve optimization problems.

I will look at course project proposals if they are given to me on Monday, not after that.
Theorems of the alternative
To show the validity of the Karush-Kuhn-Tucker (KKT) first order conditions for optimality of constrained optimization problems, there are a number of possible approaches. One of them is based on a set of (classical) results called theorems of the alternative, examples of which are Gordan's theorem and Farkas's lemma. These are basically of the form that a certain system of (linear) equalities or inequalities has a feasible solution if and only if some other system has no solution.
One version of Farkas's lemma refers to the following two systems (exactly one of which has a solution):

System 1: {x : Ax = b, x >= 0}
System 2: {y : y^T A <= 0, y^T b > 0}
These theorems have nice geometrical interpretations, and some of them can be seen as separation theorems in convex analysis. For example, in the above systems, consider the columns of A as vectors. The first system says that the vector b lies in the positive cone generated by the columns of A (here x_i is the weight of column i of A). The second system says that we can find a plane separating the columns of A from the vector b (y is then the normal vector defining the separating hyperplane, which makes an obtuse angle with the columns of A and an acute angle with b).
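As a small numerical illustration (my own sketch, not part of the notes), one can check which of the two Farkas systems is feasible for a given A and b using an LP solver; the data below are made up, and scipy's linprog is assumed as the solver. Since System 2 describes a cone, the y variables are boxed into [-1, 1] to keep the feasibility check a bounded LP.

    # Sketch: checking which Farkas system is feasible for made-up data.
    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 0.0],
                  [0.0, 1.0]])   # columns of A generate the nonnegative quadrant
    b = np.array([2.0, 3.0])     # b lies in that cone, so System 1 should be feasible

    # System 1: find x >= 0 with Ax = b (zero objective: pure feasibility check).
    res1 = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
                   bounds=[(0, None)] * A.shape[1])
    print("System 1 feasible:", res1.success)          # True

    # System 2: maximise y^T b subject to y^T A <= 0, with y boxed into [-1, 1]
    # (the constraints are homogeneous, so the box loses nothing).  A strictly
    # positive optimum would exhibit a separating y.
    res2 = linprog(c=-b, A_ub=A.T, b_ub=np.zeros(A.shape[1]),
                   bounds=[(-1.0, 1.0)] * len(b))
    print("System 2 feasible:", -res2.fun > 1e-9)      # False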
Alternative interpretations of the KKT conditions
Another way to look at the KKT conditions is to linearise the objective function and constraints (which is valid if a constraint qualification holds; see below) and apply linear programming duality. Conversely, one can derive LP duality as an application of the KKT conditions.
If we define the Lagrangean function L(x, λ, μ) = f(x) + λ^T g(x) + μ^T h(x), the main KKT condition can also be viewed as setting the gradient (w.r.t. x) of the Lagrangean, rather than just the gradient of the objective function, equal to zero at optimality.
The Lagrange multipliers λ and μ at optimality play the same role as the optimal dual variables in Linear Programming and have the interpretation of shadow prices of resources (i.e. the marginal change in the optimal objective function value per unit change in the right hand side of a constraint). Among other things, this is consistent with the complementary slackness condition: the marginal change is zero for a constraint that is not binding at optimality.
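To make the Lagrangean view concrete, here is a sketch on a made-up problem (min x1^2 + x2^2 subject to x1 + x2 = c), using sympy; the example and variable names are my own. Setting the gradient of the Lagrangean to zero, together with feasibility, recovers both the optimum and the multiplier, and the multiplier matches the shadow-price interpretation (up to the sign convention used in writing h).

    # Sketch: KKT via the Lagrangean for  min x1^2 + x2^2  s.t.  x1 + x2 = c.
    import sympy as sp

    x1, x2, mu, c = sp.symbols('x1 x2 mu c', real=True)
    f = x1**2 + x2**2
    h = x1 + x2 - c                  # equality constraint written as h(x) = 0
    L = f + mu * h                   # Lagrangean L(x, mu)

    # Stationarity of L in x, plus feasibility:
    sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2), h], [x1, x2, mu], dict=True)[0]
    print(sol)                       # {x1: c/2, x2: c/2, mu: -c}

    # Shadow price: the optimal value is c^2/2, and its derivative w.r.t. the
    # right-hand side c equals -mu under this sign convention.
    opt_val = sp.simplify(f.subs(sol))
    print(sp.simplify(sp.diff(opt_val, c) + sol[mu]))   # 0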
Constraint qualifications
For a constraint set K, define a feasible sequence at x* as a sequence of vectors {x^k} such that x^k not equal to x*, lim x^k = x* and x^k belongs to K for k sufficiently large. A limiting direction of such a sequence is then lim (x^k - x*)/||x^k - x*||, in a suitable norm.
The local condition for optimality (say a local minimum) of f at x* over the set K is that ∇f(x*)^T d >= 0 for all limiting directions d. [Verify this from the first order expansion of f.] This condition is difficult to check directly. What is more practical is the following. Suppose K is defined by the constraint set {gi(x) <= 0, i = 1, ..., n and hj(x) = 0, j = 1, ..., m}. The linearised set of feasible directions at a feasible point x* is {d : ∇hj(x*)^T d = 0 for all j, and ∇gi(x*)^T d <= 0 for the active constraints i}. A constraint qualification is said to hold at x* if these two sets (the set of limiting directions and the linearised set of feasible directions) are equal.
It is easy to see that if the KKT conditions hold at x*, then ∇f(x*)^T d >= 0 for every direction d in the linearised set. A constraint qualification (or regularity condition) is what allows us to go the other way and derive a set of multipliers whenever local optimality holds. The proof of that requires a version of the implicit function theorem or the use of generalized inverses.
Constraint qualifications come in several different types. Two that are commonly applicable are (i) linear independence of the active constraint gradients at x*, or (ii) all constraints active at x* being linear. There are many others, and you can refer to a book on non-linear programming for details.
Examples where constraint qualification does not hold
K = {(x1, x2) s.t. x2 >= 0, x2 <= x1^3}. For this set, at [0,0] the direction [1,0] is the only limiting direction, but the set of linearised feasible directions is {d : d2 = 0}. This set also contains [-1,0], which is not a limiting direction. In such cases, the KKT conditions for an optimization problem over this set may not hold. [Verify this, and see what the implications are for, say, f(x) = x1 + x2, which attains its minimum over K at [0,0].] Another example is K = {(x1, x2) s.t. x1 >= 0, x2 >= 0, x2 - (1 - x1)^3 <= 0}.
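This failure can be checked symbolically; the sketch below (my own, using sympy) shows that at [0,0] no multipliers can satisfy stationarity for f(x) = x1 + x2 over the first set, even though [0,0] is the minimizer.

    # Sketch: KKT fails at [0,0] for f = x1 + x2 over {-x2 <= 0, x2 - x1^3 <= 0}.
    import sympy as sp

    x1, x2, l1, l2 = sp.symbols('x1 x2 l1 l2', real=True)
    f = x1 + x2
    g1 = -x2               # x2 >= 0 rewritten as g1 <= 0
    g2 = x2 - x1**3        # x2 <= x1^3 rewritten as g2 <= 0

    grad = lambda e: sp.Matrix([sp.diff(e, x1), sp.diff(e, x2)])
    # Stationarity: grad f + l1*grad g1 + l2*grad g2 = 0 at the origin.
    stat = (grad(f) + l1 * grad(g1) + l2 * grad(g2)).subs({x1: 0, x2: 0})
    print(stat.T)                          # first component is identically 1
    print(sp.solve(list(stat), [l1, l2]))  # [] : no multipliers exist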
An example with a different flavour is K = {(x1, x2) s.t. x1^2 + x2^2 = 1, (x1 + 1)^2 + x2^2 = 4}. [Exercise: Try finding an objective function for which the KKT conditions will fail for this example.]
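Without giving away the exercise, the sketch below (mine, using sympy) shows why a constraint qualification fails here: the two circles are internally tangent, the single feasible point is (1, 0), and the two constraint gradients there are linearly dependent.

    # Sketch: the feasible set of the two circle equalities is a single point
    # at which the constraint gradients are parallel (LICQ fails).
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', real=True)
    h1 = x1**2 + x2**2 - 1
    h2 = (x1 + 1)**2 + x2**2 - 4

    print(sp.solve([h1, h2], [x1, x2]))      # [(1, 0)] : the only feasible point

    J = sp.Matrix([[sp.diff(h1, x1), sp.diff(h1, x2)],
                   [sp.diff(h2, x1), sp.diff(h2, x2)]]).subs({x1: 1, x2: 0})
    print(J.tolist(), J.rank())              # [[2, 0], [4, 0]], rank 1 < 2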
Despite this technicality, the KKT conditions are very useful to characterize optimality in a majority of cases.
Examples
Try the following examples for practice. Note that in all cases, we are looking for locally optimal solutions.
Second order conditions

Second order necessary and sufficient conditions are stated here. The most convenient way is to state them in terms of the Hessian of the Lagrangean at optimality. For a point x* and an associated set of multipliers λ*, μ* which satisfy the first order conditions, the necessary condition is that the matrix ∇²xx L(x*, λ*, μ*) be positive semidefinite on an appropriate set of directions. This set of directions is {d : ∇hj(x*)^T d = 0 for the equality constraints hj = 0; ∇gi(x*)^T d = 0 for the active constraints i whose multipliers satisfy λi* > 0; and ∇gi(x*)^T d <= 0 for the active constraints i whose multipliers satisfy λi* = 0}. See Chong and Zak and other books on nonlinear optimization for the details.
In the inequality constrained case, when the multipliers λ* at optimality are unique and strict complementarity holds, this leads to the checkable condition that Z^T ∇²xx L(x*, λ*) Z is positive semidefinite, where Z is a full rank matrix whose columns span the null space of the active constraint gradients at x* (such a matrix can be computed).
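As a sketch of this check (a made-up example of mine), take min x1^2 - x2^2 subject to x2 = 0, with x* = (0, 0) and multiplier μ* = 0: the Hessian of the Lagrangean is indefinite on the whole space, yet the projected condition holds on the null space of the active constraint gradient, which scipy's null_space can compute.

    # Sketch: projected Hessian check.  H is the Hessian of the Lagrangean at
    # (x*, mu*) and A stacks the active constraint gradients.
    import numpy as np
    from scipy.linalg import null_space

    H = np.diag([2.0, -2.0])        # indefinite on the whole space
    A = np.array([[0.0, 1.0]])      # gradient of the active constraint h(x) = x2

    Z = null_space(A)               # full-rank basis of the null space of A
    projected = Z.T @ H @ Z
    print(projected)                                    # [[2.]]
    print(np.all(np.linalg.eigvalsh(projected) > 0))    # True: PD on the subspace

The example also shows that only the behaviour over this restricted set of directions matters, in line with the remarks below.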
Generally speaking, the sufficient condition is that the Hessian (i.e. the second derivative matrix) of the Lagrangean be positive definite over the appropriate subspace (set of directions).
Note that it is not required that the Hessian of the objective function alone be positive semidefinite over this set of directions; the condition on the Hessian of the Lagrangean is weaker than that.