Quadratic Programming and Sequential Quadratic Programming

 

The Quadratic Programming (QP) problem is the following:

Min ½ x^T Q x + d^T x

s.t. Ax >= b

i.e. the minimization (or maximization) of a quadratic function of n variables subject to linear inequality constraints.

 

This formulation includes equality constraints as well (in fact, we will see that problem first).  This problem, in general, has a structure similar to Linear Programming – you will see this if you write down the KKT conditions for LP and QP.

 

An unconstrained QP makes sense (unlike in LP).  Analysing the unconstrained QP tells us that QP can have local optima that are not global optima (as well as stationary points that are neither minima nor maxima).  This of course carries over to the constrained case as well.  So QP has all the characteristics of general non-linear problems, but with the feature that convex QP (with positive definite matrix Q) admits LP-like solutions.  The constraint region of QP is convex.

 

QP is important in its own right, as well as for the reason that it forms the basis for the most successful techniques for general non-linear programming problems (Sequential Quadratic Programming and its variants are currently regarded as the best methods for general NLP).

 

Equality constrained QP (EQP)

 

Equality constrained QP (Min ½ x^T Q x + d^T x s.t. Ax = b, with Q an n x n matrix, A an m x n constraint matrix, and m < n) is easy to solve mathematically, but has some subtleties computationally speaking.  Verify that if Q is nonsingular and A is of full rank (i.e. rank m), then the equality constrained QP amounts to solving an (m+n)-dimensional square, nonsingular system of linear equations for the x values and the multipliers.  Alternatively, we can think of the m constraints being used to eliminate m of the variables and then solving the resulting unconstrained quadratic minimization in the remaining n – m variables.
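The first approach can be sketched directly: assemble and solve the (m+n)-dimensional KKT system (a minimal illustration, assuming Q symmetric, A of full row rank, and the multiplier convention Qx + d = A^T λ):

```python
import numpy as np

def solve_eqp(Q, d, A, b):
    """Solve Min (1/2) x^T Q x + d^T x  s.t.  A x = b  via the
    (m+n)-dimensional KKT system

        [ Q  -A^T ] [ x   ]   [ -d ]
        [ A    0  ] [ lam ] = [  b ]

    Multiplier convention (an assumption here): Qx + d = A^T lam.
    Requires the KKT matrix to be nonsingular (e.g. Q positive
    definite on the null space of A, A of full row rank)."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, -A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([-d, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]   # x values, multipliers

# tiny check: Min x1^2 + x2^2 s.t. x1 + x2 = 1  ->  x = (0.5, 0.5)
Q = 2.0 * np.eye(2)
d = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, lam = solve_eqp(Q, d, A, b)
print(x, lam)  # [0.5 0.5] [1.0]
```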

 

Example: Min 3x1^2 + 2x1x2 + x1x3 + 2.5x2^2 + 2x2x3 + 2x3^2 – 8x1 – 3x2 – 3x3 s.t. x1 + x3 = 3, x2 + x3 = 0.

 

The solution of this problem is x* = [2, -1, 1].  [Verify that this satisfies the KKT conditions.  How do you interpret the multipliers in this case?  Using the multipliers, decide whether the solution remains optimal if the first constraint were x1 + x3 <= 3.  What about the case x1 + x3 >= 3?]
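The KKT check can also be done numerically (a sketch: Q and d below are read off from the stated objective so that it equals ½ x^T Q x + d^T x; the multiplier convention Qx + d = A^T λ is an assumption):

```python
import numpy as np

# Q and d such that the example objective is (1/2) x^T Q x + d^T x
Q = np.array([[6.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [1.0, 2.0, 4.0]])
d = np.array([-8.0, -3.0, -3.0])
A = np.array([[1.0, 0.0, 1.0],    # x1 + x3 = 3
              [0.0, 1.0, 1.0]])   # x2 + x3 = 0
b = np.array([3.0, 0.0])

x_star = np.array([2.0, -1.0, 1.0])
assert np.allclose(A @ x_star, b)           # feasibility

# stationarity: Q x + d = A^T lam -- recover lam by least squares
lam, *_ = np.linalg.lstsq(A.T, Q @ x_star + d, rcond=None)
assert np.allclose(A.T @ lam, Q @ x_star + d)
print(lam)  # multipliers [3, -2]
```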

 

QP is therefore a finite problem, in that the solution can be found by enumerating a finite number of possibilities with finite computation for each possibility.  This is seen by allowing a certain number of constraints to be active (including the possibility of none of the constraints being active) and solving the equality constrained QP in each case.  Of course, this would not be a practical way to solve QPs, but, like LP, QP is at least a finite problem.

 

If the Q matrix is not positive definite (for a minimization problem), the objective function is not convex.  Then combinatorially speaking, QP is a hard problem, and essentially has to be solved by (clever) enumeration, or by heuristics (for large problems).  Descent-based methods are not guaranteed to give us the global minimum.

 

If we allow x to be constrained to integer values (in particular, 0-1 values), then QP includes the hardest known problems in discrete optimization, including the traveling salesman problem and variants.

 

QP and LCP

 

A unifying framework which includes QP, LP and some other problems is the Linear Complementarity Problem (LCP).  It is stated as follows:

 

Given a matrix M and a vector q, find a vector z such that z >= 0, q + Mz >= 0 and z^T(q + Mz) = 0.  It is easy to define this problem (i.e. to define appropriate M and q) so that LP and QP are special cases of LCP.  [Try this.]  The complementarity condition says that either z_i = 0 or (q + Mz)_i = 0, so trying out the various possibilities (each involving a linear system to be solved) will give a solution.  A procedure for LCP, and therefore QP (Lemke's algorithm), is given in Belegundu and Chandrupatla.
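The trying-out of possibilities can be sketched as a brute-force enumeration (didactic only: it is exponential in n, whereas Lemke's algorithm pivots its way to a solution):

```python
import numpy as np
from itertools import product

def lcp_enumerate(M, q, tol=1e-9):
    """Brute-force LCP: for each guess of which z_i may be nonzero,
    complementarity forces (q + Mz)_i = 0 there; solve that linear
    system and check the sign conditions z >= 0, q + Mz >= 0."""
    n = len(q)
    for pattern in product([0, 1], repeat=n):   # 1 = z_i allowed nonzero
        z = np.zeros(n)
        idx = [i for i in range(n) if pattern[i]]
        if idx:
            try:
                z[idx] = np.linalg.solve(M[np.ix_(idx, idx)], -q[idx])
            except np.linalg.LinAlgError:
                continue
        w = q + M @ z
        if (z >= -tol).all() and (w >= -tol).all():
            return z        # complementarity holds by construction
    return None

# small example (an arbitrary instance for illustration)
z = lcp_enumerate(np.array([[2.0, 0.0], [0.0, 2.0]]),
                  np.array([-2.0, 1.0]))
print(z)  # [1. 0.]
```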

 

Active set methods for inequality constrained convex QP

 

The active set method for convex QP is somewhat similar to the Simplex method for LP.    Assume that Q is positive semidefinite. 

 

We start with a feasible solution to the system Ax <= b (by using phase one simplex, if required).  We maintain a list of constraint indices that are active (where [Ax]i = bi).  Solve the equality constrained QP with these constraints only.  This is written as follows:

With respect to the current iterate x_k, which has a certain set of active constraints, we find a direction p_k restricted to the subspace of those active constraints.  The relevant EQP is

Min_p  ½ p^T Q p + (Qx_k + d)^T p

s.t. a_i^T p = 0 for all the constraints i that are active at x_k

Here, note that p is the variable in this QP.  The term a_i is the i-th constraint row and x_k is the current iterate (so the term Qx_k + d is known).  This EQP is just the minimization of the original objective function, written in terms of p = x – x_k.

 

If the optimal solution to the EQP is p_k = 0, look at the multipliers for all the active constraints.  If they are all non-negative, then the current solution is optimal (verify that all the KKT conditions are then satisfied).  If any of the multipliers is negative, drop a constraint with a negative multiplier (say the most negative one) from the active set and re-solve the EQP.

 

If the direction p_k is not zero, do a line search along p_k (a very simple line search that ensures all the linear inequality constraints remain satisfied – similar to the ratio test in LP).  Re-calculate the active set of indices at the new point and continue.  Note that a step size of 1 corresponds to taking the Newton step that gives the local minimum in that direction.

 

Example (from Nocedal and Wright): Min (x1 – 1)^2 + (x2 – 2.5)^2

s.t. x1 – 2 x2 + 2 >= 0,  – x1 – 2 x2 + 6 >= 0, – x1 + 2 x2 + 2 >= 0, x1 >= 0, x2 >= 0. 

 

Starting from the feasible point [2,0] with its active set of constraints, the iterates of the algorithm finally terminate at the solution [1.4, 1.7].  [Verify the details.  For this example, try different objective functions, including one that gives a minimum in the interior of the feasible region, and verify that the method works.  Also try different starting points.  Try to write an argument for the finite convergence of this algorithm in the absence of degeneracy – as in LP.]
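One way to check the stated solution is with an off-the-shelf solver; the sketch below uses scipy's SLSQP routine (an SQP code, not the active set method described here) on this example:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 1.0)**2 + (x[1] - 2.5)**2

# constraints in scipy's "fun(x) >= 0" dictionary form
cons = [{'type': 'ineq', 'fun': lambda x:  x[0] - 2*x[1] + 2},
        {'type': 'ineq', 'fun': lambda x: -x[0] - 2*x[1] + 6},
        {'type': 'ineq', 'fun': lambda x: -x[0] + 2*x[1] + 2},
        {'type': 'ineq', 'fun': lambda x:  x[0]},
        {'type': 'ineq', 'fun': lambda x:  x[1]}]

# start from the same feasible point [2, 0] as in the text
res = minimize(f, x0=[2.0, 0.0], method='SLSQP', constraints=cons)
print(res.x)  # close to [1.4, 1.7]
```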

 

Belegundu and Chandrupatla provide a version of the active set method for QP for the special case Q = I, which arises in their general sequential quadratic programming algorithm.  In this special QP, the dual QP is particularly simple to solve (having only non-negativity constraints), and their active set method is specialized to that case.

 

Exercise:  Consider the QP (P): Min ½ x^T x s.t. Ax <= b (here x is an n-dimensional variable, A is a given m x n matrix and b is a given m-vector).  Consider also the dual QP (D): Max –½ y^T(AA^T)y – b^T y s.t. y >= 0.  Let f1 and f2 be the primal and dual objective functions.

 

Show that the KKT conditions of (P) and (D) are the same.

State and prove a weak duality result for f1 and f2 (using techniques similar to those used in LP).  Therefore show that if x0 and y0 are feasible for (P) and (D) and f1(x0) = f2(y0), then x0 and y0 are optimal for (P) and (D).

 

Exercise: Solve Min x1^2 + 2x2^2 – 2x1 – 6x2 – 2x1x2 s.t. ½ x1 + ½ x2 <= 1, –x1 + 2x2 <= 2, x1, x2 >= 0.

 

Try different starting points, including the interior of the feasible region.

 

General non-linear optimization problems

 

This topic is a big one by itself and a large number of techniques have been proposed in the literature.  Here, we only mention a few and give some details on one of the most successful ones, namely Sequential Quadratic Programming (SQP).  We also give some introduction to penalty function based methods.  Three other methods that are successful in various problem settings are summarized in Belegundu and Chandrupatla (Rosen’s gradient projection method for linear constraints, Zoutendijk’s method of feasible directions and the generalized reduced gradient method).

 

Merit functions and penalty functions

 

A useful concept in constrained optimization is that of a merit function.  An algorithm must keep two aspects in view: an objective function (to be minimized, say) and constraints that need to be satisfied.  A merit function is a composite function that combines the objective function and the constraint violations, if any; it can be used to measure the progress of an algorithm and also as a means to perform a line search during iterations.

 

For the equality constrained nonlinear optimization problem Min f(x) s.t. h_j(x) = 0, the quadratic penalty function f(x) + (1/m) Σ_j h_j(x)^2 has been proposed as a merit function (for a given parameter m > 0).  It can also be used as the objective function in a penalty function based approach, where (1/m) represents the penalty for violating the constraints h_j(x) = 0.  Typically, this unconstrained problem is solved for a sequence of m_k values going down to zero, with each solution used as the starting point for the next problem.  It can be shown that the solutions of the unconstrained problems converge to the constrained solution.  Note that typically all the solutions of the unconstrained problems will be infeasible (they will not satisfy h(x) = 0), and that we obtain a feasible and optimal solution only in the limit.
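A minimal sketch of this scheme, using scipy's unconstrained minimizer on the toy problem Min x1^2 + x2^2 s.t. x1 + x2 = 1 (whose solution is [0.5, 0.5]); the particular sequence of m values is an arbitrary choice:

```python
import numpy as np
from scipy.optimize import minimize

def penalty_solve(f, h, x0, ms=(1.0, 0.1, 0.01, 1e-4)):
    """Quadratic penalty: minimize f(x) + (1/m) * sum_j h_j(x)^2 for a
    decreasing sequence of m, warm-starting each solve from the last.
    Iterates are typically infeasible; feasibility only in the limit."""
    x = np.asarray(x0, dtype=float)
    for m in ms:
        obj = lambda x, m=m: f(x) + (1.0 / m) * np.sum(h(x)**2)
        x = minimize(obj, x).x
    return x

f = lambda x: x[0]**2 + x[1]**2
h = lambda x: np.array([x[0] + x[1] - 1.0])   # equality constraint
x = penalty_solve(f, h, [0.0, 0.0])
print(x)  # approaches [0.5, 0.5] as m -> 0
```

(For this problem the subproblem minimizer is x1 = x2 = 1/(m+2), so each iterate slightly violates the constraint, as described above.)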

 

Exercise:  Try the penalty approach for the problem Min x1^2 + x2^2 s.t. x1 + x2 = 1, and then for the problem Min x1 + x2 s.t. x1^2 + x2^2 – 2 = 0.

 

Exercise:  Think of a way to extend the logic of penalty functions to inequality constrained problems.

 

Another merit function that is very important is the l1 merit function, defined as f(x) + (1/m) Σ_j |h_j(x)| + (1/m) Σ_i max(g_i(x), 0).  This is an exact penalty function: for an appropriate choice of the penalty parameter m, a single minimization of it gives the solution of the constrained problem.  However, this function is not differentiable at all points.
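A small sketch of evaluating the l1 merit function; the one-variable example Min x^2 s.t. 1 – x <= 0 (an arbitrary choice) illustrates exactness, since for 1/m larger than the optimal multiplier the merit function is minimized exactly at the constrained solution x = 1:

```python
import numpy as np

def l1_merit(f, hs, gs, m):
    """l1 merit function f(x) + (1/m)(sum_j |h_j(x)| + sum_i max(g_i(x), 0)).
    Nondifferentiable wherever some h_j(x) = 0 or g_i(x) = 0."""
    def phi(x):
        h = np.array([hj(x) for hj in hs]) if hs else np.zeros(0)
        g = np.array([gi(x) for gi in gs]) if gs else np.zeros(0)
        return f(x) + (1.0 / m) * (np.abs(h).sum()
                                   + np.maximum(g, 0.0).sum())
    return phi

# Min x^2 s.t. g(x) = 1 - x <= 0; with 1/m = 5 > lambda* = 2,
# phi decreases for x < 1 and increases for x > 1: minimum at x = 1.
phi = l1_merit(lambda x: x[0]**2, [], [lambda x: 1.0 - x[0]], m=0.2)
print(phi(np.array([0.0])), phi(np.array([1.0])))  # 5.0 1.0
```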

 

There are other merit functions which are defined based on penalty parameters and also the multipliers or dual variables (suitably defined).

 

Techniques for general nonlinear problems

 

In solving general nonlinear programming problems by iterative means, subproblems that involve linear constraints are tractable.  Nonlinear constraints (even quadratic ones) are difficult to handle and would themselves require iterative procedures to solve.  It turns out that although a linear objective function with linear constraints (which results in LP subproblems) is tractable, such approximations are weak and generally not effective enough (somewhat akin to steepest descent methods for unconstrained problems).  A (convex) quadratic objective with linear constraints can be solved efficiently; this is the SQP scheme.  Nonlinear (convex) objective functions with linear constraints also offer some hope of solution in reasonable time, and such methods have shown some success.

 

Sequential quadratic programming (SQP)

 

There are many ways to motivate the class of algorithms falling under the SQP scheme.  One of the ways is as follows.  Consider the inequality constrained NLP Min f(x) s.t. g_i(x) <= 0 and some iterate x_k.  Replace the objective function and each (active) constraint by a quadratic approximation around x_k and try to find a d_k that gives a locally improved solution (like the Newton scheme).

 

This gives the quadratic optimization problem

Min_d  f(x_k) + ∇f(x_k)^T d + ½ d^T ∇²f(x_k) d

s.t. g_i(x_k) + ∇g_i(x_k)^T d + ½ d^T ∇²g_i(x_k) d <= 0 for active constraints i

 

This is difficult to solve (this is not a QP as it has quadratic constraints), but we attempt to write the KKT conditions for this problem.  [Please do this].  They will include some optimal multiplier values for each constraint. 

 

We then consider a related problem, namely

Min_d  f(x_k) + ∇f(x_k)^T d + ½ d^T ∇²f(x_k) d + ½ Σ_i λ_i d^T ∇²g_i(x_k) d

s.t. g_i(x_k) + ∇g_i(x_k)^T d <= 0 for active constraints i

 

We see that the KKT conditions for this problem are the same as those for the earlier problem (and this one is a QP with linear constraints) [verify these details].  The catch, of course, is that the λ_i values are not known beforehand.  But in an iterative scheme, these values are taken from the previous iteration (starting with an ad-hoc or intelligent initial estimate).
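A minimal sketch of the iteration for the equality constrained case, where the direction-finding subproblem reduces to a linear KKT system (full Newton steps, no line search or merit function; the toy problem Min x1 + x2 s.t. x1^2 + x2^2 = 2, with solution (-1, -1) and λ = 0.5 under the convention ∇f + λ∇h = 0, is an arbitrary choice):

```python
import numpy as np

def sqp(x, lam, iters=20):
    """Full-step SQP (= Newton on the KKT system) for
    Min x1 + x2  s.t.  h(x) = x1^2 + x2^2 - 2 = 0."""
    for _ in range(iters):
        g = np.array([1.0, 1.0])          # grad f (objective is linear)
        a = 2.0 * x                       # grad h
        h = x @ x - 2.0                   # constraint value
        W = 2.0 * lam * np.eye(2)         # Hessian of Lagrangian f + lam*h
        K = np.block([[W, a.reshape(2, 1)],
                      [a.reshape(1, 2), np.zeros((1, 1))]])
        rhs = -np.concatenate([g + lam * a, [h]])
        step = np.linalg.solve(K, rhs)    # direction d and multiplier update
        x = x + step[:2]
        lam = lam + step[2]
    return x, lam

x, lam = sqp(np.array([-2.0, 0.0]), 1.0)
print(x, lam)  # converges to x = (-1, -1), lam = 0.5
```

In a practical code the multiplier estimate from the previous iteration enters exactly as λ does here, but the step is safeguarded by a line search on a merit function, as described below.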

 

Once the direction finding QP is solved for dk, a line search is done (on an appropriate merit function) so that the objective function is reduced, without giving up too much on the feasibility.  Note that all iterates need not satisfy the original constraints.

 

In practical implementations, the second order term in the QP objective function above (which is really the Hessian of the Lagrangean) is approximated by an appropriate positive definite matrix (akin to Quasi-Newton methods).  Belegundu and Chandrupatla give a version of this where the identity matrix I is used throughout as the approximation to the Hessian of the Lagrangean.  This has the advantage that the resulting QP is easier to solve, as it has a special dual structure, as they illustrate.

 

Code based on SQP is available in Belegundu and Chandrupatla, and also in MATLAB and other software.  B and C also give interesting interpretations of the QP direction-finding subproblem.