These are very brief notes and are not substitutes for
textbooks. Reading only these will NOT
be adequate for learning the subject well, or even for doing well in the exams. As senior and postgraduate
exams. As senior and postgraduate
students, familiarity with more than one textbook in this classical and
important area is taken for granted.
Attendance will be taken and 75% attendance is
required for writing the exam and clearing the course.
Optimization in this course will consist of minimization or maximization (it does not matter which; the two are interchangeable) of a "well defined" function of several (but a finite number of) variables, perhaps subject to some constraints. Continuous optimization relies on derivatives (slopes or gradients) and higher-order quantities such as second derivatives (curvature and its generalizations) to model a function's behaviour as the parameters change.
Most of continuous optimization will rely either on a functional form of the objective to be optimized, or at least on the objective being smooth and efficiently evaluable at various points. Note that derivatives of various orders can be approximated by finite differences when required.
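For instance, a central-difference approximation of the gradient can be coded in a few lines. The sketch below is illustrative only: the test function, the point and the step size h are made up, and h trades off truncation error against round-off error.

import numpy as np

def approx_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x.
    f : callable mapping an n-vector to a scalar
    x : point (n-vector) at which to approximate the gradient
    h : step size (trades truncation error against round-off error)"""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

# Example: f(x) = x1^2 + 3*x1*x2, whose exact gradient is (2*x1 + 3*x2, 3*x1).
f = lambda x: x[0]**2 + 3.0 * x[0] * x[1]
print(approx_gradient(f, [1.0, 2.0]))   # approximately [8., 3.]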
We will also consider discrete optimization, which deals with a (usually finite) discrete set of feasible points. Notions of derivatives and model functions are not directly useful in this setting, but notions of descent, improvement and local/global optimality still apply.
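As a toy illustration of descent and local optimality in a discrete setting, here is a sketch of local search over 0/1 vectors, where the neighbourhood of a point consists of single-bit flips; the objective and starting point are invented for illustration.

def local_search(f, x):
    """Discrete descent: repeatedly flip a single bit of the 0/1 vector x
    while that improves f; stops at a local optimum for this neighbourhood."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x[:]
            y[i] = 1 - y[i]          # flip one coordinate
            if f(y) < f(x):          # strict improvement => move
                x, improved = y, True
    return x

# Toy objective: penalize deviation from the target vector (1, 0, 1).
f = lambda x: (x[0] - 1)**2 + x[1]**2 + (x[2] - 1)**2
print(local_search(f, [0, 1, 0]))    # reaches [1, 0, 1], here also a global optimum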
What this course will NOT consider: multi-objective optimization, fuzzy function theory, probabilistic treatment of the objective and/or the constraints, and equilibrium or game-theoretic models. Note that all of these can be treated as fairly natural, but non-trivial, extensions of the theory that we will see here.
· Linear programming as a link technique between continuous and discrete optimization
· Linear programming duality and sensitivity, their extensions to nonlinear problems and the interpretations there
· (Convex) Quadratic programming (QP) is as easy as linear programming
· Quasi-Newton methods, based on approximations of second-order information, are among the most efficient general-purpose nonlinear programming techniques
· Sequential QP (SQP) is among the most robust general-purpose tools for constrained optimization
· Integer programming problems are sometimes solvable using linear programming subproblems
This consists of derivative-based and derivative-free methods that are intuitive and useful. Please read them. As subroutines for larger (higher-dimensional) problems, it may be enough to find the first (local) optimum point along a given direction, and even that only approximately.
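As one concrete example of an approximate one-dimensional search, here is a sketch of backtracking under the Armijo sufficient-decrease rule; the constants alpha0, c and beta are conventional illustrative choices, not values prescribed in these notes.

import numpy as np

def backtracking_line_search(f, grad_f, x, d, alpha0=1.0, c=1e-4, beta=0.5):
    """Shrink the step alpha until f(x + alpha*d) shows sufficient decrease
    (Armijo condition) along the descent direction d."""
    alpha = alpha0
    fx = f(x)
    slope = np.dot(grad_f(x), d)          # directional derivative; should be < 0
    while f(x + alpha * d) > fx + c * alpha * slope:
        alpha *= beta                     # backtrack
    return alpha

# Example on f(x) = ||x||^2 with d = -gradient (steepest descent direction).
f = lambda x: float(np.dot(x, x))
grad_f = lambda x: 2.0 * x
x = np.array([3.0, -4.0])
alpha = backtracking_line_search(f, grad_f, x, -grad_f(x))
print(alpha, x - alpha * grad_f(x))       # accepted step and the new point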
Graphical method for 2 variable problems, extreme point properties of a solution (notion of a basic feasible solution) and the basic simplex method will be taken as the starting point.
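Such small LPs can also be handed to a solver; the sketch below uses scipy.optimize.linprog on invented data (linprog minimizes, so the maximization objective is negated).

from scipy.optimize import linprog

# maximize 3*x1 + 2*x2  subject to  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x1, x2 >= 0
# (illustrative data; c is negated because linprog minimizes)
c = [-3.0, -2.0]
A_ub = [[1.0, 1.0],
        [1.0, 3.0]]
b_ub = [4.0, 6.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimal extreme point (4, 0) and objective value 12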
· Formulate a procedure for minimizing a function f of n variables, when the only information available is the ability to evaluate f(x) for a given x. The function is believed to be smooth (differentiable, etc.), but no formula is available. Assume that some "reasonable" starting solution is available. (One possible approach is sketched after this list.)
· On a directed network with node set N and directed arc set A, with each arc going from some node to another and a distance defined on each arc, describe, using proper notation, the problem of finding the shortest path between two designated nodes s and t. Try to also formulate this problem as a linear programming problem. Note that LP is a continuous problem.
· Show rigorously how maximization and minimization problems are equivalent. You have to first formulate a rigorous statement about this! The crux is that if you can do one, you can do the other.
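For the first exercise, one possible (certainly not the only) answer is a compass/coordinate search that uses only function evaluations: probe steps of size h along each coordinate direction, accept any improvement, and shrink h when nothing improves. All names, data and tolerances below are illustrative.

import numpy as np

def compass_search(f, x0, h=1.0, shrink=0.5, tol=1e-6):
    """Derivative-free minimization using only evaluations of f.
    At each iteration, probe +/- h along every coordinate; accept any
    improving point, otherwise shrink h; stop when h falls below tol."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    while h > tol:
        improved = False
        for i in range(x.size):
            for s in (+h, -h):
                y = x.copy()
                y[i] += s
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            h *= shrink      # no improving coordinate step: refine the step size
    return x, fx

# Example: the "reasonable" starting solution is (0, 0) for f(x) = (x1-1)^2 + (x2+2)^2.
f = lambda x: (x[0] - 1.0)**2 + (x[1] + 2.0)**2
print(compass_search(f, [0.0, 0.0]))     # converges near (1, -2)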
For defining derivatives of functions of several variables, for describing algorithms and their convergence to optimal points, and even for characterizing optimal points in the first place, we need notions of norms on vectors (and later, on matrices). A norm is a function || x || defined on n-dimensional vectors x that measures the magnitude of x and satisfies the following conditions:
· || x || ≥ 0 for all x, with || x || = 0 if and only if x = 0 (positivity)
· || a x || = | a | || x || for every scalar a (homogeneity)
· || x + y || ≤ || x || + || y || for all x, y (triangle inequality)
A family of functions satisfying these conditions is that of the p-norms on vectors, defined for p ≥ 1 as
|| x ||p = ( Σi | xi |^p )^(1/p)
The 2-norm (p = 2) is the usual Euclidean norm, the 1-norm (p = 1) is the rectilinear or Manhattan norm, and p = infinity corresponds to the max norm, i.e. || x ||inf = maxi | xi |.
For most of our purposes, these norms (and some others) are equivalent, in the sense that a sequence of vectors that converges to zero under one norm does so under any other.
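A quick numerical check of the three norms just mentioned, using numpy's norm routine (the vector is arbitrary):

import numpy as np

x = np.array([3.0, -4.0, 0.0])
print(np.linalg.norm(x, ord=1))       # 1-norm (Manhattan): 7.0
print(np.linalg.norm(x, ord=2))       # 2-norm (Euclidean): 5.0
print(np.linalg.norm(x, ord=np.inf))  # max norm: 4.0
# Norm equivalence in R^n: || x ||inf <= || x ||2 <= || x ||1 <= n * || x ||inf,
# so convergence to zero in one norm implies it in the others.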
A derivative is a linear operator that captures the local behaviour of a function: evaluated at a point x, it provides an estimate of the function value at points near x, with a quantifiable approximation error.
For a function of one variable, the derivative at a point is a single number. Extending this notion to a function f of n variables, f(x1, …, xn), gives the gradient vector ∇f(x), the vector of partial derivatives with respect to the co-ordinate variables. The i-th component of ∇f(x) is the partial derivative of f(x) w.r.t. xi.
The second-order generalization is the Hessian matrix, the (symmetric) matrix of second partial derivatives, evaluated at some point.
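A small symbolic check of these definitions, assuming the sympy package is available (the particular quadratic is chosen only for illustration):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + 3*x1*x2 + 2*x2**2

grad = [sp.diff(f, v) for v in (x1, x2)]   # gradient: vector of partial derivatives
H = sp.hessian(f, [x1, x2])                # Hessian: symmetric matrix of second partials

print(grad)   # [2*x1 + 3*x2, 3*x1 + 4*x2]
print(H)      # Matrix([[2, 3], [3, 4]])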
It is not possible to design efficient algorithms for optimization unless the characterization of optima is exact and easily computable or verifiable. For unconstrained minima/maxima of differentiable functions, a (local) optimum is characterized by f'(x) = 0 for a function of one variable, and by ∇f(x) = 0 (the zero vector) for a function f defined on n variables. [Verify this, using the Taylor series expansion for a function of n variables.]
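As a concrete illustration (with invented data): for the strictly convex quadratic f(x) = 0.5 x'Ax - b'x with A symmetric positive definite, the condition ∇f(x) = Ax - b = 0 turns unconstrained minimization into solving a linear system.

import numpy as np

# f(x) = 0.5 * x' A x - b' x  with A symmetric positive definite (illustrative data)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x_star = np.linalg.solve(A, b)    # solves the stationarity condition A x = b
grad = A @ x_star - b             # should be (numerically) the zero vector
print(x_star, grad)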