These are very brief notes and are not substitutes for
textbooks. Reading only these will NOT
be adequate for learning the subject well, or even for doing well in the exams. As senior and postgraduate
exams. As senior and postgraduate
students, familiarity with more than one textbook in this classical and
important area is taken for granted.
Attendance will be taken and 75% attendance is
required for writing the exam and clearing the course.
Optimization in this course will consist of minimization or maximization (it does not matter which; the two are interchangeable) of a "well defined" function of several (but a finite number of) variables, perhaps subject to some constraints. Continuous optimization relies on derivatives (slopes or gradients) and higher-order quantities such as second derivatives (curvature and its generalizations) to model a function's behaviour as the parameters change.
Most of continuous optimization will rely either on a functional form of the objective to be optimized, or at least on the objective being smooth and efficiently evaluable at various points. Note that derivatives of various orders can be approximated by finite differences when required.
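For instance, a central-difference approximation of the gradient can be coded in a few lines. The sketch below is illustrative only: the test function, the point and the step size h are made up, and h trades off truncation error against round-off error.

import numpy as np

def approx_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x.
    f : callable mapping an n-vector to a scalar
    x : point (n-vector) at which to approximate the gradient
    h : step size (trades truncation error against round-off error)"""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

# Example: f(x) = x1^2 + 3*x1*x2, whose exact gradient is (2*x1 + 3*x2, 3*x1).
f = lambda x: x[0]**2 + 3.0 * x[0] * x[1]
print(approx_gradient(f, [1.0, 2.0]))   # approximately [8., 3.]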
We will also consider discrete optimization, which deals with a (usually finite) discrete set of feasible points. Notions of derivatives and model functions are not directly useful in this setting, but notions of descent, improvement and local/global optimality still apply.
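As a toy illustration of descent and local optimality in a discrete setting, here is a sketch of local search over 0/1 vectors, where the neighbourhood of a point consists of single-bit flips; the objective and starting point are invented for illustration.

def local_search(f, x):
    """Discrete descent: repeatedly flip a single bit of the 0/1 vector x
    while that improves f; stops at a local optimum for this neighbourhood."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x[:]
            y[i] = 1 - y[i]          # flip one coordinate
            if f(y) < f(x):          # strict improvement => move
                x, improved = y, True
    return x

# Toy objective: penalize deviation from the target vector (1, 0, 1).
f = lambda x: (x[0] - 1)**2 + x[1]**2 + (x[2] - 1)**2
print(local_search(f, [0, 1, 0]))    # reaches [1, 0, 1], here also a global optimum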
What this course will NOT consider: multi-objective optimization, fuzzy function theory, probabilistic treatment of the objective and/or the constraints, and equilibrium or game-theoretic models. Note that all of these can be treated as fairly natural, but non-trivial, extensions of the theory that we will see here.
· Linear programming as a link technique between continuous and discrete optimization
· Linear programming duality and sensitivity, their extensions to nonlinear problems and the interpretations there
· (Convex) Quadratic programming (QP) is as easy as linear programming
· Quasi-Newton methods, based on approximations of second-order information, are among the most efficient general-purpose nonlinear programming techniques
· Sequential QP (SQP) is among the most robust general-purpose tools for constrained optimization
· Integer programming problems are sometimes solvable using linear programming subproblems
This consists of derivative-based and derivative-free methods that are intuitive and useful. Please read them. As subroutines for larger (higher-dimensional) problems, it may be enough to find the first (local) optimum point along a given direction, and even that only approximately.
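As one concrete example of an approximate one-dimensional search, here is a sketch of backtracking under the Armijo sufficient-decrease rule; the constants alpha0, c and beta are conventional illustrative choices, not values prescribed in these notes.

import numpy as np

def backtracking_line_search(f, grad_f, x, d, alpha0=1.0, c=1e-4, beta=0.5):
    """Shrink the step alpha until f(x + alpha*d) shows sufficient decrease
    (Armijo condition) along the descent direction d."""
    alpha = alpha0
    fx = f(x)
    slope = np.dot(grad_f(x), d)          # directional derivative; should be < 0
    while f(x + alpha * d) > fx + c * alpha * slope:
        alpha *= beta                     # backtrack
    return alpha

# Example on f(x) = ||x||^2 with d = -gradient (steepest descent direction).
f = lambda x: float(np.dot(x, x))
grad_f = lambda x: 2.0 * x
x = np.array([3.0, -4.0])
alpha = backtracking_line_search(f, grad_f, x, -grad_f(x))
print(alpha, x - alpha * grad_f(x))       # accepted step and the new point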
Graphical method for 2 variable problems, extreme point properties of a solution (notion of a basic feasible solution) and the basic simplex method will be taken as the starting point.
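Such small LPs can also be handed to a solver; the sketch below uses scipy.optimize.linprog on invented data (linprog minimizes, so the maximization objective is negated).

from scipy.optimize import linprog

# maximize 3*x1 + 2*x2  subject to  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x1, x2 >= 0
# (illustrative data; c is negated because linprog minimizes)
c = [-3.0, -2.0]
A_ub = [[1.0, 1.0],
        [1.0, 3.0]]
b_ub = [4.0, 6.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimal extreme point (4, 0) and objective value 12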
· Formulate a procedure for minimizing a function f of n variables, when the only information available is the ability to evaluate f(x) for a given x. The function is believed to be smooth (differentiable, etc.), but no formula is available. Assume that some "reasonable" starting solution is available. (One possible approach is sketched after this list.)
· On a directed network with node set N and directed arc set A, with each arc going from some node to another and a distance defined on each arc, describe, using proper notation, the problem of finding the shortest path between two designated nodes s and t. Try to also formulate this problem as a linear programming problem. Note that LP is a continuous problem.
· Show rigorously how maximization and minimization problems are equivalent. You have to first formulate a rigorous statement about this! The crux is that if you can do one, you can do the other.
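For the first exercise, one possible (certainly not the only) answer is a compass/coordinate search that uses only function evaluations: probe steps of size h along each coordinate direction, accept any improvement, and shrink h when nothing improves. All names, data and tolerances below are illustrative.

import numpy as np

def compass_search(f, x0, h=1.0, shrink=0.5, tol=1e-6):
    """Derivative-free minimization using only evaluations of f.
    At each iteration, probe +/- h along every coordinate; accept any
    improving point, otherwise shrink h; stop when h falls below tol."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    while h > tol:
        improved = False
        for i in range(x.size):
            for s in (+h, -h):
                y = x.copy()
                y[i] += s
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            h *= shrink      # no improving coordinate step: refine the step size
    return x, fx

# Example: the "reasonable" starting solution is (0, 0) for f(x) = (x1-1)^2 + (x2+2)^2.
f = lambda x: (x[0] - 1.0)**2 + (x[1] + 2.0)**2
print(compass_search(f, [0.0, 0.0]))     # converges near (1, -2)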
For defining derivatives of functions of several variables, for describing algorithms and their convergence to optimal points, and even for characterizing optimal points in the first place, we need notions of norms on vectors (and later, on matrices). A norm is a function || x || defined on n-dimensional vectors x that measures the magnitude of x and satisfies the following conditions:
· || x || ≥ 0 for all x, with || x || = 0 if and only if x = 0 (positivity)
· || a x || = | a | || x || for every scalar a (homogeneity)
· || x + y || ≤ || x || + || y || for all x, y (triangle inequality)
A family of functions satisfying these conditions is that of the p-norms on vectors, defined for p ≥ 1 as
|| x ||p = ( Σi | xi |^p )^(1/p)
The 2-norm (p = 2) is the usual Euclidean norm, the 1-norm (p = 1) is the rectilinear or Manhattan norm, and p = infinity corresponds to the max norm, i.e. || x ||inf = maxi | xi |.
For most of our purposes, these norms (and some others) are equivalent, in the sense that a sequence of vectors that converges to zero under one norm does so under any other.
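A quick numerical check of the three norms just mentioned, using numpy's norm routine (the vector is arbitrary):

import numpy as np

x = np.array([3.0, -4.0, 0.0])
print(np.linalg.norm(x, ord=1))       # 1-norm (Manhattan): 7.0
print(np.linalg.norm(x, ord=2))       # 2-norm (Euclidean): 5.0
print(np.linalg.norm(x, ord=np.inf))  # max norm: 4.0
# Norm equivalence in R^n: || x ||inf <= || x ||2 <= || x ||1 <= n * || x ||inf,
# so convergence to zero in one norm implies it in the others.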
A derivative is a linear operator that captures the local behaviour of a function: evaluated at a point x, it provides an estimate of the function value at points near x, with a quantifiable approximation error.
For a function of one variable, the derivative at a point is a single number. Extending this notion to a function f of n variables, f(x1, …, xn), gives the gradient vector ∇f(x), the vector of partial derivatives with respect to the co-ordinate variables. The i-th component of ∇f(x) is the partial derivative of f(x) w.r.t. xi.
The second-order generalization is the Hessian matrix, the (symmetric) matrix of second partial derivatives, evaluated at some point.
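A small symbolic check of these definitions, assuming the sympy package is available (the particular quadratic is chosen only for illustration):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + 3*x1*x2 + 2*x2**2

grad = [sp.diff(f, v) for v in (x1, x2)]   # gradient: vector of partial derivatives
H = sp.hessian(f, [x1, x2])                # Hessian: symmetric matrix of second partials

print(grad)   # [2*x1 + 3*x2, 3*x1 + 4*x2]
print(H)      # Matrix([[2, 3], [3, 4]])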
It is not possible to design efficient algorithms for optimization unless the characterization of optima is exact and easily computable or verifiable. For unconstrained minima/maxima of differentiable functions, a (local) optimum is characterized by f'(x) = 0 for a function of one variable, and by ∇f(x) = 0 (the zero vector) for a function f defined on n variables. [Verify this, using the Taylor series expansion for a function of n variables.]
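As a concrete illustration (with invented data): for the strictly convex quadratic f(x) = 0.5 x'Ax - b'x with A symmetric positive definite, the condition ∇f(x) = Ax - b = 0 turns unconstrained minimization into solving a linear system.

import numpy as np

# f(x) = 0.5 * x' A x - b' x  with A symmetric positive definite (illustrative data)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x_star = np.linalg.solve(A, b)    # solves the stationarity condition A x = b
grad = A @ x_star - b             # should be (numerically) the zero vector
print(x_star, grad)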