Genetic algorithms
Another method for finding globally optimal solutions is the use of Genetic Algorithms. A self-contained reference is Chapter 6 of K. Deb's book, Optimization for Engineering Design, which should be read by those interested (this chapter also includes material on Simulated Annealing). Belegundu and Chandrupatla also have a quick section on it, and Chong and Zak have a more theoretically oriented chapter on this topic.
The first step is to decide on an encoding scheme to represent the variables to the desired accuracy using a (usually binary) string. The length of the string determines the achievable accuracy. If there are bounds on the variables, a simple scheme is to use the string 00…0 for each variable at its lower bound and 11…1 at its upper bound, with intermediate values suitably interpolated [see K. Deb for details. Note that the formulation discussed there has box constraints on the variables to make the domain finite, so that a realistic encoding is possible.].
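The interpolating scheme just described can be sketched as follows (a minimal illustration; the function name and bit-list representation are choices made here, not prescribed by the notes):

```python
# Sketch of the binary encoding described above.
# The string of all 0s maps to the lower bound, all 1s to the upper bound,
# and intermediate strings are linearly interpolated between the two.

def decode(bits, lo, hi):
    """Map a binary string (list of 0/1 values) to a real value in [lo, hi]."""
    value = int("".join(map(str, bits)), 2)      # ordinary binary value
    max_value = 2 ** len(bits) - 1               # value of the all-1s string
    return lo + (hi - lo) * value / max_value

# e.g. with 4 bits on [0, 6]:
# decode([0, 0, 0, 0], 0, 6) -> 0.0 (lower bound)
# decode([1, 1, 1, 1], 0, 6) -> 6.0 (upper bound)
```

A longer string gives a finer grid over [lo, hi], which is the sense in which string length determines accuracy.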
An initial set of solutions is selected by a random mechanism which favours solutions that are desirable as of now. The desirability of a solution is evaluated by a fitness function (which includes the objective function, and some penalty on violated constraints, if any). This sampling of the population is done by an appropriate weighted probability mechanism, referred to as the roulette wheel (creating a probability distribution in which better solutions have a higher probability of being selected). [Make sure you understand this simple mechanism and how to implement it.]
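One way the roulette wheel can be implemented is sketched below (an illustration, assuming strictly positive fitness values; shift or penalize fitnesses first if they can be nonpositive):

```python
import random

def roulette_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness.
    Assumes all fitnesses are strictly positive."""
    total = sum(fitnesses)
    r = random.uniform(0, total)          # spin the wheel
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit                 # each individual owns a slice
        if r <= cumulative:               # the spin landed in this slice
            return individual
    return population[-1]                 # guard against round-off
```

An individual with fitness 9 is selected roughly nine times as often as one with fitness 1, which is the weighted-probability mechanism referred to above.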
From the initial pool at any stage, a new set of solutions is generated through various operators, of which the most commonly used are the crossover and mutation operators. These randomly perturb the solutions to get new ones (which explore new parts of the search space). These can be defined in many ways, and can be used to control the progress of the algorithm. Termination criteria are usually stated in terms of the number of iterations, while keeping track of the best available solution at any time.
Crossover tries to combine two solutions to get a third, different one. The points in the sequence of bits where the crossover can take place can be controlled. Mutation takes one solution and generates a different one through random perturbation of that solution. Parameters of a typical GA implementation would include the probabilities associated with the crossover and mutation operations.
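For binary strings, single-point crossover and bitwise mutation can be sketched as follows (illustrative implementations; the crossover point here is chosen uniformly at random, though as noted above it can be controlled differently):

```python
import random

def single_point_crossover(parent1, parent2):
    """Cut both parents at a random point and swap the tails,
    producing two offspring."""
    point = random.randint(1, len(parent1) - 1)
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(bits, pm):
    """Flip each bit independently with probability pm."""
    return [1 - b if random.random() < pm else b for b in bits]
```

The probabilities with which these operators are applied (and pm itself) are exactly the GA parameters mentioned above.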
For combinatorial optimization problems, the coding is most often in terms of 0-1 variables, which map naturally to chromosomes. For the n-city TSP, for example, we could have a string of n^2 0-1 variables indicating the position of the i-th city in the tour, or a string of n variables (each taking on integer values from 1, …, n) representing the order in which the cities are visited. Notice that for the TSP, in the second representation, a set of n circular permutations are actually the same, as far as a tour is concerned. A fitness function is evaluated for each chromosome.
The original GA is attributed to Holland and associates in the 1970s. There, the initial population of M chromosomes was selected at random, and crossover was effected by choosing one parent with a high fitness value and another parent chosen at random, to get a new pair of offspring.
Convergence analysis of the GA is not easy and relies on defining
schema (which represent groups of solutions) and the average fitness of the
population at every step and the specific reproduction plan that is
chosen. For details, you would have to
refer to more specialized references, beyond the scope of this course.
Try the function f(x) = x^3 – 60x^2 + 900x + 100 with x restricted to integer values between 0 and 31 to find the maximum, using a genetic algorithm. You can use the ordinary binary representation of integers as an illustrative coding for this one dimensional problem [e.g. the string 10011 would represent x = 19, and the string 00101 would represent x = 5]. Choose 5 strings at random and implement crossover and mutation and see how the algorithm progresses.
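A minimal GA sketch for this exercise, combining roulette-wheel selection, single-point crossover, and bitwise mutation (the population size, probabilities, generation count and seed are illustrative choices, not prescribed ones):

```python
import random

def f(x):
    return x**3 - 60 * x**2 + 900 * x + 100

def decode(bits):                 # 5-bit ordinary binary, x in 0..31
    return int("".join(map(str, bits)), 2)

def run_ga(pop_size=5, generations=50, pc=0.8, pm=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(5)] for _ in range(pop_size)]
    best = max(pop, key=lambda b: f(decode(b)))
    for _ in range(generations):
        fits = [f(decode(b)) for b in pop]
        offset = min(fits)        # shift so selection weights are positive
        weights = [ft - offset + 1 for ft in fits]
        new_pop = []
        while len(new_pop) < pop_size:
            # roulette-wheel selection of two parents
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            if rng.random() < pc:  # single-point crossover
                cut = rng.randint(1, 4)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # bitwise mutation of both offspring
            new_pop += [[1 - b if rng.random() < pm else b for b in c]
                        for c in (p1, p2)]
        pop = new_pop[:pop_size]
        best = max(pop + [best], key=lambda b: f(decode(b)))
    return decode(best), f(decode(best))
```

Since f'(x) = 3x^2 – 120x + 900 vanishes at x = 10 and x = 30, the maximum over the integers 0..31 is f(10) = 4100, against which a run can be checked.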
Try the Himmelblau function f(x1, x2) = (x1^2 + x2 – 11)^2 + (x1 + x2^2 – 7)^2 with the box constraints 0 <= x1, x2 <= 6, and also the constrained version with (x1 – 5)^2 + x2^2 – 26 >= 0, and code it yourself, to understand how the method works.
As a further exercise, also try the simulated annealing algorithm on
the same function and explore the following issues:
What are the initial points for which the algorithm terminates at the
global minimum?
What is the sensitivity of the algorithm to the cooling schedule and the initial temperature parameter?
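As a starting point for these experiments, one possible simulated-annealing sketch for the box-constrained Himmelblau function (all parameter values here, including the cooling schedule and step size, are illustrative choices to be varied in the exercise):

```python
import math
import random

def himmelblau(x1, x2):
    return (x1**2 + x2 - 11)**2 + (x1 + x2**2 - 7)**2

def anneal(start, t0=100.0, cooling=0.95, steps_per_t=50, t_min=1e-3, seed=0):
    rng = random.Random(seed)
    x = list(start)
    fx = himmelblau(*x)
    best, fbest = list(x), fx
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            # random perturbation, clipped to the box 0 <= xi <= 6
            y = [min(6.0, max(0.0, xi + rng.gauss(0, 0.5))) for xi in x]
            fy = himmelblau(*y)
            # accept improvements always; accept worse moves with
            # probability exp(-(fy - fx)/t), which shrinks as t cools
            if fy < fx or rng.random() < math.exp((fx - fy) / t):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = list(x), fx
        t *= cooling               # geometric cooling schedule
    return best, fbest
```

Running `anneal` from different starting points, and with different values of t0 and cooling, is one way to explore the two questions above.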
Come up with a scheme for GA for the slot timetabling problem and
implement it.
It is difficult to come up with general results about the performance
of GA as there are a number of different ways of implementing GA, starting with
options in encoding the solution space and variables and the different ways of
generating starting populations, the selection mechanisms and the methods used
for ‘evolution’ of new generations of solutions. But some broad principles are as follows.
We restrict the analysis to simple, single-point crossover and mutation
as the two mechanisms of evolution. For
convenience, we use the binary encoding of solutions to illustrate the
arguments. The framework used is that
of a schema, which is a subset of
chromosomes, where some of the values in the chromosome take on a fixed
value. For example, the schema * * 1 *
0 * * represents a collection of chromosomes, which includes the chromosomes 1
0 1 0 0 0 1 and 1 1 1 1 0 1 0 and several others.
The order of a schema is the number of fixed entries and the length of
a schema is the distance between the first and last fixed positions in the
schema (the order of the schema above is 2 and the length is also 2). A third quantity of interest is the fitness
ratio, which is the average fitness of a schema divided by the average fitness
of the population (overall). The
following three results can then be verified.
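The schema quantities just defined can be checked with a small sketch (helper names are illustrative; '*' marks a free position):

```python
def order(schema):
    """Order of a schema: the number of fixed (non-*) positions."""
    return sum(c != '*' for c in schema)

def length(schema):
    """Length of a schema: distance between first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0]

def matches(schema, chromosome):
    """Does a chromosome belong to the schema?"""
    return all(s == '*' or s == c for s, c in zip(schema, chromosome))

# For the schema **1*0** above: order 2, length 2,
# and 1010001 matches while 1000001 does not.
```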
[Note that each generation can be thought of as a time step in the
algorithm, so some references use time instead of generation. In what follows, Pc is the
probability that a given chromosome is selected for crossover and Pm
is the probability that it is selected for mutation, while defining the next
generation.]
Result 1: In the selection plan, if a parent is selected in proportion to its fitness, then the expected number of instances of schema H in generation k+1 is f(H,k) N(H,k), where f is the fitness ratio of H in generation k and N is the number of instances of H in generation k.
Result 2: If P(H,k) is the probability that a schema H is represented in generation k, and if crossover is applied in generation k with probability Pc, then P(H,k+1) >= 1 – Pc l(H) [1 – P(H,k)] / (n – 1), where n is the length of the chromosome and l(H) is the length of the schema.
Result 3: If P(H,k) is the probability that a schema H is represented in generation k, and if mutation is applied in generation k with probability Pm, then P(H,k+1) >= 1 – Pm m(H), where m(H) is the order of the schema.
Putting these together, we get a lower bound on the presence of schema H in generation k+1 if crossover and mutation are both applied. This gives the Schema Theorem for the expected number of representatives of schema H in generation k+1 as:
E(H,k+1) >= f(H,k) N(H,k) [1 – Pc l(H) (1 – P(H,k)) / (n – 1) – Pm m(H)]
This basically says that short, low-order schemata with above-average fitness receive increasing representation over generations, and combine with each other to form better and better solutions. By itself this does not guarantee convergence to an optimum, but the algorithm is expected to work well in practice, and does so empirically in many applications.
Dependence on encoding: As can be seen, the success depends on the way the solution space is encoded. As an illustration, suppose the good schemata are such that their length is long (the number of entries between the first and last fixed elements of the schema); then crossover is less likely to favour the continuation of these schemata in subsequent generations. Therefore even the ordering of variables in a multi-variable problem matters significantly, as it affects the encoding scheme and therefore the schemata which capture "desirable" solutions. It is this that makes GA somewhat of an experimental science (but one which is quite successful sometimes)!
Introduction to Lagrangean relaxation
Taking the weighted set covering problem as an example, we illustrate the principle of Lagrangean relaxation. Consider the following example, from Beasley (in Reeves’ book).
Min 2x1 + 3x2 + 4x3 + 5x4
s.t. x1 + x3 >= 1
x1 + x4 >= 1
x2 + x3 + x4 >= 1
xi ∈ {0,1} for all i.
Make sure you are able to relate this to the notation of the weighted set covering problem above.
The related problem
Min 2x1 + 3x2 + 4x3 + 5x4 + λ1 (1 – x1 – x3) + λ2 (1 – x1 – x4) + λ3 (1 – x2 – x3 – x4)
s.t. xi ∈ {0,1} for all i,
if solved, gives a lower bound on the original optimal value for any nonnegative values of the λ multipliers. Verify this using arguments similar to weak duality in LP. Also try this numerically for a few nonnegative values of the multipliers and compare with the true optimal value of the original problem (which you can easily get, at least by enumeration).
This problem for specific values of the multipliers is easy to solve (compared to the original problem). Try it. A solution of such a problem which also satisfies the original feasibility and complementary slackness conditions would solve the original problem.
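To see why the relaxed problem is easy: its objective separates over the variables, so each xi is set to 1 exactly when its collected coefficient is negative. A sketch for the example above (the function name is illustrative):

```python
# Lagrangean relaxation of the set covering example, for fixed
# multipliers lam = (l1, l2, l3) >= 0.

def lagrangean_bound(lam):
    l1, l2, l3 = lam
    coeffs = [2 - l1 - l2,       # x1 appears in constraints 1 and 2
              3 - l3,            # x2 appears in constraint 3
              4 - l1 - l3,       # x3 appears in constraints 1 and 3
              5 - l2 - l3]       # x4 appears in constraints 2 and 3
    x = [1 if c < 0 else 0 for c in coeffs]   # minimize term by term
    bound = l1 + l2 + l3 + sum(c * xi for c, xi in zip(coeffs, x))
    return bound, x
```

For instance, λ = (1, 1, 3) gives the bound 5, which you can check by enumeration equals the true optimal value (attained at x1 = x2 = 1), while λ = (0, 0, 0) gives only the trivial bound 0.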
Lagrangean relaxation attempts to construct a sequence of such problems and iteratively come up with better bounds which can either be used in a branch and bound procedure or solve the problem optimally, directly. The procedure relies on a clever updating of the values of the multipliers to get the best value of the relaxed problem. Subgradient optimization is one technique used in this context.
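One common multiplier update, sketched here for the same example (an illustration of the subgradient idea, with an assumed fixed step size; practical schemes use diminishing or adaptively scaled steps):

```python
# One subgradient step for the set covering example. The subgradient
# component for constraint j is its slack (1 minus the sum of the xi
# in it) at the solution of the relaxed problem.

def subgradient_step(lam, step):
    l1, l2, l3 = lam
    coeffs = [2 - l1 - l2, 3 - l3, 4 - l1 - l3, 5 - l2 - l3]
    x = [1 if c < 0 else 0 for c in coeffs]   # solve the relaxed problem
    g = [1 - x[0] - x[2],                     # slack of x1 + x3 >= 1
         1 - x[0] - x[3],                     # slack of x1 + x4 >= 1
         1 - x[1] - x[2] - x[3]]              # slack of x2 + x3 + x4 >= 1
    # move the multipliers along the subgradient, keeping them nonnegative
    return tuple(max(0.0, l + step * gj) for l, gj in zip(lam, g))
```

Iterating lam = subgradient_step(lam, step) raises the multipliers on violated constraints and lowers them on over-satisfied ones, driving the relaxed problem toward its best lower bound.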