Genetic algorithms
Another method for finding globally optimal solutions is the use of Genetic Algorithms. A self-contained reference is Chapter 6 of K. Deb's book, Optimization for Engineering Design, which should be read by those interested (this chapter also includes material on Simulated Annealing). Belegundu and Chandrupatla also have a quick section on it, and Chong and Zak have a more theoretically oriented chapter on this topic.
The first step is to decide on an encoding scheme to represent the variables to the desired accuracy using a (usually binary) string. The length of the string determines the achievable accuracy. If there are bounds on the variables, a simple scheme is to use the string 00…0 for each variable at its lower bound and 11…1 at its upper bound, with intermediate values suitably interpolated [see K. Deb for details. Note that the formulation discussed there has box constraints on the variables to make the domain finite, so that a realistic encoding is possible.].
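The interpolating scheme just described can be sketched as follows (a minimal illustration; the function name and bit-list representation are choices made here, not prescribed by the notes):

```python
# Sketch of the binary encoding described above.
# The string of all 0s maps to the lower bound, all 1s to the upper bound,
# and intermediate strings are linearly interpolated between the two.

def decode(bits, lo, hi):
    """Map a binary string (list of 0/1 values) to a real value in [lo, hi]."""
    value = int("".join(map(str, bits)), 2)      # ordinary binary value
    max_value = 2 ** len(bits) - 1               # value of the all-1s string
    return lo + (hi - lo) * value / max_value

# e.g. with 4 bits on [0, 6]:
# decode([0, 0, 0, 0], 0, 6) -> 0.0 (lower bound)
# decode([1, 1, 1, 1], 0, 6) -> 6.0 (upper bound)
```

A longer string gives a finer grid over [lo, hi], which is the sense in which string length determines accuracy.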
An initial set of solutions is selected by a random mechanism which favours solutions that are desirable as of now. The desirability of a solution is evaluated by a fitness function (which includes the objective function, and some penalty on violated constraints, if any). This sampling of the population is done by an appropriate weighted probability mechanism, referred to as the roulette wheel (creating a probability distribution in which better solutions have a higher probability of being selected). [Make sure you understand this simple mechanism and how to implement it.]
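One way the roulette wheel can be implemented is sketched below (an illustration, assuming strictly positive fitness values; shift or penalize fitnesses first if they can be nonpositive):

```python
import random

def roulette_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness.
    Assumes all fitnesses are strictly positive."""
    total = sum(fitnesses)
    r = random.uniform(0, total)          # spin the wheel
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit                 # each individual owns a slice
        if r <= cumulative:               # the spin landed in this slice
            return individual
    return population[-1]                 # guard against round-off
```

An individual with fitness 9 is selected roughly nine times as often as one with fitness 1, which is the weighted-probability mechanism referred to above.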
From the initial pool at any stage, a new set of solutions is generated through various operators, of which the most commonly used are the crossover and mutation operators. These randomly perturb the solutions to get new ones (which explore new parts of the search space). These can be defined in many ways, and can be used to control the progress of the algorithm. Termination criteria are usually stated in terms of the number of iterations, while keeping track of the best available solution at any time.
Crossover tries to combine two solutions to get a third, different one. The points in the sequence of bits where the crossover can take place can be controlled. Mutation takes one solution and generates a different one through random perturbation of that solution. Parameters of a typical GA implementation would include the probabilities associated with the crossover and mutation operations.
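For binary strings, single-point crossover and bitwise mutation can be sketched as follows (illustrative implementations; the crossover point here is chosen uniformly at random, though as noted above it can be controlled differently):

```python
import random

def single_point_crossover(parent1, parent2):
    """Cut both parents at a random point and swap the tails,
    producing two offspring."""
    point = random.randint(1, len(parent1) - 1)
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(bits, pm):
    """Flip each bit independently with probability pm."""
    return [1 - b if random.random() < pm else b for b in bits]
```

The probabilities with which these operators are applied (and pm itself) are exactly the GA parameters mentioned above.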
For combinatorial optimization problems, the coding is most often in terms of 0-1 variables, which map naturally to chromosomes. For the n-city TSP, for example, we could have a string of n^2 0-1 variables indicating the position of the i-th city in the tour, or a string of n variables (each taking on integer values from 1, …, n) representing the order in which the cities are visited. Notice that for the TSP, in the second representation, a set of n circular permutations are actually the same, as far as a tour is concerned. A fitness function is evaluated for each chromosome.
The original GA is attributed to Holland and associates in the 1970s. There, the initial population of M chromosomes was selected at random, and crossover was effected by choosing one parent with a high fitness value and another parent chosen at random, to get a new pair of offspring.
Convergence analysis of the GA is not easy and relies on defining
schema (which represent groups of solutions) and the average fitness of the
population at every step and the specific reproduction plan that is
chosen. For details, you would have to
refer to more specialized references, beyond the scope of this course.
Try the function f(x) = x^3 – 60x^2 + 900x + 100 with x restricted to integer values between 0 and 31 to find the maximum, using a genetic algorithm. You can use the ordinary binary representation of integers as an illustrative coding for this one dimensional problem [e.g. the string 10011 would represent x = 19, and the string 00101 would represent x = 5]. Choose 5 strings at random and implement crossover and mutation and see how the algorithm progresses.
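A minimal GA sketch for this exercise, combining roulette-wheel selection, single-point crossover, and bitwise mutation (the population size, probabilities, generation count and seed are illustrative choices, not prescribed ones):

```python
import random

def f(x):
    return x**3 - 60 * x**2 + 900 * x + 100

def decode(bits):                 # 5-bit ordinary binary, x in 0..31
    return int("".join(map(str, bits)), 2)

def run_ga(pop_size=5, generations=50, pc=0.8, pm=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(5)] for _ in range(pop_size)]
    best = max(pop, key=lambda b: f(decode(b)))
    for _ in range(generations):
        fits = [f(decode(b)) for b in pop]
        offset = min(fits)        # shift so selection weights are positive
        weights = [ft - offset + 1 for ft in fits]
        new_pop = []
        while len(new_pop) < pop_size:
            # roulette-wheel selection of two parents
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            if rng.random() < pc:  # single-point crossover
                cut = rng.randint(1, 4)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # bitwise mutation of both offspring
            new_pop += [[1 - b if rng.random() < pm else b for b in c]
                        for c in (p1, p2)]
        pop = new_pop[:pop_size]
        best = max(pop + [best], key=lambda b: f(decode(b)))
    return decode(best), f(decode(best))
```

Since f'(x) = 3x^2 – 120x + 900 vanishes at x = 10 and x = 30, the maximum over the integers 0..31 is f(10) = 4100, against which a run can be checked.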
Try the Himmelblau function f(x1, x2) = (x1^2 + x2 – 11)^2 + (x1 + x2^2 – 7)^2 with the box constraints 0 <= x1, x2 <= 6, and also the constrained version with (x1 – 5)^2 + x2^2 – 26 >= 0, and code it yourself, to understand how the method works.
As a further exercise, also try the simulated annealing algorithm on
the same function and explore the following issues:
What are the initial points for which the algorithm terminates at the
global minimum?
What is the sensitivity of the algorithm to the cooling schedule and the initial temperature parameter?
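As a starting point for these experiments, one possible simulated-annealing sketch for the box-constrained Himmelblau function (all parameter values here, including the cooling schedule and step size, are illustrative choices to be varied in the exercise):

```python
import math
import random

def himmelblau(x1, x2):
    return (x1**2 + x2 - 11)**2 + (x1 + x2**2 - 7)**2

def anneal(start, t0=100.0, cooling=0.95, steps_per_t=50, t_min=1e-3, seed=0):
    rng = random.Random(seed)
    x = list(start)
    fx = himmelblau(*x)
    best, fbest = list(x), fx
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            # random perturbation, clipped to the box 0 <= xi <= 6
            y = [min(6.0, max(0.0, xi + rng.gauss(0, 0.5))) for xi in x]
            fy = himmelblau(*y)
            # accept improvements always; accept worse moves with
            # probability exp(-(fy - fx)/t), which shrinks as t cools
            if fy < fx or rng.random() < math.exp((fx - fy) / t):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = list(x), fx
        t *= cooling               # geometric cooling schedule
    return best, fbest
```

Running `anneal` from different starting points, and with different values of t0 and cooling, is one way to explore the two questions above.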
Come up with a scheme for GA for the slot timetabling problem and
implement it.
It is difficult to come up with general results about the performance
of GA as there are a number of different ways of implementing GA, starting with
options in encoding the solution space and variables and the different ways of
generating starting populations, the selection mechanisms and the methods used
for ‘evolution’ of new generations of solutions. But some broad principles are as follows.
We restrict the analysis to simple, single-point crossover and mutation
as the two mechanisms of evolution. For
convenience, we use the binary encoding of solutions to illustrate the
arguments. The framework used is that
of a schema, which is a subset of
chromosomes, where some of the values in the chromosome take on a fixed
value. For example, the schema * * 1 *
0 * * represents a collection of chromosomes, which includes the chromosomes 1
0 1 0 0 0 1 and 1 1 1 1 0 1 0 and several others.
The order of a schema is the number of fixed entries and the length of
a schema is the distance between the first and last fixed positions in the
schema (the order of the schema above is 2 and the length is also 2). A third quantity of interest is the fitness
ratio, which is the average fitness of a schema divided by the average fitness
of the population (overall). The
following three results can then be verified.
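The schema quantities just defined can be checked with a small sketch (helper names are illustrative; '*' marks a free position):

```python
def order(schema):
    """Order of a schema: the number of fixed (non-*) positions."""
    return sum(c != '*' for c in schema)

def length(schema):
    """Length of a schema: distance between first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0]

def matches(schema, chromosome):
    """Does a chromosome belong to the schema?"""
    return all(s == '*' or s == c for s, c in zip(schema, chromosome))

# For the schema **1*0** above: order 2, length 2,
# and 1010001 matches while 1000001 does not.
```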
[Note that each generation can be thought of as a time step in the
algorithm, so some references use time instead of generation. In what follows, Pc is the
probability that a given chromosome is selected for crossover and Pm
is the probability that it is selected for mutation, while defining the next
generation.]
Result 1: In the selection plan, if a parent is selected in proportion to its fitness, then the expected number of instances of schema H in generation k+1 is f(H,k) N(H,k), where f is the fitness ratio of H in generation k and N is the number of instances of H in generation k.
Result 2: If P(H,k) is the probability that a schema H is represented in generation k, and if crossover is applied in generation k with probability Pc, then P(H,k+1) >= 1 – Pc l(H) [1 – P(H,k)] / (n – 1), where n is the length of the chromosome and l(H) is the length of the schema.
Result 3: If P(H,k) is the probability that a schema H is represented in generation k, and if mutation is applied in generation k with probability Pm, then P(H,k+1) >= 1 – Pm m(H), where m(H) is the order of the schema.
Putting these together, we get a lower bound on the presence of schema H in generation k+1 if crossover and mutation are both applied. This gives the Schema Theorem for the expected number of representatives of schema H in generation k+1 as:
E(H,k+1) >= f(H,k) N(H,k) [1 – Pc l(H) (1 – P(H,k)) / (n – 1) – Pm m(H)]
This basically says that short, low-order schemata with above-average fitness receive increasing representation over generations, and combine with each other to form better and better solutions. By itself this does not guarantee convergence to an optimum, but the algorithm is expected to work well in practice, and does so empirically in many applications.
Dependence on encoding: As can be seen, the success depends on the way the solution space is encoded. As an illustration, suppose the good schemata are such that their length is long (the number of entries between the first and last fixed elements of the schema); then crossover is less likely to favour the continuation of these schemata in subsequent generations. Therefore even the ordering of variables in a multi-variable problem matters significantly, as it affects the encoding scheme and therefore the schemata which capture "desirable" solutions. It is this that makes GA somewhat of an experimental science (but one which is quite successful sometimes)!
Introduction to Lagrangean relaxation
Taking the weighted set covering problem as an example, we illustrate the principle of Lagrangean relaxation. Consider the following example, from Beasley (in Reeves’ book).
Min 2x1 + 3x2 + 4x3 + 5x4
s.t. x1 + x3 >= 1
x1 + x4 >= 1
x2 + x3 + x4 >= 1
xi ∈ {0,1} for all i.
Make sure you are able to relate this to the notation of the weighted set covering problem above.
The related problem
Min 2x1 + 3x2 + 4x3 + 5x4 + λ1 (1 – x1 – x3) + λ2 (1 – x1 – x4) + λ3 (1 – x2 – x3 – x4)
s.t. xi ∈ {0,1} for all i,
if solved, gives a lower bound on the original optimal value for any nonnegative values of the λ multipliers. Verify this using arguments similar to weak duality in LP. Also try this numerically for a few nonnegative values of the multipliers and compare with the true optimal value of the original problem (which you can easily get, at least by enumeration).
This problem for specific values of the multipliers is easy to solve (compared to the original problem). Try it. A solution of such a problem which also satisfies the original feasibility and complementary slackness conditions would solve the original problem.
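To see why the relaxed problem is easy: its objective separates over the variables, so each xi is set to 1 exactly when its collected coefficient is negative. A sketch for the example above (the function name is illustrative):

```python
# Lagrangean relaxation of the set covering example, for fixed
# multipliers lam = (l1, l2, l3) >= 0.

def lagrangean_bound(lam):
    l1, l2, l3 = lam
    coeffs = [2 - l1 - l2,       # x1 appears in constraints 1 and 2
              3 - l3,            # x2 appears in constraint 3
              4 - l1 - l3,       # x3 appears in constraints 1 and 3
              5 - l2 - l3]       # x4 appears in constraints 2 and 3
    x = [1 if c < 0 else 0 for c in coeffs]   # minimize term by term
    bound = l1 + l2 + l3 + sum(c * xi for c, xi in zip(coeffs, x))
    return bound, x
```

For instance, λ = (1, 1, 3) gives the bound 5, which you can check by enumeration equals the true optimal value (attained at x1 = x2 = 1), while λ = (0, 0, 0) gives only the trivial bound 0.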
Lagrangean relaxation attempts to construct a sequence of such problems and iteratively come up with better bounds which can either be used in a branch and bound procedure or solve the problem optimally, directly. The procedure relies on a clever updating of the values of the multipliers to get the best value of the relaxed problem. Subgradient optimization is one technique used in this context.
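One common multiplier update, sketched here for the same example (an illustration of the subgradient idea, with an assumed fixed step size; practical schemes use diminishing or adaptively scaled steps):

```python
# One subgradient step for the set covering example. The subgradient
# component for constraint j is its slack (1 minus the sum of the xi
# in it) at the solution of the relaxed problem.

def subgradient_step(lam, step):
    l1, l2, l3 = lam
    coeffs = [2 - l1 - l2, 3 - l3, 4 - l1 - l3, 5 - l2 - l3]
    x = [1 if c < 0 else 0 for c in coeffs]   # solve the relaxed problem
    g = [1 - x[0] - x[2],                     # slack of x1 + x3 >= 1
         1 - x[0] - x[3],                     # slack of x1 + x4 >= 1
         1 - x[1] - x[2] - x[3]]              # slack of x2 + x3 + x4 >= 1
    # move the multipliers along the subgradient, keeping them nonnegative
    return tuple(max(0.0, l + step * gj) for l, gj in zip(lam, g))
```

Iterating lam = subgradient_step(lam, step) raises the multipliers on violated constraints and lowers them on over-satisfied ones, driving the relaxed problem toward its best lower bound.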