
This section is based on [22], [24] and [79].

Genetic algorithm (GA) is a heuristic search method that mimics natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms can generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. However, these operations are not always used together in every version of genetic algorithms.

In a genetic algorithm, a population of strings (called chromosomes, genotypes or genomes), which encode candidate solutions representing them in the state space of an optimization problem, evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s 3, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated (based on heuristics), multiple individuals are stochastically selected from the current population (based on their fitness), and modified (recombined and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to reaching the maximum number of generations, a satisfactory solution may or may not have been reached.

Genetic algorithms find application in bioinformatics, phylogenetics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.

A typical genetic algorithm requires:

• a genetic representation of the solution domain (the state space of the problem),

• a fitness function to evaluate the solution domain (a heuristic evaluation of the genomes).

A standard representation of the solution is as an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming and graph-form representations are explored in evolutionary programming; a mix of both linear chromosomes and trees is explored in gene expression programming.

3 Hence this is the most straightforward representation on a common computer.


The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem one wants to maximize the total value of objects that can be put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the size of objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of values of all objects in the knapsack if the representation is valid, or 0 otherwise.
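A minimal sketch of such a knapsack fitness function in Python, assuming hypothetical item values, weights and capacity:

```python
# Minimal sketch of a knapsack fitness function for a bit-string genome.
# The item values, weights and capacity below are hypothetical examples.

values = [10, 40, 30, 50]      # value of each object
weights = [5, 4, 6, 3]         # size of each object
capacity = 10                  # fixed capacity of the knapsack

def knapsack_fitness(genome):
    """Sum of values of selected objects, or 0 if the capacity is exceeded."""
    total_weight = sum(w for w, bit in zip(weights, genome) if bit)
    if total_weight > capacity:
        return 0               # invalid representation
    return sum(v for v, bit in zip(values, genome) if bit)

# Example: objects 2 and 4 are in the knapsack.
print(knapsack_fitness([0, 1, 0, 1]))   # -> 90
```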

In some problems, it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.

Canonical operations of the Genetic Algorithm

The following four methods are the fingerprints that characterize genetic algorithms. They can be found in every variant and alteration.

1. Initialization of the population

2. Fitness (weight) calculation for every entity

3. Selection and recombination

4. Mutation

After the fourth step, steps 2, 3 and 4 are iterated until a previously given time constraint is reached, or until the optimal solution is found. (The N-queens problem uses the second criterion because the fitness of the optimal solution is known. This value is not known in every case; however, this does not affect or change the steps of the iteration.)
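A schematic sketch of this loop, assuming problem-dependent init_population, fitness, select_parents, recombine and mutate functions and hypothetical termination parameters:

```python
# Schematic sketch of the canonical GA loop described above.
# init_population, fitness, select_parents, recombine and mutate are
# assumed, problem-dependent functions; max_generations and optimum
# are hypothetical termination parameters.

def genetic_algorithm(init_population, fitness, select_parents,
                      recombine, mutate, max_generations, optimum=None):
    population = init_population()                       # step 1
    for generation in range(max_generations):
        weights = [fitness(g) for g in population]       # step 2
        if optimum is not None and max(weights) >= optimum:
            break                                        # optimal solution found
        parents = select_parents(population, weights)    # step 3: selection
        children = recombine(parents, len(population))   #         recombination
        population = [mutate(c) for c in children]       # step 4: mutation
    return max(population, key=fitness)
```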

During each successive generation, a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as the former process may be very time-consuming.

The next step is to generate a second generation population of solutions from those selected through genetic operators: crossover (also called recombination), and/or mutation.

For each new solution to be produced, a pair of "parent" solutions is selected for breeding from the pool selected previously. By producing a "child" solution using the above methods of crossover and mutation, a new solution is created which typically shares many of the characteristics of its "parents". New parents are selected for each new child, and the process continues until a new population of solutions of appropriate size is generated.

Although reproduction methods that are based on the use of two parents are more "biology inspired", some research [80], [81] suggests that more than two "parents" generate higher quality chromosomes.

The detailed descriptions of the operators are the following:

Initialization

Once the genetic representation and the fitness function are defined, a GA proceeds to initialize a population of solutions (usually randomly) and then to improve it through repetitive application of the mutation, crossover, inversion and selection operators. At this step we create random strings (with genomes according to the problem representation). Each genome encodes a possible solution candidate, representing a point in the state space. Our aim is to create the most diverse population possible and cover the entire state space. The optimization of this operation is always problem dependent. The convergence of the algorithm does not depend on the initial population if the mutation rate and the number of entities are relatively high; in this case the algorithm converges to the optimal solution with probability one.

Initially many individual solutions are (usually) randomly generated to form an initial population. The population size depends on the nature of the problem, but typically contains several hundreds or thousands of possible solutions. Traditionally, the population is generated randomly, allowing the entire range of possible solutions (the search space).

Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.
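A minimal sketch of this random initialization, assuming a fixed-length binary representation and hypothetical population-size and genome-length parameters:

```python
import random

# Hypothetical parameters; in practice they depend on the problem.
GENOME_LENGTH = 32
POPULATION_SIZE = 200

def init_population(size=POPULATION_SIZE, length=GENOME_LENGTH):
    """Generate random bit-string genomes covering the search space."""
    return [[random.randint(0, 1) for _ in range(length)] for _ in range(size)]
```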

Fitness Calculation

We will calculate a fitness (weight) value for every genome. This value represents a distance between our point and an optimal point in the state space. The metric of the distance is based on a problem dependent heuristic.

The selection of this heuristic is a key part of the algorithm, and an extremely difficult task; however, we do not want to investigate this problem in this article. There are published and well-known fitness functions and heuristic metrics for a large number of problems, and we base our choice on these well-known, published suggestions.

Selection of Parents

In the case of the 'general' GA, selection is calculated globally. During this step we select the genomes we want to conserve and use in the next iteration, and overwrite the unnecessary elements.

For selection the following methods are used:


Deterministic Sampling

The best genomes, i.e. the genomes with the highest fitness values, are selected: the genomes are ordered, and after this the first X % is selected while the other genomes are cleared. X is previously declared as a parameter.
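A minimal sketch of this deterministic (truncation) selection, assuming a fitness function and a hypothetical fraction parameter x:

```python
def deterministic_sampling(population, fitness, x=0.5):
    """Keep the best x fraction of the population, ordered by fitness."""
    ordered = sorted(population, key=fitness, reverse=True)
    keep = max(1, int(len(ordered) * x))
    return ordered[:keep]
```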

The commonly used sampling methods are:

-Stochastic Universal Sampling also known as roulette wheel sampling.

During the generation of the new population, every genome has a probability of being chosen proportional to its fitness.
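A minimal sketch of this fitness-proportional (roulette wheel) selection, assuming non-negative fitness weights:

```python
import random

def roulette_wheel_sampling(population, weights, n):
    """Select n genomes with probability proportional to their fitness."""
    # random.choices draws with replacement according to the given weights.
    return random.choices(population, weights=weights, k=n)
```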

-Stochastic Tournament Selection

A composite method built from the previous two versions. First we select n genomes with Stochastic Universal Sampling, and after this from the n elements we choose k genomes (usually k=1 or k=2), and these genomes are conserved as parents for the next population. We repeat the sampling until the new population reaches the size of the previous population.
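A minimal sketch of such a tournament scheme, assuming a fitness function and hypothetical parameters n and k; keeping the k best genomes of each tournament is one possible choice:

```python
import random

def stochastic_tournament_selection(population, fitness, n=5, k=2):
    """Repeatedly draw n genomes fitness-proportionally, keep the k best
    of each tournament, until the new pool reaches the population size."""
    weights = [fitness(g) for g in population]
    new_population = []
    while len(new_population) < len(population):
        tournament = random.choices(population, weights=weights, k=n)
        tournament.sort(key=fitness, reverse=True)
        new_population.extend(tournament[:k])
    return new_population[:len(population)]
```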

-Remainder Stochastic Sampling

Also a composite method created from the first two selection mechanisms. We normalize the weights for every entity so that the sum of the weights equals the number of genomes in the population; the selection is based on these normalized weights. First we select every genome deterministically as many times as the integer part of its normalized fitness. After this we create another weight from the remainder (fractional) part of the fitness values and perform a stochastic resampling based on them. E.g., if after the normalization the fitness value is 2.65, the genome will be used twice in the gene pool of the next iteration, and it will have a 0.65 likelihood (not probability) for the stochastic selection.
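A minimal sketch of remainder stochastic sampling, assuming non-negative fitness weights:

```python
import math
import random

def remainder_stochastic_sampling(population, weights):
    """Deterministic copies from the integer part of the normalized fitness,
    then roulette-wheel sampling on the fractional remainders."""
    n = len(population)
    total = sum(weights)
    normalized = [w * n / total for w in weights]

    new_population = []
    remainders = []
    for genome, w in zip(population, normalized):
        copies = math.floor(w)
        new_population.extend([genome] * copies)  # integer part: deterministic
        remainders.append(w - copies)             # fractional part: stochastic

    missing = n - len(new_population)
    if missing > 0:
        new_population.extend(
            random.choices(population, weights=remainders, k=missing))
    return new_population
```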

Recombination

There are two different versions of this operator: single-point and multi-point recombination.

In the case of single-point recombination we choose one point in the genome; up to this point all the genes are taken from one parent, and after this point all the genes are copied from the other parent. Single-point recombination can only be used in the case of two parents.

Multi-point recombination is a repeated version of single-point recombination for more parents.

We select several points in the genome, and the genes between two selected points are determined by one parent. In an extreme case it is also possible that every gene is inherited from a different parent.
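A minimal sketch of single-point and multi-point recombination, assuming fixed-length list genomes:

```python
import random

def single_point_crossover(parent_a, parent_b):
    """Genes before the randomly chosen point come from parent_a,
    the rest from parent_b."""
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def multi_point_crossover(parents):
    """Segments between randomly chosen points are taken from the
    parents in turn; works for two or more parents."""
    length = len(parents[0])
    points = sorted(random.sample(range(1, length), len(parents) - 1))
    cuts = [0] + points + [length]
    child = []
    for i, parent in enumerate(parents):
        child.extend(parent[cuts[i]:cuts[i + 1]])
    return child
```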

Mutation

Mutation performs a random jump in the state space of the problem, in the neighborhood of a candidate solution.


There exist many mutation variants, which usually affect one or more loci (genes or components) of the individual. The mutation randomly modifies a single solution whereas the recombination acts on two or more parent chromosomes.

There are also two different, simple versions of this step: we either have an upper limit for the number of changing genes in a genome, or we have no upper limit and the number of changing genes is arbitrary. For every gene there is a probability that the selected gene will change its value. There are a large number of heuristic and non-heuristic variants for improving the mutation; however, all these steps are usually based on the problem and its representation. The mutation can be value dependent or value independent: in the value dependent case the new gene is an altered version of the previous gene (larger/smaller by a predefined value), while in the value independent case the new gene is a randomly selected value.
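A minimal sketch of both variants, assuming bit-string genomes for the value-independent case and integer genomes for the value-dependent case; the per-gene mutation rate and the delta are hypothetical parameters:

```python
import random

def mutate_value_independent(genome, rate=0.01):
    """Value-independent mutation for bit-string genomes: each gene is
    replaced by a random value with the given per-gene probability."""
    return [random.randint(0, 1) if random.random() < rate else gene
            for gene in genome]

def mutate_value_dependent(genome, rate=0.01, delta=1):
    """Value-dependent mutation for integer genomes: each gene is shifted
    up or down by a predefined delta with the given probability."""
    return [gene + random.choice([-delta, delta])
            if random.random() < rate else gene
            for gene in genome]
```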

Termination Criterion

These processes ultimately result in the next generation population of chromosomes that is different from the initial generation. Generally, the average fitness of the population will have increased by this procedure, since only the best organisms from the first generation are selected for breeding, along with a small proportion of less fit solutions, for reasons already mentioned above.

Although Crossover and Mutation are known as the main genetic operators, it is possible to use other operators such as regrouping, colonization-extinction, or migration in genetic algorithms [82].

This generational process is repeated until a termination condition has been reached.

Common terminating conditions are:

• A solution is found that satisfies minimum criteria

• Fixed number of generations reached

• Allocated budget (computation time/money) reached

• The highest ranking solution’s fitness is reaching or has reached a plateau such that successive iterations no longer produce better results

• Manual inspection

• Combinations of the above