COT 5405
Advanced Algorithms
Chris Lacher
Notes GA: Introduction to Genetic Algorithms
begin with random population
repeat
selection
crossover
mutation
==> new population
until fitness of best individual is optimized
Population
- Alphabet aka alleles
DNA: {A,T,C,G} [bases A (adenine), T (thymine), C (cytosine), G (guanine)]
Protein: {A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y} [the 20 standard amino acid one-letter codes]
Genetic Code
Generic: {0,1}
- Chromosome = fixed-length sequence from alphabet
Protein example: IHCCAASASDMIKPQFHFOSEBBDCBDBABIBIABDMKPKICBEBHVGGGS
Generic example: 01101100010101000101
- Population = set of n chromosomes of the same length l
Generic example: {00000010,11101110,00100000,00110100}
- length l = 50 - 1000 (typical)
- size n = 50 - 1000 (typical)
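As a concrete illustration of this representation, here is a minimal C++ sketch (names such as random_population are illustrative, not from the notes) that builds a random population of n binary chromosomes of length l:

#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Build a random population of n binary chromosomes, each of length l.
std::vector<std::string> random_population(std::size_t n, std::size_t l,
                                            std::mt19937& gen)
{
  std::bernoulli_distribution bit(0.5);          // each locus is 0 or 1, equally likely
  std::vector<std::string> population(n, std::string(l, '0'));
  for (std::string& chromosome : population)
    for (char& locus : chromosome)
      locus = bit(gen) ? '1' : '0';
  return population;
}

With the generic alphabet {0,1} a chromosome is just a bit string; a different alphabet only changes the set of characters drawn from.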
Fitness Function
- Function F defined for every chromosome (of length l)
- Interpretation: F(c1) < F(c2) implies c1 is less fit than c2
- Example: F(c) = number of 1s in c
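For the bit-counting example above, a fitness function is a one-line C++ sketch (the function name is illustrative):

#include <algorithm>
#include <string>

// F(c) = number of 1s in c; a higher count means a fitter chromosome.
int fitness(const std::string& c)
{
  return static_cast<int>(std::count(c.begin(), c.end(), '1'));
}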
Fitness-Proportionate Selection
- Called viability selection in biology
- Implementation: roulette wheel sampling
- Each individual in the population is assigned a sector of the wheel proportional to its fitness
- Spin the wheel to select an individual (a sampling sketch follows)
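One common way to implement roulette wheel sampling is to treat the fitness values as sector sizes and "spin" by drawing a uniform number in [0, total fitness). The sketch below (illustrative names; integer fitness values assumed) returns the index of the selected individual:

#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Roulette wheel sampling: each individual owns a sector whose size is its
// fitness, so a uniform draw in [0, total fitness) selects an individual
// with probability proportional to its fitness.
std::size_t roulette_select(const std::vector<int>& fitness_values,
                            std::mt19937& gen)
{
  double total = std::accumulate(fitness_values.begin(), fitness_values.end(), 0.0);
  std::uniform_real_distribution<double> spin(0.0, total > 0.0 ? total : 1.0);
  double point = spin(gen);
  double running = 0.0;
  for (std::size_t i = 0; i < fitness_values.size(); ++i)
  {
    running += fitness_values[i];
    if (point < running)
      return i;
  }
  return fitness_values.size() - 1;   // guards against floating-point round-off
}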
Crossover
- With probability p_c, cross a selected pair of parents: pick a random locus and exchange the parents' tails beyond it (one-point crossover; see the sketch below and the pseudocode in the Algorithm section)
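A minimal C++ sketch of one-point crossover on bit-string chromosomes (the function name and random-engine parameter are illustrative); the tail swap is the same operation used in the worked example later in these notes:

#include <cstddef>
#include <random>
#include <string>
#include <utility>

// One-point crossover: pick a random locus and swap the two parents' tails
// from that locus onward, producing two children in place.
void one_point_crossover(std::string& c1, std::string& c2, std::mt19937& gen)
{
  std::uniform_int_distribution<std::size_t> pick(0, c1.size() - 1);
  for (std::size_t i = pick(gen); i < c1.size(); ++i)
    std::swap(c1[i], c2[i]);
}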
Mutation
- With probability p_m, mutate at each locus
- For the binary alphabet, mutation is a bit flip
- p_m = 0.001 (typical)
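A sketch of per-locus bit-flip mutation on the bit-string representation (illustrative names; p_m as above):

#include <random>
#include <string>

// With probability p_m, independently flip each locus of the chromosome.
void mutate(std::string& c, double p_m, std::mt19937& gen)
{
  std::bernoulli_distribution flip(p_m);
  for (char& locus : c)
    if (flip(gen))
      locus = (locus == '0') ? '1' : '0';
}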
Fitness Optimization Criteria
- Stop when "close" to perfect
- Weak link: cannot know how close or far away "perfect" is
- Practice: take many runs, vary parameters, and keep a record of the best (c, F(c)) for each run
Algorithm
main
{
input
{
L, // chromosome length
N, // population size // make this an even number for convenience
P, // randomly generated set of N chromosomes of length L
p_c, // crossover probability
p_m, // mutation probability
F, // fitness function
fitness_goal // stopping threshold for F, used in the while loop below
}
max_fitness = maximum value of F(c) for c in P;
c_max = chromosome in P where maximum is achieved;
while (max_fitness < fitness_goal)
{
P1 = generate_new_population(P);
max_fitness = maximum value of F(c) for c in P1;
c_max = chromosome in P1 where maximum is achieved;
P = P1;
}
output
{
P,
c_max,
F(c_max)
}
}
generate_new_population(P)
{
P1 = empty population;
for (i = 0; i < N/2; ++i)
{
choose two elements c1, c2 of P using roulette wheel sampling;
with probability p_c, crossover(c1,c2);
mutate(c1); mutate(c2);
insert c1 and c2 in P1;
}
return P1;
}
crossover (&c1, &c2) // passed by reference
{
loc = random [0,L);
for (i = loc; i < L; ++i)
swap(c1[i], c2[i]); // exchange the parents' tails after the crossover point
}
mutate (&c) // passed by reference
{
for (i = 0; i < L; ++i)
with probability p_m flip c[i];
}
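Putting the pieces together, here is one self-contained, runnable C++ rendering of the pseudocode above, a sketch rather than a definitive implementation: it assumes the generic alphabet {0,1}, the bit-counting fitness, and the parameter values from the example that follows; helper names and structure are illustrative.

#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <string>
#include <utility>
#include <vector>

namespace {

std::mt19937 gen(std::random_device{}());

// F(c) = number of 1s in c
int fitness(const std::string& c)
{
  return static_cast<int>(std::count(c.begin(), c.end(), '1'));
}

// A random chromosome of length L over {0,1}
std::string random_chromosome(std::size_t L)
{
  std::bernoulli_distribution bit(0.5);
  std::string c(L, '0');
  for (char& locus : c) locus = bit(gen) ? '1' : '0';
  return c;
}

// Fitness-proportionate (roulette wheel) selection; returns an index into P
std::size_t roulette(const std::vector<int>& fit)
{
  double total = std::accumulate(fit.begin(), fit.end(), 0.0);
  std::uniform_real_distribution<double> spin(0.0, total > 0.0 ? total : 1.0);
  double point = spin(gen), running = 0.0;
  for (std::size_t i = 0; i < fit.size(); ++i)
    if ((running += fit[i]) > point) return i;
  return fit.size() - 1;
}

// One-point crossover: swap the parents' tails after a random locus
void crossover(std::string& c1, std::string& c2)
{
  std::uniform_int_distribution<std::size_t> pick(0, c1.size() - 1);
  for (std::size_t i = pick(gen); i < c1.size(); ++i) std::swap(c1[i], c2[i]);
}

// With probability p_m, independently flip each locus
void mutate(std::string& c, double p_m)
{
  std::bernoulli_distribution flip(p_m);
  for (char& locus : c)
    if (flip(gen)) locus = (locus == '0') ? '1' : '0';
}

}  // namespace

int main()
{
  const std::size_t L = 8, N = 4;                // parameters from the example below
  const double p_c = 0.6, p_m = 0.1;
  const int fitness_goal = static_cast<int>(L);  // stop at an all-1s chromosome

  std::vector<std::string> P(N);
  for (auto& c : P) c = random_chromosome(L);

  std::bernoulli_distribution do_crossover(p_c);
  int best = 0;
  std::string c_max;

  for (;;)
  {
    std::vector<int> fit(N);
    for (std::size_t i = 0; i < N; ++i) fit[i] = fitness(P[i]);
    auto it = std::max_element(fit.begin(), fit.end());
    best = *it;
    c_max = P[static_cast<std::size_t>(it - fit.begin())];
    if (best >= fitness_goal) break;             // fitness of best individual optimized

    std::vector<std::string> P1;                 // ==> new population
    while (P1.size() < N)
    {
      std::string c1 = P[roulette(fit)], c2 = P[roulette(fit)];
      if (do_crossover(gen)) crossover(c1, c2);
      mutate(c1, p_m);  mutate(c2, p_m);
      P1.push_back(c1);  P1.push_back(c2);
    }
    P = P1;
  }

  std::cout << "c_max = " << c_max << "  F(c_max) = " << best << '\n';
  return 0;
}

Because N is even, each pass of the inner loop adds exactly two children, matching the N/2 iterations in the pseudocode.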
Example
L = 8
N = 4
p_c = 0.6
p_m = 0.1
P = { 00000110 , 11101110 , 00100000 , 00110100 }
F(c) = number of 1s in c
(ROULETTE = pair of parents chosen by roulette wheel sampling; XC = crossover applied to the pair? y plus the crossover point, or n)

P[0]      F  ROULETTE  XC  P[1]      F  ROULETTE  XC  P[2]      F  ROULETTE  XC  P[3]      F
--------  -  --------  --  --------  -  --------  --  --------  -  --------  --  --------  -
00000110  2  11101110  y3  11110100  5  11101110  y5  11101100  5  11110110  y6  11110110  6
11101110  6  00110100      00101110  4  11110100      11110110  6  11101110      11101110  6
00100000  1  11101110  n   11101110  6  11101110  y2  11101110  6  00101110  y4  00100110  3
00110100  3  00000110      00000110  2  00101110      00101110  4  11110110      11111110  7

ROULETTE  XC  P[5]      F  ROULETTE  XC  P[6]      F  ROULETTE  XC  P[7]      F
--------  --  --------  -  --------  --  --------  -  --------  --  --------  -
11111110  y2  11110110  6  11111110  n   11111110  7  11111110  y3  11111110  7
11110110      11111110  7  11111110      11111110  7  11111110      11111110  7
11111110  n   11111110  7  11101110  y7  11111110  7  11111110  y7  11111110  7
11101110      11101110  6  11111110      11101110  6  11101110      11101110  6
- Note that P[7] = P[6]
- From P[7], two "bad" genes are propagated with no possibility of change by crossover: sites 4 and 8
- The bad gene at site 4 can be removed only if the individual carrying it fails to propagate, which is likely to happen eventually because of roulette wheel sampling. (Possible, but much less likely, is that the other genotype dies out instead.)
- Add mutation: eventually a mutation will occur at site 8, producing an optimally fit individual in the population (assuming the unlikely death of 11111110 has not occurred).
Applying GA
- Problem Coding
- Find an alphabet and length that represent problem instances
- Reversible mapping {problem instances} <--> {chromosomes}
- Sometimes this is straightforward, sometimes it is a leap, and sometimes the
straightforward approach is not the best
- Fitness Function
- Evaluate representative chromosome
- Function F:{chromosomes} --> {numbers}
- F(c) high value ==> instance represented by c is good
Theory
- Schemata: chromosome patterns with "wild cards", e.g., 10*0*1110*
- Order of a schema H: o(H) = number of fixed positions
o(10*0*1110*) = 7
- Defining length of a schema H: d(H) = distance between the first and last fixed positions
d(10*0*1110*) = 8
- m(H,t) = number of representatives of H in population at time t
- f(H) = average fitness of chromosomes in H
- f(P) = average fitness over entire population P
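To make the schema bookkeeping concrete, here is a small C++ sketch (illustrative names; '*' as the wild card) computing o(H), d(H), and m(H,t) for a population; for H = 10*0*1110* it returns o = 7 and d = 8, matching the definitions above:

#include <cstddef>
#include <string>
#include <vector>

// o(H): number of fixed (non-'*') positions in the schema.
std::size_t order(const std::string& H)
{
  std::size_t fixed = 0;
  for (char s : H)
    if (s != '*') ++fixed;
  return fixed;
}

// d(H): distance between the first and last fixed positions.
std::size_t defining_length(const std::string& H)
{
  std::size_t first = H.find_first_not_of('*');
  std::size_t last = H.find_last_not_of('*');
  return (first == std::string::npos) ? 0 : last - first;
}

// m(H,t): number of chromosomes in the population that agree with H
// at every fixed position.
std::size_t representatives(const std::string& H,
                            const std::vector<std::string>& population)
{
  std::size_t m = 0;
  for (const std::string& c : population)
  {
    bool match = (c.size() == H.size());
    for (std::size_t i = 0; match && i < H.size(); ++i)
      if (H[i] != '*' && H[i] != c[i]) match = false;
    if (match) ++m;
  }
  return m;
}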
Schema Theorem (Holland, 1975)
m(H, t+1) >= m(H, t) * [f(H) / f(P)] * [1 - p_c * d(H)/(l - 1) - o(H) * p_m]
Standard Interpretation: the number of representatives of a schema grows (or shrinks) in proportion to the schema's average fitness relative to the population average. This interpretation has come into question in recent years; the theory of GAs in particular, and of evolutionary computation in general, is an active field of research.
Variations
- "Fitness" may be inversely related to F(c) (low is good) - rescale
for Roulette
- Crossover often needs coding-specific definition or exceptions
- Multipoint crossover (a two-point sketch appears after this list)
- Variable mutation rate (inversely proportional to Hamming distance between parents)
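As an illustration of the multipoint crossover variation above, a two-point sketch (illustrative name; bit-string chromosomes assumed) exchanges the segment between two random loci:

#include <cstddef>
#include <random>
#include <string>
#include <utility>

// Two-point crossover: choose two loci and exchange the segment between them.
void two_point_crossover(std::string& c1, std::string& c2, std::mt19937& gen)
{
  std::uniform_int_distribution<std::size_t> pick(0, c1.size() - 1);
  std::size_t a = pick(gen), b = pick(gen);
  if (a > b) std::swap(a, b);
  for (std::size_t i = a; i <= b; ++i)      // swap the inclusive segment [a, b]
    std::swap(c1[i], c2[i]);
}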
Representative Application Areas
- Artificial Life
- Lisp Programs
- Cellular Automata
- Real Life
- Learning and survival: models of Baldwin Effect
- Population biology simulations
- Ecosystems
- Science
- Protein Structure
- Nonlinear Optimization
- Operations Research, Prediction
- AI
- Evolving Learning Algorithms: backprop parameters; reinforcement learning, others
- Neural Network Architecture
Evolving Neural Architectures
- Direct Encoding
- Problem Mapping:
adjacency (connection) matrix <==> network topology
feed-forward network <==> lower-diagonal matrix
alphabet = {0,1}
chromosome = rows of adjacency matrix concatenated
- Crossover:
exchange rows (see the sketch after this list)
row = input to a single node
- Mutation: standard
- Fitness: for a given problem,
backprop for a fixed number of epochs using training set
evaluate on test set
sum square error = fitness (low is good)
- Montana & Davis
- Grammatical Encoding
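A C++ sketch of the direct encoding described above (illustrative names; this is not Montana & Davis's exact scheme): the chromosome is the rows of an n x n adjacency matrix concatenated into one bit string, and one way to realize "exchange rows" is one-point crossover restricted to row boundaries, so each child inherits whole rows (the inputs to a single node).

#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Rebuild the n x n adjacency matrix from a chromosome of length n*n.
std::vector<std::vector<int>> decode_topology(const std::string& chrom, std::size_t n)
{
  std::vector<std::vector<int>> adj(n, std::vector<int>(n, 0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      adj[i][j] = (chrom[i * n + j] == '1') ? 1 : 0;
  return adj;
}

// Row-exchange crossover: swap material only from a row boundary onward,
// so whole rows move between parents.
void row_crossover(std::string& c1, std::string& c2, std::size_t n, std::mt19937& gen)
{
  std::uniform_int_distribution<std::size_t> pick_row(0, n - 1);
  std::size_t start = pick_row(gen) * n;    // crossover point on a row boundary
  for (std::size_t i = start; i < n * n; ++i)
    std::swap(c1[i], c2[i]);
}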
Grammatical Encoding - Detailed Example
Alphabet = {S,A,B,C,D,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p}
Chromosome = SXXXX XXXXX XXXXX XXXXX XXXXX ...
(X = any alphabet character) (spaces for visual convenience only)
Grammar Terminals (fixed):
a -> 0 0   b -> 0 0   c -> 0 0   d -> 0 0   e -> 0 1
     0 0        0 1        1 0        1 1        0 0
f -> 0 1   g -> 0 1   h -> 0 1   i -> 1 0   j -> 1 0
     1 0        0 1        1 1        0 0        0 1
k -> 1 0   l -> 1 0   m -> 1 1   n -> 1 1   o -> 1 1
     1 0        1 1        0 0        0 1        1 0
p -> 1 1
     1 1
Example non-terminals (variable):
S -> A B   A -> c a   B -> a a   C -> f g   D -> c a
     C D        f a        a a        o k        p c
represented by the chromosome SABCDAcafaBaaaaCfgokDcapc
Example chromosome SABCD Aaaca Baaaa Cokab Daaoa
decodes as follows:
S -> A B  ->  a a a a  ->  0 0 0 0 0 0 0 0
     C D      c a a a      0 0 0 0 0 0 0 0
              o k a a      0 0 0 0 0 0 0 0
              a b o a      1 0 0 0 0 0 0 0
                           1 1 1 0 0 0 0 0
                           1 0 1 0 0 0 0 0
                           0 0 0 0 1 1 0 0
                           0 0 0 1 1 0 0 0   = adjacency matrix
Need ad hoc rules such as:
- Take first production for each non-terminal, discard the rest
- Discard backward connections (make matrix lower diagonal)
- Discard intra-layer connections
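For concreteness, here is a C++ sketch of the decoding just illustrated (illustrative names). It hard-codes the terminal table above, reads the chromosome in groups of five symbols, keeps only the first production for each non-terminal (the first ad hoc rule), and expands S two levels down to the 8 x 8 bit matrix; the backward-connection and intra-layer pruning rules are omitted.

#include <array>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Terminals 'a'..'p' expand to the fixed 2x2 bit blocks from the table above,
// stored row-major as "top-left top-right bottom-left bottom-right".
const std::array<std::string, 16> kTerminal = {
  "0000", "0001", "0010", "0011", "0100", "0110", "0101", "0111",
  "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"};

std::vector<std::string> decode(const std::string& chromosome)
{
  // First production for each non-terminal; emplace keeps only the first.
  std::map<char, std::string> rule;
  for (std::size_t i = 0; i + 5 <= chromosome.size(); i += 5)
    rule.emplace(chromosome[i], chromosome.substr(i + 1, 4));

  // Expand S -> 2x2 of non-terminals -> 4x4 of terminals -> 8x8 of bits.
  std::vector<std::string> matrix(8, std::string(8, '0'));
  const std::string& top = rule.at('S');
  for (int q = 0; q < 4; ++q)                   // quadrant of S's expansion
  {
    const std::string& mid = rule.at(top[q]);   // 2x2 block of terminals
    for (int t = 0; t < 4; ++t)
    {
      const std::string& block = kTerminal[mid[t] - 'a'];  // 2x2 block of bits
      for (int b = 0; b < 4; ++b)
      {
        int row = (q / 2) * 4 + (t / 2) * 2 + (b / 2);
        int col = (q % 2) * 4 + (t % 2) * 2 + (b % 2);
        matrix[row][col] = block[b];
      }
    }
  }
  return matrix;
}

int main()
{
  for (const std::string& row : decode("SABCDAaacaBaaaaCokabDaaoa"))
    std::cout << row << '\n';                   // prints the 8x8 adjacency matrix
}

Running it on the example chromosome prints the 8 x 8 adjacency matrix derived above.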
Research Area: Applying GA to Acyclic Architectures
- Terminals - use with known interesting subnets (e.g., Expert Net Nodes)
- Matrix Blocks
- Applicable to Neural or Computational Nets
- Fitness - possibly penalize for number of connections, selecting for sparseness
References
- Goldberg, David E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
- Mitchell, Melanie, An Introduction to Genetic Algorithms, MIT Press, 1996.
- Artificial Life Home Page
- Lacher, R.C., Expert networks: Paradigmatic conflict, technological rapprochement, Minds and Machines 3 (1993) 53-71.