A Genetic Algorithm In C++ To Solve
A Travelling Salesman Problem


... and other Optimization Problems.


In computer science and applied mathematics an optimization problem is to find the best of all possible or available solutions for a given problem. A software engineer might define the problem by an 'objective function' with one or more input parameters. The goal of the optimization process is to systematically choose input variables as solutions to find the minimum or maximum output value of the function. The variables might be continuous for continuous optimization problems or discrete for combinatorial optimization problems.

Besides pure theoretical problems very interesting and complex real world problems can be described by objective functions. An optimization algorithm can be used to automatically find good or optimal solutions for such problems.

An example: Transport companies have to find the most efficient routes for their truck drivers. In such a case an objective function calculates the length of a route and its input is an ordered set of route locations. The goal is to find a solution / combination of locations which will result in the shortest possible route length.

When the output value of an objective function should be minimized, it is also called a 'loss function' or 'cost function', which calculates the 'cost' of a problem. When the output value should be maximized, the objective function is inverted and often called a 'fitness function' or 'reward function'.

Many optimization algorithms are search techniques for finding a global optimum in a defined space. As mentioned before, a global optimum (either maximal or minimal) is the best among all possible solutions. Optimization processes often get stuck in local optima. In this case the search process finds the best solution in a subset of all solutions, but cannot find the best among all possible solutions. It is desirable to find the global optimum, but also very valuable to find a local optimum.


Figure 1 (click to enlarge)


Genetic Algorithm


A Genetic Algorithm is one of many optimization algorithms. Its purpose is to guide a search process to find a global optimal solution for a problem in a very large search space. It belongs to the class of evolutionary algorithms and can solve very complex combinatorial problems. Evolutionary algorithms are more or less based on biological processes and Darwinian evolution theory. Charles Darwin formulated the scientific theory of evolution by natural selection in his book 'On the Origin of Species', which was published in 1859. The algorithm is not an exact simulation of evolution, but it uses ideas from Genetics like Genes, DNA, Chromosomes, Selection, Recombination and Mutation as an analogy for the optimization of a given problem.

On some classes of problems a Genetic Algorithm can not guarantee to find a global optimal solution, but it may approximate a sufficiently good solution in a relative short time. It is a Metaheuristic which trades optimality for speed and is often the only viable option, when the search space is very large, discrete or not differentiable and therefore not suited for classic mathematical (calculus based) optimization methods.

One example is the automatic design of antennas for radio communication. There is an infinite amount of possible shapes for antennas. An evolutionary algorithm can be used to find the (near-)optimal shape of an antenna with a given requirement. In fact the NASA used Genetic Algorithms to evolve an antenna design for its Space Technology 5 (ST5) mission which launched in March 2006.

There is a huge list of interesting applications for Genetic Algorithms, ranging from the design and engineering of industrial products, solving complex scheduling problems, evolving or training neural networks for artificial intelligence, computational chemistry, stock market trading and financial mathematics, to the automatic creation of art.


Evolution in a nutshell


Living organisms are composed of many different types of cells. The core of a cell is the nucleus. It contains genetic material in the form of chromosomes, which are the blueprints or construction plans of organisms. A chromosome is a DNA (DeoxyriboNucleic Acid) molecule which is a long sequence of nucleotides. The nucleotides are paired together as two long strands. A gene is a small segment of this nucleotide sequence with a specific function. The DNA is a combination of many genes which encode different characteristics of an organism, for example its size, colour and gender.

Just remember that a DNA / chromosome stores information in the form of many genes. The type and order of the genes define the features of an organism.


Figure 2 (click to enlarge).
A simplified 3D rendering of a short DNA-segment (paired strands of nucleotides).


A population is a group of organisms, in which pairs of members can breed offspring (children). The different organisms compete to survive and reproduce. Their children form the next generation of the population. When two organisms reproduce, their individual chromosomes / DNA will be copied, mixed up and passed on to their children (Recombination / Crossover). Furthermore a few genes of the children will be changed by random chance (Mutation).

By inheriting different features of its parents through recombination and the development of new traits caused by mutation, a child will introduce lots of variety into the next generation. The children with the best fitness will survive and reproduce more than other children (Natural Selection). Finally a species will evolve over many generations which is optimized to survive in a specific environment. Fitness does not simply mean strength or intelligence. Organisms with a better fitness have characteristics, which lead to better reproductive success in their environment. Some traits of an organism can be a benefit in one environment and a disadvantage in another one. Successful features for survival will eventually get common in a population, but the diversity of the organisms is what drives the evolution. Without recombination and mutation, which also introduces bad chracteristics into a population, the evolution will stop.

How can evolution be used to optimize or solve many real world problems?

When a problem can be represented as a combination (DNA) of multiple things (Genes) and a function can be provided that calculates the quality of such combination (Fitness Function), a Genetic Algorithm may hopefully find the best of all possible combinations - if given enough time! The algorithm may automatically evolve better solutions from generation to generation with the help of Selection, Recombination and Mutation.

Even a single number, which might be used as an input variable for an objective function, can be represented as a DNA with a fixed number of genes. Decimal numbers and many other things can be encoded as bit strings:

Figure 3 (click to enlarge)

The decimal numbers change when recombination or mutation will be applied to its bit patterns.


Travelling Salesman Problem


Imagine a company has created a new version of its bestselling product. The company sends a salesman on a road show. He should visit the biggest cities in the country to demonstrate the new product to the public. The salesman should travel to each city only once and his journey has to end where he originally started. What is the shortest possible route to visit each city exactly once and return to the origin?

A route with only 15 cities results in 43589145600 (43 thousand million) possible combinations. A route with 18 cities already has 177843714048000 (177 million million) combinations. A route with 50 cities has a number of possible combinations with 63 decimal digits! The formula to calculate the number of combinations is factorial(n-1) / 2 where n is the number of cities.

When testing all possible combinations of a route with 50 cities, it may take a modern computer millions of years to find the shortest route. A Heuristic like a Genetic Algorithm can find a short route in seconds and it may even find the shortest of all possible routes in a few seconds or minutes.

There are very effective and specialized algorithms for solving the Travelling Salesman Problem, like the '2-opt' and '3-opt' heuristic or the 'Lin-Kernighan' heuristic. These local search algorithms reorder route locations, until a shorter route is found. They can be used exclusively or can be guided by Genetic Algorithms and other Metaheuristics to find short routes even faster.


Terminology of Genetic Algorithms


Gene Smallest unit of information
DNA / Chromosome / Genome Sequence of genes
Population Collection of individual DNAs
Generation A new population of evolved DNA
Fitness The score or quality of a DNA
Fitness Function Objective function which calculates the score of a DNA
Selection A strategy to select parent DNAs for reproduction based on their fitness
Recombination / Crossover A strategy to copy genes from parent DNAs to create new child DNAs
Mutation A (random) change of one or more genes in a DNA

... in the context of the Travelling Salesman Problem:

Gene Location of a city
DNA / Chromosome / Genome A sequence of city locations = route
Population Collection of individual routes
Generation A new collection of evolved routes
Fitness The total length of a route
Fitness Function Adds up the distances between every pair of cities in the route
Selection A strategy to select routes based on their fitness to create new routes
Recombination / Crossover A strategy to copy locations from parent routes to build new child routes
Mutation Swapping one or more locations in a route


The Algorithm


  1. Initialization: An initial population with many different DNAs will be created. The DNAs will be composed of random genes. In the case of the Travelling Salesman Problem every possible gene (city) will be placed in a DNA (route) once and in a random order.

  2. Fitness evaluation: A fitness function (objective function) calculates a score (the length of the route) for each DNA in the population. If the best DNA is an optimal or sufficient solution for the problem, the algorithm will stop. Otherwise a new generation of the population will be evolved:

  3. Selection: Every DNA in the population should be replaced with a new child. Two DNAs will be selected as parents for a child. One way would be to select the two best DNAs as parents. This may evolve better child DNAs very fast but the evolution may get stuck (in a local optimum) very quick and it will become less likely to find a global optimal solution. It is desirable to get more genetic variety in the next generation by giving more DNAs the chance to reproduce. The better the fitness of a DNA, the higher the chance that it will be selected as a parent.

  4. Recombination / Crossover: For every DNA in the population a child DNA will be produced by copying and mixing the genes of its selected parents. The child will become a member of a new population which forms the next generation.

  5. Mutation: Every new child DNA will be mutated by randomly changing one or more of its genes. This creates additional variety in the new generation. The amount of mutation per DNA must be kept very low. Too much mutation will overwrite the inherited features of the parents and hinder or completely stop the evolution process.

  6. Replacement: Every DNA of the current population will be replaced with the (child) DNA of the new generation.

  7. Go to step 2


Results  (of the implementation below)


A Genetic Algorithm is a versatile problem solver, but it is a challenge to find the best values for the initial population size, mutation rate and the best selection and recombination strategy. Too much and too low genetic variation will lead to a break down of the evolution and the process gets stuck in local optimal solutions. The right parameters can also be found by using an optimization algorithm. This is called Meta-Optimization. A Metaheuristic like a Genetic Algorithm can also be combined with different Heuristics to become even more effective.

An optimization of a Travelling Salesman Problem with a sole Genetic Algorithm. A near-optimal solution for 256 route locations was found after 300 generations. There are still some overlaps in the route.


Figure 4 (click to enlarge)
A Genetic Algorithm combined with 2-opt search found the global optimal solution after 165 generations.


Figure 5 (click to enlarge)

Figure 5 shows the results of three optimizations of the 'Berlin52' Travelling Salesman Problem. The data set (created by Martin Grötschel) consists of 52 route locations. The shortest route length is 7544,366 units. In all shown cases the optimal solution was found after a few evolved generations. With a large population size of 150000 a sole Genetic Algorithm without 2-opt local search found the optimal solution in about 50% of the tests. A lower population size of 50000 needed more generations to find a short route and it got stuck in local optima very often. With a very large population size of 5000000 the algorithm found the global optimum solution most of the time, but the memory usage and computation time per generation increased dramatically. Huge improvements were gained by using a 2-opt local search after each mutation. With a very small population size of 100 the shortest route was found every time after only 3 - 10 generations.

A simple way to increase the chance of finding a global optimum solution is to run many search processes in parallel on different threads.

Figure 6 demonstrates another optimization of the 'Berlin52' Travelling Salesman Problem with a population size of 150000. It shows the genes of the best solutions per generation. The genes are indices for the different travelling locations in a route.

Figure 6 (click to enlarge)

Another example of an optimization process with a Genetic Algorithm is shown in Figure 7. A string (DNA) with 68 random characters (Genes) will be evolved until it matches a predefined string. Each character in the string can be picked from an alphabet of 54 different characters (upper and lower case letters, full stop and space). The fitness of the DNA is based on the number of correct characters in the string. This is a toy problem since the correct solution is already known but it is easy to implement and can be used to test the performance of the Genetic Algorithm. The number of possible combinations for this problem has 118 decimal digits! The algorithm found the optimal solution after 11 generations.

Figure 7 (click to enlarge)


About the implementation


The following C++ implementation is based on the C++14 standard. It uses C++ features like Smart Pointers, Lambdas, Templates, Inheritance and Polymorphism. The code focuses on easy readability and not on performance optimizations. The algorithm is not parallelized and performs no error checks. You will find more details in the comments of the source code.

In the latest test the code successfully compiled without warnings with Visual Studio 2017 and GCC 5.3 on Windows 10.


Last change: 18.01.2018
Copyright by Tim Völcker - All rights reserved

Please send your feedback to:
File: main.cpp

// The algorithm is composed of two main classes. A reusable GeneticAlgorithm
// class and a DNA class which implements a specific problem:
//
// -  The GeneticAlgorithm class guides the evolution process of DNAs.
//    It manages the DNAs of the current population and next generation
//    and implements a strategy to select parent DNAs for reproduction.
//    It can be used for different problems without modification.
//
// -  A DNA class implements a particular problem to be solved by a
//    Genetic Algorithm. The class manages its own reproduction with
//    recombination / crossover, its mutation and fitness evaluation.
//    The methods are not very reusable and must be customized for each
//    problem. In this case a Travelling Salesman Problem is
//    implemented as a DNA_TSP class.
//
// There is no abstract base class for a DNA to inherit from. A user has to
// implement a DNA class with its particular methods and inject it into a
// GeneticAlgorithm by using templates. With this 'duck typing' approach all
// types will be figured out at compile time and not at runtime.
// A GeneticAlgorithm instance will create and call lots of DNA objects.
// There will be no performance overhead by runtime method binding
// (virtual functions), and no (abstract) factory methods must be provided
// for the creation of different DNA instances.
//
// The user of a console or GUI based host application should be able to
// choose different optimization algorithms. The GeneticAlgorithm class is an
// implementation of an abstract OptimizationAlgorithm base class / interface.
// By using this 'Strategy Design Pattern' the optimization algorithm can be
// switched at runtime.


#include <memory>
using std::unique_ptr;
using std::make_unique;

#include <iostream>
using std::cout;

#include <iomanip>
using std::left;
using std::setw;
using std::fixed;
using std::setprecision;

#include "rand_num_gen.hpp"
#include "optimization_algorithm.hpp"
#include "genetic_agorithm.hpp"
#include "dna_tsp.hpp"


int main()
{
    // A Genetic Algorithm uses lots of random numbers and is nondeterministic.
    // Every run of the program will produce different results. Sometimes it
    // will get stuck in a local optimum and sometimes it will find a global
    // optimal solution.

    auto randomNumberGenerator = make_unique<RandNumGen>();

    // For testing purposes the algorithm can be forced to be deterministic
    // by providing a fixed seed value for the random number generator.
    // With the following seed the algorithm will find a global optimal
    // solution. (On my machine. I'm not sure if the random number generator
    // produces the same results on all machines and operating systems!)

    // auto randomNumberGenerator = make_unique<RandNumGen>(1337);

    // The Genetic Algorithm will be initialized with a Travelling Salesman
    // Problem. The total length of a route through many locations should
    // be minimized.

    const size_t populationCount = 100;
    const double percentOfBestCanReproduce = 50.0;
    const double recombinationProbability = 0.9;   // 90%
    const double mutationProbability = 0.02;       //  2%

    unique_ptr<OptimizationAlgorithm> oa =
        make_unique< GeneticAlgorithm<DNA_TSP> >(
            OptimizationObjective::Minimization,
            populationCount,
            percentOfBestCanReproduce,
            recombinationProbability,
            mutationProbability,
            randomNumberGenerator.get());


    cout << "\nOptimization started...\n\n";
    cout << "Initialization     Cost: "
         << left << setw(10) << fixed
         << setprecision(4)  << oa->getSolutionQuality()
         << "   Solution: "
         << oa->getSolutionAsString() << "\n";


    while (! oa->isProblemSolved())
    {
        oa->calcNextSolution();

        cout << "Iteration: "
             << left << setw(5) << oa->getIteration()
             << "   Cost: "
             << left << setw(10) << fixed
             << setprecision(4)  << oa->getSolutionQuality()
             << "   Solution: "
             << oa->getSolutionAsString() << "\n";
    }

    if (oa->isProblemSolved()) { cout << "\nSOLVED!\n\n"; }

    return 0;
}
File: rand_num_gen.hpp

// A class for the creation of (pseudo) random numbers.
//
// This class uses a Mersenne Twister as a high quality (pseudo) random number
// generator. It is slower than simply calling the rand() function but it will 
// create better random numbers with a very good (uniform) distribution.
//
// A random number generator instance should be created ONCE for every thread.
// It should be injected into and used by objects which need to create
// random numbers. Do NOT create a generator for every random number!
//
// By providing a generator for every thread it does not need to be locked /
// synced which will result in a better performance.
//
// The system time & thread id will be used to create an initial random seed.
// When the system clock is called by multiple threads at nearly the same time
// it might return the same value. A hash of the thread id will be combined
// with the current time to create a unique random seed for every thread.


#pragma once

#include <iostream>
#include <string>

#include <random>
using std::mt19937;
using std::uniform_int_distribution;
using std::uniform_real_distribution;
using std::bernoulli_distribution;

#include <chrono>
using std::chrono::steady_clock;

#include <thread>
using std::thread;

#include <functional>
using std::hash;


class RandNumGen
{
public:
    RandNumGen();
    RandNumGen(long seed);
    template<typename T> T getIntegerInRange(T minInclusive, T maxInclusive);
    template<typename T> T getRealInRange(T minInclusive, T maxInclusive);
    bool isTrueWithProbability(double probability);
    long getSeed() const;

private:
    mt19937 _rng;
    long _seed;
};


template<typename T> T
RandNumGen::getIntegerInRange(T minInclusive, T maxInclusive)
{
    uniform_int_distribution<T> distribution(minInclusive, maxInclusive);
    return distribution(_rng);
}


template<typename T> T
RandNumGen::getRealInRange(T minInclusive, T maxInclusive)
{
    uniform_real_distribution<T> distribution(minInclusive, maxInclusive);
    return distribution(_rng);
}
File: rand_num_gen.cpp

#include "rand_num_gen.hpp"


RandNumGen::RandNumGen()
{
    // Create a random seed by combining the current time and thread id.

    const auto time = steady_clock::now().time_since_epoch().count();
    const auto threadId = hash<thread::id>()(std::this_thread::get_id());
    _seed = static_cast<long>(time) + static_cast<long>(threadId);

    _rng = mt19937(_seed); // Mersenne Twister
    _rng.discard(700000);  // Warm up for even better (pseudo) random numbers
}


RandNumGen::RandNumGen(long seed)
{
    _seed = seed;
    _rng = mt19937(seed);
    _rng.discard(700000);
}


bool RandNumGen::isTrueWithProbability(double probability)
{
    // A probability of 1.0 will always return 'true'
    // A probability of 0.7 will return 'true' 70% of the time
    // A probability of 0.0 will never return 'true'
    bernoulli_distribution distribution(probability);
    return distribution(_rng);
}


long RandNumGen::getSeed() const
{
    return _seed;
}
File: optimization_algorithm.hpp

// An Abstract Base Class / Interface for Optimization Algorithms.
//
// When an optimization algorithm like a genetic algorithm implements
// this interface, a host application can exchange it with different
// optimization algorithms at runtime. (Strategy Design Pattern)
//
// The objective of an optimization algorithm is to minimize or maximize
// the output value of an objective function. The algorithm has to
// systematically pick or create input variables (solutions) for the
// objective function until a sufficient or optimal solution is found.
//
// An objective function is often called a 'loss function' or 'cost function'
// when the result of the function should be minimized. 
// An objective function is often called a 'utility function' or
// 'fitness function' when the result of the function should be maximized.
//
// The objective function and the creation of new solutions must be
// implemented individually for each optimization algorithm.


#pragma once

#include <string>
using std::string;


enum OptimizationObjective { Minimization, Maximization };

class OptimizationAlgorithm
{
public:
    virtual ~OptimizationAlgorithm() noexcept { };

    virtual OptimizationObjective getObjective() const = 0;
    virtual string getName() const = 0;
    virtual size_t getIteration() const = 0;

    // This creates a new iteration of the optimization.
    // A new solution must be created and it has to be
    // evaluated by an objective function.
    virtual void calcNextSolution() = 0;

    // This returns the result of an evaluated objective function.
    // It's the cost or fitness value of the current (best) solution.
    // The objective is to minimize or maximize this value.
    virtual double getSolutionQuality() const = 0;

    virtual string getSolutionAsString() const = 0;
    virtual bool   isProblemSolved() const = 0;
};
File: genetic_algorithm.hpp

#pragma once

#include <vector>
using std::vector;

#include <memory>
using std::unique_ptr;
using std::make_unique;

#include <utility>
using std::move;
using std::pair;
using std::make_pair;

#include <string>
using std::string;

#include <limits>
#include <algorithm>

#include "optimization_algorithm.hpp"
#include "rand_num_gen.hpp"


template<typename DNA>
class GeneticAlgorithm : public OptimizationAlgorithm
{
public:
    GeneticAlgorithm(
        OptimizationObjective objective,
        size_t populationSize,
        double percentOfBestCanReproduce,
        double recombinationProbability,
        double mutationProbability,
        RandNumGen* randNumGen);

    OptimizationObjective getObjective() const override;
    string getName() const override;
    size_t getIteration() const override;

    void calcNextSolution() override;

    double getSolutionQuality() const override;
    string getSolutionAsString() const override;
    bool   isProblemSolved() const override;

private:
    inline void createInitialPopulation();
    inline void evolveNextGeneration();
    inline void calcAllFitnessValues();
    inline void createSelectionPool();
    inline void selectParentsFromPool();

private:
    OptimizationObjective _objective;
    RandNumGen* _randNumGen;
    size_t _populationSize;
    double _percentOfBestCanReproduce;
    double _recombinationProbability;
    double _mutationProbability;
    size_t _iteration;
    vector<unique_ptr<DNA>> _population;
    vector<unique_ptr<DNA>> _nextGeneration;
    vector<size_t> _selectionPool;
    DNA* _parentDNA1;
    DNA* _parentDNA2;
    DNA* _bestDNA;
};


// ----------------------------------------------------------------------------


template<typename DNA>
GeneticAlgorithm<DNA>::GeneticAlgorithm(
    OptimizationObjective objective,
    size_t populationSize,
    double percentOfBestCanReproduce,
    double recombinationProbability,
    double mutationProbability,
    RandNumGen* randNumGen)
{
    _objective = objective;
    _randNumGen = randNumGen;

    // Ensure a minimum and even population size
    _populationSize = std::max<size_t>(4, populationSize);
    if (_populationSize % 2 != 0) { _populationSize++; }

    _percentOfBestCanReproduce = percentOfBestCanReproduce;
    _recombinationProbability = recombinationProbability;
    _mutationProbability = mutationProbability;

    _iteration = 0;

    _population.reserve(_populationSize);
    _nextGeneration.reserve(_populationSize);
    _selectionPool.reserve(_populationSize);

    _parentDNA1 = nullptr;
    _parentDNA2 = nullptr;
    _bestDNA = nullptr;

    createInitialPopulation();
}


template<typename DNA>
inline OptimizationObjective GeneticAlgorithm<DNA>::getObjective() const
{
    return _objective;
}


template<typename DNA>
inline string GeneticAlgorithm<DNA>::getName() const
{
    return "Genetic Algorithm";
}


template<typename DNA>
inline size_t GeneticAlgorithm<DNA>::getIteration() const
{
    return _iteration;
}


template<typename DNA>
inline void GeneticAlgorithm<DNA>::calcNextSolution()
{
    _iteration++;
    evolveNextGeneration();
}


template<typename DNA>
inline double GeneticAlgorithm<DNA>::getSolutionQuality() const
{
    return _bestDNA->getFitness();
}


template<typename DNA>
inline string GeneticAlgorithm<DNA>::getSolutionAsString() const
{
    return _bestDNA->toString();
}


template<typename DNA>
inline bool GeneticAlgorithm<DNA>::isProblemSolved() const
{
    return _bestDNA->isSolved();
}


template<typename DNA>
inline void GeneticAlgorithm<DNA>::createInitialPopulation()
{
    // INITIALIZATION  (See Tutorial)
    _population.clear();
    for (size_t iDNA = 0; iDNA < _populationSize; iDNA++)
    {
        auto dna = make_unique<DNA>(_randNumGen);
        dna->initWithRandomGenes();
        _population.emplace_back(move(dna));
    }

    // FITNESS EVALUATION  (See Tutorial)
    calcAllFitnessValues();
}


template<typename DNA>
inline void GeneticAlgorithm<DNA>::calcAllFitnessValues()
{
    double minFitness = std::numeric_limits<double>::max();
    double maxFitness = std::numeric_limits<double>::min();
    double fitness = 0.0;

    if (_objective == OptimizationObjective::Minimization)
    {
        for (size_t iDNA = 0; iDNA < _populationSize; iDNA++)
        {
            fitness = _population[iDNA]->calcFitness();

            if (fitness < minFitness)
            {
                minFitness = fitness;
                _bestDNA = _population[iDNA].get();
            }
        }
    }
    else if (_objective == OptimizationObjective::Maximization)
    {
        for (size_t iDNA = 0; iDNA < _populationSize; iDNA++)
        {
            fitness = _population[iDNA]->calcFitness();

            if (fitness > maxFitness)
            {
                maxFitness = fitness;
                _bestDNA = _population[iDNA].get();
            }
        }
    }
}


template<typename DNA>
inline void GeneticAlgorithm<DNA>::evolveNextGeneration()
{
    _nextGeneration.clear();

    // PRE-SELECTION
    // Potential parent DNAs for reproduction will be picked from the current
    // population and stored in a selection pool. DNAs with better fitness
    // values have a higher chance to be part of the pool. 

    createSelectionPool();

    // In this implementation parents always create two children and the
    // size of the next generation is equal to the current population size.
    // You can experiment with a variable number of children or let the
    // population size increase or decrease over time.

    for (size_t iDNA = 0; iDNA < _populationSize / 2; iDNA++)
    {
        // SELECTION   (See Tutorial)
        selectParentsFromPool();

        // RECOMBINATION  (See Tutorial)
        auto child1 = make_unique<DNA>(_randNumGen);
        auto child2 = make_unique<DNA>(_randNumGen);

        if (_randNumGen->isTrueWithProbability(_recombinationProbability))
        {
            child1->initAsRecombination(_parentDNA1, _parentDNA2);
            child2->initAsRecombination(_parentDNA2, _parentDNA1);
        }
        else
        {
            child1->initAsCopy(_parentDNA1);
            child2->initAsCopy(_parentDNA2);
        }
        
        // MUTATION  (See Tutorial)
        child1->mutateGenes(_mutationProbability);
        child2->mutateGenes(_mutationProbability);

        _nextGeneration.emplace_back(move(child1));
        _nextGeneration.emplace_back(move(child2));
    }

    // REPLACEMENT  (See Tutorial)
    _population.swap(_nextGeneration);

    // FITNESS EVALUATION  (See Tutorial)
    calcAllFitnessValues();
}


template<typename DNA>
inline void GeneticAlgorithm<DNA>::createSelectionPool()
{
    // Usual methods for SELECTION in a Genetic Algorithm are:
    //
    // - Fitness Proportionate Selection
    // - Roulette Wheel Selection
    // - Stochastic Universal Sampling
    // - Rank Selection
    // - Elitist Selection
    // - Tournament Selection
    // - Truncation Selection
    //
    // This implementation combines some of the ideas.
    //
    // When a new generation will be evolved, not every DNA in the population
    // will get a chance to reproduce. A given percentage of the best DNAs will
    // be placed in a pool from which parents will be randomly selected 
    // for reproduction. A DNA with a better fitness value will be inserted
    // more often into the selection pool than a DNA with a low fitness value.
    // A DNA which is represented more often in the pool has a higher
    // probability to be selected as a parent.
    //
    // Imagine the selection pool as a roulette wheel. Each DNA in the pool gets
    // a slot in a roulette wheel. The size of the slot is based on the fitness
    // of the DNA. When selecting parents for reproduction it is like throwing
    // two balls into a roulette wheel. The balls have a higher chance of
    // landing in bigger slots than in smaller slots.
    //
    // When a single DNA has a very high fitness value and all other DNAs have
    // very low values, the one good DNA will become overrepresented in the
    // pool and it will be selected for reproduction nearly every time.
    // This (elitism) will reduce the diversity in the next generation and the
    // evolution may get stuck very fast. To get a more balanced representation
    // of good DNAs in the pool, their relative fitness values (to each other)
    // will be used instead of their absolute fitness values. All DNAs of a
    // population will be sorted in a list based on their fitness values.
    // Their position in the list will define their rank (relative fitness).
    // A percentage of the best ranked DNAs will be inserted into the
    // selection pool.
    //
    // The DNAs will not be copied to the ranking list or the selection pool.
    // Both are vectors of indices pointing to the DNAs in the population.


    // A ranking of DNA indices will be created. The DNAs will be sorted based
    // on their fitness values.

    vector<pair<size_t, double>> ranking;
    ranking.reserve(_populationSize);

    for (size_t iDNA = 0; iDNA < _populationSize; iDNA++)
    {
        ranking.emplace_back( make_pair(iDNA, _population[iDNA]->getFitness()) );
    }

    if (_objective == OptimizationObjective::Minimization)
    {
        std::sort(
            ranking.begin(),
            ranking.end(),
            [](auto& left, auto& right) { return left.second < right.second; });
    }
    else if (_objective == OptimizationObjective::Maximization)
    {
        std::sort(
            ranking.begin(),
            ranking.end(),
            [](auto& left, auto& right) { return left.second > right.second; });
    }


    // Take some percentage of the best DNAs from the ranking as candidates
    // for the selection pool. Better ranked candidates have a higher
    // probability to get inserted into the pool.

    _selectionPool.clear();

    const size_t candidateCount =
        static_cast<size_t>( ranking.size() * (_percentOfBestCanReproduce / 100.0) );

    const double candidateQuotient = 1.0 / candidateCount;

    double insertionProbability = 0.0;


    for (size_t rank = 0; rank < candidateCount; rank++)
    {
        insertionProbability = 1.0 - rank * candidateQuotient;

        // Optional: Strong (exponential) falloff
        insertionProbability = std::pow(insertionProbability, 2.0);

        if (_randNumGen->isTrueWithProbability(insertionProbability))
        {
            _selectionPool.emplace_back(ranking[rank].first);
        }
    }
}


template<typename DNA>
inline void GeneticAlgorithm<DNA>::selectParentsFromPool()
{
    // Randomly select two DNAs from the selection pool.
    // DNAs with better fitness values have a higher chance to get selected
    // because they have a higher appearance in the selection pool.
    // It is tolerable that sometimes both parents will be the same DNA.

    const auto poolIdx1 =
        _randNumGen->getIntegerInRange<size_t>(0, _selectionPool.size() - 1);

    const auto poolIdx2 =
        _randNumGen->getIntegerInRange<size_t>(0, _selectionPool.size() - 1);

    const auto dnaIdx1 = _selectionPool[poolIdx1];
    const auto dnaIdx2 = _selectionPool[poolIdx2];

    _parentDNA1 = _population[dnaIdx1].get();
    _parentDNA2 = _population[dnaIdx2].get();
}
File: dna_tsp.hpp

// This DNA class implements a solution for a Travelling Salesman Problem.
// From 52 given route locations the shortest possible route should be found.
// The DNA represents a route. The genes of the DNA are the route locations.
// The fitness value of the DNA is defined by the total length of the route.
// In this case the objective is to minimize the fitness value.
// Other optimization algorithms would call this a 'cost' value, because
// the cost of a solution should be minimized, but here it will be called
// 'fitness' value, because this class is part of a Genetic Algorithm.

// A DNA class implements a particular problem to be solved by a Genetic
// Algorithm. The class manages its own reproduction, mutation and fitness
// evaluation. There is no abstract base class for a DNA to inherit from.
// A user has to implement the class with its particular methods and inject
// it into a Genetic Algorithm by using templates. With this 'duck typing'
// approach there will be no performance overhead by runtime method binding
// (virtual functions), and there will be less code because no factory
// methods must be provided for the creation of different DNA instances.


#pragma once

#include <algorithm>
#include <numeric>

#include <array>
using std::array;

#include <vector>
using std::vector;

#include <string>
using std::string;

#include <cmath>
using std::sqrt;
using std::pow;

#include "rand_num_gen.hpp"


class DNA_TSP
{
// Methods to be called by the GeneticAlgorithm class:
public:

    DNA_TSP(RandNumGen* randNumGen);

    void initWithRandomGenes();
    void initAsCopy(const DNA_TSP* dna);
    void initAsRecombination(const DNA_TSP* parent1, const DNA_TSP* parent2);

    void mutateGenes(double probability);

    double calcFitness();
    double getFitness() const;

    string toString() const;

    bool isSolved() const;


// Specific methods and properties for a Travelling Salesman Problem:
private:

    inline double calcRouteLength(const vector<size_t>& route);

    inline double euclidianDistance2D(
        double x1,
        double y1,
        double x2,
        double y2);

    inline void orderCrossover_OX(const DNA_TSP* dna1, const DNA_TSP* dna2);

    inline bool isGeneInSection(
        size_t gene,
        size_t iSectionStart,
        size_t iSectionEnd) const;

    inline void swapMutation();

    inline void twoOptLocalSearch();

    inline void twoOptSwap(
        const vector<size_t>& inGenes,
        vector<size_t>& outGenes,
        size_t iGene1,
        size_t iGene2);

private:
    RandNumGen* _randNumGen;
    const static array<double, 52> _posX;  // 52 location coordinates
    const static array<double, 52> _posY;
    vector<size_t> _genes;  // Route = 52 indices to location coordinates
    double _fitness;
};
File: dna_tsp.cpp

#include "dna_tsp.hpp"


// The 'Berlin52' data set consists of 52 route locations.

const array<double, 52> DNA_TSP::_posX =
    { 565.0, 25.0, 345.0, 945.0, 845.0, 880.0, 25.0, 525.0, 580.0, 650.0,
      1605.0, 1220.0, 1465.0, 1530.0, 845.0, 725.0, 145.0, 415.0, 510.0, 560.0,
      300.0, 520.0, 480.0, 835.0, 975.0, 1215.0, 1320.0, 1250.0, 660.0, 410.0,
      420.0, 575.0, 1150.0, 700.0, 685.0, 685.0, 770.0, 795.0, 720.0, 760.0,
      475.0, 95.0, 875.0, 700.0, 555.0, 830.0, 1170.0, 830.0, 605.0, 595.0,
      1340.0, 1740.0 };

const array<double, 52> DNA_TSP::_posY =
    { 575.0, 185.0, 750.0, 685.0, 655.0, 660.0, 230.0, 1000.0, 1175.0, 1130.0,
      620.0, 580.0, 200.0, 5.0, 680.0, 370.0, 665.0, 635.0, 875.0, 365.0,
      465.0, 585.0, 415.0, 625.0, 580.0, 245.0, 315.0, 400.0, 180.0, 250.0,
      555.0, 665.0, 1160.0, 580.0, 595.0, 610.0, 610.0, 645.0, 635.0, 650.0,
      960.0, 260.0, 920.0, 500.0, 815.0, 485.0, 65.0, 610.0, 625.0, 360.0,
      725.0, 245.0 };

//  The shortest route (global optimal solution) of the data set is:
//
//  { 0, 48, 31, 44, 18, 40, 7, 8, 9, 42, 32, 50, 10, 51, 13, 12, 46, 25, 26,
//   27, 11, 24, 3, 5, 14, 4, 23, 47, 37, 36, 39, 38, 35, 34, 33, 43, 45, 15,
//   28, 49, 19, 22, 29, 1, 6, 41, 20, 16, 2, 17, 30, 21, (0) }
//
//  Total length = 7544.366
//
//  The given values are indices for the location coordinates (arrays).
//  The route is a loop and can be read in forward and backward direction.


DNA_TSP::DNA_TSP(RandNumGen* randNumGen)
{
    _randNumGen = randNumGen;

    // Fill the route with all possible locations. In this case the genes
    // are indices (0 to 51) for the location coordinates (arrays).
    _genes.resize(52);
    std::iota(_genes.begin(), _genes.end(), 0);

    // For the Travelling Salesman Problem the fitness value of the DNA
    // is represented by the total length of the route, which should be
    // minimized. It will be initialized with the worst possible value.
    _fitness = std::numeric_limits<double>::max();
}


void DNA_TSP::initWithRandomGenes()
{
    // Randomly shuffle the genes (route indices).

    size_t iGeneSwap = 0;
    size_t geneTemp = 0;

    for (size_t iGene = 0; iGene < _genes.size(); iGene++)
    {
        iGeneSwap = _randNumGen->getIntegerInRange<size_t>(0, _genes.size() - 1);
        
        geneTemp = _genes[iGene];
        _genes[iGene] = _genes[iGeneSwap];
        _genes[iGeneSwap] = geneTemp;
    }
}


void DNA_TSP::initAsCopy(const DNA_TSP* dna)
{
    for (size_t iGene = 0; iGene < _genes.size(); iGene++)
    {
        _genes[iGene] = dna->_genes[iGene];
    }

    _fitness = dna->_fitness;
}


void DNA_TSP::initAsRecombination(
    const DNA_TSP* parent1,
    const DNA_TSP* parent2)
{
    // This DNA will become a child of both parent DNAs by copying some genes
    // of parent1 and some genes of parent2. This recombination is also called
    // crossover. The following methods are typically used as a
    // crossover operator:
    //
    // - Every gene has a 50% chance to be copied from parent1 and a 50%
    //   chance to be copied from parent2.
    //
    // - A fixed or random crossover point (index) will be set.
    //   All genes before the index will be copied from parent1 and all genes
    //   after the the index will be copied from parent2. Alternative crossover
    //   methods may use multiple crossover points to copy different sections
    //   from parent DNAs.
    //
    // Both methods can NOT be used for the Travelling Salesman Problem because
    // this way some of the genes might be duplicated. The same route location
    // (gene) can be inserted from parent1 and parent2 but the Travelling
    // Salesman Problem demands that every route location is unique!
    //
    // There are some special crossover methods for TSP:
    //
    // - Partially-mapped crossover (PMX)
    // - Cycle crossover (CX)
    // - Position-based crossover (PBX)
    // - Order-based crossover (OBX)
    // - Order crossover (OX)
    //
    // One of the best methods is the order crossover (OX) which is used here.

    orderCrossover_OX(parent1, parent2);
}


void DNA_TSP::mutateGenes(double probability)
{
    // The DNA will be mutated by picking a single gene (point mutation) or
    // multiple genes (with a given rate / probability) and exchange them
    // with new genes from the set of all possible genes.
    //
    // This method can NOT be used for the Travelling Salesman Problem because
    // this way some of the genes will be duplicated. The same route location
    // may get inserted multiple times into the DNA but the Travelling Salesman
    // Problem demands that every route location is unique!
    //
    // Two of many possible solutions to solve this problem are:
    // 1) Pick one random gene in the DNA and swap it with its next neighbor.
    // 2) Pick two random genes in the DNA and swap them.
    // We are going to use the latter with a given probability:

    if (_randNumGen->isTrueWithProbability(probability))
    {
        swapMutation();
    }

    // OPTIONAL:
    // There are specialized heuristics like the 2-opt local search algorithm
    // which tries to find better solutions for the Travelling Salesman Problem
    // by reordering the route locations when the route crosses over itself.
    // A Genetic Algorithm will find a global optimum solution much faster
    // when a local search algorithm will be used after the mutation.

    twoOptLocalSearch();

    // When 2-opt search is used the population size can / should be very low!
    // Recommended parameters for the Genetic Algorithm when 2-opt is used:
    // populationCount = 100;
    // percentOfBestCanReproduce = 50.0;
    // recombinationProbability = 0.9;
    // mutationProbability = 0.02;

    // When 2-opt search is NOT used the population size has to be much higher!
    // Recommended parameters for the Genetic Algorithm when 2-opt is NOT used:
    // populationCount = 150000;
    // percentOfBestCanReproduce = 10.0;
    // recombinationProbability = 0.9;
    // mutationProbability = 0.02;
}


double DNA_TSP::calcFitness()
{
    // FITNESS FUNCTION

    // The fitness value of the DNA is defined by the total length of the
    // route. In this case of a Travelling Salesman Problem the objective
    // is to minimize the fitness value.

    _fitness = calcRouteLength(_genes);

    // To get a better separation of different DNAs with a similar fitness,
    // the values can be squared to get exponential instead of linear results
    // (_fitness = _fitness * _fitness). Squared fitness values are not
    // beneficial in this case because the GeneticAlgorithm class uses a
    // rank based selection method for DNAs which sorts all fitness values.

    return _fitness;
}


double DNA_TSP::getFitness() const
{
    return _fitness;
}


string DNA_TSP::toString() const
{
    // This returns a string of all route locations which are represented
    // as indices of the coordinate arrays. The route is a loop and will be
    // shifted to always display index 0 as the first route location.
    //
    // The standard string has a move constructor, so returning long strings
    // by value is efficient.
    
    // Find the start of the route.

    size_t iRouteStart = 0;

    for (size_t iGene = 0; iGene < _genes.size(); iGene++)
    {
        if (_genes[iGene] == 0) { iRouteStart = iGene; break; }
    }

    // Get the elements from the route start index to the end of the array.

    string dnaStr = "[";

    for (size_t iGene = iRouteStart; iGene < _genes.size(); iGene++)
    {
        if (_genes[iGene] < 10) { dnaStr += " "; }
        dnaStr += std::to_string(_genes[iGene]) + ", ";
    }

    // Get the elements from the start of the array to the route start index.

    for (size_t iGene = 0; iGene < iRouteStart; iGene++)
    {
        if (_genes[iGene] < 10) { dnaStr += " "; }
        dnaStr += std::to_string(_genes[iGene]) + ", ";
    }

    return dnaStr + "]";
}


bool DNA_TSP::isSolved() const
{
    // 7544.3659 is the shortest route length for the 'Berlin52' data set.
    return _fitness < 7544.366;
}


inline double DNA_TSP::calcRouteLength(const vector<size_t>& route)
{
    // This calculates the sum of all distances between successive
    // route locations.

    double totalLength = 0.0;

    for (size_t i = 0; i < route.size() - 1; i++)
    {
        totalLength += euclidianDistance2D(
            _posX[route[i]],
            _posY[route[i]],
            _posX[route[i + 1]],
            _posY[route[i + 1]]);
    }

    // Add the distance from the last to the first route location.

    totalLength += euclidianDistance2D(
        _posX[route[route.size() - 1]],
        _posY[route[route.size() - 1]],
        _posX[route[0]],
        _posY[route[0]]);

    return totalLength;
}


inline double DNA_TSP::euclidianDistance2D(
    double x1,
    double y1,
    double x2,
    double y2)
{
    return sqrt( pow(x1 - x2, 2.0) + pow(y1 - y2, 2.0) );
}


inline void DNA_TSP::orderCrossover_OX(
    const DNA_TSP* dna1,
    const DNA_TSP* dna2)
{
    // This crossover method will copy the genes of two parent DNAs
    // without creating any duplicates and by preserving their order. 
    // The result will be a valid route for the Travelling Salesman Problem.
    //
    // A section of genes from dna1 will be copied to this dna and the
    // remaining genes will be copied to this dna in the order in which they
    // appear in dna2.

    // Get a random start and end index for a section of dna1.

    size_t iSecStart = 0;
    size_t iSecEnd = 0;

    while (iSecStart >= iSecEnd)
    {
        iSecStart =
            _randNumGen->getIntegerInRange<size_t>(0, dna1->_genes.size() - 1);

        iSecEnd =
            _randNumGen->getIntegerInRange<size_t>(0, dna1->_genes.size() - 1);
    }

    // Copy the section of dna1 to this dna.

    for (size_t iGene = iSecStart; iGene <= iSecEnd; iGene++)
    {
        _genes[iGene] = dna1->_genes[iGene];
    }

    const size_t sectionSize = iSecEnd - iSecStart;

    // Copy the genes of dna2 without the genes found in the section of dna1.
    // The copying begins after the end index of the section. When the end of
    // dna2 is reached, copy the genes from the beginning of dna2 to the start
    // index of the section.

    vector<size_t> dnaDifference;
    dnaDifference.reserve(dna2->_genes.size() - sectionSize);

    if (iSecEnd + 1 <= dna2->_genes.size() - 1)
    {
        for (size_t iGene = iSecEnd + 1; iGene < dna2->_genes.size(); iGene++)
        {
            if (!isGeneInSection(dna2->_genes[iGene], iSecStart, iSecEnd))
            {
                dnaDifference.emplace_back(dna2->_genes[iGene]);
            }
        }
    }

    for (size_t iGene = 0; iGene <= iSecEnd; iGene++)
    {
        if (!isGeneInSection(dna2->_genes[iGene], iSecStart, iSecEnd))
        {
            dnaDifference.emplace_back(dna2->_genes[iGene]);
        }
    }

    // The difference from dna1 and dna2 will be copied to this dna.
    // The insertion of genes into this dna begins after the end index of the
    // section. When the end of the dna is reached, insert the genes in the
    // beginning of the dna to the start index of the section.

    size_t i = 0;

    if (iSecEnd + 1 <= dna2->_genes.size() - 1) { i = iSecEnd + 1; }

    for (size_t iGene = 0; iGene < dnaDifference.size(); iGene++)
    {
        _genes[i] = dnaDifference[iGene];
        i++;
        if (i > _genes.size() - 1) { i = 0; }
    }
}


inline bool DNA_TSP::isGeneInSection(
    size_t gene,
    size_t iSectionStart,
    size_t iSectionEnd) const
{
    for (size_t iGene = iSectionStart; iGene <= iSectionEnd; iGene++)
    {
        if (gene == _genes[iGene]) { return true; }
    }

    return false;
}


inline void DNA_TSP::swapMutation()
{
    // This mutates the DNA in a minimal way by selecting two random genes
    // and swapping them with each other.

    const auto iGene1 =
        _randNumGen->getIntegerInRange<size_t>(0, _genes.size() - 1);

    const auto iGene2 =
        _randNumGen->getIntegerInRange<size_t>(0, _genes.size() - 1);

    const auto tempGene = _genes[iGene1];
    _genes[iGene1] = _genes[iGene2];
    _genes[iGene2] = tempGene;
}


inline void DNA_TSP::twoOptLocalSearch()
{
    // '2-opt' is a local search algorithm for optimizing a Travelling Salesman
    // Problem. It takes a route that crosses over itself and reorders the
    // locations to eliminate the cross over.

    // The reordering will be applied to all possible location pairs as long as
    // it can improve the total length of the route.

    bool hasImproved = true;
    double swappedGenesFitness = 0.0;
    vector<size_t> swappedGenes(_genes.size());

    while (hasImproved)
    {
        for (size_t iGene1 = 1; iGene1 < _genes.size() - 1; iGene1++)
        for (size_t iGene2 = iGene1 + 1; iGene2 < _genes.size(); iGene2++)
        {
            twoOptSwap(_genes, swappedGenes, iGene1, iGene2);
            swappedGenesFitness = calcRouteLength(swappedGenes);

            if (swappedGenesFitness < _fitness)
            {
                _genes.swap(swappedGenes);
                _fitness = swappedGenesFitness;
                hasImproved = true;
            }
            else { hasImproved = false; }
        }
    }
}


inline void DNA_TSP::twoOptSwap(
    const vector<size_t>& inGenes,
    vector<size_t>& outGenes,
    size_t iGene1,
    size_t iGene2)
{
    // Take inGenes[0] to inGenes[iGene1 - 1]
    // and add them in order to outGenes

    for (size_t iGene = 0; iGene <= iGene1 - 1; iGene++)
    {
        outGenes[iGene] = inGenes[iGene];
    }

    // Take inGenes[iGene1] to inGenes[iGene2] and
    // add them in reverse order to outGenes

    size_t iter = 0;
    for (size_t iGene = iGene1; iGene <= iGene2; iGene++)
    {
        outGenes[iGene] = inGenes[iGene2 - iter];
        iter++;
    }

    // Take inGenes[iGene2 + 1] to end of inGenes
    // and add them in order to outGenes

    for (size_t iGene = iGene2 + 1; iGene < inGenes.size(); iGene++)
    {
        outGenes[iGene] = inGenes[iGene];
    }
}
COPYRIGHT BY TIM VÖLCKER   •   ALL RIGHTS RESERVED