Find an exact model for the Koza quintic problem

In this first tutorial we show how to find an exact formula for input data whose underlying model does not require any real-valued constants. This is the easiest case for a symbolic regression task, which makes it a good entry-level tutorial.

We use a classic benchmark problem, the Koza quintic polynomial: x - 2x^3 + x^5.
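To make the target concrete, here is a minimal standalone sketch that evaluates the polynomial and builds a small data set in the same `std::vector<std::vector<double>>` layout the tutorial code uses. The helper names `koza_quintic` and `make_dataset` are hypothetical, and the sample count and range are chosen by the caller; they are not necessarily the ones used by `gym::generate_koza_quintic`.

```cpp
#include <cstddef>
#include <vector>

// Koza quintic polynomial as stated in the text: x - 2x^3 + x^5.
double koza_quintic(double x)
{
    return x - 2. * x * x * x + x * x * x * x * x;
}

// Hypothetical helper (not part of dcgp): fills X and Y with n points
// evenly spaced in [lo, hi], one inner vector per sample.
void make_dataset(std::vector<std::vector<double>> &X, std::vector<std::vector<double>> &Y,
                  std::size_t n, double lo, double hi)
{
    X.clear();
    Y.clear();
    for (std::size_t i = 0; i < n; ++i) {
        // With a single point we just sample the lower bound.
        const double x = (n > 1) ? lo + (hi - lo) * static_cast<double>(i) / static_cast<double>(n - 1) : lo;
        X.push_back({x});
        Y.push_back({koza_quintic(x)});
    }
}
```

Calling `make_dataset(X, Y, 10, -3., 3.)` would, under these assumptions, produce ten input/label pairs that could be passed to a symbolic regression problem in place of the gym data.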

Code:

#include <vector>

#include <boost/algorithm/string.hpp>
#include <pagmo/algorithm.hpp>
#include <pagmo/io.hpp>
#include <pagmo/population.hpp>
#include <pagmo/problem.hpp>

#include <dcgp/algorithms/es4cgp.hpp>
#include <dcgp/gym.hpp>
#include <dcgp/kernel_set.hpp>
#include <dcgp/problems/symbolic_regression.hpp>
#include <dcgp/symengine.hpp>

using namespace dcgp;
using namespace boost::algorithm;

int main()
{
    // We load the data (using problem koza_quintic from the gym)
    std::vector<std::vector<double>> X, Y;
    gym::generate_koza_quintic(X, Y);

    // We instantiate a symbolic regression problem with no ephemeral constants.
    auto n_eph = 0u;
    symbolic_regression udp(X, Y, 1u, 20u, 21u, 2u, kernel_set<double>({"sum", "diff", "mul", "pdiv"})(), n_eph);

    // We init a population with four individuals
    pagmo::population pop{udp, 4u};

    // And we define an evolutionary strategy with 10000 generations, at most 4
    // mutations of active genes, and a stopping tolerance of 1e-8 on the loss
    dcgp::es4cgp uda(10000, 4u, 1e-8);
    pagmo::algorithm algo{uda};
    // We log one line every 500 generations
    algo.set_verbosity(500u);

    // We evolve the population
    pop = algo.evolve(pop);

    // We print on screen the best individual found
    auto idx = pop.best_idx();
    // We strip the enclosing square brackets so that SymEngine can parse the formula
    auto prettier = udp.prettier(pop.get_x()[idx]);
    trim_left_if(prettier, is_any_of("["));
    trim_right_if(prettier, is_any_of("]"));

    pagmo::print("\nBest fitness: ", pop.get_f()[idx], "\n");
    pagmo::print("Chromosome: ", pop.get_x()[idx], "\n");
    pagmo::print("Pretty Formula: ", udp.pretty(pop.get_x()[idx]), "\n");
    pagmo::print("Prettier Formula: ", prettier, "\n");
    pagmo::print("Expanded Formula: ", SymEngine::expand(SymEngine::Expression(prettier)), "\n");

    return 0;
}

Output:

Note: the actual output will differ on your computer, as the algorithm is non-deterministic.

Gen:        Fevals:          Best:  Constants:    Formula:
   0              0        3898.35           []    [2*x0**3] ...
 500           2000        638.426           []    [x0**5] ...
1000           4000        138.482           []    [(-x0**2 + x0**4)*x0] ...
1500           6000        101.734           []    [-x0 + (-x0**2 + x0**4)*x0] ...
1698           6792     5.2071e-30           []    [x0*(1 - x0**2) - x0**3*(1 - x0**2)] ...
Exit condition -- ftol < 1e-08

Best fitness: [5.2071e-30]
Chromosome: [2, 0, 0, 3, 1, ... ]
Pretty Formula: [(((((x0*x0)/(x0*x0))-(x0*x0))*x0)-(((((x0*x0)/(x0*x0))-(x0*x0))*x0)*(x0*x0)))]
Prettier Formula: x0*(1 - x0**2) - x0**3*(1 - x0**2)
Expanded Formula: x0 - 2*x0**3 + x0**5
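The "Pretty Formula" above contains the subexpression (x0*x0)/(x0*x0), which the simplified forms reduce to 1. The following standalone sketch transcribes that raw formula and compares it numerically with the target polynomial. It assumes that the `pdiv` kernel behaves as a protected division returning 1 whenever the quotient is not finite; that definition, and the function names used here, are assumptions made for illustration and are not taken from the dcgp sources.

```cpp
#include <cmath>

// Assumed behaviour of a protected division: fall back to 1 when the
// plain quotient is not finite (e.g. 0/0 or a/0).
double pdiv(double a, double b)
{
    const double r = a / b;
    return std::isfinite(r) ? r : 1.;
}

// Direct transcription of the "Pretty Formula" printed above.
double pretty_formula(double x0)
{
    const double one_minus_x2 = pdiv(x0 * x0, x0 * x0) - x0 * x0; // 1 - x0^2
    const double b = one_minus_x2 * x0;                           // x0*(1 - x0^2)
    return b - b * (x0 * x0);                                     // x0*(1 - x0^2) - x0^3*(1 - x0^2)
}

// Target polynomial from the text: x - 2x^3 + x^5.
double target_quintic(double x)
{
    return x - 2. * x * x * x + x * x * x * x * x;
}
```

Under this assumption the two functions agree, up to rounding, at any finite input, including x0 = 0, where an unprotected division would produce NaN.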