Find an exact model for the Koza quintic problem
In this first tutorial we show how to find an exact formula for input data whose underlying model requires no real-valued constants. This is the easiest case of a symbolic regression task, which makes it a natural entry-level tutorial.
We use the classic Koza quintic polynomial problem, whose target is x - 2x^3 + x^5.
Code:
#include <vector>

#include <boost/algorithm/string.hpp>
#include <pagmo/algorithm.hpp>
#include <pagmo/io.hpp>
#include <pagmo/population.hpp>
#include <pagmo/problem.hpp>

#include <dcgp/algorithms/es4cgp.hpp>
#include <dcgp/gym.hpp>
#include <dcgp/kernel_set.hpp>
#include <dcgp/problems/symbolic_regression.hpp>
#include <dcgp/symengine.hpp>

using namespace dcgp;
using namespace boost::algorithm;

int main()
{
    // We load the data (using problem koza_quintic from the gym)
    std::vector<std::vector<double>> X, Y;
    gym::generate_koza_quintic(X, Y);

    // We instantiate a symbolic regression problem with no ephemeral constants
    // (1 row, 20 columns, 21 levels-back, arity 2).
    auto n_eph = 0u;
    symbolic_regression udp(X, Y, 1u, 20u, 21u, 2u, kernel_set<double>({"sum", "diff", "mul", "pdiv"})(), n_eph);

    // We init a population with four individuals
    pagmo::population pop{udp, 4u};

    // And we define an evolutionary strategy with 10000 generations, at most 4
    // active mutations and a stopping tolerance of 1e-8 on the fitness
    dcgp::es4cgp uda(10000, 4u, 1e-8);
    pagmo::algorithm algo{uda};
    algo.set_verbosity(500u);

    // We evolve the population
    pop = algo.evolve(pop);

    // We print on screen the best individual found, stripping the enclosing
    // brackets from the simplified formula
    auto idx = pop.best_idx();
    auto prettier = udp.prettier(pop.get_x()[idx]);
    trim_left_if(prettier, is_any_of("["));
    trim_right_if(prettier, is_any_of("]"));

    pagmo::print("\nBest fitness: ", pop.get_f()[idx], "\n");
    pagmo::print("Chromosome: ", pop.get_x()[idx], "\n");
    pagmo::print("Pretty Formula: ", udp.pretty(pop.get_x()[idx]), "\n");
    pagmo::print("Prettier Formula: ", prettier, "\n");
    pagmo::print("Expanded Formula: ", SymEngine::expand(SymEngine::Expression(prettier)), "\n");

    return 0;
}
Output:
Note: the actual output will differ on your computer, as the evolution is non-deterministic.
Gen:   Fevals:   Best:        Constants:   Formula:
0      0         3898.35      []           [2*x0**3] ...
500    2000      638.426      []           [x0**5] ...
1000   4000      138.482      []           [(-x0**2 + x0**4)*x0] ...
1500   6000      101.734      []           [-x0 + (-x0**2 + x0**4)*x0] ...
1698   6792      5.2071e-30   []           [x0*(1 - x0**2) - x0**3*(1 - x0**2)] ...
Exit condition -- ftol < 1e-08
Best fitness: [5.2071e-30]
Chromosome: [2, 0, 0, 3, 1, ... ]
Pretty Formula: [(((((x0*x0)/(x0*x0))-(x0*x0))*x0)-(((((x0*x0)/(x0*x0))-(x0*x0))*x0)*(x0*x0)))]
Prettier Formula: x0*(1 - x0**2) - x0**3*(1 - x0**2)
Expanded Formula: x0 - 2*x0**3 + x0**5