Finding an entire non-dominated front of formulas.
In this fourth tutorial on symbolic regression we solve a multi-objective symbolic regression problem: the Mean Squared Error (i.e., the loss) of a model is considered alongside the model complexity when judging how good the model is. The result is thus a whole non-dominated front of models rather than a single best formula.
This case is arguably the most complete and useful formulation of a symbolic regression task. We use here the problem vladi6 from the dcgp::gym, that is: 6*cos(x*sin(y)).
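The gym data generator takes care of sampling the dataset for us. Just to make the setup concrete, here is a minimal sketch of how an equivalent dataset could be built by hand; the grid and ranges below are arbitrary choices for illustration, not necessarily those used by the gym generator:

#include <cmath>
#include <vector>

// Builds a dataset for f(x, y) = 6*cos(x*sin(y)) on a uniform grid.
// Illustrative only: the gym generator may sample the points differently.
void generate_vladi6_like(std::vector<std::vector<double>> &X, std::vector<std::vector<double>> &Y)
{
    X.clear();
    Y.clear();
    for (double x = 0.1; x < 6.; x += 0.5) {
        for (double y = 0.1; y < 6.; y += 0.5) {
            X.push_back({x, y});
            Y.push_back({6. * std::cos(x * std::sin(y))});
        }
    }
}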
Code:
#include <iomanip>
#include <vector>

#include <boost/algorithm/string.hpp>
#include <pagmo/algorithm.hpp>
#include <pagmo/io.hpp>
#include <pagmo/population.hpp>
#include <pagmo/problem.hpp>
#include <pagmo/utils/multi_objective.hpp>

#include <dcgp/algorithms/momes4cgp.hpp>
#include <dcgp/gym.hpp>
#include <dcgp/kernel_set.hpp>
#include <dcgp/problems/symbolic_regression.hpp>
#include <dcgp/symengine.hpp>

using namespace dcgp;
using namespace boost::algorithm;

int main()
{
    // We load the data (using problem vladi6 from the gym).
    std::vector<std::vector<double>> X, Y;
    gym::generate_vladi6(X, Y);

    // We instantiate a symbolic regression problem with one ephemeral constant.
    // The last argument (true) makes the problem multi-objective, so that the
    // formula complexity is returned as a second objective next to the loss.
    auto n_eph = 1u;
    symbolic_regression udp(X, Y, 1u, 15u, 16u, 2u, kernel_set<double>({"sum", "diff", "mul", "pdiv", "sin", "cos"})(),
                            n_eph, true);

    // We init a large population (100) of individuals.
    pagmo::population pop{udp, 100u};

    // We here use the Multi-Objective Memetic Evolutionary Strategy, an original algorithm provided in this dCGP
    // project. We instantiate it with 250 generations and a maximum of 4 active mutations per individual.
    dcgp::momes4cgp uda{250u, 4u, 1e-8};
    pagmo::algorithm algo{uda};
    algo.set_verbosity(10u);

    // We solve the problem.
    pop = algo.evolve(pop);

    // Finally we print on screen the non-dominated front.
    pagmo::print("\nNon dominated Front at the end:\n");
    auto ndf = pagmo::non_dominated_front_2d(pop.get_f());
    for (decltype(ndf.size()) i = 0u; i < ndf.size(); ++i) {
        auto idx = ndf[i];
        auto prettier = udp.prettier(pop.get_x()[idx]);
        trim_left_if(prettier, is_any_of("["));
        trim_right_if(prettier, is_any_of("]"));
        pagmo::print(std::setw(2), i + 1, " - Loss: ", std::setw(13), std::left, pop.get_f()[idx][0], std::setw(15),
                     "Complexity: ", std::left, std::setw(5), pop.get_f()[idx][1], std::setw(10), "Formula: ", prettier,
                     "\n");
    }
    return 0;
}
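Since the UDP is constructed with its multi-objective flag set to true, each fitness vector holds two values: the MSE loss first and the formula complexity second; the printing loop above relies on exactly this ordering. As a quick sanity check, here is a small sketch assuming the udp and pop objects from the listing are still in scope:

    // Sketch: the fitness of any decision vector is {loss, complexity}.
    // pop.get_f()[0] gives the same values; going through pagmo::problem::fitness
    // just makes the two objectives explicit.
    pagmo::problem prob{udp};
    auto f = prob.fitness(pop.get_x()[0]);
    pagmo::print("Loss: ", f[0], " Complexity: ", f[1], "\n");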
Output:
Note: the actual output will differ on your machine, since the evolution is not deterministic.
 Gen:    Fevals:    Best loss:  Ndf size:   Compl.:
    0          0       4.10239         11        62
   10       1000       1.42703          7        74
   20       2000      0.828554          8        45
   30       3000      0.374803         13        78
   40       4000      0.164032         16        66
   50       5000       0.03552         14        48
   60       6000     0.0200792         11        45
   70       7000             0         10        19
   80       8000             0         10        19
   90       9000             0         10        19
  100      10000             0         11        19
  110      11000             0         10        18
  120      12000             0         11        18
  130      13000             0         11        18
  140      14000             0         11        18
  150      15000             0         11        18
  160      16000             0         11        18
  170      17000             0         11        18
  180      18000             0         11        18
  190      19000             0         11        18
  200      20000             0         11        18
  210      21000             0         11        18
  220      22000             0         11        18
  230      23000             0         11        18
  240      24000             0         10        18
  250      25000             0         10        18
Exit condition -- max generations = 250
Non dominated Front at the end:
 1 - Loss: 0              Complexity: 18     Formula: c1*cos(x0*sin(x1))
 2 - Loss: 0.844272       Complexity: 17     Formula: 3*c1 - 2*x0 - 2*x0*x1
 3 - Loss: 1.10197        Complexity: 15     Formula: 4*c1 - 3*x0 - x0*x1
 4 - Loss: 1.17331        Complexity: 14     Formula: 3*c1 - 3*x0 - 2*x1
 5 - Loss: 1.27379        Complexity: 10     Formula: c1 - 3*x0 - x1
 6 - Loss: 1.92403        Complexity: 9      Formula: 2*c1 - 3*x0
 7 - Loss: 1.92403        Complexity: 7      Formula: c1 - 3*x0
 8 - Loss: 3.22752        Complexity: 5      Formula: c1 - x0
 9 - Loss: 4.74875        Complexity: 2      Formula: c1
10 - Loss: 4.8741         Complexity: 1      Formula: 4
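Having the whole front available, the final model choice is left to the user. A common heuristic (not part of the tutorial code) is to keep the least complex formula whose loss is within some tolerance of the best loss found. A minimal sketch, assuming udp, pop and ndf from the listing above are still in scope and using an arbitrary tolerance:

    // Find the lowest loss on the non-dominated front.
    auto best = ndf[0];
    for (auto idx : ndf) {
        if (pop.get_f()[idx][0] < pop.get_f()[best][0]) best = idx;
    }
    // Keep the least complex formula whose loss is within tol of that best loss.
    double tol = 1e-2; // arbitrary choice
    auto chosen = best;
    for (auto idx : ndf) {
        if (pop.get_f()[idx][0] <= pop.get_f()[best][0] + tol
            && pop.get_f()[idx][1] < pop.get_f()[chosen][1]) {
            chosen = idx;
        }
    }
    pagmo::print("Chosen formula: ", udp.prettier(pop.get_x()[chosen]), "\n");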