Finding an entire non-dominated front of formulas.

In this fourth tutorial on symbolic regression we solve the multi-objective symbolic regression problem. The mean squared error (MSE, i.e. the loss) of a model is considered alongside the model complexity to determine how good that model is. The result is therefore not a single model but a whole non-dominated front of models: for each model on the front, no other model found is at least as good on both objectives and strictly better on one.

This case is arguably the most complete and useful among symbolic regression tasks. Here we use the problem vladi6 from dcgp::gym, whose target is 6*cos(x*sin(y)) (the two inputs appear as x0 and x1 in the printed formulas).

Code:

#include <iomanip>
#include <vector>

#include <boost/algorithm/string.hpp>
#include <pagmo/algorithm.hpp>
#include <pagmo/io.hpp>
#include <pagmo/population.hpp>
#include <pagmo/problem.hpp>
#include <pagmo/utils/multi_objective.hpp>

#include <dcgp/algorithms/momes4cgp.hpp>
#include <dcgp/gym.hpp>
#include <dcgp/kernel_set.hpp>
#include <dcgp/problems/symbolic_regression.hpp>
#include <dcgp/symengine.hpp>

using namespace dcgp;
using namespace boost::algorithm;

int main()
{
    // We load the data (using problem vladi6 from the gym).
    std::vector<std::vector<double>> X, Y;
    gym::generate_vladi6(X, Y);

    // We instantiate a multi-objective symbolic regression problem (last argument set to true)
    // with a single ephemeral constant. The CGP has 1 row, 15 columns, 16 levels-back and arity 2,
    // and uses the kernels sum, diff, mul and pdiv.
    auto n_eph = 1u;
    symbolic_regression udp(X, Y, 1u, 15u, 16u, 2u, kernel_set<double>({"sum", "diff", "mul", "pdiv"})(), n_eph, true);

    // We init a large population (100) of individuals.
    pagmo::population pop{udp, 100u};

    // We here use the Multi-Objective Memetic Evolutionary Strategy, an original algorithm provided in this dCGP
    // project. We instantiate it with 250 generations and at most 4 active mutations per individual.
    dcgp::momes4cgp uda{250u, 4u, 1e-8};
    pagmo::algorithm algo{uda};
    // Print a log line every 10 generations.
    algo.set_verbosity(10u);

    // We solve the problem.
    pop = algo.evolve(pop);

    // Finally we print on screen the non-dominated front.
    pagmo::print("\nNon dominated Front at the end:\n");
    auto ndf = pagmo::non_dominated_front_2d(pop.get_f());
    for (decltype(ndf.size()) i = 0u; i < ndf.size(); ++i) {
        auto idx = ndf[i];
        // Human-readable formula, with the surrounding brackets stripped.
        auto prettier = udp.prettier(pop.get_x()[idx]);
        trim_left_if(prettier, is_any_of("["));
        trim_right_if(prettier, is_any_of("]"));
        pagmo::print(std::setw(2), i + 1, " - Loss: ", std::setw(13), std::left, pop.get_f()[idx][0], std::setw(15),
                     "Complexity: ", std::left, std::setw(5), pop.get_f()[idx][1], std::setw(10), "Formula: ", prettier,
                     "\n");
    }
    return 0;
}

Output:

Note: the actual output will differ on your computer, as the algorithm is non-deterministic.

Gen:        Fevals:     Best loss: Ndf size:  Compl.:
    0              0        4.10239        11        62
   10           1000        1.42703         7        74
   20           2000       0.828554         8        45
   30           3000       0.374803        13        78
   40           4000       0.164032        16        66
   50           5000        0.03552        14        48
   60           6000      0.0200792        11        45
   70           7000              0        10        19
   80           8000              0        10        19
   90           9000              0        10        19
  100          10000              0        11        19
  110          11000              0        10        18
  120          12000              0        11        18
  130          13000              0        11        18
  140          14000              0        11        18
  150          15000              0        11        18
  160          16000              0        11        18
  170          17000              0        11        18
  180          18000              0        11        18
  190          19000              0        11        18
  200          20000              0        11        18
  210          21000              0        11        18
  220          22000              0        11        18
  230          23000              0        11        18
  240          24000              0        10        18
  250          25000              0        10        18
  Exit condition -- max generations = 250

  Non dominated Front at the end:
  1  - Loss: 0            Complexity:    18   Formula:  c1*cos(x0*sin(x1))
  2  - Loss: 0.844272     Complexity:    17   Formula:  3*c1 - 2*x0 - 2*x0*x1
  3  - Loss: 1.10197      Complexity:    15   Formula:  4*c1 - 3*x0 - x0*x1
  4  - Loss: 1.17331      Complexity:    14   Formula:  3*c1 - 3*x0 - 2*x1
  5  - Loss: 1.27379      Complexity:    10   Formula:  c1 - 3*x0 - x1
  6  - Loss: 1.92403      Complexity:    9    Formula:  2*c1 - 3*x0
  7  - Loss: 1.92403      Complexity:    7    Formula:  c1 - 3*x0
  8  - Loss: 3.22752      Complexity:    5    Formula:  c1 - x0
  9  - Loss: 4.74875      Complexity:    2    Formula:  c1
  10 - Loss: 4.8741       Complexity:    1    Formula:  4