Finding an entire non-dominated front of formulas.
In this fourth tutorial on symbolic regression we solve a multi-objective symbolic regression problem: the Mean Squared Error (i.e., the loss) of a model is considered alongside the model complexity when judging how good the model is. The result is thus a whole non-dominated front of models rather than a single best formula.
This case is arguably the most complete and useful formulation of a symbolic regression task. We use here the problem vladi6 from the dcgp::gym, that is: 6*cos(x*sin(y)).
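The gym data generator takes care of sampling the dataset for us. Just to make the setup concrete, here is a minimal sketch of how an equivalent dataset could be built by hand; the grid and ranges below are arbitrary choices for illustration, not necessarily those used by the gym generator:

#include <cmath>
#include <vector>

// Builds a dataset for f(x, y) = 6*cos(x*sin(y)) on a uniform grid.
// Illustrative only: the gym generator may sample the points differently.
void generate_vladi6_like(std::vector<std::vector<double>> &X, std::vector<std::vector<double>> &Y)
{
    X.clear();
    Y.clear();
    for (double x = 0.1; x < 6.; x += 0.5) {
        for (double y = 0.1; y < 6.; y += 0.5) {
            X.push_back({x, y});
            Y.push_back({6. * std::cos(x * std::sin(y))});
        }
    }
}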
Code:
#include <iomanip>
#include <vector>

#include <boost/algorithm/string.hpp>
#include <pagmo/algorithm.hpp>
#include <pagmo/io.hpp>
#include <pagmo/population.hpp>
#include <pagmo/problem.hpp>
#include <pagmo/utils/multi_objective.hpp>

#include <dcgp/algorithms/momes4cgp.hpp>
#include <dcgp/gym.hpp>
#include <dcgp/kernel_set.hpp>
#include <dcgp/problems/symbolic_regression.hpp>
#include <dcgp/symengine.hpp>

using namespace dcgp;
using namespace boost::algorithm;

int main()
{
    // We load the data (using problem vladi6 from the gym).
    std::vector<std::vector<double>> X, Y;
    gym::generate_vladi6(X, Y);

    // We instantiate a symbolic regression problem with one ephemeral constant.
    // The last argument (true) makes the problem multi-objective, so that the
    // formula complexity is returned as a second objective next to the loss.
    auto n_eph = 1u;
    symbolic_regression udp(X, Y, 1u, 15u, 16u, 2u, kernel_set<double>({"sum", "diff", "mul", "pdiv", "sin", "cos"})(),
                            n_eph, true);

    // We init a large population (100) of individuals.
    pagmo::population pop{udp, 100u};

    // We here use the Multi-Objective Memetic Evolutionary Strategy, an original algorithm provided in this dCGP
    // project. We instantiate it with 250 generations and a maximum of 4 active mutations per individual.
    dcgp::momes4cgp uda{250u, 4u, 1e-8};
    pagmo::algorithm algo{uda};
    algo.set_verbosity(10u);

    // We solve the problem.
    pop = algo.evolve(pop);

    // Finally we print on screen the non-dominated front.
    pagmo::print("\nNon dominated Front at the end:\n");
    auto ndf = pagmo::non_dominated_front_2d(pop.get_f());
    for (decltype(ndf.size()) i = 0u; i < ndf.size(); ++i) {
        auto idx = ndf[i];
        auto prettier = udp.prettier(pop.get_x()[idx]);
        trim_left_if(prettier, is_any_of("["));
        trim_right_if(prettier, is_any_of("]"));
        pagmo::print(std::setw(2), i + 1, " - Loss: ", std::setw(13), std::left, pop.get_f()[idx][0], std::setw(15),
                     "Complexity: ", std::left, std::setw(5), pop.get_f()[idx][1], std::setw(10), "Formula: ", prettier,
                     "\n");
    }
    return 0;
}
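Since the UDP is constructed with its multi-objective flag set to true, each fitness vector holds two values: the MSE loss first and the formula complexity second; the printing loop above relies on exactly this ordering. As a quick sanity check, here is a small sketch assuming the udp and pop objects from the listing are still in scope:

    // Sketch: the fitness of any decision vector is {loss, complexity}.
    // pop.get_f()[0] gives the same values; going through pagmo::problem::fitness
    // just makes the two objectives explicit.
    pagmo::problem prob{udp};
    auto f = prob.fitness(pop.get_x()[0]);
    pagmo::print("Loss: ", f[0], " Complexity: ", f[1], "\n");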
Output:
Note: the actual output will differ on your machine, since the evolution is not deterministic.
 Gen:    Fevals:    Best loss:  Ndf size:   Compl.:
    0          0       4.10239         11        62
   10       1000       1.42703          7        74
   20       2000      0.828554          8        45
   30       3000      0.374803         13        78
   40       4000      0.164032         16        66
   50       5000       0.03552         14        48
   60       6000     0.0200792         11        45
   70       7000             0         10        19
   80       8000             0         10        19
   90       9000             0         10        19
  100      10000             0         11        19
  110      11000             0         10        18
  120      12000             0         11        18
  130      13000             0         11        18
  140      14000             0         11        18
  150      15000             0         11        18
  160      16000             0         11        18
  170      17000             0         11        18
  180      18000             0         11        18
  190      19000             0         11        18
  200      20000             0         11        18
  210      21000             0         11        18
  220      22000             0         11        18
  230      23000             0         11        18
  240      24000             0         10        18
  250      25000             0         10        18
Exit condition -- max generations = 250
Non dominated Front at the end:
 1 - Loss: 0              Complexity: 18     Formula: c1*cos(x0*sin(x1))
 2 - Loss: 0.844272       Complexity: 17     Formula: 3*c1 - 2*x0 - 2*x0*x1
 3 - Loss: 1.10197        Complexity: 15     Formula: 4*c1 - 3*x0 - x0*x1
 4 - Loss: 1.17331        Complexity: 14     Formula: 3*c1 - 3*x0 - 2*x1
 5 - Loss: 1.27379        Complexity: 10     Formula: c1 - 3*x0 - x1
 6 - Loss: 1.92403        Complexity: 9      Formula: 2*c1 - 3*x0
 7 - Loss: 1.92403        Complexity: 7      Formula: c1 - 3*x0
 8 - Loss: 3.22752        Complexity: 5      Formula: c1 - x0
 9 - Loss: 4.74875        Complexity: 2      Formula: c1
10 - Loss: 4.8741         Complexity: 1      Formula: 4
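Having the whole front available, the final model choice is left to the user. A common heuristic (not part of the tutorial code) is to keep the least complex formula whose loss is within some tolerance of the best loss found. A minimal sketch, assuming udp, pop and ndf from the listing above are still in scope and using an arbitrary tolerance:

    // Find the lowest loss on the non-dominated front.
    auto best = ndf[0];
    for (auto idx : ndf) {
        if (pop.get_f()[idx][0] < pop.get_f()[best][0]) best = idx;
    }
    // Keep the least complex formula whose loss is within tol of that best loss.
    double tol = 1e-2; // arbitrary choice
    auto chosen = best;
    for (auto idx : ndf) {
        if (pop.get_f()[idx][0] <= pop.get_f()[best][0] + tol
            && pop.get_f()[idx][1] < pop.get_f()[chosen][1]) {
            chosen = idx;
        }
    }
    pagmo::print("Chosen formula: ", udp.prettier(pop.get_x()[chosen]), "\n");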