Find an exact model including one parameter using a memetic approach

In this second tutorial we show how to find a model for our input data when we also want to learn some constants.

Constants can, in general, be learned via two main techniques:
  1. evolutionary (common and standard practice in GP)

  2. memetic (original with dCGP)

The difference is that the evolutionary approach can only approximate the values of the constants, while the memetic approach can recover them exactly. In this tutorial we follow the memetic approach 2. We use the problem P1 from the dcgp::gym, that is x^5 - pi*x^3 + x.

Code:

#include <vector>

#include <boost/algorithm/string.hpp>
#include <pagmo/algorithm.hpp>
#include <pagmo/io.hpp>
#include <pagmo/population.hpp>
#include <pagmo/problem.hpp>

#include <dcgp/algorithms/mes4cgp.hpp>
#include <dcgp/gym.hpp>
#include <dcgp/kernel_set.hpp>
#include <dcgp/problems/symbolic_regression.hpp>
#include <dcgp/symengine.hpp>

using namespace dcgp;
using namespace boost::algorithm;

int main()
{
    // We load the data (using problem P1 from the gym)
    std::vector<std::vector<double>> X, Y;
    gym::generate_P1(X, Y);

    // We instantiate a symbolic regression problem with only one ephemeral constant.
    auto n_eph = 1u;
    symbolic_regression udp(X, Y, 1u, 15u, 16u, 2u, kernel_set<double>({"sum", "diff", "mul", "pdiv"})(), n_eph);

    // We init a population with four individuals.
    pagmo::population pop{udp, 4u};

    // We instantiate the memetic solver, setting a maximum of 1000 generations,
    // 4 active mutations and an exit tolerance on the loss of 1e-8.
    dcgp::mes4cgp uda(1000u, 4u, 1e-8);
    pagmo::algorithm algo{uda};
    algo.set_verbosity(50u);

    // We solve
    pop = algo.evolve(pop);

    // We print on screen the best found
    auto idx = pop.best_idx();
    auto prettier = udp.prettier(pop.get_x()[idx]);
    trim_left_if(prettier, is_any_of("["));
    trim_right_if(prettier, is_any_of("]"));

    pagmo::print("\nBest fitness: ", pop.get_f()[idx], "\n");
    pagmo::print("Chromosome: ", pop.get_x()[idx], "\n");
    pagmo::print("Pretty Formula: ", udp.pretty(pop.get_x()[idx]), "\n");
    pagmo::print("Prettier Formula: ", prettier, "\n");
    pagmo::print("Expanded Formula: ", SymEngine::expand(SymEngine::Expression(prettier)), "\n");
    return 0;
}

Output:

Note: the actual output is non-deterministic. Sometimes, with a bit of luck :), the problem is solved exactly (the loss goes to zero). The following output reports one of these occurrences. Note that with a traditional evolutionary approach this result would be incredibly hard to obtain.

Gen:        Fevals:          Best:   Constants:      Formula:
   0              0        4009.59   [-2.41031]      [(c1 + x0)*x0] ...
  50            200       0.978909   [0.238557]      [x0**2*c1*(-x0 + x0**4) - (c1 + x0 + x0* ...
 100            400        0.84565   [0.240548]      [x0**2*c1*(-x0 + x0**4) - (c1 + 2*x0)] ...
 150            600       0.761757   [0.240032]      [-2*x0 + x0**2*c1*(-x0 + x0**4)] ...
 200            800      0.0170582   [-1.16484]      [(-x0 + x0**3)*(c1 + x0**2) - x0**3] ...
 250           1000      0.0170582   [-0.164837]     [(-x0 + x0**3)*(c1 + x0**2) - (-x0 + 2*x ...
 300           1200      0.0170582   [-0.164837]     [(-x0 + x0**3)*(c1 + x0**2) - (-x0 + 2*x ...
 350           1400      0.0170582   [-0.164837]     [(-x0 + x0**3)*(c1 + x0**2) - (-x0 + 2*x ...
 357           1428    2.17578e-29   [-1.14159]      [x0**3*(c1 + x0**2) - (-x0 + 2*x0**3)] ...
Exit condition -- ftol < 1e-08

Best fitness: [2.17578e-29]
Chromosome: [-1.14159, 2, 0, 0, 2, ... ]
Pretty Formula: [(((c1+(x0*x0))*((x0*x0)*x0))-(((x0*x0)*x0)+(((x0*x0)*x0)-x0)))]
Prettier Formula: x0**3*(c1 + x0**2) - (-x0 + 2*x0**3)
Expanded Formula: x0 + x0**3*c1 - 2*x0**3 + x0**5