Symbolic Regression (UDP)
-
class symbolic_regression
A Symbolic Regression problem.
Symbolic regression is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity (ref: https://en.wikipedia.org/wiki/Symbolic_regression). It is also one of the core applications of Differentiable Cartesian Genetic Programming.
This class provides an easy way to instantiate symbolic regression problems as optimization problems having a continuous part (i.e. the value of the parameters in the model) and an integer part (i.e. the representation of the model computational graph). The instantiated object can be used as UDP (User Defined Problem) in the pagmo optimization suite.
The symbolic regression problem can be instantiated as either a single-objective or a two-objective problem. In the latter case, the formula complexity is considered as a second objective in addition to the Mean Squared Error.
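As an illustration of the continuous/integer split described above, the following sketch separates a decision vector into its two parts. It relies only on the pagmo convention that the integer components are the last get_nix() entries of the decision vector. The helper split_chromosome is hypothetical, written for this example, and is not part of dcgp or pagmo.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not part of dcgp or pagmo): splits a mixed-integer
// decision vector into its continuous head (the model parameters) and its
// integer tail (the encoding of the computational graph).
// Assumption: the last `nix` components are the integer ones.
std::pair<std::vector<double>, std::vector<long long>>
split_chromosome(const std::vector<double> &x, std::size_t nix)
{
    if (nix > x.size()) {
        throw std::invalid_argument("nix exceeds the decision vector size");
    }
    const std::size_t ncx = x.size() - nix; // size of the continuous part
    const auto split = x.begin() + static_cast<std::ptrdiff_t>(ncx);
    std::vector<double> constants(x.begin(), split);
    std::vector<long long> genes;
    genes.reserve(nix);
    for (auto it = split; it != x.end(); ++it) {
        genes.push_back(static_cast<long long>(*it));
    }
    assert(constants.size() + genes.size() == x.size());
    return {constants, genes};
}
```

With a UDP at hand, nix would be the value returned by get_nix().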
Public Functions
-
inline symbolic_regression()
Default constructor.
A default constructor is needed by the pagmo UDP interface, but it should not be used directly. It constructs a points list and a labels list, each containing one empty vector, and a dummy cgp member. It is guaranteed that m_points[0] and m_labels[0] can be accessed.
-
inline symbolic_regression(const std::vector<std::vector<double>> &points, const std::vector<std::vector<double>> &labels, unsigned r = 1u, unsigned c = 10u, unsigned l = 11u, unsigned arity = 2u, std::vector<kernel<double>> f = kernel_set<double>({"sum", "diff", "mul", "pdiv"})(), unsigned n_eph = 0u, bool multi_objective = false, unsigned parallel_batches = 0u, std::string loss_s = "MSE", unsigned seed = random_device::next())
Constructor.
Constructs a symbolic_regression optimization problem compatible with the pagmo UDP interface.
- Parameters
points – [in] input data.
labels – [in] output data.
r – [in] number of rows of the dCGP.
c – [in] number of columns of the dCGP.
l – [in] number of levels-back allowed in the dCGP.
arity – [in] arity of the basis functions.
f – [in] function set. A std::vector of dcgp::kernel<expression::type>.
n_eph – [in] number of ephemeral constants.
multi_objective – [in] when true, it will consider the model complexity as a second objective.
parallel_batches – [in] number of parallel batches.
loss_s – [in] loss type as string, either “MSE” or “CE”.
seed – [in] seed used for the random engine.
- Throws
std::invalid_argument – if points and labels are not consistent.
std::invalid_argument – if the CGP related parameters (i.e. r, c, etc…) are malformed.
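One plausible reading of the consistency requirement on points and labels can be sketched as follows. The helper check_dataset is hypothetical, and the exact checks performed by the constructor may differ. The sketch assumes that both containers must be non-empty and of equal length, and that all points (and all labels) share the same dimension.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of dcgp): throws std::invalid_argument when
// the dataset is not consistent, under the assumptions stated above.
void check_dataset(const std::vector<std::vector<double>> &points,
                   const std::vector<std::vector<double>> &labels)
{
    if (points.empty()) {
        throw std::invalid_argument("the dataset is empty");
    }
    if (points.size() != labels.size()) {
        throw std::invalid_argument("points and labels differ in number");
    }
    for (std::size_t i = 0u; i < points.size(); ++i) {
        // Every point must have the dimension of the first point.
        if (points[i].size() != points[0].size()) {
            throw std::invalid_argument("points have different dimensions");
        }
        // Every label must have the dimension of the first label.
        if (labels[i].size() != labels[0].size()) {
            throw std::invalid_argument("labels have different dimensions");
        }
    }
}
```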
-
inline pagmo::vector_double::size_type get_nobj() const
Number of objectives.
Returns the number of objectives.
- Returns
the number of objectives.
-
inline pagmo::vector_double fitness(const pagmo::vector_double &x) const
Fitness computation.
Computes the fitness for this UDP.
- Parameters
x – the decision vector.
- Returns
the fitness of x.
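The shape of the returned fitness vector can be sketched with the standard library alone. This is not the dcgp implementation: model_t, mean_squared_error and fitness_sketch are hypothetical names, the error is averaged over all points and all output components (an assumption), and the complexity value is supplied by the caller because how it is measured is not specified here.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

using point_t = std::vector<double>;
// Hypothetical stand-in for the model encoded in a decision vector.
using model_t = std::function<point_t(const point_t &)>;

// Mean of the squared errors over all points and all output components.
double mean_squared_error(const model_t &model, const std::vector<point_t> &points,
                          const std::vector<point_t> &labels)
{
    if (points.empty() || points.size() != labels.size()) {
        throw std::invalid_argument("points and labels are not consistent");
    }
    double sum = 0.;
    std::size_t count = 0u;
    for (std::size_t i = 0u; i < points.size(); ++i) {
        const point_t prediction = model(points[i]);
        if (prediction.size() != labels[i].size()) {
            throw std::invalid_argument("prediction and label sizes differ");
        }
        for (std::size_t j = 0u; j < prediction.size(); ++j) {
            const double err = prediction[j] - labels[i][j];
            sum += err * err;
            ++count;
        }
    }
    if (count == 0u) {
        throw std::invalid_argument("no output components to compare");
    }
    return sum / static_cast<double>(count);
}

// One entry (the loss) in the single-objective case, two entries
// (loss, complexity) in the two-objective case.
std::vector<double> fitness_sketch(const model_t &model, const std::vector<point_t> &points,
                                   const std::vector<point_t> &labels, bool multi_objective,
                                   double complexity)
{
    std::vector<double> f{mean_squared_error(model, points, labels)};
    if (multi_objective) {
        f.push_back(complexity);
    }
    return f;
}
```

The length of the vector returned by fitness_sketch plays the role of get_nobj(): one in the single-objective case, two otherwise.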
-
inline pagmo::vector_double gradient(const pagmo::vector_double &x) const
Gradient computation.
Computes the gradient of the loss with respect to the ephemeral constants (i.e. the continuous part of the chromosome).
- Parameters
x – the decision vector.
- Returns
the gradient in x.
-
inline pagmo::sparsity_pattern gradient_sparsity() const
Sparsity pattern (gradient)
Returns the sparsity pattern of the gradient. The sparsity pattern is dense in the continuous part of the chromosome. This results from assuming that all ephemeral constants actually appear in the expression; for those that do not, zeros are returned.
- Returns
the gradient sparsity pattern.
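A dense pattern over the continuous part can be sketched as below. The alias mirrors the shape of pagmo::sparsity_pattern (a vector of index pairs), while dense_gradient_sparsity is a hypothetical helper. The sketch assumes a single objective and that the n_eph ephemeral constants occupy the first n_eph components of the decision vector.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Same shape as pagmo::sparsity_pattern: (fitness index, variable index).
using sparsity_pattern = std::vector<std::pair<std::size_t, std::size_t>>;

// Hypothetical helper: one entry (0, j) for each ephemeral constant j,
// i.e. the loss is assumed to depend on every continuous component.
sparsity_pattern dense_gradient_sparsity(std::size_t n_eph)
{
    sparsity_pattern sp;
    sp.reserve(n_eph);
    for (std::size_t j = 0u; j < n_eph; ++j) {
        sp.emplace_back(0u, j);
    }
    assert(sp.size() == n_eph);
    return sp;
}
```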
-
inline std::vector<pagmo::vector_double> hessians(const pagmo::vector_double &x) const
Hessian computation.
Computes the hessian of the loss with respect to the ephemeral constants (i.e. the continuous part of the chromosome).
- Parameters
x – the decision vector.
- Returns
the hessian in x.
-
inline std::vector<pagmo::sparsity_pattern> hessians_sparsity() const
Sparsity pattern (hessian)
Returns the sparsity pattern of the hessian. The sparsity pattern is dense in the continuous part of the chromosome. This results from assuming that all ephemeral constants actually appear in the expression; for those that do not, zeros are returned.
- Returns
the hessian sparsity pattern.
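The corresponding dense pattern for the hessian can be sketched in the same spirit. The helper dense_hessians_sparsity is hypothetical. The sketch assumes a single objective and follows the pagmo convention of listing only the lower triangular entries (i, j) with j <= i, since the hessian is symmetric.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Same shape as pagmo::sparsity_pattern: (row index, column index).
using sparsity_pattern = std::vector<std::pair<std::size_t, std::size_t>>;

// Hypothetical helper: dense lower triangular pattern over the n_eph
// ephemeral constants, returned as a one-element vector (one objective).
std::vector<sparsity_pattern> dense_hessians_sparsity(std::size_t n_eph)
{
    sparsity_pattern sp;
    for (std::size_t i = 0u; i < n_eph; ++i) {
        for (std::size_t j = 0u; j <= i; ++j) {
            sp.emplace_back(i, j);
        }
    }
    // A dense lower triangle of size n has n (n + 1) / 2 entries.
    assert(sp.size() == n_eph * (n_eph + 1u) / 2u);
    return std::vector<sparsity_pattern>(1u, sp);
}
```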
-
inline std::pair<pagmo::vector_double, pagmo::vector_double> get_bounds() const
Box-bounds.
Returns the box-bounds for this UDP.
- Returns
the lower and upper bounds for each of the decision vector components.
-
inline pagmo::vector_double::size_type get_nix() const
Integer dimension.
Returns the integer dimension of the problem.
- Returns
the integer dimension of the problem.
-
inline std::string get_name() const
Problem name.
Returns a string containing the problem name.
- Returns
a string containing the problem name.
-
inline std::string get_extra_info() const
Extra info.
- Returns
a string containing extra problem information.
-
inline std::string pretty(const pagmo::vector_double &x) const
Human-readable representation of a decision vector.
A human-readable representation of the chromosome is obtained by calling expression::operator() directly, using \(x_1, x_2, ...\) as input variable names and \(c_1, c_2, ...\) as ephemeral constant names.
- Parameters
x – [in] a valid chromosome.
- Returns
a string containing the mathematical expression represented by x.
-
inline std::string prettier(const pagmo::vector_double &x) const
Human-readable representation of a decision vector.
A human-readable representation of the chromosome is obtained by using symengine on the output of expression::operator(), with \(x_1, x_2, ...\) as input variable names and \(c_1, c_2, ...\) as ephemeral constant names.
- Parameters
x – [in] a valid chromosome.
- Returns
a string containing the mathematical expression represented by x.
-
inline const expression<double> &get_cgp() const
Gets the inner CGP.
Access to the inner CGP is offered in the public interface to allow evolve methods in UDAs to reuse the same object and perform mutations via it. This is a hack to interface pagmo with dCGP. Alternatives would be a friendship relation (uughhh) or constructing a new CGP object within each evolve call (seems expensive). So here it is, FOR USE ONLY IN udas::evolve methods.
-
inline void set_cgp(const pagmo::vector_double &x) const
Sets the inner CGP.
The access to the inner CGP is offered in the public interface to allow evolve methods in UDAs to reuse the same object and perform mutations via it.
-
inline std::vector<double> predict(const std::vector<double> &point, pagmo::vector_double x) const
Model predictions.
Uses the model encoded in x to predict the label of point.
- Parameters
point – [in] point to be predicted.
x – [in] chromosome encoding the model.
- Returns
the predicted label for point.
-
inline std::vector<std::vector<double>> predict(const std::vector<std::vector<double>> &points, pagmo::vector_double x) const
Model predictions.
Uses the model encoded in x to predict the labels of points.
- Parameters
points – [in] points to be predicted.
x – [in] chromosome encoding the model.
- Returns
the predicted labels for points.
-
inline pagmo::thread_safety get_thread_safety() const
Thread safety for this UDP.
This is set to none, as pythonic kernels could be in the inner expression.
-
inline void set_phenotype_correction(typename expression<double>::pc_fun_type pc, typename expression<audi::gdual_v>::pc_fun_type dpc)
Sets the phenotype correction.
Sets the phenotype correction for both the internal cgp and dcgp. No checks are made and the user must take care that the two functions passed are actually computing the same quantity using different types (double and gdual_v).
- Parameters
pc – callable to be applied to correct a double expression.
dpc – callable to be applied to correct a gdual_v expression.
-
inline void unset_phenotype_correction()
Unsets the phenotype correction.