Symbolic Regression (UDP)
- class dcgpy.symbolic_regression(points, labels, rows=1, cols=16, levels_back=17, arity=2, kernels, n_eph=0, multi_objective=False, parallel_batches=0, loss="MSE")
Symbolic regression is a type of regression analysis that searches the space of mathematical expressions for the model that best fits a given dataset, in terms of both accuracy and simplicity (ref: https://en.wikipedia.org/wiki/Symbolic_regression). It is also one of the applications of Differentiable Cartesian Genetic Programming.
This class provides an easy way to instantiate symbolic regression problems as optimization problems with a continuous part (the values of the ephemeral constants in the model) and an integer part (the encoding of the model's computational graph). The instantiated object can be used as a UDP (User Defined Problem) in the pygmo optimization suite.
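To make the integer/continuous split concrete, here is a minimal pure-Python sketch of how such a decision vector could be decoded and evaluated. It assumes rows = 1, a single output, a uniform arity, and a gene layout of one function gene followed by the connection genes for each column, then one output gene; it also assumes the ephemeral constants are addressed right after the inputs. `decode` and `KERNELS` are hypothetical helpers, not part of dcgpy, which performs this decoding internally.

```python
# Hypothetical kernel table (index -> callable). dcgpy uses its own kernel
# objects; plain Python functions stand in for them here.
KERNELS = [lambda a, b: a + b,   # 0: "sum"
           lambda a, b: a - b]   # 1: "diff"


def decode(int_genes, eph_values, n_in, cols, arity=2):
    """Build a callable from a rows=1 chromosome with a single output.

    Assumed gene layout: [f, c_1, ..., c_arity] for each column, then one
    output gene. Addresses 0..n_in-1 are the inputs, the next len(eph_values)
    addresses are the ephemeral constants, and column j has address
    n_in + len(eph_values) + j.
    """
    n_terminals = n_in + len(eph_values)

    def evaluate(x):
        values = list(x) + list(eph_values)  # terminal nodes come first
        for j in range(cols):
            base = j * (arity + 1)
            f = int_genes[base]
            conns = int_genes[base + 1: base + 1 + arity]
            # Feed-forward check: a node may only read earlier addresses.
            if any(c >= n_terminals + j for c in conns):
                raise ValueError(f"column {j} reads a later node")
            values.append(KERNELS[f](*(values[c] for c in conns)))
        return values[int_genes[cols * (arity + 1)]]

    return evaluate


# Integer part: two "sum" nodes; continuous part: one ephemeral constant.
# Node 2 = x + c, node 3 = node2 + node2, and the output gene reads node 3.
program = decode([0, 0, 1, 0, 2, 2, 3], eph_values=[2.0], n_in=1, cols=2)
print(program([3.0]))  # 2 * (3 + 2) -> 10.0
```

An optimizer only ever changes the two lists passed to `decode`: the integer genes rewire the graph, while the continuous genes tune the constants.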
The symbolic regression problem can be instantiated either as a single-objective or as a two-objective problem. In the latter case, the model complexity is considered as a second objective in addition to the chosen loss on the data.
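In the two-objective case each candidate is scored by a fitness vector (loss, complexity), so candidates are compared by Pareto dominance rather than by a single number. The sketch below shows that relation on hypothetical (loss, complexity) pairs; `dominates` and `pareto_front` are illustrative helpers, not part of dcgpy or pygmo, whose multi-objective algorithms perform this ranking themselves.

```python
def dominates(a, b):
    """True if fitness vector a Pareto-dominates b (every objective minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(fitnesses):
    """Keep the fitness vectors that no other vector dominates."""
    return [f for f in fitnesses
            if not any(dominates(g, f) for g in fitnesses)]


# Hypothetical (loss, complexity) pairs for five candidate models.
candidates = [(0.1, 12), (0.1, 8), (0.5, 3), (0.05, 20), (0.6, 5)]
print(pareto_front(candidates))  # [(0.1, 8), (0.5, 3), (0.05, 20)]
```

The surviving models trade accuracy for simplicity: none of them can be improved in one objective without getting worse in the other.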
Constructs a symbolic_regression optimization problem compatible with the pagmo UDP interface.
- Parameters
  points (2D NumPy float array or list of lists of float) – the input data
  labels (2D NumPy float array or list of lists of float) – the output data (to be predicted)
  rows (int) – number of rows in the Cartesian program
  cols (int) – number of columns in the Cartesian program
  levels_back (int) – number of levels-back in the Cartesian program
  arity (int or list) – arity of the kernels; when an int is given, the same arity is assumed for all columns
  kernels (List[dcgpy.kernel_]) – kernel functions
  n_eph (int) – number of ephemeral constants
  multi_objective (bool) – when True, the problem is treated as multi-objective (loss and model complexity)
  parallel_batches (int) – number of batches the data is split into for parallel evaluation
  loss (str) – loss type, one of "MSE" (mean squared error) or "CE" (cross entropy)
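The two loss options and the role of parallel_batches can be illustrated in plain Python. This is a sketch under assumptions: "MSE" is taken as the mean over all points and output components, "CE" as a softmax over each prediction row followed by categorical cross entropy against the labels, and batching is assumed to accumulate per-batch sums and normalise once, so the batched value matches the unbatched one. dcgpy's exact definitions may differ, and all function names here are hypothetical.

```python
import math


def sum_sq_err(preds, labels):
    """Sum of squared errors over every point and output component."""
    return sum((p - y) ** 2
               for p_row, y_row in zip(preds, labels)
               for p, y in zip(p_row, y_row))


def mse(preds, labels):
    """Mean squared error: the sum above divided by points * outputs."""
    return sum_sq_err(preds, labels) / (len(preds) * len(preds[0]))


def cross_entropy(preds, labels):
    """Softmax over each prediction row, then categorical cross entropy."""
    total = 0.0
    for p_row, y_row in zip(preds, labels):
        m = max(p_row)  # shift by the max for numerical stability
        log_z = m + math.log(sum(math.exp(p - m) for p in p_row))
        total -= sum(y * (p - log_z) for p, y in zip(p_row, y_row))
    return total / len(preds)


def batched_mse(preds, labels, n_batches):
    """Split the points into n_batches chunks, sum per chunk, normalise once."""
    size = math.ceil(len(preds) / n_batches)
    total = sum(sum_sq_err(preds[i:i + size], labels[i:i + size])
                for i in range(0, len(preds), size))
    return total / (len(preds) * len(preds[0]))


preds = [[1.0], [2.0], [3.0], [4.0]]
labels = [[1.5], [2.0], [2.0], [4.0]]
print(mse(preds, labels))                       # 0.3125
print(batched_mse(preds, labels, n_batches=2))  # 0.3125, same value
```

Because each chunk only needs its own slice of the data, the per-chunk sums are independent and could be computed in parallel before the final division.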
- Raises
unspecified – any exception thrown by failures at the intersection between C++ and Python (e.g., type conversion errors, mismatched function signatures, etc.)
Examples
>>> import dcgpy
>>> import pygmo as pg
>>> X, Y = dcgpy.generate_koza_quintic()
>>> udp = dcgpy.symbolic_regression(
...     points = X,
...     labels = Y,
...     rows = 1,
...     cols = 20,
...     levels_back = 21,
...     arity = 2,
...     kernels = dcgpy.kernel_set_double(["sum", "diff"])(),
...     n_eph = 1,
...     multi_objective = True,
...     parallel_batches = 0)
>>> prob = pg.problem(udp)
>>> print(prob)
Problem name: a CGP symbolic regression problem
    Global dimension: 62
    Integer dimension: 61
    Fitness dimension: 2
    Number of objectives: 2
    Equality constraints dimension: 0
    Inequality constraints dimension: 0
    Lower bounds: [-10, 0, 0, 0, 0, ... ]
    Upper bounds: [10, 1, 1, 1, 1, ... ]
    Has batch fitness evaluation: false
    Has gradient: true
    User implemented gradient sparsity: true
    Expected gradients: 1
    Has hessians: true
    User implemented hessians sparsity: true
    Expected hessian components: [1, 1]
    Fitness evaluations: 0
    Gradient evaluations: 0
    Hessians evaluations: 0
    Thread safety: basic
Extra info:
    Data dimension (in): 1
    Data dimension (out): 1
    Data size: 10
    Kernels: [sum, diff]
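The dimensions reported in the example follow from the constructor arguments. Assuming a uniform arity, each of the rows * cols nodes contributes one function gene plus arity connection genes, each output contributes one gene, and the continuous part holds one value per ephemeral constant. The hypothetical helper below (not a dcgpy function) reproduces the 62/61 split printed in the example: 1 * 20 * 3 + 1 = 61 integer genes, plus 1 constant.

```python
def udp_dimensions(rows, cols, arity, n_eph, n_out=1):
    """Return (global, integer) dimension of the decision vector.

    Assumes a uniform arity: every node carries one function gene plus
    `arity` connection genes, followed by one gene per output; the
    continuous part holds one value per ephemeral constant.
    """
    integer_dim = rows * cols * (arity + 1) + n_out
    return integer_dim + n_eph, integer_dim


print(udp_dimensions(rows=1, cols=20, arity=2, n_eph=1))  # (62, 61)
```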