expression_ann (dCGP-ANN)

This class represents an Artificial Neural Network Cartesian Genetic Program. Each node connection is associated with a weight and each node with a bias. Only a subset of the kernel functions is allowed, including the most used nonlinearities in ANN research: tanh, sig, ReLu, ELU and ISRU. The resulting expression can represent any feed-forward neural network, as well as other less obvious architectures. Weights and biases of the expression can be trained using the efficient backpropagation algorithm. gduals are not allowed for this class: they correspond to forward-mode automatic differentiation, which is very inefficient for deep networks.

A small artificial neural network represented using the dCGP-ANN approach.
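The idea that every connection carries a weight and every node a bias can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the helper name node_output is hypothetical, and tanh is picked arbitrarily among the allowed nonlinearities.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of dcgp): output of one node that applies
// tanh to the weighted sum of its inputs plus the node bias.
inline double node_output(const std::vector<double> &inputs,
                          const std::vector<double> &weights, double bias)
{
    assert(inputs.size() == weights.size()); // one weight per incoming connection
    double acc = bias;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        acc += weights[i] * inputs[i];
    }
    return std::tanh(acc);
}
```

Calling node_output({1.0, 2.0}, {0.5, -0.25}, 0.0) evaluates tanh(0.5 - 0.5), that is 0.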

class expression_ann : public dcgp::expression<double>

A dCGP-ANN expression.

This class represents an artificial neural network as a differentiable Cartesian Genetic Program. It adds weights, biases and backward automatic differentiation to the class dcgp::expression.

Public Types

enum class kernel_type

Allowed kernels (for backpropagation to work)

Values:

enumerator SIG: sigmoid

enumerator TANH: Hyperbolic tangent.

enumerator RELU: Rectified linear unit.

enumerator ELU: Exponential linear unit.

enumerator ISRU: ISRU.

enumerator SUM: Simple sum of inputs.

enumerator SIN_NU: non-unary sine

enumerator COS_NU: non-unary cosine

enumerator GAUSSIAN_NU: non-unary gaussian

enumerator INV_SUM: negative of the input sum

enumerator ABS: absolute value of inputs

enumerator STEP: step function
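For orientation, the common nonlinearities in the list above can be written out explicitly. The sketch below uses a hypothetical enum and function (activation, apply_activation) rather than the library's kernel_type, and it assumes the parameter alpha equals 1 for both ELU and ISRU; the library's own definitions may differ.

```cpp
#include <cassert>
#include <cmath>

// Illustrative stand-ins for five of the allowed nonlinearities.
// Assumption: alpha = 1 for both ELU and ISRU.
enum class activation { SIG, TANH, RELU, ELU, ISRU };

inline double apply_activation(activation a, double x)
{
    switch (a) {
        case activation::SIG:  return 1.0 / (1.0 + std::exp(-x));
        case activation::TANH: return std::tanh(x);
        case activation::RELU: return x > 0.0 ? x : 0.0;
        case activation::ELU:  return x > 0.0 ? x : std::exp(x) - 1.0;
        case activation::ISRU: return x / std::sqrt(1.0 + x * x);
    }
    assert(false && "unknown activation"); // unreachable for valid enum values
    return 0.0;
}
```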

Public Functions

inline expression_ann(unsigned n, unsigned m, unsigned r, unsigned c, unsigned l, std::vector<unsigned> arity, std::vector<kernel<double>> f, unsigned seed = dcgp::random_device::next())

Constructor.

Constructs a dCGP-ANN expression.

Parameters

n – [in] number of inputs (independent variables).
m – [in] number of outputs (dependent variables).
r – [in] number of rows of the cartesian representation of the network as an acyclic graph.
c – [in] number of columns of the cartesian representation of the network as an acyclic graph.
l – [in] number of levels-back allowed. This essentially controls the minimum number of allowed operations in the network. If uncertain, set it to c + 1.
arity – [in] arities of the basis functions for each column.
f – [in] function set. An std::vector of dcgp::kernel<expression::type>. Can only contain allowed functions.
seed – [in] seed for the random number generator (initial expression and mutations depend on this).

inline expression_ann(unsigned n = 1u, unsigned m = 1u, unsigned r = 1u, unsigned c = 1u, unsigned l = 1u, unsigned arity = 2u, std::vector<kernel<double>> f = kernel_set<double>({"sum"})(), unsigned seed = dcgp::random_device::next())

Constructor.

Constructs a dCGP-ANN expression.

Parameters

n – [in] number of inputs (independent variables).
m – [in] number of outputs (dependent variables).
r – [in] number of rows of the cartesian representation of the network as an acyclic graph.
c – [in] number of columns of the cartesian representation of the network as an acyclic graph.
l – [in] number of levels-back allowed. This essentially controls the minimum number of allowed operations in the network. If uncertain, set it to c + 1.
arity – [in] uniform arity for all basis functions.
f – [in] function set. An std::vector of dcgp::kernel<expression::type>. Can only contain allowed functions.
seed – [in] seed for the random number generator (initial expression and mutations depend on this).
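How r, c and l interact is easier to see with numbers. The sketch below assumes the usual Cartesian Genetic Programming layout: ids 0 to n-1 are the inputs, grid nodes are numbered column by column, and a node may connect to the previous l columns, plus the inputs when fewer than l columns precede it. The helper connection_range is hypothetical and only illustrates that convention.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: inclusive range [lo, hi] of node ids that a node in the
// 0-based column `col` may use as input, given n inputs, r rows and levels-back l.
// Assumes ids 0..n-1 are the inputs and grid nodes are numbered column by column.
inline std::pair<unsigned, unsigned> connection_range(unsigned n, unsigned r,
                                                      unsigned col, unsigned l)
{
    assert(n > 0u && r > 0u && l > 0u);
    const unsigned hi = n + r * col - 1u;                    // last node of the previous column
    const unsigned lo = (col >= l) ? n + r * (col - l) : 0u; // inputs reachable when col < l
    return {lo, hi};
}
```

With n = 2 and r = 2, a node in column 2 can reach ids 4 to 5 when l = 1, and ids 0 to 5 when l = 3. Under this convention, setting l to c + 1 lets every column reach the inputs.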

inline virtual std::vector<double> operator()(const std::vector<double> &point) const override

Evaluates the dCGP-ANN expression.

This evaluates the dCGP-ANN expression. This method overrides the base class method. Note: this function and the following one cannot be templated, as they are virtual in the base class.

Parameters: point – [in] an std::vector containing the values where the dCGP-ANN expression has to be computed
Returns: The value of the output (an std::vector)

inline virtual std::vector<std::string> operator()(const std::vector<std::string> &point) const override

Evaluates the dCGP-ANN expression.

This evaluates the dCGP-ANN expression. This method overrides the base class method.

Parameters: point – [in] an std::vector containing the symbol names.
Returns: The symbolic value of the output (an std::vector)

template<typename U, enable_double_string<U> = 0> inline std::vector<U> operator()(const std::initializer_list<U> &point) const

Evaluates the dCGP-ANN expression.

This evaluates the dCGP-ANN expression. This template can be instantiated with U being double, in which case the method computes the numerical value of the output, or with U being a string, in which case the instantiated method produces a symbolic representation of the output.

Parameters: point – [in] an initializer list containing the values where the dCGP-ANN expression has to be computed (doubles or strings)
Returns: The value of the output (an std::vector)

inline void d_loss(double &value, std::vector<double> &gweights, std::vector<double> &gbiases, const std::vector<double> &point, const std::vector<double> &prediction, const expression<double>::loss_type loss_e) const

Accumulates the loss and its gradient (for a single point)

Accumulates the loss and its gradient with respect to weights and biases. The values are added to the arguments value, gweights and gbiases. If called in a loop over many data points, it accumulates the total batch values.

Parameters

[value] – The initial loss
[gweights] – The initial loss gradient w.r.t. weights
[gbiases] – The initial loss gradient w.r.t. biases
[point] – The input data (single point)
[prediction] – The predicted output (single point)
[loss_e] – The loss type. Must be loss_type::MSE for Mean Square Error (regression) or loss_type::CE for Cross Entropy (classification)
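The accumulate-into-arguments pattern described above can be sketched on a much simpler model. The example assumes a single linear neuron with a squared-error loss; the function accumulate_loss and its signature are hypothetical and are not the library's d_loss.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single linear neuron y = w.x + b with squared-error loss.
// Adds this point's loss and gradient to the running totals passed by reference.
inline void accumulate_loss(double &value, std::vector<double> &gweights, double &gbias,
                            const std::vector<double> &weights, double bias,
                            const std::vector<double> &point, double label)
{
    assert(point.size() == weights.size() && gweights.size() == weights.size());
    double y = bias;
    for (std::size_t i = 0; i < point.size(); ++i) {
        y += weights[i] * point[i];
    }
    const double err = y - label;
    value += err * err;                    // loss contribution of this point
    for (std::size_t i = 0; i < point.size(); ++i) {
        gweights[i] += 2.0 * err * point[i]; // d(err^2)/dw_i
    }
    gbias += 2.0 * err;                    // d(err^2)/db
}
```

Calling it once per data point, starting from zeroed totals, leaves the batch loss and batch gradient in value, gweights and gbias.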

inline std::tuple<double, std::vector<double>, std::vector<double>> d_loss(const std::vector<std::vector<double>> &points, const std::vector<std::vector<double>> &labels, expression<double>::loss_type loss_e, unsigned parallel = 0u)

Evaluates the loss and its gradient (on a batch)

Returns the loss and its gradient with respect to weights and biases.

Parameters

[points] – The input data (a batch).
[labels] – The predicted outputs (a batch).
[loss_e] – The loss type. Must be loss_type::MSE for Mean Square Error (regression) or loss_type::CE for Cross Entropy (classification)
[parallel] – sets the grain for parallelism: 0 means no parallelism, n divides the data into n parts and processes them in parallel threads.

Returns

the loss, the gradient of the loss w.r.t. all weights (inactive ones included) and the gradient of the loss w.r.t. all biases.

inline double sgd(std::vector<std::vector<double>> &points, std::vector<std::vector<double>> &labels, double lr, unsigned batch_size, const std::string &loss_s, unsigned parallel = 0u, bool shuffle = true)

Stochastic gradient descent.

Performs one “epoch” of stochastic gradient descent using the loss selected by loss_s.

Parameters

[points] – The input data (a batch). Will be randomly shuffled (with labels) after a call to sgd.
[labels] – The predicted outputs (a batch). Will be randomly shuffled (with points) after a call to sgd.
[lr] – The learning rate.
[batch_size] – The batch size.
[loss_s] – A string defining the loss type. Can be one of “MSE” (mean squared error) or “CE” (cross-entropy)
[parallel] – sets the grain for parallelism: 0 means no parallelism, n divides the data into n parts and processes them in parallel threads.
[shuffle] – when true it shuffles the points and labels before performing one epoch of training.

Throws

std::invalid_argument – if the data and label sizes do not match or are zero, or if lr is not positive.

Returns

The average error across the batches. Note: this will not be equal to the error on the whole data set, as weights get updated after each batch. It is an indicator, though, and it is free to compute.
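Why the returned value is only an indicator becomes clear from a sketch of one epoch. The example below assumes a one-parameter model y = w * x with a squared-error loss and omits shuffling; sgd_epoch is a hypothetical function, not the library's sgd.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model y = w * x fitted with squared error.
// Runs one epoch of mini-batch gradient descent (no shuffling) and
// returns the mean of the per-batch losses.
inline double sgd_epoch(double &w, const std::vector<double> &xs,
                        const std::vector<double> &ys, double lr, std::size_t batch_size)
{
    assert(xs.size() == ys.size() && !xs.empty() && batch_size > 0u && lr > 0.0);
    double loss_sum = 0.0;
    std::size_t n_batches = 0u;
    for (std::size_t start = 0u; start < xs.size(); start += batch_size) {
        const std::size_t end = std::min(start + batch_size, xs.size());
        double loss = 0.0, grad = 0.0;
        for (std::size_t i = start; i < end; ++i) {
            const double err = w * xs[i] - ys[i];
            loss += err * err;
            grad += 2.0 * err * xs[i];
        }
        const double count = static_cast<double>(end - start);
        loss_sum += loss / count;  // loss measured before this batch's update
        w -= lr * grad / count;    // the weight changes after every batch
        ++n_batches;
    }
    return loss_sum / static_cast<double>(n_batches);
}
```

Each batch loss is measured with a different value of w, so their mean differs from the loss of the final w on the whole data set.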

inline void set_output_f(const std::string &name)

Sets the output nonlinearities.

Sets the nonlinearities of all nodes connected to the output nodes. This is useful when, for example, the dCGP-ANN is used for a regression task where output values are expected in [-1, 1] and hence the output layer should have some sigmoid or tanh nonlinearity.

Parameters: name – [in] the name of the kernel (nonlinearity)
Throws: std::invalid_argument – if name is invalid.

inline unsigned n_active_weights(bool unique = false) const

Computes the number of weights influencing the result.

Computes the number of weights influencing the result. This will also be the number of weights that are updated when calling sgd. The number of active weights, as well as the number of active nodes, define the complexity of the expression expressed by the chromosome.

Parameters: unique – [in] when true weights are counted only once if connecting the same two nodes.
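One way to picture the count is a backward walk from the outputs that marks every node influencing the result. The sketch below is an assumption about how such a count can be obtained, not the library's code: the graph encoding and the function count_active_weights are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical graph encoding: ids 0..n-1 are inputs and conns[k] lists the
// source ids feeding node n + k. Sources always have a smaller id (acyclic graph).
inline unsigned count_active_weights(unsigned n,
                                     const std::vector<std::vector<unsigned>> &conns,
                                     const std::vector<unsigned> &outputs,
                                     bool unique = false)
{
    std::vector<bool> active(n + conns.size(), false);
    for (unsigned out : outputs) {
        assert(out < active.size());
        active[out] = true;
    }
    unsigned count = 0u;
    // Visit nodes from last to first so that activity propagates backwards.
    for (std::size_t id = active.size(); id-- > n;) {
        if (!active[id]) continue;
        std::set<unsigned> seen; // sources already counted for this node
        for (unsigned src : conns[id - n]) {
            assert(src < id);
            active[src] = true;
            if (!unique || seen.insert(src).second) ++count;
        }
    }
    return count;
}
```

With two inputs and nodes 2 = {0, 1}, 3 = {0, 0}, 4 = {2, 2}, an output taken from node 4 gives 4 active weights, or 3 when duplicate connections between the same two nodes are counted once.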

inline void set_weight(unsigned node_id, unsigned input_id, const double &w)

Sets a weight.

Sets a connection weight to a new value

Parameters

node_id – [in] the id of the node whose weight is being set (convention adopted for node numbering http://ppsn2014.ijs.si/files/slides/ppsn2014-tutorial3-miller.pdf)
input_id – [in] the id of the node input (0 for the first one up to arity-1)
w – [in] the new value of the weight

Throws

std::invalid_argument – if the node_id or input_id are not valid
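The relation between the (node_id, input_id) addressing and a flat weight index can be sketched as follows. This assumes a uniform arity and weights stored node by node; the helper weight_index and the parameter n_nodes are hypothetical, and the library's actual storage layout may differ, in particular when arities vary per column.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical mapping from (node_id, input_id) to a flat weight index.
// Assumes n inputs (ids 0..n-1), n_nodes grid nodes and the same arity for all.
inline std::size_t weight_index(unsigned n, unsigned n_nodes, unsigned arity,
                                unsigned node_id, unsigned input_id)
{
    if (node_id < n || node_id >= n + n_nodes) {
        throw std::invalid_argument("node_id out of range");
    }
    if (input_id >= arity) {
        throw std::invalid_argument("input_id out of range");
    }
    const std::size_t idx = static_cast<std::size_t>(node_id - n) * arity + input_id;
    assert(idx < static_cast<std::size_t>(n_nodes) * arity); // postcondition
    return idx;
}
```

Input nodes carry no weights in this sketch, which is why a node_id smaller than n is rejected.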

inline void set_weight(std::vector<double>::size_type idx, const double &w)

Sets a weight.

Sets a connection weight to a new value

Parameters

[idx] – index of the weight to be changed.
[w] – value of the weight to be changed.

Throws

std::invalid_argument – if the node_id or input_id are not valid

inline void set_weights(const std::vector<double> &ws)

Sets all weights.

Sets all the connection weights at once

Parameters: ws – [in] an std::vector containing all the weights to set
Throws: std::invalid_argument – if the input vector dimension is not valid.

inline double get_weight(unsigned node_id, unsigned input_id) const

Gets a weight.

Gets the value of a connection weight

Parameters

node_id – [in] the id of the node (convention adopted for node numbering http://ppsn2014.ijs.si/files/slides/ppsn2014-tutorial3-miller.pdf)
input_id – [in] the id of the node input (0 up to node arity-1)

Throws

std::invalid_argument – if the node_id or input_id are not valid

Returns

the value of the weight

inline double get_weight(std::vector<double>::size_type idx) const

Gets a weight.

Gets the value of a connection weight

Parameters: idx – [in] index of the weight

inline const std::vector<double> &get_weights() const

Gets the weights.

Gets the values of all the weights.

Returns: an std::vector containing all the weights

inline void randomise_weights(double mean = 0, double std = 0.1, std::random_device::result_type seed = random_number)

Randomises all weights.

Set all weights to a normally distributed number

Parameters

[mean] – the mean of the normal distribution.
[std] – the standard deviation of the normal distribution.
[seed] – the seed to generate the new weights (by default it is randomly generated).
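Seeded, normally distributed initialisation can be sketched with the standard library alone. The function randomise below is hypothetical, and the choice of std::mt19937 as engine is an assumption; the point is only that a fixed seed reproduces the same values.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: overwrites every entry with a draw from N(mean, std_dev^2).
// The engine (std::mt19937) is an assumption made for this sketch.
inline void randomise(std::vector<double> &values, double mean, double std_dev, unsigned seed)
{
    assert(std_dev > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> dist(mean, std_dev);
    for (double &v : values) {
        v = dist(rng);
    }
}
```

Passing the same seed twice yields identical vectors, which is useful for reproducible experiments.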

inline void set_bias(typename std::vector<double>::size_type idx, const double &w)

Sets a bias.

Sets a node bias to a new value

Parameters

[idx] – index of the bias to be changed.
[w] – value of the new bias.

inline void set_biases(const std::vector<double> &bs)

Sets all biases.

Sets all the node biases at once

Parameters: bs – [in] an std::vector containing all the biases to set
Throws: std::invalid_argument – if the input vector dimension is not valid (r*c)

inline double get_bias(typename std::vector<double>::size_type idx) const

Gets a bias.

Gets the value of a bias

Parameters: idx – [in] index of the bias

inline const std::vector<double> &get_biases() const

Gets the biases.

Gets the values of all the biases

Returns: an std::vector containing all the biases

inline void randomise_biases(double mean = 0, double std = 0.1, std::random_device::result_type seed = random_number)

Randomises all biases.

Set all biases to a normally distributed number

Parameters

mean – [in] the mean of the normal distribution
std – [in] the standard deviation of the normal distribution
seed – [in] the seed to generate the new biases (by default it is randomly generated)

template<typename Archive> inline void serialize(Archive &ar, unsigned)

Object serialization.

This method will save this object into, or load it from, the archive ar.

Parameters: ar – target archive.
Throws: unspecified – any exception thrown by the serialization of the expression and of primitive types.

Friends

inline friend std::ostream &operator<<(std::ostream &os, const expression_ann &d)

Overloaded stream operator.

Writes a formatted, human-readable representation of the expression to the stream os.

Returns: the stream os, with a human-readable representation of the expression written to it.