{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multi-objective memetic approach\n", "\n", "In this third tutorial we consider an example with two dimensional input data and we approach its solution using a multi-objective approach where, aside the loss, we consider the formula complexity as a second objective.\n", "\n", "We will use a memetic approach to learn the model parameters while evolution will shape the model itself.\n", "\n", "Eventually you will learn:\n", "\n", " * How to instantiate a multi-objective symbolic regression problem.\n", " \n", " * How to use a memetic multi-objective approach to find suitable models for your data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Some necessary imports.\n", "import dcgpy\n", "import pygmo as pg\n", "# Sympy is nice to have for basic symbolic manipulation.\n", "from sympy import init_printing\n", "from sympy.parsing.sympy_parser import *\n", "init_printing()\n", "# Fundamental for plotting.\n", "from matplotlib import pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1 - The data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# We load our data from some available ones shipped with dcgpy.\n", "# In this particular case we use the problem sinecosine from the paper:\n", "# Vladislavleva, Ekaterina J., Guido F. Smits, and Dick Den Hertog.\n", "# \"Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic\n", "# programming.\" IEEE Transactions on Evolutionary Computation 13.2 (2008): 333-349. \n", "X, Y = dcgpy.generate_sinecosine()\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from mpl_toolkits.mplot3d import Axes3D \n", "# And we plot them as to visualize the problem.\n", "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection='3d')\n", "_ = ax.scatter(X[:,0], X[:,1], Y[:,0])\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2 - The symbolic regression problem" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# We define our kernel set, that is the mathematical operators we will\n", "# want our final model to possibly contain. What to choose in here is left\n", "# to the competence and knowledge of the user. A list of kernels shipped with dcgpy \n", "# can be found on the online docs. The user can also define its own kernels (see the corresponding tutorial).\n", "ss = dcgpy.kernel_set_double([\"sum\", \"diff\", \"mul\", \"sin\", \"cos\"])" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\tData dimension (points): 2\n", "\tData dimension (labels): 1\n", "\tData size: 30\n", "\tKernels: [sum, diff, mul, sin, cos]\n", "\n" ] } ], "source": [ "# We instantiate the symbolic regression optimization problem\n", "# Note how we specify to consider one ephemeral constant via\n", "# the kwarg n_eph. 
We also request 100 kernels with a linear \n", "# layout (this allows for the construction of longer expressions) and\n", "# we set levels_back to 101 (in an attempt to skew the search towards\n", "# simple expressions)\n", "udp = dcgpy.symbolic_regression(\n", " points = X, labels = Y, kernels=ss(), \n", " rows = 1, \n", " cols = 100, \n", " n_eph = 1, \n", " levels_back = 101,\n", " multi_objective=True)\n", "prob = pg.problem(udp)\n", "print(udp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3 - The search algorithm" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# We instantiate here the evolutionary strategy we want to use to\n", "# search for models. Being memetic, momes4cgp mutates only the model structure,\n", "# while the ephemeral constants are learned by a local search step\n", "# (so, unlike es4cgp, there is no *learn_constants* kwarg to set).\n", "uda = dcgpy.momes4cgp(gen = 250, max_mut = 4)\n", "algo = pg.algorithm(uda)\n", "algo.set_verbosity(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4 - The search" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# We use a population of 100 individuals\n", "pop = pg.population(prob, 100)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Here is where we run the actual evolution. Note that the screen output\n", "# will show in the terminal (not in your Jupyter notebook, in case \n", "# you are using it). 
Note that you may have to run this a few times before \n", "# solving the problem entirely.\n", "pop = algo.evolve(pop)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5 - Inspecting the non dominated front" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Compute here the non dominated front.\n", "ndf = pg.non_dominated_front_2d(pop.get_f())" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Loss: Model: \n", "\n", "1.6049416203226965e-36 | c1*(x1*cos(x1) + cos(x1)) + 2*c1 + 6*cos(x0*sin(x1)) |\n", "1.4444474582904268e-35 | c1*x0 - 2*c1 + 6*cos(x0*sin(x1)) |\n", "1.3000027124613843e-34 | c1*x0 + 2*c1 + 6*cos(x0*sin(x1)) |\n", " 0.8559137162832793 | sin(x1) + 5*cos(x0*sin(x1)) |\n", " 3.04427756327168 | 2*c1*x1*cos(x0*sin(x1)) |\n", " 4.714418293710785 | 3*cos(x0*sin(x1)) |\n", " 8.875932935300025 | 4*cos(c1 + x0) + 1 |\n", " 9.493068363220251 | 5*cos(c1 + x0) |\n", " 13.422370193371659 | 2*c1 - 2*x0 |\n", " 13.42237019337166 | c1 - 2*x0 |\n", " 13.486758301564212 | c1 - x0 |\n", " 15.41066772551229 | 2 - x0 |\n", " 18.679277437831498 | c1 |\n", " 18.85767317484314 | 0 |\n", " 18.85767317484314 | 0 |\n" ] } ], "source": [ "# Inspect the front and print the proposed expressions.\n", "print(\"{: >20} {: >30}\".format(\"Loss:\", \"Model:\"), \"\\n\")\n", "for idx in ndf:\n", "    x = pop.get_x()[idx]\n", "    f = pop.get_f()[idx]\n", "    a = parse_expr(udp.prettier(x))[0]\n", "    print(\"{: >20} | {: >30}\".format(str(f[0]), str(a)), \"|\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Lets have a look to the non dominated fronts in the final population.\n", "ax = pg.plot_non_dominated_fronts(pop.get_f())\n", "_ = plt.xlabel(\"loss\")\n", "_ = plt.ylabel(\"complexity\")\n", "_ = plt.title(\"Non dominate fronts\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 6 - Lets have a look to the log content\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Here we get the log of the latest call to the evolve\n", "log = algo.extract(dcgpy.momes4cgp).get_log()\n", "gen = [it[0] for it in log]\n", "loss = [it[2] for it in log]\n", "compl = [it[4] for it in log]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# And here we plot, for example, the generations against the best loss\n", "_ = plt.plot(gen, loss)\n", "_ = plt.title('last call to evolve')\n", "_ = plt.xlabel('generations')\n", "_ = plt.ylabel('loss')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }