{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Training a FFNN in dCGPANN vs. Keras (classification)\n", "\n", "A feed-forward neural network (FFNN) is a widely used ANN model for regression and classification. Here we show how to encode one into a dCGPANN and train it with stochastic gradient descent on a classification task. To check the correctness of the result, we perform the same training with the widely used Keras deep learning toolbox." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Initial imports\n", "import dcgpy\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from tqdm import tqdm\n", "from sklearn.utils import shuffle\n", "import timeit\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data set" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# We import the data for a classification task.\n", "from numpy import genfromtxt\n", "# https://archive.ics.uci.edu/ml/datasets/Abalone\n", "my_data = genfromtxt('abalone_data_set.csv', delimiter=',')\n", "points = my_data[:,:-1]\n", "labels_tmp = my_data[:,-1]\n", "\n", "# We bin the target (last column) into three classes (< 9, 9 to 10, > 10)\n", "# and transform it to a one-hot encoding\n", "labels = np.zeros((len(labels_tmp), 3))\n", "for i,l in enumerate(labels_tmp):\n", "    if l < 9:\n", "        labels[i][0] = 1\n", "    elif l > 10:\n", "        labels[i][2] = 1\n", "    else:\n", "        labels[i][1] = 1\n", "\n", "# And split the data into training and test sets\n", "X_train = points[:3000]\n", "Y_train = labels[:3000]\n", "X_test = points[3000:]\n", "Y_test = labels[3000:]\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Stable implementation of the softmax function\n", "def softmax(x):\n", "    \"\"\"Compute the softmax values for the set of scores in x.\"\"\"\n", "    e_x = np.exp(x - np.max(x))\n", "    return e_x / e_x.sum()\n", 
"\n", "# We define the accuracy metric\n", "def accuracy(ex, points, labels):\n", "    acc = 0.\n", "    for p,l in zip(points, labels):\n", "        ps = softmax(ex(p))\n", "        if np.argmax(ps) == np.argmax(l):\n", "            acc += 1.\n", "    return acc / len(points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Encoding and training a FFNN using dCGP\n", "\n", "There are many ways the same FFNN could be encoded into a CGP chromosome. The utility *encode_ffnn* selects one for you and returns the corresponding expression." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting error: 2.655116669472566\n", "Net complexity (number of active weights): 1460\n", "Net complexity (number of unique active weights): 1460\n", "Net complexity (number of active nodes): 81\n" ] } ], "source": [ "# We encode a FFNN into a dCGP expression. Note that the last layer uses a sum activation function\n", "# so that categorical cross entropy can be used, effectively producing a softmax last layer.\n", "# In a dCGP the concept of layers is absent and neurons are defined by activation functions R->R.\n", "dcgpann = dcgpy.encode_ffnn(8,3,[50,20],[\"sig\", \"sig\", \"sum\"], 5)\n", "\n", "# By default all weights (and biases) are set to 1 (and 0). 
We initialize them from a normal distribution\n", "dcgpann.randomise_weights(mean = 0., std = 1.)\n", "dcgpann.randomise_biases(mean = 0., std = 1.)\n", "\n", "\n", "print(\"Starting error:\", dcgpann.loss(X_test,Y_test, \"CE\"))\n", "print(\"Net complexity (number of active weights):\", dcgpann.n_active_weights())\n", "print(\"Net complexity (number of unique active weights):\", dcgpann.n_active_weights(unique=True))\n", "print(\"Net complexity (number of active nodes):\", len(dcgpann.get_active_nodes()))\n", "\n", "#dcgpann.visualize(show_nonlinearities=True, legend=True)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Start error (training set): 2.582368721500477\n", "Start error (test): 2.655116669472566\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 100/100 [00:03<00:00, 29.44it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "End error (training set): 0.722099548734631\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "End error (test): 0.7430558265559517\n", "Time: 3.3997908270102926\n" ] } ], "source": [ "res = []\n", "\n", "# We train for n_epochs epochs (learning rate 1., mini-batches of 32 points)\n", "n_epochs = 100\n", "print(\"Start error (training set):\", dcgpann.loss(X_train,Y_train, \"CE\"), flush=True)\n", "print(\"Start error (test):\", dcgpann.loss(X_test,Y_test, \"CE\"), flush=True)\n", "\n", "start_time = timeit.default_timer()\n", "for i in tqdm(range(n_epochs)):\n", "    res.append(dcgpann.sgd(X_train, Y_train, 1., 32, \"CE\", parallel = 4))\n", "elapsed = timeit.default_timer() - start_time\n", "\n", "print(\"End error (training set):\", dcgpann.loss(X_train,Y_train, \"CE\"), flush=True)\n", "print(\"End error (test):\", dcgpann.loss(X_test,Y_test, \"CE\"), flush=True)\n", "print(\"Time:\", elapsed, flush=True)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": 
{}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy (test): 0.6508071367884452\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(res)\n", "print(\"Accuracy (test): \", accuracy(dcgpann, X_test, Y_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The same training using Keras (TensorFlow backend)\n", "IMPORTANT: no GPU is used for the comparison. The timings should thus be taken only as an indication of the performance in a simple environment with 4 CPUs." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] } ], "source": [ "import keras\n", "from keras.models import Sequential\n", "from keras.layers import Dense, Activation\n", "from keras import optimizers\n", "\n", "# We define Stochastic Gradient Descent as the optimizer\n", "sgd = optimizers.SGD(lr=1.)\n", "# We define the weight and bias initialization\n", "initializerw = keras.initializers.RandomNormal(mean=0.0, stddev=1, seed=None)\n", "initializerb = keras.initializers.RandomNormal(mean=0.0, stddev=1, seed=None)\n", "\n", "model = Sequential([\n", "    Dense(50, input_dim=8, kernel_initializer=initializerw, bias_initializer=initializerb),\n", "    Activation('sigmoid'),\n", "    Dense(20, kernel_initializer=initializerw, bias_initializer=initializerb),\n", "    Activation('sigmoid'),\n", "    Dense(3, kernel_initializer=initializerw, bias_initializer=initializerb),\n", "    Activation('softmax'),\n", "])\n", "model.compile(optimizer=sgd,\n", "              loss='categorical_crossentropy', metrics=['acc'])\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "End error (training set): [0.7097578446865082, 0.671999990940094]\n", "End error (test): [0.7359503806396978, 0.6525063514709473]\n", "Time: 9.497980389976874\n" ] } ], "source": [ "start_time = timeit.default_timer()\n", "history = model.fit(X_train, Y_train, epochs=100, batch_size=32, 
verbose=False)\n", "elapsed = timeit.default_timer() - start_time\n", "# model.evaluate returns [loss, accuracy]\n", "print(\"End error (training set):\", model.evaluate(X_train,Y_train, verbose=False))\n", "print(\"End error (test):\", model.evaluate(X_test,Y_test, verbose=False))\n", "print(\"Time:\", elapsed)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# For comparison, we plot the cross entropy loss during training in the two cases\n", "plt.plot(history.history['loss'], label='Keras')\n", "plt.plot(res, label='dCGP')\n", "plt.title('dCGP vs Keras')\n", "plt.xlabel('epochs')\n", "plt.legend()\n", "_ = plt.ylabel('Cross Entropy Loss')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }