Answers to the 4th demonstration exercises of the neural networks course
Marko Gronroos, Christian Lehtinen

1. Classes and class operators

1.1. Unit

class Unit {
protected:
    float output, error;

output is the output value of the unit and error is the calculated error.

    floatARR weights, weight_deltas, last_delta;

weights is a float array that contains the weights of the unit's input connections, and last_delta contains the deltas of the respective weights from the previous iteration. 'floatARR' is a parameterized (generic) array of floating-point items. The bias (threshold) is the last item in the weights array. weight_deltas is used when the use_allonce flag is set; it accumulates the deltas of the weight array.

public:
    void make (int n);

make is the constructor of the Unit class. The use of parameterized array classes prevents the use of a normal constructor. The parameter is the number of source units that connect to the unit.

    void init ();

init initializes the unit by setting its weights to random values. This could be done in the constructor, but is kept separate for genericity.

};

ARRAY(Unit)

This creates a parameterized class called 'UnitARR', which is used by the Layer and BPN classes.

1.2. Layer

class Layer {
protected:
    UnitARR units;

units is the array of units in the layer. Should be self-explanatory.

    int transf_func;

transf_func is a flag that indicates which transfer function is used in this layer. Two transfer functions are implemented: TF_SIGMOID and TF_LINEAR. The flag is set with the function BPN::lsize and can be seen in the layer printout as the letter 'S' (sigmoid) or 'L' (linear). For efficiency reasons, only the last layer can be made linear in this implementation.

public:
    void make (int n, int prevn);

make constructs the layer. The parameter 'n' is the number of units in the layer and 'prevn' is the number of units in the previous layer.

    void init ();

init initializes all the units in the layer.

    ostream& print (ostream& out);

Prints the layer to the given output stream.

    void propagate (Layer& previous, int use_bias);

Propagates the signal from the given layer. The given layer MUST be of the size that was given in the constructor. The second parameter is a flag that tells whether to use the bias or not.

    float compouterr (floatARR& desired);

Computes the error at the last layer, and must be used only for the last layer. The parameter is a float array that contains the desired output for the propagated pattern.

    void errorprop (Layer& upper);

Computes the error for any layer except the last one. The parameter is a reference to the next upper layer, from which the error is propagated backwards.

    void nulldeltaweights ();

Sets the weight deltas to 0.

};
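To make the roles of weights, the bias and transf_func concrete, the following is a minimal sketch of the computation a unit presumably performs when a layer propagates. It is not the course code itself: the names prev_out, n and linear are illustrative only, and the logistic form of the sigmoid is an assumption.

    #include <math.h>

    /* Forward pass of one unit: weighted sum of the previous layer's
       outputs, plus the bias stored as the last weight, passed through
       the chosen transfer function. */
    float unit_output (const float* prev_out, const float* weights,
                       int n, int use_bias, int linear)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++)        /* weighted sum of inputs */
            sum += weights[i] * prev_out[i];
        if (use_bias)
            sum += weights[n];             /* bias is the last weight */
        if (linear)
            return sum;                    /* TF_LINEAR */
        return 1.0f / (1.0f + (float) exp (-sum));   /* TF_SIGMOID, assumed logistic */
    }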
1.3. BPN

class BPN {
protected:
    LayerARR layers;
    float alpha, eta;

'layers' contains the layers of the network; alpha and eta are the learning parameters of the network.

public:
    int use_bias;
    int use_allonce;

use_bias is a flag that indicates whether (true) or not (false) the bias value should be used in the calculations. Yes, it is public, and it is used by setting it directly to 0 or 1. The default is true. The use_allonce flag indicates whether or not to use the learning method where all patterns are scanned before the weights are changed. The default is false.

    BPN (int n, float alpha, float eta);

The constructor of the entire network. The first parameter, 'n', is the number of layers in the network. The other parameters are the learning parameters of the network.

    void init ();

Initializes the network. This MUST be called AFTER the sizes of the layers have been set.

    void lsize (int n, int siz);
    void lsize (int n, int siz, int tfunc);

lsize sets the size (second parameter) of the given layer. The optional third parameter is the transfer function (TF_SIGMOID or TF_LINEAR); it may be used only for the last layer, and it can be omitted if the default, TF_SIGMOID, is wanted. This function MUST be called for each layer of the network.

    ostream& print (ostream& out);

Prints the output values of all the layers to the given output stream.

    void propagate ();

Propagates the signal at the input layer to the output layer.

    float errorprop (floatARR& desired);

Backpropagates the errors between the propagated signal and the given desired output pattern back to the input layer. Does not adjust the weights.

    void nulldeltaweights ();

Sets the weight deltas to 0.

    void adjweights ();

Adjusts the weights of the network.

    void train (TrainSet& set, int iterations);

Trains the network with the given training set for the given number of iterations.

    void test (TrainSet& set, int number);

Tests the network with one training pattern of the given training set, selected by the second parameter.

    void save (char* filename);

Saves the weight matrix of the network into a file.

    void load (char* filename);

Loads the weight matrix of the network from a file. Observe that the network size MUST be the same as the size in the saved file.

};

1.4. TrainSet

class TrainSet {
private:
    floatARRARR inputs, outputs;

inputs and outputs are matrices that contain the corresponding input-output training patterns (of type floatARR).

public:
    TrainSet (int patts, int ins, int outs, char* filename);

patts is the number of training patterns, and ins and outs are the input and output sizes of the patterns. The last parameter is the name of the file that contains the patterns.

    void print (ostream& out);

Prints the training set to the given output stream.

    void mutate (int errcnt);

Mutates the inputs of the training set by the given number of errors. The number of errors equals the Hamming distance between the normal and the mutated patterns.

};

2. Usage

2.1. Training patterns

The training patterns should be written to a separate file so that the input pattern comes first, then the output pattern. Values are separated by spaces, tab characters or linefeeds. The number of values in the file must match patterns*(inputs + outputs), where patterns is the number of training patterns.

Example: xor.tra

    0 0  0
    0 1  1
    1 0  1
    1 1  0

2.2. Network definition

First we construct the network by defining the number of layers it has and its learning parameters:

    BPN aNetwork (number of layers, alpha, eta);

Then we set the size of each of the layers. Here we assume a three-layered network.

    aNetwork.lsize (1, number of input units);
    aNetwork.lsize (2, number of hidden units);
    aNetwork.lsize (3, number of output units, TF_LINEAR or TF_SIGMOID);

The definition of the transfer function of the last layer can be omitted if the desired transfer function is sigmoid. This must be done before initialization. Now the structure of the network has been defined. Next we have to initialize the weights, etc. This step is mandatory.

    aNetwork.init ();

If one does not wish to use the bias (threshold) value, one should switch the optional use_bias flag off:

    aNetwork.use_bias = 0;

If one wishes to use the learning method where all the patterns are propagated before the weights are changed, one can set this flag to true:

    aNetwork.use_allonce = 1;
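As a concrete instance of the above recipe, the definition of a 2-4-1 network for the XOR problem of section 2.1 (the same setup as in exercise 3.3 below) would read:

    BPN aNetwork (3, 0.5, 0.2);   // 3 layers, alpha = 0.5, eta = 0.2
    aNetwork.lsize (1, 2);        // input layer: 2 units
    aNetwork.lsize (2, 4);        // hidden layer: 4 units
    aNetwork.lsize (3, 1);        // output layer: 1 unit, default TF_SIGMOID
    aNetwork.init ();             // must come after all the lsize calls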
2.3. Training

The input patterns are stored in a TrainSet. The file size MUST match the parameters of this constructor call.

    TrainSet atrainset (number of patterns in the file, number of inputs in each pattern, number of outputs in each pattern, "filename.tra");

Then we train the network with the training set, here for 2000 iterations. The train operation prints the current total squared error of the network to the standard output during the training, so that one can follow the learning interactively. It also prints out the test for each pattern after every 50 iterations.

    aNetwork.train (atrainset, 2000);

After the training we usually wish to save the network:

    aNetwork.save ("filename.wgh");

2.4. Testing

First we load a training set or a test set that we wish to apply to the network. If we don't have a trained network in memory, we can load its weights from a file with the load operation.

    TrainSet atrainset (patterns, inputs, outputs, "filename.tra");
    aNetwork.load ("filename.wgh");

Then we can test the network with a selected pattern of the training set. The test operator doesn't automatically print the results, so we have to do it by hand. If we wish to do the testing for all the patterns in the set, we must do it with a for-loop.

    aNetwork.test (atrainset, number of a pattern in the TrainSet);
    aNetwork.print (cout);

If we have only a training set and we wish to easily create a test set, we can do it with the mutate operation of the TrainSet class. It automatically makes random changes to all the patterns in the training set. Observe that it depends on the random number generator of the machine architecture. For example, in Unix it always generates the same errors in every execution, unless the srand (int seed) function is run to change the random number seed.

    atrainset.mutate (4);

The training set can be printed out with the print operator.
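Putting section 2.4 together, a complete testing session for the XOR set could look like the following sketch. The srand/time seeding is the standard C idiom for getting different mutations on every run; it is not part of the course library, and the file name "xor.wgh" is only an example.

    #include <stdlib.h>
    #include <time.h>

    TrainSet atrainset (4, 2, 1, "xor.tra");
    aNetwork.load ("xor.wgh");          // weights saved by an earlier training run

    srand ((unsigned) time (NULL));     // vary the random number seed
    atrainset.mutate (1);               // Hamming distance 1 from the originals

    for (int i = 1; i <= 4; i++) {      // test every pattern in the set
        aNetwork.test (atrainset, i);
        aNetwork.print (cout);
    }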
3. Exercises

3.1. Character recognition with no bias

Training:

    BPN verkko (3, 0.2, 0.1);
    verkko.lsize (1, 35);
    verkko.lsize (2, 20);
    verkko.lsize (3, 36);
    verkko.init ();
    verkko.use_bias = 0;
    TrainSet syotteet (36, 7*5, 36, "merkit.tra");
    verkko.train (syotteet, 10000);
    verkko.save ("3.1.wgh");

Testing:

    TrainSet syotteet (36, 7*5, 36, "merkit.tra");
    verkko.load ("3.1.wgh");
    syotteet.mutate (4);
    for (int i = 1; i <= 36; i++) {
        verkko.test (syotteet, i);
        verkko.print (cout);
    }

which is run with a varying number of errors (here 4).

3.2. Character recognition with no bias and linear output layer

Training:

    BPN verkko (3, 0.2, 0.1);
    verkko.lsize (1, 35);
    verkko.lsize (2, 20);
    verkko.lsize (3, 36, TF_LINEAR);
    verkko.init ();
    verkko.use_bias = 0;
    TrainSet syotteet (36, 7*5, 36, "merkit.tra");
    verkko.train (syotteet, 10000);
    verkko.save ("3.2.wgh");

Testing as in 3.1, except that with our parameters the network didn't learn well enough for it to be sensible to run the test with any errors.

3.3. XOR with no bias

    BPN verkko (3, 0.5, 0.2);
    verkko.lsize (1, 2);
    verkko.lsize (2, 4);
    verkko.lsize (3, 1);
    verkko.init ();
    verkko.use_bias = 0;
    TrainSet syotteet (4, 2, 1, "xor.tra");
    verkko.train (syotteet, 10000);

With this setup (4 hidden neurons) the network learned the patterns in 3000 iterations.

3.4.

As in 3.1, except

    verkko.use_allonce = 1;

It learned very slowly and poorly. After 2000 iterations the error was about 13 (alpha = 0.5, eta = 0.2). Only about half of the characters were recognized, while the others showed values of about 0.00 in the output layer.

3.5.

Training:

    BPN verkko (3, 0.7, 0.2);
    (network sizes as in 3.1)
    verkko.use_bias = 1;
    TrainSet syotteet (36, 7*5, 36, "merkit.tra");
    verkko.train (syotteet, 10000);
    verkko.save ("3.5.wgh");

Learns the patterns adequately in about 1000 iterations. Testing as in 3.1.
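For reference, the 3.5 run assembled into one complete program. The header name "bpn.h" is an assumption (the answers never name the header file), and the output layer is taken to use the default sigmoid transfer function, which the answer does not state explicitly.

    #include "bpn.h"               // assumed header for BPN and TrainSet

    int main ()
    {
        BPN verkko (3, 0.7, 0.2);
        verkko.lsize (1, 35);      // sizes as in 3.1
        verkko.lsize (2, 20);
        verkko.lsize (3, 36);      // default TF_SIGMOID assumed here
        verkko.init ();
        verkko.use_bias = 1;

        TrainSet syotteet (36, 7*5, 36, "merkit.tra");
        verkko.train (syotteet, 10000);
        verkko.save ("3.5.wgh");
        return 0;
    }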