
Meta-Genetic Programming: Co-evolving the Operators of Variation - Bruce Edmonds

4. Preliminary Test Results


The main practical questions to be resolved for MGP are whether the technique leads to improved results and, if so, whether those improvements are worth the additional computational cost. In making this comparison, a straight count of the number of evaluations of the base population is unfair, because when using MGP one must also evaluate the operator population. Thus if both algorithms have a base population of 600, and MGP additionally has an operator population of 200, MGP's execution time is increased by a factor of about (600 + 200)/600, i.e. roughly 1.33. This is rough and ready, but it gives a reasonably fair comparison that reflects computational times in practice.
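As a concrete illustration of this cost adjustment, the following is a minimal sketch (the function name and interface are my own, not from the paper) that puts raw base-population evaluation counts onto a comparable effective-cost axis:

```python
def effective_cost(base_evals, base_pop=600, op_pop=0):
    """Scale raw base-population evaluations by the extra per-generation
    work of evaluating the operator population (if any)."""
    return base_evals * (base_pop + op_pop) / base_pop

# Best GP run reported below: 55,800 base evaluations, no operator population.
print(effective_cost(55_800))              # 55800.0
# Best MGP run reported below: 28,800 base evaluations plus 200 operators.
print(effective_cost(28_800, op_pop=200))  # 38400.0
```

On this adjusted scale the best MGP run reported below is still cheaper than the best GP run (38,400 vs. 55,800 effective evaluations), consistent with the measured execution times.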

I chose a problem that was deliberately straightforward, but not too easy for a simple GP algorithm: the even-4-parity problem, allowing only the function nodes AND, OR and NOT plus terminal nodes for the four inputs, with base populations of 600. The fitness function was the number of cases (out of the total of 16) that the gene got correct, minus a slight penalty for depth which never exceeds one (so it can never outweigh a correct case). The initial population was randomly generated with a uniform depth of 7.
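A minimal sketch of this base fitness function follows. The nested-tuple tree representation and the particular penalty formula are my own illustrative assumptions; the paper specifies only that the penalty stays below one.

```python
from itertools import product

def evaluate(node, env):
    """Recursively evaluate a Boolean expression tree such as
    ("AND", ("NOT", "x1"), "x2") against an input assignment."""
    if isinstance(node, str):                 # terminal: one of x1..x4
        return env[node]
    op, *args = node
    if op == "NOT":
        return not evaluate(args[0], env)
    if op == "AND":
        return evaluate(args[0], env) and evaluate(args[1], env)
    if op == "OR":
        return evaluate(args[0], env) or evaluate(args[1], env)
    raise ValueError(op)

def depth(node):
    if isinstance(node, str):
        return 1
    return 1 + max(depth(child) for child in node[1:])

def parity4(bits):
    """Even-4-parity target: true iff an even number of inputs are true."""
    return sum(bits) % 2 == 0

def fitness(tree, max_depth=15):
    """Correct cases out of 16, minus a depth penalty that stays below 1
    (the exact penalty formula here is assumed, not taken from the paper)."""
    correct = 0
    for bits in product([False, True], repeat=4):
        env = dict(zip(["x1", "x2", "x3", "x4"], bits))
        if evaluate(tree, env) == parity4(bits):
            correct += 1
    penalty = min(depth(tree) / (max_depth + 1), 0.99)
    return correct - penalty
```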

The MGP algorithm had, in addition, a population of 200 operators using the nodes up1, up2, down, down1 and subs with the terminals bott1 and bott2 (as described in section 2 above). This in turn was operated on by a population of 9 cut-and-graft operators (as illustrated in figure 3) and one simple propagation operator, all with fixed equal weights (as illustrated in figure 13). This population was also initially randomly generated, with a uniform depth of 6. The fitness function for this population proved critical to the success of MGP. The basic fitness was the average proportionate change an operator effected in the fitnesses of the base genes it operated upon. Several elaborations on this basic function were necessary: the fitnesses were smoothed by a running average over the last 3 generations; newly created operators were accorded a fitness of 1; and operators that had no effect on the fitness of the genes they acted on (i.e. operators that were essentially propagation operators) were heavily discounted, since otherwise these quickly came to dominate the population, most `active' operators having a fitness of less than one.
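The sketch below captures these elaborations under stated assumptions: the class shape, the DISCOUNT value, the ratio interpretation of "proportionate change", and the before/after interface are all mine; only the three rules themselves come from the description above.

```python
from collections import deque

DISCOUNT = 0.1   # heavy discount for pure-propagation operators (assumed value)
WINDOW = 3       # smooth over the last 3 generations, per the text

class OperatorFitness:
    def __init__(self):
        self.history = deque(maxlen=WINDOW)

    def record_generation(self, before_after_pairs):
        """before_after_pairs: (before, after) fitnesses of the base genes
        this operator was applied to during one generation."""
        if not before_after_pairs:
            return
        changes = [after / before for before, after in before_after_pairs
                   if before > 0]
        raw = sum(changes) / len(changes) if changes else 1.0
        # Operators that never change a gene's fitness behave like pure
        # propagation; discount them so they cannot dominate the population.
        if all(after == before for before, after in before_after_pairs):
            raw *= DISCOUNT
        self.history.append(raw)

    def fitness(self):
        if not self.history:
            return 1.0               # new operators start at fitness 1
        return sum(self.history) / len(self.history)
```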

The MGP version had, in addition, two further modifications that were found necessary to make it run successfully. Firstly, a depth limit of 15 was imposed, as the MGP algorithm is susceptible to hyper-inflation (potentially far worse than in a GP algorithm, since an operator with multiple subs operations can be evolved). Secondly, in the MGP algorithm fitness differentials were vastly magnified by a ranking mechanism. This was necessary in the operator population because of the small differences in operator fitnesses, but also because MGP introduces a higher level of variation into the base population, so a stronger selective force is needed. (In the GP algorithm a shifted fitness-proportionate system was used, as GP performed worse under a ranking system.)
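To make the contrast concrete, here is a minimal sketch of the two selection schemes; the cubic rank weighting is an assumed stand-in for the paper's unspecified magnification, and the shifting scheme is likewise an illustrative choice.

```python
import random

def rank_select(population, fitness, power=3.0):
    """Rank-based selection: weight depends only on rank, so even tiny
    absolute fitness differences translate into strong selection pressure."""
    ranked = sorted(population, key=fitness)
    weights = [(i + 1) ** power for i in range(len(ranked))]
    return random.choices(ranked, weights=weights, k=1)[0]

def shifted_proportionate_select(population, fitness):
    """Shifted fitness-proportionate selection, as used for the GP runs:
    weights scale with absolute fitness differences above a baseline."""
    fits = [fitness(x) for x in population]
    shift = min(fits)
    weights = [f - shift + 1e-9 for f in fits]   # epsilon keeps weights valid
    return random.choices(population, weights=weights, k=1)[0]
```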

Both algorithms were implemented in the high-level declarative language SDML, with some time-critical sections implemented in VisualWorks/Smalltalk. The execution times are both for the same Sun SPARCstation 5. In figure 5 the average fitness and the fitness of the best individual are shown against execution time; the dotted lines are plus and minus one standard deviation of the population fitnesses, and the crosses indicate when the generations occurred.


Figure 5: Best and average fitness vs. execution time for MGP run


Figure 6: Best and average fitness vs. number of evaluations for MGP run

Thus in the best run of the MGP algorithm, a perfect solution was found after 36 generations (28,800 evaluations) and 1050 minutes of execution time. Compare this to the results in figure 7 for the best GP run obtained: this only found a perfect solution after 93 generations (55,800 evaluations) and 2380 minutes of execution time. Six runs were made of each algorithm; although the best run in each case is shown, these results are typical.

Figure 7: Best and average fitness vs. execution time for GP run


Figure 8: Best and average fitness vs. number of evaluations for GP run

There are several things to note. Firstly, learning in the MGP algorithm is much smoother than in the GP algorithm; this may indicate that MGP is conducting a greedier but less robust search of the space of solutions. Secondly, the MGP algorithm slows down much sooner than the GP algorithm; this is due to rapid inflation, as the MGP algorithm learns to "grow" the base population to a greater depth. Such inflation is almost inevitable given the distribution of solutions of different depths, shown in figure 9: there are simply many more good solutions accessible at a greater candidate depth (this is explored further in [9]). This may be another advantage of GP, which inflates its population much more slowly. Thirdly, while both techniques maintain a good level of variation in their base populations, the GP algorithm not only has a higher level of variation but also increases that level as it progresses (although it is possible that this is due to its fitness-proportionate selection, compared with the ranking system used in the MGP algorithm). Again, this might indicate the greater robustness of standard GP over MGP.


Figure 9: Distribution of 50,000 randomly sampled propositional expressions of different depths in four variables using AND, OR and NOT compared to the parity function
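The distribution in figure 9 can be approximated along the following lines, reusing evaluate and parity4 from the fitness sketch above. The shape of random_tree (full trees, uniform node choice) is an assumption about the sampling procedure, which the paper does not detail.

```python
import random
from itertools import product

def random_tree(d):
    """Random full tree of depth d over AND/OR/NOT and terminals x1..x4
    (an assumed generation scheme)."""
    if d == 1:
        return random.choice(["x1", "x2", "x3", "x4"])
    op = random.choice(["AND", "OR", "NOT"])
    if op == "NOT":
        return ("NOT", random_tree(d - 1))
    return (op, random_tree(d - 1), random_tree(d - 1))

def sample_distribution(d, n=50_000):
    """Histogram over 0..16: how many of the parity cases each of n random
    expressions of depth d gets right."""
    counts = [0] * 17
    for _ in range(n):
        tree = random_tree(d)
        correct = sum(
            evaluate(tree, dict(zip(["x1", "x2", "x3", "x4"], bits)))
            == parity4(bits)
            for bits in product([False, True], repeat=4))
        counts[correct] += 1
    return counts
```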

What may not be immediately obvious from the graphs above is the difference in the shapes of the curves. In figure 10 we compare the curves of best and average fitness on the same execution-time axis. Here we see that, although the GP algorithm did better at the start in terms of both best and average fitness, the MGP algorithm did not seem to suffer the characteristic flattening-out that occurs in GP algorithms (at least, not anywhere near to the same extent). It is perhaps this that most clearly indicates the potential of the MGP approach.

Figure 10: Comparing the MGP and GP runs

However, as mentioned above, the technique is susceptible to hyper-inflation as well as being somewhat brittle to other modifications. For example, placing a ceiling on the maximum depth of candidate solutions can destroy the effectiveness of the MGP algorithm. In figure 11 and figure 12, I show the results of a run where a depth limit of 14 was imposed (for the same even-4-parity problem). It is evident that as soon as the ceiling is reached (after about 40 generations) the algorithm ceases to be effective.


Figure 11: Best and average fitness in MGP run with a Depth Ceiling of 14



Figure 12: Average Depth in MGP run with a Depth Ceiling of 14

