
Meta-Genetic Programming: Co-evolving the Operators of Variation - Bruce Edmonds

The main practical questions to be resolved with respect to MGP are whether the technique leads to improved results and whether such improvements are worth the additional computational cost. In assessing this cost, a straight comparison of the number of evaluations of the base population is unfair, since with MGP one also has to evaluate the operator population. Thus if one has a base population of 600 in both cases and an operator population of 200 in the second case, execution time is increased by a factor of about 1.33. This is a bit rough and ready, but it seems to give a reasonably fair comparison, reflecting computational times in practice.
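The 1.33 factor above is just the ratio of total evaluations per generation to base-population evaluations:

```python
# Relative per-generation cost of MGP vs. plain GP, using the population
# sizes given in the text (600 base genes, 200 operators).
base_pop = 600
operator_pop = 200

mgp_cost_factor = (base_pop + operator_pop) / base_pop
print(mgp_cost_factor)  # 1.333...
```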

I chose a problem that was deliberately straightforward, but not *too*
easy for simple GP algorithms: the 4-parity problem, allowing function
nodes of AND, OR and NOT only, plus terminal nodes for the four possible
inputs, using base populations of 600. The fitness function was the number
of cases (out of the total of 16) that the gene got correct, with a slight
penalty for depth, which can never be greater than one. The initial
population was randomly generated with an even depth of 7.
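The fitness calculation can be sketched as below. Two details are assumptions, as the text does not specify them: the exact form of the depth penalty (the paper says only that it is slight and never exceeds one) and the penalty's `max_depth` normaliser; the target is taken as *even* parity, following the "even parity 4" description later in this section.

```python
from itertools import product

def even_parity4(a, b, c, d):
    """Target function: true iff an even number of the four inputs are true."""
    return (a + b + c + d) % 2 == 0

def fitness(candidate, depth, max_depth=100):
    """Correct cases out of 16, minus a depth penalty strictly below one.

    `candidate` is any function of four booleans; the penalty form
    depth / (max_depth + 1) is an assumed illustration, chosen only so
    that the penalty can never reach 1, as the text requires.
    """
    correct = sum(candidate(*inputs) == even_parity4(*inputs)
                  for inputs in product([False, True], repeat=4))
    return correct - depth / (max_depth + 1)

# A candidate that ignores its inputs gets exactly half the 16 cases right,
# so its fitness is a little under 8 once the depth penalty is subtracted.
print(fitness(lambda a, b, c, d: True, depth=1))
```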

The MGP algorithm had, in addition, a population of 200 operators using
nodes `up1`, `up2`, `down`, `down1` and `subs`
with terminals of `bott1` and `bott2` (as described in section
2. above). This in turn was operated on by a population of 9 cut-and-graft
operators (as illustrated in figure 3) and one
simple propagation operator, all with fixed equal weights (as illustrated
in figure 13). This population was also initially
randomly generated, with an even depth of 6. The fitness function for this
population was found to be critical to the success of MGP. The basic fitness
was the average proportionate change effected in the fitnesses of the base
genes the operator acted upon. There were several (necessary) elaborations on
this basic function: the fitnesses were smoothed by a running average over
the last 3 generations; new genes were accorded a fitness of 1; and operators
that had no effect on the fitness of the genes they acted on (i.e. they were
essentially propagation operators) were heavily discounted (otherwise these
quickly came to dominate the population, as most `active' operators had a
fitness of less than one).
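The elaborated operator-fitness scheme can be sketched as follows. The size of the discount (`NO_EFFECT_DISCOUNT`) and the bookkeeping details are assumptions: the text specifies the three elaborations but not their parameters.

```python
from collections import deque

NEW_GENE_FITNESS = 1.0     # fitness accorded to new operators, per the text
NO_EFFECT_DISCOUNT = 0.1   # assumed value; the text says only "heavily discounted"
WINDOW = 3                 # running average over the last 3 generations

class OperatorRecord:
    """Fitness bookkeeping for one variation operator: the raw fitness is
    the average proportionate change it effects in the base genes it acts
    on, smoothed over the last three generations, with operators that
    change nothing (essentially propagation operators) heavily discounted."""

    def __init__(self):
        self.history = deque([NEW_GENE_FITNESS], maxlen=WINDOW)

    def record_generation(self, before, after):
        # Proportionate change for each base gene operated on
        # (assumes strictly positive base-gene fitnesses).
        ratios = [a / b for a, b in zip(after, before)]
        raw = sum(ratios) / len(ratios)
        if all(a == b for a, b in zip(after, before)):
            raw *= NO_EFFECT_DISCOUNT  # operator had no effect at all
        self.history.append(raw)

    def fitness(self):
        return sum(self.history) / len(self.history)
```

For example, an operator that raises one gene's fitness from 2.0 to 3.0 and leaves another at 4.0 gets a raw score of 1.25 for that generation, averaged with its initial score of 1.0.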

The MGP version had, in addition, two further modifications found
necessary to make it run successfully. *Firstly*, a depth limitation
of 15 was imposed, as the MGP algorithm is susceptible to hyper-inflation
(this is potentially far worse than in a GP algorithm, since an operator with
multiple `subs` operations can be evolved). *Secondly*, in
the MGP algorithm fitness differentials were vastly magnified by a ranking
mechanism. This was necessary in the operator population because of the
small differences in their fitnesses, and also because MGP introduces
a higher level of variation into the base population, so a stronger selection
force is needed (in the GP algorithm a shifted fitness-proportionate system
was used, as the GP performed worse under a ranking system).
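One way such a ranking mechanism can work is standard linear ranking, sketched below: selection weight depends only on rank, so even tiny fitness differences (as in the operator population) produce a fixed selection differential. The linear form and the `pressure` parameter are assumptions, not taken from the text.

```python
import random

def rank_weights(fitnesses, pressure=1.8):
    """Linear ranking: each individual's selection weight is a linear
    function of its rank, ignoring the size of the fitness gaps.
    `pressure` (in (1, 2]) sets how strongly the best is favoured;
    its value here is an assumed illustration."""
    n = len(fitnesses)
    order = sorted(range(n), key=lambda i: fitnesses[i])  # worst first
    weights = [0.0] * n
    for rank, i in enumerate(order):
        weights[i] = (2 - pressure) + 2 * (pressure - 1) * rank / (n - 1)
    return weights

def select(population, fitnesses, rng=random):
    """Draw one individual with probability proportional to rank weight."""
    return rng.choices(population, weights=rank_weights(fitnesses), k=1)[0]

# Near-identical fitnesses still yield a 9:1 weight ratio between best and worst.
print(rank_weights([1.001, 1.002, 1.003]))
```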

Both algorithms were implemented in the high-level declarative language SDML*1, with some time-critical sections implemented in VisualWorks/Smalltalk. The execution times are both for the same Sun 5 Sparcstation. In figure 5 the average fitness and the fitness of the best individual are shown vs. execution time; the dotted lines are plus and minus one standard deviation of the population fitnesses and the crosses indicate when the generations occurred.

**Figure 5**: Best and average fitness vs.
execution time for MGP run

**Figure 6**: Best and average fitness vs. number of evaluations
for MGP run

Thus in the best run of the MGP algorithm a perfect
solution was found after 36 generations (28,800 evaluations) and 1050 minutes of
execution time. Compare this to the results in figure 7
for the best GP run obtained, which found a perfect solution only after
93 generations (55,800 evaluations) and 2380 minutes of execution time.
The same number of runs (6) were made of each. Although the best run in
each case is shown, these results are typical.
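The evaluation counts quoted follow directly from the population sizes: each MGP generation evaluates the 600 base genes plus the 200 operators, while each GP generation evaluates only the 600 base genes.

```python
# Evaluations per run, reconstructed from the figures quoted in the text.
mgp_evals = 36 * (600 + 200)  # 36 generations, base + operator populations
gp_evals = 93 * 600           # 93 generations, base population only

print(mgp_evals)  # 28800
print(gp_evals)   # 55800
```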

**Figure 7**: Best and average fitness vs.
execution time for GP run

**Figure 8**: Best and average fitness vs. number of evaluations
for GP run

There are several things to note. *Firstly*, learning in the MGP algorithm
is much smoother than in the GP algorithm - this may indicate
that the MGP algorithm is acting as a greedier but less robust search
of the space of solutions. *Secondly*, the MGP algorithm slows
down a lot more quickly than the GP algorithm - this is due to the rapid
inflation as the MGP algorithm learns to "grow" the base population
to a greater depth. Such inflation is almost inevitable given the distribution
of solutions of different depths, as shown in figure 9
- there are simply many more good solutions accessible at a greater
depth of candidate gene (this is further explored in [9]).
This may be another advantage of GP, as it inflates the population much
more slowly. *Thirdly*, while both techniques maintain a good
level of variation in their base populations, the GP algorithm not only
has a higher level of variation, but also increases the level of variation
as it progresses (although it is possible that this is due to the fitness-proportionate
system of selection compared to the ranking system used in
the MGP algorithm). Again this might indicate the greater robustness of
standard GP over MGP.

**Figure 9**: Distribution of 50,000 randomly
sampled propositional expressions of different depths in four variables
using AND, OR and NOT compared to the parity function

What may not be immediately obvious from the graphs above is
the difference in the *shapes* of the curves. In figure
10 we compare the curves of best and average fitnesses on the same
axis of execution time. Here we see that, although the GP algorithm did
better at the start, both in terms of best and average fitness, the MGP
algorithm did not seem to suffer the characteristic flattening out that
occurs in GP algorithms (at least, not anywhere near the same extent). It
is perhaps this that most clearly indicates the potential of the MGP approach.

However, as mentioned above, the technique is susceptible to hyper-inflation, as well as being somewhat brittle to other modifications. For example, if a ceiling is placed on the maximum depth of candidate solutions, this can destroy the effectiveness of the MGP algorithm. In figure 11 and figure 12, I show the results of a run where a depth limit of 14 was imposed (for the same even 4-parity problem). It is evident that as soon as the ceiling is reached (after about 40 generations) the algorithm ceases to be effective.

**Figure 11**: Average Depth in MGP run with
a Depth Ceiling of 14

**Figure 12**: Average Depth in MGP run with
a Depth Ceiling of 14

Meta-Genetic Programming: Co-evolving the Operators of Variation - Bruce Edmonds - 28 JAN 98

