
5.2 Example 1 - utility learning agent facing structural change

5.2.2 Results


Limiting the depth of the models created to 10, we performed 10 runs of 100 time periods for each type of agent. The three types were characterised by the memory they were given and the number of new models they created each time period: 10, 20 and 30 respectively. We call these the 10-memory, 20-memory and 30-memory agents; they represent agents with different bounds on their rationality. The results were then averaged over the 10 runs.
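
The shape of this experimental scaffolding can be sketched as follows (in Python). The configuration fields and the placeholder run function are mine, and the actual model-evolution step is not reproduced, so this only makes the run-and-average design concrete:

    import random
    from dataclasses import dataclass

    @dataclass
    class AgentConfig:
        memory: int          # models retained in memory (10, 20 or 30)
        new_models: int      # new models created each time period (10, 20 or 30)
        max_depth: int = 10  # maximum depth of the model expressions

    RUNS, PERIODS = 10, 100
    CONFIGS = [AgentConfig(m, m) for m in (10, 20, 30)]

    def run_once(cfg: AgentConfig, rng: random.Random) -> list:
        """Stand-in for one simulation run: returns the utility gained per period."""
        # The real system evolves model expressions each period; here we only
        # return dummy numbers so the run-and-average scaffolding is concrete.
        return [rng.random() for _ in range(PERIODS)]

    for cfg in CONFIGS:
        rng = random.Random(0)
        runs = [run_once(cfg, rng) for _ in range(RUNS)]
        averaged = [sum(values) / RUNS for values in zip(*runs)]  # average over the 10 runs
        print(f"{cfg.memory}-memory agent: first averaged period = {averaged[0]:.3f}")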

The first graph shows the (RMS) error of the agent's best model of the utility function compared with the actual function (figure 2). It shows a great improvement between the 10-memory and 20-memory agents, but only a marginal improvement between the 20-memory and 30-memory agents, suggesting the existence of a sort of minimum capacity for this task.
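
As a rough illustration of this error measure, the sketch below computes an RMS error between a stand-in "best model" and a stand-in "actual" utility function over a grid of possible purchases; both functions are invented for the example and are not those used in the simulation:

    import math

    def actual_utility(x1: float) -> float:
        # invented stand-in for the true utility of buying x1 of product 1
        return math.sqrt(x1) + math.sqrt(100.0 - x1)

    def best_model(x1: float) -> float:
        # invented stand-in for the agent's current best model
        return 0.19 * x1 + 10.0

    def rms_error(model, target, points) -> float:
        return math.sqrt(sum((model(x) - target(x)) ** 2 for x in points) / len(points))

    grid = range(1, 100)  # candidate purchases of product 1
    print(f"RMS error of the stand-in model: {rms_error(best_model, actual_utility, grid):.3f}")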



Figure 2: Error in Agent's Best Model for Different Memories, Averaged Over 10 Runs

When you look at the utilities achieved by the agents with different memories (figure 3), you see that a larger memory capacity (above 10) does not significantly increase the average utility over time, but it does dramatically affect the reliability of the utility gained. If this were a firm with the utility being its profits, this reliability would be almost as important as its average profit level.
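
The point about reliability can be made concrete with a small sketch: two invented utility series with the same mean but very different spreads (the numbers are purely illustrative, not simulation output):

    from statistics import mean, stdev

    steady  = [0.82, 0.80, 0.83, 0.81, 0.82, 0.80]   # e.g. what a larger-memory agent might gain
    erratic = [0.99, 0.55, 0.97, 0.60, 1.00, 0.77]   # e.g. what a 10-memory agent might gain

    for name, series in (("steady", steady), ("erratic", erratic)):
        print(f"{name}: mean utility {mean(series):.2f}, spread (s.d.) {stdev(series):.2f}")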



Figure 3: Utility Ratio Achieved for Agents with Different Memories, Averaged over 10 Runs


To illustrate the sort of learning processes that can be modelled using this set-up, I imposed a severe structural break on the environment halfway through (date 50). The utility function of the agent switches between a traditional convex utility curve (the easy curve) and a concave one with two local maxima (the hard curve), see figure 4.



Figure 4: The Two Utility curves (product 2 = 100 - product 1)
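
A minimal sketch of this structural break is given below; the two curve shapes are invented stand-ins, and only the switch at date 50 and the budget constraint (product 2 = 100 - product 1) mirror the set-up described above:

    import math

    def easy_curve(x1: float) -> float:
        # single-peaked stand-in for the "easy" curve (product 2 = 100 - product 1)
        return math.sqrt(x1 * (100.0 - x1))

    def hard_curve(x1: float) -> float:
        # stand-in for the "hard" curve, with two local maxima (near x1 = 25 and x1 = 75)
        return math.sin(x1 / 100.0 * 2.0 * math.pi) ** 2 * (1.0 + x1 / 200.0)

    def utility_at(date: int, x1: float, break_date: int = 50) -> float:
        """Utility the agent actually receives at a given date, switching at the break."""
        return easy_curve(x1) if date < break_date else hard_curve(x1)

    # the same purchase is rewarded very differently before and after date 50
    print(utility_at(10, 75.0), utility_at(90, 75.0))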

I ran the set-up with agents of different memory capacities (5, 10, 20 and 30 models) and maximum model complexities (depths of 5 and 10). I ran the simulation 10 times over 100 dates for each type of agent, averaging the results. These experiments were performed both with the utility curve switching from the easy curve to the hard one and with it switching from the hard curve to the easy one.

This is not the place to give the full results of this model, but to give a flavour of them I show the utility gained by agents with a memory of 5 and 20 models where the utility curve they are learning swaps suddenly from the easy to the hard (figure 5) and vice versa (figure 6). There are also corresponding graphs for the error in their best models (figure 7 and figure 8, respectively). Note how the dynamics are not symmetrical; the first utility curve the agent encounters conditions it for when this changes. The agents had considerably more success (in terms of utility gained) going from the easy curve to the hard one rather than vice versa.



Figure 5: The utility of 20-model and 5-model agents going from easy to hard utility curves



Figure 6: The utility of 20-model and 5-model agents going from hard to easy utility curve



Figure 7: The RMS Error of the best model of 20-model and 5-model agents going from easy to hard utility functions



Figure 8: The RMS Error of the best model of 20-model and 5-model agents going from hard to easy utility functions

To give a flavour of the sort of models these agents develop: in run 1 of the 30-memory agent batch, the agent had developed the following model by date 75:

[average
  [[divide
     [[add
        [[constant 1.117]
         [amountBoughtOf 'product-2']]]
      [average
        [[amountBoughtOf 'product-2']
         [constant 4.773]]]]]
   [min
     [[amountBoughtOf 'product-2']
      [ifle1thenelse
        [[average
           [[amountBoughtOf 'product-2']
            [constant 4.773]]]
         [constant 1.044]
         [add
           [[constant 1.117]
            [amountBoughtOf 'product-2']]]]]]]]].
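
One way to read this expression is to transcribe each primitive as an ordinary function and evaluate the tree directly. In the sketch below the semantics of ifle1thenelse (taken to mean: if the first argument is at most 1 then return the second, otherwise the third) and the guarded division are my assumptions, not definitions taken from the simulation:

    def average(a, b): return (a + b) / 2.0
    def add(a, b):     return a + b
    def divide(a, b):  return a / b if b != 0 else a  # guarded division (an assumption)
    def ifle1thenelse(test, then_value, else_value):
        # assumed reading: if the first argument is <= 1, return the second, else the third
        return then_value if test <= 1.0 else else_value

    def learnt_model(amount_of_product_2: float) -> float:
        """Transcription of the date-75 model from run 1 of the 30-memory batch."""
        x = amount_of_product_2
        return average(
            divide(add(1.117, x), average(x, 4.773)),
            min(x, ifle1thenelse(average(x, 4.773), 1.044, add(1.117, x))),
        )

    # evaluate the learnt model over a few purchase levels of product 2
    for x in (0.0, 1.0, 5.0, 25.0, 75.0):
        print(f"amount of product 2 = {x:5.1f} -> modelled utility {learnt_model(x):7.3f}")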

The extent of the fit learnt by the agent is shown in figure 9.

Figure 9: Learnt vs. Actual Utility Functions, Run 1 of 30-memory Agents

The purpose of this simulation is not to be an efficient maximiser of utility, but to model the learning of economic agents in a more credible way. It will only be vindicated (or otherwise) when compared to real data about these processes. This poses a problem: it is difficult to test for processes as distinct from states - almost any learning algorithm will eventually (given enough time and tuning) cluster around the optima. Consequently, testing via measures of the centrality of the distribution is unlikely to be very revealing. However, in a dynamic process the spread of behaviours around their average, compared across time and simulation size, might well do this (for an example of this see Section 5.3.1 below).
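
A sketch of what such a test might look at - the dispersion of behaviour across runs at an early versus a late date (all numbers invented):

    from statistics import pstdev

    # behaviour (e.g. purchases of product 1) of several runs at an early and a late date
    early = [12.0, 55.0, 73.0, 30.0, 91.0]   # invented
    late  = [48.0, 52.0, 50.0, 47.0, 55.0]   # invented

    print("spread across runs, early date:", round(pstdev(early), 1))
    print("spread across runs, late date: ", round(pstdev(late), 1))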

However, the model does show traits found in the real world. I will highlight two.

Firstly, one phenomenon that is observed is that agents sometimes get "locked" into inferior models for a considerable length of time (as in [3]) - the model implies an inferior course of action, but this course of action is such that the agent never receives disconfirmation of its model. Thus it remains the agent's best model in terms of the limited data it has, so it repeats that action. If, for example, some consumers find a satisfactory brand at an early stage in the development of their tastes, they may never try any others - their (limited) experience will never disconfirm their model of what would give them most satisfaction, even when they would like other brands better.

Secondly, the biasing effect of prior learning is clearly shown when change occurs. Agents achieved significantly higher utilities on the easy curve when they learnt this first; learning about the hard curve first biased their learning about the easy curve (figure 5 and figure 6). This captures some of the context-dependency that is observed with human agents - past learning does significantly influence future performance. In particular, it is easier to learn some things if you start with easy examples [11].

