[Next] [Previous] [Top] [Contents]
Towards a Descriptive Model of Agent Strategy Search - Bruce Edmonds
The structure of the computational model follows that of the Sonnemans experiment as closely as possible. Thus there are 36 experiments in parallel, each with one agent. The agents have to decide when to stop a sequence of offers and accept the value of the highest offer so far (minus the cost of the offers and a fixed `fee' of 50 cents). Thus in each run of this simulation each of 36 agents plays 69 games.
Thus each agent has to decide for each offer in each game whether to ask for another offer or accept the highest offer so far and at the end of part 3 to specify a fixed strategy (of the form discussed previously) for part 4. The material it has to work on is its experience in terms of its earnings and costs in previous games. How the agents specify strategy and make decisions based on this experience is the nub of the model.
Following the hints in Section 4, each agent has a small population of strategies which represent its current hypotheses. These are in the form of those in Table 1. As a result of the experience of each game the agent evaluates its hypotheses, keeps some, changes some, combines some and forgets the rest. In any particular game it uses the strategy that it has evaluated as its best to decide when to stop the bidding process. At the start each agent is given an initial population of these strategies generated at random, and from then on the agent works with these to produce newer and better strategies, so the population evolves and the strategies improve.
There are four key questions to be explored within this framework. Each question concerns how the agent model should be structured so that it reflects (as far as possible) the behaviour of the subjects in Sonneman's experiment. For each question I list some possible answers and explain how the model can be adapted to implement these.
1. Exactly how should the agents adapt their existing hypotheses in the light of their experience?
There are many possible operations that a subject could use to produce the next population of hypotheses. I list the selection that I implemented.
This is the operation of keeping the best existing hypothesis, as currently evaluated. The idea is that one holds on to one's best hypothesis.
Select a hypothesis probabilistically according to its current evaluation and keep it. Since selection is probabilistic this means that you keep a selection of your current hypotheses biased towards those that are currently doing best, but you might sometimes keep less good hypotheses as well.
Introduce a totally new randomly generated hypothesis. If one does not introduce any new hypotheses there is a danger of stagnation after a while.
Select two existing hypotheses probabilistically according to their current evaluations and join them into one new hypothesis using the boolean `OR' function. Thus if the selected hypotheses were `Hxb3 70' and `Nxb3 7', the new hypothesis would be `Hxb3 70 OR Nxb3 7'.
Select two existing hypotheses probabilistically according to their current evaluations and join them into one new hypothesis using the boolean `AND' function. Thus if the selected hypotheses were `Hxb3 70' and `Nxb3 7', the new hypothesis would be `Hxb3 70 AND Nxb3 7'.
Select two existing hypotheses probabilistically according to their current evaluations, randomly choose an appropriate function and join them into one new hypothesis using this function. The `generalisation' and `specialisation' operations described immediately above are special cases of this. This could be seen as a general exploratory constructive operation.
Randomly choose a node in an existing hypothesis (chosen probabilistically according to its current evaluation) and copy and keep the subexpression as a hypothesis. This might be associated with an attempt at simplification of a more complex expression.
Select two existing hypotheses probabilistically according to their current evaluations, randomly choose a node in one of them graft the other to that node replacing any subexpression that was there before, keep the result. This is another exploratory constructive operation attempting to utilise the properties of simpler hypotheses to make a more effective one.
Select two existing hypotheses probabilistically according to their current evaluations, `cut' a subtree from the first (as above) and `graft' it into the second (as immediately above), keep the result. This is similar to the `graft' operation immediately above, but does not have the disadvantage of always increasing the size and depth of the expressions.
Select two existing hypotheses probabilistically according to their current evaluations, randomly choose a subnode of each expressions and swap the subexpressions these indicate between the two hypotheses to make two new hypotheses, keep both. This is rather like two `cut and graft' operations combined. This operation is used very effectively techniques of automatic induction because it is exactly conserves the nodes in the expressions while shuffling around their configurations.
Select an existing hypothesis probabilistically according to its current evaluation, randomly pick a numeric parameter, increment or decrement that parameter, and keep the result. This is a classic exploratory move allowing the incremental parameterisation of an expression.
Thus to take some example mixes of these operations. A mix of 90% crossover and 10% propagation would implement a classic genetic programming algorithm. A mix of 10% election and 90% new would implement a search which merely tries out expressions at random. A mix of 50% hillclimb, and 50% propagation would start with a randomly generated set of expressions and then incrementally explore parameterisation of these.
In the agent model I use here, the proportion of the operations used by agents is set by the programmer.
2. In what way should the agents evaluate their hypotheses?
Clearly the most obvious way in which the expressions might be evaluated is according to the net earnings that using each hypothesis would have generated on the offer sequences that are encountered. However that is not the only possibility. The agent could be also biased in favour of syntactically simpler hypotheses or be in favour of those strategies that incur fewest costs.
As with the operation of variation there are quite a number of ways of combining these factors in the evaluation of hypotheses. It is possible that the subjects are attempting to maximise some function of these, but it is also possible that these factors might be applied separately, for example by eliminating all those which perform badly in any of these respects.
In this model each of these three aspects (earnings, costs and syntactic complexity) are evaluated for each strategy. There are then several different ways of treating the evaluation for each aspect. I chose four: a set proportion of the worst strategies with respect to each can be discarded and the rest just kept on; the strategies would have to reach a minimum level in order to be kept on; the strategies can be given a simple score with respect to each aspect; or they can be ranked.
3. How far back should the agents evaluate new hypotheses against past experience?
An optimal agent would remember the offer sequence in every single past game and use all of them to evaluate its strategies. This would be implausible in this case as the subjects were not allowed to keep notes and would have had to rely on their memory. Thus there is the question of how far back subjects recall in the process of evalutating their hypotheses. Broadly, the further back a subject recalls the more efficient and less contingent is their strategy development.
4. What number of hypotheses should the agents hold at any one time?
It seems clear that human beings would not merely consider one strategy for each game, starting completely afresh if this one was unsuccessful. Rather it is likely that they would recall previous strategies they had tried or thought of as a basis for further strategies. Thus, in effect, the subjects will have access to a variety of past strategies at any stage. However it is also clear that the ability of human subjects to recall (or reconstruct) past strategies is limited, in other words we forget some of our past thinking. Thus what would be a realistic number of strategies for an agent to hold within this framework is also a parameter to explore. Broadly, large populations implement a more exploratory search while small populations implement a more `path-dependent', contingent search.
Clearly, the total number of possibilities that could be constructed by varying the above parameters is vast. However, thanks to Sonnemans' data, we are able to pin it down in several different ways. What I will do is to exhibit the first steps in this modelling process. This is a process that obviously will need to be continued.
The model was implemented in the language SDML . The code for the simulations described here will be available at: http://bruce.edmonds.name/tdmass/code.html.
[Next] [Previous] [Top] [Contents]
Generated with CERN WebMaker