Complexity and Scientific Modelling
Figure 1: Three patterns (after Grassberger [6])
As a result of this idea, graphs like figure 2 have
dominated the literature about complexity in physics. However, as I shall
argue, for such a relationship to hold one needs at least some further assumptions.
Figure 2: The presumed relationship between complexity and order
As can be easily seen, in the modelling framework above there is absolutely no need for this to be true. Although the highly ordered data might well correspond to the simplest models, it will also often be the case that the most disordered data corresponds to the most complex model forms.
To see this possibility, consider the following situation. A modeller has a precise symbolic language with a limited number of symbols and some fixed grammar, such that it includes some small expressions but also allows expressions of ever-increasing size to be constructed (so the language is infinite). Suppose the expressions of this language describe members of a class of data strings: strings of any length, made up of any sequence of symbols taken from a fixed alphabet.
A simple counting argument shows that most such patterns are disordered (as defined either by something like Shannon information or by algorithmic information measures), but a similar counting argument shows that only a few of these patterns can correspond to models with relatively small minimal representations. That is, most of the disordered patterns will correspond to models with relatively large minimal representations. Whatever the ordering in terms of ease of search, in general the bigger forms will be more difficult to find, i.e. more complex.
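The second counting argument can be made concrete with a small sketch. It rests on assumptions that are not part of the text above: the models are written in the same alphabet as the data (binary by default), each model expression describes at most one pattern, and the function names are purely illustrative. Under those assumptions, there are simply too few short expressions to go round, so the fraction of patterns of length n that can have a model at least k symbols shorter than themselves is bounded.

```python
from fractions import Fraction


def count_strings(n, alphabet_size=2):
    """Number of distinct data strings of exactly length n."""
    return alphabet_size ** n


def count_short_descriptions(max_length, alphabet_size=2):
    """Number of distinct expressions of length 0..max_length (inclusive)."""
    return sum(alphabet_size ** i for i in range(max_length + 1))


def max_fraction_with_short_model(n, k, alphabet_size=2):
    """Upper bound on the fraction of length-n strings whose minimal
    model is at most n - k symbols long.

    Assumption (illustrative only): each expression describes at most
    one string, so there cannot be more such strings than expressions.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    bound = Fraction(
        count_short_descriptions(n - k, alphabet_size),
        count_strings(n, alphabet_size),
    )
    return min(Fraction(1), bound)


if __name__ == "__main__":
    # The bound shrinks roughly geometrically as k grows.
    for k in (1, 5, 10):
        print(k, float(max_fraction_with_short_model(20, k)))
```

For binary strings the bound is below 2^(1-k), so the overwhelming majority of patterns can only correspond to models almost as large as the patterns themselves.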
Thus, in this case, far from complexity and disorder being antithetical, one would be hard pushed to arrange things so that any of the most complex models would correspond to even slightly ordered patterns.
So if complexity does not necessarily lie between order and disorder,
where has our intuition gone wrong? Without any prior knowledge about
the process that produces the data, we have no reliable way of distinguishing
between what is merely very complex behaviour and what is irrelevant noise.
The diagrams above mislead us because our experience of the patterns
we typically encounter has led us to recognise the noise and separate
it out from the relevant pattern. To see that this separation is not
necessarily justified, consider figure 3, where we show each pattern as a magnification
of a section of the one to its right.
Figure 3: Three patterns with the inclusions shown
Faced with this new information, one might change one's mind and say the rightmost pattern is the most complex. The initial judgement in favour of the middle pattern arises not because such disordered patterns do not correspond to complex models in some precise languages, but because we are beings with limited resources who are used to receiving noisy data. We know it is not usually sensible to try to describe such patterns in such precise languages. This is for two reasons: firstly, we do not have the time, and secondly, broader experience has taught us that such models do not predict well on new data. In other words, we know that such "overfitting" of data is not likely to be a profitable activity. The association of what is apparently disordered with simplicity is thus the result of applying a natural heuristic and does not represent a necessary relationship.
Introducing specificity into the account makes sense of this. We are naturally good at distinguishing noise from relevant information (we have been practising since birth), so much so that we do not realise when we are doing it. The moral is that it is extremely useful to use less precise languages to describe such patterns, even at the cost of some accuracy with respect to the data. If the language allows expressions of varying degrees of precision, then an overfitting model may well be more complex than a more suitable, lower-accuracy one. The most appropriate model for a pattern with high entropy might be a very simple and very vague model. If the complexity of patterns is judged by the complexity of their most suitable model, one comes to the intuitive judgement that complexity is "between" order and disorder. It is only when one takes the unnatural step of leaving specificity out of the account (e.g. by restricting the models to a uniformly precise language) that one is faced with the unnatural conclusion that the best model of a disordered pattern is a highly complex one.
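The trade-off between precision and complexity can be illustrated with a toy sketch. Everything in it is an assumption made for the illustration rather than part of the argument above: the data are a constant level plus Gaussian noise, the "precise" model simply memorises every observed value, the "vague" model records only the average level, and model size is counted as the number of stored values.

```python
import random


def make_noisy_data(n, level, noise_sd, rng):
    """Hypothetical data source: a constant level plus Gaussian noise."""
    return [level + rng.gauss(0.0, noise_sd) for _ in range(n)]


def mean_squared_error(predictions, observations):
    """Average squared difference between predictions and observations."""
    total = sum((p - o) ** 2 for p, o in zip(predictions, observations))
    return total / len(observations)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000
observed = make_noisy_data(n, level=5.0, noise_sd=1.0, rng=rng)
new_data = make_noisy_data(n, level=5.0, noise_sd=1.0, rng=rng)

# Precise model: one stored value per data point (it also fits the noise).
precise_predictions = list(observed)
precise_size = len(precise_predictions)

# Vague model: a single stored value, the average level.
vague_level = sum(observed) / n
vague_predictions = [vague_level] * n
vague_size = 1

precise_fit_error = mean_squared_error(precise_predictions, observed)
vague_fit_error = mean_squared_error(vague_predictions, observed)
precise_new_error = mean_squared_error(precise_predictions, new_data)
vague_new_error = mean_squared_error(vague_predictions, new_data)

if __name__ == "__main__":
    print("size   precise:", precise_size, " vague:", vague_size)
    print("fit    precise:", precise_fit_error, " vague:", vague_fit_error)
    print("new    precise:", precise_new_error, " vague:", vague_new_error)
```

In this sketch the precise model is far larger and fits the observed data exactly, yet it does worse than the vague model on new data from the same source, which is the sense in which the simpler, lower-accuracy model is the more suitable one.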