The tendency for programs in genetic programming (GP) populations to grow in length has been widely reported [1,2,3,4,5,6,7]. This tendency has gone under various names such as ``bloat'', ``fluff'' and increasing ``structural complexity''. The principal explanation advanced for bloat has been the growth of ``introns'' or ``redundancy'', i.e. code which has no effect on the operation of the program which contains it. ([8] contains a survey of recent research in biology on ``introns''). Such introns are said to protect the program containing them from crossover [9,10,11,12]. [13] presents an analysis of some simple GP problems designed to investigate bloat. This shows that, with some function sets, longer programs can ``replicate'' more ``accurately'' when using crossover. That is offspring produced by crossover between longer programs are more likely to behave as their parents than children of shorter programs. [14] provides a detailed analysis of bloat using tree schemata specifically for GP.
In this paper we advance a more general explanation which should apply generally to any discrete variable length representation and generally to any progressive search technique. That is bloat is not specific to genetic programming applied to trees and tree based crossover but should also be found with other genetic operators and non-population based stochastic search techniques such as simulated annealing and stochastic iterated hill climbing.
In the next section we expand our argument that bloat is inherent in variable length representations such as GP. In Sections 3 and 4 we analyse a typical GP demonstration problem, showing that it suffers from bloat but also showing that bloat is not present in the absence of fitness based selection. Section 5 describes the results we have achieved and this is followed in Section 6 by a discussion of the potential advantages and disadvantages of bloat and potential responses to it. Finally Section 7 summarises our conclusions.