Strategies for Improving the Distribution of Random Function Outputs in GSGP

Abstract

In recent years, different approaches have been proposed to introduce semantic information into genetic programming. In particular, geometric semantic genetic programming (GSGP) and the interesting properties of its evolutionary operators have attracted the attention of the community. This paper focuses on the use of GSGP to solve symbolic regression problems, where the semantics of an individual is defined as the set of outputs it generates when applied to the training cases. In this scenario, both the mutation and crossover operators defined for fitness functions based on the Manhattan distance use randomly built functions to generate offspring. However, the outputs of these random functions are not guaranteed to be uniformly distributed in the semantic space, because the functions are generated in the syntactic space. We hypothesize that the non-uniformity of the semantics of these functions may bias the search, and propose three different standard normalization techniques to improve the distribution of the outputs of these random functions over the semantic space. The results are compared with a popular strategy that uses a logistic function as a wrapper for the outputs, and show that the strategies tested can improve on the results of that method. The experimental analysis also indicates that a more uniform distribution of the semantics of these functions does not necessarily imply better results in terms of test error.
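As a rough illustration of the setting described in the abstract, the sketch below shows how the outputs of a random function might be mapped into [0, 1] before being used by a geometric semantic crossover of the form r*p1 + (1 - r)*p2, applied per training case. The logistic wrapper corresponds to the baseline strategy mentioned above. The abstract does not name the three normalization techniques, so the min-max rescaling shown here is only an assumed example of a standard normalization, and all function names are hypothetical.

```python
import math


def logistic(x):
    """Numerically stable logistic function, mapping any real to [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_wrapper(outputs):
    """Baseline strategy: wrap each random-function output in a logistic."""
    return [logistic(v) for v in outputs]


def min_max_normalize(outputs):
    """Assumed example of a standard normalization (not confirmed by the
    abstract): rescale outputs linearly so they span [0, 1]."""
    lo, hi = min(outputs), max(outputs)
    if hi == lo:
        # Constant semantics: fall back to the midpoint of the interval.
        return [0.5 for _ in outputs]
    return [(v - lo) / (hi - lo) for v in outputs]


def gs_crossover(p1_sem, p2_sem, r_sem):
    """Geometric semantic crossover on semantics vectors.

    r_sem holds the random function's outputs, assumed to lie in [0, 1],
    so each offspring output lies between the two parents' outputs.
    """
    return [r * a + (1.0 - r) * b for a, b, r in zip(p1_sem, p2_sem, r_sem)]


if __name__ == "__main__":
    raw = [-3.0, 0.0, 0.5, 12.0]  # hypothetical random-function outputs
    print(logistic_wrapper(raw))
    print(min_max_normalize(raw))
```

With the logistic wrapper, raw outputs of large magnitude saturate near 0 or 1, whereas a linear rescaling preserves their relative spacing. This is one way the two strategies can induce different distributions over the semantic space.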

Publication
Genetic Programming: 20th European Conference, EuroGP 2017, Amsterdam, The Netherlands, April 19-21, 2017, Proceedings