NEUROEVOLUTION – IMPROVING DEEP LEARNING WITH EVOLUTIONARY COMPUTATION
Dr. Risto Miikkulainen
THE POWER OF NEUROEVOLUTION
Many engineered systems have become too complex for humans to optimize. Circuit design has long depended on CAD, and, more recently, automated methods for software design have also started to emerge.
The design of machine-learning systems such as deep neural networks has also reached a level of complexity at which humans can no longer optimize effectively. For the last few years, we have been developing neuroevolution methods that use evolutionary computation to discover more effective deep learning architectures. This research builds on more than 25 years of work on evolving network weights and topologies, and it coincides with related efforts at OpenAI, Uber.ai, DeepMind, and Google Brain.
There are three reasons why neuroevolution in particular is a good approach compared with other methods such as Bayesian parameter optimization, gradient descent, and reinforcement learning:
1. Neuroevolution is a population-based search method, which makes it possible to explore the space of possible solutions more broadly than other methods. For example, instead of having to find solutions through incremental improvement, it can take advantage of stepping stones, thus discovering surprising and novel solutions.
2. It can utilize well-tested methods from the evolutionary computation field for optimizing graph structures to design innovative deep learning topologies and components.
3. It is parallelizable with minimal communication cost between threads, making it possible to take advantage of thousands of GPUs. A minimal sketch of such a population-based search loop follows this list.
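To make the population-based idea concrete, here is a minimal Python sketch of such a search loop. It is illustrative only, not the method described in this article: the genome, mutation operator, and fitness function are stand-ins (in practice, fitness would come from training the encoded network and measuring validation accuracy). It does show why the approach parallelizes well: each evaluation is independent, so workers need essentially no communication.

```python
import random
from multiprocessing import Pool

# Toy search space: a few hyperparameters stand in for a full
# network encoding.
SEARCH_SPACE = {
    "layers": [1, 2, 3, 4],
    "units": [32, 64, 128, 256],
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],
}

def random_genome():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(genome):
    # Resample one hyperparameter at random.
    child = dict(genome)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def evaluate(genome):
    # Placeholder fitness: in a real system, train the encoded network
    # and return validation accuracy. Evaluations are independent, so
    # they parallelize across workers/GPUs with no communication.
    return -abs(genome["layers"] - 3) - abs(genome["units"] - 128) / 64

def evolve(pop_size=20, generations=10):
    population = [random_genome() for _ in range(pop_size)]
    best = population[0]
    with Pool() as pool:
        for _ in range(generations):
            fitness = pool.map(evaluate, population)  # parallel evaluation
            ranked = [g for _, g in sorted(zip(fitness, population),
                                           key=lambda pair: pair[0],
                                           reverse=True)]
            best = ranked[0]
            elite = ranked[: pop_size // 4]           # keep the top quarter
            population = elite + [mutate(random.choice(elite))
                                  for _ in range(pop_size - len(elite))]
    return best

if __name__ == "__main__":
    print(evolve())
```

Because the population is evaluated as a batch each generation, weak but unusual genomes survive long enough to serve as the stepping stones mentioned above, rather than being discarded by a purely greedy search.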
Neuroevolution can be harnessed to improve the state of the art in deep learning:
In the Omniglot multitask character-recognition domain, evolution of hyperparameters, modules, and topologies reduced errors from 32% to 10%. Two new approaches were introduced: coevolution of a common network topology and the components that fill it, and evolution of different topologies for different alphabets with shared modules. Their strengths were then combined to achieve an even greater improvement; the sketch below illustrates the coevolution idea.
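The following sketch shows one way to represent two coevolving populations in Python. It is an assumed, simplified illustration, not the paper's implementation: modules are small layer stacks, blueprints are sequences of slots pointing into the module population, and a network is assembled by filling the slots.

```python
import random

# Module population: each module is a small stack of layers.
MODULE_POP = [
    [("conv", 32), ("relu",)],
    [("conv", 64), ("relu",), ("pool",)],
    [("dense", 128), ("relu",)],
]

# Blueprint population: each blueprint is an ordered list of
# module slots, given as indices into MODULE_POP.
BLUEPRINT_POP = [
    [0, 1],
    [0, 0, 2],
    [1, 2],
]

def assemble(blueprint, modules):
    """Fill a blueprint's slots with concrete modules to get a network spec."""
    return [layer for idx in blueprint for layer in modules[idx]]

def mutate_blueprint(bp):
    # Point one slot at a different module.
    bp = list(bp)
    bp[random.randrange(len(bp))] = random.randrange(len(MODULE_POP))
    return bp

# Fitness of an assembled network would come from training it; that
# score is then credited back to both the blueprint and the modules
# it used, which is what makes the two populations coevolve.
spec = assemble(random.choice(BLUEPRINT_POP), MODULE_POP)
print(spec)
```

The key design choice is the shared credit assignment: a good module is rewarded whenever any blueprint that uses it performs well, so useful components spread across the whole population of topologies.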
In the CelebA multitask face-attribute recognition domain, the state-of-the-art error was reduced from 8.00% to 7.94%. This result was achieved with a new method, PTA, that extends CTR to architectures with multiple output decoders.
In the language-modeling domain, where the task is to predict the next word in a text corpus, evolution of a gated recurrent node structure improved performance by 10.8 perplexity points over the standard LSTM structure, which has remained essentially unchanged for more than 25 years. This method is based on a tree encoding of the node structure, an archive of solutions to encourage exploration, and prediction of final performance from partial training. A sketch of such a tree encoding follows.
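The sketch below shows one plausible tree encoding of a gated recurrent node in Python; the exact operator set and encoding here are assumptions for illustration, not the paper's. Internal nodes are elementwise operations, leaves are the node's inputs (x for the current input, h for the previous state), and mutation replaces subtrees, which is what lets evolution rediscover or improve on gating structures like the LSTM's.

```python
import math
import random

# Elementwise operations available at internal tree nodes.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "tanh": lambda a: math.tanh(a),
    "sig": lambda a: 1 / (1 + math.exp(-a)),
}

def evaluate(tree, env):
    # Leaves are input variable names; internal nodes are
    # (op, child, ...) tuples evaluated recursively.
    if isinstance(tree, str):
        return env[tree]
    op, *args = tree
    return OPS[op](*(evaluate(a, env) for a in args))

def random_subtree(depth=2):
    # Mutation operator: grow a random replacement subtree.
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", "h"])
    if random.random() < 0.5:
        return (random.choice(["tanh", "sig"]), random_subtree(depth - 1))
    return (random.choice(["add", "mul"]),
            random_subtree(depth - 1), random_subtree(depth - 1))

# ("mul", ("sig", "x"), ("tanh", "h")) reads as sigmoid(x) * tanh(h),
# i.e. an input-dependent gate on the previous state.
tree = ("mul", ("sig", "x"), ("tanh", "h"))
print(evaluate(tree, {"x": 0.5, "h": -1.0}))
print(random_subtree())
```

In a full system, each candidate tree would be instantiated as a recurrent cell and partially trained; predicting final perplexity from that partial training, and keeping an archive of structurally novel trees, keeps the search both cheap and exploratory.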