This paper, Value Iteration Networks, won the Best Paper Award at NIPS 2016.
Most work in deep RL has used neural network architectures that were developed for supervised learning and have no explicit module for planning. Given enough diverse training data and a sufficiently rich policy representation, this approach has been shown to work. However, exploiting the prior we have about the planning computation underlying the behavior can help the system learn faster and be more data-efficient. This paper does exactly that by introducing the Value Iteration Network (VIN) model.
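- The classic value iteration algorithm can be represented as a convolutional neural network. Each iteration is a convolution over the stacked reward and value maps that produces one Q-value channel per action, followed by a max over those channels; this is repeated K times. The resulting VI module can be embedded inside a standard feed-forward network, and its parameters can be learnt by ordinary backpropagation.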
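To make the "value iteration as a CNN" idea concrete, here is a minimal pure-Python sketch. It assumes a deterministic grid world with five moves (stay, up, down, left, right) and uses hand-set kernels rather than learned ones. Each action gets one filter over the stacked reward and value maps; the filter outputs are the Q-value channels, and a max over channels gives the new value map. Convolutions use zero padding, so stepping off the grid is worth 0. This is harmless here only because all rewards are non-negative. All names (`conv2d_same`, `one_hot_kernel`, `hand_set_kernels`, `vi_module`, `MOVES`) are hypothetical and do not come from the paper's code. In an actual VIN the kernels would be learned by backpropagation. Setting them by hand simply shows that a conv layer plus a channel-wise max can reproduce exact value iteration.

```python
from typing import List, Tuple

Grid = List[List[float]]
Kernel = List[List[float]]  # 3x3 filter

def conv2d_same(x: Grid, k: Kernel) -> Grid:
    """3x3 cross-correlation with zero padding (what deep learning libraries call a convolution)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # cells outside the grid count as 0
                        acc += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out

def one_hot_kernel(di: int, dj: int, weight: float) -> Kernel:
    """3x3 kernel that reads the neighbour at offset (di, dj), scaled by weight."""
    k = [[0.0] * 3 for _ in range(3)]
    k[di + 1][dj + 1] = weight
    return k

# Assumption (hypothetical): deterministic moves stay, up, down, left, right.
MOVES: List[Tuple[int, int]] = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def hand_set_kernels(gamma: float) -> List[Tuple[Kernel, Kernel]]:
    """One (reward-channel, value-channel) filter pair per action; a VIN would learn these."""
    # Reward filter reads R(s) at the current cell; value filter reads gamma * V(s + move).
    return [(one_hot_kernel(0, 0, 1.0), one_hot_kernel(di, dj, gamma)) for di, dj in MOVES]

def vi_module(reward: Grid, kernels: List[Tuple[Kernel, Kernel]], iterations: int) -> Grid:
    """Repeat: conv over [reward, value] -> Q channels, then max over channels -> new value."""
    h, w = len(reward), len(reward[0])
    value = [[0.0] * w for _ in range(h)]
    for _ in range(iterations):
        q_channels = []
        for k_reward, k_value in kernels:
            # One output channel of a 2-input-channel conv layer (sum over input channels).
            r_part = conv2d_same(reward, k_reward)
            v_part = conv2d_same(value, k_value)
            q_channels.append([[r_part[i][j] + v_part[i][j] for j in range(w)] for i in range(h)])
        # Channel-wise max plays the role of the max over actions in the Bellman update.
        value = [[max(q[i][j] for q in q_channels) for j in range(w)] for i in range(h)]
    return value

if __name__ == "__main__":
    reward = [[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]]  # hypothetical goal in the top-left corner
    v = vi_module(reward, hand_set_kernels(gamma=0.9), iterations=10)
    for row in v:
        print(" ".join(f"{x:6.3f}" for x in row))
```

With these hand-set weights, the value map after K iterations peaks at the goal and decays with grid distance from it, just as tabular value iteration would. A VIN replaces the hand-set filters with trainable ones and lets the rest of the network produce the reward map from the observation.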