Could you summarize Algorithms 1 and 2 (IDVQ and DSRC)? Is the sim() function in Algorithm 2 a dot product?
Playing Atari with Six Neurons
Deep reinforcement learning on Atari games maps pixels directly to actions; internally, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it. Aiming to devote entire deep networks to decision making alone, we propose a new method for learning policies and compact state representations separately but simultaneously for policy approximation in reinforcement learning. State representations are generated by a novel algorithm based on Vector Quantization and Sparse Coding, trained online along with the network, and capable of growing its dictionary size over time. We also introduce new techniques allowing both the neural network and the evolution strategy to cope with varying dimensions. This enables networks of only 6 to 18 neurons to learn to play a selection of Atari games with performance comparable, and occasionally superior, to state-of-the-art techniques using evolution strategies on deep networks two orders of magnitude larger.
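To make the idea of an online, growing dictionary with sparse codes concrete, here is a minimal sketch. It is not the paper's Algorithm 1 or 2: the abstract does not give those details, so the function names (`encode`, `observe`), the thresholds (`eps`, `delta`), and the use of a dot product as the similarity are all assumptions made for illustration only.

```python
# Hypothetical sketch of a growing dictionary with binary sparse codes.
# NOT the paper's IDVQ/DSRC: names, thresholds, and the dot-product
# similarity are illustrative assumptions.

def dot(a, b):
    """Plain dot product, used here as a placeholder similarity."""
    return sum(x * y for x, y in zip(a, b))

def encode(x, dictionary, eps):
    """Greedy binary code: repeatedly pick the unused centroid most similar
    to the current residual and subtract it, until the residual is small."""
    residual = list(x)
    code = [0] * len(dictionary)
    while sum(abs(r) for r in residual) > eps:
        best, best_sim = None, 0.0
        for i, centroid in enumerate(dictionary):
            if code[i] == 0:
                s = dot(residual, centroid)
                if s > best_sim:
                    best, best_sim = i, s
        if best is None:  # no unused centroid with positive similarity
            break
        code[best] = 1
        residual = [r - c for r, c in zip(residual, dictionary[best])]
    return code, residual

def observe(x, dictionary, eps, delta):
    """Encode x; if too much of it is left unexplained, append the positive
    part of the residual as a new centroid, growing the dictionary by one."""
    code, residual = encode(x, dictionary, eps)
    leftover = [max(r, 0.0) for r in residual]
    if sum(leftover) > delta:
        dictionary.append(leftover)
        code.append(1)
    return code

dictionary = []
first = observe([1.0, 0.0], dictionary, eps=0.1, delta=0.5)
second = observe([1.0, 1.0], dictionary, eps=0.1, delta=0.5)
third = observe([1.0, 0.0], dictionary, eps=0.1, delta=0.5)
```

In this toy run the dictionary grows only when an observation contains something the existing centroids cannot explain, so the code length (and hence the input size of any downstream policy network) changes over time. That is the situation the abstract's "varying dimensions" techniques are meant to handle.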