# Normalizing Flows for Probabilistic Modeling and Inference

## Abstract

Normalizing flows provide a general mechanism for defining expressive probability distributions, only requiring the specification of a (usually simple) base distribution and a series of bijective transformations. There has been much recent work on normalizing flows, ranging from improving their expressive power to expanding their application. We believe the field has now matured and is in need of a unified perspective. In this review, we attempt to provide such a perspective by describing flows through the lens of probabilistic modeling and inference. We place special emphasis on the fundamental principles of flow design, and discuss foundational topics such as expressive power and computational trade-offs. We also broaden the conceptual framing of flows by relating them to more general probability transformations. Lastly, we summarize the use of flows for tasks such as generative modeling, approximate inference, and supervised learning.
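As a rough illustration of the mechanism described above (a simple base distribution pushed through a series of bijective transformations), the following sketch composes two elementwise affine bijections over a standard normal base and evaluates the resulting density with the change-of-variables rule. The names `AffineLayer`, `flow_sample`, and `flow_log_prob` are hypothetical and chosen only for this example; practical flows use far more expressive transformations.

```python
import numpy as np


def base_log_prob(u):
    """Log density of a standard normal base distribution, summed over dimensions."""
    return -0.5 * np.sum(u**2 + np.log(2.0 * np.pi), axis=-1)


class AffineLayer:
    """Elementwise bijection x = u * exp(log_scale) + shift (illustrative only)."""

    def __init__(self, log_scale, shift):
        self.log_scale = np.asarray(log_scale, dtype=float)
        self.shift = np.asarray(shift, dtype=float)

    def forward(self, u):
        # Sampling direction: base space -> data space.
        return u * np.exp(self.log_scale) + self.shift

    def inverse(self, x):
        # Density direction: returns the pre-image and log|det J| of the inverse map.
        u = (x - self.shift) * np.exp(-self.log_scale)
        log_det_inv = -np.sum(self.log_scale)
        return u, log_det_inv


def flow_sample(layers, n, dim, rng):
    """Draw n samples by pushing base samples through every layer in order."""
    u = rng.standard_normal((n, dim))
    for layer in layers:
        u = layer.forward(u)
    return u


def flow_log_prob(layers, x):
    """Change of variables: base log density plus the summed inverse log-determinants."""
    total = np.zeros(x.shape[0])
    for layer in reversed(layers):
        x, log_det_inv = layer.inverse(x)
        total += log_det_inv
    return base_log_prob(x) + total


# Two stacked affine layers; their composition is again affine, so the
# flow density can be checked against a closed-form Gaussian.
layers = [
    AffineLayer(log_scale=[0.0, np.log(2.0)], shift=[1.0, -1.0]),
    AffineLayer(log_scale=[np.log(3.0), 0.0], shift=[0.0, 0.5]),
]
samples = flow_sample(layers, n=4, dim=2, rng=np.random.default_rng(0))
log_density = flow_log_prob(layers, samples)
```

Because each layer exposes both a forward map and an inverse with its log-determinant, the same object supports sampling and exact density evaluation, which is the property that makes flows useful for both generative modeling and inference.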

## 1 Introduction

## 2 Normalizing Flows

## 3 Constructing Flows Part I: Finite Compositions

## 4 Constructing Flows Part II: Continuous-Time Transformations

## 5 Generalizations

## 6 Applications

## 7 Conclusions

## Acknowledgements

We would like to thank Ivo Danihelka for his invaluable feedback on the manuscript. We also thank Hyunjik Kim and Sébastien Racanière for useful discussions on a wide variety of flow-related topics.
