This paper is available on arXiv under the CC 4.0 license.
Authors:
(1) Zheyu Oliver Wang, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA and [email protected];
(2) Ricardo Baptista, Computing + Mathematical Sciences, California Institute of Technology, Pasadena, CA and [email protected];
(3) Youssef Marzouk, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA and [email protected];
(4) Lars Ruthotto, Department of Mathematics, Emory University, Atlanta, GA and [email protected];
(5) Deepanshu Verma, Department of Mathematics, Emory University, Atlanta, GA and [email protected].
Since there are usually infinitely many diffeomorphisms that satisfy (2.1), imposing additional properties such as monotonicity can effectively limit the search. Optimal transport theory further motivates the restriction to monotone transport maps: conditional optimal transport (COT) maps ensure equality in (2.1) while minimizing a specific transport cost, and they often have additional structural properties that can be exploited during learning, which has led to several promising works. For example, when using the L2 transport cost, the OT problem has a unique solution, which is indeed a monotone operator; see [10] for theoretical results on the conditional OT problem. As observed in related approaches for (unconditional) deep generative modeling [36, 56, 58, 17], adding a transport cost penalty to the maximum likelihood training objective does not limit the ability to match the distributions.
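To make the penalized objective concrete, the following is a minimal one-dimensional sketch under assumptions of ours, not the paper's: the map is a monotone affine function T(z) = a z + b (a > 0) pushing a standard normal reference to the data, the negative log-likelihood follows from the change-of-variables formula, and the L2 transport cost is estimated by Monte Carlo on the inverse images. The function name `penalized_nll` and the affine parameterization are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # samples from the target

def penalized_nll(a, b, x, lam):
    """Negative log-likelihood of a monotone affine map T(z) = a*z + b
    (pushing a standard normal reference to the data), plus lam times a
    Monte Carlo estimate of the L2 transport cost E||T(z) - z||^2."""
    z = (x - b) / a  # inverse images T^{-1}(x); monotonicity needs a > 0
    # change of variables: -log p(x) = -log phi(z) + log |T'(z)| = ... + log a
    nll = 0.5 * np.mean(z**2) + 0.5 * np.log(2 * np.pi) + np.log(a)
    cost = np.mean((a * z + b - z) ** 2)  # ||T(z) - z||^2 on the same samples
    return nll + lam * cost

# lam = 0 recovers plain maximum likelihood; lam > 0 biases the map toward
# the identity while the same family of distributions remains reachable.
print(penalized_nll(1.5, 2.0, x, 0.0))
print(penalized_nll(1.5, 2.0, x, 0.1))
```

The penalty only re-ranks maps that fit the data equally well; as the sketch shows, it adds a non-negative term that vanishes at the identity map.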
Measure transport approaches that introduce monotonicity through optimal transport theory include the CondOT approach [9], normalizing flows [20, 36, 58], and a GAN approach [3]. A more general framework for constructing OT-regularized transport maps is proposed in [56]. Many other measure transport approaches do not consider transport cost but enforce monotonicity by incorporating specific structures in their map parameterizations; see, for example, [14, 26, 21, 39, 13, 15]. More generally, UMNN [53] provides a framework for constructing monotone neural networks. Other methods, such as the adaptive transport map (ATM) algorithm [4], enforce monotonicity through rectifier operators.
Our proposed approaches have several close relatives in the family of conditional optimal transport maps. PCP-Map is most similar to the CP-Flow approach in [20]. CP-Flow is primarily designed to approximately solve the joint optimal transport problem; a variant, however, allows for variational inference in the setting of a variational autoencoder (VAE). The GitHub repository associated with [20] contains a script that enables CP-Flow to approximately solve the COT problem over a 1D Gaussian mixture target distribution in a fashion similar to PCP-Map. We therefore compare our approach to this amortized CP-Flow in numerical experiments and find that it either fails to solve the problems or is considerably slower than PCP-Map; see subsection 6.4. Our implementation of PCP-Map differs in the following aspects: a simplified transport map architecture, new automatic differentiation tools that avoid the need for stochastic log-determinant estimators, and a projected gradient method to enforce non-negativity constraints on parts of the weights.
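The projected gradient idea can be sketched on a toy problem: after each plain gradient step, the constrained weights are projected back onto the non-negative orthant by clipping. The least-squares setup and all names below are illustrative assumptions of ours, not the paper's actual architecture or training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem with a non-negativity constraint on the weights,
# mimicking how some layer weights can be kept >= 0 during training.
A = rng.normal(size=(50, 5))
w_true = np.abs(rng.normal(size=5))  # feasible ground truth
y = A @ w_true

w = rng.normal(size=5)  # initial iterate, possibly infeasible
lr = 0.1
for _ in range(500):
    grad = A.T @ (A @ w - y) / len(y)  # gradient of the mean squared residual
    w -= lr * grad                     # unconstrained gradient step
    w = np.maximum(w, 0.0)             # projection onto {w >= 0}

print(np.all(w >= 0))  # every iterate is feasible after the projection
```

The projection is cheap (elementwise clipping) because the feasible set is a product of half-lines, which is one reason this strategy is attractive compared to reparameterizing the weights through a softplus or exponential.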
Another close relative is the CondOT approach in [9]. The main distinction between CondOT and PCP-Map is the definition of the learning problem: the former solves the W2 COT problem in an adversarial manner, which leads to a challenging stochastic saddle point problem, whereas our approach minimizes the L2 transport cost, which results in a minimization problem. The COT-Flow approach extends the dynamic OT formulation in [36] to enable conditional sampling and density estimation.