这是用户在 2024-5-3 22:39 为 https://ar5iv.labs.arxiv.org/html/1710.02634?_immersive_translate_auto_translate=1 保存的双语快照页面,由 沉浸式翻译 提供双语支持。了解如何保存?

Notions of optimal transport theory and how to implement them on a computer
最优输送理论的概念及如何在计算机上实施它们

Bruno Lévy and Erica Schwindt
Bruno Lévy 和 Erica Schwindt
Abstract 摘要

This article gives an introduction to optimal transport, a mathematical theory that makes it possible to measure distances between functions (or distances between more general objects), to interpolate between objects or to enforce mass/volume conservation in certain computational physics simulations. Optimal transport is a rich scientific domain, with active research communities, both on its theoretical aspects and on more applicative considerations, such as geometry processing and machine learning. This article aims at explaining the main principles behind the theory of optimal transport, introduce the different involved notions, and more importantly, how they relate, to let the reader grasp an intuition of the elegant theory that structures them. Then we will consider a specific setting, called semi-discrete, where a continuous function is transported to a discrete sum of Dirac masses. Studying this specific setting naturally leads to an efficient computational algorithm, that uses classical notions of computational geometry, such as a generalization of Voronoi diagrams called Laguerre diagrams.
本文介绍了最优输运的数学理论,该理论使得能够衡量函数之间的距离(或更一般对象之间的距离),在对象之间进行插值,或在某些计算物理模拟中强制执行质量/体积保守性。最优输运是一个丰富的科学领域,拥有积极的研究社区,既包括其理论方面,也包括更实际的考虑,如几何处理和机器学习。本文旨在解释最优输运理论背后的主要原则,介绍涉及的不同概念,更重要的是它们之间的关系,以便读者把握构建它们的优雅理论的直觉。然后我们将考虑一个称为半离散的特定设置,其中连续函数被输送到 Dirac 质量的离散总和。研究这个特定设置自然地导致了一种高效的计算算法,该算法使用计算几何的经典概念,如称为拉盖尔图的 Voronoi 图的一般化。

1 Introduction 1 介绍

This article presents an introduction to optimal transport. It summarizes and complements a series of conferences given by B. Lévy between 2014 and 2017. The presentations stays at an elementary level, that corresponds to a computer scientist’s vision of the problem. In the article, we stick to using standard notions of analysis (functions, integrals) and linear algebra (vectors, matrices), and give an intuition of the notion of measure. The main objective of the presentation is to understand the overall structure of the reasoning 111Teach principles, not equations. [R. Feynman]
教授原理,而非方程式。 [R. Feynman]

本文介绍了最优输运。 它总结和补充了 B. Lévy 在 2014 年至 2017 年间举办的一系列会议。 演示在一个初等水平上停留,对应于计算机科学家对问题的看法。 在本文中,我们坚持使用分析的标准概念(函数,积分)和线性代数(向量,矩阵),并提供关于测度概念的直觉。 这篇演讲的主要目标是理解推理的整体结构 1
, and to follow a continuous path from the theory to an efficient algorithm that can be implemented in a computer.
以及从理论到有效算法的连续路径,这些算法可以在计算机中实现。

Optimal transport, initially studied by Monge, [Mon84], is a very general mathematical framework that can be used to model a wide class of application domains. In particular, it is a natural formulation for several fundamental questions in computer graphics [Mém11, Mér11, BvdPPH11], because it makes it possible to define new ways of comparing functions, of measuring distances between functions and interpolating between two (or more) functions :
最优输运最初由蒙日[Monge,Mon84]研究,是一个非常通用的数学框架,可以用来模拟广泛的应用领域。特别是,它是计算机图形学中几个基本问题的自然表述[Mém11,Mér11,BvdPPH11],因为它可以定义比较函数、测量函数之间距离以及插值两个(或多个)函数的新方法:

Refer to caption

Figure 1: Comparing functions: one would like to say that f1subscript𝑓1f_{1} is nearer to f2subscript𝑓2f_{2} than f3subscript𝑓3f_{3}, but the classical L2subscript𝐿2L_{2} norm “does not see” that the graph of f2subscript𝑓2f_{2} corresponds to the graph of f1subscript𝑓1f_{1} slightly shifted along the x𝑥x axis.
图 1:比较函数:人们希望说 f1subscript𝑓1f_{1}f3subscript𝑓3f_{3} 更靠近 f2subscript𝑓2f_{2} ,但是经典的 L2subscript𝐿2L_{2} 范数“看不到” f2subscript𝑓2f_{2} 的图与 f1subscript𝑓1f_{1} 的图沿 x𝑥x 轴稍微移动的对应关系。

Comparing functions 比较函数

Consider the functions f1subscript𝑓1f_{1}, f2subscript𝑓2f_{2} and f3subscript𝑓3f_{3} in Figure 1. Here we have chosen a function f1subscript𝑓1f_{1} with a wildly oscillating graph, and a function f2subscript𝑓2f_{2} obtained by translating the graph of f1subscript𝑓1f_{1} along the x𝑥x axis. The function f3subscript𝑓3f_{3} corresponds to the mean value of f1subscript𝑓1f_{1} (or f2subscript𝑓2f_{2}). If one measures the relative distances between these functions using the classical L2subscript𝐿2L_{2} norm, that is dL2(f,g)=(f(x)g(x))2𝑑xsubscript𝑑subscript𝐿2𝑓𝑔superscript𝑓𝑥𝑔𝑥2differential-d𝑥d_{L_{2}}(f,g)=\int(f(x)-g(x))^{2}dx, one will find that f1subscript𝑓1f_{1} is nearer to f3subscript𝑓3f_{3} than f2subscript𝑓2f_{2}. Optimal transport makes it possible to define a distance that will take into account that the graph of f2subscript𝑓2f_{2} can be obtained from f1subscript𝑓1f_{1} through a translation (like here), or through a deformation of the graph of f1subscript𝑓1f_{1}. From the point of view of this new distance, the function f1subscript𝑓1f_{1} will be nearer to f2subscript𝑓2f_{2} than to f3subscript𝑓3f_{3}.
考虑图 1 中的函数 f1subscript𝑓1f_{1}f2subscript𝑓2f_{2}f3subscript𝑓3f_{3} 。这里我们选择了一个图形急剧振荡的函数 f1subscript𝑓1f_{1} ,以及一个通过沿 x𝑥x 轴平移 f1subscript𝑓1f_{1} 的图形获得的函数 f2subscript𝑓2f_{2} 。函数 f3subscript𝑓3f_{3} 对应于 f1subscript𝑓1f_{1} (或 f2subscript𝑓2f_{2} )的平均值。如果使用经典的 L2subscript𝐿2L_{2} 范数测量这些函数之间的相对距离,即 dL2(f,g)=(f(x)g(x))2𝑑xsubscript𝑑subscript𝐿2𝑓𝑔superscript𝑓𝑥𝑔𝑥2differential-d𝑥d_{L_{2}}(f,g)=\int(f(x)-g(x))^{2}dx ,就会发现 f1subscript𝑓1f_{1}f2subscript𝑓2f_{2} 更接近 f3subscript𝑓3f_{3} 。最优输运使得可能定义一个考虑到 f2subscript𝑓2f_{2} 的图形可以通过平移(就像这里一样)或通过 f1subscript𝑓1f_{1} 的图形的变形从 f1subscript𝑓1f_{1} 获得。从这种新距离的视角来看,函数 f1subscript𝑓1f_{1} 将比 f3subscript𝑓3f_{3} 更接近 f2subscript𝑓2f_{2}

Refer to caption


Figure 2: Interpolating between two functions: linear interpolation makes a hump disappear while the other hump appears; displacement interpolation, stemming from optimal transport, will displace the hump as expected.
图 2:在两个函数之间插值:线性插值使一个驼峰消失,而另一个驼峰出现;来自最优输运的位移插值将按预期移动驼峰。

Interpolating between functions:
在函数之间插值:

Now consider the functions u𝑢u and v𝑣v in Figure 2. Here we suppose that u𝑢u corresponds to a certain physical quantity measured at an initial time t=0𝑡0t=0 and that v𝑣v corresponds to the same phenomenon measured at a final time t=1𝑡1t=1. The problem that we consider now consists in reconstructing what happened between t=0𝑡0t=0 and t=1𝑡1t=1. If we use linear interpolation (Figure 2, top-right), we will see the left hump progressively disappearing while the right hump progressively appears, which is not very realistic if the functions represent for instance a propagating wave. Optimal transport makes it possible to define another type of interpolation (Mc. Cann’s displacement interpolation, Figure 2, bottom-right), that will progressively displace and morph the graph of u𝑢u into the graph of v𝑣v.
现在考虑图 2 中的函数 u𝑢uv𝑣v 。在这里,我们假设 u𝑢u 对应于在初始时间 t=0𝑡0t=0 测量的某个物理量,而 v𝑣v 对应于在最终时间 t=1𝑡1t=1 测量的同一现象。我们现在考虑的问题是重建 t=0𝑡0t=0t=1𝑡1t=1 之间发生的事情。如果我们使用线性插值(图 2,右上角),我们会看到左侧的小丘逐渐消失,而右侧的小丘逐渐出现,这对于函数例如代表传播波的情况并不是很现实的。最优输运使得可以定义另一种类型的插值(Mc. Cann 的位移插值,图 2,右下角),这将逐渐将 u𝑢u 的图形位移和变形为 v𝑣v 的图形。

Optimal transport makes it possible to define a geometry of a space of functions222or more general objects, called probability measures, more on this later.
或更通用的对象,称为概率测度,以后会详细介绍。

最优输运使得可以定义函数空间的几何 2
, and thus gives a definition of distance in this space, as well as means of interpolating between different functions, and in general, defining the barycenter of a weighted family of functions, in a very general context. Thus, optimal transport appears as a fundamental tool in many applied domains. In computer graphics, applications were proposed, to compare and interpolate objects of diverse natures [BvdPPH11], to generate lenses that can concentrate light to form caustics in a prescribed manner [MMT17, STTP14]. Moreover, optimal transport defines new tools that can be used to discretize Partial Differential Equations, and define new numerical solution mechanisms [BCMO14]. This type of numerical solution mechanism can be used to simulate for instance fluids [GM17], with spectacular applications and results in computer graphics [dGWH+15].
,从而为该空间提供了距离的定义,以及在不同函数之间插值的手段,通常在非常一般的情况下定义加权函数族的重心。因此,最优输运在许多应用领域中都是一种基本工具。在计算机图形学中,有一些应用被提出,用于比较和插值不同性质的对象[BvdPPH11],生成可以聚焦光线以预定方式形成焦散光斑的透镜[MMT17, STTP14]。此外,最优输运定义了可以用来离散化偏微分方程并定义新的数值解机制[BCMO14]的新工具。这种数值解机制可以用来模拟例如流体[GM17],在计算机图形学中具有引人注目的应用和结果[dGWH + 15]。

The two sections that follow are partly inspired by [Vil09], [San14], [Caf03] and [AG13], but stay at an elementary level. Here the main goal is to give an intuition of the different concepts, and more importantly an idea of the way the relate together. Finally we will see how they can be directly used to design a computational algorithm with very good performance, that can be used in practice in several application domains.
以下两个部分在一定程度上受到[Vil09],[San14],[Caf03]和[AG13]的启发,但保持在初级水平。 在这里,主要目标是给出对不同概念的直觉,更重要的是对它们之间的关系的一种理解。 最后,我们将看到如何直接使用它们来设计一个性能非常优异的计算算法,该算法可以在实践中在多个应用领域中使用。

Refer to caption

Figure 3: Given two terrains defined by their height functions u𝑢u and v𝑣v, symbolized here as gray levels, Monge’s problem consists in transforming one terrain into the other one by moving matter through an application T𝑇T. This application needs to satisfy a mass conservation constraint.
图 3:假设有两个由高度函数定义的地形 u𝑢uv𝑣v ,这里表示为灰度级别,Monge 的问题在于通过将物质通过应用 T𝑇T 从一个地形转换为另一个地形。 这种应用需要满足质量守恒约束。

2 Monge’s problem 蒙日问题

The initial optimal transport problem was first introduced and studied by Monge, right before the French revolution [Mon84]. We first give an intuitive idea of the problem, then quickly introduce the notion of measure, that is necessary to formally state the problem in its most general form and to analyze it.
最初的最优输运问题是在法国大革命前由蒙日首次引入和研究的[Mon84]。我们首先直观地介绍问题的概念,然后快速引入测度的概念,这对于在其最一般形式下正式陈述问题并分析问题是必要的。

2.1 Intuition 直觉

Monge’s initial motivation to study this problem was very practical: supposing you have an army of workmen, how can you transform a terrain with an initial landscape into a given desired target landscape, while minimizing the total amount of work ?
蒙日最初研究这个问题的动机非常实际:假设你有一支工人队,如何将初始地形转变为目标地形,同时最大限度地减少总工作量?

Monge’s initial problem statement was as follows:
蒙日最初的问题陈述如下:

infT:XXXc(x,T(x))u(x)𝑑x subject to: BX,T1(B)u(x)𝑑x=Bv(x)𝑑xsubscriptinfimum:𝑇𝑋𝑋subscript𝑋𝑐𝑥𝑇𝑥𝑢𝑥differential-d𝑥 subject to: formulae-sequencefor-all𝐵𝑋subscriptsuperscript𝑇1𝐵𝑢𝑥differential-d𝑥subscript𝐵𝑣𝑥differential-d𝑥\begin{array}[]{l}\inf\limits_{T:X\rightarrow X}\int\limits_{X}c(x,T(x))u(x)dx\\[17.07164pt] \mbox{\ \ \ \ subject to: }\\[17.07164pt] \forall B\subset X,\int\limits_{T^{-1}(B)}u(x)dx=\int\limits_{B}v(x)dx\end{array}

where X𝑋X is a subset of 2superscript2{\mathbb{R}}^{2}, u𝑢u and v𝑣v are two positive functions defined on X𝑋X and such that Xu(x)𝑑xsubscript𝑋𝑢𝑥differential-d𝑥\int_{X}u(x)dx = Yv(x)𝑑xsubscript𝑌𝑣𝑥differential-d𝑥\int_{Y}v(x)dx, and c(,)𝑐c(\cdot,\cdot) is a convex distance (the Euclidean distance in Monge’s initial problem statement).
其中 X𝑋X2superscript2{\mathbb{R}}^{2} 的子集, u𝑢uv𝑣v 是在 X𝑋X 上定义的两个正函数,使得 Xu(x)𝑑xsubscript𝑋𝑢𝑥differential-d𝑥\int_{X}u(x)dx = Yv(x)𝑑xsubscript𝑌𝑣𝑥differential-d𝑥\int_{Y}v(x)dxc(,)𝑐c(\cdot,\cdot) 是一个凸距离(蒙日初始问题陈述中的欧几里得距离)。

The functions u𝑢u and v𝑣v represent the height of the current landscape and the height of the target landscape respectively (symbolized as gray levels in Figure 3). The problem consists in finding (if it exists) a function T𝑇T from X𝑋X to X𝑋X that transforms the current landscape u𝑢u into the desired one v𝑣v, while minimizing the product of the amount of transported earth u(x)𝑢𝑥u(x) with the distance c(x,T(x))𝑐𝑥𝑇𝑥c(x,T(x)) to which it was transported. Clearly, the amount of earth is conserved during transport, thus the total quantity of earth should be the same in the source and target landscapes (the integrals of u𝑢u and v𝑣v over X𝑋X should coincide). This global matter conservation constraint needs to be completed with a local one. The local matter conservation constraint enforces that in the target landscape, the quantity of earth received in any subset B𝐵B of X𝑋X corresponds to what was transported here, that is the quantity of earth initially present in the pre-image T1(B)superscript𝑇1𝐵T^{-1}(B) of B𝐵B under T𝑇T. Without this constraint, one could locally create matter in some places and annihilate matter in other places in a counterbalancing way. A map T𝑇T that satisfies the local mass conservation constraint is called a transport map.
函数 u𝑢uv𝑣v 分别表示当前地貌的高度和目标地貌的高度(在图 3 中以灰度级表示)。问题在于找到(如果存在)一个从 X𝑋XX𝑋X 的函数 T𝑇T ,将当前地貌 u𝑢u 转变为期望的地貌 v𝑣v ,同时最小化通过运输的土壤数量 u(x)𝑢𝑥u(x) 与其运输距离 c(x,T(x))𝑐𝑥𝑇𝑥c(x,T(x)) 的乘积。显然,土壤的数量在运输过程中是不变的,因此源地貌和目标地貌中土壤的总量应当相同(在 X𝑋Xu𝑢uv𝑣v 的积分应当一致)。这一全局物质守恒约束需要进一步补充一条局部物质守恒约束。局部物质守恒约束要求在目标地貌中,任意子集 B𝐵B 所接收的土壤数量应与运输到这里的土壤数量对应,也就是最初存在于 T𝑇T 下的 B𝐵B 的前像中土壤数量应当一致。如果没有这一约束,人们可以在一些地方本地创造物质,并在其他地方以相等的方式消除物质。 符合局部质量守恒约束的地图称为输运地图。

Refer to caption

Figure 4: Transport from a function (gray levels) to a discrete point-set (blue disks).
图 4:从函数(灰度级)传输到离散点集(蓝色圆盘)。

2.2 Monge’s problem with measures
2.2 蒙日问题与度量

We now suppose that instead of a “target landscape”, we wish to transport earth (or a resource) towards a set of points (that will be denoted by Y𝑌Y for now on), that represent for instance a set of factories that exploit a resource, see Figure 4. Each factory wishes to receive a certain quantity of resource (depending for instance of the number of potential customers around the factory). Thus, the function v𝑣v that represents the “target landscape” is replaced with a function on a finite set of points. However, if a function v𝑣v is zero everywhere except on a finite set of points, then its integral over X𝑋X is also zero. This is a problem, because for instance one cannot properly express the mass conservation constraint. For this reason, the notion of function is not rich enough for representing this configuration. One can use instead measures (more on this below), and associate with each factory a Dirac mass weighted by the quantity of resource to be transported to the factory.
现在我们假设,我们不是想要将土地(或资源)运输到“目标景观”,而是希望将土地(或资源)运输到一组点(暂时表示为 Y𝑌Y )。这些点例如代表一组开采资源的工厂,参见图 4。每个工厂都希望获得一定数量的资源(取决于工厂周围潜在客户的数量)。因此,表示“目标景观”的函数 v𝑣v 被替换为一组有限点上的函数。然而,如果函数 v𝑣v 在除一组有限点外的所有地方均为零,则其在 X𝑋X 上的积分也为零。这是一个问题,因为例如不能正确表达质量守恒约束。因此,函数的概念不足以表示这种配置。可以改用度量(下文详述),并将每个工厂与要运往工厂的资源数量加权的 Dirac 质量相关联。

From now on, we will use measures μ𝜇\mu and ν𝜈\nu to represent the “current landscape” and the “target landscape”. These measures are supported by sets X𝑋X and Y𝑌Y, that may be different sets (in the present example, X𝑋X is a subset of 2superscript2{\mathbb{R}}^{2} and Y𝑌Y is a discrete set of points). Using measures instead of function not only makes it possible to study our “transport to discrete set of factories” problem, but also it can be used to formalize computer objects (meshes) and directly leads to a computational algorithm. This algorithm is very elegant because it is a verbatim computer translation of the mathematical theory (see §7.6). In this particular setting, translating from the mathematical language to the algorithmic setting does not require to make any approximation. This is made possible by the generality of the notion of measure.
从现在开始,我们将使用测度 μ𝜇\muν𝜈\nu 来表示“当前景观”和“目标景观”。这些测度由集合 X𝑋XY𝑌Y 支持,这些集合可能是不同的(在当前示例中, X𝑋X2superscript2{\mathbb{R}}^{2} 的子集, Y𝑌Y 是一组离散的点)。使用测度而不是函数不仅可以研究我们的“运输到离散工厂集合”的问题,还可以用来形式化计算机对象(网格),并直接导致一个计算算法。这个算法非常优雅,因为它是数学理论的逐字计算机转换(见第 7.6 节)。在这种特定情境中,将数学语言转化为算法设置不需要进行任何近似。这是由测度概念的一般性所实现的。

The reader who wishes to learn more on measure theory may refer to the textbook [Tao11]. To keep the length of this article reasonable, we will not give here the formal definition of a measure. In our context, one can think of a measure as a “function” that can be only queried using integrals and that can be “concentrated” on very small sets (points). The following table can be used to intuitively translate from the “language of functions” to the “language of measures” :
希望学习测度论的读者可以参考教材[Tao11]。为了保持本文的长度合理,我们不会在此处给出测度的正式定义。在我们的语境中,可以将测度看作是一个只能通过积分查询并能“聚集”在非常小的集合(点)上的“函数”。以下表格可用于直观地从“函数的语言”翻译成“测度的语言”:

Function uMeasure μ Bu(x)𝑑xμ(B) or B𝑑μBf(x)u(x)𝑑xBf(x)𝑑μu(x)N/Amissing-subexpressionmissing-subexpressionFunction uMeasure μ missing-subexpressionmissing-subexpressionsubscript𝐵𝑢𝑥differential-d𝑥𝜇𝐵 or subscript𝐵differential-d𝜇subscript𝐵𝑓𝑥𝑢𝑥differential-d𝑥subscript𝐵𝑓𝑥differential-d𝜇𝑢𝑥𝑁𝐴\begin{array}[]{|c|c|}\hline\cr\mbox{Function $u$}&\mbox{Measure $\mu$ }\\[5.69054pt] \hline\cr\int_{B}u(x)dx&\mu(B)\mbox{ or }\int_{B}d\mu\\[5.69054pt] \int_{B}f(x)u(x)dx&\int_{B}f(x)d\mu\\[5.69054pt] u(x)&N/A\\ \hline\cr\end{array}

(Note: in contrast with functions, measures cannot be evaluated at a point, they can be only integrated over domains).
(注意:与函数相对应,测度不能在一个点被评估,它们只能在域上被积分)。

In its version with measures, Monge’s problem can be stated as follows:
在具有测度的版本中,蒙日问题可以陈述如下:

infT:XYXc(x,T(x))𝑑μ subject to ν=Tμsubscriptinfimum:𝑇𝑋𝑌subscript𝑋𝑐𝑥𝑇𝑥differential-d𝜇 subject to 𝜈𝑇𝜇\begin{array}[]{l}\inf\limits_{T:X\rightarrow Y}\int\limits_{X}c(x,T(x))d\mu\ \mbox{ subject to }\ \nu=T\sharp\mu\end{array} (M)

where X𝑋X and Y𝑌Y are Borel sets (that is, sets that can be measured), μ𝜇\mu and ν𝜈\nu are two measures on X𝑋X and Y𝑌Y respectively such that μ(X)=ν(Y)𝜇𝑋𝜈𝑌\mu(X)=\nu(Y) and c(,)𝑐c(\cdot,\cdot) is a convex distance. The constraint ν=Tμ𝜈𝑇𝜇\nu=T\sharp\mu, that reads “T𝑇T pushes μ𝜇\mu onto ν𝜈\nu” corresponds to the local mass conservation constraint. Given a measure μ𝜇\mu on X𝑋X and a map T𝑇T from X𝑋X to Y𝑌Y, the measure Tμ𝑇𝜇T\sharp\mu on Y𝑌Y, called “the pushforward of μ𝜇\mu by T𝑇T”, is such that Tμ(B)=μ(T1(B))𝑇𝜇𝐵𝜇superscript𝑇1𝐵T\sharp\mu(B)=\mu(T^{-1}(B)) for all Borel set BY𝐵𝑌B\subset Y. Thus, the local mass conservation constraint means that μ(T1(B))=ν(B)𝜇superscript𝑇1𝐵𝜈𝐵\mu(T^{-1}(B))=\nu(B) for all Borel set B𝐵B \subset Y𝑌Y.
其中 X𝑋XY𝑌Y 是 Borel 集合(即可以被测量的集合), μ𝜇\muν𝜈\nu 分别是 X𝑋XY𝑌Y 上的两个测度,使得 μ(X)=ν(Y)𝜇𝑋𝜈𝑌\mu(X)=\nu(Y)c(,)𝑐c(\cdot,\cdot) 是一个凸距离。约束条件 ν=Tμ𝜈𝑇𝜇\nu=T\sharp\mu ,“ T𝑇Tμ𝜇\mu 推到 ν𝜈\nu 上”对应于局部质量守恒约束。给定 X𝑋X 上的一个测度 μ𝜇\mu 和从 X𝑋XY𝑌Y 的映射 T𝑇TY𝑌Y 上的测度 Tμ𝑇𝜇T\sharp\mu ,称为“ μ𝜇\muT𝑇T 推送的前向”,满足所有 Borel 集合 BY𝐵𝑌B\subset Y 。因此,局部质量守恒约束意味着所有 Borel 集合 B𝐵B\subset Y𝑌Y

The local mass conservation constraint makes the problem very difficult: imagine now that you want to implement a computer program that enforces it: the constraint concerns all the subsets B𝐵B of Y𝑌Y. Could you imagine an algorithm that just tests whether a given map satisfies it ? What about enforcing it ? We will see below a series of transformations of the initial problem into equivalent problems, where the constraint becomes linear. We will finally end up with a simple convex optimization problem, that can be solved numerically using classical methods.
本地质量守恒约束使问题变得非常困难: 现在想象一下,您想要实现一个强制执行它的计算机程序: 约束涉及所有子集 B𝐵B Y𝑌Y 。 您能想象一个仅测试给定映射是否满足它的算法吗?关于强制执行它呢?我们将在下面看到一系列初始问题转化为等价问题的转换,约束变为线性。 最后,我们将得到一个简单的凸优化问题,可以使用经典方法进行数值解决。

Refer to caption


Figure 5: A classical example of the existence problem: there is no optimal transport between a segment L1subscript𝐿1L_{1} and two parallel segments L2subscript𝐿2L_{2} and L3subscript𝐿3L_{3} (it is always possible to find a better transport by replacing hh with h/22h/2).
图 5:存在问题的经典例子:在线段 L1subscript𝐿1L_{1} 和两个平行线段 L2subscript𝐿2L_{2}L3subscript𝐿3L_{3} 之间没有最佳传输(总是可以通过用 h/22h/2 替换 hh 找到更好的传输)。
Refer to caption
Figure 6: Four example of transport plans in 1D. A: a segment is translated. B: a segment is split into two segments. C: a Dirac mass is split into two Dirac masses; D: a Dirac mass is spread along two segments. The first two examples (A and B) have the form (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu where T𝑇T is a transport map. The third and fourth ones (C and D) have no corresponding transport map, because each of them splits a Dirac mass.
图 6:1D 运输方案示例。A:一个段被平移。B:一个段被分成两个段。C:一个 Dirac 质量被分成两个 Dirac 质量;D:一个 Dirac 质量沿两个段传播。前两个示例(A 和 B)的形式为 (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu ,其中 T𝑇T 是一个运输映射。第三个和第四个示例(C 和 D)没有对应的运输映射,因为它们每个都分割了一个 Dirac 质量。

Before then, let us get back to examine the original problem. The local mass conservation constraint is not the only difficulty: the functional optimized by Monge’s problem is non-symmetric, and this causes additional difficulties when studying the existence of solutions for problem (M). The problem is not symmetric because T𝑇T needs to be a map, therefore the source and target landscape do not play the same role. Thus, it is possible to merge earth (if T(x1)=T(x2)𝑇subscript𝑥1𝑇subscript𝑥2T(x_{1})=T(x_{2}) for two different points x1subscript𝑥1x_{1} and x2subscript𝑥2x_{2}), but it is not possible to split earth (for that, we would need a “map” T𝑇T that could send the same point x𝑥x to two different points y1subscript𝑦1y_{1} and y2subscript𝑦2y_{2}). The problem is illustrated in Figure 5: suppose that you want to compute the optimal transport between a segment L1subscript𝐿1L_{1} (that symbolizes a “wall of earth”) and two parallel segments L2subscript𝐿2L_{2} and L3subscript𝐿3L_{3} (that symbolize two “trenches” with a depth that correspond to half the height of the wall of earth). Now we want to transport the wall of earth to the trenches, to make the landscape flat. To do so, it is possible to decompose L1subscript𝐿1L_{1} into segments of length hh, sent alternatively towards L2subscript𝐿2L_{2} and L3subscript𝐿3L_{3} (Figure 5 on the left). For any length hh, it is always possible to find a better map T𝑇T, that is a lower value of the functional in (M), by subdividing L1subscript𝐿1L_{1} into smaller segments (Figure 5 on the right). The best way to proceed consists in sending from each point of L1subscript𝐿1L_{1} half the earth to L2subscript𝐿2L_{2} and half the earth to L3subscript𝐿3L_{3}, which cannot be represented by a map. Thus, the best solution of problem (M) is not a map. In a more general setting, this problem appears each time the source measure μ𝜇\mu has mass concentrated on a manifold of dimension d1𝑑1d-1 [McC95] (like the segment L1subscript𝐿1L_{1} in the present example).
在此之前,让我们回到检查原始问题。当地质量守恒约束并不是唯一的困难时:蒙热问题的优化功能是非对称的,这在研究问题(M)的解存在时导致额外困难。该问题不对称是因为 T𝑇T 需要是一张地图,因此源地形和目标地形扮演的角色并不相同。因此,可以合并大地(如果 T(x1)=T(x2)𝑇subscript𝑥1𝑇subscript𝑥2T(x_{1})=T(x_{2}) 两个不同点 x1subscript𝑥1x_{1}x2subscript𝑥2x_{2} ),但不可能分开地球(为此,我们需要一张“地图” T𝑇T 能将同一点 x𝑥x 发送到两个不同点 y1subscript𝑦1y_{1}y2subscript𝑦2y_{2} )。该问题在图 5 中得到了说明:假设你想要计算一个代表“一堵土墙”的线段 L1subscript𝐿1L_{1} 和两个平行线段 L2subscript𝐿2L_{2}L3subscript𝐿3L_{3} 之间的最佳传输(代表两个深度相当于土墙高度一半的“沟渠”)。现在我们想把土墙运输到沟渠,使地形平坦。为了做到这一点,可以将 L1subscript𝐿1L_{1} 分解为长度为 hh 的线段,交替地发送到 L2subscript𝐿2L_{2}L3subscript𝐿3L_{3} (左边的图 5)。 对于任意长度 hh ,总是可以通过将 L1subscript𝐿1L_{1} 细分为更小的段(右侧第 5 图)而找到一个更好的地图 T𝑇T ,即(M)中的功能值更低的地图。最佳操作方法是从 L1subscript𝐿1L_{1} 的每一点向 L2subscript𝐿2L_{2} 发送地球的一半,向 L3subscript𝐿3L_{3} 发送地球的一半,而这不能用地图表示。因此,问题(M)的最佳解决方案不是地图。在一个更一般的设定中,每当源测度 μ𝜇\mu 在一个维度 d1𝑑1d-1 的流形上有集中质量时[McC95](例如当前示例中的段 L1subscript𝐿1L_{1} )时,该问题会出现。

3 Kantorovich’s relaxed problem
3Kantorovich 的放松问题

To overcome this difficulty, Kantorovich stated a problem with a larger space of solutions, that is, a relaxation of problem (M), where mass can be both split and merged. The idea consists in solving for the “graph of T𝑇T” instead of T𝑇T. One may think of the graph of T𝑇T as a function g𝑔g defined on X×Y𝑋𝑌X\times Y that indicates for each couple of points xX,yYformulae-sequence𝑥𝑋𝑦𝑌x\in X,y\in Y how much matter goes from x𝑥x to y𝑦y. However, once again, we cannot use standard functions to represent the graph of T𝑇T: if you think about the graph of a univariate function xf(x)maps-to𝑥𝑓𝑥x\mapsto f(x), it is defined on 2superscript2\mathbb{R}^{2} but concentrated on a curve. For this reason, as in our previous example with factories, one needs to use measures. Thus, we are now looking for a measure γ𝛾\gamma supported by the product space X×Y𝑋𝑌X\times Y. The relaxed problem is stated as follows:
为了克服这一困难,坎托罗维奇提出了一个具有更大解空间的问题,即问题(M)的放松,其中质量既可以分裂也可以合并。这个想法在解决“ T𝑇T 的图”而不是 T𝑇T 时得以体现。我们可以将 T𝑇T 的图想象为定义在 X×Y𝑋𝑌X\times Y 上的函数 g𝑔g ,它指示每对点 xX,yYformulae-sequence𝑥𝑋𝑦𝑌x\in X,y\in Y 之间有多少物质从 x𝑥x 流向 y𝑦y 。然而,再一次,我们不能使用标准函数来表示 T𝑇T 的图:如果你想到一个一元函数 xf(x)maps-to𝑥𝑓𝑥x\mapsto f(x) 的图,它在 2superscript2\mathbb{R}^{2} 上定义,但集中在一条曲线上。因此,与我们之前关于工厂的例子一样,我们需要使用测度。因此,我们现在正在寻找一个支持产品空间 X×Y𝑋𝑌X\times Y 的测度 γ𝛾\gamma 。放松后的问题陈述如下:

infγ{X×Yc(x,y)𝑑γ|γ0 and γΠ(μ,ν)}where: Π(μ,ν)={γ𝒫(X×Y)|(PX)γ=μ;(PY)γ=ν}subscriptinfimum𝛾conditional-setsubscript𝑋𝑌𝑐𝑥𝑦differential-d𝛾𝛾0 and 𝛾Π𝜇𝜈where: Π𝜇𝜈conditional-set𝛾𝒫𝑋𝑌formulae-sequencesubscript𝑃𝑋𝛾𝜇subscript𝑃𝑌𝛾𝜈\begin{array}[]{l}\inf\limits_{\gamma}\left\{\int\limits_{X\times Y}c(x,y)d\gamma\ |\ \gamma\geq 0\mbox{ and }\gamma\in\Pi(\mu,\nu)\right\}\\[14.22636pt] \mbox{where: }\\[8.53581pt] \Pi(\mu,\nu)=\{\gamma\in{\cal P}(X\times Y)\ |\ (P_{X})\sharp\gamma=\mu\ ;\ (P_{Y})\sharp\gamma=\nu\}\end{array} (K)

where (PX)subscript𝑃𝑋(P_{X}) and (PY)subscript𝑃𝑌(P_{Y}) denote the two projections (x,y)X×Yx𝑥𝑦𝑋𝑌maps-to𝑥(x,y)\in X\times Y\mapsto x and (x,y)X×Yy𝑥𝑦𝑋𝑌maps-to𝑦(x,y)\in X\times Y\mapsto y respectively.
其中 (PX)subscript𝑃𝑋(P_{X})(PY)subscript𝑃𝑌(P_{Y}) 表示两个投影 (x,y)X×Yx𝑥𝑦𝑋𝑌maps-to𝑥(x,y)\in X\times Y\mapsto x(x,y)X×Yy𝑥𝑦𝑋𝑌maps-to𝑦(x,y)\in X\times Y\mapsto y

The two measures (PX)γsubscript𝑃𝑋𝛾(P_{X})\sharp\gamma and (PY)γsubscript𝑃𝑌𝛾(P_{Y})\sharp\gamma obtained by pushing forward γ𝛾\gamma by the two projections are called the marginals of γ𝛾\gamma. The measures γ𝛾\gamma in the admissible set Π(μ,ν)Π𝜇𝜈\Pi(\mu,\nu), that is, the measures that have μ𝜇\mu and ν𝜈\nu as marginals, are called optimal transport plans. Let us now have a closer look at the two constraints on the marginals (PX)γ=μsubscript𝑃𝑋𝛾𝜇(P_{X})\sharp\gamma=\mu and (PX)γsubscript𝑃𝑋𝛾(P_{X})\sharp\gamma that define the set of optimal transport plans Π(μ,ν)Π𝜇𝜈\Pi(\mu,\nu). Recalling the definition of the pushforward (previous subsection), these two constraints can also be written as:
通过两个投射获得的由 γ𝛾\gamma 推动而来的两个值 (PX)γsubscript𝑃𝑋𝛾(P_{X})\sharp\gamma(PY)γsubscript𝑃𝑌𝛾(P_{Y})\sharp\gamma 被称为 γ𝛾\gamma 的两个边际。在可接受集合 Π(μ,ν)Π𝜇𝜈\Pi(\mu,\nu)γ𝛾\gamma 度量,即具有 μ𝜇\muν𝜈\nu 作为边际的测度,被称为最优输运方案。现在让我们更仔细地看看定义最优输运方案 Π(μ,ν)Π𝜇𝜈\Pi(\mu,\nu) 集合的两个边际 (PX)γ=μsubscript𝑃𝑋𝛾𝜇(P_{X})\sharp\gamma=\mu(PX)γsubscript𝑃𝑋𝛾(P_{X})\sharp\gamma 的两个约束条件。回顾向前推进的定义(前面的小节),这两个约束条件也可以写成:

(PX)γ=μBX,B𝑑μ=B×Y𝑑γ(PY)γ=νBY,B𝑑ν=X×B𝑑γ.subscript𝑃𝑋𝛾𝜇iffformulae-sequencefor-all𝐵𝑋subscript𝐵differential-d𝜇subscript𝐵𝑌differential-d𝛾subscript𝑃𝑌𝛾𝜈iffformulae-sequencefor-allsuperscript𝐵𝑌subscriptsuperscript𝐵differential-d𝜈subscript𝑋superscript𝐵differential-d𝛾\begin{array}[]{lcl}(P_{X})\sharp\gamma=\mu&\iff&\forall B\subset X,\int_{B}d\mu=\int_{B\times Y}d\gamma\\[5.69054pt] (P_{Y})\sharp\gamma=\nu&\iff&\forall B^{\prime}\subset Y,\int_{B^{\prime}}d\nu=\int_{X\times B^{\prime}}d\gamma.\end{array} (1)

Intuitively, the first constraint (PX)γ=μsubscript𝑃𝑋𝛾𝜇(P_{X})\sharp\gamma=\mu means that everything that comes from a subset B𝐵B of X𝑋X should correspond to the amount of matter (initially) contained by B𝐵B in the source landscape, and the second one (PY)γ=νsubscript𝑃𝑌𝛾𝜈(P_{Y})\sharp\gamma=\nu means that everything that goes into a subset Bsuperscript𝐵B^{\prime} of Y𝑌Y should correspond to the (prescribed) amount of matter contained by Bsuperscript𝐵B^{\prime} in the target landscape ν𝜈\nu.
直观上,第一个约束条件 (PX)γ=μsubscript𝑃𝑋𝛾𝜇(P_{X})\sharp\gamma=\mu 意味着来自 X𝑋X 的子集的一切应该对应于来源景观 B𝐵B 中包含的物质量(最初),而第二个条件 (PY)γ=νsubscript𝑃𝑌𝛾𝜈(P_{Y})\sharp\gamma=\nu 意味着进入 Y𝑌Y 的子集的一切应该对应于目标景观 ν𝜈\nuBsuperscript𝐵B^{\prime} 所包含的(规定的)物质量。

Refer to caption

Figure 7: A discrete version of Kantorovich’s problem.
图 7:康托罗维奇问题的离散版本。

We now examine the relation between the relaxed problem (K) and the initial problem (M). One can easily check that among the optimal transport plans, those with the form (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu correspond to a transport map T𝑇T:
我们现在研究松弛问题(K)与初始问题(M)之间的关系。可以很容易地检查到,在最优输运方案中,那些形式为 (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu 的方案 对应于一个输运映射 T𝑇T

Observation 1.

If (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu is a transport plan, then T𝑇T pushes μ𝜇\mu onto ν𝜈\nu.


观察 1. 如果 (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu 是一个输运方案,则 T𝑇T 会将 μ𝜇\mu 推送到 ν𝜈\nu
Proof.

(Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu is in Π(μ,ν)Π𝜇𝜈\Pi(\mu,\nu), thus (PY)(Id×T)μ=νsubscript𝑃𝑌𝐼𝑑𝑇𝜇𝜈(P_{Y})\sharp(Id\times T)\sharp\mu=\nu, or ((PY)(Id×T))μ=νsubscript𝑃𝑌𝐼𝑑𝑇𝜇𝜈\left((P_{Y})\circ(Id\times T)\right)\sharp\mu=\nu, and finally Tμ=ν𝑇𝜇𝜈T\sharp\mu=\nu. ∎


证明。 (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\muΠ(μ,ν)Π𝜇𝜈\Pi(\mu,\nu) 中,因此 (PY)(Id×T)μ=νsubscript𝑃𝑌𝐼𝑑𝑇𝜇𝜈(P_{Y})\sharp(Id\times T)\sharp\mu=\nu ,或 ((PY)(Id×T))μ=νsubscript𝑃𝑌𝐼𝑑𝑇𝜇𝜈\left((P_{Y})\circ(Id\times T)\right)\sharp\mu=\nu ,最终 Tμ=ν𝑇𝜇𝜈T\sharp\mu=\nu 。∎

We can now observe that if a transport plan γ𝛾\gamma has the form γ=(Id×T)μ𝛾𝐼𝑑𝑇𝜇\gamma=(Id\times T)\sharp\mu, then problem (K) becomes:
我们现在可以观察到,如果一个运输计划 γ𝛾\gamma 的形式为 γ=(Id×T)μ𝛾𝐼𝑑𝑇𝜇\gamma=(Id\times T)\sharp\mu ,那么问题(K)就变成了:

min{X×Yc(x,y)d((Id×T)μ)}=min{Xc(x,T(x))dμ)\mbox{min}\left\{\int\limits_{X\times Y}c(x,y)d\left((Id\times T)\sharp\mu\right)\right\}\quad=\quad\mbox{min}\left\{\int\limits_{X}c(x,T(x))d\mu\right)

(one retrieves the initial Monge problem).
(其中一个是检索最初的蒙日问题)。

To further help grasping an intuition of this notion of transport plan, we show four 1D examples in Figure 6 (the transport plan is then 1D×1D=2D1𝐷1𝐷2𝐷1D\times 1D=2D). Intuitively, the transport plan γ𝛾\gamma may be thought of as a “table” indexed by x𝑥x and y𝑦y that indicates the quantity of matter transported from x𝑥x to y𝑦y. More exactly333We recall that one cannot evaluate a measure γ𝛾\gamma at a point (x,y)𝑥𝑦(x,y), we can just compute integrals with γ𝛾\gamma.
我们回顾一下,不能在点 (x,y)𝑥𝑦(x,y) 处评估度量 γ𝛾\gamma ,我们只能计算 γ𝛾\gamma 的积分。

为了进一步帮助理解这种运输计划的概念,我们在图 6 中展示了四个 1D 示例(然后运输计划为 1D×1D=2D1𝐷1𝐷2𝐷1D\times 1D=2D )。直觉上,运输计划 γ𝛾\gamma 可以被认为是由 x𝑥xy𝑦y 索引的“表”,指示从 x𝑥x 运输到 y𝑦y 的物质数量。 更确切地说 3
, the measure γ𝛾\gamma is non-zero on subsets of X×Y𝑋𝑌X\times Y that contain points (x,y)𝑥𝑦(x,y) such that some matter is transported from x𝑥x to y𝑦y. Whenever γ𝛾\gamma derives from a transport map T𝑇T, that is if γ𝛾\gamma has the form (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu, then we can consider γ𝛾\gamma as the “graph of T𝑇T” like in the first two examples of Figure 6 (A) and (B)444Note that the measure μ𝜇\mu is supposed to be absolutely continuous with respect to the Lebesgue measure. This is required, because for instance in example (B) of Figure 6, the transport map T𝑇T is undefined at the center of the segment. The absolute continuity requirement allows one to remove from X𝑋X any subset with zero measure.
注意,度量 μ𝜇\mu 应该对勒贝格测度是绝对连续的。 这是必需的,因为例如在图 6 的示例(B)中,运输映射 T𝑇T 在该段的中心处是未定义的。 绝对连续性要求允许从 X𝑋X 中去除任何零度量子集。

,度量 γ𝛾\gamma 在包含点 (x,y)𝑥𝑦(x,y)X×Y𝑋𝑌X\times Y 的子集上非零,点 (x,y)𝑥𝑦(x,y) 使从 x𝑥xy𝑦y 转移了一些物质。每当 γ𝛾\gamma 由运输图 T𝑇T 推导出来时,即如果 γ𝛾\gamma 具有形式 (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu ,那么我们可以将 γ𝛾\gamma 视为“ T𝑇T 的图”,就像图 6(A)和(B)的前两个示例中 4
in Figure 6. 在图 6 中。

The transport plans of the two other examples (C) and (D) have no associated transport map, because they split Dirac masses. The transport plan associated with Figure 5 has the same nature (but this time in 2D×2D=4D2𝐷2𝐷4𝐷2D\times 2D=4D). It cannot be written with the form (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu because it splits the mass concentrated in L1subscript𝐿1L_{1} into L2subscript𝐿2L_{2} and L3subscript𝐿3L_{3}.
另外两个示例(C)和(D)的运输计划没有关联的传输地图,因为它们分裂了狄拉克质量。与图 5 相关的运输计划具有相同的性质(但这次是在 2D×2D=4D2𝐷2𝐷4𝐷2D\times 2D=4D )。它不能用形式 (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu 来写,因为它将集中在 L1subscript𝐿1L_{1} 的质量分裂成 L2subscript𝐿2L_{2}L3subscript𝐿3L_{3}

Now the theoretical questions are:
现在的理论问题是:

  • when does an optimal transport plan exist ?


    • 何时存在最优输运方案?
  • when does it admit an associated optimal transport map ?


    • 何时承认其存在关联的最优输运图?

A standard approach to tackle this type of existence problem is to find a certain regularity both in the functional and in the space of the admissible transport plans, that is, proving that the functional is sufficiently “smooth” and finding a compact set of admissible transport plans. Since the set of admissible transport plans contains at least the product measure μνtensor-product𝜇𝜈\mu\otimes\nu, it is non-empty, and existence can be proven thanks to a topological argument that exploits the regularity of the functional and the compactness of the set. Once the existence of a transport plan is established, the other question concerns the existence of an associated transport map. Unfortunately, problem (K) does not directly show the structure required by this reasoning path. However, one can observe that (K) is a linear optimization problem subject to linear constraints. This suggests using certain tools, such as the dual formulation, that was also developed by Kantorovich. With this dual formulation, it is possible to exhibit an interesting structure of the problem, that can be used to answer both questions (existence of a transport plan, and existence of an associated transport map).
一种解决这种类型存在问题的标准方法是在可传递计划的函数和空间中找到一定的规律,即证明函数足够"平滑"并找到一组可接受的紧致传输计划集。由于可接受传输计划集至少包含乘积度量 μνtensor-product𝜇𝜈\mu\otimes\nu ,因此它不为空,并且可以通过利用函数的规则性和集合的紧致性所做的拓扑论证来证明它的存在性。一旦传输计划的存在性确立,另一个问题涉及相关传输映射的存在性。不幸的是,问题(K)并未直接显示出这种推理路径所需的结构。然而,我们可以观察到(K)是一个受线性约束的线性优化问题。这表明可以使用某些工具,例如康托洛维奇也开发的对偶形式。通过这种对偶形式,可以展示出该问题的一个有趣结构,可用来回答两个问题(传输计划的存在性和相关传输映射的存在性)。

4 Kantorovich’s dual problem
4Kantorovich 的对偶问题

Kantorovich’s duality applies to problem (K), in its most general form (with measures). To facilitate understanding, we will consider instead a discrete version of problem (K), where the involved entities are vectors and matrices (instead of measures and operators). This makes it easy to better discern the structure of (K), that is the linear nature of both the functional and the constraints. This also makes it easier for the reader to understand how to construct the dual by manipulating simpler objects (matrices and vectors), however the structure of the reasoning is the same in the general case.
Kantorovich 的对偶性适用于问题 (K),以其最一般形式 (带有度量)。为了方便理解,我们将考虑问题 (K) 的离散版本,其中涉及的实体是向量和矩阵 (而不是度量和算子)。这样能更容易地辨别问题 (K) 的结构,即泛函和约束的线性性质。这也使读者更容易理解如何通过操作更简单的对象 (矩阵和向量) 构造对偶问题,然而,在一般情况下,推理的结构是相同的。

4.1 The discrete Kantorovich problem
4.1 离散 Kantorovich 问题

In Figure 7, we show a discrete version of the 1D transport between two segments of Figure 6. The measures μ𝜇\mu and ν𝜈\nu are replaced with vectors U=(ui)i=1m𝑈subscriptsubscript𝑢𝑖𝑖1𝑚U=(u_{i})_{i=1\ldots m} and V=(vj)i=1n𝑉subscriptsubscript𝑣𝑗𝑖1𝑛V=(v_{j})_{i=1\ldots n}. The transport plan γ𝛾\gamma becomes a set of coefficients γijsubscript𝛾𝑖𝑗\gamma_{ij}. Each coefficient γijsubscript𝛾𝑖𝑗\gamma_{ij} indicates the quantity of matter that will be transported from uisubscript𝑢𝑖u_{i} to vjsubscript𝑣𝑗v_{j}. The discrete Kantorovich problem can be written as follows:
在图 7 中,我们展示了图 6 中两个区段之间的一维传输的离散版本。测度 μ𝜇\muν𝜈\nu 被向量 U=(ui)i=1m𝑈subscriptsubscript𝑢𝑖𝑖1𝑚U=(u_{i})_{i=1\ldots m}V=(vj)i=1n𝑉subscriptsubscript𝑣𝑗𝑖1𝑛V=(v_{j})_{i=1\ldots n} 替换。传输计划 γ𝛾\gamma 成为一组系数 γijsubscript𝛾𝑖𝑗\gamma_{ij} 。每个系数 γijsubscript𝛾𝑖𝑗\gamma_{ij} 表示将从 uisubscript𝑢𝑖u_{i} 输送到 vjsubscript𝑣𝑗v_{j} 的物质数量。离散 Kantorovich 问题可以表示如下:

minγ<C,γ> subject to {𝐏𝐱γ=U𝐏𝐲γ=Vγi,j0i,jformulae-sequencesubscript𝛾𝐶𝛾 subject to casessubscript𝐏𝐱𝛾𝑈subscript𝐏𝐲𝛾𝑉subscript𝛾𝑖𝑗0for-all𝑖𝑗\quad\quad\min\limits_{\gamma}<C,\gamma>\mbox{ subject to }\left\{\begin{array}[]{l}{\bf P_{x}}\gamma=U\\ {\bf P_{y}}\gamma=V\\ \gamma_{i,j}\geq 0\quad\forall i,j\end{array}\right. (2)

where γ𝛾\gamma is the vector of m×nsuperscript𝑚𝑛\mathbb{R}^{m\times n} with all coefficients γijsubscript𝛾𝑖𝑗\gamma_{ij} (that is, the matrix γijsubscript𝛾𝑖𝑗\gamma_{ij} “unrolled” into a vector), and C𝐶C the vector of m×nsuperscript𝑚𝑛\mathbb{R}^{m\times n} with the coefficients cijsubscript𝑐𝑖𝑗c_{ij} indicating the transport cost between point i𝑖i and point j𝑗j (for instance, the Euclidean cost). The objective function is simply the dot product, denoted by <C,γ><C,\gamma>, of the cost vector C𝐶C and the vector γ𝛾\gamma. The objective function is linear in γ𝛾\gamma. The constraints on the marginals (1) impose in this discrete version that the sums of the γijsubscript𝛾𝑖𝑗\gamma_{ij} coefficients over the columns correspond to the uisubscript𝑢𝑖u_{i} coefficients (Figure 7-B) and the sums over the rows correspond to the vjsubscript𝑣𝑗v_{j} coefficients (Figure 7-C). Intuitively, everything that leaves point i𝑖i should correspond to uisubscript𝑢𝑖u_{i}, that is the quantity of matter initially present in i𝑖i in the source landscape, and everything that arrives at a point j𝑗j should correspond to vjsubscript𝑣𝑗v_{j}, that is the quantity of matter desired at j𝑗j in the target landscape. As one can easily notice, in this form, both constraints are linear in γ𝛾\gamma. They can be written with two matrices 𝐏𝐱subscript𝐏𝐱{\bf P_{x}} and 𝐏𝐲subscript𝐏𝐲{\bf P_{y}}, of dimensions m×mn𝑚𝑚𝑛m\times mn and n×mn𝑛𝑚𝑛n\times mn respectively.
其中 γ𝛾\gamma 是具有所有系数 γijsubscript𝛾𝑖𝑗\gamma_{ij} (即矩阵 γijsubscript𝛾𝑖𝑗\gamma_{ij} “展开”为向量)的 m×nsuperscript𝑚𝑛\mathbb{R}^{m\times n} 的向量, C𝐶C 是具有指示点 i𝑖i 和点 j𝑗j 之间运输成本 cijsubscript𝑐𝑖𝑗c_{ij}m×nsuperscript𝑚𝑛\mathbb{R}^{m\times n} 的向量(例如,欧几里得成本)。目标函数简单地是成本向量 C𝐶C 和向量 γ𝛾\gamma 的点积,用 <C,γ><C,\gamma> 表示。目标函数在 γ𝛾\gamma 中是线性的。边际限制(1)在这个离散版本中要求列上系数 γijsubscript𝛾𝑖𝑗\gamma_{ij} 的总和对应于 uisubscript𝑢𝑖u_{i} 系数(图 7-B),行上的总和对应于 vjsubscript𝑣𝑗v_{j} 系数(图 7-C)。直观地看,离开点 i𝑖i 的一切都应与 uisubscript𝑢𝑖u_{i} 相对应,即最初在源景观中存在的物质量,而到达某一点 j𝑗j 的一切都应与 vjsubscript𝑣𝑗v_{j} 相对应,即目标景观中 j𝑗j 处所需的物质量。正如大家可以轻易发现的那样,在这种形式下,这两个约束在 γ𝛾\gamma 中都是线性的。它们可以分别用两个维数分别为 m×mn𝑚𝑚𝑛m\times mnn×mn𝑛𝑚𝑛n\times mn 的矩阵 𝐏𝐱subscript𝐏𝐱{\bf P_{x}}𝐏𝐲subscript𝐏𝐲{\bf P_{y}} 来表示。

4.2 Constructing the Kantorovich dual in the discrete setting
在离散设置中构建 Kantorovich 对偶

We introduce, arbitrarily for now, the following function L𝐿L defined by:
我们介绍,目前随意地,以下由 L𝐿L 定义的函数。

L(φ,ψ)=<C,γ><φ,𝐏𝐱γU><ψ,𝐏𝐲γV>L(\varphi,\psi)=<C,\gamma>-<\varphi,{\bf P_{x}}\gamma-U>-<\psi,{\bf P_{y}}\gamma-V>

that takes as arguments two vectors, φ𝜑\varphi in msuperscript𝑚{\mathbb{R}}^{m} and ψ𝜓\psi in nsuperscript𝑛{\mathbb{R}}^{n}. The function L𝐿L is constructed from the objective function <C,γ><C,\gamma> from which we subtracted the dot products of φ𝜑\varphi and ψ𝜓\psi with the vectors that correspond to the degree of violation of the constraints. One can observe that:
该函数接受两个向量作为参数, φ𝜑\varphimsuperscript𝑚{\mathbb{R}}^{m} 中, ψ𝜓\psinsuperscript𝑛{\mathbb{R}}^{n} 中。函数 L𝐿L 是从目标函数 <C,γ><C,\gamma> 构建的,我们从中减去了与对应于约束违反程度的向量的点积。可以观察到:

supφ,ψ[L(φ,ψ)]=<C,γ> if 𝐏𝐱γ=U and 𝐏𝐲γ=V=+ otherwise.subscriptsupremum𝜑𝜓delimited-[]𝐿𝜑𝜓formulae-sequenceabsent𝐶𝛾 if subscript𝐏𝐱𝛾𝑈 and subscript𝐏𝐲𝛾𝑉missing-subexpression otherwise.\begin{array}[]{lcl}\sup\limits_{\varphi,\psi}[L(\varphi,\psi)]&=&<C,\gamma>\mbox{ if }{\bf P_{x}}\gamma=U\mbox{ and }{\bf P_{y}}\gamma=V\\ &=&+\infty\mbox{ otherwise.}\end{array}

Indeed, if for instance a component i𝑖i of 𝐏𝐱γsubscript𝐏𝐱𝛾{\bf P_{x}}\gamma is non-zero, one can make L𝐿L arbitrarily large by suitably choosing the associated coefficient φisubscript𝜑𝑖\varphi_{i}.
实际上,例如,如果 𝐏𝐱γsubscript𝐏𝐱𝛾{\bf P_{x}}\gamma 的一个分量 i𝑖i 是非零的,则可以通过适当选择相关系数 φisubscript𝜑𝑖\varphi_{i} 使 L𝐿L 随意增大。

Now we consider: 现在我们考虑:

infγ0[supφ,ψ[L(φ,ψ)]]=infγ0𝐏𝐱γ=U𝐏𝐲γ=V[<C,γ>].\inf\limits_{\gamma\geq 0}\left[\sup\limits_{\varphi,\psi}[L(\varphi,\psi)]\right]=\inf\limits_{\tiny\begin{array}[]{l}\gamma\geq 0\\ {\bf P_{x}}\gamma=U\\ {\bf P_{y}}\gamma=V\end{array}}\left[<C,\gamma>\right].

There is equality, because to minimize sup[L(φ,ψ)]supremumdelimited-[]𝐿𝜑𝜓\sup[L(\varphi,\psi)], γ𝛾\gamma has no other choice than satisfying the constraints (see the previous observation). Thus, we obtain a new expression (left-hand side) of the discrete Kantorovich problem (right-hand side). We now further examine it, and replace L𝐿L by its expression:
有相等性,因为要使 sup[L(φ,ψ)]supremumdelimited-[]𝐿𝜑𝜓\sup[L(\varphi,\psi)] 最小化, γ𝛾\gamma 除了满足限制条件之外别无选择(请参见前面的观察)。 因此,我们获得了离散 Kantorovich 问题的新表达式(左侧), 第 2 项用其表达式替换。

infγ0[supφ,ψ(<C,γ><φ,𝐏𝐱γU><ψ,𝐏𝐲γV>)]\displaystyle\inf\limits_{\gamma\geq 0}\left[\sup\limits_{\varphi,\psi}\left(\begin{array}[]{ll}<C,\gamma>&-<\varphi,{\bf P_{x}}\gamma-U>\\ &-<\psi,{\bf P_{y}}\gamma-V>\end{array}\right)\right] (5)
=\displaystyle= supφ,ψ[infγ0(<C,γ><φ,𝐏𝐱γU><ψ,𝐏𝐲γV>)]\displaystyle\sup\limits_{\varphi,\psi}\left[\inf\limits_{\gamma\geq 0}\left(\begin{array}[]{ll}<C,\gamma>&-<\varphi,{\bf P_{x}}\gamma-U>\\ &-<\psi,{\bf P_{y}}\gamma-V>\end{array}\right)\right] (8)
=\displaystyle= supφ,ψ[infγ0(<γ,C𝐏𝐱tφ𝐏𝐲tψ>+<φ,U>+<ψ,V>)]\displaystyle\sup\limits_{\varphi,\psi}\left[\inf\limits_{\gamma\geq 0}\left(\begin{array}[]{l}<\gamma,C-{\bf P_{x}}^{t}\varphi-{\bf P_{y}}^{t}\psi>+\\[2.84526pt] <\varphi,U>+<\psi,V>\end{array}\right)\right] (11)
=\displaystyle= supφ,ψ𝐏𝐱tφ+𝐏𝐲tψC[<φ,U>+<ψ,V>].\displaystyle\sup\limits_{\tiny\begin{array}[]{c}\varphi,\psi\\ {\bf P_{x}}^{t}\varphi+{\bf P_{y}}^{t}\psi\leq C\end{array}}\left[<\varphi,U>+<\psi,V>\right].\quad\quad (14)

The first step (8) consists in exchanging the “infinfimum\inf” and “supsupremum\sup”. Then we rearrange the terms (11). By reinterpreting this equation as a constrained optimization problem (similarly to what we did in the previous paragraph), we finally obtain the constrained optimization problem in (14). In the constraint 𝐏𝐱tφ+𝐏𝐲tψCsuperscriptsubscript𝐏𝐱𝑡𝜑superscriptsubscript𝐏𝐲𝑡𝜓𝐶{\bf P_{x}}^{t}\varphi+{\bf P_{y}}^{t}\psi\leq C, the inequality is to be considered componentwise. Finally, the problem (14) can be rewritten as:
第一步(8)是交换“ infinfimum\inf ”和“ supsupremum\sup ”。然后我们重新排列项(11)。通过将这个方程重新解释为一个约束优化问题(类似于前一段的做法),我们最终得到了(14)中的约束优化问题。在约束 𝐏𝐱tφ+𝐏𝐲tψCsuperscriptsubscript𝐏𝐱𝑡𝜑superscriptsubscript𝐏𝐲𝑡𝜓𝐶{\bf P_{x}}^{t}\varphi+{\bf P_{y}}^{t}\psi\leq C 中,不等式应当被按分量考虑。最后,问题(14)可以被重写为:

supφ,ψ[<φ,U>+<ψ,V>] subject to φi+ψjcij,i,j.\begin{array}[]{l}\sup\limits_{\varphi,\psi}\left[<\varphi,U>+<\psi,V>\right]\\[5.69054pt] \mbox{ subject to }\varphi_{i}+\psi_{j}\leq c_{ij},\quad\forall i,j.\end{array} (15)

As compared to the primal problem (2) that depends on m×n𝑚𝑛m\times n variables (all the coefficients γijsubscript𝛾𝑖𝑗\gamma_{ij} of the optimal transport plan for all couples of points (i,j)𝑖𝑗(i,j)), this dual problem depends on m+n𝑚𝑛m+n variables (the components φisubscript𝜑𝑖\varphi_{i} and ψjsubscript𝜓𝑗\psi_{j} attached to the source points and target points). We will see later how to further reduce the number of variables, but before then, we go back to the general continuous setting (that is, with functions, measures and operators).
与基本问题(2)相比,该问题依赖于 m×n𝑚𝑛m\times n 变量(所有与一对点 (i,j)𝑖𝑗(i,j) 的最优输运方案的系数 γijsubscript𝛾𝑖𝑗\gamma_{ij} ),而该对偶问题依赖于 m+n𝑚𝑛m+n 个变量(与源点和目标点相关联的分量 φisubscript𝜑𝑖\varphi_{i}ψjsubscript𝜓𝑗\psi_{j} )。我们稍后将看到如何进一步减少变量的数量,但在那之前,我们回到一般的连续设置(即,使用函数、测度和算子)。

4.3 The Kantorovich dual in the continuous setting
连续设置中的 Kantorovich 对偶

The same reasoning path can be applied to the continuous Kantorovich problem (K), leading to the following problem (DK):
相同的推理路径可以应用于连续 Kantorovich 问题(K),导致以下问题(DK):

(DK)supφ,ψ[Xφ𝑑μ+Yψ𝑑ν] subject to: φ(x)+ψ(y)c(x,y)(x,y)X×Y,𝐷𝐾subscriptsupremum𝜑𝜓delimited-[]subscript𝑋𝜑differential-d𝜇subscript𝑌𝜓differential-d𝜈 subject to: formulae-sequence𝜑𝑥𝜓𝑦𝑐𝑥𝑦for-all𝑥𝑦𝑋𝑌\begin{array}[]{l}(DK)\quad\quad\sup\limits_{\varphi,\psi}\left[\int\limits_{X}\varphi d\mu+\int\limits_{Y}\psi d\nu\right]\\[11.38109pt] \mbox{ subject to: }\\ \varphi(x)+\psi(y)\leq c(x,y)\quad\forall(x,y)\in X\times Y,\end{array} (16)

where φ𝜑\varphi and ψ𝜓\psi are now functions defined on X𝑋X and Y𝑌Y555The functions φ𝜑\varphi and ψ𝜓\psi need to be taken in L1(μ)superscript𝐿1𝜇L^{1}(\mu) and L1(ν)superscript𝐿1𝜈L^{1}(\nu). The proof of the equivalence with problem (K) requires more precautions than in the discrete case, in particular step (8) (exchanging sup and inf), that uses a result of convex analysis (due to Rockafellar), see [Vil09] chapter 5.
函数 φ𝜑\varphiψ𝜓\psi 需要在 L1(μ)superscript𝐿1𝜇L^{1}(\mu)L1(ν)superscript𝐿1𝜈L^{1}(\nu) 中考虑。 与问题(K)等价的证明需要比离散情况更多的预防措施,特别是步骤(8)(交换 sup 和 inf),该步骤使用了凸分析的结果(来自 Rockafellar),参见 [Vil09] 章节 5。

其中 φ𝜑\varphiψ𝜓\psi 现在是在 X𝑋XY𝑌Y 上定义的函数 5
.

The classical image that gives an intuitive meaning to this dual problem is to consider that instead of transporting earth by ourselves, we are now hiring a company that will do the work on our behalf. The company has a special way of determining the price: the function φ(x)𝜑𝑥\varphi(x) corresponds to what they charge for loading earth at x𝑥x, and ψ(y)𝜓𝑦\psi(y) corresponds to what they charge for unloading earth at y𝑦y. The company aims at maximizing its profit (this is why the dual problem is a “sup” rather than an “inf)”, but it cannot charge more than what it would cost us if we were doing the work by ourselves (hence the constraint).
给这个双边问题带来直观含义的经典形象是,考虑到我们现在不再亲自搬运土地,而是雇佣一家代劳公司来代劳。该公司有一种特殊的确定价格的方法:函数 φ(x)𝜑𝑥\varphi(x) 对应他们在 x𝑥x 处装载土地的收费, ψ(y)𝜓𝑦\psi(y) 对应他们在 y𝑦y 处卸载土地的收费。该公司旨在最大化利润(这就是为什么双边问题是“sup”而不是“inf”),但不能收取比我们自己亲自完成工作时的成本更高的费用(因此有了约束)。

The existence of solutions for (DK) remains difficult to study, because the set of functions φ,ψ𝜑𝜓\varphi,\psi that satisfy the constraint is not compact. However, it is possible to reveal more structure of the problem, by introducing the notion of c-transform, that makes it possible to exhibit a set of admissible functions with sufficient regularity:
对于(DK)的解的存在性仍然很难研究,因为满足约束条件的函数 φ,ψ𝜑𝜓\varphi,\psi 的集合并不紧致。但是,通过引入 c 变换的概念,可以揭示问题的更多结构,从而展示出一组具有足够规则性的可接受函数:

Definition 1. 定义 1。
  • For f𝑓f any function on Y𝑌Y with values in {}\mathbb{R}\cup\{-\infty\} and not identically -\infty, we define its c𝑐c-transform by

    fc(x)=infyY[c(x,y)f(y)],xX.formulae-sequencesuperscript𝑓𝑐𝑥subscriptinfimum𝑦𝑌delimited-[]𝑐𝑥𝑦𝑓𝑦𝑥𝑋f^{c}(x)=\inf\limits_{y\in Y}\left[c(x,y)-f(y)\right],\quad x\in X.

    • 对于 Y𝑌Y 上的任意函数 f𝑓f ,其取值在 {}\mathbb{R}\cup\{-\infty\} 中并且不是恒等于 -\infty ,我们定义其 c𝑐c -变换为
  • If a function φ𝜑\varphi is such that there exists a function f𝑓f such that φ=fc𝜑superscript𝑓𝑐\varphi=f^{c}, then φ𝜑\varphi is said to be c𝑐c-concave;


    • 如果一个函数 φ𝜑\varphi 存在这样一个函数 f𝑓f ,使得 φ=fc𝜑superscript𝑓𝑐\varphi=f^{c} ,那么 φ𝜑\varphi 被称为 c𝑐c -凹函数;
  • 𝚿c(X)subscript𝚿𝑐𝑋{\bf\Psi}_{c}(X) denotes the set of c-concave functions on X𝑋X.


    𝚿c(X)subscript𝚿𝑐𝑋{\bf\Psi}_{c}(X) 表示在 X𝑋X 上的 c-凹函数集合。

We now show two properties of (DK) that will allow us to restrict the problem to the class of c𝑐c-concave functions to search for φ𝜑\varphi and ψ𝜓\psi:
我们现在展示(DK)的两个特性,这将允许我们将问题限制在 c𝑐c -凹函数类中,以便搜索 φ𝜑\varphiψ𝜓\psi

Observation 2.

If the pair (φ,ψ)𝜑𝜓(\varphi,\psi) is admissible for (DK), then the pair (ψc,ψ)superscript𝜓𝑐𝜓(\psi^{c},\psi) is admissible as well.


观察 2. 如果对于(DK)而言对 (φ,ψ)𝜑𝜓(\varphi,\psi) 是可接受的,那么对 (ψc,ψ)superscript𝜓𝑐𝜓(\psi^{c},\psi) 也是可接受的。
Proof. 证明。
{(x,y)X×Y,φ(x)+ψ(y)c(x,y)ψc(x)=infyY[c(x,y)ψ(y)]ψc(x)+ψ(y)=infyY[c(x,y)ψ(y)]+ψ(y)(c(x,y)ψ(y))+ψ(y)c(x,y).casesformulae-sequencefor-all𝑥𝑦𝑋𝑌𝜑𝑥𝜓𝑦𝑐𝑥𝑦superscript𝜓𝑐𝑥subscriptinfimum𝑦𝑌delimited-[]𝑐𝑥𝑦𝜓𝑦superscript𝜓𝑐𝑥𝜓𝑦subscriptinfimumsuperscript𝑦𝑌delimited-[]𝑐𝑥superscript𝑦𝜓superscript𝑦𝜓𝑦missing-subexpression𝑐𝑥𝑦𝜓𝑦𝜓𝑦missing-subexpression𝑐𝑥𝑦\begin{array}[]{l}\left\{\begin{array}[]{l}\forall(x,y)\in X\times Y,\varphi(x)+\psi(y)\leq c(x,y)\\ \psi^{c}(x)=\inf\limits_{y\in Y}\left[c(x,y)-\psi(y)\right]\end{array}\right.\\[8.53581pt] \begin{array}[]{lcl}\psi^{c}(x)+\psi(y)&=&\inf\limits_{y^{\prime}\in Y}\left[c(x,y^{\prime})-\psi(y^{\prime})\right]+\psi(y)\\ &\leq&(c(x,y)-\psi(y))+\psi(y)\\ &\leq&c(x,y).\end{array}\end{array}

Observation 3.

If the pair (φ,ψ)𝜑𝜓(\varphi,\psi) is admissible for (DK), then one obtains a better pair by replacing φ𝜑\varphi with ψcsuperscript𝜓𝑐\psi^{c}:


注意 3。如果对(DK)的 (φ,ψ)𝜑𝜓(\varphi,\psi) 对是可以接受的,那么通过将 φ𝜑\varphi 替换为 ψcsuperscript𝜓𝑐\psi^{c} 可以得到更好的对:
Proof. 证明。
ψc(x)=infyY[c(x,y)ψ(y)]yY,φ(x)c(x,y)ψ(y)}ψc(x)φ(x).casesmissing-subexpressionsuperscript𝜓𝑐𝑥absentsubscriptinfimum𝑦𝑌delimited-[]𝑐𝑥𝑦𝜓𝑦for-all𝑦𝑌𝜑𝑥absent𝑐𝑥𝑦𝜓𝑦superscript𝜓𝑐𝑥𝜑𝑥\left.\begin{array}[]{lcl}&\psi^{c}(x)=&\inf\limits_{y\in Y}\left[c(x,y)-\psi(y)\right]\\ \forall y\in Y,&\varphi(x)\leq&c(x,y)-\psi(y)\end{array}\right\}\Rightarrow\psi^{c}(x)\leq\varphi(x).

In terms of the previous intuitive image, this means that by replacing φ𝜑\varphi with ψcsuperscript𝜓𝑐\psi^{c}, the company can charge more while the price remains acceptable for the client (that is, the constraint is satisfied). Thus, we have:
根据先前的直观形象,这意味着通过用 ψcsuperscript𝜓𝑐\psi^{c} 替换 φ𝜑\varphi ,公司可以收取更多费用,同时价格对客户来说仍然可以接受(也就是说,约束得到满足)。因此,我们有:

inf(K)=supφ𝚿c(X)Xφ𝑑μ+Yφc𝑑ν=supψ𝚿c(Y)Xψc𝑑μ+Yψ𝑑νinfimum𝐾absentsubscriptsupremum𝜑subscript𝚿𝑐𝑋subscript𝑋𝜑differential-d𝜇subscript𝑌superscript𝜑𝑐differential-d𝜈missing-subexpressionabsentsubscriptsupremum𝜓subscript𝚿𝑐𝑌subscript𝑋superscript𝜓𝑐differential-d𝜇subscript𝑌𝜓differential-d𝜈\begin{array}[]{ll}\inf(K)&=\sup\limits_{\varphi\in{\bf\Psi}_{c}(X)}\quad\int\limits_{X}\varphi\ d\mu+\int\limits_{Y}\varphi^{c}\ d\nu\\[11.38109pt] &=\sup\limits_{\psi\in{\bf\Psi}_{c}(Y)}\quad\int\limits_{X}\psi^{c}\ d\mu+\int\limits_{Y}\psi\ d\nu\end{array}

We do not give here the detailed proof for existence. The reader is referred to [Vil09], Chapter 4. The idea is that we are now in a much better situation, since the set of admissible functions 𝚿c(X)subscript𝚿𝑐𝑋{\bf\Psi}_{c}(X) is compact666Provided that the value of ψ𝜓\psi is fixed at a point of Y𝑌Y in order to suppress invariance with respect to adding a constant to ψ𝜓\psi.
假定值为 ψ𝜓\psi 的固定在 Y𝑌Y 处,以抑制对于添加常数到 ψ𝜓\psi 的不变性。

我们这里不提供存在性的详细证明。读者可以参考 [Vil09],第 4 章。这个想法是,我们现在处于一个好得多的情况,因为可接受函数集 𝚿c(X)subscript𝚿𝑐𝑋{\bf\Psi}_{c}(X) 是紧致的 6
.

The optimal value gives an interesting information, that is the minimum cost of transforming μ𝜇\mu into ν𝜈\nu. This can be used to define a distance between distributions, and also gives a way to compare different distributions, which is of practical interest for some applications.
最优值提供了一个有趣的信息,即将 μ𝜇\mu 转换为 ν𝜈\nu 的最低成本。这可用于定义分布之间的距离,还提供了一种比较不同分布的方法,这在某些应用中具有实际意义。

5 From the Kantorovich dual to the optimal transport map
从 Kantorovich 对偶到最优输运映射

5.1 The c𝑐c-superdifferential
5.1The c𝑐c -超微分

Suppose now that in addition to the optimal cost you want to know the associated way to transform μ𝜇\mu into ν𝜈\nu, in other words, when it exists, the map T𝑇T from X𝑋X to Y𝑌Y which associated transport plan (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu minimizes the functional of the Monge problem. A result characterizes the support of γ𝛾\gamma, that is the subset cφX×Ysuperscript𝑐𝜑𝑋𝑌\partial^{c}\varphi\subset X\times Y of the pairs of points (x,y)𝑥𝑦(x,y) connected by the transport plan:
现在假设除了最优成本外,您还想了解将 μ𝜇\mu 转化为 ν𝜈\nu 的关联方式,换句话说,当存在时,从 X𝑋XY𝑌Y 的映射 T𝑇T 关联的运输方案 (Id×T)μ𝐼𝑑𝑇𝜇(Id\times T)\sharp\mu 最小化蒙日问题的泛函。结果描述了 γ𝛾\gamma 的支持,即由运输方案连接的点对 (x,y)𝑥𝑦(x,y) 的子集 cφX×Ysuperscript𝑐𝜑𝑋𝑌\partial^{c}\varphi\subset X\times Y

Theorem 1.

Let φ𝜑\varphi a c𝑐c-concave function. For all (x,y)cφ𝑥𝑦superscript𝑐𝜑(x,y)\in\partial^{c}\varphi, we have

φ(x)xc(x,y)=0,𝜑𝑥subscript𝑥𝑐𝑥𝑦0\nabla\varphi(x)-\nabla_{x}c(x,y)=0,

where cφ={(x,y)|φ(z)φ(x)+(c(z,y)c(x,y)),zX}superscript𝑐𝜑conditional-set𝑥𝑦formulae-sequence𝜑𝑧𝜑𝑥𝑐𝑧𝑦𝑐𝑥𝑦for-all𝑧𝑋\partial^{c}\varphi=\{(x,y)\ |\ \varphi(z)\leq\varphi(x)+(c(z,y)-c(x,y)),\forall z\in X\}777By definition of the c𝑐c-transform, if (x,y)cφ𝑥𝑦superscript𝑐𝜑(x,y)\in\partial^{c}\varphi, then φc(y)=c(x,y)φ(x)superscript𝜑𝑐𝑦𝑐𝑥𝑦𝜑𝑥\varphi^{c}(y)=c(x,y)-\varphi(x). Then, the c𝑐c-superdifferential can be characterized by the set of all points (x,y)X×Y𝑥𝑦𝑋𝑌(x,y)\in X\times Y such that φ(x)+φc(y)=c(x,y)𝜑𝑥superscript𝜑𝑐𝑦𝑐𝑥𝑦\varphi(x)+\varphi^{c}(y)=c(x,y)
根据 c𝑐c -变换的定义,如果 (x,y)cφ𝑥𝑦superscript𝑐𝜑(x,y)\in\partial^{c}\varphi ,则 φc(y)=c(x,y)φ(x)superscript𝜑𝑐𝑦𝑐𝑥𝑦𝜑𝑥\varphi^{c}(y)=c(x,y)-\varphi(x) 。然后, c𝑐c -超微分可以由所有满足条件 φ(x)+φc(y)=c(x,y)𝜑𝑥superscript𝜑𝑐𝑦𝑐𝑥𝑦\varphi(x)+\varphi^{c}(y)=c(x,y) 的点 (x,y)X×Y𝑥𝑦𝑋𝑌(x,y)\in X\times Y 来刻画
.
 其中 cφ={(x,y)|φ(z)φ(x)+(c(z,y)c(x,y)),zX}superscript𝑐𝜑conditional-set𝑥𝑦formulae-sequence𝜑𝑧𝜑𝑥𝑐𝑧𝑦𝑐𝑥𝑦for-all𝑧𝑋\partial^{c}\varphi=\{(x,y)\ |\ \varphi(z)\leq\varphi(x)+(c(z,y)-c(x,y)),\forall z\in X\} 7 denotes the so-called c𝑐c-superdifferential of φ𝜑\varphi.
表示所谓的 c𝑐c -超微分 φ𝜑\varphi


定理 1. 设 φ𝜑\varphic𝑐c -凹函数。对于所有 (x,y)cφ𝑥𝑦superscript𝑐𝜑(x,y)\in\partial^{c}\varphi ,我们有
Proof.

See [Vil09] chapters 9 and 10. ∎


证明。见[Vil09]第 9 和第 10 章。∎

In order to give an idea of the relation between the c𝑐c-superdifferential and the associated transport map T𝑇T, we present below a heuristic argument: consider a point (x,y)𝑥𝑦(x,y) in the c𝑐c-superdifferential cφsuperscript𝑐𝜑\partial^{c}\varphi, then for all zX𝑧𝑋z\in X we have
为了让大家了解 c𝑐c -超微分和相关的传输映射 T𝑇T 之间的关系,我们以下给出一个启发式论点:考虑 c𝑐c -超微分 cφsuperscript𝑐𝜑\partial^{c}\varphi 中的点 (x,y)𝑥𝑦(x,y) ,那么对于所有 zX𝑧𝑋z\in X ,我们有

c(x,y)φ(x)c(z,y)φ(z).𝑐𝑥𝑦𝜑𝑥𝑐𝑧𝑦𝜑𝑧c(x,y)-\varphi(x)\leq c(z,y)-\varphi(z). (17)

Now, by using (17), we can compute the derivative at x𝑥x with respect to an arbitrary direction w𝑤w
现在,通过使用(17),我们可以计算关于任意方向 w𝑤wx𝑥x 处的导数

limt0+φ(x+tw)φ(x)tlimt0+c(x+tw,y)c(x,y)tsubscript𝑡superscript0𝜑𝑥𝑡𝑤𝜑𝑥𝑡subscript𝑡superscript0𝑐𝑥𝑡𝑤𝑦𝑐𝑥𝑦𝑡\lim_{t\to 0^{+}}\frac{\varphi(x+tw)-\varphi(x)}{t}\leq\lim_{t\to 0^{+}}\frac{c(x+tw,y)-c(x,y)}{t}

and we obtain φ(x)wxc(x,y)w𝜑𝑥𝑤subscript𝑥𝑐𝑥𝑦𝑤\nabla\varphi(x)\cdot w\leq\nabla_{x}c(x,y)\cdot w. We can do the same derivation along direction w𝑤-w instead of w𝑤w, and then we get φ(x)w=xc(x,y)w𝜑𝑥𝑤subscript𝑥𝑐𝑥𝑦𝑤\nabla\varphi(x)\cdot w=\nabla_{x}c(x,y)\cdot w, wXfor-all𝑤𝑋\forall w\in X.
然后我们得到 φ(x)wxc(x,y)w𝜑𝑥𝑤subscript𝑥𝑐𝑥𝑦𝑤\nabla\varphi(x)\cdot w\leq\nabla_{x}c(x,y)\cdot w 。我们可以沿着方向 w𝑤-w 而不是 w𝑤w 进行相同的推导,然后我们得到 φ(x)w=xc(x,y)w𝜑𝑥𝑤subscript𝑥𝑐𝑥𝑦𝑤\nabla\varphi(x)\cdot w=\nabla_{x}c(x,y)\cdot wwXfor-all𝑤𝑋\forall w\in X

In the particular case of the L2subscript𝐿2L_{2} cost, that is with c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2}, this relation becomes (x,y)cφ,φ(x)+yx=0formulae-sequencefor-all𝑥𝑦superscript𝑐𝜑𝜑𝑥𝑦𝑥0\forall(x,y)\in\partial^{c}\varphi,\nabla\varphi(x)+y-x=0, thus, when the optimal transport map T𝑇T exists, it is given by
在特定情况下的 L2subscript𝐿2L_{2} 成本,即与 c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2} ,这个关系变成 (x,y)cφ,φ(x)+yx=0formulae-sequencefor-all𝑥𝑦superscript𝑐𝜑𝜑𝑥𝑦𝑥0\forall(x,y)\in\partial^{c}\varphi,\nabla\varphi(x)+y-x=0 ,因此,当最优输运映射 T𝑇T 存在时,它是由下式给出的

T(x)=xφ(x)=(x2/2φ(x)).𝑇𝑥𝑥𝜑𝑥superscriptnorm𝑥22𝜑𝑥T(x)=x-\nabla\varphi(x)=\nabla(\|x\|^{2}/2-\varphi(x)).

Not only this gives an expression of T𝑇T in function of φ𝜑\varphi, which is of high interest to us if we want to compute the transport explicitly. In addition, this makes it possible to characterize T𝑇T as the gradient of a convex function (see also Brenier’s polar factorization theorem [Bre91]). This convexity property is interesting, because it means that two “transported particles” x1T(x1)maps-tosubscript𝑥1𝑇subscript𝑥1x_{1}\mapsto T(x_{1}) et x2T(x2)maps-tosubscript𝑥2𝑇subscript𝑥2x_{2}\mapsto T(x_{2}) will never collide. We now see how to prove these two assertions (T𝑇T gradient of a convex function and absence of collision) in the case of the L2subscript𝐿2L_{2} transport (with (c(x,y)=1/2xy2(c(x,y)=1/2\|x-y\|^{2}).
不仅这给出了 T𝑇T 关于 φ𝜑\varphi 的表达式,如果我们想明确计算输运,这对我们很有帮助。此外,这使得可以将 T𝑇T 表征为凸函数的梯度(也见 Brenier 的极分解定理[Bre91])。这种凸性质是有趣的,因为它意味着两个“输运粒子” x1T(x1)maps-tosubscript𝑥1𝑇subscript𝑥1x_{1}\mapsto T(x_{1})x2T(x2)maps-tosubscript𝑥2𝑇subscript𝑥2x_{2}\mapsto T(x_{2}) 永远不会相撞。现在我们看到如何证明这两个断言(凸函数的梯度和无碰撞)在 L2subscript𝐿2L_{2} 输运的情况下(与 (c(x,y)=1/2xy2(c(x,y)=1/2\|x-y\|^{2} )。

Refer to caption

Figure 8: The upper envelope of a family of affine functions is a convex function.
图 8:一组仿射函数的上包络是一个凸函数。

Refer to caption


Figure 9: Different types of transport, with continuous and discrete measures μ𝜇\mu and ν𝜈\nu.
图 9:不同类型的交通工具,以连续和离散度量 μ𝜇\muν𝜈\nu
Observation 4.

If c(x,y)𝑐𝑥𝑦c(x,y) = 1/2xy212superscriptnorm𝑥𝑦21/2\|x-y\|^{2} and φ𝚿c(X)𝜑subscript𝚿𝑐𝑋\varphi\in{\bf\Psi}_{c}(X), then φ¯:xφ¯(x)=x2/2φ(x):¯𝜑maps-to𝑥¯𝜑𝑥superscriptnorm𝑥22𝜑𝑥\bar{\varphi}:x\mapsto\bar{\varphi}(x)=\|x\|^{2}/2-\varphi(x) is a convex function (it is an equivalence if X=Y=d𝑋𝑌superscript𝑑X=Y=\mathbb{R}^{d}, see [San15]).


观察 4。如果 c(x,y)𝑐𝑥𝑦c(x,y) = 1/2xy212superscriptnorm𝑥𝑦21/2\|x-y\|^{2}φ𝚿c(X)𝜑subscript𝚿𝑐𝑋\varphi\in{\bf\Psi}_{c}(X) ,那么 φ¯:xφ¯(x)=x2/2φ(x):¯𝜑maps-to𝑥¯𝜑𝑥superscriptnorm𝑥22𝜑𝑥\bar{\varphi}:x\mapsto\bar{\varphi}(x)=\|x\|^{2}/2-\varphi(x) 是凸函数(如果 X=Y=d𝑋𝑌superscript𝑑X=Y=\mathbb{R}^{d} ,则它是等价的,参见[San15])。
Proof. 证明。
φ(x)=ψc(x)=infy[xy22ψ(y)]=infy[x22xy+y22ψ(y)].𝜑𝑥superscript𝜓𝑐𝑥subscriptinfimum𝑦delimited-[]superscriptnorm𝑥𝑦22𝜓𝑦subscriptinfimum𝑦delimited-[]superscriptnorm𝑥22𝑥𝑦superscriptnorm𝑦22𝜓𝑦\begin{split}\varphi(x)=&\psi^{c}(x)\\ =&\inf\limits_{y}\left[\frac{\|x-y\|^{2}}{2}-\psi(y)\right]\\ =&\inf\limits_{y}\left[\frac{\|x\|^{2}}{2}-x\cdot y+\frac{\|y\|^{2}}{2}-\psi(y)\right].\end{split}

Then, 然后,

φ¯(x)=φ(x)x22=infy[xy+(y22ψ(y))].¯𝜑𝑥𝜑𝑥superscriptnorm𝑥22subscriptinfimum𝑦delimited-[]𝑥𝑦superscriptnorm𝑦22𝜓𝑦\begin{split}-\bar{\varphi}(x)=&\varphi(x)-\frac{\|x\|^{2}}{2}=\inf\limits_{y}\left[-x\cdot y+\left(\frac{\|y\|^{2}}{2}-\psi(y)\right)\right].\end{split}

Or equivalently, 或者等价的,

φ¯(x)=supy[xy(y22ψ(y))].¯𝜑𝑥subscriptsupremum𝑦delimited-[]𝑥𝑦superscriptnorm𝑦22𝜓𝑦\begin{split}\bar{\varphi}(x)=&\sup\limits_{y}\left[x\cdot y-\left(\frac{\|y\|^{2}}{2}-\psi(y)\right)\right].\end{split}

The function xxy(y22ψ(y))maps-to𝑥𝑥𝑦superscriptnorm𝑦22𝜓𝑦x\mapsto x\cdot y-\left(\frac{\|y\|^{2}}{2}-\psi(y)\right) is affine in x𝑥x, therefore the graph of φ¯¯𝜑\bar{\varphi} is the upper envelope of a family of hyperplanes, therefore φ¯¯𝜑\bar{\varphi} is a convex function (see Figure 8). ∎
函数 xxy(y22ψ(y))maps-to𝑥𝑥𝑦superscriptnorm𝑦22𝜓𝑦x\mapsto x\cdot y-\left(\frac{\|y\|^{2}}{2}-\psi(y)\right)x𝑥x 中是仿射的,因此 φ¯¯𝜑\bar{\varphi} 的图形是一族超平面的上包络,因此 φ¯¯𝜑\bar{\varphi} 是一个凸函数(见图 8)。∎

Observation 5.

We now consider the trajectories of two particles parameterized by t[0,1]𝑡01t\in[0,1], t(1t)x1+tT(x1)maps-to𝑡1𝑡subscript𝑥1𝑡𝑇subscript𝑥1t\mapsto(1-t)x_{1}+tT(x_{1}) and t(1t)x2+tT(x2)maps-to𝑡1𝑡subscript𝑥2𝑡𝑇subscript𝑥2t\mapsto(1-t)x_{2}+tT(x_{2}). If x1x2subscript𝑥1subscript𝑥2x_{1}\neq x_{2} and 0<t<10𝑡10<t<1 then there is no collision between the two particles.


观察 5.我们现在考虑两个由 t[0,1]𝑡01t\in[0,1]t(1t)x1+tT(x1)maps-to𝑡1𝑡subscript𝑥1𝑡𝑇subscript𝑥1t\mapsto(1-t)x_{1}+tT(x_{1})t(1t)x2+tT(x2)maps-to𝑡1𝑡subscript𝑥2𝑡𝑇subscript𝑥2t\mapsto(1-t)x_{2}+tT(x_{2}) 参数化的粒子的轨迹。 如果 x1x2subscript𝑥1subscript𝑥2x_{1}\neq x_{2}0<t<10𝑡10<t<1 ,那么两个粒子之间就不会发生碰撞。
Proof.

By contradiction, suppose there is a collision, that is there exists t(0,1)𝑡01t\in(0,1) and x1x2subscript𝑥1subscript𝑥2x_{1}\neq x_{2} such that

(1t)x1+tT(x1)=(1t)x2+tT(x2).1𝑡subscript𝑥1𝑡𝑇subscript𝑥11𝑡subscript𝑥2𝑡𝑇subscript𝑥2(1-t)x_{1}+tT(x_{1})=(1-t)x_{2}+tT(x_{2}).

Since T=φ¯𝑇¯𝜑T=\nabla\bar{\varphi}, we can rewrite the last equality as
T=φ¯𝑇¯𝜑T=\nabla\bar{\varphi} ,我们可以重写最后的等式为

(1t)(x1x2)+t(φ¯(x1)φ¯(x2))=0.1𝑡subscript𝑥1subscript𝑥2𝑡¯𝜑subscript𝑥1¯𝜑subscript𝑥20(1-t)(x_{1}-x_{2})+t(\nabla\bar{\varphi}(x_{1})-\nabla\bar{\varphi}(x_{2}))=0.

Therefore, 所以,

(1t)x1x22+t(φ¯(x1)φ¯(x2))(x1x2)=0.1𝑡superscriptnormsubscript𝑥1subscript𝑥22𝑡¯𝜑subscript𝑥1¯𝜑subscript𝑥2subscript𝑥1subscript𝑥20(1-t)\|x_{1}-x_{2}\|^{2}+t(\nabla\bar{\varphi}(x_{1})-\nabla\bar{\varphi}(x_{2}))\cdot(x_{1}-x_{2})=0.

The last step leads to a contradiction, between the left-hand side is the sum of two strictly positive numbers (recalling the definition of the convexity of φ¯¯𝜑\bar{\varphi}: x1x2,(x1x2)(φ¯(x1)φ¯(x2))>0formulae-sequencefor-allsubscript𝑥1subscript𝑥2subscript𝑥1subscript𝑥2¯𝜑subscript𝑥1¯𝜑subscript𝑥20\forall x_{1}\neq x_{2},(x_{1}-x_{2})\cdot(\nabla\bar{\varphi}(x_{1})-\nabla\bar{\varphi}(x_{2}))>0 )888Note that even if there is no collision, the trajectories can cross, that is (1t)x1+tT(x1)=(1t)x2+tT(x2)1𝑡subscript𝑥1𝑡𝑇subscript𝑥11superscript𝑡subscript𝑥2superscript𝑡𝑇subscript𝑥2(1-t)x_{1}+tT(x_{1})=(1-t^{\prime})x_{2}+t^{\prime}T(x_{2}) for some tt𝑡superscript𝑡t\neq t^{\prime} (see example in [Vil09]). If the cost is the Euclidean distance (instead of squared Euclidean distance), the non-intersection property is stronger and trajectories cannot cross. This comes at the expense of losing the uniqueness of the optimal transport plan[Vil09].
请注意,即使没有碰撞,轨迹也可以相交,即对于某些 tt𝑡superscript𝑡t\neq t^{\prime} (请参见[ Vil09]中的示例)。如果成本是欧氏距离(而不是平方欧氏距离),非交叉属性更强,轨迹不能相交。这是以失去最优运输方案的唯一性为代价[ Vil09]。

最后一步导致矛盾,左侧是两个严格正数的和(回想凸性的定义: φ¯¯𝜑\bar{\varphi}x1x2,(x1x2)(φ¯(x1)φ¯(x2))>0formulae-sequencefor-allsubscript𝑥1subscript𝑥2subscript𝑥1subscript𝑥2¯𝜑subscript𝑥1¯𝜑subscript𝑥20\forall x_{1}\neq x_{2},(x_{1}-x_{2})\cdot(\nabla\bar{\varphi}(x_{1})-\nabla\bar{\varphi}(x_{2}))>08


证明。通过反证法,假设存在一次碰撞,即存在 t(0,1)𝑡01t\in(0,1)x1x2subscript𝑥1subscript𝑥2x_{1}\neq x_{2} ,使得

6 Continuous, discrete and semi-discrete transport
连续、离散和半离散传输 6

The properties that we have presented in the previous sections are true for any couple of source and target measures μ𝜇\mu and ν𝜈\nu, that can derive from continuous functions or that can be discrete empirical measures (sum of Dirac masses). Figure 9 presents three configurations that are interesting to study. These configurations have specific properties, that lead to different algorithms for computing the transport. We give here some indications and references concerning the continuous \rightarrow continuous and discrete \rightarrow discrete cases. Then we will develop the continuous \rightarrow discrete case with more details in the next section.
我们在前几节中提到的性质对于任意一对源和目标度量 μ𝜇\muν𝜈\nu 都成立,可以由连续函数导出,也可以是离散经验度量(Dirac 质量之和)。图 9 呈现了三个有趣研究的配置。这些配置具有特定属性,导致了不同的计算传输算法。我们在此提供一些关于连续 \rightarrow 连续和离散 \rightarrow 离散情况的指示和参考文献。然后我们将在下一节更详细地讨论连续 \rightarrow 离散情况。

6.1 The continuous \rightarrow continuous case and Monge-Ampère equation
6.1 连续 \rightarrow 连续情况和 Monge-Ampère 方程

We recall that when the optimal transport map exists, in the case of the L2subscript𝐿2L_{2} cost (that is, c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2}), it can be deduced from the function φ𝜑\varphi by using the relation T(x)=φ¯=xφ𝑇𝑥¯𝜑𝑥𝜑T(x)=\nabla\bar{\varphi}=x-\nabla\varphi. The change of variable formula for integration over a subset B𝐵B of X𝑋X can be written as:
我们回顾一下,当最优输运映射存在时,在 L2subscript𝐿2L_{2} 成本的情况下(即 c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2} )。可以通过使用关系 T(x)=φ¯=xφ𝑇𝑥¯𝜑𝑥𝜑T(x)=\nabla\bar{\varphi}=x-\nabla\varphi 从函数 φ𝜑\varphi 推导出来 L2subscript𝐿2L_{2} 。对于 X𝑋X 的子集 B𝐵B 的积分的变量变换公式可以写成:

BX,B1𝑑μ=μ(B)=ν(T(B))=B|detJT(x)|𝑑μformulae-sequencefor-all𝐵𝑋subscript𝐵1differential-d𝜇𝜇𝐵𝜈𝑇𝐵subscript𝐵subscript𝐽𝑇𝑥differential-d𝜇\forall B\subset X,\int_{B}1d\mu=\mu(B)=\nu(T(B))=\int_{B}\left|\det J_{T}(x)\right|d\mu (18)

where JTsubscript𝐽𝑇J_{T} denotes the Jacobian matrix of T𝑇T and det\det denotes the determinant.
这里 JTsubscript𝐽𝑇J_{T} 表示 T𝑇T 的雅可比矩阵, det\det 表示行列式。

If μ𝜇\mu and ν𝜈\nu have densities u𝑢u and v𝑣v respectively, that is B,μ(B)=Bu(x)𝑑xfor-all𝐵𝜇𝐵subscript𝐵𝑢𝑥differential-d𝑥\forall B,\mu(B)=\int_{B}u(x)dx and ν(B)=Bv(x)𝑑x𝜈𝐵subscript𝐵𝑣𝑥differential-d𝑥\nu(B)=\int_{B}v(x)dx, then one can (formally) consider (18) pointwise in X𝑋X:
如果 μ𝜇\muν𝜈\nu 的密度分别为 u𝑢uv𝑣v ,即为 B,μ(B)=Bu(x)𝑑xfor-all𝐵𝜇𝐵subscript𝐵𝑢𝑥differential-d𝑥\forall B,\mu(B)=\int_{B}u(x)dxν(B)=Bv(x)𝑑x𝜈𝐵subscript𝐵𝑣𝑥differential-d𝑥\nu(B)=\int_{B}v(x)dx ,那么可以在 X𝑋X 中(正式地)逐点考虑(18):

xX,u(x)=|detJT(x)|v(T(x)).formulae-sequencefor-all𝑥𝑋𝑢𝑥subscript𝐽𝑇𝑥𝑣𝑇𝑥\forall x\in X,\ u(x)=\left|\det J_{T}(x)\right|v(T(x)). (19)

By injecting T=φ¯𝑇¯𝜑T=\nabla\bar{\varphi} and JT=Hφ¯subscript𝐽𝑇𝐻¯𝜑J_{T}=H\bar{\varphi} into (19), one obtains:
通过将 T=φ¯𝑇¯𝜑T=\nabla\bar{\varphi}JT=Hφ¯subscript𝐽𝑇𝐻¯𝜑J_{T}=H\bar{\varphi} 代入(19),可以得到:

xX,u(x)=|detHφ¯(x)|v(φ¯(x)),formulae-sequencefor-all𝑥𝑋𝑢𝑥𝐻¯𝜑𝑥𝑣¯𝜑𝑥\forall x\in X,\ u(x)=\left|\det H\bar{\varphi}(x)\right|v(\nabla\bar{\varphi}(x)), (20)

where Hφ¯𝐻¯𝜑H\bar{\varphi} denotes the Hessian matrix of φ¯¯𝜑\bar{\varphi}. Equation (20) is known ad the Monge-Ampère equation. It is a highly non-linear equation, and its solutions when they exist often present singularities999 This is similar to the eikonal equation, which solution corresponds to the distance field, that has a singularity on the medial axis.
这类似于等距方程,其解对应于距离场,在中轴线上存在奇点。

其中 Hφ¯𝐻¯𝜑H\bar{\varphi} 表示 φ¯¯𝜑\bar{\varphi} 的 Hessian 矩阵。方程(20)被称为蒙热-安普尔方程。这是一个高度非线性的方程,当存在解时,通常会呈现奇点 9
. Note that the derivation above is purely formal, and that studying the solutions of the Monge-Ampère equation require using more sophisticated tools. In particular, it is possible to define several types of weak solutions (viscosity solutions, solution in the sense of Brenier, solutions in the sense of Alexandrov …). Several algorithms to compute numerical solutions of the Monge-Ampère equations were proposed. As such, see for instance the Benamou-Brenier algorithm [BB00], that uses a dynamic formulation inspired by fluid dynamics (incompressible Euler equation with specific boundary conditions). See also [PPO14].
。请注意,上述推导纯粹是形式化的,研究蒙热-安普尔方程的解需要使用更复杂的工具。特别是,可以定义几种类型的弱解(粘度解,Brenier 意义上的解,Alexandrov 意义上的解…)。提出了几种计算蒙热-安普尔方程数值解的算法。例如,参见 Benamou-Brenier 算法[BB00],它使用受流体动力学启发的动态形式(具有特定边界条件的不可压缩欧拉方程)。另请参阅[PPO14]。

6.2 The discrete \rightarrow discrete case
6.2 离散 \rightarrow 离散案例

If μ𝜇\mu is the sum of m𝑚m Dirac masses and ν𝜈\nu the sum of n𝑛n Dirac masses, then the problem boils down to finding the m×n𝑚𝑛m\times n coefficients γijsubscript𝛾𝑖𝑗\gamma_{ij} that give for each pair of points i𝑖i of the source space and j𝑗j of the target space the quantity of matter transported from i𝑖i to j𝑗j. This corresponds to the transport plan in the discrete Kantorovich problem that we have seen previously §4. This type of problem (referred to as an assignment problem) can be solved by different methods of linear programming [BDM09]. These method can be dramatically accelerated by adding a regularization term, that can be interpreted as the entropy of the transport plan [Leo13]. This regularized version of optimal transport can be solved by highly efficient numerical algorithms [Cut13].
如果 μ𝜇\mum𝑚m 个 Dirac 质点的总和, ν𝜈\nun𝑛n 个 Dirac 质点的总和,则问题归结为找出 m×n𝑚𝑛m\times n 系数 γijsubscript𝛾𝑖𝑗\gamma_{ij} ,使得源空间 i𝑖i 和目标空间 j𝑗j 每对点之间从 i𝑖i 传输到 j𝑗j 的物质量。这对应于前面我们看到的离散 Kantorovich 问题中的传输计划第 4 节。这种问题(称为分配问题)可以通过不同的线性规划方法来解决[BDM09]。通过添加一个正则化项,可以显著加速这些方法,该正则化项可解释为传输计划的熵[Leo13]。这种正则化版本的最优传输可以通过高效的数值算法来解决[Cut13]。

6.3 The continuous \rightarrow discrete case
连续 \rightarrow 离散情况 6.3

This configuration, also called semi-discrete, corresponds to a continuous function transported to a sum of Dirac masses (see the examples of c-concave functions in [GM96]). This correspond to our example with factories that consume a resource, in §2.2. Semi-discrete transport has interesting connections with some notions of computational geometry and some specific sets of convex polyhedra that were studied by Alexandrov [Ale05] and later by Aurenhammer, Hoffman and Aranov [AHA92]. The next section is devoted to this configuration.
这种配置,也被称为半离散,对应于将连续函数转换为一组 Dirac 质量的总和(参见[GM96]中的 c-凹函数示例)。这对应于我们在第 2.2 节中消耗资源的工厂示例。半离散传输与计算几何的某些概念和一些特定的凸多面体集合有有趣的联系,这些概念曾由亚历山德罗夫[Ale05]以及后来由 Aurenhammer,Hoffman 和 Aranov[AHA92]进行研究。接下来的部分将致力于这种配置。

7 Semi-discrete transport 7 半离散输运

Refer to caption

Figure 10: Semi-discrete transport: gray levels symbolize the quantity of a resource that will be transported to 4 factories. Each factory will be allocated a part of the terrain in function of the quantity of resource that it should collect.
图 10:半离散传输:灰色级别象征着将被运送到 4 个工厂的资源数量。每个工厂将根据应收集的资源数量被分配一部分地形。

We now suppose that the source measure μ𝜇\mu is continuous, and that the target measure ν𝜈\nu is a sum of Dirac masses. A practical example of this type of configuration corresponds to a resource which available quantity is represented by a function u𝑢u. The resource is collected by a set of n𝑛n factories, as shown in Figure 10. Each factory is supposed to collect a certain prescribed quantity of resource νjsubscript𝜈𝑗\nu_{j}. Clearly, the sum of all prescriptions corresponds to the total quantity of available resource (j=1nνj=Xu(x)𝑑xsuperscriptsubscript𝑗1𝑛subscript𝜈𝑗subscript𝑋𝑢𝑥differential-d𝑥\sum_{j=1}^{n}\nu_{j}=\int_{X}u(x)dx).
我们现在假设源度量 μ𝜇\mu 为连续的,目标度量 ν𝜈\nu 是 Dirac 质量的总和。这种配置的一个实际例子对应于一种资源,其可用数量由函数 u𝑢u 表示。该资源由一组 n𝑛n 工厂收集,如图 10 所示。每个工厂都应该收集一定数量的资源 νjsubscript𝜈𝑗\nu_{j} 。显然,所有预定数量的总和对应于可用资源的总数量 ( j=1nνj=Xu(x)𝑑xsuperscriptsubscript𝑗1𝑛subscript𝜈𝑗subscript𝑋𝑢𝑥differential-d𝑥\sum_{j=1}^{n}\nu_{j}=\int_{X}u(x)dx )。

7.1 Semi-discrete Monge problem
7.1 半离散 Monge 问题

In this specific configuration, the Monge problem becomes:
在这种特定配置中,Monge 问题变为:

infT:XYXc(x,T(x))u(x)𝑑x, subject toT1(yj)u(x)𝑑x=νj,j.formulae-sequencesubscriptinfimum:𝑇𝑋𝑌subscript𝑋𝑐𝑥𝑇𝑥𝑢𝑥differential-d𝑥 subject tosubscriptsuperscript𝑇1subscript𝑦𝑗𝑢𝑥differential-d𝑥subscript𝜈𝑗for-all𝑗\begin{array}[]{l}\inf\limits_{T:X\rightarrow Y}\int\limits_{X}c(x,T(x))u(x)dx,\mbox{ subject to}\int\limits_{T^{-1}(y_{j})}u(x)dx=\nu_{j},\forall j.\end{array}

A transport map T𝑇T associates with each point x𝑥x of X𝑋X one of the points yjsubscript𝑦𝑗y_{j}. Thus, it is possible to partition X𝑋X, by associating to each yjsubscript𝑦𝑗y_{j} the region T1(yj)superscript𝑇1subscript𝑦𝑗T^{-1}(y_{j}) that contains all the points transported towards yjsubscript𝑦𝑗y_{j} by T𝑇T. The constraint imposes that the quantity of collected resource over each region T1(yj)superscript𝑇1subscript𝑦𝑗T^{-1}(y_{j}) corresponds to the prescribed quantity νjsubscript𝜈𝑗\nu_{j}.
一个运输映射 T𝑇TX𝑋X 上的每个点 x𝑥x 关联,关联到所有由 T𝑇T 运送到 yjsubscript𝑦𝑗y_{j} 的点 yjsubscript𝑦𝑗y_{j} 的一个点。因此,通过将每个 yjsubscript𝑦𝑗y_{j} 关联到包含所有通过 T𝑇T 运送到 yjsubscript𝑦𝑗y_{j} 的点的区域 T1(yj)superscript𝑇1subscript𝑦𝑗T^{-1}(y_{j}) ,可以对 X𝑋X 进行划分。约束要求在每个区域 T1(yj)superscript𝑇1subscript𝑦𝑗T^{-1}(y_{j}) 收集的资源量与规定的数量 νjsubscript𝜈𝑗\nu_{j} 相对应。

Let us now examine the form of the dual Kantorovich problem. In terms of measure, the source measure μ𝜇\mu has a density u𝑢u, and the target measure ν𝜈\nu is a sum of Dirac masses ν=j=1nνjδyj𝜈superscriptsubscript𝑗1𝑛subscript𝜈𝑗subscript𝛿subscript𝑦𝑗\nu=\sum_{j=1}^{n}\nu_{j}\delta_{y_{j}}, supported by the set of points Y={yj}𝑌subscript𝑦𝑗Y=\left\{y_{j}\right\}. We recall that in its general form, the dual Kantorovich problem is written as follows:
现在让我们来检查对偶 Kantorovich 问题的形式。就测度而言,源测度 μ𝜇\mu 具有密度 u𝑢u ,目标测度 ν𝜈\nu 是由支持在点集 Y={yj}𝑌subscript𝑦𝑗Y=\left\{y_{j}\right\} 上的 Dirac 质量 ν=j=1nνjδyj𝜈superscriptsubscript𝑗1𝑛subscript𝜈𝑗subscript𝛿subscript𝑦𝑗\nu=\sum_{j=1}^{n}\nu_{j}\delta_{y_{j}} 之和。我们想起,对偶 Kantorovich 问题的一般形式如下所示:

supψ𝚿𝐜(Y)[Xψc(x)𝑑μ+Yψ(y)𝑑ν].subscriptsupremum𝜓superscript𝚿𝐜𝑌delimited-[]subscript𝑋superscript𝜓𝑐𝑥differential-d𝜇subscript𝑌𝜓𝑦differential-d𝜈\sup\limits_{\psi\in{\bf\Psi^{c}}(Y)}\left[\int_{X}\psi^{c}(x)d\mu+\int_{Y}\psi(y)d\nu\right]. (21)

In our semi-discrete case, the functional becomes a function of n𝑛n variables, with the following form:
在我们的半离散情况中,该函数成为 n𝑛n 个变量的函数,形式如下:

F(ψ)𝐹𝜓\displaystyle F(\psi) =\displaystyle= F(ψ1,ψ2,ψn)𝐹subscript𝜓1subscript𝜓2subscript𝜓𝑛\displaystyle\!F(\psi_{1},\psi_{2},\ldots\psi_{n}) (22)
=\displaystyle= Xψc(x)u(x)𝑑x+j=1nψjνjsubscript𝑋superscript𝜓𝑐𝑥𝑢𝑥differential-d𝑥superscriptsubscript𝑗1𝑛subscript𝜓𝑗subscript𝜈𝑗\displaystyle\!\int\limits_{X}\!\psi^{c}(x)u(x)dx+\sum_{j=1}^{n}\psi_{j}\nu_{j} (23)
=\displaystyle= XinfyjY[c(x,yj)ψj]u(x)dx+j=1nψjνjsubscript𝑋subscriptinfimumsubscript𝑦𝑗𝑌delimited-[]𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗𝑢𝑥𝑑𝑥superscriptsubscript𝑗1𝑛subscript𝜓𝑗subscript𝜈𝑗\displaystyle\!\int\limits_{X}\!\inf\limits_{y_{j}\in Y}\left[c(x,y_{j})-\psi_{j}\right]u(x)dx+\sum_{j=1}^{n}\psi_{j}\nu_{j} (24)
=\displaystyle= j=1nLagψc(yj)(c(x,yj)ψj)u(x)𝑑x+j=1nψjνj.superscriptsubscript𝑗1𝑛subscriptL𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗𝑢𝑥differential-d𝑥superscriptsubscript𝑗1𝑛subscript𝜓𝑗subscript𝜈𝑗\displaystyle\!\sum\limits_{j=1}^{n}\int\limits_{\mathrm{L}ag_{\psi}^{c}(y_{j})}\!\left(c(x,y_{j})-\psi_{j}\right)u(x)dx+\sum_{j=1}^{n}\psi_{j}\nu_{j}. (25)

The first step (23) takes into account the nature of the measures μ𝜇\mu and ν𝜈\nu. In particular, one can notice that the measure ν𝜈\nu is completely defined by the scalars νjsubscript𝜈𝑗\nu_{j} associated with the points yjsubscript𝑦𝑗y_{j}, and the function ψ𝜓\psi is defined by the scalars ψjsubscript𝜓𝑗\psi_{j} that correspond to its value at each point yjsubscript𝑦𝑗y_{j}. The integral Yψ(y)𝑑νsubscript𝑌𝜓𝑦differential-d𝜈\int_{Y}\psi(y)d\nu becomes the dot product jψjνjsubscript𝑗subscript𝜓𝑗subscript𝜈𝑗\sum_{j}\psi_{j}\nu_{j}. Thus, the functional that corresponds to the dual Kantorovich problem becomes a function F𝐹F that depends on n𝑛n variables (the ψjsubscript𝜓𝑗\psi_{j}). Let us now replace the c-conjugate ψcsuperscript𝜓𝑐\psi^{c} with its expression, which gives (24). The integral in the left term can be reorganized, by grouping the points of X𝑋X for which the same point yjsubscript𝑦𝑗y_{j} minimizes c(x,yj)ψj𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗c(x,y_{j})-\psi_{j}, which gives (25), where the Laguerre cell Lagψc(yj)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗\mathrm{L}ag_{\psi}^{c}(y_{j}) is defined by:
第一步(23)考虑了测量 μ𝜇\muν𝜈\nu 的性质。特别是,可以注意到测量 ν𝜈\nu 完全由与点 yjsubscript𝑦𝑗y_{j} 相关联的标量 νjsubscript𝜈𝑗\nu_{j} 定义,函数 ψ𝜓\psi 由对应于每个点 yjsubscript𝑦𝑗y_{j} 的标量 ψjsubscript𝜓𝑗\psi_{j} 定义。积分 Yψ(y)𝑑νsubscript𝑌𝜓𝑦differential-d𝜈\int_{Y}\psi(y)d\nu 变成点积 jψjνjsubscript𝑗subscript𝜓𝑗subscript𝜈𝑗\sum_{j}\psi_{j}\nu_{j} 。因此,对偶 Kantorovich 问题对应的函数变成了一个依赖 n𝑛n 个变量( ψjsubscript𝜓𝑗\psi_{j} )的函数 F𝐹F 。现在让我们用它的表达式替换 c 共轭 ψcsuperscript𝜓𝑐\psi^{c} ,得到(24)。左项中的积分可以重新组织,通过将使得相同点 yjsubscript𝑦𝑗y_{j} 最小化 c(x,yj)ψj𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗c(x,y_{j})-\psi_{j}X𝑋X 的点进行分组,得到(25),其中 Laguerre 单元 Lagψc(yj)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗\mathrm{L}ag_{\psi}^{c}(y_{j}) 定义为:

Lagψc(yj)={xX|c(x,yj)ψjc(x,yk)ψk,kj}.\mathrm{L}ag_{\psi}^{c}(y_{j})=\left\{x\in X\quad|\quad c(x,y_{j})-\psi_{j}\leq c(x,y_{k})-\psi_{k},\ \forall k\neq j\right\}.

The Laguerre diagram, formed by the union of the Laguerre cells, is a classical structure in computational geometry. In the case of the L2subscript𝐿2L_{2} cost c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2}, it corresponds to the power diagram, that was studied by Aurenhammer at the end of the 80’s [Aur87]. One of its particularities is that the boundaries of the cells are rectilinear, making it reasonably easy to design computational algorithms to construct them.
拉盖尔图是由拉盖尔单元的联合形成的,是计算几何学中的经典结构。在 L2subscript𝐿2L_{2} 成本 c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2} 的情况下,它对应于能量图,这是在 80 年代末由 Aurenhammer 研究的[ Aur87]。其特点之一是单元的边界是直线的,这使得设计计算算法来构建它们相对容易。

Refer to caption       Refer to caption

Figure 11: The objective function of the dual Kantorovich problem is concave, because its graph is the lower envelope of a family of affine functions.
图 11:对偶坎托罗维奇问题的目标函数是凹的,因为其图形是一族仿射函数的下包络线。

7.2 Concavity of F𝐹F 7.2 F𝐹F 的凹性

The objective function F𝐹F is a particular case of the Kantorovich dual, and naturally inherits its properties, such as its concavity (that we did not discuss yet). This property is interesting both from a theoretical point of view, to study the existence and uniqueness of solutions, and from a practical point of view, to design efficient numerical solution mechanisms. In the semi-discrete case, the concavity of F𝐹F is easier to prove than in the general case. We summarize here the proof by Aurenhammer et. al [AHA92], that leads to an efficient algorithm [Mér11], [Lév15], [KMT16].
目标函数 F𝐹F 是康托罗维奇对偶问题的一个特例,并自然地继承了其性质,比如它的凹性(我们还没有讨论)。这一性质既有理论意义,用于研究解的存在性和唯一性,也有实际意义,用于设计高效的数值解算机制。在半离散情况下, F𝐹F 的凹性比一般情况容易证明。我们在这里总结 Aurenhammer 等人提出的证明,这导致了一个高效的算法。

Theorem 2.

The objective function F𝐹F of the semi-discrete Kantorovich dual problem (21) is concave.


定理 2。半离散康托罗维奇对偶问题(21)的目标函数 F𝐹F 是凹的。
Proof.

Consider the function G𝐺G defined by:

G(A,[ψ1,ψn])=X(c(x,yA(x))ψA(x))u(x)𝑑x,𝐺𝐴subscript𝜓1subscript𝜓𝑛subscript𝑋𝑐𝑥subscript𝑦𝐴𝑥subscript𝜓𝐴𝑥𝑢𝑥differential-d𝑥G(A,\left[\psi_{1},\ldots\psi_{n}\right])=\int\limits_{X}\left(c(x,y_{A(x)})-\psi_{A(x)}\right)u(x)dx, (26)

and parameterized by an assignment A:X[1n]:𝐴𝑋delimited-[]1𝑛A~{}:X\rightarrow[1\ldots n], that is, a function that associates with each point x𝑥x of X𝑋X the index j𝑗j of one of the points yjsubscript𝑦𝑗y_{j}. If we denote A1(j)={x|A(x)=j}superscript𝐴1𝑗conditional-set𝑥𝐴𝑥𝑗A^{-1}(j)=\left\{x|A(x)=j\right\}, then G𝐺G can be also written as:
并由赋值 A:X[1n]:𝐴𝑋delimited-[]1𝑛A~{}:X\rightarrow[1\ldots n] 参数化,即,将每个点 x𝑥x 对应到 X𝑋X 中某个点 yjsubscript𝑦𝑗y_{j} 的索引 j𝑗j 的函数。如果我们表示为 A1(j)={x|A(x)=j}superscript𝐴1𝑗conditional-set𝑥𝐴𝑥𝑗A^{-1}(j)=\left\{x|A(x)=j\right\} ,那么 G𝐺G 也可写成:

G(A,ψ)=jA1(j)(c(x,yj)ψj)u(x)𝑑x=jA1(j)c(x,yj)u(x)𝑑xjψjA1(j)u(x)𝑑x.𝐺𝐴𝜓subscript𝑗subscriptsuperscript𝐴1𝑗𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗𝑢𝑥differential-d𝑥missing-subexpressionsubscript𝑗subscriptsuperscript𝐴1𝑗𝑐𝑥subscript𝑦𝑗𝑢𝑥differential-d𝑥subscript𝑗subscript𝜓𝑗subscriptsuperscript𝐴1𝑗𝑢𝑥differential-d𝑥\begin{array}[]{lcl}G(A,\psi)&=&\sum\limits_{j}\int\limits_{A^{-1}(j)}\left(c(x,y_{j})-\psi_{j}\right)u(x)dx\\[5.69054pt] &=&\sum\limits_{j}\int\limits_{A^{-1}(j)}c(x,y_{j})u(x)dx-\sum\limits_{j}\psi_{j}\int\limits_{A^{-1}(j)}u(x)dx.\end{array} (27)

The first term does not depend on the ψjsubscript𝜓𝑗\psi_{j}, and the second one is a linear combination of the ψjsubscript𝜓𝑗\psi_{j} coefficients, thus, for a given fixed assignment A𝐴A, ψG(A,ψ)maps-to𝜓𝐺𝐴𝜓\psi\mapsto G(A,\psi) is an affine function of ψ𝜓\psi. Figure 11 depicts the appearance of the graph of G𝐺G for different assignment. The horizontal axis symbolizes the components of the vector ψ𝜓\psi (of dimension n𝑛n) and the vertical axis the value of G(A,ψ)𝐺𝐴𝜓G(A,\psi). For a given assignment A𝐴A, the graph of G𝐺G is an hyperplane (symbolized here by a straight line).
第一项不依赖于 ψjsubscript𝜓𝑗\psi_{j} ,第二项是 ψjsubscript𝜓𝑗\psi_{j} 系数的线性组合,因此,对于给定的固定赋值 A𝐴AψG(A,ψ)maps-to𝜓𝐺𝐴𝜓\psi\mapsto G(A,\psi)ψ𝜓\psi 的仿射函数。图 11 描述了 G𝐺G 的图形在不同赋值下的外观。水平轴表示向量 ψ𝜓\psi (维度为 n𝑛n )的分量,垂直轴表示 G(A,ψ)𝐺𝐴𝜓G(A,\psi) 的值。对于给定赋值 A𝐴AG𝐺G 的图形是一个超平面(这里用直线表示)。


证明。考虑由 G𝐺G 定义的函数:

Among all the possible assignments A𝐴A, we distinguish Aψsuperscript𝐴𝜓A^{\psi} that associates with a point x𝑥x the index j𝑗j of the Laguerre cell x𝑥x belongs to 101010A𝐴A is undefined on the set of Laguerre cell boundaries, this does not cause any difficulty because this set has zero measure.
A𝐴A 在拉盖尔单元边界集上未定义,这不会造成任何困难,因为该集合具有零测度。

在所有可能的赋值 A𝐴A 中,我们将 Aψsuperscript𝐴𝜓A^{\psi} 与点 x𝑥x 相关联,该点为 x𝑥x 的拉盖尔单元 j𝑗j 的索引所属于 10
, that is: ,即:

Aψ(x)=argminj[c(x,yj)ψj].superscript𝐴𝜓𝑥subscript𝑗𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗A^{\psi}(x)=\arg\min\limits_{j}\left[c(x,y_{j})-\psi_{j}\right].

For a fixed vector ψ=ψ0𝜓superscript𝜓0\psi=\psi^{0}, among all the possible assignments A𝐴A, the assignment Aψ0superscript𝐴superscript𝜓0A^{\psi^{0}} minimizes the value G(A,ψ0)𝐺𝐴superscript𝜓0G(A,\psi^{0}), because it minimizes the integrand pointwise (see Figure 11 on the left. Thus, since G𝐺G is affine with respect to ψ𝜓\psi, the graph of the function ψG(Aψ,ψ)𝜓𝐺superscript𝐴𝜓𝜓\psi\rightarrow G(A^{\psi},\psi) is the lower envelope of a family of hyperplanes (symbolized as straight lines in Figure 11 on the right), hence ψG(Aψ,ψ)𝜓𝐺superscript𝐴𝜓𝜓\psi\rightarrow G(A^{\psi},\psi) is a concave function. Finally, the objective function F𝐹F of the dual Kantorovich problem can be written as F(ψ)=G(Aψ,ψ)+jνjψj𝐹𝜓𝐺superscript𝐴𝜓𝜓subscript𝑗subscript𝜈𝑗subscript𝜓𝑗F(\psi)=G(A^{\psi},\psi)+\sum_{j}\nu_{j}\psi_{j}, that is, the sum of a concave function and a linear function, hence it is also a concave function. ∎
对于一个固定的向量 ψ=ψ0𝜓superscript𝜓0\psi=\psi^{0} ,在所有可能的分配 A𝐴A 中,分配 Aψ0superscript𝐴superscript𝜓0A^{\psi^{0}} 使值 G(A,ψ0)𝐺𝐴superscript𝜓0G(A,\psi^{0}) 最小化,因为它逐点最小化被积函数(见图 11 左侧)。因此,由于 G𝐺Gψ𝜓\psi 是仿射的,函数 ψG(Aψ,ψ)𝜓𝐺superscript𝐴𝜓𝜓\psi\rightarrow G(A^{\psi},\psi) 的图是一族超平面的下包络线(在图 11 右侧用直线符号表示),因此 ψG(Aψ,ψ)𝜓𝐺superscript𝐴𝜓𝜓\psi\rightarrow G(A^{\psi},\psi) 是凹函数。最后,对偶 Kantorovich 问题的目标函数 F𝐹F 可以写成 F(ψ)=G(Aψ,ψ)+jνjψj𝐹𝜓𝐺superscript𝐴𝜓𝜓subscript𝑗subscript𝜈𝑗subscript𝜓𝑗F(\psi)=G(A^{\psi},\psi)+\sum_{j}\nu_{j}\psi_{j} ,即凹函数与线性函数之和,所以它也是凹函数。 ∎

7.3 The semi-discrete optimal transport map
7.3 半离散最优输运映射

Let us now examine the c𝑐c-superdifferential cψsuperscript𝑐𝜓\partial^{c}\psi, that is, the set of points (xX𝑥𝑋x\in X, yY𝑦𝑌y\in Y) connected by the optimal transport map. We recall that the c𝑐c-superdifferential cψsuperscript𝑐𝜓\partial^{c}\psi can be defined alternatively as cψ={(x,yj)|ψc(x)+ψj=c(x,yj)}superscript𝑐𝜓conditional-set𝑥subscript𝑦𝑗superscript𝜓𝑐𝑥subscript𝜓𝑗𝑐𝑥subscript𝑦𝑗\partial^{c}\psi=\left\{(x,y_{j})\ |\ \psi^{c}(x)+\psi_{j}=c(x,y_{j})\right\}. Consider a point x𝑥x of X𝑋X that belongs to the Laguerre cell Lagψc(yj)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗\mathrm{L}ag_{\psi}^{c}(y_{j}). The c𝑐c-superdifferential [cψ](x)delimited-[]superscript𝑐𝜓𝑥[\partial^{c}\psi](x) at x𝑥x is defined by:
让我们现在来检验 c𝑐c -超微分 cψsuperscript𝑐𝜓\partial^{c}\psi ,即由最优输运映射连接的点( xX𝑥𝑋x\in XyY𝑦𝑌y\in Y )的集合。我们回忆起 c𝑐c -超微分 cψsuperscript𝑐𝜓\partial^{c}\psi 可以另行定义为 cψ={(x,yj)|ψc(x)+ψj=c(x,yj)}superscript𝑐𝜓conditional-set𝑥subscript𝑦𝑗superscript𝜓𝑐𝑥subscript𝜓𝑗𝑐𝑥subscript𝑦𝑗\partial^{c}\psi=\left\{(x,y_{j})\ |\ \psi^{c}(x)+\psi_{j}=c(x,y_{j})\right\} 。考虑一个属于拉盖尔单元 Lagψc(yj)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗\mathrm{L}ag_{\psi}^{c}(y_{j})X𝑋Xx𝑥x 。在 x𝑥x 处的 c𝑐c -超微分 [cψ](x)delimited-[]superscript𝑐𝜓𝑥[\partial^{c}\psi](x) 定义为:

[cψ](x)delimited-[]superscript𝑐𝜓𝑥\displaystyle[\partial^{c}\psi](x) =\displaystyle= {yk|ψc(x)+ψk=c(x,yk)}conditional-setsubscript𝑦𝑘superscript𝜓𝑐𝑥subscript𝜓𝑘𝑐𝑥subscript𝑦𝑘\displaystyle\left\{y_{k}\ |\ \psi^{c}(x)+\psi_{k}=c(x,y_{k})\right\} (28)
=\displaystyle= {yk|infyl[c(x,yl)ψl]+ψk=c(x,yk)}conditional-setsubscript𝑦𝑘subscriptinfimumsubscript𝑦𝑙delimited-[]𝑐𝑥subscript𝑦𝑙subscript𝜓𝑙subscript𝜓𝑘𝑐𝑥subscript𝑦𝑘\displaystyle\left\{y_{k}\ |\ \inf_{y_{l}}\left[c(x,y_{l})-\psi_{l}\right]+\psi_{k}=c(x,y_{k})\right\} (29)
=\displaystyle= {yk|c(x,yj)ψj+ψk=c(x,yk)}conditional-setsubscript𝑦𝑘𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗subscript𝜓𝑘𝑐𝑥subscript𝑦𝑘\displaystyle\left\{y_{k}\ |\ c(x,y_{j})-\psi_{j}+\psi_{k}=c(x,y_{k})\right\} (30)
=\displaystyle= {yk|c(x,yj)ψj=c(x,yk)ψk}conditional-setsubscript𝑦𝑘𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗𝑐𝑥subscript𝑦𝑘subscript𝜓𝑘\displaystyle\left\{y_{k}\ |\ c(x,y_{j})-\psi_{j}=c(x,y_{k})-\psi_{k}\right\} (31)
=\displaystyle= {yj}.subscript𝑦𝑗\displaystyle\left\{y_{j}\right\}. (32)

In the first step (29), we replace ψcsuperscript𝜓𝑐\psi^{c} by its definition, then we use the fact that x𝑥x belongs to the Laguerre cell of yjsubscript𝑦𝑗y_{j} (30), and finally, the only point of Y𝑌Y that satisfies (31) is yjsubscript𝑦𝑗y_{j} because we have supposed x𝑥x inside the Laguerre cell of yjsubscript𝑦𝑗y_{j}.
在第一步(29),我们用 ψcsuperscript𝜓𝑐\psi^{c} 的定义替换它,然后我们利用 x𝑥x 属于 yjsubscript𝑦𝑗y_{j} 的 Laguerre 单元(30),最后,满足(31)的唯一 Y𝑌Y 点是 yjsubscript𝑦𝑗y_{j} ,因为我们假设 x𝑥xyjsubscript𝑦𝑗y_{j} 的 Laguerre 单元内。

To summarize, the optimal transport map T𝑇T moves each point x𝑥x to the point yjsubscript𝑦𝑗y_{j} associated with the Laguerre cell Lagψc(yj)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗\mathrm{L}ag_{\psi}^{c}(y_{j}) that contains x𝑥x. The vector ψ1,ψnsubscript𝜓1subscript𝜓𝑛\psi_{1},\ldots\psi_{n} is the unique vector that maximizes the discrete dual Kantorovich function F𝐹F such that ψ𝜓\psi is c-concave. It is possible to show that ψ𝜓\psi is c-concave if and only if no Laguerre cell is empty of matter, that is the integral of u𝑢u is non-zero on each Laguerre cell. Indeed, we have the following results:
综上所述,最佳传输映射 T𝑇T 将每个点 x𝑥x 移动到与包含 x𝑥x 的 Laguerre 单元 Lagψc(yj)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗\mathrm{L}ag_{\psi}^{c}(y_{j}) 相关联的点 yjsubscript𝑦𝑗y_{j} 。向量 ψ1,ψnsubscript𝜓1subscript𝜓𝑛\psi_{1},\ldots\psi_{n} 是唯一的向量,它最大化离散对偶 Kantorovich 函数 F𝐹F ,使得 ψ𝜓\psi 是 c-凹的。可以证明,当且仅当没有 Laguerre 单元缺少物质时,即每个 Laguerre 单元上 u𝑢u 的积分非零时, ψ𝜓\psi 是 c-凹的。事实上,我们有以下结果:

Theorem 3.

Let Y={y1,,yn}𝑌subscript𝑦1subscript𝑦𝑛Y=\{y_{1},\ldots,y_{n}\} be a set of n𝑛n points. Let ψ𝜓\psi be any function defined on Y𝑌Y such that Lagψc(yi)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑖\mathrm{L}ag_{\psi}^{c}(y_{i}) are not empty sets. Then ψ𝜓\psi is a c𝑐c-concave function.


定理 3. 设 Y={y1,,yn}𝑌subscript𝑦1subscript𝑦𝑛Y=\{y_{1},\ldots,y_{n}\} 是一个包含 n𝑛n 个点的集合。设 ψ𝜓\psi 是在 Y𝑌Y 上定义的任意函数,使得 Lagψc(yi)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑖\mathrm{L}ag_{\psi}^{c}(y_{i}) 不是空集。那么 ψ𝜓\psi 是一个 c𝑐c -凹函数。
Proof.

By definition, for i=1,2,,k𝑖12𝑘i=1,2,\ldots,k

(ψc)c(yi)=infxX[c(x,yi)ψc(x)]=infxX[c(x,yi)(infyjY[c(x,yj)ψ(yj)])]=infxX{ψ(yi),if xLagψc(yi) c(x,yi)(c(x,yj)ψ(yj)<(c(x,yi)ψ(yi))),if xLagψc(yj) (ji) =ψ(yi).superscriptsuperscript𝜓𝑐𝑐subscript𝑦𝑖subscriptinfimum𝑥𝑋delimited-[]𝑐𝑥subscript𝑦𝑖superscript𝜓𝑐𝑥subscriptinfimum𝑥𝑋delimited-[]𝑐𝑥subscript𝑦𝑖subscriptinfimumsubscript𝑦𝑗𝑌delimited-[]𝑐𝑥subscript𝑦𝑗𝜓subscript𝑦𝑗subscriptinfimum𝑥𝑋cases𝜓subscript𝑦𝑖if xLagψc(yi) missing-subexpression𝑐𝑥subscript𝑦𝑖superscript𝑐𝑥subscript𝑦𝑗𝜓subscript𝑦𝑗absent𝑐𝑥subscript𝑦𝑖𝜓subscript𝑦𝑖if xLagψc(yj) (ji) missing-subexpression𝜓subscript𝑦𝑖\begin{split}(\psi^{c})^{c}(y_{i})&=\inf\limits_{x\in X}\left[c(x,y_{i})-\psi^{c}(x)\right]\\ &=\inf\limits_{x\in X}\left[c(x,y_{i})-\left(\inf\limits_{y_{j}\in Y}\left[c(x,y_{j})-\psi(y_{j})\right]\right)\right]\\ &=\inf\limits_{x\in X}\left\{\begin{array}[]{ll}\psi(y_{i}),\quad\quad\quad\quad\quad\ \quad\quad\quad\quad\hbox{if $x\in\mathrm{L}ag_{\psi}^{c}(y_{i})$ }\\ c(x,y_{i})-(\overbrace{c(x,y_{j})-\psi(y_{j})}^{\quad<(c(x,y_{i})-\psi(y_{i}))}),\ \hbox{if $x\in\mathrm{L}ag_{\psi}^{c}(y_{j})$ $(j\neq i)$ }\end{array}\right.\\ &=\psi(y_{i}).\end{split}

This allows to conclude that ψ𝜓\psi is a c𝑐c-convex function (see [San15, Proposition 1.34]). ∎
这样可以得出结论, ψ𝜓\psi 是一个 c𝑐c -凸函数(参见[San15,命题 1.34])。 ∎


证明。 根据定义,对于 i=1,2,,k𝑖12𝑘i=1,2,\ldots,k

Moreover, the converse of the theorem is also true:
此外,定理的逆命题也成立:

Theorem 4.

Let Y={y1,,yn}𝑌subscript𝑦1subscript𝑦𝑛Y=\{y_{1},\ldots,y_{n}\} be a set of n𝑛n points. Let ψ𝜓\psi be a c𝑐c-concave function defined on Y𝑌Y. Then the sets Lagψc(yi)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑖\mathrm{L}ag_{\psi}^{c}(y_{i}) are not empty for all i=1,,n𝑖1𝑛i=1,\ldots,n,.


定理 4 . 让 Y={y1,,yn}𝑌subscript𝑦1subscript𝑦𝑛Y=\{y_{1},\ldots,y_{n}\} 成为一个由 n𝑛n 个点组成的集合。让 ψ𝜓\psi 是一个定义在 Y𝑌Y 上的 c𝑐c -凹函数。那么对于所有的 i=1,,n𝑖1𝑛i=1,\ldots,n ,集合 Lagψc(yi)L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑖\mathrm{L}ag_{\psi}^{c}(y_{i}) 都不为空。
Proof.

Reasoning by contradiction, we suppose that ψ𝜓\psi is a c𝑐c-concave function and there exist i0{1,,n}subscript𝑖01𝑛i_{0}\in\{1,\ldots,n\} such that Lagψc(yi0)=L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦subscript𝑖0\mathrm{L}ag_{\psi}^{c}(y_{i_{0}})=\emptyset. Then, from definition

xX,j{1,,n}withji0,andϵj>0such thatc(x,yi0)ψ(yi0)c(x,yj)ψ(yj)+ϵj.formulae-sequenceformulae-sequencefor-all𝑥𝑋𝑗1𝑛with𝑗subscript𝑖0andsubscriptitalic-ϵ𝑗0such that𝑐𝑥subscript𝑦subscript𝑖0𝜓subscript𝑦subscript𝑖0𝑐𝑥subscript𝑦𝑗𝜓subscript𝑦𝑗subscriptitalic-ϵ𝑗\begin{split}\forall x\in X,\ \exists j\in\{1,\ldots,n\}\ \hbox{with}\ j\neq i_{0},\ \hbox{and}\ \epsilon_{j}>0\ \hbox{such that}\\ c(x,y_{i_{0}})-\psi(y_{i_{0}})\geq c(x,y_{j})-\psi(y_{j})+\epsilon_{j}.\end{split}

We will write x=xj𝑥subscript𝑥𝑗x=x_{j}. Thus,
我们会写 x=xj𝑥subscript𝑥𝑗x=x_{j} 。 因此。

c(xj,yi0)(infyjY[c(xj,yj)ψ(yj)])c(xj,yi0)(c(xj,yi0)ψ(yi0)ϵj)=ψ(yi0)+ϵj.𝑐subscript𝑥𝑗subscript𝑦subscript𝑖0subscriptinfimumsubscript𝑦𝑗𝑌delimited-[]𝑐subscript𝑥𝑗subscript𝑦𝑗𝜓subscript𝑦𝑗𝑐subscript𝑥𝑗subscript𝑦subscript𝑖0𝑐subscript𝑥𝑗subscript𝑦subscript𝑖0𝜓subscript𝑦subscript𝑖0subscriptitalic-ϵ𝑗𝜓subscript𝑦subscript𝑖0subscriptitalic-ϵ𝑗\begin{split}c(x_{j},y_{i_{0}})&-\left(\inf\limits_{y_{j}\in Y}\left[c(x_{j},y_{j})-\psi(y_{j})\right]\right)\\ &\geq c(x_{j},y_{i_{0}})-\left(c(x_{j},y_{i_{0}})-\psi(y_{i_{0}})-\epsilon_{j}\right)\\ &=\psi(y_{i_{0}})+\epsilon_{j}.\end{split} (33)

Therefore, 所以,

(ψc)c(yi0)=infxX[c(x,yi0)(infyjY[c(x,yj)ψ(yj)])]=infj[c(xj,yi0)(infyjY[c(xj,yj)ψ(yj)])]=infj[ψ(yi0)+ϵj]>ψ(yi0).superscriptsuperscript𝜓𝑐𝑐subscript𝑦subscript𝑖0subscriptinfimum𝑥𝑋delimited-[]𝑐𝑥subscript𝑦subscript𝑖0subscriptinfimumsubscript𝑦𝑗𝑌delimited-[]𝑐𝑥subscript𝑦𝑗𝜓subscript𝑦𝑗subscriptinfimum𝑗delimited-[]𝑐subscript𝑥𝑗subscript𝑦subscript𝑖0subscriptinfimumsubscript𝑦𝑗𝑌delimited-[]𝑐subscript𝑥𝑗subscript𝑦𝑗𝜓subscript𝑦𝑗subscriptinfimum𝑗delimited-[]𝜓subscript𝑦subscript𝑖0subscriptitalic-ϵ𝑗𝜓subscript𝑦subscript𝑖0\begin{split}(\psi^{c})^{c}(y_{i_{0}})&=\inf\limits_{x\in X}\left[c(x,y_{i_{0}})-\left(\inf\limits_{y_{j}\in Y}\left[c(x,y_{j})-\psi(y_{j})\right]\right)\right]\\ &=\inf\limits_{j}\left[c(x_{j},y_{i_{0}})-\left(\inf\limits_{y_{j}\in Y}\left[c(x_{j},y_{j})-\psi(y_{j})\right]\right)\right]\\ &=\inf\limits_{j}\left[\psi(y_{i_{0}})+\epsilon_{j}\right]\\ &>\psi(y_{i_{0}}).\end{split}

This contradicts the fact that ψ𝜓\psi is a c-concave function, because [San15, Proposition 1.34]. ∎
这与 ψ𝜓\psi 是 c-凹函数的事实相矛盾,因为[San15,命题 1.34]。∎


证明。通过反证法,我们假设 ψ𝜓\psi 是一个 c𝑐c -凹函数,并且存在这样的 i0{1,,n}subscript𝑖01𝑛i_{0}\in\{1,\ldots,n\} ,使得 Lagψc(yi0)=L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦subscript𝑖0\mathrm{L}ag_{\psi}^{c}(y_{i_{0}})=\emptyset 。然后,根据定义

We now proceed to compute the first and second order derivatives of the objective function F𝐹F. These derivatives are useful in practice to design computational algorithms.
现在我们开始计算目标函数 F𝐹F 的一阶和二阶导数。这些导数在实践中用于设计计算算法。

7.4 First-order derivatives of the objective function
目标函数的一阶导数

Since it is concave, F𝐹F admits a unique maximum ψsuperscript𝜓\psi^{*}, characterized by F(ψ)=0𝐹superscript𝜓0\nabla F(\psi^{*})=0 where F𝐹\nabla F denotes the gradient. Let us now examine the form of the gradient F=(G(Aψ,ψ)+j=1nψjνj)𝐹𝐺𝐴𝜓𝜓superscriptsubscript𝑗1𝑛subscript𝜓𝑗subscript𝜈𝑗\nabla F=\nabla\left(G(A\psi,\psi)+\sum_{j=1}^{n}\psi_{j}\nu_{j}\right). By replacing G(Aψ,ψ)𝐺subscript𝐴𝜓𝜓G(A_{\psi},\psi) with its expression (27), one obtains:
既然它是凹的, F𝐹F 允许一个由 F(ψ)=0𝐹superscript𝜓0\nabla F(\psi^{*})=0 描述的独特最大值 ψsuperscript𝜓\psi^{*} ,其中 F𝐹\nabla F 表示梯度。现在让我们考察梯度 F=(G(Aψ,ψ)+j=1nψjνj)𝐹𝐺𝐴𝜓𝜓superscriptsubscript𝑗1𝑛subscript𝜓𝑗subscript𝜈𝑗\nabla F=\nabla\left(G(A\psi,\psi)+\sum_{j=1}^{n}\psi_{j}\nu_{j}\right) 的形式。通过用其表达式(27)替换 G(Aψ,ψ)𝐺subscript𝐴𝜓𝜓G(A_{\psi},\psi) ,就可以得到:

Gψj(ψ)=limt0G(ψ+tej)G(ψ)t=limt01t{Xinf[c(x,y1)ψ1,,c(x,yj)ψjt,,c(x,yn)ψn]infi[c(x,yi)ψi]u(x)dx}.𝐺subscript𝜓𝑗𝜓subscript𝑡0𝐺𝜓𝑡subscript𝑒𝑗𝐺𝜓𝑡subscript𝑡01𝑡subscript𝑋infimum𝑐𝑥subscript𝑦1subscript𝜓1𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗𝑡𝑐𝑥subscript𝑦𝑛subscript𝜓𝑛subscriptinfimum𝑖delimited-[]𝑐𝑥subscript𝑦𝑖subscript𝜓𝑖𝑢𝑥𝑑𝑥\begin{split}\frac{\partial G}{\partial\psi_{j}}(\psi)&=\lim_{t\to 0}\frac{G(\psi+t{e}_{j})-G(\psi)}{t}\\ &=\lim_{t\to 0}\frac{1}{t}\left\{\int_{X}\inf\left[c(x,y_{1})-\psi_{1},\ldots,c(x,y_{j})-\psi_{j}-t,\ldots,c(x,y_{n})-\psi_{n}\right]\right.\\ &\hskip 8.5359pt\left.-\inf\limits_{i}\left[c(x,y_{i})-\psi_{i}\right]u(x)dx\right\}.\end{split}

Let xLagψc(ym)𝑥L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑚x\in\mathrm{L}ag_{\psi}^{c}(y_{m}), then for t𝑡t small enough we have
假设 xLagψc(ym)𝑥L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑚x\in\mathrm{L}ag_{\psi}^{c}(y_{m}) ,则对于足够小的 t𝑡t 我们有

inf[c(x,y1)ψ1,,c(x,yj)ψjt,,c(x,yk)ψn]=c(x,ym)ψm.infimum𝑐𝑥subscript𝑦1subscript𝜓1𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗𝑡𝑐𝑥subscript𝑦𝑘subscript𝜓𝑛𝑐𝑥subscript𝑦𝑚subscript𝜓𝑚\begin{split}\inf&\left[c(x,y_{1})-\psi_{1},\ldots,c(x,y_{j})-\psi_{j}-t,\ldots,c(x,y_{k})-\psi_{n}\right]\\ &=c(x,y_{m})-\psi_{m}.\end{split}

Indeed, since c(x,ym)ψm<c(x,yi)ψi𝑐𝑥subscript𝑦𝑚subscript𝜓𝑚𝑐𝑥subscript𝑦𝑖subscript𝜓𝑖c(x,y_{m})-\psi_{m}<c(x,y_{i})-\psi_{i} for all im𝑖𝑚i\neq m, in particular, if mj𝑚𝑗m\neq j: c(x,ym)ψm<c(x,yj)ψj𝑐𝑥subscript𝑦𝑚subscript𝜓𝑚𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗c(x,y_{m})-\psi_{m}<c(x,y_{j})-\psi_{j}. Thus, for t𝑡t small enough
实际上,由于对于所有的 im𝑖𝑚i\neq m 都有 c(x,ym)ψm<c(x,yi)ψi𝑐𝑥subscript𝑦𝑚subscript𝜓𝑚𝑐𝑥subscript𝑦𝑖subscript𝜓𝑖c(x,y_{m})-\psi_{m}<c(x,y_{i})-\psi_{i} ,特别地,如果 mj𝑚𝑗m\neq j : c(x,ym)ψm<c(x,yj)ψj𝑐𝑥subscript𝑦𝑚subscript𝜓𝑚𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗c(x,y_{m})-\psi_{m}<c(x,y_{j})-\psi_{j} 。因此,对于足够小的 t𝑡t

c(x,ym)ψmc(x,yj)ψjt.𝑐𝑥subscript𝑦𝑚subscript𝜓𝑚𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗𝑡c(x,y_{m})-\psi_{m}\leq c(x,y_{j})-\psi_{j}-t.

In the case that m=j𝑚𝑗m=j,
m=j𝑚𝑗m=j 的情况下,

inf[c(x,y1)ψ1,,c(x,yj)ψjt,,c(x,yk)ψk]=c(x,yj)ψjt.infimum𝑐𝑥subscript𝑦1subscript𝜓1𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗𝑡𝑐𝑥subscript𝑦𝑘subscript𝜓𝑘𝑐𝑥subscript𝑦𝑗subscript𝜓𝑗𝑡\begin{split}\inf&\left[c(x,y_{1})-\psi_{1},\ldots,c(x,y_{j})-\psi_{j}-t,\ldots,c(x,y_{k})-\psi_{k}\right]\\ &=c(x,y_{j})-\psi_{j}-t.\end{split}

Consequently, we obtain that
因此,我们得到

Gψj(ψ)=Lagψc(yj)u(x)𝑑x.𝐺subscript𝜓𝑗𝜓subscriptL𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗𝑢𝑥differential-d𝑥\begin{split}\frac{\partial G}{\partial\psi_{j}}(\psi)=-\int_{\mathrm{L}ag_{\psi}^{c}(y_{j})}u(x)dx.\end{split}

Recalling that the objective function F𝐹F is given by F(ψ)=G(Aψ,ψ)+jνjψj𝐹𝜓𝐺subscript𝐴𝜓𝜓subscript𝑗subscript𝜈𝑗subscript𝜓𝑗F(\psi)=G(A_{\psi},\psi)+\sum_{j}\nu_{j}\psi_{j}, we finally obtain:
回顾目标函数 F𝐹FF(ψ)=G(Aψ,ψ)+jνjψj𝐹𝜓𝐺subscript𝐴𝜓𝜓subscript𝑗subscript𝜈𝑗subscript𝜓𝑗F(\psi)=G(A_{\psi},\psi)+\sum_{j}\nu_{j}\psi_{j} 给出,我们最终获得:

Fψj=νjLagψc(yj)u(x)𝑑x.𝐹subscript𝜓𝑗subscript𝜈𝑗subscriptL𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗𝑢𝑥differential-d𝑥\frac{\partial F}{\partial\psi_{j}}=\nu_{j}-\int\limits_{\mathrm{L}ag_{\psi}^{c}(y_{j})}u(x)dx. (34)

Since the objective function F𝐹F is concave, it admits a unique maximum. At this maximum, all the components of the gradient vanish, this implies that the quantity of matter obtained at yjsubscript𝑦𝑗y_{j}, that is the integral of u𝑢u on the Laguerre cell of yjsubscript𝑦𝑗y_{j}, corresponds to the prescribed quantity of matter, that is νjsubscript𝜈𝑗\nu_{j}.
由于目标函数 F𝐹F 是凹的,它具有唯一的最大值。在这个最大值处,梯度的所有分量都消失,这意味着在 yjsubscript𝑦𝑗y_{j} 处获得的物质量,即在 yjsubscript𝑦𝑗y_{j} 的拉盖尔单元上 u𝑢u 的积分,与所规定的物质量相对应,即 νjsubscript𝜈𝑗\nu_{j}

7.5 Second order derivatives of the objective function
目标函数的二阶导数

The coefficients of the Hessian matrix 2F/ψiψjsuperscript2𝐹subscript𝜓𝑖subscript𝜓𝑗\partial^{2}F/\partial\psi_{i}\partial\psi_{j} are slightly more difficult to compute, since here, we cannot invoke the envelope theorem. We cannot avoid invoking Reynold’s formula. We do not detail the computations for length considerations, but give the final result.
Hessian 矩阵的系数稍微难以计算,因为在这里,我们无法调用包络定理。我们无法避免调用雷诺公式。出于篇幅考虑,我们不详细说明计算过程,但给出最终结果。

In the particular case of the L2subscript𝐿2L_{2} cost, that is c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2}, the second-order derivatives are given by:
L2subscript𝐿2L_{2} 成本的特定情况下,即 c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2} ,二阶导数为:

2Gψiψj(ψ)=Lagψc(yi)Lagψc(yj)u(x)yiyj𝑑S(x)(ij),2Gψj2(ψ)=ij2Gψiψj(ψ).formulae-sequencesuperscript2𝐺subscript𝜓𝑖subscript𝜓𝑗𝜓subscriptL𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑖L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗𝑢𝑥normsubscript𝑦𝑖subscript𝑦𝑗differential-d𝑆𝑥𝑖𝑗superscript2𝐺superscriptsubscript𝜓𝑗2𝜓subscript𝑖𝑗superscript2𝐺subscript𝜓𝑖subscript𝜓𝑗𝜓\begin{split}\frac{\partial^{2}G}{\partial\psi_{i}\partial\psi_{j}}(\psi)&=\int\limits_{\mathrm{L}ag_{\psi}^{c}(y_{i})\cap\mathrm{L}ag_{\psi}^{c}(y_{j})}\frac{u(x)}{\|y_{i}-y_{j}\|}\ dS(x)\quad(i\neq j),\\ \frac{\partial^{2}G}{\partial\psi_{j}^{2}}(\psi)&=-\sum\limits_{i\neq j}\frac{\partial^{2}G}{\partial\psi_{i}\partial\psi_{j}}(\psi).\end{split} (35)

7.6 A computational algorithm for L2subscript𝐿2L_{2} semi-discrete optimal transport
7.6A 计算算法用于 L2subscript𝐿2L_{2} 半离散最优输运

With the definition of F(ψ)𝐹𝜓F(\psi), the expression of its first order derivatives (gradient F𝐹\nabla F) and second order derivatives (Hessian matrix 2F=(2F/ψiψj)ijsuperscript2𝐹subscriptsuperscript2𝐹subscript𝜓𝑖subscript𝜓𝑗𝑖𝑗\nabla^{2}F=\left(\partial^{2}F/\partial\psi_{i}\partial\psi_{j}\right)_{ij}), we are now equipped to design a numerical solution mechanism that computes semi-discrete optimal transport by maximizing F𝐹F, based on a particular version [KMT16] of Newton’s optimization method [NW06]:
借助 F(ψ)𝐹𝜓F(\psi) 的定义,以及其一阶导数(梯度 F𝐹\nabla F )和二阶导数(Hessian 矩阵 2F=(2F/ψiψj)ijsuperscript2𝐹subscriptsuperscript2𝐹subscript𝜓𝑖subscript𝜓𝑗𝑖𝑗\nabla^{2}F=\left(\partial^{2}F/\partial\psi_{i}\partial\psi_{j}\right)_{ij} )的表达式,我们现在能够设计一个数值解机制,通过最大化 F𝐹F 来计算半离散最优输运,基于牛顿优化方法的一个特定版本[KMT16]。

Input:a mesh that supports the source density uthe points (yj)j=1nthe prescribed quantities (νj)j=1nOutput:the (unique) Laguerre diagram Lagψc such that:Lagψc(yj)u(x)𝑑x=νjj(1)ψ[00](2)While convergence is not reached(3)Compute F and 2F(4) Find pn such that 2F(ψ)p=F(ψ)(5) Find the descent parameter α(6)ψψ+αp(7)End whileInput:a mesh that supports the source density 𝑢missing-subexpressionthe points superscriptsubscriptsubscript𝑦𝑗𝑗1𝑛missing-subexpressionthe prescribed quantities superscriptsubscriptsubscript𝜈𝑗𝑗1𝑛Output:the (unique) Laguerre diagram L𝑎superscriptsubscript𝑔𝜓𝑐 such that:missing-subexpressionsubscriptL𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗𝑢𝑥differential-d𝑥subscript𝜈𝑗for-all𝑗missing-subexpressionmissing-subexpressionmissing-subexpressionmissing-subexpression1𝜓delimited-[]002While convergence is not reached3Compute 𝐹 and superscript2𝐹4 Find 𝑝superscript𝑛 such that superscript2𝐹𝜓𝑝𝐹𝜓5 Find the descent parameter 𝛼6𝜓𝜓𝛼𝑝7End while\begin{array}[]{ll}\mbox{\bf Input:}&\mbox{a mesh that supports the source density }u\\ &\mbox{the points }(y_{j})_{j=1}^{n}\\ &\mbox{the prescribed quantities }(\nu_{j})_{j=1}^{n}\\[5.69054pt] \mbox{\bf Output:}&\mbox{the (unique) Laguerre diagram }\mathrm{L}ag_{\psi}^{c}\mbox{ such that:}\\ &\quad\int\limits_{\mathrm{L}ag_{\psi}^{c}(y_{j})}u(x)dx=\nu_{j}\quad\forall j\\[14.22636pt] \hline\cr\\ (1)&\psi\leftarrow[0\ldots 0]\\ (2)&\mbox{While convergence is not reached}\\ (3)&\quad\mbox{Compute }\nabla F\mbox{ and }\nabla^{2}F\\ (4)&\quad\mbox{ Find }p\in{\mathbb{R}}^{n}\mbox{ such that }\nabla^{2}F(\psi)p=-\nabla F(\psi)\\ (5)&\quad\mbox{ Find the descent parameter }\alpha\\ (6)&\quad\psi\leftarrow\psi+\alpha p\\ (7)&\mbox{End while}\end{array}

The source measure is given by its density, that is a positive piecewise linear function u𝑢u, supported by a triangulated mesh (2D) or tetrahedral mesh (3D) of a domain X𝑋X. The target measure is discrete, and supported by the pointset Y=(yj)j=1n𝑌superscriptsubscriptsubscript𝑦𝑗𝑗1𝑛Y=(y_{j})_{j=1}^{n}. Each target point will receive the prescribed quantity of matter νjsubscript𝜈𝑗\nu_{j}. Clearly, the prescriptions should be balanced with the available resource, that is Xu(x)𝑑x=jνjsubscript𝑋𝑢𝑥differential-d𝑥subscript𝑗subscript𝜈𝑗\int_{X}u(x)dx=\sum_{j}\nu_{j}. The algorithm computes for each point of the target measure the subset of X𝑋X that is affected to it through the optimal transport, T1(yj)=Lagψc(yj)superscript𝑇1subscript𝑦𝑗L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗T^{-1}(y_{j})=\mathrm{L}ag_{\psi}^{c}(y_{j}), that corresponds to the Laguerre cell of yjsubscript𝑦𝑗y_{j}. The Laguerre diagram is completely determined by the vector ψ𝜓\psi that maximizes F𝐹F.
源测度由其密度给出,这是一个正的分段线性函数 u𝑢u ,支持一个域 X𝑋X 的三角形网格(2D)或四面体网格(3D)。目标测度是离散的,支持点集 Y=(yj)j=1n𝑌superscriptsubscriptsubscript𝑦𝑗𝑗1𝑛Y=(y_{j})_{j=1}^{n} 。每个目标点将接收指定量的物质 νjsubscript𝜈𝑗\nu_{j} 。显然,配额应该与可用资源保持平衡,即 Xu(x)𝑑x=jνjsubscript𝑋𝑢𝑥differential-d𝑥subscript𝑗subscript𝜈𝑗\int_{X}u(x)dx=\sum_{j}\nu_{j} 。该算法为目标测度的每个点计算经由最优传输分配给它的 X𝑋X 子集, T1(yj)=Lagψc(yj)superscript𝑇1subscript𝑦𝑗L𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗T^{-1}(y_{j})=\mathrm{L}ag_{\psi}^{c}(y_{j}) ,对应于 yjsubscript𝑦𝑗y_{j} 的拉盖尔单元。拉盖尔图由使 F𝐹F 最大化的矢量 ψ𝜓\psi 完全确定。

Line (2) needs a criterion for convergence. The classical convergence criterion for a Newton algorithm uses the norm of the gradient of F𝐹F. In our case, the components of the gradient of F𝐹F have a geometric meaning, since F/ψj𝐹subscript𝜓𝑗\partial F/\partial\psi_{j} corresponds to the difference between the prescribed quantity νjsubscript𝜈𝑗\nu_{j} associated with j𝑗j and the quantity of matter present in the Laguerre cell of yjsubscript𝑦𝑗y_{j} given by Lagψc(yj)u(x)𝑑xsubscriptL𝑎superscriptsubscript𝑔𝜓𝑐subscript𝑦𝑗𝑢𝑥differential-d𝑥\int_{\mathrm{L}ag_{\psi}^{c}(y_{j})}\!\!u(x)dx. Thus, we can decide to stop the algorithm as soon as the largest absolute value of a component becomes smaller than a certain percentage of the smallest prescription. Thus we consider that convergence is reached if max|Fj|<ϵminjνjsubscript𝐹𝑗italic-ϵsubscript𝑗subscript𝜈𝑗\max|\nabla F_{j}|<\epsilon\min_{j}\nu_{j}, for a user-defined ϵitalic-ϵ\epsilon (typically 1% in the examples below).
第(2)行需要一个收敛标准。牛顿算法的经典收敛标准使用 F𝐹F 的梯度范数。在我们的情况下, F𝐹F 的梯度分量具有几何意义,因为 F/ψj𝐹subscript𝜓𝑗\partial F/\partial\psi_{j} 对应于所需数量 νjsubscript𝜈𝑗\nu_{j}j𝑗j 相关联的拉吉尔细胞中存在的物质数量与 yjsubscript𝑦𝑗y_{j} 给定的拉格朗日细胞中存在的物质数量之间的差异。因此,我们可以决定在某个分量的最大绝对值小于最小规定值的一定百分比时停止算法。因此,我们认为当 max|Fj|<ϵminjνjsubscript𝐹𝑗italic-ϵsubscript𝑗subscript𝜈𝑗\max|\nabla F_{j}|<\epsilon\min_{j}\nu_{j} 时收敛已经实现,对于用户定义的 ϵitalic-ϵ\epsilon (在下面的示例中通常为 1%)。

Line (3) computes the coefficients of the gradient and the Hessian matrix of F𝐹F, using (34) and (35). These computations involve integrals over the Laguerre cells and over their boundaries. For the L2subscript𝐿2L_{2} cost c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2}, the boundaries of the Laguerre cells are rectilinear, which dramatically simplifies the computations of the Hessian coefficients (35). In addition, it makes it possible to use efficient algorithms to compute the Laguerre diagram [Bow81, Wat81]. Their implementation is available in several programming libraries, such as GEOGRAM111111http://alice.loria.fr/software/geogram/doc/html/index.html
第(3)行使用(34)和(35)来计算 F𝐹F 的梯度和 Hessian 矩阵的系数。这些计算涉及拉盖尔单元和它们边界上的积分。对于 L2subscript𝐿2L_{2} 花费 c(x,y)=1/2xy2𝑐𝑥𝑦12superscriptnorm𝑥𝑦2c(x,y)=1/2\|x-y\|^{2} ,拉盖尔单元的边界是直线的,这显着简化了 Hessian 系数(35)的计算。此外,这使得可以使用高效的算法来计算拉盖尔图[ Bow81,Wat81]。它们的实现可在几个编程库中找到,如 GEOGRAM 11
et CGAL121212http://www.cgal.org 和 CGAL 12 . Then one needs to compute the intersection between each Laguerre cell and the mesh that supports the density u𝑢u. This can be done with specialized algorithms [Lév15], also available in GEOGRAM.
。然后需要计算每个拉盖尔单元与支持密度 u𝑢u 的网格之间的交集。这可以通过专门的算法[Lév15]完成,也可在 GEOGRAM 中找到。

Line (4) finds the Newton step p𝑝p by solving a linear system. We use the Conjugate Gradient algorithm [HS52] with the Jacobi preconditioner. In our empirical experiments below, we stopped the conjugate iterations as soon as 2Fp+F/F<103normsuperscript2𝐹𝑝𝐹norm𝐹superscript103\|\nabla^{2}Fp+\nabla F\|/\|\nabla F\|<10^{-3}.
第(4)行通过解线性系统找到牛顿步 p𝑝p 。我们使用带有雅可比预处理器的共轭梯度算法[HS52]。在我们的经验实验中,我们在共轭迭代次数达到 2Fp+F/F<103normsuperscript2𝐹𝑝𝐹norm𝐹superscript103\|\nabla^{2}Fp+\nabla F\|/\|\nabla F\|<10^{-3} 时停止。

Line (5) determines the descent parameter α𝛼\alpha. A result due to Mérigot and Kitagawa [KMT16] ensures the convergence of the Newton algorithm if the measure of the smallest Laguerre cell remains larger than a certain threshold (that is, half the smallest prescription νjsubscript𝜈𝑗\nu_{j}). There is also a condition on the norm of the gradient Fnorm𝐹\|\nabla F\| that we do not repeat here (the reader is referred to Mérigot and Kitagawa’s original article for more details). In our implementation, starting with α=1𝛼1\alpha=1, we iteratively divide α𝛼\alpha by two until both conditions are satisfied.
第(5)行确定了下降参数 α𝛼\alpha 。Mérigot 和 Kitagawa [KMT16]的结果确保了如果最小拉盖尔单元的度量保持高于一定阈值(即最小规定 νjsubscript𝜈𝑗\nu_{j} 的一半),牛顿算法的收敛。对于梯度 Fnorm𝐹\|\nabla F\| 的范数也有一个条件我们这里不再重复(读者可参考 Mérigot 和 Kitagawa 原始文章以获取更多细节)。在我们的实现中,从 α=1𝛼1\alpha=1 开始,我们将 α𝛼\alpha 迭代除以二直至满足这两个条件。

Let us now make one step backwards and think about the original definition of Monge’s problem (M). We wish to stress that the initial constraint (local mass conservation) that characterizes transport maps was terribly difficult. It is remarkable that after several rewrites (Kantorovich relaxation, duality, c-convexity), the final problem becomes as simple as optimizing a regular (C2superscript𝐶2C^{2}) concave function, for which computing the gradient and Hessian is easy in the semi-discrete case and boils down to evaluating volumes and areas in a Laguerre diagram. We wish also to stress that the computational algorithm did not require to make any approximation or discretization. The discrete, computer version is a particular setting of the general theory, that fully describes not only transport between smooth objects (functions), but also transport between less regular objects, such as pointsets and triangulated meshes. This is made possible by the rich mathematical vocabulary (measures) on which optimal transport theory acts. Thus, the computational algorithm is an elegant, direct verbatim translation of the theory into a computer program.
让我们现在往回退一步,思考一下蒙日问题(M)的原始定义。我们希望强调,表征传输映射的初始约束(局部质量守恒)是非常困难的。令人惊奇的是,在经过几次改写(康托洛维奇放松,对偶,c-凸性)之后,最终问题变得像优化正则( C2superscript𝐶2C^{2} )凹函数那么简单,其中在半离散情况下计算梯度和 Hessian 是容易的,而且归结为在拉盖尔图中评估体积和面积。我们还希望强调的是,计算算法不需要做任何近似或离散化。离散的、计算机版本是一般理论的特定设置,不仅完全描述了光滑对象(函数)之间的传输,也描述了不太规则的对象之间的传输,如点集和三角形网格。这得益于在最优传输理论上起作用的丰富数学词汇(度量)。因此,计算算法是将理论直接优雅地翻译成计算机程序的直接逐字翻译。

We now show some computational results and list possible applications of this algorithm.
我们现在展示一些计算结果,并列出这个算法的可能应用。

8 Results, examples and applications of L2subscript𝐿2L_{2} semi-discrete transport
8 结果,例子和 L2subscript𝐿2L_{2} 半离散传输的应用。

Refer to caption

Figure 12: A: transport between a uniform density and a random pointset; B: transport between a varying density and the same pointset; C: intersections between meshes used to compute the coefficients; D: transport between a measure supported by a surface and a 3D pointset.
图 12:A:均匀密度和随机点集之间的传输;B:变化密度和相同点集之间的传输;C:用于计算系数的网格之间的交点;D:支持表面的测量和三维点集之间的传输。

Refer to caption

Figure 13: Interpolation of 3D volumes using optimal transport.
图 13:使用最优传输进行 3D 体积的插值。

Refer to caption

Figure 14: Computing the transport between objects of different dimension, from a sphere to a cube. The sphere is sampled with 10 million points. The cross-section reveals the formation of a singularity that has some similarities with the medial axis of the cube.
图 14:计算不同维度物体之间的输运,从球到立方体。球体采样了 1000 万个点。横截面显示了一个类似于立方体中轴的奇异性的形成。

Refer to caption


Figure 15: An application of optimal transport to fluid simulation: numerical simulation of the Taylor-Rayleigh instability using the Gallouet-Mérigot scheme.
图 15:将最优输运应用于流体模拟:使用 Gallouet-Mérigot 方案进行的泰勒-雷利不稳定性数值模拟。

Refer to caption

Figure 16: Numerical simulation of an incompressible bi-phasic flow in a bottle.
图 16:在瓶子中不可压缩两相流动的数值模拟。

Refer to caption

Refer to caption

Figure 17: Top: Numerical simulation of the Taylor-Rayleigh instability using a 3D version of the Gallouet-Mérigot scheme, with a cross-section that reveals the internal structure of the vortices. Bottom: a closeup that shows the interface between the two fluids, represented by the Laguerre facets that bound two Laguerre cells of different fluid elements.
图 17:顶部:使用 Gallouet-Mérigot 方案的 3D 版本对泰勒-瑞利不稳定性进行数值模拟,通过一个横截面展示了涡旋的内部结构。底部:放大图显示了两种流体之间的界面,由拉盖尔面构成,这些面限定了不同流体元素的两个拉盖尔单元。

Figure 12 shows some examples of transport in 2D, between a uniform density and a pointset (A). Each Laguerre cell has the same area. Then we consider the same pointset, but this time with a density u(x,y)=10(1+sin(2πx)sin(2πy))𝑢𝑥𝑦1012𝜋𝑥2𝜋𝑦u(x,y)=10(1+\sin(2\pi x)\sin(2\pi y)) (image B). We obtain a completely different Laguerre diagram. Each cell of this diagram has the same value for the integrated density u𝑢u. (C): the coefficients of the gradient and the Hessian of F𝐹F computed in the previous section involve integrals of the density u𝑢u over the Laguerre cells and over their boundaries. The density u𝑢u is supported by a triangulated mesh, and linearly interpolated on the triangles. The integrals of u𝑢u over the Laguerre cells and their boundaries are evaluated in closed form, by computing the intersections between the Laguerre cells and the triangles of the mesh that supports u𝑢u. (D): the same algorithm can compute the optimal transport between a measure supported by a 3D surface and a pointset in 3D.
图 12 显示了在二维空间中,在均匀密度和点集 (A) 之间的一些运输示例。每个拉盖尔细胞的面积相同。然后我们考虑相同的点集,但是这次密度为 u(x,y)=10(1+sin(2πx)sin(2πy))𝑢𝑥𝑦1012𝜋𝑥2𝜋𝑦u(x,y)=10(1+\sin(2\pi x)\sin(2\pi y)) (图像 B)。我们获得了完全不同的拉盖尔图。这个图的每个细胞对于综合密度 u𝑢u 有相同的值。(C):在上一节中计算的梯度和 F𝐹F 的 Hessian 系数涉及拉盖尔细胞和边界上密度 u𝑢u 的积分。密度 u𝑢u 由一个三角化网格支持,并且在三角形上进行线性插值。拉盖尔细胞和其边界上 u𝑢u 的积分通过计算拉盖尔细胞与支持 u𝑢u 的网格三角形之间的交集来以闭式形式计算。(D):相同的算法可以计算由 3D 表面支持的测度与 3D 中的点集之间的最优运输。

Figure 13 shows two examples of volume deformation by optimal transport. The same algorithm is used, but this time with 3D Laguerre diagrams, and by computing intersections between the Laguerre cells and a tetrahedral mesh that supports the source density u𝑢u. The intermediary steps of the animation are generated by linear interpolation of the positions (Mc. Cann’s interpolation). Figure 14 demonstrates a more challenging configuration, computing transport between a sphere (surface) and a cube (volume). The sphere is approximated with 10 million Dirac masses. As can be seen, transport has a singularity that resembles the medial axis of the cube (rightmost figure).
图 13 显示了两个通过最优输运进行体积变形的示例。 使用相同的算法,但这次使用了 3D 拉盖尔图,并通过计算拉盖尔单元与支持源密度 u𝑢u 的四面体网格之间的交点。 动画的中间步骤是通过位置的线性插值(Mc. Cann 的插值)生成的。 图 14 展示了一个更具挑战性的配置,计算了球(表面)和立方体(体积)之间的输运。 球用 1000 万点质量近似。 可以看出,输运具有类似于立方体中轴的奇点(最右图)。

Figure 15 demonstrates an application in computational fluid dynamics. A heavy incompressible fluid (in red) is placed on top of a lighter incompressible fluid (in blue). Both fluids tend to exchange their positions, due to gravity, but incompressibility is an obstacle to their movement. As a result, vortices are created, and they become faster and faster (Taylor-Rayleigh instability). The numerical simulation method that we used here [GM17] directly computes the trajectories of particles (Lagrangian coordinates) while taking into account the incompressibility constraint, which is in general unnatural in Lagrangian coordinates. By providing a means of controlling the volumes of the Laguerre cells, optimal transport appears here as an easy way of enforcing incompressibility. Using some efficient geometric algorithms for the Laguerre diagram and its intersections [Lév15], the Gallouet-Mérigot numerical scheme for incompressible Euler scheme can be also applied in 3D to simulate the behavior of non-mixing incompressible fluids (Figure 16). The 3D version of the Taylor-Rayleigh instability is shown in Figure 17, with 10 million Laguerre cells. More complicated vortices are generated (shown here in cross-section).
图 15 展示了在计算流体力学中的应用。重的不可压缩流体(红色)直接放在轻的不可压缩流体(蓝色)上方。由于重力,这两种流体往往会交换位置,但不可压缩性是它们移动的障碍。结果,涡旋被形成,并且变得越来越快(泰勒-雷利不稳定性)。我们在这里使用的数值模拟方法[GM17]直接计算粒子的轨迹(拉格朗日坐标),同时考虑不可压缩性约束,这一般在拉格朗日坐标中是不自然的。通过提供控制拉盖尔细胞体积的方法,最优输运在这里作为执行不可压缩性的一种简单方式。利用一些有效的几何算法来处理拉盖尔图和其交点[Lév15],可以将 Gallouet-Mérigot 数值方案用于三维中模拟不混合不可压缩流体的行为(图 16)。图 17 展示了三维版本的泰勒-雷利不稳定性,包括 1000 万个拉盖尔细胞。 更复杂的涡流会生成(如交叉部分所示)。

The applications in computer animation and computational physics seem to be promising, because the semi-discrete algorithm behaves very well in practice. It is now reasonable to design simulation algorithms that solve a transport problem in each timestep (as our early computational fluid dynamics of the previous paragraph do). With our optimized implementation that uses a multicore processor for the geometric part of the algorithm and a GPU for solving the linear systems, the semi-discrete algorithm takes no more than a few tens of seconds to solve a problem with 10 millions unknown. This opens the door to numerical solution mechanisms for difficult problems. For instance, in astrophysics, the Early Universe Reconstruction problem [BFH+03] consists in going “backward in time” from observation data on the repartition of galaxy clusters, to “play the Big-Bang movie” backward. Our numerical experiments tend to confirm Brenier’s point of view that he expressed in the 2000’s, that semi-discrete optimal transport can be an efficient way of solving this problem.
计算机动画和计算物理中的应用似乎很有前途,因为半离散算法在实践中表现得非常好。现在设计解决每个时间步长中的传输问题的仿真算法是合理的(就像我们前一段落中的早期计算流体力学所做的那样)。通过我们优化的实现,该实现使用多核处理器处理算法的几何部分,使用 GPU 解决线性系统,半离散算法仅需几十秒即可解决涉及 1000 万未知数的问题。这为解决困难问题打开了数值解机制的大门。例如,在天体物理学中,早期宇宙重建问题[BFH + 03]包括从对星系团分布的观测数据“时光倒流”,以“倒放大爆炸电影”。我们的数值实验倾向于证实布雷尼尔在 2000 年代表达的观点,即半离散最优传输可以是解决这个问题的有效方法。

To ensure that our results are reproducible, the source-code associated with the numerical solution mechanism used in all these experiments is available in the EXPLORAGRAM component of the GEOGRAM programming library131313http://alice.loria.fr/software/geogram/doc/html/index.html
为了确保我们的结果具有重现性,所有这些实验中使用的数字解决方案机制相关的源代码可以在 GEOGRAM 编程库的 EXPLORAGRAM 组件中找到 13
.

Acknowledgments 致谢

This research is supported by EXPLORAGRAM (Inria Exploratory Research Grant). The authors wish to thank Quentin Mérigot, Yann Brenier, Jean-David Benamou, Nicolas Bonneel and Lénaïc Chizat for many discussions.
本研究得到 EXPLORAGRAM(Inria 探索性研究资助)的支持。作者感谢 Quentin Mérigot,Yann Brenier,Jean-David Benamou,Nicolas Bonneel 和 Lénaïc Chizat 的许多讨论。

References 参考文献

  • [AG13] Luigi Ambrosio and Nicolas Gigli. A user’s guide to optimal transport. Modelling and Optimisation of Flows on Networks, Lecture Notes in Mathematics, pages 1–155, 2013.
    Luigi Ambrosio 和 Nicolas Gigli。最优输运的用户指南。网络流建模与优化,数学讲义,2013 年,1-155 页。
  • [AHA92] Franz Aurenhammer, Friedrich Hoffmann, and Boris Aronov. Minkowski-type theorems and least-squares partitioning. In Symposium on Computational Geometry, pages 350–357, 1992.
    Franz Aurenhammer、Friedrich Hoffmann 和 Boris Aronov。闵可夫斯基类型定理和最小二乘分区。在计算几何学研讨会中,第 350-357 页,1992 年。
  • [Ale05] A. D. Alexandrov. Intrinsic geometry of convex surfaces (translation of the 1948 Russian original). CRC Press, 2005.
    A. D. Alexandrov. 凸曲面的固有几何(1948 年俄文原著的翻译)。CRC 出版社,2005 年。
  • [Aur87] Franz Aurenhammer. Power diagrams: Properties, algorithms and applications. SIAM J. Comput., 16(1):78–96, 1987.
    Franz Aurenhammer. 力图:性质、算法和应用。SIAM 计算杂志,16(1):78-96,1987 年。
  • [BB00] Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the monge-kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, 2000.
    Jean-David Benamou 和 Yann Brenier。一个计算流体力学解决蒙日 (monge-kantorovich) 质量转移问题的问题解. 数值数学, 84(3):375–393, 2000.
  • [BCMO14] Jean-David Benamou, Guillaume Carlier, Quentin Mérigot, and Edouard Oudet. Discretization of functionals involving the monge-ampère operator. arXiv, August 2014. [math.NA] http://arxiv.org/abs/1408.4336.
    Jean-David Benamou,Guillaume Carlier,Quentin Mérigot 和 Edouard Oudet。涉及蒙日安培算子的泛函离散化。arXiv,2014 年 8 月。[math.NA] http://arxiv.org/abs/1408.4336.
  • [BDM09] Rainer Burkard, Mauro Dell’Amico, and Silvano Martello. Assignment Problems. SIAM, 2009.
  • [BFH+03] Y. Brenier, U. Frisch, M. Henon, G. Loeper, S. Matarrese, R. Mohayaee, and A. Sobolevskii. Reconstruction of the early universe as a convex optimization problem. arXiv, September 2003. arXiv:astro-ph/0304214v3.
  • [Bow81] Adrian Bowyer. Computing dirichlet tessellations. Comput. J., 24(2):162–166, 1981.
  • [Bre91] Yann Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, 44:375–417, 1991.
  • [BvdPPH11] Nicolas Bonneel, Michiel van de Panne, Sylvain Paris, and Wolfgang Heidrich. Displacement interpolation using lagrangian mass transport. ACM Trans. Graph., 30(6):158, 2011.
  • [Caf03] Luis Caffarelli. The monge-ampère equation and optimal transportation, an elementary review. Optimal transportation and applications (Martina Franca, 2001), Lecture Notes in Mathematics, pages 1–10, 2003.
  • [Cut13] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States., pages 2292–2300, 2013.
  • [dGWH+15] Fernando de Goes, Corentin Wallez, Jin Huang, Dmitry Pavlov, and Mathieu Desbrun. Power particles: an incompressible fluid solver based on power diagrams. ACM Trans. Graph., 34(4):50:1–50:11, 2015.
  • [GM96] Wilfrid Gangbo and Robert J. McCann. The geometry of optimal transportation. Acta Math., 177(2):113–161, 1996.
  • [GM17] Thomas O. Gallouët and Quentin Mérigot. A lagrangian scheme à la Brenier for the incompressible euler equations. Foundations of Computational Mathematics, May 2017.
  • [HS52] Magnus R. Hestenes and Eduard Stiefel. Methods of Conjugate Gradients for Solving Linear Systems. Journal of Research of the National Bureau of Standards, 49(6):409–436, December 1952.
  • [KMT16] Jun Kitagawa, Quentin Mérigot, and Boris Thibert. A newton algorithm for semi-discrete optimal transport. CoRR, abs/1603.05579, 2016.
  • [Leo13] Christian Leonard. A survey of the schrödinger problem and some of its connections with optimal transport. arXiv, August 2013. [math.PR] http://arxiv.org/abs/1308.0215.
  • [Lév15] Bruno Lévy. A numerical algorithm for L2subscript𝐿2L_{2} semi-discrete optimal transport in 3d. ESAIM M2AN (Mathematical Modeling and Analysis), 2015.
  • [McC95] Robert J. McCann. Existence and uniqueness of monotone measure-preserving maps. Duke Mathematical Journal, 80(2):309–323, 1995.
  • [Mém11] Facundo Mémoli. Gromov-wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011.
  • [Mér11] Quentin Mérigot. A multiscale approach to optimal transport. Comput. Graph. Forum, 30(5):1583–1592, 2011.
  • [MMT17] Quentin Mérigot, Jocelyn Meyron, and Boris Thibert. Light in power: A general and parameter-free algorithm for caustic design. CoRR, abs/1708.04820, 2017.
  • [Mon84] Gaspard Monge. Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences (1781), pages 666–704, 1784.
  • [NW06] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 2nd edition, 2006.
  • [PPO14] Nicolas Papadakis, Gabriel Peyré, and Edouard Oudet. Optimal Transport with Proximal Splitting. SIAM Journal on Imaging Sciences, 7(1):212–238, January 2014.
  • [San14] Filippo Santambrogio. Introduction to Optimal Transport Theory. In Optimal Transport, Theory and Applications, August 2014. [math.PR] http://arxiv.org/abs/1009.3856.
  • [San15] Filippo Santambrogio. Optimal transport for applied mathematicians, volume 87 of Progress in Nonlinear Differential Equations and their Applications. Birkhäuser/Springer, Cham, 2015. Calculus of variations, PDEs, and modeling.
  • [STTP14] Yuliy Schwartzburg, Romain Testuz, Andrea Tagliasacchi, and Mark Pauly. High-contrast computational caustic design. ACM Trans. Graph., 33(4):74:1–74:11, 2014.
  • [Tao11] Terence Tao. An Introduction to Measure Theory. American Mathematical Society, 2011.
  • [Vil09] Cédric Villani. Optimal transport : old and new. Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009.
  • [Wat81] David Watson. Computing the n-dimensional delaunay tessellation with application to voronoi polytopes. Comput. J., 24(2):167–172, 1981.