
Denoising Diffusion Bridge Models

Linqi Zhou    Aaron Lou    Samar Khanna    Stefano Ermon

Department of Computer Science, Stanford University
{linqizhou, aaronlou, samar.khanna, ermon}@stanford.edu

Abstract

Diffusion models are powerful generative models that map noise to data using stochastic processes. However, for many applications such as image editing, the model input comes from a distribution that is not random noise. As such, diffusion models must rely on cumbersome methods like guidance or projected sampling to incorporate this information in the generative process. In our work, we propose Denoising Diffusion Bridge Models (DDBMs), a natural alternative to this paradigm based on diffusion bridges, a family of processes that interpolate between two paired distributions given as endpoints. Our method learns the score of the diffusion bridge from data and maps from one endpoint distribution to the other by solving a (stochastic) differential equation based on the learned score. Our method naturally unifies several classes of generative models, such as score-based diffusion models and OT-Flow-Matching, allowing us to adapt existing design and architectural choices to our more general problem. Empirically, we apply DDBMs to challenging image datasets in both pixel and latent space. On standard image translation problems, DDBMs achieve significant improvement over baseline methods, and, when we reduce the problem to image generation by setting the source distribution to random noise, DDBMs achieve comparable FID scores to state-of-the-art methods despite being built for a more general task.

1 Introduction

Diffusion models are a powerful class of generative models which learn to reverse a diffusion process mapping data to noise (Sohl-Dickstein et al., 2015; Song and Ermon, 2019; Ho et al., 2020; Song et al., 2020b). For image generation tasks, they have surpassed GAN-based methods (Goodfellow et al., 2014) and achieved a new state-of-the-art for perceptual quality (Dhariwal and Nichol, 2021). Furthermore, these capabilities have spurred the development of modern text-to-image generative AI systems (Ramesh et al., 2022).

Despite these impressive results, standard diffusion models are ill-suited for other tasks. In particular, the diffusion framework assumes that the prior distribution is random noise, which makes it difficult to adapt to tasks such as image translation, where the goal is to map between pairs of images. As such, one resorts to cumbersome techniques, such as conditioning the model (Ho and Salimans, 2022; Saharia et al., 2021) or manually altering the sampling procedure (Meng et al., 2022; Song et al., 2020b). These methods are not theoretically principled and map in one direction (typically from corrupted to clean images), losing the cycle consistency condition (Zhu et al., 2017).

Instead, we consider methods which directly model a transport between two arbitrary probability distributions. This framework naturally captures the desiderata of image translation, but existing methods fall short empirically. For instance, ODE-based flow-matching methods (Lipman et al., 2023; Albergo and Vanden-Eijnden, 2023; Liu et al., 2022a), which learn a deterministic path between two arbitrary probability distributions, have mainly been applied to image generation problems and have not been investigated for image translation. Furthermore, on image generation, ODE methods have not achieved the same empirical success as diffusion models. Schrödinger Bridge models (De Bortoli et al., 2021) are another class of model that instead learns an entropic optimal transport between two probability distributions. However, these rely on expensive iterative approximation methods and have also found limited empirical use. More recent extensions, including Diffusion Bridge Matching (Shi et al., 2023; Peluchetti, 2023), similarly require expensive iterative calculations.

In our work, we seek a scalable alternative that unifies diffusion-based unconditional generation methods and transport-based distribution translation methods, and we name our general framework Denoising Diffusion Bridge Models (DDBMs). We consider a reverse-time perspective of diffusion bridges, a diffusion process conditioned on given endpoints, and use this perspective to establish a general framework for distribution translation. We then note that this framework subsumes existing generative modeling paradigms such as score matching diffusion models (Song et al., 2020b) and flow matching optimal transport paths (Albergo and Vanden-Eijnden, 2023; Lipman et al., 2023; Liu et al., 2022a). This allows us to reapply many design choices to our more general task. In particular, we use this to generalize and improve the architecture pre-conditioning, noise schedule, and model sampler, minimizing input sensitivity and stabilizing performance. We then apply DDBMs to high-dimensional images using both pixel and latent space based models. For standard image translation tasks, we achieve better image quality (as measured by FID (Heusel et al., 2017)) and significantly better translation faithfulness (as measured by LPIPS (Zhang et al., 2018) and MSE). Furthermore, when we reduce our problem to image generation, we match standard diffusion model performance.

2 Preliminaries

Recent advances in generative models have relied on the classical notion of transporting a data distribution $q_{\rm{data}}({\mathbf{x}})$ gradually to a prior distribution $p_{\rm{prior}}({\mathbf{x}})$ (Villani, 2008). By learning to reverse this process, one can sample from the prior and generate realistic samples.

2.1 Generative Modeling with Diffusion Models

Diffusion process. We are interested in modeling the distribution $q_{\rm{data}}({\mathbf{x}})$ for ${\mathbf{x}}\in\mathbb{R}^{d}$. We do this by constructing a diffusion process, represented by a set of time-indexed variables $\{{\mathbf{x}}_t\}_{t=0}^{T}$ such that ${\mathbf{x}}_0 \sim p_0({\mathbf{x}}) := q_{\rm{data}}({\mathbf{x}})$ and ${\mathbf{x}}_T \sim p_T({\mathbf{x}}) := p_{\rm{prior}}({\mathbf{x}})$. Here $q_{\rm{data}}({\mathbf{x}})$ is the initial "data" distribution and $p_{\rm{prior}}({\mathbf{x}})$ is the final "prior" distribution. The process can be modeled as the solution to the following SDE

$$d{\mathbf{x}}_t = {\mathbf{f}}({\mathbf{x}}_t, t)\,dt + g(t)\,d{\mathbf{w}}_t \tag{1}$$

where ${\mathbf{f}}:\mathbb{R}^d\times[0,T]\rightarrow\mathbb{R}^d$ is a vector-valued drift function, $g:[0,T]\rightarrow\mathbb{R}$ is a scalar-valued diffusion coefficient, and ${\mathbf{w}}_t$ is a Wiener process. Following this diffusion process forward in time constrains the final variable ${\mathbf{x}}_T$ to follow the distribution $p_{\rm{prior}}({\mathbf{x}})$. The reverse of this process is given by

$$d{\mathbf{x}}_t = \big[{\mathbf{f}}({\mathbf{x}}_t, t) - g(t)^2\,\nabla_{{\mathbf{x}}_t}\log p({\mathbf{x}}_t)\big]\,dt + g(t)\,d\hat{{\mathbf{w}}}_t \tag{2}$$

where $p({\mathbf{x}}_t) := p({\mathbf{x}}_t, t)$ is the marginal distribution of ${\mathbf{x}}_t$ at time $t$. Furthermore, one can derive an equivalent deterministic process called the probability flow ODE (Song et al., 2020b), which has the same marginal distributions:

$$d{\mathbf{x}}_t = \Big[{\mathbf{f}}({\mathbf{x}}_t, t) - \frac{1}{2}g(t)^2\,\nabla_{{\mathbf{x}}_t}\log p({\mathbf{x}}_t)\Big]\,dt \tag{3}$$

In particular, one can draw ${\mathbf{x}}_T \sim p_{\rm{prior}}({\mathbf{x}})$ and sample from $q_{\rm{data}}$ by solving either the above reverse SDE or the probability flow ODE backward in time.
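For concreteness, here is a minimal sketch of sampling via the reverse SDE (2) with Euler-Maruyama (our illustration, not an official implementation; `score_fn` and `sigma` are hypothetical stand-ins for a trained score network and a noise schedule):

```python
import torch

def reverse_sde_sample(score_fn, x_T, sigma, T=1.0, N=1000):
    """Euler-Maruyama integration of the reverse SDE (2), backward in time.

    Assumes a VE schedule (f = 0, g(t)^2 = d(sigma_t^2)/dt), with g^2
    approximated by a finite difference of the schedule callable `sigma`.
    `score_fn(x, t)` stands in for a trained approximation of the score
    grad_{x_t} log p(x_t).
    """
    dt = T / N
    x = x_T
    for i in range(N, 0, -1):
        t = i * dt
        g2 = (sigma(t) ** 2 - sigma(t - dt) ** 2) / dt  # g(t)^2 ~ d sigma_t^2 / dt
        # backward step: x_{t-dt} = x_t + g^2 * score * dt + g * sqrt(dt) * eps
        x = x + g2 * score_fn(x, t) * dt + (g2 * dt) ** 0.5 * torch.randn_like(x)
    return x
```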

Denoising score-matching. The score, $\nabla_{{\mathbf{x}}_t}\log p({\mathbf{x}}_t)$, can be learned by the score-matching loss

$$\mathcal{L}(\theta) = \mathbb{E}_{{\mathbf{x}}_t \sim p({\mathbf{x}}_t \mid {\mathbf{x}}_0),\, {\mathbf{x}}_0 \sim q_{\rm{data}}({\mathbf{x}}),\, t \sim \mathcal{U}(0, T)}\Big[\big\|{\mathbf{s}}_\theta({\mathbf{x}}_t, t) - \nabla_{{\mathbf{x}}_t}\log p({\mathbf{x}}_t \mid {\mathbf{x}}_0)\big\|^2\Big] \tag{4}$$

such that the minimizer ${\mathbf{s}}^*_\theta({\mathbf{x}}_t, t)$ of the above loss approximates the true score. Crucially, the above loss is tractable because the transition kernel $p({\mathbf{x}}_t \mid {\mathbf{x}}_0)$, which depends on specific choices of drift and diffusion functions, is designed to be Gaussian: ${\mathbf{x}}_t = \alpha_t {\mathbf{x}}_0 + \sigma_t {\bm{\epsilon}}$, where $\alpha_t$ and $\sigma_t$ are functions of time and ${\bm{\epsilon}} \sim \mathcal{N}({\bm{0}}, {\bm{I}})$. It is also common to view the diffusion process in terms of the signal-to-noise ratio (SNR) of ${\mathbf{x}}_t$, defined as $\alpha_t^2/\sigma_t^2$.
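Since the kernel is Gaussian, the target in Eq. (4) has the familiar closed form $-{\bm{\epsilon}}/\sigma_t$; a minimal sketch, assuming elementwise schedule callables `alpha` and `sigma` (hypothetical names):

```python
import torch

def dsm_loss(score_net, x0, alpha, sigma, T=1.0, eps=1e-5):
    """Denoising score-matching loss (4) with a Gaussian kernel
    x_t = alpha_t x_0 + sigma_t eps, for which
    grad log p(x_t | x_0) = -(x_t - alpha_t x_0) / sigma_t^2 = -eps / sigma_t."""
    t = torch.rand(x0.shape[0], device=x0.device) * (T - eps) + eps
    a_t = alpha(t).view(-1, 1, 1, 1)
    s_t = sigma(t).view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_t * x0 + s_t * noise
    target = -noise / s_t
    return ((score_net(x_t, t) - target) ** 2).mean()
```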

2.2 Diffusion Process with Fixed Endpoints

Diffusion models are limited because they can only transport complex data distributions to a standard Gaussian distribution and cannot be naturally adapted to translating between two arbitrary distributions, e.g. in the case of image-to-image translation. Luckily, classical results have shown that one can condition a diffusion process on a fixed known endpoint via the famous Doob's $h$-transform:

Stochastic bridges via the $h$-transform. Specifically, the diffusion process defined in Eq. (1) can be driven to arrive at a particular point of interest $y \in \mathbb{R}^d$ almost surely via Doob's $h$-transform (Doob and Doob, 1984; Rogers and Williams, 2000),

$$d{\mathbf{x}}_t = {\mathbf{f}}({\mathbf{x}}_t, t)\,dt + g(t)^2\,{\mathbf{h}}({\mathbf{x}}_t, t, y, T)\,dt + g(t)\,d{\mathbf{w}}_t, \quad {\mathbf{x}}_0 \sim q_{\rm{data}}({\mathbf{x}}), \quad {\mathbf{x}}_T = y \tag{5}$$

where ${\mathbf{h}}(x, t, y, T) = \nabla_{{\mathbf{x}}_t}\log p({\mathbf{x}}_T \mid {\mathbf{x}}_t)\big|_{{\mathbf{x}}_t = x,\, {\mathbf{x}}_T = y}$ is the gradient of the log transition kernel from $t$ to $T$ generated by the original SDE, evaluated at the points ${\mathbf{x}}_t = x$ and ${\mathbf{x}}_T = y$, and each ${\mathbf{x}}_t$ now explicitly depends on $y$ at time $T$. Furthermore, $p({\mathbf{x}}_T = y \mid {\mathbf{x}}_t)$ satisfies the Kolmogorov backward equation (specified in Appendix A). For specific drift and diffusion choices, e.g. ${\mathbf{f}}({\mathbf{x}}_t, t) = {\bm{0}}$, ${\mathbf{h}}$ is tractable due to the tractable (Gaussian) transition kernel of the underlying diffusion process.

When the initial point ${\mathbf{x}}_0$ is fixed, the process is often called a diffusion bridge (Särkkä and Solin, 2019; Heng et al., 2021; Delyon and Hu, 2006; Schauer et al., 2017; Peluchetti; Liu et al., 2022b), and its ability to connect any given ${\mathbf{x}}_0$ to a given value of ${\mathbf{x}}_T$ is promising for image-to-image translation. Furthermore, the transition kernel may be tractable, which serves as further motivation.
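As a toy illustration (ours, not from the paper), for the driftless unit-variance case ${\mathbf{f}} = {\bm{0}}$, $g = 1$ (so $\sigma_t^2 = t$), the $h$-function is $(y - {\mathbf{x}}_t)/(T - t)$ and simulating Eq. (5) produces a Brownian bridge pinned at both ends:

```python
import torch

def simulate_bridge(x0, y, T=1.0, N=500):
    """Forward simulation of the h-transformed SDE (5) for f = 0, g = 1.
    Then h(x, t, y, T) = (y - x) / (T - t) and the process is a Brownian
    bridge from x0 at t = 0 to y at t = T."""
    dt = T / N
    x = x0.clone()
    for i in range(N - 1):  # stop one step before T, where h blows up
        t = i * dt
        x = x + (y - x) / (T - t) * dt + dt ** 0.5 * torch.randn_like(x)
    return x  # approaches y as t -> T
```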

3 Denoising Diffusion Bridge Models

Figure 1: A schematic for Denoising Diffusion Bridge Models. DDBMs use a diffusion process guided by a drift adjustment (in blue) toward an endpoint ${\mathbf{x}}_T = y$. They learn to reverse such a bridge process by matching the denoising bridge score (in orange), which allows one to reverse from ${\mathbf{x}}_T$ to ${\mathbf{x}}_0$ for any ${\mathbf{x}}_T = {\mathbf{y}} \sim q_{\rm{data}}({\mathbf{y}})$. The forward SDE process shown on the top is unidirectional, while the probability flow ODE shown at the bottom is deterministic and bidirectional. White nodes are stochastic; grey nodes are deterministic.

Assuming that the endpoints of a diffusion bridge both exist in $\mathbb{R}^d$ and come from an arbitrary and unknown joint distribution, i.e. $({\mathbf{x}}_0, {\mathbf{x}}_T) = ({\mathbf{x}}, {\mathbf{y}}) \sim q_{\rm{data}}({\mathbf{x}}, {\mathbf{y}})$, we wish to devise a process that learns to approximately sample from $q_{\rm{data}}({\mathbf{x}} \mid {\mathbf{y}})$ by reversing the diffusion bridge with boundary distribution $q_{\rm{data}}({\mathbf{x}}, {\mathbf{y}})$, given a training set of paired samples drawn from $q_{\rm{data}}({\mathbf{x}}, {\mathbf{y}})$.

3.1 Time-Reversed SDE and Probability Flow ODE

Inspired by diffusion bridges, we construct the stochastic process $\{{\mathbf{x}}_t\}_{t=0}^T$ with marginal distribution $q({\mathbf{x}}_t)$ such that $q({\mathbf{x}}_0, {\mathbf{x}}_T)$ approximates $q_{\rm{data}}({\mathbf{x}}_0, {\mathbf{x}}_T)$. Reversing the process amounts to sampling from $q({\mathbf{x}}_t \mid {\mathbf{x}}_T)$. Note that the distribution $q(\cdot)$ differs from $p(\cdot)$, i.e. the diffusion marginal distribution, in that the endpoint distribution is now $q_{\rm{data}}({\mathbf{x}}_0, {\mathbf{x}}_T) = q_{\rm{data}}({\mathbf{x}}, {\mathbf{y}})$ instead of the distribution of a diffusion, $p({\mathbf{x}}_0, {\mathbf{x}}_T) = p({\mathbf{x}}_T \mid {\mathbf{x}}_0)\,q_{\rm{data}}({\mathbf{x}}_0)$, which defines a Gaussian ${\mathbf{x}}_T$ given ${\mathbf{x}}_0$. We can construct the time-reversed SDE / probability flow ODE of $q({\mathbf{x}}_t \mid {\mathbf{x}}_T)$ via the following theorem.

Theorem 1.

The evolution of the conditional probability $q({\mathbf{x}}_t \mid {\mathbf{x}}_T)$ has a time-reversed SDE of the form

$$d{\mathbf{x}}_t = \Big[{\mathbf{f}}({\mathbf{x}}_t, t) - g^2(t)\big({\mathbf{s}}({\mathbf{x}}_t, t, y, T) - {\mathbf{h}}({\mathbf{x}}_t, t, y, T)\big)\Big]\,dt + g(t)\,d\hat{{\mathbf{w}}}_t, \quad {\mathbf{x}}_T = y \tag{6}$$

with an associated probability flow ODE

$$d{\mathbf{x}}_t = \Big[{\mathbf{f}}({\mathbf{x}}_t, t) - g^2(t)\Big(\frac{1}{2}{\mathbf{s}}({\mathbf{x}}_t, t, y, T) - {\mathbf{h}}({\mathbf{x}}_t, t, y, T)\Big)\Big]\,dt, \quad {\mathbf{x}}_T = y \tag{7}$$

on $t \leq T - \epsilon$ for any $\epsilon > 0$, where $\hat{{\mathbf{w}}}_t$ denotes a Wiener process, ${\mathbf{s}}(x, t, y, T) = \nabla_{{\mathbf{x}}_t}\log q({\mathbf{x}}_t \mid {\mathbf{x}}_T)\big|_{{\mathbf{x}}_t = x,\, {\mathbf{x}}_T = y}$, and ${\mathbf{h}}$ is as defined in Eq. (5).

A schematic of the bridge process is shown in Figure 1. Note that this process is defined up to $T - \epsilon$. To recover the initial distribution in the SDE case, we make the approximation ${\mathbf{x}}_{T-\epsilon} \approx y$ for some small $\epsilon$ and simulate the SDE backward in time. For the ODE case, since we need to sample from $p({\mathbf{x}}_{T-\epsilon})$, which cannot be a Dirac delta, we cannot approximate ${\mathbf{x}}_{T-\epsilon}$ with a single $y$. Instead, we first approximate ${\mathbf{x}}_{T-\epsilon'} \approx y$ where $\epsilon > \epsilon' > 0$, then take an Euler-Maruyama step to ${\mathbf{x}}_{T-\epsilon}$, after which Eq. (7) can be used. Toy visualizations of the VE and VP bridges are shown in Figure 2; the top and bottom rows show the respective SDE and ODE paths.

Figure 2: VE bridge (left) and VP bridge (right) with their SDE (top) and ODE (bottom) visualizations.

3.2 Marginal Distributions and Denoising Bridge Score Matching

The sampling process in Theorem 1 requires approximating the score ${\mathbf{s}}(x, t, y, T) = \nabla_{{\mathbf{x}}_t}\log q({\mathbf{x}}_t \mid {\mathbf{x}}_T)\big|_{{\mathbf{x}}_t = x,\, {\mathbf{x}}_T = y}$, where $q({\mathbf{x}}_t \mid {\mathbf{x}}_T) = \int q({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_T)\,q_{\rm{data}}({\mathbf{x}}_0 \mid {\mathbf{x}}_T)\,d{\mathbf{x}}_0$. However, as the true score is not known in closed form, we take inspiration from denoising score-matching (Song et al., 2020b) and use a neural network to approximate the true score by matching against a tractable quantity. This usually entails closed-form marginal sampling of ${\mathbf{x}}_t$ given data (${\mathbf{x}}_0$ in the case of diffusion models and $({\mathbf{x}}_0, {\mathbf{x}}_T)$ in our case); given ${\mathbf{x}}_t$, the model is trained to match against the closed-form denoising score objective. We are motivated to follow a similar approach because (1) tractable marginal sampling of ${\mathbf{x}}_t$ and (2) closed-form objectives enable a simple and scalable algorithm. We specify below how to design the marginal sampling distribution and the tractable score objective to approximate the ground-truth conditional score $\nabla_{{\mathbf{x}}_t}\log q({\mathbf{x}}_t \mid {\mathbf{x}}_T)$.

| | ${\mathbf{f}}({\mathbf{x}}_t, t)$ | $g^2(t)$ | $p({\mathbf{x}}_t \mid {\mathbf{x}}_0)$ | $\text{SNR}_t$ | $\nabla_{{\mathbf{x}}_t}\log p({\mathbf{x}}_T \mid {\mathbf{x}}_t)$ |
|---|---|---|---|---|---|
| VP | $\frac{d\log\alpha_t}{dt}{\mathbf{x}}_t$ | $\frac{d\sigma_t^2}{dt} - 2\frac{d\log\alpha_t}{dt}\sigma_t^2$ | $\mathcal{N}(\alpha_t{\mathbf{x}}_0, \sigma_t^2{\bm{I}})$ | $\alpha_t^2/\sigma_t^2$ | $\frac{(\alpha_t/\alpha_T){\mathbf{x}}_T - {\mathbf{x}}_t}{\sigma_t^2(\text{SNR}_t/\text{SNR}_T - 1)}$ |
| VE | ${\bm{0}}$ | $\frac{d\sigma_t^2}{dt}$ | $\mathcal{N}({\mathbf{x}}_0, \sigma_t^2{\bm{I}})$ | $1/\sigma_t^2$ | $\frac{{\mathbf{x}}_T - {\mathbf{x}}_t}{\sigma_T^2 - \sigma_t^2}$ |
Table 1: VP and VE instantiations of diffusion bridges.
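The closed-form $h$-functions in Table 1 are straightforward to implement; a sketch assuming user-supplied schedule callables `alpha` and `sigma` (hypothetical names, operating elementwise on tensors or floats):

```python
def h_ve(x_t, xT, t, T, sigma):
    """VE bridge h-function from Table 1: (x_T - x_t) / (sigma_T^2 - sigma_t^2)."""
    return (xT - x_t) / (sigma(T) ** 2 - sigma(t) ** 2)

def h_vp(x_t, xT, t, T, alpha, sigma):
    """VP bridge h-function from Table 1, with SNR_t = alpha_t^2 / sigma_t^2."""
    snr_t = alpha(t) ** 2 / sigma(t) ** 2
    snr_T = alpha(T) ** 2 / sigma(T) ** 2
    return ((alpha(t) / alpha(T)) * xT - x_t) / (sigma(t) ** 2 * (snr_t / snr_T - 1.0))
```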

Sampling distribution. Fortunately, for the former condition, we can design our sampling distribution $q(\cdot)$ such that $q({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_T) := p({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_T)$, where $p(\cdot)$ is the diffusion distribution pinned at both endpoints as in Eq. (5). For diffusion processes with Gaussian transition kernels, e.g. VE and VP (Song et al., 2020b), our sampling distribution is a Gaussian of the form

$$q({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_T) = \mathcal{N}(\hat{\mu}_t, \hat{\sigma}_t^2 {\bm{I}}), \quad\text{where}$$
$$\hat{\mu}_t = \frac{\text{SNR}_T}{\text{SNR}_t}\frac{\alpha_t}{\alpha_T}{\mathbf{x}}_T + \alpha_t {\mathbf{x}}_0\Big(1 - \frac{\text{SNR}_T}{\text{SNR}_t}\Big), \qquad \hat{\sigma}_t^2 = \sigma_t^2\Big(1 - \frac{\text{SNR}_T}{\text{SNR}_t}\Big) \tag{8}$$

where $\alpha_t$ and $\sigma_t$ are pre-defined signal and noise schedules and $\text{SNR}_t = \alpha_t^2/\sigma_t^2$ is the signal-to-noise ratio at time $t$. For the VE schedule, we assume $\alpha_t = 1$; derivation details are provided in Appendix A.1. Notably, the mean of this distribution is a linear interpolation between the (scaled) endpoints, and the distribution approaches a Dirac distribution when nearing either end. For concreteness, we present the bridge processes generated by both VP and VE diffusion in Table 1 and recommend choosing ${\mathbf{f}}$ and $g$ as specified therein.
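For the VE case ($\alpha_t = 1$), Eq. (8) reduces to an interpolation with ratio $\text{SNR}_T/\text{SNR}_t = \sigma_t^2/\sigma_T^2$; a minimal sampling sketch (with a hypothetical `sigma` schedule callable):

```python
import torch

def sample_ve_bridge(x0, xT, t, sigma, T=1.0):
    """Sample x_t ~ q(x_t | x_0, x_T) from Eq. (8), specialized to VE
    (alpha_t = 1), where SNR_T / SNR_t = sigma_t^2 / sigma_T^2."""
    r = sigma(t) ** 2 / sigma(T) ** 2
    mu = r * xT + (1.0 - r) * x0        # linear interpolation of the endpoints
    var = sigma(t) ** 2 * (1.0 - r)     # vanishes at both t = 0 and t = T
    return mu + var ** 0.5 * torch.randn_like(x0)
```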

Training objective. For the latter condition, diffusion bridges benefit from a setup similar to that of diffusion models, since a pre-defined signal/noise schedule gives rise to a closed-form conditional score $\nabla_{{\mathbf{x}}_t}\log q({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_T)$. We show in the following theorem that, with ${\mathbf{x}}_t \sim q({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_T)$, a neural network ${\mathbf{s}}_\theta({\mathbf{x}}_t, {\mathbf{x}}_T, t)$ that matches against this closed-form score approximates the true score.

Theorem 2 (Denoising Bridge Score Matching).

Let $({\mathbf{x}}_0, {\mathbf{x}}_T) \sim q_{\rm{data}}({\mathbf{x}}, {\mathbf{y}})$, ${\mathbf{x}}_t \sim q({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_T)$, and $t \sim p(t)$ for any non-zero time sampling distribution $p(t)$ on $[0, T]$, and let $w(t)$ be a non-zero loss weighting term of any choice. The minimizer of the following objective:

$$\mathcal{L}(\theta) = \mathbb{E}_{{\mathbf{x}}_t, {\mathbf{x}}_0, {\mathbf{x}}_T, t}\Big[w(t)\,\big\|{\mathbf{s}}_\theta({\mathbf{x}}_t, {\mathbf{x}}_T, t) - \nabla_{{\mathbf{x}}_t}\log q({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_T)\big\|^2\Big] \tag{9}$$

satisfies ${\mathbf{s}}_\theta({\mathbf{x}}_t, {\mathbf{x}}_T, t) = \nabla_{{\mathbf{x}}_t}\log q({\mathbf{x}}_t \mid {\mathbf{x}}_T)$.

In short, we establish a tractable diffusion bridge over two endpoints and, by matching the conditional score of the Gaussian bridge, we can learn the score of the new distribution $q({\mathbf{x}}_t \mid {\mathbf{x}}_T)$ that satisfies the boundary distribution $q_{\rm{data}}({\mathbf{x}}, {\mathbf{y}})$.
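A minimal training step for Eq. (9), specialized to the VE bridge with uniform $t$ and $w(t) = 1$ (a sketch under these assumptions; `score_net` and `sigma` are hypothetical):

```python
import torch

def bridge_dsm_loss(score_net, x0, xT, sigma, T=1.0, eps=1e-4):
    """Denoising bridge score-matching loss (9) for a VE bridge (alpha_t = 1).
    The conditional score of the Gaussian bridge (8) is available in
    closed form: -(x_t - mu_hat) / sigma_hat^2."""
    t = torch.rand(x0.shape[0], device=x0.device) * (T - 2 * eps) + eps
    tb = t.view(-1, 1, 1, 1)
    r = sigma(tb) ** 2 / sigma(T) ** 2        # SNR_T / SNR_t
    mu = r * xT + (1.0 - r) * x0              # mu_hat from Eq. (8)
    var = sigma(tb) ** 2 * (1.0 - r)          # sigma_hat^2 from Eq. (8)
    x_t = mu + var ** 0.5 * torch.randn_like(x0)
    target = -(x_t - mu) / var
    return ((score_net(x_t, xT, t) - target) ** 2).mean()
```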

4 Generalized Parameterization for Distribution Translation

Building the bridge process upon the diffusion process allows us to further adapt many recent advancements in score network parameterization ${\mathbf{s}}_\theta({\mathbf{x}}_t, {\mathbf{x}}_T, t)$ (Ho et al., 2020; Song et al., 2020b; Salimans and Ho, 2022; Ho et al., 2022; Karras et al., 2022), different noise schedules, and efficient ODE sampling (Song et al., 2020a; Karras et al., 2022; Lu et al., 2022a;b; Zhang and Chen, 2022) to our more general framework. Among these works, EDM (Karras et al., 2022) parameterizes the model output as $D_\theta({\mathbf{x}}_t, t) = c_{\text{skip}}(t){\mathbf{x}}_t + c_{\text{out}}(t)F_\theta(c_{\text{in}}(t){\mathbf{x}}_t, c_{\text{noise}}(t))$, where $F_\theta$ is a neural network with parameters $\theta$ that predicts ${\mathbf{x}}_0$. In a similar spirit, we adopt this pred-${\mathbf{x}}$ parameterization and additionally derive a set of scaling functions for distribution translation, which we show is a strict superset.

Score reparameterization. Following the sampling distribution proposed in Eq. (8), a pred-${\mathbf{x}}$ model can predict the bridge score via

$$\nabla_{{\mathbf{x}}_t}\log q({\mathbf{x}}_t \mid {\mathbf{x}}_T) \approx -\frac{{\mathbf{x}}_t - \Big(\frac{\text{SNR}_T}{\text{SNR}_t}\frac{\alpha_t}{\alpha_T}{\mathbf{x}}_T + \alpha_t D_\theta({\mathbf{x}}_t, {\mathbf{x}}_T, t)\big(1 - \frac{\text{SNR}_T}{\text{SNR}_t}\big)\Big)}{\sigma_t^2\big(1 - \frac{\text{SNR}_T}{\text{SNR}_t}\big)} \tag{10}$$
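Assuming a trained pred-${\mathbf{x}}$ network `D` (a placeholder name), Eq. (10) converts its output into the bridge score; here specialized to VE ($\alpha_t = 1$):

```python
def bridge_score_from_pred_x(D, x_t, xT, t, sigma, T=1.0):
    """Convert a pred-x model D(x_t, x_T, t) into the bridge score of
    Eq. (10), specialized to VE (alpha_t = 1, SNR_T/SNR_t = sigma_t^2/sigma_T^2)."""
    r = sigma(t) ** 2 / sigma(T) ** 2
    mu = r * xT + (1.0 - r) * D(x_t, xT, t)  # model-estimated bridge mean
    var = sigma(t) ** 2 * (1.0 - r)
    return -(x_t - mu) / var
```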

Scaling functions and loss weighting. Following Karras et al. (2022), let $a_t = \frac{\alpha_t}{\alpha_T}\frac{\text{SNR}_T}{\text{SNR}_t}$, $b_t = \alpha_t\big(1 - \frac{\text{SNR}_T}{\text{SNR}_t}\big)$, and $c_t = \sigma_t^2\big(1 - \frac{\text{SNR}_T}{\text{SNR}_t}\big)$. The scaling functions and weighting function $w(t)$ can be derived to be

$$c_{\text{in}}(t) = \frac{1}{\sqrt{a_t^2\sigma_T^2 + b_t^2\sigma_0^2 + 2a_tb_t\sigma_{0T} + c_t}}, \quad c_{\text{out}}(t) = \sqrt{a_t^2(\sigma_T^2\sigma_0^2 - \sigma_{0T}^2) + \sigma_0^2 c_t}\; c_{\text{in}}(t) \tag{11}$$
$$c_{\text{skip}}(t) = \big(b_t\sigma_0^2 + a_t\sigma_{0T}\big)\, c_{\text{in}}^2(t), \quad w(t) = \frac{1}{c_{\text{out}}(t)^2}, \quad c_{\text{noise}}(t) = \frac{1}{4}\log(t) \tag{12}$$

where $\sigma_0^2$, $\sigma_T^2$, and $\sigma_{0T}$ denote the variance of ${\mathbf{x}}_0$, the variance of ${\mathbf{x}}_T$, and the covariance of the two, respectively. The only additional hyperparameters compared to EDM are $\sigma_T$ and $\sigma_{0T}$, which characterize the distribution of ${\mathbf{x}}_T$ and its correlation with ${\mathbf{x}}_0$. Note that in the case of EDM, $\sigma_t = t$ and $\sigma_T^2 = \sigma_0^2 + T^2$ because ${\mathbf{x}}_T = {\mathbf{x}}_0 + T{\bm{\epsilon}}$ for some Gaussian noise ${\bm{\epsilon}}$, $\sigma_{0T} = \sigma_0^2$, and $\text{SNR}_T/\text{SNR}_t = t^2/T^2$. One can show that the scaling functions then reduce to those in EDM. We leave details to Appendix A.5.
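As a sketch of Eqs. (11)-(12), specialized to the VE bridge ($\alpha_t = 1$, $\sigma_t = t$) with dataset statistics `sigma0_sq`, `sigmaT_sq`, `sigma0T` estimated beforehand (all names ours):

```python
import math

def bridge_scalings(t, T, sigma0_sq, sigmaT_sq, sigma0T):
    """EDM-style preconditioning for a VE bridge (alpha_t = 1, sigma_t = t),
    following Eqs. (11)-(12). Returns (c_in, c_out, c_skip, c_noise, w)."""
    a = t ** 2 / T ** 2                 # a_t = SNR_T / SNR_t
    b = 1.0 - a                         # b_t
    c = t ** 2 * (1.0 - a)              # c_t = sigma_t^2 (1 - SNR_T/SNR_t)
    denom_sq = a ** 2 * sigmaT_sq + b ** 2 * sigma0_sq + 2 * a * b * sigma0T + c
    c_in = 1.0 / math.sqrt(denom_sq)
    c_out = math.sqrt(a ** 2 * (sigmaT_sq * sigma0_sq - sigma0T ** 2) + sigma0_sq * c) * c_in
    c_skip = (b * sigma0_sq + a * sigma0T) * c_in ** 2
    c_noise = 0.25 * math.log(t)
    w = 1.0 / c_out ** 2
    return c_in, c_out, c_skip, c_noise, w
```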

Generalized time-reversal. Due to the probability flow ODE's resemblance to classifier guidance (Dhariwal and Nichol, 2021; Ho and Salimans, 2022), we can introduce an additional parameter $w$ to set the "strength" of the drift adjustment:

$$d{\mathbf{x}}_t = \Big[{\mathbf{f}}({\mathbf{x}}_t, t) - g^2(t)\Big(\frac{1}{2}{\mathbf{s}}({\mathbf{x}}_t, t, y, T) - w\,{\mathbf{h}}({\mathbf{x}}_t, t, y, T)\Big)\Big]\,dt, \quad {\mathbf{x}}_T = y \tag{13}$$

which allows for a strictly wider class of marginal densities of ${\mathbf{x}}_t$ generated by the resulting probability flow ODE. We examine the effect of this parameter in our ablation studies.

5 Stochastic Sampling for Denoising Diffusion Bridges

Algorithm 1 Denoising Diffusion Bridge Hybrid Sampler
Input: model $D_\theta({\mathbf{x}}_t, t)$, time steps $\{t_i\}_{i=0}^N$, max time $T$, guidance strength $w$, step ratio $s$, distribution $q_{\rm{data}}({\mathbf{y}})$
Output: ${\mathbf{x}}_0$
Sample ${\mathbf{x}}_N \sim q_{\rm{data}}({\mathbf{y}})$
for $i = N, \dots, 1$ do
    Sample ${\bm{\epsilon}}_i \sim \mathcal{N}({\bm{0}}, {\bm{I}})$
    $\hat{t}_i \leftarrow t_i + s(t_{i-1} - t_i)$
    $\bm{d}_i \leftarrow -{\mathbf{f}}({\mathbf{x}}_i, t_i) + g^2(t_i)\big({\mathbf{s}}({\mathbf{x}}_i, t_i, {\mathbf{x}}_N, T) - {\mathbf{h}}({\mathbf{x}}_i, t_i, {\mathbf{x}}_N, T)\big)$
    $\hat{{\mathbf{x}}}_i \leftarrow {\mathbf{x}}_i + \bm{d}_i(\hat{t}_i - t_i) + g(t_i)\sqrt{\hat{t}_i - t_i}\,{\bm{\epsilon}}_i$
    $\hat{\bm{d}}_i \leftarrow -{\mathbf{f}}(\hat{{\mathbf{x}}}_i, \hat{t}_i) + g^2(\hat{t}_i)\big(\frac{1}{2}{\mathbf{s}}(\hat{{\mathbf{x}}}_i, \hat{t}_i, {\mathbf{x}}_N, T) - w\,{\mathbf{h}}(\hat{{\mathbf{x}}}_i, \hat{t}_i, {\mathbf{x}}_N, T)\big)$
    ${\mathbf{x}}_{i-1} \leftarrow \hat{{\mathbf{x}}}_i + \hat{\bm{d}}_i(t_{i-1} - \hat{t}_i)$
    if $i \neq 1$ then
        $\bm{d}'_i \leftarrow -{\mathbf{f}}({\mathbf{x}}_{i-1}, t_{i-1}) + g^2(t_{i-1})\big(\frac{1}{2}{\mathbf{s}}({\mathbf{x}}_{i-1}, t_{i-1}, {\mathbf{x}}_N, T) - w\,{\mathbf{h}}({\mathbf{x}}_{i-1}, t_{i-1}, {\mathbf{x}}_N, T)\big)$
        ${\mathbf{x}}_{i-1} \leftarrow \hat{{\mathbf{x}}}_i + \big(\frac{1}{2}\bm{d}'_i + \frac{1}{2}\hat{\bm{d}}_i\big)(t_{i-1} - \hat{t}_i)$
    end if
end for
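A compact Python rendering of Algorithm 1 (a sketch; `f`, `g2`, `score`, and `h` are hypothetical callables for ${\mathbf{f}}$, $g^2$, ${\mathbf{s}}$, and ${\mathbf{h}}$, with `score` assembled from the trained model via Eq. (10)). Since time runs backward, we place $|\hat{t}_i - t_i|$ under the square root in the noise term:

```python
import torch

def ddbm_hybrid_sampler(f, g2, score, h, y, ts, T, w=1.0, s=0.3):
    """Sketch of Algorithm 1. `ts` is a decreasing sequence [t_N, ..., t_0];
    score(x, t, y, T) and h(x, t, y, T) approximate s and h from Theorem 1;
    y ~ q_data(y) is the starting endpoint x_N."""
    x = y
    for i in range(len(ts) - 1):
        t, t_prev = ts[i], ts[i + 1]
        t_hat = t + s * (t_prev - t)  # scheduled Euler-Maruyama sub-step
        # Euler-Maruyama (stochastic) step following the reverse SDE (6)
        d = -f(x, t) + g2(t) * (score(x, t, y, T) - h(x, t, y, T))
        x_hat = x + d * (t_hat - t) + (g2(t) * abs(t_hat - t)) ** 0.5 * torch.randn_like(x)
        # Heun (deterministic) step following the guided PF-ODE (13)
        d_hat = -f(x_hat, t_hat) + g2(t_hat) * (0.5 * score(x_hat, t_hat, y, T) - w * h(x_hat, t_hat, y, T))
        x = x_hat + d_hat * (t_prev - t_hat)
        if i != len(ts) - 2:  # second-order correction, skipped at the last step
            d_prime = -f(x, t_prev) + g2(t_prev) * (0.5 * score(x, t_prev, y, T) - w * h(x, t_prev, y, T))
            x = x_hat + 0.5 * (d_hat + d_prime) * (t_prev - t_hat)
    return x
```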

Although the probability flow ODE allows one to use fast integration techniques to accelerate the sampling process (Zhang and Chen, 2022; Song et al., 2020a; Karras et al., 2022), purely following an ODE path is problematic because diffusion bridges have fixed starting points given as data ${\mathbf{x}}_T = {\mathbf{y}} \sim q_{\rm{data}}({\mathbf{y}})$, and following the probability flow ODE backward in time generates a deterministic "expected" path. This can result in "averaged" or blurry outputs given the initial conditions. We are thus motivated to introduce noise into our sampling process to improve sample quality and diversity.

Higher-order hybrid sampler. Our sampler is built upon the prior higher-order ODE sampler of Karras et al. (2022), which discretizes the sampling steps into $t_N > t_{N-1} > \dots > t_0$ with decreasing intervals (see Appendix A.6 for details). Inspired by the predictor-corrector sampler introduced by Song et al. (2020b), we additionally introduce a scheduled Euler-Maruyama step that follows the backward SDE in between higher-order ODE steps. This ensures that the marginal distribution at each step stays approximately the same. We introduce an additional scaling hyperparameter $s$, which defines a step ratio between $t_{i-1}$ and $t_i$ such that the interval $[t_i - s(t_i - t_{i-1}),\, t_i]$ is used for Euler-Maruyama steps and $[t_{i-1},\, t_i - s(t_i - t_{i-1})]$ is used for Heun steps, as described in Algorithm 1.

6 Related Works and Special Cases

Diffusion models. Advances in diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020; Song et al., 2020b) have improved the state of the art in image generation and outperformed GANs (Goodfellow et al., 2014). The success of diffusion models goes hand in hand with important design choices such as network design (Song et al., 2020b; Karras et al., 2022; Nichol and Dhariwal, 2021; Hoogeboom et al., 2023; Peebles and Xie, 2023), improved noise schedules (Nichol and Dhariwal, 2021; Karras et al., 2022; Peebles and Xie, 2023), faster and more accurate samplers (Song et al., 2020a; Lu et al., 2022a;b; Zhang and Chen, 2022), and guidance methods (Dhariwal and Nichol, 2021; Ho and Salimans, 2022). Given the large body of literature on diffusion models for unconditional generation, which is largely based on these various design choices, we seek to design our bridge formulation to allow for a seamless integration with this literature. As such, we adopt a time-reversal perspective to directly extend these methods.

Diffusion bridges, Schrödinger bridges, and Doob's $h$-transform. Diffusion bridges (Särkkä and Solin, 2019) are a common tool in probability theory and have been actively studied in recent years in the context of generative modeling (Liu et al., 2022b; Somnath et al., 2023; De Bortoli et al., 2021; Peluchetti; Peluchetti, 2023). Heng et al. (2021) explore diffusion bridges conditioned on fixed starting/ending points and learn to simulate the time-reversal of the bridge given an approximation of the score $\nabla_{{\mathbf{x}}_t}\log p({\mathbf{x}}_t)$. More recently, instead of considering bridges with fixed endpoints, Liu et al. (2022b) use Doob's $h$-transform to bridge between arbitrary distributions; a forward bridge is learned via score-matching by simulating entire paths during training. In contrast, other works (Somnath et al., 2023; Peluchetti), while also adopting Doob's $h$-transform, propose simulation-free algorithms for forward-time generation. Delbracio and Milanfar (2023) similarly construct a Brownian bridge for direct iteration, successfully applied to image-restoration tasks. In another approach, De Bortoli et al. (2021) propose Iterative Proportional Fitting (IPF) to tractably solve Schrödinger Bridge (SB) problems for translating between different distributions. Liu et al. (2023) build on a tractable class of SB, which results in a simulation-free algorithm and has demonstrated strong performance on image translation tasks. More recently, extending SB with IPF, Bridge Matching (Shi et al., 2023) proposes to use Iterative Markovian Fitting to solve the SB problem; a similar algorithm is developed by Peluchetti (2023) for distribution translation. Most closely related to ours is Li et al. (2023), which proposes to directly reverse a Brownian bridge for distribution translation in discrete time. Our method instead shows how to construct a bridge model from any existing VP or VE diffusion process in continuous time, and the Brownian bridge (as considered in most previous works) is but a special case of VE bridges. We additionally show that, when implemented correctly, VP bridges can achieve very strong empirical performance. Although a similar perspective can be derived using forward-time diffusion as in Peluchetti, which also proposes VE/VP bridge schedules, our framework enjoys additional empirical (reusing diffusion designs) and theoretical (connections with OT-Flow-Matching (Lipman et al., 2023; Tong et al., 2023b) and Rectified Flow (Liu et al., 2022a)) benefits.

Flow and Optimal Transport. Works based on Flow-Matching (Lipman et al., 2023; Tong et al., 2023b; Pooladian et al., 2023; Tong et al., 2023a) learn an ODE-based transport map to bridge two distributions. Lipman et al. (2023) demonstrate that, by matching the velocity field of predefined transport maps, one can create powerful generative models competitive with their diffusion counterparts. Improving on this approach, Tong et al. (2023b) and Pooladian et al. (2023) exploit potential couplings between distributions using minibatch simulation-free OT. Rectified Flow (Liu et al., 2022a) directly constructs the OT bridge and uses neural networks to fit the intermediate velocity field. Another line of work uses stochastic interpolants (Albergo and Vanden-Eijnden, 2023) to build flow models, directly avoiding the use of Doob's $h$-functions and providing an easy way to construct interpolation maps between distributions. Albergo et al. (2023) present a general theory of stochastic interpolants unifying flow and diffusion, and show that a bridge can be constructed from both an ODE and an SDE perspective. Separately from these methods, our model uses a denoising bridge score-matching loss distinct from this class of models. Constructing from this perspective allows us to extend many existing, successful designs of diffusion models (which are not directly applicable to these works) to the bridge framework and push the state of the art further for image translation while retaining strong performance for unconditional generation.

6.1 Special Cases of Denoising Diffusion Bridge Models

Case 1: Unconditional diffusion process (Song et al., 2020b). For unconditional diffusion processes (which map data to noise), we can first show that the marginal $p({\mathbf{x}}_t)$ when $p({\mathbf{x}}_0) = q_{\rm{data}}({\mathbf{x}})$ exactly matches that of a regular diffusion process when ${\mathbf{x}}_T \sim q_{\rm{data}}({\mathbf{y}} \mid {\mathbf{x}}) = \mathcal{N}(\alpha_T{\mathbf{x}}, \sigma_T^2{\bm{I}})$. Taking the expectation over ${\mathbf{x}}_T$ in Eq. (8), we have

$$p({\mathbf{x}}_t \mid {\mathbf{x}}_0) = \mathcal{N}(\alpha_t{\mathbf{x}}_0, \sigma_t^2{\bm{I}}) \tag{14}$$

One can further show that during sampling, Eqs. (6) and (7) reduce to the reverse SDE and ODE (respectively) of a diffusion process when ${\mathbf{x}}_T$ is sampled from a Gaussian. We leave derivation details to Appendix A.4.
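As a quick sanity check (our computation), substituting $\mathbb{E}[{\mathbf{x}}_T \mid {\mathbf{x}}_0] = \alpha_T{\mathbf{x}}_0$ and $\text{Var}[{\mathbf{x}}_T \mid {\mathbf{x}}_0] = \sigma_T^2{\bm{I}}$ into Eq. (8) and using $\frac{\text{SNR}_T}{\text{SNR}_t} = \frac{\alpha_T^2\sigma_t^2}{\alpha_t^2\sigma_T^2}$ gives

$$\mathbb{E}[{\mathbf{x}}_t \mid {\mathbf{x}}_0] = \frac{\text{SNR}_T}{\text{SNR}_t}\frac{\alpha_t}{\alpha_T}\,\alpha_T{\mathbf{x}}_0 + \alpha_t{\mathbf{x}}_0\Big(1 - \frac{\text{SNR}_T}{\text{SNR}_t}\Big) = \alpha_t{\mathbf{x}}_0,$$
$$\text{Var}[{\mathbf{x}}_t \mid {\mathbf{x}}_0] = \Big(\frac{\text{SNR}_T}{\text{SNR}_t}\frac{\alpha_t}{\alpha_T}\Big)^2\sigma_T^2 + \sigma_t^2\Big(1 - \frac{\text{SNR}_T}{\text{SNR}_t}\Big) = \frac{\text{SNR}_T}{\text{SNR}_t}\sigma_t^2 + \sigma_t^2\Big(1 - \frac{\text{SNR}_T}{\text{SNR}_t}\Big) = \sigma_t^2,$$

recovering Eq. (14).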

Case 2: OT-Flow-Matching (Lipman et al., 2023; Tong et al., 2023b) and Rectified Flow (Liu et al., 2022a). These works learn to match deterministic dynamics defined through ODEs instead of SDEs. In this particular case, they work with "straight-line" paths defined by ${\mathbf{x}}_T - {\mathbf{x}}_0$.

To see that our framework generalizes this, first let us define a family of diffusion bridges with variance scaled by $c \in (0, 1)$ such that $p({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_T) = \mathcal{N}(\hat{\mu}_t, c^2\hat{\sigma}_t^2{\bm{I}})$, where $\hat{\mu}_t$ and $\hat{\sigma}_t$ are as defined in Eq. (8). One can then show that with a VE diffusion where $\sigma_t^2 = c^2 t$ and $T = 1$, given some fixed ${\mathbf{x}}_0$ and ${\mathbf{x}}_1$, and ${\mathbf{x}}_t$ sampled from Eq. (8),

$$\lim_{c\to 0}\Big[{\mathbf{f}}({\mathbf{x}}_t, t) - c^2g^2(t)\Big(\frac{1}{2}\nabla_{{\mathbf{x}}_t}\log p({\mathbf{x}}_t \mid {\mathbf{x}}_0, {\mathbf{x}}_1) - \nabla_{{\mathbf{x}}_t}\log p({\mathbf{x}}_1 \mid {\mathbf{x}}_t)\Big)\Big] = {\mathbf{x}}_1 - {\mathbf{x}}_0 \tag{15}$$

where the bracketed term is the drift of the probability flow ODE in Eq. (7) given ${\mathbf{x}}_0$ and ${\mathbf{x}}_1$, and the right-hand side is exactly the straight-line path term. In other words, these methods learn to match the drift of the bridge probability flow ODE (with a specific VE schedule) in the noiseless limit. The score model can then be matched against ${\mathbf{x}}_T - {\mathbf{x}}_0$; with some additional care to handle the extra input ${\mathbf{x}}_T$, our framework exactly reduces to that of OT-Flow-Matching and Rectified Flow (details in Appendix A.4).

7 Experiments

In this section we verify the generative capabilities of DDBMs and aim to answer the following questions: (1) How well do DDBMs perform image-to-image translation in pixel space? (2) Can DDBMs perform well in unconditional generation when one side of the bridge reduces to a Gaussian distribution? (3) How do the additional design choices affect final performance? Unless noted otherwise, we use the same VE diffusion schedule as EDM for our bridge model by default. We leave further experimental details to Appendix B.

Figure 3: Qualitative comparison with the most relevant baselines. Columns show the condition, Pix2Pix [16], SDEdit [28], Rectified Flow [24], I2SB [23], DDBM (VE) with the ODE sampler, and DDBM (VE) with the hybrid sampler.

7.1 Image-to-Image Translation

We demonstrate that DDBMs can deliver competitive results on general image-to-image translation tasks. We evaluate on datasets with different image resolutions to demonstrate applicability at a variety of scales. We choose Edges→Handbags (Isola et al., 2017) scaled to $64\times 64$ pixels, which contains image pairs for translating edge maps to colored handbags, and DIODE-Outdoor (Vasiljevic et al., 2019) scaled to $256\times 256$, which contains normal maps and RGB images of real-world outdoor scenes. For evaluation metrics, we use Fréchet Inception Distance (FID) (Heusel et al., 2017) and Inception Score (IS) (Barratt and Sharma, 2018), evaluated on all training samples, to measure translation quality, and we use LPIPS (Zhang et al., 2018) and MSE (on the $[-1, 1]$ scale) to measure perceptual similarity and translation faithfulness.

We compare with Pix2Pix (Isola et al., 2017), SDEdit (Meng et al., 2022), DDIB (Su et al., 2022), Rectified Flow (Liu et al., 2022a), and I2SB (Liu et al., 2023), as they are built for image-to-image translation. For SDEdit, we train an unconditional EDM on the target domain (e.g. colored images), initialize translation by noising the source image (e.g. a sketch), and generate with the EDM sampler given the noisy image. The other baselines are run with their respective repositories while using the same network architecture as ours. Diffusion- and transport-based methods are evaluated with the same number of function evaluations ($N = 40$, the default for the EDM sampler on $64\times 64$ images) to demonstrate our sampler's effectiveness in the regime where the number of sampling steps is low. Results are shown in Table 2, and additional settings are specified in Appendix B.

We observe that our model performs translation with both high generation quality and faithfulness, and we find that VP bridges outperform VE bridges in some cases. In contrast, Rectified Flow, as an OT-based method, struggles when the two domains share few low-level similarities (e.g. color, hue). DDIB also fails to produce coherent translations due to the wide differences in pixel-space distribution between the paired data. I2SB comes closest to our method, but falls short under computational constraints, i.e. when NFE is low. We additionally show qualitative comparisons with the most performant baselines in Figure 3. More visual results can be found in Appendix B.1.

7.2 Ablation Studies

We now study the effect of our preconditioning and hybrid sampler on generation quality for both VE and VP bridges (see Appendix B for the VP bridge parameterization). In the left column of Figure 4, we fix the guidance scale $w$ at 1 and vary the Euler step ratio $s$ from 0 to 0.9 to introduce stochasticity. We see a significant decrease in FID as $s$ increases, with the best performance at some value between 0 and 1 (e.g., $s = 0.3$ for Edges→Handbags). Figure 3 also shows that the ODE sampler (i.e., $s = 0$) produces blurry images, while our hybrid sampler produces considerably sharper results. In the right column, we study the effect of $w$ (from 0 to 1) with $s$ fixed. We observe that VE bridges are not affected by changes in $w$, whereas VP bridges rely heavily on setting $w = 1$. We hypothesize that this is because VP bridges follow "curved paths" that destroy the signal in between, so they rely on Doob's $h$-function for further guidance towards the correct probability distribution.
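To illustrate how the Euler step ratio $s$ interpolates between the deterministic and stochastic regimes, the sketch below splits each backward step into an Euler-Maruyama sub-step on the reverse SDE followed by an Euler sub-step on the probability flow ODE. The callables ode_drift, score, and g are placeholders, and this first-order scheme is a simplification of the higher-order sampler used in our experiments, not its exact scheduling.

    import torch

    def hybrid_step(x, t, t_next, s, ode_drift, score, g):
        # One backward step from t to t_next (t_next < t).
        dt = t_next - t                                  # negative step size
        t_mid = t + s * dt                               # stochastic portion covers fraction s
        if s > 0:                                        # Euler-Maruyama sub-step on the reverse SDE
            sde_drift = ode_drift(x, t) - 0.5 * g(t) ** 2 * score(x, t)
            x = x + sde_drift * (t_mid - t) + g(t) * abs(s * dt) ** 0.5 * torch.randn_like(x)
        x = x + ode_drift(x, t_mid) * (t_next - t_mid)   # deterministic sub-step; s = 0 gives the pure ODE sampler
        return x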

We also study the effect of our preconditioning in Table 3. Our baseline, which uses neither our preconditioning nor our sampler, is a simple model that directly matches the network output to the training target and generates with the EDM (Karras et al., 2022) sampler. We see that each introduced component further boosts generation performance. We therefore conclude that the introduced practical components are essential to the success of DDBM.
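For reference, the preconditioning component corresponds to an EDM-style output reparameterization of the raw network $F_\theta$, sketched below; the coefficient functions c_skip, c_out, c_in, and c_noise are placeholders standing in for the bridge-specific scalings derived earlier in the paper, not their exact closed forms.

    def precondition(F_theta, x_t, x_T, t, c_skip, c_out, c_in, c_noise):
        # D_theta approximates the clean sample x_0 ("pred-x"); the skip
        # connection keeps the effective regression target well-scaled
        # across noise levels.
        return c_skip(t) * x_t + c_out(t) * F_theta(c_in(t) * x_t, x_T, c_noise(t))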

                                     Edges→Handbags-64×64                 DIODE-256×256
                                     FID↓     IS↑    LPIPS↓   MSE↓        FID↓     IS↑    LPIPS↓   MSE↓
Pix2Pix (Isola et al., 2017)         74.8     4.24   0.356    0.209       82.4     4.22   0.556    0.133
DDIB (Su et al., 2022)               186.84   2.04   0.869    1.05        242.3    4.22   0.798    0.794
SDEdit (Meng et al., 2022)           26.5     3.58   0.271    0.510       31.14    5.70   0.714    0.534
Rectified Flow (Liu et al., 2022a)   25.3     2.80   0.241    0.088       77.18    5.87   0.534    0.157
I2SB (Liu et al., 2023)              7.43     3.40   0.244    0.191       9.34     5.77   0.373    0.145
DDBM (VE)                            2.93     3.58   0.131    0.013       8.51     6.03   0.226    0.0107
DDBM (VP)                            1.83     3.73   0.142    0.0402      4.43     6.21   0.244    0.0839
Table 2: Quantitative evaluation of pixel-space image-to-image translation.
Figure 4: Ablation studies on Euler step ratio s and guidance scale w on (a) Edges→Handbags and (b) DIODE. w = 1 for all ablations on s, and s is set to the best-performing value for each dataset for ablations on w.
Our precond.   Our sampler   E→H-64×64 (VE)   E→H-64×64 (VP)   DIODE-256×256 (VE)   DIODE-256×256 (VP)
✗              ✗             14.02            11.76            126.3                96.93
✓              ✗             13.26            11.19            79.25                91.07
✗              ✓             13.11            29.91            91.31                21.92
✓              ✓             2.93             1.83             8.51                 4.43
Table 3: Ablation study on the effect of the sampler and preconditioning on FID. A cross mark on our preconditioning means no output reparameterization, i.e., the network output is used directly to match the training target. A cross mark on our sampler means we reuse the ODE sampler from EDM with the same settings. E→H is shorthand for Edges→Handbags.
Figure 5: Generation on CIFAR-10 and FFHQ-64×64.
                                      CIFAR-10             FFHQ-64×64
                                      NFE↓     FID↓        NFE↓     FID↓
DDPM (Ho et al., 2020)                1000     3.17        1000     3.52
DDIM (Song et al., 2020a)             50       4.67        50       5.18
DDPM++ (Song et al., 2020b)           1000     3.01        1000     3.39
NCSN++ (Song et al., 2020b)           1000     3.77        1000     25.95
Rectified Flow (Liu et al., 2022a)    127      2.58        152      4.45
EDM (Karras et al., 2022)             35       2.04        79       2.53
DDBM                                  35       2.06        79       2.44
Table 4: Evaluation of unconditional generation.

7.3 Unconditional Generation

When one side of the distribution becomes a Gaussian, our framework reduces exactly to that of diffusion models. Specifically, during training, when the endpoint $\mathbf{x}_T \sim \mathcal{N}(\alpha_T \mathbf{x}_0, \sigma_T^2 \bm{I})$, our intermediate bridge samples follow $\mathbf{x}_t \sim \mathcal{N}(\alpha_t \mathbf{x}_0, \sigma_t^2 \bm{I})$. We empirically verify that, using our bridge sampling and the pred-$\mathbf{x}$ objective inspired by EDM, we can recover EDM's performance with our more general parameterization.
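A quick numerical sanity check of this reduction, as a sketch assuming a VE schedule with $\alpha_t = 1$ and $\sigma_t = t$: drawing the bridge marginal of Eqs. (22)-(23) (Appendix A.1) with a Gaussian endpoint recovers the diffusion marginal $\mathcal{N}(\mathbf{x}_0, \sigma_t^2 \bm{I})$.

    import torch

    x0, t, T = torch.zeros(100_000), 0.5, 1.0            # alpha_t = 1, sigma_t = t (VE)
    r = t ** 2 / T ** 2                                  # SNR_T / SNR_t with alpha = 1
    xT = x0 + T * torch.randn_like(x0)                   # Gaussian endpoint x_T
    mu_hat = r * xT + (1 - r) * x0                       # Eq. (23) with alpha = 1
    sigma_hat = t * (1 - r) ** 0.5                       # Eq. (22)
    xt = mu_hat + sigma_hat * torch.randn_like(x0)
    print(float(xt.std()))                               # ~0.5 = sigma_t, the diffusion marginal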

We evaluate our method on CIFAR-10 (Krizhevsky et al., 2009) and FFHQ-64×64 (Karras et al., 2019), processed following Karras et al. (2022). We use the FID score over 50K generated images for quantitative evaluation and report the number of function evaluations (NFE) to measure generation efficiency. We compare our generation results against diffusion-based and optimal-transport-based models, including DDPM (Ho et al., 2020), DDIM (Song et al., 2020a), DDPM++ (Song et al., 2020b), NCSN++ (Song et al., 2020b), Rectified Flow (Liu et al., 2022a), and EDM (Karras et al., 2022). Quantitative results are presented in Table 4 and generated samples are shown in Figure 5.

We observe that our model matches EDM performance with negligible degradation in FID on CIFAR-10 and a marginal improvement on FFHQ-64×64. This corroborates our claim that our method can benefit from advances in diffusion models and generalizes many advanced parameterization techniques, such as those introduced in EDM.

8 Conclusion

In this work, we introduce Denoising Diffusion Bridge Models, a novel class of models that builds a stochastic bridge between paired samples with tractable marginal distributions in between. The model is learned by matching the conditional score of a tractable bridge distribution, which allows one to transport from one distribution to another via a new reverse SDE or probability flow ODE. Additionally, this generalized framework shares many similarities with diffusion models, thus allowing us to reuse and generalize many designs of diffusion models. We believe that DDBM is a significant contribution towards a general framework for distribution translation. In the era of generative AI, DDBM has a further role to play.

References

  • Albergo et al. [2023] Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. arXiv preprint arXiv:2303.08797, 2023.
  • Albergo and Vanden-Eijnden [2023] Michael Samuel Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=li7qeBbCR1t.
  • Barratt and Sharma [2018] Shane Barratt and Rishi Sharma. A note on the inception score. arXiv preprint arXiv:1801.01973, 2018.
  • De Bortoli et al. [2021] Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34:17695–17709, 2021.
  • Delbracio and Milanfar [2023] Mauricio Delbracio and Peyman Milanfar. Inversion by direct iteration: An alternative to denoising diffusion for image restoration. arXiv preprint arXiv:2303.11435, 2023.
  • Delyon and Hu [2006] Bernard Delyon and Ying Hu. Simulation of conditioned diffusion and application to parameter estimation. Stochastic Processes and their Applications, 116(11):1660–1675, 2006.
  • Dhariwal and Nichol [2021] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.
  • Doob and Doob [1984] Joseph L Doob and JI Doob. Classical potential theory and its probabilistic counterpart, volume 262. Springer, 1984.
  • Goodfellow et al. [2014] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
  • Heng et al. [2021] Jeremy Heng, Valentin De Bortoli, Arnaud Doucet, and James Thornton. Simulating diffusion bridges with score matching. arXiv preprint arXiv:2111.07243, 2021.
  • Heusel et al. [2017] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
  • Ho and Salimans [2022] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
  • Ho et al. [2020] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
  • Ho et al. [2022] Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P Kingma, Ben Poole, Mohammad Norouzi, David J Fleet, et al. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303, 2022.
  • Hoogeboom et al. [2023] Emiel Hoogeboom, Jonathan Heek, and Tim Salimans. simple diffusion: End-to-end diffusion for high resolution images. arXiv preprint arXiv:2301.11093, 2023.
  • Isola et al. [2017] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.
  • Karras et al. [2019] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.
  • Karras et al. [2022] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. arXiv preprint arXiv:2206.00364, 2022.
  • Kingma et al. [2021] Diederik Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models. Advances in neural information processing systems, 34:21696–21707, 2021.
  • Krizhevsky et al. [2009] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  • Li et al. [2023] Bo Li, Kaitao Xue, Bin Liu, and Yu-Kun Lai. Bbdm: Image-to-image translation with brownian bridge diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1952–1961, 2023.
  • Lipman et al. [2023] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=PqvMRDCJT9t.
  • Liu et al. [2023] Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A Theodorou, Weili Nie, and Anima Anandkumar. I2sb: Image-to-image schrödinger bridge. arXiv, 2023.
  • Liu et al. [2022a] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022a.
  • Liu et al. [2022b] Xingchao Liu, Lemeng Wu, Mao Ye, and Qiang Liu. Let us build bridges: Understanding and extending diffusion generative models. arXiv preprint arXiv:2208.14699, 2022b.
  • Lu et al. [2022a] Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. arXiv preprint arXiv:2206.00927, 2022a.
  • Lu et al. [2022b] Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv preprint arXiv:2211.01095, 2022b.
  • Meng et al. [2022] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2022.
  • Nichol and Dhariwal [2021] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
  • Peebles and Xie [2023] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
  • [31] Stefano Peluchetti. Non-denoising forward-time diffusions.
  • Peluchetti [2023] Stefano Peluchetti. Diffusion bridge mixture transports, Schrödinger bridge problems and generative modeling. arXiv preprint arXiv:2304.00917, 2023.
  • Pooladian et al. [2023] Aram-Alexandre Pooladian, Heli Ben-Hamu, Carles Domingo-Enrich, Brandon Amos, Yaron Lipman, and Ricky Chen. Multisample flow matching: Straightening flows with minibatch couplings. arXiv preprint arXiv:2304.14772, 2023.
  • Ramesh et al. [2022] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. ArXiv, abs/2204.06125, 2022.
  • Rogers and Williams [2000] L Chris G Rogers and David Williams. Diffusions, Markov processes and martingales: Volume 2, Itô calculus, volume 2. Cambridge university press, 2000.
  • Rombach et al. [2022] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
  • Saharia et al. [2021] Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Palette: Image-to-image diffusion models. ACM SIGGRAPH 2022 Conference Proceedings, 2021.
  • Salimans and Ho [2022] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.
  • Särkkä and Solin [2019] Simo Särkkä and Arno Solin. Applied stochastic differential equations, volume 10. Cambridge University Press, 2019.
  • Schauer et al. [2017] Moritz Schauer, Frank Van Der Meulen, and Harry Van Zanten. Guided proposals for simulating multi-dimensional diffusion bridges. 2017.
  • Shi et al. [2023] Yuyang Shi, Valentin De Bortoli, Andrew Campbell, and Arnaud Doucet. Diffusion Schrödinger bridge matching. arXiv preprint arXiv:2303.16852, 2023.
  • Sohl-Dickstein et al. [2015] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
  • Somnath et al. [2023] Vignesh Ram Somnath, Matteo Pariset, Ya-Ping Hsieh, Maria Rodriguez Martinez, Andreas Krause, and Charlotte Bunne. Aligned diffusion Schrödinger bridges. arXiv preprint arXiv:2302.11419, 2023.
  • Song et al. [2020a] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020a.
  • Song and Ermon [2019] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32, 2019.
  • Song et al. [2020b] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020b.
  • Su et al. [2022] Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image translation. In The Eleventh International Conference on Learning Representations, 2022.
  • Szavits-Nossan and Evans [2015] Juraj Szavits-Nossan and Martin R Evans. Inequivalence of nonequilibrium path ensembles: the example of stochastic bridges. Journal of Statistical Mechanics: Theory and Experiment, 2015(12):P12008, 2015.
  • Tong et al. [2023a] Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, and Yoshua Bengio. Simulation-free Schrödinger bridges via score and flow matching. arXiv preprint arXiv:2307.03672, 2023a.
  • Tong et al. [2023b] Alexander Tong, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Kilian Fatras, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023b.
  • Vasiljevic et al. [2019] Igor Vasiljevic, Nick Kolkin, Shanyi Zhang, Ruotian Luo, Haochen Wang, Falcon Z. Dai, Andrea F. Daniele, Mohammadreza Mostajabi, Steven Basart, Matthew R. Walter, and Gregory Shakhnarovich. DIODE: A Dense Indoor and Outdoor DEpth Dataset. CoRR, abs/1908.00463, 2019. URL http://arxiv.org/abs/1908.00463.
  • Villani [2008] Cédric Villani. Optimal transport: Old and new. 2008.
  • Zhang and Chen [2022] Qinsheng Zhang and Yongxin Chen. Fast sampling of diffusion models with exponential integrator. arXiv preprint arXiv:2204.13902, 2022.
  • Zhang et al. [2018] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.
  • Zhu et al. [2017] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. 2017 IEEE International Conference on Computer Vision (ICCV), pages 2242–2251, 2017.

Appendix: Denoising Diffusion Bridge Models

Appendix A Proofs

A.1 Marginal distribution

We note that for the tractable transition kernels specified in Table 1, we can derive the marginal distribution of $\mathbf{x}_t$ using Bayes' rule:

p(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T) = \frac{p(\mathbf{x}_T \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{x}_0)}{p(\mathbf{x}_T \mid \mathbf{x}_0)}

We can directly derive this by looking at the resulting density function. First,

p(\mathbf{x}_t \mid \mathbf{x}_0) = \frac{1}{\sqrt{2\pi}\,\sigma_t}\exp\left(-\frac{(\mathbf{x}_t - \alpha_t\mathbf{x}_0)^2}{2\sigma_t^2}\right)    (16)

p(\mathbf{x}_T \mid \mathbf{x}_t) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma_T^2 - \frac{\alpha_T^2}{\alpha_t^2}\sigma_t^2}}\exp\left(-\frac{(\frac{\alpha_T}{\alpha_t}\mathbf{x}_t - \mathbf{x}_T)^2}{2(\sigma_T^2 - \frac{\alpha_T^2}{\alpha_t^2}\sigma_t^2)}\right)    (17)

= \frac{1}{\sqrt{2\pi}\sqrt{\sigma_T^2 - \frac{\alpha_T^2}{\alpha_t^2}\sigma_t^2}}\exp\left(-\frac{(\mathbf{x}_t - \frac{\alpha_t}{\alpha_T}\mathbf{x}_T)^2}{2\sigma_t^2(\frac{\mathrm{SNR}_t}{\mathrm{SNR}_T} - 1)}\right)    (18)

p(\mathbf{x}_T \mid \mathbf{x}_0) = \frac{1}{\sqrt{2\pi}\,\sigma_T}\exp\left(-\frac{(\mathbf{x}_T - \alpha_T\mathbf{x}_0)^2}{2\sigma_T^2}\right)    (19)

and we refer readers to Kingma et al. (2021) for details on $p(\mathbf{x}_s \mid \mathbf{x}_t)$ for any $s > t$. Then we know

p(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T) = \frac{1}{\sqrt{2\pi}\,\hat{\sigma}_t}\exp\left(-\frac{1}{2}\left[\frac{(\mathbf{x}_t - \alpha_t\mathbf{x}_0)^2}{\sigma_t^2} + \frac{(\mathbf{x}_t - \frac{\alpha_t}{\alpha_T}\mathbf{x}_T)^2}{\sigma_t^2(\frac{\mathrm{SNR}_t}{\mathrm{SNR}_T} - 1)} - \frac{(\mathbf{x}_T - \alpha_T\mathbf{x}_0)^2}{\sigma_T^2}\right]\right) = \frac{1}{\sqrt{2\pi}\,\hat{\sigma}_t}\exp\left(-\frac{(\mathbf{x}_t - \hat{\mu}_t)^2}{2\hat{\sigma}_t^2}\right), \qquad \hat{\sigma}_t = \frac{\sigma_t}{\sigma_T}\sqrt{\sigma_T^2 - \frac{\alpha_T^2}{\alpha_t^2}\sigma_t^2}    (20, 21)

where

\hat{\sigma}_t^2 = \sigma_t^2\left(1 - \frac{\mathrm{SNR}_T}{\mathrm{SNR}_t}\right)    (22)

\hat{\mu}_t = \frac{\mathrm{SNR}_T}{\mathrm{SNR}_t}\frac{\alpha_t}{\alpha_T}\mathbf{x}_T + \alpha_t\mathbf{x}_0\left(1 - \frac{\mathrm{SNR}_T}{\mathrm{SNR}_t}\right)    (23)
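For concreteness, Eqs. (22)-(23) transcribe directly into a sampling routine for the bridge marginal; in the sketch below, alpha and sigma are assumed to be scalar schedule callables, not the exact implementation used in our experiments.

    import torch

    def sample_bridge(x0, xT, t, T, alpha, sigma):
        # Draw x_t ~ N(mu_hat_t, sigma_hat_t^2 I) following Eqs. (22)-(23).
        snr = lambda u: alpha(u) ** 2 / sigma(u) ** 2
        r = snr(T) / snr(t)                              # SNR_T / SNR_t
        mu_hat = r * (alpha(t) / alpha(T)) * xT + alpha(t) * (1.0 - r) * x0
        sigma_hat = sigma(t) * (1.0 - r) ** 0.5
        return mu_hat + sigma_hat * torch.randn_like(x0)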

A.2 Denoising Bridge Score Matching

See 2

Proof.

We can explicitly write the objective as

\int q(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)\, q_{\mathrm{data}}(\mathbf{x}_0, \mathbf{x}_T)\, w(t)\, p(t)\left[\left\|\mathbf{s}_\theta(\mathbf{x}_t, \mathbf{x}_T, t) - \nabla_{\mathbf{x}_t}\log q(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)\right\|^2\right] d\mathbf{x}_t\, d\mathbf{x}_0\, d\mathbf{x}_T\, dt    (24)

Since the objective is an $\mathcal{L}_2$ loss and $p(t), w(t)$ are non-zero, its minimizer can be derived as

\mathbf{s}^*(\mathbf{x}_t, \mathbf{x}_T, t)

= \int \frac{q(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)\, q_{\mathrm{data}}(\mathbf{x}_0, \mathbf{x}_T)\, w(t)p(t)}{\int q(\mathbf{x}_t \mid \mathbf{x}_0', \mathbf{x}_T)\, q_{\mathrm{data}}(\mathbf{x}_0', \mathbf{x}_T)\, w(t)p(t)\, d\mathbf{x}_0'}\, \nabla_{\mathbf{x}_t}\log q(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)\, d\mathbf{x}_0    (25)

= \int \frac{q(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)\, q_{\mathrm{data}}(\mathbf{x}_0, \mathbf{x}_T)}{q(\mathbf{x}_t, \mathbf{x}_T)}\, \nabla_{\mathbf{x}_t}\log q(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)\, d\mathbf{x}_0    (26)

= \int \frac{q_{\mathrm{data}}(\mathbf{x}_0, \mathbf{x}_T)}{q(\mathbf{x}_t, \mathbf{x}_T)}\, \nabla_{\mathbf{x}_t} q(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)\, d\mathbf{x}_0    (27)

= \frac{\nabla_{\mathbf{x}_t}\int q_{\mathrm{data}}(\mathbf{x}_0, \mathbf{x}_T)\, q(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)\, d\mathbf{x}_0}{q(\mathbf{x}_t, \mathbf{x}_T)}    (28)

= \frac{\nabla_{\mathbf{x}_t} q(\mathbf{x}_t, \mathbf{x}_T)}{q(\mathbf{x}_t, \mathbf{x}_T)}    (29)

= \nabla_{\mathbf{x}_t}\log q(\mathbf{x}_t \mid \mathbf{x}_T)    (30)

where the factor $w(t)p(t)$ cancels between the numerator and denominator, and the last equality holds because $q(\mathbf{x}_T)$ does not depend on $\mathbf{x}_t$.

Thus, minimizing the objective approximates the conditional score. ∎
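Since $q(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)$ is Gaussian, its conditional score is available in closed form, and the objective in Eq. (24) becomes a simple weighted regression. Below is a minimal sketch with $w(t) = 1$ for simplicity; the network interface s_theta(x_t, x_T, t) and the schedule callables are illustrative assumptions.

    import torch

    def bridge_dsm_loss(s_theta, x0, xT, t, T, alpha, sigma):
        snr = lambda u: alpha(u) ** 2 / sigma(u) ** 2
        r = snr(T) / snr(t)
        mu_hat = r * (alpha(t) / alpha(T)) * xT + alpha(t) * (1.0 - r) * x0
        var_hat = sigma(t) ** 2 * (1.0 - r)
        xt = mu_hat + var_hat ** 0.5 * torch.randn_like(x0)
        target = -(xt - mu_hat) / var_hat                # closed-form conditional score
        return ((s_theta(xt, xT, t) - target) ** 2).mean()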

A.3 Probability Flow ODE of Diffusion Bridges

See 1

Proof.

To find the time evolution of $q(\mathbf{x}_t \mid \mathbf{x}_T) = \int p(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_T)\, q_{\mathrm{data}}(\mathbf{x}_0 \mid \mathbf{x}_T)\, d\mathbf{x}_0$, we can first find the time evolution of $p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0, \mathbf{x}_T = x_T)$ for fixed endpoints $x_0$ and $x_T$, which by Bayes' rule is

p(\mathbf{x}_t \mid \mathbf{x}_T = x_T, \mathbf{x}_0 = x_0) = \frac{p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}

where $p(\mathbf{x}_t \mid \mathbf{x}_0)$ follows the Kolmogorov forward equation

\frac{\partial}{\partial t} p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0) = -\nabla_{\mathbf{x}_t}\cdot\left[\mathbf{f}(\mathbf{x}_t, t)\, p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)\right] + \frac{1}{2}g^2(t)\,\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)    (31)

and $p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)$ follows the Kolmogorov backward equation (Szavits-Nossan and Evans, 2015), where

-\frac{\partial}{\partial t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t) = \mathbf{f}(\mathbf{x}_t, t)\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t) + \frac{1}{2}g^2(t)\,\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)    (32)

The time derivative of $p(\mathbf{x}_t \mid \mathbf{x}_T = x_T, \mathbf{x}_0 = x_0)$ thus follows

\frac{\partial}{\partial t} p(\mathbf{x}_t \mid \mathbf{x}_T = x_T, \mathbf{x}_0 = x_0) = \frac{\partial}{\partial t}\frac{p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}    (33, 34)

= \underbrace{\frac{p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\frac{\partial}{\partial t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)}_{(1)} + \underbrace{\frac{p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\frac{\partial}{\partial t} p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)}_{(2)}    (35)

Further expanding the right-hand side using Eqs. (31) and (32), we have

(1) = -\frac{p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\left(\mathbf{f}(\mathbf{x}_t, t)\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t) + \frac{1}{2}g^2(t)\,\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)\right)

(2) = \frac{p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\left(-\nabla_{\mathbf{x}_t}\cdot\left[\mathbf{f}(\mathbf{x}_t, t)\, p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)\right] + \frac{1}{2}g^2(t)\,\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)\right)

We can notice that the sum of the first terms of (1) and (2) is the result of a product rule, thus

(1) + (2) = -\nabla_{\mathbf{x}_t}\cdot\left[\mathbf{f}(\mathbf{x}_t, t)\, p(\mathbf{x}_t \mid \mathbf{x}_T = x_T, \mathbf{x}_0 = x_0)\right] + \frac{1}{2}g^2(t)\left(\frac{p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0) - \frac{p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)\right)    (36)

We now focus on reducing the terms in the last bracket. For clarity, we similarly number the two terms inside the bracket such that

(3) = \frac{p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)    (37)

(4) = \frac{p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)    (38)

Now we can complete these terms into results of the product rule by adding and subtracting the following term,

\frac{\nabla_{\mathbf{x}_t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)\cdot\nabla_{\mathbf{x}_t} p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}    (39)

= \underbrace{\frac{\nabla_{\mathbf{x}_t} p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\cdot\left[p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)\,\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)\right]}_{(5)}    (40)

= \underbrace{\frac{\nabla_{\mathbf{x}_t} p(\mathbf{x}_T = x_T \mid \mathbf{x}_t)}{p(\mathbf{x}_T = x_T \mid \mathbf{x}_0 = x_0)}\cdot\left[p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)\,\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t \mid \mathbf{x}_0 = x_0)\right]}_{(6)}    (41)

which takes the two equivalent forms (5) and (6). Now we can write Eq. (36) as

(1) + (2) = -\nabla_{\mathbf{x}_t}\cdot\left[\mathbf{f}(\mathbf{x}_t, t)\, p(\mathbf{x}_t \mid \mathbf{x}_T = x_T, \mathbf{x}_0 = x_0)\right]    (42)

\qquad + \frac{1}{2}g^2(t)\Big((3) + (4) + (5) + (6)\Big) - g^2(t)\Big((4) + (5)\Big)    (43)

We note that

\textcircled{3}+\textcircled{6} = \nabla_{\mathbf{x}_t}\cdot\Big(p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0=x_0)\,\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t\mid\mathbf{x}_0=x_0)\Big)    (44)
\textcircled{4}+\textcircled{5} = \nabla_{\mathbf{x}_t}\cdot\Big(p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0=x_0)\,\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t)\Big)    (45)

and using Bayes’ rule,

\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0=x_0) = \nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t) + \nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t\mid\mathbf{x}_0=x_0)    (46)

we have

\textcircled{3}+\textcircled{4}+\textcircled{5}+\textcircled{6} = \nabla_{\mathbf{x}_t}\cdot\Big(p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0=x_0)\,\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0=x_0)\Big)    (47)
= \nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t}\, p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0=x_0)    (48)

Therefore,

\frac{\partial}{\partial t}p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0=x_0)    (49)
= -\nabla_{\mathbf{x}_t}\cdot\Big[\big(\mathbf{f}(\mathbf{x}_t,t)+g^{2}(t)\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t)\big)\,p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0=x_0)\Big]
\quad +\frac{1}{2}g^{2}(t)\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t}\,p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0=x_0)

which is a Fokker-Planck equation for a (forward) SDE with the modified drift term

\mathbf{f}(\mathbf{x}_t,t)+g^{2}(t)\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t)
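As a concrete illustration (our own sketch, not part of the paper's derivation): for a driftless VE process with constant g(t) = c, the transition kernel p(x_T | x_t) is Gaussian with variance c²(T − t), so the modified drift above reduces to the classical Brownian-bridge drift (x_T − x_t)/(T − t). The simulation below checks numerically that trajectories of the h-transformed SDE terminate at x_T.

```python
import numpy as np

# Minimal sketch (ours): for f = 0 and g(t) = c, p(x_T | x_t) = N(x_t, c^2 (T - t) I),
# so g^2(t) * grad_x log p(x_T | x_t) = (x_T - x) / (T - t), the Brownian-bridge drift.
def simulate_pinned_sde(x0, xT, c=1.0, T=1.0, n_steps=1000, n_paths=4096, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=np.float64)
    for i in range(n_steps - 1):           # stop one step before t = T
        t = i * dt
        drift = (xT - x) / (T - t)         # modified drift, f = 0 case
        x = x + drift * dt + c * np.sqrt(dt) * rng.standard_normal(n_paths)
    return x

ends = simulate_pinned_sde(x0=0.0, xT=2.0)
print(ends.mean(), ends.std())             # mean ~ 2.0, std -> 0 as dt -> 0
```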

To find the time derivative of q(\mathbf{x}_t\mid\mathbf{x}_T)=\int_{\mathbf{x}_0} p(\mathbf{x}_t\mid\mathbf{x}_0,\mathbf{x}_T)\,q_{\rm data}(\mathbf{x}_0\mid\mathbf{x}_T)\,d\mathbf{x}_0, we can simply marginalize out \mathbf{x}_0 with distribution q_{\rm data}(\mathbf{x}_0\mid\mathbf{x}_T) in the resulting Fokker-Planck equation, which is possible because the expectation is linear with respect to \mathbf{x}_0. That is,

\mathbb{E}_{\mathbf{x}_0\sim q_{\rm data}(\mathbf{x}_0\mid\mathbf{x}_T=x_T)}\Big[\frac{\partial}{\partial t}p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0)\Big]    (50)
= -\nabla_{\mathbf{x}_t}\cdot\Big[\big(\mathbf{f}(\mathbf{x}_t,t)+g^{2}(t)\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t)\big)\,\mathbb{E}_{\mathbf{x}_0\sim q_{\rm data}(\mathbf{x}_0\mid\mathbf{x}_T=x_T)}\big[p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0)\big]\Big]
\quad +\frac{1}{2}g^{2}(t)\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t}\,\mathbb{E}_{\mathbf{x}_0\sim q_{\rm data}(\mathbf{x}_0\mid\mathbf{x}_T=x_T)}\big[p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0)\big]

For t\in[0,T-c] with some c>0, Doob's h-function is well-defined and p(\mathbf{x}_t\mid\mathbf{x}_0,\mathbf{x}_T) is smooth, so we can move the expectation inside the derivatives. Additionally, since the drift adjustment \nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t) does not depend on \mathbf{x}_0, the expectation acts only on p(\mathbf{x}_t\mid\mathbf{x}_T=x_T,\mathbf{x}_0), and by definition the left-hand side is q(\mathbf{x}_t\mid\mathbf{x}_T=x_T):

\frac{\partial}{\partial t}q(\mathbf{x}_t\mid\mathbf{x}_T=x_T) = -\nabla_{\mathbf{x}_t}\cdot\Big[\big(\mathbf{f}(\mathbf{x}_t,t)+g^{2}(t)\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t)\big)\,q(\mathbf{x}_t\mid\mathbf{x}_T=x_T)\Big]    (51)
\quad +\frac{1}{2}g^{2}(t)\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t}\,q(\mathbf{x}_t\mid\mathbf{x}_T=x_T)

This characterizes the reverse SDE specified in Theorem 1.

We can further use the conversion trick of Song et al. (2020b) to convert this into a continuity equation without a diffusion term,

\frac{\partial}{\partial t}q(\mathbf{x}_t\mid\mathbf{x}_T=x_T) = \nabla_{\mathbf{x}_t}\cdot\Big[\tilde{\mathbf{f}}(\mathbf{x}_t,t)\,q(\mathbf{x}_t\mid\mathbf{x}_T=x_T)\Big]    (52)

where

\tilde{\mathbf{f}}(\mathbf{x}_t,t) = \mathbf{f}(\mathbf{x}_t,t)+g^{2}(t)\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t)-\frac{1}{2}g^{2}(t)\nabla_{\mathbf{x}_t}\log q(\mathbf{x}_t\mid\mathbf{x}_T=x_T)    (53)
= \mathbf{f}(\mathbf{x}_t,t)-g^{2}(t)\Big(\frac{1}{2}\nabla_{\mathbf{x}_t}\log q(\mathbf{x}_t\mid\mathbf{x}_T=x_T)-\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t)\Big)    (54)

A.4 Special Cases of Denoising Diffusion Bridges

Unconditional diffusion models. We first give a general intuition that the marginal distribution of \mathbf{x}_t sampled from the bridge is the same as sampling marginally from p(\mathbf{x}_t\mid\mathbf{x}_0) for a diffusion transition kernel p(\cdot). We can see this by observing

\mathbf{x}_t = \frac{\text{SNR}_T}{\text{SNR}_t}\frac{\alpha_t}{\alpha_T}\mathbf{x}_T + \alpha_t\mathbf{x}_0\Big(1-\frac{\text{SNR}_T}{\text{SNR}_t}\Big) + \sigma_t\sqrt{1-\frac{\text{SNR}_T}{\text{SNR}_t}}\,\bm{\epsilon}_1    (55)

where \bm{\epsilon}_1\sim\mathcal{N}(\bm{0},\bm{I}). Since we assume \mathbf{x}_T\sim\mathcal{N}(\alpha_T\mathbf{x}_0,\sigma_T^2\bm{I}), we can rewrite the above equation as

\mathbf{x}_t = \frac{\text{SNR}_T}{\text{SNR}_t}\frac{\alpha_t}{\alpha_T}(\alpha_T\mathbf{x}_0+\sigma_T\bm{\epsilon}_2) + \alpha_t\mathbf{x}_0\Big(1-\frac{\text{SNR}_T}{\text{SNR}_t}\Big) + \sigma_t\sqrt{1-\frac{\text{SNR}_T}{\text{SNR}_t}}\,\bm{\epsilon}_1    (56)
= \frac{\text{SNR}_T}{\text{SNR}_t}\alpha_t\mathbf{x}_0 + \frac{\text{SNR}_T}{\text{SNR}_t}\frac{\alpha_t}{\alpha_T}\sigma_T\bm{\epsilon}_2 + \alpha_t\mathbf{x}_0\Big(1-\frac{\text{SNR}_T}{\text{SNR}_t}\Big) + \sigma_t\sqrt{1-\frac{\text{SNR}_T}{\text{SNR}_t}}\,\bm{\epsilon}_1    (57)
= \alpha_t\mathbf{x}_0 + \sigma_t\bm{\epsilon}    (58)

where \bm{\epsilon}\sim\mathcal{N}(\bm{0},\bm{I}), and the last equality follows because the sum of two independent Gaussians with variances \sigma_1^2 and \sigma_2^2 is a Gaussian with variance \sigma_1^2+\sigma_2^2.
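This identity is easy to verify numerically. The sketch below (ours) instantiates the VE schedule \alpha_t = 1, \sigma_t = t, draws \mathbf{x}_T from the diffusion kernel, samples \mathbf{x}_t from the bridge via Eq. (55), and compares the resulting marginal against \alpha_t\mathbf{x}_0 + \sigma_t\bm{\epsilon}.

```python
import numpy as np

# Numerical check of Eqs. (55)-(58) (our sketch) for the VE schedule
# alpha_t = 1, sigma_t = t, where SNR_t = 1 / t^2 and SNR_T / SNR_t = t^2 / T^2.
rng = np.random.default_rng(0)
x0, T, t, n = 1.5, 1.0, 0.6, 1_000_000

r = t**2 / T**2                                   # SNR_T / SNR_t
xT = x0 + T * rng.standard_normal(n)              # x_T ~ N(x_0, sigma_T^2)
eps1 = rng.standard_normal(n)
xt_bridge = r * xT + (1 - r) * x0 + t * np.sqrt(1 - r) * eps1   # Eq. (55)
xt_diffusion = x0 + t * rng.standard_normal(n)                  # Eq. (58)

print(xt_bridge.mean(), xt_bridge.std())          # ~ (1.5, 0.6)
print(xt_diffusion.mean(), xt_diffusion.std())    # ~ (1.5, 0.6)
```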

Formally, to show that it is a special case, we first observe that the score-matching objective allows our network to approximate \nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t\mid\mathbf{x}_T), which is the conditional score of the diffusion transition kernel. We then show that the Fokker-Planck equation reduces to that of a diffusion when marginalizing out the dependency on \mathbf{x}_T.

From the proof of Theorem 1, we know that the Fokker-Planck equation for p(\mathbf{x}_t\mid\mathbf{x}_T) follows

\frac{\partial}{\partial t}p(\mathbf{x}_t\mid\mathbf{x}_T=x_T) = -\nabla_{\mathbf{x}_t}\cdot\Big[\big(\mathbf{f}(\mathbf{x}_t,t)+g^{2}(t)\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T=x_T\mid\mathbf{x}_t)\big)\,p(\mathbf{x}_t\mid\mathbf{x}_T=x_T)\Big]    (59)
\quad +\frac{1}{2}g^{2}(t)\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t}\,p(\mathbf{x}_t\mid\mathbf{x}_T=x_T)    (60)

Here we use p(\mathbf{x}_t\mid\mathbf{x}_T) because we are considering a diffusion process as a special case of the general q(\mathbf{x}_t\mid\mathbf{x}_T) introduced in Theorem 2. We can marginalize out \mathbf{x}_T such that

\frac{\partial}{\partial t}\mathbb{E}_{\mathbf{x}_T\sim p(\mathbf{x}_T)}\big[p(\mathbf{x}_t\mid\mathbf{x}_T)\big]
= \mathbb{E}_{\mathbf{x}_T\sim p(\mathbf{x}_T)}\Big[-\nabla_{\mathbf{x}_t}\cdot\Big[\big(\mathbf{f}(\mathbf{x}_t,t)+g^{2}(t)\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T\mid\mathbf{x}_t)\big)\,p(\mathbf{x}_t\mid\mathbf{x}_T)\Big]\Big]
\quad +\frac{1}{2}g^{2}(t)\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t}\,\mathbb{E}_{\mathbf{x}_T\sim p(\mathbf{x}_T)}\big[p(\mathbf{x}_t\mid\mathbf{x}_T)\big]    (61)

and so

\frac{\partial}{\partial t}p(\mathbf{x}_t) = -\nabla_{\mathbf{x}_t}\cdot\big(\mathbf{f}(\mathbf{x}_t,t)\,p(\mathbf{x}_t)\big) - g^{2}(t)\nabla_{\mathbf{x}_t}\cdot\mathbb{E}_{\mathbf{x}_T\sim p(\mathbf{x}_T)}\big[p(\mathbf{x}_t\mid\mathbf{x}_T)\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T\mid\mathbf{x}_t)\big]
\quad +\frac{1}{2}g^{2}(t)\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t}\,p(\mathbf{x}_t)    (62)

and the second term vanishes, as can be seen by writing the expectation explicitly:

\mathbb{E}_{\mathbf{x}_T\sim p(\mathbf{x}_T)}\big[p(\mathbf{x}_t\mid\mathbf{x}_T)\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T\mid\mathbf{x}_t)\big]
= \int_{\mathbf{x}_T} p(\mathbf{x}_T)\,p(\mathbf{x}_t\mid\mathbf{x}_T)\,\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T\mid\mathbf{x}_t)\,d\mathbf{x}_T    (63)
= p(\mathbf{x}_t)\int_{\mathbf{x}_T} p(\mathbf{x}_T\mid\mathbf{x}_t)\,\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T\mid\mathbf{x}_t)\,d\mathbf{x}_T    (64)
= p(\mathbf{x}_t)\int_{\mathbf{x}_T} \nabla_{\mathbf{x}_t}\, p(\mathbf{x}_T\mid\mathbf{x}_t)\,d\mathbf{x}_T    (65)
= p(\mathbf{x}_t)\,\nabla_{\mathbf{x}_t}\int_{\mathbf{x}_T} p(\mathbf{x}_T\mid\mathbf{x}_t)\,d\mathbf{x}_T    (66)
= \bm{0}    (67)

Therefore, the resulting Fokker-Planck equation is

\frac{\partial}{\partial t}p(\mathbf{x}_t) = -\nabla_{\mathbf{x}_t}\cdot\big(\mathbf{f}(\mathbf{x}_t,t)\,p(\mathbf{x}_t)\big) + \frac{1}{2}g^{2}(t)\nabla_{\mathbf{x}_t}\cdot\nabla_{\mathbf{x}_t}\,p(\mathbf{x}_t)    (68)

which is that of a regular diffusion. Therefore, by setting the data distribution q_{\rm data}(\mathbf{x}_0,\mathbf{x}_T) to be p(\mathbf{x}_T\mid\mathbf{x}_0)\,q_{\rm data}(\mathbf{x}_0), we recover unconditional diffusion models.

OT-Flow Matching and Rectified Flow. As proposed, we use a VE schedule such that \mathbf{f}(\mathbf{x}_t,t)=\bm{0} and \sigma_t^2=c^2 t for some constant c\in[0,1]. Then the probability flow ODE conditioned on \mathbf{x}_0,\mathbf{x}_T becomes

d\mathbf{x}_t = -c^2\Big[\frac{1}{2}\nabla_{\mathbf{x}_t}\log q(\mathbf{x}_t\mid\mathbf{x}_0,\mathbf{x}_T) - \nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T\mid\mathbf{x}_t)\Big]\,dt    (69)

Specifically, the drift term D = -c^2\big[\frac{1}{2}\nabla_{\mathbf{x}_t}\log q(\mathbf{x}_t\mid\mathbf{x}_0,\mathbf{x}_T) - \nabla_{\mathbf{x}_t}\log p(\mathbf{x}_T\mid\mathbf{x}_t)\big] becomes

D = -\frac{1}{2}c^2\Bigg[-\frac{\bm{\epsilon}}{c\sqrt{t(1-\frac{t}{T})}} + 2\,\frac{\frac{t}{T}\mathbf{x}_T+(1-\frac{t}{T})\mathbf{x}_0+c\sqrt{t(1-\frac{t}{T})}\,\bm{\epsilon}-\mathbf{x}_T}{c^2(T-t)}\Bigg]    (70)

where \mathbf{x}_t = \frac{t}{T}\mathbf{x}_T + (1-\frac{t}{T})\mathbf{x}_0 + c\sqrt{t(1-\frac{t}{T})}\,\bm{\epsilon}. Rearranging the terms,

D = -\frac{\frac{t}{T}\mathbf{x}_T+(1-\frac{t}{T})\mathbf{x}_0-\mathbf{x}_T}{T-t} + \mathcal{O}(c)    (71)
= -\frac{(1-\frac{t}{T})\mathbf{x}_0-(1-\frac{t}{T})\mathbf{x}_T}{T-t} + \mathcal{O}(c)    (72)
= \frac{\mathbf{x}_T-\mathbf{x}_0}{T} + \mathcal{O}(c)    (73)

Taking c\to 0 with T=1, we have \lim_{c\to 0}D = \mathbf{x}_1 - \mathbf{x}_0. Therefore, in the noiseless limit of the denoising diffusion bridge, the network learns to match the same drift term as in the OT-Flow-Matching and Rectified Flow case.

We next note that the original score-matching loss is no longer valid, as the bridge noise \hat{\sigma}_t\to 0 causes the magnitude of the bridge score \nabla_{\mathbf{x}_t}\log q(\mathbf{x}_t\mid\mathbf{x}_0,\mathbf{x}_T) to explode. We can instead match against \lim_{c\to 0}D directly.

One additional caveat is that our framework as presented takes \mathbf{x}_T as an additional condition. To handle this, we note that the generalized parameterization can be used to define s_\theta(\mathbf{x}_t,\mathbf{x}_T,t) = c_{\text{skip1}}(t)\mathbf{x}_t + c_{\text{skip2}}(t)\mathbf{x}_T + c_{\text{out}}(t)V_\theta(\mathbf{x}_t,\mathbf{x}_T,t), where V_\theta is our actual network. We then set c_{\text{skip1}}(t)=c_{\text{skip2}}(t)=0 and c_{\text{out}}(t)=1 and use the loss \mathbb{E}_{\mathbf{x}_t,t}\big[\|s_\theta(\mathbf{x}_t,\mathbf{x}_T,t)-(\mathbf{x}_T-\mathbf{x}_0)\|^2\big] = \mathbb{E}_{\mathbf{x}_t,t}\big[\|V_\theta(\mathbf{x}_t,\mathbf{x}_T,t)-(\mathbf{x}_T-\mathbf{x}_0)\|^2\big], which recovers the OT-Flow-Matching and Rectified Flow case.
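A minimal sketch of this drift-matching objective (ours; `v_theta` is a hypothetical network over (\mathbf{x}_t, \mathbf{x}_T, t), and T=1 is assumed):

```python
import torch

# Sketch (ours): with c_skip1 = c_skip2 = 0 and c_out = 1, the network directly
# regresses the constant drift x_T - x_0, i.e. the OT-Flow-Matching /
# Rectified Flow objective in the noiseless (c -> 0) limit.
def drift_matching_loss(v_theta, x0, xT):
    t = torch.rand(x0.shape[0], device=x0.device)     # t ~ U[0, 1]
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))          # broadcast over data dims
    xt = (1 - t_) * x0 + t_ * xT                      # noiseless bridge: straight line
    target = xT - x0                                  # lim_{c -> 0} D
    return ((v_theta(xt, xT, t) - target) ** 2).mean()
```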

A.5 Generalized Parameterization

We now derive the EDM scaling functions from first principles, following Karras et al. (2022).

Let a_t = \frac{\alpha_t}{\alpha_T}\frac{\text{SNR}_T}{\text{SNR}_t}, b_t = \alpha_t\big(1-\frac{\text{SNR}_T}{\text{SNR}_t}\big), and c_t = \sigma_t^2\big(1-\frac{\text{SNR}_T}{\text{SNR}_t}\big). First, we expand the pred-\mathbf{x} objective as

\mathbb{E}_{\mathbf{x}_t,\mathbf{x}_0,\mathbf{x}_T,t}\Big[\tilde{w}(t)\,\|c_{\text{skip}}(t)\mathbf{x}_t + c_{\text{out}}(t)F_\theta(c_{\text{in}}(t)\mathbf{x}_t, c_{\text{noise}}(t)) - \mathbf{x}_0\|^2\Big]

where \mathbf{x}_t = a_t\mathbf{x}_T + b_t\mathbf{x}_0 + \sqrt{c_t}\,\bm{\epsilon} for \bm{\epsilon}\sim\mathcal{N}(\bm{0},\bm{I}). To derive c_{\text{in}}(t), we set the variance of the scaled input c_{\text{in}}(t)\mathbf{x}_t to 1:

c_{\text{in}}^2(t)\big(a_t^2\sigma_T^2 + b_t^2\sigma_0^2 + 2a_tb_t\sigma_{0T} + c_t\big) = 1    (74)
\implies c_{\text{in}}(t) = \frac{1}{\sqrt{a_t^2\sigma_T^2 + b_t^2\sigma_0^2 + 2a_tb_t\sigma_{0T} + c_t}}    (75)

For simplicity, we denote the neural network output as F_\theta; the inner squared loss can then be expanded as

\tilde{w}(t)\,\|c_{\text{skip}}(t)\big(a_t\mathbf{x}_T + b_t\mathbf{x}_0 + \sqrt{c_t}\,\bm{\epsilon}\big) + c_{\text{out}}(t)F_\theta - \mathbf{x}_0\|^2    (76)
= \tilde{w}(t)\,c_{\text{out}}^2(t)\,\Big\|F_\theta - \frac{1}{c_{\text{out}}(t)}\Big(\big[1-c_{\text{skip}}(t)b_t\big]\mathbf{x}_0 - c_{\text{skip}}(t)\big[a_t\mathbf{x}_T + \sqrt{c_t}\,\bm{\epsilon}\big]\Big)\Big\|^2    (77)

We want the prediction target to have unit variance, thus

\frac{1}{c_{\text{out}}^2(t)}\Big(\big[1-c_{\text{skip}}(t)b_t\big]^2\sigma_0^2 + c_{\text{skip}}^2(t)\big[a_t^2\sigma_T^2 + c_t\big] - 2\big[1-c_{\text{skip}}(t)b_t\big]c_{\text{skip}}(t)\,a_t\sigma_{0T}\Big) = 1    (78)

and

c_{\text{out}}^2(t) = \big[1-c_{\text{skip}}(t)b_t\big]^2\sigma_0^2 + c_{\text{skip}}^2(t)\big[a_t^2\sigma_T^2 + c_t\big] - 2\big[1-c_{\text{skip}}(t)b_t\big]c_{\text{skip}}(t)\,a_t\sigma_{0T}    (79)

Following the reasoning in Karras et al. (2022), we minimize c_{\text{out}}^2(t) w.r.t. c_{\text{skip}}(t) by taking the derivative and setting it to 0:

-2\big(1-c_{\text{skip}}(t)b_t\big)b_t\sigma_0^2 + 2c_{\text{skip}}(t)\big(a_t^2\sigma_T^2 + c_t\big) - 2\big(1-2c_{\text{skip}}(t)b_t\big)a_t\sigma_{0T} = 0    (80)

and this implies

c_{\text{skip}}(t) = \frac{b_t\sigma_0^2 + a_t\sigma_{0T}}{a_t^2\sigma_T^2 + b_t^2\sigma_0^2 + 2a_tb_t\sigma_{0T} + c_t}    (81)
= \big(b_t\sigma_0^2 + a_t\sigma_{0T}\big)\,c_{\text{in}}^2(t)    (82)

And

c_{\text{out}}^2(t) = \sigma_0^2 - 2c_{\text{skip}}(t)b_t\sigma_0^2 + \big(b_t\sigma_0^2 + a_t\sigma_{0T}\big)c_{\text{skip}}(t) - 2c_{\text{skip}}(t)a_t\sigma_{0T}    (83)
= \sigma_0^2 - \big(b_t\sigma_0^2 + a_t\sigma_{0T}\big)c_{\text{skip}}(t)    (84)
= \frac{a_t^2(\sigma_0^2\sigma_T^2 - \sigma_{0T}^2) + \sigma_0^2 c_t}{a_t^2\sigma_T^2 + b_t^2\sigma_0^2 + 2a_tb_t\sigma_{0T} + c_t}    (85)
\implies c_{\text{out}}(t) = \sqrt{a_t^2(\sigma_0^2\sigma_T^2 - \sigma_{0T}^2) + \sigma_0^2 c_t}\;c_{\text{in}}(t)    (86)

Finally, \tilde{w}(t)\,c_{\text{out}}^2(t) = 1 \implies \tilde{w}(t) = 1/c_{\text{out}}^2(t). For the time conditioning c_{\text{noise}}(t), we simply reuse the choice proposed in Karras et al. (2022), since the distribution of time does not change significantly.
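For reference, the closed forms (75), (82), and (86), together with \tilde{w}(t) = 1/c_{\text{out}}^2(t), translate directly into code. The sketch below (ours) assumes the schedule coefficients a_t, b_t, c_t and the data statistics \sigma_0^2, \sigma_T^2, \sigma_{0T} are given.

```python
import numpy as np

# Sketch (ours) of the generalized scaling functions, Eqs. (75), (82), (86).
# sigma0sq = Var(x_0), sigmaTsq = Var(x_T), sigma0T = Cov(x_0, x_T).
def bridge_scalings(a_t, b_t, c_t, sigma0sq, sigmaTsq, sigma0T):
    denom = a_t**2 * sigmaTsq + b_t**2 * sigma0sq + 2 * a_t * b_t * sigma0T + c_t
    c_in = 1.0 / np.sqrt(denom)                                     # Eq. (75)
    c_skip = (b_t * sigma0sq + a_t * sigma0T) * c_in**2             # Eq. (82)
    c_out = np.sqrt(a_t**2 * (sigma0sq * sigmaTsq - sigma0T**2)
                    + sigma0sq * c_t) * c_in                        # Eq. (86)
    weight = 1.0 / c_out**2                                         # w~(t)
    return c_in, c_skip, c_out, weight
```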

EDM (Karras et al., 2022) as a special case. In the case of unconditional diffusion models, we have \mathbf{x}_T = \mathbf{x}_0 + T\bm{\epsilon}, so \sigma_T^2 = \sigma_0^2 + T^2 and \sigma_{0T} = \sigma_0^2. Additionally, a_t = t^2/T^2, b_t = 1-t^2/T^2, and c_t = t^2(1-t^2/T^2). Substituting these into the coefficients, we have

c_{\text{in}}(t) = \frac{1}{\sqrt{\frac{t^4}{T^4}(\sigma_0^2+T^2) + (1-\frac{t^2}{T^2})^2\sigma_0^2 + 2\frac{t^2}{T^2}(1-\frac{t^2}{T^2})\sigma_0^2 + t^2(1-\frac{t^2}{T^2})}}    (87)
= \frac{1}{\sqrt{\frac{t^4}{T^4}\sigma_0^2 + \frac{t^4}{T^2} + (1-\frac{t^2}{T^2})^2\sigma_0^2 + 2\frac{t^2}{T^2}\sigma_0^2 - 2\frac{t^4}{T^4}\sigma_0^2 + t^2 - \frac{t^4}{T^2}}}    (88)
= \frac{1}{\sqrt{\sigma_0^2 + t^2}}    (89)
c_{\text{skip}}(t) = \frac{(1-\frac{t^2}{T^2})\sigma_0^2 + \frac{t^2}{T^2}\sigma_0^2}{\sigma_0^2 + t^2}    (90)
= \frac{\sigma_0^2}{\sigma_0^2 + t^2}    (91)
c_{\text{out}}(t) = \sqrt{\frac{t^4}{T^4}\big(\sigma_0^2(\sigma_0^2+T^2) - \sigma_0^4\big) + \sigma_0^2 t^2(1-\frac{t^2}{T^2})}\;c_{\text{in}}(t)    (92)
= \sqrt{\frac{t^4}{T^4}\big(\sigma_0^4 + \sigma_0^2 T^2 - \sigma_0^4\big) + \sigma_0^2 t^2(1-\frac{t^2}{T^2})}\;c_{\text{in}}(t)    (93)
= \sqrt{\frac{t^4}{T^2}\sigma_0^2 + \sigma_0^2 t^2 - \sigma_0^2\frac{t^4}{T^2}}\;c_{\text{in}}(t)    (94)
= \frac{\sigma_0 t}{\sqrt{\sigma_0^2 + t^2}}    (95)

And \tilde{w}(t) = 1/c_{\text{out}}^2(t) = (\sigma_0^2+t^2)/(\sigma_0^2 t^2) = 1/t^2 + 1/\sigma_0^2.
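A quick numerical check (ours, reusing `bridge_scalings` from the sketch above) confirms that the general formulas collapse to the EDM scalings under these substitutions:

```python
import numpy as np

# Check (ours) of the EDM special case; requires bridge_scalings from above.
T, t, s0sq = 80.0, 3.7, 0.25
a_t, b_t = t**2 / T**2, 1 - t**2 / T**2
c_t = t**2 * (1 - t**2 / T**2)
c_in, c_skip, c_out, w = bridge_scalings(a_t, b_t, c_t, s0sq, s0sq + T**2, s0sq)
assert np.isclose(c_in, 1 / np.sqrt(s0sq + t**2))                   # Eq. (89)
assert np.isclose(c_skip, s0sq / (s0sq + t**2))                     # Eq. (91)
assert np.isclose(c_out, np.sqrt(s0sq) * t / np.sqrt(s0sq + t**2))  # Eq. (95)
assert np.isclose(w, 1 / t**2 + 1 / s0sq)                           # w~(t)
```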

A.6 Sampler Discretization

EDM introduces the Heun sampler, which discretizes the sampling steps into t_0 < t_1 < \dots < t_N, where

t_{i>0} = \Big(T^{\frac{1}{\rho}} + \frac{N-i}{N-1}\big(t_{\text{min}}^{\frac{1}{\rho}} - T^{\frac{1}{\rho}}\big)\Big)^{\rho} \quad\text{and}\quad t_0 = 0    (96)

and \rho=7 is the default choice. The sampler then integrates the probability flow ODE path with second-order Heun steps at each such discretization step. We reuse this discretization for all our experiments.
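A minimal sketch (ours) of the discretization in Eq. (96):

```python
import numpy as np

# Sketch (ours) of the EDM-style time discretization, Eq. (96).
def edm_time_steps(N, T, t_min, rho=7.0):
    i = np.arange(1, N + 1, dtype=np.float64)
    t = (T ** (1 / rho)
         + (N - i) / (N - 1) * (t_min ** (1 / rho) - T ** (1 / rho))) ** rho
    return np.concatenate(([0.0], t))   # t_0 = 0 < t_1 = t_min < ... < t_N = T

print(edm_time_steps(N=18, T=1.0, t_min=1e-4))
```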

Appendix B Experiment Details

Architecture. For unconditional generation, architectures are reused from Karras et al. (2022) for both CIFAR-10 and FFHQ-64×64. For pixel-space translation, we use the ADM (Dhariwal and Nichol, 2021) architecture for both 64×64 and 256×256 resolutions. For latent-space translation, which reduces to 32×32 resolution in the latent space, we use the ADM architecture for 64×64 resolution but change the channel dimension from 192 to 256 and reduce the number of residual blocks from 3 to 2; everything else is kept the same as for 64×64 resolution. We use 0.1 dropout for all models. Conditioning is done via concatenation at the input level.

VE and VP bridge parameterization. There are many schedules we can choose for both types of bridges. For all our experiments, VE bridges follow \sigma_t = t and \alpha_t = 1, and VP bridges follow a time-invariant drift \mathbf{f}(\mathbf{x}_t,t) = -0.5\beta_0\mathbf{x}_t. This is a special case of the linear noise schedule considered in Song et al. (2020b) with \mathbf{f}(\mathbf{x}_t,t) = \big(-0.5t(\beta_1-\beta_0) - 0.5\beta_0\big)\mathbf{x}_t. We simply choose \beta_1 = \beta_0 because this results in a bridge with noise levels that are symmetric w.r.t. time. We observe that a linearly increasing drift shifts the maximum noise toward higher t, and the noise then decays to 0 at t=T faster than for a symmetric bridge; this makes the learning process more difficult and degrades performance. A sketch of the resulting marginal schedules is given below.
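The sketch below (ours) writes out the marginal schedules just described, assuming the standard interpretation of the reference SDEs; the VP form in particular assumes d\mathbf{x} = -0.5\beta_0\mathbf{x}\,dt + \sqrt{\beta_0}\,d\mathbf{w}.

```python
import numpy as np

# Sketch (ours) of the marginal schedules: VE with sigma_t = t, alpha_t = 1,
# and constant-beta VP with dx = -0.5 * beta0 * x dt + sqrt(beta0) dW.
def ve_schedule(t):
    t = np.asarray(t, dtype=np.float64)
    return np.ones_like(t), t                        # (alpha_t, sigma_t)

def vp_schedule(t, beta0):
    t = np.asarray(t, dtype=np.float64)
    alpha_t = np.exp(-0.5 * beta0 * t)
    sigma_t = np.sqrt(1.0 - np.exp(-beta0 * t))
    return alpha_t, sigma_t                          # (alpha_t, sigma_t)
```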

Training. We use the AdamW optimizer with a 0.0001 learning rate and no weight decay. The batch size is 256 for all image sizes below 256×256, and training is done on 4 NVIDIA A100 40G GPUs. For 256×256 resolution, the batch size is 4 with 4 gradient-accumulation steps, for an effective batch size of 64, trained on 4 NVIDIA A100 40G GPUs. Training is terminated at 500K iterations. During training, for image-to-image translation we set \sigma_0 = \sigma_T = 0.5 and \sigma_{0T} = \sigma_0^2/2, and for unconditional generation we set \sigma_0 = 0.5, \sigma_T = \sqrt{\sigma_0^2 + T^2}, and \sigma_{0T} = \sigma_0^2. We use random flipping as our data augmentation for image-to-image translation and reuse the augmentations from Karras et al. (2022) for generation.
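For reproducibility, the stated optimizer setup and gradient accumulation translate to roughly the following (a sketch, assuming PyTorch; the network and loss are placeholders):

```python
import torch
import torch.nn as nn

# Sketch (ours): AdamW, lr 1e-4, no weight decay; for 256x256, batch 4 with
# 4-step gradient accumulation (effective batch size 64).
net = nn.Linear(8, 8)                       # stand-in for the actual network
opt = torch.optim.AdamW(net.parameters(), lr=1e-4, weight_decay=0.0)
accum = 4
for step in range(16):
    batch = torch.randn(4, 8)               # placeholder mini-batch
    loss = net(batch).pow(2).mean()         # placeholder loss
    (loss / accum).backward()               # accumulate gradients
    if (step + 1) % accum == 0:
        opt.step()
        opt.zero_grad()
```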

Baselines. All baselines are trained using the same architecture as ours for each experiment. For SDEdit, we use a pretrained EDM model on \mathbf{x}_0 and conduct image-to-image translation by first noising \mathbf{x}_T and then denoising with the pretrained model. We reuse the noise schedule proposed by Karras et al. (2022); to obtain reasonable generations while retaining the global structure of the conditioning image, we noise \mathbf{x}_T using the noise variance indexed at 1/3 of the EDM noise schedule and denoise from this noised image over the remaining 1/3 of the total N steps. For DDIB, we train two separate unconditional models for \mathbf{x}_0 and \mathbf{x}_T and perform translation by reversing DDIM starting from \mathbf{x}_T and then generating with DDIM toward \mathbf{x}_0. We reuse the original baseline code for all baselines.

Sampling. For all experiments we evaluate models in a low-step regime, i.e., with the same number of sampling steps. For all experiments, we set the guidance scale w = 0.5, and for image translation and unconditional generation we use Euler step ratios s = 0.33 and s = 0, respectively. When s = 0, no Euler step is taken. With these settings, we use N = 18, or NFE = 53, for 32×32 resolution image translation, and N = 40, or NFE = 118, for image translation at all other resolutions. For unconditional generation, N = 18 \implies NFE = 36 for CIFAR-10 and N = 40 \implies NFE = 79 for FFHQ-64×64. FID and IS scores are calculated against the entire training set for all image translation tasks, and using 50K samples for unconditional generation tasks.

B.1 Additional Results.

Latent space translation. Many real-world applications of diffusion models rely on the existence of a latent space (Rombach et al., 2022), which significantly relieves the computational burden in practice. We therefore also investigate whether DDBM can naturally adapt to latent-space translation tasks. We conduct our experiment on Day→Night (Isola et al., 2017), chosen because we observe that the autoencoder of Rombach et al. (2022) with a factor-8 resolution reduction can faithfully reconstruct the dataset at high quality. We reuse the metrics and baselines from before, and we evaluate all models with N = 18 using the EDM sampler default for 32×32 images, which is the latent-space resolution. To isolate the evaluation of latent-space translation, we use the autoencoder reconstructions as pseudo ground-truths, against which the decoded samples are compared for all metrics. Quantitative results are presented in Table 5.

Day→Night-256×256
Method            FID ↓    IS ↑    LPIPS ↓    MSE ↓
Pix2Pix           157.1    2.48    0.622      0.148
SDEdit            151.1    2.93    0.582      0.412
DDIB              226.9    2.11    0.789      0.824
Rectified Flow    12.38    3.90    0.366      0.129
I2SB              15.56    4.03    0.363      0.412
DDBM              27.63    3.92    0.549      0.145
Table 5: Evaluation of latent-space image-to-image translation.

We notice that our model performs less well than Rectified Flow and I2SB but comes second in IS and MSE. We hypothesize that the predict-\mathbf{x} parameterization is optimized for pixel-space generation, and that the latent space contains structures that are more difficult for this parameterization to learn. Nevertheless, qualitative results from our model are of high quality, as shown in Figure 9.

Additional visualizations

Figure 6: Additional Edges→Handbags results.
Figure 7: Additional DIODE results.
Figure 8: Additional DIODE results.
Figure 9: Visualization for Day→Night translation. Top: day images. Bottom: night translations.