Flow matching and diffusion models are two popular frameworks in generative modeling. Despite seeming similar, there is some confusion in the community about their exact connection. In this post, we aim to clear up this confusion and show that diffusion models and Gaussian flow matching are the same, although different model specifications can lead to different network outputs and sampling schedules. This is great news: it means you can use the two frameworks interchangeably.
Flow matching has gained popularity recently, due to the simplicity of its formulation and the “straightness” of its induced sampling trajectories. This raises the commonly asked question:
"Which is better, diffusion or flow matching?"
As we will see, diffusion models and flow matching are equivalent (for the common special case that the source distribution used with flow matching corresponds to a Gaussian), so there is no single answer to this question. In particular, we will show how to convert one formalism to another. But why does this equivalence matter? Well, it allows you to mix and match techniques developed from the two frameworks. For example, after training a flow matching model, you can use either a stochastic or deterministic sampling method (contrary to the common belief that flow matching is always deterministic).
We will focus on the most commonly used flow matching formalism with the optimal transport path, which is closely related to rectified flow and stochastic interpolants. Our aim is not to recommend one approach over the other (both frameworks are valuable, each rooted in a distinct theoretical perspective, and it is in fact encouraging that they yield the same algorithm in practice), but to help practitioners understand and confidently use these frameworks interchangeably, while being aware of the true degrees of freedom one has when tuning the algorithm, regardless of what it is called.
Check this Google Colab for code used to produce plots and animations in this post.
We start with a quick overview of the two frameworks.
A diffusion process gradually destroys an observed datapoint over time by mixing it with Gaussian noise.
To generate new samples, we can “reverse” the forward process: we initialize the sample from a standard Gaussian. The randomness of a generated sample comes entirely from this initial Gaussian sample, and the whole reverse process is deterministic. We will discuss stochastic samplers later on.
In flow matching, we view the forward process as a linear interpolation between the data and a noise sample.
This corresponds to the diffusion forward process if the noise is Gaussian (a.k.a. Gaussian flow matching) and we use a particular noise schedule.
Using simple algebra, we can derive that
We initialize the sample from a standard Gaussian.
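As a quick sanity check, here is a minimal numpy sketch (the toy 1-D data distribution and sample count are arbitrary choices) showing that the flow matching linear interpolation coincides with the diffusion forward process z_t = α_t·x + σ_t·ε when α_t = 1 − t and σ_t = t:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, size=1000)   # toy 1-D "data" samples
eps = rng.normal(size=1000)          # standard Gaussian noise
t = 0.3                              # interpolation time in [0, 1]

# Flow matching view: linear interpolation between data and noise.
z_fm = (1 - t) * x + t * eps

# Diffusion view: z_t = alpha_t * x + sigma_t * eps with alpha_t = 1 - t,
# sigma_t = t (the schedule induced by the linear interpolation path).
alpha_t, sigma_t = 1 - t, t
z_diff = alpha_t * x + sigma_t * eps

assert np.allclose(z_fm, z_diff)     # the two forward processes are identical
```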
So far, we can already discern the similar essences in the two frameworks:
1. Same forward process, assuming that the source distribution of flow matching is Gaussian and that the noise schedule of the diffusion model takes a particular form.
2. "Similar" sampling processes: both follow an iterative update that involves a guess of the clean data at the current time step. (Spoiler: below we will show they are exactly the same!)
It is commonly thought that the two frameworks differ in how they generate samples: Flow matching sampling is deterministic with “straight” paths, while diffusion model sampling is stochastic and follows “curved paths”. Below, we clarify this misconception.
We will focus on deterministic sampling first, since it is simpler, and will discuss the stochastic case later on.
Imagine you want to use your trained denoiser model to transform random noise into a datapoint. Recall that the DDIM update is given by
Network Output | Reparametrization
---|---
Remember the flow matching update in Equation (4)? It should look similar. If we set the network output as in the last line of the table above, the two updates coincide:
Diffusion with DDIM sampler == Flow matching sampler (Euler).
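This equivalence is easy to check numerically. In the sketch below, the denoiser is an arbitrary toy stand-in (the identity is purely algebraic and holds for any network output): one DDIM step under the flow matching schedule α_t = 1 − t, σ_t = t matches one Euler step of the flow matching ODE.

```python
import numpy as np

# Toy denoiser standing in for a trained model: any function of (z, t) works,
# since the DDIM <-> Euler equivalence below is purely algebraic.
def x_hat(z, t):
    return np.tanh(z) * (1 - t)

rng = np.random.default_rng(0)
z_t = rng.normal(size=5)
t, s = 0.8, 0.7                          # one sampling step from t down to s

# Flow matching schedule: alpha_t = 1 - t, sigma_t = t.
xh = x_hat(z_t, t)
eps_hat = (z_t - (1 - t) * xh) / t       # implied noise estimate

# DDIM update: z_s = alpha_s * x_hat + (sigma_s / sigma_t) * (z_t - alpha_t * x_hat)
z_s_ddim = (1 - s) * xh + (s / t) * (z_t - (1 - t) * xh)

# Euler step of the flow matching ODE with vector field v_hat = eps_hat - x_hat
z_s_euler = z_t + (s - t) * (eps_hat - xh)

assert np.allclose(z_s_ddim, z_s_euler)
```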
Some other comments on the DDIM sampler:
The DDIM sampler analytically integrates the reparametrized sampling ODE, and therefore incurs no integration error if the network output is constant over time.
The DDIM sampler is invariant to a linear scaling applied to the noise schedule
To validate Claim 2, we present the results obtained using several noise schedules, each of which follows a flow matching schedule with a different scaling factor.
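The invariance in Claim 2 can also be verified numerically. The sketch below (toy denoiser, and a hypothetical scaling factor c_t chosen to make the rescaled schedule variance preserving) runs DDIM under the flow matching schedule and under a rescaled version of it, with the denoiser adjusted for the rescaled inputs, and obtains identical final samples:

```python
import numpy as np

def x_hat(z, t):                      # toy denoiser (stand-in for a model)
    return np.tanh(z)

def ddim_sample(z, ts, alpha, sigma, denoise):
    for t, s in zip(ts[:-1], ts[1:]):
        xh = denoise(z, t)
        z = alpha(s) * xh + (sigma(s) / sigma(t)) * (z - alpha(t) * xh)
    return z

ts = np.linspace(1.0, 0.0, 101)
rng = np.random.default_rng(0)
z1 = rng.normal(size=5)               # same initial noise for both schedules

# Schedule A: the flow matching schedule.
a_A = lambda t: 1 - t
s_A = lambda t: t

# Schedule B: the same schedule rescaled by c_t (variance preserving).
c = lambda t: 1.0 / np.sqrt((1 - t) ** 2 + t ** 2)
a_B = lambda t: c(t) * (1 - t)
s_B = lambda t: c(t) * t
# A denoiser trained on schedule B sees inputs scaled by c_t:
denoise_B = lambda z, t: x_hat(z / c(t), t)

out_A = ddim_sample(z1, ts, a_A, s_A, x_hat)
out_B = ddim_sample(z1 * c(1.0), ts, a_B, s_B, denoise_B)

assert np.allclose(out_A, out_B / c(0.0))  # identical final samples
```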
Wait a second! People often say flow matching results in straight paths, but in the above figure, the sampling trajectories look curved.
Well, first, why do people say that?
If the model were perfectly confident about the data point it is moving toward, the path from noise to data would be a straight line under the flow matching noise schedule.
Straight-line ODEs would be great, because they incur no integration error whatsoever.
Unfortunately, the predictions are not of a single point; instead, they average over a larger distribution. And flowing straight to a point != flowing straight to a distribution.
In the interactive graph below, you can change the variance of the data distribution on the right-hand side using the slider.
Note how the variance-preserving schedule gives straighter paths for wide distributions,
while the flow matching schedule works better for narrow distributions.
Finding such straight paths for real-life datasets like images is of course much less straightforward. But the conclusion remains the same: The optimal integration method depends on the data distribution.
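As a toy illustration, for 1-D Gaussian data the posterior mean E[x | z_t] is available in closed form, so we can integrate the flow matching sampling ODE and measure how far each trajectory deviates from a straight line. The sketch below (with arbitrary toy parameters) shows that the trajectory is nearly straight for a narrow data distribution but visibly curved for a wide one:

```python
import numpy as np

def trajectory(z1, data_std, n_steps=1000, t_min=1e-3):
    """Euler-integrate the flow matching ODE for 1-D Gaussian data N(0, data_std^2),
    for which the posterior mean E[x | z_t] has a closed form."""
    ts = np.linspace(1.0, t_min, n_steps + 1)
    z, zs = z1, [z1]
    for t, s in zip(ts[:-1], ts[1:]):
        var = (1 - t) ** 2 * data_std ** 2 + t ** 2      # Var(z_t)
        x_hat = (1 - t) * data_std ** 2 * z / var        # E[x | z_t]
        eps_hat = (z - (1 - t) * x_hat) / t
        z = z + (s - t) * (eps_hat - x_hat)              # Euler step
        zs.append(z)
    return ts, np.array(zs)

def deviation_from_chord(ts, zs):
    # Max distance between the trajectory and the straight line joining its ends.
    chord = zs[-1] + (zs[0] - zs[-1]) * (ts - ts[-1]) / (ts[0] - ts[-1])
    return np.max(np.abs(zs - chord))

t_n, z_n = trajectory(1.5, data_std=0.05)   # narrow data distribution
t_w, z_w = trajectory(1.5, data_std=1.0)    # wide data distribution
assert deviation_from_chord(t_n, z_n) < deviation_from_chord(t_w, z_w)
```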
Two important takeaways from deterministic sampling:
确定性取样的两个重要启示
1. Equivalence in samplers: DDIM is equivalent to the flow matching sampler, and is invariant to a linear scaling to the noise schedule.
2. Straightness misnomer: The flow matching schedule is only straight for a model predicting a single point. For realistic distributions, other schedules can give straighter paths.
Diffusion models
are trained by estimating the clean data from its noisy version.
Flow matching also fits in the above training objective. Recall the conditional flow matching objective below:
Since
Below we summarize several network outputs proposed in the literature, including a few versions used by diffusion models and the one used by flow matching. They can be derived from each other given the current noisy data and the time step.
Network Output | Formulation | MSE on Network Output
---|---|---
In practice, however, the model output might make a difference. For example,
For a similar reason,
Therefore, a heuristic is to choose a network output that is a combination of the data and noise predictions.
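Under the flow matching schedule α_t = 1 − t, σ_t = t, these conversions between parametrizations are simple linear maps given z and t, as the sketch below checks (notation assumed here: x̂ for data prediction, ε̂ for noise prediction, v̂ = ε̂ − x̂ for the flow matching vector field):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # clean data
eps = rng.normal(size=4)      # noise
t = 0.4
z = (1 - t) * x + t * eps     # noisy input under the flow matching schedule

v = eps - x                   # flow matching vector field target

# Given z and t, the parametrizations are linear functions of one another:
x_from_v = z - t * v                  # data prediction from vector field
eps_from_v = z + (1 - t) * v          # noise prediction from vector field
v_from_x = (z - x) / t                # vector field from data prediction
v_from_eps = (eps - z) / (1 - t)      # vector field from noise prediction

assert np.allclose(x_from_v, x)
assert np.allclose(eps_from_v, eps)
assert np.allclose(v_from_x, v)
assert np.allclose(v_from_eps, v)
```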
The weighting function is the most important part of the loss. It balances the importance of high-frequency and low-frequency components in perceptual data such as images, videos, and audio. This is crucial, since certain high-frequency components in these signals are imperceptible to humans; when model capacity is limited, it is better not to waste that capacity on them. Viewing the loss through its weighting leads to the following non-obvious result:
Flow matching weighting == diffusion weighting of
That is, the conditional flow matching objective in Equation (7) is the same as a commonly used setting in diffusion models! See Appendix D.2-3 of the reference for a detailed derivation. Below we plot several weighting functions commonly used in the literature.
The flow matching weighting decreases exponentially as the log signal-to-noise ratio increases. A reweighted version of the flow matching weighting is very similar to the EDM weighting, which is popular in diffusion models.
We discuss the training noise schedule last, as it should be the least important to training for the following reasons:
A few takeaways for training of diffusion models / flow matching:
1. Equivalence in weightings: The weighting function is important for training, as it balances the importance of different frequency components of perceptual data. Flow matching weightings coincidentally match commonly used diffusion training weightings in the literature.
2. Insignificance of training noise schedule: The noise schedule is far less important to the training objective, but can affect the training efficiency.
3. Difference in network outputs: The network output proposed by flow matching is new, and it nicely balances the data and noise predictions.
In this section, we discuss different kinds of samplers in more detail.
The Reflow operation in flow matching connects noise and data points in a straight line.
One can obtain these (data, noise) pairs by running a deterministic sampler from noise.
A model can then be trained to directly predict the data given the noise avoiding the need for sampling.
In the diffusion literature, the same approach was one of the first distillation techniques.
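A minimal 1-D sketch of the idea, with a toy stand-in denoiser and a simple linear "student" in place of a real distilled network: we run the deterministic sampler to collect (noise, sample) pairs, then fit a one-step predictor on them.

```python
import numpy as np

def x_hat(z, t):                         # toy denoiser (stand-in for a model)
    return np.tanh(z)

def ddim(z, n_steps=100):
    # Deterministic DDIM sampler under the flow matching schedule (1 - t, t).
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    for t, s in zip(ts[:-1], ts[1:]):
        xh = x_hat(z, t)
        z = (1 - s) * xh + (s / t) * (z - (1 - t) * xh)
    return z

rng = np.random.default_rng(0)
noise = rng.normal(size=500)
data = ddim(noise.copy())                # (noise, sample) pairs from the ODE

# Distillation: regress the sampler output directly from the initial noise,
# avoiding the need for iterative sampling at generation time.
A = np.stack([noise, np.ones_like(noise)], axis=1)
coef, *_ = np.linalg.lstsq(A, data, rcond=None)
one_step = A @ coef

mse_fit = np.mean((one_step - data) ** 2)
mse_const = np.mean((data - data.mean()) ** 2)
assert mse_fit <= mse_const + 1e-12      # the student beats a constant predictor
```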
So far we have just discussed the deterministic sampler of diffusion models or flow matching. An alternative is to use stochastic samplers such as the DDPM sampler
Performing one DDPM sampling step going from
For individual samples, these updates behave quite differently: the reversed DDIM update consistently pushes each sample away from the modes of the distribution, while the diffusion update is entirely random. However, when aggregating all samples, the resulting distributions after the updates are identical. Consequently, if we perform a DDIM sampling step (without reversing the sign) followed by a forward diffusion step, the overall distribution remains unchanged from the one prior to these updates.
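For 1-D Gaussian data this can be checked in closed form: the deterministic (probability-flow) step is the exact quantile map in place of a learned model, and renoising with a forward diffusion step recovers the original marginal. A sketch with toy parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
t, s = 0.6, 0.4                         # step from t down to s, then renoise back

# 1-D Gaussian data x ~ N(0, 1); z_t = (1 - t) x + t * eps has variance m(t):
m = lambda u: (1 - u) ** 2 + u ** 2
z_t = rng.normal(size=n) * np.sqrt(m(t))

# Deterministic (probability flow / DDIM) step t -> s. For Gaussian data the
# exact flow map is the quantile map z_s = z_t * sqrt(m(s) / m(t)).
z_s = z_t * np.sqrt(m(s) / m(t))

# Forward diffusion (renoising) step s -> t with schedule (1 - u, u):
# z'_t = ((1-t)/(1-s)) z_s + sqrt(t^2 - ((1-t)/(1-s))^2 s^2) * fresh_noise
ratio = (1 - t) / (1 - s)
fresh = rng.normal(size=n)
z_t_again = ratio * z_s + np.sqrt(t ** 2 - ratio ** 2 * s ** 2) * fresh

# The marginal distribution at time t is unchanged (up to sampling error).
assert abs(np.var(z_t_again) - m(t)) < 0.01
```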
The fraction of the DDIM step to undo by renoising is a hyperparameter which we are free to choose (i.e., it does not have to be exactly half of the DDIM step), and which has been called the level of churn. Interestingly, adding churn to the sampler reduces the effect that model predictions made early in sampling have on the final sample, and increases the weight of later predictions, as illustrated below:
Here we ran different samplers for 100 sampling steps using a cosine noise schedule. Ignoring nonlinear interactions, the final sample produced by a sampler can be written as a weighted sum of the predictions made during sampling; DDPM places more weight on the predictions made toward the end of sampling.
We’ve observed the practical equivalence between diffusion models and flow matching algorithms. Here, for theoretical completeness, we formally describe the equivalence of the forward and sampling processes using ODEs and SDEs.
The forward process of diffusion models which gradually destroys a data over time can be described by the following stochastic differential equation (SDE):
where
where
Note that we have introduced an additional parameter
The interpolation between data and noise in flow matching can be described by the following ODE:
Assuming the interpolation is
The generative process simply reverses the ODE in time, replacing the clean data with the model's estimate. In this case, it can be generalized to an SDE:
Both frameworks are defined by three hyperparameters respectively:
From flow matching to diffusion:
In summary, aside from training considerations and sampler selection, diffusion and Gaussian flow matching exhibit no fundamental differences.
If you’ve read this far, hopefully we’ve convinced you that diffusion models and Gaussian flow matching are equivalent. However, we highlight two new model specifications that Gaussian flow matching brings to the field:
It would be interesting to investigate the importance of these two model specifications empirically in different real-world applications, which we leave to future work. It is also an exciting research area to apply flow matching to more general cases where the source distribution is non-Gaussian, e.g. for more structured data like proteins.
Thanks to our colleagues at Google DeepMind for fruitful discussions. In particular, thanks to Sander Dieleman, Ben Poole and Aleksander Hołyński.
@inproceedings{gao2025diffusionmeetsflow,
author = {Gao, Ruiqi and Hoogeboom, Emiel and Heek, Jonathan and Bortoli, Valentin De and Murphy, Kevin P. and Salimans, Tim},
title = {Diffusion Meets Flow Matching: Two Sides of the Same Coin},
year = {2024},
url = {https://diffusionflow.github.io/}
}