Lin Yang
Huawei Noah's Ark Lab
China

Junlong Lyu
Huawei Noah's Ark Lab
Hong Kong SAR, China

Wenlong Lyu
Huawei Noah's Ark Lab
China

Zhitang Chen
Huawei Noah's Ark Lab
Hong Kong SAR, China
Bayesian Optimization (BO) is a sample-efficient optimization algorithm widely employed across various applications. In some challenging BO tasks, input uncertainty arises due to the inevitable randomness in the optimization process, such as machining errors, execution noise, or contextual variability. This uncertainty deviates the input from the intended value before evaluation, causing significant performance fluctuations in the final result. In this paper, we introduce a novel robust Bayesian Optimization algorithm, AIRBO, which can effectively identify a robust optimum that performs consistently well under arbitrary input uncertainty. Our method directly models uncertain inputs of arbitrary distributions by empowering the Gaussian Process with the Maximum Mean Discrepancy (MMD) and further accelerates the posterior inference via Nyström approximation. A rigorous theoretical regret bound is established under the MMD estimation error, and extensive experiments on synthetic functions and real problems demonstrate that our approach can handle various input uncertainties and achieve state-of-the-art performance.
Bayesian Optimization (BO) is a powerful sequential decision-making algorithm for high-cost black-box optimization. Owing to its remarkable sample efficiency and capacity to balance exploration and exploitation, BO has been successfully applied in diverse domains, including neural architecture search [32], hyper-parameter tuning [4, 29, 12], and robotic control [18, 5], among others. Nevertheless, in some real-world problems, the stochastic nature of the optimization process, such as machining errors during manufacturing, execution noise in control, or variability in contextual factors, inevitably introduces input randomness, rendering the design parameter uncertain at evaluation time.
Figure 1: Robust Bayesian optimization problem.
To find a robust optimum, it is crucial to account for input uncertainty during the optimization process. Existing works [24, 3, 7, 10] along this direction assume that the exact input value, i.e., \(\tilde{x}\) in Figure 1b, can be observed.
An alternative approach involves directly modeling the uncertain inputs. A pioneering work by Moreno et al. [20] assumes a Gaussian input distribution and employs a symmetric Kullback-Leibler divergence (SKL) to measure the distance between input variables. Dallaire et al. [13] implement a Gaussian process model with an expected kernel and derive a closed-form solution by restricting the kernel to linear, quadratic, or squared exponential kernels and assuming Gaussian inputs. Nonetheless, the applicability of these methods is limited by their restrictive Gaussian input assumption and kernel choices. To surmount these limitations, Oliveira et al. [25] propose a robust Gaussian process model that incorporates the input distribution by computing an integral kernel. Although this kernel can be applied to various distributions and offers a rigorous regret bound, its posterior inference requires extensive sampling and can be time-consuming.
In this work, we propose an Arbitrary Input uncertainty Robust Bayesian Optimization algorithm (AIRBO). This algorithm can directly model uncertain inputs of arbitrary distribution and propagate the input uncertainty into the surrogate posterior, which can then be used to guide the search for a robust optimum. To achieve this, we employ a Gaussian Process (GP) as the surrogate and empower its kernel design with the Maximum Mean Discrepancy (MMD), which allows us to comprehensively compare the uncertain inputs in a Reproducing Kernel Hilbert Space (RKHS) and accurately quantify the target function under various input uncertainties (Sec. 3.1). Moreover, to stabilize the MMD estimation and accelerate the posterior inference, we utilize the Nyström approximation to reduce the space complexity of MMD estimation from \(O(m^2)\) to \(O(mh)\), where \(m\) is the sampling size and \(h \ll m\) is the sub-sampling size.
In this section, we first formalize the robust optimization problem under input uncertainty and then briefly review the intuition behind Bayesian Optimization and Gaussian Processes.
As illustrated in Figure 1b, we consider the optimization of an expensive black-box function under input uncertainty:

\[ x^{*} = \arg\max_{x \in \mathcal{X}} \; \mathbb{E}_{\tilde{x} \sim P_{x}}\left[ f(\tilde{x}) \right], \]

where the queried input \(x\) is perturbed to \(\tilde{x} \sim P_x\) before evaluation.
Depending on the specific problem and randomness source, the input distribution \(P_x\) can take various forms.
In this paper, we focus on finding the robust optimum with BO. Each iteration of BO involves two key steps: I) fitting a surrogate model and II) maximizing an acquisition function.
Gaussian Process Surrogate: To build a sample-efficient surrogate, we choose the Gaussian Process (GP) as the surrogate model in this paper. Following [34], a GP can be interpreted from a weight-space view: given a set of \(n\) observations \(\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}\) with \(y_i = f(x_i) + \epsilon_i\) and \(\epsilon_i \sim \mathcal{N}(0, \sigma_n^2)\), the target function is modeled as \(f(x) = \phi(x)^\top w\) with a Gaussian prior \(w \sim \mathcal{N}(0, \Sigma_p)\), where \(\phi(\cdot)\) projects the input into a (possibly infinite-dimensional) feature space. Conditioning on \(\mathcal{D}\) yields the predictive posterior

\[ f_{*} \mid x_{*}, \mathcal{D} \sim \mathcal{N}\!\big( k_{*}^\top (K + \sigma_n^2 I)^{-1} y,\; k(x_{*}, x_{*}) - k_{*}^\top (K + \sigma_n^2 I)^{-1} k_{*} \big), \]

where \(K_{ij} = k(x_i, x_j)\), \(k_{*} = [k(x_{*}, x_1), \dots, k(x_{*}, x_n)]^\top\), and the kernel \(k(x, x') = \phi(x)^\top \Sigma_p\, \phi(x')\) encodes the inner product of the projected inputs.
From this interpretation of GP, we note that its core idea is to project the input into a feature space and measure the similarity of inputs there via the kernel.
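For concreteness, a minimal NumPy sketch of the exact GP predictive equations above (the kernel choice, hyper-parameters, and function names here are illustrative, not the paper's implementation):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, X_star, noise=1e-2, lengthscale=1.0):
    """Exact GP predictive mean and variance at X_star given data (X, y)."""
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    k_s = rbf_kernel(X, X_star, lengthscale)
    alpha = np.linalg.solve(K, y)          # (K + sigma_n^2 I)^{-1} y
    mu = k_s.T @ alpha
    v = np.linalg.solve(K, k_s)
    var = rbf_kernel(X_star, X_star, lengthscale).diagonal() - (k_s * v).sum(axis=0)
    return mu, np.maximum(var, 0.0)        # clamp tiny negative variances
```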
Acquisition Function Optimization: Given the posterior of the GP surrogate model, the next step is to decide a query point for the next iteration by maximizing an acquisition function. A classic choice is the Upper Confidence Bound (UCB):

\[ x_{t+1} = \arg\max_{x \in \mathcal{X}} \; \mu_t(x) + \sqrt{\beta_t}\, \sigma_t(x), \tag{5} \]

where \(\mu_t\) and \(\sigma_t\) are the posterior mean and standard deviation, and \(\beta_t\) balances exploration against exploitation.
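Continuing the sketch above, step II can be as simple as scoring a set of candidates with UCB and querying the maximizer (the toy data and candidate grid below are illustrative; in practice the acquisition is maximized numerically):

```python
def ucb(mu, var, beta=2.0):
    """Upper Confidence Bound scores, in the form of Eq. 5."""
    return mu + np.sqrt(beta * var)

# Toy observations and random candidates for illustration.
X_train = np.random.uniform(0.0, 1.0, size=(5, 1))
y_train = np.sin(6.0 * X_train).ravel()
X_cand = np.random.uniform(0.0, 1.0, size=(256, 1))

mu, var = gp_posterior(X_train, y_train, X_cand)
x_next = X_cand[np.argmax(ucb(mu, var))]   # next query point
```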
To cope with randomness during the optimization process, we aim to build a robust surrogate that can directly accept uncertain inputs of arbitrary distributions and propagate the input uncertainty into the posterior. Inspired by the weight-space interpretation of GP, we empower the GP kernel with MMD to compare the uncertain inputs in RKHS. In this way, the input randomness is considered during the covariance computation and naturally reflected in the resulting posterior, which can then be used to guide the search for a robust optimum (Sec. 3.1). To further accelerate the posterior inference, we employ the Nyström approximation to stabilize the MMD estimation and reduce its space complexity (Sec. 3.2).
Assume each uncertain input is characterized by a distribution \(P_x\) over the input domain; to build a surrogate over such inputs, we need a principled way to measure the distance between distributions.
To do so, we turn to the Integral Probability Metric (IPM) [23]. The basic idea behind IPM is to define a distance measure between two distributions \(P\) and \(Q\) as the supremum of the mean discrepancy over a function class \(\mathcal{F}\):

\[ d_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)] \right|, \]

where different choices of the witness-function class \(\mathcal{F}\) recover different distribution metrics.
In this work, we choose MMD as the distance measure for the uncertain inputs because of its intrinsic connection with distance measurement in RKHS. Given a characteristic kernel \(k\) with associated RKHS \(\mathcal{H}\), MMD takes \(\mathcal{F}\) to be the unit ball of \(\mathcal{H}\) and admits a closed form in terms of the kernel mean embeddings [21]:

\[ \mathrm{MMD}(P, Q) = \left\| \mu_P - \mu_Q \right\|_{\mathcal{H}}, \qquad \mu_P = \mathbb{E}_{x \sim P}\left[ k(x, \cdot) \right]. \]
Without any additional assumption on the input distributions, except that we can draw samples \(\{x_i\}_{i=1}^{m} \sim P\) and \(\{y_j\}_{j=1}^{m} \sim Q\), MMD can be approximated by the unbiased empirical estimator [15]:

\[ \widehat{\mathrm{MMD}}^{2}(P, Q) = \frac{1}{m(m-1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{m(m-1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{m^2} \sum_{i, j} k(x_i, y_j). \tag{8} \]
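A direct implementation of the unbiased estimator in Eq. 8, reusing the rbf_kernel helper from the earlier sketch (a sketch, not the paper's released code):

```python
def mmd2_unbiased(X, Y, lengthscale=1.0):
    """Unbiased empirical estimate of MMD^2 between samples X ~ P and Y ~ Q."""
    Kxx = rbf_kernel(X, X, lengthscale)
    Kyy = rbf_kernel(Y, Y, lengthscale)
    Kxy = rbf_kernel(X, Y, lengthscale)
    m, n = len(X), len(Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))  # drop i = j terms
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()
```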
To integrate MMD into the GP surrogate, we design an MMD-based kernel over the uncertain inputs:

\[ k(P_x, P_{x'}) = \exp\!\left( -\frac{\mathrm{MMD}^{2}(P_x, P_{x'})}{2 l^{2}} \right), \]

with a learnable scaling parameter \(l > 0\).
With the MMD kernel, our surrogate model places a prior \(f \sim \mathcal{GP}\big(0, k(P_x, P_{x'})\big)\) over functions of the uncertain inputs. Given observations \(\{(P_{x_i}, y_i)\}_{i=1}^{n}\), the posterior follows the standard GP predictive equations in Sec. 2, where the Gram matrix is now computed between input distributions, \(K_{ij} = k(P_{x_i}, P_{x_j})\).
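To make the construction concrete, a sketch of the resulting Gram-matrix computation, assuming the exponentiated MMD kernel defined above; each uncertain input is represented by a set of samples, and in practice a small diagonal jitter keeps the matrix positive definite:

```python
def mmd_kernel(sets_a, sets_b, ls_mmd=1.0, ls_out=1.0):
    """Covariance matrix over uncertain inputs, each given as a sample set."""
    K = np.empty((len(sets_a), len(sets_b)))
    for i, Sa in enumerate(sets_a):
        for j, Sb in enumerate(sets_b):
            m2 = max(mmd2_unbiased(Sa, Sb, ls_mmd), 0.0)  # clamp negative estimates
            K[i, j] = np.exp(-m2 / (2.0 * ls_out**2))
    return K
```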
To derive the posterior distribution of our robust GP surrogate, we must estimate the MMD between each pair of inputs. Gretton et al. prove that the empirical estimator in Eq. 8 approximates MMD in a bounded and asymptotic way [15]. However, the sampling size \(m\) required for a stable estimate can be large, which quickly becomes a bottleneck for posterior inference.
To reduce the space and computation complexity while retaining a stable MMD estimation, we resort to the Nyström approximation [33]. This method alleviates the computational cost of the kernel matrix by randomly selecting \(h \ll m\) sub-samples as landmarks and approximating each kernel mean embedding in the induced low-rank subspace:

\[ \widehat{\mathrm{MMD}}_{\mathrm{nys}}^{2}(P, Q) = \left\| \hat{\mu}_P^{\mathrm{nys}} - \hat{\mu}_Q^{\mathrm{nys}} \right\|_{\mathcal{H}}^{2}, \qquad \hat{\mu}_P^{\mathrm{nys}} = \sum_{i=1}^{h} (\alpha_P)_i\, k(z_i, \cdot), \tag{12} \]

where \(\{z_i\}_{i=1}^{h}\) are the selected landmarks and the coefficients \(\alpha_P\) are obtained by projecting the empirical mean embedding onto their span.
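A sketch of the Nyström-approximated MMD with a shared landmark set (the paper's estimator follows [8]; the shared-landmark simplification and all names below are our illustrative assumptions):

```python
def mmd2_nystrom(X, Y, h=10, lengthscale=1.0, rng=None):
    """MMD^2 between sample sets X and Y, approximated on h shared landmarks."""
    rng = rng or np.random.default_rng(0)
    pool = np.concatenate([X, Y])
    Z = pool[rng.choice(len(pool), size=h, replace=False)]   # landmarks
    Kzz = rbf_kernel(Z, Z, lengthscale)
    # Project each empirical mean embedding onto span{k(z_i, .)}.
    a = np.linalg.pinv(Kzz) @ rbf_kernel(Z, X, lengthscale).mean(axis=1)
    b = np.linalg.pinv(Kzz) @ rbf_kernel(Z, Y, lengthscale).mean(axis=1)
    d = a - b
    return float(d @ Kzz @ d)   # ||mu_P - mu_Q||^2 within the landmark subspace
```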
Assume
We consider a more general case. Choosing any suitable functional
One important theoretical guarantee to conduct
which is just our objective function. For radial kernels
Suppose we have an approximation kernel function
where
Denote
Assumption 1. For any
Remark. Note that this assumption is standard in our case: we may assume
Now we restrict our Gaussian process to the subspace
Theorem 1. Let
we have that the uncertain-inputs cumulative regret satisfies:
with probability at least
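For orientation, exact-kernel analyses of this type ([9, 26]) yield cumulative-regret bounds of the following shape; Theorem 1 augments such a bound with a term for the MMD estimation error (\(\Delta_\varepsilon\) below is our illustrative placeholder for that term, not the paper's exact constant):

\[ R_T = \mathcal{O}\!\left( \sqrt{T\,\gamma_T}\,\Big( B + \sqrt{\gamma_T + \ln(1/\delta)} \Big) + T\,\Delta_\varepsilon \right), \]

where \(B\) bounds the RKHS norm of the objective and \(\gamma_T\) is the maximum information gain after \(T\) rounds.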
The proof of our main Theorem 1 can be found in Appendix B.3.
The assumption that
To achieve a regret of order
To analyze the exact order of
Theorem 2 (Bounding the maximum information gain). Suppose
Thus, when
Figure 2: Modeling results under different types of input uncertainties.
In this section, we first experimentally demonstrate AIRBO's ability to model uncertain inputs of arbitrary distributions, then validate the Nyström-based inference acceleration for the GP posterior, followed by experiments on robust optimization of synthetic functions and a real-world benchmark.
Modeling arbitrary uncertain inputs: We demonstrate MMDGP's capabilities by employing an RKHS function as the black-box function and randomly selecting 10 samples from its input domain. Various types of input randomness are introduced into the observations to produce a training dataset for each type of uncertainty.
To further examine our model's ability under complex input uncertainty, we design the input distribution to follow a beta distribution with input-dependent variance.
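As an illustration of such input-dependent noise, perturbed inputs could be generated as below (the parameterization is our hypothetical example, not the paper's exact design):

```python
def sample_uncertain_input(x, m=100, rng=None):
    """Draw m perturbed copies of a scalar query x from Beta-shaped noise
    whose concentration (hence variance) depends on x (hypothetical)."""
    rng = rng or np.random.default_rng()
    conc = 2.0 + 8.0 * x                      # larger x -> tighter noise (assumed)
    noise = rng.beta(conc, conc, size=(m, 1)) - 0.5
    return np.clip(x + 0.2 * noise, 0.0, 1.0)
```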
Moreover, we evaluate MMDGP using a step-changing Chi-squared input distribution.
Comparison with other surrogate models: We also compare our model with other surrogate models under the step-changing Chi-squared input distribution. The results, reported in Figure 7, demonstrate that our model clearly outperforms the others under such complex input uncertainty (see Appendix D.1 for more details).
Estimation variance of MMD: We first examine the variance of MMD estimation by employing two beta distributions as inputs.
We further utilize this beta distribution to compare the posteriors derived from the empirical and Nyström MMD estimators under varying sampling sizes.
Figure 3: The posterior derived from the empirical and Nyström MMD approximators with varying sampling sizes.
Table 1: Performance of posterior inference for 512 samples.

| Estimator | Sampling Size | Sub-sampling Size | Inference Time (seconds) | Batch Size (samples) |
|---|---|---|---|---|
| Empirical | 20 | - | | 512 |
| Empirical | 100 | - | 8.117 | 128 |
| Empirical | 1000 | - | | 1 |
| Nyström | 100 | 10 | 0.78 | 512 |
| Nyström | 1000 | 100 | | 128 |
For MMD estimation, it is crucial to note that a larger sampling size significantly increases GPU memory usage because of the estimator's space complexity, which is quadratic in both the number of training samples and the sampling size.
Table 1 summarizes the inference time of MMDGP posteriors at 512 samples with different sampling sizes. We find that, for the beta distribution defined in this experiment, the Nyström MMD estimator with a sampling size of 100 and a sub-sampling size of 10 already delivers a result comparable to the empirical estimator with 100 samples (as seen in the acquisition plot of Figure 3d). Also, the inference time is reduced from 8.117 to 0.78 seconds, as the smaller memory footprint permits parallel computation over more samples. For cases that require many more samples for MMD estimation (e.g., when the input distribution is quite complex or high-dimensional), this Nyström-based acceleration can have an even more pronounced impact.
Effect of the Nyström estimator on optimization: To investigate the effect of the Nyström estimator on optimization, we also perform an ablation study in Appendix D.2. The results in Figure 8 suggest that the Nyström estimator slightly degrades the optimization performance but greatly improves the inference efficiency.
Experiment setup: To experimentally validate AIRBO's performance, we implement our algorithm on top of BoTorch [2].
Figure 4: Robust optimization results on synthetic functions.
focus on non-robust optimization. 3) SKL-UCB employs the symmetric Kullback-Leibler divergence to measure the distance between uncertain inputs [20]. Its closed-form solution exists only if the input distributions are Gaussian. 4) ERBF-UCB is the robust GP with the expected Radial Basis Function kernel proposed in [13]. It computes the expected kernel under the input distribution using Gaussian integrals. Assuming the input distributions are sub-Gaussian, this method can efficiently find the robust optimum. Since all the methods use the UCB acquisition, we distinguish them simply by their surrogate names in the following tests.
At the end of the optimization, each algorithm needs to decide a final outcome \(x_{\mathrm{out}}\).
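A natural way to score this outcome, consistent with the uncertain-inputs regret analyzed in Sec. 4 (stated here as our reading, with \(x^*\) denoting the robust optimum):

\[ r(x_{\mathrm{out}}) = \mathbb{E}_{\tilde{x} \sim P_{x^{*}}}\!\left[ f(\tilde{x}) \right] - \mathbb{E}_{\tilde{x} \sim P_{x_{\mathrm{out}}}}\!\left[ f(\tilde{x}) \right]. \]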
1D RKHS function: We begin the optimization evaluation with an RKHS function that is widely used in previous BO works [1, 24, 10]. Figure 4a shows its exact global optimum, together with the robust optimum when the input follows a Gaussian distribution.
1D double-peak function: To test with more complex input uncertainty, we design a black-box function with double peaks and set the input distribution to be a multi-modal beta distribution. Figure 4c shows the black-box function (black solid line) and the corresponding function expectations estimated numerically by sampling from the input distribution (colored lines); note that the true robust optimum under this beta distribution deviates from the global optimum.
10D bumped-bowl function: We also extend our evaluation to a 10D bumped-bowl function [27] under a concatenated circular distribution. Figure 9 demonstrates that AIRBO scales efficiently to high dimensions and outperforms the others under complex input uncertainty (see Appendix D.3).
Robust robot pushing: To evaluate AIRBO on a real-world problem, we employ a robust robot pushing benchmark from [31], in which a ball is placed at the origin of a 2D space and a robot learns to push it to a predefined target location.
Figure 6: The robot's initial locations and push times found by different algorithms.
In this work, we generalize robust Bayesian Optimization to an uncertain-input setting. The weight-space interpretation of GP inspires us to empower the GP kernel with MMD and build a robust surrogate for uncertain inputs of arbitrary distributions. We also employ the Nyström approximation to accelerate the posterior inference and provide a theoretical regret bound under the approximation error. Experiments on synthetic black-box functions and benchmarks demonstrate that our method can handle various input uncertainties and achieves state-of-the-art optimization performance.
There are several interesting directions worth exploring. Although we arrived at the current MMD-based kernel from the weight-space interpretation of GP and the RKHS realization of MMD, our kernel design exhibits a deep connection with existing works on kernels over probability measures [22, 11]. Along this direction, as our theoretical regret analysis in Section 4 does not assume any particular form of kernel and the Nyström acceleration can also be extended to other kernel computations, it is possible that AIRBO can be further generalized to a richer family of kernels. Moreover, the MMD used in our kernel is by no means limited to its RKHS realization; in fact, any sufficiently rich function class with a uniform convergence guarantee can serve as the witness class.
We sincerely thank Yanbin Zhu and Ke Ma for their help in formulating the problem. Also, a heartfelt appreciation goes to Lu Kang for her constant encouragement and support throughout this work.

References
[1] John-Alexander M. Assael et al. "Heteroscedastic treed Bayesian optimisation". In: arXiv preprint arXiv:1410.7172 (2014).
[2] Maximilian Balandat et al. "BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization". In: Advances in Neural Information Processing Systems 33. 2020. URL: http://arxiv.org/abs/1910.06403.
[3] Justin J. Beland and Prasanth B. Nair. "Bayesian Optimization Under Uncertainty". In: Advances in Neural Information Processing Systems (2017), p. 5.
[4] James Bergstra et al. "Algorithms for hyper-parameter optimization". In: Advances in Neural Information Processing Systems 24 (2011).
[5] Felix Berkenkamp, Andreas Krause, and Angela P. Schoellig. "Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics". In: Machine Learning (2021), pp. 1-35.
[6] Mikołaj Bińkowski et al. Demystifying MMD GANs. 2021. DOI: 10.48550/arXiv.1801.01401. arXiv: 1801.01401.
[7] Ilija Bogunovic et al. "Adversarially Robust Optimization with Gaussian Processes". In: Advances in Neural Information Processing Systems. Vol. 31. Curran Associates, Inc., 2018.
[8] Antoine Chatalic et al. "Nyström Kernel Mean Embeddings". In: International Conference on Machine Learning. 2022.
[9] Sayak Ray Chowdhury and Aditya Gopalan. "On Kernelized Multi-armed Bandits". In: International Conference on Machine Learning. 2017.
[10] Ryan B. Christianson and Robert B. Gramacy. Robust Expected Improvement for Bayesian Optimization. 2023. arXiv: 2302.08612.
[11] Andreas Christmann and Ingo Steinwart. "Universal Kernels on Non-Standard Input Spaces". In: NIPS. 2010. URL: https://api.semanticscholar.org/CorpusID:16968759.
[12] Alexander I. Cowen-Rivers et al. "HEBO: pushing the limits of sample-efficient hyper-parameter optimisation". In: Journal of Artificial Intelligence Research 74 (2022), pp. 1269-1349.
[13] Patrick Dallaire, Camille Besse, and Brahim Chaib-draa. "An Approximate Inference with Gaussian Process to Latent Functions from Uncertain Data". In: Neurocomputing. Adaptive Incremental Learning in Neural Networks 74.11 (2011), pp. 1945-1955. DOI: 10.1016/j.neucom.2010.09.024.
[14] Arthur Gretton, Dougal Sutherland, and Wittawat Jitkrittum. "Interpretable comparison of distributions and models". In: Advances in Neural Information Processing Systems [Tutorial] (2019).
[15] Arthur Gretton et al. "A kernel two-sample test". In: The Journal of Machine Learning Research 13.1 (2012), pp. 723-773.
[16] Trevor Hastie et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Vol. 2. Springer, 2009.
[17] Thomas Kühn. "Eigenvalues of integral operators with smooth positive definite kernels". In: Archiv der Mathematik 49 (1987), pp. 525-534. URL: https://api.semanticscholar.org/CorpusID:121372638.
[18] Ruben Martinez-Cantin et al. "Active policy learning for robot planning and exploration under uncertainty". In: Robotics: Science and Systems. Vol. 3. 2007, pp. 321-328.
[19] Seyedali Mirjalili and Andrew Lewis. "Obstacles and difficulties for robust benchmark problems: A novel penalty-based robust optimisation method". In: Information Sciences 328 (2016), pp. 485-509.
[20] Pedro Moreno, Purdy Ho, and Nuno Vasconcelos. "A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications". In: Advances in Neural Information Processing Systems. Vol. 16. MIT Press, 2003.
[21] Krikamol Muandet et al. "Kernel Mean Embedding of Distributions: A Review and Beyond". In: Foundations and Trends in Machine Learning 10.1-2 (2017), pp. 1-141. DOI: 10.1561/2200000060. arXiv: 1605.09522.
[22] Krikamol Muandet et al. "Learning from Distributions via Support Measure Machines". In: NIPS. 2012.
[23] Alfred Müller. "Integral probability metrics and their generating classes of functions". In: Advances in Applied Probability 29.2 (1997), pp. 429-443.
[24] Jose Nogueira et al. "Unscented Bayesian Optimization for Safe Robot Grasping". In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Daejeon, South Korea: IEEE, 2016, pp. 1967-1972. ISBN: 978-1-5090-3762-9. DOI: 10.1109/IROS.2016.7759310.
[25] Rafael Oliveira, Lionel Ott, and Fabio Ramos. "Bayesian Optimisation under Uncertain Inputs". In: Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics. PMLR, 2019, pp. 1177-1184.
[26] Rafael Oliveira, Lionel Ott, and Fabio Ramos. Bayesian optimisation under uncertain inputs. 2019. arXiv: 1902.07908.
[27] Nicholas D. Sanders et al. Bayesian Search for Robust Optima. 2021. arXiv: 1904.11416.
[28] Bobak Shahriari et al. "Taking the Human Out of the Loop: A Review of Bayesian Optimization". In: Proceedings of the IEEE 104.1 (Jan. 2016), pp. 148-175. DOI: 10.1109/JPROC.2015.2494218.
[29] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. "Practical Bayesian optimization of machine learning algorithms". In: Advances in Neural Information Processing Systems 25 (2012).
[30] Niranjan Srinivas et al. "Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design". In: International Conference on Machine Learning. 2009. URL: https://api.semanticscholar.org/CorpusID:59031327.
[31] Zi Wang and Stefanie Jegelka. "Max-value entropy search for efficient Bayesian optimization". In: International Conference on Machine Learning. PMLR, 2017, pp. 3627-3635.
[32] Colin White, Willie Neiswanger, and Yash Savani. "BANANAS: Bayesian optimization with neural architectures for neural architecture search". In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. 12. 2021, pp. 10293-10301.
[33] Christopher Williams and Matthias Seeger. "Using the Nyström Method to Speed Up Kernel Machines". In: Advances in Neural Information Processing Systems. Vol. 13. MIT Press, 2000.
[34] Christopher K. I. Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning. Vol. 2. 3. MIT Press, Cambridge, MA, 2006.
The Nyström estimator can easily approximate the kernel mean embedding.
Assumption 2. There exists a positive constant
The true MMD distance between
And now we have the following lemma, from Theorem 5.1 in [8]:
Lemma 1. Let Assumption 2 hold. Furthermore, assume that for
provided that, for
where
Specifically, when the effective dimension
Then, choosing the subsample size
in the first case, or
in the second case, we get
Following the same notation as in Section 4, we can now construct a Gaussian process
where
and the maximum information gain is defined as
We define the sub-Gaussian condition as follows:
Definition 1. For a given
Now we can state the lemma bounding the uncertain-inputs regret under exact kernel evaluations, originally stated as Theorem 5 in [26].
Lemma 2. Let
and set
with probability at least
Note that although the original theorem is restricted to the case when
Now let us derive the inference under inexact kernel estimations.
Theorem 3. Under Assumption 1, for
Proof. Denote
we have
thus
Notice that, by Assumption 1, the following holds with probability at least
here we use the fact that
holds with probability at least
Similarly, we can apply the same estimation to
It remains to estimate the error of estimating the information gain. Notice that, with probability at least
here the second equality uses the fact that
and
Theorem 4. Under the same setting as Theorem 3, with probability at least
Proof. According to Lemma 2, equation (22), we have
with
Notice that
We also have [25], which means
Combining (24), (28), and (29), we finally obtain the result in (27).
Now we can prove our main Theorem 1.
Proof of Theorem 1. Let
Here we denote
and the third inequality follows from the choice of
Thus we have
From Lemma 4 in [9], we have that
and thus
On the other hand, notice that
we find that the term in (30)
Proof. Define the square of the MMD distance between
It is not hard to verify that
Then, according to Theorem 5 in [30], we have
In our implementation of AIRBO, we design the kernel used for MMD estimation
where
Depending on the form of the input distributions, the sampling and sub-sampling sizes for the Nyström MMD estimator are empirically selected via experiments. Moreover, as the input uncertainty is already modeled in the surrogate, we employ a classic UCB-based acquisition as in Eq. 5.
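Putting the earlier sketches together, one BO iteration with the MMDGP surrogate and the UCB acquisition could look as follows (dist_sampler is a hypothetical helper drawing samples from \(P_x\); this is a schematic, not the released implementation):

```python
def airbo_step(train_sets, y_train, candidates, dist_sampler, beta=2.0, m=100):
    """One AIRBO-style iteration: MMDGP posterior over candidates, then UCB."""
    cand_sets = [dist_sampler(x, m) for x in candidates]
    K = mmd_kernel(train_sets, train_sets)
    K = 0.5 * (K + K.T) + 1e-4 * np.eye(len(train_sets))   # symmetrize + jitter
    K_s = mmd_kernel(train_sets, cand_sets)
    alpha = np.linalg.solve(K, y_train)
    mu = K_s.T @ alpha
    # Prior variance k(P, P) equals 1 up to estimation noise for this kernel.
    var = np.maximum(1.0 - (K_s * np.linalg.solve(K, K_s)).sum(axis=0), 0.0)
    scores = mu + np.sqrt(beta * var)                       # UCB, as in Eq. 5
    return candidates[int(np.argmax(scores))]
```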
Figure 7: Modeling performance with a step-changing Chi-squared distribution.
To compare the modeling performance with that of other models, we design the input uncertainty to follow a step-changing Chi-squared distribution.
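For reference, a step-changing Chi-squared perturbation can be simulated as below (the step location and scales are our hypothetical choices, not the paper's exact setting):

```python
def sample_step_chi2(x, m=100, rng=None):
    """Chi-squared input noise whose degrees of freedom switch stepwise in x."""
    rng = rng or np.random.default_rng()
    df = 1.0 if x < 0.5 else 5.0              # assumed step change at x = 0.5
    return x + 0.05 * rng.chisquare(df, size=(m, 1))
```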
Several surrogate models are employed in this comparison, including:
MMDGP-nystrom(160/10) is our method with a sampling size of 160 and a sub-sampling size of 10.
uGP(40) is the surrogate from [25], which employs an integral kernel with a sampling size of 40.
uGP(160) is also the surrogate from [25], but uses a much larger sampling size of 160.
skl is a robust GP surrogate equipped with a symmetric KL-based kernel, as described in [20].
ERBF [13] assumes the input uncertainty to be Gaussian and employs a closed-form expected RBF kernel.
Figure 8: Ablation test for the Nyström approximation.
In this experiment, we aim to examine the effect of the Nyström approximation on optimization. To this end, we choose to optimize an RKHS function (Figure 4a) under a beta input distribution.
MMDGP-nystrom is our method with the Nyström approximation.
MMDGP-raw-S uses the empirical MMD estimator with a smaller sampling size.
MMDGP-raw-L also uses the empirical MMD estimator, but with a larger sampling size.
According to Figure 8, we find that 1) with sufficient computation power, MMDGP-raw-L obtains the best performance by using a large sampling size; 2) however, with limited computation, MMDGP-nystrom trades a slight drop in optimization performance for a much lower inference cost.
To further evaluate AIRBO's optimization performance on high-dimensional problems, we employ a 10-dimensional bumped-bowl function from [27, 19].
Figure 9: Optimization regret on the 10D bumped-bowl problem.
Figure 10: The input GMM distribution.
Figure 11: Simulation results of the push configurations found by different algorithms.
Here,
Figure 9 shows the mean and standard deviation of the optimization regrets. We note that 1) when it comes to a high-dimensional problem with a complex input distribution, the misassumption of Gaussian input uncertainty degrades the performance of the Gaussian-based surrogates.
This benchmark is based on a Box2D simulator from [31], where our objective is to identify a robust push configuration that enables a robot to push a ball to predetermined targets under input randomness. In our experiment, we simplify the task by fixing the push angle.
where the covariance matrix
Apart from the statistics of the found pre-images in Figure 6, we also simulate the robot pushes according to the found configurations and visualize the results in Figure 11. In this figure, each black hollow square represents an instance of the robot's initial location, the grey arrow indicates the push direction and duration, and the blue circle marks the ball's ending position after the push. We find that, as GP-UCB ignores the input uncertainty, it pushes to these targets indiscriminately and the ball's ending positions fluctuate. Also, due to their incorrect assumptions about the input distribution, SKL-UCB and ERBF-UCB fail to control the ball's ending position under input randomness. On the contrary, AIRBO successfully recognizes the twin targets in quadrant I as an optimal choice and frequently pushes to this area. Moreover, all the ball's ending positions are well controlled and concentrated around the targets under input randomness.