Upper Confidence Bound

Definition

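Upper confidence bound (UCB): an upper-limit estimate, at a given confidence level, of the range of values that an unknown quantity (such as a mean, a reward, or a probability) may take. In machine learning and reinforcement learning it is often used as a decision rule: an "estimate + uncertainty bonus" score that balances exploitation and exploration.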
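As one illustration of the "estimate + uncertainty bonus" form, the UCB1 index of Auer, Cesa-Bianchi, Fischer (2002) can be written as below. The symbols are notation chosen here, assuming rewards in [0, 1]: \bar{x}_j is the observed mean reward of option j, n_j is the number of times option j has been tried, and n is the total number of trials so far. Other UCB variants use different bonus terms.

```latex
\mathrm{UCB}_j(n) = \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}
```

The first term rewards exploitation. The second term shrinks as n_j grows, so options that have been tried less often receive a larger bonus.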

Pronunciation (IPA)

/ˈʌpər ˈkɑːnfɪdəns baʊnd/

Etymology

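The phrase has three parts: upper (higher) + confidence (in statistics, the degree of certainty) + bound (a limit, here an upper limit). It comes from the concept of a confidence interval in statistical inference, where the upper endpoint of the interval is the upper confidence bound. The term was later widely adopted in naming and describing algorithms for multi-armed bandits and Bayesian optimization (e.g. UCB, GP-UCB).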

Examples

The algorithm chooses the option with the highest upper confidence bound.

In a multi-armed bandit setting, UCB adds an uncertainty bonus to the estimated reward so that rarely tried actions are explored more.
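The behavior described in the second example can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes Bernoulli (0/1) rewards and uses the UCB1-style bonus sqrt(2 ln n / n_j). The function names `ucb1_select` and `run_bandit` and the arm probabilities are hypothetical, chosen only for this sketch.

```python
import math
import random


def ucb1_select(counts, means):
    """Return the index of the arm with the highest UCB1-style score."""
    # Try every arm once first so that each count is nonzero.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)  # total number of pulls so far
    scores = [
        # estimated reward + uncertainty bonus (larger for rarely tried arms)
        means[a] + math.sqrt(2.0 * math.log(total) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(probs, steps, seed=0):
    """Simulate the rule on Bernoulli arms with success probabilities `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(steps):
        arm = ucb1_select(counts, means)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    # Hypothetical two-armed problem: arm 1 pays off more often.
    counts, means = run_bandit([0.2, 0.8], steps=2000)
    print("pulls per arm:", counts)
    print("estimated means:", means)
```

When two arms have the same estimated reward, the one with fewer pulls gets the higher score, which is how rarely tried actions end up being explored more.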

Literary Works

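Auer, Cesa-Bianchi, Fischer (2002), Finite-time Analysis of the Multiarmed Bandit Problem: proposes and analyzes the classic UCB1 algorithm, whose core selection criterion is the upper confidence bound.
Sutton & Barto, Reinforcement Learning: An Introduction: mentions the UCB idea and related methods when discussing exploration-exploitation strategies and bandit problems.
Bubeck & Cesa-Bianchi (2012), Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems (survey): systematically discusses UCB-type algorithms and regret analysis.
Srinivas et al. (2010), Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design: proposes GP-UCB, applying the upper confidence bound idea to Gaussian process Bayesian optimization.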