Snapshot of https://cs50.harvard.edu/ai/2024/notes/0/, saved 2025-05-06 14:39.

Lecture 0

Artificial Intelligence

Artificial Intelligence (AI) covers a range of techniques that make computers appear to behave intelligently. For example, AI is used to recognize faces in photographs on social media, beat the world champion in chess, and process your speech when you speak to Siri or Alexa on your phone.

In this course, we will explore some of the ideas that make AI possible:

  1. Search

Finding a solution to a problem, like a navigator app that finds the best route from your origin to the destination, or like playing a game and figuring out the next move.

  2. Knowledge

Representing information and drawing inferences from it.

  3. Uncertainty

Dealing with uncertain events using probability.

  4. Optimization

Finding not only a correct way to solve a problem, but a better, or even the best, way to solve it.

  5. Learning

Improving performance based on access to data and experience. For example, your email is able to distinguish spam from non-spam mail based on past experience.

  6. Neural Networks

A program structure inspired by the human brain that is able to perform tasks effectively.

  7. Language

Processing natural language, which is produced and understood by humans.

Search

Search problems involve an agent that is given an initial state and a goal state, and it returns a solution of how to get from the former to the latter. A navigator app uses a typical search process: the agent (the thinking part of the program) receives as input your current location and your desired destination and, based on a search algorithm, returns a suggested path. However, there are many other forms of search problems, like puzzles or mazes.

15 puzzle

Finding a solution to a 15 puzzle would require the use of a search algorithm.

  • Agent

    An entity that perceives its environment and acts upon that environment. In a navigator app, for example, the agent would be a representation of a car that needs to decide on which actions to take to arrive at the destination.

  • State

    A configuration of an agent in its environment. For example, in a 15 puzzle, a state is any one way that all the numbers are arranged on the board.

    • Initial State

      The state from which the search algorithm starts. In a navigator app, that would be the current location.

  • Actions

    Choices that can be made in a state. More precisely, actions can be defined as a function: upon receiving state s as input, Actions(s) returns as output the set of actions that can be executed in state s. For example, in a 15 puzzle, the actions of a given state are the ways you can slide squares in the current configuration (4 if the empty square is in the middle, 3 if it is next to a side, 2 if it is in a corner).

  • Transition Model

    A description of what state results from performing any applicable action in any state. More precisely, the transition model can be defined as a function: upon receiving state s and action a as input, Result(s, a) returns the state resulting from performing action a in state s. For example, given a certain configuration of a 15 puzzle (state s), sliding a square in some direction (action a) will bring about a new configuration of the puzzle (the new state).

  • State Space

    The set of all states reachable from the initial state by any sequence of actions. For example, in a 15 puzzle, the state space consists of all the 16!/2 configurations of the board that can be reached from any initial state. The state space can be visualized as a directed graph, with states represented as nodes and actions as arrows between nodes.

State Space

  • Goal Test

    The condition that determines whether a given state is a goal state. For example, in a navigator app, the goal test would be whether the current location of the agent (the representation of the car) is at the destination. If it is, the problem is solved; if not, we continue searching.

  • Path Cost

    A numerical cost associated with a given path. For example, a navigator app does not simply bring you to your goal; it does so while minimizing the path cost, finding the fastest way possible for you to get to your goal state.

Solving Search Problems

  • Solution

    A sequence of actions that leads from the initial state to the goal state.

    • Optimal Solution

      A solution that has the lowest path cost among all solutions.

In a search process, data is often stored in a node, a data structure that contains the following data:

  • A state
  • Its parent node, through which the current node was generated
  • The action that was applied to the state of the parent to get to the current node
  • The path cost from the initial state to this node

Nodes contain information that makes them very useful for the purposes of search algorithms. They contain a state, which can be checked using the goal test to see if it is the final state. If it is, the node’s path cost can be compared to other nodes’ path costs, which allows choosing the optimal solution. Once the node is chosen, by virtue of storing the parent node and the action that led from the parent to the current node, it is possible to trace back every step of the way from the initial state to this node, and this sequence of actions is the solution.
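The node data structure described above can be sketched in Python as follows; the field and method names here are illustrative, not quoted from the course's source code:

```python
# A minimal search-tree node; field names are illustrative.
class Node:
    def __init__(self, state, parent=None, action=None, path_cost=0):
        self.state = state          # the configuration this node represents
        self.parent = parent        # the node from which this one was generated
        self.action = action        # the action applied to the parent to reach this state
        self.path_cost = path_cost  # cost accumulated from the initial state

    def solution(self):
        """Trace parent pointers back to the root; return the action sequence."""
        actions = []
        node = self
        while node.parent is not None:
            actions.append(node.action)
            node = node.parent
        return list(reversed(actions))
```

Tracing back from a node two levels deep, for instance, yields the two actions that produced it, in the order they were taken.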

However, nodes are simply a data structure; they don’t search, they hold information. To actually search, we use the frontier, the mechanism that “manages” the nodes. The frontier starts by containing the initial state (with an empty explored set), and then repeats the following actions until a solution is reached:

Repeat:

  1. If the frontier is empty,

    • Stop. There is no solution to the problem.

  2. Remove a node from the frontier. This is the node that will be considered.

  3. If the node contains the goal state,

    • Return the solution. Stop.

    Else,

    • Expand the node (find all the new nodes that could be reached from this node), and add the resulting nodes to the frontier.
    • Add the current node to the explored set.
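As a sketch, the loop above can be rendered in Python. A FIFO frontier is used here for concreteness (how the choice of which node to remove matters is discussed in the next sections), and the graph representation, a dictionary mapping each state to its neighbors, is an assumption for illustration:

```python
from collections import deque

# A sketch of the generic search loop, using a FIFO frontier.
def search(graph, start, goal):
    frontier = deque([(start, [start])])  # each entry: (state, path to it)
    explored = set()
    while frontier:
        state, path = frontier.popleft()      # step 2: remove a node
        if state == goal:                     # step 3: goal test
            return path                       # return the solution
        explored.add(state)                   # add node to the explored set
        for neighbor in graph.get(state, []): # expand the node
            if neighbor not in explored and all(neighbor != s for s, _ in frontier):
                frontier.append((neighbor, path + [neighbor]))
    return None                               # step 1: empty frontier, no solution
```

For example, with graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}, search(graph, "A", "D") returns the path ["A", "B", "D"].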
    

Depth-First Search

In the previous description of the frontier, one thing went unmentioned. At stage 2 in the pseudocode above, which node should be removed? This choice has implications for the quality of the solution and how fast it is achieved. There are multiple ways to go about the question of which nodes should be considered first, two of which can be represented by the data structures of a stack (in depth-first search) and a queue (in breadth-first search).

We start with the depth-first search (DFS) approach.

A depth-first search algorithm exhausts one direction before trying another. In this case, the frontier is managed as a stack data structure. The catchphrase to remember here is “last-in first-out”: after nodes are added to the frontier, the first node to remove and consider is the last one added. This results in a search algorithm that goes as deep as possible in the first direction it encounters, leaving all other directions for later.

(An example from outside lecture: take a situation where you are looking for your keys. In a depth-first search approach, if you choose to start by searching in your pants, you would first go through every single pocket, emptying each pocket and going through the contents carefully. Only once you have completely exhausted the search in every single pocket of your pants will you stop searching there and start searching elsewhere.)

  • Pros:
    • At best, this algorithm is the fastest. If it “lucks out” and always chooses the right path to the solution (by chance), then depth-first search takes the least possible time to get to a solution.
  • Cons:
    • It is possible that the found solution is not optimal.
    • At worst, this algorithm will explore every possible path before finding the solution, thus taking the longest possible time to reach the solution.

Code example:

    # Define the function that removes a node from the frontier and returns it.
    def remove(self):
        # Terminate the search if the frontier is empty, because this means that there is no solution.
        if self.empty():
            raise Exception("empty frontier")
        else:
            # Save the last item in the list (which is the newest node added)
            node = self.frontier[-1]
            # Save all the items on the list besides the last node (i.e. remove the last node)
            self.frontier = self.frontier[:-1]
            return node

Breadth-First Search

The opposite of depth-first search is breadth-first search (BFS).

A breadth-first search algorithm follows multiple directions at the same time, taking one step in each possible direction before taking a second step in any direction. In this case, the frontier is managed as a queue data structure. The catchphrase to remember here is “first-in first-out”: all the new nodes line up, and nodes are considered in the order in which they were added (first come, first served!).

(An example from outside lecture: suppose you are in a situation where you are looking for your keys. In this case, if you start with your pants, you will look in your right pocket. After this, instead of looking in your left pocket, you will take a look in one drawer. Then on the table. And so on, in every location you can think of. Only after you have exhausted all the locations will you go back to your pants and search the next pocket.)

  • Pros:
    • This algorithm is guaranteed to find the optimal solution.
  • Cons:
    • This algorithm is almost guaranteed to take longer than the minimal time to run.
    • At worst, this algorithm takes the longest possible time to run.

Code example:

    # Define the function that removes a node from the frontier and returns it.
    def remove(self):
        # Terminate the search if the frontier is empty, because this means that there is no solution.
        if self.empty():
            raise Exception("empty frontier")
        else:
            # Save the oldest item on the list (which was the first one to be added)
            node = self.frontier[0]
            # Save all the items on the list besides the first one (i.e. remove the first node)
            self.frontier = self.frontier[1:]
            return node
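For context, the two remove() methods above would sit inside frontier classes along the following lines; the class layout here is an assumption suggested by the snippets, not quoted from the course's source code:

```python
# A sketch of the frontier classes the remove() methods belong to.
class StackFrontier:
    def __init__(self):
        self.frontier = []

    def add(self, node):
        self.frontier.append(node)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        # Last in, first out: take the newest node.
        node = self.frontier[-1]
        self.frontier = self.frontier[:-1]
        return node


class QueueFrontier(StackFrontier):
    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        # First in, first out: take the oldest node.
        node = self.frontier[0]
        self.frontier = self.frontier[1:]
        return node
```

Note that QueueFrontier inherits everything except remove(), which takes from the front of the list instead of the back.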

Greedy Best-First Search

Breadth-first and depth-first are both uninformed search algorithms. That is, these algorithms do not utilize any knowledge about the problem that they did not acquire through their own exploration. However, it is often the case that some knowledge about the problem is, in fact, available. For example, when a human maze-solver enters a junction, the human can see which way goes in the general direction of the solution and which way does not. AI can do the same. A type of algorithm that considers additional knowledge to try to improve its performance is called an informed search algorithm.

Greedy best-first search expands the node that is the closest to the goal, as determined by a heuristic function h(n). As its name suggests, the function estimates how close to the goal the next node is, but it can be mistaken. The efficiency of the greedy best-first algorithm depends on how good the heuristic function is. For example, in a maze, an algorithm can use a heuristic function that relies on the Manhattan distance between the possible nodes and the end of the maze. The Manhattan distance ignores walls and counts how many steps up, down, or to the sides it would take to get from one location to the goal location. This is an easy estimation that can be derived based on the (x, y) coordinates of the current location and the goal location.
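The Manhattan distance is straightforward to compute. A minimal sketch, assuming locations are represented as (x, y) tuples:

```python
# Manhattan distance heuristic: ignores walls and counts grid steps
# (up, down, left, right) between two (x, y) locations.
def manhattan_distance(current, goal):
    (x1, y1), (x2, y2) = current, goal
    return abs(x1 - x2) + abs(y1 - y2)
```

For example, manhattan_distance((1, 1), (4, 3)) returns 5: three steps across plus two steps up.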

Manhattan Distance

However, it is important to emphasize that, as with any heuristic, it can go wrong and lead the algorithm down a slower path than it would have taken otherwise. It is possible that an uninformed search algorithm will find a better solution faster, but it is less likely to do so than an informed algorithm.

A* Search

A development of the greedy best-first algorithm, A* search considers not only h(n), the estimated cost from the current location to the goal, but also g(n), the cost that was accrued until the current location. By combining both these values, the algorithm has a more accurate way of determining the cost of the solution and optimizing its choices on the go. The algorithm keeps track of (cost of path until now + estimated cost to the goal), and once a path’s value exceeds the estimated cost of some previous option, the algorithm ditches the current path and goes back to the previous option, thus preventing itself from going down a long, inefficient path that h(n) erroneously marked as best.

Yet again, since this algorithm, too, relies on a heuristic, it is only as good as the heuristic it employs. In some situations it may be less efficient than greedy best-first search or even the uninformed algorithms. For A* search to be optimal, the heuristic function h(n) should be:

  1. Admissible, or never overestimating the true cost, and
  2. Consistent, which means that the estimated path cost to the goal of a new node, in addition to the cost of transitioning to it from the previous node, is greater than or equal to the estimated path cost to the goal of the previous node. To put it in equation form, h(n) is consistent if for every node n and successor node n’ with step cost c, h(n) ≤ h(n’) + c.
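A sketch of A* in Python: the frontier is a priority queue keyed on f(n) = g(n) + h(n). Unit step costs and a caller-supplied neighbors function are assumptions of this sketch:

```python
import heapq

# A* search sketch: unit step costs, neighbors(state) and h(state)
# supplied by the caller.
def a_star(start, goal, neighbors, h):
    # Frontier entries: (f, g, state, path), ordered by f = g + h(state).
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost to each state
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for nxt in neighbors(state):
            new_g = g + 1  # unit step cost (an assumption of this sketch)
            # Only revisit a state if we found a cheaper way to it.
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None
```

With an admissible and consistent heuristic such as the Manhattan distance on a grid, the first time the goal is popped from the queue its path is optimal.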

Adversarial Search

Whereas, previously, we discussed algorithms that need to find an answer to a question, in adversarial search the algorithm faces an opponent that tries to achieve the opposite goal. AI that uses adversarial search is often encountered in games, such as tic-tac-toe.

Minimax

A type of algorithm in adversarial search, Minimax represents winning conditions as (-1) for one side and (+1) for the other side. Further actions will be driven by these conditions, with the minimizing side trying to get the lowest score, and the maximizer trying to get the highest score.

Representing a Tic-Tac-Toe AI:

  • S₀: the initial state (in our case, an empty 3x3 board)
  • Players(s): a function that, given a state s, returns which player’s turn it is (X or O).
  • Actions(s): a function that, given a state s, returns all the legal moves in this state (which spots are free on the board).
  • Result(s, a): a function that, given a state s and action a, returns a new state. This is the board that results from performing the action a on state s (making a move in the game).
  • Terminal(s): a function that, given a state s, checks whether this is the last step in the game, i.e. if someone has won or there is a tie. Returns True if the game has ended, False otherwise.
  • Utility(s): a function that, given a terminal state s, returns the utility value of the state: -1, 0, or 1.

How the algorithm works:

Recursively, the algorithm simulates all possible games that can take place beginning at the current state and continuing until a terminal state is reached. Each terminal state is valued as either (-1), 0, or (+1).

Minimax Algorithm in Tic Tac Toe

Knowing, based on the state, whose turn it is, the algorithm can tell whether the current player, when playing optimally, will pick the action that leads to a state with a lower or a higher value. This way, alternating between minimizing and maximizing, the algorithm creates values for the states that would result from each possible action. To give a more concrete example, we can imagine that the maximizing player asks at every turn: “If I take this action, a new state will result. If the minimizing player plays optimally, what action can that player take to bring about the lowest value?” To answer this question, however, the maximizing player has to simulate the same process in the minimizer’s mind: the minimizing player will in turn ask, “If I take this action, what action can the maximizing player take to bring about the highest value?” This is a recursive process, and it can be hard to wrap your head around; looking at the pseudocode below can help. Eventually, through this recursive reasoning process, the maximizing player generates values for each state that could result from all the possible actions at the current state. After having these values, the maximizing player chooses the highest one.

The Maximizer Considers the Possible Values of Future States.

To put it in pseudocode, the Minimax algorithm works the following way:

  • Given a state s:

    • The maximizing player picks action a in Actions(s) that produces the highest value of Min-Value(Result(s, a)).
    • The minimizing player picks action a in Actions(s) that produces the lowest value of Max-Value(Result(s, a)).

  • Function Max-Value(state):

        v = -∞
        if Terminal(state):
            return Utility(state)
        for action in Actions(state):
            v = Max(v, Min-Value(Result(state, action)))
        return v

  • Function Min-Value(state):

        v = ∞
        if Terminal(state):
            return Utility(state)
        for action in Actions(state):
            v = Min(v, Max-Value(Result(state, action)))
        return v
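The pseudocode translates almost line for line into Python. The sketch below takes the game's Actions, Result, Terminal, and Utility as plain functions and returns the best value along with the action that achieves it; this generic function-passing interface is an assumption for illustration, not the course's own implementation:

```python
import math

# A direct rendering of the Minimax pseudocode; the game is supplied
# as plain functions so any two-player zero-sum game can be plugged in.
def minimax(state, player, actions, result, terminal, utility):
    """Return (best_value, best_action) for `player` ('max' or 'min')."""
    if terminal(state):
        return utility(state), None
    best_action = None
    if player == "max":
        v = -math.inf
        for action in actions(state):
            value, _ = minimax(result(state, action), "min",
                               actions, result, terminal, utility)
            if value > v:
                v, best_action = value, action
    else:
        v = math.inf
        for action in actions(state):
            value, _ = minimax(result(state, action), "max",
                               actions, result, terminal, utility)
            if value < v:
                v, best_action = value, action
    return v, best_action
```

Run on a tiny hand-built game tree (a stand-in for a real game), the maximizer prefers the branch whose minimizer can only force the higher of the two minimums.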

Alpha-Beta Pruning

A way to optimize Minimax, alpha-beta pruning skips some of the recursive computations that are decidedly unfavorable. After establishing the value of one action, if there is initial evidence that the following action can bring the opponent to a better score than the already established action, there is no need to investigate this action further, because it will decidedly be less favorable than the previously established one.

This is most easily shown with an example: a maximizing player knows that, at the next step, the minimizing player will try to achieve the lowest score. Suppose the maximizing player has three possible actions, and the first one is valued at 4. Then the player starts generating the value for the next action. To do this, the player generates the values of the minimizer’s actions if the current player takes this action, knowing that the minimizer will choose the lowest one. However, before finishing the computation for all the possible actions of the minimizer, the player sees that one of the options has a value of 3. This means that there is no reason to keep exploring the other possible actions for the minimizing player. The value of the not-yet-valued action doesn’t matter, be it 10 or (-10). If the value is 10, the minimizer will choose the lowest option, 3, which is already worse than the pre-established 4. If the not-yet-valued action turns out to be (-10), the minimizer will choose that option, (-10), which is even more unfavorable to the maximizer. Therefore, computing additional possible actions for the minimizer at this point is irrelevant to the maximizer, because the maximizing player already has an unequivocally better choice whose value is 4.
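The example above can be reproduced in code. This sketch extends Minimax with alpha and beta bounds, the best values the maximizer and minimizer are already guaranteed elsewhere in the tree; when the bounds cross, the remaining branches are pruned. As before, the function-passing interface is an assumption for illustration:

```python
import math

# Minimax with alpha-beta pruning: alpha/beta hold the best values the
# maximizer/minimizer can already guarantee; branches that cannot
# improve on them are skipped.
def alphabeta(state, player, actions, result, terminal, utility,
              alpha=-math.inf, beta=math.inf):
    if terminal(state):
        return utility(state)
    if player == "max":
        v = -math.inf
        for action in actions(state):
            v = max(v, alphabeta(result(state, action), "min",
                                 actions, result, terminal, utility, alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:
                break  # prune: the minimizer above will never allow this branch
        return v
    else:
        v = math.inf
        for action in actions(state):
            v = min(v, alphabeta(result(state, action), "max",
                                 actions, result, terminal, utility, alpha, beta))
            beta = min(beta, v)
            if alpha >= beta:
                break  # prune: the maximizer above already has a better option
        return v
```

On a tree where the first branch is worth 4 and the second branch's minimizer finds a 3, the remaining leaves of the second branch are never evaluated, exactly as in the worked example above.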

Alpha Beta Pruning

Depth-Limited Minimax

There are a total of 255,168 possible Tic-Tac-Toe games, and 10²⁹⁰⁰⁰ possible games in chess. The Minimax algorithm, as presented so far, requires generating all hypothetical games from a certain point down to the terminal condition. While computing all the Tic-Tac-Toe games doesn’t pose a challenge for a modern computer, doing so with chess is currently impossible.

Depth-limited Minimax considers only a pre-defined number of moves before it stops, without ever getting to a terminal state. However, this doesn’t allow for getting a precise value for each action, since the end of the hypothetical games has not been reached. To deal with this problem, depth-limited Minimax relies on an evaluation function that estimates the expected utility of the game from a given state, or, in other words, assigns values to states. For example, in a chess game, an evaluation function would take as input a current configuration of the board, try to assess its expected utility (based on what pieces each player has and their locations on the board), and then return a positive or negative value that represents how favorable the board is for one player versus the other. These values can be used to decide on the right action, and the better the evaluation function, the better the Minimax algorithm that relies on it.
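As a sketch, the only changes relative to plain Minimax are a depth counter and a fall-back to an evaluation function at the depth limit; the caller-supplied evaluate function is an assumption for illustration:

```python
import math

# Depth-limited Minimax: recursion stops when `depth` reaches 0 and falls
# back to an evaluation function that estimates a non-terminal state's value.
def depth_limited_minimax(state, player, depth, actions, result,
                          terminal, utility, evaluate):
    if terminal(state):
        return utility(state)       # true value at a terminal state
    if depth == 0:
        return evaluate(state)      # estimate instead of searching to the end
    values = (depth_limited_minimax(result(state, a),
                                    "min" if player == "max" else "max",
                                    depth - 1, actions, result,
                                    terminal, utility, evaluate)
              for a in actions(state))
    return max(values) if player == "max" else min(values)
```

With a generous enough depth the answer matches full Minimax; at shallower depths the answer is only as good as the evaluation function, which is exactly the trade-off described above.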