Artificial Intelligence (IV) wu-kan

Generalizing search problems

  • So far: our search problems have assumed the agent has complete control of the environment
    • the state does not change unless the agent (robot) changes it
  • Assumption not always reasonable
    • other agents whose interests conflict with yours
  • In these cases, we need to generalize our view of search to handle state changes that are not in the control of the agent

What are the key features of a game?

  • Players have their own interests
  • Each player tries to alter the world so as to best benefit itself
  • Games are hard because how you should play depends on how you think the other player will play; but how they play depends on how they think you will play

Game properties

  • Two-player
  • Discrete: game states and decisions can be mapped onto discrete values.
  • Finite: There are only a finite number of states and possible decisions that can be made
  • Zero-sum: fully competitive
    • if one player wins, the other loses an equal amount
    • note that some games don’t have this property
  • Deterministic: no chance involved
    • no dice, or random deals of cards, or coin flips, etc.
  • Perfect information: all aspects of the state are fully observable
    • e.g., no hidden cards.

Extensive Form Two-Player Zero-Sum Games

  • But R,P,S is a simple "one-shot" game
    • single move each
    • in game theory: a strategic or normal-form game
  • Many games extend over multiple moves
    • turn-taking: players act alternately
    • e.g., chess, checkers, etc.
    • in game theory: extensive form games
  • We’ll focus on the extensive form
    • that’s where the computational questions emerge

Two-Player Zero-Sum Game – Definition

  • Two players A (Max) and B (Min)
  • Set of states S (a finite set of states of the game)
  • An initial state I ∈ S (where the game begins)
  • Terminal positions T ⊆ S (terminal states of the game: states where the game is over)
  • Successors (or Succs): a function that takes a state as input and returns the set of possible next states for whoever is due to move
  • Utility or payoff function V : T → R: a mapping from terminal states to real numbers that shows how good each terminal state is for player A and how bad it is for player B (see the interface sketch below)
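To make the definition concrete, here is a minimal sketch of it as a Python interface; the class and method names are illustrative choices, not part of the formal definition:

    # A minimal sketch of the formal definition; all names are illustrative.
    class Game:
        def initial_state(self):       # the initial state I in S
            raise NotImplementedError

        def is_terminal(self, state):  # membership test for T (subset of S)
            raise NotImplementedError

        def successors(self, state):   # Succs: possible next states
            raise NotImplementedError

        def utility(self, state):      # V : T -> R, payoff for A (Max)
            raise NotImplementedError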

Two-Player Zero-Sum Game – Intuition

  • Players alternate moves (starting with A, or Max)
    • Game ends when some terminal t ∈ T is reached
  • A game state: a state-player pair
    • Tells us what state we’re in and whose move it is
  • Utility function and terminals replace goals
    • A, or Max, wants to maximize the terminal payoff
    • B, or Min, wants to minimize the terminal payoff
  • Think of it as:
    • A, or Max, gets V(t) and B, or Min, gets −V(t) for terminal node t
    • This is why it’s called zero (or constant) sum

The MiniMax Strategy

  • Assume that the other player will always play their best move
    • you always play a move that will minimize the payoff that could be gained by the other player.
    • By minimizing the other player’s payoff, you maximize your own.
  • Note that if you know that Min will play poorly in some circumstances, there might be a better strategy than MiniMax (i.e., a strategy that gives you a better payoff)
  • Build full game tree (all leaves are terminals)
    • Root is start state, edges are possible moves, etc.
    • Label terminal nodes with utilities
  • Back values up the tree
    • U(t) is defined for all terminals (part of the input)
    • U(n) = min {U(c) : c is a child of n} if n is a Min node
    • U(n) = max {U(c) : c is a child of n} if n is a Max node
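For example, suppose the root is a Max node with two Min children, whose terminal children have utilities {3, 12} and {2, 8} respectively. Backing up gives U = min(3, 12) = 3 at the first Min node and U = min(2, 8) = 2 at the second, so U = max(3, 2) = 3 at the root: Max's best opening move leads to the first child.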

DFS MiniMax

  • Building the entire game tree and backing up values gives each player their strategy.
  • However, the game tree is exponential in size.
  • Furthermore, as we will see later, it is not necessary to examine all of the tree.
  • To address these problems we use a depth-first implementation of minimax, sketched below.
  • We run the depth-first search after each move to compute the next move for the MAX player. (We could do the same for the MIN player.)
  • This avoids explicitly representing the exponentially sized game tree: we just compute each move as it is needed.
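A minimal recursive sketch of this depth-first computation, assuming a Game interface like the hypothetical one above:

    def minimax_value(game, state, is_max):
        """Depth-first backup of minimax values; the tree is explored lazily."""
        if game.is_terminal(state):
            return game.utility(state)
        values = [minimax_value(game, s, not is_max)
                  for s in game.successors(state)]
        return max(values) if is_max else min(values)

    def best_move(game, state):
        """Max's move: the successor whose backed-up value is largest."""
        return max(game.successors(state),
                   key=lambda s: minimax_value(game, s, is_max=False))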

Pruning

  • It is not necessary to examine the entire tree to make the correct MiniMax decision
  • Assume depth-first generation of the tree
    • After generating values for only some of n’s children, we can sometimes prove that a MiniMax strategy will never reach n.
    • So we needn’t generate or evaluate any further children of n!
  • Two types of pruning (cuts):
    • pruning of max nodes (α-cuts)
    • pruning of min nodes (β-cuts)

Cutting Max Nodes (Alpha Cuts)

  • At a Max node n:
    • Let β be the lowest value of n’s siblings examined so far (siblings to the left of n that have already been searched)
    • Let α be the highest value of n’s children examined so far (changes as children are examined)
  • While at a Max node n, if α becomes ≥ β we can stop expanding the children of n (see the sketch after this list)
    • Min will never choose to move from n’s parent to n, since it would choose one of n’s lower-valued siblings first
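The two kinds of cut combine into the familiar alpha-beta procedure. A sketch under the same assumed interface, where alpha carries the best value Max can already guarantee along the path and beta the best (lowest) value Min can guarantee:

    import math

    def alphabeta(game, state, alpha, beta, is_max):
        if game.is_terminal(state):
            return game.utility(state)
        if is_max:
            v = -math.inf
            for s in game.successors(state):
                v = max(v, alphabeta(game, s, alpha, beta, False))
                alpha = max(alpha, v)
                if alpha >= beta:   # alpha cut: Min above prefers a lower sibling
                    break           # stop expanding this Max node's children
            return v
        else:
            v = math.inf
            for s in game.successors(state):
                v = min(v, alphabeta(game, s, alpha, beta, True))
                beta = min(beta, v)
                if alpha >= beta:   # beta cut: Max above prefers a higher sibling
                    break           # stop expanding this Min node's children
            return v

Calling alphabeta(game, game.initial_state(), -math.inf, math.inf, True) returns the same value as the plain depth-first minimax, typically after exploring far fewer nodes.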

Practical Matters

“Real” games are too large to enumerate the full game tree

  • e.g., the chess branching factor is roughly 35
  • Depth 10 tree: 35^10 ≈ 2,700,000,000,000,000 nodes
  • Even alpha-beta pruning won’t help here!

We must limit the depth of the search tree

  • Can’t expand all the way to terminal nodes
  • We must make heuristic estimates of the values of the (nonterminal) states at the leaves of the tree
  • These heuristics are often called evaluation functions (see the sketch after this list)
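A sketch of how the depth cutoff and a heuristic slot into the search; eval_fn is an assumed evaluation function meeting the requirements listed next:

    def depth_limited_value(game, state, depth, is_max, eval_fn):
        """Minimax to a fixed depth; cutoff leaves are scored by eval_fn."""
        if game.is_terminal(state):
            return game.utility(state)
        if depth == 0:              # cutoff: estimate instead of searching on
            return eval_fn(state)
        values = [depth_limited_value(game, s, depth - 1, not is_max, eval_fn)
                  for s in game.successors(state)]
        return max(values) if is_max else min(values)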

Evaluation functions: basic requirements

  • Should order the terminal states in the same way as the true utility function.
  • The computation must not take too long!
  • For nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.

How to design evaluation functions

  • Features of the states, e.g., in chess, the number of white pawns, black pawns, white queens, etc.
  • The features, taken together, define various tags or equivalence classes of states: the states in each category have the same values for all the features.
  • Any given category will contain some states that lead to wins, some that lead to draws, and some that lead to losses.
  • e.g., suppose our experience suggests that 72% of the states in a category lead to a win, 20% to a loss, and 8% to a draw.
  • Then a reasonable evaluation for states in the category is the expected utility value: 0.72·1 + 0.20·(−1) + 0.08·0 = 0.52.
  • However, there are far too many tags to estimate a value for each one
  • Most evaluation functions compute separate numerical contributions from each feature and then combine them
  • e.g., each pawn is worth 1, a knight or bishop is worth 3, a rook 5, and the queen 9
  • Mathematically, a weighted linear function Eval(s) = w1·f1(s) + … + wn·fn(s) = ∑_{i=1}^{n} wi·fi(s), as sketched after this list
  • Deep Blue used over 8000 features
  • This involves a strong assumption: the contribution of each feature is independent of the values of the other features.
  • The assumption may not hold, hence nonlinear combinations are also used
  • The features and weights are not part of the rules of chess!
  • They come from centuries of human chess-playing experience.
  • In case this kind of experience is not available, the weights of the evaluation function can be estimated by machine learning techniques.
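As an illustration of the weighted linear form above, a material-only chess evaluation using the classical piece values quoted in this list; the state representation and its count() helper are assumptions, not a fixed API:

    # Eval(s) = sum of wi·fi(s), where each feature fi counts a piece type
    # (White's count minus Black's) and wi is its classical value.
    WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

    def material_eval(state):
        # state.count(color, piece) is a hypothetical helper
        return sum(w * (state.count("white", p) - state.count("black", p))
                   for p, w in WEIGHTS.items())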