6. Parsing Expressions

Grammar, which knows how to control even kings.
Molière

This chapter marks the first major milestone of the book. Many of us have cobbled together a mishmash of regular expressions and substring operations to extract some sense out of a pile of text. The code was probably riddled with bugs and a beast to maintain. Writing a real parser, one with decent error handling, a coherent internal structure, and the ability to robustly chew through a sophisticated syntax, is considered a rare, impressive skill. In this chapter, you will attain it.

It’s easier than you think, partially because we front-loaded a lot of the hard work in the last chapter. You already know your way around a formal grammar. You’re familiar with syntax trees, and we have some Java classes to represent them. The only remaining piece is parsing: transmogrifying a sequence of tokens into one of those syntax trees.

Some CS textbooks make a big deal out of parsers. In the ’60s, computer scientists, understandably tired of programming in assembly language, started designing more sophisticated, human-friendly languages like Fortran and ALGOL. Alas, they weren’t very machine-friendly for the primitive computers of the time.

These pioneers designed languages that they honestly weren’t even sure how to write compilers for, and then did groundbreaking work inventing parsing and compiling techniques that could handle these new, big languages on those old, tiny machines.

Classic compiler books read like fawning hagiographies of these heroes and their tools. The cover of Compilers: Principles, Techniques, and Tools literally has a dragon labeled “complexity of compiler design” being slain by a knight bearing a sword and shield branded “LALR parser generator” and “syntax directed translation”. They laid it on thick.

A little self-congratulation is well-deserved, but the truth is you don’t need to know most of that stuff to bang out a high quality parser for a modern machine. As always, I encourage you to broaden your education and take it in later, but this book omits the trophy case.

6.1 Ambiguity and the Parsing Game

In the last chapter, I said you can “play” a context-free grammar like a game in order to generate strings. Parsers play that game in reverse. Given a string (a series of tokens), we map those tokens to terminals in the grammar to figure out which rules could have generated that string.

The “could have” part is interesting. It’s entirely possible to create a grammar that is ambiguous, where different choices of productions can lead to the same string. When you’re using the grammar to generate strings, that doesn’t matter much. Once you have the string, who cares how you got to it?

When parsing, ambiguity means the parser may misunderstand the user’s code. As we parse, we aren’t just determining if the string is valid Lox code, we’re also tracking which rules match which parts of it so that we know what part of the language each token belongs to. Here’s the Lox expression grammar we put together in the last chapter:

expression     → literal
               | unary
               | binary
               | grouping ;

literal        → NUMBER | STRING | "true" | "false" | "nil" ;
grouping       → "(" expression ")" ;
unary          → ( "-" | "!" ) expression ;
binary         → expression operator expression ;
operator       → "==" | "!=" | "<" | "<=" | ">" | ">="
               | "+"  | "-"  | "*" | "/" ;

This is a valid string in that grammar:

6 / 3 - 1

But there are two ways we could have generated it. One way is:

  1. Starting at expression, pick binary.
  2. For the left-hand expression, pick NUMBER, and use 6.
  3. For the operator, pick "/".
  4. For the right-hand expression, pick binary again.
  5. In that nested binary expression, pick 3 - 1.

Another is:

  1. Starting at expression, pick binary.
  2. For the left-hand expression, pick binary again.
  3. In that nested binary expression, pick 6 / 3.
  4. Back at the outer binary, for the operator, pick "-".
  5. For the right-hand expression, pick NUMBER, and use 1.

Those produce the same strings, but not the same syntax trees:

Two valid syntax trees: (6 / 3) - 1 and 6 / (3 - 1)

In other words, the grammar allows seeing the expression as (6 / 3) - 1 or 6 / (3 - 1). The binary rule lets operands nest any which way you want. That in turn affects the result of evaluating the parsed tree. The way mathematicians have addressed this ambiguity since blackboards were first invented is by defining rules for precedence and associativity.

Without well-defined precedence and associativity, an expression that uses multiple operators is ambiguous: it can be parsed into different syntax trees, which could in turn evaluate to different results. We’ll fix that in Lox by applying the same precedence rules as C, going from lowest to highest.

Name          Operators          Associates
Equality      == !=              Left
Comparison    > >= < <=          Left
Term          - +                Left
Factor        / *                Left
Unary         ! -                Right

Right now, the grammar stuffs all expression types into a single expression rule. That same rule is used as the non-terminal for operands, which lets the grammar accept any kind of expression as a subexpression, regardless of whether the precedence rules allow it.

We fix that by stratifying the grammar. We define a separate rule for each precedence level.

expression     → ...
equality       → ...
comparison     → ...
term           → ...
factor         → ...
unary          → ...
primary        → ...

Each rule here only matches expressions at its precedence level or higher. For example, unary matches a unary expression like !negated or a primary expression like 1234. And term can match 1 + 2 but also 3 * 4 / 5. The final primary rule covers the highest-precedence forms: literals and parenthesized expressions.

We just need to fill in the productions for each of those rules. We’ll do the easy ones first. The top expression rule matches any expression at any precedence level. Since equality has the lowest precedence, if we match that, then it covers everything.

expression     → equality

Over at the other end of the precedence table, a primary expression contains all the literals and grouping expressions.

primary        → NUMBER | STRING | "true" | "false" | "nil"
               | "(" expression ")" ;

A unary expression starts with a unary operator followed by the operand. Since unary operators can nest (!!true is a valid if weird expression), the operand can itself be a unary operator. A recursive rule handles that nicely.

unary          → ( "!" | "-" ) unary ;

But this rule has a problem. It never terminates.

Remember, each rule needs to match expressions at that precedence level or higher, so we also need to let this match a primary expression.

unary          → ( "!" | "-" ) unary
               | primary ;

That works.

The remaining rules are all binary operators. We’ll start with the rule for multiplication and division. Here’s a first try:

factor         → factor ( "/" | "*" ) unary
               | unary ;

The rule recurses to match the left operand. That enables the rule to match a series of multiplication and division expressions like 1 * 2 / 3. Putting the recursive production on the left side and unary on the right makes the rule left-associative and unambiguous.

All of this is correct, but the fact that the first symbol in the body of the rule is the same as the head of the rule means this production is left-recursive. Some parsing techniques, including the one we’re going to use, have trouble with left recursion. (Recursion elsewhere, like we have in unary, and the indirect recursion for grouping in primary, are not a problem.)
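To see the problem concretely, here is a hypothetical sketch (not code we will actually add) of what a literal recursive descent translation of the left-recursive rule would look like. The very first thing the method does is call itself, before consuming a single token, so it never makes progress and eventually overflows the stack.

  // Hypothetical, broken translation of:
  //   factor → factor ( "/" | "*" ) unary | unary ;
  private Expr factor() {
    Expr left = factor();  // Recurses immediately, with no token consumed,
                           // so parsing never terminates.
    // ...match the operator and right operand here -- never reached.
    return left;
  }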

There are many grammars you can define that match the same language. The choice for how to model a particular language is partially a matter of taste and partially a pragmatic one. This rule is correct, but not optimal for how we intend to parse it. Instead of a left recursive rule, we’ll use a different one.

factor         → unary ( ( "/" | "*" ) unary )* ;

We define a factor expression as a flat sequence of multiplications and divisions. This matches the same syntax as the previous rule, but better mirrors the code we’ll write to parse Lox. We use the same structure for all of the other binary operator precedence levels, giving us this complete expression grammar:

expression     → equality ;
equality       → comparison ( ( "!=" | "==" ) comparison )* ;
comparison     → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term           → factor ( ( "-" | "+" ) factor )* ;
factor         → unary ( ( "/" | "*" ) unary )* ;
unary          → ( "!" | "-" ) unary
               | primary ;
primary        → NUMBER | STRING | "true" | "false" | "nil"
               | "(" expression ")" ;

This grammar is more complex than the one we had before, but in return we have eliminated the previous one’s ambiguity. It’s just what we need to make a parser.
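To convince yourself the stratification does what we want, it helps to walk through one derivation by hand. Here is my own worked example of how this grammar matches 1 + 2 * 3, grouping the multiplication tighter than the addition:

expression → equality → comparison → term
term       matches   factor "+" factor          (one pass through term's ( ... )* loop)
  left factor   → unary → primary → NUMBER 1
  right factor  matches   unary "*" unary       (one pass through factor's loop)
                  → primary → NUMBER 2   and   primary → NUMBER 3

The result groups as 1 + (2 * 3); the grammar cannot produce (1 + 2) * 3 without explicit parentheses, because a factor's operands are unary expressions, never bare terms containing + or -.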

6.2 Recursive Descent Parsing

There is a whole pack of parsing techniques whose names are mostly combinations of “L” and “R” (LL(k), LR(1), LALR), along with more exotic beasts like parser combinators, Earley parsers, the shunting yard algorithm, and packrat parsing. For our first interpreter, one technique is more than sufficient: recursive descent.

Recursive descent is the simplest way to build a parser, and doesn’t require using complex parser generator tools like Yacc, Bison or ANTLR. All you need is straightforward handwritten code. Don’t be fooled by its simplicity, though. Recursive descent parsers are fast, robust, and can support sophisticated error handling. In fact, GCC, V8 (the JavaScript VM in Chrome), Roslyn (the C# compiler written in C#) and many other heavyweight production language implementations use recursive descent. It rocks.

Recursive descent is considered a top-down parser because it starts from the top or outermost grammar rule (here expression) and works its way down into the nested subexpressions before finally reaching the leaves of the syntax tree. This is in contrast with bottom-up parsers like LR that start with primary expressions and compose them into larger and larger chunks of syntax.

A recursive descent parser is a literal translation of the grammar’s rules straight into imperative code. Each rule becomes a function. The body of the rule translates to code roughly like:

Grammar notation    Code representation
Terminal            Code to match and consume a token
Nonterminal         Call to that rule’s function
|                   if or switch statement
* or +              while or for loop
?                   if statement

The descent is described as “recursive” because when a grammar rule refers to itself, directly or indirectly, that translates to a recursive function call.

6.2.1 The parser class

Each grammar rule becomes a method inside this new class:

lox/Parser.java
create new file
package com.craftinginterpreters.lox;

import java.util.List;

import static com.craftinginterpreters.lox.TokenType.*;

class Parser {
  private final List<Token> tokens;
  private int current = 0;

  Parser(List<Token> tokens) {
    this.tokens = tokens;
  }
}

Like the scanner, the parser consumes a flat input sequence, only now we’re reading tokens instead of characters. We store the list of tokens and use current to point to the next token eagerly waiting to be parsed.

We’re going to run straight through the expression grammar now and translate each rule to Java code. The first rule, expression, simply expands to the equality rule, so that’s straightforward.

lox/Parser.java
add after Parser()
  private Expr expression() {
    return equality();
  }

Each method for parsing a grammar rule produces a syntax tree for that rule and returns it to the caller. When the body of the rule contains a nonterminal (a reference to another rule), we call that other rule’s method.

The rule for equality is a little more complex.

equality       → comparison ( ( "!=" | "==" ) comparison )* ;

In Java, that becomes:

lox/Parser.java
add after expression()
  private Expr equality() {
    Expr expr = comparison();

    while (match(BANG_EQUAL, EQUAL_EQUAL)) {
      Token operator = previous();
      Expr right = comparison();
      expr = new Expr.Binary(expr, operator, right);
    }

    return expr;
  }

Let’s step through it. The first comparison nonterminal in the body translates to the first call to comparison() in the method. We take that result and store it in a local variable.

Then, the ( ... )* loop in the rule maps to a while loop. We need to know when to exit that loop. We can see that inside the rule, we must first find either a != or == token. So, if we don’t see one of those, we must be done with the sequence of equality operators. We express that check using a handy match() method.

lox/Parser.java
add after equality()
  private boolean match(TokenType... types) {
    for (TokenType type : types) {
      if (check(type)) {
        advance();
        return true;
      }
    }

    return false;
  }

This checks to see if the current token has any of the given types. If so, it consumes the token and returns true. Otherwise, it returns false and leaves the current token alone. The match() method is defined in terms of two more fundamental operations.

The check() method returns true if the current token is of the given type. Unlike match(), it never consumes the token, it only looks at it.

lox/Parser.java
add after match()
  private boolean check(TokenType type) {
    if (isAtEnd()) return false;
    return peek().type == type;
  }

The advance() method consumes the current token and returns it, similar to how our scanner’s corresponding method crawled through characters.

lox/Parser.java
add after check()
  private Token advance() {
    if (!isAtEnd()) current++;
    return previous();
  }

These methods bottom out on the last handful of primitive operations.

lox/Parser.java
add after advance()
  private boolean isAtEnd() {
    return peek().type == EOF;
  }

  private Token peek() {
    return tokens.get(current);
  }

  private Token previous() {
    return tokens.get(current - 1);
  }

isAtEnd() checks if we’ve run out of tokens to parse. peek() returns the current token we have yet to consume, and previous() returns the most recently consumed token. The latter makes it easier to use match() and then access the just-matched token.

That’s most of the parsing infrastructure we need. Where were we? Right, so if we are inside the while loop in equality(), then we know we have found a != or == operator and must be parsing an equality expression.

We grab the matched operator token so we can track which kind of equality expression we have. Then we call comparison() again to parse the right-hand operand. We combine the operator and its two operands into a new Expr.Binary syntax tree node, and then loop around. For each iteration, we store the resulting expression back in the same expr local variable. As we zip through a sequence of equality expressions, that creates a left-associative nested tree of binary operator nodes.

The syntax tree created by parsing 'a == b == c == d == e'

The parser falls out of the loop once it hits a token that’s not an equality operator. Finally, it returns the expression. Note that if the parser never encounters an equality operator, then it never enters the loop. In that case, the equality() method effectively calls and returns comparison(). In that way, this method matches an equality operator or anything of higher precedence.
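As a concrete illustration (my own trace, written as comments rather than code from the book), here is roughly what equality() does with the tokens 1 == 2 == 3, where Binary(...) is shorthand for new Expr.Binary(...):

  // expr = comparison()                         parses the literal 1
  // match(EQUAL_EQUAL) succeeds; operator = "=="
  // right = comparison()                        parses the literal 2
  // expr = Binary(1, ==, 2)
  // match(EQUAL_EQUAL) succeeds; operator = "=="
  // right = comparison()                        parses the literal 3
  // expr = Binary(Binary(1, ==, 2), ==, 3)      left-associative nesting
  // match(BANG_EQUAL, EQUAL_EQUAL) fails; return expr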

Moving on to the next rule . . . 

comparison     → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;

Translated to Java:

lox/Parser.java
add after equality()
  private Expr comparison() {
    Expr expr = term();

    while (match(GREATER, GREATER_EQUAL, LESS, LESS_EQUAL)) {
      Token operator = previous();
      Expr right = term();
      expr = new Expr.Binary(expr, operator, right);
    }

    return expr;
  }

The grammar rule is virtually identical to equality and so is the corresponding code. The only differences are the token types for the operators we match, and the method we call for the operands, now term() instead of comparison(). The remaining two binary operator rules follow the same pattern.

In order of precedence, first addition and subtraction:

lox/Parser.java
add after comparison()
  private Expr term() {
    Expr expr = factor();

    while (match(MINUS, PLUS)) {
      Token operator = previous();
      Expr right = factor();
      expr = new Expr.Binary(expr, operator, right);
    }

    return expr;
  }

And finally, multiplication and division:

lox/Parser.java
add after term()
  private Expr factor() {
    Expr expr = unary();

    while (match(SLASH, STAR)) {
      Token operator = previous();
      Expr right = unary();
      expr = new Expr.Binary(expr, operator, right);
    }

    return expr;
  }

That’s all of the binary operators, parsed with the correct precedence and associativity. We’re crawling up the precedence hierarchy and now we’ve reached the unary operators.

unary          → ( "!" | "-" ) unary
               | primary ;

The code for this is a little different.

lox/Parser.java
add after factor()
  private Expr unary() {
    if (match(BANG, MINUS)) {
      Token operator = previous();
      Expr right = unary();
      return new Expr.Unary(operator, right);
    }

    return primary();
  }

Again, we look at the current token to see how to parse. If it’s a ! or -, we must have a unary expression. In that case, we grab the token and then recursively call unary() again to parse the operand. Wrap that all up in a unary expression syntax tree and we’re done.

Otherwise, we must have reached the highest level of precedence, primary expressions.

primary        → NUMBER | STRING | "true" | "false" | "nil"
               | "(" expression ")" ;

Most of the cases for the rule are single terminals, so parsing is straightforward.

lox/Parser.java
add after unary()
  private Expr primary() {
    if (match(FALSE)) return new Expr.Literal(false);
    if (match(TRUE)) return new Expr.Literal(true);
    if (match(NIL)) return new Expr.Literal(null);

    if (match(NUMBER, STRING)) {
      return new Expr.Literal(previous().literal);
    }

    if (match(LEFT_PAREN)) {
      Expr expr = expression();
      consume(RIGHT_PAREN, "Expect ')' after expression.");
      return new Expr.Grouping(expr);
    }
  }

The interesting branch is the one for handling parentheses. After we match an opening ( and parse the expression inside it, we must find a ) token. If we don’t, that’s an error.

6.3 Syntax Errors

A parser really has two jobs:

  1. Given a valid sequence of tokens, produce a corresponding syntax tree.

  2. Given an invalid sequence of tokens, detect any errors and tell the user about their mistakes.

Don’t underestimate how important the second job is! In modern IDEs and editors, the parser is constantly reparsing code, often while the user is still editing it, in order to syntax highlight and support things like auto-complete. That means it will encounter code in incomplete, half-wrong states all the time.

When the user doesn’t realize the syntax is wrong, it is up to the parser to help guide them back onto the right path. The way it reports errors is a large part of your language’s user interface. Good syntax error handling is hard. By definition, the code isn’t in a well-defined state, so there’s no infallible way to know what the user meant to write. The parser can’t read your mind.

There are a couple of hard requirements for when the parser runs into a syntax error. A parser must:

  1. Detect and report the error. If it doesn’t detect the error and passes the resulting malformed syntax tree on to the interpreter, all manner of horrors may be summoned.

  2. Avoid crashing or hanging. Syntax errors are a fact of life, and language tools have to be robust in the face of them.

Those are the table stakes if you want to get in the parser game at all, but you really want to raise the ante beyond that. A decent parser should:

  1. Be fast. Modern tools reparse code constantly, often on every keystroke, so the parser can’t afford to dawdle.

  2. Report as many distinct errors as there are. Users want to see every error in their code in one pass instead of fixing one, recompiling, and hunting for the next.

  3. Minimize cascaded errors. Once a single error is found, the parser no longer really knows what’s going on, and it shouldn’t bury the user in phantom errors that disappear as soon as the first real one is fixed.

The last two points are in tension. We want to report as many separate errors as we can, but we don’t want to report ones that are merely side effects of an earlier one.

The way a parser responds to an error and keeps going to look for later errors is called error recovery. This was a hot research topic in the ’60s. Back then, you’d hand a stack of punch cards to the secretary and come back the next day to see if the compiler succeeded. With an iteration loop that slow, you really wanted to find every single error in your code in one pass.

Today, when parsers complete before you’ve even finished typing, it’s less of an issue. Simple, fast error recovery is fine.

6.3.1 Panic mode error recovery

Of all the recovery techniques devised in yesteryear, the one that best stood the test of time is called, somewhat alarmingly, panic mode. As soon as the parser detects an error, it enters panic mode. It knows at least one token doesn’t make sense given its current state in the middle of some stack of grammar productions.

Before it can get back to parsing, it needs to get its state and the sequence of forthcoming tokens aligned such that the next token does match the rule being parsed. This process is called synchronization.

To do that, we select some rule in the grammar that will mark the synchronization point. The parser fixes its parsing state by jumping out of any nested productions until it gets back to that rule. Then it synchronizes the token stream by discarding tokens until it reaches one that can appear at that point in the rule.

Any additional real syntax errors hiding in those discarded tokens aren’t reported, but it also means that any mistaken cascaded errors that are side effects of the initial error aren’t falsely reported either, which is a decent trade-off.

The traditional place in the grammar to synchronize is between statements. We don’t have those yet, so we won’t actually synchronize in this chapter, but we’ll get the machinery in place for later.

6.3.2 Entering panic mode

Back before we went on this side trip around error recovery, we were writing the code to parse a parenthesized expression. After parsing the expression, the parser looks for the closing ) by calling consume(). Here, finally, is that method:

lox/Parser.java
add after match()
  private Token consume(TokenType type, String message) {
    if (check(type)) return advance();

    throw error(peek(), message);
  }

It’s similar to match() in that it checks to see if the next token is of the expected type. If so, it consumes the token and everything is groovy. If some other token is there, then we’ve hit an error. We report it by calling this:

lox/Parser.java
add after previous()
  private ParseError error(Token token, String message) {
    Lox.error(token, message);
    return new ParseError();
  }

First, that shows the error to the user by calling:

lox/Lox.java
add after report()
  static void error(Token token, String message) {
    if (token.type == TokenType.EOF) {
      report(token.line, " at end", message);
    } else {
      report(token.line, " at '" + token.lexeme + "'", message);
    }
  }

This reports an error at a given token. It shows the token’s location and the token itself. This will come in handy later since we use tokens throughout the interpreter to track locations in code.
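Assuming the report() helper we wrote back in the scanning chapter, a parse error ends up looking something like this. For example, parsing (1 + 2 with the closing parenthesis missing would print:

[line 1] Error at end: Expect ')' after expression.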

After we report the error, the user knows about their mistake, but what does the parser do next? Back in error(), we create and return a ParseError, an instance of this new class:

class Parser {
lox/Parser.java
nest inside class Parser
  private static class ParseError extends RuntimeException {}

  private final List<Token> tokens;

This is a simple sentinel class we use to unwind the parser. The error() method returns the error instead of throwing it because we want to let the calling method inside the parser decide whether to unwind or not. Some parse errors occur in places where the parser isn’t likely to get into a weird state and we don’t need to synchronize. In those places, we simply report the error and keep on truckin’.

For example, Lox limits the number of arguments you can pass to a function. If you pass too many, the parser needs to report that error, but it can and should simply keep on parsing the extra arguments instead of freaking out and going into panic mode.
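In code, that difference is simply whether the caller throws the returned exception or discards it. A sketch (the argument-count check itself doesn’t appear until we add function calls in a later chapter):

    // Panic: report, then unwind to a synchronization point.
    throw error(peek(), "Expect ')' after expression.");

    // Report only: note the error, but keep parsing right where we are.
    error(peek(), "Can't have more than 255 arguments.");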

In our case, though, the syntax error is nasty enough that we want to panic and synchronize. Discarding tokens is pretty easy, but how do we synchronize the parser’s own state?

6.3.3 Synchronizing a recursive descent parser

With recursive descent, the parser’s state (which rules it is in the middle of recognizing) is not stored explicitly in fields. Instead, we use Java’s own call stack to track what the parser is doing. Each rule in the middle of being parsed is a call frame on the stack. In order to reset that state, we need to clear out those call frames.

The natural way to do that in Java is exceptions. When we want to synchronize, we throw that ParseError object. Higher up in the method for the grammar rule we are synchronizing to, we’ll catch it. Since we synchronize on statement boundaries, we’ll catch the exception there. After the exception is caught, the parser is in the right state. All that’s left is to synchronize the tokens.
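Once we have statements, the catch site will look roughly like this (a preview sketch, not code to add now; the real method arrives in a later chapter alongside statement parsing):

  private Stmt declaration() {
    try {
      return statement();
    } catch (ParseError error) {
      synchronize();
      return null;
    }
  }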

We want to discard tokens until we’re right at the beginning of the next statement. That boundary is pretty easy to spot; it’s one of the main reasons we picked it. After a semicolon, we’re probably finished with a statement. Most statements start with a keyword: for, if, return, var, etc. When the next token is any of those, we’re probably about to start a statement.

This method encapsulates that logic:

lox/Parser.java
add after error()
  private void synchronize() {
    advance();

    while (!isAtEnd()) {
      if (previous().type == SEMICOLON) return;

      switch (peek().type) {
        case CLASS:
        case FUN:
        case VAR:
        case FOR:
        case IF:
        case WHILE:
        case PRINT:
        case RETURN:
          return;
      }

      advance();
    }
  }

It discards tokens until it thinks it has found a statement boundary. After catching a ParseError, we’ll call this and then we are hopefully back in sync. When it works well, we have discarded tokens that would have likely caused cascaded errors anyway, and now we can parse the rest of the file starting at the next statement.

Alas, we don’t get to see this method in action, since we don’t have statements yet. We’ll get to that in a couple of chapters. For now, if an error occurs, we’ll panic and unwind all the way to the top and stop parsing. Since we can parse only a single expression anyway, that’s no big loss.

6.4 Wiring up the Parser

We are mostly done parsing expressions now. There is one other place where we need to add a little error handling. As the parser descends through the parsing methods for each grammar rule, it eventually hits primary(). If none of the cases in there match, it means we are sitting on a token that can’t start an expression. We need to handle that error too.

    if (match(LEFT_PAREN)) {
      Expr expr = expression();
      consume(RIGHT_PAREN, "Expect ')' after expression.");
      return new Expr.Grouping(expr);
    }
lox/Parser.java
in primary()
    throw error(peek(), "Expect expression.");
  }

With that, all that remains in the parser is to define an initial method to kick it off. That method is called, naturally enough, parse().

lox/Parser.java
add after Parser()
  Expr parse() {
    try {
      return expression();
    } catch (ParseError error) {
      return null;
    }
  }

We’ll revisit this method later when we add statements to the language. For now, it parses a single expression and returns it. We also have some temporary code to exit out of panic mode. Syntax error recovery is the parser’s job, so we don’t want the ParseError exception to escape into the rest of the interpreter.

When a syntax error does occur, this method returns null. That’s OK. The parser promises not to crash or hang on invalid syntax, but it doesn’t promise to return a usable syntax tree if an error is found. As soon as the parser reports an error, hadError gets set, and subsequent phases are skipped.

Finally, we can hook up our brand new parser to the main Lox class and try it out. We still don’t have an interpreter, so for now, we’ll parse to a syntax tree and then use the AstPrinter class from the last chapter to display it.

Delete the old code to print the scanned tokens and replace it with this:

    List<Token> tokens = scanner.scanTokens();
lox/Lox.java
in run()
replace 5 lines
    Parser parser = new Parser(tokens);
    Expr expression = parser.parse();

    // Stop if there was a syntax error.
    if (hadError) return;

    System.out.println(new AstPrinter().print(expression));
  }

Congratulations, you have crossed the threshold! That really is all there is to handwriting a parser. We’ll extend the grammar in later chapters with assignment, statements, and other stuff, but none of that is any more complex than the binary operators we tackled here.

Fire up the interpreter and type in some expressions. See how it handles precedence and associativity correctly? Not bad for less than 200 lines of code.
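For example, with the AstPrinter from the last chapter wired in as above, a REPL session should look something like this:

> !true == false
(== (! true) false)
> true == false == true
(== (== true false) true)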

Challenges

  1. In C, a block is a statement form that allows you to pack a series of statements where a single one is expected. The comma operator is an analogous syntax for expressions. A comma-separated series of expressions can be given where a single expression is expected (except inside a function call’s argument list). At runtime, the comma operator evaluates the left operand and discards the result. Then it evaluates and returns the right operand.

    Add support for comma expressions. Give them the same precedence and associativity as in C. Write the grammar, and then implement the necessary parsing code.

  2. Likewise, add support for the C-style conditional or “ternary” operator ?:. What precedence level is allowed between the ? and :? Is the whole operator left-associative or right-associative?

  3. Add error productions to handle each binary operator appearing without a left-hand operand. In other words, detect a binary operator appearing at the beginning of an expression. Report that as an error, but also parse and discard a right-hand operand with the appropriate precedence.

Design Note: Logic Versus History

Let’s say we decide to add bitwise & and | operators to Lox. Where should we put them in the precedence hierarchy? C, and most languages that follow in C’s footsteps, place them below ==. This is widely considered a mistake because it means common operations like testing a flag require parentheses.

if (flags & FLAG_MASK == SOME_FLAG) { ... } // Wrong.
if ((flags & FLAG_MASK) == SOME_FLAG) { ... } // Right.

Should we fix this for Lox and put bitwise operators higher up the precedence table than C does? There are two strategies we can take.

You almost never want to use the result of an == expression as the operand to a bitwise operator. By making bitwise bind tighter, users don’t need to parenthesize as often. So if we do that, and users assume the precedence is chosen logically to minimize parentheses, they’re likely to infer it correctly.
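In grammar terms, that choice would amount to slotting a bitwise level in between equality and comparison, instead of below equality the way C does. A sketch, not part of Lox’s actual grammar:

equality       → bitwise ( ( "!=" | "==" ) bitwise )* ;
bitwise        → comparison ( ( "&" | "|" ) comparison )* ;

With that layering, flags & FLAG_MASK == SOME_FLAG parses as (flags & FLAG_MASK) == SOME_FLAG, with no parentheses required.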

This kind of internal consistency makes the language easier to learn because there are fewer edge cases and exceptions users have to stumble into and then correct. That’s good, because before users can use our language, they have to load all of that syntax and semantics into their heads. A simpler, more rational language makes sense.

But, for many users there is an even faster shortcut to getting our language’s ideas into their wetware: use concepts they already know. Many newcomers to our language will be coming from some other language or languages. If our language uses some of the same syntax or semantics as those, there is much less for the user to learn (and unlearn).

This is particularly helpful with syntax. You may not remember it well today, but way back when you learned your very first programming language, code probably looked alien and unapproachable. Only through painstaking effort did you learn to read and accept it. If you design a novel syntax for your new language, you force users to start that process all over again.

Taking advantage of what users already know is one of the most powerful tools you can use to ease adoption of your language. It’s almost impossible to overestimate how valuable this is. But it faces you with a nasty problem: What happens when the thing the users all know kind of sucks? C’s bitwise operator precedence is a mistake that doesn’t make sense. But it’s a familiar mistake that millions have already gotten used to and learned to live with.

Do you stay true to your language’s own internal logic and ignore history? Do you start from a blank slate and first principles? Or do you weave your language into the rich tapestry of programming history and give your users a leg up by starting from something they already know?

There is no perfect answer here, only trade-offs. You and I are obviously biased towards liking novel languages, so our natural inclination is to burn the history books and start our own story.

In practice, it’s often better to make the most of what users already know. Getting them to come to your language requires a big leap. The smaller you can make that chasm, the more people will be willing to cross it. But you can’t always stick to history, or your language won’t have anything new and compelling to give people a reason to jump over.