Backus–Naur Form 巴科斯-诺尔范式
In this topic, you have been introduced to regular languages and to the idea that a set of strings within a particular regular language can be defined using a finite state machine or regular expression.
在本主题中,我们向您介绍了正则语言以及可以使用有限状态机或正则表达式定义特定正则语言中的一组字符串的想法。
Now let’s cover another class of languages, known as context-free languages. The set of context-free languages includes all regular languages.
现在让我们介绍另一类语言,称为上下文无关语言。上下文无关语言集包括所有常规语言。
Most programming languages can be defined as context-free languages. Context-free languages are more complex than regular languages. To define the set of strings in a particular context free language, use context-free grammar: a set of production rules that describe all possible strings (in a given language). One example of a context-free grammar is Backus–Naur Form.
大多数编程语言都可以定义为上下文无关语言。上下文无关语言比常规语言更复杂。要以特定的上下文无关语言定义字符串集,请使用上下文无关语法:一组描述所有可能字符串(在给定语言中)的产生规则。上下文无关语法的一个例子是巴科斯-诺尔范式。
BNF (Backus–Naur Form) is a context-free grammar commonly used by developers of programming languages to specify the syntax rules of a language. John Backus was a program language designer who devised a notation to document IAL (an early implementation of Algol). Peter Naur later worked on Backus’ findings, and the notation was jointly credited to both computer scientists.
BNF(巴科斯-诺尔范式)是编程语言开发人员常用的上下文无关语法,用于指定语言的语法规则。 John Backus 是一位程序语言设计师,他设计了一种表示法来记录 IAL(Algol 的早期实现)。彼得·诺尔后来对巴克斯的发现进行了研究,该符号被共同归功于两位计算机科学家。
BNF uses a range of symbols and expressions to create production rules. A simple BNF production rule might look like this:
BNF 使用一系列符号和表达式来创建产生式规则。一个简单的 BNF 产生规则可能如下所示:
<digit> ::= 0|1|2|3|4|5|6|7|8|9
::= 0|1|2|3|4|5|6|7|8|9
This would be interpreted as: a digit can be defined as 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9
这将被解释为:数字可以定义为 0、1、2、3、4、5、6、7、8 或 9
The chevrons (< >) are used to denote a non-terminal symbol. If a non-terminal symbol appears on the right side of the production rules, it means that there will be another production rule (or set of rules) to define its replacement. Consider the following production rule:
V 形 (< >) 用于表示非终结符。如果非终结符号出现在产生式规则的右侧,则意味着将有另一个产生式规则(或一组规则)来定义其替换。考虑以下生产规则:
<fullname> :: =<title><name><name>
:: =
This shows that full name comprises a title, a name, and another name. However, all three component parts are non-terminal. Therefore, further production rules are required. For example, a production rule may define title as follows:
这表明全名包括头衔、姓名和别名。然而,所有三个组成部分都是非终端的。因此,需要进一步的生产规则。例如,产生式规则可以定义标题如下:
<title> :: =Mr|Mrs|Ms|Miss|Dr
In this rule, Mr, Mrs, Ms, Miss, and Dr are terminal symbols. They are not enclosed in chevrons so they are the actual values that are allowed for title. The pipe symbol | is a metacharacter that is used to denote alternatives. Each production rule will be strictly applied; you may know that other titles exist, for example Lord, but the production rule alone determines the valid options for use in this particular formal language.
在此规则中, Mr 、 Mrs 、 Ms 、 Miss和Dr是终结符。它们没有包含在 V 形中,因此它们是title允许的实际值。管道符号|是一个元字符,用于表示替代项。每条生产规则都会被严格执行;您可能知道存在其他头衔,例如 Lord,但产生规则本身决定了在该特定正式语言中使用的有效选项。
Production rules for something as complex as the syntax of a language, will come as a very large set of BNF statements that specify how every aspect of the language is defined. Whenever you find a non-terminal symbol on the right side of a production rule, there should be another rule that has the symbol on the left side. This continues until everything can be specified in relation to terminal symbols.
像语言语法这样复杂的事物的生产规则将作为一组非常大的 BNF 语句出现,这些语句指定了语言的各个方面的定义方式。每当您在产生式规则的右侧找到非终结符时,应该有另一个规则在左侧具有该符号。这一直持续到可以指定与终结符号相关的所有内容为止。
Here is a complete set of rules (for a small subset of a programming language):
这是一套完整的规则(针对编程语言的一小部分):
<addition> ::= <number>+<number>
::= +
<number> ::= <sign><integer>|<integer>
::= |
<integer> ::= <digit>|<digit><integer>
::= |
<digit>::=0|1|2|3|4|5|6|7|8|9
<sign> ::= +|- ::= +|-
Terminal symbols are the digits 0–9, and the plus and minus signs. Note that the plus sign appears twice, once as an operator and once as the sign for a number. A valid addition statement could have a double plus sign, e.g. 23 + +6
终端符号是数字 0-9 以及加号和减号。请注意,加号出现两次,一次作为运算符,一次作为数字符号。有效的加法语句可以有两个加号,例如 23 + +6
Recursion is used in BNF to write production rules to define 'one or more' of a symbol. For example, a number can be made up of one or more digits. You may start out by writing a regular production rule:
BNF 中使用递归来编写产生式规则来定义“一个或多个”符号。例如,数字可以由一位或多位数字组成。您可以从编写常规生产规则开始:
<number> ::= <digit>|<digit><digit>|<digit><digit><digit>|…
::= | | |…
But where would you stop? How many digits can a number have? Recursion allows an elegant solution:
但你会在哪里停下来呢?一个数字可以有多少位数字?递归提供了一个优雅的解决方案:
<number> ::= <digit>|<digit><number>
::= |
<digit> ::= 0|1|2|3|4|5|6|7|8|9
::= 0|1|2|3|4|5|6|7|8|9
Now you can use the production rules to ask if the following sets of digits are numbers:
现在您可以使用产生式规则来询问以下几组数字是否是数字:
2, 16, 234
It is clear that 2 is a number, as it is a digit (therefore satisfying the first alternative):
很明显 2 是一个数字,因为它是一个数字(因此满足第一个选择):
<number> ::= <digit>|<digit><number>
::= <数字> |
What about 16? Well, 16 is not a digit, so it doesn’t satisfy the first alternative. Look at the second alternative that states that a number can be defined as a <digit> followed by a <number>.
16呢?嗯,16 不是数字,所以它不满足第一个选择。看一下第二种选择,它指出数字可以定义为随后是一个。
<number> ::= <digit>|<digit><number>
::= | <数字><数字>
1 is a digit so that is ok. Is 6 a number? Yes — you can go back to the rule and see that it satisfies the first part:
1 是一个数字,所以没问题。 6是一个数字吗?是的 - 您可以返回规则并查看它是否满足第一部分:
<number> ::= <digit>|<digit><number>
::= <数字> |
Finally, consider 234. Immediately the first alternative can be discounted as there is more than one digit. Using the second part of the rule, you can start to satisfy this as 2 is a digit. Is the remainder, 34, a number? Again, use the second part of the rule and be happy that 3 is a digit. Is the remainder, 4, a number? Yes, as it satisfies the first part of the rule. Notice how you return to the rule taking each symbol, one at a time until the supply is exhausted (or you encounter a symbol that means the rule cannot be satisfied).
最后,考虑234 。第一个选项可以立即打折,因为有多个数字。使用规则的第二部分,您可以开始满足这一点,因为2是一个数字。余数34是一个数字吗?再次使用规则的第二部分,并庆幸3是一个数字。余数4是数字吗?是的,因为它满足规则的第一部分。请注意如何返回到规则,一次一个符号,直到供应耗尽(或者您遇到一个意味着无法满足规则的符号)。
If you have already studied how recursion works, you will see that this recursive production rule has a base case (i.e. a single digit number) and a general case (i.e. a single digit followed by a number). In the general case, a recursive definition has been used. You will also see how the problem (in this case working out if 234 is a number) gets smaller and smaller until it reaches the base case: is 234 a number? Is 34 a number? Is 4 a number?
如果您已经研究过递归的工作原理,您将看到此递归产生式规则具有基本情况(即单个数字)和一般情况(即单个数字后跟一个数字)。在一般情况下,使用递归定义。您还将看到问题(在本例中计算 234 是否是一个数字)如何变得越来越小,直到达到基本情况:234 是一个数字吗? 34是一个数字吗? 4是数字吗?
Parse trees can be very useful to check whether a string satisfies a production rule. Parsing means to break something down into its component parts. Consider the set of production rules that were examined earlier:
解析树对于检查字符串是否满足产生式规则非常有用。解析意味着将事物分解为其组成部分。考虑之前检查过的一组产生式规则:
<addition> ::= <number>+<number>
::= +
<number> ::= <sign><integer>|<integer>
::= |
<integer> ::= <digit>|<digit><integer>
::= |
<digit>::=0|1|2|3|4|5|6|7|8|9
<sign> ::= +|- ::= +|-
Let’s look at how to make use of a parse tree to check whether 24+54 is a valid addition.
让我们看看如何使用解析树来检查24+54是否是有效的加法。
Step 1. Put the production rule for addition at the top of the tree. It includes a terminal + symbol. This is fine, the string you are checking includes this symbol — so far, so good! Highlight the + symbol in your tree.
步骤1.将加法产生式规则放在树的顶部。它包括一个终端+符号。这很好,您正在检查的字符串包含这个符号 - 到目前为止,一切都很好!突出显示树中的+符号。
Step 2. To be a valid addition, the strings either side of the + symbol must satisfy the production rule <number> ::= <sign><integer>|<integer>. You must have a sign followed by an integer, or just an integer. Well, neither 24 or 54 starts with a sign. To be valid, the addition needs to satisfy the production rule for integer, so you can put integer below number on either side of the tree.
步骤 2.要成为有效的加法, +符号两侧的字符串必须满足产生式规则<number> ::= <sign><integer>|<integer> 。必须有一个符号后跟一个整数,或者只是一个整数。嗯, 24和54都不以符号开头。为了有效,加法需要满足整数的产生规则,因此您可以将整数放在树两侧的数字下方。
Step 3. Now show that 24 is an integer. The production rule for integer is <integer> ::= <digit>|<digit><integer>. 24 is not a digit (as digits are just single numeric symbols). Therefore, to be valid, 24 must be a <digit><integer>. Put this on the next line of the parse tree. You can do this on both sides, as 54 will be treated similarly.
步骤 3.现在证明 24 是一个整数。整数的产生规则为<integer> ::= <digit>|<digit><integer> 。 24 不是数字(因为数字只是单个数字符号)。因此,为了有效, 24 必须是<digit><integer> 。将其放在解析树的下一行。您可以在两侧执行此操作,因为 54 将得到类似的处理。
Step 4. Consider 24 again. Is it a digit followed by an integer? Well, 2 is definitely a digit so write 2 below <digit>. Similarly, you can write 5 below <digit> on the right side of the tree. Highlight 2 and 5.
步骤 4.再次考虑 24。它是一个数字后跟一个整数吗?好吧,2 绝对是一个数字,所以在<digit>下面写上 2 。同样,您可以在树右侧的<digit>下面写上5。突出显示 2 和 5。
Step 5. Now show that 4 is an integer (on both sides). The production rule for integer is <integer> ::= <digit>|<digit><integer>. Well, 4 is a digit so write <digit> below <integer> and 4 below <digit>. You can do this on the other side as well.
Highlight both digits.
步骤 5.现在证明 4 是一个整数(两边)。整数的产生规则为<integer> ::= <digit>|<digit><integer> 。好吧,4 是一个数字,所以在 <integer > 下面写上 <digit > ,在<digit>下面写上 4 。您也可以在另一侧执行此操作。突出显示这两个数字。
You have proved, using a parse tree, that 24+54 is a valid addition.
您已经使用解析树证明了24+54是有效的加法。
Alongside BNF notation, you can apply a diagrammatic approach that mirrors the formal grammar through a syntax diagram.
除了 BNF 表示法之外,您还可以应用图解方法,通过语法图反映形式语法。
In the absence of a key, you must make sure that you can recognise terminal and non-terminal symbols. Let's look at a simple example by considering a syntax diagram for the following production rule:
在没有密钥的情况下,您必须确保您可以识别终结符和非终结符。让我们看一个简单的例子,考虑以下产生式规则的语法图:
<special> ::= @ | ? | & | *
::=@| ? | & | *
By following one of the lines, any of these characters can be defined as special. The alternatives are stacked under each other and any of them can be picked as you follow the line through the diagram from start to end.
通过遵循其中一行,这些字符中的任何一个都可以定义为特殊字符。备选方案相互堆叠,当您沿着图表从头到尾沿着线进行操作时,可以选择其中任何一个。
Now let’s consider a more complex rule:
现在让我们考虑一个更复杂的规则:
<date> ::= <day-name><month-name>|<day-name><month-name><year>
::= |
This could be drawn using a syntax diagram as shown below:
这可以使用如下所示的语法图来绘制:
Observe how to deal with the fact that year is an optional component, as specified by the production rule. Syntax diagrams are always read from left to right, following the arrows. You can follow the top line that omits the year, or drop-down to include it. You cannot go back on yourself unless the arrows allow it. Recursion is dealt with by allowing a loop back through the diagram (in BNF the only way to represent a loop is by using recursion). The rule:
观察如何处理年份是可选组件这一事实,如生产规则所指定。语法图始终按照箭头从左到右阅读。您可以按照省略年份的顶行或下拉菜单来包含年份。除非箭头允许,否则你不能反悔。递归是通过允许循环返回图来处理的(在 BNF 中表示循环的唯一方法是使用递归)。规则:
<number> ::= <digit>|<digit><number>
::= |
Would be represented by a syntax diagram as shown below:
将由如下所示的语法图表示:
Classic regular expressions and finite state automata are used to describe regular languages. BNF is an example of a context-free grammar that is used to describe a context-free language. Since all regular languages are context-free, you can convert every regular expression to a BNF production rule (or set of rules). However, the reverse is not true.
经典的正则表达式和有限状态自动机用于描述正则语言。 BNF 是上下文无关语法的一个例子,用于描述上下文无关语言。由于所有正则语言都是上下文无关的,因此您可以将每个正则表达式转换为 BNF 产生规则(或规则集)。然而,反之则不然。
Consider that you are required to describe the need for balanced parentheses in an expression. For every open (left) parenthesis, there must be a matching closed (right) parenthesis. A finite state machine has no way of remembering anything, except for being in a state that corresponds to the fact. For example, there could be a state of having an unmatched open (left) parenthesis. A FSM that tried to 'count' unmatched parentheses would need a '1 open' state, '2 open' state, and so on. It could only represent an infinite number of open parentheses if it had an infinite number of open states — but, as the name makes clear, a FSM has a finite number of states.
考虑一下您需要描述表达式中平衡括号的需要。对于每个左括号,必须有一个匹配的右括号。有限状态机无法记住任何东西,除了处于与事实相对应的状态之外。例如,可能存在不匹配的左括号的状态。尝试“计算”不匹配括号的 FSM 需要“1 打开”状态、“2 打开”状态,依此类推。如果它有无限数量的开放状态,那么它只能表示无限数量的开放括号——但是,正如名称所表明的,有限状态机的状态数量是有限的。
Therefore, a context-free language is needed to describe syntax whenever there are an infinite number of elements of strings to be counted.
因此,每当要计算的字符串元素数量无限时,就需要一种上下文无关的语言来描述语法。