Snapshot of https://www.boost.org/doc/libs/1_87_0/doc/html/boost_parser/tutorial.html, saved on 2025-1-3 9:50.

Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world.
Herb Sutter and Andrei Alexandrescu, C++ Coding Standards


Tutorial

Terminology
Hello, Whomever
A Trivial Example
A Trivial Example That Gracefully Handles Whitespace
Semantic Actions
Parsing to Find Subranges
The Parse Context
Rule Parsers
Parsing into structs and classes
Alternative Parsers
Parsing Quoted Strings
Parsing In Detail
Backtracking
Symbol Tables
Mutable Symbol Tables
The Parsers And Their Uses
Directives
Combining Operations
Attribute Generation
The parse() API
More About Rules
Algorithms and Views That Use Parsers
Unicode Support
Callback Parsing
Error Handling and Debugging
Memory Allocation
Best Practices
Writing Your Own Parsers

Terminology

First, let's cover some terminology that we'll be using throughout the docs:

A semantic action is an arbitrary bit of logic associated with a parser, that is only executed when the parser matches.

Simpler parsers can be combined to form more complex parsers. Given some combining operation C, and parsers P0, P1, ... PN, C(P0, P1, ... PN) creates a new parser Q. This creates a parse tree. Q is the parent of P1, P2 is the child of Q, etc. The parsers are applied in the top-down fashion implied by this topology. When you use Q to parse a string, it will use P0, P1, etc. to do the actual work. If P3 is being used to parse the input, that means that Q is as well, since the way Q parses is by dispatching to its children to do some or all of the work. At any point in the parse, there will be exactly one parser without children that is being used to parse the input; all other parsers being used are its ancestors in the parse tree.

A subparser is a parser that is the child of another parser.

The top-level parser is the root of the tree of parsers.

The current parser or bottommost parser is the parser with no children that is currently being used to parse the input.

A rule is a kind of parser that makes building large, complex parsers easier. A subrule is a rule that is the child of some other rule. The current rule or bottommost rule is the one rule currently being used to parse the input that has no subrules. Note that while there is always exactly one current parser, there may or may not be a current rule — rules are one kind of parser, and you may or may not be using one at a given point in the parse.

The top-level parse is the parse operation being performed by the top-level parser. This term is necessary because, though most parse failures are local to a particular parser, some parse failures cause the call to parse() to indicate failure of the entire parse. For these cases, we say that such a local failure "causes the top-level parse to fail".

Throughout the Boost.Parser documentation, I will refer to "the call to parse()". Read this as "the call to any one of the functions described in The parse() API". That includes prefix_parse(), callback_parse(), and callback_prefix_parse().

There are some special kinds of parsers that come up often in this documentation.

One is a sequence parser; you will see it created using operator>>, as in p1 >> p2 >> p3. A sequence parser tries to match all of its subparsers to the input, one at a time, in order. It matches the input iff all its subparsers do.

Another is an alternative parser; you will see it created using operator|, as in p1 | p2 | p3. An alternative parser tries to match all of its subparsers to the input, one at a time, in order; it stops after matching at most one subparser. It matches the input iff one of its subparsers does.

Finally, there is a permutation parser; it is created using operator||, as in p1 || p2 || p3. A permutation parser tries to match all of its subparsers to the input, in any order. So the parser p1 || p2 || p3 is equivalent to (p1 >> p2 >> p3) | (p1 >> p3 >> p2) | (p2 >> p1 >> p3) | (p2 >> p3 >> p1) | (p3 >> p1 >> p2) | (p3 >> p2 >> p1). Hopefully its terseness is self-explanatory. It matches the input iff all of its subparsers do, regardless of the order they match in.

Boost.Parser parsers each have an attribute associated with them, or explicitly have no attribute. An attribute is a value that the parser generates when it matches the input. For instance, the parser double_ generates a double when it matches the input. ATTR() is a notional macro that expands to the attribute type of the parser passed to it; ATTR(double_) is double. This is similar to the attribute type trait.

Next, we'll look at some simple programs that parse using Boost.Parser. We'll start small and build up from there.

Hello, Whomever

This is just about the most minimal example of using Boost.Parser that one could write. We take a string from the command line, or "World" if none is given, and then we parse it:

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


namespace bp = boost::parser;

int main(int argc, char const * argv[])
{
    std::string input = "World";
    if (1 < argc)
        input = argv[1];

    std::string result;
    bp::parse(input, *bp::char_, result);
    std::cout << "Hello, " << result << "!\n";
}

The expression *bp::char_ is a parser-expression. It uses one of the many parsers that Boost.Parser provides: char_. Like all Boost.Parser parsers, it has certain operations defined on it. In this case, *bp::char_ is using an overloaded operator* as the C++ version of a Kleene star operator. Since C++ has no postfix unary * operator, we have to use the one we have, so it is used as a prefix.

So, *bp::char_ means "any number of characters". In other words, it really cannot fail. Even an empty string will match it.

The parse operation is performed by calling the parse() function, passing the parser as one of the arguments:

bp::parse(input, *bp::char_, result);

The arguments here are: input, the range to parse; *bp::char_, the parser used to do the parse; and result, an out-parameter into which to put the result of the parse. Don't get too caught up on this method of getting the parse result out of parse(); there are multiple ways of doing so, and we'll cover all of them in subsequent sections.

Also, just ignore for now the fact that Boost.Parser somehow figured out that the result type of the *bp::char_ parser is a std::string. There are clear rules for this that we'll cover later.

The effects of this call to parse() are not very interesting. Since the parser we gave it cannot ever fail, and because we're placing the output in the same type as the input, it just copies the contents of input to result.

A Trivial Example

Let's look at a slightly more complicated example, even if it is still trivial. Instead of taking any old chars we're given, let's require some structure. Let's parse one or more doubles, separated by commas.

The Boost.Parser parser for double is double_. So, to parse a single double, we'd just use that. If we wanted to parse two doubles in a row, we'd use:

boost::parser::double_ >> boost::parser::double_

operator>> in this expression is the sequence-operator; read it as "followed by". If we combine the sequence-operator with Kleene star, we can get the parser we want by writing:

boost::parser::double_ >> *(',' >> boost::parser::double_)

This is a parser that matches at least one double — because of the first double_ in the expression above — followed by zero or more instances of a-comma-followed-by-a-double. Notice that we can use ',' directly. Though it is not a parser, operator>> and the other operators defined on Boost.Parser parsers have overloads that accept character/parser pairs of arguments; these operator overloads will create the right parser to recognize ','.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas.  No pressure. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, bp::double_ >> *(',' >> bp::double_));

    if (result) {
        std::cout << "Great! It looks like you entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout
            << "Good job!  Please proceed to the recovery annex for cake.\n";
    }
}

The first example filled in an out-parameter to deliver the result of the parse. This call to parse() returns a result instead. As you can see, the result is contextually convertible to bool, and *result is some sort of range. In fact, the return type of this call to parse() is std::optional<std::vector<double>>. Naturally, if the parse fails, std::nullopt is returned. We'll look at how Boost.Parser maps the type of the parser to the return type, or the filled in out-parameter's type, a bit later.

[Note]

There's a type trait that can tell you the attribute type for a parser, attribute (and an associated alias attribute_t). We'll discuss it more in the Attribute Generation section.

If I run it in a shell, this is the result:

$ example/trivial
Enter a list of doubles, separated by commas.  No pressure. 5.6,8.9
Great! It looks like you entered:
5.6
8.9
$ example/trivial
Enter a list of doubles, separated by commas.  No pressure. 5.6, 8.9
Good job!  Please proceed to the recovery annex for cake.

It does not recognize "5.6, 8.9". This is because it expects a comma followed immediately by a double, but I inserted a space after the comma. The same failure to parse would occur if I put a space before the comma, or before or after the list of doubles.

One more thing: there is a much better way to write the parser above. Instead of repeating the double_ subparser, we could have written this:

bp::double_ % ','

That's semantically identical to bp::double_ >> *(',' >> bp::double_). This pattern — some bit of input repeated one or more times, with a separator between each instance — comes up so often that there's an operator specifically for that, operator%. We'll be using that operator from now on.

A Trivial Example That Gracefully Handles Whitespace

Let's modify the trivial parser we just saw to ignore any spaces it might find among the doubles and commas. To skip whitespace wherever we find it, we can pass a skip parser to our call to parse() (we don't need to touch the parser passed to parse()). Here, we use ws, which matches any Unicode whitespace character.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas.  No pressure. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, bp::double_ % ',', bp::ws);

    if (result) {
        std::cout << "Great! It looks like you entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout
            << "Good job!  Please proceed to the recovery annex for cake.\n";
    }
}

The skip parser, or skipper, is run between the subparsers within the parser passed to parse(). In this case, the skipper is run before the first double is parsed, before any subsequent comma or double is parsed, and at the end. So, the strings "3.6,5.9" and " 3.6 , \t 5.9 " are parsed the same by this program.

Skipping is an important concept in Boost.Parser. You can skip anything, not just whitespace; there are lots of other things you might want to skip. The skipper you pass to parse() can be an arbitrary parser. For example, if you write a parser for a scripting language, you can write a skipper to skip whitespace, inline comments, and end-of-line comments.

We'll be using skip parsers almost exclusively in the rest of the documentation. The ability to ignore the parts of your input that you don't care about is so convenient that parsing without skipping is a rarity in practice.

Semantic Actions

Like all parsing systems (lex & yacc, Boost.Spirit, etc.), Boost.Parser has a mechanism for associating semantic actions with different parts of the parse. Here is nearly the same program as we saw in the previous example, except that it is implemented in terms of a semantic action that appends each parsed double to a result, instead of automatically building and returning the result. To do this, we replace the double_ from the previous example with double_[action]; action is our semantic action:

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    std::vector<double> result;
    auto const action = [&result](auto & ctx) {
        std::cout << "Got one!\n";
        result.push_back(_attr(ctx));
    };
    auto const action_parser = bp::double_[action];
    auto const success = bp::parse(input, action_parser % ',', bp::ws);

    if (success) {
        std::cout << "You entered:\n";
        for (double x : result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}

Run in a shell, it looks like this:

$ example/semantic_actions
Enter a list of doubles, separated by commas. 4,3
Got one!
Got one!
You entered:
4
3

In Boost.Parser, semantic actions are implemented in terms of invocable objects that take a single parameter to a parse-context object. The parse-context object represents the current state of the parse. In the example we used this lambda as our invocable:

auto const action = [&result](auto & ctx) {
    std::cout << "Got one!\n";
    result.push_back(_attr(ctx));
};

We're both printing a message to std::cout and recording a parsed result in the lambda. It could do both, either, or neither of these things if you like. The way we get the parsed double in the lambda is by asking the parse context for it. _attr(ctx) is how you ask the parse context for the attribute produced by the parser to which the semantic action is attached. There are lots of functions like _attr() that can be used to access the state in the parse context. We'll cover more of them later on. The Parse Context defines what exactly the parse context is and how it works.

Note that you can't write an unadorned lambda directly as a semantic action. Otherwise, the compiler will see two '[' characters and think it's about to parse a C++ attribute (as in [[nodiscard]]). Parentheses fix this:

p[([](auto & ctx){/*...*/})]

Before you do this, note that the lambdas that you write as semantic actions are almost always generic (having an auto & ctx parameter), and so are very frequently re-usable. Most semantic action lambdas you write should be written out-of-line, and given a good name. Even when they are not reused, named lambdas keep your parsers smaller and easier to read.

[Important]

Attaching a semantic action to a parser removes its attribute. That is, ATTR(p[a]) is always the special no-attribute type none, regardless of what type ATTR(p) is.

Semantic actions inside rules

There are some other forms for semantic actions, when they are used inside of rules. See More About Rules for details.

Parsing to Find Subranges

So far we've seen examples that parse some text and generate associated attributes. Sometimes, you want to find some subrange of the input that contains what you're looking for, and you don't want to generate attributes at all.

There are two directives that affect the attribute type of any parser, raw[] and string_view[]. (We'll get to directives in more detail in the Directives section later. For now, you just need to know that a directive wraps a parser, and changes some aspect of how it functions.)

raw[]

raw[p] changes the attribute of its parser p to a subrange whose begin() and end() delimit the part of the input sequence matched by p.

namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';            // ATTR(int_parser) is std::vector<int>
auto subrange_parser = bp::raw[int_parser];  // ATTR(subrange_parser) is a subrange

// Parse using int_parser, generating integers.
auto ints = bp::parse("1, 2, 3, 4", int_parser, bp::ws);
assert(ints);
assert(*ints == std::vector<int>({1, 2, 3, 4}));

// Parse again using int_parser, but this time generating only the
// subrange matched by int_parser.  (prefix_parse() allows matches that
// don't consume the entire input.)
auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto range = bp::prefix_parse(first, str.end(), subrange_parser, bp::ws);
assert(range);
assert(range->begin() == str.begin());
assert(range->end() == str.begin() + 10);

static_assert(std::is_same_v<
              decltype(range),
              std::optional<bp::subrange<std::string::const_iterator>>>);

Note that the subrange has the iterator type std::string::const_iterator, because that's the iterator type passed to prefix_parse(). If we had passed char const * iterators to prefix_parse(), that would have been the iterator type. The only exception to this comes from Unicode-aware parsing (see Unicode Support). In some of those cases, the iterator being used in the parse is not the one you passed. For instance, if you call prefix_parse() with char8_t * iterators, it will create a UTF-8 to UTF-32 transcoding view, and parse the iterators of that view. In such a case, you'll get a subrange whose iterator type is a transcoding iterator. When that happens, you can get the underlying iterator — the one you passed to prefix_parse() — by calling the .base() member function on each transcoding iterator in the returned subrange.

auto const u8str = std::u8string(u8"1, 2, 3, 4, a, b, c");
auto u8first = u8str.begin();
auto u8range = bp::prefix_parse(u8first, u8str.end(), subrange_parser, bp::ws);
assert(u8range);
assert(u8range->begin().base() == u8str.begin());
assert(u8range->end().base() == u8str.begin() + 10);

string_view[]

string_view[] has very similar semantics to raw[], except that it produces a std::basic_string_view<CharT> (where CharT is the character type of the underlying range being parsed) instead of a subrange. For this to work, the underlying range must be contiguous. Contiguity of iterators is not detectable before C++20, so this directive is only available in C++20 and later.

namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';              // ATTR(int_parser) is std::vector<int>
auto sv_parser = bp::string_view[int_parser];  // ATTR(sv_parser) is a string_view

auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto sv1 = bp::prefix_parse(first, str.end(), sv_parser, bp::ws);
assert(sv1);
assert(*sv1 == str.substr(0, 10));

static_assert(std::is_same_v<decltype(sv1), std::optional<std::string_view>>);

Since string_view[] produces string_views, it cannot return transcoding iterators as described above for raw[]. If you parse a sequence of CharT with string_view[], you get exactly a std::basic_string_view<CharT>. If the parse is using transcoding in the Unicode-aware path, string_view[] will decompose the transcoding iterator as necessary. If you pass a transcoding view to parse() or transcoding iterators to prefix_parse(), string_view[] will still see through the transcoding iterators without issue, and give you a string_view of part of the underlying range.

auto sv2 = bp::parse("1, 2, 3, 4" | bp::as_utf32, sv_parser, bp::ws);
assert(sv2);
assert(*sv2 == "1, 2, 3, 4");

static_assert(std::is_same_v<decltype(sv2), std::optional<std::string_view>>);

The Parse Context

Now would be a good time to describe the parse context in some detail. Any semantic action that you write will need to use state in the parse context, so you need to know what's available.

The parse context is an object that stores the current state of the parse: the current- and end-iterators, the error handler, etc. Data may seem to be "added" to or "removed" from it at different times during the parse. For instance, when a parser p with a semantic action a succeeds, the attribute that p produces is added to the parse context, and then a is called with that context.

Though the context object appears to have things added to or removed from it, it does not. In reality, there is no one context object. Contexts are formed at various times during the parse, usually when starting a subparser. Each context is formed by taking the previous context and adding or changing members as needed to form a new context object. When the function containing the new context object returns, its context object (if any) is destructed. This is efficient to do, because the parse context has only about a dozen data members, and each data member is less than or equal to the size of a pointer. Copying the entire context when mutating the context is therefore fast. The context does no memory allocation.

[Tip]

All these functions that take the parse context as their first parameter will be found by Argument-Dependent Lookup. You will probably never need to qualify them with boost::parser::.

Accessors for data that are always available

By convention, the names of all Boost.Parser functions that take a parse context, and are therefore intended for use inside semantic actions, contain a leading underscore.

_pass()

_pass() returns a reference to a bool indicating the success or failure of the current parse. This can be used to force the current parse to pass or fail:

[](auto & ctx) {
    // If the attribute fails to meet this predicate, fail the parse.
    if (!necessary_condition(_attr(ctx)))
        _pass(ctx) = false;
}

Note that for a semantic action to be executed, its associated parser must already have succeeded. So unless you previously wrote _pass(ctx) = false within your action, _pass(ctx) = true does nothing; it's redundant.

_begin(), _end() and _where()

_begin() and _end() return the beginning and end of the range that you passed to parse(), respectively. _where() returns a subrange indicating the bounds of the input matched by the current parse. _where() can be useful if you just want to parse some text and return a result consisting of where certain elements are located, without producing any other attributes. _where() can also be essential in tracking where things are located, to provide good diagnostics at a later point in the parse. Think mismatched tags in XML; if you parse a close-tag at the end of an element, and it does not match the open-tag, you want to produce an error message that mentions or shows both tags. Stashing _where(ctx).begin() somewhere that is available to the close-tag parser will enable that. See Error Handling and Debugging for an example of this.

_error_handler()

_error_handler() returns a reference to the error handler associated with the parser passed to parse(). Using _error_handler(), you can generate errors and warnings from within your semantic actions. See Error Handling and Debugging for concrete examples.

Accessors for data that are only sometimes available

_attr()

_attr() returns a reference to the value of the current parser's attribute. It is available only when the current parser's parse is successful. If the parser has no semantic action, no attribute gets added to the parse context. It can be used to read and write the current parser's attribute:

[](auto & ctx) { _attr(ctx) = 3; }

If the current parser has no attribute, a none is returned.

_val()

_val() returns a reference to the value of the attribute of the current rule being used to parse (if any), and is available even before the rule's parse is successful. It can be used to set the current rule's attribute, even from a parser that is a subparser inside the rule. Let's say we're writing a parser with a semantic action that is within a rule. If we want to set the current rule's value to some function of the subparser's attribute, we would write this semantic action:

[](auto & ctx) { _val(ctx) = some_function(_attr(ctx)); }

If there is no current rule, or the current rule has no attribute, a none is returned.

You need to use _val() in cases where the default attribute for a rule's parser is not directly compatible with the attribute type of the rule. In these cases, you'll need to write some code like the example above to compute the rule's attribute from the rule's parser's generated attribute. For more info on rules, see the next page, and More About Rules.

_globals()

_globals() returns a reference to a user-supplied object that contains whatever data you want to use during the parse. The "globals" for a parse is an object — typically a struct — that you give to the top-level parser. Then you can use _globals() to access it at any time during the parse. We'll see how globals get associated with the top-level parser in The parse() API later. As an example, say that you have an early part of the parse that needs to record some black-listed values, and that later parts of the parse might need to parse values, failing the parse if they see the black-listed values. In the early part of the parse, you could write something like this.

[](auto & ctx) {
    // black_list is a std::unordered_set.
    _globals(ctx).black_list.insert(_attr(ctx));
}

Later in the parse, you could then use black_list to check values as they are parsed.

[](auto & ctx) {
    if (_globals(ctx).black_list.contains(_attr(ctx)))
        _pass(ctx) = false;
}
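Stripped of the parse context, the two semantic actions above just share one object. The following plain C++ sketch shows that bookkeeping on its own. The names parse_globals, remember_black_listed(), and allowed() are hypothetical. In a real parse the set would be reached through _globals(ctx), and failure would be reported by setting _pass(ctx) = false. The sketch uses count() rather than contains(), because std::unordered_set::contains() requires C++20.

```cpp
#include <unordered_set>

// Hypothetical globals object: holds whatever data the whole parse shares.
struct parse_globals {
    std::unordered_set<int> black_list;
};

// Early in the parse: stands in for
// _globals(ctx).black_list.insert(_attr(ctx)).
void remember_black_listed(parse_globals& globals, int value) {
    globals.black_list.insert(value);
}

// Later in the parse: returning false stands in for _pass(ctx) = false.
bool allowed(parse_globals const& globals, int value) {
    return globals.black_list.count(value) == 0;
}
```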
_locals()

_locals() returns a reference to one or more values that are local to the current rule being parsed, if any. If there are two or more local values, _locals() returns a reference to a boost::parser::tuple. Rules with locals are something we haven't gotten to yet (see More About Rules), but for now all you need to know is that you can provide a template parameter (LocalState) to rule, and the rule will default construct an object of that type for use within the rule. You access it via _locals():

[](auto & ctx) {
    auto & local = _locals(ctx);
    // Use local here.  If 'local' is a hana::tuple, access its members like this:
    using namespace hana::literals;
    auto & first_element = local[0_c];
    auto & second_element = local[1_c];
}

If there is no current rule, or the current rule has no locals, a none is returned.

_params()

_params(), like _locals(), applies to the current rule being used to parse, if any (see More About Rules). It returns a reference to a single value if the current rule has only one parameter, or to a boost::parser::tuple of values if the current rule has multiple parameters. If there is no current rule, or the current rule has no parameters, a none is returned.

Unlike with _locals(), you do not provide a template parameter to rule. Instead you call the rule's with() member function (again, see More About Rules).

Note

none is a type that is used as a return value in Boost.Parser for parse context accessors. none is convertible to anything that has a default constructor, convertible from anything, assignable from anything, and has templated overloads for all the overloadable operators. The intention is that a misuse of _val(), _globals(), etc. should compile, and trigger an assertion at runtime. Experience has shown that using a debugger to investigate the stack that leads to your mistake is a far better user experience than sifting through compiler diagnostics. See the Rationale section for a more detailed explanation.
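The following heavily simplified sketch, in standard C++, illustrates how a type can be convertible to, and assignable from, anything. It is not Boost.Parser's none. The name none_like is hypothetical, the type sets a flag instead of asserting, and it overloads only operator+ as a stand-in for the full set of overloadable operators.

```cpp
#include <utility>

// Hypothetical, simplified none-like type. Every use sets `misused`
// (where a real implementation would assert at runtime).
struct none_like {
    static inline bool misused = false;

    // Converts to any default-constructible type, yielding T().
    template <typename T>
    operator T() const {
        misused = true;
        return T();
    }

    // Assignable from anything; the assigned value is discarded.
    template <typename T>
    none_like& operator=(T&&) {
        misused = true;
        return *this;
    }
};

// One representative templated operator overload; a complete version
// would provide one for every overloadable operator.
template <typename T>
none_like operator+(none_like, T&&) {
    none_like::misused = true;
    return {};
}
```

Because every misuse still compiles, a mistake such as reading _val() outside a rule shows up at runtime, where a debugger can show the call stack.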

_no_case()

_no_case() returns true if the current parse context is inside one or more (possibly nested) no_case[] directives. I don't have a use case for this, but if I didn't expose it, it would be the only thing in the context that you could not examine from inside a semantic action. It was easy to add, so I did.

This example is very similar to the others we've seen so far. This one is different only because it uses a rule. As an analogy, think of a parser like char_ or double_ as an individual line of code, and a rule as a function. Like a function, a rule has its own name, and can even be forward declared. Here is how we define a rule, which is analogous to forward declaring a function:

bp::rule<struct doubles, std::vector<double>> doubles = "doubles";

This declares the rule itself. The rule is a parser, and we can immediately use it in other parsers. That definition is pretty dense; take note of these things:

  • The first template parameter is a tag type, struct doubles. Here we've declared the tag type and used it all in one go; you can also use a previously declared tag type.
  • The second template parameter is the attribute type of the parser. If you don't provide this, the rule will have no attribute.
  • This rule object itself is called doubles.
  • We've given doubles the diagnostic text "doubles" so that Boost.Parser knows how to refer to it when producing a trace of the parser during debugging.

Ok, so if doubles is a parser, what does it do? We define the rule's behavior by defining a separate parser that by now should look pretty familiar:

auto const doubles_def = bp::double_ % ',';

This is analogous to writing a definition for a forward-declared function. Note that we used the name doubles_def. Right now, the doubles rule parser and the doubles_def non-rule parser have no connection to each other. That's intentional — we want to be able to define them separately. To connect them, we declare functions with an interface that Boost.Parser understands, and use the tag type struct doubles to connect them together. We use a macro for that:

BOOST_PARSER_DEFINE_RULES(doubles);

This macro expands to the code necessary to make the rule doubles and its parser doubles_def work together. The _def suffix is a naming convention that this macro relies on to work. The tag type is what lets the rule parser, doubles, call the overloads generated for it when it is used as a parser.

BOOST_PARSER_DEFINE_RULES expands to two overloads of a function called parse_rule(). In the case above, the overloads each take a struct doubles parameter (to distinguish them from the other overloads of parse_rule() for other rules) and parse using doubles_def. You will never need to call any overload of parse_rule() yourself; it is used internally by the parser that implements rules, rule_parser.

Here is the definition of the macro that is expanded for each rule:

#define BOOST_PARSER_DEFINE_IMPL(_, rule_name_)                                \
    template<                                                                  \
        typename Iter,                                                         \
        typename Sentinel,                                                     \
        typename Context,                                                      \
        typename SkipParser>                                                   \
    decltype(rule_name_)::parser_type::attr_type parse_rule(                   \
        decltype(rule_name_)::parser_type::tag_type *,                         \
        Iter & first,                                                          \
        Sentinel last,                                                         \
        Context const & context,                                               \
        SkipParser const & skip,                                               \
        boost::parser::detail::flags flags,                                    \
        bool & success,                                                        \
        bool & dont_assign)                                                    \
    {                                                                          \
        auto const & parser = BOOST_PARSER_PP_CAT(rule_name_, _def);           \
        using attr_t =                                                         \
            decltype(parser(first, last, context, skip, flags, success));      \
        using attr_type = decltype(rule_name_)::parser_type::attr_type;        \
        if constexpr (boost::parser::detail::is_nope_v<attr_t>) {              \
            dont_assign = true;                                                \
            parser(first, last, context, skip, flags, success);                \
            return {};                                                         \
        } else if constexpr (std::is_same_v<attr_type, attr_t>) {              \
            return parser(first, last, context, skip, flags, success);         \
        } else if constexpr (std::is_constructible_v<attr_type, attr_t>) {     \
            return attr_type(                                                  \
                parser(first, last, context, skip, flags, success));           \
        } else {                                                               \
            attr_type attr{};                                                  \
            parser(first, last, context, skip, flags, success, attr);          \
            return attr;                                                       \
        }                                                                      \
    }                                                                          \
                                                                               \
    template<                                                                  \
        typename Iter,                                                         \
        typename Sentinel,                                                     \
        typename Context,                                                      \
        typename SkipParser,                                                   \
        typename Attribute>                                                    \
    void parse_rule(                                                           \
        decltype(rule_name_)::parser_type::tag_type *,                         \
        Iter & first,                                                          \
        Sentinel last,                                                         \
        Context const & context,                                               \
        SkipParser const & skip,                                               \
        boost::parser::detail::flags flags,                                    \
        bool & success,                                                        \
        bool & dont_assign,                                                    \
        Attribute & retval)                                                    \
    {                                                                          \
        auto const & parser = BOOST_PARSER_PP_CAT(rule_name_, _def);           \
        using attr_t =                                                         \
            decltype(parser(first, last, context, skip, flags, success));      \
        if constexpr (boost::parser::detail::is_nope_v<attr_t>) {              \
            parser(first, last, context, skip, flags, success);                \
        } else {                                                               \
            parser(first, last, context, skip, flags, success, retval);        \
        }                                                                      \
    }
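The key idea in the macro is that the tag type selects an overload of parse_rule(). The following standard C++ sketch models only that idea, assuming a toy rule needs nothing but a tag type and an attribute type. The names toy_rule, digits_tag, length_tag, and parse() are hypothetical, and the sketch is much simpler than Boost.Parser's rule_parser.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a rule: it carries a tag type and an
// attribute type, but has no parsing logic of its own.
template <typename Tag, typename Attr>
struct toy_rule {
    Attr parse(std::string_view input) const {
        // Unqualified call with a Tag* argument: argument-dependent lookup
        // finds the parse_rule() overload for this tag at instantiation
        // time, even though that overload is declared after this template.
        return parse_rule(static_cast<Tag*>(nullptr), input);
    }
};

// Tag types only need to be declared, never defined.
struct digits_tag;
struct length_tag;

// Like a forward-declared function: usable before its definition exists.
toy_rule<digits_tag, std::vector<int>> const digits{};
toy_rule<length_tag, std::size_t> const text_length{};

// Like the *_def side: one overload per tag, chosen by the pointer type.
std::vector<int> parse_rule(digits_tag*, std::string_view input) {
    std::vector<int> out;
    for (char c : input)
        if (c >= '0' && c <= '9') out.push_back(c - '0');
    return out;
}

std::size_t parse_rule(length_tag*, std::string_view input) {
    return input.size();
}
```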

Now that we have the doubles parser, we can use it like we might any other parser:

auto const result = bp::parse(input, doubles, bp::ws);

The full program:

#include <boost/parser/parser.hpp>

#include <vector>
#include <iostream>
#include <string>


namespace bp = boost::parser;


bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

int main()
{
    std::cout << "Please enter a list of doubles, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, doubles, bp::ws);

    if (result) {
        std::cout << "You entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}

All this is intended to introduce the notion of rules. It may still be a bit unclear why you would want to use rules. The use cases for rules, and lots of detail about them, are in a later section, More About Rules.

Note

The existence of rules means that you will probably never have to write a low-level parser. You can just put existing parsers together into rules instead.

So far, we've seen only simple parsers that parse the same value repeatedly (with or without commas and spaces). It's also very common to parse a few values in a specific sequence. Let's say you want to parse an employee record. Here's a parser you might write:

namespace bp = boost::parser;
auto employee_parser = bp::lit("employee")
    >> '{'
    >> bp::int_ >> ','
    >> bp::quoted_string >> ','
    >> bp::quoted_string >> ','
    >> bp::double_
    >> '}';

The attribute type for employee_parser is boost::parser::tuple<int, std::string, std::string, double>. That's great, in that you got all the parsed data for the record without having to write any semantic actions. It's not so great that you now have to get all the individual elements out by their indices, using get(). It would be much nicer to parse into the final data structure that your program is going to use. This is often some struct or class. Boost.Parser supports parsing into arbitrary aggregate structs, and non-aggregates that are constructible from the tuple at hand.
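For comparison, here is what index-based access and the manual conversion look like in standard C++. The sketch uses std::tuple as a stand-in for boost::parser::tuple. employee_tuple and to_employee() are hypothetical helpers, and this conversion step is exactly what parsing directly into the struct lets you skip.

```cpp
#include <string>
#include <tuple>
#include <utility>

struct employee {
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

// Stand-in for the parser's tuple attribute.
using employee_tuple = std::tuple<int, std::string, std::string, double>;

// Manual tuple -> struct conversion: unpack the elements in order and
// aggregate-initialize the struct from them.
employee to_employee(employee_tuple t) {
    return std::apply(
        [](int age, std::string surname, std::string forename, double salary) {
            return employee{age, std::move(surname), std::move(forename), salary};
        },
        std::move(t));
}
```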

Aggregate types as attributes

If we have a struct that has data members of the same types listed in the boost::parser::tuple attribute type for employee_parser, it would be nice to parse directly into it, instead of parsing into a tuple and then constructing our struct later. Fortunately, this just works in Boost.Parser. Here is an example of parsing straight into a compatible aggregate type.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    auto quoted_string = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
    auto employee_p = bp::lit("employee")
        >> '{'
        >> bp::int_ >> ','
        >> quoted_string >> ','
        >> quoted_string >> ','
        >> bp::double_
        >> '}';

    employee record;
    auto const result = bp::parse(input, employee_p, bp::ws, record);

    if (result) {
        std::cout << "You entered:\nage:      " << record.age
                  << "\nsurname:  " << record.surname
                  << "\nforename: " << record.forename
                  << "\nsalary  : " << record.salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}

Unfortunately, this is taking advantage of the loose attribute assignment logic; the employee_parser parser still has a boost::parser::tuple attribute. See The parse() API for a description of attribute out-param compatibility.

For this reason, it's even more common to want to make a rule that returns a specific type like employee. Just by giving the rule a struct type, we make sure that this parser always generates an employee struct as its attribute, no matter where it is in the parse. If we made a simple parser P that uses the employee_p rule, like bp::int_ >> employee_p, P's attribute type would be boost::parser::tuple<int, employee>.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

bp::rule<struct quoted_string, std::string> quoted_string = "quoted name";
bp::rule<struct employee_p, employee> employee_p = "employee";

auto quoted_string_def = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
auto employee_p_def = bp::lit("employee")
    >> '{'
    >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_
    >> '}';

BOOST_PARSER_DEFINE_RULES(quoted_string, employee_p);

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    static_assert(std::is_aggregate_v<std::decay_t<employee &>>);

    auto const result = bp::parse(input, employee_p, bp::ws);

    if (result) {
        std::cout << "You entered:\nage:      " << result->age
                  << "\nsurname:  " << result->surname
                  << "\nforename: " << result->forename
                  << "\nsalary  : " << result->salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}

Just as you can pass a struct as an out-param to parse() when the parser's attribute type is a tuple, you can also pass a tuple as an out-param to parse() when the parser's attribute type is a struct:

// Using the employee_p rule from above, with attribute type employee...
boost::parser::tuple<int, std::string, std::string, double> tup;
auto const result = bp::parse(input, employee_p, bp::ws, tup); // Ok!
Important

This automatic use of structs as if they were tuples depends on a bit of metaprogramming. Due to compiler limits, the metaprogram that detects the number of data members of a struct is limited to a maximum number of members. Fortunately, that limit is configurable; see BOOST_PARSER_MAX_AGGREGATE_SIZE.
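The tutorial does not show that metaprogram. The following is a sketch of one common technique for it, not necessarily Boost.Parser's exact code. You test whether T{a0, ..., aN-1} is well-formed for increasing N, using an argument type that converts to anything, and stop at a fixed cap. The names any_arg, brace_constructible, and count_members are hypothetical. MaxSize only plays a role similar to BOOST_PARSER_MAX_AGGREGATE_SIZE.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Converts to any type. It is only used in unevaluated contexts,
// so the conversion operator is declared but never defined.
struct any_arg {
    template <typename T>
    operator T() const;
};

// True if T{a0, ..., a(N-1)} is well-formed, where N = sizeof...(I).
template <typename T, typename Seq, typename = void>
struct brace_constructible : std::false_type {};

template <typename T, std::size_t... I>
struct brace_constructible<T, std::index_sequence<I...>,
                           std::void_t<decltype(T{(void(I), any_arg{})...})>>
    : std::true_type {};

// Try N + 1 initializers while that still compiles, giving up at MaxSize.
// Too many initializers for an aggregate is ill-formed, which ends the search.
template <typename T, std::size_t MaxSize = 25, std::size_t N = 0>
constexpr std::size_t count_members() {
    static_assert(std::is_aggregate_v<T>, "only aggregates are supported");
    if constexpr (N < MaxSize &&
                  brace_constructible<T, std::make_index_sequence<N + 1>>::value)
        return count_members<T, MaxSize, N + 1>();
    else
        return N;
}
```

The cap is the reason a limit exists at all: without it, the search would recurse until the compiler's instantiation depth is exhausted for types that accept arbitrarily many initializers.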

General class types as attributes

Many times you don't have an aggregate struct that you want to produce from your parse. It would be even nicer than the aggregate code above if Boost.Parser could detect that the members of a tuple that is produced as an attribute are usable as the arguments to some type's constructor. So, Boost.Parser does that.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a string followed by two unsigned integers. ";
    std::string input;
    std::getline(std::cin, input);

    constexpr auto string_uint_uint =
        bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
    std::string string_from_parse;
    if (parse(input, string_uint_uint, bp::ws, string_from_parse))
        std::cout << "That yields this string: " << string_from_parse << "\n";
    else
        std::cout << "Parse failure.\n";

    std::cout << "Enter an unsigned integer followed by a string. ";
    std::getline(std::cin, input);
    std::cout << input << "\n";

    constexpr auto uint_string = bp::uint_ >> +bp::char_;
    std::vector<std::string> vector_from_parse;
    if (parse(input, uint_string, bp::ws, vector_from_parse)) {
        std::cout << "That yields this vector of strings:\n";
        for (auto && str : vector_from_parse) {
            std::cout << "  '" << str << "'\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}

Let's look at the first parse.

constexpr auto string_uint_uint =
    bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
std::string string_from_parse;
if (parse(input, string_uint_uint, bp::ws, string_from_parse))
    std::cout << "That yields this string: " << string_from_parse << "\n";
else
    std::cout << "Parse failure.\n";

Here, we use the parser string_uint_uint, which produces a boost::parser::tuple<std::string, unsigned int, unsigned int> attribute. When we try to parse that into an out-param std::string attribute, it just works. This is because std::string has a constructor that takes a std::string, an offset, and a length. Here's the other parse:

constexpr auto uint_string = bp::uint_ >> +bp::char_;
std::vector<std::string> vector_from_parse;
if (parse(input, uint_string, bp::ws, vector_from_parse)) {
    std::cout << "That yields this vector of strings:\n";
    for (auto && str : vector_from_parse) {
        std::cout << "  '" << str << "'\n";
    }
} else {
    std::cout << "Parse failure.\n";
}

Now we have the parser uint_string, which produces a boost::parser::tuple<unsigned int, std::string> attribute; the chars matched by +bp::char_ at the end combine into a std::string. Those two values can be used to construct a std::vector<std::string> via its (count, value) constructor.
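Standard C++17 offers a close analogy to this construct-from-tuple-elements step: std::make_from_tuple passes a tuple's elements to a constructor of the target type. The sketch below is only that analogy, not Boost.Parser's implementation. string_from() and vector_from() are hypothetical helpers that mirror the two parses above.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Mirrors the first parse: (string, offset, length) -> std::string,
// using std::string's (str, pos, count) constructor.
std::string string_from(std::tuple<std::string, unsigned int, unsigned int> const& t) {
    return std::make_from_tuple<std::string>(t);
}

// Mirrors the second parse: (count, value) -> std::vector<std::string>,
// using the vector's (count, value) constructor.
std::vector<std::string> vector_from(std::tuple<unsigned int, std::string> const& t) {
    return std::make_from_tuple<std::vector<std::string>>(t);
}
```

The conversion only goes one way. std::vector<std::string> is constructible from (unsigned int, std::string), but nothing turns a vector back into that tuple.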

Just like with using aggregates in place of tuples, non-aggregate class types can be substituted for tuples in most places. That includes using a non-aggregate class type as the attribute type of a rule.

However, while compatible tuples can be substituted for aggregates, you can't substitute a tuple for some class type T just because the tuple could have been used to construct T. Think of trying to invert the substitution in the second parse above. Converting a std::vector<std::string> into a boost::parser::tuple<unsigned int, std::string> makes no sense.

Frequently, you need to parse something that might have one of several forms. operator| is overloaded to form alternative parsers. For example:

namespace bp = boost::parser;
auto const parser_1 = bp::int_ | bp::eps;

parser_1 matches an integer, or if that fails, it matches epsilon, the empty string. This is equivalent to writing:

namespace bp = boost::parser;
auto const parser_2 = -bp::int_;

However, neither parser_1 nor parser_2 is equivalent to writing this:

namespace bp = boost::parser;
auto const parser_3 = bp::eps | bp::int_; // Does not do what you think.

The reason is that alternative parsers try each of their subparsers, one at a time, and stop on the first one that matches. Epsilon matches anything, since it is zero length and consumes no input. It even matches the end of input. This means that parser_3 is equivalent to eps by itself.
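The following toy model of ordered choice, in standard C++, shows why the order matters. It assumes a parser is just a function that reports how many characters it consumed. toy_int, toy_eps, and toy_alt are hypothetical. They mimic only the rule that alternative parsers try each alternative in order and stop at the first success.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Toy parser: on success, returns the number of characters consumed.
using toy_parser = std::optional<std::size_t> (*)(std::string_view);

// Matches one or more decimal digits.
std::optional<std::size_t> toy_int(std::string_view in) {
    std::size_t n = 0;
    while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n]))) ++n;
    if (n == 0) return std::nullopt;
    return n;
}

// Epsilon: always succeeds and consumes nothing.
std::optional<std::size_t> toy_eps(std::string_view) { return std::size_t{0}; }

// Ordered choice: try a, and fall back to b only if a fails.
std::optional<std::size_t> toy_alt(toy_parser a, toy_parser b, std::string_view in) {
    if (auto r = a(in)) return r;
    return b(in);
}
```

With toy_int first, "123" consumes three characters. With toy_eps first, the same input consumes nothing, just like parser_3.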

Note

For this reason, writing eps | p for any parser p is considered a bug. Debug builds will assert when eps | p is encountered.

Warning

This kind of error is very common when eps is involved, and also very easy to detect. However, it is possible to write P1 | P2, where P1 is a prefix of P2, such as int_ | int_ >> int_, or repeat(4)[hex_digit] | repeat(8)[hex_digit]. This is almost certainly an error, but it is impossible to detect in the general case. Remember that rules can be separately compiled, and consider a pair of rules whose associated _def parsers are int_ and int_ >> int_, respectively.
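A similar toy model, in standard C++, shows the prefix problem. In this hypothetical sketch, hex_digits(n) loosely mimics repeat(n)[hex_digit]. alt() tries its alternatives in order. parses_all() counts a parse as successful only if every character is consumed.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Toy parser: on success, returns the number of characters consumed.
using toy_parser = std::function<std::optional<std::size_t>(std::string_view)>;

// Matches exactly n hexadecimal digits.
toy_parser hex_digits(std::size_t n) {
    return [n](std::string_view in) -> std::optional<std::size_t> {
        if (in.size() < n) return std::nullopt;
        for (std::size_t i = 0; i < n; ++i)
            if (!std::isxdigit(static_cast<unsigned char>(in[i]))) return std::nullopt;
        return n;
    };
}

// Ordered choice: the first alternative that succeeds wins.
toy_parser alt(toy_parser a, toy_parser b) {
    return [a, b](std::string_view in) {
        if (auto r = a(in)) return r;
        return b(in);
    };
}

// Succeeds only if p matches and consumes the entire input.
bool parses_all(toy_parser const& p, std::string_view in) {
    auto r = p(in);
    return r && *r == in.size();
}
```

With the 4-digit alternative first, "deadbeef" stops after "dead", so the whole-input check fails. Putting the longer alternative first fixes it.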

It is very common to need to parse quoted strings. Quoted strings are slightly tricky, though, when using a skipper (and you should be using a skipper 99% of the time). You don't want to allow arbitrary whitespace in the middle of your strings, and you also don't want to remove all whitespace from your strings. Both of these things will happen with the typical skipper, ws.

So, here is how most people would write a quoted string parser:

namespace bp = boost::parser;
const auto string = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];

Some things to note:

  • the result is a string;
  • the quotes are not included in the result;
  • there is an expectation point before the close-quote;
  • the use of lexeme[] disables skipping in the parser, and it must be written around the quotes, not around the operator* expression; and
  • there's no way to write a quote in the middle of the string.

This is a very common pattern. I have written a quoted string parser like this dozens of times. The parser above is the quick-and-dirty version. A more robust version would be able to handle escaped quotes within the string, and then would immediately also need to support escaped escape characters.
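A hand-written version of that more robust parser might look like the following standard C++ sketch. parse_quoted() is hypothetical and deliberately minimal. It allows only the quote character and the backslash to be escaped, and it is not how Boost.Parser implements quoted_string.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Returns the unquoted contents, or std::nullopt if the input is not
// exactly one quoted string.
std::optional<std::string> parse_quoted(std::string_view in, char quote = '"') {
    if (in.size() < 2 || in.front() != quote) return std::nullopt;
    std::string out;
    for (std::size_t i = 1; i < in.size(); ++i) {
        char const c = in[i];
        if (c == '\\') {
            // Only an escaped quote or an escaped backslash is accepted.
            if (i + 1 >= in.size()) return std::nullopt;
            char const next = in[i + 1];
            if (next != quote && next != '\\') return std::nullopt;
            out += next;
            ++i;  // skip the escaped character
        } else if (c == quote) {
            // An unescaped quote closes the string; nothing may follow it.
            if (i + 1 != in.size()) return std::nullopt;
            return out;
        } else {
            out += c;
        }
    }
    return std::nullopt;  // no closing quote
}
```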

Boost.Parser provides quoted_string to use in place of this very common pattern. It supports escaping both the quote character and the escape character itself, using backslash as the escape character.

namespace bp = boost::parser;

auto result1 = bp::parse("\"some text\"", bp::quoted_string, bp::ws);
assert(result1);
std::cout << *result1 << "\n"; // Prints: some text

auto result2 =
    bp::parse("\"some \\\"text\\\"\"", bp::quoted_string, bp::ws);
assert(result2);
std::cout << *result2 << "\n"; // Prints: some "text"

As common as this use case is, there are very similar use cases that it does not cover. So, quoted_string has some options. If you call it with a single character, it returns a quoted_string that uses that single character as the quote-character.

auto result3 = bp::parse("!some text!", bp::quoted_string('!'), bp::ws);
assert(result3);
std::cout << *result3 << "\n"; // Prints: some text

You can also supply a range of characters. One of the characters from the range must quote both ends of the string; mismatches are not allowed. Think of how Python allows you to quote a string with either '"' or '\'', but the same character must be used on both sides.

auto result4 = bp::parse("'some text'", bp::quoted_string("'\""), bp::ws);
assert(result4);
std::cout << *result4 << "\n"; // Prints: some text
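The matching-quote rule can be sketched on its own in standard C++. parse_quoted_any() is a hypothetical helper. It accepts any opening character from quotes and requires the same character at the end. Escape handling is omitted to keep it short.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Accept any opening quote from `quotes`, but require the same character
// to close the string. The body may not contain that quote (no escapes).
std::optional<std::string> parse_quoted_any(std::string_view in, std::string_view quotes) {
    if (in.size() < 2) return std::nullopt;
    char const open = in.front();
    if (quotes.find(open) == std::string_view::npos) return std::nullopt;
    std::string_view const body = in.substr(1, in.size() - 2);
    if (in.back() != open || body.find(open) != std::string_view::npos)
        return std::nullopt;
    return std::string(body);
}
```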

Another common thing to do in a quoted string parser is to recognize escape sequences. If you have simple escape sequences that do not require any real parsing, like the simple escape sequences from C++, you can provide a symbols object as well. The template parameter T to symbols<T> must be char or char32_t. You don't need to include the escaped backslash or the escaped quote character, since those always work.

// the c++ simple escapes
bp::symbols<char> const escapes = {
    {"'", '\''},
    {"?", '\?'},
    {"a", '\a'},
    {"b", '\b'},
    {"f", '\f'},
    {"n", '\n'},
    {"r", '\r'},
    {"t", '\t'},
    {"v", '\v'}};
auto result5 =
    bp::parse("\"some text\\r\"", bp::quoted_string('"', escapes), bp::ws);
assert(result5);
std::cout << *result5 << "\n"; // Prints (with a CRLF newline): some text
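Outside of Boost.Parser, the same table-driven idea can be sketched with a std::map from escape letter to character. simple_escapes and unescape() are hypothetical. The sketch always accepts an escaped quote and an escaped backslash, as described above for quoted_string, and rejects any other escape that is not in the table.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical analogue of the symbols<char> table: escape letter -> char.
std::map<char, char> const simple_escapes = {
    {'\'', '\''}, {'?', '\?'}, {'a', '\a'}, {'b', '\b'}, {'f', '\f'},
    {'n', '\n'},  {'r', '\r'}, {'t', '\t'}, {'v', '\v'}};

// Unescapes the body of a quoted string (without its surrounding quotes).
std::optional<std::string> unescape(std::string_view body, char quote,
                                    std::map<char, char> const& escapes) {
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\') {
            out += body[i];
            continue;
        }
        if (++i == body.size()) return std::nullopt;  // dangling backslash
        char const e = body[i];
        if (e == quote || e == '\\') {  // these two always work
            out += e;
            continue;
        }
        auto const it = escapes.find(e);
        if (it == escapes.end()) return std::nullopt;  // unknown escape
        out += it->second;
    }
    return out;
}
```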

Now that you've seen some examples, let's see how parsing works in a bit more detail. Consider this example.

namespace bp = boost::parser;
auto int_pair = bp::int_ >> bp::int_;         // Attribute: tuple<int, int>
auto int_pairs_plus = +int_pair >> bp::int_;  // Attribute: tuple<std::vector<tuple<int, int>>, int>

int_pairs_plus must match a pair of ints (using int_pair) one or more times, and then must match an additional int. In other words, it matches any odd number (greater than 1) of ints in the input. Let's look at how this parse proceeds.

auto result = bp::parse("1 2 3", int_pairs_plus, bp::ws);

At the beginning of the parse, the top level parser uses its first subparser (if any) to start parsing. So, int_pairs_plus, being a sequence parser, would pass control to its first parser +int_pair. Then +int_pair would use int_pair to do its parsing, which would in turn use bp::int_. This creates a stack of parsers, each one using a particular subparser.
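The steps below can be checked against a small hand-written model of int_pairs_plus in standard C++. It is a sketch under simplifying assumptions, not Boost.Parser's implementation: only spaces are skipped, only non-negative integers are read, and the whole input must be consumed. parse_int(), pairs_plus_attr, and parse_pairs_plus() are hypothetical. The line that resets pos is the backtracking that step 3 describes.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Skip spaces (standing in for bp::ws), then read one non-negative integer.
// On success, pos is left just past the digits.
std::optional<int> parse_int(std::string_view in, std::size_t& pos) {
    while (pos < in.size() && in[pos] == ' ') ++pos;
    std::size_t const start = pos;
    int value = 0;
    while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos])))
        value = value * 10 + (in[pos++] - '0');
    if (pos == start) return std::nullopt;
    return value;
}

// Analogous to tuple<std::vector<tuple<int, int>>, int>.
struct pairs_plus_attr {
    std::vector<std::pair<int, int>> pairs;
    int last = 0;
};

std::optional<pairs_plus_attr> parse_pairs_plus(std::string_view in) {
    pairs_plus_attr attr;
    std::size_t pos = 0;
    // +int_pair: keep matching pairs until one attempt fails.
    for (;;) {
        std::size_t const before = pos;  // where this int_pair attempt starts
        std::optional<int> first = parse_int(in, pos);
        std::optional<int> second;
        if (first) second = parse_int(in, pos);
        if (!second) {
            pos = before;  // discard a partial match such as a lone "3"
            break;
        }
        attr.pairs.emplace_back(*first, *second);
    }
    if (attr.pairs.empty()) return std::nullopt;  // '+' needs one success
    std::optional<int> last = parse_int(in, pos);   // the final int_
    if (!last) return std::nullopt;
    attr.last = *last;
    while (pos < in.size() && in[pos] == ' ') ++pos;  // trailing spaces
    if (pos != in.size()) return std::nullopt;
    return attr;
}
```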

Step 1) The input is "1 2 3", and the stack of active parsers is int_pairs_plus -> +int_pair -> int_pair -> bp::int_. (Read "->" as "uses".) This parses "1", and the whitespace after is skipped by bp::ws. Control passes to the second bp::int_ parser in int_pair.

Step 2) The input is "2 3" and the stack of parsers looks the same, except the active parser is the second bp::int_ from int_pair. This parser consumes "2" and then bp::ws skips the subsequent space. Since we've finished with int_pair's match, its boost::parser::tuple<int, int> attribute is complete. Its parent is +int_pair, so this tuple attribute is pushed onto the back of +int_pair's attribute, which is a std::vector<boost::parser::tuple<int, int>>. Control passes up to the parent of int_pair, +int_pair. Since +int_pair is a one-or-more parser, it starts a new iteration; control passes to int_pair again.

Step 3) The input is "3" and the stack of parsers looks the same, except the active parser is the first bp::int_ from int_pair again, and we're in the second iteration of +int_pair. This parser consumes "3". Since this is the end of the input, the second bp::int_ of int_pair does not match. This partial match of "3" should not count, since it was not part of a full match. So, int_pair indicates its failure, and +int_pair stops iterating. Since it did match once, +int_pair does not fail; it is a one-or-more parser, and failure of its subparser after the first success does not cause it to fail. Control passes to the next parser in sequence within int_pairs_plus.

Step 4) The input is "3" again, and the stack of parsers is int_pairs_plus -> bp::int_. This parses the "3", and the parse reaches the end of input. Control passes to int_pairs_plus, which has just successfully matched all the parsers in its sequence. It then produces its attribute, a boost::parser::tuple<std::vector<boost::parser::tuple<int, int>>, int>, which gets returned from bp::parse().

Something to take note of between Steps #3 and #4: at the beginning of #4, the input position had returned to where it was at the beginning of #3. The same kind of backtracking also happens in alternative parsers when an alternative fails. The next page has more details on the semantics of backtracking.
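The four steps can be mirrored by a hand-written recursive-descent model. This is a plain C++17 sketch, not how Boost.Parser is implemented. parse_int, parse_int_pair and parse_int_pairs_plus are hypothetical names, and only non-negative integers are handled. Notice that parse_int_pair works on a copy of the position and commits it only on full success. This is the "giving back" of the partial match of "3" in Step 3.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Hand-written model of int_pairs_plus = +(int_ >> int_) >> int_ with a
// whitespace skipper. Not Boost.Parser code; it only mirrors the steps above.
using pos_t = std::size_t;

void skip_ws(std::string_view in, pos_t & pos)
{
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
        ++pos;
}

// Plays the role of bp::int_ (non-negative only). On failure, pos is unchanged.
std::optional<int> parse_int(std::string_view in, pos_t & pos)
{
    pos_t p = pos;
    skip_ws(in, p);
    pos_t const start = p;
    int value = 0;
    while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p])))
        value = value * 10 + (in[p++] - '0');
    if (p == start)
        return std::nullopt;
    pos = p;
    return value;
}

// int_pair is all-or-nothing: if the second int fails, pos is left where the
// pair started, so the partial match of the first int is given back.
std::optional<std::pair<int, int>> parse_int_pair(std::string_view in, pos_t & pos)
{
    pos_t p = pos;
    auto a = parse_int(in, p);
    if (!a)
        return std::nullopt;
    auto b = parse_int(in, p);
    if (!b)
        return std::nullopt;   // backtrack: pos never advanced
    pos = p;
    return std::pair{*a, *b};
}

using attr_t = std::pair<std::vector<std::pair<int, int>>, int>;

std::optional<attr_t> parse_int_pairs_plus(std::string_view in)
{
    pos_t pos = 0;
    attr_t attr;
    // +int_pair: iterate until the first failure.
    while (auto pr = parse_int_pair(in, pos))
        attr.first.push_back(*pr);
    if (attr.first.empty())
        return std::nullopt;   // one-or-more needs at least one match
    auto last = parse_int(in, pos);   // the trailing int_
    if (!last)
        return std::nullopt;
    attr.second = *last;
    skip_ws(in, pos);
    if (pos != in.size())
        return std::nullopt;   // the whole input must be matched
    return attr;
}
```

With this model, "1 2" fails. The repetition takes the only pair greedily, and nothing is left for the trailing int. This agrees with the observation that the parser matches only an odd number of ints.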

Parsers in detail

So far, parsers have been presented as somewhat abstract entities. You may want more detail. A Boost.Parser parser P is an invocable object with a pair of call operator overloads. The two functions are very similar, and in many parsers one is implemented in terms of the other. The first function does the parsing and returns the default attribute for the parser. The second function does exactly the same parsing, but takes an out-param into which it writes the attribute for the parser. The out-param does not need to be the same type as the default attribute, but they need to be compatible.
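A toy object with the two call-operator shapes might look like this. It is a hypothetical simplification. Boost.Parser's real overloads take iterators, a context, a skipper and more, and none of that is modeled here. As the text describes, the attribute-returning form is implemented in terms of the out-param form. The out-param can be any type that an int can be assigned to.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical toy parser showing the two call-operator shapes. It matches a
// single decimal digit and consumes it from the front of `in`.
struct toy_digit_parser
{
    // Form 1: parse and return the default attribute (int), implemented in
    // terms of form 2.
    std::optional<int> operator()(std::string_view & in) const
    {
        int attr = 0;
        if (!(*this)(in, attr))
            return std::nullopt;
        return attr;
    }

    // Form 2: the same parse, writing into an out-param of any type an int
    // can be assigned to (a "compatible" attribute).
    template<typename Attr>
    bool operator()(std::string_view & in, Attr & out) const
    {
        if (in.empty() || in.front() < '0' || in.front() > '9')
            return false;                 // no match; input left untouched
        out = in.front() - '0';
        in.remove_prefix(1);              // consume the matched character
        return true;
    }
};
```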

Compatibility means that the default attribute is assignable to the out-param in some fashion. This usually means direct assignment, but it may also mean a tuple -> aggregate or aggregate -> tuple conversion. For sequence types, compatibility means that the sequence type has insert or push_back with the usual semantics. This means that the parser +boost::parser::int_ can fill a std::set<int> just as well as a std::vector<int>.
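The sequence-compatibility rule can be sketched in plain C++17 with a detection trait that prefers push_back and falls back to insert. append_attribute and fill_from are hypothetical names. The sketch only illustrates the idea and is not Boost.Parser's attribute machinery.

```cpp
#include <cassert>
#include <set>
#include <type_traits>
#include <utility>
#include <vector>

// Detects whether c.push_back(v) is well-formed (C++17 detection idiom).
template<typename C, typename V, typename = void>
struct has_push_back : std::false_type {};

template<typename C, typename V>
struct has_push_back<C, V,
    std::void_t<decltype(std::declval<C &>().push_back(std::declval<V>()))>>
    : std::true_type {};

// Hypothetical helper: append one parsed element to a sequence-like
// out-param, using push_back when available and insert otherwise.
template<typename Container, typename Value>
void append_attribute(Container & c, Value && v)
{
    if constexpr (has_push_back<Container, Value>::value)
        c.push_back(std::forward<Value>(v));
    else
        c.insert(std::forward<Value>(v));
}

// Model of +int_ writing its results into whatever container is supplied.
template<typename Container>
void fill_from(std::vector<int> const & parsed, Container & out)
{
    for (int i : parsed)
        append_attribute(out, i);
}
```

The same sequence of ints fills a std::vector<int> in order and a std::set<int> with duplicates dropped. Each container keeps its usual semantics.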

Some parsers also have additional state that is required to perform a match. For instance, char_ parsers can be parameterized with a single code point to match; the exact value of that code point is stored in the parser object.

No parser has direct support for all the operations defined on parsers (operator|, operator>>, etc.). Instead, there is a template called parser_interface that supports all of these operations. parser_interface wraps each parser, storing it as a data member, adapting it for general use. You should only ever see parser_interface in the debugger, or possibly in some of the reference documentation. You should never have to write it in your own code.

As described in the previous page, backtracking occurs when the parse attempts to match the current parser P, matches part of the input, but fails to match all of P. The part of the input consumed during the parse of P is essentially "given back".

This is necessary because P may consist of subparsers, and each subparser that succeeds will try to consume input, produce attributes, etc. When a later subparser fails, the parse of P fails, and the input must be rewound to where it was when P started its parse, not where the latest matching subparser stopped.

Alternative parsers will often evaluate multiple subparsers one at a time, advancing and then restoring the input position, until one of the subparsers succeeds. Consider this example.

namespace bp = boost::parser;
// other_parser stands for any parser defined elsewhere.
auto const parser = bp::repeat(53)[other_parser] | bp::repeat(10)[other_parser];

Evaluating parser means trying to match other_parser 53 times, and if that fails, trying to match other_parser 10 times. Say you parse input that matches other_parser 11 times. parser will match it. It will also evaluate other_parser 21 times during the parse.

The attributes of the repeat(53)[other_parser] and repeat(10)[other_parser] are each std::vector<ATTR(other_parser)>; let's say that ATTR(other_parser) is int. The attribute of parser as a whole is the same, std::vector<int>. Since other_parser is busy producing ints — 21 of them to be exact — you may be wondering what happens to the ones produced during the evaluation of repeat(53)[other_parser] when it fails to find all 53 inputs. Its std::vector<int> will contain 11 ints at that point.

When a repeat-parser fails, and attributes are being generated, it clears its container. This applies to parsers such as the ones above, but also all the other repeat parsers, including ones made using operator+ or operator*.

So, at the end of a successful parse by parser of 10 inputs (since the right side of the alternative only eats 10 repetitions), the std::vector<int> attribute of parser would contain 10 ints.
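A plain C++17 model of the alternative above shows why clearing matters. Here other_parser is assumed to match one decimal digit, and repeat_n and alternative are hypothetical names. Both alternatives write into the same std::vector<int>, as in the scenario described above. Without the out.clear() call, the vector would end up holding 21 ints: 11 left over from the failed first alternative, plus 10 from the second.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Stand-in for other_parser: matches one decimal digit, attribute int.
std::optional<int> other_parser(std::string_view in, std::size_t & pos)
{
    if (pos < in.size() && in[pos] >= '0' && in[pos] <= '9')
        return in[pos++] - '0';
    return std::nullopt;
}

// Model of repeat(n)[other_parser]: exactly n matches, or failure. On
// failure it rewinds pos and clears its container, as described above.
bool repeat_n(std::size_t n, std::string_view in, std::size_t & pos,
              std::vector<int> & out)
{
    std::size_t const start = pos;
    for (std::size_t i = 0; i < n; ++i) {
        if (auto v = other_parser(in, pos)) {
            out.push_back(*v);
        } else {
            pos = start;   // backtrack
            out.clear();   // discard the partially generated attribute
            return false;
        }
    }
    return true;
}

// Model of repeat(53)[other_parser] | repeat(10)[other_parser], with both
// alternatives writing into the same std::vector<int> attribute.
bool alternative(std::string_view in, std::size_t & pos, std::vector<int> & out)
{
    return repeat_n(53, in, pos, out) || repeat_n(10, in, pos, out);
}
```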

Note

Users of Boost.Spirit may be familiar with the hold[] directive. Because of the behavior described above, there is no such directive in Boost.Parser.

Expectation points

Ok, so if parsers all try their best to match the input, and are all-or-nothing, doesn't that leave room for all kinds of bad input to be ignored? Consider the top-level parser from the Parsing JSON example.

auto const value_p_def =
    number | bp::bool_ | null | string | array_p | object_p;

What happens if I use this to parse "\""? The parse tries number, fails. It then tries bp::bool_, fails. Then null fails too. Finally, it starts parsing string. Good news, the first character is the open-quote of a JSON string. Unfortunately, that's also the end of the input, so string must fail too. However, we probably don't want to just give up on parsing string now and try array_p, right? If the user wrote an open-quote with no matching close-quote, that's not the prefix of some later alternative of value_p_def; it's ill-formed JSON. Here's the parser for the string rule:

auto const string_def = bp::lexeme['"' >> *(string_char - '"') > '"'];

Notice that operator> is used on the right instead of operator>>. This indicates the same sequence operation as operator>>, except that it also represents an expectation. If the parse before the operator> succeeds, whatever comes after it must also succeed. Otherwise, the top-level parse fails, and a diagnostic is emitted. It will say something like "Expected '"' here.", quoting the line, with a caret pointing to the place in the input where it expected the right-side match.

Choosing to use > versus >> is how you indicate to Boost.Parser that parse failure is or is not a hard error, respectively.
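The difference between a soft failure (>>) and a hard failure (>) can be modeled with an exception. This sketch rests on several assumptions. expectation_failure, parse_string, parse_true and parse_value are hypothetical names. Only a string alternative and a "true" alternative are modeled. Boost.Parser reports expectation failures through its own error-handling machinery, not through this type.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical exception modeling a failed expectation point.
struct expectation_failure : std::runtime_error
{
    std::size_t where;
    expectation_failure(std::string const & what, std::size_t pos)
        : std::runtime_error(what), where(pos) {}
};

// Model of '"' >> *(char - '"') > '"'
std::optional<std::string> parse_string(std::string_view in, std::size_t & pos)
{
    if (pos >= in.size() || in[pos] != '"')
        return std::nullopt;   // soft failure: other alternatives may still run
    std::size_t p = pos + 1;
    std::string s;
    while (p < in.size() && in[p] != '"')
        s += in[p++];
    if (p == in.size())        // past the `>`: a missing '"' is a hard error
        throw expectation_failure("Expected '\"' here.", p);
    pos = p + 1;
    return s;
}

// Stand-in for a later alternative such as bool_ or array_p.
bool parse_true(std::string_view in, std::size_t & pos)
{
    if (in.substr(pos, 4) == "true") {
        pos += 4;
        return true;
    }
    return false;
}

// Model of value = string | true_. Returns "" on a soft failure; an
// expectation_failure propagates out and stops all remaining alternatives.
std::string parse_value(std::string_view in)
{
    std::size_t pos = 0;
    if (auto s = parse_string(in, pos))
        return "string:" + *s;
    if (parse_true(in, pos))
        return "true";
    return "";
}
```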

When writing a parser, it often comes up that there is a set of strings that, when parsed, are associated with a set of values one-to-one. It is tedious to write parsers that recognize all the possible input strings when you have to associate each one with an attribute via a semantic action. Instead, we can use a symbol table.

Say we want to parse Roman numerals, one of the most common work-related parsing problems. We want to recognize numbers that start with any number of "M"s, representing thousands, followed by the hundreds, the tens, and the ones. Any of these may be absent from the input, but not all. Here are three Boost.Parser symbol tables that we can use to recognize ones, tens, and hundreds values, respectively:

bp::symbols<int> const ones = {
    {"I", 1},
    {"II", 2},
    {"III", 3},
    {"IV", 4},
    {"V", 5},
    {"VI", 6},
    {"VII", 7},
    {"VIII", 8},
    {"IX", 9}};

bp::symbols<int> const tens = {
    {"X", 10},
    {"XX", 20},
    {"XXX", 30},
    {"XL", 40},
    {"L", 50},
    {"LX", 60},
    {"LXX", 70},
    {"LXXX", 80},
    {"XC", 90}};

bp::symbols<int> const hundreds = {
    {"C", 100},
    {"CC", 200},
    {"CCC", 300},
    {"CD", 400},
    {"D", 500},
    {"DC", 600},
    {"DCC", 700},
    {"DCCC", 800},
    {"CM", 900}};

A symbols maps strings of char to their associated attributes. The type of the attribute must be specified as a template parameter to symbols — in this case, int.

Any "M"s we encounter should add 1000 to the result, and all other values come from the symbol tables. Here are the semantic actions we'll need to do that:

int result = 0;
auto const add_1000 = [&result](auto & ctx) { result += 1000; };
auto const add = [&result](auto & ctx) { result += _attr(ctx); };

add_1000 just adds 1000 to result. add adds whatever attribute is produced by its parser to result.

Now we just need to put the pieces together to make a parser:

using namespace bp::literals;
auto const parser =
    *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add];

We've got a few new bits in play here, so let's break it down. 'M'_l is a literal parser. That is, it is a parser that parses a literal char, code point, or string. In this case, a char 'M' is being parsed. The _l bit at the end is a UDL suffix that you can put after any char, char32_t, or char const * to form a literal parser. You can also make a literal parser by writing lit(), passing an argument of one of the previously mentioned types.

Why do we need any of this, considering that we just used a literal ',' in our previous example? The reason is that 'M' is not used in an expression with another Boost.Parser parser. It is used within *'M'_l[add_1000]. If we'd written *'M'[add_1000], clearly that would be ill-formed; char has no operator*, nor an operator[], associated with it.

Tip

Any time you want to use a char, char32_t, or string literal in a Boost.Parser parser, write it as-is if it is combined with a preexisting Boost.Parser subparser p, as in 'x' >> p. Otherwise, you need to wrap it in a call to lit(), or use the _l UDL suffix.
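A bare 'M' cannot be used here because char has no overloaded operators. A user-defined literal, by contrast, can turn the character into an object that has them. The toy below shows the mechanism in plain C++17. toy_lit, toy_kleene and the toy_literals namespace are hypothetical, and they are much simpler than Boost.Parser's real literal parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Toy literal "parser": matches one occurrence of a fixed char.
struct toy_lit
{
    char c;
    bool match(std::string_view in, std::size_t & pos) const
    {
        if (pos < in.size() && in[pos] == c) {
            ++pos;
            return true;
        }
        return false;
    }
};

// Toy model of *p: zero or more repetitions; returns the number of matches.
struct toy_kleene
{
    toy_lit p;
    std::size_t match(std::string_view in, std::size_t & pos) const
    {
        std::size_t n = 0;
        while (p.match(in, pos))
            ++n;
        return n;
    }
};

// Unary * is defined for toy_lit, so *'M'_l is well-formed. Applying unary *
// to a plain char literal such as 'M' is ill-formed.
toy_kleene operator*(toy_lit p) { return toy_kleene{p}; }

namespace toy_literals {
// 'M'_l turns a char into a toy_lit object.
constexpr toy_lit operator""_l(char c) { return toy_lit{c}; }
}
```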

On to the next bit: -hundreds[add]. By now, the use of the index operator should be pretty familiar; it associates the semantic action add with the parser hundreds. The operator- at the beginning is new. It means that the parser it is applied to is optional. You can read it as "zero or one". So, if hundreds is not successfully parsed after *'M'_l[add_1000], nothing happens, because hundreds is allowed to be missing; it's optional. If hundreds is parsed successfully, say by matching "CC", the resulting attribute, 200, is added to result inside add.
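The parser can also be modeled directly, for readers who want to check the logic without Boost. This plain C++17 sketch assumes that a symbols lookup takes the longest matching string. That assumption is what lets "VIII" be read as 8 instead of stopping after "V". match_symbol and roman_to_int are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

using table = std::vector<std::pair<std::string_view, int>>;

// Longest-prefix lookup, standing in for a symbols<int> parser.
std::optional<int> match_symbol(table const & t, std::string_view in, std::size_t & pos)
{
    std::size_t best_len = 0;
    int best_value = 0;
    for (auto const & [key, value] : t) {
        if (key.size() > best_len && in.substr(pos, key.size()) == key) {
            best_len = key.size();
            best_value = value;
        }
    }
    if (best_len == 0)
        return std::nullopt;
    pos += best_len;
    return best_value;
}

// Model of *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add].
// Fails unless the whole input is consumed and the value is non-zero,
// mirroring the check in the full program.
std::optional<int> roman_to_int(std::string_view in)
{
    static table const ones = {{"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
                               {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};
    static table const tens = {{"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
                               {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
    static table const hundreds = {{"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400},
                                   {"D", 500}, {"DC", 600}, {"DCC", 700}, {"DCCC", 800},
                                   {"CM", 900}};
    std::size_t pos = 0;
    int result = 0;
    while (pos < in.size() && in[pos] == 'M') {   // *'M'_l[add_1000]
        result += 1000;
        ++pos;
    }
    for (table const * t : {&hundreds, &tens, &ones})   // each one is optional
        if (auto v = match_symbol(*t, in, pos))
            result += *v;
    if (pos != in.size() || result == 0)
        return std::nullopt;
    return result;
}
```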

Here is the full listing of the program. Notice that no whitespace skipper is passed to parse() here. The entire input is a single number, so skipping whitespace would be inappropriate.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a number using Roman numerals. ";
    std::string input;
    std::getline(std::cin, input);

    bp::symbols<int> const ones = {
        {"I", 1},
        {"II", 2},
        {"III", 3},
        {"IV", 4},
        {"V", 5},
        {"VI", 6},
        {"VII", 7},
        {"VIII", 8},
        {"IX", 9}};

    bp::symbols<int> const tens = {
        {"X", 10},
        {"XX", 20},
        {"XXX", 30},
        {"XL", 40},
        {"L", 50},
        {"LX", 60},
        {"LXX", 70},
        {"LXXX", 80},
        {"XC", 90}};

    bp::symbols<int> const hundreds = {
        {"C", 100},
        {"CC", 200},
        {"CCC", 300},
        {"CD", 400},
        {"D", 500},
        {"DC", 600},
        {"DCC", 700},
        {"DCCC", 800},
        {"CM", 900}};

    int result = 0;
    auto const add_1000 = [&result](auto & ctx) { result += 1000; };
    auto const add = [&result](auto & ctx) { result += _attr(ctx); };

    using namespace bp::literals;
    auto const parser =
        *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add];

    if (bp::parse(input, parser) && result != 0)
        std::cout << "That's " << result << " in Arabic numerals.\n";
    else
        std::cout << "That's not a Roman number.\n";
}

Important

symbols stores all its strings in UTF-32 internally. If you do Unicode or ASCII parsing, this will not matter to you at all. If you do non-Unicode parsing of a character encoding that is not a subset of Unicode (EBCDIC, for instance), it could cause problems. See the section on Unicode Support for more information.

Diagnostic messages

Just like with a rule, you can give a symbols a bit of diagnostic text that will be used in error messages generated by Boost.Parser when the parse fails at an expectation point, as described in Error Handling and Debugging. See the symbols constructors for details.

The previous example showed how to use a symbol table as a fixed lookup table. What if we want to add things to the table during the parse? We can do that, but we need to do so within a semantic action. First, here is our symbol table, already with a single value in it:

bp::symbols<int> const symbols = {{"c", 8}};
assert(parse("c", symbols));

No surprise that it works to use the symbol table as a parser to parse the one string in the symbol table. Now, here's our parser:

auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;

Here, we've attached the semantic action not to a simple parser like double_, but to the sequence parser (bp::char_ >> bp::int_). This sequence parser contains two parsers, each with its own attribute, so it produces two attributes as a tuple.

auto const add_symbol = [&symbols](auto & ctx) {
    using namespace bp::literals;
    // symbols::insert() requires a string, not a single character.
    char chars[2] = {_attr(ctx)[0_c], 0};
    symbols.insert(ctx, chars, _attr(ctx)[1_c]);
};

Inside the semantic action, we can get the elements of the attribute tuple using the UDLs provided by Boost.Hana and boost::hana::tuple::operator[](). The first attribute, from the char_, is _attr(ctx)[0_c], and the second, from the int_, is _attr(ctx)[1_c]. If boost::parser::tuple aliases to std::tuple, you'd use std::get or boost::parser::get instead. To add the symbol to the symbol table, we call insert().

auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;

When the input is "X 9 X", the pair ('X', 9) is parsed and added to the symbol table during the parse. Then the second 'X' is recognized by the symbol table parser. However:

assert(!parse("X", symbols));

If we parse again, we find that "X" did not stay in the symbol table. The fact that symbols was declared const might have given you a hint that this would happen.

The full program:

#include <boost/parser/parser.hpp>

#include <cassert>
#include <iostream>
#include <string>


namespace bp = boost::parser;

int main()
{
    bp::symbols<int> const symbols = {{"c", 8}};
    assert(parse("c", symbols));

    auto const add_symbol = [&symbols](auto & ctx) {
        using namespace bp::literals;
        // symbols::insert() requires a string, not a single character.
        char chars[2] = {_attr(ctx)[0_c], 0};
        symbols.insert(ctx, chars, _attr(ctx)[1_c]);
    };
    auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;

    auto const result = parse("X 9 X", parser, bp::ws);
    assert(result && *result == 9);
    (void)result;

    assert(!parse("X", symbols));
}

Important

symbols stores all its strings in UTF-32 internally. If you do Unicode or ASCII parsing, this will not matter to you at all. If you do non-Unicode parsing of a character encoding that is not a subset of Unicode (EBCDIC, for instance), it could cause problems. See the section on Unicode Support for more information.

It is possible to add symbols to a symbols permanently. To do so, you have to use a mutable symbols object s, and add the symbols by calling s.insert_for_next_parse(), instead of s.insert(). These two operations are orthogonal, so if you want to both add a symbol to the table for the current top-level parse, and leave it in the table for subsequent top-level parses, you need to call both functions.

It is also possible to erase a single entry from the symbol table, or to clear the symbol table entirely. Just as with insertion, there are versions of erase and clear for the current parse, and another that applies only to subsequent parses. The full set of operations can be found in the symbols API docs.
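The fact that the two insertion modes are independent can be modeled with three maps: a persistent table, a parse-local overlay, and a queue of entries for the next parse. toy_symbols and its member functions are hypothetical, and the sketch covers insertion only. Erase and clear variants could be added following the same pattern. This is an illustration of the described behavior, not Boost.Parser's implementation.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy model of the two insertion modes; not Boost.Parser code.
class toy_symbols
{
public:
    explicit toy_symbols(std::map<std::string, int> initial) : table_(std::move(initial)) {}

    // Called at the start of each top-level parse.
    void begin_parse()
    {
        for (auto const & [k, v] : pending_)
            table_[k] = v;        // apply the *_for_next_parse insertions
        pending_.clear();
        local_.clear();           // parse-local entries do not survive
    }

    // Like insert(ctx, ...): visible for the rest of the current parse only.
    void insert(std::string key, int value) { local_[std::move(key)] = value; }

    // Like insert_for_next_parse(): takes effect at the next begin_parse().
    void insert_for_next_parse(std::string key, int value)
    {
        pending_[std::move(key)] = value;
    }

    std::optional<int> find(std::string const & key) const
    {
        if (auto it = local_.find(key); it != local_.end())
            return it->second;
        if (auto it = table_.find(key); it != table_.end())
            return it->second;
        return std::nullopt;
    }

private:
    std::map<std::string, int> table_;    // persistent entries
    std::map<std::string, int> local_;    // current-parse-only entries
    std::map<std::string, int> pending_;  // queued for the next parse
};
```

Calling only insert_for_next_parse() leaves the entry invisible in the current parse. To get both effects, both calls are needed, as the text says.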

Note

There are two versions of each of the symbols *_for_next_parse() functions: one that takes a context, and one that does not. The one with the context is meant to be used within a semantic action. The one without the context is for use outside of any parse.

Boost.Parser comes with all the parsers most parsing tasks will ever need. Each one is a constexpr object, or a constexpr function. Some of the non-functions are also callable, such as char_, which may be used directly, or with arguments, as in char_('a', 'z'). Any parser that can be called, whether a function or callable object, will be called a callable parser from now on. Note that there are no nullary callable parsers; they each take one or more arguments.

Each callable parser takes one or more parse arguments. A parse argument may be a value or an invocable object that accepts a reference to the parse context. The reference parameter may be mutable or constant. For example:

struct get_attribute
{
    template<typename Context>
    auto operator()(Context & ctx)
    {
        return _attr(ctx);
    }
};

This can also be a lambda. For example:

[](auto const & ctx) { return _attr(ctx); }

The operation that produces a value from a parse argument, which may be a value or a callable taking a parse context argument, is referred to as resolving the parse argument. If a parse argument arg can be called with the current context, then the resolved value of arg is arg(ctx); otherwise, the resolved value is just arg.
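Resolving a parse argument can be sketched with std::is_invocable_v and if constexpr. This models the rule just stated and is not Boost.Parser's code. toy_context (which has a single attr member), toy_get_attribute and resolve are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a parse context, holding only an attribute.
struct toy_context
{
    int attr = 0;
};

// Like get_attribute above: operator() is deliberately non-const.
struct toy_get_attribute
{
    template<typename Context>
    auto operator()(Context & ctx) { return ctx.attr; }
};

// Resolve a parse argument. If it can be called with the context, the
// resolved value is arg(ctx); otherwise it is arg itself, returned by value.
template<typename Arg, typename Context>
auto resolve(Arg && arg, Context & ctx)
{
    if constexpr (std::is_invocable_v<Arg &, Context &>)
        return arg(ctx);
    else
        return arg;
}
```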

Some callable parsers take a parse predicate. A parse predicate is not quite the same as a parse argument, because it must be a callable object, and cannot be a value. A parse predicate's return type must be contextually convertible to bool. For example:

struct equals_three
{
    template<typename Context>
    bool operator()(Context const & ctx)
    {
        return _attr(ctx) == 3;
    }
};

This may of course be a lambda:

[](auto & ctx) { return _attr(ctx) == 3; }

The notional macro RESOLVE() expands to the result of resolving a parse argument or parse predicate. You'll see it used in the rest of the documentation.

An example of how parse arguments are used:

namespace bp = boost::parser;
// This parser matches one code point that is at least 'a', and at most
// the value of last_char, which comes from the globals.
auto last_char = [](auto & ctx) { return _globals(ctx).last_char; };
auto subparser = bp::char_('a', last_char);

Don't worry about what the globals are for now; the take-away is that you can make any argument you pass to a parser depend on the current state of the parse, by using the parse context:

namespace bp = boost::parser;
// This parser parses two code points.  For the parse to succeed, the
// second one must be >= 'a' and <= the first one.
auto set_last_char = [](auto & ctx) { _globals(ctx).last_char = _attr(ctx); };
auto parser = bp::char_[set_last_char] >> subparser;
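Putting parse arguments and the globals together, a plain C++17 model of this two-code-point parser might look like the following. toy_globals, toy_ctx, resolve_char, char_range and parse_two are hypothetical names, and only char input is handled. The upper bound of the range is resolved when the second character is matched, which happens after the first character has been stored.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical stand-ins for the globals and the parse context.
struct toy_globals { char last_char = 0; };
struct toy_ctx { toy_globals & globals; char attr = 0; };

// Resolve a char-valued parse argument against the context.
template<typename Arg>
char resolve_char(Arg const & arg, toy_ctx & ctx)
{
    if constexpr (std::is_invocable_v<Arg const &, toy_ctx &>)
        return arg(ctx);
    else
        return arg;
}

// Model of char_(lo, hi): matches one char c with RESOLVE(lo) <= c <= RESOLVE(hi).
template<typename Lo, typename Hi>
bool char_range(std::string_view in, std::size_t & pos, toy_ctx & ctx, Lo lo, Hi hi)
{
    if (pos >= in.size())
        return false;
    char const c = in[pos];
    if (c < resolve_char(lo, ctx) || resolve_char(hi, ctx) < c)
        return false;
    ctx.attr = c;
    ++pos;
    return true;
}

// Model of char_[set_last_char] >> char_('a', last_char) over the whole input.
bool parse_two(std::string_view in)
{
    toy_globals g;
    toy_ctx ctx{g};
    std::size_t pos = 0;
    auto last_char = [](toy_ctx & c) { return c.globals.last_char; };
    // First code point: any char; the "semantic action" stores it in the globals.
    if (pos >= in.size())
        return false;
    ctx.globals.last_char = in[pos++];
    // Second code point: must be in ['a', last_char].
    return char_range(in, pos, ctx, 'a', last_char) && pos == in.size();
}
```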

Each callable parser returns a new parser, parameterized using the arguments given in the invocation.

This table lists all the Boost.Parser parsers. For the callable parsers, a separate entry exists for each possible arity of arguments. For a parser p, if there is no entry for p without arguments, p is a function, and cannot itself be used as a parser; it must be called. In the table below:

  • each entry is a global object usable directly in your parsers, unless otherwise noted;
  • "code point" is used to refer to the elements of the input range, which assumes that the parse is being done in the Unicode-aware code path (if the parse is being done in the non-Unicode code path, read "code point" as "char");
  • RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses);
  • "RESOLVE(pred) == true" is shorthand for "RESOLVE(pred) is contextually convertible to bool and true"; likewise for false;
  • c is a character of type char, char8_t, or char32_t;
  • str is a string literal of type char const[], char8_t const[], or char32_t const[];
  • pred is a parse predicate;
  • arg0, arg1, arg2, ... are parse arguments;
  • a is a semantic action;
  • r is an object whose type models parsable_range;
  • p, p1, p2, ... are parsers; and
  • escapes is a symbols<T> object, where T is char or char32_t.
Note

The definition of parsable_range is:

template<typename T>
concept parsable_range = std::ranges::forward_range<T> &&
    code_unit<std::ranges::range_value_t<T>>;

Note

Some of the parsers in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.

Table 26.6. Parsers and Their Semantics

Parser | Semantics | Attribute Type | Notes

eps
  Semantics: Matches epsilon, the empty string. Always matches, and consumes no input.
  Attribute Type: None.
  Notes: Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters *eps, +eps, etc. (this applies to unconditional eps only).

eps(pred)
  Semantics: Fails to match the input if RESOLVE(pred) == false. Otherwise, the semantics are those of eps.
  Attribute Type: None.

ws
  Semantics: Matches a single whitespace code point (see note), according to the Unicode White_Space property.
  Attribute Type: None.
  Notes: For more info, see the Unicode properties. ws may consume one code point or two. It only consumes two code points when it matches "\r\n".

eol
  Semantics: Matches a single newline (see note), following the "hard" line breaks in the Unicode line breaking algorithm.
  Attribute Type: None.
  Notes: For more info, see the Unicode Line Breaking Algorithm. eol may consume one code point or two. It only consumes two code points when it matches "\r\n".

eoi
  Semantics: Matches only at the end of input, and consumes no input.
  Attribute Type: None.

attr(arg0)
  Semantics: Always matches, and consumes no input. Generates the attribute RESOLVE(arg0).
  Attribute Type: decltype(RESOLVE(arg0)).
  Notes: An important use case for attr is to provide a default attribute value as a trailing alternative. For instance, an optional comma-delimited list is: int_ % ',' | attr(std::vector<int>{}). Without the "| attr(...)", at least one int_ match would be required.

char_
  Semantics: Matches any single code point.
  Attribute Type: The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

char_(arg0)
  Semantics: Matches exactly the code point RESOLVE(arg0).
  Attribute Type: The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

char_(arg0, arg1)
  Semantics: Matches the next code point n in the input, if RESOLVE(arg0) <= n && n <= RESOLVE(arg1).
  Attribute Type: The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

char_(r)

Matches the next code point n in the input, if n is one of the code points in r.
匹配输入中的下一个代码点 n ,如果 nr 中的代码点之一。

The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.
Unicode 解析中的码点类型,或在非 Unicode 解析中的 char 。参见属性生成。

r is taken to be in a UTF encoding. The exact UTF used depends on r's element type. If you do not pass UTF encoded ranges for r, the behavior of char_ is undefined. Note that ASCII is a subset of UTF-8, so ASCII is fine. EBCDIC is not. r is not copied; a reference to it is taken. The lifetime of char_(r) must be within the lifetime of r. This overload of char_ does not take parse arguments.

cp

Matches a single code point.

char32_t

Similar to char_, but with a fixed char32_t attribute type; cp has all the same call operator overloads as char_, though they are not repeated here, for brevity.

cu

Matches a single code point.

char

Similar to char_, but with a fixed char attribute type; cu has all the same call operator overloads as char_, though they are not repeated here, for brevity. Even though the name "cu" suggests that this parser matches at the code unit level, it does not. The name refers to the attribute type generated, much like the names int_ versus uint_.

blank

Equivalent to ws - eol.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

control

Matches a single control-character code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

digit

Matches a single decimal digit code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

punct

Matches a single punctuation code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

hex_digit

Matches a single hexadecimal digit code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

lower

Matches a single lower-case code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

upper

Matches a single upper-case code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

lit(c)

Matches exactly the given code point c.

None.

lit() does not take parse arguments.

c_l

Matches exactly the given code point c.

None.

This is a UDL that represents lit(c), for example 'F'_l.

lit(r)

Matches exactly the given string r.

None.

lit() does not take parse arguments.

str_l

Matches exactly the given string str.

None.

This is a UDL that represents lit(s), for example "a string"_l.

string(r)

Matches exactly r, and generates the match as an attribute.

std::string

string() does not take parse arguments.

str_p

Matches exactly str, and generates the match as an attribute.

std::string

This is a UDL that represents string(s), for example "a string"_p.

bool_

Matches "true" or "false".

bool

bin

Matches a binary unsigned integral value.

unsigned int

For example, bin would match "101", and generate an attribute of 5u.

bin(arg0)

Matches exactly the binary unsigned integral value RESOLVE(arg0).

unsigned int

oct

Matches an octal unsigned integral value.

unsigned int

For example, oct would match "31", and generate an attribute of 25u.

oct(arg0)

Matches exactly the octal unsigned integral value RESOLVE(arg0).

unsigned int

hex

Matches a hexadecimal unsigned integral value.

unsigned int

For example, hex would match "ff", and generate an attribute of 255u.

hex(arg0)

Matches exactly the hexadecimal unsigned integral value RESOLVE(arg0).

unsigned int

ushort_

Matches an unsigned integral value.

unsigned short

ushort_(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned short

uint_

Matches an unsigned integral value.

unsigned int

uint_(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned int

ulong_

Matches an unsigned integral value.

unsigned long

ulong_(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned long

ulong_long

Matches an unsigned integral value.

unsigned long long

ulong_long(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned long long

short_

Matches a signed integral value.

short

short_(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

short

int_

Matches a signed integral value.

int

int_(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

int

long_

Matches a signed integral value.

long

long_(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

long

long_long

Matches a signed integral value.

long long

long_long(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

long long

float_

Matches a floating-point number. float_ uses parsing implementation details from Boost.Spirit. The specifics of what formats are accepted can be found in their real number parsers. Note that only the default RealPolicies is supported by float_.

float

double_

Matches a floating-point number. double_ uses parsing implementation details from Boost.Spirit. The specifics of what formats are accepted can be found in their real number parsers. Note that only the default RealPolicies is supported by double_.

double

repeat(arg0)[p]

Matches iff p matches exactly RESOLVE(arg0) times.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

The special value Inf may be used; it indicates unlimited repetition. decltype(RESOLVE(arg0)) must be implicitly convertible to int64_t. Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters repeat(Inf)[eps] (this applies to unconditional eps only).

repeat(arg0, arg1)[p]

Matches iff p matches between RESOLVE(arg0) and RESOLVE(arg1) times, inclusively.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

The special value Inf may be used for the upper bound; it indicates unlimited repetition. decltype(RESOLVE(arg0)) and decltype(RESOLVE(arg1)) each must be implicitly convertible to int64_t. Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters repeat(n, Inf)[eps] (this applies to unconditional eps only).

if_(pred)[p]

Equivalent to eps(pred) >> p.

std::optional<ATTR(p)>

It is an error to write if_(pred). That is, it is an error to omit the conditionally matched parser p.

switch_(arg0)(arg1, p1)(arg2, p2) ...

Equivalent to p1 when RESOLVE(arg0) == RESOLVE(arg1), p2 when RESOLVE(arg0) == RESOLVE(arg2), etc. If there is no such argN, the behavior of switch_() is undefined.

std::variant<ATTR(p1), ATTR(p2), ...>

It is an error to write switch_(arg0). That is, it is an error to omit the conditionally matched parsers p1, p2, ....

symbols<T>

symbols is an associative container of key-value pairs. Each key is a std::string and each value has type T. In the Unicode parsing path, the strings are considered to be UTF-8 encoded; in the non-Unicode path, no encoding is assumed. symbols matches the longest prefix pre of the input that is equal to one of the keys k. If the length len of pre is zero, and there is no zero-length key, it does not match the input. If len is positive, the generated attribute is the value associated with k.

T

Unlike the other entries in this table, symbols is a type, not an object.

quoted_string

Matches '"', followed by zero or more characters, followed by '"'.

std::string

The result does not include the quotes. A quote within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(c)

Matches c, followed by zero or more characters, followed by c.

std::string

The result does not include the c quotes. A c within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(r)

Matches some character Q in r, followed by zero or more characters, followed by Q.

std::string

The result does not include the Q quotes. A Q within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(c, symbols)

Matches c, followed by zero or more characters, followed by c.

std::string

The result does not include the c quotes. A c within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. A backslash followed by a successful match using symbols will be interpreted as the corresponding value produced by symbols. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(r, symbols)

Matches some character Q in r, followed by zero or more characters, followed by Q.

std::string

The result does not include the Q quotes. A Q within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. A backslash followed by a successful match using symbols will be interpreted as the corresponding value produced by symbols. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].
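The escaping rules in the quoted_string rows are easier to see in a small model. The sketch below is plain C++17, not Boost.Parser code: unquote is a hypothetical helper that mimics quoted_string(c) on a std::string_view, works byte by byte rather than per code point, and ignores both the symbols overloads and skipping.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical model of quoted_string(quote); this is not Boost.Parser code.
// Returns the unquoted contents if `in` starts with a complete quoted string,
// or std::nullopt otherwise. Only \<quote> and \\ are accepted as escapes.
std::optional<std::string> unquote(std::string_view in, char quote = '"')
{
    if (in.empty() || in.front() != quote)
        return std::nullopt;
    std::string result;
    for (std::size_t i = 1; i < in.size(); ++i) {
        char const c = in[i];
        if (c == quote)
            return result;                 // closing quote: the quotes are not included
        if (c == '\\') {
            if (i + 1 == in.size())
                return std::nullopt;       // dangling backslash at end of input
            char const next = in[++i];
            if (next != quote && next != '\\')
                return std::nullopt;       // any other escape fails the parse
            result += next;
            continue;
        }
        result += c;
    }
    return std::nullopt;                   // no closing quote
}
```

For example, the input "say \"hi\"" (including its outer quotes) yields say "hi", while "a\nb" fails because \n is not one of the two accepted escapes.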


[Important] Important

All the character parsers, like char_, cp and cu, produce either char or char32_t attributes. So when you see "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" in the table above, that effectively means that every sequence of character attributes gets turned into a std::string. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use attr to do so).

[Note] Note

A slightly more complete description of the attributes generated by these parsers is in a subsequent section. The attributes are repeated here so you can see all the properties of the parsers in one place.
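As a rough model of the longest-prefix rule in the symbols<T> row, the hypothetical match_longest below scans a std::map of keys and keeps the longest key that prefixes the input. It is a plain C++17 sketch for illustration only; it makes no claim about how Boost.Parser stores or searches its symbol table, and it reports the consumed length alongside the value so the result can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical model of symbols<T> matching; this is not Boost.Parser code.
// Finds the longest key that is a prefix of `in` and returns its value together
// with the number of characters consumed, or std::nullopt if no key matches.
template <typename T>
std::optional<std::pair<T, std::size_t>>
match_longest(std::map<std::string, T> const & table, std::string_view in)
{
    std::optional<std::pair<T, std::size_t>> best;
    for (auto const & [key, value] : table) {
        bool const is_prefix = in.substr(0, key.size()) == key;
        if (is_prefix && (!best || key.size() > best->second))
            best = std::pair<T, std::size_t>{value, key.size()};
    }
    return best;
}
```

With keys "I" and "IV", the input "IVX" yields the value for "IV", because the longer matching key wins; an input that starts with no key at all does not match.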

If you have an integral type IntType that is not covered by any of the Boost.Parser parsers, you can use a more verbose declaration to declare a parser for IntType. If IntType were unsigned, you would use uint_parser. If it were signed, you would use int_parser. For example:

constexpr parser_interface<int_parser<IntType>> int_type_parser; // Radix 10 by default

uint_parser and int_parser accept three more non-type template parameters after the type parameter. They are Radix, MinDigits, and MaxDigits. Radix defaults to 10, MinDigits to 1, and MaxDigits to -1, which is a sentinel value meaning that there is no max number of digits.

So, if you wanted to parse exactly eight hexadecimal digits in a row in order to recognize Unicode character literals like C++ has (e.g. \Udeadbeef), you could use this parser for the digits at the end:

constexpr parser_interface<uint_parser<unsigned int, 16, 8, 8>> hex_int;
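To show how Radix, MinDigits, and MaxDigits interact, here is a plain C++17 model (not the library's implementation) of a uint_parser with those parameters. parse_uint is a hypothetical helper; it assumes that parsing stops after MaxDigits digits rather than failing on extra digits, and it does not detect overflow. The bin, oct, and hex examples from the parser table correspond to radixes 2, 8, and 16.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical model of uint_parser<unsigned int, Radix, MinDigits, MaxDigits>;
// this is not Boost.Parser code. Consumes at most max_digits digits (-1 means
// no limit) and fails if fewer than min_digits are present. No overflow check.
std::optional<unsigned int> parse_uint(std::string_view in, unsigned radix,
                                       int min_digits = 1, int max_digits = -1)
{
    unsigned int value = 0;
    int count = 0;
    for (char c : in) {
        if (max_digits != -1 && count == max_digits)
            break;                          // MaxDigits reached: stop consuming
        unsigned digit;
        if ('0' <= c && c <= '9')
            digit = c - '0';
        else if ('a' <= c && c <= 'z')
            digit = c - 'a' + 10;
        else if ('A' <= c && c <= 'Z')
            digit = c - 'A' + 10;
        else
            break;                          // not a digit in any radix
        if (digit >= radix)
            break;                          // not a digit in this radix
        value = value * radix + digit;
        ++count;
    }
    if (count < min_digits)
        return std::nullopt;                // MinDigits not satisfied
    return value;
}
```

Under these assumptions, parse_uint("deadbeef", 16, 8, 8) plays the role of the eight-digit parser above, while "beef" fails because it has only four digits.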

A directive is an element of your parser that doesn't have any meaning by itself. Some are second-order parsers that need a first-order parser to do the actual parsing. Others influence the parse in some way. You can often spot a directive lexically by its use of []; directives always use []. Non-directives might, but only when attaching a semantic action.

The directives that are second order parsers are technically directives, but since they are also used to create parsers, it is more useful just to focus on that. The directives repeat() and if_() were already described in the section on parsers; we won't say much about them here.

Interaction with sequence, alternative, and permutation parsers

Sequence, alternative, and permutation parsers do not nest in most cases. (Let's consider just sequence parsers to keep things simple, but most of this logic applies to alternative parsers as well.) a >> b >> c is the same as (a >> b) >> c and a >> (b >> c), and they are each represented by a single seq_parser with three subparsers, a, b, and c. However, if something prevents two seq_parsers from interacting directly, they will nest. For instance, lexeme[a >> b] >> c is a seq_parser containing two parsers, lexeme[a >> b] and c. This is because lexeme[] takes its given parser and wraps it in a lexeme_parser. This in turn turns off the sequence parser combining logic, since neither side of the second operator>> in lexeme[a >> b] >> c is a seq_parser. Sequence parsers have several rules that govern what the overall attribute type of the parser is, based on the positions and attributes of its subparsers (see Attribute Generation). Therefore, it's important to know which directives create a new parser (and what kind), and which ones do not; this is indicated for each directive below.

The directives
repeat()

See The Parsers And Their Uses. Creates a repeat_parser.

if_()

See The Parsers And Their Uses. Creates a seq_parser.

omit[]

omit[p] disables attribute generation for the parser p. Not only does omit[p] have no attribute, but any attribute generation work that normally happens within p is skipped.

This directive can be useful in cases like this: say you have some fairly complicated parser p that generates a large and expensive-to-construct attribute. Now say that you want to write a function that just counts how many times p can match a string (where the matches are non-overlapping). Instead of using p directly, and building all those attributes, or rewriting p without the attribute generation, use omit[].

Creates an omit_parser.

raw[]

raw[p] changes the attribute from ATTR(p) to a view that delimits the subrange of the input that was matched by p. The type of the view is subrange<I>, where I is the type of the iterator used within the parse. Note that this may not be the same as the iterator type passed to parse(). For instance, when parsing UTF-8, the iterator passed to parse() may be char8_t const *, but within the parse it will be a UTF-8 to UTF-32 transcoding (converting) iterator. Just like omit[], raw[] causes all attribute-generation work within p to be skipped.

Similar to the re-use scenario for omit[] above, raw[] could be used to find the locations of all non-overlapping matches of p in a string.

Creates a raw_parser.

string_view[]

string_view[p] is very similar to raw[p], except that it changes the attribute of p to std::basic_string_view<C>, where C is the character type of the underlying range being parsed. string_view[] requires that the underlying range being parsed is contiguous. Since this can only be detected in C++20 and later, string_view[] is not available in C++17 mode.

Similar to the re-use scenario for omit[] above, string_view[] could be used to find the locations of all non-overlapping matches of p in a string. Whether raw[] or string_view[] is more natural to use to report the locations depends on your use case, but they are essentially the same.

Creates a string_view_parser.
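The re-use scenario described for omit[], raw[], and string_view[] amounts to a left-to-right scan that records each non-overlapping match. The sketch below models it in plain C++17 with two hypothetical helpers: match_digits stands in for some parser p, and find_all collects std::string_view subranges, much like the attributes string_view[p] would produce. Counting matches, the omit[] use case, is then just the size of the result.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a parser p: matches one or more ASCII digits and
// returns the number of characters consumed (0 means "no match").
std::size_t match_digits(std::string_view in)
{
    std::size_t n = 0;
    while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n])))
        ++n;
    return n;
}

// Collects every non-overlapping match of `match` in `in`, left to right.
// Each element plays the role of a string_view[p] (or raw[p]) attribute.
template <typename Matcher>
std::vector<std::string_view> find_all(std::string_view in, Matcher match)
{
    std::vector<std::string_view> result;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t const len = match(in.substr(pos));
        if (len == 0) {
            ++pos;                        // no match here: try the next position
            continue;
        }
        result.push_back(in.substr(pos, len));
        pos += len;                       // resume after the match, so no overlaps
    }
    return result;
}
```

For "a12b345c", this finds the subranges "12" and "345"; the views point into the original input, so no characters are copied.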

no_case[]

no_case[p] enables case-insensitive parsing within the parse of p. This applies to the text parsed by char_(), string(), and bool_ parsers. The number parsers are already case-insensitive. The case-insensitivity is achieved by doing Unicode case folding on the text being parsed and the values in the parser being matched (see note below if you want to know more about Unicode case folding). In the non-Unicode code path, a full Unicode case folding is not done; instead, only the transformations of values less than 0x100 are done. Examples:

#include <boost/parser/parser.hpp>
#include <boost/parser/transcode_view.hpp> // For as_utfN.
#include <cassert>

namespace bp = boost::parser;
auto const street_parser = bp::string(u8"Tobias Straße");
assert(!bp::parse("Tobias Strasse" | bp::as_utf32, street_parser));             // No match.
assert(bp::parse("Tobias Strasse" | bp::as_utf32, bp::no_case[street_parser])); // Match!

// Case-sensitive on its own; no_case[] is applied at each use below.
auto const alpha_parser = bp::char_('a', 'z');
assert(bp::parse("a" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!
assert(bp::parse("B" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!

Everything pretty much does what you'd naively expect inside no_case[], except that the two-character range version of char_ has a limitation. It only compares a code point from the input to its two arguments (e.g. 'a' and 'z' in the example above). It does not do anything special for multi-code point case folding expansions. For instance, char_(U'ß', U'ß') matches the input U"s", which makes sense, since U'ß' expands to U"ss". However, that same parser does not match the input U"ß"! In short, stick to pairs of code points that have single-code point case folding expansions. If you need to support the multi-expanding code points, use the other overload, like: char_(U"abcd/*...*/ß").

[Note] Note

Unicode case folding is an operation that makes text uniformly one case, and if you do it to two bits of text A and B, then you can compare them bitwise to see if they are the same, ignoring case. Case folding may sometimes expand a code point into multiple code points (e.g. case folding "ẞ" yields "ss"). When such a multi-code point expansion occurs, the expanded code points are in the NFKC normalization form.
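The compare-after-folding idea in the note can be sketched with a toy fold. The hypothetical toy_fold below handles only ASCII letters plus U+00DF and U+1E9E (both of which fold to "ss"); it is a stand-in for real Unicode case folding, not what Boost.Parser does internally.

```cpp
#include <cassert>
#include <string>

// Toy case fold: ASCII letters plus U+00DF and U+1E9E, which both fold to "ss".
// Real Unicode case folding covers far more code points than this.
std::u32string toy_fold(std::u32string const & s)
{
    std::u32string out;
    for (char32_t c : s) {
        if (U'A' <= c && c <= U'Z')
            out += static_cast<char32_t>(c - U'A' + U'a');
        else if (c == U'\u00DF' || c == U'\u1E9E')
            out += U"ss";                 // one code point expands to two
        else
            out += c;
    }
    return out;
}

// Case-insensitive equality: fold both sides, then compare exactly.
bool equal_ignoring_case(std::u32string const & a, std::u32string const & b)
{
    return toy_fold(a) == toy_fold(b);
}
```

This is enough to reproduce the "Tobias Straße" versus "Tobias Strasse" match from the no_case[] example above, including the one-to-two code point expansion.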

Creates a no_case_parser.

lexeme[]

lexeme[p] disables use of the skipper, if a skipper is being used, within the parse of p. This is useful, for instance, if you want to enable skipping in most parts of your parser, but disable it only in one section where it doesn't belong. If you are skipping whitespace in most of your parser, but want to parse strings that may contain spaces, you should use lexeme[]:

namespace bp = boost::parser;
auto const string_parser = bp::lexeme['"' >> *(bp::char_ - '"') >> '"'];

Without lexeme[], our string parser would correctly match "foo bar", but the generated attribute would be "foobar".
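To see why the skipper turns "foo bar" into "foobar", here is a plain C++17 model of the string parser above, run with a space-only skipper. parse_string is hypothetical; its use_lexeme flag simulates wrapping the parser in lexeme[], which stops the skipper from running between the characters inside the quotes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical model of '"' >> *(char_ - '"') >> '"' with a space skipper.
// Without lexeme the skipper runs before every subparser, so spaces inside
// the quotes are silently dropped; with lexeme they are kept.
std::optional<std::string> parse_string(std::string_view in, bool use_lexeme)
{
    std::size_t i = 0;
    auto skip = [&] { while (i < in.size() && in[i] == ' ') ++i; };
    skip();                                 // leading skip happens either way
    if (i == in.size() || in[i] != '"')
        return std::nullopt;
    ++i;
    std::string attr;
    while (true) {
        if (!use_lexeme)
            skip();                         // skipper runs between characters
        if (i == in.size())
            return std::nullopt;            // unterminated string
        if (in[i] == '"')
            return attr;                    // closing quote
        attr += in[i++];
    }
}
```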

Creates a lexeme_parser.

skip[]

skip[] is like the inverse of lexeme[]. It enables skipping in the parse, even if it was not enabled before. For example, within a call to parse() that uses a skipper, let's say we have these parsers in use:

namespace bp = boost::parser;
auto const one_or_more = +bp::char_;
auto const skip_or_skip_not_there_is_no_try = bp::lexeme[bp::skip[one_or_more] >> one_or_more];

The use of lexeme[] disables skipping, but then the use of skip[] turns it back on. The net result is that the first occurrence of one_or_more will use the skipper passed to parse(); the second will not.

skip[] has another use. You can parameterize skip with a different parser to change the skipper just within the scope of the directive. Let's say we passed ws to parse(), and we're using these parsers somewhere within that parse() call:

namespace bp = boost::parser;
auto const zero_or_more = *bp::char_;
auto const skip_both_ways = zero_or_more >> bp::skip(bp::blank)[zero_or_more];

The first occurrence of zero_or_more will use the skipper passed to parse(), which is ws; the second will use blank as its skipper.
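The difference between the two skippers can be modeled with character predicates. In the plain C++17 sketch below, is_ws, is_blank, and collect are hypothetical: the predicates are ASCII-only approximations (the real ws and blank work on Unicode code points, and blank is ws - eol), and collect mimics *char_ by keeping every character the skipper does not consume.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// ASCII-only approximations of the ws and blank skippers.
bool is_ws(char c)    { return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r'; }
bool is_blank(char c) { return c == ' ' || c == '\t'; }   // ws minus line breaks

// Hypothetical model of *char_ under a skipper: characters the skipper accepts
// are skipped; everything else becomes part of the attribute.
template <typename Skipper>
std::string collect(std::string_view in, Skipper skipper)
{
    std::string attr;
    for (char c : in)
        if (!skipper(c))
            attr += c;
    return attr;
}
```

With ws as the skipper, "a b\nc" collects "abc"; with blank, the newline survives and the result is "ab\nc".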

Creates a skip_parser.

merge[], separate[], and transform(f)[]

These directives influence the generation of attributes. See Attribute Generation section for more details on them.

merge[] and separate[] create a copy of the given seq_parser.

transform(f)[] creates a transform_parser.

Certain overloaded operators are defined for all parsers in Boost.Parser. We've already seen some of them used in this tutorial, especially operator>>, operator|, and operator||, which are used to form sequence parsers, alternative parsers, and permutation parsers, respectively.

Here are all the operators overloaded for parsers. In the tables below:

  • c is a character of type char or char32_t;
  • a is a semantic action;
  • r is an object whose type models parsable_range (see Concepts); and
  • p, p1, p2, ... are parsers.
[Note] Note

Some of the expressions in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.

Table 26.7. Combining Operations and Their Semantics

Expression

Semantics

Attribute Type

Notes

!p

Matches iff p does not match; consumes no input.

None.

&p

Matches iff p matches; consumes no input.

None.

*p

Parses using p repeatedly until p no longer matches; always matches.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters *eps (this applies to unconditional eps only).

+p

Parses using p repeatedly until p no longer matches; matches iff p matches at least once.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters +eps (this applies to unconditional eps only).

-p

Equivalent to p | eps.

std::optional<ATTR(p)>

p1 >> p2

Matches iff p1 matches and then p2 matches.

boost::parser::tuple<ATTR(p1), ATTR(p2)> (See note.)

>> is associative; p1 >> p2 >> p3, (p1 >> p2) >> p3, and p1 >> (p2 >> p3) are all equivalent. This attribute type only applies to the case where p1 and p2 both generate attributes; see Attribute Generation for the full rules.

p >> c

Equivalent to p >> lit(c).

ATTR(p)

p >> r

Equivalent to p >> lit(r).

ATTR(p)

p1 > p2

Matches iff p1 matches and then p2 matches. No back-tracking is allowed after p1 matches; if p1 matches but then p2 does not, the top-level parse fails.

boost::parser::tuple<ATTR(p1), ATTR(p2)> (See note.)

> is associative; p1 > p2 > p3, (p1 > p2) > p3, and p1 > (p2 > p3) are all equivalent. This attribute type only applies to the case where p1 and p2 both generate attributes; see Attribute Generation for the full rules.

p > c

Equivalent to p > lit(c).

ATTR(p)

p > r

Equivalent to p > lit(r).

ATTR(p)

p1 | p2

Matches iff either p1 matches or p2 matches.

std::variant<ATTR(p1), ATTR(p2)> (See note.)

| is associative; p1 | p2 | p3, (p1 | p2) | p3, and p1 | (p2 | p3) are all equivalent. This attribute type only applies to the case where p1 and p2 both generate attributes, and where the attribute types are different; see Attribute Generation for the full rules.

p | c

Equivalent to p | lit(c).

ATTR(p)

p | r

Equivalent to p | lit(r).

ATTR(p)

p1 || p2

Matches iff p1 matches and p2 matches, regardless of the order they match in.

boost::parser::tuple<ATTR(p1), ATTR(p2)>

|| is associative; p1 || p2 || p3, (p1 || p2) || p3, and p1 || (p2 || p3) are all equivalent. It is an error to include an eps (conditional or non-conditional) in an operator|| expression. Though the parsers are matched in any order, the attribute elements are always in the order written in the operator|| expression.

p1 - p2

Equivalent to !p2 >> p1.

ATTR(p1)

p - c

Equivalent to p - lit(c).

ATTR(p)

p - r

Equivalent to p - lit(r).

ATTR(p)

p1 % p2

Equivalent to p1 >> *(p2 >> p1).

std::string if ATTR(p1) is char or char32_t, otherwise std::vector<ATTR(p1)>

p % c

Equivalent to p % lit(c).

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

p % r

Equivalent to p % lit(r).

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

p[a]

Matches iff p matches. If p matches, the semantic action a is executed.

None.


Important

All the character parsers, like char_, cp and cu, produce either char or char32_t attributes. So when you see "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" in the table above, that effectively means that every sequence of character attributes gets turned into a std::string. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use attribute to do so).
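The "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" rule can be modeled at the type level in plain C++17. This is a minimal sketch for illustration only; repetition_attr_t is a hypothetical name and is not part of Boost.Parser.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical model (not Boost.Parser's code) of the attribute produced by
// repeating a parser whose attribute is T, e.g. *p, +p, or repeat(n)[p].
template <class T>
using repetition_attr_t =
    std::conditional_t<std::is_same_v<T, char> || std::is_same_v<T, char32_t>,
                       std::string,      // sequences of characters become strings
                       std::vector<T>>;  // anything else becomes a vector

// char (non-Unicode char_, cu) and char32_t (Unicode char_, cp) both give std::string.
static_assert(std::is_same_v<repetition_attr_t<char>, std::string>);
static_assert(std::is_same_v<repetition_attr_t<char32_t>, std::string>);
// Other character types are not special-cased, so they give a vector.
static_assert(std::is_same_v<repetition_attr_t<wchar_t>, std::vector<wchar_t>>);
static_assert(std::is_same_v<repetition_attr_t<int>, std::vector<int>>);
```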

There are a couple of special rules not captured in the table above:

First, the zero-or-more and one-or-more repetitions (operator*() and operator+(), respectively) may collapse when combined. For any parser p, +(+p) collapses to +p; **p, *+p, and +*p each collapse to just *p.
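To make the collapsing rules concrete, here is a small type-level model in C++17. The tag types and the apply_star/apply_plus helpers are hypothetical names invented for this sketch; Boost.Parser's own operator*() and operator+() are implemented differently.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for *p and +p.
template <class P> struct zero_or_more {};
template <class P> struct one_or_more {};

// apply_star<P>::type models applying operator*() to P.
template <class P> struct apply_star { using type = zero_or_more<P>; };
template <class P> struct apply_star<zero_or_more<P>> { using type = zero_or_more<P>; }; // **p -> *p
template <class P> struct apply_star<one_or_more<P>> { using type = zero_or_more<P>; };  // *+p -> *p

// apply_plus<P>::type models applying operator+() to P.
template <class P> struct apply_plus { using type = one_or_more<P>; };
template <class P> struct apply_plus<one_or_more<P>> { using type = one_or_more<P>; };   // +(+p) -> +p
template <class P> struct apply_plus<zero_or_more<P>> { using type = zero_or_more<P>; }; // +*p -> *p

template <class P> using star_t = typename apply_star<P>::type;
template <class P> using plus_t = typename apply_plus<P>::type;

struct some_parser {}; // stand-in for an arbitrary parser p

static_assert(std::is_same_v<plus_t<plus_t<some_parser>>, one_or_more<some_parser>>);
static_assert(std::is_same_v<star_t<star_t<some_parser>>, zero_or_more<some_parser>>);
static_assert(std::is_same_v<star_t<plus_t<some_parser>>, zero_or_more<some_parser>>);
static_assert(std::is_same_v<plus_t<star_t<some_parser>>, zero_or_more<some_parser>>);
```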

Second, using eps in an alternative parser as any alternative except the last one is a common source of errors, so Boost.Parser disallows it. For any parser p, eps | p is equivalent to eps, because eps always matches and p can never be reached. This restriction does not apply to eps parameterized with a condition: for any condition cond, eps(cond) may appear anywhere within an alternative parser.

Note

When looking at Boost.Parser parsers in a debugger, or when looking at their reference documentation, you may see reference to the template parser_interface. This template exists to provide the operator overloads described above. It allows the parsers themselves to be very simple — most parsers are just a struct with two member functions. parser_interface is essentially invisible when using Boost.Parser, and you should never have to name this template in your own code.

So far, we've seen several different types of attributes that come from different parsers, int for int_, boost::parser::tuple<char, int> for boost::parser::char_ >> boost::parser::int_, etc. Let's get into how this works with more rigor.

Note

Some parsers have no attribute at all. In the tables below, the type of the attribute is listed as "None." Each parser that lacks an attribute still returns a non-void type. This keeps the logic simple; having to handle both cases, void and non-void, would make the library significantly more complicated. The type of this non-void attribute is an implementation detail: it comes from the boost::parser::detail namespace, is of no use on its own, and you should never see it in practice. Within semantic actions, asking for the attribute of a non-attribute-producing parser (using _attr(ctx)) yields a value of the special type boost::parser::none. When you call a form of parse() that returns the parsed attribute and the parser has no attribute, parse() simply returns bool, indicating the success or failure of the parse.

Warning

Boost.Parser assumes that all attributes are semi-regular (see std::semiregular). Within the Boost.Parser code, attributes are assigned, moved, copy constructed, and default constructed. There is no support for move-only or non-default-constructible types.

The attribute type trait, attribute

You can use attribute (and the associated alias, attribute_t) to determine the attribute a parser would have if it were passed to parse(). Since at least one parser (char_) has a polymorphic attribute type, attribute also takes the type of the range being parsed. If a parser produces no attribute, attribute will produce none, not void.

If you want to feed an iterator/sentinel pair to attribute, create a range from it like so:

constexpr auto parser = /* ... */;
auto first = /* ... */;
auto const last = /* ... */;

namespace bp = boost::parser;
// You can of course use std::ranges::subrange directly in C++20 and later.
using attr_type = bp::attribute_t<decltype(BOOST_PARSER_SUBRANGE(first, last)), decltype(parser)>;

No parser has a single, fixed attribute type, since any parser can be placed within omit[], which makes its attribute type none. Therefore, attribute cannot tell you what attribute your parser will produce under all circumstances; it only tells you what the parser would produce if it were passed to parse().

Parser attributes  解析属性

This table summarizes the attributes generated for all Boost.Parser parsers. In the table below:

  • RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses); and
  • x and y represent arbitrary objects.

Table 26.8. Parsers and Their Attributes

Parser       Attribute Type                        Notes
eps          None.
eol          None.
eoi          None.
attr(x)      decltype(RESOLVE(x))
char_        The code point type in Unicode        Includes all the _p UDLs that take a single
             parsing, or char in non-Unicode       character, and all character class parsers
             parsing; see below.                   like control and lower.
cp           char32_t
cu           char
lit(x)       None.                                 Includes all the _l UDLs.
string(x)    std::string                           Includes all the _p UDLs that take a string.
bool_        bool
bin          unsigned int
oct          unsigned int
hex          unsigned int
ushort_      unsigned short
uint_        unsigned int
ulong_       unsigned long
ulong_long   unsigned long long
short_       short
int_         int
long_        long
long_long    long long
float_       float
double_      double
symbols<T>   T


char_ is a bit odd, since its attribute type is polymorphic. When you use char_ to parse text in the non-Unicode code path (i.e. a string of char), the attribute is char. When you use the exact same char_ to parse in the Unicode-aware code path, all matching is code point based, and so the attribute type is the type used to represent code points, char32_t. All parsing of UTF-8 falls under this case.

Here, we're parsing plain chars, so the parse takes the non-Unicode code path and the attribute of char_ is char:

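// Only the attribute type matters here; at run time this parse fails, because
// a single char_ cannot match all of "some text".
auto result = parse("some text", boost::parser::char_);
static_assert(std::is_same_v<decltype(result), std::optional<char>>);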

When you parse UTF-8, the matching is done on a code point basis, so the attribute type is char32_t:

auto result = parse("some text" | boost::parser::as_utf8, boost::parser::char_);
static_assert(std::is_same_v<decltype(result), std::optional<char32_t>>);

The good news is that you usually don't parse characters individually. When you parse with char_, you usually parse repetitions of them, which produce a std::string regardless of whether you're in Unicode parsing mode. If you do need to parse individual characters and want to lock down their attribute type, you can use cp and/or cu to enforce a non-polymorphic attribute type.

Combining operation attributes

Combining operations of course affect the generation of attributes. In the tables below:

  • m and n are parse arguments that resolve to integral values;
  • pred is a parse predicate;
  • arg0, arg1, arg2, ... are parse arguments;
  • a is a semantic action; and
  • p, p1, p2, ... are parsers that generate attributes.

Table 26.9. Combining Operations and Their Attributes

Parser                                Attribute Type
!p                                    None.
&p                                    None.
*p                                    std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
+p                                    std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
+*p                                   std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
*+p                                   std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
-p                                    std::optional<ATTR(p)>
p1 >> p2                              boost::parser::tuple<ATTR(p1), ATTR(p2)>
p1 > p2                               boost::parser::tuple<ATTR(p1), ATTR(p2)>
p1 >> p2 >> p3                        boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>
p1 > p2 >> p3                         boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>
p1 >> p2 > p3                         boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>
p1 > p2 > p3                          boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>
p1 | p2                               std::variant<ATTR(p1), ATTR(p2)>
p1 | p2 | p3                          std::variant<ATTR(p1), ATTR(p2), ATTR(p3)>
p1 || p2                              boost::parser::tuple<ATTR(p1), ATTR(p2)>
p1 || p2 || p3                        boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>
p1 % p2                               std::string if ATTR(p1) is char or char32_t, otherwise std::vector<ATTR(p1)>
p[a]                                  None.
repeat(arg0)[p]                       std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
repeat(arg0, arg1)[p]                 std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
if_(pred)[p]                          std::optional<ATTR(p)>
switch_(arg0)(arg1, p1)(arg2, p2)...  std::variant<ATTR(p1), ATTR(p2), ...>


Important

All the character parsers, like char_, cp and cu, produce either char or char32_t attributes. So when you see "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" in the table above, that effectively means that every sequence of character attributes gets turned into a std::string. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use attribute to do so).

Important

In case you did not notice it above, adding a semantic action to a parser erases the parser's attribute. The attribute is still available inside the semantic action as _attr(ctx).

There are a relatively small number of rules that define how sequence parsers and alternative parsers' attributes are generated. (Don't worry, there are examples below.)

Sequence parser attribute rules

The attribute generation behavior of sequence parsers is conceptually pretty simple:

  • the attributes of subparsers form a tuple of values;
  • subparsers that do not generate attributes do not contribute to the sequence's attribute;
  • subparsers that do generate attributes usually contribute an individual element to the tuple result; except
  • when containers of the same element type are next to each other, or individual elements are next to containers of their type, the two adjacent attributes collapse into one attribute; and
  • if the result of all that is a degenerate tuple boost::parser::tuple<T> (even if T is a type that means "no attribute"), the attribute becomes T.

More formally, the attribute generation algorithm works like this. For a sequence parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an.

We get the attribute of p by evaluating a compile-time left fold operation, left-fold({a1, a2, ..., an}, tuple<a0>, OP). OP is the combining operation that takes the current attribute type (initially boost::parser::tuple<a0>) and the next attribute type, and returns the new current attribute type. The current attribute type at the end of the fold operation is the attribute type for p.

OP attempts to apply a series of rules, one at a time. The rules are noted as X >> Y -> Z, where X is the type of the current attribute, Y is the type of the next attribute, and Z is the new current attribute type. In these rules, C<T> is a container of T; none is a special type that indicates that there is no attribute; T is a type; CHAR is a character type, either char or char32_t; and Ts... is a parameter pack of one or more types. Note that T may be the special type none. The current attribute is always a tuple (call it Tup), so the "current attribute X" refers to the last element of Tup, not Tup itself, except for those rules that explicitly mention boost::parser::tuple<> as part of X's type.

The rules that combine containers with (possibly optional) adjacent values (e.g. C<T> >> optional<T> -> C<T>) have a special case for strings. If C<T> is exactly std::string, and T is either char or char32_t, the combination yields a std::string.

Again, if the final result is that the attribute is boost::parser::tuple<T>, the attribute becomes T.
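The fold can be sketched at compile time in standard C++17. This is a simplified, hypothetical model for illustration only: it covers a subset of the rules (dropping none, a container absorbing an adjacent element or optional element for std::vector, the std::string special case, and unwrapping a one-element tuple), uses std::tuple in place of boost::parser::tuple, and is not how Boost.Parser implements attribute generation. Two adjacent containers are left uncombined, matching the *cu >> string("str") row of Table 26.10.

```cpp
#include <optional>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

struct none {}; // local stand-in for "no attribute"

// Fold state: Done holds finished tuple elements; Last is the current last element.
template <class Done, class Last> struct state {};

template <class S, class Next> struct step;
// Default: Next starts a new tuple element.
template <class... D, class L, class N>
struct step<state<std::tuple<D...>, L>, N> { using type = state<std::tuple<D..., L>, N>; };
// T >> none -> T
template <class... D, class L>
struct step<state<std::tuple<D...>, L>, none> { using type = state<std::tuple<D...>, L>; };
// tuple<none> >> T -> tuple<T>
template <class... D, class N>
struct step<state<std::tuple<D...>, none>, N> { using type = state<std::tuple<D...>, N>; };
template <class... D>
struct step<state<std::tuple<D...>, none>, none> { using type = state<std::tuple<D...>, none>; };
// C<T> >> T -> C<T>  and  T >> C<T> -> C<T>
template <class... D, class T>
struct step<state<std::tuple<D...>, std::vector<T>>, T> { using type = state<std::tuple<D...>, std::vector<T>>; };
template <class... D, class T>
struct step<state<std::tuple<D...>, T>, std::vector<T>> { using type = state<std::tuple<D...>, std::vector<T>>; };
// C<T> >> optional<T> -> C<T>  and  optional<T> >> C<T> -> C<T>
template <class... D, class T>
struct step<state<std::tuple<D...>, std::vector<T>>, std::optional<T>> { using type = state<std::tuple<D...>, std::vector<T>>; };
template <class... D, class T>
struct step<state<std::tuple<D...>, std::optional<T>>, std::vector<T>> { using type = state<std::tuple<D...>, std::vector<T>>; };
// String special case: std::string absorbs an adjacent char or char32_t.
template <class... D>
struct step<state<std::tuple<D...>, std::string>, char> { using type = state<std::tuple<D...>, std::string>; };
template <class... D>
struct step<state<std::tuple<D...>, std::string>, char32_t> { using type = state<std::tuple<D...>, std::string>; };
template <class... D>
struct step<state<std::tuple<D...>, char>, std::string> { using type = state<std::tuple<D...>, std::string>; };
template <class... D>
struct step<state<std::tuple<D...>, char32_t>, std::string> { using type = state<std::tuple<D...>, std::string>; };

// The left fold over all subparser attributes.
template <class S, class... As> struct fold { using type = S; };
template <class S, class A, class... As>
struct fold<S, A, As...> : fold<typename step<S, A>::type, As...> {};

template <class S> struct finish;
template <class... D, class L>
struct finish<state<std::tuple<D...>, L>> { using type = std::tuple<D..., L>; };
// A degenerate tuple<T> becomes plain T (including T = none).
template <class L>
struct finish<state<std::tuple<>, L>> { using type = L; };

template <class... As>
using seq_attr_t = typename finish<typename fold<state<std::tuple<>, none>, As...>::type>::type;

// employee_parser: lit, '{', int_, ',', quoted_string, ',', quoted_string, ',', double_, '}'
static_assert(std::is_same_v<
    seq_attr_t<none, none, int, none, std::string, none, std::string, none, double, none>,
    std::tuple<int, std::string, std::string, double>>);
// +char_('a') >> ' ' >> cp, parsing plain chars
static_assert(std::is_same_v<seq_attr_t<std::string, none, char32_t>, std::string>);
// p >> *p and *p >> -p
static_assert(std::is_same_v<seq_attr_t<int, std::vector<int>>, std::vector<int>>);
static_assert(std::is_same_v<seq_attr_t<std::vector<int>, std::optional<int>>, std::vector<int>>);
// *cu >> string("str") and eps >> eps
static_assert(std::is_same_v<seq_attr_t<std::string, std::string>, std::tuple<std::string, std::string>>);
static_assert(std::is_same_v<seq_attr_t<none, none>, none>);
```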

Note

What constitutes a container in the rules above is determined by the container concept:

template<typename T>
concept container = std::ranges::common_range<T> && requires(T t) {
    { t.insert(t.begin(), *t.begin()) }
        -> std::same_as<std::ranges::iterator_t<T>>;
};

Alternative parser attribute rules

The rules for alternative parsers are much simpler. For an alternative parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an. The attribute of p is std::variant<a0, a1, a2, ..., an>, with the following steps applied:

  • all the none attributes are left out, and if any are, the attribute is wrapped in a std::optional, like std::optional<std::variant</*...*/>>;
  • duplicates in the std::variant template parameters <T1, T2, ... Tn> are removed; every type that appears does so exactly once;
  • if the attribute is std::variant<T> or std::optional<std::variant<T>>, the attribute becomes T or std::optional<T>, respectively; and
  • if the attribute is std::variant<> or std::optional<std::variant<>>, the result becomes none instead.

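These steps can be modeled at compile time in standard C++17. This is a hypothetical sketch for illustration: alt_attr_t and its helper templates are invented names, none is a local stand-in for boost::parser::none, and this is not Boost.Parser's implementation.

```cpp
#include <optional>
#include <type_traits>
#include <variant>

struct none {}; // local stand-in for "no attribute"

template <class... Ts> struct type_list {};

template <class T, class... Ts>
inline constexpr bool contains_v = (std::is_same_v<T, Ts> || ...);

// Walks the subparser attributes, dropping none (but remembering that one was
// seen) and skipping duplicates so that every type appears exactly once.
template <class Acc, bool SawNone, class... Rest> struct collect;
template <class... Acc, bool SawNone>
struct collect<type_list<Acc...>, SawNone> {
    using types = type_list<Acc...>;
    static constexpr bool saw_none = SawNone;
};
template <class... Acc, bool SawNone, class T, class... Rest>
struct collect<type_list<Acc...>, SawNone, T, Rest...>
    : std::conditional_t<
          std::is_same_v<T, none>, collect<type_list<Acc...>, true, Rest...>,
          std::conditional_t<contains_v<T, Acc...>,
                             collect<type_list<Acc...>, SawNone, Rest...>,
                             collect<type_list<Acc..., T>, SawNone, Rest...>>> {};

// variant<> -> none, variant<T> -> T, otherwise variant<Ts...>.
template <class L> struct to_variant;
template <class... Ts> struct to_variant<type_list<Ts...>> { using type = std::variant<Ts...>; };
template <class T> struct to_variant<type_list<T>> { using type = T; };
template <> struct to_variant<type_list<>> { using type = none; };

template <class... As>
struct alt_attr {
    using collected = collect<type_list<>, false, As...>;
    using inner = typename to_variant<typename collected::types>::type;
    // Wrap in std::optional if any alternative had no attribute, unless nothing is left.
    using type = std::conditional_t<collected::saw_none && !std::is_same_v<inner, none>,
                                    std::optional<inner>, inner>;
};
template <class... As> using alt_attr_t = typename alt_attr<As...>::type;

static_assert(std::is_same_v<alt_attr_t<int, double>, std::variant<int, double>>); // p1 | p2
static_assert(std::is_same_v<alt_attr_t<int, int>, int>);                          // p | p
static_assert(std::is_same_v<alt_attr_t<int, none>, std::optional<int>>);          // p | eps
static_assert(std::is_same_v<alt_attr_t<int, none, double>,
                             std::optional<std::variant<int, double>>>);           // p1 | p2[a] | p3
static_assert(std::is_same_v<alt_attr_t<none, none>, none>);                       // !p1 | p2[a]
```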
Formation of containers in attributes

The rule for forming containers from non-containers is simple. You get a vector from any of the repeating parsers, like +p, *p, repeat(3)[p], etc. The value type of the vector is ATTR(p).

Another rule for sequence containers is that a value x and a container c containing elements of x's type will form a single container. However, x's type must be exactly the same as the elements in c. There is an exception to this in the special case for strings and characters noted above. For instance, consider the attribute of char_ >> string("str"). In the non-Unicode code path, char_'s attribute type is guaranteed to be char, so ATTR(char_ >> string("str")) is std::string. If you are parsing UTF-8 in the Unicode code path, char_'s attribute type is char32_t, and the special rule makes it also produce a std::string. Otherwise, the attribute for ATTR(char_ >> string("str")) would be boost::parser::tuple<char32_t, std::string>.

Again, there are no special rules for combining values and containers. Every combination either results from an exact type match or falls into the string+character special case.

Another special case: std::string assignment

std::string can be assigned from a char. This is dumb. But, we're stuck with it. When you write a parser with a char attribute, and you try to parse it into a std::string, you've almost certainly made a mistake. More importantly, if you write this:

namespace bp = boost::parser;
std::string result;
auto b = bp::parse("3", bp::int_, bp::ws, result);

... you are even more likely to have made a mistake. Though this would otherwise work, because the assignment in std::string s; s = 3; is well-formed, Boost.Parser forbids it. If you write parsing code like the snippet above, you will get a static assertion. If you really do want to assign an int, a float, or anything else to a std::string, do it in a semantic action.
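Why this is almost always a bug can be seen in plain standard C++, with no Boost.Parser involved: assigning an integer to a std::string selects the operator=(char) overload. A small sketch; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Compiles, because std::string has operator=(char): the int 3 is converted to
// the character with code 3, not to the text "3".
std::string assign_int_to_string() {
    std::string s;
    s = 3;
    return s;
}

// What was almost certainly intended: a textual conversion.
std::string convert_int_to_string() {
    return std::to_string(3);
}
```

The first function returns a one-character string holding the control character '\x03'; only the second returns "3".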

Examples of attributes generated by sequence and alternative parsers

In the table: a is a semantic action; and p, p1, p2, ... are parsers that generate attributes. Note that only >> is used here; > has the exact same attribute generation rules.

Table 26.10. Sequence and Alternative Combining Operations and Their Attributes

Expression             Attribute Type
eps >> eps             None.
p >> eps               ATTR(p)
eps >> p               ATTR(p)
cu >> string("str")    std::string
string("str") >> cu    std::string
*cu >> string("str")   boost::parser::tuple<std::string, std::string>
string("str") >> *cu   boost::parser::tuple<std::string, std::string>
p >> p                 boost::parser::tuple<ATTR(p), ATTR(p)>
*p >> p                std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
p >> *p                std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
*p >> -p               std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
-p >> *p               std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>
string("str") >> -cu   std::string
-cu >> string("str")   std::string
!p1 | p2[a]            None.
p | p                  ATTR(p)
p1 | p2                std::variant<ATTR(p1), ATTR(p2)>
p | eps                std::optional<ATTR(p)>
p1 | p2 | eps          std::optional<std::variant<ATTR(p1), ATTR(p2)>>
p1 | p2[a] | p3        std::optional<std::variant<ATTR(p1), ATTR(p3)>>


Controlling attribute generation with merge[] and separate[]

As we saw in the previous Parsing into structs and classes section, if you parse two strings in a row, you get two separate strings in the resulting attribute. The parser from that example was this:

namespace bp = boost::parser;
auto employee_parser = bp::lit("employee")
    >> '{'
    >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_
    >> '}';

employee_parser's attribute is boost::parser::tuple<int, std::string, std::string, double>. The two quoted_string parsers produce std::string attributes, and those attributes are not combined. That is the default behavior, and it is just what we want for this case; we don't want the first and last name fields to be jammed together such that we can't tell where one name ends and the other begins. What if we were parsing some string that consisted of a prefix and a suffix, and the prefix and suffix were defined separately for reuse elsewhere?

namespace bp = boost::parser;
auto prefix = /* ... */;
auto suffix = /* ... */;
auto special_string = prefix >> suffix;
// Continue to use prefix and suffix to make other parsers....

In this case, we might want to use these separate parsers, but want special_string to produce a single std::string for its attribute. merge[] exists for this purpose.

namespace bp = boost::parser;
auto prefix = /* ... */;
auto suffix = /* ... */;
auto special_string = bp::merge[prefix >> suffix];

merge[] only applies to sequence parsers (like p1 >> p2), and forces all subparsers in the sequence parser to use the same variable for their attribute.

Another directive, separate[], also applies only to sequence parsers, but does the opposite of merge[]. It forces all the attributes produced by the subparsers of the sequence parser to stay separate, even if they would otherwise have combined. For instance, consider this parser.

namespace bp = boost::parser;
auto string_and_char = +bp::char_('a') >> ' ' >> bp::cp;

string_and_char matches one or more 'a's, followed by a space and then one more character of any kind. As written above, string_and_char produces a std::string, and the final character is appended to the string after all the 'a's; the space is matched as a literal and contributes no attribute. However, if you wanted to store the final character as a separate value, you would use separate[].

namespace bp = boost::parser;
auto string_and_char = bp::separate[+bp::char_('a') >> ' ' >> bp::cp];

With this change, string_and_char produces the attribute boost::parser::tuple<std::string, char32_t>.

merge[] and separate[] in more detail

As mentioned previously, merge[] applies only to sequence parsers. All subparsers must have the same attribute, or produce no attribute at all. At least one subparser must produce an attribute. When you use merge[], you create a combining group. Every parser in a combining group uses the same variable for its attribute. No parser in a combining group interacts with the attributes of any parsers outside of its combining group. Combining groups are disjoint; merge[/*...*/] >> merge[/*...*/] will produce a tuple of two attributes, not one.

separate[] also applies only to sequence parsers. When you use separate[], you disable interaction of all the subparsers' attributes with adjacent attributes, whether they are inside or outside the separate[] directive; you force each subparser to have a separate attribute.

The rules for merge[] and separate[] overrule the steps of the algorithm described above for combining the attributes of a sequence parser. Consider an example.

namespace bp = boost::parser;
constexpr auto parser =
    bp::char_ >> bp::merge[(bp::string("abc") >> bp::char_ >> bp::char_) >> bp::string("ghi")];

You might think that ATTR(parser) would be bp::tuple<char, std::string>. It is not. The parser above does not even compile. Since we created a merge group above, we disabled the default behavior in which the char_ parsers would have collapsed into the string parser that preceded them. Since they are all treated as separate entities, and since they have different attribute types, the use of merge[] is an error.

Many directives create a new parser out of the parser they are given. merge[] and separate[] do not. Since they operate only on sequence parsers, all they do is create a copy of the sequence parser they are given. The seq_parser template has a template parameter CombiningGroups, and all merge[] and separate[] do is take a given seq_parser and create a copy of it with a different CombiningGroups template parameter. This means that merge[] and separate[] can be ignored in operator>> expressions, much like parentheses. Consider an example.

namespace bp = boost::parser;
constexpr auto parser1 = bp::separate[bp::int_ >> bp::int_] >> bp::int_;
constexpr auto parser2 = bp::lexeme[bp::int_ >> ' ' >> bp::int_] >> bp::int_;

Note that separate[] is a no-op here; it is used this way only for the sake of the example. These parsers have different attribute types. ATTR(parser1) is boost::parser::tuple<int, int, int>. ATTR(parser2) is boost::parser::tuple<boost::parser::tuple<int, int>, int>. This is because bp::lexeme[] wraps its given parser in a new parser, while separate[] (like merge[]) does not. That's why, even though parser1 and parser2 look structurally similar, they have different attributes.

transform(f)[]

transform(f)[] is a directive that transforms the attribute of a parser using the given function f. For example:

auto str_sum = [&](std::string const & s) {
    int retval = 0;
    for (auto ch : s) {
        retval += ch - '0';
    }
    return retval;
};

namespace bp = boost::parser;
constexpr auto parser = +bp::char_;
std::string str = "012345";

auto result = bp::parse(str, bp::transform(str_sum)[parser]);
assert(result);
assert(*result == 15);
static_assert(std::is_same_v<decltype(result), std::optional<int>>);

Here, we use the function str_sum as f. It assumes each character in the given std::string s is a digit, and returns the sum of all the digits in s. Our parser parser would normally produce a std::string. However, since str_sum returns a different type, int, that is the attribute type of the full parser, bp::transform(str_sum)[parser], as you can see from the static_assert.

As is the case with attributes all throughout Boost.Parser, the attribute passed to f will be moved. You can take it by const &, &&, or by value.
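Because the attribute arrives as an rvalue, any of these parameter styles will bind to it. A standard-C++ sketch; the digit_sum_* functions and call_with_moved_attribute are hypothetical, and call_with_moved_attribute only models how f receives the moved attribute, not how transform(f)[] is implemented.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical transform functions that differ only in how they take the attribute.
int digit_sum_by_value(std::string s) {
    int sum = 0;
    for (char ch : s) sum += ch - '0';
    return sum;
}
int digit_sum_by_cref(std::string const & s) { return digit_sum_by_value(s); }
int digit_sum_by_rref(std::string && s) { return digit_sum_by_value(std::move(s)); }

// Passes the attribute to f as an rvalue; an rvalue binds to all three styles.
template <class F>
int call_with_moved_attribute(F f, std::string attribute) {
    return f(std::move(attribute));
}
```

A function taking std::string & (non-const lvalue reference) would not bind, which is why it is absent from the list above.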

No distinction is made between parsers with and without an attribute, because there is a Regular special no-attribute type that is generated by parsers with no attribute. You may therefore write something like transform(f)[eps], and Boost.Parser will happily call f with this special no-attribute type.

Other directives that affect attribute generation

omit[p] disables attribute generation for the parser p. raw[p] changes the attribute from ATTR(p) to a view that indicates the subrange of the input that was matched by p. string_view[p] is just like raw[p], except that it produces std::basic_string_views. See Directives for details.

There are multiple top-level parse functions. They have some things in common:

  • They each return a value contextually convertible to bool.
  • They each take at least a range to parse and a parser. The "range to parse" may be an iterator/sentinel pair or a single range object.
  • They each require forward iterability of the range to parse.
  • They each accept any range with a character element type. This means that they can each parse ranges of char, wchar_t, char8_t, char16_t, or char32_t.
  • The overloads with prefix_ in their name take an iterator/sentinel pair. For example, prefix_parse(first, last, p, ws) parses the range [first, last), advancing first as it goes. If the parse succeeds, the entire input may or may not have been matched. After the call, first indicates the position just past the last input that p matched. The whole input was matched if and only if first == last after the call to prefix_parse().
  • When you call any of the range overloads of parse(), for example parse(r, p, ws), parse() only indicates success if all of r was matched by p.
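The prefix/range distinction above can be sketched in plain C++ without Boost.Parser. The toy digit matcher below is hypothetical: toy_prefix_parse_digits and toy_parse_digits are not library functions. It only mirrors the documented contract. The prefix version advances first and may succeed on a partial match, while the range version succeeds only when first == last afterwards.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical toy parser: matches one or more ASCII digits and produces their value.
// Like the prefix_ overloads, it takes `first` by reference and advances it on success.
template <typename Iter, typename Sentinel>
std::optional<long> toy_prefix_parse_digits(Iter & first, Sentinel last)
{
    Iter it = first;
    long value = 0;
    bool matched = false;
    while (it != last && std::isdigit(static_cast<unsigned char>(*it))) {
        value = value * 10 + (*it - '0');
        ++it;
        matched = true;
    }
    if (!matched)
        return std::nullopt; // In this toy version, failure leaves `first` untouched.
    first = it;              // Success: `first` now points just past the match.
    return value;
}

// Like the range overloads: succeeds only if the whole input is matched.
inline std::optional<long> toy_parse_digits(std::string const & r)
{
    auto first = r.begin();
    auto result = toy_prefix_parse_digits(first, r.end());
    if (!result || first != r.end())
        return std::nullopt;
    return result;
}
```

On "123abc", the prefix version succeeds and leaves first three characters in. The range version rejects the same input because "abc" is left over.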
[Note] Note

wchar_t is an accepted value type for the input. Please note that this is interpreted as UTF-16 on MSVC, and UTF-32 everywhere else.

The overloads

There are eight overloads of parse() and prefix_parse() combined, because there are three independent either/or choices in how you call them (2 × 2 × 2 = 8).

Iterator/sentinel versus range

You can call prefix_parse() with an iterator and sentinel that delimit a range of character values. For example:

namespace bp = boost::parser;
auto const p = /* some parser ... */;

char const * str_1 = /* ... */;
// Using null_sentinel, str_1 can point to three billion characters, and
// we can call prefix_parse() without having to find the end of the string first.
auto result_1 = bp::prefix_parse(str_1, bp::null_sentinel, p, bp::ws);

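char str_2[] = /* ... */;
// prefix_parse() advances its first argument, so pass it an lvalue iterator.
auto first_2 = std::begin(str_2);
auto result_2 = bp::prefix_parse(first_2, std::end(str_2), p, bp::ws);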

The iterator/sentinel overloads can parse successfully without matching the entire input. You can tell if the entire input was matched by checking if first == last is true after prefix_parse() returns.

By contrast, you call parse() with a range of character values. When the range is a reference to an array of characters, any terminating 0 is ignored; this allows calls like parse("str", p) to work naturally.
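Here is a minimal sketch of the array rule. It uses a hypothetical helper, as_parse_range(), which is not part of Boost.Parser. When the last array element is a terminating 0, the helper simply leaves it out of the range.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: view a character array as the range to parse,
// dropping a trailing 0 if there is one (arrays without one are kept whole).
template <typename CharT, std::size_t N>
constexpr std::basic_string_view<CharT> as_parse_range(CharT const (&arr)[N])
{
    std::size_t const len = (N > 0 && arr[N - 1] == CharT(0)) ? N - 1 : N;
    return std::basic_string_view<CharT>(arr, len);
}
```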

namespace bp = boost::parser;
auto const p = /* some parser ... */;

std::u8string str_1 = u8"str";
auto result_1 = bp::parse(str_1, p, bp::ws);

// The null terminator is ignored.  This call parses s-t-r, not s-t-r-0.
auto result_2 = bp::parse(U"str", p, bp::ws);

char const * str_3 = "str";
auto result_3 = bp::parse(bp::null_term(str_3) | bp::as_utf16, p, bp::ws);

Since a range overload has no way to report that p matched only a prefix of the input, the range (non-iterator/sentinel) overloads of parse() indicate failure if the entire input is not matched.

With or without an attribute out-parameter
namespace bp = boost::parser;
auto const p = '"' >> *(bp::char_ - '"') >> '"';
char const * str = "\"two words\"" ;

std::string result_1;
bool const success = bp::parse(str, p, result_1);   // success is true; result_1 is "two words"
auto result_2 = bp::parse(str, p);                  // !!result_2 is true; *result_2 is "two words"

When you call parse() with an attribute out-parameter and parser p, the out-parameter's expected type is something like ATTR(p). It doesn't have to be exactly that; I'll explain in a bit. The return type is bool.

When you call parse() without an attribute out-parameter and parser p, the return type is std::optional<ATTR(p)>. Note that when ATTR(p) is itself an optional, the return type is std::optional<std::optional<...>>. Each of those optionals tells you something different. The outer one tells you whether the parse succeeded. If it did, the contained value is the attribute generated by p, which is itself an optional; that is the inner one.
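The two levels of optional can be read as in this sketch. Here result_t stands in for the return type of parse() when ATTR(p) is std::optional<int>, and describe() is a hypothetical helper, not part of Boost.Parser.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-in for std::optional<ATTR(p)> when ATTR(p) is std::optional<int>.
using result_t = std::optional<std::optional<int>>;

// Hypothetical helper: interpret the outer and inner optionals separately.
inline std::string describe(result_t const & r)
{
    if (!r)
        return "parse failed";                 // outer optional empty
    if (!*r)
        return "parse succeeded, no integer";  // inner optional empty
    return "parse succeeded, integer " + std::to_string(**r);
}
```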

With or without a skipper
namespace bp = boost::parser;
auto const p = '"' >> *(bp::char_ - '"') >> '"';
char const * str = "\"two words\"" ;

auto result_1 = bp::parse(str, p);         // !!result_1 is true; *result_1 is "two words"
auto result_2 = bp::parse(str, p, bp::ws); // !!result_2 is true; *result_2 is "twowords"

Compatibility of attribute out-parameters

For any call to parse() that takes an attribute out-parameter, like parse("str", p, bp::ws, out), the call is well-formed for a number of possible types of out; decltype(out) does not need to be exactly ATTR(p).

For instance, this is well-formed code that does not abort (remember that the attribute type of string() is std::string):

namespace bp = boost::parser;
auto const p = bp::string("foo");

std::vector<char> result;
bool const success = bp::parse("foo", p, result);
assert(success && result == std::vector<char>({'f', 'o', 'o'}));

Even though p generates a std::string attribute, when it actually takes the data it generates and writes it into an attribute, it only assumes that the attribute is a container (see Concepts), not that it is some particular container type. It will happily insert() into a std::string or a std::vector<char> all the same. std::string and std::vector<char> are both containers of char, but it will also insert into a container with a different element type. p just needs to be able to insert the elements it produces into the attribute-container. As long as an implicit conversion allows that to work, everything is fine:

namespace bp = boost::parser;
auto const p = bp::string("foo");

std::deque<int> result;
bool const success = bp::parse("foo", p, result);
assert(success && result == std::deque<int>({'f', 'o', 'o'}));
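The insertion step described above boils down to an ordinary container insert. This sketch uses a hypothetical append_generated() helper rather than Boost.Parser's own code. It shows that the same call works for std::string and, through the implicit char-to-int conversion, for std::deque<int>.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical helper: insert generated characters at the end of any
// container whose insert() accepts them, directly or via implicit conversion.
template <typename Container>
void append_generated(Container & out, std::string const & generated)
{
    out.insert(out.end(), generated.begin(), generated.end());
}
```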

This works, too, even though it requires inserting elements from a generated sequence of char32_t into a container of char (remember that the attribute type of +cp is std::vector<char32_t>):

namespace bp = boost::parser;
auto const p = +bp::cp;

std::string result;
bool const success = bp::parse("foo", p, result);
assert(success && result == "foo");

This next example works as well, even though the change to a container is not at the top level. It is an element of the result tuple:

namespace bp = boost::parser;
auto const p = +(bp::cp - ' ') >> ' ' >> bp::string("foo");

using attr_type = decltype(bp::parse(u8"", p));
static_assert(std::is_same_v<
              attr_type,
              std::optional<bp::tuple<std::string, std::string>>>);

using namespace bp::literals;

{
    // This is similar to attr_type, with the first std::string changed to a std::vector<int>.
    bp::tuple<std::vector<int>, std::string> result;
    bool const success = bp::parse(u8"rôle foo" | bp::as_utf8, p, result);
    assert(success);
    assert(bp::get(result, 0_c) == std::vector<int>({'r', U'ô', 'l', 'e'}));
    assert(bp::get(result, 1_c) == "foo");
}
{
    // This time, we have a std::vector<char> instead of a std::vector<int>.
    bp::tuple<std::vector<char>, std::string> result;
    bool const success = bp::parse(u8"rôle foo" | bp::as_utf8, p, result);
    assert(success);
    // The 4 code points of "rôle" get transcoded to 5 UTF-8 code units to fit in the std::vector<char>.
    assert(bp::get(result, 0_c) == std::vector<char>({'r', (char)0xc3, (char)0xb4, 'l', 'e'}));
    assert(bp::get(result, 1_c) == "foo");
}

As indicated in the inline comments, there are a couple of things to take away from this example:

  • If you change an attribute out-param (such as std::string to std::vector<int>, or std::vector<char32_t> to std::deque<int>), the call to parse() will often still be well-formed.
  • When changing out a container type, Boost.Parser may transcode the data. This happens if both containers contain character values, the original container's element type is char32_t (or wchar_t for non-MSVC builds), and the new container's element type is char or char8_t. Boost.Parser then assumes that this is a UTF-32-to-UTF-8 conversion, and silently transcodes the data when inserting into the new container.
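The transcoding mentioned in the second point is standard UTF-8 encoding. The encoder below is a minimal sketch. The names to_utf8() and append_utf8() are hypothetical, and the code does no validation of surrogates or out-of-range values. It reproduces the five bytes expected for "rôle" in the example above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Encode one code point as UTF-8. Assumes cp is a valid Unicode scalar value.
inline void append_utf8(std::vector<char> & out, char32_t cp)
{
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Transcode a UTF-32 sequence into a container of char.
inline std::vector<char> to_utf8(std::u32string const & code_points)
{
    std::vector<char> out;
    for (char32_t cp : code_points)
        append_utf8(out, cp);
    return out;
}
```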

Let's look at a case where another simple-seeming type replacement does not work. First, the case that works:

namespace bp = boost::parser;
auto parser = -(bp::char_ % ',');
std::vector<int> result;
auto b = bp::parse("a, b", parser, bp::ws, result);

ATTR(parser) is std::optional<std::string>. Even though we pass a std::vector<int>, everything is fine. However, if we modify this case only slightly, so that the std::optional<std::string> is nested within the attribute, the code becomes ill-formed.

struct S
{
    std::vector<int> chars;
    int i;
};
namespace bp = boost::parser;
auto parser = -(bp::char_ % ',') >> bp::int_;
S result;
auto b = bp::parse("a, b 42", parser, bp::ws, result); // Ill-formed!

If we change chars to a std::vector<char>, the code is still ill-formed. Same if we change chars to a std::string. We must actually use std::optional<std::string> exactly to make the code well-formed again.

The looseness that applies to the top-level parser does not apply to a nested parser. At some point in the code, the parser -(bp::char_ % ',') would try to assign a std::optional<std::string> (the element type of the attribute type it normally generates) to chars. If there is no implicit conversion there, the code is ill-formed.
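The failing assignment can be checked directly with standard type traits. This sketch only restates the text in standard C++. None of the member types the text tries can be assigned from std::optional<std::string>, while std::optional<std::string> itself can.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <vector>

// The nested case needs something like `chars = std::optional<std::string>{...}`.
static_assert(!std::is_assignable_v<std::vector<int> &, std::optional<std::string>>);
static_assert(!std::is_assignable_v<std::vector<char> &, std::optional<std::string>>);
static_assert(!std::is_assignable_v<std::string &, std::optional<std::string>>);
// Only the exact type works.
static_assert(std::is_assignable_v<std::optional<std::string> &, std::optional<std::string>>);
```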

The take-away for this last example is that the ability to arbitrarily swap out data types within the type of the attribute you pass to parse() is very flexible, but is also limited to structurally simple cases. When we discuss rules in the next section, we'll see how this flexibility in the types of attributes can help when writing complicated parsers.

Those were examples of swapping out one container type for another. They make good examples because container swaps are more likely to be surprising, which is why they get so much coverage here. You can also do much simpler things, like parsing with uint_ and writing its attribute into a double. In general, you can swap any type T out of the attribute, as long as the swap would not result in some ill-formed assignment within the parse.

Here is another example that also produces surprising results, for a different reason.

namespace bp = boost::parser;
constexpr auto parser = bp::char_('a') >> bp::char_('b') >> bp::char_('c') |
                        bp::char_('x') >> bp::char_('y') >> bp::char_('z');
std::string str = "abc";
bp::tuple<char, char, char> chars;
bool b = bp::parse(str, parser, chars);
assert(b);
assert(chars == bp::tuple('c', '\0', '\0'));

This looks wrong, but is expected behavior. At every stage of the parse that produces an attribute, Boost.Parser tries to assign that attribute to some part of the out-param attribute provided to parse(), if there is one. Note that ATTR(parser) is std::string, because each sequence parser is three char_ parsers in a row, which forms a std::string; there are two such alternatives, so the overall attribute is also std::string. During the parse, when the first parser bp::char_('a') matches the input, it produces the attribute 'a' and needs to assign it to its destination. Some logic inside the sequence parser indicates that this 'a' contributes to the value in the 0th position in the result tuple, if the result is being written into a tuple. Here, we passed a bp::tuple<char, char, char>, so it writes 'a' into the first element. Each subsequent char_ parser does the same thing, and writes over the first element. If we had passed a std::string as the out-param instead, the logic would have seen that the out-param attribute is a string, and would have appended 'a' to it. Then each subsequent parser would have appended to the string.
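Here is a toy model of the behavior just described. The functions deliver() and run_sequence() are hypothetical and do not reflect how Boost.Parser is implemented. They only mimic the documented outcome. A container out-param gets each character appended, while a tuple out-param has its element 0 overwritten by each character.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Containers: each char_ attribute is appended.
inline void deliver(std::string & out, char c) { out.push_back(c); }

// Tuples: the three chars all belong to position 0 of the result
// (they collapse into one std::string), so each one overwrites element 0.
inline void deliver(std::tuple<char, char, char> & out, char c) { std::get<0>(out) = c; }

// Feed each matched character of one sequence alternative to the out-param.
template <typename Out>
Out run_sequence(char const * input)
{
    Out out{};
    for (char const * p = input; *p; ++p)
        deliver(out, *p);
    return out;
}
```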

Boost.Parser never looks at the arity of the tuple passed to parse() to see if there are too many or too few elements in it, compared to the expected attribute for the parser. In this case, there are two extra elements that are never touched. If there had been too few elements in the tuple, you would have seen a compilation error. The reason that Boost.Parser never does this kind of type-checking up front is that the loose assignment logic is spread out among the individual parsers; the top-level parse can determine what the expected attribute is, but not whether a passed attribute of another type is a suitable stand-in.

Compatibility of variant attribute out-parameters

The use of a variant in an out-param is compatible if the default attribute can be assigned to the variant. No other work is done to make the assignment compatible. For instance, this will work as you'd expect:

namespace bp = boost::parser;
std::variant<int, double> v;
auto b = bp::parse("42", bp::int_, v);
assert(b);
assert(v.index() == 0);
assert(std::get<0>(v) == 42);

Again, this works because v = 42 is well-formed. However, other kinds of substitutions will not work. In particular, the boost::parser::tuple to aggregate or aggregate to boost::parser::tuple transformations will not work. Here's an example.

struct key_value
{
    int key;
    double value;
};

namespace bp = boost::parser;
std::variant<key_value, double> kv_or_d;
key_value kv;
bp::parse("42 13.0", bp::int_ >> bp::double_, kv);      // Ok.
bp::parse("42 13.0", bp::int_ >> bp::double_, kv_or_d); // Error: ill-formed!
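The rule "compatible only if the default attribute is assignable to the variant" can be checked with std::is_assignable. In this sketch, std::tuple stands in for boost::parser::tuple. That substitution is an assumption made only so the example is self-contained.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <variant>

struct key_value
{
    int key;
    double value;
};

// `v = 42` is well-formed, so int_ can write into std::variant<int, double>.
static_assert(std::is_assignable_v<std::variant<int, double> &, int>);

// A tuple does not convert to the aggregate key_value (or to double),
// so assigning a tuple attribute to this variant is ill-formed.
static_assert(!std::is_assignable_v<std::variant<key_value, double> &, std::tuple<int, double>>);
```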

In this case, it would be easy for Boost.Parser to look at the alternative types covered by the variant, and do a conversion. However, there are many cases in which there is no obviously correct variant alternative type, or in which the user might expect one variant alternative type and get another. Consider a couple of cases.

struct i_d { int i; double d; };
struct d_i { double d; int i; };
using v1 = std::variant<i_d, d_i>;

struct i_s { int i; short s; };
struct d_d { double d1; double d2; };
using v2 = std::variant<i_s, d_d>;

using tup_t = boost::parser::tuple<short, short>;

If we have a parser that produces a tup_t, and we have a v1 attribute out-param, the correct variant alternative type clearly does not exist. This case is ambiguous, and anyone can see that neither variant alternative is a better match. If we were assigning a tup_t to v2, it would be even worse. The same ambiguity exists, but to the user, i_s is clearly "closer" than d_d.

So, Boost.Parser only does assignment. If some parser P generates a default attribute that is not assignable to a variant alternative that you want to assign it to, you can just create a rule that creates either an exact variant alternative type, or the variant itself, and use P as your rule's parser.

Unicode versus non-Unicode parsing

A call to parse() either considers the entire input to be in a UTF format (UTF-8, UTF-16, or UTF-32), or it considers the entire input to be in some unknown encoding. Here is how it deduces which case the call falls under:

  • If the range is a sequence of char8_t, or if the input is a boost::parser::utf8_view, the input is UTF-8.
  • Otherwise, if the value type of the range is char, the input is in an unknown encoding.
  • Otherwise, the input is in a UTF encoding.
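The deduction rules can be sketched as a compile-time function of the range's value type. Both deduce_encoding() and input_encoding are hypothetical. The boost::parser::utf8_view special case is not modelled, and the wchar_t branch follows the earlier note by keying on _MSC_VER.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical encoding categories for this sketch.
enum class input_encoding { unknown, utf8, utf16, utf32 };

// Assumes CharT is one of char, wchar_t, char8_t, char16_t, or char32_t.
template <typename CharT>
constexpr input_encoding deduce_encoding()
{
    if constexpr (std::is_same_v<CharT, char>) {
        return input_encoding::unknown;  // plain char: unknown encoding
    } else if constexpr (std::is_same_v<CharT, char16_t>) {
        return input_encoding::utf16;
    } else if constexpr (std::is_same_v<CharT, char32_t>) {
        return input_encoding::utf32;
    } else if constexpr (std::is_same_v<CharT, wchar_t>) {
#if defined(_MSC_VER)
        return input_encoding::utf16;    // per the earlier note: UTF-16 on MSVC
#else
        return input_encoding::utf32;    // and UTF-32 everywhere else
#endif
    } else {
        return input_encoding::utf8;     // the only remaining type: char8_t
    }
}
```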
[Tip] Tip

If you want to parse in ASCII-only mode, or in some other non-Unicode encoding, use only sequences of char, like std::string or char const *.

[Tip] Tip

If you want to ensure all input is parsed as Unicode, pass the input range r as r | boost::parser::as_utf32. That is the first thing that happens to it inside parse() in the Unicode parsing path anyway.

[Note] Note

Since passing boost::parser::utfN_view is a special case, and since a sequence of chars r is otherwise considered an unknown encoding, boost::parser::parse(r | boost::parser::as_utf8, p) treats r as UTF-8, whereas boost::parser::parse(r, p) does not.

The trace_mode parameter to parse()

Debugging parsers is notoriously difficult once they reach a certain size. To get a verbose trace of your parse, pass boost::parser::trace::on as the final parameter to parse(). It will show you the current parser being matched, the next few characters to be parsed, and any attributes generated. See the Error Handling and Debugging section of the tutorial for details.

Globals and error handlers

Each call to parse() can optionally have a globals object associated with it. To use a particular globals object with your parser, call with_globals() to create a new parser that contains the globals object:

struct globals_t
{
    int foo;
    std::string bar;
};
auto const parser = /* ... */;
globals_t globals{42, "yay"};
auto result = boost::parser::parse("str", boost::parser::with_globals(parser, globals));

Every semantic action within that call to parse() can access the same globals_t object using _globals(ctx).

The default error handler is great for most needs, but if you want to change it, you can do so by creating a new parser with a call to with_error_handler():

auto const parser = /* ... */;
my_error_handler error_handler;
auto result = boost::parser::parse("str", boost::parser::with_error_handler(parser, error_handler));

[Tip] Tip

If your parsing environment does not allow you to report errors to a terminal, you may want to use callback_error_handler instead of the default error handler.

[Important] Important

Globals and the error handler are ignored, if present, on any parser except the top-level parser.

In the earlier page about rules (Rule Parsers), I described rules as being analogous to functions. At base, rules are organizational. Here are the common use cases for rules. Use a rule if you want to:

  • fix the attribute type produced by a parser to something other than the default;
  • create a parser that produces useful diagnostic text;
  • create a recursive rule (more on this below);
  • create a set of mutually-recursive parsers;
  • do callback parsing.

Let's look at the use cases in detail.

Fixing the attribute type

We saw in the previous section how parse() is flexible in what types it will accept as attribute out-parameters. Here's another example.

namespace bp = boost::parser;
std::vector<int> result; // Could also be std::set<long long>, among others.
bool const success = bp::parse(input, bp::int_ % ',', result);

result can be one of many different types. It could be std::vector<int>. It could be std::set<long long>. It could be a lot of things. Often, this is a very useful property; if you had to rewrite all of your parser logic because you changed the desired container in some part of your attribute from a std::vector to a std::deque, that would be annoying. However, that flexibility comes at the cost of type checking. If you want to write a parser that always produces exactly a std::vector<unsigned int> and no other type, you also probably want a compilation error if you accidentally pass that parser a std::set<unsigned int> attribute instead. There is no way with a plain parser to enforce that its attribute type may only ever be a single, fixed type.

Fortunately, rules allow you to write a parser that has a fixed attribute type. Every rule has a specific attribute type, provided as a template parameter. If one is not specified, the rule has no attribute. The fact that the attribute is a specific type allows you to remove attribute flexibility. For instance, say we have a rule defined like this:

bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

You can then use it in a call to parse(), and parse() will return a std::optional<std::vector<double>>:

auto const result = bp::parse(input, doubles, bp::ws);

If you call parse() with an attribute out-parameter, it must be exactly std::vector<double>:

std::vector<double> vec_result;
bp::parse(input, doubles, bp::ws, vec_result); // Ok.
std::deque<double> deque_result;
bp::parse(input, doubles, bp::ws, deque_result); // Ill-formed!

If we wanted to use a std::deque<double> as the attribute type of our rule:

// Attribute changed to std::deque<double>.
bp::rule<struct doubles, std::deque<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

int main()
{
    std::deque<double> deque_result;
    bp::parse(input, doubles, bp::ws, deque_result); // Ok.
}

The take-away here is that the attribute flexibility is still available, but only within the rule. The parser bp::double_ % ',' can parse into a std::vector<double> or a std::deque<double>, but the rule doubles must parse into only the exact attribute it was declared to generate.

The reason for this is that, inside the rule parsing implementation, there is code something like this:

using attr_t = ATTR(doubles_def);
attr_t attr;
parse(first, last, parser, attr);
attribute_out_param = std::move(attr);

Here, attribute_out_param is the attribute out-parameter we pass to parse(). If that final move assignment is ill-formed, the call to parse() is too.
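Whether that final move assignment compiles can be checked with standard type traits. This is a minimal sketch, assuming the rule's attribute is the std::vector<double> from the first doubles example.

```cpp
#include <cassert>
#include <deque>
#include <type_traits>
#include <vector>

// Assumed attribute type of the rule (the first `doubles` declaration above).
using rule_attr_t = std::vector<double>;

// `attribute_out_param = std::move(attr);` compiles for a matching out-param...
static_assert(std::is_assignable_v<std::vector<double> &, rule_attr_t &&>);
// ...but std::deque has no assignment from std::vector, so this one does not.
static_assert(!std::is_assignable_v<std::deque<double> &, rule_attr_t &&>);
```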

You can also use rules to exploit attribute flexibility. Even though a rule reduces the flexibility of attributes it can generate, the fact that it is so easy to write a new rule means that we can use rules themselves to get the attribute flexibility we want across our code:

namespace bp = boost::parser;

// We only need to write the definition once...
auto const generic_doubles_def = bp::double_ % ',';

bp::rule<struct vec_doubles, std::vector<double>> vec_doubles = "vec_doubles";
auto const & vec_doubles_def = generic_doubles_def; // ... and re-use it,
BOOST_PARSER_DEFINE_RULES(vec_doubles);

// Attribute changed to std::deque<double>.
bp::rule<struct deque_doubles, std::deque<double>> deque_doubles = "deque_doubles";
auto const & deque_doubles_def = generic_doubles_def; // ... and re-use it again.
BOOST_PARSER_DEFINE_RULES(deque_doubles);

Now we have one of each, and we did not have to copy any parsing logic that would have to be maintained in two places.

Sometimes, you need to create a rule to enforce a certain attribute type, but the rule's attribute is not constructible from its parser's attribute. When that happens, you'll need to write a semantic action.

struct type_t
{
    type_t() = default;
    explicit type_t(double x) : x_(x) {}
    // etc.

    double x_;
};

namespace bp = boost::parser;

auto doubles_to_type = [](auto & ctx) {
    using namespace bp::literals;
    _val(ctx) = type_t(_attr(ctx)[0_c] * _attr(ctx)[1_c]);
};

bp::rule<struct type_tag, type_t> type = "type";
auto const type_def = (bp::double_ >> bp::double_)[doubles_to_type];
BOOST_PARSER_DEFINE_RULES(type);

For a rule R and its parser P, we do not need to write such a semantic action if:

- ATTR(R) is an aggregate, and ATTR(P) is a compatible tuple;

- ATTR(R) is a tuple, and ATTR(P) is a compatible aggregate;

- ATTR(R) is a non-aggregate class type C, and ATTR(P) is a tuple whose elements can be used to construct C; or

- ATTR(R) and ATTR(P) are compatible types.

The notion of "compatible" is defined in The parse() API.

Creating a parser for better diagnostics

Each rule has associated diagnostic text that Boost.Parser can use for failures of that rule. This is useful when the parse reaches a parse failure at an expectation point (see Expectation points). Let's say you have the following code defined somewhere.

namespace bp = boost::parser;

bp::rule<struct value_tag> value =
    "an integer, or a list of integers in braces";

auto const ints = '{' > (value % ',') > '}';
auto const value_def = bp::int_ | ints;

BOOST_PARSER_DEFINE_RULES(value);

Notice the two expectation points: one before (value % ','), and one before the final '}'. Later, you call parse on some input:

bp::parse("{ 4, 5 a", value, bp::ws);

This runs afoul of the second expectation point, and produces output like this:

1:7: error: Expected '}' here:
{ 4, 5 a
       ^

That's a pretty good error message. Here's what it looks like if we violate the earlier expectation:

bp::parse("{ }", value, bp::ws);
1:2: error: Expected an integer, or a list of integers in braces % ',' here:
{ }
  ^

Not nearly as nice. The problem is that the expectation is on (value % ','). So, even though we gave value reasonable diagnostic text, we put the text on the wrong thing. We can introduce a new rule to put the diagnostic text in the right place.

namespace bp = boost::parser;

bp::rule<struct value_tag> value =
    "an integer, or a list of integers in braces";
bp::rule<struct comma_values_tag> comma_values =
    "a comma-delimited list of integers";

auto const ints = '{' > comma_values > '}';
auto const value_def = bp::int_ | ints;
auto const comma_values_def = (value % ',');

BOOST_PARSER_DEFINE_RULES(value, comma_values);

Now when we call bp::parse("{ }", value, bp::ws) we get a much better message:

1:2: error: Expected a comma-delimited list of integers here:
{ }
  ^

The rule value might be useful elsewhere in our code, perhaps in another parser. Its diagnostic text is appropriate for those other potential uses.

Recursive rules

It's pretty common to see grammars that include recursive rules. Consider this EBNF rule for balanced parentheses:

<parens> ::= "" | ( "(" <parens> ")" )

We can try to write this using Boost.Parser like this:

namespace bp = boost::parser;
auto const parens = '(' >> parens >> ')' | bp::eps; // Ill-formed: parens appears in its own initializer.

We had to put the bp::eps second, because Boost.Parser's parsing algorithm is greedy. Otherwise, it's just a straight transliteration. Unfortunately, it does not work. The code is ill-formed because you can't define a variable in terms of itself. Well you can, but nothing good comes of it. If we instead make the parser in terms of a forward-declared rule, it works.

namespace bp = boost::parser;
bp::rule<struct parens_tag> parens = "matched parentheses";
auto const parens_def = '(' >> parens > ')' | bp::eps;
BOOST_PARSER_DEFINE_RULES(parens);

Later, if we use it to parse, it does what we want.

assert(bp::parse("(((())))", parens, bp::ws));

When it fails, it even produces nice diagnostics.

bp::parse("(((()))", parens, bp::ws);
1:7: error: Expected ')' here (end of input):
(((()))
       ^
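To show the same grammar outside Boost.Parser, here is a minimal recursive-descent sketch of parens ::= '(' parens ')' | eps in plain C++. It is not how Boost.Parser is implemented. A function's name is already in scope inside its own body, so it can call itself. Declaring the rule parens before defining parens_def plays the same role for the parser. The names skip_ws, parse_parens, and matched_parens are invented for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Skips whitespace; a stand-in for the bp::ws skip parser.
inline void skip_ws(std::string_view in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
        ++pos;
}

// parens ::= '(' parens ')' | eps
// The function's name is visible inside its own body, so it may recurse.
bool parse_parens(std::string_view in, std::size_t& pos) {
    skip_ws(in, pos);
    if (pos < in.size() && in[pos] == '(') {
        ++pos;                                    // consume '('
        if (!parse_parens(in, pos))               // the nested parens
            return false;
        skip_ws(in, pos);
        if (pos < in.size() && in[pos] == ')') {  // the closing ')' is required
            ++pos;
            return true;
        }
        return false;                             // missing ')': the parse fails
    }
    return true;                                  // eps: match nothing and succeed
}

// Whole-input check, similar in spirit to bp::parse(): all input must be consumed.
bool matched_parens(std::string_view in) {
    std::size_t pos = 0;
    bool const ok = parse_parens(in, pos);
    skip_ws(in, pos);
    return ok && pos == in.size();
}
```

Unlike the Boost.Parser version, this sketch only reports success or failure. It produces no diagnostic for the missing ')'.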

Recursive rules work differently from other parsers in one way. When the rule is re-entered recursively, only the attribute variable (_attr(ctx) in your semantic actions) is unique to that instance of the rule. All other state of the uppermost instance of that rule is shared. This includes the value of the rule (_val(ctx)), and the locals and parameters to the rule. In other words, _val(ctx) returns a reference to the same object in every instance of a recursive rule. This is because each instance of the rule needs a place to put the attribute it generates from its parse. However, we only want a single return value for the uppermost rule. If each instance had a separate value in _val(ctx), it would be impossible to build up the result of a recursive rule step by step during the evaluation of the recursive instantiations.
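Here is a rough plain-C++ analogy of this sharing. It is an illustration only, not Boost.Parser's mechanism. Each recursive call below has its own local attr, like _attr(ctx), while every call appends to the same vector passed by reference, like _val(ctx). The grammar is ints ::= int ints | eps, as in the ints rule that follows, restricted to non-negative ints separated by spaces. The names parse_ints and ints are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// ints ::= int ints | eps   (non-negative ints separated by spaces)
// `val` is the same vector in every recursive call, like _val(ctx);
// `attr` is local to each call, like _attr(ctx).
void parse_ints(std::string_view in, std::size_t& pos, std::vector<int>& val) {
    while (pos < in.size() && in[pos] == ' ')
        ++pos;                                     // skip separating spaces
    std::size_t const start = pos;
    while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos])))
        ++pos;
    if (pos == start)
        return;                                    // eps: nothing more to add
    int const attr = std::stoi(std::string(in.substr(start, pos - start)));
    val.push_back(attr);                           // extend the one shared result
    parse_ints(in, pos, val);                      // recursive instance, same `val`
}

std::vector<int> ints(std::string_view in) {
    std::vector<int> val;                          // the uppermost instance's value
    std::size_t pos = 0;
    parse_ints(in, pos, val);
    return val;
}
```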

Also, consider this rule:

namespace bp = boost::parser;
bp::rule<struct ints_tag, std::vector<int>> ints = "ints";
auto const ints_def = bp::int_ >> ints | bp::eps;

What is the default attribute type for ints_def? It sure looks like std::optional<std::vector<int>>. Inside the evaluation of ints, Boost.Parser must evaluate ints_def, and then produce a std::vector<int> — the return type of ints — from it. How? How do you turn a std::optional<std::vector<int>> into a std::vector<int>? To a human, it seems obvious, but the metaprogramming that properly handles this simple example and the general case is certainly beyond me.

Boost.Parser has specific semantics for what constitutes a recursive rule. Each rule has a tag type associated with it. If Boost.Parser enters a rule with a certain tag Tag, and the currently-evaluating rule (if there is one) also has the tag Tag, then the rule instance being entered is considered a recursion. No other situation is considered recursion. In particular, suppose you have rules Ra and Rb, Ra uses Rb, and Rb in turn uses Ra. The second use of Ra is not considered recursion. Ra and Rb are of course mutually recursive, but neither is considered a "recursive rule". The second instance of Ra therefore gets its own value, locals, and parameters instead of sharing those of the first.
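The recursion test described above can be modeled in a few lines. This is a hypothetical sketch, not Boost.Parser's internals: rule_stack, enter(), and leave() are invented names, and a stack of tag types stands in for the chain of rules being evaluated. Only the top of the stack is compared. As a result, Ra directly inside Ra counts as recursion, while Ra inside Rb inside Ra does not.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical model: entering a rule is "recursion" only if the rule currently
// being evaluated (the top of the stack) has the same tag type.
class rule_stack {
public:
    template <typename Tag>
    bool enter() {
        std::type_index const tag(typeid(Tag));
        bool const recursive = !stack_.empty() && stack_.back() == tag;
        stack_.push_back(tag);
        return recursive;
    }
    void leave() { stack_.pop_back(); }

private:
    std::vector<std::type_index> stack_;
};

// Tag types for two rules, Ra and Rb.
struct ra_tag {};
struct rb_tag {};
```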

Mutually-recursive rules

One of the advantages of using rules is that you can declare all your rules up front and then use them immediately afterward. This lets you make rules that use each other without introducing cycles:

namespace bp = boost::parser;
using namespace bp::literals; // for the '{'_l and '['_l literals used below

// Assume we have some polymorphic type that can be an object/dictionary,
// array, string, or int, called `value_type`.

bp::rule<class string, std::string> const string = "string";
bp::rule<class object_element, bp::tuple<std::string, value_type>> const object_element = "object-element";
bp::rule<class object, value_type> const object = "object";
bp::rule<class array, value_type> const array = "array";
bp::rule<class value_tag, value_type> const value = "value";

auto const string_def = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];
auto const object_element_def = string > ':' > value;
auto const object_def = '{'_l >> -(object_element % ',') > '}';
auto const array_def = '['_l >> -(value % ',') > ']';
auto const value_def = bp::int_ | bp::bool_ | string | array | object;

BOOST_PARSER_DEFINE_RULES(string, object_element, object, array, value);

Here we have a parser for a JavaScript-value-like type, value_type. A value_type may be an array, which itself may contain other arrays, objects, strings, etc. Since we need to be able to parse objects within arrays and vice versa, each of those two parsers must be able to refer to the other.
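The same declare-up-front approach works for hand-written parsers. Below is a plain-C++ recursive-descent sketch of a reduced grammar: only ints and arrays, with no strings, bools, objects, or whitespace. A forward declaration of parse_value lets parse_array and parse_value call each other. The function names and the choice to count ints are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

//   value ::= int | array
//   array ::= '[' (value % ',')? ']'
// Declaring parse_value first is the analogue of declaring all rules up front.
bool parse_value(std::string_view in, std::size_t& pos, int& count);

bool parse_array(std::string_view in, std::size_t& pos, int& count) {
    if (pos >= in.size() || in[pos] != '[')
        return false;
    ++pos;
    if (pos < in.size() && in[pos] == ']') {  // empty array
        ++pos;
        return true;
    }
    if (!parse_value(in, pos, count))       // array refers to value...
        return false;
    while (pos < in.size() && in[pos] == ',') {
        ++pos;
        if (!parse_value(in, pos, count))
            return false;
    }
    if (pos >= in.size() || in[pos] != ']')
        return false;
    ++pos;
    return true;
}

bool parse_value(std::string_view in, std::size_t& pos, int& count) {
    std::size_t const start = pos;
    while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos])))
        ++pos;
    if (pos != start) {                     // matched an int
        ++count;
        return true;
    }
    return parse_array(in, pos, count);     // ...and value refers back to array
}

// Returns the number of ints in well-formed input, or -1 on failure.
int count_ints(std::string_view in) {
    std::size_t pos = 0;
    int count = 0;
    if (!parse_value(in, pos, count) || pos != in.size())
        return -1;
    return count;
}
```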

Callback parsing

Only rules can be callback parsers, so if you want to get attributes supplied to you via callbacks instead of somewhere in the middle of a giant attribute that represents the whole parse result, you need to use rules. See Parsing JSON With Callbacks for an extended example of callback parsing.

Accessors available in semantic actions on rules
_val()

Inside all of a rule's semantic actions, the expression _val(ctx) is a reference to the attribute that the rule generates. This can be useful when you want subparsers to build up the attribute in a specific way:

namespace bp = boost::parser;
using namespace bp::literals;

bp::rule<class ints, std::vector<int>> const ints = "ints";
auto twenty_zeros = [](auto & ctx) { _val(ctx).resize(20, 0); };
auto push_back = [](auto & ctx) { _val(ctx).push_back(_attr(ctx)); };
auto const ints_def = "20-zeros"_l[twenty_zeros] | +bp::int_[push_back];
BOOST_PARSER_DEFINE_RULES(ints);
Tip

That's just an example. It's almost always better to do things without using semantic actions. We could have instead written ints_def as "20-zeros" >> bp::attr(std::vector<int>(20)) | +bp::int_, which has the same semantics, is a lot easier to read, and is a lot less code.
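For comparison, here is a plain-C++ sketch of what ints_def computes, without Boost.Parser. It makes two simplifying assumptions. First, ints are non-negative and separated by spaces, which stand in for a skip parser. Second, the whole input must match. The name parse_ints_rule is hypothetical, and the local vector plays the role of _val(ctx).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Sketch of ints_def: "20-zeros" | +int_
std::optional<std::vector<int>> parse_ints_rule(std::string_view in) {
    std::vector<int> val;                          // plays the role of _val(ctx)
    if (in == "20-zeros") {
        val.resize(20, 0);                         // what twenty_zeros does
        return val;
    }
    std::size_t pos = 0;
    while (true) {
        while (pos < in.size() && in[pos] == ' ')
            ++pos;                                 // skip separating spaces
        if (pos == in.size())
            break;
        std::size_t const start = pos;
        while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos])))
            ++pos;
        if (pos == start)
            return std::nullopt;                   // not an int: no match
        // What the push_back action does for each int.
        val.push_back(std::stoi(std::string(in.substr(start, pos - start))));
    }
    if (val.empty())
        return std::nullopt;                       // '+' needs at least one int
    return val;
}
```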

Locals

The rule template takes another template parameter we have not discussed yet. You can pass a third parameter, LocalState, to rule. The rule default-constructs it and makes it available within the rule's semantic actions as _locals(ctx). This gives your rule some local state, if it needs it. The type of LocalState can be anything regular. It could be a single value, a struct containing multiple values, or a tuple, among others.

struct foo_locals
{
    char first_value = 0;
};

namespace bp = boost::parser;

bp::rule<class foo, int, foo_locals> const foo = "foo";

auto record_first = [](auto & ctx) { _locals(ctx).first_value = _attr(ctx); };
auto check_against_first = [](auto & ctx) {
    char const first = _locals(ctx).first_value;
    char const attr = _attr(ctx);
    if (attr == first)
        _pass(ctx) = false;
    _val(ctx) = (int(first) << 8) | int(attr);
};

auto const foo_def = bp::cu[record_first] >> bp::cu[check_against_first];
BOOST_PARSER_DEFINE_RULES(foo);

foo matches the input if it can match two elements of the input in a row, but only if they are not the same value. Without locals, it's a lot harder to write parsers that have to track state as they parse.
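A plain-C++ sketch of the same logic may make the roles clearer. Here, first plays the part of _locals(ctx).first_value, returning std::nullopt plays the part of _pass(ctx) = false, and the returned int plays the part of _val(ctx). The function foo is a hypothetical stand-in that examines only the first two chars of its input. It is not a Boost.Parser API.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Succeeds on two code units in a row that differ, yielding (first << 8) | second.
std::optional<int> foo(std::string_view in) {
    if (in.size() < 2)
        return std::nullopt;              // need two code units in a row
    char const first = in[0];             // record_first: store in "locals"
    char const attr = in[1];              // check_against_first
    if (attr == first)
        return std::nullopt;              // like _pass(ctx) = false
    return (int(first) << 8) | int(attr); // like assigning _val(ctx)
}
```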

Parameters

Sometimes, it is convenient to parameterize parsers. Consider these parsing rules from the YAML 1.2 spec:

[80]
s-separate(n,BLOCK-OUT) ::= s-separate-lines(n)
s-separate(n,BLOCK-IN)  ::= s-separate-lines(n)
s-separate(n,FLOW-OUT)  ::= s-separate-lines(n)
s-separate(n,FLOW-IN)   ::= s-separate-lines(n)
s-separate(n,BLOCK-KEY) ::= s-separate-in-line
s-separate(n,FLOW-KEY)  ::= s-separate-in-line

[136]
in-flow(n,FLOW-OUT)  ::= ns-s-flow-seq-entries(n,FLOW-IN)
in-flow(n,FLOW-IN)   ::= ns-s-flow-seq-entries(n,FLOW-IN)
in-flow(n,BLOCK-KEY) ::= ns-s-flow-seq-entries(n,FLOW-KEY)
in-flow(n,FLOW-KEY)  ::= ns-s-flow-seq-entries(n,FLOW-KEY)

[137]
c-flow-sequence(n,c) ::= “[” s-separate(n,c)? in-flow(c)? “]”

YAML [137] says that the parsing should proceed into two YAML subrules, both of which have these n and c parameters. It is certainly possible to transliterate these YAML parsing rules to something that uses unparameterized Boost.Parser rules, but it is quite painful to do so. It is better to use a parameterized rule.

You give parameters to a rule by calling its with() member. The values you pass to with() are used to create a boost::parser::tuple that is available in semantic actions attached to the rule, using _params(ctx).

Passing parameters to rules like this allows you to easily write parsers that change the way they parse depending on contextual data that they have already parsed.

Here is an implementation of YAML [137]. It also implements the two YAML rules used directly by [137], rules [136] and [80]. The rules that those rules use are also represented below, but are implemented using only eps, so that I don't have to repeat too much of the (very large) YAML spec.

namespace bp = boost::parser;

// A type to represent the YAML parse context.
enum class context {
    block_in,
    block_out,
    block_key,
    flow_in,
    flow_out,
    flow_key
};

// A YAML value; no need to fill it in for this example.
struct value
{
    // ...
};

// YAML [66], just stubbed in here.
auto const s_separate_in_line = bp::eps;

// YAML [137].
bp::rule<struct c_flow_seq_tag, value> c_flow_sequence = "c-flow-sequence";
// YAML [80].
bp::rule<struct s_separate_tag> s_separate = "s-separate";
// YAML [136].
bp::rule<struct in_flow_tag, value> in_flow = "in-flow";
// YAML [138]; just eps below.
bp::rule<struct ns_s_flow_seq_entries_tag, value> ns_s_flow_seq_entries =
    "ns-s-flow-seq-entries";
// YAML [81]; just eps below.
bp::rule<struct s_separate_lines_tag> s_separate_lines = "s-separate-lines";

// Parser for YAML [137].
auto const c_flow_sequence_def =
    '[' >>
    -s_separate.with(bp::_p<0>, bp::_p<1>) >>
    -in_flow.with(bp::_p<0>, bp::_p<1>) >>
    ']';
// Parser for YAML [80].
auto const s_separate_def = bp::switch_(bp::_p<1>)
    (context::block_out, s_separate_lines.with(bp::_p<0>))
    (context::block_in, s_separate_lines.with(bp::_p<0>))
    (context::flow_out, s_separate_lines.with(bp::_p<0>))
    (context::flow_in, s_separate_lines.with(bp::_p<0>))
    (context::block_key, s_separate_in_line)
    (context::flow_key, s_separate_in_line);
// Parser for YAML [136].
auto const in_flow_def = bp::switch_(bp::_p<1>)
    (context::flow_out, ns_s_flow_seq_entries.with(bp::_p<0>, context::flow_in))
    (context::flow_in, ns_s_flow_seq_entries.with(bp::_p<0>, context::flow_in))
    (context::block_key, ns_s_flow_seq_entries.with(bp::_p<0>, context::flow_key))
    (context::flow_key, ns_s_flow_seq_entries.with(bp::_p<0>, context::flow_key));

auto const ns_s_flow_seq_entries_def = bp::eps;
auto const s_separate_lines_def = bp::eps;

BOOST_PARSER_DEFINE_RULES(
    c_flow_sequence,
    s_separate,
    in_flow,
    ns_s_flow_seq_entries,
    s_separate_lines);

YAML [137] (c_flow_sequence) parses a list. The list may be empty, and must be surrounded by brackets, as you see here. But depending on the current YAML context (the c parameter to [137]), we may require s-separate to match certain spacing. The behavior of the sub-parser in-flow also depends on the current context.

In s_separate above, we parse differently based on the value of c. This is done above by using the value of the second parameter to s_separate in a switch-parser. The second parameter is looked up by using _p as a parse argument.

in_flow does something similar. Note that in_flow calls its subrule by passing its own first parameter, but uses a fixed value for the second parameter. s_separate passes its n parameter only in some cases. The point is that a rule can be used with or without .with(), and that you can pass constants or parse arguments to .with().
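The context-dependent dispatch in [80] and [136] can also be written as ordinary C++ functions. This can help when checking a bp::switch_ like the ones above against the spec. This is a sketch: in_flow_subcontext and s_separate_uses_lines are hypothetical names, and the enum mirrors the context type defined above. An empty optional models a context for which [136] lists no production.

```cpp
#include <cassert>
#include <optional>

enum class context { block_in, block_out, block_key, flow_in, flow_out, flow_key };

// YAML [136] in-flow(n,c): the context passed on to ns-s-flow-seq-entries.
std::optional<context> in_flow_subcontext(context c) {
    switch (c) {
    case context::flow_out:
    case context::flow_in:
        return context::flow_in;
    case context::block_key:
    case context::flow_key:
        return context::flow_key;
    default:
        return std::nullopt;    // no production for this context in [136]
    }
}

// YAML [80] s-separate(n,c): true if s-separate-lines(n) is used,
// false if s-separate-in-line is used.
bool s_separate_uses_lines(context c) {
    return c != context::block_key && c != context::flow_key;
}
```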

With those rules defined, we could write a unit test for YAML [137] like this:

auto const test_parser = c_flow_sequence.with(4, context::block_out);
auto result = bp::parse("[]", test_parser);
assert(result);

You could extend this with tests for different values of n and c. In real tests, once the other rules (like [138]) were implemented, you would also parse actual contents inside the "[]".

The _p variable template

Getting at one of a rule's arguments and passing it as an argument to another parser can be very verbose. _p is a variable template that allows you to refer to the nth argument to the current rule, so that you can, in turn, pass it to one of the rule's subparsers. Using this, foo_def above can be rewritten as:

auto const foo_def = bp::repeat(bp::_p<0>)[' '_l];

Using _p saves you from having to write a bunch of lambdas that each get an argument out of the parse context using _params(ctx)[0_c] or similar.

Note that _p is a parse argument (see The Parsers And Their Uses), meaning that it is an invocable that takes the context as its only parameter. If you want to use it inside a semantic action, you have to call it.
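To make "an invocable that takes the context as its only parameter" concrete, here is a toy model. Everything in it is hypothetical: toy_context is not Boost.Parser's context type, and param_getter and p only imitate the shape of _p. The point is that p<I> is an object you call with a context, which is why you have to call it inside a semantic action.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical stand-in for a parse context that carries a rule's parameters.
template <typename... Params>
struct toy_context {
    std::tuple<Params...> params;
};

// A "parse argument": an invocable whose only parameter is the context.
template <std::size_t I>
struct param_getter {
    template <typename Context>
    decltype(auto) operator()(Context const& ctx) const {
        return std::get<I>(ctx.params);   // the I-th rule parameter
    }
};

template <std::size_t I>
inline constexpr param_getter<I> p{};
```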

Special forms of semantic actions usable within a rule

Semantic actions in this tutorial are usually of the signature void (auto & ctx). That is, they take a context by reference, and return nothing. If they were to return something, that something would just get dropped on the floor.

It is a pretty common pattern to create a rule in order to get a certain kind of value out of a parser, when you don't normally get that value automatically. If I want to parse an int, int_ does that, and the thing that I parsed is also the desired attribute. If I parse an int followed by a double, I get a boost::parser::tuple containing one of each. But what if I don't want those two values, but some function of those two values? I would probably write something like this:

struct obj_t { /* ... */ };
obj_t to_obj(int i, double d) { /* ... */ }

namespace bp = boost::parser;
bp::rule<struct obj_tag, obj_t> obj = "obj";
auto make_obj = [](auto & ctx) {
    using namespace boost::hana::literals; // for the 0_c and 1_c literals
    _val(ctx) = to_obj(_attr(ctx)[0_c], _attr(ctx)[1_c]);
};
constexpr auto obj_def = (bp::int_ >> bp::double_)[make_obj];

That's fine, if a little verbose. However, you can also do this instead:

namespace bp = boost::parser;
bp::rule<struct obj_tag, obj_t> obj = "obj";
auto make_obj = [](auto & ctx) {
    using namespace boost::hana::literals; // for the 0_c and 1_c literals
    return to_obj(_attr(ctx)[0_c], _attr(ctx)[1_c]);
};
constexpr auto obj_def = (bp::int_ >> bp::double_)[make_obj];

Above, we return the value from a semantic action, and the returned value gets assigned to _val(ctx).

Finally, you can provide a function that takes the individual elements of the attribute (if it's a tuple), and returns the value to assign to _val(ctx):

namespace bp = boost::parser;
bp::rule<struct obj_tag, obj_t> obj = "obj";
constexpr auto obj_def = (bp::int_ >> bp::double_)[to_obj];

More formally, within a rule, the use of a semantic action is determined as follows. Assume we have a function APPLY that calls a function with the elements of a tuple, like std::apply. For some context ctx, semantic action action, and attribute attr, action is used like this:

- _val(ctx) = APPLY(action, std::move(attr)), if that is well-formed, and attr is a tuple of size 2 or larger;

- otherwise, _val(ctx) = action(ctx), if that is well-formed;

- otherwise, action(ctx).

The first case does not pass the context to the action at all. The last case is the normal use of semantic actions outside of rules.
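The three-way rule above can be modeled with if constexpr. This is a C++17 sketch, not Boost.Parser's implementation. The names toy_ctx, apply_ok, assign_ok, and invoke_action are invented, and std::tuple stands in for boost::parser::tuple. As a simplification, case 1 only checks that the call is well-formed, not that its result can be assigned.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Toy context: `val` stands in for _val(ctx), `attr` for the attribute.
template <typename Val, typename Attr>
struct toy_ctx {
    Val val;
    Attr attr;
};

// Case 1 test: Attr is a tuple of size >= 2 whose elements F can be called with.
template <typename F, typename Attr>
struct apply_ok : std::false_type {};

template <typename F, typename... Ts>
struct apply_ok<F, std::tuple<Ts...>>
    : std::bool_constant<(sizeof...(Ts) >= 2) && std::is_invocable_v<F, Ts...>> {};

// Case 2 test: `val = f(ctx)` is well-formed (a void result does not qualify).
template <typename F, typename Ctx, typename Val, typename = void>
struct assign_ok : std::false_type {};

template <typename F, typename Ctx, typename Val>
struct assign_ok<F, Ctx, Val,
                 std::void_t<decltype(std::declval<Val&>() =
                                      std::declval<F&>()(std::declval<Ctx&>()))>>
    : std::true_type {};

template <typename Action, typename Val, typename Attr>
void invoke_action(Action action, toy_ctx<Val, Attr>& ctx) {
    using ctx_t = toy_ctx<Val, Attr>;
    if constexpr (apply_ok<Action&, Attr>::value) {
        ctx.val = std::apply(action, std::move(ctx.attr)); // context not passed at all
    } else if constexpr (assign_ok<Action, ctx_t, Val>::value) {
        ctx.val = action(ctx);                             // result assigned to val
    } else {
        action(ctx);                                       // ordinary semantic action
    }
}
```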

Algorithms and Views That Use Parsers

Unless otherwise noted, all the algorithms and views are constrained very much like the parse() overloads are. They accept the same kinds of ranges, parsers, etc.

boost::parser::search()

As shown in The parse() API, the two patterns of parsing in Boost.Parser are whole-parse and prefix-parse. When you want to find something in the middle of the range being parsed, there's no parse API for that. You can of course make a simple parser that skips everything before what you're looking for.

namespace bp = boost::parser;
constexpr auto parser = /* ... */;
constexpr auto middle_parser = bp::omit[*(bp::char_ - parser)] >> parser;

middle_parser will skip over everything, one char_ at a time, as long as the next char_ is not the beginning of a successful match of parser. After this, control passes to parser itself. Ok, so that's not too hard to write. If you need to parse something from the middle in order to generate attributes, this is what you should use.
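The skip-one-character-at-a-time strategy of middle_parser can be sketched in plain C++. The sketch tries the pattern at each position and returns the first matching subrange, which is the kind of result search() gives you. A "matcher" here is a hypothetical stand-in for a parser that reports how many chars it matched. The names search_sketch and lit_matcher are invented, and skip parsers are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Try `match` at each position, advancing one char at a time; return the first
// matching subrange, or an empty subrange at the end of `in` if nothing matches.
template <typename Matcher>
std::string_view search_sketch(std::string_view in, Matcher match) {
    for (std::size_t pos = 0; pos <= in.size(); ++pos) {
        if (std::optional<std::size_t> len = match(in.substr(pos)))
            return in.substr(pos, *len);
    }
    return in.substr(in.size());
}

// A matcher for a literal string, loosely like bp::lit("...").
inline auto lit_matcher(std::string_view lit) {
    return [lit](std::string_view rest) -> std::optional<std::size_t> {
        if (rest.substr(0, lit.size()) == lit)
            return lit.size();
        return std::nullopt;
    };
}
```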

However, it often turns out you only need to find some subrange in the parsed range. In these cases, it would be nice to turn this into a proper algorithm in the pattern of the ones in std::ranges, since that's more idiomatic. boost::parser::search() is that algorithm. It has very similar semantics to std::ranges::search, except that it searches not for a match to an exact subrange, but to a match with the given parser. Like std::ranges::search(), it returns a subrange (boost::parser::subrange in C++17, std::ranges::subrange in C++20 and later).

namespace bp = boost::parser;
auto result = bp::search("aaXYZq", bp::lit("XYZ"), bp::ws);
assert(!result.empty());
assert(std::string_view(result.begin(), result.end() - result.begin()) == "XYZ");

Since boost::parser::search() returns a subrange, whatever parser you give it produces no attribute. I wrote bp::lit("XYZ") above; if I had written bp::string("XYZ") instead, the result (and lack of std::string construction) would not change.

As you can see above, one aspect of boost::parser::search() differs intentionally from the conventions of the std::ranges algorithms — it accepts C-style strings, treating them as if they were proper ranges.

Also, boost::parser::search() knows how to accommodate your iterator type. You can pass the C-style string "aaXYZq" as in the example above, or "aaXYZq" | bp::as_utf32, or "aaXYZq" | bp::as_utf8, or even "aaXYZq" | bp::as_utf16, and it will return a subrange whose iterators are the type that you passed as input, even though internally the iterator type might be something different (a UTF-8 -> UTF-32 transcoding iterator in Unicode parsing, as with all the | bp::as_utfN examples above). As long as you pass a range to be parsed whose value type is char, char8_t, char32_t, or that is adapted using some combination of as_utfN adaptors, this accommodation will operate correctly.

boost::parser::search() has multiple overloads. You can pass a range or an iterator/sentinel pair, and you can pass a skip parser or not. That's four overloads. Also, all four overloads take an optional boost::parser::trace parameter at the end. This is really handy for investigating why you're not finding something in the input that you expected to.

boost::parser::search_all

boost::parser::search_all creates boost::parser::search_all_views. boost::parser::search_all_view is a std::views-style view. It produces a range of subranges. Each subrange it produces is the next match of the given parser in the parsed range.

namespace bp = boost::parser;
auto r = "XYZaaXYZbaabaXYZXYZ" | bp::search_all(bp::lit("XYZ"));
int count = 0;
// Prints XYZ XYZ XYZ XYZ.
for (auto subrange : r) {
    std::cout << std::string_view(subrange.begin(), subrange.end() - subrange.begin()) << " ";
    ++count;
}
std::cout << "\n";
assert(count == 4);

All the details called out in the subsection on boost::parser::search() above apply to boost::parser::search_all: its parser produces no attributes; it accepts C-style strings as if they were ranges; and it knows how to get from the internally-used iterator type back to the given iterator type, in typical cases.
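For a literal pattern, the subranges search_all produces can be pictured as the non-overlapping matches found by scanning left to right and resuming just past each match. The sketch below assumes that non-overlapping behavior and computes the matches eagerly into a std::vector of std::string_view. The real view is lazy and works with any parser, and search_all_sketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Every non-overlapping occurrence of a non-empty literal, left to right.
std::vector<std::string_view> search_all_sketch(std::string_view in, std::string_view lit) {
    std::vector<std::string_view> matches;
    if (lit.empty())
        return matches;                              // avoid an endless loop on ""
    std::size_t pos = in.find(lit);
    while (pos != std::string_view::npos) {
        matches.push_back(in.substr(pos, lit.size()));
        pos = in.find(lit, pos + lit.size());        // resume just past the match
    }
    return matches;
}
```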

boost::parser::search_all can be called with, and boost::parser::search_all_view can be constructed with, a skip parser or not, and you can always pass boost::parser::trace at the end of any of their overloads.

boost::parser::split

boost::parser::split creates boost::parser::split_views. boost::parser::split_view is a std::views-style view. It produces a range of subranges of the parsed range, split on matches of the given parser. You can think of boost::parser::split_view as the complement of boost::parser::search_all_view: boost::parser::split_view produces the subranges between the subranges produced by boost::parser::search_all_view. boost::parser::split_view has very similar semantics to std::ranges::split_view. Just like std::ranges::split_view, boost::parser::split_view produces an empty range between the beginning or end of the parsed range and an adjacent match, and between adjacent matches.

namespace bp = boost::parser;
auto r = "XYZaaXYZbaabaXYZXYZ" | bp::split(bp::lit("XYZ"));
int count = 0;
// Prints '' 'aa' 'baaba' '' ''.
for (auto subrange : r) {
    std::cout << "'" << std::string_view(subrange.begin(), subrange.end() - subrange.begin()) << "' ";
    ++count;
}
std::cout << "\n";
assert(count == 5);

All the details called out in the subsection on boost::parser::search() above apply to boost::parser::split: its parser produces no attributes; it accepts C-style strings as if they were ranges; and it knows how to get from the internally-used iterator type back to the given iterator type, in typical cases.
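Because split_view yields the pieces between search_all's matches, the literal-pattern approach can be turned into a split. This is again an eager, plain-C++ illustration with a hypothetical name (split_sketch), not the library's implementation. Empty pieces appear at the ends and between adjacent matches, as in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Pieces of `in` between occurrences of a non-empty literal, including empty ones.
std::vector<std::string_view> split_sketch(std::string_view in, std::string_view lit) {
    std::vector<std::string_view> pieces;
    if (lit.empty()) {
        pieces.push_back(in);                             // simplification: no split
        return pieces;
    }
    std::size_t start = 0;
    std::size_t pos = in.find(lit, start);
    while (pos != std::string_view::npos) {
        pieces.push_back(in.substr(start, pos - start)); // text before this match
        start = pos + lit.size();                         // resume after the match
        pos = in.find(lit, start);
    }
    pieces.push_back(in.substr(start));                   // text after the last match
    return pieces;
}
```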

boost::parser::split can be called with, and boost::parser::split_view can be constructed with, a skip parser or not, and you can always pass boost::parser::trace at the end of any of their overloads.

boost::parser::replace
Important

boost::parser::replace and boost::parser::replace_view are not available on MSVC in C++17 mode.
boost::parser::replaceboost::parser::replace_view 在 MSVC 的 C++17 模式下不可用。

boost::parser::replace creates boost::parser::replace_views. boost::parser::replace_view is a std::views-style view. It produces a range of subranges from the parsed range r and the given replacement range replacement. Wherever in the parsed range a match to the given parser parser is found, replacement is the subrange produced. Each subrange of r that does not match parser is produced as a subrange as well. The subranges are produced in the order in which they occur in r. Unlike boost::parser::split_view, boost::parser::replace_view does not produce empty subranges, unless replacement is empty.

namespace bp = boost::parser;
auto card_number = bp::int_ >> bp::repeat(3)['-' >> bp::int_];
auto rng = "My credit card number is 1234-5678-9012-3456." | bp::replace(card_number, "XXXX-XXXX-XXXX-XXXX");
int count = 0;
// Prints My credit card number is XXXX-XXXX-XXXX-XXXX.
for (auto subrange : rng) {
    std::cout << std::string_view(subrange.begin(), subrange.end() - subrange.begin());
    ++count;
}
std::cout << "\n";
assert(count == 3);

If the iterator types Ir and Ireplacement for the r and replacement ranges passed are identical (as in the example above), the iterator type for the subranges produced is Ir. If they are different, an implementation-defined type is used for the iterator. This type is the moral equivalent of a std::variant<Ir, Ireplacement>. This works as long as Ir and Ireplacement are compatible. To be compatible, they must have common reference, value, and rvalue reference types, as determined by std::common_type_t. One advantage to this scheme is that the range of subranges represented by boost::parser::replace_view is easily joined back into a single range.

namespace bp = boost::parser;
auto card_number = bp::int_ >> bp::repeat(3)['-' >> bp::int_];
auto rng = "My credit card number is 1234-5678-9012-3456." | bp::replace(card_number, "XXXX-XXXX-XXXX-XXXX") | std::views::join;
std::string replace_result;
for (auto ch : rng) {
    replace_result.push_back(ch);
}
assert(replace_result == "My credit card number is XXXX-XXXX-XXXX-XXXX.");

Note that we could not have written std::string replace_result(rng.begin(), rng.end()). This is ill-formed because the std::string range constructor takes two iterators of the same type, but decltype(rng.end()) is a sentinel type different from decltype(rng.begin()).
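The piece structure of replace_view can be sketched eagerly in plain C++. match_card is a hypothetical matcher that approximates int_ >> repeat(3)['-' >> int_] (digits only, no signs), and replace_sketch is an invented name. Joining the pieces at the end mirrors the std::views::join example above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Four digit groups separated by '-'; returns the match length, or nullopt.
std::optional<std::size_t> match_card(std::string_view s) {
    std::size_t pos = 0;
    for (int group = 0; group < 4; ++group) {
        if (group > 0) {
            if (pos >= s.size() || s[pos] != '-')
                return std::nullopt;
            ++pos;
        }
        std::size_t const start = pos;
        while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
            ++pos;
        if (pos == start)
            return std::nullopt;
    }
    return pos;
}

// Unmatched stretches of `in` and `replacement`, in order; no empty pieces
// unless `replacement` is empty. Zero-length matches are treated as no match.
template <typename Matcher>
std::vector<std::string_view> replace_sketch(std::string_view in, Matcher match,
                                             std::string_view replacement) {
    std::vector<std::string_view> pieces;
    std::size_t unmatched_start = 0;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::optional<std::size_t> const len = match(in.substr(pos));
        if (len && *len > 0) {
            if (pos > unmatched_start)
                pieces.push_back(in.substr(unmatched_start, pos - unmatched_start));
            pieces.push_back(replacement);
            pos += *len;
            unmatched_start = pos;
        } else {
            ++pos;
        }
    }
    if (unmatched_start < in.size())
        pieces.push_back(in.substr(unmatched_start));
    return pieces;
}
```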

Though the ranges r and replacement can both be C-style strings, boost::parser::replace_view must know the end of replacement before it does any work. This is because the subranges produced are all common ranges, and so if replacement is not, a common range must be formed from it. If you expect to pass very long C-style strings to boost::parser::replace and not pay to see the end until the range is used, don't.

ReplacementV (the type of replacement) is constrained almost exactly the same as V (the type of r). V must model parsable_range and std::ranges::viewable_range. ReplacementV has the same requirements, except that it need only be a std::ranges::input_range, whereas V must be a std::ranges::forward_range.

You may wonder what happens when you pass a UTF-N range for r, and a UTF-M range for replacement. What happens in this case is silent transcoding of replacement from UTF-M to UTF-N by the boost::parser::replace range adaptor. This doesn't require memory allocation; boost::parser::replace just slaps | boost::parser::as_utfN onto replacement. However, since Boost.Parser treats char ranges as unknown encoding, boost::parser::replace will not transcode from char ranges. So calls like this won't work:

char const str[] = "some text";
char const replacement_str[] = "some text";
namespace bp = boost::parser;
// `parser` stands for any parser defined elsewhere.
auto r = str | bp::replace(parser, replacement_str | bp::as_utf8); // Error: ill-formed!  Can't mix plain-char inputs and UTF replacements.

This does not work, even though char and UTF-8 are the same size. If r and replacement are both ranges of char, everything will work of course. It's just mixing char and UTF-encoded ranges that does not work.

All the details called out in the subsection on boost::parser::search() above apply to boost::parser::replace: its parser produces no attributes; it accepts C-style strings for the r and replacement parameters as if they were ranges; and it knows how to get from the internally-used iterator type back to the given iterator type, in typical cases.

boost::parser::replace can be called with, and boost::parser::replace_view can be constructed with, a skip parser or not, and you can always pass boost::parser::trace at the end of any of their overloads.

boost::parser::transform_replace
[Important] Important

boost::parser::transform_replace and boost::parser::transform_replace_view are not available on MSVC in C++17 mode.

[Important] Important

boost::parser::transform_replace and boost::parser::transform_replace_view are not available on GCC in C++20 mode before GCC 12.

boost::parser::transform_replace creates boost::parser::transform_replace_views. boost::parser::transform_replace_view is a std::views-style view. It produces a range of subranges from the parsed range r and the given invocable f. Wherever in the parsed range a match to the given parser parser is found, let parser's attribute be attr; f(std::move(attr)) is the subrange produced. Each subrange of r that does not match parser is produced as a subrange as well. The subranges are produced in the order in which they occur in r. Unlike boost::parser::split_view, boost::parser::transform_replace_view does not produce empty subranges, unless f(std::move(attr)) is empty. Here is an example.

auto string_sum = [](std::vector<int> const & ints) {
    return std::to_string(std::accumulate(ints.begin(), ints.end(), 0));
};

auto rng = "There are groups of [1, 2, 3, 4, 5] in the set." |
           bp::transform_replace('[' >> bp::int_ % ',' >> ']', bp::ws, string_sum);
int count = 0;
// Prints "There are groups of 15 in the set."
for (auto subrange : rng) {
    for (auto ch : subrange) {
        std::cout << ch;
    }
    ++count;
}
std::cout << "\n";
assert(count == 3);
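
You can model which subranges come out without using Boost at all. The sketch below is an eager, simplified stand-in that uses a hand-written matcher in place of the parser. match_int_list and toy_transform_replace are hypothetical helpers, not Boost.Parser API. The real boost::parser::transform_replace_view is lazy and works with any parser. The sketch shows only the segmentation rule: unmatched stretches pass through, each match becomes f(attr), and no empty unmatched segment is emitted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <numeric>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical hand-written stand-in for '[' >> bp::int_ % ',' >> ']' with a
// space-skipping skipper (non-negative ints only).  On success, returns the
// parsed ints and the offset one past the closing ']'.
std::optional<std::pair<std::vector<int>, std::size_t>>
match_int_list(std::string_view s, std::size_t i)
{
    auto skip_spaces = [&] { while (i < s.size() && s[i] == ' ') ++i; };
    if (i >= s.size() || s[i] != '[')
        return std::nullopt;
    ++i;
    std::vector<int> ints;
    for (;;) {
        skip_spaces();
        std::size_t const start = i;
        while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
            ++i;
        if (i == start)
            return std::nullopt;  // expected an int here
        ints.push_back(std::stoi(std::string(s.substr(start, i - start))));
        skip_spaces();
        if (i < s.size() && s[i] == ',') {
            ++i;
            continue;
        }
        break;
    }
    if (i >= s.size() || s[i] != ']')
        return std::nullopt;
    return std::make_pair(std::move(ints), i + 1);
}

// Eager model of what transform_replace_view yields lazily: the unmatched
// stretches of s, with each match replaced by f(attr).  Empty unmatched
// stretches are never emitted.
std::vector<std::string> toy_transform_replace(
    std::string_view s, std::function<std::string(std::vector<int>)> const & f)
{
    std::vector<std::string> out;
    std::size_t unmatched_start = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        if (auto m = match_int_list(s, i)) {
            if (i != unmatched_start)
                out.emplace_back(s.substr(unmatched_start, i - unmatched_start));
            out.push_back(f(std::move(m->first)));  // the attribute is moved into f
            i = unmatched_start = m->second;
        } else {
            ++i;
        }
    }
    if (unmatched_start != s.size())
        out.emplace_back(s.substr(unmatched_start));
    return out;
}
```

Concatenating the returned pieces plays the role that piping the real view to std::views::join plays.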

Let the type decltype(f(std::move(attr))) be Replacement. Replacement must be a range, and must be compatible with r. See the description of boost::parser::replace_view's iterator compatibility requirements in the section above for details.

As with boost::parser::replace, boost::parser::transform_replace can be flattened from a view of subranges into a view of elements by piping it to std::views::join. See the section on boost::parser::replace above for an example.
boost::parser::replace 一样, boost::parser::transform_replace 可以通过将其管道化到 std::views::join 中从子范围视图转换为元素视图。有关示例,请参阅上面的 boost::parser::replace 部分。

Just like boost::parser::replace and boost::parser::replace_view, boost::parser::transform_replace and boost::parser::transform_replace_view do silent transcoding of the result to the appropriate UTF, if applicable. If both r and f(std::move(attr)) are ranges of char, or are both the same UTF, no transcoding occurs. If one of r and f(std::move(attr)) is a range of char and the other is some UTF, the program is ill-formed.

boost::parser::transform_replace_view will move each attribute into f; f may move from the argument or copy it as desired. f may return an lvalue reference. If it does so, the address of the reference will be taken and stored within boost::parser::transform_replace_view. Otherwise, the value returned by f is moved into boost::parser::transform_replace_view. In either case, the value type of boost::parser::transform_replace_view is always a subrange.

boost::parser::transform_replace can be called with, and boost::parser::transform_replace_view can be constructed with, a skip parser or not, and you can always pass boost::parser::trace at the end of any of their overloads.

Boost.Parser was designed from the start to be Unicode friendly. There are numerous references to the "Unicode code path" and the "non-Unicode code path" in the Boost.Parser documentation. Though there are in fact two code paths for Unicode and non-Unicode parsing, the code is not very different in the two code paths, as they are written generically. The only difference is that the Unicode code path parses the input as a range of code points, and the non-Unicode path does not. In effect, this means that, in the Unicode code path, when you call parse(r, p) for some input range r and some parser p, the parse happens as if you called parse(r | boost::parser::as_utf32, p) instead. (Of course, it does not matter if r is a proper range, or an iterator/sentinel pair; those both work fine with boost::parser::as_utf32.)

Matching "characters" within Boost.Parser's parsers is assumed to be a code point match. In the Unicode path there is a code point from the input that is matched to each char_ parser. In the non-Unicode path, the encoding is unknown, and so each element of the input is considered to be a whole "character" in the input encoding, analogous to a code point. From this point on, I will therefore refer to a single element of the input exclusively as a code point.

So, let's say we write this parser:

constexpr auto char8_parser = boost::parser::char_('\xcc');

For any char_ parser that should match a value or values, the type of the value to match is retained. So char8_parser contains a char that it will use for matching. If we had written:

constexpr auto char32_parser = boost::parser::char_(U'\xcc');

char32_parser would instead contain a char32_t that it would use for matching.

So, at any point during the parse, if char8_parser were being used to match a code point next_cp from the input, we would see the moral equivalent of next_cp == '\xcc', and if char32_parser were being used to match next_cp, we'd see the equivalent of next_cp == U'\xcc'. The take-away here is that you can write char_ parsers that match specific values, without worrying if the input is Unicode or not because, under the covers, what takes place is a simple comparison of two integral values.

[Note] Note

Boost.Parser actually promotes any two values to a common type using std::common_type before comparing them. This almost always works because the input and any parameter passed to char_ must be character types.
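
Here is a tiny model that makes the note concrete. It uses a hypothetical toy_char_parser type, which is not Boost.Parser's char_ implementation. The stored value keeps the type it was written with, and a match is a comparison after converting both sides to std::common_type_t.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical model of a char_ parser that matches a single value.  The value
// keeps the type it was written with (char, char32_t, ...); matching converts
// both sides to their common type and compares two integral values.
template<typename T>
struct toy_char_parser
{
    T value;

    template<typename CodePoint>
    bool matches(CodePoint cp) const
    {
        using common = std::common_type_t<T, CodePoint>;
        return static_cast<common>(cp) == static_cast<common>(value);
    }
};
```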

Since matches are always done at a code point level (remember, a "code point" in the non-Unicode path is assumed to be a single char), you get different results trying to match UTF-8 input in the Unicode and non-Unicode code paths:

namespace bp = boost::parser;

{
    std::string str = (char const *)u8"\xcc\x80"; // encodes the code point U+0300
    auto first = str.begin();

    // Since we've done nothing to indicate that we want to do Unicode
    // parsing, and we've passed a range of char to parse(), this will do
    // non-Unicode parsing.
    std::string chars;
    assert(bp::parse(first, str.end(), *bp::char_('\xcc'), chars));

    // Finds one match of the *char* 0xcc, because the value in the parser
    // (0xcc) was matched against the two code points in the input (0xcc and
    // 0x80), and the first one was a match.
    assert(chars == "\xcc");
}
{
    std::u8string str = u8"\xcc\x80"; // encodes the code point U+0300
    auto first = str.begin();

    // Since the input is a range of char8_t, this will do Unicode
    // parsing.  The same thing would have happened if we passed
    // str | boost::parser::as_utf32 or even str | boost::parser::as_utf8.
    std::string chars;
    assert(bp::parse(first, str.end(), *bp::char_('\xcc'), chars));

    // Finds zero matches of the *code point* 0xcc, because the value in
    // the parser (0xcc) was matched against the single code point in the
    // input, 0x0300.
    assert(chars == "");
}
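
The two results differ only in what counts as a code point. The sketch below models both paths in plain C++ and assumes well-formed UTF-8 input. decode_utf8, bytes_as_code_points and leading_matches are hypothetical helpers: the first two stand in for the Unicode path (as_utf32) and the non-Unicode path, and leading_matches counts what *char_('\xcc') would consume.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Minimal UTF-8 -> UTF-32 decoder, a conceptual stand-in for
// `r | boost::parser::as_utf32`.  Assumes well-formed input; a real
// transcoder must also handle invalid and truncated sequences.
std::u32string decode_utf8(std::string_view s)
{
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        auto const b = static_cast<unsigned char>(s[i]);
        int const len = b < 0x80 ? 1 : b < 0xe0 ? 2 : b < 0xf0 ? 3 : 4;
        char32_t cp = len == 1 ? b : len == 2 ? (b & 0x1f) : len == 3 ? (b & 0x0f) : (b & 0x07);
        for (int k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3f);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Non-Unicode path model: every char is treated as its own "code point".
std::u32string bytes_as_code_points(std::string_view s)
{
    std::u32string out;
    for (char c : s)
        out.push_back(static_cast<unsigned char>(c));
    return out;
}

// Number of leading code points equal to `value`, i.e. what *char_(value)
// would consume from the front of the input.
std::size_t leading_matches(std::u32string_view cps, char32_t value)
{
    std::size_t n = 0;
    while (n < cps.size() && cps[n] == value)
        ++n;
    return n;
}
```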
Implicit transcoding

Additionally, it is expected that most programs will use UTF-8 for the encoding of Unicode strings. Boost.Parser is written with this typical case in mind. This means that if you are parsing 32-bit code points (as you always are in the Unicode path), and you want to catch the result in a container C of char or char8_t values, Boost.Parser will silently transcode from UTF-32 to UTF-8 and write the attribute into C. This means that std::string, std::u8string, etc. are fine to use as attribute out-parameters for *char_, and the result will be UTF-8.
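
The silent UTF-32 to UTF-8 step amounts to standard UTF-8 encoding of each code point. Here is a minimal sketch of that encoding, assuming the inputs are valid Unicode scalar values. append_utf8 and to_utf8 are hypothetical names, not Boost.Parser functions.

```cpp
#include <cassert>
#include <string>

// Append one code point to a UTF-8 std::string.  Assumes cp is a valid
// Unicode scalar value (at most 0x10FFFF and not a surrogate).
void append_utf8(std::string & out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xc0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xe0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3f));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    } else {
        out += static_cast<char>(0xf0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3f));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3f));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    }
}

// Encode a whole sequence of code points, as when a std::string catches the
// attribute of *char_ in the Unicode path.
std::string to_utf8(std::u32string const & cps)
{
    std::string out;
    for (char32_t cp : cps)
        append_utf8(out, cp);
    return out;
}
```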

[Note] Note

UTF-16 strings as attributes are not supported directly. If you want to use UTF-16 strings as attributes, you may need to do so by transcoding a UTF-8 or UTF-32 attribute to UTF-16 within a semantic action. You can do this by using boost::parser::as_utf16.
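
boost::parser::as_utf16 is the tool the note points to. To show what that conversion involves, here is a hand-rolled sketch of UTF-32 to UTF-16 that a semantic action could apply to a UTF-32 attribute. to_utf16 is a hypothetical helper, and it assumes valid Unicode scalar values as input.

```cpp
#include <cassert>
#include <string>

// Hypothetical UTF-32 -> UTF-16 conversion.  Code points below U+10000 map to
// one code unit; larger ones become a high/low surrogate pair.
std::u16string to_utf16(std::u32string const & cps)
{
    std::u16string out;
    for (char32_t cp : cps) {
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            char32_t const v = cp - 0x10000;
            out += static_cast<char16_t>(0xd800 + (v >> 10));   // high surrogate
            out += static_cast<char16_t>(0xdc00 + (v & 0x3ff));  // low surrogate
        }
    }
    return out;
}
```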

The treatment of strings as UTF-8 is nearly ubiquitous within Boost.Parser. For instance, though the entire interface of symbols uses std::string or std::string_view, UTF-32 comparisons are used internally.

Explicit transcoding

I mentioned above that the use of boost::parser::utf*_view as the range to parse opts you in to Unicode parsing. Here's a bit more about these views and how best to use them.

If you want to do Unicode parsing, you're always going to be comparing code points at each step of the parse. As such, you're going to implicitly convert any parse input to UTF-32, if needed. This is what all the parse API functions do internally.

However, there are times when you have parse input that is a sequence of UTF-8-encoded chars, and you want to do Unicode-aware parsing. As mentioned previously, Boost.Parser has a special case for char inputs, and it will not assume that char sequences are UTF-8. If you want to tell the parse API to do Unicode processing on them anyway, you can use the as_utf32 range adapter. (Note that you can use any of the as_utf* adaptors and the semantics will not differ from the semantics below.)

namespace bp = boost::parser;

auto const p = '"' >> *(bp::char_ - '"' - 0xb6) >> '"';
char const * str = "\"two wörds\""; // ö is two code units, 0xc3 0xb6

auto result_1 = bp::parse(str, p);                // Treat each char as a code point (typically ASCII).
assert(!result_1);
auto result_2 = bp::parse(str | bp::as_utf32, p); // Unicode-aware parsing on code points.
assert(result_2);

The first call to parse() treats each char as a code point, and since "ö" is the pair of code units 0xc3 0xb6, the parse matches the second code unit against the - 0xb6 part of the parser above, causing the parse to fail. This happens because each code unit/char in str is treated as an independent code point.

The second call to parse() succeeds because, when the parse gets to the code point for 'ö', it is 0xf6 (U+00F6), which does not match the - 0xb6 part of the parser.
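
Both code paths run the same generic code, so the difference comes down to the element type. The following sketch hand-writes '"' >> *(bp::char_ - '"' - 0xb6) >> '"' as a single function template and runs it over chars and over char32_t code points. It assumes the whole input must be consumed. parse_quoted and as_code_point are hypothetical, and a UTF-32 literal stands in for str | bp::as_utf32.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>

// Widen one input element to a code point.  A plain char is zero-extended, so
// the byte '\xb6' compares equal to 0xb6 regardless of char's signedness.
template<typename CharT>
char32_t as_code_point(CharT c)
{
    if constexpr (std::is_same_v<CharT, char>)
        return static_cast<unsigned char>(c);
    else
        return static_cast<char32_t>(c);
}

// Hand-written model of '"' >> *(char_ - '"' - 0xb6) >> '"' that must consume
// all of `in`.  The same code runs over bytes or over code points.
template<typename CharT>
bool parse_quoted(std::basic_string_view<CharT> in)
{
    std::size_t i = 0;
    if (i == in.size() || as_code_point(in[i]) != U'"')
        return false;
    ++i;
    while (i < in.size()) {
        char32_t const cp = as_code_point(in[i]);
        if (cp == U'"' || cp == 0xb6)
            break;  // the Kleene star stops here
        ++i;
    }
    if (i == in.size() || as_code_point(in[i]) != U'"')
        return false;
    return i + 1 == in.size();
}
```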

The other adaptors as_utf8 and as_utf16 are also provided for completeness, if you want to use them. They each can transcode any sequence of character types.

[Important] Important

The as_utfN adaptors are optional, so they don't come with parser.hpp. To get access to them, #include <boost/parser/transcode_view.hpp>.

(Lack of) normalization

One thing that Boost.Parser does not handle for you is normalization; Boost.Parser is completely normalization-agnostic. Since all the parsers do their matching using equality comparisons of code points, you should make sure that your parsed range and your parsers all use the same normalization form.
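
Here is a short illustration of why this matters. matches_literal is a hypothetical helper that does what a literal match does, namely compare code points one by one. "café" written with the precomposed U+00E9 and "café" written as e followed by U+0301 COMBINING ACUTE ACCENT look the same, yet only one of them matches.

```cpp
#include <cassert>
#include <string_view>

// A literal match is a code-point-by-code-point equality test; neither side
// is normalized first.
bool matches_literal(std::u32string_view input, std::u32string_view literal)
{
    return input.substr(0, literal.size()) == literal;
}
```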

In most parsing cases, being able to generate an attribute that represents the result of the parse, or being able to parse into such an attribute, is sufficient. Sometimes, it is not. If you need to parse a very large chunk of text, the generated attribute may be too large to fit in memory. In other cases, you may want to generate attributes sometimes, and not others. callback_rules exist for these kinds of uses. A callback_rule is just like a rule, except that it allows the rule's attribute to be returned to the caller via a callback, as long as the parse is started with a call to callback_parse() instead of parse(). Within a call to parse(), a callback_rule is identical to a regular rule.

For a rule with no attribute, the signature of a callback function is void (tag), where tag is the tag-type used when declaring the rule. For a rule with an attribute attr, the signature is void (tag, attr). For instance, with this rule:

boost::parser::callback_rule<struct foo_tag> foo = "foo";

this would be an appropriate callback function:

void foo_callback(foo_tag)
{
    std::cout << "Parsed a 'foo'!\n";
}

For this rule:

boost::parser::callback_rule<struct bar_tag, std::string> bar = "bar";

this would be an appropriate callback function:

void bar_callback(bar_tag, std::string const & s)
{
    std::cout << "Parsed a 'bar' containing " << s << "!\n";
}
[Important] Important

In the case of bar_callback(), we don't need to do anything with s besides insert it into a stream, so we took it as a const lvalue reference. Boost.Parser moves all attributes into callbacks, so the signature could also have been void bar_callback(bar_tag, std::string s) or void bar_callback(bar_tag, std::string && s).
bar_callback() 的情况下,我们除了将其插入流中之外,不需要对 s 做任何事情,所以我们将其视为 const 左值引用。Boost.Parser 将所有属性移动到回调中,因此签名也可以是 void bar_callback(bar_tag, std::string s)void bar_callback(bar_tag, std::string && s)

You opt into callback parsing by parsing with a call to callback_parse() instead of parse(). If you use callback_rules with parse(), they're just regular rules. This allows you to choose whether to do "normal" attribute-generating/attribute-assigning parsing with parse(), or callback parsing with callback_parse(), without rewriting much parsing code, if any.

The only reason all rules are not callback_rules is that you may want to have some rules use callbacks within a parse, and have some that do not. For instance, if you want to report the attribute of callback_rule r1 via callback, r1's implementation may use some rule r2 to generate some or all of its attribute.

See Parsing JSON With Callbacks for an extended example of callback parsing.

Error handling

Boost.Parser has good error reporting built into it. Consider what happens when we fail to parse at an expectation point (created using operator>). If I feed the parser from the Parsing JSON With Callbacks example a file called sample.json containing this input (note the unmatched '['):

{
    "key": "value",
    "foo": [, "bar": []
}

This is the error message that is printed to the terminal:

sample.json:3:12: error: Expected ']' here:
    "foo": [, "bar": []
            ^

That message is formatted like the diagnostics produced by Clang and GCC. It quotes the line on which the failure occurred, and even puts a caret under the exact position at which the parse failed. This error message is suitable for many kinds of end-users, and interoperates well with anything that supports Clang and/or GCC diagnostics.
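
To show what such a diagnostic is made of, here is a hand-rolled formatter. format_diagnostic is a hypothetical helper, not Boost.Parser's write_formatted_message(). It takes the failure position as an offset into the whole input, counts lines from 1, and counts columns from 0. The column convention matches the sample output above, which reports column 12 with the caret under the ','.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Sketch of a Clang/GCC-style diagnostic: "file:line:column: error: message",
// then the offending line, then a caret under the failure position.
std::string format_diagnostic(std::string_view filename,
                              std::string_view input,
                              std::size_t pos,
                              std::string_view message)
{
    std::size_t line = 1;
    std::size_t line_start = 0;
    for (std::size_t i = 0; i < pos; ++i) {
        if (input[i] == '\n') {
            ++line;
            line_start = i + 1;
        }
    }
    std::size_t line_end = input.find('\n', line_start);
    if (line_end == std::string_view::npos)
        line_end = input.size();
    std::size_t const column = pos - line_start;  // counted from 0

    std::string out;
    out += filename;
    out += ':' + std::to_string(line) + ':' + std::to_string(column);
    out += ": error: ";
    out += message;
    out += '\n';
    out += input.substr(line_start, line_end - line_start);  // quote the line
    out += '\n';
    out += std::string(column, ' ') + "^\n";                 // caret underneath
    return out;
}
```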

Most of Boost.Parser's error handlers format their diagnostics this way, though you are not bound by that. You can make an error handler type that does whatever you want, as long as it meets the error handler interface.

The Boost.Parser error handlers are:

  • default_error_handler: Produces formatted diagnostics like the one above, and prints them to std::cerr. default_error_handler has no associated file name, and both errors and diagnostics are printed to std::cerr. This handler is constexpr-friendly.
  • stream_error_handler: Produces formatted diagnostics. One or two streams may be used. If two are used, errors go to one stream and warnings go to the other. A file name can be associated with the parse; if it is, that file name will appear in all diagnostics.
  • callback_error_handler: Produces formatted diagnostics. Calls a callback with the diagnostic message to report the diagnostic, rather than streaming out the diagnostic. A file name can be associated with the parse; if it is, that file name will appear in all diagnostics. This handler is useful for recording the diagnostics in memory.
  • rethrow_error_handler: Does nothing but re-throw any exception that it is asked to handle. Its diagnose() member functions are no-ops.
  • vs_output_error_handler: Directs all errors and warnings to the debugging output panel inside Visual Studio. Available on Windows only. Probably does nothing useful when executed outside of Visual Studio.

You can set the error handler to any of these, or one of your own, using with_error_handler() (see The parse() API). If you do not set one, default_error_handler will be used.

How diagnostics are generated

Boost.Parser only generates error messages like the ones in this page at failed expectation points, like a > b, where you have successfully parsed a, but then cannot successfully parse b. This may seem limited to you. It's actually the best that we can do.

In order for error handling to happen other than at expectation points, we have to know that there is no further processing that might take place. This is true because Boost.Parser has P1 | P2 | ... | Pn parsers ("or_parsers"). If any one of these parsers Pi fails to match, it is not allowed to fail the parse — the next one (Pi+1) might match. If we get to the end of the alternatives of the or_parser and Pn fails, we still cannot fail the top-level parse, because the or_parser might be a subparser within a parent or_parser.

Ok, so what might we do? Perhaps we could at least indicate when we ran into end-of-input. But we cannot, for exactly the same reason already stated. For any parser P, reaching end-of-input is a failure for P, but not necessarily for the whole parse.

Perhaps we could record the farthest point ever reached during the parse, and report that at the top level if the top-level parser fails. That alone would be of little help without also knowing which parser was active when we reached that point, and recording that would require some sort of repeated memory allocation. In Boost.Parser the parser's progress point is stored exclusively on the stack, so by the time the top-level parse fails, all those far-reaching stack frames are long gone. Not the best.

Worse still, knowing how far you got in the parse and which parser was active is not very useful. Consider this.

namespace bp = boost::parser;
auto a_b = bp::char_('a') >> bp::char_('b');
auto c_b = bp::char_('c') >> bp::char_('b');
auto result = bp::parse("acb", a_b | c_b);

If we reported the farthest-reaching parser and its position, it would be the a_b parser, at position "cb" in the input. Is this really enlightening? Was the error in the input putting the 'a' at the beginning or putting the 'c' in the middle? If you point the user at a_b as the parser that failed, and never mention c_b, you are potentially just steering them in the wrong direction.
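
A few lines of plain C++ reproduce the arithmetic behind that example. reach and farthest_alternative are hypothetical helpers that model a "blame the farthest failure" policy, the policy this section argues against.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// How many chars of a fixed sequence match before it fails (or finishes).
std::size_t reach(std::string_view input, std::string_view sequence)
{
    std::size_t i = 0;
    while (i < sequence.size() && i < input.size() && input[i] == sequence[i])
        ++i;
    return i;
}

// Index of the alternative that got farthest into the input, i.e. the one a
// "report the farthest failure" strategy would blame.
std::size_t farthest_alternative(std::string_view input,
                                 std::vector<std::string_view> const & alternatives)
{
    std::size_t best = 0;
    for (std::size_t k = 1; k < alternatives.size(); ++k) {
        if (reach(input, alternatives[k]) > reach(input, alternatives[best]))
            best = k;
    }
    return best;
}
```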

All error messages must come from failed expectation points. Consider parsing JSON. If you open a list with '[', you know that you're parsing a list, and if the list is ill-formed, you'll get an error message saying so. If you open an object with '{', the same thing is possible — when missing the matching '}', you can tell the user, "That's not an object", and this is useful feedback. The same thing with a partially parsed number, etc. If the JSON parser does not build in expectations like matched braces and brackets, how can Boost.Parser know that a missing '}' is really a problem, and that no later parser will match the input even without the '}'?

[Important] Important

The bottom line is that you should build expectation points into your parsers using operator> as much as possible.
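
You can model the difference an expectation point makes by hand. The sketch below is not how Boost.Parser is implemented. Failing to see the opening '[' is an ordinary failure that would let an enclosing alternative try something else. Everything after the '[' behaves like a > b: a mismatch there can only be an error, so it throws with the exact position. toy_expectation_failure and parse_list are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Thrown when input that has committed to a construct breaks off, loosely
// modeled on a failed expectation point.
struct toy_expectation_failure : std::runtime_error
{
    std::size_t pos;
    toy_expectation_failure(std::size_t p, std::string const & what)
        : std::runtime_error("Expected " + what + " here"), pos(p) {}
};

// Parses "[" [digits ("," digits)*] "]" starting at i, advancing i.
// No '[' => return false (recoverable); after '[' => throw on mismatch.
bool parse_list(std::string_view in, std::size_t & i)
{
    if (i >= in.size() || in[i] != '[')
        return false;
    ++i;
    auto digits = [&] {
        std::size_t const start = i;
        while (i < in.size() && in[i] >= '0' && in[i] <= '9')
            ++i;
        return i != start;
    };
    if (digits()) {
        while (i < in.size() && in[i] == ',') {
            ++i;
            if (!digits())
                throw toy_expectation_failure(i, "a number");
        }
    }
    if (i >= in.size() || in[i] != ']')
        throw toy_expectation_failure(i, "']'");
    ++i;
    return true;
}
```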

Using error handlers in semantic actions

You can get access to the error handler within any semantic action by calling _error_handler(ctx) (see The Parse Context). Any error handler must have the following member functions:

template<typename Context, typename Iter>
void diagnose(
    diagnostic_kind kind,
    std::string_view message,
    Context const & context,
    Iter it) const;

template<typename Context>
void diagnose(
    diagnostic_kind kind,
    std::string_view message,
    Context const & context) const;

If you call the second one, the one without the iterator parameter, it will call the first with _where(context).begin() as the iterator parameter. The one without the iterator is the one you will use most often. The one with the explicit iterator parameter can be useful in situations where you have messages that are related to each other, associated with multiple locations. For instance, if you are parsing XML, you may want to report that a close-tag does not match its associated open-tag by showing the line where the open-tag was found. That may of course not be located anywhere near _where(ctx).begin(). (A description of _globals() is below.)

[](auto & ctx) {
    // Assume we have a std::vector of open tags, and another
    // std::vector of iterators to where the open tags were parsed, in our
    // globals.
    if (_attr(ctx) != _globals(ctx).open_tags.back()) {
        std::string open_tag_msg =
            "Previous open-tag \"" + _globals(ctx).open_tags.back() + "\" here:";
        _error_handler(ctx).diagnose(
            boost::parser::diagnostic_kind::error,
            open_tag_msg,
            ctx,
            _globals(ctx).open_tag_positions.back());
        std::string close_tag_msg =
            "does not match close-tag \"" + _attr(ctx) + "\" here:";
        _error_handler(ctx).diagnose(
            boost::parser::diagnostic_kind::error,
            close_tag_msg,
            ctx);

        // Explicitly fail the parse.  Diagnostics do not affect parse success.
        _pass(ctx) = false;
    }
}
_report_error() and _report_warning()

There are also some convenience functions that make the above code a little less verbose, _report_error() and _report_warning():

[](auto & ctx) {
    // Assume we have a std::vector of open tags, and another
    // std::vector of iterators to where the open tags were parsed, in our
    // globals.
    if (_attr(ctx) != _globals(ctx).open_tags.back()) {
        std::string open_tag_msg =
            "Previous open-tag \"" + _globals(ctx).open_tags.back() + "\" here:";
        _report_error(ctx, open_tag_msg, _globals(ctx).open_tag_positions.back());
        std::string close_tag_msg =
            "does not match close-tag \"" + _attr(ctx) + "\" here:";
        _report_error(ctx, close_tag_msg);

        // Explicitly fail the parse.  Diagnostics do not affect parse success.
        _pass(ctx) = false;
    }
}

You should use these less verbose functions almost all the time. The only time you would want to use _error_handler() directly is when you are using a custom error handler, and you want access to some part of its interface besides diagnose().

Though there is support for reporting warnings using the functions above, none of the error handlers supplied by Boost.Parser will ever report a warning. Warnings are strictly for user code.

For more information on the rest of the error handling and diagnostic API, see the header reference pages for error_handling_fwd.hpp and error_handling.hpp.

Creating your own error handler

Creating your own error handler is pretty easy; you just need to implement three member functions. Say you want an error handler that writes diagnostics to a file. Here's how you might do that.

struct logging_error_handler
{
    logging_error_handler() {}
    logging_error_handler(std::string_view filename) :
        filename_(filename), ofs_(filename_)
    {
        if (!ofs_)
            throw std::runtime_error("Could not open file.");
    }

    // This is the function called by Boost.Parser after a parser fails the
    // parse at an expectation point and throws a parse_error.  It is expected
    // to create a diagnostic message, and put it where it needs to go.  In
    // this case, we're writing it to a log file.  This function returns a
    // bp::error_handler_result, which is an enum with two enumerators -- fail
    // and rethrow.  Returning fail fails the top-level parse; returning
    // rethrow just re-throws the parse_error exception that got us here in
    // the first place.
    template<typename Iter, typename Sentinel>
    bp::error_handler_result
    operator()(Iter first, Sentinel last, bp::parse_error<Iter> const & e) const
    {
        bp::write_formatted_expectation_failure_error_message(
            ofs_, filename_, first, last, e);
        return bp::error_handler_result::fail;
    }

    // This function is for users to call within a semantic action to produce
    // a diagnostic.
    template<typename Context, typename Iter>
    void diagnose(
        bp::diagnostic_kind kind,
        std::string_view message,
        Context const & context,
        Iter it) const
    {
        bp::write_formatted_message(
            ofs_,
            filename_,
            bp::_begin(context),
            it,
            bp::_end(context),
            message);
    }

    // This is just like the other overload of diagnose(), except that it
    // determines the Iter parameter for the other overload by calling
    // _where(ctx).
    template<typename Context>
    void diagnose(
        bp::diagnostic_kind kind,
        std::string_view message,
        Context const & context) const
    {
        diagnose(kind, message, context, bp::_where(context).begin());
    }

    std::string filename_;
    mutable std::ofstream ofs_;
};

That's it. You just need to do the important work of the error handler in its call operator, and then implement the two overloads of diagnose() that it must provide for use inside semantic actions. The default implementation of these is even available as the free function write_formatted_message(), so you can just call that, as you see above. Here's how you might use it.

#include <iostream>

int main()
{
    std::cout << "Enter a list of integers, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    constexpr auto parser = bp::int_ >> *(',' > bp::int_);
    logging_error_handler error_handler("parse.log");
    auto const result = bp::parse(input, bp::with_error_handler(parser, error_handler));

    if (result) {
        std::cout << "It looks like you entered:\n";
        for (int x : *result) {
            std::cout << x << "\n";
        }
    }
}

We just define a logging_error_handler and pass it by reference to with_error_handler(), which decorates the top-level parser with the error handler. We could not have written bp::with_error_handler(parser, logging_error_handler("parse.log")), because with_error_handler() does not accept rvalues. This is because the error handler eventually goes into the parse context, and the parse context stores only pointers and iterators, which keeps it cheap to copy.
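The reference-only rule can be shown with a small standalone model. In this sketch, `handler_ref` and `with_handler` are hypothetical names and not Boost.Parser's implementation. The decorated object stores only a pointer to the handler, so it stays cheap to copy. An overload taking `Handler const &&` is deleted, so passing a temporary fails to compile. `std::ref` rejects rvalues with the same kind of deleted overload.

```cpp
#include <cassert>
#include <utility>

// Toy stand-in for a parser decorated with an error handler. Like the
// parse context described above, it holds only a pointer to the handler.
template<typename Parser, typename Handler>
struct handler_ref
{
    Parser parser;
    Handler * handler; // non-owning; the caller must keep the handler alive
};

// Lvalues are accepted: the caller's variable owns the handler.
template<typename Parser, typename Handler>
handler_ref<Parser, Handler> with_handler(Parser p, Handler & h)
{
    return {std::move(p), &h};
}

// Rvalues (including const rvalues) are rejected at compile time, so a
// temporary handler can never end up behind the stored pointer.
template<typename Parser, typename Handler>
void with_handler(Parser, Handler const &&) = delete;
```

With this shape, `with_handler(p, handler)` compiles. Something like `with_handler(p, H{})` is rejected during overload resolution.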

If we run the example and give it the input "1,", this shows up in the log file:

parse.log:1:2: error: Expected int_ here (end of input):
1,
  ^
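The log line above follows the common `file:line:column: error: message:` layout. It then repeats the offending input line with a caret under the failure position.

The sketch below is a rough, hypothetical model of that layout. `format_diagnostic` is not Boost.Parser's `write_formatted_message()`, which may handle details this toy ignores. It reports a 1-based line and a zero-based column offset within that line. The column convention is inferred from the sample above, where the end of `"1,"` (offset 2) is printed as `1:2`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper: formats a diagnostic for offset `pos` in `input`.
// Requires pos <= input.size(); pos == input.size() means "end of input".
std::string format_diagnostic(std::string_view filename,
                              std::string_view input,
                              std::size_t pos,
                              std::string_view message)
{
    // The line containing `pos` starts just after the last '\n' before it.
    std::string_view before = input.substr(0, pos);
    std::size_t nl = before.rfind('\n');
    std::size_t line_start = nl == std::string_view::npos ? 0 : nl + 1;
    std::size_t line_end = input.find('\n', pos);
    if (line_end == std::string_view::npos)
        line_end = input.size();

    std::size_t line = 1; // 1-based line number
    for (char c : before)
        if (c == '\n')
            ++line;
    std::size_t column = pos - line_start; // zero-based offset in the line

    std::ostringstream os;
    os << filename << ':' << line << ':' << column << ": error: " << message
       << ":\n"
       << input.substr(line_start, line_end - line_start) << '\n'
       << std::string(column, ' ') << "^\n";
    return os.str();
}
```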
Fixing ill-formed code

Sometimes, during the writing of a parser, you make a simple mistake that is diagnosed horrifyingly, due to the high number of template instantiations between the line you just wrote and the point of use (usually, the call to parse()). By "sometimes", I mean "almost always and many, many times". Boost.Parser has a workaround for situations like this. The workaround is to make the ill-formed code well-formed in as many circumstances as possible, and then do a runtime assert instead.

Usually, C++ programmers try whenever they can to catch mistakes as early as they can. That usually means making as much bad code ill-formed as possible. Counter-intuitively, this does not work well in parser combinator situations. For an example of just how dramatically different these two debugging scenarios can be with Boost.Parser, please see the very long discussion in the none is weird section of Rationale.

If you are morally opposed to this approach, or just hate fun, good news: you can turn off the use of this technique entirely by defining BOOST_PARSER_NO_RUNTIME_ASSERTIONS.
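The idea can be illustrated outside Boost.Parser. In the sketch below, `to_int_attr` and the macro `DEMO_NO_RUNTIME_ASSERTIONS` are hypothetical. The macro only plays the role that `BOOST_PARSER_NO_RUNTIME_ASSERTIONS` plays in the real library.

A call with an attribute type that cannot be converted stays well-formed and fails with an assertion when executed. If the macro is defined, the same mistake becomes an ordinary compile-time error instead.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical conversion step inside a parser: turn an attribute into an int.
template<typename Attr>
int to_int_attr(Attr const & attr)
{
    if constexpr (std::is_convertible_v<Attr const &, int>) {
        return static_cast<int>(attr);
    } else {
#ifdef DEMO_NO_RUNTIME_ASSERTIONS
        // Classic approach: diagnose at compile time, possibly deep inside
        // a long chain of template instantiations.
        static_assert(std::is_convertible_v<Attr const &, int>,
                      "attribute is not convertible to int");
        return 0;
#else
        // Ill-formed-code-as-assert: compile fine, fail loudly at runtime.
        assert(!"attribute is not convertible to int");
        return 0;
#endif
    }
}
```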

Runtime debugging

Debugging parsers is hard. Any parser above a certain complexity level is nearly impossible to debug simply by looking at the parser's code. Stepping through the parse in a debugger is even worse. To provide a reasonable chance of debugging your parsers, Boost.Parser has a trace mode that you can turn on simply by providing an extra parameter to parse() or callback_parse():

boost::parser::parse(input, parser, boost::parser::trace::on);

Every overload of parse() and callback_parse() takes this final parameter, which is defaulted to boost::parser::trace::off.

If we trace a substantial parser, we will see a lot of output. Each code point of the input must be considered, one at a time, to see if a certain rule matches. As an example, let's trace a parse using the JSON parser from Parsing JSON. The input is "null". null is one of the types that a JavaScript value can have; the top-level parser in the JSON parser example is:

auto const value_p_def =
    number | bp::bool_ | null | string | array_p | object_p;

So, a JSON value can be a number, or a Boolean, a null, etc. During the parse, each alternative will be tried in turn, until one is matched. I picked null because it is relatively close to the beginning of the value_p_def alternative parser. Even so, the output is pretty huge. Let's break it down as we go:

[begin value; input="null"]

Each parser is traced as [begin foo; ...], then the parsing operations themselves, and then [end foo; ...]. The name of a rule is used as its name in the begin and end parts of the trace. Non-rules have a name that is similar to the way the parser looked when you wrote it. Most lines will have the next few code points of the input quoted, as we have here (input="null").

[begin number | bool_ | null | string | ...; input="null"]

This shows the beginning of the parser inside the rule value — the parser that actually does all the work. In the example code, this parser is called value_p_def. Since it isn't a rule, we have no name for it, so we show its implementation in terms of subparsers. Since it is a bit long, we don't print the entire thing. That's why that ellipsis is there.

[begin number; input="null"]
  [begin raw[lexeme[ >> ...]][<<action>>]; input="null"]

Now we're starting to see the real work being done. number is a somewhat complicated parser that does not match "null", so there's a lot to wade through when following the trace of its attempt to do so. One thing to note is that, since we cannot print a name for an action, we just print "<<action>>". Something similar happens when we come to an attribute that we cannot print, because it has no stream insertion operation. In that case, "<<unprintable-value>>" is printed.

    [begin raw[lexeme[ >> ...]]; input="null"]
      [begin lexeme[-char_('-') >> char_('1', '9') >> ... | ... >> ...]; input="null"]
        [begin -char_('-') >> char_('1', '9') >> *digit | char_('0') >> -(char_('.') >> ...) >> -( >> ...); input="null"]
          [begin -char_('-'); input="null"]
            [begin char_('-'); input="null"]
              no match
            [end char_('-'); input="null"]
            matched ""
            attribute: <<empty>>
          [end -char_('-'); input="null"]
          [begin char_('1', '9') >> *digit | char_('0'); input="null"]
            [begin char_('1', '9') >> *digit; input="null"]
              [begin char_('1', '9'); input="null"]
                no match
              [end char_('1', '9'); input="null"]
              no match
            [end char_('1', '9') >> *digit; input="null"]
            [begin char_('0'); input="null"]
              no match
            [end char_('0'); input="null"]
            no match
          [end char_('1', '9') >> *digit | char_('0'); input="null"]
          no match
        [end -char_('-') >> char_('1', '9') >> *digit | char_('0') >> -(char_('.') >> ...) >> -( >> ...); input="null"]
        no match
      [end lexeme[-char_('-') >> char_('1', '9') >> ... | ... >> ...]; input="null"]
      no match
    [end raw[lexeme[ >> ...]]; input="null"]
    no match
  [end raw[lexeme[ >> ...]][<<action>>]; input="null"]
  no match
[end number; input="null"]
[begin bool_; input="null"]
  no match
[end bool_; input="null"]

number and boost::parser::bool_ did not match, but null will:

[begin null; input="null"]
  [begin "null" >> attr(null); input="null"]
    [begin "null"; input="null"]
      [begin string("null"); input="null"]
        matched "null"
        attribute:
      [end string("null"); input=""]
      matched "null"
      attribute: null

Finally, this parser actually matched, and the match generated the attribute null, which is a special value of the type json::value. Because the string literal "null" generates no attribute of its own, there was no attribute until we reached the attr(null) parser.

        [end "null"; input=""]
        [begin attr(null); input=""]
          matched ""
          attribute: null
        [end attr(null); input=""]
        matched "null"
        attribute: null
      [end "null" >> attr(null); input=""]
      matched "null"
      attribute: null
    [end null; input=""]
    matched "null"
    attribute: null
  [end number | bool_ | null | string | ...; input=""]
  matched "null"
  attribute: null
[end value; input=""]
--------------------
parse succeeded
--------------------

At the very end of the parse, the trace code prints out whether the top-level parse succeeded or failed.
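In the trace above, the alternative parser tries `number`, then `bool_`, then `null`, and stops at the first one that matches. Each failed branch leaves the input where it was.

The standalone toy below models that ordered choice. It is a simplification, not Boost.Parser code: `value`, `number`, `boolean`, `null_`, and `lit` are hypothetical, and the toy `number` accepts only runs of decimal digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// A toy parser: on success it advances `in` and returns the matched text;
// on failure it leaves `in` unchanged and returns std::nullopt.
using toy_result = std::optional<std::string>;

toy_result lit(std::string_view & in, std::string_view word)
{
    if (in.substr(0, word.size()) != word)
        return std::nullopt;
    in.remove_prefix(word.size());
    return std::string(word);
}

toy_result number(std::string_view & in)
{
    std::size_t n = 0;
    while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n])))
        ++n;
    if (n == 0)
        return std::nullopt;
    std::string digits(in.substr(0, n));
    in.remove_prefix(n);
    return digits;
}

toy_result boolean(std::string_view & in)
{
    if (auto r = lit(in, "true"))
        return r;
    return lit(in, "false");
}

toy_result null_(std::string_view & in) { return lit(in, "null"); }

// Ordered choice: try each alternative in turn, restoring the input
// position after each failed attempt, and stop at the first success.
toy_result value(std::string_view & in)
{
    for (auto alternative : {number, boolean, null_}) {
        std::string_view saved = in;
        if (auto r = alternative(in))
            return r;
        in = saved;
    }
    return std::nullopt;
}
```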

Some things to be aware of when looking at Boost.Parser trace output:

  • There are some parsers you don't know about, because they are not directly documented. For instance, p[a] forms an action_parser containing the parser p and semantic action a. This is essentially an implementation detail, but unfortunately the trace output does not hide this from you.
  • For a parser p, the trace-name may be intentionally different from the actual structure of p. For example, in the trace above, you see a parser called simply "null". This parser is actually boost::parser::omit[boost::parser::string("null")], but what you typically write is just "null", so that's the name used. There are two special cases like this: the one described here for omit[string], and another for omit[char_].
  • Since there are no other special cases for how parser names are printed, you may see parsers that are unlike what you wrote in your code. In the sections about the parsers and combining operations, you will sometimes see a parser or combining operation described in terms of an equivalent parser. For example, if_(pred)[p] is described as "Equivalent to eps(pred) >> p". In a trace, you will not see if_; you will see eps and p instead.
  • The values of arguments passed to parsers are printed whenever possible. Sometimes, a parse argument is not a value itself, but a callable that produces that value. In these cases, you'll see the resolved value of the parse argument.

Memory Allocation

Boost.Parser seldom allocates memory. The exceptions to this are:

  • symbols allocates memory for the symbol/attribute pairs it contains. If symbols are added during the parse, allocations must also occur then. The data structure used by symbols is also a trie, which is a node-based tree. So, lots of allocations are likely if you use symbols.
  • The error handlers that can take a file name allocate memory for the file name, if one is provided.
  • If trace is turned on by passing boost::parser::trace::on to a top-level parsing function, the names of parsers are allocated.
  • When a failed expectation is encountered (using operator>), the name of the failed parser is placed into a std::string, which will usually cause an allocation.
  • string()'s attribute is a std::string, the use of which implies allocation. You can avoid this allocation by explicitly using a different string type for the attribute that does not allocate.
  • The attribute for repeat(p) in all its forms, including operator*, operator+, and operator%, is std::vector<ATTR(p)>, the use of which implies allocation. You can avoid this allocation by explicitly using a different sequence container for the attribute that does not allocate. boost::container::static_vector or C++26's std::inplace_vector may be useful as such replacements.

With the exception of allocating the name of the parser that was expected in a failed expectation situation, Boost.Parser does not allocate unless you tell it to: by using symbols, using a particular error_handler, turning on trace, or parsing into attributes that allocate.
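A fixed-capacity container keeps its elements in inline storage, so appending never touches the heap. That is the property that makes `boost::container::static_vector` or `std::inplace_vector` useful as attributes here.

The toy `small_vec` below is hypothetical and much simpler than either of those; it only illustrates the storage strategy. This sketch does not claim to meet whatever container requirements Boost.Parser places on attribute types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-capacity sequence: elements live in an in-object
// std::array, so push_back never allocates. Unlike the real containers,
// this toy requires T to be default-constructible.
template<typename T, std::size_t Capacity>
class small_vec
{
public:
    using value_type = T;

    void push_back(T const & value)
    {
        if (size_ == Capacity)
            throw std::length_error("small_vec capacity exceeded");
        data_[size_++] = value;
    }

    std::size_t size() const { return size_; }
    T const & operator[](std::size_t i) const { return data_[i]; }
    T const * begin() const { return data_.data(); }
    T const * end() const { return data_.data() + size_; }

private:
    std::array<T, Capacity> data_{};
    std::size_t size_ = 0;
};
```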

Best Practices

Parse Unicode from the start

If you want to parse ASCII, using the Unicode parsing API will not actually cost you anything. Your input will be parsed, char by char, and compared to values that are Unicode code points (which are char32_ts). One caveat is that there may be an extra branch on each char, if the input is UTF-8. If your performance requirements can tolerate this, your life will be much easier if you just start with Unicode and stick with it.

Starting with Unicode support and UTF-8 input will allow you to properly handle unexpected input, like non-ASCII languages (that's most of them), with no additional effort on your part.

Write rules, and test them in isolation

Treat rules as the unit of work in your parser. Write a rule, test its corners, and then use it to build larger rules or parsers. This allows you to get better coverage with less work, since exercising all the code paths of your rules, one by one, keeps the combinatorial number of paths through your code manageable.
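As a small illustration of rule-at-a-time testing, the standalone sketch below writes a `hex4` "rule" that matches exactly four hex digits. It tests that rule's corner cases directly, and only then builds `unicode_escape` on top of it.

The names and the hand-written parsing functions are hypothetical stand-ins for Boost.Parser rules. The point is the testing order, not the parsing technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical small "rule": exactly four hex digits -> code unit.
// On failure the input is left untouched.
std::optional<std::uint32_t> hex4(std::string_view & in)
{
    if (in.size() < 4)
        return std::nullopt;
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        char c = in[i];
        std::uint32_t digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return std::nullopt;
        value = value * 16 + digit;
    }
    in.remove_prefix(4);
    return value;
}

// A larger rule built from the tested one: "\u" followed by hex4.
std::optional<std::uint32_t> unicode_escape(std::string_view & in)
{
    if (in.substr(0, 2) != "\\u")
        return std::nullopt;
    std::string_view rest = in.substr(2);
    auto value = hex4(rest);
    if (value)
        in = rest;
    return value;
}

// Corner cases for hex4 on its own.
void test_hex4()
{
    std::string_view ok = "00e9!";
    assert(hex4(ok) == 0xe9u && ok == "!");
    std::string_view upper = "FFFF";
    assert(hex4(upper) == 0xffffu && upper.empty());
    std::string_view short_in = "12";
    assert(!hex4(short_in) && short_in == "12");
    std::string_view bad = "12g4";
    assert(!hex4(bad) && bad == "12g4");
}

// Then the composed rule, which only needs its own edge cases checked.
void test_unicode_escape()
{
    std::string_view in = "\\u00e9x";
    assert(unicode_escape(in) == 0xe9u && in == "x");
    std::string_view no_hex = "\\uzzzz";
    assert(!unicode_escape(no_hex) && no_hex == "\\uzzzz");
}
```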

Prefer auto-generated attributes to semantic actions

There are multiple ways to get attributes out of a parser. You can:

  • use whatever attribute the parser generates;
  • provide an attribute out-argument to parse() for the parser to fill in;
  • use one or more semantic actions to assign attributes from the parser to variables outside the parser;
  • use callback parsing to provide attributes via callback calls.

All of these are fairly similar in how much effort they require, except for the semantic action method. With semantic actions, you need variables outside the parser for the actions to fill in, and you must keep those variables in scope for the duration of the parse.

It is much more straightforward, and leads to more reusable parsers, to have the parsers produce the attributes of the parse directly as the result of the parse.

This does not mean that you should never use semantic actions. They are sometimes necessary. However, you should default to using the other non-semantic action methods, and only use semantic actions with a good reason.
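The difference in ergonomics can be seen without Boost.Parser. Both functions below parse a comma-separated list of non-negative integers; the names and the simplified grammar are assumptions for illustration.

- `parse_ints_with_action` reports each value through a callback that writes into a variable the caller declares and keeps alive for the whole parse. This is roughly the position a semantic action puts you in.
- The hypothetical `parse_ints` simply returns its result.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>
#include <vector>

// Action style: calls `action` for each integer; returns false on bad input.
bool parse_ints_with_action(std::string_view in,
                            std::function<void(int)> const & action)
{
    while (true) {
        std::size_t n = 0;
        int value = 0;
        while (n < in.size() &&
               std::isdigit(static_cast<unsigned char>(in[n]))) {
            value = value * 10 + (in[n] - '0');
            ++n;
        }
        if (n == 0)
            return false; // expected an integer
        action(value);
        in.remove_prefix(n);
        if (in.empty())
            return true;
        if (in.front() != ',')
            return false; // expected ',' or end of input
        in.remove_prefix(1);
    }
}

// Attribute style: the caller just receives the value (or nothing).
std::optional<std::vector<int>> parse_ints(std::string_view in)
{
    std::vector<int> result;
    if (!parse_ints_with_action(in, [&](int x) { result.push_back(x); }))
        return std::nullopt;
    return result;
}
```

One practical difference shows up on input such as `"1,,2"`. The callback has already fired for `1` before the parse fails, so an action-style caller is left with partial state, whereas `parse_ints` just returns `std::nullopt`.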

If your parser takes end-user input, give rules names that you would want an end-user to see

A typical error message produced by Boost.Parser will say something like, "Expected FOO here", where FOO is some rule or parser. Give your rules names that will read well in error messages like this. For instance, the JSON examples have these rules:

bp::rule<class escape_seq, uint32_t> const escape_seq =
    "\\uXXXX hexadecimal escape sequence";
bp::rule<class escape_double_seq, uint32_t, double_escape_locals> const
    escape_double_seq = "\\uXXXX hexadecimal escape sequence";
bp::rule<class single_escaped_char, uint32_t> const single_escaped_char =
    "'\"', '\\', '/', 'b', 'f', 'n', 'r', or 't'";

Some things to note:

- escape_seq and escape_double_seq have the same name-string. To an end-user who is trying to figure out why their input failed to parse, it doesn't matter which kind of result a parser rule generates. They just want to know how to fix their input. For either rule, the fix is the same: put a hexadecimal escape sequence there.

- single_escaped_char has a terrible-looking name. However, it's not really used as a name anywhere per se. In error messages, though, it works nicely. The error will be "Expected '"', '\', '/', 'b', 'f', 'n', 'r', or 't' here", which is pretty helpful.
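The backslashes in that rule name are C++ escapes, so the text that reaches an error message differs from its source spelling. The short check below shows the string a user would actually read. It assumes an error message of the form "Expected NAME here", as described above. `expected_message` is a hypothetical helper, not a Boost.Parser function.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Same literal as the single_escaped_char rule name above.
constexpr std::string_view single_escaped_char_name =
    "'\"', '\\', '/', 'b', 'f', 'n', 'r', or 't'";

// Hypothetical helper producing the "Expected FOO here" wording.
std::string expected_message(std::string_view rule_name)
{
    return "Expected " + std::string(rule_name) + " here";
}
```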

Have a simple test that you can run to find ill-formed-code-as-asserts

Most of these errors are found at parser construction time, so no actual parsing is even necessary. For instance, a test case might look like this:

TEST(my_parser_tests, my_rule_test) {
    my_rule r;
}

Writing Your Own Parsers

You should probably never need to write your own low-level parser. You have primitives like char_ from which to build up the parsers that you need. It is unlikely that you're going to need to do things on a lower level than a single character.

However, some people are obsessed with writing everything for themselves. We call them C++ programmers. This section is for them. That said, this section is not an in-depth tutorial. It is a basic orientation to get you familiar enough with all the moving parts of writing a parser that you can then learn by reading the Boost.Parser code.

Each parser must provide two overloads of a function call(). One overload parses, producing an attribute (which may be the special no-attribute type detail::nope). The other one parses, filling in a given attribute. The type of the given attribute is a template parameter, so it can take any type that you can form a reference to.

Let's take a look at a Boost.Parser parser, opt_parser. This is the parser produced by use of operator-. First, here is the beginning of its definition.

template<typename Parser>
struct opt_parser
{

The end of its definition is:

    Parser parser_;
};

As you can see, opt_parser's only data member is the parser it adapts, parser_. Here is its attribute-generating overload of call().

template<
    typename Iter,
    typename Sentinel,
    typename Context,
    typename SkipParser>
auto call(
    Iter & first,
    Sentinel last,
    Context const & context,
    SkipParser const & skip,
    detail::flags flags,
    bool & success) const
{
    using attr_t = decltype(parser_.call(
        first, last, context, skip, flags, success));
    detail::optional_of<attr_t> retval;
    call(first, last, context, skip, flags, success, retval);
    return retval;
}

First, let's look at the template and function parameters.

  • Iter & first is the iterator. It is taken as an out-param. It is the responsibility of call() to advance first if and only if the parse succeeds.
  • Sentinel last is the sentinel. If the parse has not yet succeeded within call(), and first == last is true, call() must fail (by setting bool & success to false).
  • Context const & context is the parse context. It will be some specialization of detail::parse_context. The context is used in any call to a subparser's call(), and in some cases a new context should be created, and the new context passed to a subparser instead; more on that below.
  • SkipParser const & skip is the current skip parser. skip should be used at the beginning of the parse, and in between any two uses of any subparser(s).
  • detail::flags flags are a collection of flags indicating various things about the current state of the parse. flags is concerned with whether to produce attributes at all; whether to apply the skip parser skip; whether to produce a verbose trace (as when boost::parser::trace::on is passed at the top level); and whether we are currently inside the utility function detail::apply_parser.
  • bool & success is the final function parameter. It should be set to true if the parse succeeds, and false otherwise.

Now the body of the function. Notice that it just dispatches to the other call() overload. This is really common, since both overloads need to do the same parsing; only the attribute may differ.

The first line of the body defines attr_t, the default attribute type of our wrapped parser parser_. It does this by getting the decltype() of a use of parser_.call(). (This is the logic represented by ATTR() in the rest of the documentation.) Since opt_parser represents an optional value, the natural type for its attribute is std::optional<ATTR(parser)>. However, this does not work for all cases. In particular, it does not work for the "no-attribute" type detail::nope, nor for std::optional<T>, since ATTR(--p) is just ATTR(-p). So the second line uses an alias that takes care of those details, detail::optional_of<>.

The third line just calls the other overload of call(), passing retval as the out-param. Finally, retval is returned on the last line.
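The collapsing behavior attributed to `detail::optional_of<>` can be sketched as an ordinary type trait. This is a hypothetical re-creation for illustration; `nope` and `optional_of_t` here are stand-ins, not the library's definitions. The trait follows three rules:

- A no-attribute type stays a no-attribute type.
- An attribute that is already a `std::optional` is not wrapped again.
- Anything else becomes `std::optional<T>`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>

struct nope {}; // stand-in for the "no attribute" type

template<typename T>
struct optional_of { using type = std::optional<T>; };

template<>
struct optional_of<nope> { using type = nope; };

template<typename T>
struct optional_of<std::optional<T>> { using type = std::optional<T>; };

template<typename T>
using optional_of_t = typename optional_of<T>::type;

// ATTR(-p) for a few attribute types of p:
static_assert(std::is_same_v<optional_of_t<int>, std::optional<int>>);
static_assert(std::is_same_v<optional_of_t<nope>, nope>);
// ATTR(--p) is the same as ATTR(-p):
static_assert(std::is_same_v<optional_of_t<optional_of_t<std::string>>,
                             std::optional<std::string>>);
```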

Now, on to the other overload.

template<
    typename Iter,
    typename Sentinel,
    typename Context,
    typename SkipParser,
    typename Attribute>
void call(
    Iter & first,
    Sentinel last,
    Context const & context,
    SkipParser const & skip,
    detail::flags flags,
    bool & success,
    Attribute & retval) const
{
    [[maybe_unused]] auto _ = detail::scoped_trace(
        *this, first, last, context, flags, retval);

    detail::skip(first, last, skip, flags);

    if (!detail::gen_attrs(flags)) {
        parser_.call(first, last, context, skip, flags, success);
        success = true;
        return;
    }

    parser_.call(first, last, context, skip, flags, success, retval);
    success = true;
}

The template and function parameters here are identical to the ones from the other overload, except that we have Attribute & retval, our out-param.

Let's look at the implementation a bit at a time.

[[maybe_unused]] auto _ = detail::scoped_trace(
    *this, first, last, context, flags, retval);

This defines a RAII trace object that will produce the verbose trace requested by the user if they passed boost::parser::trace::on to the top-level parse. It only has effect if detail::enable_trace(flags) is true. If trace is enabled, it will show the state of the parse at the point at which it is defined, and then again when it goes out of scope.

Important

For the tracing code to work, you must define an overload of detail::print_parser for your new parser type/template. See <boost/parser/detail/printing.hpp> for examples.
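The RAII idea behind `detail::scoped_trace` can be sketched with a toy tracer. Its constructor prints a begin line and its destructor prints the matching end line, so every exit path is traced, including an early `return` like the one in the no-attribute branch.

Everything here is hypothetical: `toy_scoped_trace`, `toy_parse`, and the two-space indentation per nesting level. It only loosely imitates the trace format shown earlier.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

class toy_scoped_trace
{
public:
    toy_scoped_trace(std::ostream & os, std::string name, int & depth,
                     bool enabled)
        : os_(os), name_(std::move(name)), depth_(depth), enabled_(enabled)
    {
        if (enabled_) {
            os_ << std::string(2 * depth_, ' ') << "[begin " << name_ << "]\n";
            ++depth_;
        }
    }

    ~toy_scoped_trace()
    {
        if (enabled_) {
            --depth_;
            os_ << std::string(2 * depth_, ' ') << "[end " << name_ << "]\n";
        }
    }

    toy_scoped_trace(toy_scoped_trace const &) = delete;
    toy_scoped_trace & operator=(toy_scoped_trace const &) = delete;

private:
    std::ostream & os_;
    std::string name_;
    int & depth_;
    bool enabled_;
};

// A toy "parser" that traces itself and returns early on one path.
bool toy_parse(std::ostream & os, int & depth, bool trace, bool inner)
{
    toy_scoped_trace tracer(os, inner ? "inner" : "outer", depth, trace);
    if (inner)
        return true; // early return: the destructor still prints [end ...]
    return toy_parse(os, depth, trace, true);
}
```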

detail::skip(first, last, skip, flags);

This one is pretty simple; it just applies the skip parser. opt_parser only has one subparser, but if it had more than one, or if it had one that it applied more than once, it would need to repeat this line using skip between every pair of uses of any subparser.
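For a parser with several subparsers, this means skipping before each one. The toy sequence below parses `a` then `b` and runs the skip step before each subparser, so input such as `" a  b!"` succeeds. Its names are hypothetical, whitespace stands in for the skip parser, and no flags are modeled.

It also works on a copy of the input position, so the caller's position moves forward only when the whole sequence matches. This is in line with the `call()` contract described earlier.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Toy skip step: consume any whitespace.
void skip_ws(std::string_view & in)
{
    while (!in.empty() && std::isspace(static_cast<unsigned char>(in.front())))
        in.remove_prefix(1);
}

// Toy subparser: match a single given character.
bool match_char(std::string_view & in, char c)
{
    if (in.empty() || in.front() != c)
        return false;
    in.remove_prefix(1);
    return true;
}

// Toy sequence of two subparsers: skip, subparser, skip, subparser.
// `in` is advanced only if the whole sequence matches.
bool seq_ab(std::string_view & in)
{
    std::string_view cur = in;
    skip_ws(cur);
    if (!match_char(cur, 'a'))
        return false;
    skip_ws(cur);
    if (!match_char(cur, 'b'))
        return false;
    in = cur;
    return true;
}
```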

if (!detail::gen_attrs(flags)) {
    parser_.call(first, last, context, skip, flags, success);
    success = true;
    return;
}

This path accounts for the case where we don't want to generate attributes at all, perhaps because this parser sits inside an omit[] directive.

parser_.call(first, last, context, skip, flags, success, retval);
success = true;

This is the other, typical, path. Here, we do want to generate attributes, and so we do the same call to parser_.call(), except that we also pass retval.

Note that we set success to true after the call to parser_.call() in both code paths. Since opt_parser matches zero or one occurrences of its subparser, opt_parser still succeeds even if the subparser fails.
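Stripped of the context, skip parser, flags, and tracing, the shape of `opt_parser` can be reproduced in a few dozen lines. The sketch below is a simplified, hypothetical model: `char_parser`, `toy_opt_parser`, and the reduced `call()` signatures are invented for illustration and are not Boost.Parser's interface.

It keeps the ideas walked through above:

- The attribute-generating overload forwards to the out-param overload.
- The optional parser reports success whether or not its subparser matched.
- The subparser advances `first` only when it succeeds.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// A toy subparser that matches one specific character.
struct char_parser
{
    char expected;

    // Out-param overload: advances `first` only on success.
    template<typename Iter, typename Sentinel>
    void call(Iter & first, Sentinel last, bool & success, char & retval) const
    {
        success = first != last && *first == expected;
        if (success)
            retval = *first++;
    }

    // Attribute-generating overload: forwards to the out-param overload.
    template<typename Iter, typename Sentinel>
    char call(Iter & first, Sentinel last, bool & success) const
    {
        char retval{};
        call(first, last, success, retval);
        return retval;
    }
};

// Toy analogue of opt_parser: zero-or-one of the wrapped parser.
template<typename Parser>
struct toy_opt_parser
{
    Parser parser_;

    // Simplification: assumes Attribute is a std::optional.
    template<typename Iter, typename Sentinel, typename Attribute>
    void call(Iter & first, Sentinel last, bool & success,
              Attribute & retval) const
    {
        typename Attribute::value_type value{};
        parser_.call(first, last, success, value);
        if (success)
            retval = value;
        success = true; // zero matches is still a successful parse
    }

    template<typename Iter, typename Sentinel>
    auto call(Iter & first, Sentinel last, bool & success) const
    {
        using attr_t = decltype(parser_.call(first, last, success));
        std::optional<attr_t> retval;
        call(first, last, success, retval);
        return retval;
    }
};
```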

When to make a new parse context

Sometimes, you need to change something about the parse context before calling a subparser. For instance, rule_parser sets up the value, locals, etc., that are available for that rule. action_parser adds the generated attribute to the context (available as _attr(ctx)). Contexts are immutable in Boost.Parser. To "modify" one for a subparser, you create a new one with the appropriate call to detail::make_context().
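The "copy, don't mutate" pattern for contexts can be sketched in plain C++. The `toy_context` struct and `with_attr()` function below are hypothetical and only echo the idea behind `detail::make_context()`. The context holds pointers, so it is cheap to copy. It is never modified after creation, and a subparser that needs extra information gets a new context derived from the old one.

```cpp
#include <cassert>
#include <string>

// Toy context: non-owning pointers, so copies are cheap.
struct toy_context
{
    std::string const * input = nullptr;
    int const * attr = nullptr; // attribute visible to a semantic action
};

// "Modify" by creating a new context; the original is left unchanged.
// `attr` must outlive the returned context, since only its address is kept.
toy_context with_attr(toy_context const & ctx, int const & attr)
{
    toy_context result = ctx;
    result.attr = &attr;
    return result;
}

// A toy semantic action reads the attribute through the context.
int doubled_attr(toy_context const & ctx)
{
    return ctx.attr ? 2 * *ctx.attr : 0;
}
```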

detail::apply_parser()

Sometimes a parser needs to operate on an out-param that is not exactly the same as its default attribute, but that is compatible in some way. To do this, it's often useful for the parser to call itself, but with slightly different parameters. detail::apply_parser() helps with this. See the out-param overload of repeat_parser::call() for an example. Note that since this creates a new scope for the ersatz parser, the scoped_trace object needs to know whether we're inside detail::apply_parser or not.

That's a lot, I know. Again, this section is not meant to be an in-depth tutorial. You know enough now that the parsers in parser.hpp are at least readable.

