
Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world.
Herb Sutter and Andrei Alexandrescu, C++ Coding Standards


Tutorial

Terminology
Hello, Whomever
A Trivial Example
A Trivial Example That Gracefully Handles Whitespace
Semantic Actions
Parsing to Find Subranges
The Parse Context
Rule Parsers
Parsing into structs and classes
Alternative Parsers
Parsing Quoted Strings
Parsing In Detail
Backtracking
Symbol Tables
Mutable Symbol Tables
The Parsers And Their Uses
Directives
Combining Operations
Attribute Generation
The parse() API
More About Rules
Algorithms and Views That Use Parsers
Unicode Support
Callback Parsing
Error Handling and Debugging
Memory Allocation
Best Practices
Writing Your Own Parsers

First, let's cover some terminology that we'll be using throughout the docs:

A semantic action is an arbitrary bit of logic associated with a parser, that is only executed when the parser matches.

Simpler parsers can be combined to form more complex parsers. Given some combining operation C, and parsers P0, P1, ... PN, C(P0, P1, ... PN) creates a new parser Q. This creates a parse tree. Q is the parent of P1, P2 is the child of Q, etc. The parsers are applied in the top-down fashion implied by this topology. When you use Q to parse a string, it will use P0, P1, etc. to do the actual work. If P3 is being used to parse the input, that means that Q is as well, since the way Q parses is by dispatching to its children to do some or all of the work. At any point in the parse, there will be exactly one parser without children that is being used to parse the input; all other parsers being used are its ancestors in the parse tree.

A subparser is a parser that is the child of another parser.

The top-level parser is the root of the tree of parsers.

The current parser or bottommost parser is the parser with no children that is currently being used to parse the input.

A rule is a kind of parser that makes building large, complex parsers easier. A subrule is a rule that is the child of some other rule. The current rule or bottommost rule is the one rule currently being used to parse the input that has no subrules. Note that while there is always exactly one current parser, there may or may not be a current rule — rules are one kind of parser, and you may or may not be using one at a given point in the parse.

The top-level parse is the parse operation being performed by the top-level parser. This term is necessary because, though most parse failures are local to a particular parser, some parse failures cause the call to parse() to indicate failure of the entire parse. For these cases, we say that such a local failure "causes the top-level parse to fail".

Throughout the Boost.Parser documentation, I will refer to "the call to parse()". Read this as "the call to any one of the functions described in The parse() API". That includes prefix_parse(), callback_parse(), and callback_prefix_parse().

There are some special kinds of parsers that come up often in this documentation.

One is a sequence parser; you will see it created using operator>>, as in p1 >> p2 >> p3. A sequence parser tries to match all of its subparsers to the input, one at a time, in order. It matches the input iff all its subparsers do.

Another is an alternative parser; you will see it created using operator|, as in p1 | p2 | p3. An alternative parser tries to match all of its subparsers to the input, one at a time, in order; it stops after matching at most one subparser. It matches the input iff one of its subparsers does.

Finally, there is a permutation parser; it is created using operator||, as in p1 || p2 || p3. A permutation parser tries to match all of its subparsers to the input, in any order. So the parser p1 || p2 || p3 is equivalent to (p1 >> p2 >> p3) | (p1 >> p3 >> p2) | (p2 >> p1 >> p3) | (p2 >> p3 >> p1) | (p3 >> p1 >> p2) | (p3 >> p2 >> p1). Hopefully its terseness is self-explanatory. It matches the input iff all of its subparsers do, regardless of the order they match in.
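
To make the three combining operations concrete, here is a minimal hand-rolled sketch in plain C++. It is not how Boost.Parser implements them: Parser, lit(), seq(), alt(), perm() and matches_all() are hypothetical names invented for this illustration, and these toy parsers produce no attributes. The permutation version is greedy (it takes the first unused subparser that matches), which is enough for the single-character subparsers used here but is a simplification of the expansion written out above.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>
#include <vector>

// Toy model only: a parser consumes a prefix of `in` on success and
// leaves `in` untouched on failure.
using Parser = std::function<bool(std::string_view &)>;

// Matches exactly the character c.
Parser lit(char c)
{
    return [c](std::string_view & in) {
        if (in.empty() || in.front() != c)
            return false;
        in.remove_prefix(1);
        return true;
    };
}

// Sequence (like p1 >> p2): every subparser must match, in order.
Parser seq(std::vector<Parser> ps)
{
    return [ps](std::string_view & in) {
        std::string_view rest = in;
        for (auto const & p : ps) {
            if (!p(rest))
                return false;
        }
        in = rest;
        return true;
    };
}

// Alternative (like p1 | p2): try in order, stop at the first match.
Parser alt(std::vector<Parser> ps)
{
    return [ps](std::string_view & in) {
        for (auto const & p : ps) {
            if (p(in))
                return true;
        }
        return false;
    };
}

// Permutation (like p1 || p2): every subparser must match exactly once,
// in any order.  Greedy: no backtracking over the choice of subparser.
Parser perm(std::vector<Parser> ps)
{
    return [ps](std::string_view & in) {
        std::string_view rest = in;
        std::vector<bool> used(ps.size(), false);
        for (std::size_t matched = 0; matched < ps.size(); ++matched) {
            bool progressed = false;
            for (std::size_t i = 0; i < ps.size(); ++i) {
                if (!used[i] && ps[i](rest)) {
                    used[i] = true;
                    progressed = true;
                    break;
                }
            }
            if (!progressed)
                return false;
        }
        in = rest;
        return true;
    };
}

// True if p matches and consumes the whole of s.
bool matches_all(Parser const & p, std::string_view s)
{
    return p(s) && s.empty();
}
```

With these definitions, seq({lit('a'), lit('b')}) accepts "ab" but not "ba", while perm({lit('a'), lit('b')}) accepts both.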

Boost.Parser parsers each have an attribute associated with them, or explicitly have no attribute. An attribute is a value that the parser generates when it matches the input. For instance, the parser double_ generates a double when it matches the input. ATTR() is a notional macro that expands to the attribute type of the parser passed to it; ATTR(double_) is double. This is similar to the attribute type trait.

Next, we'll look at some simple programs that parse using Boost.Parser. We'll start small and build up from there.

This is just about the most minimal example of using Boost.Parser that one could write. We take a string from the command line, or "World" if none is given, and then we parse it:

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


namespace bp = boost::parser;

int main(int argc, char const * argv[])
{
    std::string input = "World";
    if (1 < argc)
        input = argv[1];

    std::string result;
    bp::parse(input, *bp::char_, result);
    std::cout << "Hello, " << result << "!\n";
}

The expression *bp::char_ is a parser-expression. It uses one of the many parsers that Boost.Parser provides: char_. Like all Boost.Parser parsers, it has certain operations defined on it. In this case, *bp::char_ is using an overloaded operator* as the C++ version of a Kleene star operator. Since C++ has no postfix unary * operator, we have to use the one we have, so it is used as a prefix.

So, *bp::char_ means "any number of characters". In other words, it really cannot fail. Even an empty string will match it.

The parse operation is performed by calling the parse() function, passing the parser as one of the arguments:

bp::parse(input, *bp::char_, result);

The arguments here are: input, the range to parse; *bp::char_, the parser used to do the parse; and result, an out-parameter into which to put the result of the parse. Don't get too caught up on this method of getting the parse result out of parse(); there are multiple ways of doing so, and we'll cover all of them in subsequent sections.

Also, just ignore for now the fact that Boost.Parser somehow figured out that the result type of the *bp::char_ parser is a std::string. There are clear rules for this that we'll cover later.

The effect of this call to parse() is not very interesting. Since the parser we gave it cannot ever fail, and because we're placing the output in the same type as the input, it just copies the contents of input to result.

Let's look at a slightly more complicated example, even if it is still trivial. Instead of taking any old chars we're given, let's require some structure. Let's parse one or more doubles, separated by commas.

The Boost.Parser parser for double is double_. So, to parse a single double, we'd just use that. If we wanted to parse two doubles in a row, we'd use:

boost::parser::double_ >> boost::parser::double_

operator>> in this expression is the sequence-operator; read it as "followed by". If we combine the sequence-operator with Kleene star, we can get the parser we want by writing:

boost::parser::double_ >> *(',' >> boost::parser::double_)

This is a parser that matches at least one double — because of the first double_ in the expression above — followed by zero or more instances of a-comma-followed-by-a-double. Notice that we can use ',' directly. Though it is not a parser, operator>> and the other operators defined on Boost.Parser parsers have overloads that accept character/parser pairs of arguments; these operator overloads will create the right parser to recognize ','.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas.  No pressure. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, bp::double_ >> *(',' >> bp::double_));

    if (result) {
        std::cout << "Great! It looks like you entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout
            << "Good job!  Please proceed to the recovery annex for cake.\n";
    }
}

The first example filled in an out-parameter to deliver the result of the parse. This call to parse() returns a result instead. As you can see, the result is contextually convertible to bool, and *result is some sort of range. In fact, the return type of this call to parse() is std::optional<std::vector<double>>. Naturally, if the parse fails, std::nullopt is returned. We'll look at how Boost.Parser maps the type of the parser to the return type, or the filled in out-parameter's type, a bit later.

Note

There's a type trait that can tell you the attribute type for a parser, attribute (and an associated alias attribute_t). We'll discuss it more in the Attribute Generation section.

If I run it in a shell, this is the result:

$ example/trivial
Enter a list of doubles, separated by commas.  No pressure. 5.6,8.9
Great! It looks like you entered:
5.6
8.9
$ example/trivial
Enter a list of doubles, separated by commas.  No pressure. 5.6, 8.9
Good job!  Please proceed to the recovery annex for cake.

It does not recognize "5.6, 8.9". This is because it expects a comma followed immediately by a double, but I inserted a space after the comma. The same failure to parse would occur if I put a space before the comma, or before or after the list of doubles.
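
If it helps to see why the space breaks the parse, here is roughly what double_ >> *(',' >> double_) asks for, written out by hand in plain C++. This is only a sketch under my own assumptions, not Boost.Parser code: parse_double() and parse_list() are hypothetical helpers, and std::strtod stands in for double_ (with an explicit check so that it cannot quietly skip leading whitespace).

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: parse one double starting at pos.  Leading
// whitespace is rejected, because nothing in this model skips it.
bool parse_double(std::string const & s, std::size_t & pos, double & out)
{
    if (pos >= s.size() || std::isspace(static_cast<unsigned char>(s[pos])))
        return false;
    char const * first = s.c_str() + pos;
    char * last = nullptr;
    out = std::strtod(first, &last);
    if (last == first)
        return false;
    pos += static_cast<std::size_t>(last - first);
    return true;
}

// Hand-rolled model of parse(s, double_ >> *(',' >> double_)), no skipper.
std::optional<std::vector<double>> parse_list(std::string const & s)
{
    std::vector<double> result;
    std::size_t pos = 0;
    double x = 0;
    if (!parse_double(s, pos, x)) // the leading double_
        return std::nullopt;
    result.push_back(x);
    while (pos < s.size() && s[pos] == ',') { // *(',' >> double_)
        std::size_t next = pos + 1;
        if (!parse_double(s, next, x))
            break; // the star just stops here; it does not fail
        result.push_back(x);
        pos = next;
    }
    if (pos != s.size()) // leftover input makes the whole parse fail
        return std::nullopt;
    return result;
}
```

For "5.6, 8.9" the loop sees the comma, fails to find a double at the space, and stops with input left over, so the result is empty. The same happens for a space before the comma or at either end of the list.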

One more thing: there is a much better way to write the parser above. Instead of repeating the double_ subparser, we could have written this:

bp::double_ % ','

That's semantically identical to bp::double_ >> *(',' >> bp::double_). This pattern — some bit of input repeated one or more times, with a separator between each instance — comes up so often that there's an operator specifically for that, operator%. We'll be using that operator from now on.

Let's modify the trivial parser we just saw to ignore any spaces it might find among the doubles and commas. To skip whitespace wherever we find it, we can pass a skip parser to our call to parse() (we don't need to touch the parser passed to parse()). Here, we use ws, which matches any Unicode whitespace character.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas.  No pressure. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, bp::double_ % ',', bp::ws);

    if (result) {
        std::cout << "Great! It looks like you entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout
            << "Good job!  Please proceed to the recovery annex for cake.\n";
    }
}

The skip parser, or skipper, is run between the subparsers within the parser passed to parse(). In this case, the skipper is run before the first double is parsed, before any subsequent comma or double is parsed, and at the end. So, the strings "3.6,5.9" and " 3.6 , \t 5.9 " are parsed the same by this program.
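
The places where the skipper runs can be sketched by hand. The following is a plain C++ model of parse(input, double_ % ',', skipper), not the library's implementation: Skippable, skip() and parse_list_skipping() are hypothetical names, and the skipper is reduced to a per-character predicate, which is far less general than the arbitrary skip parsers that Boost.Parser accepts.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical skipper type: answers "may this character be skipped?"
using Skippable = std::function<bool(char)>;

// Advance pos past every skippable character.
void skip(std::string const & s, std::size_t & pos, Skippable const & skippable)
{
    while (pos < s.size() && skippable(s[pos]))
        ++pos;
}

// Hand-rolled model of parse(s, double_ % ',', skipper).
std::optional<std::vector<double>>
parse_list_skipping(std::string const & s, Skippable const & skippable)
{
    std::vector<double> result;
    std::size_t pos = 0;
    for (;;) {
        skip(s, pos, skippable); // before each double
        // std::strtod would skip whitespace on its own; only the
        // skipper is allowed to do that in this model.
        if (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
            return std::nullopt;
        char const * first = s.c_str() + pos;
        char * last = nullptr;
        double const x = std::strtod(first, &last);
        if (last == first)
            return std::nullopt;
        result.push_back(x);
        pos += static_cast<std::size_t>(last - first);

        skip(s, pos, skippable); // before each comma, and at the end
        if (pos == s.size())
            return result;
        if (s[pos] != ',')
            return std::nullopt;
        ++pos;
    }
}
```

Called with a whitespace predicate, this model gives the same result for "3.6,5.9" and " 3.6 , \t 5.9 ".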

Skipping is an important concept in Boost.Parser. You can skip anything, not just whitespace; there are lots of other things you might want to skip. The skipper you pass to parse() can be an arbitrary parser. For example, if you write a parser for a scripting language, you can write a skipper to skip whitespace, inline comments, and end-of-line comments.

We'll be using skip parsers almost exclusively in the rest of the documentation. The ability to ignore the parts of your input that you don't care about is so convenient that parsing without skipping is a rarity in practice.

Like all parsing systems (lex & yacc, Boost.Spirit, etc.), Boost.Parser has a mechanism for associating semantic actions with different parts of the parse. Here is nearly the same program as we saw in the previous example, except that it is implemented in terms of a semantic action that appends each parsed double to a result, instead of automatically building and returning the result. To do this, we replace the double_ from the previous example with double_[action]; action is our semantic action:

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    std::vector<double> result;
    auto const action = [&result](auto & ctx) {
        std::cout << "Got one!\n";
        result.push_back(_attr(ctx));
    };
    auto const action_parser = bp::double_[action];
    auto const success = bp::parse(input, action_parser % ',', bp::ws);

    if (success) {
        std::cout << "You entered:\n";
        for (double x : result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}

Run in a shell, it looks like this:

$ example/semantic_actions
Enter a list of doubles, separated by commas. 4,3
Got one!
Got one!
You entered:
4
3

In Boost.Parser, semantic actions are implemented in terms of invocable objects that take a single parameter: a reference to a parse-context object. The parse-context object represents the current state of the parse. In the example we used this lambda as our invocable:

auto const action = [&result](auto & ctx) {
    std::cout << "Got one!\n";
    result.push_back(_attr(ctx));
};

We're both printing a message to std::cout and recording a parsed result in the lambda. It could do both, either, or neither of these things if you like. The way we get the parsed double in the lambda is by asking the parse context for it. _attr(ctx) is how you ask the parse context for the attribute produced by the parser to which the semantic action is attached. There are lots of functions like _attr() that can be used to access the state in the parse context. We'll cover more of them later on. The Parse Context defines what exactly the parse context is and how it works.

Note that you can't write an unadorned lambda directly as a semantic action. Otherwise, the compiler will see two '[' characters and think it's about to parse an attribute. Parentheses fix this:

p[([](auto & ctx){/*...*/})]

Before you do this, note that the lambdas that you write as semantic actions are almost always generic (having an auto & ctx parameter), and so are very frequently re-usable. Most semantic action lambdas you write should be written out-of-line, and given a good name. Even when they are not reused, named lambdas keep your parsers smaller and easier to read.

Important

Attaching a semantic action to a parser removes its attribute. That is, ATTR(p[a]) is always the special no-attribute type none, regardless of what type ATTR(p) is.

Semantic actions inside rules

There are some other forms for semantic actions, when they are used inside of rules. See More About Rules for details.

So far we've seen examples that parse some text and generate associated attributes. Sometimes, you want to find some subrange of the input that contains what you're looking for, and you don't want to generate attributes at all.

There are two directives that affect the attribute type of any parser, raw[] and string_view[]. (We'll get to directives in more detail in the Directives section later. For now, you just need to know that a directive wraps a parser, and changes some aspect of how it functions.)

raw[]

raw[] changes the attribute of the parser p that it wraps to be a subrange whose begin() and end() return the bounds of the part of the input that matches p.

namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';            // ATTR(int_parser) is std::vector<int>
auto subrange_parser = bp::raw[int_parser];  // ATTR(subrange_parser) is a subrange

// Parse using int_parser, generating integers.
auto ints = bp::parse("1, 2, 3, 4", int_parser, bp::ws);
assert(ints);
assert(*ints == std::vector<int>({1, 2, 3, 4}));

// Parse again using int_parser, but this time generating only the
// subrange matched by int_parser.  (prefix_parse() allows matches that
// don't consume the entire input.)
auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto range = bp::prefix_parse(first, str.end(), subrange_parser, bp::ws);
assert(range);
assert(range->begin() == str.begin());
assert(range->end() == str.begin() + 10);

static_assert(std::is_same_v<
              decltype(range),
              std::optional<bp::subrange<std::string::const_iterator>>>);

Note that the subrange has the iterator type std::string::const_iterator, because that's the iterator type passed to prefix_parse(). If we had passed char const * iterators to prefix_parse(), that would have been the iterator type. The only exception to this comes from Unicode-aware parsing (see Unicode Support). In some of those cases, the iterator being used in the parse is not the one you passed. For instance, if you call prefix_parse() with char8_t * iterators, it will create a UTF-8 to UTF-32 transcoding view, and parse the iterators of that view. In such a case, you'll get a subrange whose iterator type is a transcoding iterator. When that happens, you can get the underlying iterator — the one you passed to prefix_parse() — by calling the .base() member function on each transcoding iterator in the returned subrange.

auto const u8str = std::u8string(u8"1, 2, 3, 4, a, b, c");
auto u8first = u8str.begin();
auto u8range = bp::prefix_parse(u8first, u8str.end(), subrange_parser, bp::ws);
assert(u8range);
assert(u8range->begin().base() == u8str.begin());
assert(u8range->end().base() == u8str.begin() + 10);

string_view[]

string_view[] has very similar semantics to raw[], except that it produces a std::basic_string_view<CharT> (where CharT is the character type of the underlying range being parsed) instead of a subrange. For this to work, the underlying range must be contiguous. Contiguity of iterators is not detectable before C++20, so this directive is only available in C++20 and later.

namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';              // ATTR(int_parser) is std::vector<int>
auto sv_parser = bp::string_view[int_parser];  // ATTR(sv_parser) is a string_view

auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto sv1 = bp::prefix_parse(first, str.end(), sv_parser, bp::ws);
assert(sv1);
assert(*sv1 == str.substr(0, 10));

static_assert(std::is_same_v<decltype(sv1), std::optional<std::string_view>>);

Since string_view[] produces string_views, it cannot return transcoding iterators as described above for raw[]. If you parse a sequence of CharT with string_view[], you get exactly a std::basic_string_view<CharT>. If the parse is using transcoding in the Unicode-aware path, string_view[] will decompose the transcoding iterator as necessary. If you pass a transcoding view to parse() or transcoding iterators to prefix_parse(), string_view[] will still see through the transcoding iterators without issue, and give you a string_view of part of the underlying range.

auto sv2 = bp::parse("1, 2, 3, 4" | bp::as_utf32, sv_parser, bp::ws);
assert(sv2);
assert(*sv2 == "1, 2, 3, 4");

static_assert(std::is_same_v<decltype(sv2), std::optional<std::string_view>>);
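
The idea of "report where the match was, not what it contained" can also be sketched without the library. The code below is a hand-written model of prefix-parsing with string_view[int_ % ','] and a whitespace skipper. It is an illustration under my own assumptions, not Boost.Parser code: skip_ws(), match_int() and matched_int_list() are hypothetical helpers, and starting the reported range after any leading whitespace is a choice made for this model.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Returns the first position at or after pos that is not whitespace.
std::size_t skip_ws(std::string_view s, std::size_t pos)
{
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
        ++pos;
    return pos;
}

// Matches an optionally signed run of digits at pos.  Returns the
// position just past the match, or pos itself if nothing matched.
std::size_t match_int(std::string_view s, std::size_t pos)
{
    std::size_t p = pos;
    if (p < s.size() && (s[p] == '-' || s[p] == '+'))
        ++p;
    std::size_t const digits_begin = p;
    while (p < s.size() && std::isdigit(static_cast<unsigned char>(s[p])))
        ++p;
    return p == digits_begin ? pos : p;
}

// Returns a view of the longest comma-separated integer list at the
// start of s, or an empty view if there is none.  No integers are
// converted and nothing is copied; the view points into s.
std::string_view matched_int_list(std::string_view s)
{
    std::size_t const begin = skip_ws(s, 0);
    std::size_t end = match_int(s, begin);
    if (end == begin)
        return {};
    for (;;) {
        std::size_t p = skip_ws(s, end);
        if (p == s.size() || s[p] != ',')
            break;
        p = skip_ws(s, p + 1);
        std::size_t const q = match_int(s, p);
        if (q == p)
            break; // ", a" does not match: keep end after the last integer
        end = q;
    }
    return s.substr(begin, end - begin);
}
```

For the input "1, 2, 3, 4, a, b, c" this model returns a view of the first 10 characters, "1, 2, 3, 4", which matches the bounds asserted in the examples above.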

Now would be a good time to describe the parse context in some detail. Any semantic action that you write will need to use state in the parse context, so you need to know what's available.

The parse context is an object that stores the current state of the parse: the current and end iterators, the error handler, etc. Data may seem to be "added" to or "removed" from it at different times during the parse. For instance, when a parser p with a semantic action a succeeds, the attribute that p produces is added to the parse context, and then a is called with that context.

Though the context object appears to have things added to or removed from it, it does not. In reality, there is no one context object. Contexts are formed at various times during the parse, usually when starting a subparser. Each context is formed by taking the previous context and adding or changing members as needed to form a new context object. When the function containing the new context object returns, its context object (if any) is destructed. This is efficient to do, because the parse context has only about a dozen data members, and each data member is less than or equal to the size of a pointer. Copying the entire context when mutating the context is therefore fast. The context does no memory allocation.
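
Here is a tiny sketch of that copy-and-change pattern. ToyContext and with_attr() are hypothetical, and the real parse context has different members; the only point is that "adding" something means making a cheap modified copy while the caller's context stays as it was.

```cpp
#include <type_traits>

// Hypothetical, much-reduced parse context.  Every member is a pointer,
// so copying the whole struct is cheap and allocates nothing.
struct ToyContext
{
    char const * first = nullptr; // current position in the input
    char const * last = nullptr;  // end of the input
    int const * attr = nullptr;   // attribute of the current parser, if any
};

static_assert(std::is_trivially_copyable_v<ToyContext>);

// "Adding" an attribute: take the previous context by value, change one
// member of the copy, and return the copy.
ToyContext with_attr(ToyContext ctx, int const & attr)
{
    ctx.attr = &attr;
    return ctx;
}
```

The context passed in is never modified, so once the function using the new context returns, the old one is still there, unchanged.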

Tip

All these functions that take the parse context as their first parameter will be found by Argument-Dependent Lookup. You will probably never need to qualify them with boost::parser::.

Accessors for data that are always available

By convention, the names of all Boost.Parser functions that take a parse context, and are therefore intended for use inside semantic actions, contain a leading underscore.

_pass()

_pass() returns a reference to a bool indicating the success or failure of the current parse. This can be used to force the current parse to pass or fail:

[](auto & ctx) {
    // If the attribute fails to meet this predicate, fail the parse.
    if (!necessary_condition(_attr(ctx)))
        _pass(ctx) = false;
}

Note that for a semantic action to be executed, its associated parser must already have succeeded. So unless you previously wrote _pass(ctx) = false within your action, _pass(ctx) = true does nothing; it's redundant.
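
The interaction between a parser, its semantic action, and the pass flag can be modeled in a few lines of plain C++. This is a sketch under my own assumptions and not Boost.Parser code: ToyContext and parse_double_with_action() are hypothetical, with ctx.attr and ctx.pass standing in for what _attr(ctx) and _pass(ctx) give you.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the parse context: just the attribute and
// the pass/fail flag.
struct ToyContext
{
    double attr;
    bool pass;
};

// Parses all of s as one double.  The action runs only if the double
// matched, and it may veto the match by clearing ctx.pass.
template<typename Action>
bool parse_double_with_action(std::string const & s, Action action)
{
    char const * first = s.c_str();
    char * last = nullptr;
    double const x = std::strtod(first, &last);
    if (last == first || *last != '\0')
        return false;        // the parser failed: the action is never called
    ToyContext ctx{x, true}; // the flag is already true when the action starts
    action(ctx);
    return ctx.pass;
}
```

An action such as [](auto & ctx) { if (ctx.attr < 0) ctx.pass = false; } then turns "-1" into a failure even though it is a perfectly good double, while "abc" fails without the action being called at all.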

_begin(), _end() and _where()

_begin() and _end() return the beginning and end of the range that you passed to parse(), respectively. _where() returns a subrange indicating the bounds of the input matched by the current parse. _where() can be useful if you just want to parse some text and return a result consisting of where certain elements are located, without producing any other attributes. _where() can also be essential in tracking where things are located, to provide good diagnostics at a later point in the parse. Think mismatched tags in XML; if you parse a close-tag at the end of an element, and it does not match the open-tag, you want to produce an error message that mentions or shows both tags. Stashing _where(ctx).begin() somewhere that is available to the close-tag parser will enable that. See Error Handling and Debugging for an example of this.

_error_handler()

_error_handler() returns a reference to the error handler associated with the parser passed to parse(). Using _error_handler(), you can generate errors and warnings from within your semantic actions. See Error Handling and Debugging for concrete examples.

Accessors for data that are only sometimes available

_attr()

_attr() returns a reference to the value of the current parser's attribute. It is available only when the current parser's parse is successful. If the parser has no semantic action, no attribute gets added to the parse context. _attr() can be used to read and write the current parser's attribute:

[](auto & ctx) { _attr(ctx) = 3; }

If the current parser has no attribute, a none is returned.

_val()

_val() returns a reference to the value of the attribute of the current rule being used to parse (if any), and is available even before the rule's parse is successful. It can be used to set the current rule's attribute, even from a parser that is a subparser inside the rule. Let's say we're writing a parser with a semantic action that is within a rule. If we want to set the current rule's value to some function of subparser's attribute, we would write this semantic action:

[](auto & ctx) { _val(ctx) = some_function(_attr(ctx)); }

If there is no current rule, or the current rule has no attribute, a none is returned.

You need to use _val() in cases where the default attribute for a rule's parser is not directly compatible with the attribute type of the rule. In these cases, you'll need to write some code like the example above to compute the rule's attribute from the rule's parser's generated attribute. For more info on rules, see the next page, and More About Rules.

_globals()

_globals() returns a reference to a user-supplied object that contains whatever data you want to use during the parse. The "globals" for a parse is an object — typically a struct — that you give to the top-level parser. Then you can use _globals() to access it at any time during the parse. We'll see how globals get associated with the top-level parser in The parse() API later. As an example, say that you have an early part of the parse that needs to record some black-listed values, and that later parts of the parse might need to parse values, failing the parse if they see the black-listed values. In the early part of the parse, you could write something like this.

[](auto & ctx) {
    // black_list is a std::unordered_set.
    _globals(ctx).black_list.insert(_attr(ctx));
}

Later in the parse, you could then use black_list to check values as they are parsed.

[](auto & ctx) {
    if (_globals(ctx).black_list.contains(_attr(ctx)))
        _pass(ctx) = false;
}

_locals()

_locals() returns a reference to one or more values that are local to the current rule being parsed, if any. If there are two or more local values, _locals() returns a reference to a boost::parser::tuple. Rules with locals are something we haven't gotten to yet (see More About Rules), but for now all you need to know is that you can provide a template parameter (LocalState) to rule, and the rule will default construct an object of that type for use within the rule. You access it via _locals():

[](auto & ctx) {
    auto & local = _locals(ctx);
    // Use local here.  If 'local' is a hana::tuple, access its members like this:
    using namespace hana::literals;
    auto & first_element = local[0_c];
    auto & second_element = local[1_c];
}

If there is no current rule, or the current rule has no locals, a none is returned.

_params()

_params(), like _locals(), applies to the current rule being used to parse, if any (see More About Rules). It also returns a reference to a single value, if the current rule has only one parameter, or a boost::parser::tuple of multiple values if the current rule has multiple parameters. If there is no current rule, or the current rule has no parameters, a none is returned.

Unlike with _locals(), you do not provide a template parameter to rule. Instead you call the rule's with() member function (again, see More About Rules).

Note

none is a type that is used as a return value in Boost.Parser for parse context accessors. none is convertible to anything that has a default constructor, convertible from anything, assignable from anything, and has templated overloads for all the overloadable operators. The intention is that a misuse of _val(), _globals(), etc. should compile, and produce an assertion at runtime. Experience has shown that using a debugger for investigating the stack that leads to your mistake is a far better user experience than sifting through compiler diagnostics. See the Rationale section for a more detailed explanation.

_no_case()

_no_case() returns true if the current parse context is inside one or more (possibly nested) no_case[] directives. I don't have a use case for this, but if I didn't expose it, it would be the only thing in the context that you could not examine from inside a semantic action. It was easy to add, so I did.

This example is very similar to the others we've seen so far. This one is different only because it uses a rule. As an analogy, think of a parser like char_ or double_ as an individual line of code, and a rule as a function. Like a function, a rule has its own name, and can even be forward declared. Here is how we define a rule, which is analogous to forward declaring a function:

bp::rule<struct doubles, std::vector<double>> doubles = "doubles";

This declares the rule itself. The rule is a parser, and we can immediately use it in other parsers. That definition is pretty dense; take note of these things:

  • The first template parameter is a tag type struct doubles. Here we've declared the tag type and used it all in one go; you can also use a previously declared tag type.
  • The second template parameter is the attribute type of the parser. If you don't provide this, the rule will have no attribute.
  • This rule object itself is called doubles.
  • We've given doubles the diagnostic text "doubles" so that Boost.Parser knows how to refer to it when producing a trace of the parser during debugging.

Ok, so if doubles is a parser, what does it do? We define the rule's behavior by defining a separate parser that by now should look pretty familiar:

auto const doubles_def = bp::double_ % ',';

This is analogous to writing a definition for a forward-declared function. Note that we used the name doubles_def. Right now, the doubles rule parser and the doubles_def non-rule parser have no connection to each other. That's intentional — we want to be able to define them separately. To connect them, we declare functions with an interface that Boost.Parser understands, and use the tag type struct doubles to connect them together. We use a macro for that:

BOOST_PARSER_DEFINE_RULES(doubles);

This macro expands to the code necessary to make the rule doubles and its parser doubles_def work together. The _def suffix is a naming convention that this macro relies on to work. The tag type allows the rule parser, doubles, to call one of these overloads when used as a parser.

BOOST_PARSER_DEFINE_RULES expands to two overloads of a function called parse_rule(). In the case above, the overloads each take a struct doubles parameter (to distinguish them from the other overloads of parse_rule() for other rules) and parse using doubles_def. You will never need to call any overload of parse_rule() yourself; it is used internally by the parser that implements rules, rule_parser.

Here is the definition of the macro that is expanded for each rule:

#define BOOST_PARSER_DEFINE_IMPL(_, rule_name_)                                \
    template<                                                                  \
        typename Iter,                                                         \
        typename Sentinel,                                                     \
        typename Context,                                                      \
        typename SkipParser>                                                   \
    decltype(rule_name_)::parser_type::attr_type parse_rule(                   \
        decltype(rule_name_)::parser_type::tag_type *,                         \
        Iter & first,                                                          \
        Sentinel last,                                                         \
        Context const & context,                                               \
        SkipParser const & skip,                                               \
        boost::parser::detail::flags flags,                                    \
        bool & success,                                                        \
        bool & dont_assign)                                                    \
    {                                                                          \
        auto const & parser = BOOST_PARSER_PP_CAT(rule_name_, _def);           \
        using attr_t =                                                         \
            decltype(parser(first, last, context, skip, flags, success));      \
        using attr_type = decltype(rule_name_)::parser_type::attr_type;        \
        if constexpr (boost::parser::detail::is_nope_v<attr_t>) {              \
            dont_assign = true;                                                \
            parser(first, last, context, skip, flags, success);                \
            return {};                                                         \
        } else if constexpr (std::is_same_v<attr_type, attr_t>) {              \
            return parser(first, last, context, skip, flags, success);         \
        } else if constexpr (std::is_constructible_v<attr_type, attr_t>) {     \
            return attr_type(                                                  \
                parser(first, last, context, skip, flags, success));           \
        } else {                                                               \
            attr_type attr{};                                                  \
            parser(first, last, context, skip, flags, success, attr);          \
            return attr;                                                       \
        }                                                                      \
    }                                                                          \
                                                                               \
    template<                                                                  \
        typename Iter,                                                         \
        typename Sentinel,                                                     \
        typename Context,                                                      \
        typename SkipParser,                                                   \
        typename Attribute>                                                    \
    void parse_rule(                                                           \
        decltype(rule_name_)::parser_type::tag_type *,                         \
        Iter & first,                                                          \
        Sentinel last,                                                         \
        Context const & context,                                               \
        SkipParser const & skip,                                               \
        boost::parser::detail::flags flags,                                    \
        bool & success,                                                        \
        bool & dont_assign,                                                    \
        Attribute & retval)                                                    \
    {                                                                          \
        auto const & parser = BOOST_PARSER_PP_CAT(rule_name_, _def);           \
        using attr_t =                                                         \
            decltype(parser(first, last, context, skip, flags, success));      \
        if constexpr (boost::parser::detail::is_nope_v<attr_t>) {              \
            parser(first, last, context, skip, flags, success);                \
        } else {                                                               \
            parser(first, last, context, skip, flags, success, retval);        \
        }                                                                      \
    }

Now that we have the doubles parser, we can use it like we might any other parser:

auto const result = bp::parse(input, doubles, bp::ws);

The full program:

#include <boost/parser/parser.hpp>

#include <vector>
#include <iostream>
#include <string>


namespace bp = boost::parser;


bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

int main()
{
    std::cout << "Please enter a list of doubles, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, doubles, bp::ws);

    if (result) {
        std::cout << "You entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}

All this is intended to introduce the notion of rules. It still may be a bit unclear why you would want to use rules. The use cases for, and lots of detail about, rules are in a later section, More About Rules.

Note

The existence of rules means that you will probably never have to write a low-level parser. You can just put existing parsers together into rules instead.

So far, we've seen only simple parsers that parse the same value repeatedly (with or without commas and spaces). It's also very common to parse a few values in a specific sequence. Let's say you want to parse an employee record. Here's a parser you might write:

namespace bp = boost::parser;
auto employee_parser = bp::lit("employee")
    >> '{'
    >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_
    >> '}';

The attribute type for employee_parser is boost::parser::tuple<int, std::string, std::string, double>. That's great, in that you got all the parsed data for the record without having to write any semantic actions. It's not so great that you now have to get all the individual elements out by their indices, using get(). It would be much nicer to parse into the final data structure that your program is going to use. This is often some struct or class. Boost.Parser supports parsing into arbitrary aggregate structs, and non-aggregates that are constructible from the tuple at hand.

Aggregate types as attributes

If we have a struct that has data members of the same types listed in the boost::parser::tuple attribute type for employee_parser, it would be nice to parse directly into it, instead of parsing into a tuple and then constructing our struct later. Fortunately, this just works in Boost.Parser. Here is an example of parsing straight into a compatible aggregate type.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    auto quoted_string = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
    auto employee_p = bp::lit("employee")
        >> '{'
        >> bp::int_ >> ','
        >> quoted_string >> ','
        >> quoted_string >> ','
        >> bp::double_
        >> '}';

    employee record;
    auto const result = bp::parse(input, employee_p, bp::ws, record);

    if (result) {
        std::cout << "You entered:\nage:      " << record.age
                  << "\nsurname:  " << record.surname
                  << "\nforename: " << record.forename
                  << "\nsalary  : " << record.salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}

Unfortunately, this only works because of the loose attribute assignment logic; the employee_p parser itself still has a boost::parser::tuple attribute. See The parse() API for a description of attribute out-param compatibility.

For this reason, it's even more common to want to make a rule that returns a specific type like employee. Just by giving the rule a struct type, we make sure that this parser always generates an employee struct as its attribute, no matter where it is in the parse. If we made a simple parser P that uses the employee_p rule, like bp::int_ >> employee_p, P's attribute type would be boost::parser::tuple<int, employee>.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

bp::rule<struct quoted_string, std::string> quoted_string = "quoted name";
bp::rule<struct employee_p, employee> employee_p = "employee";

auto quoted_string_def = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
auto employee_p_def = bp::lit("employee")
    >> '{'
    >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_
    >> '}';

BOOST_PARSER_DEFINE_RULES(quoted_string, employee_p);

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    static_assert(std::is_aggregate_v<std::decay_t<employee &>>);

    auto const result = bp::parse(input, employee_p, bp::ws);

    if (result) {
        std::cout << "You entered:\nage:      " << result->age
                  << "\nsurname:  " << result->surname
                  << "\nforename: " << result->forename
                  << "\nsalary  : " << result->salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}

Just as you can pass a struct as an out-param to parse() when the parser's attribute type is a tuple, you can also pass a tuple as an out-param to parse() when the parser's attribute type is a struct:

// Using the employee_p rule from above, with attribute type employee...
boost::parser::tuple<int, std::string, std::string, double> tup;
auto const result = bp::parse(input, employee_p, bp::ws, tup); // Ok!

Important

This automatic use of structs as if they were tuples depends on a bit of metaprogramming. Due to compiler limits, the metaprogram that detects the number of data members of a struct is limited to a maximum number of members. Fortunately, that limit is configurable; see BOOST_PARSER_MAX_AGGREGATE_SIZE.

General class types as attributes

Many times you don't have an aggregate struct that you want to produce from your parse. It would be even nicer than the aggregate code above if Boost.Parser could detect that the members of a tuple that is produced as an attribute are usable as the arguments to some type's constructor. So, Boost.Parser does that.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a string followed by two unsigned integers. ";
    std::string input;
    std::getline(std::cin, input);

    constexpr auto string_uint_uint =
        bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
    std::string string_from_parse;
    if (parse(input, string_uint_uint, bp::ws, string_from_parse))
        std::cout << "That yields this string: " << string_from_parse << "\n";
    else
        std::cout << "Parse failure.\n";

    std::cout << "Enter an unsigned integer followed by a string. ";
    std::getline(std::cin, input);
    std::cout << input << "\n";

    constexpr auto uint_string = bp::uint_ >> +bp::char_;
    std::vector<std::string> vector_from_parse;
    if (parse(input, uint_string, bp::ws, vector_from_parse)) {
        std::cout << "That yields this vector of strings:\n";
        for (auto && str : vector_from_parse) {
            std::cout << "  '" << str << "'\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}

Let's look at the first parse.

constexpr auto string_uint_uint =
    bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
std::string string_from_parse;
if (parse(input, string_uint_uint, bp::ws, string_from_parse))
    std::cout << "That yields this string: " << string_from_parse << "\n";
else
    std::cout << "Parse failure.\n";

Here, we use the parser string_uint_uint, which produces a boost::parser::tuple<std::string, unsigned int, unsigned int> attribute. When we try to parse that into an out-param std::string attribute, it just works. This is because std::string has a constructor that takes a std::string, an offset, and a length. Here's the other parse:

constexpr auto uint_string = bp::uint_ >> +bp::char_;
std::vector<std::string> vector_from_parse;
if (parse(input, uint_string, bp::ws, vector_from_parse)) {
    std::cout << "That yields this vector of strings:\n";
    for (auto && str : vector_from_parse) {
        std::cout << "  '" << str << "'\n";
    }
} else {
    std::cout << "Parse failure.\n";
}

Now we have the parser uint_string, which produces a boost::parser::tuple<unsigned int, std::string> attribute; the chars matched by +bp::char_ at the end combine into a std::string. Those two values can be used to construct a std::vector<std::string>, via its (count, value) constructor.
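
For reference, these are the two standard-library constructors that the parses above end up using, shown without any parsing involved. The wrapper names substring_ctor and repeat_ctor are made up for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The (string, offset, length) constructor that the first parse relies on:
// it copies `length` characters of `s`, starting at index `offset`.
std::string substring_ctor(
    std::string const & s, unsigned int offset, unsigned int length)
{
    return std::string(s, offset, length);
}

// The (count, value) constructor that the second parse relies on:
// it makes a vector holding `count` copies of `value`.
std::vector<std::string> repeat_ctor(
    unsigned int count, std::string const & value)
{
    return std::vector<std::string>(count, value);
}
```

So an input such as "something 4 5" to the first parse would be expected to yield "thing", and "3 foo" to the second would be expected to yield three copies of "foo". Note that the std::string constructor throws std::out_of_range if the offset is past the end of the string.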

Just like with using aggregates in place of tuples, non-aggregate class types can be substituted for tuples in most places. That includes using a non-aggregate class type as the attribute type of a rule.

However, while compatible tuples can be substituted for aggregates, you can't substitute a tuple for some class type T just because the tuple could have been used to construct T. Think of trying to invert the substitution in the second parse above. Converting a std::vector<std::string> into a boost::parser::tuple<unsigned int, std::string> makes no sense.

Frequently, you need to parse something that might have one of several forms. operator| is overloaded to form alternative parsers. For example:

namespace bp = boost::parser;
auto const parser_1 = bp::int_ | bp::eps;

parser_1 matches an integer, or if that fails, it matches epsilon, the empty string. This is equivalent to writing:

namespace bp = boost::parser;
auto const parser_2 = -bp::int_;

However, neither parser_1 nor parser_2 is equivalent to writing this:

namespace bp = boost::parser;
auto const parser_3 = bp::eps | bp::int_; // Does not do what you think.

The reason is that alternative parsers try each of their subparsers, one at a time, and stop on the first one that matches. Epsilon matches anything, since it is zero length and consumes no input. It even matches the end of input. This means that parser_3 is equivalent to eps by itself.

Note

For this reason, writing eps | p for any parser p is considered a bug. Debug builds will assert when eps | p is encountered.

Warning

This kind of error is very common when eps is involved, and also very easy to detect. However, it is possible to write P1 | P2, where P1 is a prefix of P2, such as int_ | int_ >> int_, or repeat(4)[hex_digit] | repeat(8)[hex_digit]. This is almost certainly an error, but is impossible to detect in the general case. Remember that rules can be separately compiled, and consider a pair of rules whose associated _def parsers are int_ and int_ >> int_, respectively.

It is very common to need to parse quoted strings. Quoted strings are slightly tricky, though, when using a skipper (and you should be using a skipper 99% of the time). You don't want to allow arbitrary whitespace in the middle of your strings, and you also don't want to remove all whitespace from your strings. Both of these things will happen with the typical skipper, ws.

So, here is how most people would write a quoted string parser:

namespace bp = boost::parser;
const auto string = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];

Some things to note:

  • the result is a string;
  • the quotes are not included in the result;
  • there is an expectation point before the close-quote;
  • the use of lexeme[] disables skipping in the parser, and it must be written around the quotes, not around the operator* expression; and
  • there's no way to write a quote in the middle of the string.

This is a very common pattern. I have written a quoted string parser like this dozens of times. The parser above is the quick-and-dirty version. A more robust version would be able to handle escaped quotes within the string, and then would immediately also need to support escaped escape characters.

Boost.Parser provides quoted_string to use in place of this very common pattern. It supports quote- and escaped-character-escaping, using backslash as the escape character.

namespace bp = boost::parser;

auto result1 = bp::parse("\"some text\"", bp::quoted_string, bp::ws);
assert(result1);
std::cout << *result1 << "\n"; // Prints: some text

auto result2 =
    bp::parse("\"some \\\"text\\\"\"", bp::quoted_string, bp::ws);
assert(result2);
std::cout << *result2 << "\n"; // Prints: some "text"

As common as this use case is, there are very similar use cases that it does not cover. So, quoted_string has some options. If you call it with a single character, it returns a quoted_string that uses that single character as the quote-character.

auto result3 = bp::parse("!some text!", bp::quoted_string('!'), bp::ws);
assert(result3);
std::cout << *result3 << "\n"; // Prints: some text

You can also supply a range of characters. One of the characters from the range must quote both ends of the string; mismatches are not allowed. Think of how Python allows you to quote a string with either '"' or '\'', but the same character must be used on both sides.

auto result4 = bp::parse("'some text'", bp::quoted_string("'\""), bp::ws);
assert(result4);
std::cout << *result4 << "\n"; // Prints: some text

Another common thing to do in a quoted string parser is to recognize escape sequences. If you have simple escape sequences that do not require any real parsing, like say the simple escape sequences from C++, you can provide a symbols object as well. The template parameter T to symbols<T> must be char or char32_t. You don't need to include the escaped backslash or the escaped quote character, since those always work.

// the c++ simple escapes
bp::symbols<char> const escapes = {
    {"'", '\''},
    {"?", '\?'},
    {"a", '\a'},
    {"b", '\b'},
    {"f", '\f'},
    {"n", '\n'},
    {"r", '\r'},
    {"t", '\t'},
    {"v", '\v'}};
auto result5 =
    bp::parse("\"some text\\r\"", bp::quoted_string('"', escapes), bp::ws);
assert(result5);
std::cout << *result5 << "\n"; // Prints (with a CRLF newline): some text

Now that you've seen some examples, let's see how parsing works in a bit more detail. Consider this example.

namespace bp = boost::parser;
auto int_pair = bp::int_ >> bp::int_;         // Attribute: tuple<int, int>
auto int_pairs_plus = +int_pair >> bp::int_;  // Attribute: tuple<std::vector<tuple<int, int>>, int>

int_pairs_plus must match a pair of ints (using int_pair) one or more times, and then must match an additional int. In other words, it matches any odd number (greater than 1) of ints in the input. Let's look at how this parse proceeds.

auto result = bp::parse("1 2 3", int_pairs_plus, bp::ws);

At the beginning of the parse, the top level parser uses its first subparser (if any) to start parsing. So, int_pairs_plus, being a sequence parser, would pass control to its first parser +int_pair. Then +int_pair would use int_pair to do its parsing, which would in turn use bp::int_. This creates a stack of parsers, each one using a particular subparser.

Step 1) The input is "1 2 3", and the stack of active parsers is int_pairs_plus -> +int_pair -> int_pair -> bp::int_. (Read "->" as "uses".) This parses "1", and the whitespace after is skipped by bp::ws. Control passes to the second bp::int_ parser in int_pair.

Step 2) The input is "2 3" and the stack of parsers looks the same, except the active parser is the second bp::int_ from int_pair. This parser consumes "2" and then bp::ws skips the subsequent space. Since we've finished with int_pair's match, its boost::parser::tuple<int, int> attribute is complete. Its parent is +int_pair, so this tuple attribute is pushed onto the back of +int_pair's attribute, which is a std::vector<boost::parser::tuple<int, int>>. Control passes up to the parent of int_pair, +int_pair. Since +int_pair is a one-or-more parser, it starts a new iteration; control passes to int_pair again.

Step 3) The input is "3" and the stack of parsers looks the same, except the active parser is the first bp::int_ from int_pair again, and we're in the second iteration of +int_pair. This parser consumes "3". Since this is the end of the input, the second bp::int_ of int_pair does not match. This partial match of "3" should not count, since it was not part of a full match. So, int_pair indicates its failure, and +int_pair stops iterating. Since it did match once, +int_pair does not fail; it is a one-or-more parser, so failure of its subparser after the first success does not cause it to fail. Control passes to the next parser in sequence within int_pairs_plus.

Step 4) The input is "3" again, and the stack of parsers is int_pairs_plus -> bp::int_. This parses the "3", and the parse reaches the end of input. Control passes to int_pairs_plus, which has just successfully matched with all the parsers in its sequence. It then produces its attribute, a boost::parser::tuple<std::vector<boost::parser::tuple<int, int>>, int>, which gets returned from bp::parse().

Something to take note of between Steps #3 and #4: at the beginning of #4, the input position had returned to where it was at the beginning of #3. This kind of backtracking happens in alternative parsers when an alternative fails. The next page has more details on the semantics of backtracking.
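The four steps above can be mimicked with a small hand-written recursive-descent sketch. It uses only the standard library and is not how Boost.Parser is implemented; the names skip_ws, parse_int, parse_int_pair, and parse_int_pairs_plus are hypothetical, and ints are limited to non-negative values with no overflow checking. The point it illustrates is that a failed int_pair leaves the input position where it was before the attempt.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical toy model of `+int_pair >> int_` with a whitespace skipper.
// This is NOT Boost.Parser's implementation; it only mirrors the steps above.

// Skips whitespace starting at pos, standing in for the bp::ws skipper.
inline void skip_ws(std::string_view in, std::size_t & pos)
{
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
        ++pos;
}

// Models int_ for non-negative ints. On success, advances pos past the
// digits. On failure, pos is left untouched.
inline std::optional<int> parse_int(std::string_view in, std::size_t & pos)
{
    assert(pos <= in.size());
    std::size_t p = pos;
    skip_ws(in, p);
    if (p == in.size() || !std::isdigit(static_cast<unsigned char>(in[p])))
        return std::nullopt;
    int value = 0;
    while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p])))
        value = value * 10 + (in[p++] - '0');
    pos = p;
    return value;
}

// Models int_pair: all-or-nothing. If the second int is missing, the input
// position stays where the pair started (Step 3 above).
inline std::optional<std::pair<int, int>>
parse_int_pair(std::string_view in, std::size_t & pos)
{
    std::size_t p = pos; // work on a copy so that failure consumes nothing
    auto const first = parse_int(in, p);
    if (!first)
        return std::nullopt;
    auto const second = parse_int(in, p);
    if (!second)
        return std::nullopt; // pos is unchanged: the partial match is given back
    pos = p;
    return std::make_pair(*first, *second);
}

using pairs_plus_attr = std::pair<std::vector<std::pair<int, int>>, int>;

// Models +int_pair >> int_, requiring that all of the input is consumed.
inline std::optional<pairs_plus_attr> parse_int_pairs_plus(std::string_view in)
{
    std::size_t pos = 0;
    std::vector<std::pair<int, int>> pairs;
    while (auto const pair = parse_int_pair(in, pos))
        pairs.push_back(*pair);
    if (pairs.empty())
        return std::nullopt; // one-or-more: at least one pair is required
    auto const last = parse_int(in, pos); // starts where the failed pair started
    if (!last)
        return std::nullopt;
    skip_ws(in, pos);
    if (pos != in.size())
        return std::nullopt;
    return pairs_plus_attr{std::move(pairs), *last};
}
```

With this sketch, parse_int_pairs_plus("1 2 3") yields one pair (1, 2) followed by 3, while "1 2" fails because no trailing int remains after the pair.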

Parsers in detail

So far, parsers have been presented as somewhat abstract entities. You may be wanting more detail. A Boost.Parser parser P is an invocable object with a pair of call operator overloads. The two functions are very similar, and in many parsers one is implemented in terms of the other. The first function does the parsing and returns the default attribute for the parser. The second function does exactly the same parsing, but takes an out-param into which it writes the attribute for the parser. The out-param does not need to be the same type as the default attribute, but they need to be compatible.

Compatibility means that the default attribute is assignable to the out-param in some fashion. This usually means direct assignment, but it may also mean a tuple -> aggregate or aggregate -> tuple conversion. For sequence types, compatibility means that the sequence type has insert or push_back with the usual semantics. This means that the parser +boost::parser::int_ can fill a std::set<int> just as well as a std::vector<int>.
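One way to picture the sequence-type rule is a helper that appends with push_back() when the container has it, and falls back to insert() otherwise. The sketch below is a standard-library-only illustration under that assumption; has_push_back, append, and collect are hypothetical names, and this is not Boost.Parser's actual code.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical detection trait: true when C has a usable push_back().
template<typename C, typename = void>
struct has_push_back : std::false_type
{};

template<typename C>
struct has_push_back<
    C,
    std::void_t<decltype(std::declval<C &>().push_back(
        std::declval<typename C::value_type>()))>> : std::true_type
{};

// Adds x to the end of c: push_back() when available, otherwise insert().
template<typename C, typename T>
void append(C & c, T && x)
{
    if constexpr (has_push_back<C>::value)
        c.push_back(std::forward<T>(x));
    else
        c.insert(c.end(), std::forward<T>(x));
}

// Stand-in for a repeating parser such as +int_: it only knows how to
// append, so any container that append() supports is "compatible".
template<typename C>
C collect(std::initializer_list<int> values)
{
    C out;
    for (int v : values)
        append(out, v);
    return out;
}

// Usage: the same producer fills a std::vector and a std::set.
inline void collect_example()
{
    assert((collect<std::vector<int>>({3, 1, 3}) == std::vector<int>{3, 1, 3}));
    assert((collect<std::set<int>>({3, 1, 3}) == std::set<int>{1, 3}));
}
```

The container keeps its usual semantics, so the std::set drops the duplicate and sorts the values, while the std::vector keeps them in parse order.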

Some parsers also have additional state that is required to perform a match. For instance, char_ parsers can be parameterized with a single code point to match; the exact value of that code point is stored in the parser object.

No parser has direct support for all the operations defined on parsers (operator|, operator>>, etc.). Instead, there is a template called parser_interface that supports all of these operations. parser_interface wraps each parser, storing it as a data member, adapting it for general use. You should only ever see parser_interface in the debugger, or possibly in some of the reference documentation. You should never have to write it in your own code.
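The wrapper idea can be sketched in a few lines. In the toy model below, the bare parsers (toy_char, toy_seq, toy_alt) define no operators at all; only the wrapper toy_interface does. All of these names are hypothetical, the sketch produces no attributes, and the real parser_interface is far more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// A bare parser: knows how to match one character, but defines no operators.
struct toy_char
{
    char expected;
    bool operator()(std::string_view in, std::size_t & pos) const
    {
        if (pos < in.size() && in[pos] == expected) {
            ++pos;
            return true;
        }
        return false;
    }
};

// Sequence of two bare parsers; all-or-nothing.
template<typename A, typename B>
struct toy_seq
{
    A a;
    B b;
    bool operator()(std::string_view in, std::size_t & pos) const
    {
        std::size_t p = pos; // failure must not consume input
        if (a(in, p) && b(in, p)) {
            pos = p;
            return true;
        }
        return false;
    }
};

// Alternative of two bare parsers: tries a, then b from the same position.
template<typename A, typename B>
struct toy_alt
{
    A a;
    B b;
    bool operator()(std::string_view in, std::size_t & pos) const
    {
        return a(in, pos) || b(in, pos);
    }
};

// The wrapper: stores a bare parser and is the only type with operators.
template<typename Parser>
struct toy_interface
{
    Parser parser;
};

template<typename A, typename B>
toy_interface<toy_seq<A, B>> operator>>(toy_interface<A> a, toy_interface<B> b)
{
    return {{a.parser, b.parser}};
}

template<typename A, typename B>
toy_interface<toy_alt<A, B>> operator|(toy_interface<A> a, toy_interface<B> b)
{
    return {{a.parser, b.parser}};
}

inline toy_interface<toy_char> ch(char c) { return {{c}}; }

// Top-level entry point: succeeds only if all of the input is consumed.
template<typename Parser>
bool toy_parse(std::string_view in, toy_interface<Parser> const & p)
{
    std::size_t pos = 0;
    return p.parser(in, pos) && pos == in.size();
}

inline void toy_interface_example()
{
    auto const p = (ch('a') >> ch('b')) | (ch('a') >> ch('c'));
    assert(toy_parse("ab", p));
    assert(toy_parse("ac", p)); // the 'a' is given back after "ab" fails
    assert(!toy_parse("ad", p));
}
```

Keeping the operators on the wrapper means each new bare parser only has to implement matching; it picks up sequencing and alternation for free once wrapped.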

As described in the previous page, backtracking occurs when the parse attempts to match the current parser P, matches part of the input, but fails to match all of P. The part of the input consumed during the parse of P is essentially "given back".

This is necessary because P may consist of subparsers, and each subparser that succeeds will try to consume input, produce attributes, etc. When a later subparser fails, the parse of P fails, and the input must be rewound to where it was when P started its parse, not where the latest matching subparser stopped.

Alternative parsers will often evaluate multiple subparsers one at a time, advancing and then restoring the input position, until one of the subparsers succeeds. Consider this example.

namespace bp = boost::parser;
auto const parser = bp::repeat(53)[other_parser] | bp::repeat(10)[other_parser];

Evaluating parser means trying to match other_parser 53 times, and if that fails, trying to match other_parser 10 times. Say you parse input that matches other_parser 11 times. parser will match it. other_parser will also match 21 times during the parse: 11 times while the first alternative is tried, and 10 more times in the second.

The attributes of the repeat(53)[other_parser] and repeat(10)[other_parser] are each std::vector<ATTR(other_parser)>; let's say that ATTR(other_parser) is int. The attribute of parser as a whole is the same, std::vector<int>. Since other_parser is busy producing ints — 21 of them to be exact — you may be wondering what happens to the ones produced during the evaluation of repeat(53)[other_parser] when it fails to find all 53 inputs. Its std::vector<int> will contain 11 ints at that point.

When a repeat-parser fails, and attributes are being generated, it clears its container. This applies to parsers such as the ones above, but also all the other repeat parsers, including ones made using operator+ or operator*.

So, at the end of a successful parse by parser of 10 inputs (since the right side of the alternative only eats 10 repetitions), the std::vector<int> attribute of parser would contain 10 ints.
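The bookkeeping in this example is easy to check with a toy model. The sketch below uses only the standard library and is not Boost.Parser's implementation; toy_state, repeat_exactly, and alternative are hypothetical names, and other_parser is modeled as "match one 'x' and produce the attribute 1".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical parse state with counters for the bookkeeping.
struct toy_state
{
    std::string_view input;
    std::size_t pos = 0;
    int attempts = 0; // how many times other_parser was tried
    int matches = 0;  // how many of those attempts succeeded
};

// Stand-in for other_parser: matches a single 'x'.
inline bool other_parser(toy_state & s, int & attr)
{
    ++s.attempts;
    if (s.pos < s.input.size() && s.input[s.pos] == 'x') {
        ++s.pos;
        ++s.matches;
        attr = 1;
        return true;
    }
    return false;
}

// Models repeat(n)[other_parser]: exactly n matches, or failure. On failure
// the input position is restored and the container is cleared.
inline bool repeat_exactly(toy_state & s, int n, std::vector<int> & out)
{
    std::size_t const start = s.pos;
    for (int i = 0; i < n; ++i) {
        int attr = 0;
        if (!other_parser(s, attr)) {
            s.pos = start; // give back the consumed input
            out.clear();   // discard attributes from the failed attempt
            return false;
        }
        out.push_back(attr);
    }
    return true;
}

// Models repeat(53)[other_parser] | repeat(10)[other_parser].
inline bool alternative(toy_state & s, std::vector<int> & out)
{
    return repeat_exactly(s, 53, out) || repeat_exactly(s, 10, out);
}

inline void repeat_example()
{
    std::string const input(11, 'x'); // matches other_parser 11 times
    toy_state s{input};
    std::vector<int> out;
    assert(alternative(s, out));
    assert(out.size() == 10); // not 11, and not 21
    assert(s.matches == 21 && s.attempts == 22);
}
```

In this model the 22nd attempt is the one failed match that ends the first alternative; the other 21 attempts succeed, but only the last 10 attributes survive.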

[Note] Note

Users of Boost.Spirit may be familiar with the hold[] directive. Because of the behavior described above, there is no such directive in Boost.Parser.

Expectation points

Ok, so if parsers all try their best to match the input, and are all-or-nothing, doesn't that leave room for all kinds of bad input to be ignored? Consider the top-level parser from the Parsing JSON example.

auto const value_p_def =
    number | bp::bool_ | null | string | array_p | object_p;

What happens if I use this to parse "\""? The parse tries number, fails. It then tries bp::bool_, fails. Then null fails too. Finally, it starts parsing string. Good news, the first character is the open-quote of a JSON string. Unfortunately, that's also the end of the input, so string must fail too. However, we probably don't want to just give up on parsing string now and try array_p, right? If the user wrote an open-quote with no matching close-quote, that's not the prefix of some later alternative of value_p_def; it's ill-formed JSON. Here's the parser for the string rule:

auto const string_def = bp::lexeme['"' >> *(string_char - '"') > '"'];

Notice that operator> is used on the right instead of operator>>. This indicates the same sequence operation as operator>>, except that it also represents an expectation. If the parse before the operator> succeeds, whatever comes after it must also succeed. Otherwise, the top-level parse fails, and a diagnostic is emitted. It will say something like "Expected '"' here.", quoting the line, with a caret pointing to the place in the input where it expected the right-side match.

Choosing to use > versus >> is how you indicate to Boost.Parser that parse failure is or is not a hard error, respectively.
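The difference between a soft failure and a hard error can be modeled with ordinary C++ control flow. In the standard-library-only sketch below, a soft failure returns an empty result so the next alternative can be tried, while a failed expectation throws and aborts the whole parse. The names expectation_failure, parse_string, parse_null, and parse_value are hypothetical, and the sketch makes no claim about how Boost.Parser reports the error internally.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical error type for a failed expectation.
struct expectation_failure : std::runtime_error
{
    expectation_failure(std::string const & what, std::size_t where) :
        std::runtime_error(what), position(where)
    {}
    std::size_t position; // offset in the input where the match was expected
};

// Models '"' >> *(char_ - '"') > '"'. A missing open-quote is a soft
// failure (nullopt); a missing close-quote is a hard error (exception).
inline std::optional<std::string>
parse_string(std::string_view in, std::size_t & pos)
{
    assert(pos <= in.size());
    if (pos == in.size() || in[pos] != '"')
        return std::nullopt; // soft failure: another alternative may be tried
    std::size_t p = pos + 1;
    std::string contents;
    while (p < in.size() && in[p] != '"')
        contents += in[p++];
    if (p == in.size())
        throw expectation_failure("Expected '\"' here.", p);
    pos = p + 1;
    return contents;
}

// Models a literal keyword parser such as null; its failure is always soft.
inline bool parse_null(std::string_view in, std::size_t & pos)
{
    if (in.substr(pos, 4) != "null")
        return false;
    pos += 4;
    return true;
}

enum class outcome { matched_null, matched_string, no_match, hard_error };

// Models the top level, null | string. The hard error is caught here, which
// is where a diagnostic would be reported.
inline outcome parse_value(std::string_view in)
{
    std::size_t pos = 0;
    try {
        if (parse_null(in, pos))
            return pos == in.size() ? outcome::matched_null : outcome::no_match;
        if (parse_string(in, pos))
            return pos == in.size() ? outcome::matched_string : outcome::no_match;
        return outcome::no_match;
    } catch (expectation_failure const &) {
        return outcome::hard_error;
    }
}

inline void expectation_example()
{
    assert(parse_value("\"abc\"") == outcome::matched_string);
    assert(parse_value("abc") == outcome::no_match);     // all alternatives failed softly
    assert(parse_value("\"abc") == outcome::hard_error); // open-quote, no close-quote
}
```

The caller can tell "nothing matched" apart from "something started to match and then broke", which is the distinction that > versus >> expresses.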

When writing a parser, it often comes up that there is a set of strings that, when parsed, are associated with a set of values one-to-one. It is tedious to write parsers that recognize all the possible input strings when you have to associate each one with an attribute via a semantic action. Instead, we can use a symbol table.

Say we want to parse Roman numerals, one of the most common work-related parsing problems. We want to recognize numbers that start with any number of "M"s, representing thousands, followed by the hundreds, the tens, and the ones. Any of these may be absent from the input, but not all. Here are three Boost.Parser symbol tables that we can use to recognize ones, tens, and hundreds values, respectively:

bp::symbols<int> const ones = {
    {"I", 1},
    {"II", 2},
    {"III", 3},
    {"IV", 4},
    {"V", 5},
    {"VI", 6},
    {"VII", 7},
    {"VIII", 8},
    {"IX", 9}};

bp::symbols<int> const tens = {
    {"X", 10},
    {"XX", 20},
    {"XXX", 30},
    {"XL", 40},
    {"L", 50},
    {"LX", 60},
    {"LXX", 70},
    {"LXXX", 80},
    {"XC", 90}};

bp::symbols<int> const hundreds = {
    {"C", 100},
    {"CC", 200},
    {"CCC", 300},
    {"CD", 400},
    {"D", 500},
    {"DC", 600},
    {"DCC", 700},
    {"DCCC", 800},
    {"CM", 900}};

A symbols maps strings of char to their associated attributes. The type of the attribute must be specified as a template parameter to symbols — in this case, int.

Any "M"s we encounter should add 1000 to the result, and all other values come from the symbol tables. Here are the semantic actions we'll need to do that:

int result = 0;
auto const add_1000 = [&result](auto & ctx) { result += 1000; };
auto const add = [&result](auto & ctx) { result += _attr(ctx); };

add_1000 just adds 1000 to result. add adds whatever attribute is produced by its parser to result.

Now we just need to put the pieces together to make a parser:

using namespace bp::literals;
auto const parser =
    *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add];

We've got a few new bits in play here, so let's break it down. 'M'_l is a literal parser. That is, it is a parser that parses a literal char, code point, or string. In this case, a char 'M' is being parsed. The _l bit at the end is a UDL suffix that you can put after any char, char32_t, or char const * to form a literal parser. You can also make a literal parser by writing lit(), passing an argument of one of the previously mentioned types.

Why do we need any of this, considering that we just used a literal ',' in our previous example? The reason is that 'M' is not used in an expression with another Boost.Parser parser. It is used within *'M'_l[add_1000]. If we'd written *'M'[add_1000], clearly that would be ill-formed; char has no operator*, nor an operator[], associated with it.

[Tip] Tip

Any time you want to use a char, char32_t, or string literal in a Boost.Parser parser, write it as-is if it is combined with a preexisting Boost.Parser subparser p, as in 'x' >> p. Otherwise, you need to wrap it in a call to lit(), or use the _l UDL suffix.

On to the next bit: -hundreds[add]. By now, the use of the index operator should be pretty familiar; it associates the semantic action add with the parser hundreds. The operator- at the beginning is new. It means that the parser it is applied to is optional. You can read it as "zero or one". So, if hundreds is not successfully parsed after *'M'_l[add_1000], nothing happens, because hundreds is allowed to be missing; it's optional. If hundreds is parsed successfully, say by matching "CC", the resulting attribute, 200, is added to result inside add.
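To see how the pieces cooperate, here is a rough standard-library-only model of the same parser. It is a sketch under stated assumptions, not Boost.Parser code: toy_table, match_symbol, and parse_roman are hypothetical names, and the table lookup is assumed to prefer the longest matching key (so that "VIII" is read as 8 rather than as "V" followed by leftovers).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

using toy_table = std::map<std::string, int>;

// Models a symbol-table parser: finds the longest key that is a prefix of
// the input at pos. On success, consumes the key and returns its value.
inline std::optional<int>
match_symbol(toy_table const & table, std::string_view in, std::size_t & pos)
{
    assert(pos <= in.size());
    std::size_t best_len = 0;
    int best_value = 0;
    for (auto const & [key, value] : table) {
        if (key.size() > best_len && in.substr(pos, key.size()) == key) {
            best_len = key.size();
            best_value = value;
        }
    }
    if (best_len == 0)
        return std::nullopt;
    pos += best_len;
    return best_value;
}

inline std::optional<int> parse_roman(std::string_view in)
{
    static toy_table const hundreds = {
        {"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400}, {"D", 500},
        {"DC", 600}, {"DCC", 700}, {"DCCC", 800}, {"CM", 900}};
    static toy_table const tens = {
        {"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
        {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
    static toy_table const ones = {
        {"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
        {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};

    std::size_t pos = 0;
    int result = 0;
    // *'M'_l[add_1000]: zero or more 'M's, each worth 1000.
    while (pos < in.size() && in[pos] == 'M') {
        result += 1000;
        ++pos;
    }
    // -hundreds[add] >> -tens[add] >> -ones[add]: each table is optional,
    // so a failed lookup is simply skipped.
    for (toy_table const * table : {&hundreds, &tens, &ones}) {
        if (auto const value = match_symbol(*table, in, pos))
            result += *value;
    }
    // Like the full program: the whole input must match, and the result
    // must be nonzero.
    if (pos != in.size() || result == 0)
        return std::nullopt;
    return result;
}
```

For example, parse_roman("MCMXCIV") takes 1000 from the 'M', 900 from "CM", 90 from "XC", and 4 from "IV".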

Here is the full listing of the program. Notice that it would have been inappropriate to use a whitespace skipper here, since the entire parse is a single number, so it was removed.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a number using Roman numerals. ";
    std::string input;
    std::getline(std::cin, input);

    bp::symbols<int> const ones = {
        {"I", 1},
        {"II", 2},
        {"III", 3},
        {"IV", 4},
        {"V", 5},
        {"VI", 6},
        {"VII", 7},
        {"VIII", 8},
        {"IX", 9}};

    bp::symbols<int> const tens = {
        {"X", 10},
        {"XX", 20},
        {"XXX", 30},
        {"XL", 40},
        {"L", 50},
        {"LX", 60},
        {"LXX", 70},
        {"LXXX", 80},
        {"XC", 90}};

    bp::symbols<int> const hundreds = {
        {"C", 100},
        {"CC", 200},
        {"CCC", 300},
        {"CD", 400},
        {"D", 500},
        {"DC", 600},
        {"DCC", 700},
        {"DCCC", 800},
        {"CM", 900}};

    int result = 0;
    auto const add_1000 = [&result](auto & ctx) { result += 1000; };
    auto const add = [&result](auto & ctx) { result += _attr(ctx); };

    using namespace bp::literals;
    auto const parser =
        *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add];

    if (bp::parse(input, parser) && result != 0)
        std::cout << "That's " << result << " in Arabic numerals.\n";
    else
        std::cout << "That's not a Roman number.\n";
}

[Important] Important

symbols stores all its strings in UTF-32 internally. If you do Unicode or ASCII parsing, this will not matter to you at all. If you do non-Unicode parsing of a character encoding that is not a subset of Unicode (EBCDIC, for instance), it could cause problems. See the section on Unicode Support for more information.

Diagnostic messages

Just like with a rule, you can give a symbols a bit of diagnostic text that will be used in error messages generated by Boost.Parser when the parse fails at an expectation point, as described in Error Handling and Debugging. See the symbols constructors for details.

The previous example showed how to use a symbol table as a fixed lookup table. What if we want to add things to the table during the parse? We can do that, but we need to do so within a semantic action. First, here is our symbol table, already with a single value in it:

bp::symbols<int> const symbols = {{"c", 8}};
assert(parse("c", symbols));

No surprise that it works to use the symbol table as a parser to parse the one string in the symbol table. Now, here's our parser:

auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;

Here, we've attached the semantic action not to a simple parser like double_, but to the sequence parser (bp::char_ >> bp::int_). This sequence parser contains two parsers, each with its own attribute, so it produces two attributes as a tuple.

auto const add_symbol = [&symbols](auto & ctx) {
    using namespace bp::literals;
    // symbols::insert() requires a string, not a single character.
    char chars[2] = {_attr(ctx)[0_c], 0};
    symbols.insert(ctx, chars, _attr(ctx)[1_c]);
};

Inside the semantic action, we can get the first element of the attribute tuple using UDLs provided by Boost.Hana, and boost::hana::tuple::operator[](). The first attribute, from the char_, is _attr(ctx)[0_c], and the second, from the int_, is _attr(ctx)[1_c] (if boost::parser::tuple aliases to std::tuple, you'd use std::get or boost::parser::get instead). To add the symbol to the symbol table, we call insert().

auto const result = parse("X 9 X", parser, bp::ws);
assert(result && *result == 9);

During the parse, ("X", 9) is parsed and added to the symbol table. Then, the second 'X' is recognized by the symbol table parser. However:

assert(!parse("X", symbols));

If we parse again, we find that "X" did not stay in the symbol table. The fact that symbols was declared const might have given you a hint that this would happen.

The full program:

#include <boost/parser/parser.hpp>

#include <cassert>
#include <iostream>
#include <string>


namespace bp = boost::parser;

int main()
{
    bp::symbols<int> const symbols = {{"c", 8}};
    assert(parse("c", symbols));

    auto const add_symbol = [&symbols](auto & ctx) {
        using namespace bp::literals;
        // symbols::insert() requires a string, not a single character.
        char chars[2] = {_attr(ctx)[0_c], 0};
        symbols.insert(ctx, chars, _attr(ctx)[1_c]);
    };
    auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;

    auto const result = parse("X 9 X", parser, bp::ws);
    assert(result && *result == 9);
    (void)result;

    assert(!parse("X", symbols));
}

[Important] Important

symbols stores all its strings in UTF-32 internally. If you do Unicode or ASCII parsing, this will not matter to you at all. If you do non-Unicode parsing of a character encoding that is not a subset of Unicode (EBCDIC, for instance), it could cause problems. See the section on Unicode Support for more information.

It is possible to add symbols to a symbols permanently. To do so, you have to use a mutable symbols object s, and add the symbols by calling s.insert_for_next_parse(), instead of s.insert(). These two operations are orthogonal, so if you want to both add a symbol to the table for the current top-level parse, and leave it in the table for subsequent top-level parses, you need to call both functions.

It is also possible to erase a single entry from the symbol table, or to clear the symbol table entirely. Just as with insertion, there are versions of erase and clear for the current parse, and another that applies only to subsequent parses. The full set of operations can be found in the symbols API docs.

[Note] Note

There are two versions of each of the symbols *_for_next_parse() functions: one that takes a context, and one that does not. The one with the context is meant to be used within a semantic action. The one without the context is for use outside of any parse.
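The two kinds of insertion are easier to keep apart with a small mental model: each top-level parse works on its own copy of the table, and the symbols object holds the entries that future parses start from. The sketch below encodes that model with the standard library only. It is an assumption made for illustration, not a description of Boost.Parser's internals, and toy_symbols and toy_parse are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

using toy_table = std::map<std::string, int>;

// Hypothetical model: the symbols object owns the entries that every new
// top-level parse starts with.
struct toy_symbols
{
    toy_table initial;
};

// One top-level parse. It works on a private copy of the table.
class toy_parse
{
public:
    explicit toy_parse(toy_symbols & symbols) :
        symbols_(symbols), working_(symbols.initial)
    {}

    // Like insert(): affects the rest of this parse only.
    void insert(std::string const & key, int value) { working_[key] = value; }

    // Like insert_for_next_parse(): affects later parses only.
    void insert_for_next_parse(std::string const & key, int value)
    {
        symbols_.initial[key] = value;
    }

    // True if key would be matched during this parse.
    bool matches(std::string const & key) const
    {
        return working_.count(key) != 0;
    }

private:
    toy_symbols & symbols_;
    toy_table working_;
};

inline void toy_symbols_example()
{
    toy_symbols symbols;
    symbols.initial["c"] = 8;
    {
        toy_parse first(symbols);
        first.insert("X", 9);
        first.insert_for_next_parse("Y", 10);
        assert(first.matches("X") && !first.matches("Y"));
    }
    toy_parse second(symbols);
    assert(!second.matches("X")); // the plain insert did not persist
    assert(second.matches("Y") && second.matches("c"));
}
```

In this model, making an entry visible both now and later takes two calls, one to each function, which matches the "orthogonal" behavior described above.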

Boost.Parser comes with all the parsers most parsing tasks will ever need. Each one is a constexpr object, or a constexpr function. Some of the non-functions are also callable, such as char_, which may be used directly, or with arguments, as in char_('a', 'z'). Any parser that can be called, whether a function or callable object, will be called a callable parser from now on. Note that there are no nullary callable parsers; they each take one or more arguments.

Each callable parser takes one or more parse arguments. A parse argument may be a value or an invocable object that accepts a reference to the parse context. The reference parameter may be mutable or constant. For example:

struct get_attribute
{
    template<typename Context>
    auto operator()(Context & ctx)
    {
        return _attr(ctx);
    }
};

This can also be a lambda. For example:

[](auto const & ctx) { return _attr(ctx); }

The operation that produces a value from a parse argument, which may be a value or a callable taking a parse context argument, is referred to as resolving the parse argument. If a parse argument arg can be called with the current context, then the resolved value of arg is arg(ctx); otherwise, the resolved value is just arg.
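The resolution rule fits in a few lines of C++17. The following is a minimal sketch, assuming a hypothetical toy_context in place of the real parse context; resolve and in_range are illustrative names and are not part of Boost.Parser's public API.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the parse context.
struct toy_context
{
    char last_char;
};

// If arg can be called with the context, the resolved value is arg(ctx);
// otherwise it is arg itself.
template<typename Context, typename Arg>
auto resolve(Context & ctx, Arg const & arg)
{
    if constexpr (std::is_invocable_v<Arg const &, Context &>)
        return arg(ctx);
    else
        return arg;
}

// Models a two-argument range check in the style of char_(lo, hi): both
// bounds are parse arguments, resolved when the match is attempted.
template<typename Context, typename Lo, typename Hi>
bool in_range(Context & ctx, char c, Lo const & lo, Hi const & hi)
{
    return resolve(ctx, lo) <= c && c <= resolve(ctx, hi);
}

inline void resolve_example()
{
    toy_context ctx{'m'};
    auto const last_char = [](auto & c) { return c.last_char; };
    assert(resolve(ctx, 'a') == 'a');       // plain value: used as-is
    assert(resolve(ctx, last_char) == 'm'); // callable: called with ctx
    assert(in_range(ctx, 'k', 'a', last_char));
    assert(!in_range(ctx, 'z', 'a', last_char));
}
```

Because resolution happens at match time, changing ctx.last_char between two calls to in_range changes the upper bound without rebuilding anything.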

Some callable parsers take a parse predicate. A parse predicate is not quite the same as a parse argument, because it must be a callable object, and cannot be a value. A parse predicate's return type must be contextually convertible to bool. For example:

struct equals_three
{
    template<typename Context>
    bool operator()(Context const & ctx)
    {
        return _attr(ctx) == 3;
    }
};

This may of course be a lambda:

[](auto & ctx) { return _attr(ctx) == 3; }

The notional macro RESOLVE() expands to the result of resolving a parse argument or parse predicate. You'll see it used in the rest of the documentation.

An example of how parse arguments are used:

namespace bp = boost::parser;
// This parser matches one code point that is at least 'a', and at most
// the value of last_char, which comes from the globals.
auto last_char = [](auto & ctx) { return _globals(ctx).last_char; };
auto subparser = bp::char_('a', last_char);

Don't worry for now about what the globals are; the take-away is that you can make any argument you pass to a parser depend on the current state of the parse, by using the parse context:

namespace bp = boost::parser;
// This parser parses two code points.  For the parse to succeed, the
// second one must be >= 'a' and <= the first one.
auto set_last_char = [](auto & ctx) { _globals(ctx).last_char = _attr(ctx); };
auto parser = bp::char_[set_last_char] >> subparser;

Each callable parser returns a new parser, parameterized using the arguments given in the invocation.

This table lists all the Boost.Parser parsers. For the callable parsers, a separate entry exists for each possible arity of arguments. For a parser p, if there is no entry for p without arguments, p is a function, and cannot itself be used as a parser; it must be called. In the table below:

  • each entry is a global object usable directly in your parsers, unless otherwise noted;
  • "code point" is used to refer to the elements of the input range, which assumes that the parse is being done in the Unicode-aware code path (if the parse is being done in the non-Unicode code path, read "code point" as "char");
  • RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses);
  • "RESOLVE(pred) == true" is a shorthand notation for "RESOLVE(pred) is contextually convertible to bool and true"; likewise for false;
  • c is a character of type char, char8_t, or char32_t;
  • str is a string literal of type char const[], char8_t const [], or char32_t const [];
  • pred is a parse predicate;
  • arg0, arg1, arg2, ... are parse arguments;
  • a is a semantic action;
  • r is an object whose type models parsable_range;
  • p, p1, p2, ... are parsers; and
  • escapes is a symbols<T> object, where T is char or char32_t.
[Note] Note

The definition of parsable_range is:

template<typename T>
concept parsable_range = std::ranges::forward_range<T> &&
    code_unit<std::ranges::range_value_t<T>>;

[Note] Note

Some of the parsers in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.

Table 26.6. Parsers and Their Semantics

Parser

Semantics

Attribute Type

Notes

eps

Matches epsilon, the empty string. Always matches, and consumes no input.

None.

Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters *eps, +eps, etc. (this applies to unconditional eps only).

eps(pred)

Fails to match the input if RESOLVE(pred) == false. Otherwise, the semantics are those of eps.

None.

ws

Matches a single whitespace code point (see note), according to the Unicode White_Space property.

None.

For more info, see the Unicode properties. ws may consume one code point or two. It only consumes two code points when it matches "\r\n".

eol

Matches a single newline (see note), following the "hard" line breaks in the Unicode line breaking algorithm.

None.

For more info, see the Unicode Line Breaking Algorithm. eol may consume one code point or two. It only consumes two code points when it matches "\r\n".

eoi

Matches only at the end of input, and consumes no input.

None.

attr(arg0)

Always matches, and consumes no input. Generates the attribute RESOLVE(arg0).

decltype(RESOLVE(arg0)).

An important use case for attr is to provide a default attribute value as a trailing alternative. For instance, an optional comma-delimited list is: int_ % ',' | attr(std::vector<int>{}). Without the "| attr(...)", at least one int_ match would be required.

char_

Matches any single code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

char_(arg0)

Matches exactly the code point RESOLVE(arg0).

The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

char_(arg0, arg1)

Matches the next code point n in the input, if RESOLVE(arg0) <= n && n <= RESOLVE(arg1).

The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

char_(r)

Matches the next code point n in the input, if n is one of the code points in r.
匹配输入中的下一个代码点 n ,如果 nr 中的代码点之一。

The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.
Unicode 解析中的码点类型,或在非 Unicode 解析中的 char 。参见属性生成。

r is taken to be in a UTF encoding. The exact UTF used depends on r's element type. If you do not pass UTF encoded ranges for r, the behavior of char_ is undefined. Note that ASCII is a subset of UTF-8, so ASCII is fine. EBCDIC is not. r is not copied; a reference to it is taken. The lifetime of char_(r) must be within the lifetime of r. This overload of char_ does not take parse arguments.

cp

Matches a single code point.

char32_t

Similar to char_, but with a fixed char32_t attribute type; cp has all the same call operator overloads as char_, though they are not repeated here, for brevity.

cu

Matches a single code point.

char

Similar to char_, but with a fixed char attribute type; cu has all the same call operator overloads as char_, though they are not repeated here, for brevity. Even though the name "cu" suggests that this parser matches at the code unit level, it does not. The name refers to the attribute type generated, much like the names int_ versus uint_.

blank

Equivalent to ws - eol.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

control

Matches a single control-character code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

digit

Matches a single decimal digit code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

punct

Matches a single punctuation code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

hex_digit

Matches a single hexadecimal digit code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

lower

Matches a single lower-case code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

upper

Matches a single upper-case code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

lit(c)

Matches exactly the given code point c.

None.

lit() does not take parse arguments.

c_l

Matches exactly the given code point c.

None.

This is a UDL that represents lit(c), for example 'F'_l.

lit(r)

Matches exactly the given string r.

None.

lit() does not take parse arguments.

str_l

Matches exactly the given string str.

None.

This is a UDL that represents lit(str), for example "a string"_l.

string(r)

Matches exactly r, and generates the match as an attribute.

std::string

string() does not take parse arguments.

str_p

Matches exactly str, and generates the match as an attribute.

std::string

This is a UDL that represents string(str), for example "a string"_p.

bool_

Matches "true" or "false".

bool

bin

Matches a binary unsigned integral value.

unsigned int

For example, bin would match "101", and generate an attribute of 5u.

bin(arg0)

Matches exactly the binary unsigned integral value RESOLVE(arg0).

unsigned int

oct

Matches an octal unsigned integral value.

unsigned int

For example, oct would match "31", and generate an attribute of 25u.

oct(arg0)

Matches exactly the octal unsigned integral value RESOLVE(arg0).

unsigned int

hex

Matches a hexadecimal unsigned integral value.

unsigned int

For example, hex would match "ff", and generate an attribute of 255u.

hex(arg0)

Matches exactly the hexadecimal unsigned integral value RESOLVE(arg0).

unsigned int

ushort_

Matches an unsigned integral value.

unsigned short

ushort_(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned short

uint_

Matches an unsigned integral value.

unsigned int

uint_(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned int

ulong_

Matches an unsigned integral value.

unsigned long

ulong_(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned long

ulong_long

Matches an unsigned integral value.

unsigned long long

ulong_long(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned long long

short_

Matches a signed integral value.

short

short_(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

short

int_

Matches a signed integral value.

int

int_(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

int

long_

Matches a signed integral value.

long

long_(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

long

long_long

Matches a signed integral value.

long long

long_long(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

long long

float_

Matches a floating-point number. float_ uses parsing implementation details from Boost.Spirit. The specifics of what formats are accepted can be found in their real number parsers. Note that only the default RealPolicies is supported by float_.

float

double_

Matches a floating-point number. double_ uses parsing implementation details from Boost.Spirit. The specifics of what formats are accepted can be found in their real number parsers. Note that only the default RealPolicies is supported by double_.

double

repeat(arg0)[p]

Matches iff p matches exactly RESOLVE(arg0) times.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

The special value Inf may be used; it indicates unlimited repetition. decltype(RESOLVE(arg0)) must be implicitly convertible to int64_t. Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters repeat(Inf)[eps] (this applies to unconditional eps only).

repeat(arg0, arg1)[p]

Matches iff p matches between RESOLVE(arg0) and RESOLVE(arg1) times, inclusively.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

The special value Inf may be used for the upper bound; it indicates unlimited repetition. decltype(RESOLVE(arg0)) and decltype(RESOLVE(arg1)) each must be implicitly convertible to int64_t. Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters repeat(n, Inf)[eps] (this applies to unconditional eps only).

if_(pred)[p]

Equivalent to eps(pred) >> p.

std::optional<ATTR(p)>

It is an error to write if_(pred). That is, it is an error to omit the conditionally matched parser p.

switch_(arg0)(arg1, p1)(arg2, p2) ...

Equivalent to p1 when RESOLVE(arg0) == RESOLVE(arg1), p2 when RESOLVE(arg0) == RESOLVE(arg2), etc. If there is no such argN, the behavior of switch_() is undefined.

std::variant<ATTR(p1), ATTR(p2), ...>

It is an error to write switch_(arg0). That is, it is an error to omit the conditionally matched parsers p1, p2, ....

symbols<T>

symbols is an associative container of key, value pairs. Each key is a std::string and each value has type T. In the Unicode parsing path, the strings are considered to be UTF-8 encoded; in the non-Unicode path, no encoding is assumed. symbols matches the longest prefix pre of the input that is equal to one of the keys k. If the length len of pre is zero, and there is no zero-length key, it does not match the input. If len is positive, the generated attribute is the value associated with k.

T

Unlike the other entries in this table, symbols is a type, not an object.

quoted_string

Matches '"', followed by zero or more characters, followed by '"'.

std::string

The result does not include the quotes. A quote within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(c)

Matches c, followed by zero or more characters, followed by c.

std::string

The result does not include the c quotes. A c within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(r)

Matches some character Q in r, followed by zero or more characters, followed by Q.

std::string

The result does not include the Q quotes. A Q within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(c, symbols)

Matches c, followed by zero or more characters, followed by c.

std::string

The result does not include the c quotes. A c within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. A backslash followed by a successful match using symbols will be interpreted as the corresponding value produced by symbols. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(r, symbols)

Matches some character Q in r, followed by zero or more characters, followed by Q.

std::string

The result does not include the Q quotes. A Q within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. A backslash followed by a successful match using symbols will be interpreted as the corresponding value produced by symbols. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].


[Important]

All the character parsers, like char_, cp, and cu, produce either char or char32_t attributes. So when you see "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" in the table above, that effectively means that every sequence of character attributes gets turned into a std::string. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use attr to do so).

[Note]

A slightly more complete description of the attributes generated by these parsers is in a subsequent section. The attributes are repeated here so you can see all the properties of the parsers in one place.

If you have an integral type IntType that is not covered by any of the Boost.Parser parsers, you can use a more verbose declaration to declare a parser for IntType. If IntType were unsigned, you would use uint_parser. If it were signed, you would use int_parser. For example:

constexpr parser_interface<int_parser<IntType>> hex_int;

uint_parser and int_parser accept three more non-type template parameters after the type parameter. They are Radix, MinDigits, and MaxDigits. Radix defaults to 10, MinDigits to 1, and MaxDigits to -1, which is a sentinel value meaning that there is no max number of digits.

So, if you wanted to parse exactly eight hexadecimal digits in a row in order to recognize Unicode character literals like C++ has (e.g. \Udeadbeef), you could use this parser for the digits at the end:

constexpr parser_interface<uint_parser<unsigned int, 16, 8, 8>> hex_int;

A directive is an element of your parser that doesn't have any meaning by itself. Some are second-order parsers that need a first-order parser to do the actual parsing. Others influence the parse in some way. You can often spot a directive lexically by its use of []; directives always use []. Non-directives might, but only when attaching a semantic action.

The directives that are second-order parsers are technically directives, but since they are also used to create parsers, it is more useful just to focus on that. The directives repeat() and if_() were already described in the section on parsers; we won't say much about them here.

Interaction with sequence, alternative, and permutation parsers

Sequence, alternative, and permutation parsers do not nest in most cases. (Let's consider just sequence parsers to keep things simple, but most of this logic applies to alternative parsers as well.) a >> b >> c is the same as (a >> b) >> c and a >> (b >> c), and they are each represented by a single seq_parser with three subparsers, a, b, and c. However, if something prevents two seq_parsers from interacting directly, they will nest. For instance, lexeme[a >> b] >> c is a seq_parser containing two parsers, lexeme[a >> b] and c. This is because lexeme[] takes its given parser and wraps it in a lexeme_parser. This in turn turns off the sequence parser combining logic, since neither side of the second operator>> in lexeme[a >> b] >> c is a seq_parser. Sequence parsers have several rules that govern what the overall attribute type of the parser is, based on the positions and attributes of its subparsers (see Attribute Generation). Therefore, it's important to know which directives create a new parser (and what kind), and which ones do not; this is indicated for each directive below.

The directives
repeat()

See The Parsers And Their Uses. Creates a repeat_parser.

if_()

See The Parsers And Their Uses. Creates a seq_parser.

omit[]

omit[p] disables attribute generation for the parser p. Not only does omit[p] have no attribute, but any attribute generation work that normally happens within p is skipped.

This directive can be useful in cases like this: say you have some fairly complicated parser p that generates a large and expensive-to-construct attribute. Now say that you want to write a function that just counts how many times p can match a string (where the matches are non-overlapping). Instead of using p directly, and building all those attributes, or rewriting p without the attribute generation, use omit[].

Creates an omit_parser.

raw[]

raw[p] changes the attribute from ATTR(p) to a view that delimits the subrange of the input that was matched by p. The type of the view is subrange<I>, where I is the type of the iterator used within the parse. Note that this may not be the same as the iterator type passed to parse(). For instance, when parsing UTF-8, the iterator passed to parse() may be char8_t const *, but within the parse it will be a UTF-8 to UTF-32 transcoding (converting) iterator. Just like omit[], raw[] causes all attribute-generation work within p to be skipped.

Similar to the re-use scenario for omit[] above, raw[] could be used to find the locations of all non-overlapping matches of p in a string.

Creates a raw_parser.

string_view[]

string_view[p] is very similar to raw[p], except that it changes the attribute of p to std::basic_string_view<C>, where C is the character type of the underlying range being parsed. string_view[] requires that the underlying range being parsed is contiguous. Since this can only be detected in C++20 and later, string_view[] is not available in C++17 mode.

Similar to the re-use scenario for omit[] above, string_view[] could be used to find the locations of all non-overlapping matches of p in a string. Whether raw[] or string_view[] is more natural to use to report the locations depends on your use case, but they are essentially the same.

Creates a string_view_parser.

no_case[]

no_case[p] enables case-insensitive parsing within the parse of p. This applies to the text parsed by char_(), string(), and bool_ parsers. The number parsers are already case-insensitive. The case-insensitivity is achieved by doing Unicode case folding on the text being parsed and the values in the parser being matched (see note below if you want to know more about Unicode case folding). In the non-Unicode code path, a full Unicode case folding is not done; instead, only the transformations of values less than 0x100 are done. Examples:

#include <boost/parser/transcode_view.hpp> // For as_utfN.

namespace bp = boost::parser;
auto const street_parser = bp::string(u8"Tobias Straße");
assert(!bp::parse("Tobias Strasse" | bp::as_utf32, street_parser));             // No match.
assert(bp::parse("Tobias Strasse" | bp::as_utf32, bp::no_case[street_parser])); // Match!

auto const alpha_parser = bp::no_case[bp::char_('a', 'z')];
assert(bp::parse("a" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!
assert(bp::parse("B" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!

Everything pretty much does what you'd naively expect inside no_case[], except that the two-character range version of char_ has a limitation. It only compares a code point from the input to its two arguments (e.g. 'a' and 'z' in the example above). It does not do anything special for multi-code point case folding expansions. For instance, char_(U'ß', U'ß') matches the input U"s", which makes sense, since U'ß' expands to U"ss". However, that same parser does not match the input U"ß"! In short, stick to pairs of code points that have single-code point case folding expansions. If you need to support the multi-expanding code points, use the other overload, like: char_(U"abcd/*...*/ß").

[Note]

Unicode case folding is an operation that makes text uniformly one case. If you apply it to two pieces of text A and B, you can then compare them bitwise to see whether they are the same except for case. Case folding may sometimes expand a code point into multiple code points (e.g. case folding "ẞ" yields "ss"). When such a multi-code point expansion occurs, the expanded code points are in the NFKC normalization form.

Creates a no_case_parser.

lexeme[]

lexeme[p] disables use of the skipper, if a skipper is being used, within the parse of p. This is useful, for instance, if you want to enable skipping in most parts of your parser, but disable it only in one section where it doesn't belong. If you are skipping whitespace in most of your parser, but want to parse strings that may contain spaces, you should use lexeme[]:

namespace bp = boost::parser;
auto const string_parser = bp::lexeme['"' >> *(bp::char_ - '"') >> '"'];

Without lexeme[], our string parser would correctly match "foo bar", but the generated attribute would be "foobar".

Creates a lexeme_parser.

skip[]

skip[] is like the inverse of lexeme[]. It enables skipping in the parse, even if it was not enabled before. For example, within a call to parse() that uses a skipper, let's say we have these parsers in use:

namespace bp = boost::parser;
auto const one_or_more = +bp::char_;
auto const skip_or_skip_not_there_is_no_try = bp::lexeme[bp::skip[one_or_more] >> one_or_more];

The use of lexeme[] disables skipping, but then the use of skip[] turns it back on. The net result is that the first occurrence of one_or_more will use the skipper passed to parse(); the second will not.

skip[] has another use. You can parameterize skip with a different parser to change the skipper just within the scope of the directive. Let's say we passed ws to parse(), and we're using these parsers somewhere within that parse() call:

namespace bp = boost::parser;
auto const zero_or_more = *bp::char_;
auto const skip_both_ways = zero_or_more >> bp::skip(bp::blank)[zero_or_more];

The first occurrence of zero_or_more will use the skipper passed to parse(), which is ws; the second will use blank as its skipper.

Creates a skip_parser.

merge[], separate[], and transform(f)[]

These directives influence the generation of attributes. See the Attribute Generation section for more details on them.

merge[] and separate[] create a copy of the given seq_parser.

transform(f)[] creates a transform_parser.

Certain overloaded operators are defined for all parsers in Boost.Parser. We've already seen some of them used in this tutorial, especially operator>>, operator|, and operator||, which are used to form sequence parsers, alternative parsers, and permutation parsers, respectively.

Here are all the operators overloaded for parsers. In the tables below:

  • c is a character of type char or char32_t;
  • a is a semantic action;
  • r is an object whose type models parsable_range (see Concepts); and
  • p, p1, p2, ... are parsers.
[Note]

Some of the expressions in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.

Table 26.7. Combining Operations and Their Semantics

Expression

Semantics

Attribute Type

Notes

!p

Matches iff p does not match; consumes no input.

None.

&p

Matches iff p matches; consumes no input.

None.

*p

Parses using p repeatedly until p no longer matches; always matches.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters *eps (this applies to unconditional eps only).

+p

Parses using p repeatedly until p no longer matches; matches iff p matches at least once.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters +eps (this applies to unconditional eps only).

-p

Equivalent to p | eps.

std::optional<ATTR(p)>

p1 >> p2

Matches iff p1 matches and then p2 matches.

boost::parser::tuple<ATTR(p1), ATTR(p2)> (See note.)

>> is associative; p1 >> p2 >> p3, (p1 >> p2) >> p3, and p1 >> (p2 >> p3) are all equivalent. This attribute type only applies to the case where p1 and p2 both generate attributes; see Attribute Generation for the full rules.

p >> c

Equivalent to p >> lit(c).

ATTR(p)

p >> r

Equivalent to p >> lit(r).

ATTR(p)

p1 > p2

Matches iff p1 matches and then p2 matches. No back-tracking is allowed after p1 matches; if p1 matches but then p2 does not, the top-level parse fails.

boost::parser::tuple<ATTR(p1), ATTR(p2)> (See note.)

> is associative; p1 > p2 > p3, (p1 > p2) > p3, and p1 > (p2 > p3) are all equivalent. This attribute type only applies to the case where p1 and p2 both generate attributes; see Attribute Generation for the full rules.

p > c

Equivalent to p > lit(c).

ATTR(p)

p > r

Equivalent to p > lit(r).

ATTR(p)

p1 | p2

Matches iff either p1 matches or p2 matches.

std::variant<ATTR(p1), ATTR(p2)> (See note.)

| is associative; p1 | p2 | p3, (p1 | p2) | p3, and p1 | (p2 | p3) are all equivalent. This attribute type only applies to the case where p1 and p2 both generate attributes, and where the attribute types are different; see Attribute Generation for the full rules.

p | c

Equivalent to p | lit(c).

ATTR(p)

p | r

Equivalent to p | lit(r).

ATTR(p)

p1 || p2

Matches iff p1 matches and p2 matches, regardless of the order they match in.

boost::parser::tuple<ATTR(p1), ATTR(p2)>

|| is associative; p1 || p2 || p3, (p1 || p2) || p3, and p1 || (p2 || p3) are all equivalent. It is an error to include an eps (conditional or non-conditional) in an operator|| expression. Though the parsers are matched in any order, the attribute elements are always in the order written in the operator|| expression.

p1 - p2

Equivalent to !p2 >> p1.

ATTR(p1)

p - c

Equivalent to p - lit(c).

ATTR(p)

p - r

Equivalent to p - lit(r).

ATTR(p)

p1 % p2

Equivalent to p1 >> *(p2 >> p1).

std::string if ATTR(p1) is char or char32_t, otherwise std::vector<ATTR(p1)>

p % c

Equivalent to p % lit(c).

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

p % r

Equivalent to p % lit(r).
相当于 p % lit(r)

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

p[a]

Matches iff p matches. If p matches, the semantic action a is executed.

None.


[Important] Important

All the character parsers, like char_, cp and cu, produce either char or char32_t attributes. So when you see "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" in the table above, that effectively means that every sequence of character attributes gets turned into a std::string. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use attribute to do so).

There are a couple of special rules not captured in the table above:

First, the zero-or-more and one-or-more repetitions (operator*() and operator+(), respectively) may collapse when combined. For any parser p, +(+p) collapses to +p; **p, *+p, and +*p each collapse to just *p.

Second, using eps in an alternative parser as any alternative except the last one is a common source of errors, so Boost.Parser disallows it. The reason is that, for any parser p, eps | p is equivalent to eps: since eps always matches, p would never be tried. This restriction does not apply to eps parameterized with a condition. For any condition cond, eps(cond) is allowed to appear anywhere within an alternative parser.

[Note] Note

When looking at Boost.Parser parsers in a debugger, or when looking at their reference documentation, you may see reference to the template parser_interface. This template exists to provide the operator overloads described above. It allows the parsers themselves to be very simple — most parsers are just a struct with two member functions. parser_interface is essentially invisible when using Boost.Parser, and you should never have to name this template in your own code.

So far, we've seen several different types of attributes that come from different parsers, int for int_, boost::parser::tuple<char, int> for boost::parser::char_ >> boost::parser::int_, etc. Let's get into how this works with more rigor.

[Note] Note

Some parsers have no attribute at all. In the tables below, the type of the attribute is listed as "None." There is a non-void type that is returned from each parser that lacks an attribute. This keeps the logic simple; having to handle the two cases (void or non-void) would make the library significantly more complicated. The type of this non-void attribute associated with these parsers is an implementation detail. The type comes from the boost::parser::detail namespace and is pretty useless. You should never see this type in practice. Within semantic actions, asking for the attribute of a non-attribute-producing parser (using _attr(ctx)) will yield a value of the special type boost::parser::none. When you call parse() in a form that returns the parsed attribute and there is no attribute, it simply returns bool; this indicates the success or failure of the parse.

[Warning] Warning

Boost.Parser assumes that all attributes are semi-regular (see std::semiregular). Within the Boost.Parser code, attributes are assigned, moved, copied, and default constructed. There is no support for move-only or non-default-constructible types.

The attribute type trait, attribute

You can use attribute (and the associated alias, attribute_t) to determine the attribute a parser would have if it were passed to parse(). Since at least one parser (char_) has a polymorphic attribute type, attribute also takes the type of the range being parsed. If a parser produces no attribute, attribute will produce none, not void.

If you want to feed an iterator/sentinel pair to attribute, create a range from it like so:

constexpr auto parser = /* ... */;
auto first = /* ... */;
auto const last = /* ... */;

namespace bp = boost::parser;
// You can of course use std::ranges::subrange directly in C++20 and later.
using attr_type = bp::attribute_t<decltype(BOOST_PARSER_SUBRANGE(first, last)), decltype(parser)>;

There is no single attribute type for any parser, since a parser can be placed within omit[], which makes its attribute type none. Therefore, attribute cannot tell you what attribute your parser will produce under all circumstances; it only tells you what it would produce if it were passed to parse().

Parser attributes

This table summarizes the attributes generated for all Boost.Parser parsers. In the table below:

  • RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses); and
  • x and y represent arbitrary objects.

Table 26.8. Parsers and Their Attributes

Parser

Attribute Type

Notes

eps

None.

eol

None.

eoi

None.

attr(x)

decltype(RESOLVE(x))

char_

The code point type in Unicode parsing, or char in non-Unicode parsing; see below.

Includes all the _p UDLs that take a single character, and all character class parsers like control and lower.

cp

char32_t

cu

char

lit(x)

None.

Includes all the _l UDLs.

string(x)

std::string

Includes all the _p UDLs that take a string.

bool_

bool

bin

unsigned int

oct

unsigned int

hex

unsigned int

ushort_

unsigned short

uint_

unsigned int

ulong_

unsigned long

ulong_long

unsigned long long

short_

short

int_

int

long_

long

long_long

long long

float_

float

double_

double

symbols<T>

T


char_ is a bit odd, since its attribute type is polymorphic. When you use char_ to parse text in the non-Unicode code path (i.e. a string of char), the attribute is char. When you use the exact same char_ to parse in the Unicode-aware code path, all matching is code point based, and so the attribute type is the type used to represent code points, char32_t. All parsing of UTF-8 falls under this case.

Here, we're parsing plain chars, meaning that the parsing is in the non-Unicode code path, so the attribute of char_ is char:

auto result = parse("some text", boost::parser::char_);
static_assert(std::is_same_v<decltype(result), std::optional<char>>);

When you parse UTF-8, the matching is done on a code point basis, so the attribute type is char32_t:

auto result = parse("some text" | boost::parser::as_utf8, boost::parser::char_);
static_assert(std::is_same_v<decltype(result), std::optional<char32_t>>);

The good news is that usually you don't parse characters individually. When you parse with char_, you usually parse repetitions of them, which will produce a std::string, regardless of whether you're in Unicode parsing mode or not. If you do need to parse individual characters, and want to lock down their attribute type, you can use cp and/or cu to enforce a non-polymorphic attribute type.

Combining operation attributes

Combining operations of course affect the generation of attributes. In the tables below:

  • m and n are parse arguments that resolve to integral values;
  • pred is a parse predicate;
  • arg0, arg1, arg2, ... are parse arguments;
  • a is a semantic action; and
  • p, p1, p2, ... are parsers that generate attributes.

Table 26.9. Combining Operations and Their Attributes

Parser

Attribute Type

!p

None.

&p

None.

*p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

+p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

+*p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

*+p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

-p

std::optional<ATTR(p)>

p1 >> p2

boost::parser::tuple<ATTR(p1), ATTR(p2)>

p1 > p2

boost::parser::tuple<ATTR(p1), ATTR(p2)>

p1 >> p2 >> p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 > p2 >> p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 >> p2 > p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 > p2 > p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 | p2

std::variant<ATTR(p1), ATTR(p2)>

p1 | p2 | p3

std::variant<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 || p2

boost::parser::tuple<ATTR(p1), ATTR(p2)>

p1 || p2 || p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 % p2

std::string if ATTR(p1) is char or char32_t, otherwise std::vector<ATTR(p1)>

p[a]

None.

repeat(arg0)[p]

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

repeat(arg0, arg1)[p]

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

if_(pred)[p]

std::optional<ATTR(p)>

switch_(arg0)(arg1, p1)(arg2, p2)...

std::variant<ATTR(p1), ATTR(p2), ...>


[Important] Important

All the character parsers, like char_, cp and cu, produce either char or char32_t attributes. So when you see "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" in the table above, that effectively means that every sequence of character attributes gets turned into a std::string. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use attribute to do so).

[Important] Important

In case you did not notice it above, adding a semantic action to a parser erases the parser's attribute. The attribute is still available inside the semantic action as _attr(ctx).

There are a relatively small number of rules that define how sequence parsers and alternative parsers' attributes are generated. (Don't worry, there are examples below.)

Sequence parser attribute rules

The attribute generation behavior of sequence parsers is conceptually pretty simple:

  • the attributes of subparsers form a tuple of values;
  • subparsers that do not generate attributes do not contribute to the sequence's attribute;
  • subparsers that do generate attributes usually contribute an individual element to the tuple result; except
  • when containers of the same element type are next to each other, or individual elements are next to containers of their type, the two adjacent attributes collapse into one attribute; and
  • if the result of all that is a degenerate tuple boost::parser::tuple<T> (even if T is a type that means "no attribute"), the attribute becomes T.

More formally, the attribute generation algorithm works like this. For a sequence parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an.

We get the attribute of p by evaluating a compile-time left fold operation, left-fold({a1, a2, ..., an}, tuple<a0>, OP). OP is the combining operation that takes the current attribute type (initially boost::parser::tuple<a0>) and the next attribute type, and returns the new current attribute type. The current attribute type at the end of the fold operation is the attribute type for p.

OP attempts to apply a series of rules, one at a time. The rules are noted as X >> Y -> Z, where X is the type of the current attribute, Y is the type of the next attribute, and Z is the new current attribute type. In these rules, C<T> is a container of T; none is a special type that indicates that there is no attribute; T is a type; CHAR is a character type, either char or char32_t; and Ts... is a parameter pack of one or more types. Note that T may be the special type none. The current attribute is always a tuple (call it Tup), so the "current attribute X" refers to the last element of Tup, not Tup itself, except for those rules that explicitly mention boost::parser::tuple<> as part of X's type.

The rules that combine containers with (possibly optional) adjacent values (e.g. C<T> >> optional<T> -> C<T>) have a special case for strings. If C<T> is exactly std::string, and T is either char or char32_t, the combination yields a std::string.

Again, if the final result is that the attribute is boost::parser::tuple<T>, the attribute becomes T.

[Note] Note

What constitutes a container in the rules above is determined by the container concept:

template<typename T>
concept container = std::ranges::common_range<T> && requires(T t) {
    { t.insert(t.begin(), *t.begin()) }
        -> std::same_as<std::ranges::iterator_t<T>>;
};

Alternative parser attribute rules

The rules for alternative parsers are much simpler. For an alternative parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an. The attribute of p is std::variant<a0, a1, a2, ..., an>, with the following steps applied:

  • all the none attributes are left out, and if any are, the attribute is wrapped in a std::optional, like std::optional<std::variant</*...*/>>;
  • duplicates in the std::variant template parameters <T1, T2, ... Tn> are removed; every type that appears does so exactly once;
  • if the attribute is std::variant<T> or std::optional<std::variant<T>>, the attribute becomes instead T or std::optional<T>, respectively; and
  • if the attribute is std::variant<> or std::optional<std::variant<>>, the result becomes none instead.
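
The steps above can be modeled with a small standard-library-only metafunction. This is a sketch of the rules as stated here, not Boost.Parser's implementation: none is a local stand-in for boost::parser::none, and type_list, append_unique, collect, to_variant and alternative_attr_t are hypothetical names.

```cpp
#include <optional>
#include <type_traits>
#include <variant>

struct none {}; // stand-in for boost::parser::none (assumption of this sketch)

template<typename... Ts>
struct type_list {};

// Appends T unless it is none or already in the list (steps 1 and 2).
template<typename List, typename T>
struct append_unique;
template<typename... Ts, typename T>
struct append_unique<type_list<Ts...>, T>
{
    using type = std::conditional_t<
        std::is_same_v<T, none> || (std::is_same_v<T, Ts> || ...),
        type_list<Ts...>,
        type_list<Ts..., T>>;
};

// Left-to-right accumulation over the subparser attribute types.
template<typename List, typename... Ts>
struct collect
{
    using type = List;
};
template<typename List, typename T, typename... Ts>
struct collect<List, T, Ts...>
    : collect<typename append_unique<List, T>::type, Ts...>
{};

// Collapses degenerate variants (steps 3 and 4).
template<typename List>
struct to_variant;
template<>
struct to_variant<type_list<>>
{
    using type = none;
};
template<typename T>
struct to_variant<type_list<T>>
{
    using type = T;
};
template<typename T, typename U, typename... Ts>
struct to_variant<type_list<T, U, Ts...>>
{
    using type = std::variant<T, U, Ts...>;
};

template<typename... As>
struct alternative_attr
{
    using core =
        typename to_variant<typename collect<type_list<>, As...>::type>::type;
    static constexpr bool any_none = (std::is_same_v<As, none> || ...);
    // Wrap in std::optional only if a none was dropped and something is left.
    using type = std::conditional_t<
        any_none && !std::is_same_v<core, none>,
        std::optional<core>,
        core>;
};
template<typename... As>
using alternative_attr_t = typename alternative_attr<As...>::type;

// p | p, p1 | p2, p | eps, and an alternative with no attributes at all.
static_assert(std::is_same_v<alternative_attr_t<int, int>, int>);
static_assert(std::is_same_v<alternative_attr_t<int, double>,
                             std::variant<int, double>>);
static_assert(std::is_same_v<alternative_attr_t<int, none>,
                             std::optional<int>>);
static_assert(std::is_same_v<alternative_attr_t<none, none>, none>);
```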

Formation of containers in attributes

The rule for forming containers from non-containers is simple. You get a vector from any of the repeating parsers, like +p, *p, repeat(3)[p], etc. The value type of the vector is ATTR(p).

Another rule for sequence containers is that a value x and a container c containing elements of x's type will form a single container. However, x's type must be exactly the same as the elements in c. There is an exception to this in the special case for strings and characters noted above. For instance, consider the attribute of char_ >> string("str"). In the non-Unicode code path, char_'s attribute type is guaranteed to be char, so ATTR(char_ >> string("str")) is std::string. If you are parsing UTF-8 in the Unicode code path, char_'s attribute type is char32_t, and the special rule makes it also produce a std::string. Otherwise, the attribute for ATTR(char_ >> string("str")) would be boost::parser::tuple<char32_t, std::string>.

Again, there are no special rules for combining values and containers. Every combination results from an exact match, or falls into the string+character special case.

Another special case: std::string assignment

std::string can be assigned from a char. This is dumb. But, we're stuck with it. When you write a parser with a char attribute, and you try to parse it into a std::string, you've almost certainly made a mistake. More importantly, if you write this:

namespace bp = boost::parser;
std::string result;
auto b = bp::parse("3", bp::int_, bp::ws, result);

... you are even more likely to have made a mistake. Though this should work, because the assignment in std::string s; s = 3; is well-formed, Boost.Parser forbids it. If you write parsing code like the snippet above, you will get a static assertion. If you really do want to assign a float or whatever to a std::string, do it in a semantic action.

Examples of attributes generated by sequence and alternative parsers

In the table: a is a semantic action; and p, p1, p2, ... are parsers that generate attributes. Note that only >> is used here; > has the exact same attribute generation rules.

Table 26.10. Sequence and Alternative Combining Operations and Their Attributes

Expression

Attribute Type

eps >> eps

None.

p >> eps

ATTR(p)

eps >> p

ATTR(p)

cu >> string("str")

std::string

string("str") >> cu

std::string

*cu >> string("str")

boost::parser::tuple<std::string, std::string>

string("str") >> *cu

boost::parser::tuple<std::string, std::string>

p >> p

boost::parser::tuple<ATTR(p), ATTR(p)>

*p >> p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

p >> *p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

*p >> -p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

-p >> *p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

string("str") >> -cu

std::string

-cu >> string("str")

std::string

!p1 | p2[a]

None.

p | p

ATTR(p)

p1 | p2

std::variant<ATTR(p1), ATTR(p2)>

p | eps

std::optional<ATTR(p)>

p1 | p2 | eps

std::optional<std::variant<ATTR(p1), ATTR(p2)>>

p1 | p2[a] | p3

std::optional<std::variant<ATTR(p1), ATTR(p3)>>


Controlling attribute generation with merge[] and separate[]

As we saw in the previous Parsing into structs and classes section, if you parse two strings in a row, you get two separate strings in the resulting attribute. The parser from that example was this:

namespace bp = boost::parser;
auto employee_parser = bp::lit("employee")
    >> '{'
    >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_
    >> '}';

employee_parser's attribute is boost::parser::tuple<int, std::string, std::string, double>. The two quoted_string parsers produce std::string attributes, and those attributes are not combined. That is the default behavior, and it is just what we want for this case; we don't want the first and last name fields to be jammed together such that we can't tell where one name ends and the other begins. What if we were parsing some string that consisted of a prefix and a suffix, and the prefix and suffix were defined separately for reuse elsewhere?

namespace bp = boost::parser;
auto prefix = /* ... */;
auto suffix = /* ... */;
auto special_string = prefix >> suffix;
// Continue to use prefix and suffix to make other parsers....

In this case, we might want to use these separate parsers, but want special_string to produce a single std::string for its attribute. merge[] exists for this purpose.

namespace bp = boost::parser;
auto prefix = /* ... */;
auto suffix = /* ... */;
auto special_string = bp::merge[prefix >> suffix];

merge[] only applies to sequence parsers (like p1 >> p2), and forces all subparsers in the sequence parser to use the same variable for their attribute.

Another directive, separate[], also applies only to sequence parsers, but does the opposite of merge[]. It forces all the attributes produced by the subparsers of the sequence parser to stay separate, even if they would have combined. For instance, consider this parser.

namespace bp = boost::parser;
auto string_and_char = +bp::char_('a') >> ' ' >> bp::cp;

string_and_char matches one or more 'a's, followed by some other character. As written above, string_and_char produces a std::string, and the final character is appended to the string, after all the 'a's. However, if you wanted to store the final character as a separate value, you would use separate[].

namespace bp = boost::parser;
auto string_and_char = bp::separate[+bp::char_('a') >> ' ' >> bp::cp];

With this change, string_and_char produces the attribute boost::parser::tuple<std::string, char32_t>.

merge[] and separate[] in more detail

As mentioned previously,