
Interpretable Scientific Discovery with Symbolic Regression: A Review


Nour Makke, Sanjay Chawla

Qatar Computing Research Institute, HBKU, Doha

May 3, 2023

Abstract


Symbolic regression is emerging as a promising machine learning method for learning succinct, interpretable mathematical expressions directly from data. Whereas it has traditionally been tackled with genetic programming, it has recently gained growing interest in deep learning as a data-driven model discovery method, achieving significant advances in application domains ranging from the fundamental to the applied sciences. In this survey, we present a structured and comprehensive overview of symbolic regression methods and discuss their strengths and limitations.


Contents

1 Introduction
2 Problem Definition
2.1 Class of Function
2.2 Expression representation
3 Symbolic regression methods overview
4 Linear symbolic regression
4.1 Unidimensional case
4.1.1 Univariate function
4.1.2 Multivariate function
4.2 Multidimensional case
5 Nonlinear symbolic regression
6 Tree expression
6.1 Genetic Programming
6.2 Transformers
6.3 Reinforcement learning
7 Applications
8 Datasets
9 Discussion
10 Conclusion
A Datasets Benchmarks Equations


1 Introduction


Symbolic Regression (SR) is a rapidly growing subfield within machine learning (ML) that aims to infer symbolic mathematical expressions from data [1, 2]. Interest in SR is driven by the observation that it is not sufficient to have accurate predictive models; it is often also necessary that the learned models be interpretable [3]. A model is interpretable if the relationship between its input and output can be logically or mathematically traced in a succinct manner. In other words, learned models are interpretable if they can be expressed as mathematical equations. As disciplines become increasingly data-rich and adopt ML techniques, the demand for interpretable models is likely to grow. For example, in the natural sciences (e.g., physics), mathematical models derived from first principles make it possible to reason about the underlying phenomenon in a way that is not possible with predictive models such as deep neural networks. In critical disciplines like healthcare, non-interpretable models may never be allowed to be deployed, however accurate they may be [4].

Example: Consider a data set consisting of samples $(q_1, q_2, r, F)$, where $q_1$ and $q_2$ are the charges of two particles, $r$ is the distance between them, and $F$ is the measured force between the particles. Assume $q_1$, $q_2$, and $r$ are the input variables, and $F$ is the output variable. Suppose we model the input-output relationship as $F = \theta_0 + \theta_1 q_1 + \theta_2 q_2 + \theta_3 r$. Then, using the data set, we can infer the model's parameters $(\theta_i)$. The model will be interpretable because we will know the impact of each variable on the output. For example, if $\theta_3$ is negative, that implies that as $r$ increases, the force $F$ will decrease. From physics, we know that this form of the model is unlikely to be accurate. On the other hand, we could model the relationship using a neural network, $F = \mathrm{NN}(q_1, q_2, r, \theta)$. We expect the model to be highly accurate and predictive because neural networks (NNs) are universal function approximators. However, the model is uninterpretable because the input-output relationship is not easily apparent. The input feature vector undergoes several layers of nonlinear transformations, i.e., $y = \sigma(\sum_i W_i\, \sigma(\sum_j W_j\, \sigma(\sum_k W_k\, \sigma(W x))))$, where $\sigma$ is a nonlinear activation function and $W_{idx}$ are the learnable parameters of the NN layer of index $idx$. Such models, called "blackbox" models, do not have an internal logic that lets users understand how inputs are mathematically mapped to outputs. Explainability is the application of other methods to explain model predictions and to understand how the model is learned; it refers to why the model makes a decision the way it does. What distinguishes explainability from interpretability is that interpretable models are transparent [3]. For example, the predictions of a linear regression model can be interpreted by evaluating the relative contribution of individual features to the predictions using their weights. An ideal SR model will return the relationship as $k\, q_1 q_2 / r^2$, which is the definition of the Coulomb force between two charged particles with a constant¹ $k = 8.98 \times 10^9$. However, learning the SR model is highly non-trivial, as it involves searching over a large space of mathematical operations and identifying the right constant ($k$) that will fit the data. SR models can be directly inferred from data or can be used to "whitebox" a "blackbox" model such as a neural network.

The ultimate goal of SR is to bridge data and observations following the Keplerian trial-and-error approach [5]. Kepler developed a data-driven model for planetary motion using the most accurate astronomical measurements of the era, which resulted in elliptic orbits described by a power law. In contrast, Newton developed a dynamic relationship between physical variables that described the underlying process at the origin of these elliptic orbits. Newton's approach [6] led to three laws of motion later verified by experimental observations. Whereas both methods fit the data well, Newton's approach could be generalized to predict behavior in regimes where no data were available. Although SR is regarded as a data-driven model discovery tool, it aims to find a symbolic model that simultaneously fits the data well and can be generalized to regimes not covered by the data.

SR is deployed as an interpretable and predictive ML model or as a data-driven scientific discovery method. SR was investigated as early as 1970 in research works [7, 8, 9] aiming to rediscover empirical laws. Such works iteratively apply a set of data-driven heuristics to formulate mathematical expressions. The first AI system meant to automate scientific discovery is called BACON [10, 11]. It was developed by Patrick Langley in the late 1970s and was successful in rediscovering versions of various physical laws, such as Coulomb's law and Galileo's laws for the pendulum and constant acceleration, among many others. SR was later studied by Koza [12, 13, 1], who proposed that genetic programming (GP) can be used to discover symbolic models by encoding mathematical expressions as computational trees, where GP is an evolutionary algorithm that iteratively evolves an initial population of individuals via biology-inspired operations. SR was since then tackled with GP-based methods [1, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Moreover, it was popularized as a data-driven scientific discovery tool with the commercial software Eureqa [26], based on a research work [2]. Whereas GP-based methods achieve high prediction accuracy, they do not scale to high-dimensional data sets and are sensitive to hyperparameters [19]. More recently, SR has been addressed with deep learning-based methods [27, 28, 19, 29, 30, 31, 32, 33], which leverage neural networks (NNs) to learn accurate symbolic models. SR has been applied in the fundamental and applied sciences, such as astrophysics [34], chemistry [35, 36], materials science [37, 38], semantic similarity measurement [39], climatology [40], and medicine [41], among many others. Many of these applications are promising, showing the potential of SR. A recent SR benchmarking platform, SRBench, was introduced by La Cava et al. [42]. It comprises 14 SR methods (among which ten are GP-based), applied to 252 data sets. The goal of SRBench is to provide a benchmark for rigorous evaluation and comparison of SR methods.




¹ $k$ is the electric force (or Coulomb) constant, $k = 8.9875517923 \times 10^9\ \mathrm{kg\,m^3\,s^{-4}\,A^{-2}}$ in SI base units.





This survey aims to help researchers effectively and comprehensively understand the SR problem and how it could be solved, as well as to present the current status of the advances made in this growing subfield. The survey is structured as follows. First, we define the SR problem, present a structured and comprehensive review of methods, and discuss their strengths and limitations. Furthermore, we discuss the adoption of these SR methods across various application domains and assess their effectiveness. Along with this survey, a living review [25] aims to group state-of-the-art SR methods and applications and track advances made in the SR field. The objective is to update this list often to incorporate new research works.

This paper is organized as follows. The SR problem definition is presented in Section 2. We present an overview of methods deployed to solve the SR problem in Section 3, and the methods are discussed in detail in Sections 4, 5, and 6. Selected applications are described and discussed in Section 7. Section 8 presents an overview of existing benchmark data sets. Finally, we summarize our conclusions and discuss perspectives in Section 10.

2 Problem Definition


The problem of symbolic regression can be defined in terms of classical Empirical Risk Minimization (ERM) [43].

Data: Given a data set $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ is the input vector and $y_i \in \mathbb{R}$ is a scalar output.

Function Class: Let $\mathcal{F}$ be a function class consisting of mappings $f: \mathbb{R}^d \to \mathbb{R}$.

Loss Function: Define the loss function for every candidate $f \in \mathcal{F}$:

$$l(f) := \sum_{i=1}^{n} l(f(x_i), y_i) \tag{1}$$

A common choice is the squared difference between the output and the prediction, i.e., $l(f) = \sum_i (y_i - f(x_i))^2$.

Optimization: The optimization task is to find the function $f^*$ over the set of functions $\mathcal{F}$ that minimizes the loss function:

$$f^* = \operatorname*{argmin}_{f \in \mathcal{F}} l(f) \tag{2}$$

As discussed below, what distinguishes SR from conventional regression problems is the discrete nature of the function class $\mathcal{F}$. Different methods for solving the SR problem reduce to different characterizations of the function class.

2.1 Class of Function


In SR, to define $\mathcal{F}$, we specify a library of elementary arithmetic operations, mathematical functions, and variables; $\mathcal{F}$ is then the set of all functions that can be obtained by function composition within the library [44]. For example, consider the library:

$$L = \{\mathrm{id}(\cdot),\ \mathrm{add}(\cdot,\cdot),\ \mathrm{sub}(\cdot,\cdot),\ \mathrm{mul}(\cdot,\cdot),\ +1,\ -1\} \tag{3}$$

Then the set of all polynomials (in one variable $x$) with integer coefficients can be derived from $L$ using function composition.
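As a concrete illustration, function composition over this library can be mimicked in a few lines of Python. This is our own sketch (the paper provides no code), with hypothetical helper names:

```python
# Minimal sketch of function composition over L = {id, add, sub, mul, +1, -1}
# of Eq. 3. Each library element is a function of x; composition yields
# integer-coefficient polynomials.

def id_(x):     return x
def neg_one(x): return -1            # the constant -1
def add(f, g):  return lambda x: f(x) + g(x)
def sub(f, g):  return lambda x: f(x) - g(x)
def mul(f, g):  return lambda x: f(x) * g(x)

# Compose f(x) = x^2 + x - 1 purely from library elements:
f = add(mul(id_, id_), add(id_, neg_one))
print([f(x) for x in range(4)])      # [-1, 1, 5, 11]
```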

2.2 Expression representation


It is convenient to express symbolic expressions in a sequential form using either a unary-binary expression tree or polish notation [45]. For example, the expression $f(x) = x_1 x_2^2 - x_3$ can be derived using function composition from $L$ (Eq. 3) and represented as the tree-like structure illustrated in Figure 1a. By traversing the (binary) tree top to bottom and left to right in a depth-first manner, we can represent the same expression as a unique sequence called the polish form, as illustrated in Figure 1b.





Figure 1: (a) Example of a unary-binary tree that encodes $f(x) = x_1 x_2^2 - x_3$. (b) Sequence representation of the tree-like structure of $f(x)$.


In practice, the library $L$ includes many other common elementary mathematical functions, including the basic trigonometric functions such as sine and cosine, the logarithm, the exponential, the square root, the power law, etc. Prior domain knowledge is advantageous for library definition because it reduces the search space to include only the mathematical operations most relevant to the studied problem. Furthermore, a large range of possible numeric constants should be expressible. For example, numbers in base-10 floating-point notation rounded to four significant digits can be represented as a triple of (sign, mantissa, exponent) [31]. The function $\sin(3.456\, x)$, for example, can be represented as $[\sin,\ \mathrm{mul},\ 3456,\ \mathrm{E{-}3},\ x]$.
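Both the tree-to-sequence conversion and the constant encoding can be sketched in a few lines. The following is our own illustrative Python, with hypothetical helper names:

```python
import math

def to_polish(node):
    """Depth-first, left-to-right serialization of a tree into polish form.
    A node is either a token string or a tuple (token, child1, child2, ...)."""
    if isinstance(node, str):
        return [node]
    tokens = [node[0]]
    for child in node[1:]:
        tokens += to_polish(child)
    return tokens

# f(x) = x1 * x2^2 - x3 as a unary-binary tree:
expr = ("sub", ("mul", "x1", ("mul", "x2", "x2")), "x3")
print(to_polish(expr))   # ['sub', 'mul', 'x1', 'mul', 'x2', 'x2', 'x3']

def encode_constant(c, digits=4):
    """Encode a non-zero constant as (sign, mantissa, exponent), rounded to
    `digits` significant digits, following the scheme of [31]."""
    sign = "+" if c >= 0 else "-"
    c = abs(c)
    exponent = math.floor(math.log10(c)) - (digits - 1)
    mantissa = round(c / 10 ** exponent)
    return (sign, mantissa, exponent)

print(encode_constant(3.456))   # ('+', 3456, -3), i.e. tokens [3456, E-3]
```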

3 Symbolic regression methods overview


In this survey, we categorize SR methods in the following manner: regression-based methods, expression tree-based methods, and physics-inspired and mathematics-inspired methods, as presented in Figure 2. For each category, a summary of the mathematical tool, the expression form, the set of unknowns, and the search space is presented in Table 1.

The linear method defines the functional form as a linear combination of nonlinear functions of $x$ drawn from the predefined library $L$. Linear models are expressed as:

$$f(x, \theta) = \sum_j \theta_j h_j(x) \tag{4}$$

where $j$ spans the basis functions of $L$. The optimization problem reduces to finding the set of parameters $\{\theta\}$ that minimizes the loss function defined over a continuous parameter space $\Theta = \mathbb{R}^M$ as follows:

$$\theta^* = \operatorname*{argmin}_{\theta \in \Theta} \sum_i l(f(x_i, \theta), y_i) \tag{5}$$

This method has the advantage of being deterministic, and the disadvantage of imposing a single model structure that remains fixed during training, when only the model's parameters are learned.

The nonlinear method defines the model structure by a neural network. Nonlinear models can thus be expressed as:

$$f(x, W) = \sigma\!\left(\sum_i W_i\, \sigma\!\left(\sum_j W_j\, \sigma(W x)\right)\right) \tag{6}$$

where $\sigma$ is a nonlinear activation function, and $W_{idx}$ are the learnable parameters of the NN layer of index $idx$. Similarly to the linear method, the optimization problem reduces to finding the set of parameters $\{W, b\}$ of the neural network layers that minimizes the loss function over the space of real values.

Expression tree-based methods treat mathematical expressions as unary-binary trees whose internal nodes are operators and whose terminals are operands (variables or constants). This category comprises GP-based, deep neural transformer, and reinforcement learning-based methods. In GP-based methods, a set of transition rules (e.g., mutation, crossover) is defined over the tree space and applied to an initial population of trees throughout many iterations until the loss function is minimized. Transformers [46] represent a novel neural network architecture (encoder and decoder) that uses the attention mechanism, which was primarily introduced to capture long-range dependencies in a sentence. Transformers were designed to operate on sequential data and to perform sequence-to-sequence (seq2seq) tasks. For their use in SR, input data points $(x, y)$ and symbolic expressions $(f)$ are encoded as sequences, and transformers perform set-to-sequence tasks. The unknowns are the weight parameters of the encoder and the decoder. Reinforcement learning (RL) is a machine learning method that seeks to learn a policy $\pi(x|\theta)$ by training an agent to perform a task by interacting with its environment in discrete time steps. An RL setting requires four components: a state space, an action space, state transition probabilities, and a reward. The agent selects an action that is sent to the environment. A reward and a new state are sent back to the agent from its environment and used by the agent to improve its policy at the next time step. In the context of SR, a symbolic expression (sequence) represents a state, predicting an element in a sequence represents an action, the parent and sibling represent the environment, and the reward is commonly chosen as the mean square error (MSE). RL-based SR methods are commonly hybrid and use various ML tools (e.g., NN, RNN, etc.) jointly with RL.





Figure 2: Taxonomy based on the type of symbolic regression method. $\phi$ denotes a neural network function, and $W$ denotes the set of learnable parameters in the NN. $x$ denotes the input data, $z$ denotes a reduced representation of $x$, and $x'$ denotes a new representation of $x$, e.g., obtained by defining new features based on the original ones. $T$ represents the final population of selected expression trees in genetic programming.


4 Linear symbolic regression


The linear approach assumes, by definition, that the target symbolic expression $f(x)$ is a linear combination of nonlinear functions of the feature attributes:

$$f(x) = \sum_j \theta_j h_j(x) \tag{7}$$

Here $x$ denotes the input feature vector, $\theta_j$ denotes a weight coefficient, and $h_j(\cdot)$ denotes a unary operator of the library $L$. This approach predefines the model's structure and reduces the SR problem to learning only the model's parameters by solving a system of linear equations. The particular case where $f(x)$ is a linear combination of monomials reduces to a conventional linear regression problem, i.e., $f(x) = \sum_j \theta_j x^j = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots$. There exist two cases for this problem: (1) a unidimensional case defined by $f: \mathbb{R}^d \to \mathbb{R}$; and (2) a multidimensional case defined by $f: \mathbb{R}^d \to \mathbb{R}^m$, with $d$ the number of input features and $m$ the number of variables required for a complete description of a system; for example, the Lorenz system for fluid flow is defined in terms of three physical variables which depend on time.



Table 1: Table summarizing symbolic regression methods. The mathematical tool, the expression form, the set of unknowns, and the search space are specified for each method. Set2seq abbreviates "set-to-sequence".

| Method | Tool | Expression form | Unknown | Search space |
|---|---|---|---|---|
| Linear SR | Uni-D linear system | $y = \sum_i \theta_i f_i(x)$ | $\{\theta\}_i$ | $\mathbb{R}$ |
| | Multi-D linear system | $y_i = \sum_j \theta_j f_j(x)$ | $(\{\theta\}_i)_j$ | $\mathbb{R}$ |
| Nonlinear SR | Neural network | $y = f(Wx + b)$ | $\{W, b\}$ | $\mathbb{R}$ |
| Expression-tree search | Genetic programming | Expression tree | trees | trees |
| | Transformers | set2seq mapping | $\{W_q, W_k, W_v\}$ | $\mathbb{R}$ |
| | Reinforcement learning | set2seq mapping | $\pi(\theta)$ | $\mathbb{R}$ |
| Physics-inspired | AI-Feynman | $y = f(x, \theta)$ | - | - |
| Mathematics-inspired | Symbolic metamodels | $G(x, \theta)$ | $\theta$ | $\mathbb{R}$ |


4.1 Unidimensional case


Given a data set $D = \{(x_i, y_i)\}_{i=1}^{n}$, the mathematical expression could be either univariate $(x_i \in \mathbb{R},\ y_i = f(x_i))$ or multivariate $(x_i \in \mathbb{R}^d,\ y_i = f(x_i))$. The methodology of linear SR is presented in detail for the univariate case in Section 4.1.1 for simplicity and is extended to the multivariate case in Section 4.1.2.

4.1.1 Univariate function


Data set: $D = \{x_i \in \mathbb{R};\ y_i = f(x_i)\}$.
Library: $L$ can include any number of mathematical operators such that the dimension of the data set is always greater than the dimension of the library matrix (see discussion below).

In this approach, a coefficient $\theta_j$ is assigned to each candidate function $(f_j(\cdot) \in L)$ as an activeness criterion such that:

$$y = \sum_j \theta_j f_j(x) \tag{8}$$

Applying Eq. 8 to the input-output pairs $(x_i, y_i)$ yields a system of linear equations as follows:

$$y_1 = \theta_0 + \theta_1 f_1(x_1) + \theta_2 f_2(x_1) + \cdots + \theta_k f_k(x_1)$$
$$y_2 = \theta_0 + \theta_1 f_1(x_2) + \theta_2 f_2(x_2) + \cdots + \theta_k f_k(x_2) \tag{9}$$
$$\vdots$$
$$y_n = \theta_0 + \theta_1 f_1(x_n) + \theta_2 f_2(x_n) + \cdots + \theta_k f_k(x_n)$$

which can be represented in a matrix form as:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & f_1(x_1) & f_2(x_1) & \cdots & f_k(x_1) \\ 1 & f_1(x_2) & f_2(x_2) & \cdots & f_k(x_2) \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & f_1(x_n) & f_2(x_n) & \cdots & f_k(x_n) \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_k \end{bmatrix} \tag{10}$$

Equation 10 can then be presented in a compact form:

$$Y = U(X)\,\Theta \tag{11}$$


where $\Theta \in \mathbb{R}^{(k+1)}$ is the sparse vector of coefficients, and $U \in \mathbb{R}^{n \times (k+1)}$ is the library matrix, which can be represented as a function of the input vector $X$ as follows:

$$U(X) = \begin{bmatrix} 1 & f_1(X) & f_2(X) & \cdots & f_k(X) \end{bmatrix} \tag{12}$$

Example: For a library defined as:

$$L = \{1,\ x,\ (\cdot)^2,\ \sin(\cdot),\ \cos(\cdot),\ \exp(\cdot)\} \tag{13}$$

The matrix $U$ becomes:

$$U(X) = \begin{bmatrix} 1 & X & X^2 & \sin(X) & \cos(X) & \exp(X) \end{bmatrix}$$

Each row (of index $i$) in Eq. 12 is a vector of $(k+1)$ functions of $x_i$. The vector of coefficients, i.e., the model's parameters, is obtained by solving Eq. 11 as follows²:

$$\Theta = (U^T U)^{-1} U^T Y \tag{14}$$

The magnitude of a coefficient $\theta_k$ effectively measures the size of the contribution of the associated function $f_k(\cdot)$ to the final prediction. Finally, the prediction vector $\hat{Y}$ can be evaluated using Eq. 11.

An exemplary schematic is illustrated in Figure 3 for the univariate function $f(x) = 1 + \alpha x^3$. Only the coefficients associated with the functions $\{1, x^3\}$ of the library are non-zero, with values equal to $1$ and $\alpha$, respectively.




Figure 3: Schematic of the system of linear equations of Eq. 11 for $f(x) = 1 + \alpha x^3$. A library matrix $U(X)$ of nonlinear functions of the input is constructed, where $L = \{1, x, x^2, x^3, \ldots\}$. The marked entries in the $\Theta$ vector denote the non-zero coefficients determining which functions of the library are active.
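This construction translates directly into a short least-squares computation. The following sketch (our own illustration, under the setup of Figure 3) builds $U(X)$ for $L = \{1, x, x^2, x^3\}$ and solves Eq. 14 via the pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.5
x = rng.uniform(-1, 1, 20)     # twenty points sampled from U(-1, 1)
y = 1 + alpha * x**3           # ground truth f(x) = 1 + alpha * x^3

# Library matrix U(X) of Eq. 12 for L = {1, x, x^2, x^3}
U = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Eq. 14, computed via the pseudo-inverse U+ (see footnote 2)
theta = np.linalg.pinv(U) @ y
print(np.round(theta, 6))      # -> [1.  0.  0.  2.5]
```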


In the following, linear SR is tested on synthetic data. In each experiment, training and test data sets are generated. Each set consists of twenty data points randomly sampled from a uniform distribution $U(-1, 1)$, and $y$ is evaluated using a univariate function, i.e., $D = \{(x_i, f(x_i))\}_{i=1}^{n}$. Two libraries are considered in these experiments: $L_1 = \{x, (\cdot)^2, (\cdot)^3, \ldots, (\cdot)^9\}$ and $L_2 = L_1 \cup \{\sin(\cdot), \cos(\cdot), \tan(\cdot), \exp(\cdot), \mathrm{sigmoid}(\cdot)\}$. The results are reported in terms of the output expression (Eq. 7) and the coefficient of determination $R^2$. SR problems are grouped into (i) pure polynomial functions and (ii) mixed polynomial and trigonometric functions. In each experiment, the parameters are learned using the training data set, and results are reported for the test data set in Table 2.


For polynomial functions, an exact output is obtained using $L_1$ with $R^2 = 1.0$, whereas only an approximate output is obtained using $L_2$. In the latter case, the quality of the fit depends on the size of the training data set. An exemplary result is shown in Figure 4 for $f(x) = x + x^2 + x^3$. Points represent the (test) data of the input file, i.e., $X$; the red curve represents $f(x)$ as a function of $x$, and the blue and black dashed curves represent the predicted function $\hat{f}(x)$ obtained using $L_1$ and $L_2$, respectively. An exact match between the ground-truth function and the predicted one is found using $L_1$, whereas a significant discrepancy is obtained using $L_2$. This discrepancy could be explained by the fact that various functions in $L_2$ exhibit the same $x$-dependence over the covered $x$-range.




² Technically, the pseudo-inverse $U^{+}$.






Table 2: Results of linear SR in the case of univariate functions. $D = \{(x_i; y_i)\}$; $x_i \sim U(-1, 1, 20)$ and $y_i = f(x_i)$. $L_1 = \{x, (\cdot)^2, \ldots, (\cdot)^9\}$ and $L_2 = L_1 \cup \{\sin(\cdot), \cos(\cdot), \tan(\cdot), \exp(\cdot), \mathrm{sigmoid}(\cdot)\}$. T denotes True, and F denotes False.

| Benchmark | Expression | L1: Exp | L1: R² | L2: Exp | L2: R² |
|---|---|---|---|---|---|
| Nguyen-2 | $x^4 + x^3 + x^2 + x$ | T | 1.0 | F | 0.886 |
| Nguyen-3 | $x^5 + x^4 + x^3 + x^2 + x$ | T | 1.0 | F | 0.867 |
| Livermore-21 | $x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x$ | T | 1.0 | F | 0.869 |
| Livermore-9 | $x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x$ | T | 1.0 | F | 0.882 |
| Livermore-6 | $x + 2x^2 + 3x^3 + 4x^4$ | T | 1.0 | F | 0.417 |
| Livermore-19 | $x + x^2 + x^4 + x^5$ | T | 1.0 | F | -0.079 |
| Livermore-14* | $x + x^2 + x^3 + \sin(x)$ | F | 1.0 | F | -0.857 |
| Nguyen-5 | $\sin(x^2)\cos(x) - 1$ | F | 0.999 | F | -3.97 |
| Nguyen-6 | $\sin(x) + \sin(x + x^2)$ | F | 0.999 | F | 0.564 |



Figure 4: Result of linear SR for the Nguyen-1 benchmark, i.e., $f(x) = x + x^2 + x^3$. Red points represent the (test) data set. The red curve represents the true function. The blue and black dashed curves represent the learned functions using $L_1$ and $L_2$, respectively.


For mixed polynomial and trigonometric expressions, neither library choice produces the exact expression. However, a better $R^2$ coefficient is obtained using $L_1$. In the case of the Nguyen-5 benchmark, for example, i.e., $f(x) = \sin(x^2)\cos(x) - 1$, the resulting function is the Taylor expansion of $f$:

$$\hat{y}(x) \approx -1 + 0.9\, x^2 - 0.5\, x^4 - 0.13\, x^6 + O(x^8)$$

In conclusion, this approach cannot learn the ground-truth function when the latter is a product of two functions (i.e., $f(x) = f_1(x) \cdot f_2(x)$) or when it has a multiplicative or an additive factor applied to the variable (e.g., $\sin(\alpha + x)$, $\exp(\lambda x)$, etc.). In the best case, it outputs an approximation of the ground-truth function. Furthermore, this approach fails to predict the correct mathematical expression when the library is extended to include a mixture of polynomial, trigonometric, exponential, and logarithmic functions.

4.1.2 Multivariate function


For a given data set $D = \{x_i \in \mathbb{R}^d;\ y_i = f(x_1, \ldots, x_d)\}$, where $d$ is the number of features, the same equations presented in Section 4.1.1 are applicable. However, the dimension of the library matrix $U$ changes to account for the dimension of the feature vector. For example, for the same library shown in Eq. 13 and a two-dimensional feature vector, i.e., $X \in \mathbb{R}^2$, $U(X)$ becomes:

$$U(X) = \begin{bmatrix} 1 & X & X_{P_2} & \sin(X) & \cos(X) & \exp(X) \end{bmatrix} \tag{15}$$
$$= \begin{bmatrix} 1 & x_1 & x_2 & x_1^2 & x_1 x_2 & x_2^2 & \sin(x_1) & \sin(x_2) & \cdots \end{bmatrix}$$

Here, $X_{P_q}$ denotes the polynomials in $X$ of order $q$.

Table 3 presents the results of the experiments performed on functions of two variables, i.e., $f(x_1, x_2)$. Similarly to Section 4.1.1, training and test data sets are generated by randomly sampling twenty pairs of points $(x_1, x_2)$ from a uniform distribution $U(-1, 1)$ such that $D = \{(x_{1i}, x_{2i}, f(x_{1i}, x_{2i}))\}_{i=1}^{n}$. The same choices for the library are considered: $L_1 = \{x, (\cdot)^2, \ldots, (\cdot)^9\}$ and $L_2 = L_1 \cup \{\sin(\cdot), \cos(\cdot), \tan(\cdot), \exp(\cdot), \mathrm{sigmoid}(\cdot)\}$. An exact match between the ground truth and the predicted function is obtained using $L_1$ for any polynomial function, whereas only approximate solutions are obtained for trigonometric functions. Using $L_2$, the results are approximations of the ground-truth function.

Table 3: Results for multivariate functions using linear SR. $D = \{(x_1, x_2) \sim U(-1, 1, 20);\ y = f(x_1, x_2)\}$. $L_1 = \{x, (\cdot)^2, \ldots, (\cdot)^9\}$ and $L_2 = L_1 \cup \{\sin(\cdot), \cos(\cdot), \tan(\cdot), \exp(\cdot), \mathrm{sigmoid}(\cdot)\}$. T and F refer to True and False.


| Benchmark | Expression | L1: Result | L1: R² | L2: Result | L2: R² |
|---|---|---|---|---|---|
| Nguyen-12 | $x_1^4 - x_1^3 + \frac{1}{2} x_2^2 - x_2$ | T | 1.0 | F | 1 |
| Livermore-5 | $x_1^4 - x_1^3 + x_1^2 - x_2$ | T | 1.0 | F | 1 |
| Nguyen-9 | $\sin(x_1) + \sin(x_2^2)$ | F | 1 | F | 1 |
| Nguyen-10 | $2\sin(x_1)\cos(x_2)$ | F | 1 | F | 1 |


Furthermore, linear SR is tested on a data set generated using a two-dimensional multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$, as shown in Figure 5. Different analytic expressions for $f(x_1, x_2)$ were tested with the different library bases summarized in Table 4, including pure polynomial basis functions, polynomial and trigonometric basis functions, and a mixed library.




Figure 5: Two-dimensional multivariate normal distribution used in test applications.


The function $y_1 = \cos(x_1) + \sin(x_2)$ is explored with all three bases. In the case of a pure polynomial basis, the correct terms of the Taylor expansions of both $\cos(x_1)$ and $\sin(x_2)$ are identified with only approximate values of their coefficients, i.e., $\hat{y}_1 = (0.88 - 0.3\, x_1^2 + 0.01\, x_1^4) + (0.97\, x_2 - 0.2\, x_2^3)$, which is reflected in the significantly high reconstruction error of the order of $30\%$. In both bases in which trigonometric functions are enlisted, the correct terms $\cos(x_1)$ and $\sin(x_2)$ are identified with an excellent reconstruction error of $\sim 10^{-7}$. Note that the lowest reconstruction error is obtained for the library U2, which has the smallest number of operations and, consequently, the lowest number of coefficients.



Table 4: Library bases used in the test applications.

| Name | List of functions |
|---|---|
| U1 | $1,\ X,\ X_{P_2},\ X_{P_3},\ X_{P_4}$ |
| U2 | $1,\ X,\ X_{P_2},\ \cos(X),\ \sin(X)$ |
| U3 | $1,\ X,\ X_{P_2},\ \cos(X),\ \sin(X),\ \tan(X),\ \exp(X),\ \mathrm{sigmoid}(X)$ |


The function $y_2 = x_1^2 + \cos(x_2)$ is also tested. For the pure polynomial basis, the reconstructed function $\hat{y}_2 = x_1^2 + (0.83 + 0.49\, x_2 - x_2^2)$ predicts approximate values with a reconstruction error of $1\%$. An excellent prediction is made for the other bases, which enlist both operations appearing in $y_2(x_1, x_2)$.

In the same exercise, a more complicated functional form is tested that includes mixed terms, i.e., $y_3 = x_1(1 + x_2) + \cos(x_1)\sin(x_2)$. The difference between the true and the predicted function is illustrated in Figure 6. The linear approach performs similarly for all three library bases. A low reconstruction error cannot be obtained because the mixed term $\cos(x_1)\sin(x_2)$ in $y_3$ is not enlisted in any of the libraries, showing an important limitation of the current approach.




Figure 6: Difference between the true ($y$) and predicted ($\hat{y}$) values of the function $y = x_1(1 + x_2) + \cos(x_1)\sin(x_2)$, for the three libraries defined in Table 4: U1 (left), U2 (center), U3 (right).


4.2 Multidimensional case


The target mathematical expression comprises $m$ components, i.e., $Y = [y_1, \ldots, y_m]$, and the goal is to learn the coefficients of a system of linear equations rather than a single mathematical expression. Each component $(y_j)$ is described by:

$$y_j = f_j(x) = \sum_k \theta_{jk} h_k(x) \tag{16}$$

In this case, there exist $m$ sparse vectors of coefficients, i.e., $\Theta = [\theta_1 \cdots \theta_m]$. Consider the Lorenz system, which is a set of ordinary differential equations that captures nonlinearities in the dynamics of fluid convection. It consists of three variables $\{x_1, x_2, x_3\}$ and their first-order derivatives with respect to time $\{dx_1/dt,\ dx_2/dt,\ dx_3/dt\}$, which we will refer to as $\{y_1, y_2, y_3\}$. Using the library of Eq. 13, the system of linear equations is represented in matrix form as follows:

$$\begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix} = \begin{bmatrix} 1 & x_1 & x_2 & x_1^2 & x_1 x_2 & x_2^2 & \cdots & \exp(x_2) \end{bmatrix} \begin{bmatrix} \theta_1 & \theta_2 & \theta_3 \end{bmatrix} \tag{17}$$


Here, $Y \in \mathbb{R}^{n \times 3}$, $U(X) \in \mathbb{R}^{n \times k}$, and $\Theta \in \mathbb{R}^{k \times 3}$, where $n$ is the size of the input data and $k$ is the number of columns in the library matrix $U$. The $j$-th component of the $Y$ vector is given by:

$$y_j = \theta_{j,0} + \theta_{j,1} x_1 + \theta_{j,2} x_2 + \theta_{j,3} x_1^2 + \cdots + \theta_{j,k} \exp(x_2) \tag{18}$$

Equation 17 can be written in a compact form as:

$$y_k = U(x^T)\,\theta_k \tag{19}$$

The application presented in [33] uses this approach, where the authors aim to learn the differential equations that govern the dynamics of a given system, such as a nonlinear pendulum and the Lorenz system. The approach successfully learned the exact weights, allowing the correct governing equations to be recovered.

An exemplary schematic is illustrated in Figure 7 for the Lorenz system defined by $\dot{x} = \sigma(y - x)$, $\dot{y} = x(\rho - z) - y$, $\dot{z} = xy - \beta z$. Here $x$, $y$, and $z$ are physical variables, and $\dot{x}$, $\dot{y}$, and $\dot{z}$ are their respective time derivatives. Only the coefficients associated with the functions $\{x_1, x_1 x_2, \ldots\}$ should be non-zero and equal to the factors shown in the Lorenz system's set of equations.




Figure 7: Schematic of the system of Eq. 11 for the Lorenz system defined by $y_1 = \sigma(x_2 - x_1)$, $y_2 = x_1(\rho - x_3) - x_2$, $y_3 = x_1 x_2 - \beta x_3$. A library $U(X)$ of nonlinear functions of the input is constructed. The marked entries in the $\theta$ vectors denote the non-zero coefficients determining which library functions are active for each of the three variables $\{y_1, y_2, y_3\}$.
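A minimal sketch of this multidimensional case, in the spirit of SINDy [51] (our own illustration, not the authors' code), simulates the Lorenz system, builds a degree-2 polynomial library, and recovers the sparse coefficient vectors by sequentially thresholded least squares:

```python
import numpy as np
from scipy.integrate import solve_ivp

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
def lorenz(t, s):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

t = np.linspace(0, 10, 5000)
sol = solve_ivp(lorenz, (0, 10), [1.0, 1.0, 1.0], t_eval=t)
X = sol.y.T                                    # states, shape (n, 3)
Y = np.array([lorenz(0, s) for s in X])        # exact time derivatives

# Library U(X): polynomials up to degree 2 (a truncation of Eq. 17)
x1, x2, x3 = X.T
U = np.column_stack([np.ones(len(X)), x1, x2, x3,
                     x1 * x1, x1 * x2, x1 * x3, x2 * x2, x2 * x3, x3 * x3])

# Sequentially thresholded least squares: solve, zero small terms, refit
Theta = np.linalg.lstsq(U, Y, rcond=None)[0]   # shape (10, 3)
for _ in range(10):
    Theta[np.abs(Theta) < 0.1] = 0.0
    for j in range(3):
        idx = np.abs(Theta[:, j]) > 0.0
        Theta[idx, j] = np.linalg.lstsq(U[:, idx], Y[:, j], rcond=None)[0]
print(np.round(Theta, 2))   # non-zero entries reproduce sigma, rho, beta
```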


In summary, the linear approach is only successful in particular cases and cannot be generalized. Its main limitation lies in predefining the model's structure as a linear combination of nonlinear functions, reducing the SR problem to solving a system of linear equations. In contrast, the main mission of SR is to learn both the model's structure and its parameters. A direct consequence of this limitation is that the linear approach fails to learn expressions in many cases: (i) products of functions (e.g., $f(x) = f_1(x) \cdot f_2(x)$); (ii) multivariate functions (e.g., $\exp(xy)$, $\tan(x + y)$, etc.); and (iii) functions including multiplicative or additive factors applied to their arguments (e.g., $\exp(\lambda x)$). Finally, the dimension of the library matrix can be challenging in terms of computing resources for extended libraries and high-dimensional data sets.

5 Nonlinear symbolic regression


The nonlinear method uses deep neural networks (DNN), known for their great ability to detect and learn complex patterns directly from data.

DNNs have the advantage of being fully differentiable in their free parameters, allowing end-to-end training using back-propagation. This approach searches for the target expression by replacing the standard activation functions in a neural network with elementary mathematical operations. Figure 8 shows an NN-based architecture for SR called the Equation Learner (EQL) network, proposed by Martius and Lampert [47], in comparison with a standard NN. Only two hidden layers are shown for simple visualization, but the network's depth is controlled as per the case study.

The EQL network uses a multi-layer feed-forward NN with one output node. A linear transformation $z^{[l]}$ is applied at every hidden layer $(l)$, followed by a nonlinear transformation $a_i^{[l]}$ using unary (i.e., one-argument) and binary (i.e., two-argument) activation functions as follows:

$$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$$
$$a_i^{[l]} = f_i(z_i^{[l]}) \tag{20}$$


where $\{W, b\}$ denote the weight parameters and $f_i$ denotes an individual activation function from the library $L = \{\mathrm{identity},\ (\cdot)^n,\ \cos,\ \sin,\ \exp,\ \log,\ \mathrm{sigmoid}\}$. In a standard NN, the same activation function is applied to all hidden units and is typically chosen among {ReLU, tanh, sigmoid, softmax, etc.}.




Figure 8: Exemplary setup of a standard NN (8a) and an EQL-NN (8b) with input $x$, output $\hat{y}$, and two hidden layers. In (a), $f$ denotes the activation function, usually chosen among {ReLU, tanh, sigmoid}, while in EQL each node has a specific activation function drawn from the function class $F$.


The problem reduces to learning the correct weight parameters $\{W^{[l]}, b^{[l]}\}$, whereas the operators of the target mathematical expression are selected during training. To overcome the interpretability limitation of neural network-based architectures, and to promote simple over complex solutions (as is typical of formulas describing physical processes), sparsity is enforced by adding an $l_1$ regularization term to the $l_2$ loss function such that:

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{y}(x_i) - y_i \right\|^2 + \lambda \sum_{l=1}^{L} \big| W^{[l]} \big|_1 \tag{21}$$

where $N$ denotes the number of data entries and $L$ denotes the number of layers. Whereas this method is end-to-end differentiable in the NN parameters and scales well to high-dimensional problems, back-propagation through activation functions such as division or logarithm requires simplifications of the search space, thus limiting its ability to produce simple expressions involving divisions (e.g., $\sin(x/y)/x$). An extended version, EQL$^{\div}$ [48], includes only the division, whereas exponential and logarithm activation functions are not included because of numerical issues.
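To make the EQL idea concrete, the following PyTorch sketch (our own minimal illustration, not the EQL authors' implementation) shows one EQL-style layer with mixed unary activations and a product unit, trained with the $l_1$-regularized loss of Eq. 21 on toy data:

```python
import torch
import torch.nn as nn

class EQLLayer(nn.Module):
    """One EQL hidden layer (Eq. 20): a linear map followed by a bank of
    unary activations {identity, sin, cos} and one binary product unit."""
    def __init__(self, in_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, 5)   # 3 unary inputs + 2 product inputs

    def forward(self, a):
        z = self.lin(a)
        unary = torch.cat([z[:, 0:1], torch.sin(z[:, 1:2]), torch.cos(z[:, 2:3])], dim=1)
        product = (z[:, 3] * z[:, 4]).unsqueeze(1)
        return torch.cat([unary, product], dim=1)  # 4 outputs

model = nn.Sequential(EQLLayer(1), nn.Linear(4, 1))

x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = x * torch.sin(x)                               # toy target y = x sin(x)

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
lam = 1e-3                                         # lambda of Eq. 21
for _ in range(2000):
    opt.zero_grad()
    mse = ((model(x) - y) ** 2).mean()             # l2 data term
    l1 = sum(p.abs().sum() for n, p in model.named_parameters() if "weight" in n)
    (mse + lam * l1).backward()                    # l1 sparsity regularizer
    opt.step()
```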

6 Tree expression


This section discusses SR methods in which a mathematical expression is regarded as a unary-binary tree consisting of internal nodes and terminals. Every internal tree node represents a mathematical operation (e.g., $+$, $-$, $\times$, $\sin$, $\log$, etc.) drawn from a pre-defined function class $\mathcal{F}$ (Section 2.1), and every tree terminal node (or leaf) represents an operand, i.e., a variable or constant, as illustrated for the example shown in Figure 9. Expression tree-based methods include genetic programming, transformers, and reinforcement learning.

6.1 Genetic Programming  6.1 遗传编程


Genetic programming (GP) is an evolutionary algorithm in computer science that searches the space of computer programs to solve a given problem. Starting with a randomly generated "population" (set) of "individuals" (trees), GP evolves the initial population $T_{GP}(0)$ using a set of evolutionary "transition rules" (operations) $\{r_i : f \to f_i,\ i \in \mathbb{N}\}$ defined over the tree space. GP evolutionary operations include mutation, crossover, and selection. The mutation operation introduces random variations into an individual by replacing one subtree with another randomly generated subtree (Figure 10, right). The crossover operation involves exchanging content between two individuals, for example, by swapping one random subtree of one individual with a random subtree of another individual (Figure 10, left). Finally, the selection operation is used to select which individuals from the current population persist into the next population. A common selection operator is tournament selection, in which a set of $k$ candidate individuals is randomly sampled from the population, and the individual with the highest fitness, i.e., the minimum loss, is selected. In a GP algorithm, a single iteration corresponds to one generation. The application of one generation of GP to a population $T_{GP}(i)$ produces a new, augmented population $T_{GP}(i+1)$. In each generation, each individual has a probability of undergoing a mutation operation and a probability of undergoing a crossover operation. Selection is applied so that the dimension of the current population remains the same as that of the previous one. Throughout $M_k$ iterations, the following steps are undertaken: (1) transition rules are applied to the function set $F_k = \{f_1^k, \ldots, f_{M_k}^k\}$ such that $f^{k+1} = r_i(f^k)$, where $k$ denotes the iteration index; (2) the loss function $\ell(F_k)$ is evaluated for the set; and (3) an elite set of individuals is selected for the next iteration step. The GP algorithm repeats this procedure until a predetermined accuracy level is achieved.





Figure 9: (a) Expression-tree structure of $f(x) = x^2 \cos(x)$. (b) $f(x)$ as a function of $x$ (blue curve) and data points (red points) generated using $f(x)$.



Figure 10: Crossover (left) and mutation (right) operations on exemplary expression trees in genetic programming.
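A toy GP loop over expression trees, with tournament selection and subtree mutation as described above (crossover is omitted for brevity), might look as follows; this is our own sketch, not an implementation from the literature:

```python
import random

OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:      # terminal: variable or constant
        return random.choice(["x", round(random.uniform(-2, 2), 2)])
    return [random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1)]

def evaluate(tree, x):
    if tree == "x":
        return x
    if not isinstance(tree, list):
        return tree                              # numeric constant
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def mutate(tree):
    if random.random() < 0.2:                    # replace this subtree entirely
        return random_tree(2)
    if isinstance(tree, list):
        return [tree[0], mutate(tree[1]), mutate(tree[2])]
    return tree

def mse(tree, data):
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)

data = [(x / 10, (x / 10) ** 2 + x / 10) for x in range(-10, 11)]  # y = x^2 + x
population = [random_tree() for _ in range(200)]
for generation in range(30):
    # tournament selection (k = 3) followed by mutation
    population = [mutate(min(random.sample(population, 3), key=lambda t: mse(t, data)))
                  for _ in range(len(population))]
best = min(population, key=lambda t: mse(t, data))
print(best, mse(best, data))
```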


Whereas GP allows for large variations in the population, resulting in improved performance on out-of-distribution data, GP-based methods do not scale well to high-dimensional data sets and are highly sensitive to hyperparameters [19].

6.2 Transformers


The transformer neural network (TNN) is a novel NN architecture introduced by Vaswani et al. [46] in natural language processing (NLP) to model sequential data. The TNN is based on the attention mechanism, which aims to model long-range dependencies in a sequence. Consider the English-to-French translation of the two following sentences:

En: The kid did not go to school because it was closed.


Fr: L'enfant n'est pas allé à l'école parce qu'elle était fermée.

En: The kid did not go to school because it was cold.

Fr: L'enfant n'est pas allé à l'école parce qu'il faisait froid.

The two sentences are identical except for the last word, which refers to the school in the first sentence (i.e., "closed") and to the weather in the second one (i.e., "cold"). Transformers create a context-dependent word embedding that pays particular attention to the terms (of the sequence) with high weights. In this example, the noun that the adjective of each sentence refers to has a significant weight and is therefore taken into account when translating the word "it". Technically, an embedding $x_i$ is assigned to each element of the input sequence, and a set of $m$ key-value pairs is defined, i.e., $S = \{(k_1, v_1), \ldots, (k_m, v_m)\}$. For each query, the attention mechanism computes a linear combination of values $\sum_j \omega_j v_j$, where the attention weights $(\omega_j \propto q \cdot k_j)$ are derived using the dot product between the query $(q)$ and all keys $(k_j)$, as follows:

$$\mathrm{Attention}(q, S) = \sum_j \sigma(q \cdot k_j)\, v_j \tag{22}$$

Here, $q = x W_q$ is a query, $k_i = x_i W_k$ is a key, $v_i = x_i W_v$ is a value, and $W_q$, $W_k$, $W_v$ are the learnable parameters. The architecture of the self-attention mechanism is illustrated in Figure 11.





Figure 11: Evaluation of $\mathrm{Attention}(q, S)$ (Eq. 22) for a query $q_i$, computed using the input vector embedding $x_i$.
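In code, the attention of Eq. 22 amounts to a few matrix products. Below is a numpy sketch (our own illustration), with softmax as the normalization $\sigma$ and the usual $1/\sqrt{d}$ scaling:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 7, 16                        # e.g. seven tokens of a polish sequence
x = rng.normal(size=(n, d))         # input embeddings x_i
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # learnable parameters

Q, K, V = x @ Wq, x @ Wk, x @ Wv    # queries, keys, values

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

weights = softmax(Q @ K.T / np.sqrt(d))  # attention weights, one row per query
out = weights @ V                        # sum_j w_j v_j for every query
print(out.shape)                         # (7, 16): context-dependent embeddings
```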


In the context of SR, both the input data points $\{(x_i, y_i) \mid x_i \in \mathbb{R}^d,\ y_i \in \mathbb{R},\ i \in \mathbb{N}_n\}$ and the mathematical expression $f$ are encoded as sequences of symbolic representations, as discussed in Section 2.2. The role of the transformer is to create dependencies at two levels: between the numerical and symbolic sequences, and between the tokens of the symbolic sequence. Consider the mathematical expression $f(x, y, z) = \sin(x/y) - \sin(z)$, which can be written as a sequence of tokens in polish notation:


$$[-,\ \sin,\ \div,\ x,\ y,\ \sin,\ z]$$


Each symbol is associated with an embedding such that:

$$x_1 : -\qquad x_2 : \sin\qquad x_3 : \div\qquad x_4 : x\qquad x_5 : y\qquad x_6 : \sin\qquad x_7 : z$$


In this particular example, for the query $(x_7 : z)$, the attention mechanism will give a higher weight to the binary operator $(x_1 : -)$ than to the variable $(x_5 : y)$ or the division operator $(x_3 : \div)$.

Transformers consist of an encoder-decoder structure; each block comprises a self-attention layer and a feed-forward neural network. The TNN inputs a sequence of embeddings $\{x_i\}$ and outputs a "context-dependent" sequence of embeddings $\{y_i\}$, one at a time, through a latent representation $z_i$. The TNN is an auto-regressive model, i.e., the sampling of each symbol is conditioned on the previously sampled symbols and the latent sequence. An example of a TNN encoder is shown in Figure 12. In the symbolic regression case, the encoder and the decoder do not share the same vocabulary, because the decoder has a mixture of symbolic and numeric representations while the encoder has only numeric representations. There exist two approaches to solving SR problems using transformers. The first is the skeleton approach [32, 49], where the transformer conducts a two-step procedure: (1) the decoder predicts a skeleton $f_e$, a parametric function that defines the general shape of the target expression up to a choice of constants, using the function class $\mathcal{F}$; and (2) the constants are fitted using optimization techniques such as the nonlinear optimization solver BFGS. For example, if $f = \cos(2 x_1) - 0.1 \exp(x_2)$, then the decoder predicts $f_e = \cos(\circ\, x_1) - \circ \exp(x_2)$, where $\circ$ denotes an unknown constant. The second is an end-to-end (E2E) approach [31], where both the skeleton and the numerical values of the constants are simultaneously predicted. Both approaches are further discussed in Section 7.




Figure 12: Structure of a TNN encoder [46]. It comprises an attention layer and a feed-forward neural network.
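The second step of the skeleton approach is an ordinary nonlinear fit. Here is a sketch (our own illustration) for the skeleton of the example above, with the unknown constants fitted by BFGS:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1, 1, (2, 100))
y = np.cos(2 * x1) - 0.1 * np.exp(x2)         # ground-truth constants: 2 and 0.1

def skeleton(c, x1, x2):
    """Predicted skeleton with placeholder constants c0, c1."""
    return np.cos(c[0] * x1) - c[1] * np.exp(x2)

loss = lambda c: np.mean((skeleton(c, x1, x2) - y) ** 2)
res = minimize(loss, x0=[1.0, 1.0], method="BFGS")  # may need restarts in practice
print(np.round(res.x, 3))                           # -> [2.   0.1]
```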


6.3 Reinforcement learning


Reinforcement learning provides a framework for learning and decision-making by trial and error [50]. An RL setting consists of four components $(S, A, P, R)$ in a Markov decision process. In this setting, an agent observes a state $s \in S$ of the environment and, based on that, takes an action $a \in A$, which results in a reward $r = R(s, a)$; the environment then transitions to a new state $s' \in S$. The interaction goes on in time steps until a terminal state is reached. The aim of the agent is to learn the policy $P$ (also called the transition dynamics), which is a mapping from states to actions that maximizes the expected cumulative reward. An exemplary sketch of an RL-based SR method is illustrated in Figure 13.

The SR problem can be framed in RL as follows: the agent (an NN) observes the environment (the parent and sibling nodes in a tree) and, based on the observation, takes an action (predicting the next token of the sequence) and transitions into a new state. In this view, the NN model is like a policy, the parent and sibling are like observations, and the sampled symbols are like actions.





Figure 13: Exemplary sketch of a general RL-based SR method. $s_t$, $a_t$, and $r_t = R(s_t, a_t)$ denote the state, action, and reward at time step $t$; $(t+1)$ denotes the next time step.


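As a crude illustration of this framing (our own toy sketch; actual methods such as DSR [19] use an RNN policy trained with a policy-gradient objective), expressions can be sampled token by token from a categorical policy and scored with a reward equal to the negative MSE:

```python
import numpy as np

TOKENS = ["add", "mul", "sin", "x", "1"]
ARITY = {"add": 2, "mul": 2, "sin": 1, "x": 0, "1": 0}
rng = np.random.default_rng(0)
logits = np.zeros(len(TOKENS))                 # a trivial state-free "policy"

def sample_expression(max_len=12):
    """Sample a prefix sequence until all operator slots are filled."""
    seq, open_slots = [], 1
    while open_slots > 0 and len(seq) < max_len:
        p = np.exp(logits) / np.exp(logits).sum()
        tok = rng.choice(TOKENS, p=p)
        seq.append(tok)
        open_slots += ARITY[tok] - 1
    return seq if open_slots == 0 else None

def evaluate(seq, x):
    """Evaluate a prefix sequence; returns (value, tokens consumed)."""
    tok, used = seq[0], 1
    if tok == "x": return x, 1
    if tok == "1": return np.ones_like(x), 1
    args = []
    for _ in range(ARITY[tok]):
        v, n = evaluate(seq[used:], x)
        args.append(v); used += n
    if tok == "add": return args[0] + args[1], used
    if tok == "mul": return args[0] * args[1], used
    return np.sin(args[0]), used

x = np.linspace(-1, 1, 50)
y = x * np.sin(x)                              # target: mul(x, sin(x))

best, best_r = None, -np.inf
for _ in range(3000):
    seq = sample_expression()
    if seq is None: continue
    r = -np.mean((evaluate(seq, x)[0] - y) ** 2)   # reward = -MSE
    if r > best_r: best, best_r = seq, r
    for tok in seq:                                # crude REINFORCE-like nudge
        logits[TOKENS.index(tok)] += 0.01 * r
print(best, best_r)
```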

7 Applications


Most existing algorithms for solving SR are GP-based, whereas many others, more recent, are deep learning (DL)-based. There exist two different strategies for solving SR problems, as illustrated in the taxonomy of Figure 14.




Figure 14: Strategies for solving the SR problem. An SR algorithm has three types of input: data $(x)$, a new or reduced representation of the data ($x'$ or $z$), or a model $(f(x))$ learned from the data.


The first is a one-step approach, where data points are directly fed into an SR algorithm. The second is a two-step approach involving a process that either learns a new representation of the data or learns a "blackbox" model, which is then fed into the SR algorithm, as described below:

  1. Learn a new representation of the original data set, either by defining new features (reducing the number of independent variables) or by learning a reduced representation using specific NN architectures such as principal component analysis and autoencoders.

  2. Learn a "blackbox" model, either using a regular NN or using a conceptual NN such as a graph neural network (GNN). In this case, an SR algorithm is applied to the learned model or to parts of it.

We group the applications based on the categories presented in Section 3, and we summarize them in Table 5. GP-based applications will not be reviewed here; they are listed in the living review [25], along with DL-based applications. State-of-the-art GP-based methods are discussed in detail in [54]. Among the GP-based applications is the commercial software Eureqa [26], the most well-known GP-based method, which uses the algorithm proposed by Schmidt and Lipson in [2]. Eureqa is used as a baseline SR method in several research works.



Table 5: Summary of symbolic regression applications. In the Strategy column, D denotes data input, 1 denotes data-representation input, and 2 denotes model input to SR.

| Application | Ref | Name/description (code) | Year | Method | Strategy |
|---|---|---|---|---|---|
| SINDY | [51] | Sparse Identification of Nonlinear Dynamics (link) | 2016 | Linear | D |
| SINDY-AE | [33] | Data-driven discovery of coordinates and governing equations (https://github.com/kpchamp/SindyAutoencoders) | 2019 | Linear | 1 |
| EQL | [47] | Equation learner (https://github.com/KristofPusztai/EQL) | 2016 | Nonlinear | D |
| EQL÷ | [48] | Equation learner with division (https://github.com/martius-lab/EQL) | 2018 | Nonlinear | D |
| Eureqa | [26] | Commercial software | 2011 | GP | D |
| FFX | [20] | Fast function extraction (https://github.com/natekupp/ffx/tree/master/ffx) | 2011 | GP | D |
| ITEA | [22] | Interaction-Transformation Evolutionary Algorithm for SR (https://github.com/folivetti/ITEA/) | 2019 | GP | D |
| MRGP | [23] | Multiple regression genetic programming (https://github.com/flexgp/gp-learners) | 2014 | GP | D |
| E2ESR | [31] | End-to-end SR with transformers (https://github.com/facebookresearch/symbolicregression) | 2022 | TNN | D |
| NeSymReS | [32] | Neural SR that scales (https://github.com/SymposiumOrganization/NeuralSymbolicRegressionThatScales) | 2021 | TNN | D |
| DSR | [19] | Deep symbolic regression (https://github.com/brendenpetersen/deep-symbolic-regression) | 2019 | RNN, RL | D |
| NGPPS | [29] | SR via neural-guided GP population seeding (https://github.com/brendenpetersen/deep-symbolic-regression) | 2021 | RNN, GP, RL | D |
| AIFeynman | [27] | Physics-inspired method for SR (https://github.com/SJ001/AI-Feynman) | 2019 | Physics-informed | 1 |
| SM | [30] | Symbolic metamodel (https://bitbucket.org/mvdschaar/mlforhealthlabpub) | 2019 | Mathematics | 2 |
| GNN | [52] | Discovering symbolic models from DL with inductive biases (https://github.com/MilesCranmer/symbolic_deep_learning) | 2020 | GNN | 2 |
| HEAL | [53] | Heuristic and Evolutionary Algorithms Laboratory (https://github.com/heal-research/HeuristicLab) | - | Heuristic | D |



SINDY-AE [33] is a hybrid SR method that combines an autoencoder network [55] with linear SR [51]. The novelty of this approach lies in simultaneously learning a sparse dynamical model and a reduced representation of the coordinates that define the model, using snapshot data. Given a data set $x(t) \in \mathbb{R}^n$, this method seeks to learn a coordinate transformation from the original to intrinsic coordinates $z = \phi(x)$ (encoder) and back via $x = \psi(z)$ (decoder), along with the dynamical model associated with the set of reduced coordinates $z(t) \in \mathbb{R}^d$ ($d \ll n$):

(23) $\frac{d}{dt}z(t) = g(z(t))$

through a customized loss function $\mathcal{L}$, defined as a sum of four terms:

(24) $\mathcal{L} = \underbrace{\|x - \psi(\phi(x))\|_2^2}_{\text{reconstruction error}} + \lambda_1 \underbrace{\|\dot{z} - \dot{z}_{\text{pred}}\|_2^2}_{\text{encoder loss}} + \lambda_2 \underbrace{\|\dot{x} - \dot{x}_{\text{pred}}\|_2^2}_{\text{decoder loss}} + \lambda_3 \underbrace{\|\Theta\|_1}_{\text{regularizer loss}}$

Here the derivatives of the reduced variables $z$ are computed using the derivatives of the original variables $x$, i.e., $\dot{z} = \nabla_x \phi(x)\,\dot{x}$. Predicted coordinates, denoted with the subscript "pred", are NN outputs and are expressed in terms of the coefficient vector $\Theta$ and the library matrix $U(x)$ following Eq. 19, i.e., $\dot{z}_{\text{pred}} = U(z^T)\Theta = U(\phi(x)^T)\Theta$. The library is specified before training, and the coefficients $\Theta$ are learned together with the NN parameters as part of the training procedure.
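For illustration, below is a minimal PyTorch-style sketch of the loss in Eq. 24, assuming a one-dimensional latent space and a small hand-picked library; the network sizes and library terms are assumptions for this sketch, not those of [33].

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jvp

n, d = 64, 1   # assumed input and latent dimensions
enc = nn.Sequential(nn.Linear(n, 32), nn.Tanh(), nn.Linear(32, d))   # z = phi(x)
dec = nn.Sequential(nn.Linear(d, 32), nn.Tanh(), nn.Linear(32, n))   # x = psi(z)

def U(z):  # library matrix U(z); here [1, z, z^2, sin z] as an assumed example
    return torch.cat([torch.ones_like(z), z, z**2, torch.sin(z)], dim=1)

Theta = nn.Parameter(torch.zeros(4, d))  # sparse coefficients, learned with the NN

def sindy_ae_loss(x, x_dot, lam1=1e-4, lam2=1e-5, lam3=1e-5):
    # z_dot = grad_x phi(x) @ x_dot, computed as a Jacobian-vector product
    z, z_dot = jvp(enc, x, x_dot, create_graph=True)
    z_dot_pred = U(z) @ Theta                        # SINDy model g(z) = U(z) Theta
    # x_dot_pred = grad_z psi(z) @ z_dot_pred, again via a JVP
    x_rec, x_dot_pred = jvp(dec, z, z_dot_pred, create_graph=True)
    return (((x - x_rec) ** 2).sum(1).mean()                      # reconstruction error
            + lam1 * ((z_dot - z_dot_pred) ** 2).sum(1).mean()    # encoder loss
            + lam2 * ((x_dot - x_dot_pred) ** 2).sum(1).mean()    # decoder loss
            + lam3 * Theta.abs().sum())                           # L1 regularizer on Theta

x, x_dot = torch.randn(8, n), torch.randn(8, n)
print(sindy_ae_loss(x, x_dot))
```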

A case study is the nonlinear pendulum, whose dynamics are governed by the second-order differential equation $\ddot{x} = -\sin(x)$. The data set is generated as a series of snapshot images from a simulated video of a nonlinear pendulum. After training, the SINDY autoencoder correctly identified the equation $\ddot{z} = -0.99\sin(z)$, which is the dynamical model of a nonlinear pendulum in the reduced representation. This approach is particularly efficient when the dynamical model would be dense in terms of functions of the original measurement coordinates $x$. This method and similar works [56] pave the way to "GoPro physics", where researchers point a camera at an event and get back an equation capturing the underlying phenomenon using an algorithm.

Despite successful applications involving partial differential equations, one main limitation of this method lies in its linear SR part. For example, a model expressed as $f(x) = x_1 x_2 - 2 x_2 \exp(x_3) + 12 \exp(2 x_1 x_3)$ is discovered only if each term of this expression is included in the library, e.g., $\exp(2 x_1 x_3)$. The presence of the exponential function alone, i.e., $\exp(x)$, is insufficient to discover the second and third terms.

Symbolic metamodel [30] (SM) is a model-of-a-model method for interpreting "blackbox" model predictions. It takes a learned "blackbox" model as input and outputs a symbolic expression. Available post-hoc methods aim to explain ML model predictions, i.e., they can explain some aspects of a prediction but cannot offer a full model interpretation. In contrast, SM is interpretable because it uncovers the functional form that underlies the learned model. The symbolic metamodel is based on the Meijer $G$-function [57, 58], a special univariate function characterized by a set of indices, i.e., $G^{m,n}_{p,q}\left(\begin{smallmatrix}a_p\\ b_q\end{smallmatrix}\middle|\,x\right)$, where $a$ and $b$ are two sets of real-valued parameters. An instance of the Meijer $G$-function is specified by $(a, b)$; for example, the function $G^{1,2}_{2,2}\left(\begin{smallmatrix}a,\,b\\ a,\,a\end{smallmatrix}\middle|\,x\right)$ takes different forms for different settings of the parameters $a$ and $b$, as illustrated in Figure 15.

In the context of SR problem solving, the target mathematical expression is defined as a parameterization of the Meijer function, i.e., $\{g(x) = G(\theta, x) \mid \theta = (a, b)\}$, thus reducing the task to a standard parameter-optimization problem that can be efficiently solved using gradient descent, $\theta_{k+1} := \theta_k - \gamma \sum_i \nabla_\theta\, \ell(G(x_i, \theta), f(x_i))\big|_{\theta = \theta_k}$. The parameters $a$ and $b$ are learned during training, and the indices $(m, n, p, q)$ are regarded as hyperparameters of the model. SM was tested on both synthetic and real data and was deployed in two modes, spanning (1) only polynomial expressions (SM$_p$) and (2) closed-form expressions (SM$_c$), in comparison to a GP-based SR method. SM$_p$ produces accurate polynomial expressions for three out of the four tested functions (all except the Bessel function), whereas SM$_c$ produces the correct ground-truth expression for all four functions and significantly outperforms GP-based SR.
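As a toy illustration of this idea, the snippet below fits the two parameters of a $G^{1,2}_{2,2}$ instance to a stand-in "blackbox" by least squares using mpmath's meijerg; the target function, the sampling grid, and the use of derivative-free Nelder-Mead (instead of the gradient descent described above) are simplifying assumptions made here.

```python
import numpy as np
from mpmath import meijerg
from scipy.optimize import minimize

f = lambda x: 1.0 / (1.0 + x)            # stand-in "blackbox" model to metamodel
xs = np.linspace(0.1, 2.0, 25)           # assumed evaluation grid

def G(theta, x):
    a, b = theta
    # G^{1,2}_{2,2}(a,b; a,a | x): the leading sublists carry the n=2 and m=1 parameters
    val = meijerg([[a, b], []], [[a], [a]], x)
    return float(val.real)

def loss(theta):                         # squared error between metamodel and f
    return sum((G(theta, x) - f(x)) ** 2 for x in xs)

res = minimize(loss, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x, res.fun)                    # fitted (a, b) and residual
```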

More generally, consider a problem in a critical discipline such as healthcare. Assume a feature vector comprising (age, gender, weight, blood pressure, temperature, disease history, profession, etc.), with the aim of predicting the risk of a given disease. Predictions made by a "blackbox" could be highly accurate. However, the learned model does not provide insights into why the risk is high or low for a patient, or which parameter is the most critical or weighty in the prediction. Applying the symbolic metamodel to the learned model outputs a symbolic expression, e.g., $f(x_1, x_2) = x_1(1 - \exp(x_2))$, where $x_1$ is the blood pressure and $x_2$ is the age. Here, we learn that only two features (out of many) are crucial for the prediction and that the risk increases with high blood pressure and decreases with age. This is an ideal example of the difference between "blackbox" and interpretable models. In addition, it is worth mentioning that methods applied for model interpretation only exploit part of the prediction and cannot unveil how the model captures nonlinearities in the data. Model-interpretation methods are therefore insufficient to provide full insight into why and how model predictions are made and are by no means equivalent to interpretable models.





Figure 15: Example of a Meijer $G$-function $G^{1,2}_{2,2}\left(\begin{smallmatrix}a,\,b\\ a,\,a\end{smallmatrix}\middle|\,x\right)$ for different values of $a$ and $b$ [30].


End-to-end symbolic regression [31] (E2ESR) is a transformer-based method that uses end-to-end learning to solve SR problems. It is made up of three components: (1) an embedder that maps each input point $(x_i, y_i)$ to a single embedding, (2) a fully-connected feedforward network, and (3) a transformer that outputs a mathematical expression. What distinguishes E2ESR from other transformer-based applications is the use of an end-to-end approach that does not resort to skeletons, using symbolic representations for the operators and variables and numeric representations for the constants. Both the input data points $\{(x_i, y_i)\}_{i=1}^{n}$ and the mathematical expression $f$ are encoded as sequences of symbolic representations following the description in Section 2.2. E2ESR is tested and compared to several GP-based and DL-based applications on SR benchmarks. Results are reported in terms of mean accuracy, formula complexity, and inference time, and E2ESR was shown to achieve very competitive results, outperforming previous applications.

AIFeynman [27] is a physics-inspired SR method that recursively applies a set of solvers, i.e., dimensional analysis³, polynomial fit, and brute-force search, to solve an SR problem. If the problem is not solved, the algorithm searches for simplifying intrinsic properties in the data (e.g., invariance, factorization) using an NN and deploys them to recursively decompose the dataset into simpler sub-problems with fewer independent variables. Each sub-problem is then tackled by a symbolic regression method of choice. The authors created the Feynman SR database (see Section 8) to test their approach. All of the basic equations and 90% of the bonus equations were solved by their algorithm, outperforming Eureqa.

Deep symbolic regression (DSR) [19] is an RL-based search method for symbolic regression that uses a generative recurrent neural network (RNN). The RNN defines a probability distribution $p(\tau|\theta)$ over mathematical expressions $\tau$, and batches of expressions $\mathcal{T} = \{\tau^{(i)}\}_{i=1}^{N}$ are stochastically generated. An exemplary sketch of how the RNN generates an expression (e.g., $x^2\cos(x)$) is shown in Figure 16. Starting with the first node in the pre-order traversal (Section 2.2) of an expression tree, the RNN is initially fed with empty placeholder tokens (a parent and a sibling) and produces a categorical distribution, i.e., it outputs the probability of selecting each token from the defined library $\mathcal{L} = \{+, -, \times, \div, \sin, \cos, \log, \text{etc.}\}$. The sampled token is placed at the first node, and the number of child nodes is determined by whether the operation is unary (one child) or binary (two children). The next node is then selected, and the RNN, carrying its internal state along with the first token, outputs a new (and potentially different) categorical distribution. This procedure is repeated until the expression is complete. Expressions are then evaluated with a reward function $R(\tau)$ that tests the goodness of fit of each candidate expression $f$ to the data $\mathcal{D}$ using the normalized root-mean-square error,

$$R(\tau) = \frac{1}{1 + \frac{1}{\sigma_y}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(X_i)\right)^2}}.$$
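In code, this reward is a one-line transcription of the equation above (a small NumPy sketch, with names chosen here for illustration):

```python
import numpy as np

def reward(y, y_pred):
    """R(tau) = 1 / (1 + NRMSE), with the RMSE normalized by the std of y."""
    nrmse = np.sqrt(np.mean((y - y_pred) ** 2)) / np.std(y)
    return 1.0 / (1.0 + nrmse)
```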




³ Dimensional analysis is a well-known technique in physics that uses units of measurement to solve an equation and/or to check the correctness of a given equation.








Figure 16: Exemplary sketch of an RNN generating the mathematical expression $x^2\cos(x)$.


To generate better expressions, the probability distribution $p(\tau|\theta)$ needs to be optimized. Using a gradient-based approach would require the reward function $R(\tau)$ to be differentiable with respect to the RNN parameters $\theta$, which it is not. Instead, the learning objective is defined as the expectation of the reward under expressions sampled from the policy, i.e., $J(\theta) = \mathbb{E}_{\tau \sim p(\tau|\theta)}[R(\tau)]$, and reinforcement learning is used to maximize $J(\theta)$ by means of the "standard policy gradient":

(25) $\nabla_\theta J(\theta) = \nabla_\theta\, \mathbb{E}_{\tau \sim p(\tau|\theta)}[R(\tau)] = \mathbb{E}_{\tau \sim p(\tau|\theta)}\left[R(\tau)\, \nabla_\theta \log p(\tau|\theta)\right]$

This reinforcement learning trick, called REINFORCE [59], can be derived using the definition of the expectation $\mathbb{E}[\cdot]$ and the derivative of the $\log(\cdot)$ function as follows:

(26)
$$\begin{aligned}
\nabla_\theta\, \mathbb{E}_{\tau \sim p(\tau|\theta)}[R(\tau)] &= \nabla_\theta \int R(\tau)\, p(\tau|\theta)\, d\tau \\
&= \int R(\tau)\, \nabla_\theta\, p(\tau|\theta)\, d\tau \\
&= \int R(\tau)\, \frac{\nabla_\theta\, p(\tau|\theta)}{p(\tau|\theta)}\, p(\tau|\theta)\, d\tau \\
&= \int R(\tau)\, \nabla_\theta \log p(\tau|\theta)\, p(\tau|\theta)\, d\tau \\
&= \mathbb{E}_{\tau \sim p(\tau|\theta)}\left[R(\tau)\, \nabla_\theta \log p(\tau|\theta)\right]
\end{aligned}$$

The importance of this result is that it allows the expectation to be estimated using samples from the distribution. More explicitly, the gradient of $J(\theta)$ is estimated by computing the mean over a batch of $N$ sampled expressions as follows:

(27) $\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N} R(\tau^{(i)})\, \nabla_\theta \log p(\tau^{(i)}|\theta)$
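In an autodiff framework, Eq. 27 is typically implemented through a surrogate loss whose gradient equals the estimator; a minimal PyTorch-style sketch (the variable names are illustrative):

```python
import torch

def reinforce_loss(log_probs, rewards):
    # log_probs: (N,) summed token log-probabilities log p(tau_i | theta)
    # rewards:   (N,) rewards R(tau_i), treated as constants w.r.t. theta
    return -(rewards.detach() * log_probs).mean()

# Calling .backward() on this loss accumulates the negative of Eq. 27 into the
# policy parameters' gradients, so a gradient-descent step ascends J(theta).
```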

The standard policy gradient (Eq. 25) optimizes a policy's average performance over all samples from the distribution. Since SR requires maximizing best-case performance, i.e., optimizing the gradient over the top $\epsilon$ fraction of samples found during training, a new learning objective is defined as the conditional expectation of rewards above the $(1-\epsilon)$-quantile of the reward distribution, as follows:

(28) $J_{\text{risk}}(\theta, \epsilon) = \mathbb{E}_{\tau \sim p(\tau|\theta)}\left[R(\tau) \mid R(\tau) \geq R_\epsilon(\theta)\right]$


where $R_\epsilon(\theta)$ denotes the $(1-\epsilon)$-quantile of the reward distribution under the current policy. The gradient of the new learning objective is given by:

(29) $\nabla_\theta J_{\text{risk}}(\theta, \epsilon) = \mathbb{E}_{\tau \sim p(\tau|\theta)}\left[\left(R(\tau) - R_\epsilon(\theta)\right)\nabla_\theta \log p(\tau|\theta) \mid R(\tau) \geq R_\epsilon(\theta)\right]$
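The risk-seeking gradient of Eq. 29 differs from Eq. 27 only in filtering the batch by the empirical $(1-\epsilon)$-quantile and using it as a baseline; continuing the sketch above:

```python
import torch

def risk_seeking_loss(log_probs, rewards, eps=0.05):
    r_eps = torch.quantile(rewards, 1.0 - eps)    # empirical R_eps(theta)
    keep = rewards >= r_eps                       # top-eps fraction of the batch
    advantage = (rewards[keep] - r_eps).detach()  # baseline-subtracted reward
    return -(advantage * log_probs[keep]).mean()
```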

DSR was evaluated primarily on the Nguyen SR benchmark and several additional variants of it. An excellent recovery rate was reported for each set, and DSR recovered all benchmark expressions except Nguyen-12, given by $x^4 - x^3 + \frac{1}{2}y^2 - y$. More details on SR data benchmarks can be found in Section 8.

Neural-guided genetic programming population seeding [29] (NGPPS) is a hybrid method that combines GP and the RNN of [19], leveraging the strengths of each of the two components. Whereas GP normally begins with a random starting population, the authors of [29] propose to use the batch of expressions sampled by the RNN as the starting population for GP: $\mathcal{T}_{GP}^{(0)} = \mathcal{T}_{RNN}$. Each iteration of the proposed algorithm consists of four steps: (1) the batch of expressions sampled by the RNN is passed as a starting population to GP, (2) $S$ generations of GP are performed, resulting in a final GP population $\mathcal{T}_{GP}^{S}$, (3) an elite set of top-performing GP samples $\mathcal{T}_{GP}^{E}$ is selected, and (4) the elite set is used in the gradient update of the RNN.




Figure 17: Neural-guided genetic programming population seeding method overview [29].


Neural symbolic regression that scales [32] (NeSymReS) is a transformer-based algorithm that emphasizes large-scale pre-training. It comprises a pre-training phase and a test phase. Pre-training includes data generation and model training, with hundreds of millions of training examples generated and fresh minibatches sampled during pre-training. Each training example consists of a symbolic equation $f_e$ and a set of $n$ input-output pairs $\{x_i, y_i = f_e(x_i)\}$, where $n$ can vary across examples and the number of independent input variables is at most three. In the test phase, a set of input-output pairs $\{x_i, y_i\}$ is fed into the encoder, which maps it into a latent vector $z$, and the decoder iteratively samples candidate skeletons. What distinguishes this method is the learning task: it improves over time with experience and does not need to be retrained from scratch for each new experiment. NeSymReS was shown to outperform selected baselines (including DSR) in time and accuracy by a large margin on all tested datasets (AI-Feynman, Nguyen, and strictly out-of-sample equations (SOOSE), with and without constants). NeSymReS is more than three orders of magnitude faster at reaching the same maximum accuracy as GP while running only on a CPU.

GNN [52] is a hybrid scheme that performs SR by training a graph neural network (GNN) and applying SR algorithms to the GNN's components to find mathematical equations.

A case study is Newtonian dynamics, which describes the dynamics of particles in a system according to Newton's laws of motion. $\mathcal{D}$ consists of an $N$-body system with a known interaction (a force law $F$ such as electric, gravitational, spring, etc.), where particles (nodes) are characterized by their attributes (mass, charge, position, velocity, and acceleration) and their interactions (edges) are assigned attributes of dimension 100. The GNN functions are trained to predict the instantaneous acceleration of each particle using the simulated data and are then applied to a different data sample. The study shows that the most significant edge attributes, say $\{e_1, e_2\}$, fit a linear combination of the true force components $\{F_1, F_2\}$ used in the simulation, showing that the edge attributes can be interpreted as force laws. The most significant edge attributes were then passed to Eureqa to uncover analytical expressions equivalent to the simulated force laws. The proposed approach was also applied to datasets in the field of cosmology, where it discovered an equation that fits the data better than the existing hand-designed equation.


The same group has recently succeeded in inferring Newton's law of gravitation using a GNN together with PySR for the symbolic regression task [34]. The GNN was trained using observed trajectories (positions) of the Sun, planets, and moons of the solar system collected over 30 years. The SR algorithm correctly inferred Newton's formula for the interaction between masses, i.e., $F = G M_1 M_2 / r^2$, as well as the masses and the gravitational constant.

8 Datasets


For symbolic regression purposes, there exist several benchmark data sets, which can be categorized into two main groups: (1) ground-truth problems (synthetic data) and (2) real-world problems (real data), as summarized in Figure 18. In this section, we describe each category and discuss its main strengths and limitations.




Figure 18: Taxonomy based on the type of SR benchmark problems.


Ground-truth regression problems are characterized by known mathematical equations; they are listed in Table 6. These include (1) physics-inspired equations [27, 60] and (2) real-valued symbolic equations [1, 14, 15, 16, 17, 18, 19, 61].


Table 6: Summary of ground-truth problems for symbolic regression.

| Type | Benchmark | Number of problems | Year | Reference |
|---|---|---|---|---|
| Physics-related | Feynman Database | 119 | 2019 | [27] |
| Physics-related | Strogatz Repository | 10 | 2011 | [60] |
| Mathematics-related | Koza | 3 | 1994 | [1] |
| Mathematics-related | Keijzer | 15 | 2003 | [14] |
| Mathematics-related | Vladislavleva | 8 | 2009 | [15] |
| Mathematics-related | Nguyen | 12 | 2011 | [17] |
| Mathematics-related | Korns | 15 | 2011 | [16] |
| Mathematics-related | R | 3 | 2013 | [61] |
| Mathematics-related | Jin | 6 | 2019 | [18] |
| Mathematics-related | Livermore | 22 | 2021 | [19] |


The Feynman Symbolic Regression Database [62] is the largest SR database; it originates from the Feynman Lectures on Physics series [63, 64] and was proposed in [27]. It consists of 119⁴ physics-inspired equations that describe static physical systems and various physics processes. The proposed equations depend on at least one and at most nine variables. Each benchmark (corresponding to one equation) is generated by randomly sampling one million entries. Each entry is a row of randomly generated input variables, sampled uniformly between 1 and 5. This sampling range was slightly adjusted for some equations to avoid unphysical results (e.g., division by zero or the square root of a negative number). The output is evaluated using the function $f$, i.e., $\mathcal{D} = \{x_i \in \mathbb{R}^d,\; y_i = f(x_1, \ldots, x_d)\}$.




⁴ The equation number II.11.17 is missing in the benchmark repository.
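As a sketch of this generation procedure, one benchmark table can be produced by uniform sampling and evaluation of the ground-truth formula; the gravitation-like formula below is chosen purely as an example, and it also previews the sampling ranges discussed next:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000                                 # one million entries per benchmark
m1, m2, r = (rng.uniform(1, 5, N) for _ in range(3))
G_const = rng.uniform(1, 2, N)                # constants are sampled too (a noted limitation)
y = G_const * m1 * m2 / r**2                  # evaluate the ground-truth f
table = np.column_stack([G_const, m1, m2, r, y])  # rows: inputs followed by the output
```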





This benchmark is rich in proposing various theoretical formulae. Still, it suffers a few limitations: (1) there is no distinction between variables and constants, i.e., constants are randomly sampled and, in some cases, in domains extremely far from physical values. For example, the speed of light is sampled from a uniform distribution U(1,20) whereas its physical value is orders of magnitude higher,i.e., c=2.988×108m/s ,and the gravitational constant is sampled from U(1,2) whereas its physical value is orders of magnitude smaller, G=6.6743×1011m3kg1s2 ,among others (e.g.,vacuum permittivity ϵ1012 ,Boltzmann constant kb1023 ,Planck constant h1034 ). (2) Some variables are sampled in nonphysical ranges. For example, the gravitational force is defined between two masses distant by r as F=Gm1m2/r2 . This force is weak unless defined between significantly massive objects (e.g.,the mass of the earth is Me=5.9722×1024kg ) whereas m1 and m2 are sampled in U(1,5) in the Feynman database. (3) Some variables are treated as floats while they are integers, and (4) many equations are duplicates of each other (e.g., a multiplicative function of two variables f(x,y)=xy) or have similar functional forms.
该基准测试集在提出各类理论公式方面内容丰富,但仍存在若干局限性:(1) 未区分变量与常量,即常量被随机采样,某些情况下其取值域与物理实际值相差甚远。例如,光速从均匀分布 U(1,20) 中采样,而其物理值高出数个数量级,即 c=2.988×108m/s ;引力常数采样自 U(1,2) ,而其物理值则小数个数量级,为 G=6.6743×1011m3kg1s2 ,其他例子还包括真空介电常数 ϵ1012 、玻尔兹曼常数 kb1023 、普朗克常数 h1034 等。(2) 部分变量采样范围不符合物理实际。例如,两质量间距离 r 时的引力定义为 F=Gm1m2/r2 。此力除非作用于极大质量物体(如地球质量为 Me=5.9722×1024kg )否则较弱,而 Feynman 数据库中 m1m2 的采样范围却是 U(1,5) 。(3) 某些本应为整数的变量被当作浮点数处理;(4) 许多方程存在重复,例如两变量的乘积函数 f(x,y)=xy) 或具有相似函数形式。

The ODE-Strogatz repository [60] consists of ten physics equations that describe the behavior of dynamical systems which can exhibit chaotic and/or non-linear behavior. Each dataset is one state of a two-state system of ordinary differential equations.
ODE-Strogatz 代码库[60]包含十个描述动力系统行为的物理方程,这些系统可能表现出混沌和/或非线性行为。每个数据集均为一个由常微分方程构成的双态系统的一个状态。

Within the same category, there exist several benchmarks [1, 14, 15, 16, 17, 18, 19] consisting of real-valued symbolic functions. The majority of these benchmarks are proposed for GP-based methods and grouped into four categories: polynomial, trigonometric, logarithmic, exponential, and square-root functions, and a combination of univariate and bivariate functions. The suggested functions do not have any physical meaning, and most depend either on one or two independent variables. Datasets are generally generated by randomly sampling either 20 or 100 points in narrow ranges. The most commonly known is the so-called Nguyen benchmark, which consists of 12 symbolic functions taken from [65, 66, 67]. Only four equations have the scalars {1,2,1/2} as constants therein. Each benchmark is defined by a ground-truth expression, training, and test datasets. The equations proposed in these benchmarks can not be found in a single repository. Therefore we list them in the appendix in Tables 7.9 and Tables 10-11 for completeness and for easy comparison.
在同一类别中,存在多个由实值符号函数组成的基准测试集[1, 14, 15, 16, 17, 18, 19]。这些基准大多针对基于 GP 的方法提出,并分为四类:多项式、三角函数、对数函数、指数函数与平方根函数,以及单变量与双变量函数的组合。所建议的函数不具备任何物理意义,且多数仅依赖于一个或两个独立变量。数据集通常通过在狭窄范围内随机采样 20 或 100 个点生成。其中最广为人知的是所谓的 Nguyen 基准,它包含取自[65, 66, 67]的 12 个符号函数,其中仅四个方程含有标量 {1,2,1/2} 作为常量。每个基准均由真实表达式、训练集和测试集定义。这些基准中提出的方程无法在单一存储库中找到,因此为完整性和便于比较,我们在附录的表 7.9 及表 10-11 中列出了它们。

Real-world problems are characterized by an unknown model that underlies data. This category comprises two groups: observations and measurements. Data sets in the observations category can originate from any domain, such as health informatics, environmental science, business, commerce, etc. Data could be collected online or offline from reports or studies. A wide range of problems can be assessed from the following repositories: the PMLB [68], the OpenML [69], and the UCI [70]. An exemplary application in this category is wind speed forecasting [71]. Measurements represent sets of data points that are collected (and sometimes analyzed) in physics experiments. Here the target model is either an underlying theory than can be derived from first principles or not. In the first case, symbolic regression would either infer the correct model structure and parameters or contribute to the theory development of the studied process, whereas in the second case, the symbolic regression output could be the awaited theory.
现实世界的问题往往由数据背后未知的模型所定义。此类问题可分为两组:观测数据与测量数据。观测类数据集可源自任何领域,如健康信息学、环境科学、商业贸易等。这些数据可能通过线上或线下方式从报告或研究中收集。通过以下资源库可评估各类问题:PMLB [68]、OpenML [69] 和 UCI [70]。此类应用的典型范例是风速预测[71]。测量数据则代表物理学实验中收集(有时会分析)的数据点集合。此时目标模型要么是基于第一性原理可推导的理论,要么尚未明确。前者情况下,符号回归既可推断出正确模型结构与参数,又能推动所研究过程的理论发展;后者情况下,符号回归的输出可能成为期待中的理论。

9 Discussion


SR is a growing area of ML and is gaining attention as interpretability is increasingly promoted in AI applications [3]. SR is propelled by the fact that ML models are growing enormously in parameter count in pursuit of accurate predictions. An exemplary case is ChatGPT-4, a large language model comprising hundreds of billions of parameters and trained on hundreds of terabytes of textual data. Such big models are very complicated networks. ChatGPT-4, for example, accomplishes increasingly complicated and intelligent tasks, to the point of showing emergent properties [72]. However, it is not straightforward to understand when it works and, more importantly, when it does not. In addition, its performance improves with the number of parameters, highlighting that its prediction accuracy depends on the size of the training data set. Therefore, a new paradigm is needed, especially in scientific disciplines, such as the physical sciences, where problems are of a causal, hypothesis-driven nature. SR is by far the strongest candidate to fulfill the interpretability requirements and is expected to play a central role in the future of ML.

Despite the significant advances made in this subfield and the high performance of most deep learning-based SR methods proposed in the literature, SR methods still fail to recover relatively simple relationships. A case in point is the Nguyen-12 expression, $f(x, y) = x^4 - x^3 + y^2/2 - y$, where $x$ and $y$ are uniformly sampled in the range $[0, 1]$. The NGPPS method could not recover this particular expression using the library basis $\mathcal{L} = \{+, -, \times, \div, \sin, \cos, \exp, \log, x, y\}$. A variant of this expression, Nguyen-12*, consisting of the same equation but defined over a larger domain, i.e., data points sampled in $[0, 10]$, was successfully recovered using the same library, with a recovery rate of 12%. This result is significantly below the perfect performance on all other Nguyen expressions. A similar observation is made for Livermore-5, whose expression is $f(x, y) = x^4 - x^3 + x^2 - y$.


We ran NGPPS on Nguyen-12 with two libraries, a pure polynomial basis $\mathcal{L}_1 = \{+, -, \times, \div, (\cdot)^2, (\cdot)^3, (\cdot)^4, x, y\}$ and a mixed basis $\mathcal{L}_2 = \mathcal{L}_1 \cup \{\sin, \cos, \exp, \log, \mathrm{sqrt}, \mathrm{expneg}\}$. The algorithm succeeds in recovering Nguyen-12 only with the pure polynomial basis, at a recovery rate of 3%. The same observation is made when applying linear SR to Nguyen-12. This highlights how strongly the predicted expression depends on the set of allowable mathematical operations. A practical way to counter this limitation is to incorporate basic domain knowledge into SR applications whenever possible. For example, astronomical data collected by detecting the light curves of astronomical objects exhibit periodic behavior; in such cases, periodic functions such as the trigonometric functions should be part of the library basis.

Most SR methods are applied only to synthetic data for which the input-output relationship is known. This is justified because the methods must be cross-checked and their performance evaluated against ground-truth expressions. However, the reported results then hold for synthetic data only. To the best of our knowledge, only one physics application [34] has succeeded in extracting Newton's law of gravitation by applying SR to astronomical data. The scarcity of such applications leads us to conclude that SR is still a relatively nascent area with the potential to make a big impact. Physics in general, and the physical sciences in particular, represent a very broad field for SR development and are very rich in both data and expressions; areas such as astronomy and high-energy physics, for example, are very rich in data. In addition, much of our acquired knowledge in physics can be used to test SR methods because the underlying phenomena and equations are well known. All that is needed is greater effort and investment.

10 Conclusion


This work presents an in-depth introduction to the symbolic regression problem and an expansive review of its methodologies and state-of-the-art applications. It also highlights a number of conclusions that can be drawn about symbolic regression methods: (1) linear symbolic regression suffers from many limitations, all originating from predefining the model structure; (2) neural network-based methods lead to numerical issues, and the library cannot include all mathematical operations; (3) expression tree-based methods, in particular transformer-based ones, are as yet the most powerful in terms of model performance on synthetic data; (4) model predictions strongly depend on the set of allowable operations in the library basis; and (5) in general, deep learning-based methods perform better than other ML-based methods.

Symbolic regression represents a powerful tool for learning interpretable models in a data-driven manner. Its application is likely to grow in the future because it balances prediction accuracy and interpretability. Despite the limited application of SR to real data, the few existing examples are very promising. A potential path to boosting progress in this subfield is to apply symbolic regression to experimental data in physics.

References


[1] J. R. Koza, "Genetic programming as a means for programming computers by natural selection," Statistics and Computing, vol. 4, no. 2, pp. 87-112, 1994.

[2] M. Schmidt and H. Lipson, "Distilling free-form natural laws from experimental data," Science, vol. 324, no. 5923, pp. 81-85, 2009.

[3] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nature Machine Intelligence, vol. 1, no. 5, pp. 206-215, 2019.

[4] M. Mozaffari-Kermani, S. Sur-Kolay, A. Raghunathan, and N. K. Jha, "Systematic poisoning attacks on and defenses for machine learning in healthcare," IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 6, pp. 1893-1905, 2015.

[5] J. Kepler, Epitome astronomiae Copernicanae. In: Noscemus Wiki.

[6] I. Newton, A. Motte, and J. Machin, The Mathematical Principles of Natural Philosophy, Volume 1. London: B. Motte, 1729.

[7] D. Gerwin, "Information processing, data inferences, and scientific generalization," Systems Research and Behavioral Science, vol. 19, pp. 314-325, 1974.

[8] P. Langley, "Data-driven discovery of physical laws," Cognitive Science, vol. 5, no. 1, pp. 31-54, 1981.

[9] B. C. Falkenhainer and R. S. Michalski, "Integrating quantitative and qualitative discovery: The ABACUS system," Machine Learning, vol. 1, pp. 367-401, Mar. 1986.

[10] P. Langley, "BACON: A production system that discovers empirical laws," 1979.

[11] P. Langley, H. A. Simon, G. L. Bradshaw, and J. M. Zytkow, Scientific Discovery: Computational Explorations of the Creative Process. Cambridge, MA, USA: MIT Press, 1987.

[12] J. R. Koza, "Hierarchical genetic algorithms operating on populations of computer programs," in Proceedings of the 11th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI'89, (San Francisco, CA, USA), pp. 768-774, Morgan Kaufmann Publishers Inc., 1989.

[13] J. R. Koza, "Genetic programming: A paradigm for genetically breeding populations of computer programs to solve problems," tech. rep., Stanford, CA, USA, 1990.

[14] M. Keijzer, "Improving symbolic regression with interval arithmetic and linear scaling," in Genetic Programming (C. Ryan, T. Soule, M. Keijzer, E. Tsang, R. Poli, and E. Costa, eds.), (Berlin, Heidelberg), pp. 70-82, Springer Berlin Heidelberg, 2003.

[15] E. Vladislavleva, G. Smits, and D. den Hertog, "Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming," IEEE Transactions on Evolutionary Computation, vol. 13, pp. 333-349, 2009.

[16] M. F. Korns, Accuracy in Symbolic Regression, pp. 129-151. New York, NY: Springer New York, 2011.

[17] N. Q. Uy, N. X. Hoai, M. O'Neill, R. I. McKay, and E. G. López, "Semantically-based crossover in genetic programming: application to real-valued symbolic regression," Genetic Programming and Evolvable Machines, vol. 12, pp. 91-119, 2010.

[18] Y. Jin, W. Fu, J. Kang, J. Guo, and J. Guo, "Bayesian symbolic regression," 2019.

[19] B. K. Petersen, "Deep symbolic regression: Recovering mathematical expressions from data via policy gradients," CoRR, vol. abs/1912.04871, 2019.

[20] T. McConaghy, FFX: Fast, Scalable, Deterministic Symbolic Regression Technology, pp. 235-260. New York, NY: Springer New York, 2011.

[21] M. Virgolin, T. Alderliesten, C. Witteveen, and P. A. N. Bosman, "A model-based genetic programming approach for symbolic regression of small expressions," CoRR, vol. abs/1904.02050, 2019.

[22] F. O. de França and G. S. I. Aldeia, "Interaction-transformation evolutionary algorithm for symbolic regression," CoRR, vol. abs/1902.03983, 2019.

[23] I. Arnaldo, K. Krawiec, and U.-M. O'Reilly, "Multiple regression genetic programming," in Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO '14, (New York, NY, USA), pp. 879-886, Association for Computing Machinery, 2014.

[24] W. G. L. Cava, T. R. Singh, J. Taggart, S. Suri, and J. H. Moore, "Stochastic optimization approaches to learning concise representations," CoRR, vol. abs/1807.00981, 2018.

[25] N. Makke and S. Chawla, "A living review of symbolic regression," https://github.com/nmakke/SR-LivingReview, 2022.

[26] R. Dubcakova, "Eureqa: software review," Genetic Programming and Evolvable Machines, vol. 12, pp. 173-178, June 2011.

[27] S.-M. Udrescu and M. Tegmark, "AI Feynman: a physics-inspired method for symbolic regression," 2019.

[28] G. Martius and C. H. Lampert, "Extrapolation and learning equations," CoRR, vol. abs/1610.02995, 2016.

[29] T. N. Mundhenk, M. Landajuela, R. Glatt, C. P. Santiago, D. M. Faissol, and B. K. Petersen, "Symbolic regression via neural-guided genetic programming population seeding," CoRR, vol. abs/2111.00053, 2021.

[30] A. M. Alaa and M. van der Schaar, "Demystifying black-box models with symbolic metamodels," in Advances in Neural Information Processing Systems (H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, eds.), vol. 32, (New York), Curran Associates, Inc., 2019.

[31] P.-A. Kamienny, S. d'Ascoli, G. Lample, and F. Charton, "End-to-end symbolic regression with transformers," 2022.

[32] L. Biggio, T. Bendinelli, A. Neitz, A. Lucchi, and G. Parascandolo, "Neural symbolic regression that scales," CoRR, vol. abs/2106.06427, 2021.

[33] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton, "Data-driven discovery of coordinates and governing equations," Proceedings of the National Academy of Sciences, vol. 116, no. 45, pp. 22445-22451, 2019.

[34] P. Lemos, N. Jeffrey, M. Cranmer, S. Ho, and P. Battaglia, "Rediscovering orbital mechanics with machine learning," 2022.

[35] R. Batra, L. Song, and R. Ramprasad, "Emerging materials intelligence ecosystems propelled by machine learning," Nature Reviews Materials, vol. 6, pp. 655-678, Nov. 2020.

[36] A. Hernandez, A. Balasubramanian, F. Yuan, S. Mason, and T. Mueller, "Fast, accurate, and transferable many-body interatomic potentials by symbolic regression," 2019.

[37] Y. Wang, N. Wagner, and J. M. Rondinelli, "Symbolic regression in materials science," MRS Communications, vol. 9, pp. 793-805, Sep. 2019.

[38] B. Weng, Z. Song, R. Zhu, Q. Yan, Q. Sun, C. G. Grice, Y. Yan, and W.-J. Yin, "Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts," Nature Communications, vol. 11, 2020.

[39] J. Martinez-Gil and J. M. Chaves-Gonzalez, "A novel method based on symbolic regression for interpretable semantic similarity measurement," Expert Systems with Applications, vol. 160, p. 113663, 2020.

[40] I. A. Abdellaoui and S. Mehrkanoon, "Symbolic regression for scientific discovery: an application to wind speed forecasting," 2021 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 01-08, 2021.

[41] M. Virgolin, Z. Wang, T. Alderliesten, and P. A. N. Bosman, "Machine learning for the prediction of pseudorealistic pediatric abdominal phantoms for radiation dose reconstruction," Journal of Medical Imaging, vol. 7, no. 4, p. 046501, 2020.

[42] W. G. L. Cava, P. Orzechowski, B. Burlacu, F. O. de França, M. Virgolin, Y. Jin, M. Kommenda, and J. H. Moore, "Contemporary symbolic regression methods and their relative performance," CoRR, vol. abs/2107.14351, 2021.

[43] V. Vapnik, "Principles of risk minimization for learning theory," in Advances in Neural Information Processing Systems (J. Moody, S. Hanson, and R. Lippmann, eds.), vol. 4, (Cambridge, Massachusetts), Morgan-Kaufmann, 1991.

[44] M. Virgolin and S. P. Pissis, "Symbolic regression is NP-hard," 2022.

[45] R. Robinson, "Jan Lukasiewicz: Aristotle's syllogistic from the standpoint of modern formal logic. Second edition, enlarged. Pp. xvi + 222. Oxford: Clarendon Press, 1957," The Classical Review, vol. 8, no. 3-4, pp. 282-282, 1958.

[46] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," CoRR, vol. abs/1706.03762, 2017.

[47] G. Martius and C. H. Lampert, "Extrapolation and learning equations," CoRR, vol. abs/1610.02995, 2016.

[48] S. S. Sahoo, C. H. Lampert, and G. Martius, "Learning equations for extrapolation and control," CoRR, vol. abs/1806.07259, 2018.

[49] M. Valipour, B. You, M. Panju, and A. Ghodsi, "SymbolicGPT: A generative transformer model for symbolic regression," CoRR, vol. abs/2106.14131, 2021.

[50] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book, 2018.

[51] S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Discovering governing equations from data by sparse identification of nonlinear dynamical systems," Proceedings of the National Academy of Sciences, vol. 113, no. 15, pp. 3932-3937, 2016.

[52] M. D. Cranmer, A. Sanchez-Gonzalez, P. W. Battaglia, R. Xu, K. Cranmer, D. N. Spergel, and S. Ho, "Discovering symbolic models from deep learning with inductive biases," CoRR, vol. abs/2006.11287, 2020.

[53] Heuristic and Evolutionary Algorithms Laboratory (HeuristicLab), https://github.com/heal-research/HeuristicLab.

[54] W. La Cava, K. Danai, and L. Spector, "Inference of compact nonlinear dynamic models by epigenetic local search," Engineering Applications of Artificial Intelligence, vol. 55, pp. 292-306, 2016.

[55] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation, pp. 318-362. Cambridge, MA, USA: MIT Press, 1986.

[56] B. Chen, K. Huang, S. Raghupathi, I. Chandratreya, Q. Du, and H. Lipson, "Discovering state variables hidden in experimental data," 2021.

[57] C. Meijer, On the G-function. Netherlands: North-Holland, 1946.

[58] R. Beals and J. Szmigielski, "Meijer G-functions: a gentle introduction," Notices of the American Mathematical Society, vol. 60, pp. 866-873, 2013.

[59] R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, pp. 229-256, May 1992.

[60] W. La Cava, K. Danai, and L. Spector, "Inference of compact nonlinear dynamic models by epigenetic local search," Engineering Applications of Artificial Intelligence, vol. 55, pp. 292-306, 2016.

[61] K. Krawiec and T. Pawlak, "Approximating geometric crossover by semantic backpropagation," in Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, GECCO '13, (New York, NY, USA), pp. 941-948, Association for Computing Machinery, 2013.

[62] M. Tegmark, "The Feynman symbolic regression database," May 2019.

[63] R. Feynman, R. Leighton, and M. Sands, The Feynman Lectures on Physics, Vol. I: The New Millennium Edition: Mainly Mechanics, Radiation, and Heat. New York: Basic Books, 2011.

[64] R. Feynman, R. Leighton, M. Sands, and M. Gottlieb, The Feynman Lectures on Physics, Volume 2. Boston: Pearson/Addison-Wesley, 2006.

[65] M. Keijzer, "Improving symbolic regression with interval arithmetic and linear scaling," in Genetic Programming (C. Ryan, T. Soule, M. Keijzer, E. Tsang, R. Poli, and E. Costa, eds.), (Berlin, Heidelberg), pp. 70-82, Springer Berlin Heidelberg, 2003.

[66] N. Hoai, R. McKay, D. Essam, and R. Chau, "Solving the symbolic regression problem with tree-adjunct grammar guided genetic programming: the comparative results," in Proceedings of the 2002 Congress on Evolutionary Computation, CEC'02, vol. 2, pp. 1326-1331, 2002.

[67] C. G. Johnson, "Genetic programming crossover: Does it cross over?," in Genetic Programming (L. Vanneschi, S. Gustafson, A. Moraglio, I. De Falco, and M. Ebner, eds.), (Berlin, Heidelberg), pp. 97-108, Springer Berlin Heidelberg, 2009.

[68] R. S. Olson, W. La Cava, P. Orzechowski, R. J. Urbanowicz, and J. H. Moore, "PMLB: a large benchmark suite for machine learning evaluation and comparison," BioData Mining, vol. 10, pp. 1-13, Dec. 2017.

[69] J. Vanschoren, J. N. van Rijn, B. Bischl, and L. Torgo, "OpenML: networked science in machine learning," SIGKDD Explorations, vol. 15, no. 2, pp. 49-60, 2013.

[70] D. Dua and C. Graff, "UCI machine learning repository," 2017.

[71] I. A. Abdellaoui and S. Mehrkanoon, "Symbolic regression for scientific discovery: an application to wind speed forecasting," 2021.

[72] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, E. H. Chi, T. Hashimoto, O. Vinyals, P. Liang, J. Dean, and W. Fedus, "Emergent abilities of large language models," 2022.

[73] U.-M. O'Reilly, "Genetic programming II: Automatic discovery of reusable programs," Artificial Life, vol. 1, no. 4, pp. 439-441, 1994.


A Datasets Benchmarks Equations



Table 7: Ground-truth expressions for Koza [73, Nguyen [17], Jin [18], Keijzer [14] and R 61 benchmarks.
表 7:Koza [73]、Nguyen [17]、Jin [18]、Keijzer [14]及 R 61 基准的真实表达式。

Dataset | Expression | Variables | Data range
Koza-1 | $x^4+x^3+x^2+x$ | 1 | U[-1,1,20]
Koza-2 | $x^5-2x^3+x$ | 1 | U[-1,1,20]
Koza-3 | $x^6-2x^4+x^2$ | 1 | U[-1,1,20]
Nguyen-1 | $x^3+x^2+x$ | 1 | U(-1,1,20)
Nguyen-2 | $x^4+x^3+x^2+x$ | 1 | U(-1,1,20)
Nguyen-3 | $x^5+x^4+x^3+x^2+x$ | 1 | U(-1,1,20)
Nguyen-4 | $x^6+x^5+x^4+x^3+x^2+x$ | 1 | U(-1,1,20)
Nguyen-5 | $\sin(x^2)\cos(x)-1$ | 1 | U(-1,1,20)
Nguyen-6 | $\sin(x)+\sin(x+x^2)$ | 1 | U(-1,1,20)
Nguyen-7 | $\log(x+1)+\log(x^2+1)$ | 1 | U(0,2,20)
Nguyen-8 | $\sqrt{x}$ | 1 | U(0,4,20)
Nguyen-9 | $\sin(x)+\sin(y^2)$ | 2 | U(-1,1,100)
Nguyen-10 | $2\sin(x)\cos(y)$ | 2 | U(-1,1,100)
Nguyen-11 | $x^y$ | 2 |
Nguyen-12 | $x^4-x^3+\frac{1}{2}y^2-y$ | 2 |
Jin-1 | $2.5x^4-1.3x^3+0.5y^2-1.7y$ | 2 | U(-3,3,100)
Jin-2 | $8.0x^2+8.0y^3-15.0$ | 2 | U(-3,3,100)
Jin-3 | $0.2x^3+1.5y^3-1.2y-0.5x$ | 2 | U(-3,3,100)
Jin-4 | $1.5\exp(x)+5.0\cos(y)$ | 2 | U(-3,3,100)
Jin-5 | $6.0\sin(x)\cos(y)$ | 2 | U(-3,3,100)
Jin-6 | $1.35xy+5.5\sin((x-1.0)(y-1.0))$ | 2 | U(-3,3,100)
Keijzer-1 | $0.3x\sin(2\pi x)$ | 1 | E[-1,1,0.1]
Keijzer-2 | $0.3x\sin(2\pi x)$ | 1 | E[-2,2,0.1]
Keijzer-3 | $0.3x\sin(2\pi x)$ | 1 | E[-3,3,0.1]
Keijzer-4 | $x^3e^{-x}\cos(x)\sin(x)(\sin^2(x)\cos(x)-1)$ | 1 | E[0,10,0.05]
Keijzer-5 | $\frac{30xz}{(x-10)y^2}$ | 3 | x,z: U[-1,1,1000]; y: U[1,2,1000]
Keijzer-6 | $\sum_{i=1}^{x}\frac{1}{i}$ | 1 | E[1,50,1]
Keijzer-7 | $\log x$ | 1 | E[1,100,1]
Keijzer-8 | $\sqrt{x}$ | 1 | E[0,100,1]
Keijzer-9 | $\mathrm{arcsinh}(x)=\log(x+\sqrt{x^2+1})$ | 1 | E[0,100,1]
Keijzer-10 | $x^y$ | 2 | U[0,1,100]
Keijzer-11 | $xy+\sin((x-1)(y-1))$ | 2 | U[-3,3,20]
Keijzer-12 | $x^4-x^3+y^2/2-y$ | 2 | U[-3,3,20]
Keijzer-13 | $6\sin(x)\cos(y)$ | 2 | U[-3,3,20]
Keijzer-14 | $8/(2+x^2+y^2)$ | 2 | U[-3,3,20]
Keijzer-15 | $x^3/5+y^3/2-y-x$ | 2 | U[-3,3,20]
R1 | $(x+1)^3/(x^2-x+1)$ | 1 | E[-1,1,20]
R2 | $(x^5-3x^3+1)/(x^2+1)$ | 1 | E[-1,1,20]
R3 | $(x^6+x^5)/(x^4+x^3+x^2+x+1)$ | 1 | E[-1,1,20]
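Throughout Tables 7-9, U[a,b,n] (equivalently U(a,b,n)) denotes n points sampled uniformly at random from the interval [a,b], while E[a,b,s] denotes an evenly spaced grid from a to b with step s. The short Python sketch below illustrates this reading of the notation; the two helper functions are ours, introduced only for illustration:

import numpy as np

rng = np.random.default_rng(seed=0)

def sample_uniform(a, b, n):
    # U[a, b, n]: n points drawn uniformly at random from [a, b]
    return rng.uniform(a, b, size=n)

def sample_grid(a, b, step):
    # E[a, b, step]: evenly spaced points from a to b (inclusive) with the given step
    return np.arange(a, b + step / 2, step)

# Nguyen-5: sin(x^2) cos(x) - 1 on U(-1, 1, 20)
x = sample_uniform(-1.0, 1.0, 20)
y = np.sin(x**2) * np.cos(x) - 1.0

# Keijzer-4: x^3 e^(-x) cos(x) sin(x) (sin^2(x) cos(x) - 1) on E[0, 10, 0.05]
t = sample_grid(0.0, 10.0, 0.05)
g = t**3 * np.exp(-t) * np.cos(t) * np.sin(t) * (np.sin(t)**2 * np.cos(t) - 1.0)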


Table 8: Ground-truth expressions for Korns [16] and Livermore [19] benchmarks.

Dataset | Expression | Variables | Data range
Korns-1 | $1.57+24.3v$ | 1 | U[-50,50,10000]
Korns-2 | $0.23+14.2\frac{v+y}{3\omega}$ | 3 | U[-50,50,10000]
Korns-3 | $-5.41+4.9\frac{v-x+y/\omega}{3\omega}$ | 4 | U[-50,50,10000]
Korns-4 | $-2.3+0.13\sin(z)$ | 1 | U[-50,50,10000]
Korns-5 | $3+2.13\ln(\omega)$ | 1 | U[-50,50,10000]
Korns-6 | $1.3+0.13\sqrt{x}$ | 1 | U[-50,50,10000]
Korns-7 | $213.80940889(1-e^{-0.54723748542x})$ | 1 | U[-50,50,10000]
Korns-8 | $6.87+11\sqrt{7.23xv\omega}$ | 3 | U[-50,50,10000]
Korns-9 | $\frac{\sqrt{x}}{\ln(y)}\frac{e^z}{v^2}$ | 4 | U[-50,50,10000]
Korns-10 | $0.81+24.3\frac{2y+3z^2}{4v^3+5\omega^4}$ | 4 | U[-50,50,10000]
Korns-11 | $6.87+11\cos(7.23x^3)$ | 1 | U[-50,50,10000]
Korns-12 | $2-2.1\cos(9.8x)\sin(1.3\omega)$ | 2 | U[-50,50,10000]
Korns-13 | $32-3\frac{\tan(x)}{\tan(y)}\frac{\tan(z)}{\tan(v)}$ | 4 | U[-50,50,10000]
Korns-14 | $22-4.2(\cos(x)-\tan(y))\frac{\tanh(z)}{\sin(v)}$ | 4 | U[-50,50,10000]
Korns-15 | $12-6\frac{\tan(x)}{e^y}(\ln(z)-\tan(v))$ | 4 | U[-50,50,10000]
Livermore-1 | $1/3+x+\sin(x^2)$ | 1 | U[-10,10,1000]
Livermore-2 | $\sin(x^2)\cos(x)-2$ | 1 | U[-1,1,20]
Livermore-3 | $\sin(x^3)\cos(x^2)-1$ | 1 | U[-1,1,20]
Livermore-4 | $\log(x+1)+\log(x^2+1)+\log(x)$ | 1 | U[0,2,20]
Livermore-5 | $x^4-x^3+x^2-y$ | 2 | U[0,1,20]
Livermore-6 | $4x^4+3x^3+2x^2+x$ | 1 | U[-1,1,20]
Livermore-7 | $\sinh(x)$ | 1 | U[-1,1,20]
Livermore-8 | $\cosh(x)$ | 1 | U[-1,1,20]
Livermore-9 | $x^9+x^8+x^7+x^6+x^5+x^4+x^3+x^2+x$ | 1 | U[-1,1,20]
Livermore-10 | $6\sin(x)\cos(y)$ | 2 | U[0,1,20]
Livermore-11 | $x^2y^2/(x+y)$ | 2 | U[-1,1,50]
Livermore-12 | $x^5/y^3$ | 2 | U[-1,1,50]
Livermore-13 | $x^{1/3}$ | 1 | U[0,4,20]
Livermore-14 | $x^3+x^2+x+\sin(x)+\sin(x^2)$ | 1 | U[-1,1,20]
Livermore-15 | $x^{1/5}$ | 1 | U[0,4,20]
Livermore-16 | $x^{2/5}$ | 1 | U[0,4,20]
Livermore-17 | $4\sin(x)\cos(y)$ | 2 | U[0,1,20]
Livermore-18 | $\sin(x^2)\cos(x)-5$ | 1 | U[-1,1,20]
Livermore-19 | $x^5+x^4+x^2+x$ | 1 | U[-1,1,20]
Livermore-20 | $\exp(-x^2)$ | 1 | U[-1,1,20]
Livermore-21 | $x^8+x^7+x^6+x^5+x^4+x^3+x^2+x$ | 1 | U[-1,1,20]
Livermore-22 | $\exp(-0.5x^2)$ | 1 | U[-1,1,20]
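Success on these benchmarks is typically judged by exact recovery of the ground-truth expression rather than by numerical error alone. A common heuristic is to test whether the difference between the candidate and the ground truth simplifies to zero; the SymPy sketch below illustrates this check (it can produce false negatives when equivalent forms resist simplification):

import sympy as sp

x = sp.Symbol('x')
truth = sp.sin(x**2) * sp.cos(x) - 2        # Livermore-2 ground truth
candidate = sp.cos(x) * sp.sin(x * x) - 2   # hypothetical model output

# The expressions are judged equivalent if their difference simplifies to 0.
print(sp.simplify(truth - candidate) == 0)  # True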

Table 9: Ground-truth expressions for Vladislavleva [15] benchmark.

Dataset | Expression | Variables | Data range
Vladislavleva-1 | $\frac{e^{-(x-1)^2}}{1.2+(y-2.5)^2}$ | 2 | U[0.3,4,100]
Vladislavleva-2 | $e^{-x}x^3(\cos x\sin x)(\cos x\sin^2 x-1)$ | 1 | E[0.05,10,0.1]
Vladislavleva-3 | $e^{-x}x^3(\cos x\sin x)(\cos x\sin^2 x-1)(y-5)$ | 2 | x: E[0.05,10,0.1]; y: E[0.05,10.05,2]
Vladislavleva-4 | $\frac{10}{5+\sum_{i=1}^{5}(x_i-3)^2}$ | 5 | U[0.05,6.05,1024]
Vladislavleva-5 | $30\frac{(x-1)(z-1)}{y^2(x-10)}$ | 3 | x: U[0.05,2,300]; y: U[1,2,300]; z: U[0.05,2,300]
Vladislavleva-6 | $6\sin(x)\cos(y)$ | 2 | U[0.1,5.9,30]
Vladislavleva-7 | $(x-3)(y-3)+2\sin((x-4)(y-4))$ | 2 | U[0.05,6.05,300]
Vladislavleva-8 | $\frac{(x-3)^4+(y-3)^3-(y-3)}{(y-2)^4+10}$ | 2 | U[0.05,6.05,50]
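Several Vladislavleva problems (e.g., Vladislavleva-3 and Vladislavleva-5) assign a different range to each input variable, so each input must be sampled independently before the target is evaluated. A minimal sketch for Vladislavleva-5, assuming NumPy:

import numpy as np

rng = np.random.default_rng(seed=0)

# Vladislavleva-5: 30 (x-1)(z-1) / (y^2 (x-10)), 300 samples per variable
x = rng.uniform(0.05, 2.0, 300)
y = rng.uniform(1.0, 2.0, 300)
z = rng.uniform(0.05, 2.0, 300)
f = 30.0 * (x - 1.0) * (z - 1.0) / (y**2 * (x - 10.0))

Note that the stated ranges keep the target well defined: y >= 1 bounds the denominator away from zero, and x <= 2 keeps the factor (x - 10) strictly negative.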





Table 10: Feynman physics equations [27].

Function form | #variables
$f=\exp(-\theta^2/2)/\sqrt{2\pi}$ | 1
$f=\exp(-(\theta/\sigma)^2/2)/(\sqrt{2\pi}\sigma)$ | 2
$f=\exp(-((\theta-\theta_1)/\sigma)^2/2)/(\sqrt{2\pi}\sigma)$ | 3
$d=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$ | 4
$F=\frac{Gm_1m_2}{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2}$ | 9
$m=\frac{m_0}{\sqrt{1-v^2/c^2}}$ | 3
$A=x_1y_1+x_2y_2+x_3y_3$ | 6
$F=\mu N_n$ | 2
$F=\frac{q_1q_2}{4\pi\epsilon r^2}$ | 4
$E_f=\frac{q_1r}{4\pi\epsilon r^3}$ | 3
$F=q_2E_f$ | 2
$F=q(E_f+Bv\sin(\theta))$ | 5
$K=\frac{1}{2}m(v^2+u^2+w^2)$ | 4
$U=Gm_1m_2\left(\frac{1}{r_2}-\frac{1}{r_1}\right)$ | 5
$U=mgz$ | 3
$U=\frac{1}{2}k_{\mathrm{spring}}x^2$ | 2
$x'=\frac{x-ut}{\sqrt{1-u^2/c^2}}$ | 4
$t'=\frac{t-ux/c^2}{\sqrt{1-u^2/c^2}}$ | 4
$p=\frac{m_0v}{\sqrt{1-v^2/c^2}}$ | 3
$v_1=\frac{u+v}{1+uv/c^2}$ | 3
$r=\frac{m_1r_1+m_2r_2}{m_1+m_2}$ | 4
$\tau=rF\sin(\theta)$ | 3
$L=mrv\sin(\theta)$ | 4
$E=\frac{1}{4}m(\omega^2+\omega_0^2)x^2$ | 4
$V_e=q/C$ | 2
$\theta_1=\arcsin(n\sin(\theta_2))$ | 2
$f_f=\frac{1}{\frac{1}{d_1}+\frac{n}{d_2}}$ | 3
$k=\omega/c$ | 2
$x=\sqrt{x_1^2+x_2^2-2x_1x_2\cos(\theta_1-\theta_2)}$ | 4
$I=I_0\frac{\sin^2(n\theta/2)}{\sin^2(\theta/2)}$ | 3
$\theta=\arcsin(\lambda/(nd))$ | 3
$P=\frac{q^2a^2}{6\pi\epsilon c^3}$ | 4
$P=\left(\frac{1}{2}\epsilon cE_f^2\right)\left(\frac{8\pi r^2}{3}\right)\frac{\omega^4}{(\omega^2-\omega_0^2)^2}$ | 6
$\omega=\frac{qvB}{p}$ | 4
$\omega=\frac{\omega_0}{1-v/c}$ | 3
$\omega=\frac{1+v/c}{\sqrt{1-v^2/c^2}}\,\omega_0$ | 3
$E=\hbar\omega$ | 2
$I=I_1+I_2+2\sqrt{I_1I_2}\cos(\delta)$ | 3
$r=\frac{4\pi\epsilon\hbar^2}{mq^2}$ | 4
$E=\frac{3}{2}p_FV$ | 2
$E=\frac{1}{\gamma-1}p_FV$ | 3
$P_F=\frac{nk_bT}{V}$ | 4
$n=n_0\exp(-mgx/(k_bT))$ | 6
$L_{\mathrm{rad}}=\frac{\hbar\omega^3}{\pi^2c^2(\exp(\hbar\omega/(k_bT))-1)}$ | 5
$v=\frac{\mu_{\mathrm{drift}}qV_e}{d}$ | 4
$D=\mu_ek_bT$ | 3
$\kappa=\frac{1}{\gamma-1}\frac{k_bv}{A}$ | 4
$E=nk_bT\ln(V_2/V_1)$ | 5
$c=\sqrt{\gamma p_r/\rho}$ | 3
$E=\frac{mc^2}{\sqrt{1-v^2/c^2}}$ | 3
$x=x_1(\cos(\omega t)+\alpha\cos^2(\omega t))$ | 4
$P=\frac{\kappa(T_2-T_1)A}{d}$ | 5
$F_E=\frac{P_{wr}}{4\pi r^2}$ | 2
$V_e=\frac{q}{4\pi\epsilon r}$ | 3
$V_e=\frac{1}{4\pi\epsilon}\frac{p_d\cos(\theta)}{r^2}$ | 4
$E_f=\frac{3}{4\pi\epsilon}\frac{p_dz}{r^5}\sqrt{x^2+y^2}$ | 6
$E_f=\frac{3}{4\pi\epsilon}\frac{p_d\cos(\theta)\sin(\theta)}{r^3}$ | 4
$E=\frac{3}{5}\frac{q^2}{4\pi\epsilon d}$ | 3
$E_{\mathrm{den}}=\frac{\epsilon E_f^2}{2}$ | 2
$E_f=\frac{\sigma_{\mathrm{den}}}{\epsilon}\frac{1}{1+\chi}$ | 3
$x=\frac{qE_f}{m(\omega_0^2-\omega^2)}$ | 5
$n=n_0(1+p_dE_f\cos(\theta)/(k_bT))$ | 6
$P=\frac{n_\rho p_d^2E_f}{3k_bT}$ | 5
$P=\frac{n\alpha}{1-n\alpha/3}\epsilon E_f$ | 4
$\theta=1+\frac{n\alpha}{1-n\alpha/3}$ | 2
$B=\frac{1}{4\pi\epsilon c^2}\frac{2I}{r}$ | 4
$\rho_c=\frac{\rho_{c_0}}{\sqrt{1-v^2/c^2}}$ | 3
$j=\frac{\rho_{c_0}v}{\sqrt{1-v^2/c^2}}$ | 3
$E=-\mu_MB\cos(\theta)$ | 3
$E=-p_dE_f\cos(\theta)$ | 3
$V_e=\frac{q}{4\pi\epsilon r(1-v/c)}$ | 5
$k=\sqrt{\frac{\omega^2}{c^2}-\frac{\pi^2}{d^2}}$ | 3
$F_E=\epsilon cE_f^2$ | 3
$E_{\mathrm{den}}=\epsilon E_f^2$ | 2
$I=\frac{qv}{2\pi r}$ | 3
$\mu_M=\frac{qvr}{2}$ | 3
$\omega=\frac{gqB}{2m}$ | 4
$\mu_M=\frac{q\hbar}{2m}$ | 3
$E=\frac{g\mu_MBJ_z}{\hbar}$ | 5
$M=n_\rho\mu_M\tanh\left(\frac{\mu_MB}{k_bT}\right)$ | 5
$f=\frac{\mu_mB}{k_bT}+\frac{\mu_m\alpha}{\epsilon c^2k_bT}M$ | 8
$E=\mu_M(1+\chi)B$ | 6
$F=\frac{YAx}{d}$ | 4
$\mu_S=\frac{Y}{2(1+\sigma)}$ | 2
$E=\frac{\hbar\omega}{\exp(\hbar\omega/(k_bT))-1}$ | 4





Table 11: Feynman physics equations [27].

Function form | #variables
$n=\frac{1}{\exp(\hbar\omega/(k_bT))-1}$ | 4
$n=\frac{n_0}{\exp(\mu_mB/(k_bT))+\exp(-\mu_mB/(k_bT))}$ | 5
$\omega=\frac{2\mu_MB}{\hbar}$ | 3
$p_\gamma=\sin^2(E_nt/\hbar)$ | 3
$p_\gamma=\frac{(p_dE_ft/\hbar)\sin^2((\omega-\omega_0)t/2)}{((\omega-\omega_0)t/2)^2}$ | 6
$E=\mu_M\sqrt{B_x^2+B_y^2+B_z^2}$ | 4
$L=n\hbar$ | 2
$v=\frac{2E_nd^2k}{\hbar}$ | 4
$I=I_0(\exp(qV_e/(k_bT))-1)$ | 5
$E=2U(1-\cos(kd))$ | 3
$m=\frac{\hbar^2}{2E_nd^2}$ | 3
$k=\frac{2\pi\alpha}{nd}$ | 3
$f=\beta(1+\alpha\cos(\theta))$ | 3
$E=-\frac{mq^4}{2(4\pi\epsilon)^2\hbar^2}\frac{1}{n^2}$ | 4
$j=-\frac{\rho_{c_0}qA_{\mathrm{vec}}}{m}$ | 4
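In Tables 10 and 11, the #variables column counts every symbol the regressor must treat as an input, including physical constants such as $\epsilon$ and $k_b$, whereas $\pi$ is treated as a known constant. This convention can be checked mechanically; the SymPy sketch below reproduces the count for the Coulomb-law entry of Table 10:

import sympy as sp

q1, q2, eps, r = sp.symbols('q_1 q_2 epsilon r', positive=True)

# F = q1*q2 / (4*pi*epsilon*r^2); sympy.pi is a known constant, so the
# free symbols are q1, q2, epsilon, and r.
F = q1 * q2 / (4 * sp.pi * eps * r**2)
print(len(F.free_symbols))  # 4, matching the table's #variables entry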