
Parameter Identification for Partial Differential Equations with Spatiotemporal Varying Coefficients

Guangtao Zhang, Yiting Duan, Guanyu Pan, Qijing Chen, Huiyu Yang, Zhikun Zhang (zkzhang@hust.edu.cn)
Abstract

To comprehend complex systems with multiple states, it is imperative to reveal the identity of these states through the system outputs. Nevertheless, the mathematical models describing these systems are often nonlinear, which renders the parameter inverse problem from observed spatiotemporal data a challenging endeavor. Starting from the data observed from such systems, we propose a novel framework that facilitates the investigation of parameter identification for multi-state systems governed by spatiotemporal varying parametric partial differential equations. Our framework consists of two integral components: a constrained self-adaptive physics-informed neural network, encompassing a sub-network, as our methodology for parameter identification, and a finite mixture model approach to detect regions of probable parameter variations. Through our scheme, we can precisely ascertain the unknown varying parameters of the complex multi-state system, thereby accomplishing the inversion of the varying parameters. Furthermore, we showcase the efficacy of our framework on two numerical cases: the 1D Burgers' equation with time-varying parameters and the 2D wave equation with a space-varying parameter.

keywords:
Multi-state complex system; Parameter identification; Inverse problem; Physics-informed neural network; Finite mixture model; Change-point detection;

1 Introduction

Parameter identification for partial differential equations (PDEs), also known as the inverse problem, encompasses various mathematical branches such as numerical analysis, nonlinear analysis, and optimization algorithms. The goal of the inverse problem is to infer the unknown parameters of a PDE from a set of spatiotemporal data with potential noise [1], and the field has progressed rapidly over the past few decades with proposed methods such as the sparse Bayesian learning algorithm [2], the least squares method [3], frequentist and Bayesian methods [4], and physics-informed neural networks (PINNs) [5]. In materials mechanics, accurate detection of property parameters benefits damage detection and the design of new multi-functional materials [6]. In biomechanics, identifying important parameters in human tissue can be helpful for treatment and disease prevention [7, 8]. Parameter identification methods are also widely used in other engineering fields such as oil exploration and fluid mechanics [9, 10].

Nowadays, multi-state systems with time-varying or space-varying parameters appear widely in fields such as physics, biology, chemical processes [11], and society [12]. One of the most powerful ways of understanding a multi-state complex system with time-varying parameters is discovering its state transition path. Various theories have been proposed to model and characterize system dynamics, such as transition path theory [13], transition path sampling [14], and the Markov state model [15]. For a system governed by a varying parametric PDE, the evolutionary process of the varying parameters determines the state transition path of the multi-state system. For space-varying parameters in higher dimensions, a transition region can be characterized instead of a transition path. As such, identifying the unknown varying parameters becomes a necessary first step for discovering pattern variation in complex systems.

PINNs have been demonstrated to be an efficient way to infer the unknown parameters of PDEs from observed data. The original idea of PINNs was introduced by Lagaris in 1998 [16] and was well established by Raissi et al. for solving two main problems: the forward problem of PDE resolution and parameter identification for PDEs [17, 18]. Since Raissi's work, various numerical techniques have been proposed to improve the performance of PINNs on these two problems [19, 20], and PINNs have been successfully applied to problems in materials [21], biology [22], topological optimization [23], and fluids [24]. For the varying-parameter inference task, Revanth et al. proposed backward compatible PINNs (bc-PINNs) [25] to learn the time-varying parameters of the time-varying parametric Burgers' equation from observed data without any prior information. However, the inference results of bc-PINNs only follow a trend similar to the true values. Such inaccurate results are insufficient for exploring the transition path; to address this, we need a more accurate parameter identification method.

After obtaining the inference results for the varying parametric PDEs, the next part is detecting the change region of the varying parameters. Change-point detection is an important part of time series analysis and probabilistic anomaly detection [26]. This work requires us to pinpoint the locations of changes in statistical characteristics and the points in time at which the probability density functions change [27]. Based on the parameter inference results of a varying system, a fast and accurate change point detection method can detect the change points of the system and locate their positions, which is significant for discovering the state transition path. For time series data, there has been extensive work on detecting change points [28, 29, 30], which has become a significant part of controlling the reliability and stability of systems. Unlike time series analysis, in this study the change points of time-varying and space-varying parameters make up the region of variation in the intrinsic nature of the multi-state complex system. Revealing this hidden parameter variation from the output of the system is an interesting research problem.

Data-driven statistical modeling based on finite mixture distributions is a rapidly evolving field with a wide and expanding range of applications [31]. Recently, finite mixture models have been utilized in various fields, such as biometrics, physics, medicine, and marketing. They offer a straightforward method for describing a continuous system's variation through a discrete state space. Despite being a simple linear extension of classical statistical models, finite mixture models have a discrete latent structure that leads to certain fundamental challenges in estimation, such as the need to determine the unknown number of groups, states, or clusters. The expectation maximization (EM) algorithm is an iterative technique based on maximum likelihood estimation for estimating the parameters of statistical models when the data comprise both observed and hidden variables, as in finite mixture models [32]. The key advantage of the EM algorithm is that it provides a means of estimating the parameters of models with latent variables without explicitly computing the posterior distribution of the latent variables. This statistical method is particularly useful when the data are missing, incomplete, or partially observed, and it can be computationally efficient, especially when dealing with complex models.

In this paper, we introduce a novel framework for discovering the state transition path of a multi-state parametric PDE system in two steps. First, we use modified constrained self-adaptive physics-informed neural networks (cSPINNs) to identify the unknown varying parameters, and then we detect the change region via a change point detection method based on a finite mixture model. Specifically, we modify the cSPINNs by adding a sub-network to learn the varying parameters, which yields more accurate results than the previous bc-PINNs. Next, we detect the change points, i.e., where the parameters change, from the inference results by employing the finite mixture method. Finally, we take the 1D time-varying parametric Burgers' equation and the 2D space-varying wave equation as test examples to demonstrate the performance of our method.

This paper is structured as follows. In section 2, we describe the forms of parametric partial differential equations with time- and space-varying parameters, which are the test cases for our method. In section 3, the proposed framework containing two main methods for discovering the state transition path is presented in detail. In section 4, we test the performance of our framework on the 1D time-varying parametric Burgers' equation and analyze the results. Section 5 compares cSPINNs and bc-PINNs on the 1D Burgers' equation. Section 6 presents the performance of our framework on the 2D space-varying wave equation, and section 7 contains the conclusion and discussion.

2 Parametric Partial Differential Equations with Time- and Space-Varying Parameters

To elucidate the case of partial differential equations with time-varying parameters, we use the following 1D time-varying parametric Burgers' equation as an example. The Burgers' equation is a nonlinear second-order partial differential equation used as a simplified model in fluid mechanics. The equation is due to the Dutch mathematician Johannes Burgers [33], and in this study we generally write it in the following form

u_{t}=\lambda_{1}uu_{x}+\lambda_{2}u_{xx}, (2.1)

where u is the fluid velocity at position x and time t, the term \lambda_{1}uu_{x} is known as the convective term, the term \lambda_{2}u_{xx} is the diffusive term, and \lambda_{2} is the kinematic viscosity of the fluid. The Burgers' equation combines the effects of convection and diffusion in a nonlinear way and is used to model a variety of phenomena in fluid mechanics, including shock waves, turbulence, and flow in porous media [34]. Besides fluid mechanics, it has also been used in other areas of physics, such as modeling traffic flow in transportation engineering [35].

Let \lambda_{1} and \lambda_{2} be time-varying parameters taking values in a finite discrete parameter space. We rewrite equation (2.1) as

u_{t}=\lambda_{1}(t)uu_{x}+\lambda_{2}(t)u_{xx}. (2.2)

Thus we get a continuous system with discrete states. In this time-varying parameter system, the parameter may exhibit local invariance. As such, in the global time domain, research attention is directed toward how the system state changes over time and at which points these changes occur. The subsequent objective of this study is to establish a comprehensive mathematical framework that builds upon existing solutions of the system. This framework serves to address the inverse problem for parameters in the equation and change point detection of time-varying parameter systems.

In this paper, the observed data of the 1D time-varying parametric Burgers' equation are computed via a fast-Fourier-transform-based numerical method, where the initial value is given as

u(x,0)=\exp\{-(x+1)^{2}\}, (2.3)

and the domain is (x,t)\in[-8,8]\times(0,10]. The observed data for three cases, constant parameters without a change point, only \lambda_{1} changing once, and both \lambda_{1} and \lambda_{2} changing with multiple change points, are shown in figure 1.

Figure 1: Numerical solution of the 1D time-varying parametric Burgers' equation for three cases. From left to right, figures correspond to constant parameters without a change point, time-varying \lambda_{1} with one change point, and time-varying \lambda_{1} and \lambda_{2} with multiple change points.
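To make the data-generation step concrete, the following is a minimal sketch (not the authors' code) of solving equation (2.2) with the initial condition (2.3) by a Fourier spectral method in space and a method-of-lines ODE integrator in time. The grid resolution, integrator, and tolerances are illustrative assumptions; the piecewise-constant \lambda_{1}(t) corresponds to case 1.2 of Section 4.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed spatial resolution; the text only states the domain [-8, 8].
N = 256
L = 16.0
x = -8.0 + L * np.arange(N) / N
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)   # spectral wavenumbers

def lambda1(t):                  # piecewise-constant parameter (case 1.2, Section 4)
    return 0.5 if t < 5.0 else 1.0

lambda2 = 0.1                    # constant viscosity

def rhs(t, u):
    """Semi-discrete form of eq. (2.2): u_t = lambda1(t) u u_x + lambda2 u_xx."""
    u_hat = np.fft.fft(u)
    u_x = np.real(np.fft.ifft(1j * k * u_hat))
    u_xx = np.real(np.fft.ifft(-(k ** 2) * u_hat))
    return lambda1(t) * u * u_x + lambda2 * u_xx

u0 = np.exp(-(x + 1.0) ** 2)     # initial condition (2.3)
sol = solve_ivp(rhs, (0.0, 10.0), u0, t_eval=np.linspace(0.0, 10.0, 256),
                rtol=1e-8, atol=1e-8)
u_xt = sol.y                     # snapshots u(x, t) on a 256 x 256 grid
```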

From the above figures, it is clear that the state transition path and the change points of the time-varying system cannot be revealed directly from the observed data. Moreover, the difference between the first and second cases is obscure, let alone the system information and state transition paths. Therefore, we can apply the modified cSPINNs as a bridge linking the observed data and the unknown parameters. In this way, we can discover the hidden information together with the transition path.

For partial differential equations with space-varying parameters, in contrast to the previous example, we introduce the 2D wave equation, whose parameter is not constant over the spatial plane. The wave equation is a mathematical model that describes wave phenomena. It is typically expressed as a partial differential equation and can describe wave processes in both space and time, with widespread applications in physics, engineering, mathematics, and other fields. The general form of the wave equation can be written as:

u_{tt}=\alpha^{2}\nabla^{2}u. (2.4)

Here, u represents the wave amplitude, \alpha is the wave speed, and \nabla^{2} is the Laplacian operator, which represents the second derivative in space. This equation describes how the wave amplitude changes and propagates during a wave process, with the second time derivative representing the acceleration of the wave amplitude. Let \alpha be a space-varying parameter \alpha(x,y) and rewrite (2.4) as

u_{tt}=[\alpha(x,y)]^{2}\nabla^{2}u. (2.5)

In many cases in scientific computing research, there can be sudden and discontinuous changes in local regions, which can have a significant impact on the output of the system. Therefore, it is of great practical significance to locate the regions where the state varies in such a space through scientific computation. By doing so, we can better understand the underlying physical processes and develop more accurate models to describe them.

3 Data-driven Discovery of Parameter Identification Framework for Partial Differential Equations

In this section, we introduce our state transition path discovery framework for varying-parameter systems in two parts. First, we illustrate the modified cSPINNs method for identifying varying parameters from the observed data. Next, we describe the finite mixture model used as our change point detection method.

3.1 Modified Constrained Self-adaptive Physics Informed Neural Networks

In this subsection, we introduce the modified cSPINNs to solve the inverse problem. We first consider the model problem on a spatial domain \Omega and temporal domain t\in[0,T], with the explicit parametric form of the parameterized PDE:

u_{t}(x,t)+\mathcal{N}[u(x,t);\lambda_{p}(t)]=0,\quad x\in\Omega,\; t\in(0,T], (3.1)

where \mathcal{N}[\cdot] is an operator parameterized by the physics parameter \lambda_{p}(t), which includes any combination of linear and nonlinear terms of spatial derivatives. To infer the unknown parameters of the PDE via PINNs [5], we need to construct a neural network \hat{u}(x,t;\boldsymbol{w}), given the spatial input x\in\Omega and temporal input t\in[0,T], with trainable parameters \boldsymbol{w} to fit the data \{x_{k}^{o},t_{k}^{o},u_{k}^{o}\}_{k=1}^{N_{o}}. Meanwhile, the neural network also needs to satisfy the physics laws, i.e., the parameterized governing PDE. Therefore, we can train a physics-informed model by minimizing the following loss function

\mathcal{L}(\boldsymbol{w})=\lambda_{r}\mathcal{L}_{r}(\boldsymbol{w})+\lambda_{o}\mathcal{L}_{o}(\boldsymbol{w}), (3.2)

where

\mathcal{L}_{r}(\boldsymbol{w})=\frac{1}{N_{r}}\sum_{i=1}^{N_{r}}\left|\hat{u}_{t}(x_{r}^{i},t_{r}^{i})+\mathcal{N}[\hat{u}(x_{r}^{i},t_{r}^{i});\lambda_{p}(t_{r}^{i})]\right|^{2}, (3.3a)
\mathcal{L}_{o}(\boldsymbol{w})=\frac{1}{N_{o}}\sum_{i=1}^{N_{o}}\left|\hat{u}(x_{o}^{i},t_{o}^{i})-u_{o}(x_{o}^{i},t_{o}^{i})\right|^{2}. (3.3b)

Here, \mathcal{L}_{r} and \mathcal{L}_{o} are the PDE residual loss and the data loss between the observed data and the value predicted by the network. We use \hat{u} to represent the output of the neural network, in other words the PDE solution, parameterized by \boldsymbol{w}. The weights \lambda_{r} and \lambda_{o} can strongly influence the convergence rate of the different loss components and the final accuracy of PINNs [36]. Recently, many works [36, 37, 38, 39, 40] have explored weighting strategies during PINN training, which has become one of the mainstream research directions for PINNs. To further enhance the learning ability in physics domains with complex solutions and improve the accuracy of the inferred parameters, we introduce a constrained self-adaptive weighting residual loss function. For the inverse problem, the training goal is determined by the residual loss and the data loss; here we mainly consider the residual loss, which is closely related to the accuracy of the inferred parameters. We first rewrite the residual loss function as

\mathcal{L}_{r}(\boldsymbol{w})=\frac{1}{N_{r}}\sum_{i=1}^{N_{r}}\hat{\lambda}_{r}^{i}\left|\hat{u}_{t}(x_{r}^{i},t_{r}^{i})+\mathcal{N}[\hat{u}(x_{r}^{i},t_{r}^{i});\lambda_{p}(t_{r}^{i})]\right|^{2}. (3.4)

During training, we update the trainable weights \{\hat{\lambda}_{r}^{i}\}_{i=1}^{N_{r}} as

\boldsymbol{\lambda}_{r}^{k+1}=\hat{\boldsymbol{\lambda}}_{r}^{k}+\eta_{k}\nabla_{\hat{\boldsymbol{\lambda}}_{r}^{k}}\mathcal{L}(\boldsymbol{w},\lambda_{r},\lambda_{o},\hat{\boldsymbol{\lambda}}_{r}^{k}), (3.5a)
\lambda_{r_{i}}^{k+1}=\frac{|\lambda_{r_{i}}^{k+1}|}{\sum_{i=1}^{N_{r}}|\lambda_{r_{i}}^{k+1}|}\times C, (3.5b)
\hat{\lambda}_{r_{i}}^{k+1}=(1-\epsilon)\times\hat{\lambda}_{r_{i}}^{k}+\epsilon\times\lambda_{r_{i}}^{k+1}, (3.5c)

where r_{i} denotes the i-th residual point in \{x_{r}^{i},t_{r}^{i}\}_{i=1}^{N_{r}}, and k and k+1 are training iteration numbers. \boldsymbol{\lambda}_{r}^{k+1} is an intermediate variable before normalization; in other words, we first normalize \lambda_{r_{i}}^{k+1}\in\boldsymbol{\lambda}_{r}^{k+1} and obtain the final \hat{\lambda}_{r_{i}}^{k+1} by a weighted sum of the previous weight \hat{\lambda}_{r_{i}}^{k} of iteration k and the normalized \lambda_{r_{i}}^{k+1} of the current iteration k+1. We set C as the expectation of the weights in PINNs, i.e., we let C=E(\sum_{i=1}^{N_{r}}\hat{\lambda}_{r_{i}})=N_{r}. We update the weights by gradient ascent to raise PINNs' attention in areas that are difficult to learn. Figure 2 illustrates the modified constrained self-adaptive PINNs framework for parameter identification problems. Neural Network 1 is used to approximate the solution \hat{u}, and Neural Network 2, the added sub-network mentioned above, is applied to reconstruct the varying parameters \lambda_{1}(t) and \lambda_{2}(t). The training loss is composed of the modified PDE loss and the data loss, which correspond to the physics laws and the real observed data, respectively. Here, we obtained the observed data by numerically solving the time-dependent Burgers' equation as in [41], which relies on a spectral method, and by using the specfem2D package to simulate the wave equation. It is worth noting that we consider the physical parameters of the system to evolve, which can be modeled by a neural network with time as input and the predicted parameters as output. Readers may refer to the neural network structure in Figure 2 for more details.

Figure 2: Schematic of the constrained self-adaptive PINNs framework for the inverse problem.
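As an illustration, a minimal PyTorch sketch of the weight update (3.5a)-(3.5c) is given below. Since the weighted loss (3.4) is linear in each weight, its gradient with respect to \hat{\lambda}_{r}^{i} is simply r_{i}^{2}/N_{r}, which the sketch uses directly. The function name and the values of the step size eta and smoothing factor eps are assumptions for illustration, not the authors' released implementation.

```python
import torch

@torch.no_grad()
def update_sa_weights(residuals, lam_hat, eta=1e-3, eps=0.1):
    """One constrained self-adaptive update, eqs. (3.5a)-(3.5c).

    residuals: per-point PDE residuals r_i, shape (N_r,)
    lam_hat:   current weights lambda_hat_r at iteration k, shape (N_r,)
    """
    n_r = residuals.shape[0]
    # (3.5a) gradient ascent: d/d(lam_i) of mean(lam * r^2) = r_i^2 / N_r,
    # so points with large residuals (hard to learn) receive larger weights.
    lam_new = lam_hat + eta * residuals ** 2 / n_r
    # (3.5b) normalize so the weights sum to C = N_r, their expectation.
    lam_new = lam_new.abs() / lam_new.abs().sum() * n_r
    # (3.5c) blend with the previous weights for a smooth update.
    return (1.0 - eps) * lam_hat + eps * lam_new
```

Each training iteration then alternates a gradient-descent step on the network parameters \boldsymbol{w} with this ascent step on the point weights.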

3.2 Change Point Detection by Finite Mixture Method

For the data of the system's varying parameters \lambda_{1}(t) and \lambda_{2}(t) obtained from the modified cSPINNs, we need a suitable way to perform change-point detection for system (2.2). In general, due to observation noise and the biased estimation from training the network, we have

\Pr(y_{t}|\mathbf{X})=\mathcal{N}(\mathrm{E}(y_{t}),\sigma^{2}), (3.6)

where y_{t}\in\mathrm{Y}=\{y_{1},y_{2},\cdots,y_{N}\} is a biased estimate of the parameter \lambda(t) at discrete time points \{t_{1},t_{2},\cdots,t_{N}\}_{0}^{T}, \mathbf{X} is the system output spatiotemporal data u(x,t), and \mathcal{N} is the 1D Gaussian distribution with density function

f_{\mathcal{N}}(y;\mu,\sigma^{2})=\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(y-\mu)^{2}}{2\sigma^{2}}\right\}. (3.7)

Due to the time-varying parameters in system (2.2), the probabilistic model (3.6) is extended to a Gaussian mixture model (GMM) for observations

\Pr(y|\vartheta)=\sum_{k=1}^{K}\alpha_{k}f_{\mathcal{N}}(y;\mu_{k},\sigma_{k}^{2}), (3.8)

where \vartheta=\{\mu_{k},\sigma_{k},\alpha_{k},\,k=1,2,\cdots,K\} are the unidentified model parameters with mixing proportions satisfying \sum_{k=1}^{K}\alpha_{k}=1. Define the latent variable \gamma_{tk} as a 0/1 encoding of the assignment of the observation y_{t} to the k-th subgroup of the mixture model:

\gamma_{tk}=\begin{cases}1,&y_{t}\text{ is from subgroup }k,\\ 0,&\text{otherwise},\end{cases} (3.9)

with its responsibility estimate

\hat{\gamma}_{tk}=\mathrm{E}(\gamma_{tk}|\mathrm{Y},\vartheta)=\frac{\alpha_{k}f_{\mathcal{N}}(y_{t};\mu_{k},\sigma_{k}^{2})}{\sum_{k=1}^{K}\alpha_{k}f_{\mathcal{N}}(y_{t};\mu_{k},\sigma_{k}^{2})}. (3.10)

The expectation step uses the parameter estimates of the model from the previous step to calculate the conditional expectation of the log-likelihood function for the observed data

\mathrm{E}[\log\Pr(y,\gamma|\mathrm{Y},\vartheta)]=\sum_{k=1}^{K}\Big\{(\log\alpha_{k})\sum_{t=1}^{N}\hat{\gamma}_{tk}+\Big[\log(1/\sqrt{2\pi})-\log\sigma_{k}-\frac{(y_{t}-\mu_{k})^{2}}{2\sigma_{k}^{2}}\Big]\sum_{t=1}^{N}\hat{\gamma}_{tk}\Big\}. (3.11)

The maximization step determines the parameters \hat{\vartheta}^{\mathrm{new}} that maximize the log-likelihood function of the complete data obtained in the expectation step

\hat{\vartheta}^{\mathrm{new}}=\arg\max_{\vartheta}\mathrm{E}[\log\Pr(y,\gamma|\mathrm{Y},\vartheta)]. (3.12)

By the Lagrange constrained optimization method, the updates of the model parameters in each iteration are

\{\hat{\mu}_{k},(\hat{\sigma}_{k})^{2},\hat{\eta}_{k}\}^{\mathrm{new}}=\Big\{\frac{\sum_{t=1}^{N}\hat{\gamma}_{tk}y_{t}}{\sum_{t=1}^{N}\hat{\gamma}_{tk}},\frac{\sum_{t=1}^{N}\hat{\gamma}_{tk}(y_{t}-\mu_{k})^{2}}{\sum_{t=1}^{N}\hat{\gamma}_{tk}},\frac{\sum_{t=1}^{N}\hat{\gamma}_{tk}}{N}\Big\}. (3.13)

Iterating in this manner until the algorithm converges, the final GMM parameter estimates

\hat{\vartheta}=\{\hat{\mu}_{k},\hat{\sigma}_{k},\hat{\eta}_{k},\;k=1,2,\cdots,K\} (3.14)

are generated. Thus the soft classification probability results based on the GMM of the observations \mathrm{Y} can be obtained as an N\times K matrix

\mathbf{G}=\{g_{i,j}\}_{1\leq i\leq N,\,1\leq j\leq K}, (3.15)

where

g_{i,j}=\frac{\eta_{j}f_{\mathcal{N}}(y_{i};\mu_{j},\sigma_{j})}{\sum_{k=1}^{K}\eta_{k}f_{\mathcal{N}}(y_{i};\mu_{k},\sigma_{k})}, (3.16)

which is deduced from Bayes' theorem and reveals the probability that the i-th sample belongs to the j-th mixture component of the GMM. Hence, for the observation data \mathrm{Y}=\{y_{1},y_{2},\cdots,y_{N}\} of the 1D Burgers' equation with time-varying parameters, we can calculate a corresponding sequence of change-point probabilities over the time interval [t-1,t+1]:

\mathbf{P}^{\mathrm{change}}=\Big\{p_{t}=1-\sum_{k=1}^{K}(g_{t-1,k}\cdot g_{t,k}\cdot g_{t+1,k}),\;2\leq t\leq N-1\Big\}. (3.17)
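A compact NumPy sketch of this pipeline, the EM iteration (3.10)-(3.13) followed by the change-point probabilities (3.17), is shown below. The initialization scheme and iteration count are assumptions; this is an illustration rather than the authors' implementation.

```python
import numpy as np

def fit_gmm_1d(y, K=2, iters=200, seed=0):
    """EM for a 1D K-component GMM; returns the N x K responsibility matrix G."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    N = len(y)
    mu = rng.choice(y, size=K)                 # assumed initialization
    var = np.full(K, y.var() + 1e-12)
    alpha = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step, eq. (3.10): responsibilities gamma_hat_tk
        dens = alpha / np.sqrt(2 * np.pi * var) \
               * np.exp(-(y[:, None] - mu) ** 2 / (2 * var))
        G = dens / dens.sum(axis=1, keepdims=True)
        # M-step, eq. (3.13): weighted means, variances, and proportions
        Nk = G.sum(axis=0)
        mu = (G * y[:, None]).sum(axis=0) / Nk
        var = (G * (y[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-12
        alpha = Nk / N
    return G

def change_point_prob(G):
    """Eq. (3.17): p_t = 1 - sum_k g_{t-1,k} * g_{t,k} * g_{t+1,k}."""
    return 1.0 - (G[:-2] * G[1:-1] * G[2:]).sum(axis=1)

# y: inferred lambda_1(t) sequence from the cSPINN sub-network.
# Peaks of change_point_prob(fit_gmm_1d(y)) above a small threshold
# mark candidate change points.
```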

For the 2D wave equation with a space-varying parameter \alpha, we need to consider the Gaussian distribution in higher dimensions,

f_{\mathcal{N}}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma})=\frac{1}{(2\pi)^{\frac{d}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}}\exp\Big\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\Big\}. (3.18)

Similarly, after fitting the two-dimensional GMM, we have the soft classification probability results for a space point (x,y) in the domain:

g_{x,y,j}=\frac{\eta_{j}f_{\mathcal{N}}((x,y);\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})}{\sum_{k=1}^{K}\eta_{k}f_{\mathcal{N}}((x,y);\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}. (3.19)

Then we give a similar calculation of the change-point probabilities over a cross-shaped five-point region \{(x,y),(x-1,y),(x+1,y),(x,y-1),(x,y+1)\}:

\mathbf{P}^{\mathrm{change}}=\Big\{p_{x,y}=1-\sum_{k=1}^{K}(g_{x,y,k}\cdot g_{x-1,y,k}\cdot g_{x+1,y,k}\cdot g_{x,y-1,k}\cdot g_{x,y+1,k}),\;2\leq x\leq N_{x}-1,\;2\leq y\leq N_{y}-1\Big\}. (3.20)

Finally, the peaks of these probability sequences can be regarded as the detected state change points of systems (2.2) and (2.5).
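For the 2D case, a sketch using scikit-learn's GaussianMixture in place of a hand-written EM loop is given below. The choice of clustering features (here the grid coordinates together with the inferred \alpha(x,y) values) is an assumption for illustration; the responsibilities g_{x,y,k} and the five-point product follow (3.19)-(3.20).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def change_region_2d(points, alpha_vals, nx, ny, K=2, seed=0):
    """Change-point probabilities on a grid, eqs. (3.19)-(3.20).

    points:     (nx*ny, 2) grid coordinates, assumed row-major over the grid
    alpha_vals: inferred alpha(x, y) at each grid point (sub-network output)
    """
    # Assumed feature choice: cluster on (x, y, alpha) so the mixture
    # components separate regions where the inferred parameter differs.
    feats = np.column_stack([points, alpha_vals])
    gmm = GaussianMixture(n_components=K, random_state=seed).fit(feats)
    G = gmm.predict_proba(feats).reshape(nx, ny, K)   # g_{x,y,k}, as in (3.19)
    # Eq. (3.20): product of responsibilities over the five-point cross.
    prod = (G[1:-1, 1:-1] * G[:-2, 1:-1] * G[2:, 1:-1]
            * G[1:-1, :-2] * G[1:-1, 2:]).sum(axis=-1)
    return 1.0 - prod        # high values mark the transition region
```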

4 1D Burgers’ Equation with Time-varying Parameter

In this section, we use three distinct types of numerical cases to test the performance of our framework. These three categories represent different evolutionary models of the time-varying 1D parametric Burgers' equation, and their hidden state transition paths can be discovered via our framework.

To better identify the parameter \lambda_{1}(t), a sub-network with input t and output \lambda_{1}(t;\phi) is used to model the dynamics of the parameter, where \phi denotes all trainable parameters of the sub-network, optimized during training together with the time-varying parameters \lambda_{1} and \lambda_{2}. The loss function is denoted as

(a). Mean squared error on the observed data

\mathrm{MSE}_{o}=\frac{1}{N_{o}}\sum_{k=1}^{N_{o}}\left(\hat{u}(\boldsymbol{x}_{k}^{o},t_{k}^{o})-u_{k}^{o}\right)^{2},\quad(\boldsymbol{x}_{k}^{o},t_{k}^{o})\in\Omega\times T. (4.1)

(b). Mean squared error of the residual points

\mathrm{MSE}_{R}=\frac{1}{N_{r}}\sum_{k=1}^{N_{r}}\left(R(\boldsymbol{x}_{k}^{r},t_{k}^{r})\right)^{2},\quad(\boldsymbol{x}_{k}^{r},t_{k}^{r})\in\Omega\times T, (4.2)
R:=\hat{u}_{t}-\lambda_{1}(t)\hat{u}\hat{u}_{x}-\lambda_{2}(t)\hat{u}_{xx}.

(c). Total mean squared error for the inverse problem

\mathrm{MSE}=\lambda_{o}\mathrm{MSE}_{o}+\lambda_{R}\mathrm{MSE}_{R}. (4.3)
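In a PINN implementation, the derivatives in the residual R of (4.2) come from automatic differentiation of the two networks. A minimal PyTorch sketch follows, where u_net and lam_net are hypothetical handles for Neural Network 1 and the parameter sub-network (Neural Network 2):

```python
import torch

def burgers_residual(u_net, lam_net, x, t):
    """R = u_t - lam1(t) * u * u_x - lam2(t) * u_xx, as in eq. (4.2)."""
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = u_net(torch.cat([x, t], dim=1))
    lam1, lam2 = lam_net(t).split(1, dim=1)   # sub-network predicts both parameters
    grad = lambda out, var: torch.autograd.grad(
        out, var, grad_outputs=torch.ones_like(out), create_graph=True)[0]
    u_t = grad(u, t)
    u_x = grad(u, x)
    u_xx = grad(u_x, x)
    return u_t - lam1 * u * u_x - lam2 * u_xx

# MSE_R of eq. (4.2) is then (burgers_residual(...) ** 2).mean()
```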

In the following numerical experiments, we use N_{o}=4,000 observed data points and N_{r}=64,000 residual points randomly sampled from the computational domain with \Omega=[-8,8], T=10. We set the weights of the data loss term and the residual loss term to \lambda_{o}=\lambda_{R}=1. We use the modified multilayer perceptron (MLP) [39] with a depth of 6, a width of 128, and the tanh activation function as Neural Network 1 for solving the inverse problem. As for Neural Network 2, a modified MLP is used, which has 1 input neuron and consists of 4 hidden layers with 40 neurons in each layer; the activation function is again tanh. The Adam optimizer is used to minimize the loss function for N_{e}=200,000 epochs. Meanwhile, we set the batch size of residual points to N_{bs}=4,000 to reduce the hardware memory requirement. The initial learning rate is 0.001, and the exponential learning rate annealing method is applied with hyper-parameter \gamma=0.9 during training. The total time domain [0,10] of the parametric Burgers' equation has been discretized into 256 time steps uniformly. To identify the parameters \lambda_{1} and \lambda_{2}, all the observed data within five-step segments have been chosen. Prediction errors of the identified parameters via modified cSPINNs are shown in Appendix B (Absolute Error between Reference and Predicted Solution of the 1D Parametric Burgers' Equation), while the statistical inference results and the L^{2} error for learning the Burgers' equation are shown in table 1. Next, we exhibit our results.
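The "modified MLP" refers to the gated architecture of [39], in which two encoder streams are mixed into every hidden layer. A sketch with the sizes stated above is given below; the architectural details are our reading of [39] and may differ from the authors' code, and how often the learning-rate scheduler is stepped is an assumption since the text only gives \gamma=0.9.

```python
import torch
import torch.nn as nn

class ModifiedMLP(nn.Module):
    """Gated MLP in the spirit of [39]: two encoders U, V mixed into each layer."""
    def __init__(self, d_in, width, depth, d_out):
        super().__init__()
        self.enc_u = nn.Linear(d_in, width)
        self.enc_v = nn.Linear(d_in, width)
        layers = [nn.Linear(d_in, width)]
        layers += [nn.Linear(width, width) for _ in range(depth - 1)]
        self.hidden = nn.ModuleList(layers)
        self.out = nn.Linear(width, d_out)

    def forward(self, x):
        U = torch.tanh(self.enc_u(x))
        V = torch.tanh(self.enc_v(x))
        h = x
        for layer in self.hidden:
            z = torch.tanh(layer(h))
            h = (1.0 - z) * U + z * V     # gated mixing of the two encoders
        return self.out(h)

# Sizes from the text above:
u_net = ModifiedMLP(d_in=2, width=128, depth=6, d_out=1)    # Neural Network 1
lam_net = ModifiedMLP(d_in=1, width=40, depth=4, d_out=2)   # Neural Network 2
params = list(u_net.parameters()) + list(lam_net.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
# gamma = 0.9 per the text; the stepping frequency is an assumption.
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
```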

4.1 Case 1: Burgers’ Equation with Single Change Point

The fundamental evolutionary model for a parametric-PDE-governed time-varying system has one time-varying parameter with at most one change point throughout the entire process. In this study, we explore three conditions: the first is the trivial case with no change point; the second and third feature a single varying parameter with one abrupt shift or one gradual change, respectively.

\mathrm{case\ 1.1}:\ \lambda_{1}(t)=1.5.\qquad\mathrm{case\ 1.2}:\ \lambda_{1}(t)=\begin{cases}0.5,&0\leq t<5,\\ 1,&5\leq t\leq 10.\end{cases}\qquad\mathrm{case\ 1.3}:\ \lambda_{1}(t)=\begin{cases}0.5,&0\leq t<4.77,\\ 0.98t-4.19,&4.77\leq t<5.27,\\ 1,&5.27\leq t\leq 10.\end{cases} (4.4)

Following the proposed framework of Section 3, we apply the modified cSPINNs with a sub-network to learn the time-varying parameter \lambda_{1} of the parametric Burgers' equation, and then we detect the change points with the finite mixture model. Through the results obtained above, the transition path can be discovered. Figure 3 shows the time-varying parameter values obtained using the modified cSPINNs and the results of our change point detection scheme. Sub-figures in the first and second columns illustrate the values of \lambda_{1} and \lambda_{2} learned using the modified cSPINNs, respectively, and the last column of sub-figures shows the results of the finite mixture model. It demonstrates that our framework performs well for all three cases. The main advantage of our framework is that we can discover the transition path of a time-varying system governed by a parametric PDE without any prior information. More specifically, we can predict the values of the parameters (constant or time-varying) and the locations of the change points without any prior information about time segments.

From figure 3, we can observe that the predicted parameters accurately fit the reference solution for both constant and time-varying cases, and the prediction errors mainly appear at locations with discontinuity. Moreover, the error of case 1.2, shown in the second row with an abrupt change, is larger than that of case 1.3, in which the time-varying parameter \lambda_{1} evolves gradually. This phenomenon seems reasonable since it is always hard for PINNs to tackle problems with discontinuities [42]. To better identify the change points, we prefer a probabilistic method for the change point detection task: our detection criterion is measured by probability through the finite mixture model. It successfully captures the same change point in case 1.2 as the reference solution, with low variance and high confidence. In this way, we managed to find all the change points in the evolutionary process of case 1.3.

Figure 3: Figures from the first and second columns represent \lambda_{1} and \lambda_{2} learned using the modified cSPINNs approach for the time-varying parametric Burgers' equation. Figures from the last column illustrate the change point detection results for the learned \lambda_{1}. The first, second, and third rows correspond to cases 1.1, 1.2, and 1.3.

The total number of discretized time points for all experiments is 256, so we obtain 255 probability results through the change point detection method. For better analysis, we set a threshold of 1e-6. The results show that there exists one change point at t=5 for case 1.2 and 9 change points for case 1.3. Thus the transition paths of (\lambda_{1},\lambda_{2}) for the cases above are (1.5,0.1)\rightarrow(1.5,0.1), (0.5,0.1)\rightarrow(1,0.1), and (0.5,0.1)\rightarrow(1,0.1) with sequential gradual change points.

4.2 Case 2: One Time-varying Parameter with Multiple Change Points

The second type of evolutionary model is one time-varying parameter with multiple change points. Here, the time-varying parameter is \lambda_{1}, and the other parameter \lambda_{2} is constantly 0.1. In this scenario, we test the performance of our framework through two cases: in the first, the time-varying parameter \lambda_{1} takes two values with two change points; in the second, \lambda_{1} takes two values with three change points. The reference solution has been obtained as follows: