Long-Term Prediction of Sea Surface Temperature by Temporal Embedding Transformer With Attention Distilling and Partial Stacked Connection

Hao Dai, Zhigang He, Guomei Wei, Famei Lei, Xining Zhang, Weijie Zhang, and Shaoping Shang

Abstract

Sea surface temperature (SST) is one of the most important parameters in the global ocean-atmosphere system, and its long-term changes have a significant impact on global climate and ecosystems. Accurate SST prediction, and especially improved long-term predictive skill, is therefore of great significance for fishery farming, marine ecological protection, and the planning of maritime activities. Because long-term prediction requires a model that can describe the long-range dependence between input and output effectively and precisely, accurate long-term SST prediction is an extremely challenging task. Inspired by the successful application of the transformer and its variants in natural language processing, a sequence-modeling task similar to time-series prediction, we introduce the transformer to SST prediction in the China Sea. The proposed model, Transformer with temporal embedding, attention Distilling, and Stacked connection in Part (TransDtSt-Part), embeds temporal information into the classic transformer, combines attention distilling with partial stacked connection, and performs generative decoding. High-resolution satellite-derived data from the National Oceanic and Atmospheric Administration are used, and long-term SST predictions at daily granularity are achieved under univariate and multivariate patterns. With root mean square error and mean absolute error as metrics, TransDtSt-Part outperforms all competitive baselines in five sea areas (i.e., subareas of the Bohai Sea, Yellow Sea, East China Sea, Taiwan Strait, and South China Sea) and at six prediction horizons (i.e., 30, 60, 90, 180, 270, and 360 days). Experimental results demonstrate that the proposed model is encouraging and promising for long-term SST prediction.

Index Terms-Attention distilling, China Sea, long-term prediction, partial stacked connection, sea surface temperature (SST), temporal transformer.

I. Introduction

Sea surface temperature (SST) provides basic information about the global climate system and is an important parameter for weather prediction and atmospheric model simulations [1], [2]. SST measurements benefit a wide range of operational applications, including climate monitoring, fishery farming, and maritime commercial activities. From north to south, the China Sea consists mainly of the Bohai Sea, the Yellow Sea, the East China Sea, the Taiwan Strait, and the South China Sea, and it spans four climatic zones: the temperate, warm temperate, subtropical, and tropical zones. The confluence of ocean currents and the development of fronts have created many important fishing grounds, and the marine aquaculture industry is well developed. Meanwhile, red tide disasters are particularly serious every spring and summer. Moreover, the China Sea is part of the Maritime Silk Road and contains one of the busiest shipping lanes in the world. Hence, accurate SST prediction in the China Sea is crucial for an in-depth understanding of marine fishery farming, ecological change dynamics, and maritime activity planning, all of which are very important to the production and daily lives of people in China.

Manuscript received 7 October 2023; revised 19 December 2023; accepted 17 January 2024. Date of publication 25 January 2024; date of current version 12 February 2024. This work was supported by the Fujian Province Marine Economic Development Subsidy Fund Project under Grant ZHHY-2019-2. (Corresponding authors: Hao Dai; Shaoping Shang.)

Hao Dai, Zhigang He, Guomei Wei, Famei Lei, Weijie Zhang, and Shaoping Shang are with the Institute of Ocean Exploration Technology, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China (e-mail: daihaozxn@163.com; zghe@xmu.edu.cn; weiguomei@163.com; lfm101659@163.com; oot@xmu.edu.cn; spshang@xmu.edu.cn).

Xining Zhang is with the Fujian Key Laboratory of Light Propagation and Transformation, College of Information Science and Engineering, Huaqiao University, Xiamen 361021, China (e-mail: zhangxn_hqu@163.com).

This article has supplementary downloadable material available at https://doi.org/10.1109/JSTARS.2024.3357191, provided by the authors.

Digital Object Identifier 10.1109/JSTARS.2024.3357191

SST is affected by many factors, such as solar radiation, air-sea heat flux, and diurnal winds, which together produce a complex vertical thermal structure that varies over time. Changes in solar radiation affect the energy balance at the sea surface: variations in cloud cover, atmospheric conditions, and the time of day alter the radiation balance and thus SST. A positive heat flux, in which the ocean gains heat from the atmosphere, raises SST, whereas a negative heat flux lowers it. Diurnal winds can affect the vertical mixing of the ocean layers. During the day, solar heating can drive sea breezes that cause mixing and influence SST; at night, cooling processes may dominate. Because thermal radiation and heat flux are irregular and the wind over the sea surface is uncertain, it is difficult to construct effective and reliable mathematical equations that describe the causal relationship between SST and these factors, which makes accurate SST prediction difficult [3], [4].

Currently, SST prediction methods fall into three main categories: physics-based numerical models, data-driven techniques, and hybrids of the two. Numerical models solve dynamical and thermodynamic equations subject to the required initial and boundary conditions. They describe the physical state with partial differential equations and predict future SST after a large number of calculations to obtain numerical solutions [5], [6], [7]. Since no clear physical explanation can be given for the mechanisms of SST generation and evolution, such models are generally inaccurate, relatively complex, and computationally expensive, and they are better suited to large-scale SST prediction at coarse resolution. Data-driven methods start from the characteristics and internal laws of the data: they learn rules by training on a large number of known samples and then make predictions. These methods mainly include statistical approaches [8], genetic algorithms [9], and deep learning [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25]. When a large amount of reliable observation data is available, such methods can obtain satisfactory predictions quickly. Hybrid approaches combine a numerical model with an artificial neural network for SST prediction [4]. Because they involve two models, such methods are the most intricate, consuming the most computing resources and taking the longest time.

Through a literature review, we find that most studies on SST prediction using daily average data focus only on short- and medium-term prediction (lead time $\leq 10$ days), probably because it is difficult to find reliable dependencies in complex long-term future temporal patterns [10], [11], [12], [14], [15], [16], [18], [19], [21], [22], [23]. Studies described as long-term prediction use weekly or monthly means as the temporal granularity, which is too coarse [13], [17], [20], [24], [25]. No long-term prediction study with a prediction horizon $\geq 30$ days based on daily average SST has been reported.

Extending the lead time of SST prediction is a key requirement for practical applications such as marine ecosystem protection and the long-term planning of marine activities. Benefiting from the self-attention mechanism, the transformer has gained a large advantage in modeling dependencies in sequential data, for example, in natural language processing (NLP) [26] and audio processing [27], which motivates its application to SST prediction. However, the prediction task is extremely challenging at long horizons. First, discovering temporal dependencies directly from long time series is unreliable because the dependencies may be masked by entangled temporal patterns. Second, because the cost of self-attention grows quadratically with the sequence length, the canonical transformer is computationally prohibitive and difficult to apply directly to long-term prediction.

Therefore, building on the traditional transformer, this article uses generative decoding, embeds temporal information, and adds attention distilling with partial stacked connection to construct a model named Transformer with temporal embedding, attention Distilling, and Stacked connection in Part (TransDtSt-Part). Five typical sea areas of the China Sea are selected. Using daily average SST under univariate and multivariate prediction patterns, the long-term prediction skill of TransDtSt-Part is comprehensively evaluated by comparing it with multiple baseline models.

The rest of this article is organized as follows. The details of the model proposed in this article are provided in Section II. Section III clarifies the data used in this article and the research area. Section IV evaluates the proposed model via experiments. Finally, Section V concludes this article.

Fig. 1. Classic transformer architecture.

II. Methods

A. Classic Transformer

Like recurrent neural networks (RNNs), the classic transformer [28] is designed to process sequential input data and has been applied to tasks such as translation and text summarization. Its key feature is self-attention, which differentially weights the importance of each part of the input (including the recursive output). Unlike RNNs, transformers process all inputs at once during training, and the self-attention mechanism provides context for any position in the input sequence. Hence, the transformer largely solves the training inefficiency and the insufficient modeling of long-range dependencies that limit RNNs. Note, however, that during inference the transformer decoder still decodes dynamically in an autoregressive manner.

For NLP, the classic transformer first parses the input text into tokens with a byte-pair-encoding tokenizer and converts each token into a vector through word embedding. Positional information is then added to each word embedding.

As shown in Fig. 1, the transformer model uses an encoder and decoder architecture. The encoder consists of encoding layers that iteratively process the input layer by layer, while the decoder consists of decoding layers that perform the same operation on the output of the encoder.

Each encoder layer generates an encoding that captures which parts of the input are related to each other and passes it as input to the next encoder layer. Each decoder layer does the opposite: it takes all the encodings and uses their combined contextual information to generate an output sequence. To achieve this, each encoder and decoder layer uses a multihead scaled dot-product attention mechanism. For each part of the input, attention weighs the relevance of all other parts and extracts information from them to produce the output. Each decoder layer has an additional cross-attention mechanism that incorporates the output of the encoder. Both the encoder and decoder layers have a feedforward neural network for additional processing of the output and contain residual connection and layer normalization steps.

Fig. 2. Structure of TransDtSt-Part.
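The attention step described above can be illustrated with a minimal single-head sketch of scaled dot-product attention, $\mathrm{softmax}(QK^{\top}/\sqrt{d_k})V$, written in plain Python for readability. The function names are illustrative assumptions, not the authors' code; a multihead layer would run several such heads on learned linear projections of the inputs and concatenate their outputs.

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q is a list of query vectors; K and V are equally long lists of key and
    value vectors. Returns (outputs, weights), one row per query.
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Relevance of every key to this query, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output = attention-weighted sum of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights
```

Each row of `weights` sums to 1, so every output is a convex combination of the value vectors; in a decoder's cross-attention, `Q` would come from the decoder and `K`, `V` from the encoder output.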

B. TransDtSt-Part Model

LSTM and its variants, which have been used in NLP, have also been employed to predict SST [23], [24]. Inspired by the great success of the transformer and by the partial similarity between SST prediction and sequence tasks such as machine translation, we make the following improvements, tailored to the characteristics of SST time-series prediction, to construct the TransDtSt-Part model.

1) Generative decoding replaces the recursive output of the traditional transformer, which is inefficient and accumulates errors, to improve prediction skill.
2) Timestamp embedding is added to exploit time-dimension information that has not been used in previous SST prediction works.
3) Attention distilling and partial stacked connection of the encoders are performed to improve prediction accuracy and robustness.

This article designs a deep-learning network for long-term SST prediction; the structure of TransDtSt-Part, with attention distilling and partial stacked connection, is shown in Fig. 2.

From bottom to top, the three improvements correspond to the black dotted boxes marked (1), (2), and (3) in Fig. 2, respectively. They are detailed as follows.

1) Generative Decoding: As shown in Fig. 2, the input $X_{en}$ of the stacked encoder before embedding is the daily average SST over $n$ days, from the historical $(t - n)$th day to the current $t$th day; its length is $seq_{len}$. The start-token strategy has been applied successfully in NLP [26], and we extend it in a generative way. Instead of using a specific flag as the token, we sample a long subsequence of the input: a sequence of length $label_{len}$ taken from the stacked-encoder input before embedding (namely, the daily average SST from the $(t - x)$th day to the $t$th day in Fig. 2, where $x \leq n$, i.e., $label_{len} \leq seq_{len}$). The initial values over the expected prediction horizon [from the $(t + 1)$th day to the $(t + m)$th day, $pred_{len}$ days in total] are $m$ zeros, which are concatenated with this token to form the decoder input before embedding.

In Fig. 2, the SST sequence $X_{token}$ is used as the start token that leads the initial values of the target sequence into the classic transformer decoder. A single forward pass then yields the multistep prediction output (namely, the “outputs” in Fig. 2). In this way, the sharp slowdown of inference that the original “dynamic decoding” suffers in long-horizon prediction is alleviated, and error accumulation is avoided.

Fig. 3. Sample formation instructions.
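The decoder input described above can be sketched as follows, assuming the encoder input is a plain Python list of daily SST values ordered oldest first; the function and parameter names (`build_decoder_input`, `label_len`, `pred_len`) are hypothetical. The last `label_len` values serve as the start token $X_{token}$, and `pred_len` zero placeholders follow, so the decoder can emit all future days in one forward pass instead of one day at a time.

```python
def build_decoder_input(encoder_input, label_len, pred_len):
    """Concatenate a start token with zero placeholders (generative decoding).

    encoder_input: the seq_len most recent daily SST values, oldest first.
    label_len: length of the start token sampled from the end of encoder_input.
    pred_len: number of future days predicted in one forward pass.
    """
    seq_len = len(encoder_input)
    if not 0 < label_len <= seq_len:
        raise ValueError("label_len must satisfy 0 < label_len <= seq_len")
    # X_token: the last label_len observed days (days t-x .. t in Fig. 2).
    x_token = list(encoder_input[-label_len:])
    # Placeholders for days t+1 .. t+pred_len, initialized to zero.
    placeholders = [0.0] * pred_len
    return x_token + placeholders
```

For example, `build_decoder_input([20.1, 20.3, 20.2, 20.6], label_len=2, pred_len=3)` gives `[20.2, 20.6, 0.0, 0.0, 0.0]`; the model's outputs at the last `pred_len` positions would be taken as the prediction.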

As shown in Fig. 3, to form a new sample, the lookback window of SST with length $seq_{len}$ moves forward one step as a whole, and the $pred_{len}$-day prediction window moves forward one step accordingly. Repeating this yields $k$ samples, which form one batch.
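The sliding-window sample formation can be sketched in a few lines, assuming a plain list of daily values and hypothetical helper names. Each step shifts both the lookback window and the prediction window by one day; grouping `k` consecutive samples (for instance, the batch size listed among the fixed hyperparameters) gives one batch.

```python
def make_samples(series, seq_len, pred_len):
    """Slide a seq_len-day lookback window and a pred_len-day target window
    forward one day at a time, returning (input, target) pairs."""
    samples = []
    # The last valid start index leaves room for both windows.
    for start in range(len(series) - seq_len - pred_len + 1):
        x = series[start:start + seq_len]
        y = series[start + seq_len:start + seq_len + pred_len]
        samples.append((x, y))
    return samples

def make_batches(samples, k):
    """Group consecutive samples into batches of k (the last may be shorter)."""
    return [samples[i:i + k] for i in range(0, len(samples), k)]
```

A series of length $N$ yields $N - seq_{len} - pred_{len} + 1$ samples, which is why long horizons leave fewer samples in a fixed-length test period.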
2) Temporal Embedding: Previous deep-learning studies on SST prediction have not exploited temporal-dimension information. Capturing long-range dependence requires global information, for example, hierarchical timestamps (week, month, and year). Such information is rarely exploited in canonical self-attention, so a query-key mismatch between the encoder and decoder may degrade prediction performance.

In this article, before the SST enters the stacked encoder and decoder, we perform three forms of embedding on the input and superimpose them. The three forms of embedding are as follows.

1) Position embedding: As in NLP, long-term SST prediction must handle long inputs, so a parallel input strategy is adopted. Because parallel processing alone carries no information about order, position embedding must be added to preserve the contextual relationship among time-series SST values. We follow the embedding operation in [28], formalized as

$$
\begin{aligned}
PE_{(pos, 2i)} &= \sin\left(pos / 10000^{2i/d_{model}}\right) \\
PE_{(pos, 2i+1)} &= \cos\left(pos / 10000^{2i/d_{model}}\right)
\end{aligned}
\tag{1}
$$

where $pos$ is the position, $i$ is the dimension index, and $d_{model}$ is the dimension of the embedding vector.
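The sinusoidal position embedding above can be computed directly; this is a plain-Python sketch with a hypothetical function name, assuming an even $d_{model}$ so that every sine column has a matching cosine column.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position embedding:
    PE[pos][2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000**(2i / d_model))
    Returns a seq_len x d_model list of lists."""
    if d_model % 2:
        raise ValueError("d_model must be even")
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(d_model // 2):
            angle = pos / (10000 ** (2 * i / d_model))
            pe[pos][2 * i] = math.sin(angle)
            pe[pos][2 * i + 1] = math.cos(angle)
    return pe
```

Because the embedding depends only on position, it can be precomputed once for the longest sequence and added to the other embeddings.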
2) Timestamp embedding: Temporal embedding is performed separately on the timestamps corresponding to the encoder and decoder inputs to provide global context information. The encoding depends on the timestamp interval type; this article uses daily average SST data, so the interval is “day.” With each day indexed from 0 within its week, month, and year, the day is encoded as

$$ indexdayofweek_{encoded} = \frac{2 \times indexdayofweek}{6} - 1 \tag{2} $$

$$ indexdayofmonth_{encoded} = \frac{2 \times indexdayofmonth}{30} - 1 \tag{3} $$

$$ indexdayofyear_{encoded} = \frac{2 \times indexdayofyear}{365} - 1 \tag{4} $$

where $indexdayofweek \in [0, \dots, 6]$, $indexdayofmonth \in [0, \dots, 30]$, and $indexdayofyear \in [0, \dots, 365]$, so each encoded value lies in $[-1, 1]$.

With (2), (3), and (4), each daily timestamp is encoded into three values; the resulting three feature sequences then pass through a linear layer with input dimension 3 and output dimension $d_{model}$ to produce the timestamp embedding.
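A minimal sketch of the timestamp encoding in (2)-(4) and the subsequent linear map, using Python's `datetime`. The day-of-week convention (Monday = 0, as returned by `date.weekday()`) is an assumption, since the text only states that indices start at 0; `linear` stands in for a learned layer with hypothetical weights.

```python
import datetime

def encode_day(date):
    """Encode a calendar date with formulas (2)-(4); each value lies in [-1, 1].

    Assumption: 0-based indices, Monday = 0 for the day of week.
    """
    index_day_of_week = date.weekday()                # 0..6
    index_day_of_month = date.day - 1                 # 0..30
    index_day_of_year = date.timetuple().tm_yday - 1  # 0..365
    return [
        2 * index_day_of_week / 6 - 1,
        2 * index_day_of_month / 30 - 1,
        2 * index_day_of_year / 365 - 1,
    ]

def linear(features, weight, bias):
    """Hypothetical linear layer: weight is d_model x 3, bias has d_model entries."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weight, bias)]
```

For 1 January 2021 (a Friday), `encode_day` gives about `[0.333, -1.0, -1.0]`; applying `linear` to each day's three features yields one $d_{model}$-dimensional timestamp embedding per time step.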
3) Scalar projection: To align the input with the model dimension, the encoder and decoder inputs are each projected by a one-dimensional (1-D) convolution with kernel size 3, stride 1, padding 1, and circular padding mode.
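To show why this projection keeps the sequence length unchanged, here is a plain-Python sketch of a kernel-3, stride-1 convolution with circular padding 1: the neighbors of the first and last time steps wrap around to the other end of the window. The function name and the nested-list weight layout are illustrative assumptions; in PyTorch the same configuration would be `torch.nn.Conv1d(c_in, d_model, kernel_size=3, padding=1, padding_mode="circular")`.

```python
def circular_conv1d(x, weight, bias):
    """1-D convolution, kernel size 3, stride 1, circular padding 1.

    x: list of L time steps, each a list of c_in channel values.
    weight: c_out x 3 x c_in nested lists; bias: c_out values.
    Returns L time steps with c_out values each (length is preserved).
    """
    L = len(x)
    out = []
    for t in range(L):
        row = []
        for w_o, b in zip(weight, bias):
            acc = b
            for k in range(3):
                # Taps at offsets -1, 0, +1; indices wrap around (circular padding).
                src = x[(t + k - 1) % L]
                acc += sum(w * v for w, v in zip(w_o[k], src))
            row.append(acc)
        out.append(row)
    return out
```

With `c_out` set to $d_{model}$, every time step is mapped to a $d_{model}$-dimensional vector that can be summed with the position and timestamp embeddings.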
3) Attention Distilling and Partial Stacked Connection: Long-term SST prediction consumes more computing resources because the time series are long. To improve the prediction ability of the model, and given the redundant combinations of values $V$ in the feature map of the whole encoder, we first apply a distilling operation that downsamples the feature map, greatly reducing the input size and prioritizing the attention scores with dominant features. Then, the input length is halved from one encoder to the next to make the distillation more robust, and the number of distilling layers is reduced accordingly so that the outputs of all encoders are aligned. Finally, the outputs of all or some of the encoders are concatenated to form the final feature map.

Fig. 4. Encoder stacking process in TransDtSt-Part network.

The whole process of the encoder stacking is shown in Fig. 4 (take stacking three encoders as an example).

Following the encoder settings of the traditional transformer, the encoder layer [Fig. 5(a)] mainly includes multihead scaled dot-product self-attention, residual connection, layer normalization, feedforward, activation, and dropout. The feedforward part consists mainly of a 1-D convolution (Conv1d) with kernel size 1, padding 0, and stride 1, which acts as a fully connected layer applied at each time step [28], followed by activation and dropout.

As shown in Fig. 5(b), the distilling layer consists mainly of downsampling components: a 1-D convolution (kernel size 3, padding 2, stride 1, circular padding mode), batch normalization, ELU activation [29], and max pooling. In particular, the max pooling uses stride $= 2$ to halve the sequence, which realizes self-attention “distilling.”
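The effect of one distilling layer on the sequence length can be checked with the standard output-length formula $\lfloor (L + 2p - k)/s \rfloor + 1$. The text gives the convolution settings but not the max-pooling kernel size or padding, so kernel 3 with padding 1 is assumed here; the helper names are hypothetical, and batch normalization is omitted because, like ELU, it does not change the length.

```python
import math

def conv1d_out_len(length, kernel=3, padding=2, stride=1):
    # Standard output-length formula for a 1-D convolution or pooling window.
    return (length + 2 * padding - kernel) // stride + 1

def max_pool1d(x, kernel=3, stride=2, padding=1):
    """1-D max pooling; padded positions are -inf so they never win."""
    padded = [-math.inf] * padding + list(x) + [-math.inf] * padding
    return [max(padded[s:s + kernel]) for s in range(0, len(padded) - kernel + 1, stride)]

def elu(v, alpha=1.0):
    # ELU activation: identity for v > 0, alpha * (exp(v) - 1) otherwise.
    return v if v > 0 else alpha * (math.exp(v) - 1.0)

def distilled_length(length, conv_padding=2, pool_kernel=3, pool_padding=1):
    """Length after one distilling layer (conv -> BN -> ELU -> max pool).
    BN and ELU act elementwise along time, so only conv and pool change it."""
    conv_len = conv1d_out_len(length, kernel=3, padding=conv_padding, stride=1)
    return conv1d_out_len(conv_len, kernel=pool_kernel, padding=pool_padding, stride=2)
```

Under these assumptions, a 96-step input becomes 98 steps after the padding-2 convolution and 49 after pooling, i.e., $L/2 + 1$ for even $L$; with convolution padding 1 the length would be exactly $L/2$.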

Note that we investigated the effect of the encoder stacking connection length on SST prediction performance. We found that a longer stack, probably because it receives more long-term information, is more sensitive to the input, so connecting all encoders yields worse predictions than connecting only some of them (for details, see the prediction indicators of the models TransDtSt-All and TransDtSt-Part in Section A, Ablation Study, of the Supplementary Material). Therefore, TransDtSt-Part adopts the most robust strategy, joining Encoder I and Encoder III of Fig. 4.
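The length bookkeeping behind the partial stacked connection can be sketched under stated assumptions: encoder $i$ (0-based, so 0 is Encoder I) sees the last $seq_{len}/2^i$ input steps and has $N - 1 - i$ distilling layers, each assumed to halve the length exactly, so every encoder emits the same length; the selected outputs are assumed to be concatenated along the time axis. All names are hypothetical, and real lengths can differ by one step depending on padding.

```python
def stacked_encoder_lengths(seq_len, num_encoders=3, joined=(0, 2)):
    """Idealized lengths in a stack of encoders with exact halving.

    Encoder i sees the last seq_len // 2**i steps and applies
    num_encoders - 1 - i distilling layers, so all outputs align.
    Outputs of the encoders listed in `joined` are concatenated along time.
    """
    stages = []
    for i in range(num_encoders):
        in_len = seq_len // 2 ** i
        n_distill = num_encoders - 1 - i
        out_len = in_len // 2 ** n_distill
        stages.append({"input": in_len, "distilling_layers": n_distill, "output": out_len})
    concat_len = sum(stages[i]["output"] for i in joined)
    return stages, concat_len

def join_outputs(outputs, joined=(0, 2)):
    """Concatenate the selected encoder outputs (lists of time steps) along time."""
    result = []
    for i in joined:
        result.extend(outputs[i])
    return result
```

For $seq_{len} = 96$ and three encoders, each encoder outputs 24 steps; joining Encoders I and III gives a 48-step feature map, versus 72 steps when all three are connected.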

Fig. 5. Encoder layer and distilling layer in stacked encoder. (a) Encoder layer. (b) Distilling layer.

Fig. 6. Research area.

III. Data and Research Area

A. Data

Similar to many works that employ deep-learning techniques to predict SST [19], [20], [21], [22], the source data are multiyear daily average data from the Optimum Interpolation High-Resolution SST Dataset Version 2 (OISST) provided by the Physical Sciences Laboratory of the National Oceanic and Atmospheric Administration. The dataset covers September 1981 to the present, spans ${89.875}^{\circ}$S to ${89.875}^{\circ}$N and ${0.125}^{\circ}$E to ${359.875}^{\circ}$E, and has a spatial resolution of ${0.25}^{\circ}$ latitude by ${0.25}^{\circ}$ longitude. Models are trained, validated, and tested on OISST as ground truth.

B. Research Area

To compare the effects of our method accurately across different areas of the China Sea, the subareas of interest in the five typical sea areas should be equal in size. Because the Taiwan Strait is a strip-shaped region in which the selectable research area is relatively limited, $5 \times 5$ pixels is essentially the largest square region available there in the OISST dataset. We also consider the performance of our model in nearshore and offshore areas: the subregions of the Taiwan Strait and Bohai Sea include coastal areas, while those in the Yellow Sea, East China Sea, and South China Sea lie in the open sea. As shown by the red boxes in Fig. 6, the longitude and latitude ranges of the five sea areas, from north to south, are Bohai Sea (${119.625}^{\circ} E - {120.625}^{\circ} E, {38.625}^{\circ} N - {39.625}^{\circ} N$), Yellow Sea (${122.125}^{\circ} E - {123.125}^{\circ} E, {35.125}^{\circ} N - {36.125}^{\circ} N$), East China Sea (${124.125}^{\circ} E - {125.125}^{\circ} E, {29.125}^{\circ} N - {30.125}^{\circ} N$), Taiwan Strait (${118.875}^{\circ} E - {119.875}^{\circ} E, {23.625}^{\circ} N - {24.625}^{\circ} N$), and South China Sea (${115.125}^{\circ} E - {116.125}^{\circ} E, {19.125}^{\circ} N - {20.125}^{\circ} N$).
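The subarea boxes can be mapped to grid indices with a short sketch, assuming OISST cell centers lie at $0.125^{\circ} + 0.25^{\circ} j$ E ($j = 0, \dots, 1439$) and $-89.875^{\circ} + 0.25^{\circ} i$ N ($i = 0, \dots, 719$), which matches the coverage and resolution stated in Section III-A; the actual array ordering of a downloaded file should be checked. The function names are hypothetical, and the box coordinates are copied from the text.

```python
def oisst_index(lon_e, lat_n):
    """(lat_idx, lon_idx) of a 0.25-degree OISST cell center.

    Assumes latitude ascends from 89.875 S and longitude from 0.125 E.
    """
    return round((lat_n + 89.875) / 0.25), round((lon_e - 0.125) / 0.25)

# Boxes from the text: (lon_min, lon_max, lat_min, lat_max) in degrees E / N.
SUBAREAS = {
    "Bohai Sea": (119.625, 120.625, 38.625, 39.625),
    "Yellow Sea": (122.125, 123.125, 35.125, 36.125),
    "East China Sea": (124.125, 125.125, 29.125, 30.125),
    "Taiwan Strait": (118.875, 119.875, 23.625, 24.625),
    "South China Sea": (115.125, 116.125, 19.125, 20.125),
}

def box_indices(lon_min, lon_max, lat_min, lat_max):
    """Inclusive row and column index ranges covering a subarea box."""
    lat0, lon0 = oisst_index(lon_min, lat_min)
    lat1, lon1 = oisst_index(lon_max, lat_max)
    return range(lat0, lat1 + 1), range(lon0, lon1 + 1)
```

Each box spans $1^{\circ}$ in both directions, i.e., five cell centers at $0.25^{\circ}$ spacing, which reproduces the $5 \times 5$ pixel regions described above.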

IV. Experiments and Results

A. Dataset Partitioning and Preprocessing Method

We use 40 years of data, from 1982 to 2021, divided in the ratio 7:1:2: 1982-2009 as the training set, 2010-2013 as the validation set for hyperparameter tuning, and 2014-2021 as the test set for evaluating the generalization ability of the model on new data. Mean-variance normalization is fitted on the SST training set and applied in the same way to the validation and test sets.
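The year-based split and the normalization can be sketched as follows, assuming daily records come as `datetime.date` objects with matching values; the helper names are hypothetical. The year counts 28, 4, and 8 reproduce the 7:1:2 ratio, and the mean and (population) standard deviation are computed from the training values only, so no information from the validation or test years leaks into preprocessing.

```python
import datetime  # records are assumed to carry datetime.date objects
import statistics

SPLITS = {"train": (1982, 2009), "val": (2010, 2013), "test": (2014, 2021)}

def split_by_year(dates, values):
    """Assign each daily value to train/val/test by its calendar year."""
    out = {name: [] for name in SPLITS}
    for d, v in zip(dates, values):
        for name, (first, last) in SPLITS.items():
            if first <= d.year <= last:
                out[name].append(v)
    return out

def fit_normalizer(train_values):
    """Mean-variance (z-score) normalization using training statistics only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    return lambda xs: [(x - mean) / std for x in xs]
```

The returned function would be applied unchanged to the validation and test values, and predictions would be mapped back with the same mean and standard deviation before computing metrics.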

B. Baseline Models

Four baseline models are selected for comparison of predictive performance: the RNN-based LSTM [30], the CNN-based TCN [31], the transformer-based Informer [32], and the interpretable time-series prediction model N-BEATS [33].

C. Hardware Platform and Software Environment

The experiments for the proposed model and the baseline models are carried out on the following hardware: an Intel Core i9-9900K CPU, 64 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti GPU with 11 GB of memory. The software environment consists of the Windows 10 operating system, the PyCharm integrated development environment, and the PyTorch (1.10.1) deep-learning framework.

D. Hyperparameters

The hyperparameters to be determined in the prediction of the model TransDtSt-Part are as follows:

Number of encoder layers: $e_{layers} = [3, 4, 5, 6]$.
Number of decoder layers: $d_{layers} = [1, 2]$.
Dimension of all sublayers and embedding layers in the model: $d_{model} = [128, 256, 512]$.
Dimension of the feedforward network layer: $d_{ff} = [512, 1024]$, with $d_{ff} > d_{model}$.
Number of training epochs: $epoch = [10, 15, 20]$.
Early-stopping patience (training uses an early-stopping strategy): $patience = [3, 5]$.
Because TransDtSt-Part has a large number of hyperparameter combinations, the screening workload is considerable. To limit the time and computing cost, for a given lead time, sea area, and prediction pattern (univariate or multivariate), we first fix $epoch = 10$ and $patience = 3$ and use grid search to determine the best values of $e_{layers}$, $d_{layers}$, $d_{model}$, and $d_{ff}$. That is, we build a network for every hyperparameter combination within the screening ranges and fit it on the training set. The combination that yields the best prediction on the validation set is taken as optimal. We then fix the selected $e_{layers}$, $d_{layers}$, $d_{model}$, and $d_{ff}$, try the different epoch/patience combinations, and compare their prediction performance on the validation set to determine the best epoch and patience. Section B in the Supplementary Material shows this determination process and its results, taking the Bohai Sea as an example.
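The two-stage search described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. `validate` is a hypothetical callback standing in for "train the network with this configuration and return its validation loss." `toy_validate` is a made-up stand-in used only to exercise the search. The constraint $d_{ff} > d_{model}$ removes the pair ($d_{model} = 512$, $d_{ff} = 512$). This leaves $4 \times 2 \times 5 = 40$ architecture combinations for the first stage and $3 \times 2 = 6$ epoch/patience combinations for the second.

```python
from itertools import product

# Candidate values from the hyperparameter list above.
ARCH_SPACE = {
    "e_layers": [3, 4, 5, 6],
    "d_layers": [1, 2],
    "d_model": [128, 256, 512],
    "d_ff": [512, 1024],
}
EPOCHS = [10, 15, 20]
PATIENCES = [3, 5]


def architecture_grid(space):
    """Yield every architecture combination that satisfies d_ff > d_model."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        if cfg["d_ff"] > cfg["d_model"]:
            yield cfg


def two_stage_search(validate, space=ARCH_SPACE):
    """Return (best_config, n_stage1, n_stage2); validate(cfg) -> loss, lower is better."""
    # Stage 1: epoch and patience fixed at 10 and 3; grid-search the architecture.
    stage1 = [dict(cfg, epoch=10, patience=3) for cfg in architecture_grid(space)]
    best_arch = min(stage1, key=validate)
    # Stage 2: freeze the best architecture; grid-search epoch/patience.
    stage2 = [dict(best_arch, epoch=e, patience=p)
              for e, p in product(EPOCHS, PATIENCES)]
    return min(stage2, key=validate), len(stage1), len(stage2)


# Hypothetical stand-in for "train, then return validation loss".
def toy_validate(cfg):
    return (abs(cfg["e_layers"] - 4) + abs(cfg["d_layers"] - 1)
            + abs(cfg["d_model"] - 256) / 128
            + (0.0 if cfg["d_ff"] == 1024 else 0.5)
            + abs(cfg["epoch"] - 15) / 10
            + (0.0 if cfg["patience"] == 5 else 0.2))


best, n1, n2 = two_stage_search(toy_validate)
print(n1, n2)  # 40 6
print(best)
```

Searching in two stages costs $40 + 6 = 46$ training runs per setting instead of the $40 \times 6 = 240$ a joint grid would need. The price is that epoch/patience interactions with the architecture are ignored.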

Other hyperparameters are fixed: the activation function is GELU, the initial learning rate is $lr = 0.0001$, the number of attention heads is $n_{heads} = 8$, the batch size is $batchsize = 32$, and the dropout rate is $dropout = 0.05$.

E. Loss Function and Optimizer

We use the mean squared error (MSE) as the loss function for SST prediction. The loss is computed on the decoder output and backpropagated through the whole model. Adam is used as the optimizer.
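To make the loss/optimizer pair concrete, the following framework-free sketch computes the MSE loss and its gradient for a toy linear predictor. It then applies the standard Adam update with the initial learning rate $lr = 0.0001$ listed among the fixed hyperparameters. The linear model, the synthetic data, and Adam's default coefficients ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$) are assumptions for illustration only. A real implementation would use a deep learning framework's built-in MSE loss and Adam optimizer on the full transformer.

```python
import numpy as np


def mse_and_grad(w, x, y):
    """MSE between the linear prediction x @ w and targets y, and d(MSE)/dw."""
    err = x @ w - y
    loss = float(np.mean(err ** 2))
    grad = 2.0 * x.T @ err / err.size
    return loss, grad


class Adam:
    """Standard Adam update with bias-corrected first and second moments."""

    def __init__(self, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimate
        self.v = None  # second-moment (uncentered variance) estimate
        self.t = 0     # step counter used for bias correction

    def step(self, w, grad):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4))              # synthetic inputs
y = x @ np.array([0.5, -1.0, 2.0, 0.3])    # synthetic targets
w = np.zeros(4)
opt = Adam(lr=1e-4)
initial_loss, _ = mse_and_grad(w, x, y)
for _ in range(1000):
    loss, grad = mse_and_grad(w, x, y)
    w = opt.step(w, grad)
final_loss, _ = mse_and_grad(w, x, y)
print(final_loss < initial_loss)  # True
```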

F. Predictive Pattern

In the univariate prediction pattern, the historical data of the grid point at the geographic center of the region of interest are fed to each model, which outputs the multistep prediction for that point at once. In the multivariate prediction pattern, the historical data of all 25 grid points of the region of interest are fed to each model, which outputs the multistep predictions for all points at once.
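The two patterns differ only in the number of input/output channels. The sketch below rests on several assumptions, none of which this section specifies:

- The 25 grid points form a 5 × 5 block stored as a daily SST array of shape (T, 5, 5).
- The center point is the middle cell.
- Samples are cut with a stride-one sliding window.

The helper names are hypothetical.

```python
import numpy as np


def select_pattern(sst, pattern):
    """Turn a (T, 5, 5) SST array into a (T, C) model input.

    univariate:   C = 1, the center grid point only
    multivariate: C = 25, all grid points as separate variables
    """
    t, ny, nx = sst.shape
    if pattern == "univariate":
        return sst[:, ny // 2, nx // 2][:, np.newaxis]
    if pattern == "multivariate":
        return sst.reshape(t, ny * nx)
    raise ValueError(f"unknown pattern: {pattern}")


def sliding_windows(series, seq_len, pred_len):
    """Cut a (T, C) series into inputs (N, seq_len, C) and targets (N, pred_len, C)."""
    n = series.shape[0] - seq_len - pred_len + 1
    x = np.stack([series[i:i + seq_len] for i in range(n)])
    y = np.stack([series[i + seq_len:i + seq_len + pred_len] for i in range(n)])
    return x, y


sst = np.random.default_rng(0).normal(20.0, 3.0, size=(800, 5, 5))  # synthetic data
for pattern in ("univariate", "multivariate"):
    x, y = sliding_windows(select_pattern(sst, pattern), seq_len=360, pred_len=30)
    print(pattern, x.shape, y.shape)
# univariate (411, 360, 1) (411, 30, 1)
# multivariate (411, 360, 25) (411, 30, 25)
```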

G. Metrics

As shown in Fig. 3, the metrics are computed over the entire $pred_{len}$-step prediction of every sample to evaluate the predictive skill.

According to (5) and (6), the root mean square error (RMSE) and the mean absolute error (MAE) between the ground truth and the prediction are used as evaluation indicators. A lower RMSE or MAE indicates a better prediction.

$$
RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (y_{i} - x_{i})^{2}}{n}} \tag{5}
$$

$$
MAE = \frac{\sum_{i = 1}^{n} \left| y_{i} - x_{i} \right|}{n} \tag{6}
$$

where $y_{i}$ $(i = 1, \dots, n)$ is the prediction, $x_{i}$ $(i = 1, \dots, n)$ is the ground truth, and $n$ is the number of samples.
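Equations (5) and (6) translate directly into array code. Following the description of Fig. 3, this sketch pools the errors over every predicted value: all samples, all $pred_{len}$ steps, and, in the multivariate pattern, all grid points. That pooling is an assumption. Averaging per-sample RMSEs instead would give a slightly different number.

```python
import numpy as np


def rmse(pred, truth):
    """Eq. (5): square root of the mean squared error over all predicted values."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(pred, truth):
    """Eq. (6): mean absolute error over all predicted values."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(diff)))


# Worked example: one 2-degree miss among three predictions.
y = [20.0, 21.0, 23.0]   # predictions y_i
x = [20.0, 21.0, 21.0]   # ground truth x_i
print(round(rmse(y, x), 4), round(mae(y, x), 4))  # 1.1547 0.6667

# Arrays of shape (N, pred_len, C) are handled the same way: every value counts once.
pred = np.full((4, 30, 25), 21.0)
truth = np.full((4, 30, 25), 20.0)
print(rmse(pred, truth), mae(pred, truth))  # 1.0 1.0
```

Because RMSE squares the errors, the single large miss in the worked example raises RMSE ($\sqrt{4/3} \approx 1.155$) well above MAE ($2/3$). For uniform errors the two metrics coincide.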

H. Main Results

Table I shows the univariate SST predictive skill for the five sea subregions at lead times $pred_{len} \in \{30, 60, 90, 180, 270, 360\}$ days, and Table II shows the multivariate case. Since SST varies on an annual cycle, $seq_{len} = 360$ days is chosen. Settings with $pred_{len} \leq label_{len} \leq seq_{len}$ and $seq_{len} = label_{len} + pred_{len}$ generally achieve better prediction results. Section C in the Supplementary Material gives the prediction performance for a fixed $seq_{len}$ and different $label_{len}$. Hence, when $pred_{len} \in \{30, 60, 90, 180\}$ days, $seq_{len} = 360$ days and, correspondingly, $label_{len} \in \{330, 300, 270, 180\}$ days. When $pred_{len} \in \{270, 360\}$ days, $seq_{len} = label_{len} = 360$ days.
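The $label_{len}$ values listed above follow a simple rule. Take $label_{len} = seq_{len} - pred_{len}$ whenever that still satisfies $pred_{len} \leq label_{len}$; otherwise fall back to $label_{len} = seq_{len}$. The sketch below encodes this rule. It is our reading of the listed settings, not a rule stated by the authors. As a further assumption borrowed from Informer-style generative decoding, it builds the decoder input from the last $label_{len}$ encoder steps followed by $pred_{len}$ zero placeholders.

```python
import numpy as np

SEQ_LEN = 360  # encoder input length chosen in the text (one annual cycle)


def choose_label_len(pred_len, seq_len=SEQ_LEN):
    """Reproduce the label_len settings listed in the text."""
    label_len = seq_len - pred_len      # prefer seq_len = label_len + pred_len
    if label_len < pred_len:            # ...unless that breaks pred_len <= label_len
        label_len = seq_len
    return label_len


def decoder_input(x_enc, label_len, pred_len):
    """Start token (last label_len encoder steps) followed by pred_len zero placeholders."""
    start_token = x_enc[-label_len:]
    placeholder = np.zeros((pred_len,) + x_enc.shape[1:], dtype=x_enc.dtype)
    return np.concatenate([start_token, placeholder], axis=0)


for pred_len in (30, 60, 90, 180, 270, 360):
    label_len = choose_label_len(pred_len)
    dec = decoder_input(np.ones((SEQ_LEN, 1)), label_len, pred_len)
    print(pred_len, label_len, dec.shape)
# 30 330 (360, 1)
# 60 300 (360, 1)
# 90 270 (360, 1)
# 180 180 (360, 1)
# 270 360 (630, 1)
# 360 360 (720, 1)
```

The output shows why the two longest horizons are special. For $pred_{len} \geq 270$ days, $seq_{len} - pred_{len}$ would drop below $pred_{len}$, so the full 360-day input is reused as the start token.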

In addition, we examine the relationship between the encoder input length and model performance. Tables I and II therefore also include the prediction errors of each model for $seq_{len} \in \{360, 540, 720\}$ days with $label_{len} = pred_{len} = 360$ days.

Fig. 7. Univariate, all sea areas: counts of the prediction improvement-rate intervals of TransDtSt-Part relative to N-BEATS. (a) All lead times. (b) Lead times $\geq 180$ days.

For the baseline models, we use the LSTM, N-BEATS, and TCN implementations of the Darts library and use the Optuna library for hyperparameter screening. Note that N-BEATS performs multivariate prediction by flattening the model input into a 1-D series and reshaping the output into a tensor of the appropriate size. TCN requires $seq_{len}$ to be greater than $pred_{len}$, so it cannot produce predictions for $seq_{len} = pred_{len} = 360$ days. The corresponding cells in Tables I and II are filled with the symbol "$\times$".
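The two baseline-specific constraints can be illustrated with a short sketch. The flattening order is an assumption, since the text does not say how the input was flattened. Here it is time-major: all grid points of day 1, then day 2, and so on. The feasibility check simply encodes the stated TCN requirement $seq_{len} > pred_{len}$, which is why the $seq_{len} = pred_{len} = 360$ cells are empty. The function names are hypothetical.

```python
import numpy as np


def flatten_for_nbeats(x):
    """(seq_len, n_vars) -> 1-D series of length seq_len * n_vars (time-major order)."""
    return x.reshape(-1)


def unflatten_forecast(y_flat, pred_len, n_vars):
    """1-D forecast of length pred_len * n_vars -> (pred_len, n_vars), same ordering."""
    return y_flat.reshape(pred_len, n_vars)


def tcn_feasible(seq_len, pred_len):
    """TCN needs an input window strictly longer than the forecast horizon."""
    return seq_len > pred_len


x = np.arange(360 * 25, dtype=float).reshape(360, 25)  # synthetic multivariate window
flat = flatten_for_nbeats(x)
print(flat.shape)  # (9000,)
for seq_len in (360, 540, 720):
    print(seq_len, tcn_feasible(seq_len, pred_len=360))
# 360 False
# 540 True
# 720 True
```

Whatever order is used for flattening, the output must be reshaped with the same order. Otherwise forecasts would be assigned to the wrong grid points.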

1) Analysis of SST Univariate Prediction Results: The following observations can be drawn from Table I.
1) Across lead times of 30–360 days in every sea area, the prediction error of TransDtSt-Part is largely robust to longer lead times. The model uses generative decoding to predict multiple values in one step, and its error rises only steadily and slowly as the prediction horizon increases.
2) For all sea areas and all lead times, TransDtSt-Part achieves smaller prediction errors than the baseline models, as summarized by the counts of best results (i.e., the Count row). This verifies that TransDtSt-Part provides satisfactory long-term SST predictive skill at a fine-grained daily interval. N-BEATS is the baseline with the closest predictive performance. We therefore group the prediction improvement rates of TransDtSt-Part relative to N-BEATS into three intervals: RMSE and MAE improvement $\leq 5\%$, $5\%$–$10\%$, and $\geq 10\%$. We count these over all lead times in all study subareas (40 counting points in total), as shown in Fig. 7(a).
As Fig. 7(a) shows, about half of the counting points improve by less than $5\%$. If we consider only prediction horizons $\geq 180$ days [Fig. 7(b), 25 counting points in total], the number of points improving by less than $5\%$ drops sharply. The number improving by more than $5\%$ does not change. Hence, the larger improvements over N-BEATS occur at longer lead times. In other words, TransDtSt-Part gains more as the prediction horizon grows.
3) The prediction errors in the open seas of the Yellow Sea, the East China Sea, and the South China Sea are

TABLE I
SST Univariate Predictive Skill of the Models With Different Prediction Horizons in the Five Sea Areas

| Sea area | $pred_{len}$ (days) | TransDtSt-Part RMSE ($^{\circ}C$) | TransDtSt-Part MAE ($^{\circ}C$) | LSTM RMSE ($^{\circ}C$) | LSTM MAE ($^{\circ}C$) | N-BEATS RMSE ($^{\circ}C$) | N-BEATS MAE ($^{\circ}C$) | TCN RMSE ($^{\circ}C$) | TCN MAE ($^{\circ}C$) | Informer RMSE ($^{\circ}C$) | Informer MAE ($^{\circ}C$) |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Bohai Sea | 30 | 0.957 | 0.747 | 3.144 | 2.513 | 1.006 | 0.780 | 4.385 | 3.892 | 1.037 | 0.810 |
| | 60 | 1.052 | 0.820 | 5.386 | 4.218 | 1.087 | 0.857 | 7.799 | 6.897 | 1.155 | 0.917 |
| | 90 | 1.170 | 0.917 | 7.632 | 5.903 | 1.179 | 0.937 | 11.463 | 10.294 | 1.181 | 0.940 |
| | 180 | 1.160 | 0.912 | 12.088 | 9.939 | 1.202 | 0.961 | 16.166 | 14.551 | 1.233 | 0.989 |
| | 270 | 1.123 | 0.884 | 13.373 | 11.315 | 1.379 | 1.104 | 11.484 | 10.329 | 1.195 | 0.950 |
| | 360 | 1.142 | 0.900 | 11.748 | 9.606 | 1.288 | 1.023 | $\times$ | $\times$ | 1.195 | 0.942 |
| | $360^{a}$ | 1.100 | 0.854 | 8.492 | 7.522 | 1.243 | 0.985 | 10.153 | 7.586 | 1.194 | 0.941 |
| | $360^{b}$ | 1.082 | 0.849 | 11.785 | 9.658 | 1.372 | 1.092 | 1.476 | 1.178 | 1.184 | 0.934 |
| Yellow Sea | 30 | 0.876 | 0.658 | 2.522 | 2.028 | 0.880 | 0.663 | 3.384 | 2.977 | 0.950 | 0.714 |
| | 60 | 0.981 | 0.742 | 4.179 | 3.279 | 1.000 | 0.771 | 5.919 | 5.375 | 1.059 | 0.805 |
| | 90 | 1.000 | 0.751 | 5.776 | 4.533 | 1.022 | 0.785 | 8.805 | 7.862 | 1.105 | 0.846 |
| | 180 | 1.055 | 0.802 | 8.996 | 7.381 | 1.154 | 0.896 | 11.846 | 10.617 | 1.152 | 0.882 |
| | 270 | 1.070 | 0.828 | 9.521 | 7.958 | 1.161 | 0.905 | 8.658 | 7.711 | 1.173 | 0.919 |
| | 360 | 1.068 | 0.816 | 9.297 | 7.584 | 1.204 | 0.936 | $\times$ | $\times$ | 1.152 | 0.888 |
| | $360^{a}$ | 1.036 | 0.787 | 6.619 | 5.787 | 1.144 | 0.891 | 9.126 | 8.362 | 1.102 | 0.852 |
| | $360^{b}$ | 1.004 | 0.760 | 9.221 | 7.430 | 1.157 | 0.886 | 1.292 | 0.989 | 1.114 | 0.855 |
| East China Sea | 30 | 0.793 | 0.602 | 2.094 | 1.644 | 0.798 | 0.613 | 3.076 | 2.704 | 0.821 | 0.628 |
| | 60 | 0.868 | 0.665 | 3.496 | 2.708 | 0.880 | 0.687 | 5.229 | 4.692 | 0.885 | 0.684 |
| | 90 | 0.893 | 0.692 | 4.690 | 3.596 | 0.902 | 0.703 | 7.061 | 6.303 | 0.916 | 0.713 |
| | 180 | 0.937 | 0.738 | 7.519 | 6.139 | 1.016 | 0.801 | 9.249 | 8.340 | 0.976 | 0.759 |
| | 270 | 0.930 | 0.722 | 8.146 | 6.889 | 0.946 | 0.744 | 7.097 | 6.387 | 0.957 | 0.749 |
| | 360 | 0.935 | 0.732 | 7.457 | 6.078 | 0.958 | 0.750 | $\times$ | $\times$ | 0.975 | 0.765 |
| | $360^{a}$ | 0.943 | 0.739 | 5.119 | 4.499 | 0.963 | 0.751 | 4.072 | 3.587 | 0.959 | 0.748 |
| | $360^{b}$ | 0.920 | 0.718 | 7.458 | 6.102 | 0.960 | 0.753 | 1.096 | 0.853 | 0.926 | 0.719 |
| Taiwan Strait | 30 | 1.088 | 0.833 | 1.773 | 1.364 | 1.094 | 0.842 | 2.199 | 1.790 | 1.109 | 0.857 |
| | 60 | 1.109 | 0.864 | 2.600 | 2.056 | 1.132 | 0.879 | 3.611 | 3.099 | 1.133 | 0.881 |
| | 90 | 1.138 | 0.882 | 3.285 | 2.558 | 1.146 | 0.884 | 4.597 | 4.061 | 1.152 | 0.889 |
| | 180 | 1.122 | 0.864 | 4.950 | 3.968 | 1.217 | 0.951 | 6.293 | 5.454 | 1.156 | 0.901 |
| | 270 | 1.160 | 0.904 | 5.046 | 4.196 | 1.175 | 0.909 | 4.799 | 4.228 | 1.190 | 0.921 |
| | 360 | 1.135 | 0.881 | 4.716 | 3.772 | 1.203 | 0.934 | $\times$ | $\times$ | 1.159 | 0.897 |
| | $360^{a}$ | 1.154 | 0.896 | 3.422 | 2.914 | 1.237 | 0.955 | 2.257 | 1.666 | 1.181 | 0.903 |
| | $360^{b}$ | 1.161 | 0.906 | 4.888 | 3.931 | 1.212 | 0.939 | 1.302 | 0.997 | 1.200 | 0.922 |
| South China Sea | 30 | 0.754 | 0.599 | 1.394 | 1.101 | 0.773 | 0.610 | 1.543 | 1.235 | 0.764 | 0.611 |
| | 60 | 0.805 | 0.647 | 1.886 | 1.465 | 0.835 | 0.666 | 2.467 | 2.019 | 0.831 | 0.671 |
| | 90 | 0.817 | 0.655 | 2.501 | 1.915 | 0.836 | 0.663 | 3.430 | 2.919 | 0.873 | 0.706 |
| | 180 | 0.862 | 0.696 | 3.376 | 2.751 | 0.907 | 0.719 | 4.408 | 3.985 | 0.911 | 0.741 |
| | 270 | 0.850 | 0.686 | 3.883 | 3.220 | 0.916 | 0.732 | 2.860 | 2.419 | 0.893 | 0.715 |
| | 360 | 0.856 | 0.692 | 3.486 | 2.802 | 0.944 | 0.761 | $\times$ | $\times$ | 0.898 | 0.723 |
| | $360^{a}$ | 0.884 | 0.713 | 2.459 | 2.161 | 0.940 | 0.754 | 1.112 | 0.881 | 0.923 | 0.741 |
| | $360^{b}$ | 0.895 | 0.708 | 3.293 | 2.659 | 0.951 | 0.761 | 1.001 | 0.786 | 0.920 | 0.728 |


TABLE II
SST Multivariate Predictive Skill of the Models With Different Prediction Horizons in the Five Sea Areas

| Sea area | $pred_{len}$ (days) | TransDtSt-Part RMSE ($^{\circ}C$) | TransDtSt-Part MAE ($^{\circ}C$) | LSTM RMSE ($^{\circ}C$) | LSTM MAE ($^{\circ}C$) | N-BEATS RMSE ($^{\circ}C$) | N-BEATS MAE ($^{\circ}C$) | TCN RMSE ($^{\circ}C$) | TCN MAE ($^{\circ}C$) | Informer RMSE ($^{\circ}C$) | Informer MAE ($^{\circ}C$) |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Bohai Sea | 30 | 1.029 | 0.800 | 3.019 | 2.377 | 1.059 | 0.824 | 4.371 | 3.875 | 1.131 | 0.879 |
| | 60 | 1.136 | 0.899 | 5.079 | 3.929 | 1.187 | 0.928 | 8.054 | 7.120 | 1.212 | 0.955 |
| | 90 | 1.224 | 0.968 | 7.056 | 5.532 | 1.263 | 0.997 | 10.459 | 9.136 | 1.273 | 1.005 |
| | 180 | 1.244 | 0.999 | 10.512 | 8.572 | 1.279 | 1.004 | 13.720 | 12.019 | 1.321 | 1.053 |
| | 270 | 1.247 | 0.979 | 11.587 | 9.642 | 1.262 | 0.990 | 9.494 | 8.299 | 1.248 | 0.983 |
| | 360 | 1.162 | 0.910 | 10.951 | 8.945 | 1.345 | 1.069 | $\times$ | $\times$ | 1.258 | 0.997 |
| | $360^{a}$ | 1.184 | 0.932 | 8.398 | 7.440 | 1.419 | 1.138 | 10.320 | 8.885 | 1.240 | 0.974 |
| | $360^{b}$ | 1.165 | 0.921 | 10.265 | 8.574 | 1.357 | 1.069 | 1.463 | 1.163 | 1.198 | 0.941 |
| Yellow Sea | 30 | 0.889 | 0.672 | 2.443 | 1.914 | 0.929 | 0.712 | 3.699 | 3.290 | 1.039 | 0.799 |
| | 60 | 1.018 | 0.785 | 4.240 | 3.221 | 1.027 | 0.794 | 6.644 | 5.966 | 1.079 | 0.832 |
| | 90 | 1.031 | 0.772 | 5.921 | 4.321 | 1.060 | 0.816 | 8.633 | 7.640 | 1.130 | 0.866 |
| | 180 | 1.187 | 0.947 | 8.728 | 6.951 | 1.269 | 0.969 | 11.622 | 10.182 | 1.252 | 0.994 |
| | 270 | 1.112 | 0.866 | 8.480 | 6.745 | 1.201 | 0.925 | 9.028 | 8.055 | 1.137 | 0.885 |
| | 360 | 1.026 | 0.883 | 8.619 | 6.967 | 1.263 | 0.989 | $\times$ | $\times$ | 1.148 | 0.890 |
| | $360^{a}$ | 1.063 | 0.826 | 6.661 | 5.861 | 1.279 | 0.995 | 9.721 | 8.681 | 1.159 | 0.907 |
| | $360^{b}$ | 1.049 | 0.807 | 8.844 | 7.245 | 1.273 | 0.972 | 1.296 | 0.997 | 1.171 | 0.919 |
| East China Sea | 30 | 0.847 | 0.644 | 2.253 | 1.765 | 0.848 | 0.655 | 2.811 | 2.425 | 0.889 | 0.688 |
| | 60 | 0.903 | 0.691 | 3.553 | 2.653 | 0.953 | 0.738 | 5.017 | 4.479 | 0.918 | 0.716 |
| | 90 | 0.923 | 0.713 | 5.156 | 3.953 | 0.952 | 0.739 | 6.839 | 6.095 | 0.961 | 0.751 |
| | 180 | 0.946 | 0.741 | 6.702 | 5.571 | 0.971 | 0.758 | 9.741 | 8.692 | 0.992 | 0.772 |
| | 270 | 0.944 | 0.738 | 7.535 | 6.324 | 0.959 | 0.754 | 6.866 | 6.178 | 1.000 | 0.780 |
| | 360 | 0.972 | 0.758 | 7.061 | 5.750 | 1.000 | 0.783 | $\times$ | $\times$ | 0.980 | 0.771 |
| | $360^{a}$ | 0.975 | 0.753 | 5.075 | 4.468 | 0.997 | 0.784 | 8.267 | 7.303 | 0.994 | 0.779 |
| | $360^{b}$ | 0.963 | 0.746 | 6.598 | 5.423 | 1.087 | 0.862 | 1.139 | 0.891 | 0.975 | 0.760 |
| Taiwan Strait | 30 | 1.112 | 0.850 | 2.048 | 1.579 | 1.201 | 0.919 | 2.452 | 2.031 | 1.122 | 0.863 |
| | 60 | 1.168 | 0.901 | 2.587 | 2.126 | 1.193 | 0.919 | 3.767 | 3.210 | 1.176 | 0.915 |
| | 90 | 1.191 | 0.912 | 2.971 | 2.065 | 1.202 | 0.930 | 5.290 | 4.572 | 1.192 | 0.919 |
| | 180 | 1.166 | 0.895 | 4.869 | 3.786 | 1.169 | 0.901 | 6.975 | 6.073 | 1.283 | 0.987 |
| | 270 | 1.175 | 0.906 | 4.625 | 3.691 | 1.183 | 0.911 | 4.867 | 4.217 | 1.237 | 0.946 |
| | 360 | 1.162 | 0.900 | 5.272 | 4.205 | 1.203 | 0.925 | $\times$ | $\times$ | 1.186 | 0.910 |
| | $360^{a}$ | 1.248 | 0.940 | 3.674 | 3.121 | 1.253 | 0.974 | 6.695 | 5.869 | 1.240 | 0.948 |
| | $360^{b}$ | 1.190 | 0.915 | 4.671 | 3.764 | 1.296 | 1.004 | 1.386 | 1.043 | 1.283 | 0.988 |
| South China Sea | 30 | 0.759 | 0.597 | 1.367 | 1.110 | 0.767 | 0.599 | 1.599 | 1.305 | 0.832 | 0.669 |
| | 60 | 0.809 | 0.651 | 1.913 | 1.446 | 0.826 | 0.658 | 2.465 | 2.049 | 0.840 | 0.676 |
| | 90 | 0.829 | 0.669 | 2.367 | 1.803 | 0.850 | 0.676 | 3.044 | 2.571 | 0.875 | 0.702 |
| | 180 | 0.861 | 0.691 | 3.323 | 2.717 | 0.877 | 0.695 | 3.510 | 3.060 | 0.915 | 0.735 |
| | 270 | 0.867 | 0.698 | 3.643 | 3.037 | 0.897 | 0.717 | 3.168 | 2.706 | 0.887 | 0.714 |
| | 360 | 0.877 | 0.702 | 3.012 | 2.447 | 0.927 | 0.738 | $\times$ | $\times$ | 0.894 | 0.720 |
| | $360^{a}$ | 0.847 | 0.680 | 2.435 | 2.152 | 0.955 | 0.764 | 2.781 | 2.473 | 0.918 | 0.738 |
| | $360^{b}$ | 0.891 | 0.705 | 3.443 | 2.777 | 0.941 | 0.750 | 0.968 | 0.763 | 0.930 | 0.741 |
| Count | | 40 | 40 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


Fig. 8. Univariate, each sea area, all lead times: counts of the prediction improvement-rate intervals of TransDtSt-Part relative to N-BEATS. (a) Yellow Sea. (b) East China Sea. (c) South China Sea. (d) Bohai Sea. (e) Taiwan Strait.
relatively small (RMSE $\sim 0.75^{\circ}C$–$1^{\circ}C$, MAE $\sim 0.6^{\circ}C$–$0.8^{\circ}C$), while those in the Bohai Sea and the Taiwan Strait are relatively large (RMSE $\sim 0.95^{\circ}C$–$1.2^{\circ}C$, MAE $\sim 0.75^{\circ}C$–$0.9^{\circ}C$). Predictions in the open seas are better than those in the other areas. This may be because SST fluctuates more in coastal waters, whereas SST in the open ocean is relatively stable [17], [21], [22], [24].
Fig. 8 breaks these counts down by sea area. For each sea area, it shows the number of improvement rates of TransDtSt-Part relative to N-BEATS that fall into each interval over all lead times. In the Yellow Sea and the Bohai Sea, the three intervals contain roughly equal counts. In the East China Sea, the South China Sea, and the Taiwan Strait, the improvement rates are concentrated below $10\%$.
4) TransDtSt-Part is markedly more accurate than LSTM in every sea area and at every lead time. This may be because LSTM uses autoregressive decoding, so its prediction error accumulates as the lead time increases. Since the period of the SST signal is about 360 days, the TCN prediction error is smaller when $seq_{len}$ is an integer multiple of 360 days (i.e., $seq_{len} = 720$ days) than in any other setting. The predictive skill of TransDtSt-Part, which uses full attention, is better than that of Informer, which uses only part of the attention coefficients.
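The improvement-rate intervals behind Figs. 7 and 8 can be reproduced with a few lines of code. This section does not define the improvement rate or the interval boundaries precisely. The sketch assumes the rate is the relative error reduction $(E_{\text{N-BEATS}} - E_{\text{TransDtSt-Part}})/E_{\text{N-BEATS}}$, with the boundaries assigned as written in the text ($\leq 5\%$, $\geq 10\%$). The totals quoted above are consistent with one counting point per table row and metric: 5 sea areas × 8 rows = 40, and 5 sea areas × 5 rows with lead times $\geq 180$ days = 25. The example uses the univariate Bohai Sea RMSE values from Table I.

```python
def improvement_rate(err_model, err_ref):
    """Relative error reduction of a model with respect to a reference model."""
    return (err_ref - err_model) / err_ref


def interval(rate):
    """Map an improvement rate to one of the three intervals of Figs. 7-10."""
    if rate <= 0.05:
        return "<=5%"
    if rate >= 0.10:
        return ">=10%"
    return "5-10%"


def count_intervals(errs_model, errs_ref):
    """Count how many paired errors fall into each improvement-rate interval."""
    counts = {"<=5%": 0, "5-10%": 0, ">=10%": 0}
    for e_model, e_ref in zip(errs_model, errs_ref):
        counts[interval(improvement_rate(e_model, e_ref))] += 1
    return counts


# Bohai Sea, univariate RMSE (degC) from Table I;
# rows: 30, 60, 90, 180, 270, 360, 360^a, 360^b
transdtst_part = [0.957, 1.052, 1.170, 1.160, 1.123, 1.142, 1.100, 1.082]
n_beats = [1.006, 1.087, 1.179, 1.202, 1.379, 1.288, 1.243, 1.372]

print(count_intervals(transdtst_part, n_beats))
# {'<=5%': 4, '5-10%': 0, '>=10%': 4}
print(count_intervals(transdtst_part[3:], n_beats[3:]))  # lead times >= 180 days
# {'<=5%': 1, '5-10%': 0, '>=10%': 4}
```

On this single column, the pattern matches the one described for Fig. 7. Restricting to lead times $\geq 180$ days removes most of the sub-5% cases and leaves the large improvements untouched.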
2) Analysis of SST Multivariate Prediction Results: As with the univariate results, the multivariate results (Table II) support the following conclusions.

1) As the lead time becomes longer, the performance of TransDtSt-Part again shows a steady and slow rise.
2) In all study areas, the prediction error of TransDtSt-Part is smaller than that of the other baseline models at every prediction horizon.
Fig. 9 shows that the improvement of TransDtSt-Part over N-BEATS is larger at longer lead times.

Fig. 9. Multivariate pattern, all sea areas: number of prediction improvement rates of TransDtSt-Part relative to N-BEATS in each interval. (a) All lead times. (b) Lead times $\geq 180$ days.

Fig. 10. Multivariate pattern, each sea area, all lead times: number of prediction improvement rates of TransDtSt-Part relative to N-BEATS in each interval. (a) Yellow Sea. (b) East China Sea. (c) South China Sea. (d) Bohai Sea. (e) Taiwan Strait.
3) The prediction errors in the open seas of the Yellow Sea, the East China Sea, and the South China Sea are relatively small (RMSE $\sim$ 0.75–1.20 $^{\circ}$C, MAE $\sim$ 0.60–0.95 $^{\circ}$C), while those in the Bohai Sea and the Taiwan Strait are relatively large (RMSE $\sim$ 1–1.25 $^{\circ}$C, MAE $\sim$ 0.80–1.0 $^{\circ}$C).
Fig. 10 shows, for all prediction horizons in each study area, how many of the improvement rates of TransDtSt-Part relative to N-BEATS fall into each interval. In the Yellow Sea, the three intervals contain roughly equal counts; in the Bohai Sea, the improvement rates cluster at the two ends ($\leq 5\%$ and $\geq 10\%$); and in the East China Sea, South China Sea, and Taiwan Strait, they are concentrated below 10%.
3) Relationship Between Encoder Input Length and Model Performance: Tables I and II show that, for seq$_{len} \in \{360, 540, 720\}$ days, label$_{len} = 360$ days, and pred$_{len} = 360$ days, the effect of the encoder input length on TransDtSt-Part is area-specific. Specifically, in the univariate case, the prediction errors in the Bohai Sea and the Yellow Sea decrease as seq$_{len}$ increases, while those in the East China Sea, the Taiwan Strait, and the South China Sea increase. In the multivariate case, the prediction error of the East China Sea

TABLE III
Seasonal Prediction Error: Univariate Prediction Showcase of Bohai Sea

Model	RMSE ($^{\circ}C$)				MAE ($^{\circ}C$)
	Winter	Spring	Summer	Autumn	Winter	Spring	Summer	Autumn
TransDtSt-Part	0.857	0.823	0.816	0.865	0.711	0.673	0.658	0.705
LSTM	4.319	3.257	3.261	3.206	3.857	2.762	2.678	2.706
N-BEATS	0.902	0.872	0.826	0.897	0.735	0.721	0.666	0.722
TCN	3.195	2.845	3.454	3.048	2.735	2.431	3.109	2.556
Informer	0.901	0.804	0.843	0.925	0.743	0.654	0.676	0.747

Fig. 11. Under the univariate pattern, the predictions (pred$_{len} = 360$ days with seq$_{len} = 720$ days) of (a) TransDtSt-Part, (b) LSTM, (c) N-BEATS, (d) TCN, and (e) Informer on the Bohai Sea. The orange/blue curves represent slices of the prediction/ground truth.

Fig. 12. Under the univariate pattern, the correlations between the ground truth and the predictions (pred$_{len} = 360$ days with seq$_{len} = 720$ days) of (a) TransDtSt-Part, (b) LSTM, (c) N-BEATS, (d) TCN, and (e) Informer on the Bohai Sea. The dark red dashed line is the 1:1 line.
decreases as seq$_{len}$ increases. The prediction errors in the Bohai Sea, the Yellow Sea, and the Taiwan Strait first increase and then decrease, whereas the South China Sea shows the opposite pattern, first decreasing and then increasing.
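To make the roles of seq$_{len}$, label$_{len}$, and pred$_{len}$ concrete, the sketch below slices one training sample from a daily series. It assumes the Informer-style generative-decoding layout [32]: the encoder sees seq$_{len}$ days, the decoder receives the last label$_{len}$ of those days as a start token followed by pred$_{len}$ zero placeholders, and all pred$_{len}$ outputs are produced in one forward pass. Whether TransDtSt-Part uses exactly these placeholders, and how it attaches covariates and time features, is not shown here; `make_window` and `num_windows` are hypothetical helpers.

```python
import numpy as np

def make_window(series, start, seq_len, label_len, pred_len):
    """Slice one (encoder input, decoder input, target) sample from a 1-D daily series."""
    enc_end = start + seq_len
    x_enc = series[start:enc_end]
    # Decoder start token: the last label_len days seen by the encoder.
    start_token = series[enc_end - label_len:enc_end]
    # Zero placeholders for the pred_len days generated in a single pass.
    x_dec = np.concatenate([start_token, np.zeros(pred_len)])
    y = series[enc_end:enc_end + pred_len]
    return x_enc, x_dec, y

def num_windows(n, seq_len, pred_len, stride=1):
    """Number of sliding windows that fit in a record of n days."""
    return max(0, (n - seq_len - pred_len) // stride + 1)

s = np.arange(2000, dtype=float)  # stand-in for a daily SST record
x_enc, x_dec, y = make_window(s, 0, seq_len=720, label_len=360, pred_len=360)
print(x_enc.shape, x_dec.shape, y.shape, num_windows(2000, 720, 360))
```

For a record of fixed length, each additional day of seq$_{len}$ removes one sliding window (with stride 1), which is one practical trade-off when lengthening the encoder input.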
4) Univariate Prediction Showcase: Fig. 11 shows the prediction slices of TransDtSt-Part and the baseline models in the Bohai Sea with seq$_{len} = 720$ days, label$_{len} = 360$ days, and pred$_{len} = 360$ days. Fig. 12 shows the correlation between each model's predictions and the ground truth.

It can be seen from Fig. 11(b) that, because of autoregressive decoding, the LSTM prediction error keeps accumulating and the prediction curve deviates severely from the ground truth. From Figs. 11(c), (d) and 12(c), (d), N-BEATS and TCN have larger prediction errors at some troughs (2–5 $^{\circ}$C) and peaks (22–25 $^{\circ}$C). Although Informer and TransDtSt-Part perform relatively poorly in the low-value part of SST [Figs. 11(a), (e) and 12(a), (e)], they agree more closely with the ground truth in the high-value part. Compared with Informer, TransDtSt-Part, which considers all attention coefficients, has better overall predictive skill.
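The contrast between full attention and Informer's partial attention can be sketched in NumPy. `full_attention` is standard scaled dot-product attention, in which every query uses all of its attention coefficients. `probsparse_attention` is a simplified version of Informer's ProbSparse self-attention [32]: each query is scored by the max-minus-mean of its attention scores, only the top $u$ queries (Informer uses $u = c \ln L_Q$) receive full attention, and the remaining queries output the mean of $V$. Unlike Informer, this sketch scores queries against all keys instead of a random sample of keys, so it shows which coefficients are kept, not the efficiency gain; it also uses a single head, no masking, and no learned projections.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Every query attends to every key: all L_Q x L_K coefficients are used.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def probsparse_attention(Q, K, V, u):
    # Simplified ProbSparse: only the u most "active" queries get full attention.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    sparsity = scores.max(axis=1) - scores.mean(axis=1)  # max-minus-mean measure
    u = max(1, min(u, Q.shape[0]))
    top = np.argsort(sparsity)[-u:]
    # Lazy queries fall back to the mean of V.
    out = np.tile(V.mean(axis=0), (Q.shape[0], 1))
    out[top] = softmax(scores[top]) @ V
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
out = probsparse_attention(Q, K, V, u=2)
```

With $u$ equal to the number of queries the two functions coincide; with smaller $u$, most rows collapse to the same mean vector. Keeping full attention, as the text says TransDtSt-Part does, avoids this approximation.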
5) Multivariate Prediction Showcase: Taking the South China Sea as an example, when pred$_{len}$ is 270 days, Fig. 13 shows slices of the last dimension of the prediction results

TABLE IV
Seasonal Prediction Error: Multivariate Prediction Showcase of South China Sea

Model	RMSE ($^{\circ}C$)			MAE ($^{\circ}C$)
	Winter	Spring	Summer	Winter	Spring	Summer
TransDtSt-Part	1.069	1.045	1.072	1.084	0.842	0.833
LSTM	11.938	10.977	12.439	11.009	9.791	9.413
N-BEATS	1.357	1.344	1.342	1.380	1.091	1.073
TCN	1.482	1.471	1.452	1.446	1.206	1.198
Informer	1.220	1.139	1.155	1.155	0.974	0.914

Fig. 13. Under the multivariate pattern, the predictions (pred$_{len} = 270$ days with seq$_{len} = 360$ days) of (a) TransDtSt-Part, (b) LSTM, (c) N-BEATS, (d) TCN, and (e) Informer on the South China Sea. The orange/blue curves represent slices of the prediction/ground truth.

Fig. 14. Under the multivariate pattern, the correlations between the ground truth and the predictions (pred$_{len} = 270$ days with seq$_{len} = 360$ days) of (a) TransDtSt-Part, (b) LSTM, (c) N-BEATS, (d) TCN, and (e) Informer on the South China Sea. The dark red dashed line is the 1:1 line.
and the corresponding ground truth, and Fig. 14 shows the correlation between the predictions and the ground truth in this case.

LSTM performs even worse in Figs. 13(b) and 14(b): it fails to produce meaningful predictions, and its output ends up as a nearly straight line. TCN also fails to capture the long-range dependencies between inputs and outputs for long SST sequences [Figs. 13(d) and 14(d)]. From Figs. 13(c) and 14(c), N-BEATS overestimates the lower SST (22–24 $^{\circ}$C) and underestimates the higher SST (28–30.5 $^{\circ}$C) in this case. In Fig. 13(a) and (e), Informer and TransDtSt-Part still accurately capture the long-term trend of SST. Overall, Informer's prediction fluctuates more, whereas the prediction curve of TransDtSt-Part is smoother. Both Informer and TransDtSt-Part have large prediction errors in the 24–28 $^{\circ}$C range [Fig. 14(a) and (e)].
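The overestimation of low SST and underestimation of high SST described for N-BEATS shows up in a scatter plot against the 1:1 line as a fitted slope below 1, even when the correlation is high. The sketch below computes the Pearson correlation and a least-squares line of prediction on ground truth; `scatter_stats` is a hypothetical helper, and the example data are synthetic, constructed to overestimate below 26 °C and underestimate above it.

```python
import numpy as np

def scatter_stats(y_true, y_pred):
    """Pearson r and least-squares slope/intercept of prediction vs. ground truth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    slope, intercept = np.polyfit(y_true, y_pred, 1)  # degree-1 fit
    return r, float(slope), float(intercept)

# Synthetic "compressed" prediction: pulled toward 26 deg C from both sides.
truth = np.linspace(22.0, 30.5, 50)
pred = 26.0 + 0.8 * (truth - 26.0)
r, slope, intercept = scatter_stats(truth, pred)
print(r, slope, intercept)
```

Here $r \approx 1$ although every point away from 26 °C lies off the 1:1 line, so correlation alone does not measure agreement with that line; the slope and intercept (ideally 1 and 0) do.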
6) Seasonal Prediction Error Analysis: For all SST prediction slices in the univariate prediction showcase (i.e., Bohai Sea with seq$_{len} = 720$ days, label$_{len} = 360$ days, and pred$_{len} = 360$ days) and the multivariate prediction showcase (i.e., South China Sea with seq$_{len} = 360$ days, label$_{len} = 360$ days, and pred$_{len} = 270$ days), the average predictive skills of TransDtSt-Part and the baseline models are calculated separately for the four seasons, Winter (Jan.–Mar.), Spring (Apr.–Jun.), Summer (Jul.–Sep.), and Autumn (Oct.–Dec.), as shown in Tables III and IV.
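The seasonal averaging can be sketched as follows, assuming each predicted day is assigned to a season by its calendar month using the Winter/Spring/Summer/Autumn split above. This version pools all days of a season across slices before computing RMSE and MAE; averaging per-slice metrics instead, which "average predictive skills" may mean, gives slightly different RMSE values. `seasonal_errors` and the toy inputs are hypothetical.

```python
import numpy as np

# Season-to-month mapping as defined in the text.
SEASONS = {"Winter": (1, 2, 3), "Spring": (4, 5, 6),
           "Summer": (7, 8, 9), "Autumn": (10, 11, 12)}

def seasonal_errors(months, y_true, y_pred):
    """Pooled (RMSE, MAE) per season; seasons with no predicted days are skipped."""
    months = np.asarray(months)
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    out = {}
    for name, ms in SEASONS.items():
        mask = np.isin(months, ms)
        if not mask.any():
            continue
        e = err[mask]
        out[name] = (float(np.sqrt(np.mean(e ** 2))), float(np.mean(np.abs(e))))
    return out

# Toy example: calendar month of each predicted day, truth, and prediction.
res = seasonal_errors([1, 1, 4, 4, 7, 7], np.zeros(6), [1.0, -1.0, 2.0, 2.0, 0.0, 0.0])
print(res)
```

Seasons without predicted days are omitted; Table IV, for the 270-day horizon, likewise reports Winter, Spring, and Summer only.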

Tables III and IV show that TransDtSt-Part has the best predictive skill in every season, except for Spring in the univariate Bohai Sea case, where it is slightly inferior to Informer. This demonstrates the strong seasonal SST prediction ability of TransDtSt-Part.

V. CONCLUSION

This article focuses on the long-term prediction of SST in the China Sea at a fine-grained daily resolution. The transformer has powerful time-series modeling capabilities, but it also has some disadvantages, such as high computational complexity, inefficient autoregressive decoding, and a tendency to accumulate errors. We made targeted improvements to build TransDtSt-Part: using generative decoding, embedding time-dimensional information, and introducing attention distilling and partial stacked connection. In extensive experiments covering two prediction patterns and multiple lead times in five China Sea regions, TransDtSt-Part outperforms all competitive baseline models to varying degrees, demonstrating its strong long-term predictive skill for SST. It may help meet many pressing long-term requirements in marine and climate applications.
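Attention distilling, as introduced in Informer [32], halves the sequence length between encoder layers with a convolution, an ELU activation [29], and stride-2 max pooling. The NumPy sketch below assumes TransDtSt-Part follows that design (kernel size 3 with circular padding; max pooling with kernel 3, stride 2, padding 1). It omits the batch normalization used in Informer's reference implementation and is forward-only with fixed random weights; `circular_conv1d`, `max_pool1d`, and `distilling_layer` are hypothetical names.

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU: x for x > 0, alpha * (exp(x) - 1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def circular_conv1d(x, w, b):
    # x: (L, d_in), w: (3, d_in, d_out), b: (d_out,); kernel 3, circular padding 1.
    L = x.shape[0]
    xp = np.concatenate([x[-1:], x, x[:1]], axis=0)
    return sum(xp[k:k + L] @ w[k] for k in range(3)) + b

def max_pool1d(x, kernel=3, stride=2, pad=1):
    # Max pooling over time; padded positions are -inf so they never win.
    L = x.shape[0]
    xp = np.pad(x, ((pad, pad), (0, 0)), constant_values=-np.inf)
    out_len = (L + 2 * pad - kernel) // stride + 1
    return np.stack([xp[i * stride:i * stride + kernel].max(axis=0) for i in range(out_len)])

def distilling_layer(x, w, b):
    # Conv -> ELU -> MaxPool: roughly halves the time dimension.
    return max_pool1d(elu(circular_conv1d(x, w, b)))

rng = np.random.default_rng(1)
w = 0.1 * rng.standard_normal((3, 4, 4))
b = np.zeros(4)
print(distilling_layer(rng.standard_normal((720, 4)), w, b).shape)
```

A 720-step encoder input becomes 360 steps after one distilling layer, which reduces the memory and computation of the attention layers that follow.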

ACKNOWLEDGMENT

We thank NOAA/OAR/ESRL PSL, Boulder, Colorado, USA, for providing the high-resolution SST data. All daily mean SST data used in this article were obtained from NOAA/OAR/ESRL PSL at https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.highres.html.

We would also like to thank the developers of the third-party Python libraries Darts and Optuna.

References

[1] M. Bouali, O. T. Sato, and P. S. Polito, “Temporal trends in sea surface temperature gradients in the South Atlantic Ocean,” Remote Sens. Environ., vol. 194, pp. 100-114, Jun. 2017, doi: 10.1016/j.rse.2017.03.008.
[2] T. D. Herbert, L. C. Peterson, K. T. Lawrence, and Z. H. Liu, “Tropical ocean temperatures over the past 3.5 million years,” Science, vol. 328, no. 5985, pp. 1530-1534, Jun. 2010, doi: 10.1126/science.1185435.
[3] K. Patil, M. C. Deo, S. Ghosh, and M. Ravichandran, “Predicting sea surface temperatures in the North Indian Ocean with nonlinear autoregressive neural networks,” Int. J. Oceanogr., vol. 2013, Apr. 2013, Art. no. 302479, doi: 10.1155/2013/302479.
[4] K. Patil, M. C. Deo, and M. Ravichandran, “Prediction of sea surface temperature by combining numerical and neural techniques,” J. Atmos. Ocean. Technol., vol. 33, no. 8, pp. 1715-1726, Aug. 2016, doi: 10.1175/JTECH-D-15-0213.1.
[5] T. N. Krishnamurti, A. Chakraborty, R. Krishnamurti, W. K. Dewar, and C. A. Clayson, “Seasonal prediction of sea surface temperature anomalies using a suite of 13 coupled atmosphere-ocean models,” J. Climate, vol. 19, pp. 6069-6088, Dec. 2006, doi: 10.1175/JCLI3938.1.
[6] T. N. Stockdale, M. A. Balmaseda, and A. Vidard, “Tropical Atlantic SST prediction with coupled ocean-atmosphere GCMs,” J. Climate, vol. 19, no. 23, pp. 6047-6061, Dec. 2006, doi: 10.1175/JCLI3947.1.
[7] Y. F. Wang, Z. H. Zhang, and P. Huang, “An improved model-based analogue forecasting for the prediction of the tropical Indo-Pacific sea surface temperature in a coupled climate model,” Int. J. Climatol., vol. 40, no. 15, pp. 6346-6360, Dec. 2020, doi: 10.1002/joc.6584.
[8] J. S. Kug, I. S. Kang, J. Y. Lee, and J. G. Jhun, “A statistical approach to Indian Ocean sea surface temperature prediction using a dynamical ENSO prediction,” Geophys. Res. Lett., vol. 31, no. 9, May 2004, Art. no. L09212, doi: 10.1029/2003GL019209.
[9] R. S. Neetu, S. Basu, A. Sarkar, and P. K. Pal, “Data-adaptive prediction of sea-surface temperature in the Arabian Sea,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 1, pp. 9-13, Jan. 2011, doi: 10.1109/LGRS.2010.2050674.
[10] S. G. Aparna, G. D’Souza, and N. B. Arjun, “Prediction of daily sea surface temperature using artificial neural networks,” Int. J. Remote Sens., vol. 39, no. 12, pp. 4214-4231, Apr. 2018, doi: 10.1080/01431161.2018.1454623.
[11] S. Hou et al., “D2CL: A dense dilated convolutional LSTM model for sea surface temperature prediction,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 12514-12523, 2021, doi: 10.1109/JSTARS.2021.3128577.
[12] S. Y. Hou et al., “MUST: A multi-source spatio-temporal data fusion model for short-term sea surface temperature prediction,” Ocean Eng., vol. 259, Sep. 2022, Art. no. 111932, doi: 10.1016/j.oceaneng.2022.111932.
[13] M. Jahanbakht, W. Xiang, and M. R. Azghadi, “Sea surface temperature forecasting with ensemble of stacked deep neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 19, 2022, Art. no. 1502605, doi: 10.1109/LGRS.2021.3098425.
[14] J. J. Liu, B. G. Jin, J. K. Yang, and L. Y. Xu, “Sea surface temperature prediction using a cubic B-spline interpolation and spatiotemporal attention mechanism,” Remote Sens. Lett., vol. 12, no. 5, pp. 478-487, May 2021, doi: 10.1080/2150704X.2021.1897182.
[15] K. Patil and M. C. Deo, “Prediction of daily sea surface temperature using efficient neural networks,” Ocean Dyn., vol. 67, pp. 357-368, Apr. 2017, doi: 10.1007/s10236-017-1032-9.
[16] K. Patil and M. C. Deo, “Basin-scale prediction of sea surface temperature with artificial neural networks,” J. Atmos. Ocean. Technol., vol. 35, no. 7, pp. 1441-1455, Jul. 2018, doi: 10.1175/JTECH-D-17-0217.1.
[17] L. Wei, L. Guan, and L. Q. Qu, “Prediction of sea surface temperature in the South China Sea by artificial neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 4, pp. 558-562, Apr. 2020, doi: 10.1109/LGRS.2019.2926992.
[18] S. Wolff, F. O’Donnch, and B. Chen, “Statistical and machine learning ensemble modelling to forecast sea surface temperature,” J. Mar. Syst., vol. 208, Aug. 2020, Art. no. 103347, doi: 10.1016/j.jmarsys.2020.103347.
[19] C. J. Xiao et al., “A spatiotemporal deep learning model for sea surface temperature field prediction using time-series satellite data,” Environ. Model. Softw., vol. 120, Oct. 2019, Art. no. 104502, doi: 10.1016/j.envsoft.2019.104502.
[20] J. Xie, J. Zhang, J. Yu, and L. Xu, “An adaptive scale sea surface temperature predicting method based on deep learning with attention mechanism,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 5, pp. 740-744, May 2020, doi: 10.1109/LGRS.2019.2931728.
[21] L. Y. Xu, Q. Li, J. Yu, L. Wang, J. Xie, and S. X. Shi, “Spatiotemporal predictions of SST time series in China’s offshore waters using a regional convolution long short-term memory (RC-LSTM) network,” Int. J. Remote Sens., vol. 41, no. 9, pp. 3368-3389, May 2020, doi: 10.1080/01431161.2019.1701724.
[22] L. Y. Xu, Y. F. Li, J. Yu, Q. Li, and S. X. Shi, “Prediction of sea surface temperature using a multiscale deep combination neural network,” Remote Sens. Lett., vol. 11, no. 7, pp. 611-619, Jul. 2020, doi: 10.1080/2150704X.2020.1746853.
[23] Y. Yang, J. Dong, X. Sun, E. Lima, Q. Mu, and X. H. Wang, “A CFCC-LSTM model for sea surface temperature prediction,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 2, pp. 207-211, Feb. 2018, doi: 10.1109/LGRS.2017.2780843.
[24] Q. Zhang, H. Wang, J. Dong, G. Zhong, and X. Sun, “Prediction of sea surface temperature using long short-term memory,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 10, pp. 1745-1749, Oct. 2017, doi: 10.1109/LGRS.2017.2733548.
[25] X. Zhang, Y. Li, A. C. Frery, and P. Ren, “Sea surface temperature prediction with memory graph convolutional networks,” IEEE Geosci. Remote Sens. Lett., vol. 19, Jul. 2022, Art. no. 8017105, doi: 10.1109/LGRS.2021.3097329.
[26] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. NAACL, 2019, pp. 4171-4186, doi: 10.18653/v1/N19-1423.
[27] C. A. Huang et al., “Music transformer: Generating music with long-term structure,” in Proc. Int. Conf. Learn. Representations, 2019, doi: 10.48550/arXiv.1809.04281.
[28] A. Vaswani et al., “Attention is all you need,” in Proc. NIPS, 2017, pp. 6000-6010, doi: 10.48550/arXiv.1706.03762.
[29] D. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs),” in Proc. ICLR, 2016, doi: 10.48550/arXiv.1511.07289.
[30] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997, doi: 10.1162/neco.1997.9.8.1735.
[31] S. J. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” 2018, doi: 10.48550/arXiv.1803.01271.
[32] H. Y. Zhou et al., “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proc. AAAI, Virtual Event, 2021, pp. 11106-11115, doi: 10.48550/arXiv.2012.07436.
[33] B. N. Oreshkin, D. Carpov, N. Chapados, and Y. Bengio, “N-BEATS: Neural basis expansion analysis for interpretable time series forecasting,” in Proc. ICLR, 2020, doi: 10.48550/arXiv.1905.10437.

Hao Dai was born in 1982. He received the Ph.D. degree in instrument science and technology from Zhejiang University, Hangzhou, China, in 2012.

He is currently a Senior Engineer with the Institute of Ocean Exploration Technology, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China. His research interests include artificial intelligence oceanography and ocean metrology.

Zhigang He was born in 1973. He received the Ph.D. degree in ocean chemistry from Xiamen University, Xiamen, China, in 2005.

He is currently an Associate Professor with Xiamen University. His research interests include ocean observation technology and circulation in the South China Sea and its adjacent regions.

Guomei Wei was born in 1985. She received the M.S. degree in environmental sciences from Xiamen University, Xiamen, China, in 2010.

Her research interests include quality control technology for radar data.

Famei Lei was born in 1984. He received the M.S. degree in environmental sciences from Xiamen University, Xiamen, China, in 2012.

His research interest includes data processing of ocean observation instruments.

Xining Zhang was born in 1982. She received the Ph.D. degree in optical engineering from Zhejiang University, Hangzhou, China, in 2012.

She is currently an Associate Professor with the College of Information Science and Engineering, Huaqiao University, Quanzhou, China. Her research interests include surface plasmon properties in metal micro/nano structures, and optical devices based on micro/nanofibers.

Weijie Zhang was born in 1986. She received the M.S. degree in environmental sciences from Xiamen University, Xiamen, China, in 2015.

Her research interest includes visualization in ocean science.

Shaoping Shang was born in 1962. He received the M.S. degree in physical oceanography from Xiamen University, Xiamen, China, in 1986.

He is currently a Professor with the College of Ocean and Earth Sciences, Xiamen University, Xiamen, China. His research interests include ocean observation, marine information technology, numerical modeling, and remote sensing of marine environment.

$^{a}$ means seq$_{len} = 540$ days, label$_{len} = 360$ days, and pred$_{len} = 360$ days, while $^{b}$ means seq$_{len} = 720$ days, label$_{len} = 360$ days, and pred$_{len} = 360$ days.
Bold values highlight the optimal prediction results for each lead time in every sea area.
The “Count” row gives the number of optimal values achieved by each model over all lead times in all sea areas.

