Deep Learning for Cosmological Parameter Inference from Dark Matter Halo Density Field
基于暗物质晕密度场的深度学习宇宙学参数推断

Zhiwei Min School of Physics and Astronomy, Sun Yat-Sen University, Zhuhai 519082, China CSST Science Center for the Guangdong–Hong Kong–Macau Greater Bay Area, SYSU, Zhuhai 519082, China Xu Xiao School of Physics and Astronomy, Sun Yat-Sen University, Zhuhai 519082, China CSST Science Center for the Guangdong–Hong Kong–Macau Greater Bay Area, SYSU, Zhuhai 519082, China Jiacheng Ding School of Physics and Astronomy, Sun Yat-Sen University, Zhuhai 519082, China CSST Science Center for the Guangdong–Hong Kong–Macau Greater Bay Area, SYSU, Zhuhai 519082, China Liang Xiao School of Physics and Astronomy, Sun Yat-Sen University, Zhuhai 519082, China CSST Science Center for the Guangdong–Hong Kong–Macau Greater Bay Area, SYSU, Zhuhai 519082, China Jie Jiang School of Computer Science,South China Normal University,Guangzhou 510631,China Donglin Wu School of Computer Science,South China Normal University,Guangzhou 510631,China Qiufan Lin Peng Cheng Laboratory, No. 2, Xingke 1st Street, Shenzhen 518000, China Yang Wang Peng Cheng Laboratory, No. 2, Xingke 1st Street, Shenzhen 518000, China Shuai Liu School of Educational Science,Hunan Normal University,Changsha 410081,China Zhixin Chen School of Educational Science,Hunan Normal University,Changsha 410081,China Xiangru Li School of Computer Science,South China Normal University,Guangzhou 510631,China Jinqu Zhang zjq@scnu.edu.cn School of Computer Science,South China Normal University,Guangzhou 510631,China Le Zhang zhangle7@mail.sysu.edu.cn School of Physics and Astronomy, Sun Yat-Sen University, Zhuhai 519082, China Peng Cheng Laboratory, No. 2, Xingke 1st Street, Shenzhen 518000, China CSST Science Center for the Guangdong–Hong Kong–Macau Greater Bay Area, SYSU, Zhuhai 519082, China Xiao-Dong Li lixiaod25@mail.sysu.edu.cn School of Physics and Astronomy, Sun Yat-Sen University, Zhuhai 519082, China CSST Science Center for the Guangdong–Hong Kong–Macau Greater Bay Area, SYSU, Zhuhai 519082, China

(September 19, 2024) (2024 年 9 月 19 日)

Abstract 摘要

We propose a lightweight deep convolutional neural network (lCNN) to estimate cosmological parameters from simulated three-dimensional dark matter (DM) halo distributions and associated statistics. The training dataset comprises 2000 realizations of a cubic box with a side length of 1000 $h^{-1}{\rm Mpc}$ , and interpolated over a cubic grid of $300^{3}$ voxels, with each simulation produced using $512^{3}$ DM particles and $512^{3}$ neutrinos. Under the flat $\Lambda$ CDM model, simulations vary standard six cosmological parameters including $\Omega_{m}$ , $\Omega_{b}$ , $h$ , $n_{s}$ , $\sigma_{8}$ , $w$ , along with the neutrino mass sum, $M_{\nu}$ . We find that: 1) within the framework of lCNN, extracting large-scale structure information is more efficient from the halo density field compared to relying on the statistical quantities including the power spectrum, the two-point correlation function, and the coefficients from wavelet scattering transform; 2) combining the halo density field with its Fourier transformed counterpart enhances predictions, while augmenting the training dataset with measured statistics further improves performance; 3) achieving high accuracy in inferring $\Omega_{m}$ , $h$ , and $\sigma_{8}$ by the neural network model, while being inefficient in predicting $\Omega_{b}$ , $n_{s}$ , $M_{\nu}$ and $w$ ; 4) compared to the simple fully connected network trained with three statistical quantities, our CNN yields statistically reduced errors, showing improvements of approximately 23% for $\Omega_{m}$ , 11% for $h$ , 8% for $n_{s}$ , and 21% for $\sigma_{8}$ . Additionally, in comparison with the likelihood-based analysis on $P(k)$ data, our CNN provides much tighter constraints on parameters, especially on $\Omega_{m}$ and $\sigma_{8}$ . Our study emphasizes this lCNN-based novel approach in extracting large-scale structure information and estimating cosmological parameters.
我们提出了一种轻量级深度卷积神经网络(lCNN)，用于从模拟的三维暗物质(DM)晕分布及其相关统计量中估计宇宙学参数。训练数据集包含 2000 个边长为 1000 $h^{-1}{\rm Mpc}$ 的立方体实现，并在 $300^{3}$ 体素的立方网格上进行插值，每个模拟使用 $512^{3}$ 个 DM 粒子和 $512^{3}$ 个中微子生成。在平坦的 $\Lambda$ CDM 模型下，模拟变化了六个标准宇宙学参数，包括 $\Omega_{m}$ 、 $\Omega_{b}$ 、 $h$ 、 $n_{s}$ 、 $\sigma_{8}$ 、 $w$ ，以及中微子质量总和 $M_{\nu}$ 。研究发现：1)在 lCNN 框架内，从晕密度场提取大尺度结构信息比依赖功率谱、两点相关函数和小波散射变换系数等统计量更为高效；2)将晕密度场与其傅里叶变换结果结合可提升预测性能，而用测量统计量扩充训练集能进一步改善结果；3)神经网络模型在推断 $\Omega_{m}$ 、 $h$ 和 $\sigma_{8}$ 时达到高精度，但对 $\Omega_{b}$ 、 $n_{s}$ 、 $M_{\nu}$ 和 $w$ 的预测效率较低；4)与使用三种统计量训练的简单全连接网络相比，我们的 CNN 在统计误差上显著降低，对 $\Omega_{m}$ 、 $h$ 、 $n_{s}$ 和 $\sigma_{8}$ 的改进分别约为 23%、11%、8%和 21%。此外，与基于 $P(k)$ 数据的似然分析相比，我们的卷积神经网络（CNN）能够为参数提供更为严格的约束，尤其是对 $\Omega_{m}$ 和 $\sigma_{8}$ 的约束更为显著。本研究突出了这种基于轻量化 CNN 的新方法在提取大尺度结构信息及估算宇宙学参数方面的优势。

I Introduction 引言

One of the compelling challenges in modern cosmology is the precise estimation of cosmological parameters. With the continuous development of observational techniques, our understanding of the Universe is progressively deepening. However, to comprehensively and accurately understand the evolution and nature of the Universe, key parameters such as the expansion rate and dark energy density need more sophisticated measurement and analysis. This is crucial for validating cosmological models and unlocking the puzzles of the Universe such as and Hubble and $S_{8}$ tensions. High-precision parameter estimates will validate or challenge existing theories, e.g., the $\Lambda{\rm CDM}$ model (Weinberg, 1989; Peebles and Ratra, 2003; Li et al., 2011), leading to greater progress in understanding the nature of the Universe.
现代宇宙学中一项引人注目的挑战是精确估算宇宙学参数。随着观测技术的持续发展，我们对宇宙的认识正逐步深化。然而，为了全面而准确地理解宇宙的演化及其本质，诸如膨胀速率和暗能量密度等关键参数需要更精细的测量与分析。这对于验证宇宙学模型、解开诸如哈勃常数与 $S_{8}$ 张力等宇宙之谜至关重要。高精度的参数估计将验证或挑战现有理论，例如 $\Lambda{\rm CDM}$ 模型（Weinberg, 1989; Peebles and Ratra, 2003; Li et al., 2011），从而推动人类对宇宙本质认知的更大进展。

The large-scale structure (LSS) of the Universe holds significant cosmological information. These vast and intricate formations depict the distribution, accumulation, and evolution of matter in the Universe, serving as crucial observables for comprehending cosmic origins and evolution (Bardeen et al., 1986; De Lapparent et al., 1986; Huchra et al., 2012; Tegmark et al., 2004; Guzzo et al., 2014). Through the observation and analysis of LSS, we can track the evolution of the Universe, comprehend its expansion history across various redshifts, explore the formation mechanisms of galaxy clusters and superclusters, and investigate the impacts of DM and dark energy on the evolution of LSS.
宇宙的大尺度结构（LSS）蕴含着重要的宇宙学信息。这些庞大而复杂的构造描绘了宇宙中物质的分布、聚集及演化过程，是理解宇宙起源与演化的关键观测对象（Bardeen et al., 1986; De Lapparent et al., 1986; Huchra et al., 2012; Tegmark et al., 2004; Guzzo et al., 2014）。通过对 LSS 的观测与分析，我们能够追溯宇宙的演化历程，理解其在不同红移下的膨胀历史，探索星系团和超星系团的形成机制，并研究暗物质（DM）与暗能量对 LSS 演化的影响。

At present, the two-point correlation function (2PCF) and its Fourier counterpart, the power spectrum, are the most commonly used statistical tools for analyzing LSS(Zhong et al., 2024), due to the fact that their sensitivity to both the geometry and the cosmic evolution (Kaiser, 1987; Ballinger et al., 1996; Eisenstein et al., 1998; Blake and Glazebrook, 2003; Seo and Eisenstein, 2003), allowing for the effective extraction of information regarding Gaussian perturbations. These methods have been successfully applied in analyzing galaxy redshift surveys such as the 2dFGRS (Colless et al., 2003), 6dFGS (Beutler et al., 2011), the WiggleZ Survey (Riemer–Sørensen et al., 2012), and the SDSS Survey (York et al., 2000; Eisenstein et al., 2005; Percival et al., 2007; Anderson et al., 2014; Samushia et al., 2014; Ross et al., 2015; Beutler et al., 2017; Sánchez et al., 2017; Alam et al., 2017a; Chuang et al., 2017; Neveux et al., 2020). However, they encounter difficulties in extracting small-scale information, e.g., $\lesssim 40~{}h^{-1}{\rm Mpc}$ , from the LSS due to the pronounced influence of nonlinear structure evolution caused by gravitational collapse on such scales. Consequently, direct comparisons between observations and theories on the nonlinear scales become challenging.
目前，两点相关函数（2PCF）及其傅里叶对应量——功率谱，是分析大尺度结构（LSS）最常用的统计工具（Zhong et al., 2024），因为它们对几何结构和宇宙演化均具有敏感性（Kaiser, 1987; Ballinger et al., 1996; Eisenstein et al., 1998; Blake and Glazebrook, 2003; Seo and Eisenstein, 2003），能有效提取关于高斯扰动的信息。这些方法已成功应用于分析星系红移巡天数据，如 2dFGRS（Colless et al., 2003）、6dFGS（Beutler et al., 2011）、WiggleZ 巡天（Riemer–Sørensen et al., 2012）及 SDSS 巡天（York et al., 2000; Eisenstein et al., 2005; Percival et al., 2007; Anderson et al., 2014; Samushia et al., 2014; Ross et al., 2015; Beutler et al., 2017; Sánchez et al., 2017; Alam et al., 2017a; Chuang et al., 2017; Neveux et al., 2020）。然而，由于引力塌缩导致的非线性结构演化在小尺度上（例如 $\lesssim 40~{}h^{-1}{\rm Mpc}$ ）影响显著，这些方法在提取 LSS 中小尺度信息时面临困难，使得观测与理论在非线性尺度上的直接对比变得极具挑战性。

Alternative statistical measures have been turned to in probing the small-scale properties of the Universe beyond 2PCF. The three-point correlation function (Sabiu et al., 2016; Slepian et al., 2017) has been utilized to improve cosmological constraints, while the more complicated four-point correlation function (Sabiu et al., 2019), has demonstrated more stringent constriants. Furthermore, Lavaux and Wandelt (2012) employed cosmic voids as a means to probe the cosmic geometry. Additionally, Li et al. (2017) explored the redshift dependence of the 2PCF along the line-of-sight as a probe for cosmological parameters. The symmetry of galaxy pairs was tested (Marinoni and Buzzi, 2010) and the redshift dependence of the Alcock-Paczynski effect (AP effect) can be exploited to mitigate redshift distortions (RSDs) (Li et al., 2015). In addition, Li et al. (2018) utilized the tomographic AP method on SDSS galaxy data to obtain a strong constraint on dark energy, and Porqueres et al. (2021) present a field-level inference on cosmological parameters by analyzing cosmic shear data.
为探索超越两点相关函数（2PCF）的宇宙小尺度特性，研究者们转向了其他统计方法。三点相关函数（Sabiu 等，2016；Slepian 等，2017）被用于提升宇宙学约束精度，而更复杂的四点相关函数（Sabiu 等，2019）则展现出更强的限制能力。此外，Lavaux 和 Wandelt（2012）利用宇宙空洞作为探测宇宙几何的手段。Li 等人（2017）通过研究视线方向上 2PCF 的红移依赖性，将其作为宇宙学参数的探针。星系对的对称性得到了检验（Marinoni 和 Buzzi，2010），且 Alcock-Paczynski 效应（AP 效应）的红移依赖性可用于减轻红移畸变（RSDs）（Li 等，2015）。另外，Li 等（2018）在 SDSS 星系数据上应用层析 AP 方法，获得了对暗能量的强约束；Porqueres 等（2021）则通过分析宇宙剪切数据，实现了对宇宙学参数的场级推断。

Recently, the mark weighted correlation function (MCF) (White, 2016) has proposed as an alternative approach. It assigns density weights to various galaxy features to extract non-Gaussian information on LSS. Demonstrated effectiveness in capturing detailed clustering information has led to significantly enhanced constraints on cosmological parameters such as $\Omega_{m}$ and $w$ (Yang et al., 2020; Lai et al., 2023; Yin et al., 2024). Moreover, (Fang et al., 2019; Yin et al., 2024) utilized the $\beta$ -skeleton statistics to constrain cosmological parameters.
最近，标记加权相关函数（MCF）（White, 2016）被提出作为一种替代方法。该方法通过为不同星系特征分配密度权重，以提取大尺度结构（LSS）中的非高斯信息。其在捕捉详细聚类信息方面展现的有效性，显著增强了对宇宙学参数如 $\Omega_{m}$ 和 $w$ 的约束（Yang 等，2020；Lai 等，2023；Yin 等，2024）。此外，（Fang 等，2019；Yin 等，2024）利用 $\beta$ 骨架统计量对宇宙学参数进行了约束。

Although the methods mentioned above can extract rich information from LSS, they also exhibit certain drawbacks. Some methods are overly complex and demand substantial computational resources. In recent years, the rapid development and application of machine learning have introduced new and powerful technical tools for astronomical data analysis, offering innovative solutions to the challenges encountered in survey data analysis (Way et al., 2016; Chen and Zhang, 2014; Jordan and Mitchell, 2015; Rodríguez-Mazahua et al., 2016; Ball et al., 2017; Sen et al., 2022). Machine learning-based data analysis methods offer significant advantages over traditional approaches in terms of efficiency, accuracy, and feature extraction capabilities. For instance, Wu et al. (2021, 2023) developed a deep learning technique to infer the non-linear velocity field from the DM density field. In addition, (Wang et al., 2024) present a deep-learning technique for reconstructing the dark-matter density field from the redshift-space distribution of dark-matter halos.
尽管上述方法能从 LSS 中提取丰富信息，但它们也存在一定缺陷。部分方法过于复杂，需消耗大量计算资源。近年来，机器学习的快速发展和应用为天文数据分析引入了强大新技术工具，为巡天数据分析中的挑战提供了创新解决方案（Way et al., 2016; Chen and Zhang, 2014; Jordan and Mitchell, 2015; Rodríguez-Mazahua et al., 2016; Ball et al., 2017; Sen et al., 2022）。基于机器学习的数据分析方法在效率、精度和特征提取能力上较传统方法具有显著优势。例如，Wu 等人（2021，2023）开发了从暗物质密度场推断非线性速度场的深度学习技术；此外，（Wang et al., 2024）提出利用暗物质晕红移空间分布重建暗物质密度场的深度学习技术。

Ravanbakhsh et al. (2016); Pan et al. (2020) utilized convolutional neural networks (CNNs) to extract information from 3D DM distribution and accurately estimate cosmological parameters. Meanwhile, (Lazanu, 2021) employed the Quijote simulation (Villaescusa-Navarro et al., 2020) to estimate cosmological parameters from 3D DM distribution using CNNs, comparing the constraints with those obtained from power-spectrum-based methods. Additionally, Hortua (2021) utilized Quijote simulation data to estimate cosmological parameters from a Bayesian neural network, resulting in a posterior distribution of parameters. Recently, Hwang et al. (2023) applied the Vision Transformer, known for its advantages in natural language processing, to the estimation of cosmological parameters, and compares its performance with traditional CNNs and 2PCF.
Ravanbakhsh 等人（2016 年）与 Pan 等人（2020 年）利用卷积神经网络（CNNs）从三维暗物质分布中提取信息，精确估算了宇宙学参数。与此同时，Lazanu（2021 年）借助 Quijote 模拟（Villaescusa-Navarro 等人，2020 年），采用 CNNs 技术从三维暗物质分布中估算宇宙学参数，并将结果与基于功率谱的传统方法所获约束进行了对比。此外，Hortua（2021 年）运用 Quijote 模拟数据，通过贝叶斯神经网络估算了宇宙学参数，得到了参数的后验分布。最近，Hwang 等人（2023 年）将自然语言处理领域表现优异的 Vision Transformer 应用于宇宙学参数估计，并对比了其与传统 CNNs 及两点相关函数（2PCF）的性能差异。

In this study, we explore a deep-learning-based approach to extract cosmological information from the halo number density field. In contrast to previous studies (Lazanu, 2021; Ravanbakhsh et al., 2016; Pan et al., 2020), we utilize the halo number density field instead of the DM particle density field. Recently, (Makinen et al., 2022) presented the Graph Information Maximizing Neural Networks, which are capable of quantifying cosmological information from discrete catalog data. In this study, to more realistically reflect real observations, we incorporate several observational effects into our mock samples, such as redshift-space distortion (RSD) effects, coordinate transformations to the fiducial cosmology background, and fixing the halo number density. Meanwhile, the entire parameter space for the fiducial cosmological parameters is jointly inferred, rather than just inferring some of them with other parameters fixed. Using the halo catalog of the Quijote’s LH $\nu w$ simulation (Villaescusa-Navarro et al., 2020), our proposed lCNN framework demonstrates the ability to provide reliable constraints on cosmological parameters. Furthermore, we observe that by combining various statistics as input to the lCNN, the performance of the neural network can be noticeably enhanced.
在本研究中，我们探索了一种基于深度学习的方法，从晕数目密度场中提取宇宙学信息。与先前研究（Lazanu, 2021; Ravanbakhsh 等人, 2016; Pan 等人, 2020）不同，我们采用晕数目密度场而非暗物质粒子密度场。近期，（Makinen 等人, 2022）提出的图信息最大化神经网络能够从离散目录数据中量化宇宙学信息。为了更真实地反映实际观测，我们在模拟样本中引入了多种观测效应，如红移空间畸变（RSD）效应、坐标转换至基准宇宙学背景、以及固定晕数目密度。同时，我们联合推断基准宇宙学参数的整个参数空间，而非仅固定部分参数推断其余。利用 Quijote 的 LH $\nu w$ 模拟（Villaescusa-Navarro 等人, 2020）的晕目录数据，我们提出的 lCNN 框架展现出对宇宙学参数提供可靠约束的能力。此外，我们发现通过组合多种统计量作为 lCNN 的输入，可显著提升神经网络性能。

This paper is part of the “Dark-AI” project¹¹1https://dark-ai.top/, a project aims to apply state-of-the-art machine learning algorithms to address frontier problems in cosmology.The structure of this paper is as follows. In Sect. II, we introduce the samples utilized for training and testing, whereas in Sect. III, we outline the architecture of our neural network. Sect. IV is dedicated to presenting the results. Finally, we conclude in Sect. V by discussing the results.
本文隶属于“暗黑人工智能”项目 ¹ ，该项目旨在运用最前沿的机器学习算法解决宇宙学中的尖端问题。论文结构安排如下：第二节介绍用于训练与测试的数据样本；第三节概述神经网络架构设计；第四节专注于成果展示；最后于第五节总结全文并探讨研究结果。

II Data 数据

To estimate cosmological parameters, training and test samples are constructed using the FoF DM halo catalogues from LH $\nu w$ simulations, a subset of 2000 simulations within the Quijote simulations (Villaescusa-Navarro et al., 2020)–an ensemble of publicly available $N$ -body simulations. These simulations utilize the TreePM code Gadget-III (Springel, 2005) and are conducted in boxes with side length $1~{}h^{-1}{\rm Gpc}$ . The LH $\nu w$ simulations offer various cosmological results, evolving $512^{3}$ DM particles together with $512^{3}$ neutrino particles. For this study, we focus on the snapshot at $z=0.5$ . Beginning from $z=127$ , the simulations evolve over time, with matter power spectra and transfer functions obtained from CAMB (Lewis et al., 2000), appropriately adjusted. These quantities are used to determine displacements and peculiar velocities via second-order perturbation theory, which are then employed to assign initial particle positions on a regular grid using the 2LPTBouchet et al. (1994). The simulations are executed by employing Latin-hypercube sampling, a statistical technique for generating a quasi-random sample of parameter values from a multidimensional distribution, with 7 cosmological parameters. The parameter ranges are as follows: $\Omega_{m}\in[0.1,0.5]$ , $\Omega_{b}\in[0.03,0.07]$ , $h\in[0.5,0.9]$ , $n_{s}\in[0.8,1.2]$ , $\sigma_{8}\in[0.6,1.0]$ , $M_{\nu}\in[0,1]~{}\mathrm{eV}$ , and $w\in[-1.3,-0.7]$ . In order to perform a standard likelihood-based analysis using $P(k)$ , we also utilize 1000 realizations with different random seeds for a fiducial cosmology to estimate the covariance matrixVillaescusa-Navarro et al. (2020).The value of the cosmological parameters for the fiducial model are $\Omega_{m}=0.3175,\Omega_{b}=0.049,h=0.6711,n_{s}=0.9624,\sigma_{8}=0.834,M_{% \nu}=0.0~{}\mathrm{eV},\mathrm{and}\,w=-1$ . For this model, the simulatons are run with $1h^{-1}\mathrm{Gpc}$ box and $512^{3}$ DM particles using 2LPT initial conditions. As a comparison, we also discussed the Rockstar halo catalogs provided by Quijote. The results indicate that the training outcomes of Rockstar are significantly inferior to those of FoF, primarily due to the much lower halo number density of the former. See Appendix B for details.
为估算宇宙学参数，本研究基于 Quijote 模拟（Villaescusa-Navarro et al., 2020）中 2000 次模拟的子集——LH $\nu w$ 模拟的 FoF 暗物质晕目录构建训练与测试样本。这些公开可用的 $N$ 体模拟采用 TreePM 代码 Gadget-III（Springel, 2005），在边长为 $1~{}h^{-1}{\rm Gpc}$ 的立方体中进行。LH $\nu w$ 模拟可生成多种宇宙学结果，其中 $512^{3}$ 个暗物质粒子与 $512^{3}$ 个中微子粒子共同演化。本工作重点关注 $z=0.5$ 时刻的快照数据。模拟从 $z=127$ 开始随时间演化，物质功率谱和转移函数通过 CAMB（Lewis et al., 2000）获取并经过适当调整。这些数据通过二阶微扰理论计算位移和本动速度，进而采用 2LPT（Bouchet et al., 1994）方法在规则网格上分配初始粒子位置。模拟采用拉丁超立方抽样——一种从多维分布中生成准随机参数值的统计技术，涉及 7 个宇宙学参数，其范围分别为： $\Omega_{m}\in[0.1,0.5]$ 、 $\Omega_{b}\in[0.03,0.07]$ 、 $h\in[0.5,0.9]$ 、 $n_{s}\in[0.8,1.2]$ 、 $\sigma_{8}\in[0.6,1.0]$ 、 $M_{\nu}\in[0,1]~{}\mathrm{eV}$ 及 $w\in[-1.3,-0.7]$ 。为了利用 $P(k)$ 进行标准的基于似然的分析，我们还采用了 1000 组具有不同随机种子的基准宇宙学实现来估计协方差矩阵（Villaescusa-Navarro 等人，2020 年）。该基准模型的宇宙学参数值为 $\Omega_{m}=0.3175,\Omega_{b}=0.049,h=0.6711,n_{s}=0.9624,\sigma_{8}=0.834,M_{% \nu}=0.0~{}\mathrm{eV},\mathrm{and}\,w=-1$ 。针对此模型，模拟运行采用 $1h^{-1}\mathrm{Gpc}$ 立方体与 $512^{3}$ 暗物质粒子，并基于二阶拉格朗日扰动理论（2LPT）初始条件。作为对比，我们还分析了 Quijote 提供的 Rockstar 晕族目录。结果表明，Rockstar 的训练结果显著逊色于 FoF 方法，主要归因于前者晕族数密度远低于后者。详见附录 B。

In this study, we performed the following preprocessing steps on the halo catalogs in the Quijote LH $\nu w$ simulations to make it available as data for use by the neural network:
本研究中，我们对 Quijote LH $\nu w$ 模拟中的晕族目录执行了以下预处理步骤，以使其可作为神经网络训练数据使用：

1) The RSD effect was incorporated along the line of sight (LoS) to more accurately reproduce real observational conditions, as expressed by:
1) 为更精确地模拟实际观测条件，沿视线方向(LoS)引入了 RSD 效应，其表达式为：

\bm{s}=\bm{r}+\frac{\bm{v}\cdot\hat{z}}{aH(a)}\hat{z},

(1)

where $\bm{r}$ , $\bm{s}$ are the position of halos in real space and redshift space respectively. $\hat{z}$ is the unit vector along LoS, $\bm{v}$ is the peculiar velocity of halos, and $H(a)$ is the Hubble parameter at scale factor $a$ . In preparing each mock, we calculated the RSD effects separately based on the cosmological parameters in each Quijote simulation using Eq. 1.
其中 $\bm{r}$ 和 $\bm{s}$ 分别表示晕在真实空间和红移空间中的位置。 $\hat{z}$ 为沿视线方向(LoS)的单位向量， $\bm{v}$ 是晕的 peculiar velocity（本动速度）， $H(a)$ 为尺度因子 $a$ 处的哈勃参数。在制备每个模拟样本时，我们根据 Quijote 模拟中各宇宙学参数，利用方程 1 单独计算了红移空间畸变(RSD)效应。

2) Because observational data cannot determine the true cosmological parameters, we must rely on a specific fiducial cosmology. To ensure consistency with the observational data, all mock data must match with this fiducial cosmology background. The fiducial cosmology is derived from Planck 2018 measurements (Aghanim et al., 2020), where $\Omega_{m}=0.3071$ , $w=-1$ . The relation between the Quijote cosmologies and the fiducial cosmology is expressed by:
2) 由于观测数据无法确定真实的宇宙学参数，我们必须依赖特定的基准宇宙学模型。为确保与观测数据的一致性，所有模拟数据都需与该基准宇宙学背景相匹配。该基准宇宙学源自 Planck 2018 测量结果（Aghanim 等人，2020），其中 $\Omega_{m}=0.3071$ 、 $w=-1$ 。Quijote 宇宙学参数与基准宇宙学的关系通过以下公式表达：

s_{\perp}=s_{\perp}^{0}\frac{d^{f}_{A}(z)}{d_{A}(z)}\,,~{}\quad s_{\parallel}=% s_{\parallel}^{0}\frac{H(z)}{H^{f}(z)}

(2)

where $d_{A}(z)$ and $H(z)$ represent the angular diameter distance and the Hubble parameter at redshift $z$ , respectively. The variables $s_{\parallel}^{0}$ and $s_{\perp}^{0}$ represent the comoving coordinates in each Quijote simulation. The superscript $f$ denotes the fiducial cosmology, while $\bm{s}_{\perp}$ and $\bm{s}_{\parallel}$ represent components perpendicular and parallel to LoS, respectively. It should be noted that the analysis employs a fiducial cosmological model to convert redshifts to comoving distances prior to calculating the clustering signal. The objective of this transformation process is to more accurately reflect the data analyzed in real observations (Alam et al., 2017b). The results of the analysis are generally not sensitive to the specific parameters of the fiducial cosmology, provided that the fiducial cosmology are not significantly inaccurate.
其中 $d_{A}(z)$ 和 $H(z)$ 分别表示红移 $z$ 处的角直径距离和哈勃参数。变量 $s_{\parallel}^{0}$ 和 $s_{\perp}^{0}$ 代表每个 Quijote 模拟中的共动坐标。上标 $f$ 表示基准宇宙学模型，而 $\bm{s}_{\perp}$ 和 $\bm{s}_{\parallel}$ 则分别代表与视线方向垂直和平行的分量。需要注意的是，在计算聚类信号之前，分析采用基准宇宙学模型将红移转换为共动距离。此转换过程的目的是更准确地反映实际观测中分析的数据（Alam 等人，2017b）。只要基准宇宙学模型没有显著偏差，分析结果通常对其具体参数不敏感。

3) After conversion from the fiducial cosmology, the box sizes are no longer the same in all three dimensions. Therefore, to conveniently feed the data cubes into the neural network, we cut the converted boxes into sides of equal length, specifically $744~{}h^{-1}{\rm Mpc}$ . Consequently, only the halos within such box in each simulation are considered.
3) 从基准宇宙学模型转换后，盒子尺寸在三个维度上不再相同。因此，为便于将数据立方体输入神经网络，我们将转换后的盒子切割成等边长的立方体，具体长度为 $744~{}h^{-1}{\rm Mpc}$ 。这样一来，每个模拟中仅考虑此类立方体内的晕。

4) Considering that DM halos with very low mass contribute significant noise, we implemented a cutoff for small mass halos. This cutoff was chosen appropriately such that the number of DM halos has a density equal to $2\times 10^{-4}~{}h^{3}{\rm Mpc}^{-3}$ in each box to be compatible with current spectral observations Yuan et al. (2023). Furthermore, if the halo number density in a simulation box is lower than that value, the box is discarded, resulting in 1710 data cubes remaining. Of these, 1500 are used for training and 210 for testing. Note that, the parameter distributions deviate from a uniform distribution due to the discarding of some simulation boxes corresponding to different cosmological models, as illustrated in Fig. 1. It can be observed that there are noticeably fewer samples with lower $\sigma_{8}$ . This leads to poorer predictive performance in low $\sigma_{8}$ range. We also discussed replacing the fixed number density of halos with a fixed halo mass cutoff. The results indicate that our results are insensitive to both methods. Regarding the parameter $\sigma_{8}$ , predictions in the lower range have significantly improved. This is because the method of fixing the halo mass cutoff does not require discarding any cosmology, thus maintaining a uniform distribution of parameters. As a result, there are more samples with low $\sigma_{8}$ values, leading to improved predictive performance. Details are in Appendix B.
4）考虑到极低质量的暗物质晕会引入显著噪声，我们对小质量晕设置了截断阈值。该阈值经适当选取，使得每个模拟盒中的暗物质晕数密度与当前光谱观测数据兼容（Yuan 等人，2023 年文献 $2\times 10^{-4}~{}h^{3}{\rm Mpc}^{-3}$ ）。若某模拟盒中晕数密度低于该值，则予以剔除，最终保留 1710 个数据立方体。其中 1500 个用于训练，210 个用于测试。需注意的是，由于剔除部分对应不同宇宙学模型的模拟盒，参数分布偏离均匀分布（如图 1 所示），可观察到低 $\sigma_{8}$ 区域的样本明显偏少，导致低 $\sigma_{8}$ 区间的预测性能较差。我们还探讨了将固定晕数密度改为固定晕质量截断的方案，结果表明两种方法对结果均不敏感。针对参数 $\sigma_{8}$ ，低值区间的预测精度显著提升，这是因为固定晕质量截断法无需剔除任何宇宙学模型，从而保持了参数的均匀分布，使得低 $\sigma_{8}$ 值的样本更多，预测性能得以改善。详见附录 B。

5) The halo number density field is discretized into mesh cells by assigning the haloes to a $300^{3}$ mesh using the Cloud-in-Cell (CIC) scheme, with a cell resolution of $(2.48~{}h^{-1}{\rm Mpc})^{3}$ .
5) 通过使用云格点（CIC）方案将晕团分配到 $300^{3}$ 网格上，并将网格单元分辨率设为 $(2.48~{}h^{-1}{\rm Mpc})^{3}$ ，晕团数密度场被离散化为网格单元。

Refer to caption — Figure 1: Distribution of cosmological parameters across the simulation boxes following the preprocessing steps on the halo catalogs in the Quijote LH $\nu w$ simulations.As observed, by discarding simulations with halo number densities that are too low, the resulting distributions for each parameter exhibit slight deviations from a uniform distribution.
图 1：经过 Quijote LH $\nu w$ 模拟中晕族目录预处理步骤后，宇宙学参数在模拟盒中的分布情况。如图所示，通过剔除晕族数密度过低的模拟，各参数最终分布与均匀分布相比呈现轻微偏差。

II.1 Training and Test Samples
二、训练与测试样本

After preprocessing the simulation data, as mentioned previously, we obtained halo catalogs at the redshift of 0.5 for 1710 cosmological models. We utilized the spatial distribution information of halos together with various associated statistics as both training and test sets for the neural network. This study utilized three datasets for training and testing, as described below.
如前述对模拟数据进行预处理后，我们获得了 1710 个宇宙学模型在红移 0.5 处的晕族目录。利用晕族空间分布信息及其多种关联统计量作为神经网络的训练集与测试集。本研究采用如下所述三个数据集进行训练和测试。

Dataset A: we utilized the three-dimensional distribution of the DM halo number density field, $n(\bm{x})$ , which is interpolated onto a $300^{3}$ mesh with a resolution of $(2.48~{}h^{-1}{\rm Mpc})$ along each side, to extract the input cosmological parameters. The first and second rows of Fig. 2 display the projected halo number density fields in three different cosmological models, along with their corresponding zoomed-in plots.
数据集 A：我们采用暗物质晕数密度场 $n(\bm{x})$ 的三维分布，将其插值到分辨率为每边 $(2.48~{}h^{-1}{\rm Mpc})$ 的 $300^{3}$ 网格上，以提取输入宇宙学参数。图 2 第一、二行展示了三种不同宇宙学模型中投影晕数密度场及其对应的局部放大图。

Dataset B: We utilized the Fourier-transformed halo density field with $300^{3}$ grids. Letting $\tilde{\delta}(\mathbf{k})$ denote the Fourier transform of the overdensity $\delta(\mathbf{x})$ , defined by
数据集 B：我们使用经傅里叶变换的 $300^{3}$ 网格晕密度场。设 $\tilde{\delta}(\mathbf{k})$ 表示过密度 $\delta(\mathbf{x})$ 的傅里叶变换，其定义为

\tilde{\delta}(\bm{k})=\int\frac{d^{3}x}{(2\pi)^{3/2}}\delta(\bm{x})\exp(-i\bm% {k}\cdot\bm{x})\,,

(3)

where $\delta(\bm{x})=n(\bm{x})/\bar{n}-1$ is the density contrast, a dimensionless measure of overdensity at each point.
其中 $\delta(\bm{x})=n(\bm{x})/\bar{n}-1$ 为密度对比度，即各点处超密度的无量纲度量。

In practice, to complement $n(\bm{x})$ , we retain only the low-frequency (i.e. large-scale) modes in the Fourier space field, which are not captured by the configuration space field. Specifically, we filter $\tilde{\delta}(\bm{k})$ with $|k|<0.5h\mathrm{Mpc}^{-1}$ , resulting in a datacube of Fourier modes on a $60^{3}$ grid. The third and fourth rows of Fig. 2 show the amplitudes of the Fourier fields for the three different cosmological models, along with their corresponding zoomed-in plots. Here, the zero-frequency mode is located at the center of each plot. Note that both amplitude and phase are input into the neural network, where each Fourier mode can be expressed as $\tilde{\delta}=Ae^{i\phi}$ , with $A$ representing amplitude and $\phi$ representing phase.
实际操作中，为补充 $n(\bm{x})$ ，我们仅保留傅里叶空间场中未被位形空间场捕捉的低频（即大尺度）模式。具体而言，我们使用 $|k|<0.5h\mathrm{Mpc}^{-1}$ 对 $\tilde{\delta}(\bm{k})$ 进行滤波，生成一个基于 $60^{3}$ 网格的傅里叶模态数据立方体。图 2 第三、四行展示了三种不同宇宙学模型下傅里叶场的振幅及其对应的局部放大图。此处，零频模位于各图中心。需注意，神经网络同时输入振幅和相位信息，每个傅里叶模态可表示为 $\tilde{\delta}=Ae^{i\phi}$ ，其中 $A$ 代表振幅， $\phi$ 代表相位。

Dataset C: in addition to the density field information, we have integrated various statistics into our training samples. These statistics comprise the two-point correlation function of halos, $\xi(r)$ , and the corresponding power spectrum, $P(k)$ , and the wavelet scattering transform $(\rm{WST})$ coefficients, labeled as $S_{n}$ , where $n$ denotes the order of WST coefficients.
数据集 C：除密度场信息外，我们还将多种统计量整合至训练样本中。这些统计量包括晕团两点相关函数 $\xi(r)$ 、对应功率谱 $P(k)$ ，以及标记为 $S_{n}$ 的小波散射变换 $(\rm{WST})$ 系数（其中 $n$ 表示 WST 系数的阶次）。

The power spectrum is given by the following average over Fourier space:
功率谱由傅里叶空间中的下述平均值给出：

\left<\tilde{\delta}(\bm{k})\tilde{\delta}^{*}(\bm{k}^{\prime})\right>=(2\pi)^% {3}P(k)\delta^{3}(\bm{k}-\bm{k}^{\prime}).

(4)

The relationship between $P(k)$ and $\xi(r)$ is a Fourier transform, which can be mathematically expressed as follows,
$P(k)$ 与 $\xi(r)$ 之间的关系是一种傅里叶变换，其数学表达式如下，

\xi(r)\equiv\left<\delta(\bm{x})\delta(\bm{x-r})\right>=\int\frac{d^{3}k}{(2% \pi)^{3}}P(k)e^{i\bm{k}\cdot\bm{r}}\,.

(5)

Considering the relatively large uncertainty of statistics at small scales due to noise, $\xi(r)$ and $P(k)$ are normalized by their mean value,namely we only utilized their shapes and focused on specific ranges: $r\in\left[20,200\right]~{}h^{-1}{\rm Mpc}$ for $\xi(r)$ and $k\in\left[0.05,0.6\right]~{}h{\rm Mpc^{-1}}$ for $P(k)$ . In other words, we discarded magnitude information, keeping only shape information.
考虑到小尺度下统计因噪声导致的不确定性较大，我们将 $\xi(r)$ 和 $P(k)$ 按其均值进行归一化处理，即仅利用其形态特征并聚焦于特定范围： $\xi(r)$ 对应 $r\in\left[20,200\right]~{}h^{-1}{\rm Mpc}$ ， $P(k)$ 对应 $k\in\left[0.05,0.6\right]~{}h{\rm Mpc^{-1}}$ 。换言之，我们舍弃了幅值信息，仅保留了形状信息。

The wavelet scattering transform (WST) was originally introduced in the context of signal processing in computer vision, as discussed by Bruna and Mallat (2013); Mallat (2012). This method serves the purpose of capturing the statistical properties inherent in an input field. In the WST framework, an input field $I(\bm{x})$ undergoes two primary nonlinear operations: wavelet convolutions and modulus calculations. Essentially, when $\Psi_{j_{1},l_{1}}(\bm{x})$ denotes an oriented wavelet probing a scale $j_{1}$ and angle $l_{1}$ , the WST operation transforms $I(\bm{x})$ as follows:
小波散射变换（WST）最初由 Bruna 和 Mallat（2013 年）、Mallat（2012 年）在计算机视觉信号处理领域提出。该方法旨在捕捉输入场中固有的统计特性。在 WST 框架下，输入场 $I(\bm{x})$ 经历两种主要的非线性操作：小波卷积和模运算。具体而言，当 $\Psi_{j_{1},l_{1}}(\bm{x})$ 表示探测尺度 $j_{1}$ 和角度 $l_{1}$ 的定向小波时，WST 操作将 $I(\bm{x})$ 转换如下：

I^{\prime}(\bm{x})=\left|I(\bm{x})\otimes\Psi_{j_{1},l_{1}}(x)\right|\,.

(6)

Here, $\otimes$ represents convolution. The averaging of this operation produces a WST coefficient $S_{n}$ , essentially a real number describing the characteristics of the field. Through the utilization of a set of localized wavelets $\Psi_{j_{1},l_{1}}(\bm{x})$ , exploring different scales $j_{1}$ and angles $l_{1}$ , repeated iterations of this process generate a scattering network. The WST coefficients, $S_{n}$ , up to order $n=2$ , are determined by the following relationships:
此处， $\otimes$ 代表卷积运算。对该操作进行平均后产生一个 WST 系数 $S_{n}$ ，本质上是一个描述场特性的实数。通过使用一组局部化小波 $\Psi_{j_{1},l_{1}}(\bm{x})$ ，探索不同尺度 $j_{1}$ 和角度 $l_{1}$ ，该过程的反复迭代生成了一个散射网络。WST 系数 $S_{n}$ ，直至阶数 $n=2$ ，由以下关系式确定：

$\displaystyle S_{0}$	$\displaystyle=\left<\|I(\bm{x})\|\right>,$	(7)
$\displaystyle S_{1}\left(j_{1},l_{1}\right)$	$\displaystyle=\left<\left\|I(\bm{x})\otimes\Psi_{j_{1},l_{1}}(\bm{x})\right\|% \right>,$
$\displaystyle S_{2}\left(j_{2},l_{2},j_{1},l_{1}\right)$	$\displaystyle=\left<\left\|\left(\left\|I(\bm{x})\otimes\Psi_{j_{1},l_{1}}(\bm{x% })\right\|\right)\otimes\Psi_{j_{2},l_{2}}(\bm{x})\right\|\right>\,.$

Here, $<\cdot>$ denotes averaging over samples. Generally, a family of wavelets $\Psi_{j_{1},l_{1}}(\bm{x})$ can be generated by applying dilations and rotations to a mother wavelet. In our study, the mother wavelet is a solid harmonic multiplied by a Gaussian envelope, taking the form of
这里， $<\cdot>$ 表示对样本的平均。通常，一族小波 $\Psi_{j_{1},l_{1}}(\bm{x})$ 可以通过对母小波进行伸缩和旋转生成。在本研究中，母小波是一个乘以高斯包络的实值球谐函数，其形式为：

\Psi_{l}^{m}(\bm{x})=\frac{1}{(2\pi)^{3/2}}\mathrm{e}^{-|\bm{x}|^{2}/2\sigma^{% 2}}|\bm{x}|^{l}Y_{l}^{m}(\frac{\bm{x}}{\left|\bm{x}\right|})\,,

(8)

where, $Y^{m}_{l}$ represents the Laplacian spherical harmonics, and $\sigma$ denotes the Gaussian width measured in field pixels. In this study, we set $\sigma=0.25$ . Given a 3D input field, along with a total number of spatial dyadic scales $J$ and total orientations $L$ , WST coefficients can be calculated to any order. Here, the coefficient order is defined as a function of $(j,l)$ , where $j\in[0,1,...,J-1,J]$ and $l\in[0,1,...,L-1,L]$ . Detailed information on the coefficients can be found in Valogiannis et al. (2023). In our analysis, we set $J=6$ and $L=4$ , resulting in a total of 140 WST coefficients, excluding $S_{0}$ . In summary, the WST coefficients in our work are
其中， $Y^{m}_{l}$ 代表拉普拉斯球谐函数， $\sigma$ 表示以场像素为单位测量的高斯宽度。在本研究中，我们设定 $\sigma=0.25$ 。给定一个三维输入场，连同空间二进尺度总数 $J$ 和总方向数 $L$ ，WST 系数可计算至任意阶数。此处，系数阶数定义为 $(j,l)$ 的函数，其中 $j\in[0,1,...,J-1,J]$ 且 $l\in[0,1,...,L-1,L]$ 。关于系数的详细信息可参见 Valogiannis 等人（2023 年）的研究。在我们的分析中，我们设定 $J=6$ 和 $L=4$ ，最终得到共计 140 个 WST 系数（不包括 $S_{0}$ ）。简而言之，本工作中涉及的 WST 系数为

\begin{split}S_{0}&=\left<|I(\bm{x})^{q}|\right>,\\ S_{1}(j_{1},l_{1})&=\left<\big{(}\sum_{m=-l_{1}}^{m=l_{1}}|I(\bm{x})\otimes% \Psi_{j_{1},l_{1}}^{m}(\bm{x})|^{2}\big{)}^{q/2}\right>,\\ S_{2}(j_{2},j_{1},l_{1})&=\left<\big{(}\sum_{m=-l_{1}}^{m=l_{1}}|U_{1}(j_{1},l% _{1})(\bm{x})\otimes\Psi_{j_{2},l_{1}}^{m}(\bm{x})|^{2}\big{)}^{q/2}\right>\,,% \end{split}

(9)

with 随着

U_{1}(j_{1},l_{1})(\bm{x})=\sum_{m=-l_{1}}^{m=l_{1}}|I(\mathbf{x})\otimes\Psi_% {j_{1},l_{1}}^{m}(\bm{x})|^{2}\,,

(10)

where $q$ is a specified power governing operations on a target field. Choosing $q>1$ or $q<1$ highlights overdense or underdense regions, respectively, while $q=1$ represents the basic WST scenario. In our analysis, we consider all three cases: $q=0.5$ , $q=1$ , and $q=2$ . Fig. 3 presents these three statistics, derived from the same three simulation boxes as depicted in Fig. 2. For WST coefficients, only $q=0.5$ is displayed.
其中 $q$ 为作用于目标场的指定幂次运算。选择 $q>1$ 或 $q<1$ 分别突出显示过密或欠密区域，而 $q=1$ 代表基本 WST 场景。我们的分析涵盖了三种情况： $q=0.5$ 、 $q=1$ 与 $q=2$ 。图 3 展示了这三种统计量，其源自与图 2 相同的三个模拟箱体。对于 WST 系数，仅显示 $q=0.5$ 。

Finally, employing the principal component analysis (PCA) technique, an efficient compression scheme is utilized to retain most of the signal information encoded in the data while projecting out the noise-dominated modes. The original $\xi(r)$ has 266 bins, $P(k)$ has 243 bins, and $S_{1}$ , $S_{2}$ totally consist of 420 bins. Through PCA, each measurement statistic is compressed into a one-dimensional vector of 20 dimensions.
最终采用主成分分析（PCA）技术，通过高效压缩方案在保留数据中大部分信号信息的同时剔除噪声主导模式。原始 $\xi(r)$ 包含 266 个分箱， $P(k)$ 含 243 个分箱，而 $S_{1}$ 与 $S_{2}$ 共计 420 个分箱。经 PCA 处理后，每项测量统计量被压缩为 20 维的一维向量。

III Method 三、方法

To fully exploit the three-dimensional nature of the data, we employed a deep 3D convolutional network. After investigating several architectures, we propose a lightweight deep convolutional neural network (lCNN) that is efficient and present high performance in parameter estimation. Fig. 4 schematically depicts lCNN designed for determining cosmological parameters.
为充分利用数据的三维特性，我们采用了深度三维卷积网络。在考察多种架构后，提出一种轻量级深度卷积神经网络（lCNN），该网络在参数估计中兼具高效性与卓越性能。图 4 示意性展示了为测定宇宙学参数而设计的 lCNN 结构。

The network comprises three types of layers: 3D convolutions which is followed by batch normalization, max pooling layers,and fully connected layers. It begins with a $60^{3}$ -voxel input layer representing the density field. When incorporating the Fourier transform of the density, two extra channels are introduced to accommodate the amplitude and phase of the Fourier modes. Thus, we represent the dimension of the input data cube as $60^{3}\times n$ , where $n$ is for the number of channels utilized. Following this 4 convolutional layers are applied, and each is accompanied by batch normalization and a max-pooling layer with a kernel $(2,2,2)$ for dimensionality reduction. The size of kernel in convolutional layers is $(3,3,3)$ except the third, which is $(4,4,4)$ . Specifically, in the first two convolutional layers, we performed padding operations, which involve adding extra layers of zeros around the input data matrix before applying the convolution operation. The main purpose of padding is to preserve the spatial dimensions of the input volume Li et al. (2020). For comparison, we examined circular padding, which is more suitable for data with periodic boundary conditions. We found that the results did not show significant changes, as 97.7% to 98.1% of the pixel values in the halo density field are zero. For details, see Appendix B. After four 3D convolutions,the input data information are encoded by $128\times 2^{3}$ voxels, which are transited into a standard deep neural network after the flatten operation. Here, we introduce two new hidden layers, with 1324 and 128 neurons respectively, before concluding with a seven-neuron output layer. This output layer corresponds to the seven parameters that have been varied in the simulations. Additionally, when employing statistical measurements such as power spectra, 2PCF, and WST coefficients, each measurement originally has a dimension of 20. We then construct two fully connected layers to transform the dimension of each statistic to 100, which are concatenated with the output features of lCNN before passing them through the fully connected layers. It is worth mentioning that changing the number of convolutional layers and fully connected layers in the CNN does not have a significant impact on our results. This indicates that our results exhibit a certain robustness to the choice of neural network architecture. For specific details on this discussion, please refer to Appendix A.
该网络由三种类型的层构成：三维卷积层（后接批量归一化）、最大池化层以及全连接层。网络起始于一个 $60^{3}$ 体素输入层，用于表示密度场。当引入密度的傅里叶变换时，会额外增加两个通道以容纳傅里叶模的振幅和相位。因此，我们将输入数据立方体的维度表示为 $60^{3}\times n$ ，其中 $n$ 代表所使用的通道数。随后应用四个卷积层，每层均配有批量归一化及核大小为 $(2,2,2)$ 的最大池化层以实现降维。除第三层卷积核尺寸为 $(4,4,4)$ 外，其余卷积层核尺寸均为 $(3,3,3)$ 。特别地，在前两个卷积层中，我们执行了填充操作——即在卷积运算前向输入数据矩阵外围添加零值层，其主要目的是保持输入体积的空间维度（Li 等人，2020 年）。为对比效果，我们还考察了更适用于周期性边界条件数据的圆形填充方式，发现结果未出现显著变化（因晕密度场中 97.7%至 98.1%的像素值为零），详见附录 B。经过四次三维卷积运算后，输入数据信息被编码为 $128\times 2^{3}$ 个体素，经展平操作后转入标准深度神经网络。在此，我们引入了两个新的隐藏层，分别包含 1324 和 128 个神经元，最后接一个七神经元的输出层。该输出层对应模拟中变化的七个参数。此外，当采用功率谱、两点相关函数（2PCF）和小波散射变换（WST）系数等统计量时，每个统计量的原始维度为 20。我们随后构建两个全连接层，将各统计量的维度转换至 100，再与轻量级卷积神经网络（lCNN）的输出特征拼接后通过全连接层。值得一提的是，改变 CNN 中卷积层和全连接层的数量对我们的结果影响不大，这表明我们的结果对神经网络架构的选择具有一定鲁棒性。具体讨论细节请参阅附录 A。

Throughout the network, rectified linear unit (ReLU) activation functions are employed. For optimization, the Adam optimizer (Kingma and Ba, 2017) is utilized with a learning rate of $5\times 10^{-5}$ . For our machine learning task, we opted for the widely-used Mean Squared Error (MSE) loss function. This metric quantifies the average squared difference between predicted and true values, defined as
整个网络采用修正线性单元（ReLU）作为激活函数。优化过程使用 Adam 优化器（Kingma 和 Ba，2017），学习率为 $5\times 10^{-5}$ 。针对机器学习任务，我们选择了广泛使用的均方误差（MSE）损失函数。该指标通过计算预测值与真实值间平方差的平均值来量化误差，其定义为

{\rm MSE}=\frac{1}{N}\sum^{N}_{i=1}\left(y^{\rm pred}_{i}-y^{\rm true}_{i}% \right)^{2}

(11)

In the training and testing process, we employed a density field that has been interpolated into a $300^{3}$ grid using the CIC scheme, as previously mentioned. For training purposes, we divided a single $300^{3}$ data cube into $5^{3}$ sub-boxes, each with dimensions of $60^{3}$ . When combining the density field in Fourier space, which has dimensions of $60^{3}\times 2$ , the density and its Fourier transform were concatenated into a sub-box with dimensions of $60^{3}\times n$ , where $n=3$ . When utilizing only statistical measurements without incorporating density fields, parameter estimation is exclusively performed using the random forest network.
在训练与测试过程中，我们采用了如前所述的 CIC 方案将密度场插值至 $300^{3}$ 网格。为便于训练，我们将单个 $300^{3}$ 数据立方体划分为 $5^{3}$ 个子区域，每个子区域尺寸为 $60^{3}$ 。在傅里叶空间合并密度场时（其维度为 $60^{3}\times 2$ ），密度场及其傅里叶变换被拼接成维度为 $60^{3}\times n$ 的子区域，其中 $n=3$ 。当仅使用统计量而不结合密度场时，参数估计仅通过随机森林网络完成。

During training, for each epoch, we randomly selected a sub-box to feed into the neural network. For testing, all $5^{3}$ sub-boxes are fed into the neural network, and the predictions are averaged to estimate the cosmological model. The training and testing processes are schematically depicted in Fig. 5.
训练阶段，每个 epoch 随机选取一个子区域输入神经网络；测试阶段则将所有 $5^{3}$ 个子区域输入网络，通过预测结果的平均值来估计宇宙学模型。图 5 展示了训练与测试流程的示意图。

III.1 Evaluation Metrics 三.1 评估指标

Once the data is divided into training and test sets, we proceed to estimate the cosmological parameters for inputs outlined in subsequent sections. We evaluate the performance of each model through four approaches on the test set to quantify the results: 1) plotting the predicted values against the ground truth for the test set, quantified by the coefficient of determination $R^{2}$ , $R^{2}$ ranges from $0$ to $1$ , where $1$ represents perfect inference.; 2) calculating the averaged bias (Bias) for each parameter; 3) calculating the relative squared error (RSE) for each parameter; 4) calculating the Root Mean Square (RMSE) for each parameter. These quantities are defined as follows:
数据划分为训练集和测试集后，我们着手估算后续章节中所述输入的宇宙学参数。通过四种方法在测试集上评估各模型性能以量化结果：1)绘制测试集预测值与真实值的对比图，以决定系数 $R^{2}$ 量化， $R^{2}$ 的取值范围为 $0$ 至 $1$ ，其中 $1$ 代表完美推断；2)计算各参数的平均偏差（Bias）；3)计算各参数的相对平方误差（RSE）；4)计算各参数的均方根误差（RMSE）。这些量定义如下：

R^{2}=1-\frac{\sum_{i}\big{(}y_{i}^{\rm pred}-y_{i}^{\rm true}\big{)}^{2}}{% \sum_{i}\big{(}y_{i}^{\rm true}-\bar{y}^{\rm true}\big{)}^{2}}\,,

(12)

{\rm Bias}=\frac{1}{N}\sum_{i}\big{(}y_{i}^{\rm pred}-y_{i}^{\rm true}\big{)}\,,

(13)

{\rm RSE}=\frac{\sum_{i}\big{(}y_{i}^{\rm pred}-y_{i}^{\rm true}\big{)}^{2}}{% \sum_{i}\big{(}y_{i}^{\rm true}-\bar{y}^{\rm true}\big{)}^{2}}\,,

(14)

and 和

{\rm RMSE}=\sqrt{\frac{1}{N}\sum_{i}\big{(}y_{i}^{\rm pred}-y_{i}^{\rm true}% \big{)}^{2}}\,,

(15)

where the summation runs through the entire $N$ test samples, with the bar indicating the average. The $R^{2}$ quantifies the fraction by which the error variance is less than the true variance, while the RMSE provides an overall measure of the model’s prediction accuracy, with lower values indicating better performance. Similarly, RSE measures the relative error between predicted and true values by comparing the squared difference between them. On other hand, Bias denotes the systematic error of predictions from true values. A Bias close to zero indicates that, on average, the model is making predictions that are unbiased.
其中求和遍历全部 $N$ 个测试样本，上划线表示平均值。 $R^{2}$ 量化了误差方差小于真实方差的比例，而 RMSE 提供了模型预测准确性的整体度量，数值越低表现越好。类似地，RSE 通过比较预测值与真实值的平方差来衡量相对误差。另一方面，Bias 表示预测相对于真实值的系统误差，接近零的 Bias 意味着模型平均而言做出了无偏预测。

IV Results 四、结果分析

In this section, we present the results obtained from various models using different inputs, including the density field, its Fourier modes, the three statistical measurements, and their combinations. Additionally, we compare the performance of predictions made by the random forest model using statistical measurements alone.
本节展示了采用不同输入（包括密度场、其傅里叶模式、三项统计量及其组合）时，各类模型所获得的结果。此外，我们还对比了随机森林模型仅基于统计量进行预测的性能表现。

Five distinct models were devised to evaluate the optimal choice among different datasets as inputs, including:
为评估不同数据集作为输入时的最优选择，我们设计了五种不同的模型，具体包括：

1.

Model “ ${\rm CNN}(r)$ ”: utilizing solely the density field.

1. 模型“ ${\rm CNN}(r)$ ”：仅利用密度场数据。
2.

Model “CNN ( $r+k$ )”: incorporating the density field along with its Fourier modes.

2. 模型“CNN( $r+k$ )”：整合密度场及其傅里叶模态。
3.

Model “CNN( $r$ )+FC(statistics)”: employing the density field together with three statistics including $P(k)$ , $\xi(r)$ , and $S_{n}$ .

3. 模型“CNN( $r$ )+FC(统计量)”：结合密度场与三项统计量 $P(k)$ 、 $\xi(r)$ 及 $S_{n}$ 共同使用。
4.

Model “CNN( $r+k$ )+FC(statistics)”: combining the case “CNN ( $r+k$ )” with three statistics.

4. 模型“CNN( $r+k$ )+FC(统计量)”：将“CNN( $r+k$ )”案例与三项统计量相结合。
5.

Model “RF(statistics)”: using only the three measured statistics.

5. 模型“RF(统计量)”：仅采用三项测量统计量。

Here, “CNN” and “FC” represent the convolutional layers and fully connected layers of lCNN, while ”RF” represents the random forest network. For comparison, the model “FC(statistics)” incorporating the three statistics was trained using a fully connected network similar to the FC layers of our CNN, and “FC(Pk)” corresponds to the FC network trained using the statistics of the power spectrum alone.
此处，“CNN”与“FC”分别代表 lCNN 中的卷积层和全连接层，“RF”则指代随机森林网络。为便于比较，采用与 CNN 全连接层结构相似的纯全连接网络训练了包含三项统计量的模型“FC(statistics)”，而“FC(Pk)”对应仅基于功率谱统计量训练的全连接网络。

Fig. 6 displays loss curves against the number of epochs for the five different training sets. The blue and red curves represent the loss for the training and testing data sets, respectively, corresponding to 87.7% and 12.3% of the full dataset.
图 6 展示了五个不同训练集上损失随训练轮次变化的曲线。蓝色与红色曲线分别代表训练集和测试集的损失情况，其数据量各占完整数据集的 87.7%和 12.3%。

It can be observed that when using “CNN( $r$ )” alone, the loss function on the training set decreases slower compared to other cases, gradually converging to $0.77$ . Incorporating $k$ -space data or statistics has the effect of reducing loss on the test data, which is an indication of a modest improvement in performance. However, on the training data, there is a considerable reduction in loss, which suggests that the model has achieved its optimum generalisation point in the test set. In such cases, the lowest test set loss is typically taken to indicate the optimal performance of the model.
可以观察到，单独使用“CNN( $r$ )”时，训练集上的损失函数下降速度较其他情况更慢，最终逐渐收敛至 $0.77$ 。引入 $k$ 空间数据或统计量能有效降低测试数据上的损失，这表明模型性能得到了适度提升。然而在训练数据上，损失出现了显著下降，这暗示模型在测试集上已达到其最佳泛化点。此类情况下，通常以测试集损失最低点作为模型最优性能的判定标准。

For the case of “CNN( $r+k$ )+FC(statistics)”, the loss function drops to the lowest value of 0.57 at epoch about 200, outperforming all other cases. Importantly, comparing with the loss functions for the testing dataset, combining both fields and statistics results in the lowest loss values. Thus, as expected, this case demonstrates the best performance for parameter estimation, as we will demonstrate later. Additionally, we trained lCNN until achieving the lowest loss values for the test set.
在“CNN( $r+k$ )+FC(统计量)”案例中，损失函数于约 200 个训练周期时降至最低值 0.57，表现优于所有其他情况。关键的是，与测试数据集的损失函数相比，联合使用场量和统计量能得到最低的损失值。因此，正如预期那样，该案例展示了参数估计的最佳性能，后文将予以详述。此外，我们训练 lCNN 直至测试集损失值达到最低。

In Fig. 7, we present the actual predictions from our designed lCNN using the entire test dataset. The corresponding $R^{2}$ value for each case is also listed in each panel. For comparison, black curves are drawn to represent perfect parameter recovery ( $R^{2}=1$ ), where the correlation between the predicted and true values is 100%.
图 7 展示了我们设计的 lCNN 在整个测试数据集上的实际预测结果，各子图中同时标注了对应案例的 $R^{2}$ 值。为便于比较，图中用黑色曲线表示参数完全复原的理想情况( $R^{2}=1$ )，此时预测值与真实值的相关性为 100%。

The five panels in each row correspond to the five input models for fixed cosmological parameters. Results for the seven cosmological parameters are presented from top to bottom, respectively. The prediction accuracies for $\Omega_{m}$ , $\sigma_{8}$ and $h$ from lCNN are significantly higher compared to other parameters. Especially, the predicted $\Omega_{m}$ values closely match the ground truth (black lines), with relatively small scatters. As expected, overall, the model “CNN( $r+k$ )+FC(statistics)” demonstrates the best performance for parameter predictions among other models, evident from its highest average $R^{2}$ value. However, none of the five models perform well for predicting $\Omega_{b}$ , $M_{\nu}$ , and $w$ , with the highest $R^{2}$ values only reaching $0.152$ , $0.037$ , and $0.196$ , respectively. This is because these parameters do not visibly imprint unique features in LSS in our simulation mocks and also degenerate with other parameters.
每行中的五个面板对应于固定宇宙学参数下的五种输入模型。七种宇宙学参数的结果分别从上至下展示。lCNN 对 $\Omega_{m}$ 、 $\sigma_{8}$ 和 $h$ 的预测精度显著高于其他参数，尤其是预测的 $\Omega_{m}$ 值与真实值（黑线）高度吻合，离散程度相对较小。正如预期，总体而言，“CNN( $r+k$ )+FC(统计量)”模型在参数预测方面表现最佳，其平均 $R^{2}$ 值最高即为明证。然而，五种模型在预测 $\Omega_{b}$ 、 $M_{\nu}$ 和 $w$ 时均表现不佳，最高 $R^{2}$ 值仅分别达到 $0.152$ 、 $0.037$ 和 $0.196$ 。这是因为在我们的模拟数据中，这些参数未在 LSS 上留下明显独特特征，且与其他参数存在简并性。

Additionally, LSS is sensitive to the total density of $\Omega_{m}$ , rather than the relatively small quantity of $\Omega_{b}$ , as no baryonic feedback is considered in the cold DM simulations. Moreover, one snapshot at $z=0.5$ cannot effectively distinguish the different dark energy equations of state $w$ . Since both lCNN and the random forest fail to provide effective predictions for $\Omega_{b}$ , $M_{\nu}$ and $w$ , we do not display their results in the following.
此外，由于冷暗物质模拟中未考虑重子反馈效应，LSS 对 $\Omega_{m}$ 的总密度敏感，而非相对较少的 $\Omega_{b}$ 量。再者，单一 $z=0.5$ 时刻的快照难以有效区分不同的暗能量状态方程 $w$ 。鉴于 lCNN 与随机森林模型均未能对 $\Omega_{b}$ 、 $M_{\nu}$ 及 $w$ 提供有效预测，后续将不展示其相关结果。

CNN( $r$ ) CNN( $r$ ) CNN( $r+k$ ) CNN( $r+k$ ) CNN( $r$ )+FC(statistics)
卷积神经网络( $r$ )+全连接层(统计特征) CNN( $r+k$ )+FC(statistics)
卷积神经网络( $r+k$ )+全连接层(统计特征) RF (statistics) FC(statistics) $\Omega_{m}$ 0.013/0.047/0.161 0.023/0.041/0.126 0.005/0.040/0.118 0.014/0.039/0.113 -0.006/0.050/0.187 -0.009/0.050/0.182 $h$ 0.011/0.060/0.284 -0.039/0.073/0.429 0.004/0.063/0.314 -0.004/0.061/0.295 0.008 /0.077/0.454 0.004/0.067/0.361 $n_{s}$ 0.004/0.110/0.881 -0.009/0.099/0.717 -0.020/0.099/0.713 -0.020/0.091/0.610 -0.003/ 0.099/0.740 -0.011/0.095/0.657 $\sigma_{8}$ 0.017/0.072/0.440 -0.014/0.067/0.388 -0.0007/0.059/0.610 -0.001/0.059/0.293 -0.0003/ 0.078/0.582 0.007/0.069/0.396

Table 1: Summary of the measured evaluation metrics of Bias, RMSE, and RSE for the six models across four cosmological parameters. A small value for Bias, RMSE, or RSE indicates that the predictions made by the model are close to the true values, suggesting that the performance is relatively good.
表 1：六种模型在四个宇宙学参数上测得的偏差(Bias)、均方根误差(RMSE)及相对标准误差(RSE)指标汇总。偏差、均方根误差或相对标准误差值越小，表明模型预测结果越接近真实值，反映出模型性能相对较优。

In Tab. 1, the Bias, RMSE, and RSE metrics are presented for detailed comparison across the four cosmological parameters among the five models. The results agree with the $R^{2}$ values depicted in Fig. 7. The “CNN( $r+k$ )+FC(statistics)” model demonstrates the smallest RMSE and RSE values across almost all of these parameters, indicating high accuracy and small uncertainty estimation compared to other models. Notably, Bias values for all models closely match the true values within a $2\sigma$ level compared to RMSE, highlighting the robustness of the networks and negligible systematic errors.
表 1 详细对比了五种模型在四个宇宙学参数上的偏差、均方根误差和相对标准误差指标。该结果与图 7 所示的 $R^{2}$ 值一致。"CNN( $r+k$ )+FC(统计量)"模型在几乎所有参数上都展现出最小的均方根误差和相对标准误差，表明其相较于其他模型具有更高的精度和更小的不确定性估计。值得注意的是，所有模型的偏差值均与真实值高度吻合，在 $2\sigma$ 水平上优于均方根误差，凸显了神经网络架构的稳健性及可忽略的系统误差。

To emphasize the MSE values for different models, in Fig. 8, we display the relative MSE values compared to the model “RF(statistics)”. This is represented as the ratio of MSE for each model to that from the random forest network. Since our trained lCNN models are ineffective for $\Omega_{b}$ , $M_{\nu}$ , and $w$ , due to the much low $R^{2}$ values, we only compare MSE values of the four parameters ( $\Omega_{m}$ , $h$ , $n_{s}$ , $\sigma_{8}$ ) relative to that of model “RF(statistics)”. From this comparison, we observe that, except for the parameter $n_{s}$ with the model “CNN( $r$ )”, lCNN performs significantly better than with “RF(statistics)”. However, compared to random forests (RF), when training on statistical quantities using the same fully connected layers as our lCNN, denoted as the model “FC(statistics)”, there was an improvement in the performance of parameters $n_{s}$ , $h$ and $\sigma_{8}$ . Additionally, the combination of the density field and the Fourier modes, i.e., the model “CNN( $r+k$ )”, performs better than using the density field alone, corresponding to the model “CNN( $r$ )”, except for the parameter $h$ . Moreover, feeding the three measured statistics to lCNN further enhances the accuracy of prediction, effectively lowering the MSE values. The best performance is achieved for the model “CNN( $r+k$ )+FC(statistics)” (red line), reducing MSE by about 5–37% when compared with “FC(statistics)” model across the parameters.
为了突出不同模型的均方误差（MSE）值差异，图 8 展示了各模型相对于“RF(statistics)”基准的相对 MSE 比值。由于我们训练的 lCNN 模型对 $\Omega_{b}$ 、 $M_{\nu}$ 和 $w$ 参数效果不佳（其 $R^{2}$ 值过低），故仅比较四个参数（ $\Omega_{m}$ 、 $h$ 、 $n_{s}$ 、 $\sigma_{8}$ ）与基准模型的 MSE 比率。分析表明：除采用“CNN( $r$ )”模型的 $n_{s}$ 参数外，lCNN 在其他参数上的表现均显著优于“RF(statistics)”。与随机森林（RF）相比，当使用与 lCNN 相同的全连接层训练统计量（即“FC(statistics)”模型）时，参数 $n_{s}$ 、 $h$ 和 $\sigma_{8}$ 的预测性能有所提升。此外，结合密度场与傅里叶模态的“CNN( $r+k$ )”模型表现优于仅使用密度场的“CNN( $r$ )”模型（参数 $h$ 除外）。将三种统计量输入 lCNN 可进一步降低 MSE 值，其中“CNN( $r+k$ )+FC(statistics)”模型（红色曲线）表现最佳，相较“FC(statistics)”模型能使各参数 MSE 降低约 5%-37%。

To investigate the error of lCNN in prediction, we illustrate the joint distribution of each parameter pair and the histogram of each parameter in Fig. 9. Two models are presented for comparison: “CNN( $r+k$ )+FC(statistics)” and “FC(statistics)”.For clarity, the plots display the distributions of the errors of cosmological parameters, centered around the mean of the parameter space, i.e.
为探究 lCNN 的预测误差，图 9 展示了各参数对的联合分布及单参数直方图。对比模型为“CNN( $r+k$ )+FC(statistics)”和“FC(statistics)”。为清晰呈现，图中显示的是以参数空间均值为中心的标准化的宇宙学参数误差分布，即

p_{i}=\Delta p_{i}+\bar{p}^{\rm true}\,,~{}\quad{\rm with~{}}\Delta p_{i}=p_{i% }^{\rm pred}-p_{i}^{\rm true}

(16)

where $\bar{p}^{\rm true}$ denotes the averaged true value over all 210 test samples with varied cosmological parameters. Here $\Delta p_{i}$ denotes the bias for a given parameter predicted from the $i$ -th test sample.
其中 $\bar{p}^{\rm true}$ 表示 210 个不同宇宙学参数测试样本的真值均值， $\Delta p_{i}$ 代表第 $i$ 个测试样本的给定参数预测偏差。

In contrast to likelihood analysis, error estimation in machine learning requires a careful consideration. Jeffrey and Wandelt (2020) suggests defining parameter error using marginal flows and Moment Networks. Shridhar et al. (2018) introduces an uncertainty estimation method using Bayesian convolutional neural networks with variational inference. Another approach (Zhang et al., 2023), involves predicting error by sampling hidden variables. Alternatively, prediction accuracy (Lazanu, 2021) can provide an error estimation. In this study, we use a large sample of true parameter values. The distribution of the bias, $\Delta p_{i}$ , between the predicted and true values as defined in Eq. 16, yields a probability distribution of biases in parameter accuracy, which directly corresponds to our neural network’s error estimation. To test the robustness of these error estimates, we divided the test sample into four bins based on parameter values, as shown in Fig. 10. This approach allows us to estimate the bias distribution within specific parameter ranges and calculate the standard deviation, which serves as the error for each parameter range. Theoretically, using more bins increases the accuracy of the error estimate, and the standard deviation of the bias more closely approximates the true error. As shown in Fig. 10, the standard deviation of the bias distribution does not vary significantly with different parameter values. Additionally, these standard deviation values are essentially the same as the error values in Fig. 9, indicating that our error estimation is reasonable.
与似然分析不同，机器学习中的误差估计需谨慎考量。Jeffrey 和 Wandelt（2020）提出利用边际流和矩网络定义参数误差。Shridhar 等人（2018）则采用贝叶斯卷积神经网络结合变分推理进行不确定性估计。Zhang 等人（2023）另辟蹊径，通过采样隐变量预测误差。此外，Lazanu（2021）指出预测精度本身亦可作为误差估计依据。本研究采用大量真实参数值样本，通过式 16 定义的预测值与真实值间偏差 $\Delta p_{i}$ 的分布，得到参数准确度的偏差概率分布，该分布直接对应神经网络误差估计。为验证误差估计的稳健性，我们按参数值将测试样本划分为四个区间（图 10 所示），借此估算特定参数范围内的偏差分布并计算标准差，作为各参数区间的误差值。理论上，增加区间数量可提升误差估计精度，偏差标准差也更接近真实误差。如图 10 所示，偏差分布的标准差在不同参数值间未呈现显著变化。此外，这些标准差数值与图 9 中的误差值基本一致，表明我们的误差估计是合理的。

To further justify the validity of using this method for error estimation, we conducted a likelihood-based analysis on the statistics of the power spectrum alone. Initially, we computed the covariance matrix of $P(k)$ using 1000 Quijote realizations of the fiducial cosmology in the range of $k\in[0.05,0.6]~{}h\mathrm{Mpc}^{-1}$ . The likelihood can be constructed through $-2\ln\mathcal{L}(P(k)|\bm{\theta})\propto[P(k,\bm{\theta})-\bar{P}(k)]^{T}% \Sigma^{-1}[P(k,\bm{\theta})-\bar{P}(k)]$ , where $\bar{P}(k)$ is the “true” power, estimated from the mean spectrum of all mock realizations. The covariance $\Sigma$ is primarily sourced from the sampling variance of $P(k)$ in the mock data and was directly estimated from the Quijote fiducial-cosmology mock realizations.
为进一步验证采用此方法进行误差估计的有效性，我们基于似然性对功率谱统计量进行了独立分析。首先，我们利用 1000 组 Quijote 基准宇宙学模拟在 $k\in[0.05,0.6]~{}h\mathrm{Mpc}^{-1}$ 范围内计算了 $P(k)$ 的协方差矩阵。似然函数可通过 $-2\ln\mathcal{L}(P(k)|\bm{\theta})\propto[P(k,\bm{\theta})-\bar{P}(k)]^{T}% \Sigma^{-1}[P(k,\bm{\theta})-\bar{P}(k)]$ 构建，其中 $\bar{P}(k)$ 代表“真实”功率，由所有模拟实现的平均谱估计得出。协方差 $\Sigma$ 主要来源于模拟数据中 $P(k)$ 的采样方差，并直接基于 Quijote 基准宇宙学模拟实现进行估计。

We performed our likelihood evaluation using the Monte Carlo Markov Chain (MCMC) method, utilizing the emcee package (Foreman-Mackey et al., 2013). In the MCMC process, the $P(k,\bm{\theta})$ data for a given cosmological parameter set were derived from the 1710 cosmologies by performing the nearest interpolation in high-dimensional space, which enables us to estimate $P(k,\bm{\theta})$ for any sampled point in the parameter space. The cosmological constraints from $P(k)$ alone, based on the likelihood inference, are summarized in Fig. 9. As seen, the $P(k)$ -alone-derived 1 $\sigma$ statistical errors are 0.069 for $\Omega_{m}$ , 0.085 for $h$ , 0.090 for $n_{s}$ , and 0.091 for $\sigma_{8}$ . Thus, the constraints are weaker than those from lCNN, especially for $\Omega_{m}$ and $\sigma_{8}$ .
我们采用蒙特卡洛马尔可夫链（MCMC）方法进行似然评估，并利用 emcee 包（Foreman-Mackey 等人，2013 年）。在 MCMC 过程中，给定宇宙学参数集的 $P(k,\bm{\theta})$ 数据通过在高维空间执行最近邻插值从 1710 个宇宙学模型中导出，这使我们能够估算参数空间中任意采样点的 $P(k,\bm{\theta})$ 。基于似然推断，仅从 $P(k)$ 获得的宇宙学约束总结于图 9。如图所示，单独由 $P(k)$ 导出的 1 $\sigma$ 统计误差分别为： $\Omega_{m}$ 为 0.069， $h$ 为 0.085， $n_{s}$ 为 0.090， $\sigma_{8}$ 为 0.091。因此，这些约束弱于 lCNN 所得结果，尤其对于 $\Omega_{m}$ 和 $\sigma_{8}$ 而言。

The two-dimensional contour plots illustrate the joint probability distribution at 68% and 95% levels, respectively, providing information about the correlation between each pair of parameters. Meanwhile, the one-dimensional distribution displays the marginalized probability of each parameter. As observed, for each parameter, the model “CNN ( $r+k$ )+ FC(statistics)” offers a more sharply marginalized probability distribution than that for “FC(statistics)”. In other words, the former provides more accurate predictions on the cosmological parameters. This finding is further confirmed by the contour plots.
二维等高线图分别展示了 68%和 95%置信水平下的联合概率分布，揭示了各参数对之间的相关性。与此同时，一维分布显示了各参数的边缘化概率分布。如图所示，对于每个参数，模型“CNN( $r+k$ )+FC(statistics)”比“FC(statistics)”具有更尖锐的边缘化概率分布。换言之，前者能对宇宙学参数作出更精确的预测。这一结论在等高线图中得到了进一步验证。

The contour area corresponds to the statistical uncertainty. As observed, the scatters of the difference between the predicted values and the true ones for all test datasets, derived from lCNN are considerably smaller than those from the case “FC(statistics)”. Both the centers of the two-dimensional contour and the one-dimensional distribution for each parameter are close to the averaged true value (gray dashed), and the deviation is significantly smaller than the statistical uncertainty. In particular, the mean and the associated standard deviation, $\sigma_{p}$ , derived from the model “CNN(r+k)+FC(statistics)” for the marginalized distribution are listed at the top of each one-dimensional plot. As observed, all the mean values closely agree with the true ones. The deviation for “CNN( $r+k$ )+FC(statistics)” from the averaged true value is 2.8% for $\Omega_{m}$ , 2.4% for $h$ , 2.5% for $n_{s}$ , and 0.6% for $\sigma_{8}$ , respectively. In comparison with using the model “FC(statistics)”, lCNN yields smaller statistical errors. Specifically, it is reduced by 23% for $\Omega_{m}$ , 11% for $h$ , 8% for $n_{s}$ and 21% for $\sigma_{8}$ , respectively. In comparison with using the likelihood-based analysis on $P(k)$ data, our lCNN performs much tighter constraints on parameters, especially on $\Omega_{m}$ and $\sigma_{8}$ .
等高线区域对应统计不确定性。如图所示，lCNN 对所有测试数据集预测值与真实值差异的离散度显著小于“FC(statistics)”案例。无论是二维等高线中心还是一维参数分布，均接近平均真实值（灰色虚线），且偏差明显小于统计不确定性。特别地，在每幅一维分布图顶部列出了“CNN(r+k)+FC(statistics)”模型得到的边缘化分布均值及其标准偏差 $\sigma_{p}$ 。观测表明，所有均值与真实值高度吻合。“CNN( $r+k$ )+FC(statistics)”相对于平均真实值的偏差分别为： $\Omega_{m}$ 偏差 2.8%、 $h$ 偏差 2.4%、 $n_{s}$ 偏差 2.5%、 $\sigma_{8}$ 偏差 0.6%。与使用“FC(statistics)”模型相比，lCNN 产生的统计误差更小，具体表现为： $\Omega_{m}$ 误差降低 23%、 $h$ 降低 11%、 $n_{s}$ 降低 8%、 $\sigma_{8}$ 降低 21%。相较于基于 $P(k)$ 数据的似然分析，我们的 lCNN 对参数（尤其是 $\Omega_{m}$ 和 $\sigma_{8}$ ）施加了更强的约束。

We also observe that there is almost no correlation between the predicted parameters overall, although weak correlations exist between certain parameter pairs, such as $n_{s}$ and $h$ . Since the parameters in the test sample were randomly generated and uncorrelated, there should be no significant correlation between the values of the parameters predicted by ICNN. The results in the test sample meet our expectations, demonstrating that our ICNN model can achieve high accuracy in parameter prediction with statistical errors that are smaller than that of the conventional likelihood analysis based on the power-spectrum statistics alone.
我们还观察到，尽管某些参数对（如 $n_{s}$ 和 $h$ ）之间存在微弱相关性，但预测参数总体上几乎不存在相关性。由于测试样本中的参数是随机生成且无关联的，ICNN 预测的参数值之间不应存在显著相关性。测试样本的结果符合我们的预期，表明我们的 ICNN 模型在参数预测上能够实现高精度，其统计误差小于仅基于功率谱统计的传统似然分析。

V Concluding Remarks 五、结论性评述

In this study, we have designed a lightweight deep convolutional neural network, lCNN, aimed at estimating cosmological parameters from simulated three-dimensional DM halo number density field and associated statistics. Our training dataset consists of 2000 realizations of a cubic box with a side length of 1000 $h^{-1}{\rm Mpc}$ , each sampled with $512^{3}$ DM particles and $512^{3}$ neutrinos interpolated over a cubic grid of $300^{3}$ voxels. Under the flat $\Lambda$ CDM model, simulations vary the standard six cosmological parameters, including $\Omega_{m}$ , $\Omega_{b}$ , $h$ , $n_{s}$ , $\sigma_{8}$ , $w$ , along with the neutrino mass sum, $M_{\nu}$ .
在本研究中，我们设计了一个轻量级的深度卷积神经网络 lCNN，旨在从模拟的三维暗物质晕数量密度场及其相关统计量中估计宇宙学参数。我们的训练数据集包含 2000 个边长为 1000 $h^{-1}{\rm Mpc}$ 的立方体模拟，每个模拟使用 $512^{3}$ 个暗物质粒子和 $512^{3}$ 个中微子在 $300^{3}$ 个体素立方网格上进行插值。在平坦的 $\Lambda$ CDM 模型下，模拟变化了六个标准宇宙学参数，包括 $\Omega_{m}$ 、 $\Omega_{b}$ 、 $h$ 、 $n_{s}$ 、 $\sigma_{8}$ 、 $w$ ，以及中微子质量总和 $M_{\nu}$ 。

Seven distinct models have been considered to assess the optimal input datasets, including: “CNN( $r$ )”, which utilizes solely the density field; “CNN( $r+k$ )”, incorporating both density field and its Fourier modes; “CNN( $r$ )+FC(statistics)”, employing the density field along with three statistics (i.e., the halo density power spectrum, the 2PCF, and the WST coefficients); “CNN( $r+k$ )+FC(statistics)”, combining “CNN( $r+k$ )” with three statistics; “RF(statistics)”, utilizing the random forest neural network trained solely with the three measured statistics, for comparison with the lCNN;“FC(statistics)” and FC(Pk), utilizing the fully connected neural network trained solely with the three measured statistics and $P(k)$ respectively.
为评估最优输入数据集，我们考虑了七种不同的模型，包括：“CNN( $r$ )”，仅利用密度场；“CNN( $r+k$ )”，结合密度场及其傅里叶模式；“CNN( $r$ )+FC(statistics)”，采用密度场及三项统计量（即晕密度功率谱、两点相关函数和 WST 系数）；“CNN( $r+k$ )+FC(statistics)”，将“CNN( $r+k$ )”与三项统计量结合；“RF(statistics)”，使用仅基于三项测量统计量训练的随机森林神经网络，用于与 lCNN 比较；“FC(statistics)”和 FC(Pk)，分别利用全连接神经网络仅基于三项测量统计量和 $P(k)$ 进行训练。

Our findings reveal several key insights: 1) within the framework of lCNN, extracting LSS information is more efficient from the halo density field compared to relying on statistical quantities including the power spectrum, 2PCF, and WST coefficients; 2) combining the halo density field with its Fourier-transformed counterpart enhances predictions, and augmenting the training dataset with measured statistics further improves performance; 3) the neural network model achieves high accuracy in inferring $\Omega_{m}$ , $h$ , and $\sigma_{8}$ , while showing inefficiency in predicting $\Omega_{b}$ , $n_{s}$ , $M_{\nu}$ , and $w$ ; 4) moreover, compared to the simple fully connected network trained with three statistical quantities, our proposed lCNN model yields high prediction accuracy in the parameters and provides smaller statistical errors, reducing the errors by about 23% for $\Omega_{m}$ , 11% for $h$ , 8% for $n_{s}$ , and 21% for $\sigma_{8}$ , respectively; 5) Compared to the likelihood-based analysis of the $P(k)$ data, our lCNN achieves significantly tighter constraints on parameters, particularly on $\Omega_{m}$ , $h$ , and $\sigma_{8}$ , reducing them by 46%, 29%, and 40%, respectively.
我们的研究发现揭示了几个关键洞见：1)在 lCNN 框架内，相较于依赖包括功率谱、两点相关函数（2PCF）和小波散射变换（WST）系数在内的统计量，从晕密度场中提取 LSS 信息更为高效；2)将晕密度场与其傅里叶变换版本结合能增强预测效果，而通过实测统计量扩充训练数据集可进一步提升性能；3)神经网络模型在推断 $\Omega_{m}$ 、 $h$ 和 $\sigma_{8}$ 时达到高精度，但在预测 $\Omega_{b}$ 、 $n_{s}$ 、 $M_{\nu}$ 和 $w$ 时表现不佳；4)此外，与使用三种统计量训练的简单全连接网络相比，我们提出的 lCNN 模型在参数预测上具有更高的准确性，并提供更小的统计误差，分别将 $\Omega_{m}$ 、 $h$ 、 $n_{s}$ 和 $\sigma_{8}$ 的误差降低了约 23%、11%、8%和 21%；5)与基于 $P(k)$ 数据的似然分析相比，我们的 lCNN 对参数尤其是 $\Omega_{m}$ 、 $h$ 和 $\sigma_{8}$ 的约束显著更紧，分别减少了 46%、29%和 40%。

Note that the previous constraints (Villaescusa-Navarro et al., 2020; Massara et al., 2020), especially for the neutrino mass sum, $M_{\nu}$ are considerably tighter than those derived here. This is because their estimations are optimal for two primary reasons: i) they are derived from the total matter density power spectrum, which cannot be directly or accurately measured from real observations. In contrast, our observable is the spatial distribution of halos and the associated statistics. 2) To more accurately reflect real observations, we have incorporated the effects of RSD, coordinate mapping from fiducial cosmology, and fixed the halo number density in our dataset. These additional observational effects included in our mock data have the potential to significantly weaken the cosmological parameter constraints.
需要注意的是，先前的研究（Villaescusa-Navarro 等人，2020；Massara 等人，2020）所设定的限制条件，尤其是针对中微子质量总和的约束 $M_{\nu}$ ，比本文得出的结果要严格得多。这主要归因于两点优化因素：其一，他们的估算基于总物质密度功率谱，而该谱无法通过实际观测直接或精确测得；相比之下，我们的观测对象是晕的空间分布及其相关统计量。其二，为更真实地模拟实际观测，我们在数据集中引入了红移空间畸变(RSD)效应、基准宇宙学坐标映射，并固定了晕数量密度。这些纳入模拟数据的额外观测效应，可能会显著削弱对宇宙学参数的约束力。

Machine learning is highly effective at analyzing complex features in complicated datasets. From this perspective, a limitation of our study is that our training samples are composed of sparse halo fields with a low number density of $2\times 10^{-4}$ . Consequently, many small-scale structures and clustering details are not captured in such sparse fields. A promising direction for future investigation would be to increase the number density by one to two orders of magnitude to better mimic the observational data from stage-IV surveys. In such scenarios, we expect that machine learning could significantly enhance performance and offer substantial advantages over traditional statistical methods.
机器学习在分析复杂数据集中的精细特征方面表现出色。从这个角度看，本研究的一个局限在于训练样本由稀疏的晕场构成，其 $2\times 10^{-4}$ 数密度较低。因此，这类稀疏场未能捕捉到许多小尺度结构及聚类细节。未来研究的一个可行方向是将数密度提高一到两个数量级，以更好地模拟第四阶段巡天的观测数据。在此类场景下，我们预期机器学习能显著提升性能，并较传统统计方法展现出巨大优势。

In future work, we intend to evaluate the ability of the network to predict cosmological parameters from light-cone simulations, and finally, apply it to real observational data.
在后续工作中，我们计划评估该网络从光锥模拟中预测宇宙学参数的能力，并最终将其应用于实际观测数据。

Acknowledgements. 致谢。

We thank Francisco Villaescusa-Navarro and Yin Li for helpful discussions. This work is supported by National SKA Program of China (2020SKA0110401, 2020SKA0110402, 2020SKA0110100), the National Key R&D Program of China (2020YFC2201600, 2018YFA0404504, 2018YFA0404601), the National Science Foundation of China (11890691, 12203107, 12073088, 12373005), the China Manned Space Project with No. CMS-CSST-2021 (A02, A03, B01), the Guangdong Basic and Applied Basic Research Foundation (2019A1515111098), and the 111 project of the Ministry of Education No. B20019. We also wish to acknowledge the Beijing Super Cloud Center (BSCC) and Beijing Beilong Super Cloud Computing Co., Ltd (http://www.blsc.cn/) for providing HPC resources that have significantly contributed to the research results presented in this paper.
我们感谢 Francisco Villaescusa-Navarro 和李因有益的讨论。本工作得到了中国平方公里阵列射电望远镜(SKA)专项(2020SKA0110401, 2020SKA0110402, 2020SKA0110100)、国家重点研发计划(2020YFC2201600, 2018YFA0404504, 2018YFA0404601)、国家自然科学基金(11890691, 12203107, 12073088, 12373005)、中国载人航天工程空间科学项目(编号 CMS-CSST-2021-A02, A03, B01)、广东省基础与应用基础研究基金(2019A1515111098)以及教育部 111 计划(编号 B20019)的支持。同时，我们感谢北京超级云计算中心(BSCC)和北京北龙超级云计算有限责任公司(http://www.blsc.cn/)为本研究提供的高性能计算资源，这些资源对本文研究成果的取得起到了重要作用。

References 参考文献

Weinberg (1989) S. Weinberg, Reviews of modern physics 61, 1 (1989).
S. Weinberg，《现代物理评论》61 卷，第 1 期(1989 年)。
Peebles and Ratra (2003) 皮布尔斯与拉特拉（2003 年） P. J. E. Peebles and B. Ratra, Reviews of modern physics 75, 559 (2003).
P·J·E·皮布尔斯与 B·拉特拉，《现代物理评论》75 卷，559 页（2003 年）。
Li et al. (2011) 李等人（2011 年） M. Li, X.-D. Li, S. Wang, and Y. Wang, Communications in theoretical physics 56, 525 (2011).
李 M、李 X-D、王 S 与王 Y，《理论物理通讯》56 卷，525 页（2011 年）。
Bardeen et al. (1986) 巴丁等人（1986 年） J. M. Bardeen, J. Bond, N. Kaiser, and A. Szalay, Astrophysical Journal, Part 1 (ISSN 0004-637X), vol. 304, May 1, 1986, p. 15-61. SERC-supported research. 304, 15 (1986).
J·M·巴丁、J·邦德、N·凯泽与 A·萨拉伊，《天体物理学杂志》第一部分（ISSN 0004-637X），第 304 卷，1986 年 5 月 1 日，第 15-61 页。英国科学与工程研究委员会资助研究。304, 15 (1986)。
De Lapparent et al. (1986)
德拉帕朗等人（1986 年） V. De Lapparent, M. J. Geller, and J. P. Huchra, Astrophysical Journal, Part 2-Letters to the Editor (ISSN 0004-637X), vol. 302, March 1, 1986, p. L1-L5. Research supported by the Smithsonian Institution. 302, L1 (1986).
V·德拉帕朗、M·J·盖勒与 J·P·胡克拉，《天体物理学杂志》第二部分-致编辑的信（ISSN 0004-637X），第 302 卷，1986 年 3 月 1 日，第 L1-L5 页。史密森学会资助研究。302, L1 (1986)。
Huchra et al. (2012) 胡克拉等人（2012 年） J. P. Huchra, L. M. Macri, K. L. Masters, T. H. Jarrett, P. Berlind, M. Calkins, A. C. Crook, R. Cutri, P. Erdoğdu, E. Falco, et al., The Astrophysical Journal Supplement Series 199, 26 (2012).
J·P·胡克拉、L·M·麦克里、K·L·马斯特斯、T·H·贾勒特、P·伯林德、M·卡尔金斯、A·C·克鲁克、R·卡特里、P·埃尔多杜、E·法尔科等，《天体物理学杂志增刊》第 199 卷，第 26 页（2012 年）。
Tegmark et al. (2004) 泰格马克等人（2004 年） M. Tegmark, M. R. Blanton, M. A. Strauss, F. Hoyle, D. Schlegel, R. Scoccimarro, M. S. Vogeley, D. H. Weinberg, I. Zehavi, A. Berlind, et al., The Astrophysical Journal 606, 702 (2004).
M·泰格马克、M·R·布兰顿、M·A·斯特劳斯、F·霍伊尔、D·施莱格尔、R·斯科奇马罗、M·S·沃格利、D·H·温伯格、I·泽哈维、A·伯林德等，《天体物理学杂志》第 606 卷，第 702 页（2004 年）。
Guzzo et al. (2014) Guzzo 等人（2014 年） L. Guzzo, M. Scodeggio, B. Garilli, B. Granett, A. Fritz, U. Abbas, C. Adami, S. Arnouts, J. Bel, M. Bolzonella, et al., Astronomy & Astrophysics 566, A108 (2014).
L. Guzzo, M. Scodeggio, B. Garilli, B. Granett, A. Fritz, U. Abbas, C. Adami, S. Arnouts, J. Bel, M. Bolzonella 等，《天文学与天体物理学》566 卷，A108（2014 年）。
Zhong et al. (2024) Zhong 等人（2024 年） K. Zhong, M. Gatti, and B. Jain, Improving convolutional neural networks for cosmological fields with random permutation (2024), eprint 2403.01368.
K. Zhong, M. Gatti 与 B. Jain，《通过随机排列改进宇宙学场卷积神经网络》（2024 年），预印本编号 2403.01368。
Kaiser (1987) 凯泽（1987 年） N. Kaiser, Monthly Notices of the Royal Astronomical Society 227, 1 (1987).
N·凯泽，《皇家天文学会月报》第 227 卷，第 1 页（1987 年）。
Ballinger et al. (1996) 巴林杰等人（1996 年） W. Ballinger, J. Peacock, and A. Heavens, Monthly Notices of the Royal Astronomical Society 282, 877 (1996).
W·巴林杰、J·皮科克与 A·希文斯，《皇家天文学会月报》第 282 卷，第 877 页（1996 年）。
Eisenstein et al. (1998) 艾森斯坦等人（1998 年） D. J. Eisenstein, W. Hu, and M. Tegmark, The Astrophysical Journal 504, L57 (1998).
D·J·艾森斯坦、W·胡与 M·泰格马克，《天体物理学杂志》504 卷，L57 页（1998 年）。
Blake and Glazebrook (2003)
Blake 和 Glazebrook (2003) C. Blake and K. Glazebrook, The Astrophysical Journal 594, 665 (2003).
C. Blake 与 K. Glazebrook，《天体物理学杂志》594 卷，665 页（2003 年）。
Seo and Eisenstein (2003)
Seo 与 Eisenstein（2003 年） H.-J. Seo and D. J. Eisenstein, The Astrophysical Journal 598, 720 (2003).
H.-J. Seo 与 D. J. Eisenstein，《天体物理学杂志》598 卷，720 页（2003 年）。
Colless et al. (2003) Colless 等（2003 年） M. Colless, B. A. Peterson, C. Jackson, J. A. Peacock, S. Cole, P. Norberg, I. K. Baldry, C. M. Baugh, J. Bland-Hawthorn, T. Bridges, et al., arXiv e-prints astro-ph/0306581 (2003), eprint astro-ph/0306581.
M. Colless、B. A. Peterson、C. Jackson、J. A. Peacock、S. Cole、P. Norberg、I. K. Baldry、C. M. Baugh、J. Bland-Hawthorn、T. Bridges 等，《arXiv 电子预印本》astro-ph/0306581（2003 年），电子版编号 astro-ph/0306581。
Beutler et al. (2011) Beutler 等人（2011 年） F. Beutler, C. Blake, M. Colless, D. H. Jones, L. Staveley-Smith, L. Campbell, Q. Parker, W. Saunders, and F. Watson, Monthly Notices of the Royal Astronomical Society 416, 3017–3032 (2011), ISSN 0035-8711, URL http://dx.doi.org/10.1111/j.1365-2966.2011.19250.x.
F. Beutler、C. Blake、M. Colless、D. H. Jones、L. Staveley-Smith、L. Campbell、Q. Parker、W. Saunders 及 F. Watson，《皇家天文学会月报》416 卷，3017–3032 页（2011 年），ISSN 0035-8711，网址 http://dx.doi.org/10.1111/j.1365-2966.2011.19250.x。
Riemer–Sørensen et al. (2012)
Riemer–Sørensen 等人（2012 年） S. Riemer–Sørensen, C. Blake, D. Parkinson, T. M. Davis, S. Brough, M. Colless, C. Contreras, W. Couch, S. Croom, D. Croton, et al., Physical Review D 85 (2012), ISSN 1550-2368, URL http://dx.doi.org/10.1103/PhysRevD.85.081101.
S. Riemer–Sørensen、C. Blake、D. Parkinson、T. M. Davis、S. Brough、M. Colless、C. Contreras、W. Couch、S. Croom、D. Croton 等，《物理评论 D》85 卷（2012 年），ISSN 1550-2368，网址 http://dx.doi.org/10.1103/PhysRevD.85.081101。
York et al. (2000) York 等（2000 年） D. G. York, J. Adelman, J. E. Anderson Jr, S. F. Anderson, J. Annis, N. A. Bahcall, J. Bakken, R. Barkhouser, S. Bastian, E. Berman, et al., The Astronomical Journal 120, 1579 (2000).
D. G. York、J. Adelman、J. E. Anderson Jr、S. F. Anderson、J. Annis、N. A. Bahcall、J. Bakken、R. Barkhouser、S. Bastian、E. Berman 等，《天文期刊》120 卷，1579 页（2000 年）。
Eisenstein et al. (2005) Eisenstein 等（2005 年） D. J. Eisenstein, I. Zehavi, D. W. Hogg, R. Scoccimarro, M. R. Blanton, R. C. Nichol, R. Scranton, H.-J. Seo, M. Tegmark, Z. Zheng, et al., The Astrophysical Journal 633, 560 (2005).
D. J. 艾森斯坦、I. 泽哈维、D. W. 霍格、R. 斯科奇马罗、M. R. 布兰顿、R. C. 尼科尔、R. 斯克兰顿、H.-J. 徐、M. 泰格马克、Z. 郑等，《天体物理学杂志》633 卷，560 页（2005 年）。
Percival et al. (2007) 珀西瓦尔等人（2007 年） W. J. Percival, S. Cole, D. J. Eisenstein, R. C. Nichol, J. A. Peacock, A. C. Pope, and A. S. Szalay, Monthly Notices of the Royal Astronomical Society 381, 1053 (2007).
W. J. 珀西瓦尔、S. 科尔、D. J. 艾森斯坦、R. C. 尼科尔、J. A. 皮科克、A. C. 波普、A. S. 萨莱，《皇家天文学会月报》381 卷，1053 页（2007 年）。
Anderson et al. (2014) 安德森等人（2014 年） L. Anderson, E. Aubourg, S. Bailey, F. Beutler, V. Bhardwaj, M. Blanton, A. S. Bolton, J. Brinkmann, J. R. Brownstein, A. Burden, et al., Monthly Notices of the Royal Astronomical Society 441, 24 (2014).
L. 安德森、E. 奥伯格、S. 贝利、F. 博伊特勒、V. 巴德瓦杰、M. 布兰顿、A. S. 博尔顿、J. 布林克曼、J. R. 布朗斯坦、A. 伯登等，《皇家天文学会月报》441 卷，24 页（2014 年）。
Samushia et al. (2014) 萨穆希亚等人（2014 年） L. Samushia, B. A. Reid, M. White, W. J. Percival, A. J. Cuesta, G.-B. Zhao, A. J. Ross, M. Manera, É. Aubourg, F. Beutler, et al., Monthly Notices of the Royal Astronomical Society 439, 3504 (2014).
L. 萨穆希亚、B. A. 里德、M. 怀特、W. J. 珀西瓦尔、A. J. 奎斯塔、G.-B. 赵、A. J. 罗斯、M. 曼尼拉、É. 奥伯格、F. 博伊特勒等，《皇家天文学会月报》439 卷，3504 页（2014 年）。
Ross et al. (2015) 罗斯等人（2015 年） A. J. Ross, L. Samushia, C. Howlett, W. J. Percival, A. Burden, and M. Manera, Monthly Notices of the Royal Astronomical Society 449, 835 (2015).
A. J. Ross, L. Samushia, C. Howlett, W. J. Percival, A. Burden, 和 M. Manera，《皇家天文学会月报》449 卷，835 页（2015 年）。
Beutler et al. (2017) Beutler 等人（2017 年） F. Beutler, H.-J. Seo, S. Saito, C.-H. Chuang, A. J. Cuesta, D. J. Eisenstein, H. Gil-Marín, J. N. Grieb, N. Hand, F.-S. Kitaura, et al., Monthly Notices of the Royal Astronomical Society 466, 2242 (2017).
F. Beutler、H.-J. Seo、S. Saito、C.-H. Chuang、A. J. Cuesta、D. J. Eisenstein、H. Gil-Marín、J. N. Grieb、N. Hand、F.-S. Kitaura 等，《皇家天文学会月报》466 卷，2242 页（2017 年）。
Sánchez et al. (2017) Sánchez 等人（2017 年） A. G. Sánchez, R. Scoccimarro, M. Crocce, J. N. Grieb, S. Salazar-Albornoz, C. D. Vecchia, M. Lippich, F. Beutler, J. R. Brownstein, C.-H. Chuang, et al., Monthly Notices of the Royal Astronomical Society 464, 1640 (2017).
A. G. Sánchez, R. Scoccimarro, M. Crocce, J. N. Grieb, S. Salazar-Albornoz, C. D. Vecchia, M. Lippich, F. Beutler, J. R. Brownstein, C.-H. Chuang 等，《皇家天文学会月报》464 卷，1640 页（2017 年）。
Alam et al. (2017a) Alam 等（2017a） S. Alam, M. Ata, S. Bailey, F. Beutler, D. Bizyaev, J. A. Blazek, A. S. Bolton, J. R. Brownstein, A. Burden, C.-H. Chuang, et al., Monthly Notices of the Royal Astronomical Society 470, 2617 (2017a).
S. Alam, M. Ata, S. Bailey, F. Beutler, D. Bizyaev, J. A. Blazek, A. S. Bolton, J. R. Brownstein, A. Burden, C.-H. Chuang 等，《皇家天文学会月报》470 卷，2617 页（2017a 年）。
Chuang et al. (2017) Chuang 等（2017 年） C.-H. Chuang, M. Pellejero-Ibanez, S. Rodriguez-Torres, A. J. Ross, G.-b. Zhao, Y. Wang, A. J. Cuesta, J. Rubiño-Martín, F. Prada, S. Alam, et al., Monthly Notices of the Royal Astronomical Society 471, 2370 (2017).
庄志宏、M. 佩列赫罗-伊巴涅斯、S. 罗德里格斯-托雷斯、A. J. 罗斯、赵刚兵、王宇、A. J. 奎斯塔、J. 鲁比尼奥-马丁、F. 普拉达、S. 阿拉姆等，《皇家天文学会月报》471 卷，2370 页（2017 年）。
Neveux et al. (2020) 内沃等人（2020 年） R. Neveux, E. Burtin, A. de Mattia, A. Smith, A. J. Ross, J. Hou, J. Bautista, J. Brinkmann, C.-H. Chuang, K. S. Dawson, et al., Monthly Notices of the Royal Astronomical Society 499, 210–229 (2020), ISSN 1365-2966, URL http://dx.doi.org/10.1093/mnras/staa2780.
R. 内沃、E. 伯廷、A. 德马蒂亚、A. 史密斯、A. J. 罗斯、侯杰、J. 包蒂斯塔、J. 布林克曼、庄志宏、K. S. 道森等，《皇家天文学会月报》499 卷，210-229 页（2020 年），ISSN 1365-2966，网址 http://dx.doi.org/10.1093/mnras/staa2780。
Sabiu et al. (2016) 萨比乌等人（2016 年） C. G. Sabiu, D. F. Mota, C. Llinares, and C. Park, Astronomy & Astrophysics 592, A38 (2016).
C. G. Sabiu、D. F. Mota、C. Llinares 与 C. Park，《天文学与天体物理学》592 卷，A38 页（2016 年）。
Slepian et al. (2017) Slepian 等人（2017 年） Z. Slepian, D. J. Eisenstein, F. Beutler, C.-H. Chuang, A. J. Cuesta, J. Ge, H. Gil-Marín, S. Ho, F.-S. Kitaura, C. K. McBride, et al., Monthly Notices of the Royal Astronomical Society 468, 1070 (2017).
Z. Slepian、D. J. Eisenstein、F. Beutler、C.-H. Chuang、A. J. Cuesta、J. Ge、H. Gil-Marín、S. Ho、F.-S. Kitaura、C. K. McBride 等，《皇家天文学会月刊》468 卷，1070 页（2017 年）。
Sabiu et al. (2019) Sabiu 等人（2019 年） C. G. Sabiu, B. Hoyle, J. Kim, and X.-D. Li, The Astrophysical Journal Supplement Series 242, 29 (2019).
C. G. Sabiu、B. Hoyle、J. Kim 与 X.-D. Li，《天体物理学杂志增刊系列》242 卷，29 页（2019 年）。
Lavaux and Wandelt (2012)
Lavaux 和 Wandelt（2012 年） G. Lavaux and B. D. Wandelt, The Astrophysical Journal 754, 109 (2012).
G. Lavaux 与 B. D. Wandelt，《天体物理学报》754 卷，109 页（2012 年）。
Li et al. (2017) 李等人（2017 年） X.-D. Li, C. Park, C. G. Sabiu, H. Park, C. Cheng, J. Kim, and S. E. Hong, The Astrophysical Journal 844, 91 (2017).
李晓东、朴成旭、Sabiu C. G.、朴铉基、程诚、金健、Hong S. E.，《天体物理学报》844 卷，91 页（2017 年）。
Marinoni and Buzzi (2010)
Marinoni 与 Buzzi（2010 年） C. Marinoni and A. Buzzi, Nature 468, 539 (2010).
C. Marinoni 与 A. Buzzi，《自然》468 卷，539 页（2010 年）。
Li et al. (2015) 李等人（2015 年） X.-D. Li, C. Park, C. G. Sabiu, and J. Kim, Monthly Notices of the Royal Astronomical Society 450, 807 (2015).
李晓东、朴成哲、C. G. Sabiu 与金正勋，《皇家天文学会月报》450 卷，807 页（2015 年）。
Li et al. (2018) 李等人（2018 年） Y. Li, M. Schmittfull, and U. Seljak, Journal of Cosmology and Astroparticle Physics 2018, 022 (2018).
李毅、M. Schmittfull 与 U. Seljak 合著，《宇宙学与天体粒子物理期刊》2018 年第 022 期（2018 年）。
Porqueres et al. (2021) Porqueres 等人（2021 年） N. Porqueres, A. F. Heavens, D. J. Mortlock, and G. Lavaux (2021), URL https://api.semanticscholar.org/CorpusID:236976077.
N. Porqueres、A. F. Heavens、D. J. Mortlock 及 G. Lavaux 合著（2021 年），网址：https://api.semanticscholar.org/CorpusID:236976077。
White (2016) 怀特（2016 年） M. White, Journal of Cosmology and Astroparticle Physics 2016, 057–057 (2016), ISSN 1475-7516, URL http://dx.doi.org/10.1088/1475-7516/2016/11/057.
M. White，《宇宙学与天体粒子物理学杂志》2016 年第 11 期，057–057 页（2016 年），ISSN 1475-7516，网址 http://dx.doi.org/10.1088/1475-7516/2016/11/057。
Yang et al. (2020) 杨等人（2020 年） Y. Yang, H. Miao, Q. Ma, M. Liu, C. G. Sabiu, J. Forero-Romero, Y. Huang, L. Lai, Q. Qian, Y. Zheng, et al., The Astrophysical Journal 900, 6 (2020), ISSN 1538-4357, URL http://dx.doi.org/10.3847/1538-4357/aba35b.
Y. Yang、H. Miao、Q. Ma、M. Liu、C. G. Sabiu、J. Forero-Romero、Y. Huang、L. Lai、Q. Qian、Y. Zheng 等，《天体物理学杂志》第 900 卷，第 6 页（2020 年），ISSN 1538-4357，网址 http://dx.doi.org/10.3847/1538-4357/aba35b。
Lai et al. (2023) 赖等人（2023 年） L. M. Lai, J. C. Ding, X. L. Luo, Y. Z. Yang, Z. H. Wang, K. S. Liu, G. F. Liu, X. Wang, Y. Zheng, Z. Y. Li, et al., Improving constraint on $\omega_{m}$ from sdss using marked correlation functions (2023), eprint 2312.03244.
赖 LM、丁 JC、罗 XL、杨 YZ、王 ZH、刘 KS、刘 GF、王 X、郑 Y、李 ZY 等，《利用标记相关函数改进对 $\omega_{m}$ 的约束——基于 SDSS 数据的研究》（2023 年），预印本号 2312.03244。
Yin et al. (2024) 尹等人（2024 年） F. Yin, J. Ding, L. Lai, W. Zhang, L. Xiao, Z. Wang, J. Forero-Romero, L. Zhang, and X.-D. Li, Improving sdss cosmological constraints through $\beta$ -skeleton weighted correlation functions (2024), eprint 2403.14165.
尹峰、丁剑、赖力、张伟、肖龙、王喆、J. Forero-Romero、张亮、李晓东，《通过 $\beta$ 骨架加权相关函数提升 SDSS 宇宙学约束》（2024 年），预印本编号 2403.14165。
Fang et al. (2019) 方等人（2019 年） F. Fang, J. Forero-Romero, G. Rossi, X.-D. Li, and L.-L. Feng, Monthly Notices of the Royal Astronomical Society 485, 5276 (2019).
方峰、J. Forero-Romero、G. Rossi、李晓东、冯莉莉，《皇家天文学会月报》485 卷，5276 页（2019 年）。
Way et al. (2016) 韦等人（2016 年） M. J. Way, J. D. Scargle, K. M. Ali, and A. N. Srivastava, Advances in Machine Learning and Data Mining for Astronomy (Chapman & Hall/CRC, 2016), 1st ed., ISBN 1138199303.
M. J. 韦、J. D. 斯卡格尔、K. M. 阿里与 A. N. 斯利瓦斯塔瓦合著，《机器学习与数据挖掘在天文学中的进展》（查普曼&霍尔/CRC 出版社，2016 年），第一版，ISBN 1138199303。
Chen and Zhang (2014) 陈与张（2014 年） C. P. Chen and C.-Y. Zhang, Information sciences 275, 314 (2014).
C. P. 陈与张春阳，《信息科学》275 卷，314 页（2014 年）。
Jordan and Mitchell (2015)
乔丹与米切尔（2015 年） M. I. Jordan and T. M. Mitchell, Science 349, 255 (2015).
M. I. 乔丹与 T. M. 米切尔，《科学》349 卷，255 页（2015 年）。
Rodríguez-Mazahua et al. (2016)
罗德里格斯-马扎华等（2016） L. Rodríguez-Mazahua, C.-A. Rodríguez-Enríquez, J. L. Sánchez-Cervantes, J. Cervantes, J. L. García-Alcaraz, and G. Alor-Hernández, The Journal of Supercomputing 72, 3073 (2016).
L. 罗德里格斯-马扎华、C.-A. 罗德里格斯-恩里克斯、J. L. 桑切斯-塞万提斯、J. 塞万提斯、J. L. 加西亚-阿尔卡拉斯及 G. 阿洛尔-埃尔南德斯，《超级计算杂志》72 卷，3073 页（2016 年）。
Ball et al. (2017) 鲍尔等（2017） J. E. Ball, D. T. Anderson, and C. S. Chan, Journal of applied remote sensing 11, 042609 (2017).
J. E. 鲍尔、D. T. 安德森与 C. S. 陈，《应用遥感期刊》11 卷，042609 页（2017 年）。
Sen et al. (2022) Sen 等人（2022 年） S. Sen, S. Agarwal, P. Chakraborty, and K. P. Singh, Experimental Astronomy 53, 1 (2022).
S. 森、S. 阿加瓦尔、P. 查克拉博蒂与 K. P. 辛格，《实验天文学》53 卷，第 1 期（2022 年）。
Wu et al. (2021) 吴等人（2021 年） Z. Wu, Z. Zhang, S. Pan, H. Miao, X. Luo, X. Wang, C. G. Sabiu, J. Forero-Romero, Y. Wang, and X.-D. Li, The Astrophysical Journal 913, 2 (2021), ISSN 1538-4357, URL http://dx.doi.org/10.3847/1538-4357/abf3bb.
吴宗泽、张子涵、潘晟、缪海兴、罗晓峰、王晓晨、C. G. Sabiu、J. Forero-Romero、王宇轩、李晓东，《天体物理学报》913 卷 2 期（2021 年），ISSN 1538-4357，网址 http://dx.doi.org/10.3847/1538-4357/abf3bb。
Wu et al. (2023) 吴等人（2023 年） Z. Wu, L. Xiao, X. Xiao, J. Wang, X. Kang, Y. Wang, X. Wang, and X.-D. Li, Monthly Notices of the Royal Astronomical Society 522, 4748 (2023).
吴宗泽、肖磊、肖翔、王佳、康旭、王宇轩、王晓晨、李晓东，《皇家天文学会月报》522 卷 4748 页（2023 年）。
Wang et al. (2024) 王等人（2024 年） Z. Wang, F. Shi, X. Yang, Q. Li, Y. Liu, and X. Li, Sci. China Phys. Mech. Astron. 67, 219513 (2024), eprint 2305.11431.
王 Z、石 F、杨 X、李 Q、刘 Y 和李 X，《中国科学：物理学力学天文学》67 卷，219513 页（2024 年），预印本号 2305.11431。
Ravanbakhsh et al. (2016)
Ravanbakhsh 等人（2016） S. Ravanbakhsh, J. Oliva, S. Fromenteau, L. Price, S. Ho, J. Schneider, and B. Póczos, in International conference on machine learning (PMLR, 2016), pp. 2407–2416.
S. Ravanbakhsh, J. Oliva, S. Fromenteau, L. Price, S. Ho, J. Schneider, 与 B. Póczos 合著，载于《国际机器学习会议》（PMLR 出版社，2016 年），第 2407–2416 页。
Pan et al. (2020) 潘等人（2020 年） S. Pan, M. Liu, J. Forero-Romero, C. G. Sabiu, Z. Li, H. Miao, and X.-D. Li, Science China Physics, Mechanics & Astronomy 63, 110412 (2020).
潘松、刘明、J. Forero-Romero、C. G. Sabiu、李政、苗海龙与李晓东合著，《中国科学：物理学力学天文学》第 63 卷，110412（2020 年）。
Lazanu (2021) 拉扎努（2021 年） A. Lazanu, Journal of Cosmology and Astroparticle Physics 2021, 039 (2021).
A. Lazanu，《宇宙学与天体粒子物理学杂志》2021，039（2021）。
Villaescusa-Navarro et al. (2020) F. Villaescusa-Navarro, C. Hahn, E. Massara, A. Banerjee, A. M. Delgado, D. K. Ramanah, T. Charnock, E. Giusarma, Y. Li, E. Allys, et al., The Astrophysical Journal Supplement Series 250, 2 (2020), ISSN 1538-4365, URL http://dx.doi.org/10.3847/1538-4365/ab9d82.
Hortua (2021) H. J. Hortua, arXiv preprint arXiv:2112.11865 (2021).
Hwang et al. (2023) S. Y. Hwang, C. G. Sabiu, I. Park, and S. E. Hong, Journal of Cosmology and Astroparticle Physics 2023, 075 (2023).
Makinen et al. (2022) T. L. Makinen, T. Charnock, P. Lemos, N. Porqueres, A. F. Heavens, and B. D. Wandelt, The Open Journal of Astrophysics 5, 18 (2022), eprint 2207.05202.
Springel (2005) V. Springel, Monthly Notices of the Royal Astronomical Society 364, 1105–1134 (2005), ISSN 1365-2966, URL http://dx.doi.org/10.1111/j.1365-2966.2005.09655.x.
Lewis et al. (2000) A. Lewis, A. Challinor, and A. Lasenby, The Astrophysical Journal 538, 473–476 (2000), ISSN 1538-4357, URL http://dx.doi.org/10.1086/309179.
Bouchet et al. (1994) F. R. Bouchet, S. Colombi, E. Hivon, and R. Juszkiewicz, arXiv: Astrophysics (1994), URL https://api.semanticscholar.org/CorpusID:119519083.
Aghanim et al. (2020) N. Aghanim et al. (Planck), Astron. Astrophys. 641, A6 (2020), [Erratum: Astron.Astrophys. 652, C4 (2021)], eprint 1807.06209.
Alam et al. (2017b) S. Alam et al. (BOSS), Mon. Not. Roy. Astron. Soc. 470, 2617 (2017b), eprint 1607.03155.
Yuan et al. (2023) S. Yuan et al. (DESI) (2023), eprint 2310.09329.
Bruna and Mallat (2013) J. Bruna and S. Mallat, IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1872 (2013).
Mallat (2012) S. Mallat, Group invariant scattering (2012), eprint 1101.2286.
Valogiannis et al. (2023) G. Valogiannis, S. Yuan, and C. Dvorkin, arXiv preprint arXiv:2310.16116 (2023).
Li et al. (2020) Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, IEEE Transactions on Neural Networks and Learning Systems 33, 6999 (2020), URL https://api.semanticscholar.org/CorpusID:214803074.
Kingma and Ba (2017) D. P. Kingma and J. Ba, Adam: A method for stochastic optimization (2017), eprint 1412.6980.
Jeffrey and Wandelt (2020) N. Jeffrey and B. D. Wandelt, ArXiv abs/2011.05991 (2020), URL https://api.semanticscholar.org/CorpusID:226306994.
Shridhar et al. (2018) K. Shridhar, F. Laumann, and M. Liwicki, arXiv e-prints arXiv:1806.05978 (2018), eprint 1806.05978.
Zhang et al. (2023) H. Zhang, S. Zuo, and L. Zhang, Research in Astronomy and Astrophysics 23, 075011 (2023), eprint 2306.09217.
Foreman-Mackey et al. (2013) D. Foreman-Mackey, D. W. Hogg, D. Lang, and J. Goodman, Publications of the Astronomical Society of the Pacific 125, 306 (2013), eprint 1202.3665.
Massara et al. (2020) E. Massara, F. Villaescusa-Navarro, S. Ho, N. Dalal, and D. N. Spergel, Physical review letters 126 1, 011301 (2020), URL https://api.semanticscholar.org/CorpusID:210942819.
Banerjee et al. (2019) A. Banerjee, E. Castorina, F. Villaescusa-Navarro, T. Court, and M. Viel, Journal of Cosmology and Astroparticle Physics 2020, 032 (2019), URL https://api.semanticscholar.org/CorpusID:196623504.

Appendix A Validating the robustness of lCNN

In order to validate the robustness of these results with regard to architectural choices, we proceeded to train nine additional CNNs with varying numbers of convolutional and fully connected layers using only the density field, i.e., “CNN( $r$ )”. For these networks, we employed a $(3,3,3)$ convolutional kernel with zero padding to ensure that the original output data size was preserved when integrating convolutional layers into our baseline network. We ensure that the newly added convolutional layers maintain consistency in both the input and output feature numbers.

Furthermore, fully-connected layers were added before the output layer of the base network. Each model was trained for 1000 epochs with identical hyperparameters, after which their performance on the test data was evaluated to determine the epoch at which overfitting occurred.

As illustrated in Tab. 2, the various models exhibit overfitting at approximately the same number of epochs, and their losses are also approximately equivalent. Nevertheless, an increase in the loss function is observed when the fully connected (FC) layer reaches six layers. This phenomenon is due to the fact that the incorporation of numerous fully connected layers results in a considerable increase in the number of parameters, thereby rendering the model more challenging to converge. Consequently, the findings indicate that, within a specific range, the model is capable of adapting to variations in its architectural design.

3 4 5 6 4 880/0.77 - - - 5 - 800/0.76 875/0.80 845/0.81 6 - 935/0.73 975/0.79 820/0.81 7 - 950/0.75 975/0.76 965/0.80

Table 2: Validation of the robustness of lCNN concerning architectural choices. Nine additional CNNs were trained with varying numbers of convolutional and fully connected layers to observe changes in overfitting epochs and minimum loss, using only the density field, i.e., “CNN(

r

)” as a test. The bold text represents the results of our benchmark network, which has 4 convolutional layers and 3 fully connected layers.

Appendix B Validating the robustness of DM halo data selection

In the following, we will focus on the statistical analysis of “CNN( $r$ )” as an illustrative case to facilitate a quantitative comparison. A discussion of the impact of DM halo data selection on the results of our study was carried out with the following three halo datasets explored:

1.

“FoF+ $M_{\rm min}$ fixed” – replacing the fixed number density with a fixed minimum DM halo mass cutoff of $M_{\rm min}=10^{13}M_{\odot}h^{-1}$ to generate FoF catalogs as training data.
2.

“FoF+ $n_{\rm fix}$ ” – the fiduical catalogs generated by fixing the halo number density to $n_{\rm fix}=2\times 10^{-4}~{}h^{3}{\rm Mpc}^{-3}$ for the FoF catalogs.
3.

“Rockstar” – using the Rockstar halo catalogs from the Quijote simulations instead of the FoF halo catalogs.
4.

“FoF+flat priors” – recovering $\sigma_{8}$ and $n_{s}$ to flat priors through data augmentation by various means of reflecting and rotating the original halo density field.

After training on each of the datasets, the performance of the parameter recovery was evaluated based on each corresponding test dataset, as shown in Fig. 12.

The first column of Fig. 12 represents the results for the FoF catalog with a minimum mass of $10^{13}~{}M_{\odot}/h$ fixed. This adjustment flattens the priors of parameters such as $\sigma_{8}$ in the training data, leading to improved performance in parameter spaces with sparse training data. Consequently, the reconstruction of $\sigma_{8}$ at lower values shows better accuracy compared to the fiducial case. Additionally, the reconstructions for the other parameters remain comparable to those in the fiducial one, indicating that the results are not sensitive to the choice between $n_{\rm fix}$ and $M_{\rm min}$ .

In comparison to our fiducial case (the second column) and the other two cases, Rockstar halo catalogs result in larger RMSE values across all listed cosmological parameters. This is due to the fact that each Rockstar halo catalog typically has a significantly lower halo number density than the other catalogs, which results in a reduction in the amount of cosmological information encoded.

In comparison to Fig. 1, it is evident that the fiducial halo catalogs result in the distribution of $\sigma_{8}$ being absent at low values. Consequently, following the recovery of the flat prior on $\sigma_{8}$ in the case of “FoF+flat prior”, the augmented training data can provide additional information, thereby reducing the bias value and more accurately predicting $\sigma_{8}<0.8$ than in the other cases.

In our work, we applied zero padding in the first two convolutional layers of the CNN (please refer to the second column of Fig. 12 for the corresponding results). For comparison, we discussed circular padding (see the fifth column of Fig. 12), which is suitable for data with periodic boundary conditions. As observed, circular padding provides a slight improvement in training performance over zero padding, though the results are generally similar. The discrepancy between the two padding methods is not substantial, as evidenced by the fact that 97.7% to 98.1% of the pixel values in the 1710 samples used are zero.

Deep Learning for Cosmological Parameter Inference from Dark Matter Halo Density Field基于暗物质晕密度场的深度学习宇宙学参数推断