*For correspondence:
zhoupc1988@gmail.com

†These authors contributed equally to this work

Competing interests: The authors declare that no competing interests exist.

Funding: See page 33
Received: 19 May 2017
Accepted: 20 February 2018
Published: 22 February 2018
Reviewing editor: David C Van Essen, Washington University in St. Louis, United States
© Copyright Zhou et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data

Pengcheng Zhou $^{1, 2, 3, 4, 5 *}$ , Shanna L Resendez $^{6 †}$ , Jose Rodriguez-Romaguera $^{6 †}$ , Jessica C Jimenez $^{7, 8, 9 †}$ , Shay Q Neufeld $^{10 †}$ , Andrea Giovannucci $^{11}$ , Johannes Friedrich $^{11}$ , Eftychios A Pnevmatikakis $^{11}$ , Garret D Stuber $^{6, 12, 13}$ , Rene Hen $^{7, 8, 9}$ , Mazen A Kheirbek $^{14, 15, 16, 17}$ , Bernardo L Sabatini $^{10}$ , Robert E Kass $^{1, 3, 18}$ , Liam Paninski $^{2, 4, 5, 7, 19, 20}$
$^{1}$ Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, United States; $^{2}$ Department of Statistics, Columbia University, New York, United States; $^{3}$ Machine Learning Department, Carnegie Mellon University, Pittsburgh, United States; $^{4}$ Grossman Center for the Statistics of Mind, Columbia University, New York, United States; $^{5}$ Center for Theoretical Neuroscience, Columbia University, New York, United States; $^{6}$ Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, United States; $^{7}$ Department of Neuroscience, Columbia University, New York, United States; $^{8}$ Division of Integrative Neuroscience, Department of Psychiatry, New York State Psychiatric Institute, New York, United States; $^{9}$ Department of Psychiatry & Pharmacology, Columbia University, New York, United States; $^{10}$ Department of Neurobiology, Harvard Medical School, Howard Hughes Medical Institute, Boston, United States; $^{11}$ Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, United States; $^{12}$ Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, United States; $^{13}$ Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, United States; $^{14}$ Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, United States; $^{15}$ Neuroscience Graduate Program, University of California, San Francisco, United States; $^{16}$ Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, United States; $^{17}$ Department of Psychiatry, University of California, San Francisco, San Francisco, United States; $^{18}$ Department of Statistics, Carnegie Mellon University, Pittsburgh, United States; $^{19}$ Kavli Institute for Brain Science, Columbia University, New York, United States; $^{20}$ Neurotechnology Center, Columbia University, New York, United States

Abstract

In vivo calcium imaging through microendoscopic lenses enables imaging of previously inaccessible neuronal populations deep within the brains of freely moving animals. However, it is computationally challenging to extract single-neuronal activity from microendoscopic data, because of the very large background fluctuations and high spatial overlaps intrinsic to this recording modality. Here, we describe a new constrained matrix factorization approach to accurately separate the background and then demix and denoise the neuronal signals of interest. We compared the proposed method against previous independent components analysis and constrained nonnegative matrix factorization approaches. On both simulated and experimental data recorded from mice, our method substantially improved the quality of extracted cellular signals and detected more well-isolated neural signals, especially in noisy data regimes. These advances can in turn significantly enhance the statistical power of downstream analyses, and ultimately improve scientific conclusions derived from microendoscopic data.

Introduction

Monitoring the activity of large-scale neuronal ensembles during complex behavioral states is fundamental to neuroscience research. Continued advances in optical imaging technology are greatly expanding the size and depth of neuronal populations that can be visualized. Specifically, in vivo calcium imaging through microendoscopic lenses and the development of miniaturized microscopes have enabled deep brain imaging of previously inaccessible neuronal populations of freely moving mice (Flusberg et al., 2008; Ghosh et al., 2011; Ziv and Ghosh, 2015). This technique has been widely used to study the neural circuits in cortical, subcortical, and deep brain areas, such as hippocampus (Cai et al., 2016; Ziv et al., 2013; Jimenez et al., 2018; Rubin et al., 2015), entorhinal cortex (Kitamura et al., 2015; Sun et al., 2015), hypothalamus (Jennings et al., 2015), prefrontal cortex (PFC) (Pinto and Dan, 2015), premotor cortex (Markowitz et al., 2015), dorsal pons (Cox et al., 2016), basal forebrain (Harrison et al., 2016), striatum (Barbera et al., 2016; Carvalho Poyraz et al., 2016; Klaus et al., 2017), amygdala (Yu et al., 2017), and other brain regions.

Although microendoscopy has potential applications across numerous neuroscience fields (Ziv and Ghosh, 2015), methods for extracting cellular signals from this data are currently limited and suboptimal. Most existing methods are specialized for two-photon or light-sheet microscopy. However, these methods are not suitable for analyzing single-photon microendoscopic data because of its distinct features: specifically, this data typically displays large, blurry background fluctuations due to fluorescence contributions from neurons outside the focal plane. In Figure 1, we use a typical microendoscopic dataset to illustrate these effects (see Video 1 for raw video). Figure 1A shows an example frame of the selected data, which contains large signals additional to the neurons visible in the focal plane. These extra fluorescence signals contribute as background that contaminates the single-neuronal signals of interest. In turn, standard methods based on local correlations for visualizing cell outlines (Smith and Häusser, 2010) are not effective here, because the correlations in the fluorescence of nearby pixels are dominated by background signals (Figure 1B). For some neurons with strong visible signals, we can manually draw regions-of-interest (ROI) (Figure 1C). Following Barbera et al. (2016) and Pinto and Dan (2015), we used the mean fluorescence trace of the surrounding pixels (blue, Figure 1D) to roughly estimate this background fluctuation; subtracting it from the raw trace in the neuron ROI yields a relatively good estimate of the neuron signal (red, Figure 1D). Figure 1D shows that the background (blue) has much larger variance than the relatively sparse neural signal (red); moreover, the background signal fluctuates on similar timescales as the single-neuronal signal, so we cannot simply temporally filter the background away after extraction of the mean signal within the ROI. This large background signal is likely due to a combination of local fluctuations resulting from out-of-focus fluorescence or neuropil activity, hemodynamics of blood vessels, and global fluctuations shared more broadly across the field of view (photo-bleaching effects, drifts in $z$ of the focal plane, etc.), as illustrated schematically in Figure 1E.

The existing methods for extracting individual neural activity from microendoscopic data can be divided into two classes: semi-manual ROI analysis (Barbera et al., 2016; Klaus et al., 2017; Pinto and Dan, 2015) and PCA/ICA analysis (Mukamel et al., 2009). Unfortunately, both approaches have well-known flaws (Resendez et al., 2016). For example, ROI analysis does not effectively demix signals of spatially overlapping neurons, and drawing ROIs is laborious for large population recordings. More importantly, in many cases, the background contaminations are not adequately corrected, and thus the extracted signals are not sufficiently clean for downstream analyses. As for PCA/ICA analysis, it is a linear demixing method and therefore typically fails when the neural components exhibit strong spatial overlaps (Pnevmatikakis et al., 2016), as is the case in the microendoscopic setting.

Recently, constrained nonnegative matrix factorization (CNMF) approaches were proposed to simultaneously denoise, deconvolve, and demix calcium imaging data (Pnevmatikakis et al., 2016). However, current implementations of the CNMF approach were optimized for 2-photon and light-sheet microscopy, where the background has a simpler spatiotemporal structure. When applied to microendoscopic data, CNMF often has poor performance because the background is not modeled sufficiently accurately (Barbera et al., 2016).

Figure 1. Microendoscopic data contain large background signals with rapid fluctuations due to multiple sources. (A) An example frame of microendoscopic data recorded in dorsal striatum (see Materials and methods section for experimental details). (B) The local ‘correlation image’ (Smith and Häusser, 2010) computed from the raw video data. Note that it is difficult to discern neuronal shapes in this image due to the high background spatial correlation level. (C) The mean-subtracted data within the cropped area (green) in (A). Two ROIs were selected and coded with different colors. (D) The mean fluorescence traces of pixels within the two selected ROIs (magenta and blue) shown in (C) and the difference between the two traces. (E) Cartoon illustration of various sources of fluorescence signals in microendoscopic data. ‘BG’ abbreviates ‘background’.

In this paper, we significantly extend the CNMF framework to obtain a robust approach for extracting single-neuronal signals from microendoscopic data. Specifically, our extended CNMF for microendoscopic data (CNMF-E) approach utilizes a more accurate and flexible spatiotemporal background model that is able to handle the properties of the strong background signal illustrated in Figure 1, along with new specialized algorithms to initialize and fit the model components. After a brief description of the model and algorithms, we first use simulated data to illustrate the power of the new approach. Next, we compare CNMF-E with PCA/ICA analysis comprehensively on both simulated data and four experimental datasets recorded in different brain areas. The results show that CNMF-E outperforms PCA/ICA in terms of detecting more well-isolated neural signals, extracting higher signal-to-noise ratio (SNR) cellular signals, and obtaining more robust results in low SNR regimes. Finally, we show that downstream analyses of calcium imaging data can substantially benefit from these improvements.

Model and model fitting

CNMF for microendoscope data (CNMF-E)

The recorded video data can be represented by a matrix $Y \in \mathbb{R}_{+}^{d \times T}$, where $d$ is the number of pixels in the field of view and $T$ is the number of frames observed. In our model, each neuron $i$ is characterized by its spatial ‘footprint’ vector $a_{i} \in \mathbb{R}_{+}^{d}$, which describes the cell’s shape and location, and its ‘calcium activity’ time series $c_{i} \in \mathbb{R}_{+}^{T}$, which models (up to a multiplicative and additive constant) cell $i$’s mean fluorescence signal at each frame. Here, both $a_{i}$ and $c_{i}$ are constrained to be nonnegative because of their physical interpretations. The background fluctuation is represented by a matrix $B \in \mathbb{R}_{+}^{d \times T}$. If the field of view contains a total of $K$ neurons, then the observed movie data is modeled as a superposition of all neurons’ spatiotemporal activity, plus time-varying background and additive noise:

$$Y = \sum_{i = 1}^{K} a_{i} \cdot c_{i}^{T} + B + E = A C + B + E, \tag{1}$$

where $A = [a_{1}, \dots, a_{K}]$ and $C = [c_{1}, \dots, c_{K}]^{T}$. The noise term $E \in \mathbb{R}^{d \times T}$ is modeled as Gaussian, $E(t) \sim \mathcal{N}(0, \Sigma)$, where $\Sigma$ is a diagonal matrix, indicating that the noise is spatially and temporally uncorrelated.

Estimating the model parameters $A, C$ in model (1) gives us all neurons’ spatial footprints and their denoised temporal activity. This can be achieved by minimizing the residual sum of squares (RSS), that is, the squared Frobenius norm of the matrix $Y - (A C + B)$,

$$\| Y - (A C + B) \|_{F}^{2}, \tag{2}$$

while requiring the model variables $A$, $C$ and $B$ to follow the desired constraints, discussed below.
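As a minimal numerical sketch of model (1) and the RSS objective, the following NumPy snippet builds a toy movie from randomly generated components and evaluates the objective. All sizes, the random footprints and traces, and the helper name `rss` are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, K = 50, 200, 3   # pixels, frames, neurons (illustrative sizes)

# Sparse nonnegative footprints (columns of A) and nonnegative traces (rows of C).
A = rng.random((d, K)) * (rng.random((d, K)) < 0.2)
C = rng.random((K, T))
B = np.outer(rng.random(d), 1.0 + rng.random(T))   # toy nonnegative background
E = 0.01 * rng.standard_normal((d, T))             # uncorrelated Gaussian noise

# Observed movie: superposition of neurons, background, and noise.
Y = A @ C + B + E

def rss(Y, A, C, B):
    """Residual sum of squares, i.e. the squared Frobenius norm of Y - (AC + B)."""
    R = Y - (A @ C + B)
    return float(np.sum(R ** 2))

rss_true = rss(Y, A, C, B)                           # only the noise remains
rss_no_background = rss(Y, A, C, np.zeros_like(B))   # background left unexplained
```

With the true components the residual is just the noise, whereas dropping the background term leaves a far larger residual, which is why the background has to be modeled explicitly.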

Constraints on neuronal spatial footprints $A$ and neural temporal traces $C$

Each spatial footprint $a_{i}$ should be spatially localized and sparse, since a given neuron will cover only a small fraction of the field of view, and therefore most elements of $a_{i}$ will be zero. Thus, we need to incorporate spatial locality and sparsity constraints on $A$ (Pnevmatikakis et al., 2016). We discuss details further below.

Similarly, the temporal components $c_{i}$ are highly structured, as they represent the cells’ fluorescence responses to sparse, nonnegative trains of action potentials. Following Vogelstein et al. (2010) and Pnevmatikakis et al. (2016), we model the calcium dynamics of each neuron $c_{i}$ with a stable autoregressive (AR) process of order $p$,

$$c_{i}(t) = \sum_{j = 1}^{p} \gamma_{j}^{(i)} c_{i}(t - j) + s_{i}(t), \tag{3}$$

where $s_{i}(t) \geq 0$ is the number of spikes that neuron $i$ fired in the $t$-th frame. (Note that there is no further noise input into $c_{i}(t)$ beyond the spike signal $s_{i}(t)$.) The AR coefficients $\{\gamma_{j}^{(i)}\}$ are different for each neuron and they are estimated from the data. In practice, we usually pick $p = 2$, thus incorporating both a nonzero rise and decay time of calcium transients in response to a spike; then Equation (3) can be expressed in matrix form as

$$G^{(i)} \cdot c_{i} = s_{i}, \quad \text{with} \quad G^{(i)} = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ -\gamma_{1}^{(i)} & 1 & 0 & \dots & 0 \\ -\gamma_{2}^{(i)} & -\gamma_{1}^{(i)} & 1 & \dots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \dots & -\gamma_{2}^{(i)} & -\gamma_{1}^{(i)} & 1 \end{bmatrix}. \tag{4}$$
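The equivalence between the AR recursion and the banded matrix form above can be checked numerically. The sketch below is illustrative only: the AR(2) coefficients, the spike times, and the helper names `ar_matrix` and `simulate_ar` are assumptions chosen for the example (the coefficients correspond to a stable process with real roots of roughly 0.95 and 0.75).

```python
import numpy as np

def ar_matrix(gammas, T):
    """Banded lower-triangular matrix G such that G @ c = s for an AR(p) process."""
    G = np.eye(T)
    for j, g in enumerate(gammas, start=1):
        G -= g * np.eye(T, k=-j)   # place -gamma_j on the j-th subdiagonal
    return G

def simulate_ar(gammas, s):
    """Run c(t) = sum_j gamma_j * c(t - j) + s(t), taking c(t) = 0 for t < 0."""
    c = np.zeros(len(s))
    for t in range(len(s)):
        history = sum(g * c[t - j]
                      for j, g in enumerate(gammas, start=1) if t - j >= 0)
        c[t] = history + s[t]
    return c

gammas = (1.7, -0.712)        # assumed stable AR(2) coefficients
s = np.zeros(100)
s[[10, 40, 45]] = 1.0         # three illustrative spikes

c = simulate_ar(gammas, s)    # calcium trace with nonzero rise and decay time
G = ar_matrix(gammas, len(s))
s_recovered = G @ c           # applying G recovers the spike train
```

Because `G` is lower triangular with a unit diagonal, it is always invertible, so `c` and `s` determine each other once the AR coefficients are fixed.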

The neural activity $s_{i}$ is nonnegative and typically sparse; to enforce sparsity, we can penalize the $\ell_{0}$ (Jewell and Witten, 2017) or $\ell_{1}$ (Pnevmatikakis et al., 2016; Vogelstein et al., 2010) norm of $s_{i}$, or limit the minimum size of nonzero spike counts (Friedrich et al., 2017b). When the rise time constant is small compared to the time-bin width (low imaging frame rate), we typically use a simpler AR(1) model (with an instantaneous rise following a spike) (Pnevmatikakis et al., 2016).

Video 1. An example of typical microendoscopic data. The video was recorded in dorsal striatum; experimental details can be found above. MP4
DOI: https://doi.org/10.7554/eLife.28728.003

Video 2. Comparison of CNMF-E with rank-1 NMF in estimating background fluctuation in simulated data. Top left: the simulated fluorescence data in Figure 2. Bottom left: the ground truth of neuron signals in the simulation. Top middle: the estimated background from the raw video data (top left) using CNMF-E. Bottom middle: the residual of the raw video after subtracting the background estimated with CNMF-E. Top right and bottom right: same as top middle and bottom middle, but the background is estimated with rank-1 NMF. MP4
DOI: https://doi.org/10.7554/eLife.28728.005

Constraints on background activity $B$

In the above we have largely followed previously described CNMF approaches (Pnevmatikakis et al., 2016) for modeling calcium imaging signals. However, to accurately model the background effects in microendoscopic data, we need to depart significantly from these previous approaches. Constraints on the background term $B$ in Equation (1) are essential to the success of CNMF-E, since clearly, if $B$ is completely unconstrained we could just absorb the observed data $Y$ entirely into $B$, which would lead to recovery of no neural activity. At the same time, we need to prevent the residual of the background term (i.e., $B - \hat{B}$, where $\hat{B}$ denotes the estimated spatiotemporal background) from corrupting the estimated neural signals $A C$ in model (1), since the extracted neuronal activity would then be mixed with background fluctuations, leading to artificially high correlations between nearby cells. This problem is even worse in the microendoscopic context because the background fluctuation usually has significantly larger variance than the isolated cellular signals of interest (Figure 1D), and therefore even small errors in the estimation of $B$ can severely corrupt the estimated neural signal $A C$.

In Pnevmatikakis et al. (2016), $B$ is modeled as a rank-1 nonnegative matrix $B = b \cdot f^{T}$, where $b \in \mathbb{R}_{+}^{d}$ and $f \in \mathbb{R}_{+}^{T}$. This model mainly captures the global fluctuations within the field of view (FOV). In applications to two-photon or light-sheet data, this rank-1 model has been shown to be sufficient for relatively small spatial regions; the simple low-rank model does not hold for larger fields of view, and so we can simply divide large FOVs into smaller patches for largely parallel processing (Pnevmatikakis et al., 2016; Giovannucci et al., 2017b). (See Pachitariu et al., 2016, for an alternative approach.) However, as we will see below, the local rank-1 model fails in many microendoscopic datasets, where multiple large overlapping background sources exist even within modestly sized FOVs.
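To illustrate why a rank-1 background can break down, the toy sketch below compares the best rank-1 fit for a background with one source against a background with two overlapping sources that fluctuate independently. It uses a truncated SVD as a stand-in for rank-1 NMF, which is a substitution made here for simplicity: the truncated SVD gives the smallest possible Frobenius error among all rank-1 matrices, so no rank-1 factorization, nonnegative or not, can do better. The one-dimensional "field of view", blob shapes, and time courses are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 100, 300
x = np.arange(d)   # a 1D strip of pixels keeps the example small

def blob(center, width=15.0):
    """Broad Gaussian bump standing in for a blurred out-of-focus source."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

f1 = 1.0 + rng.random(T)   # time course of source 1
f2 = 1.0 + rng.random(T)   # independent time course of source 2

B_one = np.outer(blob(30), f1)            # exactly rank 1
B_two = B_one + np.outer(blob(70), f2)    # two overlapping sources

def rank1_relative_error(B):
    """Relative Frobenius error of the best rank-1 approximation of B."""
    U, S, Vt = np.linalg.svd(B, full_matrices=False)
    B1 = S[0] * np.outer(U[:, 0], Vt[0])
    return np.linalg.norm(B - B1) / np.linalg.norm(B)

err_one = rank1_relative_error(B_one)
err_two = rank1_relative_error(B_two)
```

In this toy setting `err_one` is at machine precision while `err_two` is not, and that unexplained background residual is exactly what would leak into the estimated neural signals.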

Thus, we propose a new model to constrain the background term $B$. We first decompose the background into two terms:

$$B = B^{f} + B^{c}, \tag{5}$$

where $B^{f}$ represents fluctuating activity and $B^{c} = b_{0} \cdot \mathbf{1}^{T}$ models constant baselines ($\mathbf{1} \in \mathbb{R}^{T}$ denotes a vector of $T$ ones). To model $B^{f}$, we exploit the fact that background sources (largely due to blurred out-of-focus fluorescence) are empirically much coarser spatially than the average neuron soma size $l$. Thus, we model $B^{f}$ at one pixel as a linear combination of the background fluorescence in pixels which are chosen to be nearby but not nearest neighbors:

$$B_{it}^{f} = \sum_{j \in \Omega_{i}} w_{ij} \cdot B_{jt}^{f}, \quad \forall t = 1 \dots T, \tag{6}$$

where $\Omega_{i} = \{ j \mid \mathrm{dist}(x_{i}, x_{j}) \in [l_{n}, l_{n} + 1) \}$, with $\mathrm{dist}(x_{i}, x_{j})$ the Euclidean distance between pixels $i$ and $j$. Thus, $\Omega_{i}$ only selects the neighboring pixels at a distance of $l_{n}$ (to within one pixel) from the $i$-th pixel (the green dot and black pixels in Figure 2B illustrate $i$ and $\Omega_{i}$, respectively); here $l_{n}$ is a parameter that we choose to be greater than $l$ (the size of the typical soma in the FOV), e.g., $l_{n} = 2 l$. This choice of $l_{n}$ ensures that pixels $i$ and $j$ in Equation (6) share similar background fluctuations, but do not belong to the same soma.
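The ring-shaped neighborhood $\Omega_{i}$ is straightforward to compute on a pixel grid. The following is a small sketch under assumed values: the image size, the soma size `l = 5`, and the function name `ring_neighbors` are hypothetical and chosen only for illustration.

```python
import numpy as np

def ring_neighbors(row, col, shape, l_n):
    """Pixels whose Euclidean distance to (row, col) lies in [l_n, l_n + 1).

    Returns two integer arrays (rows, cols). Pixels that would fall outside
    the image are simply absent, so rings near the border are partial.
    """
    rr, cc = np.mgrid[0:shape[0], 0:shape[1]]
    dist = np.hypot(rr - row, cc - col)
    mask = (dist >= l_n) & (dist < l_n + 1)
    return rr[mask], cc[mask]

l = 5          # assumed typical soma size in pixels
l_n = 2 * l    # ring radius chosen larger than the soma, as in the text

rows, cols = ring_neighbors(20, 20, (41, 41), l_n)
ring_dist = np.hypot(rows - 20, cols - 20)
```

Every selected pixel lies farther than one soma size from the center pixel, so the center pixel and its ring are unlikely to be covered by the same cell body while still sharing the spatially coarse background.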

We can rewrite Equation (6) in matrix form:

$$B^{f} = W B^{f}, \tag{7}$$

where $W_{ij} = 0$ if $\mathrm{dist}(x_{i}, x_{j}) \notin [l_{n}, l_{n} + 1)$. In practice, this hard constraint is difficult to enforce computationally and is overly stringent given the noisy observed data. We relax the model by replacing the right-hand side $B^{f}$ with the more convenient closed-form expression

$$B^{f} = W \cdot (Y - A C - b_{0} \cdot \mathbf{1}^{T}). \tag{8}$$

According to Equations (1) and (5), this change ignores the noise term $E$; since elements in $E$ are spatially uncorrelated, $W \cdot E$ contributes as a very small disturbance to $\hat{B}^{f}$ in the left-hand side. We found that this substitution for $\hat{B}^{f}$ led to significantly faster and more robust model fitting.

Fitting the CNMF-E model

Table 1 lists the variables in the proposed CNMF-E model. Now we can formulate the estimation of all model variables as a single optimization meta-problem:

$$\begin{array}{ll} \underset{A, C, S, B^{f}, W, b_{0}}{\text{minimize}} & \| Y - A C - b_{0} \cdot \mathbf{1}^{T} - B^{f} \|_{F}^{2} \\ \text{subject to} & A \geq 0, \ A \text{ is sparse and spatially localized} \\ & c_{i} \geq 0, \ s_{i} \geq 0, \ G^{(i)} c_{i} = s_{i}, \ s_{i} \text{ is sparse} \quad \forall i = 1 \dots K \\ & B^{f} \cdot \mathbf{1} = 0 \\ & B^{f} = W \cdot (Y - A C - b_{0} \cdot \mathbf{1}^{T}) \\ & W_{ij} = 0 \text{ if } \mathrm{dist}(x_{i}, x_{j}) \notin [l_{n}, l_{n} + 1) \end{array} \tag{P-All}$$

We call this a ‘meta-problem’ because we have not yet explicitly defined the sparsity and spatial locality constraints on $A$ and $S = [s_{1}, \dots, s_{K}]^{T}$; these can be customized by users under different assumptions (see details in Materials and methods). Also note that $s_{i}$ is completely determined by $c_{i}$ and $G^{(i)}$, and $B^{f}$ is not optimized explicitly but (as discussed above) can be estimated as $W \cdot (Y - A C - b_{0} \cdot \mathbf{1}^{T})$, so we optimize with respect to $W$ instead.

The problem (P-All) optimizes all variables together and is non-convex but can be divided into three simpler subproblems that we solve iteratively:

Estimating $A, b_{0}$ given $\hat{C}, \hat{B}^{f}$

$$\begin{array}{ll} \underset{A, b_{0}}{\text{minimize}} & \| Y - A \cdot \hat{C} - b_{0} \cdot \mathbf{1}^{T} - \hat{B}^{f} \|_{F}^{2} \\ \text{subject to} & A \geq 0, \ A \text{ is sparse and spatially localized} \end{array} \tag{P-S}$$

Estimating $C, b_{0}$ given $\hat{A}, \hat{B}^{f}$

$$\begin{array}{ll} \underset{C, S, b_{0}}{\text{minimize}} & \| Y - \hat{A} \cdot C - b_{0} \cdot \mathbf{1}^{T} - \hat{B}^{f} \|_{F}^{2} \\ \text{subject to} & c_{i} \geq 0, \ s_{i} \geq 0 \\ & G^{(i)} c_{i} = s_{i}, \ s_{i} \text{ is sparse} \quad \forall i = 1 \dots K \end{array} \tag{P-T}$$

Video 3. Initialization procedure for the simulated data in Figure 3. Top left: correlation image of the filtered data. Red dots are centers of initialized neurons. Top middle: candidate seed pixels (small red dots) for initializing neurons on top of PNR image. The large red dot indicates the current seed pixel. Top right: the correlation image surrounding the selected seed pixel or the spatial footprint of the initialized neuron. Bottom: the filtered fluorescence trace at the seed pixel or the initialized temporal activity (both raw and denoised). MP4
DOI: https://doi.org/10.7554/eLife.28728.008

Video 4. The results of CNMF-E in demixing simulated data in Figure 4 (SNR reduction factor $= 1$). Top left: the simulated fluorescence data. Bottom left: the estimated background. Top middle: the residual of the raw video (top left) after subtracting the estimated background (bottom left). Bottom middle: the denoised neural signals. Top right: the residual of the raw video data (top left) after subtracting the estimated background (bottom left) and denoised neural signal (bottom middle). Bottom right: the ground truth of neural signals in simulation. MP4
DOI: https://doi.org/10.7554/eLife.28728.010

Estimating $W, b_{0}$ given $\hat{A}, \hat{C}$

$$\begin{array}{ll} \underset{W, B^{f}, b_{0}}{\text{minimize}} & \| Y - \hat{A} \cdot \hat{C} - b_{0} \cdot \mathbf{1}^{T} - B^{f} \|_{F}^{2} \\ \text{subject to} & B^{f} \cdot \mathbf{1} = 0 \\ & B^{f} = W \cdot (Y - \hat{A} \cdot \hat{C} - b_{0} \cdot \mathbf{1}^{T}) \\ & W_{ij} = 0 \text{ if } \mathrm{dist}(x_{i}, x_{j}) \notin [l_{n}, l_{n} + 1) \end{array} \tag{P-B}$$
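One plausible way to approach the background subproblem is sketched below. It is an assumption made for illustration, not necessarily the procedure used in the paper (whose details are deferred to Materials and methods): take $b_{0}$ as the temporal mean of the residual $X = Y - \hat{A} \hat{C}$, which makes every row of the centered residual sum to zero and so satisfies $B^{f} \cdot \mathbf{1} = 0$ automatically, then fit each row of $W$ by ordinary least squares restricted to the ring $\Omega_{i}$. The function name `fit_background` and the toy background are hypothetical.

```python
import numpy as np

def fit_background(X, shape, l_n):
    """Sketch of a background fit for X = Y - A_hat @ C_hat.

    X has shape (d, T), with pixels stored in row-major order of `shape`.
    Returns (W, b0, Bf), where Bf = W @ (X - b0 1^T).
    """
    d, T = X.shape
    b0 = X.mean(axis=1)                 # constant baseline per pixel
    Xc = X - b0[:, None]                # centered residual, rows sum to zero
    rr, cc = np.divmod(np.arange(d), shape[1])
    W = np.zeros((d, d))
    for i in range(d):
        dist = np.hypot(rr - rr[i], cc - cc[i])
        omega = np.flatnonzero((dist >= l_n) & (dist < l_n + 1))
        if omega.size == 0:
            continue                    # no ring pixels inside the image
        # Regress pixel i on its ring neighbors; W stays zero elsewhere.
        w, *_ = np.linalg.lstsq(Xc[omega].T, Xc[i], rcond=None)
        W[i, omega] = w
    return W, b0, W @ Xc

# Toy residual: spatially smooth background plus baseline plus small noise.
rng = np.random.default_rng(2)
shape, T, l_n = (12, 12), 500, 4
d = shape[0] * shape[1]
rows = np.repeat(np.arange(shape[0]), shape[1])   # row index of each pixel
g, h = rng.standard_normal(T), rng.standard_normal(T)
Bf_true = np.outer(np.ones(d), g) + np.outer(rows / shape[0], h)
Bf_true -= Bf_true.mean(axis=1, keepdims=True)
b0_true = 5.0 + rng.random(d)
X = b0_true[:, None] + Bf_true + 0.01 * rng.standard_normal((d, T))

W, b0, Bf = fit_background(X, shape, l_n)
rel_err = np.linalg.norm(Bf - Bf_true) / np.linalg.norm(Bf_true)
```

Because each pixel is regressed only on pixels farther away than a soma, a neuron's own signal at pixel $i$ cannot be reproduced from its ring unless it is shared with the coarse background, which is the intuition behind the ring constraint on $W$.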

For each of these subproblems, we are able to use well-established algorithms (e.g., solutions for (P-S) and (P-T) are discussed in Friedrich et al., 2017a; Pnevmatikakis et al., 2016) or slight modifications thereof. By iteratively solving these three subproblems, we obtain tractable updates for all model variables in problem (P-All). Furthermore, this strategy gives us the flexibility of further potential interventions (either automatic or semi-manual) in the optimization procedure, for example, incorporating further prior information on neurons’ morphology, or merging/splitting/deleting spatial components and detecting missed neurons from the residuals. These steps can significantly improve the quality of the model fitting; this is an advantage compared with PCA/ICA, which offers no easy option for incorporation of stronger prior information or manually guided improvements on the estimates.

Full details on the algorithms for initializing and then solving these three subproblems are provided in the Materials and methods section.

Figure 2. CNMF-E can accurately separate and recover the background fluctuations in simulated data. (A) An example frame of simulated microendoscopic data formed by summing up the fluorescent signals from the multiple sources illustrated in Figure 1E. (B) A zoomed-in version of the circle in (A). The green dot indicates the pixel of interest. The surrounding black pixels are its neighbors at a distance of 15 pixels. The red area approximates the size of a typical neuron in the simulation. (C) Raw fluorescence traces of the selected pixel and some of its neighbors on the black ring. Note the high correlation. (D) Fluorescence traces (raw data; true and estimated background; true and initial estimate of neural signal) from the center pixel as selected in (B). Note that the background dominates the raw data in this pixel, but nonetheless we can accurately estimate the background and subtract it away here. Scalebars: 10 seconds. Panels (E-G) show the cellular signals in the same frame as (A). (E) Ground truth neural activity. (F) The residual of the raw frame after subtracting the background estimated with CNMF-E; note the close correspondence with (E). (G) Same as (F), but the background is estimated with rank-1 NMF. A video showing (E-G) for all frames can be found at Video 2. (H) The mean correlation coefficient (over all pixels) between the true background fluctuations and the estimated background fluctuations. The rank of NMF varies and we run randomly initialized NMF 10 times for each rank. The red line is the performance of CNMF-E, which requires no selection of the NMF rank. (I) The performance of CNMF-E and rank-1 NMF in recovering the background fluctuations from the data superimposed with an increasing number of background sources.

Results

CNMF-E can reliably estimate large high-rank background fluctuations

We first use simulated data to illustrate the background model in CNMF-E and compare its performance against the low-rank NMF model used in the basic CNMF approach (Pnevmatikakis et al., 2016). We generated the observed fluorescence $Y$ by summing up simulated fluorescent signals of multiple sources as shown in Figure 1E plus additive Gaussian white noise (Figure 2A).

An example pixel (green dot, Figure 2A,B) was selected to illustrate the background model in CNMF-E (Equation (6)), which assumes that each pixel’s background activity can be reconstructed using its neighboring pixels’ activities. The selected neighbors form a ring and their distances to the center pixel are larger than a typical neuron size (Figure 2B). Figure 2C shows that the fluorescence traces of the center pixel and its neighbors are highly correlated due to the shared large background fluctuations. Here, for illustrative purposes, we fit the background by solving problem (P-B) directly while assuming $\hat{A} \hat{C} = 0$. This mistaken assumption should make the background estimation more challenging (due to true neural components getting absorbed into the background), but nonetheless in Figure 2 we see that the background fluctuation was well recovered (Figure 2D). Subtracting this estimated background from the observed fluorescence in the center yields a good visualization of the cellular signal (Figure 2D). Thus, this example shows that we can reconstruct a complicated background trace while leaving the neural signal uncontaminated.
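The ring-based reconstruction described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the CNMF-E implementation: the function names are hypothetical, the weights are fit by ordinary least squares via `numpy.linalg.lstsq`, and each pixel's temporal mean stands in for the constant baseline $b_{0}$ (which is what the constraint $B^{f} \cdot 1 = 0$ implies when $\hat{A} \hat{C} = 0$).

```python
import numpy as np

def ring_offsets(radius):
    """Integer (dy, dx) offsets whose distance to the origin lies in [radius, radius + 1)."""
    r = int(radius) + 1
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.hypot(dy, dx)
    mask = (dist >= radius) & (dist < radius + 1)
    return np.stack([dy[mask], dx[mask]], axis=1)

def fit_ring_background(movie, y, x, radius):
    """Reconstruct the fluctuating background of pixel (y, x) from its ring neighbors.

    movie has shape (T, H, W). Assumes A*C = 0, i.e. neural signals are ignored.
    Returns the estimated background trace (length T) and the fitted weights.
    """
    _, height, width = movie.shape
    centered = movie - movie.mean(axis=0)            # subtract the constant baseline
    nbrs = ring_offsets(radius) + np.array([y, x])
    inside = ((nbrs[:, 0] >= 0) & (nbrs[:, 0] < height) &
              (nbrs[:, 1] >= 0) & (nbrs[:, 1] < width))
    nbrs = nbrs[inside]                              # drop neighbors outside the field of view
    design = centered[:, nbrs[:, 0], nbrs[:, 1]]     # (T, n_neighbors)
    target = centered[:, y, x]
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ weights, weights
```

Because the ring radius exceeds a typical neuron's size, a neuron centered on the pixel of interest contributes little to the ring pixels, so the fitted trace mostly captures the shared background rather than the cell's own signal.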

For the example frame in Figure 2A, the true cellular signals are sparse and weak (Figure 2E). When we subtract the estimated background using CNMF-E from the raw data, we obtain a good recovery of the true signal (Figure 2D,F). For comparison, we also estimate the background activity by applying a rank-1 NMF model as used in basic CNMF; the resulting background-subtracted image is still severely contaminated by the background (Figure 2G). This is easy to understand: the spatiotemporal background signal in microendoscopic data typically has a rank higher than one (due to the various signal sources indicated in Figure 1E), and therefore a rank-1 NMF background model is insufficient.

A naive approach would be to simply increase the rank of the NMF background model. Figure 2H demonstrates that this approach is ineffective: higher rank NMF does yield generally better reconstruction performance, but with high variability and low reliability (due to randomness in the initial conditions of NMF). Eventually as the NMF rank increases many single-neuronal signals of interest are swallowed up in the estimated background signal (data not shown). In contrast, CNMF-E recovers the background signal more accurately than any of the high-rank NMF models.

In real data analysis settings, the rank of NMF is unknown and selecting its value is a nontrivial problem. We simulated data sets with different numbers of local background sources and used a single parameter setting to run CNMF-E for reconstructing the background over multiple such simulations. Figure 2I shows that the performance of CNMF-E does not degrade quickly as we have more background sources, in contrast to rank-1 NMF. Therefore, CNMF-E can recover the background accurately across a diverse range of background sources, as desired.

CNMF-E accurately initializes single-neuronal spatial and temporal components

Next, we used simulated data to validate our proposed initialization procedure (Figure 3A). In this example, we simulated 200 neurons with strong spatial overlaps (Figure 3B). One of the first steps in our initialization procedure is to apply a Gaussian spatial filter to the images to reduce the (spatially coarser) background and boost the power of neuron-sized objects in the images. In Figure 3C, we see that the local correlation image (Smith and Häusser, 2010) computed on the spatially filtered data provides a good initial visualization of neuron locations; compare to Figure 1B, where the correlation image computed on the raw data was highly corrupted by background signals.
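To make the notion of a local correlation image concrete, the following Python sketch computes, for every pixel, the mean Pearson correlation between its trace and the traces of its nearest neighbors. It is a simplified illustration rather than the code used in the paper: the function name is hypothetical, the choice of four nearest neighbors is an assumption, and the input movie is assumed to have been spatially filtered already.

```python
import numpy as np

def local_correlation_image(movie):
    """Mean correlation of each pixel's trace with its 4 nearest neighbors.

    movie has shape (T, H, W) with H, W >= 2; returns an (H, W) image.
    """
    _, height, width = movie.shape
    z = movie - movie.mean(axis=0)
    norm = np.sqrt((z ** 2).sum(axis=0))
    norm[norm == 0] = np.inf                 # constant pixels get correlation 0
    z = z / norm                             # unit-norm traces: dot product = correlation

    corr_sum = np.zeros((height, width))
    count = np.zeros((height, width))

    vertical = (z[:, :-1, :] * z[:, 1:, :]).sum(axis=0)     # pixel vs. pixel below
    corr_sum[:-1, :] += vertical
    corr_sum[1:, :] += vertical
    count[:-1, :] += 1
    count[1:, :] += 1

    horizontal = (z[:, :, :-1] * z[:, :, 1:]).sum(axis=0)   # pixel vs. pixel to the right
    corr_sum[:, :-1] += horizontal
    corr_sum[:, 1:] += horizontal
    count[:, :-1] += 1
    count[:, 1:] += 1

    return corr_sum / count
```

Pixels inside a neuron share that neuron's calcium signal and therefore score high, whereas pixels dominated by independent noise score near zero; on raw data, the shared background would raise the correlation everywhere, which is why the filtering step comes first.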

We choose two example ROIs to illustrate how CNMF-E removes the background contamination and demixes nearby neural signals for accurate initialization of neurons’ shapes and activity. In the first example, we choose a well-isolated neuron (green box, Figure 3A+B). We select three pixels located in the center, the periphery, and the outside of the neuron and show the corresponding fluorescence traces in both the raw data and the spatially filtered data (Figure 3D). The raw traces are noisy and highly correlated, but the filtered traces show relatively clean neural signals. This is because spatial filtering reduces the shared background activity and the remaining neural signals dominate the filtered data. Similarly, Figure 3E is an example showing how CNMF-E demixes two overlapping neurons. The filtered traces in the centers of the two neurons still preserve their own temporal activity.

Video 5. The results of CNMF-E in demixing the simulated data in Figure 4 (SNR reduction factor = 6). Conventions as in previous video. MP4 DOI: https://doi.org/10.7554/eLife.28728.011

After initializing the neurons’ traces using the spatially filtered data, we initialize our estimate of their spatial footprints. Note that simply initializing these spatial footprints with the spatially filtered data does not work well (data not shown), since the resulting shapes are distorted by the spatial filtering process. We found that it was more effective to initialize each spatial footprint by regressing the initial neuron traces onto the raw movie data (see Materials and methods for details). The initial values already match the simulated ground truth with fairly high fidelity (Figure 3D+E). In this simulated data, CNMF-E successfully identified all 200 neurons and initialized their spatial and temporal components (Figure 3F). We then evaluate the quality of initialization using all neurons’ spatial and temporal similarities with their counterparts in the ground truth data. Figure 3G shows that all initialized neurons have high similarities with the truth, indicating a good recovery and demixing of all neuron sources.

Thresholds on the minimum local correlation and the minimum peak-to-noise ratio (PNR) for detecting seed pixels are necessary for defining the initial spatial components. To quantify the sensitivity to the choice of these two thresholds, we plot the local correlations and the PNRs of all pixels chosen as the local maxima within an area of $\frac{l}{4} \times \frac{l}{4}$, where $l$ is the diameter of a typical neuron, in the correlation image or the PNR image (Figure 3H). Pixels are classified into two classes according to their locations relative to the closest neurons: neurons’ central areas and outside areas (see Materials and methods for full details). It is clear that the two classes are linearly well separated and the thresholds can be chosen within a broad range of values (Figure 3H), indicating that the algorithm is robust with respect to these threshold parameters here. In lower SNR settings, these boundaries may be less clear, and an incremental approach (in which we choose the highest-SNR neurons first, then estimate the background and examine the residual to select the lowest-SNR cells) may be preferred; this incremental approach is discussed in more depth in the Materials and methods section.
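The seed-pixel screening described above amounts to two threshold tests per candidate pixel. The Python sketch below illustrates the idea under stated assumptions: the function names and the default threshold values are hypothetical, and the noise level is estimated with a scaled median absolute deviation, which is a convenient robust stand-in and not necessarily the estimator used in the paper (see Materials and methods for the actual procedure).

```python
import statistics

def peak_to_noise_ratio(trace):
    """Peak height above the median, divided by a robust noise estimate."""
    med = statistics.median(trace)
    mad = statistics.median(abs(v - med) for v in trace)
    noise_sd = 1.4826 * mad              # MAD-to-SD factor for Gaussian noise (assumption)
    if noise_sd == 0:
        return float("inf")
    return (max(trace) - med) / noise_sd

def is_seed_pixel(local_corr, pnr, min_corr=0.8, min_pnr=10.0):
    """A pixel qualifies as a seed only if it passes both thresholds.

    The default values are placeholders for illustration, not recommended settings.
    """
    return local_corr >= min_corr and pnr >= min_pnr
```

A pixel at the center of an active neuron tends to pass both tests, whereas a pixel outside any neuron typically fails at least one; this is the separation that Figure 3H visualizes.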

CNMF-E recovers the true neural activity and is robust to noise contamination and neuronal correlations in simulated data

Using the same simulated dataset as in the previous section, we further refine the neuron shapes ($A$) and the temporal traces ($C$) by iteratively fitting the CNMF-E model. We compare the final results with PCA/ICA analysis (Mukamel et al., 2009) and the original CNMF method (Pnevmatikakis et al., 2016).

After choosing the thresholds for seed pixels (Figure 3H), we run CNMF-E in fully automatic mode, without any manual interventions. Two open-source MATLAB packages, CellSort (https://github.com/mukamel-lab/CellSort; Mukamel, 2016) and ca_source_extraction (https://github.com/epnev/ca_source_extraction; Pnevmatikakis, 2016), were used to perform PCA/ICA (Mukamel et al., 2009) and basic CNMF (Pnevmatikakis et al., 2016), respectively. Since the initialization algorithm in CNMF fails due to the large contaminations from the background fluctuations in this setting (recall Figure 2), we use the ground truth as its initialization. As for the rank of the background model in CNMF, we tried all integer values between 1 and 16 and set it to 7 because this value gave the best performance in matching the ground truth. We emphasize that including the CNMF approach in this comparison is not fair to the other two approaches, because it uses the ground truth heavily, while PCA/ICA and CNMF-E are blind to the ground truth. The purpose here is to show the limitations of basic CNMF in modeling the background activity in microendoscopic data.

Video 6. The results of CNMF-E in demixing dorsal striatum data. Top left: the recorded fluorescence data. Bottom left: the estimated background. Top middle: the residual of the raw video (top left) after subtracting the estimated background (bottom left). Bottom middle: the denoised neural signals. Top right: the residual of the raw video data (top left) after subtracting the estimated background (bottom left) and denoised neural signal (bottom middle). Bottom right: the denoised neural signals, with each neuron’s activity coded in pseudocolor. MP4 DOI: https://doi.org/10.7554/eLife.28728.014

Figure 3. CNMF-E accurately initializes individual neurons’ spatial and temporal components in simulated data. (A) An example frame of the simulated data. Green and red squares correspond to panels (D) and (E) below, respectively. (B) The temporal mean of the cellular activity in the simulation. (C) The correlation image computed using the spatially filtered data. (D) An example of initializing an isolated neuron. Three selected pixels correspond to the center, the periphery, and the outside of a neuron. The raw traces and the filtered traces are shown as well. The yellow dashed line is the true neural signal of the selected neuron. Triangle markers highlight the spike times from the neuron. (E) Same as (D), but two neurons are spatially overlapping in this example. Note that in both cases neural activity is clearly visible in the filtered traces, and the initial estimates of the spatial footprints are already quite accurate (dashed lines are ground truth). (F) The contours of all initialized neurons on top of the correlation image as shown in (C). Contour colors represent the rank of neurons’ SNR (SNR decreases from red to yellow). The blue dots are centers of the true neurons. (G) The spatial and the temporal cosine similarities between each simulated neuron and its counterpart in the initialized neurons. (H) The local correlation and the peak-to-noise ratio for pixels located in the central area of each neuron (blue) and other areas (green). The red lines are the thresholding boundaries for screening seed pixels in our initialization step. A video showing the whole initialization step can be found at Video 3.

We first pick three nearby neurons from the ground truth (Figure 4A, top) and see how well these neurons’ activities are recovered. PCA/ICA fails to identify one neuron, and for the other two identified neurons, it recovers temporal traces that are so noisy that small calcium transients are submerged in the noise. As for CNMF, the neuron shapes remain more or less at the initial condition (i.e. the ground truth spatial footprints), but clear contaminations in the temporal traces are visible. This is because the pure NMF model in CNMF does not model the true background well and the residuals in the background are mistakenly captured by neural components. In contrast, on this example, CNMF-E recovers the true neural shapes and neural activity with high accuracy.

We also compare the number of detected neurons: PCA/ICA detected 195 out of 200 neurons, while CNMF-E detected all 200 neurons. We also quantitatively evaluated the performance of source extraction by showing the spatial and temporal cosine similarities between detected neurons and ground truth (Figure 4C); we find that the neurons detected using PCA/ICA have much lower similarities with the ground truth (Figure 4C). We also note that the CNMF results are much worse than those of CNMF-E here, despite the fact that CNMF is initialized at the ground truth parameter values. This result clarifies an important point: the improvements from CNMF-E are not simply due to improvements in the initialization step. Furthermore, running the full iterative pipeline of CNMF-E leads to improvements in both spatial and temporal similarities, compared with the results in the initialization step.
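For readers unfamiliar with the similarity metric used here, the Python sketch below computes the cosine similarity between two vectors (flattened spatial footprints or temporal traces) and pairs each ground-truth component with its most similar estimate. The function names and the simple best-match rule are assumptions made for illustration; the matching procedure used for Figure 4C may differ.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def best_matches(true_vectors, estimated_vectors):
    """For each ground-truth vector, return (index, similarity) of its most similar estimate."""
    matches = []
    for truth in true_vectors:
        sims = [cosine_similarity(truth, est) for est in estimated_vectors]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches
```

A similarity of 1 means the estimate is a positive rescaling of the truth, so the metric is insensitive to the arbitrary scaling between a footprint and its trace in a matrix factorization.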

In many downstream analyses of calcium imaging data, pairwise correlations provide an important metric to study coordinated network activity (Warp et al., 2012; Barbera et al., 2016; Dombeck et al., 2009; Klaus et al., 2017). Since PCA/ICA seeks statistically independent components, which forces the temporal traces to have near-zero correlation, the correlation structure is badly corrupted in the raw PCA/ICA outputs (Figure 4D). We observed that a large proportion of the independence comes from the noisy baselines in the extracted traces (data not shown), so we postprocessed the PCA/ICA output by thresholding at the 3 standard deviation level. This recovers some nonzero correlations, but the true correlation structure is not recovered accurately (Figure 4D). By contrast, the CNMF-E results matched the ground truth very well due to accurate extraction of individual neurons’ temporal activity (Figure 4D). As for CNMF, the estimated correlations are slightly elevated relative to the true correlations. This is due to the shared (highly correlated) background fluctuations that corrupt the recovered activity of nearby neurons.
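The post-processing step mentioned above can be illustrated with a short Python sketch: each trace is thresholded at a multiple of its standard deviation, and pairwise correlations are then computed on the thresholded traces. The function names are hypothetical, and the exact form of the thresholding (zeroing samples below $k$ standard deviations of the trace itself) is an assumption, since the text only states that a 3 standard deviation level was used.

```python
import math
import statistics

def threshold_at_k_sd(trace, k=3.0):
    """Keep samples at or above k standard deviations of the trace; set the rest to zero."""
    sd = statistics.pstdev(trace)
    return [v if v >= k * sd else 0.0 for v in trace]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(traces):
    """Pairwise Pearson correlations between all traces."""
    return [[pearson(a, b) for b in traces] for a in traces]
```

Thresholding removes the noisy baseline that drives the extracted traces toward independence, but it cannot restore correlations that the demixing step has already distorted, which is consistent with the partial recovery reported in Figure 4D.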

Next, we compared the performance of the different methods under different SNR regimes. Because of the inferior results above, we skip comparisons to the basic CNMF here. Based on the same simulation parameters as above, we vary the noise level $\Sigma$ by multiplying it with an SNR reduction factor. Figure 4E shows that CNMF-E detects all neurons over a wide SNR range, while PCA/ICA fails to detect the majority of neurons when the SNR drops to sufficiently low levels. Moreover, the detected neurons in CNMF-E preserve high spatial and temporal similarities with the ground truth (Figure 4F-G). This high accuracy of extracting neurons’ temporal activity benefits from the modeling of the calcium dynamics, which leads to significantly denoised neural activity. If we skip the temporal denoising step in the algorithm, CNMF-E is less robust to noise, but still outperforms PCA/ICA significantly (Figure 4G). When SNR is low, the improvements yielded by CNMF-E can be crucial for detecting weak neuron events, as shown in Figure 4H.

Finally, we examine the ability of CNMF-E to demix correlated and overlapping neurons. Using the two example neurons in Figure 3E, we ran multiple simulations at varying correlation levels and extracted neural components using the CNMF-E pipeline and PCA/ICA analysis. The spatial footprints in these simulations were fixed, but the temporal components were varied to have different correlation levels ($\gamma$) between calcium traces by tuning their shared component with the common background fluctuations. For high correlation levels ($\gamma > 0.7$), the initialization procedure tends to first initialize a component that explains the common activity between two neurons and then initialize another component to account for the residual of one neuron. After iteratively refining the model variables, CNMF-E successfully extracted the two neurons’ spatiotemporal activity even at very high correlation levels ($\gamma = 0.95$; Figure 5A,B). PCA/ICA was also often able to separate two neurons for large correlation levels ($\gamma = 0.9$, Figure 5B), but the extracted traces have problematic negative spikes that serve to reduce their statistical dependences (Figure 4A).

Application to dorsal striatum data

We now turn to the analysis of large-scale microendoscopic datasets recorded from freely behaving mice. We run both CNMF-E and PCA/ICA for all datasets and compare their performances in detail.

We begin by analyzing in vivo calcium imaging data of neurons expressing GCaMP6f in the mouse dorsal striatum. (Full experimental details and algorithm parameter settings for this and the following datasets appear in the Materials and methods section.) CNMF-E extracted 692 putative neural components from this dataset; PCA/ICA extracted 547 components (starting from 700 initial components, and then automatically removing false positives using the same criterion as applied in CNMF-E). Figure 6A shows how CNMF-E decomposes an example frame into four components: the constant baselines that are invariant over time, the fluctuating background, the denoised neural signals, and the residuals. We highlight an example neuron by drawing its ROI to demonstrate the power of CNMF-E in isolating fluorescence signals of neurons from the background fluctuations. For the selected neuron, we plot the mean fluorescence trace of the raw data and the estimated background (Figure 6B). These two traces are very similar, indicating that the background fluctuation dominates the raw data. By subtracting this estimated background component from the raw data, we acquire a clean trace that represents the neural signal.

Video 7. The results of CNMF-E in demixing PFC data. Conventions as in previous video. MP4 DOI: https://doi.org/10.7554/eLife.28728.016

Video 8. Comparison of CNMF-E with PCA/ICA in demixing overlapped neurons in Figure 7G. Top left: the recorded fluorescence data. Bottom left: the residual of the raw video (top left) after subtracting the estimated background using CNMF-E. Top middle and top right: the spatiotemporal activity and temporal traces of three neurons extracted using CNMF-E. Bottom middle and bottom right: the spatiotemporal activity and temporal traces of three neurons extracted using PCA/ICA. MP4 DOI: https://doi.org/10.7554/eLife.28728.017

To quantify the background effects further, we compute the contribution of each signal component in explaining the variance in the raw data. For each pixel, we compute the variance of the raw data first and then compute the variance of the background-subtracted data. Then the reduced variance is divided by the variance of the raw data, giving the proportion of variance explained by the background. Figure 6C (blue) shows the distribution of the background-explained variance over all pixels. The background accounts for around 90% of the variance on average. We further remove the denoised neural signals and compute the variance reduction; Figure 6C shows that neural signals account for less than 10% of the raw signal variance. This analysis is consistent with our observations that background dominates the fluorescence signal and extracting high-quality neural signals requires careful background signal removal.
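The per-pixel variance calculation described above can be written out as follows. This Python sketch follows the recipe in the text for a single pixel; the function names are hypothetical and the example numbers in the usage note are made up purely to show the arithmetic.

```python
import statistics

def variance_reduction(before, after):
    """Fraction of the variance of `before` that disappears in `after`."""
    return statistics.pvariance(before) - statistics.pvariance(after)

def explained_fractions(raw, background, neural):
    """Proportion of one pixel's raw variance explained by background and by neural signal.

    All three inputs are equal-length traces for the same pixel.
    """
    raw_var = statistics.pvariance(raw)
    if raw_var == 0:
        return 0.0, 0.0
    no_bg = [r - b for r, b in zip(raw, background)]       # background-subtracted data
    resid = [d - n for d, n in zip(no_bg, neural)]         # then remove neural signal
    bg_frac = variance_reduction(raw, no_bg) / raw_var
    neural_frac = variance_reduction(no_bg, resid) / raw_var
    return bg_frac, neural_frac
```

For a toy pixel with background `[0, 3, 0, -3]` and neural signal `[0, 0, 1, 0]`, each repeated 25 times, the raw variance is 4.6875 and the background-subtracted variance is 0.1875, so the background explains 96% of the variance and the neural signal the remaining 4%.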

The contours of the spatial footprints inferred by the two approaches (PCA/ICA and CNMF-E) are depicted in Figure 6D, superimposed on the correlation image of the filtered raw data. The indicated area was cropped from Figure 6A (left). In this case, most neurons inferred by PCA/ICA were inferred by CNMF-E as well, with the exception of a few components that seemed to be false positives (judging by their spatial shapes and temporal traces and visual inspection of the raw data movie; detailed data not shown). However, many realistic components were only detected by CNMF-E (shown as the green areas in Figure 6D). In these plots, we rank the inferred components according to their SNRs; the color indicates the relative rank (decaying from red to yellow). We see that the components missed by PCA/ICA have low SNRs (green shaded areas with yellow contours).

Figure 6E shows the spatial and temporal components of 14 example neurons detected only by CNMF-E. Here (and in the following figures), for illustrative purposes, we show the calcium traces before the temporal denoising step. For neurons that are inferred by both methods, CNMF-E shows significant improvements in the SNR of the extracted cellular signals (Figure 6F), even before the temporal denoising step is applied. In panel (G), we randomly select 10 examples and examine their spatial and temporal components. Compared with the CNMF-E results, PCA/ICA components have much smaller size, often with negative dips surrounding the neuron (remember that ICA avoids spatial overlaps in order to reduce nearby neurons’ statistical dependences, leading to some loss of signal strength; see Pnevmatikakis et al., 2016 for further discussion). The activity traces extracted by CNMF-E are visually cleaner than the PCA/ICA traces; this is important for reliable event detection, particularly in low SNR examples. See Klaus et al. (2017) for additional examples of CNMF-E applied to striatal data.

Video 9. The results of CNMF-E in demixing ventral hippocampus data. Conventions as in Video 6. MP4 DOI: https://doi.org/10.7554/eLife.28728.019

Application to data in prefrontal cortex

We repeat a similar analysis on GCaMP6s data recorded from prefrontal cortex (PFC, Figure 7), to quantify the performance of the algorithm in a different brain area with a different calcium indicator. Again we find that CNMF-E successfully extracts neural signals from a strong fluctuating background (Figure 7A), which contributes a large proportion of the variance in the raw data (Figure 7B). Similarly as with the striatum data, PCA/ICA analysis missed many components that have very weak signals ( 33 missed components here). For the matched neurons, CNMF-E shows strong improvements in the SNRs of the extracted traces (Figure 7D). Consistent with our observation in striatum (Figure 6G), the spatial footprints of PCA/ICA components are shrunk to promote statistical independence between neurons, while the neurons inferred by CNMF-E have visually reasonable morphologies (Figure 6E). As for calcium traces with high SNRs (Figure 7E, cell 1-6), CNMF-E traces have smaller noise values, which is important for detecting small calcium transients (Figure 7E, cell 4). For traces with low SNRs (Figure 7, cell 7-10), it is challenging to detect any calcium events from the PCA/ICA traces due to the large noise variance; CNMF-E is able to visually recover many of these weaker signals. For those cells missed by PCA/ICA, their traces extracted by CNMF-E have reasonable morphologies and visible calcium events (Figure 7F).

The demixing performance of PCA/ICA analysis can be relatively weak because it is inherently a linear demixing method (Pnevmatikakis et al., 2016). Since CNMF-E uses a more suitable nonlinear matrix factorization method, it has a better capability of demixing spatially overlapping neurons. As an example, Figure 7G shows three close-by neurons identified by both CNMF-E and PCA/ICA analysis. PCA/ICA forces its obtained filters to be spatially separated to reduce their dependence (thus reducing the effective signal strength), while CNMF-E allows inferred spatial components to have large overlaps (Figure 7G, left), retaining the full signal power. In the traces extracted by PCA/ICA, the component labeled in green contains many negative ‘spikes,’ which are highly correlated with the spiking activity of the blue neuron (Figure 7G, yellow). In addition, the green PCA/ICA neuron has significant crosstalk with the red neuron due to the failure of signal demixing (Figure 7G, cyan); the CNMF-E traces show no comparable negative ‘spikes’ or crosstalk. See also Video 8 for further details.

Application to ventral hippocampus neurons

In the previous two examples, we analyzed data with densely packed neurons, in which the neuron sizes are all similar. In the next example, we apply CNMF-E to a dataset with much sparser and more heterogeneous neural signals. The data used here were recorded from amygdala-projecting neurons expressing GCaMP6f in ventral hippocampus. In this dataset, some neurons that are slightly above or below the focal plane were visible with prominent signals, though their spatial shapes are larger than neurons in the focal plane.

This example is somewhat more challenging due to the large diversity of neuron sizes. It is possible to set multiple parameters to detect neurons of different sizes (or to e.g. differentially detect somas versus smaller segments of axons or dendrites passing through the focal plane), but for illustrative purposes here we use a single neural size parameter to initialize all of the components. This in turn splits some large neurons into multiple components. Following this crude initialization step, we updated the background component and then picked the missing neurons from the residual using a second greedy component initialization step. Next, we ran CNMF-E for three iterations of

Figure 4. CNMF-E outperforms PCA/ICA analysis in extracting individual neurons’ activity from simulated data and is robust to low SNR. (A) The results of PCA/ICA, CNMF, and CNMF-E in recovering the spatial footprints and temporal traces of three example neurons. The trace colors match the neuron colors shown in the left. (B) The intermediate residual sum of squares (RSS) values (normalized by the final RSS value), during the CNMF-E model fitting. The ‘refine initialization’ step refers to the modification of the initialization results in the case of high temporal correlation (details in Materials and methods). (C) The spatial and the temporal cosine similarities between the ground truth and the neurons detected using different methods. (D) The pairwise correlations between the calcium activity traces extracted using different methods. (E-G) The performances of PCA/ICA and CNMF-E under different noise levels: the number of missed neurons (E), and the spatial (F) and temporal (G) cosine similarities between the extracted components and the ground truth. (H) The calcium traces of one example neuron: the ground truth (black), the PCA/ICA trace (blue), the CNMF-E trace (red) and the CNMF-E trace without being denoised (cyan). The similarity values shown in the figure are computed as the cosine similarity between each trace and the ground truth (black). Two videos showing the demixing results of the simulated data can be found in Video 4 (SNR reduction factor $= 1$) and Video 5 (SNR reduction factor $= 6$).
DOI: https://doi.org/10.7554/eLife.28728.009
updating the model variables $A$, $C$, and $B$. The first two iterations were performed automatically; we included manual interventions (e.g. merging/deleting components) before the last iteration, leading to improved source extraction results (see Video 10 for details on the manual merge and delete interventions performed here). In this example, we detected 24 CNMF-E components and 24 PCA/ICA components. The contours of these inferred neurons are shown in Figure 8A. In total we have 20 components detected by both methods (shown in the first three rows of Figure 8B+C); each method detected extra components that are not detected by the other (the last rows of Figure 8B+C). Once again, the PCA/ICA filters contain many negative pixels in an effort to reduce spatial overlaps; see components 3 and 5 in Figure 8A-C, for example. All traces of the inferred neurons are shown in Figure 8D+E. We can see that the CNMF-E traces have much lower noise levels and cleaner neural signals in both high and low SNR settings. Conversely, the calcium traces of the three extra neurons identified by PCA/ICA show noisy signals that are unlikely to be neural responses.

Application to footshock responses in the bed nucleus of the stria terminalis (BNST)

Identifying neurons and extracting their temporal activity is typically just the first step in the analysis of calcium imaging data; downstream analyses rely heavily on the quality of this initial source extraction. We showed above that, compared to PCA/ICA, CNMF-E is better at extracting activity dynamics, especially in regimes where neuronal activities are correlated (c.f. Figure 4D). Using in vivo electrophysiological recordings, we previously showed that neurons in the bed nucleus of the stria terminalis (BNST) show strong responses to unpredictable footshock stimuli (Jennings et al., 2013). We therefore measured

Video 10. Extracted spatial and temporal components of CNMF-E at different stages (ventral hippocampal dataset). After initializing components, we ran matrix updates and interventions in automatic mode, resulting in 32 components in total. In the next iteration, we manually deleted 6 components and automatically merged neurons as well. In the last iterations, 4 neurons were merged into 2 neurons with manual verifications. The correlation image in the top left panel is computed from the background-subtracted data in the final step. MP4
DOI: https://doi.org/10.7554/eLife.28728.020

calcium dynamics in CaMKII-expressing neurons that were transfected with the calcium indicator GCaMP6s in the BNST and analyzed the synchronous activity of multiple neurons in response to unpredictable footshock stimuli. We chose 12 example neurons that were detected by both CNMF-E and PCA/ICA methods and show their spatial and temporal components in Figure 9A-C. The activity around the onset of the repeated stimuli is aligned and shown as pseudo-colored images in panel D. The median responses of CNMF-E neurons display prominent responses to the footshock stimuli compared with the resting state before stimuli onset. In comparison, the activity dynamics extracted by PCA/ICA have relatively low SNR, making it more challenging to reliably extract footshock responses. Panel E summarizes the results of panel D; we see that CNMF-E outputs significantly more easily detectable responses than does PCA/ICA. This is an example in which downstream analyses of calcium imaging data can significantly benefit from the improvements in the accuracy of source extraction offered by CNMF-E. (Sheintuch et al. (2017) recently presented another such example, showing that more neurons can be tracked across multiple days using CNMF-E outputs, compared to PCA/ICA.)

Conclusion

Microendoscopic calcium imaging offers unique advantages and has quickly become a critical method for recording large neural populations during unrestrained behavior. However, previous methods fail to adequately remove background contamination when demixing single neuron activity from the raw data. Since strong background signals are largely inescapable in the context of one-photon imaging, insufficient removal of the background could yield problematic conclusions in downstream analysis. This has presented a severe and well-known bottleneck in the field. We have delivered a solution for this critical problem, building on the constrained nonnegative matrix factorization framework introduced in Pnevmatikakis et al., 2016 but significantly extending it in order to more accurately and robustly remove these contaminating background components.

Figure 5. CNMF-E is able to demix neurons with high temporal correlations. (A) An example simulation from the experiments summarized in panel (B), where $corr (c_{1}, c_{2})$ is 0.9: green and red traces correspond to the corresponding neuronal shapes in the left panels. The blue trace is the mean background fluorescence fluctuation over the whole FOV. (B) The extraction accuracy of the spatial ($a_{1}$ and $a_{2}$) and the temporal ($c_{1}$ and $c_{2}$) components of two close-by neurons, computed via the cosine similarity between the ground truth and the extraction results.
DOI: https://doi.org/10.7554/eLife.28728.012

The proposed CNMF-E algorithm can be used in either automatic or semi-automatic mode, and leads to significant improvements in the accuracy of source extraction compared with previous methods. In addition, CNMF-E requires very few parameters to be specified, and these parameters are easily interpretable and can be selected within a broad range. We demonstrated the power of CNMF-E using data from a wide diversity of brain areas (subcortical, cortical, and deep brain areas), SNR regimes, calcium indicators, neuron sizes and densities, and hardware setups. Among all these examples (and many others not shown here), CNMF-E performs well and improves significantly on the standard PCA/ICA approach. Considering that source extraction is typically just the first step in calcium imaging data analysis pipelines (Mohammed et al., 2016), these improvements should in turn lead to more stable and interpretable results from downstream analyses. Further applications of the CNMF-E approach appear in (Cameron et al., 2016; Donahue and Kreitzer, 2017; Jimenez et al., 2016; Jimenez et al., 2018; Klaus et al., 2017; Lin et al., 2017; Murugan et al., 2016; Murugan et al., 2017; Rodriguez-Romaguera et al., 2017; Tombaz et al., 2016; Ung et al.,

Figure 6. Neurons expressing GCaMP6f recorded in vivo in mouse dorsal striatum area. (A) An example frame of the raw data and its four components decomposed by CNMF-E. (B) The mean fluorescence traces of the raw data (black), the estimated background activity (blue), and the background-subtracted data (red) within the segmented area (red) in (A). The variance of the black trace is about $2\times$ the variance of the blue trace and $4\times$ the variance of the red trace. (C) The distributions of the variance explained by different components over all pixels; note that estimated background signals dominate the total variance of the signal. (D) The contour plot of all neurons detected by CNMF-E and PCA/ICA superimposed on the correlation image. Green areas represent the components that are only detected by CNMF-E. The components are sorted in decreasing order based on their SNRs (from red to yellow). (E) The spatial and temporal components of 14 example neurons that are only detected by CNMF-E. These neurons all correspond to green areas in (D). (F) The signal-to-noise ratios (SNRs) of all neurons detected by both methods. Colors match the example traces shown in (G), which shows the spatial and temporal components of 10 example neurons detected by both methods. Scalebar: 10 s. See Video 6 for the demixing results.
DOI: https://doi.org/10.7554/eLife.28728.013

2017; Yu et al., 2017; Mackevicius et al., 2017; Madangopal et al., 2017; Roberts et al., 2017; Ryan et al., 2017; Roberts et al., 2017; Sheintuch et al., 2017).

We have released our MATLAB implementation of CNMF-E as open-source software (https://github.com/zhoupc/CNMF_E (Zhou, 2017a)). A Python implementation has also been incorporated into the CaImAn toolbox (Giovannucci et al., 2017b). We welcome additions or suggestions for modifications of the code, and hope that the large and growing microendoscopic imaging community finds CNMF-E to be a helpful tool in furthering neuroscience research.

Materials and methods

Algorithm for solving problem (P-S)

In problem (P-S), $b_{0}$ is unconstrained and can be updated in closed form: $\hat{b}_{0} = \frac{1}{T} (Y - A \cdot \hat{C} - \hat{B}^{f}) \cdot 1$. By plugging this update into problem (P-S), we get a reduced problem

\begin{array}{cl} \underset{A}{minimize} & ‖ \tilde{Y} - A \cdot \tilde{C} ‖_{F}^{2} \\ subject to & A \geq 0, A is local and sparse, \end{array}

where $\tilde{Y} = Y - \hat{B}^{f} - \frac{1}{T} Y 1 1^{T}$ and $\tilde{C} = \hat{C} - \frac{1}{T} \hat{C} 1 1^{T}$. We approach this problem using a version of ‘hierarchical alternating least squares’ (HALS; Cichocki et al., 2007), a standard algorithm for nonnegative matrix factorization. Friedrich et al. (2017b) modified the fastHALS algorithm (Cichocki and Phan, 2009) to estimate the nonnegative spatial components $A, b$ and the nonnegative temporal activity $C, f$ in the CNMF model $Y = A \cdot C + b f^{T} + E$ by including sparsity and localization constraints. We solve a problem similar to the subproblem solved in Friedrich et al. (2017b):

\begin{array}{cl} \underset{A}{minimize} & ‖ \tilde{Y} - A \cdot \tilde{C} ‖_{F}^{2} \\ subject to & A \geq 0, A_{i k} = 0 if x_{i} \notin P_{k}, \end{array}

where $P_{k}$ denotes the spatial patch constraining the nonzero pixels of the $k$-th neuron and restricts the candidate spatial support of neuron $k$. This regularization reduces the number of free parameters in $A$, leading to speed and accuracy improvements. The spatial patches can be determined using a mildly dilated version of the support of the previous estimate of $A$ (Pnevmatikakis et al., 2016; Friedrich et al., 2017a).
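
The patch-constrained update described above can be sketched in a few lines of NumPy. This is only an illustrative sketch of a HALS-style sweep, not the released CNMF-E code: the function name `hals_spatial_update`, the boolean `masks` array encoding the patches $P_{k}$, and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np

def hals_spatial_update(Y_tilde, A, C_tilde, masks, n_iter=10):
    """HALS-style sweep over the columns of A (illustrative sketch).

    Y_tilde : (d, T) data with background and temporal mean removed
    A       : (d, K) current nonnegative spatial footprints
    C_tilde : (K, T) mean-centered temporal traces
    masks   : (d, K) boolean, True where pixel i lies inside patch P_k
              (hypothetical encoding of the spatial patches)
    """
    A = A.copy()
    U = Y_tilde @ C_tilde.T   # (d, K) data/trace cross-products
    V = C_tilde @ C_tilde.T   # (K, K) trace Gram matrix
    for _ in range(n_iter):
        for k in range(A.shape[1]):
            if V[k, k] <= 0:
                continue  # flat trace: nothing to fit for this component
            # exact minimizer for column k with the other columns fixed
            a_k = A[:, k] + (U[:, k] - A @ V[:, k]) / V[k, k]
            a_k = np.maximum(a_k, 0.0)   # nonnegativity constraint
            a_k[~masks[:, k]] = 0.0      # support restricted to P_k
            A[:, k] = a_k
    return A
```

Because each column update is separable across pixels, clipping at zero and zeroing pixels outside the patch gives the exact constrained minimizer for that column, so the objective never increases from one sweep to the next.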

Algorithms for solving problem (P-T)

In problem (P-T), the model variable $C \in R^{K \times T}$ could be very large, making the direct solution of (P-T) computationally expensive. Unlike problem (P-S), the problem (P-T) cannot be readily parallelized because the constraints $G^{(i)} c_{i} \geq 0$ couple the entries within each row of $C$, and the residual term couples entries across columns. Here, we follow the block coordinate-descent approach used in (Pnevmatikakis et al., 2016) and propose an algorithm that sequentially updates each $c_{i}$ and $b_{0}$. For each neuron, we start with a simple unconstrained estimate of $c_{i}$, denoted as $\hat{y}_{i}$, that minimizes the residual of the spatiotemporal data matrix while fixing other neurons’ spatiotemporal activity and the baseline term $b_{0}$,

{\hat{y}}_{i} = \underset{c_{i} \in R^{T}}{argmin} {‖ Y - {\hat{A}}_{∖ i} \cdot {\hat{C}}_{∖ i} - {\hat{a}}_{i} c_{i} - {\hat{b}}_{0} \cdot 1^{T} - {\hat{B}}^{f} ‖}_{F}^{2} = {\hat{c}}_{i} + \frac{{\hat{a}}_{i}^{T} \cdot Y_{res}}{{\hat{a}}_{i}^{T} {\hat{a}}_{i}},

where $Y_{res} = Y - \hat{A} \hat{C} - \hat{b}_{0} 1^{T} - \hat{B}^{f}$ represents the residual given the current estimate of the model variables. Due to its unconstrained nature, $\hat{y}_{i}$ is a noisy estimate of $c_{i}$, plus a constant baseline resulting from inaccurate estimation of $b_{0}$. Given $\hat{y}_{i}$, various deconvolution algorithms can be applied to obtain the denoised trace $\hat{c}_{i}$ and deconvolved signal $\hat{s}_{i}$ (Vogelstein et al., 2009; Pnevmatikakis et al., 2013; Deneux et al., 2016; Friedrich et al., 2017b; Jewell and Witten, 2017); in CNMF-E, we use the OASIS algorithm from (Friedrich et al., 2017b). (Note that the estimation of $c_{i}$ is not dependent on accurate estimation of $b_{0}$, because the algorithm for estimating $c_{i}$ will also automatically estimate the baseline term in $\hat{y}_{i}$.) After the $c_{i}$'s are updated,

Video 11. The results of CNMF-E in demixing BNST data. Conventions as in Video 6. MP4
DOI: https://doi.org/10.7554/eLife.28728.022

Figure 7. Neurons expressing GCaMP6s recorded in vivo in mouse prefrontal cortex. (A-F) follow similar conventions as in the corresponding panels of Figure 6. (G) Three example neurons that are close to each other and detected by both methods. Yellow shaded areas highlight the negative ‘spikes’ correlated with nearby activity, and the cyan shaded area highlights one crosstalk between nearby neurons. Scalebar: 20 s. See Video 7 for the demixing results and Video 8 for the comparison of CNMF-E and PCA/ICA in the zoomed-in area of (G).
DOI: https://doi.org/10.7554/eLife.28728.015
we update $b_{0}$ using the closed-form expression $\hat{b}_{0} = \frac{1}{T} (Y - \hat{A} \cdot \hat{C} - \hat{B}^{f}) \cdot 1$.
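
As a rough illustration of one sequential pass of this temporal update, the sketch below computes the unconstrained estimate $\hat{y}_{i}$ for each neuron, passes it through a denoiser, and then refreshes $b_{0}$ in closed form. The function name `update_temporal` and its `denoise` argument are hypothetical; in particular, the default denoiser here is a plain nonnegativity clip used as a stand-in, not the OASIS deconvolution that CNMF-E actually uses.

```python
import numpy as np

def update_temporal(Y, A, C, b0, Bf, denoise=None):
    """One block coordinate-descent pass over the rows of C (sketch).

    Y  : (d, T) raw data          A  : (d, K) spatial footprints
    C  : (K, T) temporal traces   b0 : (d,)   constant baseline
    Bf : (d, T) fluctuating background
    denoise : callable mapping a length-T trace to a denoised trace.
              The default is a simple clip at zero, a placeholder for
              a real deconvolution method (assumption of this sketch).
    """
    if denoise is None:
        denoise = lambda y: np.maximum(y, 0.0)
    C = C.copy()
    Y_res = Y - A @ C - b0[:, None] - Bf          # current residual
    for i in range(A.shape[1]):
        a_i = A[:, i]
        # unconstrained estimate: y_i = c_i + a_i^T Y_res / (a_i^T a_i)
        y_i = C[i] + a_i @ Y_res / (a_i @ a_i)
        c_new = denoise(y_i)
        Y_res -= np.outer(a_i, c_new - C[i])      # keep residual in sync
        C[i] = c_new
    # closed-form baseline: temporal mean of Y - A C - Bf at each pixel
    b0_new = (Y - A @ C - Bf).mean(axis=1)
    return C, b0_new
```

Updating the residual in place after each neuron means later neurons see the already-refined traces of earlier ones, which is what makes the pass sequential rather than parallel.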

Estimating background by solving problem (P-B)

Next we discuss our algorithm for estimating the spatiotemporal background signal by solving problem (P-B) as a linear regression problem given $\hat{A}$ and $\hat{C}$. Since $B^{f} \cdot 1 = 0$, we can easily estimate the constant baselines for each pixel as

Figure 8. Neurons expressing GCaMP6f recorded in vivo in mouse ventral hippocampus. (A) Contours of all neurons detected by CNMF-E (red) and PCA/ICA method (green). The grayscale image is the local correlation image of the background-subtracted video data, with background estimated using CNMF-E. (B) Spatial components of all neurons detected by CNMF-E. The neurons in the first three rows are also detected by PCA/ICA, while the neurons in the last row are only detected by CNMF-E. (C) Spatial components of all neurons detected by PCA/ICA; similar to (B), the neurons in the first three rows are also detected by CNMF-E and the neurons in the last row are only detected by PCA/ICA method. (D) Temporal traces of all detected components in (B). ‘Match’ indicates neurons in top three rows in panel (B); ‘Other’ indicates neurons in the fourth row. (E) Temporal traces of all components in (C). Scalebars: 20 seconds. See Video 9 for demixing results.
DOI: https://doi.org/10.7554/eLife.28728.018

{\hat{b}}_{0} = \frac{1}{T} (Y - \hat{A} \cdot \hat{C}) \cdot 1

Next we replace the $b_{0}$ in (P-B) with this estimate and rewrite (P-B) as

\begin{array}{cl} \underset{W}{minimize} & ‖ X - W \cdot X ‖_{F}^{2}, \\ subject to & W_{i j} = 0 if dist (x_{i}, x_{j}) \notin [l_{n}, l_{n} + 1), \end{array}

where $X = Y - \hat{A} \cdot \hat{C} - \hat{b}_{0} 1^{T}$. Given the optimized $\hat{W}$, our estimation of the fluctuating background is $\hat{B}^{f} = \hat{W} X$. The new optimization problem (P-W) can be readily parallelized into $d$ linear regression problems, one for each pixel. By estimating all rows $W_{i, :}$, we are able to obtain the whole background signal as

\hat{B} = \hat{W} X + {\hat{b}}_{0} 1^{T}
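
A per-pixel version of this regression might look like the following. It is a minimal sketch assuming row-major pixel indexing and an ordinary least-squares solver; the helper names `ring_neighbors` and `estimate_background` are made up for the example, and a practical implementation would cache the ring indices and solve the $d$ regressions in parallel.

```python
import numpy as np

def ring_neighbors(shape, i, l_n):
    """Indices j with dist(x_i, x_j) in [l_n, l_n + 1), row-major layout."""
    h, w = shape
    r, c = divmod(i, w)
    rr, cc = np.mgrid[0:h, 0:w]
    dist = np.hypot(rr - r, cc - c)
    return np.flatnonzero((dist >= l_n) & (dist < l_n + 1))

def estimate_background(Y, A, C, shape, l_n):
    """Estimate the baseline b0 and fluctuating background Bf (sketch)."""
    b0 = (Y - A @ C).mean(axis=1)        # constant baseline per pixel
    X = Y - A @ C - b0[:, None]          # zero-mean residual in time
    Bf = np.empty_like(X)
    for i in range(X.shape[0]):          # one independent regression per pixel
        ring = ring_neighbors(shape, i, l_n)
        # regress pixel i on its ring pixels: X[i] ~ w @ X[ring]
        w, *_ = np.linalg.lstsq(X[ring].T, X[i], rcond=None)
        Bf[i] = w @ X[ring]
    return b0, Bf
```

The full background estimate is then `Bf + b0[:, None]`, matching the expression above. Since each pixel is excluded from its own ring whenever `l_n >= 1`, the regression cannot trivially copy the pixel's own trace.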

In some cases, $X$ might include large residuals from the inaccurate estimation of the neurons’ spatiotemporal activity $A C$, for example, missing neurons in the estimation. These residuals act as outliers and distort the estimation of $\hat{B}^{f}$ and $b_{0}$. To overcome this problem, we use robust least squares regression (RLSR) via hard thresholding to avoid contamination from the outliers (Bhatia et al., 2015). Before solving the problem (P-W), we compute $B^{-} = \hat{W} (Y - \hat{A} \cdot \hat{C} - \hat{b}_{0} 1^{T})$ (the current estimate of the fluctuating background) and then apply a simple clipping preprocessing step to $X$:

X_{i t}^{clipped} = \begin{cases} B_{i t}^{-} & if X_{i t} \geq B_{i t}^{-} + ζ \cdot σ_{i} \\ X_{i t} & otherwise \end{cases}

Then we update the regression estimate using $X^{clipped}$ instead of $X$, and iterate. Here, $σ_{i}$ is the standard deviation of the noise at $x_{i}$ and its value can be estimated using the power spectral density (PSD) method (Pnevmatikakis et al., 2016). As for the first iteration of the model fitting, we set each $B_{i t}^{-} = \frac{1}{| Ω_{i} |} \sum_{j \in Ω_{i}} \tilde{X}_{j t}$ as the mean of the $\tilde{X}_{j t}$ for all $j \in Ω_{i}$. The thresholding coefficient $ζ$ can be specified by users, although we have found a fixed default works well across the datasets used here. This preprocessing removes most calcium transients by replacing those frames with the previously estimated background only. As a result, it increases the robustness to inaccurate estimation of $A C$, and in turn leads to a better extraction of $A C$ in the following iterations.
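
The clipping rule is simple enough to write as a single vectorized expression. The sketch below assumes the noise levels are stored as one value per pixel; the function name `clip_outliers` is hypothetical, and no default for $ζ$ is implied.

```python
import numpy as np

def clip_outliers(X, B_minus, sigma, zeta):
    """Replace entries of X that exceed the background by zeta * sigma.

    X       : (d, T) residual data
    B_minus : (d, T) previous estimate of the fluctuating background
    sigma   : (d,)   per-pixel noise standard deviation
    zeta    : user-chosen thresholding coefficient
    """
    threshold = B_minus + zeta * sigma[:, None]
    # entries at or above the threshold are treated as outliers
    # (e.g. unexplained calcium transients) and set to the background
    return np.where(X >= threshold, B_minus, X)
```

Note that the rule is one-sided: only large positive deviations are replaced, which fits the fact that calcium transients are positive excursions above the background.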

Initialization of model variables

Since problem (P-All) is not convex in all of its variables, a good initialization of model variables is crucial for fast convergence and accurate extraction of all neurons’ spatiotemporal activity. Previous methods assume the background component is relatively weak, allowing us to initialize $\hat{A}$ and $\hat{C}$ while ignoring the background or simply initializing it with a constant baseline over time. However, the noisy background in microendoscopic data fluctuates more strongly than the neural signals (cf. Figure 6C and Figure 7B), which makes previous methods less valid for the initialization of CNMF-E.

Here, we design a new algorithm to initialize $\hat{A}$ and $\hat{C}$ without estimating $\hat{B}$. The whole procedure is illustrated in Figure 10 and described in Algorithm 1. The key aim of our algorithm is to exploit the relative spatial smoothness in the background compared to the single neuronal signals visible in the focal plane. Thus, we can use spatial filtering to reduce the background in order to estimate single neurons’ temporal activity, and then initialize each neuron’s spatial footprint given these temporal traces. Once we have initialized $\hat{A}$ and $\hat{C}$, it is straightforward to initialize the constant baseline $b_{0}$ and the fluctuating background $B^{f}$ by solving problem (P-B).

Spatially filtering the data
We first filter the raw video data with a customized image kernel (Figure 10A). The kernel is generated from a Gaussian filter

h (x) = \exp (- \frac{‖ x ‖^{2}}{2 (l / 4)^{2}})

Here, we use $h (x)$ to approximate a cell body; the factor of $1 / 4$ in the Gaussian width is chosen to match a Gaussian shape to a cell of width $l$. Instead of using $h (x)$ as the filtering kernel directly, we subtract its spatial mean (computed over a region of width equal to $l$) and filter the raw data with $\tilde{h} (x) = h (x) - \bar{h} (x)$. The filtered data is denoted as $Z \in R^{d \times T}$ (Figure 10B). This spatial filtering step helps accomplish two goals: (1) reducing the background $B$, so that $Z$ is dominated by neural signals (albeit somewhat spatially distorted) in the focal plane (see Figure 10B as an example); (2) performing a template matching to detect cell bodies similar to the Gaussian kernel. Consequently, $Z$ has large values near the center of each cell body. (However, note that we cannot simply apply, e.g., CNMF to $Z$, because the spatial components in a factorization of the matrix $Z$ will typically no longer be nonnegative, and therefore NMF-based approaches cannot be applied directly.) More importantly, the calcium traces near the neuron center in the filtered data preserve the calcium activity of the corresponding neurons because the filtering step results in a weighted average of cellular signals surrounding each pixel (Figure 10B). Thus, the fluorescence traces in pixels close to neuron centers in $Z$ can be used for initializing the neurons’ temporal activity directly. These pixels are defined as seed pixels. We next propose a quantitative method to rank all potential seed pixels.
时间活动。这些像素被定义为种子像素。接下来我们提出一种定量方法来对所有潜在的种子像素进行排序。
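As an illustration only, the mean-centered kernel and the per-frame filtering described above could be sketched in NumPy as follows. The function names, the edge-replicating border handling, and the choice of an $l \times l$ support are assumptions made for this sketch; they are not taken from the CNMF-E code.

```python
import numpy as np

def mean_centered_gaussian_kernel(l):
    """Return an l x l kernel h - mean(h), where h is a Gaussian of width l / 4."""
    coords = np.arange(l) - (l - 1) / 2.0
    xx, yy = np.meshgrid(coords, coords)
    h = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * (l / 4.0) ** 2))
    return h - h.mean()  # subtract the spatial mean over the l x l support

def filter_frame(frame, kernel):
    """Correlate one 2D frame with the kernel (assumed edge-replicated borders)."""
    kh, kw = kernel.shape
    pad_rows = (kh // 2, kh - 1 - kh // 2)
    pad_cols = (kw // 2, kw - 1 - kw // 2)
    padded = np.pad(frame, (pad_rows, pad_cols), mode="edge")
    out = np.zeros(frame.shape, dtype=float)
    rows, cols = frame.shape
    # Shift-and-add over kernel offsets; the kernel is symmetric, so
    # correlation and convolution coincide here.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + rows, j:j + cols]
    return out
```

Because the kernel sums to zero, a spatially constant background is removed exactly, while a cell-sized bump produces a positive response at its center; smooth but non-constant backgrounds are only attenuated.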

Ranking seed pixels

A seed pixel $x$ should have two main features: first, $Z(x)$, which is the filtered trace at pixel $x$, should have a high peak-to-noise ratio (PNR) because it encodes the calcium concentration $c_{i}$ of one neuron; second, a seed pixel should have high temporal correlations with its neighboring pixels (e.g. its 4 nearest neighbors) because they share the same $c_{i}$. We compute one metric for each of these two features:

P (x) = \frac{\max_{t} Z (x, t)}{\sigma (x)}, \quad L (x) = \frac{1}{4} \sum_{\mathrm{dist} (x, x^{'}) = 1} \mathrm{corr} (Z (x), Z (x^{'})) .

Recall that $\sigma(x)$ is the standard deviation of the noise at pixel $x$; the function corr() refers to Pearson correlation here. In our implementation, we usually threshold $Z(x)$ at $3\sigma(x)$ before computing $L(x)$ to reduce the influence of the background residuals, noise, and spikes from nearby neurons.

Most pixels can be ignored when selecting seed pixels because their local correlations or PNR values are too small. To avoid unnecessary searches of the pixels, we set thresholds for both $P(x)$ and $L(x)$, and only consider pixels whose values exceed the thresholds $P_{min}$ and $L_{min}$. It is empirically useful to combine both metrics for screening seed pixels. For example, high PNR values could result from large noise, but these pixels usually have small $L(x)$ because the noise is not shared with neighboring pixels. On the other hand, insufficient removal of background during the spatial filtering leads to high $L(x)$, but the corresponding $P(x)$ are usually small because most background fluctuations have been removed. So we create another matrix $R(x) = P(x) \cdot L(x)$ that computes the pixelwise product of $P(x)$ and $L(x)$. We rank all $R(x)$ in descending order and choose the pixel $x^{*}$ with the largest $R(x)$ for initialization.
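The two metrics and the seed selection can be sketched as below. This is a minimal NumPy illustration under stated assumptions: the filtered data is stored as an array of shape (height, width, T), the noise level $\sigma(x)$ is already available, and all function names are hypothetical. Border pixels have fewer than four neighbors, so dividing by 4 slightly underestimates $L(x)$ there.

```python
import numpy as np

def pnr_image(Z, sigma):
    """P(x) = max_t Z(x, t) / sigma(x); Z is (H, W, T), sigma is (H, W)."""
    return Z.max(axis=2) / sigma

def local_correlation_image(Z, sigma, n_sigma=3.0):
    """L(x): Pearson correlation with the 4 nearest neighbors, divided by 4."""
    # Threshold each trace at n_sigma * sigma(x) before correlating.
    Zt = np.where(Z >= n_sigma * sigma[:, :, None], Z, 0.0)
    Zc = Zt - Zt.mean(axis=2, keepdims=True)
    norm = np.sqrt((Zc ** 2).sum(axis=2, keepdims=True))
    # Unit-norm centered traces; all-constant traces are left at zero.
    Zn = np.divide(Zc, norm, out=np.zeros_like(Zc), where=norm > 0)
    L = np.zeros(Z.shape[:2])
    vert = (Zn[:-1] * Zn[1:]).sum(axis=2)        # rows i and i + 1
    horz = (Zn[:, :-1] * Zn[:, 1:]).sum(axis=2)  # columns j and j + 1
    L[:-1] += vert
    L[1:] += vert
    L[:, :-1] += horz
    L[:, 1:] += horz
    return L / 4.0

def pick_seed(P, L, P_min, L_min):
    """Return the (row, col) maximizing R = P * L among pixels above both thresholds."""
    R = np.where((P > P_min) & (L > L_min), P * L, -np.inf)
    if not np.isfinite(R).any():
        return None  # no candidate seed pixel is left
    return np.unravel_index(np.argmax(R), R.shape)
```

Screening on both thresholds before taking the product mirrors the argument in the text: a noisy pixel can have a high PNR but a low local correlation, and a background-dominated pixel the reverse.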

Algorithm 1. Initialize model variables $A$ and $C$ given the raw data

Require: data \(Y \in \mathbb{R}^{d \times T}\), neuron size \(l\), the minimum local correlation \(L_{\text{min}}\) and the minimum PNR \(P_{\text{min}}\) for selecting seed pixels.
    \(h \leftarrow\) a truncated 2D Gaussian kernel of width \(\sigma_{x}=\sigma_{y}=\frac{l}{4} ; h \in \mathbb{R}^{l \times l} \quad \triangleright\) 2D Gaussian kernel
    \(\tilde{h} \leftarrow h-\bar{h} ; \tilde{h} \in \mathbb{R}^{l \times l} \quad \triangleright\) mean-centered kernel for spatial filtering
    \(Z \leftarrow \operatorname{conv}(Y, \tilde{h}) ; Z \in \mathbb{R}^{d \times T} \quad \triangleright\) spatially filter the raw data
    \(L \leftarrow\) local cross-correlation image of the filtered data \(Z ; L \in \mathbb{R}^{d}\)
    \(P \leftarrow\) PNR image of the filtered data \(Z ; P \in \mathbb{R}^{d}\)
    \(k \leftarrow 0 \quad \triangleright\) neuron number
    while True do
        if \(L(\boldsymbol{x}) \leq L_{\text{min}}\) or \(P(\boldsymbol{x}) \leq P_{\text{min}}\) for all pixels \(\boldsymbol{x}\) then
            break;
        else
            \(k \leftarrow k+1\)
            \(\hat{\boldsymbol{a}}_{k} \leftarrow \boldsymbol{0} ; \hat{\boldsymbol{a}}_{k} \in \mathbb{R}^{d}\)
            \(\boldsymbol{x}^{*} \leftarrow \operatorname{argmax}_{\boldsymbol{x}}(L(\boldsymbol{x}) \cdot P(\boldsymbol{x})) \quad \triangleright\) select a seed pixel
            \(\Omega_{k} \leftarrow\left\{\boldsymbol{x} \mid \boldsymbol{x}\right.\) is in the square box of length \((2 l+1)\) surrounding pixel \(\left.\boldsymbol{x}^{*}\right\} \quad \triangleright\) crop a small box near \(\boldsymbol{x}^{*}\)
            \(\boldsymbol{r}(\boldsymbol{x}) \leftarrow \operatorname{corr}\left(Z(\boldsymbol{x},:), Z\left(\boldsymbol{x}^{*},:\right)\right)\) for all \(\boldsymbol{x} \in \Omega_{k} ; \boldsymbol{r} \in \mathbb{R}^{\left|\Omega_{k}\right|}\)
            \(\boldsymbol{y}_{B G} \leftarrow \frac{\sum_{\{\boldsymbol{x} \mid \boldsymbol{r}(\boldsymbol{x}) \leq 0.3\}} Y(\boldsymbol{x},:)}{\sum_{\{\boldsymbol{x} \mid \boldsymbol{r}(\boldsymbol{x}) \leq 0.3\}} 1} ; \boldsymbol{y}_{B G} \in \mathbb{R}^{T} \quad \triangleright\) estimate the background signal
            \(\hat{\boldsymbol{c}}_{k} \leftarrow \frac{\sum_{\{\boldsymbol{x} \mid \boldsymbol{r}(\boldsymbol{x}) \geq 0.7\}} Z(\boldsymbol{x},:)}{\sum_{\{\boldsymbol{x} \mid \boldsymbol{r}(\boldsymbol{x}) \geq 0.7\}} 1} ; \hat{\boldsymbol{c}}_{k} \in \mathbb{R}^{T} \quad \triangleright\) estimate neural signal
            \(\hat{\boldsymbol{a}}_{k}\left(\Omega_{k}\right), \hat{\boldsymbol{b}}^{(f)}, \hat{\boldsymbol{b}}^{(0)} \leftarrow \operatorname{argmin}_{\boldsymbol{a}, \boldsymbol{b}^{(f)}, \boldsymbol{b}^{(0)}}\left\|Y_{\Omega_{k}}-\left(\boldsymbol{a} \cdot \hat{\boldsymbol{c}}_{k}^{T}+\boldsymbol{b}^{(f)} \cdot \boldsymbol{y}_{B G}^{T}+\boldsymbol{b}^{(0)} \cdot \boldsymbol{1}^{T}\right)\right\|_{F}^{2}\)
            \(\hat{\boldsymbol{a}}_{k} \leftarrow \max \left(0, \hat{\boldsymbol{a}}_{k}\right) \quad \triangleright\) the spatial component of the \(k\)-th neuron
            \(Y \leftarrow Y-\hat{\boldsymbol{a}}_{k} \cdot \hat{\boldsymbol{c}}_{k}^{T} \quad \triangleright\) peel away the neuron's spatiotemporal activity
            update \(L(\boldsymbol{x})\) and \(P(\boldsymbol{x})\) locally given the new \(Y\)
    \(A \leftarrow\left[\hat{\boldsymbol{a}}_{1}, \hat{\boldsymbol{a}}_{2}, \cdots, \hat{\boldsymbol{a}}_{k}\right]\)
    \(C \leftarrow\left[\hat{\boldsymbol{c}}_{1}, \hat{\boldsymbol{c}}_{2}, \cdots, \hat{\boldsymbol{c}}_{k}\right]^{T}\)
    return \(A, C\)

Greedy initialization

Our initialization method greedily initializes neurons one by one. Every time we initialize a neuron, we remove its initialized spatiotemporal activity from the raw video data and initialize the next neuron from the residual. For the same neuron, there are several seed pixels that could be used to initialize it. But once the neuron has been initialized from any of these seed pixels (and the spatiotemporal residual matrix has been updated by peeling away the corresponding activity), the remaining seed pixels related to this neuron have lowered PNR and local correlation. This helps avoid the duplicate initialization of the same neuron. Also, $P(x)$ and $L(x)$ have to be updated after each neuron is initialized, but since only a small area near the initialized neuron is affected, we can update these quantities locally to reduce the computational cost. This procedure is repeated until the specified number of neurons have been initialized or no more candidate seed pixels exist.

This initialization algorithm can greedily initialize the required number of neurons, but the subproblem of estimating $\hat{a}_{i}$ given $\hat{c}_{i}$ still has to deal with the large background activity in the residual matrix. We developed a simple method to remove this background and accurately initialize neuron shapes, described next. We first crop a $(2l+1) \times (2l+1)$ square centered at $x^{*}$ in the field of view (Figure 10A-E). Then we compute the temporal correlation between the filtered traces of pixel $x^{*}$ and all other pixels in the patch (Figure 10D). We choose those pixels with small temporal correlations (e.g. below 0.3) as the neighboring pixels that are outside of the neuron (the green contour in Figure 10D). Next, we estimate the background fluctuations as the median values of these pixels for each frame in the raw data (Figure 10E). We also select pixels that are within the neuron by selecting correlation coefficients larger than 0.7; $\hat{c}_{i}$ is then refined by computing the mean filtered trace of these pixels (Figure 10E). Finally, we regress the raw fluorescence signal in each pixel onto three sources: the neuron signal (Figure 10E), the local background fluctuation (Figure 10F), and a constant baseline. Our initial estimate of $\hat{a}_{i}$ is given by the regression weights onto $\hat{c}_{i}$ in Figure 10F.
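A minimal sketch of this single-neuron step is given below, assuming the cropped box has already been flattened to arrays of shape (number of pixels, T). The function name and argument layout are hypothetical; the 0.3 and 0.7 cutoffs and the median/mean choices follow the text. The sketch does not guard against the case where no pixel falls below the lower cutoff.

```python
import numpy as np

def init_one_neuron(Y_patch, Z_patch, seed, lo=0.3, hi=0.7):
    """Initialize (a_hat, c_hat) for one neuron inside a cropped box.

    Y_patch, Z_patch: raw and filtered data, shape (n_pixels, T).
    seed: row index of the seed pixel within the patch.
    """
    z_seed = Z_patch[seed]
    # Temporal correlation of every filtered trace with the seed trace.
    r = np.array([np.corrcoef(z, z_seed)[0, 1] for z in Z_patch])
    r = np.nan_to_num(r)  # constant traces give NaN; treat them as uncorrelated
    # Background fluctuation: per-frame median of raw traces outside the neuron.
    y_bg = np.median(Y_patch[r <= lo], axis=0)
    # Neural trace: mean filtered trace of pixels inside the neuron.
    c_hat = Z_patch[r >= hi].mean(axis=0)
    # Regress each raw pixel trace onto [c_hat, y_bg, 1].
    X = np.column_stack([c_hat, y_bg, np.ones_like(c_hat)])  # (T, 3)
    coef, *_ = np.linalg.lstsq(X, Y_patch.T, rcond=None)     # (3, n_pixels)
    a_hat = np.maximum(coef[0], 0.0)  # keep nonnegative weights on c_hat
    return a_hat, c_hat
```

Including the background trace and the constant as regressors is what keeps the large, shared background from leaking into the estimated footprint.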

Modifications for high temporal or spatial correlation

The above procedure works well in most experimental datasets as long as neurons are not highly spatially overlapped and temporally correlated. However, in a few extreme cases, this initialization may lead to bad local minima. We found that two practical modifications can lead to improved results.

High temporal correlation, low spatial overlaps

The greedy initialization procedure assumes that nearby neurons are not highly correlated. If this assumption fails, CNMF-E will first merge nearby neurons into one component for explaining the shared fluctuations, and then the following initialized components will only capture the residual signals of each neuron. Our solution to this issue relies on our accurate background removal procedure, after which we simply re-estimate each neural trace $c_{i}$ as a weighted fluorescence trace of the background-subtracted video $(Y - \hat{B}^{f} - \hat{b}_{0} 1^{T})$,

{\hat{c}}_{i} = \frac{{\tilde{a}}_{i}^{T} \cdot (Y - {\hat{B}}^{f} - {\hat{b}}_{0} 1^{T})}{{\tilde{a}}_{i}^{T} \cdot {\tilde{a}}_{i}},

where $\tilde{a}_{i}$ only selects pixels with large weights by thresholding the estimated $\hat{a}_{i}$ with $\max(\hat{a}_{i})/2$ (this reduces the contributions from smaller neighboring neurons). This strategy improves the extraction of individual neurons’ traces in the high correlation scenarios and the spatial footprints can be corrected in the following step of updating $\hat{A}$. Figure 4B and Figure 5 illustrate this procedure.
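The re-estimation formula translates almost directly into NumPy. In the sketch below the function name is hypothetical and the background-subtracted video is assumed to be precomputed as a (d, T) array; a footprint that is identically zero would cause a division by zero and is not handled.

```python
import numpy as np

def reestimate_trace(a_hat, Y_bgsub):
    """Weighted trace c_hat = a_tilde^T Y_bgsub / (a_tilde^T a_tilde).

    a_hat: estimated footprint, shape (d,).
    Y_bgsub: background-subtracted video Y - B^f - b0 1^T, shape (d, T).
    """
    # Keep only pixels whose weight is at least half of the maximum weight.
    a_tilde = np.where(a_hat >= a_hat.max() / 2.0, a_hat, 0.0)
    return a_tilde @ Y_bgsub / (a_tilde @ a_tilde)
```

For a video that is exactly rank one, $Y = a c^{T}$, the estimate returns $c$ unchanged regardless of which pixels survive the threshold, since the numerator and denominator share the same factor $\tilde{a}^{T} a$.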

High spatial overlaps, low temporal correlation

CNMF-E may initialize components with shared temporal traces because they have highly overlapping areas. We solve this problem by de-correlating their traces (following a similar approach in [Pnevmatikakis et al., 2016]). We start by assuming that neurons with high spatial overlap do not fire spikes within the same frame. Under this assumption, when several overlapping neurons have inferred spikes in the same frame, only the inferred spiking trace with the largest value is kept and the rest are set to 0. Then we initialize each $c_{i}$ given these thresholded spiking traces and the corresponding AR coefficients.
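The winner-take-all rule can be illustrated with the short sketch below. It assumes the inferred spiking traces of one group of overlapping neurons are stacked in an array of shape (number of neurons, T), and it reconstructs the calcium trace with an AR(1) model purely for illustration; the function names and the AR order are assumptions, not the CNMF-E implementation. Ties within a frame are resolved in favor of the first neuron.

```python
import numpy as np

def keep_largest_spike(S):
    """Per frame, keep only the largest inferred spike among overlapping neurons."""
    S = np.asarray(S, dtype=float)
    winners = S.argmax(axis=0)          # index of the largest entry in each frame
    frames = np.arange(S.shape[1])
    out = np.zeros_like(S)
    out[winners, frames] = S[winners, frames]
    return out

def ar1_trace(s, gamma):
    """Rebuild a calcium trace from spikes with c[t] = gamma * c[t-1] + s[t]."""
    c = np.zeros(len(s))
    for t in range(len(s)):
        previous = c[t - 1] if t > 0 else 0.0
        c[t] = gamma * previous + s[t]
    return c
```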

Interventions

We use iterative matrix updates to estimate model variables in CNMF-E. This strategy gives us the flexibility of integrating prior information on neuron morphology and temporal activity during the model fitting. The resulting interventions (which can in principle be performed either automatically or under manual control) can in turn lead to faster convergence and more accurate source extraction. We integrate 5 interventions in our CNMF-E implementation. Following these interventions, we usually run one more iteration of matrix updates.

Figure 9. Neurons extracted by CNMF-E show more reproducible responses to footshock stimuli, with larger signal sizes relative to the across-trial variability, compared to PCA/ICA. (A-C) Spatial components (A), spatial locations (B) and temporal components (C) of 12 example neurons detected by both CNMF-E and PCA/ICA. (D) Calcium responses of all example neurons to footshock stimuli. Colormaps show trial-by-trial responses of each neuron, extracted by CNMF-E (top, red) and PCA/ICA (bottom, green), aligned to the footshock time. The solid lines are medians of neural responses over 11 trials and the shaded areas correspond to median $\pm 1$ median absolute deviation (MAD). Dashed lines indicate the shock timings. (E) Scatter plot of peak-to-MAD ratios for all response curves in (D). For each neuron, Peak is corrected by subtracting the mean activity within 4 s prior to stimulus onset and MAD is computed as the mean MAD values over all timebins shown in (D). The red line shows $y = x$. Scalebars: 10 s. See Video 11 for demixing results.
DOI: https://doi.org/10.7554/eLife.28728.021

Figure 10. Illustration of the initialization procedure. (A) Raw video data and the kernel for filtering the video data. (B) The spatially high-pass filtered data. (C) The local correlation image and the peak-to-noise ratio (PNR) image calculated from the filtered data in (B). (D) The temporal correlation coefficients between the filtered traces (B) of the selected seed pixel (the red cross) and all other pixels in the cropped area as shown in (A-C). The red and green contours correspond to correlation coefficients equal to 0.7 and 0.3, respectively. (E) The estimated background fluctuation $y_{BG}(t)$ (green) and the initialized temporal trace $\hat{c}_{i}(t)$ of the neuron (red). $y_{BG}(t)$ is computed as the median of the raw fluorescence traces of all pixels (green area) outside of the green contour shown in (D) and $\hat{c}_{i}(t)$ is computed as the mean of the filtered fluorescence traces of all pixels inside the red contour. (F) The decomposition of the raw video data within the cropped area. Each component is a rank-1 matrix and the related temporal traces are estimated in (E). The spatial components are estimated by regressing the raw video data against these three traces. See Video 3 for an illustration of the initialization procedure.
DOI: https://doi.org/10.7554/eLife.28728.023

Merge existing components

When a single neuron is split mistakenly into multiple components, a merge step is necessary to rejoin these components. If we can find all split components, we can superimpose all their spatiotemporal activities and run rank-1 NMF to obtain the spatial and temporal activity of the merged neuron. We automatically merge components for which the spatial and temporal components are correlated above certain thresholds. Our code also provides methods to manually specify neurons to be merged based on human judgment.
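The merge step can be sketched as follows, assuming the split components have already been identified and stacked into a footprint matrix and a trace matrix. The rank-1 nonnegative factorization is computed here by simple alternating nonnegative least-squares updates; this solver and the function name are choices made for the sketch, not necessarily those used in the CNMF-E code.

```python
import numpy as np

def merge_rank1(A_sub, C_sub, n_iter=50):
    """Merge split components into one (a, c) pair via rank-1 NMF.

    A_sub: footprints of the components to merge, shape (d, m).
    C_sub: their temporal traces, shape (m, T).
    """
    Y = A_sub @ C_sub          # superimposed spatiotemporal activity, (d, T)
    a = A_sub.sum(axis=1)      # initial footprint: sum of the split footprints
    c = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        # Alternate closed-form least-squares updates, clipped at zero.
        c = np.maximum(a @ Y / (a @ a), 0.0)
        a = np.maximum(Y @ c / (c @ c), 0.0)
    return a, c
```

The scale of the pair is not unique (multiplying $a$ by a constant and dividing $c$ by it leaves the product unchanged), so only the outer product should be compared across runs.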

Split extracted components

When highly correlated neurons are mistakenly merged into one component, we need to use spatial information to split it into multiple components according to the neurons’ morphology. Our current implementation of component splitting requires users to manually draw ROIs for splitting the spatial footprint of the extracted component. Automatic methods for ROI segmentation (Apthorpe et al., 2016; Pachitariu et al., 2013) could be added as an alternative in future implementations.

Table 1. Variables used in the CNMF-E model and algorithm. $R$: real numbers; $R_{+}$: positive real numbers; $N$: natural numbers; $N_{+}$: positive integers.

Name	Description	Domain
$d$	number of pixels	$N_{+}$
$T$	number of frames	$N_{+}$
$K$	number of neurons	$N$
$Y$	motion corrected video data	$R_{+}^{d \times T}$
$A$	spatial footprints of all neurons	$R_{+}^{d \times K}$
$C$	temporal activities of all neurons	$R_{+}^{K \times T}$
$B$	background activity	$R_{+}^{d \times T}$
$E$	observation noise	$R^{d \times T}$
$W$	weight matrix to reconstruct $B$ using neighboring pixels	$R^{d \times d}$
$b_{0}$	constant baseline for all pixels	$R_{+}^{d}$
$x_{i}$	spatial location of the $i$th pixel	$N^{2}$
$\sigma_{i}$	standard deviation of the noise at pixel $x_{i}$	$R_{+}$

DOI: https://doi.org/10.7554/eLife.28728.006

Remove false positives

Some extracted components have spatial shapes that do not correspond to real neurons or temporal traces that do not correspond to neural activity. These components might explain some neural signals or background activity mistakenly. Our source extraction can benefit from the removal of these false positives. This can be done by manually examining all extracted components, or in principle automatically by training a classifier for detecting real neurons. The current implementation relies on visual inspection to exclude false positives. We also rank neurons based on their SNRs and set a cutoff to discard all extracted components that fail to meet this cutoff. As with the splitting step, removing false positives could also potentially use automated ROI detection algorithms in the future. See Video 10 for an example involving manual merge and delete operations.

Pick undetected neurons from the residual

If all neural signals and background are accurately estimated, the residual of the CNMF-E model $Y_{res} = Y - \hat{A} \hat{C} - \hat{B}$ should be relatively spatially and temporally uncorrelated. However, the initialization might miss some neurons due to large background fluctuations and/or high neuron density. After we estimate the background $\hat{B}$ and extract a majority of the neurons, those missed neurons have prominent fluorescent signals left in the residual. To select these undetected neurons from the residual $Y_{res}$, we use the same algorithm as for initializing neurons from the raw video data, but typically now the task is easier because the background has been removed.

Post-process the spatial footprints

Each single neuron has a localized spatial shape, and including this prior in the model fitting of CNMF-E, as suggested in (Pnevmatikakis et al., 2016), leads to better extraction of spatial footprints. In the model fitting step, we constrain $A$ to be sparse and spatially localized. These constraints do give us compact neuron shapes in most cases, but in some cases there are still some visually abnormal components detected. We include a heuristic automated post-processing step after each iteration of updating spatial shapes (P-S). For each extracted neuron $A(:, k)$, we first convert it to a 2D image and perform morphological opening to remove isolated pixels resulting from noise (Haralick et al., 1987). Next we label all connected components in the image and create a mask to select the largest component. All pixels outside of the mask in $A(:, k)$ are set to 0. This post-processing induces compact neuron shapes by removing extra pixels and helps avoid mistakenly explaining the fluorescence signals of the other neurons.
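The post-processing heuristic can be sketched with NumPy alone, as below. Several details are assumptions made for illustration: the opening uses a 3 x 3 square structuring element applied to the support of the footprint, connectivity is 4-neighbor, the footprint is reshaped in row-major order, and all function names are hypothetical.

```python
import numpy as np
from collections import deque

def _neighborhood_3x3(mask):
    """Stack the 9 shifted copies of a boolean mask (zero padded at the border)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    H, W = mask.shape
    return np.stack([padded[i:i + H, j:j + W] for i in range(3) for j in range(3)])

def binary_opening_3x3(mask):
    """Morphological opening: erosion followed by dilation with a 3 x 3 square."""
    eroded = _neighborhood_3x3(mask).all(axis=0)
    return _neighborhood_3x3(eroded).any(axis=0)

def largest_component(mask):
    """Return a mask of the largest 4-connected component (breadth-first search)."""
    H, W = mask.shape
    seen = np.zeros((H, W), dtype=bool)
    best = []
    for i, j in zip(*np.nonzero(mask)):
        if seen[i, j]:
            continue
        component, queue = [], deque([(i, j)])
        seen[i, j] = True
        while queue:
            y, x = queue.popleft()
            component.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(component) > len(best):
            best = component
    out = np.zeros((H, W), dtype=bool)
    for y, x in best:
        out[y, x] = True
    return out

def postprocess_footprint(a_k, shape):
    """Zero every pixel of one footprint outside its largest opened component."""
    image = a_k.reshape(shape)
    keep = largest_component(binary_opening_3x3(image > 0))
    return np.where(keep, image, 0.0).ravel()
```

With this structuring element, an isolated nonzero pixel does not survive the erosion, while a compact blob at least 3 pixels wide is restored by the dilation.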

Further algorithmic details

The simplest pipeline for running CNMF-E includes the following steps:

1. Initialize $\hat{A}, \hat{C}$ using the proposed initialization procedure.
2. Solve problem (P-B) for updates of $\hat{b}_{0}$ and $\hat{B}^{f}$.
3. Iteratively solve problems (P-S) and (P-T) to update $\hat{A}$, $\hat{C}$ and $\hat{b}_{0}$.
4. If desired, apply interventions to intermediate results.
5. Repeat steps 2, 3, and 4 until the inferred components are stable.

In practice, the estimation of the background $B$ (step 2) often does not vary greatly from iteration to iteration and so this step usually can be run with fewer iterations to save time. In practice, we also use spatial and temporal decimation for improved speed, following (Friedrich et al., 2017a). We first run the pipeline on decimated data to get good initializations, then we up-sample the results $\hat{A}, \hat{C}$ to the original resolution and run one iteration of steps (2-3) on the raw data. This strategy improves on processing the raw data directly because downsampling increases the signal-to-noise ratio and eliminates many false positives.

Step 4 provides a fast method for correcting abnormal components without redoing the whole analysis. (This is an important improvement over the PCA/ICA pipeline, where if users encounter poorly estimated components it is necessary to repeat the whole analysis with new parameter values, which may not necessarily yield improved cell segmentations.) The interventions described here can themselves be independent tasks in calcium imaging analysis; with further work we expect that many of these steps can be automated. In our interface for performing manual interventions, the most frequently used function is to remove false positives. Again, components can be rejected following visual inspection in PCA/ICA analysis, but the performance of CNMF-E can be improved with further iterations after removing false positives, while this is not currently an option for PCA/ICA.

We have also found a two-step initialization procedure useful for detecting neurons: we first start from relatively high thresholds of $P_{min}$ and $L_{min}$ to initialize neurons with large activity from the raw video data; then we estimate the background components by solving problem (P-B); finally we can pick undetected neurons from the residual using smaller thresholds. We can terminate the model iterations when the residual sum of squares (RSS) stabilizes (see Figure 4B), but this is seldom used in practice because computing the RSS is time-consuming. Instead we usually automatically stop the iterations after the number of detected neurons stabilizes. If manual interventions are performed, we typically run one last iteration of updating $B$, $A$ and $C$ sequentially to further refine the results.

Parameter selection

Table 2 shows 5 key parameters used in CNMF-E. All of these parameters have interpretable meaning and can be easily picked within a broad range. The parameter $l$ controls the size of the spatial filter in the initialization step and is chosen as the diameter of a typical neuron in the FOV. As long as $l$ is much smaller than local background sources, the filtered data can be used for detecting seed pixels and then initializing neural traces. The distance between each seed pixel and its selected neighbors $l_{n}$ has to be larger than the neuron size $l$ and smaller than the spatial range of local background sources; in practice, this range is fairly broad. We usually set $l_{n}$ to $2l$. To determine the thresholds $P_{min}$ and $L_{min}$, we first compute the correlation image and PNR image and then visually select very weak neurons from these two images. $P_{min}$ and $L_{min}$ are determined to ensure that CNMF-E is able to choose seed pixels from these weak neurons. Small $P_{min}$ and $L_{min}$ yield more false positive neurons, but they can be removed in the intervention step. Finally, in practice, our results are not sensitive to the selection of the outlier parameter $\zeta$, thus we frequently set it as 10.

Complexity analysis

In step 1, the time cost is mainly determined by spatial filtering, resulting in $O(dT)$ time. As for the initialization of a single neuron given a seed pixel, it is only $O(T)$. Considering the fact that the number of neurons is typically much smaller than the number of pixels in this data, the complexity for step 1 remains $O(dT)$. In step 2, the complexity of estimating $\hat{b}_{0}$ is $O(dT)$ and estimating $\hat{B}^{f}$ scales linearly with the number of pixels $d$. For each pixel, the computational complexity for estimating $W_{i,:}$ is $O(T)$. Thus, the computational complexity in updating the background component is $O(dT)$. In step 3, the computational complexities of solving problems (P-S) and (P-T) have been discussed in previous literature (Pnevmatikakis et al., 2016) and they scale linearly with pixel number $d$ and time $T$, that is, $O(dT)$. For the interventions, the one with the largest computational cost is picking undetected neurons from the residual, which is the same as the initialization step. Therefore, the computational cost for step 4 is $O(dT)$. To summarize, the complexity for running CNMF-E is $O(dT)$, that is, the method scales linearly with both the number of pixels and the total recording time.

Table 2. Optional user-specified parameters.

Name	Description	Default values	Used in
$l$	size of a typical neuron soma in the FOV	$30\ \mu m$	Algorithm 1
$l_{n}$	the distance between each pixel and its neighbors	$60\ \mu m$	Problem (P-B)
$P_{min}$	the minimum peak-to-noise ratio of seed pixels	10	Algorithm 1
$L_{min}$	the minimum local correlation of seed pixels	0.8	Algorithm 1
$\zeta$	the ratio between the outlier threshold and the noise	10	Problem (P-B)

DOI: https://doi.org/10.7554/eLife.28728.024

Implementations

Our MATLAB implementation supports running CNMF-E in three different modes that are optimized for different datasets: single-mode, patch-mode and multi-batch-mode.

Single-mode is a naive implementation that loads data into memory and fits the model. It is fast for processing small datasets (<1 GB).

For larger datasets, many computers have insufficient RAM for loading all data into memory and storing intermediate results. Patch-mode CNMF-E divides the whole FOV into multiple small patches and maps data to the hard drive (Giovannucci et al., 2017b). The data within each patch are loaded only when we process that patch. This significantly reduces the memory consumption. More importantly, this mode allows running CNMF-E in parallel on multi-core CPUs, yielding a speed-up roughly proportional to the number of available cores.
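To make the patch idea concrete, the following sketch tiles a field of view into non-overlapping patches and returns the index ranges that would be loaded one at a time. The function name is hypothetical, and the sketch omits details such as patch overlap and the merging of neurons cut by patch borders.

```python
def patch_slices(height, width, patch):
    """Tile a height x width FOV into patches of at most patch x patch pixels."""
    return [
        (slice(r, min(r + patch, height)), slice(c, min(c + patch, width)))
        for r in range(0, height, patch)
        for c in range(0, width, patch)
    ]

# Example: a 253 x 316 pixel FOV with 64 x 64 patches gives 4 x 5 = 20 patches,
# each of which can be loaded and processed independently.
patches = patch_slices(253, 316, 64)
```

Since each patch is processed independently, the list can be distributed over worker processes, which is the source of the roughly proportional speed-up mentioned above.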
对于较大的数据集，许多计算机的内存不足以将所有数据加载到内存中并存储中间结果。分块模式 CNMF-E 将整个视野划分为多个小块，并将数据映射到硬盘（Giovannucci 等，2017b）。只有在处理某个块时才加载该块内的数据。这大大减少了内存消耗。更重要的是，该模式允许在多核 CPU 上并行运行 CNMF-E，速度提升大致与可用核心数成正比。
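The tiling step of patch mode can be sketched as follows. This is only an illustration in Python, not the MATLAB implementation: the function name `make_patches` and the choice of non-overlapping tiles are assumptions made here, while the $64 \times 64$ patch size and the $253 \times 316$ field of view are the values reported in the running-time experiment.

```python
from typing import List, Tuple

# (row_start, row_stop, col_start, col_stop), stop indices are exclusive
Patch = Tuple[int, int, int, int]


def make_patches(n_rows: int, n_cols: int, patch_size: int = 64) -> List[Patch]:
    """Tile an n_rows x n_cols field of view with non-overlapping patches.

    Patches on the bottom and right borders may be smaller than patch_size.
    (Hypothetical helper; the tiling rule is an assumption.)
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    patches = []
    for r0 in range(0, n_rows, patch_size):
        for c0 in range(0, n_cols, patch_size):
            r1 = min(r0 + patch_size, n_rows)
            c1 = min(c0 + patch_size, n_cols)
            patches.append((r0, r1, c0, c1))
    return patches


# Field of view of the simulated dataset used for timing.
patches = make_patches(253, 316, patch_size=64)

# Every pixel belongs to exactly one patch, so the areas add up to the FOV.
n_pixels = sum((r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in patches)
```

Each tuple identifies the pixels that one worker would need to read from the disk-mapped data, so the patches can be distributed over the available cores and only one patch per worker has to reside in memory at a time.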

Multi-batch mode builds on patch-mode and is optimized for even larger datasets, especially data collected over multiple sessions/days. This mode segments the data temporally into multiple batches and assumes that the neuron footprints $A$ are shared across all batches. We process each batch using patch mode and perform partial weighted updates on $A$ given the traces $C$ obtained in each batch.

All modes also include a logging system for keeping track of manual interventions and intermediate operations.

The Python implementation is similar; see Giovannucci et al. (2017b) for full details.

Running time

To provide a sense of the running time of the different steps of the algorithm, we timed the code on the simulation data shown in Figure 4. This dataset is $253 \times 316$ pixels $\times$ 2000 frames. The analyses were performed on a desktop with an Intel Xeon CPU E5-2650 v4 @ 2.20 GHz and 128 GB RAM running Ubuntu 16.04. We used a parallel implementation for performing the CNMF-E analysis, with a patch size of $64 \times 64$ pixels, using up to 12 cores. PCA/ICA took $\sim 211$ s to converge, using 250 PCs and 220 ICs. CNMF-E spent 55 s on initialization, 1 s on merging and deleting components, 110 s on the first round of the background estimation and 40 s on the following updates, 8 s on picking neurons from the residual, and 10 s per iteration on updating the spatial ($A$) and temporal ($C$) components, resulting in a total of 258 s.

Finally, Table 3 shows the running time of processing the four experimental datasets.

Table 3. Running time (sec) for processing the 4 experimental datasets.

Dataset	Striatum	PFC	Hippocampus	BNST
Size ($x \times y \times t$)	$256 \times 256 \times 6000$	$175 \times 184 \times 9000$	$175 \times 184 \times 9000$	$175 \times 184 \times 9000$
(# PCs, # ICs)	$(2000, 700)$	$(275, 250)$	$(100, 50)$	$(200, 150)$
PCA/ICA	986	181	174	52
CNMF-E	726	221	225	435

DOI: https://doi.org/10.7554/eLife.28728.025

Simulation experiments

Details of the simulated experiment of Figure 2

The field of view was $256 \times 256$, with 1000 frames. We simulated 50 neurons whose shapes were modeled as spherical 2-D Gaussians. The neuron centers were drawn uniformly from the whole FOV, and the Gaussian widths $σ_{x}$ and $σ_{y}$ of each neuron were randomly drawn from $N(\frac{l}{4}, (\frac{1}{10} \frac{l}{4})^{2})$, where $l = 12$ pixels. Spikes were simulated from a Bernoulli process with a spiking probability of 0.01 per timebin and then convolved with a temporal kernel $g(t) = \exp(-t/τ_{d}) - \exp(-t/τ_{r})$, with fall time $τ_{d} = 6$ timebins and rise time $τ_{r} = 1$ timebin. We simulated the spatial footprints of local backgrounds as 2-D Gaussians as well, but with a mean Gaussian width 5 times larger than the neurons' widths. As for the spatial footprint of the blood vessel in Figure 2A, we simulated a cubic function and then convolved it with a 2-D Gaussian (Gaussian width = 3 pixels). We used a random walk model to simulate the temporal fluctuations of the local backgrounds and the blood vessel. For the data used in Figure 2A-H, there were 23 local background sources; for Figure 2I, we varied the number of background sources.
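The neuron model of this simulation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for the figures: the function names, the truncation of the kernel to 60 timebins, the random seed, and the axis-aligned form of the Gaussian footprint are all choices made here. The spiking probability (0.01), the kernel $g(t)$ with $τ_{d} = 6$ and $τ_{r} = 1$, and the width distribution with $l = 12$ come from the description of the Figure 2 simulation.

```python
import math
import random


def calcium_kernel(tau_d=6.0, tau_r=1.0, length=60):
    """Sample g(t) = exp(-t/tau_d) - exp(-t/tau_r) at t = 0, 1, ..., length-1.

    The truncation length is an assumption; g(t) is negligible by t = 60.
    """
    return [math.exp(-t / tau_d) - math.exp(-t / tau_r) for t in range(length)]


def convolve_spikes(spikes, kernel):
    """Causal convolution of a binary spike train with the kernel."""
    n = len(spikes)
    trace = [0.0] * n
    for t, s in enumerate(spikes):
        if s:
            for k, g in enumerate(kernel):
                if t + k >= n:
                    break
                trace[t + k] += g
    return trace


def bernoulli_spikes(n_frames, p_spike, rng):
    """One independent Bernoulli draw per timebin."""
    return [1 if rng.random() < p_spike else 0 for _ in range(n_frames)]


def gaussian_footprint(size, center, sigma_x, sigma_y):
    """Axis-aligned 2-D Gaussian on a size x size pixel grid (peak value <= 1)."""
    cx, cy = center
    return [
        [
            math.exp(-0.5 * (((x - cx) / sigma_x) ** 2 + ((y - cy) / sigma_y) ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]


rng = random.Random(0)  # fixed seed, for reproducibility of the sketch only
l = 12                  # neuron diameter in pixels

# Widths drawn from N(l/4, (l/40)^2); centers uniform over the FOV.
sigma_x = rng.gauss(l / 4, l / 40)
sigma_y = rng.gauss(l / 4, l / 40)
center = (rng.uniform(0, 255), rng.uniform(0, 255))
footprint = gaussian_footprint(256, center, sigma_x, sigma_y)

spikes = bernoulli_spikes(1000, 0.01, rng)
trace = convolve_spikes(spikes, calcium_kernel())
```

Repeating the last block 50 times would give one footprint and one trace per simulated neuron; the background sources and the blood vessel are not covered by this sketch.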

We used the raw data to estimate the background in CNMF-E without subtracting the neural signals $\hat{A} \hat{C}$ in problem (P-B). We set $l_{n} = 15$ pixels and left the remaining parameters at their default values. The plain NMF was performed using the built-in MATLAB function nnmf, which utilizes random initialization.

Details of the simulated experiment of Figure 3, Figure 4 and Figure 5

We used the same simulation settings for both Figure 3 and Figure 4. The field of view was $253 \times 316$ and the number of frames was 2000. We simulated 200 neurons using the same method as in the simulation of Figure 2, but for the background we used the spatiotemporal activity of the background extracted using CNMF-E from real experimental data (data not shown). The noise level $Σ$ was also estimated from the data. When we varied the SNR in Figure 4D-G, we multiplied $Σ$ by an SNR reduction factor.

We set $l = 12$ pixels to create the spatial filtering kernel. As for the thresholds used for determining seed pixels, we varied them for different SNR settings by visually checking the corresponding local correlation images and PNR images. The selected values were $L_{min} = [0.9, 0.8, 0.8, 0.8, 0.6, 0.6]$ and $P_{min} = [15, 10, 10, 8, 6, 6]$ for the SNR reduction factors $[1, 2, 3, 4, 5, 6]$, respectively. For the PCA/ICA analysis, we set the number of PCs and ICs to 600 and 300, respectively.

The simulation in Figure 5 only includes two neurons (as seen in Figure 3E), using the same simulation parameters. We replaced their temporal traces $c_{1}$ and $c_{2}$ with $(1 - ρ) c_{1} + ρ c_{3}$ and $(1 - ρ) c_{2} + ρ c_{3}$, where $ρ$ is tuned to generate different correlation levels ($γ$), and $c_{3}$ is simulated in the same way as $c_{1}$ and $c_{2}$. We also added a new background source whose temporal profile is $c_{3}$ to increase the neuron-background correlation as $ρ$ increases. CNMF-E was run as in Figure 4. We used 20 PCs and ICs for PCA/ICA.
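The effect of the mixing weight $ρ$ on the correlation $γ$ between the two traces can be illustrated with a toy example. The helper names `mix_traces` and `pearson` and the three four-sample traces below are hypothetical and chosen so that the result is easy to verify by hand; the mixing rule itself is the one stated for the Figure 5 simulation.

```python
import math


def mix_traces(c_own, c_shared, rho):
    """Return (1 - rho) * c_own + rho * c_shared, element by element."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return [(1.0 - rho) * a + rho * b for a, b in zip(c_own, c_shared)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy traces: each neuron is active in a different timebin (illustration only).
c1 = [1.0, 0.0, 0.0, 0.0]
c2 = [0.0, 1.0, 0.0, 0.0]
c3 = [0.0, 0.0, 1.0, 0.0]  # shared component, also used for the background

# Correlation between the two mixed traces for increasing rho.
gammas = [
    pearson(mix_traces(c1, c3, rho), mix_traces(c2, c3, rho))
    for rho in (0.0, 0.5, 1.0)
]
```

For these toy traces the correlation rises from $-1/3$ at $ρ = 0$ through 0 at $ρ = 0.5$ to 1 at $ρ = 1$, which is why $ρ$ can be tuned to reach a desired $γ$.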

In vivo microendoscopic imaging and data analysis

For all experimental data used in this work, we ran both CNMF-E and PCA/ICA. For CNMF-E, we chose parameters so that we initialized about 10-20% extra components, which were then merged or deleted (some automatically, some under manual supervision) to obtain the final estimates. Exact parameter settings are given for each dataset below. For PCA/ICA, the number of ICs was selected to be slightly larger than the number of components extracted by CNMF-E (as we found this led to the best results for this algorithm), and the number of PCs was selected to capture over 90% of the signal variance. The weight of temporal information in spatiotemporal ICA was set to 0.1. After obtaining the PCA/ICA filters, we again manually removed components that were clearly not neurons based on neuron morphology.
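The rule for choosing the number of PCs can be made concrete with a small helper. This is a sketch of one way to apply the 90% criterion, assuming the per-component variances (the eigenvalues of the data covariance) have already been computed; the function name and the example spectrum are hypothetical.

```python
def n_components_for_variance(variances, fraction=0.9):
    """Smallest k whose k largest variances exceed `fraction` of the total.

    `variances` holds one non-negative variance per principal component.
    """
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for k, v in enumerate(ordered, start=1):
        cumulative += v
        if cumulative > fraction * total:
            return k
    return len(ordered)


# Made-up spectrum: the first three components carry 95% of the variance.
n_pcs = n_components_for_variance([50.0, 30.0, 15.0, 5.0], fraction=0.9)
```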

We computed the SNR of the extracted cellular traces to quantitatively compare the performance of the two approaches. For each cellular trace $y$, we first computed its denoised trace $c$ using the selected deconvolution algorithm (here, thresholded OASIS); the SNR of $y$ is then

$$SNR = \frac{\| c \|_{2}^{2}}{\| y - c \|_{2}^{2}}$$

For PCA/ICA results, the calcium signal $y$ of each IC is the output of its corresponding spatial filter, while for CNMF-E results, it is the trace before applying temporal deconvolution, that is, ${\hat{y}}_{i}$ in Equation (9). All the data can be freely accessed online (Zhou et al., 2017).
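The SNR definition translates directly into code. The sketch below assumes that the denoised trace $c$ has already been produced by a deconvolution step (thresholded OASIS in this work, which is not reimplemented here); the function name `trace_snr` and the example numbers are hypothetical.

```python
def trace_snr(y, c):
    """SNR = ||c||_2^2 / ||y - c||_2^2 for a raw trace y and denoised trace c."""
    if len(y) != len(c):
        raise ValueError("y and c must have the same length")
    signal_energy = sum(v * v for v in c)
    residual_energy = sum((a - b) ** 2 for a, b in zip(y, c))
    if residual_energy == 0:
        return float("inf")  # y equals c exactly, no residual noise
    return signal_energy / residual_energy


# Made-up example: the denoised trace differs from the raw one in one sample.
y = [1.0, 2.0, 3.0, 4.0]
c = [1.0, 2.0, 3.0, 3.0]
snr = trace_snr(y, c)  # (1 + 4 + 9 + 9) / 1
```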

Dorsal striatum data

Expression of the genetically encoded calcium indicator GCaMP6f in neurons was achieved using a recombinant adeno-associated virus (AAV) encoding the GCaMP6f protein under transcriptional control of the synapsin promoter (AAV-Syn-GCaMP6f). This viral vector was packaged (serotype 1) and stored in undiluted aliquots at a working concentration of $> 10^{12}$ genomic copies per ml at $-80^{\circ}$C until intracranial injection. 500 μl of AAV1-Syn-GCaMP6f was injected unilaterally into dorsal striatum (0.6 mm anterior to Bregma, 2.2 mm lateral to Bregma, 2.5 mm ventral to the surface of the brain). One week post-injection, a 1 mm gradient index of refraction (GRIN) lens was implanted into dorsal striatum $\sim 300$ μm above the center of the viral injection. Three weeks after the implantation, the GRIN lens was reversibly coupled to a miniature one-photon microscope with an integrated 475 nm LED (Inscopix). Using nVistaHD Acquisition software, images were acquired at 30 frames per second with the LED transmitting $\sim 0.1$ to 0.2 mW of light while the mouse was freely moving in an open-field arena. Images were downsampled to 10 Hz and processed into TIFFs using Mosaic software. All experimental manipulations were performed in accordance with protocols approved by the Harvard Standing Committee on Animal Care following guidelines described in the US NIH Guide for the Care and Use of Laboratory Animals.

The parameters used in running CNMF-E were: $l = 13$ pixels, $l_{n} = 18$ pixels, $L_{min} = 0.7$, and $P_{min} = 7$. In the first pass, 728 components were initialized from the raw data before subtracting the background, and then additional components were initialized in a second pass. Highly correlated nearby components were merged and false positives were removed using the automated approach described above. In the end, we obtained 692 components.

Prefrontal cortex data

Cortical neurons were targeted by administering two microinjections of 300 μl of AAV-DJ-CamkIIa-GCaMP6s (titer: $5.3 \times 10^{12}$, 1:6 dilution, UNC vector core) into the prefrontal cortex (PFC) (coordinates relative to bregma; injection 1: +1.5 mm AP, 0.6 mm ML, -2.4 mm DV; injection 2: +2.15 mm AP, 0.43 mm ML, -2.4 mm DV) of an adult male wild type (WT) mouse. Immediately following the virus injection procedure, a 1 mm diameter GRIN lens was implanted 300 μm above the injection site (coordinates relative to bregma: +1.87 mm AP, 0.5 mm ML, -2.1 mm DV). After sufficient time had been allowed for the virus to express and the tissue to clear underneath the lens (3 weeks), a baseplate was secured to the skull to interface the implanted GRIN lens with a miniature, integrated microscope (nVista, 473 nm excitation LED, Inscopix) and subsequently permit the visualization of Ca$^{2+}$ signals from the PFC of a freely behaving mouse. The activity of PFC neurons was recorded at 15 Hz over a 10 min period (nVista HD Acquisition Software, Inscopix) while the test subject freely explored an empty novel chamber. Acquired data were spatially downsampled by a factor of 2, motion corrected, and temporally downsampled to 15 Hz (Mosaic Analysis Software, Inscopix). All procedures were approved by the University of North Carolina Institutional Animal Care and Use Committee (UNC IACUC).

The parameters used in running CNMF-E were: $l = 13$ pixels, $l_{n} = 18$ pixels, $L_{min} = 0.9$, and $P_{min} = 15$. There were 169 components initialized in the first pass and we obtained 225 components after running the whole CNMF-E pipeline.

Ventral hippocampus data

The calcium indicator GCaMP6f was expressed in ventral hippocampal-amygdala projecting neurons by injecting a retrograde canine adeno type 2-Cre virus (CAV2-Cre; from Larry Zweifel, University of Washington) into the basal amygdala (coordinates relative to bregma: -1.70 mm AP, 3.00 mm ML, and -4.25 mm DV from brain tissue at site), and a Cre-dependent GCaMP6f adeno-associated virus (AAV1-flex-Synapsin-GCaMP6f, UPenn vector core) into ventral CA1 of the hippocampus (coordinates relative to bregma: -3.16 mm AP, 3.50 mm ML, and -3.50 mm DV from brain tissue at site). A 0.5 mm diameter GRIN lens was then implanted over the vCA1 subregion and imaging began 3 weeks after surgery to allow for sufficient viral expression. Mice were then imaged with Inscopix miniaturized microscopes and nVistaHD Acquisition software as described above; images were acquired at 15 frames per second while mice explored an anxiogenic Elevated Plus Maze arena. Videos were motion corrected and spatially downsampled using Mosaic software. All procedures were performed in accordance with protocols approved by the New York State Psychiatric Institutional Animal Care and Use Committee following guidelines described in the US NIH Guide for the Care and Use of Laboratory Animals.

The parameters used in running CNMF-E were: $l = 15$ pixels, $l_{n} = 30$ pixels, $ζ = 10$, $L_{min} = 0.9$, and $P_{min} = 15$. We first temporally downsampled the data by a factor of 2 and then applied CNMF-E to the downsampled data. There were 53 components initialized. After updating the background component, the algorithm detected six more neurons from the residual. We merged most of these components and deleted false positives. In the end, there were 24 components left. The intermediate results before and after each manual intervention are shown in Video 10.

BNST data with footshock

The calcium indicator GCaMP6s was expressed within CaMKII-expressing neurons in the BNST by injecting the recombinant adeno-associated virus AAVdj-CaMKII-GCaMP6s (packaged at UNC Vector Core) into the anterior dorsal portion of BNST (coordinates relative to bregma: 0.10 mm AP, -0.95 mm ML, -4.30 mm DV). A 0.6 mm diameter GRIN lens was implanted above the injection site within the BNST. As described above, images were acquired using a detachable miniature one-photon microscope and nVistaHD Acquisition Software (Inscopix). Images were acquired at 20 frames per second while the animal was freely moving inside a sound-attenuated chamber equipped with a house light and a white noise generator (Med Associates). Unpredictable foot shocks were delivered through metal bars in the floor as an aversive stimulus during a 10 min session. Each unpredictable foot shock was 0.75 mA in intensity and 500 ms in duration, delivered on a variable interval (VI-60). As described above, images were motion corrected, downsampled and processed into TIFFs using Mosaic Software. These procedures were conducted in adult C57BL/6J mice (Jackson Laboratories) in accordance with the Guide for the Care and Use of Laboratory Animals, as adopted by the NIH, and with approval from the Institutional Animal Care and Use Committee of the University of North Carolina at Chapel Hill (UNC).

The parameters used in running CNMF-E were: $l = 15$ pixels, $l_{n} = 23$ pixels, $ζ = 10$, $L_{min} = 0.9$, and $P_{min} = 15$. There were 149 components initialized and we detected 29 more components from the residual after estimating the background. There were 127 components left after running the whole pipeline.

Code availability

All analyses were performed with custom-written MATLAB code. MATLAB implementations of the CNMF-E algorithm can be freely downloaded from https://github.com/zhoupc/CNMF_E (Zhou, 2017a). We also implemented CNMF-E as part of the Python package CaImAn (Giovannucci et al., 2017b), a computational analysis toolbox for large-scale calcium imaging and behavioral data (https://github.com/simonsfoundation/CaImAn [Giovannucci et al., 2017a]).

The scripts for generating all figures and the experimental data in this paper can be accessed from https://github.com/zhoupc/eLife_submission (Zhou, 2017b).

Acknowledgements

We would like to thank CNMF-E users who received early access to our package and provided tremendously helpful feedback and suggestions, especially James Hyde, Jesse Wood, and Sean Piantadosi in Susanne Ahmari’s lab at the University of Pittsburgh, Andreas Klaus in Rui Costa’s Lab in the Champalimaud Neurobiology of Action Laboratory, Suoqin Jin in Xiangmin Xu’s lab at University of California - Irvine, Conor Heins at the National Institute of Drug Abuse, Chris Donahue in Anatol Kreitzer’s lab at University of California - San Francisco, Xian Zhang in Bo Li’s lab at Cold Spring Harbor Laboratory, Emily Mackevicius in Michale Fee’s lab at Massachusetts Institute of Technology, Courtney Cameron and Malavika Murugan in Ilana Witten’s lab at Princeton University, Pranav Mamidanna in Jonathan Whitlock’s lab at Norwegian University of Science and Technology, and Milekovic Tomislav in Gregoire Courtine’s group at EPFL. We also thank Andreas Klaus for valuable comments on the manuscript.

Additional information

Funding
Funder	Author
National Institute of Mental Health	Pengcheng Zhou, Jessica C Jimenez, Rene Hen, Mazen A Kheirbek, Robert E Kass
National Institute on Drug Abuse	Pengcheng Zhou, Jose Rodriguez-Romaguera, Garret D Stuber
Intelligence Advanced Research Projects Activity	Pengcheng Zhou, Liam Paninski
Defense Advanced Research Projects Agency	Liam Paninski
Army Research Office	Liam Paninski
National Institute of Biomedical Imaging and Bioengineering	Liam Paninski
Eunice Kennedy Shriver National Institute of Child Health and Human Development	Shanna L Resendez, Garret D Stuber
Howard Hughes Medical Institute	Jessica C Jimenez
National Institute on Aging	Jessica C Jimenez, Rene Hen
New York State Stem Cell Science	Jessica C Jimenez, Rene Hen
Hope for Depression Research Foundation	Jessica C Jimenez, Rene Hen
Canadian Institutes of Health Research	Shay Q Neufeld
Simons Foundation	Andrea Giovannucci, Johannes Friedrich, Eftychios A Pnevmatikakis, Garret D Stuber, Liam Paninski
International Mental Health Research Organization	Mazen A Kheirbek
National Institute of Neurological Disorders and Stroke	Bernardo L Sabatini

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Author contributions

Pengcheng Zhou, Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing-original draft, Project administration, Writing-review and editing; Shanna L Resendez, Shay Q Neufeld, Resources, Data curation, Funding acquisition, Validation, Investigation, Writing-review and editing; Jose Rodriguez-Romaguera, Resources, Data curation, Validation, Investigation, Visualization, Writing-review and editing; Jessica C Jimenez, Resources, Data curation, Funding acquisition, Validation, Investigation, Visualization, Writing-review and editing; Andrea Giovannucci, Johannes Friedrich, Eftychios A Pnevmatikakis, Software; Garret D Stuber, Rene Hen, Resources, Supervision, Funding acquisition; Mazen A Kheirbek, Resources, Supervision, Funding acquisition, Validation, Writing-review and editing; Bernardo L Sabatini, Robert E Kass, Resources, Supervision, Funding acquisition, Visualization, Writing-review and editing; Liam Paninski, Conceptualization, Resources, Supervision, Funding acquisition, Validation, Visualization, Methodology, Writing-original draft, Project administration, Writing-review and editing

Author ORCIDs

Pengcheng Zhou http://orcid.org/0000-0003-1237-3931
Andrea Giovannucci http://orcid.org/0000-0002-7850-444X
Johannes Friedrich http://orcid.org/0000-0002-1321-5866
Eftychios A Pnevmatikakis https://orcid.org/0000-0003-1509-6394
Garret D Stuber http://orcid.org/0000-0003-1730-4855
Ethics
Animal experimentation: These procedures were conducted in accordance with the Guide for the Care and Use of Laboratory Animals, as adopted by the NIH, and with approval from the Harvard Standing Committee on Animal Care (protocol number: IS00000571), or the University of North Carolina Institutional Animal Care and Use Committee (UNC IACUC, protocol number: 16-075.0), or the New York State Psychiatric Institutional Animal Care and Use Committee (protocol number: NYSPI1412).

Decision letter and Author response
Decision letter https://doi.org/10.7554/eLife.28728.030
Author response https://doi.org/10.7554/eLife.28728.031

Additional files

Supplementary files

Transparent reporting form

DOI: https://doi.org/10.7554/eLife.28728.026
Major datasets
The following dataset was generated:

Author(s)	Year	Dataset title	Dataset URL	Database, license, and accessibility information
Zhou P, Resendez SL, Rodriguez-Romaguera J, Jimenez JC, Neufeld SQ, Giovannucci A, Friedrich J, Pnevmatikakis EA, Stuber GD, Hen R, Kheirbek MA, Sabatini BL, Kass RE, Paninski L	2017	Data from: Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data	https://doi.org/10.5061/dryad.kr17k	Available at Dryad Digital Repository under a CC0 Public Domain Dedication

References
Apthorpe N, Riordan A, Aguilar R, Homann J, Gu Y, Tank D, Seung HS. 2016. Automatic neuron detection in calcium imaging data using convolutional networks. Advances in Neural Information Processing Systems 29:3270-3278.
Barbera G, Liang B, Zhang L, Gerfen CR, Culurciello E, Chen R, Li Y, Lin DT. 2016. Spatially compact neural clusters in the dorsal striatum encode locomotion relevant information. Neuron 92:202-213. DOI: https://doi.org/10.1016/j.neuron.2016.08.037, PMID: 27667003
Bhatia K, Jain P, Kar P. 2015. Robust regression via hard thresholding. Advances in Neural Information Processing Systems 28:721-729.
Cai DJ, Aharoni D, Shuman T, Shobe J, Biane J, Song W, Wei B, Veshkini M, La-Vu M, Lou J, Flores SE, Kim I, Sano Y, Zhou M, Baumgaertel K, Lavi A, Kamata M, Tuszynski M, Mayford M, Golshani P, et al. 2016. A shared neural ensemble links distinct contextual memories encoded close in time. Nature 534:115-118. DOI: https://doi.org/10.1038/nature17955, PMID: 27251287
Cameron CM, Pillow J, Witten IB. 2016. Cellular resolution calcium imaging and optogenetic excitation reveal a role for IL to NAc projection neurons in encoding of spatial information during cocaine-seeking. Neuroscience Meeting Planner. Society for Neuroscience, San Diego.
Carvalho Poyraz F, Holzner E, Bailey MR, Meszaros J, Kenney L, Kheirbek MA, Balsam PD, Kellendonk C. 2016. Decreasing striatopallidal pathway function enhances motivation by energizing the initiation of goal-directed action. The Journal of Neuroscience 36:5988-6001. DOI: https://doi.org/10.1523/JNEUROSCI.0444-16.2016, PMID: 27251620
Cichocki A, Phan AH. 2009. Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E92-A:708-721. DOI: https://doi.org/10.1587/transfun.E92.A.708
Cichocki A, Zdunek R, Amari S. 2007. Hierarchical ALS Algorithms for Nonnegative Matrix and 3D Tensor Factorization. Lecture Notes in Computer Science 4666:169-176. DOI: https://doi.org/10.1007/978-3-540-74494-8_22
Cox J, Pinto L, Dan Y. 2016. Calcium imaging of sleep-wake related neuronal activity in the dorsal pons. Nature Communications 7:10763. DOI: https://doi.org/10.1038/ncomms10763, PMID: 26911837
Deneux T, Kaszas A, Szalay G, Katona G, Lakner T, Grinvald A, Rózsa B, Vanzetta I. 2016. Accurate spike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal populations in vivo. Nature Communications 7:12190. DOI: https://doi.org/10.1038/ncomms12190, PMID: 27432255
Dombeck DA, Graziano MS, Tank DW. 2009. Functional clustering of neurons in motor cortex determined by cellular resolution imaging in awake behaving mice. Journal of Neuroscience 29:13751-13760. DOI: https://doi.org/10.1523/JNEUROSCI.2985-09.2009, PMID: 19889987
Donahue CH, Kreitzer AC. 2017. Function of Basal Ganglia Circuitry in Motivation. Neuroscience Meeting Planner. Society for Neuroscience, Washington, DC.
Flusberg BA, Nimmerjahn A, Cocker ED, Mukamel EA, Barretto RP, Ko TH, Burns LD, Jung JC, Schnitzer MJ. 2008. High-speed, miniaturized fluorescence microscopy in freely moving mice. Nature Methods 5:935-938. DOI: https://doi.org/10.1038/nmeth.1256, PMID: 18836457
Friedrich J, Yang W, Soudry D, Mu Y, Ahrens MB, Yuste R, Peterka DS, Paninski L. 2017a. Multi-scale approaches for high-speed imaging and analysis of large neural populations. PLOS Computational Biology 13:e1005685. DOI: https://doi.org/10.1371/journal.pcbi.1005685, PMID: 28771570
Friedrich J, Zhou P, Paninski L. 2017b. Fast online deconvolution of calcium imaging data. PLOS Computational Biology 13:e1005423-1005426. DOI: https://doi.org/10.1371/journal.pcbi.1005423, PMID: 28291787
Ghosh KK, Burns LD, Cocker ED, Nimmerjahn A, Ziv Y, Gamal AE, Schnitzer MJ. 2011. Miniaturized integration of a fluorescence microscope. Nature Methods 8:871-878. DOI: https://doi.org/10.1038/nmeth.1694, PMID: 21909102
Giovannucci A, Friedrich J, Deverett B, Staneva V, Chklovskii D, Pnevmatikakis EA. 2017a. CaImAn. Github. 6bd51e2. https://github.com/flatironinstitute/CaImAn
Giovannucci A, Friedrich J, Deverett B, Staneva V, Chklovskii D, Pnevmatikakis EA. 2017b. CaImAn: an open source toolbox for large scale calcium imaging data analysis on standalone machines. Cosyne Abstracts. Cosyne2017.
Giovannucci A, Friedrich J, Deverett B, Staneva V, Chklovskii D, Pnevmatikakis EA. 2017b. CalmAn：一个用于独立机器上大规模钙成像数据分析的开源工具箱。Cosyne 摘要。Cosyne2017。
Haralick RM, Sternberg SR, Zhuang X. 1987. Image analysis using mathematical morphology. IEEE Transactions on Pattern Analysis and Machine Intelligence 9:532-550. DOI: https://doi.org/10.1109/TPAMI.1987.4767941, PMID: 21869411
Haralick RM, Sternberg SR, Zhuang X. 1987. 使用数学形态学进行图像分析。IEEE 模式分析与机器智能汇刊 9:532-550。DOI：https://doi.org/10.1109/TPAMI.1987.4767941，PMID：21869411
Harrison TC, Pinto L, Brock JR, Dan Y. 2016. Calcium imaging of basal forebrain activity during innate and learned behaviors. Frontiers in Neural Circuits 10 :1-12. DOI: https://doi.org/10.3389/fncir.2016.00036, PMID: 27242444
Harrison TC, Pinto L, Brock JR, Dan Y. 2016. 基底前脑活动的钙成像研究，涵盖先天和学习行为。神经回路前沿 10:1-12。DOI：https://doi.org/10.3389/fncir.2016.00036，PMID：27242444
Jennings JH, Sparta DR, Stamatakis AM, Ung RL, Pleil KE, Kash TL, Stuber GD. 2013. Distinct extended amygdala circuits for divergent motivational states. Nature 496 :224-228. DOI: https://doi.org/10.1038/ nature12041, PMID: 23515155
Jennings JH, Sparta DR, Stamatakis AM, Ung RL, Pleil KE, Kash TL, Stuber GD. 2013. 不同的延伸杏仁核回路对应不同的动机状态。Nature 496 :224-228. DOI: https://doi.org/10.1038/nature12041, PMID: 23515155
Jennings JH, Ung RL, Resendez SL, Stamatakis AM, Taylor JG, Huang J, Veleta K, Kantak PA, Aita M, ShillingScrivo K, Ramakrishnan C, Deisseroth K, Otte S, Stuber GD. 2015. Visualizing hypothalamic network dynamics for appetitive and consummatory behaviors. Cell 160:516-527. DOI: https://doi.org/10.1016/j.cell.2014.12.026, PMID: 25635459
Jennings JH, Ung RL, Resendez SL, Stamatakis AM, Taylor JG, Huang J, Veleta K, Kantak PA, Aita M, ShillingScrivo K, Ramakrishnan C, Deisseroth K, Otte S, Stuber GD. 2015. 可视化下丘脑网络动态以研究食欲和摄食行为。Cell 160:516-527. DOI: https://doi.org/10.1016/j.cell.2014.12.026, PMID: 25635459
Jewell S, Witten D. 2017. Exact Spike Train Inference Via

ℓ_{0}

Optimization. arXiv. https://arxiv.org/abs/1703. 08644.
Jewell S, Witten D. 2017. 通过

ℓ_{0}

优化实现精确的尖峰列推断。arXiv. https://arxiv.org/abs/1703.08644.

Jimenez JC, Goldberg A, Ordek G, Luna VM, Su K, Pena S, Zweifel L, Hen R, Kheirbek M. 2016. Subcortical projection-specific control of innate anxiety and learned fear by the ventral hippocampus. Neuroscience Meeting Planner. Society for Neuroscience, San Diego.
Jimenez JC, Goldberg A, Ordek G, Luna VM, Su K, Pena S, Zweifel L, Hen R, Kheirbek M. 2016. 腹侧海马对先天性焦虑和习得性恐惧的亚皮层投射特异性控制。神经科学会议策划。神经科学学会，圣地亚哥。

Jimenez JC, Su K, Goldberg AR, Luna VM, Biane JS, Ordek G, Zhou P, Ong SK, Wright MA, Zweifel L, Paninski L, Hen R, Kheirbek MA. 2018. Anxiety Cells in a Hippocampal-Hypothalamic Circuit. Neuron 97:670-683. DOI: https://doi.org/10.1016/j.neuron.2018.01.016, PMID: 29397273
Jimenez JC, Su K, Goldberg AR, Luna VM, Biane JS, Ordek G, Zhou P, Ong SK, Wright MA, Zweifel L, Paninski L, Hen R, Kheirbek MA. 2018. 海马-下丘脑回路中的焦虑细胞。Neuron 97:670-683. DOI: https://doi.org/10.1016/j.neuron.2018.01.016, PMID: 29397273
Kitamura T, Sun C, Martin J, Kitch LJ, Schnitzer MJ, Tonegawa S. 2015. Entorhinal cortical ocean cells encode specific contexts and drive context-specific fear memory. Neuron 87:1317-1331. DOI: https://doi.org/10.1016/ j.neuron.2015.08.036, PMID: 26402611
Kitamura T, Sun C, Martin J, Kitch LJ, Schnitzer MJ, Tonegawa S. 2015. 内嗅皮层海洋细胞编码特定情境并驱动情境特异性恐惧记忆。Neuron 87:1317-1331. DOI: https://doi.org/10.1016/j.neuron.2015.08.036, PMID: 26402611

Klaus A, Martins GJ, Paixao VB, Zhou P, Paninski L, Costa RM. 2017. The spatiotemporal organization of the striatum encodes action space. Neuron 95:1171-1180. DOI: https://doi.org/10.1016/j.neuron.2017.08.015, PMID: 28858619
Klaus A, Martins GJ, Paixao VB, Zhou P, Paninski L, Costa RM. 2017. 纹状体的时空组织编码动作空间。Neuron 95:1171-1180. DOI: https://doi.org/10.1016/j.neuron.2017.08.015, PMID: 28858619
Lin X, Grieco SF, Jin S, Zhou P, Nie Q, Kwapis J, Wood MA, Baglietto-Vargas D, Laferla FM, Xu X. 2017. In vivo calcium imaging of hippocampal neuronal network activity associated with memory behavior deficits in the Alzheimer’s disease mouse model. Neuroscience Meeting Planner. Society for Neuroscience, Washinton, DC.
Lin X, Grieco SF, Jin S, Zhou P, Nie Q, Kwapis J, Wood MA, Baglietto-Vargas D, Laferla FM, Xu X. 2017. 阿尔茨海默病小鼠模型中与记忆行为缺陷相关的海马神经元网络活动的体内钙成像。神经科学会议策划。神经科学学会，华盛顿特区。
Mackevicius EM, Denisenko N, Fee MS. 2017. Neural sequences underlying the rapid learning of new syllables in juvenile zebra finches. Neuroscience Meeting Planner. Society for Neuroscience, Washinton, DC.
Mackevicius EM, Denisenko N, Fee MS. 2017. 神经序列在幼年斑胸草雀快速学习新音节中的作用。神经科学会议策划。神经科学学会，华盛顿特区。
Madangopal R, Heins C, Caprioli D, Liang B, Barbera G, Komer L, Bossert J, Hope B, Shaham Y, Lin D. 2017. In vivo calcium imaging to assess the role of prelimbic cortex neuronal ensembles in encoding reinstatement of palatable food-seeking in rats. Neuroscience Meeting Planner. Society for Neuroscience, Washinton, DC.
Madangopal R, Heins C, Caprioli D, Liang B, Barbera G, Komer L, Bossert J, Hope B, Shaham Y, Lin D. 2017. 体内钙成像评估前额叶皮层神经元群在编码大鼠可口食物寻求行为恢复中的作用。神经科学会议策划。神经科学学会，华盛顿特区。
Markowitz JE, Liberti WA, Guitchounts G, Velho T, Lois C, Gardner TJ. 2015. Mesoscopic patterns of neural activity support songbird cortical sequences. PLOS Biology 13:e1002158. DOI: https://doi.org/10.1371/journal. pbio.1002158, PMID: 26039895
Markowitz JE, Liberti WA, Guitchounts G, Velho T, Lois C, Gardner TJ. 2015. 中观神经活动模式支持鸣禽皮层序列。PLOS Biology 13:e1002158. DOI: https://doi.org/10.1371/journal.pbio.1002158, PMID: 26039895
Mohammed AI, Gritton HJ, Tseng HA, Bucklin ME, Yao Z, Han X. 2016. An integrative approach for analyzing hundreds of neurons in task performing mice using wide-field calcium imaging. Scientific Reports 6:20986. DOI: https://doi.org/10.1038/srep20986, PMID: 26854041
Mohammed AI, Gritton HJ, Tseng HA, Bucklin ME, Yao Z, Han X. 2016. 使用宽视野钙成像对执行任务的小鼠中数百个神经元进行综合分析的方法。Scientific Reports 6:20986. DOI: https://doi.org/10.1038/srep20986, PMID: 26854041
Mukamel EA, Nimmerjahn A, Schnitzer MJ. 2009. Automated analysis of cellular signals from large-scale calcium imaging data. Neuron 63:747-760. DOI: https://doi.org/10.1016/j.neuron.2009.08.009, PMID: 19778505
Mukamel EA, Nimmerjahn A, Schnitzer MJ. 2009. 大规模钙成像数据中细胞信号的自动分析。Neuron 63:747-760. DOI: https://doi.org/10.1016/j.neuron.2009.08.009, PMID: 19778505
Mukamel EA. 2016. CellSort. Github. 45f28d7. https://github.com/mukamel-lab/CellSort
Murugan M, Park M, Taliaferro J, Jang HJ, Cox J, Parker N, Bhave V, Nectow A, Pillow J, Witten I. 2017. Combined social and spatial coding in a descending projection from the prefrontal cortex. bioRxiv. DOI: https://doi.org/10.1101/155929
Murugan M, Park M, Taliaferro J, Jang HJ, Cox J, Parker N, Bhave V, Nectow A, Pillow J, Witten I. 2017. 来自前额叶皮层的下行投射中的社会和空间编码结合。bioRxiv. DOI: https://doi.org/10.1101/155929
Murugan M, Taliaferro JP, Park M, Jang H, Witten IB. 2016. Detecting action potentials in neuronal populations with calcium imaging. Neuroscience Meeting Planner. Society for Neuroscience, San Diego.
Murugan M, Taliaferro JP, Park M, Jang H, Witten IB. 2016. 利用钙成像检测神经元群体中的动作电位。神经科学会议策划。神经科学学会，圣地亚哥。
Pachitariu M, Packer AM, Pettit N, Dalgleish H, Hausser M, Sahani M. 2013. Extracting regions of interest from biological images with convolutional sparse block coding. Advances in Neural Information Processing Systems 26:1745-1753.
Pachitariu M, Packer AM, Pettit N, Dalgleish H, Hausser M, Sahani M. 2013. 使用卷积稀疏块编码从生物图像中提取感兴趣区域。神经信息处理系统进展 26:1745-1753。
Pachitariu M, Stringer C, Schröder S, Dipoppa M, Rossi LF, Carandini M, Harris KD. 2016. Suite2p: beyond 10, 000 neurons with standard two-photon microscopy. bioRxiv. DOI: https://doi.org/10.1101/061507
Pachitariu M, Stringer C, Schröder S, Dipoppa M, Rossi LF, Carandini M, Harris KD. 2016. Suite2p：利用标准双光子显微镜超越 1 万神经元的分析。bioRxiv. DOI: https://doi.org/10.1101/061507
Pinto L, Dan Y. 2015. Cell-type-specific activity in prefrontal cortex during goal-directed behavior. Neuron 87: 437-450. DOI: https://doi.org/10.1016/j.neuron.2015.06.021, PMID: 26143660
Pinto L, Dan Y. 2015. 目标导向行为中前额叶皮层的细胞类型特异性活动。Neuron 87: 437-450. DOI: https://doi.org/10.1016/j.neuron.2015.06.021, PMID: 26143660
Pnevmatikakis EA, Merel J, Pakman A, Paninski L. 2013. Bayesian spike inference from calcium imaging data. In: 2013 Asilomar Conference on Signals, Systems and Computers. p. 349-353. DOI: https://doi.org/10.1109/ ACSSC.2013.6810293
Pnevmatikakis EA, Merel J, Pakman A, Paninski L. 2013. 基于钙成像数据的贝叶斯尖峰推断。载于：2013 年 Asilomar 信号、系统与计算机会议。第 349-353 页。DOI: https://doi.org/10.1109/ACSSC.2013.6810293
Pnevmatikakis EA, Soudry D, Gao Y, Machado TA, Merel J, Pfau D, Reardon T, Mu Y, Lacefield C, Yang W, Ahrens M, Bruno R, Jessell TM, Peterka DS, Yuste R, Paninski L. 2016. Simultaneous denoising, deconvolution, and demixing of calcium imaging data. Neuron 89:285-299. DOI: https://doi.org/10.1016/j.neuron.2015.11. 037, PMID: 26774160
Pnevmatikakis EA, Soudry D, Gao Y, Machado TA, Merel J, Pfau D, Reardon T, Mu Y, Lacefield C, Yang W, Ahrens M, Bruno R, Jessell TM, Peterka DS, Yuste R, Paninski L. 2016. 钙成像数据的同时去噪、去卷积和分离。Neuron 89:285-299. DOI: https://doi.org/10.1016/j.neuron.2015.11.037, PMID: 26774160
Pnevmatikakis EA. 2016. Ca_source_extraction. Github. 5a25d5a. https://github.com/epnev/ca_source_extraction Resendez SL, Jennings JH, Ung RL, Namboodiri VM, Zhou ZC, Otis JM, Nomura H, McHenry JA, Kosyk O, Stuber GD. 2016. Visualization of cortical, subcortical and deep brain neural circuit dynamics during naturalistic mammalian behavior with head-mounted microscopes and chronically implanted lenses. Nature Protocols 11: 566-597. DOI: https://doi.org/10.1038/nprot.2016.021, PMID: 26914316
Pnevmatikakis EA. 2016. Ca_source_extraction. Github. 5a25d5a. https://github.com/epnev/ca_source_extraction Resendez SL, Jennings JH, Ung RL, Namboodiri VM, Zhou ZC, Otis JM, Nomura H, McHenry JA, Kosyk O, Stuber GD. 2016. 使用头戴显微镜和长期植入透镜可视化自然哺乳动物行为中的皮层、皮下和深脑神经回路动态。Nature Protocols 11: 566-597. DOI: https://doi.org/10.1038/nprot.2016.021, PMID: 26914316
Roberts TF, Hisey E, Tanaka M, Kearney MG, Chattree G, Yang CF, Shah NM, Mooney R. 2017. Identification of a motor-to-auditory pathway important for vocal learning. Nature Neuroscience 20:978-986. DOI: https://doi. org/10.1038/nn.4563, PMID: 28504672
Roberts TF, Hisey E, Tanaka M, Kearney MG, Chattree G, Yang CF, Shah NM, Mooney R. 2017. 识别对声学学习重要的运动到听觉通路。Nature Neuroscience 20:978-986. DOI: https://doi.org/10.1038/nn.4563, PMID: 28504672
Rodriguez-Romaguera J, Ung RL, Nomura H, Namboodiri VMK, Otis JM, Robinson JE, Resendez SL, McHenry JA, Eckman LEH, Kosyk TL, van den Munkhof HE, Zhou P, Paninski L, Kash TL, Bruchas MR, Stuber GD. 2017. Nociceptin neurons in the bed nucleus of the stria terminalis regulate anxiety. Neuroscience Meeting Planner. Society for Neuroscience, Washinton, DC.
Rodriguez-Romaguera J, Ung RL, Nomura H, Namboodiri VMK, Otis JM, Robinson JE, Resendez SL, McHenry JA, Eckman LEH, Kosyk TL, van den Munkhof HE, Zhou P, Paninski L, Kash TL, Bruchas MR, Stuber GD. 2017. 终纹床核中的 nociceptin 神经元调节焦虑。神经科学会议策划。神经科学学会，华盛顿特区。
Rubin A, Geva N, Sheintuch L, Ziv Y. 2015. Hippocampal ensemble dynamics timestamp events in long-term memory. eLife 4:e12247. DOI: https://doi.org/10.7554/eLife.12247, PMID: 26682652
Rubin A, Geva N, Sheintuch L, Ziv Y. 2015. 海马群体动力学为长期记忆中的事件打时间戳。eLife 4:e12247. DOI: https://doi.org/10.7554/eLife.12247, PMID: 26682652
Ryan PJ, Ross SI, Campos CA, Derkach VA, Palmiter RD. 2017. Oxytocin-receptor-expressing neurons in the parabrachial nucleus regulate fluid intake. Nature Neuroscience 20:1722-1733. DOI: https://doi.org/10.1038/ s41593-017-0014-z, PMID: 29184212
Ryan PJ, Ross SI, Campos CA, Derkach VA, Palmiter RD. 2017. 桥脑旁核中表达催产素受体的神经元调节液体摄入。自然神经科学 20:1722-1733. DOI: https://doi.org/10.1038/s41593-017-0014-z, PMID: 29184212
Sheintuch L, Rubin A, Brande-Eilat N, Geva N, Sadeh N, Pinchasof O, Ziv Y. 2017. Tracking the same neurons across multiple days in

{Ca}^{2 +}

imaging data. Cell Reports 21:1102-1115. DOI: https://doi.org/10.1016/j.celrep. 2017.10.013, PMID: 29069591
Sheintuch L, Rubin A, Brande-Eilat N, Geva N, Sadeh N, Pinchasof O, Ziv Y. 2017. 在

{Ca}^{2 +}

成像数据中跨多天追踪相同神经元。细胞报告 21:1102-1115. DOI: https://doi.org/10.1016/j.celrep.2017.10.013, PMID: 29069591

Smith SL, Häusser M. 2010. Parallel processing of visual space by neighboring neurons in mouse visual cortex. Nature Neuroscience 13:1144-1149. DOI: https://doi.org/10.1038/nn.2620, PMID: 20711183
Smith SL, Häusser M. 2010. 小鼠视觉皮层中邻近神经元对视觉空间的并行处理。自然神经科学 13:1144-1149。DOI: https://doi.org/10.1038/nn.2620，PMID: 20711183

Sun C, Kitamura T, Yamamoto J, Martin J, Pignatelli M, Kitch LJ, Schnitzer MJ, Tonegawa S. 2015. Distinct speed dependence of entorhinal island and ocean cells, including respective grid cells. PNAS 112:9466-9471.
Sun C, Kitamura T, Yamamoto J, Martin J, Pignatelli M, Kitch LJ, Schnitzer MJ, Tonegawa S. 2015. 内嗅岛细胞和海洋细胞（包括各自的网格细胞）速度依赖性的差异。PNAS 112:9466-9471。
DOI: https://doi.org/10.1073/pnas.1511668112, PMID: 26170279
DOI: https://doi.org/10.1073/pnas.1511668112，PMID: 26170279
Tombaz T, Dunn BA, Hovde K, Whitlock JR. 2016. Action planning and action observation in rodent parietal cortex. Neuroscience Meeting Planner. Society for Neuroscience, San Diego.
Tombaz T, Dunn BA, Hovde K, Whitlock JR. 2016. 啮齿类顶叶皮层中的动作计划与动作观察。神经科学会议策划。神经科学学会，圣地亚哥。
Ung RL, Rodriguez-Romaguera J, Nomura H, Namboodiri VMK, Otis JM, Robinson JE, Resendez SL, McHenry JA, Eckman LEH, Kosyk TL, van den Munkhof HE, Zhou P, Paninski L, Kash TL, Bruchas MR, Stuber GD. 2017. Encoding the relationship between anxiety-related behaviors and nociceptin neurons of the bed nucleus of the stria terminalis. Neuroscience Meeting Planner. Society for Neuroscience, Washinton, DC.
Ung RL, Rodriguez-Romaguera J, Nomura H, Namboodiri VMK, Otis JM, Robinson JE, Resendez SL, McHenry JA, Eckman LEH, Kosyk TL, van den Munkhof HE, Zhou P, Paninski L, Kash TL, Bruchas MR, Stuber GD. 2017. 编码焦虑相关行为与终纹床核中 nociceptin 神经元之间的关系。神经科学会议策划。神经科学学会，华盛顿特区。
Vogelstein JT, Packer AM, Machado TA, Sippy T, Babadi B, Yuste R, Paninski L. 2010. Fast nonnegative deconvolution for spike train inference from population calcium imaging. Journal of Neurophysiology 104: 3691-3704. DOI: https://doi.org/10.1152/jn.01073.2009, PMID: 20554834
Vogelstein JT, Packer AM, Machado TA, Sippy T, Babadi B, Yuste R, Paninski L. 2010. 用于从群体钙成像中推断尖峰序列的快速非负反卷积。神经生理学杂志 104: 3691-3704。DOI: https://doi.org/10.1152/jn.01073.2009, PMID: 20554834
Vogelstein JT, Watson BO, Packer AM, Yuste R, Jedynak B, Paninski L. 2009. Spike inference from calcium imaging using sequential Monte Carlo methods. Biophysical Journal 97:636-655. DOI: https://doi.org/10.1016/ j.bpj.2008.08.005, PMID: 19619479
Vogelstein JT, Watson BO, Packer AM, Yuste R, Jedynak B, Paninski L. 2009. 使用序贯蒙特卡洛方法从钙成像中推断尖峰。生物物理学杂志 97:636-655。DOI: https://doi.org/10.1016/j.bpj.2008.08.005, PMID: 19619479

Warp E, Agarwal G, Wyart C, Friedmann D, Oldfield CS, Conner A, Del Bene F, Arrenberg AB, Baier H, Isacoff EY. 2012. Emergence of patterned activity in the developing zebrafish spinal cord. Current Biology 22:93-102. DOI: https://doi.org/10.1016/j.cub.2011.12.002, PMID: 22197243
Warp E, Agarwal G, Wyart C, Friedmann D, Oldfield CS, Conner A, Del Bene F, Arrenberg AB, Baier H, Isacoff EY. 2012. 斑马鱼脊髓发育过程中模式化活动的出现。Current Biology 22:93-102. DOI: https://doi.org/10.1016/j.cub.2011.12.002, PMID: 22197243
Yu K, Ahrens S, Zhang X, Schiff H, Ramakrishnan C, Fenno L, Deisseroth K, Zhao F, Luo MH, Gong L, He M, Zhou P, Paninski L, Li B. 2017. The central amygdala controls learning in the lateral amygdala. Nature Neuroscience 20:1680-1685. DOI: https://doi.org/10.1038/s41593-017-0009-9, PMID: 29184202
Yu K, Ahrens S, Zhang X, Schiff H, Ramakrishnan C, Fenno L, Deisseroth K, Zhao F, Luo MH, Gong L, He M, Zhou P, Paninski L, Li B. 2017. 中央杏仁核控制侧杏仁核的学习。Nature Neuroscience 20:1680-1685. DOI: https://doi.org/10.1038/s41593-017-0009-9, PMID: 29184202
Zhou P, Resendez SL, Rodriguez-Romaguera J, Jimenez JC, Neufeld SQ, Giovannucci A, Friedrich J, Pnevmatikakis EE, Stuber GD, Hen R, Kheirbek MA, Sabatini BL, Kass RE, Paninski L. 2017. Data from: efficient and accurate extraction of in vivo calcium signals from microendoscopic video data. Dryad Digital Repository. DOI: https://doi.org/10.5061/dryad.kr17k
Zhou P, Resendez SL, Rodriguez-Romaguera J, Jimenez JC, Neufeld SQ, Giovannucci A, Friedrich J, Pnevmatikakis EE, Stuber GD, Hen R, Kheirbek MA, Sabatini BL, Kass RE, Paninski L. 2017. 数据来源：从微内窥镜视频数据中高效且准确地提取体内钙信号。Dryad 数字资源库。DOI: https://doi.org/10.5061/dryad.kr17k
Zhou P. 2017a. CNMF-E. Github. 088afc1. https://github.com/zhoupc/CNMF_E
Zhou P. 2017b. eLife_submission. Github. 1c65f70. https://github.com/zhoupc/eLife_submission
Ziv Y, Burns LD, Cocker ED, Hamel EO, Ghosh KK, Kitch LJ, El Gamal A, Schnitzer MJ. 2013. Long-term dynamics of CA1 hippocampal place codes. Nature Neuroscience 16:264-266. DOI: https://doi.org/10.1038/nn.3329, PMID: 23396101
Ziv Y, Burns LD, Cocker ED, Hamel EO, Ghosh KK, Kitch LJ, El Gamal A, Schnitzer MJ. 2013. CA1 海马位置编码的长期动态。自然神经科学 16:264-266. DOI: https://doi.org/10.1038/nn.3329, PMID: 23396101
Ziv Y, Ghosh KK. 2015. Miniature microscopes for large-scale imaging of neuronal activity in freely behaving rodents. Current Opinion in Neurobiology 32:141-147. DOI: https://doi.org/10.1016/j.conb.2015.04.001, PMID: 25951292
Ziv Y, Ghosh KK. 2015. 用于自由行为啮齿动物大规模神经活动成像的微型显微镜。神经生物学当前观点 32:141-147. DOI: https://doi.org/10.1016/j.conb.2015.04.001, PMID: 25951292
