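Abstract
The fields of metagenomics and metatranscriptomics involve examining complete nucleotide sequences, identifying genes, and analyzing potential biological functions across diverse organisms or environmental samples. Although metagenomics offers enormous opportunities for discovery, the sheer volume and complexity of sequence data often make processing, analysis, and visualization challenging. This article highlights the critical role of advanced visualization tools in effectively exploring, querying, and analyzing these complex datasets. It emphasizes the importance of accessibility, classifies visualization tools according to their intended application, and underlines their usefulness in enabling both bioinformaticians and non-bioinformaticians to interpret meta-omics data and derive insights from it.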
1. Introduction
The total number of microbial cells on Earth is estimated at 10^30 [
Metagenomics and metatranscriptomics [
Similarly, metatranscriptomics [
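Metagenome-assembled genomes (MAGs) are individual genomes of specific microorganisms reconstructed from metagenomic datasets, at varying levels of completeness and possible contamination. Extracting genomes from metagenomes is challenging because of the complexity and diversity of metagenomic samples. However, advances in sequencing technologies and computational methods have made it possible to extract and characterize genomes from metagenomes with increasing accuracy. These genomes can provide valuable insights into the diversity and function of microbial communities, aiding the discovery of new organisms, metabolic pathways, and potential biotechnological applications.
A typical shotgun metagenomic analysis involves the following steps (Figure 1):
- Sequencing: Researchers first perform metagenomic sequencing of the sample, generating a dataset of DNA fragments from the various microorganisms present in the environment.
- Quality control: The raw metagenomic sequences are checked for quality, and contaminants such as adapters and primers are removed.
- Assembly/read mapping: Short DNA fragments (reads) are aligned to reconstruct longer genomic sequences. The cleaned sequences are assembled into contigs and scaffolds using one of several approaches: de novo assembly (no reference genome available), reference-based assembly (a reference genome is available), or hybrid assembly (reference-guided combined with partial de novo assembly).
- Binning and genome reconstruction: Assembled contigs (contiguous DNA sequences) are grouped into bins resembling operational taxonomic units, based on similarity in nucleotide composition, coverage, and other features. Genomes reconstructed through binning are commonly called metagenome-assembled genomes (MAGs).
- Annotation: MAGs are annotated with functional and taxonomic information, in the same way as isolate genomes.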
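The binning step above relies on per-contig features such as nucleotide composition. As a minimal sketch, assuming plain DNA strings as input, the two composition features most often used (GC content and normalized k-mer frequencies) could be computed as follows. The function names `gc_content` and `kmer_frequencies` are illustrative and do not belong to any particular binning tool.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_frequencies(seq: str, k: int = 4) -> dict[str, float]:
    """Normalized frequency of every possible k-mer over the ACGT alphabet.

    Windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


if __name__ == "__main__":
    contig = "ACGTACGTGGCC"  # hypothetical contig sequence
    print(round(gc_content(contig), 3))
    print(kmer_frequencies(contig, k=2)["AC"])
```

Contigs with similar feature vectors (usually combined with read coverage) are candidates for the same bin; real binners add statistical models on top of such features.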
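Similarly, a typical metatranscriptomic analysis involves the following steps:
- Sample collection and RNA extraction: Samples are collected from the environment of interest, such as soil, water, or the human gut. Total RNA is then extracted from the collected samples to capture actively transcribed genes.
- cDNA synthesis: The extracted RNA is converted into complementary DNA (cDNA) by reverse transcription.
- Sequencing library preparation: Sequencing libraries are prepared from the cDNA samples, typically using methods such as fragmentation and adapter ligation.
- Sequencing: The prepared libraries are sequenced at high throughput on platforms such as Illumina or PacBio.
- Data preprocessing: As in metagenomics, the data must be preprocessed, for example by trimming adapter sequences, removing low-quality reads, and filtering out ribosomal RNA (rRNA) sequences.
- Read mapping: Sequenced reads are mapped to a reference genome or transcriptome to identify expressed genes and quantify their abundance.
- Differential expression analysis: Genes that are differentially expressed under different conditions or between samples are identified.
- Functional annotation and pathway analysis: Differentially expressed genes are annotated against databases such as NCBI's RefSeq [15] or UniProt [16] to assign putative functions, and pathway enrichment is computed for the differentially expressed functions. The goal is to understand the biological processes at play.
In this review, we focus on metagenomic visualization tools designed to analyze and display metagenomic data, including DNA sequences, functional information, and metadata. Visualization is essential in metagenomics because it enables researchers to understand complex microbial community structure, taxonomic composition, and functional potential. Although several visualization tools have been developed to help researchers explore and interpret metagenomic data, the field of metagenomic visualization is still in its infancy, and challenges remain in complexity, functionality, scalability, and interoperability. Nevertheless, metagenomic visualization supports several important tasks:
- Interactive, intuitive exploration and visualization of large datasets helps identify patterns and trends in the data.
- Comparison of multiple samples helps identify similarities and differences, improving the understanding of the diversity and complexity inherent in metagenomic data.
- Integration of various data types (functional, taxonomic, and metadata) supports a comprehensive understanding of metagenomic datasets.
- Sharing data and results among researchers fosters stronger collaboration and improves the reproducibility of research.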
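2. Databases and repositories
Currently available metagenomic and metatranscriptomic datasets, including raw reads, sequenced scaffolds, predicted genes and annotations, and associated metadata, are hosted in various publicly available repositories and databases [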
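Table 1 Databases and repositories.
Database name | Description | Data types | Access | User submissions |
---|---|---|---|---|
GenBank | Archive of sequencing data | Genomes, Metagenomes, Metatranscriptomes, Amplicons | Publicly accessible | Yes |
Sequence Read Archive (SRA) | Archive of sequencing data | Raw sequencing data | Publicly accessible | Yes |
European Nucleotide Archive (ENA) | Archive of all publicly available nucleotide sequences | Genomes, Metagenomes, Metatranscriptomes, Amplicons | Publicly accessible | Yes |
DOE Systems Biology Knowledgebase (KBase) | Platform for sharing, integrating, and analyzing microbial, plant, and community data | Genomes, Metagenomes, Metatranscriptomes, Amplicons | Publicly accessible | Yes |
Genomes OnLine Database (GOLD) | Repository of genome projects and metadata (ecosystems) | Ecosystems | Publicly accessible | Yes |
Integrated Microbial Genomes and Microbiomes (IMG/M) | Community-driven repository hosting genomes, metagenomes, metatranscriptomes, amplicons, plasmids, and genome fragments from cultured and uncultured microbial taxa | Metagenomes, Metatranscriptomes, Amplicons, Genomes | Publicly accessible | Yes |
MGnify | Archive for the exploration and analysis of microbiome sequencing datasets | Metagenomes, Metatranscriptomes, Amplicons, MAGs | Publicly accessible | Yes |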
Metagenome RAST (MG-RAST) | Microbiome repository with a unified pipeline for automated analysis of metagenomic samples | Metagenomes | Registered users | Yes |
Integrated Microbial Viral Genomes (IMG/VR) | Viral genomes and metagenomes | Viral Genomes, Viral Metagenomes | Publicly accessible | Yes |
NMPFamsDB | Novel protein families from IMG’s metagenomes and metatranscriptomes | Protein Families | Publicly accessible | No |
FESnov catalog | Catalog reporting functionally unannotated proteins derived from MAGs | Proteins | Publicly accessible | No |
NIH Human Microbiome Project | Metagenomes from human host-associated systems, such as the gut microbiome | Human Microbiome Metagenomes | Publicly accessible | No |
TerrestrialMetagenomeDB | Annotation of metagenomes obtained from soil samples | Soil Metagenomes | Publicly accessible | Yes |
MarineMetagenomeDB | Annotation of metagenomes obtained from marine samples | Marine Metagenomes | Publicly accessible | Yes |
HumanMetagenomeDB | Annotation of metagenomes obtained from human microbiome samples | Human Microbiome Metagenomes | Publicly accessible | Yes |
SPIRE | Searchable resource of ecosystem metadata obtained from MAGs | Ecosystem Metadata | Publicly accessible | No |
Marine Metagenomics Portal (MMP) | Collection of databases annotating marine-oriented metagenomic datasets | Marine Metagenomes | Publicly accessible | No |
National Microbiome Data Collaborative (NMDC) | A platform for collaboration and data sharing among researchers studying microbiomes across diverse ecosystems | Microbiome Data | Publicly accessible | Yes |
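The Integrated Microbial Genomes and Microbiomes (IMG/M) database is a community-driven repository containing genomes, metagenomes and metatranscriptomes, amplicons, plasmids, and genome fragments of interest generated by targeted sequencing, from cultured and uncultured microbial taxa across all domains of life [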
In addition to IMG/M, MGnify, and MG-RAST, several more specialized metagenomic databases are available that focus on specific microbiome types. For example, IMG/VR [
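Finally, the National Microbiome Data Collaborative (NMDC) [
3. Sequence space
In this section, we describe the current sequence space of metagenomes and metatranscriptomes in the repositories above (snapshot from April 2024). IMG/M currently hosts 207,655 datasets, including 54,030 metagenomic and 14,411 metatranscriptomic datasets (65,987,169,755 scaffolds). Similarly, the IMG/VR database, known for its comprehensive collection of uncultivated viral genomes, contains a total of 14,203,973 viral genomes and 8,023,647 viral OTUs derived from metagenomes. MGnify hosts 573,344 datasets from 2,932 studies; of these, 459,617 are amplicons, 39,605 are metagenomes, and 2,581 are metatranscriptomes. In addition, MGnify holds 429,448 genomes cataloged in 11 metagenome-assembled genome (MAG) catalogs. The MGnify protein database holds more than 2.4 billion unique sequences predicted from metagenomic assemblies. SPIRE contains 99,146 metagenomic samples from 739 studies. Its metagenomic assemblies total 16 terabase pairs (Tbp) and contain 35 billion predicted protein sequences and 1.16 million newly generated medium- to high-quality MAGs.
4. Pipelines
Although each pipeline may take a different approach and integrate different analysis methods, all currently available workflows center on three main procedures: (i) identification of non-coding RNA genes (ncRNAs) and other marker regions, (ii) prediction of protein-coding genes, and (iii) functional and taxonomic annotation of the sample. ncRNAs (e.g., rRNA, tRNA) and marker regions (e.g., CRISPR elements) are identified using INFERNAL [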
After gene calling, functional annotation can be performed by searching the predicted genes against reference databases (e.g., RefSeq [
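5. Central visualization layouts used in metagenomics
Although metagenomes are heterogeneous and complex to visualize, a number of common visualization concepts are consistently used for particular purposes (Figure 2).
5.1 Circos
Circos is a circular data visualization tool that displays relationships between entities arranged along the circumference of a circle (Figure 2A). It was originally developed for genomics and bioinformatics applications but has since been adopted in many fields to visualize complex relationships and patterns. In a Circos plot, data are represented by ribbons or arcs that connect points on the circle. The position of each point on the circle represents a specific entity or category, and the ribbons indicate the connections or relationships between them. Ribbon thickness or color can encode quantitative information, which makes Circos plots effective for illustrating genomic data such as genomic rearrangements, interactions between elements, or correlations in large datasets. Circos plots thus offer a distinctive and visually appealing way to represent intricate relationships and patterns in complex datasets. For example, NMPFamsDB [
5.2 UpSet plots
An UpSet plot is a data visualization used to represent the intersections and cardinalities of sets in a more detailed and informative way than a traditional Venn diagram (Figure 2B). UpSet plots are particularly useful when dealing with many sets or many intersections between sets. They are designed to address some limitations of Venn diagrams, such as the difficulty of scaling to a large number of sets and of conveying intersection sizes. The main features of an UpSet plot are: (i) Matrix display: instead of overlapping circles, an UpSet plot uses a matrix to represent set intersections. Each row of the matrix corresponds to a unique combination of sets, and the cells indicate whether each set is present in or absent from that combination. (ii) Set-size bars: the plot usually includes a bar chart or histogram showing the size of each individual set, giving a clear view of how elements are distributed across sets. (iii) Intersection-size bars: the plot also includes bars representing the size of each intersection, allowing quick comparison across intersections. (iv) Annotations: an UpSet plot may include additional annotations or labels that provide context or highlight specific features of the data. For example, FLAME [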
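The matrix and intersection-size bars of an UpSet plot are driven by a simple table: for every element, record the exact combination of sets it belongs to, then count elements per combination. The following is a minimal sketch of that computation under the assumption that each set is given as a Python `set`; the function name `upset_table` and the example biome sets are hypothetical.

```python
from collections import Counter


def upset_table(sets: dict[str, set]) -> list[tuple[tuple[str, ...], int]]:
    """Count elements per exclusive set combination.

    Each element is assigned to exactly one combination: the tuple of all
    set names that contain it. Combinations are returned largest first,
    which is the usual bar order in an UpSet plot.
    """
    names = sorted(sets)
    membership = Counter()
    for element in set().union(*sets.values()):
        combo = tuple(name for name in names if element in sets[name])
        membership[combo] += 1
    return sorted(membership.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical protein families detected in three biomes.
    families = {
        "soil": {"A", "B", "C", "D"},
        "marine": {"C", "D", "E"},
        "gut": {"D", "F"},
    }
    for combo, size in upset_table(families):
        print(" & ".join(combo), size)
```

Each row of the result corresponds to one column of the UpSet matrix (which sets are "on") together with the height of its intersection bar.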
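5.3 Venn diagrams
A Venn diagram is a graphical representation of the relationships (unions and intersections) between sets or groups of elements (Figure 2B). It consists of overlapping circles, each representing a set, with the overlaps representing the elements shared between sets. Its main purpose is to depict visually what different groups or categories have in common and where they differ. The key components of a Venn diagram are: (i) Circles or ellipses: each circle or ellipse represents a set or category, and the elements belonging to that set lie inside it. (ii) Overlaps: the overlapping region between circles represents elements common to the sets involved, and the size of the overlap reflects the extent of sharing. (iii) Non-overlapping regions: the non-overlapping part of each circle represents elements unique to that set. Venn diagrams are widely used in many fields, including metagenomics, to visualize relationships and overlaps between sets of elements such as taxonomic composition, functional gene annotations, compared conditions or environments, and community structure. For example, NMPFamsDB, a database of novel protein families derived from microbial metagenomes and metatranscriptomes, uses a Venn diagram in its graphical summary. The diagram illustrates the distribution and coverage of the novel protein families across the domains of life. It conveys effectively that many of the new protein families contain members from multiple taxonomic groups, highlighting interesting findings about the conservation and importance of these proteins.
5.4 Heatmaps
A heatmap is a graphical representation that uses color to visualize the intensity of a variable across a data matrix or grid (Figure 2C). It shows the values of a main variable as a grid of colored cells, with the two axis variables divided into ranges in the manner of a bar chart or histogram. The color of each cell indicates the value of the main variable within the corresponding ranges of the axis variables. In metagenomic analysis, heatmaps can display the abundance or presence of specific microbial taxa or functional genes across samples or conditions. Rows and columns typically correspond to individual taxa or genes and to samples, respectively, with color indicating the relative abundance or occurrence of each element. This type of visualization is valuable for identifying patterns, clustering related taxa or genes, and gaining insight into the composition and dynamics of microbial communities in metagenomic datasets. For example, in [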
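5.5 Bar charts
Bar charts represent data based on statistics and numbers. They use two axes to draw rectangular bars (Figure 2D). One axis represents the observations or categories, usually a fixed variable, while the other represents the numerical magnitude associated with each observation. Typical types include horizontal bar charts, vertical bar charts, double bar charts, multiple bar charts, and bar-line charts. In metagenomics, bar charts are a useful way to represent the abundance or distribution of different taxonomic groups (e.g., species, genera, phyla) in biological samples. Examples include (i) stacked bar charts, (ii) grouped bar charts, and (iii) relative-abundance bar charts. In a stacked bar chart, each bar is divided into segments, each representing a different taxonomic group, and the height of each segment corresponds to the abundance of that group in the sample. Grouped bar charts can be used to compare the abundance of different taxonomic groups across multiple samples: each group of bars represents a sample, and within each group the bars represent the abundances of the different taxonomic groups. A relative-abundance bar chart shows the relative abundance of each taxonomic group rather than absolute counts, which is useful for comparing the proportions of different taxa across samples. For example, in [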
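To make the relation between counts, relative abundance, and stacked segments concrete, here is a minimal sketch. It assumes per-sample read counts keyed by taxon; the function names and the example counts are illustrative only and are not taken from any tool discussed in this review.

```python
def relative_abundance(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Convert per-sample taxon counts into fractions that sum to 1 per sample."""
    result = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        result[sample] = {
            taxon: (count / total if total else 0.0) for taxon, count in taxa.items()
        }
    return result


def stacked_segments(fractions: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (taxon, bottom, height) for each segment of one stacked bar.

    Segments are ordered from most to least abundant; 'bottom' is the
    cumulative height of the segments already placed.
    """
    segments = []
    bottom = 0.0
    for taxon in sorted(fractions, key=fractions.get, reverse=True):
        segments.append((taxon, bottom, fractions[taxon]))
        bottom += fractions[taxon]
    return segments


if __name__ == "__main__":
    # Hypothetical phylum-level read counts for one sample.
    counts = {"S1": {"Firmicutes": 60, "Bacteroidota": 30, "Proteobacteria": 10}}
    fractions = relative_abundance(counts)
    for segment in stacked_segments(fractions["S1"]):
        print(segment)
```

The `(bottom, height)` pairs map directly onto the arguments most plotting libraries expect for stacked bars.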
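5.6 Networks
In general terms, a network visualization represents the connections and relationships between the elements of a system, where the elements are nodes and the connections between them are edges. By using a graph representation, network visualization offers a clear and intuitive way to understand the structure, dependencies, and interactions within complex networks (Figure 2D). Networks can be used to visualize data in many scientific fields. In biology, networks are commonly used to convey connectivity or other relationships between biological systems, samples, or entities [
5.7 Sunburst plots (Krona)
Also known as a ring chart or radial treemap, a sunburst plot is used to visualize hierarchical datasets (Figure 2F). It shows the hierarchy as a series of concentric rings, each corresponding to one level of the hierarchy. The segments within each ring are sized proportionally to represent the breakdown at that level. By focusing on a segment within a ring, one can see how it relates both to the whole hierarchy and to its parent segment. With its radial layout, the sunburst chart offers an immersive view of categorical datasets. Unlike the rectangular layout of a treemap, a sunburst chart fills the circular space and shows how each ring is subdivided into consecutive segments, effectively illustrating the hierarchical breakdown of the data. This intuitive representation of taxonomy has proved valuable in metagenomic analysis: the radial layout allows the relationships between taxonomic levels to be explored visually, giving insight into the composition and distribution of microbial communities. For example, [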
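The geometry described above (one ring per hierarchy level, segments proportional to abundance and nested inside their parent) can be sketched as follows. This is only an illustration under simple assumptions: lineages are given as tuples with an associated count, and the function name `sunburst_segments` is hypothetical, not the API of Krona or any other tool.

```python
from collections import defaultdict


def sunburst_segments(lineage_counts: dict[tuple[str, ...], float]):
    """Compute angular extents for every node of a taxonomy.

    Returns a list of (ring, lineage_prefix, start_deg, end_deg), where
    ring 1 is the innermost level. A child always lies within the angular
    span of its parent.
    """
    totals = defaultdict(float)
    for lineage, count in lineage_counts.items():
        for depth in range(1, len(lineage) + 1):
            totals[lineage[:depth]] += count
    grand_total = sum(lineage_counts.values())
    segments = []
    if grand_total == 0:
        return segments

    def place(prefix, start):
        children = sorted(
            p for p in totals
            if len(p) == len(prefix) + 1 and p[:len(prefix)] == prefix
        )
        for child in children:
            span = 360.0 * totals[child] / grand_total
            segments.append((len(child), child, start, start + span))
            place(child, start)  # children start where the parent starts
            start += span

    place((), 0.0)
    return segments


if __name__ == "__main__":
    # Hypothetical read counts per (domain, phylum) lineage.
    counts = {
        ("Bacteria", "Firmicutes"): 50,
        ("Bacteria", "Proteobacteria"): 25,
        ("Archaea", "Euryarchaeota"): 25,
    }
    for segment in sunburst_segments(counts):
        print(segment)
```

Drawing each tuple as an annular wedge at radius `ring` reproduces the concentric layout; interactivity such as zooming into a segment is left to the plotting layer.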
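5.8 Treemaps
A treemap is a visualization that represents hierarchical data as nested rectangles (Figure 2G). Each rectangle corresponds to a category or subcategory, and its size reflects the quantitative value of the data it represents. The hierarchy is depicted through nesting: the top-level rectangle represents the whole dataset and is subdivided into smaller rectangles at each successive level. Treemaps display hierarchies effectively and support intuitive exploration of complex datasets, which makes them particularly useful in areas such as information visualization, financial analysis, and project management. In metagenomic analysis, treemaps can serve as a visualization tool for the hierarchical structure of microbial taxonomic or functional data. For example, in [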
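The nesting of rectangles in a treemap can be produced with the classic slice-and-dice layout, sketched below. Published tools often use other layouts (for example squarified treemaps), so this is one possible technique rather than the method of any specific tool; the dictionary-based tree format and the function names are assumptions made for the example.

```python
def node_value(node: dict) -> float:
    """Total value of a node: its own value for leaves, the sum over children otherwise."""
    if "children" in node:
        return sum(node_value(child) for child in node["children"])
    return node["value"]


def slice_and_dice(node: dict, x: float, y: float, w: float, h: float, depth: int = 0):
    """Lay out a hierarchy as nested rectangles.

    Returns a list of (name, depth, x, y, width, height). The split
    direction alternates with depth: along x at even depths, along y at
    odd depths. Each child's area is proportional to its value.
    """
    rects = [(node["name"], depth, x, y, w, h)]
    children = node.get("children", [])
    total = sum(node_value(child) for child in children)
    offset = 0.0
    for child in children:
        frac = node_value(child) / total if total else 0.0
        if depth % 2 == 0:
            rects += slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1)
        else:
            rects += slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1)
        offset += frac
    return rects


if __name__ == "__main__":
    # Hypothetical taxonomy with read counts at the leaves.
    tree = {
        "name": "root",
        "children": [
            {"name": "Bacteria", "children": [
                {"name": "Firmicutes", "value": 50},
                {"name": "Proteobacteria", "value": 25},
            ]},
            {"name": "Archaea", "value": 25},
        ],
    }
    for rect in slice_and_dice(tree, 0, 0, 100, 100):
        print(rect)
```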
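5.9 Phylogenetic trees
Phylogenetic trees are a specific type of tree diagram (dendrogram) that can be used to represent taxonomic relationships (Figure 2H). Built from metagenomic data, they illustrate the evolutionary relationships among microorganisms by depicting branching patterns based on genetic similarity, giving insight into the biodiversity and evolutionary history of the whole microbial community in a given ecosystem. For example, [
5.10 Sankey diagrams
A Sankey diagram, also known as a Sankey chart or flow diagram, is a visual representation that illustrates the flow of resources or information between multiple entities [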
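5.11 Bubble charts
A bubble chart is a visual representation that displays three-dimensional data on a two-dimensional plane using circles of different sizes (Figure 2J). Each circle, or "bubble", represents a data point and is positioned according to its values along two axes. Position conveys the relationship between two variables, while bubble size indicates the magnitude of a third. In biology, bubble charts can represent multivariate data, for example when comparing species abundance across habitats: the position of each bubble might indicate environmental parameters, while its size represents the population size of a particular species. This approach is useful for identifying patterns, correlations, and potential ecological trends across datasets. For example, [
5.12 Hive plots
The basic idea behind a hive plot is to visualize the relationships or connections between multiple variables or categories in a structured and intuitive way (Figure 2K). Hive plots are commonly used to represent complex networks or datasets with multiple dimensions [
5.13 Dimensionality reduction methods
Dimensionality reduction methods [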
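Other well-known dimensionality reduction methods include Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbor Embedding (t-SNE), and Latent Dirichlet Allocation (LDA). UMAP is a nonlinear dimensionality reduction method that preserves both global and local structure in the data, which makes it effective for visualizing complex datasets. UMAP is frequently applied in metagenomics, and the integration of such nonlinear machine learning methods is expected to significantly enhance our understanding of metagenomes. t-SNE is another popular nonlinear method that focuses on preserving local relationships between data points and is commonly used to visualize high-dimensional data in two or three dimensions. LDA is a probabilistic generative model commonly used for topic modeling in natural language processing. It reduces dimensionality by representing documents as distributions over topics, allowing latent themes in large text corpora to be explored. Overall, these dimensionality reduction methods provide powerful tools for visualizing and exploring complex datasets across different domains (e.g., for scRNA-seq, see the SCALA application [
5.14 Rarefaction curves
Rarefaction is a method for adjusting for differences in metagenomic clone library size between samples in order to aid the comparison of alpha diversity. It involves choosing a number of reads equal to or smaller than the number in the smallest sample, then randomly discarding reads from the larger samples until the number remaining reaches that threshold. Diversity metrics computed on these equally sized subsamples can be used to contrast ecosystems independently of differences in sample size. The computed rarefaction is displayed as a line chart. A rarefaction curve reflects not only sample coverage but also whether the sampling depth is sufficient to estimate diversity: a curve that plateaus indicates sufficient sampling depth, whereas a curve that is still rising indicates insufficient depth. Rarefaction curves are commonly used in ecology and biodiversity research to assess whether the sampling effort adequately captures the diversity of a biological community [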
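The subsampling procedure behind a rarefaction curve can be sketched in a few lines. The example below is a simplified illustration, assuming a sample is given as taxon counts and that observed richness (number of distinct taxa) is the diversity metric; the function name `rarefaction_curve` and its parameters are hypothetical. Averaging over shuffled read orders is used here as a simple stand-in for the analytical expectation computed by dedicated ecology packages.

```python
import random


def rarefaction_curve(counts: dict[str, int], depths: list[int],
                      n_permutations: int = 50, seed: int = 0) -> list[float]:
    """Mean number of distinct taxa observed at each subsampling depth.

    For every permutation the reads are shuffled once and the first
    'depth' reads are taken as the subsample, so subsamples at increasing
    depths are nested and the resulting curve never decreases.
    Depths larger than the total read count are capped at the total.
    """
    rng = random.Random(seed)
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    totals = [0] * len(depths)
    for _ in range(n_permutations):
        rng.shuffle(reads)
        for i, depth in enumerate(depths):
            totals[i] += len(set(reads[:depth]))
    return [total / n_permutations for total in totals]


if __name__ == "__main__":
    # Hypothetical sample: 100 reads assigned to four taxa.
    sample = {"A": 50, "B": 30, "C": 15, "D": 5}
    depths = [1, 10, 25, 50, 100]
    for depth, richness in zip(depths, rarefaction_curve(sample, depths)):
        print(depth, round(richness, 2))
```

Plotting richness against depth gives the curve; a plateau before the full depth suggests that additional sequencing would reveal few new taxa.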
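5.15 Gene maps
Often called a genetic map or genome map, a gene map is a visual representation of the arrangement and location of genes on a specific chromosome or across an entire genome. Like Circos, it provides a graphical overview of genetic structure, indicating the relative positions of genes, markers, and other genetic features. Gene maps are an important tool in genomics and metagenomics research, helping to clarify gene linkage, genetic distance, and the organization of genetic material. High-resolution gene maps are especially important for studies involving gene identification, marker-assisted breeding, and the genetic basis of traits or diseases. Technological advances such as next-generation sequencing have markedly improved the accuracy and precision of gene maps, contributing to our understanding of the genetic landscape of many organisms, including humans. For example, in [
5.16 Tree diagrams
A tree diagram is a graphical representation that depicts a hierarchy or the relationships between different elements or components. It is called a "tree" because it usually resembles an inverted tree, with a single root or starting point that splits into branches and sub-branches. The structure consists of nodes connected by edges, where each node represents a specific entity or concept and each edge represents a relationship or connection between two nodes. Tree diagrams are commonly used in fields such as computer science, linguistics, probability theory, and organizational charts to organize and illustrate hierarchies visually.
5.17 Space-filling maps
Space-filling curves such as the Hilbert curve are intricate geometric patterns that traverse and cover a two-dimensional space in a continuous, non-overlapping manner. The Hilbert curve (like other curves of this class) is a continuous fractal structure formed by recursively subdividing a square into four smaller sub-squares and connecting their centers in a specific order. The curve systematically visits every point within the designated area while preserving locality: points that are close along the original one-dimensional curve remain close in their arrangement on the plane. Historically, Hilbert curves have been used to generate genome maps of large scaffolds (e.g., human chromosomes) and whole-genome alignments of bacterial genomes [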
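The mapping from a position along the Hilbert curve to a cell in the square grid can be computed iteratively. The sketch below uses the widely known bit-manipulation formulation of this mapping; the function names `hilbert_d2xy` and `hilbert_layout` are chosen for this example and are not taken from any of the cited tools.

```python
def hilbert_d2xy(order: int, d: int) -> tuple[int, int]:
    """Map distance d along a Hilbert curve to (x, y) on a 2^order x 2^order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect the sub-square before rotating it.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_layout(values: list, order: int) -> list[list]:
    """Place a linear sequence of values onto the grid in Hilbert order.

    Returns grid[y][x]; cells beyond the end of 'values' stay None.
    """
    n = 1 << order
    grid = [[None] * n for _ in range(n)]
    for d, value in enumerate(values[: n * n]):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    # Hypothetical per-window signal along a scaffold (16 windows).
    signal = list(range(16))
    for row in hilbert_layout(signal, order=2):
        print(row)
```

Because consecutive positions always land in adjacent cells, neighboring windows along a scaffold stay neighbors in the image, which is the locality property described above.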
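In metagenomic analysis, navigating complex datasets and understanding the intricate relationships between microbial communities are major challenges, and the visualization concepts presented above can help address them. Table 2 focuses on the main challenges encountered in metagenomic visualization, from representing complex relationships to handling large datasets and understanding taxonomic hierarchies. Each visualization concept listed offers features tailored to a specific metagenomic challenge, giving researchers valuable tools for exploring, analyzing, and interpreting complex biological data.
Table 2 Visualization concepts organized by their relevance to metagenomic visualization challenges.
Visualization challenge | Visualization concept |
---|---|
Representing complex relationships | Circos, Networks |
Handling large sets or intersections | UpSet plots, Venn diagrams |
Visualizing abundance across samples | Heatmaps, Bar charts |
Displaying hierarchical data structures | Treemaps, Trees, Sunburst plots (Krona) |
Understanding taxonomic relationships | Trees, Sunburst plots (Krona) |
Illustrating flows or transitions | Sankey diagrams, Networks, Hive plots |
Visualizing multidimensional data | Hive plots, 3D networks, Dimensionality reduction methods |
Normalizing diversity measures | Rarefaction curves |
Visualizing genetic arrangement | Gene maps, Genome viewers |
Higher-resolution linear representation | Space-filling maps/curves |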
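6. Main applications of metagenomic visualization tools
In this section, we present a range of visualization tools organized by their primary function. Although our compilation may not be exhaustive, we focus on well-established tools to illustrate the range of options available for visualizing metagenomic data in the evolving field of data visualization. The tools are grouped into main categories, including quality control, binning, assembly, genome content viewers, taxonomy, communities, and networks (Table 3).
Table 3 Representative tools organized by their primary function.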
Tool | Primary function | Input data type | License type | Implementation | Last update |
---|---|---|---|---|---|
FastQC | Quality Control | Raw sequence data (before any alignment or assembly steps) | Open source | Stand-alone | 2023 |
LongQC [103] | Quality Control | Raw Long-Read Sequencing Data (before any alignment or assembly steps - PacBio Sequencing, Oxford Nanopore Sequencing) | Open source | Stand-alone | 2023 |
MinIONQC [104] | Quality Control | Raw sequence data (before any alignment or assembly steps - FASTQ, FAST5 format) | Open source | Stand-alone | 2020 |
NanoPack [105] | Quality Control | Raw sequence data (before any alignment or assembly steps - FASTQ, FAST5 format) | Open source | Tool suite | 2023 |
SOAPnuke [106] | Quality Control | Raw sequence data (before any alignment or assembly steps - FASTQ format) | Open source | Stand-alone | 2024 |
SequelTools [107] | Quality Control | Raw Long-Read Sequencing Data (before any alignment or assembly steps - PacBio Sequencing, Oxford Nanopore Sequencing) | Open source | Stand-alone | 2020 |
ABySS-Explorer [108] | Assembly | ABySS Assemblies (scaffolds or contigs in FASTA format), Raw sequence data | Open source | Stand-alone | 2018 |
Assembly Graph Browser (AGB) [109] | Assembly | Assembly Graph Files (GFA (Graphical Fragment Assembly)) | Open source | Stand-alone | 2019 |
GfaViz [110] | Assembly | Assembly Graph Files (GFA (Graphical Fragment Assembly)) | Open source | Stand-alone | 2019 |
SGTK [111] | Assembly | Assembly Graph Files (GFA (Graphical Fragment Assembly)) | Open source | Toolkit | Archived in 2023 |
PanGraphViewer [112] | Assembly/Pangenome | Pangenome graphs (rGFA, GFA_v1, VCF), Annotation Files (BED, GTF / GFF) | Open source | Stand-alone | 2022 |
MetagenomeScope | Assembly | GFA, FASTG, GML, LastGraph | Open source | Web-based tool | 2020 |
BinaRena [113] | Binning | (Human) Assembled Data (FASTA) | BSD 3-Clause License | Web application | 2023 |
CONCOCT [114] | Binning | Metagenomic Sequencing data, Contig Sequence | Open source | Stand-alone | 2019 |
MetaWRAP [115] | Binning | Metagenomic sequencing data (FASTQ format), Assembled contigs (FASTA) | Open source | Pipeline | 2020 |
VizBin [116] | Binning | Metagenomic Fragments (Contigs / reads) (FASTA) | BSD License (4-clause) | Stand-alone | 2019 |
Anvi'o [117] | Contig & Genome Viewer / Communities / Taxonomy | DNA sequence (FASTA), Contigs (FASTA), Short reads (FASTA), External / Internal genome database | Open source | Stand-alone | 2023 |
CGViewer.js [118] | Contig & Genome Viewer | JSON files | Open source | Web-based tool | 2019 |
CRAMER [119] | Contig, Genome & MSA Viewer | Metagenomic sequence data (Raw DNA sequence / FASTA files) | Open source | Stand-alone | 2019 |
Elviz [120] | Contig & Genome Viewer | Metagenomic sequence data (Raw DNA sequence / FASTA files) | Open source | Web-based application | 2024 |
GDV [121] | Contig, Genome & MSA Viewer | RNA-seq data, ChIP-seq data, Genome Sequence Data, Proteomic Data & Epigenomic Data | Open source | Web-based application | 2021 |
Gosling [122] | Contig, Genome & MSA Viewer | Metagenomic sequence data (Raw DNA sequence / FASTA files) | Open source | Toolkit | 2021 |
IMG/M [23], IMG/VR [30] | Contig and Genome Viewer | Visualization of IMG/M and IMG/VR contig annotations | Open source | Web-based platforms | 2023 |
IGV [123] | Genome Viewer | Metagenome sequence data (FASTA), Alignment Data, Variant Calls, Gene Annotations (GFF) | Open source | Stand-alone | 2023 |
JBrowse [124] | Genome Viewer | Metagenome sequence data (FASTA), Alignment Data, Variant Calls, Gene Annotations (GFF) | Open source | Stand-alone | 2024 |
MetaErg [47] | Contig Viewer | Metagenomic Contig, Gene Prediction File, Taxonomic Information File | Open source | Stand-alone pipeline | 2020 |
Tablet [125] | Genome Viewer | SAM (Sequence Alignment/Map) and BAM (Binary Alignment/Map), Variant Call Format (VCF), Metagenome Sequence, Genome Assembly Files, Sequence Read Files | BSD-2-Clause license | Stand-alone | 2021 |
UCSC Genome Browser [126] | Genome & MSA Viewer | Genome Sequence Data, Annotation Data (GFF), ChIP-Seq Data, RNA-seq Data, Multiple Sequence Alignments (MSA) | Open source | Online portal | 2022 |
ENSEMBL [127] | Genome Viewer | Genome Sequence Data, Annotation Data (GFF), ChIP-Seq Data, RNA-seq Data, Multiple Sequence Alignments (MSA) | Open source | Suite of tools | 2024 |
Artemis [128] | Genome Viewer | Genome Sequence Data, Annotation Data (GenBank, EMBL format) | Open source | Stand-alone | 2011 |
UGENE [129] | Genome Viewer | Genome Sequence Data (FASTA, GFF, SAM/BAM, BED), Annotation Data (GenBank, EMBL format, BED, GFF), Multiple Sequence Alignments (MAF), Expression Data Files | Open source | Stand-alone | 2023 |
Geneious [130] | Genome Viewer | Genome Sequence Data (FASTA, GFF, SAM/BAM, BED), Annotation Data (GenBank, EMBL format, BED, GFF), Multiple Sequence Alignments (MAF), Expression Data Files | Free trial - Requires subscription | Part of a software suite | 2023 |
BV-BRC [131] | MSA Viewer | Multiple Sequence Alignments (MSA) | Portal | Web-based resource | 2022 |
MSAViewer [132] | MSA Viewer | Multiple Sequence Alignments (MSA) | Open source | Web-based application | 2023 |
Strudel [133] | MSA Viewer | Metadata (CSV, TSV), Aligned Sequence Data, Phylogenetic Tree Data, Annotation Data (GFF) | Open source | Stand-alone | 2015 |
SuiteMSA [134] | MSA Viewer | Multiple Sequence Alignments (MSA) | Open source | Stand-alone | 2013 |
JalView [135] | MSA Viewer | Multiple Sequence Alignments (e.g., FASTA, Clustal, Stockholm) | Open source | Stand-alone | 2023 |
MSABrowser [136] | MSA Viewer | Multiple Sequence Alignments (MSA) | Open source | Stand-alone web-based application | 2021 |
Seaview [137] | MSA Viewer | Multiple Sequence Alignments (e.g., FASTA, Clustal, Stockholm, PHYLIP) | Open source | Stand-alone or helper application | 2024 |
Panache [138] | Pangenome Viewer | Graphical Fragment Assembly (GFA) | Open source | Web-based interface | 2022 |
Pan-Tetris [139] | Pangenome Viewer | Pangenome map files (e.g., PanGee), meta-information (TIGRFAM) | Open source | Software tool | 2015 |
PanViz [140] | Pangenome Viewer | Pangenome Matrix (pattern of each gene group) and functional annotation files (GeneOntology) | Open source | Pipeline | 2017 |
PanX [141] | Pangenome Viewer | Set of annotated bacterial strains (NCBI RefSeq, user input in GenBank format) | Open source | Pipeline | 2018 |
Pantools [142] | Pangenome & Panproteome Viewer | Annotation Files (GTF / GFF), Multiple Sequence Alignment File (FASTA), Genomic Sequence Files (FASTA), Variations adding (VCF files and a PAV table) | Open source | Stand-alone | 2024 |
Bifrost [143] | Pangenome Viewer | Annotation Files (GTF / GFF), Multiple Sequence Alignment File (FASTA), Genomic Sequence Files (FASTA) | Open source | Stand-alone | 2024 |
PanGenome Graph Builder [144] | Pangenome Viewer | Annotation Files (GTF / GFF), Multiple Sequence Alignment File (FASTA), Genomic Sequence Files (FASTA) | Open source | Stand-alone | 2024 |
TwoPaCo [145] | Pangenome Viewer | Annotation Files (GTF / GFF), Multiple Sequence Alignment File (FASTA), Genomic Sequence Files (FASTA) | Open source | Stand-alone | 2022 |
Minigraph-Cactus [146] | Pangenome Viewer | Annotation Files (GTF / GFF), Multiple Sequence Alignment File (FASTA), Genomic Sequence Files (FASTA) | Open source | Pipeline | 2024 |
Jasper/Microbiome Maps [147] | Abundance analysis / Taxonomy / Ecosystem visualization | Abundance profiles / OTU table | Not open source | Stand-alone | 2023 |
QIIME / QIIME 2 [148] | Communities / Taxonomy | Raw DNA sequence reads | Open source | Analysis package | 2024 |
Phyloseq [149] | Communities / Taxonomy | OTU table (operational taxonomic units), phylogenetic tree | Open source | R package | 2013 |
MicrobiomeAnalyst [150] | Communities / Taxonomy / PCA visualization | OTU table (operational taxonomic units), taxon list, gene list, Gene abundance table, BIOM file | Open source | Web-based platform | 2024 |
MetagenomeSeq [151] | Communities / Taxonomy / PCA visualization | Taxonomic or Functional Annotations, Count Data Table | Open source | R package | 2019 |
MEGA [152] | Taxonomy | Metagenome sequence data (FASTA), Phylogenetic Data (NEXUS, NEWICK) | Open source | Can be used as stand-alone and as part of a pipeline | 2022 |
PAUP [153] | Taxonomy | Metagenome sequence data (FASTA), Phylogenetic Data (NEXUS, NEWICK) | Proprietary, and thus commercial | Stand-alone | 2007 |
FigTree | Taxonomy | Phylogenetic Data (NEXUS, NEWICK) | Open source | Stand-alone | 2018 |
iTOL [154, 155] | Taxonomy | Phylogenetic Data (NEXUS, NEWICK) | Open source | Web-based platform | 2023 |
PhyD3 [156] | Taxonomy | Phylogenetic Data (NEXUS, NEWICK) | Open source | Web-based tool | 2017 |
Dendroscope [157] | Taxonomy (viewer) | Phylogenetic Data (NEXUS, NEWICK) | Open source | Stand-alone | 2023 |
Cytoscape [158, 159] | Network visualization | Graphs - Lists (source - destination) | Open source | Stand-alone | 2023 |
Gephi [160] | Network visualization | Graphs - Lists (source - destination) | Open source | Stand-alone | 2023 |
Pajek [161] | Network visualization (large networks) | Own file format | Open source | Stand-alone | 2023 |
Arena3Dweb [162, 163] | Network visualization (3D multilayered networks) | Network lists (source - destination, with layers defined) | Open source | Web server and stand-alone | 2023 |
NORMA [164, 165] | Network and group visualization | Network lists (source - destination) and annotation files (nodes and the annotation group they belong to) | Open source | Web server and stand-alone | 2022 |
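6.1 Quality control
In metagenomic analysis, it is common practice to generate scaffolds or metagenome-assembled genomes (MAGs) from raw sequence data. A critical initial stage in this process is quality control (QC) of the raw data. This includes assessing read and base quality, trimming adapters, analyzing GC distribution, removing contaminated reads, addressing enrichment biases, generating quality metrics, and various other steps. Many tools have been created for this purpose that produce visual representations of these statistics, such as FastQC, LongQC [
6.2 Assembly
Genome assembly is a complex process of piecing DNA sequences together: reads are built into extended DNA sequences (contigs) in an attempt to reconstruct the organism's complete genome. An organism's genome is its entire DNA content, including genes and non-coding regions. If a reference genome is available, reads are aligned to it; in the absence of a reference, de novo assembly is used. De novo assembly is especially important for studying non-model organisms, genomes with significant structural variation, or populations with diverse genomes.
Assembly visualization refers to the graphical representation of the results of the genome assembly process, helping researchers understand the structure and characteristics of the assembled genome. Visualizing genome assemblies is essential for quality assessment, for identifying potential problems, and for gaining insight into overall genome architecture. A large number of tools are available for de novo metagenome assembly [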
6.3 Binning
Binning is a critical step in metagenomic analysis that groups genomic fragments (contigs) in order to reconstruct draft microbial genomes (MAGs) [
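6.4 Community detection
Metagenomic analysis unfolds in several key steps, each contributing to a comprehensive understanding of microbial communities. Clustering is a fundamental technique in bioinformatics and metagenomic analysis that reveals underlying patterns and relationships in complex datasets. Hierarchical clustering stands out as an important non-graph-based approach. It organizes sequences into a hierarchy of clusters, usually visualized as a dendrogram, which provides an insightful representation of the relationships between microbial entities. The two main strategies are agglomerative methods, in which individual clusters are progressively merged, and divisive methods, in which a single cluster is iteratively split. Widely used agglomerative hierarchical clustering algorithms include single-linkage, complete-linkage, centroid-linkage, and average-linkage methods, as well as neighbor joining [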
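As an illustration of the agglomerative strategy, the sketch below implements naive single-linkage clustering on a precomputed distance matrix. It is deliberately simple (cubic time or worse) and is meant only to show the merge logic; the function name `single_linkage` and the example distances are hypothetical. The recorded merge history is the information a dendrogram displays.

```python
def single_linkage(dist: list[list[float]], n_clusters: int):
    """Agglomerative clustering with single linkage.

    dist is a symmetric matrix of pairwise distances. Starting from one
    cluster per item, the two clusters with the smallest minimum
    inter-item distance are merged until n_clusters remain.
    Returns (clusters, merges), where merges records each merge as
    (members_a, members_b, distance) in the order performed.
    """
    clusters = [{i} for i in range(len(dist))]
    merges = []
    while len(clusters) > max(n_clusters, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters, merges


if __name__ == "__main__":
    # Hypothetical pairwise distances between four sequences.
    distances = [
        [0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.2],
        [0.9, 0.9, 0.2, 0.0],
    ]
    clusters, merges = single_linkage(distances, n_clusters=2)
    print(clusters)
    print(merges)
```

Replacing `min` with `max` in the inter-cluster distance would give complete linkage, and using the mean would give average linkage.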
Another approach is to apply graph-based clustering [
Several tools facilitate clustering and visualization in metagenomic analysis, such as QIIME 2 [
Principal component analysis (PCA) [
The current version of QIIME 2 [
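6.5 Genome/contig viewers
Genome viewers are tools for visualizing and analyzing genomic data. They provide researchers, scientists, and bioinformaticians with graphical representations of genetic information, enabling them to explore, interpret, and understand the complexity of genomes [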
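Pangenome viewers are tools or software applications designed to visualize and analyze pangenome data. By providing interactive and informative visual representations of the pangenome, they help researchers explore the genetic diversity within a species or a group of related organisms [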
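Contig visualization tools are used to represent and analyze contiguous sequences of DNA or other biomolecules assembled from short DNA sequencing reads. Visualizing contigs is essential for assessing the quality of genome or transcriptome assemblies, identifying structural variation, and gaining insight into the organization of genomic regions. Established tools include Bandage [
Finally, multiple sequence alignment (MSA) is essential for comparing and understanding the similarities and differences between homologous sequences. MSA visualization tools, such as AlignmentViewer, BV-BRC [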
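6.6 Taxonomy
Taxonomy aims to classify organisms according to shared characteristics and evolutionary relationships. Taxonomic systems are presented as hierarchical frameworks ranging from broader to more specific categories. The Genome Taxonomy Database (GTDB; https://gtdb.ecogenomic.org) provides a state-of-the-art genome-based taxonomy for prokaryotes that is phylogenetically consistent and rank-normalized [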
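6.7 Networks and associations
Networks in metagenomics provide valuable insight into the complex interactions among microorganisms within a community. Taxonomic networks, for example, use taxonomic classification to help understand the relationships between microbial taxa. Nodes in these networks represent taxonomic units, while edges indicate the degree of similarity or co-occurrence. Functional networks allow the relationships between microbial genes or pathways to be explored on the basis of functional annotations. Co-occurrence networks illustrate patterns of coexistence among microbial species or functional genes, revealing potential symbiotic or antagonistic relationships. Ecological networks are used to analyze community dynamics, identify keystone species, assess network stability, and measure the influence of environmental factors on microbial interactions. Phylogenetic networks show evolutionary relationships among microbial species, revealing patterns and helping to identify closely related taxa with shared functions. Host-microbe networks represent the complex interactions and relationships between a host organism and the microbial communities inhabiting different body sites. Humans, for example, harbor vast numbers of microorganisms, including bacteria, viruses, fungi, and other microbes. In addition, disease-association networks play a role in investigating correlations between microbial communities and host health. Like the previous category, these networks cover both host-microbe and microbe-microbe interactions and offer relevant insights into the role of the microbiome in health and disease. Finally, microbiome epidemiology networks represent the interconnections among microorganisms within a population, focusing on the epidemiological dimension of microbial communities. This form of network analysis examines the spread, transmission, and factors that influence the prevalence of microorganisms in a population.
Network visualizations, alone or in combination, can therefore help in drawing conclusions. For example, examining temporal and spatial dynamics can show how microbial networks evolve over time or across locations, providing insight into temporal and spatial variation within microbial communities. Networks also enable function prediction: network-based approaches predict the function of a gene from the functions of its neighbors in the network, which is particularly useful when functional annotation is incomplete.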
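As a concrete example of how a co-occurrence network of the kind described above might be assembled, the sketch below links two taxa whenever the Jaccard similarity of the sample sets in which they were detected exceeds a threshold. The choice of Jaccard similarity, the threshold, and the function name `cooccurrence_edges` are assumptions made for illustration; published studies often use correlation-based or compositionally aware measures instead.

```python
from itertools import combinations


def cooccurrence_edges(presence: dict[str, set[str]], min_jaccard: float = 0.5):
    """Build a weighted edge list from presence/absence data.

    presence maps each taxon to the set of sample IDs where it was
    detected. An edge (taxon_a, taxon_b, weight) is emitted when the
    Jaccard similarity of the two sample sets is at least min_jaccard.
    """
    edges = []
    for a, b in combinations(sorted(presence), 2):
        union = presence[a] | presence[b]
        if not union:
            continue
        jaccard = len(presence[a] & presence[b]) / len(union)
        if jaccard >= min_jaccard:
            edges.append((a, b, jaccard))
    return edges


if __name__ == "__main__":
    # Hypothetical detections of three taxa across five samples.
    presence = {
        "TaxonA": {"s1", "s2", "s3"},
        "TaxonB": {"s1", "s2", "s3", "s4"},
        "TaxonC": {"s4", "s5"},
    }
    for edge in cooccurrence_edges(presence):
        print(edge)
```

The resulting source/destination list is the kind of input that general-purpose network viewers accept, so it can be loaded directly for layout and styling.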
All of the relationships above can be captured and viewed in the form of networks [
图 5a-c 中展示了几个涉及使用网络可视化来描述宏基因组数据集的示例,这些示例是使用从 NMPFamsDB 数据库获取的数据创建的。图 5a 展示了使用 Gephi 渲染的所有可用新型宏基因组蛋白家族 (NMPF) 在八种主要生物群落类型(淡水、海洋、土壤、植物、人类、哺乳动物、其他宿主相关和工程环境)中分布的网络可视化。生物群落由中央彩色节点(集线器)表示,而灰色外围节点表示 NMPF,边缘表示 NMPF-生物群落关联。通过这种表示,可以可视化出现在多个生物群落中的 NMPF,以及仅限于特定生物群落的 NMPF。图 5b 显示了使用 Arena3D web 创建的三维 (3D) 多层网络,其中包含与四个主要人类微生物组系统(皮肤、呼吸、消化和生殖系统)连接的所有 NMPF。此外,每个 NMPF 都注释有其源微生物组样本(宏基因组或元转录组)的性质,以及是否具有预测的蛋白质结构模型。该信息被组织为多个层。蛋白质家族本身在中心层中描绘,节点对应于 NMPF,层内边缘描绘了同一宏基因组样本中 NMPF 的共存。层间边将每个 NMPF 与其相应的注释连接起来,包括与特定生物群落的关联、源数据集的性质以及 3D 蛋白质模型的可用性。最后,图 5c 显示了来自 NMPFamsDB 的新型宏基因组蛋白家族 (F006270) 的基因邻域的网络表示,使用 NORMA 渲染。 该家族的邻近区域由与已知 Pfam 结构域(例如 p450)命中的蛋白质和/或与 COG 功能(例如“防御机制”或“代谢物生物合成”)相关的蛋白质组成。通过这些关联,可以推断蛋白质家族中未注释基因的潜在功能。总的来说,这些例子展示了网络在宏基因组数据和元数据的可视化、分析和注释方面提供先进方法的能力。
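6.8 Gene neighborhood and synteny conservation analysis
In prokaryotic genomes, functionally related genes tend to be grouped together, sharing common regulatory mechanisms and forming conserved gene neighborhoods. These neighborhoods are usually studied through synteny conservation analysis, in which multiple genomes or, in the case of metagenomics, multiple metagenomic scaffolds are compared to determine whether a common positional pattern exists around one or more genes of interest. Genome synteny refers to the conservation of the relative order of genes and other genomic elements on the chromosomes of different species. It is commonly used to study evolutionary relationships between species and to identify orthologs, that is, genes in different species that evolved from a common ancestral gene. Identifying a common gene context across different scaffolds can be used to functionally annotate previously unknown metagenomic sequences (e.g., in NMPFamsDB) [Rodríguez Del Río et al.; Karatzas et al.].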
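The comparison of gene context across scaffolds can be sketched as follows. The example assumes each scaffold is given as an ordered list of gene labels and scores how many adjacent gene pairs around an anchor gene are shared between two scaffolds. The labels, the `flank` window, and the scoring scheme are illustrative assumptions, not the method of any tool cited in this section.

```python
def neighborhood(scaffold, anchor, flank=2):
    """Return the genes within `flank` positions of `anchor` (first occurrence),
    or None when the anchor is absent from the scaffold."""
    if anchor not in scaffold:
        return None
    i = scaffold.index(anchor)
    return scaffold[max(0, i - flank): i + flank + 1]


def adjacent_pairs(genes):
    """Unordered adjacent gene pairs, so a reversed scaffold still matches."""
    return {frozenset(pair) for pair in zip(genes, genes[1:])}


def synteny_score(scaffold_a, scaffold_b, anchor, flank=2):
    """Fraction of adjacent pairs around `anchor` in A that also occur in B."""
    na = neighborhood(scaffold_a, anchor, flank)
    nb = neighborhood(scaffold_b, anchor, flank)
    if na is None or nb is None:
        return 0.0
    pa, pb = adjacent_pairs(na), adjacent_pairs(nb)
    return len(pa & pb) / len(pa) if pa else 0.0


# Hypothetical scaffolds; "unk1" is the unannotated gene of interest.
s1 = ["p450", "abcT", "unk1", "tetR", "mfs"]
s2 = list(reversed(s1))                       # same context, opposite strand
s3 = ["p450", "unk1", "tetR", "mfs", "abcT"]  # partially rearranged context
s4 = ["p450", "abcT", "tetR"]                 # anchor gene missing

for other in (s2, s3, s4):
    print(synteny_score(s1, other, "unk1"))
```

A high score across many scaffolds suggests a conserved neighborhood, in which case the annotations of the flanking genes become candidates for the function of the anchor gene.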
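6.9 Biome distribution/ecosystem/geographical distribution
Biome distribution, ecosystems, and geographical distribution are interrelated concepts that play a crucial role in understanding the diversity of life on Earth and the complex relationships between organisms and their environment. A biome is a broad geographical region characterized by a distinctive climate, vegetation, and animal life. The distribution of biomes across the planet is shaped by factors such as temperature, precipitation, and sunlight. Examples include tropical rainforests, deserts, tundra, and grasslands. Each biome has distinct ecological characteristics, and the global distribution of biomes contributes substantially to the planet's biodiversity. Ecosystems are smaller, localized communities of organisms that interact with their physical environment. Ranging from freshwater ponds to coral reefs, forests, and grasslands, their distribution is influenced by climate, topography, soil composition, and other environmental factors. Geographical distribution refers to the spatial arrangement of organisms on Earth, including patterns of occurrence and abundance across regions. Factors such as climate, landforms, and human activity influence the geographical distribution of life forms. Understanding geographical distribution is essential for studying biodiversity, ecological patterns, and the impact of environmental change on species.
Biome distribution, ecosystems, and geographical distribution are intricately linked through complex ecological dynamics. The characteristics of a biome determine the types of ecosystems it hosts, and the geographical distribution of a species is usually tied to the specific biomes and ecosystems it inhabits. Environmental changes, whether natural or anthropogenic, can profoundly affect these interconnections, altering the distribution of biomes and ecosystems over time.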
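Visualizing biome distribution, ecosystems, and geographical distribution helps reveal the complexity of Earth's biodiversity and ecological dynamics. With advanced visualization techniques, researchers can map the global distribution of biomes and highlight the distinctive climate, vegetation, and animal life of different geographical regions (see the COVID-19 example in Fig. 5f-h). Such visualizations not only provide a comprehensive view of the relationships among biomes, ecosystems, and geographical features, but also serve as powerful tools for communicating complex ecological concepts to a broader audience and for fostering environmental awareness and stewardship. Although custom biome visualizations can be built with the methods outlined in Section 4 (Visualization concepts), various pre-built viewers are also available within metagenomic resources. Databases such as IMG/M, MGnify, and SPIRE use the GOLD ecosystem classification (Fig. 5e) and provide a visualization of geolocation data for each submitted dataset. GOLD also offers a dedicated browser for exploring the geographical distribution of biomes based on microbiome metadata. NMPFamsDB provides visualizations of the ecosystem and geographical distribution of each NMPF. In addition, the database offers dedicated tools for generating custom plots (bar charts, Venn diagrams, Circos plots, color-coded matrices, and UpSet plots) that measure the ecosystem and phylogenetic distribution of user-selected NMPFs, as well as the geographical distribution of each NMPF. Finally, the Microbiome Maps resource uses Jasper [Valdes et al.].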
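The reference list describes Microbiome Maps as Hilbert curve visualizations of metagenomic profiles. As a rough illustration of that layout idea only (this is not Jasper's implementation, which is not described here), the sketch below places an ordered list of abundance values on a square grid using the standard Hilbert-curve index-to-coordinate conversion. The function names, the grid size, and the abundance values are assumptions made for the example.

```python
def hilbert_d2xy(order, d):
    """Convert index d along a Hilbert curve into (x, y) on a 2**order grid."""
    side = 1 << order
    if not 0 <= d < side * side:
        raise ValueError("index outside the curve")
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant so consecutive cells stay adjacent.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout(abundances, order):
    """Place values, in their given order, along the curve on a square grid."""
    side = 1 << order
    grid = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value
    return grid


# Hypothetical relative abundances for 16 taxa in a fixed order.
abundances = [i / 120 for i in range(16)]
grid = layout(abundances, order=2)
for row in grid:
    print(["%.3f" % v for v in row])
```

Because neighboring positions along the curve land in neighboring cells, taxa that are close in the chosen ordering stay close in the image, which is what makes such a layout readable as a map.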
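7. Discussion
Visualization tools are indispensable assets for analyzing and interpreting complex biological data in genomics and metagenomics. Research in both fields has seen exponential growth in data generation, which calls for robust visualization tools to unravel the complexity encoded in these datasets. Although advances in visualization technology have greatly improved researchers' ability to explore and interpret biological data, several challenges remain: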
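7.1 Conveying complexity
Despite these advances, visualization tools often struggle to convey the complexity inherent in genomic and metagenomic datasets. For example, a visualization of microbial community dynamics within an ecological niche may oversimplify intricate interactions, leading to potential misinterpretation of ecological patterns.
7.2 Computational demands
Some visualization tools impose heavy computational requirements, putting them out of reach of researchers with limited access to high-performance computing resources. For example, tools that use complex algorithms to render genome structure in three dimensions may require substantial computing power, which limits their usefulness in resource-constrained settings.
7.3 Compatibility issues
Compatibility issues among visualization tools, data formats, and operating systems pose a major challenge. For example, interoperability between bioinformatics pipelines and visualization platforms may require complex data preprocessing steps, which introduce potential errors and hinder seamless analysis workflows.
7.4 Scalability limitations
The scalability of visualization tools is frequently tested by large-scale genomic and metagenomic datasets. For example, tools designed to visualize microbial community diversity may suffer degraded performance or longer computation times when analyzing datasets that contain diverse microbial populations or great sequencing depth.
7.5 Learning curve
Some visualization tools have a steep learning curve, requiring researchers to invest considerable time and effort to master their functionality.
7.6 Adaptation to future technologies
As visualization tools adapt to future technologies such as virtual reality (VR), they are set to undergo a transformative evolution. Integrating VR capabilities promises to change how researchers explore and interact with biological data. By leveraging VR, visualization tools can offer immersive, interactive experiences that go beyond the limits of traditional 2D visualization. For instance, researchers could navigate three-dimensional representations of genomic landscapes, manipulate molecular structures with gestures, or explore complex biological networks in an immersive virtual environment. In addition, the emergence of augmented reality (AR) opens up the possibility of overlaying virtual data visualizations on the physical world, allowing researchers to integrate biological insights into their laboratory experiments or fieldwork. As VR and AR continue to mature, visualization tools will play a key role in harnessing these immersive technologies to gain new insight into the complexity of biological systems and to accelerate scientific discovery.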
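Despite these challenges, visualization tools have advanced through many cutting-edge innovations. These advances span a broad range of capabilities, such as:
7.7 Intuitive representations
Modern visualization tools offer intuitive representations that facilitate data exploration and interpretation. For example, tools such as Krona use interactive sunburst visualizations to depict taxonomic hierarchies, allowing researchers to readily discern microbial community composition.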
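The data structure behind a sunburst view of a taxonomic hierarchy is a tree in which every node carries the summed count of its descendants. The sketch below builds such a tree from lineage records; it does not use Krona or reproduce its input format, and the lineages, counts, and helper names are hypothetical.

```python
def build_hierarchy(lineage_counts):
    """Fold (lineage, count) records into a nested tree with cumulative counts."""
    root = {"name": "root", "count": 0, "children": {}}
    for lineage, count in lineage_counts:
        root["count"] += count
        node = root
        for taxon in lineage:
            child = node["children"].setdefault(
                taxon, {"name": taxon, "count": 0, "children": {}}
            )
            child["count"] += count
            node = child
    return root


def shares(node, total=None, depth=0):
    """Yield (depth, name, share of all reads) for every node, depth-first."""
    total = node["count"] if total is None else total
    yield depth, node["name"], (node["count"] / total if total else 0.0)
    for child in node["children"].values():
        yield from shares(child, total, depth + 1)


# Hypothetical read counts per lineage.
records = [
    (("Bacteria", "Firmicutes", "Bacilli"), 40),
    (("Bacteria", "Firmicutes", "Clostridia"), 20),
    (("Bacteria", "Proteobacteria"), 30),
    (("Archaea", "Euryarchaeota"), 10),
]

tree = build_hierarchy(records)
for depth, name, share in shares(tree):
    print("  " * depth + f"{name}: {share:.0%}")
```

In a sunburst chart, the depth of a node determines its ring and its share determines the angular width of its segment.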
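7.8 Interactive features and dynamic exploration
The incorporation of interactive features enables dynamic exploration of genomic and metagenomic data. A notable example is Anvi'o, which lets users interactively visualize and annotate metagenomic assemblies, facilitating real-time exploration of genomic context.
7.9 Data integration
Bioinformatics visualization tools offer advanced data integration capabilities that have transformed researchers' ability to synthesize different omics datasets and uncover complex biological phenomena. These tools support the seamless integration of genomic/metagenomic, transcriptomic/metatranscriptomic, proteomic, and metabolomic data, enabling holistic analysis of biological systems.
7.10 Community engagement and continuous development
Popular visualization tools often have active user communities that foster collaborative development and continuous improvement. Galaxy, for genomic analysis, and Cytoscape, for network analysis and visualization, are two typical examples.
7.11 Customization flexibility
Tools that offer customization options allow researchers to tailor visualizations to their specific research questions and preferences. A typical example is Circos, which creates highly customizable circular plots for visualizing genomic data, allowing researchers to highlight genomic features of interest with precision.
7.12 Reproducibility
Genome visualization tools play a vital role in ensuring reproducibility by providing transparent and reproducible methods for visualizing and analyzing genomic data.
In summary, visualization tools are indispensable assets for genomics and metagenomics research and provide valuable insight into complex biological phenomena. Although recent advances have markedly improved their utility and accessibility, several challenges remain that call for continued innovation and refinement. By addressing these challenges and leveraging emerging technologies, researchers can realize the full potential of visualization tools to deepen our understanding of the complexity of genomic and metagenomic landscapes.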
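Author contributions
All authors tested and benchmarked the various tools. All authors have read and approved the manuscript.
Funding
Fondation Santé; Onassis Foundation; the ARISE program from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 945405; the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, supported by the Office of Science of the U.S. Department of Energy and operated under Contract No. DE-AC02–05CH11231; start-up funds from the Penn State College of Medicine and the Huck Innovative and Transformational Seed Fund (HITS) award from the Huck Institutes of the Life Sciences at Penn State University; the Hellenic Foundation for Research and Innovation (H.F.R.I) under the call "Greece 2.0 - Basic Research Financing Action (Horizontal support of all Sciences), Sub-action II", Grant ID: 16718-PRPFOR; "Greece 2.0 - National Recovery and Resilience plan", Grant ID: TAEDR-0539180.
CRediT authorship contribution statement
Maria Chasapi: Data curation, Formal analysis, Investigation, Resources, Visualization, Writing - original draft, Writing - review & editing. Nikolaos Vergoulidis: Data curation, Formal analysis, Investigation, Validation, Visualization, Writing - original draft, Writing - review & editing. Maria Kokoli: Investigation. Nefeli K Venetsianou: Investigation. Evangelos Karatzas: Investigation, Methodology, Validation, Visualization, Writing - original draft, Writing - review & editing. Ioannis Iliopoulos: Investigation, Project administration, Visualization, Writing - original draft, Writing - review & editing. Nikos C Kyrpides: Funding acquisition, Supervision, Writing - original draft, Writing - review & editing. Evangelos Pafilis: Data curation, Investigation, Methodology, Writing - review & editing. Fotis A Baltoumas: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Georgios A Pavlopoulos: Conceptualization, Investigation, Supervision, Visualization, Writing - original draft, Writing - review & editing. Eleni Panagiotopoulou: Investigation, Writing - review & editing. Eleni Aplakidou: Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Visualization, Writing - original draft, Writing - review & editing. Ilias Georgakopoulos-Soares: Investigation, Writing - review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.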
References
- An invitation to the marriage of metagenomics and metabolomics. Cell. 2008; 134: 708-713. https://doi.org/10.1016/j.cell.2008.08.025
- Save the microbes to save the planet. A call to action of the International Union of the Microbiological Societies (IUMS). One Health Outlook. 2023; 5: 5. https://doi.org/10.1186/s42522-023-00077-2
- High proportions of bacteria and archaea across most biomes remain uncultured. ISME J. 2019; 13: 3126-3130. https://doi.org/10.1038/s41396-019-0484-y
- Unculturable bacteria - the uncharacterized organisms that cause oral infections. JRSM. 2002; 95: 81-83. https://doi.org/10.1258/jrsm.95.2.81
- The human gut microbiome – a potential controller of wellness and disease. Front Microbiol. 2018; 9: 1835. https://doi.org/10.3389/fmicb.2018.01835
- A systematic review on omics data (metagenomics, metatranscriptomics, and metabolomics) in the role of microbiome in gallbladder disease. Front Physiol. 2022; 13: 888233. https://doi.org/10.3389/fphys.2022.888233
- Metagenomics, metatranscriptomics, and metabolomics approaches for microbiome analysis: supplementary issue: bioinformatics methods and applications for big metagenomics data. Evol Bioinform Online. 2016; 12s1: EBO.S36436. https://doi.org/10.4137/EBO.S36436
- Metagenomics: an effective approach for exploring microbial diversity and functions. Foods. 2023; 12: 2140. https://doi.org/10.3390/foods12112140
- Metagenomic analyses: past and future trends. Appl Environ Microbiol. 2011; 77: 1153-1161. https://doi.org/10.1128/AEM.02345-10
- Recent progress and new challenges in metagenomics for biotechnology. Biotechnol Lett. 2010; 32: 1351-1359. https://doi.org/10.1007/s10529-010-0306-9
- Analysis and Interpretation of metagenomics data: an approach. Biol Proced Online. 2022; 24: 18. https://doi.org/10.1186/s12575-022-00179-7
- Advances and challenges in metatranscriptomic analysis. Front Genet. 2019; 10: 904. https://doi.org/10.3389/fgene.2019.00904
- Metatranscriptomics for the human microbiome and microbial community functional profiling. Annu Rev Biomed Data Sci. 2021; 4: 279-311. https://doi.org/10.1146/annurev-biodatasci-031121-103035
- Use of metatranscriptomics in microbiome research. Bioinform Biol Insights. 2016; 10: BBI.S34610. https://doi.org/10.4137/BBI.S34610
- RefSeq and the prokaryotic genome annotation pipeline in the age of metagenomes. Nucleic Acids Res. 2024; 52: D762-D769. https://doi.org/10.1093/nar/gkad988
- UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021; 49: D480-D489. https://doi.org/10.1093/nar/gkaa1100
- Web resources for metagenomics studies. Genom, Proteom Bioinforma. 2015; 13: 296-303. https://doi.org/10.1016/j.gpb.2015.10.003
- GenBank. Nucleic Acids Res. 2022; 50: D161-D164. https://doi.org/10.1093/nar/gkab1135
- DNA Data Bank of Japan (DDBJ) update report 2022. Nucleic Acids Res. 2023; 51: D101-D105. https://doi.org/10.1093/nar/gkac1083
- The European Nucleotide Archive in 2021. Nucleic Acids Res. 2022; 50: D106-D110. https://doi.org/10.1093/nar/gkab1051
- The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res. 2012; 40: D54-D56. https://doi.org/10.1093/nar/gkr854
- Twenty-five years of Genomes OnLine Database (GOLD): data updates and new features in v.9. Nucleic Acids Res. 2022; gkac974. https://doi.org/10.1093/nar/gkac974
- IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Res. 2018. https://doi.org/10.1093/nar/gky901
- The IMG/M data management and analysis system v.7: content updates and new features. Nucleic Acids Res. 2022; gkac976. https://doi.org/10.1093/nar/gkac976
- MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 2019; gkz1035. https://doi.org/10.1093/nar/gkz1035
- SPIRE: a searchable, planetary-scale microbiome REsource. Nucleic Acids Res. 2024; 52: D777-D783. https://doi.org/10.1093/nar/gkad943
- MG-RAST version 4-lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis. Brief Bioinform. 2019; 20: 1151-1159. https://doi.org/10.1093/bib/bbx105
- DOE JGI metagenome workflow. mSystems. 2021; 6: e00804-20. https://doi.org/10.1128/mSystems.00804-20
- IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses. Nucleic Acids Res. 2021; 49: D764-D775. https://doi.org/10.1093/nar/gkaa946
- IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata. Nucleic Acids Res. 2022; gkac1037. https://doi.org/10.1093/nar/gkac1037
- Uncovering Earth’s virome. Nature. 2016; 536: 425-430. https://doi.org/10.1038/nature19094
- IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res. 2017; 45: D457-D465. https://doi.org/10.1093/nar/gkw1030
- KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol. 2018; 36: 566-569. https://doi.org/10.1038/nbt.4163
- NMPFamsDB: a database of novel protein families from microbial metagenomes and metatranscriptomes. Nucleic Acids Res. 2024; 52: D502-D512. https://doi.org/10.1093/nar/gkad800
- Unraveling the functional dark matter through global metagenomics. Nature. 2023; 622: 594-602. https://doi.org/10.1038/s41586-023-06583-7
- Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. Front Bioinform. 2023; 3: 1157956. https://doi.org/10.3389/fbinf.2023.1157956
- Functional and evolutionary significance of unknown genes from uncultivated taxa. Nature. 2023. https://doi.org/10.1038/s41586-023-06955-z
- Biosynthetic potential of the global ocean microbiome. Nature. 2022; 607: 111-118. https://doi.org/10.1038/s41586-022-04862-3
- Strains, functions and dynamics in the expanded Human Microbiome Project. Nature. 2017; 550: 61-66. https://doi.org/10.1038/nature23889
- TerrestrialMetagenomeDB: a public repository of curated and standardized metadata for terrestrial metagenomes. Nucleic Acids Res. 2019; gkz994. https://doi.org/10.1093/nar/gkz994
- MarineMetagenomeDB: a public repository for curated and standardized metadata for marine metagenomes. Environ Microbiome. 2022; 17: 57. https://doi.org/10.1186/s40793-022-00449-7
- HumanMetagenomeDB: a public repository of curated and standardized metadata for human metagenomes. Nucleic Acids Res. 2021; 49: D743-D750. https://doi.org/10.1093/nar/gkaa1031
- The MAR databases: development and implementation of databases specific for marine metagenomics. Nucleic Acids Res. 2018; 46: D692-D699. https://doi.org/10.1093/nar/gkx1036
- Structure and function of the global ocean microbiome. Science. 2015; 348: 1261359. https://doi.org/10.1126/science.1261359
- The National Microbiome Data Collaborative Data Portal: an integrated multi-omics microbiome data resource. Nucleic Acids Res. 2022. https://doi.org/10.1093/nar/gkab990
- Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform Biol Insights. 2015; 9: 75-88. https://doi.org/10.4137/BBI.S12462
- An integrated pipeline for annotation and visualization of metagenomic contigs. Front Genet. 2019; 10: 999. https://doi.org/10.3389/fgene.2019.00999
- Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014; 30: 2068-2069. https://doi.org/10.1093/bioinformatics/btu153
- metaGOflow: a workflow for the analysis of marine Genomic Observatories shotgun metagenomics data. Gigascience. 2022; 12: giad078. https://doi.org/10.1093/gigascience/giad078
- PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S ribosomal RNA, ITS, and COI marker genes. Gigascience. 2020; 9: giaa022. https://doi.org/10.1093/gigascience/giaa022
- NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 2016; 44: 6614-6624. https://doi.org/10.1093/nar/gkw569
- DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. Bioinformatics. 2018; 34: 1037-1039. https://doi.org/10.1093/bioinformatics/btx713
- nf-core/mag: a best-practice pipeline for metagenome hybrid assembly and binning. NAR Genom Bioinform. 2022; 4: lqac007. https://doi.org/10.1093/nargab/lqac007
- Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 2021; 49: D192-D200. https://doi.org/10.1093/nar/gkaa1047
- Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013; 29: 2933-2935. https://doi.org/10.1093/bioinformatics/btt509
- tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021; 49: 9077-9096. https://doi.org/10.1093/nar/gkab688
- CRISPRCasTyper: An automated tool for the identification, annotation and classification of CRISPR-Cas loci. Bioinformatics. 2020. https://doi.org/10.1101/2020.05.15.097824
- CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinforma. 2007; 8: 209. https://doi.org/10.1186/1471-2105-8-209
- Fast and accurate identification of plasmids and viruses in sequencing data using geNomad. Nat Biotechnol. 2023. https://doi.org/10.1038/s41587-023-01982-7
- Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 2010; 11: 119. https://doi.org/10.1186/1471-2105-11-119
- Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite. Curr Protoc Microbiol. 2014; 32 (Unit 1E.7). https://doi.org/10.1002/9780471729259.mc01e07s32
- FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res. 2010; 38: e191. https://doi.org/10.1093/nar/gkq747
- UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics. 2015; 31: 926-932. https://doi.org/10.1093/bioinformatics/btu739
- Pfam: The protein families database in 2021. Nucleic Acids Res. 2021; 49: D412-D419. https://doi.org/10.1093/nar/gkaa913
- InterPro in 2022. Nucleic Acids Res. 2023; 51: D418-D427. https://doi.org/10.1093/nar/gkac993
- Basic local alignment search tool. J Mol Biol. 1990; 215: 403-410. https://doi.org/10.1016/S0022-2836(05)80360-2
- Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015; 12: 59-60. https://doi.org/10.1038/nmeth.3176
- MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017; 35: 1026-1028. https://doi.org/10.1038/nbt.3988
- HMMER web server: 2018 update. Nucleic Acids Res. 2018; 46: W200-W204. https://doi.org/10.1093/nar/gky448
- HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinforma. 2019; 20: 473. https://doi.org/10.1186/s12859-019-3019-7
- Improved metagenomic analysis with Kraken 2. Genome Biol. 2019; 20: 257. https://doi.org/10.1186/s13059-019-1891-0
- Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods. 2009; 6: 673-676. https://doi.org/10.1038/nmeth.1358
- MetaPhlAn 4 profiling of unknown species-level genome bins improves the characterization of diet-associated microbiome changes in mice. Cell Rep. 2023; 42: 112464. https://doi.org/10.1016/j.celrep.2023.112464
- Flame (v2.0): advanced integration and interpretation of functional enrichment results from multiple sources. Bioinformatics. 2023; 39: btad490. https://doi.org/10.1093/bioinformatics/btad490
- FLAME: a web tool for functional and literature enrichment analysis of multiple gene lists. Biol (Basel). 2021; 10: 665. https://doi.org/10.3390/biology10070665
- The characterization of novel tissue microbiota using an optimized 16S metagenomic sequencing pipeline. PLoS ONE. 2015; 10: e0142334. https://doi.org/10.1371/journal.pone.0142334
- Bee foraging preferences, microbiota and pathogens revealed by direct shotgun metagenomics of honey. Mol Ecol Resour. 2022; 22: 2506-2523. https://doi.org/10.1111/1755-0998.13626
- Biomolecule and bioentity interaction databases in systems biology: a comprehensive review. Biomolecules. 2021; 11: 1245. https://doi.org/10.3390/biom11081245
- A guide to conquer the biological network era using graph theory. Front Bioeng Biotechnol. 2020; 8: 34. https://doi.org/10.3389/fbioe.2020.00034
- Metaproteome analysis reveals that syntrophy, competition, and phage-host interaction shape microbial communities in biogas plants. Microbiome. 2019; 7: 69. https://doi.org/10.1186/s40168-019-0673-y
- Extensive T-Cell Epitope Repertoire Sharing among Human Proteome, Gastrointestinal Microbiome, and Pathogenic Bacteria: Implications for the Definition of Self. Front Immunol. 2015; 6. https://doi.org/10.3389/fimmu.2015.00538
- Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat Commun. 2019; 10: 5477. https://doi.org/10.1038/s41467-019-13443-4
- Overview of Sankey flow diagrams: Focusing on symptom trajectories in older adults with advanced cancer. J Geriatr Oncol. 2022; 13: 742-746. https://doi.org/10.1016/j.jgo.2021.12.017
- The thermal efficiency of steam engines. report of the committee appointed to the council upon the subject of the definition of a standard or standards of thermal efficiency for steam engines: with an introductory note. (Including appendixes and plate at back of volume). Minutes Proc Inst Civ Eng. 1898; 134: 278-312. https://doi.org/10.1680/imotp.1898.19100
- BioSankey: Visualization of Microbial Communities Over Time. J Integr Bioinforma. 2018; 15: 20170063. https://doi.org/10.1515/jib-2017-0063
- Metagenomic insights into the microbial diversity in manganese-contaminated mine tailings and their role in biogeochemical cycling of manganese. Sci Rep. 2018; 8: 8257. https://doi.org/10.1038/s41598-018-26311-w
- Hive plots--rational approach to visualizing networks. Brief Bioinforma. 2012; 13: 627-644. https://doi.org/10.1093/bib/bbr069
- Compositional homogeneity in the pathobiome of a new, slow-spreading coral disease. Microbiome. 2019; 7: 139. https://doi.org/10.1186/s40168-019-0759-6
- Applications and Comparison of Dimensionality Reduction Methods for Microbiome Data. Front Bioinform. 2022; 2: 821861. https://doi.org/10.3389/fbinf.2022.821861
- Review of Dimension Reduction Methods. JDAIP. 2021; 09: 189-231. https://doi.org/10.4236/jdaip.2021.93013
- A Review on Dimension Reduction. Int Stat Rev. 2013; 81: 134-150. https://doi.org/10.1111/j.1751-5823.2012.00182.x
- Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization. Commun Biol. 2022; 5: 719. https://doi.org/10.1038/s42003-022-03628-x
- Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol. 2018. https://doi.org/10.1038/nbt.4314
- A Review of Dimensionality Reduction Techniques for Efficient Computation. Procedia Comput Sci. 2019; 165: 104-111. https://doi.org/10.1016/j.procs.2020.01.079
- The specious art of single-cell genomics. PLoS Comput Biol. 2023; 19: e1011288. https://doi.org/10.1371/journal.pcbi.1011288
- Species Divergence vs. Functional Convergence Characterizes Crude Oil Microbial Community Assembly. Front Microbiol. 2016; 7. https://doi.org/10.3389/fmicb.2016.01254
- SCALA: A complete solution for multimodal analysis of single-cell Next Generation Sequencing data. Comput Struct Biotechnol J. 2023; 21: 5382-5393. https://doi.org/10.1016/j.csbj.2023.10.032
- Metagenomic approaches to study the culture-independent bacterial diversity of a polluted environment—a case study on north-eastern coast of Bay of Bengal, India. Microbial Biodegradation and Bioremediation. Elsevier, 2022: 81-107. https://doi.org/10.1016/B978-0-323-85455-9.00014-X
- Characterizing the bacterial community across the gastrointestinal tract of goats: Composition and potential function. MicrobiologyOpen. 2019; 8: e00820. https://doi.org/10.1002/mbo3.820
- Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 2017; 5: 27. https://doi.org/10.1186/s40168-017-0237-y
- Extrication of the microbial interactions of activated sludge used in the textile effluent treatment of anaerobic reactor through metagenomic profiling. Curr Microbiol. 2020; 77: 2496-2509. https://doi.org/10.1007/s00284-020-02020-4
- Meander: visually exploring the structural variome using space-filling curves. Nucleic Acids Res. 2013; 41: e118. https://doi.org/10.1093/nar/gkt254
- LongQC: A Quality Control Tool for Third Generation Sequencing Long Read Data. G3 Genes|Genomes|Genet. 2020; 10: 1193-1196. https://doi.org/10.1534/g3.119.400864
- MinIONQC: fast and simple quality control for MinION sequencing data. Bioinformatics. 2019; 35: 523-525. https://doi.org/10.1093/bioinformatics/bty654
- NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018; 34: 2666-2669. https://doi.org/10.1093/bioinformatics/bty149
- SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience. 2018; 7. https://doi.org/10.1093/gigascience/gix120
- SequelTools: a suite of tools for working with PacBio Sequel raw sequence data. BMC Bioinforma. 2020; 21: 429. https://doi.org/10.1186/s12859-020-03751-8
- ABySS-Explorer: visualizing genome sequence assemblies. IEEE Trans Vis Comput Graph. 2009; 15: 881-888. https://doi.org/10.1109/TVCG.2009.116
- Assembly Graph Browser: interactive visualization of assembly graphs. Bioinformatics. 2019; 35: 3476-3478. https://doi.org/10.1093/bioinformatics/btz072
- GfaViz: flexible and interactive visualization of GFA sequence graphs. Bioinformatics. 2019; 35: 2853-2855. https://doi.org/10.1093/bioinformatics/bty1046
- SGTK: a toolkit for visualization and assessment of scaffold graphs. Bioinformatics. 2019; 35: 2303-2305. https://doi.org/10.1093/bioinformatics/bty956
- PanGraphViewer: a versatile tool to visualize pangenome graphs. Bioinformatics. 2023. https://doi.org/10.1101/2023.03.30.534931
- BinaRena: a dedicated interactive platform for human-guided exploration and binning of metagenomes. Microbiome. 2023; 11: 186. https://doi.org/10.1186/s40168-023-01625-8
- CONCOCT: Clustering cONtigs on COverage and ComposiTion. 2013. https://doi.org/10.48550/ARXIV.1312.4038
- MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. 2018; 6: 158. https://doi.org/10.1186/s40168-018-0541-1
- VizBin - an application for reference-independent visualization and human-augmented binning of metagenomic data. Microbiome. 2015; 3. https://doi.org/10.1186/s40168-014-0066-1
- Community-led, integrated, reproducible multi-omics with anvi’o. Nat Microbiol. 2020; 6: 3-6. https://doi.org/10.1038/s41564-020-00834-3
- Visualizing and comparing circular genomes using the CGView family of tools. Brief Bioinform. 2019; 20: 1576-1582. https://doi.org/10.1093/bib/bbx081
- CRAMER: a lightweight, highly customizable web-based genome browser supporting multiple visualization instances. Bioinformatics. 2020; 36: 3556-3557. https://doi.org/10.1093/bioinformatics/btaa146
- Elviz – exploration of metagenome assemblies with an interactive visualization tool. BMC Bioinforma. 2015; 16: 130. https://doi.org/10.1186/s12859-015-0566-4
- Accessing NCBI data using the NCBI Sequence Viewer and Genome Data Viewer (GDV). Genome Res. 2021; 31: 159-169. https://doi.org/10.1101/gr.266932.120
- Gosling: A Grammar-based Toolkit for Scalable and Interactive Genomics Data Visualization. IEEE Trans Vis Comput Graph. 2022; 28: 140-150. https://doi.org/10.1109/TVCG.2021.3114876
- Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinforma. 2013; 14: 178-192. https://doi.org/10.1093/bib/bbs017
- JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol. 2016; 17: 66. https://doi.org/10.1186/s13059-016-0924-1
- Tablet—next generation sequence assembly visualization. Bioinformatics. 2010; 26: 401-402. https://doi.org/10.1093/bioinformatics/btp666
- The UCSC Genome Browser database: 2023 update. Nucleic Acids Res. 2023; 51: D1188-D1195. https://doi.org/10.1093/nar/gkac1072
- Ensembl 2022. Nucleic Acids Res. 2022; 50: D988-D995. https://doi.org/10.1093/nar/gkab1049
- Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics. 2012; 28: 464-469. https://doi.org/10.1093/bioinformatics/btr703
- Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012; 28: 1166-1167. https://doi.org/10.1093/bioinformatics/bts091
- Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012; 28: 1647-1649. https://doi.org/10.1093/bioinformatics/bts199
- Introducing the bacterial and viral bioinformatics resource center (BV-BRC): a resource combining PATRIC, IRD and ViPR. Nucleic Acids Res. 2023; 51: D678-D689. https://doi.org/10.1093/nar/gkac1003
- MSAViewer: interactive JavaScript visualization of multiple sequence alignments. Bioinformatics. 2016; 32: 3501-3503. https://doi.org/10.1093/bioinformatics/btw474
- Comparative visualization of genetic and physical maps with Strudel. Bioinformatics. 2011; 27: 1307-1308. https://doi.org/10.1093/bioinformatics/btr111
- SuiteMSA: visual tools for multiple sequence alignment comparison and molecular sequence simulation. BMC Bioinforma. 2011; 12: 184. https://doi.org/10.1186/1471-2105-12-184
- Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009; 25: 1189-1191. https://doi.org/10.1093/bioinformatics/btp033
- MSABrowser: dynamic and fast visualization of sequence alignments, variations and annotations. Bioinforma Adv. 2021; 1: vbab009. https://doi.org/10.1093/bioadv/vbab009
- Seaview Version 5: A Multiplatform Software for Multiple Sequence Alignment, Molecular Phylogenetic Analyses, and Tree Reconciliation. In: Katoh K. Multiple Sequence Alignment (vol. 2231). Springer US, New York, NY, 2021: 241-260. https://doi.org/10.1007/978-1-0716-1036-7_15
- Panache: a web browser-based viewer for linearized pangenomes. Bioinformatics. 2021; 37: 4556-4558. https://doi.org/10.1093/bioinformatics/btab688
- Pan-Tetris: an interactive visualisation for Pan-genomes. BMC Bioinforma. 2015; 16: S3. https://doi.org/10.1186/1471-2105-16-S11-S3
- PanViz: interactive visualization of the structure of functionally annotated pangenomes. Bioinformatics. 2017; 33: 1081-1082. https://doi.org/10.1093/bioinformatics/btw761
- panX: pan-genome analysis and exploration. Nucleic Acids Res. 2018; 46: e5. https://doi.org/10.1093/nar/gkx977
- PanTools: representation, storage and exploration of pan-genomic data. Bioinformatics. 2016; 32: i487-i493. https://doi.org/10.1093/bioinformatics/btw455
- Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs. Genome Biol. 2020; 21: 249. https://doi.org/10.1186/s13059-020-02135-8
- The design and construction of reference pangenome graphs with minigraph. Genome Biol. 2020; 21: 265. https://doi.org/10.1186/s13059-020-02168-z
- TwoPaCo: an efficient algorithm to build the compacted de Bruijn graph from many complete genomes. Bioinformatics. 2017; 33: 4024-4032. https://doi.org/10.1093/bioinformatics/btw609
- Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol. 2023. https://doi.org/10.1038/s41587-023-01793-w
- Microbiome maps: Hilbert curve visualizations of metagenomic profiles. Front Bioinform. 2023; 3: 1154588. https://doi.org/10.3389/fbinf.2023.1154588
- QIIME 2 enables comprehensive end‐to‐end analysis of diverse microbiome data and comparative studies with publicly available data. CP Bioinforma. 2020; 70: e100. https://doi.org/10.1002/cpbi.100
- phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013; 8: e61217. https://doi.org/10.1371/journal.pone.0061217
- MicrobiomeAnalyst: a web-based tool for comprehensive statistical, visual and meta-analysis of microbiome data. Nucleic Acids Res. 2017; 45: W180-W188. https://doi.org/10.1093/nar/gkx295
- Joseph Nathaniel Paulson HT. metagenomeSeq 2017. https://doi.org/10.18129/B9.BIOC.METAGENOMESEQ
- MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol. 2021; 38: 3022-3027. https://doi.org/10.1093/molbev/msab120
- Wilgenbusch J.C., Swofford D. Inferring Evolutionary Trees with PAUP *. CP in Bioinformatics 2003;00. https://doi.org/10.1002/0471250953.bi0604s00
- Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics. 2007; 23: 127-128. https://doi.org/10.1093/bioinformatics/btl529
- itol.toolkit accelerates working with iTOL (Interactive Tree of Life) by an automated generation of annotation files. Bioinformatics. 2023; 39: btad339. https://doi.org/10.1093/bioinformatics/btad339
- PhyD3: a phylogenetic tree viewer with extended phyloXML support for functional genomics data visualization. Bioinformatics. 2017; 33: 2946-2947. https://doi.org/10.1093/bioinformatics/btx324
- Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks. Syst Biol. 2012; 61: 1061-1067. https://doi.org/10.1093/sysbio/sys062
- A travel guide to Cytoscape plugins. Nat Methods. 2012; 9: 1069-1076. https://doi.org/10.1038/nmeth.2212
- Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13: 2498-2504. https://doi.org/10.1101/gr.1239303
- Bastian M., Heymann S., Jacomy M. Gephi: An Open Source Software for Exploring and Manipulating Networks 2009. https://doi.org/10.13140/2.1.1341.1520
- Analysis and visualization of large networks with program package Pajek. Complex Adapt Syst Model. 2016; 4. https://doi.org/10.1186/s40294-016-0017-8
- Arena3Dweb: interactive 3D visualization of multilayered networks. Nucleic Acids Res. 2021. https://doi.org/10.1093/nar/gkab278
- Arena3Dweb: interactive 3D visualization of multilayered networks supporting multiple directional information channels, clustering analysis and application integration. NAR Genom Bioinforma. 2023; 5: lqad053. https://doi.org/10.1093/nargab/lqad053
- NORMA: the network makeup artist — a web tool for network annotation visualization. Genom, Proteom Bioinforma. 2022; 20: 578-586. https://doi.org/10.1016/j.gpb.2021.02.005
- The network makeup artist (NORMA-2.0): distinguishing annotated groups in a network using innovative layout strategies. Bioinforma Adv. 2022; 2: vbac036. https://doi.org/10.1093/bioadv/vbac036
- A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput Struct Biotechnol J. 2021; 19: 6301-6314. https://doi.org/10.1016/j.csbj.2021.11.028
- Comparison of de-novo assembly tools for plasmid metagenome analysis. Genes Genom. 2019; 41: 1077-1083. https://doi.org/10.1007/s13258-019-00839-1
- Metagenomic data assembly – the way of decoding unknown microorganisms. Front Microbiol. 2021; 12: 613791. https://doi.org/10.3389/fmicb.2021.613791
- Comparing and evaluating metagenome assembly tools from a microbiologist’s perspective - not only size matters! PLoS ONE. 2017; 12: e0169662. https://doi.org/10.1371/journal.pone.0169662
- A review of methods and databases for metagenomic classification and assembly. Brief Bioinforma. 2019; 20: 1125-1136. https://doi.org/10.1093/bib/bbx120
- Omega: an Overlap-graph de novo assembler for metagenomics. Bioinformatics. 2014; 30: 2717-2722. https://doi.org/10.1093/bioinformatics/btu395
- Using the Velvet de novo assembler for short‐read sequencing technologies. CP Bioinforma. 2010; 31. https://doi.org/10.1002/0471250953.bi1105s31
- MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res. 2012; 40: e155. https://doi.org/10.1093/nar/gks678
- MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015; 31: 1674-1676. https://doi.org/10.1093/bioinformatics/btv033
- Compacting de Bruijn graphs from sequencing data quickly and in low memory. Bioinformatics. 2016; 32: i201-i208. https://doi.org/10.1093/bioinformatics/btw279
- metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017; 27: 824-834. https://doi.org/10.1101/gr.213959.116
- MetaCarvel: linking assembly graph motifs to biological variants. Genome Biol. 2019; 20: 174. https://doi.org/10.1186/s13059-019-1791-3
- Bandage: interactive visualization of de novo genome assemblies. Bioinformatics. 2015; 31: 3350-3352. https://doi.org/10.1093/bioinformatics/btv383
- Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets. BMC Bioinforma. 2020; 21: 334. https://doi.org/10.1186/s12859-020-03667-3
- MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies.PeerJ. 2019; 7e7359https://doi.org/10.7717/peerj.7359IF: 2.7 Q2
- MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities.PeerJ. 2015; 3e1165https://doi.org/10.7717/peerj.1165IF: 2.7 Q2
- ICoVeR – an interactive visualization tool for verification and refinement of metagenomic bins.BMC Bioinforma. 2017; 18: 233https://doi.org/10.1186/s12859-017-1653-5IF: 3.0 Q2
- Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes.Sci Rep. 2016; 624175https://doi.org/10.1038/srep24175IF: 4.6 Q2
- gbtools: interactive visualization of metagenome bins in R.Front Microbiol. 2015; 6https://doi.org/10.3389/fmicb.2015.01451IF: 5.2 Q2
- The neighbor-joining method: a new method for reconstructing phylogenetic trees.Mol Biol Evol. 1987; 4: 406-425https://doi.org/10.1093/oxfordjournals.molbev.a040454IF: 10.7 Q1
- Survey of clustering algorithms.IEEE Trans Neural Netw. 2005; 16: 645-678https://doi.org/10.1109/TNN.2005.845141
- Evaluation of clustering algorithms for protein-protein interaction networks.BMC Bioinforma. 2006; 7: 488https://doi.org/10.1186/1471-2105-7-488IF: 3.0 Q2
- Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies.PLoS ONE. 2009; 4e4345https://doi.org/10.1371/journal.pone.0004345IF: 3.7 Q2
- A large-scale evaluation of algorithms to calculate average nucleotide identity.Antonie Van Leeuwenhoek. 2017; 110: 1281-1286https://doi.org/10.1007/s10482-017-0844-4IF: 2.6 Q3
- HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks.Nucleic Acids Res. 2018; 46e33https://doi.org/10.1093/nar/gkx1313IF: 14.9 Q1
- Fast unfolding of communities in large networks.J Stat Mech. 2008; 2008: P10008https://doi.org/10.1088/1742-5468/2008/10/P10008IF: 2.4 Q1
- SPICi: a fast clustering algorithm for large biological networks.Bioinformatics. 2010; 26: 1105-1111https://doi.org/10.1093/bioinformatics/btq078IF: 5.8 Q1
Selvitopi O., Ekanayake S., Guidi G., Pavlopoulos G.A., Azad A., Buluc A. Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA: IEEE; 2020, p. 1–14. https://doi.org/10.1109/SC41405.2020.00079.
Selvitopi O., Ekanayake S., Guidi G., Awan M.G., Pavlopoulos G.A., Azad A., et al. Extreme-Scale Many-against-Many Protein Similarity Search. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA: IEEE; 2022, p. 1–12. https://doi.org/10.1109/SC41404.2022.00006.
- Adaptive seeds tame genomic sequence comparison.Genome Res. 2011; 21: 487-493https://doi.org/10.1101/gr.113985.110IF: 7.0 Q1
- Anvi’o: an advanced analysis and visualization platform for ‘omics data.PeerJ. 2015; 3e1319https://doi.org/10.7717/peerj.1319IF: 2.7 Q2
- QIIME allows analysis of high-throughput community sequencing data.Nat Methods. 2010; 7: 335-336https://doi.org/10.1038/nmeth.f.303IF: 48.0 Q1
- Principal component analysis: a review and recent developments.Philos Trans R Soc A. 2016; 374: 20150202https://doi.org/10.1098/rsta.2015.0202IF: 5.0 Q2
- EMPeror: a tool for visualizing high-throughput microbial community data.GigaSci. 2013; 2: 16https://doi.org/10.1186/2047-217X-2-16IF: 9.2 Q1
- Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future.Gigascience. 2015; 4: 38https://doi.org/10.1186/s13742-015-0077-2IF: 9.2 Q1
- A brief introduction to web-based genome browsers.Brief Bioinforma. 2013; 14: 131-143https://doi.org/10.1093/bib/bbs029IF: 9.5 Q1
- Pangenome Graphs.Annu Rev Genom Hum Genet. 2020; 21: 139-162https://doi.org/10.1146/annurev-genom-120219-080406IF: 8.7 Q1
- Comparing methods for constructing and representing human pangenome graphs.Genome Biol. 2023; 24: 274https://doi.org/10.1186/s13059-023-03098-2IF: 12.3 Q1
- A Review of Pangenome Tools and Recent Studies.in: Tettelin H. Medini D. The Pangenome. Springer International Publishing, Cham2020: 89-112https://doi.org/10.1007/978-3-030-38281-0_4
- PanGP: A tool for quickly analyzing bacterial pan-genome profile.Bioinformatics. 2014; 30: 1297https://doi.org/10.1093/bioinformatics/btu017IF: 5.8 Q1
- Roary: rapid large-scale prokaryote pan genome analysis.Bioinformatics. 2015; 31: 3691-3693https://doi.org/10.1093/bioinformatics/btv421IF: 5.8 Q1
- Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions.BMC Bioinforma. 2010; 11: 461https://doi.org/10.1186/1471-2105-11-461IF: 3.0 Q2
- GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy.Nucleic Acids Res. 2022; 50: D785-D794https://doi.org/10.1093/nar/gkab776IF: 14.9 Q1
- A reference guide for tree analysis and visualization.BioData Min. 2010; 3https://doi.org/10.1186/1756-0381-3-1IF: 4.5 Q1
- VAMPS: a website for visualization and analysis of microbial population structures.BMC Bioinforma. 2014; 15: 41https://doi.org/10.1186/1471-2105-15-41IF: 3.0 Q2
- ETE 3: reconstruction, analysis, and visualization of phylogenomic data.Mol Biol Evol. 2016; 33: 1635-1638https://doi.org/10.1093/molbev/msw046IF: 10.7 Q1
- DendroPy: a Python library for phylogenetic computing.Bioinformatics. 2010; 26: 1569-1571https://doi.org/10.1093/bioinformatics/btq228IF: 5.8 Q1
- Bio.Phylo: A unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython.BMC Bioinforma. 2012; 13: 209https://doi.org/10.1186/1471-2105-13-209IF: 3.0 Q2
- ampvis2: an R package to analyse and visualise 16S rRNA amplicon data.Bioinformatics. 2018; https://doi.org/10.1101/299537
- Interactive metagenomic visualization in a Web browser.BMC Bioinforma. 2011; 12: 385https://doi.org/10.1186/1471-2105-12-385IF: 3.0 Q2
- A survey of visualization tools for biological network analysis.BioData Min. 2008; 1: 12https://doi.org/10.1186/1756-0381-1-12IF: 4.5 Q1
- Bipartite graphs in systems biology and medicine: a survey of methods and applications.Gigascience. 2018; 7: 1-31https://doi.org/10.1093/gigascience/giy014IF: 9.2 Q1
- Analyzing protein-protein interaction networks with web tools.CBIO. 2011; 6: 389-397https://doi.org/10.2174/157489311798072972IF: 4.0 Q1
- Protein-protein interaction predictions using text mining methods.Methods. 2015; 74: 47-53https://doi.org/10.1016/j.ymeth.2014.10.026IF: 4.8 Q1
- Network analysis of genes and their association with diseases.Gene. 2016; 590: 68-78https://doi.org/10.1016/j.gene.2016.05.044IF: 3.5 Q2
- Arena3D: visualization of biological networks in 3D.BMC Syst Biol. 2008; 2: 104https://doi.org/10.1186/1752-0509-2-104
- The JAX Synteny Browser for mouse-human comparative genomics.Mamm Genome. 2019; 30: 353-361https://doi.org/10.1007/s00335-019-09821-4IF: 2.5 Q3
- ALLMAPS: robust scaffold ordering based on multiple maps.Genome Biol. 2015; 16: 3https://doi.org/10.1186/s13059-014-0573-1IF: 12.3 Q1
- GeneSpy, a user-friendly and flexible genomic context visualizer.Bioinformatics. 2019; 35: 329-331https://doi.org/10.1093/bioinformatics/bty459IF: 5.8 Q1
- FlaGs and webFlaGs: discovering novel biology through the analysis of gene neighbourhood conservation.Bioinformatics. 2021; 37: 1312-1314https://doi.org/10.1093/bioinformatics/btaa788IF: 5.8 Q1
- GeCoViz: genomic context visualisation of prokaryotic genes from a functional and evolutionary perspective.Nucleic Acids Res. 2022; 50: W352-W357https://doi.org/10.1093/nar/gkac367IF: 14.9 Q1
- FeGenie: a comprehensive tool for the identification of iron genes and iron gene neighborhoods in genome and metagenome assemblies.Front Microbiol. 2020; 11: 37https://doi.org/10.3389/fmicb.2020.00037IF: 5.2 Q2
- The EFI web resource for genomic enzymology tools: leveraging protein, genome, and metagenome databases to discover novel enzymes and metabolic pathways.Biochemistry. 2019; 58: 4169-4182https://doi.org/10.1021/acs.biochem.9b00735IF: 2.9 Q3
User license: Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0)
Figures
- Graphical Abstract. (A) Minimum-evolution tree: Adh sequence data from eleven fruit fly species. (B) iTOL circular tree: alignment of temporally sampled data used with RelTime with Dated Tips (RTDT) to estimate divergence times. (C) iTOL unrooted tree: alignment of temporally sampled data used with RTDT to estimate divergence times. (D) Pavian. (E) Krona sunburst chart: taxonomic abundance of skin microbiome samples over 4 consecutive days. (F) iTOL rectangular tree: Adh sequence data from eleven fruit fly species.
- Fig. 1. Steps of a typical metagenomic analysis: (i) marker gene detection and taxonomic assignment, (ii) de novo assembly to generate larger contigs, and (iii) mapping to a reference genome (if one exists).
- Fig. 2. Different visualization concepts. (A) Circos diagram. (B) UpSet plot and its corresponding Venn diagram. (C) Heatmap. (D) Bar chart (species). (E) Network. (F) Sunburst chart (Krona). (G) Treemap. (H) Phylogenetic tree. (I) Sankey plot. (J) Bubble chart. (K) Hive plot. (L) PCA map. All plots were created using simulated data.
- Fig. 3. (A-C) Graph-based visualization of the sequence assembly of Escherichia coli str. K-12 substr. MG1655 (NCBI:txid511145) with (A) Bandage, (B) GFAviz, and (C) ABySS-Explorer. (D) Heatmap of the bin abundances of draft genomes, produced with MetaWRAP (BioProject accession: PRJEB2054, ID: 203783). (E) Binning of MAGs highlighting 214 bins of E. coli using BinaRena (BioProject: PRJNA382010). (F) CGView genome contigs viewer showing Escherichia coli PA2 (NCBI RefSeq assembly GCF_000335355.2) in circular format. (G-H) Scaffold visualization of E. coli K-12 with the (G) IMG and (H) UCSC genome viewers. (I) Example of a pangenome graph.
- Fig. 4. (A) Taxonomy shown as a sunburst chart (Krona). (B) Taxonomy shown as a Sankey plot (Pavian). (C) Tree of Life visualized with iTOL. (D) Taxonomy visualized as a bubble chart. (E) Taxonomy visualized as a treemap. (F) Taxonomic ordering using Hilbert curves, visualized with Jasper/Microbiome Maps. All plots were created using the example data provided with each tool.
- Fig. 5. (A-C) Network visualization schemes for data retrieved from NMPFamsDB. (A) 2D network visualization of the distribution of NMPFs across different biomes, rendered using Gephi. (B) 3D, multi-layered network visualization of NMPFs associated with 4 human microbiomes, with additional annotation (sample type and availability of a 3D model), created using Arena3Dweb. (C) Gene co-occurrence network describing the gene neighborhood of a novel metagenome protein family (F006270), constructed with data from NMPFamsDB and rendered using NORMA. The functional annotations of the genes neighboring F006270 are shown as colored groups. (D) Gene neighborhood visualization for multiple MAGs through synteny conservation analysis, rendered using GeCoViz and the FESNov catalog. (E) Tree visualization of metagenome ecosystems using the GOLD classification system. The number of metagenomic datasets associated with each ecosystem is given in parentheses. (F) Chronological progression of different SARS-CoV-2 strains shown as a histogram, rendered using Nextstrain. (G-H) Map visualizations of (G) the geographical distribution across Europe and (H) the global dispersion patterns of COVID-19, rendered using Nextstrain.