Mining and functional analysis of candidate genes for resistance to white spot disease in tropical maize
Abstract: Maize white spot (MWS) is a serious foliar disease caused by Pantoea ananatis that, in recent years, has severely reduced the yield and quality of tropical and subtropical maize in China. Understanding the genetic basis of MWS resistance in tropical maize and identifying candidate genes are essential for breeding high-yielding, disease-resistant tropical maize varieties. In this study, six tropical maize inbred lines with wide genetic variation in MWS response (YML32, TRL418, CML171, TML139, YML226, and NK40-1) were crossed with Ye107, an elite temperate inbred line susceptible to MWS, and the progeny were selfed for nine generations to build six recombinant inbred line (RIL) populations. Phenotypes were evaluated over three years in Yanshan County, Yunnan Province, and the heritability of MWS in the individual populations ranged from 78.02% to 98.85%. Whole-genome resequencing (WGS) yielded 6,390,967 high-quality SNPs, which were combined with the phenotypic data in a genome-wide association study (GWAS) and a linkage analysis; together these identified 679 SNPs and 42 QTLs. Candidate genes were screened within 50 kb upstream and downstream of the significant SNPs. Based on gene expression and Gene Ontology (GO) enrichment analysis, seven new candidate genes potentially associated with MWS resistance were identified at loci SNP2-212792117, SNP2-210547514, SNP4-118983592, SNP8-149153434, SNP8-149695036, and SNP7-72694269: Zm00001eb107230, Zm00001eb106290, Zm00001eb181850, Zm00001eb358700, Zm00001eb358890, Zm00001eb308270, and Zm00001eb308280. These genes encode resistance-related proteins and transcription factors and contribute to MWS defense through plant pathogen defense, plant signal transduction, and plant hormone expression. This study provides new genetic markers and genomic resources for exploring the genetic architecture and molecular mechanisms of MWS in tropical maize, and it lays a foundation for validating and cloning functional genes involved in MWS resistance.
Keywords: tropical maize, white spot disease, GWAS, QTL, GO analysis, candidate genes
1. Introduction
Maize (Zea mays L.) is one of the world's most important food crops and directly affects global food security and livestock production [1]. In recent years, destructive pathogens have posed a continuing worldwide threat to maize yield and quality [2]. Maize white spot (MWS) is a foliar disease caused by Pantoea ananatis [3, 4]. Lesions first appear on the basal leaves at flowering and spread rapidly to the upper part of the plant; at later stages the leaves become chlorotic, which sharply impairs the plant's life cycle and photosynthesis [5]. In particular, if some of the leaves are infected before grain filling, the net photosynthetic rate can fall by 40%, which ends reproductive maturation prematurely and shortens the whole growth cycle [6]. The end result is malformed ears, a low shelling percentage, and a marked loss of grain quality.
The disease was first reported in India in 1965 [7] and became epidemic in Brazil and the United States during the 1980s and 1990s [8, 9], severely damaging maize production in tropical and subtropical regions. Severe infection has been reported to reduce maize yield in both Brazil and the United States. Since mid-July 2020, MWS has broken out in the maize ecological region of southwestern China, especially in Yunnan, causing widespread yield losses in some areas [12, 13]. In the southwest, the average plant incidence, average leaf infection rate, and average disease index recorded for the disease indicate serious damage to maize yield and quality, and farmers' incomes have fallen sharply as a result [14]. Current control of MWS relies mainly on optimized field cropping systems and agrochemicals [15, 16]. Recent studies, however, show that control of MWS depends largely on genetic resistance [17]. MWS resistance is a quantitative trait with relatively high heritability, controlled by relatively few genes with additive effects and only weakly affected by genotype-by-environment interaction [18-20], so resistance breeding is the most effective way to control the disease. Identifying candidate resistance genes and clarifying the genetic mechanism of resistance are therefore necessary for breeding resistant varieties.
Genome-wide association study (GWAS) has become a powerful tool for dissecting complex genotype-phenotype associations across species [21] and offers new approaches to crop breeding and trait improvement [22]. Maize, known for its broad genetic diversity and rapid linkage disequilibrium (LD) decay, is well suited to GWAS [23], and since the B73 reference genome was completed, large numbers of SNP markers have been used in maize GWAS [24]. QTLs located by SNP-based GWAS have high resolution, which makes selection more targeted and accurate and thus improves breeding efficiency [25]. The approach has been applied successfully to several maize diseases, including gray leaf spot [26, 27], banded leaf and sheath blight [28, 29], southern rust [30, 31], and rough dwarf disease [32]. For MWS, Carson et al. used 158 recombinant inbred (RI) lines from the cross B73 × Mo17 and composite interval mapping (CIM) in PLABQTL to detect five resistance QTLs from Mo17 on four chromosomes, with a highly significant additive × additive interaction between two of them [19]. Moreira et al. crossed the highly susceptible line L14-04B with the highly resistant line L08-05F to produce an F2 population and evaluated the F2:3 progeny in seven environments; they mapped six QTLs on chromosomes 1, 3, 4, 6, and 8, each explaining up to 11.86% of the phenotypic variance [33]. Kistner et al. carried out multi-parent QTL mapping for resistance to MWS and gray leaf spot (GLS), detected six MWS resistance QTLs, and highlighted three of them (bins 6.02, 1.03, and 10.06) as contributing to a better understanding of MWS resistance [34]. Wang et al., integrating GWAS with transcriptome analysis, showed that SYN10137-PZA00131.14 is both a multiple-disease resistance locus and an important genetic region for improving temperate and tropical maize germplasm, particularly for creating new MWS-resistant germplasm; they identified Zm00001d031875 as an important candidate gene for MWS resistance [35].
Tropical and subtropical maize germplasm shows rich genetic variation that temperate maize lacks [36]. Although tropical and subtropical germplasm has played a very important role in controlling maize diseases in recent years, few of these studies have used it as the mapping population [26]. In this study, we used six tropical maize inbred lines (YML32, TRL418, CML171, TML139, YML226, and NK40-1) as resistant parents and crossed them with Ye107, an elite temperate inbred line susceptible to MWS. After nine consecutive generations of selfing by single-seed descent, six F9 RIL populations were obtained. Following three years of phenotyping in Yanshan County, Yunnan Province, GWAS and linkage analysis were carried out with high-quality SNPs from whole-genome resequencing.
The objectives of this study were to:
(1) screen SNPs or QTLs for MWS resistance by GWAS and linkage analysis;
(2) reveal new candidate genes regulating MWS resistance through functional annotation, GO enrichment analysis, and haplotype analysis.
2. Materials and Methods
2.1 Materials and Experimental Design
Six tropical maize inbred lines with wide genetic variation in MWS response (YML32, TRL418, CML171, TML139, YML226, and NK40-1) were used as female parents and crossed with the common parent Ye107, an elite temperate inbred line susceptible to MWS (Table 1 gives the pedigree, heterotic group, ecotype, and resistance rating of the seven parents). Nine consecutive generations of single-seed descent produced six recombinant inbred line (RIL) subpopulations: pop1 (YML32 × Ye107), pop2 (TRL418 × Ye107), pop3 (CML171 × Ye107), pop4 (TML139 × Ye107), pop5 (YML226 × Ye107), and pop6 (NK40-1 × Ye107). Each subpopulation initially contained 200 RILs, but because of environmental selection, inbreeding depression, and other factors, only 904 RILs were available for this study (pop1: 145; pop2: 141; pop3: 147; pop4: 160; pop5: 152; pop6: 159). The RILs were grown in 2021, 2022, and 2023 in a randomized complete block design (RCBD) with 14 plants per row, a row length of 4 m, and a plant spacing of 25 cm; the field was managed according to standard agronomic practice.
Table 1. Pedigree, heterotic group, ecotype, and MWS resistance rating of the parental lines

| Name | Pedigree | Heterotic group | Ecotype | MWS resistance rating |
| --- | --- | --- | --- | --- |
| Ye107 | Derived from US hybrid DeKalb XL80 | Reid | Temperate | 9 |
| YML32 | Suwan1(S)C9-S8-346-2(Kei8902)-3-4-4-6 | Suwan1 | Tropical | 3 |
| TRL418 | Derived from Thailand Monsanto hybrid | | | |
In 2021, 2022, and 2023, the multi-parent RIL population was phenotyped for MWS at a single site in Yanshan County (YS), Yunnan Province, China. MWS breaks out from early August to late September each year, when maize is between grain filling and maturity, and resistance evaluation began at this stage [6]. Disease severity was graded according to the symptom criteria in the Maize Disease and Pest Manual [37], and each plot was assigned a disease score from 1 to 9 based on its mean severity (Figure 1, Table 2).
Figure 1. Leaves infected with maize white spot (MWS) at different severity grades. HR: highly resistant to MWS; R: resistant to MWS; MR: moderately resistant to MWS; S: susceptible to MWS; HS: highly susceptible to MWS.
Table 2. Classification standard for maize white spot disease

| Grade | Resistance class | Symptoms | Lesion coverage (%) |
| --- | --- | --- | --- |
| 1 | Highly resistant (HR) | No lesions or only a few lesions on the leaves | ≤ 5 |
| 3 | Resistant (R) | A few lesions on the leaves | |
| 5 | Moderately resistant (MR) | Many lesions on the leaves | |
| 7 | Susceptible (S) | Large numbers of lesions on the leaves, with lesions merging | |
| 9 | Highly susceptible (HS) | Leaves of the whole plant almost entirely covered by lesions | > 70 |
2.3 Phenotypic identification and statistical analysis
Descriptive statistics of the phenotypic data were computed with SPSS (SPSS Statistics 26) and Origin (Origin 2022), including the mean, minimum, maximum, standard deviation (SD), coefficient of variation (CV), skewness, and kurtosis. Frequency distributions of the phenotypic data were produced in SPSS, and kurtosis and skewness were used to assess their normality. Broad-sense heritability was calculated following the method of Knapp et al. [38].
In the heritability calculation, the number of environments is the total across years and locations, and r is the number of replications.
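The heritability formula itself does not survive in the text. As a minimal sketch, assuming the commonly used entry-mean form H² = σ²g / (σ²g + σ²ge/e + σ²ε/(e·r)), where e is the number of environments and r is the number of replications, the calculation could look like the following. The function name and the variance components in the example are hypothetical illustrations, not values from this study.

```python
def broad_sense_heritability(var_g, var_ge, var_err, n_env, n_rep):
    """Entry-mean broad-sense heritability (assumed form, not confirmed by the source).

    var_g   : genotypic variance
    var_ge  : genotype-by-environment variance
    var_err : residual (error) variance
    n_env   : number of environments (years x locations)
    n_rep   : number of replications per environment
    """
    if n_env <= 0 or n_rep <= 0:
        raise ValueError("n_env and n_rep must be positive")
    # Phenotypic variance of a genotype mean across environments and replicates
    phenotypic = var_g + var_ge / n_env + var_err / (n_env * n_rep)
    if phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic


# Hypothetical variance components for a single RIL population
h2 = broad_sense_heritability(var_g=2.0, var_ge=0.3, var_err=0.6, n_env=3, n_rep=2)
print(f"H2 = {h2:.3f}")
```

Dividing the interaction and error variances by the number of environments and replicates is what makes heritability rise as more years or blocks are added, which is one reason multi-year trials give more reliable estimates.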
2.4 Whole-genome resequencing
DNA was extracted from maize seedling leaves by a modified CTAB method [39] and subjected to whole-genome resequencing. Reads were aligned to the maize reference genome B73_RefGen_v5 to identify SNP markers, which were annotated with SnpEff.
2.5 Population analysis and LD decay
A distance matrix was calculated with TreeBeST (version 1.9.2) and used to build a phylogenetic tree by the neighbor-joining (NJ) method; bootstrap values were obtained from 1,000 replicates. Principal component analysis (PCA) was performed with GCTA [40], and the samples were plotted in two dimensions using the first two principal components. The degree of linkage disequilibrium between pairs of markers was calculated with PopLDdecay [41], and LD decay was plotted with the Plot_OnePop.pl script supplied with the software.
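To illustrate the pairwise LD statistic, the sketch below computes r² as the squared Pearson correlation between the genotype dosages (0, 1, 2) of two SNPs. This is a common approximation assumed here only for illustration; PopLDdecay's own calculation may differ, and the genotype vectors are made up.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two genotype dosage vectors.

    Genotypes are coded 0/1/2 (copies of the alternate allele); None marks
    a missing call, and samples missing at either SNP are skipped.
    """
    if len(snp_a) != len(snp_b):
        raise ValueError("genotype vectors must have the same length")
    pairs = [(a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two samples called at both SNPs")
    xs, ys = zip(*pairs)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a monomorphic SNP carries no LD information
    return (sxy * sxy) / (sxx * syy)


# Hypothetical genotypes for six RILs at two SNPs
snp1 = [0, 0, 2, 2, 0, 2]
snp2 = [0, 0, 2, 2, 2, 0]
print(round(genotype_r2(snp1, snp2), 4))
```

Averaging such r² values over marker pairs in bins of increasing physical distance gives the LD decay curve.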
2.6 Genome-wide association analysis
GWAS was performed with GEMMA [42] (http://www.xzlab.org/software.html) using 6,390,967 high-quality SNPs. A mixed linear model (MLM) was applied to the phenotypes from all environments. A significance threshold was used to identify SNPs significantly associated with MWS resistance, and SNPs reaching or exceeding the threshold were extracted with bedtools v1.7 [43]. Based on the B73_RefGen_v5 reference genome and its annotation, candidate genes associated with MWS were identified within 50 kb upstream and downstream of the significant SNPs.
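The candidate-gene screen can be sketched as an interval-overlap query: a gene qualifies if any part of it lies within 50 kb of a significant SNP. The `Gene` record, the function name, and the gene coordinates below are hypothetical; only the 50 kb flank and the SNP position (from SNP2-212792117) come from the text.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_snp(snp_chrom, snp_pos, genes, flank=50_000):
    """Return IDs of genes overlapping [snp_pos - flank, snp_pos + flank]."""
    window_start = max(1, snp_pos - flank)
    window_end = snp_pos + flank
    return [
        g.gene_id
        for g in genes
        if g.chrom == snp_chrom and g.start <= window_end and g.end >= window_start
    ]


# Toy annotation with made-up coordinates (not real B73_RefGen_v5 gene models)
annotation = [
    Gene("toy_gene_1", "2", 212_760_000, 212_765_000),  # inside the window
    Gene("toy_gene_2", "2", 212_840_000, 212_845_000),  # overlaps the window edge
    Gene("toy_gene_3", "2", 213_000_000, 213_004_000),  # too far away
    Gene("toy_gene_4", "4", 212_790_000, 212_795_000),  # wrong chromosome
]
print(genes_near_snp("2", 212_792_117, annotation))
```

Testing for overlap rather than full containment keeps genes that straddle the window boundary, which is the more inclusive of the two reasonable choices.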
2.7 Construction of the genetic linkage map
Linkage maps were constructed with JoinMap 4.0. The LOD threshold was determined by 1,000 permutation tests, and a QTL was considered significant when its LOD score reached this threshold. Linkage groups were defined by a LOD cutoff, and the genetic linkage map was built from the SNP markers that met it. Progeny genotypes of the six subpopulations were filtered at a completeness of 0.8 and a segregation-distortion threshold of 0.001 to give the population markers. Bins were then drawn from the population markers, one for every 15 markers showing no linkage, to give the final marker set. For each population, the bin markers were ordered with JoinMap 4.0, and genetic distances (cM) between markers were calculated with the Kosambi mapping function.
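The Kosambi mapping function converts a recombination fraction r between two markers into a map distance, d = 25 · ln((1 + 2r) / (1 − 2r)) in cM. A minimal sketch of the function and its inverse follows; the function names are illustrative and the code is independent of JoinMap.

```python
import math


def kosambi_cm(r):
    """Map distance in centimorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_recomb(d_cm):
    """Inverse mapping: recombination fraction for a map distance in cM."""
    if d_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance in cM is close to 100·r; the two diverge as r grows because the function allows for crossover interference.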
2.8 QTL mapping analysis
QTL mapping used the composite interval mapping (CIM) method in Windows QTL Cartographer 2.0. The MWS phenotypic data were integrated with the high-density genetic linkage map. SNP markers exceeding the missing-rate cutoff and loci with a minor allele frequency below 0.05 were excluded, and 0.21 cM was used to identify QTLs polymorphic between the parents. This threshold was determined from 1,000 random permutations at a set confidence interval (CI). A LOD threshold of 2.5 at the flanking markers was used to declare QTLs controlling MWS.
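The marker-exclusion step can be sketched as a per-marker filter on missing rate and minor allele frequency (MAF). The MAF cutoff of 0.05 is from the text; the missing-rate cutoff is not preserved there, so `max_missing` is a required argument and the 0.2 used in the example is purely hypothetical, as are the function name and the genotype vectors.

```python
def passes_marker_filters(genotypes, max_missing, min_maf=0.05):
    """Decide whether one SNP marker is kept for mapping.

    genotypes   : per-line dosages coded 0/1/2, with None for a missing call
    max_missing : highest tolerated fraction of missing calls (caller's choice)
    min_maf     : lowest tolerated minor allele frequency
    """
    n_total = len(genotypes)
    if n_total == 0:
        return False
    called = [g for g in genotypes if g is not None]
    missing_rate = (n_total - len(called)) / n_total
    if missing_rate > max_missing or not called:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf


# Hypothetical markers scored on a handful of RILs
balanced = [0, 2, 0, 2, None]    # 20% missing, MAF 0.5
rare = [0] * 40 + [2]            # MAF about 0.024
sparse = [0, None, None, 2]      # 50% missing
print([passes_marker_filters(m, max_missing=0.2) for m in (balanced, rare, sparse)])
```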
2.9 Functional annotation of genes and GO enrichment analysis
Candidate genes were first annotated and their functions predicted using the MaizeGDB, InterPro, UniProt, and NCBI databases. GO enrichment analysis of the candidate genes was then performed with the clusterProfiler package (v3.18.1) in R using default parameters.
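GO over-representation analysis of this kind is typically a one-sided hypergeometric test per term. The sketch below is a standalone illustration of that test and of the Rich Factor, written in plain Python rather than reproducing clusterProfiler; the function names and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k : candidate genes annotated to the GO term
    n : candidate genes in total
    K : background genes annotated to the GO term
    N : background genes in total
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # comb() returns 0 for impossible draws, so out-of-range terms vanish
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def rich_factor(k, K):
    """Candidate genes in the term divided by all genes annotated to the term."""
    if K <= 0:
        raise ValueError("term must contain at least one background gene")
    return k / K


# Hypothetical counts: 3 candidates drawn from 10 genes, 4 of which carry the term
p = enrichment_p_value(k=2, n=3, K=4, N=10)
print(f"p = {p:.4f}, Rich Factor = {rich_factor(2, 4):.2f}")
```

In practice the per-term p-values are adjusted for multiple testing before terms are reported as significant.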
2.10 Haplotype analysis
Haploview v4.2 was used to analyze genes that were detected consistently across environments or that have MWS-related functions.
3. Results
3.1 Phenotypic data analysis
MWS phenotypic data were collected over three years in Yanshan for the six RIL populations (pop1 to pop6) and summarized with descriptive statistics (Table 3). Across the three years, the coefficient of variation of the six populations ranged from 0.25 to 0.79, indicating differences among the lines, and the population means for MWS infection ranged from about 3 to 6. The absolute values of skewness and kurtosis for all six populations in all three years were close to or below 1, indicating that MWS scores were approximately normally distributed, as is typical of a quantitative trait. The heritability of MWS severity was estimated for each subpopulation (Table 3). Correlation analysis showed strong correlations (0.46-0.98) between the overall performance of the RILs in different environments within the same population (Table 3, Figure 2). These correlations indicate that the RILs responded consistently to MWS across environments, which reflects both the high heritability of MWS resistance and the reliability of the phenotypic data on which the subsequent GWAS was based.
Table 3. Descriptive statistics of MWS for the six RIL subpopulations

| Population | Environment | Mean | SD | CV | Skewness | Kurtosis | Heritability (%) | r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pop1 (YML32×Ye107) | 21YS | 5.764 | 1.533 | 0.267 | 0.264 | -0.213 | 76.47 | |
| | 22YS | 5.808 | 1.468 | 0.253 | -0.005 | -0.284 | | |
| | 23YS | 5.788 | 1.414 | 0.244 | -0.055 | -0.198 | | |
| Pop2 (TRL418×Ye107) | 21YS | 5.489 | 1.612 | 0.294 | 0.367 | -0.204 | 84.93 | |
| | 22YS | 5.358 | 1.689 | 0.315 | 0.347 | -0.398 | | |
| | 23YS | 5.455 | 1.574 | 0.289 | -0.046 | -0.217 | | |
| Pop3 (CML171×Ye107) | 21YS | 3.463 | 2.467 | 0.712 | 0.703 | -0.527 | 98.79 | |
| | 22YS | 3.626 | 2.508 | 0.692 | 0.570 | -0.774 | | |
| | 23YS | 3.844 | 2.488 | 0.647 | 0.452 | -0.882 | | |
| Pop4 (TML139×Ye107) | 21YS | 2.899 | 2.060 | 0.711 | 0.665 | -0.696 | 97.80 | |
| | 22YS | 3.201 | 2.241 | 0.700 | 0.537 | -0.921 | | |
| | 23YS | 3.491 | 2.325 | 0.666 | 0.390 | -1.056 | | |
| Pop5 (YML226×Ye107) | 21YS | 2.947 | 2.313 | 0.785 | 1.069 | 0.266 | 98.54 | |
| | 22YS | 3.066 | 2.386 | 0.778 | 0.923 | -0.151 | | |
| | 23YS | 3.395 | 2.522 | 0.743 | 0.725 | -0.633 | | |
| Pop6 (NK40-1×Ye107) | 21YS | 3.012 | 2.341 | 0.777 | 0.872 | -0.346 | 98.47 | |
| | 22YS | 3.252 | 2.418 | 0.744 | 0.710 | -0.644 | | |
| | 23YS | 3.459 | 2.433 | 0.703 | 0.574 | -0.838 | | |

21YS, 22YS, and 23YS: trials conducted in Yanshan in 2021, 2022, and 2023.
Figure 2. Correlation heat map of MWS resistance in the six subpopulations. The plot shows the correlation of overall RIL performance across environments within each of the six populations; the narrower the ellipse and the redder the color, the stronger the correlation.
3.2 SNP density and LD decay
Genome resequencing identified 6,390,967 high-quality genome-wide SNPs distributed across the 10 maize chromosomes. The SNP density map in Figure 3a shows marker density along each chromosome. The numbers of SNPs identified on chromosomes 1 to 10 were 1354402, 699661, 716354, 880880, 668865, 485243, 563392, 545220, 472482, and 462679, respectively. Chromosome 1 carried the most SNPs and chromosome 10 the fewest, and variant sites were present on all 10 chromosomes. The 6,390,967 SNPs were also used to assess linkage disequilibrium (LD) decay in the association mapping population. As shown in Figure 3b, the decay curve flattened at a physical distance of about 50 kb, so 50 kb was chosen as the window for screening candidate genes.
Figure 3. SNP density map and LD decay. (a) Chromosome-specific SNP density in 1 Mb windows; the vertical axis shows the chromosomes, the horizontal axis shows the position on each chromosome, and redder colors indicate more variant sites at that position. (b) Genome-wide LD decay across all chromosomes in the 941 maize RILs as a function of physical distance.
3.3 Population structure analysis
A phylogenetic tree built with TreeBeST divided the 941 RILs into six subgroups, and principal component analysis (PCA) with GCTA likewise separated them into six subgroups, labeled pop1 to pop6. The intermixed branches and overlapping clusters seen in places probably reflect the shared parent Ye107 and introgression during breeding (Figure 4a-c). Population structure, a possible source of false-positive associations in GWAS, was therefore accounted for in the subsequent analyses.
3.4 Genome-wide association analysis
Using GEMMA (http://www.xzlab.org/software.html), we ran GWAS on the BLUP values and on each Yanshan year, combining the three years of MWS phenotypic data for the 941 RILs of the multi-parent population with the 6,390,967 high-quality SNPs from whole-genome resequencing. A mixed linear model (MLM) corrected for both population structure and individual relatedness. In total, 679 SNPs were identified (Supplementary Table 1).
In the GWAS on BLUP values, we identified 183 significant SNPs on chromosomes 1, 2, 4, 6, 7, 8, 9, and 10 (Figure 5a). These SNPs explained -1.05% to 7.24% of the phenotypic variance; chromosome 2 carried the most significant SNPs and chromosome 10 the fewest (Supplementary Table 1).
In the GWAS for 21YS, we identified 109 significant SNPs on chromosomes 1, 2, 4, 6, 7, 8, 9, and 10 (Figure 5b). These SNPs explained -0.25% to 7.17% of the phenotypic variance; chromosome 2 carried the most significant SNPs and chromosome 5 the fewest (Supplementary Table 1).
In the GWAS for 22YS, we identified 183 significant SNPs on chromosomes 1, 2, 4, 6, 7, 8, 9, and 10 (Figure 5c). These SNPs explained -0.83% to 6.76% of the phenotypic variance; chromosome 2 carried the most significant SNPs and chromosome 3 the fewest (Supplementary Table 1).
In the GWAS for 23YS, we identified 204 significant SNPs on chromosomes 1, 2, 4, 6, 7, 8, 9, and 10 (Figure 5d). Chromosome 2 carried the most significant SNPs and chromosome 10 the fewest (Supplementary Table 1).
Notably, 39 of these significant SNPs, either co-located across multiple environments or detected in a single environment, were singled out (Supplementary Table 2). They are distributed on chromosomes 1, 2, 4, 6, 7, 8, and 9, with the most on chromosome 2 and the fewest on chromosome 7, and they serve as candidates for further functional gene identification.
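Finding SNPs that are co-located across environments amounts to counting, for each SNP, how many of the four GWAS runs (BLUP, 21YS, 22YS, 23YS) report it as significant. A minimal sketch follows; the function names and the SNP identifiers in the example are hypothetical and do not reproduce the study's results.

```python
from collections import Counter


def count_environments(hits_by_env):
    """Map each SNP ID to the number of environments in which it is significant."""
    counts = Counter()
    for snps in hits_by_env.values():
        counts.update(set(snps))  # set() guards against duplicates within a run
    return counts


def shared_snps(hits_by_env, min_envs=2):
    """Sorted SNP IDs that are significant in at least min_envs environments."""
    counts = count_environments(hits_by_env)
    return sorted(snp for snp, n in counts.items() if n >= min_envs)


# Toy significant-SNP sets per GWAS run (IDs are made up)
hits = {
    "BLUP": {"snp_a", "snp_b"},
    "21YS": {"snp_a"},
    "22YS": {"snp_a", "snp_c"},
    "23YS": {"snp_a", "snp_b"},
}
print(shared_snps(hits, min_envs=4))
print(shared_snps(hits))
```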
Figure 5. Manhattan plots and Q-Q plots. Manhattan plots (left) and Q-Q plots (right) for BLUP (a), YS21 (b), YS22 (c), and YS23 (d) showing SNPs associated with MWS resistance. Each point in the left panels represents one SNP, the black line marks the significance threshold, and different colors represent different chromosomes. The red line in the right panels is the trend line that an ideal Q-Q plot would follow in each case.
3.4.1 Candidate genes revealed by GWAS
Using the B73_RefGen_v5 reference genome, we screened the regions 50 kb upstream and downstream of the significant SNPs and annotated the candidate genes and predicted their functions with the MaizeGDB, InterPro, UniProt, and NCBI databases. This identified 52 candidate genes potentially associated with MWS resistance (Supplementary Table 3): 24 co-located in four environments, 13 in three environments, 9 in two environments, and 6 in a single environment. Most of these candidate genes have functional annotations.
To narrow down the final MWS resistance candidates, we carried out Gene Ontology (GO) enrichment analysis on the 52 candidate genes derived from the 39 significant SNPs (Figure 6). Because only the MaizeGDB database was used as the reference, only 38 genes were enriched (Figure 6a). In the biological process category, the most enriched terms were "cellular and metabolic processes," "positive and negative regulation of biological processes," "response to stimulus," and "signaling." In the molecular function category, "binding" was the most enriched second-level term, followed by "catalytic activity." In the cellular component category, "cellular anatomical entity" was the most enriched. As Figure 6b shows, "phosphotransferase activity," "protein kinase activity," "kinase activity," and "protein serine/threonine kinase activity" had the highest degree of enrichment and the highest significance.
Figure 6. GO enrichment classification. (a) Bar chart of GO secondary classification. The horizontal axis shows the number of enriched genes and the vertical axis shows GO terms. Green bars denote cellular component terms, purple bars denote molecular function terms, and orange bars denote biological process terms. (b) Bubble chart of GO significance. The Rich Factor is the ratio of the number of differentially expressed genes annotated to a term to the total number of genes annotated to that term; a larger Rich Factor indicates stronger enrichment. Bubble size indicates the number of genes enriched in the term, and bubble color indicates significance, with redder bubbles being more significant.
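The Rich Factor defined in the caption, and one common way to attach a significance value to it, can be sketched as below. The text does not say which statistical test produced the significance shown in the bubble chart, so the one-sided hypergeometric test here is an assumption. The function names and all gene counts are hypothetical.

```python
from math import comb


def rich_factor(k, n_term):
    """Study-set genes annotated to the term / all genes annotated to the term."""
    return k / n_term


def hypergeom_pvalue(k, n_study, n_term, n_background):
    """P(X >= k) when n_study genes are drawn without replacement from
    n_background genes, of which n_term are annotated to the term."""
    upper = min(n_study, n_term)
    total = comb(n_background, n_study)
    tail = sum(comb(n_term, i) * comb(n_background - n_term, n_study - i)
               for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only (not values from the study).
k = 6                 # study-set genes annotated to the term
n_study = 38          # genes in the study set
n_term = 200          # background genes annotated to the term
n_background = 30_000  # genes in the background

rf = rich_factor(k, n_term)
p = hypergeom_pvalue(k, n_study, n_term, n_background)
print(f"Rich Factor = {rf:.3f}, p = {p:.3g}")
```

A real analysis would also correct the p-values for testing many GO terms at once.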
Through the above analysis, we finally identified five candidate genes associated with MWS resistance (Table 4). Zm00001eb107230, located at SNP2-212792117, was detected in all four environments, explained 20.39% of the phenotypic variation, and is annotated as a Remorin family protein. Zm00001eb106290, located at SNP2-210547514, was detected only in the 23YS environment, explained 6.46% of the phenotypic variation, and is annotated as a Harpin-induced protein. Zm00001eb181850, located at SNP4-118983592, was detected in the 21YS, 23YS, and BLUP environments, explained 8.84% of the phenotypic variation, and is annotated as a leucine-rich repeat (LRR) family protein. Zm00001eb358700 and Zm00001eb358890, located at SNP8-149153434 and SNP8-149695036, were detected in all four environments, explained 22.4% and 15.05% of the phenotypic variation respectively, and are annotated as a heat stress transcription factor and a mitogen-activated protein kinase.
Table 4. Candidate genes

| Candidate Gene | Position | Chr | PVE (%) | Gene Annotation | Environment |
|---|---|---|---|---|---|
| Zm00001eb107230 | 212792117 | 2 | 20.39 | Remorin family protein | 21YS, 22YS, 23YS, BLUP |
| Zm00001eb106290 | 210547514 | 2 | 6.46 | Harpin-induced protein | 23YS |
| Zm00001eb181850 | 118983592 | 4 | 8.84 | Leucine-rich repeat (LRR) family protein | 21YS, 23YS, BLUP |
| Zm00001eb358700 | 149153434 | 8 | 22.4 | Heat stress transcription factor C-1a | 21YS, 22YS, 23YS, BLUP |
| Zm00001eb358890 | 149695036 | 8 | 15.05 | Mitogen-activated protein kinase | 21YS, 22YS, 23YS, BLUP |
3.4.2 Haplotype Analysis
Haplotype analysis of the candidate genes screened above was performed with Haploview v4.2. The main haplotype types and counts for each candidate gene are listed in Supplementary Table 4. Zm00001eb107230 had 5 main haplotypes, and plants carrying Hap2 (AAGT) showed better resistance to maize white spot disease (Figure 7b). Zm00001eb106290 had 4 main haplotypes, and plants carrying Hap2 (AGCCGG) showed better resistance (Figure 8b). Zm00001eb181850 had 7 haplotypes, and plants carrying Hap8 (CTCGCGGTGCTTAACCATGTGCCAA) showed better resistance (Figure 9b). Zm00001eb358700 had 7 haplotypes, and plants carrying Hap4 (CCACC) showed better resistance (Figure 10b). Zm00001eb358890 had 7 haplotypes, and plants carrying Hap3 (GGGCGTGCGGGCAG) showed better resistance (Figure 11b). RNA sequencing data showed that Zm00001eb106290, Zm00001eb107230, Zm00001eb181850, Zm00001eb358700, and Zm00001eb358890 are expressed during leaf-related growth processes, further supporting a possible association of the five candidate genes with MWS resistance (Figures 7c, 8c, 9c, 10c, 11c).
Figure 7. Haplotype and related analysis of Zm00001eb107230. (a) Location of the candidate gene. (b) Overall differences in MWS resistance among the 5 haplotypes; asterisks indicate statistical significance. (d) Expression levels of Zm00001eb107230 in various tissues; the orange box highlights expression in leaves.
Figure 8. Haplotype and related analysis of Zm00001eb106290. (a) Location of the candidate gene. (b) Overall differences in MWS resistance among the 4 haplotypes; asterisks indicate statistical significance. (d) Expression levels of Zm00001eb106290 in various tissues; the orange box highlights expression in leaves.
Figure 9. Haplotype and related analysis of Zm00001eb181850. (a) Location of the candidate gene. (b) Overall differences in MWS resistance among the 8 haplotypes; asterisks indicate statistical significance. (d) Expression levels of Zm00001eb181850 in various tissues; the orange box highlights expression in leaves.
Figure 10. Haplotype and related analysis of Zm00001eb358700. (b) Overall differences in MWS resistance among the haplotypes; asterisks indicate statistical significance. (d) Expression levels of Zm00001eb358700 in various tissues; the orange box highlights expression in leaves.
Figure 11. Haplotype and related analysis of Zm00001eb358890. (a) Location of the candidate gene. (b) Overall differences in MWS resistance among the 8 haplotypes; asterisks indicate statistical significance. (d) Expression levels of Zm00001eb358890 in various tissues; the orange box highlights expression in leaves.
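The haplotype comparison in this subsection amounts to grouping lines by the allele string they carry and comparing disease scores between groups. The sketch below illustrates only that grouping step. It is not Haploview, which the study used, and the function `haplotype_means`, the records, and the scoring scale (lower score assumed to mean more resistant) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def haplotype_means(records):
    """Group lines by haplotype string; return {haplotype: (n_lines, mean_score)}."""
    groups = defaultdict(list)
    for hap, score in records:
        groups[hap].append(score)
    return {hap: (len(scores), mean(scores)) for hap, scores in groups.items()}


# Hypothetical (haplotype, disease score) pairs, one per inbred line.
# Assumption: a lower score means fewer symptoms, i.e. better resistance.
records = [
    ("AAGT", 2.0), ("AAGT", 3.0),
    ("GGGT", 6.0), ("GGGT", 7.0),
    ("AAGC", 5.0),
]

summary = haplotype_means(records)
best = min(summary, key=lambda hap: summary[hap][1])
for hap, (n, avg) in sorted(summary.items()):
    print(f"{hap}: n={n}, mean score={avg:.2f}")
print("lowest mean score:", best)
```

Whether a difference between haplotype groups is meaningful still requires a significance test, as indicated by the asterisks in panel (b) of Figures 7 to 11.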
3.5 Linkage Analysis
3.5.1 Population Marker Analysis
We constructed genetic linkage maps for the six subpopulations (Pop1, Pop2, Pop3, Pop4, Pop5, and Pop6), which together comprised 7418 bin markers (Supplementary Table 5). The Pop1 map spanned 1071.97 cM with an average marker distance of 0.65 cM; its longest chromosome was chromosome 1 (163.88 cM) and its shortest was chromosome 10 (80.43 cM). The Pop2 map spanned 721.56 cM with an average marker distance of 0.72 cM; its longest chromosome was chromosome 7 (120.12 cM) and its shortest was chromosome 2 (34.51 cM). The Pop3 map spanned 422.44 cM with an average marker distance of 0.59 cM; its longest chromosome was chromosome 4 (66.78 cM) and its shortest was chromosome 9 (22.84 cM). The Pop4 map spanned 588.07 cM with an average marker distance of 0.65 cM; its longest chromosome was chromosome 1 (103.77 cM) and its shortest was chromosome 6 (28.64 cM). The Pop5 map spanned 1358.94 cM with an average marker distance of 0.62 cM; its longest chromosome was chromosome 5 (205.41 cM) and its shortest was chromosome 6 (75.57 cM). The Pop6 map spanned 684.2 cM with an average marker distance of 0.65 cM; its longest chromosome was chromosome 4 (137 cM) and its shortest was chromosome 10 (63 cM).
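As a rough consistency check on the figures above, the number of markers on each map can be approximated as map length divided by average marker distance, and the sum compared with the reported 7418 bin markers. This is only an approximation under stated assumptions: it ignores that a chromosome with n markers has n - 1 intervals, and the reported averages are rounded. The variable names are illustrative.

```python
# (total map length in cM, average marker distance in cM), as reported in the text
maps = {
    "Pop1": (1071.97, 0.65),
    "Pop2": (721.56, 0.72),
    "Pop3": (422.44, 0.59),
    "Pop4": (588.07, 0.65),
    "Pop5": (1358.94, 0.62),
    "Pop6": (684.2, 0.65),
}

REPORTED_BIN_MARKERS = 7418  # total across the six subpopulations, from the text

# Approximate marker count per map: length / average spacing
approx_markers = {pop: length / spacing for pop, (length, spacing) in maps.items()}
total_estimate = sum(approx_markers.values())
relative_gap = abs(total_estimate - REPORTED_BIN_MARKERS) / REPORTED_BIN_MARKERS

for pop, n in approx_markers.items():
    print(f"{pop}: about {n:.0f} markers")
print(f"estimated total: {total_estimate:.0f}, relative gap: {relative_gap:.1%}")
```

The estimate lands within a few percent of the reported total, which suggests the per-population lengths and spacings are mutually consistent.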
3.5.2 QTL Mapping Analysis
To identify QTLs associated with MWS resistance, we performed QTL mapping on the six populations (Pop1, Pop2, Pop3, Pop4, and Pop6) in different environments. A total of 42 significant QTLs were identified on chromosomes 1, 4, 6, 7, 8, 9, and 10 (Supplementary Table 6). In BLUP, 9 significant QTLs were detected, explaining 6.01% to 9.66% of the phenotypic variance; qMWS1-1, qMWS1-2, qMWS4-1, and qMWS7-1 had negative additive effects, whereas qMWS7-2, qMWS9-1, qMWS9-2, and qMWS9-3 had positive additive effects. In 21YS, 10 significant QTLs were detected; only qMWS7-3, qMWS7-4, and qMWS9-4 had positive additive effects, and the others had negative additive effects. In 22YS, 14 significant QTLs were detected, explaining 5.73% to 11.57% of the phenotypic variance; only qMWS4-4, qMWS6-4, qMWS7-7, qMWS7-8, qMWS7-9, qMWS9-5, and qMWS9-6 had positive additive effects, and the others had negative additive effects. In 23YS, 9 significant QTLs were detected, explaining 5.8% to 12.07% of the phenotypic variance; qMWS4-7, qMWS4-8, qMWS6-5, qMWS7-10, and qMWS9-8 had positive additive effects, and the others had negative additive effects.
We then compared the GWAS and QTL mapping results (Table 5). A total of 26 genes were co-located by both approaches, on chromosomes 1, 4, and 7, with the most on chromosome 4 and the fewest on chromosome 1. SNP1-127437679 co-located with qMWS1-1 and qMWS1-4 (1 gene); SNP4-14146725, SNP4-14319548, SNP4-13843404, and SNP4-29691239 co-located with qMWS4-4 and qMWS4-5 (10 genes); SNP4-76323495 and SNP4-68326051 co-located with qMWS4-2 (3 genes); SNP4-183082843 and SNP4-174551447 co-located with qMWS4-8, qMWS4-1, qMWS4-3, and qMWS4-9 (8 genes); SNP7-72694269 co-located with qMWS7-11 (3 genes); and SNP7-146193699 co-located with qMWS7-2, qMWS7-3, qMWS7-8, and qMWS7-10 (1 gene).
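One simple way to make the GWAS and QTL comparison concrete is to ask whether a significant SNP falls inside the physical interval of a QTL on the same chromosome. The text does not state the exact co-localization criterion, so this interval rule is an assumption. The SNP positions below come from the text, but the QTL names and interval boundaries are hypothetical placeholders.

```python
def colocalized(snps, qtls):
    """Pair each SNP with every QTL whose physical interval contains it.

    snps: list of (snp_id, chrom, pos_bp)
    qtls: list of (qtl_id, chrom, start_bp, end_bp)
    """
    pairs = []
    for snp_id, s_chrom, pos in snps:
        for qtl_id, q_chrom, start, end in qtls:
            if s_chrom == q_chrom and start <= pos <= end:
                pairs.append((snp_id, qtl_id))
    return pairs


# SNP IDs and positions as given in the text.
snps = [
    ("SNP7-72694269", 7, 72_694_269),
    ("SNP4-14146725", 4, 14_146_725),
]

# Hypothetical QTL intervals, for illustration only (not values from the study).
qtls = [
    ("qtl_A", 7, 70_000_000, 75_000_000),
    ("qtl_B", 4, 20_000_000, 25_000_000),
    ("qtl_C", 1, 10_000_000, 80_000_000),
]

pairs = colocalized(snps, qtls)
print(pairs)
```

Genes within the flanking window of each co-located SNP would then form the co-located candidate set.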
Using the B73_RefGen_v5 reference genome, the candidate genes were annotated and functionally predicted with the MaizeGDB, InterPro, UniProt, and NCBI databases, and Gene Ontology (GO) enrichment analysis was performed on the 26 co-located candidate genes (Figure 12). In the GO secondary classification bar chart (Figure 12a), the most enriched biological process terms were "metabolic and cellular processes", "response to stimulus", "regulation of biological process and biological regulation", and "signaling". In the molecular function category, "binding" was the most enriched secondary term, followed by "catalytic activity". In the cellular component category, "cellular anatomical entity" predominated. In the GO significance bubble chart (Figure 12b), "S-methyl-5-thioribose-1-phosphate isomerase activity", "nucleosome mobilization", and "CHRAC" showed the highest enrichment and the highest significance.
Figure 12. GO enrichment classification. (a) Bar chart of GO secondary classification. The horizontal axis shows the number of enriched genes and the vertical axis shows GO terms. Green bars denote cellular component terms, purple bars denote molecular function terms, and orange bars denote biological process terms. (b) Bubble chart of GO significance. The Rich Factor is the ratio of the number of differentially expressed genes annotated to a term to the total number of genes annotated to that term; a larger Rich Factor indicates stronger enrichment. Bubble size indicates the number of genes enriched in the term, and bubble color indicates significance, with redder bubbles being more significant.
Table 5. Genes co-located by GWAS and QTL mapping

| Gene ID | PVE (%) | POS | Chr | QTL | QTL-Loc | GWAS-Loc | Annotation |
|---|---|---|---|---|---|---|---|
| Zm00001eb026960 | 2.08 | 127437679 | 1 | qMWS1-1, qMWS1-4 | Pop2-BLUP, Pop2-23YS | 23YS | Deoxyribodipyrimidine photo-lyase |
| Zm00001eb168490 | 15.14 | 14146725 | 4 | | | 21YS/22YS/23YS/BLUP | Methyltransferase |
| Zm00001eb168500 | 15.14 | 14146725 | 4 | | | 21YS/22YS/23YS/BLUP | |
| Zm00001eb168510 | 15.14 | 14146725 | 4 | | | 21YS/22YS/23YS/BLUP | Cysteine proteinases superfamily protein |
| Zm00001eb168530 | 4.45 | 14319548 | 4 | qMWS4-4 | Pop6-22YS | 23YS/BLUP | |
Through the above analysis, we finally identified two candidate genes associated with MWS resistance that were co-located by GWAS and QTL mapping. Both genes, Zm00001eb308270 and Zm00001eb308280, are located at SNP7-72694269 and are annotated as a Kelch-type F-box protein and an NF-Y transcription factor, respectively. They overlap with qMWS7-11, which was detected in Pop6-23YS.
Haplotype analysis was performed on the two co-located candidate genes screened above; the important haplotypes of the candidate genes and