Main  主要

Paralleling the obesity epidemic, steatotic liver disease (SLD) is a growing burden worldwide. SLD includes a spectrum of conditions characterized by an excess of lipids, mainly triglycerides, stored in intracellular lipid droplets in the liver, potentially progressing to inflammation, fibrosis and ultimately to cirrhosis and liver cancer1. SLD is a heterogenous disease coexisting with a metabolic derangement, including visceral adiposity, insulin resistance and hypertension, namely, metabolic dysfunction-associated SLD (or MASLD). This metabolic derangement ultimately increases the risk of cardiovascular events, including heart failure, and also increases kidney disease2,3,4. Indeed, cardiovascular disease is the most frequent cause of death in individuals with MASLD, whereas liver-related death is less frequent; however, it is a common clinical observation that some individuals develop a rapidly progressing liver disease despite similar or even less-marked metabolic derangement.
与肥胖流行病并行,脂肪性肝病(SLD)是全球日益增长的负担。SLD 包括一系列以肝脏细胞内脂滴中脂质(主要是甘油三酯)过多储存为特征的疾病,可能进展为炎症、纤维化,最终导致肝硬化甚至肝癌 1 。SLD 是一种异质性疾病,与代谢紊乱共存,包括内脏脂肪、胰岛素抵抗和高血压,即代谢功能障碍相关脂肪性肝病(或 MASLD)。这种代谢紊乱最终增加了心血管事件的风险,包括心力衰竭,同时也增加了肾脏疾病 2,3,4 。事实上,心血管疾病是 MASLD 患者死亡的最常见原因,而与肝脏相关的死亡较少见;然而,临床观察中常见一些个体尽管代谢紊乱相似甚至更轻微,但仍然会迅速发展出肝病。

MASLD has a strong inherited component; several variants that increase primarily liver lipids by impairing hepatocyte lipid droplet remodeling and lipoprotein secretion also cause the progression of MASLD5; however, contrarily to the epidemiological evidence, these variants result in a protection against cardiovascular disease and no association with hypertension5,6,7 or heart failure, suggesting no causal relationship between MASLD and cardiometabolic derangement5.
MASLD 具有强烈的遗传成分;几种变异主要通过损害肝细胞脂滴重塑和脂蛋白分泌,主要增加肝脏脂质,导致 MASLD 的进展 5 ;然而,与流行病学证据相反,这些变异导致对心血管疾病的保护作用,且与高血压 5,6,7 或心力衰竭无关联,表明 MASLD 与心代谢紊乱之间无因果关系 5

Over the last 15 years, genome-wide association studies (GWAS) identified several genetic loci associated with chronic liver disease or proxies for increased liver triglyceride content8,9,10,11,12,13. Excess in adiposity amplifies the effect size of a handful of variants14 likely by increasing ectopic visceral fat. To improve the precision of genetic studies and to identify genetic variants with primary effects on the liver, independent of adiposity, GWAS analyses are typically adjusted for body mass index (BMI); however, anthropometric measures of adiposity (BMI) and body fat distribution (waist circumference) fail to provide an accurate quantification of visceral adiposity, which is most closely related to insulin resistance and metabolic alterations. Therefore, standard adjustments for BMI may fail to capture and remove the total effect of adiposity on liver fat, limiting the precision of GWAS. In contrast, imaging (for example, visceral adipose volume) and bioelectrical impedance analysis (for example, whole-body fat mass) are more accurate measurements of body composition and are better predictors of MASLD15. We thus reasoned that adjusting for these traits could better capture the effect of adiposity on liver fat, thereby improving the power to detect previously unknown loci contributing to SLD.
过去 15 年中,全基因组关联研究(GWAS)确定了与慢性肝病或肝脏甘油三酯含量增加的遗传位点。肥胖过剩会放大一些变异的效果大小,可能通过增加异位内脏脂肪。为了提高遗传研究的精确性并识别对肝脏有主要影响的遗传变异,独立于肥胖,GWAS 分析通常调整体重指数(BMI);然而,肥胖的体测量(BMI)和体脂分布(腰围)无法准确量化内脏脂肪,这与胰岛素抵抗和代谢改变最为密切。因此,标准的 BMI 调整可能无法捕捉和消除肥胖对肝脏脂肪的总效应,限制了 GWAS 的精确性。相比之下,成像(例如,内脏脂肪体积)和生物电阻抗分析(例如,全身脂肪质量)是更准确的身体成分测量方法,也是 MASLD(2#)更好的预测指标。 因此,我们推断,调整这些特征可以更好地捕捉肥胖对肝脏脂肪的影响,从而提高检测先前未知对 SLD 有贡献的位点的效力。

Here, we show that indices of adiposity differentially contribute to the association between genetic variants and liver triglyceride content/inflammation, and we leverage these indices to identify previously unknown genetic loci associated with SLD. We identified and replicated six previously unknown loci and generated two partitioned polygenic risk scores (pPRSs) that suggest the presence of at least two distinct types of MASLD, one confined to the liver and one entwined in the systemic cardiometabolic syndrome.
这里,我们表明肥胖指数在不同程度上对遗传变异与肝脏甘油三酯含量/炎症之间的关联有贡献,并利用这些指数来识别与 SLD 相关的先前未知的遗传位点。我们鉴定并复制了六个先前未知的位点,并生成了两个分区的多基因风险评分(pPRSs),这表明至少存在两种不同类型的 MASLD,一种局限于肝脏,另一种与全身性心血管代谢综合征交织在一起。

Results  结果

Visceral adipose tissue, whole-body fat mass and BMI are independent predictors of liver triglyceride content and inflammation/fibrosis
内脏脂肪组织、全身脂肪质量和 BMI 是肝脏甘油三酯含量和炎症/纤维化的独立预测因子

To identify the independent predictors of liver triglyceride content and inflammation/fibrosis among the indices of adiposity, we examined the pairwise correlations among different measures of adiposity and (1) liver triglyceride content measured by magnetic resonance imaging (MRI)-derived proton density fat fraction (PDFF); and (2) liver inflammation/fibrosis measured by liver iron corrected T1 (cT1) in European participants from the UK Biobank (Extended Data Fig. 1a). The strongest correlation with liver traits was observed for visceral adipose tissue (VAT) volume followed by BMI, waist-to-hip ratio (WHR) and whole-body fat mass (WFM). As expected, there was a high correlation between PDFF and cT1. Due to high multicollinearity among the adiposity indices, we used three penalized regression models (Methods) to assess their predictive contribution to PDFF and cT1. The standardized coefficients from the best performing algorithm (Ridge regression) showed that VAT was the strongest independent predictor of PDFF and cT1, followed by WFM and BMI for PDFF (Extended Data Fig. 1b). In the penalized regression analysis, WHR and impedance of whole body had almost no independent predictive power and therefore, we used WFM, BMI and VAT as covariates in the genetic association studies.
为了确定肥胖指数中独立预测肝脏甘油三酯含量和炎症/纤维化的因素,我们检查了不同肥胖测量指标之间的成对相关性,包括(1)通过磁共振成像(MRI)衍生的质子密度脂肪分数(PDFF)测量的肝脏甘油三酯含量;以及(2)通过肝脏铁校正 T1(cT1)测量的肝脏炎症/纤维化,研究对象为来自英国生物样本库的欧洲参与者(扩展数据图 1a)。与肝脏特征最强的相关性出现在内脏脂肪组织(VAT)体积,其次是 BMI、腰臀比(WHR)和全身脂肪质量(WFM)。正如预期的那样,PDFF 与 cT1 之间存在高度相关性。由于肥胖指数之间高度多重共线性,我们使用了三种惩罚回归模型(方法)来评估它们对 PDFF 和 cT1 的预测贡献。最佳性能算法(岭回归)的标准系数表明,VAT 是 PDFF 和 cT1 最强的独立预测因子,其次是 WFM 和 BMI(扩展数据图 1b)。 在惩罚回归分析中,腰臀比(WHR)和全身阻抗几乎不具有独立的预测能力,因此,我们在遗传关联研究中使用了 WFM、BMI 和 VAT 作为协变量。

Identification of 17 previously unknown loci for liver triglyceride content and 9 for liver inflammation/fibrosis by the multi-adiposity-adjusted GWAS

To capitalize on the independent contribution of indices of adiposity to proxies of liver triglyceride content (PDFF) and inflammation/fibrosis (cT1), we conducted eight GWAS (four GWAS per each trait), together referred to as the multi-adiposity-adjusted GWAS. Each GWAS was adjusted for a specific index of adiposity (VAT, BMI and WFM), and one unadjusted (Supplementary Table 1)16. Genetic heritability estimates for PDFF showed that adjustment for adiposity index explains up to 6% more heritability compared to the unadjusted model (Supplementary Table 2). For cT1, all adiposity measurements yielded similar results to the unadjusted model. These data suggest that liver triglyceride content is dependent on adiposity, whereas inflammation is less correlated. Furthermore, genetic correlations among different adiposity adjustments showed that BMI and WFM adjustments shared the largest overlap for both PDFF and cT1 (Fig. 1a and Supplementary Table 3), consistent with the epidemiological correlation.

Fig. 1: Overview of the identified loci for liver triglycerides and inflammation/fibrosis by the multi-adiposity-adjustment GWAS.
figure 1

a, Genetic correlation among different multi-adiposity-adjusted PDFF and liver iron corrected T1 was estimated using LD score regression analysis. The asterisks denote Benjamini–Hochberg false discovery rate (FDR) <0.05. The color bar represents the genetic correlation values. Detailed summary statistics for genetic correlations have been reported in Supplementary Table 3. b, Circular Manhattan plot of PDFF and liver iron corrected T1 for different adiposity adjustments. The association analyses were performed using REGENIE adjusting for adiposity index, age, sex, age × sex, age2 and age2 × sex, first ten genomic principal components and array batch. Each dot represents an independent genetic locus. Yellow represents loci associated with liver PDFF and purple represents those associated with liver cT1. Large dots represent pleiotropic loci (where the association with either PDFF or liver cT1 was shared among two or more adiposity adjustments). Small dots show adiposity-trait specific associations. Loci in bold are shared among both traits irrespective of the adiposity adjustment. Only loci with a genome-wide significant P <5 × 10−8 calculated by a whole-genome regression model (Methods) are shown. P values were two-sided and not corrected for multiple testing among four different models (unadjusted, adjusted for BMI, WFM and VAT).

Statistically independent genetic loci for each adiposity-adjusted GWAS were identified by linkage disequilibrium (LD) clumping17 and conditional analysis (Methods)18. Next, we performed pleiotropic analysis19 to identify independent genetic loci from the four adiposity adjustments. In this context, pleiotropic analysis refers to genetic loci that are shared among more than two adiposity adjustments for each liver trait (Supplementary Table 4). Specifically, we assigned the same locus number to lead variants from adiposity-adjusted GWAS (four for PDFF and four for cT1) located within 1 Mb of each other, provided that more than two GWAS lead variants were in LD (r2 > 0.2). Finally, the strongest (lowest GWAS P value) association at each locus was selected as the independent lead variant for that trait (PDFF or cT1).

A total of 37 and 18 independent genetic loci for PDFF and cT1 reached the genome-wide significance level (P < 5 × 10−8 not adjusted for the number of GWAS carried out), respectively (Fig. 1b and Table 1). Multiple loci showed the strongest associations when adjusted for specific adiposity indices (Supplementary Table 5).

Table 1 Genome-wide significant loci for multi-adiposity-adjusted PDFF and liver iron corrected T1 in the UK Biobank

We found 17 and 9 previously unknown genetic loci associating with PDFF and cT1, respectively (Methods, Table 1 and Supplementary Table 6). Four loci (PNPLA3, TM6SF2, GPAM and HFE/SLC17A3) were associated with both traits with at least one adiposity adjustment; however, only PNPLA3 and TM6SF2 loci were associated with both traits at a genome-wide level irrespective of the adjustment (Fig. 1b).

Identification of the putative causal loci associated with liver traits

To identify the putative causal loci, we fine-mapped the independent genome-wide significant loci associated with adiposity-adjusted PDFF and cT1. Independent lead variants at multiple loci had a posterior inclusion probability (PIP) > 0.95, suggesting that these GWAS lead variants are causal variants (Supplementary Table 7). Notably, a missense variant on ADH1B (rs1229984) had a PIP of 1 at ADH1B, MTTP and RP11-766F14.2 loci, suggesting that the observed effect from all three loci may derive from the same putative causal variant. In fact, ADH1B rs1229984 and MTTP rs11937107 have a D′ of 1 in Europeans20 (Supplementary Table 7).

To examine whether the set of independent variants could potentially perturb the gene expression patterns of nearby genes, we performed a Bayesian colocalization (Methods). We were able to colocalize 13 and 7 GWAS signals with at least one eQTL evidence for PDFF and cT1, respectively (Supplementary Table 8).

Functional analyses of independent loci associated with liver traits

Independent genetic loci for adiposity-adjusted PDFF and cT1 were mapped to genes and ranked using multiple approaches (Methods). Out of 37 and 18 independent loci for PDFF and cT1, respectively, the majority (31 and 12) loci had the highest rank for the nearest genes (Supplementary Table 9). For the remaining loci, multiple candidate genes were found. To gain a deeper understanding of the biological implications of genome-wide significant loci, we conducted a functional gene-set enrichment analysis using mapped genes with the highest evidence (Supplementary Table 10). Mapped genes for PDFF were enriched in genes mostly expressed in liver and they were involved in lipid metabolism (Supplementary Table 10a and Supplementary Fig. 1a). Conversely, mapped genes for liver iron corrected T1 were enriched in metal ion metabolism (Supplementary Table 10b and Supplementary Fig. 1b).

Previously unknown genetic loci, liver and metabolic traits

Given the causal relationship between liver triglyceride content and inflammation/fibrosis, we examined the association of the previously unknown variants identified by PDFF with cT1 and vice versa. Notably, most of the variants were associated with both traits and directionally concordant (Extended Data Fig. 2). This is consistent with the notion that liver triglyceride content causes inflammation21. A total of 5 (29 %) and 4 (44 %) loci were associated with either PDFF or cT1, suggesting a specificity of the effect on lipid or inflammation pathways. Furthermore, we examined the association between these previously unknown variants and indices of liver damage, fibrosis and liver disease (Extended Data Fig. 2 and Supplementary Table 11). More than 80% of variants associated with PDFF also associated with alanine aminotransferase (ALT) (one-sided Fisher’s exact test P = 0.028); however, there was no significant correlation between PDFF and cT1 loci associations with aspartate aminotransferase (one-sided Fisher’s exact test P = 0.613). Most variants were associated with plasma lipoproteins and glucose metabolism traits, including diabetes (Extended Data Fig. 2 and Supplementary Table 11).

Indices of adiposity contribute differentially to the association between genetic variants and liver triglycerides

Adiposity is a well-known risk factor for MASLD and there is no evidence on causal impact of SLD on adiposity22. Hence, it is safe to assume that adjusting for adiposity does not suffer from collider bias. Therefore, we hypothesized that the association between PDFF or cT1 and genetic loci depends on the measures of adiposity. To explore this, we performed different statistical analyses (Methods). As reported in Supplementary Tables 12 and 13, while the overall associations are consistent across multi-adiposity-adjusted GWAS, some loci display different associations depending on the adiposity adjustment.

For instance, we found no association between PDFF and rs73026242CEBPG with BMI or WFM adjustments, but a strong genome-wide association with VAT adjustment. This locus has a strong association with VAT but in the opposite direction to that of PDFF, and mediation analysis suggests an inconsistent mediation, namely, a partial mediation in the opposite direction. This locus has been recently linked with visceral to abdominal subcutaneous fat ratio15. Our gene mapping suggests that CEBPA, known for its role in adipogenesis through PPARγ, is the potential causal gene (Supplementary Table 9a)23.

Conversely, while the PPARG locus shows evidence of interaction with all three adiposity measures, a putative inconsistent effect was only observed for WFM adjustment. This variant decreases PPARγ activity24 and confers a protection against diabetes25,26. While the contribution of this variant to SLD is a matter of debate27, we observed a modest positive association with WFM. Another interesting finding is FAM101A locus, where there is a nominally significant association with BMI/WFM and VAT but in opposite directions. Hence, adjusting for VAT mitigated the association (PVAT = 0.01). The top-ranked gene at this locus, CCDC92, has been shown to play a role in insulin resistance and subcutaneous adipose and peripheral fat28. On the other hand, we also encounter the opposite scenario, where adiposity may act as a positive partial mediator, as for the PRMT8, MAST3 and CKM loci.

For cT1, while most loci have consistent associations over different adiposity adjustments, the mediation analysis suggests a putative inconsistent effect for VAT-adjusted model at PEPD locus. The top-ranked gene at this locus is CEBPA; however, whether the putative mechanism is similar to the one described above for liver triglyceride content is unclear because PEPD has a similar rank and is associated with diabetes and adiposity29.

Finally, we performed sensitivity analysis to examine the robustness of mediation analysis to unmeasured confounders, and observed a strong (for PDFF, ρ = 0.6) and moderate (for cT1, ρ = 0.4) robustness of mediation estimates to the sequential ignorability assumption30. Given the causality assumptions in mediation analysis, these potential mechanisms should be considered with caution. Furthermore, due to the statistical equivalence of mediation and confounding effects31, the observed inconsistent mediations may be interpreted as negative confounding effects that may enable the discovery of unknown genetic loci.

Considering this extensive body of evidence, measures of adiposity contribute to the association between genetic loci and proxies of liver triglyceride content (PDFF) and inflammation/fibrosis (cT1), thus supporting our multi-adiposity-adjusted GWAS approach.

The association between six previously unknown loci and liver triglyceride content was replicated in independent cohorts

Based on the strong genetic correlation between PDFF and cT1, to validate the previously unknown SNPs, we meta-analyzed the association between all the previously unknown 26 variants and liver triglyceride content in 3,903 individuals of European ancestry from four independent cohorts (Fig. 2 and Supplementary Table 14). We were able to replicate the association between six of the previously unknown loci (CEBPG, TSC22D2, ABO, GUSB, TECTB and TFCP2) and liver triglyceride content. The direction of the association in the replication cohort was consistent with the discovery cohort.

Fig. 2: The association between six previously unknown loci and hepatic triglyceride content in independent cohorts.
figure 2

The association between each genetic variant and rank-based inverse normal transformed hepatic triglyceride content was performed using a linear regression analysis adjusted for age, sex, age2, age × sex, age2 × sex (shown as circles). Proxy variants were used for variants not available in the replication cohorts (r2 > 0.4 within a window of 1.5 Mb around each lead variant in the UK Biobank) as marked with an asterisk. Pooled effect estimates were calculated using inverse-variance-weighted fixed-effect meta-analysis (shown as diamonds). Genomic loci in bold are those with a P value <0.05 in the fixed-effects model. Error bars represent the 95% confidence intervals from the regression models or meta-analysis. Full summary statistics have been reported in Supplementary Table 14. P values are two-sided and not adjusted for multiple testing. NEO, Netherlands Epidemiology of Obesity study; DHS, Dallas Heart Study.

Partitioned polygenic risk scores identify a steatotic liver-specific disease and a systemic MASLD

Triglyceride secretion is a key mechanism regulating hepatocyte triglyceride homeostasis. Triglyceride secretion is mediated by very-low-density lipoprotein (VLDL) secretion that in fasting conditions are proxied by circulating triglyceride levels. Variants in genes hampering VLDL secretion, including APOB, MTTP, TM6SF2 and PNPLA3, cause retention of liver triglycerides mirrored by lower circulating lipoproteins. Carriers of these variants have an increased risk of MASLD and lower risk for cardiovascular disease due to lower lipoproteins32,33,34.

Based on this mechanism, we allocated the previously unidentified replicated and the previously known variants into two pPRSs: (1) a group showing discordant association between PDFF and circulating triglycerides (n = 10), suggesting that liver triglyceride content is primarily influenced by liver retention, and (2) a group with concordant associations (n = 13), indicating that liver triglyceride content may result from an increase in uptake, synthesis of energy substrates or a reduction in β-oxidation (Methods, Extended Data Fig. 3 and Supplementary Table 15). The variance explained by the discordant pPRS was higher than the concordant pPRS mirroring the fact that discordant pPRS is composed of PNPLA3 and TM6SF2 variants (Supplementary Table 16).

Both pPRSs were associated with an increased risk of MASLD with the largest association being with hepatocellular carcinoma (HCC); however, the association was stronger for the discordant pPRS (Fig. 3a, Supplementary Fig. 2 and Supplementary Tables 17 and 18). Of note, only the discordant pPRS was associated with autoimmune liver diseases.

Fig. 3: Partitioned polygenic risk scores identify a steatotic liver-specific disease and a systemic MASLD.
figure 3

a,b, The case–control (a) and prospective (b) association between two PDFF-circulating TGs pPRS and liver-related, cardiometabolic and chronic kidney failure traits in the UK Biobank. Effect plot of the association between concordant and discordant PDFF-circulating TGs PRS with each disease was tested using either logistic (a) or Cox proportional hazard (b) regression analysis adjusted for BMI, age, sex, age × sex, age2 and age2 × sex, first ten genomic principal components and array batch. The x axis shows either the odds ratio (OR) or hazard ratio. All association analyses have been performed after excluding individuals with available PDFF (n = 36,394). Error bars represent the 95% confidence intervals from the regression models. Full summary statistics have been reported in Supplementary Table 18. P values were two-sided and not corrected for multiple hypothesis testing. TG, triglyceride.

Discordant pPRS was associated with a decreased risk of cardiovascular, whereas concordant PRS was associated with a substantial increased risk of cardiovascular disease and heart failure. Both pPRSs conferred an increased predisposition to diabetes, suggesting that hepatic triglyceride accumulation predisposes to diabetes irrespective of the underlying cause. Conversely, the larger effect size of the concordant pPRS for diabetes despite the lower effect on liver triglyceride content would suggest that the association of diabetes in the concordant pPRS is not mediated by liver damage. In the case of hypertension and chronic kidney failure, discordant pPRS showed no association, whereas the concordant pPRS increased the risk of both diseases; however, when we adjusted for hypertension, the association with chronic kidney failure was no longer significant, whereas the other associations remained (Supplementary Table 18). Further adjustment for diabetes, total cholesterol and alcohol intake did not change the results (Supplementary Table 18). The prospective risk conferred by the pPRS to develop liver and cardiometabolic disease in the UK Biobank was virtually identical (Fig. 3b and Supplementary Table 18).

Functional gene-set enrichment analysis for both pPRSs also revealed a distinct metabolic pattern. While gene sets of discordant pPRS were mostly enriched in lipid and triglyceride homeostasis (Supplementary Table 10c and Supplementary Fig. 3), concordant pPRS gene sets were enriched in insulin receptor signaling and glucose homeostasis pathways, overall consistent with an impact on stimulation of de novo lipogenesis (Supplementary Table 10d and Supplementary Fig. 3).

In addition to our hypothesis-driven approach, we performed an unsupervised soft clustering approach35 (Methods). Of 1,000 iterations, 90% converged to two clusters and 10% to one cluster. One genetic locus, rs738408PNPLA3, appeared in both clusters36 (Supplementary Fig. 4). We used two top-weighted traits to name the clusters: (1) low-density lipoprotein (negative)/triglycerides (negative); and (2) triglycerides (positive)/ALT (positive). When examining the association of pPRS clusters with the same outcomes, we observed a similar dissociation to the PDFF-circulating TGs pPRS; however, differences in the risk of diseases defining the two types of MASLD were larger in our hypothesis-driven approach (Supplementary Table 19). This may be attributed to the soft clustering feature of the Bayesian non-negative matrix factorization (bNMF) algorithm, where rs738408PNPLA3 was included in both clusters (Supplementary Fig. 5).

When comparing plasma biomarkers between individuals in the upper and lower quartiles of pPRS, those in the upper quartile of discordant pPRS had the largest differences in lipoprotein levels (Supplementary Table 20), consistent with the protective effect of the liver-specific subtype compared to the systemic subtype. In addition, individuals in the top quartile of concordant pPRS had higher lipoproteins and blood pressure and lower creatinine levels, consistent with the increased risk of cardiovascular disease, heart failure and kidney failure.

Sex-specific effect of the association between pPRS and the feature of cardiometabolic syndrome

Liver diseases have a different prevalence between males and females. For instance, HCC is more frequent in men while liver autoimmune disease is more prevalent in females. Moreover, there are sex-specific differences in carriers of the PNPLA3 rs738409 between males and females37. Therefore, we examined the association between the two pPRS and cardiometabolic syndrome stratified by sex. Results of the stratified analyses are consistent with the pooled analyses, with the following exceptions: (1) HCC is associated with the concordant pPRS only in males; (2) heart failure is associated with protection in the discordant pPRS specifically in females; and (3) chronic kidney failure is increased in the concordant pPRS only in males (Supplementary Fig. 6 and Supplementary Table 21).

mRNA expression of loci from the liver-specific PRS is more abundant in the liver

We further examined the messenger RNA expression of mapped genes within concordant and discordant pPRS using paired bulk RNA-seq of liver (n = 244) and VAT (n = 261) from participants with obesity from the MAFALDA cohort (Molecular Architecture of Fatty Liver Disease in Patients with Obesity Undergoing Bariatric Surgery study). Notably, only the mapped genes of discordant pPRS showed a significant overlap with upregulated differentially expressed genes in the liver (one-sided Fisher’s exact test, P = 0.007; Fig. 4). Given the tight interplay between VAT and liver in the MASLD, this finding suggests a liver-specific nature of discordant pPRS compared to its metabolic counterpart, concordant pPRS.

Fig. 4: mRNA expression of loci from the liver-specific (discordant) polygenic risk score is more abundant in the liver compared to the visceral adipose tissue.
figure 4

Differential expression analysis of paired liver and VAT bulk RNA-seq data for mapped gene sets of concordant and discordant pPRS. The lower right bar plot shows the fraction of upregulated differentially expressed (DE) genes in the liver compared to VAT. The enrichment of pPRS with upregulated DE genes in the liver was calculated using one-sided Fisher’s exact test. FC, fold change.

Discussion

The main findings of this study are (1) the identification of previously unknown loci associated with SLD; and (2) the identification of two distinct types of MASLD, namely, a liver-specific and a systemic type.

BMI, as a proxy of adiposity, amplifies the genetic predisposition to SLD given by common variants14,22; however, BMI does not consider body fat distribution and body composition. To identify previously unknown genetic loci associated with SLD, we compared a range of measurements of adiposity finding that VAT volume, WFM and BMI were the best independent predictors.

Next, we performed multi-adiposity-adjusted GWAS on PDFF and iron corrected T1, as a measure of liver triglyceride content and inflammation/fibrosis. Our data demonstrate that adiposity indices may confound association between genetic variants and liver triglyceride content and inflammation/fibrosis. By using this approach, we identified 17 previously unknown genetic loci for liver triglyceride content and 9 for liver inflammation, and replicated 6 of these loci in four independent cohorts.

The heritability of liver triglyceride content was influenced by the multi-adiposity adjustment explaining in the best-case scenario approximately 6% more heritability compared to the unadjusted; however, for inflammation this was not the case, suggesting that heritability of inflammation is not directly influenced by adiposity. We have previously demonstrated a causal link between liver triglyceride content and increased risk of liver inflammation using Mendelian randomization21. This finding is further supported by our observation that approximately 80% of the genetic loci associated with PDFF also associate with cT1 in the same direction.

Intrahepatocyte triglyceride homeostasis is governed by three fundamental mechanisms: triglyceride synthesis, lipoprotein secretion and energy substrate utilization. Hindering lipoprotein secretion causes liver triglyceride accumulation by retention. Indeed, loss-of-function variants in TM6SF2 and PNPLA3 cause liver triglyceride retention by reducing lipoprotein secretion38,39. Consistently with hepatic lipoprotein retention, carriers of these variants have lower risk for cardiovascular disease due to the lower circulating lipoproteins32,40.

Therefore, we generated two pPRSs: one composed of variants in which the association between liver triglyceride content and circulating triglycerides were discordant and one in which they were concordant. ‘Partitioned’ polygenic scores may elucidate disease pathogenesis and capture specific signatures driving the individual disease progression, hence providing a framework for tailored therapeutic interventions41.

Of note, the concordant pPRS predicts the entire spectrum of cardiometabolic disease. On the contrary, the discordant pPRS is associated with liver disease mirrored by protection from cardiovascular disease due to lipoprotein retention, despite a marginal increase in the risk of diabetes. The liver specificity of the discordant pPRS is further supported by a higher mRNA expression of the genes composing this score in liver versus visceral adipose paired biopsies from individuals with obesity. We additionally generated pPRSs using hypothesis-free soft clustering35, which was very similar to our hypothesis-driven approach.

Our data suggest the presence of at least two distinct types of MASLD with specific disease-causing molecular mechanisms: one specific for the liver and one systemic and entwined with cardiometabolic syndrome (Extended Data Fig. 4). Understanding the molecular mechanisms underlying these components may allow us to find effective treatments for MASLD and cardiometabolic syndrome. Clinically, these entities reflect the presence of individuals rapidly progressing to later stages of MASLD and those with a slow-progressing MASLD associated with the entire metabolic cardiometabolic syndrome. These types of MASLD may account for the disease heterogeneity and help explain why several drugs have failed in clinical trials to treat MASLD.

Currently Mendelian randomization studies are carried out by selecting variants associated with a trait and using them to explain the causal relationship with a different trait. In this study, the pPRS had opposite effects on cardiovascular risk, indicating that if we had pooled the variants together, we may have nullified the association. Therefore, our findings support the notion that the pPRSs constructed by integrating variants into physiological pathways may allow clarifying the heterogeneity of disease pathogenesis. Ultimately, this will lead to precision medicine, improving outcome prediction and therapy.

A strength of this study is that the partitioning of the PRS was based on a hypothesis-driven approach with solid knowledge of intracellular lipid homeostasis. While the finding on cardiovascular disease may be expected, the associations with hypertension and diabetes were not granted. Alcohol consumption may have an additive effect on SLD and cardiovascular disease. Alcohol is converted into triglycerides in hepatocytes and alcoholic and nonalcoholic SLD share common genetic determinants, suggesting common disease-causing mechanisms42. Therefore, we did not exclude individuals based on alcohol consumption. Nonetheless, sensitivity analyses showed that adjusting for alcohol did not change the results. Finally, we obtained similar results by using a completely different approach, namely, unsupervised clustering, in cohorts of individuals in whom liver biopsy was available (REF).

A limitation of our study is that the identified and replicated genetic loci were based on study cohorts of European ancestry limiting the applicability in non-Europeans. Future studies are warranted to validate these loci and the pPRS in non-European populations. Furthermore, while we performed genetic colocalization and enrichment analyses, the functional implications of the identified loci are yet to be established by in vitro and in vivo experiments.

In conclusion, we identified six previously unknown loci associated with SLD and two distinct types of SLD, namely, one that is liver specific and another that is entwined with the full spectrum of cardiometabolic syndrome.
总结来说,我们确定了与 SLD 相关的六个先前未知的位点以及两种不同的 SLD 类型,即一种为肝特异性,另一种与心血管代谢综合征的全谱系相关。

Methods  方法

UK Biobank  英国生物样本库

The UK Biobank study has recruited over 500,000 participants aged between 40 and 69 years across the United Kingdom between 2006 and 2010, with extensive phenotypic and genetic data43. The UK Biobank received ethical approval from the National Research Ethics Service Committee North West Multi-Centre Haydock (reference 16/NW/0274). Data used in this study were obtained under application number 37142. European ancestry was defined previously12 by removing outliers using genomic principal components. Additionally, participants were excluded if they fell into any of these categories: (1) more than ten putative third-degree relatives; (2) a mismatch between self-reported and genetically inferred sex; (3) putative sex chromosome aneuploidy; (4) heterozygosity and missingness outliers; and (5) withdrawn consent43.
英国生物样本库研究在 2006 年至 2010 年间招募了超过 50 万名 40 至 69 岁的参与者,覆盖整个英国,拥有广泛的表型和遗传数据 43 。英国生物样本库获得了国家研究伦理服务委员会西北多中心 Haydock 的伦理批准(参考号 16/NW/0274)。本研究中使用的数据是在申请号 37142 下获得的。欧洲血统之前通过使用基因组主成分去除异常值来定义 12 。此外,如果参与者属于以下任何一类,则将其排除:(1)超过十个疑似三级亲属;(2)自我报告的性别与遗传推断的性别不匹配;(3)疑似性染色体非整倍体;(4)杂合性和缺失性异常值;以及(5)撤回同意 43

Genotypes and imputation  基因型与推断

UK Biobank participants were genotyped using two highly similar (>95% overlap) genotyping arrays, which were then imputed centrally by the UK Biobank based on the 1000 Genomes Project phase 3, UK 10K haplotype and Haplotype Reference Consortium reference panels. Starting from approximately 97 million variants, we kept only 9,356,431 variants with a minor allele frequency (MAF) > 1%, imputation quality (INFO) score > 0.8 and Hardy–Weinberg equilibrium P > 1 × 10−10 (refs. 12,43).
英国生物样本库参与者使用两个高度相似的基因分型阵列(>95%重叠)进行基因分型,然后根据 1000 基因组计划第 3 阶段、英国 10K 单倍型以及单倍型参考联盟参考面板进行集中推断。从大约 9700 万个变异开始,我们仅保留了 9,356,431 个变异,这些变异的次要等位基因频率(MAF)>1%,推断质量(INFO)评分>0.8,哈迪-温伯格平衡 P>1×10 −10 (参考文献 12,43 )。

Definition of traits  定义特征

We used adiposity measures directly provided by the UK Biobank, including VAT (data field 22407), WFM (data field 23100) and impedance of whole body (data field 23106). WHR was calculated by dividing the waist-to-hip circumference. MRI-derived PDFF and liver iron corrected T1 (cT1) were provided directly by the UK Biobank (data fields 40061 and 40062). The details of liver MRI protocols can be found elsewhere44. In brief, individuals were scanned using a Siemens 1.5T Magnetom Aera. Two sequences were then used for data acquisition, a multiecho-spoiled gradient-echo and a modified look locker inversion sequence (ShMOLLI) for PDFF and cT1, respectively44. The definition of binary traits can be found in Supplementary Table 22.
我们使用了英国生物样本库直接提供的肥胖度指标,包括 VAT(数据字段 22407)、WFM(数据字段 23100)和全身阻抗(数据字段 23106)。腰臀比(WHR)是通过腰围与臀围的比值计算得出的。由 MRI 得出的 PDFF 和校正铁肝 T1(cT1)由英国生物样本库直接提供(数据字段 40061 和 40062)。肝脏 MRI 协议的详细信息可在其他地方找到 44 。简而言之,个体使用西门子 1.5T Magnetom Aera 进行扫描。然后使用两种序列进行数据采集,分别为 PDFF 和 cT1 分别使用多回波饱和梯度回波和改进的快速自旋回波反转序列(ShMOLLI) 44 。二元特征的定义可在补充表 22 中找到。

Phenotypic prediction models
表型预测模型

To address the multicollinearity between different measures of adiposity and to verify their contribution in predicting PDFF and cT1 values, we fit penalized linear regression models and carried out a model selection in a tenfold nested cross-validation (CV) using Least Absolute Shrinkage and Selection Operator (LASSO), Ridge and Elastic Net. LASSO penalizes the regression model using the L1-norm, effectively reducing the influence of non-contributing features to zero. On the other hand, Ridge regression utilizes the L2-norm, allowing it to shrink regression coefficients toward zero. Elastic Net combines elements of both LASSO and Ridge by incorporating both L1 and L2 penalties through a mixing parameter α.
为解决不同肥胖度测量指标之间的多重共线性问题,并验证它们在预测 PDFF 和 cT1 值中的贡献,我们拟合了惩罚线性回归模型,并使用十折嵌套交叉验证(CV)进行了模型选择,采用了最小绝对收缩和选择算子(LASSO)、岭回归和弹性网络。LASSO 通过使用 L1 范数惩罚回归模型,有效地将非贡献特征的影响降至零。另一方面,岭回归利用 L2 范数,使其能够将回归系数缩小到零。弹性网络通过混合参数α结合了 LASSO 和岭回归的元素,通过引入 L1 和 L2 惩罚。

To conduct the CV process, the dataset was initially divided into training (80%) and test (20%) sets. Within the training set, the outer CV assessed the performance of each model, while the inner CV was utilized for hyperparameter tuning. This tuning was accomplished by minimizing the mean squared error across a grid of α and shrinkage values in each fold of the outer CV. The best performing model with the lowest mean squared error was then trained on the entire training set within a tenfold CV framework. Subsequently, its performance was evaluated using the remaining test set. Finally, the model with the optimal set of hyperparameters, determined in the previous step, was fitted to the entire dataset for final evaluation45. Adiposity indices were standardized before the training, whereas PDFF and cT1 values were rank-based inverse normal transformed. All models were adjusted for age, sex, age2, age × sex and age2 × sex. All analyses were performed in MATLAB (MathWorks) R2023a.
为了进行 CV 过程,数据集最初被分为训练集(80%)和测试集(20%)。在训练集中,外部 CV 评估了每个模型的性能,而内部 CV 用于超参数调整。这种调整是通过最小化外部 CV 每个折叠中α和收缩值网格上的均方误差来实现的。然后,在十折 CV 框架内,性能最佳的模型(均方误差最低)在全部训练集上进行了训练。随后,使用剩余的测试集对其性能进行了评估。最后,在先前步骤中确定的最佳超参数集的模型被拟合到整个数据集以进行最终评估 45 。在训练之前对肥胖指数进行了标准化,而 PDFF 和 cT1 值则基于排名进行了逆正态转换。所有模型都调整了年龄、性别、年龄 2 、年龄×性别和年龄 2 ×性别。所有分析均在 MATLAB(MathWorks)R2023a 中完成。

Genome-wide association analysis
全基因组关联分析

The association between 9 million imputed common variants and PDFF or cT1 under different adiposity adjustments under an additive genetic model was performed using a whole-genome regression model as implemented in REGENIE (v.3.2.8)16. The analysis was adjusted for age at MRI, sex, age2, age × sex, age2 × sex, the first ten principal components (PCs) of ancestry, genotyping array and adiposity index, where adiposity index was VAT, WFM, BMI or no adiposity adjustments.
9 百万个推断的共同变异与 PDFF 或 cT1 在不同肥胖调整下的关联,在加性遗传模型下,使用 REGENIE(v.3.2.8)中实现的基因组回归模型进行。分析调整了 MRI 时的年龄、性别、年龄 2 、年龄×性别、年龄 2 ×性别、祖先的前十个主成分(PCs)、基因分型阵列和肥胖指数,其中肥胖指数为 VAT、WFM、BMI 或无肥胖调整。

Similarly, we tested the association between independent lead variants from multi-adiposity-adjusted GWAS of PDFF and cT1, and other binary or continuous metabolic traits using either a logistic or a linear whole-genome regression model in REGENIE and adjusted for the same set of covariates, including consistent adiposity adjustments. Individuals with an available PDFF or cT1 measurements were excluded before the association analysis (n = 36,748). In cases where the trait was measured at baseline, we used WHR instead of VAT adjustment, as the latter was not available at baseline. To fit the whole-genome regression model in step 1 of REGENIE, a subset of directly genotyped common variants (MAF > 1%) was used. After excluding variants on long-range LD and major histocompatibility complex (MHC) regions, variants with a missingness <0.01 and with Hardy–Weinberg equilibrium P > 1 × 10−15 were retained. Finally, 146,833 markers left following an LD pruning with a window of 500,000 base pairs and pairwise r2 < 0.1 (ref. 17). Continuous traits were rank-based inverse normal transformed before the analyses.
类似地,我们使用 REGENIE 中的逻辑回归或线性全基因组回归模型,测试了来自多脂肪调整的 PDFF 和 cT1 的独立主成分变异与其他二元或连续代谢特征之间的关联,并调整了相同的一组协变量,包括一致的脂肪调整。在关联分析之前排除了有可用 PDFF 或 cT1 测量的个体(n = 36,748)。在特征在基线测量的情况下,我们使用腰臀比(WHR)而不是 VAT 调整,因为后者在基线时不可用。为了在 REGENIE 的第 1 步中拟合全基因组回归模型,使用了一组直接基因分型的常见变异(MAF > 1%)。在排除长距离 LD 和主要组织相容性复合体(MHC)区域的变异后,保留了缺失率<0.01 且 Hardy-Weinberg 平衡 P > 1 × 10 −15 的变异。最后,在 500,000 个碱基对的窗口和成对 r 2  < 0.1(参考文献 17 )的 LD 修剪后,留下了 146,833 个标记。在分析之前,连续特征基于排名进行逆正态转换。

Identifying independent variants
识别独立变体

We first performed LD clumping (PLINK v.1.90b6.26 parameters: –clump-p1 5 × 10−8 –clump-r2 0.01 –clump-kb 1,000, after excluding individuals with third-degree or closer relatives17,43) to identify approximately independent loci. Next, to detect statistically independent variants, we conducted approximate step-wise model selection in conditional and joint multiple-single nucleotide polymorphism (SNP) analysis implemented in Genome-wide Complex Trait Analysis (GCTA-COJO18, v.1.94.0), with an LD window of 10 Mb and using 50,000 randomly selected unrelated Europeans from the UK Biobank for in-sample LD structure, as described previously12. To examine whether the identified genetic loci were previously reported, we searched the NHGRI-EBI GWAS catalog database46 in a window of 1 Mb around each lead variant.
首先执行 LD 聚集(PLINK v.1.90b6.26 参数:–clump-p1 5 × 10 −8 –clump-r2 0.01 –clump-kb 1,000,在排除有三级或更近亲属的个体后 17,43 )以识别大约独立的位点。接下来,为了检测统计上独立的变异,我们在条件联合多重单核苷酸多态性(SNP)分析中进行了近似逐步模型选择,该分析在基因组复杂性状分析(GCTA-COJO 18 ,v.1.94.0)中实现,LD 窗口为 10 Mb,并使用来自英国生物样本库的 50,000 名随机选择的无关欧洲人进行样本 LD 结构,如前所述 12 。为了检查所识别的遗传位点是否之前已有报道,我们在每个主要变异位点周围 1 Mb 的窗口内搜索了 NHGRI-EBI GWAS 目录数据库 46

Estimating heritability and genetic correlations
估算遗传力和遗传相关系数

SNP heritability and confounding bias were estimated with LD score regression analysis (LDSC; v.1.0.1, https://github.com/bulik/ldsc/)47 using the baseline LD model (v.2.2; https://data.broadinstitute.org/alkesgroup/LDSCORE/), containing 97 annotations, including functional annotations and MAF-/LD-dependent architectures48. Similarly, pairwise genetic correlations were calculated using LDSC analysis47 after excluding variants in the MHC region (chromosome 6, 25–34 Mb) due to the complex LD structure. Trait pairs with a Benjamini−Hochberg FDR < 0.05 were considered to have a significant genetic correlation. In all analyses, we set the LDSC parameter chisq-max to an arbitrary large number (99,999) to keep large-effect associations.
SNP 遗传力与混杂偏倚通过 LD 分数回归分析(LDSC;v.1.0.1,https://github.com/bulik/ldsc/) 47 使用基线 LD 模型(v.2.2;https://data.broadinstitute.org/alkesgroup/LDSCORE/),包含 97 个注释,包括功能注释和 MAF-/LD 相关架构 48 。同样,在排除 MHC 区域(染色体 6,25-34 Mb)的变异后,使用 LDSC 分析计算成对遗传相关性 47 ,因为该区域的 LD 结构复杂。具有 Benjamini−Hochberg FDR < 0.05 的性状对被认为具有显著的遗传相关性。在所有分析中,我们将 LDSC 参数 chisq-max 设置为任意大数(99,999)以保持大效应关联。

Pleiotropy analysis

We evaluated whether the independent genome-wide significant loci, adjusted for different adiposity measures, were specific to each adiposity measure, common between PDFF and cT1 GWAS or shared across both. Therefore, if two independent lead variants within 1 Mb of each other were in LD (r2 > 0.2), they were assigned the same locus ID (Supplementary Table 4). Circular Manhattan plots were visualized using Circos49.

Functionally informed fine-mapping

Functionally informed genetic fine-mapping was performed using PolyFun v.1.0.0 and Sum of Single Effects (SuSiE, v.0.11.92)50,51. PolyFun was used to estimate per-SNP heritability using L2-regularized extension of stratified LDSC (S-LDSC) and baseline LD model v.2.2 containing 187 annotations47,48,50. The estimated per-SNP heritabilities were used as prior causal probabilities in SuSiE with a maximum of ten causal variants in each region. The subset of 337,000 unrelated white-British individuals from UK Biobank were used for in-sample LD structure. After excluding the MHC region on chromosome 6, fine-mapping per each locus was performed in a window of 1.5 Mb around the lead genetic variants (P < 5 × 10−8).

Colocalization

Colocalization was performed between independent genetic loci identified by COJO-GCTA, and summary statistics of gene expression quantitative trait loci (eQTL) of 49 tissues in GTEx (v.8) from the eQTL catalog release 4 (refs 52,53). The coordinates of GWAS summary statistics were first converted from Build 37 to 38 using liftOver function of rtracklayer R package (v.1.54.0)54. We performed colocalization using COLOC-SuSiE assuming the presence of multiple causal variants (coloc R package55, v.5.1.0) with default priors and considered variants in a window of 1.5 Mb around the index variant at each locus. We considered only genes with at least one significant variant (FDR P < 0.1, eGenes) and performed the colocalization in a window of 1.5 Mb around each eGene. If SuSiE did not converge after 1,000 iterations, conventional (single causal variant) colocalization was used. Finally, an H4 posterior probability (PP) > 0.8 was considered as strong evidence that both traits share the same causal variant.

Variant annotation

Independent genome-wide significant and fine-mapped variants were annotated using Ensembl Variant Effect Predictor (VEP) accessed from REST API (https://rest.ensembl.org/).

Gene mapping and functional enrichment analysis

To map and prioritize potential candidate genes for independent genetic loci, we employed multiple approaches. (1) SNP2GENE module of FUMA v.1.5.4 (ref. 56) was used for positional mapping of lead variants to genes with a maximum distance of 50 kb. (2) eQTL mapping using FUMA by considering only genes with at least one significant eQTL association (FDR < 0.05). (3) 3D chromatin interaction mapping using FUMA by considering only significant interactions (FDR < 1 × 10−6) within 250–500 bp upstream and downstream of the transcription start site, respectively. (4) Multi-marker analysis of genomic annotation (MAGMA, v.1.08)57 implemented in FUMA was carried out to perform genome-wide gene association analysis using 19,535 curated protein-coding genes. Only genes with a Bonferroni threshold below 0.05/19,535 = 2.56 × 10−6 were kept for gene mapping. Variants within the MHC region were excluded before the analysis. (5) Colocalized genes from colocalization analysis with at least one tissue and an H4 PP > 0.8. (6) Nearest gene(s) to the fine-mapped variants with the maximum PP per each locus. (7) Genes with the highest overall V2G score at each locus were based on Open Targets Genetics58. Finally, to prioritize the mapped genes, we calculated an unweighted ranking score by summing over the evidence from the above-mentioned approaches.

By using the set of genes with the maximum ranking score at each locus, we performed functional gene-set enrichment analysis using Enrichr tool59 against, ARCHS4 tissues, Reactome biological pathways and Gene Ontology Biological Processes. Significant terms with a Benjamini–Hochberg FDR-corrected P value <0.05 per each database were reported. For visualization, both adjusted P values and Enrichr combined scores (−log(P) × OR) were used.

Partitioned polygenic risk scores of liver triglyceride content

To define the pPRSs, genetic loci (full list in Supplementary Table 15) were assigned into two groups based on their concordant or discordant associations with PDFF and circulating triglycerides. We excluded the genetic loci that did not associate with circulating triglycerides. Finally, the pPRSs were generated by taking the weighted sum of genetic variants, where the strongest association at each locus was used as the weight, following the pleiotropy analysis.

Replication cohorts

NEO

NEO is a population-based cohort study in men and women aged 45 to 65 years, with oversampling of individuals having BMI over 27 kg m−2 from Leiden and surrounding areas in the Netherlands. At baseline, 6,671 participants were included and around 35% of the NEO participants were randomly selected to undergo hepatic triglyceride content (HTGC) measurements by magnetic resonance spectroscopy. Genotyping was performed using Illumina HumanCoreExome-24 BeadChip and imputed to TOPMed reference genome panel60. In the present work, a total of 1,822 individuals of European ancestry with an available HTGC were used.

Liver BIBLE

The Liver BIBLE-2022 cohort comprises 1,144 healthy middle-aged individuals (40–65 years) with metabolic dysfunction (at least three criteria for metabolic syndrome among BMI ≥ 35 kg m2, arterial hypertension ≥135/80 mm Hg or therapy, fasting glucose ≥100 mg dl−1 or diabetes, low high-density lipoprotein <45/55 mg dl−1 in males/females and high triglycerides ≥150 mg dl−1) who presented for blood donation from June 2019 to February 2021 at the Transfusion Medicine unit of Fondazione IRCCS Ca’ Granda Hospital (Milan, Italy)61. Hepatic fat content was estimated noninvasively by controlled attenuation parameter (CAP) with FibroScan device (Echosens). Genotyping was performed by Illumina GlobalScreeningArray (GSA)-24 v.3.0 plus Multidisease Array (Illumina) and further imputed to TOPMed reference genome panel62. At the time of analysis, genomic data passing quality control with an available CAP measure were available for 1,081 patients of European ancestry.

MAFALDA

The MAFALDA study started in May 2020 and ended in April 2022. It comprised a total of 468 consecutive participants with morbid obesity (BMI ≥ 35 kg m2) who underwent bariatric surgery at Campus Bio-Medico University of Rome, Italy. In MAFALDA participants, SLD diagnosis was assessed only by liver histology in n = 116, only by vibration-controlled transient elastography, including CAP measurement with FibroScan (Echosens)63 in 141 individuals, with both in 148 individuals and 63 with neither CAP nor liver biopsy. In this study, only individuals with liver fat content estimated by CAP were included (n = 172). Genotyping was performed in the same manner as that of the Liver BIBLE cohort. MAFALDA includes a total of 264 paired visceral and liver biopsies with available bulk transcriptomic data.

Dallas Heart Study

In this study, only 828 European Americans from the Dallas Heart Study (DHS-1) were used. The DHS is a population-based sample study of Dallas County, Texas, USA, where liver triglyceride content was measured by magnetic spectroscopy. Details of this study can be found elsewhere13.

Ethics

This research complies with the principles outlined in the Declaration of Helsinki. The UK Biobank received ethical approval from the National Research Ethics Service Committee Northwest Multi-Centre Haydock (reference 16/NW/0274). Data used in this study were obtained under application number 37142. The NEO study was approved by the medical ethical committee of the Leiden University Medical Center. The Liver BIBLE study was approved by the ethical committee of the Fondazione IRCCS Ca’ Granda (ID 1650, revision 23 June 2020). The MAFALDA study was approved by the Local Research Ethics Committee (no. 16/20). The DHS was approved by the institutional review board of the University of Texas Southwestern Medical Center. Each participant provided written informed consent. The baseline characteristics for these cohorts are listed in Supplementary Table 23.

Meta-analysis

The association between previously unknown independent loci for PDFF and cT1 and magnetic resonance spectroscopy liver fat (DHS-1 and NEO studies) or CAP measurement (MAFALDA and Liver BIBLE) was performed using a linear regression analysis adjusted for age, sex, age2, age × sex, age2 × sex and BMI after a rank-based inverse normal transformation of the response. An inverse-variance meta-analysis was then performed with fixed-effect model using the meta R package (v.6.5.0). For genetic variants not available in either of replication cohorts, a proxy variant was used instead: variants in LD (R2 > 0.4) with the lead variant in the UK Biobank within a window of 1.5 Mbp. If no such variant was found in the UK Biobank, the LDproxy tool with Europeans from 1000 Genomes Project was used instead20. In case of multiple proxy variants, the one with the highest LD and functional consequence was selected.

RNA-seq analysis

Total RNA for 264 paired liver and VAT samples from the MAFALDA cohort was isolated using miRNeasy Advanced Mini kit (QIAGEN). RNA sequencing and library preparation was performed in a paired-end 150-bp mode using the Illumina NovaSeq PE150 (Novogene). Reads were aligned to GRCh38 reference genome by STAR64 (v.2.7.10a) after quality check (FastQC software v.0.12.0, Babraham Bioinformatics) and trimming of low-quality reads and potential contaminating adapters by Trimmomatic65 (v.0.39). Gene-level read counts were quantified by RSEM66 (v.1.3.3) tool against the Ensembl (release 107). Samples with insufficient mapping specificity (uniquely to total mapped reads <0.7) were excluded before the analysis. Finally, a paired differential expression analysis of 261 VAT and 244 liver samples was carried out using DESeq2 (ref. 67) (v.1.38.3), while adjusting for RNA integrity number, individual ID and five surrogate variables detected by surrogate variable analysis68.

Follow-up analysis

The longitudinal association of PRS with the occurrence of the outcomes was tested through Cox proportional hazard regression and expressed as hazard ratios with 95% confidence intervals. The median follow-up was 14.5 years and individuals with any of diagnoses at the baseline were excluded before the analysis (Supplementary Table 22). The proportional hazard assumption was checked through the consideration of Schoenfeld residuals and no violations were detected. Prospective associations were performed in R v.4.0.2 (R Foundation for Statistical Computing).

Gene–adiposity interaction analysis

Gene–adiposity interaction for independent loci from multi-adiposity-adjusted GWAS was performed in REGENIE (v.3.2.8) using robust standard errors (sandwich estimators HC3) to guard against heteroskedasticity16. The analysis was adjusted for age at MRI, sex, age2, age × sex, age2 × sex, the first ten PCs of ancestry, genotyping array and adiposity index, where adiposity index was VAT, WFM or BMI. Due to the sensitivity of interaction effect sizes to the trait transformation, PDFF and liver iron corrected T1 were log-transformed before the interaction analyses69,70.

Mediation analysis

To examine whether the impact of identified independent loci on PDFF or liver iron corrected T1 are mediated via the measures of adiposity, we performed mediation analysis using the mediation R package71. All models were adjusted for age at MRI, sex, age2, age × sex, age2 × sex, the first ten PCs of ancestry, genotyping array and the polygenic covariate estimated in step 1 of REGENIE16. We additionally considered the scenario with a genetic variant-mediator interaction term. The significance of mediation (P values and 95% CIs) was assessed via a nonparametric bootstrap method (1,000 simulations) on the rank-based inverse normal transformed PDFF and liver iron corrected T1 as outcomes and adiposity measures as mediators. We also performed sensitivity analyses to examine sequential ignorability assumption (possible existence of unobserved confounders between adiposity indices and liver traits)30. This was carried out by examining the correlation coefficient between error terms of liver traits and adiposity index models, at which estimated mediation effect was zero (95% CI contains 0).

Association analysis with adiposity measures

The association between independent loci from multi-adiposity-adjusted GWAS and BMI, WFM or VAT was performed using a whole-genome regression model as implemented in REGENIE (v.3.2.8)16. All models were adjusted for age at MRI, sex, age2, age × sex, age2 × sex, the first ten PCs of ancestry and genotyping array. Measures of adiposity were rank-based inverse normal transformed before the analysis.

bNMF clustering

We applied bNMF, an unsupervised soft clustering approach, to define ‘hypothesis-free’ clusters of independent loci from multi-adiposity-adjusted GWAS35. This approach has been successfully employed to find physiologically relevant partitioned polygenic risk scores for type 2 diabetes35. We first performed the association analysis between ALT, aspartate aminotransferase, glycated hemoglobin, circulating triglycerides, low-density lipoprotein cholesterol, glucose, creatinine, systolic blood pressure and cystatin C using REGENIE adjusted for age, sex, age2, age × sex, age2 × sex, the first ten PCs of ancestry and adiposity index, where adiposity index was chosen based on the strongest association from the multi-adiposity-adjusted GWAS. As VAT was not available at baseline, we used WHR instead. Next, a variant-trait association matrix of standardized z-scores (m × n, where m and n are the number of independent loci associating with PDFF and continuous traits mentioned above, respectively) was constructed while accounting for different sample sizes in each GWAS on continuous traits35. This scaled matrix was then aligned to PDFF-increasing alleles. bNMF clustering was performed using the bNMF R pipeline (https://github.com/gwas-partitioning/bnmf-clustering) by setting the maximum number of clusters, K, to 7 for 1,000 iterations and removing highly correlated traits (Pearson correlation coefficient >0.85). After determining the maximum posterior solution at the most probable K, a cutoff maximizing the signal-to-noise ratio (1.08) was used to keep variants in each cluster35. Two top-weighted traits at each cluster were used to define the cluster names. Finally, pPRSs were generated by a weighted sum of genetic variants at each cluster, where weights were derived from the multi-adiposity-adjusted GWAS of PDFF.

Comparison between bNMF and PDFF-TGs pPRS

To compare the PDFF-TGs hypothesis-driven pPRS approach and two pPRSs identified by bNMF algorithm to distinguish between liver, cardiometabolic and kidney outcomes, we performed a Wald test as follows:

$$W=\frac{{\hat{\beta }}_{{pPRS}1}\,-\,{\hat{\beta }}_{{pPRS}2}}{\sqrt{{{SE}}_{{pPRS}1}^{2}+\,{{SE}}_{{pPRS}2}^{2}}}$$

where \({\hat{\beta }}_{{pPRS}1}\), \({{{{SE}}_{{pPRS}1}^{2}}}\) and \({\hat{\beta }}_{{pPRS}2}\), \({{{{SE}}_{{pPRS}2}^{2}}}\) are the log OR and standard error of either discordant/concordant or bNMF pPRSs, respectively. In the equation above, W is the Wald test statistic and \({W}^{2} \sim \,{\chi }_{1}^{2}\). Similarly, we compared the model fit, by calculating the difference between Akaike information criterion between the pPRS (Supplementary Table 19).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.