Abstract
Synthetic genetic circuits program the cellular input–output relationships to execute customized functions. However, efforts to scale up these circuits have been hampered by the limited number of reliable regulatory mechanisms with high programmability, performance, predictability and orthogonality. Here we report a class of split-intron-enabled trans-splicing riboregulators (SENTRs) based on de novo designed external guide sequences. SENTR libraries provide low leakage expression, wide dynamic range, high predictability with machine learning and low crosstalk at multiple component levels. SENTRs can sense RNA targets, process signals by logic computation and transduce them into various outputs, either mRNAs or noncoding RNAs. We subsequently demonstrate that digital logic operation with up to six inputs can be implemented using multiple orthogonal SENTRs to regulate a single gene simultaneously and coupling SENTRs with split intein-mediated protein trans-splicing. SENTR represents a powerful and versatile regulatory tool at the post-transcriptional level in Escherichia coli, suggesting broad biotechnological applications.
Main
A longstanding challenge in synthetic biology is to program cellular behavior by constructing complex genetic circuitry, which requires composable and orthogonal biological parts and predictable circuit behavior when integrating these components1,2. Toward this vision, RNA-based regulators (riboregulators) have gained increasing interest because of their simple base-pairing mechanism, inherent programmability, fast dynamics and low metabolic load. A wealth of riboregulators have been developed so far for diverse purposes, including toehold switches1,3 and repressors4, small transcription-activating RNAs5,6, ribozymes7, microRNAs8, small RNAs9, RNA-responsive10 or RNA-targeting clustered regularly interspaced short palindromic repeats (CRISPR)–Cas systems11 and RNA-editing systems12, which permit combinatorial logic circuits such as multi-input AND, NAND, OR and NOR logic gates4,13. However, challenges remain in customizing genetic circuits using these RNA modalities. First, the functionality of many modalities involves endogenous regulatory machinery (for example, translation and degradation) or protein elements, making them prone to host–circuit interaction and resource competition among circuits. Second, the translational regulators modify the output protein sequences (especially N-terminal sequences) with unexpected effects on protein activities, hindering their integration with protein-level regulators. Third, the allosteric cis-regulation of RNA hairpin structures, the predominant mechanism for de novo design of riboregulators, carries a risk of leakiness that depends on the sequence variant and requires arduous screening to reduce RNA misfolding and optimize the dynamic range.
To address these limitations, we designed a novel class of riboregulators based on group I intron-mediated trans-splicing. Unlike current riboregulators, our split-intron-enabled trans-splicing riboregulators (SENTRs) break bacterial mRNAs into several fragments and rejoin them through RNA trans-splicing once coexpressed. Exploiting the de novo design of external guide sequences (EGSs), we improved the conventional split-intron design’s efficiency, programmability, predictability and orthogonality while maintaining its low leakiness, flexible localization and seamlessness. We demonstrate the utility of SENTRs by validating 56 trans-splicing systems in Escherichia coli, which yield high fold changes and good predictability with machine learning (ML). We then systematically investigate the orthogonality of different sequence components and multiply the orthogonal library size by composing orthogonal components. We also demonstrate that SENTRs can function with flexible 5′ splice site choices and splice robustly in coding sequences (CDSs), 5′ untranslated regions (5′ UTRs) and single guide RNAs (sgRNAs), for versatile gene regulation and RNA sensing. The SENTRs were then integrated into ribocomputing devices to implement two-input OR, NOR, NIMPLY and IMPLY and multi-input AND and NAND logic gates. Finally, the SENTRs were coupled with protein-level regulation (that is, intein-mediated trans-splicing) to build digital AND and NAND devices with up to six inputs.
Design of SENTRs
As a starting point, we chose to engineer the group I intron from Tetrahymena thermophila (TT intron), which is the most studied trans-splicing ribozyme14,15,16,17 and has been reported to function in E. coli, albeit with low efficiencies18. In previous studies, the TT intron was usually split at loop 1 (L1) and inserted after the 5′ splice site uridine14,15,16,17 to generate 5′ RNA and 3′ RNA. After transcription, the assembly of two intron halves and the formation of a P1 duplex (red in Fig. 1a) by exonic 5′P1 and an intronic internal guide sequence (IGS) would trigger the trans-splicing reaction, ligating the exons to produce mature mRNAs. Annotated cartoons of the secondary structure, sequence features and splicing reactions of the ribozyme are provided in Supplementary Fig. 1.
In the design of SENTRs, we appended extra pairs of hybridization domains to the intron halves, referred to as EGSs (orange in Fig. 1a). The matched pairs of 5′ EGSs and 3′ EGSs could be computationally designed using the Nucleic Acids Package (NUPACK)19 and facilitate the assembly of intron halves in precursor RNAs. To increase the trans-splicing efficiency, we also designed the synthetic P10 duplex formed by the 3′ spacer (gray in Fig. 1a) and 3′ exon. The detailed design and results of SENTRs with different lengths of EGSs and P10s are provided in the Methods and Extended Data Fig. 1.
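As a rough illustration of what a matched EGS pair means at the level of base pairing, the sketch below checks whether two RNA strands are perfect reverse complements. This is a deliberate simplification and not the NUPACK-based design procedure used in the study; the function names and example sequences are hypothetical.

```python
# Watson-Crick complements for RNA (wobble pairs are ignored in this sketch).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def is_matched_pair(five_prime_egs: str, three_prime_egs: str) -> bool:
    """True if the two strands can form a perfect antiparallel duplex."""
    return reverse_complement(five_prime_egs) == three_prime_egs.upper()


# Hypothetical 8-nt hybridization domains, not sequences from the study.
example_5p = "AUGCCGUA"
example_3p = reverse_complement(example_5p)
print(example_3p, is_matched_pair(example_5p, example_3p))
```

A real design tool additionally scores secondary structure and off-target pairing, which a plain complementarity check cannot capture.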
In vivo validation of 56-EGS SENTR library
We established a reporter assay for rapid assessment of the in vivo trans-splicing activity by inserting the split TT intron at site 482 of the sfgfp gene. To validate the reporter assay, we tested the performance of SENTRs with multiple controls. Bacterial cells expressing both 5′ RNA and 3′ RNA fused with EGSs showed significantly higher green fluorescence than those expressing only 5′ RNA or 3′ RNA and the output levels of the groups with the deactivated intron (G264A mutant20) or without EGS were low (Fig. 1b and Supplementary Fig. 2). The RNA levels of spliced products in these groups quantified by reverse transcription (RT)–qPCR were consistent with the fluorescence assays (Supplementary Fig. 3). These results indicated that the fluorescent outputs were dependent on RNA trans-splicing and that EGSs determined the splicing activities of SENTRs.
A library of 56 EGSs (EGS-1 to EGS-56) was then designed using NUPACK (RNA sequences in Supplementary Table 1) and characterized in vivo for functional performance (Fig. 1c). The on-state fluorescence was measured from cells expressing both RNA strands while the off-state fluorescence was determined from cells expressing only 3′ RNA. The 56 riboregulators displayed extremely low off-state fluorescence, below 2.2 × 10⁻⁴ relative promoter units (RPU), and significantly higher on-state fluorescence. Of the 56 variants, 46 EGSs (82%) provided on–off ratios > 10. A total of 32 variants (57%) exhibited over 100-fold activation in the on state and eight variants (14%) yielded over 1,000-fold activation, with the two best-performing ones showing fold changes exceeding 10,000.
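The on–off ratio and the threshold counts reported above can be computed as in the following minimal sketch. The fluorescence readings are hypothetical placeholders rather than data from the study; only the variant counts (46, 32 and 8 of 56) are taken from the text.

```python
def on_off_ratio(on_rpu: float, off_rpu: float) -> float:
    """Fold activation: on-state output divided by off-state output."""
    if off_rpu <= 0:
        raise ValueError("off-state output must be positive")
    return on_rpu / off_rpu


def count_above(ratios, threshold):
    """Number of variants whose on-off ratio exceeds the threshold."""
    return sum(1 for r in ratios if r > threshold)


# Hypothetical (on, off) readings in RPU for four variants.
readings = [(0.5, 1e-4), (0.03, 2e-4), (0.001, 2e-4), (2.0, 1e-4)]
ratios = [on_off_ratio(on, off) for on, off in readings]
print({t: count_above(ratios, t) for t in (10, 100, 1000)})

# Variant counts reported in the text, converted to rounded percentages of 56.
reported_counts = {10: 46, 100: 32, 1000: 8}
percentages = {t: round(100 * n / 56) for t, n in reported_counts.items()}
print(percentages)
```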
ML for SENTR performance prediction
Recent success in analyzing and predicting toehold switch performances from sequences in silico21,22 inspired us to unveil the relationship between the SENTR sequence and performance by ML. To this end, we expanded the SENTR dataset by characterizing 1,296 pairwise combinations of 36 EGSs in vivo (Fig. 1d, left, and Extended Data Fig. 1a). We then trained 11 different regression models on the dataset to infer the relationship of the EGSs with the reporter assay’s activity (detailed discussions, pipeline development and modeling results in Supplementary Note 1 and Supplementary Tables 2–5).
We first explored the use of thermodynamic and compositional features as inputs for ML models. We calculated 33 parameters using NUPACK and Biopython, such as G+C content, minimum free energy, partition function, base-pairing probabilities and ensemble size. We further obtained 3,249 features using the one-hot encoding technique to get numerical features from the nonnumerical parameters. When trained on these features, half of the models delivered meaningful predictions (R² > 0.56) and ensemble tree-based models provided the strongest correlations to our dataset (highest R² = 0.82) (Extended Data Fig. 2a,c and Supplementary Table 3).
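Two of the feature types mentioned above, G+C content and one-hot encoding of a nonnumerical parameter, can be sketched in plain Python as follows. The study used NUPACK and Biopython for its calculations; the functions and category labels here are hypothetical and only illustrate the idea.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Hypothetical categorical parameter with three possible labels.
labels = ["EGS-a", "EGS-b", "EGS-c"]
print(gc_content("GGCCAU"))      # compositional feature
print(one_hot("EGS-b", labels))  # numerical vector for a nonnumerical label
```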
The predictive power of ML models might be limited by the information loss during thermodynamic calculations21. To improve prediction performance, we investigated the sequence motifs in EGSs as predictors of the SENTR function. We adopted the k-mers encoding technique (k = {3, 4, 5, 6}) to build a text document consisting of a dictionary of sequence motifs. Subsequently, we applied the bag-of-words model to tokenize the sequence motifs in this text corpus, with the frequency of occurrence of each motif used to train the models. After this feature engineering step, we obtained 1,418 features to train the ML models. Most sequence-based models delivered improved predictions (R² > 0.62) (Extended Data Fig. 2a,b and Supplementary Table 2), with category boosting as the model with the highest R² of 0.85 (Fig. 1d, right).
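The k-mer bag-of-words encoding described above can be sketched as follows: each sequence is cut into overlapping motifs of length 3 to 6, and the count of each motif becomes one feature. This is a minimal illustration with hypothetical function names and example sequences, not the pipeline from Supplementary Note 1.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(seq: str, ks=(3, 4, 5, 6)) -> Counter:
    """Bag of words for one sequence: motif -> number of occurrences."""
    counts = Counter()
    for k in ks:
        counts.update(kmers(seq, k))
    return counts


def vectorize(seqs, ks=(3, 4, 5, 6)):
    """Build a shared motif vocabulary and one count vector per sequence."""
    per_seq = [kmer_counts(s, ks) for s in seqs]
    vocab = sorted(set().union(*per_seq))
    matrix = [[c[motif] for motif in vocab] for c in per_seq]
    return vocab, matrix


# Hypothetical sequences, not EGSs from the study.
vocab, matrix = vectorize(["AUGCAUGC", "GGCAUUCG"])
print(len(vocab), "motif features per sequence")
```

Each row of `matrix` could then serve as the input vector for a regression model, with the measured reporter activity as the target.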
Design and assessment of orthogonal SENTRs
Orthogonality of riboregulators is a prerequisite for higher-order logic computation and multiplexed gene regulation. The SENTR system comprises multiple sequence components, both EGSs and intronic components, providing us with more tuning knobs to engineer high-performing and orthogonal SENTRs in a composable way.
We first designed and validated an orthogonal EGS library (Fig. 2a). The 24 EGS variants, EGS-O1 to EGS-O24, exhibited high on-state fluorescence, at least 380-fold greater than that in the off state (Fig. 2b). The complete library showed less than 16% crosstalk, while a crosstalk level above 10% was only observed in a single pairwise interaction of 3′ EGS-O22 and 5′ EGS-O21 (Fig. 2c). A subset of 17 variants displayed a library dynamic range of at least 100-fold, which means that the overall crosstalk level was lower than 1% (calculation of library dynamic range described in the Methods). We also found that our subset libraries provided higher dynamic ranges at the same library size compared to previously reported riboregulators, as plotted in Extended Data Fig. 3a.
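The relationship between crosstalk and library dynamic range can be made concrete with a small sketch. The exact calculation is described in the Methods and is not reproduced here; the definitions below (crosstalk as noncognate output relative to the cognate output of the same row, and library dynamic range as the weakest cognate output divided by the strongest noncognate output) are assumptions for illustration, and the matrix values are hypothetical.

```python
def crosstalk_percent(signal):
    """Each entry as a percentage of the cognate (diagonal) output of its row."""
    n = len(signal)
    return [[100.0 * signal[i][j] / signal[i][i] for j in range(n)]
            for i in range(n)]


def library_dynamic_range(signal):
    """Assumed definition: weakest cognate pair over strongest noncognate pair."""
    n = len(signal)
    weakest_cognate = min(signal[i][i] for i in range(n))
    strongest_noncognate = max(signal[i][j]
                               for i in range(n) for j in range(n) if i != j)
    return weakest_cognate / strongest_noncognate


# Hypothetical fluorescence matrix: rows are 3' EGSs, columns are 5' EGSs.
signal = [
    [1000.0, 5.0, 2.0],
    [4.0, 800.0, 8.0],
    [1.0, 3.0, 500.0],
]
print(crosstalk_percent(signal)[1][2])  # crosstalk of one noncognate pair, in %
print(library_dynamic_range(signal))
```

Under these assumed definitions, a library dynamic range of at least 100-fold implies that every noncognate output is at most 1% of every cognate output.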
The P1 duplex was required for catalysis at the 5′ splice site. Thus, we reasoned that orthogonal SENTRs could also be built by designing orthogonal P1s (Fig. 2d). To find high-performing P1s, we screened different P1 sequences at different splice sites of the sfgfp gene (Supplementary Figs. 6a,b). Two P1s with high cis-splicing activities and low sequence homology were selected and tested for trans-splicing activities together with wild-type (WT) P1s. All P1s exhibited digital activation in the on state (Fig. 2e) and great orthogonality with low crosstalk (Fig. 2f).
We then investigated orthogonal intron split sites. We split the TT intron at three different loop regions, L1, L6 and L9 (Fig. 2g), and truncated the P6b and P9 regions to reduce the overefficient assembly of split introns to make their splicing activities reliant on EGSs (Extended Data Fig. 3b,c). After truncation, the EGS-fused introns split at L1, L6 and L9 displayed 4,738-fold, 66-fold and 3,689-fold higher splicing activities than those without EGSs (Fig. 2h). These split sites also showed good orthogonality, except for a single cross-interaction observed between 5′ L9 and 3′ L6 (Fig. 2i).
We proceeded to engineer SENTRs from orthogonal introns. We synthesized 17 group I introns from bacteriophages and other species (Supplementary Table 10) and validated their cis-splicing efficiencies in vivo (detailed design in the Methods, Extended Data Fig. 3 and Supplementary Table 11). Eight introns with high cis-splicing performance were then split and assayed for trans-splicing activities (Fig. 2j and Supplementary Table 12). All SENTRs fused with EGSs, except for the SunY intron, showed at least 66-fold higher fluorescence than those without EGSs (Fig. 2k). Moreover, the SENTRs displayed extremely low leakiness in the off state, as 23 of 24 riboregulators provided statistically indistinguishable off-state fluorescence from the bacterial autofluorescence (P > 0.18, two-tailed Welch’s t-test; Supplementary Fig. 7a). These riboregulators also exhibited excellent orthogonality, with all crosstalk levels lower than 6%, when fused with the same EGS, either EGS-1 or EGS-2 (Supplementary Fig. 7b,c). This is the first library of orthogonal split introns reported to date.
Finally, we demonstrated that the size of the orthogonal SENTR library could be further multiplied by integrating orthogonal intronic components with orthogonal EGSs. Composing seven orthogonal split introns with two orthogonal EGSs doubled the orthogonal library size to 14 (Fig. 2l). Although obvious cross-interactions were observed between EGS-1 and EGS-2 in TT and Thy introns, all of the other 254 combinations of 3′ RNA–5′ RNA pairs provided maximum crosstalk below 7.4%. In principle, this strategy allows us to easily establish large libraries with over 100 orthogonal SENTRs for building sophisticated gene networks.
Robust trans-splicing regulation in genetic contexts
Conventional split-intron design relies on forming a U∙G base pair to determine the 5′ splice site14,15, limiting the splice sites to the base uracil. We expanded the splice sites to adenine, cytosine and guanine by demonstrating that the C∙A, A∙U and G∙C base pairs could also promote prominent trans-splicing activities (Fig. 3a). The spliced products were confirmed by Sanger sequencing following RNA extraction and RT–PCR (Supplementary Fig. 8). Other base-pair combinations such as U∙A, C∙G and A∙G also showed appreciable trans-splicing fluorescence (Extended Data Fig. 4a). This promiscuity is consistent with the in vitro results of the PC intron23, suggesting that this is a conserved molecular recognition mechanism among different group I introns. Although the C∙A base pair has been reported to trigger in vitro cis-splicing of the TT intron24,25, this is the first report of C, A and G as 5′ splice sites in in vivo trans-splicing circuits.
The expansion of splice sites allows more flexible localizations of SENTRs; thus, we tested whether SENTRs could function robustly in various genetic contexts. SENTRs displayed significantly higher fluorescence in the on state at all nine sites tested inside sfGFP transcripts (P < 0.001, two-tailed Welch’s t-test; Supplementary Fig. 6b), as well as two separate sites in mCherry transcripts (Supplementary Fig. 6c). Moreover, SENTRs could regulate the activities of different transcription factors (TFs) (Fig. 3b). The split-intron-inserted transcription activator, extracytoplasmic function sigma factor (ECF) 20 (ref. 26), triggered 315-fold activation of GFP expression, while the hypersensitive response and pathogenicity (hrp) pathway activator HrpR27 increased the fluorescence 65-fold with both RNA strands and its cognate protein HrpS induced. The intron-inserted transcription repressors LmrA, PhlF and BM3R1 (ref. 28) provided 81-fold, 90-fold and 33-fold repression, respectively, upon the expression of both RNA strands.
We then sought to engineer modular SENTRs by grafting the split introns into two sites in the 5′ UTR and one in the start codon (Extended Data Fig. 4b). These splice sites exhibited on–off ratios above 32 and site 2 provided on–off ratios exceeding 10,000 (Fig. 3c). Site 2 also displayed good modularity, as similar digital behavior was observed after changing the downstream reporter gene to mCherry and mTagBFP2 (Fig. 3d and Supplementary Fig. 9). The modular riboregulators could also be synergistically interfaced with constitutive control elements, ribosome-binding sites (RBSs) or inducible control of 5′ RNA to achieve fine-tuning of output levels (Extended Data Fig. 4f,g). Moreover, we also engineered the trans-splicing of the SunY intron at the N terminus of sfGFP (Extended Data Fig. 4h), showing the vast potential of our intron library for modular gene regulation.
We further investigated the capability of SENTRs to regulate noncoding RNAs by implanting the split introns into the sgRNA scaffolds of the eukaryote-like CRISPR activation (CRISPRa) system29. The sgRNAs could only be reconstituted by trans-splicing of 5′ sgRNA and 3′ sgRNA to activate downstream sfGFP expression 54-fold (Extended Data Fig. 4e). This SENTR-regulated CRISPRa device also functioned modularly with four different spacers, all providing fold changes above 22 (Fig. 3e), allowing the future integration of SENTRs into CRISPR-based cascades and regulatory networks.
We moved on to reprogram the SENTRs to sense intracellular mRNAs. To this end, we designed the 3′ EGSs as mRNA-sensing domains, which could bind to and be repressed by the target mRNA (Fig. 3f and Supplementary Fig. 10). The mCherry sensor exhibited decreasing fluorescence with increasing induction levels of the mCherry gene (Extended Data Fig. 4c,d). The sensors could also detect the antibiotic resistance genes cat and ampR, conferring resistance to chloramphenicol and ampicillin, respectively, with on–off ratios over 300 (Fig. 3f).
RNA-splicing-only logic circuitry
The programmability and orthogonality of SENTRs enabled their integration into ribocomputing devices to build complicated RNA-splicing circuits. To demonstrate the logic capability, we implemented a set of two-input and multi-input logic gates based on synthetic trans-splicing (Supplementary Tables 13–19).
Six two-input logic devices (AND, NAND, NIMPLY, OR, IMPLY and NOR) were first constructed. The two-input AND and NAND gates were initially built using different inducible promoters to control the expression of 5′ RNA and 3′ RNA (Fig. 3b). For the AND gates, both RNA strands were induced to trigger the trans-splicing of sfGFP, mCherry or ECF20 and activated the output fluorescence over 2,000-fold, 438-fold and 300-fold, respectively (Extended Data Fig. 5a–c). For the NAND gates, transcription repressors were produced to inhibit the downstream GFP expression only with the existence of both RNAs. The 15 NAND gates built from repressors LmrA, PhlF and BM3R1 generated on–off ratios ranging from 24 to 181 (Extended Data Fig. 5d,e). We then built the NIMPLY gates by designing synthetic trans-repressing RNAs (trRNAs) to repress the 5′ RNA through direct hybridization. The NIMPLY gates exhibited tight repression, reaching 1,986-fold upon the induction of trRNA (Extended Data Fig. 5f). The OR gate was implemented by fusing two 5′ RNAs tandemly. The two 3′ RNAs, each responding to one of the 5′ EGSs, could both splice with the chimeric 5′ RNA and increase sfGFP expression 286-fold (Fig. 4a). The IMPLY gate was constructed by exploiting trRNAs to repress one of the 3′ RNAs in an OR gate and achieved a true–false ratio of 34 (Fig. 4b). Similarly, we designed two trRNAs to target the 3′ EGS and 5′ EGS in the AND gate to implement the NOR logic. Either trRNA could inhibit the output fluorescence at least 51-fold (Fig. 4c).
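For reference, the Boolean behavior of the six two-input gates can be tabulated with a short sketch. Inputs are abstract Booleans A and B; which RNA strand or trRNA plays the role of A or B in a given device is an assumption of this sketch (for example, NIMPLY is written as A AND NOT B, with B standing in for the repressing trRNA).

```python
# Boolean definitions of the six two-input gates.
GATES = {
    "AND": lambda a, b: a and b,
    "NAND": lambda a, b: not (a and b),
    "OR": lambda a, b: a or b,
    "NOR": lambda a, b: not (a or b),
    "IMPLY": lambda a, b: (not a) or b,   # false only when A is on and B is off
    "NIMPLY": lambda a, b: a and not b,   # true only when A is on and B is off
}


def truth_table(name: str):
    """Rows of (A, B, output) for the named gate, in binary counting order."""
    gate = GATES[name]
    return [(a, b, bool(gate(a, b)))
            for a in (False, True) for b in (False, True)]


for name in GATES:
    outputs = [int(out) for _, _, out in truth_table(name)]
    print(f"{name:7s} outputs for AB = 00, 01, 10, 11: {outputs}")
```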
To scale up these ribocomputing devices, we investigated whether the number of AND and NAND inputs could be increased by inserting more introns into the target gene (Supplementary Note 2). As a proof of concept, two pairs of split introns were inserted into the sfgfp gene to break the mRNA into three fragments. The three-input AND gate provided a 250-fold increase at the true state compared to all false states (Fig. 4d). We also implanted two split introns into the LmrA CDS to implement three-input NAND logic, with sevenfold repression in output GFP expression for the false state (Extended Data Fig. 5g). Furthermore, three split introns were inserted into the ecf20 gene to build four-input AND gates. From the 16-element truth table, we observed 16-fold and 13-fold increases in output GFP expression over the leakiest false state (Fig. 4e and Extended Data Fig. 5h).
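The size of the truth tables and the fold change over the leakiest false state can be sketched as follows. An n-input AND gate has 2^n input states, of which exactly one is true. The output values in the example are hypothetical, and the helper names are not from the study.

```python
from itertools import product


def and_gate_states(n_inputs: int):
    """Split all 2**n input combinations into the true state and false states."""
    states = list(product((0, 1), repeat=n_inputs))
    true_states = [s for s in states if all(s)]
    false_states = [s for s in states if not all(s)]
    return true_states, false_states


def fold_over_leakiest(true_output: float, false_outputs) -> float:
    """True-state output divided by the highest (leakiest) false-state output."""
    return true_output / max(false_outputs)


true_states, false_states = and_gate_states(4)
print(len(true_states) + len(false_states))  # size of the four-input truth table

# Hypothetical outputs in arbitrary units.
print(fold_over_leakiest(320.0, [5.0, 20.0, 8.0]))
```

Comparing against the leakiest false state gives a conservative fold change, since any other false state yields a larger ratio.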
Split-intron and intein-enabled logic circuits
We integrated SENTRs with existing biological components to engineer sophisticated genetic programs with higher dynamic ranges. We recently demonstrated that split inteins, which trigger protein trans-splicing to reconstitute separate peptides, are novel scalable tools for digital logic computation30,31. We hypothesized that incorporating inteins into the RNA-splicing-only circuits could reduce RNA-level circuit complexity and provide greater translational amplification by adding extra RBSs. Thus, we coupled split introns and inteins to build single-layer multi-input logic devices.
We started with three-input AND gates by positioning a pair of split introns into the CDS of N-terminal split ECF20–M86 chimeric protein (ECF20N–M86N)31. The split intron and intein M86 separated the ECF20 CDS into three parts, each under the control of a different inducible promoter on the medium-copy plasmid (Extended Data Fig. 6a). We observed a qualitative three-input AND gate behavior with true–false ratios above 100 (Extended Data Fig. 6b), showing the compatibility of SENTRs with split inteins. We proceeded to incorporate another split intron into the C-lobe of the split ECF20–M86 chimeric protein (ECF20C–M86C) to construct four-input AND gates (Fig. 5a). In this design, the four RNA strands underwent two trans-splicing reactions to produce two mature mRNAs translated into two peptides for protein trans-splicing. We generated nine four-input AND gates by screening different combinations of orthogonal EGSs for N-lobe and C-lobe RNA splicing and tuned the C-lobe RBS strengths to optimize their dynamic ranges. One combination (EGS-O8 and EGS-O25) yielded the highest activation fold of 172 (Fig. 5b), at least 20-fold greater than all previously reported four-input AND systems in E. coli1,13,32. Five other combinations generated fold changes ranging from 12 to 60 (Extended Data Fig. 6d).
Meanwhile, we incorporated split introns into intein-split repressors for scaling NAND logic devices. The first pair of split introns were inserted into the N-lobe CDS of split LmrA–SspGyrB protein (LmrAN–SspGyrBN) (Fig. 5f) and generated up to a 51-fold decrease in GFP with all three RNA strands induced (Extended Data Fig. 7b). The second pair was incorporated into the C-lobe CDS to increase the input numbers (Fig. 5c). Measurements of two four-input NAND gates revealed low leakage for the logical false state and 46-fold to 80-fold increases in output GFP expression with either RNA strand missing (Fig. 5d and Extended Data Fig. 7e).
We next designed a six-input AND device by incorporating two orthogonal split inteins and three orthogonal split introns (Fig. 5e). The orthogonal split inteins, M86 and sspDnaX, separated ECF20 into three peptides (ECF20N–M86N, M86C–ECF20M–sspDnaXN and sspDnaXC–ECF20C) and rejoined them once coexpressed (Extended Data Fig. 6f,g). We further regulated these peptides by three split introns to build the AND device comprising six RNA generators encoded in 5.5 kb on two plasmids (Supplementary Table 23). This device generated apparent output for the logical true state, 30-fold above the 63 false states in which one or more RNA generators were absent (Fig. 5f), representing the most complex AND circuit reported so far.
Lastly, we wondered whether integrating split introns and inteins could be a universal approach to engineering multi-input logic devices. To that end, we applied this strategy to alternative genes encoding the reporter mCherry, the activator ECF16 and the repressor BM3R1 to build a three-input AND gate, a four-input AND gate (Extended Data Fig. 6c,e) and three-input NAND gates (Extended Data Fig. 7c,d), respectively. Split introns were grafted into the CDS of split mCherryN–Mja-KlbAN, ECF16N–sspGyrBN, ECF16C–sspGyrBC and BM3R1C–Cth-TerC chimeric peptides (see Supplementary Note 3 and Supplementary Tables 20–22 for details). Measurements of these gates indicated that integrating split introns and inteins is generally applicable for implementing digital multi-input logic operations.
Discussion
In this study, we demonstrated split-intron-enabled trans-splicing as a reliable mechanism for synthetic regulation at the post-transcriptional level in E. coli. SENTR functions by colocalizing RNA fragments and reconstituting intact RNAs, in a way analogous to its protein counterparts (for example, heterodimerization of split effectors and protein splicing). This design is better suited for tight control of gene expression because neither RNA strand can produce intact translation products alone. To ensure low leakiness when using SENTRs in new genetic contexts, in silico prediction algorithms such as the RBS Calculator33 can be used to check whether there are any cryptic translation initiation sites inside the 3′ RNA. Moreover, seamless RNA splicing does not modify the output RNA or protein sequence, offering an attractive option for regulating highly structured noncoding RNAs and proteins with critical N-terminal residues such as N-degrons. In addition, the widespread distribution and autocatalytic splicing mechanism of group I introns should allow SENTRs to regulate gene expression in diverse organisms. Given that the TT intron can trans-splice in Gram-negative bacteria34, yeast35 and mammalian cells36, the orthogonal intron library established in this work will be a good candidate for transfer to clinically or industrially important organisms. Finally, our success in using computational tools to design, analyze and predict EGSs can guide future improvements in other fields that harness ribozymes for therapeutic purposes. For example, the same pipeline could be applied to engineering the homology arms for RNA circularization37,38 and to increasing efficiencies for repairing or reprogramming host transcripts39,40.
The engineering principles of synthetic biology have enabled us to reveal the exciting properties of group I intron splicing. Some of them (for example, low leakage, high programmability and flexible split site) were consistent with a recently published study by Gambill et al.34 in which transposon-based evolution and thermodynamic modeling were used to design split introns capable of detecting input RNAs. We also discovered for the first time that the group I intron could use all four bases (A, G, C and U) as 5′ splice sites and noncoding RNAs as exons for in vivo splicing. These studies together revealed how the large, catalytic RNAs could be manipulated and studied by synthetic biology approaches, which can be applied in other RNAs with complex functions, such as the group II ribozymes41 cleaving DNA and bridge RNAs42 guiding DNA recombination.
Taking advantage of SENTR’s compatibility with existing regulatory modalities (transcription factors, CRISPR–Cas systems and inteins), we developed a split-biomolecule strategy for building single-layer multi-input processing devices. The marriage of multilevel regulatory modalities increased the information-processing capacity of a single regulator gene more than sixfold compared to layered TF-based circuits32. This multilevel regulatory scheme can be extended by further splitting the CDSs of target genes. Opportunities to push the limits are provided by recent reports that introns and inteins can be split into three fragments and rejoined43,44 and that orthogonal split inteins can assemble multiple peptides into one protein45. Lastly, the integration of split-biomolecule-enabled devices with ribocomputing devices4,13, CRISPR–Cas systems46,47, layered TF-based circuits48 and protein-level regulation49,50 points to unprecedentedly sophisticated signal-processing circuits that operate at the transcriptional, post-transcriptional, translational and post-translational levels.
Methods
Bacterial strains and growth media
E. coli TOP10 was used for all cloning experiments. E. coli MC1061ΔpspF (ref. 29) was used to characterize sgRNA trans-splicing circuits. E. coli Marionette-Wild51 was used to characterize the multi-input NAND gates, six-input AND gates and RNA-splicing-only four-input AND gates. For overnight incubation and the induction assay, E. coli cells were grown in LB Miller medium (10 g L−1 tryptone, 10 g L−1 NaCl and 5 g L−1 yeast extract). SOC medium was used for bacterial transformation and TB medium was used for plasmid extraction. The growth medium was supplemented with appropriate antibiotics to working concentrations (ampicillin: 50 µg ml−1 for low-copy vector pSB4A3 or 100 µg ml−1 for high-copy vector pSB1A3; kanamycin: 50 µg ml−1). When required, appropriate inducers were added to the growth medium. Unless otherwise noted, the working concentrations of the inducers were as follows: 0.83 mM l-(+)-arabinose, 13.7 mM l-(+)-rhamnose, 12.8 μM N-(3-oxohexanoyl)-l-homoserine lactone (AHL), 100 ng ml−1 anhydrotetracycline (aTc), 125 μM cuminic acid, 10 μM 3-hydroxytetradecanoyl-homoserine lactone, 100 μM vanillic acid, 250 μM sodium salicylate and 1 mM IPTG.
Plasmid construction
DNA constructs used and constructed in this study and key primer sequences are listed in Supplementary Tables 25–28. Sequences of biological parts and plasmids are provided in Supplementary Table 24 and Supplementary Data 1, respectively. Plasmids pYG-5/3E1 (pYG-5E1 and pYG-3E1), pYG-5/3O1, pYG-5/3E1L6, pYG-5/3E1L9, pYG-5/3E1BNE, pYG-5/3E1BNF, pYG-5/3E1NE2, pYG-5/3E1NB, pYG-5/3E1SY, pYG-5/3E1TY, pYG-5/3E1TD, pYG-G14, pYG-G16, pYG-G39, pYG-G41, pYG-G45, pYG-G48, pYG-G66, pYG-G68 and pYG-G76 were deposited to the Addgene public plasmid repository (https://www.addgene.org/Baojun_Wang/). Other plasmids generated in this study are available upon request to the corresponding author (B.W.).
DNA construction was performed following standard protocols of AQUA cloning52, Gibson assembly and BioBrick assembly. For AQUA and Gibson assembly, the DNA fragments were obtained by PCR amplification using Phusion high-fidelity DNA polymerase (F530, Thermo Fisher). The purified DNA fragments were mixed for direct transformation into bacterial cells (AQUA) or joined using NEBuilder HiFi DNA Assembly Master Mix (E2621, New England Biolabs) (Gibson assembly). For BioBrick assembly, the plasmids were digested by FastDigest enzymes EcoRI, PstI, BcuI and XbaI (FD0274, FD0614, FD1253 and FD0684, respectively; Thermo Fisher), and the purified fragments were ligated by T4 DNA ligase (M0202, New England Biolabs). Plasmids constructed by AQUA and Gibson assembly were sequenced (Source Bioscience) and plasmids constructed by BioBrick assembly were confirmed by EcoRI–PstI enzyme digestion and gel electrophoresis.
Bacterial transformation
The bacterial transformation with one or two plasmids followed the previously described procedures31. In brief, 10–30 µl of competent cells were incubated with around 10 ng of each plasmid on ice for 30 min, after which heat shock was performed at 42 °C for 1 min, followed by incubation on ice for 3 min. Bacterial cells were then transferred to 96-well plates (CytoOne) loaded with 180 µl of SOC per well and incubated in the plate shaker (AS-03020-00, Allsheng) for 1 h at 37 °C and 1,000 rpm. After this period, 20–30 µl of the culture was diluted into three new wells containing 180 µl of LB supplemented with antibiotics to generate three biological repeats. The 96-well plates were then incubated in the plate shaker overnight at 37 °C and 1,000 rpm. The overnight culture was then used for induction assays.
Flow cytometry assays
The flow cytometry data were collected using an Attune NxT (Thermo Fisher) or CytoFLEX S (Beckman Coulter). Unless otherwise noted, the flow cytometry data were collected using an Attune NxT.
In flow cytometry assays, 2-µl overnight cultures were diluted 100-fold into 198 µl of LB medium with appropriate antibiotics and inducers in transparent 96-well plates (CytoOne). The bacterial culture was incubated in the plate shaker at 1,000 rpm for 5 or 6 h, after which 1–2 µl of culture was added to 198 µl of 1× PBS (K813-500ML, VWR) with 1 mg ml−1 kanamycin and incubated at 4 °C for at least 1 h.
Fixed samples were measured using the autosampler of Attune NxT (green fluorescence excitation laser: 488 nm, emission filter: 530/30 nm) or CytoFLEX S (green fluorescence excitation laser: 488 nm, emission filter: 525/40 nm; red fluorescence excitation laser: 561 nm, emission filter: 585/42 nm; blue fluorescence excitation laser: 405 nm, emission filter: 450/45 nm). At least 10,000 events were recorded using Attune NxT 3.1.2.0 or CytExpert 2.5.0.77 software. Events were then gated by their forward scatter (FSC) and side scatter (SSC) and analyzed using FlowJo 10.7.1. Detailed gating strategies and examples are provided in Supplementary Figs. 12 and 13. A detailed description of flow cytometry data analysis is provided below.
Plate reader assays
In plate reader assays, the overnight cultures were diluted as described above into 96-well black plates (Greiner Bio-one) for 5-h or 6-h induction. End-point green fluorescence (excitation filter: 485 nm; emission filter: 520–10 nm; gain = 1,000), red fluorescence (excitation filter: 584 nm; emission filter: 620–10 nm; gain = 1,700) and blue fluorescence (excitation filter: 405–10 nm; emission filter: 460 nm; gain = 1,000) and culture optical density at 600 nm (OD600) were then measured using Omega Control 5.11 R4 software on a BMG FLUOstar plate reader and data were processed using Omega MARS 3.32 software (BMG Labtech). The background fluorescence and absorbance from the blank medium were subtracted and the fluorescence–OD600 ratio was calculated. Unless otherwise noted, the fluorescence–OD600 ratio of samples was then normalized by negative and positive controls, as described below.
General design of cis-splicing constructs
The cis-splicing constructs were expressed by the Plux2 promoter (inducible by AHL) on pSB3K3. We used two strategies to engineer efficient cis-splicing circuits, depending on the programmability of the given introns or the possible sequence change in output peptides.
For the TT intron, which is programmable in the IGS region and shows relatively robust splicing activities, we inserted the intron into different splice sites of sfGFP CDS and designed synthetic IGS according to different exon–intron junction sequences, such as 5′ P1 and 3′ P10. The IGS design adhered to the following principles:
5′-NNNNNTAAANNNNTTTGNNNNN-intron body-NNN-NNNN-3′
(5′ P1, splice site T, 5′ P1 extension, L1 (5′ P10), 3′ P1 extension, splice site G, 3' P1, 3′ P10)
For engineering other introns whose programmability was not studied well or for studying the effect of P1 sequences and 5′ base pairs on TT intron splicing, we chose to insert the intron sequences with their native exon–intron junction sequences into sfGFP CDS. As these junction sequences remain in the splicing products (Extended Data Fig. 3d,e) and their translation products might obstruct the proper folding of sfGFPs, we identified proper sites tolerant to the insertion of junction sequences in sfGFP. We selected four candidate sites, one (codon 1) at the N terminus and three (codons 157, 172 and 194)53 inside the β-turns of sfGFP, and incorporated junction sequences of the TT intron into these sites to mimic the translation of splicing products. The 5′ and 3′ junction sequences were adjusted to ensure correct reading frames and prevent possible stop codons. For codon 1, extra linker sequences (AKAKA) were added to improve the translation rates, as previously described54. The junction sequence-inserted sfGFPs showed comparable fluorescence to the full-length sfGFP and the TT introns inserted into C1 and C157 showed the highest cis-splicing efficiencies (Extended Data Fig. 3f). Thus, C157 was chosen for assaying the trans-splicing activities of different P1s and introns and the insertion of junction sequences from different introns did not abolish the sfGFP fluorescence (Extended Data Fig. 3g). C1 was used for engineering the modular trans-splicing of the SunY intron (Extended Data Fig. 4h).
General design of trans-splicing riboregulators
The trans-splicing riboregulators were built by splitting the cis-splicing introns and fusing EGSs. Unless otherwise noted, the 5′ RNA and 3′ RNA were expressed by the Plux2 promoter (inducible by AHL) on pSB3K3 and pSB1A3, respectively. RBS30 (BBa_B0030) was normally used for the 5′ RNA generator.
To optimize the SENTR design, we first studied the effect of EGS lengths on trans-splicing activities. As a starting point, we used NUPACK to generate three EGSs at each length of 10, 20, 30, 40 and 50 nt and validated their performance. The best-performing candidate among the three EGSs at each length is shown in Extended Data Fig. 1b. We noticed that 10-nt EGSs could already produce obvious trans-splicing fluorescence and that SENTRs with 20-nt and 30-nt EGSs yielded the highest outputs. To further determine the optimal EGS lengths, we truncated the 50-nt EGSs in 10-nt steps (Extended Data Fig. 1c, top) and then in 5-nt steps from 30 nt (Extended Data Fig. 1c, bottom) to generate EGSs with length gradients. Among these EGSs, the 20-nt EGS performed best, encouraging us to build SENTR libraries with 20-nt EGSs. We also demonstrated that other EGS lengths could be exploited to expand the design space, by validating a 13-variant 30-nt EGS library (Extended Data Fig. 1d). Most 30-nt EGSs (8 of 13) yielded on–off ratios above 100, with four variants exceeding 1,000.
The designs of 5′ spacers and 3′ spacers were also investigated. The spacers (gray in Fig. 1a) separate the EGS duplexes from P1s to avoid direct interaction between the two components. In some previous reports, the 3′ spacers were engineered to form a P10 duplex with 3′ exons to increase trans-splicing efficiencies16,40. However, the effect of the P10 duplex was dependent on sequence context, as some constructs with P10 duplexes were reported to show poor efficiencies18. We designed length gradients of 5′ spacers and 3′ spacers with or without the P10 duplex and tested different combinations of these spacers. As shown in Extended Data Fig. 1e, SENTRs with 6-nt 5′ spacers and 3′ spacers forming 4-bp to 12-bp P10 outperformed the others.
Therefore, the final designs for SENTRs split at L1 were as follows (from 5′ to 3′ end):
5′ RNA: 5′ exon-NNNNNTAAANNNNNN-5′ EGS
3′ RNA: 3′ EGS-NNNNNNTTTGNNNNN-intron body-NNN-NNNNNN-3′ exon
(5′ P1, splice site T, 5′ P1 extension, 5′ spacer, 3′ P1 extension, splice site G, 3′ P1, 3′ P10)
For SENTRs from the TT intron split at other sites or from other introns, the EGSs were directly fused to intron halves without any spacers. The designs were as follows (from 5′ to 3′ end):
5′ RNA: 5′ exon-5′ intron half-5′ EGS
3′ RNA: 3′ EGS-3′ intron half-3′ exon
Computational sequence design by NUPACK
The 56-EGS library was computationally designed using the NUPACK online tool19. A 26-nt 5′ spacer–5′ EGS chimeric sequence and 20-nt 3′ EGS were used for designing the EGS duplex. The two RNA strands were designed to be unstructured on their own and form a 20-bp heteroduplex when present together. The sequence patterns AAAA, CCCC, GGGG, UUUU, KKKKK, MMMMM, RRRRR, SSSSS, WWWWW and YYYYY were prevented.
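The prevented patterns combine homopolymer runs (for example, AAAA) with runs of IUPAC degenerate codes (K = G/U, M = A/C, R = A/G, S = C/G, W = A/U, Y = C/U). As a minimal sketch of how a candidate sequence could be screened against such a list outside NUPACK, the helper below builds a regular expression from the pattern definitions. The function names and default run lengths are assumptions for illustration and are not taken from the deposited scripts.

```python
import re

# IUPAC degenerate nucleotide codes (RNA alphabet) used in the prevented patterns.
IUPAC = {"K": "GU", "M": "AC", "R": "AG", "S": "CG", "W": "AU", "Y": "CU"}

def build_prevented_regex(homopolymer_len=4, degenerate_len=5):
    """Compile one regex matching any prevented run.

    homopolymer_len: run length for AAAA/CCCC/GGGG/UUUU-style patterns.
    degenerate_len:  run length for KKKKK/MMMMM/...-style patterns.
    """
    parts = [f"{base}{{{homopolymer_len}}}" for base in "ACGU"]
    parts += [f"[{bases}]{{{degenerate_len}}}" for bases in IUPAC.values()]
    return re.compile("|".join(parts))

def has_prevented_pattern(seq, pattern=None):
    """Return True if the sequence (RNA or DNA letters) contains a prevented run."""
    pattern = pattern or build_prevented_regex()
    rna = seq.upper().replace("T", "U")
    return pattern.search(rna) is not None

print(has_prevented_pattern("GCAUGGGGAC"))            # contains GGGG
print(has_prevented_pattern("ACGUACGUACGUACGUACGU"))  # no prevented run
```

Passing `degenerate_len=6` switches the degenerate-code runs to six bases, for pattern lists that use the longer form.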
The orthogonal EGS library was designed using the multitube design program55 in the local NUPACK software, following the guideline for designing orthogonal pathways provided in the NUPACK 4.0 user guide (https://docs.nupack.org/advanced/#design-orthogonal-reaction-pathways). As described above, we defined the binding of EGS as a one-step reaction. In step 0, two strands remain unstructured; in step 1, the two strands bind to form the on-target duplex. Thus, to design orthogonal systems, there were two step tubes (step 0 tube and step 1 tube) per system and a global crosstalk tube. The orthogonal systems were then defined by loop in Python and designed with low global crosstalk. Sequence patterns AAAA, CCCC, GGGG, UUUU, MMMMMM, KKKKKK, WWWWWW, SSSSSS, RRRRRR and YYYYYY were prevented.
The trans-repressing trans-splicing devices were designed using the NUPACK online tool. For NIMPLY devices, sfGFP trans-splicing at s482 with EGS-2 was repressed by trRNAs. A 20-nt extended domain was fused to 5′ EGS-2 as a toehold for improved binding to trRNA and dissociation of the EGS duplex. For the IMPLY gate, trRNA bound to 3′ EGS-O1 and the 20-nt extended domain to inhibit the 3′ RNA-2 of trans-splicing at site 2. For the NOR gate, two extended domains were fused to 5′ EGS-O1 and 3′ EGS-O1 for the hybridization of two trRNAs to EGSs to repress the modular trans-splicing at site 2. The sequence patterns AAAA, CCCC, GGGG, UUUU, KKKKK, MMMMM, RRRRR, SSSSS, WWWWW and YYYYY were prevented.
The mRNA-sensing SENTR devices were implemented by designing extended 3′ EGSs complementary to 40-nt or 50-nt target sequences in the target mRNAs. The target regions in mCherry, cat and ampR are listed in Supplementary Table 8.
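Because the extended 3′ EGS must base-pair with its target region, its sequence is the reverse complement of that region. The following sketch shows this step under the assumption of perfect Watson–Crick complementarity (no G·U wobble pairs). The function names and the example transcript are hypothetical; the actual target regions are those listed in Supplementary Table 8.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def antisense_rna(target):
    """Return the RNA reverse complement of a target (DNA or RNA letters), 5' to 3'."""
    rna = target.upper().replace("T", "U")
    return rna.translate(COMPLEMENT)[::-1]

def sensing_egs(mrna, start, length=40):
    """Antisense sequence for a window of `length` nt starting at 0-based `start`."""
    window = mrna[start:start + length]
    if len(window) < length:
        raise ValueError("target window runs past the end of the transcript")
    return antisense_rna(window)

# Made-up transcript fragment, for illustration only.
example_mrna = "AUGGCUAGCAAAGGAGAAGAACUUUUCACUGGAGUUGUCCCAAUUCUUGUUGAA"
print(sensing_egs(example_mrna, start=3, length=40))
```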
All NUPACK scripts used in this study were deposited to Zenodo56. The EGSs were designed using default energy parameters and 37 °C.
Assessment of crosstalk levels
To quantify the orthogonality, we measured the trans-splicing fluorescence of all pairwise 3′ RNA–5′ RNA combinations. Crosstalk was then calculated by dividing the fluorescence obtained from the given pair of 5′ RNA and 3′ RNA by the fluorescence of the 3′ RNA with its cognate 5′ RNA. To assess the subset library orthogonality, the minimum dynamic range of the given library was determined as the reciprocal of the maximum crosstalk level inside the library1,4.
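The crosstalk calculation described above can be sketched as follows, assuming the fluorescence values are arranged in a square matrix with one row per 3′ RNA, one column per 5′ RNA and cognate pairs on the diagonal. The layout, function names and numbers are assumptions for illustration only.

```python
def crosstalk_matrix(fluor):
    """Normalize each row by its cognate (diagonal) value.

    fluor[i][j] is the fluorescence of 3' RNA i combined with 5' RNA j.
    """
    return [[value / row[i] for value in row] for i, row in enumerate(fluor)]

def minimum_dynamic_range(fluor):
    """Reciprocal of the largest off-diagonal (non-cognate) crosstalk value."""
    xt = crosstalk_matrix(fluor)
    n = len(xt)
    worst = max(xt[i][j] for i in range(n) for j in range(n) if i != j)
    return 1.0 / worst

# Hypothetical 3 x 3 measurement set (arbitrary units).
fluor = [
    [100.0, 2.0, 1.0],
    [4.0, 200.0, 2.0],
    [1.0, 5.0, 50.0],
]
print(crosstalk_matrix(fluor))
print(minimum_dynamic_range(fluor))
```

In this example the worst non-cognate pair (third row, second column) reaches 10% of its cognate signal, so the minimum dynamic range of the set is about 10.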
Total RNA extraction, DNase treatment and RT–qPCR
The bacterial strains transformed with plasmids were cultured in 200 μl of LB medium with appropriate antibiotics at 37 °C and 1,000 rpm overnight. They were diluted 100-fold into 200 μl of LB medium with appropriate antibiotics and inducers. The diluted culture was incubated at 37 °C and 1,000 rpm for 5 h and centrifuged at 12,000 rpm for 2 min. The cell pellets were resuspended in 1× TE buffer (Sangon Biotech) with proteinase K (39450-01-6, Macklin) and lysozyme (12650-88-3, Sigma-Aldrich) and incubated at 37 °C and 220 rpm for 10 min. The total RNA was extracted using the TaKaRa MiniBEST Universal RNA Extraction Kit (9767, TaKaRa). RNase-free DNase (M0303L, New England Biolabs) was used to digest the residual genomic DNA. A total of 1 μg of RNA was used for generating complementary DNA using the PrimeScript RT reagent kit (RR037A, TaKaRa) in 10-μl reaction systems through RT.
The ChamQ Blue universal SYBR qPCR master mix (Q312-02, Vazyme) was used for qPCR in 20-μl reaction systems. The samples were loaded onto a MicroAmp optical 96-well reaction plate (4358293, Thermo Fisher) sealed by MicroAmp optical adhesive film (4311971, Thermo Fisher). The qPCR reactions were run on a StepOnePlus real-time qPCR system (Applied Biosystems). The qPCR program is provided in Supplementary Table 29. The primers used for qPCR are provided in Supplementary Table 28. Typical standard curves for qPCR were generated using the 3K3-Plux2-sfGFP plasmid, as provided in Supplementary Fig. 3c.
Quantification and statistical analysis
For flow cytometry data analysis, there were two sets of controls: bacterial cells transformed with empty plasmid vectors and, thus, not expressing reporter genes (negative control, NC) and bacterial cells expressing reporter genes from the constitutive promoter J23101 (positive control). The representative genetic circuits for negative and positive controls are provided in Supplementary Fig. 2a. For data collected by flow cytometer, the events were gated by FSC and SSC and at least 10,000 events were analyzed in FlowJo 10.7.1. The detailed gating strategies, cell counts and fluorescence histograms for negative and positive controls are provided in Supplementary Figs. 12 and 13. The fluorescence intensity was calculated as the median fluorescence of a gated population. The mean fluorescence of the negative control was subtracted from each sample’s fluorescence value. The NC-normalized fluorescence value was then divided by the mean value of the positive control fluorescence to generate the fluorescence in RPU.
For plate reader data analysis, there were three sets of controls: a blank with medium, bacterial cells transformed with empty plasmid vectors (negative control) and bacterial cells expressing reporter genes from J23101 (positive control). For plate reader data, the fluorescence and OD600 were first corrected by subtracting the fluorescence and OD600 of the blank and the fluorescence–OD600 ratio was calculated for each well in Omega MARS 3.32 software (BMG Labtech). The mean fluorescence–OD600 ratio of the negative control was then subtracted from each well’s fluorescence–OD600 ratio and the NC-normalized fluorescence–OD600 ratio was divided by the mean value of the positive control fluorescence–OD600 ratio to generate the fluorescence–OD600 ratio in RPU.
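The plate reader normalization can be sketched in a few lines. The code follows the text literally: the blank is subtracted from fluorescence and OD600, the negative-control mean is subtracted from each ratio, and the result is divided by the mean positive-control ratio. Whether the positive-control ratio is itself NC-subtracted is not stated, so it is left unsubtracted here as an assumption. Function names and numbers are hypothetical.

```python
from statistics import mean

def blank_corrected_ratio(fluor, od600, blank_fluor, blank_od600):
    """Fluorescence/OD600 ratio of one well after subtracting the medium blank."""
    return (fluor - blank_fluor) / (od600 - blank_od600)

def to_rpu(sample_ratios, nc_ratios, pc_ratios):
    """Convert blank-corrected ratios to RPU.

    The NC mean is subtracted from each sample; the result is divided by the
    PC mean (assumption: the PC mean is used as is, without NC subtraction).
    """
    nc = mean(nc_ratios)
    pc = mean(pc_ratios)
    return [(ratio - nc) / pc for ratio in sample_ratios]

# Hypothetical readings, for illustration only.
well = blank_corrected_ratio(fluor=1100.0, od600=0.75, blank_fluor=100.0, blank_od600=0.25)
print(well)
print(to_rpu([600.0, 1100.0], nc_ratios=[100.0, 100.0], pc_ratios=[1000.0, 1000.0]))
```

The same `to_rpu` step applies to flow cytometry data if the per-sample median fluorescence is passed in place of the fluorescence–OD600 ratio.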
The means and s.d. of the fluorescence data were calculated and plotted in GraphPad Prism 8.1.2. In bar graphs, bars show the mean values and error bars represent the s.d. of at least n = 3 biological replicates. In line charts, points show the mean values and error bars represent the s.d. of n = 3 biological replicates. In the qPCR standard curve (Supplementary Fig. 3c), points show the mean values and error bars represent the s.d. of n = 3 technical replicates. The heat maps show the mean values of n = 3 biological replicates.
The s.d. of on–off ratios and true–false ratios were calculated by adopting equations from previous studies29,44. The s.d. of the on–off ratios was calculated as follows:
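A common first-order error-propagation form for the ratio of two independent means is given below. It is offered as an assumed sketch, not necessarily the authors' exact expression, with μ and σ denoting the mean and s.d. of the on and off states:

```latex
\sigma_{\mathrm{on/off}}
  = \frac{\mu_{\mathrm{on}}}{\mu_{\mathrm{off}}}
    \sqrt{\left(\frac{\sigma_{\mathrm{on}}}{\mu_{\mathrm{on}}}\right)^{2}
        + \left(\frac{\sigma_{\mathrm{off}}}{\mu_{\mathrm{off}}}\right)^{2}}
```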
To calculate the s.d. of the true–false ratios of logic gates, we used the mean and s.d. of the leakiest false states and lowest true states. The s.d. of the true–false ratios was calculated as follows:
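Under the same first-order propagation assumption, with μ and σ taken from the lowest true state (T) and the leakiest false state (F), the expression would read as follows. This is an assumed sketch, not necessarily the authors' exact formula:

```latex
\sigma_{\mathrm{true/false}}
  = \frac{\mu_{T}}{\mu_{F}}
    \sqrt{\left(\frac{\sigma_{T}}{\mu_{T}}\right)^{2}
        + \left(\frac{\sigma_{F}}{\mu_{F}}\right)^{2}}
```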
Two-tailed Welch’s t-tests were performed using Prism. The significance and P values are described in the figure legends (*P < 0.05, **P < 0.01 and ***P < 0.001). Linear regression was performed in Prism to determine the correlation between the experiment and the prediction results of ML. All statistical analysis results are provided on Zenodo56.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The data that support the findings of this study were deposited at Zenodo56 and are publicly available as of the date of publication.
Code availability
All original code and NUPACK scripts were deposited at Zenodo56 and are publicly available as of the date of publication.
References
Green, A. A., Silver, P. A., Collins, J. J. & Yin, P. Toehold switches: de-novo-designed regulators of gene expression. Cell 159, 925–939 (2014).
Xiang, Y., Dalchau, N. & Wang, B. Scaling up genetic circuit design for cellular computing: advances and prospects. Nat. Comput. 17, 833–853 (2018).
Zhao, E. M. et al. RNA-responsive elements for eukaryotic translational control. Nat. Biotechnol. 40, 539–545 (2021).
Kim, J. et al. De novo-designed translation-repressing riboregulators for multi-input cellular logic. Nat. Chem. Biol. 15, 1173–1182 (2019).
Chappell, J., Takahashi, M. K. & Lucks, J. B. Creating small transcription activating RNAs. Nat. Chem. Biol. 11, 214–220 (2015).
Chappell, J., Westbrook, A., Verosloff, M. & Lucks, J. B. Computational design of small transcription activating RNAs for versatile and dynamic gene regulation. Nat. Commun. 8, 1051 (2017).
Win, M. N. & Smolke, C. D. Higher-order cellular information processing with synthetic RNA devices. Science 322, 456–460 (2008).
Xie, Z., Wroblewska, L., Prochazka, L., Weiss, R. & Benenson, Y. Multi-input RNAi-based logic circuit for identification of specific cancer cells. Science 333, 1307–1311 (2011).
Ghodasara, A. & Voigt, C. A. Balancing gene expression without library construction via a reusable sRNA pool. Nucleic Acids Res. 45, 8116–8127 (2017).
Li, Y., Teng, X., Zhang, K., Deng, R. & Li, J. RNA strand displacement responsive CRISPR/Cas9 system for mRNA sensing. Anal. Chem. 91, 3989–3996 (2019).
Cox, D. B. T. et al. RNA editing with CRISPR–Cas13. Science 358, 1019–1027 (2017).
Qian, Y. et al. Programmable RNA sensing for cell monitoring and manipulation. Nature 610, 713–721 (2022).
Green, A. A. et al. Complex cellular logic computation using ribocomputing devices. Nature 548, 117–121 (2017).
Sullenger, B. A. & Cech, T. R. Ribozyme-mediated repair of defective mRNA by targeted trans-splicing. Nature 371, 619–622 (1994).
Lan, N., Howrey, R. P., Lee, S. W., Smith, C. A. & Sullenger, B. A. Ribozyme-mediated repair of sickle β-globin mRNAs in erythrocyte precursors. Science 280, 1593–1596 (1998).
Byun, J., Lan, N., Long, M. & Sullenger, B. A. Efficient and specific repair of sickle-globin RNA by trans-splicing ribozymes. RNA 9, 1254–1263 (2003).
Kwon, B. S. et al. Intracellular efficacy of tumor-targeting group I intron-based trans-splicing ribozyme. J. Gene Med. 13, 89–100 (2011).
Olson, K. E. & Muller, U. F. An in vivo selection method to optimize trans-splicing ribozymes. RNA 18, 581–589 (2012).
Zadeh, J. N. et al. NUPACK: analysis and design of nucleic acid systems. J. Comput. Chem. 32, 170–173 (2011).
Michel, F., Hanna, M., Green, R., Bartel, D. P. & Szostak, J. W. The guanosine binding site of the Tetrahymena ribozyme. Nature 342, 391–395 (1989).
Angenent-Mari, N. M., Garruss, A. S., Soenksen, L. R., Church, G. & Collins, J. J. A deep learning approach to programmable RNA switches. Nat. Commun. 11, 5057 (2020).
Valeri, J. A. et al. Sequence-to-function deep learning frameworks for engineered riboregulators. Nat. Commun. 11, 5058 (2020).
Baum, D. A., Sinha, J. & Testa, S. M. Molecular recognition in a trans excision-splicing ribozyme: non-Watson–Crick base pairs at the 5′ splice site and ωG at the 3′ splice site can play a role in determining the binding register of reaction substrates. Biochemistry 44, 1067–1077 (2005).
Doudna, J. A., Cormack, B. P. & Szostak, J. W. RNA structure, not sequence, determines the 5′ splice-site specificity of a group I intron. Proc. Natl Acad. Sci. USA 86, 7402–7406 (1989).
Strobel, S. & Cech, T. Minor groove recognition of the conserved G·U pair at the Tetrahymena ribozyme reaction site. Science 267, 675–679 (1995).
Rhodius, V. A. et al. Design of orthogonal genetic switches based on a crosstalk map of sigmas, anti-sigmas, and promoters. Mol. Syst. Biol. 9, 703 (2013).
Wang, B. & Buck, M. Rapid engineering of versatile molecular logic gates using heterologous genetic transcriptional modules. Chem. Commun. 50, 11642–11644 (2014).
Stanton, B. C. et al. Genomic mining of prokaryotic repressors for orthogonal logic gates. Nat. Chem. Biol. 10, 99–105 (2014).
Liu, Y., Wan, X. & Wang, B. Engineered CRISPRa enables programmable eukaryote-like gene activation in bacteria. Nat. Commun. 10, 3693 (2019).
Ho, T. Y. H. et al. A systematic approach to inserting split inteins for Boolean logic gate engineering and basal activity reduction. Nat. Commun. 12, 2200 (2021).
Pinto, F., Thornton, E. L. & Wang, B. An expanded library of orthogonal split inteins enables modular multi-peptide assemblies. Nat. Commun. 11, 1529 (2020).
Moon, T. S., Lou, C., Tamsir, A., Stanton, B. C. & Voigt, C. A. Genetic programs constructed from layered logic gates in single cells. Nature 491, 249–253 (2012).
Reis, A. C. & Salis, H. M. An automated model test system for systematic development and improvement of gene expression models. ACS Synth. Biol. 9, 3145–3156 (2020).
Gambill, L., Staubus, A., Mo, K., Ameruoso, A. & Chappell, J. A split ribozyme that links detection of a native RNA to orthogonal protein outputs. Nat. Commun. 14, 543 (2023).
Ayre, B. G., Köhler, U., Turgeon, R. & Haseloff, J. Optimization of trans-splicing ribozyme efficiency and specificity by in vivo genetic selection. Nucleic Acids Res. 30, e141 (2002).
Hasegawa, S., Gowrishankar, G. & Rao, J. Detection of mRNA in mammalian cells with a split ribozyme reporter. ChemBioChem 7, 925–928 (2006).
Chen, R. et al. Engineering circular RNA for enhanced protein production. Nat. Biotechnol. 41, 262–272 (2022).
Wesselhoeft, R. A., Kowalski, P. S. & Anderson, D. G. Engineering circular RNA for potent and stable translation in eukaryotic cells. Nat. Commun. 9, 2629 (2018).
Köhler, U., Ayre, B. G., Goodman, H. M. & Haseloff, J. trans-Splicing ribozymes for targeted gene delivery. J. Mol. Biol. 285, 1935–1950 (1999).
Kwon, B. S. et al. Specific regression of human cancer cells by ribozyme-mediated targeted replacement of tumor-specific transcript. Mol. Ther. 12, 824–834 (2005).
Liu, Z. X. et al. Hydrolytic endonucleolytic ribozyme (HYER) is programmable for sequence-specific DNA cleavage. Science 383, eadh4859 (2024).
Durrant, M. G. et al. Bridge RNAs direct programmable recombination of target and donor DNA. Nature 630, 984–993 (2024).
Lienert, F. et al. Two- and three-input TALE-based AND logic computation in embryonic stem cells. Nucleic Acids Res. 41, 9967–9975 (2013).
Nadimi, M., Beaudet, D., Forget, L., Hijri, M. & Lang, B. F. Group I intron-mediated trans-splicing in mitochondria of Gigaspora rosea and a robust phylogenetic affiliation of arbuscular mycorrhizal fungi with Mortierellales. Mol. Biol. Evol. 29, 2199–2210 (2012).
Jillette, N., Du, M., Zhu, J. J., Cardoz, P. & Cheng, A. W. Split selectable markers. Nat. Commun. 10, 4968 (2019).
Liu, Y. et al. Reprogrammed tracrRNAs enable repurposing of RNAs as crRNAs and sequence-specific RNA biosensors. Nat. Commun. 13, 1937 (2022).
Gander, M. W., Vrana, J. D., Voje, W. E., Carothers, J. M. & Klavins, E. Digital logic circuits in yeast with CRISPR–dCas9 NOR gates. Nat. Commun. 8, 15459 (2017).
Nielsen, A. A. K. et al. Genetic circuit design automation. Science 352, aac7341 (2016).
Chen, Z. et al. De novo design of protein logic gates. Science 368, 78–84 (2020).
Gao, X. J., Chong, L. S., Kim, M. S. & Elowitz, M. B. Programmable protein circuits in living cells. Science 361, 1252–1258 (2018).
Meyer, A. J., Segall-Shapiro, T. H., Glassey, E., Zhang, J. & Voigt, C. A. Escherichia coli ‘Marionette’ strains with 12 highly optimized small-molecule sensors. Nat. Chem. Biol. 15, 196–204 (2019).
Beyer, H. M. et al. AQUA cloning: a versatile and simple enzyme-free cloning approach. PLoS ONE 10, e0137652 (2015).
Abedi, M. R., Caponigro, G. & Kamb, A. Green fluorescent protein as a scaffold for intracellular presentation of peptides. Nucleic Acids Res. 26, 623–630 (1998).
Che, A. J. Engineering RNA Logic with Synthetic Splicing Ribozymes. PhD thesis, Massachusetts Institute of Technology (2009).
Wolfe, B. R., Porubsky, N. J., Zadeh, J. N., Dirks, R. M. & Pierce, N. A. Constrained multistate sequence design for nucleic acid reaction pathway engineering. J. Am. Chem. Soc. 139, 3134–3144 (2017).
Gao, Y. et al. Programmable trans-splicing riboregulators for complex cellular logic computation. Zenodo https://doi.org/10.5281/zenodo.13743081 (2024).
Ma, D. et al. Multi-arm RNA junctions encoding molecular logic unconstrained by input sequence for versatile cell-free diagnostics. Nat. Biomed. Eng. 6, 298–309 (2022).
Acknowledgements
We thank Y. Liu, X. Wan and F. Pinto for providing plasmid constructs and helpful suggestions. We thank R. Grima and S. Granneman for their support and advice. We thank C. Voigt (MIT) for providing the Marionette-Wild E. coli strain. This work was supported by the National Key R&D Program of China (2023YFF1204500 to B.W.), the ‘Pioneer’ and ‘Leading Goose’ R&D Program of Zhejiang (2024C03011 to B.W.), the National Natural Science Foundation of China (32271475 and 32320103001 to B.W.), the Fundamental Research Funds for the Central Universities (226-2022-00214 to B.W.) and the Kunpeng Action Program Award of Zhejiang Province (to B.W.). Y.G. acknowledges support by a Darwin Trust of Edinburgh scholarship.
Ethics declarations
Competing interests
B.W. and Y.G. have filed a patent application (number CN2024105834200) based on the presented work. The other authors declare no competing interests.
Peer review
Peer review information
Nature Chemical Biology thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Characterizing, analyzing, and optimizing SENTR performance.
a, In vivo fluorescence of 1369 combinations of 5′ RNA-3′ RNA pairs. All 1296 combinations of 5′ RNAs with 36 5′ EGSs and 3′ RNAs with 36 3′ EGSs were validated, and another 73 control combinations, in which the 5′ RNA lacked a 5′ EGS, the 3′ RNA lacked a 3′ EGS, or both, were also characterized. b, SENTR performance with different lengths of EGSs. EGSs of each length were designed with NUPACK in three trials, and the best-performing design of each length is shown. c, SENTR performance across an EGS length gradient. The 50 nt EGS was shortened in steps of 10 nt (upper panel) and then in steps of 5 nt (lower panel). d, ON-state and OFF-state fluorescence (upper panel) and ON/OFF ratios (lower panel) of the 30 nt EGS library. e, The effect of the 5′ spacer and 3′ spacer on SENTR performance. The 3′ spacer was designed either to form P10 with the 3′ exon or not. Different lengths of 5′ spacers and 3′ spacers were tested. RPU, relative promoter units. Data in (a-e) were collected by flow cytometry. For (d), ON-state fluorescence was measured from cells expressing 5′ RNA and 3′ RNA, and OFF-state fluorescence was measured from cells expressing only 3′ RNA. For (b-d), bars show mean values and error bars represent s.d. of n = 3 biological replicates. For (a, e), the heat maps show mean values of n = 3 biological replicates. Bacteria transformed with empty vectors and J23101-sfGFP were used as negative and positive controls for calculating fluorescence values in RPU.
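The legend states that empty-vector and J23101-sfGFP cells serve as negative and positive controls for expressing fluorescence in RPU, but it does not give the formula. The Python sketch below assumes the commonly used convention of background subtraction followed by normalization to the reference promoter. The function name `to_rpu` and the numeric readings are hypothetical and serve only as illustration.

```python
def to_rpu(sample, negative, positive):
    """Convert a raw fluorescence reading to relative promoter units (RPU).

    Assumed formula (not stated in the legend): subtract the empty-vector
    background from the sample, then divide by the background-subtracted
    J23101 reference signal.
    """
    reference = positive - negative
    if reference <= 0:
        raise ValueError("positive control must exceed the negative control")
    return (sample - negative) / reference


# Hypothetical readings in arbitrary units, for illustration only.
negative_control = 100.0   # cells with empty vectors
positive_control = 2100.0  # cells with J23101-sfGFP
rpu = to_rpu(1100.0, negative_control, positive_control)
print(rpu)
```

Under this assumed convention, a sample that matches the empty-vector control maps to 0 RPU and one that matches the J23101 reference maps to 1 RPU.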
Extended Data Fig. 2 Evaluation of machine-learning (ML) models.
a, Two strategies to split the dataset into the training set (purple) and the evaluation set (blue). Left panel shows the random splitting strategy, in which randomly selected data were used to train the models and the remaining data were used to test them. Right panel shows the systematic splitting strategy, in which the data in quadrant 1 (Q1) were used to train the models and those in Q2, Q3, or Q4 were used to test them. b-c, R² values between experimental and predicted values for ML models trained on sequence motifs (b) and thermodynamic parameters (c) using 16-fold cross-validation. Upper panels show R² values of models trained on the randomly split dataset. Lower panels show R² values of models trained on the systematically split dataset, evaluated on data points in quadrant 2 (Q2), quadrant 3 (Q3), and quadrant 4 (Q4). The linear regression results in the lower panels of (b) are not plotted because of very poor R² values. For (b-c), data are presented as mean values ± SD of n = 16 cross-validation folds. Three models (Neural Network, Bayesian Regression, and Ridge Regression) are omitted from panel (c) because of their poor performance in both scenarios.
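The two splitting strategies in (a) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the legend does not define how quadrants are drawn, so the sketch assumes that each EGS axis of the 36 × 36 pair grid is cut in half, and the Q2/Q3 labeling and the 25% training fraction for the random split are hypothetical choices.

```python
import random


def random_split(pairs, train_fraction, seed=0):
    """Random strategy: shuffle all pairs, train on a fraction, test on the rest."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def quadrant(pair, n_five, n_three):
    """Assumed labeling: cut each EGS axis in half to obtain four blocks."""
    i, j = pair
    low_five = i < n_five // 2
    low_three = j < n_three // 2
    if low_five and low_three:
        return "Q1"
    if low_three:
        return "Q2"
    if low_five:
        return "Q3"
    return "Q4"


def systematic_split(pairs, n_five, n_three):
    """Systematic strategy: train on Q1 only; keep Q2, Q3 and Q4 as separate test sets."""
    groups = {"Q1": [], "Q2": [], "Q3": [], "Q4": []}
    for pair in pairs:
        groups[quadrant(pair, n_five, n_three)].append(pair)
    train = groups.pop("Q1")
    return train, groups


# 36 x 36 grid of (5' EGS index, 3' EGS index) pairs, matching the library size.
pairs = [(i, j) for i in range(36) for j in range(36)]
train_random, test_random = random_split(pairs, train_fraction=0.25)
train_q1, test_by_quadrant = systematic_split(pairs, 36, 36)
```

Under the assumed quadrant definition, test pairs in Q4 share neither their 5′ EGS nor their 3′ EGS with any training pair, so that block probes extrapolation to unseen sequences, whereas a random split lets most test pairs reuse EGSs seen during training.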
Extended Data Fig. 3 Designing EGSs and intronic sequence components of SENTRs.
a, Comparison of orthogonal library size and library dynamic range for SENTRs and previous riboregulators. LIRA, loop-initiated RNA activators57. 3WJ, three-way junction. b-c, Truncating P6b (b) or P9 (c) to eliminate the trans-splicing activity of SENTRs without EGSs. w/o EGS: without EGS; w/ EGS: with EGS. d-e, Schematic of cis-splicing constructs with different P1s (d) and introns (e). The 5′ P1 sequence (d) and the native exon-intron junction sequence (e) remain in the mature RNA. f, Cis-splicing activities of the TT intron with native junction sequences at four splice sites of sfGFP. The junction sequences were also inserted without introns to mimic the translation of splicing products. g, Effect of native exon-intron junction sequences from 17 introns on sfGFP fluorescence. h, Cis-splicing activities of 17 introns. The Thy intron was truncated to remove the open reading frame (ORF) in the P8 region. Data in (b-c) were collected by plate reader. Data in (f-h) were collected by flow cytometry. For (b-c), ON-state fluorescence was measured from cells expressing 5′ RNA and 3′ RNA, and OFF-state fluorescence was measured from cells expressing only 3′ RNA. For (b-c, f-h), bars show mean values and error bars represent s.d. of n = 3 biological replicates. Bacteria transformed with empty vectors and J23101-sfGFP were used as negative and positive controls for calculating fluorescence in RPU.
Extended Data Fig. 4 Robust, tunable, and modular SENTR for gene regulation.
a, Trans-splicing performance for 16 base pair combinations at the 5′ splice site. b, Choices of 5′ splice sites inside the 5′ UTR (sites −6 and −1) or the start codon (site 2). c, Design schematic of the mRNA sensor. The mRNA binds to the 3′ EGS to repress the 3′ RNA, thus lowering the trans-splicing fluorescence. d, Transfer function for the mCherry sensor (blue curve, left y-axis) and mCherry (purple curve, right y-axis) as a function of rhamnose concentration. The sensor expression was induced by 3.2 µM AHL, and mCherry expression was induced by rhamnose. e, Splice site selection and performance for sgRNA trans-splicing. Combinations of RNA strands were obtained by inducing (+) or not inducing (-) expression of the corresponding RNA, or by omitting (∅) the corresponding RNA generator from the circuit. The active (+) and deactivated (-) introns were used to assay the sgRNA activity without trans-splicing reactions. Inset, sgRNA design and 5′ splice site. f, Genetic design architecture of the modular SENTR. Yellow rectangles, split intron halves. Orange rectangles, EGSs. g, Tuning modular SENTRs with seven RBSs and four induction levels of 5′ RNAs. h, Modular SENTR from the SunY intron. The split SunY intron, with its native junction sequences, was inserted into codon 1 of the sfGFP CDS. Data in (a, d-e, g-h) were collected by flow cytometry. For (a, g), the heat maps show mean values of n = 3 biological replicates. For (e, h), bars show mean values and error bars represent s.d. of n = 3 biological replicates. For (d), the points show mean values and error bars represent s.d. of n = 3 biological replicates. Bacteria transformed with empty vectors and J23101-sfGFP/mCherry were used as negative and positive controls for calculating fluorescence values in relative promoter units (RPU).
Extended Data Fig. 5 RNA-splicing only logic gates.
a-c, Fluorescence for the two-input AND gates from ECF20 (a), sfGFP (b), and mCherry (c). d-e, Fluorescence for the two-input NAND gates from LmrA, PhlF (d), and BM3R1 (e). f, Fluorescence for the two-input NIMPLY gates. g, Fluorescence for the three-input NAND gate from BM3R1. h, Fluorescence for the four-input AND gate. Inset, fluorescence levels on logarithmic scales. Data in (a-b, d-h) were collected by flow cytometry. Data in (c) were collected by plate reader. Bars and error bars represent mean values and s.d. of n = 6 biological replicates in (g) and of n = 3 biological replicates in the other panels. Bacteria transformed with empty vectors and J23101-sfGFP/mCherry were used as negative and positive controls for calculating fluorescence values in relative promoter units (RPU). The fold changes of logic gates were calculated by dividing the lowest TRUE-state fluorescence value by the highest FALSE-state fluorescence value, and are labeled above the bars.
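The fold-change definition used for the logic gates (lowest TRUE-state value divided by highest FALSE-state value) can be illustrated with a short Python sketch. The function name `gate_fold_change` and the RPU values are hypothetical and are not taken from the source data.

```python
def gate_fold_change(outputs, truth_table):
    """Worst-case fold change of a logic gate.

    outputs:     maps each input state to its measured output (for example, RPU).
    truth_table: maps each input state to the expected Boolean output.
    Returns the lowest TRUE-state output divided by the highest FALSE-state output.
    """
    true_values = [outputs[s] for s, expected in truth_table.items() if expected]
    false_values = [outputs[s] for s, expected in truth_table.items() if not expected]
    if not true_values or not false_values:
        raise ValueError("truth table needs at least one TRUE and one FALSE state")
    return min(true_values) / max(false_values)


# Hypothetical two-input AND gate: only the (1, 1) state is TRUE.
and_table = {(0, 0): False, (0, 1): False, (1, 0): False, (1, 1): True}
and_outputs = {(0, 0): 0.125, (0, 1): 0.25, (1, 0): 0.5, (1, 1): 4.0}
and_fold = gate_fold_change(and_outputs, and_table)

# Hypothetical two-input NAND gate: only the (1, 1) state is FALSE.
nand_table = {state: not expected for state, expected in and_table.items()}
nand_outputs = {(0, 0): 4.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 0.25}
nand_fold = gate_fold_change(nand_outputs, nand_table)
```

Because the metric compares the weakest TRUE state with the leakiest FALSE state, it is a conservative, worst-case measure: one leaky input combination lowers the reported fold change even when all other states are well separated.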
Extended Data Fig. 6 Split biomolecule enabled AND gates.
a, Design schematic of split biomolecule enabled three-input AND gate from ECF20. The 3′ RNA and 5′ RNA splice to produce mRNA encoding ECF20N-M86N, which splices with ECF20C-M86C to produce functional ECF20 protein. ECF20 activates downstream GFP expression. b, Fluorescence for split biomolecule enabled three-input AND gate from ECF20. Inset, fluorescence levels on logarithmic scales. Fold change: 102. c, Fluorescence for split biomolecule enabled three-input AND gate from mCherry. Fold change: 22. d, Fold changes of split biomolecule enabled four-input AND gate library. EGSs for RNA splicing of each lobe are from the orthogonal EGS library. RBSs for C-lobe translation are from the iGEM RBS collection. RBS29, BBa_B0029. RBS32, BBa_B0032. RBS30 (BBa_B0030) was used for N-lobe peptide translation. The fluorescence for each gate is provided in the source data file. e, Fluorescence for split biomolecule enabled four-input AND gates from ECF16. Fold change (left to right): 5, 5, 4. f, Design schematic of split intein enabled three-input AND gate from ECF20. Three peptides (ECF20N-M86N, M86C-ECF20M-sspDnaXN, and sspDnaXC-ECF20C) splice into functional ECF20 protein. g, Fluorescence for split intein enabled three-input AND gate from ECF20. Fold change: 13. Data in (b-c, e, g) were collected by flow cytometry, and bars show mean values and error bars represent s.d. of n = 3 biological replicates. Bacteria transformed with empty vectors and J23101-sfGFP were used as negative and positive controls for calculating fluorescence values in relative promoter units (RPU). The fold changes of logic gates were calculated by dividing the lowest TRUE-state fluorescence value by the highest FALSE-state fluorescence value.
Extended Data Fig. 7 Split biomolecule enabled NAND gates.
a, Design schematic of split biomolecule enabled three-input NAND gate from LmrA. The 3′ RNA and 5′ RNA splice to produce mRNA encoding LmrAN-sspGyrBN, which splices with LmrAC-sspGyrBC to produce functional LmrA protein. LmrA represses downstream GFP expression. b, Fluorescence for split biomolecule enabled three-input NAND gates from LmrA. Fold change: gate 1 (left), 51; gate 2 (right), 26. c, Design schematic of split biomolecule enabled three-input NAND gate from BM3R1. The 3′ RNA and 5′ RNA splice to produce mRNA encoding BM3R1C-Cth-TerC, which splices with BM3R1N-Cth-TerN to produce functional BM3R1 protein. BM3R1 represses downstream GFP expression. d, Fluorescence for split biomolecule enabled three-input NAND gates from BM3R1. Fold change: gate 1 (left), 15; gate 2 (right), 14. e, Fluorescence for the second split biomolecule enabled four-input NAND gate from LmrA. Fold change: 46. Data in (b, d, e) were collected by flow cytometry, and bars show mean values and error bars represent s.d. of n = 3 biological replicates. Bacteria transformed with empty vectors and J23101-sfGFP were used as negative and positive controls for calculating fluorescence values in relative promoter units (RPU). The fold changes of logic gates were calculated by dividing the lowest TRUE-state fluorescence value by the highest FALSE-state fluorescence value.
Supplementary information
Supplementary Information
Supplementary Figs. 1–13, Tables 1–30 and Notes 1–3.
Supplementary Data 1
Sequences of the plasmids constructed in this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gao, Y., Mardian, R., Ma, J. et al. Programmable trans-splicing riboregulators for complex cellular logic computation. Nat Chem Biol (2025). https://doi.org/10.1038/s41589-024-01781-4
DOI: https://doi.org/10.1038/s41589-024-01781-4