
July 2023

Codifying a ChatGPT workflow into a malleable GUI

Wouldn't it be neat if you could use LLMs to create little personal utility apps as the need arises? Here's a story where I did just that...

In my previous post, Malleable software in the age of LLMs, I laid out a theory for how LLMs might enable a new era of people creating their own personal software:

I think it’s likely that soon all computer users will have the ability to develop small software tools from scratch, and to describe modifications they’d like made to software they’re already using.

In other words, LLMs will represent a step change in tool support for end-user programming: the ability of normal people to fully harness the general power of computers without resorting to the complexity of normal programming. Until now, that vision has been bottlenecked on turning fuzzy informal intent into formal, executable code; now that bottleneck is rapidly opening up thanks to LLMs.

Today I’ll share a real example where I found it useful to build custom personal software with an LLM. Earlier this week, I used GPT-4 to code an app that helps me draft text messages in English and translate them to Japanese. The basic idea: I paste in the context for the text thread and write my response in English; I get back a translation into Japanese. The app has a couple other neat features, too: I can drag a slider to tweak the formality of the language, and I can highlight any phrase to get a more detailed explanation.

The whole thing is ugly and thrown together in no time, but it has exactly the features I need, and I’ve found it quite useful for planning an upcoming trip to Japan.

The app uses the GPT-4 API to do the actual translations. So there are two usages of LLMs going on here: I used an LLM to code the app, and then the app also uses an LLM when it runs to do the translations. Sorry if that’s confusing, 2023 is weird.

You may ask: why bother making an app for this? Why not just ask ChatGPT to do the translations? I’m glad you asked—that’s what this post is all about! In fact, I started out doing these translations in ChatGPT, but I ended up finding this GUI nicer to use than raw ChatGPT for several reasons:

  • It encodes a prescriptive workflow so I don’t need to fuss with prompts as much.
  • It offers convenient direct manipulation affordances like text boxes and sliders.
  • It makes it easier to share a workflow with other people.

(Interestingly, these are similar to the reasons that so many startups are building products wrapping LLM prompts—the difference here is that I’m just building the tool for myself, and not trying to make a product.)

A key point is that making this personal GUI is only worth it because GPT also lowers the cost of making and iterating on the GUI! Even though I’m a programmer, I wouldn’t have made this tool without LLM support. It’s not only the time savings, it’s also the fact that I don’t need to turn on my “programmer brain” to make these tools; I can think at a higher level and let the LLM handle the details.

There are also tradeoffs to consider when moving from ChatGPT into a GUI tool: the resulting workflow is more rigid and less open-ended than a ChatGPT session. In a sense this is the whole point of a GUI. But the GUI isn’t necessarily as limiting as it might seem, because remember, it’s malleable: I built it myself using GPT and can quickly make further edits. This is a very different situation than using a fixed app that someone else made! Below I’ll share one example of how I edited this tool on the fly as I was using it.

Overall I think this experience suggests an intriguing workflow of codifying a ChatGPT workflow into a malleable GUI: starting out with ChatGPT, exploring the most useful way to solve a task, and then once you’ve landed on a good approach, codifying that approach in a GUI tool that you can use in a repeatable way going forward.

Alright, on to the story of how this app came about.


ChatGPT is a good translator (usually 🙃)

I’m going on a trip to Japan soon and have been on some text threads where I need to communicate in Japanese. I grew up in Japan but my writing is rusty and painfully slow these days. One particular challenge for me is using the appropriate level of formality with extended family and other family acquaintances—I have fluent schoolyard Japanese but the nuances of formal grown-up Japanese can be tricky.

I started using ChatGPT to make this process faster by asking it to produce draft messages in Japanese based on my English input. I quickly realized there are some neat benefits to ChatGPT vs. a traditional translation app. I can give it the full context of the text thread so it can incorporate that into its translation. I can steer it with prompting: asking it to tweak the formality or do a less word-for-word translation. I can ask follow-up questions about the meaning of a word. These capabilities were all gamechangers for this task; they really show why smart chatbots can be so useful!

You may be wondering: how good were the translations? I’d say: good enough to be spectacularly useful to me, given that I can verify and edit. Often they were basically perfect. Sometimes they were wrong in huge, hilarious ways—flipping the meaning of a sentence, or swapping the name of a train station for another one (sigh, LLMs…).

In practice these mistakes didn’t matter too much though. I’m slow at writing in Japanese but can read basic messages easily, so I just fix the errors and they aren’t dealbreakers. When creation is slow and verification is fast, it’s a sweet spot for using an LLM.

Honing the workflow

As I translated more messages and saw ways that the model failed, I developed some little prompting tricks that seemed to produce better translations. Things like this:

Below is some context for a text message thread:

…paste thread…

Now translate my message below to japanese. make it sound natural in the flow of this conversation. don’t translate word for word, translate the general meaning.

…write message…
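As a rough illustration, the fill-in-the-blanks pattern above could be captured in a small function. This is only a sketch: the name `buildTranslationPrompt` and its parameters are made up for this example and are not taken from the app's actual code.

```typescript
// Hypothetical helper: fills the two blanks in the prompt pattern above.
// The function and parameter names are illustrative, not from the real app.
function buildTranslationPrompt(thread: string, message: string): string {
  return [
    "Below is some context for a text message thread:",
    "",
    thread.trim(), // the pasted conversation so far
    "",
    "Now translate my message below to japanese. " +
      "make it sound natural in the flow of this conversation. " +
      "don't translate word for word, translate the general meaning.",
    "",
    message.trim(), // the English reply to translate
  ].join("\n");
}
```

Once the prompt lives in a function like this, there is no more digging up the text and copy-pasting it into a chat window for every message.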

I also learned some typical follow-up requests I would often make after receiving the initial translation: things like asking to adjust the formality level up or down.

Once I had landed on these specific prompt patterns, it made my interactions more scripted. Each time I would need to dig up my prompt text for this task, copy-paste it in, and fill in the blanks for this particular translation. When asking follow-up questions I’d also copy-paste phrasings from previous chats that had proven successful. At this point it didn’t feel like an open-ended conversation anymore; it felt like I was tediously executing a workflow made up of specific chat prompts.

I also found myself wanting more of the feeling of a solid tool that I could return to. ChatGPT chats feel a bit amorphous and hard to return to: where do I store my prompts? How do I even remember what useful workflows I’ve come up with? I basically wanted a window I could pop open and get a quick translation.

Making a GUI with GPT

So, I asked GPT-4 to build me a GUI codifying this workflow. The app is a frontend-only React.js web app. It’s hosted on Replit, which makes it easy to spin up a new project in one click and then share a link with people. (You can see the current code here if you’re curious.) I just copy-pasted the GPT-generated code into Replit.

The initial version of the app was very simple: it basically just accepted a text input and then made a request to the GPT-4 API asking for a natural-sounding translation. The early designs generated by ChatGPT were super primitive:

Asking it for a “professional and modern” redesign helped get the design looking passable. I then asked GPT to add a formality slider to the app. The new app requests three translations of varying formality, and then lets the user drag a slider to instantly choose between them 😎
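The slider logic can be sketched in a few lines. This assumes three formality levels with made-up labels and a slider that reports positions 0 through 2; the real app's names and data shapes may well differ.

```typescript
// Assumption: three formality levels, ordered from least to most formal.
// The labels are illustrative only.
const FORMALITY_LEVELS = ["casual", "polite", "formal"] as const;
type Formality = (typeof FORMALITY_LEVELS)[number];

// One translation per level, requested up front so that dragging the
// slider never has to wait on another API call.
type TranslationSet = Record<Formality, string>;

// Map a slider position (0, 1, 2) to a level, clamping out-of-range values.
function levelForSlider(position: number): Formality {
  const safe = Number.isFinite(position) ? Math.round(position) : 0;
  const index = Math.min(FORMALITY_LEVELS.length - 1, Math.max(0, safe));
  return FORMALITY_LEVELS[index];
}

// Look up the already-fetched translation for the current slider position.
function pickTranslation(translations: TranslationSet, position: number): string {
  return translations[levelForSlider(position)];
}
```

Fetching all three variants before the slider is touched is what makes the switch feel instant. The price is more tokens per translation.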

GPT-4 did most of the coding of the UI. I didn’t measure how long it took, but subjectively, the whole thing felt pretty effortless; it felt more like asking a friend to build an app for me than building it myself, and I never engaged my detailed programmer brain. I still haven’t looked very closely at the code. GPT generally produced good results on every iteration. At one point it got confused about how to call the OpenAI API, but pasting in some recent documentation got it sorted out. I’ve included some of the coding prompts I used at the bottom of this post if you’re curious about the details.

At the same time, it’s important to note that my programming background did substantially help the process along and I don’t think it would have gone that well if I didn’t know how to make React UIs. I was able to give the LLM a detailed spec, which was natural for me to write. For example: I suggested storing the OpenAI key as a user-provided setting in the app UI rather than putting it in the code, because that would let us keep the app frontend-only. I also helped fix some minor bugs.
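Here is a minimal sketch of that user-provided key setting. The storage key name and helper names are hypothetical, and the small `KeyValueStore` interface exists only so the helpers can be tried outside a browser. In the app itself, `window.localStorage` has this shape.

```typescript
// Minimal subset of the browser's Storage interface. In a browser,
// window.localStorage can be passed wherever a KeyValueStore is expected.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical slot name; the real app may use a different one.
const API_KEY_SLOT = "openai-api-key";

// Persist the key the user typed into the settings pane.
function saveApiKey(store: KeyValueStore, apiKey: string): void {
  store.setItem(API_KEY_SLOT, apiKey.trim());
}

// Returns null when no key has been saved yet, so the UI can ask for one.
function loadApiKey(store: KeyValueStore): string | null {
  const value = store.getItem(API_KEY_SLOT);
  return value !== null && value.length > 0 ? value : null;
}
```

Because the key never appears in the source code, the app can be shared by link and each person supplies their own key.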

I do believe it’s possible to get to the point where an LLM can support non-programmers in building custom GUIs (and that’s in fact one of my main research goals at the moment). But it’s a much harder goal than supporting programmers, and will require a lot more work on tooling. More on this later.

Iterating on the fly

A few times I noticed that the Japanese translations included phrases I didn’t understand. Once this need had come up a few times, I decided to add phrase explanations as a feature in my GUI. I asked GPT to modify the code so that I can select a phrase and click a button to get an explanation in context.
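A feature like this mostly comes down to turning the selected text into a follow-up prompt. The sketch below is a guess at one way to do it: the function name and the prompt wording are invented for illustration, and the post does not show the prompt the app really uses.

```typescript
// Hypothetical prompt builder for an "explain phrase" button.
// In a browser, the highlighted text could be read with
// window.getSelection()?.toString() before calling this.
function buildExplainPrompt(translation: string, selectedPhrase: string): string | null {
  const phrase = selectedPhrase.trim();
  // Ignore empty selections and text selected outside the translation.
  if (phrase.length === 0 || translation.indexOf(phrase) === -1) {
    return null;
  }
  return [
    "Here is a Japanese message:",
    translation,
    "",
    `Explain what "${phrase}" means in the context of this message.`,
  ].join("\n");
}
```

Returning `null` for an empty or unrelated selection gives the UI a simple signal to keep the button disabled.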

This tight iteration loop felt awesome. Going from wanting the feature to having it in my app took minutes and very little effort. This shows the benefit of having a malleable GUI that I control and can quickly edit using an LLM. My feature requests aren’t trapped in a feedback queue; I can just build them for myself. It’s not the best-designed interaction ever, but it gets the job done.

I’ve found that having the button there encourages me to ask for explanations more often. Before, when I was doing the translations in ChatGPT, I would need to explicitly think to write a follow-up message asking for an explanation. Now I have a button reminding me to do it, and the button also uses a high-quality prompt that I’ve developed.

Sharing the tool

My brother asked to try the tool. I sent him the Replit link and he was able to use it.

I think sharing a GUI is probably way more effective than trying to share a complex ChatGPT workflow with various prompts patched together. The UI encodes what I’ve learned about doing this particular task effectively, and provides clear affordances that anyone can pick up quickly.

From chatbot to GUI

What general lessons can we take away from my experience here? I think it gestures at two big ideas.

The first one is that chatbots are not always the best interface for a task, even one like translation that involves lots of natural language and text. Amelia Wattenberger wrote a great piece explaining some of the reasons. It’s worth reading the whole thing, but here’s a key excerpt about the value of affordances:

Good tools make it clear how they should be used. And more importantly, how they should not be used. If we think about a good pair of gloves, it’s immediately obvious how we should use them. They’re hand-shaped! We put them on our hands. And the specific material tells us more: metal mesh gloves are for preventing physical harm, rubber gloves are for preventing chemical harm, and leather gloves are for looking cool on a motorcycle.

Compare that to looking at a typical chat interface. The only clue we receive is that we should type characters into the textbox. The interface looks the same as a Google search box, a login form, and a credit card field.

This principle clearly holds when designing a product that other people are going to use. But perhaps surprisingly, in my experience, affordances are actually useful even when designing a tool for myself! Good affordances can help my future self remember how to use the tool. The “explain phrase” button reminds me that I should ask about words I don’t know.

I also find that making a UI makes a tool more memorable. My custom GUI is a visually distinctive artifact that lives at a URL; this helps me remember that I have the tool and can use it. Having a UI makes my tool feel more like a reusable artifact than a ChatGPT prompt.

Now, it’s not quite as simple as “GUI good, chatbot bad”; there are tradeoffs. For my translation use case, I found ChatGPT super helpful for my initial explorations. The open-endedness of the chatbot gave it a huge leg up over Google Translate, a more traditional application with more limited capabilities and clearer affordances. I was able to explore a wide space of useful features and find the ones that I wanted to keep using.

I think this suggests a natural workflow: start in chat, and then codify a UI if it’s getting annoying doing the same chat workflow repeatedly.

By the way, one more thing: there are obviously many other visual affordances to consider besides the ones I used in this particular example. For instance, here’s another GPT-powered GUI tool I built a couple months ago, where I can drag-and-drop in a file and see useful conversions of that file into different formats:

The joy of editing our tools

Another takeaway: it feels great to use a tiny GUI made just for my own needs. It does only what I want it to do, nothing more. The design isn’t going to win any awards or get VC funding, but it’s good enough for what I want. When I come across more things that the app needs to do, I can add them.

Robin Sloan has this delightful idea that an app can be a home-cooked meal:

When you liberate programming from the requirement to be professional and scalable, it becomes a different activity altogether, just as cooking at home is really nothing like cooking in a commercial kitchen. I can report to you: not only is this different activity rewarding in almost exactly the same way that cooking for someone you love is rewarding, there’s another feeling, too, specific to this realm. I have struggled to find words for this, but/and I think it might be the crux of the whole thing:

This messaging app I built for, and with, my family, it won’t change unless we want it to change. There will be no sudden redesign, no flood of ads, no pivot to chase a userbase inscrutable to us. It might go away at some point, but that will be our decision. What is this feeling? Independence? Security? Sovereignty?

Is it simply … the feeling of being home?

Software doesn’t always need to be mass-produced like restaurant food, it can be produced intimately at small scale. My translator app feels this way to me.

In this example, using GPT-4 to code and edit the app is what enabled the feeling of malleability for me. It feels magical describing an app and having it appear on-screen within seconds. Little React apps seem to be the kind of simple code that GPT-4 is good at producing. You could even argue that it’s “just regurgitating other code it’s already seen”, but I don’t care; it made me the tool that I wanted.

I’m a programmer and I could have built this app manually myself without too much trouble. And yet, I don’t think I would have. The LLM is an order of magnitude faster than me at getting the first draft out and producing new iterations, which makes me much more likely to just give it a shot. This reminds me of how Simon Willison says that AI-enhanced development makes him more ambitious with his projects:

In the past I’ve had plenty of ideas for projects which I’ve ruled out because they would take a day—or days—of work to get to a point where they’re useful. I have enough other stuff to build already!

But if ChatGPT can drop that down to an hour or less, those projects can suddenly become viable.

Which means I’m building all sorts of weird and interesting little things that previously I wouldn’t have invested the time in.

Simon’s description applies perfectly to my example.

It’s not just about the initial creation, it’s also about the fast iteration loop. I discussed the possibility of LLMs updating a GUI app in my previous post:

Next, consider LLMs applied to the app model. What if we started with an interactive analytics application, but this time we had a team of LLM developers at our disposal? As a start, we could ask the LLM questions about how to use the application, which could be easier than reading documentation.

But more profoundly than that, the LLM developers could go beyond that and update the application. When we give feedback about adding a new feature, our request wouldn’t get lost in an infinite queue. They would respond immediately, and we’d have some back and forth to get the feature implemented. Of course, the new functionality doesn’t need to be shipped to everyone; it can just be enabled for our team. This is economically viable now because we’re not relying on a centralized team of human developers to make the change.

It simply feels good to be using a GUI app, have an idea for how it could be different, and then have that new version running within seconds.

There’s a caveat worth acknowledging here: the story I shared in this post only worked under specific conditions. The app I made is extremely simple in functionality; a more complex app would be much harder to modify.

And I’m pretty confident that the coding workflow I shared in this post only worked because I’m a programmer. The LLM makes me much, much faster at building these simple kinds of utilities, but my programming knowledge still feels essential to keeping the process running. I’m writing fairly detailed technical specs, I’m making architectural choices, I’m occasionally directly editing the code or fixing a bug. The app is so small and simple that it’s easy for me to keep up with what’s going on.

I yearn for non-programmers to also experience software this way, as a malleable artifact they can change in the natural course of use. LLMs are clearly a big leap forward on this dimension, but there’s also a lot of work ahead. We’ll need to find ways for LLMs to work with non-programmers to specify intent, to help them understand what’s going on, and to fix things when they go wrong.

I’m optimistic that a combination of better tooling and improved models can get us there, at least for simpler use cases like my translator tool. I guess there’s only one way to find out 🤓 (Subscribe to my email newsletter if you want to follow along with my research in this area.)


Recently…

In the past few months I’ve given a couple talks relevant to the themes in this post.

In April I spoke at Causal Islands about Potluck, a programmable notes prototype I worked on with Max Schoening, Paul Shen, and Paul Sonnentag at Ink & Switch. In my talk I share a bunch of demos from our published essay, but I also show some newer demos of integrating LLMs to help author spreadsheets. (The embed below will jump you right to the LLM demos)

Also: a couple weeks ago, I presented my PhD thesis defense at MIT! I gave a talk called Building Personal Software with Reactive Databases. I talk about what makes spreadsheets great, and show a few projects I’ve worked on that aim to make it easier to build software using techniques from spreadsheets and databases.


If you’re interested in diving deeper into ways of interacting with LLMs besides chatbots, I strongly recommend the following readings:

And for a more abstract angle on the example in this post, check out my previous post, Malleable software in the age of LLMs!


Appendix: prompts

Here are some of the prompts I used to make the translator app.

First, my general system prompt for UI coding:

You are a helpful AI coding assistant. Make sure to follow the user’s instructions precisely and to the letter. Always reason aloud about your plans before writing the final code.

Write code in ReactJS. Keep the whole app in one file. Only write a frontend, no backend.

If the specification is clear, you can generate code immediately. If there are ambiguities, ask key clarifying questions before proceeding.

When the user asks you to make edits, suggest minimal edits to the code, don’t regenerate the whole file.

Initial prompt for the texting app:

I’d like you to make me an app that helps me participate in a text message conversation in Japanese by using an LLM to translate. Here’s the basic idea:

  • I paste in a transcript of a text message thread into a box
  • I write the message I want to reply with (in english) into a different box
  • I click a button
  • the app shows me a Japanese translation of my message as output; there’s a copy button so i can copy-paste it easily.
  • the app talks to openai gpt-4 to do the translation. the prompt can be something like “here’s a text thread in japanese: . now translate my new message below to japanese. make it sound natural in the flow of this conversation. don’t translate word for word, translate the general meaning.” use the openai js library, some sample code pasted below.
  • the user can paste in their openai key in a settings pane, it gets stored in localstorage
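For a sense of what the last two bullets involve, here is a sketch of the translation request. Note the swap: the prompt above asks for the openai js library, but this example talks to the chat completions REST endpoint directly through an injected `fetch`-style function, to stay self-contained. Every name in it (`buildChatRequest`, `extractReply`, `translate`, `FetchLike`) is invented for illustration.

```typescript
interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the HTTP request for a single-turn chat completion.
function buildChatRequest(apiKey: string, prompt: string): ChatRequest {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // the key the user pasted into settings
      },
      body: JSON.stringify({
        model: "gpt-4",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Pull the assistant's text out of a chat completion response body.
function extractReply(responseJson: unknown): string {
  const data = responseJson as { choices?: { message?: { content?: string } }[] };
  const content = data.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("Unexpected response shape from the chat completions endpoint");
  }
  return content;
}

// Assumption: the caller passes in a fetch-compatible function
// (in a browser, the global fetch should fit this shape).
type FetchLike = (
  url: string,
  init: ChatRequest["init"]
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function translate(fetchFn: FetchLike, apiKey: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(apiKey, prompt);
  const response = await fetchFn(url, init);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return extractReply(await response.json());
}
```

Calling the API straight from the browser with the user's own key is what lets the app stay frontend-only. It also exposes the key to anything running on the page, which seems acceptable for a personal tool and less so for a product.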

One of the iterative edits for the texting app:

make the following edits and output new code:

  • write a css file and style the app to look professional and modern.
  • arrange the text thread in a tall box on the left, and then the new message and translation vertically stacked to the right
  • give the app a title: Japanese Texting Helper
  • hide the openai key behind a settings section that gets toggled open/closed at the bottom of the app