Musings on building a Generative AI product
Over the last six months, our team here at LinkedIn has been working hard to develop a new AI-powered experience. We wanted to reimagine how our members go about their job searches and browse professional content.
The explosion of generative AI made us pause and consider what was possible now that wasn’t a year ago. We tried many ideas which didn’t really click, eventually discovering the power of turning every feed and job posting into a springboard to:
- Get information faster, e.g. takeaways from a post or learn about the latest from a company.
- Connect the dots, e.g. assess your fit for a job posting.
- Receive advice, e.g. improve your profile or prepare for an interview.
- And much more…
Was it easy to build? What went well and what didn’t? Building on top of generative AI wasn’t all smooth sailing, and we hit a wall in many places. We want to pull back the “engineering” curtain and share what came easy, where we struggled, and what’s coming next.
Overview
Let’s walk through a real-life scenario to show how the system works.
Imagine you're scrolling through your LinkedIn feed and stumble upon an intriguing post about accessibility in design. Alongside the post, you're presented with a few starter questions to delve deeper into the topic. You're curious and click on "What are some examples of accessibility driving business value in tech companies?"
Here’s what happens in the background:
- Pick the right agent: This is where your journey begins. Our system takes your question and decides which AI agent is best equipped to handle it. In this case, it recognizes your interest in accessibility within tech companies and routes your query to an AI agent specialized in general knowledge-seeking questions.
- Gather information: It’s time for some legwork. The AI agent calls a combination of internal APIs & Bing, searching for specific examples and case studies that highlight how accessibility in design has contributed to business value in tech. We are creating a dossier to ground our response.
- Craft a response: With the necessary information in hand, the agent can now write a response. It filters and synthesizes the data into a coherent, informative answer, providing you with clear examples of how accessibility initiatives have driven business value for tech companies. To avoid generating a wall of text and to make the experience more interactive, internal APIs are invoked to decorate the response with attachments like article links, or profiles of people mentioned in the post.
You might follow up with “How do I pivot my career towards this area?”, and we’d repeat the process but now routing you to a career and job AI agent. With just a few clicks you can go deep on any topic, get actionable insights or find your next big opportunity.
Most of this was made possible by the advent of large language models (LLMs), and we thought it’d be interesting to share the behind-the-scenes stories about the challenges we faced building on top of them.
What came easy

Overall design
Some of you might’ve noticed from the explanation above that our pipeline follows what’s known as Retrieval Augmented Generation (RAG), which is a common design pattern with generative AI systems. Building the pipeline was surprisingly less of a headache than we anticipated. In just a few days we had the basic framework up and running:
- Routing: decides if the query is in scope or not, and which AI agent to forward it to. Examples of agents are: job assessment, company understanding, takeaways for posts, etc.
- Retrieval: recall-oriented step where the AI agent decides which services to call and how (e.g. LinkedIn People Search, Bing API, etc.).
- Generation: precision-oriented step that sieves through the noisy data retrieved, filters it, and produces the final response.
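The three steps above can be sketched as a minimal routing → retrieval → generation pipeline. Everything here is illustrative: the agent names, the keyword-based router, and the stubbed retrieval/generation stand in for the real classifiers, internal APIs, and LLM calls.

```python
# Minimal sketch of the three-step RAG pipeline described above.
# All function names, agent names, and sources are illustrative stand-ins.

def route(query: str) -> str:
    """Routing: pick the agent best equipped for the query (or reject out-of-scope)."""
    if "job" in query.lower():
        return "job_assessment"
    if "company" in query.lower():
        return "company_understanding"
    return "general_knowledge"

def retrieve(agent: str, query: str) -> list[str]:
    """Retrieval: recall-oriented; call the services this agent needs."""
    sources = {
        "job_assessment": ["people_search", "job_postings"],
        "company_understanding": ["company_pages", "bing"],
        "general_knowledge": ["bing", "feed_posts"],
    }[agent]
    # In the real system each source is an internal API or Bing call.
    return [f"snippet from {s} for: {query}" for s in sources]

def generate(query: str, context: list[str]) -> str:
    """Generation: precision-oriented; synthesize retrieved context into an answer."""
    prompt = "Answer using only this context:\n" + "\n".join(context) + f"\nQ: {query}"
    # Placeholder for the actual LLM call on `prompt`.
    return f"[LLM response to a {len(prompt)}-char prompt, grounded in {len(context)} snippets]"

def answer(query: str) -> str:
    agent = route(query)
    context = retrieve(agent, query)
    return generate(query, context)
```

Tuning then happens per step: the router and retriever can be evaluated as classifiers, while `generate` is where the long tail of quality work lives.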
Tuning ‘routing’ and ‘retrieval’ felt more natural given their classification nature: we built dev sets and fitted them with prompt engineering and in-house models. Now, generation, that was a different story. It followed the 80/20 rule: getting to 80% was fast, but that last 20% took most of our work. When the expectation for your product is that 99%+ of your answers should be great, even using the most advanced models available still requires a lot of work and creativity to gain every 1%.
What worked for us:
- Fixed 3-step pipeline
- Small models for routing/retrieval, bigger models for generation
- Embedding-Based Retrieval (EBR) powered by an in-memory database as our ‘poor man’s fine-tuning’ to inject response examples directly into our prompts
- Per-step specific evaluation pipelines, particularly for routing/retrieval
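The ‘poor man’s fine-tuning’ idea can be sketched as follows: keep curated (query, ideal response) pairs in an in-memory store and, at request time, inject the most similar ones into the prompt as few-shot examples. This is a toy sketch; the bag-of-words "embedding" below is a stand-in for a real embedding model, and the example store is invented.

```python
import math

# Toy sketch of EBR-based few-shot injection. embed() is a stand-in for a
# real embedding model; EXAMPLE_STORE holds hypothetical curated examples.

def embed(text: str) -> dict[str, float]:
    words = text.lower().split()
    return {w: float(words.count(w)) for w in set(words)}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

EXAMPLE_STORE = [
    ("assess my fit for this role", "Highlight matching skills, note gaps kindly..."),
    ("summarize this post", "Three bullet takeaways, each under 20 words..."),
]

def build_prompt(query: str, k: int = 1) -> str:
    """Inject the k most similar curated examples ahead of the live query."""
    scored = sorted(EXAMPLE_STORE,
                    key=lambda ex: cosine(embed(query), embed(ex[0])),
                    reverse=True)
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in scored[:k])
    return f"{shots}\nQ: {query}\nA:"
```

The appeal is that steering behavior becomes a data update (add or edit examples in the store) rather than a model retrain.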
Development speed
We wanted to move fast across multiple teams and hence decided to split tasks into independent agents (i.e., AI agents) developed by different people: general knowledge, job assessment, post takeaways, etc.
However, this approach introduces a significant compromise. By parallelizing tasks, we gained in terms of speed, but it came at the cost of fragmentation. Maintaining a uniform user experience became challenging when subsequent interactions with an assistant might be managed by varied models, prompts, or tools.
To address this, we adopted a simple organizational structure:
- A small ‘horizontal’ engineering pod that handled common components and focused on the holistic experience. This included:
  - The service hosting the product
  - Tooling for evaluation/testing
  - Global prompt templates that were consumed by all verticals (e.g. agent’s global identity, conversation history, jailbreak defense, etc.)
  - Shared UX components for our iOS/Android/Web clients
  - A server-driven UI framework for releasing new UI changes without client code changes or releases
- Several ‘vertical’ engineering pods with autonomy over their agents, for example:
  - Personalized post summarization
  - Job fit assessment
  - Interview tips
What worked for us:
- Divide and conquer while limiting the number of agents
- A centralized evaluation pipeline with multi-turn conversations
- Sharing prompt templates (e.g. ‘identity’ definition), UX templates, tooling & instrumentation
Where we struggled

Evaluation
Evaluating the quality of our answers turned out to be more difficult than anticipated. The challenges can be broadly categorized into three areas: developing guidelines, scaling annotation, and automatic evaluation.
- Developing guidelines was the first hurdle. Let’s take Job Assessment as an example: clicking “Assess my fit for this job” and getting “You are a terrible fit” isn’t very useful. We want it to be factual but also empathetic. Some members may be contemplating a career change into fields where they currently do not have a strong fit, and need help understanding what the gaps and next steps are. Ensuring these details were consistent was key for the uniformity of our annotator scores.
- Scaling annotation was the second step. Initially everyone on the team chimed in (product, eng, design, etc.), but we knew we needed a more principled approach with consistent and diverse annotators. Our internal linguist team built tooling and processes by which we could evaluate up to 500 daily conversations and get metrics around overall quality score, hallucination rate, Responsible AI violations, coherence, style, etc. This became our main signpost to understand trends, iterate on prompts, and ensure we were ready to go live.
- Automatic evaluation is the holy grail, but still a work in progress. Without it, engineers are left eye-balling results and testing on a limited set of examples, with a 1+ day delay to know metrics. We are building model-based evaluators to estimate the above metrics and allow for much faster experimentation, and had some success on hallucination detection (but it wasn’t easy!).
What we are working on: end-to-end automatic evaluation pipeline for faster iteration.
Calling internal APIs
LinkedIn has a lot of unique data about people, companies, skills, courses, etc. which are critical to building a product offering unique and differentiated value. LLMs, however, have not been trained with this information and hence are unable to use them as is for reasoning and generating responses. A standard pattern to work around this is to set up a Retrieval Augmented Generation (RAG) pipeline, via which internal APIs are called, and their responses are injected into a subsequent LLM prompt to provide additional context to ground the response.
A lot of this unique data is exposed internally via RPC APIs across various microservices. While this is very convenient for humans to invoke programmatically, it is not very LLM friendly. We worked around this by wrapping “skills” around these APIs. Every skill has the following components:
- A human- (and hence LLM-) friendly description of what the API does, and when to use it.
- The configuration to call the RPC API (endpoint, input schema, output schema, etc.)
- The LLM-friendly input and output schemas:
  - Primitive-typed (String/Boolean/Number) values
  - JSON-schema-style input and output schema descriptions
- The business logic to map between the LLM-friendly schemas and the actual RPC schemas.
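A skill with those four components could look roughly like the sketch below. The field names, the people-search endpoint, and the parameter mapping are all hypothetical; they are here only to make the shape of a skill concrete.

```python
# Hypothetical sketch of a "skill" wrapping an internal RPC API, following the
# four components listed above. Endpoint and schema details are invented.

PEOPLE_SEARCH_SKILL = {
    # 1. Human/LLM-friendly description: what it does and when to use it.
    "name": "people_search",
    "description": "Search members by keywords, title, or company. "
                   "Use when the user asks about specific people or roles.",
    # 2. Configuration to call the RPC API.
    "rpc": {
        "endpoint": "/peopleSearch",
        "method": "GET",
    },
    # 3. LLM-friendly input schema: primitive types, JSON-schema style.
    "input_schema": {
        "type": "object",
        "properties": {
            "keywords": {"type": "string"},
            "max_results": {"type": "number"},
        },
        "required": ["keywords"],
    },
    # 4. Business logic mapping the LLM-friendly params to the RPC request.
    "to_rpc": lambda params: {
        "q": params["keywords"],
        "count": int(params.get("max_results", 10)),
    },
}
```

At call time the planner hands the LLM's parameters to `to_rpc`, e.g. `PEOPLE_SEARCH_SKILL["to_rpc"]({"keywords": "accessibility designer"})`, and the resulting request goes to the underlying service.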
Skills like this enable the LLM to do various things relevant to our product like view profiles, search articles/people/jobs/companies and even query internal analytics systems. The same technique is also used for calling non-LinkedIn APIs like Bing search and news.
We write prompts that ask the LLM to decide what skill to use to solve a particular job (skill selection via planning), and then also output the parameters to invoke the skill with (function call). Since the parameters to the call have to match the input schema, we ask the LLM to output them in a structured manner. Most LLMs are trained on YAML and JSON for structured output. We picked YAML because it is less verbose, and hence consumes fewer tokens than JSON.
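A skill-selection prompt of this kind might be assembled as below. The skill catalog and the prompt wording are hypothetical; the point is only the structure: list the skills with their descriptions, then ask for a YAML `skill` + `parameters` block.

```python
# Hypothetical sketch of the skill-selection ("planning") prompt. The model
# is asked to pick one skill and emit its call parameters as YAML, which is
# terser than JSON and so consumes fewer tokens. Skill names are invented.

SKILLS = {
    "people_search": "Search members by keywords, title, or company.",
    "bing_search": "Search the public web for articles and news.",
}

def skill_selection_prompt(query: str) -> str:
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in SKILLS.items())
    return (
        "You can call exactly one of these skills:\n"
        f"{catalog}\n"
        "Respond in YAML with two keys, skill and parameters, e.g.:\n"
        "skill: bing_search\n"
        "parameters:\n"
        "  query: accessibility case studies\n"
        f"\nUser question: {query}\n"
    )
```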
One of the challenges we ran into was that while about ~90% of the time the LLM responses contained the parameters in the right format, ~10% of the time the LLM would make mistakes: often outputting data that was invalid per the supplied schema, or worse, not even valid YAML. These mistakes, while trivial for a human to spot, caused the code parsing them to barf. ~10% was too high a number for us to ignore, and hence we set out to fix this problem.
A standard way to fix this problem is to detect it and then re-prompt the LLM to ask it to correct its mistakes with some additional guidance. While this technique works, it adds a non-trivial amount of latency and also consumes precious GPU capacity due to the additional LLM call. To circumvent these limitations, we ended up writing an in-house defensive YAML parser.
Through an analysis of various payloads, we determined common mistakes made by the LLM, and wrote code to detect and patch these appropriately before parsing. We also modified our prompts to inject hints around some of these common mistakes, to improve the accuracy of our patching. We were ultimately able to reduce occurrences of these errors to ~0.01%.
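In that spirit, a defensive post-processor might look like the sketch below. The three patched mistakes (Markdown fences, `key = value` instead of `key: value`, JSON-style trailing commas) are plausible examples of the kind of errors such an analysis surfaces, not LinkedIn's actual list, and the tiny flat parser stands in for a real YAML parser.

```python
import re

# Illustrative defensive YAML post-processing: detect common LLM output
# mistakes and patch them before parsing. The specific fixes and the minimal
# flat parser below are sketches, not the production implementation.

def patch_llm_yaml(raw: str) -> str:
    text = raw.strip()
    # Mistake 1: the model wraps its output in a Markdown code fence.
    text = re.sub(r"^`{3}(?:yaml)?\s*|\s*`{3}$", "", text)
    # Mistake 2: "key = value" instead of "key: value".
    text = re.sub(r"^(\s*\w+)\s*=\s*", r"\1: ", text, flags=re.MULTILINE)
    # Mistake 3: trailing commas carried over from JSON habits.
    text = re.sub(r",\s*$", "", text, flags=re.MULTILINE)
    return text

def parse_flat_yaml(text: str) -> dict[str, str]:
    """Minimal parser for flat `key: value` payloads (sketch only)."""
    out = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            out[key.strip()] = value.strip()
    return out
```

Because patching happens in-process, it avoids both the extra latency and the extra GPU cost of a corrective re-prompt.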
What we are working on: a unified skill registry to dynamically discover and invoke APIs/agents packaged as LLM friendly skills across our generative AI products.
Consistent quality
The team achieved 80% of the basic experience we were aiming to provide within the first month, and then spent an additional four months attempting to surpass 95% completion of our full experience, working diligently to refine, tweak, and improve various aspects. We underestimated the challenge of detecting and mitigating hallucinations, as well as the rate at which quality scores improved: initially shooting up, then quickly plateauing.
For product experiences that tolerate such a level of errors, building with generative AI is refreshingly straightforward. But it also creates unattainable expectations: the initial pace created a false sense of ‘almost there’, which became discouraging as the rate of improvement slowed significantly for each subsequent 1% gain.
Building the assistant felt like a departure from more ‘principled’ ML, and more akin to tweaking rules in expert systems. So while our evaluation became more and more sophisticated, our ‘training’ was mostly prompt engineering which was more of an art than a science.
What we are working on: fine tuning large language models (LLMs) to make our pipeline more data-driven.
Capacity & Latency
Capacity and perceived member latency were always top of mind. Some dimensions:
- Quality vs Latency: techniques like Chain of Thought (CoT) are very effective at improving quality and reducing hallucinations, but they require tokens that the member never sees, hence increasing their perceived latency.
- Throughput vs Latency: when running large generative models, it’s often the case that TimeToFirstToken (TTFT) & TimeBetweenTokens (TBT) increase with utilization. In the case of TBT it can sometimes be linear. It’s not uncommon to get 2x/3x the TokensPerSecond (TPS) if you are willing to sacrifice both of those metrics, but we initially had to bound them pretty tight.
- Cost: GPU clusters are not easy to come by and are costly. At the beginning we even had to set timetables for when it was ok to test the product or not, as it’d consume too many tokens and lock developers out of working.
- End-to-end streaming: a full answer might take minutes to complete, so we make all our requests stream to reduce perceived latency. What’s more, we actually stream within our pipeline end to end. For example, the LLM response deciding which APIs to call is progressively parsed, and we fire API calls as soon as parameters are ready, without waiting for the full LLM response. The final synthesized response is also streamed all the way to the client using our realtime messaging infrastructure, with incremental processing for things like trust/Responsible AI classification.
- Async non-blocking pipeline: since LLM calls can take a long time to process, we optimized our service throughput by building a fully async non-blocking pipeline that does not waste resources on threads blocked on I/O.
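The progressive-parsing idea in the streaming bullet above can be sketched as follows: consume the LLM's token stream incrementally and dispatch each API call the moment its parameter line completes, instead of waiting for the full response. The token chunks and the `key: value` line format are illustrative assumptions.

```python
# Illustrative sketch of progressive parsing for end-to-end streaming:
# each completed `key: value` line triggers its API call immediately.
# The token stream and dispatch callback are stand-ins.

from typing import Callable, Iterable

def stream_and_dispatch(tokens: Iterable[str],
                        dispatch: Callable[[str, str], None]) -> None:
    """Fire dispatch(key, value) as soon as each parameter line is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        while "\n" in buffer:
            line, _, buffer = buffer.partition("\n")
            if ":" in line:
                key, _, value = line.partition(":")
                dispatch(key.strip(), value.strip())  # API call starts here
    if ":" in buffer:  # flush the final, unterminated line
        key, _, value = buffer.partition(":")
        dispatch(key.strip(), value.strip())
```

Note that the first dispatch fires while later tokens are still being generated, which is what shaves seconds off perceived latency.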
These dimensions sometimes had an interesting interplay between them. As an example, we initially only bounded TTFT, as that mapped directly to member latency for our initial product. As we tackled hallucinations and Chain of Thought became prominent in our prompts, we neglected that TBT would hurt us much more, since any ‘reasoning’ token multiplies member latency (e.g. for a 200-token reasoning step, even a 10ms TBT increase means an extra 2s of latency). This caused one of our public ramps to suddenly sound alerts left and right that some tasks were hitting timeouts, and we quickly had to increase capacity to alleviate the issue.
What we are working on:
- Moving simpler tasks to in-house, fine-tuned models.
- More predictable deployment infrastructure for LLM deployments.
- Reducing wasted tokens at every step.
Takeaways
Enough from us, why don’t we let the product do the talking?
That’s not bad! The follow-up suggestions, in particular, can lead you down a Wikipedia-style rabbit hole of curiosity.
As we continue to refine quality, develop new features, and optimize the pipeline for speed, we'll be rolling out to more users very soon.
Getting here has been a monumental effort from a wonderful group of people, and we’ll be sharing more technical details soon as we keep learning. Stay tuned!
Topics:
Generative AI
AI
Responsible AI