
CRDT Survey, Part 3: Algorithmic Techniques

Matthew Weidner | Feb 13th, 2024


Keywords: CRDTs, optimization, state-based CRDTs

This blog post is Part 3 of a series.

# Algorithmic Techniques

Let’s say you have chosen your collaborative app’s semantics in the style of Part 2. That is, you’ve chosen a pure function that inputs an operation history and outputs the intended state of a user who is aware of those operations.

Here is a simple protocol to turn those semantics into an actual collaborative app:

  1. Each user’s state is a literal operation history, i.e., a set of operations, with each operation labeled by a Unique ID (UID).
  2. When a user performs an operation, they generate a new UID, then add the pair (id, op) to their local operation history.
  3. To synchronize their states, users share the pairs (id, op) however they like. For example, users could broadcast pairs as soon as they are created, periodically share entire histories peer-to-peer, or run a clever protocol to send a peer only the pairs that it is missing. Recipients always ignore redundant pairs (duplicate UIDs).
  4. Whenever a user’s local operation history is updated - either by a local operation or a remote message - they apply the semantics (pure function) to their new history, to yield the current app-visible state.

Technicalities:

  • It is the translated operations that get stored in operation histories and sent over the network. E.g., convert list indices to list CRDT positions before storing & sending.
  • The history should also include causal ordering metadata - the arrows in Part 2’s operation histories. When sharing an operation, also share its incoming arrows, i.e., the UIDs of its immediate causal predecessors.
  • Optionally enforce causal order delivery, by waiting to add a received operation to your local operation history until after you have added all of its immediate causal predecessors.
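For concreteness, here is a minimal TypeScript sketch of the simple protocol, ignoring the technicalities above. It is illustrative only - the names (SimpleReplica, Op, etc.) are made up, and a real app would persist the history and avoid recomputing the whole state on every update:

type UID = string;
type Op = unknown; // Your app's translated operations.

class SimpleReplica<S> {
  // The literal operation history: UID -> operation.
  private readonly history = new Map<UID, Op>();

  constructor(
    // The semantics from Part 2: a pure function of the operation history.
    private readonly semantics: (history: ReadonlyMap<UID, Op>) => S
  ) {}

  /** Perform a local operation; returns the pair to share with collaborators. */
  do(op: Op): { id: UID; op: Op } {
    const id = crypto.randomUUID();
    this.history.set(id, op);
    return { id, op };
  }

  /** Process a pair received from a collaborator. Redundant pairs are ignored. */
  receive(id: UID, op: Op): void {
    if (!this.history.has(id)) this.history.set(id, op);
  }

  /** The current app-visible state, recomputed from the whole history. */
  get state(): S {
    return this.semantics(this.history);
  }
}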


This post describes CRDT algorithmic techniques that help you implement more efficient versions of the simple protocol above. We start with some Prerequisites that are also useful in general distributed systems. Then Sync Strategies describes the traditional “types” of CRDTs - op-based, state-based, and others - and how they relate to the simple protocol.

The remaining sections describe specific algorithms. These algorithms are largely independent of each other, so you can skip to whatever interests you. Misc Techniques fills in some gaps from Part 2, e.g., how to generate logical timestamps. Optimized CRDTs describes nontrivial optimized algorithms, including classic state-based CRDTs.


# Prerequisites

# Replicas and Replica IDs

A replica is a single copy of a collaborative app’s state, in a single thread on a single device. For web-based apps, there is usually one replica per browser tab; when the user (re)loads a tab, a new replica is created.

You can also call a replica a client, session, actor, etc. However, a replica is not synonymous with a device or a user. Indeed, a user can have multiple devices, and a device can have multiple independently-updating replicas - for example, a user may open the same collaborative document in multiple browser tabs.

In previous posts, I often said “user” out of laziness - e.g., “two users concurrently do X and Y”. But technically, I always meant “replica” in the above sense. Indeed, a single user might perform concurrent operations across different devices.


The importance of a replica is that everything inside a replica happens in a sequential order, without any concurrency between its own operations. This is the fundamental principle behind the next two techniques.

It is usually convenient to assign each replica a unique replica ID (client ID, session ID, actor ID), by generating a random string when the replica is created. The replica ID must be unique among all replicas of the same collaborative state, including replicas created concurrently, which is why they are usually random instead of “the highest replica ID so far plus 1”. Random UUIDs (v4) are a safe choice. You can potentially use fewer random bits (shorter replica IDs) if you are willing to tolerate a higher chance of accidental non-uniqueness (cf. the birthday problem).

For reference, a UUID v4 is 122 random bits, a Collabs replicaID is 60 random bits (10 base64 chars), and a Yjs clientID is 32 random bits (a uint32).


Avoid the temptation to reuse a replica ID across replicas on the same device, e.g., by storing it in window.localStorage. That can cause problems if the user opens multiple tabs, or if there is a crash failure and the old replica did not record all of its actions to disk.

In Collabs: ReplicaIDs

# Unique IDs: Dots

Recall from Part 2 that to refer to a piece of content, you should assign it an immutable Unique ID (UID). UUIDs work, but they are long (32 chars) and don’t compress well.

Instead, you can use dot IDs: pairs of the form (replicaID, counter), where counter is a local variable that is incremented each time. So a replica with ID "n48BHnsi" uses the dot IDs ("n48BHnsi", 1), ("n48BHnsi", 2), ("n48BHnsi", 3), …

In pseudocode:

// Local replica ID.
const replicaID = <sufficiently long random string>;
// Local counter value (specific to this replica). Integer.
let counter = 0;

function newUID() {
    counter++;
    return (replicaID, counter);
}

The advantage of dot IDs is that they compress well together, either using plain GZIP or a dot-aware encoding. For example, a vector clock (below) represents a sequence of dots ("n48BHnsi", 1), ("n48BHnsi", 2), ..., ("n48BHnsi", 17) as the single map entry { "n48BHnsi": 17 }.

You have some flexibility for how you assign the counter values. For example, you can use a logical clock value instead of a counter, so that your UIDs are also logical timestamps for LWW. Or, you can use a separate counter for each component CRDT in a composed construction, instead of one for the whole replica. The important thing is that you never reuse a UID in the same context, where the two uses could be confused.

Example: An append-only log could choose to use its own counter when assigning events’ dot IDs. That way, it can store its state as a Map<ReplicaID, T[]>, mapping each replica ID to an array of that replica’s events indexed by (counter - 1). If you instead used a counter shared by all component CRDTs, then the arrays could have gaps, making a Map<ReplicaID, Map<number, T>> preferable.

In previous blog posts, I called these “causal dots”, but I cannot find that name used elsewhere; instead, CRDT papers just use “dot”.


Collabs: Each transaction has an implicit dot ID (senderID, senderCounter).

Refs: Preguiça et al. 2010

# Tracking Operations: Vector Clocks 1

Vector clocks are a theoretical technique with multiple uses. In this section, I’ll focus on the simplest one: tracking a set of operations. (Vector Clocks 2 is later.)

Suppose a replica is aware of the following operations - i.e., this is its current local view of the operation history:

An operation history with labels ("A84nxi", 1), ("A84nxi", 2), ("A84nxi", 3), ("A84nxi", 4), ("bu2nVP", 1), ("bu2nVP", 2).

I’ve labeled each operation with a dot ID, like ("A84nxi", 2).

In the future, the replica might want to know which operations it is already aware of - for example, so that a peer can send it only the operations it is missing, or so that it can recognize and ignore redundant operations it receives.

One way to track the operation history is to store the operations’ unique IDs as a set: { ("A84nxi", 1), ("A84nxi", 2), ("A84nxi", 3), ("A84nxi", 4), ("bu2nVP", 1), ("bu2nVP", 2)}. But it is cheaper to store the “compressed” representation

{
    "A84nxi": 4,
    "bu2nVP": 2
}

This representation is called a vector clock. Formally, a vector clock is a Map<ReplicaID, number> that sends a replica ID to the maximum counter value received from that replica, where each replica assigns counters to its own operations in order starting at 1 (like a dot ID). Missing replica IDs implicitly map to 0: we haven’t received any operations from those replicas. The above example shows that a vector clock efficiently summarizes a set of operation IDs.

The previous paragraph implicitly assumes that you process operations from each other replica in order: first ("A84nxi", 1), then ("A84nxi", 2), etc. That always holds when you enforce causal-order delivery. If you don’t, then a replica’s counter values might have gaps (1, 2, 4, 7, ...); you can still encode those efficiently, using a run-length encoding, or a vector clock plus a set of “extra” dot IDs (known as a dotted vector clock).
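Here is a small TypeScript sketch of this bookkeeping, assuming causal-order delivery (so each replica's counters arrive without gaps). The class and method names are made up for illustration:

type ReplicaID = string;
type Dot = { replicaID: ReplicaID; counter: number }; // counters start at 1

class VectorClock {
  private readonly entries = new Map<ReplicaID, number>(); // missing entries mean 0

  /** Have we already received the operation with this dot ID? */
  has(dot: Dot): boolean {
    return (this.entries.get(dot.replicaID) ?? 0) >= dot.counter;
  }

  /** Record that we received this dot. With causal-order delivery,
      dot.counter is always exactly one more than the stored entry. */
  add(dot: Dot): void {
    this.entries.set(dot.replicaID, dot.counter);
  }
}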


Like dot IDs, vector clocks are flexible. For example, instead of a per-replica counter, you could store the most recent logical timestamp received from each replica. That is a reasonable choice if each operation already contains a logical timestamp for LWW.

Collabs: CausalMessageBuffer

Refs: Baquero and Preguiça 2016; Wikipedia

# Sync Strategies

We now turn to sync strategies: ways to keep collaborators in sync with each other, so that they eventually see the same states.

# Op-Based CRDTs

An operation-based (op-based) CRDT keeps collaborators in sync by broadcasting operations as they happen. This sync strategy is especially useful for live collaboration, where users would like to see each others’ operations quickly.

Current state (op history) + single operation -> new state (op history including operation).

In the simple protocol at the top of this post, a user processes an individual operation by adding it to their operation history, then re-running the semantic function to update their app-visible state. An op-based CRDT instead stores a state that is (usually) smaller than the complete operation history, but it still contains enough information to render the app-visible state, and it can be updated incrementally in response to a received operation.

Example: The op-based counter CRDT from Part 1 has an internal state that is merely the current count. When a user receives (or performs) an inc() operation, they increment the count.

Formally, an op-based CRDT consists of:

  1. A set of allowed CRDT states that a replica can be in.
  2. A query that returns the app-visible state. (The CRDT state often includes extra metadata that is not visible to the rest of the app, such as LWW timestamps.)
  3. For each operation (insert, set, etc.):
    1. A prepare function that inputs the operation’s parameters and outputs a message describing that operation. An external protocol promises to broadcast this message to all collaborators. (Usually this message is just the translated form of the operation. prepare is allowed to read the current state but not mutate it.)
    2. An effect function that processes a message, updating the local state. An external protocol promises to call effect:
      • Immediately for each locally-prepared message, so that local operations update the state immediately.
      • Eventually for each remotely-prepared message, exactly once and (optionally) in causal order.

In addition to updating their internal state, CRDT libraries’ effect functions usually also emit events that describe how the state changed. Cf. Views in Part 2.


To claim that an op-based CRDT implements a given CRDT semantics, you must prove that the app-visible state always equals the semantics applied to the set of operations effected so far.

As an example, let’s repeat the op-based unique set CRDT from Part 2.

  • Per-user CRDT state: A set of pairs (id, x).
  • Query: Return the CRDT state directly, since in this case, it coincides with the app-visible state.
  • Operation add:
    • prepare(x): Generate a new UID id, then return the message ("add", (id, x)).
    • effect("add", (id, x)): Add (id, x) to your local state.
  • Operation delete:
    • prepare(id): Return the message ("delete", id).
    • effect("delete", id): Delete the pair with the given id from your local state, if it is still present.

It is easy to check that this op-based CRDT has the desired semantics: at any time, the query returns the set of pairs (id, x) such that you have effected an add(id, x) operation but no delete(id) operations.
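In TypeScript, a minimal sketch of this op-based CRDT might look as follows (the message shapes and names are illustrative, not Collabs’s actual API):

type UID = string;
type UniqueSetMessage<T> =
  | { type: "add"; id: UID; value: T }
  | { type: "delete"; id: UID };

class OpBasedUniqueSet<T> {
  // CRDT state: the set of pairs (id, x), stored as a map for fast deletes.
  private readonly elements = new Map<UID, T>();

  // Query: here the app-visible state coincides with the CRDT state.
  get pairs(): [UID, T][] {
    return [...this.elements.entries()];
  }

  prepareAdd(value: T): UniqueSetMessage<T> {
    return { type: "add", id: crypto.randomUUID(), value };
  }

  prepareDelete(id: UID): UniqueSetMessage<T> {
    return { type: "delete", id };
  }

  // effect: called exactly once per message - immediately for local messages,
  // eventually for remote ones.
  effect(message: UniqueSetMessage<T>): void {
    if (message.type === "add") this.elements.set(message.id, message.value);
    else this.elements.delete(message.id); // No-op if already deleted.
  }
}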

Observe that the CRDT state is a lossy representation of the operation history: we don’t store any info about delete operations or deleted add operations.


How can the “external protocol” (i.e., the rest of the app) guarantee that messages are effected at-most-once and in causal order? Using a history-tracking vector clock:

  1. On each replica, store a vector clock tracking the operations that have been effected so far.
  2. When a replica asks to send a prepared message, attach a new dot ID to that message before broadcasting it. Also attach the dot IDs of its immediate causal predecessors.
  3. When a replica receives a message:
    1. Check if its dot ID is redundant, according to the local vector clock. If so, stop.
    2. Check if the immediate causal predecessors’ dot IDs have been effected, according to the local vector clock. If not, block until they are.
    3. Deliver the message to effect and update the local vector clock. Do the same for any newly-unblocked messages.
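A minimal sketch of that receive-side logic, reusing the VectorClock sketch from earlier (the pending-list scan is deliberately naive; a real implementation would index blocked messages by their missing predecessors):

type CausalMessage = { dot: Dot; predecessors: Dot[]; payload: unknown };

class CausalBuffer {
  private readonly vc = new VectorClock();
  private pending: CausalMessage[] = [];

  constructor(private readonly effect: (payload: unknown) => void) {}

  receive(message: CausalMessage): void {
    this.pending.push(message);
    this.tryDeliver();
  }

  private tryDeliver(): void {
    let progress = true;
    while (progress) {
      progress = false;
      for (const m of [...this.pending]) {
        if (this.vc.has(m.dot)) {
          // Redundant (duplicate dot ID): drop it.
          this.pending = this.pending.filter((x) => x !== m);
        } else if (m.predecessors.every((d) => this.vc.has(d))) {
          // All immediate causal predecessors effected: deliver.
          this.effect(m.payload);
          this.vc.add(m.dot);
          this.pending = this.pending.filter((x) => x !== m);
          progress = true;
        }
      }
    }
  }
}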

To ensure that messages are eventually delivered at-least-once to each replica (the other half of exactly-once), you generally need some help from the network. E.g., have a server store all messages and retry delivery until every client confirms receipt.

As a final note, suppose that two users concurrently perform operations o and p. You are allowed to deliver their op-based messages to effect in either order without violating the causal-order delivery guarantee. Semantically, the two delivery orders must result in equivalent internal states: both results correspond to the same operation history, containing both o and p. Thus for an op-based CRDT, concurrent messages commute.

Conversely, you can prove that if an algorithm has the API of an op-based CRDT and concurrent messages commute, then its behavior corresponds to some CRDT semantics (i.e., some pure function of the operation history). This leads to the traditional definition of an op-based CRDT in terms of commuting concurrent operations. Of course, if you only prove commutativity, there is no guarantee that the corresponding semantics are reasonable in the eyes of your users.

Collabs: sendCRDT and receiveCRDT in PrimitiveCRDT

Refs: Shapiro et al. 2011a

# State-Based CRDTs

A state-based CRDT keeps users in sync by occasionally exchanging entire states, “merging” their operation histories. This sync strategy is useful in peer-to-peer networks (peers occasionally exchange states in order to bring each other up-to-date) and for the initial sync between a client and a server (the client merges its local state with the server’s latest state, and vice-versa).

Current state (op history) + other state (overlapping op history) -> merged state (union of op histories).

In the simple protocol at the top of this post, the “entire state” is the literal operation history, and merging is just the set-union of operations (using the UIDs to filter duplicates). A state-based CRDT instead stores a state that is (usually) smaller than the complete operation history, but it still contains enough information to render the app-visible state, and it can be “merged” with another state.

Formally, a state-based CRDT consists of:

  1. A set of allowed CRDT states that a replica can be in.
  2. A query that returns the app-visible state. (The CRDT state often includes extra metadata that is not visible to the rest of the app, such as LWW timestamps.)
  3. For each operation, a state mutator that updates the local state.
  4. A merge function that inputs two states and outputs a “merged” state. The app using the CRDT may set local state = merge(local state, other state) at any time, where the other state usually comes from a remote collaborator or storage.

To claim that a state-based CRDT implements a given CRDT semantics, you must prove that the app-visible state always equals the semantics applied to the set of operations that contribute to the current state. Here an operation “contributes” to the output of its state mutator, plus future states resulting from that state (e.g., the merge of that state with another).

As an example, let’s repeat just the state-based part of the LWW Register from Part 2.

  • Per-user state: state = { value, time }, where time is a logical timestamp.
  • Query: Return state.value.
  • Operation set(newValue): Set state = { value: newValue, time: newTime }, where newTime is the current logical time.
  • Merge in other state: Pick the state with the greatest logical timestamp. That is, if other.time > state.time, set state = other.
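A TypeScript sketch of this state-based LWW register, using Lamport timestamps (described later in this post) as the logical timestamps; the names are illustrative:

type LogicalTimestamp = { time: number; replicaID: string };

/** Returns true if a < b in the total order on Lamport timestamps. */
function timestampLess(a: LogicalTimestamp, b: LogicalTimestamp): boolean {
  return a.time < b.time || (a.time === b.time && a.replicaID < b.replicaID);
}

class LWWRegister<T> {
  constructor(private state: { value: T; time: LogicalTimestamp }) {}

  /** Query: the app-visible value. */
  get value(): T {
    return this.state.value;
  }

  /** State mutator for set(newValue); newTime is the current logical time. */
  set(newValue: T, newTime: LogicalTimestamp): void {
    this.state = { value: newValue, time: newTime };
  }

  /** Merge in another state: keep whichever has the greater logical timestamp. */
  merge(other: { value: T; time: LogicalTimestamp }): void {
    if (timestampLess(this.state.time, other.time)) this.state = { ...other };
  }
}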

It is easy to check that this state-based CRDT has the desired semantics: at any time, the query returns the value corresponding to the set operation with the greatest logical timestamp that contributes to the current state.


As a final note, observe that for any CRDT states s, t, u, the following algebraic rules hold, because the same set of operations contributes to both sides of each equation:

  • merge(s, merge(t, u)) = merge(merge(s, t), u) (associativity)
  • merge(s, t) = merge(t, s) (commutativity)
  • merge(s, s) = s (idempotence)

Thus for a state-based CRDT, the merge function is Associative, Commutative, and Idempotent (ACI).

Conversely, you can prove that if an algorithm has the API of a state-based CRDT and it satisfies ACI, then its behavior corresponds to some CRDT semantics (i.e., some pure function of the operation history). This leads to the traditional definition of a state-based CRDT in terms of an ACI merge function. Of course, if you only prove these algebraic rules, there is no guarantee that the corresponding semantics are reasonable in the eyes of your users.

Collabs: saveCRDT and loadCRDT in PrimitiveCRDT

Refs: Shapiro et al. 2011a

# Other Sync Strategies

In a real collaborative app, it is inconvenient to commit to just one of op-based or state-based synchronization. Instead, it’s nice to use both, potentially within the same session.

Example: When the user launches your app, first do a state-based sync with a storage server to become in sync. Then use op-based messages over TCP to stay in sync until the connection drops.

Thus hybrid op-based/state-based CRDTs that support both sync strategies are popular in practice. Typically, these look like either state-based CRDTs with op-based messages tacked on (Yjs, Collabs), or they use an op-based CRDT alongside a complete operation history (Automerge).

To perform a state-based merge in the latter approach, you look through the received state’s history for operations that are not already in your history, and deliver those to the op-based CRDT. This approach is simple, and it comes with a built-in version history, but it requires more effort to make efficient (as Automerge has been pursuing).
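A sketch of that merge step, assuming you store the complete history as (UID, op) pairs alongside the op-based CRDT (names are illustrative; with causal-order delivery you would also sort the new operations causally before delivering them):

type StoredOp = { id: string; op: unknown };

/** State-based merge for the "op-based CRDT + complete operation history" approach:
    deliver to the op-based CRDT exactly those received operations that are new to us. */
function mergeHistory(
  localHistory: Map<string, StoredOp>,
  receivedHistory: Iterable<StoredOp>,
  effect: (op: unknown) => void
): void {
  for (const stored of receivedHistory) {
    if (!localHistory.has(stored.id)) {
      localHistory.set(stored.id, stored);
      effect(stored.op);
    }
  }
}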


Other sync strategies use optimized peer-to-peer synchronization. Traditionally, peer-to-peer synchronization uses state-based CRDTs: each peer sends a copy of its own state to the other peer, then merges in the received state. This is inefficient if the states overlap a lot - e.g., the two peers just synced one minute ago and have only updated their states slightly since then. Optimized protocols like Yjs’s sync protocol or Byzantine causal broadcast instead use back-and-forth messages to determine what info the other peer is missing and send just that.

For academic work on hybrid or optimized sync strategies, look up delta-state based CRDTs (also called delta CRDTs). These are like hybrid CRDTs, with the added technical requirement that op-based messages are themselves states (in particular, they are input to the state-based merge function instead of a separate effect function). Note that some papers focus on novel sync strategies, while others focus on the orthogonal problem of how to tolerate non-causally-ordered messages.


Collabs: Updates and Sync - Patterns

Refs: Yjs document updates; Almeida, Shoker, and Baquero 2016; Enes et al. 2019

# Misc Techniques

The rest of this post describes specific algorithms. We start with miscellaneous techniques that are needed to implement some of the semantics from Part 2: two kinds of logical timestamps for LWW, and ways to query the causal order. These are all traditional distributed systems techniques that are not specific to CRDTs.

# LWW: Lamport Timestamps

Recall that you should use a logical timestamp instead of wall-clock time for Last-Writer Wins (LWW) values. A Lamport timestamp is a simple and common logical timestamp, defined by:

  1. Each replica stores a clock value time, an integer that is initially 0.
  2. Whenever you perform an LWW set operation, increment time and attach its new value to the operation, as part of a pair (time, replicaID). This pair is called a Lamport timestamp.
  3. Whenever you receive a Lamport timestamp from another replica as part of an operation, set time = max(time, received time).
  4. The total order on Lamport timestamps is given by: (t1, replica1) < (t2, replica2) if t1 < t2 or (t1 = t2 and replica1 < replica2). That is, the operation with a greater time “wins”, with ties broken using an arbitrary order on replica IDs.

Operations A-F with arrows A to B, A to E, B to C, C to D, C to F. The labels are: (1, "A84nxi"); (2, "A84nxi"); (3, "A84nxi"); (5, "A84nxi"); (2, "bu2nVP"); (4, "bu2nVP").

Figure 1. An operation history with each operation labeled by a Lamport timestamp.

Lamport timestamps have two important properties (mentioned in Part 2):

  1. If o < p in the causal order, then (o's Lamport timestamp) < (p's Lamport timestamp). Thus a new LWW set always wins over all causally-prior sets.
    • Note that the converse does not hold: it’s possible that (q's Lamport timestamp) < (r's Lamport timestamp) but q and r are concurrent.
  2. Lamport timestamps created by different users are always distinct (because of the replicaID tiebreaker). Thus the winner is never ambiguous.
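A minimal TypeScript sketch of rules 1-3 (rule 4, the total order, is the timestampLess comparison from the LWW register sketch earlier):

class LamportClock {
  private time = 0; // Rule 1.

  constructor(private readonly replicaID: string) {}

  /** Rule 2: call when performing an LWW set; returns the operation's timestamp. */
  next(): LogicalTimestamp {
    this.time++;
    return { time: this.time, replicaID: this.replicaID };
  }

  /** Rule 3: call when receiving another replica's Lamport timestamp. */
  receive(remote: LogicalTimestamp): void {
    this.time = Math.max(this.time, remote.time);
  }
}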

Collabs: lamportTimestamp

Refs: Lamport 1978; Wikipedia

# LWW: Hybrid Logical Clocks

Hybrid logical clocks are another kind of logical timestamp that combine features of Lamport timestamps and wall-clock time. I am not qualified to write about these, but Jared Forsyth gives a readable description here: https://jaredforsyth.com/posts/hybrid-logical-clocks/.

# Querying the Causal Order: Vector Clocks 2

One of Part 2’s “Other Techniques” was Querying the Causal Order. For example, an access-control CRDT could include a rule like “If Alice performs an operation but an admin banned her concurrently, then treat Alice’s operation as if it had not happened”.

I mentioned in Part 2 that I find this technique too complicated for practical use, except in some special cases. Nevertheless, here are some ways to implement causal-order queries.

Formally, our goal is: given two operations o and p, answer the query “Is o < p in the causal order?”. More narrowly, a CRDT might query whether o and p are concurrent, i.e., neither o < p nor p < o.

Recall from above that a vector clock is a map that sends a replica ID to the maximum counter value received from that replica, where each replica assigns counters to its own operations in order starting at 1 (like a dot ID). Besides storing a vector clock on each replica, we can also attach a vector clock to each operation: namely, the sender’s vector clock at the time of sending. (The sender’s own entry is incremented to account for the operation itself.)

Operations A-F with arrows A to B, A to E, B to C, C to D, C to F. The labels are: ("A84nxi", 1) / { A84nxi: 1 }; ("A84nxi", 2) / { A84nxi: 2 }; ("A84nxi", 3) / { A84nxi: 3 }; ("A84nxi", 4) / { A84nxi: 4, bu2nVP: 2 }; ("bu2nVP", 1) / { A84nxi: 1, bu2nVP: 1 }; ("bu2nVP", 2) / { A84nxi: 3, bu2nVP: 2 }.

Figure 2. An operation history with each operation labeled by its dot ID (blue italics) and vector clock (red normal text).

Define a partial order on vector clocks by: v < w if for every replica ID r, v[r] <= w[r], and for at least one replica ID, v[r] < w[r]. (If r is not present in a map, treat its value as 0.) Then it is a classic result that the causal order on operations matches the partial order on their vector clocks. Thus storing each operation’s vector clock lets you query the causal order later.

Example: In the above diagram, { A84nxi: 1, bu2nVP: 1 } < { A84nxi: 4, bu2nVP: 2 }, matching the causal order on their operations. Meanwhile, { A84nxi: 1, bu2nVP: 1 } and { A84nxi: 2 } are incomparable, matching the fact that their operations are concurrent.
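In TypeScript, the partial-order comparison over plain Map<ReplicaID, number> vector clocks looks like this (helper names are made up):

type VC = Map<string, number>; // missing replica IDs are implicitly 0

/** Returns true if v < w in the vector clock partial order. */
function vcLess(v: VC, w: VC): boolean {
  // Every entry of v must be <= the corresponding entry of w...
  for (const [replicaID, count] of v) {
    if (count > (w.get(replicaID) ?? 0)) return false;
  }
  // ...and at least one entry must be strictly less.
  for (const [replicaID, count] of w) {
    if ((v.get(replicaID) ?? 0) < count) return true;
  }
  return false;
}

/** Two operations are concurrent iff neither vector clock is less than the other. */
function concurrent(v: VC, w: VC): boolean {
  return !vcLess(v, w) && !vcLess(w, v);
}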

Often you only need to query the causal order on new operations. That is, you just received an operation p, and you want to compare it to an existing operation o. For this, it suffices to know o’s dot ID (replicaID, counter): If counter <= p.vc[replicaID], then o < p, else they are concurrent. Thus in this case, you don’t need to store each operation’s vector clock, just their dot IDs (though you must still send vector clocks over the network).

The above discussion changes slightly if you do not assume causal order delivery. See Wikipedia’s update rules.


We have a performance problem: The size of a vector clock is proportional to the number of past replicas. In a collaborative app, this number tends to grow without bound: each browser tab creates a new replica, including refreshes. Thus if you attach a vector clock to each op-based message, your network usage also grows without bound.

Some workarounds:

  1. Instead of attaching the whole vector clock to each op-based message, just attach the UIDs of the operation’s immediate causal predecessors. (You are probably attaching these anyway for causal-order delivery.) Then on the receiver side, look up the predecessors’ vector clocks in your stored state, take their entry-wise max, add one to the sender’s entry, and store that as the operation’s vector clock.
  2. Same as 1, but instead of storing the whole vector clock for each operation, just store its immediate causal predecessors - the arrows in the operation history. This uses less space, and it still contains enough information to answer causal order queries: o < p if and only if there is a path of arrows from o to p. However, I don’t know of a way to perform those queries quickly.
  3. Instead of referencing the causal order directly, the sender can list just the “relevant” o < p as part of p’s op-based message. For example, when you set the value of a multi-value register, instead of using a vector clock to indicate which set(x) operations are causally prior, just list the UIDs of the current multi-values (cf. the multi-value register on top of a unique set).

Collabs: vectorClock

Refs: Baquero and Preguiça 2016; Wikipedia; Automerge issue discussing workarounds

# Optimized CRDTs

We now turn to optimizations. I focus on algorithmic optimizations that change what state you store and how you access it, as opposed to low-level code tricks. Usually, the optimizations reduce the amount of metadata that you need to store in memory and on disk, at least in the common case.

These optimizations are the most technical part of the blog series. You may wish to skip them for now and come back only when you are implementing one, or trust someone else to implement them in a library.

# List CRDTs

I assume you’ve understood Lists and Text Editing from Part 2.

There are too many list CRDT algorithms and optimizations to survey here, but I want to briefly introduce one key problem and solution.

When you use a text CRDT to represent a collaborative text document, the easy way to represent the state is as an ordered map (list CRDT position) -> (text character). Concretely, this map could be a tree with one node per list CRDT position, like in Fugue: A Basic List CRDT.

Example of a Fugue tree.

Figure 3. Example of a Fugue tree with corresponding text "abcde". Each node's UID is a dot.

In such a tree, each tree node contains at minimum (1) a UID and (2) a pointer to its parent node. That is a lot of metadata for a single text character! Plus, you often need to store this metadata even for deleted characters (tombstones).

Here is an optimization that dramatically reduces the metadata overhead in practice: when a replica inserts a run of characters from left to right (the common case when typing), store the whole run as a single node with a single UID, instead of one node per character. The i-th character in the run is then addressed by the pair (id, i), without storing its own tree node.

It’s possible to later insert characters in the middle of a sequence, e.g., between char1 and char2. That’s fine; the new characters just need to indicate the corresponding list CRDT positions (e.g. “I am a left child of (id, 2)”).

Applying this optimization to Fugue gives you trees like so, where only the filled nodes are stored explicitly (together with their children’s characters):

Example of an optimized Fugue tree.

Figure 4. Example of an optimized Fugue tree with corresponding text "abcdefg".

Collabs: Waypoints in CTotalOrder

Refs: Yu 2012; Jahns 2020 (Yjs blog post)

# Formatting Marks (Rich Text)

Recall the inline formatting CRDT from Part 2. Its internal CRDT state is an append-only log of formatting marks:

type Mark = {
    key: string;
    value: any;
    timestamp: LogicalTimestamp;
    start: { pos: Position, type: "before" | "after" }; // type Anchor
    end: { pos: Position, type: "before" | "after" }; // type Anchor
}

Its app-visible state is the view of this log given by: for each character c, for each format key key, find the mark with the largest timestamp satisfying mark.key = key and mark.start <= c < mark.end (where anchors are compared by their positions in the list).

Then c’s format value at key is mark.value.

For practical use, we would like a view that represents the same state but uses less memory. In particular, instead of storing per-character formatting info, it should look more like a Quill delta. E.g., the Quill delta representing “Quick brown fox” is

{
    ops: [
        { insert: "Quick " },
        { insert: "brow", attributes: { bold: true, italic: true } },
        { insert: "n fox", attributes: { italic: true } }
    ]
}

Here is such a view. Its state is a map: Map<Anchor, Mark[]>, whose keys are the start and end anchors of the marks in the log; map.get(anchor) is the array of all marks in the log whose range [mark.start, mark.end) contains anchor.

Given this view, it is easy to look up the format of any particular character. You just need to go left until you reach an anchor that is in map, then interpret map.get(anchor) in the usual way: for each key, find the LWW winner at key and use its value. (If you reach the beginning of the list, the character has no formatting.)
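For instance, once you have the array of marks covering an anchor, computing its format is a per-key LWW. A sketch, assuming the Mark type above and Lamport timestamps compared with timestampLess from earlier:

/** Given the marks covering an anchor (map.get(anchor)), compute its format:
    for each key, the value of the mark with the largest timestamp.
    Keys whose winning value is null are unformatted and omitted. */
function formatOf(marks: Mark[]): { [key: string]: any } {
  const winners = new Map<string, Mark>();
  for (const mark of marks) {
    const current = winners.get(mark.key);
    if (current === undefined || timestampLess(current.timestamp, mark.timestamp)) {
      winners.set(mark.key, mark);
    }
  }
  const format: { [key: string]: any } = {};
  for (const [key, mark] of winners) {
    if (mark.value !== null) format[key] = mark.value;
  }
  return format;
}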

I claim that with sufficient coding effort, you can also do the following tasks efficiently:

  1. Convert the whole view into a Quill delta or similar rich-text-editor state.
  2. Update the view to reflect a new mark added to the log.
  3. After updating the view, emit events describing what changed, in a form like Collabs’s RichTextFormatEvent:
type RichTextFormatEvent = {
    // The formatted range is [startIndex, endIndex).
    startIndex: number;
    endIndex: number;
    key: string;
    value: any; // null if unformatted
    previousValue: any;
    // The range's complete new format.
    format: { [key: string]: any };
}

Let’s briefly discuss Task 2; there’s more detail in the Peritext essay. When a new mark mark is added to the log:

  1. If mark.start is not present in map, go to the left of it until you reach an anchor prev that is in map, then do map.set(mark.start, copy of map.get(prev)). (If you reach the beginning of the list, do map.set(mark.start, []).)
  2. If mark.end is not present in map, do likewise.
  3. For each entry (anchor, array) in map such that mark.start <= anchor < mark.end, append mark to array.

Some variations on this section:

Collabs: CRichText; its source code shows all three tasks

Refs: Litt et al. 2021 (Peritext)

# State-Based Counter

The easy way to count events in a collaborative app is to store the events in an append-only log or unique set. This uses more space than the count alone, but you often want that extra info anyway - e.g., to display who liked a post, in addition to the like count.

Nonetheless, the optimized state-based counter CRDT is both interesting and traditional, so let’s see it.

The counter’s semantics are as in Part 1’s passenger counting example: its value is the number of +1 operations in the history, regardless of concurrency.

You can obviously achieve this semantics by storing an append-only log of +1 operations. To merge two states, take the union of log entries, skipping duplicate UIDs.

In other words, store the entire operation history, following the simple protocol from the top of this post.


Suppose the append-only log uses dot IDs as its UIDs. Then the log’s state will always look something like this:

[
    ((a6X7fx, 1), "+1"), ((a6X7fx, 2), "+1"), ((a6X7fx, 3), "+1"), ((a6X7fx, 4), "+1"),
    ((bu91nD, 1), "+1"), ((bu91nD, 2), "+1"), ((bu91nD, 3), "+1"),
    ((yyn898, 1), "+1"), ((yyn898, 2), "+1")
]

You can compress this state by storing, for each replicaID, only the range of dot IDs received from that replica. For example, the above log compresses to

{
    a6X7fx: 4,
    bu91nD: 3,
    yyn898: 2
}

This is the same trick we used in Vector Clocks 1.

Compressing the log in this way leads to the following algorithm, the state-based counter CRDT.
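Concretely: the CRDT state is the compressed map itself, the counter's value is the sum of the entries, a +1 increments your own entry, and merge takes the entry-wise max (i.e., the union of the two compressed logs). A TypeScript sketch:

class StateBasedCounter {
  // Compressed log: replicaID -> number of +1 ops received from that replica.
  private readonly state = new Map<string, number>();

  constructor(private readonly replicaID: string) {}

  /** Query: the count is the total number of +1 ops. */
  get value(): number {
    let sum = 0;
    for (const count of this.state.values()) sum += count;
    return sum;
  }

  /** State mutator for +1: one more of our own ops. */
  increment(): void {
    this.state.set(this.replicaID, (this.state.get(this.replicaID) ?? 0) + 1);
  }

  /** Merge: entry-wise max, i.e., the union of the two compressed logs. */
  merge(other: ReadonlyMap<string, number>): void {
    for (const [replicaID, count] of other) {
      this.state.set(replicaID, Math.max(this.state.get(replicaID) ?? 0, count));
    }
  }
}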

For example:

// Starting local state:
{
    a6X7fx: 2, // Implies ops ((a6X7fx, 1), "+1"), ((a6X7fx, 2), "+1")
    bu91nD: 3,
}

// Other state:
{
    a6X7fx: 4, // Implies ops ((a6X7fx, 1), "+1"), ..., ((a6X7fx, 4), "+1")
    bu91nD: 1,
    yyn898: 2
}

// Merged result:
{
    a6X7fx: 4, // Implies ops ((a6X7fx, 1), "+1"), ..., ((a6X7fx, 4), "+1"): union of inputs
    bu91nD: 3,
    yyn898: 2
}

You can generalize the state-based counter to handle +x operations for arbitrary positive values x. However, to handle both positive and negative additions, you need to use two counters: P for positive additions and N for negative additions. The actual value is (P's value) - (N's value). This is the state-based PN-counter.

Collabs: CCounter

Refs: Shapiro et al. 2011a

# Delta-State Based Counter

You can modify the state-based counter CRDT to also support op-based messages: when you perform a +1, broadcast your own updated map entry as a tiny state (e.g. { a6X7fx: 3 }); recipients process it using the state-based merge function.

This hybrid op-based/state-based CRDT is called the delta-state based counter CRDT.

Technically, the delta-state based counter CRDT assumes causal-order delivery for op-based messages. Without this assumption, a replica’s uncompressed log might contain gaps like

[
    ((a6X7fx, 1), "+1"), ((a6X7fx, 2), "+1"), ((a6X7fx, 3), "+1"), ((a6X7fx, 6), "+1")
]

which we can’t represent as a Map<ReplicaID, number>.

You could argue that the operation ((a6X7fx, 6), "+1") lets you “infer” the prior operations ((a6X7fx, 4), "+1") and ((a6X7fx, 5), "+1"), hence you can just set the map entry to { a6X7fx: 6 }. However, this will give an unexpected counter value if those prior operations were deliberately undone, or if they’re tied to some other change that can’t be inferred (e.g., a count of comments vs the actual list of comments).


Luckily, you can still compress ranges within the dot IDs that you have received. For example, you could use a run-length encoding:

{
    a6X7fx: [1 through 3, 6 through 6]
}

or a map plus a set of “extra” dot IDs:

{
    map: {
        a6X7fx: 3
    },
    dots: [["a6X7fx", 6]]
}

This idea leads to a second delta-state based counter CRDT. Its state-based merge algorithm is somewhat complicated, but it has a simple spec: decompress both inputs, take the union, and re-compress.

Refs: Almeida, Shoker, and Baquero 2016

# State-Based Unique Set

Here is a straightforward state-based CRDT for the unique set:
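The state is the set of live elements plus a tombstone set holding the UIDs of deleted elements; merge unions the tombstones and keeps an element only if neither input has deleted it. A TypeScript sketch, consistent with the example below (names are illustrative; UIDs are written as plain strings):

class StateBasedUniqueSet<T> {
  // Live elements: UID -> value.
  private readonly elements = new Map<string, T>();
  // UIDs of deleted elements.
  private readonly tombstones = new Set<string>();

  /** State mutator for add. (In practice the UID is a dot ID.) */
  add(uid: string, value: T): void {
    this.elements.set(uid, value);
  }

  /** State mutator for delete. */
  delete(uid: string): void {
    if (this.elements.delete(uid)) this.tombstones.add(uid);
  }

  /** Merge: union the tombstones; an element survives iff it appears in either
      input's elements and its UID is not tombstoned in either input. */
  merge(other: { elements: ReadonlyMap<string, T>; tombstones: ReadonlySet<string> }): void {
    for (const uid of other.tombstones) this.tombstones.add(uid);
    for (const [uid, value] of other.elements) this.elements.set(uid, value);
    for (const uid of this.tombstones) this.elements.delete(uid);
  }
}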

For example:

// Starting local state:
{
    elements: [ (("A84nxi", 1), "milk"), (("A84nxi", 3), "eggs") ],
    tombstones: [ ("A84nxi", 2), ("bu2nVP", 1), ("bu2nVP", 2) ]
}

// Other state:
{
    elements: [
        (("A84nxi", 3), "eggs"), (("bu2nVP", 1), "bread"), (("bu2nVP", 2), "butter"),
        (("bu2nVP", 3), "cereal")
    ],
    tombstones: [ ("A84nxi", 1), ("A84nxi", 2) ]
}

// Merged result:
{
    elements: [ (("A84nxi", 3), "eggs"), (("bu2nVP", 3), "cereal") ],
    tombstones: [ ("A84nxi", 1), ("A84nxi", 2), ("bu2nVP", 1), ("bu2nVP", 2) ]
}

The problem with this straightforward algorithm is the tombstone set: it stores a UID for every deleted element, potentially making the CRDT state much larger than the app-visible state.

Luckily, when your UIDs are dot IDs, you can use a “compression” trick similar to the state-based counter CRDT: in place of the tombstone set, store the range of dot IDs received from each replica (deleted or not), as a Map<ReplicaID, number>. That is, store a modified vector clock that only counts add operations. Any dot ID that lies within this range, but is not present in elements, must have been deleted.

For example, the three states above compress to:

// Starting local state:
{
    elements: [ (("A84nxi", 1), "milk"), (("A84nxi", 3), "eggs") ],
    vc: { A84nxi: 3, bu2nVP: 2 }
}

// Other state:
{
    elements: [
        (("A84nxi", 3), "eggs"), (("bu2nVP", 1), "bread"), (("bu2nVP", 2), "butter"),
        (("bu2nVP", 3), "cereal")
    ],
    vc: { A84nxi: 3, bu2nVP: 3 }
}

// Merged result:
{
    elements: [ (("A84nxi", 3), "eggs"), (("bu2nVP", 3), "cereal") ],
    vc: { A84nxi: 3, bu2nVP: 3 }
}

Compressing the tombstone set in this way leads to the following algorithm, the optimized state-based unique set:
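A TypeScript sketch of the optimized merge, consistent with the example above (UIDs are dot IDs, and vc counts each replica's add operations, whether or not the added elements were later deleted). This follows the Optimized OR-Set logic cited below; the function and field names are illustrative:

type Dot = { replicaID: string; counter: number };
const dotKey = (dot: Dot) => `${dot.replicaID},${dot.counter}`;

interface OptimizedUniqueSetState<T> {
  elements: Map<string, { dot: Dot; value: T }>; // keyed by dotKey(dot)
  vc: Map<string, number>; // per-replica count of add ops seen (deleted or not)
}

/** Has this state already seen the add op with the given dot
    (whether or not it later deleted the element)? */
function seen(state: OptimizedUniqueSetState<unknown>, dot: Dot): boolean {
  return (state.vc.get(dot.replicaID) ?? 0) >= dot.counter;
}

/** Merge other into local. An element is dropped exactly when some input has
    seen its add op but no longer holds it - that input must have deleted it. */
function merge<T>(local: OptimizedUniqueSetState<T>, other: OptimizedUniqueSetState<T>): void {
  for (const [key, elt] of [...local.elements]) {
    if (!other.elements.has(key) && seen(other, elt.dot)) local.elements.delete(key);
  }
  for (const [key, elt] of other.elements) {
    if (!local.elements.has(key) && !seen(local, elt.dot)) local.elements.set(key, elt);
  }
  // Merged vector clock: entry-wise max.
  for (const [replicaID, count] of other.vc) {
    local.vc.set(replicaID, Math.max(local.vc.get(replicaID) ?? 0, count));
  }
}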

You can re-use the same vector clock that you use for tracking operations, if your unique set’s dot IDs use the same counter. Nitpicks:

  • Your unique set might skip over some dot IDs, because they are used for other operations; that is fine.
  • If a single operation can add multiple elements at once, consider using UIDs of the form (dot ID, within-op counter) = (replicaID, per-op counter, within-op counter).


The optimized unique set is especially important because you can implement many other CRDTs on top, which are then also optimized (they avoid tombstones). In particular, Part 2 describes unique set algorithms for the multi-value register, multi-value map, and add-wins set (all assuming causal-order delivery). You can also adapt the optimized unique set to manage deletions in the unique set of CRDTs.

Collabs: CMultiValueMap, CSet

Refs: Based on the Optimized OR-Set from Bieniusa et al. 2012

# Delta-State Based Unique Set

Similar to the second delta-state based counter CRDT, you can create a delta-state based unique set that is a hybrid op-based/state-based CRDT and allows non-causal-order message delivery. I’ll leave it as an exercise.

Refs: Causal δ-CRDTs in Almeida, Shoker, and Baquero 2016

# Conclusion

The last two posts surveyed the two “topics” from Part 1:

  1. Semantics: An abstract description of what a collaborative app’s state should be, given its concurrency-aware operation history.
  2. Algorithms: Algorithms to efficiently compute the app’s state in specific, practical situations.

Together, they covered much of the CRDT theory that I know and use. I hope that you now know it too!

However, there are additional CRDT ideas outside my focus area. I’ll give a bibliography for those in the next and final post, Part 4: Further Topics.

This blog post is Part 3 of a series.


