这是用户在 2024-7-8 17:31 为 https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md 保存的双语快照页面,由 沉浸式翻译 提供双语支持。了解如何保存?
Skip to content
Open in github.dev Open in a new github.dev tab Open in codespace

Files 文件

t
944 lines (750 loc) · 36.6 KB
944 行(750 行)· 36.6 KB

17503-eliminate-rescan.md

File metadata and controls

944 lines (750 loc) · 36.6 KB
944 行(750 行)· 36.6 KB

Proposal: Eliminate STW stack re-scanning
提案:消除 STW 堆栈重新扫描

Author(s): Austin Clements, Rick Hudson
作者:Austin Clements,Rick Hudson

Last updated: 2016-10-21
最后更新日期:2016-10-21

Discussion at https://golang.org/issue/17503.
讨论请参见 https://golang.org/issue/17503。

Abstract 摘要

As of Go 1.7, the one remaining source of unbounded and potentially non-trivial stop-the-world (STW) time is stack re-scanning. We propose to eliminate the need for stack re-scanning by switching to a hybrid write barrier that combines a Yuasa-style deletion write barrier [Yuasa '90] and a Dijkstra-style insertion write barrier [Dijkstra '78]. Preliminary experiments show that this can reduce worst-case STW time to under 50µs, and this approach may make it practical to eliminate STW mark termination altogether.
截至 Go 1.7 版本,仍存在一个无界且潜在非平凡的全停顿时间来源,即堆栈重新扫描。我们建议通过切换到一种混合写屏障来消除堆栈重新扫描的需要,该写屏障结合了 Yuasa 风格的删除写屏障 [Yuasa '90] 和 Dijkstra 风格的插入写屏障 [Dijkstra '78]。初步实验表明,这可以将最坏情况的全停顿时间减少到 50 微秒以下,这种方法可能使消除全停顿标记终止变得实际可行。

Eliminating stack re-scanning will in turn simplify and eliminate many other parts of the garbage collector that exist solely to improve the performance of stack re-scanning. This includes stack barriers (which introduce significant complexity in many parts of the runtime) and maintenance of the re-scan list. Hence, in addition to substantially improving STW time, the hybrid write barrier should also reduce the overall complexity of the garbage collector.
消除堆栈重新扫描将简化并消除垃圾收集器的许多其他部分,这些部分仅存在于改进堆栈重新扫描性能的目的。这包括堆栈屏障(在运行时的许多部分引入了显着的复杂性)和重新扫描列表的维护。因此,除了大幅改善 STW 时间外,混合写屏障还应减少垃圾收集器的整体复杂性。

Background 背景

The Go garbage collector is a tricolor concurrent collector [Dijkstra '78]. Every object is shaded either white, grey, or black. At the beginning of a GC cycle, all objects are white, and it is the goal of the garbage collector to mark all reachable objects black and then free all white objects. The garbage collector achieves this by shading GC roots (stacks and globals, primarily) grey and then endeavoring to turn all grey objects black while satisfying the strong tricolor invariant:
Go 垃圾收集器是一种三色并发收集器[Dijkstra '78]。每个对象都被标记为白色、灰色或黑色。在 GC 周期开始时,所有对象都是白色,垃圾收集器的目标是将所有可达对象标记为黑色,然后释放所有白色对象。垃圾收集器通过将 GC 根(主要是堆栈和全局变量)标记为灰色,然后努力将所有灰色对象转为黑色来实现这一目标,同时满足强三色不变性:

No black object may contain a pointer to a white object.
没有黑色对象可以包含指向白色对象的指针。

Ensuring the tricolor invariant in the presence of concurrent pointer updates requires barriers on either pointer reads or pointer writes (or both). There are many flavors of barrier [Pirinen '98]. Go 1.7 uses a coarsened Dijkstra write barrier [Dijkstra '78], where pointer writes are implemented as follows:
在存在并发指针更新的情况下确保三色不变性需要在指针读取或指针写入(或两者)上设置屏障。有许多种屏障[Pirinen '98]。Go 1.7 使用粗粒度的 Dijkstra 写屏障[Dijkstra '78],其中指针写入实现如下:

writePointer(slot, ptr):
    shade(ptr)
    *slot = ptr

shade(ptr) marks the object at ptr grey if it is not already grey or black. This ensures the strong tricolor invariant by conservatively assuming that *slot may be in a black object, and ensuring ptr cannot be white before installing it in *slot.
shade(ptr) 如果对象不是灰色或黑色,则将其标记为灰色。这通过保守地假设 *slot 可能是黑色对象,确保在将其安装到 *slot 中之前 ptr 不能是白色,从而确保了强三色不变性。

The Dijkstra barrier has several advantages over other types of barriers. It does not require any special handling of pointer reads, which has performance advantages since pointer reads tend to outweigh pointer writes by an order of magnitude or more. It also ensures forward progress; unlike, for example, the Steele write barrier [Steele '75], objects transition monotonically from white to grey to black, so the total work is bounded by the heap size.
Dijkstra 屏障相对于其他类型的屏障具有几个优点。它不需要对指针读取进行任何特殊处理,这在性能上具有优势,因为指针读取往往比指针写入大一个数量级或更多。它还确保向前进展;例如,与 Steele 写屏障[Steele '75]不同,对象从白色过渡到灰色再到黑色,因此总工作量受堆大小限制。

However, it also has disadvantages. In particular, it presents a trade-off for pointers on stacks: either writes to pointers on the stack must have write barriers, which is prohibitively expensive, or stacks must be permagrey. Go chooses the later, which means that many stacks must be re-scanned during STW. The garbage collector first scans all stacks at the beginning of the GC cycle to collect roots. However, without stack write barriers, we can't ensure that the stack won't later contain a reference to a white object, so a scanned stack is only black until its goroutine executes again, at which point it conservatively reverts to grey. Thus, at the end of the cycle, the garbage collector must re-scan grey stacks to blacken them and finish marking any remaining heap pointers. Since it must ensure the stacks don't continue to change during this, the whole re-scan process happens with the world stopped.
然而,它也有缺点。特别是,它对堆栈上的指针提出了一个折衷方案:要么对堆栈上的指针进行写入必须具有写屏障,这是代价高昂的,要么堆栈必须是永久灰色。Go 选择了后者,这意味着许多堆栈在 STW 期间必须重新扫描。垃圾收集器首先在 GC 周期开始时扫描所有堆栈以收集根。然而,没有堆栈写入屏障,我们无法确保堆栈后来不会包含对白色对象的引用,因此扫描过的堆栈只有在其 goroutine 再次执行时才会变为黑色,此时它会保守地恢复为灰色。因此,在周期结束时,垃圾收集器必须重新扫描灰色堆栈以使其变为黑色,并完成标记任何剩余的堆指针。由于它必须确保在此期间堆栈不会继续更改,因此整个重新扫描过程发生在全局停止状态下。

Re-scanning the stacks can take 10's to 100's of milliseconds in an application with a large number of active goroutines.
重新扫描堆栈可能需要在具有大量活跃 goroutine 的应用程序中花费 10 到 100 毫秒。

Proposal 提案

We propose to eliminate stack re-scanning and replace Go's write barrier with a hybrid write barrier that combines a Yuasa-style deletion write barrier [Yuasa '90] with a Dijkstra-style insertion write barrier [Dijkstra '78]. The hybrid write barrier is implemented as follows:
我们建议消除堆栈重新扫描,并用混合写屏障替换 Go 的写屏障,该混合写屏障结合了 Yuasa 风格的删除写屏障[Yuasa '90]和 Dijkstra 风格的插入写屏障[Dijkstra '78]。混合写屏障的实现如下:

writePointer(slot, ptr):
    shade(*slot)
    if current stack is grey:
        shade(ptr)
    *slot = ptr

That is, the write barrier shades the object whose reference is being overwritten, and, if the current goroutine's stack has not yet been scanned, also shades the reference being installed.
这意味着写屏障会遮蔽被覆盖引用的对象,并且如果当前 goroutine 的堆栈尚未被扫描,则还会遮蔽被安装的引用。

The hybrid barrier makes stack re-scanning unnecessary; once a stack has been scanned and blackened, it remains black. Hence, it eliminates the need for stack re-scanning and the mechanisms that exist to support stack re-scanning, including stack barriers and the re-scan list.
混合屏障使得堆栈重新扫描变得不必要;一旦堆栈被扫描并变黑,它将保持为黑色。因此,它消除了堆栈重新扫描的需要以及存在以支持堆栈重新扫描的机制,包括堆栈屏障和重新扫描列表。

The hybrid barrier requires that objects be allocated black (allocate-white is a common policy, but incompatible with this barrier). However, while not required by Go's current write barrier, Go already allocates black for other reasons, so no change to allocation is necessary.
混合屏障要求对象分配为黑色(分配为白色是一种常见策略,但与此屏障不兼容)。然而,虽然 Go 当前的写屏障不要求这样做,但 Go 已经因其他原因分配为黑色,因此不需要更改分配。

The hybrid write barrier is equivalent to the "double write barrier" used in the adaptation of Metronome used in the IBM real-time Java implementation [Auerbach '07]. In that case, the garbage collector was incremental, rather than concurrent, but ultimately had to deal with the same problem of tightly bounded stop-the-world times.
混合写屏障等同于在 IBM 实时 Java 实现中使用的 Metronome 适配中使用的“双写屏障”[Auerbach '07]。在那种情况下,垃圾收集器是增量的,而不是并发的,但最终必须处理相同的严格界定的停止世界时间问题。

Reasoning 推理

A full proof of the hybrid write barrier is given at the end of this proposal. Here we give the high-level intuition behind the barrier.
在本提案的最后给出了混合写屏障的完整证明。这里我们介绍屏障背后的高层直觉。

Unlike the Dijkstra write barrier, the hybrid barrier does not satisfy the strong tricolor invariant: for example, a black goroutine (a goroutine whose stack has been scanned) can write a pointer to a white object into a black object without shading the white object. However, it does satisfy the weak tricolor invariant [Pirinen '98]:
与 Dijkstra 写屏障不同,混合屏障不满足强三色不变性:例如,黑色 goroutine(其堆栈已被扫描的 goroutine)可以将指向白色对象的指针写入黑色对象而不对白色对象进行着色。但是,它确实满足弱三色不变性[Pirinen '98]:

Any white object pointed to by a black object is reachable from a grey object via a chain of white pointers (it is grey-protected).
由黑对象指向的任何白对象都可以通过一系列白指针从灰对象访问(它是灰保护的)。

The weak tricolor invariant observes that it's okay for a black object to point to a white object, as long as some path ensures the garbage collector will get around to marking that white object.
弱三色不变性观察到,黑色对象指向白色对象是可以的,只要某条路径确保垃圾收集器最终会标记该白色对象。

Any write barrier has to prohibit a mutator from "hiding" an object; that is, rearranging the heap graph to violate the weak tricolor invariant so the garbage collector fails to mark a reachable object. For example, in a sense, the Dijkstra barrier allows a mutator to hide a white object by moving the sole pointer to it to a stack that has already been scanned. The Dijkstra barrier addresses this by making stacks permagray and re-scanning them during STW.
任何写屏障都必须禁止变异器“隐藏”对象;也就是说,重新排列堆图以违反弱三色不变量,使垃圾收集器无法标记可达对象。例如,在某种意义上,Dijkstra 屏障允许变异器通过将指向它的唯一指针移动到已经扫描过的堆栈中来隐藏一个白色对象。Dijkstra 屏障通过使堆栈永久变灰并在 STW 期间重新扫描它们来解决这个问题。

In the hybrid barrier, the two shades and the condition work together to prevent a mutator from hiding an object:
在混合屏障中,两个阴影和条件一起工作,以防止变异器隐藏对象

  1. shade(*slot) prevents a mutator from hiding an object by moving the sole pointer to it from the heap to its stack. If it attempts to unlink an object from the heap, this will shade it.
    shade(*slot) 通过将指向堆上对象的唯一指针从堆移动到其堆栈,防止了变异器隐藏对象。如果尝试从堆中取消链接对象,则会对其进行阴影处理。

  2. shade(ptr) prevents a mutator from hiding an object by moving the sole pointer to it from its stack into a black object in the heap. If it attempts to install the pointer into a black object, this will shade it.
    shade(ptr) 防止变异器通过将指向对象的唯一指针从其堆栈移动到堆中的黑色对象来隐藏对象。如果尝试将指针安装到黑色对象中,这将对其进行阴影处理。

  3. Once a goroutine's stack is black, the shade(ptr) becomes unnecessary. shade(ptr) prevents hiding an object by moving it from the stack to the heap, but this requires first having a pointer hidden on the stack. Immediately after a stack is scanned, it only points to shaded objects, so it's not hiding anything, and the shade(*slot) prevents it from hiding any other pointers on its stack.
    一旦 goroutine 的堆栈变为黑色, shade(ptr) 就变得不必要了。 shade(ptr) 防止通过将对象从堆栈移动到堆中来隐藏对象,但这需要首先在堆栈上隐藏一个指针。在扫描堆栈后不久,它只指向阴影对象,因此它不会隐藏任何东西, shade(*slot) 防止它隐藏其堆栈上的任何其他指针。

The hybrid barrier combines the best of the Dijkstra barrier and the Yuasa barrier. The Yuasa barrier requires a STW at the beginning of marking to either scan or snapshot stacks, but does not require a re-scan at the end of marking. The Dijkstra barrier lets concurrent marking start right away, but requires a STW at the end of marking to re-scan stacks (though more sophisticated non-STW approaches are possible [Hudson '97]). The hybrid barrier inherits the best properties of both, allowing stacks to be concurrently scanned at the beginning of the mark phase, while also keeping stacks black after this initial scan.
混合屏障结合了 Dijkstra 屏障和 Yuasa 屏障的优点。Yuasa 屏障要求在标记开始时进行 STW 以扫描或快照堆栈,但在标记结束时不需要重新扫描。Dijkstra 屏障允许并发标记立即开始,但需要在标记结束时进行 STW 以重新扫描堆栈(尽管可能存在更复杂的非 STW 方法[Hudson '97])。混合屏障继承了两者的最佳特性,允许在标记阶段开始时并发扫描堆栈,同时在此初始扫描后保持堆栈为黑色。

Rationale 原理

The advantage of the hybrid barrier is that it lets a stack scan permanently blacken a stack (without a STW and without write barriers to the stack), which entirely eliminates the need for stack re-scanning, in turn eliminating the need for stack barriers and re-scan lists. Stack barriers in particular introduce significant complexity throughout the runtime, as well as interfering with stack walks from external tools such as GDB and kernel-based profilers.
混合屏障的优势在于它允许堆栈扫描永久地使堆栈变黑(无需 STW 和无需对堆栈进行写屏障),从而完全消除了对堆栈重新扫描的需要,进而消除了对堆栈屏障和重新扫描列表的需要。特别是堆栈屏障在整个运行时引入了显着的复杂性,并干扰了来自外部工具(如 GDB 和基于内核的分析器)的堆栈遍历。

Also, like the Dijkstra-style write barrier, the hybrid barrier does not require a read barrier, so pointer reads are regular memory reads; and it ensures progress, since objects progress monotonically from white to grey to black.
与 Dijkstra 风格的写屏障一样,混合屏障也不需要读屏障,因此指针读取是常规内存读取;并且它确保了进展,因为对象从白色到灰色再到黑色的进展是单调的。

The disadvantages of the hybrid barrier are minor. It may result in more floating garbage, since it retains everything reachable from roots (other than stacks) at any point during the mark phase. However, in practice it's likely that the current Dijkstra barrier is retaining nearly as much. The hybrid barrier also prohibits certain optimizations: in particular, the Go compiler currently omits a write barrier if it can statically show that the pointer is nil, but the hybrid barrier requires a write barrier in this case. This may slightly increase binary size.
混合屏障的缺点很小。它可能会导致更多的浮动垃圾,因为它保留了在标记阶段的任何时刻从根(除了堆栈)可达的所有内容。然而,在实践中,当前的迪杰斯特拉屏障很可能保留了几乎相同数量的内容。混合屏障还禁止了某些优化:特别是,Go 编译器当前会在可以静态显示指针为 nil 时省略写屏障,但在这种情况下,混合屏障需要写屏障。这可能会略微增加二进制文件的大小。

Alternative barrier approaches
替代屏障方法

There are several variations on the proposed barrier that would also work, but we believe the proposed barrier represents the best set of trade-offs.
提出的障碍有几种变体也可以起作用,但我们认为提出的障碍代表了最好的权衡集合。

A basic variation is to make the Dijkstra-style aspect of the barrier unconditional:
一种基本的变体是使屏障的 Dijkstra 风格方面无条件:

writePointer(slot, ptr):
    shade(*slot)
    shade(ptr)
    *slot = ptr

The main advantage of this barrier is that it's easier to reason about. It directly ensures there are no black-to-white pointers in the heap, so the only source of black-to-white pointers can be scanned stacks. But once a stack is scanned, the only way it can get a white pointer is by traversing reachable objects, and any white object that can be reached by a goroutine with a black stack is grey-protected by a heap object.
这个屏障的主要优势在于更容易推理。它直接确保堆中没有黑到白的指针,因此黑到白指针的唯一来源可以是扫描堆栈。但是一旦扫描了堆栈,它可以获得白指针的唯一方式是通过遍历可达对象,并且可以通过具有黑堆栈的 goroutine 到达的任何白对象都受到堆对象的灰保护。

The disadvantage of this barrier is that it's effectively twice as expensive as the proposed barrier for most of the mark phase.
这个屏障的缺点是,在大部分标记阶段,它的成本实际上是建议屏障的两倍。

Similarly, we could simply coarsen the stack condition:
同样,我们可以简单地粗化堆栈条件:

writePointer(slot, ptr):
    shade(*slot)
    if any stack is grey:
        shade(ptr)
    *slot = ptr

This has the advantage of making cross-stack writes such as those allowed by channels safe without any special handling, but prolongs when the second shade is enabled, which slows down pointer writes.
这样做的好处是使得跨堆栈写入(例如通过通道允许的写入)变得安全,无需任何特殊处理,但在启用第二个阴影时会延长时间,从而减慢指针写入速度。

A different approach would be to require that all stacks be blackened before any heap objects are blackened, which would enable a pure Yuasa-style deletion barrier:
另一种方法是要求在黑化任何堆对象之前将所有堆栈都黑化,这将使得纯粹的兪浅式删除屏障成为可能:

writePointer(slot, ptr):
    shade(*slot)
    *slot = ptr

As originally proposed, the Yuasa barrier takes a complete snapshot of the stack before proceeding with marking. Yuasa argued that this was reasonable on hardware that could perform bulk memory copies very quickly. However, Yuasa's proposal was in the context of a single-threaded system with a comparatively small stack, while Go programs regularly have thousands of stacks that can total to a large amount of memory.
正如最初提出的那样,Yuasa 屏障在继续标记之前对堆栈进行完整快照。Yuasa 认为,在可以非常快速执行批量内存复制的硬件上,这是合理的。然而,Yuasa 的提议是在单线程系统的背景下提出的,该系统具有相对较小的堆栈,而 Go 程序通常具有成千上万个堆栈,这些堆栈的总内存量可能很大。

However, this complete snapshot isn't necessary. It's sufficient to ensure all stacks are black before scanning any heap objects. This allows stack scanning to proceed concurrently, but has the downside that it introduces a bottleneck to the parallelism of the mark phase between stack scanning and heap scanning. This bottleneck has downstream effects on goroutine availability, since allocation is paced against marking progress.
然而,这种完整的快照并非必需。在扫描任何堆对象之前,确保所有堆栈都是黑色就足够了。这样可以使堆栈扫描能够并发进行,但缺点是在堆栈扫描和堆扫描之间的标记阶段的并行性引入了瓶颈。这种瓶颈会对 goroutine 的可用性产生下游影响,因为分配是根据标记进度进行的。

Finally, there are other types of black mutator barrier techniques. However, as shown by Pirinen, all possible black mutator barriers other than the Yuasa barrier require a read barrier [Pirinen '98]. Given the relative frequency of pointer reads to writes, we consider this unacceptable for application performance.
最后,还有其他类型的黑色变异障碍技术。然而,正如 Pirinen 所展示的,除了 Yuasa 障碍之外的所有可能的黑色变异障碍都需要读取障碍[Pirinen '98]。鉴于指针读取与写入的相对频率,我们认为这对应用性能是不可接受的。

Alternative approaches to re-scanning
重新扫描的替代方法

Going further afield, it's also possible to make stack re-scanning concurrent without eliminating it [Hudson '97]. This does not require changes to the write barrier, but does introduce significant additional complexity into stack re-scanning. Proposal #17505 gives a detailed design for how to do concurrent stack re-scanning in Go.
进一步地,也可以使堆栈重新扫描并发,而无需消除它[Hudson '97]。这不需要对写屏障进行更改,但会在堆栈重新扫描中引入显着的额外复杂性。提案#17505 详细介绍了如何在 Go 中进行并发堆栈重新扫描的设计。

Other considerations 其他考虑事项

Channel operations and go statements
通道操作和 go 语句

The hybrid barrier assumes a goroutine cannot write to another goroutine's stack. This is true in Go except for two operations: channel sends and starting goroutines, which can copy values directly from one stack to another. For channel operations, the shade(ptr) is necessary if either the source stack or the destination stack is grey. For starting a goroutine, the destination stack is always black, so the shade(ptr) is necessary if the source stack is grey.
混合屏障假设一个 goroutine 不能写入另一个 goroutine 的堆栈。在 Go 中,这是正确的,除了两个操作:通道发送和启动 goroutine,它们可以直接从一个堆栈复制值到另一个堆栈。对于通道操作,如果源堆栈或目标堆栈是灰色,则需要 shade(ptr) 。对于启动 goroutine,目标堆栈始终是黑色,因此如果源堆栈是灰色,则需要 shade(ptr)

Racy programs 淫秽程序

In a racy program, two goroutines may store to the same pointer simultaneously and invoke the write barrier concurrently on the same slot. The hazard is that this may cause the barrier to fail to shade some object that it would have shaded in a sequential execution, particularly given a relaxed memory model. While racy Go programs are generally undefined, we have so far maintained that a racy program cannot trivially defeat the soundness of the garbage collector (since a racy program can defeat the type system, it can technically do anything, but we try to keep the garbage collector working as long as the program stays within the type system).
在一个充满风险的程序中,两个 goroutine 可能同时存储到同一个指针,并在同一个插槽上并发调用写屏障。危险在于这可能导致屏障未能正确标记某些对象,尤其是在放松的内存模型下,它可能会在顺序执行中标记的对象未被标记。虽然 Go 程序通常是未定义的,但我们迄今为止一直坚持认为,一个充满风险的程序不能轻易破坏垃圾收集器的完整性(因为一个充满风险的程序可以破坏类型系统,从技术上讲,它可以做任何事情,但我们尽量保持垃圾收集器在程序保持在类型系统内时正常工作)。

Suppose optr is the value of the slot before either write to the slot and ptr1 and ptr2 are the two pointers being written to the slot. "Before" is well-defined here because all architectures that Go supports have coherency, which means there is a total order over all reads and writes of a single memory location. If the goroutine's respective stacks have been scanned, then ptr1 and ptr2 will clearly be shaded, since those shades don't read from memory. Hence, the difficult case is if the goroutine's stacks have been scanned. In this case, the barriers reduce to:
假设 optr 是在写入该槽之前的值,ptr1 和 ptr2 是要写入该槽的两个指针。“之前”在这里是明确定义的,因为 Go 支持的所有架构都具有一致性,这意味着对于单个内存位置的所有读取和写入都有一个总顺序。如果 goroutine 的相应堆栈已经被扫描过,那么 ptr1 和 ptr2 显然会被阴影覆盖,因为这些阴影不会从内存中读取。因此,困难的情况是如果 goroutine 的堆栈已经被扫描过。在这种情况下,障碍会减少到:

Goroutine G1Goroutine G2
optr1 = *slot
shade(optr1)
*slot = ptr1
optr2 = *slot
shade(optr2)
*slot = ptr2

Given that we're only dealing with one memory location, the property of coherence means we can reason about this execution as if it were sequentially consistent. Given this, concurrent execution of the write barriers permits one outcome that is not permitted by sequential execution: if both barriers read *slot before assigning to it, then only optr will be shaded, and neither ptr1 nor ptr2 will be shaded by the barrier. For example:
鉴于我们只处理一个内存位置,一致性的属性意味着我们可以将这个执行视为顺序一致。鉴于此,写屏障的并发执行允许一种顺序执行不允许的结果:如果两个屏障在分配给它之前都读取 *slot ,那么只有 optr 会被阴影覆盖,而 ptr1 和 ptr2 都不会被屏障阴影覆盖。例如:

Goroutine G1Goroutine G2
optr1 = *slot
shade(optr1)

*slot = ptr1

optr2 = *slot

shade(optr2)

*slot = ptr2

We assert that this is safe. Suppose ptr1 is written first. This execution is nearly indistinguishable from an execution that simply skips the write of ptr1. The only way to distinguish it is if a read from another goroutine G3 observes slot between the two writes. However, given our assumption that stacks have already been scanned, either ptr1 is already shaded, or it must be reachable from some other place in the heap anyway (and will be shaded eventually), so concurrently observing ptr1 doesn't affect the marking or reachability of ptr1.
我们断言这是安全的。假设首先写入 ptr1。这个执行与简单跳过 ptr1 写入的执行几乎无法区分。唯一的区别是,如果另一个 goroutine G3 的读取在两次写入之间观察到 slot。然而,鉴于我们假设栈已经被扫描过,要么 ptr1 已经被标记,要么它必须从堆中的其他地方可达(并最终会被标记),因此同时观察 ptr1 不会影响 ptr1 的标记或可达性。

cgo

The hybrid barrier could be a problem if C code overwrites a Go pointer in Go memory with either nil or a C pointer. Currently, this operation does not require a barrier, but with any sort of deletion barrier, this does require the barrier. However, a program that does this would violate the cgo pointer passing rules, since Go code is not allowed to pass memory to a C function that contains Go pointers. Furthermore, this is one of the "cheap dynamic checks" enabled by the default setting of GODEBUG=cgocheck=1, so any program that violates this rule will panic unless the checks have been explicitly disabled.
混合屏障可能会成为一个问题,如果 C 代码用 nil 或 C 指针在 Go 内存中覆盖 Go 指针。目前,此操作不需要屏障,但任何删除屏障都需要屏障。然而,执行此操作的程序将违反 cgo 指针传递规则,因为不允许 Go 代码将包含 Go 指针的内存传递给 C 函数。此外,这是默认设置 GODEBUG=cgocheck=1 启用的“廉价动态检查”之一,因此除非显式禁用了检查,否则违反此规则的任何程序都将引发 panic。

Future directions 未来方向

Write barrier omission 写屏障省略

The current write barrier can be omitted by the compiler in cases where the compiler knows the pointer being written is permanently shaded, such as nil pointers, pointers to globals, and pointers to static data. These optimizations are generally unsafe with the hybrid barrier. However, if the compiler can show that both the current value of the slot and the value being written are permanently shaded, then it can still safely omit the write barrier. This optimization is aided by the fact that newly allocated objects are zeroed, so all pointer slots start out pointing to nil, which is permanently shaded.
当前的写屏障可以在编译器知道被写入的指针是永久着色的情况下被省略,比如 nil 指针、全局指针和静态数据指针。这些优化通常与混合屏障一起使用是不安全的。然而,如果编译器能够证明槽的当前值和正在被写入的值都是永久着色的,那么它仍然可以安全地省略写屏障。这种优化得益于新分配的对象被清零,因此所有指针槽最初都指向 nil,这是永久着色的。

Low-pause stack scans 低暂停堆栈扫描

Currently the garbage collector pauses a goroutine while scanning its stack. If goroutines have large stacks, this can introduce significant tail latency effects. The hybrid barrier and the removal of the existing stack barrier mechanism would make it feasible to perform stack scans with only brief goroutine pauses.
目前,垃圾收集器在扫描 goroutine 的堆栈时会暂停该 goroutine。如果 goroutine 具有较大的堆栈,这可能会引入显著的尾延迟效应。混合屏障和移除现有堆栈屏障机制将使得可以在仅短暂暂停 goroutine 的情况下执行堆栈扫描成为可能。

In this design, scanning a stack pauses the goroutine briefly while it scans the active frame. It then installs a blocking stack barrier over the return to the next frame and lets the goroutine resume. Stack scanning then continues toward the outer frames, moving the stack barrier up the stack as it goes. If the goroutine does return as far as the stack barrier, before it can return to an unscanned frame, the stack barrier blocks until scanning can scan that frame and move the barrier further up the stack.
在这种设计中,扫描堆栈会暂停 goroutine 一小段时间,同时扫描活动帧。然后,在返回到下一个帧时安装一个阻塞堆栈屏障,让 goroutine 恢复。然后,堆栈扫描继续向外部帧移动,随着扫描的进行,堆栈屏障也随之上移。如果 goroutine 返回到堆栈屏障,需要返回到一个未扫描的帧之前,堆栈屏障会阻塞,直到扫描可以扫描该帧并将屏障进一步移动到堆栈上。

One complication is that a running goroutine could attempt to grow its stack during the stack scan. The simplest solution is to block the goroutine if this happens until the scan is done.
一个复杂性在于正在运行的 goroutine 可能会在堆栈扫描期间尝试增长其堆栈。最简单的解决方案是,如果发生这种情况,阻塞 goroutine 直到扫描完成。

Like the current stack barriers, this depends on write barriers when writing through pointers to other frames. For example, in a partially scanned stack, an active frame could use an up-pointer to move a pointer to a white object out of an unscanned frame and into the active frame. Without a write barrier on the write that removes the pointer from the unscanned frame, this could hide the white object from the garbage collector.
与当前的堆栈屏障一样,这取决于在通过指向其他帧的指针进行写入时的写入屏障。例如,在部分扫描的堆栈中,活动帧可以使用上指针将指向白色对象的指针从未扫描的帧移出并移入活动帧。如果在从未扫描的帧中移除指针的写入操作中没有写入屏障,这可能会将白色对象隐藏在垃圾回收器之外。

However, with write barriers on up-pointers, this is safe. Rather than arguing about "partially black" stacks, the write barrier on up-pointers lets us view the stack as a sequence of separate frames, where unscanned frames are treated as part of the heap. Writes without write barriers can only happen to the active frame, so we only have to view the active frame as the stack.
然而,通过对上游指针使用写屏障,这是安全的。与其争论“部分黑色”堆栈,对上游指针使用写屏障让我们将堆栈视为一系列单独的帧,未扫描的帧被视为堆的一部分。没有写屏障的写入只能发生在活动帧上,因此我们只需要将活动帧视为堆栈。

This design is technically possible now, but the complexity of composing it with the existing stack barrier mechanism makes it unappealing. With the existing stack barriers gone, the implementation of this approach becomes relatively straightforward. It's also generally simpler than the existing stack barriers in many dimensions, since there are at most two stack barriers per goroutine at a time, and they are present only during the stack scan.
这种设计在技术上现在是可能的,但是与现有的堆栈屏障机制组合的复杂性使其不太吸引人。随着现有的堆栈屏障消失,这种方法的实现变得相对简单。在许多方面,它通常比现有的堆栈屏障更简单,因为每个 goroutine 最多只有两个堆栈屏障,并且它们仅在堆栈扫描期间存在。

Strictly bounded mark termination
严格限定的标记终止

This proposal goes a long way toward strictly bounding the time spent in STW mark termination, but there are some other known causes of longer mark termination pauses. The primary cause is a race that can trigger mark termination while there is still remaining heap mark work. This race and how to resolve it are detailed in the "Mark completion race" appendix.
这项提案在严格限制 STW 标记终止所花费的时间方面取得了长足进展,但还有一些其他已知导致标记终止暂停时间较长的原因。主要原因是一种竞争条件,可能在仍有剩余堆标记工作时触发标记终止。这种竞争条件以及如何解决它在“标记完成竞争”附录中有详细说明。

Concurrent mark termination
并发标记终止

With stack re-scanning out of mark termination, it may become practical to make the remaining tasks in mark termination concurrent and eliminate the mark termination STW entirely. On the other hand, the hybrid barrier may reduce STW so much that completely eliminating it is not a practical concern.
通过堆栈重新扫描来终止标记,可能会使标记终止中剩余的任务变得实际可行,并完全消除标记终止的 STW。另一方面,混合屏障可能会减少 STW,以至于完全消除它不是一个实际关注的问题。

The following is a probably incomplete list of remaining mark termination tasks and how to address them. Worker stack scans can be eliminated by having workers self-scan during mark. Scanning the finalizer queue can be eliminated by adding explicit barriers to queuefinalizer. Without these two scans (and with the fix for the mark completion race detailed in the appendix), mark termination will produce no mark work, so finishing the work queue drain also becomes unnecessary. mcaches can be flushed lazily at the beginning of the sweep phase using rolling synchronization. Flushing the heap profile can be done immediately at the beginning of sweep (this is already concurrent-safe, in fact). Finally, updating global statistics can be done using atomics and possibly a global memory barrier.
以下是可能不完整的剩余标记终止任务列表以及如何解决它们。通过在标记期间让工作线程自行扫描,可以消除工作堆栈扫描。通过向 queuefinalizer 添加显式屏障可以消除扫描终结器队列。没有这两个扫描(以及附录中详细介绍的标记完成竞争修复),标记终止将不会产生标记工作,因此完成工作队列排空也变得不必要。可以在扫描阶段开始时使用滚动同步来延迟刷新 mcache 。可以立即在扫描开始时刷新堆剖面(实际上已经是并发安全的)。最后,可以使用原子操作和可能的全局内存屏障来更新全局统计信息。

Concurrent sweep termination
并发扫描终止

Likewise, it may be practical to eliminate the STW for sweep termination. This is slightly complicated by the fact that the hybrid barrier requires a global memory fence at the beginning of the mark phase to enable the write barrier and ensure all pointer writes prior to enabling the write barrier are visible to the write barrier. Currently, the STW for sweep termination and setting up the mark phase accomplishes this. If we were to make sweep termination concurrent, we could instead use a ragged barrier to accomplish the global memory fence, or the membarrier syscall on recent Linux kernels.
同样,消除 STW 以进行扫描终止可能是实际的。这在一定程度上变得稍微复杂,因为混合屏障需要在标记阶段开始时进行全局内存屏障,以启用写屏障并确保在启用写屏障之前的所有指针写入对写屏障可见。目前,STW 用于扫描终止和设置标记阶段来实现这一点。如果我们将扫描终止并发化,我们可以改用不规则屏障来实现全局内存屏障,或者在最近的 Linux 内核上使用 membarrier 系统调用。

Compatibility 兼容性

This proposal does not affect the language or any APIs and hence satisfies the Go 1 compatibility guidelines.
该提案不会影响语言或任何 API,因此符合 Go 1 兼容性准则。

Implementation 实现

Austin plans to implement the hybrid barrier during the Go 1.8 development cycle. For 1.8, we will leave stack re-scanning support in the runtime for debugging purposes, but disable it by default using a GODEBUG variable. Assuming things go smoothly, we will remove stack re-scanning support when the tree opens for Go 1.9 development.
Austin 计划在 Go 1.8 开发周期中实现混合屏障。对于 1.8 版本,我们将在运行时保留堆栈重新扫描支持以用于调试目的,但默认情况下将其禁用,使用 GODEBUG 变量。假设一切顺利,当 Go 1.9 开发树开放时,我们将删除堆栈重新扫描支持。

The planned implementation approach is:
计划中的实施方法是:

  1. Fix places that do unusual or "clever" things with memory containing pointers and make sure they cooperate with the hybrid barrier. We'll presumably find more of these as we debug in later steps, but we'll have to make at least the following changes:
    修复那些对包含指针的内存做出异常或“聪明”操作的地方,并确保它们与混合屏障协作。随着我们在后续步骤中进行调试,可能会发现更多这样的情况,但至少需要进行以下更改:

    1. Ensure barriers on stack-to-stack copies for channel sends and starting goroutines.
      确保通道发送和启动 goroutine 的堆栈到堆栈拷贝上的屏障。

    2. Check all places where we clear memory since the hybrid barrier requires distinguishing between clearing for initialization and clearing already-initialized memory. This will require a barrier-aware memclr and disabling the duffzero optimization for pointers with types.
      检查我们清除内存的所有地方,因为混合屏障需要区分清除初始化和清除已初始化内存。这将需要一个了解屏障的 memclr 和禁用具有类型指针的 duffzero 优化。

    3. Check all uses of unmanaged memory in the runtime to make sure it is initialized properly. This is particularly important for pools of unmanaged memory such as the fixalloc allocator that may reuse memory.
      检查运行时中所有对非托管内存的使用,确保其被正确初始化。这对于非托管内存池(如 fixalloc 分配器)等可能重用内存的情况尤为重要。

  2. Implement concurrent scanning of background mark worker stacks. Currently these are placed on the rescan list and only scanned during mark termination, but we're going to disable the rescan list. We could arrange for background mark workers to scan their own stacks, or explicitly keep track of heap pointers on background mark worker stacks.
    实现后台标记工作线程堆栈的并发扫描。目前,这些堆栈被放置在重新扫描列表上,并且仅在标记终止期间扫描,但我们将禁用重新扫描列表。我们可以安排后台标记工作线程扫描它们自己的堆栈,或者明确地在后台标记工作线程堆栈上跟踪堆指针。

  3. Modify the write barrier to implement the hybrid write barrier and the compiler to disable write barrier elision optimizations that aren't valid for the hybrid barrier.
    修改写屏障以实现混合写屏障,并修改编译器以禁用对混合屏障无效的写屏障省略优化。

  4. Disable stack re-scanning by making rescan enqueuing a no-op unless a GODEBUG variable is set. Likewise, disable stack barrier insertion unless this variable is set.
    通过使重新扫描排队成为无操作,除非设置了 GODEBUG 变量,来禁用堆栈重新扫描。同样,除非设置了此变量,否则禁用堆栈屏障插入。

  5. Use checkmark mode and stress testing to verify that no objects are missed.
    使用勾选模式和压力测试来验证没有遗漏任何对象。

  6. Wait for the Go 1.9 development cycle.
    等待 Go 1.9 开发周期。

  7. Remove stack re-scanning, the rescan list, stack barriers, and the GODEBUG variable to enable re-scanning. Possibly, switch to low-pause stack scans, which can reuse some of the stack barrier mechanism.
    删除堆栈重新扫描、重新扫描列表、堆栈屏障和 GODEBUG 变量以启用重新扫描。可能切换到低暂停堆栈扫描,可以重用一些堆栈屏障机制。

Appendix: Mark completion race
附录:标记完成比赛

Currently, because of a race in the mark completion condition, the garbage collector can begin mark termination when there is still available mark work. This is safe because mark termination will finish draining this work, but it makes mark termination unbounded. This also interferes with possible further optimizations that remove all mark work from mark termination.
目前,由于在标记完成条件中存在竞争,垃圾收集器可能会在仍有可用的标记工作时开始标记终止。这是安全的,因为标记终止将完成排空此工作,但这使得标记终止无限制。这也会干扰可能进一步优化的可能性,从标记终止中删除所有标记工作。

Specifically, the following interleaving starts mark termination without draining all mark work:
具体来说,以下交错开始标记终止而不耗尽所有标记工作:

Initially workers is zero and there is one buffer on the full list.
最初 workers 为零,在完整列表上有一个缓冲区。

Thread 1 线程 1Thread 2 线程 2
inc(&workers)
gcDrain(&work)
=> acquires only full buffer
=> adds more pointer to work
work.dispose()
=> returns buffer to full list
n := dec(&workers) [n=0]
if n == 0 && [true]
full.empty() && [true]
markrootNext >= markrootJobs { [true]
    startMarkTerm()

}

inc(&workers) gcDrain(&work) => acquires only full buffer

=> adds more pointers to work ...

In this example, a race between observing the workers count and observing the state of the full list causes thread 1 to start mark termination prematurely. Simply checking full.empty() before decrementing workers exhibits a similar race.
在这个示例中,观察 workers 计数和观察完整列表状态之间的竞争导致线程 1 过早启动标记终止。在递减 workers 之前简单地检查 full.empty() 会展示类似的竞争。

To fix this race, we propose introducing a single atomic non-zero indicator for the number of non-empty work buffers. Specifically, this will count the number of work caches that are caching a non-empty work buffer plus one for a non-empty full list. Many buffer list operations can be done without modifying this count, so we believe it will not be highly contended. If this does prove to be a scalability issue, there are well-known techniques for scalable non-zero indicators [Ellen '07].
为了解决这个竞争问题,我们建议引入一个单一的原子非零指示器,用于表示非空工作缓冲区的数量。具体来说,这将计算缓存非空工作缓冲区的工作缓存数量,再加上一个非空完整列表。许多缓冲区列表操作可以在不修改此计数的情况下完成,因此我们认为这不会有很高的争用。如果这被证明是一个可扩展性问题,那么已经有为可扩展的非零指示器而知名的技术[Ellen '07]。

Appendix: Proof of soundness, completeness, and boundedness
附录:声音性、完备性和有界性的证明

This section argues that the hybrid write barrier satisfies the weak tricolor invariant, and hence is sound in the sense that it does not collect reachable objects; that it terminates in a bounded number of steps; and that it eventually collects all unreachable objects, and hence is complete. We have also further verified these properties using a randomized stateless model.
本节论证了混合写屏障满足弱三色不变量,因此在不收集可达对象的意义上是正确的;它在有限步数内终止;最终收集所有不可达对象,因此是完整的。我们还使用随机无状态模型进一步验证了这些属性。

The following proofs consider global objects to be a subset of the heap objects. This is valid because the write barrier applies equally to global objects. Similarly, we omit explicit discussion of nil pointers, since the nil pointer can be considered an always-black heap object of zero size.
以下证明将全局对象视为堆对象的子集。这是有效的,因为写屏障同样适用于全局对象。同样,我们省略了对 nil 指针的明确讨论,因为 nil 指针可以被视为一个始终为黑色且大小为零的堆对象。

The hybrid write barrier satisfies the weak tricolor invariant [Pirinen '98]. However, rather than directly proving this, we prove that it satisfies the following modified tricolor invariant:
混合写屏障满足弱三色不变性[Pirinen '98]。然而,我们并非直接证明这一点,而是证明它满足以下修改后的三色不变性:

Any white object pointed to by a black object is grey-protected by a heap object (reachable via a chain of white pointers from the grey heap object). That is, for every B -> W edge, there is a path G -> W₁ -> ⋯ -> Wₙ -> W where G is a heap object.
由黑对象指向的任何白对象都受堆对象的灰色保护(通过从灰堆对象的白指针链到达)。也就是说,对于每个 B -> W 边,存在一条路径 G -> W₁ -> ⋯ -> Wₙ -> W,其中 G 是堆对象。

This is identical to the weak tricolor invariant, except that it requires that the grey-protector is a heap object. This trivially implies the weak tricolor invariant, but gives us a stronger basis for induction in the proof.
这与弱三色不变性相同,唯一的区别是它要求灰色保护者是一个堆对象。这显然意味着弱三色不变性,但在证明中为我们提供了更强的归纳基础。

Lemma 1 establishes a simple property of paths we'll use several times.
引理 1 建立了路径的一个简单属性,我们将多次使用。

Lemma 1. In a path O₁ -> ⋯ -> Oₙ where O₁ is a heap object, all Oᵢ must be heap objects.
引理 1. 在路径 O₁ -> ⋯ -> Oₙ 中,其中 O₁ 是一个堆对象,所有 Oᵢ 必须是堆对象。

Proof. Since O₁ is a heap object and heap objects can only point to other heap objects, by induction, all Oᵢ must be heap objects. ∎
证明。由于 O₁ 是一个堆对象,堆对象只能指向其他堆对象,通过归纳,所有的 Oᵢ 必须是堆对象。 ∎

In particular, if some object is grey-protected by a heap object, every object in the grey-protecting path must be a heap object.
特别是,如果某个对象由堆对象保护,则灰色保护路径中的每个对象都必须是堆对象。

Lemma 2 extends the modified tricolor invariant to white objects that are indirectly reachable from black objects.
引理 2 将修改后的三色不变量扩展到从黑色对象间接可达的白色对象。

Lemma 2. If the object graph satisfies the modified tricolor invariant, then every white object reachable (directly or indirectly) from a black object is grey-protected by a heap object.
引理 2。如果对象图满足修改后的三色不变性,则从黑色对象直接或间接可达的每个白色对象都受堆对象的灰色保护。

Proof. Let W be a white object reachable from black object B via simple path B -> O₁ -> ⋯ -> Oₙ -> W. Note that W and all Oᵢ and must be heap objects because stacks can only point to themselves (in which case it would not be a simple path) or heap objects, so O₁ must be a heap object, and by lemma 1, the rest of the path must be heap objects. Without loss of generality, we can assume none of Oᵢ are black; otherwise, we can simply reconsider using the shortest path suffix that starts with a black object.
证明。设 W 是从黑对象 B 通过简单路径 B -> O₁ -> ⋯ -> Oₙ -> W 可达的白对象。注意 W 和所有 Oᵢ必须是堆对象,因为栈只能指向自身(在这种情况下不会是简单路径)或堆对象,所以 O₁必须是堆对象,根据引理 1,路径的其余部分必须是堆对象。不失一般性,我们可以假设 Oᵢ中没有一个是黑色;否则,我们可以简单地重新考虑使用以黑对象开头的最短路径后缀。

If there are no Oᵢ, B points directly to W and the modified tricolor invariant directly implies that W is grey-protected by a heap object.
如果没有 Oᵢ,则 B 直接指向 W,并且修改后的三色不变性直接暗示 W 受到堆对象的灰色保护。

If any Oᵢ is grey, then W is grey-protected by the last grey object in the path.
如果任何 Oᵢ 是灰色的,则 W 受到路径中最后一个灰色对象的保护。

Otherwise, all Oᵢ are white. Since O₁ is a white object pointed to by a black object, O₁ is grey-protected by some path G -> W₁ -> ⋯ -> Wₙ -> O₁ where G is a heap object. Thus, W is grey-protected by G -> W₁ -> ⋯ -> Wₙ -> O₁ -> ⋯ -> Oₙ -> W. ∎
否则,所有的 Oᵢ 都是白色的。由于 O₁ 是黑色对象指向的白色对象,O₁ 是灰色的,受到路径 G -> W₁ -> ⋯ -> Wₙ -> O₁ 的保护,其中 G 是堆对象。因此,W 受到 G -> W₁ -> ⋯ -> Wₙ -> O₁ -> ⋯ -> Oₙ -> W 的保护。 ∎

Lemma 3 builds on lemma 2 to establish properties of objects reachable by goroutines.
引理 3 基于引理 2 建立了 goroutine 可达对象的属性。

Lemma 3. If the object graph satisfies the modified tricolor invariant, then every white object reachable by a black goroutine (a goroutine whose stack has been scanned) is grey-protected by a heap object.
引理 3。如果对象图满足修改后的三色不变性,则每个被黑色 goroutine(其堆栈已被扫描)访问的白色对象都受堆对象保护。

Proof. Let W be a white object reachable by a black goroutine. If W is reachable from the goroutine's stack, then by lemma 2 W is grey-protected by a heap object. Otherwise, W must be reachable from a global X. Let O be the last non-white object in the path from X to W (O must exist because X itself is either grey or black). If O is grey, then O is a heap object that grey-protects W. Otherwise, O is black and by lemma 2, W is grey-protected by some heap object. ∎
证明。设 W 是黑色 goroutine 可访问的白色对象。如果 W 是从 goroutine 的堆栈可访问的,则根据引理 2,W 由堆对象灰色保护。否则,W 必须是从全局 X 可访问的。设 O 是从 X 到 W 路径中最后一个非白色对象(O 必须存在,因为 X 本身要么是灰色要么是黑色)。如果 O 是灰色,则 O 是一个灰色保护 W 的堆对象。否则,O 是黑色的,根据引理 2,W 由某个堆对象灰色保护。 ∎

Now we're ready to prove that the hybrid write barrier satisfies the weak tricolor invariant, which implies it is sound (it marks all reachable objects).
现在我们准备证明混合写屏障满足弱三色不变量,这意味着它是正确的(它标记所有可达对象)。

Theorem 1. The hybrid write barrier satisfies the weak tricolor invariant.
定理 1. 混合写屏障满足弱三色不变性。

Proof. We first show that the hybrid write barrier satisfies the modified tricolor invariant. The proof follows by induction over the operations that affect the object graph or its coloring.
证明。我们首先展示混合写屏障满足修改后的三色不变性。证明通过对影响对象图或其着色的操作进行归纳得出。

Base case. Initially there are no black objects, so the invariant holds trivially.
基本情况。最初没有黑色物体,因此不变量显然成立。

Write pointer in the heap. Let obj.slot := ptr denote the write, where obj is in the heap, and let optr denote the value of obj.slot prior to the write.
在堆中写指针。让 obj.slot := ptr 表示写操作,其中 obj 在堆中,并且让 optr 表示写操作之前 obj.slot 的值。

Let W be a white object pointed to by a black object B after the heap write. There are two cases:
让 W 成为堆写操作后由黑对象 B 指向的白对象。有两种情况:

  1. B ≠ obj: W was pointed to by them same black object B before the write, and, by assumption, W was grey-protected by a path G -> W₁ -> ⋯ -> Wₙ -> W, where G is a heap object. If none of these edges are obj.slot, then W is still protected by G. Otherwise, the path must have included the edge obj -> optr and, since the write barrier shades optr, W is grey-protected by optr after the heap write.
    B ≠ obj:在写操作之前,W 被它们指向的相同黑色对象 B 指向,并且根据假设,W 通过路径 G -> W₁ -> ⋯ -> Wₙ -> W 受到灰色保护,其中 G 是堆对象。如果这些边中没有一个是 obj.slot,则 W 仍然受到 G 的保护。否则,路径必须包括边缘 obj -> optr,并且由于写屏障遮蔽了 optr,所以在堆写入后,W 受到 optr 的灰色保护。

  2. B = obj: We first establish that W was grey-protected before the write, which breaks down into two cases:
    B = obj:我们首先建立 W 在写入之前是灰保护的,这可以分解为两种情况:

    1. W = ptr: The goroutine must be black, because otherwise the write barrier shades ptr, so it is not white. ptr must have been reachable by the goroutine for it to write it, so by lemma 3, ptr was grey-protected by some heap object G prior to the write.
      W = ptr:goroutine 必须是黑色的,否则写屏障会遮挡 ptr,因此它不是白色的。goroutine 必须能够访问 ptr 才能写入它,因此根据引理 3,写入之前 ptr 必须是某个堆对象 G 保护的灰色。

    2. W ≠ ptr: B pointed to W before the write and, by assumption, W was grey-protected by some heap object G before the write.
      W ≠ ptr:在写入之前,B 指向了 W,并且根据假设,在写入之前,某个堆对象 G 对 W 进行了灰色保护。

    Because obj was black before the write, it could not be in the grey-protecting path from G to W, so this write did not affect this path, so W is still grey-protected by G.
    因为在写入之前,obj 是黑色的,所以它不可能在从 G 到 W 的灰色保护路径中,因此这次写入不会影响这条路径,因此 W 仍然受到 G 的灰色保护。

Write pointer in a stack. Let stk.slot := ptr denote the write.
在堆栈中写指针。让 stk.slot := ptr 表示写操作。

Let W be a white object pointed to by a black object B after the stack write. We first establish that W was grey-protected before the stack write, which breaks down into two cases:
让 W 成为黑对象 B 指向的白对象,在栈写入后。我们首先要确定在栈写入之前,W 是灰色受保护的,这可以分为两种情况:

  1. B = stk and W = ptr: W may not have been pointed to by a black object prior to the stack write (that is, the write may create a new B -> W edge). However, ptr must have been reachable by the goroutine, which is black (because B = stk), so by lemma 3, W was grey-protected by some heap object G prior to the write.
    B = stk and W = ptr: W 可能在堆栈写入之前未被黑色对象指向(即,写入可能创建新的 B -> W 边缘)。但是,ptr 必须是 goroutine 可达的,goroutine 是黑色的(因为 B = stk),因此根据引理 3,W 在写入之前由某个堆对象 G 进行了灰色保护。

  2. Otherwise: W was pointed to by the same black object B prior to the stack write, so, by assumption, W was grey-protected by some heap object G prior to the write.
    否则:在堆栈写入之前,相同的黑色对象 B 指向了 W,因此,根据假设,在写入之前,某个堆对象 G 对 W 进行了灰色保护。

By lemma 1, none of the objects in the grey-protecting path from heap object G to W can be a stack, so the stack write does not modify this path. Hence, W is still grey-protected after the stack write by G.
根据引理 1,从堆对象 G 到 W 的灰保护路径中的任何对象都不能是堆栈,因此堆栈写入不会修改此路径。因此,在堆栈写入 G 后,W 仍然受到灰保护。

Scan heap object. Let obj denote the scanned object. Let W be an object pointed to by a black object B after the scan. B cannot be obj because immediately after the scan, obj does not point to any white objects. Thus, B must have been black and pointed to W before the scan as well, so, by assumption, W was grey-protected by a path G -> W₁ -> ⋯ -> Wₙ -> W, where G is a heap object. If some Wᵢ was an object pointed to by obj, then W is grey-protected by Wᵢ after the scan. Otherwise, W is still grey-protected by G.
扫描堆对象。让 obj 表示被扫描的对象。设 W 是扫描后由黑色对象 B 指向的对象。B 不能是 obj,因为扫描后,obj 不指向任何白色对象。因此,B 在扫描之前必须是黑色并指向 W,所以,根据假设,W 被路径 G -> W₁ -> ⋯ -> Wₙ -> W 保护,其中 G 是堆对象。如果某个 Wᵢ 是 obj 指向的对象,则扫描后 W 被 Wᵢ 保护。否则,W 仍然被 G 保护。

Stack scan. This case is symmetric with scanning a heap object.
堆栈扫描。这种情况与扫描堆对象对称。

Allocate an object. Since new objects are allocated black and point to nothing, the invariant trivially holds across allocation.
分配一个对象。由于新对象是分配的黑色并且指向空值,因此不变式在分配过程中显然保持不变。

Create a stack. This case is symmetric with object allocation because new stacks start out empty and hence are trivially black.
创建一个堆栈。这种情况与对象分配对称,因为新堆栈开始为空,因此是平凡的黑色。

This completes the induction cases and shows that the hybrid write barrier satisfies the modified tricolor invariant. Since the modified tricolor invariant trivially implies the weak tricolor invariant, the hybrid write barrier satisfies the weak tricolor invariant. ∎
这完成了归纳情况,并显示混合写屏障满足修改的三色不变性。由于修改的三色不变性显然蕴含弱三色不变性,混合写屏障满足弱三色不变性。 ∎

The garbage collector is also bounded, meaning it eventually terminates.
垃圾回收器也是有界的,这意味着它最终会终止。

Theorem 2. A garbage collector using the hybrid write barrier terminates in a finite number of marking steps.
定理 2. 使用混合写屏障的垃圾收集器在有限数量的标记步骤中终止。

Proof. We observe that objects progress strictly from white to grey to black and, because new objects (including stacks) are allocated black, the total marking work is bounded by the number of objects at the beginning of garbage collection, which is finite. ∎
证明。我们观察到对象严格地从白色进展到灰色再到黑色,因为新对象(包括堆栈)被分配为黑色,所以总标记工作受到垃圾收集开始时对象数量的限制,这是有限的。 ∎

Finally, the garbage collector is also complete, in the sense that it eventually collects all unreachable objects. This is trivial from the fact that the garbage collector cannot mark any objects that are not reachable when the mark phase starts.
最后,垃圾收集器也是完整的,因为它最终会收集所有不可达对象。这是微不足道的事实,即垃圾收集器在标记阶段开始时无法标记任何不可达对象。

References 参考资料

[Auerbach '07] J. Auerbach, D. F. Bacon, B. Blainey, P. Cheng, M. Dawson, M. Fulton, D. Grove, D. Hart, and M. Stoodley. Design and implementation of a comprehensive real-time java virtual machine. In Proceedings of the 7th ACM & IEEE international conference on Embedded software (EMSOFT '07), 249–258, 2007.
[Auerbach '07] J. Auerbach, D. F. Bacon, B. Blainey, P. Cheng, M. Dawson, M. Fulton, D. Grove, D. Hart, and M. Stoodley. 设计和实现一个全面的实时 Java 虚拟机。在第 7 届 ACM 和 IEEE 国际嵌入式软件会议(EMSOFT '07)论文集中,2007 年,249-258。

[Dijkstra '78] E. W. Dijkstra, L. Lamport, A. J. Martin, C. S. Scholten, and E. F. Steffens. On-the-fly garbage collection: An exercise in cooperation. Communications of the ACM, 21(11), 966–975, 1978.

[Ellen '07] F. Ellen, Y. Lev, V. Luchango, and M. Moir. SNZI: Scalable nonzero indicators. In Proceedings of the 26th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Portland, OR, August 2007.
[Ellen '07] F. Ellen, Y. Lev, V. Luchango, and M. Moir. SNZI: 可扩展的非零指标。在第 26 届 ACM SIGACT-SIGOPS 分布式计算原理研讨会论文集中,2007 年 8 月,俄勒冈州波特兰。

[Hudson '97] R. L. Hudson, R. Morrison, J. E. B. Moss, and D. S. Munro. Garbage collecting the world: One car at a time. In ACM SIGPLAN Notices 32(10):162–175, October 1997.
[Hudson '97] R. L. Hudson, R. Morrison, J. E. B. Moss, 和 D. S. Munro. 垃圾收集世界:一次收集一辆车。在 ACM SIGPLAN Notices 32(10):162–175, 1997 年 10 月。

[Pirinen '98] P. P. Pirinen. Barrier techniques for incremental tracing. In ACM SIGPLAN Notices, 34(3), 20–25, October 1998.
[Pirinen '98] P. P. Pirinen. 用于增量跟踪的屏障技术。在 ACM SIGPLAN 通知中,34(3),20-25,1998 年 10 月。

[Steele '75] G. L. Steele Jr. Multiprocessing compactifying garbage collection. Communications of the ACM, 18(9), 495–508, 1975.
[Steele '75] G. L. Steele Jr. 多处理器紧缩垃圾回收。ACM 通讯,18(9),495–508,1975。

[Yuasa '90] T. Yuasa. Real-time garbage collection on general-purpose machines. Journal of Systems and Software, 11(3):181–198, 1990.
[Yuasa '90] T. Yuasa. 通用计算机上的实时垃圾回收。《系统与软件》杂志,11(3):181–198,1990 年。