
Preshing on Programming

An Introduction to Lock-Free Programming

Lock-free programming is a challenge, not just because of the complexity of the task itself, but because of how difficult it can be to penetrate the subject in the first place.

I was fortunate in that my first introduction to lock-free (also known as lockless) programming was Bruce Dawson’s excellent and comprehensive white paper, Lockless Programming Considerations. And like many, I’ve had the occasion to put Bruce’s advice into practice developing and debugging lock-free code on platforms such as the Xbox 360.

Since then, a lot of good material has been written, ranging from abstract theory and proofs of correctness to practical examples and hardware details. I’ll leave a list of references in the footnotes. At times, the information in one source may appear orthogonal to other sources: For instance, some material assumes sequential consistency, and thus sidesteps the memory ordering issues which typically plague lock-free C/C++ code. The new C++11 atomic library standard throws another wrench into the works, challenging the way many of us express lock-free algorithms.

In this post, I’d like to re-introduce lock-free programming, first by defining it, then by distilling most of the information down to a few key concepts. I’ll show how those concepts relate to one another using flowcharts, then we’ll dip our toes into the details a little bit. At a minimum, any programmer who dives into lock-free programming should already understand how to write correct multithreaded code using mutexes, and other high-level synchronization objects such as semaphores and events.

What Is It?

People often describe lock-free programming as programming without mutexes, which are also referred to as locks. That’s true, but it’s only part of the story. The generally accepted definition, based on academic literature, is a bit more broad. At its essence, lock-free is a property used to describe some code, without saying too much about how that code was actually written.

Basically, if some part of your program satisfies the following conditions, then that part can rightfully be considered lock-free. Conversely, if a given part of your code doesn’t satisfy these conditions, then that part is not lock-free.

In this sense, the lock in lock-free does not refer directly to mutexes, but rather to the possibility of “locking up” the entire application in some way, whether it’s deadlock, livelock – or even due to hypothetical thread scheduling decisions made by your worst enemy. That last point sounds funny, but it’s key. Shared mutexes are ruled out trivially, because as soon as one thread obtains the mutex, your worst enemy could simply never schedule that thread again. Of course, real operating systems don’t work that way – we’re merely defining terms.

Here’s a simple example of an operation which contains no mutexes, but is still not lock-free. Initially, X = 0. As an exercise for the reader, consider how two threads could be scheduled in a way such that neither thread exits the loop.

while (X == 0)
{
    X = 1 - X;
}

Nobody expects a large application to be entirely lock-free. Typically, we identify a specific set of lock-free operations out of the whole codebase. For example, in a lock-free queue, there might be a handful of lock-free operations such as push, pop, perhaps isEmpty, and so on.

Herlihy & Shavit, authors of The Art of Multiprocessor Programming, tend to express such operations as class methods, and offer the following succinct definition of lock-free (see slide 150): “In an infinite execution, infinitely often some method call finishes.” In other words, as long as the program is able to keep calling those lock-free operations, the number of completed calls keeps increasing, no matter what. It is algorithmically impossible for the system to lock up during those operations.

One important consequence of lock-free programming is that if you suspend a single thread, it will never prevent other threads from making progress, as a group, through their own lock-free operations. This hints at the value of lock-free programming when writing interrupt handlers and real-time systems, where certain tasks must complete within a certain time limit, no matter what state the rest of the program is in.

One final clarification: operations that are designed to block do not disqualify the algorithm. For example, a queue’s pop operation may intentionally block when the queue is empty. The remaining codepaths can still be considered lock-free.

Lock-Free Programming Techniques

It turns out that when you attempt to satisfy the non-blocking condition of lock-free programming, a whole family of techniques fall out: atomic operations, memory barriers, avoiding the ABA problem, to name a few. This is where things quickly become diabolical.

So how do these techniques relate to one another? To illustrate, I’ve put together the following flowchart. I’ll elaborate on each one below.

Atomic Read-Modify-Write Operations

Atomic operations are ones which manipulate memory in a way that appears indivisible: No thread can observe the operation half-complete. On modern processors, lots of operations are already atomic. For example, aligned reads and writes of simple types are usually atomic.
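As a minimal C++11 sketch (the names here are illustrative, not from any particular codebase), wrapping the shared value in std::atomic makes the guarantee explicit, rather than relying on a platform’s alignment behavior:

```cpp
#include <atomic>

// A shared value written by one thread and read by others.
// std::atomic<int> guarantees each load and store is indivisible,
// so a reader sees either the old value or the new one -- never a
// half-written ("torn") value. In C++, a plain int shared between
// threads without synchronization would be a data race.
std::atomic<int> g_sharedValue(0);

void writeValue(int v) { g_sharedValue.store(v); }
int  readValue()       { return g_sharedValue.load(); }
```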

Read-modify-write (RMW) operations go a step further, allowing you to perform more complex transactions atomically. They’re especially useful when a lock-free algorithm must support multiple writers, because when multiple threads attempt an RMW on the same address, they’ll effectively line up in a row and execute those operations one-at-a-time. I’ve already touched upon RMW operations in this blog, such as when implementing a lightweight mutex, a recursive mutex and a lightweight logging system.

Examples of RMW operations include _InterlockedIncrement on Win32, OSAtomicAdd32 on iOS, and std::atomic<int>::fetch_add in C++11. Be aware that the C++11 atomic standard does not guarantee that the implementation will be lock-free on every platform, so it’s best to know the capabilities of your platform and toolchain. You can call std::atomic<>::is_lock_free to make sure.
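To make this concrete, here’s a small sketch using std::atomic<int>::fetch_add (the counter and function names are mine, chosen for illustration). Two threads perform 100,000 increments each; because each increment is an atomic RMW, no updates are lost:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> g_count(0);

// Each call performs 100000 atomic increments. fetch_add is an RMW:
// the read, the add, and the write happen as one indivisible step,
// so concurrent threads effectively line up and take turns.
void incrementMany()
{
    for (int i = 0; i < 100000; i++)
        g_count.fetch_add(1);
}

int runCounterDemo()
{
    std::thread t1(incrementMany);
    std::thread t2(incrementMany);
    t1.join();
    t2.join();
    return g_count.load();
}
```

With a plain int and `++`, some increments would typically be lost, since the read-modify-write steps of the two threads could interleave.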

Different CPU families support RMW in different ways. Processors such as PowerPC and ARM expose load-link/store-conditional instructions, which effectively allow you to implement your own RMW primitive at a low level, though this is not often done. The common RMW operations are usually sufficient.

As illustrated by the flowchart, atomic RMWs are a necessary part of lock-free programming even on single-processor systems. Without atomicity, a thread could be interrupted halfway through the transaction, possibly leading to an inconsistent state.

Compare-And-Swap Loops

Perhaps the most often-discussed RMW operation is compare-and-swap (CAS). On Win32, CAS is provided via a family of intrinsics such as _InterlockedCompareExchange. Often, programmers perform compare-and-swap in a loop to repeatedly attempt a transaction. This pattern typically involves copying a shared variable to a local variable, performing some speculative work, and attempting to publish the changes using CAS:

void LockFreeQueue::push(Node* newHead)
{
    for (;;)
    {
        // Copy a shared variable (m_Head) to a local.
        Node* oldHead = m_Head;

        // Do some speculative work, not yet visible to other threads.
        newHead->next = oldHead;

        // Next, attempt to publish our changes to the shared variable.
        // If the shared variable hasn't changed, the CAS succeeds and we return.
        // Otherwise, repeat.
        if (_InterlockedCompareExchange(&m_Head, newHead, oldHead) == oldHead)
            return;
    }
}
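For comparison, the same push operation can be sketched with C++11 atomics instead of the Win32 intrinsic (the Node and m_Head names mirror the example above; treat this as an illustrative sketch, not a complete queue). On failure, compare_exchange_weak stores the freshly observed head back into oldHead, so the loop retries with up-to-date data:

```cpp
#include <atomic>

struct Node { Node* next; };

std::atomic<Node*> m_Head(nullptr);

void push(Node* newHead)
{
    // Copy the shared variable (m_Head) to a local.
    Node* oldHead = m_Head.load();
    do
    {
        // Speculative work, not yet visible to other threads.
        newHead->next = oldHead;
        // Attempt to publish. On failure, compare_exchange_weak
        // writes the current head into oldHead and we try again.
    } while (!m_Head.compare_exchange_weak(oldHead, newHead));
}
```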

Such loops still qualify as lock-free, because if the test fails for one thread, it means it must have succeeded for another – though some architectures offer a weaker variant of CAS where that’s not necessarily true. Whenever implementing a CAS loop, special care must be taken to avoid the ABA problem.

Sequential Consistency

Sequential consistency means that all threads agree on the order in which memory operations occurred, and that order is consistent with the order of operations in the program source code. Under sequential consistency, it’s impossible to experience memory reordering shenanigans like the one I demonstrated in a previous post.

A simple (but obviously impractical) way to achieve sequential consistency is to disable compiler optimizations and force all your threads to run on a single processor. A processor never sees its own memory effects out of order, even when threads are pre-empted and scheduled at arbitrary times.

Some programming languages offer sequential consistency even for optimized code running in a multiprocessor environment. In C++11, you can declare all shared variables as C++11 atomic types with default memory ordering constraints. In Java, you can mark all shared variables as volatile. Here’s the example from my previous post, rewritten in C++11 style:

std::atomic<int> X(0), Y(0);
int r1, r2;

void thread1()
{
    X.store(1);
    r1 = Y.load();
}

void thread2()
{
    Y.store(1);
    r2 = X.load();
}

Because the C++11 atomic types guarantee sequential consistency, the outcome r1 = r2 = 0 is impossible. To achieve this, the compiler outputs additional instructions behind the scenes – typically memory fences and/or RMW operations. Those additional instructions may make the implementation less efficient compared to one where the programmer has dealt with memory ordering directly.

Memory Ordering

As the flowchart suggests, any time you do lock-free programming for multicore (or any symmetric multiprocessor), and your environment does not guarantee sequential consistency, you must consider how to prevent memory reordering.

On today’s architectures, the tools to enforce correct memory ordering generally fall into three categories, which prevent both compiler reordering and processor reordering:

  • A lightweight sync or fence instruction, which I’ll talk about in future posts;
  • A full memory fence instruction, which I’ve demonstrated previously;
  • Memory operations which provide acquire or release semantics.

Acquire semantics prevent memory reordering of operations which follow it in program order, and release semantics prevent memory reordering of operations preceding it. These semantics are particularly suitable in cases when there’s a producer/consumer relationship, where one thread publishes some information and the other reads it. I’ll also talk about this more in a future post.
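Here’s a minimal producer/consumer sketch under those semantics (assuming C++11; the g_payload and g_ready names are illustrative). The payload itself can be a plain variable, as long as the flag that publishes it is written with release ordering and read with acquire ordering:

```cpp
#include <atomic>
#include <thread>

int g_payload = 0;                       // plain data, not atomic
std::atomic<bool> g_ready(false);

// The release store "publishes" g_payload: writes before it cannot
// be reordered past it. The acquire load "receives" it: reads after
// it cannot be reordered before it. Together they guarantee the
// consumer sees the payload once it sees the flag.
void producer()
{
    g_payload = 42;
    g_ready.store(true, std::memory_order_release);
}

int consumer()
{
    while (!g_ready.load(std::memory_order_acquire)) { }  // spin
    return g_payload;                    // guaranteed to be 42
}

int runPublishDemo()
{
    std::thread t(producer);
    int result = consumer();
    t.join();
    return result;
}
```

On x86/64 these orderings typically compile to plain loads and stores; on processors with more relaxed memory models, such as ARM or PowerPC, they emit the appropriate barrier instructions.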

Different Processors Have Different Memory Models

Different CPU families have different habits when it comes to memory reordering. The rules are documented by each CPU vendor and followed strictly by the hardware. For instance, PowerPC and ARM processors can change the order of memory stores relative to the instructions themselves, but normally, the x86/64 family of processors from Intel and AMD do not. We say the former processors have a more relaxed memory model.

There’s a temptation to abstract away such platform-specific details, especially with C++11 offering us a standard way to write portable lock-free code. But currently, I think most lock-free programmers have at least some appreciation of platform differences. If there’s one key difference to remember, it’s that at the x86/64 instruction level, every load from memory comes with acquire semantics, and every store to memory provides release semantics – at least for non-SSE instructions and non-write-combined memory. As a result, it’s been common in the past to write lock-free code which works on x86/64, but fails on other processors.

If you’re interested in the hardware details of how and why processors perform memory reordering, I’d recommend Appendix C of Is Parallel Programming Hard. In any case, keep in mind that memory reordering can also occur due to compiler reordering of instructions.

In this post, I haven’t said much about the practical side of lock-free programming, such as: When do we do it? How much do we really need? I also haven’t mentioned the importance of validating your lock-free algorithms. Nonetheless, I hope for some readers, this introduction has provided a basic familiarity with lock-free concepts, so you can proceed into the additional reading without feeling too bewildered. As usual, if you spot any inaccuracies, let me know in the comments.

[This article was featured in Issue #29 of Hacker Monthly.]

Additional References

Comments (29)

Fantastic article!

The flow charts are very useful. I'm going to save copies.


And the text explanations are top-notch!
Reply

pikachu · 669 weeks ago

Ever heard of Communicating Sequential Processes or the Go programming language?
Reply

extralongpants · 669 weeks ago

Thank you for taking the time to write this article. I found it informative and well written. The flow charts were a good idea. :)
Reply

Wix · 669 weeks ago

Excellent post! It's like a chapter of a well-written book. I can't wait to see future articles, and in the meantime I'm going to read all the past ones.
Reply
2 replies · active 580 weeks ago
Thanks, you guys. Glad people liked this one. I found it tough to write.
Reply

Mairaaj · 580 weeks ago

Most of the time, well-written things are tough to write. Jazak Allah (may God reward you best)
Reply

Ian · 669 weeks ago

A lock-free queue should never have a member called isEmpty(). It's a misleading name, since that condition can be fleeting and not relied on. A better name is wasEmpty(), which is a gentle reminder of this fact.
Reply
After working on the JActor project, this article feels so wrong-headed. Once you have ultra-high performance actors, it becomes clear that the problem all along was the focus on sharing state across threads--which you say is a prerequisite for lock-free programming. We just didn't have a workable alternative.
Reply
4 replies · active 603 weeks ago
Your JActor project uses java.util.concurrent.ConcurrentLinkedQueue. Thus, it too incorporates lock-free programming.
Reply
You are quite correct. There is also one semaphore in the implementation of the thread pool, but then it is important to block idle threads, eh? So applying your own reasoning and definitions, there is no such thing as a lock-free program.  

The real attention should be directed to the application logic. Once you have really fast actors to build on, in my experience, sharing state between threads is not an appropriate focus. Yeah, there are times when it is appropriate, but those are the exceptions.  
Reply 
I wholeheartedly agree that sharing state between threads should not be encouraged! The same advice is given in Bruce Dawson's paper and elsewhere. Thanks for your comments.  
Reply 
I've actually come of late to a better understanding of a significant difference between JActor actors-- http://jactorconsulting.com/product/jactor/ --and Scala actors, and that has to do with the elimination of intermediate or process state.  

In JActor you define a callback (usually an anonymous class) which is invoked on the requesting actor's "thread" when a response message is received. Member variables in this callback are the equivalent to method variables as they are not accessible when processing other incoming requests.  

Contrast this to Scala/Akka actors, which use 1-way messages and which must use shared state for intermediate data, and consequently must delay processing of other requests lest this intermediate data be overwritten.  

With JActor then, actors do not have state that is shared across multiple requests except when that state is updated atomically. And I think that is key.  
Reply 
Nicely written overview of lock-free programming, especially in that it gives a sense of how tricky it really is (especially if cross-platform compatibility is needed.)  

Given how many pitfalls there are with lock-free approaches, would you recommend implementing something with plain old mutexes first, and then benchmarking against that? Lock-free data structures seem like a great avenue for premature optimization.  
Reply 
1 reply · active 603 weeks ago
Yea for sure. I even gave respect to locks (mutexes) in a previous post.  

Where I work, nobody has gone overboard with lock-free programming. We've got some base lock-free systems in place. Aside from that, it usually happens as you suggest. When some part of the game is too slow, we profile it. Once in a while, a locking section shows up as a bottleneck. We first try to reduce data sharing, but when that's not possible, we might apply a lock-free technique. The latter choice usually requires careful testing.  
Reply 

Mica   · 649 weeks ago 

Hi Jeff,  

Thanks for this useful information.  

Are you courageous enough to go a step further and teach us how to program "wait-free" algorithms?  

No one seems to know how to ;-)  

Regards  
Mica  
Reply 

adrien   · 647 weeks ago 

Thank you for this great article.  

I've read several articles on DrDobbs and the like, plus the C++11 docs, but this one is a great, very clear overview of the whole thing, well written, with very nice drawings.  

Keep up the great work  
Reply 

code2live   · 645 weeks ago 

Thanks for the very useful article/links.  

I had a question regarding the flowchart - "Are there producers and consumers only? Yes -> Acquire and release semantics are sufficient" - Where can I find more information about this, especially involving multiple producers/consumers? Maybe it's already there in a particular section of the listed links? It would be great if you have more information.  
Reply 

drewlu   · 643 weeks ago 

Wonderful article especially for beginners!  
Reply 
Hi... Thanks for this useful article. I am doing my FYP on lock-free synchronization. I have studied a lot of articles and papers, but I could not understand the basic concept of how a single resource can be utilized by two processors simultaneously. No one has given pictorial/graphical examples.  
I need information about lock-free linked lists. What is the flow of making them lockless?  
Kindly help me.  
Reply 
1 reply · active 603 weeks ago
Any change should be made to a (deep?) copy of the data structure and, when updating the reference, if it was changed since the copy was made (see java atomic reference or just use a CAS instruction), just repeat the process.  

This is a heavy handed approach, but will work for any type of structure. But there are many structures (java's concurrent linked queue for example) that likely work more efficiently, i.e. without having to copy the entire structure. Alternately, you can use an immutable structure which returns a different reference with each update as the immutable structure will cleverly avoid having to copy the entire structure.  
Reply
I had a question. Intel allows reordering of loads with earlier stores, ie: StoreLoad can be reordered to LoadStore. However for that very reordering there is no fence - ie no StoreLoad Barrier. One must use a full barrier - MFENCE. Did I get that right?
Reply
1 reply · active 577 weeks ago
You got that right, though "mfence" is not the only instruction which can act as a full barrier. Any locked instruction, such as "xchg", "lock or", and others, will also act as a full memory barrier – provided you don’t use SSE instructions or write-combined memory in the neighboring operations you wish to affect. Microsoft C++ generates "xchg" when you use the MemoryBarrier intrinsic, at least in Visual Studio 2008. Mintomic implements mint_thread_fence_seq_cst using "lock or".
Reply
Thanks for the reply! As I read further into the Intel Manuals I realized that "StoreLoads may be reordered for different mem locations" but "StoreLoads are NOT reordered for same mem location". So in-fact one doesn't need a StoreLoad Barrier at all on x86 if they point to same memory location, and I really cant think of an example where a StoreLoad barrier would be needed if they point to different memory locations. At that point there is no option other than a full barrier.
Reply
Are the parameters of _InterlockedCompareExchange misplaced?

I mean, should the following piece of code
if (_InterlockedCompareExchange(&m_Head, newHead, oldHead) == oldHead)
return;

be modified as

if (_InterlockedCompareExchange(&m_Head, oldHead, newHead) == oldHead)
return;

Thanks.
Reply
Very well written. Thank you. Will bookmark and come back when needed.
Reply

Uma Mahesh · 422 weeks ago

Thanks for writing about a complex topic in a very detailed and easy-to-understand manner. Hats off to you. The flow charts are amazing.
Reply
I have bookmarked the site (not only this article) as a research blog for myself. Thanks for sharing these goodies!!!
Reply
There is a set of headers for lock free programming. It is based on a paradigm of nuclear chain reactions. For example Fission reactions fan out one process (upon completion) to many new processes. Fusion fans in multiple threads to one computational thread.
Check it out here : https://github.com/flatmax/nuclear-processing
Reply
