
RabbitMQ vs Kafka Part 1 - Two Different Takes on Messaging


Jack Vanlightly - Messaging Systems

In this part we'll explore what RabbitMQ and Apache Kafka are and their approach to messaging. Each technology has made very different decisions regarding every aspect of their design, each with strengths and weaknesses. We'll not come to any strong conclusions in this part, instead think of this as a primer on the technologies so we can dive deeper in subsequent parts of the series.

RabbitMQ

RabbitMQ is a distributed message queue system. Distributed because it is usually run as a cluster of nodes where queues are spread across the nodes and optionally replicated for fault tolerance and high availability. It natively implements AMQP 0.9.1 and offers other protocols such as AMQP 1.0, STOMP, MQTT and HTTP via plug-ins.

RabbitMQ offers both a classic and a novel take on messaging. Classic in the sense that it is oriented around message queues, and novel in its highly flexible routing capability. It is this routing capability that is its killer feature. Building a fast, scalable, reliable distributed messaging system is an achievement in itself, but the message routing functionality is what makes it truly stand out among the myriad messaging technologies out there.

Exchanges and Queues

The super simplified overview:

  • Publishers send messages to exchanges

  • Exchanges route messages to queues and other exchanges

  • RabbitMQ sends acknowledgements to publishers on message receipt

  • Consumers maintain persistent TCP connections with RabbitMQ and declare which queue(s) they consume

  • RabbitMQ pushes messages to consumers

  • Consumers send acknowledgements of success/failure

  • Messages are removed from queues once consumed successfully

But don’t be fooled into thinking that an exchange is a “thing”. A common misconception about exchanges in RabbitMQ is that they are "things" that you send messages to. In fact they are routing rules. You send a message to a channel process, which uses the routing rules (exchanges) to decide where to send the message on to. I think the decision to use the word exchange was unfortunate; I much prefer "rule" or "routing rule".

Hidden in that list are a huge number of decisions that developers and admins should take to get the delivery guarantees they want, performance characteristics etc, all of which we'll cover in later sections of this series.

Let's take a look at a single publisher, exchange, queue and consumer:

Fig 1 - Single publisher and single consumer

What if you have multiple publishers of the same message? Also what if we have multiple consumers that each want to consume every message?

Fig 2 - Multiple publishers, multiple independent consumers

As you can see, the publishers send their messages to the same exchange, which routes each message to three queues, each of which has a single consumer. With RabbitMQ, queues enable different consumers to consume each message. Contrast that to the diagram below:

Fig 3 - Multiple publishers, one queue with multiple competing consumers

In figure 3 we have three consumers all consuming from a single queue. These are competing consumers, that is they compete to consume the messages of a single queue. One would expect that on average, each consumer would consume one third of the messages of this queue. We use competing consumers to scale our message processing and with RabbitMQ it is very simple, just add or remove consumers on demand. No matter how many competing consumers you have, RabbitMQ will ensure that messages are delivered to only a single consumer.

We can combine figures 2 and 3 to have multiple sets of competing consumers where each set consumes every message.

Fig 4 - Multiple publishers, multiple queues with competing consumers

The arrows between exchanges and queues are called bindings and we'll be taking a much closer look at those in Part 2 of this series.

GUARANTEES

RabbitMQ offers "at most once delivery" and "at least once delivery" but not "exactly once delivery" guarantees. We'll take a deeper look at message delivery guarantees in Part 4 of the series.

Messages are delivered in order of their arrival to the queue (that is the definition of a queue after all). This does not guarantee that message processing completes in that same order when you have competing consumers. This is no fault of RabbitMQ but a fundamental reality of processing an ordered set of messages in parallel. This problem can be resolved by using the Consistent Hashing Exchange as you'll see in the next part on patterns and topologies.

PUSH AND CONSUMER PREFETCH

RabbitMQ pushes messages to consumers in a stream. There is a Pull API but it has terrible performance as each message requires a request/response round-trip (note, I updated this paragraph due to a comment from Shiva Kumar).

Push-based systems can overwhelm consumers if messages arrive at the queue faster than the consumers can process them. To avoid this, each consumer can configure a prefetch limit (also known as a QoS limit). This is the maximum number of unacknowledged messages that a consumer can have at any one time. It acts as a safety cut-off switch for when the consumer starts to fall behind.
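To make the prefetch limit concrete, here is a minimal in-memory sketch. It is a toy model rather than RabbitMQ's implementation or any client library API: the `PrefetchConsumer` and `ToyQueue` classes and their methods are hypothetical names invented for illustration. The queue only pushes a message to a consumer whose count of unacknowledged messages is below its prefetch limit.

```python
from collections import deque


class PrefetchConsumer:
    """Hypothetical consumer that holds at most `prefetch` unacked messages."""

    def __init__(self, name, prefetch):
        self.name = name
        self.prefetch = prefetch
        self.unacked = []

    def has_capacity(self):
        return len(self.unacked) < self.prefetch

    def ack(self):
        # Acknowledge the oldest in-flight message, freeing one prefetch slot.
        return self.unacked.pop(0)


class ToyQueue:
    """Hypothetical queue that pushes messages to consumers with free capacity."""

    def __init__(self):
        self.messages = deque()
        self.consumers = []

    def subscribe(self, consumer):
        self.consumers.append(consumer)
        self.dispatch()

    def publish(self, message):
        self.messages.append(message)
        self.dispatch()

    def dispatch(self):
        # Keep pushing, one message per consumer per pass, until the queue
        # is empty or every consumer has hit its prefetch limit.
        progressed = True
        while self.messages and progressed:
            progressed = False
            for consumer in self.consumers:
                if self.messages and consumer.has_capacity():
                    consumer.unacked.append(self.messages.popleft())
                    progressed = True
```

With a prefetch of 2, a slow consumer never holds more than two messages in flight; the rest wait in the queue until it acknowledges one and `dispatch` runs again.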

Why push and not pull? First of all it is great for low latency. Secondly, ideally when we have competing consumers of a single queue we want to distribute load evenly between them. If each consumer pulls messages then, depending on how many they pull, the distribution of work can get pretty uneven. The more uneven the distribution of messages, the more latency and the greater the loss of message ordering at processing time. For that reason RabbitMQ's Pull API only allows pulling one message at a time, but that seriously impacts performance. These factors make RabbitMQ lean towards a push mechanism. This is one of the scaling limitations of RabbitMQ. It is ameliorated by being able to group acknowledgements together.

Routing

Exchanges are basically routing rules for messages. In order for a message to travel from a publisher channel process to a queue or other exchange, a binding is needed. Different exchanges require different bindings. There are many types of exchange and associated bindings:

  • Fanout. Routes to all queues and exchanges that have a binding to the exchange. The standard pub sub model.

  • Direct. Routes messages based on a Routing Key that the message carries with it, set by the publisher. A routing key is a short string. Direct exchanges route messages to queues/exchanges that have a Binding Key that exactly matches the routing key.

  • Topic. Routes messages based on a routing key, but allows wildcard matching.

  • Header. RabbitMQ allows for custom headers to be added to messages. Header exchanges route messages according to those header values. Each binding includes exact match header values. Multiple values can be added to a binding with ANY or ALL values required to match.

  • Consistent Hashing. This is an exchange that hashes either the routing key or a message header and routes to one queue only. This is useful when you need processing order guarantees with scaled out consumers.
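The matching rules of the simpler exchange types can be sketched in a few lines of plain Python. This is only an illustrative model, not RabbitMQ's implementation: the `Binding` class, the `route` function and the queue names are hypothetical, while the `all`/`any` choice mirrors the `x-match` binding argument that header exchanges use. Topic and Consistent Hashing exchanges are left out of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Binding:
    """Hypothetical binding from an exchange to a queue."""
    queue: str
    binding_key: str = ""
    headers: dict = field(default_factory=dict)
    x_match: str = "all"  # "all" or "any", modelled on the x-match argument


def route(exchange_type, bindings, routing_key="", headers=None):
    """Return the names of the queues a message would be routed to."""
    headers = headers or {}
    matched = []
    for binding in bindings:
        if exchange_type == "fanout":
            # Fanout ignores keys and headers: every bound queue gets a copy.
            ok = True
        elif exchange_type == "direct":
            # Direct requires an exact match of binding key and routing key.
            ok = binding.binding_key == routing_key
        elif exchange_type == "headers":
            # Headers compares exact header values, requiring ANY or ALL.
            hits = [headers.get(k) == v for k, v in binding.headers.items()]
            ok = all(hits) if binding.x_match == "all" else any(hits)
        else:
            raise ValueError(f"unsupported exchange type: {exchange_type}")
        if ok:
            matched.append(binding.queue)
    return matched
```

For example, `route("direct", bindings, "b")` returns only the queues whose binding key is exactly `"b"`, while `route("fanout", bindings)` returns every bound queue.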

Fig 5. Topic exchange example

We'll be looking more closely at routing in Part 2, but above is an example of a Topic exchange. Publishers publish error logs with a Routing Key format of LEVEL.AppName.

  • Queue 1 will receive all messages as it uses the multi-word # wildcard.

  • Queue 2 will receive any log level of the ECommerce.WebUI application. It uses the single-word * wildcard that covers the log level.

  • Queue 3 will see all ERROR level messages from any application. It uses the multi-word # wildcard to cover all applications.
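The wildcard behaviour of the three queues above can be modelled with a small matcher. This is a sketch of AMQP-style topic matching, where `*` stands for exactly one dot-separated word and `#` for zero or more words; it is not RabbitMQ's actual code. The function name `topic_matches` is hypothetical, and the binding keys `#`, `*.ECommerce.WebUI` and `ERROR.#` are assumptions inferred from the descriptions, since the figure itself is not reproduced here.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if routing_key matches binding_key under topic rules."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p: int, w: int) -> bool:
        if p == len(pattern):
            # Pattern exhausted: match only if all words are consumed too.
            return w == len(words)
        if pattern[p] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False
        if pattern[p] == "*" or pattern[p] == words[w]:
            # '*' absorbs exactly one word; a literal must match exactly.
            return match(p + 1, w + 1)
        return False

    return match(0, 0)
```

Under these assumed binding keys, a message published with routing key `ERROR.ECommerce.WebUI` matches all three queues, while `INFO.Billing.Api` only matches the `#` binding of Queue 1.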

With four built-in ways of routing messages, plug-in exchange types such as Consistent Hashing, and the ability for exchanges to route to other exchanges, RabbitMQ offers a powerful and flexible set of messaging patterns. Next we'll touch on dead letter exchanges, ephemeral exchanges and queues, and you'll start seeing the power of RabbitMQ.

Dead Letter Exchanges

We can configure queues to send messages to an exchange under the following conditions:

  • Queue exceeds the configured number of messages.

  • Queue exceeds the configured number of bytes.

  • Message Time To Live (TTL) expired. The publisher can set the lifetime of the message and the queue can also have a message TTL. Whichever is shorter applies.

We create a queue that has a binding to the dead letter exchange and these messages get stored there until action is taken. In a separate post I have described the topology I have implemented where all dead lettered messages go to a central clearing house where the support team can decide what actions to take.

Like with many RabbitMQ functionalities, dead letter exchanges give extra patterns that were not originally considered. We can use message TTLs and dead letter exchanges to implement delay queues and retry queues including exponential backoff. See my previous posts on this.
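As a rough illustration of how TTLs and dead lettering combine into retry queues, the sketch below builds the optional queue arguments for a "wait" queue with no consumers. `x-message-ttl`, `x-dead-letter-exchange`, `x-max-length` and `x-max-length-bytes` are RabbitMQ queue arguments; everything else (the function names, the exchange names and the doubling backoff schedule) is an assumption made for this example and is not necessarily the topology described in the posts mentioned above.

```python
def retry_queue_arguments(attempt: int,
                          base_delay_ms: int = 1000,
                          work_exchange: str = "work.exchange") -> dict:
    """Arguments for a hypothetical wait queue used for the Nth retry.

    A message sits in the wait queue until its TTL expires, then it is
    dead-lettered to `work_exchange`, which routes it back to the real
    work queue for another attempt.
    """
    return {
        # Exponential backoff: 1s, 2s, 4s, 8s, ... (assumed schedule).
        "x-message-ttl": base_delay_ms * (2 ** attempt),
        "x-dead-letter-exchange": work_exchange,
    }


def bounded_queue_arguments(max_messages: int,
                            max_bytes: int,
                            dead_letter_exchange: str = "dead.letter.exchange") -> dict:
    """Arguments for a queue that dead-letters messages once it overflows."""
    return {
        "x-max-length": max_messages,
        "x-max-length-bytes": max_bytes,
        "x-dead-letter-exchange": dead_letter_exchange,
    }
```

A dictionary like this would be passed as the arguments when declaring the queue with whichever client library you use; one wait queue per retry attempt gives each attempt its own delay.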

Ephemeral Exchanges and Queues

Exchanges and queues can be dynamically created and given auto-delete characteristics. After a certain time period they can self-destruct. This allows for patterns such as ephemeral reply queues for message-based RPC.

Plug-Ins

The first plug-in you will want to install is the Management Plug-In that provides an HTTP server, with web UI and REST API. It is really easy to install and gives you an easy to use UI to get you up and running. Scripting deployments via the REST API is really easy too.

Some other plug-ins include:

  • Consistent Hashing Exchange, Sharding Plugin

  • protocols like STOMP and MQTT

  • web hooks

  • extra exchange types

  • SMTP integration

  • Top

There's a lot more to RabbitMQ but that is a good primer and gives you an idea of what RabbitMQ can do. Now we'll look at Kafka, which takes a completely different approach to messaging, and also has amazing features.

Apache Kafka

Kafka is a distributed, replicated commit log. Kafka does not have the concept of a queue, which might seem strange at first given that it is primarily used as a messaging system. Queues have been synonymous with messaging systems for a long time. Let's break down "distributed, replicated commit log" a bit:

  • Distributed because Kafka is deployed as a cluster of nodes, for both fault tolerance and scale.

  • Replicated because messages are usually replicated across multiple nodes (servers).

  • Commit Log because messages are stored in partitioned, append only logs which are called Topics. This concept of a log is the principal killer feature of Kafka.

Understanding the log (Topic) and its partitions is the key to understanding Kafka. So how is a partitioned log different from a set of queues? Let's visualise it.

Fig 6 One producer, one partition, one consumer

Rather than putting a message in a FIFO queue and tracking its status in the queue like RabbitMQ does, Kafka just appends it to the log and that is that. The message stays put whether it is consumed once or a thousand times. It is removed according to the data retention policy (often a window time period). So how is a topic consumed? Each consumer tracks where it is in the log: it has a pointer to the last message consumed, and this pointer is called the offset. Consumers maintain this offset via the client libraries and, depending on the version of Kafka, the offset is stored either in ZooKeeper or in Kafka itself. ZooKeeper is a distributed consensus technology used by many distributed systems for things like leader election. Kafka relies on ZooKeeper for managing the state of the cluster.

What is amazing about this log model is that it instantly removes a lot of complexity around message delivery status and more importantly for consumers, it allows them to rewind and go back and consume messages from a previous offset. For example imagine you deploy a service that calculates invoices which consumes bookings placed by clients. The service has a bug and calculates all the invoices incorrectly for 24 hours. With RabbitMQ at best you would need to somehow republish those bookings and only to the invoice service. But with Kafka you simply move the offset for that consumer back 24 hours.
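The log-plus-offset idea can be sketched with a plain list. This is a toy, single-partition model and not Kafka's client API: `Log`, `LogConsumer`, `poll` and `seek` are hypothetical names, and the offset here is the position of the next message to read, which is a simplification chosen for the sketch.

```python
class Log:
    """Hypothetical append-only log: messages are never removed on read."""

    def __init__(self):
        self._messages = []

    def append(self, message) -> int:
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the appended message

    def read(self, offset: int, max_messages: int = 10) -> list:
        return self._messages[offset:offset + max_messages]


class LogConsumer:
    """Hypothetical consumer that tracks its own position in the log."""

    def __init__(self, log: Log):
        self.log = log
        self.offset = 0  # next offset to read (an assumption of this sketch)

    def poll(self, max_messages: int = 10) -> list:
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Rewinding is just moving the pointer; the log itself is untouched.
        self.offset = offset
```

In the invoice example, fixing the bug and calling `seek` with the offset from 24 hours earlier is all it takes for the invoice service to reprocess the bookings, and no other consumer of the log is affected.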

So let's see what it looks like with a topic that has a single partition and two consumers which each need to consume every message. From now on I have started to label the consumers because it will not be as clear (as the RabbitMQ diagrams) which are independent and which are competing consumers.

Fig 7 One producer, one partition, two independent consumers

As you can see from the diagram, two independent consumers both consume from the same partition, but they are reading from different offsets. Perhaps the invoice service takes longer to process messages than the push notification service, or perhaps the invoice service was down for a while and catching up, or perhaps there was a bug and its offset had to be moved back a few hours.

Now let's say that the invoice service needs to be scaled out to three instances because it cannot keep up with the message velocity. With RabbitMQ we simply deploy two more invoice service apps which consume from the bookings invoice service queue. But Kafka does not support competing consumers on a single partition; Kafka's unit of parallelism is the partition itself. So if we need three invoice consumers we need at least three partitions. So now we have:

Fig 8 Three partitions and two sets of three consumers

So the implication is that you need at least as many partitions as the most scaled out consumer. Let's talk about partitions a bit.

PARTITIONS AND CONSUMER GROUPS

Each partition is a separate data structure which guarantees message ordering. That is important to remember: message ordering is only guaranteed within a single partition. That can introduce some tension later on between message ordering needs and performance needs as the unit of parallelism is also the partition. One partition cannot support competing consumers, so our invoice application can only have one instance consuming each partition.

Messages can be routed to partitions in a round robin manner or via a hashing function: hash(message key) % number of partitions. Using a hashing function has some benefits as we can design the message keys such that messages of the same entity, like a booking for example, always go to the same partition. This enables many patterns and message ordering guarantees.
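A minimal sketch of key-based partitioning follows. It uses `zlib.crc32` purely as a stand-in deterministic hash, which is an assumption of this example and not the hash function Kafka's clients use; `partition_for` and `route_to_partitions` are hypothetical names.

```python
import zlib


def partition_for(message_key: str, num_partitions: int) -> int:
    """hash(message key) % number of partitions, with CRC32 as the hash."""
    return zlib.crc32(message_key.encode("utf-8")) % num_partitions


def route_to_partitions(messages, num_partitions: int) -> list:
    """Append each (key, value) message to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

Because the partition depends only on the key, every event for `booking-1` lands in the same partition, in the order it was produced, regardless of how events for other bookings are interleaved.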

Consumer Groups are like RabbitMQ's competing consumers. Each consumer in the group is an instance of the same application and will process a subset of all the messages in the topic. Whereas RabbitMQ's competing consumers all consume from the same queue, each consumer in a Consumer Group consumes from a different partition of the same topic. So in the examples above, the three instances of the invoice service all belong to the same Consumer Group.

At this point RabbitMQ looks a little more flexible with its guarantee of message order within a queue and its seamless ability to cope with changing numbers of competing consumers. With Kafka, how you partition your logs is important.

There is a subtle yet important advantage that Kafka had from the start that RabbitMQ added later on, regarding message order and parallelism. RabbitMQ maintains global order of the whole queue but offers no way for maintaining that order during the parallel processing of that queue. Kafka cannot offer global ordering of the topic, but it does offer ordering at the partition level. So if you only need ordering of related messages then Kafka offers both ordered message delivery and ordered message processing. Imagine you have messages that show the latest state of a client's booking, so you want to always process the messages of that booking sequentially (in temporal order). If you partition by the booking Id, then all messages of a given booking will all arrive at a single partition, where we have message ordering. So you can create a large number of partitions, making your processing highly parallelised and also get the guarantees you need for message ordering.

This capability exists in RabbitMQ also via the Consistent Hashing exchange which distributes messages over queues in the same way. Though Kafka enforces this ordered processing by the fact that only one consumer per consumer group can consume a single partition, and makes it easy as the coordinator node does all the work for you to ensure this rule is complied with. RabbitMQ offers Single Active Consumer (SAC) which prevents more than one consumer actively consuming a queue at the same time (even if multiple consumers have subscribed to the queue), but it doesn’t offer coordination of consumers over multiple “partitioned” queues.

Back to Kafka, there's also a gotcha here: the moment you change the number of partitions, new messages for order Id 1000 may go to a different partition, so messages of order Id 1000 exist in two partitions. Depending on how you process your messages this can introduce a headache. There are now scenarios where the messages get processed out of order.
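The effect of changing the partition count is easy to see with numeric keys. In this sketch the order Id itself is used as the hash value, which is a simplification for illustration only; `partition_for` and `moved_keys` are hypothetical helper names.

```python
def partition_for(order_id: int, num_partitions: int) -> int:
    """Simplified partitioner: treat the numeric order Id as its own hash."""
    return order_id % num_partitions


def moved_keys(order_ids, old_partitions: int, new_partitions: int) -> list:
    """Return the order Ids whose partition changes after repartitioning."""
    return [
        order_id
        for order_id in order_ids
        if partition_for(order_id, old_partitions)
        != partition_for(order_id, new_partitions)
    ]
```

With three partitions order Id 1000 maps to partition 1 (1000 % 3), but with four partitions it maps to partition 0 (1000 % 4), so older and newer messages for that order end up in different partitions.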

We'll be covering this subject in greater detail in the Part 4 Message Delivery Semantics and Guarantees section of the series.

PUSH VS PULL

RabbitMQ uses a push model and prevents overwhelming consumers via the consumer configured prefetch limit. This is great for low latency messaging and works well for RabbitMQ's queue based architecture. Kafka on the other hand uses a pull model where consumers request batches of messages from a given offset. To avoid tight loops when no messages exist beyond the current offset Kafka allows for long-polling.

A pull model makes sense for Kafka due to its partitions. As Kafka guarantees message order in a partition with no competing consumers, we can leverage the batching of messages for a more efficient message delivery that gives us higher throughput. This doesn't make so much sense for RabbitMQ as ideally we want to try to distribute messages one at a time as fast as possible to ensure that work is parallelised evenly and messages are processed close to the order in which they arrived in the queue. But with Kafka the partition is the unit of parallelism and message ordering, so neither of those two factors is a concern for us.

Publish Subscribe

Kafka supports basic pub sub with some extra patterns related to the fact that it is a log and has partitions. The producers append messages to the end of the log partitions and the consumers can be positioned with their offset anywhere in the partition.

Fig 9. Consumers with different offsets

This style of diagram is not as easy to quickly interpret when there are multiple partitions and consumer groups, so for the remainder of the diagrams for Kafka I will use the following style:

Fig 10. One producer, three partitions and one consumer group with three consumers

We don't have to have the same number of consumers in our consumer group as there are partitions:

Fig 11. Some consumers read from more than one partition

Consumers in one consumer group will coordinate the consumption of partitions, ensuring that one partition is not consumed by more than one consumer of the same consumer group.

Likewise, if we have more consumers than partitions, the extra consumer will remain idle, in reserve.

Fig 12. One idle consumer

After adding and removing consumers, the consumer group can become unbalanced. A rebalancing redistributes the consumers as evenly as possible across the partitions.
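The cases above (fewer consumers than partitions, more consumers than partitions, and rebalancing after membership changes) can be modelled with a simple round-robin assignment. This is an illustrative sketch, not the assignment strategy Kafka actually ships; `assign_partitions` is a hypothetical function, and a rebalance is modelled as simply recomputing the assignment.

```python
def assign_partitions(partitions, consumers) -> dict:
    """Spread partitions over the consumers of one group, round-robin.

    Each partition goes to exactly one consumer; consumers beyond the
    number of partitions end up with an empty list (idle, in reserve).
    """
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

Calling the function again with the new list of consumers after one joins or leaves plays the role of a rebalance: with three partitions and two consumers one consumer reads two partitions, and with four consumers the fourth stays idle.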

Fig 13. Addition of new consumers requires rebalancing

Rebalancing is automatically triggered after:

  • a consumer joins a Consumer Group

  • a consumer leaves a Consumer Group (it shuts down or is considered dead)

  • new partitions are added

Rebalancing will cause a short period of extra latency while consumers stop reading batches of messages and get assigned to different partitions. Any in-memory state that was maintained by the consumer may now be invalid. One of the patterns of consumption with Kafka is being able to direct all messages of a given entity, like a given booking, to the same partition and hence the same consumer. This is called data locality. Upon rebalancing, any in-memory data about that entity will be useless unless the consumer gets assigned back to the same partition. Therefore consumers that maintain state will need to persist it externally.

Log Compaction

The standard data retention policies are time and space based policies. Store up to the last week of messages or up to 50GB for example. But another type of data retention policy exists - Log Compaction. When a log is compacted, the result is that only the most recent message per message key is retained, the rest are removed.

Let's imagine we receive a message containing the current state of a user's booking. Every time a change is made to the booking a new event is generated with the current state of the booking. The topic may have a few messages for that one booking that represent the states of that booking since it was created. After the topic gets compacted only the most recent message related to that booking will be kept.

Depending on the volume of bookings and the size of each booking, you could theoretically store all bookings forever in the topic. By periodically compacting the topic we ensure we only store one message per booking.
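The outcome of compaction can be sketched as "keep the last message per key". This models only the end result described here, not how Kafka performs compaction internally; `compact` is a hypothetical function and the log is represented as a list of (key, value) pairs in offset order.

```python
def compact(log: list) -> list:
    """Keep only the most recent (key, value) per key, preserving order."""
    # Remember the index of the last occurrence of each key.
    last_index = {key: index for index, (key, _) in enumerate(log)}
    return [
        (key, value)
        for index, (key, value) in enumerate(log)
        if last_index[key] == index
    ]
```

After compaction each booking is represented by exactly one message carrying its latest state, so the size of the topic grows with the number of bookings rather than the number of changes.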

Log compaction enables a few different patterns, which we will explore in Part 3.

More on Message Ordering

We've covered that scaling out and maintaining message ordering is possible with both RabbitMQ and Kafka, but Kafka makes it a lot easier. With RabbitMQ we must use the Consistent Hash Exchange and manually implement the consumer group logic ourselves by using Single Active Consumer and custom hand-rolled logic.

But RabbitMQ has one interesting capability that Kafka does not: RabbitMQ allows subscribers to order arbitrary groups of events.

Let's dive into that a little more. Different applications cannot share a queue because then they would compete to consume the messages. They need their own queue. This gives applications the freedom to configure their queue any way they see fit. They can route multiple event types from multiple exchanges to their queue. This allows applications to maintain the ordering of related events. Which events an application wants to combine can be configured differently for each application.

This is simply not possible with a log based messaging system like Kafka because logs are shared resources. Multiple applications read from the same log. So any grouping of related events into a single topic is a decision made at a wider system architecture level.

So there is no winner takes all here. RabbitMQ allows you to maintain relative ordering across arbitrary sets of events and Kafka provides a simple way of maintaining ordering at scale.

Update: I have built a library called Rebalanser that provides consumer group logic to RabbitMQ for .NET applications. Check out the post on it and the GitHub repo. If people show any interest then I'd be up for making versions in other languages. Let me know.

Conclusions

RabbitMQ offers a Swiss Army knife of messaging patterns due to the variety of functionality it offers. With its powerful routing, it can obviate the need for consumers to retrieve, deserialize and inspect every message when they only need a subset. It is easy to work with; scaling up and down is done by simply adding and removing consumers. Its plug-in architecture allows it to support other protocols and add new features such as the Consistent Hashing exchange, which is an important addition.

Kafka's distributed log with consumer offsets makes time travel possible. Its ability to route messages of the same key to the same consumer, in order, makes highly parallelised, ordered processing possible. Kafka's log compaction and data retention allow new patterns that RabbitMQ simply cannot deliver. Finally yes, Kafka can scale further than RabbitMQ, but most of us deal with a message volume that both can handle comfortably.

In the next part we'll take a closer look at messaging patterns and topologies with RabbitMQ.

Comments (27)

Absolutely fantastic article with in-depth explaination! Thank you!

Fantastic, really clear well thought-out explanation! Thanks you!

Simply Awesome. You have covered every single thing in your post in a very clear and understandable way.

very good article, thanks for sharing

I am happy to this blog site giving one-of-a-kind and also useful knowledge concerning this topic.

This is a gold mine. Thanks for the wonderful article!

excellent article. Thanks a lot!.

This is a very long and wonderful article, thank you.

Hi Jack Vanlightly,

You just rock !!. Very good post series which is covering almost all the queries which come up while working with these messaging systems. Write now i use apache helix for creating rabbitmq consumer group in java which uses jookeper for metadata storage. I loved your rebalanser project which does the same. Would love to contribute nodejs package for the same.

Here's my email "manojvarma.mkv@gmail.com". Please let me know how can i connect with you for contributing.

Thanks :)

Hi Manoj,

Thanks :)

Regarding Rebalanser, I want to return to the project in a month or so and pick up where I left off, I got pretty busy for a few months, and I hate leaving it unfinished. I'll email you when I'm in a position to start discussing its future. My preference for RabbitMQ is to develop a consumer group plugin and add support to the major client libraries, which may be on the cards. But Rebalanser is not actually tied to RabbitMQ, it is just a peer-to-peer resource allocation library and is applicable to many technologies, so might be interesting to continue with separate to RabbitMQ.

Thanks,
Jack

Hi Jack,

I got you. Do drop a mail when you want to start working on this. I am looking forward to contribute as much as possible.

Thanks,
Manoj

oops grammer mistake its "zookeeper" not jookeeper.

The round trip latency problem that you mentioned in case of pull API. Is it not applicable to kafka. I mean in kafka all messages are pulled, so it should perform very bad.

Excellent work! Thanks a lot!
I would like to say that RabbitMQ also supports only one consumer per queue, it is called an exclusive consumer. So, RabbitMQ offers a way for maintaining processing order.

Thanks for mentioning that. In the upcoming 3.8 release there is a new feature called Single Active Consumer which can be enabled on a queue that allows multiple consumers, but only one is able to consume at a time. When the active consumer goes away, a waiting consumer gets promoted to active.

I am looking forward to this feature.
Do you know when is the official release of 3.8 release is planned?

It should be this year, but there is no fixed date.

Excellent work! Thanks a lot!

Excellent article, very informative and in-depth.

Excellent post, thanks Jack.

I have one question.
"RabbitMQ ............. (there is a pull API but it is deprecated). "
I couldn't find any official documentation about the pull API deprecation. Can you share the source for this?

Hi Shiva, thanks for the question. I think that back when I was learning RabbitMQ a few years ago there was an implementation in Java and C# called the QueueingConsumer/QueueingBasicConsumer. It allowed you to explicitly pull from an in-memory queue. It got deprecated. I was confusing that with the Pull API.

So to clarify, the QueueingConsumer used the push model underneath, but exposed the messages via an in-memory queue that you could pull from. This was what was deprecated.

The Pull API itself was not deprecated and is still a valid way of consuming messages. The performance is far worse as you have to perform a round-trip per message, so it should be reserved for use-cases where pull is really necessary.

Very informative and in-depth description of RabbitMQ and Kafka. It helps a lot, especially for new comers of RabbitMQ and Kafka.

Great work, thank you!

Nice one.. Thanks for the insights

Really Jack, kudos on all your RMQ and Kafka stuff. I've read it all.

Amazing article. Well done.

Thanks
