
  Kuaishou

  Containerization challenges for massive Redis clusters

  Speaker: Yuyu Liu, Kuaishou Cloud Native Team

  Value 1

  Reduce infrastructure operating costs

  Value 2

  Simplify stateful service management

  Redis in Kuaishou

Before we dive into the details, let's take a look at some background. Kuaishou's Redis uses a classic master-slave architecture, consisting of three components (Server, Sentinel, and Proxy). Extremely large scale is a distinctive feature of Kuaishou Redis clusters, not only in terms of the total number of instances, but also in the size of a single cluster, which can even exceed 10,000 instances.

Given that Redis at Kuaishou has been running stably at this scale, why go to the trouble of making it cloud native? Because we found that Redis resource utilization at Kuaishou is relatively low, and in a system this large even a small optimization brings huge benefits.

So what is the best way to improve resource utilization? We found that cloud native technology offers the shortest path and established best practices. In addition, the container cloud has become the new interface between business and infrastructure, and most of Kuaishou's stateless services have already been migrated to it. In the long run, unifying the infrastructure is an inevitable trend. Unification not only decouples the business from the infrastructure, but also improves business agility and reduces infrastructure operating costs.

That's why we decided to do a cloud-native transformation to optimize costs. To make this work better, we brought in not only the Redis team, but also the cloud-native team.

  Why KubeBlocks

The fact that we are speaking here today already tells you that we did it with KubeBlocks. Next, I would like to share why we chose it. In simple terms, KubeBlocks provides an API for stateful services.

So, what is a stateful service? What are the key differences between stateful and stateless services?

Taken literally, the difference is whether or not there is state information. However, we believe the real difference is that instances are not equal to one another. Because different instances play different roles and store different data, we cannot simply discard an arbitrary instance. Moreover, this unequal relationship is not static; it can change at runtime, for example during a master/standby switchover. KubeBlocks is designed to solve this problem by providing role-based management capabilities.
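
To make the idea of unequal instances concrete, here is a minimal sketch of a role-aware scale-in decision. It is not KubeBlocks code: the `Instance` type, the `REMOVAL_PRIORITY` table, and `pick_for_removal` are hypothetical names chosen for illustration, and the only assumption is that a slave is safer to remove than the current master.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    role: str  # "master" or "slave"; roles may change at runtime (switchover)


# Hypothetical priority table: a lower value means "safer to remove first".
REMOVAL_PRIORITY = {"slave": 0, "master": 1}


def pick_for_removal(instances, count):
    """Return `count` instances to remove, choosing slaves before the master."""
    ordered = sorted(instances, key=lambda i: (REMOVAL_PRIORITY[i.role], i.name))
    return ordered[:count]


shard = [
    Instance("redis-0", "master"),
    Instance("redis-1", "slave"),
    Instance("redis-2", "slave"),
]
victims = pick_for_removal(shard, 2)
```

A role-unaware controller would typically remove instances by ordinal alone, which here could take out the master first; consulting the role is what avoids that.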

Furthermore, KubeBlocks supports a variety of databases, so there is no need to deploy a dedicated operator for each one. It is also worth noting that KubeBlocks provides a process-oriented API through OpsRequest, which makes it easier to move a database to a cloud-native setup. If you are interested, the official documentation has more details.

  Redis cluster orchestration definition

We've talked a lot about the KubeBlocks API, but how exactly does KubeBlocks define a Redis cluster? The definition should contain all the components (i.e. server/sentinel/proxy). To reduce duplication in component definitions, KubeBlocks separates the Component Definition from the Component Version, so that both can be referenced directly when creating a new cluster. Here we take Redis Server as an example, since it is the most complex of the components.

  • First, Redis Server is defined via ShardSpec, which holds a list of shards so that larger data sets can be supported. A Redis Server cluster is split into multiple shards, each with master and slave instances.

  • Another key point is that master and slave instances within the same shard may have different configurations. To handle this, KubeBlocks allows users to define multiple configurations within the same component using an InstanceTemplate.

Using the Redis Server component as the example, the object relationships are divided into the following levels:

  • Cluster - Used to define the entire Redis cluster.

  • ShardSpec - A list used to define Redis Server shards.

  • Component - Used to define Redis Proxy, Sentinel, and individual Server shards.

  • InstanceTemplate - Used to define different configurations within the same component.

  • InstanceSet - This is the final auto-generated workload that provides the role-based management capabilities we mentioned earlier.
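
The levels above can be sketched as plain data structures. This is a simplified illustration, not the real KubeBlocks schema: all field names, the `instance_set_names` helper, and the naming scheme `<cluster>-<component>[-<shard index>]` are assumptions made for the example. It only shows the shape of the hierarchy, with one generated workload per component and per shard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InstanceTemplate:
    name: str
    replicas: int
    overrides: Dict[str, str] = field(default_factory=dict)  # per-template config


@dataclass
class Component:
    name: str
    replicas: int
    instance_templates: List[InstanceTemplate] = field(default_factory=list)


@dataclass
class ShardSpec:
    name: str
    shards: int          # number of shards to generate
    template: Component  # every shard is stamped out from this component


@dataclass
class Cluster:
    name: str
    components: List[Component]   # e.g. proxy and sentinel
    shard_specs: List[ShardSpec]  # e.g. the sharded server


def instance_set_names(cluster):
    """List the workloads that would be generated (hypothetical naming)."""
    names = [f"{cluster.name}-{c.name}" for c in cluster.components]
    for spec in cluster.shard_specs:
        names += [f"{cluster.name}-{spec.name}-{i}" for i in range(spec.shards)]
    return names


server = Component(
    "server",
    replicas=2,
    instance_templates=[
        InstanceTemplate("large", 1, {"memory": "16Gi"}),
        InstanceTemplate("small", 1, {"memory": "8Gi"}),
    ],
)
cluster = Cluster(
    "demo",
    components=[Component("proxy", 2), Component("sentinel", 3)],
    shard_specs=[ShardSpec("server", shards=3, template=server)],
)
names = instance_set_names(cluster)
```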

  Role management

Role-based management comes down to two key points:

  • Establish and maintain correct role relationships
  • Provide fine-grained, role-based management

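Here we focus on how to maintain correct role relationships, and there is one important point to note. We consider the master information of each shard to be critical: if it is wrong or cannot be obtained, the consequences are severe. To keep the business stable, we therefore prefer to separate the data plane from the control plane here. That is why we do not obtain master information from KubeBlocks.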

  Deployment architecture

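As we have seen, KubeBlocks works very well for running stateful services on a single Kubernetes cluster. However, as mentioned earlier, Kuaishou runs far more Redis instances than a single Kubernetes cluster can hold, so we have to use multiple Kubernetes clusters to support the business. If we exposed the complexity of multi-cluster management directly to the Redis business, several problems would follow: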

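  • The Redis team would need to maintain a buffer resource pool in every Kubernetes cluster, which means more wasted resources.
  • The Redis team would have to migrate Redis clusters ahead of time, before a single Kubernetes cluster reaches its limit.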

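We think it is best to hide the multi-cluster complexity, and we already provide the required capability through a federated cluster. KubeBlocks itself, however, does not support multiple clusters. How did we solve this? Here is the overall architecture.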

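We split the KubeBlocks operator into two parts. The Cluster Operator and Component Operator are placed in the federated cluster, while the InstanceSet Controller is placed in the member clusters. In between sits a component called the Federal InstanceSet controller, which distributes InstanceSet objects from the federated cluster to the member clusters. So how does the Federal InstanceSet controller work?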

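  • First, it decides how many instances each cluster should run, based on scheduling suggestions.
  • Second, it splits the InstanceSet and distributes the parts to the member clusters.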

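As with StatefulSet, instances in an InstanceSet have numbered names. To avoid breaking this rule, we redesigned the ordinals field in InstanceSet to allow custom ordinal ranges. With this architecture, we are able to run KubeBlocks across multiple Kubernetes clusters without major modifications.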
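
The split-and-distribute step can be sketched as follows. This is an illustration under stated assumptions, not the Federal InstanceSet controller itself: the talk does not show the shape of the ordinals field, so the sketch assumes each member cluster receives one contiguous, inclusive ordinal range, and `MemberInstanceSet`, `split_instance_set`, and `pod_names` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberInstanceSet:
    cluster: str  # member cluster that receives this part
    start: int    # first ordinal, inclusive
    end: int      # last ordinal, inclusive


def split_instance_set(suggestion):
    """Split one InstanceSet according to a scheduling suggestion.

    `suggestion` maps member cluster name -> number of instances to run there.
    Ordinals are handed out as non-overlapping contiguous ranges, so instance
    names stay unique across all member clusters.
    """
    parts, next_ordinal = [], 0
    for cluster, replicas in suggestion.items():
        if replicas <= 0:
            continue  # nothing to deploy in this member cluster
        parts.append(MemberInstanceSet(cluster, next_ordinal, next_ordinal + replicas - 1))
        next_ordinal += replicas
    return parts


def pod_names(instance_set_name, part):
    """Numbered instance names for one part, StatefulSet style."""
    return [f"{instance_set_name}-{i}" for i in range(part.start, part.end + 1)]


parts = split_instance_set({"member-a": 2, "member-b": 0, "member-c": 1})
names_a = pod_names("redis-shard-0", parts[0])
names_c = pod_names("redis-shard-0", parts[1])
```

Because the ranges never overlap, the instance names of the original InstanceSet are preserved even though its instances live in different member clusters.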

  Stability guarantees

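Beyond meeting the functional requirements of the business, we also need to make sure the solution is production grade, especially in terms of stability. We discuss a few key points here.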

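First, scheduling. To keep Redis highly available, instances should be spread out as much as possible, but we also have to consider how a single machine failure affects the size of a Redis cluster. We therefore built a fine-grained spread scheduling capability that supports configuring both the maximum number of instances per node and the maximum number of nodes per Redis cluster. We also provide load-balanced scheduling based on CPU, memory, and network bandwidth.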
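
A toy version of such a placement decision is sketched below. It is not Kuaishou's scheduler: the function `spread`, its parameters, and the single scalar "load" per node (standing in for CPU, memory, and network bandwidth) are assumptions for illustration. It picks the least-loaded nodes up to the per-cluster node limit and then places instances round-robin, which keeps every node at or below the per-node limit.

```python
def spread(replicas, node_load, max_per_node, max_nodes):
    """Place `replicas` instances on the least-loaded nodes.

    node_load:    dict of node name -> current load (lower is better)
    max_per_node: maximum instances of this Redis cluster on one node
    max_nodes:    maximum number of nodes this Redis cluster may occupy
    Returns a dict of node name -> number of instances placed there.
    """
    # Prefer lightly loaded nodes; break ties by name for determinism.
    candidates = sorted(node_load, key=lambda n: (node_load[n], n))[:max_nodes]
    if replicas > len(candidates) * max_per_node:
        raise ValueError("constraints cannot be satisfied")
    placement = {}
    for i in range(replicas):
        node = candidates[i % len(candidates)]  # round-robin keeps the spread even
        placement[node] = placement.get(node, 0) + 1
    return placement


placement = spread(
    5,
    {"n1": 0.2, "n2": 0.5, "n3": 0.1, "n4": 0.9},
    max_per_node=2,
    max_nodes=3,
)
```

With these inputs the heavily loaded node `n4` is never used, and no node receives more than two instances.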

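Next, let's look at runtime control. We all like the automation that Kubernetes brings, but it also means greater risk: a small change can cause a large-scale failure. We therefore put many controls on running instances, such as concurrency control and allowing only in-place updates. There is a lot of other work as well, which I will not list one by one because of time constraints.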
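
Concurrency control can be pictured as limiting how many instances may change at the same time. The sketch below is only an illustration of that idea; `update_batches` and `max_concurrent` are hypothetical names, and the talk does not describe how the real limit is enforced.

```python
def update_batches(instances, max_concurrent):
    """Group instances into batches so at most `max_concurrent` change at once."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return [
        instances[i:i + max_concurrent]
        for i in range(0, len(instances), max_concurrent)
    ]


batches = update_batches(["redis-0", "redis-1", "redis-2", "redis-3", "redis-4"], 2)
```

Each batch would be updated and verified before the next one starts, so a bad change is caught after touching at most `max_concurrent` instances.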

  Summary

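Finally, let's do a brief summary. I believe you have already seen that KubeBlocks is an excellent project. Compared with StatefulSet, KubeBlocks designs a brand-new API for stateful services and provides role-based management, and these designs make the cloud-native transformation of stateful workloads easier. The KubeBlocks API appears able to support almost all stateful services. Still, I think some work remains: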

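  • How to connect with existing database operators and align with their capabilities.
  • Pushing toward a standard API for stateful services. Kuaishou is willing to work with KubeBlocks in this area, and everyone is welcome to explore it further with us.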

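So far, Kuaishou has collaborated with KubeBlocks on many features, such as InstanceSet directly managing Pods and PVCs, instance templates, and integration with federated clusters. Stay tuned for progress on the follow-up work.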

Talk presented at KubeCon China 2024