Key Takeaways
- Distributed systems spanning multiple availability zones can incur significant data transfer costs and performance bottlenecks.
- Organizations can reduce costs and latencies by applying zone aware routing techniques without sacrificing reliability and high availability.
- Zone Aware Routing is a strategy designed to optimize network costs and latency by directing traffic to services within the same availability zone whenever possible.
- Implementing zone aware routing end-to-end requires various tools, such as Istio, and selecting distributed databases that support this capability.
- Be aware of the issues that arise when your services aren't evenly distributed. Handle cluster hotspots and be able to scale a service within the scope of a specific zone.
The microservices architectural approach has become a core factor in building successful products. It was made possible by adopting advanced cloud technologies such as service mesh, containers, and serverless computing. The need to grow rapidly while achieving maintainability, resilience, and high availability made it standard for teams to build "deep" distributed systems: systems with many microservice layers, spanning multiple Availability Zones (AZs) and even regions. Those systems are often characterized by dozens of chatty microservices that must frequently communicate with each other over the network.
Distributing our resources across different physical locations (regions and AZs) is crucial for achieving resilience and high availability. However, some deployments can incur significant data transfer costs and performance bottlenecks. This article aims to create awareness of these potential issues and provide guidelines for maintaining resilience while overcoming these challenges.
The Problem 问题
Using multiple AZs is a best practice and an essential factor in maintaining service availability. Whether those are our microservices, load balancers, DBs, or message queues, we must provision them across multiple AZs to ensure our applications are geographically distributed and can withstand all sorts of outages.
In some critical systems, it is also common to distribute the resources and data across multiple regions and even multiple cloud providers for the sake of fault tolerance.
Distributing our services across multiple AZs requires us to transfer data between those AZs. Whenever two microservices communicate with each other or whenever a service reads data from a distributed DB, the communication will likely cross the AZ boundary. This is because of the nature of load balancing.
Usually, load balancers distribute traffic evenly between instances of the upstream service (the service being called) without any awareness of the AZ in which the originating service is located. So, in practice, whenever systems are provisioned over multiple AZs, a large share of the traffic is likely to cross AZ boundaries.
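A quick back-of-the-envelope model makes this concrete: if upstream instances are spread evenly across zones and the load balancer is zone-unaware, only about 1/N of the calls stay local. A small sketch, assuming even spreading:

```python
# Expected fraction of cross-AZ calls under zone-unaware round-robin,
# assuming upstream instances are spread evenly across the zones.
def cross_az_fraction(num_zones: int) -> float:
    """Only 1/num_zones of the upstream instances share the caller's
    zone, so the remaining calls cross a zone boundary."""
    return (num_zones - 1) / num_zones

# With 2 AZs, half of all calls cross zones; with 3 AZs, about two thirds.
print(cross_az_fraction(2))
print(cross_az_fraction(3))
```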
But why is it a cloud cost burden and even a performance bottleneck? Let’s break it down.
Figure 1: Illustration of a request flow that spans across multiple availability zones
The Cost Burden 成本负担
Every time data is transferred between two AZs, the major cloud providers typically apply data transfer charges in both directions: once for the outgoing cross-AZ transfer and once for the incoming one. For example, at $0.01/GB in each direction, a single terabyte transferred between two same-region availability zones will cost you $10 for the egress and $10 for the ingress, a total of $20. However, those costs can quickly get out of control in distributed systems. Imagine that, as part of a request, your load balancer instance in AZ-1 may reach out to Service A in AZ-2, which in turn calls Service B in AZ-3, which in turn reads a key/value record from a DB in AZ-2, and often consumes updates from a Kafka broker in AZ-1 (see figure 1).
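A back-of-the-envelope sketch of this arithmetic, using the illustrative $0.01/GB per-direction rate (actual rates vary by provider and region):

```python
# Illustrative cross-AZ transfer pricing: billed on both sides of each hop.
EGRESS_PER_GB = 0.01
INGRESS_PER_GB = 0.01

def cross_az_cost(gb_transferred: float, cross_az_hops: int = 1) -> float:
    """Cost of moving gb_transferred GB over a number of cross-AZ hops,
    charging egress on one side and ingress on the other."""
    return gb_transferred * cross_az_hops * (EGRESS_PER_GB + INGRESS_PER_GB)

# 1 TB over a single cross-AZ hop: $10 egress + $10 ingress.
print(cross_az_cost(1000))
# The same 1 TB moved through four cross-AZ hops of a deep request flow.
print(cross_az_cost(1000, cross_az_hops=4))
```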
So, while this architectural choice prioritizes availability, it is prone to extreme situations where all of the communication between services within a single request crosses AZs. The reality is that in systems with various DBs, message queues, and many microservices that talk to each other, data must frequently move around and will likely become extremely expensive, a burden that tends to grow every time you add services to your stack.
The Performance Bottleneck
AZs within the same region are physically separated from one another by a meaningful distance, although they are all within up to several miles of each other. This generally produces a minimum of single-digit-millisecond round-trip latency between AZs, and sometimes even more. So, back to our example, where a request to your load balancer instance in AZ-1 may reach out to Service A in AZ-2, which in turn calls Service B in AZ-3, which in turn reads a key/value record from a DB in AZ-2, and often consumes updates from a Kafka broker in AZ-1: this way, we could easily add over a dozen milliseconds to every request, precious time that can drop to sub-millisecond levels when services communicate within the same AZ. And as with the cost burden, the more services you add to your stack, the more this burden grows.
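A rough sketch of that accumulation, assuming an illustrative 3 ms cross-AZ round trip per hop (actual figures vary by provider and region):

```python
# Illustrative round-trip times; real numbers vary per provider and region.
CROSS_AZ_RTT_MS = 3.0   # single-digit milliseconds between AZs
INTRA_AZ_RTT_MS = 0.1   # sub-millisecond within one AZ

def flow_latency_ms(network_hops: int, rtt_ms: float) -> float:
    """Pure network time added by a request flow of sequential hops."""
    return network_hops * rtt_ms

# LB -> Service A -> Service B -> DB read, plus a Kafka consume: 4 hops.
print(flow_latency_ms(4, CROSS_AZ_RTT_MS))   # 12.0 ms when every hop crosses AZs
print(flow_latency_ms(4, INTRA_AZ_RTT_MS))   # well under 1 ms when kept local
```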
So, how can we gain cross-AZ reliability without sacrificing performance?
And is there anything to do about those extra data transfer costs?
Let’s dive into those questions.
Zone Aware Routing to the Rescue
Zone Aware Routing (aka Topology Aware Routing) is the way to address those issues. In Aura from Unity, we have been gradually rolling out this capability over the past year. We saw it save 60% of the cross-AZ traffic in some systems. We learned how to leverage it to optimize performance and significantly reduce bandwidth costs, and we also uncovered where it didn't fit and should be avoided. So, let's describe Zone Aware Routing and what needs to be considered to utilize it well.
What Is Zone Aware Routing?
Zone Aware Routing is a strategy designed to optimize network costs and latency by directing traffic to services within the same availability zone whenever possible. It minimizes the need for cross-zone data transfer, reducing associated costs and latencies.
Zone Aware Routing aims to send all or most of the traffic from an originating service to the upstream service over the local zone. Ideally, the choice between routing over the local zone and performing cross-zone routing should depend on the percentage of healthy instances of the upstream service in the local zone of the originating service. In addition, in case of a failure or malfunction of the upstream service in the current AZ, a Zone Aware Routing component should be able to automatically reroute requests across AZs to maintain high availability and fault tolerance.
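A minimal sketch of that decision logic in Python. The instance records and the 70% local-health threshold are hypothetical illustrations, not a real router's API:

```python
import random

# Sketch of a zone-aware routing decision: prefer healthy instances in
# the caller's zone, fail over across zones when too few are healthy.
def pick_upstream(instances, origin_zone, min_local_healthy=0.7):
    healthy = [i for i in instances if i["healthy"]]
    local_all = [i for i in instances if i["zone"] == origin_zone]
    local_healthy = [i for i in healthy if i["zone"] == origin_zone]
    # Route locally only if enough of the local instances are healthy.
    if local_all and len(local_healthy) / len(local_all) >= min_local_healthy:
        return random.choice(local_healthy)
    # Otherwise fail over across zones to preserve availability.
    return random.choice(healthy) if healthy else None

instances = [
    {"name": "b-1", "zone": "az-1", "healthy": True},
    {"name": "b-2", "zone": "az-1", "healthy": True},
    {"name": "b-3", "zone": "az-2", "healthy": True},
]
print(pick_upstream(instances, "az-1")["zone"])  # az-1: traffic stays local
```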
At its core, a Zone Aware Router should act as an intelligent load balancer. While trying to maintain awareness of zone locality, it is also responsible for balancing the same number of requests per second across all upstream instances of a service. Its purpose is to avoid cases where specific instances are bombarded with traffic. Such a traffic skew can cause localized overload or underutilization, which can cause inefficiency and potentially incur additional costs.
Due to that nature, Zone Aware Routing can be very beneficial in environments with evenly distributed zonal traffic and resources (for example, when you distribute your traffic 50/50 over two AZs with the same amount of resources). However, for uneven zonal distributions, it could become less effective due to the need to balance the traffic across multiple zones and overcome hotspots (bombarded and overloaded instances—see figure 2).
Figure 2: Illustration of uneven zonal distribution. Two services across two zones. Will Service B hold up when it serves all requests coming from AZ-2?
How Can I Adopt Zone Aware Routing?
The concept of Zone Aware Routing has various implementations. So, adopting it depends on finding the implementations that fit your setup and tech stack. Let’s review some of the options:
Istio Locality Load Balancing
Istio is an open-source service mesh platform for Kubernetes. Istio probably has the most promising Zone Aware Routing implementation out there: Locality Load Balancing. It is a feature in Istio that allows for efficient routing of requests to the closest available instance of a service based on the locality of the service and the upstream service instances. When Istio is installed and configured in a Kubernetes cluster, locality can be defined based on geographical region, availability zone, and sub-zone factors.
Istio’s Locality Load Balancing enables requests to be preferentially routed to an upstream service instance in the same locality (i.e., zone) as the originating service. If no healthy instances of a service are available in the same locality as the origin, Istio can failover and reroute the requests to instances in the nearest available locality (e.g., another zone and even another region). This ensures high availability and fault tolerance.
In addition, Istio allows you to define weights for different localities, enabling you to control the distribution of traffic across AZs based on factors such as capacity or priority. This makes it easier to overcome uneven zonal distributions by letting you adjust the amount of traffic you keep localized in each zone vs. the amount of traffic you send across zones.
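For illustration, a DestinationRule along these lines enables locality load balancing with a weighted cross-zone distribution. The host, locality names, and weights here are placeholders; note that Istio only engages locality load balancing when outlier detection is also configured on the destination:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: service-b
spec:
  host: service-b.prod.svc.cluster.local  # placeholder upstream host
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:                # optional weights for uneven zonal setups
        - from: "region-1/az-1/*"
          to:
            "region-1/az-1/*": 80  # keep most traffic local
            "region-1/az-2/*": 20  # spill the rest across the zone boundary
    outlierDetection:              # required for locality LB to take effect
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

Omitting `distribute` and relying on the default locality-preferred behavior (with failover) is the simpler choice when zonal traffic is already even.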
Topology Aware Routing (AKA Topology Aware Hints)
If you work with Kubernetes without Istio, Topology Aware Routing is a native Kubernetes feature, introduced as Topology Aware Hints in version 1.21. While somewhat simpler than what Istio offers, it allows Kubernetes to make intelligent routing decisions based on the cluster's topology: EndpointSlices are annotated with hints, such as a node's availability zone, which kube-proxy uses to keep traffic within the originating zone.
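For illustration, the feature is enabled per Service via an annotation. The annotation key has changed across Kubernetes versions (`service.kubernetes.io/topology-mode: Auto` is the current form; older releases used `topology-aware-hints`), and the service name and ports below are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-b
  annotations:
    service.kubernetes.io/topology-mode: Auto  # opt in to topology-aware routing
spec:
  selector:
    app: service-b
  ports:
  - port: 80
    targetPort: 8080
```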
Kubernetes also provides some essential safeguards around this feature. The Kubernetes control plane will apply those safeguard rules before using the feature. Kubernetes won't localize the request if these rules don’t check out. Instead, it will select an instance regardless of the zone or region to maintain high availability and avoid overloading a specific zone or instance. The rules can evaluate whether every zone has available instances of a service or if it is indeed possible to achieve balanced allocation across zones, preventing the creation of uneven traffic distribution on the cluster.
While Topology Aware Routing enables optimizing cross-zone traffic and tackling the hidden performance and cost burden of working with multiple AZs, it is a little less capable than Istio's Locality Load Balancing. The main drawback is that it doesn't handle failovers the way Istio does. Instead, its safeguards approach shuts down zone locality entirely for the service, a more rigid design choice that requires services to be spread evenly across zones to benefit from those rules.
End-to-End Zone Aware Routing
Using Istio's Locality Load Balancing or Kubernetes Topology Aware Routing to maintain Zone Aware Routing is a significant step forward in fighting our distributed systems' data transfer costs and performance bottlenecks. However, to complete the adoption of Zone Aware Routing end-to-end and bring cross-AZ data transfer to a minimum, we also have to figure out whether our services are capable of reading their data from DBs and message queues (MQs) over the local zone (see figure 3).
Figure 3: Illustration of a request flow localized on a single AZ end-to-end
DBs and MQs are typically distributed across AZs to achieve high availability, maintaining copies of each piece of data in every AZ. This is why DBs and MQs are prone to being accessed across zones, exposing the system to the same performance and cost burdens. So, can we optimize our DB read latencies and reduce data transfer costs by accessing their data over the local zone without compromising resilience?
Here are some examples of DBs and MQs that can support Zone Aware Routing:
Kafka’s Follower Fetching feature
Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, and data integration. Like other MQs, it is typically used to connect between microservices and move data across distributed systems.
Follower Fetching is a feature of the Kafka client library that allows consumers to prefer reading data over the local availability zone, whether from the leader or a replica node. This capability is based on Kafka's Rack Awareness feature, which is designed to enhance data reliability and availability by leveraging the physical or logical topology of the data center.
Kafka's Rack Awareness requires each broker within the cluster to be assigned its corresponding rack information (e.g., its availability zone ID). This makes it possible to ensure that replicas of a topic's partition are distributed across different AZs, minimizing the risk of data loss or service disruption in case of an AZ or node failure. While rack awareness doesn't directly control the locality of client reads, it does let each Kafka broker advertise its location via the rack tag, so that client libraries can use it to read from replicas located in the same rack or zone, or fall back to another zone if no local broker is available. Thus, it follows the concept of Zone Aware Routing, optimizing data transfer costs and latency. To utilize follower fetching, we must assign the consumer a rack tag indicating which AZ it is currently running in, so the client library can use it to locate a broker in that same AZ.
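As a sketch, the configuration involved looks like this (the zone ID `use1-az1` is a placeholder; each broker and consumer should be tagged with the AZ it actually runs in):

```properties
# Broker configuration (set per broker, to that broker's own AZ)
broker.rack=use1-az1
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

# Consumer configuration (set to the AZ the consumer runs in)
client.rack=use1-az1
```

With the rack-aware replica selector enabled on the brokers and `client.rack` set on the consumer, fetches are served from a replica in the consumer's zone when one is in sync.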
Notice that when using Kafka's follower fetching feature, if a local broker's replication lags behind the leader, the consumer's waiting time increases within that local zone instead of impacting random consumers across zones, limiting the blast radius of some issues to the local zone. Also, like other zone awareness implementations, it is very sensitive to uneven zonal distributions of consumers, which may cause certain brokers to become overloaded and require the consumers to handle failover.
Redis and other Zone Aware DBs
Redis is a popular key-value DB used in distributed systems. It is one of those DBs used in high-throughput, large-scale apps for caching and other sub-millisecond queries, and it would typically be distributed across multiple AZs for redundancy reasons. As such, any geographically distributed application that reads from Redis would see significant performance and cost benefits from reading over the local zone.
Redis does not have built-in support for automatically routing read requests to the local replica based on the client's availability zone. However, we can gain this capability with certain Redis client libraries. For example, when using Lettuce (a Java client library), configuring the client's "ReadFrom" policy to LOWEST_LATENCY makes it read from the lowest-latency copy of the data, whether on a replica or the master node. This copy will usually reside in the local zone, reducing cross-zone data transfer.
If you use a client library that does not support selecting the nearest replica, you can implement custom logic that achieves the same by retrieving the data from the local Redis endpoint, preferably falling back to a cross-AZ endpoint when needed.
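A hypothetical sketch of that custom logic, with stub fetch functions standing in for real Redis client calls and placeholder zone names:

```python
# Sketch of "local replica first, cross-AZ fallback" read logic for
# clients whose library lacks nearest-replica support. Endpoint records
# and fetch stubs are illustrative, not a real Redis client API.
def read_with_local_preference(key, endpoints, local_zone):
    """Try replicas in the caller's zone first, then the remaining zones."""
    ordered = sorted(endpoints, key=lambda e: e["zone"] != local_zone)
    last_err = None
    for endpoint in ordered:
        try:
            return endpoint["fetch"](key)
        except ConnectionError as err:
            last_err = err  # local replica unreachable: try the next zone
    raise last_err

def unreachable(_key):  # stub for a replica that is down
    raise ConnectionError("replica unavailable")

endpoints = [
    {"zone": "az-1", "fetch": unreachable},
    {"zone": "az-2", "fetch": lambda k: f"value-of-{k}"},
]
# az-1 is down, so the read transparently fails over to az-2.
print(read_with_local_preference("user:42", endpoints, "az-1"))
```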
There are more popular database technologies that support Zone Aware Routing; here are two that are worth mentioning:
Aerospike: a distributed key-value database designed for high-performance data operations. It has a rack awareness feature which, like Kafka's, provides a mechanism for database clients to prefer reading from the closest rack or zone.
Vitess: a distributed, scalable version of MySQL, originally developed by YouTube, that enables Zone Aware Routing with its local topology service, letting it route queries over the local zone.
While the concept of Zone Aware Routing is named a little differently in each piece of technology, all of these implementations share the same goal: to improve read latencies and data transfer costs without sacrificing service reliability.
Another cost burden of distributing our DBs and MQs across multiple availability zones is that every piece of data we write needs to be replicated within the DB cluster itself to every zone. The replication by itself can cause a significant amount of data transfer cost. While there is technically no way around it, it is essential to note that managed DB services such as AWS MSK (Amazon's Managed Streaming for Apache Kafka), AWS RDS (Amazon Relational Database Service), ElastiCache (Amazon's managed Redis), and others usually don't charge their users for cross-AZ data transfer within the cluster, only for data getting in and out of the cluster. So, choosing a managed service can become a significant cost consideration if you plan to replicate large amounts of data.
Handling Hotspots and Uneven Service Distribution
While it is highly suggested that we maintain even zonal distributions of our services, it is not always trivial or even possible. Handling hotspots is a complex topic, but one way to address it is to consider a separate deployment and a separate auto-scaling policy for each zone. Allowing a service to scale independently based on the traffic within a specific zone ensures that if the load is concentrated in one zone, the service there will be adequately scaled to handle that load with dedicated resources, eliminating the need to fall back to other zones.
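As an illustration, one way to sketch this on Kubernetes is a per-zone Deployment pinned to its AZ via a node selector, each with its own HorizontalPodAutoscaler (all names, images, and thresholds below are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-b-az1            # one Deployment per zone
spec:
  selector:
    matchLabels:
      app: service-b
      zone: az1
  template:
    metadata:
      labels:
        app: service-b
        zone: az1
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1a  # pin pods to this AZ
      containers:
      - name: service-b
        image: example/service-b:latest          # placeholder image
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: service-b-az1            # scales only the az1 deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service-b-az1
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

A matching pair would exist for each other zone, so a hotspot in one zone scales out only the capacity that zone needs.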
Conclusions
Fighting data transfer costs and performance in highly available distributed systems is a real challenge. To handle it, we first need to figure out which microservices are prone to frequent cross-AZ data transfer. We need to be data-driven about it, measure cross-AZ costs, and trace latencies of microservices communication to ensure we put our efforts in the right direction.
Then, we must adopt the proper tooling to eliminate or minimize those cross-AZ calls. Those tools and optimizations depend on our technology stack. To adopt Zone Aware Routing end-to-end, we must adopt different tools for different use cases. Some tools, like Istio and Topology Aware Routing rules, are meant to unlock zone awareness for Kubernetes pods communicating with each other. However, they won't apply to cases where your microservices consume data across AZs from databases or message queues. So, we must choose the right DBs and MQs with zone-aware routing baked into their client libraries.
Finally, we must ensure we do not break our system’s high availability and resilience by deploying those tools and optimizations in an uneven zonal distribution setup that causes hotspots (instances bombarded with traffic). Such deployment may defeat the purpose, causing us to fix our cost and performance problem by creating a new one.
Overall, optimizing cross-AZ traffic is crucial; it can affect our applications' performance, latency, and cost, but it has to be handled carefully to ensure that we do not sacrifice our app’s reliability and high availability.