
Introduction of TiDB Architecture

Understanding TiDB talks about the architecture of TiDB, the modules it consists of, and the responsibility of each module.

TiDB Architecture

When people refer to TiDB, they usually mean the entire TiDB distributed database, which includes three components: the stateless TiDB server, the Placement Driver (PD) server, and the storage server (TiKV or TiFlash). The TiDB server does not store data; it only computes and processes SQL queries. The PD server is the managing component of the entire cluster. The storage server is responsible for persistently storing data.

Let's look at an architecture diagram from the perspective of the stateless TiDB server.

[Figure: tidb-architecture]

As you can see, TiDB is a SQL engine that supports the MySQL protocol and uses a distributed, transactional KV storage engine as its underlying storage.

This raises three significant questions.

  1. How does TiDB support the MySQL protocol?
  2. How does it communicate with the storage engine to store and load data?
  3. How does it implement SQL functions?

This section starts with brief descriptions of the modules TiDB has and what they do, and then puts them together to answer these three questions.

Code Structure

The TiDB source code is fully hosted on GitHub, and you can see all the information from the repository homepage. The whole repository is developed in Go and divided into many packages according to functional modules.

Most packages export their services in the form of interfaces, and the functionality of each feature is mostly concentrated in one package. However, some packages provide basic functionality that many other packages depend on, so these packages need special attention.

The main method of TiDB is located in tidb-server/main.go, which defines how the service is started.

The build system of the entire project can be found in the Makefile.

In addition to the code, there are many test cases, which can be found in files with the suffix _test.go. There are also tools under the cmd directory for running performance tests or constructing test data.

Module Structure

TiDB has a number of modules. The table below gives an overview of what each module does, so if you want to see the code for a particular function, you can go directly to the corresponding module.

| Package | Description |
| --- | --- |
| pingcap/tidb/pkg/bindinfo | Handles all global SQL bind operations, and caches the SQL bind info from storage. |
| pingcap/tidb/pkg/config | The configuration definition. |
| pingcap/tidb/pkg/ddl | The execution logic of data definition language (DDL). |
| pingcap/tidb/pkg/distsql | The abstraction of the distributed computing interfaces to isolate the logic between the executor and the TiKV client. |
| pingcap/tidb/pkg/domain | The abstraction of a storage space in which databases and tables can be created. Like a namespace, databases with the same name can exist in different domains. In most cases, a single TiDB instance only creates one Domain instance with details about the information schema and statistics. |
| pingcap/tidb/pkg/errno | The definition of MySQL error code, error message, and error summary. |
| pingcap/tidb/pkg/executor | The operator-related code that contains the execution logic of most statements. |
| pingcap/tidb/pkg/expression | The expression-related code that contains various operators and built-in functions. |
| pingcap/tidb/pkg/infoschema | The metadata management module for SQL statements; accessed when all the operations on the information schema are executed. |
| pingcap/tidb/pkg/kv | The Key-Value engine interface and some public methods; the interfaces defined in this package need to be implemented by any storage engine that is to be adapted to the TiDB SQL layer. |
| pingcap/tidb/pkg/lock | The implementation of LOCK/UNLOCK TABLES. |
| pingcap/tidb/pkg/meta | Manages the SQL metadata in the storage engine through the features of the structure package; infoschema and DDL use this module to access or modify the SQL metadata. |
| pingcap/tidb/pkg/meta/autoid | A module to generate the globally unique monotonically incremental IDs for each table, as well as the database ID and table ID. |
| pingcap/tidb/pkg/metrics | Stores the metrics information of all modules. |
| pingcap/tidb/pkg/owner | Some tasks in the TiDB cluster can be executed by only one instance, such as the asynchronous schema change. This owner module is used to coordinate and generate a task executor among multiple TiDB servers. Each task has its own executor. |
| pingcap/tidb/pkg/parser | A MySQL-compatible SQL parser used by TiDB; it also contains the data structure definition of the abstract syntax tree (AST) and other metadata. |
| pingcap/tidb/pkg/planner | Query optimization related code. |
| pingcap/tidb/pkg/plugin | The plugin framework of TiDB. |
| pingcap/tidb/pkg/privilege | The management interface of user privileges. |
| pingcap/tidb/pkg/server | Code of the MySQL protocol and connection management. |
| pingcap/tidb/pkg/session | Code of session management. |
| pingcap/tidb/pkg/sessionctx/binloginfo | Outputs binlog information. |
| pingcap/tidb/pkg/sessionctx/stmtctx | Necessary information for the statement of a session during runtime. |
| pingcap/tidb/pkg/sessionctx/variable | System variable related code. |
| pingcap/tidb/pkg/statistics | Code of table statistics. |
| pingcap/tidb/pkg/store | Storage engine drivers, wrapping the Key-Value client to meet the requirements of TiDB. |
| tikv/client-go | The Go client of TiKV. |
| pingcap/tidb/pkg/structure | The structured API defined on the Transactional Key-Value API, providing structures like List, Queue, and HashMap. |
| pingcap/tidb/pkg/table | The abstraction of Table in SQL. |
| pingcap/tidb/pkg/tablecodec | Encodes and decodes data between SQL and Key-Value. See the codec package for the specific encoding and decoding solution for each data type. |
| pingcap/tidb/pkg/telemetry | Code for telemetry collection and reporting. |
| pingcap/tidb/cmd/tidb-server | The main method of the TiDB service. |
| pingcap/tidb/pkg/types | All the type related code, including the definition of and operation on types. |
| pingcap/tidb/pkg/util | Utilities. |

At a glance, TiDB has 80 packages, which might feel overwhelming. However, not all of them are important, and some features involve only a small number of packages, so where to start reading the source code depends on your purpose.

If you want to understand the implementation details of a specific feature, then you can refer to the module description above and just find the corresponding module.

If you want to have a comprehensive understanding of the source code, then you can start from tidb-server/main.go and see how tidb-server starts and how it waits for and handles user requests. Then follow the code all the way through to see the exact execution of the SQL. There are also some important modules that need to be looked at to know how they are implemented. For the auxiliary modules, you can look at them selectively to get a general impression.

SQL Layer Architecture

[Figure: sql-layer-architecture]

This is a detailed SQL layer architecture diagram. You can read it from left to right.

Protocol Layer

The leftmost part is the Protocol Layer of TiDB, which is the interface for interacting with clients. Currently TiDB only supports the MySQL protocol, and the related code is in the server package.

The purpose of this layer is to manage client connections, parse MySQL commands, and return execution results. The implementation follows the MySQL protocol; see the MySQL Client/Server Protocol document for details. If you need MySQL protocol parsing and processing in your own project, you can refer to this module.

The logic for connection establishment is in the Run() method of server.go, mainly in the following lines.

```go
conn, err := s.listener.Accept()
clientConn := s.newConn(conn)
go s.onConn(clientConn)
```
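
The accept-then-spawn pattern in these lines can be sketched with only the Go standard library. This is a minimal illustration, not TiDB's code: `serve`, `handle`, `roundTrip`, and the echo behavior are hypothetical names and behavior chosen for the example, and `handle` merely stands in for the role that onConn plays.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// handle serves one client connection. Here it echoes a single line back;
// it is a hypothetical stand-in for the per-connection work done by onConn.
func handle(conn net.Conn) {
	defer conn.Close()
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return
	}
	fmt.Fprintf(conn, "echo: %s", line)
}

// serve accepts connections until the listener is closed and starts one
// goroutine per connection, so a slow client never blocks Accept.
func serve(l net.Listener) {
	for {
		conn, err := l.Accept()
		if err != nil {
			return // listener closed
		}
		go handle(conn)
	}
}

// roundTrip starts a server on a random loopback port, sends msg as one
// line, and returns the server's reply.
func roundTrip(msg string) (string, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer l.Close()
	go serve(l)

	conn, err := net.Dial("tcp", l.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\n", msg); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := roundTrip("ping")
	fmt.Printf("%q %v\n", reply, err)
}
```

Running each connection in its own goroutine keeps the accept loop free to take the next client immediately, which is the point of the `go` statement in the excerpt above.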

The entry point for processing a command within a single session is the dispatch method of clientConn, where the protocol is parsed and the command is passed to the appropriate handler.
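
As a rough sketch of what such a dispatch step looks like: in the MySQL protocol's command phase, the first byte of a packet payload identifies the command. The function below routes on that byte. It is an illustration under stated assumptions, not the body of TiDB's dispatch method; the handler results are hypothetical stubs, and only three commands are covered.

```go
package main

import "fmt"

// Command bytes defined by the MySQL client/server protocol.
const (
	comQuit  byte = 0x01
	comQuery byte = 0x03
	comPing  byte = 0x0e
)

// dispatch reads the command byte at the start of the payload and routes
// the remaining bytes to a handler. The returned strings are hypothetical
// stubs standing in for real handlers.
func dispatch(payload []byte) (string, error) {
	if len(payload) == 0 {
		return "", fmt.Errorf("empty packet")
	}
	cmd, data := payload[0], payload[1:]
	switch cmd {
	case comQuit:
		return "bye", nil
	case comPing:
		return "ok", nil
	case comQuery:
		// For COM_QUERY the rest of the payload is the SQL text.
		return "query: " + string(data), nil
	default:
		return "", fmt.Errorf("unsupported command 0x%02x", cmd)
	}
}

func main() {
	out, err := dispatch(append([]byte{comQuery}, "SELECT 1"...))
	fmt.Println(out, err)
}
```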

SQL Layer

Generally speaking, a SQL statement needs to go through a series of processes:

  1. syntax parsing
  2. validity verification
  3. building query plan
  4. optimizing query plan
  5. generating executor according to plan
  6. executing and returning results
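
To make the flow of these six steps concrete, here is a toy pipeline for statements shaped like "SELECT <int> + <int>". Every type and function in it (`ast`, `plan`, `parse`, `buildPlan`, `optimize`, `buildExecutor`, `run`) is hypothetical and vastly simpler than the real parser, planner, and executor packages; it only shows how each stage hands its output to the next.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ast is a toy syntax tree for statements shaped like "SELECT <int> + <int>".
type ast struct{ left, right string }

// plan is a toy query plan: an addition that may be folded into a constant.
type plan struct {
	left, right int
	folded      bool
	value       int
}

// parse covers step 1 (syntax parsing).
func parse(sql string) (*ast, error) {
	f := strings.Fields(sql)
	if len(f) != 4 || !strings.EqualFold(f[0], "SELECT") || f[2] != "+" {
		return nil, fmt.Errorf("syntax error in %q", sql)
	}
	return &ast{left: f[1], right: f[3]}, nil
}

// buildPlan covers steps 2 and 3 (validity verification, building the plan).
func buildPlan(a *ast) (*plan, error) {
	l, err := strconv.Atoi(a.left)
	if err != nil {
		return nil, fmt.Errorf("invalid operand %q", a.left)
	}
	r, err := strconv.Atoi(a.right)
	if err != nil {
		return nil, fmt.Errorf("invalid operand %q", a.right)
	}
	return &plan{left: l, right: r}, nil
}

// optimize covers step 4; the only rule here is constant folding.
func optimize(p *plan) *plan {
	return &plan{left: p.left, right: p.right, folded: true, value: p.left + p.right}
}

// buildExecutor covers step 5: turn the plan into something runnable.
func buildExecutor(p *plan) func() int {
	if p.folded {
		return func() int { return p.value }
	}
	return func() int { return p.left + p.right }
}

// run chains all stages and covers step 6 (execute and return the result).
func run(sql string) (int, error) {
	a, err := parse(sql)
	if err != nil {
		return 0, err
	}
	p, err := buildPlan(a)
	if err != nil {
		return 0, err
	}
	return buildExecutor(optimize(p))(), nil
}

func main() {
	fmt.Println(run("SELECT 1 + 2"))
}
```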

These steps are implemented in the following packages:

| Package | Usage |
| --- | --- |
| pingcap/tidb/pkg/server | Interface between protocol layer and SQL layer |
| pingcap/tidb/pkg/parser | SQL parsing and syntax analysis |
| pingcap/tidb/pkg/planner | Validation, query plan building, query plan optimizing |
| pingcap/tidb/pkg/executor | Executor generation and execution |
| pingcap/tidb/pkg/distsql | Sends requests to TiKV and aggregates the results returned from TiKV via the TiKV client |
| pingcap/tidb/pkg/kv | KV client interface |
| tikv/client-go | TiKV Go client |

KV API Layer

TiDB relies on the underlying storage engine to store and load data. It does not rely on a specific storage engine (such as TiKV), but has some requirements for the storage engine, and any engine that meets these requirements can be used (TiKV is the most suitable one).

The most basic requirement is "Key-Value engine with transactions and Golang driver". The more advanced requirement is "support for distributed computation interface", so that TiDB can push some computation requests down to the storage engine.

These requirements can be found in the interfaces of the kv package, and the storage engine needs to provide a Golang driver that implements these interfaces, which TiDB then uses to manipulate the underlying data.

The following interfaces cover the most basic requirement:

  • Transaction: Basic manipulation of transaction
  • Retriever: Interface for reading data
  • Mutator: Interface for mutating data
  • Storage: Basic functionality provided by the driver
  • Snapshot: Basic manipulation of data snapshot
  • Iterator: Result of Seek, used to iterate data
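
The sketch below shows how interfaces of this kind fit together, using a tiny in-memory engine. The interface names (`reader`, `mutator`, `transaction`, `storage`) and their method signatures are simplified assumptions for illustration, not the definitions in the kv package; snapshots and iterators are omitted, and there is no concurrency control or conflict detection.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("key not found")

// Simplified, hypothetical stand-ins for the read, write, transaction,
// and driver interfaces described above.
type reader interface {
	Get(key string) (string, error)
}

type mutator interface {
	Set(key, value string)
	Delete(key string)
}

type transaction interface {
	reader
	mutator
	Commit() error
	Rollback()
}

type storage interface {
	Begin() transaction
}

// memStore is a toy single-threaded engine backed by a map.
type memStore struct{ data map[string]string }

func newMemStore() *memStore { return &memStore{data: map[string]string{}} }

func (s *memStore) Begin() transaction {
	return &memTxn{store: s, writes: map[string]*string{}}
}

// memTxn buffers writes until Commit; a nil entry marks a delete.
type memTxn struct {
	store  *memStore
	writes map[string]*string
}

// Get checks the transaction's own buffered writes before the store.
func (t *memTxn) Get(key string) (string, error) {
	if v, ok := t.writes[key]; ok {
		if v == nil {
			return "", errNotFound
		}
		return *v, nil
	}
	if v, ok := t.store.data[key]; ok {
		return v, nil
	}
	return "", errNotFound
}

func (t *memTxn) Set(key, value string) { t.writes[key] = &value }
func (t *memTxn) Delete(key string)     { t.writes[key] = nil }

// Commit applies the buffered writes to the store.
func (t *memTxn) Commit() error {
	for k, v := range t.writes {
		if v == nil {
			delete(t.store.data, k)
		} else {
			t.store.data[k] = *v
		}
	}
	t.writes = map[string]*string{}
	return nil
}

// Rollback discards the buffered writes.
func (t *memTxn) Rollback() { t.writes = map[string]*string{} }

func main() {
	var s storage = newMemStore()
	txn := s.Begin()
	txn.Set("k", "v")
	_ = txn.Commit()
	fmt.Println(s.Begin().Get("k"))
}
```

Because the SQL layer would only see the interfaces, any engine that provides a driver satisfying them could be swapped in, which is the design point of this layer.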

With the above interfaces, you can perform all the required operations on the data and implement all the SQL functions. However, for more efficient computing, we have also defined an advanced computing interface, which centers on these three interfaces or structures:

  • Client: Send request to storage engine
  • Request: Payload of the request
  • Response: Abstraction of result
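
The idea of pushing computation down through a client, a request, and a response can be sketched as follows. All of the types here (`request`, `response`, `row`, `client`, `memEngine`) and the `minValue` filter are hypothetical and invented for this example; they are not the Client, Request, and Response definitions of the kv package. The point is only that the filter travels inside the request and is evaluated next to the data, so only matching rows come back.

```go
package main

import (
	"fmt"
	"sort"
)

// request is a hypothetical payload: a half-open key range
// [startKey, endKey) plus a filter to evaluate on the storage side.
type request struct {
	startKey, endKey string
	minValue         int // keep only rows with value >= minValue
}

type row struct {
	key   string
	value int
}

// response abstracts the result as a cursor over the returned rows.
type response struct {
	rows []row
	pos  int
}

// Next returns the next row, or false when the result is exhausted.
func (r *response) Next() (row, bool) {
	if r.pos >= len(r.rows) {
		return row{}, false
	}
	out := r.rows[r.pos]
	r.pos++
	return out, true
}

// client sends a request to the storage engine.
type client interface {
	Send(req request) *response
}

// memEngine is a toy storage engine that applies the filter itself.
type memEngine struct{ data map[string]int }

func (e *memEngine) Send(req request) *response {
	keys := make([]string, 0, len(e.data))
	for k := range e.data {
		if k >= req.startKey && k < req.endKey {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // return rows in key order
	resp := &response{}
	for _, k := range keys {
		if v := e.data[k]; v >= req.minValue {
			resp.rows = append(resp.rows, row{key: k, value: v})
		}
	}
	return resp
}

// collectKeys drains a response and returns the keys it yielded.
func collectKeys(c client, req request) []string {
	var out []string
	resp := c.Send(req)
	for r, ok := resp.Next(); ok; r, ok = resp.Next() {
		out = append(out, r.key)
	}
	return out
}

func main() {
	e := &memEngine{data: map[string]int{"a": 1, "b": 5, "c": 9, "d": 7}}
	fmt.Println(collectKeys(e, request{startKey: "a", endKey: "d", minValue: 5}))
}
```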

Summary

This section described the source code structure of TiDB and the architecture of its three significant components. More details are given in later sections.