Introduction to the TiDB Architecture
Understanding TiDB covers the architecture of TiDB, the modules it consists of, and the responsibility of each module.
TiDB Architecture
When people refer to TiDB, they usually mean the entire TiDB distributed database, which includes three components: the stateless TiDB server, the Placement Driver (PD) server, and the storage server (TiKV or TiFlash). The TiDB server does not store data; it only computes and processes SQL queries. The PD server is the management component of the entire cluster. The storage server is responsible for persistently storing data.
Let's look at the architecture diagram from the perspective of the stateless TiDB server.
As you can see, TiDB is a SQL engine that supports the MySQL protocol and uses a distributed KV storage engine with transaction support as its underlying storage.
This raises three significant questions.
- How does TiDB support the MySQL protocol?
- How does it communicate with the storage engine to store and load data?
- How does it implement SQL functions?
This section starts with brief descriptions of the modules TiDB has and what they do, and then puts them together to answer these three questions.
Code Structure
The TiDB source code is fully hosted on GitHub, and you can find all the information on the repository homepage. The whole repository is written in Go and divided into many packages according to functional modules.
Most packages export their services in the form of interfaces, and the functionality of each module is mostly concentrated in one package. Some packages, however, provide basic functionality that many other packages depend on, so they need special attention.
The main method of TiDB is located in `cmd/tidb-server/main.go`, which defines how the service is started.
The build system of the entire project can be found in the `Makefile`.
In addition to the code, there are many test cases, which can be found in files with the suffix `_test.go`. There is also a toolkit under the `cmd` directory for running performance tests or constructing test data.
Module Structure
TiDB has a number of modules. The table below gives an overview of what each module does, so if you want to read the code for a particular function, you can go straight to the corresponding module.
Package | Description |
---|---|
pingcap/tidb/pkg/bindinfo | Handles all global SQL bind operations, and caches the SQL bind info from storage. |
pingcap/tidb/pkg/config | The configuration definition. |
pingcap/tidb/pkg/ddl | The execution logic of data definition language (DDL). |
pingcap/tidb/pkg/distsql | The abstraction of the distributed computing interfaces to isolate the logic between the executor and the TiKV client. |
pingcap/tidb/pkg/domain | The abstraction of a storage space in which databases and tables can be created. Like a namespace, databases with the same name can exist in different domains. In most cases, a single TiDB instance only creates one Domain instance with details about the information schema and statistics. |
pingcap/tidb/pkg/errno | The definition of MySQL error code, error message, and error summary. |
pingcap/tidb/pkg/executor | The operator-related code that contains the execution logic of most statements. |
pingcap/tidb/pkg/expression | The expression-related code that contains various operators and built-in functions. |
pingcap/tidb/pkg/infoschema | The metadata management module for SQL statements; accessed when all the operations on the information schema are executed. |
pingcap/tidb/pkg/kv | The Key-Value engine interface and some public methods; the interfaces defined in this package need to be implemented by any storage engine that is to be adapted to the TiDB SQL layer. |
pingcap/tidb/pkg/lock | The implementation of LOCK/UNLOCK TABLES. |
pingcap/tidb/pkg/meta | Manages the SQL metadata in the storage engine through the features of the structure package; infoschema and DDL use this module to access or modify the SQL metadata. |
pingcap/tidb/pkg/meta/autoid | A module to generate the globally unique monotonically increasing IDs for each table, as well as the database ID and table ID. |
pingcap/tidb/pkg/metrics | Stores the metrics information of all modules. |
pingcap/tidb/pkg/owner | Some tasks in the TiDB cluster can be executed by only one instance, such as the asynchronous schema change. The owner module coordinates multiple TiDB servers and selects a task executor among them. Each task has its own executor. |
pingcap/tidb/pkg/parser | A MySQL-compatible SQL parser used by TiDB; it also contains the data structure definition of the abstract syntax tree (AST) and other metadata. |
pingcap/tidb/pkg/planner | Query optimization-related code. |
pingcap/tidb/pkg/plugin | The plugin framework of TiDB. |
pingcap/tidb/pkg/privilege | The management interface of user privileges. |
pingcap/tidb/pkg/server | Code for the MySQL protocol and connection management. |
pingcap/tidb/pkg/session | Code for session management. |
pingcap/tidb/pkg/sessionctx/binloginfo | Outputs binlog information. |
pingcap/tidb/pkg/sessionctx/stmtctx | Necessary information for the statement of a session during runtime. |
pingcap/tidb/pkg/sessionctx/variable | System variable-related code. |
pingcap/tidb/pkg/statistics | Code for table statistics. |
pingcap/tidb/pkg/store | Storage engine drivers, wrapping the Key-Value client to meet the requirements of TiDB. |
tikv/client-go | The Go client of TiKV. |
pingcap/tidb/pkg/structure | The structured API defined on the Transactional Key-Value API, providing structures like List, Queue, and HashMap. |
pingcap/tidb/pkg/table | The abstraction of a table in SQL. |
pingcap/tidb/pkg/tablecodec | Encodes and decodes data between SQL and Key-Value. See the codec package for the specific encoding and decoding scheme for each data type. |
pingcap/tidb/pkg/telemetry | Code for collecting and reporting telemetry. |
pingcap/tidb/cmd/tidb-server | The main method of the TiDB service. |
pingcap/tidb/pkg/types | All type-related code, including the definitions of types and the operations on them. |
pingcap/tidb/pkg/util | Utilities. |
At a glance, TiDB has 80 packages, which might feel overwhelming. However, not all of them are important, and some features involve only a small number of packages, so where to start reading the source code depends on why you are reading it.
If you want to understand the implementation details of a specific feature, refer to the module descriptions above and go to the corresponding module.
If you want a comprehensive understanding of the source code, start from `cmd/tidb-server/main.go` and see how tidb-server starts and how it waits for and handles user requests. Then follow the code all the way through to see exactly how a SQL statement is executed. Some important modules also deserve a close look to understand how they are implemented. For the auxiliary modules, you can read selectively to get a general impression.
SQL Layer Architecture
This is a detailed SQL layer architecture diagram. You can read it from left to right.
Protocol Layer
The leftmost part is the Protocol Layer of TiDB, which is the interface for interacting with clients. Currently TiDB supports only the MySQL protocol, and the related code is in the `server` package.
The purpose of this layer is to manage client connections, parse MySQL commands, and return execution results. The implementation follows the MySQL protocol; for details, refer to the MySQL Client/Server Protocol documentation. If you need MySQL protocol parsing and processing in your own project, you can use this module as a reference.
The logic for establishing connections is in the `Run()` method of `server.go`, mainly in the following lines.
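```go
conn, err := s.listener.Accept() // wait for the next client connection
clientConn := s.newConn(conn)    // wrap the raw connection in a clientConn
go s.onConn(clientConn)          // serve the connection in its own goroutine
```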
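To see the accept-and-serve pattern in isolation, here is a minimal, self-contained sketch. It is not TiDB's code: the `server` type, its `run` and `onConn` methods, and the `demo` helper are hypothetical stand-ins, and the handler only sends a greeting line instead of speaking the MySQL protocol.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// server is a hypothetical stand-in for TiDB's server type; only the
// listener field is modeled here.
type server struct {
	listener net.Listener
}

// run accepts connections until the listener is closed and serves each
// one in its own goroutine, following the Accept / onConn shape above.
func (s *server) run() {
	for {
		conn, err := s.listener.Accept()
		if err != nil {
			return // listener closed; stop accepting
		}
		go s.onConn(conn)
	}
}

// onConn is a placeholder handler: it sends one greeting line and closes.
func (s *server) onConn(conn net.Conn) {
	defer conn.Close()
	fmt.Fprintln(conn, "hello")
}

// demo starts the server on a free loopback port, connects once as a
// client, and returns the line the server sent.
func demo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	s := &server{listener: ln}
	go s.run()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer c.Close()
	return bufio.NewReader(c).ReadString('\n')
}

func main() {
	line, err := demo()
	fmt.Printf("%q %v\n", line, err)
}
```

Running each connection in its own goroutine keeps a slow client from blocking the accept loop, which is presumably why the excerpt above uses `go s.onConn(clientConn)`.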
The entry point for processing a command within a single session is the `dispatch` method of the `clientConn` type, where the protocol packet is parsed and passed to the appropriate handler.
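As a rough illustration of what such a dispatcher does, the sketch below routes a command packet by its first byte. The command byte values come from the MySQL client/server protocol; the `dispatch` function here and its string results are hypothetical and much simpler than TiDB's `clientConn.dispatch`.

```go
package main

import (
	"errors"
	"fmt"
)

// Command bytes defined by the MySQL client/server protocol.
const (
	comQuit   byte = 0x01
	comInitDB byte = 0x02
	comQuery  byte = 0x03
	comPing   byte = 0x0e
)

// dispatch inspects the first byte of a command packet payload and
// routes the remaining bytes to a (toy) handler. The returned string
// only describes what a real handler would do.
func dispatch(payload []byte) (string, error) {
	if len(payload) == 0 {
		return "", errors.New("empty command packet")
	}
	cmd, data := payload[0], payload[1:]
	switch cmd {
	case comQuit:
		return "quit", nil
	case comInitDB:
		return "use " + string(data), nil
	case comQuery:
		return "query: " + string(data), nil
	case comPing:
		return "pong", nil
	default:
		return "", fmt.Errorf("unsupported command 0x%02x", cmd)
	}
}

func main() {
	packet := append([]byte{comQuery}, "SELECT 1"...)
	fmt.Println(dispatch(packet))
}
```

In this shape, supporting a new command means adding one `case`, and unknown commands fail with an explicit error rather than being silently ignored.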
SQL Layer
Generally speaking, a SQL statement needs to go through a series of steps:
- syntax parsing
- validity verification
- building the query plan
- optimizing the query plan
- generating the executor according to the plan
- executing and returning results
These steps are implemented in the following modules:
Package | Usage |
---|---|
pingcap/tidb/pkg/server | Interface between protocol layer and SQL layer |
pingcap/tidb/pkg/parser | SQL parsing and syntax analysis |
pingcap/tidb/pkg/planner | Validation, query plan building, query plan optimizing |
pingcap/tidb/pkg/executor | Executor generation and execution |
pingcap/tidb/pkg/distsql | Send requests to TiKV and aggregate the results returned from TiKV via the TiKV client |
pingcap/tidb/pkg/kv | KV client interface |
tikv/client-go | TiKV Go client |
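The steps above can be sketched as a toy pipeline. Everything in this example is hypothetical and unrelated to TiDB's real types: it only understands statements of the form `SELECT <int> + <int>`, validates inside plan building, and uses constant folding as its single optimization rule.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ast is the toy syntax tree for "SELECT <int> + <int>".
type ast struct{ left, right string }

// plan is the toy query plan; folded is set once the optimizer has
// pre-computed the constant expression into value.
type plan struct {
	left, right int
	folded      bool
	value       int
}

// executor produces the result rows for a plan.
type executor struct{ p plan }

// parse performs syntax parsing only; it does not check operand types.
func parse(sql string) (ast, error) {
	f := strings.Fields(sql)
	if len(f) != 4 || !strings.EqualFold(f[0], "SELECT") || f[2] != "+" {
		return ast{}, fmt.Errorf("syntax error near %q", sql)
	}
	return ast{left: f[1], right: f[3]}, nil
}

// buildPlan validates the operands and builds an unoptimized plan.
func buildPlan(a ast) (plan, error) {
	l, err := strconv.Atoi(a.left)
	if err != nil {
		return plan{}, fmt.Errorf("invalid operand %q", a.left)
	}
	r, err := strconv.Atoi(a.right)
	if err != nil {
		return plan{}, fmt.Errorf("invalid operand %q", a.right)
	}
	return plan{left: l, right: r}, nil
}

// optimize applies one rewrite rule: constant folding.
func optimize(p plan) plan {
	p.value, p.folded = p.left+p.right, true
	return p
}

// run executes the plan and returns a single-row, single-column result.
func (e executor) run() []int {
	if e.p.folded {
		return []int{e.p.value}
	}
	return []int{e.p.left + e.p.right}
}

// execute chains the stages: parse, validate and plan, optimize,
// build the executor, and run it.
func execute(sql string) ([]int, error) {
	a, err := parse(sql)
	if err != nil {
		return nil, err
	}
	p, err := buildPlan(a)
	if err != nil {
		return nil, err
	}
	e := executor{p: optimize(p)}
	return e.run(), nil
}

func main() {
	fmt.Println(execute("SELECT 1 + 2"))
}
```

Keeping each stage a separate function with its own input and output type mirrors the package split in the table: a stage can be read or replaced without touching the others.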
KV API Layer
TiDB relies on the underlying storage engine to store and load data. It is not tied to a specific storage engine (such as TiKV), but it does place some requirements on the engine, and any engine that meets them can be used (TiKV is the most suitable one).
The most basic requirement is a "Key-Value engine with transactions and a Go driver". The more advanced requirement is "support for the distributed computation interface", so that TiDB can push some computation requests down to the storage engine.
These requirements are expressed as interfaces in the `kv` package. The storage engine needs to provide a Go driver that implements these interfaces, which TiDB then uses to manipulate the underlying data.
For the most basic requirement, the relevant interfaces are:
- `Transaction`: Basic operations on a transaction
- `Retriever`: Interface for reading data
- `Mutator`: Interface for mutating data
- `Storage`: Basic functionality provided by the driver
- `Snapshot`: Basic operations on a data snapshot
- `Iterator`: Result of `Seek`, used to iterate over data
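To make the division of responsibilities more concrete, here is a minimal in-memory sketch with interfaces shaped like those in the list above. The signatures are simplified assumptions, not the actual `kv` package API, and `memStore`, `memTxn`, and `errNotFound` are hypothetical. The toy transaction buffers writes until `Commit` and does no conflict detection or snapshot isolation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNotFound = errors.New("key not found")

// Simplified, hypothetical interfaces; the real kv package is richer.
type Retriever interface {
	Get(key string) (string, error)
}

type Mutator interface {
	Set(key, value string)
	Delete(key string)
}

type Transaction interface {
	Retriever
	Mutator
	Commit() error
}

type Storage interface {
	Begin() Transaction
}

// memStore is a toy single-process Storage backed by a map.
type memStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newMemStore() *memStore { return &memStore{data: map[string]string{}} }

func (s *memStore) Begin() Transaction {
	return &memTxn{store: s, writes: map[string]*string{}}
}

// memTxn buffers writes locally; a nil entry marks a deletion.
type memTxn struct {
	store  *memStore
	writes map[string]*string
}

func (t *memTxn) Set(key, value string) { t.writes[key] = &value }
func (t *memTxn) Delete(key string)     { t.writes[key] = nil }

// Get reads the transaction's own buffered writes first, then the store.
func (t *memTxn) Get(key string) (string, error) {
	if v, ok := t.writes[key]; ok {
		if v == nil {
			return "", errNotFound
		}
		return *v, nil
	}
	t.store.mu.Lock()
	defer t.store.mu.Unlock()
	if v, ok := t.store.data[key]; ok {
		return v, nil
	}
	return "", errNotFound
}

// Commit applies all buffered writes to the store under one lock.
func (t *memTxn) Commit() error {
	t.store.mu.Lock()
	defer t.store.mu.Unlock()
	for k, v := range t.writes {
		if v == nil {
			delete(t.store.data, k)
		} else {
			t.store.data[k] = *v
		}
	}
	t.writes = map[string]*string{}
	return nil
}

func main() {
	var s Storage = newMemStore()
	txn := s.Begin()
	txn.Set("k", "v")
	_, err := s.Begin().Get("k") // another transaction cannot see it yet
	fmt.Println(err)
	_ = txn.Commit()
	fmt.Println(s.Begin().Get("k"))
}
```

Because the SQL layer in this sketch only ever sees `Storage` and `Transaction`, swapping the map for a different engine would not change the calling code, which is the point of defining the requirements as interfaces.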
With the interfaces above, you can perform all the required operations on the data and implement all SQL functions. For more efficient computation, however, we have also defined an advanced computing interface, which centers on these three interfaces or structures:
- `Client`: Sends requests to the storage engine
- `Request`: Payload of the request
- `Response`: Abstraction of the result
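The following sketch illustrates the idea of pushing computation down through a client, request, and response trio. All fields and method signatures are assumptions made for illustration, not the real `kv` definitions: the storage side is a set of in-memory partitions, the request asks each partition to count its matching rows, and the caller only sums the partial counts.

```go
package main

import "fmt"

// Request describes the work pushed down to storage (hypothetical field).
type Request struct {
	MinValue int // count only rows whose value is >= MinValue
}

// Response yields one partial result per storage partition.
type Response interface {
	Next() (partial int, ok bool)
}

// Client sends a Request to the storage engine.
type Client interface {
	Send(req Request) Response
}

// region holds the rows stored in one partition.
type region []int

// memClient is a toy Client over in-memory partitions.
type memClient struct{ regions []region }

// sliceResponse iterates over pre-computed partial counts.
type sliceResponse struct {
	partials []int
	pos      int
}

func (r *sliceResponse) Next() (int, bool) {
	if r.pos >= len(r.partials) {
		return 0, false
	}
	n := r.partials[r.pos]
	r.pos++
	return n, true
}

// Send evaluates the filter inside each partition, so only one count
// per partition travels back instead of every row.
func (c *memClient) Send(req Request) Response {
	partials := make([]int, 0, len(c.regions))
	for _, rg := range c.regions {
		n := 0
		for _, v := range rg {
			if v >= req.MinValue {
				n++
			}
		}
		partials = append(partials, n)
	}
	return &sliceResponse{partials: partials}
}

// countAtLeast plays the SQL layer: it sends one request and sums the
// partial counts returned by the partitions.
func countAtLeast(c Client, threshold int) int {
	resp := c.Send(Request{MinValue: threshold})
	total := 0
	for {
		n, ok := resp.Next()
		if !ok {
			return total
		}
		total += n
	}
}

func main() {
	c := &memClient{regions: []region{{1, 5, 9}, {4, 7}, {10}}}
	fmt.Println(countAtLeast(c, 5))
}
```

The design choice to note is where the filter runs: the rows never leave their partition, so the amount of data returned scales with the number of partitions rather than the number of rows.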
Summary
This section covers the source structure of TiDB and the architecture of three significant components. More details are described in later sections.