这是用户在 2024-3-25 18:32 为 https://15445.courses.cs.cmu.edu/fall2023/project0/ 保存的双语快照页面,由 沉浸式翻译 提供双语支持。了解如何保存?

Project #0 - C++ Primer
项目 #0 - C++ 入门

Do not post your project on a public Github repository.
不要在公共 Github 存储库上发布您的项目。

Overview 概述

All the programming projects this semester will be written on the BusTub database management system. This system is written in C++. To make sure that you have the necessary C++ background, you must complete a simple programming assignment to assess your knowledge of basic C++ features. You will not be given a grade for this project, but you must complete the project with a perfect score before being allowed to proceed in the course. Any student unable to complete this assignment before the deadline will be asked to drop the course.
本学期的所有编程项目都将编写在 BusTub 数据库管理系统上。这个系统是用 C++ 编写的。若要确保您具有必要的 C++ 背景,必须完成简单的编程作业,以评估您对基本 C++ 功能的了解。你不会得到这个项目的成绩,但你必须以满分完成这个项目,然后才能被允许继续学习课程。任何无法在截止日期前完成此作业的学生都将被要求放弃课程。

All of the code in this programming assignment must be written in C++. The projects will be specifically written for C++17, but we have found that it is generally sufficient to know C++11. If you have not used C++ before, here are some resources to help:
此编程作业中的所有代码都必须用 C++ 编写。这些项目将专门为 C++ 编写17,但我们发现了解 C++ 11 通常就足够了。如果您以前没有使用过 C++,这里有一些资源可以提供帮助:

If you are using VSCode, we recommend you to install CMake Tools, C/C++ Extension Pack and clangd. After that, follow this tutorial to learn how to use the visual debugger in VSCode: Debug a C++ project in VS Code.
如果您使用的是 VSCode,我们建议您安装 CMake 工具、C/C++ 扩展包和 clangd。之后,按照本教程了解如何在 VSCode 中使用可视化调试器:在 VS Code 中调试 C++ 项目。

If you are using CLion, we recommend you to follow this tutorial: CLion Debugger Fundamentals.
如果您使用的是 CLion,我们建议您遵循以下教程:CLion 调试器基础知识。

If you prefer to use gdb for debugging, there are many tutorials available to teach you how to use gdb. Here are some that we have found useful:
如果您更喜欢用于 gdb 调试,有许多教程可以教您如何使用 gdb .以下是我们发现有用的一些:

This is a single-person project that will be completed individually (i.e. no groups).

Project Specification 项目规格

In this project, you will implement a key-value store backed by a copy-on-write trie. Tries are efficient ordered-tree data structures for retrieving a value for a given key. To simplify the explanation, we will assume that the keys are variable-length strings, but in practice they can be any arbitrary type.

Each node in a trie can have multiple child nodes representing different possible next characters.
trie 中的每个节点都可以有多个子节点,这些子节点表示不同的可能的下一个字符。

The key-value store you will implement can store string keys mapped to values of any type. The value of a key is stored in the node representing the last character of that key (aka terminal node). For example, consider inserting kv pairs ("ab", 1) and ("ac", "val") into the trie.
您将实现的键值存储可以存储映射到任何类型的值的字符串键。键的值存储在表示该键的最后一个字符的节点(也称为终端节点)中。例如,考虑将 kv 对 ("ab", 1) 插入到 ("ac", "val") trie 中。


The two keys share the same parent node. The value 1 corresponding to key "ab" is stored in the left child, and the value "val" corresponding to key "ac" is stored in the right node.
这两个键共享同一个父节点。键“ab”对应的值 1 存储在左子节点中,键“ac”对应的值“val”存储在右节点中。

Task #1 - Copy-On-Write Trie
任务 #1 - 写入时复制尝试

In this task, you will need to modify trie.h and trie.cpp to implement a copy-on-write trie. In a copy-on-write trie, operations do not directly modify the nodes of the original trie. Instead, new nodes are created for modified data, and a new root is returned for the newly-modified trie.
在此任务中,您将需要修改 trie.htrie.cpp 实现写入时复制尝试。在写入时复制尝试中,操作不会直接修改原始尝试的节点。相反,将为修改后的数据创建新节点,并为新修改的尝试返回新的根。

Copy-on-write enables us to access the trie after each operation at any time with minimum overhead. Consider inserting ("ad", 2) in the above example. We create a new Node2 by reusing two of the child nodes from the original tree, and creating a new value node 2. (See figure below)
写入时复制使我们能够在每次操作后随时以最小的开销访问 trie。请考虑在上面的示例中插入。 ("ad", 2) 我们通过重用原始树中的两个子节点并创建一个新的值节点 2 来创建一个新 Node2 节点。(见下图)


If we then insert ("b", 3), we will create a new root, a new node and reuse the previous nodes. In this way, we can get the content of the trie before and after each insertion operation. As long as we have the root object (Trie class), we can access the data inside the trie at that time. (See figure below)
如果我们随后插入 ("b", 3) ,我们将创建一个新的根,一个新节点并重用以前的节点。这样,我们就可以得到每次插入操作前后的trie内容。只要我们有根对象( Trie 类),我们就可以访问当时 trie 内部的数据。(见下图)


One more example: if we then insert ("a", "abc") and remove ("ab", 1), we can get the below trie. Note that parent nodes can have values, and you will need to purge all unnecessary nodes after removal. An empty trie is represented by nullptr.
再举一个例子:如果我们然后插入 ("a", "abc") 和删除 ("ab", 1) ,我们可以得到下面的尝试。请注意,父节点可以有值,删除后需要清除所有不必要的节点。空尝试用 表示 nullptr


Your trie must support three operations:
您的 trie 必须支持三个操作:

None of these operations should be performed directly on the trie itself. You should create new trie nodes and reuse existing ones as much as possible.
这些操作都不应直接在尝试本身上执行。您应该创建新的 trie 节点并尽可能多地重用现有节点。

To create a completely new node (i.e., a new leaf node without children), you can simply construct the object using the TrieNodeWithValue constructor and then make it into a smart pointer. To copy-on-write create a new node, you should use the Clone function on the TrieNode class. To reuse an existing node in the new trie, you can copy std::shared_ptr<TrieNode>: copying a shared pointer doesn’t copy the underlying data. You should not manually allocate memory by using new and delete in this project. std::shared_ptr will deallocate the object when no one has a reference to the underlying object.
要创建一个全新的节点(即,一个没有子节点的新叶节点),您只需使用 TrieNodeWithValue 构造函数构造对象,然后将其转换为智能指针即可。若要在写入时复制创建新节点,应使用 TrieNode 类上的 Clone 函数。若要在新尝试中重用现有节点,可以复制 std::shared_ptr<TrieNode> :复制共享指针不会复制基础数据。不应使用 newdelete 在此项目中手动分配内存。 std::shared_ptr 将释放对象,当没有人引用基础对象时。

For the full specifications of these operations, please refer to the comments in the starter code. Your implementation should store data as in the above examples. Do not store the C-string terminator \0 in your trie. Please also avoid removing any const from the class definitions or use mutable / const_cast to bypass the const checks.
有关这些操作的完整规格,请参阅起始代码中的注释。您的实现应存储数据,如上述示例所示。不要将 C 字符串终结器 \0 存储在您的尝试中。也请避免从类定义中删除任何 const 内容或使用 mutable / const_cast 绕过常量检查。

Task #2 - Concurrent Key-Value Store
任务 #2 - 并发键值存储

After you have a copy-on-write trie which can be used in a single-thread environment, implement a concurrent key-value store for a multithreaded environment. In this task, you will need to modify trie_store.h and trie_store.cpp. This key-value store also supports 3 operations:
在具有可在单线程环境中使用的写入时复制尝试后,请为多线程环境实现并发键值存储。在此任务中,您需要修改 trie_store.htrie_store.cpp .此键值存储还支持 3 个操作:

For the original Trie class, everytime we modify the trie, we need to get the new root to access the new content. But for the concurrent key-value store, the put and delete methods do not have a return value. This requires you to use concurrency primitives to synchronize reads and writes so that no data is lost through the process.
对于原始的 Trie 类,每次修改 trie 时,都需要获取新的根才能访问新内容。但对于并发键值存储, put and delete 方法没有返回值。这要求您使用并发基元来同步读取和写入,以便在此过程中不会丢失任何数据。

Your concurrent key-value store should concurrently serve multiple readers and a single writer. That is to say, when someone is modifying the trie, reads can still be performed on the old root. When someone is reading, writes can still be performed without waiting for reads.
并发键值存储应同时为多个读取器和单个写入器提供服务。也就是说,当有人修改 trie 时,仍然可以在旧根上执行读取。当有人正在阅读时,仍然可以在不等待读取的情况下执行写入。

Also, if we get a reference to a value from the trie, we should be able to access it no matter how we modify the trie. The Get function from Trie only returns a pointer. If the trie node storing this value has been removed, the pointer will be dangling. Therefore, in TrieStore, we return a ValueGuard which stores both a reference to the value and the TrieNode corresponding to the root of the trie structure, so that the value can be accessed as we store the ValueGuard.
此外,如果我们从 trie 中获得对值的引用,无论我们如何修改 trie,我们都应该能够访问它。 Get 函数 from Trie 仅返回指针。如果存储此值的 trie 节点已被删除,则指针将悬空。因此,在 中 TrieStore ,我们返回 a ValueGuard ,它既存储了对值的引用,也存储了对应于 trie 结构根的 TrieNode,因此可以在存储 ValueGuard .

To achieve this, we have provided you with the pseudo code for TrieStore::Get in trie_store.cpp. Please read it carefully and think of how to implement TrieStore::Put and TrieStore::Remove.
为此,我们为您提供了 TrieStore::Get in trie_store.cpp 的伪代码。请仔细阅读并思考如何实现 TrieStore::PutTrieStore::Remove .

Task #3 - Debugging
任务 #3 - 调试

In this task, you will learn the basic techniques of debugging. You can choose any way you prefer for debugging, including but not limited to: using cout and printf, using CLion / VSCode debugger, using gdb, etc.
在此任务中,您将学习调试的基本技术。您可以选择任何您喜欢的调试方式,包括但不限于:使用 coutprintf 、使用 CLion/VSCode 调试器、使用 gdb 等。

Please refer to trie_debug_test.cpp for instructions. You will need to set a breakpoint after the trie structure is generated and answer a few questions. You will need to fill in the answer in trie_answer.h.
有关说明,请参阅 trie_debug_test.cpp 。生成 trie 结构后,您需要设置断点并回答几个问题。您需要填写答案 trie_answer.h

Task #4 - SQL String Functions
任务 #4 - SQL 字符串函数

Now it is time to dive into BusTub itself! You will need to implement upper and lower SQL functions. This can be done in 2 steps: (1) implement the function logic in string_expression.h. (2) register the function in BusTub, so that the SQL framework can call your function when the user executes a SQL, in plan_func_call.cpp.
现在是时候深入了解 BusTub 本身了!您将需要实现 upper SQL lower 函数。这可以通过 2 个步骤完成:(1) 实现 中的 string_expression.h 函数逻辑。(2)在 BusTub 中注册函数,以便 SQL 框架可以在用户执行 SQL 时调用您的函数,在 plan_func_call.cpp .

To test your implementation, you can use bustub-shell:
要测试您的实现,您可以使用 bustub-shell

cd build
make -j`nproc` shell
bustub> select upper('AbCd'), lower('AbCd');
ABCD abcd

Your implementation should pass all 3 sqllogictest test cases.
您的实现应通过所有 3 个 sqllogictest 测试用例。

cd build
make -j`nproc` sqllogictest
./bin/bustub-sqllogictest ../test/sql/p0.01-lower-upper.slt --verbose
./bin/bustub-sqllogictest ../test/sql/p0.02-function-error.slt --verbose
./bin/bustub-sqllogictest ../test/sql/p0.03-string-scan.slt --verbose

Note: If you see BufferPoolManager is not implemented yet. when running sqllogictest, this is normal and you can safely ignore this warning in project 0.
注意:如果在运行sqllogictest时看到 BufferPoolManager is not implemented yet. ,这是正常现象,可以安全地忽略项目0中的此警告。

Instructions 指示

Creating Your Own Project Repository

If the below git concepts (e.g., repository, merge, pull, fork) do not make sense to you, please spend some time learning git first.
如果以下 git 概念(例如,存储库、合并、拉取、分叉)对您没有意义,请先花一些时间学习 git。

Follow the instructions to setup your own PRIVATE repository and your own development branch. If you have previuosly forked the repository through the GitHub UI (by clicking Fork), PLEASE DO NOT PUSH ANY CODE TO YOUR PUBLIC FORKED REPOSITORY! Make sure your repository is PRIVATE before you git push any of your code.
按照说明设置您自己的 PRIVATE 存储库和您自己的开发分支。如果您已经通过 GitHub UI 预先分叉了存储库(通过单击 Fork),请不要将任何代码推送到您的公共分叉存储库!在编写 git push 任何代码之前,请确保您的存储库是 PRIVATE。

If the instructor makes any changes to the code, you can merge the changes to your code by keeping your private repository connected to the CMU-DB master repository. Execute the following commands to add a remote source:
如果教师对代码进行了任何更改,您可以通过将私有存储库连接到 CMU-DB 主存储库来合并对代码所做的更改。执行以下命令以添加远程源:

$ git remote add public https://github.com/cmu-db/bustub.git

You can then pull down the latest changes as needed during the semester:

$ git fetch public
$ git merge public/master

Setting Up Your Development Environment

First install the packages that BusTub requires:
首先安装 BusTub 需要的软件包:

# Linux
$ sudo build_support/packages.sh
# macOS
$ build_support/packages.sh

See the README for additional information on how to setup different OS environments.

To build the system from the commandline, execute the following commands:

$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make -j`nproc`

We recommend always configuring CMake in debug mode. This will enable you to output debug messages and check for memory leaks (more on this in below sections).
建议始终在调试模式下配置 CMake。这将使您能够输出调试消息并检查内存泄漏(以下部分将对此进行更多介绍)。

Testing 测试

You can test the individual components of this assignment using our testing framework. We use GTest for unit test cases. You can compile and run each test individually from the command-line:
您可以使用我们的测试框架测试此作业的各个组件。我们将 GTest 用于单元测试用例。可以从命令行单独编译和运行每个测试:

$ cd build
$ make trie_test trie_store_test -j$(nproc)
$ make trie_noncopy_test trie_store_noncopy_test -j$(nproc)
$ ./test/trie_test
$ ./test/trie_noncopy_test
$ ./test/trie_store_test
$ ./test/trie_store_noncopy_test

In this project, there are no hidden tests. In the future, the provided tests in the starter code are only a subset of the all the tests that we will use to evaluate and grade your project.

You should write additional test cases on your own to check the complete functionality of your implementation.

Make sure that you remove the DISABLED_ prefix from the test names otherwise they will not run!
请确保从测试名称中删除前缀, DISABLED_ 否则它们将无法运行!

Formatting 格式

Your code must follow the Google C++ Style Guide. We use Clang to automatically check the quality of your source code. Your project grade will be zero if your submission fails any of these checks.
您的代码必须遵循 Google C++ 样式指南。我们使用 Clang 自动检查您的源代码的质量。如果您提交的未通过任何这些检查,您的项目成绩将为零。

Execute the following commands to check your syntax. The format target will automatically correct your code. The check-lint and check-clang-tidy targets will print errors that you must manually fix to conform to our style guide.
执行以下命令以检查语法。 format 目标将自动更正您的代码。 check-lintcheck-clang-tidy 目标将打印错误,您必须手动修复这些错误以符合我们的样式指南。

$ make format
$ make check-clang-tidy-p0

Memory Leaks 内存泄漏

For this project, we use LLVM Address Sanitizer (ASAN) and Leak Sanitizer (LSAN) to check for memory errors. To enable ASAN and LSAN, configure CMake in debug mode and run tests as you normally would. If there is memory error, you will see a memory error report. Note that macOS only supports address sanitizer without leak sanitizer.
对于此项目,我们使用 LLVM 地址清理程序 (ASAN) 和泄漏清理程序 (LSAN) 来检查内存错误。要启用 ASAN 和 LSAN,请在调试模式下配置 CMake,然后像往常一样运行测试。如果存在内存错误,您将看到内存错误报告。请注意,macOS 仅支持地址清理程序,不支持泄漏清理程序。

In some cases, address sanitizer might affect the usability of the debugger. In this case, you might need to disable all sanitizers by configuring the CMake project with:
在某些情况下,地址清理程序可能会影响调试器的可用性。在这种情况下,可能需要通过使用以下命令配置 CMake 项目来禁用所有清理程序:


Development Hints 开发提示

You can use BUSTUB_ASSERT for assertions in debug mode. Note that the statements within BUSTUB_ASSERT will NOT be executed in release mode. If you have something to assert in all cases, use BUSTUB_ENSURE instead.
可用于 BUSTUB_ASSERT 调试模式下的断言。请注意,其中 BUSTUB_ASSERT 的语句不会在发布模式下执行。如果您在所有情况下都要断言某些内容,请改用 BUSTUB_ENSURE

We will test your implementation in release mode. To compile your solution in release mode,

$ mkdir build_rel && cd build_rel
$ cmake -DCMAKE_BUILD_TYPE=Release ..

Post all of your questions about this project on Piazza. Do not email the TAs directly with questions.

TAs will not look into your code or help you debug in this project.
TA 不会查看你的代码,也不会帮助你在此项目中进行调试。

Grading Rubric 评分标准

In order to pass this project, you must ensure your code follows the following guidelines:

  1. Does the submission successfully execute all of the test cases and produce the correct answer?
  2. Does the submission execute without any memory leaks?
  3. Does the submission follow the code formatting and style policies?

Note that we will use additional test cases to grade your submission that are more complex than the sample test cases that we provide you in future projects.

Late Policy 逾期政策

There are no late days for this project.

Submission 提交

You will submit your implementation to Gradescope:
您将向 Gradescope 提交您的实施:

Run this command in build directory and it will create a zip archive called project0-submission.zip that you can submit to Gradescope.
在目录中 build 运行此命令,它将创建一个名为 project0-submission.zipzip 存档,您可以将其提交到 Gradescope。

$ make submit-p0

Although you are allowed submit your answers as many times as you like, you should not treat Gradescope as your only debugging tool. Most students submit their projects near the deadline, and thus Gradescope will take longer to process the requests. You may not get feedback in a timely manner to help you debug problems. Furthermore, the output from Gradescope is unlikely to be as informative as the output from a debugger (like gdb), provided you invest some time in learning to use it.
尽管您可以根据需要多次提交答案,但不应将 Gradescope 视为唯一的调试工具。大多数学生在截止日期前提交他们的项目,因此 Gradescope 将需要更长的时间来处理请求。您可能无法及时获得反馈来帮助您调试问题。此外,Gradescope 的输出不太可能像调试器(如 gdb )的输出那样提供信息,前提是您投入一些时间来学习使用它。

CMU students should use the Gradescope course code announced on Piazza.

Collaboration Policy 合作政策

WARNING: All of the code for this project must be your own. You may not copy source code from other students or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.