Project #1 - Buffer Pool Manager
Do not post your project on a public GitHub repository.
Overview
This semester, you will build a disk-oriented database management system (DBMS) called BusTub. A disk-oriented architecture means that the DBMS's primary storage location is persistent storage, such as a hard disk drive (HDD) or solid-state drive (SSD). This is different from an in-memory DBMS, where data is stored in volatile memory.
The first programming project is to implement the DBMS's buffer pool manager. The buffer pool is responsible for moving physical pages of data back and forth from buffers in main memory to persistent storage. It also behaves as a cache, keeping frequently used pages in memory for faster access, and evicting unused or cold pages back out to storage.
A page in BusTub is 4096 bytes (4 KB) of data, meaning the buffer pool manages data in 4 KB units. Since pages in BusTub have a fixed size, the buffer pool manager stores these pages into fixed-size buffers called frames. The distinction between a page and a frame is somewhat subtle. A page is 4 KB of logical (virtual) data, and can be stored in memory, on disk, or both in memory and on disk. A frame, on the other hand, is a fixed-length 4 KB block of memory (i.e., a pointer to this memory) that stores a single page of data. The analogy here is storing (logical) pages inside (physical) fixed frames.
In addition to behaving as a cache, the buffer pool manager allows a DBMS to support databases that are larger than the amount of memory available to the system. Consider a computer with 1 GB of memory (RAM). If we want to manage a 2 GB database, a buffer pool manager gives us the ability to interact with this database without needing to fit its entire contents in memory.
The I/O operations that the buffer pool executes are abstracted away from other parts of the DBMS. For example, when one of the DBMS's components (e.g., the execution engine) asks the buffer pool manager for a page of data using its unique identifier (page_id_t), that component does not need to know whether that page is already in memory or whether the system has to retrieve it from disk. Similarly, the buffer pool manager does not need to understand the contents of these pages; it only needs to know where the data is located.
Implementation
Your implementation of the buffer pool must be thread-safe. Multiple threads will concurrently access the internal data structures of your buffer pool, and you must make sure that critical sections are protected with latches (these are called "locks" in operating systems).
You must implement the following storage manager components:
- LRU-K Replacement Policy (Task #1)
- Disk Scheduler (Task #2)
- Buffer Pool Manager (Task #3)
This is a single-person project that must be completed individually (i.e., no groups).
- Release Date: Jan 22, 2025
- Due Date: Feb 09, 2025 @ 11:59pm
Project Specification
Remember to pull the latest code from the BusTub repository.
For each of the following components, we have provided stub classes that contain the API that you must implement. You should not modify the signatures for the pre-defined functions in these classes. If you modify the signatures, our grading test code will not work and you will not get credit for this project.
If a class already contains data members, you should not remove them. For example, the BufferPoolManager class contains DiskScheduler and LRUKReplacer members that are required to implement functionality needed by the rest of the system. You may add data members and helper functions to these classes to correctly implement the required functionality.
You may use any built-in C++17 containers in your project unless specified otherwise. It is up to you to decide which ones you want to use. Be warned that these containers are not thread-safe, and you will need to use latches to protect access to them. You may not use additional third-party libraries (e.g., Boost).
Task #1 - LRU-K Replacement Policy
This component is responsible for tracking page usage in the buffer pool in order to determine candidate pages / frames to evict out of memory and back to disk.
You will implement a class called LRUKReplacer in src/include/buffer/lru_k_replacer.h and its corresponding implementation file in src/buffer/lru_k_replacer.cpp. Note that LRUKReplacer is a standalone class and is not related to any of the other Replacer classes. You are only expected to implement the LRU-K replacement policy; you do not have to implement the LRU or Clock replacement policies (even though there is a corresponding file for them).
The LRU-K algorithm evicts a frame whose backward k-distance is the maximum of all frames in the replacer. Backward k-distance is computed as the difference in time between the current timestamp and the timestamp of the kth previous access. A frame with fewer than k historical accesses is given +inf as its backward k-distance. If multiple frames have +inf backward k-distance, the replacer evicts the frame with the earliest overall timestamp (i.e., the frame whose least-recent recorded access is the overall least recent access).
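To make the rule concrete, below is a minimal, self-contained sketch of the victim-selection logic described above. The FrameHistory struct and PickVictim function are purely illustrative and are not part of the provided skeleton; your LRUKReplacer will keep equivalent state internally and must also handle locking.

```cpp
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

using frame_id_t = int32_t;  // BusTub's frame id type

// Illustrative per-frame bookkeeping (not part of the provided skeleton).
struct FrameHistory {
  frame_id_t id;
  bool evictable;
  std::deque<size_t> timestamps;  // recorded access timestamps, oldest first
};

// LRU-K rule: frames with fewer than k accesses have +inf backward k-distance
// and are evicted first (ties broken by earliest recorded access); otherwise
// evict the frame whose k-th most recent access is oldest. Assumes every
// tracked frame has at least one recorded access.
auto PickVictim(const std::vector<FrameHistory> &frames, size_t k)
    -> std::optional<frame_id_t> {
  std::optional<frame_id_t> victim;
  bool victim_inf = false;
  size_t victim_key = 0;
  for (const auto &f : frames) {
    if (!f.evictable) {
      continue;
    }
    const bool inf = f.timestamps.size() < k;
    const size_t key =
        inf ? f.timestamps.front() : f.timestamps[f.timestamps.size() - k];
    const bool better = !victim.has_value() || (inf && !victim_inf) ||
                        (inf == victim_inf && key < victim_key);
    if (better) {
      victim = f.id;
      victim_inf = inf;
      victim_key = key;
    }
  }
  return victim;
}
```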
The maximum size of the LRUKReplacer is the same as the size of the buffer pool, since it contains placeholders for all of the frames in the BufferPoolManager. However, not all frames in the replacer may be considered evictable at any given time. The size of the LRUKReplacer is represented by the number of evictable frames. The LRUKReplacer is initialized with no frames in it. Only when a frame is marked as evictable will the replacer's size increase. Similarly, when a frame is pinned (marked non-evictable), the replacer's size will decrease.
You will need to implement the following methods for LRU-K as defined in the header file (src/include/buffer/lru_k_replacer.h ) and in the source file (src/buffer/lru_k_replacer.cpp ):
- Evict() -> std::optional<frame_id_t>: Evict the frame that has the largest backward k-distance compared to all other evictable frames being tracked by the Replacer. If there are no evictable frames, return std::nullopt.
- RecordAccess(frame_id_t frame_id): Record that the given frame has been accessed at the current timestamp. This method should be called after a page has been pinned in the BufferPoolManager.
- Remove(frame_id_t frame_id): Clear all access history associated with a frame. This method should be called only when a page is deleted in the BufferPoolManager.
- SetEvictable(frame_id_t frame_id, bool set_evictable): This method controls whether a frame is evictable or not. It also controls the LRUKReplacer's size. You'll know when to call this function when you implement the BufferPoolManager. To be specific, when the pin count of a page hits 0, its corresponding frame should be marked as evictable.
- Size() -> size_t: This method returns the number of evictable frames that are currently in the LRUKReplacer.
The implementation details are up to you. You are allowed to use built-in STL containers. You may assume that you will not run out of memory for these data structures (you cannot assume the same for the buffer pool in Task #3, where you will run out of available frames). You must make sure that your implementation is thread-safe.
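As one possible (but by no means required) starting point, the internal state could look roughly like the sketch below, where every public method takes a single latch for its whole duration. The member names are made up for illustration and are not in the provided header.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <unordered_map>

using frame_id_t = int32_t;

// Illustrative internal layout only; the real class declaration lives in
// src/include/buffer/lru_k_replacer.h.
class LRUKReplacerSketch {
 public:
  explicit LRUKReplacerSketch(size_t num_frames, size_t k) : k_(k) { (void)num_frames; }

  auto Size() -> size_t {
    std::scoped_lock lock(latch_);  // protect the STL containers below
    return curr_size_;
  }

 private:
  struct FrameEntry {
    std::deque<size_t> history_;  // up to k most recent access timestamps
    bool is_evictable_{false};
  };

  std::unordered_map<frame_id_t, FrameEntry> frames_;
  size_t current_timestamp_{0};  // incremented on every RecordAccess
  size_t curr_size_{0};          // number of evictable frames
  size_t k_;
  std::mutex latch_;
};
```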
If you would like to read more about the LRU-K replacement algorithm, refer to this paper.
Task #2 - Disk Scheduler
This component is responsible for scheduling read and write operations on the DiskManager. You will implement a class called DiskScheduler in src/include/storage/disk/disk_scheduler.h and its corresponding implementation file in src/storage/disk/disk_scheduler.cpp.
The disk scheduler can be used by other components (in this case, your BufferPoolManager in Task #3) to queue disk requests, represented by a DiskRequest struct (already defined in src/include/storage/disk/disk_scheduler.h). The disk scheduler will maintain a background worker thread which is responsible for processing scheduled requests.
The disk scheduler will utilize a shared queue (channel) to schedule and process the DiskRequests. One thread will add a request to the queue, and the disk scheduler's background worker will process the queued requests. We have provided a Channel class in src/include/common/channel.h to facilitate the thread-safe sharing of data between threads, but feel free to use your own implementation if you find it necessary.
The DiskScheduler constructor and destructor are already implemented and are responsible for creating and joining the background worker thread. You will only need to implement the following methods as defined in the header file (src/include/storage/disk/disk_scheduler.h) and in the source file (src/storage/disk/disk_scheduler.cpp):
- Schedule(DiskRequest r): Schedules a request for the DiskManager to execute. The DiskRequest struct specifies whether the request is for a read or a write, where the data should be read from / written into, and the page ID for the operation. The DiskRequest also includes a std::promise whose value should be set to true once the request is processed. See below for more information about std::promise.
- StartWorkerThread(): The startup method for the background worker thread which processes the scheduled requests. The worker thread is created in the DiskScheduler constructor and calls this method. This worker thread is responsible for receiving queued requests and dispatching them to the DiskManager. Remember to set the value correctly on the DiskRequest's callback to signal to the request issuer that the request has been completed. This method should not return until the DiskScheduler's destructor is called.
We mentioned that one of the fields of a DiskRequest is a std::promise. If you are unfamiliar with C++ promises and futures, you can check out the documentation here. For the purposes of this project, they essentially provide a callback mechanism for a thread to know when its scheduled request is completed. To see an example of how they might be used, check out disk_scheduler_test.cpp.
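As a rough illustration of that callback mechanism (independent of the actual DiskRequest layout), the issuer keeps the future half of the pair, ships the promise half with the request, and blocks on the future until the worker thread fulfills it:

```cpp
#include <future>
#include <iostream>

int main() {
  std::promise<bool> callback;                     // travels with the request
  std::future<bool> done = callback.get_future();  // stays with the issuer

  // ... the scheduler's worker thread performs the read/write, then signals:
  callback.set_value(true);

  // The issuer blocks here until the request has been processed.
  if (done.get()) {
    std::cout << "request completed\n";
  }
  return 0;
}
```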
Again, the implementation details are up to you. You must make sure that your implementation is thread-safe.
Disk Manager
The header containing the DiskManager class is located at src/include/storage/disk/disk_manager.h. It reads page data from disk and writes data to disk. Your disk scheduler will use DiskManager::ReadPage() and DiskManager::WritePage() while it is processing a read or write request.
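To tie the pieces together, the background worker typically loops on the request queue, dispatches each request to DiskManager::ReadPage() or DiskManager::WritePage(), and then fulfills the request's promise. The sketch below substitutes a hand-rolled queue and request struct for the provided Channel and DiskRequest types, so the names are illustrative only.

```cpp
#include <condition_variable>
#include <cstdint>
#include <future>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

using page_id_t = int32_t;

// Stand-in for DiskRequest; the real struct is defined in
// src/include/storage/disk/disk_scheduler.h.
struct RequestSketch {
  bool is_write;
  char *data;
  page_id_t page_id;
  std::promise<bool> done;
};

// Stand-in for the provided Channel: a minimal thread-safe blocking queue.
class QueueSketch {
 public:
  void Put(std::optional<RequestSketch> r) {
    {
      std::scoped_lock lock(mutex_);
      queue_.push(std::move(r));
    }
    cv_.notify_one();
  }

  auto Get() -> std::optional<RequestSketch> {
    std::unique_lock lock(mutex_);
    cv_.wait(lock, [&] { return !queue_.empty(); });
    auto r = std::move(queue_.front());
    queue_.pop();
    return r;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::optional<RequestSketch>> queue_;
};

// Worker loop shape: a std::nullopt request acts as the shutdown signal that
// the destructor would send before joining the thread.
void WorkerLoop(QueueSketch &queue) {
  while (auto request = queue.Get()) {
    // In the real scheduler, call DiskManager::WritePage() or
    // DiskManager::ReadPage() here depending on request->is_write.
    request->done.set_value(true);  // signal the issuer that this request finished
  }
}
```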
Task #3 - Buffer Pool Manager
Finally, you must implement the buffer pool manager (BufferPoolManager)! Echoing the beginning of this page, the BufferPoolManager is responsible for fetching database pages from disk with the DiskScheduler and storing them in memory. The BufferPoolManager can also schedule writes of dirty pages out to disk when it is either explicitly instructed to do so or when it needs to evict a page to make space for a new page.
Your BufferPoolManager implementation will use the LRUKReplacer and DiskScheduler classes that you created in the previous steps of this assignment. The LRUKReplacer will keep track of when pages are accessed so that it can decide which frame to evict when it must make room for a new page. The DiskScheduler will schedule writes and reads to disk on the DiskManager.
We have provided a helper class called FrameHeader, which helps manage the in-memory frames. All access to page data should be through FrameHeaders. FrameHeader has a method called GetData that returns a raw pointer to its frame's memory, and the DiskScheduler / DiskManager will use this pointer to copy the contents of a physical page on disk into memory.
As a reminder, the buffer pool manager does not need to understand the contents of these pages. The only things the BufferPoolManager knows about a page are its page ID (page_id_t) and the FrameHeader it is stored inside of. Also, the BufferPoolManager will reuse the same FrameHeader objects to store data as pages move back and forth between disk and memory. In other words, every FrameHeader will store many different pages throughout the lifetime of the system.
Concurrency
When implementing a multi-threaded buffer pool manager, we must take care to synchronize data access. This means that we do not want multiple copies of the same page in different frames of the buffer pool. If we allowed this, we would encounter this scenario:
- Thread T1 loads page X1 from disk into a frame and starts modifying page X1, and let's call this new version page X2.
- Thread T2 loads page X1 from disk into a different frame and starts modifying this version of page X1, and let's call this other modified version page X3.
- Thread T2 finishes writing and writes X3 back to disk.
- Thread T1 finishes writing and writes X2 back to disk.
- Data race ☠️!
Thus, we keep only 1 version of a page in memory at a time to prevent data synchronization races. Additionally, to prevent us from evicting a page while threads are accessing it, we maintain a reference count / pin count on the frame that stores it. Finally, in order to keep track of which pages are stored in which frames, we also maintain a page table using a hash map that maps page IDs to frames.
The pin count of a frame is the number of threads that have access to the page's data. As long as the pin count on a frame is greater than 0 (implying there is at least 1 thread accessing the page's data), the buffer pool manager is not allowed to evict the page being stored. You can maintain the pin count using the atomic field pin_count_ in the FrameHeader class. Keep in mind that pin_count_ is separate from LRUKReplacer::SetEvictable, so you will need to make sure those are synced properly. You will also have to update the is_dirty_ flag of the FrameHeader when you think it is necessary. If this flag is set when you want to evict a page, you will have to act accordingly to maintain data synchronization between memory and disk.
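For instance, the hand-off between the pin count and the replacer might look roughly like this sketch; FrameSketch and ReplacerSketch are hypothetical stand-ins for FrameHeader and LRUKReplacer, used only to show the bookkeeping.

```cpp
#include <atomic>
#include <cstdint>

using frame_id_t = int32_t;

// Hypothetical stand-ins; the real FrameHeader and LRUKReplacer are provided.
struct FrameSketch {
  frame_id_t frame_id;
  std::atomic<int> pin_count{0};
  bool is_dirty{false};
};

struct ReplacerSketch {
  void SetEvictable(frame_id_t /*id*/, bool /*evictable*/) { /* ... */ }
};

// When the last pin is released, the frame becomes an eviction candidate.
void Unpin(FrameSketch &frame, ReplacerSketch &replacer) {
  if (frame.pin_count.fetch_sub(1) == 1) {        // previous value was 1, now 0
    replacer.SetEvictable(frame.frame_id, true);  // keep the replacer in sync
  }
}

// Before reusing an evicted frame, dirty data must be written back to disk.
void PrepareForEviction(FrameSketch &frame /* , DiskScheduler &scheduler */) {
  if (frame.is_dirty) {
    // Schedule a write of the frame's data and wait for it to complete.
    frame.is_dirty = false;
  }
}
```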
Last, but certainly not least, you will have to implement both ReadPageGuard and WritePageGuard. These classes are RAII objects that provide thread-safe read / write access to the underlying pages. See the implementation section below for more information. You will probably need to implement these in tandem with the BufferPoolManager methods CheckedReadPage and CheckedWritePage. However, if you want to make sure your page guard implementations are correct, you may choose to implement BufferPoolManager::GetPinCount first and then stitch together something that will pass the page guard tests.
Implementation
You will need to implement the following page guard methods defined in the header file (src/include/storage/page/page_guard.h ) and in the source file (src/storage/page/page_guard.cpp ):
- ReadPageGuard::ReadPageGuard()
- ReadPageGuard::ReadPageGuard(ReadPageGuard &&that)
- ReadPageGuard::operator=(ReadPageGuard &&that) -> ReadPageGuard &
- ReadPageGuard::Flush()
- ReadPageGuard::Drop()
- WritePageGuard::WritePageGuard()
- WritePageGuard::WritePageGuard(WritePageGuard &&that)
- WritePageGuard::operator=(WritePageGuard &&that) -> WritePageGuard &
- WritePageGuard::Flush()
- WritePageGuard::Drop()
You do not have to implement these methods before the BufferPoolManager methods. You should probably work on them at the same time.
These methods implement move semantics and RAII for the page guards. If you are unfamiliar with these things, please familiarize yourself with learning materials online. There are many great resources (including articles, Microsoft tutorials, YouTube videos) that explain this in depth. You should not attempt to implement these methods without having a solid understanding of how RAII and move semantics work.
There will likely be a lot of code duplication here (i.e. the two guards should be identical except for a handful of lines). If you want to derive these classes based on a class you create, you are welcome to do so. Just make sure that no interfaces and method signatures are changed!
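If move semantics are new to you, the shape you are aiming for looks roughly like the generic sketch below (not the actual page guard interface): the moved-from guard must give up ownership so that its destructor does not release the page a second time.

```cpp
#include <utility>

// Generic move-only RAII guard; illustrative only, not the real ReadPageGuard.
class GuardSketch {
 public:
  GuardSketch() = default;

  GuardSketch(GuardSketch &&that) noexcept { *this = std::move(that); }

  auto operator=(GuardSketch &&that) noexcept -> GuardSketch & {
    if (this != &that) {
      Drop();                      // release whatever this guard currently holds
      resource_ = that.resource_;  // take over the other guard's state
      that.resource_ = nullptr;    // the moved-from guard must not release it again
    }
    return *this;
  }

  GuardSketch(const GuardSketch &) = delete;  // move-only
  auto operator=(const GuardSketch &) -> GuardSketch & = delete;

  ~GuardSketch() { Drop(); }

  void Drop() {
    if (resource_ != nullptr) {
      // Unpin the page / release the latch here in a real guard.
      resource_ = nullptr;
    }
  }

 private:
  void *resource_{nullptr};  // stand-in for the frame / latch the guard owns
};
```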
You will also need to implement the following BufferPoolManager methods defined in the header file (src/include/buffer/buffer_pool_manager.h) and in the source file (src/buffer/buffer_pool_manager.cpp):
- NewPage() -> page_id_t
- DeletePage(page_id_t page_id) -> bool
- CheckedWritePage(page_id_t page_id) -> std::optional<WritePageGuard>
- CheckedReadPage(page_id_t page_id) -> std::optional<ReadPageGuard>
- FlushPageUnsafe(page_id_t page_id) -> bool
- FlushPage(page_id_t page_id) -> bool
- FlushAllPagesUnsafe()
- FlushAllPages()
- GetPinCount(page_id_t page_id)
All of these methods have detailed documentation comments in the source file. Make sure to read all of these in their entirety! They will contain many useful hints.
You do not need to make your buffer pool manager super efficient. For all of the public BufferPoolManager methods, holding the buffer pool latch from beginning to end should be enough (except when you need to release it early to prevent deadlocks). However, you do need to ensure that your buffer pool manager has reasonable performance; otherwise there will be problems in future projects. You can compare your benchmark results (QPS.1 and QPS.2) with other students' to see if your implementation is too slow.
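In other words, a coarse-grained pattern like the sketch below is acceptable for correctness; the member names here are hypothetical, and the real declarations are in buffer_pool_manager.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>

using page_id_t = int32_t;

// Coarse-grained locking sketch (hypothetical members): each public method
// holds the single buffer pool latch for its entire duration.
class BufferPoolSketch {
 public:
  auto GetPinCount(page_id_t page_id) -> std::optional<size_t> {
    std::scoped_lock lock(latch_);  // held from beginning to end of the method
    // Look up page_id in the page table and return the frame's pin count ...
    (void)page_id;
    return std::nullopt;
  }

 private:
  std::mutex latch_;
};
```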
Please refer to the source files (src/storage/page/page_guard.cpp and src/buffer/buffer_pool_manager.cpp ) for significantly more detailed specifications and documentation.
Leaderboard Task (Optional)
For this project's leaderboard challenge, we are doing a benchmark on your buffer pool manager with a special storage backend.
Optimizing for the leaderboard is optional (i.e., you can get a perfect score on the project after finishing all of the previous tasks). However, your solution must finish the leaderboard test with a correct result and without deadlocks or segfaults.
The leaderboard test is compiled with the release profile:
mkdir cmake-build-relwithdebinfo
cd cmake-build-relwithdebinfo
cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo
make -j `nproc` bpm-bench
# The command below is just for illustrating how the bpm-bench test works.
# We are NOT running tests with the same parameter for the leaderboard test.
./bin/bustub-bpm-bench --duration 5000 --latency 1
We strongly recommend that you checkpoint your code before optimizing for the leaderboard tests, so that if these optimizations cause problems in future projects, you can always revert them.
In the leaderboard test, we will have multiple threads accessing the pages on the disk. There are two types of threads running in the benchmark:
- Scan threads. Each scan thread will read pages on the disk sequentially. There will be 8 scan threads.
- Get threads. Each get thread will randomly select and update a page using a Zipfian distribution. There will be 8 get threads.
We will run the benchmark three times on the in-memory storage backend, each time for 30 seconds. For the first and second runs, the benchmark runs directly with different buffer pool and replacer settings. For the third run, we will add a 1-millisecond latency to each random read/write operation and a 0.1-millisecond latency to sequential/local read/write operations. See the DiskManagerUnlimitedMemory class for more information.
The final score is computed as a weighted QPS of scan and get operations, with and without latency respectively:
scan_qps_large / 1000 + get_qps_large / 1000 + scan_qps_small / 1000 + get_qps_small / 1000 + scan_qps_1ms + get_qps_1ms
Recommended Optimizations
- Better replacer algorithm. Given that the get workload is skewed (i.e., some pages are accessed more frequently than others), you can design your LRU-K replacer to take the page access type into consideration so as to reduce page misses.
- Parallel I/O operations. Instead of processing one request at a time in your disk scheduler, you can issue multiple requests to the disk manager at the same time. This optimization is very useful on modern storage devices, where concurrent access to the disk can make better use of the disk bandwidth. You should handle the case where multiple operations to the same page are in the queue; the end result of these requests should be as if they were processed in order. Within a single thread, they should have read-after-write consistency.
- To achieve true parallelism in the disk scheduler, you will also need to allow your buffer pool manager to handle multiple ReadPage and WritePage requests and to evict multiple pages at the same time. You might need to bring in a condition variable in your buffer pool manager to manage free pages.
- You can use our provided lock-free queue implementation in third_party/readerwriterqueue and create your own promise implementation that is compatible with std::promise so as to lower the overhead of inter-thread communication. Note that in this project, all requests must still go through the DiskScheduler's background thread.
Leaderboard Policy
- Submissions with leaderboard bonus are subject to manual review by TAs.
- By "review", we mean that TAs will manually look at your code; if they cannot tell whether an optimization is correct just by reading it, they will make simple modifications to existing test cases to see whether your leaderboard optimization correctly handles the specific cases you intended to optimize.
- One example of simple modification: change the buffer pool manager size for the benchmark.
- Your optimization should not affect correctness. You can optimize for specific cases, but it should work for all inputs in your optimized cases.
- Disallowed: comparing the plan with the leaderboard test and converting it to ValueExecutor with the output table in Project #3. That is because your optimization should work for all table contents; hardcoding the answer will yield wrong results in some cases.
- You should not try to detect whether your submission is running the leaderboard test by using side information, unless we explicitly allow you to do so.
- Disallowed: using #ifdef NDEBUG, etc.
- Submissions with obvious correctness issues will not be assigned leaderboard bonus.
- You cannot use late days for leaderboard tests.
- If you are unsure about whether an optimization is reasonable, you should post on Piazza or visit any TA's office hour.
Instructions
See the Project #0 instructions on how to create your private repository and set up your development environment.
Testing
You can test the individual components of this assignment using our testing framework. We use GTest for unit test cases. There are separate files that contain tests for each component:
- LRUKReplacer: test/buffer/lru_k_replacer_test.cpp
- DiskScheduler: test/storage/disk_scheduler_test.cpp
- PageGuard: test/storage/page_guard_test.cpp
- BufferPoolManager: test/buffer/buffer_pool_manager_test.cpp
You can compile and run each test individually from the command-line:
$ make lru_k_replacer_test -j `nproc`
$ ./test/lru_k_replacer_test
You can also run make check-tests to run ALL of the test cases. Note that some tests are disabled as you have not implemented future projects. You can disable tests in GTest by adding a DISABLED_ prefix to the test name.
Important: These tests are only a subset of all the tests that we will use to evaluate and grade your project. You should write additional test cases on your own to check the complete functionality of your implementation.
Formatting
Your code must follow the Google C++ Style Guide. We use Clang to automatically check the quality of your source code. Your project grade will be zero if your submission fails any of these checks.
Execute the following commands to check your syntax. The format target will automatically correct your code. The check-lint and check-clang-tidy-p1 targets will print errors and instruct you how to fix them to conform to our style guide.
$ make format
$ make check-clang-tidy-p1
Memory Leaks
For this project, we use LLVM Address Sanitizer (ASAN) and Leak Sanitizer (LSAN) to check for memory errors. To enable ASAN and LSAN, configure CMake in debug mode and run tests as you normally would. If there is a memory error, you will see a memory error report. Note that macOS only supports the address sanitizer, not the leak sanitizer.
In some cases, address sanitizer might affect the usability of the debugger. In this case, you might need to disable all sanitizers by configuring the CMake project with:
$ cmake -DCMAKE_BUILD_TYPE=Debug -DBUSTUB_SANITIZER= ..
Development Hints
You can use BUSTUB_ASSERT for assertions in debug mode. Note that the statements within BUSTUB_ASSERT will NOT be executed in release mode. If you have something to assert in all cases, use BUSTUB_ENSURE instead.
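For example (assuming the two-argument (condition, message) form from src/include/common/macros.h; the member names below are illustrative only):

```cpp
// Debug-only check: compiled out in release mode, so it must have no side effects.
BUSTUB_ASSERT(frame->pin_count_ > 0, "cannot unpin a frame that is not pinned");

// Always-on check: use this for invariants that must hold even in release builds.
BUSTUB_ENSURE(page_table_.count(page_id) == 1, "page must be present in the page table");
```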
Post all of your questions about this project on Piazza. Do not email the TAs directly with questions.
We encourage you to use a graphical debugger to debug your project if you are having problems.
If you are having compilation problems, running make clean does not completely reset the compilation process. You will need to delete your build directory and run cmake .. again before you rerun make.
Grading Rubric
Each project submission will be graded based on the following criteria:
- Does the submission successfully execute all of the test cases and produce the correct answer?
- Does the submission execute without any memory leaks?
- Does the submission follow the code formatting and style policies?
Note that we will use additional test cases to grade your submission that are more complex than the sample test cases that we provide you.
Late Policy
See the late policy in the syllabus.
Submission
After completing the assignment, you can submit your implementation to Gradescope:
Running make submit-p1 in your build/ directory will generate a zip archive called project1-submission.zip under your project root directory, which you can submit to Gradescope.
You can submit your answers as many times as you like and get immediate feedback.
Notes on Gradescope and Autograder
- If you are timing out on Gradescope, it is likely because you have a deadlock in your code or your code is too slow and does not finish within 60 seconds. If your code is too slow, it may be because your LRUKReplacer is not efficient enough.
- The autograder will not work if you are printing too many logs in your submission.
- If the autograder did not work properly, make sure that your formatting commands work and that you are submitting the right files.
- The leaderboard benchmark score will be calculated by stress testing your buffer_pool_manager implementation.
CMU students should use the Gradescope course code announced on Piazza.
Collaboration Policy
- Every student has to work individually on this assignment.
- Students are allowed to discuss high-level details about the project with others.
- Students are not allowed to copy the contents of a whiteboard after a group meeting with other students.
- Students are not allowed to copy the solutions from another colleague.
WARNING: All of the code for this project must be your own. You may not copy source code from other students or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.