CMU 15-445/645: Database Systems - Buffer Pool Manager Notes
Buffer Pool Manager
LRU-K replacement policy
LRUKNode
1 2 3 4 5 6 7 8 9 10
classLRUKNode { private: /** History of last seen K timestamps of this page. Least recent timestamp stored in front. */ // Remove maybe_unused if you start using them. Feel free to change the member variables as you want.
/** * LRUKReplacer implements the LRU-k replacement policy. * * The LRU-k algorithm evicts a frame whose backward k-distance is maximum * of all frames. Backward k-distance is computed as the difference in time between * current timestamp and the timestamp of kth previous access. * * A frame with less than k historical references is given * +inf as its backward k-distance. When multiple frames have +inf backward k-distance, * classical LRU algorithm is used to choose victim. */ classLRUKReplacer { public: /** * * TODO(P1): Add implementation * * @brief a new LRUKReplacer. * @param num_frames the maximum number of frames the LRUReplacer will be required to store */ explicitLRUKReplacer(size_t num_frames, size_t k);
/** * TODO(P1): Add implementation * * @brief Find the frame with largest backward k-distance and evict that frame. Only frames * that are marked as 'evictable' are candidates for eviction. * * A frame with less than k historical references is given +inf as its backward k-distance. * If multiple frames have inf backward k-distance, then evict frame with earliest timestamp * based on LRU. * * Successful eviction of a frame should decrement the size of replacer and remove the frame's * access history. * * @brief-zh-CN 需要做的事情有:找到标记为 'evictable' 的 frame 中 backward k-distance 最大的 frame,然后将 * 其 evict 出去。若成功 evict 一个 frame,还需要减小 replacer_size_,并且删除该 frame 的访问历史,返回 true; * 若没有 frame 可以 evict,返回 false。 * 传入的 frame_id 为 evict 出去的 frame 的 id,需要在函数内部赋值。 * * @param[out] frame_id id of frame that is evicted. * @return true if a frame is evicted successfully, false if no frames can be evicted. */ autoEvict(frame_id_t *frame_id) -> bool;
/** * TODO(P1): Add implementation * * @brief Record the event that the given frame id is accessed at current timestamp. * Create a new entry for access history if frame id has not been seen before. * * If frame id is invalid (ie. larger than replacer_size_), throw an exception. You can * also use BUSTUB_ASSERT to abort the process if frame id is invalid. * * @brief-zh-CN 需要做的事情有:记录当前时间戳下,frame_id 对应的 frame 被访问了一次,如果 frame_id 之前没有被 * 访问过(node_store 中找不到这个 key),那么创建一个新的访问历史(即插入一个新的 LRUKNode)。 * 如果 frame_id 是无效的(即大于 replacer_size_),抛出异常。 * TODO : 这里需要怎么处理 access_type? * * @param frame_id id of frame that received a new access. * @param access_type type of access that was received. This parameter is only needed for * leaderboard tests. */ voidRecordAccess(frame_id_t frame_id, AccessType access_type = AccessType::Unknown);
/** * TODO(P1): Add implementation * * @brief Toggle whether a frame is evictable or non-evictable. This function also * controls replacer's size. Note that size is equal to number of evictable entries. * * If a frame was previously evictable and is to be set to non-evictable, then size should * decrement. If a frame was previously non-evictable and is to be set to evictable, * then size should increment. * * If frame id is invalid, throw an exception or abort the process. * * For other scenarios, this function should terminate without modifying anything. * * @brief-zh-CN 需要做的事情有:切换 frame_id 对应的 frame 的 evictable 状态。 * 如果 frame_id 之前是 evictable 的,现在要设置为 non-evictable,调整 lru_list_/k_list_/curr_size_ * 如果 frame_id 之前是 non-evictable 的,现在要设置为 evictable,调整 lru_list_/k_list_/curr_size_ * 这里需要更新 lru_list_/k_list_ ,其余函数都不会自己更新 evictable 状态,只有这个函数会更新 * 因此要在此处更新 lru_list_ 和 k_list_ 和 curr_size_ * * 如果 frame_id 是无效的(即大于 replacer_size_),抛出异常。 * 其他情况下,函数应该直接返回。 * * @param frame_id id of frame whose 'evictable' status will be modified * @param set_evictable whether the given frame is evictable or not */ voidSetEvictable(frame_id_t frame_id, bool set_evictable);
/** * TODO(P1): Add implementation * * @brief Remove an evictable frame from replacer, along with its access history. * This function should also decrement replacer's size if removal is successful. * * Note that this is different from evicting a frame, which always remove the frame * with largest backward k-distance. This function removes specified frame id, * no matter what its backward k-distance is. * * If Remove is called on a non-evictable frame, throw an exception or abort the * process. * * If specified frame is not found, directly return from this function. * * @brief-zh-CN 需要做的事情有:从 replacer 中移除一个 evictable 的 frame(参数中制定的 frame_id 对应 frame), * 同时移除其访问历史。如果移除成功,需要减小 replacer_size_。 * 如果 Remove 被调用在一个 non-evictable 的 frame 上,抛出异常。(查看异常类型) * 如果指定的 frame 不存在,直接返回。 * * @param frame_id id of frame to be removed */ voidRemove(frame_id_t frame_id);
/** * TODO(P1): Add implementation * * @brief Return replacer's size, which tracks the number of evictable frames. * * @brief-zh-CN 返回 curr_size_ * * @return size_t */ autoSize() -> size_t;
private: // TODO(student): implement me! You can replace these member variables as you like. // Remove maybe_unused if you start using them. std::unordered_map<frame_id_t, LRUKNode> node_store_; // 当前 replacer 存储的所有 LRUKNode, key 为 frame_id size_t current_timestamp_{0}; // 当前时间戳,每次 RecordAccess 时更新(++递增) size_t curr_size_{0}; // 当前 replacer 中 evictable 的 frame 的数量(即 lru_list_ + k_list 的大小) size_t replacer_size_; // replacer 最大大小(包括 evictable 和 non-evictable) size_t k_; // k, LRU“K” std::mutex latch_; // 保护 node_store_、current_timestamp_、curr_size_ 的互斥锁
// ADD: 当有多个节点的 backward_k_dist_ 都为 +inf 时,使用 LRU 算法选择 victim std::list<frame_id_t> lru_list_; // 用于记录 frame_id 的访问顺序,最近访问的 frame_id 在 list 的尾部,最远访问的在 list 的头部 // ADD: 记录访问次数大于等于 k 的 frame_id std::list<frame_id_t> k_list_; // 记录访问次数大于等于 k 的 frame_id };
/** * BufferPoolManager reads disk pages to and from its internal buffer pool. */ classBufferPoolManager { public: /** * @brief Creates a new BufferPoolManager. * @param pool_size the size of the buffer pool * @param disk_manager the disk manager * @param replacer_k the LookBack constant k for the LRU-K replacer * @param log_manager the log manager (for testing only: nullptr = disable logging). Please ignore this for P1. * @param-zh-CN pool_size 缓冲池的大小,也就是最大的 frame(page) 数量 */ BufferPoolManager(size_t pool_size, DiskManager *disk_manager, size_t replacer_k = LRUK_REPLACER_K, LogManager *log_manager = nullptr);
/** * @brief Destroy an existing BufferPoolManager. */ ~BufferPoolManager();
/** @brief Return the size (number of frames) of the buffer pool. */ autoGetPoolSize() -> size_t{ return pool_size_; }
/** @brief Return the pointer to all the pages in the buffer pool. */ autoGetPages() -> Page * { return pages_; }
/** * TODO(P1): Add implementation * * @brief Create a new page in the buffer pool. Set page_id to the new page's id, or nullptr if all frames * are currently in use and not evictable (in another word, pinned). * * You should pick the replacement frame from either the free list or the replacer (always find from the free list * first), and then call the AllocatePage() method to get a new page id. If the replacement frame has a dirty page, * you should write it back to the disk first. You also need to reset the memory and metadata for the new page. * * Remember to "Pin" the frame by calling replacer.SetEvictable(frame_id, false) * so that the replacer wouldn't evict the frame before the buffer pool manager "Unpin"s it. * Also, remember to record the access history of the frame in the replacer for the lru-k algorithm to work. * * @brief-zh-CN 需要做的事情有:在 buffer pool 中创建一个新的 page,这里应当调用 AllocatePage() * 来获取一个新的 page id(next_page_id_)(需要先检查 frame 是否有空闲,确认有之后再 AllocatePage()), * 并且赋值给 *page_id,接下来需要与 Buffer Pool Manager * 的 frame_id 产生一个映射,并且记录在 page_table_ 之中 {page_id, frame_id} * 如果所有的 frame 都被使用了,且不可替换,那么返回 nullptr。 * * 查找新的对应的 frame 的时候,首先从 free_list_ 中查看是否有空闲的 frame, * 如果没有,那么从 replacer_ 中选择一个 frame,然后调用 AllocatePage() 来获取新的 page id, * 其次,替换的时候需要注意,如果替换的 frame 有 dirty page,那么需要先将其写回磁盘。然后再映射新 page * * 最后,需要将 replacer_ 中的 frame 标记为不可替换,即 replacer.SetEvictable(frame_id, false) * 同时还需要调用 replacer.RecordAccess(frame_id) 来记录访问历史 * * @param[out] page_id id of created page * @return nullptr if no new pages could be created, otherwise pointer to new page */ autoNewPage(page_id_t *page_id) -> Page *;
/** * TODO(P2): Add implementation * * @brief PageGuard wrapper for NewPage * * Functionality should be the same as NewPage, except that * instead of returning a pointer to a page, you return a * BasicPageGuard structure. * * @brief-zh-CN 功能与 NewPage 相同,只是返回值不同,返回一个 BasicPageGuard 结构 * * @param[out] page_id, the id of the new page * @return BasicPageGuard holding a new page */ autoNewPageGuarded(page_id_t *page_id) -> BasicPageGuard;
/** * TODO(P1): Add implementation * * @brief Fetch the requested page from the buffer pool. Return nullptr if page_id needs to be * fetched from the disk but all frames are currently in use and not evictable (in another word, * pinned). * * First search for page_id in the buffer pool. If not found, pick a replacement frame from either * the free list or the replacer (always find from the free list first), read the page from disk by * scheduling a read DiskRequest with disk_scheduler_->Schedule(), and replace the old page in the * frame. Similar to NewPage(), if the old page is dirty, * you need to write it back to disk and update the metadata of the new page * * In addition, remember to disable eviction and record the access history of the frame like you did for NewPage(). * * @brief-zh-CN 需要做的事情有:根据 page_id_ 从 buffer pool 中获取 page_id 对应的 page, * 如果 page_id 对应的 page 不在 buffer pool 中,那么需要从磁盘中读取 page_id 对应的 page,但是如果所有的 frame * 都被使用了(free_list_ full),且不可替换(all pinned),那么返回 nullptr(无法 fetch)。 * 若存在可用的 frame,那么需要调用 disk_scheduler_->Schedule() 此后进行读取磁盘操作,然后替换旧的 page。 * 若旧的 page 是 dirty 的,那么需要将其写回磁盘,并且更新新 page 的 metadata。 * * 最后,需要将 replacer_ 中的 frame 标记为不可替换,即 replacer.SetEvictable(frame_id, false) * 同时还需要调用 replacer.RecordAccess(frame_id) 来记录访问历史 * * * @param page_id id of page to be fetched * @param access_type type of access to the page, only needed for leaderboard tests. * @return nullptr if page_id cannot be fetched, otherwise pointer to the requested page */ autoFetchPage(page_id_t page_id, AccessType access_type = AccessType::Unknown) -> Page *;
/** * TODO(P2): Add implementation * * @brief PageGuard wrappers for FetchPage * * Functionality should be the same as FetchPage, except * that, depending on the function called, a guard is returned. * If FetchPageRead or FetchPageWrite is called, it is expected that * the returned page already has a read or write latch held, respectively. * * @param page_id, the id of the page to fetch * @return PageGuard holding the fetched page */ autoFetchPageBasic(page_id_t page_id) -> BasicPageGuard; autoFetchPageRead(page_id_t page_id) -> ReadPageGuard; autoFetchPageWrite(page_id_t page_id) -> WritePageGuard;
/** * TODO(P1): Add implementation * * @brief Unpin the target page from the buffer pool. If page_id is not in the buffer pool or its pin count is already * 0, return false. * * Decrement the pin count of a page. If the pin count reaches 0, the frame should be evictable by the replacer. * Also, set the dirty flag on the page to indicate if the page was modified. * * @brief-zh-CN 需要做的事情有:将 page_id 对应的 page 的 pin count 减一,如果 pin count 减为 0, * 那么需要将 replacer_ 中的 frame 标记为可替换,即 replacer.SetEvictable(frame_id, true)。 * 同时,设置 page 的 dirty flag 来表示 page 是否被修改过。 * 以及记录访问历史 RecordAccess(frame_id, access_type) * * 注意如果 page_id 不在 page table 中,或者其 pin count 已经为 0,那么返回 false。 * * @param page_id id of page to be unpinned * @param is_dirty true if the page should be marked as dirty, false otherwise * @param access_type type of access to the page, only needed for leaderboard tests. * @return false if the page is not in the page table or its pin count is <= 0 before this call, true otherwise */ autoUnpinPage(page_id_t page_id, bool is_dirty, AccessType access_type = AccessType::Unknown) -> bool;
/** * TODO(P1): Add implementation * * @brief Flush the target page to disk. * * Use the DiskManager::WritePage() method to flush a page to disk, REGARDLESS of the dirty flag. * Unset the dirty flag of the page after flushing. * * @brief-zh-CN 需要做的事情有:将 page_id 对应的 page 写回磁盘,不管 page 是否 dirty。 * 使用 DiskManager::WritePage() 方法来将 page 写回磁盘,然后将 page 的 dirty flag 置为 false。 * * 如果 page_id 不在 page table 中,返回 false。否则返回 true。 * * @param page_id id of page to be flushed, cannot be INVALID_PAGE_ID * @return false if the page could not be found in the page table, true otherwise */ autoFlushPage(page_id_t page_id) -> bool;
/** * TODO(P1): Add implementation * * @brief Flush all the pages in the buffer pool to disk. * * @brief-zh-CN 需要做的事情有:将 buffer pool 中的所有 page 都写回磁盘。 */ voidFlushAllPages();
/** * TODO(P1): Add implementation * * @brief Delete a page from the buffer pool. If page_id is not in the buffer pool, do nothing and return true. If the * page is pinned and cannot be deleted, return false immediately. * * After deleting the page from the page table, stop tracking the frame in the replacer and add the frame * back to the free list. Also, reset the page's memory and metadata. Finally, you should call DeallocatePage() to * imitate freeing the page on the disk. * * @brief-zh-CN 需要做的事情有:删除 buffer pool 中对应的 page * 返回值注意:如果 page_id 不在 page table 中,那么直接返回 **true**,如果 page 存在且被 pin 住了, * 那么不能删除,立即返回 false。 * * 删除 page 之后,需要从 page table 中删除 page_id 对应的记录,然后停止 replacer_ 中对应 frame 的记录, * 并且将 frame 添加到 free_list_ 中。然后重置 page 的 memory 和 metadata。 * 最后,调用 DeallocatePage() 来模拟在磁盘上释放 page。 * * @param page_id id of page to be deleted * @return false if the page exists but could not be deleted, true if the page didn't exist or deletion succeeded */ autoDeletePage(page_id_t page_id) -> bool;
private: /** Number of pages in the buffer pool. */ constsize_t pool_size_; /** The next page id to be allocated */ std::atomic<page_id_t> next_page_id_ = 0;
/** Array of buffer pool pages. */ Page *pages_; /** Pointer to the disk sheduler. */ std::unique_ptr<DiskScheduler> disk_scheduler_ __attribute__((__unused__)); /** Pointer to the log manager. Please ignore this for P1. */ LogManager *log_manager_ __attribute__((__unused__)); /** Page table for keeping track of buffer pool pages. */ std::unordered_map<page_id_t, frame_id_t> page_table_; /** Replacer to find unpinned pages for replacement. */ std::unique_ptr<LRUKReplacer> replacer_; /** List of free frames that don't have any pages on them. */ std::list<frame_id_t> free_list_; /** This latch protects shared data structures. We recommend updating this comment to describe what it protects. */ std::mutex latch_;
/** * @brief Allocate a page on disk. Caller should acquire the latch before calling this function. * @return the id of the allocated page */ autoAllocatePage() -> page_id_t;
/** * @brief Deallocate a page on disk. Caller should acquire the latch before calling this function. * @param page_id id of the page to deallocate */ voidDeallocatePage(__attribute__((unused)) page_id_t page_id){ // This is a no-nop right now without a more complex data structure to track deallocated pages }
// TODO(student): You may add additional private members and helper functions };