FLASH TRANSLATION LAYER FOR SSD
FlashSim (SSD simulator)
Course and Teammates
For this project, I wrote a flash translation layer (FTL) on a simulated solid-state drive (SSD). The FTL is responsible for:
address translation (mapping logical pages to physical pages)
garbage collection (erasing blocks to free up space when the SSD fills up)
wear leveling (balancing writes across blocks to extend the SSD's lifespan)
The goal is to support READ, WRITE, and TRIM requests while maximizing the lifetime WRITE requests sustained by the SSD.
Followup: Apr 2020
Scoreboard: 1st place out of 95 (5% higher endurance than 2nd place)
Score based on SSD lifetime, write amplification, and memory footprint
SSDs based on NAND flash memory have become the state-of-the-art non-volatile storage medium. Compared to hard-disk drives, SSDs have faster random I/O and can serve multiple requests in parallel. Internally, an SSD is a collection of physical pages, organized in a hierarchy of packages, dies, planes, blocks, and pages. This hierarchy is hidden from upper layers, such as file systems and user programs, which view the SSD as a series of Logical Block Addresses (LBAs). Each LBA corresponds to the size of a physical SSD page. Several key characteristics of an SSD are:
An SSD page cannot be overwritten in place, so every new write must occupy a fresh physical page and invalidate the previously mapped physical page.
Erases must be done on the granularity of physical blocks, so valid pages in a block must be migrated before erasing a block.
Each SSD block can only tolerate a limited number of erases (called its lifetime). If a particular block is erased too frequently, it will wear out before other blocks and make the SSD inoperable.
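The no-overwrite rule can be illustrated with a toy out-of-place write path. This is a hypothetical Python sketch for illustration only (FlashSim itself is a C++ simulator, and `SimpleFTL` is not part of it):

```python
# Hypothetical sketch: out-of-place writes and TRIM on a simulated SSD.
# Every write of the same LBA lands on a NEW physical page; the old one
# becomes invalid and can only be reclaimed by a later block erase.

class SimpleFTL:
    def __init__(self):
        self.mapping = {}   # logical page address -> physical page address
        self.data = {}      # physical page address -> data (simulated)
        self.next_free = 0  # naive bump allocator, for illustration only

    def write(self, lba, value):
        # No in-place update: allocate a fresh physical page each time.
        ppa = self.next_free
        self.next_free += 1
        self.mapping[lba] = ppa   # the old physical page is now invalid
        self.data[ppa] = value

    def read(self, lba):
        ppa = self.mapping.get(lba)
        return self.data[ppa] if ppa is not None else None

    def trim(self, lba):
        # TRIM deletes the LBA; its physical page becomes invalid.
        self.mapping.pop(lba, None)
```

Note that rewriting the same LBA twice consumes two physical pages; reclaiming the first one is the job of garbage collection, described below.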
With page migrations, a simple direct mapping is not possible, so the FTL manages the mapping between LBAs and physical pages. Moreover, when the SSD is close to full, the FTL is responsible for garbage collection: it migrates the valid pages out of selected physical blocks and erases those blocks to free up space. Finally, to maximize the lifetime of the SSD, the FTL applies a wear leveling policy when picking which blocks to erase and which page to direct a new WRITE request to. An ideal wear leveling policy wears out blocks evenly while incurring minimal page migrations.
Our project runs on the FlashSim SSD simulator. Our FTL must support READ (reads data at an LBA), WRITE (writes/updates data at an LBA), and TRIM (deletes an LBA) requests. FTLs are scored on their performance over 10 long request traces. The performance metrics include
SSD Lifetime (max number of WRITE requests completed before the SSD declares failure)
Write Amplification (ratio of physical page writes performed internally by the SSD to WRITE requests sent to the FTL)
Memory Footprint (the peak memory usage of the FTL)
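Write amplification, for instance, can be computed directly from two counters the FTL already keeps. A trivial hypothetical helper:

```python
def write_amplification(physical_page_writes, host_write_requests):
    """WA = internal physical page writes / host WRITE requests.
    WA of 1.0 means no extra migration writes; garbage collection
    migrations push it above 1."""
    return physical_page_writes / host_write_requests

# e.g. 1000 host WRITEs that triggered 250 extra migration writes
# give a write amplification of 1.25
```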
Wear leveling and garbage collection policies for FTLs have been extensively studied. After surveying the literature, I devised an FTL algorithm that combines and adapts ideas from Rejuvenator and LazyFTL.
My FTL stores a page-to-page mapping table. There are two kinds of physical blocks: free blocks (empty and available for writing) and data blocks. We use a designated free block, the CDB (current data block), to accept hot LBAs from both write requests and page migrations, and another free block, the CCB (current cold block), to accept cold LBAs. LBAs accessed within a recent time window are classified as hot; all others are cold. When the CDB fills up, it becomes a data block and we pick the free block with the lowest erase count to be the next CDB; for the CCB, we pick the free block with the highest erase count. Whenever the number of free blocks drops below a lower threshold LT, we erase blocks until we reach a higher threshold HT of free blocks, stopping early if every data block has more than half of its pages valid. We choose the block to clean next according to the following policy:
Let dist be the difference between the maximum and minimum erase counts among data blocks
Case 1: If dist < D, we pick the block with the most invalid pages among all data blocks
Case 2: If dist >= D, we pick the block with the most invalid pages among the data blocks with the minimum erase count
D is a large number in the beginning, and is adaptively decreased to 1 as the max erase count increases. The intuition is that we want to perform Case 1 more often than Case 2 since Case 1 chooses the optimal block to clean among all blocks, and initially a large D allows us to choose Case 1 most of the time. As we near the end of an SSD’s lifetime, we want D to be smaller, because we don’t want some blocks to wear out much faster than the others. Finally, in the case where the system is quite full and there are repeated sequential writes to a few hot blocks, we enter a special mode, SWAP MODE, where we ignore D and always pick the data block with max invalid pages to erase, and swap hot free blocks with cold data blocks after each cleaning period.
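The two-case victim selection can be sketched as follows (hypothetical Python; the per-block `erase_count` and `invalid_pages` counters are assumed bookkeeping):

```python
def pick_victim(data_blocks, D):
    """data_blocks: list of (block_id, erase_count, invalid_pages).
    Case 1 (dist < D): most invalid pages among ALL data blocks.
    Case 2 (dist >= D): most invalid pages among only the blocks
    with the minimum erase count, to rein in wear imbalance."""
    erases = [e for _, e, _ in data_blocks]
    dist = max(erases) - min(erases)
    if dist < D:
        candidates = data_blocks
    else:
        min_erase = min(erases)
        candidates = [b for b in data_blocks if b[1] == min_erase]
    return max(candidates, key=lambda b: b[2])[0]
```

Shrinking D over time shifts selection from the greedy Case 1 toward the wear-bounded Case 2.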
1. Page level vs block level mapping
Many popular FTL policies maintain a block-level mapping from logical blocks to physical blocks, while others use page-level mapping. I chose page-level mapping because it is much more flexible: we can group LBAs from different logical blocks into the same physical block, which allows for the nuanced segregation of hot and cold LBAs into the CDB and CCB.
The downside of page-level mapping is that its memory usage is linear in the number of pages in the SSD, which is unacceptably large given that SSDs typically have a small DRAM. There are many known solutions to this problem. For example, in LazyFTL, two small tables kept in DRAM serve as caches that record recent updates to the page-level mapping table, while the actual page-level mapping table is stored in a designated region of the SSD and lazily updated.
2. Use of CDB and CCB
The CDB is filled with hot LBAs, which include frequently accessed and newly accessed LBAs. It therefore fills up fast, and many of its hot LBAs will likely be rewritten soon, leaving it with many invalid pages and making it a likely erase victim again shortly. To balance out the erase counts, we pick the free block with the lowest erase count to be the next CDB. Conversely, the CCB is filled with cold LBAs, meaning it won't be picked for erasure any time soon, so we pick the free block with the highest erase count to be the next CCB, letting the most-worn blocks rest.
3. The length of each cleaning period
A cleaning period starts when there are not enough (<= LT) free blocks. We keep picking blocks to erase until we reach HT free blocks. The larger HT - LT is, the larger the time gap between cleaning periods. A larger gap allows more pages to become invalid, giving us better blocks to erase during the next cleaning period. However, if we mindlessly erased HT - LT blocks, we might run out of "good" blocks with many invalid pages and be forced to clean blocks with mostly valid pages, causing numerous page migrations and high write amplification. Therefore, we stop erasing early once every data block has more than half of its pages valid.
4. Why SWAP MODE
The above algorithm works well for both sequential and random workloads, but poorly fits one specific type of workload. When the SSD is close to full (many untrimmed LBAs in the mapping table) and a specific region of blocks is "hotter" than the others, the dist < D constraint becomes a curse: we end up with many blocks holding only a few invalid pages and repeatedly migrate valid pages between them, resulting in high write amplification. In SWAP MODE, we ignore the dist < D constraint and instead swap hot free blocks with cold data blocks after each cleaning period, lowering write amplification.
For the final project of CMU 15-712 Advanced OS, my teammates and I revisited and improved upon my FTL to explore the possibility of a workload-aware FTL (paper link). Such a workload-aware FTL, if one exists, could maximize SSD lifetime and would likely involve machine learning. To demonstrate its feasibility, we implemented an oracle with complete knowledge of a workload, which is allowed to provide hints that help the FTL make smarter decisions. Exploring several types of hints, such as the hotness/coldness of LBAs and optimal internal parameters, we showed that the oracle-assisted FTL outperforms the FTL baseline by 20-25%, implying that a workload-aware FTL is possible. Our project received an A+ grade.