To HDD or to SSD? That is the question …

SSD adoption is growing rapidly as the technology matures and prices fall, but enterprise SSDs are still roughly 30 times more expensive per GB than capacity SATA hard drives (HDDs), and analysts predict this gap will persist for the foreseeable future. According to a recent report from The 451 Research Group, the top three enterprise storage pain points are capacity growth, the high cost of storage, and delivering storage performance. SSDs can solve the performance problem, but they drive up cost, so a balanced approach is needed: creating the right mix of SSD and HDD for each application use case.
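
To make the cost trade-off concrete, here is a minimal sketch of how the blended cost per GB moves as the SSD share of a hybrid pool grows; the $/GB figures are assumptions taken from the rough numbers above, not vendor quotes:

```python
# Blended $/GB for a hybrid pool. Prices are assumptions based on the
# rough figures in this post (enterprise MLC ~$3/GB, capacity SATA
# ~30x cheaper), not quotes.
SSD_COST_PER_GB = 3.00
HDD_COST_PER_GB = 0.10

def blended_cost_per_gb(ssd_fraction):
    """Cost per GB when ssd_fraction of total capacity is SSD."""
    return ssd_fraction * SSD_COST_PER_GB + (1 - ssd_fraction) * HDD_COST_PER_GB

for frac in (0.0, 0.05, 0.10, 0.20, 1.0):
    print(f"{frac:4.0%} SSD -> ${blended_cost_per_gb(frac):.2f}/GB")
```

Even a 10% SSD slice keeps the pool under $0.40/GB, which is why mixing (rather than an all-flash jump) is usually the economical answer.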

Intro to Flash and SSD

Today there are four key SSD categories: SLC, MLC/eMLC, TLC, and NV-RAM, each with different cost, endurance, performance, and market availability. SLC has high endurance but is expensive, so it has not gained wide adoption. MLC and Enterprise MLC (eMLC) are the most commonly used; they can last 3-5 years under moderate write workloads and cost $1-5/GB. TLC (Triple-Level Cell) is a newer, desktop-oriented SSD technology with roughly 10x lower endurance but also much lower cost (as low as $0.40/GB), making it suitable for read-mostly workloads. The NV-RAM category is still emerging, with technologies like RRAM, MRAM, and PCM. Its unique property is that it performs and behaves like DRAM while enduring orders of magnitude more writes; it will also be more expensive, so it will be used mostly as a cache layer. As an interim solution, vendors build NV-DIMMs, which combine DRAM with flash and a supercapacitor.
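
To see what those endurance figures mean in practice, the sketch below converts rated program/erase (P/E) cycles into an estimated lifetime for a given daily write load. The cycle counts and write-amplification factor are illustrative assumptions, not vendor specs:

```python
# Rough SSD lifetime estimate from rated P/E cycles. All numbers are
# illustrative assumptions, not vendor specifications.
def lifetime_years(capacity_gb, pe_cycles, daily_writes_gb, waf=1.5):
    """Years until the rated write endurance is exhausted.

    waf: write amplification factor, the extra flash writes caused by
    GC and block management (always >= 1).
    """
    total_host_writes_gb = capacity_gb * pe_cycles / waf
    return total_host_writes_gb / daily_writes_gb / 365.0

# An 800GB drive written once per day (1 drive-write-per-day, DWPD):
print(f"eMLC (10k cycles): {lifetime_years(800, 10000, 800):.1f} years")
print(f"MLC  (3k cycles):  {lifetime_years(800, 3000, 800):.1f} years")
print(f"TLC  (~300 cycles): {lifetime_years(800, 300, 800):.1f} years")
```

With these assumed numbers MLC lands around 5 years at one drive-write per day, which lines up with the 3-5 year figure above, while TLC burns out in under a year at the same load.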

SSDs can be accessed through traditional SAS/SATA interfaces like HDDs, or through the much faster PCI-Express (PCIe) interface. PCIe drives started out with proprietary protocols and are now standardizing on NVMe, which uses built-in OS drivers and offers very high performance in multi-core systems compared to SAS/SATA disks, thanks to NVMe's parallel multi-queue architecture and its bypass of the legacy SCSI layers. PCIe cards usually sell at a premium over SAS/SATA SSDs, but deliver roughly 5 times the bandwidth, much higher IOPS, and lower latency. The downside of PCIe cards is that they are not hot-swappable the way disks are.
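
The parallelism gap is structural: SATA's NCQ offers a single queue of 32 outstanding commands, while NVMe allows up to 64K queues with up to 64K commands each, typically one queue per core with no lock contention. On the bandwidth side, a quick look at interface ceilings (approximate payload rates after link-encoding overhead) explains the roughly 5x claim:

```python
# Approximate interface bandwidth ceilings (payload rate after link
# encoding overhead); ballpark figures, not measured numbers.
interfaces = {
    "SATA 3 (6 Gb/s)": 600,    # MB/s, 8b/10b encoding
    "SAS (12 Gb/s)":  1000,    # MB/s (12G SAS)
    "PCIe 3.0 x4":    3940,    # MB/s, 128b/130b encoding
}
base = interfaces["SATA 3 (6 Gb/s)"]
for name, mbps in interfaces.items():
    print(f"{name:16s} {mbps:5d} MB/s ({mbps / base:.1f}x SATA)")
```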

Another benefit of SSDs is density: there are already new 4TB 2.5″ SSDs (while 2.5″ HDDs are around 1TB), and 1TB laptop SSDs such as the Samsung SM951 will soon arrive in the tiny M.2 form factor with a fast NVMe interface. Enterprise storage vendors will likely exploit these to build extremely dense storage enclosures or on-board flash caching modules. We can already see examples like Skyera squeezing 44TB into a single 1U enclosure, with plans for 300TB in 1U next year. This significant density increase can help lower overall costs and suits special applications and use cases in which density is a key factor.

Different SSD form factors: mSATA, PCIe M.2 (60mm, 80mm), and 2.5″ SAS/SATA/PCIe (source: Micron)

It’s important to know that lower-end SSDs have high latency variance (jitter); better, more expensive drives usually have smaller variance and more predictable latency. This is an inherent issue with SSDs, stemming from the way their FTL (Flash Translation Layer) and garbage collection (GC) mechanisms work. If your application is latency sensitive, make sure you check the drive specifications.
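
If jitter matters for your application, it is worth measuring it directly rather than trusting the data sheet. Here is a minimal sketch of a random-read latency sampler that reports tail percentiles, not just the average (Linux-only; the device path is a placeholder, and it needs root):

```python
# Random-read latency sampler: reports p50/p99/p99.9 so that jitter,
# not just the average, is visible. Linux-only sketch; run as root
# and point DEV at the drive under test (the path is a placeholder).
import mmap
import os
import random
import time

DEV = "/dev/sdX"        # placeholder: the device under test
BLOCK = 4096            # 4KiB reads
SAMPLES = 10000

fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)   # bypass the page cache
size = os.lseek(fd, 0, os.SEEK_END)
buf = mmap.mmap(-1, BLOCK)      # page-aligned buffer, needed for O_DIRECT

lat_us = []
for _ in range(SAMPLES):
    off = random.randrange(0, size - BLOCK, BLOCK)   # block-aligned offset
    t0 = time.perf_counter()
    os.preadv(fd, [buf], off)
    lat_us.append((time.perf_counter() - t0) * 1e6)
os.close(fd)

lat_us.sort()
for p in (50, 99, 99.9):
    idx = min(int(len(lat_us) * p / 100), len(lat_us) - 1)
    print(f"p{p}: {lat_us[idx]:.0f} us")
```

On a jittery drive the p99.9 figure can be an order of magnitude above the median, which is exactly what the average in a spec sheet hides.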

SSD and HDD Trends and Usage Models

On the HDD front there are two main categories: capacity 3.5″ drives with densities as high as 10TB, and 2.5″ SAS performance drives spinning at 10/15K RPM with densities just over 1TB. The 2.5″ performance category can be 5-10 times more expensive than capacity SATA drives, though given their smaller size you can fit twice as many disks in the same chassis. Reports from the leading analysts and vendors suggest that cost-reduction trends will continue on both the SSD and HDD fronts, slightly faster for SSDs. It seems the 2.5″ 15K RPM disk category will be replaced by faster and denser SSDs, but capacity HDDs will remain at least 10-20 times cheaper than MLC SSDs over the next few years.
TLC will gain adoption as a replacement for capacity drives where read performance or density matters. For example, think of a site storing videos or pictures: when they are new the access rate is high and, given the variety, the access pattern is random; after a few days or weeks they are accessed far less frequently and can be de-staged to cold storage. That is a perfect match for cheap TLC drives that are limited to roughly one drive re-write per day.
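
A quick sanity check for whether such a workload fits a 1-DWPD TLC tier is to compare the daily ingest against the tier's endurance budget; all workload numbers below are assumptions for illustration:

```python
# Does a media-ingest workload fit a 1-DWPD TLC tier? All numbers
# here are illustrative assumptions.
DRIVE_TB = 4.0           # per-drive capacity
DWPD = 1.0               # endurance budget: one drive write per day
N_DRIVES = 24            # drives in the TLC tier

daily_ingest_tb = 30.0   # new videos/pictures landing per day

daily_budget_tb = DRIVE_TB * DWPD * N_DRIVES   # 96 TB/day here
print(f"budget {daily_budget_tb:.0f} TB/day vs ingest {daily_ingest_tb:.0f} TB/day:",
      "fits" if daily_ingest_tb <= daily_budget_tb else "undersized")
```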

The table below provides a general comparison of the main options:

* A 12G SAS interface will enable up to 1000 MB/s.

Notice that 2.5″ SSD bandwidth is only 2-3 times that of an HDD, while the cost is 20-30 times higher. Workloads that require high sequential bandwidth (like Hadoop or video processing) therefore benefit more from hard disks striped together, while random or latency-sensitive workloads (like databases or VDI) benefit significantly from SSDs. But even within a workload category like databases, some tables or indexes see heavy random access while others hold large blobs or are used sequentially or infrequently. In VDI deployments, data such as Windows installation files or last year's PowerPoint and Word documents and pictures is accessed too rarely to justify SSD costs.
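
Simple arithmetic makes the sequential-bandwidth point; the prices and per-device throughput below are rough assumptions, not quotes:

```python
# Cost per MB/s of sequential bandwidth: a stripe of HDDs vs one SSD.
# Prices and throughput figures are rough assumptions for illustration.
HDD_PRICE, HDD_MBPS = 150.0, 150.0     # 4TB capacity SATA drive
SSD_PRICE, SSD_MBPS = 2000.0, 450.0    # enterprise 2.5" SATA SSD

n = 8                          # spindles striped together
stripe_mbps = n * HDD_MBPS     # sequential bandwidth scales ~linearly
stripe_cost = n * HDD_PRICE

print(f"{n}-HDD stripe: {stripe_mbps:.0f} MB/s for ${stripe_cost:.0f} "
      f"(${stripe_cost / stripe_mbps:.2f}/MB/s)")
print(f"one SSD:      {SSD_MBPS:.0f} MB/s for ${SSD_PRICE:.0f} "
      f"(${SSD_PRICE / SSD_MBPS:.2f}/MB/s)")
# For pure streaming, spindles win on $/MB/s; for random IOPS they do not.
```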

In many cases SSDs are used as a tier in front of slower disks (“flash cache”). This is a good practice for a read cache, especially if the cache contents don't change too frequently. Using SSD as a write cache, however, is a problem: if the SSD cache is 5% of the disk capacity and every write passes through it, the cache absorbs 20x more writes per GB than an SSD-only solution would. That is a good way to kill an SSD, which has limited endurance, so NV-RAM solutions are usually used for write caching instead; Nimble Storage has a nice implementation that combines flash (SSD) and NV-RAM caches in the right way. Cache logic must also be designed carefully, since simple LRU caches are doomed to fail: the fact that data was read recently doesn't mean it will be read frequently. Sometimes large IOs are better served directly from the HDDs than by consuming valuable cache space, and cache eviction (when the cache is nearly full) must not kill system performance.
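
To illustrate why a naive LRU read cache fails, the toy sketch below adds the two guards just mentioned: admit a block only on its second access (so one-time reads don't pollute the cache) and send large IOs straight to HDD. This is a simplified illustration, not any vendor's algorithm:

```python
from collections import OrderedDict

class FlashReadCache:
    """Toy SSD read cache: LRU eviction plus the two admission guards
    discussed above: "two-touch" admission (a block must be read twice
    before it earns cache space) and large-IO bypass. A simplified
    illustration, not any vendor's implementation."""

    def __init__(self, capacity_blocks, large_io_blocks=256):
        self.capacity = capacity_blocks
        self.large_io = large_io_blocks
        self.cache = OrderedDict()    # block -> data, kept in LRU order
        self.seen_once = set()        # "ghost" list of first touches

    def read(self, block, io_blocks, read_from_hdd):
        if io_blocks >= self.large_io:
            return read_from_hdd(block)        # large IO: bypass the cache
        if block in self.cache:
            self.cache.move_to_end(block)      # hit: refresh LRU position
            return self.cache[block]
        data = read_from_hdd(block)            # miss: fetch from HDD
        if block in self.seen_once:
            self.seen_once.discard(block)      # second touch: admit
            self.cache[block] = data
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict the LRU block
        else:
            if len(self.seen_once) >= 4 * self.capacity:
                self.seen_once.clear()         # crude bound on ghost list
            self.seen_once.add(block)          # remember the first touch
        return data
```

Plain LRU would admit every miss, so one big sequential scan could flush the whole cache; the two-touch guard is a simple way to protect it.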

Usage models and recommended solutions:

Conclusions and Summary

We covered the different SSD and HDD options and saw that there are big cost and performance differences between them. This means the two need to co-exist and be used appropriately if we want to maximize performance and lower costs at the same time. HDDs will be deployed in use cases that are capacity oriented, involve sequential access, and do not have high-performance requirements. SSDs will be used for random-access or latency-sensitive workloads, or as a read-cache tier. TLC can be used to further optimize cost and performance.

The only way to address the major challenges of exponential capacity growth, high storage costs, and storage performance is to use a hybrid storage approach.

 

5 thoughts on “To HDD or to SSD? That is the question …”

    • – What about pseudo-MLC (MLC operating like SLC), which is being adopted this year in the industrial market?
      – I thought MLC raw endurance is 3,000 cycles, not 10,000. Also, taking into account GC and block management, the number of times you can fill up the SSD is dramatically reduced (in some Intel SSDs to 150 for sequential large-block writing, and probably lower for real applications).
      – Any comment on the OCZ MLC PCIe SSD claiming an endurance of 25,000 read/write cycles?
      – The number of MLC NAND flash read cycles is also limited because of “read disturb” errors. Do you have any figures for read-cycle counts under various applications?


      • In general it's hard to trust vendor endurance numbers, and as you mentioned it depends on the workload; in some cases it's less, and in some cases vendors are very conservative, e.g. see the link: The SSD Endurance Experiment: Casualties on the way to a petabyte.
        I'm not too familiar with the OCZ cards; I've mainly used FusionIO, Micron, and LSI ones.
        When we used OCZ 2.5″ disks a while back there was a big difference between advertised and actual performance.


      • I looked at the article and it did not mention the test workload. 600TB written (TBW) into a 240GB SSD = 2,500 fills of the SSD. I guess it is for sequential writes of large blocks, i.e. a write amplification factor (WAF) of about 1.2 (3,000 rated cycles / 2,500 fills), which is a good and plausible number since WAF must be >1. We see more and more companies mention in their data sheets the TBW per the JEDEC JESD-218 test methods and JESD-219 workloads. It is very easy to measure the TBW or life span at any workload using SMART tools. I am wondering how companies like OCZ, STEC, and SMART Storage publish endurance of 25,000 to 90,000 fills of an SSD using MLC chips. Do you have any idea?
        These companies use all kinds of buzzwords like “adaptive block management”, etc. I suspect they do hidden compression during block manipulation. The aforementioned pseudo-SLC was announced by FusionIO but never materialized, yet it is now popular in the industrial market.
        What all the above companies have in common is that they were bought by bigger companies 🙂


  1. I would like to thank you for the efforts you've put into writing this blog.

    I'm hoping to see the same high-grade content from you in the future
    as well. In fact, your creative writing abilities have encouraged me to get my very
    own website now 😉

