Summary

The article discusses the importance and benefits of utilizing Intel's RAID technology, particularly RAID 5, for improved performance and data protection on NVMe SSDs.

Abstract

The article emphasizes the significance of maximizing the use of Intel CPUs' RAID features, especially RAID 5, which offers a balance between performance, data protection, and storage capacity. It highlights the cost-effectiveness of this approach, given the decreased prices of NVMe SSDs. The author provides a detailed explanation of different RAID levels, with a focus on RAID 5's advantages for consumer scenarios. Intel's RAID solutions, such as Intel RST and Intel VROC, are recommended for their ability to protect data and enhance performance, even suggesting the use of Optane technology for caching. The article also addresses practical considerations, potential issues, and setup recommendations, including the need for supported NVMe SSDs to avoid parity errors.

Opinions

  • The author believes that consumers should leverage the full capabilities of their Intel CPUs, particularly the RAID features, to improve their computing experience.
  • RAID 5 is portrayed as the optimal choice for most consumer scenarios, providing a good mix of speed, reliability, and storage efficiency.
  • There is a clear preference for Intel RAID solutions over AMD's offerings, due to AMD's lack of RAID 5 support.
  • The author expresses frustration with the limited support for non-Intel SSDs and the necessity of using a supported SSD, like the Samsung OEM SM961, to avoid parity errors in a RAID 5 setup.
  • The article suggests that turning off write-cache buffer flushing and turning on read patrol can mitigate issues with parity errors when using non-recommended SSDs.
  • The author is critical of the performance impact when disabling disk data cache for write hole protection and recommends using Intel Optane caching as a solution.
  • The author plans to explore Intel Optane caching in the future to address the performance issues encountered with write hole protection enabled.

The hidden, but awesome feature of Intel CPUs which nobody told you about

I have always believed that if you are spending your hard-earned money on anything, you should utilize it to the MAX. This article is about how and why you should really be using a very important feature of Intel CPUs, if yours supports it.

This is new and exciting because the cost of NVMe SSDs has come down so much that it is now foolish not to be protected against hardware failure with RAID 5 while simultaneously getting much, much better performance.

Remember paying hundreds of dollars for 250GB? Wow, how the prices have fallen!

Let us not forget that the biggest performance bottleneck of any computer is the disk. Increase the disk performance and you substantially improve your overall performance, no matter how many CPU cores or how much RAM you have.

Given below is a simplified explanation of RAID. This can be hard to understand at first, but with time and use you will understand it better.

Redundant Array of Independent Disks (RAID)

I still remember the day a very long time ago when I first read about RAID and purchased a PCIe RAID card to hook up multiple old school HDDs in RAID 0. For a very long time, RAID 0 was the only RAID I used. But those were the days of HDDs, where everything ran super slow and NVMe was a fantasy.

RAID 0

RAID 0 will give you maximum performance & disk space, but your data will always be on a knife’s edge. There is no redundancy, so a single drive failure loses all your data!

RAID 1

RAID 1 only protects your data: a two-drive mirror gives you half of the total disk space. There won’t be any real performance benefit; it just safeguards your data.

RAID 5

RAID 5 gives you the best of both worlds. You get good performance, good data protection and most of the disk space (one drive’s worth of capacity goes to parity). This is obviously more complex to implement, and hence some vendors do not support it by default.

Even if one drive fails completely, everything will still keep working (albeit slowly) while you get the replacement drive and swap it.
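The reason a RAID 5 array survives one dead drive is parity. Here is a minimal Python sketch of the idea, assuming the textbook XOR parity scheme and a single stripe spread over three drives. The block contents and the `xor_blocks` helper are made up for illustration, and this is not how Intel's driver is actually written.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three drives: two data blocks plus one parity block.
data_a = b"HELLO123"   # hypothetical contents of the block on drive 1
data_b = b"WORLD456"   # hypothetical contents of the block on drive 2
parity = xor_blocks([data_a, data_b])  # stored on drive 3

# Drive 2 dies. Its block can be recomputed from the two survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)
```

Every read that touches the missing drive needs this extra computation, which is why a degraded array keeps working but runs slower until the drive is replaced and rebuilt.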

This is what I recommend for most consumer scenarios.

RAID 10

This is what most data centers do. It uses up a lot of drives (which is the biggest problem with it), but you can lose more than one drive and still continue to work, as long as the failed drives are not both in the same mirrored pair, and that too without any loss of performance (which is pretty awesome if you think about it).

Then swap the failing drives when you get a chance. This is cost intensive but perfect for data center scenarios.
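To put rough numbers on the trade-offs between the four levels above, here is a small Python sketch. It assumes identical drives, a two-drive mirror for RAID 1, and textbook capacity rules. The function name `raid_summary` and the 500 GB drive size are placeholders for illustration only.

```python
def raid_summary(level, drives, size_gb):
    """Return (usable_gb, drive_failures_always_survived) for identical drives."""
    if level == 0:
        if drives < 2:
            raise ValueError("RAID 0 needs at least 2 drives")
        return drives * size_gb, 0           # striping only, no redundancy
    if level == 1:
        if drives != 2:
            raise ValueError("this sketch models a 2-drive mirror only")
        return size_gb, 1                    # half the total space
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * size_gb, 1     # one drive's worth goes to parity
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        # Can survive more than one failure, but only if the failed drives
        # are in different mirrored pairs; 1 is the guaranteed minimum.
        return drives // 2 * size_gb, 1
    raise ValueError("unsupported RAID level")


for level, drives in [(0, 3), (1, 2), (5, 3), (10, 4)]:
    usable, survives = raid_summary(level, drives, 500)
    print(f"RAID {level} with {drives} drives: {usable} GB usable, survives {survives} failure(s)")
```

With three drives, RAID 5 keeps two thirds of the raw capacity while still tolerating a failure, which is the cost argument behind recommending it for consumer setups.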

Intel RAID

If you have a newer Intel CPU, you should find out whether it supports Intel RAID or not. This is usually referred to as:

  • Intel RST (Rapid Storage Technology)
  • Intel VROC (Virtual RAID on CPU)

Optane fits into this as a caching technology that can be used in conjunction with a highly reliable Intel RAID configuration, so that your data is well protected and you get better performance from Optane caching as well.

Intel VROC 7.5 Specifications (March 2021)

The documentation makes more sense once you go through the “death by a thousand cuts” experience I had while trying to get everything working. This is a summarized, relatively easy to understand, walkthrough of this technology.

CPUs which support Intel VROC

Nowadays, even laptops come with Xeon CPUs.

You need bootable RAID with NVMe support for sure:

I suppose bootable RAID with “only” SATA support is better than nothing, but NVMe is preferable.

You need a VROC Premium hardware key for the technology to work properly with non-Intel SSDs:

  • These are usually sold by the Motherboard Manufacturer or Intel Dealers.

It is sad to see that only the Samsung OEM SM961 (a pretty old drive) is officially supported. I do know for a fact that if you get a supported SSD, you get 0 parity errors, even with SSDs as old as the ones I have in my Dell 77xx laptop.

This is “easy” to set up if you have a desktop, although laptops like the Dell Precision 77xx series, which can take at least 3 NVMe SSDs, can also do Intel RAID:

The latest generation 77xx laptop keeps the NVMe SSDs even further apart than the older models (I have a 7730), to reduce the amount of localized heat generated within the laptop.

Most laptops can take at most two “true PCIe” NVMe SSDs. Since RAID 5 needs at least three drives, that rules out the best RAID option out there, the one that combines speed with data protection at an optimal cost.

This article talks about how to configure your OS drive to use Intel VROC:

  • You really need to use at least RAID 5 whenever you can to avert a total disaster when a drive fails (especially a consumer drive).

And this is a further continuation of that story:

I wrote this article as the continuation of this series because there is a ton of information I had to find out the hard way on my journey, and it is going to save someone else a lot of pain.

The reason to do this is that the feature is just lying there unused, wasted if you will, when it could give you a faster experience with hardware failure protection “built in” and zero downtime.

The catch with AMD, where RAID is supported “with all CPUs”, is that they do not support RAID 5, which really is the only cost-effective RAID option that “gives you it all”. Don’t get me wrong though: all my personal PCs are AMDs, and I hate Intel for their sockets, which keep changing all the time.

The first thing to remember is that if you decide to go ahead with this, use a supported NVMe SSD, even though the fastest & newest of the supported SSDs, the Samsung OEM drives, are a bit old. If you use any unsupported drive, be prepared to deal with parity errors in your RAID 5 setup.

I already have three 510gb Samsung 970 Plus NVMe SSDs, and I am sticking with them for now because it would be too much of a hassle to return these (cheap) ones and find older OEM drives which actually cost more today. But, living with these, I run the RAID verify/repair utility daily, and it always finds about 10–13 parity errors, which get fixed:

Daily repair runs.
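Conceptually, a verify/repair pass walks every stripe, recomputes the parity from the data blocks, and rewrites any parity block that does not match. The Python sketch below shows that idea only. The stripe layout and function names are hypothetical, it assumes plain XOR parity, and it assumes the data blocks are the correct copy. How Intel's utility actually decides what to repair is not something I can confirm.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def verify_and_repair(stripes):
    """Recompute parity for each stripe, fix mismatches, return how many were fixed."""
    repaired = 0
    for stripe in stripes:
        expected = xor_blocks(stripe["data"])
        if stripe["parity"] != expected:
            stripe["parity"] = expected   # assumption: data is right, parity is stale
            repaired += 1
    return repaired


# Two hypothetical stripes; the second one has a stale parity block.
stripes = [
    {"data": [b"AAAA", b"BBBB"], "parity": xor_blocks([b"AAAA", b"BBBB"])},
    {"data": [b"CCCC", b"DDDD"], "parity": b"\x00\x00\x00\x00"},
]
print(verify_and_repair(stripes))  # first pass finds and fixes the bad stripe
print(verify_and_repair(stripes))  # second pass finds nothing left to fix
```

A parity error that sits unrepaired is harmless until a drive fails, at which point the rebuild would reconstruct wrong data from it. That is why I run the repair daily.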

I have found after much painstaking experimentation that you really need to turn off write-cache buffer flushing, because it causes more parity errors when turned on, and it really does need a separate power supply for the drives to support that feature:

Do NOT check the second checkbox just because you have a UPS.
A UPS does not help you turn on write-cache buffer flushing.

Turning on read patrol does significantly reduce parity errors when using non-recommended SSDs:

Turning off the disk data cache severely affects performance no matter how powerful your setup is:

I found that Windows processes like search indexing use more than 80% of disk I/O, severely slowing down even the most powerful Intel workstation, when the disk data cache is disabled because you turned on RAID write hole protection:

  • I tried numerous times, but no matter what I did, my PC with 10 cores, 20 threads, 32GB ECC RAM and RAID 5 on NVMe runs super slow every few minutes because of very high disk utilization from various Windows processes.

Both distributed mode (which uses all the SSDs) and Journaling Drive mode (which uses one SSD exclusively for this) turn off the disk caching, and this severely affects performance no matter what you do. This is sad because I have an extra SSD just lying there unused:

I talked to Intel support and they told me that the only way to use caching with the write hole “closed” (protection turned on) is to use Optane caching. I have not tried that yet, but if I get a chance, I will.
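For anyone wondering what the “write hole” actually is, here is a small Python simulation of the general problem as it is usually described for RAID 5: a data block gets updated, power is lost before the matching parity block is written, and a later drive failure is rebuilt from stale parity. The journal here is a plain dictionary and all names are made up. It sketches the concept under those assumptions, not Intel's distributed or journaling-drive implementation.

```python
def xor_blocks(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def simulate_crash(replay_journal):
    """Return True if drive B's data can be rebuilt correctly after the crash."""
    data_a, data_b = b"AAAA", b"BBBB"
    parity = xor_blocks(data_a, data_b)

    new_a = b"ZZZZ"
    # With protection on, the intended parity is logged before touching the stripe.
    journal = {"parity": xor_blocks(new_a, data_b)}

    data_a = new_a   # the data write reaches the drive...
    # ...and power is lost here, so the parity write never happens.

    if replay_journal:
        parity = journal["parity"]   # on restart, finish the logged write

    # Later, drive B dies and is rebuilt from drive A and the parity block.
    rebuilt_b = xor_blocks(data_a, parity)
    return rebuilt_b == data_b


print("no protection:", simulate_crash(replay_journal=False))
print("with journal: ", simulate_crash(replay_journal=True))
```

The extra logging step before every write is the general reason this kind of protection costs performance.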

To be continued…
