The hidden, but awesome feature of Intel CPUs which nobody told you about
I have always believed that if you are spending your hard earned money on anything, you should utilize it to the MAX. This article is about how and why you should really be using a very important feature of Intel CPUs, if yours supports it.
This is new and exciting because the cost of NVMe SSDs has come down so much that it is now foolish not to be protected against hardware failure with RAID 5 while simultaneously getting much, much better performance.
Let us not forget that the biggest performance hog of any computer is the disk. Increase the disk performance and you substantially improve your overall performance, no matter how many CPU cores or how much RAM you have.
Given below is a simplified explanation of RAID. This can be hard to understand, but over time and over use, you will understand it better.
Redundant Array of Independent Disks (RAID)
I still remember the day a very long time ago when I first read about RAID and purchased a PCIe RAID card to hook up multiple old school HDDs in RAID 0. For a very long time, RAID 0 was the only RAID I used. But those were the days of HDDs, where everything ran super slow and NVMe was a fantasy.
RAID 0
RAID 0 will give you maximum performance & disk space, but your data will always be at the edge of a knife. Very easy to lose all your data!
RAID 1
RAID 1 will just protect your data by mirroring it, so with two disks you get half of the total disk space. Don’t expect any real performance benefit (writes in particular are no faster); it will just safeguard your data.
RAID 5
RAID 5 gives you the best of both worlds. You get good performance, good data protection and most of the disk space (one drive’s worth of capacity goes to parity, and you need at least three drives). This is obviously more complex to implement, and hence some vendors do not support this by default.
Even if one drive fails completely, everything will still keep working (albeit slowly) while you get the replacement drive and swap it.
This is what I recommend for most consumer scenarios.
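To make the "one drive can fail" claim less magical, here is a minimal Python sketch of the XOR parity idea behind RAID 5. It is only an illustration: the block contents and the `xor_blocks` helper are made up for this example, and a real controller also stripes data and rotates parity across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-drive array: two data blocks plus one parity block.
data_a = b"HELLO123"
data_b = b"WORLD456"
parity = xor_blocks([data_a, data_b])

# Simulate losing the drive that held data_b and rebuild it from the survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)  # True
```

Because XOR is its own inverse, any single missing block can be recomputed from the remaining ones. That is also why the array keeps working (slowly) with one drive gone: every read of the missing block has to be recalculated.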
RAID 10
This is what most data centers do. It will use up a lot of drives (which is the biggest problem with it), but you can lose more than one drive and still continue to work, as long as the failed drives are not both halves of the same mirrored pair, and with very little loss of performance (which is pretty awesome if you think about it).
Then swap the failing drives when you get a chance. This is cost intensive but perfect for data center scenarios.
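The space trade-offs between the four RAID levels above can be sketched in a few lines of Python. This is a simplified model that assumes identical drives; the function name `usable_capacity_gb` and the 500 GB drive size are just placeholders for the example.

```python
def usable_capacity_gb(level, drives, size_gb):
    """Usable capacity of an array of identical drives (simplified model)."""
    if level == 0:
        if drives < 2:
            raise ValueError("RAID 0 needs at least 2 drives")
        return drives * size_gb          # everything is usable, nothing is protected
    if level == 1:
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return size_gb                   # every drive holds the same copy
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * size_gb    # one drive's worth goes to parity
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return (drives // 2) * size_gb   # half the drives are mirrors
    raise ValueError(f"unsupported RAID level: {level}")


for level, drives in [(0, 3), (1, 2), (5, 3), (10, 4)]:
    usable = usable_capacity_gb(level, drives, 500)
    print(f"RAID {level} with {drives} x 500 GB drives: {usable} GB usable")
```

With three drives, RAID 5 keeps two thirds of the raw space while still surviving a drive failure, which is why it is the sweet spot for a consumer machine with three NVMe slots.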
Intel RAID
If you have a newer Intel CPU, you should find out whether it supports Intel RAID or not. This is usually referred to as:
- Intel RST (Rapid Storage Technology)
- Intel VROC (Virtual RAID on CPU)
Optane fits into this as a caching technology which can be used in conjunction with a highly reliable Intel RAID configuration, so that your data is protected very well and you can also use Optane caching to increase your performance.
Intel VROC 7.5 Specifications (March 2021)
The documentation makes more sense once you go through the “death by a thousand cuts” experience I had while trying to get everything working. This is a summarized, relatively easy to understand walkthrough of this technology.
CPUs which support Intel VROC
You need bootable RAID with NVMe support for sure:
You need a VROC Premium hardware key for the technology to work properly with non-Intel SSDs:
- These are usually sold by the motherboard manufacturer or Intel dealers.
It is sad to see that only the (pretty old) Samsung OEM SM961 SSD is officially supported. I do know for a fact that if you get a supported SSD, you get 0 parity errors, even with drives as old as the ones I have in my Dell 77xx laptop.
This is “easy” to set up if you have a desktop, although laptops like the Dell Precision 77xx series, which can take at least 3 NVMe SSDs, can also do Intel RAID:
Most laptops can only take (at the most) two “true PCIe” NVMe SSDs. Since RAID 5 needs at least three drives, this severely limits us from using the best RAID option out there: RAID 5, which combines speed with data protection at an optimal cost.
This article talks about how to configure your OS drive to use Intel VROC:
- You really need to use at least RAID 5 whenever you can, to avert a total disaster when a drive fails (especially a consumer drive).
And this is a further continuation of that story:
I wrote this article as the continuation in this series because there is a ton of information I had to find out the hard way on my journey, and this is going to help someone else avoid a lot of pain.
The reason to do this is that the feature is just lying there unused (wasted, if you will) when it could give you a faster experience with zero downtime and hardware failure protection “built in”.
The catch with AMD, where RAID is supported “with all CPUs”, is that they do not support RAID 5, which really is the only cost-effective RAID option which “gives you all”. Don’t get me wrong though: all my personal PCs are AMDs, and I hate Intel for their sockets, which keep changing all the time.
The first thing to remember is that if you decide to go ahead with this, you should use a supported NVMe SSD, and the fastest and newest of the supported SSDs (the Samsung OEM drives) are a bit old. If you use any unsupported drive, be prepared to deal with parity errors in your RAID 5 setup.
I already got three 510 GB Samsung 970 Plus NVMe SSDs, and I am sticking with them for now because it would be too much of a hassle to return these (cheap) ones and find older OEM drives which actually cost more today. But, in living with these, I run the RAID verify/repair utility daily, and it always finds about 10–13 parity errors, which get fixed.
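For anyone wondering what a "parity error" actually is, the following Python sketch shows the general idea of a verify/repair pass. It is a conceptual illustration only and not how the Intel utility is implemented: the `verify_and_repair` function and the stripe layout are invented for the example, and the repair step simply assumes the data blocks are correct and the parity is the stale part.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def verify_and_repair(stripes):
    """Recompute parity for each stripe and rewrite it where it does not match.

    Returns the number of stripes whose parity had to be fixed.
    Assumption: the data blocks are trusted and the parity block is stale.
    """
    repaired = 0
    for stripe in stripes:
        expected = xor_blocks(stripe["data"])
        if stripe["parity"] != expected:
            stripe["parity"] = expected
            repaired += 1
    return repaired


stripes = [
    # Consistent stripe: 0x01 ^ 0x03 = 0x02 and 0x02 ^ 0x04 = 0x06.
    {"data": [b"\x01\x02", b"\x03\x04"], "parity": b"\x02\x06"},
    # Stale parity: it should be 0x20 0x60 but was never updated.
    {"data": [b"\x10\x20", b"\x30\x40"], "parity": b"\x00\x00"},
]

print(verify_and_repair(stripes))  # 1
```

A parity error found this way does not necessarily mean data was lost. It means that if a drive had failed before the repair, the rebuild of that stripe would have produced garbage.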
I have found after much painstaking experimentation that you really need to turn off write-cache buffer flushing, because it causes more parity errors when turned on, and it really does need a separate power supply for the drives to support that feature:
Turning on read patrol does significantly reduce parity errors when using non-recommended SSDs:
Turning off the disk data cache severely affects performance no matter how powerful your setup is:
I found that Windows processes like search indexing use more than 80% of disk I/O, severely slowing down even the most powerful Intel workstation, when the disk data cache is disabled because you turned on RAID write hole protection:
- I tried numerous times, but no matter what I did, my PC with 10 cores, 20 threads, 32GB ECC RAM and RAID 5 on NVMe runs super slow every few minutes because of very high disk utilization from various Windows processes.
Both in distributed mode (which uses all the SSDs) and in Journaling Drive mode (which uses one SSD exclusively for this), it turns off the disk caching and this severely affects performance no matter what you do. This is sad because I have an extra SSD just lying there unused:
I talked to Intel support and they told me that the only way to use caching with the write hole “closed” (protection “turned on”) is to use Optane caching. I have not tried that yet, but if I get a chance, I will.
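If the term "write hole" is new to you, this small Python sketch shows the failure it refers to: data gets written, power is lost before the matching parity is written, and a later drive failure then rebuilds the wrong contents. The block values and the `xor_blocks` helper are made up for the illustration, and it says nothing about how Intel's protection works internally.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A consistent stripe on three drives: block A, block B and their parity.
data_a = b"AAAA"
data_b = b"BBBB"
parity = xor_blocks([data_a, data_b])

# An update to block A reaches its drive...
data_a = b"ZZZZ"
# ...but power is lost here, before the new parity is written.
# The parity on disk still describes the OLD block A.

# Later the drive holding block B fails and is rebuilt from A and the stale parity.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)  # False: block B was never touched, yet it comes back corrupted
```

Closing the write hole means making the data write and the parity write behave as one step, for example by journaling them first. That extra bookkeeping is the price paid in the performance numbers described above.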
To be continued…