All hardware will ultimately fail one day. This is one of the painful truths of technology. For most of the types of hardware used in modern infrastructure, the loss of a single component usually incurs some amount of downtime. Other than the time taken to swap out something like a bad CPU or stick of RAM, though, sysadmins and users rarely see any long-term ill effects. But unless an admin takes particular care with storage, data loss from disk failures can have immediate and lasting consequences.
Take a user’s desktop as an example: If they store their data locally on a single drive, then when the drive inevitably fails, their data will be lost. The same is true no matter the quality, brand, or type of drive. Of course, there are data recovery outfits that would be happy to take hard-earned cash in exchange for the possibility of resurrecting bits from dead drives. Unfortunately, the cost quickly becomes exorbitant, and even those specialists fall short at some point.
Administrators have a number of options at their disposal to fend off looming disaster: RAID, backups, clusters of networked storage, and so on. Often these options are used together to provide layers of data protection and multiple opportunities to stop an issue before it is too late. Building redundant arrays of disks, and so abstracting storage away from any single drive, is one of the simplest and most effective ways to keep an individual drive from being a single point of failure.
What is RAID?
Redundant Array of Inexpensive Disks (RAID), sometimes expanded as Redundant Array of Independent Disks, is one of the most widely used and effective storage technologies a sysadmin will come across, so being comfortable with its most common implementations is vital. RAID can be implemented in software through an operating system utility like mdadm on Linux, in hardware through a dedicated RAID controller like the MegaRAID line of cards, or through chipsets that offer pseudo-RAID capabilities. Hardware RAID controllers like those in the MegaRAID line should not be confused with host bus adapters (HBAs), though: HBAs are designed to provide simple, direct access to disks rather than to manage arrays.
At a high level, RAID groups a collection of drives into an array and writes data across them. Depending on the configuration, known as the RAID level, the data is laid out in different ways, with different amounts of redundancy, if any (mirrored copies or parity information), to help rebuild the data if a drive fails. While it’s possible to mix drive types, speeds, sizes, or connections in an array, it’s best to match them as closely as possible. Differently sized drives almost always end up carved down to the size of the smallest member, and an array of drives with different speeds has to wait on the slowest one.
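To make the “carved down to the smallest member” point concrete, here is a minimal sketch, assuming a simple array in which every member must contribute the same amount of space. The function name `usable_per_member` and the sizes in GB are hypothetical, chosen only for illustration.

```python
def usable_per_member(sizes_gb: list[int]) -> tuple[int, int]:
    """Return (space used on each member, total space left unused).

    Assumes every member contributes the same amount of space, so each
    drive is truncated to the size of the smallest one.
    """
    if not sizes_gb:
        raise ValueError("an array needs at least one drive")
    per_member = min(sizes_gb)
    wasted = sum(size - per_member for size in sizes_gb)
    return per_member, wasted


# Two 4000 GB drives mixed with one 1000 GB drive: each member gets 1000 GB.
per_member, wasted = usable_per_member([4000, 4000, 1000])
print(per_member, wasted)  # 1000 6000
```

In this example, adding the small drive leaves 6000 GB of the larger drives unused.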
Many admins do prefer to buy drives from different manufacturers, though, so that a single bad batch cannot cause several members of the same array to fail at the same time.
Understanding RAID 0
RAID 0 stripes data evenly across two or more disks, but it offers no redundancy or fault tolerance. If any one drive fails, the entire array fails, and in practice all of the data on it is lost. This RAID configuration is used when speed and capacity are the goals: reads and writes are spread across every member, and the full capacity of every drive is available for data.
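Why does one failed drive take everything with it? A minimal sketch of round-robin striping shows that consecutive blocks alternate between disks, so nearly every file larger than a block has pieces on every member. It assumes a stripe unit of one block for simplicity (real arrays use a configurable chunk size), and the helper names `locate_block` and `blocks_lost` are hypothetical.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Simplified round-robin striping with a stripe unit of one block.
    """
    return logical_block % num_disks, logical_block // num_disks


def blocks_lost(total_blocks: int, num_disks: int, failed_disk: int) -> list[int]:
    """List the logical blocks that become unreadable when one disk fails."""
    return [b for b in range(total_blocks)
            if locate_block(b, num_disks)[0] == failed_disk]


# Eight blocks over two disks: disk 0 holds 0, 2, 4, 6; disk 1 holds 1, 3, 5, 7.
print([locate_block(b, 2) for b in range(4)])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
print(blocks_lost(8, 2, failed_disk=1))        # [1, 3, 5, 7]
```

Losing disk 1 does not remove a tidy half of the files. It punches a hole into every other block, which is why the surviving disk is of little use on its own.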
Understanding RAID 1
RAID 1 mirrors data, keeping an exact copy of the same data on each disk (or set of disks) in the array. A classic RAID 1 configuration is a mirrored pair of two disks. Because every member holds identical data, the usable capacity is that of a single member. This layout is useful when read performance or reliability is more important than write performance or storage capacity. A RAID 1 array keeps working as long as at least one of its disks is still functioning.
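The behavior described above can be sketched as a toy model, assuming each “disk” is just a Python dict mapping block numbers to data. The `Mirror` class and its methods are hypothetical and exist only for illustration. Every write must reach every surviving member, while any surviving member can answer a read.

```python
class Mirror:
    """Toy RAID 1 mirror: every member disk holds the same blocks."""

    def __init__(self, num_disks: int = 2) -> None:
        # Each "disk" is modeled as a dict mapping block number -> data.
        self.disks = [{} for _ in range(num_disks)]
        self.failed = set()  # indices of disks that have died

    def write(self, block: int, data: bytes) -> None:
        # A write must land on every surviving member, so it is no faster
        # than writing to a single disk.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Any surviving member can answer a read. This toy always picks the
        # first one; a real implementation can spread reads across members.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block]
        raise OSError("all mirror members have failed")

    def fail(self, index: int) -> None:
        self.failed.add(index)


m = Mirror()
m.write(0, b"hello")
m.fail(0)          # lose one half of the pair
print(m.read(0))   # b'hello': the other member still has the data
```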
Understanding RAID 5
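RAID 5 uses block-level striping with distributed parity. Unlike RAID 0 and RAID 1, it protects data with parity information, and instead of keeping that parity on one dedicated disk, it spreads the parity blocks across all of the drives. RAID 5 requires at least three disks, and the equivalent of one disk’s capacity goes to parity. The array keeps operating as long as no more than one drive is missing: after a single drive fails, reads of the lost blocks are recalculated from the remaining data and parity, so no data is lost. Because parity is distributed, every drive shares in serving write requests and no single disk becomes a parity bottleneck. Small writes still carry a cost, though, since each one requires reading the old data and parity before writing the new data and parity.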
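RAID 5 parity is a bytewise XOR of the data blocks in a stripe. This makes recovery possible: XOR-ing all of the surviving blocks of a stripe regenerates the missing one. The sketch below assumes tiny two-byte blocks for readability. The helper names are hypothetical, and `parity_disk` shows only one simple rotation scheme, since real implementations offer several layouts.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk holding the parity block for a stripe (one assumed rotation)."""
    return (num_disks - 1 - stripe) % num_disks


def raid5_usable(num_disks: int, disk_size_gb: int) -> int:
    """Usable capacity: one disk's worth of space goes to parity."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (num_disks - 1) * disk_size_gb


# One stripe on a four-disk array: three data blocks plus one parity block.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data)

# The disk holding data[1] fails. XOR-ing every surviving block of the
# stripe (the other data blocks and the parity) regenerates the lost one.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])                     # True
print([parity_disk(s, 4) for s in range(4)])  # [3, 2, 1, 0]
print(raid5_usable(4, 4000))                  # 12000
```

The same arithmetic explains the small-write cost. When one data block changes, the new parity equals the old parity XOR the old data XOR the new data, so both old values must be read before anything is written.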
RAID vs Backup
One of the most commonly espoused sayings in the realm of system administration seems to be: “RAID is not a backup.” For new admins, or those who don’t spend much time thinking about storage, this may not be immediately obvious. It may even seem contrarian or flat-out wrong.
The confusion comes from the fact that the redundancy built into RAID configurations serves the same goal as backups: fighting data loss. The reason it’s so important to talk about the difference is not to nitpick, but to remind ourselves that these tools exist to provide layers of protection, and by lumping them together we do ourselves a disservice.
RAID exists to provide an immediate, live copy of data that a running machine can lean on like a crutch, keeping it going while it picks itself back up after a drive failure. Because that copy is live, it also reproduces mistakes instantly: an accidental deletion or a corrupted file is written to every member of the array at once. Backups, on the other hand, offer an opportunity to test our ability to restore a machine to a working state, or to recover data, without needing the machine to be running. Backups also offer benefits that RAID does not, including the ability to push copies to multiple places on multiple types of media and to keep multiple versions.
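The difference can be illustrated with a toy model, assuming files are entries in a dict. The names `disk_a`, `disk_b`, `backups`, and `mirrored_delete` are all hypothetical. A mirror faithfully replicates an accidental deletion to every member, while a separate point-in-time backup is untouched by it.

```python
# A toy two-disk mirror: both members always hold identical contents.
disk_a = {"report.txt": "Q3 numbers"}
disk_b = dict(disk_a)

# A nightly backup: a separate point-in-time copy, kept as a list of versions.
backups = [dict(disk_a)]


def mirrored_delete(name: str) -> None:
    """RAID replicates every change, including mistakes, to all members."""
    for disk in (disk_a, disk_b):
        disk.pop(name, None)


mirrored_delete("report.txt")     # someone deletes the file by accident
print("report.txt" in disk_b)     # False: the mirror copy is gone too
print(backups[-1]["report.txt"])  # Q3 numbers: the backup still has it
```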
RAID and backups fill different roles, but both are important, and neither should be neglected.