Key Topics: Requirements, Hard Disks, RAID, DAS, Optical Disks, Solid State Drives, SD Cards, Online Storage
This page provides an overview of the most widely available means of storing and backing-up computer data, and in doing so provides a supplement to the hardware and security pages. For more information you may also want to watch the following video on "data wrangling" in which I discuss the handling and long-term storage of large quantities of data:
Computer storage is measured in bytes, kilobytes (KB), megabytes (MB), gigabytes (GB) and increasingly terabytes (TB). One byte is one character of information, and comprises eight bits (or eight digital 1s or 0s). Technically a kilobyte is 1024 bytes, a megabyte 1024 kilobytes, a gigabyte 1024 megabytes, and a terabyte 1024 gigabytes. This said, whilst this remains true when it comes to a computer's internal RAM and solid state storage devices (like USB memory sticks and flash memory cards), measures of hard disk capacity often take 1MB to be 1,000,000 bytes (not 1,048,576 bytes) and so on. This means that the storage capacity of two devices labelled as the same size can be different, which remains an ongoing source of debate within the computer industry.
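To make the difference between the two conventions concrete, the following is a minimal Python sketch. The dictionary and function names are purely illustrative and not taken from any library. It shows how large a capacity labelled in 1000-based units appears when counted in 1024-based units.

```python
# Illustrative names only; nothing here comes from a real library.
BINARY_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
DECIMAL_UNITS = {"KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}


def labelled_capacity_in_binary_units(amount, unit):
    """Convert a capacity labelled in 1000-based units into 1024-based units."""
    total_bytes = amount * DECIMAL_UNITS[unit]
    return total_bytes / BINARY_UNITS[unit]


if __name__ == "__main__":
    for amount, unit in [(1, "MB"), (500, "GB"), (1, "TB")]:
        shown = labelled_capacity_in_binary_units(amount, unit)
        print(f"{amount}{unit} on the label = {shown:.3f}{unit} counted in 1024s")
```

The gap widens with each step up in unit: it is under 5 per cent at the megabyte level, but about 9 per cent at the terabyte level.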
Any sensible computer user will plan for two categories of storage. These will comprise the storage necessary to keep files internally on their computer, as well as those media required to back-up, transfer and archive data (as also explored in the security section). In turn, when deciding on suitable external storage devices, the key questions to be asked should be how much data actually needs to be stored, and whether the external data archive will be subject to random-access or incremental change.
If a computer user is usually only going to create word processor documents and spreadsheets, then most of their files will probably be in the order of a few hundred KB or maybe occasionally a few MB in size. If, however, a computer is being used to store and manipulate digital photographs, then average file sizes will be in the region of several MB in size (and potentially tens of MB if professional digital photography is being conducted). Yet another level of storage higher, if a computer is being used to edit and store video, individual file sizes will probably be measured in hundreds of MB or even a few GB. For example, an hour of DV format video footage consumes about 12GB of storage. Non-compressed video requires even more space -- for example 2GB for every minute of standard definition footage, and 9.38GB for each minute of non-compressed 1920x1080 high definition video. Knowing what a computer is going to be used for (and of course many computers are used for a variety of purposes) is hence very important when planning storage requirements.
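As a rough planning aid, the per-minute figures quoted above can be turned into a small estimator. This is only a sketch: the format keys and the function name are made up for illustration, and the rates are simply the approximate values given in the text.

```python
# Approximate storage rates quoted in the text, in GB per minute of footage.
GB_PER_MINUTE = {
    "dv": 12 / 60,            # about 12GB per hour of DV footage
    "uncompressed_sd": 2.0,   # non-compressed standard definition
    "uncompressed_hd": 9.38,  # non-compressed 1920x1080
}


def footage_storage_gb(video_format, minutes):
    """Estimate the storage in GB needed for a given length of footage."""
    return GB_PER_MINUTE[video_format] * minutes


if __name__ == "__main__":
    for video_format in GB_PER_MINUTE:
        needed = footage_storage_gb(video_format, 90)
        print(f"90 minutes of {video_format}: about {needed:.0f}GB")
```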
In addition to capacity requirements, whether the data in a user's back-up archive will have to change in a random-access or incremental fashion can be a critical factor in the choice of external storage devices. A digital photographer, for example, will probably have incremental back-up requirements where each time they complete a shoot they will want to take a back-up of several hundred MB or a few GB of photographs that will subsequently never change. In other words they will want to keep a permanent record of an historical digital state of the world. Writing data like photographs to write-once media (such as CD-R or DVD-R as discussed below) would hence be perfectly acceptable. The photographer's total archive may be hundreds of GB in size, but would only be added to incrementally with previously stored data never being changed.
In contrast, somebody producing 3D computer animation may be re-rendering tens of GB of output on a regular basis to replace previous files in a random-access fashion. In this situation not only would re-writable media be more suitable, but the speed of the back-up device would become far more critical. Having to take a copy of even 50GB of data at the end of a working day is a very different proposition to a few GB, let alone a few tens or hundreds of MB. Further discussion of the suitability of different media for incremental and random-access back-up continues within the following explanation of available storage devices and technologies.
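The distinction between the two back-up patterns can be sketched in a few lines of Python. This is a simplified illustration rather than a real back-up tool, and the function names are invented for the example. The incremental version only ever adds files to the archive, whereas the random-access version also overwrites archived files whose contents have changed.

```python
import filecmp
import shutil
from pathlib import Path


def _copy(path, target):
    """Copy one file, creating the destination folders if needed."""
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, target)


def incremental_backup(source, archive):
    """Add files missing from the archive; archived files are never changed."""
    source, archive = Path(source), Path(archive)
    copied = []
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        target = archive / path.relative_to(source)
        if not target.exists():
            _copy(path, target)
            copied.append(path.relative_to(source))
    return copied


def random_access_backup(source, archive):
    """Add missing files and replace archived files whose contents differ."""
    source, archive = Path(source), Path(archive)
    copied = []
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        target = archive / path.relative_to(source)
        if not target.exists() or not filecmp.cmp(path, target, shallow=False):
            _copy(path, target)
            copied.append(path.relative_to(source))
    return copied
```

Only the second function needs media that can be rewritten, which is why write-once disks suit the photographer but not the animator.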
Spinning hard disk (HD) drives are today the most common means of high capacity computer storage, with most desktop and laptop computers still relying on a spinning hard disk to store their operating system, application programs and at least some user data. Traditional, spinning hard disk drives consist of one or more disk "platters" stacked one above the other, and coated in a magnetic medium that is written to and read by the drive heads. As discussed in the hardware section, hard disk drives can transfer data directly to other computer hardware via one of three interface types (SATA, IDE/UDMA, or SCSI) and come in a range of speeds from 4200 to 15000 revolutions per minute (RPM).
Hard disks are almost always manufactured with either 3.5" or 2.5" platters (although just to break the rule a few smaller -- most notably 1.8" -- and even some larger platter disks are made by some manufacturers). For many years 3.5" hard disks have been standard for desktop computers and servers, and 2.5" hard disks for laptops. Yet this is now starting to change, with enterprise class 2.5" hard disks now increasingly being used in servers and some desktop computers due to their low power requirements. Indeed, the fact that Western Digital's top-of-the-range Velociraptor hard drives now use a 2.5" rather than a 3.5" mechanism speaks volumes and probably indicates that within a few years most spinning hard disk drives are likely to be 2.5". (Note that some Velociraptor models are supplied in a metal "sled" for fitting into a 3.5" bay.)
Whilst at least one hard disk is usually required inside a computer as the "system disk", additional hard disk drives can be located either "internally" inside the main computer case, or connected "externally" as an independent hardware unit. A second internal hard disk is highly recommended where a user regularly works on very large media files (typically digital video files) that are always accessed directly off the hard disk, rather than loaded into RAM. Where such files are loaded off a computer's system disk, the disk drive heads are inevitably constantly nipping back and forth between accessing the large media file and writing temporary operating system files, and this both degrades performance and reduces the life of the disk.
On servers and high-end PC workstations (such as those used for high-end video editing), at least two hard disks are often linked together using a technology called RAID. This stands for "redundant array of independent disks" (or sometimes "redundant array of inexpensive disks"), and stores the data in each user volume on multiple physical drives.
Many possible RAID configurations are available. The first is called "RAID 0". This divides or "stripes" the data in a storage volume across two or more disks, so that with two disks half of each file is written to one disk, and half to the other. This improves overall read/write performance without sacrificing capacity. So, for example (as shown above), two 1TB drives may be linked to form a 2TB array. Because this virtual volume is faster than either of its component disks, RAID 0 is commonly used on video editing workstations.
In contrast to RAID 0, "RAID 1" is primarily intended to protect data against hardware failure. Here data is duplicated or "mirrored" across two or more disks. The data redundancy so created means that if one physical drive fails there is still a complete copy of its contents on another drive. However, this does mean that drive capacity is sacrificed. For example (as shown above), a 1TB RAID 1 volume requires two 1TB disks. While data write performance is not improved by using RAID 1, data read performance is improved as multiple files can be accessed simultaneously from different physical drives.
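The idea of striping can be illustrated with a toy Python sketch that deals fixed-size blocks of data out to each "disk" in turn. This is only a model of the concept: the function names and the tiny four-byte block size are chosen for illustration, and real RAID controllers work on much larger blocks at the device level.

```python
def stripe(data, disk_count=2, block_size=4):
    """Deal fixed-size blocks of data out to the disks in turn (RAID 0 style)."""
    disks = [bytearray() for _ in range(disk_count)]
    for index, start in enumerate(range(0, len(data), block_size)):
        disks[index % disk_count] += data[start:start + block_size]
    return [bytes(disk) for disk in disks]


def unstripe(disks, block_size=4):
    """Rebuild the original data by reading one block from each disk in turn."""
    total = sum(len(disk) for disk in disks)
    offsets = [0] * len(disks)
    rebuilt = bytearray()
    turn = 0
    while len(rebuilt) < total:
        disk = turn % len(disks)
        rebuilt += disks[disk][offsets[disk]:offsets[disk] + block_size]
        offsets[disk] += block_size
        turn += 1
    return bytes(rebuilt)


if __name__ == "__main__":
    halves = stripe(b"ABCDEFGHIJ")
    print(halves)            # each disk holds only part of the file
    print(unstripe(halves))  # both disks are needed to get the file back
```

Because every file is spread across all of the disks, the loss of any one drive makes the whole volume unreadable, which is why RAID 0 offers speed but no protection.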
If more than two drives are used, several other configurations become possible. For example, using three or more drives, "RAID 5" strikes a balance between speed and redundancy by striping data across two drives but also writing "parity" data to a third. Parity data maintains a record of the differences between the blocks of data on the other drives, in turn permitting file restoration in the event of a drive failure. (A great explanation of parity and RAID 5 in detail can be found in this video.) For mission-critical applications, "RAID 10" stripes and mirrors data across four or more drives to provide the gold standard in performance and redundancy. You can find a more detailed explanation of RAID 0, 1, 5 and 10 on TheGeekStuff.com.
Many modern personal computer motherboards permit two SATA hard disk drives to be set up in a RAID configuration. However, for users who do not require the extra speed provided by RAID 0, RAID 5 or RAID 10, there are relatively few benefits to be gained. Not least, it needs to be remembered that any hardware setup featuring more than one internal hard disk -- whether or not in a RAID configuration -- at best provides marginal improvements in data security and integrity. This is simply because it provides no more tolerance to the theft of the base unit, nor to power surges or computer power supply failures (which can simply fry two or more hard drives at once rather than one). A summary of RAID can also be found in my Explaining RAID video.
Except where two internal hard disks are considered essential on the basis of performance (and possibly convenience), a second hard disk is today most advisably connected as an external unit, or what is sometimes now known as a "DAS" or direct attached storage drive. DAS external hard disks connect via a USB, FireWire or E-SATA interface (see the hardware section), with USB being the most common. The highest quality external hard drives routinely include at least two of these interfaces as standard, hence maximising their flexibility for moving data between different computers. As explained in the networking section, today some external hard disks can also be purchased as NAS (network attached storage) devices that can easily be shared between users across a network.
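Parity is usually calculated with the exclusive-or (XOR) operation, and the following Python sketch shows why that allows a lost block to be rebuilt. It is a deliberately simplified model with made-up block contents and an illustrative function name. Real RAID 5 controllers also typically rotate which drive holds the parity for each stripe, rather than dedicating one drive to it.

```python
def xor_blocks(*blocks):
    """XOR equal-length blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for position, byte in enumerate(block):
            result[position] ^= byte
    return bytes(result)


# Two data blocks, as they might sit on the first two drives.
block_a = b"DATA"
block_b = b"WXYZ"

# The parity block written to the third drive.
parity = xor_blocks(block_a, block_b)

# If the drive holding block_b fails, it can be rebuilt from what survives.
rebuilt_b = xor_blocks(block_a, parity)

if __name__ == "__main__":
    print(rebuilt_b == block_b)
```

The same calculation works whichever single drive is lost, because XOR-ing all of the surviving blocks always yields the missing one.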
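The capacity trade-offs between the RAID levels described above can be summarised in a short Python sketch. The function name is illustrative, and the sketch assumes that every drive in the array is the same size. It also uses the standard capacity rules for RAID 5 (one drive's worth of space given over to parity) and RAID 10 (half of the drives hold mirror copies), which go slightly beyond the two-drive examples given earlier.

```python
def usable_capacity_tb(level, drive_count, drive_size_tb):
    """Return the usable capacity of an array of identical drives, in TB."""
    if level == 0:
        if drive_count < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return drive_count * drive_size_tb         # all space holds data
    if level == 1:
        if drive_count < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drive_size_tb                       # every drive is a copy
    if level == 5:
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drive_count - 1) * drive_size_tb   # one drive's worth is parity
    if level == 10:
        if drive_count < 4 or drive_count % 2:
            raise ValueError("RAID 10 needs an even number of drives, four or more")
        return (drive_count // 2) * drive_size_tb  # half the drives are mirrors
    raise ValueError("only RAID 0, 1, 5 and 10 are covered by this sketch")


if __name__ == "__main__":
    for level, count in [(0, 2), (1, 2), (5, 3), (10, 4)]:
        print(f"RAID {level} with {count} x 1TB: {usable_capacity_tb(level, count, 1)}TB")
```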
For most purposes, external hard disks offer comparable performance to most internal hard disks -- even when used for highly disk intensive processes such as video editing. This will be especially the case when a drive is connected via an interface such as USB 3.0. External hard disks also have the added convenience of being easily physically separable from the computer for secure and/or off-site storage. A user can also purchase additional external hard disks as their data storage requirements dictate.
External hard disk units normally include one 3.5" or 2.5" hard disk inside their case. Units with a 3.5" disk tend to offer a cheaper cost per megabyte. Units based on 2.5" drives are smaller and usually do not require an external power adapter (as a computer can supply enough electricity down the USB or FireWire hard disk connection cable). Some external hard disks now include several physical disks inside one unit in some form of RAID configuration.
External hard disks offer a user fast and high-capacity external storage with a low cost-per-megabyte. In most instances, they are also the only real option where high capacity, random access data archives have to be maintained. This said, many users will never have such archives, and there are several other disadvantages to DAS-style external hard disks.
For a start, whilst their cost-per-megabyte is low, their cost-per-unit is high compared to most optical media and solid state storage devices. External hard disks are also fairly easy to physically damage via impact or by getting them wet. Reliance on a single external hard drive can also place an entire data archive "in one basket", and is of no use at all where data either needs to be physically exchanged between users (as still happens even in the days of the Internet), or has to be accessed via a media device to which an external hard disk cannot be connected.
External hard drive units are also somewhat cumbersome for those wrangling tens of terabytes of data. For this reason, some people now transfer and store large quantities of data on bare hard disks connected to their computer as required (and usually via a flying E-SATA lead). However, this is hardly ideal, not least because both connectors and the drives themselves can become damaged. As shown above in my Explaining Data Wrangling video, one solution for those who need to work with a great many hard drives is to house the disks in caddies that then slot into a PC-mounted bay. Such caddies can sometimes also be connected to other computers via USB or E-SATA.
As a consequence of the above limitations, computer users handling both small and large quantities of data tend not to rely entirely on hard disk technology, and will therefore also make use of optical, solid-state or online storage technologies.
Almost all optical storage involves the use of a 5" disk from which data is read by a laser. Optical media can be read only (such as commercial software, music or movie disks), write-once, or rewritable, and currently exists in one of three basic formats. These are compact disk (CD), digital versatile disk (DVD) and Blu-Ray disk (BD). A fourth format called High-Definition DVD (HD DVD) is now dead-in-the-water.
Compact disk is a very mature, low-cost and reliable storage medium particularly well suited for most personal computer users for incremental data archiving, as well as for the physical exchange of moderate-sized quantities of data. Writable compact disks can be either CD-R (which are a write-once medium) or CD-RW (to which data can be written and erased typically a few hundred times). The storage capacity of a compact disk is up to about 700MB for CD-R and somewhat less for CD-RW media (and depending on the format used to write the data).
For the reliable back-up or exchange of up to 700MB of data there is still little to beat a compact disk. Problems accessing a CD-R disk are now very rare, and the cost of the disks is low if bought in bulk in "pancakes" of 25, 50 or 100 disks. The media are also physically very durable -- and certainly considerably more so than an external hard disk. The only real drawbacks to compact disks for data storage are the speed of access (even if a modern drive will write and verify a CD-R in well under five minutes) and the relatively limited capacity.
DVD followed compact disk into the optical storage arena, and most new computers are now equipped with an optical drive that will read and write both CD and DVD media. Due to format battles as yet unresolved (and now unlikely ever to be resolved!), DVD comes in two write-once formats (DVD-R and DVD+R), as well as two re-writable formats (DVD+RW and DVD-RW). Many older DVD writers will only write to either DVD-R and DVD-RW or to DVD+R and DVD+RW, so users need to take care to purchase the right media. Also many DVD drives will only read one type of rewritable media, and again users need to carefully take this into account when producing disks for other people. In general, it is fairly widely accepted that DVD+R is the most "stable" widely-readable write-once format (especially in domestic DVD video players) due to having better error correction and burning control than DVD-R, whilst DVD+RW is the most flexible re-writable format.
To make matters a little more confusing, Panasonic also created a format called DVD RAM. This is actually a superb re-writable technology (disks can reliably be re-written tens of thousands of times, as opposed realistically to hundreds of times for DVD-RW or DVD+RW). DVD RAM disks are also starting to be widely used in domestic DVD recorders, and are available in caddy units that can be either single or double sided. For video recording purposes and stable data archiving, DVD RAM is the media of choice. The only constraint is that many DVD drives still won't read or write DVD RAM disks (although the number that will is rapidly growing), with even fewer drives accepting the caddied disks that offer the media the best protection from dust, and hence the maximum durability. Windows XP also has only limited support for DVD RAM.
The standard capacity for any format of DVD media is 4.7GB. Commercial read-only disks (as used to distribute movies) almost double this to 8.5GB by storing the data on two layers. Yet two more formats of DVD write-once disk (DVD-R DL and DVD+R DL) also exist to copy the same trick to raise writeable DVD data storage capacity to 8.5GB. However, once again not all drives will write these media, and in terms of cost per gigabyte it remains far cheaper (if less environmentally or archive-space friendly) to write two DVD-R or DVD+R disks rather than a single double layer (DL) disk. Double-sided DVD RAM disks -- that physically have to be turned over to read or write the other side -- have a capacity of 9.4GB.
Blu-Ray disk is the high-capacity successor to DVD, and the only surviving new optical disk media on the block. It was developed by the Blu-Ray Disk Association (BDA) as a higher-capacity replacement for DVD (and especially to allow for the distribution and home recording of movies in high definition). Whilst most of the attention in this area has until recently been focused on Blu-Ray's battle with HD DVD (see below), for computer users Blu-Ray already offers write-once (BD-R) and re-writable (BD-RE) disk capacities of 25GB on a single-layer disk and 50GB on a dual-layer disk. Just as importantly for the format, multi-hundred GB disks are already in the lab and on the consumer horizon.
More information on Blu-Ray can be found via the FAQ files at Blu-Ray.com.
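To compare the optical formats for a given archive, the capacities quoted above can be dropped into a simple Python calculation. This is only a rough sketch with illustrative names. It ignores file system overheads and the fact that individual files cannot always be split neatly across disks, so real-world disk counts may be a little higher.

```python
import math

# Approximate capacities quoted in the text, in GB.
DISK_CAPACITY_GB = {
    "CD-R": 0.7,
    "DVD": 4.7,
    "DVD DL": 8.5,
    "BD": 25,
    "BD DL": 50,
}


def disks_needed(archive_gb, media):
    """Return the minimum number of disks needed to hold an archive."""
    return math.ceil(archive_gb / DISK_CAPACITY_GB[media])


if __name__ == "__main__":
    for media in DISK_CAPACITY_GB:
        print(f"100GB archive on {media}: {disks_needed(100, media)} disks")
```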
It is worth noting for completeness that HD DVD was the contender to Blu-Ray Disk to replace DVD as the next generation optical storage media for both computer data storage and domestic video use. HD DVD disks had a 15GB capacity (lower than Blu-Ray disk at 25 or 50GB, and not that much higher than dual layer DVD-R DL or DVD+R DL disks at 8.5GB). HD DVD was created by Toshiba and NEC, and was backed by Microsoft. However, most movie studios and other computer industry players (including Sony, Panasonic, Philips, Samsung, Pioneer, Sharp, JVC, Hitachi, Mitsubishi, TDK, Thomson, LG, Apple, HP and Dell) were on the side of Blu-Ray. Indeed, it was following the defection in early 2008 of Warner Bros from the HD DVD camp that Blu-Ray won the high capacity optical disk format wars. Hurrah!
As an aside, in the television industry, Sony now sells professional video cameras and recorders that use its own 23.3GB XD-CAM optical disk storage system.
Whatever format of optical disk media users choose, an ongoing debate concerns the archival qualities of all forms of optical media (ie how likely it is that data is going to remain on a disk in the long-term). Everybody seems to agree that archives should never be made on re-writable media (ie CD-RW, DVD+RW, DVD-RW or BD-RE), and advice to make new copies of optical media at least once every few years is not uncommon. For an in-depth discussion of this issue, see this excellent article on How To Choose CD/DVD Archival Media. And if you don't want an in-depth discussion, the short recommendation from this article is to archive on write-once media manufactured by Taiyo Yuden (the creators of recordable CD), and as available in the UK from retailers including DVDshoponline. To make matters far easier, in 2010 Taiyo Yuden bought the JVC Media brand, meaning that Taiyo Yuden media can now be purchased in (some) JVC boxes. Another solid archival option is to purchase "gold archival" DVD media made by Verbatim or Kodak, which is fairly widely available (if at about triple the cost of standard DVD-R or DVD+R disks).
Solid state storage devices store computer data on non-volatile "flash" memory chips rather than by changing the surface properties of a magnetic or optical spinning disk. With no moving parts, solid state drives (SSDs) are also very much the future for almost all forms of computer storage.
Sometime in the second half of this decade, solid state drives are likely to replace spinning hard disks in most computers, with several manufacturers now offering hard-disk-replacement SSDs. These are often very fast indeed, extremely robust and use very little power. As pictured above, typically today most hard disk replacement SSDs are the same size as -- and hence a direct replacement for -- a 2.5" hard drive. They also usually connect via a SATA interface.
Unfortunately the prices of solid state drives are currently high, with the lowest capacity disks (of 30 to 64GB) costing in the £60 to £120 bracket, and the highest capacity disks (currently up to 512GB) being in the region of £1,000. At present SSDs are therefore generally only being used in high-end PCs and laptops, and as a means of increasing robustness, reducing noise, decreasing power consumption, and often significantly decreasing boot-up times.
As a notable exception, for a couple of years some ultramobile "netbook" computers and some low-power desktop computers -- such as the Asus Eee PC -- used an SSD rather than a traditional hard drive, which was made cost-effective by limiting disk sizes to around 4-8GB. Sadly, on netbooks this trend has now died out. However, the new Google Chromebooks are SSD-based, as are the lovely if pricey MacBook Air notebooks. For more information on solid state drives as hard disk replacements, you may also like to watch the following video:
FLASH MEMORY CARDS
The above discussion of hard-disk replacement SSDs noted, at present for most people most solid state storage devices come in two basic forms: flash memory cards and USB memory sticks.
Flash memory cards were developed as a storage media for digital cameras and mobile computers. They consist of a small plastic package with a contact array that slots into a camera or other mobile computing device, or an appropriate memory card reader. Such readers usually have several slots (to accommodate the various formats of flash memory cards now available), and can either be integrated into a desktop computer or laptop's case, or connected via a USB port as an external hardware unit. In addition to still and video digital cameras, many mobile phones, tablets, netbooks, media players, audio recorders and televisions now also have slots for reading and writing a flash memory card.
The capacity of flash memory cards on the market currently ranges from 8MB to 64GB. There are also six major card formats, each with its own type of card slot. The most common format is the secure digital or SD card (see below). Next most popular are compact flash (CF) cards, which were the first popular format introduced, and which are used by many professional digital cameras and audio recorders. Finally come Sony's Memory Stick format (not to be confused with a USB memory stick), the multi-media card (MMC) and the xD picture card (XD card).
Adapters are available to allow a compact flash card to be connected to a computer's motherboard instead of a hard disk, and these are becoming popular on small-format computers running the Linux operating system. As another aside, Panasonic have their own video recording flash memory card format called the P2 card. This is internally based on four high-speed SD cards, currently available in 16, 32 or 64GB capacities, and is used instead of tape on some professional video equipment. In April 2007, Sandisk and Sony also released an alternative flash memory card format -- the SxS card -- currently also available in 16, 32 and 64GB capacities. This said, even in professional video, compact flash and even SD cards are becoming the dominant recording media.
SD cards are as noted above the most popular flash memory cards now on the market, and come in so many variants that they do require some explanation. For a start, SD cards come in three physical sizes. These comprise standard-size SD cards (first developed in 1999), smaller mini SD cards (introduced on some mobile phones in 2003), and the even smaller micro SD cards. The latter were invented in 2005 and are becoming increasingly popular on smartphones and tablets. While the larger cards cannot fit in smaller card slots, adapters are available to enable micro and mini cards to be accessed by any device that accepts a standard-size card.
SD cards also come in three capacity types known as SD, SDHC and SDXC. The first of these can store up to 2GB of data. SDHC (SD high capacity) cards are then available in capacities of between 4 and 32GB, while SDXC (SD extended capacity) cards range from above 32GB up to a theoretical 2TB (although at present only 64GB cards are on the market).
Because SD cards now come in three capacity types, not all SD devices can access all SD cards of the same physical design. While standard SD cards can be read by anything, SDHC cards should only be inserted into SDHC or SDXC devices. SDXC cards must then only be used with the latest SDXC hardware. If you try to use an SDXC or SDHC card in a device that does not support it then you may lose data or even damage the card.
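The compatibility rules above boil down to "a device can use its own capacity type and any older one", which can be expressed in a short Python sketch. The function names are invented for illustration, and the sketch assumes that a 32GB card counts as SDHC and that anything larger is SDXC.

```python
# Capacity types in order of introduction, oldest first.
CAPACITY_TYPES = ["SD", "SDHC", "SDXC"]


def card_type(capacity_gb):
    """Classify a card by capacity (assumes a 32GB card is SDHC)."""
    if capacity_gb <= 2:
        return "SD"
    if capacity_gb <= 32:
        return "SDHC"
    if capacity_gb <= 2048:
        return "SDXC"
    raise ValueError("capacity is beyond the theoretical 2TB limit of SDXC")


def device_can_use(device_type, card):
    """A device can use cards of its own capacity type or any older type."""
    return CAPACITY_TYPES.index(card) <= CAPACITY_TYPES.index(device_type)


if __name__ == "__main__":
    card = card_type(64)
    for device in CAPACITY_TYPES:
        verdict = "OK" if device_can_use(device, card) else "do not insert"
        print(f"64GB {card} card in an {device} device: {verdict}")
```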
To further add to the confusion(!), SD cards are currently also available in five speed classes. These are known as class 2, class 4, class 6, class 10, and UHS-1 (ultra high speed 1). Many manufacturers also label cards with a speed multiple that compares them to a CD-ROM drive. Absolute data transfer ratings are sometimes also included. However, in practical terms it is the speed class that really matters.
As may be expected, the higher an SD card's speed class, the faster it will be but the more it will cost. For most purposes class 4 or class 6 cards are fine. This said, class 10 or UHS-1 are best for high definition video or when otherwise handling large quantities of data. You can learn lots more about SD cards in my Explaining SD Cards video, as well as from the SD Association.
USB MEMORY STICKS
USB memory sticks (or USB memory keys, USB memory drives, or whatever you choose to call them!) are basically a combination of a flash memory card and a flash memory card reader in one handy and tiny package. Over the past five years, USB memory sticks have also become the dominant means of removable, re-writable portable data storage, and look set to remain so for some time. Not least this is because of their size, ever-increasing capacity (which currently ranges from about 512MB to 256GB), and perhaps most importantly their inherent durability.
As with other storage devices, there are two key factors to consider when selecting a USB memory stick: capacity and data transfer speed. Whilst most consumer attention remains on the former, the latter can be at least as critical. It is not uncommon for some USB memory sticks to transfer data ten or more times slower than others (I recently compared transferring 1GB of files to a high-specification Corsair Voyager USB memory stick and to a cheaper "own brand" model, and measured transfer times of under 2 minutes and approaching 30 minutes respectively). The extent to which this matters depends as discussed previously on whether the data in your archive is only updated incrementally (with each new document), or more completely (with a large number of files or a few large files replaced on a regular basis). A USB memory stick that takes 30 minutes to shift a gigabyte of data is fine if you only copy a few tens of MB or less to it per day. However, if you regularly have to back-up multiple GB, you need a fast USB memory key if you are not to lose your sanity.
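The practical impact of these transfer times becomes clearer when they are converted into data rates. The Python sketch below does this using rounded figures of 2 and 30 minutes for 1GB (taken here as 1024MB). The measured times were "under" and "approaching" those values, so the results are rough bounds rather than precise speeds, and the function names are purely illustrative.

```python
def transfer_rate_mb_per_s(size_mb, seconds):
    """Average transfer rate in MB per second."""
    return size_mb / seconds


def transfer_minutes(size_mb, rate_mb_per_s):
    """Minutes needed to move a given amount of data at a given rate."""
    return size_mb / rate_mb_per_s / 60


# Rounded versions of the timings quoted above for 1GB of files.
fast_rate = transfer_rate_mb_per_s(1024, 2 * 60)
slow_rate = transfer_rate_mb_per_s(1024, 30 * 60)

if __name__ == "__main__":
    print(f"Fast stick: about {fast_rate:.1f}MB per second")
    print(f"Slow stick: about {slow_rate:.2f}MB per second")
    for name, rate in [("fast", fast_rate), ("slow", slow_rate)]:
        minutes = transfer_minutes(5 * 1024, rate)
        print(f"A 5GB daily back-up on the {name} stick: about {minutes:.0f} minutes")
```

On these figures the slower stick would need around two and a half hours for a 5GB back-up that the faster stick completes in about ten minutes.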
Fortunately, just why some solid state disks are slower than others is not a mystery. Rather, it is a function of the type of flash memory chips used to hold the data. Without going into great technicalities, these chips come in two varieties called single level cell (SLC) and multi level cell (MLC). Basically, MLC flash chips store two or more bits of data in each memory cell, whilst SLC chips store only one. MLC solid state disks are therefore cheaper to produce than SLC disks at any given capacity, but due to storing more than one bit of information in each memory cell take longer to write and read data. If you need a fast USB key, memory card or indeed hard-disk replacement SSD then you need to pay more to obtain an SLC device.
Many computer users may never have to back-up their data to a removable media or external hard drive (and indeed may be discouraged or banned from doing so) because their files will be stored and backed-up on their company's network servers. Even in the home (and as discussed in the networking section), back-up to a server is also now an option for many. Far more fundamentally, all of those switching in whole or part to cloud computing are now storing at least some of their data out on the Internet. And even those not using online applications and processing power now have the option of backing up moderate amounts of data online, and often for free!
Files stored and/or backed-up online are still saved to a hard disk rather than to some magic, new alternative media. However, the fact that the disk is located remotely to your computer, can be accessed from anywhere, and is probably backed up by the service provider(?), can make online storage and back-up very attractive. Indeed, when Google added 1GB of free online storage for any type of file to its Google Docs online office suite it even stated in the press release that one of their intentions was to remove the need for people to use and carry USB memory keys.
Cloud data storage services come in two flavours. Some simply provide online filespace, whilst others additionally include a back-up synchronization service. An online filespace can be thought of as a hard disk in the cloud that can be accessed with a web browser to upload or download files. One example is Microsoft's Windows Live Skydrive, which provides 25GB of personal storage absolutely free (although there is a maximum file size limit of 50MB). As already noted, Google Docs offers 1GB of free online storage to which any kind of file can be uploaded up to a maximum size also of 1GB. Google then charge $5 a year for each additional 20GB. Another popular online filespace provider is box.net.
For those people who may forget to regularly back-up their data to one of the above, there are cloud storage services that automate the process. These require the installation of a piece of software on each computer that uses them. This local application then automatically backs up data to the cloud, and may also synchronize it across PCs. Such a service is offered by Dropbox, which describes itself as a kind of 'magic pocket' that becomes available on all of your computing devices. For a more extensive listing of online storage services, please look in the cloud computing directory.
Every major medium has now gone digital, and as a result both companies and individuals are creating an increasing volume of data not just to store initially, but just as importantly to manage and back-up into a coherent archive. Indeed, in the film industry, where the digital storage requirements for high-speed, random access archives can run into tens of terabytes on a major blockbuster, the job title of "data wrangler" has been born to signal the requirement for people to take on effective data management in order to keep the production running effectively. (With the decline of the Western, there has been a decline in the need for horse wranglers, though sadly the skill sets required for data wranglers and horse wranglers are not similar, with no former horse wrangler having been reported to have taken up residence in a data center.)