This page is my quick crib sheet of issues to be considered when making copies of data for backup or archival purposes.
The process for backup and archiving can be very similar but there are key differences. These differences are important when considering what method(s) you will use.
A backup is the act of taking a snapshot of the current state of a system, to protect the data on the system, and hence your business, from a catastrophic event. Most days nothing happens and the backup is not used.
The following is a short table of the reasons people have wanted a backup, based on 30 years' memory of working in the IT industry.
Event | Count | Comment |
---|---|---|
User | 100+ | I did not mean to :- I genuinely thought that phone call was from:- |
Mechanical | 100+ | Hard disks fail. For 5.25" and 3.5" drives made before 2005, the most common issues I have seen are bearing or motor related. More recent drives most commonly seem to develop issues reading particular sectors. Either way the drives fail. RAID helps, but you can still hit issues when the rebuild process discovers additional problematic sectors. Side note: modern disk drives, especially the small 2.5" units common in laptops, notebooks, and portable drives, seem to be particularly sensitive to external magnetic fields. Keep away from:- |
Loss of power | 30 | Modern file systems keep journals of key changes, and are much more tolerant of losing power mid change. However, they do not guarantee that all files will be complete and consistent. |
Lightning | 12 | Risk varies greatly with location and recent weather. |
Fire | 2 | Mitigate with good building design. Do not leave unterminated lengths of cable in void spaces. |
Canal | 1 | Break in, office trashed, computer thrown in canal. |
Wiring fault | 1 | Data cabling is not designed to carry mains power. |
Malicious | 1 | Ransomware |
This has stood the test of time, but it is pretty much a minimum.
Whatever your backup procedure, it should be done on a regular schedule. For a home computer it may be that very little changes on a day to day basis, but at the start of each month you do online banking, check your statements, and save copies.
In this case backing up your documents each month, when you have finished checking your statements, would be a logical choice.
In a business situation where a lot more changes, I would suggest performing a backup procedure at the end of each working day. The procedure should include writing one copy to removable media, and taking it off site.
For a 5 day operation have 5 tapes labelled Monday, Tuesday, Wednesday, Thursday, Friday, plus a couple of spares. If the operator who does the backup on Monday falls ill, another staff member can do Tuesday's backup. If you have a problem on a Thursday there is no confusion as to which tape you would like to restore from, it is the one labelled Wednesday.
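The labelling scheme above can be sketched in a few lines of Python. This is only an illustration, assuming a Monday to Friday working week; the function names `tape_for` and `restore_tape_for` are made up for this example.

```python
import datetime

# One tape per working day, as in the scheme described above.
TAPES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def tape_for(day: datetime.date) -> str:
    """Label of the tape written at the end of the given working day."""
    if day.weekday() > 4:  # 5 = Saturday, 6 = Sunday
        raise ValueError("no backup is scheduled at the weekend")
    return TAPES[day.weekday()]


def restore_tape_for(problem_day: datetime.date) -> str:
    """Label of the most recent tape written before the day of the problem."""
    previous = problem_day - datetime.timedelta(days=1)
    while previous.weekday() > 4:  # step back over the weekend
        previous -= datetime.timedelta(days=1)
    return TAPES[previous.weekday()]


if __name__ == "__main__":
    # 9 June 2016 was a Thursday, so the tape to restore from is Wednesday's.
    print(restore_tape_for(datetime.date(2016, 6, 9)))
```

A problem on a Monday falls back to Friday's tape, which is the case people most often get wrong when working it out by hand.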
This is a particularly nasty class of malware that encrypts, or scrambles, data files on your computer. The sender offers a blackmail proposition: pay us some money or face the costs of losing your data. Paying up does not guarantee that you get your data back.
Key issue with regard to backups is that the malware often launches a phased attack.
If the attacker decides to run through these steps quickly, say in 30 minutes, then even if the attack coincides with a backup, the data from the previous backup should be OK.
The snag is if the attacker is leisurely, taking weeks or even months to exfiltrate data before you notice. Will your backup schedule and method retain a safe backup in this case?
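One way to reason about that question is to compare how far back your retained backups reach with how long the attacker may have been inside. The sketch below is a rough illustration only, assuming a rotating set of daily tapes written Monday to Friday and nothing else retained; `retained_backups` and `has_clean_backup` are hypothetical names.

```python
import datetime


def retained_backups(today, tapes=5):
    """Dates of the backups held on a rotating set of daily tapes (Mon-Fri)."""
    dates = []
    day = today
    while len(dates) < tapes:
        day -= datetime.timedelta(days=1)
        if day.weekday() < 5:  # skip Saturday and Sunday
            dates.append(day)
    return dates


def has_clean_backup(today, dwell_days, tapes=5):
    """True if at least one retained backup predates the start of the attack."""
    attack_start = today - datetime.timedelta(days=dwell_days)
    return any(d < attack_start for d in retained_backups(today, tapes))


if __name__ == "__main__":
    today = datetime.date(2016, 6, 9)
    for dwell in (1, 6, 30):
        print(dwell, "days inside:", has_clean_backup(today, dwell))
```

With only five daily tapes, an attacker who has been present for a month has already been captured on every tape in the set, which is the argument for keeping some longer-lived copies.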
Making a copy of a set of data currently on a server, for long term storage, with the expectation that the data is going to be deleted from the server, so that the only copy will be the long term archive.
Generally make multiple identical copies, and add file checksums or parity information, over and above the error correction built into the recording medium.
All storage media degrades with time. For printed paper that may be 100s or even 1,000s of years, but for all current storage technologies it is much shorter.
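As an illustration of the checksum idea, the sketch below records a SHA-256 digest for every file under a directory and later reports which files are missing or no longer match. It is a minimal example using only the Python standard library; the function names and the choice of SHA-256 are my assumptions, and it detects damage rather than repairing it (repair needs the extra copies or parity data).

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def damaged_files(root, manifest):
    """Names of files that are missing or no longer match the manifest."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

Store the manifest alongside each copy of the archive, and ideally somewhere separate as well, so that a damaged copy can be identified and replaced from a good one.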
Media | Expected life | Comment |
---|---|---|
FLASH | 5 to 10 years | The charge on the memory cells slowly degrades; eventually too many cells will be in error for the error recovery to recover your data. |
Tape | 10 to 20 years | Tape substrate degrades. |
Optical dye based | 100 years? | Depends a lot on the disk quality and construction. |
Verbatim M-DISC | 1000 years | Sold specifically for archiving, different construction. |
The next consideration is what equipment you will have available to access the archive in 5, 10, or 20 years' time. Servers are generally expected to have an operating life of around 5 years, so you may be 2 or 3 server changes down the road when you want to access that archive.
I figure this is fairly obvious, but have come across multiple cases where it has been missed.
If you are putting data away for long term storage, make sure you also include a copy of the applications needed to access that data.
Take a common example: you run a computerized payroll. Each year the government makes changes to the tax rules, so at the end of each financial year you apply an update. If you archive the data but not the program, then when you suddenly find a need to access that data from 6 years ago you have a problem.
This has worked quite well with our development teams.
Back up to a rotating set of tapes labelled Monday, Tuesday, Wednesday, Thursday, Friday.
At the end of each month remove the last tape used from the rotation, and label with the month and year.
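The month-end step can be sketched as follows. This is an illustration only: it assumes that "the last tape used" means the tape written on the last working day (Monday to Friday) of the month, and the function names are made up for the example.

```python
import calendar
import datetime


def last_working_day(year, month):
    """Last Monday-to-Friday date in the given month."""
    day = datetime.date(year, month, calendar.monthrange(year, month)[1])
    while day.weekday() > 4:  # step back over Saturday and Sunday
        day -= datetime.timedelta(days=1)
    return day


def archive_label(day):
    """Month-and-year label if this day's tape leaves the rotation, else None."""
    if day == last_working_day(day.year, day.month):
        return f"{calendar.month_name[day.month]} {day.year}"
    return None


if __name__ == "__main__":
    print(archive_label(datetime.date(2016, 6, 30)))
    print(archive_label(datetime.date(2016, 7, 28)))
```

Each tape that leaves the rotation has to be replaced by a fresh one carrying the same weekday label, so the scheme uses up roughly twelve extra tapes a year.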
Literally, the data is encrypted, sent, and decrypted at the receiving end. If you are using an HTTPS connection from a web browser to, say, your bank, then the data is encrypted by the bank, sent to you, and decrypted by your browser. The web server has access to the unencrypted data, but is hopefully well secured. Your browser may have cached the decrypted pages on your PC.
File sharing sites often support encryption in transit, but the shared files are usually stored unencrypted on their servers.
If your home computer is stolen, then the thief can in most cases access any file stored on its disks. High end disk drives marketed to companies working with sensitive data can have an encryption key loaded to the drive during power up. This key is used to encrypt and decrypt data written to or read back from the drive. If the drive is powered off it forgets the key.
Drives removed for maintenance, or stolen, therefore contain no usable data. Some tape drives, and some USB keys, have the same abilities. Some operating systems have the option to perform a similar function in software, at the cost of a small performance penalty.
End to end or whole life encryption. You encrypt your confidential data, send it somewhere for storage, then later recover it, possibly to a different location. At no point between when you encrypted it and when you received it back was the data decrypted.
A backup of a data set to a different storage area on a server. Very useful for rolling back an upgrade that went wrong. Usually the quickest form of backup to make, and also the quickest to restore.
May also be used as the first step of an off server or off site backup procedure, to minimise downtime on a database.
Backup is to removable media, or a different server in the same building. Protects against problems affecting the original server, but not against problems affecting the building.
If you have a lot of devices with small amounts of storage that need to be backed up, copying them all to a backup server fitted with large but relatively slow disks, then making an off site backup from that server, makes a lot of sense.
Backup has enough physical separation from the originating device that it is very unlikely that any single event would affect both.
A container (file) that contains multiple files, no matter why it was created, is referred to as an archive. This can be a great source of confusion.
How secure is your data? This question can be looked at in two different ways.
I am not aware of any GPS-enabled tracking systems for backup media that would detect when the media was "off site". From an administrative point of view, this could automate the record keeping for the offsite element of a 3-2-1 backup plan. It would also help with security, as people do misplace things, which can be extremely easy with the smaller forms of backup media.
LTO tape cartridges contain a contactless memory device, but it has a very short range, less than 20mm. It is intended to be used by the drive to record usage details.
Generally, tape cartridges intended for use in large automated library systems have had provision for a label using text and a Code 39 bar code. Since 2010, HP Enterprise has produced a range of labels for LTO media that incorporate an RFID tag. This is intended to work as part of an asset tracking system with detectors at key doorways.
As of 2016, MicroSDXC cards are readily available up to 128GB; the standard allows for 2TB. New mobile phones are expected to adopt the USB3 "type C" connector. What are the pros and cons of putting a high capacity MicroSDXC card in a mobile phone, and using that as a portable, trackable storage device?
A lot of different types of media are or have been used for backups or archiving. They all have their own strengths and weaknesses. As technology has changed so capacity and speed have improved.
The table below is as of 2016. The table is sorted by native capacity. Some devices, mostly tape drives, can attempt to compress data as it is written.
Devices which are no longer available, have a grey background.
Native capacity | Compressed capacity | Media type | Media ID | Media cost | Drive cost | I/O speed |
---|---|---|---|---|---|---|
60 MB | 120? | Tape | DC600A | | | |
125 MB | 250? | Tape | DC6150A | | | |
525 MB | 1G? | Tape | DC6525A | | | 300 KB/s? |
700 MB | ? | Optical | CD | 18p | £20 | 1x 150KB/s 4x 600KB/s 16x 2.4 MB/s |
1.2 GB | 250? | Tape | QIC136 | | | 300 KB/s? |
4 GB | ? | FLASH | iStorage datashur 4GB USB2 | £40 | N/A | USB2 |
4 GB | 8 GB | Tape | DDS2 | | | 600 KB/s, 2100 MB/hour |
4.7 GB | ? | Optical | DVD+R | 14p | £20 | 1x 1.25 MB/s 4x 5MB/s |
8 GB | ? | FLASH | iStorage datashur 8GB USB2 | £60 | N/A | USB2 |
9.2 GB | ? | Optical | DVD+R-DL | 36p | £20 | 1x 1.25 MB/s 4x 5MB/s |
12 GB | 24 GB | Tape | DDS3 | | | 1.1 MB/s, 4GB/hour |
16 GB | ? | FLASH | iStorage datashur 16GB USB2 | £82 | N/A | USB2 |
20 GB | 40 GB | Tape | DDS4 | | | 2.4 MB/s, 8.6 GB/hour |
? GB | ? | Optical | HD-DVD | | | |
25 GB | ? | Optical | Bluray | 40p | | 1x 4.5 MB/s 2x 9MB/s |
30 GB | ? | FLASH | iStorage datashur 30GB USB3 | £179 | N/A | USB3 |
32 GB | ? | FLASH | iStorage datashur 32GB USB2 | £98 | N/A | USB2 |
36 GB | 72 GB | Tape | DDS5 DAT72 | £13 | | 3 MB/s or 12.6 GB/hour |
40 GB | 80 GB | Tape | DLT4 | £66 | | 10 MB/s, or 36 GB/hour |
50 GB | ? | Optical | Bluray-DL | £3 | | 1x 4.5 MB/s 2x 9MB/s |
100 GB | 200 GB | Tape | LTO1 | £48 | | 17 MB/s |
100 GB | ? | Optical | Bluray-TL | | | 1x 4.5 MB/s 2x 9MB/s |
120 GB | ? | FLASH | iStorage datashur 120GB USB3 | £252 | N/A | USB3 |
128 GB | ? | Optical | Bluray-QL | | | 1x 4.5 MB/s 2x 9MB/s |
200 GB | 400 GB | Tape | LTO2 | £43 | | 34 MB/s |
240 GB | ? | FLASH | iStorage datashur 240GB USB3 | £330 | N/A | USB3 |
300 GB | 600 GB | Tape | SDLT2 | £170 | | 36 MB/s, 130 GB/hour |
320 GB | ? | RDX | RDX-320GB | £122 | £160 | USB |
350 GB | ? | RDX | RDX-350GB | £160 | | USB |
400 GB | 800 GB | Tape | LTO3 | £35 | | 68 MB/s |
500 GB | ? | RDX | RDX-500GB | £165 | £160 | USB |
800 GB | 1.6TB | Tape | LTO4 | £23 | | 120 MB/s |
1 TB | ? | DISK | iStorage diskAshur 1TB USB3 | £183 | N/A | USB3 |
1 TB | ? | RDX | RDX-1TB | £200 | £160 | USB |
1.5 TB | ? | DISK | iStorage diskAshur 1.5TB USB3 | £250 | N/A | USB3 |
1.5 TB | 3 TB | Tape | LTO5 | £19 | | 140 MB/s |
2 TB | ? | RDX | RDX-2TB | £390 | £160 | USB |
2.5 TB | 6.25 TB | Tape | LTO6 | £30 | | 160 MB/s |
3 TB | ? | DISK | iStorage diskAshur 3TB USB3 | £285 | N/A | USB3 |
6 TB | 15 TB | Tape | LTO7 | £141 | | 315 MB/s |
12.8 TB | 32 TB | Tape | LTO8 | | | 472 MB/s |
RDX and flash keys are commonly connected via USB. Speed may be limited by either the interface speed or the device characteristics.
RDX docking stations have been available with both USB2 and USB3 connections. If you are using an older USB2 model, upgrading to USB3 may get you a useful performance gain providing your computer supports it.
Tandberg produced a quad RDX docking station that connects via iSCSI over a 1 Gbit network, so up to about 120 MB/s.
In essence a jukebox for backup tapes. The enclosure typically holds 1 to 4 tape drives and 12 tape cartridges. Much bigger units are available. They are generally targeted at banks and scientific research groups, both of which tend to have a lot of data they want to back up or archive.
Where a backup will not fit on one tape, the tape library can be set up to automatically swap tapes and continue.
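A quick way to check whether a backup will fit in a library is to divide its size by the native capacity of one cartridge and round up. The sketch below assumes decimal units (1 TB = 10^12 bytes) and plans on native rather than compressed capacity, since compression ratios vary; the function names and the 12 slot default are illustrative.

```python
TB = 10 ** 12  # decimal terabyte, as used on media packaging


def tapes_needed(backup_bytes, tape_native_bytes):
    """Number of cartridges needed, rounding up to a whole tape."""
    return -(-backup_bytes // tape_native_bytes)  # integer ceiling division


def fits_in_library(backup_bytes, tape_native_bytes, slots=12):
    """True if the backup fits in the cartridges one library can hold."""
    return tapes_needed(backup_bytes, tape_native_bytes) <= slots


if __name__ == "__main__":
    lto6 = 25 * TB // 10  # 2.5 TB native
    print(tapes_needed(10 * TB, lto6), fits_in_library(10 * TB, lto6))
```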
File compression, and data deduplication techniques can reduce the amount of data to be stored. They are not always a good idea.
The following table shows the result of running a 682,716,160 byte archive, containing a mixture of text files and image files, through a number of compression programs, with 4 Intel (Haswell) cores available (specifically E5-2687W v3 @ 3.10GHz) and 1GB of RAM.
Program | Size | Percentage | Time | Read | Write |
---|---|---|---|---|---|
xz | 583,166,776 | 85% | 4m 19s | 2.6 MB/s | 2.3 MB/s |
bzip2 | 597,040,030 | 87% | 1m 29s | 7.7 MB/s | 6.7 MB/s |
pbzip2 | 597,232,025 | 87% | 22.9s | 29.8 MB/s | 26 MB/s |
gzip | 598,583,165 | 87.6% | 22s | 31.0 MB/s | 27.2 MB/s |
zip | 598,583,343 | 87.6% | 19.8s | 34.5 MB/s | 30.2 MB/s |
lzop | 613,016,913 | 89.8% | 2.5s | 273 MB/s | 245 MB/s |
compress | 758,656,977 | 111.1% | 16s | 42.7 MB/s | 47.4 MB/s |
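If you want to run a similar comparison on your own data, the sketch below measures output size, percentage of the original, and elapsed time using the compression modules in the Python standard library. Note the assumption: `zlib`, `bz2` and `lzma` use the same families of algorithm as gzip, bzip2 and xz, but they are not the command line programs measured above, so the timings will not be directly comparable.

```python
import bz2
import lzma
import time
import zlib


def measure(name, compress, data):
    """Compress data once and report size, percentage of original, and time."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    return {
        "program": name,
        "size": len(packed),
        "percentage": 100.0 * len(packed) / len(data),
        "seconds": elapsed,
    }


def compare(data):
    """Run the same data through each compressor in turn."""
    codecs = [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]
    return [measure(name, fn, data) for name, fn in codecs]


if __name__ == "__main__":
    # Highly repetitive sample text; real data will compress far less well.
    sample = b"backup and archive crib sheet\n" * 50_000
    for row in compare(sample):
        print(f"{row['program']:5} {row['size']:>10,} "
              f"{row['percentage']:6.1f}% {row['seconds']:.3f}s")
```

Try it on a sample of already compressed material, such as JPEG images, to see why compression is not always a good idea: the output is barely smaller, or even larger, and the time is wasted.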
Larger set of files (24 GB) on an internal SATA disk, no images. Output to a USB2 attached hard disk. Intel Core 2 Duo @ 2.6 GHz.
Program | Archive Size | Percentage | Time | Read | Write |
---|---|---|---|---|---|
spax+xz | 3,723,833,168 | 15.5% | 4h 28m 23s | 1,496 KB/s | 231 KB/s |
spax+bzip2 | 5,217,319,955 | 21.7% | 46m 40s | 8.6 MB/s | 1.7 MB/s |
spax+pbzip2 | 5,218,819,020 | 21.7% | 27m 10s | 14.8 MB/s | 3.2 MB/s |
spax+zip | 6,673,666,290 | 27.7% | 24m 13s | 16.6 MB/s | 4.6 MB/s |
spax+lzop | 9,667,022,074 | 40.1% | 5m 37s | 71.5 MB/s | 28.7 MB/s |
spax+compress | 12,636,592,083 | 52.5% | 9m 10s | 43.8 MB/s | 23.0 MB/s |
spax | 24,088,596,480 | 100% | 11m 5.8s | 36.1 MB/s | 36.1 MB/s |
This is a real oddball mix.
Further, based on those figures, I would expect the best option for a 100 Mbit link to be spax+lzop; I estimate 12m 54s. For a 10 Mbit link, I estimate that using spax+bzip2 would take 1h 3m 28s. For a 1 Mbit link, I estimate that using spax+xz would be quickest and take 8h 15m 58s.
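This kind of estimate can be reproduced with a short script. The sketch below is a rough model, not the method used for the figures above: it assumes decimal megabits, no protocol overhead, and that compression and transfer overlap, so the elapsed time is whichever of the two is slower. The archive sizes and creation times are taken from the 24 GB table; the creation times include writing to a USB2 disk, so they are only an approximation of compression speed. Expect its answers to differ somewhat from the hand estimates.

```python
# (archive size in bytes, time to create the archive in seconds)
RESULTS = {
    "spax+xz":       (3_723_833_168, 4 * 3600 + 28 * 60 + 23),
    "spax+bzip2":    (5_217_319_955, 46 * 60 + 40),
    "spax+pbzip2":   (5_218_819_020, 27 * 60 + 10),
    "spax+zip":      (6_673_666_290, 24 * 60 + 13),
    "spax+lzop":     (9_667_022_074, 5 * 60 + 37),
    "spax+compress": (12_636_592_083, 9 * 60 + 10),
    "spax":          (24_088_596_480, 11 * 60 + 5.8),
}


def estimate_seconds(size_bytes, create_seconds, link_mbit):
    """Elapsed time when the slower of compression and transfer dominates."""
    transfer = size_bytes * 8 / (link_mbit * 1_000_000)
    return max(create_seconds, transfer)


def best_option(link_mbit):
    """Name of the method with the shortest estimated elapsed time."""
    return min(RESULTS, key=lambda name: estimate_seconds(*RESULTS[name], link_mbit))


if __name__ == "__main__":
    for link in (100, 10, 1):
        name = best_option(link)
        seconds = estimate_seconds(*RESULTS[name], link)
        print(f"{link:>3} Mbit: {name} in about {seconds / 60:.0f} minutes")
```

The pattern is the useful part: the slower the link, the more it pays to spend CPU time on heavier compression.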
While data is on a server, especially a rack mounted server in a secure room, it is physically protected. All current operating systems make provision for controlling access to data files and programs.
Guides on how to configure a secure environment are freely published by a number of organizations.
When you back up your system to a removable device, you lose this protection.
When you remove a storage device from a system, you lose this protection.
To be very blunt, when I joined the industry, theft of information from backup media was not seen as a major problem. Tapes were of a substantial size and, in modern terms, held very little data. A DC600A cartridge held 60 MB, and was about the size of a smallish paperback novel. If you wanted a larger data capacity, you were looking at reel to reel tape systems, with tape spools around 50 cm in diameter and 3 cm thick.
Because of the small memory and storage capacities of the computers of the day, the programs tried very hard to squish the data they wanted to store into the smallest space possible. This had an unintended but interesting consequence.
If you wanted to make sense of the data on a tape, you needed a copy of the program that wrote it. You also needed the computer to run that program, and someone who knew how to set it up. Generally that combined to make a data tape pretty much useless to anyone other than its owner.
If your data includes personal data then there are a number of pieces of legislation that set out your need to look after personal data.
This is a tool; it is not a one size fits all solution. Encrypting media that is to be shipped off site adds a useful layer of protection. If the media is lost, stolen, or otherwise misplaced, the information is safe from disclosure, providing you did not ship the key with the media.
The down sides are that if the media is your backup, and you cannot locate the key when you want to recover the data, then you have a big problem. The other issue is that encryption comes with a cost in terms of time and energy.