Intro

This page is my quick crib sheet of issues to be considered when making copies of data for backup or archival purposes.

What's the difference

The process for backup and archiving can be very similar but there are key differences. These differences are important when considering what method(s) you will use.

Backing up

The act of taking a snapshot of the current state of a system, to protect the data on the system and hence your business from a catastrophic event. Most days nothing happens and the backup is not used.

The following is a short summary of the reasons people have wanted a backup, based on 30 years' memory of working in the IT industry. Each entry gives the event, an approximate count, and a comment.

User (count: 100+)

I did not mean to:

  • Run that command
  • Delete that directory
  • Install malware
  • ...

I genuinely thought that phone call was from:

  • Microsoft
  • Cisco
  • Our ISP
  • The police
  • ...

Mechanical (count: 100+)

Hard disks fail. For 5.25" and 3.5" drives made before 2005, the most common issues I have seen are bearing or motor related. More recent drives most commonly seem to develop problems reading particular sectors. Either way the drives fail. RAID helps, but you can still hit issues when the rebuild process discovers additional problematic sectors.

Side note: modern disk drives, especially the small 2.5" units common in laptops, notebooks, and portable drives, seem to be particularly sensitive to external magnetic fields. Keep them away from:

  • Magnets, including executive desk toys.
  • Speakers. Car door pockets are too close to door speakers.
  • Microwaves.
  • Radio transmitters, which are not designed to operate in close proximity to disks.

Loss of power (count: 30)

Modern file systems keep journals of key changes, and are much more tolerant of losing power mid change. However, they do not guarantee that all files will be complete and consistent.

Lightning (count: 12)

Risk varies greatly with location and recent weather.

Fire (count: 2)

Mitigate with good building design. Do not leave unterminated lengths of cable in void spaces.

Canal (count: 1)

Break in, office trashed, computer thrown in canal.

Wiring fault (count: 1)

Data cabling is not designed to carry mains power.

Malicious (count: 1)

Ransomware.

3-2-1 backup rule

This has stood the test of time, but it is pretty much a minimum.
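The rule is commonly stated as: keep at least 3 copies of the data, on at least 2 different types of media, with at least 1 copy off site. As a minimal sketch, assuming each copy is described by a hypothetical `BackupCopy` record (the names here are illustrative, not part of any tool), the check could look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """Hypothetical description of one copy of the data."""
    name: str       # free-text label, for example "server disk"
    media: str      # media type, for example "disk", "tape", "optical"
    off_site: bool  # True if the copy is kept away from the original


def meets_3_2_1(copies):
    """Return True if the copies satisfy the 3-2-1 minimum."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_off_site = any(c.off_site for c in copies)
    return enough_copies and enough_media and has_off_site
```

The live data on the server counts as one of the three copies in this sketch.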

Backup schedule

Whatever your backup procedure, it should be done on a regular schedule. For a home computer it may be that very little changes on a day to day basis, but at the start of each month you do online banking, check your statements, and save copies.

In this case backing up your documents each month when you have finished checking your statements, would be a logical choice.

In a business situation where a lot more changes, I would suggest performing a backup procedure at the end of each working day. The procedure should include writing one copy to removable media, and taking it off site.

For a 5 day operation have 5 tapes labelled Monday, Tuesday, Wednesday, Thursday, Friday, plus a couple of spares. If the operator who does the backup on Monday falls ill, another staff member can do Tuesday's backup. If you have a problem on a Thursday there is no confusion as to which tape you would like to restore from: it is the one labelled Wednesday.
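The labelling scheme is simple enough to write down as code. The following is a small sketch, assuming a Monday to Friday operation with no allowance for public holidays; the function names are made up for illustration.

```python
from datetime import date, timedelta

TAPE_LABELS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def tape_for_backup(day):
    """Label of the tape to write at the end of the given working day."""
    if day.weekday() > 4:
        raise ValueError("no backup is scheduled at the weekend")
    return TAPE_LABELS[day.weekday()]


def tape_for_restore(problem_day):
    """Label of the most recent tape written before problem_day."""
    previous = problem_day - timedelta(days=1)
    while previous.weekday() > 4:  # step back over Saturday and Sunday
        previous -= timedelta(days=1)
    return TAPE_LABELS[previous.weekday()]
```

For example, `tape_for_restore(date(2016, 6, 9))` (a Thursday) gives the Wednesday tape, and a problem on a Monday points back to the Friday tape.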

Ransomware

This is a particularly nasty class of malware that encrypts or scrambles data files on your computer. The attacker then offers a blackmail proposition: pay us some money or face the costs of losing your data. Paying up does not guarantee that you get your data back.

The key issue with regard to backups is that the malware often launches a phased attack.

If the attacker decides to run through the phases quickly, say in 30 minutes, then even if the attack coincides with a backup, the data from the previous backup should be OK.

The snag is if the attacker is leisurely, taking weeks or even months to exfiltrate data before you notice. Will your backup schedule and method retain a safe backup in this case?
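One way to answer that question is to list the backups you would still hold on a given day and look for the newest one taken before the attack began. The sketch below assumes you can estimate when the attack started; `last_clean_backup` is a made up name for illustration.

```python
from datetime import date


def last_clean_backup(backup_dates, attack_started):
    """Newest retained backup taken strictly before the attack began.

    Returns None when every retained backup was taken on or after the
    (estimated) start of the attack.
    """
    clean = [d for d in backup_dates if d < attack_started]
    return max(clean) if clean else None
```

With only five daily tapes in rotation, an attack that began three weeks ago leaves no clean backup. Adding a tape retained from the end of the previous month changes the answer.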

Archiving data

Making a copy of a set of data currently on a server, for long term storage, with the expectation that the data is going to be deleted from the server so that the only copy will be the long term archive.

Generally make multiple identical copies, and add file checksums or parity information, over and above the error correction built in to the recording medium.
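As an illustration of file checksums, the following sketch builds a SHA-256 manifest for a directory tree and later reports files that are missing or have changed. It uses only the Python standard library; the function names and the choice of SHA-256 are assumptions for the example, not a recommendation of a particular tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def damaged_files(root, manifest):
    """Names from the manifest that are missing or no longer match."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

Store the manifest alongside each copy of the archive. A checksum only detects damage; repairing it needs one of the other copies or parity data.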

Readability

All storage media degrade with time. For printed paper that may take 100s or even 1,000s of years, but for all current storage technologies it is much, much shorter.

| Media | Expected life | Comment |
|---|---|---|
| FLASH | 5 to 10 years | The charge on the memory cells slowly degrades; eventually too many cells will be in error for the error recovery to recover your data. |
| Tape | 10 to 20 years | Tape substrate degrades. |
| Optical, dye based | 100 years? | Depends a lot on the disk quality and construction. |
| Verbatim M-DISC | 1,000 years | Sold specifically for archiving; different construction. |

The next consideration is what equipment you will have available to access the archive in 5, 10, or 20 years' time. Servers are generally expected to have an operating life of around 5 years, so you may be 2 or 3 server changes down the road when you want to access that archive.

Applications

I figure this is fairly obvious, but have come across multiple cases where it has been missed.

If you are putting data away for long term storage, make sure you also include a copy of the applications needed to access that data.

Take a common example: you run a computerized payroll. Each year the government makes changes to the tax rules, so at the end of each financial year you apply an update. If you archive the data but not the program, then when you suddenly find a need to access that data from 6 years ago you have a problem.

Hybrid Backup + Archive

This has worked quite well with our development teams.

Back up to a rotating set of tapes labelled Monday, Tuesday, Wednesday, Thursday, Friday.

At the end of each month remove the last tape used from the rotation, label it with the month and year, and replace it in the rotation with a spare.
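Deciding which daily tape becomes the monthly archive can be sketched as below. This assumes a Monday to Friday operation and ignores public holidays; `archive_label` and `next_working_day` are hypothetical helper names.

```python
from datetime import date, timedelta


def next_working_day(day):
    """Next Monday to Friday date after the given day."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() > 4:  # skip Saturday and Sunday
        nxt += timedelta(days=1)
    return nxt


def archive_label(day):
    """Month and year label if day is the month's last working day, else None."""
    if day.weekday() > 4:
        return None
    if next_working_day(day).month != day.month:
        return day.strftime("%B %Y")
    return None
```

When `archive_label` returns a label, the tape written that day leaves the rotation and a spare takes over its weekday name.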

Terms

Encrypted in transit

Literally, the data is encrypted, sent, and decrypted at the receiving end. If you are using an HTTPS connection from a web browser to, say, your bank, then the data is encrypted by the bank, sent to you, and decrypted by your browser. The web server has access to the unencrypted data, but is hopefully well secured. Your browser may have cached the decrypted pages on your PC.

File sharing sites often support encryption in transit, but the shared files are usually stored unencrypted on their servers.

Encrypted at rest

If your home computer is stolen, then the thief can in most cases access any file stored on its disks. High end disk drives marketed to companies working with sensitive data can have an encryption key loaded to the drive during power up. This key is used to encrypt and decrypt data written to or read back from the drive. If the drive is powered off it forgets the key.

Drives removed for maintenance, or stolen, therefore contain no usable data. Some tape drives and some USB keys have the same abilities. Some operating systems have the option to perform a similar function in software, at the cost of a small performance penalty.

End to end encryption

End to end or whole life encryption. You encrypt your confidential data, send it somewhere for storage, then later recover it, possibly to a different location. At no point between when you encrypted it and when you received it back was the data decrypted.

On server

A backup of a data set to a different storage area on the same server. Very useful for rolling back an upgrade that went wrong. Usually the quickest form of backup to make, and also the quickest to restore.

May also be used as the first step of an off server or off site backup procedure, to minimise downtime on a database.

Off server

Backup is to removable media, or a different server in the same building. Protects against problems affecting the original server, but not against problems affecting the building.

If you have a lot of devices with small amounts of storage that need to be backed up, copying them all to a backup server fitted with large but relatively slow disks, then making an off site backup from that server, makes a lot of sense.

Off site

Backup has enough physical separation from the originating device that it is very unlikely that any single event would affect both.

Archive

A container (file) that contains multiple files is referred to as an archive, no matter why it was created. This can be a great source of confusion.

Data security

How secure is your data? This question can be looked at in two different ways.

Media tracking

I am not aware of any GPS-enabled tracking systems for backup media that would detect when the media was "off site". From an administrative point of view, this could automate the record keeping for the off site element of a 3-2-1 backup plan. It would also help with security, as people do misplace things, which can be extremely easy with the smaller forms of backup media.

LTO tape cartridges contain a contactless memory device, but it has a very short range, less than 20mm. It is intended to be used by the drive to record usage details.

Generally, tape cartridges intended for use in large automated library systems have had provision for a label using text and a Code 39 bar code. Since 2010 HP Enterprise has produced a range of labels for LTO media that incorporates an RFID tag. This is intended to work as part of an asset tracking system with detectors at key doorways.

Question

As of 2016 MicroSDXC cards are readily available up to 128GB, and the standard allows for 2TB. New mobile phones are expected to adopt the USB3 "type C" connector. What are the pros and cons of putting a high capacity MicroSDXC card in a mobile phone, and using that as a portable, trackable storage device?

Media options

A lot of different types of media are or have been used for backups or archiving. They all have their own strengths and weaknesses. As technology has changed so capacity and speed have improved.

Comparison of characteristics

Capacity speed and cost

The table below is as of 2016 and is sorted by native capacity. Some devices, mostly tape drives, can attempt to compress data as it is written; the compressed capacity column gives the quoted figure where known.

Devices which are no longer available, have a grey background.

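| Native capacity | Compressed capacity | Media type | Media ID | Media cost | Drive cost | I/O speed |
|---|---|---|---|---|---|---|
| 60 MB | 120? | Tape | DC600A | | | |
| 125 MB | 250? | Tape | DC6150A | | | |
| 525 MB | 1G? | Tape | DC6525A | | | 300 KB/s? |
| 700 MB | ? | Optical | CD | 18p | £20 | 1x 150 KB/s; 4x 600 KB/s; 16x 2.4 MB/s |
| 1.2 GB | 250? | Tape | QIC136 | | | 300 KB/s? |
| 4 GB | ? | FLASH | iStorage datashur 4GB USB2 | £40 | N/A | USB2 |
| 4 GB | 8 GB | Tape | DDS2 | | | 600 KB/s, 2100 MB/hour |
| 4.7 GB | ? | Optical | DVD+R | 14p | £20 | 1x 1.25 MB/s; 4x 5 MB/s |
| 8 GB | ? | FLASH | iStorage datashur 8GB USB2 | £60 | N/A | USB2 |
| 9.2 GB | ? | Optical | DVD+R-DL | 36p | £20 | 1x 1.25 MB/s; 4x 5 MB/s |
| 12 GB | 24 GB | Tape | DDS3 | | | 1.1 MB/s, 4 GB/hour |
| 16 GB | ? | FLASH | iStorage datashur 16GB USB2 | £82 | N/A | USB2 |
| 20 GB | 40 GB | Tape | DDS4 | | | 2.4 MB/s, 8.6 GB/hour |
| ? GB | ? | Optical | HD-DVD | | | |
| 25 GB | ? | Optical | Bluray | 40p | | 1x 4.5 MB/s; 2x 9 MB/s |
| 30 GB | ? | FLASH | iStorage datashur 30GB USB3 | £179 | N/A | USB3 |
| 32 GB | ? | FLASH | iStorage datashur 32GB USB2 | £98 | N/A | USB2 |
| 36 GB | 72 GB | Tape | DDS5 / DAT72 | £13 | | 3 MB/s or 12.6 GB/hour |
| 40 GB | 80 GB | Tape | DLT4 | £66 | | 10 MB/s or 36 GB/hour |
| 50 GB | ? | Optical | Bluray-DL | £3 | | 1x 4.5 MB/s; 2x 9 MB/s |
| 100 GB | 200 GB | Tape | LTO1 | £48 | | 17 MB/s |
| 100 GB | ? | Optical | Bluray-TL | | | 1x 4.5 MB/s; 2x 9 MB/s |
| 120 GB | ? | FLASH | iStorage datashur 120GB USB3 | £252 | N/A | USB3 |
| 128 GB | ? | Optical | Bluray-QL | | | 1x 4.5 MB/s; 2x 9 MB/s |
| 200 GB | 400 GB | Tape | LTO2 | £43 | | 34 MB/s |
| 240 GB | ? | FLASH | iStorage datashur 240GB USB3 | £330 | N/A | USB3 |
| 300 GB | 600 GB | Tape | SDLT2 | £170 | | 36 MB/s, 130 GB/hour |
| 320 GB | ? | RDX | RDX-320GB | £122 | £160 | USB |
| 350 GB | ? | RDX | RDX-350GB | | £160 | USB |
| 400 GB | 800 GB | Tape | LTO3 | £35 | | 68 MB/s |
| 500 GB | ? | RDX | RDX-500GB | £165 | £160 | USB |
| 800 GB | 1.6 TB | Tape | LTO4 | £23 | | 120 MB/s |
| 1 TB | ? | DISK | iStorage diskAshur 1TB USB3 | £183 | N/A | USB3 |
| 1 TB | ? | RDX | RDX-1TB | £200 | £160 | USB |
| 1.5 TB | ? | DISK | iStorage diskAshur 1.5TB USB3 | £250 | N/A | USB3 |
| 1.5 TB | 3 TB | Tape | LTO5 | £19 | | 140 MB/s |
| 2 TB | ? | RDX | RDX-2TB | £390 | £160 | USB |
| 2.5 TB | 6.25 TB | Tape | LTO6 | £30 | | 160 MB/s |
| 3 TB | ? | DISK | iStorage diskAshur 3TB USB3 | £285 | N/A | USB3 |
| 6 TB | 15 TB | Tape | LTO7 | £141 | | 315 MB/s |
| 12.8 TB | 32 TB | Tape | LTO8 | | | 472 MB/s |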

Note USB

RDX and flash keys are commonly connected via USB. Speed may be limited by either the interface speed or the device characteristics.

Note RDX

RDX docking stations have been available with both USB2 and USB3 connections. If you are using an older USB2 model, upgrading to USB3 may get you a useful performance gain, providing your computer supports it.

Tandberg produced a quad RDX docking station that connects via iSCSI over a 1 Gbit network, so up to about 120 MB/s.

Note Bluray

The original Bluray format was one or two layers at 25 GB per layer. The latest generation of drives are BD-XL; they support approximately 30 GB per layer, and up to 4 layers.

Tape Library

In essence a jukebox for backup tapes. The enclosure typically holds 1 to 4 tape drives and 12 tape cartridges. Much bigger units are available. They are generally targeted at banks and scientific research groups, both of which tend to have a lot of data they want to back up or archive.

Where a backup will not fit on one tape, the tape library can be set up to automatically swap tapes and continue.

Compression

File compression, and data deduplication techniques can reduce the amount of data to be stored. They are not always a good idea.

Archive compression

The following table shows the result of running a 682,716,160 byte archive, containing a mixture of text files and image files, through a number of compression programs. The machine had 4 Intel (Haswell) cores available, specifically E5-2687W v3 @ 3.10GHz, and 1GB of RAM. Percentage is the output size as a percentage of the original.

| Program | Size | Percentage | Time | Read | Write |
|---|---|---|---|---|---|
| xz | 583,166,776 | 85% | 4m 19s | 2.6 MB/s | 2.3 MB/s |
| bzip2 | 597,040,030 | 87% | 1m 29s | 7.7 MB/s | 6.7 MB/s |
| pbzip2 | 597,232,025 | 87% | 22.9s | 29.8 MB/s | 26 MB/s |
| gzip | 598,583,165 | 87.6% | 22s | 31.0 MB/s | 27.2 MB/s |
| zip | 598,583,343 | 87.6% | 19.8s | 34.5 MB/s | 30.2 MB/s |
| lzop | 613,016,913 | 89.8% | 2.5s | 273 MB/s | 245 MB/s |
| compress | 758,656,977 | 111.1% | 16s | 42.7 MB/s | 47.4 MB/s |
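A comparison along these lines can be repeated on your own data. The sketch below uses the gzip, bz2 and lzma modules from the Python standard library rather than the command line programs measured above, so its timings will not be comparable with the table; it only illustrates how size, percentage and time can be collected. The function name `compare` is made up for the example.

```python
import bz2
import gzip
import lzma
import time

# Standard library compressors, keyed by the name of the related program.
COMPRESSORS = {
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "xz": lzma.compress,
}


def compare(data):
    """Return (name, size, percent of original, seconds), smallest first."""
    if not data:
        raise ValueError("need some data to compress")
    results = []
    for name, compress in COMPRESSORS.items():
        started = time.perf_counter()
        size = len(compress(data))
        elapsed = time.perf_counter() - started
        results.append((name, size, 100.0 * size / len(data), elapsed))
    return sorted(results, key=lambda row: row[1])
```

Feeding it already compressed or random data shows the same effect as the compress row above: the output can be larger than the input.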

Larger set of files (24 GB) on an internal SATA disk, no images. Output to a USB2 attached hard disk. Intel Core 2 Duo @ 2.6 GHz.

| Program | Archive size | Percentage | Time | Read | Write |
|---|---|---|---|---|---|
| spax+xz | 3,723,833,168 | 15.5% | 4h 28m 23s | 1,496 KB/s | 231 KB/s |
| spax+bzip2 | 5,217,319,955 | 21.7% | 46m 40s | 8.6 MB/s | 1.7 MB/s |
| spax+pbzip2 | 5,218,819,020 | 21.7% | 27m 10s | 14.8 MB/s | 3.2 MB/s |
| spax+zip | 6,673,666,290 | 27.7% | 24m 13s | 16.6 MB/s | 4.6 MB/s |
| spax+lzop | 9,667,022,074 | 40.1% | 5m 37s | 71.5 MB/s | 28.7 MB/s |
| spax+compress | 12,636,592,083 | 52.5% | 9m 10s | 43.8 MB/s | 23.0 MB/s |
| spax | 24,088,596,480 | 100% | 11m 5.8s | 36.1 MB/s | 36.1 MB/s |

This is a real oddball mix.

Further, based on those figures, I would expect the best option for a 100 Mbit link to be spax+lzop, which I estimate at 12m 54s. For a 10 Mbit link I estimate that using spax+bzip2 would take 1h 3m 28s. For a 1 Mbit link I estimate that using spax+xz would be quickest and take 8h 15m 58s.
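The trade-off between compression time and transfer time can be explored with a simple model. The sketch below assumes that compression and transfer overlap, so the elapsed time is whichever of the two is slower, and that a link carries its nominal rate with no protocol overhead. Both are assumptions of this example, so the figures it produces need not match the estimates quoted above exactly. The sizes and times are taken from the 24 GB table.

```python
# (program, archive size in bytes, compression time in seconds)
RESULTS = [
    ("spax+xz", 3_723_833_168, 4 * 3600 + 28 * 60 + 23),
    ("spax+bzip2", 5_217_319_955, 46 * 60 + 40),
    ("spax+pbzip2", 5_218_819_020, 27 * 60 + 10),
    ("spax+zip", 6_673_666_290, 24 * 60 + 13),
    ("spax+lzop", 9_667_022_074, 5 * 60 + 37),
    ("spax+compress", 12_636_592_083, 9 * 60 + 10),
    ("spax", 24_088_596_480, 11 * 60 + 5.8),
]


def estimate_seconds(size_bytes, compress_seconds, link_mbit):
    """Elapsed time when compression and transfer run in parallel."""
    transfer_seconds = size_bytes * 8 / (link_mbit * 1_000_000)
    return max(transfer_seconds, compress_seconds)


def best_option(link_mbit):
    """Name of the program with the lowest estimated elapsed time."""
    return min(
        RESULTS,
        key=lambda row: estimate_seconds(row[1], row[2], link_mbit),
    )[0]
```

Under this model the ranking agrees with the text: lzop for a 100 Mbit link, bzip2 for 10 Mbit, and xz for 1 Mbit. The slower the link, the more it pays to spend CPU time on compression.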

Encryption and security

While data is on a server, especially a rack mounted server in a secure room, it is physically protected. All current operating systems make provision for controlling access to data files and programs.

Guides on how to configure a secure environment are freely published by a number of organizations.

When you back up your system to a removable device, you lose this protection.

When you remove a storage device from a system, you lose this protection.

History

To be very blunt, when I joined the industry, theft of information from backup media was not seen as a major problem. Tapes were of a substantial size and, in modern terms, held very little data. A DC600A cartridge held 60 MB, and was about the size of a smallish paperback novel. If you wanted a larger data capacity, you were looking at reel to reel tape systems, with tape spools around 50 cm in diameter and 3 cm thick.

Because of the small memory and storage capacities of the computers of the day, programs tried very hard to squish the data they wanted to store into the smallest space possible. This had an unintended but interesting consequence.

If you wanted to make sense of the data on a tape, you needed a copy of the program that wrote it. You also needed the computer to run that program, and someone who knew how to set it up. Generally that combined to make a data tape pretty much useless to anyone other than its owner.

Legislation

If your data includes personal data then there are a number of pieces of legislation that set out your need to look after personal data.

ICO Information Commissioner's Office

Cryptography

This is a tool; it is not a one size fits all solution. Encrypting media that is to be shipped off site adds a useful layer of protection. If the media is lost, stolen, or otherwise misplaced, the information is safe from disclosure, providing you did not ship the key with the media.

The down sides are that if the media is your backup, and you cannot locate the key when you want to recover the data, then you have a big problem. The other issue is that encryption comes with a cost in terms of time and energy.

Hardware encryption

Software encryption