Deduplication made easy

The amount of data stored by businesses may exceed the capacity of the world’s storage systems by 2010, claims market research firm IDC.


It may be tempting to dismiss this as the next Millennium Bug story, but data storage is growing by a whopping 60 per cent a year. Steve Mills, head of IBM’s US$13 billion software department, says the vast bulk of that information is replicated data that can be discarded as junk.

The duplication can range from pictorial files clogging up email accounts to IT staff dutifully backing up an organisation’s data (either on disk or on tape for offsite storage). Generally, they don’t stop to assess the quality of that data, so they copy the same gigabyte every time they do a backup.

Considering the scale of the problem, it’s no surprise that deduplication (or ‘dedupe’) is currently the storage industry’s favourite buzzword. It means running a program across your network to scan data at a sub-file level – not altogether different from a virus scanner. If it finds a copy, it removes it and replaces it with a pointer to the original, allowing you to store one master copy.
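To make the idea of a pointer to a master copy concrete, here is a minimal sketch in Python. It is not how any particular vendor's product works: the `DedupStore` class, its method names and the fixed chunk size are all assumptions made for illustration, and commercial systems typically use more sophisticated, variable-size chunking.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class DedupStore:
    """Hypothetical sub-file dedupe store: one master copy per unique chunk."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 digest -> chunk bytes (the master copies)
        self.files = {}   # file name -> list of digests (the pointers)

    def put(self, name, data):
        """Split data into chunks and keep only chunks not seen before."""
        pointers = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault stores the chunk only if this digest is new
            self.chunks.setdefault(digest, chunk)
            pointers.append(digest)
        self.files[name] = pointers

    def get(self, name):
        """Rebuild the original file by following its pointers."""
        return b"".join(self.chunks[digest] for digest in self.files[name])

    def stored_bytes(self):
        """Total bytes actually held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Storing two files that share most of their content costs little more than storing one, because the second file mostly adds pointers to chunks that are already held.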

Single-instance storage (SIS), offered by some vendors and inherent in Microsoft’s Exchange server, has been doing this for a while. Where dedupe gets clever is in scanning ‘inside’ the file to see what’s different.

For example, SIS keeps separate copies of your logo where it appears on all corporate correspondence, office stationery and every page of a presentation – but with dedupe, the logo will only be stored once.
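The logo example can be sketched in a few lines of Python. This is a toy comparison under stated assumptions: the function names, the eight-byte chunk size and the sample byte strings are invented for illustration, and the shared logo is placed so that it lines up with a chunk boundary, which real documents will not always do.

```python
import hashlib


def file_level_bytes(files):
    """Bytes kept when only whole identical files are stored once (SIS-style)."""
    unique = {hashlib.sha256(data).hexdigest(): len(data) for data in files}
    return sum(unique.values())


def chunk_level_bytes(files, chunk_size=8):
    """Bytes kept when identical fixed-size chunks are stored once (dedupe-style)."""
    unique = {}
    for data in files:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            unique[hashlib.sha256(chunk).hexdigest()] = len(chunk)
    return sum(unique.values())


# Stand-in data: the same "logo" embedded in two otherwise different documents.
logo = b"LOGOLOGO"
letter = logo + b"MEMOTEXT"
slides = logo + b"SLIDETXT"

print(file_level_bytes([letter, slides]))   # both files differ, so both are kept whole
print(chunk_level_bytes([letter, slides]))  # the shared logo chunk is kept once
```

Because the two documents differ as whole files, the file-level count keeps both in full, while the chunk-level count stores the shared logo bytes a single time.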

Aside from email servers, the ramifications for backups, virtualisation (running lots of desktops from a single server) and disaster recovery, all of which involve large quantities of replicated data, are enormous.

Big companies are well served in the dedupe arena by the giants – but smaller companies looking to cut down on storage costs can take advantage of dedupe too.

Beth White, vice president of marketing for Data Domain, says: ‘We’ve seen 20 times compression with a database, and very aggressive deduplication rates with VMware (virtualisation). We really chew that up – sometimes we get 40-60 times compression.’

Surfwear brand O’Neill, credited with inventing the modern wetsuit and the surfer’s safety leash, introduced dedupe appliances from Data Domain into the backup environments of five of its European sites. The company reduced its stored data, including a VMware implementation, by a factor of 18 and in the process cut its backup window from 14 hours to two.

‘We used to back up 1.4 terabytes – just critical things,’ says the company’s global IT service and infrastructure manager, Peter Malijaars. ‘Now we back up all the archives as well – 5.3 terabytes. Since we’re a clothing company there’s a lot of old designs, but now we only have to back up the changes.’
