Storage de-duplication is not a new term in the IT industry; it has been around for a few years now. Many users are still confused by the term, much as they are by Cloud Computing. Every storage vendor claims to have a de-duplication solution today, but as we know, these solutions differ from one another.

General de-duplication solutions:

  1. Primary SAN de-duplication
  2. Appliance target-based de-duplication
  3. Host-based de-duplication
  4. Software-based de-duplication

I may have missed some, but these should cover the majority of the de-duplication solutions on the market today.

Primary SAN de-duplication runs on the primary SAN, where most of the active data is stored. This is usually the storage with the highest cost per GB in your environment. Primary SAN de-dupe is available on NetApp and EMC storage today. While both claim to do de-duplication, note that block-based de-dupe and file-based de-dupe are different approaches.
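
To make the block-based versus file-based difference concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the 4-byte block size, the SHA-256 fingerprints and the function names are hypothetical and do not represent how NetApp or EMC actually implement de-dupe.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def digest(data: bytes) -> str:
    """Fingerprint a piece of data; identical data gives an identical digest."""
    return hashlib.sha256(data).hexdigest()


def file_level_stored(files: list[bytes]) -> int:
    """Bytes stored when only whole identical files are de-duplicated."""
    unique = {digest(f): len(f) for f in files}
    return sum(unique.values())


def block_level_stored(files: list[bytes]) -> int:
    """Bytes stored when identical fixed-size blocks are de-duplicated."""
    unique = {}
    for f in files:
        for i in range(0, len(f), BLOCK_SIZE):
            block = f[i:i + BLOCK_SIZE]
            unique[digest(block)] = len(block)
    return sum(unique.values())


file_a = b"AAAABBBBCCCC"
file_b = b"AAAABBBBDDDD"  # differs from file_a only in the last block
file_c = b"AAAABBBBCCCC"  # exact copy of file_a

files = [file_a, file_b, file_c]
raw_bytes = sum(len(f) for f in files)
file_bytes = file_level_stored(files)
block_bytes = block_level_stored(files)

print(raw_bytes, file_bytes, block_bytes)
```

File-based de-dupe only catches the exact copy (file_c), while block-based de-dupe also shares the blocks that file_b has in common with file_a, so it stores less.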

Primary SAN de-dupe can achieve disk space savings of 50% or more, especially in virtualized environments. As anyone familiar with virtual machine deployment knows, most virtual machines are cloned or deployed from a standard template, which leaves many duplicated data blocks on the primary storage. De-duplication allows administrators to reclaim that space from the primary SAN and reduce the TCO of the overall virtualization strategy. I suggest enabling de-dupe for virtualization deployments, but excluding volumes or LUNs that require high I/O performance, especially those used for databases.
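
As a rough back-of-envelope check on the savings from cloned virtual machines, the arithmetic can be sketched as below. The numbers (10 VMs, a 20 GB template, 10 GB of unique data per VM) and the function name are hypothetical, and the model is idealized: it assumes the template blocks stay identical across clones and are stored exactly once.

```python
def dedupe_savings(vm_count: int, template_gb: float, unique_gb_per_vm: float) -> dict:
    """Idealized estimate: template blocks are stored once, unique data per VM is kept."""
    logical = vm_count * (template_gb + unique_gb_per_vm)  # space without de-dupe
    physical = template_gb + vm_count * unique_gb_per_vm   # space with de-dupe
    saved = logical - physical
    return {
        "logical_gb": logical,
        "physical_gb": physical,
        "saved_pct": round(100 * saved / logical, 1),
    }


# Hypothetical environment: 10 VMs cloned from one 20 GB template
result = dedupe_savings(vm_count=10, template_gb=20, unique_gb_per_vm=10)
print(result)
```

Real savings depend on how far the clones drift from the template over time, so treat this as an upper-bound style estimate rather than a sizing tool.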

Appliance-based de-duplication: examples include Data Domain VTL and Quantum VTL. These solutions are geared specifically to backup and recovery. For example, de-duplication on a virtual tape library allows users to perform daily full backups with minimal disk space consumption. This does not happen on the primary SAN, because the VTL is a backup target rather than the place where the active data is stored.
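
The daily full backup case can be sketched with a toy de-duplicating target. This is an assumption-laden illustration, not how Data Domain or Quantum work internally: the DedupeStore class, the block contents and the "one block changes per day" pattern are all hypothetical.

```python
import hashlib


class DedupeStore:
    """Toy backup target that keeps each unique block only once."""

    def __init__(self):
        self.blocks = {}

    def ingest(self, blocks: list[bytes]) -> int:
        """Store a full backup; return how many new bytes were actually written."""
        new_bytes = 0
        for block in blocks:
            key = hashlib.sha256(block).hexdigest()
            if key not in self.blocks:
                self.blocks[key] = block
                new_bytes += len(block)
        return new_bytes


store = DedupeStore()
dataset = [f"block-{i:03d}".encode() for i in range(100)]  # 100 blocks of 9 bytes

growth = []
for day in range(7):
    if day > 0:
        dataset[day] = f"changed-{day}".encode()  # one block changes each day
    growth.append(store.ingest(dataset))          # a FULL backup is sent every day

logical_bytes = 7 * 900  # what seven full backups would take without de-dupe
print(growth, sum(growth), logical_bytes)
```

The first full backup costs the full size, and every later full backup only grows the store by the blocks that changed.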

Host-based de-duplication: Avamar is an example. This is another great solution on the market; it de-dupes data at the host before it is sent over for local or remote backup. For example, it allows users to minimize the bandwidth required for remote offices or locations, and to centrally manage backup and recovery from the primary data center.
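
The bandwidth saving comes from de-duping before the data leaves the host. The sketch below shows the general idea only; it is not Avamar's actual protocol, and the BackupServer class, the 8-byte chunk size and the function names are hypothetical.

```python
import hashlib


def chunk(data: bytes, size: int = 8) -> list[bytes]:
    """Split data into fixed-size chunks (hypothetical 8-byte chunks)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BackupServer:
    """Toy central backup server that remembers chunks by fingerprint."""

    def __init__(self):
        self.chunks = {}

    def missing(self, hashes) -> list[str]:
        """Tell the host which fingerprints the server has not seen yet."""
        return [h for h in hashes if h not in self.chunks]

    def upload(self, chunks_by_hash: dict) -> None:
        self.chunks.update(chunks_by_hash)


def backup(server: BackupServer, data: bytes) -> int:
    """Send only unseen chunks; return payload bytes sent (fingerprint overhead ignored)."""
    by_hash = {hashlib.sha256(c).hexdigest(): c for c in chunk(data)}
    needed = server.missing(by_hash.keys())
    payload = {h: by_hash[h] for h in needed}
    server.upload(payload)
    return sum(len(c) for c in payload.values())


server = BackupServer()
monday = b"A" * 8 + b"B" * 8 + b"C" * 8 + b"D" * 8
tuesday = b"A" * 8 + b"B" * 8 + b"X" * 8 + b"D" * 8  # one chunk changed

sent_monday = backup(server, monday)
sent_tuesday = backup(server, tuesday)
print(sent_monday, sent_tuesday)
```

On the second day only the changed chunk crosses the WAN link, which is why this approach suits remote offices with limited bandwidth.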

Software-based de-duplication allows users to turn direct-attached, NAS, tape or SAN-attached storage into a target-based de-duplication machine. It relies heavily on the management software, not a hardware appliance, to perform the data de-duplication.

There is no right or wrong here; I just want to share the general differences you should consider when looking at the de-duplication solutions on the market today. Each of them has its own position in the market and serves a different need.