A few tips on estimating deduplication ratio

If you are looking for a deduplication tool but are unsure which one works best for your data, the first thing to know is that products differ mainly in how they identify and deduplicate data. Hopefully, this article can help.

Deduplication technology is mainly applied in backup software and disk libraries. Popular backup software includes Televaulting by Asigra, Avamar by EMC and Veritas NetBackup PureDisk by Symantec, which deduplicate data at the host level and then transfer it to a destination disk or an alternate location for disaster recovery. Disk library manufacturers include Data Domain, Diligent Technologies, Quantum and Sepaton; their products deduplicate data on the destination device without affecting backup jobs.

The factors below apply to virtually every deduplication product, but what people care about most is exactly how deduplication affects performance and how the deduplicated data is managed. If no single product on the market meets your demands, you may be forced to use more than one, deduplicating in both the backup software and the disk library. That can cause problems of its own. Will undeduplicated data be stored on the disk successfully? Will the deduplication software be compatible with the backup software? Can the deduplication service be shut down if necessary? Think these questions through thoroughly before going ahead.

How to estimate deduplication ratio?

Data redundancy:

The more redundant data you store on the server, the higher the ratio. If most of your data is identical, you may be pleasantly surprised by the deduplication result; if users run different systems and share few duplicate files, the ratio will be comparatively low.
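
One rough way to gauge redundancy before choosing a product is to hash a sample of your own data in fixed-size chunks and compare the total number of chunks with the number of unique ones. The Python sketch below does that. The function name estimate_dedup_ratio and the 4 KiB chunk size are assumptions made for illustration, and commercial products use their own (often variable-size) chunking, so treat the result as a ballpark figure only.

```python
import hashlib


def estimate_dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return total chunks / unique chunks using fixed-size chunking.

    Illustrative only: the chunk size and the fixed-size scheme are
    assumptions, not how any particular product works.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = 0
    unique = set()
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique.add(hashlib.sha256(chunk).digest())  # fingerprint of the chunk
        total += 1
    # Empty input has nothing to deduplicate; report a neutral ratio of 1.
    return total / len(unique) if unique else 1.0


if __name__ == "__main__":
    # Ten copies of the same 4 KiB chunk: highly redundant data.
    redundant = (b"A" * 4096) * 10
    # Ten chunks that all differ from each other: no redundancy.
    distinct = b"".join(bytes([i]) * 4096 for i in range(10))
    print(f"redundant sample: {estimate_dedup_ratio(redundant):.1f}:1")
    print(f"distinct sample:  {estimate_dedup_ratio(distinct):.1f}:1")
```

Running it against a representative sample of real backup data, rather than synthetic bytes, gives a more useful first impression of how redundant the data is.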

Data change frequency:

The more frequently data changes, the lower the deduplication ratio. The commonly quoted 20:1 ratio assumes a data change rate of about 5%.
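
The effect of the change rate can be sketched with a deliberately simplified model. The model is an assumption of this example, not a vendor formula: every backup is a full backup of the same size, and a fixed fraction of the data differs from the previous backup. The first backup is stored whole and each later one adds only its changed fraction, so the ratio after n backups is n / (1 + (n - 1) * change_rate). As n grows this approaches 1 / change_rate, which for a 5% change rate is 20.

```python
def dedup_ratio(num_backups: int, change_rate: float) -> float:
    """Logical data divided by stored data under the simplified model.

    Assumes equally sized full backups and a constant change rate between
    consecutive backups (both are illustrative assumptions).
    """
    if num_backups < 1:
        raise ValueError("num_backups must be at least 1")
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")
    logical = num_backups                          # units of one full backup
    stored = 1 + (num_backups - 1) * change_rate   # baseline plus changes
    return logical / stored


if __name__ == "__main__":
    # Same number of backups, different change rates.
    for rate in (0.01, 0.05, 0.20):
        print(f"change rate {rate:.0%}: {dedup_ratio(52, rate):.1f}:1 after 52 backups")
```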

Data pre-compression:

Data is generally compressed at about 2:1. If the deduplication ratio is 15:1, the overall ratio after compression should reach about 30:1. Bear in mind, however, that compression achieves little on data that is already compressed, such as .jpeg, .mpeg and .zip files.
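
The arithmetic above can be sketched in a few lines of Python. The helper names combined_ratio and space_saved are made up for this example, and multiplying the two ratios assumes that compression is applied to the deduplicated data and achieves its full ratio there.

```python
def combined_ratio(dedup_ratio: float, compression_ratio: float = 2.0) -> float:
    """Overall reduction when compression is applied after deduplication."""
    if dedup_ratio <= 0 or compression_ratio <= 0:
        raise ValueError("ratios must be positive")
    return dedup_ratio * compression_ratio


def space_saved(ratio: float) -> float:
    """Fraction of the original capacity saved at a given reduction ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # The deduplication and compression figures used in the text.
    overall = combined_ratio(15, 2)
    print(f"{overall:.0f}:1 overall, {space_saved(overall):.1%} of space saved")
    # Already-compressed files compress at roughly 1:1, so the overall
    # ratio stays close to the deduplication ratio alone.
    print(f"{combined_ratio(15, 1.0):.0f}:1 for already-compressed data")
```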

Retention period:

The retention period is another factor affecting the deduplication ratio. If you lack the space to retain data for long, the ratio drops significantly. For example, to reach a deduplication ratio of 10:1 to 30:1, plan for a retention period of about 20 weeks.
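
A rough sense of how long data must be kept to reach a target ratio can be had from a simplified model. The assumptions here are illustrative and do not come from any vendor: one full backup per week, all of the same size, each differing from the previous one by a fixed change rate, so that n retained backups give a ratio of n / (1 + (n - 1) * change_rate). Solving that for n gives the number of weeks needed. The function name weeks_needed is hypothetical.

```python
import math
from typing import Optional


def weeks_needed(target_ratio: float, change_rate: float) -> Optional[int]:
    """Weekly full backups to retain before the model reaches target_ratio.

    Returns None when the target is out of reach at this change rate,
    because the model's ratio never reaches 1 / change_rate.
    """
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")
    if target_ratio <= 1:
        return 1                      # a single backup already gives 1:1
    if target_ratio * change_rate >= 1:
        return None
    # From n / (1 + (n - 1) * c) >= T  it follows that  n >= T * (1 - c) / (1 - T * c).
    exact = target_ratio * (1 - change_rate) / (1 - target_ratio * change_rate)
    return math.ceil(exact - 1e-9)    # small tolerance for float rounding


if __name__ == "__main__":
    for target in (10, 15, 30):
        print(f"target {target}:1 at 5% change -> {weeks_needed(target, 0.05)} weeks")
```

Under these assumptions a 5% change rate reaches 10:1 after 19 weekly backups, while 30:1 is out of reach from deduplication alone; in this model the higher figure would need a lower change rate or the extra reduction from compression.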

Full backup:

The more full backups you perform, the higher the ratio, because each full backup repeats most of the data that is already stored.
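
The sketch below compares backup schedules under a simplified model. The model and the name schedule_ratio are assumptions made for illustration: a full backup sends one unit of data, an incremental backup sends only the changed fraction, and the deduplicated store keeps one baseline copy plus the changes made after it.

```python
def schedule_ratio(num_backups: int, full_every: int, change_rate: float) -> float:
    """Deduplication ratio when every `full_every`-th run is a full backup.

    Illustrative model only: runs 0, full_every, 2 * full_every, ... are
    full backups and all other runs are incrementals.
    """
    if num_backups < 1 or full_every < 1:
        raise ValueError("num_backups and full_every must be at least 1")
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")
    # Data sent to the store: 1 unit per full, change_rate per incremental.
    sent = sum(1.0 if run % full_every == 0 else change_rate
               for run in range(num_backups))
    # Data actually kept: the first full plus the changes in each later run.
    stored = 1.0 + (num_backups - 1) * change_rate
    return sent / stored


if __name__ == "__main__":
    # 28 runs at a 5% change rate with different mixes of full backups.
    for full_every in (1, 7, 28):
        ratio = schedule_ratio(28, full_every, 0.05)
        print(f"full backup every {full_every:>2} runs: {ratio:.1f}:1")
```

In this model a schedule made up almost entirely of incrementals stays close to 1:1, since incrementals contain little that the store has already seen.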