SEPATON CTO Miklos Sandorfi Outlines Key Considerations for Meeting the Deduplication Needs of the Enterprise

MARLBOROUGH, MA - November 6, 2007 - The volume of data generated by most companies has grown at such an explosive rate that many data centers are running out of space, power, cooling, and storage capacity.  Issues of insufficient capacity are being compounded by increasingly stringent regulatory requirements and business initiatives demanding higher service levels, longer online retention times, and higher levels of data protection.  Data deduplication technology is rapidly emerging as an effective solution to significantly offset data growth and meet regulatory and business requirements.

SEPATON, Inc.'s Chief Technology Officer, Miklos Sandorfi, cautions enterprises that data deduplication approaches vary and outlines key considerations for choosing the approach that best meets the needs of large enterprises.

Know the Basic Approaches of Data Deduplication. There are two basic categories of data deduplication technology: hash-based and byte-level comparison deduplication. The hash-based approach runs incoming data through a hashing algorithm to create a hash, a small representation of the data that serves as a unique identifier for that piece of data. It then compares the hash to previous hashes stored in a lookup table. If a match is found, the duplicate data is replaced with a pointer to the existing data. If a match is not found, the data is stored and its hash is added to the lookup table.
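
As an illustration only, the hash-based flow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not any vendor's implementation: the function names (dedupe_chunks, restore), the use of SHA-256, and the idea that data arrives already split into chunks are all hypothetical choices made for the example.

```python
import hashlib


def dedupe_chunks(chunks):
    """Hash-based deduplication over an iterable of byte chunks.

    Returns (store, recipe):
      store  - lookup table mapping hash -> the single stored copy of a chunk
      recipe - one hash per incoming chunk, acting as a pointer into store
    """
    store = {}
    recipe = []
    for chunk in chunks:
        # SHA-256 is an assumption; the source does not name an algorithm.
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            # No match found: keep the data and add its hash to the table.
            store[digest] = chunk
        # Match or not, the backup stream records only a pointer.
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original stream by following each pointer in order."""
    return b"".join(store[digest] for digest in recipe)
```

For example, feeding the chunks b"aaaa", b"bbbb", b"aaaa" stores two unique chunks and records three pointers, and restore() returns the original twelve bytes. Note that this sketch trusts a hash match without comparing the underlying bytes, which is the data integrity concern raised elsewhere in this release.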

An alternative approach uses byte-level comparison technology.  Here, pattern matching is used to find duplicate data; since actual data comparisons are made, there is no data integrity risk.  Some solutions take this a step further by using built-in intelligence about the actual file content to compare data as objects (e.g., Word document to Word document or Oracle database to Oracle database) and identify potential redundancies.  Unlike other technologies that use the first instance of a file as the reference copy, enterprise-class implementations use the most recent copy as the reference and replace older duplicate data with pointers. As a result, this technology eliminates the need to reconstitute new data from multiple reference points, enabling instantaneous data restoration.
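
The combination of direct byte comparison and a most-recent reference copy can be illustrated with a short Python sketch. It is a deliberate simplification and an assumption throughout: it compares fixed-size blocks at the same offset only, whereas real products use pattern matching and content awareness, and the names BLOCK, reverse_reference and rebuild_old are hypothetical.

```python
BLOCK = 4  # hypothetical block size, kept tiny so the example is readable


def reverse_reference(old: bytes, new: bytes, block: int = BLOCK):
    """Re-encode the OLDER backup against the NEWER one.

    The newer backup stays whole as the reference copy. Each block of the
    older backup becomes either ("ptr", offset, length), pointing into the
    newer backup, or ("raw", data) when the bytes differ.
    """
    entries = []
    for start in range(0, len(old), block):
        piece = old[start:start + block]
        # Actual byte comparison, not a hash comparison.
        if new[start:start + len(piece)] == piece:
            entries.append(("ptr", start, len(piece)))
        else:
            entries.append(("raw", piece))
    return entries


def rebuild_old(entries, new: bytes) -> bytes:
    """Reconstruct the older backup from its pointers and raw blocks."""
    out = bytearray()
    for entry in entries:
        if entry[0] == "ptr":
            _, offset, length = entry
            out += new[offset:offset + length]
        else:
            out += entry[1]
    return bytes(out)
```

In this arrangement the newest backup needs no reconstruction at all; only older backups are rebuilt from pointers, which matches the restore argument made for the most-recent-reference approach.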

Distinguish between Inline vs. Post-Processing.  A key distinction between deduplication technologies is whether deduplication is done inline, as part of the backup process, or as a post-process. Deduplication performed inline requires slightly less capacity and is adequate for relatively small backup requirements. However, this method has a significant negative impact on performance and cannot complete the large backups required by enterprise organizations within typical backup windows. The alternative, post-process method completes backups at full, unimpeded performance. The deduplication process is started as soon as the backup process begins and continues in parallel with the backup in a fully integrated operation. The main benefit of this post-process method is that it can handle much larger backup volumes within a typical eight-hour backup window. In addition, because it backs up a full set of data, the post-process method enables more rigorous data integrity checking.

Choose a Solution that can Back Up and Restore Petabytes of Data. A primary consideration in choosing a backup technology for an enterprise or large enterprise is the solution's ability to handle terabytes or petabytes of data while staying within the backup window. The objective is to avoid creating dozens of separately managed "silos" of storage.

Ensure High Performance Over Time. Many solutions see a marked degradation in performance over time as data becomes more fragmented across the disk and the database as duplicate data storage expands. Choose a solution that delivers consistent performance no matter how long it has been in service.

Set Realistic Expectations for Capacity Reduction. Deduplication approaches and results vary widely among solutions, as does the time required to achieve maximum deduplication. The effectiveness of deduplication technology also depends heavily on the specific backup policies, the application, and the mix of data types being backed up.

Check Restore Performance. Backing up data quickly is only half the challenge. To be successful, data needs to be restored quickly and efficiently. In fact, one of the key drivers for adopting deduplication technology is the ability to keep data on disk longer in order to simplify and accelerate restores.  Before adopting a new deduplication technology, be sure to test restore times and efficiency. Most restore requests are for data that is less than two weeks old. Solutions that use the first backup as the reference copy must recreate the most recent backup from weeks or months of pointers. In contrast, solutions that use the most recent backup as the reference copy can restore that data nearly instantaneously.

Ensure Data Integrity. Enterprise deduplication requires guaranteed data integrity, yet some deduplication algorithms can result in data integrity issues. Look for solutions that guarantee data integrity. Enterprise-class solutions perform a data integrity check that compares the deduplicated data to the original data set at the byte level before any duplicate data is deleted or disk space is redeployed. This comparison needs to ensure that when deduplicated data is reconstructed, it is byte-for-byte identical to the original backup.
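
The verify-before-delete rule described above can be expressed as a small Python sketch. It is an illustration under assumptions, not a product's actual check: the store/recipe layout (a lookup table of stored chunks plus an ordered list of pointers) and the names reconstruct and safe_to_reclaim are hypothetical.

```python
def reconstruct(store, recipe):
    """Rebuild a backup by following each pointer (key) into the store."""
    return b"".join(store[key] for key in recipe)


def safe_to_reclaim(original: bytes, store, recipe) -> bool:
    """Return True only if the deduplicated form rebuilds byte for byte.

    The original data should be deleted, and its disk space redeployed,
    only when this returns True.
    """
    try:
        rebuilt = reconstruct(store, recipe)
    except KeyError:
        # A pointer refers to data that is not in the store.
        return False
    # Direct byte-level comparison against the original backup.
    return rebuilt == original
```

A caller would run safe_to_reclaim() after deduplication completes and keep the original data whenever it returns False.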

Require Enterprise-Class Reliability. Since the deduplication solution is going to be the primary recovery source for weeks or months of data, the base platform should have the same type of reliability and availability features as those found in enterprise-class disk solutions, including redundant power and cooling; redundant data paths with automatic failover; RAID-protected storage; the ability to maintain nearly full backup and restore performance, even when a node is lost; the ability to add capacity or performance independently and without disruption to existing infrastructure; and management software that reports any faults by email, page, or similar alert.

Tune the Solution to the Environment. Choose a solution that can be tuned to support specific policies and procedures as well as specific environmental requirements. Some solutions, particularly low-end solutions, are designed for smaller, simpler infrastructures and therefore have few parameters that can be adjusted to individual needs.

Consider the People Factor. Before trusting data to a new technology, choose a vendor with experience in the specific data protection requirements of enterprise-size organizations. To be effective, work closely with a company that will help configure a solution that best meets the needs and addresses the specific requirements of the backup applications.

Beware of solutions that are built on off-the-shelf servers and low-end storage without enterprise-class reliability and availability features.

Where to Learn More
To learn more about data deduplication for the enterprise, please download the free white paper from SEPATON, Inc., titled "The Business Case for Data Deduplication," at the following URL:  http://www.sepaton.com/bizcase

About SEPATON, Inc.
SEPATON is enabling companies of any size to comply with demanding data storage and retrieval requirements by changing the rules of data protection. We provide innovative, disk-based solutions that are intelligent and reliable and that reduce storage, management and recovery costs. Because the proven SEPATON ContentAware architecture keeps track of all files and content, it enables faster and more efficient performance of a range of advanced data protection, data retention and data recovery functions. SEPATON's virtual tape library systems dramatically outperform tape and other disk-based solutions. For additional information call 508-490-7900 or visit www.sepaton.com.

Contact:

Beth Winkowski
SEPATON, Inc.
978/649-7189
bwinkowski@sepaton.com

###

SEPATON, SRE and S2100 are registered trademarks of SEPATON Corporation.  ContentAware, Site2 and DeltaStor are trademarks of SEPATON Corporation.  All other brand and product names are or may be trademarks of their respective owners.