Large-scale Data Migration -
Replicating and Synchronizing Millions of Files

RepliWeb: File Replication and File Synchronization for Content Deployment and Distribution.
The Challenge:

Two legacy UNIX servers, each holding millions of files and terabytes of data, need to be synchronized. The existing solution cannot handle the volume of data and brings production systems almost to a standstill.

The synchronization must transfer all UNIX file and directory permissions and ownership, including UID, GID, SUID, and SGID. It must also be flexible enough to handle UNIX hidden files and links, with the ability to replicate the link itself, replicate the file or directory the link points to, or ignore links that are system specific.
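The three link-handling choices and the permission requirements can be illustrated with a minimal Python sketch. This is not RDS code: the function name `replicate_entry` and the `link_mode` values are hypothetical, only regular files and symbolic links are handled, and the destination is assumed not to exist yet.

```python
import os
import shutil
import stat


def replicate_entry(src, dst, link_mode="link"):
    """Copy one regular file or symlink from src to dst.

    link_mode (hypothetical names, not RDS options):
      "link"   recreate the symbolic link itself
      "follow" copy the file the link points to
      "ignore" skip symbolic links entirely
    Returns True if something was written at dst.
    """
    if link_mode not in ("link", "follow", "ignore"):
        raise ValueError("unknown link_mode: %r" % link_mode)
    if os.path.islink(src):
        if link_mode == "ignore":
            return False
        if link_mode == "link":
            os.symlink(os.readlink(src), dst)
            return True
        # "follow": fall through and copy the link's target as a regular file
    info = os.stat(src)        # os.stat follows symbolic links
    shutil.copyfile(src, dst)  # copies the data only, following links
    try:
        # Changing ownership to another user normally requires root.
        os.chown(dst, info.st_uid, info.st_gid)
    except PermissionError:
        pass
    # Set the mode after chown, because chown may clear the SUID/SGID bits.
    os.chmod(dst, stat.S_IMODE(info.st_mode))
    os.utime(dst, ns=(info.st_atime_ns, info.st_mtime_ns))
    return True
```

Hidden files need no special treatment here, since a leading dot is only a naming convention; a directory walker simply has to avoid filtering them out.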

Since system processes will constantly be accessing data on the target legacy servers, an incomplete or damaged file is not an option. An 'old' file on the target system must be overwritten only after the new file has been verified for integrity and completeness.

Finally, the solution must support remote administration and monitoring.


Replicating millions of files



The Solution: Configuring RDS

The RepliWeb Deployment Suite™ (RDS) includes advanced options that control every aspect of the replication. UNIX permissions and file security attributes can be transferred in full, and UNIX hidden files can optionally be included. RDS can handle UNIX links in any scenario: ignoring the links, replicating the links themselves, or replicating the files pointed to by the links.

When the servers are separated by a firewall and connected over a VPN, RDS can seamlessly replicate the data using multiple streams, which drastically increases performance and takes advantage of the parallel processors available in enterprise servers. RDS can also enable differential transfer, which sends only the blocks needed to make the target file match its source.

For example, suppose a 2 GB database file grows by a few megabytes each day. Instead of retransferring the entire 2 GB file, RDS transfers only the differing blocks, saving time and costly bandwidth.
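The idea behind block-level differential transfer can be sketched in a few lines of Python. This is an illustration only and does not describe how RDS computes differences: the fixed 4096-byte block size, the SHA-256 digests, and the function names are all assumptions, and the data is held in memory rather than streamed.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def compute_delta(source, target_digests, block_size=BLOCK_SIZE):
    """Return {block_index: bytes} for source blocks the target lacks."""
    delta = {}
    for index, digest in enumerate(block_digests(source, block_size)):
        if index >= len(target_digests) or target_digests[index] != digest:
            start = index * block_size
            delta[index] = source[start:start + block_size]
    return delta


def apply_delta(target, delta, source_length, block_size=BLOCK_SIZE):
    """Rebuild the source content from the old target plus the delta."""
    # Resize the old content to the new length, then patch changed blocks.
    out = bytearray(target[:source_length].ljust(source_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        out[start:start + len(block)] = block
    return bytes(out)
```

In this sketch the target side sends its block digests, the source side answers with `compute_delta`, and the target rebuilds the file with `apply_delta`. For a file that only grows at the end, the delta contains just the final partial block and the newly appended blocks.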

When advanced transfer options, such as bandwidth control and compression, are unnecessary, RDS can use an advanced "express" transport mechanism. This option has been shown to transfer gigabytes of data at more than ten times conventional transfer rates. To address the issue of system resources, RDS can be configured to use a reduced amount of CPU resources during the CPU-intensive Comparative Snapshot stage. This allows the legacy server to perform its other duties unhindered while the replication proceeds in the background.

Each replicated file can be sent to a user-defined temporary directory for the duration of the transfer. Once the transfer is complete and the file has been verified for integrity, it is instantaneously renamed, overwriting the 'old' file. This ensures that system processes and users only ever access complete, verified files.
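The stage, verify, and rename pattern described above can be sketched as follows. This is a generic illustration, not the RDS implementation: the function names and the use of SHA-256 as the integrity check are assumptions. The sketch also assumes the staging directory is on the same filesystem as the final path, since a rename is atomic only within one filesystem.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_verified(data, expected_sha256, final_path, staging_dir):
    """Write data to a staging file and rename it into place once verified."""
    fd, temp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before verifying
        if sha256_of(temp_path) != expected_sha256:
            raise ValueError("integrity check failed; existing file left untouched")
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(temp_path, final_path)
    except BaseException:
        # Clean up the staging file if anything went wrong before the rename.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

A reader that opens `final_path` at any moment sees either the complete old file or the complete new one, never a partially written file.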

Using the RDS Console, administrators can monitor progress and manage the entire operation remotely, either from the command line on a UNIX system or through a command-line or graphical interface on a Windows machine.



RepliWeb, Inc., All Rights Reserved.
