Cascaded File Replication

RepliWeb: File Replication and File Synchronization for Content Deployment and Distribution.

A government agency has made its public information accessible through a web interface. The agency operates from one central office and multiple remote regional and district offices.

The central office system processes the data and generates new web content on a schedule: part of the content is updated daily and the rest weekly. The customer requires the central server's content to be synchronized with the regional servers, and the regional servers' content in turn to be synchronized onto the district-level servers.

Figure 1: National-level content to be replicated to the regions and districts

One of the challenges is that synchronization takes place over a Wide Area Network (WAN) with limited bandwidth. This is the same production network that was in place before the migration to web content. The web content, however, has created a demand for synchronizing very large data volumes over the existing network infrastructure, in addition to all the other "regular" application traffic on the network.

On average, about 15 GB in roughly 35,000 files must be synchronized between the central server and the regional and district servers, and about 5,000 files change in each synchronization cycle. The synchronization process must also remove files and directories that were deleted on the central server, so that an exact copy of the central web site is available at every remote location.
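
With these figures, a rough estimate shows why synchronization time should track the number of changed files rather than the total volume. The following Python sketch uses the averages quoted above; the 2 Mbit/s link speed and the assumption that changed files are of average size are hypothetical, and protocol overhead and competing traffic are ignored.

```python
# Rough estimate: copying everything vs. copying only changed files.
# Volume and file counts are the averages quoted in the text. The link speed
# and the "changed files have average size" assumption are hypothetical.

TOTAL_BYTES = 15 * 10**9   # ~15 GB of web content
TOTAL_FILES = 35_000       # ~35,000 files
CHANGED_FILES = 5_000      # ~5,000 changed files per cycle
LINK_BPS = 2_000_000       # hypothetical WAN link speed, bits per second


def transfer_hours(num_bytes: float, link_bps: float) -> float:
    """Hours needed to send num_bytes at the full link rate (no overhead)."""
    return num_bytes * 8 / link_bps / 3600


avg_file_bytes = TOTAL_BYTES / TOTAL_FILES       # ~428.6 KB per file
changed_bytes = CHANGED_FILES * avg_file_bytes   # ~2.14 GB per cycle

full_copy = transfer_hours(TOTAL_BYTES, LINK_BPS)
delta_copy = transfer_hours(changed_bytes, LINK_BPS)

print(f"full copy:          {full_copy:.1f} h")
print(f"changed files only: {delta_copy:.1f} h")
```

Under these assumptions a full copy takes about 16.7 hours, while sending only the changed files takes about 2.4 hours; the factor of seven is simply the ratio of total files to changed files.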

The main requirement is a solution that completes each synchronization in time proportional to the number of changed files rather than to the total volume of data. In addition, the solution must meet these requirements:

  1. Because the entire process runs over a WAN, it cannot rely on mapped drives, UNC paths, or similar file-sharing mechanisms. It must use only TCP/IP connections and must be "firewall-friendly."
  2. The data transfer must use only a user-configurable portion of the available bandwidth to each server.
  3. All servers on a given level must be synchronized simultaneously (for example, all regional servers at once). However, the synchronization of one location must not depend on the progress or success of another; the jobs are independent rather than sequential.
  4. Because some of the second-tier lines are relatively slow, compression should be used to improve performance, but only for those specific locations.
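
Requirement 3 amounts to running one job per server in parallel while isolating failures. A minimal Python sketch, assuming a caller-supplied `sync_one` function; both `sync_tier` and `fake_sync` are hypothetical names for illustration and are not part of any RDS interface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def sync_tier(targets, sync_one):
    """Run sync_one(target) for every target in the tier at the same time.

    Failures are isolated: an exception raised for one target is recorded
    for that target only and does not stop or delay the other jobs.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {pool.submit(sync_one, t): t for t in targets}
        for fut in as_completed(futures):
            target = futures[fut]
            try:
                results[target] = ("ok", fut.result())
            except Exception as exc:  # one failed location must not affect the rest
                results[target] = ("failed", repr(exc))
    return results


def fake_sync(server):
    """Hypothetical stand-in for a real replication job."""
    if server == "region-c":
        raise ConnectionError("line down")
    return f"{server} synchronized"


for server, outcome in sorted(sync_tier(["region-a", "region-b", "region-c"], fake_sync).items()):
    print(server, outcome)
```

Here the failure on the hypothetical "region-c" line is reported, while "region-a" and "region-b" complete normally.
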

RDS Configuration

An RDS Controller is installed on the national central web server.

RDS Satellites are installed on the regional web servers.

Another RDS Controller is installed on regional server "A" to take advantage of the network topology; it distributes content to, and synchronizes it with, the second tier of district web servers.

An RDS Console is installed on the administrator's machine at the central location. This Console controls and monitors the synchronization jobs performed both by the central Controller and by the regional Controller, from a single central location.
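
The cascade can be modeled as a tree in which every server with children runs a Controller and every parent-child link is one replication job. This Python sketch uses hypothetical server names; only regional server "A" has district children, matching the configuration described here. It only enumerates the jobs and does not model scheduling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    """A server in the cascade; a server with children also runs a Controller."""
    name: str
    children: list[Server] = field(default_factory=list)


def replication_jobs(root: Server):
    """Yield (controller, target) pairs tier by tier: one job per parent-child link."""
    tier = [root]
    while tier:
        next_tier = []
        for controller in tier:
            for target in controller.children:
                yield controller.name, target.name
                next_tier.append(target)
        tier = next_tier


# Hypothetical names: only regional server "A" feeds district servers.
districts = [Server("district-a1"), Server("district-a2")]
central = Server("central", [Server("region-a", districts), Server("region-b")])

for controller, target in replication_jobs(central):
    print(f"{controller} -> {target}")
```

The district jobs are issued by "region-a", not by "central", which reflects the idea of placing a second Controller on a regional server.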

A scheduled replication job is configured for each synchronization process. Once configured, each job runs according to its schedule policy without human intervention and without requiring a logged-on user session. Each job uses "mirror" logic, so it faithfully replicates both changed files and deletions.
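
Mirror logic reduces to comparing a source manifest with a target manifest. A minimal Python sketch, assuming each side can be summarized as a dict mapping relative path to a content fingerprint (for example a hash, or size plus modification time); it illustrates the idea and is not RDS's internal algorithm. It handles files only; removing empty directories would need a similar pass.

```python
def mirror_plan(source: dict[str, str], target: dict[str, str]):
    """Compare two manifests mapping relative path -> content fingerprint.

    Returns (to_copy, to_delete): files that are new or changed on the source,
    and files that exist only on the target and must be removed to mirror it.
    """
    to_copy = sorted(p for p, fp in source.items() if target.get(p) != fp)
    to_delete = sorted(p for p in target if p not in source)
    return to_copy, to_delete


# Hypothetical manifests: one new file, one changed file, one deleted file.
source = {"index.html": "a1", "news/today.html": "b2", "img/logo.png": "c3"}
target = {"index.html": "a1", "news/today.html": "old", "news/archive.html": "d4"}

print(mirror_plan(source, target))
# (['img/logo.png', 'news/today.html'], ['news/archive.html'])
```

Unchanged files such as `index.html` appear in neither list, so the work done depends on the number of differences.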

Each job also uses a bandwidth policy that fits the requirements of its remote location. For example, a job set to start at 4:30 PM can be limited to no more than 5% of the available bandwidth between 4:30 PM and 12:00 AM, and allowed to use up to 80% after 12:00 AM. The change in bandwidth consumption happens automatically, without intervention.
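
The example policy above can be expressed as a time-of-day function. This Python sketch encodes the 5% and 80% caps from the text; the assumption that the 80% window lasts until the next 4:30 PM and the 2 Mbit/s link in the usage lines are hypothetical, since the text does not specify them.

```python
from datetime import time

EVENING_START = time(16, 30)  # the example job starts at 4:30 PM
EVENING_CAP = 0.05            # at most 5% of available bandwidth until midnight
NIGHT_CAP = 0.80              # up to 80% after midnight


def bandwidth_cap(now: time) -> float:
    """Fraction of the available bandwidth the job may use at wall-clock time `now`."""
    if now >= EVENING_START:  # 4:30 PM up to midnight
        return EVENING_CAP
    # Assumption: the 80% cap holds from midnight until 4:30 PM; the text
    # does not say when the night window ends.
    return NIGHT_CAP


def max_bytes_per_sec(now: time, available_bps: float) -> float:
    """Convert the cap into a byte rate for a link of available_bps bits per second."""
    return bandwidth_cap(now) * available_bps / 8


print(max_bytes_per_sec(time(18, 0), 2_000_000))  # evening window
print(max_bytes_per_sec(time(1, 0), 2_000_000))   # after midnight
```

On the hypothetical 2 Mbit/s link this gives about 12.5 KB/s in the evening and about 200 KB/s after midnight. A throttled sender could re-check the cap before each chunk, so the rate changes at midnight without manual intervention.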

Synchronization with remote locations over slower lines is also configured to apply transparent data compression to the data sent over those lines.
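
Per-location compression can be sketched with Python's standard `zlib` module. The 1 Mbit/s threshold and the automatic per-link decision are assumptions for illustration; in the setup described here, compression is simply enabled in the jobs for the slow locations. The receiving side undoes the compression, which is what makes it transparent.

```python
import zlib

# Hypothetical threshold: links slower than this get compression.
SLOW_LINK_BPS = 1_000_000


def encode_for_link(payload: bytes, link_bps: float) -> tuple[bool, bytes]:
    """Compress payload only for slow links; return (compressed?, bytes to send)."""
    if link_bps < SLOW_LINK_BPS:
        return True, zlib.compress(payload, 6)
    return False, payload


def decode_from_link(compressed: bool, data: bytes) -> bytes:
    """Receiving side: restore the original bytes."""
    return zlib.decompress(data) if compressed else data


page = b"<html>" + b"<p>Public notice</p>" * 500 + b"</html>"
flag, wire = encode_for_link(page, 256_000)  # hypothetical 256 kbit/s district line
print(flag, len(page), len(wire))
assert decode_from_link(flag, wire) == page
```

Text-heavy web content such as HTML usually compresses well, while already-compressed formats such as JPEG or ZIP gain little, which is one reason to limit compression to the lines that need it.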

When the process begins, RDS analyzes the differences between the data on the source web server and on the target server, builds a Comparative Snapshot, transfers only the required data to the remote location, and deletes whatever no longer exists on the source.

Because the analysis is performed locally and simultaneously on both machines, and data transfer is kept to a minimum, replication time is proportional to the number of changes rather than to the total volume of data or the total number of files.
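
The local analysis can be illustrated by a function that each machine runs against its own copy of the content, producing a compact manifest. This Python sketch assumes SHA-256 content hashes as fingerprints; the actual format of RDS's Comparative Snapshot is not described here, so treat this as an illustration of the idea rather than its implementation.

```python
import hashlib
import os


def snapshot(root: str) -> dict[str, str]:
    """Build a manifest of relative path -> SHA-256 digest for every file under root.

    Each machine runs this locally, so only the compact manifest, not the
    file contents, needs to cross the WAN for the comparison.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[rel] = digest.hexdigest()
    return manifest
```

Hashing still reads every file on each side, so the local scan grows with the total volume; what stays proportional to the number of changes is the data sent over the WAN. Comparing size and modification time first, and hashing only on a mismatch, is a common way to make the local scan cheaper.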

RepliWeb, Inc. All Rights Reserved.