Isilon – SyncIQ Loopback Replication

Our company migrated from Celerra to Isilon about two years ago. From time to time we receive requests from application teams to clone a production environment to a QA or dev instance on the same Isilon cluster. Back in the Celerra/VNX days we used nas_copy, which let us perform file-system-to-file-system copies, but since the migration to Isilon we had been trying to figure out how to accomplish the same thing with Isilon's own utilities. Up to this point we had to rely on host-based tools such as emcopy or rsync, which is not very convenient considering you need a "proxy" server available to perform the copies. I was also fairly confident that using the cluster's internal tools would be much more efficient and faster. After looking around the Isilon Google groups I found my solution. Here is my configuration:

Isilon Cluster – 6 x 108NL nodes, each node has 2 x 10G NICs and 2 x 1G NICs (LACP)

OneFS – 6.5.5.12

InsightIQ v2.5.0.0007

ESX 5.0 10G NIC

Ubuntu 11.10 VM, VMXNET3 NIC

I decided to test a couple of different scenarios to see which one would give me the best performance. Here is the information from InsightIQ on the directory I am using in all three scenarios. The directory contains data from a learning system, so it holds a lot of tiny files.

[Screenshot: InsightIQ directory summary, 5-31-2013 3-51-40 PM]

Scenario 1 – Using SyncIQ with loopback address of 127.0.0.1

I created my SyncIQ job and specified 127.0.0.1 as the target cluster IP address. Here are the policy details:

isilon-6# isi sync policy list  -v
Id: a7388bd04b21ba61ce9597eb90c712ca
Spec:
Type: user
Name: testpolicy
Description:
Source paths:
Root Path: /ifs/data/dev1
Source node restriction:
Destination:
       Cluster: 127.0.0.1
Password is present: no
Path: /ifs/data/qa
Make snapshot: off
Restrict target by zone name: off
Force use of interface in pool: off
Predicate: None
Check integrity: yes
Skip source/target file hashing: no
Disable stf syncing: no
Log level: notice
Maximum failure errors: 1
Target content aware initial sync (diff_sync): no
Log removed files: no
Rotate report period (sec): 31536000
Max number of reports: 2000
Coordinator performance settings:
Workers per node: 3
Task: sync manually
State: on

I went ahead and started the job, but I was really curious which interface it was going to use to copy the data. I let the job run for about 15 minutes, and this is what I saw in InsightIQ (performance reporting section):

[Screenshot: InsightIQ network performance report, 5-31-2013 3-35-44 PM]

Very interesting: SyncIQ decided to use the 1G interfaces. I was also happy to see that the workload was distributed across all six nodes of the cluster. Even though the SyncIQ settings were left at their defaults (three workers per node, so 18 workers across the six nodes, and no file operation rules), look at what it did to my cluster CPU utilization; that is a pretty big spike.

[Screenshot: InsightIQ cluster CPU utilization during the job, 5-31-2013 3-48-56 PM]

I started the job at 3:15pm and it completed at 6:30pm, for a total of 3 hours 15 minutes. Not bad at all for a full copy.

Scenario 2 – Using SyncIQ with SmartConnect zone name

In this test I wanted to see if performance would be any different using the SmartConnect zone name of my cluster, which resolves to the 10G NICs. Before running this test I deleted the old policy and removed the data from the /ifs/data/qa directory using the treedelete job; see the bottom of this post for instructions.

Here is my SyncIQ job that uses the local SmartConnect zone name:

isilon-6# isi sync policy list  -v
Id: 7b5dac0efe79543425720a8290aa58b4
Spec:
Type: user
Name: testpolicy
Description:
Source paths:
Root Path: /ifs/data/dev1
Source node restriction:
Destination:
            Cluster: isilon.mycompany.com
Password is present: no
Path: /ifs/data/qa
Make snapshot: off
Restrict target by zone name: off
Force use of interface in pool: off
Predicate: None
Check integrity: yes
Skip source/target file hashing: no
Disable stf syncing: no
Log level: notice
Maximum failure errors: 1
Target content aware initial sync (diff_sync): no
Log removed files: no
Rotate report period (sec): 31536000
Max number of reports: 2000
Coordinator performance settings:
Workers per node: 3
Task: sync manually
State: on

I started the job and let it run for 15 minutes. This is what I saw in InsightIQ this time:

[Screenshot: InsightIQ network performance report, 5-31-2013 11-49-32 PM]

This is what I expected to see: SyncIQ was using the 10G network interfaces. A quick look at CPU utilization showed the same picture as before; this is a very CPU-intensive process. I started the job around 11:30pm and it completed at 2:30am, so 3 hours for the full copy.

[Screenshot: InsightIQ performance report, 6-1-2013 12-09-27 AM]

Scenario 3 – Using rsync on Ubuntu VM

In this scenario I wanted to test and document the performance of rsync running on a host that acts as a "proxy" server. Basically, I exported the /ifs/data/dev1 and /ifs/data/qa directories over NFS and mounted them on my Ubuntu VM (see the mount sketch below). I had hoped to simulate "multithreaded" performance by running multiple rsync instances, but the directory layout was too convoluted to split up easily, so all I tested was the single rsync command that follows the mount sketch.
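On the Ubuntu VM side, the mounts looked roughly like this. This is a minimal sketch: the SmartConnect zone name is the one from the policy above, the mount points are assumptions chosen to match the rsync paths below, and the NFS exports must already exist on the cluster.

# Assumptions: exports already exist on the cluster and
# isilon.mycompany.com is the SmartConnect zone name used earlier in this post
sudo mkdir -p /mnt/ifs/data/dev1 /mnt/ifs/data/qa
sudo mount -t nfs isilon.mycompany.com:/ifs/data/dev1 /mnt/ifs/data/dev1
sudo mount -t nfs isilon.mycompany.com:/ifs/data/qa /mnt/ifs/data/qa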

nohup rsync --delete --progress -avW /mnt/ifs/data/dev1/ /mnt/ifs/data/qa/ > log_`date +%F`_`date +%T` 2>&1 &
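Had the layout cooperated, one common way to approximate multiple threads with rsync is to launch one instance per top-level subdirectory. The loop below is purely illustrative and is not something I ran for the timings in the summary table:

# Illustrative only: one rsync per top-level subdirectory, run in parallel.
# Note: files sitting directly under /mnt/ifs/data/dev1 are not covered by this loop.
for d in /mnt/ifs/data/dev1/*/; do
  rsync -aW --delete "$d" "/mnt/ifs/data/qa/$(basename "$d")/" &
done
wait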

Summary Table

Scenario   Description                       Full Copy            Incremental Copy
1          SyncIQ using 127.0.0.1 address    3 hours 15 minutes   29 seconds
2          SyncIQ using SmartConnect zone    3 hours              29 seconds
3          Using rsync                       36 hours             4.5 hours

These are pretty impressive numbers for SyncIQ: roughly 11-12x faster than rsync for the full copy (about 3 hours versus 36) and several hundred times faster for the incremental (29 seconds versus 4.5 hours). Going forward we will be using this procedure to clone our production instances.

Deleting data from the Isilon cluster

The most efficient way I have found to delete data from the cluster is the treedelete job. Here is the syntax:

isi job start treedelete --path=<ifs-directory> --priority=<priority> --policy=<policy>

For example:

isi job start treedelete --path=/ifs/data/qa

It’s actually pretty fast: deleting the data I used for this test (3.47 million files and 2.1 million directories) took only 20 minutes with default settings.
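If you want to keep an eye on the delete while it runs, the job engine can be queried from the CLI. On our cluster something along these lines did the trick, but treat the exact command names as an assumption to verify on your OneFS version:

# Assumption: command names as seen on our OneFS build; verify on your version
isi job status
isi status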
