Isilon – SyncIQ Loopback Replication

Our company migrated to Isilon from Celerra about two years ago. From time to time we receive requests from application teams to clone a production environment to a QA or dev instance on the same Isilon cluster. Back in the Celerra/VNX days we used nas_copy, which let us perform file-system-to-file-system copies, but since migrating to Isilon we had been trying to figure out how to accomplish the same thing using Isilon's own utilities. Up to this point we had to rely on host-based tools such as emcopy or rsync, which is not very convenient considering you have to have a "proxy" server available to perform these copies. I was also quite sure that using internal tools would be much more efficient and faster. After looking around the Isilon Google groups I found my solution. Here is my configuration:

Isilon Cluster – 6 x 108NL nodes, each node has 2 x 10G NICs and 2 x 1G NICs (LACP)

OneFS – 6.5.5.12

InsightIQ v2.5.0.0007

ESX 5.0 host with a 10G NIC

Ubuntu 11.10 VM, VMXNET3 NIC

I decided to test a couple of different scenarios to see which one would give me the best performance. Here is the information from InsightIQ on the directory that I am using in all 3 scenarios. This directory contains data from a learning system, so a lot of tiny files.

[Screenshot: InsightIQ view of the test directory]

Scenario 1 – Using SyncIQ with loopback address of 127.0.0.1

I created my SyncIQ job and specified 127.0.0.1 as the Target cluster IP address; here are the policy details:

isilon-6# isi sync policy list  -v
Id: a7388bd04b21ba61ce9597eb90c712ca
Spec:
Type: user
Name: testpolicy
Description:
Source paths:
Root Path: /ifs/data/dev1
Source node restriction:
Destination:
       Cluster: 127.0.0.1
Password is present: no
Path: /ifs/data/qa
Make snapshot: off
Restrict target by zone name: off
Force use of interface in pool: off
Predicate: None
Check integrity: yes
Skip source/target file hashing: no
Disable stf syncing: no
Log level: notice
Maximum failure errors: 1
Target content aware initial sync (diff_sync): no
Log removed files: no
Rotate report period (sec): 31536000
Max number of reports: 2000
Coordinator performance settings:
Workers per node: 3
Task: sync manually
State: on
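For reference, creating and kicking off a policy like this from the CLI should look something along these lines. I'm writing the option names from memory, so treat every flag below as an assumption and verify it against the built-in help for "isi sync policy" on your OneFS release before running anything:

# Hypothetical sketch: create a loopback policy and start it manually.
# All flag names are assumptions; check the built-in help first.
isi sync policy create --name=testpolicy --source=/ifs/data/dev1 --target=127.0.0.1 --target-path=/ifs/data/qa
isi sync policy start testpolicy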

I went ahead and started the job, but I was really curious which interface it was going to use to copy the data. I let the job run for about 15 minutes, and this is what I saw in InsightIQ (performance reporting section):

[Screenshot: InsightIQ network throughput, job running over the 1G interfaces]

Very interesting: SyncIQ decided to use the 1G interfaces. I was also happy to see that the workload was distributed among all 6 nodes of the cluster. Even though the SyncIQ settings were left at their defaults (workers, file operations rules), look what it did to my cluster CPU utilization; a pretty big spike.

[Screenshot: InsightIQ cluster CPU utilization spike during the job]

I started the job at 3:15pm and it completed at 6:30pm, for a total of 3 hours and 15 minutes; not bad at all for a full copy.

Scenario 2 – Using SyncIQ with SmartConnect zone name

In this test I wanted to see if performance would be any different if I were to use the SmartConnect zone name of my cluster, which utilizes the 10G NICs. Before I ran this test I deleted the old policy and deleted the data from the /ifs/data/qa directory using the "treedelete" command (see the bottom of this post for instructions; the cleanup steps are sketched below).
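The cleanup between runs boiled down to two steps: removing the policy and wiping the target directory. The delete subcommand name below is my assumption based on the "isi sync policy" command family shown above, so verify it first:

# Remove the old policy (subcommand name assumed) and wipe the target dir.
isi sync policy delete testpolicy
isi job start treedelete --path=/ifs/data/qa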

Here is my SyncIQ job that uses the local SmartConnect zone name:

isilon-6# isi sync policy list  -v
Id: 7b5dac0efe79543425720a8290aa58b4
Spec:
Type: user
Name: testpolicy
Description:
Source paths:
Root Path: /ifs/data/dev1
Source node restriction:
Destination:
            Cluster: isilon.mycompany.com
Password is present: no
Path: /ifs/data/qa
Make snapshot: off
Restrict target by zone name: off
Force use of interface in pool: off
Predicate: None
Check integrity: yes
Skip source/target file hashing: no
Disable stf syncing: no
Log level: notice
Maximum failure errors: 1
Target content aware initial sync (diff_sync): no
Log removed files: no
Rotate report period (sec): 31536000
Max number of reports: 2000
Coordinator performance settings:
Workers per node: 3
Task: sync manually
State: on

I started the job and let it run for 15 minutes; this is what I saw in InsightIQ this time:

[Screenshot: InsightIQ network throughput, job running over the 10G interfaces]

This is what I expected to see: SyncIQ was using the 10G network interfaces. A quick look at CPU utilization showed the same picture as before; this is a very CPU-intensive process. I started the job around 11:30pm and it completed at 2:30am, so 3 hours for the full copy.

[Screenshot: InsightIQ performance during the SmartConnect zone run]

Scenario 3 – Using rsync on Ubuntu VM

In this scenario I wanted to test and document the performance of using rsync on a host acting as a "proxy" server. Basically, I exported the /ifs/data/dev1 and /ifs/data/qa directories over NFS and mounted them on my Ubuntu VM. I wanted to simulate "multithreaded" performance by running multiple rsync instances, but the directory layout was very convoluted and would not allow me to do that easily, so what I tested is just a single rsync command:

nohup rsync --delete --progress -avW /mnt/ifs/data/dev1/ /mnt/ifs/data/qa/ > log_`date +%F`_`date +%T` 2>&1 &
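For completeness, the NFS mounts on the proxy VM were nothing special, and if your directory tree does split cleanly at the top level you could fan out one rsync per subdirectory. The sketch below is hypothetical (the mount points and the parallelism of 4 are assumptions) and was not part of my test:

# NFS mounts on the proxy VM, using the SmartConnect zone as the server:
sudo mount -t nfs isilon.mycompany.com:/ifs/data/dev1 /mnt/ifs/data/dev1
sudo mount -t nfs isilon.mycompany.com:/ifs/data/qa /mnt/ifs/data/qa

# Hypothetical fan-out: one rsync per top-level subdirectory, 4 at a time.
# Assumes subdirectory names contain no newlines.
cd /mnt/ifs/data/dev1
ls -d */ | xargs -P 4 -I {} rsync -aW --delete "/mnt/ifs/data/dev1/{}" "/mnt/ifs/data/qa/{}"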

Summary Table

Scenario   Description                      Full Copy    Incremental Copy
1          SyncIQ using 127.0.0.1 address   3:15 hours   29 seconds
2          SyncIQ using SmartConnect zone   3 hours      29 seconds
3          Using rsync                      36 hours     4.5 hours

These are pretty impressive numbers from SyncIQ; going forward we will be using this procedure to clone our production instances.

Deleting data from Isilon Cluster

The most efficient way that I have found to delete data from the cluster is to use the treedelete job; here is the syntax:

isi job start treedelete --path=<ifs-directory> --priority=<priority> --policy=<policy>

For example:

isi job start treedelete --path=/ifs/data/qa

It's actually pretty fast: deleting the data I used for this test (3.47 million files and 2.1 million directories) took only 20 minutes with default settings.
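If you want to keep an eye on a running treedelete, the job engine status command should show its progress (I'm going from memory here, so double-check the exact command on your OneFS version):

# List running jobs, including treedelete progress (command from memory).
isi job status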

Deploying virtual Isilon on ESXi

I've documented the steps for deploying the virtual Isilon appliance on the ESXi platform. I believe at the moment only existing EMC customers can get their hands on this appliance, and it is to be used for testing purposes only. The appliance comes pre-built for VMware Workstation/Player, which is nice, but I wanted to deploy it on my ESXi server (free edition).

Current ESX server configuration:

Two vSwitches: vSwitch0 is my public network and vSwitch1 is my private network (I used the 192.168.1.0/24 subnet; this will be used by the Isilon cluster for intra-cluster connectivity, where real hardware uses InfiniBand switches).
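If you prefer the CLI over the vSphere Client, the private vSwitch and a port group on it can be created roughly like this (the port group name "Private" is just illustrative):

# Create the private vSwitch and a port group for intra-cluster traffic.
esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard portgroup add --portgroup-name=Private --vswitch-name=vSwitch1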

DNS records:

You don't have to create A records for the internal interfaces; I am listing them here for documentation purposes only.

isilonintpoc1.local – 192.168.1.50
isilonintpoc2.local – 192.168.1.51
isilonintpoc3.local – 192.168.1.52

A record for each node of the cluster (external interfaces)

isilonpoc1.local – 10.144.4.11
isilonpoc2.local – 10.144.4.12
isilonpoc3.local – 10.144.4.13

A record for SmartConnect Service IP

isilonpoc0.local – 10.144.4.10

NS record for the SmartConnect zone name; this record should point to the A record of the SmartConnect Service IP

isilonpoc.local -> isilonpoc0.local
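Put together, the records above would look roughly like this in a BIND-style zone file; adapt the syntax to whatever DNS server you actually run (the parent zone here is "local"):

; Illustrative zone fragment; internal A records omitted as noted above.
isilonpoc0   IN  A   10.144.4.10    ; SmartConnect Service IP
isilonpoc1   IN  A   10.144.4.11
isilonpoc2   IN  A   10.144.4.12
isilonpoc3   IN  A   10.144.4.13
isilonpoc    IN  NS  isilonpoc0.local.   ; delegation to SmartConnect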

Let’s get started …

  • Extract the contents of the zip file to your hard drive
  • Download and install VMware vCenter Converter Standalone
  • Open Converter and select "Convert Machine". For Source type, select "VMware Workstation or other VMware virtual machine". For "Virtual machine file", browse to the extracted folder and select the .vmx file in the root of the directory.

[Screenshot: selecting the .vmx file in Converter]

  • Enter the ESX server information and hit Next. Enter the node name that you want to assign to this VM and select a datastore; I left the virtual machine version at 7.
  • The only thing I modify on the next page is Networking: I change NIC1 to vSwitch1 (private) and NIC2 to vSwitch0 (public). It will take 5-10 minutes to convert the appliance. My virtual cluster will have three Isilon nodes, so I repeat the same steps and convert two more nodes (or push them with ovftool, as sketched below).
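As an alternative to Converter, VMware's ovftool can push the extracted .vmx straight to an ESXi host; the host name, datastore, and path in this one-liner are placeholders:

# Deploy the appliance directly to ESXi (host, datastore, and path are placeholders).
ovftool --datastore=datastore1 /path/to/appliance.vmx vi://root@esx-host/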
  • Let's set up our first node: connect to ESX, open the console of the first node, and power it on. You will be prompted to format the ifs partition for all drives; select yes
  • Next we get to the wizard that walks us through configuring the cluster (Note: if you make a typo and need to go back, simply type "back" at the prompt)

[Screenshot: cluster configuration wizard]

  • Since this is the first node of our cluster, we are going to select 1 to create a new cluster. Create passwords for the root and admin accounts, and select No for enabling SupportIQ; this is a virtual appliance, so we don't need to enable email-home support. Enter the cluster name and press Enter for the default encoding (utf-8).
  • Next we are going to configure the intra-cluster network settings; this is the network the Isilon nodes use to communicate with each other. I am using my private network (vSwitch1, 192.168.1.0/24).

[Screenshot: internal network configuration menu]

  • Select 1 to configure the netmask, then select 3 to configure the intra-cluster IP range. On my private network I will use the range 192.168.1.50-53, where 192.168.1.50 is my low IP and 192.168.1.53 is my high IP.
  • Now that we are done with the internal network, we are going to configure the external network; select 1 to configure the external interface

[Screenshot: external network configuration menu]

  • Enter the subnet mask information, then configure the ext-1 IP range, and finally the default gateway.
  • Now we are ready to configure the SmartConnect settings of the cluster. SmartConnect is a built-in load balancer; you can read more about it in the support.emc.com document titled "SmartConnect – Optimized Scale-out storage performance and availability". You can also get a lot of tips on configuring it in production by visiting this blog: http://www.jasemccarty.com/blog/

[Screenshot: SmartConnect configuration menu]

  • Select 1 to configure the zone name; it should match the NS record you created in DNS (the delegation). In this example I enter isilonpoc.local. Next, configure the SmartConnect Service IP.
  • Configure DNS Servers and Search Domains (separate multiple entries with a comma)
  • Configure TimeZone and Date/Time
  • Keep the Cluster Join mode at Manual
  • Commit the settings; at this point you should be able to connect to the SmartConnect zone name (isilonpoc.local) and log in to the cluster
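A quick sanity check that the delegation works: resolve the zone name a few times and you should see the external node IPs rotate, since round robin is SmartConnect's default balancing policy:

# Each lookup should return one of the external node IPs in turn.
nslookup isilonpoc.local
nslookup isilonpoc.local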

Now that we have our cluster going, let's join another node to it: connect to the console of another node and power it on.

  • You will be prompted to format the ifs partition for all drives; select yes
  • Now, in the wizard, select the option to join an existing cluster

[Screenshot: wizard prompt to join an existing cluster]

  • The Isilon node will automatically search the local subnet for existing clusters; if it finds one, it will present this option. Select your cluster's index # and hit Enter.

[Screenshot: list of discovered clusters]

  • At this point you should see the new node show up in the Isilon WebUI
  • When you log in to the WebUI you will see a couple of messages about "one or more drives are ready to be replaced". This is normal, since a virtual appliance does not have all the drives physical nodes have. SSH into the cluster and clear all alerts/events by running these two commands: "isi events cancel all" and "isi events quiet all"
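After clearing the events, a quick status check confirms that all nodes have joined and the cluster is healthy:

# Verify node membership and overall cluster health.
isi status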

Thank you @ChristopherImes for some pointers