VNX – Creating cascaded clones, sort of..

If you are familiar with VMAX TimeFinder/Clone you will easily recognize what this is: the TimeFinder/Clone cascaded clone feature. There are a lot of use cases that can take advantage of this feature; one example would be refreshing a dev or test environment from a nightly clone of production.

8-16-2014 8-02-15 PM

Great feature, but what if you need to do the same thing on a VNX? Unfortunately, the native SnapView Clone software will not allow you to create clone “C” because clone “B” is already the target of another clone session. If you try to create a clone session with B as your source (regardless of whether it is synchronized or fractured), you will get an error message similar to this:

8-16-2014 8-22-53 PM

So what do we do? Well, we reach out to our good ol’ friend SANCopy. Most people know and use SANCopy for array-to-array migrations, but not many know that you can also use it for intra-array copies. Typically intra-array copies are performed with SnapView Clone, but this is a special case: SANCopy will allow us to create a completely new session between “B” and “C”. Here is how:

Environment:

VNX5600

8-16-2014 8-36-13 PM

 

8-16-2014 8-39-07 PM

First step: we create a standard SnapView Clone session between LUN Oracle_A and Oracle_B and then fracture it.

8-16-2014 8-45-48 PM

Next step we are going to create a SANCopy session between LUN Oracle_B and Oracle_C.

8-16-2014 8-54-27 PM

Here is one caveat: as you can see from the screenshot below, we do not see LUN ID 12000, which should be our Oracle_C. We don’t see it because our source LUN (Oracle_B) is currently owned by SPB while our target (Oracle_C) is owned by SPA. So we need to trespass Oracle_C to match the SP owner of the source LUN, SPB in this case.

8-16-2014 9-08-35 PM
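If you prefer the CLI over Unisphere, the trespass can be done with naviseccli as well. A minimal sketch, assuming SPB answers at 10.210.6.20 (a made-up address; use your own SP) and that 12000 is the LUN ID of Oracle_C; the command trespasses the LUN to the SP you point it at:

naviseccli -h 10.210.6.20 trespass lun 12000

Refresh the destination list in Unisphere afterwards and Oracle_C should show up.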

Once you trespass the target LUN, you will see it in the destination list and you can go ahead and create the session. Once the session is created we need to start it: in Unisphere navigate to Storage > Data Migration > Sessions tab, select the session and hit Start.

8-16-2014 9-28-59 PM

Now if you select the SAN Copy LUNs tab you will see this. I know it looks as if we have two different sessions; don’t worry, it’s normal.

8-16-2014 9-32-58 PM

When the session status changes to Completed you can go ahead and delete the SANCopy session, or leave it in place; it will not impact your existing SnapView clone session between Oracle_A and Oracle_B, which you can freely synchronize and fracture again. Finally, if you had to trespass target LUNs, don’t forget to trespass them back after you delete the SANCopy session.

 


Cisco DCNM – How to schedule switch backups and more (Part 1)

Every system admin out there knows how essential it is to back up your systems on a regular basis, and SAN fabrics are no exception. You need to back up your SAN switch configuration regularly (I do it daily) to be able to recover from hardware failures or from human error. The traditional approach to setting up switch backups is to set up an external TFTP/SFTP/FTP server, write a script that executes the backup commands, and then use crontab/Task Scheduler to run it on a schedule. On Cisco MDS you could also take advantage of the internal scheduler to accomplish the same task. In all of these scenarios you are required to use a 3rd-party TFTP/SFTP/FTP server. Not anymore: starting with Cisco DCNM 6.2.x, a TFTP server comes built into the software. All you have to do is specify which server type you will use (TFTP versus SFTP, although for SFTP you still have to use a 3rd-party server) and use the built-in scheduler to back up all switches in the fabric. (For comparison, a minimal sketch of the traditional script-plus-cron approach appears at the end of this post.) Here are the steps:

  • Log in to the DCNM Web client; typically the address is the name of the server where you installed DCNM. Simply type the address in your browser and log in. The landing page will look something like this:

landing

  • The next step is to configure the server type: select the pull-down menu next to Admin and press SFTP/TFTP Credentials.

credentials

  • Highlight fabric/switches, select TFTP and press Verify and Apply

selecttftp

  • The next step is to schedule an automatic job that will back up the running configuration to the internal TFTP server. Select Config and then Jobs.

selectjobs

  • This next step is very important: you must select a SAN fabric, otherwise you will not be able to schedule a backup.

selectfabric

  • Now that the fabric is selected, click the green plus sign on the right and configure the date and time for the job to run.

createschedule

  • That’s it, the backup job has been scheduled. Now let’s verify that it completed successfully. Select Admin and then View.

verifyarchive

  • Select Groups > Eligible Switches and you should see backups for that particular switch. Select a backup file and hit View to see its content. One thing to note is that these backup files are stored inside the DCNM database; they are not flat files that you can find on the file system. In later posts we will go through the steps to modify and create custom configuration files.

viewarchives

  • That is it.
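For comparison, here is a minimal sketch of the traditional script-plus-cron approach mentioned at the beginning of this post. The switch name, TFTP server address and VRF are assumptions for illustration; copying the running config to a TFTP URL is a standard NX-OS command, but depending on your platform and NX-OS release it may still prompt for input (or need the vrf keyword adjusted or dropped), so test it interactively first. It also assumes password-less SSH from the backup host to the switch.

#!/bin/bash
# backup-mds.sh - copy the running config of one MDS switch to an external TFTP server
SWITCH=mds9148-01.mycompany.local      # hypothetical switch name
TFTP_SERVER=10.1.1.50                  # hypothetical TFTP server
TODAY=$(date +%F)
ssh admin@${SWITCH} "copy running-config tftp://${TFTP_SERVER}/${SWITCH}_${TODAY}.cfg vrf management"

And a crontab entry on the backup host to run it nightly at 1am:

0 1 * * * /usr/local/bin/backup-mds.sh >> /var/log/mds-backup.log 2>&1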

Using VNX Snapshots on Linux

My previous post was about using VNX Snapshots on Windows; now let’s see how to use this functionality on Linux, Red Hat specifically.

Configuration:

VNX 5700 – Block OE 05.32.000.5.206
Redhat 6.4 Enterprise Server
PowerPath – 5.7 SP 1 P 01 (build 6)
Snapcli – V3.32.0.0.6 – 1 (64 bits)
2 storage groups

stg-group1linux

stg-group2

As we can see from the screenshots above, I have two LUNs presented to the source server (LUN 353 and LUN 382). On the host the LUNs have been partitioned with fdisk, aligned, and a volume group has been created on top of them.

[root@source ~]# powermt display dev=all
Pseudo name=emcpowera
VNX ID=APM00112345678 [stg-group1]
Logical device ID=60060160131C420092A5A2B8ECF0E211 [LUN 353]
state=alive; policy=CLAROpt; queued-IOs=0
Owner: default=SP A, current=SP A       Array failover mode: 4
==============================================================================
--------------- Host ---------------   - Stor -  -- I/O Path --   -- Stats ---
###  HW Path               I/O Paths    Interf.  Mode     State   Q-IOs Errors
==============================================================================
   2 fnic                   sdp         SP B0    active   alive      0      0
   2 fnic                   sdn         SP B1    active   alive      0      0
   2 fnic                   sdl         SP A3    active   alive      0      0
   2 fnic                   sdj         SP A2    active   alive      0      0
   1 fnic                   sdh         SP B3    active   alive      0      0
   1 fnic                   sdf         SP B2    active   alive      0      0
   1 fnic                   sdd         SP A0    active   alive      0      0
   1 fnic                   sdb         SP A1    active   alive      0      0

Pseudo name=emcpowerb
VNX ID=APM00112345678 [stg-group1]
Logical device ID=60060160131C4200D263B55218F7E211 [LUN 382]
state=alive; policy=CLAROpt; queued-IOs=0
Owner: default=SP A, current=SP A       Array failover mode: 4
==============================================================================
--------------- Host ---------------   - Stor -  -- I/O Path --   -- Stats ---
###  HW Path               I/O Paths    Interf.  Mode     State   Q-IOs Errors
==============================================================================
   2 fnic                   sdq         SP B0    active   alive      0      0
   2 fnic                   sdo         SP B1    active   alive      0      0
   2 fnic                   sdm         SP A3    active   alive      0      0
   2 fnic                   sdk         SP A2    active   alive      0      0
   1 fnic                   sdi         SP B3    active   alive      0      0
   1 fnic                   sdg         SP B2    active   alive      0      0
   1 fnic                   sde         SP A0    active   alive      0      0
   1 fnic                   sdc         SP A1    active   alive      0      0

[root@source ~]# vgdisplay VG_VNX -v
    Using volume group(s) on command line
    Finding volume group "VG_VNX"
  --- Volume group ---
  VG Name               VG_VNX
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               19.99 GiB
  PE Size               4.00 MiB
  Total PE              5118
  Alloc PE / Size       4864 / 19.00 GiB
  Free  PE / Size       254 / 1016.00 MiB
  VG UUID               hbfpXb-q7mU-KMi0-nFxP-9lw2-bi8q-S0vJpY

  --- Logical volume ---
  LV Path                /dev/VG_VNX/vnx_lv
  LV Name                vnx_lv
  VG Name                VG_VNX
  LV UUID                hqEH3q-zchl-ZeRd-KcuA-3PLi-seMi-8PxWsR
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2013-07-28 14:19:05 -0400
  LV Status              available
  # open                 1
  LV Size                19.00 GiB
  Current LE             4864
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

  --- Physical volumes ---
  PV Name               /dev/emcpowera1     
  PV UUID               YAGYyX-tTpm-NFDC-2PRw-nntO-jRUs-Z54k00
  PV Status             allocatable
  Total PE / Free PE    2559 / 0

  PV Name               /dev/emcpowerb1     
  PV UUID               d2bjdo-qrw3-J00L-KkF6-I7zq-7qGo-3d1Bxm
  PV Status             allocatable
  Total PE / Free PE    2559 / 254

1) Step one is to create a consistency group. Because our volume group consists of two LUNs, both LUNs need to be snapped at the same time, and a consistency group allows us to do just that.

[root@management ~]# naviseccli -h 10.210.6.19 snap -group -create -name vnx_consistency_group -res 353,382

2) Next we need to create SMPs (Snapshot Mount Points) and present them to stg-group2. Think of an SMP as a placeholder device that the snapshot will be attached to; since we have two LUNs we need to create two SMPs.

[root@management ~]# naviseccli -h 10.210.6.19 lun -create -type Snap -primaryLunName "LUN 353" -name SMP_LUN_353 -allowInbandSnapAttach yes -sp A

[root@management ~]# naviseccli -h 10.210.6.19 lun -create -type Snap -primaryLunName "LUN 382" -name SMP_LUN_382 -allowInbandSnapAttach yes -sp A

3) Now let’s identify each SMP’s Snapshot Mount Point number and then add both to storage group “stg-group2”.

[root@management ~]# naviseccli -h 10.210.6.19  lun -list -l 353 -snapMountPoints
LOGICAL UNIT NUMBER 353
Name:  LUN 353
Snapshot Mount Points:  7533

[root@management ~]# naviseccli -h 10.210.6.19  lun -list -l 382 -snapMountPoints
LOGICAL UNIT NUMBER 382
Name:  LUN 382
Snapshot Mount Points:  7532

Since stg-group2 does not have any LUNs in it, we are going to start with HLU 0.
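If you want to double-check which HLUs are already in use before picking one, you can list the storage group first (for a brand-new group the HLU/ALU pairs section will simply be empty):

[root@management ~]# naviseccli -h 10.210.6.19 storagegroup -list -gname stg-group2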

[root@management ~]# naviseccli -h 10.210.6.19  storagegroup -addhlu -gname stg-group2 -alu 7533 -hlu 0

[root@management ~]# naviseccli -h 10.210.6.19  storagegroup -addhlu -gname stg-group2 -alu 7532 -hlu 1

4) Now that the SMPs are presented to the target host, let’s rescan the bus and see what happens. I am testing on Red Hat 6.4, so I am using these commands to rescan the bus:

[root@target ~]# ls -l /sys/class/scsi_host/

lrwxrwxrwx. 1 root root 0 Jul 28 13:04 host1 -> ../../devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:00.0/0000:04:00.0/0000:05:01.0/0000:07:00.0/host1/scsi_host/host1
lrwxrwxrwx. 1 root root 0 Jul 28 13:04 host2 -> ../../devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:00.0/0000:04:00.0/0000:05:02.0/0000:08:00.0/host2/scsi_host/host2

[root@target ~]# echo "- - -" > /sys/class/scsi_host/host1/scan
[root@target ~]# echo "- - -" > /sys/class/scsi_host/host2/scan

[root@target ~]# powermt check
[root@target ~]# powermt set policy=co dev=all
[root@target ~]# powermt save
[root@target ~]# powermt display dev=all
Pseudo name=emcpowera
VNX ID=APM00112345678 []
Logical device ID=60060160131C42004456510ACEF7E211 []
state=alive; policy=CLAROpt; queued-IOs=0
Owner: default=SP A, current=SP A       Array failover mode: 4
==============================================================================
--------------- Host ---------------   - Stor -  -- I/O Path --   -- Stats ---
###  HW Path               I/O Paths    Interf.  Mode     State   Q-IOs Errors
==============================================================================
   2 fnic                   sdq         SP B1    active   alive      0      0
   2 fnic                   sdp         SP B0    active   alive      0      0
   2 fnic                   sdo         SP A3    active   alive      0      0
   2 fnic                   sdn         SP A2    active   alive      0      0
   1 fnic                   sdm         SP B2    active   alive      0      0
   1 fnic                   sdl         SP B3    active   alive      0      0
   1 fnic                   sdk         SP A0    active   alive      0      0
   1 fnic                   sdj         SP A1    active   alive      0      0

Pseudo name=emcpowerb
VNX ID=APM00112345678 []
Logical device ID=60060160131C420006DFEB996EF6E211 []
state=alive; policy=CLAROpt; queued-IOs=0
Owner: default=SP A, current=SP A       Array failover mode: 4
==============================================================================
--------------- Host ---------------   - Stor -  -- I/O Path --   -- Stats ---
###  HW Path               I/O Paths    Interf.  Mode     State   Q-IOs Errors
==============================================================================
   2 fnic                   sdi         SP B1    active   alive      0      0
   2 fnic                   sdh         SP B0    active   alive      0      0
   2 fnic                   sdg         SP A3    active   alive      0      0
   2 fnic                   sdf         SP A2    active   alive      0      0
   1 fnic                   sde         SP B2    active   alive      0      0
   1 fnic                   sdd         SP B3    active   alive      0      0
   1 fnic                   sdc         SP A0    active   alive      0      0
   1 fnic                   sdb         SP A1    active   alive      0      0

5) We are ready to create snapshots. On the source server, flush memory to disk:

[root@source ~]# /usr/snapcli/snapcli flush -o /dev/emcpowera1,/dev/emcpowerb1
Flushed /dev/emcpowera1,/dev/emcpowerb1.

6) Create the snapshot using the consistency group; run this command on the source server. Notice how we specify each PowerPath device that is a member of the volume group.

[root@source ~]# /usr/snapcli/snapcli create -s vnx_snapshot -o /dev/emcpowera1,/dev/emcpowerb1 -c vnx_consistency_group
Attempting to create consistent snapshot vnx_snapshot.
Successfully created consistent snapshot vnx_snapshot.
 on object /dev/emcpowera1.
 on object /dev/emcpowerb1.

7) Attach the snapshot to the SMPs created earlier; run this command on the target server.

[root@target ~]# /usr/snapcli/snapcli attach -s vnx_snapshot -f
Scanning for new devices.
Attached snapshot vnx_snapshot on device /dev/emcpowerb.
Attached snapshot vnx_snapshot on device /dev/emcpowera.

8) When the snapshot gets attached, the volume group gets automatically imported. We can verify this by running this command on the target server:

[root@target ~]# vgdisplay -v VG_VNX
    Using volume group(s) on command line
    Finding volume group "VG_VNX"
  --- Volume group ---
  VG Name               VG_VNX
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               19.99 GiB
  PE Size               4.00 MiB
  Total PE              5118
  Alloc PE / Size       4864 / 19.00 GiB
  Free  PE / Size       254 / 1016.00 MiB
  VG UUID               hbfpXb-q7mU-KMi0-nFxP-9lw2-bi8q-S0vJpY

  --- Logical volume ---
  LV Path                /dev/VG_VNX/vnx_lv
  LV Name                vnx_lv
  VG Name                VG_VNX
  LV UUID                hqEH3q-zchl-ZeRd-KcuA-3PLi-seMi-8PxWsR
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2013-07-28 14:19:05 -0400
  LV Status              NOT available
  LV Size                19.00 GiB
  Current LE             4864
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto

  --- Physical volumes ---
  PV Name               /dev/emcpowerb1     
  PV UUID               YAGYyX-tTpm-NFDC-2PRw-nntO-jRUs-Z54k00
  PV Status             allocatable
  Total PE / Free PE    2559 / 0

  PV Name               /dev/emcpowera1     
  PV UUID               d2bjdo-qrw3-J00L-KkF6-I7zq-7qGo-3d1Bxm
  PV Status             allocatable
  Total PE / Free PE    2559 / 254

Notice how LV Status is “NOT available”; that means that while the volume group got imported, it still needs to be activated:

[root@target ~]# vgchange -a y VG_VNX
  1 logical volume(s) in volume group "VG_VNX" now active

Now if we repeat the vgdisplay command, LV Status will be “available”. At this point, if the mount point is in your fstab you can simply run “mount -a”; if not, you can manually mount the logical volume.
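For example, assuming you want to mount it at /mnt/vnx_snap on the target server (the path is just an illustration):

[root@target ~]# mkdir -p /mnt/vnx_snap
[root@target ~]# mount /dev/VG_VNX/vnx_lv /mnt/vnx_snap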

Another helpful step is to run “powermt check” to update the PowerPath configuration with the correct LUN information. If you look back at the “powermt display dev=all” output in step 4, you will notice that it did not display any storage group or LUN related information. But now, after we run “powermt check” on the target host followed by “powermt display dev=all”, we will see the storage group and LUN (SMP in this case) information populated.

[root@target ~]# powermt display dev=all
Pseudo name=emcpowera
VNX ID=APM00112345678 [stg-group2]
Logical device ID=60060160131C42004456510ACEF7E211 [SMP_LUN_382]
state=alive; policy=CLAROpt; queued-IOs=0
Owner: default=SP A, current=SP A       Array failover mode: 4
==============================================================================
--------------- Host ---------------   - Stor -  -- I/O Path --   -- Stats ---
###  HW Path               I/O Paths    Interf.  Mode     State   Q-IOs Errors
==============================================================================
   2 fnic                   sdq         SP B1    active   alive      0      0
   2 fnic                   sdp         SP B0    active   alive      0      0
   2 fnic                   sdo         SP A3    active   alive      0      0
   2 fnic                   sdn         SP A2    active   alive      0      0
   1 fnic                   sdm         SP B2    active   alive      0      0
   1 fnic                   sdl         SP B3    active   alive      0      0
   1 fnic                   sdk         SP A0    active   alive      0      0
   1 fnic                   sdj         SP A1    active   alive      0      0

Pseudo name=emcpowerb
VNX ID=APM00112345678 [stg-group2]
Logical device ID=60060160131C420006DFEB996EF6E211 [SMP_LUN_353]
state=alive; policy=CLAROpt; queued-IOs=0
Owner: default=SP A, current=SP A       Array failover mode: 4
==============================================================================
--------------- Host ---------------   - Stor -  -- I/O Path --   -- Stats ---
###  HW Path               I/O Paths    Interf.  Mode     State   Q-IOs Errors
==============================================================================
   2 fnic                   sdi         SP B1    active   alive      0      0
   2 fnic                   sdh         SP B0    active   alive      0      0
   2 fnic                   sdg         SP A3    active   alive      0      0
   2 fnic                   sdf         SP A2    active   alive      0      0
   1 fnic                   sde         SP B2    active   alive      0      0
   1 fnic                   sdd         SP B3    active   alive      0      0
   1 fnic                   sdc         SP A0    active   alive      0      0
   1 fnic                   sdb         SP A1    active   alive      0      0

9) After you are done with the snapshot, we are going to detach it from the target server. First we flush memory to disk:

[root@target ~]# /usr/snapcli/snapcli flush -o /dev/emcpowera1,/dev/emcpowerb1
Flushed /dev/emcpowera1,/dev/emcpowerb1.

Then unmount the file system, deactivate/export the volume group, and detach the snapshot.
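Assuming the logical volume was mounted at /mnt/vnx_snap as in the earlier example:

[root@target ~]# umount /mnt/vnx_snap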

[root@target ~]# vgchange -a n VG_VNX

[root@target ~]# vgexport VG_VNX
  Volume group "VG_VNX" successfully exported

[root@target ~]# /usr/snapcli/snapcli detach -s vnx_snapshot 
Detaching snapshot vnx_snapshot on device /dev/emcpowerb.
Detaching snapshot vnx_snapshot on device /dev/emcpowera.

10) Finally, we destroy the snapshot from the source server:

[root@source ~]# /usr/snapcli/snapcli destroy -s vnx_snapshot -o /dev/emcpowera1,/dev/emcpowerb1
Destroyed snapshot vnx_snapshot on object /dev/emcpowera1,/dev/emcpowerb1.

11) When you are ready to create snapshots again simply repeat steps 5-8.
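If you end up refreshing the snapshot on a schedule, the whole detach/destroy/create/attach cycle can be wrapped in a small script run from the target server. This is only a sketch of the flow described in this post, not a hardened tool: the host name, mount point and device list are assumptions, it relies on password-less SSH from the target to the source server, and it has no error handling (in this environment the PowerPath pseudo device names happen to match on both hosts).

#!/bin/bash
# refresh_vnx_snapshot.sh - run on the target server; repeats steps 9-10 and then 5-8
SOURCE=source.mycompany.local            # hypothetical source host name
SNAP=vnx_snapshot
CG=vnx_consistency_group
DEVS=/dev/emcpowera1,/dev/emcpowerb1     # PowerPath devices backing VG_VNX
MNT=/mnt/vnx_snap                        # hypothetical mount point

# steps 9-10: release the previous snapshot
/usr/snapcli/snapcli flush -o $DEVS
umount $MNT
vgchange -a n VG_VNX
vgexport VG_VNX
/usr/snapcli/snapcli detach -s $SNAP
ssh root@$SOURCE "/usr/snapcli/snapcli destroy -s $SNAP -o $DEVS"

# steps 5-6: take a new consistent snapshot on the source
ssh root@$SOURCE "/usr/snapcli/snapcli flush -o $DEVS"
ssh root@$SOURCE "/usr/snapcli/snapcli create -s $SNAP -o $DEVS -c $CG"

# steps 7-8: attach it here, activate the volume group and mount it
/usr/snapcli/snapcli attach -s $SNAP -f
# if the volume group stays in an exported state, a "vgimport VG_VNX" may be needed here
vgchange -a y VG_VNX
mount /dev/VG_VNX/vnx_lv $MNT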

Using VNX Snapshots on Windows

In Block OE version 32, EMC introduced new snapshot functionality called VNX Snapshots, a very slick technology that simplifies the process of creating snapshots for block storage. There are a lot of benefits to using VNX Snapshots versus legacy SnapView snapshots (no need for RLP, no COFW, etc.). You can read about it in the following documents:

h10858-vnx-snapshots-wp.pdf (VNX Snapshots white paper)

https://support.emc.com/docu45754_SnapCLI_for_VNX_Release_Notes_Version_3.32.0.0.5.pdf?language=en_US

https://support.emc.com/docu41553_VNX-Command-Line-Interface-Reference-for-Block.pdf?language=en_US

These papers provide a lot of good information about the technology and some examples. I found some of the examples kind of confusing, so I wanted to provide my own. The overall goal was to allow my customer to create/delete/mount snapshots without needing to issue naviseccli commands.

Configuration:

VNX 5700 – Block OE 05.32.000.5.206

2 Windows 2008 R2 servers

SnapCLI – V3.32.0.0.5 – 1 (32 bits)

Naviseccli – 7.32.25.1.63

2 storage groups

stg-group1

stg-group2

1) The first step is to install snapcli on both Windows servers; nothing special there, just next, next, Done.
2) Next we need to create an SMP (Snapshot Mount Point) and present it to stg-group2. Think of an SMP as a placeholder device that the snapshot will be attached to. On a management host where I have naviseccli installed, I run the following command. You want to specify the “allowInbandSnapAttach” option, as that will allow you to attach the snapshot on the target host using snapcli; otherwise you would have to use naviseccli to attach it. Since our goal is to have the customer only use snapcli, that is exactly what we need. LUN 353 is owned by SPA, hence “-sp A”.

C:\>naviseccli -address 10.210.6.19 lun -create -type Snap -primaryLunName "LUN 353" -name SMP_LUN_353 -allowInbandSnapAttach yes -sp A

Now let’s see what the SMP looks like and note the snapshot mount point number:

C:\>naviseccli -address 10.210.6.19 lun -list -l 353 -snapMountPoints
LOGICAL UNIT NUMBER 353
Name: LUN 353
Snapshot Mount Points: 7533

3) Now we need to add the SMP to storage group stg-group2. Note that the ALU is the snapshot mount point number, and since there are no LUNs in the storage group we are using HLU 0.

C:\>naviseccli -address 10.210.6.19 storagegroup -addhlu -gname stg-group2 -alu 7533 -hlu 0

If we look in Disk Management and rescan, we will see an “Unknown” disk in an offline state.

smpview

4) At this point we are ready to create the snapshot. On the source system we flush any data in memory to disk and then create the snapshot:

C:\>snapcli flush -o F:

C:\>snapcli create -s "Snapshot_LUN_353" -o F:
Attempting to create snapshot Snapshot_LUN_353 on device \\.\PhysicalDrive1.
Attempting to create the snapshot on the entire LUN.
Created snapshot Snapshot_LUN_353.

5) Now on the target system we are going to attach the snapshot:

C:\>snapcli attach -s "Snapshot_LUN_353" -f -d F:
Scanning for new devices.
User specified drive letter F:
Attached snapshot Snapshot_LUN_353 on device F:.

At this point if we look in Disk Management again the drive should be online and available

snapmounted
6) When you are done with the snapshot, we need to flush it and detach it; on the target server we run these commands:

C:\>snapcli flush -o F:
Flushed F:.

C:\>snapcli detach -s "Snapshot_LUN_353"
Detaching snapshot Snapshot_LUN_353 on device F:.

7) And finally, delete the snapshot; on the source server run:

C:\>snapcli destroy -s "Snapshot_LUN_353" -o F:
Destroyed snapshot Snapshot_LUN_353 on object F:.

Isilon – SyncIQ Loopback Replication

Our company migrated from Celerra to Isilon about two years ago. From time to time we receive requests from application people to clone the production environment to either QA or dev instances on the same Isilon cluster. Back in the Celerra/VNX days we used nas_copy, which allowed us to perform file-system-to-file-system copies, but since we migrated to Isilon we have been trying to figure out how to accomplish the same thing using Isilon utilities. Up to this point we had to rely on host-based tools such as emcopy or rsync, which is not very convenient considering you have to have a “proxy” server available to perform these copies. I was also fairly confident that using internal tools would be much more efficient and faster. After looking around the Isilon Google groups I found my solution. Here is my configuration:

Isilon Cluster – 6 x 108NL nodes, each node has 2 x 10G NICs and 2 x 1G NICs (LACP)

OneFS – 6.5.5.12

InsightIQ v2.5.0.0007

ESX 5.0 10G NIC

Ubuntu 11.10 VM, VMXNET3 NIC

I decided to test a couple of different scenarios to see which one would give me the best performance. Here is information from InsightIQ on the directory that I am using in all three scenarios. This directory contains data from a learning system, so there are a lot of tiny little files.

5-31-2013 3-51-40 PM

Scenario 1 – Using SyncIQ with loopback address of 127.0.0.1

I created my SyncIQ job and specified 127.0.0.1 as the target cluster IP address; here are the policy details:

isilon-6# isi sync policy list  -v
Id: a7388bd04b21ba61ce9597eb90c712ca
Spec:
Type: user
Name: testpolicy
Description:
Source paths:
Root Path: /ifs/data/dev1
Source node restriction:
Destination:
       Cluster: 127.0.0.1
Password is present: no
Path: /ifs/data/qa
Make snapshot: off
Restrict target by zone name: off
Force use of interface in pool: off
Predicate: None
Check integrity: yes
Skip source/target file hashing: no
Disable stf syncing: no
Log level: notice
Maximum failure errors: 1
Target content aware initial sync (diff_sync): no
Log removed files: no
Rotate report period (sec): 31536000
Max number of reports: 2000
Coordinator performance settings:
Workers per node: 3
Task: sync manually
State: on

I went ahead and started the job, but I was really curious which interface it was going to use to copy the data. I let the job run for about 15 minutes, and this is what I saw in InsightIQ (performance reporting section):

5-31-2013 3-35-44 PM

Very interesting: SyncIQ decided to use the 1G interfaces. I was also happy to see that the workload was distributed among all six nodes of the cluster. Even though the SyncIQ settings were left at defaults (workers, file operation rules), look what it did to my cluster CPU utilization; that is a pretty big spike.

5-31-2013 3-48-56 PM

I started the job at 3:15pm and it completed at 6:30pm, for a total of 3 hours 15 minutes; not bad at all for a full copy.

Scenario 2 – Using SyncIQ with SmartConnect zone name

In this test I wanted to see if performance would be any different if I used the SmartConnect zone name of my cluster that utilizes the 10G NICs. Before I ran this test I deleted the old policy and deleted the data from the /ifs/data/qa directory using the “treedelete” command; see the bottom of this post for instructions.

Here is my SyncIQ job that uses the local SmartConnect zone name:

isilon-6# isi sync policy list  -v
Id: 7b5dac0efe79543425720a8290aa58b4
Spec:
Type: user
Name: testpolicy
Description:
Source paths:
Root Path: /ifs/data/dev1
Source node restriction:
Destination:
            Cluster: isilon.mycompany.com
Password is present: no
Path: /ifs/data/qa
Make snapshot: off
Restrict target by zone name: off
Force use of interface in pool: off
Predicate: None
Check integrity: yes
Skip source/target file hashing: no
Disable stf syncing: no
Log level: notice
Maximum failure errors: 1
Target content aware initial sync (diff_sync): no
Log removed files: no
Rotate report period (sec): 31536000
Max number of reports: 2000
Coordinator performance settings:
Workers per node: 3
Task: sync manually
State: on

I started the job and let it run for 15 minutes; this is what I saw in InsightIQ this time:

5-31-2013 11-49-32 PM

This is what I expected to see: SyncIQ was using the 10G network interfaces. A quick look at CPU utilization showed the same picture as before; it is a very CPU-intensive process. I started the job around 11:30pm and it completed at 2:30am, so 3 hours for the full copy.

6-1-2013 12-09-27 AM

Scenario 3 – Using rsync on Ubuntu VM

In this scenario I wanted to test and document the performance of using rsync on a host acting as a “proxy” server. Basically, I exported the /ifs/data/dev1 and /ifs/data/qa directories over NFS and mounted them on my Ubuntu VM. While I wanted to simulate “multithreaded” performance by running multiple rsync instances, the directory layout was very convoluted and would not allow me to do that easily, so what I tested is just a single rsync command:

nohup rsync --delete --progress -avW /mnt/ifs/data/dev1/ /mnt/ifs/data/qa/ > log_`date +%F`_`date +%T` 2>&1 &
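For reference, if the directory layout had been flatter, one way to fan rsync out over the top-level subdirectories would be xargs with a parallelism limit. The paths and the -P 4 value are just an illustration, it assumes directory names without whitespace, and files sitting directly in the top level would still need one extra plain rsync pass:

cd /mnt/ifs/data/dev1
ls -d */ | xargs -P 4 -I{} rsync -aW --delete /mnt/ifs/data/dev1/{} /mnt/ifs/data/qa/{}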

Summary Table

Scenario   Description                        Full Copy    Incremental Copy
1          SyncIQ using 127.0.0.1 address     3:15 hours   29 seconds
2          SyncIQ using SmartConnect zone     3 hours      29 seconds
3          Using rsync                        36 hours     4.5 hours

These are pretty impressive numbers from SyncIQ; going forward we will be using this procedure to clone our production instances.

Deleting data from Isilon Cluster

The most efficient way that I found to delete data from the cluster is to use the treedelete job; here is the syntax:

isi job start treedelete --path=<ifs-directory> --priority=<priority> --policy=<policy>

For example:

isi job start treedelete --path=/ifs/data/qa

It’s actually pretty fast: deleting the data that I was using for this test (3.47 million files and 2.1 million directories) took only 20 minutes using the default settings.

Data Domain – manipulate the interface used for replication

When we deployed our second Data Domain, I wanted to start migrating some shares from the old DD880 to the new DD890. We have multiple virtual interfaces on the DD880 and DD890, and I wanted to make sure that the replication traffic used specific interfaces that had layer 2 connectivity between them. I have documented the steps for manipulating the replication interfaces below.

Configuration:

Source – DD880 – OS 5.1.1.0-291218

DNS Name – DD880PROD  – virtual interface name used for host backups
DNS Name – DD880PRIV – virtual interface I want to use for replication

DD880PROD# net show settings 
port         enabled   DHCP   IP address       netmask           type   additional setting   
----------   -------   ----   --------------   ---------------   ----   ---------------------
eth0a        yes       no     10.12.24.51      255.255.255.0     n/a                         
eth0b        yes       no     192.168.1.3      255.255.255.0     n/a                         
eth5a        yes       n/a    n/a              n/a               n/a    bonded to veth0      
eth5b        yes       n/a    n/a              n/a               n/a    bonded to veth0      
veth0        yes       no     0                255.255.252.0     n/a    failover: eth5a,eth5b
veth0.2694   yes       no     128.140.38.136   255.255.255.128   n/a    << --- Production (DNS Name DD880PROD)           
veth0.3442   yes       no     10.140.32.25     255.255.255.128   n/a    << -- Private      (DNS Name DD880PRIV)

Target – DD890 – OS 5.1.1.0-291218

DNS Name – DD890PROD  – virtual interface name used for host backups
DNS Name – DD890PRIV – virtual interface I want to use for replication

DD890PROD# net show settings
port         enabled   DHCP   IP address       netmask         type   additional setting             
----------   -------   ----   --------------   -------------   ----   -------------------------------
eth0a        no        n/a    n/a              n/a             n/a                                   
eth0b        no        n/a    n/a              n/a             n/a                                   
eth4a        yes       n/a    n/a              n/a             n/a    bonded to veth0                
eth4b        yes       n/a    n/a              n/a             n/a    bonded to veth0                
veth0        yes       no     0                255.255.255.0   n/a    lacp hash xor-L3L4: eth4a,eth4b
veth0.2997   yes       no     128.199.145.250  255.255.255.0   n/a    << --- Production (DNS Name DD890PROD)                            
veth0.3442   yes       no     10.140.32.26     255.255.255.128 n/a    << -- Private      (DNS Name DD890PRIV)

When you configure a Data Domain replication session you have to specify the host name assigned to the Data Domain; you can get it by typing “hostname” at the prompt (it is case sensitive).

  • Let’s go ahead and create the session first; run this command on both the source and the target DD:

replication add source dir://DD880PROD.mycompany.local/backup/oracle destination dir://DD890PROD.mycompany.local/backup/oracle
  • Verify the session got created, and also notice the Connection Host. This is my production interface, which I do not want to use for replication.

DD880PROD# replication show config
CTX   Source                                           Destination                                     Connection Host and Port                 Low-bw-optim   Enabled
---   ----------------------------------------------   ---------------------------------------------   --------------------------------------   ------------   -------
1     dir://DD880PROD.mycompany.local/backup/oracle    dir://DD890PROD.mycompany.local/backup/oracle   DD880PROD.mycompany.local   (default)     disabled       yes
  • Now let’s modify this session on each Data Domain so it uses the private interfaces.

On source (DD880)

replication modify rctx://1 connection-host DD890PRIV.mycompany.local

On target (DD890)

replication modify rctx://1 connection-host DD880PRIV.mycompany.local
  • Let’s see what it looks like (on the source); notice it is now displaying the correct Connection Host name.

DD880PROD# replication show config
CTX   Source                                           Destination                                     Connection Host and Port                 Low-bw-optim   Enabled
---   ----------------------------------------------   ---------------------------------------------   --------------------------------------   ------------   -------
1     dir://DD880PROD.mycompany.local/backup/oracle    dir://DD890PROD.mycompany.local/backup/oracle   DD880PRIV.mycompany.local   (default)     disabled       yes
  • At this point we are ready to start the replication session. If you have issues starting it, make sure you don’t need to add any static routes.

replication initialize dir://DD890PROD.mycompany.local/backup/oracle
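Once the initialization kicks off you can keep an eye on the context from either box. I believe plain “replication status” is available across this DD OS family (check the command reference for your release for per-context variants):

DD880PROD# replication status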

Configuring DDBoost for RMAN and Oracle 10g

DDBoost for RMAN is an amazing product. We ran a couple of test backups and here are our results:

Server – HP Proliant G7, 1G NIC
Operating System – RedHat 4.7
Oracle 10g database, 1TB in size
Data Domain 890, 10G interface configured in LACP,  OS 5.1.1.0-291218
DDBoost – RMAN 1.0.1.4 for Linux

Test 1:

RMAN backup to Data Domain nfs export ~= 5 hours

Test 2:

DDBoost RMAN backup to Data Domain ~=36 minutes !!!

The process of configuring the Data Domain and Oracle is very straightforward; you can also reference the official EMC document titled “EMC Data Domain Boost for Oracle Recovery Manager Administration Guide”.

  • Enable the DD Boost feature (this assumes you already own and have installed the license):

# ddboost enable

  • Configure the user account that will be used for DD Boost. This will be the only account that can utilize DD Boost, and it will have access to all storage units:

# user add <username> password <password>

# ddboost set user-name <username>

  • Configure a storage unit; it’s a logical container that you are going to send your backups to. When you create a storage unit, it automatically creates an mtree, which will allow you to track storage unit utilization at a later time. When I talked to Data Domain support, they recommended not exceeding 16 mtrees per Data Domain unit, as having more could have a negative impact on performance.

# ddboost storage-unit create <storageunitname>
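If you want to see how many mtrees already exist before adding more (per the 16-mtree guidance above), you can list them; I believe mtree list is available in this DD OS release:

# mtree list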

  • Now we need to grant our Oracle host access to the storage units. As of DD OS 5.1 I don’t see a way to grant access to a specific storage unit; it’s global access.

# ddboost access add clients <hostname>

We are done with the Data Domain configuration; now let’s configure our Oracle host.

  • Download the RMAN plug-in from the Data Domain website and install it on the database server; it’s just a couple of binary files in a tar file for *nix platforms.
  • The next step is to register the storage unit name, ddboost username and password. I don’t know exactly what happens here, but I know it’s not talking to the Data Domain, because you can enter an incorrect username or password and it will still complete successfully.

RUN
{
CONFIGURE DEFAULT DEVICE TYPE TO SBT_TAPE; # default
CONFIGURE DEVICE TYPE SBT_TAPE Backup TYPE to BACKUPSET;
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE'
PARMS 'SBT_LIBRARY=/u01/app/oracle/product/10.2.0/dbhome/lib/libddobk.so, ENV=(STORAGE_UNIT=storageunitname,BACKUP_HOST=datadomain890.mycompany.local,ORACLE_HOME=/u01/app/oracle/product/10.2.0/dbhome)';

ALLOCATE CHANNEL c1 TYPE SBT_TAPE
PARMS 'SBT_LIBRARY=/u01/app/oracle/product/10.2.0/dbhome/lib/libddobk.so, ENV=(STORAGE_UNIT=storageunitname,BACKUP_HOST=datadomain890.mycompany.local,ORACLE_HOME=/u01/app/oracle/product/10.2.0/dbhome)';
send 'set username ddboostusername password XXXX servername datadomain890.mycompany.local';

RELEASE CHANNEL c1;
}

  • Let’s run a backup; in this example we are using 6 channels. In our environment this gave us the most speed before we started seeing high spikes in CPU utilization.

RUN {
CONFIGURE DEFAULT DEVICE TYPE TO SBT_TAPE; # default
CONFIGURE DEVICE TYPE SBT_TAPE Backup TYPE to BACKUPSET;
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE'
PARMS 'SBT_LIBRARY=${ORACLE_HOME}/lib/libddobk.so, ENV=(STORAGE_UNIT=storageunitname,BACKUP_HOST=datadomain890.mycompany.local,ORACLE_HOME=${ORACLE_HOME})';

ALLOCATE CHANNEL c1 TYPE SBT_TAPE
PARMS 'SBT_LIBRARY=${ORACLE_HOME}/lib/libddobk.so, ENV=(STORAGE_UNIT=storageunitname,BACKUP_HOST=datadomain890.mycompany.local,ORACLE_HOME=${ORACLE_HOME})';

ALLOCATE CHANNEL c2 TYPE SBT_TAPE
PARMS 'SBT_LIBRARY=${ORACLE_HOME}/lib/libddobk.so, ENV=(STORAGE_UNIT=storageunitname,BACKUP_HOST=datadomain890.mycompany.local,ORACLE_HOME=${ORACLE_HOME})';

ALLOCATE CHANNEL c3 TYPE SBT_TAPE
PARMS 'SBT_LIBRARY=${ORACLE_HOME}/lib/libddobk.so, ENV=(STORAGE_UNIT=storageunitname,BACKUP_HOST=datadomain890.mycompany.local,ORACLE_HOME=${ORACLE_HOME})';

ALLOCATE CHANNEL c4 TYPE SBT_TAPE
PARMS 'SBT_LIBRARY=${ORACLE_HOME}/lib/libddobk.so, ENV=(STORAGE_UNIT=storageunitname,BACKUP_HOST=datadomain890.mycompany.local,ORACLE_HOME=${ORACLE_HOME})';

ALLOCATE CHANNEL c5 TYPE SBT_TAPE
PARMS 'SBT_LIBRARY=${ORACLE_HOME}/lib/libddobk.so, ENV=(STORAGE_UNIT=storageunitname,BACKUP_HOST=datadomain890.mycompany.local,ORACLE_HOME=${ORACLE_HOME})';

ALLOCATE CHANNEL c6 TYPE SBT_TAPE
PARMS 'SBT_LIBRARY=${ORACLE_HOME}/lib/libddobk.so, ENV=(STORAGE_UNIT=storageunitname,BACKUP_HOST=datadomain890.mycompany.local,ORACLE_HOME=${ORACLE_HOME})';

backup filesperset 1 blocks all database format '%d_DATABASE_%T_%t_s%s_p%p' tag '${ORACLE_DB} database backup';
backup current controlfile format '%d_controlfile_%T_%t_s%s_p%p' tag '${ORACLE_DB} Controlfile backup';
backup spfile format '%d_spfile_%T_%t_s%s_p%p' tag '${ORACLE_DB} Spfile backup';

release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
}

  • While this backup is running, if you want to monitor its performance from the Data Domain side, you can use this command:

# ddboost show stats interval 5 count 100

  • If you want to look at storage-unit statistics (# of files, compress/dedupe rate)

# ddboost storage-unit show compression

A couple of issues that you may encounter with your implementation:

  • We got the error message below when we specified an incorrect password during the DDBoost RMAN client registration (the first RMAN script above):

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c6 channel at 02/11/2013 17:59:10
ORA-19506: failed to create sequential file, name="DB3_DATABASE_20130211_p1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
sbtbackup: dd_rman_connect_to_backup_host failed

  • We got this error message when we exported the storage unit via NFS and changed the ownership of some files:

RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c3 channel at 02/11/2013 14:55:31
ORA-19506: failed to create sequential file, name="DB3_DATABASE_20130211_p1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
sbtbackup: Could not create file DB3_DATABASE_20130211_p1 on host datadomain890.mycompany.local, error 5034

Deploying virtual Isilon on ESXi

I’ve documented the steps for deploying the virtual Isilon appliance on the ESXi platform. I believe at the moment only existing EMC customers can get their hands on this appliance, and the Isilon virtual appliance is to be used for testing purposes only. The appliance comes pre-built for VMware Workstation/Player, which is nice, but I wanted to deploy it on my ESXi server (free edition).

Current ESX server configuration:

Two vSwitches: vSwitch0 is my public network and vSwitch1 is my private network (I used the 192.168.1.0/24 subnet; this will be used by the Isilon cluster for intra-cluster connectivity, which on real hardware runs over InfiniBand switches).

DNS records:

You don’t have to create A records for the internal interfaces; I am listing them here for documentation purposes only.

isilonintpoc1.local – 192.168.1.50
isilonintpoc2.local – 192.168.1.51
isilonintpoc3.local – 192.168.1.52

A record for each node of the cluster (external interfaces)

isilonpoc1.local – 10.144.4.11
isilonpoc2.local – 10.144.4.12
isilonpoc3.local – 10.144.4.13

A record for SmartConnect Service IP

isilonpoc0.local – 10.144.4.10

NS record for the SmartConnect zone name; this record should point to the A record of the SmartConnect Service IP:

isilonpoc.local -> isilonpoc0.local
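For reference, the delegation in a BIND-style parent zone looks roughly like the commented lines below (an assumption; adjust for whatever DNS server you run), and the dig check can be run from any Linux client:

# isilonpoc.local.    IN NS    isilonpoc0.local.
# isilonpoc0.local.   IN A     10.144.4.10
dig +short NS isilonpoc.local     # should return isilonpoc0.local.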

Let’s get started …

  • Extract the contents of the zip file to your hard drive
  • Download and install VMware vCenter Converter Standalone
  • Open Converter and select “Convert Machine”. For the source type select “VMware Workstation or other VMware virtual machine”. For “Virtual machine file” browse to the extracted folder and select the .vmx file that is in the root of the directory.

vmxselect

  • Enter the ESX server information and hit Next. Enter the node name that you want to assign to this VM and select a datastore; I left the virtual machine version at 7.
  • The only thing that I modify on the next page is Networking: I change NIC1 to vSwitch1 (private) and NIC2 to vSwitch0 (public). It will take 5-10 minutes to convert the appliance. My virtual cluster will have three Isilon nodes, so I repeat the same steps and convert two more nodes.
  • Let’s set up our first node: connect to ESX, open the console of the first node and power it on. You will be prompted to format the ifs partition for all drives; select yes.
  • Next we get to the wizard that will walk us through configuring the cluster. (Note: if you make a typo and need to go back, simply type “back” at the prompt.)

2-11-2013 11-15-22 AM

  • Since this is the first node of our cluster, we select 1 to create a new cluster, create passwords for the root and admin accounts, and select No for enabling SupportIQ (this is a virtual appliance, so we don’t need to enable email-home support). Enter the cluster name and press Enter for the default encoding (utf-8).
  • Next we are going to configure the intra-cluster network settings; this is the network that the Isilon nodes use to communicate with each other. I am using my private network (vSwitch1, 192.168.1.0/24).

2-11-2013 11-15-39 AM

  • Select 1 to configure the netmask, then select 3 to configure the intra-cluster IP range. On my private network I will use the range 192.168.1.50-53, where 192.168.1.50 is my low IP and 192.168.1.53 is my high IP.
  • Now that we are done with the internal network, we are going to configure the external network; select 1 to configure the external interface.

2-11-2013 11-15-49 AM

  • Enter the subnet mask information and then configure the ext-1 IP range. Next, configure the default gateway.
  • Now we are ready to configure the SmartConnect settings of the cluster. SmartConnect is a built-in load balancer; you can read more about it in the support.emc.com document titled “SmartConnect – Optimized Scale-out storage performance and availability”. You can also get a lot of tips on configuring it in production by visiting this blog: http://www.jasemccarty.com/blog/

2-11-2013 11-16-02 AM

  • Select 1 to configure the zone name; it should match the NS record (delegation) you created in DNS. In this example I will enter isilonpoc.local. Next, configure the SmartConnect service IP.
  • Configure the DNS servers and search domains (separate multiple entries with a comma).
  • Configure the time zone and date/time.
  • Keep the cluster join mode at Manual.
  • Commit the settings; at this point you should be able to connect to the SmartConnect zone name (isilonpoc.local) and log in to the cluster.
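A quick way to sanity-check SmartConnect from any client: repeated lookups of the zone name should rotate through the node IPs in the ext-1 pool (10.144.4.11-13 in this build), since round robin is the default balancing policy:

for i in 1 2 3; do dig +short isilonpoc.local; done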

Now that we have our cluster going, let’s join another node to it: connect to the console of another node and power it on.

  • You will be prompted to format the ifs partition for all drives; select yes.
  • Now at the wizard select Join an existing cluster.

2-11-2013 11-16-12 AM

  • The Isilon node will automatically search the local subnet for existing clusters; if it finds any, it will present them as options. Select your cluster index # and hit Enter.

2-11-2013 11-16-21 AM

  • At this point you should see the new node show up in the Isilon WebUI.
  • When you log in to the WebUI you will see a couple of messages about “one or more drives are ready to be replaced”. This is normal, since this is a virtual appliance and does not have all the drives that physical nodes have. SSH into the cluster and clear all alerts/events by running these two commands: “isi events cancel all” and “isi events quiet all”.

Thank you @ChristopherImes for some pointers

Welcome!

Welcome! Content is coming soon. Don’t forget to follow this blog to receive notifications of my latest posts.

 

~Dynamox