Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1003370.1
Update Date: 2009-01-25
Keywords:

Solution Type: Technical Instruction

Solution  1003370.1 :   Sun StorEdge[TM] 3510:Tech Tip: Fail-over Functionality Testing  


Related Items
  • Sun Storage 3510 FC Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
204691


Description
This procedure is designed to be used when the functionality of se3510 fail-over is in doubt. It can also be used during the initial installation of the product.
Assumptions:
  • The se3510 is a dual-controller array.

  • The customer is using Leadville (the Solaris FC driver stack) and STMS (MPxIO), and both are correctly installed.

  • All switches, host HBAs, and cables are presumed to be good.

  • Any StorADE probing is disabled for the duration of the procedure, unless otherwise noted.

  • This procedure is followed during a maintenance window.

  • This procedure assumes all LDs are mapped to PIDs and SIDs on channels 0 and 5. The fibre cables are attached to upper channel 0 and lower channel 5.

  • Luxadm commands will take much longer than normal while the array is under load.

  • Never try cable-pull testing on drive channels!

NOTE:  It is possible that you have a system using 4-8 3510 host connections.  If you are knowledgeable about the architecture, you can modify this procedure to test the extra channels. Make sure you test each connection by itself for 2-5 minutes under the iostat load. You can also insert additional testing.  For instance, at step one you could verify the array is discovered by ssconsole, then close and reopen the console during each single-connection test, ensuring array discovery through each controller on each channel under load.



Steps to Follow
Sun StorEdge[TM] 3510:Tech Tip: Fail-over Functionality Testing
  1. Verify that all the expected mpxio paths are online using luxadm display <raw path to LUN slice 2>; you can determine the device paths using format.

    luxadm display /dev/rdsk/c8t600C0FF00000000000136C6215C0B500d0s2
    
                                     ^ use the raw path!
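
    If the array presents many LUNs, checking each path state by eye is tedious. As a rough sketch (the function name is my own, and the ONLINE keyword is an assumption based on typical Leadville/MPxIO luxadm output; verify it against your own system), a small helper can count ONLINE paths in saved luxadm output:

    ```shell
    #!/bin/sh
    # count_online FILE
    # Count the ONLINE path-state lines in saved `luxadm display` output.
    # "ONLINE" is assumed to appear once per working path; confirm against
    # the luxadm output on your host before relying on the count.
    count_online() {
        grep -c 'ONLINE' "$1"
    }

    # Hypothetical usage (the device name is an example only):
    #   luxadm display /dev/rdsk/c8t600C0FF00000000000136C6215C0B500d0s2 \
    #       > /tmp/paths.out
    #   count_online /tmp/paths.out
    ```

    Run it once before any failure injection to record the baseline path count, then compare after each fail/pull step.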
    
  2. In a window, start # tail -f /var/adm/messages

  3. In another window, start # iostat -xnMcz 5

  4. In another window, start an in-band sccli session with the attached array.

  5. In another window, generate some I/O load on all of the LUNs presented by the array using the raw mpxio paths. Use one dd read instance per LUN on the slice 2 path. Wait for the iostat output to stabilize (about 1 minute).

    # dd if=/dev/rdsk/c8t600C0FF00000000000136C6215C0B500d0s2 of=/dev/null bs=32k &
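
    Starting one dd per LUN by hand is error-prone when the array presents many LUNs. A minimal sketch (the function name is my own; substitute the actual mpxio device names reported by format for the example glob):

    ```shell
    #!/bin/sh
    # start_load PATH...
    # Start one background dd reader per device path given as an argument.
    # Pass the raw slice-2 mpxio paths reported by format.
    start_load() {
        for lun in "$@"; do
            dd if="$lun" of=/dev/null bs=32k &
        done
    }

    # Hypothetical usage (the glob is an example only):
    #   start_load /dev/rdsk/c8t600C0FF0*d0s2
    #   jobs    # one dd per LUN should now be running
    ```

    Keeping the readers in a single shell also makes the later cleanup (# pkill dd) easy to confirm with jobs.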
    
  6. From the sccli prompt, fail the primary controller. Make a note of which controller has the blinking green LED, as it is the present primary controller.

    sccli> fail primary
    

    The activity may slow down or even seem to stop in iostat, but it should come back within 2 minutes.  Messages may appear in the messages window, but keep an eye on iostat.  The most important thing is that the I/O continues. Give the system 2 minutes to stabilize.

    #luxadm display /dev/rdsk/c8t600C0FF00000000000136C6215C0B500d0s2
    

    This should show all paths online.

    sccli>show inq
    

    (this ensures that the secondary controller can service sccli)

    At this point you have forced a take-over by the default secondary.  You could establish a telnet session to ensure this controller's network port is working.  (They should both be plugged into the network).

  7. This step is a cable pull; only one path will be taken OFFLINE in this step.

    Pull the upper channel 0 cable.

    Watch /var/adm/messages for the offline or degraded path messages.

    #luxadm display /dev/rdsk/c8t600C0FF00000000000136C6215C0B500d0s2

    The number of paths should be reduced by half.  Logically, if the number of paths has been reduced to one and iostat continues to show activity, you do not need to inspect the paths to know the remaining one is working.

    Again, you should see some messages in /var/adm/messages. Iostat may show another slowdown or perceived stop.  Iostat should stabilize within 2 minutes, and bandwidth may drop by as much as half.  Connectivity is the key. Now plug upper channel 0 back in to ONLINE the path.

    Wait 2 minutes for the system to stabilize.

    Look for the path ONLINE messages in /var/adm/messages.

    #luxadm display /dev/rdsk/c8t600C0FF00000000000136C6215C0B500d0s2
    

    The number of paths should be back to normal.  Iostat bandwidth, if it was affected, should be back to normal.
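
    The /var/adm/messages entries you are scanning for can be filtered rather than read in full. A sketch (the function name is my own, and the offline/online/degraded strings are assumptions based on typical multipath log text; compare them with what your tail -f window actually prints):

    ```shell
    #!/bin/sh
    # path_events FILE
    # Print multipath state-change lines from the given log file.
    # The matched keywords are assumed, not authoritative; adjust them to
    # the exact wording your driver stack logs.
    path_events() {
        grep -iE 'offline|online|degraded' "$1"
    }

    # Hypothetical usage:
    #   path_events /var/adm/messages | tail -20
    ```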

  8. Repeat step 7 for lower channel 5.

    Pull the lower channel 5 cable.

    #luxadm display /dev/rdsk/c8t600C0FF00000000000136C6215C0B500d0s2
    

    The number of paths should be reduced by half.

    Again, you should see some messages in /var/adm/messages.  Iostat may show another slowdown or perceived stop and should stabilize within 2 minutes.  Bandwidth may be affected by as much as half.  Connectivity is the key. Now plug lower channel 5 back in to ONLINE the path.

    Wait 2 minutes for the system to stabilize.

    #luxadm display /dev/rdsk/c8t600C0FF00000000000136C6215C0B500d0s2
    

    The number of paths should be back to normal.  Iostat bandwidth, if it was affected, should be back to normal.

  9. Deassert (restore) the controller failed in step 6.

    sccli> unfail primary

    Wait 5 minutes for the system to stabilize.

    You may see some activity in /var/adm/messages.  Iostat should be mostly unaffected. If you are watching the array LEDs, the unfailed controller will be yellow during POST.  Then both will go yellow for a second, and finally the unfailed one will go solid green, showing it is now the secondary.

  10. The original secondary controller is now primary, and this procedure has verified its base functionality. We will now fail the present primary, making the original primary "primary" again, and then run all the same tests on the original primary.  To accomplish this, repeat steps 6 through 9 verbatim.  When you have run step 9 again, the se3510 will be in its original state and the procedure is complete.

    You can now

    #pkill dd
    

    Quit out of iostat and tail, and power cycle the se3510.



Product
Sun StorageTek 3510 FC Array

Keywords: se3510, 3510, sccli, failover, fail, over
Previously Published As
73551

Change History
Date: 2005-03-16
User Name: 7058
Action: Approved
Comment: Reformatted entire document so that all the unnecessary whitespace between lines went away.
OK to republish.
Review date updated.
Version: 3
Date: 2005-03-15
User Name: 7058
Action: Accept
Comment:
Version: 0
Date: 2005-03-15

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.