Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1009274.1
Update Date: 2011-05-13
Keywords:

Solution Type  Technical Instruction

Solution 1009274.1: Sun StorEdge[TM] 3510 FC Array: Understanding The Internal Loop Architecture During Controller Failover


Related Items
  • Sun Storage 3510 FC Array
  • Sun Storage 3310 Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
212837


Applies to:

Sun Storage 3310 Array
Sun Storage 3510 FC Array
All Platforms

Goal

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Storage Disk 3000 Series RAID Arrays

Description

An understanding of the backend loop architecture of these arrays is necessary to follow controller failover and controller replacement.

Solution

Steps to Follow

To understand controller failover in a dual-controller configuration, there are several things to keep in mind:

 

1.  The controller module and I/O module (IOM) are integrated onto a single FRU commonly known as the controller.  (For this discussion we will refer to them separately.)

a.  When you plug into a channel you are actually connecting to an FC-AL loop through the IOM and not directly to the controller.

b. When the controller within an IOM fails, that IOM maintains connectivity to the alternate controller, preserving two potential active paths between the array and the host (unless, of course, the IOM itself fails).

c. Normally, both IOMs are on the same internal FC-AL loop as both controllers at all times, as described below.

2.  The I/O modules interconnect to a mini-hub via the midplane.

a.  Each host channel of the top IOM shares a loop with the matching host channel on the bottom IOM.

b. For example, upper ch0 and lower ch0 are connected to the same FC-AL loop.  This is true for host channels 0, 1, 4, and 5.

c. Channel 0 of each controller connects to a port bypass circuit (PBC), and the two PBCs are then connected to each other through the midplane.  Each of the PBCs connected to the RAID controllers is in turn connected to a second PBC that leads to the SFP port.  Channels 1, 4, and 5 follow the same architecture.

3.  Drive channels 2 and 3 (L and R ports) are connected to the internal dual-ported FC disk drives.

a.  The L and R ports of the upper IOM are both connected to drive loop a.

b.  The L and R ports of the lower IOM are both connected to drive loop b.

4.  Each controller is a member of all 6 channel loops.

The following sccli output displays the internal drive loop maps for channels 2 and 3.

sccli> show loop-map channel 2
15 devices found in loop map
=== Channel Loop Map retrieved from CH 2 ID 12 ===
AL_PA   SEL_ID  SEL_ID  TYPE    ENCL_ID SLOT
(hex)   (hex)   (dec)
-----   -----   -----   ----    ------  ----
CE      0F      15      RAID    N/A     N/A
D4      0B      11      DISK    0       11
DC      06      6       DISK    0       6
D5      0A      10      DISK    0       10
DA      07      7       DISK    0       7
D3      0C      12      SES     0       N/A
D1      0E      14      RAID    N/A     N/A
E8      01      1       DISK    0       1
E1      04      4       DISK    0       4
E4      02      2       DISK    0       2
E2      03      3       DISK    0       3
E0      05      5       DISK    0       5
EF      00      0       DISK    0       0
D9      08      8       DISK    0       8
D6      09      9       DISK    0       9
sccli> show loop-map channel 3
15 devices found in loop map
=== Channel Loop Map retrieved from CH 3 ID 12 ===
AL_PA   SEL_ID  SEL_ID  TYPE    ENCL_ID SLOT
(hex)   (hex)   (dec)
-----   -----   -----   ----    ------  ----
CE      0F      15      RAID    N/A     N/A
E8      01      1       DISK    0       1
E1      04      4       DISK    0       4
E4      02      2       DISK    0       2
E2      03      3       DISK    0       3
E0      05      5       DISK    0       5
EF      00      0       DISK    0       0
D9      08      8       DISK    0       8
D6      09      9       DISK    0       9
D1      0E      14      RAID    N/A     N/A
D4      0B      11      DISK    0       11
DC      06      6       DISK    0       6
D5      0A      10      DISK    0       10
DA      07      7       DISK    0       7
D3      0C      12      SES     0       N/A
sccli>

Both RAID controllers and the drives are on both channel loops 2 and 3.  On each loop you will see only one SES device, since you communicate with each loop through the single SES device on each IOM.
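
To see the SES device that each IOM presents, sccli can list them directly; a minimal sketch, not part of the original capture (output layout varies with firmware):

sccli> show ses-devices     <-- lists the SES device in each I/O module, one per drive loop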

Note that the host channels are not on the same loop.  The host channels access the drives through the controller.

If one controller fails in a redundant-controller configuration, the surviving controller takes over for the failed controller until it is replaced, managing all processes on its behalf.  The surviving controller is always the primary controller regardless of its original status, and any replacement controller installed afterwards becomes the secondary controller.  A subsequent reset of the array may cause the controller with the highest serial number to become the primary controller.
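
To see which controller currently holds the primary role, the controller redundancy state can be queried from sccli; a minimal sketch, not captured in this session (field names vary slightly between firmware and CLI versions):

sccli> show redundancy-mode     <-- reports the redundancy mode and status along with the primary and secondary controller serial numbers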

Failover and failback processes are transparent to the host since you are not connecting directly to the controller itself but rather participating in a loop via the I/O module.

To demonstrate this, our test case has the following configuration:

  • A dual-controller 3510 in loop-only mode.
  • Direct-attached via a dual-port QLogic HBA to a host.
  • One cable to primary channel 0.
  • One cable to secondary channel 0.
  • One logical drive mapped to primary channel 0.

For this example, no multipathing software such as DMP or MPxIO is in use, to demonstrate that when a controller fails, the controller failover occurs without any actual host path failover.

In practice you would need to set up multipathing software to provide high availability in cases where other components fail, such as the IOM, SFPs, cables, switches, HBAs, and so forth.
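
For reference only (this was deliberately not configured for this test), enabling Sun's native MPxIO multipathing on a Solaris host looks roughly like the following; the stmsboot method applies to Solaris 10, while Solaris 8/9 SAN Foundation hosts instead set mpxio-disable="no" in the driver configuration file and perform a reconfiguration reboot:

# stmsboot -e                      <-- Solaris 10: enable MPxIO on the FC HBA ports (prompts for a reboot)
# vi /kernel/drv/scsi_vhci.conf    <-- Solaris 8/9 SAN Foundation: set mpxio-disable="no"
# reboot -- -r                     <-- reconfiguration reboot so the multipathed device nodes are built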

Configuring a high-availability DAS setup in this way is not in accordance with best practices. However, it lets us demonstrate that the upper and lower channel 0 ports are both on the same FC-AL loop, since we map the logical drive (ld) only once, on primary channel 0.



Refer to the Sun[TM] StorEdge 3000 Family Best Practices Manual, 816-7325, for best practices.
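
For context, a single LUN mapping like the one shown below is created with the sccli map command; the syntax here is a sketch from memory and should be verified against the sccli man page for your CLI version (ld0-00 denotes partition 0 of ld0, and 0.40.0 is channel 0, target 40, LUN 0):

sccli> map ld0-00 0.40.0     <-- map partition 0 of ld0 to primary channel 0, PID 40, LUN 0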

sccli> show lun-maps
Ch  Tgt   LUN   ld/lv      ID-Partition         Assigned  Filter Map
--------------------------------------------------------------
0    40    0      ld0        7F9D1116-00      Primary

Note from the show port-wwns output below that ch0 PID=40 and ch0 SID=41.  These will show up in the format output as the target ID.

sccli> show port-wwns
Ch  Id  WWPN
-------------------------
 0  40  216000C0FF804B07   <-- this is the port WWN of ch 0, top IOM, LUN mapped here
 0  41  216000C0FF904B07   <-- this is the port WWN of ch 0, bottom IOM
 1  36  226000C0FF404B07
 4  44  256000C0FFC04B07
 5  46  266000C0FFE04B07
sccli> show logical-drives
LD    LD-ID         Size       Assigned        Type      Disks Spare  Failed  Status
-----------------------------------------------------------------------------
ld0   7F9D1116   136.23GB     Primary         RAID5      3        2         0       Good
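
The PID and SID assignments noted above can also be confirmed directly from sccli; a minimal sketch, not part of the original capture:

sccli> show channels     <-- lists each channel's mode (host or drive) together with its PID and SID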

In the output below, note that both HBAs ("Unknown Type") and both 3510 disk entries show up in each dump_map for the two HBA instances.  This is because the upper and lower array host channels share the same loop, and therefore both attached HBAs are also on the same FC-AL loop.  Also note that the Port WWN of each disk corresponds to the WWPN from sccli show port-wwns.

# luxadm -e dump_map /devices/pci@1f,0/pci@1/SUNW,qlc@1/fp@0,0:devctl
Pos  AL_PA  ID  Hard_Addr  Port WWN          Node WWN          Type
0    1      7d  0          210000e08b0794b5  200000e08b0794b5  0x1f (Unknown Type,Host Bus Adapter)
1    a7     28  a7         216000c0ff804b07  206000c0ff004b07  0x0  (Disk device)   <- ld on top IOM ch 0
2    2      7c  0          210100e08b2794b5  200100e08b2794b5  0x1f (Unknown Type)
3    a6     29  a6         216000c0ff904b07  206000c0ff004b07  0xd  (Disk device)   <- ld on bottom IOM ch 0

root@richie / # luxadm -e dump_map /devices/pci@1f,0/pci@1/SUNW,qlc@1,1/fp@0,0:devctl
Pos  AL_PA  ID  Hard_Addr  Port WWN          Node WWN          Type
0    1      7d  0          210000e08b0794b5  200000e08b0794b5  0x1f (Unknown Type)
1    a7     28  a7         216000c0ff804b07  206000c0ff004b07  0x0  (Disk device)   <- ld on top IOM ch 0
2    2      7c  0          210100e08b2794b5  200100e08b2794b5  0x1f (Unknown Type,Host Bus Adapter)
3    a6     29  a6         216000c0ff904b07  206000c0ff004b07  0xd  (Disk device)   <- ld on bottom IOM ch 0

In the format output below, disks 1 and 2 are the same ld0.  They both have the same target number 40, which is the PID of channel 0, even though the SID of channel 0 is 41 and we are indeed plugged into both 40 and 41.  Also, they have the same WWN of 216000c0ff804b07 which, from show port-wwns, is the port WWN of PID ch 0 and not SID ch 0.  Remember that ld0 is owned by the primary controller.

# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 
          /pci@1f,0/pci@1,1/ide@3/dad@0,0
       1. c3t40d0 
          /pci@1f,0/pci@1/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff804b07,0
       2. c4t40d0 
          /pci@1f,0/pci@1/SUNW,qlc@1,1/fp@0,0/ssd@w216000c0ff804b07,0
Specify disk (enter its number): ^D
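
As an optional cross-check (not performed in the original session), either device path can be queried from the host to confirm which array port it leads to; luxadm display reports the port and node WWNs for the path, which should match the WWPNs from sccli show port-wwns:

# luxadm display /dev/rdsk/c3t40d0s2     <-- shows vendor/product, device state, and the port/node WWNs behind this path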

We put a filesystem and some data on c3t40d0s0 and mounted it on /3510.

# df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c3t40d0s0     122M    10M    99M    10%    /3510
root@richie /3510 # ls
./                          ../                         SAN_4.4b_install_it.tar.Z

Then the lab ran dex to generate I/O to it.  During this test we failed the primary controller:

sccli> fail primary

dex ran without error:

Pass 100, Errors 0, Elapsed time= 4:53 min.

Here is a snapshot of iostat at the time the controller was failed:

# iostat -xpnz 3
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.7    5.3   13.3   40.3  0.0  0.1    0.0    9.3   0   7 c0t0d0
    1.7    5.3   13.3   40.3  0.0  0.1    0.0    9.3   0   7 c0t0d0s0
 2954.8    0.0 5909.7    0.0  0.0  1.1    0.0    0.4   3  67 c3t40d0   <---controller failed here!

                    extended device statistics       
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.3   10.3    2.7   78.0  0.0  0.1    0.0    8.0   0   9 c0t0d0
    0.3   10.3    2.7   78.0  0.0  0.1    0.0    8.0   0   9 c0t0d0s0
    2.0    0.7    0.1    0.0  0.0  0.0    0.0    0.0   0   0 c4t40d0
    2.0    0.7    0.1    0.0  0.0  0.0    0.0    0.0   0   0 c4t40d0s2
  259.7    0.7  503.7    0.0  0.0 15.1    0.0   58.0   0  99 c3t40d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.3   30.3    2.7  236.6  8.0  0.5  259.5   14.9  21  25 c0t0d0
    0.3   30.3    2.7  236.6  8.0  0.5  259.5   14.9  21  25 c0t0d0s0
 1219.2    0.0 2438.5    0.0  0.0 12.1    0.0    9.9   1  99 c3t40d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 4380.2    0.0 8760.5    0.0  0.0  1.6    0.0    0.4   4  96 c3t40d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 4284.3    0.0 8568.6    0.0  0.0  1.6    0.0    0.4   4  93 c3t40d0

It looks like the throughput slowed down for a few seconds, probably due to a LIP (loop initialization), as shown in a tail of /var/adm/messages:

Sep  1 15:31:26 richie fp: [ID 517869 kern.info] NOTICE: fp(2): PLOGI to a7 failed state=Packet Transport error, reason=No Connection
Sep  1 15:31:26 richie scsi: [ID 243001 kern.warning] WARNING: /pci@1f,0/pci@1/SUNW,qlc@1/fp@0,0 (fcp2):
Sep  1 15:31:26 richie  PLOGI to D_ID=0xa7 failed: State:Packet Transport error, Reason:No Connection. Giving up
Sep  1 15:31:26 richie qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(2): Loop OFFLINE
Sep  1 15:31:26 richie qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(3): Loop OFFLINE
Sep  1 15:31:26 richie qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(2): Loop ONLINE
Sep  1 15:31:26 richie qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(3): Loop ONLINE
Sep  1 15:31:27 richie scsi: [ID 243001 kern.info] /pci@1f,0/pci@1/SUNW,qlc@1/fp@0,0 (fcp2):
Sep  1 15:31:27 richie scsi: [ID 243001 kern.info] /pci@1f,0/pci@1/SUNW,qlc@1,1/fp@0,0 (fcp3):

With the primary controller failed, luxadm -e dump_map still shows both paths are up:

root@richie / # luxadm -e dump_map /devices/pci@1f,0/pci@1/SUNW,qlc@1/fp@0,0:devctl
Pos  AL_PA  ID  Hard_Addr  Port WWN          Node WWN          Type
0    1      7d  0          210000e08b0794b5  200000e08b0794b5  0x1f (Unknown Type,Host Bus Adapter)
1    2      7c  0          210100e08b2794b5  200100e08b2794b5  0x1f (Unknown Type)
2    a7     28  a7         216000c0ff804b07  206000c0ff004b07  0x0  (Disk device)
3    a6     29  a6         216000c0ff904b07  206000c0ff004b07  0xd  (Disk device)
root@richie / # luxadm -e dump_map /devices/pci@1f,0/pci@1/SUNW,qlc@1,1/fp@0,0:devctl
Pos  AL_PA  ID  Hard_Addr  Port WWN          Node WWN          Type
0    1      7d  0          210000e08b0794b5  200000e08b0794b5  0x1f (Unknown Type)
1    2      7c  0          210100e08b2794b5  200100e08b2794b5  0x1f (Unknown Type,Host Bus Adapter)
2    a7     28  a7         216000c0ff804b07  206000c0ff004b07  0x0  (Disk device)
3    a6     29  a6         216000c0ff904b07  206000c0ff004b07  0xd  (Disk device)

Conclusion:

Controller failover is invisible to the host as far as users are concerned, since both HBA connections on a given array channel and both array controllers are already on the same FC-AL loop.




Although the 3510 has dual back-end drive channels, the product architecture has no provisions to isolate a failed component. Due to this limitation it is possible for both channels to be affected by a controller failure. There may be cases where the disruption on the disk channels is severe enough to affect the operation of the surviving controller. In this particular instance all drives changed state to NONE or USED due to the automatic drive scan feature of the 3510. For more information see CR 6319024.
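
If drive states are in question after a controller failure of this kind, they can be reviewed from sccli before taking any corrective action; a minimal sketch, not part of the original capture:

sccli> show disks      <-- lists each disk's channel/ID, size, and current status (for example ONLINE, STAND-BY, USED, or NONE)
sccli> show events     <-- displays the controller event log recorded around the time of the failure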
 

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.