Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1017618.1
Update Date:2011-05-13
Keywords:

Solution Type  Troubleshooting Sure

Solution  1017618.1 :   How To Resolve RAID Controller "Race Conditions" on a StorEdge[TM] 3310, 3320, 3510, or 3511 Array


Related Items
  • Sun Storage 3511 SATA Array
  • Sun Storage 3510 FC Array
  • Sun Storage 3310 Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
228794
This document describes how to resolve a RAID controller "race condition" on a Sun Storage 3310, 3320, 3510, or 3511 array.

Applies to:

Sun Storage 3310 Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 3320 SCSI Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 3510 FC Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 3511 SATA Array - Version: Not Applicable and later    [Release: N/A and later]
All Platforms

Purpose

This document explains the symptoms of StorEdge[TM] 3000 RAID controller "race"
conditions and describes the troubleshooting steps used to resolve the problem.

Race Condition Symptoms:
 
On a StorEdge[TM] 3000 RAID chassis, with two RAID controller modules installed:

1. Both RAID controller status LEDs are flashing green.
2. The TCP/IP (Ethernet) connection may not respond.
3. The serial console (Console Menu Interface) sends garbled characters
(or may work initially, then hang after a short time).
4. The RAID chassis may not respond to sccli commands, and/or may not send/accept host I/O.
 

Note: Either RAID controller will work correctly when it is the only RAID controller installed.
By definition, this condition cannot occur on a RAID chassis that is configured with only
one RAID controller module (single controller array).



Explanation:

In this situation, the RAID controllers have entered a "race condition" in which
both are functioning as the Primary Controller.

As each takes on the Primary RAID controller role, they both attempt to:
 
 
1. Service requests to the LUNs on all of the host channels.
2. Send and receive on the serial port (which is a common bus for both controllers).
3. Send and receive data on the Ethernet port using the same IP and MAC addresses.

Cause:
 
This problem can be attributed either to an improper controller firmware upgrade
or to the installation of a 3.2x controller in a 4.x array.

In most cases, the end result is an array running 4.x controller firmware with 3.2x NVRAM.


 

Last Review Date

May 13, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Troubleshooting Steps:

The following procedure has been proven to resolve this "race" condition. It requires a
service outage of approximately 30 minutes and a host with sccli access (preferably
out of band, over TCP/IP) to the afflicted RAID chassis.
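
Before scheduling the outage, it can help to confirm that the host has what the procedure needs. The following is a minimal sketch, not a supported tool. The function name preflight and the file location passed to it are illustrative assumptions. The check only confirms that the sccli binary is executable and that a saved show_config.xml file exists and is not empty. It cannot tell whether the file is the correct version for the array.

```shell
#!/bin/sh
# Preflight sketch (hypothetical helper, not part of sccli or SUNWsscs).
# $1 = path to the sccli binary, $2 = path to the saved show_config.xml file.

preflight() {
    sccli_path=$1
    cfg_file=$2
    rc=0

    # The procedure needs a host that can run sccli.
    if [ ! -x "$sccli_path" ]; then
        echo "missing or not executable: $sccli_path"
        rc=1
    fi

    # -s is true only when the file exists and is not empty.
    if [ ! -s "$cfg_file" ]; then
        echo "missing or empty: $cfg_file"
        rc=1
    fi

    [ "$rc" -eq 0 ] && echo "preflight ok"
    return "$rc"
}
```

For example, preflight /usr/sbin/sccli /var/tmp/show_cfg.xml prints "preflight ok" only when both checks pass, and returns a non-zero status otherwise.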


**NOTE: The following definitions apply for purposes of this procedure:

- RAID controller module - The FRU that contains the RAID controller. This may be
a RAID controller only (33x0) or a RAID I/O module (351x).
- RAID chassis - The StorEdge[TM] 3xxx chassis that contains one or more RAID controller modules.
- S/N - The serial number of the RAID chassis, which can be found on a sticker on the
bottom left side of the disk drive bay of the RAID chassis. If the sticker has
the number 0451-0408008DB3, the S/N of the RAID chassis is 008DB3.
- Serial Console Menu Interface - The menu interface that is accessed through the
DB-9 serial connection on the RAID controller module. This can be accessed with
a terminal emulator such as tip (UNIX), minicom (Linux), or HyperTerminal (non-UNIX/Linux PC).
The settings are 8 data bits, no parity, 1 stop bit, at 38400 (or 9600) baud.
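
The S/N definition above can be illustrated with a short shell sketch. It assumes, based only on the single example given (0451-0408008DB3 gives 008DB3), that the S/N is always the last six characters of the sticker number. The function name sn_from_sticker is hypothetical.

```shell
#!/bin/sh
# Sketch: derive the chassis S/N from the sticker number.
# Assumption: the S/N is the final six characters of the sticker number,
# as in the example 0451-0408008DB3 -> 008DB3.

sn_from_sticker() {
    sticker=$1
    # Keep only the last six characters of the input.
    printf '%s\n' "$sticker" | sed 's/^.*\(......\)$/\1/'
}

# Example:
#   sn_from_sticker 0451-0408008DB3    prints 008DB3
```

The result can then be compared with the S/N that sccli displays when the session starts.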

Steps to Follow:

1. Ensure that a correct version of the show_config.xml file is available, from either
explorer or se3kxtr. Without it, this procedure will not work.

2. Verify with the customer that no services (including mounted filesystems) are
using this RAID chassis.

3. Power off the RAID chassis using the power switches on the two PCUs.

4. Pull the BOTTOM RAID controller module part way out of the RAID chassis.

5. Wait 10 seconds, then power the RAID chassis on.

6. Wait for the TOP RAID controller module to boot up (approx. 90 seconds).

7. From the host, start an sccli session: /usr/sbin/sccli
**NOTE: For TCP/IP (out of band) access: /usr/sbin/sccli <ip-address>

8. Select (or verify that sccli displays) the correct S/N for the RAID chassis
you are working on.


a) Type the command: reset nvram (Confirm "Y" when prompted.)

b) Type the command: exit


9. Power off the RAID chassis using the power switches on the two PCUs.

10. Fully insert the BOTTOM RAID controller module (ensure it is fully seated).

11. Pull the TOP RAID controller module part way out of the RAID chassis.

12. Power the RAID chassis on.

13. Wait for the BOTTOM RAID controller module to boot up (approx. 90
seconds), then from the host start an sccli session: /usr/sbin/sccli
**NOTE: For TCP/IP (out of band) access: /usr/sbin/sccli <ip-address>

14. Select (or verify that sccli displays) the correct S/N for the RAID chassis
you are working on.


a) Type the command: reset nvram (Confirm "Y" when prompted.)

b) Type the command: exit


15. Power off the RAID chassis.

16. Reinsert the TOP RAID controller.

17. Power up the RAID array.

18. From sccli, issue the sccli> show redundancy command, and verify the Status is "Enabled".


19. Verify the IP address is set correctly on the array, either:

a) Using the sccli> show ip command.

OR:

b) Using the serial Console Menu Interface, select "view and Edit Configuration Parameters" -> Communication
Parameters -> TCP/IP Address.

If this value is incorrect, set it to the correct address.

20. Restore the array configuration using the s3kdlres tool and the show_config.xml file
verified in step 1:

a) Type the command: cd /opt/SUNWsscs/sbin
b) Type either:
./s3kdlres /var/tmp/show_cfg.xml --device=/dev/rdsk/c4t4d0s2 <---- use for in-band device entry
OR:
./s3kdlres /var/tmp/show_cfg.xml --device=10.145.229.23 <---- use for out-of-band


Refer to the Sun StorEdge[TM] 3000 Family RAID Controller Firmware Migration Guide (819-6573-11)
for the command line syntax details.

21. End of procedure.
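
The verification in step 18 and the restore command in step 20 can be sketched as two small shell helpers. This is an illustration under stated assumptions, not a supported tool. The function names restore_cmd and redundancy_enabled are hypothetical. restore_cmd only prints the s3kdlres command line in the form shown in step 20 and does not run it. redundancy_enabled assumes that the saved "show redundancy" output contains a line with both the word "status" and the word "Enabled", so the pattern may need adjusting to match the actual output.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for steps 18 and 20.

# Print (do not run) the restore command for a config file and a target.
# The target is passed through unchanged: a raw device path for in-band
# access, or an IP address for out-of-band access.
restore_cmd() {
    cfg_file=$1
    target=$2
    printf './s3kdlres %s --device=%s\n' "$cfg_file" "$target"
}

# Read saved "show redundancy" output on stdin and succeed only if a
# status line reports Enabled.
# Assumption: the status line contains "status" (any case) and "Enabled".
redundancy_enabled() {
    grep -i 'status' | grep -q 'Enabled'
}

# Example:
#   restore_cmd /var/tmp/show_cfg.xml 10.145.229.23
#   prints ./s3kdlres /var/tmp/show_cfg.xml --device=10.145.229.23
```

Printing the command first gives a chance to confirm the file name and target before running it from /opt/SUNWsscs/sbin.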



race, condition, both, flashing, green, garbled, characters, serial, console, hang, hung
Previously Published As
79042

Change History
Date: 2010-10-04
User Name: [email protected]
Action: Currency & Update
Date: 2007-06-29
User Name: 7058
Action: Add Comment
Comment: Notes for Normalization:


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.