Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1002641.1
Update Date: 2011-06-10
Keywords:

Solution Type  Troubleshooting Sure

Solution  1002641.1 :   Troubleshooting Sun StorEdge[TM] 351x and 33x0 Controllers  


Related Items
  • Sun Storage 3511 SATA Array
  • Sun Storage 3510 FC Array
  • Sun Storage 3310 Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
203641
Before replacing a controller, confirm overall array health. Other components or conditions may have caused the current controller state. Most controller failures are caused by firmware-detected problems, not actual controller hardware problems. Generally, the controller should be one of the very last components replaced.

Applies to:

Sun Storage 3511 SATA Array - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 3510 FC Array - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 3310 Array - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 3320 SCSI Array - Version: Not Applicable and later [Release: N/A and later]
All Platforms

Purpose

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join, or start a discussion in the My Oracle Support Community, Storage Disk 3000 Series RAID Arrays Community.

Description


Troubleshooting Sun Storage[TM] 351x and 33x0 Controllers and Array Health.

Symptoms:

- "show config" shows a failed controller

- "show redundancy" shows Redundancy status: Failed or Scanning on a dual controller array

- "show disks" shows only one channel on a dual controller array

- "show channel" shows only one channel on a dual channel array

- Controller Alert: Redundant Controller Failure Detected

- Amber controller LED indicating failure

- Chassis sounds audible alarm

- Controller appears hung

- DRAM Parity errors

Please refer to the Sun StorEdge 3000 Family RAID Firmware 4.x User Guide, Appendix E for additional Controller related Event Messages.


Last Review Date

May 13, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Steps to Follow



NOTE: This is a subset of Doc ID 1011431.1: "Troubleshooting Sun StorEdge[TM] 33x0/351x Hardware". The steps below will help verify and resolve controller problems.

Please validate that each troubleshooting step below is true for your environment. Each step will provide instructions via a link to the document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

Step 1 - Verify that the Redundancy status of the controllers is Enabled and that the correct number of controller serial numbers is visible by issuing a "show redundancy" command from the sccli.

Refer to Chapter 6 of the Sun StorEdge 3000 Family FRU Installation Guide or the Sun StorEdge 3000 Family CLI 2.x User's Guide for an explanation of possible states.

Redundancy enabled on dual controller array:

sccli> show redundancy

   Primary controller serial number: 8003752
   Primary controller location: Lower
   Redundancy mode: Active-Active
   Redundancy status: Enabled
   Secondary controller serial number: 8000596


Example of failed state:

sccli> show redundancy

    Primary controller serial number: 8021942
    Primary controller location: Lower
    Redundancy mode: Active-Active
    Redundancy status: Failed
    Secondary controller serial number: 8025533
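
The Step 1 check can also be scripted against captured output. The following is a minimal Python sketch, not a supported tool. It assumes the "show redundancy" output has already been saved as text in the "Key: value" layout shown above; the function names parse_redundancy and redundancy_ok are hypothetical.

```python
def parse_redundancy(output: str) -> dict:
    """Collect the 'Key: value' lines of captured 'show redundancy' output."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon (such as the 'sccli>' prompt line) are skipped.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def redundancy_ok(fields: dict) -> bool:
    """True when status is Enabled and both controller serial numbers are listed."""
    return (
        fields.get("Redundancy status") == "Enabled"
        and "Primary controller serial number" in fields
        and "Secondary controller serial number" in fields
    )


# Sample text copied from the two examples in this step.
HEALTHY = """sccli> show redundancy

   Primary controller serial number: 8003752
   Primary controller location: Lower
   Redundancy mode: Active-Active
   Redundancy status: Enabled
   Secondary controller serial number: 8000596
"""

FAILED = """sccli> show redundancy

    Primary controller serial number: 8021942
    Primary controller location: Lower
    Redundancy mode: Active-Active
    Redundancy status: Failed
    Secondary controller serial number: 8025533
"""

print(redundancy_ok(parse_redundancy(HEALTHY)))
print(redundancy_ok(parse_redundancy(FAILED)))
```

The first call reports a healthy pair and the second does not, because the status is Failed even though both serial numbers are visible.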

Step 2 - Verify the FRU status of the FC_RAID_IOM is OK by issuing a "show fru" command from the sccli.

Note: In a dual controller array, two FC_RAID_IOMs are listed as FRUs:


3510 Examples:

Name: FC_RAID_IOM
Description: SE3510 I/O w/SES + RAID Cont 1GB
Part Number: 370-5537
Serial Number: 000665
Revision: 04
Initial Hardware Dash Level: 04
FRU Shortname: 370-5537-04
Manufacturing Date: Wed Sep 17 05:36:35 2003
Manufacturing Location: Milpitas California, USA
Manufacturer JEDEC ID: 0x0301
FRU Location: UPPER FC RAID IOM SLOT
Chassis Serial Number: 000F26
FRU Status: OK


Lower controller has a Fault state:

Name: FC_RAID_IOM
Description: SE3510 I/O w/SES + RAID Cont 1GB
Part Number: 370-5537
Serial Number: 003115
Revision: 04
Initial Hardware Dash Level: 04
FRU Shortname: 370-5537-04
Manufacturing Date: Thu Sep 11 14:27:35 2003
Manufacturing Location: Milpitas California, USA
Manufacturer JEDEC ID: 0x0301
FRU Location: LOWER FC RAID IOM SLOT
Chassis Serial Number: 000F26
FRU Status: Fault
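
When an array lists many FRUs, the records that are not OK can be picked out of saved output. This is a rough Python sketch under stated assumptions: each record starts with a "Name:" line and uses the "Key: value" layout of the examples above. The helper names parse_frus and faulted_frus are hypothetical, and the sample text is an abbreviated copy of the two records in this step.

```python
def parse_frus(output: str) -> list:
    """Split captured 'show fru' output into one dict per FRU record."""
    frus, current = [], {}
    for line in output.splitlines():
        # Partition on the first colon so timestamps such as 05:36:35 stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Name" and current:
            # A new 'Name:' line closes the previous record.
            frus.append(current)
            current = {}
        current[key] = value
    if current:
        frus.append(current)
    return frus


def faulted_frus(frus: list) -> list:
    """Return (location, status) for every record whose FRU Status is not OK."""
    return [
        (fru.get("FRU Location"), fru.get("FRU Status"))
        for fru in frus
        if fru.get("FRU Status") != "OK"
    ]


SAMPLE = """Name: FC_RAID_IOM
Serial Number: 000665
Manufacturing Date: Wed Sep 17 05:36:35 2003
FRU Location: UPPER FC RAID IOM SLOT
FRU Status: OK

Name: FC_RAID_IOM
Serial Number: 003115
Manufacturing Date: Thu Sep 11 14:27:35 2003
FRU Location: LOWER FC RAID IOM SLOT
FRU Status: Fault
"""

print(faulted_frus(parse_frus(SAMPLE)))
```

A record with no "FRU Status" line at all is also reported, which errs on the side of flagging incomplete output for a manual look.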

Step 3 - Verify the health of the enclosure and all listed components are OK by issuing a "show enclosure-status" command from the sccli:

3510 Example:

sccli> show enclosure-status

Ch  Id  Chassis  Vendor/Product ID     Rev   PLD   WWNN              WWPN
-------------------------------------------------------------------------------

 2  12  000F26   SUN StorEdge 3510F A  1080  1000  204000C0FF000F26  214000C0FF000F26
 3  12  000F26   SUN StorEdge 3510F A  1080  1000  204000C0FF000F26  224000C0FF000F26

Enclosure Component Status:
Type      Unit  Status  FRU P/N   FRU S/N  Add'l Data

Fan       0     OK      370-5398  006568   --
Fan       1     OK      370-5398  006568   --
Fan       2     OK      370-5398  006573   --
Fan       3     OK      370-5398  006573   --
PS        0     OK      370-5398  006568   --
PS        1     OK      370-5398  006573   --
Temp      0     OK      370-5535  000F26   temp=33
Temp      1     OK      370-5535  000F26   temp=35
Temp      2     OK      370-5535  000F26   temp=33
...........
DiskSlot  7     OK      370-5535  000F26   addr=7,led=off
DiskSlot  8     OK      370-5535  000F26   addr=8,led=off
DiskSlot  9     OK      370-5535  000F26   addr=9,led=off
DiskSlot  10    OK      370-5535  000F26   addr=10,led=off
DiskSlot  11    OK      370-5535  000F26   addr=11,led=off
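
Scanning a long component list by eye is error-prone, so the check in Step 3 can be approximated with a short script. The Python sketch below assumes the "Enclosure Component Status" rows are whitespace-separated in the order Type, Unit, Status, FRU P/N, FRU S/N, Add'l Data, as in the example above. The function name non_ok_components is hypothetical, and the "Fault" status in the sample is an invented value used only to exercise the check; the statuses your firmware reports may differ.

```python
import re

# One component row: Type, Unit, Status, FRU P/N, FRU S/N, then additional data.
ROW = re.compile(
    r"^\s*(?P<type>[A-Za-z]+)\s+(?P<unit>\d+)\s+(?P<status>\S+)"
    r"\s+(?P<pn>\S+)\s+(?P<sn>\S+)\s+(?P<extra>.*)$"
)


def non_ok_components(output: str) -> list:
    """Return (type, unit, status) for each component row whose status is not OK."""
    problems = []
    for line in output.splitlines():
        match = ROW.match(line)
        # The header row does not match because 'Unit' is not a number.
        if match and match.group("status") != "OK":
            problems.append(
                (match.group("type"), int(match.group("unit")), match.group("status"))
            )
    return problems


# Rows follow the example above; the 'Fault' value on PS 1 is hypothetical.
SAMPLE = """Enclosure Component Status:
Type Unit Status FRU P/N FRU S/N Add'l Data

      Fan 0    OK 370-5398 006568 --
      PS  0    OK 370-5398 006568 --
      PS  1    Fault 370-5398 006573 --
      Temp 0 OK 370-5535 000F26 temp=33
 DiskSlot 7 OK 370-5535 000F26 addr=7,led=off
"""

print(non_ok_components(SAMPLE))
```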

Step 4 - Verify there are no critical controller events. If 4.x firmware is installed (determine this with the sccli> "show inq" command), issue the "show persistent" command out of band; otherwise, issue the "show events" command.

See Appendix E of the Sun StorEdge 3000 Family RAID Firmware 4.x User's Guide for a list of controller events.

Examples:

# sccli -o 129.153.49.188 show persistent

or

sccli> show events
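
The choice between the two commands in Step 4 can be written down as a small rule. This Python sketch only builds the command text exactly as it appears in the examples above; it does not run anything. The function name events_command is hypothetical, and it assumes the caller has already read the firmware major version from the "show inq" output.

```python
from typing import Optional


def events_command(firmware_major: int, array_ip: Optional[str] = None) -> str:
    """Return the event command from Step 4 for the given firmware major version."""
    if firmware_major == 4:
        # With 4.x firmware the command is issued out of band, which needs an address.
        if array_ip is None:
            raise ValueError("out-of-band access needs the array's IP address")
        return f"sccli -o {array_ip} show persistent"
    # Otherwise the command is typed at the sccli> prompt.
    return "show events"


print(events_command(4, "129.153.49.188"))
print(events_command(3))
```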

Step 5 - Verify there are no mis-seated or marginal controllers/IOMs

Refer to Doc ID 1006856.1 "Troubleshooting StorEdge [TM] 351x Redundant Loop Failures".

Step 6 - Verify the SES or PLD firmware is not mismatched by issuing the sccli "show ses" command.

If a SES or PLD mismatch is detected, refer to Doc ID 1012024.1 "How To Sun StorEdge[TM] 351x Array: SES or PLD Firmware Mismatches".



Example of matched PLD and SES:

sccli> show ses

Ch  Id  Chassis  Vendor/Product ID     Rev   PLD   WWNN              WWPN
-------------------------------------------------------------------------------

 2  12  000F26   SUN StorEdge 3510F A  1080  1000  204000C0FF000F26  214000C0FF000F26
 3  12  000F26   SUN StorEdge 3510F A  1080  1000  204000C0FF000F26  224000C0FF000F26

Example of unmatched PLD:

sccli> show ses

Ch Id Chassis Vendor Product ID Rev PLD WWPN
-----------------------------------------------------------------------

2 124 SUN StorEdge 3510F A 1040 A000 204000C0FF000008
3 124 SUN StorEdge 3510F A 1040 8B00* 204000C0FF000008


* indicates SES or PLD firmware mismatch.
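
Because the asterisk is the documented mismatch marker, saved "show ses" output can be checked for it mechanically. The following Python sketch makes two assumptions: data rows begin with a numeric channel number, and the marker is attached to the end of the affected value, as in the example above. The function name flagged_ses_rows is hypothetical.

```python
def flagged_ses_rows(output: str) -> list:
    """Return (channel, flagged_value) for each row that carries the '*' marker."""
    flagged = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip the header, the separator, and the legend line that starts with '*'.
        if not tokens or not tokens[0].isdigit():
            continue
        for token in tokens[1:]:
            if token.endswith("*"):
                flagged.append((int(tokens[0]), token))
    return flagged


# Sample text copied from the unmatched PLD example in this step.
SAMPLE = """Ch Id Chassis Vendor Product ID Rev PLD WWPN
-----------------------------------------------------------------------

2 124 SUN StorEdge 3510F A 1040 A000 204000C0FF000008
3 124 SUN StorEdge 3510F A 1040 8B00* 204000C0FF000008

* indicates SES or PLD firmware mismatch.
"""

print(flagged_ses_rows(SAMPLE))
```

An empty result means no row carried the marker in the captured text.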

Step 7 - If physical access to the array is possible, verify the controller LED is Solid or blinking Green.

If both controller LEDs are flashing or solid green, refer to Doc ID 1017618.1 "How to Resolve  RAID Controller "Race Conditions" on a StorEdge 3310, SE 3320, SE 3510, or 3511 Array".

Step 8 - Verify SFP Link status LED is Solid Green.

If the LED is not lit, reseat or replace the SFP, the cable, or both. If the SFP LED remains unlit, move the cable to another port on the HBA.

For further information, see Doc ID 1009556.1: "Verifying HBA Connectivity".

Step 9 - In a dual controller array, if redundancy status is failed, but redundancy mode is Active-Active and primary serial number is seen:

Issue the sccli> sec unfail command and reply y when prompted (issue sccli> show redund to verify).

Wait up to 5 minutes for device detection before the controller redundancy status is Enabled.

Issue the sccli> show redundancy command to confirm redundancy status is Enabled.
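
The wait in Step 9 amounts to polling the redundancy status until it reads Enabled or five minutes have passed. The Python sketch below shows that loop in isolation. The name wait_for_redundancy and the read_status callable are hypothetical: read_status stands in for whatever method you use to obtain the current "Redundancy status" value, and the 15-second poll interval is an arbitrary choice, not a documented figure.

```python
import time
from typing import Callable


def wait_for_redundancy(
    read_status: Callable[[], str],
    timeout_s: float = 300.0,   # Step 9 allows up to 5 minutes
    poll_s: float = 15.0,       # assumed interval, not from the documentation
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the status is 'Enabled'; return False if the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if read_status() == "Enabled":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Simulated status sequence, so the demonstration finishes immediately.
    simulated = iter(["Failed", "Scanning", "Enabled"])
    print(wait_for_redundancy(lambda: next(simulated), poll_s=0))
```

The sleep and clock parameters exist only so the loop can be exercised without real delays. If the function returns False, continue with the remaining steps.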

Step 10 - If FC_RAID_IOM modules (controller or IOM) are missing from the sccli "show frus" output:

See Doc ID 1012692.1: "Sun StorEdge [TM] 351x Array: How to Resolve Devices Missing from The Se3kxtr "show_frus" and "show_ses-devices" Output".


Step 11 - If symptoms remain, stop I/O to the array and reset the controller.

See Doc ID 1010657.1: "The Proper Way to De-stage Cache in a Sun StorEdge[TM] 351x/33x0 Array is to Use the "shutdown" Command".

Issue the sccli> show redundancy command to confirm redundancy status is Enabled.

Step 12 - If Step 11 above fails, power the array off and then on.

Issue the sccli> "show redundancy" command to confirm redundancy status is Enabled.

Step 13 - If the controller remains in a Failed state, replace the controller using the following:

Doc ID 1018906.1: "Sun StorEdge[TM] 3510 FC Array and Sun StorEdge[TM] 3511 SATA Array: Replacing the I/O Controller Module".

Sun StorEdge 3310 SCSI Array Controller Module Replacement Guide
Sun StorEdge 3320 SCSI Array Controller Module Replacement Guide
Sun StorEdge 3510 FC Array and 3511 SATA Array Controller Replacement Guide
Sun StorEdge 3000 Family FRU Installation Guide
FAB 1017358.1: Sun StorEdge 3310/3510/3511 controllers must be allowed to complete the firmware cross loading process during controller replacement.

If other problems are found while working through this document, refer back to Doc ID 1011431.1: "Troubleshooting Sun StorEdge [TM] 33x0/351x Hardware".

At this point, if you have validated that each troubleshooting step above is true for your environment, and the issue still exists, further troubleshooting is required. Please engage the next level of support.


http://pts-storage.us.oracle.com/products/SE33xx/toi/nvram.html

Change History
Date: 2010-12-08
Action: Currency & Update links


  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.