Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1120725.1
Update Date:2010-09-20
Keywords:

Solution Type  Troubleshooting Sure

Solution  1120725.1 :   Troubleshooting Sun Storage[TM] 6580/6780 Host Interface Card Faults  


Related Items
  • LSI SANtricity Storage Manager
  • Sun Storage 6780 Array
  • Sun Storage 6580 Array
  • Sun Storage Common Array Manager (CAM)
Related Categories
  • GCS>Sun Microsystems>Storage Software>Modular Disk Device Software




In this Document
  Purpose
  Last Review Date
  Instructions for the Reader
  Troubleshooting Details


Applies to:

LSI SANtricity Storage Manager - Version: All Versions and later   [Release: All Releases and later ]
Sun Storage 6780 Array - Version: Not Applicable and later    [Release: NA and later]
Sun Storage Common Array Manager (CAM) - Version: 6.2 to 6.5   [Release: 6.2 to 6.5]
Sun Storage 6580 Array - Version: Not Applicable and later    [Release: NA and later]
Information in this document applies to any platform.

Purpose

This document is intended to provide a basic overview on how to troubleshoot faults with the array RAID controller Host Interface Cards (HIC).

Please validate that each troubleshooting step below is true for your environment. Each step provides instructions, either directly or via a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

Symptoms:

  • Controller Offline Critical Fault(xx.66.1028).
  • Seven Segment Display of SE+dF+dash+CF+H#+ blank (repeats).
  • SANtricity or Common Array Manager reports a critical fault of Host IO Card has Failed (xx.66.1150).
  • SANtricity or Common Array Manager reports a critical fault of Host IO Card is Degraded (xx.66.1300).
  • SANtricity or Common Array Manager reports a critical fault of One or More Volumes not on their Preferred Path (xx.66.1010).
  • Array Error log event ID .

HIC faults come in two basic varieties.

  • A configuration fault during array controller boot, where the two controllers do not match each other's configuration.
  • A diagnostic fault of the Host IO card itself, which results in a degraded or failed status.

A degraded HIC is one where at least one host port is able to transport data.  A failed Host Card is one where no host ports can transport data.  In both of these cases, the array controller will still be in an Online state.

A volume not on its preferred path is a secondary symptom of a HIC fault. If none of the other alarms are present, it is outside the scope of this document and can be ignored here.  This fault will not be addressed beyond this point in the document.


Last Review Date

June 9, 2010

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

1. Verify the Critical Faults on the array

Reference <Document: 1021057.1> Verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 Critical Faults via the User Interface.

  • If the Critical Fault is for Controller Offline (xx.66.1028), or if there are no critical faults, go to Step 2.
  • If the Critical Fault is for either HIC Degraded (xx.66.1300) or HIC Failed (xx.66.1150), go to Step 5.
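
The branching in this step can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, the leading "xx" field of each code is assumed to vary by array (the "57" used in the comment is a made-up placeholder), and checking the HIC faults before the Controller Offline fault is an assumption, since this document does not say which takes priority when both are present.

```python
# Hypothetical helper: maps the critical faults found in Step 1 to the next step.
CONTROLLER_OFFLINE = "66.1028"
HIC_FAILED = "66.1150"
HIC_DEGRADED = "66.1300"


def fault_suffix(code):
    """Drop the leading 'xx' field, e.g. '57.66.1150' -> '66.1150'."""
    return ".".join(code.strip().split(".")[-2:])


def next_step_for_faults(codes):
    """Return the step number to go to, or None if Step 1 does not cover the faults."""
    suffixes = {fault_suffix(c) for c in codes}
    if suffixes & {HIC_FAILED, HIC_DEGRADED}:
        return 5
    if not suffixes or CONTROLLER_OFFLINE in suffixes:
        # Controller Offline, or no critical faults at all
        return 2
    return None
```

A return value of None means the faults present are not ones this step branches on.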

2. Verify the Seven Segment display on the array controller module

Look at the Seven Segment Display on the Array Controller Module. This display also serves as the array controller module Tray ID indicator.  It is located in the rear (cable side) of the tray.  For more information on the 7-segment display, reference:

<Document: 1021110.1> Sun Storage[TM] 6180, 6580, and 6780 Array Controller 7-Segment Display.

If the display shows in a repeating pattern:

SE+L8+dash+CF+H#+ blank (repeats) 

 Go to Step 6.

If the display shows in a repeating pattern:

OS+CF+H#+ blank (repeats)

or

SE+dF+dash+CF+H#+ blank (repeats)

Go to Step 5.

 

Note 1:  All repeating patterns end with a blank display.
Note 2:  The hash (#) symbol stands for the number 1 or 2, which identifies the HIC slot being called out on that controller.

  • If the display shows a repeating pattern other than those shown above, the problem is not with the HIC, but with something else on the controller.  Please reference <Document: 1021113.1> Troubleshooting Sun StorageTek[TM], Sun StorEdge[TM], and Sun Storage[TM] RAID Controller Failures.
  • If the display shows the tray ID (defaults to 99) of the module, OR you are unable to confirm the ID but the controller is ONLINE, continue to Step 3.
  • If there is a controller offline alarm, and you cannot check the display, go to Step 4.
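
The pattern matching in this step can be sketched as follows. This is a minimal sketch, not a supported tool: the function name and the idea of typing the observed sequence as a string in this document's own notation (codes joined by "+") are assumptions made for illustration.

```python
import re

# Patterns from Step 2, written in this document's notation.
# "H#" stands for H1 or H2 (the HIC slot being called out).
STEP_FOR_PATTERN = {
    ("SE", "L8", "dash", "CF", "H#"): 6,
    ("OS", "CF", "H#"): 5,
    ("SE", "dF", "dash", "CF", "H#"): 5,
}


def classify_display(sequence):
    """Return (next_step, hic_slot) for one repetition of the display.

    `sequence` is a string such as "SE+dF+dash+CF+H2+blank".
    Returns (None, None) when it is not one of the HIC patterns above.
    """
    codes = [c.strip() for c in sequence.split("+")]
    # The trailing blank only marks the end of a repetition.
    codes = [c for c in codes if c and c != "blank"]
    slot = None
    if codes and re.fullmatch(r"H[12]", codes[-1]):
        slot = int(codes[-1][1])
        codes[-1] = "H#"
    step = STEP_FOR_PATTERN.get(tuple(codes))
    if step is None:
        return None, None
    return step, slot
```

A result of (None, None) corresponds to the bullets above: either the display shows a non-HIC pattern, or it shows the tray ID.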

3. Verify the state/status of the Host Interface Card

Follow the instructions below to look at the state and status of your HIC slots, based on your user environment:

Sun StorageTek Common Array Manager:

Browser:

  1. Expand Storage Arrays in the left menu pane.
  2. Expand your storage array name in the left menu pane.
  3. Expand Troubleshooting in the left menu pane.
  4. Click on FRUs.
  5. In the right display pane, click on HIC.
SSCS CLI:

sscs list -d <array_name> -t hic fru

Sun StorageTek SANtricity Storage Manager:

GUI:
  1. Launch SANtricity.
  2. Double Click on your array name to open the Array Management Window.
  3. In the left pane, click on the controller icon for the controller whose HIC you want to view.
  4. In the right pane the HIC status, for each slot, will be listed after the base controller information.
SMcli:

SMcli -n <array_name> -c "show storageArray profile;"


NOTE 1:  The HIC information will be listed in the CONTROLLERS section of the resultant SMcli output.
NOTE 2:  The HIC information will be Unknown or Unavailable if the controller it resides on is currently OFFLINE.  This is true for all management software.

  • If the status is Optimal, go to Step 4.
  • If the status is Failed, go to Step 5.
  • If the status is Unknown or Removed, and the Controller is offline, go to Step 4.
  • If the status is Unknown or Removed, and the Controller is online, go to Step 6.
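
The decision list above can be expressed as a short function. This is an illustrative sketch only: the function name is hypothetical, and the status string is assumed to have been read by hand from the management software output, since the exact output layout is not reproduced in this document.

```python
def next_step_for_hic_status(status, controller_online):
    """Return the next step for a HIC status, following the Step 3 bullets.

    Returns None for any status the bullets do not cover.
    """
    status = status.strip().lower()
    if status == "optimal":
        return 4
    if status == "failed":
        return 5
    if status in ("unknown", "removed"):
        # Same status, different path depending on the controller state.
        return 6 if controller_online else 4
    return None
```

For example, `next_step_for_hic_status("Removed", controller_online=False)` follows the third bullet and leads to Step 4.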

4. Verify the existence of Host Interface Card messages in the array event log

Sun StorageTek Common Array Manager:

Browser:
  1. Expand Storage Arrays in the left menu pane.
  2. Expand your storage array name in the left menu pane.
  3. Expand Troubleshooting in the left menu pane.
  4. Click on Events.
  5. In the right pane, click on the -|-> icon.  If you mouse over it, it will state Advanced Filter.
  6. Set Event to Log Events.
  7. Set Event Type to Component.
  8. Set Read the last X Kbytes From Log File to 100.
  9. Set String Filter to HIC.
  10. Click on the Details of any alarm that is shown.
  11. Review the Description Field.
  12. Get the value of the array log event ID from the description.

Example:

Description : Apr 08 21:31:31 6780-array Tray.99.Controller.A.HIC.HostCard1: [ID 0x1904] NOTICE: Host interface card failed diagnostic

Note:  The filter in Step 9 is case sensitive.


SSCS CLI:

Get the list of events:

sscs list -d <array_name> -t LogEvent -f HIC event

Get the event details:

sscs list -d <array_name> event <event_id>

Note:  The -f option is case sensitive.


Get the value of the array log event ID from the description:

Example:

Description : Apr 08 21:31:31 6780-array Tray.99.Controller.A.HIC.HostCard1: [ID 0x1904] NOTICE: Host interface card failed diagnostic
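
Pulling the event ID and the HIC slot out of a Description line can be sketched with a regular expression. This is a minimal sketch under the assumption that other Description lines follow the same layout as the HIC example in this step; the function name and the field names in the returned dictionary are hypothetical.

```python
import re

# Layout assumed from the example Description line in this step.
DESCRIPTION_RE = re.compile(
    r"Tray\.(?P<tray>\d+)\.Controller\.(?P<controller>\w+)\."
    r"(?P<component>[\w.]+): \[ID (?P<event_id>0x[0-9A-Fa-f]+)\] "
    r"(?P<severity>\w+): (?P<message>.*)"
)


def parse_description(line):
    """Return the fields of a Description line as a dict, or None if no match."""
    match = DESCRIPTION_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["tray"] = int(fields["tray"])
    return fields


EXAMPLE = (
    "Description : Apr 08 21:31:31 6780-array "
    "Tray.99.Controller.A.HIC.HostCard1: [ID 0x1904] "
    "NOTICE: Host interface card failed diagnostic"
)
fields = parse_description(EXAMPLE)
```

With the example line, `fields["event_id"]` holds the array log event ID and `fields["component"]` names the controller's HIC slot.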


SANtricity Storage Manager:

GUI:
  1. Launch SANtricity.
  2. Double Click on your array name to open the Array Management Window.
  3. Click on the Advanced Menu.
  4. Click on the Troubleshooting Sub-Menu.
  5. Click on View Event Log.
  6. Un-Check View Only Critical Events.
  7. Click on the Component Type field header to sort the events.
  8. Look for Host Interface Card events in the list of events.
  9. For any Host Interface Card event, highlight it, and check the View Details box.
  10. Get the value of the Event type field for each HIC event.
SMcli:

Get the list of events by saving off the event log:

SMcli -n <array_name> -c "save storageArray allEvents file=\"some/file/path/log.txt\";"

Open a text viewing application to look at the individual events.
Get the value of the Event type field for each HIC event.

Example Event

Date/Time: 6/8/10 21:52:00 ET
Sequence Number: 12345
Event Type: 1904
Description:  Host interface card failed diagnostic

  • If an event with an event log ID of 0x1904 exists, continue to Step 5.
  • If there are no 0x1904 events, but your array controller is OFFLINE, reference <Document: 1021113.1> Troubleshooting Sun StorageTek[TM], Sun StorEdge[TM], and Sun Storage[TM] RAID Controller Failures.
  • If there are no 0x1904 events, and your array controller is Online/OK, no further work is required, as you have successfully validated that the HICs on your array are working optimally.
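
Scanning a saved event log for 0x1904 events, and applying the three outcomes above, can be sketched as follows. This is an illustration under stated assumptions: the saved file is assumed to contain "Field: value" lines laid out like the Example Event, with each event starting at a Date/Time line, and the function names are hypothetical.

```python
def parse_events(text):
    """Split saved event-log text into one dict per event."""
    events, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a field name
        key, value = key.strip(), value.strip()
        if key == "Date/Time" and current:
            # A new Date/Time line starts the next event.
            events.append(current)
            current = {}
        current[key] = value
    if current:
        events.append(current)
    return events


def is_hic_diagnostic_failure(event):
    """True if the event type is 1904, written with or without a 0x prefix."""
    event_type = event.get("Event Type", "").lower()
    if event_type.startswith("0x"):
        event_type = event_type[2:]
    return event_type == "1904"


def step4_outcome(events, controller_online):
    """Apply the three Step 4 bullets to the parsed events."""
    if any(is_hic_diagnostic_failure(e) for e in events):
        return "go to Step 5"
    if not controller_online:
        return "see Document 1021113.1"
    return "HICs validated, no further work required"


# The Example Event from this step, as it would appear in the saved file.
SAMPLE_EVENT = """Date/Time: 6/8/10 21:52:00 ET
Sequence Number: 12345
Event Type: 1904
Description:  Host interface card failed diagnostic
"""
outcome = step4_outcome(parse_events(SAMPLE_EVENT), controller_online=True)
```

In practice the text would be read from the file saved by the SMcli command above rather than from an inline string.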

5. Open a Service Call with Oracle to have the HIC indicated replaced

The HIC slot for the controller is indicated either by the 0x1904 event or by the Seven Segment Display.

Please open a service call with Oracle with:
  • Support Data Collection
Reference <Document: 1002514.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] Common Array Manager.
Reference <Document: 1014074.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] SANtricity Storage Manager.

OR
  • HIC Slot location
  • Array Critical Faults
  • Array Event Log
  • Seven Segment Display Code cited.

6. Open a Service Call with Oracle for further research

Please provide a Support Data Collection:

Reference <Document: 1002514.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] Common Array Manager.
Reference <Document: 1014074.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] SANtricity Storage Manager.


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.