Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1003908.1
Update Date:2010-08-02
Keywords:

Solution Type  Technical Instruction Sure

Solution  1003908.1 :   Sun Storage 3310/3320/3510/3511: SSE or FE steps to follow to assist with RCA.  


Related Items
  • Sun Storage 3510 FC Array
  • Sun Storage 3310 Array
  • Sun Storage 3511 SATA Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
205483


Applies to:

Sun Storage 3310 SCSI Array
Sun Storage 3320 SCSI Array
Sun Storage 3510 FC Array
Sun Storage 3511 SATA Array
All Platforms

Goal

Description

Problems with the Sun Storage 3310/3320/3510/3511 arrays are often handled by resetting or
power cycling the array to resolve the problem, and only then escalated in order to find
the root cause. This document describes the recommended steps to take BEFORE any such
action, so that the data needed to find the root cause is collected first.

Solution


Steps to Follow:

Any of a common set of problems might have occurred with the array, and it is
tempting to reset the array to see whether that resolves the issue.

A few symptoms include:
1. The host cannot access the array, but telnet or ping works fine.
2. Garbled characters are seen when connected via the serial port.
3. A controller replacement causes the array to become inaccessible.
4. False amber lights: an amber LED is displayed on the Power Supply or
Controller without a supporting error log entry or a genuine FRU failure.

Things to check and data to collect to assist with RCA:

Note that some of these may have to be captured as screen snapshots if there is no sccli access.
1. Determine whether the array can be accessed in-band and out-of-band.
2. Check the controller LEDs and note their status before doing anything.
Ideally, the primary controller has a blinking green LED and the secondary
controller has a solid green LED.
3. If the LUNs cannot be accessed in sccli, but the array can be accessed out-of-band
(telnet) or via the serial port:
-Check the channel status.
-Check the redundancy status.
-Gather the event log, which may tell us more.
-For the 3510/3511: Check the SES device status and determine whether ALL the SES
devices are visible.
-For the 3310/3320: Check the SAF-TE devices.

The recommended approach is to run the se3k extractor, which obtains all of this information in one step.

Note: We can get the SES device status and determine if all the SES devices are seen by
connecting to the array via telnet/serial port and selecting the following:
-> view and edit Peripheral devices
-> View Peripheral Device Status

and for a RAID array with no JBODs, you should see:

ITEM                  STATUS             LOCATION
Redundant Controller  Failback Complete  Primary
SES Device            Enclosure Device   Channel 2 ID 12

for normal status. If there are one or more JBODs, you will see entries for many SES devices.
For more information on the IDs you should see, refer to Article 1007692.1: 351x Array switch
settings and disk IDs.
4. Collect the relevant state or screen dump (if the serial/telnet interface is used).
In a dual controller configuration, issue "ctrl-w" to switch to the alternate controller
and note its SES device status and event logs as well.
5. For the 3510/3511, if there are any back-end loop problems and controllers or drives
are being failed, gather the ELS data. Power cycle the array to reset the counters and gather
the diag counters for the back-end channels (mainly channels 2 and 3).
6. For the 3510/3511, monitor the back-end loop by issuing the diag error channel command for
channels 2 and 3, specifying 'target all' to gather invalid transmission word and CRC error
counts. Run this repeatedly over a period of time to note any increases in these counts.
7. Engage TSE assistance WHEN the problem occurs to get proper guidance.
To reach RCA, it is important to have extractor output from both before and after a failure,
so please encourage customers to run extractor, or explorer with the se3k option, on a
regular basis and again when problems occur. With the above data collected, TSE can provide
better support and the chances of finding the root cause increase.
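
As an illustration only, the two checks above that involve comparing output (confirming the
number of SES devices in step 3 and watching for counter increases in step 6) can be sketched
in Python. This is a minimal sketch and not part of sccli or the extractor: the function
names, the counter names (invalid_tx_words, crc_errors), and the snapshot layout are all
hypothetical, and the counts are assumed to have been transcribed by hand from the array output.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (channel, target) -> {counter name: count}.
# The counter names are placeholders; the labels in the real
# diagnostic output may differ.
Counters = Dict[str, int]
Snapshot = Dict[Tuple[int, int], Counters]


def count_ses_devices(status_text: str) -> int:
    """Count rows that begin with 'SES Device' in pasted status text."""
    return sum(
        1
        for line in status_text.splitlines()
        if line.strip().startswith("SES Device")
    )


def counter_increases(
    before: Snapshot, after: Snapshot
) -> List[Tuple[int, int, str, int]]:
    """Return (channel, target, counter, delta) for each counter that grew."""
    increases = []
    for key in sorted(after):
        old = before.get(key, {})
        for name, value in sorted(after[key].items()):
            # A counter missing from the earlier snapshot is treated as 0.
            delta = value - old.get(name, 0)
            if delta > 0:
                increases.append((key[0], key[1], name, delta))
    return increases


if __name__ == "__main__":
    status = (
        "ITEM STATUS LOCATION\n"
        "Redundant Controller Failback Complete Primary\n"
        "SES Device Enclosure Device Channel 2 ID 12\n"
    )
    print("SES devices seen:", count_ses_devices(status))

    # Made-up example numbers, for demonstration only.
    earlier = {(2, 0): {"invalid_tx_words": 5, "crc_errors": 0}}
    later = {(2, 0): {"invalid_tx_words": 9, "crc_errors": 0}}
    for channel, target, name, delta in counter_increases(earlier, later):
        print(f"channel {channel} target {target}: {name} rose by {delta}")
```

Compare the SES device count against the number of enclosures expected for the configuration
(one for a RAID array with no JBODs, per the example in step 3). Any counter reported as
rising between two snapshots is a candidate for closer attention when the data is passed to TSE.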

Change History
Date: 2010-08-02
User Name: [email protected]
Action: Currency Check & Update
Comment: Re-formatted

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.