Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1001057.1
Update Date: 2010-11-03
Keywords:

Solution Type  FAB (standard) Sure

Solution 1001057.1: On StorageTek 2501/2530/2540, I/O from one or both controllers to the disk drives may time out until the drives are disabled.


Related Items
  • Sun Storage 2530 Array
  • Sun Storage 2540 Array
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive

Previously Published As
201384


Product
Sun StorageTek 2530 Array
Sun StorageTek 2501
Sun StorageTek 2540 Array

Bug Id
<SUNBUG: 6544466>


Impact

This issue results in the loss of one or more drives, which at a minimum puts the associated volumes into a degraded state. The loss of several drives can cause the associated volumes to be taken offline, resulting in a loss of availability.

During 25xx beta testing, one customer experienced a drive-disabled event. Based on analysis by Sun Product Engineering, Sun Service may encounter one to three customers affected by this issue during the first quarter of shipments.


Contributing Factors

Products:

  • StorageTek 2501
  • StorageTek 2530
  • StorageTek 2540

These are new products in the Sun StorageTek Entry Disk portfolio (FCS expected in mid-May). The Sun System Handbook (SSH) product pages for them will not be available until late May 2007. In the interim, refer to the following TSC Backline web page for these products, and to the SSH page once it becomes available:

http://pts-storage.west/products/ST25xx/

https://support.us.oracle.com/handbook_internal/Systems/2540/2540.html


Symptoms

The following symptoms identify this issue.

In the Major Event Log (MEL):

1) Clusters of the following event types occur repeatedly:

A) CHECK CONDITION events returned by the drive(s):

    Event type: 100A
    Event category: Error
    Priority: Informational
    Description: Drive returned CHECK CONDITION
    Event specific codes: 6/2a/2
    Component type: Drive

B) Drive-side timeout events:

    Event type: 100D
    Event category: Error
    Priority: Informational
    Description: Timeout on drive side of controller
    Event specific codes: 0/0/0

2) Eventually the drive is failed, and at least one of the following events is logged:

    Event type: 2217
    Event category: Notification
    Priority: Informational
    Description: Piece failed
    Event specific codes: 0/0/0
    Component type: Drive

    Event type: 2216
    Event category: Notification
    Priority: Informational
    Description: Piece taken out of service
    Event specific codes: 0/0/0
    Component type: Drive

    Event type: 2215
    Event category: Notification
    Priority: Informational
    Description: Drive marked failed
    Event specific codes: 0/0/0
    Component type: Drive
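When reviewing collected support data, the symptom pattern described above (repeated 100A and 100D events followed by a 2215, 2216 or 2217 drive failure event) can be checked mechanically. The following is a minimal sketch, assuming the MEL has already been exported and parsed into chronological (event type, event specific codes) pairs. The parsing step, the function name `matches_fab_1001057`, and the `min_cluster` threshold are hypothetical and not part of any Sun tool. The sketch only reads event data and does not touch the array.

```python
from typing import Iterable, Tuple

# MEL event types listed in the Symptoms section of this FAB.
CHECK_CONDITION = "100A"  # Drive returned CHECK CONDITION
DRIVE_TIMEOUT = "100D"    # Timeout on drive side of controller
FAILURE_EVENTS = {"2215", "2216", "2217"}  # Drive marked failed / piece taken out of service / piece failed


def matches_fab_1001057(events: Iterable[Tuple[str, str]], min_cluster: int = 3) -> bool:
    """Return True if the MEL events show the FAB 1001057.1 pattern.

    `events` is a chronological sequence of (event_type, event_specific_codes)
    pairs, for example ("100A", "6/2a/2"). This input format and the
    `min_cluster` threshold are hypothetical assumptions for illustration.
    """
    check_conditions = 0
    timeouts = 0
    for event_type, codes in events:
        if event_type == CHECK_CONDITION and codes.lower() == "6/2a/2":
            check_conditions += 1
        elif event_type == DRIVE_TIMEOUT:
            timeouts += 1
        elif event_type in FAILURE_EVENTS:
            # A drive failure only counts as a match if clusters of both
            # precursor event types were logged before it.
            if check_conditions >= min_cluster and timeouts >= min_cluster:
                return True
    return False


if __name__ == "__main__":
    mel = [("100A", "6/2a/2"), ("100D", "0/0/0")] * 3 + [("2215", "0/0/0")]
    print(matches_fab_1001057(mel))  # True
```

Requiring both precursor types before the failure event mirrors the order of the symptoms above. A match is only a reason to collect support data and escalate as described under Workaround.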



Root Cause

Engineering is still investigating the conditions that cause the array to enter this state. Current analysis suggests that one of the back-end SAS drive channels is functioning marginally, causing the array's error recovery procedures to run at an abnormally high frequency.


Workaround

Recovery requires the affected drives to be reconstructed.

Collect support data and escalate to TSC-Storage Backline, which maintains an onsite service procedure that may be required for recovery; that procedure is carried out with live support and guidance from TSC Backline. Do not power cycle or otherwise modify the state of the array. Based on the support data collected, TSC-Storage Backline will provide service personnel with the steps to:

1) Clear the condition

2) Recover any volumes that were taken offline due to the condition

3) Reinstate and rebuild any drives that were failed due to the condition


Resolution

A final resolution is pending. Use CR 6544466 to track the final resolution, as this document may not be updated.




Previously Published As
102907
Internal Contributor/submitter
[email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Eng Responsible Engineer
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Kasp FAB Legacy ID
102907

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2007-05-08
Avoidance: Service Procedure
Responsible Manager: [email protected]
Original Admin Info: WF submitted on 02 May 2007. I will send to review today - karen.


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.