Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1008108.1
Update Date: 2010-01-12
Keywords:

Solution Type  Problem Resolution Sure

Solution  1008108.1 :   Sun StorEdge[TM] 3310, 3510 and 3511: Avoiding double drive failure conditions on 3.x firmware  


Related Items
  • Sun Storage 3510 FC Array
  • Sun Storage 3310 Array
  • Sun Storage 3511 SATA Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
211152


Symptoms
When a disk fails on a Sun StorEdge[TM] 33x0, 3510 or 3511 array and a bad block is then encountered on another disk of the same logical drive during the rebuild, the rebuild operation fails with events such as the following:
Tue Jan 31 15:38:30 2006
[1113] #5: StorEdge Array SN#8040967 CH2 ID7: SCSI Drive ALERT: bad block encountered (02h, 03h,11/00)
Tue Jan 31 15:38:30 2006
[2103] #6: LD-ID 6CC584FE on StorEdge Array SN#8040967: ALERT: rebuild failed

This is known as a "double disk error". When it happens, data loss
occurs and a restore from backup is required.
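
As a rough illustration of how this symptom could be spotted in a saved copy of the event log, the following Python sketch scans log text for a "bad block encountered" event followed by a "rebuild failed" event on the same array serial number. The line layout is inferred only from the two sample events above. The function names and the assumption that every event line starts with a bracketed code are hypothetical and not part of any Sun tool.

```python
import re
from datetime import datetime

# Sample events copied from the Symptoms section above.
SAMPLE_LOG = """\
Tue Jan 31 15:38:30 2006
[1113] #5: StorEdge Array SN#8040967 CH2 ID7: SCSI Drive ALERT: bad block encountered (02h, 03h,11/00)
Tue Jan 31 15:38:30 2006
[2103] #6: LD-ID 6CC584FE on StorEdge Array SN#8040967: ALERT: rebuild failed
"""

# Assumed layout of an event line: "[code] #sequence: message".
EVENT_RE = re.compile(r"^\[(?P<code>\w+)\] #(?P<seq>\d+): (?P<text>.*)$")


def parse_events(log_text):
    """Pair each event line with the timestamp line that precedes it."""
    events = []
    stamp = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = EVENT_RE.match(line)
        if match is None:
            # Not an event line, so try to read it as a timestamp.
            try:
                stamp = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            except ValueError:
                stamp = None
            continue
        events.append({
            "time": stamp,
            "code": match.group("code"),
            "seq": int(match.group("seq")),
            "text": match.group("text"),
        })
    return events


def find_double_failures(events):
    """Return serial numbers that logged a bad block and then a failed rebuild."""
    bad_block_seen = set()
    affected = []
    for event in events:
        serial_match = re.search(r"SN#(\w+)", event["text"])
        if serial_match is None:
            continue
        serial = serial_match.group(1)
        if "bad block encountered" in event["text"]:
            bad_block_seen.add(serial)
        elif "rebuild failed" in event["text"] and serial in bad_block_seen:
            if serial not in affected:
                affected.append(serial)
    return affected


if __name__ == "__main__":
    print(find_double_failures(parse_events(SAMPLE_LOG)))
```

A match only indicates that the two events appeared in this order for one array; confirming a double disk error still requires checking the logical drive status on the array itself.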



Resolution
This problem is caused by latent bad blocks: blocks in rarely accessed areas of a disk that have gone bad without being noticed, and that are first read when a rebuild needs them. It is common to all arrays.

The fix is disk scrubbing, which was introduced on Sun StorEdge[TM] 3310, 3510 and 3511 arrays with 4.x firmware in the form of an automatic media scan. To minimize the chance of this problem, upgrade to 4.x firmware. Before upgrading, it is advisable to run the procedure in the Relief/Workaround section once while still on 3.x, to avoid seeing too many drive failures (see Sun Alert 102011 for more details).

The following are examples of drive-related Sun Alerts that apply when using 4.x firmware:

Sun Alert ID: 102098

Synopsis: Insufficient Information for Recovery From Double Drive Failure for Sun StorEdge 33x0/35xx Arrays

Sun Alert ID: 102129

Synopsis: Disks May be Marked as Bad Without Explanation After "Drive Failure," "Media Scan Failed" or "Clone Failed" Events

Sun Alert ID: 102011

Synopsis: Sun StorEdge 33x0/3510 Arrays May Report a Higher Incidence of Drive Failures With Firmware 4.1x SMART Feature Enabled



Relief/Workaround
While on 3.x firmware, run the Parity Regenerate operation at least once a month. It reads the data and compares it with the parity for all disk blocks, so rarely accessed blocks are read regularly and latent bad blocks are found before a rebuild depends on them. It also helps ensure that the data and parity are consistent, and takes the necessary action if they are not.
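
To make the reasoning concrete, here is a minimal Python sketch of RAID 5 style XOR parity. It shows how a stripe is checked against its parity, and why one unreadable block on a surviving disk is enough to stop a rebuild. This is a generic illustration, not the array firmware's implementation: the function names are hypothetical, and an unreadable block is modeled simply as None.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_stripe(data_blocks, parity_block):
    """Return True if the stored parity matches the XOR of the data blocks."""
    return xor_blocks(data_blocks) == parity_block


def rebuild_block(surviving_blocks, parity_block):
    """Recompute the block of a failed disk from the surviving blocks and parity.

    A block that cannot be read (modeled as None) makes the stripe
    unrecoverable, which is the double disk error described above.
    """
    if parity_block is None or any(block is None for block in surviving_blocks):
        raise OSError("bad block on a surviving disk: stripe cannot be rebuilt")
    return xor_blocks(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)
    print(check_stripe(data, parity))

    # The disk holding data[1] fails; the other blocks are readable.
    print(rebuild_block([data[0], data[2]], parity) == data[1])

    # Same failure, but data[2] has a latent bad block.
    try:
        rebuild_block([data[0], None], parity)
    except OSError as err:
        print(err)
```

A periodic pass that reads every block, as Parity Regenerate does, gives such unreadable blocks a chance to surface while the logical drive still has full redundancy.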

This can be done in two ways:

1. Telnet/Serial interface
2. sscs GUI interface

In detail:

1. Telnet/Serial interface

From the telnet/serial access to the array, select the RAID 5 logical drive, then select the last-but-one menu option, which is

  reGenerate parity

This gives two options:

  Execute Regenerate Logical Drive Parity
  Overwrite Inconsistent Parity - Enabled

By default, "Overwrite Inconsistent Parity" is enabled. Disable it, because otherwise the parity is overwritten whenever there is a mismatch between the data and the parity.

After disabling "Overwrite Inconsistent Parity", select "Execute Regenerate Logical Drive Parity" to start the parity regenerate. You can also track its progress.
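
The effect of the "Overwrite Inconsistent Parity" setting can be sketched as follows. This Python model is an assumption-laden illustration, not the firmware's logic: it assumes that with the option disabled a mismatch is only reported, and that with it enabled the stored parity is replaced by the value recomputed from the data. The stripe layout and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def regenerate_parity(stripes, overwrite_inconsistent):
    """Check every stripe and return the indexes of mismatched stripes.

    Each stripe is a dict with "data" (list of bytes) and "parity" (bytes).
    Stored parity is rewritten only when overwrite_inconsistent is True.
    """
    mismatched = []
    for index, stripe in enumerate(stripes):
        expected = xor_blocks(stripe["data"])
        if expected != stripe["parity"]:
            mismatched.append(index)
            if overwrite_inconsistent:
                # Assumed behavior of the enabled option: trust the data.
                stripe["parity"] = expected
    return mismatched


if __name__ == "__main__":
    stripes = [
        {"data": [b"\x0f", b"\xf0"], "parity": b"\xff"},  # consistent
        {"data": [b"\x01", b"\x02"], "parity": b"\x00"},  # inconsistent
    ]
    # Option disabled: the mismatch is reported and nothing is changed.
    print(regenerate_parity(stripes, overwrite_inconsistent=False))
    print(stripes[1]["parity"])
```

With the option disabled, a reported mismatch can be investigated before any parity is changed, which matches the recommendation above.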

Notes:

a. Run only one parity regenerate operation at a time.
b. Do not schedule this to run automatically. It has to be run manually each
time.
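
Because the operation must be started by hand, it can help to keep a record of when it was last run on each logical drive. The Python sketch below lists the logical drives that are due, assuming "once a month" is treated as a 31-day limit. The record format, the function name and the example dates are hypothetical, and the script only reports; it does not start anything on the array.

```python
from datetime import date, timedelta

# Assumed interpretation of "at least once a month".
MAX_INTERVAL = timedelta(days=31)


def overdue_drives(last_runs, today):
    """Return the logical drive IDs whose last parity regenerate is too old.

    last_runs maps a logical drive ID to the date of its last run,
    or to None if it has never been run.
    """
    return sorted(
        ld_id
        for ld_id, last_run in last_runs.items()
        if last_run is None or today - last_run > MAX_INTERVAL
    )


if __name__ == "__main__":
    # Illustrative records only; the dates are made up for the example.
    records = {
        "6CC584FE": date(2006, 1, 31),
        "LD-EXAMPLE": date(2006, 3, 1),
        "NEVER-RUN": None,
    }
    for ld_id in overdue_drives(records, today=date(2006, 3, 15)):
        print("Parity regenerate due for logical drive", ld_id)
```

Run the reported drives one at a time, in line with note a.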

2. sscs GUI interface

a. Launch the sscs GUI with /usr/sbin/ssconsole.
b. Select a RAID5 logical drive from the device tree.
c. From the "Array Administration" menu, select "Schedule parity check".

Refer to the Sun StorEdge[TM] 3000 Family Configuration Service User's Guide for more details.



Product
Sun StorageTek 3511 SATA Array
Sun StorageTek 3510 FC Array
Sun StorageTek 3320 SCSI Array
Sun StorageTek 3310 SCSI Array

Keywords: parity, scrubbing, latent, disk, 3310, 3510, 3511, 3.x, 4.x, 3.27, 3.25, 4.11, 4.13, regen, rebuild, drive-failure
Previously Published As
84456

Change History
Date: 2010-01-11
User Name: [email protected]
Action: Currency & Update
Date: 2006-03-15
User Name: 7058
Action: Approved
Comment: Trademarked where appropriate.
Reworded sentences throughout document for reader clarity.
Made grammar and punctuation fixes as needed.
Enabled STM to bold section headers and offset preformatted text for clarity.

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.