Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1010064.1
Update Date: 2011-05-16

Solution Type  Technical Instruction Sure

Solution  1010064.1 :   Sun StorEdge[TM] 33x0/351x Array - How to Replace a Hard Drive  


Related Items
  • Sun Storage 3510 FC Array
  • Sun Storage 3310 Array
  • Sun Storage 3511 SATA Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
213814


Applies to:

Sun Storage 3511 SATA Array
Sun Storage 3510 FC Array
Sun Storage 3320 SCSI Array
Sun Storage 3310 Array
All Platforms

Goal

Description

How do I replace a drive after a single drive failure?

Removing and replacing a failed drive is usually simple; however, if the wrong disk is pulled, you risk losing data. Drive failures can turn into complex issues if the necessary steps are not taken to verify that the drive is BAD and to replace it correctly.

This document covers the steps needed to visually identify a failed drive, validate that the drive is bad, and then physically remove and replace the failed disk drive.

Solution

Steps to Follow

Check whether the status of the disk indicates BAD or FAILED by issuing the sccli show disks command or through the firmware interface, as described in "Viewing the Status of a Physical Drive" in the Sun StorEdge[TM] 3000 Family RAID Firmware 4.2 User's Guide. The Physical Drive Status table shows the status of all physical drives in your array.


To View the Physical Drive Status Table:
From the Main Menu choose "view and edit Drives" to view your array's physical drives, and to edit physical drive parameters. If a drive is installed but not listed, the drive might be defective or installed incorrectly.

Note the channel number and target ID of the drive indicating BAD or FAILED. This information will help you locate the tray slot which houses the bad drive.

For further information on how to identify a defective drive, see the Sun StorEdge 3000 Family FRU Installation Guide. For a JBOD, identify the failed drive by following the steps in "Identifying the Defective Disk Drive in a JBOD Array" in the same guide.

Also note the logical drive to which the disk is assigned. In the output of the show disks command, the logical drive information is in the LD column; in the firmware interface menu, it is in the LG_DRV column.
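
As an illustration of the bookkeeping in the steps above, the following Python sketch filters a list of disk records for drives reported as BAD or FAILED and returns the channel, target ID, and logical drive to note. The DiskRecord type and its field names are hypothetical and are not part of sccli; the records are assumed to have been transcribed by hand from the show disks output or the firmware status table, and the values in the usage section are made up for the example.

```python
from typing import List, NamedTuple, Optional


class DiskRecord(NamedTuple):
    """Hypothetical record of one physical drive, transcribed by hand."""
    channel: int
    target_id: int
    status: str
    logical_drive: Optional[str]  # value from the LD / LG_DRV column, if any


# Status values this document names as indicating a failed drive.
FAILED_STATES = {"BAD", "FAILED"}


def drives_to_note(disks: List[DiskRecord]) -> List[DiskRecord]:
    """Return the records whose status indicates a failed drive."""
    return [d for d in disks if d.status.strip().upper() in FAILED_STATES]


if __name__ == "__main__":
    # Invented example values, for illustration only.
    example = [
        DiskRecord(2, 0, "ONLINE", "ld0"),
        DiskRecord(2, 3, "BAD", "ld0"),
    ]
    for d in drives_to_note(example):
        print(f"channel {d.channel}, ID {d.target_id}, logical drive {d.logical_drive}")
```

Keeping the channel, target ID, and logical drive together in one record makes it harder to mix up which slot belongs to which logical drive when the drive is later pulled.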

Check the event log or the persistent event log to confirm the failure state of the drive and to review any errors the drive reported, using either of the following commands:

sccli> show events

sccli> show persistent-events

Verify the logical drive status by issuing:

sccli> show logical-drives

or through the firmware interface, as described in "Logical Drive Status Table" in the Sun StorEdge 3000 Family RAID Firmware 4.2 User's Guide, which also lists the possible logical drive states.


To check and configure logical drives

From the Main Menu choose "view and edit Logical drives" and press Return. The status of all logical drives is displayed.

If more than one physical drive is in a MISSING or BAD state, or a logical drive is in a FATAL FAIL state, follow the steps described in "Recovering From Fatal Drive Failure" in the Sun StorEdge 3000 Family Installation, Operation, and Service Manual.

If the logical drive is in a degraded (DRV FAIL) state and you have one failed drive (BAD or ABSENT), identify the failed drive by following the steps in "Identifying the Defective Disk Drive in a RAID Array" or, for a JBOD, "Identifying the Defective Disk Drive in a JBOD Array" in the Sun StorEdge 3000 Family FRU Installation Guide. If the target logical drive status is GOOD, the spare disk has successfully protected the logical drive and is now integrated into it, and the replacement disk drive is available to be assigned as a global spare.
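
The decision rules in the two paragraphs above can be summarized in a short Python sketch. It is only an illustration under stated assumptions: the function name, the returned labels, and the idea of passing in status strings by hand are all hypothetical, and the state names are the ones used in this document (both DRV FAIL and DRV FAILED are accepted because both spellings appear here).

```python
from typing import Iterable

# Physical drive states this document treats as failed or missing.
FAILED_DISK_STATES = {"BAD", "FAILED", "MISSING", "ABSENT"}

# Hypothetical labels for the procedure to follow next.
FATAL_RECOVERY = "follow Recovering From Fatal Drive Failure"
REPLACE_SINGLE = "identify and replace the single failed drive"
SPARE_TOOK_OVER = "spare integrated; replacement can become a global spare"
INVESTIGATE = "no rule in this document applies; investigate further"


def choose_procedure(logical_drive_status: str, disk_states: Iterable[str]) -> str:
    """Map a logical drive status and its disk states to the next procedure."""
    failed = sum(1 for s in disk_states if s.strip().upper() in FAILED_DISK_STATES)
    status = logical_drive_status.strip().upper()

    # More than one failed drive, or a FATAL FAIL logical drive.
    if status == "FATAL FAIL" or failed > 1:
        return FATAL_RECOVERY
    # Degraded logical drive with exactly one failed drive.
    if status in {"DRV FAIL", "DRV FAILED"} and failed == 1:
        return REPLACE_SINGLE
    # Logical drive already GOOD: a spare has been integrated.
    if status == "GOOD":
        return SPARE_TOOK_OVER
    return INVESTIGATE
```

The fatal-failure check comes first on purpose, so that a degraded logical drive with two failed disks is never routed to the single-drive replacement path.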

Visually inspect all the drives for amber LEDs which could indicate a failed drive.

Drive LEDs:

Solid green      Good: drive power-up and spin-up OK.
Blinking green   Good: indicates drive activity.
Solid amber      Drive fault: indicates a failed drive.

Before replacing a failed drive, save the configuration settings to NVRAM.

1. From the RAID firmware Main Menu, choose "system Functions → controller maintenance → Save nvram to disks."

2. Choose Yes to confirm.

A message informs you that NVRAM information has been successfully saved. If the array has 3.2x controller firmware, refer to the appropriate revision of the Firmware User's Guide.

Caution - Do not restore the 3.2x NVRAM settings from disk onto a 4.x controller FRU, or vice versa. The NVRAM structures are incompatible.

Removing a Defective Disk Drive

Remember that failure to identify the correct disk drive may result in replacing the wrong disk drive and could cause a loss of data. Be sure that you have identified the correct disk drive. If you are uncertain about the location of the drive, refer to the Sun StorEdge 3000 Family FRU Installation Guide. To prevent any possibility of data loss, back up the data prior to removing disk drives.

Do not remove a defective module unless you have a replacement FRU module to immediately replace the defective module. If you remove a module and do not replace it, you alter the air flow inside the chassis and could overheat the chassis as a result.

When a failed drive is replaced, the system rebuilds the logical drive by restoring the data that was on the failed drive onto a new or spare drive.

Remove the defective disk drive with the following steps.

1. Unlock the locks with the provided key, and gently pull the plastic front bezel away from the front of the unit so that it drops down and is supported by the two hinged brackets on the sides.

2. Turn the thumbscrew of the defective disk drive counterclockwise several full turns until the thumbscrew and drive module are loosened.

3. Gently pull the release handle upward.
4. Pull the drive module out until the drive connector has fully disconnected from the midplane.
5. Wait 20 seconds for the drive to stop spinning and then remove it from the chassis.

Installing a New Disk Drive

Be sure to install a disk drive that is appropriate for your array. Sun Storage 3510 FC array disk drives cannot be used in a Sun Storage 3511 SATA array. Similarly, Sun Storage 3511 SATA array disk drives cannot be used in a Sun Storage 3510 FC array.

To install the replacement disk drive, perform the following steps.

1. Gently slide the drive module into the drive slot until the handle pins slip into the chassis notch.
2. Lower the disk drive handle until it is vertical.
3. Press and hold the drive handle in while you press the thumbscrew in until it engages the threads.
4. Turn the thumbscrew clockwise until it is finger-tight.
5. Push the plastic front bezel onto the front of the unit until it is seated firmly, and use the key to lock the locks.
6. If the replaced drive is in a JBOD directly attached to a server, perform any operations your host software requires to recognize the new drive and bring it under software control.

Verify that the state of the new disk is FRMT DRV, NEW DRV, or ONLINE by issuing the sccli> show disks command or, in the firmware interface, by selecting "view and edit Drives" from the Main Menu.

If the logical drive status was DRV FAILED, check the status again to verify that it has changed to either REBUILDING or GOOD. Monitor the logical drive until the status is GOOD.
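
The verification and monitoring steps above can be sketched in Python as follows. This is a minimal sketch, not a tool shipped with the array: get_status is a hypothetical callable that you would supply (for example, one that returns the status you read from show logical-drives), and the polling interval and poll limit are arbitrary assumptions. Only the state names come from this document.

```python
import time
from typing import Callable

# States this document lists as acceptable for a newly installed disk.
NEW_DISK_OK_STATES = {"FRMT DRV", "NEW DRV", "ONLINE"}


def new_disk_state_ok(state: str) -> bool:
    """True if the replacement disk reports one of the expected states."""
    return state.strip().upper() in NEW_DISK_OK_STATES


def wait_for_rebuild(
    get_status: Callable[[], str],  # hypothetical: returns the logical drive status
    poll_seconds: float = 60.0,     # assumed interval, not from the documentation
    max_polls: int = 1440,          # assumed limit, not from the documentation
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll while the status is REBUILDING and return the first other status.

    A return value of "GOOD" means the rebuild finished; any other value
    should be investigated by the operator.
    """
    for _ in range(max_polls):
        status = get_status().strip().upper()
        if status != "REBUILDING":
            return status
        sleep(poll_seconds)
    raise TimeoutError("logical drive still REBUILDING after max_polls checks")
```

Returning the unexpected status instead of looping forever keeps the operator in control: a status other than GOOD or REBUILDING after a replacement is a reason to stop and check the event log.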


If a global or local spare drive replaced the failed drive and you want the replaced drive to be part of the logical drive again, perform the following steps from the firmware interface menu:

1. Select "view and edit Logical drives" from the Main Menu.
2. Select the logical drive that contains the drive that was the spare.
3. Select "copy and replace drive."
4. Select the drive that was the spare.
5. Select the drive that was replaced.
6. Select Yes to copy and replace the drive.

After the copy and replace is completed, the drive that was the spare needs to be defined as a spare again.

Rebuilding the logical drive restores the RAID integrity to a self-consistent state. This does not guarantee that the data has not been corrupted. All possible application checks should be performed to ensure that the data is not corrupted before it is used for business or production purposes.

Troubleshooting information is available in the following document:

Troubleshooting Sun StorEdge [TM] 33x0/351x Disk Failures (Doc ID 1008190.1)





  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.