Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1004799.1
Update Date:2009-12-22
Keywords:

Solution Type  Technical Instruction Sure

Solution  1004799.1 :   Decoding Sense codes for A1000/A3x000 Products  


Related Items
  • Sun SPARCstorage RSM Array 2000
  • Sun Storage RAID Manager (RM6) Software
  • Sun Storage A3500 SCSI Array
  • Sun Netra st A1000 Array
  • Sun Storage A3500 FC Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - Other

PreviouslyPublishedAs
206662


Description
Users may reference the sense code listing below, which is a copy of the raidcode.txt file included with RAID Manager 6.22.

RAID ERROR CODE DESCRIPTIONS
This document describes the various error codes returned by the
RAID controllers. On detection of an error condition, the
controller will return a status of Check Condition on the command
that had the error. The host should respond with a Request Sense
command. On receipt of the Request Sense, the controller will
return sense data.

SENSE KEYS

The possible Sense Keys returned by the RAID controller in the sense data, on receipt of a Request Sense command, are shown below. The Sense Key is returned in byte 2 (zero-referenced) of the Request Sense data. The Sense Key may be thought of as a summary code for the error. More detailed information about the error is provided by the FRU and ASC/ASCQ codes described in the next sections.

(0x00)-No Sense

The controller has no errors to report at this time.

(0x01)-Recovered Error

The controller detected the error, but was able to recover from it.

(0x02)-Not Ready

The controller is in the process of finishing initialization, and will not allow hosts access to user data until it is ready.

(0x03)-Media Error

A drive attached to the controller detected a media error on itself.

(0x04)-Hardware Error

This Sense Key is typically returned by the controller on most unrecoverable errors.

(0x05)-Illegal Request

A command was issued to the controller that is not allowed (for example, access to a non-existent logical unit).

(0x06)-Unit Attention

The controller is informing the host of an action it took to remedy an exception condition (for example, the controller marked a drive Failed, because the drive could no longer be accessed).

(0x0B)-Aborted Command

The controller could not finish the requested operation. However, in the typical scenario, it will have taken some action to ensure that the error condition would not occur again. Therefore, the next time this same command is received, the same error condition should not occur.

(0x0E)-Miscompare

A failed Verify operation, or a Verify with Parity Check operation failure, will return a Sense Key of Miscompare.
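
As an illustration of the Sense Key list above, the following minimal Python sketch maps byte 2 of a Request Sense buffer to the names used in this document. The function name and the sample buffer are hypothetical. Masking byte 2 with 0x0F follows the SCSI-2 fixed-format sense data layout, where the Sense Key occupies the low four bits; that detail is not stated in this document and should be treated as an assumption.

```python
# Sense Key names exactly as listed in this document.
SENSE_KEYS = {
    0x00: "No Sense",
    0x01: "Recovered Error",
    0x02: "Not Ready",
    0x03: "Media Error",
    0x04: "Hardware Error",
    0x05: "Illegal Request",
    0x06: "Unit Attention",
    0x0B: "Aborted Command",
    0x0E: "Miscompare",
}


def sense_key_name(sense: bytes) -> str:
    """Return the Sense Key name for a raw Request Sense buffer."""
    if len(sense) < 3:
        raise ValueError("sense data too short to contain byte 2")
    # Assumption: SCSI-2 fixed format, Sense Key in the low nibble of byte 2.
    key = sense[2] & 0x0F
    return SENSE_KEYS.get(key, f"Unknown (0x{key:02X})")


# Hypothetical sense buffer whose byte 2 carries Sense Key 6.
example = bytes([0x70, 0x00, 0x06])
print(sense_key_name(example))
```

Keys not listed in this document are reported as unknown rather than guessed.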

FIELD REPLACEABLE UNITS (FRU) CODE DEFINITIONS

Each time an error is detected, the controller will put the Field Replaceable Unit (FRU) code of the failed component in the sense data (byte 14 (zero-referenced) in the sense data for the first error and bytes 26-33 (zero-referenced) for additional errors). To provide meaningful information for troubleshooting, the FRU codes have been grouped. The defined FRU groups are listed below.

FRU Code     Description

0x01         Host Channel Group
0x02         Controller Drive Interface Group
0x03         Controller Buffer Group
0x04         Controller ASIC Group
0x05         Controller Other Group
0x06         Subsystem Group
0x07         Not Used
0x08         Sub-enclosure Group
0x09-0x0F    Reserved
0x10-0xFF    Drive Groups

(0x01)-Host Channel Group

This group consists of the host SCSI bus, its SCSI interface chip, and all initiators and other targets connected to the bus.

(0x02)-Controller Drive Interface Group

This group consists of the SCSI interface chips on the controller which connect to the drive buses.

(0x03)-Controller Buffer Group

This group consists of the controller logic used to implement the on-board data buffer.

(0x04)-Controller Array ASIC

This group consists of the ASICs on the controller associated with the RAID functions.

(0x05)-Controller Other Group

This group consists of all controller-related hardware not associated with another group.

(0x06)-Subsystem Group

This group consists of subsystem components that are monitored by the RAID controller, such as power supplies, fans, thermal sensors, and AC power monitors.

(0x08)-Sub-Enclosure Group

This group consists of the devices such as power supplies, environmental monitor, and other subsystem components in the sub-enclosure.

(0x10-0xFF)-Drive Group

This group consists of a drive (embedded controller, drive electronics, and Head Disk Assembly), its power supply, and the SCSI cable that connects it to the controller; or supporting sub-enclosure environmental electronics. An FRU code denoting a drive contains the channel number (1-relative) in the upper nibble, and the drive's SCSI ID in the lower nibble. For example, a drive on the third channel, SCSI ID 2 would be denoted by an FRU code of 0x32.
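
The FRU groups and the drive nibble encoding described above can be sketched in Python as follows. This is an illustrative helper only; the function name and output wording are hypothetical, and the group names are taken from the FRU table in this document.

```python
# FRU group names as listed in this document's FRU table.
FRU_GROUPS = {
    0x01: "Host Channel Group",
    0x02: "Controller Drive Interface Group",
    0x03: "Controller Buffer Group",
    0x04: "Controller ASIC Group",
    0x05: "Controller Other Group",
    0x06: "Subsystem Group",
    0x07: "Not Used",
    0x08: "Sub-enclosure Group",
}


def describe_fru(fru: int) -> str:
    """Describe a single FRU code byte."""
    if not 0x00 <= fru <= 0xFF:
        raise ValueError("FRU code must fit in one byte")
    if fru >= 0x10:
        # Drive group: upper nibble is the 1-relative channel number,
        # lower nibble is the drive's SCSI ID.
        channel = fru >> 4
        scsi_id = fru & 0x0F
        return f"Drive Group: channel {channel}, SCSI ID {scsi_id}"
    if fru >= 0x09:
        return "Reserved"
    return FRU_GROUPS.get(fru, f"Unknown FRU code 0x{fru:02X}")


print(describe_fru(0x32))  # the document's example: third channel, SCSI ID 2
```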

ADDITIONAL SENSE CODES AND QUALIFIERS

This section lists the Additional Sense Code (ASC), and Additional Sense Code Qualifier (ASCQ) values returned by the RAID controller in the sense data. The ASC and ASCQ provide detailed information about the specific error.

SCSI-2 defined codes are used whenever possible. Array specific error codes are used when necessary, and are assigned SCSI-2 vendor unique codes 0x80 to 0xFF.

The most probable Sense Keys (listed below for reference) returned for each error are also listed in the table. Sense Keys of 6 in parentheses indicate that 6 (Unit Attention) would be the nominal Sense Key reported; however, the actual value would be that set in the "Sense Key for Vendor-unique Conditions" field in the User-configurable options of the NVSRAM.

ASCs and ASCQs are normally returned in bytes 12 and 13 (zero-referenced) of the sense data. On multiple errors (defined as errors that occurred on the same command, not necessarily as errors that occurred simultaneously), there may be additional ASCs and ASCQs in the ASC/ASCQ stack, which are bytes 22-25 (zero-referenced) of the sense data. In most cases, the first error detected is stored in bytes 12 and 13 of the sense data; subsequent errors are stored in the ASC/ASCQ stack.
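
The byte offsets above can be sketched in Python as follows. The function name is hypothetical. Two points are assumptions rather than statements from this document: that bytes 22-25 hold two consecutive ASC/ASCQ pairs, and that a pair of 00 00 in the stack marks an unused slot.

```python
def asc_ascq_pairs(sense: bytes) -> list:
    """Return (ASC, ASCQ) tuples: the primary pair, then any stacked pairs."""
    if len(sense) < 14:
        raise ValueError("sense data too short to contain bytes 12 and 13")
    # First error: bytes 12 and 13 (zero-referenced).
    pairs = [(sense[12], sense[13])]
    # ASC/ASCQ stack: bytes 22-25. Assumed to be two (ASC, ASCQ) pairs.
    stack = sense[22:26]
    for i in range(0, len(stack) - 1, 2):
        pair = (stack[i], stack[i + 1])
        # Assumption: 00 00 means the stack slot is empty.
        if pair != (0x00, 0x00):
            pairs.append(pair)
    return pairs


# Hypothetical buffer: 3F C7 as the first error, 3F C8 stacked behind it.
buf = bytearray(34)
buf[12], buf[13] = 0x3F, 0xC7
buf[22], buf[23] = 0x3F, 0xC8
for asc, ascq in asc_ascq_pairs(bytes(buf)):
    print(f"{asc:02X} {ascq:02X}")
```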

The following section lists all possible ASC/ASCQ combinations returned by the controller.

ASC ASCQ Sense Key

00 00 0

No Additional Sense Information

The controller has no errors to report for the requesting host and addressed logical unit combination.

ASC ASCQ Sense Key

04 01 2

Logical Unit In Process Of Becoming Ready

The controller is executing its initialization functions on the addressed logical unit. This includes drive spin-up and validation of the drive and logical unit configuration information. This error is normally returned on commands following the initial Inquiry command after a power-up/reset.

ASC ASCQ Sense Key

04 02 2

Logical Unit Not Ready, Initializing Command Required

The controller is configured to wait for a Start/Stop Unit command before spinning up the drives, but the command has not yet been received.

ASC ASCQ Sense Key

04 04 2

Logical Unit Not Ready, Format In Progress

The controller previously received a Format Unit command from an initiator, and is executing that command on this logical unit. Other commands cannot be sent to this logical unit until the Format Unit completes.

ASC ASCQ Sense Key

04 81 2

Firmware Versions Incompatible

The versions of firmware on the redundant controllers are incompatible/inconsistent. This is probably because you replaced a failed controller with a new controller that does not have the same version of firmware. Controllers with an incompatible version of firmware may cause unexpected results. Therefore, you must download new firmware as soon as possible. Use the Recovery Guru/Health Check in the Recovery Application to obtain instructions on how to download firmware to make the versions consistent.

ASC ASCQ Sense Key

04 A1 2

Quiescence Is In Progress or Has Been Achieved

ASC ASCQ Sense Key

0C 00 4,(6)

Unrecoverable Write Error

If this error is reported during normal operation, the controller has detected an error on a write operation to a drive, but was unable to recover from the error. The drive that failed the write operation will be marked Failed.

Drive Marked Offline Due To Internal Recovery Procedure

An error has occurred during interrupted write processing
causing the LUN to transition to the Dead state. Drives in the
drive group that did not experience the read error will transition
to the Offline state (0x0B) and log this error.

ASC ASCQ Sense Key

3F BD (6)

Drive Has Incorrect Critical Parameters Set

The controller was unable to query the drive for its current critical mode page settings, or was unable to change these to the correct setting. Currently, this indicates the Qerr bit is set incorrectly on the drive specified in the FRU field of the Request Sense data.

ASC ASCQ Sense Key

3F C3 (6)

Channel Failure

The controller failed a channel, and will not access drives on this channel any more. The FRU Group Qualifier (byte 26) in the sense data will indicate the 1-relative channel number of the failed channel. This condition is typically caused by a drive ignoring SCSI protocol on one of the controller's destination channels. The controller typically fails a channel if it issued a reset on a channel, and it continued to see drives ignore the SCSI Bus Reset on this channel.
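
A minimal Python sketch of reading the failed channel number for this condition follows. The function name is hypothetical, and the sketch assumes the Channel Failure code is the first error, so that 3F C3 appears in bytes 12 and 13 of the sense data.

```python
from typing import Optional


def failed_channel(sense: bytes) -> Optional[int]:
    """Return the 1-relative failed channel number, or None if not applicable."""
    if len(sense) <= 26:
        return None  # buffer does not reach the FRU Group Qualifier byte
    # Assumption: Channel Failure (ASC 3F, ASCQ C3) is the first error reported.
    if (sense[12], sense[13]) != (0x3F, 0xC3):
        return None
    # FRU Group Qualifier (byte 26) holds the 1-relative channel number.
    return sense[26]


# Hypothetical buffer reporting a failure of channel 2.
buf = bytearray(34)
buf[12], buf[13], buf[26] = 0x3F, 0xC3, 0x02
print(failed_channel(bytes(buf)))
```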

ASC ASCQ Sense Key

3F C7 (6)

Non-Media Component Failure

(1) A subsystem component other than a drive or controller has failed (for example, fan, power supply, battery) or (2) An over-temperature condition has occurred (some RAID modules contain a temperature sensor). The fans, power supplies, and battery are usually located in the controller module tray. The FRU codes will indicate the faulty component. The user should replace the component indicated.

ASC ASCQ Sense Key

3F C8 (6)

AC Power Fail

The Uninterruptible Power Source (UPS) has indicated that AC power is no longer present and the UPS has switched to standby power. While there is no immediate cause for concern, users should save their work frequently, in case the battery is suddenly depleted.

ASC ASCQ Sense Key

3F C9 (6)

Standby Power Depletion Imminent

The Uninterruptible Power Source (UPS) has indicated that its standby power source is nearing depletion. The host should take actions to stop IO activity to the controller. Normally, the controller will change from a write-back caching mode to a write-through mode. The user should not change again to write-back mode until full AC power has been restored.

ASC ASCQ Sense Key

3F CA (6)

Standby Power Source Not At Full Capacity

The Uninterruptible Power Source (UPS) has indicated that its standby power source is not at full capacity. To prevent loss of data in the event of the failure of AC power, the user should not activate write-back caching mode until full UPS power has been restored.

ASC ASCQ Sense Key

3F CB (6)

AC Power Has Been Restored

The Uninterruptible Power Source (UPS) has indicated that AC power is now being used to supply power to the controller.

ASC ASCQ Sense Key

3F D0 (6)

Write-Back Cache Battery Discharged

The controller has detected that its battery is no longer charged. If a power failure were to occur, any dirty user data in cache would be lost. To prevent the loss of any user data, the user should either: (1) replace this controller with another, or (2) turn off write-back cache.

ASC ASCQ Sense Key

3F D1 (6)

Write-Back Cache Battery Charged

The controller has detected that its battery is now fully charged, and will be capable of holding up the cache contents in the event of a power failure. The user may switch to write-back mode, if desired.

ASC ASCQ Sense Key

3F D8 (6)

Battery Reached Expiration

The controller has failed the battery because the battery has reached its expiration date. You should replace the battery as soon as possible.

ASC ASCQ Sense Key

3F D9 (6)

Battery Near Expiration

The controller has detected that the battery is nearing its expiration date. You should replace the battery as soon as possible.

ASC ASCQ Sense Key

3F E0 (6)

Logical Unit Failure

The controller has placed the logical unit in a "Dead" state. User data and/or parity can no longer be maintained to ensure availability. The most likely cause is the failure of a single drive in non-redundant configurations or a second drive in a configuration protected by one drive. The data on the logical unit is no longer accessible.

ASC ASCQ Sense Key

3F EB (6)

LUN Marked Dead Due To Media Error Failure

An error has occurred during interrupted write processing during Start of Day, causing the LUN to transition to the Dead state.

ASC ASCQ Sense Key

40 NN 4,(6)

Diagnostic Failure On Component NN (0x80 - 0xFF)

The controller has detected the failure of an internal controller component. This failure may have been detected during operation as well as during an on-board diagnostic routine. The values of NN supported in this release are listed as follows:

> 80 - Processor RAM

> 81 - RAID buffer

> 82 - NVSRAM

> 83 - RAID Parity Assist (RPA) chip

> 84 - Battery-backed NVSRAM or clock failure

> 91 - Diagnostic self test failed non-data transfer components test (most likely controller cache holdup battery discharge)

> 92 - Diagnostic self test failed data transfer components test

> 93 - Diagnostic self test failed drive Read/Write Buffer data turnaround test

> 94 - Diagnostic self test failed drive Inquiry access test

> 95 - Diagnostic self test failed drive Read/Write data turnaround test

> 96 - Diagnostic self test failed drive self test

In a dual controller environment, the user should place this controller offline (hold in reset), unless the error indicates controller battery failure, in which case the user should wait for the batteries to recharge. In single controller environments, the user should not use this subsystem until the controller has been replaced.
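
The NN values listed above lend themselves to a simple lookup. The following Python sketch is illustrative only; the function name and the shortened description strings are hypothetical, while the NN values themselves come from the list in this document.

```python
# Component codes (NN) for ASC 0x40, as listed in this document.
DIAGNOSTIC_COMPONENTS = {
    0x80: "Processor RAM",
    0x81: "RAID buffer",
    0x82: "NVSRAM",
    0x83: "RAID Parity Assist (RPA) chip",
    0x84: "Battery-backed NVSRAM or clock failure",
    0x91: "Self test failed: non-data transfer components test",
    0x92: "Self test failed: data transfer components test",
    0x93: "Self test failed: drive Read/Write Buffer data turnaround test",
    0x94: "Self test failed: drive Inquiry access test",
    0x95: "Self test failed: drive Read/Write data turnaround test",
    0x96: "Self test failed: drive self test",
}


def describe_diagnostic_failure(asc: int, ascq: int) -> str:
    """Describe a Diagnostic Failure (ASC 0x40) using the NN value in ASCQ."""
    if asc != 0x40:
        raise ValueError("not a Diagnostic Failure code (ASC must be 0x40)")
    name = DIAGNOSTIC_COMPONENTS.get(ascq)
    if name is None:
        # NN values not listed in this document are reported, not guessed.
        return f"Diagnostic failure on unlisted component 0x{ascq:02X}"
    return f"Diagnostic failure: {name}"


print(describe_diagnostic_failure(0x40, 0x83))
```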

ASC ASCQ Sense Key

43 00 4

Message Error

The controller attempted to send a message to the host, but the host responded with a Reject message.

ASC ASCQ Sense Key

44 00 4,B

Internal Target Failure

The controller has detected a hardware or software condition that does not allow the requested command to be completed. If the Sense Key is 0x04, indicating a Hardware Error, the controller has detected what it believes is a fatal hardware or software failure, and it is unlikely that just a retry of the command would be successful. If the Sense Key is 0x0B, indicating an Aborted Command, the controller has detected what it believes is a temporary software failure that is likely to be recovered if retried.
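
The retry guidance above can be expressed as a small decision helper. This is a hedged sketch of how a host-side tool might apply it, not a description of any actual driver; the function and constant names are hypothetical.

```python
HARDWARE_ERROR = 0x04   # Sense Key: a retry alone is unlikely to succeed
ABORTED_COMMAND = 0x0B  # Sense Key: temporary failure, likely recovered on retry


def retry_worthwhile(sense_key: int, asc: int, ascq: int) -> bool:
    """For Internal Target Failure (44 00), say whether a retry is likely to help."""
    if (asc, ascq) != (0x44, 0x00):
        raise ValueError("this helper only covers ASC/ASCQ 44 00")
    # Only Aborted Command is described as likely to recover on retry.
    return sense_key == ABORTED_COMMAND


print(retry_worthwhile(ABORTED_COMMAND, 0x44, 0x00))
print(retry_worthwhile(HARDWARE_ERROR, 0x44, 0x00))
```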

ASC ASCQ Sense Key

45 00 4

Selection Time-out On A Destination Bus

A drive did not respond to selection within a selection time-out period. Possible reasons for this error include drive failure, channel failure, or the possibility of an incomplete hot-swap holding the whole channel in reset.

ASC ASCQ Sense Key

47 00 1,B

SCSI Parity Error

The controller detected a parity error on the host SCSI bus or one of the drive SCSI buses.

ASC ASCQ Sense Key

48 00 1,B

Initiator Detected Error Message Received

The controller received an Initiator Detected Error Message from the host during the operation.

ASC ASCQ Sense Key

49 00 B

Invalid Message Error

The controller received a message from the host that is not supported or was out of context when received.

ASC ASCQ Sense Key

49 80 B

Drive Reported Reservation Conflict

A drive returned a status of Reservation Conflict.

ASC ASCQ Sense Key

4B 00 1,4

Data Phase Error

The controller encountered an error while transferring data to/from the initiator or to/from one of the drives.

ASC ASCQ Sense Key

4E 00 B

Overlapped Commands Attempted

The controller received a tagged command while it had an untagged command pending from the same initiator, or it received an untagged command while it had a tagged command(s) pending from the same initiator.

ASC ASCQ Sense Key

5D 80 6

Drive Sun StorageTek A3500 Array
Sun StorageTek A3500 FC Array
SPARCstorage RSM Array 2000
Netra st A1000 Array

Raid Manager, RM6, Sonoma, raidcode, raidcode.txt, logutil
Previously Published As
49525

Change History
Date: 2007-12-03
User Name: 97961
Action: Approved
Comment: Publishing. No further edits required.
Version: 12
Date: 2007-12-02
User Name: 97961
Action: Accept
Comment:
Version: 0

Date: 2007-12-02
User Name: 100761
Action: Approved
Comment: Made minor grammatical changes. Ready to be published.
Version: 0

Date: 2007-12-01
User Name: 116529
Action: Approved
Comment: Please review.
Version: 0

Date: 2009-12-01
User: 88109
Comment: Changed title, and removed all references to LSI. Removed 6xxx products from the product list. Broke up the steps to follow into Description and Steps to follow to make editing easier.

  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.