Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 1003547.1: Sun StorEdge[TM] 3510/3511/3310/3320: Getting "soft errors" reported by iostat
Previously Published As: 204988
Applies to:
Sun Storage 3310 Array
Sun Storage 3510 FC Array
Sun Storage 3511 SATA Array
Sun Storage 3320 SCSI Array - Version: Not Applicable and later [Release: N/A and later]
All Platforms

Symptoms

A Sun Storage[TM] 3510 is receiving thousands of soft errors on two logical disks (c5t44d0 and c6t44d0 in the output below). Soft errors are reported by the "iostat -e" command, run here as "iostat -en":

    # iostat -en
      ---- errors ---
         s/w h/w trn    tot
           0   0   0      0 c1t0d0
           0   0   0      0 c1t1d0
           0   0   0      0 c2t0d0
           0   0   0      0 c2t1d0
           0   6   0      6 c0t0d0
      116568   0   0 116568 c5t44d0
           0   0   0      0 c8t44d0
           0   0   0      0 c5t44d2
           0   0   0      0 c5t44d1
           0   0   0      0 c8t44d2
           0   0   0      0 c8t44d1
           0   0   0      0 c6t44d3
           0   0   0      0 c6t44d2
           0   0   0      0 c6t44d1
        1592   0   0   1592 c6t44d0
           0   0   0      0 c7t44d3
           0   0   0      0 c7t44d2
           0   0   0      0 c7t44d1
           0   0   0      0 c7t44d0

There are no messages in the /var/adm/messages file, and there appears to be nothing wrong with the array itself. Why are these soft errors occurring?

Cause

Background on soft errors:

The kernel statistic for soft errors (reported by iostat) is incremented even for EXPECTED conditions. These errors do not mean that there is anything wrong with the array. Soft errors indicate that an application is sending a command that the array cannot perform, so the array rejects the request. For example, a "soft error" is generated when a CDB is invalid for the device to which it is sent, such as when a tape rewind request is sent to a disk drive.

To see the actual command being rejected by the array, put the following setting *temporarily* into the /etc/system file and reboot:

    For the 3510 and 3511:  set ssd:ssd_error_level=0
    For the 3310 and 3320:  set sd:sd_error_level=0

The temporary setting causes additional informational messages from the ssd (sd) driver to appear in the /var/adm/messages file (see the example messages further below). Unfortunately, the messages identify only the invalid commands themselves, not the application that is sending them.
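Because the setting is meant to be temporary, it can help to confirm whether an active error_level line is present in /etc/system. The following is a minimal Python sketch and not part of the original solution: the function name find_error_level_settings and the sample text are hypothetical, and the sketch assumes that lines beginning with "*" or "#" are comments.

```python
import re

# Matches "set ssd:ssd_error_level=<value>" or "set sd:sd_error_level=<value>",
# tolerating extra whitespace. The backreference \1 requires the module name
# and the variable prefix to agree (ssd:ssd_... or sd:sd_...).
_SETTING = re.compile(r"^\s*set\s+(ssd|sd):\1_error_level\s*=\s*(\S+)")


def find_error_level_settings(system_text):
    """Return (line_number, module, value) for each active error_level line."""
    hits = []
    for lineno, line in enumerate(system_text.splitlines(), start=1):
        # Assumption: comment lines start with '*' or '#'.
        if line.lstrip().startswith(("*", "#")):
            continue
        match = _SETTING.match(line)
        if match:
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


# Hypothetical file contents used only for illustration.
sample = """\
* temporary: remove after troubleshooting
set ssd:ssd_error_level=0
* set sd:sd_error_level=0
"""

print(find_error_level_settings(sample))
```

On a real host the text would come from reading /etc/system. An empty result suggests that no active error_level override is configured in that file.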
NOTE: Do not leave the temporary value in effect for long, because it causes many informational messages to be logged. As soon as you are done troubleshooting, remove the line from the /etc/system file and reboot again.

Cause:

One of the more common causes of soft errors on the Sun Storage[TM] 3000 family of products is concurrent monitoring of the hardware from more than one host. These arrays are monitored using the SSCS software, which includes the ssagent and the diagnostic reporter. The SSCS software should be configured and run from only *ONE* host, even if the array is connected to multiple hosts.

The following are the types of messages generated (after setting "ssd_error_level" to 0) when multiple hosts are monitoring the same array:

    Failed CDB:0x1d 0x10 0x0 0x0 0x10 0x0 0x0 0x0 0x0 0x0 0x0 0x0
    scsi: /ssm@0,0/pci@18,600000/SUNW,qlc@1,1/fp@0,0/ssd@w256000c0ffc0869f,0 (ssd15):
    Sense Data:0x70 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 0x4e 0x0 0x0 0x0 0x0 0x0 0xca 0xfe
    scsi: /ssm@0,0/pci@18,600000/SUNW,qlc@1,1/fp@0,0/ssd@w256000c0ffc0869f,0 (ssd15):

Solution
To stop the soft errors, unconfigure or remove the SSCS monitoring on all but one of the hosts connected to the array.
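The "Failed CDB" and "Sense Data" messages shown under Cause can be read by hand, but a short decoder makes the fields explicit. The following Python sketch is an illustration only: the function names and the three lookup tables are hypothetical and deliberately contain just the codes that occur in this example. It assumes fixed-format sense data, with the sense key in the low four bits of byte 2 and the additional sense code and qualifier in bytes 12 and 13.

```python
# Tiny excerpts of standard SCSI code assignments; only the values that
# occur in the example messages are listed here.
OPCODES = {0x1D: "SEND DIAGNOSTIC"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x4E, 0x00): "OVERLAPPED COMMANDS ATTEMPTED"}


def parse_hex_bytes(text):
    """Turn a string such as '0x1d 0x10 0x0' into a list of integers."""
    return [int(token, 16) for token in text.split()]


def decode(cdb_text, sense_text):
    """Decode the opcode, sense key and ASC/ASCQ from the logged byte strings."""
    cdb = parse_hex_bytes(cdb_text)
    sense = parse_hex_bytes(sense_text)
    # Response codes 0x70 and 0x71 mark fixed-format sense data.
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("only fixed-format sense data is handled")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    return {
        "opcode": OPCODES.get(cdb[0], hex(cdb[0])),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
    }


# Byte strings copied from the example messages in this solution.
cdb_line = "0x1d 0x10 0x0 0x0 0x10 0x0 0x0 0x0 0x0 0x0 0x0 0x0"
sense_line = ("0x70 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 "
              "0x4e 0x0 0x0 0x0 0x0 0x0 0xca 0xfe")

print(decode(cdb_line, sense_line))
```

For the example messages this reports a SEND DIAGNOSTIC command rejected with sense key ILLEGAL REQUEST and "OVERLAPPED COMMANDS ATTEMPTED", which appears consistent with more than one host issuing monitoring commands to the same array at the same time.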
Attachments
This solution has no attachment