Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1003547.1
Update Date:2011-01-06
Keywords:

Solution Type  Problem Resolution Sure

Solution  1003547.1 :   Sun StorEdge[TM] 3510/3511/3310/3320: Getting "soft errors" reported by iostat  


Related Items
  • Sun Storage 3510 FC Array
  • Sun Storage 3310 Array
  • Sun Storage 3511 SATA Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
204988


Applies to:

Sun Storage 3310 Array
Sun Storage 3510 FC Array
Sun Storage 3511 SATA Array
Sun Storage 3320 SCSI Array - Version: Not Applicable and later    [Release: N/A and later]
All Platforms

Symptoms

A Sun Storage[TM] 3510 is logging thousands of soft errors on two
logical disks (c5t44d0 and c6t44d0 in the output below). The soft errors
are reported by the "iostat -en" command:

# iostat -en
     ---- errors ---
   s/w h/w trn    tot
     0   0   0      0 c1t0d0
     0   0   0      0 c1t1d0
     0   0   0      0 c2t0d0
     0   0   0      0 c2t1d0
     0   6   0      6 c0t0d0
116568   0   0 116568 c5t44d0
     0   0   0      0 c8t44d0
     0   0   0      0 c5t44d2
     0   0   0      0 c5t44d1
     0   0   0      0 c8t44d2
     0   0   0      0 c8t44d1
     0   0   0      0 c6t44d3
     0   0   0      0 c6t44d2
     0   0   0      0 c6t44d1
  1592   0   0   1592 c6t44d0
     0   0   0      0 c7t44d3
     0   0   0      0 c7t44d2
     0   0   0      0 c7t44d1
     0   0   0      0 c7t44d0
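
On a host with many LUNs, the affected devices are easy to miss among the zero rows. The following is a minimal Python sketch, not part of the original solution, that picks out the devices with a nonzero s/w count from captured "iostat -en" text. The function name and the sample text are hypothetical, and the sketch assumes every data row has four integer counters followed by the device name, as in the output above.

```python
def soft_error_devices(iostat_text):
    """Return {device: soft_error_count} for rows with a nonzero s/w counter."""
    result = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        # Data rows: four integer counters (s/w h/w trn tot) plus a device name.
        # Header lines do not match this shape and are skipped.
        if len(fields) != 5 or not all(f.isdigit() for f in fields[:4]):
            continue
        soft = int(fields[0])
        if soft > 0:
            result[fields[4]] = soft
    return result


# A few rows copied from the output above, used as sample input.
SAMPLE = """\
---- errors ---
s/w h/w trn tot
0 6 0 6 c0t0d0
116568 0 0 116568 c5t44d0
0 0 0 0 c8t44d0
1592 0 0 1592 c6t44d0
"""

if __name__ == "__main__":
    for device, count in soft_error_devices(SAMPLE).items():
        print(device, count)
```

Note that c0t0d0 is not reported by this sketch, because its six errors are in the h/w column rather than the s/w column.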

There are absolutely no messages in the /var/adm/messages file, and there
seems to be nothing wrong with the array itself.

Why are these soft errors occurring?

Cause

Background on soft errors:

The kernel soft error counter (reported by iostat) is incremented even
for EXPECTED conditions, so these errors do not by themselves mean there
is a problem with the array. A soft error is counted when an application
sends the array a command that the array cannot perform, so the array
rejects the request. For example, a "soft error" is generated when a SCSI
command descriptor block (CDB) is invalid for the device to which it is
sent, such as when a tape rewind request is sent to a disk drive.

To see the actual command being rejected by the array, you can put the
setting that matches your array *temporarily* into the /etc/system file
and then reboot.

For the 3510 and 3511 (ssd driver):
set ssd:ssd_error_level=0

For the 3310 and 3320 (sd driver):
set sd:sd_error_level=0

The temporary setting causes additional informational messages from the
ssd (sd) driver to appear in the /var/adm/messages file (as shown below).
Unfortunately, the messages identify only the invalid commands, not the
application that is sending them.

NOTE: Do not leave the temporary value in effect for long, because it
causes many informational messages to be logged. As soon as you are done
troubleshooting, remove the line from the /etc/system file and reboot
again.
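
Because the setting is easy to forget once troubleshooting is over, it can help to check the file for leftovers. The following is a minimal Python sketch, not part of the original solution, that scans the text of /etc/system for an active ssd_error_level or sd_error_level line. The function name and the sample text are hypothetical. Lines commented out with a leading "*" do not match the pattern and are therefore ignored.

```python
import re

# Matches active lines such as "set ssd:ssd_error_level=0" or
# "set sd:sd_error_level = 0". The back-reference \1 requires the module
# name and the variable prefix to agree (ssd:ssd_... or sd:sd_...).
_ERROR_LEVEL = re.compile(r"^\s*set\s+(ssd|sd):\1_error_level\s*=\s*\S+")


def leftover_debug_settings(system_text):
    """Return (line_number, line) pairs for active *_error_level settings."""
    hits = []
    for number, line in enumerate(system_text.splitlines(), start=1):
        if _ERROR_LEVEL.match(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical file contents, for illustration only.
SAMPLE = """\
* set ssd:ssd_error_level=0
set ssd:ssd_error_level=0
set maxusers=2048
"""

if __name__ == "__main__":
    for number, line in leftover_debug_settings(SAMPLE):
        print(f"line {number}: {line}")
```

On a real host the input would be the contents of /etc/system, for example `leftover_debug_settings(open("/etc/system").read())`.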

One of the more common causes of soft errors on the Sun Storage[TM] 3000
family of products is concurrent monitoring of the hardware from more
than one host. These arrays are monitored using the SSCS software, which
includes the ssagent and the Diagnostic Reporter. The SSCS software
should be configured and run from only *ONE* host, even if the array is
connected to multiple hosts.

The following are the types of messages generated (after setting
"ssd_error_level" to 0) when multiple hosts are monitoring the same array:

Failed CDB:0x1d 0x10 0x0 0x0 0x10 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi: /ssm@0,0/pci@18,600000/SUNW,qlc@1,1/fp@0,0/ssd@w256000c0ffc0869f,0
(ssd15):
Sense Data:0x70 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 0x4e 0x0 0x0
0x0 0x0 0x0 0xca 0xfe
scsi: /ssm@0,0/pci@18,600000/SUNW,qlc@1,1/fp@0,0/ssd@w256000c0ffc0869f,0
(ssd15):
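
The solution does not decode these messages. As a minimal sketch, assuming the sense data is in the SCSI fixed format (response code 0x70 or 0x71, with the sense key in the low nibble of byte 2 and the ASC/ASCQ pair in bytes 12 and 13), the following Python pulls out the fields of interest. The function names are hypothetical, and the lookup tables are deliberately partial: they contain only the standard SCSI values that occur in the messages above.

```python
# Partial lookup tables: only the standard SCSI values seen in this example.
OPCODES = {0x1D: "SEND DIAGNOSTIC"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x4E, 0x00): "OVERLAPPED COMMANDS ATTEMPTED"}


def parse_hex_bytes(text):
    """Turn '0x1d 0x10 ...' into a list of integers."""
    return [int(token, 16) for token in text.split()]


def decode(cdb_text, sense_text):
    """Decode the CDB opcode and fixed-format sense data from driver messages."""
    cdb = parse_hex_bytes(cdb_text)
    sense = parse_hex_bytes(sense_text)
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    opcode = cdb[0]
    key = sense[2] & 0x0F
    code = (sense[12], sense[13])
    return {
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc_ascq": ASC_ASCQ.get(code, "0x%02x/0x%02x" % code),
    }


# Byte strings copied from the messages above.
CDB = "0x1d 0x10 0x0 0x0 0x10 0x0 0x0 0x0 0x0 0x0 0x0 0x0"
SENSE = ("0x70 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 0x4e 0x0 0x0 "
         "0x0 0x0 0x0 0xca 0xfe")

if __name__ == "__main__":
    for field, value in decode(CDB, SENSE).items():
        print(field, "=", value)
```

Read this way, the rejected command is a SEND DIAGNOSTIC that the array refused with ILLEGAL REQUEST, which appears consistent with monitoring agents on more than one host addressing the same array.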



Solution

To stop the soft errors, unconfigure or remove the SSCS monitoring
software on all but one of the connected hosts.

  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.