Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1000319.1
Update Date: 2011-02-25
Keywords:

Solution Type  Sun Alert Sure

Solution  1000319.1 :   Data Inconsistencies May Occur When Persistent SCSI Parity Errors are Generated Between the Host and the SE33x0 Array  


Related Items
  • Sun Storage 3310 Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Data Loss
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

PreviouslyPublishedAs
200437


Product
Sun StorageTek 3310 SCSI Array
Sun StorageTek 3320 SCSI Array

Bug Id
<SUNBUG: 6363490>, <SUNBUG: 6378796>

Date of Workaround Release
12-JAN-2006

Date of Resolved Release
13-Mar-2008

Data Inconsistencies May Occur When Persistent SCSI Parity Errors are Generated Between the Host and the SE33x0 Array

1. Impact

When the connection between the host and the SE33x0 array has degraded to the point that WRITE requests cannot be completed, persistent SCSI parity errors may be generated between the host and the SE33x0 array, and data inconsistencies may occur.

2. Contributing Factors

This issue can occur on the following platforms:

  • Sun StorEdge 3310 SCSI array without firmware 4.15F (as delivered in patch 113722-15)
  • Sun StorEdge 3320 SCSI array without firmware 4.15G (as delivered in patch 113730-01)

SCSI parity errors can cause invalid data to be written into the array's cache. Prior to firmware version 4.15, this data is eventually flushed to the disk media, permanently storing the invalid data on the volume. Firmware version 4.15 discards the corrupted data rather than writing it to disk media, which reduces the probability of corrupting the volume. However, in the rare case where the failed write command overlaps a prior write command's data that still resides in cache, that prior data is also discarded.
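The difference between the two firmware behaviors can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class name WriteCacheModel, its fields, and the one-value-per-LBA cache are hypothetical and do not reflect the actual controller firmware.

```python
from dataclasses import dataclass, field


@dataclass
class WriteCacheModel:
    """Toy model of an array write cache (hypothetical, one value per LBA)."""
    discard_on_parity_error: bool                 # True models the 4.15 behavior
    cache: dict = field(default_factory=dict)     # LBA -> data not yet flushed
    media: dict = field(default_factory=dict)     # LBA -> data on disk

    def write(self, lba, data, parity_error=False):
        if parity_error and self.discard_on_parity_error:
            # 4.15 behavior: drop the suspect data. An earlier write to the
            # same LBA that is still in cache is dropped along with it.
            self.cache.pop(lba, None)
            return
        # Clean write, or pre-4.15 behavior: the data enters the cache.
        self.cache[lba] = data

    def flush(self):
        # Move everything in cache to the disk media.
        self.media.update(self.cache)
        self.cache.clear()


def run_scenario(discard_on_parity_error):
    """A good write followed by an overlapping write hit by a parity error."""
    array = WriteCacheModel(discard_on_parity_error)
    array.media[100] = "older data"
    array.write(100, "good data")
    array.write(100, "CORRUPT", parity_error=True)
    array.flush()
    return array.media[100]


if __name__ == "__main__":
    print("pre-4.15:", run_scenario(False))
    print("4.15:    ", run_scenario(True))
```

In this model the older firmware ends with the corrupt value on media, while the 4.15 behavior keeps corrupt data off the media but also loses the overlapping good write that was still in cache.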

Single Path Configurations

Configurations in which a host has only one path to one or more logical units on the array are exposed to this problem. With no redundant path between the host and the SE33x0 array, a failed WRITE request cannot be retried over a second path.

When using firmware version 4.15 in this configuration, if any write command fails due to parity errors, write data in cache may be lost if the application or file system issued writes to overlapping LBAs.

When using older firmware in this configuration, the LBAs of any WRITE request that cannot be completed as a result of a PARITY ERROR returned by the SE33x0 array should be considered to contain invalid data.

Multi Path/High Availability Configurations

The exposure for a properly configured High Availability configuration that uses a host multi-pathing driver and multiple separate connections between the host(s) and the SE33x0 array is very small. In this configuration, the multi-pathing driver in the host uses the second, non-compromised path to the array controller to retry the WRITE request. A successful retry writes the intended data to the correct LBAs, with the following exceptions:

1. If the SE33x0 array or the host experiences a power failure between the failed WRITE request and the successful completion of the retry down the second path, the data for the failed WRITE request should be considered invalid.

2. If the Host OS experiences a crash or a multi-path driver error between the failed WRITE request and the successful completion of the retry down the second path, the data for the failed WRITE request should be considered invalid.
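The retry over a second path can be sketched as follows. This is an illustration only, not the logic of any particular multi-pathing driver; ParityError, write_with_failover and the path callables are hypothetical names.

```python
class ParityError(Exception):
    """Hypothetical: raised when a path reports a SCSI parity error."""


def write_with_failover(paths, lba, data):
    """Send a WRITE down each path in turn until one succeeds.

    `paths` is a list of callables taking (lba, data). Returns the index
    of the path that completed the WRITE, or raises OSError if every
    path failed.
    """
    last_error = None
    for index, send in enumerate(paths):
        try:
            send(lba, data)
            return index
        except ParityError as err:
            # The data for this WRITE is suspect until a retry succeeds.
            last_error = err
    raise OSError(f"WRITE to LBA {lba} failed on all paths") from last_error


if __name__ == "__main__":
    stored = {}

    def degraded_path(lba, data):
        raise ParityError("parity error on primary path")

    def healthy_path(lba, data):
        stored[lba] = data

    used = write_with_failover([degraded_path, healthy_path], 42, b"payload")
    print("completed on path", used)
```

The two exceptions listed above correspond to an interruption between the failed attempt and the successful retry, that is, while the loop is between paths.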

3. Symptoms

Should the described issue occur, persistent SCSI parity errors will be generated between the host and the SE33x0 array. The SE33x0 array will return a SCSI status of "Parity Error" to the host SCSI Host Bus Adapter (HBA). Typically, the host SCSI HBA will retry the WRITE request some number of times (most drivers attempt between 2 and 6 retries) before returning the WRITE request to the application with a FAILURE status.
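The retry behavior described above can be sketched as a bounded loop. The function name issue_write, the status strings, and the default of 3 retries are assumptions chosen for illustration; actual HBA drivers differ.

```python
def issue_write(send, lba, data, max_retries=3):
    """Attempt a WRITE once, then retry up to `max_retries` times.

    `send` is a callable returning "GOOD" or "PARITY ERROR" (hypothetical
    status strings). Returns (final_status, attempts_made).
    """
    attempts = 0
    for _ in range(1 + max_retries):
        attempts += 1
        if send(lba, data) == "GOOD":
            return "GOOD", attempts
    # Every attempt hit a parity error: report FAILURE to the application.
    return "FAILURE", attempts


if __name__ == "__main__":
    def persistent_parity_error(lba, data):
        return "PARITY ERROR"

    print(issue_write(persistent_parity_error, 0, b"data"))
```

With a persistent parity error every attempt fails, so the application sees FAILURE only after the initial attempt and all retries have been exhausted.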

4. Workaround

There is no workaround for this issue. Please see the resolution section below.

5. Resolution

The issue described in BugID 6363490 is addressed on the following platforms:
  • Sun StorEdge 3310 SCSI array with firmware revision 4.15F (as delivered in patch 113722-15 or later)
  • Sun StorEdge 3320 SCSI array with firmware revision 4.15G (as delivered in patch 113730-01 or later)
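A simple way to compare a patch ID against the minimum levels listed above is sketched below. The helper names are hypothetical, and how the installed patch ID is obtained from a given system is outside the scope of this sketch.

```python
# Minimum patch revisions taken from the Resolution section of this alert.
FIXED_PATCHES = {
    "113722": 15,  # Sun StorEdge 3310 SCSI array, firmware 4.15F
    "113730": 1,   # Sun StorEdge 3320 SCSI array, firmware 4.15G
}


def parse_patch_id(patch_id):
    """Split a patch ID such as '113722-15' into ('113722', 15)."""
    base, _, revision = patch_id.strip().partition("-")
    if not (base.isdigit() and revision.isdigit()):
        raise ValueError(f"not a patch ID: {patch_id!r}")
    return base, int(revision)


def has_fix(patch_id):
    """Return True if the patch revision is at or above the fixed level."""
    base, revision = parse_patch_id(patch_id)
    if base not in FIXED_PATCHES:
        raise ValueError(f"patch {base} is not covered by this alert")
    return revision >= FIXED_PATCHES[base]


if __name__ == "__main__":
    for pid in ("113722-14", "113722-15", "113730-01"):
        print(pid, "fixed" if has_fix(pid) else "exposed")
```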

Note: Ensure that SCSI connections are reliable and properly configured to minimize the probability of parity errors, and use multiple SCSI connections with failover drivers.

Because the nature of the changes would require a major redesign, the issue described in BugID 6378796 was closed as "will not fix."

This Sun Alert notification is being provided to you on an "AS IS" basis. This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.

Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.


Modification History
14-Jun-2006: Updated Contributing Factors and Resolution Sections
13-Mar-2008: Updated Resolution section - RESOLVED



References

<SUNPATCH: 113722-15>
<SUNPATCH: 113730-01>

Previously Published As
102128
Internal Comments
Please send technical questions to the following email:
[email protected]
and CC the following persons:
Internal Contributor/Submitter
Internal Eng Responsible Engineer
Internal Services Knowledge Engineer

The following Sun Alerts have information about other known issues for the 3000 series products:

102011 - Sun StorEdge 33x0/3510 Arrays May Report a Higher Incidence of Drive Failures With Firmware 4.1x SMART Feature Enabled

102067 - Sun Cluster 3.x Nodes May Panic Upon Controller Failure/Replacement Within Sun StorEdge 3510/3511 Arrays

102086 - Failed Controller Condition May Cause Data Integrity Issues

102098 - Insufficient Information for Recovery From Double Drive Failure for Sun StorEdge 33x0/35xx Arrays

102126 - Recovery Behavior From Fatal Drive Failure May Lead to Data Integrity Issues

102127 - Performance Degradation Reported in Controller Firmware Releases 4.1x on Sun StorEdge 3310/351x Arrays for All RAID Types and Certain Patterns of I/O

102128 - Data Inconsistencies May Occur When Persistent SCSI Parity Errors are Generated Between the Host and the SE33x0 Array

102129 - Disks May be Marked as Bad Without Explanation After "Drive Failure," "Media Scan Failed" or "Clone Failed" Events



Note: One or more of the above Sun Alerts may require a Sun Spectrum Support Contract to login to a SunSolve Online account.



As referenced in bug 6363490, this issue may occur with a faulty cable, possibly one on which a bit in the upper 8 bits has been toggled.
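To illustrate why a toggled bit shows up as a parity error, the sketch below assumes the per-byte odd parity used on the parallel SCSI data bus, with separate parity bits for the lower and upper bytes of a 16-bit transfer. The function names and the sample values are hypothetical and are not taken from the bug report.

```python
def odd_parity_bit(byte):
    """Return the parity bit that makes the total count of 1 bits odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def parity_ok(byte, parity_bit):
    """True if the byte plus its parity bit contain an odd number of 1s."""
    return (bin(byte & 0xFF).count("1") + parity_bit) % 2 == 1


def check_wide_word(word, p_low, p_high):
    """Check a 16-bit transfer; returns (lower_byte_ok, upper_byte_ok)."""
    low, high = word & 0xFF, (word >> 8) & 0xFF
    return parity_ok(low, p_low), parity_ok(high, p_high)


if __name__ == "__main__":
    word = 0x12A5                                 # sample value only
    p_low = odd_parity_bit(word & 0xFF)
    p_high = odd_parity_bit(word >> 8)
    one_bit = word ^ 0x0400                       # one upper-byte bit toggled
    two_bits = word ^ 0x0C00                      # two upper-byte bits toggled
    print(check_wide_word(word, p_low, p_high))      # (True, True)
    print(check_wide_word(one_bit, p_low, p_high))   # (True, False)
    print(check_wide_word(two_bits, p_low, p_high))  # (True, True)
```

A single toggled bit in the upper byte fails the upper-byte parity check, while an even number of toggled bits in the same byte passes it, which is a general limitation of single-bit parity.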



A firmware release is scheduled that will ensure that data compromised in transit from the host SCSI HBA to the SE33x0 array controller will not be flushed to media. When this fix is delivered, there are cases in which stale data may remain in the SE33x0 array for LBAs that correspond to the failed WRITE request.



A subsequent firmware release is scheduled that will ensure that data compromised in transit from the host SCSI HBA to the SE33x0 array will not be flushed to media AND EITHER:



A. The most current version of data successfully written to the SE33X0 array by the host may be read from the SE33X0 array



OR:



B. A host READ request for data that could not be recovered will be returned with an error indicating MEDIA ERROR for the LBAs that cannot be recovered.



A successful WRITE request to the LBAs corresponding to the MEDIA ERROR will allow these LBAs to be read correctly again.
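The planned behavior in option B can be sketched as a small state model. This is only an illustration of the description above: the names LbaStore and MediaError are hypothetical, and nothing here confirms how, or whether, a firmware release implemented it.

```python
class MediaError(Exception):
    """Hypothetical: models a READ returned with MEDIA ERROR."""


class LbaStore:
    """Toy model: LBAs flagged unrecoverable fail READs until rewritten."""

    def __init__(self):
        self.data = {}              # LBA -> last good data
        self.unrecoverable = set()  # LBAs whose data could not be recovered

    def mark_unrecoverable(self, lba):
        self.unrecoverable.add(lba)

    def read(self, lba):
        if lba in self.unrecoverable:
            raise MediaError(f"MEDIA ERROR at LBA {lba}")
        return self.data[lba]

    def write(self, lba, value):
        # A successful WRITE makes the LBA readable again.
        self.data[lba] = value
        self.unrecoverable.discard(lba)


if __name__ == "__main__":
    store = LbaStore()
    store.write(5, "v1")
    store.mark_unrecoverable(5)
    try:
        store.read(5)
    except MediaError as err:
        print(err)
    store.write(5, "v2")
    print(store.read(5))
```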


Internal Contributor/submitter
[email protected]

Internal Eng Responsible Engineer
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Escalation ID
1-12074354

Internal Resolution Patches
113722-15, 113730-01

Internal Sun Alert & FAB Admin Info
Critical Category: Data Loss
Significant Change Date: 2006-01-12
Avoidance: Patch
Responsible Manager: [email protected]
Original Admin Info: [WF 14-Jun-2006, Dave M: updating for FW patches]
[WF 15-May-2006, Dave M: pending patch number given]
[WF 05-Jan-2006, Dave M: sent for review 04-Jan, reviews returned, Chessin suggestions/changes made, all docs on hold until Exec approval pending 1/12]
[WF 04-Jan-2006, Dave M: final edits before sending to tech review]
[WF 02-Jan-2006, Dave M: draft created]
Internal Sun Alert Kasp Legacy ID
102128

Product_uuid
3db30178-43d7-4d85-8bbe-551c33040f0d|Sun StorageTek 3310 SCSI Array
95288bce-56d3-11d8-9e3a-080020a9ed93|Sun StorageTek 3320 SCSI Array

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.