Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1000559.1
Update Date:2011-02-24
Keywords:

Solution Type  Sun Alert Sure

Solution  1000559.1 :   SE3310/SE3320/SE3510/SE3511 Storage Arrays May Experience Data Integrity Events  


Related Items
  • Sun Netra T6340 Server Module
  • Sun Storage 3510 FC Array
  • Sun Storage 3511 SATA Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Data Loss

Previously Published As
200705


Bug Id
<SUNBUG: 6511494>

Product
Sun StorageTek 3510 FC Array
Sun StorageTek 3310 NAS Array
Sun StorageTek 3320 SCSI Array
Sun StorageTek 3511 SATA Array

Date of Workaround Release
22-FEB-2007

Date of Resolved Release
20-MAR-2007

Impact

System panics and warning messages on the host operating system may occur when a filesystem reads and acts on incorrect data from the array. A user application may likewise read and act on incorrect data from the array.


Contributing Factors

This issue can occur on the following platforms:

  • Sun StorEdge 3310 (SCSI) Array with firmware version 4.11K/4.13B/4.15F (as delivered in patch 113722-10 through 113722-15)
  • Sun StorageTek 3320 (SCSI) Array with firmware version 4.12E (as shipped)/4.15G (as delivered in patch 113730-01)
  • Sun StorageTek 3510 (FC) Array with firmware version 4.11I/4.13C/4.15F (as delivered in patch 113723-10 through 113723-16)
  • Sun StorageTek 3511 (FC) Array with firmware version 4.11I/4.13C/4.15F (as delivered in patch 113724-04 through 113724-09)

The above RAID arrays (single or dual controller) with "Write-Back Caching" enabled can return stale data when the I/O contains writes and reads in a very specific pattern. This applies to RAID 5 LUNs, and to LUNs at other RAID levels when an array disk administration action occurs. This pattern has been observed in both QFS and UFS metadata updates, and could be seen in other situations.
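
The firmware list above can be expressed as a simple lookup. The following Python sketch is illustrative only: the table is transcribed from the list above, while the function name is_affected and the model keys are hypothetical. It assumes the caller already knows the array model, the firmware version string, and the cache setting, for example from the array administration tool.

```python
# Affected firmware versions per array model, transcribed from the list above.
AFFECTED_FIRMWARE = {
    "3310": {"4.11K", "4.13B", "4.15F"},
    "3320": {"4.12E", "4.15G"},
    "3510": {"4.11I", "4.13C", "4.15F"},
    "3511": {"4.11I", "4.13C", "4.15F"},
}


def is_affected(model: str, firmware: str, write_back_enabled: bool) -> bool:
    """Return True if the array matches the listed contributing factors.

    Hypothetical helper: a match means the array runs a listed firmware
    version with write-back caching enabled. It does not check the RAID
    level or whether a disk administration action has occurred.
    """
    versions = AFFECTED_FIRMWARE.get(model.strip(), set())
    return write_back_enabled and firmware.strip().upper() in versions
```

A call such as is_affected("3510", "4.15F", True) reports a match, while the same array with write-back caching disabled does not.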


Symptoms

Filesystem warnings and panics occur with no indication of an underlying storage issue. For UFS these messages could include:

   "panic: Freeing Free Frag"
   WARNING: /<mount point>: unexpected allocated inode XXXXXX, run fsck(1M) -o f
   WARNING: /<mount point>: unexpected free inode XXXXXX, run fsck(1M) -o f

This list is not exhaustive, and other symptoms of stale data reads might be seen.
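
As a rough aid for spotting the UFS messages quoted above in a host log, a scan along the following lines may help. This is a sketch under assumptions: the function name find_symptoms is hypothetical, the inode number (shown as XXXXXX in the alert) is assumed to be a run of digits, and the panic text is matched case-insensitively because the exact capitalisation in the log may differ from the quoted form.

```python
import re

# Patterns for the UFS messages quoted above. Matching details such as the
# digit-only inode number are assumptions, not taken from the alert.
SYMPTOM_PATTERNS = [
    re.compile(r"freeing free frag", re.IGNORECASE),
    re.compile(r"WARNING: (\S+): unexpected allocated inode \d+"),
    re.compile(r"WARNING: (\S+): unexpected free inode \d+"),
]


def find_symptoms(lines):
    """Yield (line_number, line) for each log line matching a known symptom."""
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SYMPTOM_PATTERNS):
            yield number, line.rstrip("\n")
```

The function accepts any iterable of lines, so an open file object for the system log (on Solaris hosts typically /var/adm/messages) can be passed directly. A hit only indicates that a listed message was logged; it does not prove that the array is the cause.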


Workaround

Disable the "Write-Back Caching" option inside the array using your preferred array administration tool (sccli(1M) or telnet). This workaround can be removed once the final resolution has been applied.

Use ZFS to detect (and, if configured to do so, correct) the Data Integrity Events.

If not using a filesystem, make sure your application has checksums and identity information embedded in its disk data so it can detect Data Integrity Events.
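
The idea of embedding checksums and identity information can be sketched as follows. Everything here is an assumption made for illustration: the record layout, the field names block_id and generation, and the functions seal and verify do not come from the alert. The point of the sketch is that a checksum alone cannot catch stale data, because an old copy of a block still carries a valid checksum; the identity fields are what expose it.

```python
import struct
import zlib

# Hypothetical on-disk record layout (not from the alert):
#   block_id   (8 bytes)  identity: which logical block this is
#   generation (8 bytes)  identity: which write of that block this is
#   crc32      (4 bytes)  checksum over block_id, generation and payload
#   payload    (remaining bytes)
HEADER = struct.Struct(">QQI")


def seal(block_id: int, generation: int, payload: bytes) -> bytes:
    """Build a record with identity fields and a CRC32 in front of the payload."""
    ident = struct.pack(">QQ", block_id, generation)
    crc = zlib.crc32(ident + payload)
    return HEADER.pack(block_id, generation, crc) + payload


def verify(record: bytes, expected_id: int, expected_generation: int) -> bytes:
    """Return the payload, or raise ValueError if the record is corrupt,
    belongs to another block, or is an older generation than expected."""
    block_id, generation, crc = HEADER.unpack_from(record)
    payload = record[HEADER.size:]
    if zlib.crc32(struct.pack(">QQ", block_id, generation) + payload) != crc:
        raise ValueError("checksum mismatch: data is corrupt")
    if block_id != expected_id:
        raise ValueError("misdirected read: wrong block returned")
    if generation != expected_generation:
        raise ValueError("stale data: older generation returned")
    return payload
```

For the generation check to work, the expected generation has to be recorded somewhere other than the block itself, for example in an index or parent structure that the application maintains.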

Migrating back to 3.X firmware is a major task and is not recommended.


Resolution

This issue is addressed on the following platforms:

  • Sun StorEdge 3310 (SCSI) Array with firmware version 4.15G (as delivered in patch 113722-16) or later
  • Sun StorageTek 3320 (SCSI) Array with firmware version 4.15H (as delivered in patch 113730-02) or later
  • Sun StorageTek 3510 (FC) Array with firmware version 4.15G (as delivered in patch 113723-17) or later
  • Sun StorageTek 3511 (FC) Array with firmware version 4.15G (as delivered in patch 113724-10) or later
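
The "or later" rule in the list above can be checked mechanically against an installed patch ID. The sketch below is illustrative: the table is transcribed from the list above, the function name has_fix is hypothetical, and patch IDs are assumed to take the form base-revision (for example 113723-17).

```python
# Minimum patch revision containing the fix, transcribed from the list above.
FIXED_PATCH_REVISION = {
    "113722": 16,  # 3310, firmware 4.15G
    "113730": 2,   # 3320, firmware 4.15H
    "113723": 17,  # 3510, firmware 4.15G
    "113724": 10,  # 3511, firmware 4.15G
}


def has_fix(patch_id: str) -> bool:
    """Return True if a patch ID is at or above the fixed revision.

    Raises ValueError for patch IDs that are malformed or not in the table.
    """
    base, _, revision = patch_id.strip().partition("-")
    if base not in FIXED_PATCH_REVISION or not revision.isdigit():
        raise ValueError(f"unrecognised patch ID: {patch_id!r}")
    return int(revision) >= FIXED_PATCH_REVISION[base]
```

For instance, has_fix("113723-16") is false because revision 16 of the 3510 patch is still in the affected range, while has_fix("113723-17") is true.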


Modification History
Date: 20-MAR-2007
  • Updated Contributing Factors and Resolution sections
  • State: Resolved

 


Date: 23-MAY-2007
  • Updated Contributing Factors section


References

<SUNPATCH: 113722-16>
<SUNPATCH: 113730-02>
<SUNPATCH: 113723-17>
<SUNPATCH: 113724-10>

Previously Published As
102815

Internal Comments


sub-CR: 2146925

Comments: this regression was introduced when the firmware 4.X code base was introduced by the software manufacturer.

PTS Reviewer (approved by): [email protected]

23-May-2007 added, per ENG: this issue is also seen in QFS metadata updates


Internal Contributor/submitter
[email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Eng Responsible Engineer
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Escalation ID
1-21037989, 1-21037591, 1-20986492, 1-20837385, 1-19849835

Internal Resolution Patches
113722-16, 113730-02, 113723-17, 113724-10

Internal Sun Alert Kasp Legacy ID
102815

Internal Sun Alert & FAB Admin Info
Critical Category: Data Loss, Availability ==> Regression
Significant Change Date: 2007-02-22, 2007-03-20
Avoidance: Firmware
Responsible Manager: [email protected]
Original Admin Info: [WF 23-May-2007, dave m: request by ENG to include clarification for QFS, just found recently, important to FEs]
[WF 21-Feb-2007, karened: submitted friday 16-Feb, I'm just drafting now and will send to sunalert_review]

Internal SA-FAB Eng Submission
-------- Original Message --------
Subject: Draft Sun Alert: Bug ID 6511494 :3510/3310 with huge I/O and Write Cache Enable can lead to filesystem corruption and panic
Date: Fri, 16 Feb 2007 21:37:06 +0000
From: tim uglow
To: [email protected]
CC: [email protected], [email protected], Tim Uglow - Principal Engineer


Hi

Please find my draft Sun Alert for this minnow issue.



-------------------------------------------------------------------------------------------------------------------
Synopsis: SE3310/SE3320/SE3510/SE3511 Storage arrays can suffer data
integrity events.


Category: [X] Data Loss.
[X] Availability.

Product: SUN 3310/3320/3510/3511 Raid arrays

BugID: 6511494

Avoidance: [X] Workaround

State: [X] Workaround

1. Impact:

A user application could read and act on incorrect data from the array.
A filesystem could read and act on incorrect data from the disk and produce
warning messages or panic the host Operating System.

2. Contributing Factors:

This issue can occur on the following platforms:

* Sun StorEdge 3310 (SCSI) Array with firmware version 4.11K/4.13B/4.15F (as delivered in patch 113722-10/113722-11/113722-15)

* Sun StorEdge 3320 (SCSI) Array with firmware version 4.15G (as delivered in patch 113730-01)

* Sun StorEdge 3510 (FC) Array with firmware version 4.11I/4.13C/4.15F (as delivered in patch 113723-10/113723-11/113723-15)

* Sun StorEdge 3511 (FC) Array with firmware version 4.11I/4.13C/4.15F (as delivered in patch 113724-04/113724-05/113724-09)

Sun StorEdge 3310/3320/3510 raid arrays (single or double controller) with "Write Behind Caching" enabled on Raid 5 LUNs (or other raid level LUNs and an array disk administration action occurs), can return stale data when the I/O contains writes and reads in a very specific pattern. This pattern has only been observed in UFS metadata updates but could be seen in other situations.


3. Symptoms:

Filesystem warnings and panics with no indication of an underlying storage issue. For UFS these messages could include:

"panic: Freeing Free Frag"
WARNING: /: unexpected allocated inode XXXXXX, run fsck(1M) -o f
WARNING: /: unexpected free inode XXXXXX, run fsck(1M) -o f
This list is not exhaustive; other symptoms of stale data reads could be seen.


4. Relief/Workaround:

Disable the "write behind" caching option inside the array using your
preferred array administration tool (sccli(1M) or telnet). This workaround can be removed on final resolution.

Use ZFS to detect (and correct if configured) the Data Integrity Events.

If not using a filesystem, make sure your application has checksums and identity information embedded in its disk data so it can detect Data Integrity Events.

Migrating back to 3.X firmware is a major task and is not recommended.

5. Resolution:

A final resolution is pending urgent completion.


6. Internal Section:

Escalation IDs: 1-21037989 1-21037591 1-20986492 1-20837385 1-19849835

Pending Patches:

I'm just getting the proposed patch numbers, but FYI the fix will go
into the following firmware versions...

SE3510 4.15G
SE3310 4.15G
SE3511 4.15G
SE3320 4.15H

and 4.21 for all array types.


Resolution Patches:
FIN:
FCO:
Submitter: [email protected]
Responsible Engineer: [email protected]
Responsible Manager: [email protected]
PTS/Engineering organization: [X] NWS (Network Storage)


Distribution: [X] Public SunSolve

Comments: this regression was introduced when the firmware 4.X code
base was introduced by the software manufacturer.

PTS Reviewer (approved by): [email protected]
--------------------------------------------------------------------------------------------------------------------

thanks
tim

References

SUNPATCH:113722-16
SUNPATCH:113723-17
SUNPATCH:113724-10
SUNPATCH:113730-02

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.