Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1000907.1
Update Date: 2011-03-07
Keywords:

Solution Type  Sun Alert Sure

Solution  1000907.1 :   Sun StorEdge T3 and T3+ Arrays (Including SE3900 and SE6900 Series) May Reset and/or Briefly Lose Host Connectivity After Running Continuously For 497 Days


Related Items
  • Sun Storage T3 Array
  • Sun Storage T3+ Array
  • Sun Storage 3910 Array
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Data Loss

Previously Published As
201217


Product
Sun StorageTek 3900 Series
Sun StorageTek 6900 Series
Sun StorageTek T3 Array
Sun StorageTek T3+ Array

Bug Id
<SUNBUG: 4785593>

Date of Workaround Release
30-JAN-2003

Date of Resolved Release
30-MAY-2003

Impact

Sun StorEdge T3 and T3+ arrays (including Sun StorEdge T3+ arrays contained in the Sun StorEdge 3900 and Sun StorEdge 6900 Series) may reset and/or lose host connectivity for 2-3 minutes if they have been running continuously for exactly 497 days and I/O operations are in progress at that time.

Data may become unavailable or may get lost permanently, depending on the system configuration and how applications react to the arrays resetting or temporarily losing host connectivity.


Contributing Factors

This issue can occur with the following configurations:

  • Sun StorEdge T3 array with firmware 1.18.01 or earlier
  • Sun StorEdge T3+ array with firmware 2.01.03 or earlier
  • Sun StorEdge 3900 Series (SE39xx) containing Sun StorEdge T3+ arrays with firmware 2.01.03 or earlier
  • Sun StorEdge 6900 Series (SE69xx) containing Sun StorEdge T3+ arrays with firmware 2.01.03 or earlier

This issue occurs only if I/O operations are in progress on the array at the moment it has been running continuously for exactly 497 days. If the array is idle at that time, the issue does not occur.
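
This Sun Alert does not state why the threshold is 497 days. As a hedged observation only, the figure matches the wraparound period of an unsigned 32-bit counter that is incremented 100 times per second. Both the counter width and the tick rate in the sketch below are assumptions made for illustration; they are not taken from this document or from the T3/T3+ firmware.

```python
# Assumption (not stated in this Sun Alert): the uptime is tracked in an
# unsigned 32-bit counter that is incremented 100 times per second.
TICKS_PER_SECOND = 100        # assumed tick rate
COUNTER_RANGE = 2 ** 32       # distinct values of an unsigned 32-bit counter
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_wrap(ticks_per_second=TICKS_PER_SECOND,
                    counter_range=COUNTER_RANGE):
    """Return how many days pass before the counter wraps back to zero."""
    return counter_range / ticks_per_second / SECONDS_PER_DAY


wrap_days = days_until_wrap()
print(f"{wrap_days:.2f} days")  # prints "497.10 days"
```

Under these assumptions the counter wraps slightly after the 497th day, which is consistent with the figure quoted in this document but does not confirm the root cause.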

Note: There is no "uptime" or similar command on the T3/T3+. To determine how long a T3/T3+ has been running, review the T3/T3+ syslog (or remote logging file), or any change logs kept by the system administrator, to find the date of the last T3/T3+ boot. Applying new firmware to a T3/T3+ requires an array reboot. Arrays whose firmware has been kept up to date will therefore have been rebooted during each update and are less likely to be impacted by this issue.


Symptoms

1. Messages similar to the following may be seen in the "/var/adm/messages" file:

    unix: ID[SUNWssa.socal.link.5010] socal3: port 0: Fibre Channel is OFFLINE
    unix: WARNING: /sbus@49,0/SUNW,socal@1,0/sf@0,0 (sf6):
    unix:        Offline Timeout
    unix: sf6:   target 0x2 al_pa 0xe4 offlined
    unix: WARNING: /sbus@49,0/SUNW,socal@1,0/sf@0,0/ssd@w50020f2300007193,0 (ssd3):
    unix:        SCSI transport failed: reason 'tran_err': giving up

The above messages indicate a loss of host connectivity to a T3/T3+ array and may occur for different reasons. Should the issue described in this Sun Alert document occur, one of the sets of T3/T3+ messages listed below will also be seen:

2. As it restarts, the T3/T3+ syslog may record the following reason for the T3/T3+ resetting:

    ROOT[1]: W: u1ctr Assertion Reset (3000) was initiated at yyyymmdd hhmmss ../../common/bss/qlcf.c line xxxx, Assert(cmd->cmd_deadline != CAM_TIME_INFINITY) => 0 BOOT

3. The T3/T3+ syslog may record messages similar to the following, at the time that host connectivity is lost:

    ISR1[1]: N: u1ctr ISP2100[2] Fatal timeout on host 125
    ISR1[1]: N: u1ctr ISP2100[2] Received LIP(f0,f0) async event
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_invalidate_pdb: PDB Invalidate (host id 125)
    FCC0[1]: N: u1ctr Port event received on port 0, abort 0 (id 125)
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_sync_pdb: PDB Sync Initiated (host id 125)
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_update_pdb: PDB Sync Done (host id 125)
    ISR1[1]: N: u1ctr ISP2100[2] PDB Sync Done (host id 125, host WWN 2004020000101a00)
    FCC0[1]: N: u1ctr PDB Changed on port 0 (id 125)
    [...]

10 seconds later:

    ISR1[1]: N: u1ctr ISP2100[2] Fatal timeout on host 125
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_i_watch_host_port: Debug Code - ISP2100 Hang Detected
    ISR1[1]: N: u1ctr ISP2100[2] interface going offline
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_init_pdb: PDB Initialize
    ISR1[1]: N: u1ctr ISP2100[2] QLCF_I_ABORT_ALL_TM_CMDS: Target-mode Flush Started (lun = 0x0)
    ISR1[1]: N: u1ctr ISP2100[2] interface going online
    [...]
    SIMT[1]: N: u1ctr Initializing host port u1p1 ISP2100 ... firmware status = 7
    [...]
    DUMP[1]: N: u1ctr ISP2100[2] [==>BEG]ISPDEBUGDUMP:
    DUMP[1]: N: u1ctr ISP2100[2]  PBIU REGISTERS (OFFSET 00H, 8):
    DUMP[1]: N: u1ctr ISP2100[2]        0000 0001 0002 0003 0004 0005 0006 0007
    DUMP[1]: N: u1ctr ISP2100[2]        ---- ---- ---- ---- ---- ---- ---- ----
    [... followed by lines of hex data ...]
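
When reviewing a saved copy of the T3/T3+ syslog, a short script can flag lines that resemble the messages in symptoms 2 and 3. The following is a minimal sketch, not a supported diagnostic tool: the signature names and the function `scan_syslog` are hypothetical, the patterns are taken only from the excerpts in this document, and real log lines may be worded or wrapped differently.

```python
import re

# Patterns derived from the syslog excerpts in this Sun Alert.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "assertion_reset": re.compile(r"Assertion Reset \(3000\)"),
    "cmd_deadline_assert": re.compile(r"cmd_deadline"),
    "isp2100_hang": re.compile(r"ISP2100 Hang"),
    "fatal_timeout": re.compile(r"ISP2100\[\d+\] Fatal timeout on host \d+"),
}


def scan_syslog(lines):
    """Return {label: [matching lines]} for every signature that matched."""
    hits = {}
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.setdefault(label, []).append(line.rstrip("\n"))
    return hits


# Sample input built from the messages quoted in this document.
sample = [
    "ISR1[1]: N: u1ctr ISP2100[2] Fatal timeout on host 125\n",
    "ISR1[1]: N: u1ctr ISP2100[2] Received LIP(f0,f0) async event\n",
    "ISR1[1]: N: u1ctr ISP2100[2] qlcf_i_watch_host_port: "
    "Debug Code - ISP2100 Hang Detected\n",
]

for label, matched in scan_syslog(sample).items():
    print(label, len(matched))
```

A match only indicates that the log resembles the symptoms listed here. As noted above, loss of host connectivity can have other causes, so the array's uptime and firmware revision should still be checked.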

Workaround

To prevent the described issue, the Sun StorEdge T3/T3+ array needs to be rebooted before it has been running for 497 days.
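
Because the array has no "uptime" command, a reboot has to be planned from the date of the last boot as recorded in the syslog or change logs. The sketch below shows one way to compute a reboot deadline from that date. The function names, the 30-day safety margin, and the sample boot date are hypothetical values chosen for illustration.

```python
from datetime import date, timedelta

WRAP_DAYS = 497  # continuous running time quoted in this Sun Alert


def reboot_deadline(last_boot, safety_margin_days=30):
    """Return the latest planned reboot date, leaving a safety margin."""
    return last_boot + timedelta(days=WRAP_DAYS - safety_margin_days)


def days_remaining(last_boot, today):
    """Return the number of days left before the array reaches 497 days."""
    return (last_boot + timedelta(days=WRAP_DAYS) - today).days


# Hypothetical last boot date, as it might be read from the T3/T3+ syslog.
last_boot = date(2002, 1, 15)
print("Reboot no later than:", reboot_deadline(last_boot))
print("Days remaining on 2003-01-15:", days_remaining(last_boot, date(2003, 1, 15)))
```

The margin is a scheduling choice, not a requirement from this document; any reboot before the 497-day mark resets the running time.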


Resolution

This issue is addressed in the following releases:

  • Sun StorEdge T3 array with firmware 1.18.02 or later (firmware 1.18.02 is available as patch 109115-13)
  • Sun StorEdge T3+ array with firmware 2.01.04 or later (firmware 2.01.04 is available as patch 112276-07)
  • Sun StorEdge 3900 Series (SE39xx) containing Sun StorEdge T3+ arrays with firmware 2.01.04 or later (firmware 2.01.04 is available as patch 112276-07)
  • Sun StorEdge 6900 Series (SE69xx) containing Sun StorEdge T3+ arrays with firmware 2.01.04 or later (firmware 2.01.04 is available as patch 112276-07)
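
The firmware thresholds listed above can be checked with a simple version comparison. The following is a minimal sketch that assumes the firmware revision is available as a dotted string such as "2.01.03"; the names `FIRST_FIXED`, `parse_version`, and `is_affected` are hypothetical, and the sketch does not query an array.

```python
# First firmware revisions that address the issue, per the Resolution list.
# Sun StorEdge 3900/6900 Series systems contain T3+ arrays, so the "T3+"
# entry applies to them.
FIRST_FIXED = {
    "T3": (1, 18, 2),    # firmware 1.18.02, patch 109115-13
    "T3+": (2, 1, 4),    # firmware 2.01.04, patch 112276-07
}


def parse_version(text):
    """Turn a dotted revision such as '2.01.03' into the tuple (2, 1, 3)."""
    return tuple(int(part) for part in text.split("."))


def is_affected(model, firmware):
    """Return True if the firmware is older than the first fixed revision."""
    return parse_version(firmware) < FIRST_FIXED[model]


for model, firmware in [("T3", "1.18.01"), ("T3", "1.18.02"),
                        ("T3+", "2.01.03"), ("T3+", "2.01.04")]:
    status = "affected" if is_affected(model, firmware) else "fixed"
    print(model, firmware, status)
```

Comparing integer tuples instead of raw strings avoids errors with zero-padded fields such as "01" and "1".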


Modification History
Date: 30-MAR-2003
  • Sun StorEdge T3 array firmware 1.18.02 is available as patch 109115-13

Date: 30-MAY-2003
  • Sun StorEdge T3+ array firmware 2.01.04 is available as patch 112276-07


References

<SUNPATCH: 109115-13>
<SUNPATCH: 112276-07>

Previously Published As
101168
Internal Comments



A scheduled reboot, as described in the "Workaround" section above, may also be used to update arrays to the latest firmware revision (although currently available firmware does not address the issue described in this document).





Internal Contributor/submitter
[email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Eng Responsible Engineer
[email protected], [email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Escalation ID
542409, 542406, 542934

Internal Resolution Patches
109115-13, 112276-07

Internal Sun Alert Kasp Legacy ID
101168, 50381 (Sun Alert)

Internal Sun Alert & FAB Admin Info
Critical Category: Data Loss, Availability ==> Pervasive
Significant Change Date: 2003-01-30, 2003-03-30, 2003-05-30
Avoidance: Patch, Workaround
Responsible Manager: [email protected]
Original Admin Info: This document has been imported from KMS Creator and may need adjustment before re-publishing.

This imported document has been reviewed/adjusted by:
Review Name:
Review Date:

Original KMS Creator attributes below:

--- PLEASE DO NOT MAKE ANY CHANGES BELOW THIS LINE! ---

Sun Alert ID: 50381
Synopsis: Sun StorEdge T3 and T3+ Arrays (Including SE3900 and SE6900 Series) May Reset and/or Briefly Lose Host Connectivity After Running Continuously For 497 Days
Category: Data Loss, Availability
Product: Sun StorEdge T3, Sun StorEdge T3+, Sun StorEdge 3900 Series, Sun StorEdge 6900 Series
BugIDs: 4785593
Avoidance: Patch, Workaround
State: Resolved
Date Released: 30-Jan-2003, 30-Mar-2003, 30-May-2003
Date Closed: 30-May-2003
Date Modified: 30-Mar-2003, 30-May-2003
Escalation IDs: 542409, 542406, 542934
Pending Patches:
Resolution Patches: 109115-13, 112276-07
FIN:
FCO:
Date Submitted: 24-Jan-2003
Submitter: [email protected]
Responsible Engineer: [email protected], [email protected]
Responsible Manager: [email protected]
CTE group: CPRE NWS EMEA
Responsible Writer: [email protected]
Distribution: Public SunSolve

Workflow History:

WF State: Issued, 30-May-2003, Olaf Reineke
WF Note: Re-issued resolved.

WF State: Issued, 30-Jan-2003, Karen Edwards
WF Note: oked by susan, released by karen

WF State: Draft, 30-Jan-2003, Olaf Reineke
WF Note: Sent out for signoff



WF State: Draft, 29-Jan-2003, Olaf Reineke
WF Note: Sent out for technical review and BT review.


WF State: Draft, 28-Jan-2003, Olaf Reineke
WF Note: Article created.

Exported from KMS Creator Sat May 21 08:55:01 2005 GMT, [email protected]
Internal SA-FAB Eng Submission
Sun StorEdge T3 and T3+ Arrays (Including SE3900 and SE6900 Series) May Reset and/or Briefly Lose Host Connectivity After Running Continuously For 497 Days

Product_uuid
04ccc2c2-16a1-11d7-9f9a-f83fdd2e2f1b|Sun StorageTek 3900 Series
09a6d778-16f2-11d7-8802-94885c013b6c|Sun StorageTek 6900 Series
2a6d7d50-0a18-11d6-8e0b-f0bd33b24928|Sun StorageTek T3 Array
2a714b10-0a18-11d6-86e2-d56b387d4fbf|Sun StorageTek T3+ Array


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.