Document Audience:INTERNAL
Document ID:I0834-2
Title:Domains on Sun Fire 12K/15K systems may suffer domain stops (DSTOP) with a signature of "CP arbiter lockstep consistency check error". Sun Alert: Yes
Copyright Notice:Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved
Update Date:2002-12-24

---------------------------------------------------------
        - Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
                        FIELD INFORMATION NOTICE
          (For Authorized Distribution by Enterprise Services)
FIN #: I0834-2
Synopsis: Domains on Sun Fire 12K/15K systems may suffer domain stops (DSTOP) with a signature of "CP arbiter lockstep consistency check error". Sun Alert: Yes
Create Date: Dec/20/02
SunAlert: Yes
Top FIN/FCO Report: Yes
Products Reference: Sun Fire 15K/12K
Product Category: Server / SW Admin
Product Affected: 
Systems Affected
----------------
Mkt_ID    Platform   Model      Description             Serial Number
------    --------   -----      -----------             -------------
  -        F12K       ALL       Sun Fire 12K                  -
  -        F15K       ALL       Sun Fire 15K                  -


X-Options Affected
------------------
Mkt_ID    Platform   Model    Description               Serial Number
------    --------   -----    -----------               -------------
  -          -         -           -                          -
Parts Affected: 
Part Number     Description            Model
-----------     -----------            -----
     -               -                   -
References: 
BugId:   4671526 - libPower needs to clear board test status when boards 
                   are reset.
         4671531 - libKeyswitch needs to deconfigure L1 boards before 
                   the expander.
         4724771 - LibPower should send events synchronously.
         4712287 - EXB asic LBIST needs to be skipped when CP is in use.
         4699827 - Deconfigure L1 boards should reset Darb ports if 
                   necessary.

FIN:     I0834-1

PatchId: 112481-06 (or higher): SMS 1.2: fomd, hwad, esmd, pcd patch. 
         112488-06 (or higher): SMS 1.2: hpost, redx, libxcpost Patch.
 
ESC:     536181 - F15K domain was dstopped on SMS 1.2.
         536183 - SF15K ???#1 dstop??????. 
         536638 - F15K SB1 will not power up after adding SB0 to domain 
                  sapr0115.
         537720 - Dstop still occurs after applying 112481-02/112827-01;
                  (lockstep consistency err).
         537670 - When hpost in domain C; Domain D dstop.
         537851 - SF15k: at least 4 of 7 domains crashed, one remained up 
                  and running.
         
Sun Alert: 44627

URL: http://sunsolve.Central/cgi/retrieve.pl?doc=intsrdb%2F48223
     http://pts-americas.west/esg/hsg/starcat/tools/cp-ports-dl.html
Issue Description: 
CHANGE HISTORY:
---------------
I0834-2

DATE MODIFIED: Dec/20/2002
 
UPDATE: REFERENCES, PROBLEM DESCRIPTION, CORRECTIVE ACTION

. PROBLEM DESCRIPTION: has been modified to reflect the revision
                       change of patchId 112481 from -02 to -06
                       and the replacement of patchId 112827-01 with
                       112488-06.

. CORRECTIVE ACTION: has been modified to reflect the revision
                     change of patchId 112481 from -02 to -06
                     and the replacement of patchId 112827-01 with
                     112488-06.

. REFERENCES: PatchId 112481 revision changed from -02 to -06, and
              patchId 112827-01 replaced with 112488-06.

------------------------------  

One or more domains in a multiple domain F12K or F15K system may suffer
a service interruption due to a Domain Stop (DSTOP) or a failure during
POST.  This issue can be recognized by the error message "CP arbiter
lockstep consistency check error".

This issue occurs on Sun Fire 12K/15K systems running SMS 1.1 software,
or on systems running SMS 1.2 software without patches 112481-06 (or
higher) and 112488-06 (or higher).
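
The exposure rule above can be expressed as a short check.  The following
Python sketch is illustrative only and is not part of SMS or of the
patches.  It assumes the installed-patch listing has been captured as text
with one "Patch: NNNNNN-RR" entry per line (similar in shape to Solaris
'showrev -p' output); the function names and the sample input are
hypothetical.

```python
import re

# Minimum SMS 1.2 patch revisions named in this FIN.
REQUIRED = {"112481": 6, "112488": 6}

# Assumed line shape: "Patch: 112481-06 ..." (one installed patch per line).
PATCH_RE = re.compile(r"^Patch:\s*(\d{6})-(\d{2})\b", re.MULTILINE)


def missing_patches(patch_text):
    """Return required patches that are absent or below the minimum revision."""
    have = {}
    for base, rev in PATCH_RE.findall(patch_text):
        # Keep the highest revision seen for each patch base ID.
        have[base] = max(have.get(base, -1), int(rev))
    return sorted(
        f"{base}-{minimum:02d}"
        for base, minimum in REQUIRED.items()
        if have.get(base, -1) < minimum
    )


def is_exposed(sms_version, patch_text=""):
    """Apply the FIN's rule: SMS 1.1, or SMS 1.2 lacking either patch."""
    if sms_version == "1.1":
        return True  # no patches are planned for SMS 1.1
    if sms_version == "1.2":
        return bool(missing_patches(patch_text))
    raise ValueError("this FIN only describes SMS 1.1 and SMS 1.2")
```

For example, a listing that contains 112481-02 and 112488-07 would be
reported as missing "112481-06", so that System Controller would still be
considered exposed.  The check would need to be run against both System
Controllers.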

In cases observed thus far, failing configurations have multiple
domains.  At the time of failure, one or more domains were executing a
'setkeyswitch standby' or 'setkeyswitch on' process, or an Expander
Board was being installed into the platform.

The failure signature can be seen by using redx to examine the
DStop/hardware dump state file.  The failure signature is similar to
the following:

   SDI EX02/S0  Master_Stop_Status0[31:0] = 3000000F
           MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
   SDI EX02/S0  Dstop0[31:0] = 00428040
           Dstop0[17]: D    DARB texp requests Slot0 Dstop (M)
           Dstop0[22]: D 1E SDI internal CP port requested Dstop
   SDI EX02/S0  CP_Error0[31:0]    = 2000A000  Mask = 580067FF
           CPErr0[29]: D 1E CP arbiter lockstep consistency check error (M)
                cp0_{dembusp,texp,unload,demand[1:0]} = 00
                cp1_{dembusp,texp,unload,demand[1:0]} = 14
   FAIL EXB EX2:  Dstop/Rstop detected by SDI EX2/S0.
           Primary service FRU is EXB EX2.

The failure above could either be a Dstop of a running domain or an
xcstate dump during a POST run.
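
When many dump files need to be triaged, the signature shown above can be
searched for mechanically.  The following Python sketch is illustrative
only and is not a Sun tool.  It assumes the redx output has been saved as
text in the layout shown above; the function names are hypothetical.  It
collects the 32-bit register values, lists which bits are set, and reports
lines carrying the lockstep signature.

```python
import re

SIGNATURE = "CP arbiter lockstep consistency check error"

# Matches register lines in the layout shown in this FIN, for example:
#   SDI EX02/S0  CP_Error0[31:0]    = 2000A000  Mask = 580067FF
REG_RE = re.compile(r"^\s*(\S+ \S+)\s+(\w+)\[31:0\]\s*=\s*([0-9A-Fa-f]{8})")


def set_bits(value):
    """Return the bit positions (0..31) that are set in a 32-bit value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


def scan_dump(text):
    """Return ({(unit, register): value}, [lines containing the signature])."""
    registers = {}
    hits = []
    for line in text.splitlines():
        match = REG_RE.match(line)
        if match:
            unit, name, hexval = match.groups()
            registers[(unit, name)] = int(hexval, 16)
        if SIGNATURE in line:
            hits.append(line.strip())
    return registers, hits
```

Applied to the sample above, the CP_Error0 value 2000A000 has bit 29 set,
which corresponds to the CPErr0[29] line, and the Dstop0 value 00428040
has bits 17 and 22 set, matching the Dstop0[17] and Dstop0[22] lines.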

There are two different contributors to the failures:

    o Insufficient cleanup of DARB ports when an Expander is no longer 
      active in a domain. Later, when boards are tested again, the
      leftover error state is seen and causes the DARB to report it in
      a fashion that results in a loss of lockstep.

    o The LBIST tests on the SDI (executed by POST) leave the logic
      structures in a state where after LBIST completes, the SDI can
      drive signals to the CP ASICs such that the DARBs detect a
      "Tslot parity error from SDI" or "GDTransID parity error from 
      SDI", leading to a "CP arbiter lockstep" DStop on another domain.
      
This issue is addressed by SMS 1.2 patches 112481-06 (or higher) and
112488-06 (or higher).  No patches are planned for SMS 1.1.  Customers
are advised to upgrade to SMS 1.2 and to apply the patches.
Implementation: 
         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        | X |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                            
         ---
        |   |   REACTIVE (As Required)
         ---
Corrective Action: 
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned problem.

1) Apply SMS 1.2 patches 112481-06 (or higher) and 112488-06 (or higher)
   to both System Controllers.  Refer to the patch READMEs for special 
   installation instructions.  
   
2) Refer to Knowledge Base Article 88000 on SunSolve for detailed steps
   to identify the specific cause of one of these failures.  The
   'cp-ports' utility may also be used for this purpose, and can be
   downloaded from:
   
   http://pts-americas.west/esg/hsg/starcat/tools/cp-ports-dl.html
Comments: 
None

============================================================================
Implementation Footnote: 
i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.
 
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
--------------------------------------------------------------------------
Status:active