Document Audience:INTERNAL
Document ID:I0587-1
Title:Fatal Reset Errors occurring on the Enterprise Servers may not be diagnosed as E-Cache Tag Parity Errors.
Copyright Notice:Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved
Update Date:2000-07-26

---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)
FIN #: I0587-1
Synopsis: Fatal Reset Errors occurring on the Enterprise Servers may not be diagnosed as E-Cache Tag Parity Errors.
Create Date: Jul/26/00
Keywords: 

Fatal Reset Errors occurring on the Enterprise Servers may not be diagnosed as E-Cache Tag Parity Errors.

Top FIN/FCO Report: Yes
Products Reference: E-Cache Parity Error Failure
Product Category: Server / System CPU Module
Product Affected: 
Mkt_ID   Platform   Model   Description   Serial Number
------   --------   -----   -----------   -------------
Systems Affected
----------------
  -       E3000       ALL    Ultra Enterprise 3000          -
  -       E3500       ALL    Ultra Enterprise 3500          -
  -       E4000       ALL    Ultra Enterprise 4000          -
  -       E4500       ALL    Ultra Enterprise 4500          -
  -       E5000       ALL    Ultra Enterprise 5000          -
  -       E5500       ALL    Ultra Enterprise 5500          -
  -       E6000       ALL    Ultra Enterprise 6000          -
  -       E6500       ALL    Ultra Enterprise 6500          -

X-Options Affected
------------------
  -         -         -           -             -
Parts Affected: 
Part Number   Description   Model
-----------   -----------   -----
     -             -          -
References: 
URL:  http://bestpractices.central
FIN:  I0570-3
Issue Description: 
Systems may suffer downtime due to a Fatal Reset Error. If the failure
message is not recognized as an E-Cache Tag Parity Error, the solution
applied may not fix the problem, resulting in troubleshooting delays and
extended downtime.

The Ultra Enterprise Server products from UE3X00 to UE6X00 in the field
may be susceptible to this problem.  Fatal Reset Errors that have the
IPREP and FERR error bits set in the Address Controller Error Status
Register (AC ESR) indicate that an E-Cache Tag Parity Error may be the
cause of the Fatal Reset.  In the typical error message shown below, the
failure can be diagnosed as an E-Cache Tag Parity Error on Board 0
CPU 0.  Note that UPA_A_ERR refers to processor 0 and UPA_B_ERR refers
to processor 1.

   Fatal Reset
   0,0>   FATAL ERROR
   0,0>	   At time of error: System software was running.
   0,0>	   Diagnosis: Board 0, UPA PORT Device, AC
   0,0>   Log Date: May 22 12:02:03 GMT 2000
   0,0>
   0,0>   RESET INFO for CPU/Memory  board in slot 0
   0,0>	   AC ESR 00000000.00600001 IPREP FERR UPA_A_ERR  <-----<<<    
   0,0>	   DC[0] 00
   0,0>	   DC[1] 00
   0,0>	   DC[2] 00
   0,0>	   DC[3] 00
   0,0>	   DC[4] 00
   0,0>	   DC[5] 00
   0,0>	   DC[6] 00
   0,0>	   DC[7] 00
   0,0>	   FHC  CSR 00050200 LOC_FATAL SYNC NOT_BRD_PRES
   0,0>	   FHC RCSR 02000000  FATAL
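
The flag names on the AC ESR line can be extracted mechanically from a
captured console log.  The following is a minimal sketch in Python,
assuming the line has the layout of the sample message above (the text
"AC ESR", a dotted hex value, then space-separated flag names).  The
function name parse_ac_esr and the returned dictionary layout are
illustrative only and are not part of any Sun tool.

```python
import re

# Layout assumed from the sample message in this FIN, e.g.
# "0,0>   AC ESR 00000000.00600001 IPREP FERR UPA_A_ERR  <-----<<<"
AC_ESR_RE = re.compile(
    r"AC ESR\s+(?P<value>[0-9A-Fa-f]{8}\.[0-9A-Fa-f]{8})(?P<rest>.*)$"
)
# Flag names are assumed to be upper-case words such as FERR or UPA_A_ERR.
FLAG_RE = re.compile(r"\b[A-Z][A-Z0-9_]*\b")


def parse_ac_esr(line):
    """Return the AC ESR hex value and flag names, or None if absent."""
    match = AC_ESR_RE.search(line)
    if match is None:
        return None
    flags = FLAG_RE.findall(match.group("rest"))
    return {"value": match.group("value"), "flags": flags}


sample = "0,0>\t   AC ESR 00000000.00600001 IPREP FERR UPA_A_ERR  <-----<<<"
result = parse_ac_esr(sample)
print(result)
```

Annotation marks such as the arrow at the end of the sample line contain
no upper-case words, so they are ignored by the flag pattern.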

Fatal Reset Error messages are visible only from the machine console,
or TTYA.  To see these messages, it is necessary to log the console
output, for example through a "tip" connection.
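
Once console output has been logged, Fatal Reset messages can be located
in the capture.  The sketch below is written in Python and assumes the
capture is available as lines of text, that each message starts with a
line reading "Fatal Reset", and that its detail lines carry a "n,n>"
prefix as in the sample above.  The function name fatal_reset_blocks and
the placeholder lines in the example capture are hypothetical.

```python
import re

# Detail lines in the sample message start with a prefix such as "0,0>".
PREFIX_RE = re.compile(r"^\s*\d+,\d+>")


def fatal_reset_blocks(lines):
    """Group each 'Fatal Reset' line with the prefixed lines that follow."""
    blocks = []
    current = None
    for line in lines:
        text = line.rstrip("\n")
        if text.strip() == "Fatal Reset":
            if current is not None:
                blocks.append(current)
            current = [text]
        elif current is not None and PREFIX_RE.match(text):
            current.append(text)
        elif current is not None:
            # First unprefixed line ends the message.
            blocks.append(current)
            current = None
    if current is not None:
        blocks.append(current)
    return blocks


# Placeholder capture; only the middle three lines come from this FIN.
captured = [
    "(other console output)",
    "   Fatal Reset",
    "   0,0>   FATAL ERROR",
    "   0,0>   Log Date: May 22 12:02:03 GMT 2000",
    "(other console output)",
]
blocks = fatal_reset_blocks(captured)
print(len(blocks), "fatal reset message(s) found")
```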

UPA devices, such as CPUs, report error conditions that should cause a
fatal system reset to the Address Controller (AC) by encoding the
P_REPLY lines with P_ERR, which they may do at any time.  This sets the
FERR bit in the AC ESR and triggers the corresponding fatal reset.

The IPREP bit is normally set when the P_REPLY lines are driven with
one of the reserved encodings.  However, it is also always set whenever
the FERR bit is set; this is normal and can be disregarded.
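
The rule above can be expressed as a small filter over the flag names
read from the AC ESR line.  This is a sketch in Python only; the function
name significant_flags is hypothetical, and the flag spellings are
assumed to match those printed in the console message.

```python
def significant_flags(flags):
    """Drop IPREP when FERR is also set, since it is then a side effect."""
    flags = list(flags)
    if "FERR" in flags:
        return [name for name in flags if name != "IPREP"]
    # Without FERR, IPREP is kept because it may reflect a reserved
    # P_REPLY encoding.
    return flags


with_ferr = significant_flags(["IPREP", "FERR", "UPA_A_ERR"])
without_ferr = significant_flags(["IPREP"])
print(with_ferr, without_ferr)
```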

The Spitfire (UltraSPARC-II processor) system CPUs will send a
P_ERR P_REPLY to the AC for the following two conditions:

   1. Parity error detected on UPA address bus while AC
      is the bus master & CPU is the slave.

   2. E-Cache tag parity error.  Unlike an E-Cache data error, this
      is not reported as a trap, because system coherence is lost
      for this condition and the system must be reset.

Improper torque may cause scenario (1) above.  However, since the
address bus is bi-directional and the CPU is bus master most of the
time, a mechanical connection problem would probably also produce fatal
reset errors with the UPA_PERR bit set, i.e., cases where the CPU is
address bus master and the AC saw the parity error.  Absence of this
type of failure indicates that scenario (2) is the most likely cause.

Confirmation of an E-Cache tag parity error requires inspection of the
CPU's AFSR register, which is not displayed in the current OBP release.
The current plan is to log the AFSR in a newer PROM version, 3.2.27,
which is expected to be released before the end of the year 2000.
Implementation: 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        |   |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---
Corrective Action: 
Enterprise Customers and authorized Field Service Representatives may
avoid the above-mentioned problems by following the recommendations
below:

Any Fatal Reset error on the UE 3X00 to UE 6X00 servers should be
evaluated to determine whether it is caused by an E-Cache Tag parity
error.  Do not assume that a Fatal Reset error implies that the system
board is defective.

If the Fatal Reset error is displayed, IPREP and FERR are present on
the AC ESR line, and UPA_PERR is not present, then suspect that an
E-Cache Tag parity error is the most likely cause.  The board number in
the error message and the UPA_A_ERR or UPA_B_ERR flag will identify the
CPU module suspected of causing the Fatal Reset Error.

If the Fatal Reset error is displayed and the IPREP and FERR are
present on the AC ESR line and if UPA_PERR is present then suspect a
mechanical connection or improper torque problem.
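
The two rules above can be summarized as a short decision function.  The
following Python sketch assumes the flag names appear on the AC ESR line
exactly as spelled in this FIN (IPREP, FERR, UPA_PERR, UPA_A_ERR,
UPA_B_ERR).  The function name diagnose and its return format are
hypothetical, and the result is a suggestion for troubleshooting, not a
confirmed diagnosis.

```python
# Flag-to-processor mapping as stated in this FIN.
CPU_MODULE_BY_FLAG = {"UPA_A_ERR": 0, "UPA_B_ERR": 1}


def diagnose(flags):
    """Return (suspected cause, suspected processor numbers on the board)."""
    present = set(flags)
    if not {"IPREP", "FERR"} <= present:
        # This FIN only covers resets with both IPREP and FERR set.
        return ("outside the scope of this FIN", [])
    if "UPA_PERR" in present:
        return ("mechanical connection or improper torque", [])
    modules = sorted(
        CPU_MODULE_BY_FLAG[name] for name in present if name in CPU_MODULE_BY_FLAG
    )
    return ("E-Cache tag parity error (most likely)", modules)


sample_diagnosis = diagnose(["IPREP", "FERR", "UPA_A_ERR"])
print(sample_diagnosis)
```

For the sample message in the Issue Description, the board number comes
from the "RESET INFO for CPU/Memory board in slot 0" line, and the
function points at processor 0 on that board.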

Always refer to the current Best Practices document to determine the
most current guidelines.
Comments: 
--------------------------------------------------------------------------
Implementation Footnote: 
i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
---------------------------------------------------------------------------
Status: inactive