Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1280772.1
Update Date:2011-03-08
Keywords:

Solution Type  FAB (standard) Sure

Solution  1280772.1 :   FCO A0310-1: Sun SPARC M9000-64 XBU boards may fail because of an inadequate clock connector design.  


Related Items
  • Sun SPARC Enterprise M9000-64 Server
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Hardware Remediation>Mandatory




In this Document
  Symptoms
  Changes
  Cause
  Solution


Oracle Confidential (PARTNER). Do not distribute to customers
Reason: FABs available to Internals and Partners only

Applies to:

Sun SPARC Enterprise M9000-64 Server - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Information in this document applies to any platform.
__________

Affected Parts:

371-2240-01 - Crossbar Unit, XBU_B

Symptoms

Customers may see FMA-diagnosed faults (as reported on the XSCF in the Active role) similar to any of the following signatures:
   Jan 07 00:15:46.5751 ereport.chassis.SPARC-Enterprise.asic.xb.test (fmdump -e output)
   Jan 7 00:15:52 xscfhost Alarm: /XBU_B#3/CL,/XBU_B#3,*:SCF:XB clk-cable test error (showlogs monitor output)
   Jan 7 08:09:33 xscfhost Alarm: /XBU_B#11,/XBU_B#3,*:ANALYZE:XBU-XBU interface fatal error (showlogs monitor output)

The XSCF 'showstatus' command output will appear similar to the following:
   * XBU_B#1 Status:Deconfigured;
   * XBU_B#3 Status:Faulted;
   * XBU_B#5 Status:Deconfigured;
   * XBU_B#7 Status:Deconfigured;
   * XBU_B#9 Status:Deconfigured;
   * XBU_B#11 Status:Degraded;
   * XBU_B#13 Status:Deconfigured;
   * XBU_B#15 Status:Deconfigured;
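When reviewing saved 'showstatus' output from several systems, it can help to extract the XBU_B states programmatically. The following is a minimal sketch, assuming the output lines follow the format of the sample above. The function name parse_xbu_status and the choice to flag "Faulted" and "Degraded" units for attention are illustrative assumptions, not part of any XSCF tooling.

```python
import re

# Matches lines such as "* XBU_B#3 Status:Faulted;" as shown in the sample output.
STATUS_LINE = re.compile(r"\*\s*(XBU_B#\d+)\s+Status:(\w+);")


def parse_xbu_status(text):
    """Return a {unit: status} mapping for every XBU_B line found in the text."""
    return {unit: status for unit, status in STATUS_LINE.findall(text)}


def unit_number(unit):
    """Numeric slot of a unit name, so XBU_B#3 sorts before XBU_B#11."""
    return int(unit.split("#")[1])


sample = """\
   * XBU_B#1 Status:Deconfigured;
   * XBU_B#3 Status:Faulted;
   * XBU_B#11 Status:Degraded;
"""

statuses = parse_xbu_status(sample)
# Assumption: Faulted and Degraded units are the ones worth inspecting first.
suspects = sorted(
    (u for u, s in statuses.items() if s in ("Faulted", "Degraded")),
    key=unit_number,
)
print(suspects)
```

Units reported as Deconfigured are left out of the suspect list here because, in the sample, they reflect the system falling back to a degraded half backplane rather than an individual board failure.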

Impact

When a Crossbar Unit (XBU_B) fails, all platform domains will crash. The system will attempt to automatically deconfigure the failed unit and recover onto a degraded half backplane.

Changes

Contributing Factors

Crossbar Units (371-2240-01) installed in M9000-64 systems are vulnerable to a failure of the clock connector. The connector is susceptible to damage from an over-torqued cable connection. This connection exists only on an M9000-64, where the XBU_B boards in the base and expansion cabinets are linked. The connector is not used in the M9000-32, so that system is not exposed to the problem.

Failures may occur at installation time, or the connection may deteriorate and fail over time.

Due to the number of XBU spares required per system and the limited availability of those spares, remediation will be based on a prioritized customer list. The list of affected customers is maintained at the region level by the Regional FCO Drivers and sets the order in which systems are remediated. Your Regional FCO Driver will notify you when parts are available to order for your customer.

Cause

Root Cause

The clock connector mount on the revision -01 XBU_B provided insufficient strength to protect the circuit board interface from over-torquing damage. The board connector mount was redesigned to sustain greater forces without incurring damage.

All M9000 systems manufactured after May 2009 included revision -02 or higher Crossbar Units which include the strengthened connector redesign. Service spares were reworked to -02 via GSAP 4660.A beginning on October 5, 2009.

Solution

Target Completion Date: January 20, 2013

Hot Swappable? No

Workaround

No workaround is available - see Resolution section below.

Resolution

This is a two-year proactive FCO and requires a valid hardware contract on each system to be remediated.

Until the Target Completion Date listed above, proactively replace all 371-2240-01 Crossbar Units (XBU_B) in M9000-64 systems with 371-2240-02 (or above). After the Target Completion Date, systems should be remediated only per standard break-fix processes.

Identify the number of -01 boards which must be replaced. Work with your Regional FCO Driver identified in the "Hardware Remediation and Material Availability Details" section below to order that number of Crossbar Units along with one XBU Mitigation Kit (p/n 555-1959-01) per system.

The XBU Mitigation Kit will include a clock connector torque tool, 16 static protection bags used for repackaging returned Crossbar Units and an instruction manual.
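The ordering rule above (one replacement Crossbar Unit per -01 board, plus one XBU Mitigation Kit per system) can be sketched as a small helper. This is only an illustration: the function name build_order, the dictionary layout, and the decision to return an empty order when no -01 boards are present are assumptions, and actual orders still go through the Regional FCO Driver.

```python
KIT_PART = "555-1959-01"        # XBU Mitigation Kit, one per system
REPLACEMENT_PART = "371-2240-02"  # or a higher dash level, per the Resolution text
MAX_XBUS = 16                   # an M9000-64 holds at most 16 XBU_B boards


def build_order(rev01_count):
    """Return {part number: quantity} for one system with rev01_count -01 boards."""
    if not 0 <= rev01_count <= MAX_XBUS:
        raise ValueError(f"expected 0..{MAX_XBUS} boards, got {rev01_count}")
    if rev01_count == 0:
        # Assumption: a system with no -01 boards needs nothing ordered.
        return {}
    return {REPLACEMENT_PART: rev01_count, KIT_PART: 1}


print(build_order(12))
```

The kit's 16 static protection bags cover the worst case of a fully populated system, which is why a single kit per system is sufficient regardless of the board count.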

For replacement procedures, refer to the SPARC Enterprise M8000/M9000 Servers Service Manual, available at the following URL:

  http://download.oracle.com/docs/cd/E19415-01/819-4202-17/819-4202-17.pdf

An Oracle legal approved Customer Letter is attached.

Identification of Affected Parts (how to)

All 371-2240-01 units installed in M9000-64 systems are impacted by this FCO. The number of units within the system can be determined by logging in to the XSCF in the Active role and executing the 'showhardconf' command. Output will be similar to the following:

    XBU_B#0 Status:Normal; Ver:0201h; Serial:PP074403LA  ;
        + FRU-Part-Number:CA06620-D302 A0 /371-2240-01 ; <=== Part # found on this line

    <snip>

    XBU_B#15 Status:Normal; Ver:0201h; Serial:PP0744052T ;
        + FRU-Part-Number:CA06620-D302 A0 /371-2240-01 ;

Locate the FRUs identified as XBU_B#xx. Only 371-2240-01 parts are impacted. All higher dash levels have been redesigned and are not vulnerable to the clock connector damage.
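To see which slots hold the affected boards, and not just how many there are, saved 'showhardconf' output can be scanned for XBU_B entries and their FRU part numbers. The sketch below assumes the layout shown in the sample: a unit header line followed by "+" property lines. The function name find_affected_xbus is hypothetical, and the second unit in the sample input (serial EXAMPLE0001, revision A1, dash level -02) is made up purely to show a board that is not affected.

```python
import re

UNIT = re.compile(r"^\s*(XBU_B#\d+)\s+Status:")
PART = re.compile(r"FRU-Part-Number:[^;]*/\s*(\d{3}-\d{4}-\d{2})")


def find_affected_xbus(showhardconf_text, affected="371-2240-01"):
    """Return the XBU_B unit names whose FRU part number equals `affected`."""
    affected_units = []
    current = None  # XBU_B unit whose property lines are being read, if any
    for line in showhardconf_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith("+"):
            # A header line starts a new FRU; only XBU_B headers are tracked.
            match = UNIT.match(line)
            current = match.group(1) if match else None
            continue
        part = PART.search(line)
        if part and current is not None and part.group(1) == affected:
            affected_units.append(current)
    return affected_units


# Hypothetical input: the first unit is copied from the sample output,
# the second is invented to illustrate an already-remediated board.
sample = """\
    XBU_B#0 Status:Normal; Ver:0201h; Serial:PP074403LA  ;
        + FRU-Part-Number:CA06620-D302 A0 /371-2240-01 ;
    XBU_B#1 Status:Normal; Ver:0201h; Serial:EXAMPLE0001 ;
        + FRU-Part-Number:CA06620-D302 A1 /371-2240-02 ;
"""

print(find_affected_xbus(sample))
```

Tracking the current header line keeps part numbers from other FRU types in the same output from being attributed to a Crossbar Unit.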

Parts may also be identified in the XSCF snapshot output in the xscf_command/@tmp@cli@[email protected] file. You can count the number of parts by executing this command:

    grep 371-2240-01 @tmp@cli@[email protected] | wc -l

Note: It is important that the above procedures be used to obtain an accurate count of the revision -01 boards in the platform. Intermediate field service actions may have replaced revision -01 Crossbar Units with higher level parts. The system can hold a total of 16 Crossbar Units (XBU_B).
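Where grep is not available (for example when a snapshot is reviewed on a workstation without a Unix shell), the same count can be taken with a short script. This is a minimal sketch that mirrors the grep and wc pipeline above by counting lines that mention the part number. The function name count_rev01 is hypothetical, the in-memory sample stands in for the snapshot file, and the sample line with dash level -02 is invented for illustration. The check against 16 is an added sanity test based on the stated maximum number of Crossbar Units.

```python
import io

PART = "371-2240-01"
MAX_XBUS = 16  # the system can hold a total of 16 Crossbar Units (XBU_B)


def count_rev01(lines, part=PART):
    """Count lines that mention the part, like `grep PART file | wc -l`."""
    count = sum(1 for line in lines if part in line)
    if count > MAX_XBUS:
        # More matches than slots suggests duplicated or concatenated output.
        raise ValueError(f"{count} matches exceeds the {MAX_XBUS}-unit maximum")
    return count


# In practice, pass an open file object for the snapshot file instead.
sample = io.StringIO(
    "   + FRU-Part-Number:CA06620-D302 A0 /371-2240-01 ;\n"
    "   + FRU-Part-Number:CA06620-D302 A0 /371-2240-01 ;\n"
    "   + FRU-Part-Number:CA06620-D302 A1 /371-2240-02 ;\n"
)
print(count_rev01(sample))
```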


Hardware Remediation and Material Availability Details

At the time of publication of this FAB, all Regions were Materially Ready to support this activity. However, due to the number of units needed per system to address this issue, the field should work with their Regional FCO Drivers before placing orders. The Regional FCO Drivers are identified below:

   North America:  [email protected]
   EMEA:  [email protected]  -or-  [email protected]
   Latin America:  [email protected]
   Japan:  [email protected]
   APAC:  [email protected]

Comments

If you have questions about this FCO, send email to the following alias:

    [email protected]

References


   BugID: 6842585
   ECO: WO_40103
   GSAP: 4660.A


For information about FAB documents, its release processes, implementation strategies and billing information, click here.

In addition to the above you may email:

    [email protected]

Contacts:

Contributor: [email protected]
Responsible Engineer: [email protected]
Responsible Manager: [email protected]
Business Unit Group: Systems Group-OPL (Fujitsu, M4000 through M9000)


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.