Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1000792.1
Update Date:2010-08-30
Keywords:

Solution Type  FAB (standard) Sure

Solution 1000792.1: Failure to properly tighten the System/Motherboard or PDB on the Sun Fire X4100 and X4200 can result in a system outage or thermal event.


Related Items
  • Sun Fire X4200 M2 Server
  • Sun Fire X4100 Server
  • Sun Fire X4100 M2 Server
  • Sun Fire X4200 Server
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive

Previously Published As
201070


Product
Sun Fire X4100 Server
Sun Fire X4100 M2 Server
Sun Fire X4200 M2 Server
Sun Fire X4200 Server

Date of Resolved Release
03-JAN-2007


Impact

A major system outage and hardware damage may result, with potential customer safety implications, extended downtime, and a high cost of recovery.


Contributing Factors

When servicing the Sun Fire X4100 or X4200 platforms for System Board or Power Distribution Board issues, it is IMPORTANT to apply care and diligence to the bus bar assembly, in particular to tightening its connections correctly.

The following FRU Part Numbers are impacted by this issue:

Affected
Part Numbers    Description
____________    ___________
501-7261        System Board, Sun Fire X4100
501-7644        System Board, RoHS, Sun Fire X4100
501-7513        System Board, RoHS, Sun Fire X4100
501-6974        System Board, Sun Fire X4200
501-7645        System Board, RoHS, Sun Fire X4200
501-7514        System Board, RoHS, Sun Fire X4200
501-7590        System Board, RoHS, Sun Fire X4200 M2
501-7668        System Board, RoHS, Sun Fire X4100 M2
501-6920        DC Power Distribution Board, Sun Fire X4100/X4200
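To check quickly whether a FRU on hand falls under this FAB, the list above can be expressed as a lookup. The shell function below is a hypothetical helper, not a Sun tool; it assumes a part number may carry a trailing dash level (as in 240-4779-01) and ignores that suffix.

```shell
#!/bin/sh
# Hypothetical helper: look up a FRU part number in the affected list above.
# affected_fru PART_NUMBER -> prints the description and returns 0 if the
# part is affected, otherwise prints nothing and returns 1.
# Assumption: any dash level after the base number (e.g. "-01") is ignored.
affected_fru() {
    base=$(printf '%s\n' "$1" | cut -d- -f1-2)   # "501-7261-05" -> "501-7261"
    case "$base" in
        501-7261) echo "System Board, Sun Fire X4100" ;;
        501-7644|501-7513) echo "System Board, RoHS, Sun Fire X4100" ;;
        501-6974) echo "System Board, Sun Fire X4200" ;;
        501-7645|501-7514) echo "System Board, RoHS, Sun Fire X4200" ;;
        501-7590) echo "System Board, RoHS, Sun Fire X4200 M2" ;;
        501-7668) echo "System Board, RoHS, Sun Fire X4100 M2" ;;
        501-6920) echo "DC Power Distribution Board, Sun Fire X4100/X4200" ;;
        *) return 1 ;;
    esac
}

affected_fru 501-7590-03 || echo "not affected"
```

Run as shown, it prints `System Board, RoHS, Sun Fire X4200 M2`; a part number that is not in the list prints nothing and returns a non-zero status.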

 


Symptoms

Failure to properly tighten the System/Motherboard or the DC Power Distribution Board bus bar connections on the Sun Fire X4100 and X4200 has resulted in major thermal events on customer systems, requiring Sun to provide full system exchanges. These events are characterized by a burning smell, a failure to power on, or, in some instances, heavy smoke exhausting from the product.


Workaround

Resolution

*** DO NOT SERVICE THESE FRUs WITHOUT THE PROPER TOOLS ***

Remove and refit the System/Motherboard and/or the Power Distribution Board in the systems listed above only with the correct tools, preferably a properly calibrated torque driver and bits. DO NOT use pliers to try to secure the nut on the bus bar.

Where the Galaxy motherboard has closed acorn-style nuts (Sun p/n 240-4779-01), replace them with the open flange nuts (Sun p/n 240-5984-01) included in the Motherboard/PDB FRU kit.

A picture comparing the old acorn and new flange nut types is available at the URL below:

http://sdpsweb.central/FIN_FCO/FAB/102770/SPE/Nut_compare.pdf

If using a torque driver, select one that is adjustable over at least the range of 7 to 20 in-lbs, has an accuracy better than 6%, and accepts 1/4" hex bits, so that it can be used with the nut and screwdriver bits specified below.

Torque all screws to the factory setting of 7.5 in-lbs (0.847 Newton meters) and all nuts to 18 in-lbs (2.034 Newton meters). Be aware that the new flange-type nut offers greater resistance because of its non-metallic insert, so auto-torque drivers may torque out prematurely because of this initial resistance.
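As a cross-check of the torque figures, the short script below converts the two factory settings from in-lbs to Newton meters and shows the range a driver meeting the 6% accuracy requirement could deliver at each setting. It is a minimal sketch that assumes the 6% figure is a symmetric plus-or-minus tolerance on the set value; the conversion factor 1 in-lb = 0.1129848290276167 N-m follows from the exact definitions of the inch (0.0254 m) and the pound-force (4.4482216152605 N).

```shell
#!/bin/sh
# Convert the factory torque settings from in-lbs to Newton meters and show the
# band a driver with +/-6% accuracy could deliver at each setting.
# Assumption: the 6% accuracy is a symmetric tolerance on the set value.
INLB_TO_NM=0.1129848290276167   # 1 in-lb = 0.0254 m x 4.4482216152605 N

# to_nm TORQUE_IN_LBS -> torque in Newton meters, 3 decimal places
to_nm() {
    awk -v t="$1" -v k="$INLB_TO_NM" 'BEGIN { printf "%.3f\n", t * k }'
}

# band TORQUE_IN_LBS PERCENT -> "low high" in in-lbs
band() {
    awk -v t="$1" -v p="$2" 'BEGIN { printf "%.2f %.2f\n", t * (1 - p / 100), t * (1 + p / 100) }'
}

for setting in 7.5 18; do     # screws, then nuts
    echo "$setting in-lbs = $(to_nm "$setting") N-m; +/-6% band: $(band "$setting" 6) in-lbs"
done
```

Run as-is, the last line it prints is `18 in-lbs = 2.034 N-m; +/-6% band: 16.92 19.08 in-lbs`, which shows that a driver at the edge of the 6% specification could deliver anywhere from about 16.9 to 19.1 in-lbs on the bus bar nuts.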

For the nuts, use an 8 mm nut driver bit with a 1/4" hex shank to fit a collet-type chuck. These are available from local hardware distributors.

For the screws, use a #2 Phillips bit, 3 inches long, with a 1/4" hex shank.

After replacing or servicing the product, the field service representative must test the integrity of the bus bar connection by running the bus bar diagnostic released for this purpose. The diagnostic and its ReadMe file can be downloaded from the internal-only link below:

http://nsgrelease.sfbay/galaxy12/releases/G12x-SW1.3-rc38/ops/061215/

Service Partners without SWAN access can download the bus bar diagnostic and ReadMe file from the links below:

+ busbar diagnostic:

http://sdpsweb.central/FIN_FCO/FAB/102770/SPE/busbar

+ busbar ReadMe:

http://sdpsweb.central/FIN_FCO/FAB/102770/SPE/busbar.README

A PASS result from this diagnostic should be demonstrated after servicing ANY part of the system, and this completion should be recorded in the Radiance notes.

Note: This diagnostic should NOT be left with the customer!

How to install and run the BusBar Test:

1) Copy the latest busbar tool to the service processor (SP) /coredump directory:
scp busbar sunservice@<sp_ip>:/coredump <cr>   where <sp_ip> is the IP address of the target SP
Are you sure you want to continue connecting (yes/no)? yes <cr>
password: changeme <cr>
2) Log in to the SP with ssh and change to the /coredump directory:
ssh sunservice@<sp_ip> <cr>
password: changeme <cr>
cd /coredump
3) Run the busbar test (see the "How to install/run BusBar Test example" below):
# ./busbar <loopcnt> <system name>
loopcnt - The number of times busbar runs. If this value is 0, busbar runs forever.
The recommended value when running the diagnostic in the field is 1.
system name - The machine type to test. The system names known to busbar are listed below.
system name
g1      = Galaxy1
g2      = Galaxy2
g1e     = Galaxy1e
g2e     = Galaxy2e
g1f     = Galaxy1f
g2f     = Galaxy2f
cnst    = Constellation
4) After the test, reboot the SP to restore its normal functionality and state:
# /etc/init.d/reboot <cr>
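Steps 1 to 4 can also be prepared in advance. The script below is a dry-run sketch: it checks the loop count and system name against the values listed above and prints the commands to type, rather than running them. The function names `valid_system` and `busbar_plan` are hypothetical, and the script assumes the sunservice login and /coredump location used in the steps above; the commands after the ssh line are typed inside the SP session.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-4: validate the arguments and print the commands
# to type, without executing anything. Assumes the sunservice login and the
# /coredump location used in the steps above. Function names are hypothetical.

# valid_system NAME -> status 0 if NAME is a system name known to busbar
valid_system() {
    case "$1" in
        g1|g2|g1e|g2e|g1f|g2f|cnst) return 0 ;;
        *) return 1 ;;
    esac
}

# busbar_plan SP_IP SYSTEM [LOOPCNT] -> prints the command sequence
busbar_plan() {
    sp_ip="$1"
    system="$2"
    loops="${3:-1}"                       # 1 is the recommended field value
    if [ -z "$sp_ip" ] || ! valid_system "$system"; then
        echo "usage: busbar_plan <sp_ip> g1|g2|g1e|g2e|g1f|g2f|cnst [loopcnt]" >&2
        return 1
    fi
    case "$loops" in
        ''|*[!0-9]*) echo "loopcnt must be a non-negative integer" >&2; return 1 ;;
    esac
    if [ "$loops" -eq 0 ]; then
        echo "warning: loopcnt 0 makes busbar run forever" >&2
    fi
    echo "scp busbar sunservice@$sp_ip:/coredump"   # step 1, on the local machine
    echo "ssh sunservice@$sp_ip"                    # step 2
    echo "cd /coredump"                             # step 2, on the SP
    echo "./busbar $loops $system"                  # step 3, on the SP
    echo "/etc/init.d/reboot"                       # step 4, on the SP
}

busbar_plan 10.6.78.122 g1
```

For example, `busbar_plan 10.6.78.122 g1` prints the same five commands used in the example session below, while an unknown system name such as `x4100` or a non-numeric loop count is rejected with a usage message.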

Test Description:

The busbar diagnostic was developed to find systems with poor bus bar connections. A loose or high-resistance joint drops more voltage as the current through it rises, so the diagnostic reads the 12 volt sensor twice. The first reading is taken with the system in reset and the fans spun down, to minimize the load on the system. The second reading is taken with the system running and the fans at their highest rpm, to maximize the load. The two readings are compared; if they differ by more than 5%, there may be a problem with the bus bar connection and an error is generated.
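The pass/fail rule described above can be illustrated with a short sketch. It is not the busbar source: the function name `busbar_verdict` is hypothetical, and it assumes the 5% is measured relative to the minimum-load reading, which the description does not specify.

```shell
#!/bin/sh
# Illustration of the pass/fail rule described above; not the busbar source.
# Assumption: the percentage is taken relative to the minimum-load reading.

# busbar_verdict MIN_LOAD_V MAX_LOAD_V [THRESHOLD_PCT] -> prints PASS or FAIL
busbar_verdict() {
    awk -v lo="$1" -v hi="$2" -v pct="${3:-5}" 'BEGIN {
        diff = lo - hi
        if (diff < 0) diff = -diff            # absolute difference in volts
        # lo <= 0 means no usable baseline reading, so treat it as a failure
        if (lo <= 0 || diff / lo * 100 > pct) { print "FAIL" } else { print "PASS" }
    }'
}

busbar_verdict 12.12 12.12
busbar_verdict 12.12 11.40
```

With the readings from the example session below (12.12 V at both loads) it prints PASS; a sag from 12.12 V to 11.40 V (about 5.9%) prints FAIL. Readings are passed as plain numbers, without the "+" and "V" shown in the tool's output.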

How to install/run BusBar Test example:

$ cd cygwin
$ cd /busbar/
$ cd busbartest/
$ scp busbar sunservice@10.6.78.122:/coredump ****** STEP # 1******
The authenticity of host '10.6.78.122 (10.6.78.122)' can't be established.
RSA key fingerprint is 55:c9:05:b4:84:f2:33:6a:26:0b:22:cd:67:ca:02:9e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.6.78.122' (RSA) to the list of known hosts.
sunservice@10.6.78.122's password:
busbar 100% 211KB 211.2KB/s 00:00
$ ssh sunservice@10.6.78.122 ****** STEP # 2******
sunservice@10.6.78.122's password:
[(flash)root@SUNSP0003BAF20AF0:~]# cd /coredump
[(flash)root@SUNSP0003BAF20AF0:/coredump]# ls
0ABGA037.ROM busbar spdiag.log
0ABGA938.ROM hdt still_mounted
50_busbar_spdiag.log messages
bbar.sh spdiag
[(flash)root@SUNSP0003BAF20AF0:/coredump]# ./busbar 1 g1 ****** STEP # 3******
Parsing command line . . .
Storing test info in spdiag.log
Machine Type: G1
killall: Could not kill pid '685': No such process
killall: cdserver: no process killed
killall: fdserver: no process killed
Stopping IPMI Stack....Done.
sh: /etc/init.d/cmm: not found
execute BusBar
Shutting down system
Powering system on
Running BusBar test on mb.v_+12 voltage
MINIMUM LOAD 12V MAXIMUM LOAD 12V RESULT
---------------- ---------------- ------
+12.12V +12.12V PASS
DONE. Please, reboot SP to get normal SP functionality and its state back.
[(flash)root@SUNSP0003BAF20AF0:/coredump]# /etc/init.d/reboot ****** STEP # 4******
Rebooting...
The system is going down NOW !!
Sending SIGTERM to all processes.
Connection to 10.6.78.122 closed by remote host.
Connection to 10.6.78.122 closed.
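Because the PASS result has to be recorded in the Radiance notes, it can help to save the session to a file and extract the result row from it. The sketch below assumes the table layout shown in the example above (a dashed separator followed by the minimum-load reading, the maximum-load reading and the result); `busbar_result` and the log file name are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the busbar result row(s) from a saved session log so the
# outcome can be copied into the case notes. Assumes the table layout shown
# in the example session: a header line, a dashed separator line, then
# "<minimum-load reading> <maximum-load reading> <result>".

# busbar_result LOGFILE -> prints one "min max result" line per test row
busbar_result() {
    awk '
        { sub(/\r$/, "") }                    # tolerate CRLF line endings
        /^-+ +-+ +-+ *$/ { grab = 1; next }   # separator under the header
        grab && NF == 3 { print $1, $2, $3; grab = 0 }
    ' "$1"
}
```

For a capture of the session above saved as `busbar_session.log`, `busbar_result busbar_session.log` prints `+12.12V +12.12V PASS`; with a loop count greater than 1, one line is printed per result table.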

 


Modification History
Date: 30-JAN-2007
  • Added "How to" instructions to end of Corrective Action section and additional tool details near the beginning of the Corrective Action section.

Date: 05-APR-2007
  • Changed open flange nut p/n from 240-5894 to 240-5984 in Corrective Action section.


Previously Published As
102770
Internal Comments


Definition of collet: a cone-shaped chuck used for holding cylindrical pieces in a lathe.


Related Information
  • Service Request: 65089025
  • ECO: WO_34899
  • URL: http://nsgrelease.sfbay/galaxy12/releases/G12x-SW1.3-rc38/ops/061215/
    http://alert.west/smi_detail.cgi?I:25469521
    http://sdpsweb.central/FIN_FCO/FAB/102770/SPE/Nut_compare.pdf

Internal Contributor/submitter
[email protected]

Internal Eng Business Unit Group
KE Authors

Internal Eng Responsible Engineer
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Kasp FAB Legacy ID
102770

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2007-01-17
Avoidance: Service Procedure
Responsible Manager: [email protected]
Original Admin Info: WF - Initial draft started on 1/5/07 and had to recreate this asset due to a bug in the way xoptions were being displayed. Awaiting external link to the referenced Diagnostic and ReadMe - Joe
WF - Submitter reviewed and requested minor changes to Resolution section on 1/10/07. Will release with internal only link and update once the external links to the Diag and ReadMe are provided - Joe
WF - sent off to extended review on 1/10/07 - Joe
WF - added ECO reference per Mike Persichetty on 1/10/07 - Joe
WF - added link to Nut_compare pic now that sdpsweb server is back up 1/17/07 - Joe
WF - sending to publish on 1/17/07 - Joe
WF - after publication sponsor requested TNS host diag and readme files
for Partners to have access to via SPE. Put files up on sdpsweb
and added links to these two files and republished FAB.
WF - added more specific diag instructions and example in Corrective
Action section - Joe 1/30/07
WF - corrected open flange nut part number in Corrective Action
section. - Joe Apr/05/07
Product_uuid
54e2ac49-df71-11d9-89e6-080020a9ed93|Sun Fire X4100 Server
5b03d0ed-216d-11db-a023-080020a9ed93|Sun Fire X4100 M2 Server
c15f7881-216e-11db-a023-080020a9ed93|Sun Fire X4200 M2 Server
c6e795ef-df6f-11d9-89e6-080020a9ed93|Sun Fire X4200 Server

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.