Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1020509.1
Update Date:2010-09-23
Keywords:

Solution Type  FAB (standard) Sure

Solution  1020509.1 :   FCO A0304-1: T6300 Blades with down rev foureyes chip firmware causing erroneous Failed, Hot Insertion and Removal messages.  


Related Items
  • Sun Blade T6300 Server Module
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Hardware Remediation>Reactive

PreviouslyPublishedAs
259708


Bug Id
<SUNBUG: 6720809>, <SUNBUG: 6695705>, <SUNBUG: 6762599>

Product
Sun Blade T6300 Server Module

Date of Resolved Release
06-Jul-2009

Foureyes chip firmware causing erroneous Failed, Hot Insertion and Removal messages (see details below).

Affected X-Options:
 
X5705A / 5714A
X5706A / 5715A
X5707A / 5716A
X5708A / 5717A
 
Affected Parts:
 
541-2317-06 (or below)  0 MB, 6-Core, UltraSPARC T1, 1.0GHz Sun Blade T6300 Server Module
541-2318-06 (or below)  0 MB, 8-Core, UltraSPARC T1, 1.0GHz Sun Blade T6300 Server Module
541-2319-06 (or below)  0 MB, 8-Core, UltraSPARC T1, 1.2GHz Sun Blade T6300 Server Module
541-2320-05 (or below)  0 MB, 8-Core, UltraSPARC T1, 1.4GHz Sun Blade T6300 Server Module

Impact

Sun T6300 Blade (A94) and chassis CMM exhibit erroneous Failed, Hot Insertion and Removal messages.  These messages are completely benign and do not affect system performance, functionality or reliability.  However, some customers may not be comfortable with them, because they create a perception of instability in the blade/chassis product.

Contributing Factors

This issue manifests with the above listed T6300 Blade Module part numbers along with Blade sysFW 6.7.1 (or earlier).
 
The problem is exacerbated by:
 
 - The number of T6300 blades in a system
 - The position of the T6300 Blades in relation to other blade types
 - The version of CMM hardware and CMM firmware in the system
 
These messages will appear on a Sun Blade 6000 chassis with as few as one T6300 Blade.  They become more prevalent as the blade count increases and when other blade types are present, and more so when the T6300 blades are located in slots above those of different blade types (e.g. T6300 in slots 6 and 7 while other blades are in slots 0-5).  Sun Blade 6000 chassis CMM part number 371-1447-05 (or below) can increase susceptibility to the messaging.

Symptoms

When this issue manifests itself, the following messages appear after the SP has been powered on.
 
Examples of T6300 blade messages:
 
  SC Alert: PSU at MP/PS1 has been removed.
  SC Alert: PSU at MP/PS1 has been inserted.
  SC Alert: NEM at MP/NEM0 has been removed.
  SC Alert: PSU at MP/PS0 has been removed.
  SC Alert: PSU at MP/PS0 has been inserted.
  SC Alert: SYS_FAN at MP/FM3/FIN has FAILED.
  SC Alert: SYS_FAN at MP/FM3/FOUT has FAILED.
  SC Alert: SYS_FAN at MP/FM6/FIN has FAILED.
  SC Alert: SYS_FAN at MP/FM6/FOUT has FAILED.
  SC Alert: SYS_FAN at MP/FM7/FIN has FAILED.
  SC Alert: SYS_FAN at MP/FM7/FOUT has FAILED.
  SC Alert: NEM at MP/NEM1 has been removed.
  SC Alert: NEM at MP/NEM1 has been inserted.
  SC Alert: PSU at MP/PS0 has FAILED.
  SC Alert: PSU at MP/PS1 has FAILED.
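
To gauge how often these alerts occur on a given blade, the message lines can be tallied with a short script.  The following is a minimal sketch only: the pattern is derived solely from the sample messages above, so other ALOM alert wording is not covered, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern built from the sample "SC Alert" lines in this document only.
SC_ALERT = re.compile(
    r"SC Alert: (?P<kind>PSU|NEM|SYS_FAN) at (?P<path>MP/\S+) "
    r"(?:has been (?P<action>removed|inserted)|has (?P<failed>FAILED))\."
)

def summarize_alerts(lines):
    """Count matching alerts per (device kind, device path, event)."""
    counts = Counter()
    for line in lines:
        m = SC_ALERT.search(line)
        if m:
            event = m.group("action") or m.group("failed")
            counts[(m.group("kind"), m.group("path"), event)] += 1
    return counts
```

Feeding it the lines of a saved log (for example, a copy of /var/adm/messages) gives a per-device count that can be compared before and after a workaround is applied.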
 
Examples of chassis CMM events:
 
  1319  Thu Jan  1 00:01:56 1970  Chassis  Action  major  Hot insertion of /CH/NEM1
  1318  Thu Jan  1 00:01:56 1970  Chassis  Action  major  Hot insertion of /CH/NEM0
  1317  Thu Jan  1 00:01:56 1970  Chassis  Action  major  Hot insertion of /CH/PS1
  1316  Thu Jan  1 00:01:56 1970  Chassis  Action  major  Hot insertion of /CH/PS0
  1315  Thu Jan  1 00:01:56 1970  Chassis  Action  major  Hot insertion of /CH/BL6
  1314  Thu Jan  1 00:01:56 1970  Chassis  Action  major  Hot insertion of /CH/BL5
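
The CMM event lines can be split into fields in a similar way.  This sketch assumes the column layout seen in the sample above (event ID, timestamp, class, type, severity, text); the field names are labels chosen for illustration, not taken from ILOM documentation.

```python
import re

# Layout assumed from the sample lines: ID, timestamp, class, type, severity, text.
EVENT = re.compile(
    r"^\s*(?P<id>\d+)\s+"
    r"(?P<time>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<cls>\S+)\s+(?P<type>\S+)\s+(?P<severity>\S+)\s+"
    r"(?P<text>.*)$"
)

def parse_cmm_event(line):
    """Return the fields of one event line as a dict, or None if it does not match."""
    m = EVENT.match(line)
    if not m:
        return None
    event = m.groupdict()
    event["id"] = int(event["id"])
    event["time"] = " ".join(event["time"].split())  # collapse padding spaces
    return event

def hot_insertions(lines):
    """List the targets (e.g. /CH/NEM1) of all 'Hot insertion of' events."""
    prefix = "Hot insertion of "
    targets = []
    for line in lines:
        event = parse_cmm_event(line)
        if event and event["text"].startswith(prefix):
            targets.append(event["text"][len(prefix):])
    return targets
```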

Root Cause

T6300 blades are being starved of I2C bus bandwidth and cannot reliably poll for NEM and PSU presence.  The hardware does not provide direct PSU/NEM presence bits, so the ALOM must poll for status across the I2C bus.  ALOM interprets an access timeout as "device not present".  When bus access starvation occurs, an erroneous "device removed" message is therefore reported.
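
The effect of treating a timeout as "not present" can be illustrated with a small model.  This is a sketch of the behavior described above, not ALOM code: the function name and the "ok"/"timeout" poll results are hypothetical.

```python
def presence_events(device, poll_results, initially_present=True):
    """Model a poller that cannot tell a bus timeout from a missing device.

    poll_results is a sequence of "ok" or "timeout" strings, one per poll.
    Returns the removed/inserted messages the poller would report.
    """
    events = []
    present = initially_present
    for result in poll_results:
        seen = (result == "ok")  # a timeout looks exactly like an absent device
        if seen != present:
            state = "inserted" if seen else "removed"
            events.append(f"{device} has been {state}.")
            present = seen
    return events
```

In this model a single missed poll of a device that never left the chassis produces a "removed" message followed by an "inserted" message, which matches the pattern of the paired alerts in the Symptoms section.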

This issue was addressed in Manufacturing via ECO# WO_39440 as of October 10, 2008 and in Services via GSAP 4795 beginning April 30, 2009.

Corrective Action

Workaround:
 
In a chassis configuration populated with 1 to 3 T6300 blades, move these blades to the lower chassis slots and move blades of other types to the higher numbered slots.
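
The slot rule above can be checked against a planned chassis layout with a few lines of script.  This is a minimal sketch: the layout is represented as a list indexed by slot number, "other" stands for any non-T6300 blade, None stands for an empty slot, and placing empty slots last is an arbitrary choice made here.

```python
def t6300_above_other_types(slots):
    """True if any T6300 sits in a higher-numbered slot than a blade of another type."""
    t6300 = [i for i, blade in enumerate(slots) if blade == "T6300"]
    others = [i for i, blade in enumerate(slots) if blade not in (None, "T6300")]
    return bool(t6300 and others) and max(t6300) > min(others)

def suggest_layout(slots):
    """Return a layout with T6300 blades lowest, then other blades, then empty slots."""
    t6300 = [blade for blade in slots if blade == "T6300"]
    others = [blade for blade in slots if blade not in (None, "T6300")]
    empty = [None] * (len(slots) - len(t6300) - len(others))
    return t6300 + others + empty
```

For the example given under Contributing Factors (T6300 in slots 6 and 7, other blades in slots 0-5), the check returns True and the suggested layout puts the two T6300 blades in slots 0 and 1.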
 
Also, ensure that the CMM is at p/n 371-1447-06 (or above), which provides I2C bus bridge FourEyes chip firmware v1.3.  Further ensure that CMM ILOM firmware is at 2.0.3.2 (or later), which reduces I2C bus traffic by utilizing VLAN protocol.
 
As another temporary T6300 blade workaround, to keep the blade OS /var/adm/messages file from being inundated with false messages, set sys_eventlevel to 0 or 1 in the T6300 ALOM.  This suppresses logging of the messages to the OS /var/adm/messages file.  The insert/remove messages are "major" messages; level 1 logs only critical messages to /var/adm/messages, and level 0 logs no messages at all.  Level 1 is recommended so that critical messages are still logged.  Note that this does not stop the messages from being logged on the CMM or the T6300 blade SPs, where they cannot be suppressed.
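
The effect of the level setting can be sketched as a simple threshold filter.  Levels 0 and 1 behave as described above; the meanings given here to levels 2 and 3, and the "minor" severity, are assumptions made for illustration and should be checked against the ALOM documentation.

```python
# Rank 1 is the most severe.  "minor" and levels above 1 are assumptions.
SEVERITY_RANK = {"critical": 1, "major": 2, "minor": 3}

def logged_to_os(severity, sys_eventlevel):
    """Return True if a message of this severity reaches /var/adm/messages."""
    return SEVERITY_RANK[severity] <= sys_eventlevel
```

Under this model, logged_to_os("major", 1) is False, so the false insert/remove messages are dropped, while logged_to_os("critical", 1) is True.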
 
Resolution:
 
Hot Swappable:
No

Upon failure only, replace as follows:
 
 . replace 541-2317-06 (or below) with 541-2317-07 (or above)
 . replace 541-2318-06 (or below) with 541-2318-07 (or above)
 . replace 541-2319-06 (or below) with 541-2319-07 (or above)
 . replace 541-2320-05 (or below) with 541-2320-06 (or above)
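
The replacement list above can be expressed as a lookup on the part number's dash level.  This is a sketch only, with a hypothetical function name; because FRUID dash-level data and revision labels can be inaccurate (see the identification procedure later in this document), it is suited to classifying a part number that is already known, not to identifying affected blades.

```python
# Minimum dash level that carries the fix, taken from the list above.
MIN_FIXED_DASH = {"541-2317": 7, "541-2318": 7, "541-2319": 7, "541-2320": 6}

def replacement_for(part_number):
    """Return the replacement part string, or None if no replacement applies."""
    base, _, dash = part_number.rpartition("-")
    if base not in MIN_FIXED_DASH:
        return None  # not a part covered by this FCO
    if int(dash) >= MIN_FIXED_DASH[base]:
        return None  # already at a fixed dash level
    return f"{base}-{MIN_FIXED_DASH[base]:02d} (or above)"
```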
 
Affected T6300 blade modules must be replaced as outlined above.  The replacement modules include I2C bridge chip firmware v1.4 (or later).  This firmware is not field upgradeable.
 
Firmware v1.4 contains I2C bus Dynamic Arbitration and Dynamic Priority features (originally released in v1.3) that are needed for Round Robin Bus Access.  All blades must use this polling mechanism to ensure fair access to the I2C Bus and to prevent the timeouts that produce the false messaging.
 
In addition to the required motherboard levels above, the System Firmware must also be upgraded to sysfw 6.7.2 (or later), which contains ALOM code to enable the dynamic arbitration/priority feature within the bridge chip.
 
Earlier-production T6300 blades use a static priority scheme, based upon slot position, for I2C Bus access.
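
The difference between static priority and round robin access can be seen in a toy bus model.  This is an illustrative simulation, not the bridge chip's algorithm: one request is granted per cycle, each slot raises a new request every fixed number of cycles, and the assumption that lower-numbered slots win under static priority is inferred from the slot-position behavior described in this document.

```python
def simulate(periods, cycles, policy):
    """Count bus grants per slot.

    periods: {slot: k}, meaning the slot raises a new request every k cycles.
    policy:  "static" (lowest waiting slot always wins) or "round_robin".
    """
    slots = sorted(periods)
    pending = {s: 0 for s in slots}
    granted = {s: 0 for s in slots}
    pointer = 0  # index of the slot that has first claim under round robin
    for t in range(cycles):
        for s in slots:
            if t % periods[s] == 0:
                pending[s] += 1
        waiting = [s for s in slots if pending[s]]
        if not waiting:
            continue
        if policy == "static":
            winner = waiting[0]
        else:
            # First waiting slot at or after the pointer, wrapping around.
            winner = min(waiting, key=lambda s: (slots.index(s) - pointer) % len(slots))
            pointer = (slots.index(winner) + 1) % len(slots)
        pending[winner] -= 1
        granted[winner] += 1
    return granted
```

With three continuously requesting blades in slots 0, 1 and 6 over 30 cycles, the static policy gives all 30 grants to slot 0 and none to the others, while round robin gives each slot 10.  A blade that never gets the bus is one whose presence polls time out.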
 
The Sun Blade 6000 CMM should be at Sun p/n 371-1447-06 (or above), and CMM ILOM firmware should be at 2.0.3.2 (or later).  Please note that only a very small number of Sun Blade 6000 CMMs at Sun p/n 371-1447-05 (or below) were shipped to the field.  If you are affected by this FCO and your Sun Blade 6000 CMM is at Sun p/n 371-1447-05 (or below), replace it with Sun Blade 6000 CMM p/n 371-1447-06 (or above).  To check the CMM, log in to the CMM as sunservice, as shown in the following example.

telnet dtnts214-248 7031
Trying 10.6.214.248...
Connected to dtnts214-248.sfbay.sun.com.
Escape character is '^]'.

SUNCMM00144F6B9F76 login: sunservice
Password:

Copyright 2008 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.

WARNING: The "sunservice" account is provided solely to allow
Sun Services to perform diagnosis and recovery tasks. Customer use of
the "sunservice" account may interfere with the correct operation of
ILOM and is not supported other than to perform recovery procedures as
documented by Sun Microsystems. Normal ILOM operations must be performed
using other (non-"sunservice") accounts. Further usage of the "sunservice"
account implies your agreement with these terms.

[(flash)root@SUNCMM00144F6B9F76:~]#i2ct -t --info
Four-Eyes v1.3
reg->STATUS@0x02 = 0x05
reg->BRID@0x06 = 0x02
reg->NXTBRID@0x07 = 0x00
reg->LSTBRID@0x08 = 0x00
reg->FLTRCTL@0x03 = 0x00
reg->DAID@0x05 = 0xE0
reg->DACTL@0x04 = 0x03
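
If the output of the command above is captured to a file, the Four-Eyes version line can be checked with a short script.  This is a sketch that assumes the output format shown in the example; the function names are hypothetical, and the v1.3 minimum reflects the CMM firmware level stated in the Workaround section.

```python
import re

def four_eyes_version(output):
    """Return (major, minor) from a 'Four-Eyes vX.Y' line, or None if absent."""
    m = re.search(r"Four-Eyes v(\d+)\.(\d+)", output)
    return (int(m.group(1)), int(m.group(2))) if m else None

def parse_registers(output):
    """Collect 'reg->NAME@0xAA = 0xVV' lines into {NAME: value}."""
    pairs = re.findall(r"reg->(\w+)@0x[0-9A-Fa-f]+ = 0x([0-9A-Fa-f]+)", output)
    return {name: int(value, 16) for name, value in pairs}

def cmm_bridge_ok(output, minimum=(1, 3)):
    """True if the reported Four-Eyes version is at or above the minimum."""
    version = four_eyes_version(output)
    return version is not None and version >= minimum
```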

Important!  Defective blades should be returned, with "FCO 304" written on the Defective Material Tag, as soon as possible to avoid any material availability issues.

Identification of Affected Parts (how to):
 
The following procedure is the only accurate method to identify affected blade(s):

 1) Access the SP of the Target blade
 2) Type "shownetwork"
 3) Set the SP to ssqa mode, type...

      sc> setsc sc_ssqamode true xyz  (xyz = the last nibble of each of the last three bytes of the MAC address)

 4) Get the Four Eyes firmware version...

      sc> i2cp 0x20 3 2 0 0 1.

    If the command returns a value == 12, the blade is impacted as described above.

    If the command returns a value == 14, the blade is not affected.

 5) Upon finishing the data gathering above the sc_ssqamode parameter needs to be
    set back to false.

      sc> setsc sc_ssqamode false
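
The two manual steps in the procedure above (deriving xyz and reading the result) can be sketched as helper functions.  This rests on assumptions: xyz is read here as the low-order hex digit of each of the final three MAC address bytes, in order, and the returned value is compared as text because the document does not state its number base.  The function names and the sample MAC address are hypothetical.

```python
def ssqa_key(mac):
    """Build xyz from a MAC address such as '00:14:4f:6b:9f:76'.

    Assumed reading: the last hex digit of each of the last three bytes.
    """
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"unexpected MAC address format: {mac!r}")
    return "".join(o[1] for o in octets[-3:])

def classify_blade(value):
    """Map the value returned in step 4 to a verdict."""
    verdicts = {"12": "affected", "14": "not affected"}
    # Any other value is not covered by this document.
    return verdicts.get(str(value).strip(), "unknown")
```

For the sample address above, ssqa_key returns "bf6".  Whether ALOM expects upper or lower case is not stated in this document.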

As a final note, please do not use the showfru command, the ipmitool fru command, or visual inspection to determine affected blades.  FRUID dash-level information and blade revision labeling have been found to be inaccurate.

Hardware Remediation and Material Availability Details:

All Regions/Timezones were materially ready at the time of publication of this knowledge asset.

Check with your Logistics Representative or TZ / Country / Area FCO Manager for more information with regard to material availability and the parts ordering process for this FCO.

References:
 
  BugID: 6720809, 6695705, 6762599
  Sun Alert: 248186
  ECO: WO_39440
  GSAP: 4550, 4795



Modification History
Changes made since initial publication.

09-Jul-2009
  • Clarified statement about proper dash level of CMM in Resolution section.
31-Jul-2009
  • Added instructions in Resolution section on how to check CMM.

Internal Contributor/submitter
[email protected]

Internal Eng Responsible Engineer
[email protected] Responsible Manager: [email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Eng Business Unit Group
SSG ES (Enterprise Systems)

Internal Sun Alert & FAB Admin Info
21-May-2009: Finalized draft and sent to FCO Tiger Team for review.
06-Jul-2009: Material Readiness acquired - sending to Publish.


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.