Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1000049.1
Update Date: 2010-09-17
Keywords:

Solution Type  FAB (standard) Sure

Solution  1000049.1 :   On Sun Blade 8000/8000P chassis, the Rear Fan Module may cycle OK on and off repeatedly.  


Related Items
  • Sun Blade 8000 P System
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive

PreviouslyPublishedAs
200065


Product
Sun Blade 8000 Modular System

Bug Id
<SUNBUG: 6501084>


Impact

When a rear fan module is hot-plugged into the system, the fan CRU being inserted may fail to start. The system then sees the CRU as alternately present and absent in a repeating cycle, and the "OK" LED associated with the fan CRU toggles between green and off. The fan does not come online.

This problem is specific to fans made by Delta (black triangle logo) with part # PFC1248DE.

The system may respond by reporting a fan fault condition and increasing the speed on all the other fans in the system to maximum. The affected fan will appear as absent in the system inventory. The other fans will cycle up and down in speed repeatedly while the affected fan is in this off-line state.


Contributing Factors

Product:

  • A81 Sun Blade 8000 Modular System
  • A82 Sun Blade 8000 P Modular System

with Parts:

  • F541-0385-01 CRU, Rear Fan, RoHS:Y

The problem is exacerbated in minimally configured systems. Systems with only one or two blades installed are more susceptible than a fully loaded system, because small configurations do not allow much airflow through the fans, which allows a large back pressure to build up in the fan plenum area.

The affected fans are made by Delta. They carry a black triangle logo that is visible through the exhaust opening of the fan CRU. The fan part number, PFC1248DE, is also visible through the grate.
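
The identification rule above can be sketched as a small check. This is only an illustration: the function names and the idea of typing in the vendor and part number read off the fan are assumptions, since the bulletin describes a visual inspection, not a software query. The two-blade threshold reflects the "1 or 2 blades" remark under Contributing Factors.

```python
# Values taken from the bulletin text.
AFFECTED_FAN_VENDOR = "Delta"
AFFECTED_FAN_PART = "PFC1248DE"
# Hypothetical threshold name; "1 or 2 blades" is what the bulletin calls lightly loaded.
LIGHT_LOAD_MAX_BLADES = 2


def is_affected_fan(vendor, fan_part_number):
    """Return True if the fan read off the label matches the affected Delta part."""
    return (vendor.strip().lower() == AFFECTED_FAN_VENDOR.lower()
            and fan_part_number.strip().upper() == AFFECTED_FAN_PART)


def insertion_risk(vendor, fan_part_number, blades_installed):
    """Summarize how likely a hot-plug start-up problem is (illustrative only)."""
    if not is_affected_fan(vendor, fan_part_number):
        return "not affected"
    if blades_installed <= LIGHT_LOAD_MAX_BLADES:
        return "affected, higher risk (lightly loaded chassis)"
    return "affected"


print(insertion_risk("Delta", "PFC1248DE", blades_installed=1))
```

A result other than "not affected" suggests blocking the reverse airflow while inserting the fan, as described under Resolution.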


Symptoms

The affected fan CRU LED toggles between green (OK) and off. The speeds of the other fans oscillate between maximum and normal roughly every 10 seconds.

The fan CRU will display as Absent in the GUI chassis view.

The following entries are added to the CMM unified log and to the blade SEL. The SEL fills up quickly, which then requires an F1 BIOS intervention:

-> show list
/CMM/logs/event/list
  Targets:
Properties:
Commands:
show
ID     Date/Time                 Class     Type      Severity
----- ------------------------ -------- -------- --------
97710  Tue Dec  5 18:41:33 2006  IPMI      Log       critical
ID = 18ef : 12/05/2006 : 18:41:33 : Entity Presence :
/LCMM/RFM1_PRSNT : Device Absent
97709  Tue Dec  5 18:41:27 2006  Chassis   Action    major
Hot removal of /CH/RFM1
97708  Tue Dec  5 18:41:25 2006  IPMI      Log       critical
ID = 18ee : 12/05/2006 : 18:41:25 : Fan : /RFM1/FAN1_ERR :
Predictive Failure Asserted
97707  Tue Dec  5 18:41:25 2006  IPMI      Log       critical
ID = 18ed : 12/05/2006 : 18:41:25 : Entity Presence :
/LCMM/RFM1_PRSNT : Device Present
97706  Tue Dec  5 18:41:25 2006  Chassis   Action    major
Hot insertion of /CH/RFM1
97705  Tue Dec  5 18:41:24 2006  Chassis   Action    major
Hot removal of /CH/RFM1
97704  Tue Dec  5 18:41:22 2006  Chassis   Action    major
Hot insertion of /CH/RFM1
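
The insertion/removal pattern in the log above can be detected mechanically. The following is a minimal sketch, assuming the event list has been saved as text in the layout shown; the function names, the two-cycle minimum, and the 60-second window are illustrative assumptions, not values from the bulletin.

```python
import re
from collections import defaultdict
from datetime import datetime

# Chassis "Action" entries copied from the log excerpt above.
SAMPLE_LOG = """\
97709  Tue Dec  5 18:41:27 2006  Chassis   Action    major
Hot removal of /CH/RFM1
97706  Tue Dec  5 18:41:25 2006  Chassis   Action    major
Hot insertion of /CH/RFM1
97705  Tue Dec  5 18:41:24 2006  Chassis   Action    major
Hot removal of /CH/RFM1
97704  Tue Dec  5 18:41:22 2006  Chassis   Action    major
Hot insertion of /CH/RFM1
"""

# Header line: event ID followed by a timestamp such as "Tue Dec  5 18:41:27 2006".
HEADER = re.compile(r"^(\d+)\s+(\w{3}\s+\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\d{4})\s+")
HOTPLUG = re.compile(r"^Hot (insertion|removal) of (/CH/RFM\d+)\s*$")


def hotplug_events(log_text):
    """Return (event_id, timestamp, target, action) tuples, oldest first."""
    events = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # Collapse repeated spaces so strptime sees single separators.
            stamp = datetime.strptime(" ".join(header.group(2).split()),
                                      "%a %b %d %H:%M:%S %Y")
            current = (int(header.group(1)), stamp)
            continue
        hotplug = HOTPLUG.match(line)
        if hotplug and current is not None:
            events.append((current[0], current[1], hotplug.group(2), hotplug.group(1)))
    return sorted(events)


def flapping_modules(events, min_cycles=2, window_s=60):
    """List targets with at least min_cycles insert/remove pairs inside window_s."""
    by_target = defaultdict(list)
    for _event_id, stamp, target, action in sorted(events):
        by_target[target].append((stamp, action))
    flapping = []
    for target, seq in by_target.items():
        insertions = sum(1 for _, action in seq if action == "insertion")
        removals = sum(1 for _, action in seq if action == "removal")
        span = (seq[-1][0] - seq[0][0]).total_seconds()
        if min(insertions, removals) >= min_cycles and span <= window_s:
            flapping.append(target)
    return flapping


print(flapping_modules(hotplug_events(SAMPLE_LOG)))
```

A module reported by this check is cycling between present and absent in the way described under Symptoms; the thresholds should be adjusted to suit the log being examined.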

 


Root Cause

When a fan is removed, a large amount of air flows into the vacant fan opening. When a fan is installed, this recirculating airflow is directed through the fan CRU and causes the fans within it to start spinning backwards. When the fan power connector finally makes contact, the fan's soft-start circuit tries to start the fan, but the fan may be spinning backwards so quickly that it cannot get up to speed before the start-up attempt times out. This time-out is a locked-rotor detection mechanism provided by the fan motor controller to protect the fan motor in case it is mechanically stuck.

When the chassis is lightly loaded (just one or two blades), there is not much airflow through the fans. This makes the vacuum in the fan plenum stronger, so the fan being inserted has more difficulty starting. This has been noted more often on SB8000 P (codename A14) chassis, but can also happen on SB8000 (codename A19) chassis.

The fan is being modified to provide a braking function that slows its reverse rotation before start-up is attempted. This has been proven effective with the samples provided. The updated fan part number is not known at this time.
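
The start-up race described above can be illustrated with a toy model. This is a sketch only: it assumes constant acceleration, and every number in it (acceleration, target speed, time-out, reverse speed) is hypothetical and not taken from the fan's specification. It also assumes that a braking function would bring the rotor to rest before the locked-rotor timer begins.

```python
def time_to_target(initial_rpm, accel_rpm_per_s, target_rpm):
    """Seconds needed to reach target_rpm under constant acceleration."""
    return max(0.0, (target_rpm - initial_rpm) / accel_rpm_per_s)


def starts_before_timeout(initial_rpm,
                          accel_rpm_per_s=1000.0,  # hypothetical value
                          target_rpm=1500.0,       # hypothetical value
                          timeout_s=2.0,           # hypothetical value
                          brake_first=False):
    """Return True if the fan reaches target speed before the locked-rotor time-out.

    A negative initial_rpm models a fan that is being driven backwards
    by the reverse airflow at the moment power is applied.
    """
    if brake_first and initial_rpm < 0:
        # Assumed behaviour of the braking function: stop the reverse spin first.
        initial_rpm = 0.0
    return time_to_target(initial_rpm, accel_rpm_per_s, target_rpm) <= timeout_s


print(starts_before_timeout(0.0))                         # fan at rest
print(starts_before_timeout(-3000.0))                     # fan spinning backwards
print(starts_before_timeout(-3000.0, brake_first=True))   # with braking
```

With these made-up numbers, a fan at rest needs 1.5 seconds and starts, a fan spinning backwards at 3000 RPM needs 4.5 seconds and times out, and braking first restores the at-rest case. Blocking the reverse airflow by hand has the same effect as reducing the initial reverse speed.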


Resolution

A fan that has trouble starting in this way can be helped by blocking the reverse airflow through it. Blocking the fan grate with your hand, or with a solid object of similar size and shape to the Rear Fan, should prevent enough reverse airflow for the fan to start properly.

Power cycling the chassis (stop all blades, stop /CH from CMM, remove all power cords, and re-plug after 1 minute) will also fix the problem.


Modification History
Date: 29-MAR-2007
  • minor modifications made to Comments and FRU description

Date: 08-JUN-2007
  • Updated Corrective Actions section


Previously Published As
102861
Internal Comments


No FCO is planned. This is a reactive, fix-on-fail action, and the problem can be resolved using the procedure outlined above, i.e., covering the fan with your hand when inserting it.



The issue is expected only when a customer removes a fan module and reinserts it while the system is live, which would normally occur only during a fan CRU replacement. Updated fans that are not affected by this issue are being shipped in new systems; however, a small quantity of affected fans may still be in spares stock.



It is recommended that the customer or field engineer check the fan model before inserting a replacement Rear Fan and, if it is affected, cover it by hand as described above, or make it a best practice to follow this procedure for all Rear Fan replacements.


Internal Contributor/submitter
[email protected]

Internal Eng Business Unit Group
NSG (Network Systems Group)

Internal Eng Responsible Engineer
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Kasp FAB Legacy ID
102861

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2007-03-28
Avoidance: Service Procedure
Responsible Manager: Nick Laplaca
Original Admin Info: null

Internal SA-FAB Eng Submission
Field Action Bulletin (FAB) BLANK Submittal Template

For Preliminary, Non-Hardware and Hardware FABs --

After following instructions for drafting a Field Action Bulletin,
send this ASCII Text filled-in template to:

[email protected]

The email Subject line should be similar to the following:

Draft Field Action Bulletin / Synopsis ...

---------------------------------------------------------------

Synopsis: Rear Fan Module cycles OK on and off repeatedly


Avoidance:
[ ] Binary
[ ] T-Patch
[ ] Patch
[ ] Firmware
[ ] Hardware
[ ] Upgrade
[ ] Workaround
[ ] Reconfiguration
[X] Service Procedure
[ ] None
[ ] Hardware Replacement (Must follow SMI FCO Process) *

* Note: Please follow the link below for details on how
to submit an FCO:

http://sunwebcollab.central.sun.com/gm/folder-1.11.811470

Implementation:

[X] Reactive (Upon Failure)
[ ] Controlled Proactive (H/W FABs require Customer List and
Customer Letter)
[ ] Mandatory (Requires Customer List and Customer Letter)


Product: (Mktg Part Number / Description)
A81 Sun Blade[tm] 8000 Modular System
A82 Sun Blade[tm] 8000 P Modular System

Affected X-Options: (Xoption Part Number / Description)

Affected Parts: (FRU/CRU Part Number / Description)
F541-0385-01 FRU ASSY,REAR FAN MODULE


Issue Description:

Impact

[This should explain the impact to the running system in
general and the impact to the affected component in particular;
in terms of outage, downtime, loss of availability, loss
of data, etc. State the actual impact, for example,
the issue causes a system panic, reset, hang etc. If this is
a serviceability issue, state how it affects the ability to
service or maintain the product.]

When a rear fan module is hot-plugged into the system, the
fan CRU being inserted may fail to start. The system then
sees the CRU as alternately present and absent in a repeating
cycle, and the "OK" LED associated with the fan CRU toggles
between green and off. The fan does not come online.

This problem is specific to fans made by Delta (black
triangle logo) with part # PFC1248DE.

The system may respond by reporting a fan fault condition and
increasing the speed on all the other fans in the system to
maximum. The affected fan will appear as absent in the system
inventory. The other fans will cycle up and down in speed
repeatedly while the affected fan is in this off-line state.

Contributing Factors

[List anything, such as specific operating environments and/or
configurations that may contribute to the issue.]

The problem is exacerbated in minimally configured systems.
Systems with only one or two blades installed are more
susceptible than a fully loaded system, because small
configurations do not allow much airflow through the fans,
which allows a large back pressure to build up in the fan
plenum area.

Symptoms

[Provide exact error messages and where and when the error
messages are likely to occur.]

The affected fan CRU LED toggles between green (OK) and off.
The speeds of the other fans oscillate between maximum and
normal roughly every 10 seconds.

The fan CRU will display as Absent in the GUI chassis view.

The following entries are added to the CMM unified log and to
the blade SEL. The SEL fills up quickly, which then requires
an F1 BIOS intervention:

-> show list

/CMM/logs/event/list
Targets:

Properties:

Commands:
show

ID Date/Time Class Type Severity
----- ------------------------ -------- -------- --------
97710 Tue Dec 5 18:41:33 2006 IPMI Log critical
ID = 18ef : 12/05/2006 : 18:41:33 : Entity Presence : /LCMM/RFM1_PRSNT :
Device Absent
97709 Tue Dec 5 18:41:27 2006 Chassis Action major
Hot removal of /CH/RFM1
97708 Tue Dec 5 18:41:25 2006 IPMI Log critical
ID = 18ee : 12/05/2006 : 18:41:25 : Fan : /RFM1/FAN1_ERR :
Predictive Failure Asserted
97707 Tue Dec 5 18:41:25 2006 IPMI Log critical
ID = 18ed : 12/05/2006 : 18:41:25 : Entity Presence : /LCMM/RFM1_PRSNT :
Device Present
97706 Tue Dec 5 18:41:25 2006 Chassis Action major
Hot insertion of /CH/RFM1
97705 Tue Dec 5 18:41:24 2006 Chassis Action major
Hot removal of /CH/RFM1
97704 Tue Dec 5 18:41:22 2006 Chassis Action major
Hot insertion of /CH/RFM1


Root Cause

[This is ultimate cause of the issue and can be provided in
engineering/technical terms.]

[Explain how and when the issue was resolved by manufacturing,
engineering or by the vendor.]

When a fan is removed, a large amount of air flows into the
vacant fan opening. When a fan is installed, this
recirculating airflow is directed through the fan CRU and
causes the fans within it to start spinning backwards. When
the fan power connector finally makes contact, the fan's
soft-start circuit tries to start the fan, but the fan may be
spinning backwards so quickly that it cannot get up to speed
before the start-up attempt times out. This time-out is a
locked-rotor detection mechanism provided by the fan motor
controller to protect the fan motor in case it is
mechanically stuck.

When the chassis is lightly loaded (just one or two blades),
there is not much airflow through the fans. This makes the
vacuum in the fan plenum stronger, so the fan being inserted
has more difficulty starting. This has been noted more often
on SB8000 P (codename A14) chassis, but can also happen on
SB8000 (codename A19) chassis.

The fan is being modified to provide a braking function that
slows its reverse rotation before start-up is attempted. This
has been proven effective with the samples provided. The
updated fan part number is not known at this time.

Corrective Action:

Supported Workaround (if available)

Final Resolution

[Provide recommended action for Sun Field personnel to
follow in order to implement this fix in the field.]

[A detailed step-by-step procedure for implementing the fix,
or high level instructions together with pointers to specific
documentation pages.]

[List all relevant product manuals, documentation and URL's which
will help to implement the Corrective Action.]

A fan that has trouble starting in this way can be helped by
blocking the reverse airflow through it. If the speed of the
other chassis fans is relatively low, blocking the fan grate
with your hand may be sufficient. If the fan speeds are
higher, a better method is to put a piece of paper or
cardboard over the fan. The paper will initially be sucked
against the fan CRU and will blow off once the fan starts.

Power cycling the chassis (stop all blades, stop /CH from CMM,
remove all power cords, and re-plug after 1 minute) will also
fix the problem.

Identification of Affected Parts (how to):

[State whether the affected components require visual inspection.]

[Explain how the field would identify the "bad" or affected
parts from the "good" or fixed parts.]

[Give precise commands, syntax, and sample output.]

[State which Explorer files/directories might also be utilized
for determining affected parts/product.]

The affected fans are made by Delta. These fans have a black
triangle logo which can be viewed from the exhaust opening in
the fan CRU. The fan part number is PFC1248DE which can also
be viewed through the grate.

Hardware Remediation and Material Availability Details:
(For Hardware FABs only)

[List estimated dates seedstock material will be available
in each Timezone. This information should be acquired from
the Services Logistics representative.]


Comments:

[List anything specific to this asset that isn't already
listed above.]

Engineering has designed an "instruction label" that is
attached to fan CRUs held as replacement stock. The
instructions on the label indicate that the label should be
kept in place until after the fan is installed and has
started. Fans that are held as replacement stock will come
with these labels attached. That does not, however, solve the
problem of a customer removing and replacing an existing fan
in their chassis.

References:

* BugID: CR6501084: Delta rear fan modules sometimes experience startup problems
* Escalation ID:
* Sun Alert:
* Pending Patches:
* Resolution Patches:
* Other References:
* Reference Manual:
* Related URL(s):
* ECO:
* GSAP:
* WW Stop Ship:
* Radiance ID:

Contacts:

* Contributor: Oliver Sharwood
* Responsible Engineer: Scott Bleiweiss
* Responsible Manager: Nick Laplaca
* Business Unit Group:

[ ] SSG WGS (Workgroup Systems)
[ ] SSG NSN (Netra Systems and Networking)
[ ] SSG ES (Enterprise Systems)
[ ] SSG SW (Platform Software)
[ ] SSG PNP (Processor)
[X] NSG (Network Systems Group)
[ ] NWS (Network Storage)
[ ] OP/N1 RPE (Operating Platforms/N1 Revenue Product Engin.)
[ ] JPSE (Java Platform Sustaining Engineering)
[ ] JWSSE (Java Web Services Sustaining Engineering)
[ ] USG (User Software Group)
[ ] Other - Please specify
Product_uuid
42c5a02e-c0f1-11da-857a-080020a9ed93|Sun Blade 8000 Modular System

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.