Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1003308.1
Update Date:2010-09-22

Solution Type  Problem Resolution Sure

Solution  1003308.1 :   Sun Fire[TM]12K/15K/E20K/E25K: esmd warning; A power failure has been detected on a redundant power supply at ...  


Related Items
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
204588


Applies to:

Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms

Symptoms

The following error messages appear in the system controller platform messages
file ($SMSLOGGER/platform/messages):
Mar 10 11:33:57 2002 orlocn01-sc0 esmd[506]: [2000 171363006794824 ERR SysControl.cc 1371]
A power failure has been detected on a redundant power supply at ps1_power_good_l, located on CSB at CS1. SCHEDULE REPLACEMENT of CSB at CS1 as soon as possible.
If an additional failure occurs on this supply it may crash any dependent domain(s).

The same failure can be reported on an Expander Board.
esmd[...]: [2000 4153696025929986 ERR SysControl.cc 1581]
A power failure has been detected on a redundant power supply at ps0_power_good_l; located on EXB at EX7. SCHEDULE REPLACEMENT of EXB at EX7 as soon as possible.
If an additional failure occurs on this supply it may crash any dependent domain(s).

or
esmd[...]: [0 8353244896562214 ERR SysControl.cc 1579]
A failure has been detected on redundant PS at ps1_power_good_l; located on EXB at EX6. SCHEDULE REPLACEMENT of EXB at EX6 as soon as possible to
restore redundancy.

If this problem occurs on an IO Board, the message is:
esmd[....]: [2000 3006117574587 ERR SysControl.cc 2772]
A failure has been detected on redundant PS at +1.0-2.55_vdc0_ok; located on HPCI+ at IO17. SCHEDULE REPLACEMENT of HPCI+ at IO17 as soon as possible to restore redundancy.
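
When a platform messages file is long, a small script can pull the failing signal and the board to replace out of each warning. The following is a minimal Python sketch, not an SMS tool: the function name find_ps_failures and the regular expression are assumptions derived only from the sample messages above, and other SMS releases may word the message differently.

```python
import re

# Pattern assumed from the sample messages above; it covers both wordings
# ("power failure ... redundant power supply" and "failure ... redundant PS").
PS_FAILURE = re.compile(
    r"A (?:power )?failure has been detected on (?:a )?redundant "
    r"(?:power supply|PS) at (?P<signal>\S+?)[,;] located on "
    r"(?P<board>\S+) at (?P<location>\w+)\."
)


def find_ps_failures(text):
    """Return (signal, board, location) tuples for each warning found."""
    return [
        (m.group("signal"), m.group("board"), m.group("location"))
        for m in PS_FAILURE.finditer(text)
    ]


# Message bodies copied from the examples above.
sample = (
    "A power failure has been detected on a redundant power supply at "
    "ps1_power_good_l, located on CSB at CS1. SCHEDULE REPLACEMENT of CSB "
    "at CS1 as soon as possible.\n"
    "A failure has been detected on redundant PS at +1.0-2.55_vdc0_ok; "
    "located on HPCI+ at IO17. SCHEDULE REPLACEMENT of HPCI+ at IO17 as "
    "soon as possible to restore redundancy.\n"
)

for signal, board, location in find_ps_failures(sample):
    print(f"{board} at {location}: {signal}")
```

The sketch assumes each message body sits on a single line up to the board location; a message wrapped before that point would need to be rejoined first.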

Cause

Expander Boards, Centerplane Support Boards, and IO Boards each have two redundant power supplies, reported in the preceding messages as ps0_power_good_l and ps1_power_good_l. The esmd message identifies the failing supply and states the action to take.

The system can survive the loss of one of these power supplies, because the remaining supply provides enough power to run. Redundancy is lost, however, so replacement of the component must be scheduled as soon as possible.

Solution

The power supply warning message itself prescribes the action to take, for example "SCHEDULE REPLACEMENT of CSB at CS1 as soon as possible" or "SCHEDULE REPLACEMENT of EXB at EX7 as soon as possible".

Additional notes:

When such a failure occurs, the component is automatically added to the
ASR Blacklist file.

esmd[...]: [0 47269795990296515 NOTICE SysControl.cc 5296]
Component CSB at CS1 has been blacklisted

This can be confirmed via the 'showcomponent -a' command.

After the replacement, remember to remove the component from the ASR Blacklist file with the 'enablecomponent -a' command.
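
To keep track of which components need to be taken off the ASR Blacklist once they are replaced, the blacklist notices can be collected from the platform messages file. This is a minimal Python sketch under stated assumptions: the function name blacklisted_components is hypothetical, the pattern is taken only from the sample NOTICE above, and the script does not run any SMS command itself.

```python
import re

# Wording assumed from the sample NOTICE message above.
BLACKLISTED = re.compile(
    r"Component (?P<board>\S+) at (?P<location>\w+) has been blacklisted"
)


def blacklisted_components(text):
    """Return unique (board, location) pairs reported as blacklisted, in log order."""
    seen = []
    for m in BLACKLISTED.finditer(text):
        item = (m.group("board"), m.group("location"))
        if item not in seen:
            seen.append(item)
    return seen


# Log lines copied from the example above.
sample = (
    "esmd[...]: [0 47269795990296515 NOTICE SysControl.cc 5296]\n"
    "Component CSB at CS1 has been blacklisted\n"
)

for board, location in blacklisted_components(sample):
    print(f"{board} at {location}: remove from the ASR Blacklist after replacement")
```

The list is only a reminder built from the log; 'showcomponent -a' remains the way to confirm the current blacklist state.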

In the case of a defective CSB, the component can be proactively removed from the system via the 'setbus' command and then powered off to prevent any further fatal impact on the system.

Before using this command, refer to the 'setbus' manual page or to the Sun Fire[TM] 12K/15K/E20K/E25K Systems Service Manual to make sure it is used properly.

Important note

The behavior is different if the failure is detected while the component is being powered on (by a poweron or setkeyswitch operation). In this case, the failure of one of the power supplies is considered fatal and the power-on aborts.

poweron[...]: [6121 1491355281854766 ERR L2PowerControl.cc 325]
Power supply 0 is indicating a BAD(1) reading on EXB at EX13
poweron[...]: [6200 1491355296710844 ERR EXBPowerControl.cc 525]
Failed to power on component EXB at EX13
poweron[...]: [6214 1491355299261776 ERR poweronApp.cc 1356]
Attempt to poweron EXB at EX13 failed

In this case as well, replacement of the component (EX13 in this example) must be scheduled as soon as possible.
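
The power-on case can be told apart from the runtime warning by looking for the poweron messages rather than the esmd ones. The following minimal Python sketch pairs a bad power supply reading with a failed power-on attempt for the same location; the function name aborted_poweron and both patterns are assumptions based only on the sample log lines above.

```python
import re

# Patterns assumed from the sample poweron messages above.
BAD_READING = re.compile(
    r"Power supply (?P<supply>\d+) is indicating a BAD\(\d+\) reading on "
    r"(?P<board>\S+) at (?P<location>\w+)"
)
POWERON_FAILED = re.compile(
    r"Attempt to poweron (?P<board>\S+) at (?P<location>\w+) failed"
)


def aborted_poweron(text):
    """Map location -> supply number for power-ons that aborted on a bad supply."""
    failed = {m.group("location") for m in POWERON_FAILED.finditer(text)}
    return {
        m.group("location"): int(m.group("supply"))
        for m in BAD_READING.finditer(text)
        if m.group("location") in failed
    }


# Log lines copied from the example above.
sample = (
    "poweron[...]: [6121 1491355281854766 ERR L2PowerControl.cc 325]\n"
    "Power supply 0 is indicating a BAD(1) reading on EXB at EX13\n"
    "poweron[...]: [6200 1491355296710844 ERR EXBPowerControl.cc 525]\n"
    "Failed to power on component EXB at EX13\n"
    "poweron[...]: [6214 1491355299261776 ERR poweronApp.cc 1356]\n"
    "Attempt to poweron EXB at EX13 failed\n"
)

print(aborted_poweron(sample))
```

Requiring both messages for the same location avoids reporting a bad reading that did not actually stop the power-on.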

Internal comments
The following is strictly for the use of Sun employees:

*Note: ps0_power_good_l and ps1_power_good_l are the output signals
of the 1+1 redundant D116 DC-DC converters; each is set to HIGH when its
converter cannot deliver power or while its DC outputs are out of spec.

Reference:

Troubleshooting <Document 1017705.1>

Sun Fire[TM] 12K/15K: Expander Board Power Supply Failure May Cause System or
IO Boards to Lose Power


Product
Sun Fire 15K Server
Sun Fire 12K Server
Sun Fire E25K Server
Sun Fire E20K Server

Keywords
poweron, ps1_power_good_l, ps0_power_good_l, power failure, exb, csb, io

Previously Published As
47302

Change History
Date: 2010-04-26
User Name: Volkmar Grote 117021
Action: Reviewed for Content Team
Comment: minor formatting and typo corrections

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.