Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1003552.1
Update Date:2010-06-07
Keywords:

Solution Type  Problem Resolution Sure

Solution  1003552.1 :   Sun Fire[TM] 12K/15K/E20K/E25K: SC POST results: 'Power On Selftest not run on last reset'  


Related Items
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
204998


Applies to:

Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms

Symptoms

The following message is reported in the $SMSVAR/adm/platform/messages
(/var/opt/SUNWSMS/adm/platform/messages) file on a Sun Fire[TM]
12K/15K/E20K/E25K System Controller (SC):

Aug 04 16:24:46 2004 SC1 ssd[381]: [0 66349511944 NOTICE SSDWorkArea.cc 38]
SC POST results: 'Power On Selftest not run on last reset'
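
To check whether an SC has logged this notice, the platform log can be searched
for the fixed message text. The following is a minimal sketch and is not part of
SMS: the function name find_post_skipped is hypothetical, and the default path
is the one given above.

```shell
#!/bin/sh
# Hypothetical helper: list "POST not run" notices in an SMS platform log.
# The default path is the one named in this document; pass another file to override it.
find_post_skipped() {
    log_file="${1:-/var/opt/SUNWSMS/adm/platform/messages}"
    # Match only the fixed message text, so the ssd header format does not matter.
    # grep -n prefixes each hit with its line number; exit status 1 means no hits.
    grep -n "SC POST results: 'Power On Selftest not run on last reset'" "$log_file"
}
```

Calling find_post_skipped with no argument on the SC searches the default log;
a non-zero exit status means no such notice was found (or the file could not be
read).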

Changes



Cause

This message means exactly what it states: the System Controller (SC) did not
run Power On Self Test (POST) on its last reset or reboot. As a result, the SC
has not performed basic hardware testing of its own components.

Solution


Resolution
A reboot or reset may be the result of someone manually rebooting the SC, or it
may indicate a more serious issue in which the SC panicked and rebooted, or was
forced down because of a problem.

In any event, if an SC reboots or is reset and comes back up without running
basic hardware diagnostics, a faulty component will go undetected and the SC
can become the MAIN SC again. The platform may then be monitored and controlled
by a possibly defective SC.

The SCs must run the basic hardware diagnostics in SSCPOST so that any errors
detected on the SC's components are reported. SMS can then log those errors to
the $SMSVAR/adm/platform/messages file as it starts up from
/etc/rc3.d/S99sms, and report them to the remote SC. SMS can then take action
against the SC startup as needed, which may include preventing SMS from
starting on an SC that has problems reported by sscpost.

So, if the system does not run hardware tests on an SC when it reboots or
resets, it bypasses the checks built into SMS that may keep a suspect SC from
managing the platform.


Relief/Workaround
System Controllers execute extended POST upon reboot or reset when the
following OBP variables are set as shown:

diag-level=pmax-epvmax
diag-switch?=true
post-on-sir?=true

NOTE: SC1 may have diag-level=pmax-epvmax, while SC0 is set to pmax-epmax.
The difference in this setting is that epvmax is extended diagnostics and
epmax is normal diagnostics. They are set differently so that when both SCs are
powered on and run POST at the same time, SC0 will complete the normal
diagnostics before SC1, ultimately meaning that SC0 will become MAIN SC in SMS.
It's a race to become MAIN and SC0 is given a head start.
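
The settings listed above can be verified from multi-user before a reboot. The
following is a minimal sketch: the function name check_post_settings is
hypothetical, it expects "name=value" lines on standard input (the format that
eeprom prints), and it assumes a POSIX awk (on Solaris, nawk or
/usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
#!/bin/sh
# Hypothetical helper: read "name=value" lines on stdin and report whether the
# three POST-related OBP variables match the values listed in this document.
# The expected diag-level defaults to pmax-epvmax; pass pmax-epmax as the
# first argument for an SC that is deliberately set that way.
check_post_settings() {
    expected_level="${1:-pmax-epvmax}"
    awk -F= -v level="$expected_level" '
        BEGIN {
            want["diag-level"]   = level
            want["diag-switch?"] = "true"
            want["post-on-sir?"] = "true"
        }
        # Remember the value of each variable we care about.
        ($1 in want) { seen[$1] = $2 }
        END {
            bad = 0
            for (name in want) {
                found = (name in seen) ? seen[name] : "(unset)"
                if (found != want[name]) {
                    printf "MISMATCH %s: expected %s, found %s\n", name, want[name], found
                    bad = 1
                }
            }
            # Exit 0 only when all three variables have the expected values.
            exit bad
        }'
}
```

For example, eeprom 'diag-level' 'diag-switch?' 'post-on-sir?' | check_post_settings
prints nothing and returns 0 when all three values match. The variable names are
quoted so that the shell does not treat the ? as a filename wildcard.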

To enable SSCPOST from the OBP prompt and then execute it:

ok setenv diag-level pmax-epvmax
ok setenv diag-switch? true
ok setenv post-on-sir? true
ok reset

To enable SSCPOST from multi-user and then execute it (make sure SC failover
is disabled before rebooting the MAIN SC; otherwise the reboot will cause SMS
to fail over to the SPARE):

# eeprom diag-level=pmax-epvmax
# eeprom diag-switch?=true
# eeprom post-on-sir?=true
# reboot


Additional Information
When an SC has rebooted or been reset without running SSCPOST, you may notice
I2c bus address warnings in $SMSVAR/adm/messages in addition to the following
message:

SC POST results: 'Power On Selftest not run on last reset'

Examples of the I2c warnings:

Aug 4 17:14:31 2004 SC1 hwad[438]: [1123 5036434911859 ERR
I2cComm.cc 410] I2c read time out - bus: 23, address: 25

Aug 4 17:15:25 2004 SC1 hwad[438]: [1123 5090842384614 ERR
I2cComm.cc 410] I2c read time out - bus: 23, address: 22

Bus 23 maps to System Controller 1.
Address 25 and Address 22 are LED control registers.

NOTE: Messages may be on Bus 22 if the SC you have just rebooted is SC0.
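
Because these warnings can be numerous, it may help to count them per bus and
address. The following is a minimal sketch: the function name
summarize_i2c_timeouts and its output format are hypothetical, and the
bus-to-SC labels simply restate the mapping described above.

```shell
#!/bin/sh
# Hypothetical helper: count "I2c read time out" entries per bus/address pair.
# Reads log text on stdin, or from the files named as arguments.
summarize_i2c_timeouts() {
    # An entry may be wrapped across two lines in the log, so only the tail
    # of the message (which carries the bus and address) is matched.
    sed -n 's/.*I2c read time out - bus: \([0-9][0-9]*\), address: \([0-9][0-9]*\).*/\1 \2/p' "$@" |
        sort | uniq -c |
        awk '{
            # uniq -c output: count, bus, address.
            # Mapping as described in this document: bus 23 is SC1; bus 22 is
            # where messages may appear for SC0.
            if ($2 == 23)      sc = "SC1"
            else if ($2 == 22) sc = "SC0"
            else               sc = "unknown"
            printf "bus=%s sc=%s address=%s count=%d\n", $2, sc, $3, $1
        }'
}
```

Each output line gives one bus/address pair and the number of timeouts seen for
it, which makes it easy to confirm that the counts stop growing after the SC
has been rebooted with sscpost enabled.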

A side effect of not running sscpost on an SC after a reboot or reset is that the warning LED registers for the SC may start showing false Ambers, and the I2c messages may appear in large numbers.

After sscpost is enabled and the SC is rebooted (which runs sscpost), these warning messages and false Ambers go away.


Product
Sun Fire 15K Server
Sun Fire 12K Server
Sun Fire E25K Server
Sun Fire E20K Server

Internal Comments
The following is strictly for the use of Sun employees:

References:

This document is based on Radiance case ID 37097773, which exhibited this behavior as well as the additional behavior described in the Additional Information section.

Bug ID 4621045 explains that sscpost is responsible for resetting the LED registers on the SC.  If sscpost is not executed, the LED registers are not reset, which can result in false LED warnings (Amber) or even I2c warnings.

Technical Solution "Sun Enterprise[TM] 12K/15K: EIS standard EEPROM settings"

Problem Resolution "Sun Fire[TM] 15K: Running diagnostics on System Controller"
starcat, 12k, 15k, 20k, 25k SC, POST, System Controller, sscpost, SSCPOST, I2c, SMS
Previously Published As
75093

Updated by the ESG Knowledge Content Team 4/2010


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.