Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1003169.1
Update Date: 2011-05-09
Keywords:

Solution Type  Technical Instruction Sure

Solution  1003169.1 :   Sun Fire[TM] reboots due to TOD-POR reason  


Related Items
  • Sun Fire 12K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

Previously Published As
204355


Applies to:

Sun Fire 12K Server
All Platforms

Goal

Explain why a Sun Fire[TM] server reboots with a TOD-POR reset reason (where TOD = Time Of Day chipset and POR = Power On Reset).

Solution

When the Sun Fire[TM] (Enterprise server) hardware generates a reset, it is up to the OpenBoot PROM (OBP) to handle it and recover any debugging information.

The OBP's messages are printed to the system's console only. During a reset of a Sun Fire, the OBP saves reset_cause and previous_reset_cause. These can be displayed with the prtconf(1M) command: the fields "reset-reason" and "previous-reset-reason" in the output of "prtconf -vp" list the reason(s).
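
As an illustration, the two fields can be filtered out of the "prtconf -vp" output with a small shell function. This is a minimal sketch, not part of the original procedure: show_reset_reasons is a hypothetical helper name, and it assumes only that each property appears on its own line of the output.

```shell
#!/bin/sh
# Filter the reset-related properties out of "prtconf -vp" style output.
# The text is read from standard input, so the function also works on
# output that was saved to a file earlier.
show_reset_reasons() {
    # The pattern matches both "reset-reason" and "previous-reset-reason".
    grep 'reset-reason'
}

# Typical use on the affected domain (requires the Solaris prtconf command):
#   prtconf -vp | show_reset_reasons
```

A value of TOD-POR in either field indicates that the reset described in this document has occurred.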

One reset type that can be observed in these OBP messages is the TOD-POR reset, or "TOD Watchdog". It is a Sun Fire feature enabled by the kernel flag "watchdog_enable". Sun Fires use the clock board's TOD as a watchdog facility.

The "TOD Watchdog" can be used to recover from a hung system: if the watchdog timer expires (10 seconds), a system reset occurs, so the system reboots rather than remaining hung. Unfortunately, this does not allow the hard hang itself to be debugged. To help debug the hang, disable the TOD watchdog feature and enable the kernel deadman timer instead.

By default the TOD Watchdog feature is disabled.

To enable it, add the following line in /etc/system, then reboot.
set watchdog_enable = 1
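
To confirm which value is currently configured, the file can be checked before rebooting. The following is a minimal sketch under stated assumptions: system_setting is a hypothetical helper name, not a Solaris command, and it assumes a sed that understands POSIX character classes such as [[:space:]].

```shell
#!/bin/sh
# Print the last value assigned to a variable by a "set name = value"
# line in an /etc/system style file. Prints nothing if the variable is
# not set. Comment lines begin with "*" and therefore never match.
system_setting() {
    name=$1                      # variable to look for, e.g. watchdog_enable
    file=${2:-/etc/system}       # file to read; defaults to /etc/system
    sed -n "s/^[[:space:]]*set[[:space:]][[:space:]]*${name}[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*\$/\1/p" "$file" |
        sed -n '$p'              # keep only the last assignment found
}

# Typical use:
#   system_setting watchdog_enable
```

Empty output means the file contains no such line, so the default (disabled) applies. The value in the file takes effect only after a reboot.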

MECHANISM
=========

tod_setwatchdog() is initially called from clkstart(). Then the next tod_get() programs the TOD hardware's watchdog facility. tod_suspendwatchdog() temporarily suspends the watchdog timer and the next call to tod_get() re-enables the watchdog timer. The only call to tod_suspendwatchdog() is from complete_panic().

When the timer expires, it is not possible to determine whether the hang itself was caused by hardware or software, because the TOD watchdog mechanism involves both the kernel (the software mechanism) and the clock board (the hardware mechanism).

Note that even if the line is set in /etc/system, the watchdog is disabled when Solaris[TM] is running under a debugger (RB_DEBUG).



Product
Netra 1280 Server
Sun Netra 1290 Server
Sun Fire V1280 Server
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E2900 Server
Sun Fire E4900 Server
Sun Fire E6900 Server
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server


Internal Section

On Serengeti systems, the SC provides TOD support for Solaris[TM], as Solaris does not have direct access to or exclusive ownership of the TOD hardware. On Serengeti, the TOD watchdog mechanism works differently than on the Sun Fire and is enabled by default.

See Technical Instruction Document 1008873.1 for more information on troubleshooting resets.

Previously Published As 46241

Keywords: TOD-POR, Time of day, watchdog, reboot, rebooted, reboots, kernel 


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.