Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1011849.1
Update Date: 2009-04-01
Keywords:

Solution Type  Technical Instruction Sure

Solution  1011849.1 :   AUTOMATIC SYSTEM RECOVERY (ASR) - Sun Fire [TM] V480/V880  


Related Items
  • Sun Fire V480 Server
  • Sun Fire V880 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Entry-Level Servers

Previously Published As
216230


Description
The Automatic System Recovery (ASR) feature of the Sun Fire[TM] V480/V880 enables the system to resume operation after a non-fatal hardware error. When ASR is enabled, the system's firmware diagnostics automatically detect failed hardware components. The OpenBoot[TM] PROM (OBP) firmware then deconfigures the failed components and restores system operation, provided the system is capable of operating without them.

The ASR feature enables the system to reboot automatically, without operator intervention.



Steps to Follow
AUTOMATIC SYSTEM RECOVERY (ASR) - Sun Fire V480/V880 (OBP version 4.5.x and Below, 4.6.x and Above):

NOTE: For OBP versions prior to 4.15, the recommended settings for OBP parameters are described in FAB < Solution: 200879 >

NOTE: For OBP versions 4.15.x and above, see the notes at the end of this document or check FAB < Solution: 200879 >

NOTE: The recommendation in FAB < Solution: 201161 > is to upgrade to OBP 4.15.6 or later
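The version ranges used in this document determine which trigger variables apply: a single diag-trigger for 4.5.x and below, post-trigger plus obdiag-trigger from 4.6.x, and diag-trigger again from 4.15 onward. The following is only an illustrative sketch of that mapping; the function names are hypothetical and it assumes a purely numeric, dot-separated version string.

```python
def parse_obp_version(version):
    """Turn a numeric version string such as '4.15.6' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def trigger_variables(version):
    """Return the diagnostic trigger variable names for an OBP version.

    Boundaries follow this document: 4.5.x and below, 4.6.x and above,
    and 4.15 and above. Hypothetical helper, not an OBP command.
    """
    v = parse_obp_version(version)
    if v < (4, 6):
        return ["diag-trigger"]
    if v < (4, 15):
        return ["post-trigger", "obdiag-trigger"]
    return ["diag-trigger"]


if __name__ == "__main__":
    for example in ("4.5.19", "4.6.0", "4.15.6"):
        print(example, trigger_variables(example))
```

Comparing integer tuples avoids the string-comparison trap in which "4.15" would sort before "4.6".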

AUTOMATIC SYSTEM RECOVERY (ASR) for OBP Versions 4.6.x and Above

ok setenv auto-boot? true

ok setenv auto-boot-on-error? true

ok setenv diag-switch? true

ok setenv diag-level max

ok setenv diag-device disk
(Set diag-device to the same value as boot-device; "disk" is the value shown here.)

ok setenv post-trigger all-resets  

'post-trigger' options:

power-on-reset: Execute POST when the system is powered on (power cycle)

error-reset: Execute POST on Red State Exception, Watchdog, or Fatal Reset events

user-reset: Execute POST on panic or user-initiated events (e.g., reboot, reset-all)

all-resets: Equivalent to power-on-reset error-reset user-reset

none: Disable POST execution, except when invoked by the keyswitch or the service processor

For normal operation, post-trigger should be set to power-on-reset and error-reset. If user-reset (or soft-reset for 4.5.x) is included, as happens with all-resets, the system runs diagnostics every time it receives a software reset, for example whenever it is rebooted.
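The effect of a given post-trigger value can be reasoned about as set membership. The sketch below is a hypothetical model in Python, not firmware code: it expands a trigger value into the reset classes listed above and reports whether POST would run for a given reset class. Keyswitch and service processor overrides are deliberately not modelled.

```python
RESET_CLASSES = ("power-on-reset", "error-reset", "user-reset")


def expand_trigger(value):
    """Expand a post-trigger value into the set of reset classes it covers."""
    tokens = value.split()
    if "none" in tokens:
        return frozenset()
    if "all-resets" in tokens:
        return frozenset(RESET_CLASSES)
    unknown = [t for t in tokens if t not in RESET_CLASSES]
    if unknown:
        raise ValueError("unknown trigger value(s): " + ", ".join(unknown))
    return frozenset(tokens)


def runs_post(post_trigger, reset_class):
    """True if POST would run for reset_class under this simplified model."""
    return reset_class in expand_trigger(post_trigger)


if __name__ == "__main__":
    # Normal-operation setting: a plain reboot does not trigger POST.
    print(runs_post("power-on-reset error-reset", "user-reset"))
    # all-resets includes user-reset, so every reboot runs POST.
    print(runs_post("all-resets", "user-reset"))
```

The same model applies to obdiag-trigger, which accepts the same set of values.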

ok setenv obdiag-trigger all-resets

'obdiag-trigger' options:

power-on-reset: Execute OBDIAG when the system is powered on (power cycle)

error-reset: Execute OBDIAG on Red State Exception, Watchdog, or Fatal Reset events

user-reset: Execute OBDIAG on panic or user-initiated events (e.g., reboot, reset-all)

all-resets: Equivalent to power-on-reset error-reset user-reset

none: Disable OBDIAG execution, except when invoked by the keyswitch or the service processor

AUTOMATIC SYSTEM RECOVERY (ASR) for OBP Versions 4.5.x and Below

ok setenv auto-boot? true

ok setenv auto-boot-on-error? true

ok setenv diag-switch? true

ok setenv diag-level max

ok setenv diag-device disk
(Set diag-device to the same value as boot-device; "disk" is the value shown here.)

ok setenv diag-trigger soft-reset

diag-trigger variable settings that can be used:

power-reset (default): Runs firmware diagnostics only on power-on resets, including RSC-initiated power-on resets.

error-reset: Runs firmware diagnostics only on resets triggered by hardware errors, including watchdog reset events. This does not include software resets.

soft-reset (recommended): Runs firmware diagnostics on all reset events, including software resets and OS panics.

none: Disables the automatic triggering of firmware diagnostics by any reset event. You can still invoke firmware diagnostics manually by turning the front panel keyswitch to the Diagnostics position prior to powering on the system.

ok reset-all

Non-fatal (recoverable by ASR) errors include the following:

  • SCSI/GLM Failure

  • FCAL Subsystem Failure (In this case, a working alternate path to the boot disk is required.)

  • Gigabit Ethernet or Fast Ethernet interface Failure

  • RSC Failure

  • USB Interface Failure

  • Serial Interface Failure

  • Any PCI Card Failure

  • CPU Failure (Renders the entire module "failed". Requires another "passed" module present in the system; otherwise this is a Fatal Error.)

  • Memory DIMM Failure (Renders the memory bank "failed", not the entire group. If the entire group failed, at least one other passed bank would be required.)

Note: If POST or OpenBoot Diagnostics detects a non-fatal error associated with the normal boot device, the OpenBoot firmware automatically deconfigures the failed device and tries the next-in-line boot device, as specified by the 'boot-device' configuration variable.
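The fallback described in the note above amounts to walking the 'boot-device' list in order and skipping anything that has been deconfigured. The following Python sketch only illustrates that selection logic; the function name and the idea of passing the deconfigured devices as a set are assumptions made for the example, and the firmware's real implementation is not shown in this document.

```python
def next_boot_device(boot_device, deconfigured):
    """Return the first entry of boot-device that is not deconfigured.

    boot_device  -- value of the boot-device variable, a space-separated list
    deconfigured -- set of device names that diagnostics marked as failed
    Returns None when every listed device has been deconfigured.
    """
    for device in boot_device.split():
        if device not in deconfigured:
            return device
    return None


if __name__ == "__main__":
    # With "disk" deconfigured, the next-in-line device is tried.
    print(next_boot_device("disk net", {"disk"}))
```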

Fatal (non-recoverable by ASR) errors include the following:

  • Switch ASIC Failure (DAR, DCDS, MDR, BBC)

  • PCI Bridge Failure

  • RIO Failure

  • CPU Failure (All CPU modules present failed.)

  • Memory Failure (All memory banks present failed.)

  • Flash RAM cyclical redundancy check (CRC) Failure

  • Critical FRUPROM configuration data (CRC, consistency) Failure
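CPU and memory failures appear in both lists; per the entries above, the difference is whether at least one passing CPU module or memory bank remains. A minimal Python sketch of that rule follows. The function name and return strings are hypothetical, and it covers only the CPU and memory cases, not the ASIC, bridge, or CRC failures that are always fatal.

```python
def classify_unit_failure(present, failed):
    """Classify a CPU-module or memory-bank failure count.

    present -- number of units (CPU modules or memory banks) installed
    failed  -- number of those units that failed diagnostics
    """
    if present <= 0 or not 0 <= failed <= present:
        raise ValueError("failed must be between 0 and present, present > 0")
    if failed == 0:
        return "ok"
    # Fatal only when no passing unit is left to run the system.
    return "fatal" if failed == present else "non-fatal"


if __name__ == "__main__":
    print(classify_unit_failure(present=4, failed=1))
    print(classify_unit_failure(present=2, failed=2))
```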

To view a list of components that can be manually enabled or disabled by ASR, type the following at the ok prompt:

ok .asr



Product
Sun Fire V880 Server
Sun Fire V880/890
Sun Fire V480 Server


Internal Comments
NOTE: 'obdiag-trigger' and 'post-trigger' are replaced by 'diag-trigger' in OBP 4.15 and above

RECOMMENDED for NORMAL mode:


----------------------------


Diagnostics (POST/OBDIAG) are run at power-on to validate no
faulty hardware exists in configuration, prior to booting the
operating system.


Diagnostics (POST/OBDIAG) run on any OBP-detected error event,
such as Red State Exceptions, Watchdog Resets, and Fatal Resets,
to identify any potentially defective hardware.


Increased boot time and messaging occur only during system
power-on and OBP-detected error events (power-on-reset, error-reset).


These settings increase system availability when faced with
non-fatal OBP-detected error events.


    


Recommended OBP Parameter Settings (for OBP versions prior to 4.15)
------------------------------------------------

diag-switch?         true
diag-level           max
diag-script          normal
post-trigger         power-on-reset error-reset
obdiag-trigger       power-on-reset error-reset
auto-boot?           true
auto-boot-on-error?  true
diag-device          (set to same value as the boot-device parameter)
diag-file            (set to same value as the boot-file parameter)


    


Recommended OBP Parameter Settings (for OBP version 4.15.6 and above)
-------------------------------------------------


service-mode?        = false
diag-switch?         = false
diag-level           = max
diag-script          = normal
diag-trigger         = power-on-reset error-reset
verbosity            = normal
auto-boot?           = true
auto-boot-on-error?  = true
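As an illustration of how the settings listed above might be checked against a system, the Python sketch below compares a dictionary of current values (for example, transcribed from printenv output) with the recommended values and lists the setenv commands needed to close the gap. The helper name and the dictionary input are assumptions made for this example; boolean variables are written with the trailing "?" that OpenBoot uses in their names.

```python
# Recommended values for OBP 4.15.6 and above, as listed in this document.
RECOMMENDED_4_15 = {
    "service-mode?": "false",
    "diag-switch?": "false",
    "diag-level": "max",
    "diag-script": "normal",
    "diag-trigger": "power-on-reset error-reset",
    "verbosity": "normal",
    "auto-boot?": "true",
    "auto-boot-on-error?": "true",
}


def setenv_commands(current):
    """Return setenv commands for every variable that differs from the list.

    current -- dict mapping variable name to its present value; variables
               missing from the dict are treated as needing to be set.
    """
    return [
        "setenv {} {}".format(name, wanted)
        for name, wanted in RECOMMENDED_4_15.items()
        if current.get(name) != wanted
    ]


if __name__ == "__main__":
    current = dict(RECOMMENDED_4_15, **{"diag-level": "min"})
    for command in setenv_commands(current):
        print("ok", command)
```

A reset-all is still needed afterwards for changed values to take effect, as in the procedure earlier in this document.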


       


For details on these parameters, refer to the section
"OPENBOOT Parameters Specific to Enabling Firmware Diagnostics"


For more details concerning the OBP parameters in OBP 4.15 and
above, please reference FAB < Solution: 200879 > 


For more information, please reference:


http://panacea/twiki/bin/view/Products/ASR_v480_v490SunFireV490


V480, V880, ASR, firmware
Previously Published As
47086

Change History
Updated link in internal section http://panacea/twiki/bin/view/Products/ASR_v480_v490SunFireV490
bj79977

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.