Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1009098.1
Update Date: 2010-10-13
Keywords:

Solution Type  Technical Instruction Sure

Solution  1009098.1 :   Sun Fire[TM] 1280/E2900/3800-6900: Using testboard to run extended POST diagnostics [Video]  


Related Items
  • Sun Fire E6900 Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Netra 1280 Server
  • Sun Fire E4900 Server
  • Sun Fire 4800 Server
  • Sun Fire V1280 Server
  • Sun Fire E2900 Server
  • Sun Fire 4810 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Midrange V and Netra Servers
  • GCS>Sun Microsystems>Servers>Entry-Level Servers
  • GCS>Support>KM>Content>Video
  • GCS>Sun Microsystems>Servers>Midrange Servers

Previously Published As
212567


Applies to:

Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E2900 Server
Sun Fire E4900 Server
All Platforms

Goal

Description
Before using DR to install a new or replacement system board into a running domain, it is a good idea to run extended POST diagnostics. This document explains how to run extended diags using testboard. An example of how to run extended diagnostics during a keyswitch operation is also included.

A video tutorial is available for this topic. These brief how-to tutorials provide step-by-step instructions answering Sun's most frequently asked questions. View the video answer and/or follow the detailed instructions below.


Video - Understanding POST options (5:00)

 

Sunsolve users must download the attachment to view the video.


Solution

Steps to Follow


The Sun Fire[TM] Serengeti Server POST level can be set to the following levels:

POST Level      Description
----------      -----------
init            Provides fastest POST run. No testing is done; only system board initialization.
quick           Tests all system board components. Provides minimal test counts and patterns. No memory testing is done.
default         Runs all system board tests and patterns. Performs some memory and Ecache testing.
max             The same as default.
mem1            Performs all default/maximum testing, plus extensive memory and Ecache testing.
mem2            Includes mem1 testing plus DRAM testing.

Running extended diags using testboard

Testboard does not have any options for setting the level of POST.

v4u-3800a-sc0:SC> testboard -h
testboard -- test a CPU/Memory board
Usage: 
testboard [-f]
testboard -h

-f -- force board testing of an already tested board
-h -- display this help message

Running testboard on a 3800-6900

Testboard uses the diag-level setting of the domain that the board is assigned to. To avoid having to make any changes to the running domain, assign the board to an unused domain, then use setupdomain to set the required diag-level. If no unused domain exists, you can change the diag-level of a running domain, run testboard, and then change back to the original diag-level without impacting the running domain.

v4u-3800b-sc1-gmp02:A> setupdomain -p boot 

domain Boot Parameters
----------------------
diag-level [default]: wibble
wibble: is not a valid setting
valid settings: init, quick, max, default, mem1, mem2

Default is the same as max; for additional testing, use either mem1 or mem2.

Note: Firmware revisions prior to 5.12.5 have a bug that can cause mem1 and mem2 to fail (bug 4424609: POST timeout with mem1 or mem2 and fully populated board).
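
The version check in the note above can be sketched as a small shell helper. This is only an illustration: the function names version_lt and mem_levels_risky are hypothetical, the firmware version is assumed to be typed in by hand as a purely numeric dotted string, and nothing here queries the System Controller.

```shell
#!/bin/sh
# Sketch: decide whether an SC firmware version predates 5.12.5.
# Assumption: versions are dotted and every field is purely numeric.

# version_lt A B: succeed (exit 0) if dotted version A is older than B.
version_lt() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version (a missing field counts as 0).
        fa=${a%%.*}
        fb=${b%%.*}
        [ -n "$fa" ] || fa=0
        [ -n "$fb" ] || fb=0
        if [ "$fa" -lt "$fb" ]; then return 0; fi
        if [ "$fa" -gt "$fb" ]; then return 1; fi
        # Drop the field just compared; empty the string when no dot remains.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 1   # the two versions are equal
}

# mem_levels_risky VERSION: succeed if VERSION is prior to 5.12.5,
# that is, if mem1/mem2 may run into the POST timeout described above.
mem_levels_risky() {
    version_lt "$1" 5.12.5
}

# Example use (the version string is supplied by the operator):
#   mem_levels_risky 5.11.6 && echo "upgrade firmware before using mem1/mem2"
```

Fields are compared numerically rather than as text, so 5.9.x is correctly treated as older than 5.12.5.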

Running testboard on a 1280 or E2900

The 1280 and E2900 have only a single domain, so you will need to change that domain's configuration to increase the POST level used for testing.

From the running Solaris domain, first confirm the current diag-level:

# eeprom diag-level
diag-level=init

Change the diag-level to the required setting:

# eeprom diag-level=mem2
# eeprom diag-level
diag-level=mem2

From the lom prompt you can now run the testboard command and it will use the new diag-level setting.

lom>testboard sbX

Once testboard has started, you can set diag-level back to its original value from Solaris using eeprom.
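
The save, change and restore sequence above could be wrapped in a few shell functions so the original value is not forgotten. This is a minimal sketch, not a supported tool: the function names and the EEPROM variable are hypothetical, and the only behaviour assumed of eeprom is what the transcript above shows (it prints diag-level=<value> when queried).

```shell
#!/bin/sh
# Sketch: remember the current diag-level, raise it for a testboard run,
# and put it back afterwards. EEPROM is overridable (an assumption made
# here) so the functions can be tried against a stub command.
EEPROM=${EEPROM:-eeprom}

# Print the current value, e.g. "init" from the line "diag-level=init".
get_diag_level() {
    "$EEPROM" diag-level | sed 's/^diag-level=//'
}

# Set diag-level to the value given as $1.
set_diag_level() {
    "$EEPROM" "diag-level=$1"
}

# raise_diag_level NEW: set NEW and print the previous value so the
# caller can restore it later. Fails if the old value cannot be read.
raise_diag_level() {
    old=$(get_diag_level)
    [ -n "$old" ] || return 1
    set_diag_level "$1" || return 1
    printf '%s\n' "$old"
}

# Example use on the Solaris domain (run as root):
#   saved=$(raise_diag_level mem2)
#   ... start "testboard sbX" from the lom prompt ...
#   set_diag_level "$saved"
```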

Running extended diags using cfgadm

It is also possible to run an increased/decreased level of diagnostics directly as you DR a component into the domain.

# cfgadm -o platform=diag=mem2 -c configure N0.SBX
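
To guard against typing an invalid level, the command line above could be assembled by a small helper that checks the level first. This is a sketch under assumptions: the function names are hypothetical, the list of levels is taken from the valid settings reported by setupdomain earlier in this document and is assumed to apply to cfgadm as well, and N0.SB2 in the example is only an illustrative attachment point. The helper prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: validate a diag level and build the cfgadm command line.
# The command is only printed (a dry run); nothing is executed.

# Succeed if $1 is one of the diag-level settings listed by setupdomain.
is_valid_diag_level() {
    case $1 in
        init|quick|default|max|mem1|mem2) return 0 ;;
        *) return 1 ;;
    esac
}

# cfgadm_diag_cmd LEVEL AP_ID: print the configure command for AP_ID
# at the requested POST level, or fail with a message on stderr.
cfgadm_diag_cmd() {
    if ! is_valid_diag_level "$1"; then
        echo "invalid diag level: $1" >&2
        return 1
    fi
    printf 'cfgadm -o platform=diag=%s -c configure %s\n' "$1" "$2"
}

# Example use:
#   cfgadm_diag_cmd mem2 N0.SB2
```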

Running extended diags using setkeyswitch

It is sometimes difficult to determine a failing component when a domain refuses to boot Solaris. If this happens, capture the currently available fault detail by collecting the following information.

  • See Document: 1011830.1, which explains how to run an scextended explorer to collect all the hardware-level fault information from the System Controller.
  • Collect the System Controller remote logs from the loghost. Some faults are undiagnosable without a loghost. Refer to Document: 1008676.1, How to configure a loghost on Sun Fire[TM] 3800-6800 and E4900/E6900 servers.

Then follow these steps to run high-level diagnostics:

STEP1: Configure the domain to be tested. All options stay at their current setting unless you type in new information.

domainA-sc0:A> setupdomain
Domain Boot Parameters
----------------------
diag-level [default]: mem2 <- Manually type "mem2" here, hit RETURN
verbosity-level [min]:
error-level [max]:
interleave-scope [within-board]:
interleave-mode [optimal]:
reboot-on-error [true]:
hang-policy [reset]:
OBP.use-nvramrc? [true]:
OBP.auto-boot? [false]:
OBP.error-reset-recovery []: 
Loghosts
--------
Loghost [10.10.10.100]: <- Without a loghost some faults are undiagnosable
Log Facility [local5]:

SNMP
----
Domain Description [ ]:
Domain Contact [ ]:
The SNMP agent is disabled.

STEP2: Execute POST

Set the keyswitch to off (NOTE: First make sure the domain is already down!):

domainA-sc0:A> setkeyswitch off
Powering boards off ...

Set the keyswitch to on:

domainA-sc0:A> setkeyswitch on
Powering boards on ...

STEP3: Capture the full POST output

Diagnose the fault using the full error messages that are observed.
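
If the console session was saved to a file, a first pass over a long mem2 capture could be made with a simple filter like the one below. This is only a sketch: the function name post_suspects is hypothetical, and the keywords are a guess rather than a list of actual POST message formats, so the full output should still be read in context.

```shell
#!/bin/sh
# Sketch: list lines in a saved console capture that mention a failure.
# The keywords "fail" and "error" are assumptions, not POST message formats.
# Note: on older Solaris releases /usr/bin/grep may not accept -E;
# /usr/xpg4/bin/grep does.

# post_suspects FILE: print matching lines prefixed with their line numbers.
post_suspects() {
    grep -n -i -E 'fail|error' "$1"
}

# Example use (the file name is illustrative):
#   post_suspects domainA-post.log
```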

Automated Diagnosis

The SC firmware itself has a number of other features to automatically diagnose faults. The Blueprint listed below details those present in firmware 5.19.x (patch 114526).

The latest firmware release is 5.20.x from patch 11452

Blueprint - Auto Diagnosis and Recovery Enhancements for Sun Fire Midrange Servers



Product
Sun Fire 4810 Server
Sun Fire E6900 Server
Sun Fire E4900 Server
Sun Fire 6800 Server
Sun Fire 4800 Server
Sun Fire 3800 Server
Netra 1280 Server
Sun Fire V1280 Server
Sun Fire E2900 Server

POST, Diagnostic, OBP, 1280, E2900, 6800, 3800, 4800, 4810, E4900, E6900, MEM2, MEM1
Previously Published As
76684

Change History
Audited/updated 11/17/09 - Mid-Range Server Content Team

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.