Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1017491.1
Update Date:2010-07-16
Keywords:

Solution Type  Technical Instruction Sure

Solution  1017491.1 :   Sun Fire[TM] 3800/48x0/6800/E4900/E6900/E2900/v1280 or Netra[TM] 1280/1290 server: showcomponent indicates something is disabled  


Related Items
  • Sun Fire E6900 Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Fire E4900 Server
  • Sun Netra 1280 Server
  • Sun Fire 4800 Server
  • Sun Fire V1280 Server
  • Sun Fire E2900 Server
  • Sun Fire 4810 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Midrange Servers
  • GCS>Sun Microsystems>Servers>Entry-Level Servers

PreviouslyPublishedAs
228613


Applies to:

Sun Fire 3800 Server
Sun Fire V1280 Server
Sun Netra 1280 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
All Platforms

Goal

This document applies to the Sun Fire 3800, 4800, 4810, 6800, E4900, E6900, E2900, and V1280 servers and to the Netra 1280 and 1290 servers.
  • SC refers to the System Controller. Some systems have a single SC while others have dual SCs.
  • When commands are to be executed on the "SC", it means from the "SC>" or "lom>" prompt on the Main SC for the server in question.
There are three reasons a component might show as disabled in the output of showcomponent.
  1. Someone has manually disabled the component
  2. No Capacity on Demand (COD) license exists
  3. The component has been disabled as a result of a fault

Solution

 

Someone has manually disabled the component

In this example (from a v1280, E2900, n1280, or n1290 system), SB4 has been disabled using disablecomponent SB4.

lom>showcomponent SB4

Component Status Pending POST Description
--------- ------ ------- ---- -----------
/N0/SB4/P0 disabled - untest UltraSPARC-IV, 1200MHz, 16M ECache
/N0/SB4/P1 disabled - untest UltraSPARC-IV, 1200MHz, 16M ECache
/N0/SB4/P2 disabled - untest UltraSPARC-IV, 1200MHz, 16M ECache
/N0/SB4/P3 disabled - untest UltraSPARC-IV, 1200MHz, 16M ECache
/N0/SB4/P0/B0/L0 disabled - untest 2048M DRAM
/N0/SB4/P0/B0/L2 disabled - untest 2048M DRAM
/N0/SB4/P1/B0/L0 disabled - untest 2048M DRAM
/N0/SB4/P1/B0/L2 disabled - untest 2048M DRAM
/N0/SB4/P2/B0/L0 disabled - untest 2048M DRAM
/N0/SB4/P2/B0/L2 disabled - untest 2048M DRAM
/N0/SB4/P3/B0/L0 disabled - untest 2048M DRAM
/N0/SB4/P3/B0/L2 disabled - untest 2048M DRAM

First, confirm whether there is a reason the component cannot be re-enabled. For example, an application license may have required limiting the number of CPUs, and this has since been forgotten. If there is no issue with re-enabling the component(s), use the following command (from the SC or lom prompt):

lom>enablecomponent sb4
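
Before re-enabling anything, it can help to list exactly which components are disabled. The following is a minimal Python sketch for doing that from a saved copy of the showcomponent output. It assumes the five-column layout shown in the example above and that the Pending column is always a single token; the names ComponentRow and parse_showcomponent are hypothetical and are not part of any Sun tool.

```python
from typing import List, NamedTuple


class ComponentRow(NamedTuple):
    """One data row of showcomponent output (hypothetical helper type)."""
    component: str
    status: str
    pending: str
    post: str
    description: str


def parse_showcomponent(text: str) -> List[ComponentRow]:
    """Parse saved showcomponent output into rows.

    Assumption: data rows start with "/" and have five whitespace-separated
    columns, with the description taking the rest of the line.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("/"):
            continue  # skip prompt, header, separator, and blank lines
        parts = line.split(None, 4)
        if len(parts) == 5:
            rows.append(ComponentRow(*parts))
    return rows


SAMPLE = """\
lom>showcomponent SB4

Component Status Pending POST Description
--------- ------ ------- ---- -----------
/N0/SB4/P0 disabled - untest UltraSPARC-IV, 1200MHz, 16M ECache
/N0/SB4/P0/B0/L0 disabled - untest 2048M DRAM
"""

rows = parse_showcomponent(SAMPLE)
disabled = [r.component for r in rows if r.status == "disabled"]
print(disabled)
```

The sketch only reads text that has already been captured from the SC; it does not connect to the SC or run any command.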

 

No Capacity on Demand (COD) license exists

In this example SB0 contains no licensed COD CPUs.

lom>showcomponent SB0
Component           Status   Pending  POST   Description
--------- ------ ------- ---- -----------
/N0/SB0/P0 Cod-dis - untest UltraSPARC-IV, 1350MHz, 16M ECache
/N0/SB0/P1 Cod-dis - untest UltraSPARC-IV, 1350MHz, 16M ECache
/N0/SB0/P2 Cod-dis - untest UltraSPARC-IV, 1350MHz, 16M ECache
/N0/SB0/P3 Cod-dis - untest UltraSPARC-IV, 1350MHz, 16M ECache
/N0/SB0/P0/B0/L0 Cod-dis - untest 1024M DRAM
/N0/SB0/P0/B0/L2 Cod-dis - untest 1024M DRAM
/N0/SB0/P1/B0/L0 Cod-dis - untest 1024M DRAM
/N0/SB0/P1/B0/L2 Cod-dis - untest 1024M DRAM
/N0/SB0/P2/B0/L0 Cod-dis - untest 1024M DRAM
/N0/SB0/P2/B0/L2 Cod-dis - untest 1024M DRAM
/N0/SB0/P3/B0/L0 Cod-dis - untest 1024M DRAM
/N0/SB0/P3/B0/L2 Cod-dis - untest 1024M DRAM

To enable the CPUs contact the licensing center.

See 1007945.1 for further information relating to Capacity on Demand.

The component has been disabled as a result of a fault

If the word chs (Component Health Status) appears in the POST column of showcomponent output, the component has been marked as faulty by the automated diagnostic engine built into Solaris and/or ScApp (System Controller Application). The component(s) will not be available to the system. The chs status remains with the component until it is serviced (replaced, or the status reset).

If a parent FRU has a chs value in the POST column, all its child FRUs will also have a chs value (either Suspect or Faulty). The status output alone does not mean that there is a specific issue with the child or the parent FRU(s). The errors must be investigated.

  • As a result, if something is to be disabled, the architectural relationship between parent and child groups all of these components together in the action that the system software eventually takes (i.e. either all of them are disabled, or none).
    • The most common Parent to Child FRU relationship is CPU (parent) and DIMMs (children).
    • Note: DIMM failures during POST can result in both the DIMM and CPU displaying chs in the POST column as both can be marked faulty.
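
The parent-to-child grouping described above can be sketched in Python as follows. This is an illustration only: it assumes component paths of the form /N0/SBx/Py for a CPU and /N0/SBx/Py/Bn/Ln for a DIMM, as seen in the showcomponent examples in this document, and the function names parent_cpu and group_by_cpu are hypothetical.

```python
def parent_cpu(component):
    """Return the parent CPU path for a DIMM path, or None if not a DIMM.

    Assumed layout: node / board / processor / bank / DIMM,
    for example /N0/SB2/P2/B0/L0 has the parent /N0/SB2/P2.
    """
    parts = component.strip("/").split("/")
    if len(parts) == 5 and parts[2].startswith("P"):
        return "/" + "/".join(parts[:3])
    return None


def group_by_cpu(components):
    """Group chs-marked components so each CPU lists its marked DIMMs."""
    groups = {}
    for comp in components:
        cpu = parent_cpu(comp)
        if cpu is None:
            groups.setdefault(comp, [])  # a CPU (or other non-DIMM) entry
        else:
            groups.setdefault(cpu, []).append(comp)
    return groups


# Components that show chs in the POST column (paths typed in by hand)
CHS_MARKED = [
    "/N0/SB2/P2",
    "/N0/SB2/P2/B0/L0",
    "/N0/SB2/P2/B0/L2",
    "/N0/SB2/P2/B1/L1",
    "/N0/SB2/P2/B1/L3",
]

for cpu, dimms in group_by_cpu(CHS_MARKED).items():
    print(cpu, "->", len(dimms), "child DIMM(s) also marked")
```

Grouping in this way makes it easier to see that a set of chs entries belongs to one parent and child family, which still has to be diagnosed from the error logs rather than from the status alone.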

In the latest release of ScApp, 5.20.x (114527-xx), you can view the chs status of components using the showchs command.  In ScApp 5.20.15 or higher, you can also reset this status if needed (details later).

If an error is seen by ScApp, the diagnosis engine analyzes the event and produces its advice. The showchs command then shows only faulty or suspect against the component. The date when the status changed is not shown (you can view the full error details using the SC command showlogs -v).

  • A faulty component will be disabled at the next reboot (note that most faults which result in a faulty component automatically trigger a recovery reboot, so the component may already be out of the configuration when you go to view its status).
  • A suspect component remains available in the domain. The status indicates that its errors may need to be investigated (for example, the errors are correctable in nature, or there are many different FRU suspects).

It is quite common for systems to contain components in a suspect state.  In many cases the suspect status is very old and can simply be cleared. If you find components which are marked suspect or faulty, contact Support Services to have the events diagnosed.

sun_fire-sc1:SC> showcomponent sb2
Component           Status   Pending  POST   Description
--------- ------ ------- ---- -----------
/N0/SB2/P0 enabled - pass UltraSPARC-III+, 900MHz, 8M ECache
/N0/SB2/P1 enabled - pass UltraSPARC-III+, 900MHz, 8M ECache
/N0/SB2/P2 disabled - chs UltraSPARC-III+, 900MHz, 8M ECache
/N0/SB2/P3 enabled - pass UltraSPARC-III+, 900MHz, 8M ECache
/N0/SB2/P0/B0/L0 enabled - pass 512M DRAM
/N0/SB2/P0/B0/L2 enabled - pass 512M DRAM
/N0/SB2/P0/B1/L1 enabled - pass 512M DRAM
/N0/SB2/P0/B1/L3 enabled - pass 512M DRAM
/N0/SB2/P1/B0/L0 enabled - pass 512M DRAM
/N0/SB2/P1/B0/L2 enabled - pass 512M DRAM
/N0/SB2/P1/B1/L1 enabled - pass 512M DRAM
/N0/SB2/P1/B1/L3 enabled - pass 512M DRAM
/N0/SB2/P2/B0/L0 disabled - chs 512M DRAM
/N0/SB2/P2/B0/L2 disabled - chs 512M DRAM
/N0/SB2/P2/B1/L1 disabled - chs 512M DRAM
/N0/SB2/P2/B1/L3 disabled - chs 512M DRAM
/N0/SB2/P3/B0/L0 enabled - pass 512M DRAM
/N0/SB2/P3/B0/L2 enabled - pass 512M DRAM
/N0/SB2/P3/B1/L1 enabled - pass 512M DRAM
/N0/SB2/P3/B1/L3 enabled - pass 512M DRAM
sun_fire-sc1:SC> showchs
Component Status
--------------- --------
/N0/SB2/P2 Faulty

To service this component, contact your Support Services provider.  Should the Support Services engineer direct you to reset the CHS status of one or more components, use 1004879.1 to perform this procedure.

Note: See 1009358.1 if showcomponent indicates a Pending "disabled" status for a device.
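
The three cases covered in this document can be told apart from the Status and POST columns of showcomponent. The Python sketch below encodes that reading of the examples. It is a simplification under stated assumptions: "manually disabled" is inferred by elimination (a disabled status that is neither Cod-dis nor chs), and the function name classify_disabled and its return strings are hypothetical.

```python
def classify_disabled(status, post):
    """Map the Status and POST columns of one row to a likely reason.

    Assumption: only the three cases described in this document occur.
    """
    status = status.lower()
    post = post.lower()
    if status == "cod-dis":
        return "no COD license"
    if status == "disabled" and post == "chs":
        return "fault (CHS)"
    if status == "disabled":
        # Inferred by elimination; confirm with whoever administers the system
        return "manually disabled"
    return "not disabled"


# (Status, POST) pairs taken from the examples in this document
for status, post in [("disabled", "untest"), ("Cod-dis", "untest"),
                     ("disabled", "chs"), ("enabled", "pass")]:
    print(status, post, "->", classify_disabled(status, post))
```

A result of "fault (CHS)" should still be confirmed with showchs and showlogs -v, since the status columns alone do not identify which FRU is actually at fault.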

Data required if a Sun Service call is logged

The information required to troubleshoot the fault depends on the platform.

  • For v1280, E2900, n1280, n1290 collect an explorer (see 1019066.1) with the following options:
    • /opt/SUNWexplo/bin/explorer -w default,1280extended
  • For 3800, 4810, 4800, E4900, 6800, and E6900 collect an explorer (see 1019066.1) with the following options as well as the loghost data (see 1008676.1):
    • /opt/SUNWexplo/bin/explorer -w default,scextended,fru
  • For Sun Fire 12K, 15K, E20K, or E25K platforms an explorer from the Main System Controller would be required.
    • No additional options to explorer are required.

Note: Explorer version 5.6 and above will take a lot less time to capture the required data due to bugs fixed in the scextended and sf15k modules (See 1002383.1).
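
The platform-to-options mapping listed above can be captured in a small lookup, sketched here in Python. The explorer path and the -w module lists are taken from this document; the platform keys, the EXPLORER_BIN constant, and the explorer_command function are hypothetical names chosen for the illustration. The sketch only builds the command string and does not run explorer.

```python
EXPLORER_BIN = "/opt/SUNWexplo/bin/explorer"

# Module lists as given in this document; an empty string means no extra options
MODULES_BY_PLATFORM = {
    "v1280": "default,1280extended",
    "e2900": "default,1280extended",
    "n1280": "default,1280extended",
    "n1290": "default,1280extended",
    "3800": "default,scextended,fru",
    "4800": "default,scextended,fru",
    "4810": "default,scextended,fru",
    "e4900": "default,scextended,fru",
    "6800": "default,scextended,fru",
    "e6900": "default,scextended,fru",
    "12k": "",
    "15k": "",
    "e20k": "",
    "e25k": "",
}


def explorer_command(platform):
    """Return the explorer command line suggested for a platform key."""
    key = platform.strip().lower()
    if key not in MODULES_BY_PLATFORM:
        raise ValueError("unknown platform: " + platform)
    modules = MODULES_BY_PLATFORM[key]
    if modules:
        return EXPLORER_BIN + " -w " + modules
    return EXPLORER_BIN


print(explorer_command("E2900"))
print(explorer_command("6800"))
```

For the 3800 through E6900 group, remember that the loghost data is required in addition to the explorer output.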



Sun Shared Shell


If you require assistance in collecting the data recommended in this article or require help in diagnosing a system issue, there is a collaborative service tool called Sun Shared Shell which allows Sun Service engineers to remotely view and diagnose customers' systems. Consider using this option to reduce the problem resolution time.



Internal Comments

Investigating CHS Status from prtfru -x outputs

The chs status of a component is recorded to the component FRUID
and is captured in the prtfru -x output.
To reconstruct the showchs output from a prtfru_-x.out you may
utilize the internal tool, showfru.
################################################################################

Latest version 1.16 /net/cores.uk/export/hotline/hotlocal/bin/showfru
Report bugs, RFEs or if you have questions email [email protected]
Further info from http://panacea/twiki/bin/view/Tools/ToolPageShowfru
################################################################################
... Removing all the FRU information ...
################################################################################
CHS History of currently disabled Components, use -v to see full history
################################################################################

Component : N0.SB2.P2
Time Stamp : Mon Jul 20 19:16:53 YEKST 2006
New Status : FAULTY
Old Status : FAULTY
Event Code : *** UNKNOWN Invalid Value ***: 0x0000000003000000
Initiator : SCAPP
Message : SF4800.VCMON.1.03.1425
Investigating CHS status with physical access to the SC
1. Use the showchs -b command at the platform shell.
For example:
sun_fire-sc1:SC> showchs -b

Component Status
--------------- --------
/N0/SB2/P2 Faulty

In the above example, SB2/P2 is marked faulty and must be serviced.

2. Collect the following data while in service mode (if ScApp version
is < 5.20.15) or from normal mode (if ScApp 5.20.15 or higher is
installed) on the Main SC:

v4u-6800c-sc0:SC[service]> showchs -v -c sb2

Total # of records: 2
Component : N0/SB2/P2
Time Stamp : Wed Jul 19 19:16:29 YEKST 2006
New Status : FAULTY
Old Status : OK
Event Code : Other
Initiator : SCAPP
Message : SF4800.VCMON.1.03.1424

Component : N0/SB2/P2
Time Stamp : Thu Jul 20 19:16:53 YEKST 2006
New Status : FAULTY
Old Status : FAULTY
Event Code : Other
Initiator : SCAPP
Message : SF4800.VCMON.1.03.1425
What to do next
With the date stamp and the fault code (SF4800.VCMON.1.03.1425 in the
above example) you can determine if the fault is valid, replace the part, or
re-enable if a known bug has been hit.

In this example, check if CR 6353053 "false VCMON failures due to ramping
system load" applies then check the replacement rules in 1010919.1 to see
what to do to resolve this issue.
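
To pull the date stamp and fault code out of saved showchs -v output, a short parser along the following lines may help. It is a sketch in Python that assumes the "Key : Value" record layout shown in the example above, with each record starting at a Component line; the function names parse_chs_records and first_fault are hypothetical. Time stamps are kept as plain strings and are not converted.

```python
def parse_chs_records(text):
    """Parse showchs -v output into a list of dicts, one per record.

    Assumption: fields look like "Key : Value" and every record begins
    with a "Component" field.
    """
    records = []
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # blank lines, prompts, "Total # of records: 2"
        key, value = key.strip(), value.strip()
        if key == "Component":
            records.append({})
        if records:
            records[-1][key] = value
    return records


def first_fault(records):
    """Return the first record where the status changed to FAULTY."""
    for rec in records:
        if rec.get("New Status") == "FAULTY" and rec.get("Old Status") != "FAULTY":
            return rec
    return None


SAMPLE = """\
Total # of records: 2
Component : N0/SB2/P2
Time Stamp : Wed Jul 19 19:16:29 YEKST 2006
New Status : FAULTY
Old Status : OK
Event Code : Other
Initiator : SCAPP
Message : SF4800.VCMON.1.03.1424

Component : N0/SB2/P2
Time Stamp : Thu Jul 20 19:16:53 YEKST 2006
New Status : FAULTY
Old Status : FAULTY
Event Code : Other
Initiator : SCAPP
Message : SF4800.VCMON.1.03.1425
"""

records = parse_chs_records(SAMPLE)
fault = first_fault(records)
print(fault["Time Stamp"], fault["Message"])
```

The first OK to FAULTY transition gives the date to look for in the showlogs -v output, while the most recent record gives the latest fault code recorded against the component.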

If you are not sure if the component needs to be replaced ask for confirmation
in the GL-ESG IM room.

If component status must be reset
Utilize 1004879.1 for instructions on resetting component CHS status
using setchs (if needed).


chs, showchs, setchs, OK, SUSPECT, FAULTY, normalized
Previously Published As
70181

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.