Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1013120.1
Update Date: 2011-02-03
Keywords:

Solution Type  Troubleshooting Sure

Solution  1013120.1 :   Troubleshooting "can't power on" component errors on Sun Fire [TM] v1280, E2900, 3800, 4800, 4810, 6800, E4900, E6900, or Netra 1280, 1290 Servers  


Related Items
  • Sun Fire E6900 Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Netra 1280 Server
  • Sun Fire E4900 Server
  • Sun Fire 4800 Server
  • Sun Fire V1280 Server
  • Sun Fire E2900 Server
  • Sun Netra 1290 Server
  • Sun Fire 4810 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Midrange Servers
  • GCS>Sun Microsystems>Servers>Midrange V and Netra Servers

PreviouslyPublishedAs
217972


Applies to:

Sun Netra 1280 Server
Sun Netra 1290 Server
Sun Fire V1280 Server
Sun Fire 3800 Server
Sun Fire 4800 Server
All Platforms

Purpose

Description

Failure to power on a System Board, I/O Board, Repeater, Fan Tray, or System Controller.

This document addresses situations where the system is powered on, but warnings or errors indicate that a particular component could not be powered on, or added to the hardware configuration. The component in question may be a System Board (SB), I/O Board (IB), Repeater (RP), Fan Tray (FT), or System Controller (SC).

System Type:

  • Sun Fire [TM] 3800, 4800, 4810, E4900, 6800, E6900
  • Sun Fire [TM] v1280, E2900, and Netra [TM] 1280, 1290

Symptoms:

  • The issue may be described as a System Board (SB), I/O Board (IB), Repeater (RP), Fan Tray (FT), or even System Controller (SC) failure.
  • The showenvironment command may show an "ERROR LOW" status for the component in question and a 3.3 VDC sensor value of 0.xx (in other words, less than the LoWarn value).
    • This output is extremely useful for diagnosing power failure events when showlogs data has scrolled off the error buffer (that is, when the other symptoms described below are no longer present in the error logs).
    • The example below shows an ERROR LOW status for SB0; the SB2 readings are included to show what "normal" values look like in comparison:
lom> showenvironment -v
Slot    Device     Sensor       Min    LoWarn Value  HiWarn Max    Units     Age     Status
------- ---------- ------------ ------ ------ ------ ------ ------ --------- ------- ------
***** Results truncated for this example *****
/N0/SB0 Board 0    3.3 VDC 0    2.97   3.13   0.49   3.47   3.63   Volts DC  5 min   *** ERROR LOW ***
***** Results truncated for this example *****
/N0/SB2 Board 0    1.5 VDC 0    1.35   1.42   1.51   1.58   1.65   Volts DC  9 sec   OK
/N0/SB2 Board 0    3.3 VDC 0    2.97   3.13   3.27   3.47   3.63   Volts DC  9 sec   OK
  • The error would result in the failure of a component to be powered on with warnings similar to the following (messages in BOLD TEXT are the key to focus on):

Dec 07 12:13:20 Main-SC Platform.SC: Attempt to power up /N0/IB9 failed: /N0/IB9 1.5V DC failed, observed: 0.0 volts /N0/IB9 3.3V DC failed, observed: 0.58 volts /N0/IB9: powered on

or

Sat Sep 27 06:16:24 sc lom: [ID 395834 local0.error] Attempt to power up /N0/SB0 failed: /N0/SB0 3.3V DC failed, observed: 0.15 volts

  • The issue could cause "setkeyswitch on" to fail because the board cannot be powered on.
  • There may be amber LEDs lit ("wrench" or warning LED) on system components.
  • There may be accompanying messages on the system console or in the SC log files (showlogs) indicating (key messages are in BOLD TEXT):
Mar 09 14:12:30 Sunfire Platform.SC: [ID 920508 local0.notice] CPCI I/O Board (F3800) at /N0/IB8 Device poll caused: sun.serengeti.FailedHwException: (SdcAsic)Asic.getTemp: Path broken between CBH and SDC: IB8.sdc.10 (13000010)
Mar 09 14:12:31 Sunfire Platform.SC: [ID 818977 local0.notice] /N0/IB8,
sensor status, outside acceptable limits (7,1,0x503080d00050000)
or
Wed Oct 01 21:56:10 sc lom: [ID 390680 local0.notice] CPU Board V3 at /N0/SB0 Device poll caused: sun.serengeti.HpuFailedException: CpuVoltageA2D.getOutputVoltage: sun.serengeti.CommException: I2cComm.readCmd: Path broken between CBH and SDC: SB0.sbbc1.regs.c0 (102000c0)
Wed Oct 01 21:56:10 sc lom: [ID 336982 local0.notice]
Device will not be polled
Wed Oct 01 21:56:10 sc lom: [ID 120592 local0.notice] /N0/SB0,
sensor status, outside acceptable limits (7,1,0x207000d00070000)
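
When a saved copy of the showenvironment -v output is available, the low-voltage symptom described above can be screened for with a short script. The following is a minimal sketch in Python, not a Sun-provided tool. It assumes the row layout seen in the sample rows above (slot, device and sensor label, five numeric columns, "Volts DC", age, status); the function name find_low_voltage and the SAMPLE text are illustrative only, and other sensor types or firmware versions may format rows differently.

```python
import re

# One numeric column (Min, LoWarn, Value, HiWarn, Max).
NUM = r"(-?\d+(?:\.\d+)?)"

# Assumed row layout, inferred only from the sample rows in this article:
# slot, free-text device/sensor label, five numbers, "Volts DC", age, status.
ROW = re.compile(
    rf"^(/\S+)\s+(.*?)\s+{NUM}\s+{NUM}\s+{NUM}\s+{NUM}\s+{NUM}"
    r"\s+Volts DC\s+(\d+\s+\S+)\s+(.+)$"
)


def find_low_voltage(output):
    """Return (slot, label, value, lowarn, status) for voltage rows below LoWarn."""
    hits = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, separator, truncation marker, or non-voltage row
        slot, label, _min, lowarn, value, _hiwarn, _max, _age, status = match.groups()
        if float(value) < float(lowarn):
            hits.append((slot, label, float(value), float(lowarn), status.strip()))
    return hits


# Sample rows copied from the example above.
SAMPLE = """\
Slot Device Sensor Min LoWarn Value HiWarn Max Units Age Status
/N0/SB0 Board 0 3.3 VDC 0 2.97 3.13 0.49 3.47 3.63 Volts DC 5 min *** ERROR LOW ***
/N0/SB2 Board 0 1.5 VDC 0 1.35 1.42 1.51 1.58 1.65 Volts DC 9 sec OK
/N0/SB2 Board 0 3.3 VDC 0 2.97 3.13 3.27 3.47 3.63 Volts DC 9 sec OK
"""

if __name__ == "__main__":
    for hit in find_low_voltage(SAMPLE):
        print(hit)
```

The sketch compares Value against LoWarn rather than relying on the Status text, so a low reading is reported even if the status column is formatted differently. Rows that do not fit the assumed layout are skipped silently, so an empty result does not prove the component is healthy.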

Situations excluded from this document:

  • The entire system cannot be powered on. See Document 1010053.1 for instructions on troubleshooting if your system will not power on.
  • Power Supply warnings or errors. See Document 1007049.1 for instructions on troubleshooting errors or warnings associated with a Power Supply Unit.

Last Review Date

September 8, 2010

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Steps to Follow

Please validate that each troubleshooting step below is true for your environment.
The steps will provide instructions or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.


1. Verify the error is associated with a single board or type of board and not the entire platform.
  • Confirm that the messages in showlogs on the System Controller (SC) identify only a single board in error (they will identify its exact board number).
  • Confirm that the status of all boards except the implicated one is OK in showenvironment output.
    • Reference: Document 1011930.1 Sun Fire[TM] (3800-6800) System Controller Application (ScApp) How To's
2. Verify all PSUs are active and OK.
  • Make sure the status of every PSU is OK in showenvironment output.
    • Reference: Document 1011930.1 Sun Fire[TM] (3800-6800) System Controller Application (ScApp) How To's
3. If the component being powered on is new to the configuration, verify it is supported in the configuration.
  • Document 1010756.1 Sun Fire[TM] Mid-Range Servers ScApp Features and Requirements Matrix
  • Document 1006342.1 Sun Fire[TM] Midrange System: UltraSPARC[R] IV System Board Upgrade Requirements
  • Document 1007858.1 Sun Fire[TM] Midrange System: UltraSPARC[R] IV+ (USIV+) System Board Upgrade Requirements
4. Validate whether the event is one of the following known issues:
5. Confirm that the same component errors occur when the other SC is main (dual SC configurations only).
  • Use scfailover to switch over to the other SC in dual SC configurations.
    • Reference: Document 1003245.1 Sun Fire[TM] 3800-6900: System Controller failover functionality
  • If the errors cease when utilizing the new SC, then the former SC is suspect and should be replaced.
6. Collect the following data and collaborate with the Sun Support Engineer by opening a Service Request with Sun.
  • Explorer data collected with the appropriate scextended or 1280extended option is preferred.
    • Reference:  Document 1018748.1 How to Run Sun[TM] Explorer and Forward the Data to a Sun Engineer
  • If Explorer data cannot be collected for any reason, see Document 1003529.1 Procedure to manually collect Sun Fire[TM] Midrange System Controller level failure data.
NOTE: If you are a customer and have reached this stage in the troubleshooting process, please open a Service Ticket with Sun Support Services or engage your local field office to obtain assistance with resolving this issue. Please make sure to mention this knowledge article so we can continue with the following steps to resolve this issue.
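
The check in Step 1 (whether the power-on failures point at a single component) can be approximated from saved showlogs or console text. The following is a minimal sketch in Python, not a Sun-provided tool. It assumes the message wording shown in the Symptoms section ("Attempt to power up ... failed" followed by "<component> <rail>V DC failed, observed: <n> volts"); the function name failed_rails and the sample log lines are illustrative, and other ScApp or LOM versions may word these messages differently.

```python
import re
from collections import defaultdict

# Assumed wording, taken from the two sample messages in this article:
# "<component> <rail>V DC failed, observed: <volts> volts"
RAIL = re.compile(
    r"(/\S+) (\d+(?:\.\d+)?)V DC failed, observed: (\d+(?:\.\d+)?) volts"
)


def failed_rails(log_text):
    """Map each component to the (nominal rail, observed volts) pairs that failed."""
    failures = defaultdict(list)
    for line in log_text.splitlines():
        if "Attempt to power up" not in line:
            continue  # only power-on failure messages are of interest here
        for component, rail, observed in RAIL.findall(line):
            failures[component].append((float(rail), float(observed)))
    return dict(failures)


# The two sample messages from the Symptoms section (they come from different systems).
SAMPLE_LOG = """\
Dec 07 12:13:20 Main-SC Platform.SC: Attempt to power up /N0/IB9 failed: /N0/IB9 1.5V DC failed, observed: 0.0 volts /N0/IB9 3.3V DC failed, observed: 0.58 volts /N0/IB9: powered on
Sat Sep 27 06:16:24 sc lom: [ID 395834 local0.error] Attempt to power up /N0/SB0 failed: /N0/SB0 3.3V DC failed, observed: 0.15 volts
"""

if __name__ == "__main__":
    failures = failed_rails(SAMPLE_LOG)
    for component, rails in sorted(failures.items()):
        print(component, rails)
    # More than one component in a single system's log suggests a wider problem.
    print("single component implicated:", len(failures) == 1)
```

If the result for one system's log contains exactly one component, the failure is likely isolated to that board. Several components failing together point away from a single FRU and toward the PSU checks in Step 2.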


Internal Comments
At this point, the customer should have validated that each troubleshooting
step above is true for their environment.

If the issue still exists, Internal Support Engineers should validate that the
initial steps have been checked properly and then investigate the following:

7. If the component is new to the configuration or was recently serviced, confirm
that the errors persist after the component has been re-seated.
- Carefully inspect pins and sockets for any damage.
- See Document 1019218.1 Sun Fire[TM] Midrange Servers: How to identify pin or
  socket damage for details.

Reference the appropriate System Service Manual for complete instructions on FRU
handling and insertion procedures:
- Sun Fire 6800/4810/4800/3800 Systems Service Manual (pdf)
- Sun Fire[TM] V1280/Netra[TM] 1280 Systems Service Manual (pdf)
- Sun Fire[TM] E2900 System Service Manual (pdf)
- Sun Fire[TM] E4900/E6900 Systems Service Manual (pdf)
- Netra[TM] 1290 System Service Manual (pdf)

8. Verify the errors still persist after the component is replaced.
Reference the appropriate System Service Manual from Step 7 above for complete
instructions on FRU replacement procedures.

9. Collaborate with the next level of support if errors persist.
- Explorer data collected with the appropriate scextended or 1280extended option
    is preferred, as detailed in Document 1018748.1 How to Run Sun[TM] Explorer
    and Forward the Data to a Sun Engineer.
- If Explorer data cannot be collected for any reason, see Document 1003529.1
    Procedure to manually collect Sun Fire[TM] Midrange System Controller level
    failure data.

Voltage Error Resources - Main causes of SB and IB power-on or power failure issues:
- Document 1019667.1 Sun Fire[TM] Midrange Server System Board (SB) voltage errors
- Also see Document 1021064.1 for additional advice regarding SB voltage errors.
- Document 1017844.1 Sun Fire[TM] MidRange Server PCI or cPCI IO Board (IB)
    power supply failures

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.