Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1013119.1
Update Date:2011-03-24
Keywords:

Solution Type  Troubleshooting Sure

Solution  1013119.1 :   Troubleshooting temperature warnings on multiple components within a Sun Fire [TM] Serengeti or LightWeight8 system  


Related Items
  • Sun Fire E6900 Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Fire E4900 Server
  • Sun Netra 1280 Server
  • Sun Fire 4800 Server
  • Sun Fire E2900 Server
  • Sun Fire V1280 Server
  • Sun Fire 4810 Server
  • Sun Netra 1290 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Midrange V and Netra Servers
  • GCS>Sun Microsystems>Servers>Midrange Servers

PreviouslyPublishedAs
217971


Applies to:

Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E2900 Server
All Platforms

Purpose

Description

This document addresses temperature warnings or errors on multiple components (board, repeater, etc.) in Sun Fire [TM] 3800, 4800, 4810, E4900, 6800, and E6900 systems, Sun Fire [TM] V1280 and E2900 systems, and Netra [TM] 1280 and 1290 systems.

This document covers situations where the system is powered on, but warnings or errors related to temperature exist in the configuration.  If you have only a single device in the chassis reporting temperature warnings, see <Document:1010052.1> for troubleshooting information related to that specific issue.

Symptoms:

  • System components report unusually high or low temperature warnings or messages.
  • Fan Tray(s) may be marked Failed in showenvironment output from the System Controller.
  • Domain(s) may fail to power on, fail to respond to "setkeyswitch on", or fail to boot; alternatively, domain functions may be completely unaffected by the errors, warnings, or Fan Tray status.
  • The System Controller (SC) should be accessible.
  • Other systems in the same area or data center may also be reporting temperature errors, messages, or issues.

Last Review Date

March 24, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Steps to Follow

Please validate that each troubleshooting step below is true for your environment.  Each step provides instructions, or a link to a document, for validating the step and taking corrective action as necessary.  The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution.  Please do not skip a step.

1.   Verify this is the only server having high temperature warnings in the environment (data center, rack, etc).

  • Log into neighboring systems in the same rack or general area and look at temperature readings to determine whether they are also showing elevated temperatures. 
  • If multiple systems are showing elevated temperatures, the issue is specific to the site (data center) and the site administrator should investigate the issue.

2.  Verify that all Fan Trays are active and ok (not marked FAILED in showenvironment or showlogs).

  • Confirm the status as shown in <Document:1011930.1> Sun Fire[TM] (3800-6800) System Controller Application (ScApp) How To's.
  • If there is a failed Fan Tray then please follow the instructions detailed in <Document:1008393.1> Troubleshooting Cooling Fan Failures on Sun Fire [TM] Serengeti or LightWeight8 Systems starting after Step 3 of that document.

3.  Verify that multiple components' temperatures are high as shown in showenvironment output, and record which components are implicated.

  • Confirm the status as shown in <Document:1011930.1> Sun Fire[TM] (3800-6800) System Controller Application (ScApp) How To's.
  • If there is only a single component with an elevated temperature you should use <Document:1010052.1> Troubleshooting temperature warnings on an individual component within a Sun Fire [TM] Serengeti or LightWeight8 system and start on Step 4.
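
The decision flow in Steps 1 through 3 can be summarized in a short Python sketch. It is illustrative only: the Observation record and its field names are hypothetical, and the values are assumed to be filled in by hand from what was seen in showenvironment and showlogs output. Nothing here parses real System Controller output.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Hand-recorded findings from Steps 1-3 (hypothetical structure)."""
    other_systems_hot: bool                                # Step 1: neighbors also report elevated temperatures
    failed_fan_trays: list = field(default_factory=list)   # Step 2: Fan Trays marked FAILED
    hot_components: list = field(default_factory=list)     # Step 3: components with elevated temperatures


def triage(obs: Observation) -> str:
    """Return the next action suggested by Steps 1-3 of this document."""
    if obs.other_systems_hot:
        return "Site issue: refer to the site administrator"
    if obs.failed_fan_trays:
        return "Follow Document 1008393.1, starting after Step 3"
    if len(obs.hot_components) == 1:
        return "Follow Document 1010052.1, starting at Step 4"
    if len(obs.hot_components) > 1:
        return "Continue with Step 4 of this document"
    return "No elevated temperatures recorded"
```

The checks are evaluated in the same order as the steps, so a site-wide condition or a failed Fan Tray is ruled out before individual components are considered.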

4.  Verify that the components in error are not physically located together in the chassis.

  • Use each system's pictures in the Sun System Handbook to determine where these components are physically located.
    • Of importance, determine if all of the components implicated are physically located on the same side of the chassis (left or right, front or back).
  • If there is a clear indication that they are grouped together on a particular side of the chassis, visually inspect the area around that side of the chassis.  Try to determine whether there is an obstruction, a dirty filter, or some other external blockage associated with the "hot spot".
Be aware of how the system is designed to cool itself when performing a visual inspection of it:
  • All of these systems utilize Front to Back cooling.
  • For 3800, 48x0, 6800, E4900, and E6900 the Systems Site Planning Guide, Chapter 3.4.1 provides diagrams showing airflow.  Two notes are key:
    • Any systems mounted in a rack must have front to back cooling (no side to side).
    • The front of the cabinet should not be facing, nor be in the path of the exhaust air from any other systems or cabinets.
  • The E2900, V1280, Netra 1280, and Netra 1290 also use front to back cooling, which is documented in their Systems Site Planning Guide.  A diagram is not available for these chassis, however.
    • Be aware of <Document:1021703.1>
    • The Alert details that for E2900 & v1280 the left air filter should be removed and on Netra 1280 & 1290, the air filters should be inspected and/or replaced every 6 months.
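
The grouping check in Step 4 can be sketched as follows, assuming the physical positions have already been looked up in the Sun System Handbook pictures. The component labels and the location_of mapping are placeholders, not real slot positions for any of these chassis.

```python
def shared_side(hot_components, location_of):
    """Return the chassis side common to every implicated component, or None.

    location_of is a hand-built mapping of component label -> side
    ("left", "right", "front", or "rear"), taken from the Handbook pictures.
    """
    sides = {location_of[name] for name in hot_components}
    return sides.pop() if len(sides) == 1 else None


# Placeholder data for illustration only.
location_of = {"board_a": "rear", "board_b": "rear", "board_c": "front"}

print(shared_side(["board_a", "board_b"], location_of))  # rear
print(shared_side(["board_a", "board_c"], location_of))  # None
```

A non-None result points at one side of the chassis to inspect for obstructions or dirty filters; None means the implicated components are spread out and a localized blockage is less likely.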

5.  Confirm the errors persist on the same component when the other SC is main.

  • If the errors cease once the other SC is acting as main, then the former main SC is suspect and should be replaced.
  • SC failover reference: <Document:1003245.1> Sun Fire[TM] 3800-6900: System Controller failover functionality
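
The comparison in Step 5 amounts to checking which implicated components still report errors after the failover. A minimal sketch, assuming the component lists were recorded by hand before and after the other SC became main (the function names and labels are hypothetical):

```python
def persisting_components(hot_before, hot_after):
    """Components that report temperature errors under both SCs."""
    return sorted(set(hot_before) & set(hot_after))


def former_sc_is_suspect(hot_before, hot_after):
    """True when errors seen under the former main SC cease under the new main SC."""
    return bool(hot_before) and not hot_after
```

If persisting_components returns a non-empty list, the errors follow the components rather than the SC, and troubleshooting continues with Step 6.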

6.   Collect the following data and collaborate with the next level of support.

  • It is preferred that Explorer be run with the appropriate scextended or 1280extended option, as detailed in <Document:1018748.1> How to Run Sun[TM] Explorer and Forward the Data to a Sun Engineer.
  • If Explorer data cannot be collected for any reason, see <Document:1003529.1> Procedure to manually collect Sun Fire[TM] Midrange System Controller level failure data.

Internal Comments
At this point, if the customer has validated that each troubleshooting step above is true for their
environment, and the issue still exists, escalate to your next level of technical support.

Previously Published As 91429


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.