Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1021064.1
Update Date:2010-07-06
Keywords:

Solution Type  FAB (standard) Sure

Solution  1021064.1 :   Preventing potential system outages through proactive cooling improvements on Sun Fire Entry Level servers.  


Related Items
  • Sun Netra 1280 Server
  • Sun Fire V1280 Server
  • Sun Fire E2900 Server
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Controlled Proactive

PreviouslyPublishedAs
270169


Bug Id
<SUNBUG: 6864515>

Product
Sun Fire E2900 Server
Sun Fire V1280 Server
Netra 1280 Server
Sun Netra 1290 Server

Date of Resolved Release
16-Oct-2009

Inadequate cooling may lead to a system panic or reset (see details below).

Impact

Loss of application availability may occur due to a system panic or reset.

Contributing Factors

This issue may occur on the following platforms:

SPARC Platform

    . Sun Fire E2900/V1280 Systems
    . Netra 1280/1290 Systems

Note:  This cooling related issue has been found to occur on systems where, over time, increased temperature can contribute to system failures.  Air filters may not have been properly cleaned or serviced over the life of the product, contributing to increased system temperature.

Symptoms

A typical error scenario includes one or more of the following error messages in the System Controller's (SC) log files ('showlogs -v' or from the console):

     Path broken between CBH and SDC:SB#
     Device voltage problem: /N0/SB#
     Attempt to power up /N0/SB# failed
     /N0/SB#, sensor status, outside acceptable limits

          (Where # is the board number)
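
As a rough illustration, a saved copy of the 'showlogs -v' output could be scanned for the four messages above.  The following Python sketch is not a Sun tool: the function name is hypothetical, and it assumes that the "#" placeholder appears as a decimal board number in real log entries.

```python
import re

# Patterns built from the four messages listed above; "#" is ASSUMED
# to be replaced by a decimal board number in real log entries.
SYMPTOM_PATTERNS = [
    re.compile(r"Path broken between CBH and SDC:SB(\d+)"),
    re.compile(r"Device voltage problem: /N0/SB(\d+)"),
    re.compile(r"Attempt to power up /N0/SB(\d+) failed"),
    re.compile(r"/N0/SB(\d+), sensor status, outside acceptable limits"),
]


def find_symptoms(log_text):
    """Return (line_number, board_number, line) for each matching log line."""
    hits = []
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        for pattern in SYMPTOM_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((line_number, int(match.group(1)), line.strip()))
                break  # one hit per line is enough
    return hits
```

Because the patterns are applied with search() rather than a full-line match, any timestamp or prefix that precedes the message text does not need to be known in advance.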

Root Cause

Higher ambient air temperature results in premature degradation of system components, which can lead to system outages.  This higher ambient air temperature is not necessarily at a level that exceeds Sun's environmental specifications, or that causes temperature warning messages to be logged.  Systems that operate in warmer environments are at a potentially higher risk of component failure.  Air filters may not have been properly cleaned or serviced over the life of the product, contributing to increased temperature.

Corrective Action

Workaround:

No workaround available - see Resolution section below.

Resolution:

Recognize that the following cooling actions will increase the life of the components but will not eliminate all failures.  Specifically, for components that are going to fail in the short term, the additional cooling may not have enough time to significantly alter the failure time frame.  As time goes on, the additional cooling will have more time to work.  The component's life will be extended as the cooling actions become increasingly significant.

Perform the following actions:

STEP 1. Run the following commands on each system to obtain baseline temperature
        readings (from the lom prompt on the System Controller):

     lom> showhostname
     hostname: system_name
     lom> showdate
     Mon Aug 31 22:14:55 CEST 2009
     lom> showenv -ltuvw
     System Controller Board
     Slot     Device     Sensor     Min     LoWarn   Value     HiWarn  Max     Units     Age     Status
          <OUTPUT TRUNCATED, BUT ALL COMPONENTS WILL BE LISTED>
 
STEP 2. Improve ambient air temperature levels as much as possible for all Sun Fire E2900,
        V1280 and Netra 1280, 1290 systems by performing the following steps, if applicable:

   . Position additional vented floor tiles or perform other ventilation changes
     to reduce ambient air temperature.

   . Re-position the system in a cooler ambient air temperature environment by
     relocating it, altering its rack mount location, etc.

   . Reduce ambient air temperature level via increased cooling if the customer
     environment can support this.

   . Validate that any empty board slots have the proper Filler Panel installed
     to assure correct chassis airflow.

STEP 3. For Sun Fire V1280 and E2900 systems:

   . Remove the left input air filter to increase the chassis airflow.  Filters that
     haven't been cleaned previously may stick to the door.

   . These systems are currently qualified to operate without this air filter in
     standard data center environments and this maintenance action is required.

   . Field engineers are being instructed to remove the left input air filter, if
     one is present, whenever they service a V1280 or E2900.

   . Page 1 of the Sun Fire E2900/V1280 and Netra 1280 Systems Filter Installation Guide...

       http://dlc.sun.com/pdf/817-2680-12/817-2680-12.pdf

     ...provides details on removing the filter.

   For Netra 1280 systems:

   . Clean or replace the left input air filter.  Filters that haven't been cleaned
     previously may stick to the door.  Air Filter kits are Sun order number X6806A-Z.

     NOTE:  Customers are responsible for ordering the replacement air filter kits,
            and this maintenance action is required.

   . N1280 systems are not qualified to operate without this air filter.

   . Filters are required to be inspected and cleaned or replaced as necessary every
     3-6 months as per the Periodic Maintenance instructions in the Service Manual.

   . See the Sun Fire E2900/V1280 and Netra 1280 Systems Filter Installation Guide...

       http://dlc.sun.com/pdf/817-2680-12/817-2680-12.pdf

     ...for details on replacing the filter.

   For Netra 1290 systems:

   . Clean or replace the left input air filter.  Filters that have not been cleaned
     previously may stick to the door.  Air Filter kits are Sun order number X6806A-Z.

     NOTE:  Customers can order filters now, but there will be limited availability
            until late November 2009.  Customers are responsible for ordering
            the replacement air filter kits.

   . N1290 systems are not qualified to operate without this air filter.

   . Filters should be inspected and cleaned or replaced as necessary every 3-6 months
     as per the Periodic Maintenance instructions in the Service Manual.

   . See the Netra 1290 Server Service Manual...

        http://dlc.sun.com/pdf/819-4373-10/819-4373-10.pdf

    ...section C-1 Periodic Maintenance for details.
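
The per-model filter handling and the 3-6 month inspection interval described in this step can be tracked with a small script.  The following Python sketch is illustrative only: the function names and the rule of clamping to the last day of shorter months are assumptions made here, not part of the Service Manual.

```python
import calendar
from datetime import date

# Filter handling per model, as stated in STEP 3 above.
FILTER_ACTION = {
    "Sun Fire V1280": "remove",
    "Sun Fire E2900": "remove",
    "Netra 1280": "clean or replace",
    "Netra 1290": "clean or replace",
}


def add_months(day, months):
    """Return `day` shifted forward by `months`, clamped to the month end."""
    month_index = day.month - 1 + months
    year = day.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def next_inspection_window(last_serviced):
    """Return (earliest, latest) dates for the next 3-6 month inspection."""
    return add_months(last_serviced, 3), add_months(last_serviced, 6)
```

For example, next_inspection_window(date(2009, 11, 30)) returns the pair 2010-02-28 and 2010-05-30; the window only applies to models whose FILTER_ACTION is "clean or replace".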

STEP 4. Install System Controller Application (ScApp) Firmware release 5.20.14, which
       includes an RFE that raises the default fan speed to improve system cooling.

      . The patch ID is 114527-15.

STEP 5. As close as possible to 24 hours later, run the following commands on each
        system to obtain post-change temperature readings (from the lom prompt on
        the System Controller).  Archive and retain this information.

     lom> showhostname
     hostname: system_name
     lom> showdate
     Mon Aug 31 22:14:55 CEST 2009
     lom> showenv -ltuvw
     System Controller Board
     Slot     Device     Sensor     Min     LoWarn   Value     HiWarn  Max     Units     Age     Status
          <OUTPUT TRUNCATED, BUT ALL COMPONENTS WILL BE LISTED>

     NOTE:  If the system is significantly busier or less busy for this second reading
           than for the first, temperature differences will also be significant.  Try to
           take this reading when the system state and data center environment are nearly
           identical to the baseline (i.e., same time of day) for the most accurate
           measurement of improvement.
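
Once both sets of readings are archived, the baseline and post-change values can be compared sensor by sensor.  The following Python sketch assumes the Value column for each temperature sensor has already been transcribed by hand into a mapping keyed by (slot, sensor) and that the values are in degrees C; the sensor names and numbers below are made up for illustration and are not real showenv output.

```python
def temperature_deltas(baseline, post_change):
    """Return {sensor: post - baseline} for sensors present in both readings."""
    return {
        sensor: round(post_change[sensor] - baseline[sensor], 2)
        for sensor in baseline
        if sensor in post_change
    }


def average_delta(deltas):
    """Return the mean change across sensors, or None if there are none."""
    if not deltas:
        return None
    return round(sum(deltas.values()) / len(deltas), 2)


# HYPOTHETICAL readings (assumed degrees C), keyed by (slot, sensor);
# these values are invented for illustration, not real showenv output.
baseline = {("/N0/SB0", "temp A"): 52.0, ("/N0/SB2", "temp A"): 55.0}
post_change = {("/N0/SB0", "temp A"): 48.0, ("/N0/SB2", "temp A"): 50.0}

deltas = temperature_deltas(baseline, post_change)
print(average_delta(deltas))  # a negative result means the boards run cooler
```

Sensors that appear in only one of the two readings are skipped, so a board that was removed or added between readings does not distort the average.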

Expectations:

Filter removal (Sun Fire E2900 and V1280) and replacement (Netra 1280 and 1290) should reduce average board temperatures, but actual results will vary.  Archive and retain the showenvironment data in case it needs to be consulted in the future.  You do not need to provide this data to anyone; it is intended purely to show the impact of this procedure.

All existing Sun Fire E2900, V1280 and Netra 1280, 1290 systems may be impacted by prolonged exposure to higher ambient air temperature.  There is no method to determine the level of degradation on system components.

Sun Fire 3800/48x0/6800/E4900/E6900 and 12K/E20K/15K/E25K systems do not show the same failure rate, due to their increased system cooling capacity and the typically cooler environments in which these systems operate.

Comments

When a board fails with the errors described above, replace the board.  For systems with air filters installed, filters should be inspected and cleaned, or replaced, as necessary every 3 to 6 months as per the Periodic Maintenance instructions in the Service Manual.

In summary, the actions recommended by this document are as follows (perform these actions proactively on ALL affected systems):

Sun Fire V1280/E2900: Immediate Actions - REMOVE input air filter to improve ambient temperature, and Install ScApp 5.20.14 (patch 114527-15).

Netra 1280/1290: Immediate Actions - REPLACE air filter to improve ambient temperature, Install ScApp 5.20.14 (patch 114527-15), and inspect filter every 3 to 6 months and replace as needed.

Note:  For all platforms capture showenv data as previously described.

Please send questions with regards to this issue to the  [email protected]  email alias.

References:

    Escalation ID: 1-554249604
    Resolution Patches: 114527-15
    Reference Manual:  Sun Fire E2900/V1280 and Netra 1280 Systems Filter Installation Guide
                       Netra 1290 Server Service Manual
                         
    Related URL(s):  http://dlc.sun.com/pdf/817-2680-12/817-2680-12.pdf
                     http://dlc.sun.com/pdf/819-4373-10/819-4373-10.pdf





Modification History
Changes made since initial publication.

22-Mar-2010
  • Changed from Reactive to Controlled Proactive, moved workaround down to resolved with significant rewrite of Resolution section.  Changed Pending Patch to Resolution Patch.

Internal Contributor/submitter
[email protected]

Internal Eng Responsible Engineer
[email protected], [email protected] Responsible Manager: [email protected], [email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Eng Business Unit Group
Systems Group-Netra Systems and Networking Systems Group-Enterprise Systems

Internal Sun Alert & FAB Admin Info
14-Oct-2009: Completed draft and sent to Extended Review.
16-Oct-2009: No feedback from Ext Rvw - sending to Publish.
18-Nov-2009: Corrected Product Name to swoRDFish inconsistency.


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.