Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1168273.1
Update Date:2011-06-10

Solution Type  Problem Resolution Sure

Solution  1168273.1 :   Sun Storage 7000 Unified Storage System: Service Processor (SP) Issues  


Related Items
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7210 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Sun Storage 7110 Unified Storage System
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Unified Storage

In this Document
  Symptoms
  Cause
  Solution
  References


Applies to:

Sun Storage 7110 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 7410 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 7210 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 7310 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Information in this document applies to any platform.

Symptoms

Service Processor (SP) firmware versions below 2.0.2.16 can leak memory. On the 7110, 7310 and 7410, this eventually results in a variety of issues, as listed below:

 - Cannot connect to Service Processor via serial or network
 - Service Processor absent from hardware details page in BUI
 - Alert: Service Processor has stopped responding to requests
 - Directories, such as /SYS, missing from SP interface
 - Fans in server node running continuously at full speed
 - Slow throughput to system disks (due to fan vibration)
 - Timeout during software upgrade (due to slow system disks caused by fan vibration)

The same applies to the 7210 for Service Processor firmware versions below 2.0.2.15.
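
The version thresholds above can be checked mechanically. The following is a minimal sketch in Python, assuming the SP firmware version is available as a dotted string such as 2.0.2.10; the names FIXED_SP_FIRMWARE, parse_version and sp_firmware_affected are illustrative and are not part of any appliance tool.

```python
# Lowest SP firmware level that is not affected, per model,
# as stated in the Symptoms section of this document.
FIXED_SP_FIRMWARE = {
    "7110": (2, 0, 2, 16),
    "7310": (2, 0, 2, 16),
    "7410": (2, 0, 2, 16),
    "7210": (2, 0, 2, 15),
}


def parse_version(text):
    """Turn a dotted version such as '2.0.2.16' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def sp_firmware_affected(model, version):
    """Return True if the SP firmware is below the fixed level for the model."""
    # Tuples compare element by element, so 2.0.2.9 sorts below 2.0.2.16,
    # which a plain string comparison would get wrong.
    return parse_version(version) < FIXED_SP_FIRMWARE[model]


if __name__ == "__main__":
    for model, version in [("7410", "2.0.2.10"), ("7210", "2.0.2.15")]:
        print(model, version, "affected:", sp_firmware_affected(model, version))
```

Comparing integer tuples rather than strings avoids ordering mistakes when a version component has more than one digit.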

Alert Example:
---------------------------------------------------------------------------------------------------
SUNW-MSG-ID: AK-8000-86, TYPE: Defect, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Aug 4 10:32:13 2009
PLATFORM: i86pc, CSN: 0810QAS002, HOSTNAME: XXXX
SOURCE: appliance/kit/akd:default, REV: 1.0
EVENT-ID: 8b942adb-4213-4cf5-df69-d567f6ecab1b
DESC: The service processor needs to be reset to ensure proper functioning.
AUTO-RESPONSE: None.
IMPACT: Service Processor-controlled functionality, including LEDs, fault management, and the serial console, may not work correctly.
REC-ACTION: Click the initiate repair button.
---------------------------------------------------------------------------------------------------

--------------
TIME UUID SUNW-MSG-ID
Nov 19 15:13:51.3262 0522677b-24ed-e55e-c47c-a5cce7260c2f AK-8000-86

TIME CLASS ENA
Nov 19 15:13:51.1931 ereport.ak.xmlrpc.hardware.sp.uptoolong 0x0000000000000000

::
class = defect.ak.xmlrpc.hardware.sp.needreset
-------------

-----------------------------------------------------------------------------------
Feb 08 22:12:06.7927 ereport.ak.xmlrpc.hardware.sp.uptoolong
-----------------------------------------------------------------------------------
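
When reviewing saved log output, the message ID, ereport class and defect class shown in the examples above are the strings to look for. The following is a minimal sketch in Python, assuming the log text has already been collected into a string; the function name find_sp_reset_indicators is illustrative and is not part of any appliance tool.

```python
# Strings taken verbatim from the alert and log examples in this document.
SP_RESET_MSG_ID = "AK-8000-86"
SP_UPTIME_EREPORT = "ereport.ak.xmlrpc.hardware.sp.uptoolong"
SP_NEEDRESET_DEFECT = "defect.ak.xmlrpc.hardware.sp.needreset"


def find_sp_reset_indicators(log_text):
    """Return the set of known SP-reset indicators present in log_text."""
    indicators = (SP_RESET_MSG_ID, SP_UPTIME_EREPORT, SP_NEEDRESET_DEFECT)
    return {marker for marker in indicators if marker in log_text}


if __name__ == "__main__":
    sample = (
        "SUNW-MSG-ID: AK-8000-86, TYPE: Defect, VER: 1, SEVERITY: Major\n"
        "Nov 19 15:13:51.1931 ereport.ak.xmlrpc.hardware.sp.uptoolong "
        "0x0000000000000000\n"
    )
    for marker in sorted(find_sp_reset_indicators(sample)):
        print("found:", marker)
```

An empty result only means none of these three strings appeared in the text supplied; it does not rule out the other symptoms listed above.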

Cause

The cause is a number of memory leaks on the Service Processor, tracked in several CRs (see References). Over time, memory becomes depleted and the Service Processor becomes unresponsive or hangs.

When present, the issues surface somewhere between 30 and 60 days of uptime. There is some variation in the time between failures, their severity, and even whether or not they occur at a particular site. The reasons for these variations are not known at this time.



Solution

The appliance software, as of version 2009.Q3, has a mechanism to reset the Service Processor every 60 days, or sooner if it becomes unresponsive. This is sufficient to prevent the issues on the majority of systems.
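
The reset rule described above amounts to a simple decision: reset when the SP has been up for 60 days, or earlier if it stops responding. The following Python sketch only illustrates that rule and is not the appliance software's actual implementation; the names RESET_INTERVAL and sp_reset_due, and the idea of passing uptime and responsiveness as arguments, are assumptions made for the example.

```python
from datetime import timedelta

# Reset interval stated in this document for appliance software 2009.Q3 and later.
RESET_INTERVAL = timedelta(days=60)


def sp_reset_due(sp_uptime, sp_responding):
    """Return True if the SP should be reset under the rule described above.

    sp_uptime is a timedelta since the last SP reset (hypothetical input);
    sp_responding is False when the SP no longer answers requests.
    """
    return (not sp_responding) or sp_uptime >= RESET_INTERVAL


if __name__ == "__main__":
    print(sp_reset_due(timedelta(days=45), sp_responding=True))
    print(sp_reset_due(timedelta(days=45), sp_responding=False))
    print(sp_reset_due(timedelta(days=60), sp_responding=True))
```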

For systems that experience the problems described above, use the following procedure:

First, ensure the Service Processor is responding. This is best done by resetting the Service Processor. Use one of the following two methods:

* Enter the following command at the appliance kit shell:
   maintenance hardware select chassis-000 select sp reset

* If an alert has been raised (for example AK-8000-86), click the initiate repair button for that alert.

This process takes some time, on the order of five minutes. If the system is affected by the high fan speed issue caused by the SP memory leak, the main external indication that the reset has completed is that the fans spin down to normal speed. You can also monitor the progress of either method via a serial connection to the SP.

Next, verify via the Alert Log that the Service Processor has been reset. You should see that the Service Processor either stopped and then resumed responding to requests or, if it was previously unresponsive, simply resumed.

References

<BUG:6859470> - SP GOES OUT TO LUNCH CAUSING FANS TO RATTLE DISKS
<BUG:6868208> - UPDATE TEXT OF MSG AK-8000-86 FOR CLARIFICATION
<NOTE:1267544.1> - Older versions of the Service Processor firmware on Sun Storage 7110, 7210, 7310 and 7410 can leak memory.

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.