Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1004364.1
Update Date:2011-04-27
Keywords:

Solution Type  Problem Resolution Sure

Solution  1004364.1 :   Sun Fire[TM] Midrange Server: Safari Port Error may be caused by a resetting SC  


Related Items
  • Sun Fire E6900 Server
  • Sun Fire 6800 Server
  • Sun Netra 1280 Server
  • Sun Fire E4900 Server
  • Sun Fire 4800 Server
  • Sun Fire V1280 Server
  • Sun Fire E2900 Server
  • Sun Netra 1290 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Midrange V and Netra Servers
  • GCS>Sun Microsystems>Servers>Entry-Level Servers
  • GCS>Sun Microsystems>Servers>Midrange Servers

PreviouslyPublishedAs
206035


In this Document
  Symptoms
  Cause
  Solution


Applies to:

Sun Netra 1280 Server
Sun Netra 1290 Server
Sun Fire V1280 Server
Sun Fire 4800 Server
Sun Fire 6800 Server
All Platforms

Symptoms

A reset of the System Controller (SC) or LOM on a Sun Fire[TM] V1280/E2900/3800/4800/4810/E4900/6800/E6900 or Netra[TM] 1280/1290 server may cause false hardware failures to be reported.

Cause

If such a reset happens during a JTAG scan, hardware may be disabled because of Safari Port Errors.

Solution

Hardware errors seen after an SC reset are usually false. Look for SC resets with any of these SBBC Reset Reason(s):

  • Peer Reset
  • Watchdog Reset
  • SC Reset Button
  • Software Reset
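
As a rough illustration of the check described above, the following Python sketch pulls the reset reasons out of a single log line. It assumes the "SBBC Reset Reason(s): ..." wording shown in the sample /var/adm/messages output in this article; the function name reset_reasons and the constant names are made up for the example and are not part of any Sun tool.

```python
import re

# The four reset reasons this article associates with false hardware errors.
SUSPECT_REASONS = {"Peer Reset", "Watchdog Reset", "SC Reset Button", "Software Reset"}

# Assumption: the reasons follow the literal text "SBBC Reset Reason(s):"
# as a comma-separated list, as in the sample messages output.
RESET_RE = re.compile(r"SBBC Reset Reason\(s\):\s*(.+)$")

def reset_reasons(line):
    """Return the suspect reset reasons named on one log line (empty list if none)."""
    match = RESET_RE.search(line)
    if not match:
        return []
    reasons = [r.strip() for r in match.group(1).split(",")]
    return [r for r in reasons if r in SUSPECT_REASONS]

sample = ("Mar 3 10:06:03 sppuap01 lw8: [ID 811040 kern.notice] "
          "3/3/07 7:04:16 AM SBBC Reset Reason(s): Peer Reset, Watchdog Reset")
print(reset_reasons(sample))  # ['Peer Reset', 'Watchdog Reset']
```

The same function could be applied to every line of a saved messages file or SC log to list the resets that need explaining.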


Resolution
There is currently no fix for the hardware errors seen after a reset; the errors themselves are false. To resolve the problem, the root cause of the SC reset must be found and fixed.

The hardware disabled immediately after the SC has reset is good and can be re-enabled. The hardware errors are the result of the SC being interrupted during a critical operation.

Please gather an explorer with scextended or 1280extended and as much information as possible about the customer's SC network configuration. The serial console output from the system controller at the time the reset occurred is also very useful.


Relief/Workaround

Known causes of SC resets are:

  • Manual intervention: someone pressing the reset button present on some platform System Controllers.
  • Resets caused by hardware errors on the System Controller.
  • Resets caused by software errors on the System Controller.



Additional Information
The most important thing to examine is the logged messages.

Look for messages indicating a reset before the failed hardware is called out. The relevant messages report Reset Reason(s) of Peer Reset, Watchdog Reset, SC Reset Button, or Software Reset. Errors of the type seen after an SC reset are normally the result of failed hardware, so the first inclination is to replace the hardware. Hardware may be replaced several times before the problem is diagnosed properly, giving the impression of a quality issue.

If hardware errors are preceded by a reset, they are the result of that reset and NOT bad hardware. Not all resets cause Safari errors; whether errors appear depends on what the SC was doing at the time of the reset.
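
The ordering rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "preceded by" is taken to mean "appears earlier in the same log", reset lines are recognised by the "SBBC Reset Reason(s):" text, and error lines by the strings "SafariPortError" or "Safari Port Error" that appear in the samples in this article. The function name classify_errors and the label strings are hypothetical, and the three-line log is assembled from fragments of this article for demonstration rather than taken from a real system.

```python
def classify_errors(lines):
    """Label each Safari port error line by whether an SC reset appears before it."""
    reset_seen = False
    results = []
    for line in lines:
        if "SBBC Reset Reason(s):" in line:
            # Any later Safari port error is suspected to be a side effect of this reset.
            reset_seen = True
        elif "SafariPortError" in line or "Safari Port Error" in line:
            label = "likely false (follows SC reset)" if reset_seen else "investigate hardware"
            results.append((label, line.strip()))
    return results

# Illustrative log only: fragments from this article placed in a made-up order.
log = [
    "3/3/07 7:04:14 AM Boot: ScApp 5.20.2, RTOS 45",
    "3/3/07 7:04:16 AM SBBC Reset Reason(s): Peer Reset, Watchdog Reset",
    "SafariPortError0[0x200] : 0x00020002",
]
for label, text in classify_errors(log):
    print(label, "|", text)
```

A label of "likely false" is a hint to look for the cause of the reset first, not proof that the hardware is good.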

Example of the SC reset from /var/adm/messages on an E2900:


Mar 3 06:53:57 sppuap01 lw8: [ID 128070 kern.notice] Main, up 94 days 08:01:42, Memory 6,968,824
Mar 3 10:06:01 sppuap01 lw8: [ID 190882 kern.notice] Unretrieved lom log history follows ...
Mar 3 10:06:01 sppuap01
Mar 3 10:06:03 sppuap01 lw8: [ID 650827 kern.notice] 3/3/07 7:04:14 AM Boot: ScApp 5.20.2, RTOS 45
Mar 3 10:06:03 sppuap01 lw8: [ID 811040 kern.notice] 3/3/07 7:04:16 AM SBBC Reset Reason(s): Peer Reset, Watchdog Reset




Below are some of the hardware errors that may appear:

From showerrorbuffer:


ErrorData[0]
Date: Fri Aug 17 20:54:58 PDT 2007
Device: /RP0/dx1
ErrorID: 0x31273023
Register: Safari Port Error Status 3[0x22] : 0x00000004
SafPar [02:02] : 0x1 Safari input parity error

ErrorData[7]
Date: Fri Aug 17 20:54:59 PDT 2007
Device: /SB2/sdc0
ErrorID: 0x60171010
Register: SafariPortError0[0x200] : 0x00000002
ParSglErr [01:01] : 0x1 ParitySingle error



CHS Error example:


Component : SB2
Time Stamp : Fri Aug 17 23:55:36 EDT 2007
New Status : FAULTY
Old Status : OK
Event Code : 01000006 (unrecognized value)
Initiator : SCAPP
Message : 1.E2900.FAULT.ASIC.DX.SERD.SAF_IN_PAR_ERR.31271023.20-3.2.5406679000200



Showlogs Example:


Fri Aug 17 20:54:23 v1280-lom lom: [ID 434738 local0.error] /N0/SB2 encountered the first error
Fri Aug 17 20:54:23 v1280-lom lom: [ID 277478 local0.error] DxSbAsic reported first error on /N0/SB2
Fri Aug 17 20:54:23 v1280-lom lom: [ID 357707 local0.error]
/SB2/sdc0:
SafariPortError0[0x200] : 0x00020002
AccParSglErr [17:17] : 0x1
ParSglErr [01:01] : 0x1 ParitySingle error

Fri Aug 17 20:54:23 v1280-lom lom: [ID 539903 local0.error]
>>> SafariPortError1[0x210] : 0x00028002
AccParSglErr [17:17] : 0x1
ParSglErr [01:01] : 0x1 ParitySingle error
FE [15:15] : 0x1
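
To make the register values in these samples easier to read, here is a small Python sketch that lists the bits set in a SafariPortError value. The bit names come only from the samples in this article (bits 1, 15, and 17); the full register layout is not given here, so every other bit is reported as "unknown". The names SAFARI_PORT_ERROR_BITS and decode_bits are invented for the example, and applying one table to both SafariPortError0 and SafariPortError1 is an assumption based on the matching field names in the output above.

```python
# Bit positions taken from the showerrorbuffer and showlogs samples in this article.
SAFARI_PORT_ERROR_BITS = {
    1: "ParSglErr (ParitySingle error)",
    15: "FE",
    17: "AccParSglErr",
}

def decode_bits(value, names):
    """Return (bit, name) for every set bit in a 32-bit value, lowest bit first."""
    decoded = []
    for bit in range(32):
        if value & (1 << bit):
            decoded.append((bit, names.get(bit, "unknown")))
    return decoded

# The two register values from the showlogs sample.
for raw in (0x00020002, 0x00028002):
    print(hex(raw), decode_bits(raw, SAFARI_PORT_ERROR_BITS))
```

For 0x00020002 this reports bits 1 and 17, and for 0x00028002 bits 1, 15, and 17, matching the field lines printed under each register in the sample.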








Third-party security scanning software is known to cause Peer Reset / Watchdog Resets


We have seen this issue with security scanning products that exploit a weakness in the SC's ssh version. The products seen so far are Nessus and Forescout. A workaround in this case is to disable scanning of the SC, to switch the SC's connection type to telnet, or both.

The CR related to this network-induced reset is SunBug 6539431.
Keywords: Peer, reset, SafariPortError, sbbc, ParSglErr, ParitySingle, Watchdog, ssh, serengeti, lw8, lom, quality
Previously Published As
90665

Change History
Date: 2007-12-22
User Name: 71396
Action: Approved
Comment: Performed final review of article.
No changes required.
Publishing.
Version: 7
Date: 2007-12-20
User Name: 71396
Action: Accept
Comment:
Version: 0
Date: 2007-12-20
User Name: 103287
Action: Approved
Comment: Gave the doc some formatting to make it look better, added links, etc. Changes look good. Ready to go.
 Josh
Version: 0
Date: 2007-12-19
User Name: 24214
Action: Approved
Comment: Added internal note with a pointer to an action plan for next time someone sees this.
Version: 0


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.