Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1009241.1
Update Date:2011-05-27
Keywords:

Solution Type  Technical Instruction Sure

Solution  1009241.1 :   Sun Fire[TM], Netra[TM] servers: Lights Out Management (LOM) troubleshooting information to gather


Related Items
  • Sun Netra T1405 Server
  • Sun Netra T1 AC200 Server
  • Sun Fire V120 Server
  • Sun Netra T1400 Server
  • Sun Netra X1 Server
  • Sun Netra 1280 Server
  • Sun Netra 440 Server
  • Sun Netra T1 DC200 Server
  • Sun Fire V100 Server
  • Sun Netra 120 Server
  • Sun Netra 20 Server
  • Sun Fire V1280 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Entry-Level Servers

PreviouslyPublishedAs
212786


Applies to:

Sun Netra 440 Server
Sun Fire V120 Server
Sun Netra T1 AC200 Server
Sun Netra T1 DC200 Server
Sun Netra T1400 Server
All Platforms

Goal

This document describes the Lights Out Management (LOM) information that should be
gathered to troubleshoot a Sun Fire[TM] or Netra[TM] server.

Solution

The Fault LED on a Sun Fire server or Netra server can have three states: off, solid, or flashing. Refer to the following URL, and select the related server for more information about LEDs:
https://support.oracle.com/handbook_private/General/LEDs_TOC.html

 

To determine what actually failed, or what caused the fault condition, service personnel need to access the system controller, Lights Out Management (LOM). The LOM can be accessed in two ways:

- Through the LOM management serial port
- Through the operating system (OS), by using lom commands

USING THE LOM MANAGEMENT SERIAL PORT

The LOM management port shares the serial port with the console. If the console is currently displayed, type the #. key sequence to reach the lom> prompt. The "console" command returns you to the server's console. Dropping to the lom> prompt does not affect the OS in any way, unless you issue disruptive commands such as "break", "reset", or "poweroff" from the LOM.

Run the following commands from the lom> prompt when debugging information is needed:

1. Hardware failures, such as a failed power supply or fan, can be detected with the "environment" command. Any component whose state is shown as "FAILED" might be a bad component that requires replacement. Sample output is as follows:
lom>environment
Fault ON
Alarm1 OFF
Alarm2 OFF
Alarm3 OFF
Fans:
1 fan1 OK speed 61%
PSUs:
1 OK
Temperature sensors:
1 Enclosure 21degC OK
Overheat sensors:
1 CPU OK
Circuit breakers:
1 USB0 OK
2 USB1 OK
3 SCC OK
Supply rails:
1 5V OK
2 3V3 OK
3 +12V OK
4 -12V OK
5 VDD core OK
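
When the "environment" output has been captured to a file for a support case, the scan for non-OK entries can be automated. The following is a minimal Python sketch, not a Sun-supplied tool. The function name failed_components is hypothetical, and the exact wording of a failing line (shown here as "1 FAILED") is an assumption, because the sample above only shows healthy components.

```python
import re

def failed_components(environment_output):
    """Return (section, line) pairs for entries that are not reported as OK."""
    problems = []
    section = ""
    for raw in environment_output.splitlines():
        line = raw.strip()
        if not line or line.startswith("lom>"):
            continue                      # blank line or the echoed command
        if line.endswith(":"):
            section = line[:-1]           # e.g. "Fans", "PSUs", "Supply rails"
            continue
        if not section:
            continue                      # Fault/Alarm LED lines before the first section
        if not re.search(r"\bOK\b", line):
            problems.append((section, line))
    return problems

if __name__ == "__main__":
    # Assumed example: a healthy fan and a power supply reported as FAILED.
    captured = "lom>environment\nFault ON\nFans:\n1 fan1 OK speed 61%\nPSUs:\n1 FAILED\n"
    for section, line in failed_components(captured):
        print(section, "->", line)
```

The sketch deliberately reports every entry that lacks the word OK instead of matching "FAILED" only, so that an unfamiliar status string is surfaced for review.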

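2. If the "environment" command does not show any failing components, check the event log for the trigger. Use the "showlogs -v" command from the lom> prompt. The excerpts below show typical entries grouped by cause; the label line above each excerpt and the "<---" annotations are commentary, not LOM output: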
lom>showlogs -v
SCC card removed:
+1d+8h5m9s host FATAL FAULT: SCC removed <--- Cause
+1d+8h5m9s Fault LED 3Hz <--- Fault LED flashing
Rocker switch/Power switch/Power Button switch turned to off:
+11d+0h14m58s host FAULT: unexpected power off <--- Cause
+11d+0h14m58s Fault LED ON <--- Fault LED solid
Input power source failure:
+19h25m20s PSU 1 FAULT: state change - InA failed <--- Cause
+19h25m20s Fault LED ON <--- Fault LED solid
Fan failure:
+18d+20h22m59s Fan 4 FATAL FAULT: failed 7% <--- Cause
+18d+20h22m59s Fault LED ON <--- Fault LED solid
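
Each event line starts with a relative offset such as +1d+8h5m9s. When a captured log is long, it can help to convert these offsets to durations and list only the FAULT entries. The following Python sketch assumes the line layout shown in the excerpts above; the names parse_event and fault_events are hypothetical, and the sketch makes no claim about what reference point the offsets are counted from.

```python
import re
from datetime import timedelta

# "+1d+8h5m9s text" or "+19h25m20s text"; the day field is optional.
_STAMP = re.compile(r"^\+(?:(\d+)d\+)?(\d+)h(\d+)m(\d+)s\s+(.*)$")

def parse_event(line):
    """Return (offset, text) for an event line, or None for any other line."""
    match = _STAMP.match(line.strip())
    if match is None:
        return None
    days, hours, minutes, seconds, text = match.groups()
    offset = timedelta(days=int(days or 0), hours=int(hours),
                       minutes=int(minutes), seconds=int(seconds))
    # Drop the "<---" annotations used in this document; they are not LOM output.
    text = text.split("<---")[0].rstrip()
    return offset, text

def fault_events(log_text):
    """Return the parsed events whose text contains a FAULT report."""
    events = []
    for line in log_text.splitlines():
        parsed = parse_event(line)
        if parsed is not None and "FAULT:" in parsed[1]:
            events.append(parsed)
    return events

if __name__ == "__main__":
    log = ("+19h25m20s PSU 1 FAULT: state change - InA failed\n"
           "+19h25m20s Fault LED ON\n")
    for offset, text in fault_events(log):
        print(offset, text)
```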

3. The LOM can also provide console output from a prior event such as a fatal reset. Use the "consolehistory" command to provide this information, as shown below:
lom>consolehistory
Console history:
ms Inc. SunOS 5.9 Generic May 2002
Jun 27 17:21:29 ita-v120b lom: +12d+3h34m18s host FAULT: SCC - removed
Jun 27 17:21:29 ita-v120b lom: +12d+3h34m18s fault led state - ON
Jun 27 17:22:08 ita-v120b lom: +12d+3h34m57s host SCC - inserted

4. Output from the LOM "version" command might be needed to solve LOM-specific problems. Its output appears as follows:
lom>version
LOM version: v3.14 <=== LOM firmware version
LOM checksum: 1190
LOM firmware part# 258-7871-20
Microcontroller: H8/3437S
LOM firmware build Aug 5 2004 10:39:42
Configuration rev. v1.5 <=== LOM EEPROM version
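
If the "version" output is saved with the other case data, the two values annotated above can be pulled out for a case summary. The following Python sketch assumes the field labels appear exactly as in the sample; the function name lom_versions and the dictionary keys are hypothetical.

```python
import re

def lom_versions(version_output):
    """Extract the LOM firmware and configuration (EEPROM) revisions, if present."""
    firmware = re.search(r"LOM version:\s*(v[\d.]+)", version_output)
    config = re.search(r"Configuration rev\.\s*(v[\d.]+)", version_output)
    return {
        "firmware": firmware.group(1) if firmware else None,
        "config": config.group(1) if config else None,
    }

if __name__ == "__main__":
    captured = "LOM version: v3.14\nLOM checksum: 1190\nConfiguration rev. v1.5\n"
    print(lom_versions(captured))
```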

Once the fault has been fixed, service personnel might need to turn off the Fault LED. To do so, issue the "faultoff" command from the lom> prompt.

USING THE LOM COMMAND FROM THE OS (IF LOMLITE PACKAGES INSTALLED)

The "lom" command can be issued from the OS provided that the "LOMlite" packages have been installed. Use the "pkginfo(1)" command to check if the "LOMlite" packages have been installed:
# pkginfo | grep SUNWlom
system SUNWlomm LOMlite manual pages
system SUNWlomr LOMlite driver (root)
system SUNWlomu LOMlite Utilities (usr)
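
When pkginfo output is collected from several hosts, the package check can be scripted. The following Python sketch treats the three packages listed above as the expected set, which is an assumption based on this sample only; the function name missing_lom_packages is hypothetical.

```python
EXPECTED_PACKAGES = ("SUNWlomm", "SUNWlomr", "SUNWlomu")

def missing_lom_packages(pkginfo_output, expected=EXPECTED_PACKAGES):
    """Return the expected package names that do not appear in pkginfo output."""
    installed = set()
    for line in pkginfo_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            installed.add(fields[1])      # second column is the package name
    return [name for name in expected if name not in installed]

if __name__ == "__main__":
    captured = ("system SUNWlomm LOMlite manual pages\n"
                "system SUNWlomr LOMlite driver (root)\n")
    print(missing_lom_packages(captured))
```

An empty result means all three packages were listed; any names returned are the ones to investigate before relying on the "lom" command from the OS.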

1. Again, check for problems with the environment as follows:

# lom -plvtf
PSUs:
1 OK
LOM alarm states:
Alarm1=off
Alarm2=off
Alarm3=off
Fault LED=off
Supply voltages:
1 5V status=ok
2 3V3 status=ok
3 +12V status=ok
4 -12V status=ok
5 CPU core status=ok
6 +3VSB status=ok
System status flags:
1 SCSI-Term status=ok
2 USB0 status=ok
3 USB1 status=ok
4 SCC status=ok
System Temperature Sensors:
1 Enclosure 23 degC : warning 67 degC : shutdown 72 degC
System Over-temperature Sensors:
1 CPU status=ok
Fans:
1 OK speed 95%
2 OK speed 91%
3 OK speed 100%
4 OK speed 100%
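
The temperature line in this output reports the current reading together with the warning and shutdown thresholds, so the remaining headroom can be calculated directly. The following Python sketch assumes the line layout shown in the sample above; the function name temperature_margins and the dictionary keys are hypothetical.

```python
import re

# Matches lines such as "1 Enclosure 23 degC : warning 67 degC : shutdown 72 degC".
_TEMP = re.compile(
    r"^\s*\d+\s+(\S+)\s+(\d+)\s*degC\s*:\s*warning\s+(\d+)\s*degC"
    r"\s*:\s*shutdown\s+(\d+)\s*degC"
)

def temperature_margins(lom_output):
    """Map each sensor name to its reading and its margin below each threshold."""
    margins = {}
    for line in lom_output.splitlines():
        match = _TEMP.match(line)
        if match:
            name = match.group(1)
            current, warning, shutdown = (int(v) for v in match.groups()[1:])
            margins[name] = {
                "current": current,
                "to_warning": warning - current,
                "to_shutdown": shutdown - current,
            }
    return margins

if __name__ == "__main__":
    line = "1 Enclosure 23 degC : warning 67 degC : shutdown 72 degC"
    print(temperature_margins(line))
```

For the sample reading of 23 degC, the margins are 44 degC below the warning threshold and 49 degC below the shutdown threshold.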

2. The following command will display the last 50 events:

# lom -e 50
LOM Event Log:
+0h0m0s Fault LED ON
+0h0m0s host power on
+0h3m17s Fault LED OFF
+0h3m56s host power off
+0h4m8s host power on
+0h0m0s LOM booted
+0h0m0s host power on
+0h0m0s LOM booted
+0h0m0s host power on
+0h0m0s Fault LED ON
+0h0m0s host power on
+0h1m19s host power off
+0h1m30s host power on
5/20/2004 4:54:48 GMT LOM time reference
+0h41m33s host reset
5/20/2004 5:51:15 GMT LOM time reference
+0h32m36s host reset
5/20/2004 6:24:38 GMT LOM time reference
+0h0m0s LOM flash download: v3.12 to v3.13
+0h0m0s LOM reset
+0h0m0s host power on
5/20/2004 6:33:57 GMT LOM time reference

The console history and the LOM firmware version can be obtained only through the LOM serial port. To turn off the Fault LED from the OS, use the "lom -F off" command.

Internal Comments
Important note: The Fault LED can be turned on manually with the "lom" command
from the "LOMlite" packages while the OS is running. The user-programmable
alarms are normally sufficient for customer applications. However, some users
also set the Fault LED with "lom -F on" to gain an extra indicator (that is, a
combination of the Fault LED and the Alarm LEDs). The Fault LED and alarms
might therefore be set even when there are no hardware faults and showlogs
does not indicate any errors.

For example:
+0h22m30s host power off
+0h22m51s host power on
11/5/2003 6:12:55 LOM time reference
11/19/2003 6:19:35 LOM time reference
12/3/2003 6:26:16 LOM time reference
+17d+5h55m10s Fault LED ON
+59d+20h18m26s host power off
+0h0m0s LOM booted
+0h0m3s host power on
2/1/2004 3:25:22 LOM time reference
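
One way to spot this situation in a captured log is to look for Fault LED events that have no FAULT entry at the same offset. The following Python sketch is a heuristic only, based on the log layouts shown in this document; the function name unexplained_fault_led_events is hypothetical. A flagged line is a prompt to ask whether "lom -F on" was used, not proof of it: offsets can repeat across LOM boots, and the "lom -e 50" sample shows Fault LED ON entries around power-on without a FAULT entry.

```python
import re

_EVENT = re.compile(r"^(\+\S+)\s+(.*)$")   # relative offset, then event text

def unexplained_fault_led_events(log_text):
    """Return Fault LED on/flash events with no FAULT entry at the same offset."""
    events = []
    for line in log_text.splitlines():
        match = _EVENT.match(line.strip())
        if match:
            # Drop the "<---" annotations used in this document.
            text = match.group(2).split("<---")[0].rstrip()
            events.append((match.group(1), text))
    offsets_with_fault = {stamp for stamp, text in events if "FAULT:" in text}
    return [
        (stamp, text)
        for stamp, text in events
        if text.startswith("Fault LED")
        and not text.endswith("OFF")
        and stamp not in offsets_with_fault
    ]

if __name__ == "__main__":
    log = "+0h22m51s host power on\n+17d+5h55m10s Fault LED ON\n"
    print(unexplained_fault_led_events(log))
```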

Additional LOM information can be found at the following URL:
http://panacea.uk.oracle.com/twiki/bin/view/Products/ProdInfoLOM

The LED reference page
https://support.oracle.com/handbook_private/General/LEDs_TOC.html
is available internally at the following URL:
http://support.us.oracle.com/handbook_internal/General/LEDs_TOC.html

Keywords: lom, lomlite, fault, LED, faulton, faultoff, showenvironment, environment,
showlogs, alom, consolehistory

Previously Published As 77137

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.