Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1001778.1
Update Date: 2010-10-06
Keywords:

Solution Type  Technical Instruction Sure

Solution  1001778.1 :   Instructions on How to Gather Data from a Hung Domain on a Sun Fire[TM] 3800, 4800/4810, 6800, E2900, E4900, E6900, V1280 or Netra[TM] 1280, 1290 server [Video]  


Related Items
  • Sun Fire E6900 Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Fire E4900 Server
  • Sun Fire 4800 Server
  • Sun Fire V1280 Server
  • Sun Fire E2900 Server
  • Sun Fire 4810 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Midrange V and Netra Servers
  • GCS>Support>KM>Content>Video
  • GCS>Sun Microsystems>Servers>Midrange Servers

Previously Published As
202431


Applies to:

Sun Fire E6900 Server
Sun Fire V1280 Server
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
All Platforms

Goal

Description
Instructions on how to gather data from a hung domain on a Sun Fire[TM] SF3800/SF4800/SF4810/SF6800/E4900/E6900/E2900/V1280.

A brief how-to video tutorial is available for this topic. It provides step-by-step instructions answering Sun's most frequently asked questions. View the video and/or follow the detailed instructions below.


Video - Troubleshooting a hung domain (5:00)


Sunsolve users must download the attachment to view the video.


Solution

Steps to Follow

Please follow the steps in the order in which they are presented.
The instructions for the Sun Fire[TM] E2900 and V1280 and the Netra[TM] 1280 and 1290 are given separately, because these servers use Lights Out Management (LOM) instead of the System Controller (SC) used by the Serengeti/Amazon class of servers.

Instructions for Platforms employing the System Controller (SC)

1. Ensure that the domain is actually hung:

       - Can you ping the domain?
       - Can you telnet to the domain?

2. Ensure that the SC (System Controller) is not hung. If you can access the SC, log in and obtain a platform shell.

       A. If you get to the platform shell, run the following commands:
               SCname:SC> showlogs
               SCname:SC> showplatform
       B. If the SC is hung, see Document 1002033.1 for details on how to recover from a hung system controller, then go back to step 2A.

3. Once in the platform shell, attempt to get a domain shell:

               SCname:SC> console -d 

- If the command appears to hang, send a break signal to the domain:

       - If you are using telnet: press CTRL ], then at the telnet prompt type: send break
       - If you are connected to the SC via tip: use ~#

At this point you should have a domain shell prompt. If you do, continue with the commands below; otherwise go to step 4.

- If you get the domain shell, run the following commands:

SCname:A> showdomain -p status
SCname:A> showlogs

Then type break to get to the OBP. If this takes you to the ok prompt, type sync to force a core file.

4. If you were not able to get to the ok prompt, the domain is truly hung and an XIR (externally initiated reset) must be sent to it.

From the domain shell type: reset

The behavior of this command depends on the setting of the OBP variable error-reset-recovery:

- sync: the domain attempts to write a core file.
- boot: the domain reboots as if the boot command had been issued at the ok prompt.
- none: the domain should drop to the ok prompt.

At the ok prompt you can run the following commands. The '#' sign represents the CPU on which the XIR was taken; use that number in the cbuf command. If possible, run this command on each of the CPUs (some commands depend on the firmware level of the SC):

{#} ok dump-sigblock
{#} ok # cbuf
{#} ok .xir-state-all

- If you were not able to return to the ok prompt but have a domain prompt, type the following command:

SCname:A> showresetstate

5. If none of these tactics work, you may be forced to power off the domain.

If this is the case, run setkeyswitch off for the domain.
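
As an illustration only, the effect of error-reset-recovery described in step 4 can be written down as a small lookup. The sketch below is not part of any Sun tool: the names EXPECTED_AFTER_RESET, OK_PROMPT_COMMANDS and plan_after_reset are hypothetical, and the behaviour strings simply restate this document. Actual behaviour may differ with the OBP and SC firmware versions.

```python
# Hypothetical helper: maps the OBP variable error-reset-recovery to the
# behaviour this document describes for a domain reset (XIR).
EXPECTED_AFTER_RESET = {
    "sync": "attempts to write a core file",
    "boot": "reboots as if 'boot' had been typed at the ok prompt",
    "none": "should drop to the ok prompt",
}

# Commands this document lists for the ok prompt. The leading '#' in
# '# cbuf' stands for the number of the CPU that took the XIR.
OK_PROMPT_COMMANDS = ["dump-sigblock", "# cbuf", ".xir-state-all"]


def plan_after_reset(setting):
    """Return (expected behaviour, follow-up ok-prompt commands)."""
    key = setting.strip().lower()
    if key not in EXPECTED_AFTER_RESET:
        raise ValueError("unknown error-reset-recovery value: %r" % setting)
    # Only 'none' is expected to leave the operator at the ok prompt.
    follow_up = list(OK_PROMPT_COMMANDS) if key == "none" else []
    return EXPECTED_AFTER_RESET[key], follow_up


if __name__ == "__main__":
    for value in ("sync", "boot", "none"):
        behaviour, commands = plan_after_reset(value)
        print(value, "->", behaviour, commands)
```

A lookup like this may help when writing a site runbook, since it makes explicit that the ok-prompt commands are only expected to be reachable when the variable is set to none.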

Instructions for Platforms employing Lights Out Management (LOM)

1. Ensure that the domain is actually hung:

       - Can you ping the domain?
       - Can you telnet to the domain?

2. Log in to the LOM prompt via telnet/ssh or tip.

    A. Once you get the lom prompt, run the following commands:
        lom>showsc -v
        lom>showlogs -v

3. Try to connect to the domain and see what state it is in:

   A. Use the console command to connect to the domain:
       lom>console
   B. If there is no response from the console, use the escape sequence to break out.
      The default escape sequence is "#."
       lom>console
       #.
       lom>
   C. Once the domain is confirmed to be unreachable, go to the next step.

4. Use the 'break' or 'reset' command to recover.

   A. Try to break into the OBP with 'break'. If you get to the OBP,
      do a sync to collect a core file.
      lom>break
      This will suspend Solaris.
      Do you want to continue? [no] yes
      Type 'go' to resume
      debugger entered.
      {3} ok sync
   B. If 'break' does not work, use 'reset' and collect
      'showresetstate' as well. The behaviour of reset also
      depends on the OBP setting of error-reset-recovery, which
      should preferably be set to 'sync'.
      lom>reset
      This will abruptly terminate Solaris.
      Do you want to continue? [no] yes
      lom>showresetstate

5. If none of the procedures above work, issue a poweroff followed by a poweron.

      lom>poweroff all
      lom>poweron all
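
Step 1 of both procedures asks whether the domain answers ping and telnet. A minimal sketch of that check, run from an administration host, is shown below. It is an illustration under stated assumptions, not a Sun-supplied tool: the function names are hypothetical, the ping call assumes a ping command that accepts "-c <count>" (Solaris ping takes different arguments), and the telnet test only confirms that TCP port 23 accepts a connection. The script reports the two results and leaves the judgement of whether the domain is hung to the operator.

```python
import socket
import subprocess
import sys


def tcp_port_open(host, port=23, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ping_ok(host, count=1):
    """Return True if the local ping command exits successfully.

    Assumption: ping accepts '-c <count>'; adjust for other platforms.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # ping is missing or cannot be executed on this host.
        return False
    return result.returncode == 0


def check_domain(host):
    """Collect the two step 1 observations without interpreting them."""
    return {"ping": ping_ok(host), "telnet": tcp_port_open(host, 23)}


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_domain.py <domain-hostname>")
    print(check_domain(sys.argv[1]))
```

Recording both answers together with the time of the check gives the support engineer a starting point before any break or reset is attempted.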


Product
Sun Fire 6800 Server
Sun Fire 4810 Server
Sun Fire 4800 Server
Sun Fire 3800 Server
Sun Fire E6900 Server
Sun Fire E4900 Server
Sun Fire E2900 Server
Sun Fire V1280 Server

Internal Comments
Audited/updated 11/05/09 - Mid-Range Server Content Team

NOTE: Procedures given in this document are dependent on OBP and SC versions.

System Controller, SC, Sun Fire, Serengeti, kernel, XIR
Previously Published As
46780

Change History
Date: 2008-11-19
User Name: T230884
Action: Quality Review
Date: 2007-06-04
User Name: 97961
Action: Approved
Comment: - Fixed STM formatting warnings/issues
Version: 11
Date: 2007-06-04
User Name: 97961
Action: Accept

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.