Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1321278.1
Update Date: 2011-05-25

Solution Type  Troubleshooting Sure

Solution 1321278.1: Sun Enterprise[TM] 10000: Troubleshooting Domain Panics


Related Items
  • Sun Enterprise 10000 Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers




In this Document
  Purpose
  Last Review Date
  Instructions for the Reader
  Troubleshooting Details


Applies to:

Sun Enterprise 10000 Server - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Information in this document applies to any platform.

Purpose

This document provides troubleshooting information for various panics commonly seen on E10000 domains.

Last Review Date

May 11, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Cannot allocate IOMMU TSB arrays

Symptom:

Boot device: /sbus@40,0/SUNW,qfe@0,8c00000  File and args:
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
23a00 X
Requesting Internet address for 0:0:be:a6:68:1b
Internet address is xxx.xxx.xx.xx = xxxxxxxx
hostname: chef
domainname: Lab.Sun.COM
root server: venus
root directory: /export/install/7/base.s998s_u3smccServer-11/Solaris_2.7/Tools/Boot
Alloc of 0x2140000 bytes at 0x10b80000 refused.
panic[cpu0]/thread=10404040: Cannot allocate IOMMU TSB arrays
rebooting...

Resolution:

The system is trying to boot Solaris 7, but Solaris 2.6 is specified in the domain_config file. Correct the domain_config file.
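
One way to spot this kind of mismatch is to read the Solaris release from the root directory path in the boot output (Solaris 7 appears there as Solaris_2.7) and compare it with the release you expect the domain to be configured for. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the configured release is supplied by the caller in the same "2.x" form, and the code does not read domain_config itself.

```python
import re

def release_from_boot_log(log_text):
    """Return the Solaris release named in the boot path (e.g. '2.7'), or None."""
    # The console may wrap a long path across lines, so join lines first.
    joined = log_text.replace("\n", "")
    match = re.search(r"Solaris_(\d+(?:\.\d+)*)", joined)
    return match.group(1) if match else None

def release_mismatch(log_text, configured_release):
    """True only when a release is found in the log and differs from the
    caller-supplied configured release (hypothetical input, not parsed
    from domain_config)."""
    booted = release_from_boot_log(log_text)
    return booted is not None and booted != configured_release
```

For the symptom shown above, the path yields 2.7, so a configured release of 2.6 would be flagged as a mismatch.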


Fast Data Access MMU Miss

Symptom:

KERNEL dropped into OBP due to following trap at trap level = 1 
Fast Data Access MMU Miss
Normal Alternate MMU Vector
0: 0 0 0 0
1: 0 1ff00000000 ffe916d0 40
2: 0 0 f1000000 12800003808
3: 0 0 0 0
4: 0 1fff0008d1c 0 1ff00000000
5: 0 f0008d1c f 0
6: 0 0 fff00000 0

Resolution:

There are several likely causes:

1. The inetboot file in /tftpboot on the SSP is incorrect.

2. The domain has 400 MHz/8 MB E-cache processors and the boot image does not have the latest kernel patch.

3. dr-max-mem is set too large or incorrectly on Solaris 2.5.1.

4. There is a hardware problem. Run hpost -l32 or hpost -l64 on the domain. bringup -D on can also be run.


lock_set_spl: 70222069 lock held and only one CPU

Symptom:

Rebooting with command: boot net -v
It took 741 milli seconds to do mailbox callback
Boot device: /sbus@44,0/SUNW,qfe@0,8c00000 File and args: -v
Using Onboard transceiver - Link Up.
2ee00
Server IP address: xxx.xx.xxx.xx
Client IP address: xxx.xx.xxx.xx
Using Onboard transceiver - Link Up.
hostname: lima
domainname: dom1.something.com
root server: beans-ssp
root directory: /cdrom/sol_2_6_598_sparc_smcc_svr/s0/Solaris_2.6/Tools/Boot
Size: 335983+72325+449939 Bytes
cpu0: SUNW,UltraSPARC (upaid 4 impl 0x11 ver 0xa0 clock 400 MHz)
cpu1: SUNW,UltraSPARC (upaid 5 impl 0x11 ver 0xa0 clock 400 MHz)
cpu2: SUNW,UltraSPARC (upaid 6 impl 0x11 ver 0xa0 clock 400 MHz)
cpu3: SUNW,UltraSPARC (upaid 7 impl 0x11 ver 0xa0 clock 400 MHz)
It took 916 milli seconds to do mailbox callback
SunOS Release 5.6 Version Generic_105181-05 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.
Using default device instance data
mem = 4194304K (0x100000000)
avail mem = 4156407808
panic[cpu4]/thread=0x10404040: lock_set_spl: 70222069 lock held and only one CPU
rebooting...
BAD TRAP: cpu=4 type=0x31 rp=0x104035b8 addr=0x17 mmu_fsr=0x0
: trap type = 0x31

Resolution:

The CPUs in the domain being booted have an 8 MB E-cache, and the patch level of the Solaris image being booted cannot handle a cache of that size. Use the OBP command limit-ecache-size.


munged memory list

Symptom:

Boot device: /sbus@64,0/SUNW,hme@0,8c00000  File and args: - install
2ee00 hostname: foo
domainname: bag.com
root server: foobar
root directory: /export/install/sparc/os/2.6-598-419+/Solaris_2.6/Tools/Boot
panic[cpu38]/thread=0x10404000: munged memory list = 0x10403914

Resolution:

The system is trying to boot Solaris 2.6, but Solaris 2.5.1 is specified in the domain_config file. Correct the domain_config file.


Async data error at tl1

Symptom:

System panics with Async data error at tl1.

Resolution:

This generally indicates an E-cache parity error on a CPU. The SPARC Architecture Manual states:

An asynchronous data error occurred on a data access. Examples: an ECC error occurred while writing data from a cache store buffer to memory, or an ECC error occurred on an MMU hardware table walk.

The panic string will report the failing CPU.

panic[cpu44]/thread=0x74714720
Async data error at tl1

Replace the CPU reported.
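
When going through saved console output, the CPU number can be pulled from the panic string automatically. The sketch below is illustrative only: the function names are hypothetical, and the mapping from CPU number to system board assumes that CPUs are numbered as board * 4 + processor, which is not stated in this document and should be checked against the platform documentation.

```python
import re

PANIC_CPU = re.compile(r"panic\[cpu(\d+)\]")

def panicking_cpu(console_text):
    """Return the CPU number from the first panic[cpuN] string, or None."""
    match = PANIC_CPU.search(console_text)
    return int(match.group(1)) if match else None

def board_and_processor(cpu_id, cpus_per_board=4):
    """Split a CPU number into (board, processor).

    ASSUMPTION: CPU numbers are board * cpus_per_board + processor.
    """
    return divmod(cpu_id, cpus_per_board)
```

Under that numbering assumption, the cpu44 in the example above would correspond to processor 0 on system board 11.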


Ecache SRAM Data Parity Error

Ecache Writeback Data Parity Error

UE Error: Ecache Copyout on CPUyy

Symptom:

System panics with one of the following:

  • Ecache SRAM Data Parity Error
  • Ecache Writeback Data Parity Error
  • UE Error: Ecache Copyout on CPUyy

Resolution:

These are E-cache parity error panics caused by a CPU. Identify the failing CPU and replace it.
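
To check saved console output for any of these three messages, a small matcher such as the following can help. It is a sketch, not a diagnostic tool: the function name is hypothetical, and treating the "yy" placeholder as a decimal CPU number is an assumption about the real message format.

```python
import re

# Panic messages listed in this section. "yy" in the original stands for a CPU
# number; matching it as decimal digits is an assumption of this sketch.
COPYOUT = re.compile(r"UE Error: Ecache Copyout on CPU(\d+)")
PLAIN_MESSAGES = (
    "Ecache SRAM Data Parity Error",
    "Ecache Writeback Data Parity Error",
)

def classify_ecache_panic(console_text):
    """Return (message, cpu_or_None) for a known E-cache panic, else None."""
    match = COPYOUT.search(console_text)
    if match:
        return ("UE Error: Ecache Copyout", int(match.group(1)))
    for message in PLAIN_MESSAGES:
        if message in console_text:
            return (message, None)
    return None
```

Only the Copyout message carries a CPU number in its text, so for the other two the sketch returns None for the CPU.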


kstat_q_exit: qlen == 0

Symptom:

System panics with kstat_q_exit: qlen == 0.

Resolution:

Check whether EMC disk is attached to the domain. Solaris can overflow the EMC queues: EMC restricts the tag queue depth and suggests reducing the default sd throttle.

To reduce the throttle, add the line set sd:sd_max_throttle=20 to /etc/system, then reboot the domain so that the change takes effect.
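
Before and after editing, it can be useful to confirm what /etc/system actually sets. The sketch below reads the text of the file and reports the sd_max_throttle value it finds. The function name is hypothetical. The sketch assumes that the last matching line is the effective one, and it recognizes only decimal and 0x-prefixed hexadecimal values.

```python
import re

# Matches lines such as "set sd:sd_max_throttle=20" (spaces around "=" allowed).
SETTING = re.compile(r"^\s*set\s+sd:sd_max_throttle\s*=\s*(0[xX][0-9a-fA-F]+|[1-9]\d*)\b")

def sd_max_throttle(system_text):
    """Return the sd_max_throttle value set in /etc/system text, or None."""
    value = None
    for line in system_text.splitlines():
        # Lines beginning with "*" are comments in /etc/system.
        if line.lstrip().startswith("*"):
            continue
        match = SETTING.match(line)
        if match:
            raw = match.group(1)
            # ASSUMPTION: a later setting overrides an earlier one.
            value = int(raw, 16) if raw.lower().startswith("0x") else int(raw)
    return value
```

A typical use is to pass in the contents of /etc/system, for example sd_max_throttle(open("/etc/system").read()), and compare the result with the value suggested above.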


  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.