Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1006088.1
Update Date: 2011-01-12
Keywords:

Solution Type  Problem Resolution Sure

Solution  1006088.1 :   Sun Fire[TM] 12K/15K/E20K/E25K: Dstop: AMX: Detected Header parity error from AXQ  


Related Items
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

Previously Published As
208489


Applies to:

Sun Fire 15K Server
Sun Fire 12K Server
All Platforms

Symptoms

Domain Dstop with this error type:
AMX: Detected Header parity error from AXQ

ADR Ereport:
ereport.asic.amx.status.detected_header_parity_error_from_axq

Related ADR Ereport:
ereport.asic.rmx.status.detected_header_parity_error_from_axq

Messages in the platform log report something similar to:

Event: SF15000-8009-VE CSN: 0409AK20AE DomainID: C ADInfo: 1.SMS-DE.1.6 Time: Fri Jul 14 10:54:07 IST 2006 Recommended-Action: Service action required
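
When many platform log entries have to be compared, it can help to split each event message into its labeled fields. The following is a minimal Python sketch, not an SMS tool. The field names are copied from the sample message above; the function name parse_event is hypothetical, and the field list is not claimed to be complete for every SMS release.

```python
import re

# Field labels as they appear in the sample platform log message.
FIELDS = ["Event", "CSN", "DomainID", "ADInfo", "Time", "Recommended-Action"]

# Matches any known label followed by a colon and optional whitespace.
LABEL = re.compile(r"\b(" + "|".join(re.escape(f) for f in FIELDS) + r"):\s*")


def parse_event(line):
    """Split one 'Label: value Label: value ...' message into a dict.

    A value runs from the end of its label to the start of the next
    known label, so values containing spaces (Time, Recommended-Action)
    are kept whole.
    """
    matches = list(LABEL.finditer(line))
    fields = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        fields[m.group(1)] = line[m.end():end].strip()
    return fields


sample = ("Event: SF15000-8009-VE CSN: 0409AK20AE DomainID: C "
          "ADInfo: 1.SMS-DE.1.6 Time: Fri Jul 14 10:54:07 IST 2006 "
          "Recommended-Action: Service action required")
event = parse_event(sample)
print(event["Event"], event["DomainID"], event["Time"])
```

Splitting on known labels rather than on whitespace keeps the timestamp intact even though it contains colons and spaces.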

and 'wfail' output, found in the domain directory /var/opt/SUNWSMS/SMS1.6/adm/<domain letter A..R>/wfailoutput, reports something similar to the following:
sms-svc:25> cat wfailoutput.060714.1054.07
03 By hpost v. 1.6 Generic 124319-03 Apr 4 2006 16:46:04 executing as pid=27337
04 On ssc name: sf15k-sc1.
05 Domain = 0=A = domA Platform = sf15k01
06 Boards in dump: master SC CPs/CSBs[1:0]: 3
07 EXB[17:0]: 00603
08 Slot0[17:0]: 00603
09 Slot1[17:0]: 00003
10 -D option, -d
11 "DSMD DomainStop Dump"
12 8 errors occurred while creating this dump.
14 All master SDIs in this dump indicating valid error info [00603]
15 indicate the first error was Dstop for all of EXB EX9.
16 SDI EX09/S0: All SDI is DStopped and RStopped, requested by DARB.
17 DARB C0: enabled ports (expanders) [17:0]: 0160F
18 DARB C0: other darb req Dstop+Rstop for exps[17:0]: 01600
19 DARB C0 Port 9 InterAsicStatus[31:0] = B02C0041
20 IAStat[18,28]: AMX A0 requests Dstop+Rstop for this exp
21 IAStat[19,29]: AMX A1 requests Dstop+Rstop for this exp
22 AMX C0/A0 (0.0) Port 9 Status[1][21:0],[0][31:0] = 0130AA CE010C01
23 P9Stat0[16]: D 1E Detected Header parity error from AXQ
24 P9Stat0[25]: Local NACK occurred, transaction discarded
25 FAIL EXB EX9 with Addr Bus C0: Dstop/Rstop detected by AMX.
26 Primary service FRU is EXB EX9.
27 Secondary service FRU is CSB C0 or the logic centerplane.
28 AMX C0/A1 (0.1) Port 9 Status[1][21:0],[0][31:0] = 0172B1 CE020C02
29 P9Stat0[17]: D 1E Detected Information parity error from AXQ
30 P9Stat0[25]: Local NACK occurred, transaction discarded
31 FAIL EXB EX9 with Addr Bus C0: Dstop/Rstop detected by AMX.
32 Primary service FRU is EXB EX9.
33 Secondary service FRU is CSB C0 or the logic centerplane.
34 DARB C1: enabled ports (expanders) [17:0]: 0160F
35 DARB C1: other darb req Dstop+Rstop for exps[17:0]: 01600
36 DARB C1 Port 9 InterAsicStatus[31:0] = B0200041
37 IAStat[28]: AMX A0 requests Domainstop for this exp
38 IAStat[29]: AMX A1 requests Domainstop for this exp
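
The [17:0] fields in the output above (for example EXB[17:0]: 00603) are hexadecimal bit masks. The sketch below lists the bit positions that are set in such a mask. It assumes that bit n of a [17:0] field corresponds to expander or slot n; this document does not state that mapping explicitly, although bit 9 being set in 00603 is consistent with EX9 appearing in the dump. The function name mask_to_positions is hypothetical.

```python
def mask_to_positions(hex_mask, width=18):
    """Return the bit positions (0 = least significant) set in a hex mask.

    width=18 matches the [17:0] notation used in the dump.
    """
    value = int(hex_mask, 16)
    return [bit for bit in range(width) if (value >> bit) & 1]


# Mask values copied from dump lines 07-09, 17 and 18.
for label, mask in [
    ("EXB[17:0]", "00603"),
    ("Slot0[17:0]", "00603"),
    ("Slot1[17:0]", "00003"),
    ("DARB C0 enabled ports", "0160F"),
    ("DARB C0 other darb req", "01600"),
]:
    print(f"{label:24s} {mask} -> {mask_to_positions(mask)}")
```

Under that assumption, 00603 decodes to positions 0, 1, 9 and 10, and 01600 to positions 9, 10 and 12.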

Cause

This error results in a domain reboot and in some system components being disabled.

Solution

Collect an explorer from the main system controller and contact your authorized service provider.

Product
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server

Keywords
15K, 12K, SF15K, SF12K, Sun Fire 15K, Enterprise, Server, Sun Fire 12K, Dstop, Detected Header parity error from AXQ, amx, rmx, AMX, RMX

Internal Section

Previously Published As 52162

Detailed troubleshooting info

The dump header tells us that this Dstop was generated by dsmd (lines 10,11)
while a domain was active. This is also evident from the dumpf file name:
dsmd.dstop files are created by dsmd to capture the error state. Also note
that 8 errors occurred while collecting the state dump (line 12). These errors
should be investigated; refer to Doc 1003356.1 (previously 52062).
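
The two header checks described above (was the dump taken by dsmd, and did errors occur while it was collected) can be scripted when many 'wfail' outputs need triage. This is a minimal Python sketch that only pattern-matches the text shown in the Symptoms section; the function name dump_header_summary and the returned keys are hypothetical, and other SMS releases may word these lines differently.

```python
import re


def dump_header_summary(wfail_text):
    """Report whether the dump title names dsmd and how many errors
    occurred while the dump was being created (None if not stated)."""
    by_dsmd = '"DSMD DomainStop Dump"' in wfail_text
    m = re.search(r"(\d+) errors? occurred while creating this dump",
                  wfail_text)
    return {
        "created_by_dsmd": by_dsmd,
        "collection_errors": int(m.group(1)) if m else None,
    }


# Header lines 10-12 copied from the sample 'wfail' output.
header = '''10 -D option, -d
11 "DSMD DomainStop Dump"
12 8 errors occurred while creating this dump.
'''
print(dump_header_summary(header))
```

A non-zero error count is a prompt to review the collection errors before trusting the rest of the analysis.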

Walking the error chain:
- All SDIs concur that the stop message is from EX9 [mstop1 analysis]
  (lines 14,15)
- EX9/SDI0 is responding to a DARB request to stop (line 16)
- DARB0 reports errors requested by the AMXs for port 9 (lines 19-21)
- AMX0.0 port 9 reports a header parity error detected from the AXQ
  (lines 22-24)
- AMX0.1 port 9 reports an information parity error detected from the AXQ
  (lines 28-30)
- 'wfail' FAILs out configurations using EX9 with the low address bus
  (line 25)
- EX9 and CP half 0 are named as the primary and secondary FRUs (lines 26,
  27,32,33)
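
The status words quoted in the dump can be cross-checked against the bit annotations that 'wfail' prints beneath them. The sketch below assumes bit 0 is the least significant bit, which is consistent with the dump's own annotations (for example IAStat[18,28] under B02C0041). It only confirms the annotated bits; it does not attempt to name any other bit, since those definitions are not given in this document. The function names are hypothetical.

```python
def set_bits(hex_word):
    """Return the positions of all set bits, with bit 0 as the LSB."""
    value = int(hex_word, 16)
    return [b for b in range(value.bit_length()) if (value >> b) & 1]


def bits_are_set(hex_word, positions):
    """True if every listed bit position is set in the word."""
    value = int(hex_word, 16)
    return all((value >> b) & 1 for b in positions)


# Register values and annotated bits copied from dump lines 19-24,
# 28-30 and 36-38.
checks = [
    ("DARB C0 Port 9 InterAsicStatus", "B02C0041", [18, 19, 28, 29]),
    ("AMX C0/A0 Port 9 Status[0]", "CE010C01", [16, 25]),
    ("AMX C0/A1 Port 9 Status[0]", "CE020C02", [17, 25]),
    ("DARB C1 Port 9 InterAsicStatus", "B0200041", [28, 29]),
]
for name, word, annotated in checks:
    print(f"{name}: {word} bits={set_bits(word)} "
          f"annotated_ok={bits_are_set(word, annotated)}")
```

Note that bits 18 and 19 are set in the DARB C0 word but clear in the DARB C1 word, matching the difference between the "Dstop+Rstop" and "Domainstop" annotations in the dump.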

'wfail' clearly notes that the parity error is between EX9 and the AMXs
on CP0. Since both AMXs on CP0 are involved,
the CSB supporting CP0 may be a factor.
However, an exhaustive search of all AMX ports on CP0 reveals that only
port 9 shows parity errors (hint: use the 'repeat' command).

As such, the errors are isolated to EX9 and slot 9 in the centerplane.
Since the pathways between EX9/AXQ and AMX0.0/AMX0.1 cross
an interconnect, a single FRU cannot be isolated.
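
To confirm that every FAIL entry in a 'wfail' output names the same service FRUs, the FRU statements can be tallied. This is a minimal Python sketch that relies only on the "Primary service FRU is ..." and "Secondary service FRU is ..." wording shown in the sample output; the function name service_frus is hypothetical.

```python
import re
from collections import Counter

# Captures the FRU text between "service FRU is" and the closing period.
FRU_LINE = re.compile(r"(Primary|Secondary) service FRU is (.+?)\.\s*$",
                      re.MULTILINE)


def service_frus(wfail_text):
    """Count how often each primary and secondary FRU is named."""
    tally = {"Primary": Counter(), "Secondary": Counter()}
    for kind, fru in FRU_LINE.findall(wfail_text):
        tally[kind][fru] += 1
    return tally


# Dump lines 26, 27, 32 and 33 copied from the sample output.
sample = """26 Primary service FRU is EXB EX9.
27 Secondary service FRU is CSB C0 or the logic centerplane.
32 Primary service FRU is EXB EX9.
33 Secondary service FRU is CSB C0 or the logic centerplane.
"""
for kind, counts in service_frus(sample).items():
    print(kind, dict(counts))
```

A single entry per category, as in this case, indicates that both AMX failures point at the same pair of FRUs.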

Resolution

Start by replacing the expander. If the domain has been in operation
for a period of time, bent pins are unlikely (pins do not bend while
an expander remains in place). If problems persist, replace the
centerplane.

References and bug IDs

1001657.1 - An Overview of Dstop Diagnosis
1006074.1 - Dstop: Using the MStop1
1003356.1 - redx: Tips and Tricks
1010372.1 - Dstop: AMX: Detected Information parity error from AXQ

Additional background information

When an AMX detects a parity error, the history records in the AMX and AXQ
can be compared to isolate a single-bit error. The 'parse axqoh' command is
used to do this isolation. Refer to SRDB 1010372.1 (previously 52113)
for an example.
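
The idea behind comparing the two history records can be illustrated with a generic XOR comparison: the bits that differ between what was sent and what was received are the candidates for the fault. The Python sketch below is not a reimplementation of 'parse axqoh', and the two example words are made-up values, not data from this dump. The actual record layout and parity scheme are not described in this document.

```python
def differing_bits(sent, received):
    """Return the bit positions where two words differ (bit 0 = LSB)."""
    diff = sent ^ received
    return [b for b in range(diff.bit_length()) if (diff >> b) & 1]


def parity(word):
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


# Hypothetical example values: the received word has bit 4 cleared.
sent = 0b10110010
received = 0b10100010

print("differing bits:", differing_bits(sent, received))
print("parity sent/received:", parity(sent), parity(received))
```

A single differing position suggests a single-bit error on one signal; several differing positions would point to a broader problem on the path.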

*******************************************************************
SMS 1.4 introduces Auto-Diagnosis and Recovery (ADR) for error
events on the Sun Fire 12K/15K platform. Events that occur are
automatically analyzed on the platform and generate Event Codes, also
known as Fault Analysis Codes. When translated, the codes provide
a Service Action Plan to resolve the error event. Each error
event log, produced by the SMS-DE or POST-DE diagnostic engine,
comprises several layers of information, each layer providing
more detail about the error event.

The topmost layer is the Event Code. For example, SF15000-8000-H9
represents a System Board failure. This code indicates that a system
board event occurred but does not specify which component on the system
board has the problem.

More detailed error event information is collected deeper within
the log by examining the EReports (Error Reports) for each of these
error events. The reports provide more detailed descriptions of the
detectors of the error that were used in the analysis and diagnosis
by the ADR diagnostic engine(s). By examining the EReports, you can
identify the component(s) that are the root cause of the
error event or the component(s) affected by the event.

*******************************************************************
See the System Management Services (SMS) Administrator Guide,
Chapter 5 for more details of Automatic Diagnosis and Recovery.
*******************************************************************

The topics covered in this SRDB are the ADR EReport events:
ereport.asic.amx.status.detected_header_parity_error_from_axq
ereport.asic.rmx.status.detected_header_parity_error_from_axq
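
The two EReport names differ only in the ASIC that detected the error. A small Python sketch makes that comparison explicit by splitting a name on its dots. The labels given to the parts (category, detector, register, error) are an interpretation made for this example, not terminology taken from the SMS documentation, and the function name split_ereport is hypothetical.

```python
def split_ereport(name):
    """Split a five-part dotted EReport name into labeled parts.

    The labels are illustrative assumptions, not official field names.
    """
    parts = name.split(".")
    if len(parts) != 5 or parts[0] != "ereport":
        raise ValueError(f"unexpected EReport name: {name!r}")
    _, category, detector, register, error = parts
    return {"category": category, "detector": detector,
            "register": register, "error": error}


amx = split_ereport(
    "ereport.asic.amx.status.detected_header_parity_error_from_axq")
rmx = split_ereport(
    "ereport.asic.rmx.status.detected_header_parity_error_from_axq")

# List the parts on which the two names disagree.
print([key for key in amx if amx[key] != rmx[key]])
```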

*******************************************************************

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.