Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1006063.1
Update Date:2010-07-15
Keywords:

Solution Type  Technical Instruction Sure

Solution  1006063.1 :   Non-Cacheable Address Space tables for Sun[TM] Fire 3800/4800/4810/6800/E2900/E4900/E6900/V1280 and Netra[TM] 1280/1290 Server  


Related Items
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Netra 1280 Server
  • Sun Fire 4800 Server
  • Sun Fire V1280 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Midrange V and Netra Servers

PreviouslyPublishedAs
208454


Applies to:

Sun Netra 1280 Server
Sun Fire V1280 Server
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 6800 Server
All Platforms

Goal

Description

This document provides tables of the Non-Cacheable Address Space for Sun Fire[TM] 3800-6800 systems. The tables can be used to decode the AFAR and (for USIII Cu) AFAR_2 registers on Sun Fire 3800-6800, V1280, and Netra[TM] 1280 systems. After a Domain failure, the decoded address is not necessarily related to the cause of the error, but in practice it often helps to identify the suspect FRU(s).

Caution:  This topic is quite complicated and it is recommended that customers needing to investigate this type of fault do so with caution.  It's advisable to contact Support Services and open a Service Request for this to be resolved.

Refer to Document 1004877.1 for a cheatsheet that provides a breakdown of the address spaces listed in the tables below. The cheatsheet can be used to short circuit the manual decoding of address spaces with simple look-up tables for non-cacheable addresses.

Terms:

  • USIII: (CPU 750 MHz) On USIII the AFAR register is supported. The AFAR is valid only if neither the IERR nor the PERR Bit is set.
  • USIII Cu: (CPU >= 900 MHz) On USIII Cu both the AFAR and AFAR_2 registers are supported.
  • PERR: If the Bit is set it indicates a Protocol Error.
  • IERR: If the Bit is set it indicates an Internal Error.
  • AFAR: Asynchronous Fault Address Register contains the physical address which caused the AFSR register to be set. If multiple errors occur the register is updated based on the priority of the errors. If the first error had a lower priority than the subsequent errors, it gets overwritten. Therefore, the AFAR register is not valid in all cases.
  • AFAR_2: Asynchronous Fault Address Register 2/shadow, which always contains the physical address of the first error which caused the AFSR register to be set. If AFAR and AFAR_2 are different, the register for which the PERR Bit is set contains the first error and should be used for decoding.

Solution

The physical address space assignment is Firmware dependent. The Firmware version on the I/O Boards within a Domain determines which address space assignment table is used.
  • If any I/O Board in a Domain has 5.11.X or 5.12.X installed, Table 1 is used.
  • If every I/O Board in a Domain has 5.13.X or higher installed, Table 2 is used.
  • All V1280/Netra 1280 systems should use Table 2.

All boards within a Domain should have the same Firmware level. If I/O Boats with 5.12.X and with 5.13.X or higher Firmware levels are mixed in a Domain, the lowest Firmware level dictates which table to use.
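
The selection rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name select_table and its parameters are hypothetical, Firmware versions are assumed to be strings such as "5.13.4", and versions below 5.11 are rejected because this document does not cover them.

```python
def select_table(io_board_firmware, is_v1280_or_netra_1280=False):
    """Return 1 or 2, the number of the address space table to use.

    io_board_firmware: Firmware version strings of all I/O Boards in the
    Domain, e.g. ["5.13.4", "5.12.7"] (string format assumed for this sketch).
    """
    if is_v1280_or_netra_1280:
        return 2  # all V1280/Netra 1280 systems use Table 2
    if not io_board_firmware:
        raise ValueError("no I/O Board Firmware versions given")

    # Compare only major.minor; the lowest level in the Domain decides.
    lowest = min(tuple(int(part) for part in version.split(".")[:2])
                 for version in io_board_firmware)
    if lowest >= (5, 13):
        return 2
    if lowest >= (5, 11):
        return 1
    raise ValueError("Firmware below 5.11 is not covered by these tables")
```

For a Domain with mixed levels such as ["5.13.4", "5.12.7"], the sketch returns 1, matching the rule that the lowest Firmware level decides.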

Table 1: Non-Cacheable Address Space for 5.11.X and 5.12.X, and for any Domain containing an I/O Boat with one of those Firmware levels installed.

+---------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-------|
|Physical Address Bit |42|41|40|39|38|37|36|35|34|33|32|31|30|29|28|27|26|25|24|23| 22:0  |
+---------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-------|
|Cacheable memory     | 0|                                                                |
+---------------------+--+--+--+--+--+--+--+--+--+--+--------------+--------------+-------|
|Safari Device Config | 1| 0| 0| 0| 0| 0| 0| 0| 0| 0|     Node     |   AID[4:0]   |8MB/dev|
+---------------------+--+--+--+--+--+--+--+--+--+--+-----+--------+-----+--+-----+-------|
|Schizo PCI Config& IO| 1| 0| 0| 0| 0| 0| 0| 0| 0| 1|Node |   AID[4:0]   | P| 32MB per PCI|
+---------------------+--+--+--+--+--+--+--+--+--+--+-----+--------------+--+-------------|
|Reserved             | 1| 0|                    2 TB - 16 GB reserved                    |
+---------------------+--+--+--------+--------------+--+----------------------------------|
|Schizo IO Board      | 1| 1|  Node  |   AID[4:0]   | P|        4GB per PCI Bus(P)        |
+---------------------+--+--+--------+--+--+--+--+--+--+----------------------------------|
|UPA Device Pair **   | 1| 1|  Node  |   AID[4:0]   |         8 GB per UPA Device         |
+---------------------+--+--+--------+--------------+--------------+----------------------|
|BootBus              | 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 256MB BootBus space  |
+---------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------------------|
*)  Node is always 0
**) UPA support was never implemented.

Table 2: Non Cacheable Address Space for 5.13.X or higher.

+---------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-------|
|Physical Address Bit |42|41|40|39|38|37|36|35|34|33|32|31|30|29|28|27|26|25|24|23| 22:0  |
+---------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-------|
|Cacheable memory     | 0|                                                                |
+---------------------+--+--+--+--+--+--+--+--+--+--+--------------+--------------+-------|
|Safari Device Config | 1| 0| 0| 0| 0| 0| 0| 0| 0| 0|     Node     |   AID[4:0]   |8MB/dev|
+---------------------+--+--+--+--+--+--+--+--+--+--+-----------+--+-----+--+-----+-------|
|Schizo PCI Config& IO| 1| 0| 0| 0| 0| 0| 0| 0| 0| 1|   Node    |AID[2:0]| P| 32MB per PCI|
+---------------------+--+--+--+--+--+--+--+--+--+--+--+--------+--+--+--+--+-------------|
|Schizo IO Board      | 1| Node*10 + Board# + 1  | S| P| 4GB per PCI Bus(P) per Schizo(S) |
+---------------------+--+-----------------------+--+--+----------------------------------|
|Wildcat Board        | 1| Node*10 + Board# + 1  | W|           8GB per WCI(W)            |
+---------------------+--+-----------------------+--+-------------------------------------|
|Non-Geographic       | 1|161(0xa1) to 254(0xfe) | 16GB per dynamically assigned NCslice  |
+---------------------+--+--+--+--+--+--+--+--+--+-----------------+----------------------|
|Reserved             | 1| 1| 1| 1| 1| 1| 1| 1| 1|0(0x0)to 62(0x3e)|    16GB - 256 MB     |
+---------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------------------|
|BootBus              | 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 256MB BootBus space  |
+---------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------------------|
*) Node is always 0
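
As a rough aid for reading Table 2, the row lookup can be sketched in Python. The helper names field and classify_table2 are hypothetical and not part of any Sun tool. The sketch assumes the register is supplied as the (high) and (low) 32-bit words shown in the error messages, and it treats values 1 to 160 in bits 41:34 as the geographic range (Node*10 + Board# + 1), which is inferred from the Non-Geographic row starting at 161. Table 2 alone does not distinguish a Schizo IO Board from a Wildcat Board, so both are reported together.

```python
def field(addr, high, low):
    """Return bits high..low (inclusive) of addr as an integer."""
    return (addr >> low) & ((1 << (high - low + 1)) - 1)


def classify_table2(afar_high, afar_low):
    """Name the Table 2 row that a 43-bit physical address falls into."""
    # Join the two 32-bit words; only bits 42:32 of the high word are used.
    addr = ((afar_high & 0x7FF) << 32) | (afar_low & 0xFFFFFFFF)

    if field(addr, 42, 42) == 0:
        return "Cacheable memory"

    slice_value = field(addr, 41, 34)
    if slice_value == 0:
        # Bit 33 separates the two configuration spaces.
        if field(addr, 33, 33) == 0:
            return "Safari Device Config"
        return "Schizo PCI Config & IO"
    if slice_value <= 160:          # assumed geographic range, see lead-in
        return "Schizo IO Board or Wildcat Board"
    if slice_value <= 254:
        return "Non-Geographic"
    # Bits 41:34 are all ones: bits 33:28 select Reserved or BootBus.
    if field(addr, 33, 28) == 63:
        return "BootBus"
    return "Reserved"
```

Once the row is known, the remaining fields of that row (Node, AID, S, P, W) still have to be extracted by hand or with the cheatsheet in Document 1004877.1.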

The following examples illustrate the use of the tables and the decoding of the registers. The examples are portions of error messages. When assessing errors, it is important to read them in context with the entire log of the Sun Fire System Controller (SSC).

Example 1:

Nov 22 15:59:10 sc0 Domain-C.SC:/partition1/domain0/SB5/bbcGroup1/cpuCD/cpusafariagent0:
AFAR (high)[0x531] : 0x00000422
AFAR [42:32] [10:00] : 0x422
AFAR (low)[0x541] : 0x09800100
AFAR_2 (high)[0x571] : 0x00000422
AFAR_2 [42:32] [10:00] : 0x422
AFAR_2 (low)[0x581] : 0x09800100
AFSR (high)[0x551] : 0x00080000
PERR [19:19] : 0x1
AFSR_2 (high)[0x591] : 0x00080000
PERR [19:19] : 0x1
EMU B[0x511] : 0x03000000
AID_LK [24:24] : 0x1 ATransID leakage error
NCPQ_TO [25:25] : 0x1 NCPQ system bus time-out

This Example uses 5.13.X and 900 MHz USIII Cu CPUs. Since USIII Cu supports the AFAR_2 register and AFAR equals AFAR_2, the register can be decoded using Table 2 (because the Firmware is 5.13.X).

AFAR_2: 0x00000422.09800100

Bit 42    = 1        => Non Cacheable Address Space
Bit 41:34 = 00001000 => Node*10 + Board# + 1 = 0*10 + Board# + 1 = 8
                     => Board# = 8 - 1 - 0*10 = 7
                     => I/O Boat 7
Bit 33    = 1        => Schizo 1
Bit 32    = 0        => Schizo Leaf B
Bit 31:0             => 4GB per PCI Bus(P) per Schizo(S)

Suspect Parts in this case are I/O Boat 7 and the cPCI/PCI cards in the Slots supported by Schizo 1 Leaf B.
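
The bit arithmetic of this example can be reproduced with a short Python sketch. The function name decode_io_board_address is hypothetical. The sketch assumes Node is 0 (as noted under Table 2) and that the P bit maps 0 to Leaf B and 1 to Leaf A, the same convention that Example 5 states for its P bit.

```python
def decode_io_board_address(afar_high, afar_low, node=0):
    """Decode the Schizo IO Board row of Table 2 from the two register words."""
    addr = (afar_high << 32) | afar_low
    slice_value = (addr >> 34) & 0xFF          # bits 41:34 = Node*10 + Board# + 1
    return {
        "non_cacheable": bool((addr >> 42) & 1),  # bit 42
        "board": slice_value - 1 - node * 10,     # solve for Board#
        "schizo": (addr >> 33) & 1,               # bit 33 (S)
        "leaf": "A" if (addr >> 32) & 1 else "B", # bit 32 (P), mapping assumed
    }


result = decode_io_board_address(0x00000422, 0x09800100)
print(result)
```

With the register values from this example, the sketch yields board 7, Schizo 1, Leaf B, in agreement with the manual decoding above.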

See Document 1017926.1 for further instructions on NCPQ_TO errors.

Example 2:

Aug 18 22:11:04 sc0 Domain-A.SC /partition0/domain0/SB0/bbcGroup0/cpuAB/cpusafariagent1:
AFAR(high)[0x531] : 0x00000400
AFAR [42:32] [10:00] : 0x400
AFAR(low)[0x541] : 0x01400010
AFAR_2 (high)[0x571] : 0x00000400
AFAR_2 [42:32] [10:00] : 0x400
AFAR_2 (low)[0x581] : 0x01400010
AFSR (high)[0x551] : 0x00080000
PERR [19:19] : 0x1
AFSR_2 (high)[0x591] : 0x00080000
PERR [19:19] : 0x1
EMU A[0x501] : 0x00002000
UDG [13:13] : 0x1

This Example uses 5.13.X and 1050 MHz USIII Cu CPUs.

Since USIII Cu supports the AFAR_2 register and AFAR equals AFAR_2, the register can be decoded using Table 2 (because the Firmware is 5.13.X).

AFAR_2: 0x00000400.01400010

Bit 42    = 1     => Non Cacheable Address Space
Bit 32:28 = 00000 => Node 0
Bit 27:23 = 00010 => AgentID 2 => CPU/Memory Board 0
Bit 22:0          => 8MB/dev

Suspect Part in this case is CPU 2 on CPU/Memory board 0.
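
The field extraction for the Safari Device Config row can be sketched the same way. The function name decode_safari_device_config is hypothetical, and the sketch deliberately stops at the AgentID: mapping an AgentID to a board is left to the look-up tables in Document 1004877.1 rather than guessed here.

```python
def decode_safari_device_config(afar_high, afar_low):
    """Split a Safari Device Config address into Node, AgentID and offset."""
    addr = (afar_high << 32) | afar_low
    return {
        "node": (addr >> 28) & 0x1F,      # bits 32:28
        "agent_id": (addr >> 23) & 0x1F,  # bits 27:23, AID[4:0]
        "offset": addr & 0x7FFFFF,        # bits 22:0, 8MB per device
    }


fields = decode_safari_device_config(0x00000400, 0x01400010)
print(fields["node"], fields["agent_id"], hex(fields["offset"]))
```

For the values in this example the sketch gives Node 0 and AgentID 2, matching the decoding above.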

Example 3:

Oct 15 01:17:53 sc0 Domain-B.SC: /partition0/domain1/SB3/bbcGroup0/cpuAB/cpusafariagent1:
AFAR (high)[0x531] : 0x0000001a
AFAR [42:32] [10:00] : 0x1a
AFAR (low)[0x541] : 0x02000000
AFSR (high)[0x551] : 0x00080000
PERR [19:19] : 0x1
EMU A[0x501] : 0x08000000
UDT [27:27] : 0x1

This Example uses 5.12.X and 750 MHz USIII CPUs.

Since the PERR Bit is set and the error is a UDT error, the AFAR isn't valid. A higher POST level should be used to determine the suspect FRU.

Example 4:

Apr 04 15:04:43 sc0 Domain-A.SC: /partition0/domain0/SB0/bbcGroup1/cpuCD/cpusafariagent0:
AFAR (high)[0x531] : 0x00000400
AFAR [42:32] [10:00] : 0x400
AFAR (low)[0x541] : 0x00c00010
AFSR (high)[0x551] : 0x00040000
IERR [18:18] : 0x1
EMU A[0x501] : 0x00040000
S2M_WER [18:18] : 0x1

This Example uses 5.12.X and 750 MHz USIII CPUs.

Since the IERR Bit is set, indicating an Internal Error, the AFAR isn't valid. In this case the reporting CPU, which is /partition0/domain0/SB0/bbcGroup1/cpuCD/cpusafariagent0 (CPU/Memory Board 0 CPU C), is the suspect FRU.
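
The validity check used in Examples 3 and 4 can be expressed as a small Python sketch. It applies only to USIII (750 MHz) CPUs, where the AFAR is valid only if neither PERR nor IERR is set. The bit positions are taken from the error messages above (PERR [19:19] and IERR [18:18] of AFSR (high)); the names PERR_MASK, IERR_MASK and usiii_afar_is_valid are hypothetical.

```python
PERR_MASK = 1 << 19  # AFSR (high) bit 19: Protocol Error
IERR_MASK = 1 << 18  # AFSR (high) bit 18: Internal Error


def usiii_afar_is_valid(afsr_high):
    """True if neither PERR nor IERR is set in AFSR (high) on a USIII CPU."""
    return (afsr_high & (PERR_MASK | IERR_MASK)) == 0


print(usiii_afar_is_valid(0x00080000))  # Example 3: PERR set
print(usiii_afar_is_valid(0x00040000))  # Example 4: IERR set
```

Both calls report that the AFAR should not be decoded, which is the conclusion reached in Examples 3 and 4.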

Example 5:

/partition0/domain0/SB2/bbcGroup0/cpuAB/cpusafariagent0:
AFAR (high) [0x531] : 0x00000402
AFAR [ 42:32] [10:00] : 0x402
AFAR (low) [0x541] : 0x14001800
AFAR_2 (high) [0x571] : 0x00000402
AFAR_2 [42:32] [10:00] : 0x402
AFAR_2 (low) [0x581] : 0x14001800
AFSR (high) [0x551] : 0x00080000

AFAR_2: 0x00000402.14001800

Bit 42    = 1   => Non Cacheable Address Space
Bit 33    = 1   => Schizo PCI Config & IO
Bit 28:26 = 101 => '11' + 101 => 11101 = AID 29 = IB 8 Schizo 1
Bit 25    = 0   => Leaf B

This Example uses 5.13.X or higher and 900 MHz processors.

Bits 42 and 33 are examined first. Since both bits are set, the Schizo PCI Config & IO row of Table 2 applies. That row holds only AID[2:0], in bits [28:26] of the AFAR, so a constant binary 11 must be prepended to the left of bit 28 to form the full AID. Thus AID = 11 + 101 = 11101 = 29. Looking up AID 29 gives IB8 with Schizo 1. Since bit 25 = 0, this is Leaf B (bit 25 = 0 is Leaf B, 1 is Leaf A).
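
The AID reconstruction in this example can be checked with a brief Python sketch. The function name decode_schizo_pci_config is hypothetical; the constant binary 11 prefix and the Leaf mapping are taken from the explanation above.

```python
def decode_schizo_pci_config(afar_high, afar_low):
    """Rebuild the full AID and the Leaf for the Schizo PCI Config & IO row."""
    addr = (afar_high << 32) | afar_low
    aid_low = (addr >> 26) & 0x7              # AID[2:0] in bits 28:26
    aid = (0b11 << 3) | aid_low               # prepend the constant binary 11
    leaf = "A" if (addr >> 25) & 1 else "B"   # bit 25: 0 = Leaf B, 1 = Leaf A
    return {"aid": aid, "leaf": leaf}


print(decode_schizo_pci_config(0x00000402, 0x14001800))
```

For the register values in this example the sketch returns AID 29 and Leaf B. Translating AID 29 to IB8 Schizo 1 still requires the look-up tables referenced earlier.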


Previously Published As
49293

  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.