Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1009927.1
Update Date: 2011-02-25
Keywords:

Solution Type  Troubleshooting Sure

Solution  1009927.1 :   Sun SPARC(R) Enterprise Mx000 (OPL) Servers : Configuration Errors  


Related Items
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M5000 Server
  • Sun SPARC Enterprise M9000-64 Server
  • Sun SPARC Enterprise M4000 Server
  • Sun SPARC Enterprise M8000 Server
Related Categories
  • GCS>Sun Microsystems>Servers>OPL Servers

PreviouslyPublishedAs
213609


Applies to:

Sun SPARC Enterprise M4000 Server
Sun SPARC Enterprise M5000 Server
Sun SPARC Enterprise M8000 Server
Sun SPARC Enterprise M9000-32 Server - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun SPARC Enterprise M9000-64 Server - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
All Platforms

Purpose

This document is aimed at helping users identify possible issues with their platform when configuration errors are detected.


To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - M Series Servers

Last Review Date

August 10, 2010

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Minimal System Config:
  • M4000
    • CPU in Slot 0
    • Memory in Slot 0
  • M5000 - With a single IOU
    • IOU in Slot 0
    • CPU in Slot 0
    • Memory in Slot 0
  • M5000 - With both IOUs, IOU1 is required to access internal disks 2 and 3
    • CPU in Slot 0 and 2
    • Memory in Slot 0 and 4
  • M8000/M9000
    • Two CPUs (must be in positions 0 and 1).
    • All Memory in Group A (16 DIMMs).
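
The minimal configurations above can be captured as a small lookup table. The following Python sketch is illustrative only: the dictionary keys, the `missing_for_minimal` helper and the way slots are represented are assumptions made for this example and are not part of any XSCF tool. The M8000/M9000 Group A DIMM requirement is noted in a comment but not modelled.

```python
# Minimal slot population per platform, transcribed from the list above.
# Keys and structure are hypothetical, chosen for this sketch only.
MINIMAL_CONFIG = {
    "M4000": {"CPU": {0}, "MEM": {0}},
    "M5000-1IOU": {"IOU": {0}, "CPU": {0}, "MEM": {0}},
    "M5000-2IOU": {"IOU": {0, 1}, "CPU": {0, 2}, "MEM": {0, 4}},
    "M8000/M9000": {"CPU": {0, 1}},  # plus all 16 Group A DIMMs (not modelled)
}


def missing_for_minimal(platform, populated):
    """Return {component: sorted missing slots} for the given platform."""
    missing = {}
    for component, required in MINIMAL_CONFIG[platform].items():
        gap = required - set(populated.get(component, ()))
        if gap:
            missing[component] = sorted(gap)
    return missing


if __name__ == "__main__":
    # CPU and memory in slots 0 and 1, both IOUs installed.
    populated = {"IOU": [0, 1], "CPU": [0, 1], "MEM": [0, 1]}
    print(missing_for_minimal("M5000-2IOU", populated))
```

An empty result means the populated slots cover the minimal configuration for that platform.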

Examples of simple configuration mistakes:



Steps to Follow
No Memory or CPU in a Physical System Board


To use the I/O in a Physical System Board (PSB), that PSB must also have CPU and memory installed.

In this example the customer has an M5000 with two CPU and Memory boards and both IOUs installed. The customer reported that they were unable to see the PCI cards installed in IOU1 from either the ok prompt or Solaris.

XSCF> showstatus
No failures found in System Initialization.
XSCF> showhardconf -u
SPARC Enterprise M5000 COL2-FF2; Memory_Size:32 GB;
+-----------------------------------+------------+
| FRU | Quantity |
+-----------------------------------+------------+
| MBU_B | 1 |
| CPUM | 2 | << Two CPU Boards
| Freq:2.150 GHz; | ( 4) |
| MEMB | 2 | << Two Memory Boards
| MEM | 16 |
| Type:2B; Size:2 GB; | ( 16) | << 16 2Gig DIMMs
| DDC_A | 4 |
| DDC_B | 2 |
| IOU | 2 | << Two IO Boards
| DDC_A | 2 |
| DDC_B | 2 |
| DDCR | 2 |
| XSCFU | 1 |
| OPNL | 1 |
| PSU | 4 |
| FANBP_C | 1 |
| FAN_A | 4 |
+-----------------------------------+------------+

However, looking at the full showhardconf output in more detail, we can see that the CPUs and memory are in Slots 0 and 1. To access all of the I/O, the CPU and memory in Slot 1 must be moved to CPU Slot 2 and Memory Slot 4.

XSCF> showhardconf
SPARC Enterprise M5000 COL2-FF2;
+ Serial:BCF072503H; Operator_Panel_Switch:Service;
+ Power_Supply_System:Dual; SCF-ID:XSCF#0;
+ System_Power:On;
Domain#0 Domain_Status:Running;
  MBU_B Status:Normal; Ver:0201h; Serial:BF07210VFK  ;
+ FRU-Part-Number:CF00501-7670 02 /501-7670-02 ;
+ Memory_Size:32 GB;
CPUM#0-CHIP#0 Status:Normal; Ver:0201h; Serial:PP0647H909 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#0-CHIP#1 Status:Normal; Ver:0201h; Serial:PP0647H909 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#1-CHIP#0 Status:Normal; Ver:0201h; Serial:PP071202C2 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#1-CHIP#1 Status:Normal; Ver:0201h; Serial:PP071202C2 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
MEMB#0 Status:Normal; Ver:0101h; Serial:BF072311X7 ;
+ FRU-Part-Number:CF00501-7674 03 /501-7674-03 ;
MEM#0A Status:Normal;
+ Code:2c000000000000000836HTF25672PY-667D10100-5d0015e2;
+ Type:2B; Size:2 GB;
MEM#0B Status:Normal;
.... removing the rest of the DIMM info
MEMB#1 Status:Normal; Ver:0101h; Serial:BF072311ML ;
+ FRU-Part-Number:CF00501-7674 03 /501-7674-03 ;
MEM#0A Status:Normal;
+ Code:2c000000000000000836HTF25672PY-667D10100-5d0015da;
+ Type:2B; Size:2 GB;
MEM#0B Status:Normal;
.... removing the rest of the DIMM info
DDC_A#0 Status:Normal;
DDC_A#1 Status:Normal;
DDC_A#2 Status:Normal;
DDC_A#3 Status:Normal;
DDC_B#0 Status:Normal;
DDC_B#1 Status:Normal;
IOU#0 Status:Normal; Ver:0101h; Serial:BF072412BC ;
+ FRU-Part-Number:CF00541-2240 02 /541-2240-02 ;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
PCI#1 Name_Property:SUNW,qlc;
PCI#2 Name_Property:SUNW,qlc;
IOU#1 Status:Normal; Ver:0101h; Serial:BF072518HF ;
+ FRU-Part-Number:CF00541-2240 02 /541-2240-02 ;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
XSCFU Status:Normal,Active; Ver:0101h; Serial:BF07140FBU ;
+ FRU-Part-Number:CF00501-7672 02 /501-7672-02 ;
... cut the rest of the output
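
The check performed by eye above can be sketched in Python. This is a rough illustration, not a supported tool: the slot-to-PSB mapping (CPUM#0-1, MEMB#0-3 and IOU#0 on PSB0; CPUM#2-3, MEMB#4-7 and IOU#1 on PSB1) is an assumption inferred from the M5000 examples in this document, and the function names are hypothetical. Component status is ignored; only presence is checked.

```python
import re

# Assumed M5000 layout, inferred from the examples in this document:
# PSB0 holds CPUM#0-1, MEMB#0-3 and IOU#0; PSB1 holds CPUM#2-3, MEMB#4-7 and IOU#1.
SLOTS_PER_PSB = {"CPUM": 2, "MEMB": 4, "IOU": 1}

# Matches lines such as "CPUM#1-CHIP#0 ...", "MEMB#0 ..." or "IOU#1 ...".
FRU_RE = re.compile(r"^\*?\s*(CPUM|MEMB|IOU)#(\d+)")


def psb_inventory(showhardconf_text):
    """Map PSB number -> set of FRU types (CPUM, MEMB, IOU) found on it."""
    inventory = {}
    for line in showhardconf_text.splitlines():
        match = FRU_RE.match(line.strip())
        if match:
            fru, slot = match.group(1), int(match.group(2))
            psb = slot // SLOTS_PER_PSB[fru]
            inventory.setdefault(psb, set()).add(fru)
    return inventory


def unusable_ious(showhardconf_text):
    """Return PSB numbers that have an IOU but lack CPU or memory."""
    return sorted(
        psb
        for psb, frus in psb_inventory(showhardconf_text).items()
        if "IOU" in frus and not {"CPUM", "MEMB"} <= frus
    )


if __name__ == "__main__":
    sample = """\
CPUM#0-CHIP#0 Status:Normal; Ver:0201h; Serial:PP0647H909 ;
CPUM#1-CHIP#0 Status:Normal; Ver:0201h; Serial:PP071202C2 ;
MEMB#0 Status:Normal; Ver:0101h; Serial:BF072311X7 ;
MEMB#1 Status:Normal; Ver:0101h; Serial:BF072311ML ;
IOU#0 Status:Normal; Ver:0101h; Serial:BF072412BC ;
IOU#1 Status:Normal; Ver:0101h; Serial:BF072518HF ;
"""
    print(unusable_ious(sample))  # PSBs whose IOU cannot be used
```

For the configuration in this example the sketch reports PSB1, which matches the symptom: the PCI cards in IOU1 are not visible.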

No memory associated with PSB containing CPUs

A Physical System Board must have at least memory and CPU to be functional.

In this example an M5000 is fully populated with CPU modules; however, both memory boards were installed in PSB0 (Slots 0 and 1). As a result, the CPUs in PSB1 are Deconfigured. To use all four CPU modules, the two memory boards should have been installed in Slots 0 and 4, one in each PSB.

XSCF> showstatus
MBU_B Status:Normal;
* CPUM#2-CHIP#0 Status:Deconfigured;
* CPUM#2-CHIP#1 Status:Deconfigured;
* CPUM#3-CHIP#0 Status:Deconfigured;
* CPUM#3-CHIP#1 Status:Deconfigured;

XSCF> showhardconf
SPARC Enterprise M5000 COL2-FF2;
+ Serial:BCF0726048; Operator_Panel_Switch:Locked;
+ Power_Supply_System:Single; SCF-ID:XSCF#0;
+ System_Power:Off;
Domain#0 Domain_Status:Powered Off;

   MBU_B Status:Normal; Ver:0201h; Serial:BF07140EPK  ;
+ FRU-Part-Number:CF00501-7670 02 /501-7670-02 ;
+ Memory_Size:32 GB;
CPUM#0-CHIP#0 Status:Normal; Ver:0201h; Serial:PP072300VA ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#0-CHIP#1 Status:Normal; Ver:0201h; Serial:PP072300VA ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#1-CHIP#0 Status:Normal; Ver:0201h; Serial:PP0705017M ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
CPUM#1-CHIP#1 Status:Normal; Ver:0201h; Serial:PP0705017M ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
* CPUM#2-CHIP#0 Status:Deconfigured; Ver:0201h; Serial:PP06533939 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
* CPUM#2-CHIP#1 Status:Deconfigured; Ver:0201h; Serial:PP06533939 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
* CPUM#3-CHIP#0 Status:Deconfigured; Ver:0201h; Serial:PP06533940 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
* CPUM#3-CHIP#1 Status:Deconfigured; Ver:0201h; Serial:PP06533940 ;
+ FRU-Part-Number:CF00375-3477 01 /375-3477-01 ;
+ Freq:2.150 GHz; Type:16;
+ Core:2; Strand:2;
MEMB#0 Status:Normal; Ver:0101h; Serial:BF072311NE ;
+ FRU-Part-Number:CF00501-7674 03 /501-7674-03 ;
MEM#0A Status:Normal;
+ Code:2c000000000000000836HTF25672PY-667D10100-d31f0646;
+ Type:2B; Size:2 GB;
... delete the other seven DIMMs
MEMB#1 Status:Normal; Ver:0101h; Serial:BF072311NW ;
+ FRU-Part-Number:CF00501-7674 03 /501-7674-03 ;
MEM#0A Status:Normal;
+ Code:2c000000000000000836HTF25672PY-667D10100-d31f0673;
+ Type:2B; Size:2 GB;
... removing the other seven DIMMs
DDC_A#0 Status:Normal;
DDC_A#1 Status:Normal;
DDC_A#2 Status:Normal;
DDC_A#3 Status:Normal;
DDC_B#0 Status:Normal;
DDC_B#1 Status:Normal;
IOU#0 Status:Normal; Ver:0101h; Serial:BF072412CR ;
+ FRU-Part-Number:CF00541-2240 02 /541-2240-02 ;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
... Removing the rest of the output.
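
When the `showstatus` output is long, the deconfigured units can be pulled out with a few lines of Python. This is a minimal sketch that assumes the output has been saved as text in the format shown above; `deconfigured_units` is a hypothetical helper name. Nested units (for example DIMMs listed under a memory board) are returned without their parent board.

```python
import re

# Matches "<unit> Status:<state>;" with an optional leading '*',
# as seen in the showstatus output above.
STATUS_RE = re.compile(r"^\*?\s*(\S+)\s+Status:([^;]+);")


def deconfigured_units(showstatus_text):
    """Return the names of units reported as Deconfigured, in order."""
    units = []
    for line in showstatus_text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match and match.group(2).strip() == "Deconfigured":
            units.append(match.group(1))
    return units


if __name__ == "__main__":
    sample = """\
MBU_B Status:Normal;
* CPUM#2-CHIP#0 Status:Deconfigured;
* CPUM#2-CHIP#1 Status:Deconfigured;
* CPUM#3-CHIP#0 Status:Deconfigured;
* CPUM#3-CHIP#1 Status:Deconfigured;
"""
    for unit in deconfigured_units(sample):
        print(unit)
```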

Three Memory boards in an M4000/M5000 PSB is not supported

On the M4000 and M5000, the only supported configurations are 1, 2 or 4 memory boards per PSB. In this M5000 example, six memory boards have been spread across the two PSBs, three in each. All the DIMMs on the third memory board of each PSB (MEMB#2 and MEMB#6) are shown as deconfigured.

XSCF> showstatus
MBU_B Status:Normal;
MEMB#2 Status:Normal;
* MEM#0A Status:Deconfigured;
* MEM#0B Status:Deconfigured;
* MEM#1A Status:Deconfigured;
* MEM#1B Status:Deconfigured;
* MEM#2A Status:Deconfigured;
* MEM#2B Status:Deconfigured;
* MEM#3A Status:Deconfigured;
* MEM#3B Status:Deconfigured;
MEMB#6 Status:Normal;
* MEM#0A Status:Deconfigured;
* MEM#0B Status:Deconfigured;
* MEM#1A Status:Deconfigured;
* MEM#1B Status:Deconfigured;
* MEM#2A Status:Deconfigured;
* MEM#2B Status:Deconfigured;
* MEM#3A Status:Deconfigured;
* MEM#3B Status:Deconfigured;
XSCF> showhardconf -u
SPARC Enterprise M5000 M5000; Memory_Size:48 GB;
+-----------------------------------+------------+
| FRU | Quantity |
+-----------------------------------+------------+
| MBU_B | 1 |
| CPUM | 4 |
| Freq:2.150 GHz; | ( 8) |
| MEMB | 6 |
| MEM | 48 |
| Type:1A; Size:1 GB; | ( 48) |
| DDC_A | 4 |
| DDC_B | 2 |
| IOU | 2 |
| DDC_A | 2 |
| DDC_B | 2 |
| DDCR | 2 |
| XSCFU | 1 |
| OPNL | 1 |
| PSU | 4 |
| FANBP_C | 1 |
| FAN_A | 4 |
+-----------------------------------+------------+
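
The 1, 2 or 4 rule can be expressed as a short check. The Python sketch below is an illustration under stated assumptions: it assumes memory board slots 0-3 belong to PSB0 and slots 4-7 to PSB1 (inferred from the examples in this document), and `unsupported_psbs` is a hypothetical name.

```python
from collections import Counter

SUPPORTED_MEMB_COUNTS = {1, 2, 4}  # memory boards per PSB, as stated above
MEMB_SLOTS_PER_PSB = 4  # assumption: MEMB#0-3 on PSB0, MEMB#4-7 on PSB1


def unsupported_psbs(memb_slots):
    """Return {psb: board_count} for PSBs with an unsupported board count."""
    counts = Counter(slot // MEMB_SLOTS_PER_PSB for slot in memb_slots)
    return {
        psb: count
        for psb, count in sorted(counts.items())
        if count not in SUPPORTED_MEMB_COUNTS
    }


if __name__ == "__main__":
    # Six boards, three per PSB. Slots 2 and 6 come from the showstatus
    # output above; the other slot numbers are assumed for illustration.
    print(unsupported_psbs([0, 1, 2, 4, 5, 6]))
```

A PSB with no memory boards at all does not appear in the result; that case is covered by the earlier sections.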

No memory in the platform

Configuration Error reported after using "setupfru"

When configuring a domain, the "configuration error was detected" message may appear when there is a hardware issue with the machine.

XSCF> setupfru -x 1 sb 0
Operation has completed. However, a configuration error was detected.

Looking into the issue we see that there are no error reports, and the status is reported as normal.

XSCF> showstatus
No failures found in System Initialization.
XSCF> showlogs error
XSCF>

Using `showhardconf -u` we can see that, in this case, the issue is that there is no memory in the platform: Memory_Size is 0 GB and no MEM (DIMM) row is listed.

XSCF> showhardconf -u
SPARC Enterprise M5000 COL2-FF2; Memory_Size:0 GB;
+-----------------------------------+------------+
| FRU | Quantity |
+-----------------------------------+------------+
| MBU_B | 1 |
| CPUM | 1 |
| Freq:2.150 GHz; | ( 2) |
| MEMB | 1 |
| DDC_A | 4 |
| DDC_B | 2 |
| IOU | 1 |
| DDC_A | 1 |
| DDC_B | 1 |
| DDCR | 1 |
| XSCFU | 1 |
| OPNL | 1 |
| PSU | 2 |
| FANBP_C | 1 |
| FAN_A | 4 |
+-----------------------------------+------------+
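
The two signs of missing memory in this summary (a Memory_Size of 0 GB and no MEM row in the table) can be tested for mechanically. The following Python sketch assumes the `showhardconf -u` output has been captured as text in the format shown above; both function names are hypothetical.

```python
import re


def total_memory_gb(showhardconf_u_text):
    """Return the Memory_Size value in GB, or None if it is not present."""
    match = re.search(r"Memory_Size:\s*(\d+)\s*GB", showhardconf_u_text)
    return int(match.group(1)) if match else None


def has_dimm_row(showhardconf_u_text):
    """True if the FRU table lists a MEM (DIMM) row; MEMB rows do not count."""
    return re.search(r"^\|\s*MEM\s*\|", showhardconf_u_text, re.MULTILINE) is not None


if __name__ == "__main__":
    sample = """\
SPARC Enterprise M5000 COL2-FF2; Memory_Size:0 GB;
| MBU_B | 1 |
| CPUM | 1 |
| MEMB | 1 |
"""
    if total_memory_gb(sample) == 0 or not has_dimm_row(sample):
        print("No memory detected in the platform")
```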

The `showlogs event` output lists any configuration issues that were detected.

XSCF> showlogs event
May 17 00:16:10 PDT 2007 no CPU on XSB#00-0
May 17 00:16:10 PDT 2007 no MEM on XSB#00-0
May 17 00:16:10 PDT 2007 no CPU on XSB#01-0
May 17 00:16:10 PDT 2007 no MEM on XSB#01-0
May 17 00:32:09 PDT 2007 no MEM on XSB#00-0
May 17 00:33:52 PDT 2007 no MEM on XSB#00-0
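
To summarise a long event log, the "no CPU" and "no MEM" entries can be grouped by XSB. This is a minimal Python sketch that assumes the messages have exactly the wording shown above; `missing_by_xsb` is a hypothetical helper. Because the event log is historical, the result covers every entry in the text passed in, not just the current state.

```python
import re

# Matches messages such as "no MEM on XSB#00-0".
EVENT_RE = re.compile(r"no (CPU|MEM) on XSB#(\d{2})-(\d)")


def missing_by_xsb(showlogs_event_text):
    """Map XSB id (for example '00-0') -> set of component types reported missing."""
    missing = {}
    for line in showlogs_event_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            xsb = f"{match.group(2)}-{match.group(3)}"
            missing.setdefault(xsb, set()).add(match.group(1))
    return missing


if __name__ == "__main__":
    sample = """\
May 17 00:32:09 PDT 2007 no MEM on XSB#00-0
May 17 00:33:52 PDT 2007 no MEM on XSB#00-0
"""
    for xsb, parts in sorted(missing_by_xsb(sample).items()):
        print(xsb, sorted(parts))
```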

Only CMU boards physically populated with 4 processors can be placed in Quad-XSB mode

Configuration Error reported after using "setupfru"

When configuring a domain, the "configuration error was detected" message may appear when there is a hardware issue with the machine.

XSCF> setupfru -x 4 sb 1
Operation has completed. However, a configuration error was detected.

Looking into the issue we see that there are no error reports, and the status is reported as normal.

XSCF> showstatus
No failures found in System Initialization.

XSCF> showlogs error

Using `showhardconf` we can see that, in this case, CMU#1 has only two CPUs (slots 0 and 1), with memory Group A populated with 16 DIMMs. Configuring this CMU in Quad-XSB mode leaves 8 DIMMs with no associated CPU, because CPU slots 2 and 3 are empty.

    CMU#1 Status:Normal; Ver:0101h; Serial:PP074802GW  ;
        + FRU-Part-Number:CA06620-D002 C1   /371-2214-03          ;
        + Memory_Size:16 GB;
        CPUM#0-CHIP#0 Status:Normal; Ver:0801h; Serial:PP091400BD  ;
            + FRU-Part-Number:CA06620-D044 B1   /375-3580-02          ;
            + Freq:2.520 GHz; Type:32;
            + Core:4; Strand:2;
        CPUM#1-CHIP#0 Status:Normal; Ver:0801h; Serial:PP091302J4  ;
            + FRU-Part-Number:CA06620-D044 B1   /375-3580-02          ;
            + Freq:2.520 GHz; Type:32;
            + Core:4; Strand:2;
        MEM#00A Status:Normal;
            + Code:ce0000000000000001M3 93T2950EZA-CE6 4145-45569c2b;
            + Type:1A; Size:1 GB;
[...]
        MEM#33A Status:Normal;
            + Code:ce0000000000000001M3 93T2950EZA-CE6 4145-4754f6c3;
            + Type:1A; Size:1 GB;

 

The `showlogs event` output lists any configuration issues that were detected.

Jun 19 22:10:41 KST 2009      SB configuration changed (quad-XSB mode)
Jun 19 22:10:44 KST 2009      no CPU on XSB#01-2
Jun 19 22:10:45 KST 2009      no CPU on XSB#01-3
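
The Quad-XSB rule can be checked before running `setupfru`. The Python sketch below is an illustration only: it assumes that in Quad-XSB mode XSB#cc-n relies on the CPU in slot n of CMU#cc, which is how the event log above lines up with the empty CPU slots, and both function names are hypothetical.

```python
def can_use_quad_xsb(cpu_slots):
    """True only if all four CPU slots (0-3) on the CMU are populated."""
    return set(cpu_slots) >= {0, 1, 2, 3}


def empty_quad_xsbs(cmu_number, cpu_slots):
    """List the quad-XSB ids on a CMU that would have no CPU.

    Assumption: XSB#cc-n is served by the CPU in slot n of CMU#cc.
    """
    populated = set(cpu_slots)
    return [f"{cmu_number:02d}-{n}" for n in range(4) if n not in populated]


if __name__ == "__main__":
    # CMU#1 from the example above has CPUs in slots 0 and 1 only.
    print(can_use_quad_xsb([0, 1]))
    print(empty_quad_xsbs(1, [0, 1]))
```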

`showboards` reports the two XSBs without a CPU (01-2 and 01-3) as "Unmount" in the Test column:

XSB  R DID(LSB) Assignment  Pwr  Conn Conf Test    Fault    COD
---- - -------- ----------- ---- ---- ---- ------- -------- ----
01-0   SP       Available   y    n    n    Passed  Normal   n  
01-1   SP       Available   y    n    n    Passed  Normal   n  
01-2   SP       Unavailable y    n    n    Unmount Normal   n  
01-3   SP       Unavailable y    n    n    Unmount Normal   n 
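
The unmounted XSBs can also be picked out of saved `showboards` output. This Python sketch assumes the column layout shown above, where the last three columns are Test, Fault and COD. Fields are counted from the right because the R column can be blank. `unmounted_xsbs` is a hypothetical helper name.

```python
import re

# An XSB id looks like "01-2": two-digit board number, dash, one digit.
XSB_RE = re.compile(r"^\d{2}-\d$")


def unmounted_xsbs(showboards_text):
    """Return the XSB ids whose Test column reads 'Unmount'."""
    result = []
    for line in showboards_text.splitlines():
        fields = line.split()
        # Test is the third field from the right (Test, Fault, COD).
        if len(fields) >= 7 and XSB_RE.match(fields[0]) and fields[-3] == "Unmount":
            result.append(fields[0])
    return result


if __name__ == "__main__":
    sample = """\
XSB  R DID(LSB) Assignment  Pwr  Conn Conf Test    Fault    COD
---- - -------- ----------- ---- ---- ---- ------- -------- ----
01-1   SP       Available   y    n    n    Passed  Normal   n
01-2   SP       Unavailable y    n    n    Unmount Normal   n
"""
    print(unmounted_xsbs(sample))
```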
 



Previously Published As 89561






  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.