Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1003585.1
Update Date: 2010-08-27
Keywords:

Solution Type  Problem Resolution Sure

Solution  1003585.1 :   Sun Fire [TM] 12K/15K: Cards Fail to be Configured in I/O Board Slot 1 if Slot 3 is Populated  


Related Items
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
205060


Symptoms
Adding a Crystal-2A card (2Gb/sec PCI Dual FC Network Adapter) to a
domain's configuration with cfgadm results in an error, followed by a
domain panic.
 #  cfgadm -c configure pcisch0:e09b1slot1
- error message - "Hardware specific error"

The /var/adm/messages file or the domain console then shows a panic:

 WARNING: pcisch-0: PCI fault log start:
PCI excessive retry error
PCI error ocurred on device #6
pcisch-0: PBM AFSR=0x20800003.20000000<mem-space> dwordmask=0 bytemask=3
pcisch-0: PCI primary error (8):
Excessive Retries
pcisch-0: PCI secondary error (8):
Excessive Retries
pcisch-0: PBM AFAR 0.00124040:WARNING: [AFT1] Bus Error (BERR) Event
detected by CPU291
Privileged Data Access at TL=0, errID 0x000001d0.41c71c38
AFSR 0x00100800<PRIV,BERR>.00000000 AFAR 0x00000451.00124000
Fault_PC 0x10372f4
 panic[cpu291]/thread=2a1004d7d40: [AFT1] errID 0x000001d0.41c71c38 BERR
Error(s)

Replacing the I/O card, on the suspicion that the card itself is bad, does
not help: the cfgadm error and the panic persist.



Resolution
1. Confirm that the card in question is a Crystal-2A card (the problem has
also been seen on a third-party "Venus" card and a Qlogic card, but this
document specifically covers the Crystal-2A card):

https://support.oracle.com/handbook_private/Devices/Fibre_Channel/FIBRE_Dual_2GB_FC_AL.html

2. Confirm this card's location is in I/O board Slot 1 (c5v0 slot; Top
Right Slot).

3. Confirm that another card (type unimportant) is installed in the same
I/O board, Slot 3 (c5v1; Top Left Slot).

>If all three conditions are true, this document applies.
>If any of them is not true, this document does not apply.
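
The three checks above can be sketched as a single predicate. This is only an illustration: the function name, its parameters, and the accepted spellings of the card name are hypothetical and are not part of any Sun tool.

```python
def document_applies(card_type, card_slot, slot3_populated):
    """Return True only when all three conditions from steps 1-3 hold.

    card_type       -- name of the card being configured (hypothetical input)
    card_slot       -- I/O board slot number holding that card
    slot3_populated -- True if any card sits in Slot 3 of the same I/O board
    """
    is_crystal_2a = card_type.strip().lower() in {"crystal-2a", "crystal2a"}
    return is_crystal_2a and card_slot == 1 and bool(slot3_populated)
```

For example, document_applies("Crystal-2A", 1, True) is true, while the same card in any other slot, or with Slot 3 empty, falls outside this document.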

The root cause is Bug ID: 4830665, "hsPCI board does not implement 33Mhz
slots M66EN signal correctly." The bug is a design problem in 5-volt I/O
cassettes with a hardware revision earlier than 09. It prevents cards in
I/O board Slot 1 from being configured properly if I/O board Slot 3 is
populated with any card (type unimportant). The bug was originally filed
for issues with "Venus" I/O cards.

Bug ID: 4993711 exists for this same issue with Qlogic cards.

Bug ID: 4987200, "Crystal2a's may not initialize properly if installed in
slot1 w/slot3 populated," was filed on Crystal-2A cards for the same issue.

The resolution is to use I/O cassettes with part number 501-5600-09 or a later revision.
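
As a rough sketch of that revision check, the snippet below treats the last two digits of the cassette part number as the revision and compares them against 09. The function name is hypothetical, and the part-number layout "501-5600-NN" is an assumption based only on the part numbers quoted in this document.

```python
import re

FIXED_REVISION = 9  # 501-5600-09 is the first revision named as the fix


def cassette_needs_replacement(part_number):
    """Return True if a 501-5600 cassette is older than revision 09."""
    match = re.fullmatch(r"501-5600-(\d{2})", part_number.strip())
    if match is None:
        # Anything else is outside the scope of this sketch.
        raise ValueError("not a 501-5600-NN part number: %r" % part_number)
    return int(match.group(1)) < FIXED_REVISION
```

A cassette labeled 501-5600-08 would be flagged for replacement; 501-5600-09 and later would not.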



Relief/Workaround

If replacing the I/O cassette is not currently an option, install the Crystal-2A card in a different I/O board slot (any slot other than Slot 1).


Alternatively, if possible, remove the I/O card installed in Slot 3 of the same I/O board and leave Slot 3 empty.



Additional Information
This solution is intended for use by Sun IT and Sun IT Partner Engineers only.

Bug ID: 4987200 shows that a symptom of this issue appears at OpenBoot(R) PROM (OBP) on the domain console.
Note: This message is logged only in the domain's console log, not in /var/adm/messages, and appears only when the OBP variable diag-switch? is set to true. When diag-switch? is set to false, the "Probing" messages are not logged to the console.
From the bug:
 The hard failure always shows up in OBP as:
 *******************************************
 	Probing /pci@5d,600000 Device 1  Nothing there
    When OBP does find the cards the output is always:
 **************************************************
 	Probing /pci@5d,700000 Device 1  SUNW,qlc fp disk SUNW,qlc fp disk
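
A saved console log can be scanned for such probe lines with a short script. The sketch below is illustrative only: the function name is hypothetical and the regular expression assumes the line shape of the two sample lines quoted from the bug. A "Nothing there" result is also normal for a slot that really is empty, so a hit matters only for a slot known to hold a card.

```python
import re

# Assumed shape: "Probing <device path> Device <n>  <result text>"
PROBE_LINE = re.compile(
    r"Probing\s+(?P<path>/\S+)\s+Device\s+(?P<dev>\d+)\s+(?P<result>.*\S)"
)


def empty_probes(console_lines):
    """Return (path, device) pairs where OBP reported 'Nothing there'."""
    found = []
    for line in console_lines:
        match = PROBE_LINE.search(line)
        if match and match.group("result") == "Nothing there":
            found.append((match.group("path"), int(match.group("dev"))))
    return found
```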
 

Another symptom of this issue, not documented in the bug, is an error message that may be logged during boot:
 May 16 20:58:21 2004 WARNING: POST status for card in /IO9/C5V0 is good but OBP failed to probe it!
 
NOTE: The I/O board named in the error may differ depending on your domain configuration, but the slot (c5v0) will be the same.
This message states that HPOST (Hardware Power On Self Test) configured the I/O card into the domain successfully (the POST logs will confirm this), but OBP cannot find or configure the card in the specified slot.
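
To find this warning in a saved log, a pattern match on the message text is enough. The following is a minimal sketch: the function name is hypothetical, and the pattern is derived solely from the single sample message above, so other board or slot spellings may not match.

```python
import re

# Built from the sample message: ".../IO9/C5V0 is good but OBP failed to probe it!"
PROBE_WARNING = re.compile(
    r"POST status for card in /(?P<board>IO\d+)/(?P<slot>C\dV\d) is good "
    r"but OBP failed to probe it"
)


def find_probe_failures(log_lines):
    """Return (board, slot) pairs for each matching warning line."""
    hits = []
    for line in log_lines:
        match = PROBE_WARNING.search(line)
        if match:
            hits.append((match.group("board"), match.group("slot")))
    return hits
```

A hit whose slot is C5V0 corresponds to the Slot 1 location described in this document.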

Product
Sun Fire 12K Server
Sun Fire 15K Server

Internal Comments
For the internal use of Sun Employees.

There is FCO A0246-1 for the 5-volt I/O cassettes pn 501-5600-08 and below: http://sunsolve.central/handbook_internal/fin-fco/1-6-A0246-1-1.html (no Oracle internal link found).

Reference Radiance case 64077490, Apollo Escalation 1-1031362, which encountered all the symptoms described above in this document. After replacing the cassettes with rev 09, all symptoms ceased and cfgadm operations worked smoothly.

12k, 15k, 12K, 15K, Crystal2A, Crystal-2A, crystal-2a, cfgadm, OBP, I/O cassette, 5volt
Previously Published As
76434

Change History
Updated by the ESG Knowledge Content Team 4/2010
Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.