Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1000786.1
Update Date:2010-09-10
Keywords:

Solution Type  FAB (standard) Sure

Solution  1000786.1 :   Recommended PCI Card Slotting on V480/V490 to prevent NVRAM Corruption of FCAL HBAs.  


Related Items
  • Sun Fire V480 Server
  • Sun Fire V490 Server
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive

Previously Published As
201063


Product
Sun Fire V490 Server
Sun Fire V480 Server

Bug Id
<SUNBUG: 6460429>


Impact

FCAL NVRAM corruption can occur at power cycle time.

When NVRAM is corrupted, even by a single bit, the driver will not trust any of its contents, including the World Wide Name (WWN). The driver then falls back to default values.

If the leadville driver is in use, the impact is minimal because leadville generates a WWN for the card/port based on the hostid.

If the QLogic native driver is in use, the WWN is zeroed. In this case, if two affected systems are attached to the same SAN, both will present the same WWN.

If two hosts have the same WWN, they cannot log in to the SAN, because the SAN cannot determine which host should receive traffic addressed to that WWN.

The WWN is analogous to a MAC address in network communications. It should be unique within a SAN.
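The duplicate-WWN condition described above can be checked mechanically once the port WWNs of each host have been collected. The following Python sketch is only an illustration: the helper name `find_duplicate_wwns`, the host names, and the inventory values are hypothetical, and collecting the WWNs from each host is left out.

```python
from collections import defaultdict

def find_duplicate_wwns(host_wwns):
    """Return {wwn: sorted host names} for every WWN seen on more than one host.

    host_wwns maps a host name to the list of WWN strings reported by that host.
    """
    owners = defaultdict(set)
    for host, wwns in host_wwns.items():
        for wwn in wwns:
            # Compare case-insensitively; WWNs are hexadecimal strings.
            owners[wwn.lower()].add(host)
    return {wwn: sorted(hosts) for wwn, hosts in owners.items() if len(hosts) > 1}

if __name__ == "__main__":
    # Hypothetical inventory: two hosts whose HBAs both fell back to a zeroed WWN.
    inventory = {
        "hostA": ["210000e08b000000"],
        "hostB": ["210000e08b000000"],
        "hostC": ["210000e08b1e5e6c"],
    }
    for wwn, hosts in find_duplicate_wwns(inventory).items():
        print(wwn, "is presented by", ", ".join(hosts))
```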


Contributing Factors

This issue can occur on the following platforms:

  • A37* - Sun Fire V480
  • A52* - Sun Fire V490

The following X-Options are affected:

   X6768 SG-XPCI2FC-QF2(Z)
   QLogic QLA2342-SUN PCI/PCI-X
   2 Gigabit/Sec PCI Dual FC Host Adapter
   or
   SG-XPCI1FC-QL2
   QLogic QLA2340-CK PCI/PCI-X Amber2A*
   2 Gigabit/Sec PCI-X Single FC Host Adapter

* There is no X-Option number for Amber2A.

The following parts are affected:

  • 375-3108 Crystal-2A
  • 375-3363 Crystal-2A RoHS:Y
  • 375-3383 Amber2A RoHS:Y

Notes:

1) The QLogic Native FCAL HBA is not sold or serviced by Sun. Sun Service does not stock this HBA. There is no Sun FRU/CRU part number for it. However, the QLogic HBA models affected are QLA2340, QLA2342 and QLA2344.

2) The issue is directly related to PCI Card slotting.

If a 3.3V/5V PCI card occupies a slot on the same PCI-Reset line as the slot holding the FCAL HBA, there is potential for this issue to occur.

In the V480 and V490 there are 6 PCI slots:

- Slots 0 and 1 are on the same PCI-Reset line

- Slots 2 and 3 are on another PCI-Reset line

- Slots 4 and 5 are on the third PCI-Reset line
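Because the slots are paired consecutively, the slot that shares a PCI-Reset line with a given slot can be computed rather than looked up. A minimal Python sketch, assuming the slots are numbered 0 through 5 as listed above; the function name `reset_partner` is hypothetical.

```python
def reset_partner(slot):
    """Return the slot that shares a PCI-Reset line with `slot` (0 through 5)."""
    if slot not in range(6):
        raise ValueError("V480/V490 PCI slots are numbered 0 through 5")
    # The pairs are (0,1), (2,3), (4,5): flipping the lowest bit gives the partner.
    return slot ^ 1

if __name__ == "__main__":
    for s in range(6):
        print("slot", s, "shares a PCI-Reset line with slot", reset_partner(s))
```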

3) HBA onboard NVRAM corruption cannot be detected by visual inspection of the HBA.

Explorer output does contain the results of luxadm -e dump_map and prtpicl -v.

Other methods of determining whether the system contains QLogic-based FCAL HBAs, and whether they are QLogic Native or Sun Branded, can be found in INTERNAL Infodoc 74531.


Symptoms

1) If the leadville driver is loaded the error message is:

    50 ohm is not set

2) With QLogic Native HBA:

If the QLogic native driver "qla" is 4.08, the last 6 hex digits of the WWN are set to 0, and the system boot messages show the HBA's WWN with those trailing digits as zeros.

Here are examples of HBA0 and HBA1 with corrupt NVRAM:

 qsxbat08 qla2300: [ID 364886 kern.info] qla2300-hba0-adapter-node-name="200000e08b000000";
 qsxbat08 qla2300: [ID 358785 kern.info] qla2300-hba0-adapter-port-name="210000e08b000000";
 qsxbat08 qla2300: [ID 332924 kern.info] qla2300-hba0-adapter-port-id="000000";
 qsxbat08 qla2300: [ID 364886 kern.info] qla2300-hba1-adapter-node-name="200000e08b000000";
 qsxbat08 qla2300: [ID 358785 kern.info] qla2300-hba1-adapter-port-name="210000e08b000000";
 qsxbat08 qla2300: [ID 332924 kern.info] qla2300-hba1-adapter-port-id="000000";

An HBA with uncorrupted NVRAM would report, for example:

 qla2300: [ID 364886 kern.info] qla2300-hba0-adapter-node-name="200000e08b1e5e6c";

If the QLogic native driver "qla" is 4.18 or later, then as the driver attaches to the HBA, the system boot messages will include the string "NVRAM Corrupt" in addition to showing the WWN with the last 6 digits set to 0.
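The boot-message symptoms for the QLogic native driver lend themselves to a simple scan of saved message logs. The Python sketch below is a heuristic illustration, not a Sun or QLogic tool: the function names are hypothetical, the pattern is modeled only on the message lines quoted above, and a WWN that legitimately ends in six zeros would be flagged as well.

```python
import re

# Pattern modeled on the qla2300 boot messages quoted in this document.
WWN_RE = re.compile(
    r'qla2300-hba(\d+)-adapter-(node|port)-name="([0-9a-fA-F]{16})"'
)

def suspect_hbas(lines):
    """Return sorted HBA instance numbers whose node or port WWN ends in 000000."""
    suspects = set()
    for line in lines:
        m = WWN_RE.search(line)
        if m and m.group(3).endswith("000000"):
            suspects.add(int(m.group(1)))
    return sorted(suspects)

def has_nvram_corrupt_message(lines):
    """True if any line carries the string logged by qla 4.18 or later."""
    return any("NVRAM Corrupt" in line for line in lines)

if __name__ == "__main__":
    # First two lines are from this document; the hba2 line is an
    # illustrative variant of the uncorrupted example.
    sample = [
        'qsxbat08 qla2300: [ID 364886 kern.info] qla2300-hba0-adapter-node-name="200000e08b000000";',
        'qsxbat08 qla2300: [ID 358785 kern.info] qla2300-hba1-adapter-port-name="210000e08b000000";',
        'qla2300: [ID 364886 kern.info] qla2300-hba2-adapter-node-name="200000e08b1e5e6c";',
    ]
    print(suspect_hbas(sample))               # [0, 1]
    print(has_nvram_corrupt_message(sample))  # False
```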

3) With Sun Branded HBA:

Onboard NVRAM corruption on a Sun Branded HBA shows up as a modified WWN in the output of OBP commands, luxadm, prtpicl, and the system boot messages.

At the ok prompt, "cd" to the HBA node; a ".properties" command will then show that the rightmost part of the HBA's WWN has been changed to the system's HostID.

You can also use:

    luxadm -e dump_map
    prtpicl -v -c scsi-fp

prtdiag -v will identify the QLogic Native HBAs, and will list the Sun Branded HBAs as follows:

   SUNW,qlc-pci1077,2312.1077.10a.2+  <--x6768   Sun/Qlogic Crystal-2A
   SUNW,qlc-pci1077,2312.1077.149.2+  <--        Sun/Qlogic Amber2A
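The two device names above differ only in the third dotted field (10a versus 149), so saved prtdiag output can be sorted by model with a small lookup. The Python sketch below assumes that this field is what distinguishes the models; the table holds only the two values shown in this document, and the function name `classify_sun_hba` is hypothetical.

```python
import re

# Only the two identifiers shown in this document; other values are not covered.
KNOWN_MODELS = {"10a": "Crystal-2A", "149": "Amber2A"}

NAME_RE = re.compile(r"SUNW,qlc-pci1077,2312\.1077\.([0-9a-fA-F]+)\.")

def classify_sun_hba(device_name):
    """Return the model name for a Sun Branded qlc device name, or None if unknown."""
    m = NAME_RE.search(device_name)
    if not m:
        return None
    return KNOWN_MODELS.get(m.group(1).lower())

if __name__ == "__main__":
    for name in ("SUNW,qlc-pci1077,2312.1077.10a.2+",
                 "SUNW,qlc-pci1077,2312.1077.149.2+"):
        print(name, "->", classify_sun_hba(name))
```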

At system boot time when the Sun "qlc" driver attaches to the HBA it will report:

   50 ohm is not set

 


Root Cause

FCAL NVRAM corruption is caused by interference with the correct operation of the PCI_RESET lines within the PCI bus.

It can occur randomly on system power up and power down.

PCI_RESET nets are shared between slots 0 and 1, 2 and 3, 4 and 5.

The HBA onboard NVRAM corruption occurs when PCI_RESET is deasserted during the power up or power down of a particular QLogic ASIC on the HBA, resulting in unpredictable accesses to the NVRAM.

Other PCI cards configured on the PCI bus can cause the PCI_RESET line to be deasserted.


Workaround

Until a final resolution is available, the following is recommended:

If the affected HBA is in Slot 0, 2, or 4, leave the other slot of the associated PCI_RESET slot pair empty (i.e., Slot 1, 3, or 5, respectively).

Alternatively, configure the PCI card slotting so that both slots of a PCI_RESET slot pair are occupied by affected FCAL HBAs, in any combination of Sun Branded and QLogic Native.

For example, a QLogic Native FCAL HBA in PCI slot 0 and a Sun Branded Crystal-2A in PCI slot 1 prevents this issue from occurring.
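The slotting recommendation can be expressed as a check over a planned configuration. The Python sketch below is a conservative illustration under stated assumptions: the `AFFECTED` label and the function name `risky_slots` are hypothetical, and any card other than an affected FCAL HBA in the partner slot is treated as a potential source of interference.

```python
AFFECTED = "affected_fcal_hba"  # hypothetical label for any HBA listed in this document

def risky_slots(config):
    """Return sorted slots that hold an affected HBA while the other slot on the
    same PCI_RESET line holds some other kind of card.

    config maps slot number (0 through 5) to a card label, or None if empty.
    """
    risky = []
    for slot, card in config.items():
        if card != AFFECTED:
            continue
        # Slot pairs are (0,1), (2,3), (4,5); flipping the low bit gives the partner.
        partner_card = config.get(slot ^ 1)
        if partner_card is not None and partner_card != AFFECTED:
            risky.append(slot)
    return sorted(risky)

if __name__ == "__main__":
    planned = {
        0: AFFECTED, 1: "other_pci_card",  # violates the recommendation
        2: AFFECTED, 3: None,              # partner slot left empty
        4: AFFECTED, 5: AFFECTED,          # pair holds only affected HBAs
    }
    print(risky_slots(planned))  # [0]
```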


Resolution

At this time there is no engineering fix for this issue.


Previously Published As
102755

Internal Eng Business Unit Group
SSG WGS (Workgroup Systems)

Internal Escalation ID
1-18894137

Internal Kasp FAB Legacy ID
102755

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2006-12-19
Avoidance: Reconfiguration
Responsible Manager: Steven Doherty
Original Admin Info: [WF 19-Dec-2006, karened: releaseing after Ext review and update]

[WF 15-Dec-2006, karened: took me about a week to rewrite from 2 different submittals]

Product_uuid
5c71fc02-5e51-11d7-8add-8938754df22a|Sun Fire V490 Server
a2b9bc2b-52c6-45c2-a3e0-f19bd2c86953|Sun Fire V480 Server

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.