Document Audience:INTERNAL
Document ID:I1144-1
Title:Clustered Netra D130 array causes various SCSI errors under load if LVD SCSI Disk Drives are installed to replace failed single ended SCSI disk drives in a cross cabled configuration.
Copyright Notice:Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved
Update Date:2005-03-29

------------------------------------------------------------
            - Sun Proprietary/Confidential: Internal Use Only -
------------------------------------------------------------------------

  ***  Sun Confidential:  Internal Use and Authorized VARs Only  ***
________________________________________________________________________

  This message including any attachments is confidential information
  of Sun Microsystems, Inc.  Disclosure, copying or distribution is
  prohibited without permission of Sun.  If you are not the intended
  recipient, please reply to the sender and then delete this message.
________________________________________________________________________
  
                        FIELD INFORMATION NOTICE
               (For Authorized Distribution by Sun Service)
FIN #: I1144-1
Synopsis: Clustered Netra D130 array causes various SCSI errors under load if LVD SCSI Disk Drives are installed to replace failed single ended SCSI disk drives in a cross cabled configuration.
Create Date: Jan/12/05
SunAlert: No
Top FIN/FCO Report: No
Products Reference: Sun StorEdge st D130 Array
Product Category: Storage / Diag-Doc-Service
Product Affected: 
Systems Affected:
-----------------  
Mkt_ID         Platform      Model       Description           Serial Number
------         --------      -----       -----------           -------------
  -              N14          ALL        Netra t1405                 -
  -              N15          ALL        Netra t1400                 -


X-Options Affected:
-------------------
Mkt_ID      Platform     Model     Description                 Serial Number
------      --------     -----     -----------                 ------------- 
  -         st D130       ALL      Netra st D130 Array               -
X1032A         -           -       OPT INT PCI 10/100BASET NIC       -
Parts Affected: 
----------------------
Part Number        Description                        Model
-----------        -----------                        -----
390-0069-03        DRV SEA 36GB 10K 1-in SCSI3          -
390-0109-05        DRV SEA 36GB 10K 1-in SCSI4          -
390-0156-03        DRV FJ 36GB 10K1 SCSI4-T485-61       -
References: 
ESC: 1-3909668
Issue Description: 
Installing LVD SCSI disk drives as replacements for failed single-ended
SCSI disk drives in a cross-cabled configuration with clustered Netra
D130 arrays causes various SCSI errors under load.

The following is a SunCluster 3.0 configuration in which the two host
nodes are Netra t 1400 servers connected to a pair of Netra D130
arrays.  A Single-Ended Ultra/Wide SCSI/FastEthernet adapter (SunSwift
PCI) has been installed in each host.  These hosts are cabled to the
D130 arrays using four 0.8 meter SCSI cables in a cross-connected
fashion.

The onboard SCSI connector of the Node 0 Netra 1400 is connected to
the SCSI in port of the Array 1 Netra D130.  The SunSwift HBA of this
host is connected to the SCSI out port of the Array 2 Netra D130.

The onboard SCSI connector of the Node 1 Netra 1400 is connected to
the SCSI in port of the Array 2 Netra D130.  The SunSwift HBA of this
host is connected to the SCSI out port of the Array 1 Netra D130.

Diagrams are shown below:

        	+-------------+
  Netra t 1400	|           |-|-----+ 
    Node 0	| =           |     |
		+-|-----------+     |
		  +---------+	    |	
		+-----------|-+     |
  Netra st D130	|           = |     |
    Array 1	|           =-|--+  |
		+-------------+  |  |
		                 |  |
		                 |  |
		+-------------+  |  |
  Netra st D130	|        +--= |  |  |
    Array 2	|        |  =-|--)--+
		+--------|----+  |
		         |       |
		  +------+       |
		+-|-----------+  |
  Netra t 1400	| |         |-|--+
    Node 1	| =           |
		+-------------+
		
There are several factors contributing to the observed issue.

   . In this configuration, given the 1 meter internal bus length of
     the D130 array, the 0.8 meter SCSI cables used to connect the
     hosts, and the 5 total targets on the SCSI bus, the overall
     SCSI bus length exceeds the 1.5 meter maximum bus length for this
     configuration.  This can degrade the SCSI signal and thus cause
     SCSI errors.

   . The onboard SCSI adapter on one host is cross connected to the
     SunSwift HBA on the second host, which is not a typical
     configuration.

   . It appears that the LVD SCSI disk drives are more sensitive to the
     extended bus length / cross cabled configuration than single ended
     SCSI disks and thus generate the SCSI errors.

All of these factors combined to produce the SCSI errors observed onsite.
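
The bus length factor can be checked with simple arithmetic.  The
following is a minimal sketch only.  It assumes that each array's bus
consists of the array's internal bus plus the two external cables
attached to it (SCSI in from one node, SCSI out from the other), and it
ignores any bus length inside the hosts.  The constant and function
names are illustrative and do not come from any Sun tool.

```python
# Figures taken from this FIN; bus length inside the hosts is ignored (assumption).
D130_INTERNAL_BUS_M = 1.0   # internal bus length of one D130 array
HOST_CABLE_M = 0.8          # length of each external SCSI cable
CABLES_PER_ARRAY = 2        # SCSI in from one node, SCSI out from the other
MAX_BUS_LENGTH_M = 1.5      # maximum bus length stated for this configuration


def bus_length_m(internal_m, cable_m, cables):
    """Return the end-to-end length of one shared SCSI bus in meters."""
    return internal_m + cable_m * cables


total = bus_length_m(D130_INTERNAL_BUS_M, HOST_CABLE_M, CABLES_PER_ARRAY)
excess = total - MAX_BUS_LENGTH_M
print(f"bus length {total:.1f} m, limit {MAX_BUS_LENGTH_M:.1f} m, "
      f"over by {excess:.1f} m")
```

Under these assumptions each shared bus is about 2.6 meters long, which
is roughly 1.1 meters over the stated maximum.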

The following are examples of the SCSI errors seen on the D130 array.

SCSI Errors reported against the SunSwift HBA:
 
  Sep  20 03:16:47 DS2cable0 SCSI: [ID 107833 kern.warning] WARNING: 
             /pci@1f,4000/pci@5/SUNW,isptwo@4 (isp0):
  Sep  20 03:16:47 DS2cable0  Interrupt bit still set after 10 seconds.
             Card or firmware failure.
  Sep  20 03:16:51 DS2cable0 SCSI: [ID 107833 kern.warning] WARNING: 
             /pci@1f,4000/pci@5/SUNW,isptwo@4/sd@a,0 (sd9):
  Sep  20 03:16:51 DS2cable0 SCSI transport failed: reason 'reset':
             retrying command.

SCSI Errors reported against the onboard SCSI adapter:

  Sep 20 03:27:38 DS2cable0 SCSI: [ID 107833 kern.warning] WARNING: 
             /pci@1f,4000/scsi@3,1 (glm1):
  Sep 20 03:27:38 DS2cable0  Resetting SCSI bus, Message-In was expected  
             from (11,0)
  Sep 20 03:27:38 DS2cable0 genunix: [ID 408822 kern.info] NOTICE: glm1:  
             fault detected in device; service still available
  Sep 20 03:27:38 DS2cable0 genunix: [ID 611667 kern.info] NOTICE: glm1:  
             Resetting SCSI bus, Message-In was expected from (11,0)
  Sep 20 03:27:38 DS2cable0 SCSI: [ID 107833 kern.warning] WARNING: 
             /pci@1f,4000/scsi@3,1 (glm1):
  Sep 20 03:27:38 DS2cable0 	Target 11 reducing sync. transfer rate
  Sep 20 03:27:38 DS2cable0 glm: [ID 923092 kern.warning] WARNING: 
             ID[SUNWpd.glm.sync_wide_backoff.6014]
  Sep 20 03:27:38 DS2cable0 SCSI: [ID 107833 kern.warning] WARNING:
             /pci@1f,4000/scsi@3,1 (glm1):
  Sep 20 03:27:38 DS2cable0 	got SCSI bus reset
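
When reviewing a large messages file, it may help to count how often
each of the messages above occurs.  The following is a minimal sketch,
not a supported tool.  The message fragments are copied from the sample
log lines in this FIN, the category names are invented for
illustration, and the sample lines are the wrapped lines above rejoined
onto one line each.

```python
import re
from collections import Counter

# Message fragments copied from the sample log lines in this FIN.
# The category names on the left are illustrative only.
SIGNATURES = {
    "isp interrupt stuck": re.compile(r"Interrupt bit still set"),
    "transport reset": re.compile(r"SCSI transport failed: reason 'reset'"),
    "glm bus reset": re.compile(r"Resetting SCSI bus"),
    "sync rate backoff": re.compile(r"reducing sync\. transfer rate"),
    "bus reset seen": re.compile(r"got SCSI bus reset"),
}


def count_scsi_errors(lines):
    """Count how many lines match each known SCSI error signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines from this FIN, rejoined where the notice wraps them.
SAMPLE_LINES = [
    "Sep 20 03:16:51 DS2cable0 SCSI transport failed: reason 'reset': "
    "retrying command.",
    "Sep 20 03:27:38 DS2cable0  Resetting SCSI bus, Message-In was "
    "expected from (11,0)",
    "Sep 20 03:27:38 DS2cable0 genunix: [ID 611667 kern.info] NOTICE: "
    "glm1: Resetting SCSI bus, Message-In was expected from (11,0)",
    "Sep 20 03:27:38 DS2cable0 \tgot SCSI bus reset",
]

counts = count_scsi_errors(SAMPLE_LINES)
for name, n in counts.most_common():
    print(f"{n:4d}  {name}")
```

To scan a real log, pass an open file object (for example
/var/adm/messages on the affected host) to count_scsi_errors() in place
of SAMPLE_LINES.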

To check whether LVD drives are installed, issue an 'iostat -En' command
from an operating system prompt, review the output, and look for one of
the three Vendor/Product combinations listed below:

   Vendor: FUJITSU  Product: MAP3367NC
   Vendor: SEAGATE  Product: ST336605LC
   Vendor: SEAGATE  Product: ST336607LC

See the sample iostat -En output below:

   #iostat -En 
   sd10     Soft Errors: 0 Hard Errors: 2 Transport Errors: 0 
   Vendor: FUJITSU  Product: MAP3367NC SUN18G  Revision: 0804 Serial No: 05P32232     
   Size: 18.11GB <18110967808 bytes>
   Media Error: 0 Device Not Ready: 0 No Device: 2 Recoverable: 0  
   Illegal Request: 0 Predictive Failure Analysis: 0
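
The check against the three Vendor/Product combinations can also be
scripted.  The following is a minimal sketch that assumes the
'iostat -En' output has been saved as text and follows the layout of
the sample above (a device line containing "Soft Errors:" followed by
a "Vendor: ... Product: ..." line).  The function and variable names
are illustrative only.

```python
import re

# Vendor/Product pairs listed in this FIN as LVD drives.
LVD_DRIVES = {
    ("FUJITSU", "MAP3367NC"),
    ("SEAGATE", "ST336605LC"),
    ("SEAGATE", "ST336607LC"),
}

# A device line starts with the device name followed by "Soft Errors:".
DEVICE_RE = re.compile(r"^\s*(\S+)\s+Soft Errors:")
# Only the first word after "Product:" is compared against the list.
VENDOR_RE = re.compile(r"Vendor:\s*(\S+)\s+Product:\s*(\S+)")


def find_lvd_devices(iostat_text):
    """Return device names whose Vendor/Product pair is a listed LVD drive."""
    found = []
    device = None
    for line in iostat_text.splitlines():
        dev_match = DEVICE_RE.match(line)
        if dev_match:
            device = dev_match.group(1)
            continue
        ven_match = VENDOR_RE.search(line)
        if ven_match and device and ven_match.groups() in LVD_DRIVES:
            found.append(device)
    return found


# First lines of the sample output shown in this FIN.
SAMPLE = """\
sd10     Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
Vendor: FUJITSU  Product: MAP3367NC SUN18G  Revision: 0804 Serial No: 05P32232
Size: 18.11GB <18110967808 bytes>
"""

print(find_lvd_devices(SAMPLE))
```

Any device name returned by find_lvd_devices() is a candidate LVD drive
and should be confirmed against the drive label before replacement.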

Under normal conditions, LVD drives can be used as a one-for-one
replacement for single-ended SCSI disks.  In the configuration
detailed above, however, the LVD disks, the extended SCSI bus length
of the cluster configuration, and the use of differing HBAs in a
cross-connected fashion all contribute to the SCSI signal degradation
that leads to the SCSI errors.

Replacing the existing LVD disks in affected Netra D130 arrays with
single-ended SCSI disks resolves the SCSI error issue.
Implementation: 
         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        |   |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---
Corrective Action: 
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned issue.

When replacing failed drives in the D130 arrays, use only a single-ended
SCSI drive as the replacement disk.  The following procedure must be
followed to ensure that an appropriate drive is shipped as a
replacement.

   1. The FE/SSE needs to contact the RSL directly (the Parts Call
      Center will not be able to do this for them).  DO NOT advise
      the RSL to open the box.  Having RSLs open boxes introduces risk
      to the part, as they do not have any ESD or parts handling
      training.  Instead, the RSL should check the drive base part
      number that is noted on the external box label:

           TOP LEVEL FRU:  540-4689

             Different versions of bare drives that go into this
             same top level FRU part number are noted on the label:
               
                 * 390-0050 Single Ended SCSI (OK)
                 * 390-0051 Single Ended SCSI (OK)
                 * 390-0052 Single Ended SCSI (OK)
                 * 390-0069 LVD SCSI (not acceptable)
                 * 390-0109 LVD SCSI (not acceptable)
                 * 390-0156 LVD SCSI (not acceptable)

   2. Make sure FE/SSEs are aware that having the RSLs manually check
      the boxes may slightly increase the order processing time.
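
The label check in step 1 amounts to comparing the base part number
against the two lists above.  The following is a minimal sketch of that
comparison, not a Sun tool.  It assumes the part number is read from
the box label in the form 390-NNNN or 390-NNNN-NN, and the function
name is illustrative only.

```python
# Bare-drive base part numbers listed in this FIN for top level FRU 540-4689.
SINGLE_ENDED = {"390-0050", "390-0051", "390-0052"}
LVD = {"390-0069", "390-0109", "390-0156"}


def classify_drive(part_number):
    """Classify a label part number such as '390-0069-03' by its base number."""
    # Keep only the first two dash-separated fields (drop any dash suffix).
    base = "-".join(part_number.strip().split("-")[:2])
    if base in SINGLE_ENDED:
        return "single-ended (OK)"
    if base in LVD:
        return "LVD (not acceptable)"
    return "unknown (not listed in this FIN)"


# '390-0999' is a made-up number used only to show the fallback case.
for label in ("390-0051", "390-0109-05", "390-0999"):
    print(label, "->", classify_drive(label))
```

A part number that is not on either list should be treated as
unverified rather than as acceptable.
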
Comments: 
Acronyms used for this FIN:

   LVD - Low Voltage Differential

============================================================================

NOTE: FIN Tracking Instructions for Radiance/SPWeb:
--------------------------------------------------

If a Radiance case involves the application of a FIN to solve a customer
issue, please complete the following steps in Radiance/SPWeb prior to
closing the case:
 
    o Select "Field Information Notice" in the REFERENCE TYPE field.

    o Enter FIN ID number in the REFERENCE ID field.
      For example: I1111-1.

If possible, include additional details in the REFERENCE SUMMARY field
(e.g., implementation complete, customer declined).
--------------------------------------------------------------------------


Implementation Notes:
--------------------

In case of "Mandatory" FINs, Sun Services will attempt to contact
all known customers to recommend proactive implementation.

For "Controlled Proactive" FINs, Sun Services mission critical
support teams will initiate proactive implementation efforts for
their respective accounts as required.

For "Reactive" FINs, Sun Services and partners will implement
the necessary corrective actions as the need arises.


Billing Information:
-------------------

Warranty: On-Site Labor Rates are based on specified Warranty deliverables
          for the affected product.

Contract: On-Site Labor Rates are based on the type of service contract.

Non Contract: On-Site implementation by Sun is available based on On-Site
              Labor Rates defined in the Price List.

--------------------------------------------------------------------------

All FIN documents are accessible via Internal SunSolve.  Type "sunsolve"
in a browser and follow the prompts to Search Collections.

For questions on this document, please email:

        [email protected]

The FIN and FCO homepage is available at:

        http://sdpsweb.central/FIN_FCO/index.html

For more information on how to submit a FIN, go to:

        http://pronto.central/fin.html

To access the Service Partner Exchange, use:

        https://spe.sun.com
--------------------------------------------------------------------------