Document Audience:	INTERNAL
Document ID:	I0939-1
Title:	VxDMP I/O fail back during Microcode upgrade on SE9900 systems can result in data corruption.
Copyright Notice:	Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved
Update Date:	2005-04-14

---------------------------------------------------------------------
            - Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
                        FIELD INFORMATION NOTICE
               (For Authorized Distribution by SunService)

FIN #: I0939-1

Synopsis: VxDMP I/O fail back during Microcode upgrade on SE9900 systems can result in data corruption.

Create Date: Feb/28/03

SunAlert: No

Top FIN/FCO Report: Yes

Products Reference: Sun StorEdge 99x0 Arrays

Product Category: Storage / Service

Product Affected:

Systems Affected:
-----------------  
Mkt_ID    Platform    Model    Description                  Serial Number
------    --------    -----    -----------                  -------------
  -        ANYSYS       -      System Platform Independent        -   


X-Options Affected:
-------------------
Mkt_ID     Platform        Model     Description          Serial Number
------     --------        -----     -----------          -------------
 -         SE9910     ALL    Sun StorEdge 9910 Array         -
 -         SE9960     ALL    Sun StorEdge 9960 Array         -
 -         SE9970     ALL    Sun StorEdge 9970 Array         -
 -         SE9980     ALL    Sun StorEdge 9980 Array         -

Parts Affected:

Part Number     Description           Model
-----------     -----------           -----
     -               -                  -

References:

DOC: SE99xx Maintenance manual. 
     Veritas Volume Manager Administrator's Guide.
 
URL: http://pts-americas.west/nws/products/T99x0/documentation.html, 
     http://www.veritas.com

Issue Description:

When performing an online Microcode upgrade of a SE9900 array using the
"Alternate SCSI Path" mode, if the host is utilizing VxDMP for
multipath failover, it is possible to inadvertently shut down all paths
from the host to the array, thereby causing an outage and possibly data
corruption/loss.

By default, VxDMP checks the health of an offlined path only once every
300 seconds.  Therefore, if the support engineer relies on the auto
failover capabilities of VxDMP to offline and online multiple paths to
the host, it is possible for the Microcode upgrade process to:

   1. Down the paths from cluster 1

   2. Perform the necessary updates to cluster 1
 
   3. Bring cluster 1 back up
  
   4. Then, before VxDMP has recognized that cluster 1 is back online, down 
      cluster 2 (which VxDMP believes is the only good path). 

This causes the host to lose all access to the array.  The sequence can
be seen in the /var/adm/messages file of the host, where path1 goes
offline before path0 is reported back online:

   Sept 20th 19:40:43  HDLM: [ID 936769 kern.info] 1A down 
   dmp path0 offline

   Sept 20th 19:42:37  HDLM: [ID 936770 kern.info] 2A down 
   dmp path1 offline

   Sept 20th 19:44:37  HDLM: [ID 936769 kern.info] 1A up 
   dmp path0 online
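
The exposure window in the excerpt above can be worked out from the
timestamps.  The following is a minimal sketch, not part of any Sun or
Veritas tool; the function names to_seconds and gap_seconds are
hypothetical, and it assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the number of seconds between two same-day timestamps.
# $1 = earlier time, $2 = later time
gap_seconds() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# path1 offline -> path0 online: time with no usable path
gap_seconds 19:42:37 19:44:37
# path0 offline -> path0 online: time VxDMP kept path0 offline
gap_seconds 19:40:43 19:44:37
```

The two calls print 120 and 234.  In this excerpt the host had no usable
path for 120 seconds, and path0 was kept offline for 234 seconds, which
is within the 300 second polling interval described above.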


The VxDMP health check period for an offlined path can be verified, or
changed, in the following file:

   /etc/rcS.d/S25vxvm-sysboot

     # By default, the restore daemon will check the health of
     # only disabled paths with a polling interval of 300sec.

     restore_daemon_opts="interval=300 policy=check_disabled"
   
     # Uncomment the following line to turn on checking for all
     # the paths on the system with polling interval of 300sec.

     # restore_daemon_opts="interval=300 policy=check_all"
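
The active setting can also be read from the command line.  The
following is a minimal sketch that assumes the file layout shown above;
the function name show_restore_opts is hypothetical and is not part of
VxVM, and the character class syntax assumes a POSIX grep and sed.

```shell
#!/bin/sh
# Hypothetical helper (not part of VxVM): print the active, uncommented
# restore_daemon_opts assignment from a VxVM startup script.
# $1 = path to the script; defaults to the path named in this FIN.
show_restore_opts() {
    file=${1:-/etc/rcS.d/S25vxvm-sysboot}
    # Commented-out lines start with '#', so they do not match.
    # If several active lines exist, the last one is printed.
    grep '^[[:space:]]*restore_daemon_opts=' "$file" |
        tail -1 |
        sed 's/^[[:space:]]*//'
}

# Usage on a host:
#   show_restore_opts
```

With the excerpt above, the function would print the line containing
interval=300 and policy=check_disabled.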

Implementation:

         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        | X |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---

Corrective Action:

The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned problem.    
    
The proper corrective action to avoid this type of outage is to ensure
availability of multiple paths prior to the Microcode upgrade and to
manually fail over paths through the upgrade process.

To verify availability of all paths prior to the upgrade, execute the
"vxdmpadm listctlr all" command from the host.  The result should look
similar to this:

   root[sh]@test# vxdmpadm listctlr all

   CTLR-NAME       ENCLR-TYPE      STATE      ENCLR-NAME
   =====================================================
   c0              Disk            ENABLED    Disk
   c16             Disk            ENABLED    Disk
   c17             Disk            ENABLED    Disk

This indicates that both paths (c16 and c17) are enabled.
In addition, you can verify multiple paths to a particular disk by
executing "vxdisk list" with the device name, for example:

   root[sh]@test# vxdisk list c16t1d30s2
   Device:    c16t1d30s2
   devicetag: c16t1d30
   type:      sliced
   hostid:    test
   disk:      name=test2dg31 id=1037654212.4042.test
   group:     name=test2dg id=1037658550.4056.test
   flags:     online ready private autoconfig autoimport imported
   pubpaths:  block=/dev/vx/dmp/c16t1d30s4 char=/dev/vx/rdmp/c16t1d30s4
   privpaths: block=/dev/vx/dmp/c16t1d30s3 char=/dev/vx/rdmp/c16t1d30s3
   version:   2.2
   iosize:    min=512 (bytes) max=2048 (blocks)
   public:    slice=4 offset=0 len=4397056
   private:   slice=3 offset=1 len=2047
   update:    time=1041990512 seqno=0.8
   headers:   0 248
   configs:   count=1 len=1486
   logs:      count=1 len=225
   Defined regions:
    config   priv 000017-000247[000231]: copy=01 offset=000000 disabled
    config   priv 000249-001503[001255]: copy=01 offset=000231 disabled
    log      priv 001504-001728[000225]: copy=01 offset=000000 disabled
   Multipathing information:
   numpaths:   2
   c16t1d30s2      state=enabled
   c17t1d30s2      state=enabled
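
Both pre-upgrade checks can be scripted rather than read by eye.  The
following is a minimal sketch that assumes the output formats shown
above; the function names all_ctlrs_enabled and enabled_path_count are
hypothetical and are not part of VxVM.  Both functions read command
output on standard input.

```shell
#!/bin/sh
# Succeeds only if every controller row of "vxdmpadm listctlr all"
# has STATE ENABLED.  Prints each controller that does not.
all_ctlrs_enabled() {
    awk '
        $1 ~ /^=+$/ { body = 1; next }   # rows start after the ==== rule
        body && NF >= 3 && $3 != "ENABLED" { print $1 " is " $3; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Prints the number of paths listed as state=enabled in the
# "Multipathing information" part of "vxdisk list" output.
enabled_path_count() {
    awk '/state=enabled/ { n++ } END { print n + 0 }'
}

# Usage on a host (device name taken from the example above):
#   vxdmpadm listctlr all | all_ctlrs_enabled || echo "do not start"
#   [ "$(vxdisk list c16t1d30s2 | enabled_path_count)" -ge 2 ] ||
#       echo "fewer than two enabled paths"
```

The upgrade should not be started unless both checks pass for the
devices on the array being upgraded.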

When performing the online Microcode upgrade using the "Alternate SCSI
Path" method, you will be prompted to discontinue I/O through a
particular path with the message:

  "Switch the SCSI channel path that is connected to the cluster-1, to 
  an alternate path.  Then select OK."

Before clicking "OK" you must disable the specified path from the host
by entering "vxdmpadm disable ctlr=" followed by the controller name.
Verify that the path is down by once again running "vxdmpadm listctlr
all", which should give output similar to the following (here c16 has
been disabled):

   root[sh]@test# vxdmpadm listctlr all

   CTLR-NAME       ENCLR-TYPE      STATE      ENCLR-NAME
   =====================================================
   c0              Disk            ENABLED    Disk
   c16             Disk            DISABLED   Disk
   c17             Disk            ENABLED    Disk


*NOTE: Make sure you bring down the correct path, or when you click "OK"  
       on the SVP you will bring down the host!

The Microcode installation process will then ask you to stop I/O on the
next path (cluster-2).  Before continuing with this process you must
bring the first path back online by running "vxdmpadm enable ctlr="
followed by the controller name, then run "vxdmpadm listctlr all" and
ensure that the controller once again has the state of "ENABLED".  You
can then proceed to disable the second controller using the same method
("vxdmpadm disable ctlr=" followed by the controller name) and verify
that it is disabled before clicking "OK" on the SVP.  After the
microcode upgrade on cluster-2 has completed, you must re-enable that
path with the vxdmpadm command as shown above.

Make sure to verify that when you have completed the Microcode upgrade,
all paths are once again enabled and accessible.
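
The disable, verify, enable, verify sequence described above can be
wrapped so that each state change is confirmed before the next step.
The following is a minimal sketch, not a supported tool: the function
names ctlr_state and set_ctlr and the VXDMPADM override variable are
hypothetical, the mapping of c16 to cluster-1 and c17 to cluster-2 is
assumed for illustration only, and the awk -v option assumes a POSIX
awk (nawk on older Solaris releases).

```shell
#!/bin/sh
# VXDMPADM may be pointed at a stub for a dry run.
VXDMPADM=${VXDMPADM:-vxdmpadm}

# Print the STATE column of "vxdmpadm listctlr all" for controller $1.
ctlr_state() {
    "$VXDMPADM" listctlr all | awk -v c="$1" '$1 == c { print $3 }'
}

# Run "vxdmpadm enable|disable ctlr=<name>" and confirm the new state.
# $1 = enable or disable, $2 = controller name (for example c16)
set_ctlr() {
    case $1 in
        enable)  want=ENABLED ;;
        disable) want=DISABLED ;;
        *) echo "usage: set_ctlr enable|disable <ctlr>" >&2; return 2 ;;
    esac
    "$VXDMPADM" "$1" "ctlr=$2" || return 1
    state=$(ctlr_state "$2")
    if [ "$state" != "$want" ]; then
        echo "$2 is '$state', expected $want: do not click OK" >&2
        return 1
    fi
    echo "$2 is $state"
}

# Order of operations (controller names are examples only):
#   set_ctlr disable c16   # then click OK on the SVP for cluster-1
#   set_ctlr enable  c16   # must succeed before c17 is touched
#   set_ctlr disable c17   # then click OK on the SVP for cluster-2
#   set_ctlr enable  c17   # finally, recheck that all paths are ENABLED
```

If any call fails, stop and resolve the path state from the host before
answering the next prompt on the SVP.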

For more information regarding the Microcode upgrade process look at
the Micrc-FC section of the relevant maintenance manual.

For more information regarding Veritas VxDMP commands look at the
Veritas Volume Manager Administrator's Guide, section 3, page 95.

Comments:

None.

============================================================================

Implementation Footnote:

i)   In case of MANDATORY FINs, Sun Services will attempt to contact   
     all affected customers to recommend implementation of the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Sun Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Sun Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.central/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Central/

* From there, select the appropriate link to browse the FIN or FCO index.

Internet Access:
----------------
* Access the top level URL of https://spe.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
--------------------------------------------------------------------------