Document Audience: INTERNAL
Document ID: I0854-3
Title: Invalid Device structure found after disk removal when under Volume Manager 3.2 Control.
Copyright Notice: Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved
Update Date: 2004-07-27

---------------------------------------------------------
            - Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                        FIELD INFORMATION NOTICE
               (For Authorized Distribution by SunService)
FIN #: I0854-3
Synopsis: Invalid Device structure found after disk removal when under Volume Manager 3.2 Control.
Create Date: Jul/27/04
SunAlert: No
Top FIN/FCO Report: Yes
Products Reference: Sun Fire V880/V480, E3500 Servers, A5x00 StorEdge Array
Product Category: Server / SW Admin
Product Affected: 
Systems Affected:
-----------------  
Mkt_ID   Platform   Model   Description             Serial Number
------   --------   -----   -----------             -------------
  -        A30       ALL    Sun Fire V880                 -
  -        A37       ALL    Sun Fire V480                 -
  -        E3500     ALL    Ultra Enterprise 3500         -


X-Options Affected:
-------------------
Mkt_ID   Platform   Model   Description             Serial Number
------   --------   -----   -----------             -------------
  -       A5x00      ALL    A5x00 Storage Array           -
Parts Affected: 
Part Number   Description           Model
-----------   -----------           -----
     -             -                  -
References: 
BugId:   4630477 - bogus device in "vxdisk list" output after replacing 
                   a disk drive.

PatchId: 113201-03: VxVM 3.2: general patch for Solaris 2.6; 7; and 8.
         112385-05: VxVM 3.2s9: general patch for Solaris 9.
                
ESC:     534731- bogus device in `vxdisk list` output after replacing a 
                 disk drive.

URL:     http://sdn.sfbay/cgi-bin/escweb?-I024271?-M534731?-P1
Issue Description: 
--------------------------------------------------------------------------
| Change History from FIN I0854-2 to FIN I0854-3                           |
| ==============================================                           |
|    Date Modified: July 27, 2004                                          |
|                                                                          |
|      Updated Sections: Corrective Action                                 |
|                                                                          |
|      CORRECTIVE ACTION: Eliminated temporary workaround fix and replaced |
|                         with permanent patch fix.                        |
|                                                                          |
|                                                                          |
| Change History from I0854-1 on Dec 17, 2002                              |
| ===========================================                              |
|    Date Modified: Nov/24/2003                                            |
|                                                                          |
|    Updates:  CORRECTIVE ACTION:                                          |
|                                                                          |
|    CORRECTIVE ACTION: . Added a procedure for 'disk replacement for Root |
|                         Disk', and 'V880 disk replacement'.              |
|                                                                          |
|                       . Added a procedure for Manual replacement on      |
|                         Volume Manager Disk w/CLI on 'Mirrored boot disk'|
|                         and 'Standard mirrored data disk'.               |
--------------------------------------------------------------------------

After an FCAL disk under VM 3.2 control is removed from the internal
disk sub-system of a V880, V480, or E3500, or from an A5X00, an 'Invalid
device structure' error message can be seen.  This causes the
replacement of the disk to fail.  When this happens, a reboot is
needed to allow replacement of the disk to proceed.  Having to reboot
the system nullifies the advantage of disk hotswap and causes
unnecessary downtime.

The following example shows the removal of a disk under VM 3.2 control
and the invalid device entry that is left behind:

   # vxdiskadm

     Volume Manager Support Operations
     Menu: VolumeManager/Disk

       1      Add or initialize one or more disks
       2      Encapsulate one or more disks
       3      Remove a disk
       4      Remove a disk for replacement
       5      Replace a failed or removed disk
       6      Mirror volumes on a disk
       7      Move volumes from a disk
       8      Enable access to (import) a disk group
       9      Remove access to (deport) a disk group
      10      Enable (online) a disk device
      11      Disable (offline) a disk device
      12      Mark a disk as a spare for a disk group
      13      Turn off the spare flag on a disk
      14      Unrelocate subdisks back to a disk
      15      Exclude a disk from hot-relocation use
      16      Make a disk available for hot-relocation use
      17      Prevent multipathing/Suppress devices from VxVM's view
      18      Allow multipathing/Unsuppress devices from VxVM's view
      19      List currently suppressed/non-multipathed devices
      20      Change the disk naming scheme
      21      Get the newly connected/zoned disks in VxVM view
 
      list   List disk information

       ?      Display help about menu
      ??      Display help about the menuing system
       q      Exit from menus

      Select an operation to perform: 4

    Remove a disk for replacement
    Menu: VolumeManager/Disk/RemoveForReplace

Use this menu operation to remove a physical disk from a disk group,
while retaining the disk name.  This changes the state for the disk
name to a "removed" disk.  If there are any initialized disks that are
not part of a disk group, you will be given the option of using one of
these disks as a replacement.

    Enter disk name [<disk>,list,q,?] list

       Disk group: rootdg

       DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE

       dm disk01       c1t11d0s2    sliced   4711     35358848 -
       dm disk02       c1t13d0s2    sliced   4711     35358848 -

    Enter disk name [<disk>,list,q,?] disk01

    The following volumes will lose mirrors as a result of this operation:

       vol01

       No data on these volumes will be lost.

    The following devices are available as replacements:

       c1t2d0
        
    Choose one of these disks now, to replace disk01.
    Select "none" if you do not wish to select a replacement disk.

    Choose a device, or select "none"
    [<device>,none,q,?] (default: c1t2d0) none

The requested operation is to remove disk disk01 from disk group
rootdg.  The disk name will be kept, along with any volumes using the
disk, allowing replacement of the disk.

    Select "Replace a failed or removed disk" from the main menu
    when you wish to replace the disk.

  Continue with operation? [y,n,q,?] (default: y) 

    Removal of disk disk01 completed successfully.

  Remove another disk? [y,n,q,?] (default: n) 
  
    Volume Manager Support Operations
    Menu: VolumeManager/Disk

      1      Add or initialize one or more disks
      2      Encapsulate one or more disks
      3      Remove a disk
      4      Remove a disk for replacement
      5      Replace a failed or removed disk
      6      Mirror volumes on a disk
      7      Move volumes from a disk
      8      Enable access to (import) a disk group
      9      Remove access to (deport) a disk group
     10      Enable (online) a disk device
     11      Disable (offline) a disk device
     12      Mark a disk as a spare for a disk group
     13      Turn off the spare flag on a disk
     14      Unrelocate subdisks back to a disk
     15      Exclude a disk from hot-relocation use
     16      Make a disk available for hot-relocation use
     17      Prevent multipathing/Suppress devices from VxVM's view
     18      Allow multipathing/Unsuppress devices from VxVM's view
     19      List currently suppressed/non-multipathed devices
     20      Change the disk naming scheme
     21      Get the newly connected/zoned disks in VxVM view
 
     list   List disk information

      ?      Display help about menu
     ??      Display help about the menuing system
      q      Exit from menus

   Select an operation to perform: q

   Goodbye.

     # luxadm remove_device /dev/rdsk/c1t11d0s2

WARNING!!! Please ensure that no filesystems are mounted on these device(s).
           All data on these devices should have been backed up.

The list of devices being used (either busy or reserved) by the host:
     1: Box Name:    "dak"  slot 9

Please enter 's' or <CR> to Skip the "busy/reserved" device(s) or
     'q' to Quit and run the subcommand with
     -F (force) option. [Default: s]: 

   # luxadm remove_device -F /dev/rdsk/c1t11d0s2 --> Had to Force removal
     of device.

WARNING!!! Please ensure that no filesystems are mounted on these device(s).
           All data on these devices should have been backed up.

  The list of devices which will be removed is:
  
     1: Box Name:    "dak" slot 9
        Node WWN:    2000002037d9ff50
        Device Type:Disk device
        Device Paths:
        /dev/rdsk/c1t11d0s2

  Please verify the above list of devices and then enter 'c' or <CR> to
  Continue or 'q' to Quit. [Default: c]:
 
    stopping:  Drive in "dak" slot 9....Done
    offlining: Drive in "dak" slot 9....Done

  Hit <Return> after removing the device(s).
  Jun  6 08:51:52 eis-dak-f picld[233]: Device DISK9 removed
  Jun  6 08:51:52 eis-dak-f picld[233]: Device DISK9 removed

  Drive in Box Name "dak" slot 9
  Logical Nodes being removed under /dev/dsk/ and /dev/rdsk:
        c1t11d0s0
        c1t11d0s1
        c1t11d0s2
        c1t11d0s3
        c1t11d0s4
        c1t11d0s5
        c1t11d0s6
        c1t11d0s7

   # format
     Searching for disks...done

     AVAILABLE DISK SELECTIONS:

       0. c1t0d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037f3d422,0
       1. c1t1d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037bd2c91,0
       2. c1t2d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff6a,0
       3. c1t3d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff44,0
       4. c1t4d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff4c,0
       5. c1t5d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037e6057a,0
       6. c1t8d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9fd70,0
       7. c1t9d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff5d,0
       8. c1t10d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff56,0
       9. c1t11d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0
      10. c1t12d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037bde2dc,0
      11. c1t13d0 
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff46,0

    Specify disk (enter its number): 1
    selecting c1t1d0
    [disk formatted]

    FORMAT MENU:

        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit

  format> q

  # devfsadm -C

  # cd /dev/rdsk

  # ls -al c1t11*

    lrwxrwxrwx   1 root     root          74 May 22 08:10 c1t11d0s0 -> 
    ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:a,raw
    lrwxrwxrwx   1 root     root          74 May 22 08:10 c1t11d0s1 -> 
    ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:b,raw
    lrwxrwxrwx   1 root     root          74 May 22 08:10 c1t11d0s2 -> 
    ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:c,raw
    lrwxrwxrwx   1 root     root          74 May 22 08:10 c1t11d0s3 -> 
    ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:d,raw
    lrwxrwxrwx   1 root     root          74 May 22 08:10 c1t11d0s4 -> 
    ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:e,raw
    lrwxrwxrwx   1 root     root          74 May 22 08:10 c1t11d0s5 -> 
    ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:f,raw
    lrwxrwxrwx   1 root     root          74 May 22 08:10 c1t11d0s6 -> 
    ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:g,raw
    lrwxrwxrwx   1 root     root          74 May 22 08:10 c1t11d0s7 -> 
    ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:h,raw

  None of the above devices should exist at this point.

  # luxadm insert_device

  Please hit <RETURN> when finished adding Fibre Channel Enclosure(s)/Device(s): 
     Jun  6 08:54:49 eis-dak-f picld[233]: Device DISK9 inserted
     Jun  6 08:54:49 eis-dak-f picld[233]: Device DISK9 inserted

  Waiting for Loop Initialization to complete...
  New Logical Nodes under /dev/dsk and /dev/rdsk :

        c1t11d0s0
        c1t11d0s1
        c1t11d0s2
        c1t11d0s3
        c1t11d0s4
        c1t11d0s5
        c1t11d0s6
        c1t11d0s7

 No new enclosure(s) were added!!

  # vxdctl enable
    Jun  6 08:56:37 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: enabled path 118/0x48
         belonging to the dmpnode 239/0x10
    Jun  6 08:56:37 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: enabled path 118/0x48
         belonging to the dmpnode 239/0x10
    Jun  6 08:56:37 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: enabled dmpnode
         239/0x10
    Jun  6 08:56:37 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: enabled dmpnode
         239/0x10
    Jun  6 08:56:41 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: disabled path 118/0x208
         belonging to the dmpnode 239/0x8
    Jun  6 08:56:41 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: disabled path 118/0x208
         belonging to the dmpnode 239/0x8
    Jun  6 08:56:41 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: disabled dmpnode
         239/0x8
    Jun  6 08:56:41 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: disabled dmpnode
         239/0x8

  # vxdisk list    

    DEVICE       TYPE      DISK         GROUP        STATUS

    c1t0d0s2     sliced    -            -            error
    c1t1d0s2     sliced    -            -            error
    c1t2d0s2     sliced    -            -            online
    c1t3d0s2     sliced    -            -            error
    c1t4d0s2     sliced    -            -            online
    c1t5d0s2     sliced    -            -            error
    c1t8d0s2     sliced    -            -            online
    c1t9d0s2     sliced    -            -            online
    c1t10d0s2    sliced    -            -            online
    c1t11d0s2    sliced    -            -            error
    c1t11d0s2    sliced    -            -            error --> Invalid Device
    c1t12d0s2    sliced    -            -            error
    c1t13d0s2    sliced    disk02       rootdg       online
    -            -         disk01       rootdg       removed was:c1t11d0s2

The root cause of the problem is that, if VxVM is not disabled when the
drive is removed, the drive's device node is left dangling instead of
being removed from the device tree.  After the new drive is inserted, a
new node is created on the device tree, and the dangling device node
remains alongside it as the invalid entry.
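
The duplicate entry shown in the `vxdisk list` output above can be
spotted mechanically.  The following is a minimal sketch only, not part
of the original FIN: the function name find_duplicate_vx_devices is
hypothetical, and it assumes the listing has the column layout shown
above (device name in the first column).  It reads the listing on
standard input, so it can be run against saved output as well as a
live system.

```shell
# Report device names that appear more than once in `vxdisk list` output.
# Hypothetical helper; reads the listing on stdin.
# Skips the header line, blank lines, and "-" placeholder rows
# (such as the "removed was:" row for a removed disk).
find_duplicate_vx_devices() {
    awk '$1 != "DEVICE" && $1 != "-" && NF > 0 { count[$1]++ }
         END { for (dev in count) if (count[dev] > 1) print dev }' | sort
}

# On a live system (assumes vxdisk is in PATH):
#   vxdisk list | find_duplicate_vx_devices
```

An empty result suggests no duplicate device entries are present; any
name printed is a candidate for the invalid device described above.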
Implementation: 
         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        | X |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---
Corrective Action: 
The following recommendation is provided as a guideline for authorized
SUN Services Field Representatives who may encounter the above
mentioned issue.

This issue is fixed in patch 113201-03 (Solaris 2.6, 7, and 8) and patch
112385-05 (Solaris 9).  Apply the appropriate patch to avoid the
issue.
 
Once the patch is applied, the workaround in bug 4630477 should no longer
be required, and the normal disk replacement procedure should be followed
to replace a failed disk.
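
Before relying on the normal procedure, it may help to confirm that the
patch is present at the required revision.  The following is a sketch
under stated assumptions, not an official check: the function name
patch_at_least is hypothetical, and it assumes `showrev -p` prints one
line per patch beginning with "Patch: <id>-<rev>".  It reads that
output on standard input.

```shell
# Succeed (exit 0) if the patch list on stdin contains patch $1 at
# revision $2 or later; exit 1 otherwise.
# Hypothetical helper. Assumes lines of the form "Patch: 113201-03 ...".
# On Solaris, use nawk if the default awk does not accept -v.
patch_at_least() {
    awk -v id="$1" -v min="$2" '
        $1 == "Patch:" {
            split($2, p, "-")
            # Compare revisions numerically so that "03" equals 3.
            if (p[1] == id && p[2] + 0 >= min + 0) found = 1
        }
        END { exit(found ? 0 : 1) }'
}

# On a live system:
#   showrev -p | patch_at_least 113201 03 && echo "patch level OK"
```

For Solaris 9, the same check would be run with 112385 and 05.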
Comments: 
Using the 'vxdisk rm' command in conjunction with 'luxadm -e offline'
allows replacement of disks to succeed without a reboot, maintaining
uptime and availability and keeping disk hotswap a viable solution.
The advantage of this procedure is that disk replacement succeeds
without a reboot whether the disk was pulled before any VM action was
taken or was removed using the procedure outlined in the steps above.
With reference to SRDB 17003, which uses option 11 to offline a disk
through VM: in some cases this works, but in cases where the disk has
physically been pulled first, option 11 does not clean up any opens
left on the device, so the duplicate entries remain.
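
For unpatched systems, the commands named in this FIN can be laid out
as a dry-run plan.  The following is an illustrative sketch only: the
function name print_hotswap_plan is hypothetical, the ordering of the
commands is an assumption, and it is not the text of the workaround
recorded in bug 4630477.  The function only prints the commands; it
does not run them.

```shell
# Print (do not run) a candidate command sequence for hot-swapping one
# disk on an unpatched system. $1 is the VxVM device name, for example
# c1t11d0s2. The ordering is an assumption for illustration and should
# be checked against the workaround in bug 4630477 before use.
print_hotswap_plan() {
    dev=$1
    echo "vxdisk rm $dev"                    # drop the VxVM device record
    echo "luxadm -e offline /dev/rdsk/$dev"  # offline the FC device path
    echo "devfsadm -C"                       # clean up dangling /dev links
    echo "luxadm insert_device"              # run when inserting the new disk
    echo "vxdctl enable"                     # have VxVM rescan its devices
}

# Example:
#   print_hotswap_plan c1t11d0s2
```

Printing the plan rather than executing it leaves each step to be
reviewed and run by hand, in keeping with the manual procedure shown
earlier in this FIN.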

============================================================================
Implementation Footnote: 
i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
--------------------------------------------------------------------------
Status: active