Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1003122.1 : Veritas Volume Manager - Procedure to Replace Internal FibreChannel (FC) Disks controlled by VxVM
PreviouslyPublishedAs 204286

Description

A specific procedure must be used when replacing one of the internal disks in a system with internal fibre drives (Sun Fire[TM] 280R, Sun Fire[TM] V480, Sun Fire[TM] V490, Sun Fire[TM] V880, Sun Fire[TM] V890), especially if the disk is under Veritas Volume Manager (VxVM) control. Although these disks are hot-swappable, the procedure below should be used to alert VxVM to the fact that the drive is being replaced.

This document applies to Veritas Volume Manager (VxVM) 3.x and above. This document also assumes that you are running either:

- Solaris[TM] 8, 9 or 10 Operating System (OS), or
- Solaris[TM] 7 OS, with kernel patch 106541-08 or higher. This is required to get the functionality of the devfsadm command.

Failure to follow this procedure could result in a duplicate entry for the replaced disk in VxVM. This is most notable when running a 'vxdisk list' command. For example:

    # vxdisk list
    DEVICE       TYPE      DISK         GROUP        STATUS
    c1t0d0s2     sliced    rootdisk     rootdg       online
    c1t1d0s2     sliced    -            -            error
    c1t1d0s2     sliced    -            -            error

The extra device will disappear after the next reboot, which seems to be the only way to remove it. Therefore, it is best to prevent the duplicate device from being created in the first place. This is accomplished by the following procedure.

Steps 9a - 9c pertain only to Sun[TM] Cluster 3.x installations. If the disk is not under VxVM control, you can skip steps 2,4,9-11.

Steps to Follow

NOTE: All data on these devices should have been backed up.

Before replacing any disk under VxVM control, it should be in either a 'failed' or 'removed' state:

    # vxdisk list
    DEVICE       TYPE      DISK         GROUP        STATUS
    c1t0d0s2     sliced    rootdisk     rootdg       online
    c1t1d0s2     sliced    -            -            online
    -            -         disk01       rootdg       failed was:c1t1d0s2

If the disk does not show up as "failed was", as shown above, then you should run 'vxdiskadm' and choose option #4 to remove the disk for replacement.
After running 'vxdiskadm', the output should look like this:

    # vxdisk list
    DEVICE       TYPE      DISK         GROUP        STATUS
    c1t0d0s2     sliced    rootdisk     rootdg       online
    c1t1d0s2     sliced    -            -            online
    -            -         disk01       rootdg       removed was:c1t1d0s2
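The state check described above can be scripted. The following is a minimal sketch, not part of VxVM: the function name vx_replace_state is hypothetical, and it assumes the 'vxdisk list' layout shown in this document, where the placeholder record has "-" in the DEVICE column and ends with "was:<device>". It also assumes an awk that accepts -v; on Solaris that may mean nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk.

```shell
#!/bin/sh
# Hypothetical helper (not a VxVM command): reads `vxdisk list` output on
# stdin and prints "failed", "removed", or "none" for the device in $1.
vx_replace_state() {
    awk -v dev="$1" '
        # Placeholder records start with "-" and end with "was:<device>";
        # the field just before "was:..." is the state.
        $1 == "-" && $NF == "was:" dev { state = $(NF - 1) }
        END {
            if (state == "failed" || state == "removed")
                print state
            else
                print "none"
        }
    '
}

# Sample input copied from the output shown in this document.
# On a live system one would run: vxdisk list | vx_replace_state c1t1d0s2
sample='DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 sliced rootdisk rootdg online
c1t1d0s2 sliced - - online
- - disk01 rootdg removed was:c1t1d0s2'

printf '%s\n' "$sample" | vx_replace_state c1t1d0s2   # prints: removed
```

A result of "none" would suggest that 'vxdiskadm' option #4 has not yet been run for that disk.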
NOTE:
For example,

    # ls -al /dev/rdsk/c1t0d0s0
    lrwxrwxrwx 1 root root 74 Mar 6 2003 c1t0d0s0 -> ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0:a,raw
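The WWN is the 16-hex-digit string that follows "ssd@w" in the link target. As a rough sketch, it can be pulled out of such a path with sed; the function name wwn_from_path is hypothetical and the pattern assumes the "ssd@w<WWN>," form shown in this document.

```shell
#!/bin/sh
# Hypothetical helper: print the 16-hex-digit WWN embedded in a device path
# of the form .../ssd@w<WWN>,0:a  (prints nothing if no such WWN is found).
wwn_from_path() {
    printf '%s\n' "$1" | sed -n 's/.*ssd@w\([0-9a-fA-F]\{16\}\),.*/\1/p'
}

# Link target copied from the ls output shown in this document.
# On a live system one could pass "$(ls -l /dev/rdsk/c1t0d0s0)" instead.
link='../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0:a,raw'

wwn_from_path "$link"   # prints: 21000004cfa19920
```

Because the pattern insists on exactly 16 hex digits, a mistyped WWN in a devalias entry produces no output, which makes transcription errors easier to spot.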
For example,

    # eeprom nvramrc
    devalias rootdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0:a
    devalias mirrdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19838,0:a
    boot-device=rootdisk mirrdisk

1. If this is a root-disk or root-mirror, use the dumpadm command to ensure that the dump-device is not on the failed disk. If it is, move it to the good side of the mirror, for example:

    # dumpadm -d /dev/dsk/c1t0d0s1

2. If vxdiskadm option 4 is used to remove the disk for replacement, instruct VxVM to re-read the device tree by running the command

    # vxdctl enable

3. Put the disk into the "offline" state with the following command:

    # vxdisk offline c1t1d0s2

4. Verify the disk has been marked "offline" with "vxdisk list":

    # vxdisk list
    DEVICE       TYPE      DISK         GROUP        STATUS
    c1t0d0s2     sliced    rootdisk     rootdg       online
    c1t1d0s2     sliced    -            -            offline
    -            -         disk01       rootdg       removed was:c1t1d0s2

5. Once Veritas has recognized the disk as offline and ready for replacement, you need to tell the operating system. This is done as follows:

    # /usr/sbin/luxadm remove_device /dev/rdsk/c1t1d0s2

This will produce output similar to the following:

    WARNING!!! Please ensure that no file systems are mounted on these device(s).
    All data on these devices should have been backed up.

    The list of devices which will be removed is:
      1: Device name: /dev/rdsk/c1t1d0s2
         Node WWN: 20000020371b1f31
         Device Type: Disk device
         Device Paths:
           /dev/rdsk/c1t1d0s2

    Please verify the above list of devices and
    then enter c or <CR> to Continue or q to Quit. [Default: c]:c
    stopping: /dev/rdsk/c1t1d0s2.... Done
    offlining: /dev/rdsk/c1t1d0s2.... Done

    The drives are now off-line and spun down.
    Physically remove the disk and press the Return key.
    Hit <Return> after removing the device(s).
    <date> <systemname> picld[87]: Device DISK1 removed
    Device: /dev/rdsk/c1t1d0s2
    No FC devices found. - /dev/rdsk/c1t1d0s2

NOTE: The picld daemon notifies the system that the disk has been removed. If no errors are printed, continue to step 6.
Otherwise, if you receive any errors during this step, run:

    # vxdisk rm c1t1d0s2
    # luxadm -e offline /dev/rdsk/c1t1d0s2
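Since the purpose of handling the disk this way is to avoid the duplicate 'vxdisk list' entry described in the Description, it can be useful to check for duplicates before moving on. The following is a minimal sketch only: the function name vx_duplicate_devices is hypothetical, and it assumes the column layout of the 'vxdisk list' output shown in this document.

```shell
#!/bin/sh
# Hypothetical helper: reads `vxdisk list` output on stdin and prints every
# device name that appears more than once in the DEVICE column.
vx_duplicate_devices() {
    awk '
        # Skip the header line and the "-" placeholder records.
        NR > 1 && $1 != "-" { seen[$1]++ }
        END { for (d in seen) if (seen[d] > 1) print d }
    '
}

# Sample input copied from the duplicate-entry example in this document.
# On a live system one would run: vxdisk list | vx_duplicate_devices
sample='DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 sliced rootdisk rootdg online
c1t1d0s2 sliced - - error
c1t1d0s2 sliced - - error'

printf '%s\n' "$sample" | vx_duplicate_devices   # prints: c1t1d0s2
```

No output means that no device name is listed twice.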
6. Initiate devfsadm cleanup subroutines by entering the following command:

    # /usr/sbin/devfsadm -C -c disk

The default devfsadm operation is to attempt to load every driver in the system and attach these drivers to all possible device instances. The devfsadm command then creates device special files in the /devices directory, and logical links in /dev.

With the "-c disk" option, devfsadm will only update disk device files. This saves time and is important on systems that have tape devices attached. Rebuilding these tape devices could cause undesirable results on non-Sun hardware.

The -C option cleans up the /dev directory, and removes any lingering logical links to the device link names.

This should remove all the device paths for this particular disk. This can be verified with:

    # ls -ld /dev/dsk/c1t1d*

This should return no devices.

7. Verify that the reference to this disk is gone by running the commands

    # vxdisk list      (if the disk is under VxVM control)
    # format

It is now safe to physically replace the disk.

8. After replacing the disk, create the necessary entries in the Solaris OS device tree with one of the following commands:

    # devfsadm

or

    # /usr/sbin/luxadm insert_device <enclosure_name,sx>

where sx is the slot number.

NOTE: In many cases, luxadm insert_device does not require the enclosure name and slot number.

Use the following to find the slot number:

    # luxadm display <enclosure_name>

To find the <enclosure_name> use:

    # luxadm probe

Run "ls -ld /dev/dsk/c1t1d*" to verify that the new device paths have been created.

NOTE: After the disk is inserted and devfsadm (or luxadm) is run, the old ssd id changes to a new one. This change can be ignored. For example, suppose an error occurred on the following disk (ssd3):
    WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0 (ssd3):
    Error for Command: read(10) Error Level: Retryable
    Requested Block: 15392944 Error Block: 15392958

(After inserting disk)

    picld[287]: [ID 727222 daemon.error] Device DISK0 inserted
    qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(2): Loop ONLINE
    scsi: [ID 799468 kern.info] ssd10 at fp2: name w21000011c63f0c94,0, bus address ef
    genunix: [ID 936769 kern.info] ssd10 is /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0
    scsi: [ID 365881 kern.info] <SUN72G cyl 14087 alt 2 hd 24 sec 424>
    genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0 (ssd10) online

9. Label the disk using the format command. If the disk is under VxVM control, be sure to write an SMI label (Solaris 9 4/03 OS or later):

    # format -e /dev/rdsk/c1t1d0s2
    ...
    format> l
    [0] SMI Label
    [1] EFI Label
    Specify Label type[1]: 0
    Auto configuration via format.dat[no]? no
    Auto configuration via generic SCSI-2[no]? yes
    Ready to label disk, continue? yes

If the disk is not under VxVM control, label the disk according to local requirements; otherwise, it can be labeled with a standard VTOC.

Steps 9a - 9c are only required if this is a system running SunCluster.

For Pre SunCluster 3.2

    9a. /usr/cluster/bin/scdidadm -C
    9b. /usr/cluster/bin/scdidadm -r
    9c. /usr/cluster/bin/scgdevs

For SunCluster 3.2

    9a. /usr/cluster/bin/cldevice clear      *run on each node, or use "-n <node,...>" to specify nodes
    9b. /usr/cluster/bin/cldevice refresh    *run on each node, or use "-n <node,...>" to specify nodes
    9c. /usr/cluster/bin/cldevice populate   *run on any node

Note: It's possible to get errors from c0t0d0, which is the CD-ROM/DVD drive on Sun Fire V480, V880, etc.

10. Instruct VxVM to re-read the device tree by running the command

    # vxdctl enable

11. The disk will remain in the "offline" state until the new disk is initialized.
To initialize it, use the command line first:

    # vxdisksetup -i c1t1d0

Then, use 'vxdiskadm' and choose option #5 to replace the failed or removed disk.

- OR -

Run 'vxdiskadm' and choose option #5 to initialize it and replace the failed or removed disk.

If the 'vxdiskadm' command is run and option #5 is chosen, it will show that "Access is disabled" for this new disk (because it is still "offline"), and you will be asked whether or not you wish to "enable access" to it. Answer 'yes' to this question.

12. The disk should now be online and functional, within the operating system and VxVM. Confirm this with "vxdisk list".

NOTE: Do not reboot the system or perform step 15 (modify nvramrc) until synchronization has completed. If the system is rebooted before then, it cannot boot from the new disk or modify the devalias. Check the synchronization with "vxtask list":

    # vxtask list

13. If the dump device (swap partition) had to be moved in step 1, move it back, for example:

    # dumpadm -d /dev/dsk/c1t1d0s1

14. If this was a root-disk or a root-mirror, make sure to run the /etc/vx/bin/vxbootsetup command. The vxbootsetup utility configures a disk by writing a boot track at the beginning of the disk and by creating physical disk partitions in the UNIX VTOC that match the mirrors of the root, swap, /usr and /var.

    # /etc/vx/bin/vxbootsetup -g rootdg rootdisk

15. If this was a root-disk or root-mirror, then ensure the nvram aliases are updated so you can boot.

    # ls -al /dev/rdsk/<new disk>s0

example:

    # ls -al /dev/rdsk/c1t1d0s0

Compare the WWN from the ls output with the appropriate root alias entries in the NVRAM (eeprom nvramrc), looking at the rootmirror or rootdisk entries.

NOTE: The following shows how to change the devalias in nvramrc from the removed disk information to the new disk information. For example,

- List before modifying nvramrc.
(removed disk information)

    # eeprom nvramrc
    devalias rootdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0:a
    devalias mirrdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19838,0:a

- List the new disk information

    # ls -al /dev/rdsk/c1t0d0s0
    lrwxrwxrwx 1 root root 74 Mar 6 2003 c1t0d0s0 -> ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0:a,raw

- Modify nvramrc (This example is written in the bourne shell)

    # eeprom nvramrc='devalias rootdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0:a[enter]
    devalias mirrdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19838,0:a'[enter]

- List after modifying nvramrc.

    # eeprom nvramrc
    devalias rootdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0:a
    devalias mirrdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19838,0:a

NOTE:

Product

VERITAS Volume Manager 3.0.2 Software
Sun Fire 280R Server
Sun Fire V880 Server
Sun Fire V480 Server
VERITAS Volume Manager 4.0 Software
VERITAS Volume Manager 3.5 Software
VERITAS Volume Manager 3.2 Software

Internal Comments

Procedure to Manually remove duplicate device issues

If the customer ends up with duplicate entries for the same device in vxdisk list, use the following procedure. If it does not get rid of duplicate entries, have the customer do a reconfiguration re-boot.
    # touch /reconfigure

VxVM, Volume Manager, suncluster, cluster, upgrade, upgrading, configure, configured, configuration, disk, failed, replace, replace disk, 480, 490, 280, 880, 890, V880, V890, V480, V490, 280R

Previously Published As 40842

Change History

Date: 2007-11-09
User Name: 31620
Action: Approved
Comment: Verified Metadata - ok
Verified Keywords - ok
Verified still correct for audience - currently set to contract
Audience left at contract as per FvF at http://kmo.central/howto/content/voyager-contributor-standards.html
Checked review date - currently set to 2008-07-03
Checked for TM - ok as presented
Publishing under the current publication rules of 18 Apr 2005:
Checked for the word normalized - not present
Version: 54

Date: 2007-11-05
User Name: 31620
Action: Accept
Comment:
Version: 0

Date: 2007-11-05
User Name: 116529
Action: Approved
Comment: Looks fine, minor change. Please publish.
Version: 0

Date: 2007-11-05
User Name: 116529
Action: Accept
Comment:
Version: 0

Attachments

This solution has no attachment