Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1004100.1
Update Date:2010-09-14
Keywords:

Solution Type  Technical Instruction Sure

Solution  1004100.1 :   Sun[TM] Cluster 3.x: Rolling firmware update on SCSI JBOD disk with Solaris[TM] Volume Manager and root disk  


Related Items
  • Sun Storage D1000 Array
  • Sun Storage D2 Array
  • Solstice DiskSuite Software
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - Other

PreviouslyPublishedAs
205707


Description
This Technical Instruction explains how to perform a rolling disk firmware update with minimum downtime in Sun[TM] Cluster using Solaris[TM] Volume Manager (SVM, formerly Solstice DiskSuite[TM], SDS). The disk can be a local disk (root/mirror) on a node or an external shared SCSI JBOD disk.


Steps to Follow
There are two approaches to the firmware update: one for shared SCSI JBOD disks (section A) and one for local cluster node disks, root and mirror (section B).

The following procedure assumes that:
1) All metadevices (local and in disksets) are in the "Okay" state, all submirrors are attached, and no metadevice is resyncing.

2) Your /etc/lvm/md.tab is current: compare the output of "metastat -p" with the entries in /etc/lvm/md.tab. If entries are missing or not up to date, run:

  # metastat -p >> /etc/lvm/md.tab

and for the disksets:

  # metastat -s <setname> -p >> /etc/lvm/md.tab

Note: If you run Sun Explorer, all of this information is saved in its output file.

3) Both cluster nodes are members and this will not change during the procedure.

4) The cluster 'did' namespace is current with no mismatches.

5) All cluster 'did' IDs match the physical disk IDs. Check /var/adm/messages for warnings like:

  device id for '/dev/rdsk/c2t8d0' does not match physical disk's id.  The drive may have been replaced

If this is the case then first identify the cluster DID device which does not match:

[root]# scdidadm -L
1        msun0001:/dev/rdsk/c1t0d0      /dev/did/rdsk/d1
2        msun0001:/dev/rdsk/c1t1d0      /dev/did/rdsk/d2
3        msun0002:/dev/rdsk/c3t8d0      /dev/did/rdsk/d3
3        msun0001:/dev/rdsk/c3t8d0      /dev/did/rdsk/d3
4        msun0001:/dev/rdsk/c3t9d0      /dev/did/rdsk/d4
4        msun0002:/dev/rdsk/c3t9d0      /dev/did/rdsk/d4
5        msun0001:/dev/rdsk/c2t8d0      /dev/did/rdsk/d5 <<<<<< wrong ID
5        msun0002:/dev/rdsk/c2t8d0      /dev/did/rdsk/d5 <<<<<<
6        msun0001:/dev/rdsk/c2t9d0      /dev/did/rdsk/d6
6        msun0002:/dev/rdsk/c2t9d0      /dev/did/rdsk/d6
7        msun0002:/dev/rdsk/c1t0d0      /dev/did/rdsk/d7
8        msun0002:/dev/rdsk/c1t1d0      /dev/did/rdsk/d8
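
The lookup from warning message to DID device can be scripted. The following is a minimal sketch, not part of the original procedure: the function names mismatched_devs and did_for_dev are hypothetical, and the parsing assumes the message wording and the "scdidadm -L" column layout shown above.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text in the formats shown in this document.

# Print the raw device paths named in "does not match physical disk's id"
# warnings read from stdin (for example from /var/adm/messages).
mismatched_devs() {
    sed -n "s/.*device id for '\([^']*\)' does not match physical disk's id.*/\1/p" | sort -u
}

# Print the DID device for a raw device path, given "scdidadm -L" output on stdin.
# $1 = raw device path, e.g. /dev/rdsk/c2t8d0
did_for_dev() {
    awk -v dev="$1" '{ split($2, a, ":"); if (a[2] == dev) print $3 }' | sort -u
}
```

Possible use: "mismatched_devs < /var/adm/messages" lists the affected paths, and "scdidadm -L | did_for_dev /dev/rdsk/c2t8d0" prints the matching DID device. On Solaris the -v option needs nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk.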

Now run the following commands on the identified DID device to update the cluster configuration (these commands are safe to run on a production cluster):

Check the current ID
[root]# scdidadm -o asciidiskid -l d5
IBM     8RM838
Update DID
[root]# scdidadm -R d5
Cross-check that the ID is correctly updated
[root]# scdidadm -o asciidiskid -l d5
SEAGATE 3JA97LEV00007503

The ID should match the label on the front of the physical disk. You can use "iostat -En" to check the real serial numbers (and firmware revisions) of all disks, and "scdidadm -o asciidiskid -l d<N>" on each DID device for cross-checking.

......
c2t8d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST336607L SUN36G  Revision: 0507 Serial No: 00007503
Size: 18.11GB <18110967808 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
......
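
The cross-check can be partly automated. This is a sketch under stated assumptions, not a verified tool: disk_inventory and did_matches_serial are hypothetical names, the parser assumes the "iostat -En" record layout shown above, and the suffix comparison is inferred only from the single example in this document (DID ID "SEAGATE 3JA97LEV00007503", serial number 00007503). Other drive models may encode the ID differently.

```shell
#!/bin/sh
# Hypothetical helper: list device, firmware revision and serial number from
# "iostat -En" output on stdin, assuming the record layout shown above.
disk_inventory() {
    awk '
        /Soft Errors:/ { dev = $1 }          # first line of a record names the device
        /Revision:/ {
            rev = ""; sn = ""
            for (i = 1; i < NF; i++) {
                if ($i == "Revision:") rev = $(i + 1)
                if ($i == "No:")       sn  = $(i + 1)
            }
            print dev, rev, sn
        }'
}

# Succeeds if the DID ASCII disk ID ($1) ends with the serial number ($2).
# ASSUMPTION: based on one example only; treat a mismatch as a hint to look closer.
did_matches_serial() {
    [ -n "$2" ] || return 1
    case "$1" in
        *"$2") return 0 ;;
        *)     return 1 ;;
    esac
}
```

Possible use: "iostat -En | disk_inventory" prints one line per disk, and did_matches_serial "$(scdidadm -o asciidiskid -l d5)" 00007503 compares a DID ID with a serial number taken from that list.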

If no metadevice is offline or in the maintenance state and all DID IDs match the physical IDs, continue with A) and/or B).
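
The metadevice precondition can be checked mechanically. The following is a minimal sketch, assuming the usual "State:" and "% done" lines of metastat output; the function name metastat_problems is hypothetical. It looks only at mirror and submirror state lines, not at individual component rows.

```shell
#!/bin/sh
# Hypothetical helper: read "metastat" output on stdin and print every
# "State:" line that is not "Okay", plus any resync progress lines.
# Exit status 0 means nothing suspicious was found.
metastat_problems() {
    awk '
        /State:/ && $2 != "Okay" { print; bad = 1 }
        /% done/                  { print; bad = 1 }
        END { exit (bad ? 1 : 0) }'
}
```

Possible use: run "metastat | metastat_problems" for the local metadevices and "metastat -s <setname> | metastat_problems" for each diskset, and proceed only if both print nothing.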

A) How to do a firmware update on SCSI JBOD cluster shared disk

The disk spins down and performs a hard reset during a firmware update, so you cannot update a disk that is in use: you would lose a mirror half. A second problem is that the "download" routine itself checks whether the drive is in use by SVM. To overcome both problems:

First, offline the submirror that contains the disk for the duration of the update. That way you do not lose the mirror half, and the resync afterwards is quick. Then run the "download" routine from the node that is currently NOT the owner of the diskset containing the disk. In other words, if node 1 has the diskset imported, run the firmware "download" from node 2.
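
Two lookups precede the update: which node currently owns the diskset, and which submirror contains the disk. The sketch below is illustrative only. The function names dg_primary and submirror_of are hypothetical, and the parsing assumes the "scstat -D" and "metastat -s" output formats shown in the transcript in this section.

```shell
#!/bin/sh
# Hypothetical helper: print the current primary node of device group $1,
# given "scstat -D" output on stdin.
dg_primary() {
    awk -v dg="$1" '/Device group servers:/ && $4 == dg { print $5 }'
}

# Hypothetical helper: given "metastat -s <setname>" output on stdin, print the
# submirror whose stripe contains the component named in $1 (e.g. d5s0).
submirror_of() {
    awk -v comp="$1" '
        /: Submirror of / { sub(/:$/, "", $1); current = $1 }   # remember the submirror header
        $1 == comp && current != "" { print current }'
}
```

Possible use: "scstat -D | dg_primary nfs-set" names the node that must NOT run the download, and "metastat -s nfs-set | submirror_of d5s0" names the submirror to offline for DID device d5. On Solaris, use nawk or /usr/xpg4/bin/awk for these functions.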

root@msun0002 # scstat -D
....
Device Group        Primary             Secondary
------------        -------             ---------
Device group servers:  nfs-set             msun0001            msun0002
....
root@msun0002 # metastat -s nfs-set
Proxy command to: msun0001
nfs-set/d300: Mirror
Submirror 0: nfs-set/d301
State: Okay
Submirror 1: nfs-set/d302
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 142239915 blocks
nfs-set/d301: Submirror of nfs-set/d300
State: Okay
Size: 142239915 blocks
Stripe 0: (interlace: 128 blocks)
Device Start Block  Dbase State        Hot Spare
d3s0          0     No    Okay
d4s0          0     No    Okay
nfs-set/d302: Submirror of nfs-set/d300
State: Okay
Size: 142239915 blocks
Stripe 0: (interlace: 128 blocks)
Device Start Block  Dbase State        Hot Spare
d5s0          0     No    Okay
d6s0          0     No    Okay

The disk to be updated, c2t8d0, is DID device d5, which belongs to submirror d302, so that submirror is taken offline:

root@msun0002 # metaoffline -s nfs-set d300 d302

Change directory to the firmware patch directory:

root@msun0002 # cd /var/tmp/116369-11
root@msun0002 # ./download
Firmware Download Utility, V4.2
**************************  WARNING  **************************
NO OTHER ACTIVITY IS ALLOWED DURING FIRMWARE UPGRADE!!!
No other programs including any volume manager (e.g. Veritas,
SDS, or Vold) should be running.  Other host systems sharing
any I/O bus with this host must either be offline or
disconnected.  Any interruption (e.g. power loss) during
upgrade can result in damage to devices being upgraded.
Any disk to be upgraded should first have its data backed up.
***************************************************************
Searching for devices...
rmt/0: Mode Sense for default pages failed!
DISK DEVICES
Device         Rev   Product
c1t0d0:        0507  ST336607L -- SUN36G
c1t1d0:        1804  MAN3367M -- SUN36G
c2t8d0:        0507  ST336607L -- SUN36G <<<<<<<<<<<<<<<
c2t9d0:        0507  ST336607L -- SUN36G
c3t8d0:        S96H  DDYST3695 -- SUN36G
c3t9d0:        0507  ST336607L -- SUN36G
Total Devices:  6
Enter command: p c2t8d0 <<<<<<<<<<<<<<
NOTE: select ONLY the one disk to update!!!
NOTICE: Cannot access kernel, kvm_open did not succeed!
Upgrading devices...
c2t8d0: Successful download
Enter command: inq   <<<<<< check that the new firmware is in place
DISK DEVICES
Device  Rev   Product              S/N
........
c2t8d0:        0707  ST336607L -- SUN36G <<<<<<<<<<<<<<<
........
Enter command: q

Now bring the submirror online again and watch the resync:

root@msun0002 # metaonline -s nfs-set d300 d302
Proxy command to: msun0001
root@msun0002 # metastat -s nfs-set | grep %
Proxy command to: msun0001
32 % done
root@msun0002 #

Repeat with the other disks if necessary.
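
Because assumption 1 requires that no resync is in progress, the next disk should only be touched once the resync above has finished. A minimal sketch of how to wait for that, with the hypothetical function names resync_percent and wait_for_resync, and assuming the "NN % done" progress line shown above:

```shell
#!/bin/sh
# Hypothetical helper: print the resync percentages found in metastat output
# on stdin, one per line; prints nothing once no resync is reported.
resync_percent() {
    awk '/% done/ { for (i = 2; i <= NF; i++) if ($i == "%") print $(i - 1) }'
}

# Block until no resync is reported for diskset $1, polling every $2 seconds
# (default 60). Note: the loop also ends if metastat itself fails, so check
# the final state with metastat afterwards.
wait_for_resync() {
    while [ -n "$(metastat -s "$1" | resync_percent)" ]; do
        sleep "${2:-60}"
    done
}
```

Possible use: "wait_for_resync nfs-set 30" after metaonline, followed by a final "metastat -s nfs-set" to confirm that all submirrors are back in the "Okay" state.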

B) How to do a firmware update on cluster node disk

To update a local disk, first switch all resource groups to the other node (here msun0001):

root@msun0002 # scswitch -z -g <resourcegroup> -h msun0001

Then reboot this node into non-cluster mode:

root@msun0002 # init 0
ok boot -xs

Once booted, delete the state database replicas (metadb) on the disk to be updated, then detach and clear its submirrors:

root@msun0002 # metadb
flags           first blk       block count
a m  p  luo        16              4096            /dev/dsk/c1t0d0s7
a    p  luo        4112            4096            /dev/dsk/c1t0d0s7
a    p  luo        8208            4096            /dev/dsk/c1t0d0s7
a    p  luo        16              4096            /dev/dsk/c1t1d0s7
a    p  luo        4112            4096            /dev/dsk/c1t1d0s7
a    p  luo        8208            4096            /dev/dsk/c1t1d0s7
root@msun0002 # metadb -d /dev/dsk/c1t0d0s7
root@msun0002 # metadb
flags           first blk       block count
a    p  luo        16              4096            /dev/dsk/c1t1d0s7
a    p  luo        4112            4096            /dev/dsk/c1t1d0s7
a    p  luo        8208            4096            /dev/dsk/c1t1d0s7

Save your rootdisk configuration before you start and save the original md.tab file.

root@msun0002 # cp /etc/lvm/md.tab /etc/lvm/md.tab.orig
root@msun0002 # metastat -p > /etc/lvm/md.tab
root@msun0002 # metastat -p
d200 -m d201 d202 1
d201 1 1 c1t0d0s0
d202 1 1 c1t1d0s0
d210 -m d211 d212 1
d211 1 1 c1t0d0s1
d212 1 1 c1t1d0s1
d230 -m d231 d232 1
d231 1 1 c1t0d0s3
d232 1 1 c1t1d0s3
d240 -m d241 d242 1
d241 1 1 c1t0d0s4
d242 1 1 c1t1d0s4
d250 -m d251 d252 1
d251 1 1 c1t0d0s5
d252 1 1 c1t1d0s5
d260 -m d261 d262 1
d261 1 1 c1t0d0s6
d262 1 1 c1t1d0s6
root@msun0002 # metadetach d200 d201
......
Repeat this with all other submirrors
.....
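
With six mirrors, typing the detach commands by hand is error-prone. The sketch below derives them from "metastat -p" output. It is an illustration under assumptions: the function name detach_cmds is hypothetical, and it handles only the simple layout shown above, where each submirror is a single-slice concat ("dNNN 1 1 cXtYdZsN"). It prints the commands and executes nothing.

```shell
#!/bin/sh
# Hypothetical helper: read "metastat -p" output on stdin and print one
# "metadetach <mirror> <submirror>" line for every submirror built on disk $1.
detach_cmds() {
    awk -v disk="$1" '
        # Mirror line: dNNN -m <submirror>... <pass>; record each submirror parent.
        $2 == "-m" { for (i = 3; i < NF; i++) parent[$i] = $1; next }
        # Single-slice concat: dNNN 1 1 cXtYdZsN; keep those on the requested disk.
        NF == 4 && index($4, disk "s") == 1 { order[++n] = $1 }
        END {
            for (k = 1; k <= n; k++)
                if (order[k] in parent)
                    print "metadetach", parent[order[k]], order[k]
        }'
}
```

Possible use: "metastat -p | detach_cmds c1t0d0", then review the printed lines before running them. On Solaris, use nawk or /usr/xpg4/bin/awk.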

The metastat should now look something like this:

root@msun0002 # metastat -p
d200 -m d202 1
d202 1 1 c1t1d0s0
d210 -m d212 1
d212 1 1 c1t1d0s1
d230 -m d232 1
d232 1 1 c1t1d0s3
d240 -m d242 1
d242 1 1 c1t1d0s4
d250 -m d252 1
d252 1 1 c1t1d0s5
d260 -m d262 1
d261 1 1 c1t0d0s6
d262 1 1 c1t1d0s6
d201 1 1 c1t0d0s0
d211 1 1 c1t0d0s1
d231 1 1 c1t0d0s3
d241 1 1 c1t0d0s4
d251 1 1 c1t0d0s5

Now save this configuration again. The new /etc/lvm/md.tab is used further down by metainit to re-create the submirrors.

root@msun0002 # cp /etc/lvm/md.tab /etc/lvm/md.tab.bothmirrors
root@msun0002 # metastat -p > /etc/lvm/md.tab

Now clear all the detached submirrors:

root@msun0002 # metaclear d201
....
repeat for all submirrors
....

Now you can update the firmware, but on this disk only. Follow the procedure in the patch README or the one shown in A).

When finished, just run:

root@msun0002 # metainit -a

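metainit -a reads the entries in /etc/lvm/md.tab by default and re-creates the missing submirrors. Expect many messages saying that some metadevices already exist; these can be ignored. The state database replicas deleted from this disk earlier also have to be re-created, for example with "metadb -a -c 3 /dev/dsk/c1t0d0s7". Then attach the submirrors again: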

root@msun0002 # metattach d200 d201
....
repeat for all submirrors
....

Repeat with the other local disk (if necessary). Reboot the node into the cluster again and repeat on the other node (if necessary).



Product
Solaris Volume Manager Software
Solstice DiskSuite 4.0
Solstice DiskSuite 3.0
Sun StorageTek 3510 FC Array JBOD
Sun StorageTek D2 Array
Sun StorageTek D1000 Array
Sun Cluster 3.1
Sun Cluster 3.0

scsi, jbod, disk, replacement, suncluster, cluster, scdidadm, firmware, download, update
Previously Published As
88595

Change History
Date: 2007-03-04
User Name: 97961
Action: Approved
Comment: - Converted to STM formatting for better readability
- Made simple sentence/grammatical corrections
Version: 4
Date: 2007-03-04
User Name: 97961
Action: Accept
Comment:
Version: 0

  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.