Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1009147.1
Update Date:2011-05-31

Solution Type  Technical Instruction Sure

Solution  1009147.1 :   Sun Fire[TM] 12K/15K/E20K/E25K: Procedure for applying Solaris[TM] and SMS patches to System Controllers  


Related Items
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
212657


Applies to:

Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms

Goal

Applying patches to a Solaris[TM] workstation or server is a relatively straightforward maintenance task. Patches are downloaded, and patch READMEs are reviewed in order to follow the Installation Instructions properly.

Certain systems, however, have requirements that can make the patching process confusing for specific OS or software products. One such system is the Sun Fire[TM] 12K/15K/E20K/E25K System Controller (SC). The SC manages the 12K/15K/E20K/E25K platform (up to 18 total domains) and is therefore an important server that requires regular patch maintenance for both the Solaris OS and the System Management Services (SMS) software. The Solaris and SMS patch installation instructions differ greatly and must be followed exactly to ensure that the platform stays healthy and is not negatively affected by the maintenance work. Because the two patch sets require different special instructions, the maintenance process might seem confusing, but it is not difficult once the steps are laid out.

Solaris patches, generally speaking, require a reboot to take effect.  The Solaris patches should also be applied at single user level because the OS changes need to be made when applications are shut down and the system is in as quiesced a state as possible.  Some patches may also require additional modifications be made according to a specific patch's README.

SMS patches are not applied in the same manner as Solaris patches.  SMS patching requires that the SMS software be stopped and started in a specific order during the patching process so that the platform's health is not affected by the patching process itself.  This involves various steps to ensure that failovers do not occur when they are not desired.  In addition to the general order of stopping and starting SMS, some specific SMS patches require their own set of instructions to be carried out as well.

Because the processes for patching Solaris and SMS differ, following the two sets of Installation Instructions can lead to confusion.  In the worst case, serious problems may be encountered if one or more critical steps are missed.  This document is a general reference guide that describes the steps to follow when patching the SMS software and Solaris OS on the Sun Fire 12K/15K/E20K/E25K System Controller configuration.  The procedure is meant to reduce the problems that patch maintenance could cause by separating the process into three stages: PRE WORK, SMS PATCHING, and SOLARIS PATCHING.

**************************************************************************
NOTE:   This document is to be used for reference only.  This document in
        no way replaces the README included with every patch.  It is the
        responsibility of the person who installs the patch to review,
        understand, and follow a patch's README.  This document contains
        the general process that the described maintenance should follow.  
        The patch READMEs contain the specific and sometimes additional
        steps needed to apply the specific patches.
**************************************************************************

Please review this entire document before "diving in" and executing the process on your SC configuration.

Solution


Legend for document:
    SC0        ORIGINAL MAIN SC
    SC1        ORIGINAL SPARE SC
    sms-svc%    Command issued as user sms-svc on a SC.
    #        Command issued as user root on a SC.

The OPTIONAL STEPS listed in this procedure are added steps for the protection of the SC and production domain(s) environment.  These highly important steps increase the chance that any issue encountered with patch maintenance on the SCs is easier to diagnose, and therefore easier to resolve with minimal to no impact to the production domains.  The time these steps add to the total patch process is definitely worth the investment.  It is highly recommended to follow these OPTIONAL STEPS during this procedure.

Contact your Sun Support Representative with any problems you might encounter during this patch install process.


STAGE 1:  PRE WORK

1)
Write down the name of the ORIGINAL MAIN SC somewhere you will not lose it (a piece of paper, your hand, anywhere).  This is one of the most important steps in the entire process, so it must be done!

2)
Connect to both SCs via the external serial cable console connection. It is not advisable to use the smsconnectsc function between the two SCs, because the SCs will reboot during the process and each reboot would terminate the smsconnectsc connection. The external console connection stays connected through reboots and is therefore more reliable.  All work should be done in the console sessions. If possible, log the console sessions so that there is a record of the work done during this procedure should anything go wrong.

3)
Confirm failover is ACTIVE and datasync is ACTIVE:
SC0 (MAIN SC):
sms-svc%  showfailover
SC Failover Status:     ACTIVE
C1: Up
-v

sms-svc%  showdatasync
File Propagation State: ACTIVE
Active File:            -
Queued Files:           0
If failover status is not ACTIVE, activate it on the MAIN by executing:
sms-svc% setfailover on
This will also activate datasync, and should begin propagation of files from the MAIN to SPARE SC.
Do not proceed until failover is ACTIVE and datasync is ACTIVE with no Active or Queued files.
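
The readiness check in step 3 can be sketched as a small shell helper.  This is only an illustration, not part of SMS: the function name sync_ready is hypothetical, it assumes the output layout shown above (including "-" meaning that no file is being propagated), and it assumes a POSIX shell with a POSIX awk (on Solaris that may mean nawk or /usr/xpg4/bin/awk).

```shell
#!/bin/sh
# sync_ready (hypothetical helper): read captured `showfailover` and
# `showdatasync` output on stdin and succeed only if failover is ACTIVE,
# file propagation is ACTIVE, no file is in flight, and the queue is empty.
# Assumes the field layout shown in step 3 of this document.
sync_ready() {
    awk '
        /^SC Failover Status:/     { failover = $NF }
        /^File Propagation State:/ { datasync = $NF }
        /^Active File:/            { active   = $NF }
        /^Queued Files:/           { queued   = $NF }
        END {
            ok = (failover == "ACTIVE" && datasync == "ACTIVE" &&
                  active == "-" && queued == "0")
            exit (ok ? 0 : 1)
        }
    '
}
```

On the MAIN SC it might be used as:  { showfailover; showdatasync; } | sync_ready && echo "safe to proceed".  A missing or changed field makes the check fail, which is the safer default here.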
4)
Confirm that platform configuration changes are NOT occurring:
SC0 (MAIN SC):
sms-svc%  ps -ef | grep hpost
A DR operation is indicated if an hpost -H is running, but any hpost process indicates a configuration change is occurring.
sms-svc%  ps -ef | grep board
Will identify addboard, deleteboard, moveboard DR operations.
sms-svc%  ps -ef | grep cfgadm
Will identify a cfgadm or rcfgadm DR operation.
If there is any hpost or DR process present, let the process complete before proceeding.  If no changes are reported then proceed.
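
The three ps checks in step 4 can be combined into one filter.  The sketch below is an illustration only: the function names config_change_procs and platform_quiet are hypothetical, the filter matches only the command names mentioned in this step (the broader "board" pattern is narrowed to addboard, deleteboard, and moveboard), and a POSIX awk is assumed.

```shell
#!/bin/sh
# config_change_procs (hypothetical helper): filter `ps -ef` output on stdin
# down to lines that mention hpost, a board DR command, or cfgadm/rcfgadm.
# Lines belonging to grep or awk themselves are dropped so that the filter
# does not report its own process.
config_change_procs() {
    awk '/hpost|addboard|deleteboard|moveboard|cfgadm/ && !/grep|awk/'
}

# platform_quiet (hypothetical helper): succeed only if no such process
# appears in the `ps -ef` output supplied on stdin.
platform_quiet() {
    [ -z "$(config_change_procs)" ]
}
```

For example:  ps -ef | platform_quiet || echo "configuration change in progress, wait".  Running  ps -ef | config_change_procs  instead prints the offending process lines.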

4a) *OPTIONAL STEP*
Confirm that SC failover works properly:

SC0 (MAIN SC):
sms-svc% setfailover force
****************************************************************
NOTE: Forcing failover serves two functions.  The failover will
      confirm that SMS can failover properly from SC to SC.  It
      will also reboot the MAIN SC, assuring that prior to
      applying patches the SC could boot properly.  If the SC
      fails to failover or fails to reboot successfully, the
      patch maintenance should be delayed until the current
      problems are resolved.
****************************************************************

SC1 (SPARE becoming MAIN SC):
sms-svc%  showfailover -r
(Will report SPARE status transitioning to MAIN)
Once SC1 reports it is MAIN, and SC0 has successfully rebooted and reports itself as SPARE, enable failover and fail SMS back to SC0:

SC1 (MAIN SC):
sms-svc%  setfailover on
sms-svc%  setfailover force
Just as with SC0, this will reset (reboot) SC1, and the same NOTE from above applies.

Following the second failover and SC1's reboot, the original configuration should exist again.  Confirm it:

SC0 (MAIN SC):
sms-svc%  showfailover -r
MAIN
SC1 (after reboot completes):
sms-svc%  showfailover -r
SPARE
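
Waiting for an SC to report its role after a failover or reboot can be scripted as a polling loop.  The following is a sketch under stated assumptions: wait_for_role and the ROLE_CMD override variable are hypothetical names, showfailover -r is assumed to print exactly MAIN or SPARE as shown above, and the shell is assumed to be POSIX compliant (for example ksh), since the classic Bourne shell lacks $(...) and $((...)).

```shell
#!/bin/sh
# wait_for_role ROLE [TRIES] [DELAY]  (hypothetical helper)
# Poll the role command until it prints ROLE, or give up after TRIES polls.
# ROLE_CMD is a hypothetical override for testing; by default the loop runs
# `showfailover -r`.
wait_for_role() {
    want=$1
    tries=${2:-30}
    delay=${3:-10}
    while [ "$tries" -gt 0 ]; do
        # A failing or missing command is treated as "role not known yet".
        role=$(${ROLE_CMD:-showfailover -r} 2>/dev/null) || role=""
        [ "$role" = "$want" ] && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}
```

On SC1 after forcing failover from SC0, this might be run as:  wait_for_role MAIN 60 10 || echo "SC1 did not become MAIN".  The same helper fits the later confirmation steps that check for MAIN or SPARE.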
5)
Turn off SC failover if you skipped the OPTIONAL STEP.  If you followed the OPTIONAL STEP, just confirm that failover is off:

SC0 (MAIN SC)
sms-svc%  setfailover off
sms-svc%  showfailover
SC Failover Status:     DISABLED

Proceed to Stage 2.

STAGE 2:  SMS PATCH INSTALLATION

Make sure to have read ALL of the Special Installation Instructions contained in the patch README files for the patches you are about to install.  The steps below apply to most SMS patches but do not include additional steps that some specific patches require.  Fit those additional steps into this process as their READMEs direct.

1)
Stop SMS on the MAIN and SPARE SC:

SC0 & SC1:
#  /etc/init.d/sms stop
1a) *OPTIONAL STEP*
Backup the current SMS operational environment of the MAIN SC:
 
SC0 (ORIGINAL MAIN)
#  /opt/SUNWSMS/bin/smsbackup /
****************************************************************
NOTE: Direct the backup file to be placed in a directory such as
      /var/tmp and not in a directory such as /tmp.  A reboot
      will flush /tmp, so this is not a good directory to use.
****************************************************************
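
The advice in the NOTE above can be enforced with a small guard before running smsbackup.  This is a minimal sketch: backup_dir_ok is a hypothetical helper name, and it only checks that the chosen directory is not under /tmp and is an existing, writable directory.

```shell
#!/bin/sh
# backup_dir_ok DIR  (hypothetical helper)
# Fail for /tmp and anything below it (a reboot flushes /tmp); otherwise
# succeed only if DIR exists as a directory and is writable.
backup_dir_ok() {
    case $1 in
        /tmp|/tmp/*) return 1 ;;
    esac
    [ -d "$1" ] && [ -w "$1" ]
}
```

For example, as root on SC0:  backup_dir_ok /var/tmp && /opt/SUNWSMS/bin/smsbackup /var/tmp.  The /var/tmp destination follows the NOTE above; any other persistent directory would work the same way.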

2)
Apply SMS patches to BOTH SCs using patchadd.  

****************************************************************
NOTE: Do not use patchadd option -d as this option will prevent
      a patch from being able to be removed in the future.  If
      we are not able to remove a patch in the future, we may
      not be able to fix an issue with a patch, should an issue
      be discovered with the patch itself.
****************************************************************    
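
When several patches are applied in a row, a loop that stops at the first failure keeps the console log easy to follow.  The sketch below is an illustration only: apply_patches and the PATCHADD override variable are hypothetical, the patches are assumed to be unpacked as directories named by patch ID, and the order of the IDs must come from the patch READMEs.  Note that patchadd is called without the -d option.

```shell
#!/bin/sh
# apply_patches DIR ID...  (hypothetical helper)
# Run patchadd on DIR/ID for each patch ID in the order given and stop at
# the first failure.  No -d option is passed, so backout data is kept.
# PATCHADD is a hypothetical override used for dry runs and testing.
apply_patches() {
    dir=$1
    shift
    for id in "$@"; do
        echo "applying $id"
        ${PATCHADD:-patchadd} "$dir/$id" || {
            echo "patchadd failed for $id; stopping" >&2
            return 1
        }
    done
}
```

A dry run that only prints the patch paths might look like:  PATCHADD=echo; apply_patches /var/tmp/patches ID1 ID2, where the directory and the IDs are placeholders.  Unset PATCHADD to run the real patchadd, and run the same sequence on both SCs.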
2a)
Complete any specific Special Installation Instructions that must be carried out before SMS runs again.
3)
Start SMS on the ORIGINAL MAIN SC:

SC0 (ORIGINAL MAIN)
#  /etc/init.d/sms start
4)
Confirm that it has become MAIN:

SC0 (MAIN)
sms-svc%  showfailover -r
MAIN
5)
Start SMS on the ORIGINAL SPARE SC:

SC1 (ORIGINAL SPARE)
#  /etc/init.d/sms start
6)
Confirm that it has become SPARE again:

SC1 (SPARE)
sms-svc%  showfailover -r
SPARE
6a) *OPTIONAL STEP*
Confirm that SC failover works properly following SMS patching:

SC0 (MAIN SC):
sms-svc% setfailover on
sms-svc% setfailover force
SC1 (SPARE becoming MAIN SC):
sms-svc%  showfailover -r
(Will report SPARE status transitioning to MAIN)

Once SC1 reports it is MAIN, and SC0 has successfully rebooted and reports itself as SPARE, enable failover and fail SMS back to SC0:

SC1 (MAIN SC):
sms-svc%  setfailover on
sms-svc%  setfailover force
Following the second failover and SC1's reboot, the original configuration should exist again.  Confirm it:

SC0 (MAIN SC):
sms-svc%  showfailover -r
MAIN
SC1 (after reboot completes):
sms-svc%  showfailover -r
SPARE
7)
Confirm that SC failover is off:

SC0 (MAIN SC)
sms-svc%  showfailover
SC Failover Status:     DISABLED
Complete any of the specific SMS patch Installation Instructions that still need to be completed, and then proceed to Stage 3.  If a specific SMS patch installation instruction requires a reboot, that step can wait, because the SCs are rebooted early in Stage 3.

STAGE 3:  Solaris PATCH INSTALLATION

Make sure to have read ALL of the Special Installation Instructions contained in the patch README files for the patches you are about to install.  The instructions below apply to most Solaris patches but do not include the additional steps that some patches require.  In general, Solaris patches should be applied at single user level and then require a reboot to take effect, but some specific patches do require certain additional steps.  Fit those additional steps into this process as their READMEs direct.

1)
Bring the SPARE SC down to single-user mode (the MAIN SC will remain up in SMS, monitoring the platform):

SC1 (SPARE)
#  shutdown -y -g0 -iS    
2)
Apply Solaris patches to SPARE SC (SC1) using patchadd.  

****************************************************************
NOTE: Do not use patchadd option -d as this option will prevent
      a patch from being able to be removed in the future.  If
      we are not able to remove a patch in the future, we may
      not be able to fix an issue with a patch, should an issue
      be discovered with the patch itself.
****************************************************************
3)
Perform any additional installation instructions that the READMEs of the specific patches require.  Then reboot the SPARE SC:
    
SC1 (SPARE)
#  init 6
4)
Once the SPARE SC reboots and SMS starts in the SPARE role, activate SMS failover:

SC1 (Becoming SPARE following reboot)
sms-svc%  showfailover -r
SPARE    
SC0 (MAIN)
sms-svc%  setfailover on
sms-svc%  showfailover
SC Failover Status:     ACTIVE
C1: Up
-v

sms-svc%  showdatasync
File Propagation State: ACTIVE
Active File:            -
Queued Files:           0
Once Failover is ACTIVE and datasync is ACTIVE with no Active or Queued files, continue.
     
5)
Bring the MAIN SC down to single-user mode:
     
SC0 (MAIN)
#  shutdown -y -g0 -iS    
****************************************************************
NOTE: Because failover is activated, SMS will failover the MAIN
      role to the SPARE when the MAIN SC is rebooted in such a
      manner as this.
****************************************************************

5a)
Confirm that the SPARE SC is transitioning to MAIN:

SC1 (SPARE to MAIN)
sms-svc%  showfailover -r
SPARE -> MAIN in time
        
6)
Apply Solaris patches to the ORIGINAL MAIN, now SPARE, SC (SC0) using patchadd.  

****************************************************************
NOTE: Do not use patchadd option -d as this option will prevent
      a patch from being able to be removed in the future.  If
      we are not able to remove a patch in the future, we may
      not be able to fix an issue with a patch, should an issue
      be discovered with the patch itself.
****************************************************************
7)
Perform any additional installation instructions that the READMEs of the specific patches require.  Then reboot the ORIGINAL MAIN (now SPARE) SC:
    
SC0 (ORIGINAL MAIN - now SPARE)
#  init 6
8)
Once the SPARE SC reboots and SMS starts in the SPARE role, activate SMS failover:

SC0 (Becoming SPARE following reboot)
sms-svc%  showfailover -r
SPARE    
SC1 (MAIN)
sms-svc%  setfailover on
sms-svc%  showfailover
SC Failover Status:     ACTIVE
C1: Up
-v

sms-svc%  showdatasync
File Propagation State: ACTIVE
Active File:            -
Queued Files:           0
At this point the patching process is complete. Both SCs are back in the configuration, and Solaris and SMS are updated. SC1 is currently configured as MAIN and SC0 as SPARE, the opposite of the original configuration. Either leave this configuration as it is, or force failover of the MAIN role back to SC0 with setfailover force from SC1 (which will reboot SC1 again).


Report any problems with the process or with the SC's post patch install configuration to your Oracle Support Representative.  

If the process has been followed as detailed above, Oracle Support Services should have a good chance of quickly isolating a given problem to a certain patch or process step and then resolving it if need be.


Internal Comments
The following is strictly for the use of Oracle employees:
This process includes some steps from a patch procedure presentation created by David Dalik, a Sun SSE, for a customer site that he supports. This is a field tested procedure, but every site's exact patch process may vary slightly depending on those additional steps required by the READMEs for the patches each site is installing.

Keywords: 12k, 12K, 15k, 15K, 20k, 25k, SC, System Controller, SMS, sms, solaris, patch, patchadd

Previously Published As 75248


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.