Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1012186.1
Update Date:2011-02-22
Keywords:

Solution Type: Technical Instruction

Solution  1012186.1 :   Sun Fire[TM] Server: How to Use Dynamic Reconfiguration (DR) in a Sun Cluster[TM] 3.x Environment  


Related Items
  • Sun Fire E6900 Server
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Fire 12K Server
  • Sun Fire E4900 Server
  • Sun Fire 4800 Server
  • Sun Fire E2900 Server
  • Sun Fire 15K Server
  • Sun Fire 4810 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Midrange Servers
  • GCS>Sun Microsystems>Servers>High-End Servers

Previously Published As
216797


Applies to:

Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E2900 Server
All Platforms

Goal

This document describes how to use Dynamic Reconfiguration (DR) in Sun[TM] Cluster 3.x configurations. Although DR is supported in a clustered environment, some restrictions apply; this document provides guidelines and best practices for performing a DR operation in a safe manner on production systems.

Note: It is always safe to issue a DR detach operation because the DR subsystem rejects operations on boards containing active components.

This document is aimed at an audience with basic knowledge of DR and Sun[TM] Cluster.

Solution

SUPPORTED CONFIGURATIONS

  • Sun Fire[TM] server domains running:
    - Solaris 8 2/02 Operating System + patches
    - Solaris 9 4/03 Operating System + patches
  • Sun Fire 12K/15K servers must be running System Management Services (SMS) 1.3 or greater
  • Sun Fire server firmware revision must be 5.12.7 or greater
  • Sun Cluster 3.0 12/01 or later, Sun Cluster 3.1
    No special patches are required for using DR in a Sun Cluster environment.
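
To confirm the Solaris and Sun Cluster releases on a domain, the following checks can be used (a minimal sketch; output formats vary by release):

    # cat /etc/release
    # /usr/cluster/bin/scinstall -pv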

KNOWN RESTRICTIONS AND CONSIDERATIONS

All the requirements, procedures, and restrictions that are documented for the Solaris OS DR also apply to Sun Cluster DR support.

In a Sun Cluster configuration, the following operations are not supported:

  • DR detach of a board containing permanent memory

Sun Cluster software does not support detach operations on a board that contains the permanent kernel memory and will reject any such operation.

Note: A DR detach operation that pertains to memory other than the kernel permanent memory will not be rejected.

Note: On OS versions up to and including Solaris[TM] 8, the kernel cage is always located on the board within the domain to which the highest ordered slice number is assigned. The PCD address slice mapping can only be modified through interaction with a kernel cage copy/rename operation.
Therefore, by default, the permanent memory is located on the most significantly numbered board. The same is true on Solaris[TM] 9 prior to KU 118558-05 (together with platmod patch 117124-07). With KU 118558-05 and higher, and with 117124-07 and higher, the kernel cage may be split among multiple boards.

Refer to Document 1001683.1, Sun Fire[TM] 12K/15K: Location and Relocation of Kernel for DR Operations (listed under Internal Comments), for more information about the kernel cage.

  • DR detach operations for the board where the heartbeat threads are bound

When the domain is booted, the heartbeat threads are bound to the least significantly numbered board in the domain, the one with the lowest-numbered processors. DR detach operations will fail for the board where the heartbeat
threads are bound, because these real-time (RT) threads cannot be moved automatically to other CPUs.

You can confirm which board has these threads in the following way:

 phys-motu-1# echo ::cycinfo | mdb -k
 CPU  CYC_CPU      STATE    NELEMS  ROOT          FIRE           HANDLER
   0  300000b3900  online        3  300000c7188   2dcfb9483f80   clock
   1  300000b2700  online        2  30001bd4990   2dcfbfd6b700   cluster_heartbeat

Here we see that the cluster_heartbeat cyclic is on CPU 1.

Note: This applies to recent versions of Solaris/Sun Cluster, where cyclics are used to trigger the heartbeat; with older versions, you have to check where the clock thread is located.

  • DR detach operations on a device that is currently configured as a quorum device

An RCM script (SUNW_cluster_quorum_rcm.pl) prevents the removal and suspension of Sun Cluster quorum devices.

  • DR detach operations on active devices in the primary node

DR detach operations on active devices in the primary node are not permitted. DR operations can be performed on non-active devices in the primary node and on any devices in the secondary node. An RCM script (SUNW_cluster_storage_rcm.pl) prevents the removal and suspension of primary paths to the Sun Cluster managed storage devices.

  • DR detach operations on active private interconnect interfaces

DR detach operations cannot be performed on active private interconnect interfaces. The workaround is to disable and remove the interface from the active interconnect.

GUIDELINES

Removing a board containing active components may result in system errors. Before removing a board, the DR subsystem queries other subsystems, such as Sun Cluster subsystems, to determine whether the components on the board are being used; if they are, the operation is rejected. As a result, it is always safe to issue a DR detach operation in a Sun Cluster 3.x environment.

DR Detach of CPU/Memory Board

- Determine if the CPU/Memory Board has the kernel permanent memory.

- From the domain:

   # cfgadm -alv | grep SB | grep permanent

- From the main system controller (SC):

   sc:sms-user:> rcfgadm -d <domain_id> -alv | grep SB | grep permanent
   or from the following:
   sc:sms-user:> showdevices -d <domain_id>
<domain_id> represents the domain (A-R).

For example:

 # cfgadm -alv | grep SB | grep permanent
SB2::memory connected configured ok base address
0x12000000000, 16777216 KBytes total, 2133448 KBytes permanent

If the board being detached contains the kernel permanent memory, the operation will be rejected. For example:

 # cfgadm -c disconnect SB2
System may be temporarily suspended, proceed (yes/no)? yes
cfgadm: Library error:
Resource          Information
--------          -----------
SUNW_OS           Sun Cluster

In the Solaris[TM] 8 OS, the most significantly numbered board fails to drain because Sun Cluster software uses the Real Time Thread scheduler class for both of its heartbeat threads, and there is a problem with RT threads that causes the DR operation to abort:
* The kernel cage is on the most significantly numbered board.
* Copy/rename (relocation) of the kernel cage requires that all threads of execution be suspended.
* With Solaris 8 OS, RT threads cannot be suspended.

In the Solaris[TM] 9 OS, the DR operation advises you of the RT threads and gives you the option to proceed anyway. Suspension of RT threads causes them to stop being real time. If you proceed with the operation without manual workarounds, problems can occur in some RT thread applications. Suspending RT threads for too long can panic the node.

To DR detach the board containing the kernel permanent memory, the node must be shut down and booted in non-cluster mode (in which case using DR is, of course, no longer necessary):

 # /usr/cluster/bin/scswitch -S -h <node>
# shutdown -g0 -y -i0
ok boot -x

If the board does not contain kernel permanent memory, proceed with the detach of the CPU/Memory Board.

- From the domain:

   # cfgadm -c disconnect SBxx

- From the main SC:

   sc:sms-user:> rcfgadm -c disconnect SBxx
   or from the following:
   sc:sms-user:> deleteboard SBxx

A DR Detach of a CPU/Memory Board that does not contain the kernel permanent memory might fail due to the presence of the cluster heartbeat threads. You can have up to NCPUS heartbeat threads.

Usually, there are only two heartbeat threads because there are only two private interconnect links. The heartbeat threads behave like bound threads, but they are not actually bound. Because the heartbeat threads are usually awakened by the clock, they appear to be bound to the CPU running the clock. If one of the heartbeat threads is busy during the DR detach, it prevents DR on the CPU it is using, and the DR operation fails. For example:

 # cfgadm -d A -c disconnect SB12
cfgadm: Hardware specific failure: unconfigure SB12: Failed to
off-line: dr@0:SB12::cpu0

The status of the heartbeat threads cannot be reported using a user command (ps), but they can be reported from some Solaris OS internal structures (for example, clock thread callout tables).

Use the cfgadm -al command to check the state and condition of a board prior to replacing it. The 'Receptacle/Occupant/Condition' fields must indicate disconnected/unconfigured/unknown.
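
For example, for a hypothetical board SB2 after a successful detach (output condensed; the Type column value shown here is illustrative and varies by platform):

    # cfgadm -al | grep SB2
    SB2            CPU          disconnected unconfigured unknown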

The CPU/Memory Board can be physically removed as soon as the board is powered off.

DR Attach of CPU/Memory Board

The DR attach of a CPU/Memory Board is always safe.

- From the domain:

   # cfgadm -c configure SBxx

- From the main SC:

   sc:sms-user:> rcfgadm -c configure SBxx
   or from the following:
   sc:sms-user:> addboard -d <domain_id> SBxx

Use the cfgadm -alv command to check that all the resources have been added to the domain. After the DR attach, Receptacle/Occupant/Condition must be Connected/Configured/Ok.

DR Detach of PCI card

As stated previously, there are some DR considerations for PCI cards.
All the following issues must be considered before proceeding with the DR detach of a PCI card.

1) Determine if the PCI card has any Quorum Devices by using the following to show the status for all device quorums and node quorums:

    # /usr/cluster/bin/scstat -q 

For example:

    # /usr/cluster/bin/scstat -q
[output omitted]
-- Quorum Votes by Device --
                      Device Name           Present  Possible  Status
                      -----------           -------  --------  ------
      Device votes:   /dev/did/rdsk/d14s2   1        1         Online
      Device votes:   /dev/did/rdsk/d15s2   1        1         Online

2) Use the scdidadm command to list the mapping between device entries and DID driver instance numbers.

For example:

    # /usr/cluster/bin/scdidadm -L d14
14 /dev/rdsk/c1t12d0 /dev/did/rdsk/d14

If the DR detach pertains to a quorum device, the operation will be rejected as follows:

    ERROR: Unable to unassign IO3 from domain: B
deleteboard: Library error:
Resource              Information
--------              --------------------------------------------
/dev/rdsk/cxtydzsn    Sun Cluster Quorum Disk (DevID="/dev/did/rdsk/dm")

A new quorum device must be added and the current quorum device must be removed.

Use the /usr/cluster/bin/scsetup command to add a new quorum device and then disable the quorum device that needs to be removed.
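
The scsetup utility is menu driven; as a hedged sketch, the equivalent scconf commands, assuming d20 is the new quorum device and d14 (from the example above) is the one being retired, would look like the following:

    # /usr/cluster/bin/scconf -a -q globaldev=d20
    # /usr/cluster/bin/scconf -r -q globaldev=d14

Run scstat -q afterwards to verify the quorum vote counts.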

3) Determine if the PCI card has any active devices in the primary node by using the following to show the status for all disk device groups:

    # /usr/cluster/bin/scstat -D

For example:

    # /usr/cluster/bin/scstat -D
-- Device Group Servers --
                         Device Group   Primary         Secondary
                         ------------   -------         ---------
  Device group servers:  dg1            phys-schost-1   phys-schost-2
  Device group servers:  dg2            phys-schost-2   phys-schost-1

    -- Device Group Status --
                         Device Group   Status
                         ------------   ------
  Device group status:   dg1            Online
  Device group status:   dg2            Online

Depending on the nature of the Device Group (Solaris[TM] Volume Manager, VERITAS Volume Manager), use the appropriate commands to determine the Solaris devices path for the device group, and then determine if the PCI card to be removed affects an active device group on the current primary node.
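
As a hedged illustration, assuming dg1 is a Solaris Volume Manager diskset and dg2 is a VERITAS Volume Manager disk group (names taken from the example output above), the underlying disks can be listed as follows:

    # metaset -s dg1          (Solaris Volume Manager: lists the hosts and drives in the diskset)
    # vxdisk -g dg2 list      (VERITAS Volume Manager: lists the disks in the disk group)

The c#t#d# paths reported can then be matched against the devices on the PCI card to be removed.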

Before proceeding with the DR detach of a Sun[TM] PCI card that belongs to an active device group on the current primary node, you must switch the primary and secondary nodes:

    # /usr/cluster/bin/scswitch -z -D <device_group> -h <node>
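
For example, to make phys-schost-2 the primary for device group dg1 (names taken from the example output above):

    # /usr/cluster/bin/scswitch -z -D dg1 -h phys-schost-2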

4) Determine if the PCI card has any Active Private Interconnect Interfaces by using the following to show the status for the cluster transport path:

    # /usr/cluster/bin/scstat -W

For example:

    # /usr/cluster/bin/scstat -W
    -- Cluster Transport Paths --
                       Endpoint       Endpoint       Status
                       --------       --------       ------
    Transport path:    phys-1:qfe7    phys-2:qfe7    Path online
    Transport path:    phys-1:qfe3    phys-2:qfe3    Path online

Use the /usr/cluster/bin/scsetup command to disable and remove the interface from the active interconnect. Correct removal of the cable, adapter, or junction can then be verified using the following:

    # /usr/cluster/bin/scconf -p | grep cable
# /usr/cluster/bin/scconf -p | grep adapter
# /usr/cluster/bin/scconf -p | grep junction
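
As a hedged alternative to the scsetup menus, the transport cable can be disabled and then removed directly with scconf; the endpoint below (phys-1:qfe3, from the example output) is illustrative, and the adapter itself can then be removed through scsetup:

    # /usr/cluster/bin/scconf -c -m endpoint=phys-1:qfe3,state=disabled
    # /usr/cluster/bin/scconf -r -m endpoint=phys-1:qfe3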

NOTE: Refer to the Sun Cluster 3.x System Administration Guide for more information about Quorum, Global Devices, and Cluster Interconnects administration, and also for detailed instructions to perform the actions previously mentioned.

5) If all the previous checks have been done AND all I/O device activity is stopped AND all the alternate paths to storage and network (MPxIO, IPMP, and so on) are properly set (refer to the appropriate documentation), a PCI card can be DR detached as follows:

    # cfgadm -c disconnect <Ap_Id>
<Ap_Id> represents the Ap_Id of the PCI card to be removed.

For example:

    # cfgadm -c disconnect pcisch7:e02b1slot2

Use the cfgadm -al command to check the state and condition of the card prior to physically removing the PCI card.
Receptacle/Occupant/Condition must be disconnected/unconfigured/unknown.

For example:

    # cfgadm -al | grep pcisch7:e02b1slot2
pcisch7:e02b1slot2 unknown disconnected unconfigured unknown

Check the logs (/var/adm/messages) to confirm that the DR operation has successfully completed.

Then, the PCI card can be safely removed from the hsPCI I/O Board.

DR Attach of a PCI Card

The DR attach of a PCI card is safe and can be done using the following:

 # cfgadm -c configure <Ap_Id>

Note: <Ap_Id> represents the Ap_Id of the PCI card to be added.

Use the cfgadm -al command to check the state and condition of the card before and after attaching the PCI card. After the DR attach, Receptacle/Occupant/Condition must be Connected/Configured/Ok.

For example:

 # cfgadm -c configure pcisch7:e02b1slot2
# cfgadm -al | grep pcisch7:e02b1slot2
pcisch7:e02b1slot2 ethernet/hp connected configured ok

If any changes were applied to the cluster configuration (quorum device removal, transport path removal, node switch), the system administrator can manually restore the cluster configuration to its previous state.

DR Detach of an hsPCI I/O Board

To DR detach an hsPCI I/O Board from a clustered domain, you must consider all the issues for the PCI card removal as described previously.

1) Check that none of the four PCI cards pertain to the following:

  • A quorum device
  • Active private interconnect interfaces
  • A device in the primary node

2) Check that I/O device activity is stopped and that all the alternate paths to storage and to the network (vxdmp, MPxIO, IPMP, and so on) are properly set.
(Refer to the previous sections for more details.)

3) Check the state and condition of the hsPCI I/O Board prior to detaching it. The Receptacle/Occupant/Condition must be disconnected/unconfigured/unknown.

4) Power off the board.

5) Now, you can safely remove the hsPCI I/O Board from the configuration.

- From the domain:

      # cfgadm -c disconnect IOxx

- From the main SC:

      sc:sms-user:> rcfgadm -c disconnect IOxx
   or from the following:
      sc:sms-user:> deleteboard IOxx

DR Attach of the hsPCI I/O Board

The DR attach of the hsPCI I/O Board is safe.

- From the domain:

   # cfgadm -c configure IOxx

- From the main SC:

   sc:sms-user:> rcfgadm -c configure IOxx
   or from the following:
   sc:sms-user:> addboard -d <domain_id> IOxx

Use the cfgadm -alv command to check that all the resources have been added to the domain. After the DR attach, Receptacle/Occupant/Condition must be Connected/Configured/Ok.

BEST PRACTICES

Best Practices 1:

The following section describes a special boot method for a clustered domain that improves the usability of the DR feature in such a domain.

Technical Background

After a setkeyswitch on operation, DR works with Sun Cluster 3.x on all but two boards in the domain: the most significantly numbered board, because of the presence of the kernel cage, and the least significantly numbered board, because of the potential presence of the heartbeat threads.

Special Boot Method for a Clustered Domain

As an example, your domain consists of SB0-7 and IO0-3. At boot time, you configure your domain with SB0, IO0-3 and key it on.
Once the domain is up, you DR attach each of the other boards in sequence, that is, SB1, then SB2, SB3, and so on until all boards are in the domain.
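
A hedged sketch of this sequence from the main SC, assuming domain A on a Sun Fire 12K/15K running SMS (the midrange platform shell syntax differs):

    sc:sms-user:> addboard -d A SB1
    sc:sms-user:> addboard -d A SB2
    (repeat for SB3 through SB7)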

Benefit and Drawback

Both the kernel cage AND the default "bindings" of the Sun Cluster heartbeat threads will reside on SB0, leaving all but _one_ board, the "boot board," easily detachable using DR. SB0 would then be considered as a system critical resource, like the boot device.

Once a system is booted in this manner, if the system crashes/dstops/hangs, and if sms-svc does automatic system (domain) recovery, the domain will be brought back up in default mode, with SB7 having the cage, and SB0 the heartbeat threads, and so on.
That is, the manual boot process does not survive automated recovery.

Prepare a Domain for Using DR with Sun Cluster

As previously stated, the kernel cage will always be located on the board within the domain to which the highest ordered slice number is assigned.
The PCD address slice mapping can only be modified through interaction with a kernel cage copy/rename operation. Only a copy/rename or a fresh installation of SMS without restoring an SMS backup image can change this mapping.

The idea is to force the kernel cage and the heartbeat threads onto the least significantly numbered board prior to the installation of the Sun Cluster software.

As an example, your domain consists of SB0-7 and IO0-3. Prior to Sun Cluster installation, you create a domain with the final configuration SB0-7 and then DR detach each of the other boards in sequence, that is, SB7, then SB6, and so on, until all boards but SB0 are out of the domain. Hence, SB0 will "host" the kernel cage, because at that point it holds the highest ordered slice number. You can now DR attach all the system boards back into the domain and install the Sun Cluster software.
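
A hedged sketch of this preparation from the main SC, again assuming domain A (board numbers follow the example above):

    sc:sms-user:> deleteboard SB7
    sc:sms-user:> deleteboard SB6
    (repeat down to SB1, leaving only SB0 in the domain)
    sc:sms-user:> addboard -d A SB1
    (repeat up to SB7 to rebuild the full configuration)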

Benefit and Drawback

Both the kernel cage AND the default "bindings" of the Sun Cluster heartbeat threads will reside on SB0, leaving all but _one_ board, the "boot board," easily detachable using DR. SB0 would then be considered a system critical resource, like the boot device. Only a fresh installation of SMS without restoring an SMS backup image (that is, no restoration of the platform configuration information) can change the location of the kernel cage, so this setup survives any crashes/dstops/hangs/ASR.

This option must be considered early in the installation process and must be done prior to Sun Cluster software installation. Nothing should be running in the domain in the way of user jobs and applications at the time you set up the domain. You should examine the PCD using redx after each detach to make sure that the cage has actually ended up on the least significantly numbered board.

Best Practices 2:

Technical Background

In some rare cases, it has been reported from the field that, during the DR detach of a CPU/Memory Board that does not contain the kernel permanent memory, the timeout for fault monitor probes can be reached.

This can cause the associated Sun Cluster Data Service to stop.

This occurs mainly when Intimate Shared Memory (ISM) segments are located on the board being detached from the system, and it can happen even if Solaris 8 patch 117350-05 or Solaris 9 patch 117171-08 is installed.

Note that increasing the probe timeout value (Probe_timeout) does not seem like the right course of action in such a case, mainly because there is no safe way to determine the correct value; see the Sun Cluster Data Services Planning and Administration Guide for Solaris OS for more details.
Such values are configuration dependent and should be sized accordingly, after looking at the alert logs.

The safest approach

In clustered domains, the safest approach is to fail over the cluster service to the other node, perform the DR operations, and then fail back when desired.
This way, no timeout issues will be experienced, and DR works much as it does in a non-clustered domain.
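
As a hedged example, assuming a resource group named oracle-rg and nodes phys-schost-1/phys-schost-2 (illustrative names), the service can be failed over before the DR operation and failed back afterwards:

    # /usr/cluster/bin/scswitch -z -g oracle-rg -h phys-schost-2
    (perform the DR operations on phys-schost-1)
    # /usr/cluster/bin/scswitch -z -g oracle-rg -h phys-schost-1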

Alternate option

An alternate solution is to disable the Data Service fault monitor during the DR operation and to re-enable it after the operation.
This can be done with the scswitch command as follows:
To unmonitor the resource:

    # /usr/cluster/bin/scswitch -M -n -j <resource>

You can then use the scstat command to verify that the resource is reported as "Online but not monitored".
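
For example, assuming a hypothetical resource named oracle-rs:

    # /usr/cluster/bin/scstat -g | grep oracle-rs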

To monitor the resource again:

    # /usr/cluster/bin/scswitch -M -e -j <resource>


Internal Comments
References & Solutions for Dynamic Reconfiguration

Dynamic Reconfiguration for High-End Servers: Part 1 - Planning Phase (Clustered Domains p.41) - BluePrint Part No. 817-5949-10.
Sun Fire[TM] 15K Dynamic Reconfiguration Troubleshooting Guide - http://webhome.emea/sdutille/sf15k/DR_TBS_Guide.html

Document 1001683.1: Sun Fire[TM] 12K/15K: Location and Relocation of Kernel for DR Operations
Document 1017710.1: Sun Fire[TM] Servers: Dynamic Reconfiguration and Intimate Shared Memory
Sun Cluster configuration guide - http://suncluster.eng.sun.com/products/SC3.1/config/sc3ConfigurationGuide-5.htm#46677
Sun Cluster 3.x & Dynamic Reconfiguration (DR) http://sunweb.germany/SSS/MCSC/ET/suncluster/clusttips/sc3xDR.html
Sun Cluster 3.0 System Administration Guide - 806-1423
Sun Cluster 3.1 System Administration Guide - 816-3384
Sun Cluster Data Services Planning and Administration Guide for Solaris OS

For VERITAS Cluster software, refer to the "Veritas Cluster Server Application Note
Sun Fire 12K/15K Dynamic Reconfiguration" available at http://seer.support.veritas.com/docs/254514.htm


Previously Published As 76514

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.