Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1010760.1
Update Date:2010-06-16
Keywords:

Solution Type  Technical Instruction Sure

Solution  1010760.1 :   Sun Fire[TM] 15K/12K/E20K/E25K Servers: What Happens in a DR Slot0 Attach Operation  


Related Items
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
214863


Description
The Dynamic Reconfiguration (DR) feature on the Sun Fire[TM] 15K/12K/E20K/E25K servers lets you change the hardware configuration of a live domain that is running the Solaris[TM] Operating System (OS) without causing machine downtime.


Steps to Follow
DR may also be used in conjunction with hot-swap functionality to physically 
remove boards from, or add them to, the server.
DR operations can be executed from the System Controller (SC) by using the System Management Services (SMS) commands addboard(1M), moveboard(1M), deleteboard(1M), and rcfgadm(1M). On the domain, the corresponding command is cfgadm(1M).
THE DR FRAMEWORK 
The following is an architectural overview of the DR infrastructure that is referenced in this document:
  SC				DR Capable Domain
------			  .	-----------------
  -------------                  .
 ( rcfgadm(1M) )------|          .
  -------------       |          .
                      |          .
  -----------------   |          .
 ( showdevices(1M) )--|          .
  -----------------   |          .
                      |          .
  ---------------     |          .
 ( moveboard(1M) )----|          .    
  ---------------     |          .
                      |          .
  --------------      |   -----  .  -----      ------------
 ( addboard(1M) )-----|--( DCA )---( DCS )    ( cfgadm(1M) )
  --------------      |   ----- ^.  -----      ------------
                      |        / .    |           |
  -----------------   |       /  .    -------------
 ( deleteboard(1M) )--|      /   .          |
  -----------------         /    .     --------------
                    network i/f  .    ( libcfgadm(4) )
                                 .     --------------
                                 .          |
                                 .    ---------------     --------
                                 .   ( cfgadm_plugins )--( librcm )
                                 .    ---------------     --------
                                 .          |                 |
                                 .          |             ------------ 
                                 .          |            ( rcm_daemon )
                                 .          |             ------------ 
                                 .          |                 |
                                 .          |            ----------------
                                 .          |            |              |
                                 .          |         ---------      --------
    --------------------         .          |        ( scripts )    (modules )
   ( "Hardware Control" )        .          |         ---------      --------
    --------------------         .          |                  
                  \              . . . . . .|. . . . . . . . . . . . . . . . .
                   \             .          |                  
                    \            .     -----------
                     \           .    ( DR driver )--------------|
                      \          .     -----------               |
                       \         .          |           ------------------
                        \        .          |           |        |        |
                         \       .          |      ---------   ------   -----
                      ( hardware i/f )------|     ( NDI/DDI ) ( CPUs ) ( MEM )
                                 .                 ---------   ------   -----
Domain Configuration Server(DCS) 
The DCS listens for incoming DR requests and lets applications on the SC, such as rcfgadm (the remote version of cfgadm), 
control DR operations on the domain. DCS exports the full functionality of the libcfgadm framework through a 
secure network protocol.
libcfgadm 
This is the main module of the libcfgadm library. It exports the config_admin interface, a generic interface that is used by DR. 
Under this arrangement, each piece of hardware that supports DR must supply a hardware-specific plug-in library. 
A primary function of libcfgadm is therefore to locate, load, and call the correct hardware-specific plug-in library for the 
hardware type involved in the DR operation.

cfgadm_plugins 
This document focuses on the libcfgadm plug-in for System Board (slot0) DR: the cfgadm_sbd plug-in, which resides in 
/usr/platform/sun4u/lib/cfgadm. This library provides DR functionality for connecting, configuring, unconfiguring, and 
disconnecting system boards of class sbd. It also enables you to connect a system board to, or disconnect it from, a running 
system without rebooting.
DR DRIVER 
The DR driver consists of a platform-independent driver (dr) and a platform-specific module (drmach). The DR driver uses 
standard features of the Solaris OS to control DR operations and calls the platform-specific module as needed. The DR driver is 
also responsible for maintaining board information and checking state transitions.
The drmach driver provides platform-specific DR functionality. On Sun Fire 15K/12K servers, the drmach driver works in 
concert with the In-Kernel Probing (IKP) subsystem to identify the devices on a system board. For example, the drmach driver 
decides that a device belongs to the system board if the device has a valid port-id that maps to the system board and the 
device has a "name" property. Devices such as CPUs, AXQs, and memory controllers have this property.
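The membership test described above can be sketched as follows. This is an illustration only, not the drmach source. The rule that the expander number is the port-id shifted right by five bits is an assumption inferred from the examples in this document (CPU IDs starting at 0x80 on SB4, AXQ id 0x9e at EX4), and the sketch ignores the slot0 / slot1 distinction. The dictionary layout and the function names are hypothetical.

```python
def expander_of(port_id):
    """Return the expander number for a port-id.

    Assumption: expander = port_id >> 5. This matches the examples in this
    document (0x80 and 0x9e both map to expander 4) but is not taken from
    the driver source.
    """
    return port_id >> 5


def belongs_to_board(device, board_expander):
    """Hypothetical version of the drmach check.

    A device is accepted only if it has a valid port-id that maps to the
    board's expander AND it carries a "name" property.
    """
    port_id = device.get("portid")
    if port_id is None or port_id < 0:
        return False  # no valid port-id
    return expander_of(port_id) == board_expander and "name" in device
```

For example, belongs_to_board({"portid": 0x80, "name": "cpu"}, 4) is accepted, while the same device without a "name" property is rejected.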
SAFARI CONFIGURATOR 
A Safari device is one that is connected to a port of a Safari bus. The Safari Configurator defines the specification for a 
common interface for manipulating CPUs, I/O, graphics, and memory controllers, which gives the devices a generic way to 
interface with one another. 
Because the Safari bus is deployed on many different platforms, one loadable module is needed to mediate 
between board drivers and the generic Safari Configurator. On Sun Fire 15K/12K domains, a platform-specific 
loadable module (sc_gptwocfg) sits between the DR driver and the Safari Configurator (gptwocfg). This module is responsible for 
mediating transactions among the DR driver, the Safari Configurator, and the FCODE interpreter. 
It provides three functions:
- sc_probe_board ()
- sc_unprobe_board ()
- sc_next_node ()

THE DR SB ATTACH OPERATION 
The DR attach command is issued from the main SC: 
v4u-15ka-sc0:sms-svc:2> addboard -d A SB4
The main point to note here is the loading of the SMS library libscdr, which provides the Remote DR (RDR) interface 
to the DR CLI. This allows the CLI to send a DR request to the Domain Configuration Agent (DCA).
DCS 
A socket connection between DCA and DCS is set up with a TCP/IP three-way handshake over the I1 network 
link. This socket connection is the medium over which RDR requests and replies are exchanged. 
DCS receives an incoming RDR_CONF_CHANGE_STATE request to change the configuration state of an attachment point. In 
the sample DR operation in this document, the ap_id concerned is "sb4", and the state change command to execute (that 
is, CFGA_CMD_CONFIGURE) is carried in the state_change_cmd field. For more examples of the possible 
configuration state changes, refer to the man pages for config_admin (Configuration Administration 
Library Functions). 
The CFGA_CMD_CONFIGURE argument is passed on to libcfgadm's config_change_state() function. 
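As a rough model of the hand-off described above, the sketch below shows a request that carries an ap_id list and a state_change_cmd, and a handler that forwards both to a stand-in for config_change_state(). It does not reproduce the RDR wire format. The class name, the ap_ids field name, the handler, and the use of a string for the command are all hypothetical. Only state_change_cmd, "sb4", and CFGA_CMD_CONFIGURE come from this document.

```python
from dataclasses import dataclass

# Symbolic stand-in for the command constant; not its real numeric value.
CFGA_CMD_CONFIGURE = "CFGA_CMD_CONFIGURE"


@dataclass(frozen=True)
class ConfChangeStateRequest:
    """Hypothetical in-memory view of an RDR_CONF_CHANGE_STATE request."""
    ap_ids: tuple          # attachment points named in the request
    state_change_cmd: str  # for example CFGA_CMD_CONFIGURE


def handle_request(request, change_state):
    """Forward the command and ap_ids to a config_change_state() stand-in."""
    return change_state(request.state_change_cmd, list(request.ap_ids))


# The sample attach in this document would correspond to:
sample = ConfChangeStateRequest(ap_ids=("sb4",),
                                state_change_cmd=CFGA_CMD_CONFIGURE)
```

Passing the libcfgadm entry point in as a parameter keeps the sketch independent of any real library binding.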
libcfgadm 
The config_change_state() function processes the CFGA_CMD_CONFIGURE request. 
For each ap_id named in the state change operation, it finds and loads the correct cfgadm plug-in. 
Because the SB's ap_id is classified as "sbd", the plug-in loaded for the sample DR SB4 attach operation is 
sbd.so.1. 
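The plug-in selection step can be pictured with the short sketch below. It is a simplification under stated assumptions, not the libcfgadm lookup algorithm. The prefix-to-class table is hypothetical and holds only the one mapping this document mentions, and the "<class>.so.1" file naming is generalised from the single sbd.so.1 example. The directory is the one named earlier in this document.

```python
import posixpath

# Plug-in directory named in this document.
PLUGIN_DIR = "/usr/platform/sun4u/lib/cfgadm"

# Hypothetical table: only the "sb" -> "sbd" entry comes from this document.
AP_CLASS_BY_PREFIX = {"sb": "sbd"}


def ap_class(ap_id):
    """Classify an ap_id such as "SB4" by its alphabetic prefix."""
    prefix = ap_id.rstrip("0123456789").lower()
    try:
        return AP_CLASS_BY_PREFIX[prefix]
    except KeyError:
        raise ValueError("no plug-in class known for ap_id %r" % ap_id)


def plugin_path(ap_id):
    """Return the assumed path of the plug-in library for this ap_id."""
    return posixpath.join(PLUGIN_DIR, ap_class(ap_id) + ".so.1")
```

With these assumptions, plugin_path("SB4") yields /usr/platform/sun4u/lib/cfgadm/sbd.so.1.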
cfgadm_plugin(sbd plugin) 
The plug-in issues a status ioctl, using ap_stat(), against the physical ap_id, that is: 
       /devices/pseudo/dr@0:SB4 
     to ensure that the board number returned by the driver matches the
     plug-in's notion of the board number, as extracted from the ap_id.
It then issues a GetNCM (that is, get number of components) ioctl (SBD_CMD_GETNCM) against the SB involved. This should 
return a value of 0, because components resident on an SB in a disconnected state should not be "visible" to the OS. This 
is followed by a GetSTATUS ioctl (SBD_GET_STATUS). 
The GetNCM / GetSTATUS operations are facilitated in part by the dr / drmach pair: dr_pre_op() / dr_post_op() 
with CMD=STATUS or GETNCM directs drmach to initiate a SHOWBOARD request. The data returned in the 
SHOWBOARD reply establishes the "no device present" state reported to the GetNCM operation.
DR DRIVER -- ASSIGN / POWERON / TEST 
ap_seq() "exec assign" against the ap_id involved in this DR operation initiates dr_ioctl ( SBD_CMD_ASSIGN ). 
dr_pre_op ( CMD=ASSIGN ) checks the validity of the requested state transition before drmach 
initiates an ASSIGN request. The ensuing ASSIGN reply leads to dr_post_op ( CMD=ASSIGN ) and 
drmach_log_sysevent (). 
ap_seq() "exec poweron" against the ap_id involved in this DR operation initiates dr_ioctl ( SBD_CMD_POWERON ). 
dr_pre_op ( CMD=POWERON ) checks the validity of the requested state transition before drmach 
initiates a POWERON request. The ensuing POWERON reply leads to dr_post_op ( CMD=POWERON ) and 
drmach_log_sysevent (). 
ap_seq() "exec test" against the ap_id involved in this DR operation initiates dr_ioctl ( SBD_CMD_TEST ). 
dr_pre_op ( CMD=TEST ) checks the validity of the requested state transition before drmach initiates 
a TESTBOARD request. The ensuing TESTBOARD reply leads to dr_post_op ( CMD=TEST ) and 
drmach_log_sysevent (). 
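The three steps above share one pattern: validate the state transition, send a request to the SC, then run the post-operation and log a system event. The sketch below models only that ordering. It is not driver code. The function names run_step and attach_prepare, the callback parameters, and the log strings are hypothetical. The ioctl and request names are the ones used in this document.

```python
# (ioctl command, request sent to the SC), in the order given in this document.
STEPS = [
    ("SBD_CMD_ASSIGN", "ASSIGN"),
    ("SBD_CMD_POWERON", "POWERON"),
    ("SBD_CMD_TEST", "TESTBOARD"),
]


def run_step(ioctl_cmd, request, state_ok, send_request, log):
    """Model one pre_op / request / post_op cycle."""
    log.append("dr_pre_op(%s)" % ioctl_cmd)
    if not state_ok(ioctl_cmd):
        # The transition check fails before any request is sent.
        raise RuntimeError("invalid state transition for %s" % ioctl_cmd)
    reply = send_request(request)
    log.append("%s reply: %s" % (request, reply))
    log.append("dr_post_op(%s)" % ioctl_cmd)
    log.append("drmach_log_sysevent()")


def attach_prepare(state_ok, send_request):
    """Run ASSIGN, POWERON, and TEST in order and return the event log."""
    log = []
    for ioctl_cmd, request in STEPS:
        run_step(ioctl_cmd, request, state_ok, send_request, log)
    return log
```

Calling attach_prepare(lambda cmd: True, lambda req: "ok") produces four log entries per step, in the order the text describes.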
DR DRIVER -- CONNECT 
ap_seq() "exec connect" against the ap_id involved in this DR operation initiates dr_ioctl ( SBD_CMD_CONNECT ). 
dr_pre_op ( CMD=CONNECT ) checks the validity of the requested state transition before dr_connect() is called. 
dr_connect() proceeds only if it is called to operate on an entire board that does not already have components present in the 
domain.
drmach_board_connect() is responsible for building the CASM information portion of the subsequent CLAIM request to 
the SC.
The drmach-initiated CLAIM request to the SC carries information from the 18-entry table in the AXQ that defines which 
expander contains the home memory of each 128 GByte range of the physical address space, selected by PA[41:37], and whether 
the slots in this expander have permission to send these transactions. For example, the sample DR attach of SB4 involves a 
CLAIM request that states which expanders do and do not house memory resident to this domain at 
this point in time. That is, it continues to report no memory resident at EX#4:
v4u-15ka-a drmach: exp4: val=0 slice=0x0
v4u-15ka-a drmach:   MC  0: MADR[0] =0x0, MADR[1] = 0x0
v4u-15ka-a drmach:        : MADR[2] =0x0, MADR[3] = 0x0
v4u-15ka-a drmach:   MC  1: MADR[0] =0x0, MADR[1] = 0x0
v4u-15ka-a drmach:        : MADR[2] =0x0, MADR[3] = 0x0
v4u-15ka-a drmach:   MC  2: MADR[0] =0x0, MADR[1] = 0x0
v4u-15ka-a drmach:        : MADR[2] =0x0, MADR[3] = 0x0
v4u-15ka-a drmach:   MC  3: MADR[0] =0x0, MADR[1] = 0x0
v4u-15ka-a drmach:        : MADR[2] =0x0, MADR[3] = 0x0
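The 128 GByte granularity mentioned above follows directly from the PA[41:37] bit field: address bits below bit 37 are an offset within a range, so each range spans 2^37 bytes. The sketch below shows only that bit arithmetic. It says nothing about how the AXQ table or the drmach slice values are actually encoded, and the constant and function names are hypothetical.

```python
SLICE_SHIFT = 37               # lowest bit of the PA[41:37] field
SLICE_MASK = 0x1F              # five bits: PA[41:37]
SLICE_BYTES = 1 << SLICE_SHIFT  # size of one address range in bytes


def slice_of(pa):
    """Return the PA[41:37] field of a physical address."""
    return (pa >> SLICE_SHIFT) & SLICE_MASK
```

Here SLICE_BYTES equals 128 * 2**30, and any address in the second range, such as SLICE_BYTES itself, gives slice_of(pa) == 1.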
When the SC's CLAIM reply arrives through the IOSRAM-based mailbox, drmach initiates the Safari Configurator phase 
through sc_probe_board() in preparation for the CONFIGURE phase.
DR DRIVER -- CONFIGURE 
ap_seq() "exec configure" against the ap_id involved in this DR operation initiates dr_ioctl ( SBD_CMD_CONFIGURE ), 
which invokes sc_configure() in sc_gptwocfg, the Sun Fire 12K/15K/E20K/E25K specific configurator module, against 
slot0 at EX4. The ensuing sc_find_axq_node ( id = 0x9e ) against AXQ0 at EX4 steps through the device tree to 
verify that the AXQ is not already configured into this domain. 
In the expected case, where sc_find_axq_node ( id = 0x9e ) returns 0, the generic Safari Configurator module, 
gptwocfg, calls gptwo_configure_axq ( id = 0x9e ) and adds the device node. 
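The find-then-configure logic for the AXQ node can be modelled as a check followed by a conditional insert. The sketch below is a toy model, not the gptwocfg implementation: the device tree is a plain dictionary keyed by port-id, the function name is hypothetical, and returning False when the node already exists is an assumption, since this document describes only the expected case.

```python
def configure_axq(device_tree, port_id):
    """Add an AXQ node for port_id unless one is already present.

    Returns True if a node was added, False if it already existed
    (the latter behaviour is an assumption for illustration).
    """
    if port_id in device_tree:
        # Stand-in for sc_find_axq_node() finding an existing node.
        return False
    # Stand-in for gptwo_configure_axq() adding the device node.
    device_tree[port_id] = {"name": "axq", "portid": port_id}
    return True
```

For the sample attach, configure_axq({}, 0x9e) adds the node and returns True, and a second call on the same tree returns False.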
Next, it accesses the Global DCD Structure from the golden IOSRAM using sc_gptwocfg's sc_get_common_pcd() 
and then runs dump_pcd() against CPU IDs 0x80 to 0x84 ( resident at SB4 ), along with the requisite 
information about their DIMMs / Ecache banks. The returned agentID of each processor allows the gptwo_cpu 
module to configure the CPU ( using gptwo_configure_cpu() ) and create the device node for the 
associated memory-controller ( using gptwocfg_create_mc_node() ). 
The newly configured data is then maintained as "cookies" before the CASM's slice table is updated using the 
drmach module and the LPA settings are re-programmed. 
This state allows dr_pre_op ( CMD = CONFIGURE ) to prepare and validate the ensuing state transition, that 
is: 
       dr_pre_attach_cpu() --> i_ndi_block_device_tree_changes() 
      to allow the drmach module's drmach_configure() to walk the DDI
      branch and initiate the online operation against the four CPUs.
This drives dr_post_attach_cpu()'s COLD START initialization of SB4's processors and transitions them into 
the expected CONFIGURED state. The same process is repeated to enable the memory-controllers that are resident on 
the same SB. 
ap_seq() "exec notify online" against the ap_id involved in this DR operation initiates a GetNCM (that is, get number of 
components) ioctl (SBD_CMD_GETNCM) against the SB involved. This should return a value of 5 ( four CPUs + 
one memory-controller ). This is followed by a GetSTATUS ioctl (SBD_GET_STATUS). 
The GetNCM / GetSTATUS operation is facilitated in part by the dr / drmach pair. That is, dr_pre_op() / dr_post_op() 
with CMD=STATUS or GETNCM directs drmach to initiate a SHOWBOARD request. The data returned in the 
SHOWBOARD reply determines the value returned to the GetNCM operation.
The final step of the DR attach process is to notify the RCM of the current amount of 
configured memory and the current number of configured CPUs through the sbd class plug-in's ap_rcm_cap_cpu() and 
ap_rcm_cap_mem(). 
NOTE: Although mostly concerned with DR detach operations, readers should be familiar with the changes to the kernel cage 
under Solaris 9 OS KU patch 118558-05. See Technical Instruction < Solution: 217037 > for more information.



Product
Sun Fire 15K Server
Sun Fire 12K Server
Sun Fire E25K Server
Sun Fire E20K Server

12K, 15K, 20K, 25K, starcat, DR, dynamic reconfiguration, casm, IKP, sc_gptwocfg, drmach, librcm, sbd, cfgadm_sbd, dcs, dca, config_admin, libcfgadm
Previously Published As
76338

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.