Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1008805.1
Update Date:2011-05-09
Keywords:

Solution Type  Problem Resolution Sure

Solution  1008805.1 :   Sun Fire[TM] 12K/15K/E20K/E25K: Remote Dynamic Reconfiguration (DR) generates "DCA/DCS Communication Error" and showdevices reports "Unable to get device information from domain".


Related Items
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
212092
***Checked for relevance on 09-May-2011***

Applies to:

Sun Fire E20K Server
Sun Fire E25K Server
Sun Fire 15K Server
Sun Fire 12K Server
All Platforms

Symptoms

The rcfgadm or showdevices commands generate errors when run from the system controller (SC). The error message might be "DCA/DCS Communication Error". The showdevices command might generate the following error (where x is the domain ID):
# showdevices -v -d x
Unable to get device information from domain x

This showdevices error might also appear in Explorer data from the main SC. The file showdevices_-v_-d_x.out, in the /explorer/sf15k/ directory of the Explorer output, shows the same "Unable to get device information from domain x" error message.

The following messages might also be logged in the platform log file on the SC ($SMSVAR/adm/platform/messages):

xcat-sc0 showdevices[7496]: [0 2197706244444996 ERR ri_init.cc 85] rcfgaRequestProxy->ri_init failed. Status= 4315
xcat-sc0 showdevices[7496]: [4509 2197706254650079 ERR RcfgaCallback.cc 521] server accept failed. RcfgaCallback::serverAccept: failed in ioctl domain id = I

Cause

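These errors are usually caused by a configuration problem on the SC or on the domain: an I1 network interface that is not configured or running, required entries missing from or commented out of key files, or a required daemon that is not running.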

Solution

To resolve the problem:
  • ensure that the network interfaces are properly configured and running,

  • verify that relevant parameters are not commented out of key files,

  • verify that the appropriate daemons are running.

scman0 and dman0

The handshake between dca and dcs takes place over the I1 network. This means that scman0 on the SC and dman0 on the domain must be configured and running properly. This is often overlooked, so be sure to verify it with the following command:

On SC:

# ifconfig -a
scman0: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 3
inet 10.10.1.1 netmask ffffffe0 broadcast 10.10.1.31

On domain:

# ifconfig -a
dman0: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 3
inet 10.10.1.3 netmask ffffffe0 broadcast 10.10.1.31
ether 0:0:be:a8:17:57

Note that the IP addresses and netmasks on the dman0 and scman0 interfaces should match the information stored in the /etc/opt/SUNWSMS/config/MAN.cf file on the SC.

This should be further confirmed by running the following command on the domain:

# ndd /dev/dman man_get_hostinfo
manc_magic = 0x4d414e43
manc_version = 01
manc_csum = 0x0
manc_ip_type = AF_INET
manc_dom_ipaddr = 10.10.1.3
manc_dom_ip_netmask = 255.255.255.224
manc_dom_ip_netnum = 10.10.1.0
manc_sc_ipaddr = 10.10.1.1
manc_dom_eaddr = 0:0:be:a8:17:57
manc_sc_eaddr = 8:0:20:fa:5f:1a
manc_iob_bitmap = 0xa0 io boards = 5.1, 7.1,
manc_golden_iob = 5
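
Because ifconfig prints the netmask in hexadecimal (ffffffe0) while ndd prints it in dotted-decimal form (255.255.255.224), comparing the two by eye is error-prone. The following is a minimal sketch of a conversion helper. The function name hex_to_dotted is hypothetical and is not part of Solaris or SMS.

```shell
#!/bin/sh
# hex_to_dotted (hypothetical helper): convert an 8-digit hexadecimal
# netmask, as printed by ifconfig, to dotted-decimal form, as printed by ndd.
hex_to_dotted() {
    mask=$1
    out=""
    for start in 1 3 5 7; do
        end=$((start + 1))
        # Extract one octet (two hex digits) and convert it to decimal.
        octet=$(printf '%s' "$mask" | cut -c"$start"-"$end")
        out="${out}${out:+.}$((0x$octet))"
    done
    printf '%s\n' "$out"
}

hex_to_dotted ffffffe0    # prints 255.255.255.224
```

The result can then be compared directly with the manc_dom_ip_netmask value reported by ndd.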

Domain Configuration Agent (DCA)

The Domain Configuration Agent (DCA) daemon runs on the SC, one per domain. Similar to a netcon session on a Sun Enterprise[TM] 10000 server, the DCA provides communication between the SC and the Domain Configuration Server (DCS) on the specified domain. If DCA is not running, the showdevices and rcfgadm commands fail.

To verify that DCA is running, issue the following command on the SC:

# ps -ef | grep dca
sms-dca 1614 361 0 Feb 26 0:00 dca -d A
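
A plain grep can also match the grep process itself or an unrelated command line. As a rough sketch, the check can be made stricter by matching the daemon name and its -d argument exactly. The function name has_domain_daemon is hypothetical; it reads ps -ef output on standard input, so it can also be tried against saved output.

```shell
#!/bin/sh
# has_domain_daemon NAME DOMAIN (hypothetical helper)
# Reads `ps -ef` output on stdin and succeeds only if some line ends
# with "NAME -d DOMAIN", for example "dca -d A".
has_domain_daemon() {
    awk -v name="$1" -v dom="$2" '
        NF >= 3 && $(NF-2) == name && $(NF-1) == "-d" && $NF == dom { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the SC (domain A is an assumed example):
#   ps -ef | has_domain_daemon dca A || echo "DCA for domain A is not running"
```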

Domain Configuration Server (DCS)

DCS is a domain daemon process that supports remote dynamic reconfiguration. DCS must also be running on the domain for the showdevices or rcfgadm commands to work on the domain.

If either command fails, check the domain for the following lines in the /etc/inetd.conf file:

sun-dr stream tcp wait root /usr/lib/dcs dcs
sun-dr stream tcp6 wait root /usr/lib/dcs dcs

These lines must be in the /etc/inetd.conf file for the rcfgadm and showdevices commands to work properly. If the lines are not in the file and showdevices fails from the SC, add the lines shown above and have the inetd process reread its configuration as follows:

# ps -ef | grep inetd
root 151 1 0 Mar 11 0:00 /usr/sbin/inetd -s
# kill -HUP 151
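
Since a commented-out entry is as ineffective as a missing one, a scripted check should only accept lines that start with sun-dr. The sketch below assumes the two-entry layout shown above; the function name check_sun_dr_inetd is hypothetical.

```shell
#!/bin/sh
# check_sun_dr_inetd [FILE] (hypothetical helper)
# Succeeds only if FILE (default /etc/inetd.conf) contains active,
# uncommented sun-dr entries for both tcp and tcp6.
check_sun_dr_inetd() {
    conf=${1:-/etc/inetd.conf}
    rc=0
    for proto in tcp tcp6; do
        # A leading "#" makes the line a comment, so anchor the match
        # at the start of the line.
        if ! grep -Eq "^[[:space:]]*sun-dr[[:space:]]+stream[[:space:]]+${proto}[[:space:]]" "$conf"; then
            echo "missing or commented out: sun-dr stream $proto"
            rc=1
        fi
    done
    return "$rc"
}

# Usage on the domain:
#   check_sun_dr_inetd /etc/inetd.conf
```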

For additional information, refer to the dcs(1M) man page.

Note for domains running Solaris[TM] 10 (without patch 120253-02):

The /etc/inetd.conf file is no longer used directly to configure inetd. inetd is now configured through the Service Management Facility (SMF). You can list the SMF services managed by inetd with the inetadm command:

# inetadm
ENABLED   STATE          FMRI
enabled   online         svc:/application/font/stfsloader:default
[output omitted]
disabled  disabled       svc:/network/talk:default
enabled   online         svc:/platform/sun4u/dcs:default
[output omitted]

The svc:/platform/sun4u/dcs service must be enabled and online.

You can get more information about the svc:/platform/sun4u/dcs service and list its properties with the svccfg command:

# /usr/sbin/svccfg -s svc:/platform/sun4u/dcs:default
svc:/platform/sun4u/dcs:default> listprop
general                    framework
general/enabled            boolean   true
restarter                  framework NONPERSISTENT
restarter/auxiliary_state  astring   none
restarter/next_state       astring   none
restarter/state            astring   online
restarter/state_timestamp  time      1117463395.870876000
restarter/contract         count     94
inetd_state                framework NONPERSISTENT
inetd_state/cur_state      integer   1
inetd_state/next_state     integer   13
inetd_state/start_pids     integer
svc:/platform/sun4u/dcs:default> quit

If any dcs processes are running, their PIDs are reported in inetd_state/start_pids.

Note that, on domains running Solaris 10 without patch 120253-02, the dcs process is not running if the SC has not recently communicated with the domain. It is forked by inetd on request (a remote DR request started from the SC). Hence, the PPID of dcs is the inetd PID.

Example:
# ptree 304
159 /usr/sbin/inetd -s
   304 dcs

Note for domains running Solaris[TM] 10 Update 2 (with patch 120253-02):

Due to the fixes for the following bugs, introduced in patch 120253-02, dcs no longer belongs to inetd:

Bug ID 4792021 per-socket level IPsec policy for dynamic reconfiguration
Bug ID 6380945 Changes required for PSARC 2006/038

Since inetd does not support per-socket IPsec, dcs was changed to run standalone. Both dcs and cvcd are controlled by SMF and use SMF properties to define command line options. Hence, running:

inetadm | grep dcs

will not return information about dcs any longer.

Use the following command to get the status from the dcs service:

# svcs dcs
STATE          STIME    FMRI
online        13:53:40 svc:/platform/sun4u/dcs:default
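
For scripted health checks, the state can be extracted from the svcs output instead of being read by eye. The following is a small sketch under that assumption; the function name dcs_state is hypothetical, and it reads svcs output on standard input.

```shell
#!/bin/sh
# dcs_state (hypothetical helper): read `svcs dcs` output on stdin and
# print the STATE column for svc:/platform/sun4u/dcs:default.
dcs_state() {
    awk '$NF == "svc:/platform/sun4u/dcs:default" { print $1 }'
}

# Usage on the domain:
#   [ "$(svcs dcs | dcs_state)" = online ] || echo "dcs is not online"
```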

Note that, on domains running Solaris 10 Update 2 or with patch 120253-02, the dcs process starts at boot time. Due to the new implementation, dcs accepts command line arguments ("-a", "-e", and "-u") that allow the administrator to configure the IPsec encryption and authentication options. The SMF properties map to these options as follows:

  • "ah_auth" corresponds to the "-a" option.
  • "esp_encr" corresponds to the "-e" option.
  • "esp_auth" corresponds to the "-u" option.

See the dcs(1M) man page for more details.

Example:
# ptree 220
220 /usr/lib/dcs -a md5

Note that the dcs process might not be running if the SC has not recently communicated with the domain.

To check whether any process is actually listening on the sun-dr port (port 665), run:

e25ka-dom-c# netstat -an | grep 665
*.665         *.*            0      0 49152      0 LISTEN
*.665         *.*            0      0 49152      0 LISTEN

This verifies that there is indeed some process listening on the sun-dr port, 665. If there is nothing listening on port 665, then the showdevices and addboard / deleteboard commands on the SC can never work properly.
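
Note that grep 665 also matches any other line containing that digit string, such as a port 6650 or a matching window size. A stricter check, sketched below, tests that the local address ends in .665 and that the socket is in the LISTEN state. The function name sun_dr_listening is hypothetical, and the column layout is assumed to match the netstat -an output shown above.

```shell
#!/bin/sh
# sun_dr_listening (hypothetical helper): read `netstat -an` output on
# stdin and succeed only if a socket bound to port 665 is in LISTEN state.
sun_dr_listening() {
    awk '
        $1 ~ /\.665$/ && $NF == "LISTEN" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the domain:
#   netstat -an | sun_dr_listening || echo "nothing is listening on sun-dr (665)"
```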

The /etc/services File

The /etc/services file must also have the following entry on the domain for remote Dynamic Reconfiguration (DR):

sun-dr 665/tcp # Remote Dynamic Reconfiguration

If you are using NIS+, make sure that the above entry is also present in the /etc/services file of the NIS+ server. You can check this using the following command:

$ niscat services.org_dir | grep sun-dr
sun-dr sun-dr tcp 665  Remote Dynamic Reconfiguration

/etc/inet/ipsecinit.conf File on the Domain

On domains running Solaris 9 or earlier, the /etc/inet/ipsecinit.conf file should contain the following entries:

{ dport sun-dr ulp tcp } permit { auth_algs md5 }
{ sport sun-dr ulp tcp } apply { auth_algs md5 sa unique }
{ dport cvc_hostd ulp tcp } permit { auth_algs md5 }
{ sport cvc_hostd ulp tcp } apply { auth_algs md5 sa unique }

If the entries do not exist, add them and then issue:

# ipsecconf -a /etc/inet/ipsecinit.conf

Use the following command to check that the system is now running with these settings:

# ipsecconf

If the domain is running Solaris 10 with patch 120253, the service is managed by SMF and does not need the ipsecinit.conf file.

The /etc/inet/ipsecinit.conf file must NOT be present on the System Controller (SC); otherwise, the failover mechanism might not work properly.

Domain X Server (DXS)

The console command uses DXS.  It is similar to the netcon_server on the Sun Enterprise[TM] 10000 server.  DXS runs on the SC, one per domain. 

To verify that DXS is running, issue the following command on the SC:

# ps -ef | grep dxs
sms-dxs 1609 361 0 Feb 26 0:57 dxs -d A

Console communication takes place over the console bus but can be toggled between the console bus and the I1 network using the ~= command. When the domain is rebooting, a message similar to "dxs disconnecting" appears on the SC. The reboot of a domain causes an hpost -Q, which is a quick POST from the SC.

Sun Fire[TM] 12K/15K/E20K/E25K key management daemon (sckmd)

The sckmd server process resides on a Sun Fire[TM] 12K/15K/E20K/E25K domain.  The sckmd daemon maintains the Internet Protocol Security (IPsec) Security Associations (SAs) needed to secure the communication between the SC and the cvcd and dcs daemons running on the domains.

The sckmd daemon must be running on the domain in order for the "showdevices" or "rcfgadm" commands to work on the domain.

To verify that the sckmd daemon is running, issue the following command on the domain:

# ps -ef | grep sckmd
root 24156 1 0 Apr 02 0:00 /usr/platform/SUNW,Sun-Fire-15000/lib/sckmd

Failure after a Solaris[TM] 10+ OS initial installation

Upon the initial installation of a Solaris 10+ domain, showdevices/rcfgadm will not work successfully.  Running the commands will generate domain-side console messages such as:

Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=ADD, errno=22: Invalid argument, diagnostic code=40: Unsupported authentication algorithm
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=DELETE, errno=3: No such process, diagnostic code=0: No diagnostic
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=ADD, errno=22: Invalid argument, diagnostic code=40: Unsupported authentication algorithm
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=DELETE, errno=3: No such process, diagnostic code=0: No diagnostic
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=ADD, errno=22: Invalid argument, diagnostic code=40: Unsupported authentication algorithm
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=DELETE, errno=3: No such process, diagnostic code=0: No diagnostic

To fix this, on the domain, issue the command:

    # ipsecalgs -s

For a more detailed explanation on this issue, please see Bug ID 6233334
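
To recognize this condition from saved messages rather than from the live console, the log can be searched for the sckmd PF_KEY signature. This is only a sketch: the function name needs_ipsecalgs_sync is hypothetical, and the assumption that the console messages are also written to /var/adm/messages is not stated in this document.

```shell
#!/bin/sh
# needs_ipsecalgs_sync (hypothetical helper): read log text on stdin and
# succeed if sckmd reported an unsupported authentication algorithm while
# adding a security association.
needs_ipsecalgs_sync() {
    grep -q 'sckmd: PF_KEY error: type=ADD.*Unsupported authentication algorithm'
}

# Usage on the domain (log file location is an assumption):
#   needs_ipsecalgs_sync < /var/adm/messages && echo "run: ipsecalgs -s"
```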

Failure after a Solaris[TM] 10 Update 2 installation or after installing 120253-02 on Solaris[TM] 10

After an upgrade to Solaris[TM] 10 Update 2 or after installation of the patch, the dcs service might fail to come online and stay in maintenance state, with the dcs process not running:

Jul 27 13:50:30 inetd[284]: Unspecified inetd_start method for instance svc:/platform/sun4u/dcs:default
Jul 27 13:50:30 inetd[284]: Invalid configuration for instance svc:/platform/sun4u/dcs:default, placing in maintenance
Jul 27 13:50:30 inetd[284]: Invalid configuration for instance svc:/platform/sun4u/dcs:default, placing in maintenance

# svcs dcs
STATE          STIME    FMRI
maintenance    13:52:23 svc:/platform/sun4u/dcs:default

Check why dcs never came online by looking at the /etc/svc/volatile/platform-sun4u-dcs:default.log log file.

# svcs -xv
svc:/platform/sun4u/dcs:default (domain configuration server)
State: maintenance since Thu 20 Jul 2006 13:50:30 AM MEST
Reason: Start method failed repeatedly, last exited with status 1.
See: http://sun.com/msg/SMF-8000-KS
See: man -M /usr/share/man -s 1M dcs
See: /etc/svc/volatile/platform-sun4u-dcs:default.log
Impact: This service is not running.

As long as dcs is not available, rcfgadm and showdevices do not work. To fix this, restart the service on the domain:

# svcadm disable dcs
# svcadm enable dcs
# svcs dcs
STATE          STIME    FMRI
online         13:53:40 svc:/platform/sun4u/dcs:default

If /usr is on a separate partition, use the workaround for CR# 6453706 to define a dependency on the /usr filesystem:

# svccfg -s svc:/platform/sun4u/dcs
svc:/platform/sun4u/dcs> addpg SUNW,workaround dependency
svc:/platform/sun4u/dcs> setprop SUNW,workaround/entities = fmri:svc:/system/filesystem/local
svc:/platform/sun4u/dcs> setprop SUNW,workaround/grouping = astring: require_all
svc:/platform/sun4u/dcs> setprop SUNW,workaround/restart_on = astring: none
svc:/platform/sun4u/dcs> setprop SUNW,workaround/type = astring: service
svc:/platform/sun4u/dcs> exit
# svcadm refresh svc:/platform/sun4u/dcs
# svcs -d dcs
STATE          STIME    FMRI
online         17:46:06 svc:/network/loopback:default
online         17:46:09 svc:/system/identity:node
online         17:46:22 svc:/system/filesystem/local:default

Failure after an upgrade to Solaris[TM] 10 Update 2

After an upgrade to Solaris[TM] 10 Update 2, the dcs service might fail to come online and stay in maintenance state, with the dcs process not running:

Sep 19 10:57:55 inetd[250]: Property 'name' of instance svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 10:57:55 inetd[250]: Property 'endpoint_type' of instance svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 10:57:55 inetd[250]: Property 'isrpc' of instance svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 10:57:55 inetd[250]: Property 'wait' of instance svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 10:57:55 inetd[250]: Unspecified inetd_start method for instance svc:/platform/sun4u/dcs:default
Sep 19 10:57:55 inetd[250]: Invalid configuration for instance svc:/platform/sun4u/dcs:default, placing in maintenance


# svcs -xv
svc:/platform/sun4u/dcs:default (domain configuration server)
State: maintenance since Tue Sep 19 10:57:55 2006
Reason: Restarter svc:/network/inetd:default gave no explanation.
See: http://sun.com/msg/SMF-8000-9C
See: man -M /usr/share/man -s 1M dcs
Impact: This service is not running.

The new manifest /var/svc/manifest/platform/sun4u/dcs.xml provided by 120253-02 (bundled in Solaris 10 Update 2) has not been applied properly, so inetd is still trying to start dcs. The general/restarter property of the dcs service should now be startd, no longer inetd:

# svcprop dcs
general/enabled boolean true
general/entity_stability astring Unstable
general/restarter fmri svc:/network/inetd:default
dcs/ah_auth astring md5
[...]

See CR# 6472374 for more details.
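
The restarter check can be scripted by looking for the general/restarter property in the svcprop output. The sketch below assumes svcprop prints one property per line as "name type value"; the function name dcs_restarter is hypothetical. When the property is absent, it reports the default restarter, startd.

```shell
#!/bin/sh
# dcs_restarter (hypothetical helper): read `svcprop dcs` output on stdin
# and print the configured restarter, or note that the default applies.
dcs_restarter() {
    awk '
        $1 == "general/restarter" { print $3; found = 1 }
        END { if (!found) print "startd (default)" }
    '
}

# Usage on the domain:
#   svcprop dcs | dcs_restarter
# A result of svc:/network/inetd:default indicates the stale configuration
# described above.
```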

To fix this problem, the new manifest must be imported using the following procedure :

# svcs dcs
STATE          STIME    FMRI
maintenance    10:57:55 svc:/platform/sun4u/dcs:default

# svcadm disable dcs

# Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Property 'name' of instance svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Property 'endpoint_type' of instance svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Property 'isrpc' of instance svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Property 'wait' of instance svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Unspecified inetd_start method for instance svc:/platform/sun4u/dcs:default

# svcs dcs
STATE          STIME    FMRI
disabled       11:02:13 svc:/platform/sun4u/dcs:default

# svccfg -v delete dcs
# svcs dcs
svcs: Pattern 'dcs' doesn't match any instances
STATE          STIME    FMRI

# svccfg -v import /var/svc/manifest/platform/sun4u/dcs.xml
svccfg: Taking "initial" snapshot for svc:/platform/sun4u/dcs:default.
svccfg: Taking "last-import" snapshot for svc:/platform/sun4u/dcs:default.
svccfg: Refreshed svc:/platform/sun4u/dcs:default.
svccfg: Successful import.

# svcs dcs
STATE          STIME    FMRI
disabled       11:03:04 svc:/platform/sun4u/dcs:default

# svcadm enable dcs
# svcs dcs
STATE          STIME    FMRI
online         11:03:20 svc:/platform/sun4u/dcs:default

# svcs -p dcs
STATE          STIME    FMRI
online         11:03:20 svc:/platform/sun4u/dcs:default
11:03:20      717 dcs

# svcprop dcs
general/enabled boolean false
general/entity_stability astring Unstable
dcs/ah_auth astring md5
[...]

 

Note that when no general/restarter is mentioned, the default restarter, startd, is used.

Note: In certain instances this workaround is not the complete fix. On some systems an inetconv command has been run, which creates two services called sun-dr. These prevent the DCS service from starting even after the above workaround has been followed.

To check for this condition:
# svcs -xv
svc:/platform/sun4u/dcs:default (domain configuration server)
State: maintenance since Thu Nov 15 19:16:38 2007
Reason: Restarter svc:/network/inetd:default gave no explanation.
  See: http://sun.com/msg/SMF-8000-9C
  See: man -M /usr/share/man -s 1M dcs
Impact: This service is not running.

# svcs -a | grep sun-dr
online         -             19:14:48      - svc:/network/sun-dr/tcp6:default
online         -             19:14:48      - svc:/network/sun-dr/tcp:default

To clear this condition:
1. Remove the two sun-dr lines from /etc/inetd.conf
2. svcadm disable svc:/network/sun-dr/tcp:default
3. svcadm disable svc:/network/sun-dr/tcp6:default
4. svccfg delete -f svc:/network/sun-dr/tcp:default
5. svccfg delete -f svc:/network/sun-dr/tcp6:default
6. rm /var/svc/manifest/network/sun-dr-tcp.xml
7. rm /var/svc/manifest/network/sun-dr-tcp6.xml
8. svcadm disable svc:/platform/sun4u/dcs
9. svccfg delete -f svc:/platform/sun4u/dcs
10. svccfg -v import /var/svc/manifest/platform/sun4u/dcs.xml
11. svcadm enable svc:/platform/sun4u/dcs
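
Steps 2 through 11 above can be collected into a script so that the order is not mistyped. The following is a cautious sketch, not a supported tool: the names DRY_RUN, run, and clear_stale_sun_dr are hypothetical, and by default the commands are only printed, not executed. Step 1 (editing /etc/inetd.conf) is left as a manual step.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical): with DRY_RUN=1 (the default) each
# command is only printed; set DRY_RUN=0 to actually execute it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 2-11 of the procedure, in the documented order.
# No error handling is attempted; review the output before using DRY_RUN=0.
clear_stale_sun_dr() {
    run svcadm disable svc:/network/sun-dr/tcp:default
    run svcadm disable svc:/network/sun-dr/tcp6:default
    run svccfg delete -f svc:/network/sun-dr/tcp:default
    run svccfg delete -f svc:/network/sun-dr/tcp6:default
    run rm /var/svc/manifest/network/sun-dr-tcp.xml
    run rm /var/svc/manifest/network/sun-dr-tcp6.xml
    run svcadm disable svc:/platform/sun4u/dcs
    run svccfg delete -f svc:/platform/sun4u/dcs
    run svccfg -v import /var/svc/manifest/platform/sun4u/dcs.xml
    run svcadm enable svc:/platform/sun4u/dcs
}

clear_stale_sun_dr
```

Running the script as shown prints the ten commands prefixed with "+", which allows the sequence to be reviewed on the domain before anything is changed.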

Starcat, 12k, 15k, dca, dcs, dr, DCA/DCS communication error, E20K, E25K, Unable to get device information from domain, rcfgadm, showdevices, console, ipsecinit.conf, svcs
Previously Published As 51772



Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.