Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1007966.1
Update Date: 2011-05-11
Keywords:

Solution Type  Problem Resolution Sure

Solution 1007966.1: Sun Fire[TM] 12K/15K/E20K/E25K servers: Can't setkeyswitch domains. POST aborts with ecode=2


Related Items
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
210986


Applies to:

Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms

Symptoms

The System Management Services (SMS) setkeyswitch command fails to change the domain state to standby, off, or on, instead aborting the POST process with ecode=2.

Example of setkeyswitch command-line output:
$ setkeyswitch -d a on
[50000] file open failed: file=/var/opt/SUNWSMS/SMS1.4.1/.lock/A/setkeyswitch, ecode=2
[50002] file unlink failed: file=/var/opt/SUNWSMS/SMS1.4.1/.lock/A/setkeyswitch, ecode=2
[5311] setkeyswitch lock acquisition failed: ecode=2
Example from the file /var/opt/SUNWSMS/adm/<domain_id>/messages, logged after running the setkeyswitch command against a specific domain:
May 31 10:44:23 2005 nscssc01 setkeyswitch[9467]-A(): [50000 64834661870356 ERR setKeyswitchLock.cc 74] file open failed:
file=/var/opt/SUNWSMS/SMS1.4.1/.lock/A/setkeyswitch, ecode=2
May 31 10:44:23 2005 nscssc01 setkeyswitch[9467]-A(): [50002 64834674914132 ERR setKeyswitchLock.cc 75] file unlink failed:
file=/var/opt/SUNWSMS/SMS1.4.1/.lock/A/setkeyswitch, ecode=2
May 31 10:44:23 2005 nscssc01 setkeyswitch[9467]-A(): [5311 64834675904421 ERR setKeyswitchLock.cc 76] setkeyswitch lock acquisition failed: ecode=2
The usual hpost log file is not generated in the domain directory, because hpost has not started at this point. The error appears in the terminal where the setkeyswitch command was executed or in the /var/opt/SUNWSMS/adm/<domain_id>/messages file.
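
To confirm the symptom from the log rather than from the terminal, the domain's messages file can be searched for entries logged from setKeyswitchLock.cc, the source file named in the messages above. The following is a minimal sketch for a POSIX shell. The function name show_lock_errors and its arguments are hypothetical; the default path is the one quoted in this document.

```shell
# Hypothetical helper: print setkeyswitch lock errors from a domain's messages file.
# $1 = domain ID, exactly as it appears under the adm directory
# $2 = adm base directory (assumed default taken from this document)
show_lock_errors() {
    domain=$1
    adm_dir=${2:-/var/opt/SUNWSMS/adm}
    # All three errors in the example are logged from setKeyswitchLock.cc,
    # so match on that source file name.
    grep 'setKeyswitchLock\.cc' "$adm_dir/$domain/messages"
}
```

The file= path is logged on a continuation line, so it does not appear in this output; open the messages file at the reported timestamp to see it.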

Changes

Manual intervention.

Cause

Within the SMS directory structure under /var/opt/SUNWSMS, the symbolic link /var/opt/SUNWSMS/.lock points to the directory /var/opt/SUNWSMS/SMS/.lock. The .lock directory contains one subdirectory for each domain, named A through R.

If a domain has ever been keyswitched on, its domain directory (A through R) has a file in it, named setkeyswitch. This file is opened by the setkeyswitch process at the time of a setkeyswitch operation, as a means of knowing that a keyswitch operation is in progress.

This can be observed by using fuser. For example:

With a keyswitch operation running for domain A:
$ fuser /var/opt/SUNWSMS/.lock/A/setkeyswitch
/var/opt/SUNWSMS/.lock/A/setkeyswitch:    13775o
With no setkeyswitch operation running:
$ fuser /var/opt/SUNWSMS/.lock/A/setkeyswitch
/var/opt/SUNWSMS/.lock/A/setkeyswitch:
In the case that prompted the creation of this document, the /var/opt/SUNWSMS/.lock link was broken, and the /var/opt/SUNWSMS/SMS/.lock directory was empty, i.e. it had none of the domain directories, A through R.

The setkeyswitch operation was unable to create (or open) its lockfile, as the directory where it wanted to open it did not exist. The setkeyswitch code is not designed to expect this, and fails.
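
A quick way to check for this condition is to test that the .lock link resolves and that each domain directory exists. The following is a minimal sketch for a POSIX shell; the function name check_lock_tree is hypothetical, and the default base directory is the one described above.

```shell
# Hypothetical helper: report a broken .lock link or missing domain directories.
# $1 = SMS base directory (assumed default taken from this document)
check_lock_tree() {
    base=${1:-/var/opt/SUNWSMS}
    rc=0
    # test -d follows symbolic links, so a dangling .lock link fails here.
    if [ ! -d "$base/.lock" ]; then
        echo "missing or broken: $base/.lock"
        rc=1
    fi
    # One directory is expected per domain, A through R.
    for d in A B C D E F G H I J K L M N O P Q R; do
        if [ ! -d "$base/.lock/$d" ]; then
            echo "missing domain directory: $base/.lock/$d"
            rc=1
        fi
    done
    return $rc
}
```

The function returns 0 when the link and all eighteen domain directories are present, and 1 otherwise. It checks existence only, not ACLs or permissions.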

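The only attempts made to open or create the file are on the file itself; no attempt is made to create the missing parent directory. Both attempts fail with ENOENT (error 2), which is the ecode=2 reported by setkeyswitch: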
18054:  open("/var/opt/SUNWSMS/SMS1.4.1/.lock/A/setkeyswitch", O_WRONLY|O_SYNC|O_CREAT|O_EXCL, 0666) Err#2 ENOENT
18054:  open("/var/opt/SUNWSMS/SMS1.4.1/.lock/A/setkeyswitch", O_WRONLY) Err#2 ENOENT
The only known way for this to happen is for someone to remove components of the SMS directory structure.
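
The failure mode can be reproduced safely in a scratch directory, away from the SMS tree. The sketch below is only an illustration for a POSIX shell: try_create_lock is a hypothetical name, and the shell's noclobber option stands in for the exclusive create seen in the trace. It is not the code that setkeyswitch runs.

```shell
# Hypothetical demonstration: try to create a lock file without creating
# its parent directory, as the trace above shows setkeyswitch doing.
# $1 = path of the lock file to create
try_create_lock() {
    lockfile=$1
    # set -C (noclobber) makes ">" refuse to overwrite an existing file,
    # similar in spirit to O_CREAT|O_EXCL. The subshell keeps the option
    # from leaking into the caller.
    ( set -C; : > "$lockfile" ) 2>/dev/null
}
```

Calling try_create_lock on a path whose parent directory does not exist returns a non-zero status, because the shell does not create intermediate directories either. Once the parent directory exists, the first call succeeds.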

Solution

The fix for this issue is a little difficult, because the directories in /var/opt/SUNWSMS/.lock have ACLs (Access Control Lists), and these ACLs form part of the SMS security mechanism.

The directories cannot simply be re-created. The directories, together with their ACLs and permissions, need to be recovered from somewhere.

Essentially, there are only two places the directories could come from:
  1. Installation of the SMS packages
  2. The current spare System Controller (SC).
NOTE: smsbackup does NOT back up the setkeyswitch directories.
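
Before copying anything, it can help to record the permissions and ACLs of the domain directories on the spare SC, so they can be compared after the restore. The following is a minimal sketch for a POSIX shell. The function name show_lock_perms is hypothetical, and it assumes the ACLs are readable with the getfacl command; the ACL step is skipped if getfacl is not found.

```shell
# Hypothetical helper: list permissions and ACLs of each domain lock directory.
# $1 = SMS base directory (assumed default taken from this document)
show_lock_perms() {
    base=${1:-/var/opt/SUNWSMS}
    for dir in "$base"/SMS/.lock/*; do
        # Skip anything that is not a directory (including an unmatched glob).
        [ -d "$dir" ] || continue
        ls -ld "$dir"
        # Assumption: getfacl is the right tool for these ACLs on the SC.
        if command -v getfacl >/dev/null 2>&1; then
            getfacl "$dir"
        fi
    done
}
```

Saving the output from the spare SC and comparing it with the output on the repaired SC gives a simple check that the tar p option preserved what was expected.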

To recover from the files available on the spare SC:

 1. Log in to the spare SC

 2. Move to the directory /var/opt/SUNWSMS

 3. Tar the directory ./SMS/.lock, for example:    
tar cvfp lock.tar ./SMS/.lock
================================================
Now, move to the SC with the problem, and...
================================================

 4. Using ftp or some other remote copying program, copy the lock.tar file from the spare SC to the main SC, into the directory /var/opt/SUNWSMS

 5. cd to the directory /var/opt/SUNWSMS

 6. Untar the file lock.tar, for example:  
tar xvfp lock.tar
 7. Create the .lock symbolic link, for example:
ln -s ./SMS/.lock .lock
 8. Make the SC with the problem the main SC

 9. Test the setkeyswitch command to ensure complete functionality.
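
The tar and ln steps above can be sketched as two small functions for a POSIX shell. The function names pack_lock_tree and restore_lock_tree are hypothetical, and the check for an already existing .lock link is an addition not found in the original procedure. Copying the archive between SCs (step 4), failing over (step 8), and testing setkeyswitch (step 9) remain manual.

```shell
# Hypothetical helper for steps 2-3: run on the spare SC.
# $1 = SMS base directory (assumed default taken from this document)
pack_lock_tree() {
    base=${1:-/var/opt/SUNWSMS}
    ( cd "$base" && tar cvfp lock.tar ./SMS/.lock )
}

# Hypothetical helper for steps 5-7: run on the SC with the problem.
# $1 = SMS base directory, $2 = archive name relative to that directory
restore_lock_tree() {
    base=${1:-/var/opt/SUNWSMS}
    archive=${2:-lock.tar}
    (
        cd "$base" || exit 1
        tar xvfp "$archive" || exit 1      # step 6
        # If a .lock link already exists and now resolves, nothing more to do.
        if [ -d .lock ]; then
            exit 0
        fi
        # A link that still dangles after the restore needs a human decision.
        if [ -h .lock ]; then
            echo "$base/.lock still does not resolve; inspect it by hand" >&2
            exit 1
        fi
        ln -s ./SMS/.lock .lock            # step 7
    )
}
```

Both functions run in a subshell, so the caller's working directory is unchanged. The p option is kept on both tar commands, as in the original procedure, so that permissions are carried across.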

If there are continuing issues, please engage an Oracle Service representative to help resolve the issue.

Additional Information

Note: While this is a reasonable means of remediation, the questions that should be asked are:
  • How did this happen?
  • What happened to cause this?
  • What else might have been damaged or removed?
If these questions cannot be answered, and it is uncertain what else may have been impacted, then perhaps the best solution would be to remove and re-install the SMS software, then complete the configuration from the beginning.

See also "Sun Fire[TM] 12K/15K/E20K/E25K: Incorrect permission(s) on SMS Directories" Doc ID 1006613.1, which discusses other potential causes of the ecode=2 error.


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.