Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1012305.1
Update Date: 2010-08-11

Solution Type  Technical Instruction Sure

Solution  1012305.1 :   Sun Fire[TM] 15K/12K Servers: DR attach/detach operations failing due to portid conflicts arising from third party device drivers  


Related Items
  • Sun Fire 12K Server
  • Sun Fire E25K Server
  • Sun Fire 15K Server
  • Sun Fire E20K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

Previously Published As
216980


Applies to:

Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
Sun SPARC Sun OS

Goal

The dynamic reconfiguration (DR) feature on the Sun Fire[TM] 12K/15K/20K/25K servers enables you to perform hardware configuration changes to a live domain that is running the Solaris[TM] Operating System without causing machine downtime.

This article discusses a known condition in which DR operations on Sun Fire 12K/15K/20K/25K servers fail because of portid conflicts introduced by certain third party device drivers.

Solution

You can execute DR operations from the system controller (SC) by using the SMS commands addboard(1M), moveboard(1M), deleteboard(1M), and rcfgadm(1M).

Alternatively, the same DR operations can be initiated from the domain OS environment via the cfgadm CLI.

When a portid conflict is present, DR attach/detach operations fail with the following signatures:

A. Failed DR attach operations

hgc-6-sc0% date;addboard -da sb0
assign SB0
...
assign SB0 done
poweron SB0
....
poweron SB0 done
test SB0
.........
test SB0 done
connect SB0
......
ERROR: Unable to configure SB0 into domain: A

addboard: Hardware specific failure: connect SB0: Cannot read property value:
Device Node 0x0: property name


The above DR attach failure is accompanied by the following messages in the domain OS message log:

Aug 16 06:22:48 aji drmach: [ID 801593 kern.warning] WARNING: Cannot read property value: Device Node 0x0: property name
Aug 16 06:22:48 aji dcs: [ID 903045 daemon.error] <718> config_change_state: Hardware specific failure: connect



B. Failed DR detach operations

xcat3-sc0:sms-svc:6> deleteboard SB0
request delete capacity (4 cpus)
request delete capacity (524288 pages)
request delete capacity SB0 done
request offline SUNW_cpu/cpu0
request offline SUNW_cpu/cpu1
request offline SUNW_cpu/cpu2
request offline SUNW_cpu/cpu3
request offline SUNW_cpu/cpu0 done
request offline SUNW_cpu/cpu1 done
request offline SUNW_cpu/cpu2 done
request offline SUNW_cpu/cpu3 done
unconfigure SB0
unconfigure SB0 done
notify remove SUNW_cpu/cpu0
notify remove SUNW_cpu/cpu1
notify remove SUNW_cpu/cpu2
notify remove SUNW_cpu/cpu3
notify remove SUNW_cpu/cpu0 done
notify remove SUNW_cpu/cpu1 done
notify remove SUNW_cpu/cpu2 done
notify remove SUNW_cpu/cpu3 done
disconnect SB0

ERROR: Unable to unassign SB0 from domain: A
deleteboard: Hardware specific failure: disconnect SB0: Solaris failed to deprobe: SB0


The above DR detach failure is accompanied by the following messages in the domain OS message log:

Jul 21 23:30:57 nebula2 gptwocfg: [ID 200766 kern.notice] ndi_devi_offline failed
...
Jul 21 23:30:57 nebula2 drmach: [ID 801593 kern.warning] WARNING: Cannot read property value: Device Node 0x0: property name
Jul 21 23:30:57 nebula2 dcs: [ID 678942 daemon.error] <7858> config_change_state: Hardware specific failure: disconnect SB0: Solaris failed to deprobe: SB0
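
Both failure cases above share the same drmach warning, so a quick way to check a domain for this signature is to search its message log for that text. The following is a minimal sketch only: the function name dr_signature_scan is hypothetical, and the log location used in the usage note is an assumption about the domain's syslog configuration.

```shell
#!/bin/sh
# Sketch: print log lines that carry the drmach warning shown in both
# failure examples. Reads log text on standard input.
# The function name is hypothetical, not part of SMS or Solaris.
dr_signature_scan() {
    # Basic regular expression only, so that a plain grep is sufficient.
    grep 'drmach: .*Cannot read property value'
}
```

A possible invocation, assuming the domain logs to /var/adm/messages, is: dr_signature_scan < /var/adm/messages. The function exits non-zero when no matching line is found.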


The root cause of the above failure signatures has been isolated to the following condition: if a device node exists within the domain OS environment whose portid maps to a board involved in the DR attach/detach operation, and that device node's driver instance does not have a "property-name" attribute, the DR operation fails. The two examples above show different symptoms of the same error condition.

In the DR attach failure example detailed above, the device node originating the DR addboard failure was isolated to an instance of the Hitachi HDLM driver dlmfdrv, whilst in the DR detach failure example detailed above, the device node found to be in conflict with SB0's processors was isolated to an instance of the Compaq RAID driver swsp.

Underlying both conflicting portid scenarios is the incorrect assumption that the portid of every device attached to the Safari bus is unique. In both situations depicted above, the third party device driver instance created a duplicate portid.

The DR failure was therefore caused by improper identification of devices, i.e., by relying on the portid property as the primary means of building the device list of a DR operation's target board.
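
The duplicate portid condition can be illustrated with a small sketch. It assumes a simplified, hypothetical input of one "node-name portid" pair per line; how such a list is extracted from the domain's device tree is not covered here, and the function name find_duplicate_portids is likewise hypothetical.

```shell
#!/bin/sh
# Sketch: report every portid value that is claimed by more than one
# device node. Input (stdin): lines of the form "<node-name> <portid>".
# This input layout is an assumption made for illustration only.
# On Solaris 8/9, nawk may be needed in place of awk.
find_duplicate_portids() {
    awk '
        NF >= 2 {
            count[$2]++
            if (nodes[$2] == "") nodes[$2] = $1
            else nodes[$2] = nodes[$2] "," $1
        }
        END {
            for (p in count)
                if (count[p] > 1) print p ": " nodes[p]
        }
    '
}
```

Any line printed by this sketch names a portid shared by several nodes, which is the situation that a lookup keyed on portid alone cannot resolve.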

This issue with the use of portid as the primary means of identifying devices can occur in both Solaris[TM] 8 and Solaris[TM] 9 environments.

The fix to the Solaris Operating Environment is delivered in the following patches:
  • Solaris 8
111335-27 = platform/SUNW,Sun-Fire-15000/kernel/misc/sparcv9/drmach
116979-04 = platform/SUNW,Sun-Fire-15000/kernel/misc/sparcv9/platmod
110836-06 = platform/sun4u/kernel/misc/sparcv9/gptwocfg
  • Solaris 9
The bug fixes are in patch 117124-05, which was integrated into 122300.
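
To confirm whether a domain already carries one of the patch revisions listed above, the installed patch list can be filtered. The sketch below assumes input lines in the "Patch: <id>-<rev> ..." layout printed by showrev -p; the function name patch_at_least is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the patch list on stdin contains patch <base> at
# revision <minrev> or later.
# Usage: patch_at_least <base> <minrev>
# Assumes lines that begin with "Patch: <base>-<rev>", as in showrev -p output.
# On Solaris 8/9, nawk may be needed in place of awk for the -v option.
patch_at_least() {
    awk -v base="$1" -v min="$2" '
        $1 == "Patch:" {
            split($2, id, "-")
            # Adding 0 forces a numeric comparison, so "04" compares as 4.
            if (id[1] == base && id[2] + 0 >= min + 0) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, on a Solaris 8 domain: showrev -p | patch_at_least 111335 27. Note that a patch which has been superseded by a later patch ID, as described for Solaris 9 above, would not be found under its original ID by this check.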

Internal Comments
For internal Oracle/Sun use only
see Bug IDs 4873095 & 4913987
starcat, dr, cfgadm, addboard, deleteboard, deprobe, property, name, portid, attribute, drmach
Previously Published As Doc 80017
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.