Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1006489.1
Update Date:2010-06-07
Keywords:

Solution Type  Technical Instruction Sure

Solution  1006489.1 :   Sun Fire[TM] 12K/15K/E20K/E25K: Main System Controller's Community Network (C1) interface failure causes a failover onto the Spare SC.  


Related Items
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
209085


Description
A failure of all configured C1 (Public) network interfaces causes the SMS Main System Controller (SC) role to fail over to the Spare SC and the outgoing Main SC to reboot.

This is normal behavior, designed into the SC failover mechanism on Sun Fire[TM] 12K, 15K, E20K, and E25K systems.



Steps to Follow
When the C1 network is configured with IPMP on the SCs and the IPMP "test" network interface fails its health check (typically a ping test to a router, or a multicast to the network), IPMP tries to fail over the logical IP address to another member of the IPMP group.

If the IPMP group is completely unavailable, the Solaris[TM] "policing" action takes over. At that point the Failover Management Daemon (fomd) may have to take action and possibly force a failover to the Spare SC.

The following messages are logged in the Main SC's /var/adm/messages file before it reboots:

Sep 02 22:00:52 e25k-sc0 in.mpathd[140]: [ID 168056 daemon.error] All Interfaces in group C1 have failed
Sep 02 22:02:35 e25k-sc0 genunix: [ID 672855 kern.notice] syncing file systems...
Sep 02 22:02:35 e25k-sc0 genunix: [ID 904073 kern.notice]  done
Sep 02 22:06:02 e25k-sc0 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.9 Version Generic_117171-12 64-bit

The $SMSVAR/adm/platform/messages file will have the following error messages logged:

Sep 02 22:00:54 2005 e25k-sc0 fomd[513]: [8569 1638658827360354 NOTICE FailoverMgr.cc 1279] The external network test FAILED
Sep 02 22:02:23 2005 e25k-sc0 fomd[513]: [8567 1638747925688988 NOTICE FailoverMgr.cc 1956]
Failing over to the spare SC because of the following faults on the main SC: External Network Failure
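
As a rough aid for spotting this signature in collected log files, the following Python sketch scans lines of text for the three messages shown above. The function name and the labels it returns are illustrative only and are not part of SMS or Solaris. The patterns are derived solely from the sample messages in this document and may need adjusting for other releases.

```python
import re

# Patterns taken from the sample log lines in this document (assumption:
# other SMS/Solaris releases may word these messages differently).
PATTERNS = [
    ("ipmp_group_failed",
     re.compile(r"in\.mpathd\[\d+\]:.*All Interfaces in group (\S+) have failed")),
    ("xnet_test_failed",
     re.compile(r"fomd\[\d+\]:.*The external network test FAILED")),
    ("failover_external_network",
     re.compile(r"Failing over to the spare SC.*External Network Failure")),
]


def find_c1_failover_events(lines):
    """Return (label, line) tuples for every line matching a known pattern."""
    events = []
    for line in lines:
        for label, pattern in PATTERNS:
            if pattern.search(line):
                events.append((label, line.rstrip("\n")))
                break  # one label per line is enough
    return events


sample = [
    "Sep 02 22:00:52 e25k-sc0 in.mpathd[140]: [ID 168056 daemon.error] "
    "All Interfaces in group C1 have failed",
    "Sep 02 22:02:35 e25k-sc0 genunix: [ID 672855 kern.notice] syncing file systems...",
    "Sep 02 22:00:54 2005 e25k-sc0 fomd[513]: [8569 1638658827360354 NOTICE "
    "FailoverMgr.cc 1279] The external network test FAILED",
    "Failing over to the spare SC because of the following faults on the main SC: "
    "External Network Failure",
]

for label, _line in find_c1_failover_events(sample):
    print(label)
```

In practice the lines would come from copies of /var/adm/messages and $SMSVAR/adm/platform/messages read with ordinary file I/O.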

There can be various reasons why both interfaces on the C1 network fail. Typical scenarios include, but are not restricted to:

  • Routers, bridges, or the end devices connected to the C1 network ports are not responding.
  • IPMP is not configured properly for the C1 network.
  • Worst case, but least probable: both network interfaces are malfunctioning.

The showfailover -v command, run on the Main SC, shows the status of the C1 network:

Status of e25k-sc1:
Role:                    .......................................MAIN
:
:
Public Network:
Group "C1":      .........................................Up
eri0:              .........................................Up
eri3:              .........................................Up
Logical IP Addr. - C1:.........................................Up
:
:
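
For scripted checks, the status lines above can be picked apart with a small parser. The Python sketch below assumes only the "name: .....status" layout visible in this excerpt. The function names and the list of entries to watch are hypothetical, and the elided parts of the real showfailover -v output are not modeled.

```python
import re

# "name:  ........status" as seen in the excerpt above.
LINE_RE = re.compile(r"^\s*(?P<name>.+?):\s*\.{2,}\s*(?P<status>\S+)\s*$")

# Entries from the excerpt that describe the C1 network (illustrative list).
C1_ENTRIES = ['Group "C1"', "eri0", "eri3", "Logical IP Addr. - C1"]


def parse_showfailover(text):
    """Map each 'name: ....status' line to {name: status}."""
    status = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            status[match.group("name").strip()] = match.group("status")
    return status


def c1_not_up(status, wanted=C1_ENTRIES):
    """Return the watched entries whose status is anything other than 'Up'."""
    return {name: status[name] for name in wanted
            if name in status and status[name] != "Up"}


sample = '''Status of e25k-sc1:
Role:                    .......................................MAIN
Public Network:
Group "C1":      .........................................Up
eri0:              .........................................Up
eri3:              .........................................Up
Logical IP Addr. - C1:.........................................Up
'''

parsed = parse_showfailover(sample)
print(parsed["Role"], c1_not_up(parsed))
```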

The ifconfig -a command on the Main SC gives details about the C1 network interfaces:

eri0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 10.0.0.5 netmask ffffff00 broadcast 10.0.0.255
        groupname C1
        ether 0:3:ba:6b:d6:48
eri0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 10.0.0.10 netmask ffffff00 broadcast 10.0.0.255
eri3: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
        inet 10.0.0.15 netmask ffffff00 broadcast 10.0.0.255
        groupname C1
        ether 0:3:ba:6b:d6:49
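
To confirm which interfaces belong to the C1 group and which address each one carries, the output above can be parsed mechanically. The following Python sketch is an illustration under the assumption that the output has the same shape as this sample. The function names are hypothetical, and only the flags, inet, and groupname fields are extracted.

```python
import re

# Header line such as "eri0:1: flags=1000843<UP,BROADCAST,...> mtu 1500 index 2".
HEADER_RE = re.compile(r"^\s*(\S+): flags=[0-9a-fA-F]+<([^>]*)>")


def parse_ifconfig(text):
    """Return a list of dicts with name, flags, inet and group per interface."""
    interfaces = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {
                "name": header.group(1),
                "flags": header.group(2).split(","),
                "inet": None,
                "group": None,
            }
            interfaces.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first interface header
        words = line.split()
        if len(words) >= 2 and words[0] == "inet":
            current["inet"] = words[1]
        elif len(words) >= 2 and words[0] == "groupname":
            current["group"] = words[1]
    return interfaces


def group_members(interfaces, group):
    """Names of the interfaces that report the given IPMP group name."""
    return [i["name"] for i in interfaces if i["group"] == group]


sample = """eri0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
inet 10.0.0.5 netmask ffffff00 broadcast 10.0.0.255
groupname C1
ether 0:3:ba:6b:d6:48
eri0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 10.0.0.10 netmask ffffff00 broadcast 10.0.0.255
eri3: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
inet 10.0.0.15 netmask ffffff00 broadcast 10.0.0.255
groupname C1
ether 0:3:ba:6b:d6:49
"""

parsed = parse_ifconfig(sample)
print(group_members(parsed, "C1"))
```

In this sample the logical interface eri0:1 carries the failover address 10.0.0.10 and shows no groupname line of its own, so the sketch reports only the two physical interfaces as group members.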


Product
Sun Fire E25K Server
Sun Fire E20K Server
Sun Fire 15K Server
Sun Fire 12K Server

Internal Comments
Some internals of how the Community network test works on the Main SC:

XNetTest is a thread that belongs to fomd (the Failover Management Daemon).


The function of the XNetTest thread is to test the external network.


This test checks for the presence of incoming packets on each of the external network interface adapters. This information is retrieved using the Solaris kstat routines.


For each of the configured external network interfaces, the test retrieves the current rxpkt (Received Packet) count, waits for two seconds, and then retrieves the rxpkt count again.


The two counts are then compared. If they differ then the given network interface is considered to have passed the test.


If the two counts are equal, the test then tries to generate incoming packets on the given network interface by pinging the all-routers multicast address (224.0.0.2), the all-hosts multicast address (224.0.0.1), and finally the subnet broadcast address (x.x.x.255). If all of these pings fail to generate any incoming packets on the given network interface, that interface is considered to have failed the test.


If there is at least one good interface on a given SC, XNetTest posts TEST_PASSED to the FM; otherwise it posts TEST_FAILED. A TEST_FAILED result leads to an SC failover if it is posted on the Main SC.
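
The test sequence described above can be sketched in Python as follows. This is not the fomd source code. The function names, the injected read_rxpkt and ping callables (stand-ins for the kstat lookup and the real ping), and the choice to re-check the counter after each ping rather than after all three are assumptions made for illustration.

```python
import time


def interface_passes(iface, read_rxpkt, ping, broadcast_addr,
                     wait=time.sleep, interval=2.0):
    """Return True if incoming packets are seen on iface."""
    before = read_rxpkt(iface)
    wait(interval)                      # two-second pause from the description
    if read_rxpkt(iface) != before:
        return True                     # traffic arrived on its own
    # No traffic: try to provoke replies, in the order given in the text.
    for target in ("224.0.0.2", "224.0.0.1", broadcast_addr):
        ping(iface, target)
        if read_rxpkt(iface) != before:
            return True
    return False


def xnet_test(ifaces, read_rxpkt, ping, broadcast_addr, wait=time.sleep):
    """One good interface is enough for the SC to pass."""
    ok = any(interface_passes(i, read_rxpkt, ping, broadcast_addr, wait)
             for i in ifaces)
    return "TEST_PASSED" if ok else "TEST_FAILED"


# Demonstration with made-up counters and a ping that never gets an answer.
counters = {"eri0": 100, "eri3": 200}


def fake_read(iface):
    return counters[iface]


def silent_ping(iface, target):
    pass


print(xnet_test(["eri0", "eri3"], fake_read, silent_ping, "10.0.0.255",
                wait=lambda seconds: None))
```

Passing the counter reader and the ping as parameters keeps the sketch runnable on any host. On a real SC they would be replaced by calls that read the interface statistics and send the pings.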


If routing over the I2 network is allowed, an XNetTest failure on the Main SC will never trigger a failover.


An XNetTest failure on the Spare SC will always disable the failover mechanism, because there is no advantage in failing over to the Spare SC if the Spare is unreachable from the external network.
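
The rules above can be summarized as a small decision function. The Python sketch below covers only the XNetTest result. The function name, its parameters, and the returned strings are hypothetical, and the other faults and health checks that fomd weighs before acting are deliberately left out.

```python
def xnet_failover_action(role, xnet_result, i2_routing_allowed=False):
    """Return the action implied by an XNetTest result on the given SC.

    role is "main" or "spare"; xnet_result is "TEST_PASSED" or "TEST_FAILED".
    """
    if role not in ("main", "spare"):
        raise ValueError("role must be 'main' or 'spare'")
    if xnet_result == "TEST_PASSED":
        return "none"
    if role == "main":
        # With routing over I2 allowed, an XNetTest failure never fails over.
        return "none" if i2_routing_allowed else "failover_to_spare"
    # Spare SC unreachable from the external network: failing over is pointless.
    return "disable_failover"


for role in ("main", "spare"):
    print(role, xnet_failover_action(role, "TEST_FAILED"))
```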


References :
 <Document: 1018896.1> (Sun Fire[TM] 12K/15K/E20K/E25K: Management Networks (MAN))

Starcat System Controller Failover Software Specification.


SC, C1 network, Community network, 12K, 15K, 20K, 25K, SMS, fomd, MAN Network
Previously Published As
82854

Change History
Updated by the ESG Knowledge Content Team 4/2010
Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.