Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1019625.1
Update Date:2010-08-02
Keywords:

Solution Type  Problem Resolution Sure

Solution  1019625.1 :   fmd event transport unable to support more than 6 running domains  


Related Items
  • Sun SPARC Enterprise M8000 Server
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M9000-64 Server
Related Categories
  • GCS>Sun Microsystems>Servers>OPL Servers

PreviouslyPublishedAs
242526


Symptoms
Booting a 7th domain will cause the XSCF event-transport module to fail.
See CR 6716103

Impact:
----------
When the FMA 'event-transport' module is in the failed state, fault events are no longer exchanged between the domains and the XSCF. This prevents the XSCF from taking the appropriate action for faults diagnosed on a domain (e.g., offlining a CPU or deconfiguring a DIMM because of too many correctable errors).

  1. On the XSCF
    • FMA module "event-transport" is in failed status
      • Example:
            XSCF> fmadm config
            MODULE                   VERSION STATUS  DESCRIPTION
            case-close               1.0     active  Case-Close Agent
            event-transport          2.0     failed  Event Transport Module
            faultevent-post          1.0     active  Gate Reaction Agent for errhandd
            flush                    1.10    active  Resource Cache Flush Agent
            fmd-self-diagnosis       1.0     active  Fault Manager Self-Diagnosis
            iox_agent                1.0     active  IO Box Recovery Agent
            reagent                  1.16    active  Reissue Agent
            sde                      1.16    active  Simple Diagnosis Engine
            snmp-trapgen             1.0     active  SNMP Trap Generation Agent
            sysevent-transport       1.0     active  SysEvent Transport Agent
            syslog-msgs              1.0     active  Syslog Messaging Agent


    • ereport.fm.fmd.module with msg = event-transport request to create an auxiliary thread exceeds module thread limit (8)
      • Example:
            From 'fmdump -eV errlog':

            Sep 09 2008 15:53:17.274465420 ereport.fm.fmd.module
            nvlist version: 0
                    version = 0x0
                    class = ereport.fm.fmd.module
                    detector = (embedded nvlist)
                    nvlist version: 0
                            version = 0x0
                            scheme = fmd
                            authority = (embedded nvlist)
                            nvlist version: 0
                                    version = 0x0
                                    product-id = SPARC Enterprise M9000
                                    chassis-id = 2020643005
                                    server-id = san-dc2-1-0
                            (end authority)

                            mod-name = event-transport
                            mod-version = 2.0
                    (end detector)

                    ena = 0x5090126c9883c001
                    msg = event-transport request to create an auxiliary thread exceeds module thread limit (8)
                    __ttl = 0x1
                    __tod = 0x48c6fe5d 0x105c028c
  2. On the Domains
    • Hundreds of ereport.fm.fmd.module events with msg = Failed to read S_HELLO from dev:///sp0: Resource temporarily unavailable
      and/or ereport.fm.fmd.module events with msg = Failed to write C_HELLO to dev:///sp0: Transport endpoint is not connected
      • Example:
        From "fmdump -eV"

        TIME                           CLASS
        Sep 10 2008 17:33:01.422924350 ereport.fm.fmd.module
        nvlist version: 0
                version = 0x0
                class = ereport.fm.fmd.module
                detector = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = fmd
                        authority = (embedded nvlist)
                        nvlist version: 0
                                version = 0x0
                                product-id = SUNW,SPARC-Enterprise
                                chassis-id = 2020643005
                                server-id = san-dc2-1-g
                        (end authority)

                        mod-name = event-transport
                        mod-version = 2.0
                (end detector)

                ena = 0x5e60bdf957002401
                ...
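
The symptoms above can be checked against captured command output. What follows is a minimal sketch, assuming the output of 'fmadm config' (from the XSCF) or 'fmdump -eV' (from a domain) has been saved to a text file and is scanned on an ordinary workstation. The function names failed_modules and count_hello_errors are hypothetical helpers, not part of XSCF or Solaris.

```shell
#!/bin/sh
# Hypothetical helpers for scanning captured output; not Sun-supplied tools.

# Print the name of every module whose STATUS column is "failed".
# Input: text in 'fmadm config' format on stdin.
failed_modules() {
    awk '$3 == "failed" { print $1 }'
}

# Count ereports whose msg reports a failed S_HELLO read or C_HELLO write.
# Input: text in 'fmdump -eV' format on stdin.
# grep -c exits non-zero when the count is 0, so the status is neutralised.
count_hello_errors() {
    grep -c -E 'msg = Failed to (read S_HELLO|write C_HELLO)' || true
}
```

For example, with the XSCF listing saved as fmadm_config.txt (an assumed file name), 'failed_modules < fmadm_config.txt' prints event-transport when the module is in the state shown above.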

Resolution
      • In the TWO files below (on the active AND standby XSCF),
            /hcp1/scfprog/init/scf_initrc/11cmemready/S29setfmconf
            /hcp0/scfprog/init/scf_initrc/11cmemready/S29setfmconf


            change the last line of S29setfmconf (using the 'vi' command):

                exit 0

            to

            ${EGREP} -n 'setprop *client.thrlim'  ${FILE} >/dev/null 2>&1
            if  [ $? -ne 0 ]; then
                echo "setprop client.thrlim 48" >> ${FILE}
            fi

            exit 0
      • Run rebootxscf on the active XSCF unit.
      • Check that the thread limit is now 48 with the following command:
               XSCF> fmstat -a -m event-transport
               NAME                        VALUE             DESCRIPTION
               error_drop_read             0                Dropped read messages
               error_post_filter           0                Post filter errors
               ...
               fmd.thrlimit                48               limit on number of auxiliary threads
      • Reboot all running domains.
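
The lines added to S29setfmconf append the property only when no client.thrlim setting is already present, so running the script repeatedly leaves a single 'setprop client.thrlim 48' entry. The guard and the follow-up check can be sketched as below, under stated assumptions: plain 'grep -E' stands in for the script's ${EGREP}, the target file is passed as an argument because the real ${FILE} is not shown in this article, and ensure_thrlim and thrlimit_from_fmstat are hypothetical names. It is intended for scratch copies and captured output, not for a live XSCF.

```shell
#!/bin/sh
# Hypothetical helpers; try them on scratch files, not on a live XSCF.

# Append "setprop client.thrlim 48" to the given file unless a
# client.thrlim setting is already there (same guard as in S29setfmconf).
ensure_thrlim() {
    conf=$1
    if ! grep -E 'setprop *client.thrlim' "$conf" >/dev/null 2>&1; then
        echo "setprop client.thrlim 48" >> "$conf"
    fi
}

# Print the VALUE column of the fmd.thrlimit row.
# Input: text in 'fmstat -a -m event-transport' format on stdin.
thrlimit_from_fmstat() {
    awk '$1 == "fmd.thrlimit" { print $2 }'
}
```

Calling ensure_thrlim twice on the same file adds the line only once, and thrlimit_from_fmstat should print 48 when fed the fmstat output captured after rebootxscf.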


Product
Sun SPARC Enterprise M8000
Sun SPARC Enterprise M9000

Keywords
event-transport, ereport.fm.fmd.module, thread limit, S_HELLO
Product_uuid
2eb6b8a2-ce94-11db-9135-080020a9ed93
51e8feab-ce93-11db-9135-080020a9ed93

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.