Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1010814.1
Update Date: 2009-01-06

Solution Type: Problem Resolution Sure Solution

Solution 1010814.1: Sun StorEdge[TM] T3/T3B/Sun StorEdge[TM] 3900 Series/Sun StorEdge[TM] 6120 array: Performance impact due to Ping-Pong Effect


Related Items
  • Sun Storage 6320 System
  • Sun Storage T3 Array
  • Sun Storage T3+ Array
  • Sun Storage 6120 Array
  • Sun Storage 3910 Array

Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - Other

Previously Published As
214943


Symptoms
Heavy performance impact, and possible loss of storage to an application
(depending on load and setup), due to the so-called "Ping-Pong effect".

Note: The Ping-Pong effect is only possible with Partner Pair
configurations. It will not occur in a Workgroup configuration.

The Sun StorEdge[TM] T3/T3B/Sun StorEdge[TM] 3900 Series/Sun StorEdge[TM] 6120
arrays are Active/Passive arrays and, if configured incorrectly, can
occasionally display the so-called "Ping-Pong effect". This is where the
array constantly switches the Active path of a Logical Unit Number (LUN)
between the primary and secondary controllers, back and forth without
stopping. This can cause a heavy performance impact, and possible loss of
storage to the application, depending on load and setup.

This document lists the conditions under which this can occur, and how it
can be avoided.

During this behavior, messages similar to the following will be seen:

 FCC0[2]: W: u1ctr starting lun 0 failover
CFGT[1]: N: u1ctr: LUN 0 initiating failover
FCC0[2]: W: u1ctr starting lun 1 failover
FCC0[1]: N: u1ctr (ITL 7D 1 6 TT 20 TID D3F4 OP 2A) Busy response
FCC0[1]: N: u1ctr (ITL 7D 1 5 TT 20 TID D400 OP 28) Busy response
FCC0[1]: N: u1ctr (ITL 7D 1 1 TT 20 TID D40C OP 0) Busy response
FCC0[1]: N: u1ctr (ITL 7D 1 0 TT 20 TID D418 OP 2A) Busy response
FCC0[1]: N: u1ctr (ITL 7D 1 5 TT 20 TID D424 OP 28) Busy response
FCC0[1]: N: u1ctr (ITL 7D 1 6 TT 20 TID D430 OP 2A) Busy response
FCC0[1]: N: u1ctr (ITL 7D 1 1 TT 20 TID D43C OP 0) Busy response
FCC2[1]: N: u1ctr (ITL F E 0 TT 20 TID B568 OP 0) Target in Unit Attention
CFGT[1]: N: u1ctr: LUN 0 failover granted
CFGT[1]: N: u1ctr: LUN 1 initiating failover
CFGT[1]: N: u1ctr: LUN 1 failover granted
 FCC0[2]: W: u2ctr starting lun 0 failover
CFGT[1]: N: u2ctr: LUN 0 initiating failover
FCC0[2]: W: u2ctr starting lun 1 failover
FCC0[1]: N: u2ctr (ITL 7D 1 6 TT 20 TID D3F4 OP 2A) Busy response
FCC0[1]: N: u2ctr (ITL 7D 1 5 TT 20 TID D400 OP 28) Busy response
FCC0[1]: N: u2ctr (ITL 7D 1 1 TT 20 TID D40C OP 0) Busy response
FCC0[1]: N: u2ctr (ITL 7D 1 0 TT 20 TID D418 OP 2A) Busy response
FCC0[1]: N: u2ctr (ITL 7D 1 5 TT 20 TID D424 OP 28) Busy response
FCC0[1]: N: u2ctr (ITL 7D 1 6 TT 20 TID D430 OP 2A) Busy response
FCC0[1]: N: u2ctr (ITL 7D 1 1 TT 20 TID D43C OP 0) Busy response
FCC2[1]: N: u2ctr (ITL F E 0 TT 20 TID B568 OP 0) Target in Unit Attention
CFGT[1]: N: u2ctr: LUN 0 failover granted
CFGT[1]: N: u2ctr: LUN 1 initiating failover
CFGT[1]: N: u2ctr: LUN 1 failover granted

Note: Repeated messages like the above will be seen, coming from both u1ctr
and u2ctr. This indicates failover going back and forth between the primary
and secondary controllers.

On the host side, depending on the load and on what is configured at the
time, the following could be seen:

  • SCSI errors
  • MPxIO failover messages
  • DMP failover messages

This can affect the following products:

  • Sun StorEdge[TM] T3
  • Sun StorEdge[TM] T3B
  • Sun StorEdge[TM] 6120/6020/6320
  • Sun StorEdge[TM] 3900 Series

Note: Since a Sun StorEdge 39x0 Series is basically one or more T3+ arrays
with a dedicated management host and configuration software, the mp_support
behavior is the same as on a T3+. So the ping-pong effect described above
could also occur on a Sun StorEdge 3900 Series.



Resolution
Array Configuration:

:/:<1>sys list
controller         : 2.5
blocksize          : 64k
cache              : auto
mirror             : auto
mp_support         : mpxio
naca               : off
rd_ahead           : off
recon_rate         : high
sys memsize        : 256 MBytes
cache memsize      : 1024 MBytes
fc_topology        : auto
fc_speed           : 2Gb
disk_scrubber      : on
ondg               : befit
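
The mp_support setting shown above can be checked and changed from the
array's telnet CLI. A minimal sketch follows; the exact prompt and the set
of accepted values depend on the firmware level, so verify them against the
release notes for your array.

:/:<1> sys list                 # review current settings, including mp_support
:/:<2> sys mp_support mpxio     # explicit failover: required for multi-host setups
:/:<3> sys mp_support rw        # implicit failover: single host with compatible software only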

Conditions when Ping-Pong can Occur

Implicit Failover (mp_support = rw)

Implicit failover is when the array's mp_support setting is set to "rw".
This means that a failover will be initiated simply by accessing the Passive
path of the array, which causes the passive path to become active.

If both paths (controllers) are accessed at the same time, this leads to the
Ping-Pong effect described above, as sketched below.
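
As an illustration, with mp_support=rw a single read of the passive path is
enough to trigger a failover. The device path below is hypothetical; any raw
access to the passive controller's path has the same effect.

# Hypothetical example: c3t1d0 is the passive path to a LUN on the array.
# With mp_support=rw, this single read makes the LUN fail over to this path.
dd if=/dev/rdsk/c3t1d0s2 of=/dev/null bs=512 count=1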

Two ways this can happen:

 1. Incompatible multi-pathing software:
    When a single host is multi-pathed, but the multi-pathing software does
    not know how to deal with this array type, the software can end up
    trying to access both paths at once.
    Example of incompatible software:
      vxdmp - can cause this issue.
      How to fix this depends on the array:
       T3 and T3B - VxVM 3.0.4 or above is "T3 aware", and needs nothing
                    else installed.
       T4 (SE6120) - install the VRTSt4 package. This is an ASL library and
                    is a MUST for vxdmp to recognize the T4 array. It is
                    only supported with VxVM 3.2 and above. You can obtain
                    the package from the following location:
                    http://www.sun.com/download/products.xml?id=3e8a1451
                    (A sketch for verifying the ASL is shown after this
                    list.)
 2. Multiple host connections to the array:
    If multiple hosts connect to the array, implicit failover (mp_support=rw)
    MUST NOT be used. The only supported option for a multi-hosted setup is
    explicit failover (mp_support=mpxio). This applies whether the hosts all
    run the Solaris[TM] OS or any combination of operating systems.
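
For case 1 on a T4 (SE6120), you can check whether VxVM recognizes the array
before and after installing the ASL. A minimal sketch, assuming VxVM 3.2 or
later (the library names reported by vxddladm vary by ASL version):

# List the Array Support Libraries VxVM currently knows about:
vxddladm listsupport

# Install the VRTSt4 ASL package (downloaded from the URL above), then have
# VxVM rescan so that vxdmp re-evaluates the array's paths:
pkgadd -d . VRTSt4
vxdctl enable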

Explicit Failover (mp_support = mpxio)

In this mode, the array only performs a failover when explicitly requested
to by a host. This is done via a special command sent from the host (by the
multi-pathing software), asking the array to initiate a failover.

This mode is best for multi-host setups: a failover will not happen unless
it is explicitly requested by a host, and no host will trigger a failback
simply by accessing the passive path, as happens with implicit failover.
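
On the Solaris host side, explicit failover is driven by MPxIO (Sun StorEdge
Traffic Manager). As a sketch, on Solaris 8/9 MPxIO is enabled in
/kernel/drv/scsi_vhci.conf; verify the exact file and property names against
your Solaris release.

# /kernel/drv/scsi_vhci.conf (requires a reconfiguration reboot)
mpxio-disable="no";        # enable MPxIO globally
auto-failback="enable";    # fail back automatically when the primary path returns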

Conditions for Ping-Pong

 1. Single path to a Partner Pair:
    Note: This is an unsupported config, as Partner Pairs MUST have dual
    paths to a host.
    The problem has occurred in multi-hosted environments where NOT ALL
    HOSTS have access to both controllers on the array.
    For example:
         Host A only has access to the primary   controller.
         Host B only has access to the secondary controller.
    If these hosts share access to a T3/T4 volume on the array, Host A will
    explicitly ask for that volume to be made available on the primary
    controller. This in turn makes Host B lose access to the volume, so it
    issues its own explicit failover request to get access back. This keeps
    going, as neither host is ever satisfied and the array tries to satisfy
    both.
    If, however, both hosts were multi-pathed, they would reach an agreement
    on the best available path.
 2. Unstable path to the array:
    The Ping-Pong problem can also occur with Explicit LUN Failover (ELF).
    If auto-failback is set to "enable" in the file
    /kernel/drv/scsi_vhci.conf and one path on a host becomes unstable (not
    completely failed), MPxIO can fail the path back and forth.
    This is normal behavior, and can only be remedied by fixing the bad
    path. It can be avoided by setting auto-failback to "disable" in
    /kernel/drv/scsi_vhci.conf, as sketched below (see Symptom Resolution
    <Document: 1010947.1> for more details).
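
A minimal sketch of the workaround for case 2; after editing the file, a
reconfiguration reboot is needed for the change to take effect:

# /kernel/drv/scsi_vhci.conf
auto-failback="disable";   # stop MPxIO from failing back over an unstable path

# Then perform a reconfiguration reboot:
reboot -- -r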

For both of the above examples, it helps to clarify how hosts can share a
volume. With volume slicing (volslice) now available, hosts can easily have
different LUNs (slices) that belong to the same array volume. An array
volume is what is seen in "vol list" output, while "volslice list" shows
which volume each slice belongs to, as sketched below. It is still possible
for multiple hosts to use the same LUN (slice). That is generally a very bad
idea (unless coordinated by appropriate software), but it is possible.
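
Both commands are available on the array's telnet CLI (T3+/6120 firmware;
the exact column layout of the output varies by firmware level):

:/:<1> vol list        # shows the underlying array volumes (v0, v1, ...)
:/:<2> volslice list   # shows each slice (LUN) and the volume it belongs to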



Product
Sun StorageTek 3900 Series
Sun StorageTek 6320 System
Sun StorageTek 6120 Array
Sun StorageTek T3+ Array
Sun StorageTek T3 Array

t3, t3b, t3+, t4, 3900, 3910, 3960, se6120, se6020, se6320, se6x20, 6x20, 6120, 6020, 6320, vxvm, vxdmp, failover, mpxio, mp_support, rw, pingpong, ping, pong, ping-pong, ping pong
Previously Published As
80720

Change History
Date: 2005-09-29
User Name: 18392
Action: Approved
Comment: Added product names, product nouns, expanded abbreviations, did extensive re-wording and reformatting for clarity, brevity and readability.
Version: 10
Date: 2005-09-27
User Name: 18392
Action: Accept
Comment:
Version: 0
Date: 2005-09-27
User Name: 124974
Action: Approved
Comment: Good enough.
Version: 0

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.