Document Audience:INTERNAL
Document ID:I1030-1
Title:T3/T3+ Power Cooling Unit (PCU) connectors on the midplane may be damaged by applied force and/or stress during normal maintenance.
Copyright Notice:Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved
Update Date:2004-04-05

------------------------------------------------------------
            - Sun Proprietary/Confidential: Internal Use Only -
------------------------------------------------------------------------  
                        FIELD INFORMATION NOTICE
               (For Authorized Distribution by Sun Service)
FIN #: I1030-1
Synopsis: T3/T3+ Power Cooling Unit (PCU) connectors on the midplane may be damaged by applied force and/or stress during normal maintenance.
Create Date: Mar/15/04
SunAlert: No
Top FIN/FCO Report: No
Products Reference: Sun StorEdge T3/T3+ Arrays
Product Category: Storage / Service
Product Affected: 
Systems Affected:
-----------------  
Mkt_ID   Platform   Model   Description                    Serial Number
------   --------   -----   -----------                    -------------
  -       ANYSYS      -     System Platform Independent          -  


X-Options Affected:
-------------------
Mkt_ID         Platform      Model      Description              Serial Number
------         --------      -----      -----------              -------------
  -               T3          ALL       T3 StorEdge Array              -
  -               T3+         ALL       T3+ StorEdge Array             -
Parts Affected: 
----------------------
Part Number                Description                        Model
-----------                -----------                        -----
300-1454-04 or lower       PWR SUPPLY PURPLE1 NIMH              -
References: 
FIN: I0745-1 
 
URL: http://grand.central/web/salesmktg/products/svc_prod/sz/SunMoves.html
     http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_closeup.jpg
     http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_closeup2.jpg
     http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_inchassis.jpg
     http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_inchassis2.jpg
     http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_inchassis3.jpg
Issue Description: 
Applied force or inadvertent overstress during PCU insertion may
cause the Power Cooling Unit (PCU) connector in StorEdge T3/T3+ arrays
to shift.  This may break the PCU midplane connector and cause the
T3 PCU leads to short, rendering the disk array non-functional.

There have been no reported instances of data loss, but a new chassis
with midplane, loop cards, controller and at least one PCU may be
required to restore the system to full functionality.

When this connector damage is present and power is applied to the
T3/T3+ disk array, the power leads may short.  The PCU and the array
may emit smoke and heat, and the array becomes non-functional.  Heat
damage may be visible on components of the controller and loop cards,
on the PCU, and on the PCU midplane connector.
   
NOTE: Make sure to follow all guidelines outlined in FIN I0745-1
      for the movement of any Sun T3 equipment.

Please see the following URL for T3 movement guidelines:

URL:
http://grand.central/web/salesmktg/products/svc_prod/sz/SunMoves.html
Implementation: 
         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        |   |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---
Corrective Action: 
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned issue.

These guidelines will help to ensure midplane connectors are not
damaged during normal PCU insertion activities.  However, due to prior
PCU removal or insertion actions, the midplane connector may have
already sustained damage.  These guidelines include instructions to
examine the midplane connector for damage prior to reinserting the
PCU.  Please visit the following URLs to examine the types of damaged
connectors:

   http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_closeup.jpg
   http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_closeup2.jpg
   http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_inchassis.jpg
   http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_inchassis2.jpg
   http://sdpsweb.EBay/FIN_FCO/FIN/FINI0745-1_dir/connector_inchassis3.jpg

When performing a proactive replacement of PCUs that are near
expiration, always handle the PCUs carefully.  Use only PCUs that
arrive in a sealed bag.  Insert and remove each PCU with a single
gentle motion, without hesitation or side-to-side motion.

Please adhere to the guidelines shown below and perform the following
step-by-step procedure in order to remove and reinstall PCUs on all
T3/T3+:

NOTE: Never replace both PCU1 and PCU2 on the same brick on the same
      day.  Waiting gives the newly replaced PCU time to fully recharge.

1. If applicable, check with the system administrator that they are
ready and that 'tail -f /var/adm/messages.t300' is running on the
admin host so the syslog activity can be seen.

Check with the system administrator which filename is used on the host
system for array syslog remote logging - the filename given here
(/var/adm/messages.t300) is a standard name but the customer may have
chosen a different filename.  Ensure that you are monitoring the remote
syslog messages from the array that you will be working on.
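
The monitoring described in step 1 can also be scripted.  The following
Python sketch is illustrative only and is not part of any Sun tool.  It
assumes the line layout shown in the "typical messages" of step 4, allows
for an optional hostname field that a remote syslog host may add, and
takes the year as a parameter because syslog timestamps carry no year.
The names parse_line and shutdown_deadline are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Layout inferred from the sample messages in step 4 of this notice, e.g.
#   Jan 14 19:47:47 LPCT[1]: W: u2pcu1: Switch off, serial no = 005363
# The optional (?:\S+ )? group is an assumption: it tolerates a hostname
# field that the remote syslog host may insert after the timestamp.
LINE_RE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?:\S+ )?"
    r"(?P<task>\w+)\[\d+\]: "
    r"(?P<sev>[A-Z]): "
    r"(?P<fru>u\d+\w+): "
    r"(?P<text>.*)"
)
SHUTDOWN_RE = re.compile(r"shutting down\s+in (\d+) minutes")


def parse_line(line, year):
    """Return a dict for one array syslog line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
    return {"when": when, "sev": m["sev"], "fru": m["fru"], "text": m["text"]}


def shutdown_deadline(lines, fru, year):
    """Return the time by which `fru` must be back in place, or None."""
    for line in lines:
        rec = parse_line(line, year)
        if rec is None or rec["fru"] != fru:
            continue
        m = SHUTDOWN_RE.search(rec["text"])
        if m:
            return rec["when"] + timedelta(minutes=int(m.group(1)))
    return None


sample = [
    "Jan 14 19:49:06 LPCT[1]: E: u2pcu1: Not present",
    "Jan 14 19:49:06 TMRT[1]: E: u2pcu1: Missing; system shutting down in 30 minutes",
]
# The year 2004 is an assumed value for illustration.
print(shutdown_deadline(sample, "u2pcu1", 2004))  # 2004-01-14 20:19:06
```

Filtering on the FRU name (u2pcu1 here) keeps the output limited to the
unit being serviced, and the computed deadline reflects the 30-minute
limit that the array itself announces in its shutdown message.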

2. As a best practice, perform the following prerequisites:

   A. Verify that all loop cables (for ES config) and MIAs are screwed
      down tightly by using a small flathead screwdriver to tighten
      each loop cable.  Be very careful not to disconnect any loop
      cable.  If you notice a loop cable that is not screwed in at all,
      notify the customer.

   B. Verify that all controllers and loop cards are seated securely in
      their respective slots by pushing on each card and verifying that
      all latches are in the locked position.

   C. Verify that all PCUs are seated securely in their respective
      slots by pushing on each PCU and verifying that the PCU latches
      are in the locked position.

   D. Type "fru stat" to check that ALL T3 FRUs are in a healthy state
      and that their LEDs are in their normal state before proceeding.

   E. Type "date" and "tzset" to check that the date and timezone are
      correct.  If not, use the "date" and "tzset" commands to set the
      date and timezone, respectively.

   F. Type "refresh -s" to check that no battery refresh is running
      before proceeding.  Also check that the "Next Refresh" is not due
      to begin shortly after the PCU replacement.  If it is, reschedule
      the "Next Refresh" to a later time (24 hours).  Refer to the Field
      Service Manual for changing the refresh time (BAT_BEG) in the file
      /etc/schd.conf.  A battery status of "Low" is acceptable, since
      the purpose of this maintenance action is to replace the PCU or
      its battery pack.

   G. Type "proc list" to check that no drive reconstruction is running
      before proceeding.

      NOTE: If possible, perform this procedure during a maintenance
            window to minimize disruption to customer operation.

   H. Notify the customer that performance will degrade during and
      after this procedure, as new batteries will need to be charged
      after power on.  Charging can take up to 12 hours (per battery),
      and during this time write caching will be disabled.

   I. Advise the customer SysAdmin that while the PCU is removed, an
      inspection of the midplane connector will be performed.  If this
      inspection finds a cracked or damaged PCU midplane connector,
      notify the customer immediately.  Ask the customer SysAdmin to
      make the operational decision whether they want to try to
      install a new PCU, or start shutting down access so the chassis
      can be replaced.  The chassis will need to be swapped out at some
      point.


3. All new PCUs from the RSLs should be at revision level -04.  If you
   cannot reliably identify the revision level, return the PCU to stock
   and clearly identify the issue.
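
The revision check in step 3 can be sketched in Python as follows.  This
is a hedged illustration, not a Sun utility.  It assumes that the
"revision level" is the two-digit dash suffix of the part number as
listed under "Parts Affected" (300-1454-04), and the function names are
hypothetical.  Any label text that cannot be parsed is treated as "not
identified", matching the instruction to return such a PCU to stock.

```python
import re

# Assumed part number layout: NNN-NNNN-DD, e.g. "300-1454-04".
PART_RE = re.compile(r"\b(\d{3}-\d{4})-(\d{2})\b")


def dash_level(label_text):
    """Return (base_part, dash_level) from label text, or None if unreadable."""
    m = PART_RE.search(label_text)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


def is_expected_revision(label_text, base="300-1454", expected=4):
    """True only when the label clearly shows the expected part and dash level."""
    return dash_level(label_text) == (base, expected)


print(is_expected_revision("300-1454-04"))       # True
print(is_expected_revision("300-1454-01(50)"))   # False
print(is_expected_revision("label unreadable"))  # False
```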

4. To remove the PCU, whether to swap its battery or to replace the
   PCU itself, you must first power off the PCU; you can then pull it
   out.  Carefully observe and follow these guidelines:

   A. Power off PCU, wait 30 seconds. (watch syslog)

      NOTE - DO NOT POWER OFF MORE THAN ONE PCU AT A TIME FOR EITHER ES
             OR WG CONFIGURATION.  Powering off/removing a PCU will cause 
             the T3 cache to run in write-through mode.  Make sure that 
             the AC LED (left) is AMBER and the PS LED (right) is OFF.

      typical messages:

        Jan 14 19:47:47 LPCT[1]: W: u2pcu1: Switch off, serial no = 005363
        Jan 14 19:47:48 LPCT[1]: W: u2pcu1: Off, serial no = 005363
        Jan 14 19:47:50 LPCT[1]: W: u2pcu1: DC not OK, serial no = 005363

      No additional errors or warnings are noted.

   B. Disconnect power cord from PCU.  (watch syslog)

      typical message:

        Jan 14 19:48:23 LPCT[1]: E: u2pcu1: Battery not present

      No additional errors or warnings are noted.

   C. Push the PCU latches into the unlocked position and pull the unit
      out of the disk tray.  Wait 15 seconds and then verify that both
      controller online LEDs are still GREEN.  If any controller LED
      changes from solid GREEN (i.e., to OFF, AMBER, or flashing AMBER),
      immediately refer to the "Troubleshooting" section (step 5)
      before continuing.

      CAUTION - Any PCU that is removed must be replaced within 30 minutes
                or the Sun StorEdge T3 disk tray and all attached disk
                trays will automatically shut down and power off.

      CAUTION - For partner pair configurations, make sure that the loop
                cables have sufficient length to be spread apart so you
                can remove u1pcu1.  Also make sure that the loop cables,
                along with other cables connected to the T3, are screwed
                in tightly so you do not inadvertently knock them off
                during removal/insertion.

      typical messages:

        Jan 14 19:49:06 LPCT[1]: N: u2pcu1: Warranty date was cleared.
        Jan 14 19:49:06 LPCT[1]: E: u2pcu1: Not present
        Jan 14 19:49:06 TMRT[1]: E: u2pcu1: Missing; system shutting down
                                 in 30 minutes
        Jan 14 19:49:08 TMRT[1]: E: u2ctr: Multiple Fan Faults; system
                                 shutting down in 30 minutes
        Jan 14 19:50:45 LPCT[2]: E: u2pcu1: Not present

      No additional errors or warnings are noted.

   D.  Look inside the PCU bay and inspect the left and right sides of
       the PCU midplane connector for cracks or other damage.  A working
       flashlight is required to inspect the connector.

       NOTE: PCU must be inserted within 30 minutes, otherwise the brick
             will time out and shut off.

   E.  If obvious damage is seen, inform the SysAdmin that an outage may
       occur as soon as insertion of the new PCU is attempted.

       Ask the customer SysAdmin to make the operational decision whether
       they want to try to put the new one in, or start shutting down
       access so the chassis can be replaced. The chassis will need to be
       swapped out at some point.

       CAUTION - The same fault (cracked connector) may also result from
                 incorrect removal or re-insertion of a PCU.  PCUs should
                 be inserted and removed with a single gentle motion,
                 without hesitation or side-to-side motion.

   F.  If no damage is seen, carefully install the replacement PCU.  Do
       not force it.  If any abnormal resistance or friction is felt,
       select another PCU to use in this T3 chassis.  The PCU that
       experienced friction can most likely be used in the next T3;
       observe the same insertion procedure.

   G. Install new PCU.  Wait 30 seconds and then verify that both
      controller online LEDs are still GREEN.  If any controller LED
      changes from solid GREEN (i.e., to OFF, AMBER, or flashing AMBER),
      immediately refer to the "Troubleshooting" section (step 5)
      before continuing.

      typical messages:

        Jan 14 19:50:06 LPCT[1]: E: u2pcu1: Over temperature, serial no =
                                 005363
        Jan 14 19:50:06 LPCT[1]: W: u2pcu1: Switch off, serial no = 005363
        Jan 14 19:50:07 LPCT[1]: W: u2pcu1: Off, serial no = 005363
        Jan 14 19:50:07 LPCT[1]: E: u2pcu1: Battery not present
        Jan 14 19:50:11 LPCT[1]: W: u2pcu1: DC not OK, serial no = 005363

      No additional errors or warnings are noted.

   H. Push the PCU latches into the locked position.

   I. Connect power cord to PCU.  (watch syslog)

      typical messages:

        Jan 14 19:50:58 LPCT[1]: N: u2pcu1: Battery not OK
        Jan 14 19:50:58 LPCT[1]: W: u2pcu1: Off, serial no = 005363

      No additional errors or warnings are noted.

   J. Verify that the AC LED (left) is AMBER, indicating that AC power
      is present.

   K. Power on PCU, wait 30 seconds.  (watch syslog)

      typical message:

        Jan 14 19:51:40 LPCT[1]: N: u2pcu1: Battery not OK

      No additional errors or warnings are noted.

   L. Verify that both LEDs on the Power Cooling Unit are GREEN,
      indicating that the unit is receiving power.  Wait 15 seconds
      and then verify that both controller online LEDs are still GREEN.
      If any controller LED changes to AMBER, immediately refer to the
      "Troubleshooting" section (step 5) before continuing.

      NOTE - The PS LED (right) may blink GREEN for a period of time
             (up to 12 hours of charging per battery, during which
             write caching is disabled).

   M. Type "fru stat" to check that the new PCU is recognized and
      functioning.  The battery might show up as "fault" while it is
      charging.

   N. Verify the Battery Warranty Date by typing "id read u(x)pcu(y)".

      hostname:/:<1>id read u1pcu1
      Revision: 0000
      Manufacture Week: 00421999
      Battery Install Week : 00222001  <----- week # when battery was
                                              installed
      Battery Life Used    :   0 days, 0 hours  <----- usage since pcu
                                                       inserted
      Battery Life Span    : 730 days, 12 hours
      Serial Number        : 003566
                             range
      Battery Warranty Date: 20010322172349 <----- date & time when PCU
                                                   switch was turned on
      Battery Internal Flag: 0x00000000
      Vendor ID            : TECTROL-CAN
      Model ID             : 300-1454-01(50)
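
The "id read" output in step 4N can be captured and checked with a short
script.  The Python sketch below is an illustration under stated
assumptions: it assumes the "Field : value" layout shown above, it
discards the "<-----" annotations that appear only in this notice, and it
assumes the Battery Warranty Date is laid out as yyyymmddhhmmss.  The
function names are hypothetical.

```python
import re
from datetime import datetime

# A shortened copy of the output shown in step 4N of this notice.
SAMPLE = """\
hostname:/:<1>id read u1pcu1
Revision: 0000
Battery Life Span    : 730 days, 12 hours
Battery Warranty Date: 20010322172349 <----- date & time when PCU
                                             switch turn on
Model ID             : 300-1454-01(50)
"""


def parse_id_read(output):
    """Split "Field : value" lines into a dict, dropping "<-----" notes."""
    fields = {}
    for line in output.splitlines():
        # Require whitespace after the colon so the prompt line is skipped.
        parts = re.split(r"\s*:\s+", line, maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts
        fields[key.strip()] = value.split("<-----")[0].strip()
    return fields


def warranty_start(fields):
    """Battery Warranty Date as a datetime (assumed yyyymmddhhmmss layout)."""
    return datetime.strptime(fields["Battery Warranty Date"], "%Y%m%d%H%M%S")


fields = parse_id_read(SAMPLE)
print(fields["Model ID"])      # 300-1454-01(50)
print(warranty_start(fields))  # 2001-03-22 17:23:49
```

Comparing the parsed warranty date with the time the PCU switch was
turned on in step 4K gives a quick confirmation that the date was reset
for the new unit.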


5. Troubleshooting

   During the removal, insertion, or switching on of the PCU, there is
   a very small chance that the T3 (ES or WG config) will reboot and,
   in the case of ES config, that one T3 controller will be disabled.
   When this happens, the controller LED will change state from solid
   GREEN to OFF (reboot started), AMBER (booting), or flashing AMBER
   (disabled).

   It is important to run the extractor after the T3 boots up and to
   get the reset log of the disabled controller.  The extractor will,
   by default, get the reset log of the remaining live controller.
   Give engineering the extractor output and reset log for analysis
   and note when the reboot occurred, i.e., at removal, insertion, or
   power on.

   Whether the disabled controller can be reused or not depends on any
   valid information from the reset log.  To get the reset log of the
   disabled controller:

   A.  Remove the disabled controller from the T3.

   B.  Insert a new controller.
  
       NOTE:  The new controller will boot up in the alt master role
              for ES config.

   C.  Take the removed controller back, install it in a spare T3 
       (single brick), and let it boot up.

   D.  Via the telnet session or serial port, type "logger -dmprstlog"
       to dump the reset log to the T3 syslog.

   E.  If the reset log shows a valid hardware problem (e.g., cache parity
       error) around the time the PCU was replaced, the controller should
       be sent back via CPAS.

       Example:

       Jul 18 20:15:26 pshc[1]: N: logger -dmprstlog
       Jul 18 20:15:26 pshc[1]: W: u1ctr SysFail Reset (7001) was initiated
           at Cache memory parity error detected        20010626 163740
                                                        ^^^^^^^^ ^^^^^^
                                                       /        /
                                                      /        /
                                                 yyyymmdd   hr/min/sec

   F.  If the reset log shows other, non-hardware-related messages and
       their time of occurrence is not around the time the PCU was
       replaced, the controller can be deemed good.  The problem is more
       likely related to firmware than hardware.

       Example:

       Jul 13 22:03:26 pshc[1]: W: u1ctr Exception Reset (2004) was
           initiated at Instruction Access exception      20001103 175513
                                                          ^^^^^^^^ ^^^^^^
                                                         /        /
                                                        /        /
                                                   yyyymmdd   hr/min/sec
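
The time comparison in steps 5E and 5F can be sketched in Python as
follows.  This is an illustrative aid only.  It assumes that each
reset-log entry ends with a "yyyymmdd hhmmss" stamp as in the examples
above, with the wrapped entry joined back into one line.  The 30-minute
window is an arbitrary assumption, since this notice only says "around
the time the PCU was replaced", and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Reset-log entries end with "yyyymmdd hhmmss", per the examples above.
RESET_STAMP_RE = re.compile(r"(\d{8})\s+(\d{6})\s*$")


def reset_time(entry):
    """Return the reset timestamp at the end of a reset-log entry, or None."""
    m = RESET_STAMP_RE.search(entry)
    if m is None:
        return None
    return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")


def near_replacement(entry, replaced_at, window=timedelta(minutes=30)):
    """True if the logged reset falls within `window` of the PCU replacement.

    The default window is an assumption made for this sketch.
    """
    when = reset_time(entry)
    return when is not None and abs(when - replaced_at) <= window


entry = (
    "Jul 18 20:15:26 pshc[1]: W: u1ctr SysFail Reset (7001) was initiated "
    "at Cache memory parity error detected        20010626 163740"
)
print(reset_time(entry))  # 2001-06-26 16:37:40
# The replacement time below is a made-up value for illustration.
print(near_replacement(entry, datetime(2001, 6, 26, 16, 30, 0)))  # True
```

The script only answers the timing question.  Whether the logged reason
is a hardware fault (step 5E) or not (step 5F) still has to be judged
from the message text.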
Comments: 
None.

============================================================================
Implementation Footnote: 
i)   In case of MANDATORY FINs, Sun Services will attempt to contact   
     all affected customers to recommend implementation of the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Sun Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Sun Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.central/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.central/

* From there, select the appropriate link to browse the FIN or FCO index.

Internet Access:
----------------
* Access the top level URL of https://spe.sun.com
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
--------------------------------------------------------------------------
Status:active