Document Audience:INTERNAL
Document ID:A0253-1
Title:A sub-population of DIMMs that shipped between 2001 and 2002 on the platforms below is showing significantly lower reliability than expected.
Copyright Notice:Copyright © 2007 Sun Microsystems, Inc. All Rights Reserved
Update Date:Thu Aug 18 00:00:00 MDT 2005

__________________________________________________________________

***  Sun Confidential:  Internal Use and Authorized VARs Only  ***
__________________________________________________________________

This message, including any attachments, is confidential information
of Sun Microsystems, Inc.  Disclosure, copying or distribution is
prohibited without permission of Sun.  If you are not the intended
recipient, please reply to the sender and then delete this message.
__________________________________________________________________

                     FIELD CHANGE ORDER    
        (For Authorized Distribution by Sun Services)
            
            
FCO #: A0253-1
Status: active
Synopsis: A sub-population of DIMMs that shipped between 2001 and 2002 on the platforms below is showing significantly lower reliability than expected.
Date: Aug/18/2005
Top FIN/FCO Report: Yes

PRODUCT REFERENCE: Memory / DIMM
Product Category: Server / Desktop / System Component
Product Affected: 
Mkt_ID    Platform    Model   Description
------    --------    -----   -----------

  -       F12K        All    Sun Fire 12K
  -       F15K        All    Sun Fire 15K
  -       S8          All    Sun Fire 3800
  -       S12/S12i    All    Sun Fire 4800/4810
  -       S24         All    Sun Fire 6800
  -       A40         All    Sun Fire V1280
  -       A35         All    Sun Fire SF280R
  -       N28         All    Netra 20
  -       A28         All    Sun Blade 1000
  -       A29         All    Sun Blade 2000
  -       A37         All    Sun Fire V480
  -       A30         All    Sun Fire V880
Parts Affected: 
Part Number    	Description
-----------    	-----------

501-5401-xx	ASSY,SDRAM,DIMM,256MB,18X8MX16
501-6175-xx	ASSY,WS NGDIMM,256MB
501-5030-xx	ASSY,SDRAM,DIMM,512MB
501-6174-xx	ASSY,WS NGDIMM,512MB
501-5031-xx	ASSY,SDRAM,DIMM,1GB
501-6109-xx	ASSY,SDRAM,DIMM,1GB SF HES
501-6173-xx	ASSY,WS NGDIMM,1GB
References: 
 Sun Alerts:    57757
 BugIDs:        5034665
 Escalations:   1-833911, 1-1482139
 DPCOs/LEAPs:   DPCO #483, GSAP #3037, GSAP #3111.B
 URL:   FAQ     http://onestop/qco/pllDIMM/index_pllDIMM.shtml
Issue Description: 
Sun has determined that a limited subset of DIMMs shipped in 2001 and
2002 (less than one percent of the installed base) may begin to show
reduced reliability after approximately two years of operation.  This
reliability issue manifests itself in the form of UEs (Uncorrectable
Errors), sometimes with CEs (Correctable Errors), originating from the
DIMMs.  The reliability of these DIMMs is normal for approximately the
first two years of use, after which they may start to degrade below the
expected level.

The root cause of this issue is related to the phase-locked loop (PLL)
device on the DIMMs.  The affected sub-population has Philips PLL
devices with date codes from 0049 through 0215 inclusive (year 2000
week 49 through year 2002 week 15).
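The inclusive range check can be sketched as a plain integer comparison, assuming the date code has already been read as a four-digit YYWW string (a three-digit YWW code would need a leading zero first). Because week numbers never exceed 53, YYWW values sort in date order. `in_suspect_range` is a hypothetical name, not part of the PLL lookup tool.

```python
# Hypothetical helper: checks a four-digit YYWW PLL date code against the
# suspect range 0049 (year 2000, week 49) through 0215 (year 2002, week 15).
SUSPECT_FIRST = 49   # "0049" as an integer
SUSPECT_LAST = 215   # "0215" as an integer


def in_suspect_range(yyww: str) -> bool:
    """Return True if a four-digit YYWW date code is within 0049-0215 inclusive."""
    if len(yyww) != 4 or not yyww.isdigit():
        raise ValueError(f"expected a four-digit YYWW code, got {yyww!r}")
    # Weeks run 01-53, so the integer value of YYWW sorts chronologically.
    return SUSPECT_FIRST <= int(yyww) <= SUSPECT_LAST


for code in ("0048", "0049", "0204", "0215", "0216", "0322"):
    print(code, in_suspect_range(code))
```

Both end points (0049 and 0215) are treated as suspect, matching "inclusive" in the text.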

This issue produces no unique symptom other than a higher than expected
rate of UEs and CEs.  A DIMM lookup tool has been developed to help
identify suspect DIMMs.


Impacted Platforms
------------------
The following platforms could be impacted if they shipped between
Jan/01/2001 and Dec/31/2002:

   F12K, F15K, 3800, 4800, 4810, 6800, V1280, V480, V880,
   SF 280R, Netra 20, Sun Blade 1000 and 2000
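The platform and ship-date criteria above can be expressed as a simple filter. This is a sketch only: the platform labels are copied from the list above, the ship date is assumed to be known for each system, and `IMPACTED_PLATFORMS` and `could_be_impacted` are hypothetical names. Passing the filter only means a system could contain suspect DIMMs; the lookup tool and the visual PLL check still decide which DIMMs are affected.

```python
from datetime import date

# Platform labels as listed in the Impacted Platforms section (hypothetical encoding).
IMPACTED_PLATFORMS = {
    "F12K", "F15K", "3800", "4800", "4810", "6800", "V1280", "V480",
    "V880", "SF 280R", "Netra 20", "Sun Blade 1000", "Sun Blade 2000",
}

# Shipping window from this FCO, inclusive at both ends.
WINDOW_START = date(2001, 1, 1)
WINDOW_END = date(2002, 12, 31)


def could_be_impacted(platform: str, ship_date: date) -> bool:
    """True if the platform is listed and shipped within the FCO window."""
    return platform in IMPACTED_PLATFORMS and WINDOW_START <= ship_date <= WINDOW_END


print(could_be_impacted("V880", date(2002, 6, 14)))   # True
print(could_be_impacted("V880", date(2003, 2, 1)))    # False: shipped after the window
print(could_be_impacted("E450", date(2002, 6, 14)))   # False: platform not listed
```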


Example System Messages
-----------------------

  WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU0
  Privileged Data Access at TL=0, errID 0x00000019.4558db40
     AFSR 0x00100004.0000000c AFAR 0x00000040.e78fe750
     Fault_PC 0x10033c24 Esynd 0x000c Slot B: J7900 J7901 J8001 J8000
  [AFT1] errID 0x00000019.4558db40 Two Bits were in error
  WARNING: [AFT1] EDU Event detected by CPU0 at TL=0, errID
  0x00000019.4558db40
     AFSR 0x00200028.0000000c AFAR 0x00000040.e78fe750 AMBIGUOUS
     Fault_PC 0x10033c24 Esynd 0x000c AMBIGUOUS
  [AFT1] errID 0x00000019.4558db40 Two Bits were in error
  NOTICE: Scheduling clearing of error on page 0x00000040.e78fe000
  WARNING: [AFT1] WDU Event detected by CPU0 at TL=0, errID
  0x00000019.4558db40
     AFSR 0x00200028.0000000c AFAR 0x00000040.e78fe750 AMBIGUOUS
     Fault_PC 0x10033c24 Esynd 0x000c AMBIGUOUS
  [AFT1] errID 0x00000019.4558db40 Two Bits were in error
  NOTICE: Scheduling clearing of error on page 0x00000040.e78fe000

  panic[cpu0]/thread=30002e32b20: [AFT1] errID 0x00000019.4558db40 UE EDU WDU Error(s)
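Because the only symptom is a higher than expected UE and CE rate, it can help to tally error events from the system messages. The sketch below counts `[AFT1]` WARNING events by type and collects their distinct errIDs in text shaped like the example above. It assumes the messages look exactly like those shown (other platforms and Solaris releases may format them differently), `tally_aft_events` is a hypothetical name, and it is not a substitute for the FindUE and related tools referenced in the Details section.

```python
import re
from collections import Counter

# Matches "WARNING: [AFT1] ... <TYPE> Event detected by CPU<n> ... errID <id>".
# The errID may wrap onto the next line, so DOTALL and \s+ are used.
AFT_EVENT = re.compile(
    r"WARNING: \[AFT1\] .*?(\S+) Event detected by CPU\d+.*?"
    r"errID\s+(0x[0-9a-fA-F]+\.[0-9a-fA-F]+)",
    re.DOTALL,
)


def tally_aft_events(log_text: str):
    """Return (event-type counts, distinct errIDs) for [AFT1] WARNING messages."""
    counts = Counter()
    err_ids = set()
    for match in AFT_EVENT.finditer(log_text):
        event_type = match.group(1).strip("()")  # "(UE)" -> "UE"
        counts[event_type] += 1
        err_ids.add(match.group(2))
    return counts, err_ids


SAMPLE = """\
  WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU0
  Privileged Data Access at TL=0, errID 0x00000019.4558db40
  [AFT1] errID 0x00000019.4558db40 Two Bits were in error
  WARNING: [AFT1] EDU Event detected by CPU0 at TL=0, errID
  0x00000019.4558db40
  WARNING: [AFT1] WDU Event detected by CPU0 at TL=0, errID
  0x00000019.4558db40
"""

counts, err_ids = tally_aft_events(SAMPLE)
print(dict(counts))   # {'UE': 1, 'EDU': 1, 'WDU': 1}
print(len(err_ids))   # 1: all three reports share one errID
```

Counting distinct errIDs rather than raw WARNING lines avoids treating the UE, EDU and WDU reports of a single error as three separate events.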
Target Completion Date: 
AMER: August 30, 2007
APAC: December 31, 2007
EMEA: January 31, 2008
Implementation: 
 ---
|   |   MANDATORY (Fully Pro-Active)
 ---

 ---
| X |   CONTROLLED PRO-ACTIVE (proactively implement on systems 
 ---			       under Gold or above contracts)

 ---
|   |   UPON FAILURE
 ---
Replacement Time Estimate: 
Less than 2 hours (depending on platform type)
Special Considerations: 
This FCO will have a time zone phased release based on material 
readiness as follows:

                  Readiness Date
                  --------------
  US/Canada       READY
  Ltn America     READY
  EMEA            Sep/01/2005
  APAC            READY
  ANZO            READY
  Japan           READY

The above dates represent when each time zone has determined that
it will be materially ready to support this FCO.  All dates are
estimates.  Please check with your Logistics Representative for
more information with regard to material availability.

Note: To order DIMMs for remediation, follow your TZ FCO parts ordering process.
      Because parts are limited, when a failure is identified in the field,
      replace only the failed DIMM at that time.  During the failure
      replacement, run the lookup tool to identify any other affected DIMMs
      in the system.  Schedule replacement of those non-failed, affected
      DIMMs for a later date, and order them through the FCO process.  For
      more information, or with any questions, contact your TZ or Area FCO
      Representative as follows:

 . EMEA: Contact the FCO country manager listed at:

    http://finfco.emea/ORGANIZATION/EMEA/country_fco.html

 . North America: Use the following alias to communicate proactive requirements:

    [email protected]

 . Latin America: Use the following alias to communicate proactive requirements:

    [email protected]

 . APAC: Follow the standard process to order parts.  If in doubt, contact
         your local country FCO Representative or Tech Ops Mgr.

UEs and CEs in memory have many possible causes.  In fix-on-fail
situations, if the system no longer experiences the condition after a
monitoring period, you can assume the PLL issue was the cause.  If the
system continues to experience errors, it likely has other DIMMs
affected by this or some other issue.  If the lookup tool flags no
affected DIMMs, the Field Engineer should continue with the normal
debug process.
Corrective Action: 
Hot Swappable: No

Replace according to the following part swap table, and per the
Details section below:

    Replace        With
  -----------      -----------
  501-5401-xx      501-5401-03 (or above)
  501-6175-xx      501-6175-02 (or above) or 501-5401-03 (or above)
  501-5030-xx      501-5030-03 (or above)
  501-6174-xx      501-6174-02 (or above)
  501-5031-xx      501-6109-02 (or above)
  501-6109-xx      501-6109-02 (or above)
  501-6173-xx      501-6109-02 (or above)
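The swap table can be applied mechanically. The sketch below assumes part numbers are written as `501-NNNN-DD` with a numeric dash level; it maps an installed part to its acceptable replacements and checks that a proposed part meets the "(or above)" minimum. `SWAP_TABLE` and `is_valid_replacement` are hypothetical names, and the RoHS substitutions described in the note below are not modeled.

```python
# Swap table from this FCO: installed base part -> minimum acceptable replacements.
SWAP_TABLE = {
    "501-5401": ["501-5401-03"],
    "501-6175": ["501-6175-02", "501-5401-03"],
    "501-5030": ["501-5030-03"],
    "501-6174": ["501-6174-02"],
    "501-5031": ["501-6109-02"],
    "501-6109": ["501-6109-02"],
    "501-6173": ["501-6109-02"],
}


def split_part(part: str):
    """Split '501-5401-03' into ('501-5401', 3)."""
    base, dash = part.rsplit("-", 1)
    return base, int(dash)


def is_valid_replacement(installed: str, proposed: str) -> bool:
    """True if `proposed` satisfies an 'X (or above)' entry for `installed`."""
    installed_base, _ = split_part(installed)
    proposed_base, proposed_dash = split_part(proposed)
    for minimum in SWAP_TABLE.get(installed_base, []):
        min_base, min_dash = split_part(minimum)
        if proposed_base == min_base and proposed_dash >= min_dash:
            return True
    return False


print(is_valid_replacement("501-6175-01", "501-5401-04"))  # True
print(is_valid_replacement("501-5031-02", "501-5031-03"))  # False: must become 501-6109
print(is_valid_replacement("501-5030-02", "501-5030-02"))  # False: below -03
```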

Note: If the above non-RoHS part numbers are not available, RoHS parts may
      be used in their place.  Please refer to the Sun System Handbook for
      all RoHS part numbers.  For example, the 501-7385 may be used in
      place of the 501-5030.

More information about the RoHS Program in general can be viewed by going
to Field Information Notice (FIN) 102250.

You may also refer to the Worldwide Sub-List, which can be viewed using
your Sun employee number and LDAP password at the URL below:

  http://roca.central/clrepair/lists/WWsublist.txt


Identifying Suspect DIMMs
-------------------------

To determine if a system has suspect DIMMs within the affected 
date code range:

- First, use the PLL DIMM Lookup Tool.  Run explorer ('prtfru -x') on
  the system under test, then use the lookup tool at:

      http://pts-appl-z1.holland:8080/PLLManualLookup/

  and the Prtfru Scanner at:

      http://pts-appl-z1.holland/pll.html

  to check the output file for affected DIMMs on the system.

A command-line version of the PLL Lookup tool is available at:

      http://pts-appl-z1.holland/pll_commandline.html

Additional information and an FAQ (item 9 on that page) are available
at:

      http://onestop/qco/pllDIMM/index_pllDIMM.shtml

Note: Every effort has been made to ensure the lookup tool has a
      complete list of the suspect DIMMs.  However, due to issues
      with traceability of DIMM serial numbers, the tool's list also
      includes a small number of DIMMs whose PLL may not be within the
      suspect range.  Therefore, the verification in the following
      step is necessary to ensure that only DIMMs with the target PLLs
      are remediated.

- Second, upon system shutdown, the Sun Field Representative should
  verify that every DIMM to be replaced has a PHILIPS PLL device and
  that the device has a date code within the range 0049 through 0215
  inclusive.  DIMMs that were identified by the PLL DIMM Lookup Tool
  but do not meet both conditions should be reinstalled in the system.
  The Sun Field Engineer needs to capture the part numbers and serial
  numbers of the DIMMs that were flagged by the tool but did not have
  the suspect PLLs, and e-mail this list to the feedback alias
  [email protected] so the lookup tool can be updated.
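The second step is a per-DIMM decision: replace only when the PLL is a Philips part and its date code is in range; otherwise reinstall the DIMM and record its part and serial number for the feedback alias. A minimal sketch, assuming inspection results are captured as simple records with a four-digit YYWW date code; the `InspectedDimm` fields, `triage`, and the sample serial numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InspectedDimm:
    part_number: str
    serial_number: str
    pll_is_philips: bool   # top marking line reads PCK953BD
    pll_date_yyww: str     # four-digit YYWW date code, e.g. "0204"


def triage(flagged):
    """Split tool-flagged DIMMs into (replace, reinstall_and_report) lists."""
    replace, report = [], []
    for dimm in flagged:
        # Short-circuit: the date code is only parsed for Philips PLLs.
        if dimm.pll_is_philips and 49 <= int(dimm.pll_date_yyww) <= 215:
            replace.append(dimm)
        else:
            # Flagged by the tool but not a suspect PLL: reinstall it and
            # send part/serial number to the feedback alias.
            report.append(dimm)
    return replace, report


flagged = [
    InspectedDimm("501-5030-02", "SN-EXAMPLE-1", True, "0204"),
    InspectedDimm("501-5030-02", "SN-EXAMPLE-2", True, "0322"),
    InspectedDimm("501-5031-01", "SN-EXAMPLE-3", False, ""),
]
replace, report = triage(flagged)
print([d.serial_number for d in replace])                # ['SN-EXAMPLE-1']
print([(d.part_number, d.serial_number) for d in report])
```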

Note: Whenever a system architecture and the customer setup allow
      it, Dynamic Reconfiguration (DR) can be used to remove a
      board from a system to inspect and replace DIMMs flagged
      by the tool.

Whenever possible, remediation efforts related to this FCO should
be coordinated with remediation efforts for FCO A0248.  If you do
so, it is recommended that you replace faulty Uniboards and DIMMs
in one step.

Details:
-------

  - For Gold and above accounts, recommend proactive check and 
    replacement of all affected DIMMs based on the date code range
    of 0049 through 0215 inclusive.

Note: The PLL lookup tool should be used to identify if the system 
      has any of the suspect DIMMs.

Note: If one or more DIMMs are reported suspect by the tool it is 
      highly recommended to verify the date code of all the DIMMs
      on that board and replace the DIMMs that are in the affected
      date code ranges.

  - For all others, upon failure of one DIMM, use the lookup tool to
    check whether the system has affected DIMMs.  The lookup tool
    will identify all affected DIMMs.  Use the instructions in the
    Special Considerations section above to replace all affected DIMMs
    in ONLY the system that has experienced the UE or CE event.  The
    replacement should extend beyond the failed DIMM, but not beyond
    the affected system.

Note: What to do if a machine shows memory errors that could point to "PLL"
      related problems, but the tools do not find any suspect DIMMs.

      - Before replacing any other parts, the first step in the action
        plan should be a visual inspection of the DIMMs for suspect DIMMs
        that the tool missed.

      - Replace the DIMMs whose PLL date codes fall within the suspect
        range (see the instructions for visual inspection below).

      - Visual inspection of DIMMs in systems and boards that do not
        display problems is discouraged, to minimize handling of the
        parts.

        ** Findaft/FindUE can help determine whether "PLL" related issues
           are ongoing.  These tools are available at the URLs below:

           http://systems-tsc/twiki/bin/view/Tools/ToolPageFindaft
           http://systems-tsc/twiki/bin/view/Tools/ToolPageFindUE

        ** See the FAQ for possible reasons why the tools may miss suspect
           DIMMs, available at the following (internal only) URL:

           http://onestop/qco/pllDIMM/docs/PLL_FAQ.pdf

        ** Additional information may be found at the following (internal
           only) URL:

           http://onestop/qco/pllDIMM/index_pllDIMM.shtml

To determine which DIMMs are within the suspect date code range, you must
find the PLL chip on the DIMMs themselves.  This is a small square chip on
the DIMM side facing the outside of the system, near the center of the
DIMM.  See the first URL for an example chip marked in red, and the other
two URLs for close-ups of the chip itself.

Internal only links:
                                                          
 http://pts-platform/twiki/pub/Products/ProdIssuesSunFireV880/NG_Dimm_front_PLL.pdf
 http://pts-platform/twiki/pub/Products/ProdIssuesSunFireV880/PLL_closer_look.jpg
 http://pts-platform/twiki/pub/Products/ProdIssuesSunFireV880/PLL_up_close.jpg
 http://onestop/qco/pllDIMM/docs/PLL_FAQ.pdf
 http://onestop/qco/pllDIMM/index_pllDIMM.shtml
                           
Partner viewable links:

 http://sdpsweb.central/FIN_FCO/FCO/A0253-1/SPE/NG_Dimm_front_PLL.pdf
 http://sdpsweb.central/FIN_FCO/FCO/A0253-1/SPE/PLL_closer_look.jpg
 http://sdpsweb.central/FIN_FCO/FCO/A0253-1/SPE/PLL_up_close.jpg

The only PLL chips that are suspect are manufactured by Philips.  If a DIMM
has a PLL chip manufactured by any other vendor (Motorola, Agere, etc.),
it is not suspect.
 
To determine who manufactured the PLL chip, look at the markings on the chip
itself.  A Philips PLL chip will have three to four lines of information
similar to the below information.

     PCK953BD
     CA6936
     TS204B

     PCK953BD
     CA6936
     TS
     0204B

The top line is the Philips P/N (always PCK953BD).  Therefore the first
step in the visual check will be to look for the Philips P/N (PCK953BD)
on the chip markings.

The second line down is the wafer lot number (for example, CA6936), and
the third or fourth lines will contain a number that includes the date
code (for example TS204B or TS0204B).
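The marking layout described above (Philips P/N, wafer lot, then a date field that may be split across a third and fourth line) can be parsed as follows. This sketch assumes the marking has been transcribed line by line as text; `PllMarking` and `parse_pll_marking` are hypothetical names.

```python
from typing import List, NamedTuple

PHILIPS_PN = "PCK953BD"  # Philips P/N, always the top line on suspect PLLs


class PllMarking(NamedTuple):
    part_number: str
    wafer_lot: str
    date_field: str   # e.g. "TS204B" (3-line layout) or "TS0204B" (4-line layout)

    @property
    def is_philips(self) -> bool:
        return self.part_number == PHILIPS_PN


def parse_pll_marking(lines: List[str]) -> PllMarking:
    """Parse a transcribed 3- or 4-line PLL marking into its fields."""
    cleaned = [line.strip() for line in lines if line.strip()]
    if len(cleaned) not in (3, 4):
        raise ValueError(f"expected 3 or 4 marking lines, got {len(cleaned)}")
    # In the 4-line layout the date field is split, e.g. "TS" / "0204B";
    # joining the remaining lines gives one string for either layout.
    return PllMarking(cleaned[0], cleaned[1], "".join(cleaned[2:]))


three_line = parse_pll_marking(["PCK953BD", "CA6936", "TS204B"])
four_line = parse_pll_marking(["PCK953BD", "CA6936", "TS", "0204B"])
print(three_line)            # PllMarking(part_number='PCK953BD', wafer_lot='CA6936', date_field='TS204B')
print(four_line.date_field)  # TS0204B
print(four_line.is_philips)  # True
```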

NOTE: The printed information is quite small and reading it may require
the use of a magnifying glass.

The date code is read from the bottom line of the Philips chip marking,
"TS204B" or "0204B".  Ignore the letters and focus only on the numbers;
both versions above are read the same way.  In the first example, "204"
means manufacturing year 2002 ("2"), week 4 ("04").  In the second
example, "0204" means year 2002 ("02"), week 4 ("04").  Both date code
formats, "YWW" and "YYWW", may be seen in the field.
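Reading the date code amounts to dropping the letters and interpreting the remaining digits as YWW or YYWW. A minimal sketch, assuming a single-digit year means 2000-2009 (which covers every code in this FCO); `decode_date_code` is a hypothetical name.

```python
import re
from typing import Tuple


def decode_date_code(field: str) -> Tuple[int, int]:
    """Decode a Philips date field such as 'TS204B' or '0204B' to (year, week)."""
    digits = re.sub(r"\D", "", field)  # ignore the letters, keep the numbers
    if len(digits) == 3:               # YWW, e.g. "204"
        year, week = 2000 + int(digits[0]), int(digits[1:])
    elif len(digits) == 4:             # YYWW, e.g. "0204"
        year, week = 2000 + int(digits[:2]), int(digits[2:])
    else:
        raise ValueError(f"cannot decode date field {field!r}")
    if not 1 <= week <= 53:
        raise ValueError(f"week {week} out of range in {field!r}")
    return year, week


print(decode_date_code("TS204B"))   # (2002, 4)
print(decode_date_code("0204B"))    # (2002, 4)
print(decode_date_code("TS0204B"))  # (2002, 4)
```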

Any Philips PLL called out by the lookup tool that falls within the
manufacturing date code range of year 2000 week 49 through year 2002
week 15 inclusive should be replaced as suspect.  The example chips
above would need to be removed from the system and replaced because
they fall within the year 2000 week 49 ("049" or "0049") to year 2002
week 15 ("215" or "0215") date code range.  A few more examples follow.
 
     PCK953BD
     CA6936
     TS209B
     
     or

     PCK953BD
     CA6936
     TS
     0209B
 
  This chip should also be replaced, as it falls within the suspect
  range: year 2002 ("2" or "02"), week 9 ("09").
 
     PCK953BD
     CA6936
     TS322B
     
     or

     PCK953BD
     CA6936
     TS
     0322B
 
  This chip is not within the suspect range, as it dates from year
  2003 ("3" or "03"), week 22 ("22").
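Putting the visual-check steps together, a chip is suspect only when its top line is PCK953BD and its decoded date falls between year 2000 week 49 and year 2002 week 15 inclusive. The sketch below applies that rule to the example markings in this section; it assumes a single-digit year means 200Y, and `is_suspect_pll` is a hypothetical name, not part of the PLL lookup tool.

```python
import re
from typing import List

FIRST_SUSPECT = (2000, 49)  # date code 049 / 0049
LAST_SUSPECT = (2002, 15)   # date code 215 / 0215


def is_suspect_pll(marking: List[str]) -> bool:
    """True if a transcribed marking is a Philips PLL in the suspect date range."""
    lines = [line.strip() for line in marking if line.strip()]
    if not lines or lines[0] != "PCK953BD":
        return False  # only Philips PLLs are suspect
    # The date field is on line 3, or split across lines 3 and 4.
    digits = re.sub(r"\D", "", "".join(lines[2:]))
    if len(digits) == 3:
        year, week = 2000 + int(digits[0]), int(digits[1:])
    elif len(digits) == 4:
        year, week = 2000 + int(digits[:2]), int(digits[2:])
    else:
        raise ValueError(f"unrecognized date field in {marking!r}")
    # Tuples compare year first, then week.
    return FIRST_SUSPECT <= (year, week) <= LAST_SUSPECT


examples = {
    "TS204B": ["PCK953BD", "CA6936", "TS204B"],
    "TS/0209B": ["PCK953BD", "CA6936", "TS", "0209B"],
    "TS322B": ["PCK953BD", "CA6936", "TS322B"],
    "TS/0322B": ["PCK953BD", "CA6936", "TS", "0322B"],
}
for label, marking in examples.items():
    print(label, is_suspect_pll(marking))
# TS204B True, TS/0209B True, TS322B False, TS/0322B False
```

Comparing (year, week) tuples avoids any ambiguity between the three- and four-digit formats once both have been decoded.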
Comments: 
Note: Some DIMMs are manufactured by Elpida and/or Hitachi.  These DIMMs have
      a metal cover on them that covers the PLL device.  These DIMMs are not
      impacted by this FCO.
      
Important! Put an "x" in the Purge/FCO box and write "FCO A0253-1" on all
           Defective Material Tags (DMT) to ensure proper return processing,
           and always quote the FCO number in the Radiance case entry.  When
           a DIMM is proactively replaced under this FCO (that is, it did
           not fail prior to replacement), also clearly write "proactive
           replacement" on the tag.
           
           When mass remediating multiple DIMMs proactively, you may package 
           all DIMMs into one shipping box, and only mark the box ONCE with
           one DMT labeled as noted above.

Send email to [email protected] for questions or comments about
this Field Change Order.

CHANGE HISTORY:

Aug/18/2005 - changed all affected part number dash levels to -xx.
	    - added GSAP 3111.B to References section.
	    
Aug/22/2005 - moved Identifying Suspect DIMMs section from Special Considerations
	      to Corrective Action section.
	    - Added TZ contact information in the Special Considerations section.
	   
Aug/25/2005 - republished to distribution alias with above changes and date
              change.
              
Oct/25/2005 - corrected outdated link to EMEA Contact information in SPECIAL
              CONSIDERATIONS section.
              
Dec/15/2005 - added sentence to the end of Important! section under COMMENTS
              requesting field put "proactive replacement" on DMT when DIMM
              is proactively replaced and wasn't a failed unit.

Jan/25/2006 - added partner viewable links to DIMM pictures in the Corrective
              Action section.

Apr/07/2006 - added information in Corrective Action section that RoHS parts
              could be used when non-RoHS is not available.

Nov/15/2006 - additional instructions, notes and URLs added to the "Details"
              under "Identifying Suspect DIMMs" in the Corrective Action
              section.

May/03/2007 - Updated Target Completion Dates by Timezone.
________________________________________________________________________

NOTE: FCO Tracking Instructions for Radiance/SPWeb:
--------------------------------------------------

If a Radiance case involves the application of an FCO to solve a customer
issue, please complete the following steps in Radiance/SPWeb prior to
closing the case:
 
    o Select "Field Change Order" in the REFERENCE TYPE field.

    o Enter FCO ID number in the REFERENCE ID field.
      For example: A0222-1.

If possible, include additional details in the REFERENCE SUMMARY field
(for example, upgrade complete, customer declined, etc.).
________________________________________________________________________

Implementation Notes
--------------------

In case of "Mandatory" FCOs, Sun Services will attempt to contact
all known customers to recommend proactive implementation.

For "Controlled Proactive" FCOs, Sun Services mission critical
support teams will initiate proactive implementation efforts for
their respective accounts, as required.

For "Upon Failure" FCOs, Sun Services and partners will implement
the necessary corrective actions as the need arises.

The CIC process must be used for proactive hardware replacement
requests when an FCO is classified as "Upon Failure".


Billing Information
-------------------

Warranty: Sun will provide parts at no charge under Warranty
          Service.  On-Site Labor Rates are based on specified
          Warranty deliverables for the affected product.

Contract: Sun will provide parts at no charge.  On-Site Labor Rates
          are based on the type of service contract.

Non Contract: Sun will provide parts at no charge.  Installation by
              Sun is available based on the On-Site Labor Rates
              defined in the Price List.

________________________________________________________________________

All FCO documents are accessible via Internal SunSolve.  Type "sunsolve"
in a browser and follow the prompts to Search Collections.

For questions on this document, please email:

        [email protected]

For more information on the FCO Program, go to:

        http://tns.central/fco

To access the Service Partner Exchange, use:

        https://spe.sun.com