Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard) Sure Solution 1019713.1 : Specific Micron 4GB DIMMs are experiencing Multiple/Single Bit Error failures on some platforms with AMD Quad Core (Barcelona) processors.
PreviouslyPublishedAs 244406

Product:
Sun Fire X2200 M2 Server
Sun Fire X4100 M2 Server
Sun Fire X4200 M2 Server
Sun Netra X4200 M2 Server
Sun Fire X4140 Server
Sun Fire X4240 Server
Sun Fire X4440 Server
Sun Fire X4600 M2 Server
Sun Blade X8440 Server Module

Date of Resolved Release 27-Oct-2008

Quad Core systems can experience a reboot or hang (see details below).

Affected X-Options:
X4063A-Z 8GB (2 x 4GB) Memory FRU, RoHS:Y
X6322A 8GB (2 x 4GB) Memory FRU, RoHS:Y
X5095A-Z 8GB (2 x 4GB) DDR2-667 ECC Registered DIMMs, RoHS:YL
X4227A-Z 8GB (2 x 4GB) DDR2-667 DIMM) Option, RoHS:YL
X8124A-Z 8GB (2 x 4GB) DDR2-667 Registered ECC DIMMs) Option, RoHS:YL

Affected Parts:
540-7161-01 8GB (2 x 4GB) Memory FRU, RoHS:Y
540-7118-01 8GB (2 x 4GB) DDR2-667 DIMM) Option, RoHS:YL
540-7228-01 8GB (2 x 4GB) Memory, RoHS:Y
540-7600-01 8GB (2 x 4GB) Memory FRU, RoHS:Y
541-2129-01 8GB (2 x 4GB) DDR2-667 Registered ECC DIMMs), RoHS:YL

Impact
Quad Core systems can experience a reboot or possibly hang if using Micron 4GB DIMMs (Vendor Part Number MT36HTF51272PY-667E) having a datecode prior to 200832 (week 32 of 2008).
Micron 4GB DIMMs (Vendor Part Number MT36HTF51272PY-667E) with a datecode prior to 200832 (week 32 of 2008) are experiencing Multiple/Single Bit Error failures on platforms with AMD Quad Core/Barcelona processors. The issue involves a DRAM sensitivity when auto-precharge reads or writes and manual-precharge reads or writes are used interchangeably.

Contributing Factors
The systems listed above are impacted if they have been upgraded from dual core to quad core and contain Micron 4GB DIMMs (Vendor Part Number MT36HTF51272PY-667E) having a datecode prior to 200832 (week 32 of 2008).

Symptoms
The system remains operational as long as it is running with Chipkill ECC enabled and the memory channels are ganged. Multi-bit correctable errors are expected.
If the system is in ganged channel mode and Chipkill ECC is disabled, it will reboot with a HyperTransport Sync Flood due to uncorrectable ECC errors, or possibly hang. Chipkill ECC is not possible in unganged mode, so a system in unganged mode will fail in the same way: a reboot with a HyperTransport Sync Flood due to uncorrectable ECC errors, or possibly a hang.

Root Cause
The issue is with the DIMM, where device U48 can end up in a race condition which forces an issue in the timing control circuitry when transitioning in the precharge setup. This is only an issue with Quad Core CPUs due to the way the memory switches between auto and manual precharging. Barcelona operates this way to get better performance. Micron 4GB DIMMs (Vendor Part Number MT36HTF51272PY-667E) with a datecode prior to 200832 (week 32 of 2008) are the only DIMMs affected.
Manufacturing was purged via StopShip/Purge P001-20531, and corrective action began in Services Logistics via GSAP 4386 as of July 14, 2008.

Corrective Action
Workaround: No workaround available - see Resolution section below.

Resolution: Any upgrade of dual core systems to quad core requires non-Micron DIMMs or Micron DIMMs that have a datecode of 200832 (week 32, 2008) or later. Any Micron DIMMs with a datecode prior to this must be replaced via the CIC Process as outlined below.

Important! DIMMs should NOT be pulled from Services spares to address this issue.

If upgrading from dual core to quad core, there are two steps that must be followed to ensure remediation for this issue:
- Determine if the system has suspect DIMMs, and if yes
- Follow the specific steps for DIMM replacement as outlined below

Step 1: Determine if the system has suspect DIMMs
If the system has 4GB DIMMs, determine if the DIMMs are Micron and if they have a date code prior to 200832 (week 32 of 2008).
There are two ways to check this:
- Visually, or
- Using the hdtl function of the service processor

1) Visual Inspection - The date code is located on the top center portion of the Micron DIMM. It is a 6 digit number and there are no other numbers adjacent to it. Any DIMM with a date code prior to 200832 needs to be replaced.
Note: For the X4100 M2 and X4200 M2 you do not need to look at the date code. The suspect Micron DIMMs have the following Xoption/Manufacturing part number (X4227A-Z/540-7118). Non-suspect DIMMs have the following Xoption/Manufacturing part number (X4233A/540-7795).

2) hdtl function of the service processor -
hdtl -dd 1 -e
/etc/init.d/ipmistack start
The JEDEC definitions for bytes 93 and 94 (5Dh and 5Eh) are year and week respectively and are represented in Binary Coded Decimal (BCD). For example, week 32 in year 2008 would be coded as 08 (0000 1000) in byte 93 and 32 (0011 0010) in byte 94. Future versions of ILOM will dump date code information as part of the FRU information.

If it has been determined that the system has Micron 4GB DIMMs that have a date code prior to 200832 (week 32 of 2008) and you are upgrading from dual core to quad core, please follow the instructions below, which highlight the actions required for each system type.

Step 2: Follow the steps for DIMM replacement as outlined below for the specific platform.

Sun Fire X2200 M2 (A85) - Sun Service must use the CIC (Customer Intensive Care) process to arrange for the replacement of the affected Micron 4GB DIMMs with the following part number: X4087A.

Netra X4200 M2 (N87) - Sun Service must use the CIC process to arrange for the replacement of the affected Micron 4GB DIMMs with the following part number: X4227A-Z.

Sun Fire X4140 (B12) / Sun Fire X4240 (B14) / Sun Fire X4440 (B16) - Sun Service must use the CIC process to arrange for the replacement of the affected Micron 4GB DIMMs with the following part number: X6322A.
Sun Blade X8440 (A98) - Sun Service must use the CIC process to arrange for the replacement of the affected Micron 4GB DIMMs with the following part number: X5095A-Z.

Sun Fire X4100 M2 (A86) / Sun Fire X4200 M2 (A87) - Sun Service must use the CIC process to arrange for the replacement of the affected Micron 4GB DIMMs with the following part number: X4233A.

Sun Fire X4600 M2 (A67) - Sun Service must use the CIC process to arrange for the replacement of the affected Micron 4GB DIMMs with the following part number: X8098.

The CIC tool can be found at the following URL:
http://cic.east.sun.com/cictool/index.jsp

Once the request for replacement DIMMs has been entered into the CIC tool, the DIMMs should be expected to ship within a few business days. Service should also use the CIC tool to return the affected Micron 4GB DIMMs. The URL below is a reference for how to both request and return material via the CIC tool:
http://sunwebcms.central.sun.com:8001/sunweb/cda/mainAssembly/0,2685,4158935_6870,00.html

Note to Authorized Service Partners: Sun Authorized Service Partners may contact Sun Services or their Sun Services Representative to receive FAB related information.

Comments
Do not use Services spares to address this issue. This activity is no longer funded by Engineering and must therefore be funded by Services.

References:
GSAP: 4386
WW Stop Ship: P001-20531.A3

For information about FAB documents, its release processes, implementation strategies and billing information, go to the following URL:
For Sun Authorized Service Providers go to:
In addition to the above you may email:

Modification History
08-Jan-2009:
Internal Contributor/submitter: [email protected]
Internal Eng Responsible Engineer: [email protected]
Responsible Manager: [email protected]
Internal Services Knowledge Engineer: [email protected]
Internal Eng Business Unit Group: NSG (Network Systems Group)

Internal Sun Alert & FAB Admin Info
23-Oct-2008: Completed draft and sent to Extended Review.
27-Oct-2008: No feedback from Ext Rvw - sending to Publish.
24-Nov-2009: Corrected Product Name to swoRDFish inconsistency.

Attachments
This solution has no attachment
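
Worked example for the Step 1 date code check: the sketch below shows one way to turn the two BCD bytes described above (byte 93 = year, byte 94 = week) into the six-digit date code and compare it with the 200832 cutoff. The function names are hypothetical and not part of hdtl or ILOM. The byte values are assumed to have been read by hand from the SPD dump, and the two-digit year is assumed to fall in the 2000s. The check covers the date code only; the DIMM must still be confirmed as Micron Vendor Part Number MT36HTF51272PY-667E.

```python
CUTOFF = 200832  # first unaffected date code per this FAB (week 32 of 2008)


def bcd_to_int(byte: int) -> int:
    """Decode one packed-BCD byte (two decimal digits) into an integer."""
    high, low = byte >> 4, byte & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"not a valid BCD byte: {byte:#04x}")
    return high * 10 + low


def date_code(year_byte: int, week_byte: int) -> int:
    """Build the six-digit YYYYWW date code from SPD bytes 93 and 94.

    Assumption: the two-digit BCD year belongs to the 2000s.
    """
    return (2000 + bcd_to_int(year_byte)) * 100 + bcd_to_int(week_byte)


def is_suspect_date(year_byte: int, week_byte: int) -> bool:
    """Return True if the date code is earlier than the 200832 cutoff."""
    return date_code(year_byte, week_byte) < CUTOFF


if __name__ == "__main__":
    # Byte 93 = 0x08 and byte 94 = 0x32 is the FAB's own example (week 32 of 2008).
    print(date_code(0x08, 0x32), is_suspect_date(0x08, 0x32))
    # One week earlier falls before the cutoff.
    print(date_code(0x08, 0x31), is_suspect_date(0x08, 0x31))
```

Because the BCD bytes are written as hexadecimal literals, the digits seen in the dump can be typed in directly (0x08 for year 08, 0x32 for week 32).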