Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1004618.1
Update Date: 2011-04-26
Keywords:

Solution Type  Problem Resolution Sure

Solution  1004618.1 :   Sun Fire[TM] 12K/15K: frad daemon reports "Ignoring CRC error" or "TH Segment CRC is wrong on DIMM"  


Related Items
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers

Previously Published As
206404


Applies to:

Sun Fire 12K Server
Sun Fire 15K Server
All Platforms

Symptoms

The frad daemon logs "Ignoring CRC error" or "TH Segment CRC is wrong on DIMM" messages to the platform log in /var/opt/SUNWSMS/SMS1.2/adm/platform.

Every four hours, a record of powered-on components is written to the SEEPROM.
You may see hourly messages reading "Ignoring CRC error on DIMM with known bad CRC on DIMM" or "TH Segment CRC is wrong on DIMM".

If you have one of the older DIMMs (for example, part number 501-5401-01) that has the older CRCs in the SEEPROM, you may see platform error messages similar to those shown below (in file /var/opt/SUNWSMS/SMS1.2/adm/platform):

Jul 25 10:14:29 2002 Q8SJ3 frad[442]: [9926 594352968222058 NOTICE SeepromInfoPro.cc 2715] Ignoring CRC error on DIMM with known bad CRC on DIMM at SB9/P1/B0/D3
Jul 25 11:14:13 2002 Q8SJ3 frad[442]: [9926 597937432405436 NOTICE SeepromInfoPro.cc 2715] Ignoring CRC error on DIMM with known bad CRC on DIMM at SB9/P1/B0/D3
Jul 25 12:14:22 2002 Q8SJ3 frad[442]: [9926 601545906798911 NOTICE SeepromInfoPro.cc 2715] Ignoring CRC error on DIMM with known bad CRC on DIMM at SB9/P1/B0/D3
Jul 25 13:14:13 2002 Q8SJ3 frad[442]: [9926 605136625163401 NOTICE SeepromInfoPro.cc 2715] Ignoring CRC error on DIMM with known bad CRC on DIMM at SB9/P1/B0/D3


Or you may see messages such as:

Dec 17 08:18:14 2003 dwcoresc04-01 esmd[19577]: [1992 6553319427939517 ERR DynamicFru.cc 355] Failed to write the power event record, STILL_ON, of fru SB12/P2/B0/D0: rc=-3
Dec 17 12:18:50 2003 dwcoresc04-01 frad[453]: [9913 6567755605944935 ERR SeepromInfoPro.cc 2978] TH Segment CRC is wrong on DIMM at SB12/P2/B0/D0, offset = 207, original crc: ccc33740, calculated crc: cd1b1440
Dec 17 12:18:50 2003 dwcoresc04-01 esmd[19577]: [1989 6567755607160468 ERR FruAccess.cc 542] Failed to read the temperature summary record of fru SB12/P2/B0/D0(sensor=0): rc=-3
Dec 17 12:18:50 2003 dwcoresc04-01 esmd[19577]: [1993 6567755608282886 ERR DynamicFru.cc 208] Failed to update the temperature summary record of fru SB12/P2/B0/D0(sensor=0): rc=-3
Dec 17 12:18:50 2003 dwcoresc04-01 frad[453]: [9913 6567755636154931 ERR SeepromInfoPro.cc 2978] TH Segment CRC is wrong on DIMM at SB12/P2/B0/D0, offset = 207, original crc: ccc33740, calculated crc: cd1b1440
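
When many of these messages accumulate, it can help to list which DIMM locations they refer to. The following is a minimal Python sketch, not a Sun-supplied tool. It assumes the frad messages look like the samples in this document; the regular expressions and the function name affected_dimms are illustrative assumptions, not part of SMS. It tolerates a message whose "DIMM at ..." part has wrapped onto the next line.

```python
import re
from collections import Counter

# Patterns modeled only on the sample messages in this document (assumption).
# \s+ also matches a line break, so a wrapped "DIMM at ..." tail is accepted.
KNOWN_BAD = re.compile(
    r"frad\[\d+\]:.*?Ignoring CRC error on DIMM with known bad CRC on\s+DIMM at\s+(\S+)"
)
TH_CRC = re.compile(
    r"frad\[\d+\]:.*?TH Segment CRC is wrong on DIMM at\s+([^,\s]+)"
)


def affected_dimms(log_text):
    """Return a Counter mapping DIMM location -> number of frad CRC messages."""
    counts = Counter()
    for pattern in (KNOWN_BAD, TH_CRC):
        for match in pattern.finditer(log_text):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Two sample lines copied from this document.
    sample = (
        "Jul 25 10:14:29 2002 Q8SJ3 frad[442]: [9926 594352968222058 NOTICE "
        "SeepromInfoPro.cc 2715] Ignoring CRC error on DIMM with known bad CRC "
        "on DIMM at SB9/P1/B0/D3\n"
        "Dec 17 12:18:50 2003 dwcoresc04-01 frad[453]: [9913 6567755605944935 ERR "
        "SeepromInfoPro.cc 2978] TH Segment CRC is wrong on DIMM at SB12/P2/B0/D0, "
        "offset = 207, original crc: ccc33740, calculated crc: cd1b1440\n"
    )
    for location, count in sorted(affected_dimms(sample).items()):
        print(location, count)
```

To use it on a real system, read the contents of the platform log into a string and pass it to affected_dimms; the locations returned are the DIMMs to replace at the next maintenance window.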

Cause

Technically, there is a problem with the DIMM: the SEEPROM on SB9/P1/B0/D3 (first example) and on SB12/P2/B0/D0 (second example) has a corrupt entry or segment. However, this is NOT an error in the memory area used by the domain, which is why no CE ECC errors or Rstops are seen from the domain side.

Solution

To stop the error messages, the DIMM must be replaced. However, because the fault has no impact on the health or operation of the domain, the replacement can wait until the next scheduled maintenance window. You do not need to schedule a maintenance window for the sole purpose of replacing this DIMM.


Product
Sun Fire 15K Server
Sun Fire 12K Server


Internal Section

Previously Published As 48156

This common issue can generate calls from customers who are concerned by the messages. As the document shows, they need not be alarmed and can defer the replacement until service can be scheduled.
Information such as this should be available to customers on MOS to prevent cases from being logged until they are ready to have the DIMM serviced.

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.