Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Troubleshooting Sure
Solution 1008384.1: Analyzing memory messages and hardware replacement needs for Sun SPARC systems.
Previously Published As: 211466
Applies to:
Sun Fire V890 Server
Sun SPARC Enterprise T2000 Server
Sun Fire V240 Server
Sun Fire V250 Server
Sun Fire V440 Server
All Platforms

Purpose
This document discusses various memory messages, and issues pertaining to memory configuration and compatibility, on Sun VSP SPARC systems.

Last Review Date
May 12, 2011

Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details
Analyzing memory messages and hardware replacement needs for Sun SPARC systems.
Symptoms:
1. Confirm memory meets minimum configuration requirements. Memory needs on Sun systems vary, so the configuration must be confirmed on the specific machine in question. The best place to find this information is the online Sun System Handbook.

2. Verify the system meets Sun's memory compatibility guidelines (use the Sun System Handbook). This also has to be confirmed on the specific system in question.

3. Verify the error exceeds Sun Best Practices thresholds. An individual memory module can log numerous corrected errors (CEs) before replacement is recommended. To determine whether thresholds are exceeded, it is also necessary to verify the version of Solaris[TM] the system is running and its current patch level. Use <Document 1010905.1>.

4. Verify that the root cause of the error is not a bad DIMM, using <Document 1004729.1>.

5. For memory errors in Solaris 10, verify FMA fault logs were cleared. If the system is running Solaris 10 and the DIMM has been replaced but FMA errors continue to be logged in the /var/adm/messages file, the faults need to be repaired in Solaris. To repair the FMA faults and error logs from Solaris, run:

   # fmadm faulty

   The output lists each fault under the header:

   STATE RESOURCE / UUID

   For each fault listed under fmadm faulty, run:

   # fmadm repair <uuid#>

   Check fmadm faulty again to make sure the faults have been repaired.

6. Verify that the root cause of the error is not a bad CPU. Memory errors are usually caused by a faulty or failing DIMM, but in some cases a bad CPU writer could be at fault. (Requires Sun Support to be engaged.)

7. Verify that the root cause of the error is not a bad motherboard slot (bad MB). If a DIMM that has been replaced continues to log errors, the slot on the motherboard may be faulty. To determine whether the issue is with the slot on the motherboard as opposed to the DIMM itself, swap the DIMM that logged the original errors with a known good DIMM, keeping track of which module is which. If the error moves with the DIMM, the memory module is faulty. If errors are logged against the swapped-in DIMM sitting in the slot in question, the motherboard is at fault and will need to be replaced.

8. At this point, if you have validated that each troubleshooting step above is true for your environment and the issue still exists, further troubleshooting is required. Gather Explorer data collector output from the system, then contact Sun Support.

Sun Engineers: Reference Document 1010921.1 to continue investigation from STEP 6 above.

This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the "Document Feedback" alias(es) listed below:

Domain Engineer/Lead: Dencho Kojucharov
Feedback alias: [email protected]

normalized, memory errors

Previously Published As 91314

Change History
Date: 2009-11-6
User Name: 103287
Action: Updated
Comment: Removed link to an article that is being archived because it's bad (old information) and duplicates what is found in the SSH.

Date: 2007-12-14
User Name: 71396
Action: Approved
Comment: Performed final review of article.

Attachments
This solution has no attachment
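
The FMA repair procedure in step 5 (list faults, repair each UUID, list again) can be sketched as a small shell helper. This is only an illustration under stated assumptions, not a Sun-supplied tool: it assumes each fault's UUID appears in the fmadm faulty output in the usual 8-4-4-4-12 hexadecimal form, one per line, and the names extract_uuids, repair_all_faults and the FMADM variable are hypothetical. The exact layout of fmadm faulty output differs between Solaris releases, so check the extracted UUIDs before repairing anything.

```shell
#!/bin/sh
# Sketch only: automates "run fmadm faulty, then fmadm repair <uuid>
# for each fault listed", as described in step 5.
# Assumption: UUIDs are printed in 8-4-4-4-12 hexadecimal form,
# at most one per output line. All other text is ignored.

# FMADM is a hypothetical override hook so the loop can be dry-run
# against a stub; by default the real fmadm command is used.
FMADM=${FMADM:-fmadm}

# Read fmadm-style text on stdin and print each distinct UUID once.
extract_uuids() {
    h='[0-9a-fA-F]'
    sed -n "s/.*\(${h}\{8\}-${h}\{4\}-${h}\{4\}-${h}\{4\}-${h}\{12\}\).*/\1/p" | sort -u
}

# Repair every fault currently listed, then list faults again so the
# operator can confirm that nothing remains.
repair_all_faults() {
    "$FMADM" faulty | extract_uuids | while read uuid; do
        echo "Repairing fault $uuid"
        "$FMADM" repair "$uuid"
    done
    "$FMADM" faulty
}
```

To use the sketch, source the file as root after the DIMM has been physically replaced and run repair_all_faults. Running "fmadm faulty | extract_uuids" on its own first shows which UUIDs would be repaired without changing anything.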