Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1000479.1
Update Date: 2011-02-16
Keywords:

Solution Type  Sun Alert Sure

Solution 1000479.1: Running CE Driver at 100Mb in Forced Mode May Cause PCI IOMMU Panic and/or Other Operational Issues


Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

Previously Published As
200617


Product
Solaris 9 Operating System
Solaris 10 Operating System
Solaris 8 Operating System

Bug Id
<SUNBUG: 6217062>

Date of Workaround Release
01-NOV-2005

Date of Resolved Release
01-FEB-2006

Impact

Running the CE Ethernet driver (see ce(7D)) in forced mode (autonegotiation disabled) may cause a system panic, link-down events, or similar hardware-related issues.


Contributing Factors

This issue can occur on the following platforms:

SPARC Platform

  • Solaris 8, 9 and 10 systems (with the CE Ethernet connection configured for 100Mb with autonegotiation disabled)

x86 Platform

  • Solaris 8, 9 and 10 systems (with the CE Ethernet connection configured for 100Mb with autonegotiation disabled)

Notes:

  1. This condition has not been reported at 10Mb speed. It has been reported only when the CE driver is configured for 100Mb full-duplex with autonegotiation disabled.
  2. Failures appear to be independent of hardware type (Saturn vs. Cassini, on-board vs. add-in card) and traffic load.
  3. Failures may depend on the switch used, but this has not been reproduced or proven to be a factor.
  4. Failures may also occur when the CE driver is configured with autonegotiation enabled but the switch port is in forced mode, leading to a mismatch between the link partners.

To date, no combination of patch levels has been shown to affect the failure symptoms.

To determine whether any CE (ce(7D)) ports are configured on a system, run the following command:

    % ifconfig -a
    lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
            inet 127.0.0.1 netmask ff000000
    ce1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
            inet 10.8.55.60 netmask ffffff00 broadcast 10.8.55.255
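The interface names can also be extracted from that output automatically. The following is a minimal sketch that assumes a POSIX shell and awk, and real ifconfig output in which interface lines begin in the first column. The function name list_ce_ifaces is hypothetical. Logical interfaces such as ce1:1 are folded into their physical interface.

```shell
# Hypothetical helper: print the ce(7D) interface names found in
# "ifconfig -a" output read on stdin. Lines that start with "ceN"
# are interface lines; with ":" as the field separator, $1 is the
# physical name (ce1:1 becomes ce1). The indented inet lines never match.
list_ce_ifaces() {
    awk -F: '/^ce[0-9]/ { print $1 }' | sort -u
}
```

For example, `ifconfig -a | list_ce_ifaces` should print ce1 for the output above.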

To determine the link speed and duplex mode used by the CE driver, run the following commands:

    # kstat -p ce | grep link_speed
    # kstat -p ce | grep link_duplex
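The two values can also be gathered into one line per instance. This sketch reads `kstat -p ce` output on standard input. It assumes the module:instance:name:statistic naming shown in the kstat examples in this document, and it marks instances running at 100 Mbps so their autonegotiation settings can be checked. The duplex value is printed as reported rather than decoded, because its encoding should be confirmed against ce(7D) for your release. The function name ce_link_summary is hypothetical. On Solaris, nawk may be needed in place of the old /usr/bin/awk.

```shell
# Hypothetical helper: one summary line per ce instance from
# "kstat -p ce" output on stdin. $1 is module:instance:name:statistic
# and $2 is the value. The duplex value is printed raw (not decoded).
ce_link_summary() {
    awk '
    {
        n = split($1, f, ":")
        if (n < 4) next
        if (f[4] == "link_speed")  speed[f[3]]  = $2
        if (f[4] == "link_duplex") duplex[f[3]] = $2
    }
    END {
        for (i in speed) {
            # Flag 100 Mbps links, the speed named in this Sun Alert
            flag = (speed[i] + 0 == 100) ? "  <-- 100Mb: check autonegotiation" : ""
            printf "%s speed=%s duplex=%s%s\n", i, speed[i], duplex[i], flag
        }
    }' | sort
}
```

Typical use is `kstat -p ce | ce_link_summary`.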

The following command and sample output show how to check for a mismatch between the link partners:

    $ kstat -p | grep lp_
    ce:0:ce0:lp_cap_1000fdx 1
    ce:0:ce0:lp_cap_1000hdx 0
    ce:0:ce0:lp_cap_100T4 0
    ce:0:ce0:lp_cap_100fdx 0
    ce:0:ce0:lp_cap_100hdx 1
    ce:0:ce0:lp_cap_10fdx 0
    ce:0:ce0:lp_cap_10hdx 0
    ce:0:ce0:lp_cap_asmpause 0
    ce:0:ce0:lp_cap_autoneg 1
    ce:0:ce0:lp_cap_pause 0

If autonegotiation has succeeded, lp_cap_autoneg should be 1. If all of the other lp_cap_* fields in the output are 0, it can be assumed that autonegotiation has not succeeded, which indicates a mismatch between the link partners.
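The rule above can be applied per interface with a short filter. This is a minimal sketch that assumes `kstat -p` output on standard input in the format shown. The function name check_lp_caps is hypothetical. Counting lp_cap_pause and lp_cap_asmpause among the "other fields" follows the wording above literally. On Solaris, nawk may be needed in place of the old /usr/bin/awk.

```shell
# Hypothetical helper: apply the lp_cap_* rule per interface to
# "kstat -p" output on stdin. A mismatch is reported when
# lp_cap_autoneg is not 1, or when every other lp_cap_* value is 0.
check_lp_caps() {
    awk '
    {
        n = split($1, f, ":")
        if (n < 4 || f[4] !~ /^lp_cap_/) next
        seen[f[3]] = 1
        if (f[4] == "lp_cap_autoneg") autoneg[f[3]] = $2 + 0
        else other[f[3]] += $2
    }
    END {
        for (i in seen) {
            if (autoneg[i] != 1 || other[i] == 0)
                print i ": possible link partner mismatch"
            else
                print i ": autonegotiation looks OK"
        }
    }' | sort
}
```

Typical use is `kstat -p ce | check_lp_caps`. For the sample output above, it reports ce0 as OK, because lp_cap_autoneg is 1 and two other capability fields are set.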


Symptoms

One of the following symptoms may occur:

1. PCI IOMMU errors occur, pointing to the bus that contains the PCI card or on-board port using the CE driver, as in the following example:

    Sep 7 21:12:12 examplebox pcisch:
    [ID 462479 kern.warning] WARNING: pcisch2 (pci@9,700000): PCI fault log start:
    Sep 7 21:12:12 examplebox pcisch:
    [ID 309153 kern.notice] PCI iommu error
    Sep 7 21:12:12 examplebox pcisch:
    [ID 866426 kern.notice] pcisch2: Error 1 on IOMMU TLB entry b:
    Sep 7 21:12:12 examplebox Context=0 not Writable not Streamable
    Sep 7 21:12:12 examplebox PCI Page Size=8k Address in page c1b30000
    Sep 7 21:12:12 examplebox pcisch: [ID 219581 kern.notice]
    Memory: Valid not Cacheable Page Frame=0
    Sep 7 21:12:12 examplebox pcisch: [ID 684763 kern.notice]
    pcisch2 (pci@9,700000): PBM AFSR=0x0.00000000
    Sep 7 21:12:12 examplebox pcisch: [ID 120591 kern.notice]
    dwordmask=0 bytemask=0
    Sep 7 21:12:12 examplebox pcisch: [ID 829486 kern.notice]
    pcisch2 (pci@9,700000): PCI primary error (0):
    Sep 7 21:12:12 examplebox pcisch: [ID 227296 kern.notice]
    pcisch2 (pci@9,700000): PCI secondary error (0):
    Sep 7 21:12:12 examplebox pcisch: [ID 748186 kern.notice]
    pcisch2 (pci@9,700000): PBM AFAR 0.00000000:
    Sep 7 21:12:12 examplebox pcisch: [ID 127741 kern.warning]
    WARNING: pcisch2: PCI config space CSR=0x2a80<signaled-target-abort,
    received-master-abort>
    Sep 7 21:12:12 examplebox ... PCI fault log end.
    Sep 7 21:12:12 examplebox unix: [ID 836849 kern.notice]
    Sep 7 21:12:12 examplebox ^Mpanic[cpu7]/thread=30016fcfd00:
    Sep 7 21:12:12 examplebox unix: [ID 578303 kern.notice]
    pcisch-2: PCI bus 1 error(s)!
    Sep 7 21:12:12 examplebox unix: [ID 100000 kern.notice]
    Sep 7 21:12:12 examplebox genunix: [ID 723222 kern.notice]
    000002a100077ea0 pcisch:pbm_error_intr+164 (30006cdfe18, 273,
    3000019a398, 3, 30006cdfe18, 1)
    ...
    Sep 7 21:12:12 examplebox unix: [ID 100000 kern.notice]
    Sep 7 21:12:12 examplebox genunix: [ID 672855 kern.notice]
    syncing file systems...
    Sep 7 21:12:13 examplebox genunix: [ID 733762 kern.notice] 1
    Sep 7 21:12:16 examplebox last message repeated 1 time
    Sep 7 21:12:18 examplebox genunix: [ID 904073 kern.notice] done
    Sep 7 21:12:19 examplebox genunix: [ID 353387 kern.notice]
    dumping to /dev/dsk/c1t0d0s7, offset 65536
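On a system that has already logged errors like these, a quick scan of the messages file shows how often they occur and which bus they name. The sketch below assumes a POSIX shell and awk, and that each syslog entry sits on a single line (the example above is wrapped for readability). The function name pci_iommu_scan is hypothetical. On Solaris, nawk may be needed in place of the old /usr/bin/awk.

```shell
# Hypothetical helper: read a syslog file (e.g. /var/adm/messages) on
# stdin, count "PCI iommu error" notices, and list each pcisch instance
# and device path named in a "PCI fault log start" warning.
pci_iommu_scan() {
    awk '
    /PCI iommu error/ { errors++ }
    /PCI fault log start/ {
        # e.g. "WARNING: pcisch2 (pci@9,700000): PCI fault log start:"
        for (k = 1; k < NF; k++)
            if ($k ~ /^pcisch[0-9]+$/ && $(k + 1) ~ /^[(]pci@/) {
                path = $(k + 1)
                sub(/^[(]/, "", path)
                sub(/[)]:$/, "", path)
                buses[$k " " path] = 1
            }
    }
    END {
        printf "PCI iommu errors: %d\n", errors
        for (b in buses) print "fault log on " b
    }'
}
```

Typical use is `pci_iommu_scan < /var/adm/messages`.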

2. The system panics repeatedly. A stack trace from the core file shows something similar to the following:

    pc: 0x10048b54 unix:panicsys+0x44: call unix:setjmp
    startpc: 0x1011c264 genunix:thread_create_intr+0x0: save %sp, -0xc0, %sp
    unix:panicsys+0x44(0x10147e18, 0x2a10007ca98, 0x104241e0, 0x1, 0x2000, , 0x80001607,
    0x10147e18, 0x2a10007ca98)
    unix:vpanic+0xcc(0x10147e18, 0x2a10007ca98, 0x25c, 0x2a10007c9f8, 0x100bcd78,
    0x3002787495a)
    unix:panic+0x1c(0x10147e18, 0x30059d1c000, 0x10438f08, 0x30059b25460, 0x8, 0x0)
    genunix:kmem_error+0x448(0x0, 0x30000035b00, 0x30059d1c000, , 0x30000035b00?,
    0x30059d1c000?)
    genunix:kmem_cache_alloc_debug+0xf8(, 0x30059d1c000?, 0x1)
    genunix:kmem_cache_alloc(0x30000035b00, 0x1) - frame recycled
    genunix:kmem_alloc+0x2c(0x2000, 0x1, , , 0x300285666d0, 0x78220800)
    ce:ce_allocb+0xc(0x2000, 0x1)
    ce:ce_replace_page+0xa8(0x3002f945e58, 0x30030c263b8, 0x2f, 0x2f0, 0x40,
    0x30027a26040)
    ce:ce_intr+0xed0(0x3002f945e58, , , 0x0, , 0x30027e90020)
    pcisch:pci_intr_wrapper+0x80(0x3002ea64880?)
    unix:intr_thread+0xa4(0x0, 0x0, 0x1041ccc0, 0x104245b0, 0x16, 0x0)
    unix:prom_rtt+0x0()
    -- interrupt data rp: 0x2a10001f9c0
    pc: 0x10044454 unix:idle+0x6c: andcc %g2, 0x4 ( btst %g2, 0x4 )
    npc: 0x10044458 unix:idle+0x70: bne,a,pt %icc, unix:idle+0x6c
    global: %g1 0x3002f72d000
    %g2 0x1b %g3 0
    %g4 0x7 %g5 0
    %g6 0 %g7 0x2a10001fd20
    out: %o0 0 %o1 0
    %o2 0x1041ccc0 %o3 0x104245b0
    %o4 0x16 %o5 0
    %sp 0x2a10001f261 %o7 0x10044494
    loc: %l0 0x10045e64 %l1 0
    %l2 0 %l3 0x2a1000e5d20
    %l4 0 %l5 0
    %l6 0 %l7 0
    in: %i0 0 %i1 0xffffffffffffffff
    %i2 0x104245b0 %i3 0x1041bc78
    %i4 0x3 %i5 0x1041bc00
    %fp 0x2a10001f311 %i7 0x1002ba00
    <intr trap>unix:idle+0x6c(0x0, 0x0, 0x104245b0)
    unix:thread_start+0x4()

Further analysis of the core file may show that the CE driver's DMA engine has written a valid Ethernet packet to a buffer that had already been freed.

3. Messages similar to the following may appear regularly:

    Mar 25 03:10:42 cdma1 genunix: [ID 408789 kern.notice] NOTICE:
    ce5: fault cleared external to device; service available
    Mar 25 03:10:42 cdma1 genunix: [ID 451854 kern.notice] NOTICE:
    ce5: xcvr addr:0x00 - link up 1000 Mbps full duplex
    Mar 25 03:10:42 cdma1 genunix: [ID 408789 kern.warning] WARNING:
    ce5: fault detected external to device; service degraded
    Mar 25 03:10:42 cdma1 genunix: [ID 451854 kern.warning] WARNING:
    ce5: xcvr addr:0x00 - link down
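To see whether an interface is repeatedly dropping its link in this way, the notices can be counted per instance. This is a minimal sketch that assumes a POSIX shell and awk, one syslog entry per line, and the message wording shown above. The function name ce_flap_count is hypothetical. The counts cover whatever period the messages file spans.

```shell
# Hypothetical helper: read a syslog file on stdin and count, per ce
# instance, the "link down" and "fault detected external to device"
# notices. The instance is taken from the first field shaped like "ceN:".
ce_flap_count() {
    awk '
    {
        for (k = 1; k <= NF; k++)
            if ($k ~ /^ce[0-9]+:$/) { ifc = substr($k, 1, length($k) - 1); break }
        if (k > NF) next
        seen[ifc] = 1
        if ($0 ~ /link down/) down[ifc]++
        if ($0 ~ /fault detected external to device/) fault[ifc]++
    }
    END {
        for (i in seen)
            printf "%s link_down=%d fault_detected=%d\n", i, down[i], fault[i]
    }' | sort
}
```

Typical use is `ce_flap_count < /var/adm/messages`.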

Workaround

Configure the CE interface to operate with autonegotiation enabled, and make sure that the corresponding link partner (switch port) is also configured with autonegotiation enabled. Also make sure that the advertised capabilities of both link partners match.
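One way to apply the host side of this workaround at run time is with ndd(1M), selecting the ce instance first. The sketch below only prints the commands, so they can be reviewed before being run as root. It assumes the instance and adv_autoneg_cap parameters described in ce(7D); verify both for your Solaris release. Settings made with ndd do not survive a reboot, so a persistent change belongs in the driver configuration file (ce.conf). The switch port must be reconfigured separately. The function name ce_autoneg_cmds is hypothetical.

```shell
# Hypothetical helper: print (do not run) the ndd commands that select
# one ce instance and enable autonegotiation on it. The parameter names
# are assumed from ce(7D); check them before use.
ce_autoneg_cmds() {
    inst=$1
    case $inst in
        ''|*[!0-9]*)
            echo "usage: ce_autoneg_cmds <instance-number>" >&2
            return 1 ;;
    esac
    echo "ndd -set /dev/ce instance $inst"
    echo "ndd -set /dev/ce adv_autoneg_cap 1"
}
```

For example, `ce_autoneg_cmds 0` prints the two commands for ce0.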

The 100BASE-T issues noted in this Sun Alert have been eliminated at all reporting sites by reconfiguring the Ethernet switch ports connected to the Cassini ports so that both ends are configured consistently. Mismatched configurations of the switch port and NIC port are not supported. Such mismatches can result in "rx_tag_errors", which can, in rare cases, lead to the type of system panics described in this Sun Alert.
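Because such mismatches can show up as rx_tag errors, a periodic look at that counter provides a cheap early warning. This sketch filters `kstat -p ce` output on standard input and prints any nonzero statistic whose name contains rx_tag_err. The exact kstat name is an assumption (the text above calls it "rx_tag_errors"), so the pattern is deliberately loose. The function name ce_rx_tag_errors is hypothetical. Because the counter is cumulative, comparing two samples taken some minutes apart is more informative than a single reading.

```shell
# Hypothetical helper: from "kstat -p ce" output on stdin, print
# instance, statistic and value for every nonzero counter whose name
# contains "rx_tag_err" (exact kstat name is an assumption).
ce_rx_tag_errors() {
    awk '{
        n = split($1, f, ":")
        if (n >= 4 && f[4] ~ /rx_tag_err/ && $2 + 0 > 0)
            print f[3], f[4], $2
    }'
}
```

Typical use is `kstat -p ce | ce_rx_tag_errors`.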

Following the recommendations in Sun document 817-6925-10, "Maximizing Performance of a Gigabit Ethernet NIC Interface", is critical to avoiding many system issues, including those described here.

The Sun "Best Practices" Blueprint document at http://www.sun.com/blueprints/0404/817-6925.pdf describes the recommended practices for operating the Ethernet link, and customers may follow its recommendations.


Resolution

Please see the Workaround section for the resolution to this issue.



Modification History

Date: 11-NOV-2005
  • Updated Impact and Contributing Factors sections

Date: 13-DEC-2005
  • Updated Contributing Factors and Relief/Workaround sections

Date: 01-FEB-2006
  • Updated Relief/Workaround section, re-release as resolved

Date: 16-APR-2007
  • Updated Impact, Contributing Factors, Symptoms, and Relief/Workaround sections


Previously Published As
102015
Internal Comments

Internal Contributor/submitter
[email protected]
Internal Eng Business Unit Group
SSG NSN (Netra Systems and Networking)
Internal Eng Responsible Engineer
[email protected]
Internal Services Knowledge Engineer
[email protected]
Internal Escalation ID
1-12212204, 1-11932350, 1-12622511, 1-8280969, 1-10417507, 1-12503684
Internal Sun Alert Kasp Legacy ID
102015
Internal Sun Alert & FAB Admin Info
Critical Category: Availability ==> Pervasive
Significant Change Date: 2005-11-01, 2006-02-01
Avoidance: Workaround
Responsible Manager: [email protected]

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.