Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1012557.1
Update Date:2010-09-30
Keywords:

Solution Type  Technical Instruction Sure

Solution  1012557.1 :   Sun StorEdge[TM] 3510/3511 Arrays: Understanding Link Error Status Block (LESB) counters.  


Related Items
  • Sun Storage 3510 FC Array
  • Sun Storage T3 Array
  • Sun Storage T3+ Array
  • Sun Storage A5000 Array
  • Sun Storage 6120 Array
  • Sun Storage A5200 Array
  • Sun Storage 3511 SATA Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - Other

Previously Published As
217290


Applies to:

Sun Storage 3510 FC Array
Sun Storage 3511 SATA Array
Sun Storage 6120 Array
Sun Storage A5000 Array
Sun Storage A5200 Array
All Platforms

Goal


This document is primarily intended as a reference for Sun StorEdge[TM] 3510 and 3511 arrays, but can be used as a reference for all Fibre Channel (hereafter FC) arrays.

Solution


The RAID controller, the Fibre Channel (FC) disk drives and the SCSI Enclosure Services (SES) device can all return the FC error information held in their Link Error Status Block (LESB). This information is very useful for troubleshooting intermittent FC problems.
This document explains the LESB and how to use the information it provides to troubleshoot FC loop problems.


Steps to Follow
The LESB is maintained internally by all FC devices and contains accumulated values for errors and other FC statistics.
The LESB is obtained from a device by issuing a Read Link Status Extended Link Service (RLS ELS) to the specified loop ID.
In the case of the SE3510/3511, the sccli command "diag error" causes an RLS ELS to be issued to one or more devices on the loop to obtain their LESBs.
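As an illustration of what the RLS reply carries, the sketch below decodes an RLS accept (LS_ACC) payload into the six LESB counters. It assumes you already have the raw reply bytes (how you capture them depends on the HBA and tools, and is outside the scope of this document) and that the payload uses the standard layout: a 4-byte command word whose first byte is the LS_ACC code 0x02, followed by six big-endian 32-bit counters in the order of items 2 to 7 below. The names parse_rls_accept and LESB_FIELDS are hypothetical and are not part of sccli or any Sun tool.

```python
import struct

# LESB fields in the order they follow the LS_ACC command word.
LESB_FIELDS = (
    "link_failure",
    "loss_of_sync",
    "loss_of_signal",
    "prim_seq_protocol_err",
    "invalid_tx_word",
    "invalid_crc",
)

LS_ACC = 0x02  # ELS accept code, carried in the first byte of the reply


def parse_rls_accept(payload: bytes) -> dict:
    """Decode an RLS accept payload into a dict of LESB counters.

    Assumed layout: a 4-byte command word followed by six
    big-endian unsigned 32-bit counters (24 bytes of LESB).
    """
    if len(payload) < 28:
        raise ValueError("RLS accept payload must be at least 28 bytes")
    if payload[0] != LS_ACC:
        raise ValueError("payload is not an LS_ACC reply")
    counters = struct.unpack(">6I", payload[4:28])
    return dict(zip(LESB_FIELDS, counters))


# Build a payload by hand (values from channel 2, ID 1 below) and decode it.
sample = bytes([LS_ACC, 0, 0, 0]) + struct.pack(">6I", 0, 41, 0, 0, 2573, 0)
print(parse_rls_accept(sample))
```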
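Following is a description of the counters reported by "diag error". Items 2 to 7 are the six counters held in the LESB itself; the LIP count (item 1) is reported alongside them.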
1. Total number of LIPs:-
This counter increments every time there is a loop initialization (LIP). Note that LIPs occur normally whenever an FC device joins or leaves the loop. If the loop has a problem or is unstable, additional LIPs can occur, increasing this count.
2. Total number of instances of link failures:-
Link failure indicates that the receiving FC device has detected a link-down condition.
3. Total number of instances of loss of synchronization:-
Loss of synchronization indicates that the receiving FC device has detected an interruption in the protocol sequence that maintains synchronization of information on the loop.
4. Total number of instances of loss of signal:-
Loss of signal indicates that the receiving FC device has detected a complete loss of signal on the loop.
5. Total number of instances of primitive sequence protocol errors:-
A primitive sequence protocol error indicates that the receiving FC device has detected an error in a primitive sequence. These types of errors typically occur outside of FC frames.
6. Total number of instances of invalid transmission words:-
An invalid transmission word indicates that the receiving FC device has detected an invalid transmission word that is not part of an ordered set, or has detected an incorrect running disparity. Invalid transmission word errors can occur inside or outside of FC frames.
7. Total number of instances of invalid CRC:-
A CRC error indicates that the receiving FC device has detected an error within an FC frame: the CRC that the FC receiver calculates for the frame does not match the CRC sent by the transmitting FC device.
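The receiver-side check behind the InvalCRC counter can be sketched with Python's zlib.crc32, which uses the same 32-bit CRC generator polynomial as FC. This is a simplified model that assumes a frame is just body bytes with the CRC appended as a big-endian 32-bit value; it ignores FC's exact bit ordering, framing and encoding, so it shows the idea of the check rather than modelling an FC receiver. The names make_frame and crc_ok are hypothetical.

```python
import struct
import zlib


def make_frame(body: bytes) -> bytes:
    """Append a CRC-32 of the body, as a transmitter would (simplified)."""
    return body + struct.pack(">I", zlib.crc32(body))


def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC over the body and compare it with the received CRC."""
    body, received = frame[:-4], struct.unpack(">I", frame[-4:])[0]
    return zlib.crc32(body) == received


frame = make_frame(b"header+payload")
print(crc_ok(frame))    # True: the frame arrived intact

# Flip one bit in the body to simulate corruption on the loop.
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]
print(crc_ok(damaged))  # False: the CRC no longer matches

# A receiver keeps a running count of frames that fail the check,
# analogous to the LESB Invalid CRC counter.
invalid_crc = 0
for f in (frame, damaged):
    if not crc_ok(f):
        invalid_crc += 1
```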
With these definitions in mind, consider the following "diag error" output from an SE3510/3511.
diag data for channel 2 shows:-
sccli: selected device /dev/rdsk/c5t600C0FF00000000007FA9366227CA300d0s2 [SUN StorEdge 3510 SN#07FA93]
CH   ID  TYPE  LIP  LinkFail  LossOfSy  LossOfSi  PrimErr  InvalTxW  InvalCRC
-----------------------------------------------------------------------------
 2    0  DISK   17         0         0         0        0       159         0
 2    1  DISK   17         0        41         0        0      2573         0
 2    2  DISK   17         0         2         0        0       450         0
 2    3  DISK   17         0         0         0        0       279         0
 2    4  DISK   17         0         2         0        0       159         0
 2    5  DISK   17         0         1         0        0       280         0
 2    6  DISK   17         0         0         0        0        38         0
 2    7  DISK   17         0         2         0        0       158         0
 2    8  DISK   17         0         1         0        0       158         0
 2    9  DISK   17         0         2         0        0       159         0
 2   10  DISK   17         0        43         0        0       159         0
 2   11  DISK   17         0         0         0        0       279         0
 2   12  SES    17         0         0         0        0         0         0
 2   32  DISK   17         0         4         0        0       400         0
 2   33  DISK   17         0         0         0        0       413         0
 2   34  DISK   17         0         0         0        0       277         0
 2   35  DISK   17         0        69         0        0       322         0
 2   36  DISK   17         0         2         0        0       280         0
 2   37  DISK   17         0        79         0        0     83221         0
 2   38  DISK   17         0         2         0        0       399         0
 2   39  DISK   17         0         2         0        0       280         0
 2   40  DISK   17         0         5         0        0       481         0
 2   41  DISK   17         0         0         0        0       277         0
 2   42  DISK   17         0         2         0        0        37         0
 2   43  DISK   17         0         0         0        0       156         0
 2   44  SES    17         0         0         0        0         0         0
 2  112  DISK   17         0        46         0        0      3175         0
 2  113  DISK   17         0         6         0        0       415         0
 2  114  DISK   17         0         3         0        0       404         0
 2  115  DISK   17         0         0         0        0       256         0
 2  116  DISK   17         0         0         0        0       280         0
 2  117  DISK   17         0        70         0        0     83299         0
 2  118  DISK   17         0         6         0        0       763         0
 2  119  DISK   17         0         0         0        0        39         0
 2  120  DISK   17         0         2         0        0       158         0
 2  121  DISK   17         0         2         0        0       401         0
 2  122  DISK   17         0         2         0        0       281         0
 2  123  DISK   17         0        42         0        0      3416         0
 2  124  SES    17         0         0         0        0         0         0
 2   14  RAID   17         0         0         0        0         0         0
 2   15  RAID   17         0         0         0        0         0         0
and diag data for channel 3 shows:-
sccli: selected device /dev/rdsk/c5t600C0FF00000000007FA9366227CA300d0s2 [SUN StorEdge 3510 SN#07FA93]
CH   ID  TYPE  LIP  LinkFail  LossOfSy  LossOfSi  PrimErr  InvalTxW  InvalCRC
-----------------------------------------------------------------------------
 3    0  DISK  180         0         7         0        0    113872         4
 3    1  DISK  180         0        13         0        0    116989        17
 3    2  DISK  180         0        11         0        0    113524         0
 3    3  DISK  180         0         7         0        0    114687         5
 3    4  DISK  180         0         8         0        0    114500        12
 3    5  DISK  180         0         9         0        0    114240         2
 3    6  DISK  180         0         4         0        0    111251         0
 3    7  DISK  180         0         6         0        0    109227        35
 3    8  DISK  180         0        10         0        0    115010        31
 3    9  DISK  180         0         8         0        0    113054        49
 3   10  DISK  180         0         6         0        0    111574        59
 3   11  DISK  180         0        74         0        0    113619         0
 3   12  SES   180         0         0         0        4         0         0
 3   32  DISK  180         0         5         0        0    110749       141
 3   33  DISK  180         0         5         0        0    108484         0
 3   34  DISK  180         0       241         0        0    124826       213
 3   35  DISK  180         0        75         0        0   1848807       359
 3   36  DISK  180         0         6         0        0    106782         0
 3   37  DISK  180         0        76         0        0   1848374         0
 3   38  DISK  180         0         4         0        0    108279         0
 3   39  DISK  180         0         4         0        0    106821         0
 3   40  DISK  180         0         7         0        0    109237         1
 3   41  DISK  180        49    104989         0        0    610943         3
 3   42  DISK  180         0         4         0        0    106803         0
 3   43  DISK  180         0         2         0        0    106611         0
 3   44  SES   180         0         0         0        0         0         0
 3  112  DISK  180         0        45         0        0    109901         0
 3  113  DISK  180         0        76         0        0    110926         0
 3  114  DISK  180         0         6         0        0    108043         0
 3  115  DISK  180         0        10         0        0    106042         0
 3  116  DISK  180         0         3         0        0    108666         0
 3  117  DISK  180         0        82         0        0   1845672         0
 3  118  DISK  180         0        14         0        0    113513         3
 3  119  DISK  180         0         6         0        0    113617         3
 3  120  DISK  180         0         6         0        0    107927         0
 3  121  DISK  180         0         5         0        0    109396         0
 3  122  DISK  180         0         9         0        0    111031         7
 3  123  DISK  180        37    296508         0        0   5456006         0
 3  124  SES   180         0         0         0        0         0         0
 3   14  RAID  180         0         0         0        0         3         0
 3   15  RAID  180         0         0         0        0         3         0
From the above, channel 3 (loop B) is clearly unstable. Almost every device on it reports more than 100,000 invalid transmission words (InvalTxW), compared with a few hundred on most channel 2 devices; 17 devices also report CRC errors inside frames, while none do on channel 2; and its LIP count is 180, against 17 on channel 2.
In this case, schedule downtime and isolate the faulty component on the loop. How to isolate it is beyond the scope of this document.
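The comparison above can also be made mechanically. The sketch below assumes the "diag error" output has been saved as text in the layout shown (CH, ID and TYPE followed by seven numeric counters). The function names parse_diag and summarize, and the summary figures chosen (maximum LIP count, median InvalTxW, number of devices with CRC errors), are illustrative assumptions, not sccli features.

```python
from statistics import median

COUNTERS = ("LIP", "LinkFail", "LossOfSy", "LossOfSi", "PrimErr", "InvalTxW", "InvalCRC")


def parse_diag(text: str) -> list:
    """Turn 'diag error' table rows into dicts; headers and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have CH, ID, TYPE and seven integer counters.
        if len(parts) != 10 or not parts[0].isdigit():
            continue
        row = {"CH": int(parts[0]), "ID": int(parts[1]), "TYPE": parts[2]}
        row.update(zip(COUNTERS, map(int, parts[3:])))
        rows.append(row)
    return rows


def summarize(rows: list) -> dict:
    """Per-channel figures used to compare one loop against the other."""
    by_channel = {}
    for row in rows:
        by_channel.setdefault(row["CH"], []).append(row)
    return {
        ch: {
            "max_lip": max(r["LIP"] for r in devs),
            "median_invaltxw": median(r["InvalTxW"] for r in devs),
            "devices_with_crc": sum(1 for r in devs if r["InvalCRC"] > 0),
        }
        for ch, devs in by_channel.items()
    }


# A few rows copied from the channel 2 and channel 3 output above.
sample = """\
CH ID TYPE LIP LinkFail LossOfSy LossOfSi PrimErr InvalTxW InvalCRC
-------------------------------------------------------------------------
2 0 DISK 17 0 0 0 0 159 0
2 1 DISK 17 0 41 0 0 2573 0
2 2 DISK 17 0 2 0 0 450 0
3 0 DISK 180 0 7 0 0 113872 4
3 1 DISK 180 0 13 0 0 116989 17
3 2 DISK 180 0 11 0 0 113524 0
"""
for ch, stats in sorted(summarize(parse_diag(sample)).items()):
    print(ch, stats)
```

Run over the full tables, the same summary shows the contrast described above: a LIP count of 180 against 17, a median InvalTxW above 100,000 against a few hundred, and 17 devices with CRC errors against none.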
Notes:-
1. The statistics are stored on the controller, drives and SES devices, and can only be cleared by a power cycle. Resetting the array does not reset the counters.
2. InvalTxW, the 'invalid word' counter, counts invalid transmission words; like the other LESB counters, it is 4 bytes (32 bits) wide. This counter is updated only after the synchronization state has been acquired. The receiving FC node (which could be an HBA or a controller) checks each received transmission word to determine whether it is valid. If it is not, the node increments the counter and ignores the sequence. Keep in mind that this is a receiver function, maintained independently at each end of the link.
When InvalTxW is very high, also look at the other counters (LinkFail, LossOfSy and LossOfSi). If they are also high, the link itself is probably flaky (as in the example above). If, however, InvalTxW is high but the other counters are not, the transmitting node is more likely at fault; check the HBA or the controller.
3. All FC devices (arrays) maintain LESB information, but the commands and utilities used to retrieve it differ between arrays. This document does not cover how to retrieve the LESB; it covers how to interpret it, and the interpretation is the same for all arrays.
For example, on the T3, T3+ and 6120, the LESB data is available through the .disk linkstat command. Sample output:
.disk linkstat u1d1-9 path 1
DISK  LINKFAIL  LOSSSYNC  LOSSSIG  PROTOERR  INVTXWORD  INVCRC
--------------------------------------------------------------
u1d1         1         6        0         0         37       0
u1d2         1         4        0         0         37       0
u1d3         1        15        0         0          8       0
u1d4         1         3        0         0         37       0
u1d5         1         5        0         0         37       0
u1d6         1        17        0         0         27       0
u1d7         1         5        0         0         37       0
u1d8         1         4        0         0         37       0
u1d9         1         7        0         0         37       0
The columns are the same as in the SE3510/3511 output (only the LIP count is missing), so this document can be used to interpret the LESB data from any FC storage array.
Another example is the Sun Storage A5x00 array, a JBOD (just a bunch of disks) of FC disks. To get the LESB data for this JBOD, run the following command from the host:
luxadm -e rdls pathname
Example:-
luxadm -e rdls /devices/sbus@2,0-SUNW,socal@2,0-sf@0,0:devctl.out
This returns output like the following for all the FC disks in the JBOD array:
al_pa  lnk fail  sync loss  signal loss  sequence err  invalid word  CRC
1        720896          0            0             0             0    0
d2            0          2           11             0             0    0
ef            0         79            0             0             2    0
e8            0          0            0             0             5    0
e2            0          0            0             0             4    0
e0            0          0            0             0             3    0
dc            0          1            0             0           117    0
d5            0          0            0             0             3    0
2        720896          1            0             0           117    0
b5            0          2           13             0             0    0
cd            0         95            0             0             3    0
ca            0          0            0             0             6    0
c7            0          0            0             0             6    0
c6            0          0            0             0             6    0
ba            0          0            0             0             6    0
This is again the same LESB data, and it can be interpreted in the same way.
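Notes 1 to 3 suggest a simple way to work with these counters: because they accumulate until a power cycle and are 4 bytes wide, compare two snapshots taken some time apart rather than reading absolute values, and map each tool's column names onto the same fields before applying the rule of thumb from note 2. The sketch below does that, assuming a counter that overflows wraps back to zero. The column-name mapping is inferred from the column order of the three sample outputs above; the function names and the thresholds HIGH_WORDS and HIGH_LINK are hypothetical placeholders to be tuned for your environment.

```python
# Tool-specific column names (sccli, T3 .disk linkstat, luxadm -e rdls)
# mapped onto one set of field names. Mapping inferred from column order.
ALIASES = {
    "LinkFail": "link_fail", "LINKFAIL": "link_fail", "lnk fail": "link_fail",
    "LossOfSy": "loss_sync", "LOSSSYNC": "loss_sync", "sync loss": "loss_sync",
    "LossOfSi": "loss_signal", "LOSSSIG": "loss_signal", "signal loss": "loss_signal",
    "PrimErr": "prim_err", "PROTOERR": "prim_err", "sequence err": "prim_err",
    "InvalTxW": "inval_txw", "INVTXWORD": "inval_txw", "invalid word": "inval_txw",
    "InvalCRC": "inval_crc", "INVCRC": "inval_crc", "CRC": "inval_crc",
}

WRAP = 2 ** 32       # each LESB counter is 4 bytes wide
HIGH_WORDS = 1000    # hypothetical "very high" InvalTxW growth per interval
HIGH_LINK = 10       # hypothetical "high" growth for link/sync/signal counters


def normalize(snapshot: dict) -> dict:
    """Rename tool-specific columns; drop anything that is not an LESB counter."""
    return {ALIASES[k]: v for k, v in snapshot.items() if k in ALIASES}


def delta(before: dict, after: dict) -> dict:
    """Growth between two snapshots, assuming counters wrap at 2**32.

    A power cycle between the snapshots also resets the counters, and
    this sketch cannot tell that apart from a wrap-around.
    """
    b, a = normalize(before), normalize(after)
    return {k: (a[k] - b[k]) % WRAP for k in a if k in b}


def triage(d: dict) -> str:
    """Rule of thumb from note 2; a starting point, not a diagnosis."""
    if d.get("inval_txw", 0) < HIGH_WORDS:
        return "no significant invalid-word growth"
    if any(d.get(k, 0) >= HIGH_LINK for k in ("link_fail", "loss_sync", "loss_signal")):
        return "flaky link: isolate the failing loop component"
    return "check the transmitting node (HBA or controller)"


# Hypothetical (made-up) T3 snapshots of one drive, taken some time apart.
t0 = {"LINKFAIL": 1, "LOSSSYNC": 4, "INVTXWORD": 2**32 - 10, "INVCRC": 0}
t1 = {"LINKFAIL": 1, "LOSSSYNC": 4, "INVTXWORD": 4990, "INVCRC": 0}
d = delta(t0, t1)
print(d["inval_txw"])  # 5000: the counter wrapped past 2**32
print(triage(d))       # check the transmitting node (HBA or controller)
```

Columns that are not LESB counters (CH, ID, TYPE, LIP, DISK, al_pa) are simply dropped by normalize, so whole parsed rows can be passed in unchanged.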


Product
Sun StorageTek 3511 SATA Array
Sun StorageTek 3510 FC Array
Sun StorageTek T3 Array
Sun StorageTek A5200 Array
Sun StorageTek A5000 Array
Sun StorageTek A5100 Array
Sun StorageTek T3+/6X20 Controller Firmware 3.1
Sun StorageTek T3+ Array Controller FW 2.1
Sun StorageTek T3 Multi-Platform 1.1
Sun StorageTek T3+ Array
Sun StorageTek 6120 Array

Internal Comments
Sun StorEdge[TM] 3510/3511 Arrays: Understanding Link Error Status Block (LESB) counters.

ELS, LESB, CRC, 3510, 3511, InvalTxW, LIP, LossOfSy, LossOfSi, PrimErr, T3, T3+, 6120, A5200, A5100
Previously Published As
81767

Change History
Date: 2010
User Name: Vickie Williams
Comment: *** Restored Published Content *** SSH AUDIT
Version: 0

User Name: 7058
Action: Update Started
Comment: SSH AUDIT
Version: 0
Date: 2005-06-03
User Name: 25440
Action: Approved
Comment: Thanks for the clarification. Publishing.
Version: 4


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.