Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1013668.1
Update Date: 2011-05-25
Keywords:

Solution Type  Problem Resolution Sure

Solution  1013668.1 :   SL8500 - HBT Drive Communication Errors  


Related Items
  • Sun StorageTek SL8500 Modular Library System
Related Categories
  • GCS>Sun Microsystems>Storage - Tape>Libraries - SL-Series

PreviouslyPublishedAs
219289


Oracle Confidential (PARTNER). Do not distribute to customers
Reason: Confidential for Partners and Oracle Support personnel

Applies to:

Sun StorageTek SL8500 Modular Library System
All Platforms
Checked for relevance on 25-May-2011.

Symptoms

HBT card hang
Event error 1301
Event error 3954
Drive communications error

Changes

NA

Cause

NA

Solution

Resolution
If drive communication errors are reported, the recommended steps are:

 

-Check the status of the drive via the system detail screens in SLC.

Example of normal status:

Health State : ok
Device State : Ready
Access State : online
Drive State : empty
Drive needs cleaning : false
Host Activity : false
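
The comparison against the normal status can be sketched in a few lines of Python. This is only an illustration: it assumes the fields have been copied by hand from the SLC screen as "name : value" text, and the function names (parse_status, abnormal_fields) and the choice of which fields to compare are hypothetical, not part of SLC.

```python
# Assumption: only these three fields are treated as indicators of drive
# health. Drive State, cleaning and host activity vary in normal operation.
NORMAL_STATUS = {
    "Health State": "ok",
    "Device State": "Ready",
    "Access State": "online",
}


def parse_status(text):
    """Parse 'name : value' lines into a dict; lines without a colon are skipped."""
    status = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            status[name.strip()] = value.strip()
    return status


def abnormal_fields(status, expected=NORMAL_STATUS):
    """Return {field: (expected, actual)} for fields that differ or are missing."""
    diffs = {}
    for field, want in expected.items():
        got = status.get(field)
        if got is None or got.lower() != want.lower():
            diffs[field] = (want, got)
    return diffs
```

An empty result from abnormal_fields means the copied status matches the normal example for the compared fields; any other result names the fields to look at first.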

-Analyze the library event log for the sequence of errors shown in the log examples below, and check whether the sequence occurs on multiple drives. If the error occurs on just a single drive, a reboot can be triggered via the SLC diagnostic menu screens.

If drive communication errors exist on all drives and the HBT card appears not to be responding, contact Level 2 Tape Hardware Support to pull the engineering logs from the HBT card for follow up.
If this is not possible, reset the HBT card via the reset switch behind the HBT faceplate. With the card still installed, carefully use a small pointed object, such as a ballpoint pen, to press the reset switch.

Note: The HBT card resets in approximately 60 seconds, and the reset does not dismount any drives that are currently loaded.

It should not be necessary to reset the HBC card. If the HBC is reset, it generates a full library reset, including all the HandBots, which would be disruptive to host operations.
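
The escalation path described above can be summarized as a small decision function. This is a sketch for illustration only: the function name, its parameters and the returned strings are hypothetical, and the case of several (but not all) drives failing is not spelled out in this solution, so it is mapped to further analysis here as an assumption.

```python
def recommend_action(failing_drives, total_drives, hbt_responding, level2_reachable=True):
    """Map observed symptoms to the action suggested in this solution."""
    if failing_drives <= 0:
        return "no action: no drive communication errors reported"
    if failing_drives == 1:
        # Single-drive case: a reboot from SLC is the first step.
        return "reboot via the SLC diagnostic menu screens"
    if failing_drives == total_drives and not hbt_responding:
        if level2_reachable:
            return "contact Level 2 Tape Hardware Support to pull HBT engineering logs"
        # Fallback only; the HBC is never reset, since that resets the whole library.
        return "reset the HBT card via the reset switch (do not reset the HBC)"
    # Assumption: anything in between needs more log analysis before acting.
    return "continue event log analysis; check for other causes"
```

For example, recommend_action(64, 64, hbt_responding=False) yields the Level 2 escalation, and the same call with level2_reachable=False yields the HBT reset.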

Log examples follow. Engineering is evaluating an automatic HBT reset, driven from the HBC, for when an HBT hang is detected.

2006-08-17T20:17:45.808, 1.2.2.1.2, root, hli1, queryDrive18566301, error, 1301, "Device, response time-out", request=getStateInternal
1,2,2,1,2
, HBC,

2006-08-17T20:17:45.913, 1.0.0.1.0, root, hli1, queryDrive18566301, error, 3954, "Failure, in send output: ", Data=getStateInternal
1,2,-2,1,3
* Exception:=java.net.SocketException: Operation failed, HBC,

2006-08-17T20:17:58.738, 1.1.-2.1.4, root, hli1, queryDrive18564201, error, 1301, "Device, response time-out", request=getStateInternal
1,1,-2,1,4
, HBC,

2006-08-17T20:17:58.830, 1.0.0.1.0, root, hli1, queryDrive18564201, error, 3954, "Failure, in send output: ", Data=getStateInternal
1,1,-1,1,1
* Exception:=java.net.SocketException: Operation failed, HBC,
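
When a saved copy of the library event log is available, the 1301/3954 sequence above can be searched for with a short script. The following is a minimal sketch, not a supported tool. It assumes the comma-separated field layout seen in the entries above, and it assumes that the second field of the 1301 entry is the address of the affected drive; both are inferences from the examples, and the function name is hypothetical.

```python
import re

# Field layout inferred from the example entries (an assumption):
# time, address, user, interface, request id, severity, code, "text", ...
ENTRY_RE = re.compile(
    r'^(?P<time>\d{4}-\d{2}-\d{2}T[\d:.]+),\s*(?P<address>[\d.\-]+),\s*'
    r'(?P<user>\w+),\s*(?P<iface>\w+),\s*(?P<request>\w+),\s*'
    r'(?P<severity>\w+),\s*(?P<code>\d+),\s*"(?P<text>[^"]*)"'
)

SAMPLE_LOG = '''
2006-08-17T20:17:45.808, 1.2.2.1.2, root, hli1, queryDrive18566301, error, 1301, "Device, response time-out", request=getStateInternal
1,2,2,1,2
, HBC,
2006-08-17T20:17:45.913, 1.0.0.1.0, root, hli1, queryDrive18566301, error, 3954, "Failure, in send output: ", Data=getStateInternal
2006-08-17T20:17:58.738, 1.1.-2.1.4, root, hli1, queryDrive18564201, error, 1301, "Device, response time-out", request=getStateInternal
2006-08-17T20:17:58.830, 1.0.0.1.0, root, hli1, queryDrive18564201, error, 3954, "Failure, in send output: ", Data=getStateInternal
'''


def drives_with_comm_errors(lines):
    """Return addresses whose request logged a 1301 followed by a 3954."""
    timed_out = {}    # request id -> address taken from the 1301 entry
    affected = set()
    for line in lines:
        m = ENTRY_RE.match(line.strip())
        if not m:
            continue  # continuation lines and blank lines are ignored
        if m["code"] == "1301":
            timed_out[m["request"]] = m["address"]
        elif m["code"] == "3954" and m["request"] in timed_out:
            affected.add(timed_out[m["request"]])
    return affected
```

Calling drives_with_comm_errors(SAMPLE_LOG.splitlines()) returns the set of affected addresses; more than one address suggests the multi-drive pattern rather than a single-drive fault.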

 

2006-08-17 18:58:06 ACSLH[0]:
2378 N Co_ProcessResponses.C 1 1308
ACS: 1; LMU error: Co_4400:st_parse_error:
Error: 1001 - Drive error: Drive is not communicating
Request: Dismount, forced rewind and unload
Volser: I01240, media domain: L, media type: 1
Source: Drive 1,3,1,13
Destination: Cell 1,3,14,18,0

2006-08-17 18:58:06 ACSSA[0]:
2468 E sa_demux.c 1 278
drive 1, 3, 1,13 reported a Unit Attention.

2006-08-17 18:58:06 DISMOUNT[0]:
546 N cl_log_lh_er.c 1 99
dm_lh_lib_fail: LH error type = LH_ERR_TRANSPORT_FAILURE
.

2006-08-17 20:30:44 ACSSA[0]:
1431 N sa_demux.c 1 278
drive 1, 3, 1, 5: Library error, Transport failure
.

2006-08-17 20:33:12 ACSMT[0]:
429 N mt_timeout.c 1 135
mt_timeout: mid:42023 Mount timeout after 4920 seconds.

2006-08-17 20:33:12 ACSSA[0]:
1435 W sa_demux.c 1 278
Unable to handle unusual status or event. See related messages.

2006-08-17 20:36:13 ACSMON process[0]:
126 N mon_drv_examine.c 1 506
mon_lsm_examine:st_req_error: Timed out waiting for message
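
To correlate the ACSLS side with the library log, the ACSLS entries above can be grouped by their header line and scanned for drive identifiers. This is a rough sketch under stated assumptions: the header pattern (timestamp, component name, bracketed number, colon) and the "drive a, b, c, d" pattern are inferred from the example entries only, and the function names are hypothetical.

```python
import re

# Header such as "2006-08-17 18:58:06 ACSLH[0]:" (pattern inferred from the examples).
HEADER_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.+)\[\d+\]:$")
# Drive identifier such as "Drive 1,3,1,13" or "drive 1, 3, 1, 5".
DRIVE_RE = re.compile(r"[Dd]rive\s+(\d+),\s*(\d+),\s*(\d+),\s*(\d+)")


def split_entries(text):
    """Group log text into entries, each starting at a header line."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = HEADER_RE.match(line)
        if m:
            entries.append({"time": m.group(1), "component": m.group(2), "lines": []})
        elif entries:
            entries[-1]["lines"].append(line)
    return entries


def drives_mentioned(entries):
    """Return the set of drive identifiers, normalized to 'a,b,c,d'."""
    drives = set()
    for entry in entries:
        for line in entry["lines"]:
            for m in DRIVE_RE.finditer(line):
                drives.add(",".join(m.groups()))
    return drives
```

Several different drives appearing within a short time window is consistent with the multi-drive pattern described in this solution, but it does not confirm an HBT hang on its own.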



Additional Information
Other problems can also cause drive communication errors, e.g. a drive that is powered off or a drive SNO.

The HBT drive communication problem is under investigation by Library Engineering and exists in all current library code levels through FRS_3.08.
One customer site that experienced the HBT hang also experienced problems varying the ACS online after bouncing ACSLS. ACSLS requires status from at least one CAP and one Drive before the ACS will vary online.
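
The vary-online precondition stated above can be written as a one-line check. This is a deliberately simplified illustration, not ACSLS logic: the function name and its two counters are hypothetical.

```python
def acs_can_vary_online(caps_reporting, drives_reporting):
    """True when at least one CAP and at least one drive have returned status."""
    return caps_reporting >= 1 and drives_reporting >= 1
```

With the HBT hung and no drive returning status, acs_can_vary_online(1, 0) is False, which is consistent with the behavior seen at the customer site.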


3.08, HBT, SL8500, Code
Previously Published As STKKB78617

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.