Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1017847.1
Update Date:2011-04-15
Keywords:

Solution Type  Troubleshooting Sure

Solution  1017847.1 :   Backup Performance Troubleshooting Guidelines  


Related Items
  • Sun SPARCstorage DLT7000 Tape Drive
  • Sun Storage L7 Tape Autoloader
  • Sun Storage L8 Tape Autoloader
  • Sun Storage L1000 Tape Library
  • Sun Storage L1800 Tape Library
  • Sun StorageTek L40 Tape Library
  • Sun StorageTek L5500 Tape Library
  • Solstice Backup (SBU) Software
  • Sun Storage DDS-3 Tape Drive
  • Sun StorageTek L700 Tape Library
  • Sun Storage L100 Tape Library
  • Sun StorageTek L180 Tape Library
  • Sun StorageTek L20 Tape Library
  • Sun Storage L280 Tape Library
  • Sun Storage DDS-4 Tape Drive
  • Sun Storage L11000 Tape Library
  • Sun Storage L500 Tape Library
  • Sun Storage L25 Tape Library
Related Categories
  • GCS>Sun Microsystems>Storage - Tape>Drives - Misc

Previously Published As
229085


Applies to:

Sun Storage DDS-3 Tape Drive
Sun Storage DDS-4 Tape Drive
Sun SPARCstorage DLT7000 Tape Drive
Sun Storage L100 Tape Library
Sun Storage L1000 Tape Library
All Platforms
Checked for relevance on 15-Apr-2011.

Purpose

Customers frequently report performance problems with their backups.

The backup software involved may be Solstice[TM] Backup, Sun StorEdge[TM] Enterprise Backup Software, or Veritas NetBackup.

Often several wrong actions are taken before the right one is found. This document provides a structured guideline for determining where the problem lies.

Last Review Date

April 15, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Steps to Follow
How does it work
First of all, we need answers to the following questions:
  • What is the exact problem?
  • What other problems are there?
  • Where is the problem located?
  • Where exactly on the affected object is it located? (think of devices, groups, pools, resources, specific tapes)
  • Where else is it located?
  • When did the problem first occur?
  • When has it happened again since then?
  • What trend do you see?
  • Were there any changes recently? When? What?

Try to make every answer as specific as possible.

Distinguish relevant layers

  • Tape/Library Hardware
  • Operating System
  • Device Drivers
  • Backup Software
  • Network Connections

Tape/Library Hardware

  • What can we tell from the LED status?
  • Are there any messages in the errorlog/eventlog on the Front Panel Module?
  • Is the library fully available? If not: What is functioning well and what is not?
  • Where does the problem occur? On certain drives? On certain host bus adapters?
  • Is all firmware up to date, and do the firmware revisions of the various sub-components match each other?
  • When possible:
    • Does a power cycle give any relief? What are the results?
    • Does the power cycle finish in the ready/online state without problems?

Operating System

  • Are there any signs of SCSI problems in /var/adm/messages? What do those errors tell us? (Transport errors? Media errors?)
  • What other problems do we see in /var/adm/messages?
  • Is the operating system sufficiently up to date to support the library and the backup software?
  • Were there any changes recently? If so, what were those changes?
  • Does iostat -En report errors on the tape drives?
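
The last check can be partly automated. The following is a minimal sketch, not a supported tool: it filters `iostat -En` output for tape devices with non-zero error counters. It assumes the summary line has the form `rmt/0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0` and that a POSIX awk (nawk on older Solaris releases) is available; the function name `tape_errors` is hypothetical.

```shell
# Hypothetical helper: read `iostat -En` output on stdin and print the
# tape devices (rmt/N) whose soft, hard or transport error count is non-zero.
# Assumed summary line layout (verify on your system):
#   rmt/0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
tape_errors() {
    awk '$1 ~ /^rmt\// && $2 == "Soft" {
        soft = $4; hard = $7; transport = $10
        if (soft + hard + transport > 0)
            printf "%s soft=%d hard=%d transport=%d\n", $1, soft, hard, transport
    }'
}
```

Typical use would be `iostat -En | tape_errors`; no output means no tape device reported errors.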

Device Drivers

  • Is the st driver at a patch level that supports the hardware?
  • Were there any changes to st.conf? Usually a default st.conf is sufficient.
  • Are there any other drivers in the device-path that require patching?

Backup Software

  • Were there any changes to the backup software recently? If so, what changes?
  • Was there an increase in data being backed up recently?
  • Were any clients added recently?

Network Connections

  • Is the network able to transport the data fast enough?
  • Were there any changes recently?
  • Are the switch settings correct? (half/full duplex, auto-negotiation)
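
As a rough aid for the first question, a link's theoretical ceiling in MB/s is its rate in Mbit/s divided by 8. The sketch below compares that ceiling with the rate a drive needs. The function name `link_can_feed` and its arguments are hypothetical, and protocol overhead is ignored, so the result is only an upper bound.

```shell
# Hypothetical helper: can a link of $1 Mbit/s theoretically feed a drive
# that needs $2 MB/s? Ignores protocol overhead, so treat "sufficient"
# as a necessary condition only, not as proof.
link_can_feed() {
    awk -v link="$1" -v drive="$2" 'BEGIN {
        ceiling = link / 8            # Mbit/s -> MB/s
        verdict = (ceiling >= drive) ? "sufficient" : "too slow"
        printf "%.1f MB/s ceiling: %s\n", ceiling, verdict
    }'
}
```

For example, `link_can_feed 100 5` asks whether a 100 Mbit/s link could in theory keep a drive writing at 5 MB/s busy.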

Eliminate known good parts

If the questions above give no clue where to look, start by eliminating known good parts:

Eliminating Hardware, OS and Drivers

  • Does a backup of simple data using simple commands work well?
    Try this (using an empty tape):
  • On one command-line:
   # tar cf /dev/rmt/0cb /var
  • On a second command-line:
   # iostat -xn 5| egrep 'dev|rmt'

This should perform well (the achievable speed depends on the device type). Remember that tar cannot deliver multiple streams. Even so, you know that (for example) a DLT7000 drive is okay if it writes at 4 - 6 MB/s. If it runs at less than 2 MB/s, something is wrong.
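
The comparison against the 2 MB/s rule of thumb can be sketched as a filter on the iostat output. This is an illustration only: the function name `flag_slow_tape` is hypothetical, the default threshold of 2048 KB/s is taken from the DLT7000 example and must be adjusted for other drive types, and the code assumes the `iostat -xn` column order `r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device`.

```shell
# Hypothetical helper: read `iostat -xn` output on stdin and report the write
# rate of each rmt device, flagging anything under a threshold in KB/s
# (default 2048, i.e. the "< 2 MB/s" rule of thumb for a DLT7000).
# Assumed columns: r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
flag_slow_tape() {
    awk -v min="${1:-2048}" 'NF == 11 && $NF ~ /^rmt\// {
        status = ($4 + 0 < min + 0) ? "SLOW" : "ok"
        printf "%s %.1f MB/s %s\n", $NF, $4 / 1024, status
    }'
}
```

It could be used as `iostat -xn 5 | flag_slow_tape` while the tar command runs. Intervals in which the drive is idle are also reported as SLOW, so only samples taken during the write are meaningful.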

  • Run the same command on different drives. Are they performing the same?
  • If the problem occurs on one drive:
    • Try to swap the cable with that of another drive (remember that the SCSI IDs may differ) and run the above commands again.
      • If the problem moves with the physical drive, the problem is in that drive.
        • Firmware?
        • Excessive wear?
        • Cleaning required?
      • If the problem stays on the same HBA (but on a different /dev/rmt/...), the HBA may be the cause; the drives are proven to be okay.
  • If the problem occurs on some drives:
    • Try to find relationships:
      • Which HBA are the drives connected to?
      • Which power supply are the drives fed from?
      • Are they of a specific type that uses specific drivers?
      • What about firmware?
      • Cleaning required?
  • If no drive performs well:
    • Find out why: Can the filesystem deliver data fast enough?
      Test it with "tar cf /dev/null /var" and monitor the disks with "iostat -xn 5"
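
The branching above (one drive, some drives, or no drive performing well) can be sketched as a small classifier. This is a hedged illustration: the function name `classify_drives`, its "<device> <KB/s>" input format, and the default threshold of 2048 KB/s are assumptions, not part of any Sun tool.

```shell
# Hypothetical helper: read "<device> <KB/s>" pairs on stdin (one per tested
# drive) and report which branch of the checklist applies. A drive counts as
# slow when its rate is below the threshold in KB/s (default 2048).
classify_drives() {
    awk -v min="${1:-2048}" '
        { total++; if ($2 + 0 < min + 0) { slow++; names = names " " $1 } }
        END {
            if (slow == 0)          print "none slow"
            else if (slow == total) print "all slow"
            else if (slow == 1)     print "one slow:" names
            else                    print "some slow:" names
        }'
}
```

The rates would come from running the same tar test against each drive in turn. "one slow" points to the cable swap test, "some slow" to the search for a common factor, and "all slow" to the filesystem read test.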

Suspect Backup Software

Let's assume we have no reason to believe there is a problem in the hardware, OS or device drivers, and backups using tar perform well. In that case the backup software is the suspect.

Now we have to find out which part of the backup software is involved:

  • Do all backups suffer?
  • Do local backups run well?
  • Do remote backups run well?
  • Do only certain groups/policies have the problem?
  • What's so specific about these backups?
  • Is the same pool of media always involved?
  • What is the age of that media?

Determine this by:

  • Starting a simple local client.
    When possible, create a local client that backs up only a small filesystem (a few hundred MB). This should back up well to all drives.
  • Expanding the situation by running a simple remote client.
    • Can the network deliver the data fast enough?
    • Expand this client by increasing the number of streams.
    • Is it still fast enough?
  • Checking whether only Application Specific Modules have the problem (RMAN, Sybase module?)

Some tips to gather the right amount of data:

  • Stop backup application
  • Set verbose
  • Rotate logfiles (so you start with an empty set)
    • application log files
    • /var/adm/messages
  • Encourage the customer not to use fancy log filters or rotation mechanisms.
    They destroy the relationships between messages that were logged within the same split second.
  • Start backup application
  • Start the savegroup or backup policy that shows the problem
  • After the problem has occurred, save the logs, run explorer (with appropriate options), gather information and examine the data.
  • Watch the backup server using the *stat commands:
    A backup server needs enough free CPU cycles to serve the drives and the (gigabit) Ethernet, and to perform index and media management commands.
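
Saving the logs from one test run can be sketched as follows. This is a minimal illustration, assuming the relevant files are readable by the user running it; the function name `snapshot_logs` and the destination layout are hypothetical, and the location of the application log files depends on the backup product and is not assumed here.

```shell
# Hypothetical helper: copy the given log files into a new timestamped
# directory under $1, so that all evidence from one run stays together.
# Prints the directory that was created; files that do not exist are skipped.
snapshot_logs() {
    dest="$1/logs-$(date +%Y%m%d-%H%M%S)"
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        [ -f "$f" ] && cp -p "$f" "$dest/"
    done
    echo "$dest"
}
```

For example, `snapshot_logs /var/tmp/case /var/adm/messages` keeps an unfiltered copy of the system log with its original timestamps; the application log files can be passed as further arguments.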


Product
VERITAS NetBackup 4.5
Sun StorageTek Enterprise Backup Software 7.1
Sun StorageTek DAT72 Blade Tray
Sun StorageTek DDS-4 Tape Drive

Internal Comments
Oracle review 11/10/09.

backup, performance, troubleshooting, tape, library, robot, sbu, ebs, networker, netbackup, nbu
Previously Published As
81363

Change History
Date: 2007-12-10
User Name: 7058
Action: Update Canceled
Comment: *** Restored Published Content *** See previous comment.
Version: 0

  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.