Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1008081.1
Update Date:2009-09-23
Keywords:

Solution Type  Problem Resolution Sure

Solution  1008081.1 :   Single NAS Head or Both NAS Heads of Cluster Crash when Mounting a Volume  


Related Items
  • Sun Storage 5310 NAS Appliance
  • Sun Storage 5320 NAS Appliance
  • Sun Storage 5310 NAS Gateway System
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Network Attached Storage

Previously Published As
211124


Symptoms
After the NAS head is rebooted, the operator notices that a volume did not mount at boot. The operator then attempts to force mount the volume. This causes the NAS head (or, in a cluster configuration, both NAS heads) to crash.


Resolution
Resolution Steps

The crash is caused by a problem that occurs when the NAS head tries to delete expired volume checkpoints. This issue has been reported as CR 6477728. The root cause is still under investigation.

Currently, the only workaround is to discard ALL existing checkpoints on the
affected volume; there is no way to remove only the problem checkpoint.

Begin by listing the checkpoints on the affected volume and checking whether
any checkpoint has a status of 'DelPend'. That checkpoint is the one causing the NAS head(s) to crash when the volume is mounted.

The recovery steps are as follows:
-- mount the affected volume read-only
-- run fsck on the volume, repeating until it reports no errors, and allow fsck to return the volume to service
-- once the volume is back in service, remove ALL checkpoints on it
-- adjust the checkpoint schedule so that it is less aggressive

EXAMPLE:
In the following example, /vol1 is the volume that causes the NAS head to
crash when it is mounted. It is used to illustrate the recovery steps.

Begin by opening a command-line session on the NAS head.

1. List checkpoints on this volume.

nas2> chkpntls /vol1
Pseudo-Volume  Num
Volume        Enable  Enab Vis Man   Chkp  Disk Used
/vol1         Yes     Yes  Yes No    42    1.829G
 CPID     Created                  Status   Checkpoint
00000C00 Wed Sep 27 03:00:00 2006 Active   20060927-030000,7d
00000C01 Thu Sep 28 03:00:00 2006 Active   20060928-030000,7d
00000C02 Fri Sep 29 03:00:00 2006 Active   20060929-030000,7d
00000E03 Sun Oct  1 04:00:00 2006 Active   20061001-040000,1d12h
00000C04 Sat Sep 30 04:00:00 2006 Active   20060930-040000,1d12h
00000C05 Sun Oct  1 05:00:00 2006 Active   20061001-050000,1d12h
00000A06 Mon Sep 25 03:00:00 2006 Active   20060925-030000,7d
00000A07 Tue Sep 26 03:00:00 2006 Active   20060926-030000,7d
00000209 Fri Sep 29 17:00:00 2006 DelPend  20060929-170000,1d12h <---PROBLEM
0000020A Fri Sep 29 18:00:00 2006 Active   20060929-180000,7d
0000020B Fri Sep 29 19:00:00 2006 Active   20060929-190000,1d12h
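
If the chkpntls output has been captured as text (for example, copied from the session), the DelPend check can be done mechanically. The Python sketch below is an illustration only: the row layout (8-digit hex CPID, timestamp, status, checkpoint name) is inferred from the sample listing above, and the function name find_delpend is hypothetical, not part of the NAS software.

```python
import re

# One checkpoint row: CPID (8 hex digits), creation timestamp, status, name.
# ASSUMPTION: layout inferred from the sample chkpntls output in this article.
ROW = re.compile(
    r"^\s*(?P<cpid>[0-9A-Fa-f]{8})\s+"
    r"(?P<created>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<status>\S+)\s+"
    r"(?P<name>\S+)"
)


def find_delpend(listing: str) -> list[tuple[str, str]]:
    """Return (CPID, checkpoint name) for every row whose status is DelPend.

    Header lines (Volume, CPID, ...) do not match ROW and are skipped.
    """
    hits = []
    for line in listing.splitlines():
        m = ROW.match(line)
        if m and m.group("status") == "DelPend":
            hits.append((m.group("cpid"), m.group("name")))
    return hits


if __name__ == "__main__":
    sample = """\
 CPID     Created                  Status   Checkpoint
00000C00 Wed Sep 27 03:00:00 2006 Active   20060927-030000,7d
00000209 Fri Sep 29 17:00:00 2006 DelPend  20060929-170000,1d12h
0000020A Fri Sep 29 18:00:00 2006 Active   20060929-180000,7d
"""
    print(find_delpend(sample))  # [('00000209', '20060929-170000,1d12h')]
```

A non-empty result indicates the volume is affected by this problem and the recovery steps below apply.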

2. Force mount the volume with the read-only (ro) option.

nas2> mount -f -o ro /vol1
/vol1: mount processed, see log for details

3. Initiate an fsck on this volume.

nas2> fsck /vol1
Should repairs be needed, do you want them made?
By answering yes, required repairs will be made.
By answering no, the volume will only be checked
and no repairs will be made. By leaving blank,
you will be asked to decide only if a repair is
required.
Make required repairs? yes
sfs2ck vol1: Pass 1 - page and node allocation maps
sfs2ck vol1: Pass 2 - directories and reference counts
sfs2ck vol1: no errors
/vol1 is currently read-only.
Return it to normal service ? [no]yes
Aligning journal, stand by...
/vol1 now in service
Elapsed time: 0 minutes 47 seconds

4. Run fsck repeatedly until it reports 'no errors' and comes back
clean. This may take two or more fsck runs.

nas2> fsck /vol1
Should repairs be needed, do you want them made?
By answering yes, required repairs will be made.
By answering no, the volume will only be checked
and no repairs will be made. By leaving blank,
you will be asked to decide only if a repair is
required.
Make required repairs? yes
sfs2ck vol1: Pass 1 - page and node allocation maps
sfs2ck vol1: Pass 2 - directories and reference counts
sfs2ck vol1: no errors
Elapsed time: 0 minutes 3 seconds
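
The stopping rule for repeated fsck runs can be expressed as a check on each run's captured output. This Python sketch is hypothetical: it assumes, based only on the sample transcripts above, that a clean run contains the summary line "sfs2ck <volume>: no errors" (volume name without the leading slash), and that any run lacking that line should be followed by another fsck.

```python
import re


def fsck_is_clean(transcript: str, volume: str) -> bool:
    """Return True if the transcript has the 'no errors' summary for volume.

    ASSUMPTION: summary line format taken from the sample output, e.g.
    'sfs2ck vol1: no errors'.
    """
    pattern = re.compile(
        rf"^sfs2ck {re.escape(volume)}: no errors\s*$", re.MULTILINE
    )
    return bool(pattern.search(transcript))


if __name__ == "__main__":
    run = """\
sfs2ck vol1: Pass 1 - page and node allocation maps
sfs2ck vol1: Pass 2 - directories and reference counts
sfs2ck vol1: no errors
Elapsed time: 0 minutes 3 seconds
"""
    print(fsck_is_clean(run, "vol1"))  # True
```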

5. When fsck has completed successfully and the volume is mounted,
abort (delete) all checkpoints on the volume.

nas2> chkpntabort /vol1
All checkpoints on /vol1 will be
deleted and can not be recovered.
You must promptly run the file volume
check-and-repair (fsck) procedure.
Do you really want to abort checkpoints? ? [no]yes
/vol1 checkpoints aborted

6. Verify that the checkpoints have been deleted from the volume.

nas2> chkpntls /vol1
Pseudo-Volume  Num
Volume        Enable  Enab Vis Man   Chkp  Disk Used
/newvol       No
     No checkpoints

The mount/crash problem should be resolved at this point.

7. Review the checkpoint schedule and change it so that it creates
fewer checkpoints than originally configured.



Product
Sun StorageTek 5310 NAS Gateway System
Sun StorageTek 5310 NAS Appliance
Sun StorageTek 5320

se53x0, NAS, expired checkpoint, crash, volume, DelPend, 6477728
Previously Published As
87091

Change History
Date: 2006-10-12
User Name: 7058
Action: Approved
Comment: Adjusted format so that preformatted text is offset.
Minor grammar/punctuation fixes.
Spell ck OK.
Changed audience from free to contract per fee vs free guidelines :
Product_uuid
8a8b6eeb-092e-11da-99bc-080020a9ed93|Sun StorageTek 5310 NAS Gateway System
63654ce5-f88d-11d8-ab63-080020a9ed93|Sun StorageTek 5310 NAS Appliance
9d23ea64-a8be-11da-85b4-080020a9ed93|Sun StorageTek 5320

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.