
Global File System 2

Chapter 5. Diagnosing and Correcting Problems with GFS2 File Systems

This chapter provides information about some common GFS2 issues and how to address them.

5.1. GFS2 File System Shows Slow Performance

You may find that your GFS2 file system shows slower performance than an ext3 file system. GFS2 performance can be affected by a number of factors, and some use cases are affected more than others. Information that addresses GFS2 performance issues is found throughout this document.

5.2. GFS2 File System Hangs and Requires Reboot of One Node

If your GFS2 file system hangs and does not return commands run against it, but rebooting one specific node returns the system to normal, this may be indicative of a locking problem or bug. Should this occur, gather the following data:
  • The gfs2 lock dump for the file system on each node:
    cat /sys/kernel/debug/gfs2/fsname/glocks >glocks.fsname.nodename
  • The DLM lock dump for the file system on each node. You can get this information with the dlm_tool command:
    dlm_tool lockdebug -sv lsname
    In this command, lsname is the lockspace name used by DLM for the file system in question. You can find this value in the output from the group_tool command.
  • The output from the sysrq -t command.
  • The contents of the /var/log/messages file.
Once you have gathered that data, you can open a ticket with Red Hat Support and provide the data you have collected.
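The data-gathering steps above can be sketched as a small helper that prints the commands for one node so that they can be reviewed before being run as root. This is a minimal sketch, not a supported tool: the function name gfs2_hang_commands and the output file names other than glocks.fsname.nodename are assumptions made for illustration, and writing t to /proc/sysrq-trigger is used here as the way to produce the sysrq -t task dump in the kernel log.

```shell
# Print the commands that gather GFS2 hang data on this node.
# Arguments: fsname (as it appears under /sys/kernel/debug/gfs2),
#            lsname (the DLM lockspace name), and an output directory.
# Nothing is executed here; the commands are only printed.
gfs2_hang_commands() {
    local fsname=$1 lsname=$2 outdir=$3
    local node
    node=$(uname -n)
    cat <<EOF
cat /sys/kernel/debug/gfs2/$fsname/glocks > $outdir/glocks.$fsname.$node
dlm_tool lockdebug -sv $lsname > $outdir/dlm.$lsname.$node
echo t > /proc/sysrq-trigger
cp /var/log/messages $outdir/messages.$node
EOF
}

# Show what would be run for placeholder names.
gfs2_hang_commands fsname lsname /tmp
```

After checking the printed commands, they can be piped to a root shell on each node, for example gfs2_hang_commands fsname lsname /tmp | sh.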

5.3. GFS2 File System Hangs and Requires Reboot of All Nodes

If your GFS2 file system hangs and does not return commands run against it, requiring that you reboot all nodes in the cluster before using it, check for the following issues.
  • You may have had a failed fence. GFS2 file systems will freeze to ensure data integrity in the event of a failed fence. Check the messages logs to see if there are any failed fences at the time of the hang. Ensure that fencing is configured correctly.
  • The GFS2 file system may have withdrawn. Check through the messages logs for the word withdraw and check for any messages and call traces from GFS2 indicating that the file system has been withdrawn. A withdraw is indicative of file system corruption, a storage failure, or a bug. Unmount the file system, update the gfs2-utils package, and execute the fsck command on the file system to return it to service. Open a support ticket with Red Hat Support. Inform them you experienced a GFS2 withdraw and provide sosreports with logs.
    For information on the GFS2 withdraw function, see Section 4.14, "The GFS2 Withdraw Function".
  • This error may be indicative of a locking problem or bug. Gather data during one of these occurrences and open a support ticket with Red Hat Support, as described in Section 5.2, "GFS2 File System Hangs and Requires Reboot of One Node".

5.4. GFS2 File System Does Not Mount on Newly-Added Cluster Node

If you add a new node to a cluster and you find that you cannot mount your GFS2 file system on that node, you may have fewer journals on the GFS2 file system than you have nodes attempting to access the GFS2 file system. You must have one journal per GFS2 host you intend to mount the file system on (with the exception of GFS2 file systems mounted with the spectator mount option set, since these do not require a journal). You can add journals to a GFS2 file system with the gfs2_jadd command, as described in Section 4.7, "Adding Journals to a File System".

5.5. Space Indicated as Used in Empty File System

If you have an empty GFS2 file system, the df command will show that there is space being taken up. This is because GFS2 file system journals consume space (number of journals * journal size) on disk. If you created a GFS2 file system with a large number of journals or specified a large journal size, you will see (number of journals * journal size) as already in use when you execute the df command. Even if you did not specify a large number of journals or large journals, small GFS2 file systems (in the 1GB or less range) will show a large amount of space as being in use with the default GFS2 journal size.
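As a worked example of the arithmetic above, the following sketch multiplies the number of journals by the journal size. The function name journal_space_mb is hypothetical, and the values of 8 journals at 128 megabytes each are illustrative assumptions; substitute the values that were used when the file system was created.

```shell
# Space consumed by journals = number of journals * journal size.
# Arguments: number of journals, journal size in megabytes.
journal_space_mb() {
    local journals=$1 size_mb=$2
    echo $(( journals * size_mb ))
}

# 8 journals of 128 MB each (assumed values) already occupy 1024 MB,
# which is the whole of a 1GB file system before any file is written.
journal_space_mb 8 128
```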

Appendix A. GFS2 Quota Management with the gfs2_quota Command

As of the Red Hat Enterprise Linux 6.1 release, GFS2 supports the standard Linux quota facilities. In order to use this you will need to install the quota RPM. This is the preferred way to administer quotas on GFS2 and should be used for all new deployments of GFS2 using quotas. For information on using the standard Linux quota facilities, see Section 4.5, "GFS2 Quota Management".
For earlier releases of Red Hat Enterprise Linux, GFS2 required the gfs2_quota command to manage quotas. This appendix documents the use of the gfs2_quota command for managing GFS2 file system quotas.

A.1. Setting Quotas with the gfs2_quota Command

Two quota settings are available for each user ID (UID) or group ID (GID): a hard limit and a soft limit.
A hard limit is the amount of space that can be used. The file system will not let the user or group use more than that amount of disk space. A hard limit value of zero means that no limit is enforced.
A soft limit is usually a value less than the hard limit. The file system will notify the user or group when the soft limit is reached to warn them of the amount of space they are using. A soft limit value of zero means that no limit is enforced.
You can set limits using the gfs2_quota command. The command only needs to be run on a single node where GFS2 is mounted.
By default, quota enforcement is not set on GFS2 file systems. To enable quota accounting, use the quota= option of the mount command when mounting the GFS2 file system, as described in Section A.4, "Enabling/Disabling Quota Enforcement".

Usage

Setting Quotas, Hard Limit
gfs2_quota limit -u User -l Size -f MountPoint
gfs2_quota limit -g Group -l Size -f MountPoint
Setting Quotas, Warn Limit
gfs2_quota warn -u User -l Size -f MountPoint
gfs2_quota warn -g Group -l Size -f MountPoint
User
A user ID to limit or warn. It can be either a user name from the password file or the UID number.
Group
A group ID to limit or warn. It can be either a group name from the group file or the GID number.
Size
Specifies the new value to limit or warn. By default, the value is in units of megabytes. The additional -k, -s and -b flags change the units to kilobytes, sectors, and file system blocks, respectively.
MountPoint
Specifies the GFS2 file system to which the actions apply.

Examples

This example sets the hard limit for user Bert to 1024 megabytes (1 gigabyte) on file system /mygfs2.
# gfs2_quota limit -u Bert -l 1024 -f /mygfs2
This example sets the soft limit for group ID 21 to 50 kilobytes on file system /mygfs2.
# gfs2_quota warn -g 21 -l 50 -k -f /mygfs2

A.2. Displaying Quota Limits and Usage with the gfs2_quota Command

Quota limits and current usage can be displayed for a specific user or group using the gfs2_quota get command. The entire contents of the quota file can also be displayed using the gfs2_quota list command, in which case all IDs with a non-zero hard limit, soft limit, or value are listed.

Usage

Displaying Quota Limits for a User
gfs2_quota get -u User -f MountPoint
Displaying Quota Limits for a Group
gfs2_quota get -g Group -f MountPoint
Displaying Entire Quota File
gfs2_quota list -f MountPoint
User
A user ID to display information about a specific user. It can be either a user name from the password file or the UID number.
Group
A group ID to display information about a specific group. It can be either a group name from the group file or the GID number.
MountPoint
Specifies the GFS2 file system to which the actions apply.

Command Output

GFS2 quota information from the gfs2_quota command is displayed as follows:
user User: limit:LimitSize warn:WarnSize value:Value
group Group: limit:LimitSize warn:WarnSize value:Value
The LimitSize, WarnSize, and Value numbers (values) are in units of megabytes by default. Adding the -k, -s, or -b flag to the command line changes the units to kilobytes, sectors, or file system blocks, respectively.
User
A user name or ID to which the data is associated.
Group
A group name or ID to which the data is associated.
LimitSize
The hard limit set for the user or group. This value is zero if no limit has been set.
WarnSize
The soft limit set for the user or group. This value is zero if no limit has been set.
Value
The actual amount of disk space used by the user or group.
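The output format described above lends itself to simple filtering. The following is a minimal sketch, assuming the output follows the documented "user User: limit:LimitSize warn:WarnSize value:Value" layout (with or without spaces after the colons); the function name over_soft_limit is hypothetical. It prints the IDs whose usage exceeds a non-zero soft limit.

```shell
# Read gfs2_quota list/get output on standard input and print every
# user or group whose value is above its (non-zero) warn limit.
over_soft_limit() {
    awk '{
        # Turn "limit:1024" or "limit: 1024" into "limit 1024" so that
        # labels and numbers become separate fields.
        gsub(/:[ \t]*/, " ")
        if ($3 == "limit" && $5 == "warn" && $7 == "value" &&
            $6 + 0 > 0 && $8 + 0 > $6 + 0)
            print $1, $2, "value=" $8, "warn=" $6
    }'
}
```

For example, gfs2_quota list -f /mygfs2 | over_soft_limit lists the users and groups that have passed their soft limit on /mygfs2.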

Comments

When displaying quota information, the gfs2_quota command does not resolve UIDs and GIDs into names if the -n option is added to the command line.
Space allocated to GFS2's hidden files can be left out of displayed values for the root UID and GID by adding the -d option to the command line. This is useful when trying to match the numbers from gfs2_quota with the results of a du command.

Examples

This example displays quota information for all users and groups that have a limit set or are using any disk space on file system /mygfs2.
# gfs2_quota list -f /mygfs2
This example displays quota information in sectors for group users on file system /mygfs2.
# gfs2_quota get -g users -f /mygfs2 -s

A.3. Synchronizing Quotas with the gfs2_quota Command

GFS2 stores all quota information in its own internal file on disk. A GFS2 node does not update this quota file for every file system write; rather, by default it updates the quota file once every 60 seconds. This is necessary to avoid contention among nodes writing to the quota file, which would cause a slowdown in performance.
As a user or group approaches their quota limit, GFS2 dynamically reduces the time between its quota-file updates to prevent the limit from being exceeded. The normal time period between quota synchronizations is a tunable parameter, quota_quantum. You can change this from its default value of 60 seconds using the quota_quantum= mount option, as described in Table 4.2, "GFS2-Specific Mount Options". The quota_quantum parameter must be set on each node and each time the file system is mounted, because changes to it do not persist across unmounts. You can update the quota_quantum value on a mounted file system with the mount -o remount command.
You can use the gfs2_quota sync command to synchronize the quota information from a node to the on-disk quota file between the automatic updates performed by GFS2.

Usage

Synchronizing Quota Information
gfs2_quota sync -f MountPoint
MountPoint
Specifies the GFS2 file system to which the actions apply.
Tuning the Time Between Synchronizations
mount -o quota_quantum=secs,remount BlockDevice MountPoint
MountPoint
Specifies the GFS2 file system to which the actions apply.
secs
Specifies the new time period between regular quota-file synchronizations by GFS2. Smaller values may increase contention and slow down performance.

Examples

This example synchronizes the quota information from the node it is run on to file system /mygfs2.
# gfs2_quota sync -f /mygfs2
This example changes the default time period between regular quota-file updates to one hour (3600 seconds) for file system /mnt/mygfs2 when remounting that file system on logical volume /dev/volgroup/logical_volume.
# mount -o quota_quantum=3600,remount /dev/volgroup/logical_volume /mnt/mygfs2

A.4. Enabling/Disabling Quota Enforcement

In GFS2 file systems, quota enforcement is disabled by default. To enable quota enforcement for a file system, mount the file system with the quota=on option specified.

Usage

mount -o quota=on BlockDevice MountPoint
To mount a file system with quota enforcement disabled, mount the file system with the quota=off option specified. This is the default setting.
mount -o quota=off BlockDevice MountPoint
-o quota={on|off}
Specifies that quota enforcement is enabled or disabled when the file system is mounted.
BlockDevice
Specifies the block device where the GFS2 file system resides.
MountPoint
Specifies the directory where the GFS2 file system should be mounted.

Examples

In this example, the GFS2 file system on /dev/vg01/lvol0 is mounted on the /mygfs2 directory with quota enforcement enabled.
# mount -o quota=on /dev/vg01/lvol0 /mygfs2

A.5. Enabling Quota Accounting

It is possible to keep track of disk usage and maintain quota accounting for every user and group without enforcing the limit and warn values. To do this, mount the file system with the quota=account option specified.

Usage

mount -o quota=account BlockDevice MountPoint
-o quota=account
Specifies that user and group usage statistics are maintained by the file system, even though the quota limits are not enforced.
BlockDevice
Specifies the block device where the GFS2 file system resides.
MountPoint
Specifies the directory where the GFS2 file system should be mounted.

Example

In this example, the GFS2 file system on /dev/vg01/lvol0 is mounted on the /mygfs2 directory with quota accounting enabled.
# mount -o quota=account /dev/vg01/lvol0 /mygfs2

Appendix B. Converting a File System from GFS to GFS2

Since the Red Hat Enterprise Linux 6 release does not support GFS file systems, you must upgrade any existing GFS file systems to GFS2 file systems with the gfs2_convert command. Note that you must perform this conversion procedure on a Red Hat Enterprise Linux 5 system before upgrading to Red Hat Enterprise Linux 6.

Warning

Before converting the GFS file system, you must back up the file system, since the conversion process is irreversible and any errors encountered during the conversion can result in the abrupt termination of the program and consequently an unusable file system.
Before converting the GFS file system, you must use the gfs_fsck command to check the file system and fix any errors.
If the conversion from GFS to GFS2 is interrupted by a power failure or any other issue, restart the conversion tool. Do not attempt to execute the fsck.gfs2 command on the file system until the conversion is complete.
When converting full or nearly full file systems, it is possible that there will not be enough space available to fit all the GFS2 file system data structures. In such cases, the size of all the journals is reduced uniformly such that everything fits in the available space.

B.1. Conversion of Context-Dependent Path Names

GFS2 file systems do not provide support for Context-Dependent Path Names (CDPNs), which allow you to create symbolic links that point to variable destination files or directories. To achieve the same functionality as CDPNs in GFS2 file systems, you can use the bind option of the mount command.
The gfs2_convert command identifies CDPNs and replaces them with empty directories with the same name. In order to configure bind mounts to replace the CDPNs, however, you need to know the full paths of the link targets of the CDPNs you are replacing. Before converting your file system, you can use the find command to identify the links.
The following command lists the symlinks that point to a hostname CDPN:
[root@smoke-01 gfs]# find /mnt/gfs -lname @hostname
/mnt/gfs/log
Similarly, you can execute the find command for other CDPNs (mach, os, sys, uid, gid, jid). Note that since CDPN names can be of the form @hostname or {hostname}, you will need to run the find command for each variant.
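Running find once per CDPN name and per variant can be sketched as a loop. This is a minimal sketch under the assumption that the link targets are exactly the CDPN strings, as in the hostname example above; the function name list_cdpn_links is hypothetical.

```shell
# List symbolic links under a directory whose target is a CDPN,
# checking both the @name and {name} forms for every CDPN name.
list_cdpn_links() {
    local root=$1 name
    for name in hostname mach os sys uid gid jid; do
        find "$root" -lname "@$name" -o -lname "{$name}"
    done
}
```

For example, list_cdpn_links /mnt/gfs prints every CDPN link under /mnt/gfs. Record the output before running gfs2_convert, since the links are replaced by empty directories during conversion.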
For more information on bind mounts and context-dependent pathnames in GFS2, see Section 4.12, "Bind Mounts and Context-Dependent Path Names".

B.2. GFS to GFS2 Conversion Procedure

Use the following procedure to convert a GFS file system to a GFS2 file system.
  1. On a Red Hat Enterprise Linux 5 system, make a backup of your existing GFS file system.
  2. Unmount the GFS file system from all nodes in the cluster.
  3. Execute the gfs_fsck command on the GFS file system to ensure there is no file system corruption.
  4. Execute gfs2_convert gfsfilesystem. The system will display warnings and confirmation questions before converting gfsfilesystem to GFS2.
  5. Upgrade to Red Hat Enterprise Linux 6.
The following example converts a GFS file system on block device /dev/shell_vg/500g to a GFS2 file system.
[root@shell-01 ~]# /root/cluster/gfs2/convert/gfs2_convert /dev/shell_vg/500g
gfs2_convert version 2 (built May 10 2010 10:05:40)
Copyright (C) Red Hat, Inc.  2004-2006  All rights reserved.
Examining file system..................
This program will convert a gfs1 filesystem to a gfs2 filesystem.
WARNING: This can't be undone.  It is strongly advised that you:
   1. Back up your entire filesystem first.
   2. Run gfs_fsck first to ensure filesystem integrity.
   3. Make sure the filesystem is NOT mounted from any node.
   4. Make sure you have the latest software versions.
Convert /dev/shell_vg/500g from GFS1 to GFS2? (y/n)y
Converting resource groups...................
Converting inodes.
24208 inodes from 1862 rgs converted.
Fixing file and directory information.
18 cdpn symlinks moved to empty directories.
Converting journals.
Converting journal space to rg space.
Writing journal #1...done.
Writing journal #2...done.
Writing journal #3...done.
Writing journal #4...done.
Building GFS2 file system structures.
Removing obsolete GFS1 file system structures.
Committing changes to disk.
/dev/shell_vg/500g: filesystem converted successfully to gfs2.

Appendix C. GFS2 tracepoints and the debugfs glocks File

This appendix describes both the glock debugfs interface and the GFS2 tracepoints. It is intended for advanced users who are familiar with file system internals who would like to learn more about the design of GFS2 and how to debug GFS2-specific issues.

C.1. GFS2 tracepoint Types

There are currently three types of GFS2 tracepoints: glock (pronounced "gee-lock") tracepoints, bmap tracepoints and log tracepoints. These can be used to monitor a running GFS2 file system and give additional information to that which can be obtained with the debugging options supported in previous releases of Red Hat Enterprise Linux. Tracepoints are particularly useful when a problem, such as a hang or performance issue, is reproducible and thus the tracepoint output can be obtained during the problematic operation. In GFS2, glocks are the primary cache control mechanism and they are the key to understanding the performance of the core of GFS2. The bmap (block map) tracepoints can be used to monitor block allocations and block mapping (lookup of already allocated blocks in the on-disk metadata tree) as they happen and check for any issues relating to locality of access. The log tracepoints keep track of the data being written to and released from the journal and can provide useful information on that part of GFS2.
The tracepoints are designed to be as generic as possible. This should mean that it will not be necessary to change the API during the course of Red Hat Enterprise Linux 6. On the other hand, users of this interface should be aware that this is a debugging interface and not part of the normal Red Hat Enterprise Linux 6 API set, and as such Red Hat makes no guarantees that changes in the GFS2 tracepoints interface will not occur.
Tracepoints are a generic feature of Red Hat Enterprise Linux 6 and their scope goes well beyond GFS2. In particular they are used to implement the blktrace infrastructure and the blktrace tracepoints can be used in combination with those of GFS2 to gain a fuller picture of the system performance. Due to the level at which the tracepoints operate, they can produce large volumes of data in a very short period of time. They are designed to put a minimum load on the system when they are enabled, but it is inevitable that they will have some effect. Filtering events via a variety of means can help reduce the volume of data and help focus on obtaining just the information which is useful for understanding any particular situation.

C.2. Tracepoints

The tracepoints can be found under the /sys/kernel/debug/tracing/ directory, assuming that debugfs is mounted in the standard place at the /sys/kernel/debug directory. The events subdirectory contains all the tracing events that may be specified and, provided the gfs2 module is loaded, there will be a gfs2 subdirectory containing further subdirectories, one for each GFS2 event. The contents of the /sys/kernel/debug/tracing/events/gfs2 directory should look roughly like the following:
[root@chywoon gfs2]# ls
enable            gfs2_bmap       gfs2_glock_queue         gfs2_log_flush
filter            gfs2_demote_rq  gfs2_glock_state_change  gfs2_pin
gfs2_block_alloc  gfs2_glock_put  gfs2_log_blocks          gfs2_promote
To enable all the GFS2 tracepoints, run the following command:
[root@chywoon gfs2]# echo -n 1 >/sys/kernel/debug/tracing/events/gfs2/enable
To enable a specific tracepoint, there is an enable file in each of the individual event subdirectories. The same is true of the filter file which can be used to set an event filter for each event or set of events. The meaning of the individual events is explained in more detail below.
The output from the tracepoints is available in ASCII or binary format. This appendix does not currently cover the binary interface. The ASCII interface is available in two ways. To list the current content of the ring buffer, you can run the following command:
[root@chywoon gfs2]# cat /sys/kernel/debug/tracing/trace
This interface is useful in cases where you are using a long-running process for a certain period of time and, after some event, want to look back at the latest captured information in the buffer. An alternative interface, /sys/kernel/debug/tracing/trace_pipe, can be used when all the output is required. Events are read from this file as they occur; there is no historical information available via this interface. The format of the output is the same from both interfaces and is described for each of the GFS2 events in the later sections of this appendix.
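Enabling a single event and then following trace_pipe can be sketched as two small helpers. This is a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug; the function names gfs2_trace_set and gfs2_trace_follow and the TRACING variable are hypothetical, while the event names are those listed in the events/gfs2 directory above.

```shell
# Location of the tracing directory; override TRACING if debugfs is
# mounted somewhere else.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# Enable (1) or disable (0) one GFS2 event by writing to its enable file,
# for example: gfs2_trace_set gfs2_glock_state_change 1
gfs2_trace_set() {
    local event=$1 value=$2
    echo -n "$value" > "$TRACING/events/gfs2/$event/enable"
}

# Stream events as they occur. This blocks until interrupted, and events
# read here are consumed rather than kept for later inspection.
gfs2_trace_follow() {
    cat "$TRACING/trace_pipe"
}
```

A typical session, run as root, would call gfs2_trace_set gfs2_glock_state_change 1, reproduce the problem while gfs2_trace_follow writes to a file, and then call gfs2_trace_set gfs2_glock_state_change 0.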
A utility called trace-cmd is available for reading tracepoint data. For more information on this utility, refer to the link in Section C.9, "References". The trace-cmd utility can be used in a similar way to the strace utility, for example to run a command while gathering trace data from various sources.

C.3. Glocks

To understand GFS2, the most important concept to understand, and the one which sets it apart from other file systems, is the concept of glocks. In terms of the source code, a glock is a data structure that brings together the DLM and caching into a single state machine. Each glock has a 1:1 relationship with a single DLM lock, and provides caching for that lock state so that repetitive operations carried out from a single node of the file system do not have to repeatedly call the DLM, and thus they help avoid unnecessary network traffic. There are two broad categories of glocks: those which cache metadata and those which do not. The inode glocks and the resource group glocks both cache metadata; other types of glocks do not cache metadata. The inode glock is also involved in the caching of data in addition to metadata and has the most complex logic of all glocks.

Table C.1. Glock Modes and DLM Lock Modes

Glock mode  DLM lock mode  Notes
UN          IV/NL          Unlocked (no DLM lock associated with glock or NL lock depending on I flag)
SH          PR             Shared (protected read) lock
EX          EX             Exclusive lock
DF          CW             Deferred (concurrent write) used for Direct I/O and file system freeze

Glocks remain in memory until they are unlocked (at the request of another node or at the request of the VM) and there are no local users. At that point they are removed from the glock hash table and freed. When a glock is created, the DLM lock is not associated with the glock immediately. The DLM lock becomes associated with the glock upon the first request to the DLM, and if this request is successful then the 'I' (initial) flag will be set on the glock. Table C.4, "Glock flags" shows the meanings of the different glock flags. Once the DLM has been associated with the glock, the DLM lock will always remain at least at NL (Null) lock mode until the glock is to be freed. A demotion of the DLM lock from NL to unlocked is always the last operation in the life of a glock.

Note

This particular aspect of DLM lock behavior has changed since Red Hat Enterprise Linux 5, which does sometimes unlock the DLM locks attached to glocks completely, and thus Red Hat Enterprise Linux 5 has a different mechanism to ensure that LVBs (lock value blocks) are preserved where required. The new scheme that Red Hat Enterprise Linux 6 uses was made possible due to the merging of the lock_dlm lock module (not to be confused with the DLM itself) into GFS2.
Each glock can have a number of "holders" associated with it, each of which represents one lock request from the higher layers. System calls relating to GFS2 queue and dequeue holders from the glock to protect the critical section of code.
The glock state machine is based on a workqueue. For performance reasons, tasklets would be preferable; however, in the current implementation we need to submit I/O from that context which prohibits their use.

Note

Workqueues have their own tracepoints which can be used in combination with the GFS2 tracepoints if desired.
Table C.2, "Glock Modes and Data Types" shows what state may be cached under each of the glock modes and whether that cached state may be dirty. This applies to both inode and resource group locks, although there is no data component for the resource group locks, only metadata.

Table C.2. Glock Modes and Data Types

Glock mode  Cache Data  Cache Metadata  Dirty Data  Dirty Metadata
UN          No          No              No          No
SH          Yes         Yes             No          No
DF          No          Yes             No          No
EX          Yes         Yes             Yes         Yes

C.4. The glock debugfs Interface

The glock debugfs interface allows the visualization of the internal state of the glocks and the holders, and in some cases it also includes summary details of the objects being locked. Each line of the file either begins with G: with no indentation (which refers to the glock itself), or begins with a different letter, indented with a single space, and refers to the structures associated with the glock immediately above it in the file (H: is a holder, I: an inode, and R: a resource group). Here is an example of what the content of this file might look like:
G:  s:SH n:5/75320 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G:  s:EX n:3/258028 f:yI t:EX d:EX/0 a:3 r:4
 H: s:EX f:tH e:0 p:4466 [postmark] gfs2_inplace_reserve_i+0x177/0x780 [gfs2]
 R: n:258028 f:05 b:22256/22256 i:16800
G:  s:EX n:2/219916 f:yfI t:EX d:EX/0 a:0 r:3
 I: n:75661/219916 t:8 f:0x10 d:0x00000000 s:7522/7522
G:  s:SH n:5/127205 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G:  s:EX n:2/50382 f:yfI t:EX d:EX/0 a:0 r:2
G:  s:SH n:5/302519 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G:  s:SH n:5/313874 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G:  s:SH n:5/271916 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G:  s:SH n:5/312732 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
The above example is a series of excerpts (from an approximately 18MB file) generated by the command cat /sys/kernel/debug/gfs2/unity:myfs/glocks >my.lock during a run of the postmark benchmark on a single node GFS2 file system. The glocks in the figure have been selected in order to show some of the more interesting features of the glock dumps.
The glock states are either EX (exclusive), DF (deferred), SH (shared) or UN (unlocked). These states correspond directly with DLM lock modes except for UN which may represent either the DLM null lock state, or that GFS2 does not hold a DLM lock (depending on the I flag as explained above). The s: field of the glock indicates the current state of the lock and the same field in the holder indicates the requested mode. If the lock is granted, the holder will have the H bit set in its flags (f: field). Otherwise, it will have the W wait bit set.
The n: field (number) indicates the number associated with each item. For glocks, that is the type number followed by the glock number so that in the above example, the first glock is n:5/75320; that is, an iopen glock which relates to inode 75320. In the case of inode and iopen glocks, the glock number is always identical to the inode's disk block number.

Note

The glock numbers (n: field) in the debugfs glocks file are in hexadecimal, whereas the tracepoints output lists them in decimal. This is for historical reasons; glock numbers were always written in hex, but decimal was chosen for the tracepoints so that the numbers could easily be compared with the other tracepoint output (from blktrace for example) and with output from stat(1).
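When matching a glock from the debugfs glocks file against tracepoint output, the hexadecimal number has to be converted to decimal. A minimal sketch of that conversion, using the shell printf built-in (the function name glock_hex_to_dec is hypothetical):

```shell
# Convert a hexadecimal glock number from the glocks file to the decimal
# form used in the tracepoint output.
glock_hex_to_dec() {
    printf '%d\n' "0x$1"
}

# The glock n:5/75320 from the glocks file appears as 480032 in trace output.
glock_hex_to_dec 75320
```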
The full listing of all the flags for both the holder and the glock are set out in Table C.4, "Glock flags" and Table C.5, "Glock holder flags". The content of lock value blocks is not currently available via the glock debugfs interface.
Table C.3, "Glock types" shows the meanings of the different glock types.

Table C.3. Glock types

Type number  Lock type  Use
1            trans      Transaction lock
2            inode      Inode metadata and data
3            rgrp       Resource group metadata
4            meta       The superblock
5            iopen      Inode last closer detection
6            flock      flock(2) syscall
8            quota      Quota operations
9            journal    Journal mutex
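The type numbers in Table C.3 can be used to summarize a glock dump by lock type. The following is a minimal sketch that counts the G: lines of a dump per type; the function name glock_type_summary is hypothetical, and the mapping is copied from the table.

```shell
# Read a glocks dump on standard input and print "lock-type count" for
# every glock type present, using the type numbers from Table C.3.
glock_type_summary() {
    awk '
    BEGIN {
        name[1] = "trans"; name[2] = "inode"; name[3] = "rgrp"
        name[4] = "meta";  name[5] = "iopen"; name[6] = "flock"
        name[8] = "quota"; name[9] = "journal"
    }
    $1 == "G:" {
        # The n: field has the form n:type/number.
        for (i = 2; i <= NF; i++)
            if ($i ~ /^n:/) {
                split(substr($i, 3), parts, "/")
                count[parts[1] + 0]++
            }
    }
    END {
        for (t = 1; t <= 9; t++)
            if (t in count) {
                label = (t in name) ? name[t] : "type" t
                print label, count[t]
            }
    }'
}
```

For example, glock_type_summary < glocks.fsname.nodename gives a quick view of how many inode, rgrp, and iopen glocks a node holds in memory.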

One of the more important glock flags is the l (locked) flag. This is the bit lock that is used to arbitrate access to the glock state when a state change is to be performed. It is set when the state machine is about to send a remote lock request via the DLM, and only cleared when the complete operation has been performed. Sometimes this can mean that more than one lock request will have been sent, with various invalidations occurring between times.
Table C.4, "Glock flags" shows the meanings of the different glock flags.

Table C.4. Glock flags

Flag  Name                    Meaning
d     Pending demote          A deferred (remote) demote request
D     Demote                  A demote request (local or remote)
f     Log flush               The log needs to be committed before releasing this glock
F     Frozen                  Replies from remote nodes ignored - recovery is in progress.
i     Invalidate in progress  In the process of invalidating pages under this glock
I     Initial                 Set when DLM lock is associated with this glock
l     Locked                  The glock is in the process of changing state
p     Demote in progress      The glock is in the process of responding to a demote request
r     Reply pending           Reply received from remote node is awaiting processing
y     Dirty                   Data needs flushing to disk before releasing this glock

When a remote callback is received from a node that wants to get a lock in a mode that conflicts with that being held on the local node, then one or other of the two flags D (demote) or d (demote pending) is set. In order to prevent starvation conditions when there is contention on a particular lock, each lock is assigned a minimum hold time. A node which has not yet had the lock for the minimum hold time is allowed to retain that lock until the time interval has expired.
If the time interval has expired, then the D (demote) flag will be set and the state required will be recorded. In that case the next time there are no granted locks on the holders queue, the lock will be demoted. If the time interval has not expired, then the d (demote pending) flag is set instead. This also schedules the state machine to clear d (demote pending) and set D (demote) when the minimum hold time has expired.
The I (initial) flag is set when the glock has been assigned a DLM lock. This happens when the glock is first used and the I flag will then remain set until the glock is finally freed (at which point the DLM lock is unlocked).

C.5. Glock Holders

Table C.5, "Glock holder flags" shows the meanings of the different glock holder flags.

Table C.5. Glock holder flags

Flag  Name       Meaning
a     Async      Do not wait for glock result (will poll for result later)
A     Any        Any compatible lock mode is acceptable
c     No cache   When unlocked, demote DLM lock immediately
e     No expire  Ignore subsequent lock cancel requests
E     Exact      Must have exact lock mode
F     First      Set when holder is the first to be granted for this lock
H     Holder     Indicates that requested lock is granted
p     Priority   Enqueue holder at the head of the queue
t     Try        A "try" lock
T     Try 1CB    A "try" lock that sends a callback
W     Wait       Set while waiting for request to complete

The most important holder flags are H (holder) and W (wait) as mentioned earlier, since they are set on granted lock requests and queued lock requests respectively. The ordering of the holders in the list is important. If there are any granted holders, they will always be at the head of the queue, followed by any queued holders.
If there are no granted holders, then the first holder in the list will be the one that triggers the next state change. Since demote requests are always considered higher priority than requests from the file system, that might not always directly result in a change to the state requested.
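When reading holder lines from a lock dump, it can help to expand the flag letters and to confirm the queue ordering described above. The following Python sketch assumes the holder flags have already been extracted as strings, one per holder, in queue order; the function names are hypothetical, and the flag names come from Table C.5.

```python
# Flag letters and names as listed in Table C.5.
HOLDER_FLAGS = {
    "a": "Async", "A": "Any", "c": "No cache", "e": "No expire",
    "E": "Exact", "F": "First", "H": "Holder", "p": "Priority",
    "t": "Try", "T": "Try 1CB", "W": "Wait",
}

def decode_holder_flags(flags):
    """Expand a holder flag string such as 'EH' into a list of flag names."""
    return [HOLDER_FLAGS.get(ch, "unknown flag %r" % ch) for ch in flags]

def granted_holders_first(queue):
    """Return True if every granted (H) holder precedes every queued holder."""
    seen_queued = False
    for flags in queue:
        if "H" in flags:
            if seen_queued:
                return False  # a granted holder appears after a queued one
        else:
            seen_queued = True
    return True
```

For example, granted_holders_first(["EH", "H", "W"]) holds, whereas a queue such as ["W", "H"] would violate the ordering and might indicate that the dump was misread.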
The glock subsystem supports two kinds of "try" lock. These are useful both because they allow the taking of locks out of the normal order (with suitable back-off and retry) and because they can be used to help avoid resources in use by other nodes. The normal t (try) lock is just what its name indicates: a "try" lock that does not do anything special. The T (try 1CB) lock, on the other hand, is identical to the t lock except that the DLM will send a single callback to the current incompatible lock holders.
One use of the T (try 1CB) lock is with the iopen locks, which are used to arbitrate among the nodes when an inode's i_nlink count is zero and to determine which of the nodes will be responsible for deallocating the inode. The iopen glock is normally held in the shared state, but when the i_nlink count becomes zero and ->delete_inode() is called, it will request an exclusive lock with T (try 1CB) set. If the lock is granted, it will continue to deallocate the inode. If the lock is not granted, the node or nodes that were preventing the grant of the lock mark their glocks with the D (demote) flag, which is checked at ->drop_inode() time in order to ensure that the deallocation is not forgotten.
This means that inodes that have zero link count but are still open will be deallocated by the node on which the final close() occurs. Also, at the same time as the inode's link count is decremented to zero, the inode is marked in the resource group bitmap as being in the special state of having zero link count but still being in use. This functions like the ext3 file system's orphan list in that it allows any subsequent reader of the bitmap to know that there is potentially space that might be reclaimed, and to attempt to reclaim it.

C.6. Glock tracepoints

The tracepoints are also designed to be able to confirm the correctness of the cache control by combining them with the blktrace output and with knowledge of the on-disk layout. It is then possible to check that any given I/O has been issued and completed under the correct lock, and that no races are present.
The gfs2_glock_state_change tracepoint is the most important one to understand. It tracks every state change of the glock from initial creation right through to the final demotion which ends with gfs2_glock_put and the final NL to unlocked transition. The l (locked) glock flag is always set before a state change occurs and will not be cleared until after it has finished. There are never any granted holders (the H glock holder flag) during a state change. If there are any queued holders, they will always be in the W (waiting) state. When the state change is complete then the holders may be granted which is the final operation before the l glock flag is cleared.
The gfs2_demote_rq tracepoint keeps track of demote requests, both local and remote. Assuming that there is enough memory on the node, the local demote requests will rarely be seen, and most often they will be created by umount or by occasional memory reclaim. The number of remote demote requests is a measure of the contention between nodes for a particular inode or resource group.
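As a rough illustration of using remote demote requests as a contention measure, the following Python sketch counts them per glock. It assumes the gfs2_demote_rq trace output has already been parsed into (glock, origin) pairs, where origin is the string "remote" or "local"; that pre-parsed form, the glock labels, and the function name are hypothetical and are not the tracepoint's actual output format.

```python
from collections import Counter

def remote_demote_counts(events):
    """Count remote demote requests per glock.

    events: iterable of (glock, origin) pairs in a hypothetical pre-parsed
    form, with origin being either "remote" or "local".
    """
    return Counter(glock for glock, origin in events if origin == "remote")

# Example input with made-up glock labels.
sample = [
    ("inode-A", "remote"),
    ("rgrp-B", "remote"),
    ("inode-A", "remote"),
    ("inode-A", "local"),
]
hottest = remote_demote_counts(sample).most_common(1)
```

Glocks that appear at the top of most_common() are the ones for which other nodes most often asked this node to give up its lock.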
When a holder is granted a lock, gfs2_promote is called. This occurs in the final stages of a state change, or when a lock is requested that can be granted immediately because the glock state already caches a lock of a suitable mode. If the holder is the first one to be granted for this glock, then the F (first) flag is set on that holder. This is currently used only by resource groups.

C.7. Bmap tracepoints

Block mapping is a task central to any file system. GFS2 uses a traditional bitmap-based system with two bits per block. The main purpose of the tracepoints in this subsystem is to allow monitoring of the time taken to allocate and map blocks.
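To make the "two bits per block" layout concrete, the following Python sketch unpacks a run of bitmap bytes into one state value per block. The bit ordering (lowest two bits describe the first block of each byte) and the state names are assumptions made for illustration and should be checked against the GFS2 on-disk format definitions before being relied upon.

```python
BITS_PER_BLOCK = 2
BLOCKS_PER_BYTE = 8 // BITS_PER_BLOCK   # four blocks are described per byte
STATE_MASK = (1 << BITS_PER_BLOCK) - 1  # 0b11

# Assumed state names; treat these as illustrative labels only.
STATE_NAMES = {0: "free", 1: "used", 2: "unlinked", 3: "dinode"}

def block_states(bitmap):
    """Return the two-bit state of every block described by the bitmap bytes."""
    states = []
    for byte in bitmap:
        for slot in range(BLOCKS_PER_BYTE):
            # Assumption: the lowest-order bit pair belongs to the first block.
            states.append((byte >> (slot * BITS_PER_BLOCK)) & STATE_MASK)
    return states

def count_states(bitmap):
    """Summarize a bitmap as a mapping of state name to number of blocks."""
    summary = {name: 0 for name in STATE_NAMES.values()}
    for state in block_states(bitmap):
        summary[STATE_NAMES[state]] += 1
    return summary
```

Under these assumptions, the state labeled "unlinked" corresponds to the special state mentioned earlier for inodes with zero link count that are still in use.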
The gfs2_bmap tracepoint is called twice for each bmap operation: once at the start to display the bmap request, and once at the end to display the result. This makes it easy to match the requests and results together and measure the time taken to map blocks in different parts of the file system, at different file offsets, or even in different files. It is also possible to see how the average extent sizes being returned compare to those being requested.
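Pairing the two gfs2_bmap events of each operation might look like the following Python sketch. It assumes the trace has already been parsed into (timestamp, inode, logical block, phase) tuples in chronological order, with phase being "start" or "end"; this tuple layout and the function name are hypothetical. It also assumes that no two mapping operations for the same inode and logical block overlap in time.

```python
def bmap_durations(events):
    """Match each bmap request with its result and compute the elapsed time.

    events: chronologically ordered (timestamp, inode, lblock, phase) tuples,
    a hypothetical pre-parsed form of the gfs2_bmap trace output.
    Returns a list of (inode, lblock, duration) tuples.
    """
    pending = {}     # (inode, lblock) -> timestamp of the request
    durations = []
    for timestamp, inode, lblock, phase in events:
        key = (inode, lblock)
        if phase == "start":
            pending[key] = timestamp
        elif key in pending:
            durations.append((inode, lblock, timestamp - pending.pop(key)))
        # An "end" event without a recorded "start" is ignored.
    return durations
```

Grouping the resulting durations by inode or by block range is then an ordinary aggregation step.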
To keep track of allocated blocks, gfs2_block_alloc is called not only on allocations, but also on freeing of blocks. Since the allocations are all referenced according to the inode for which the block is intended, this can be used to track which physical blocks belong to which files in a live file system. This is particularly useful when combined with blktrace, which will show problematic I/O patterns that may then be referred back to the relevant inodes using the mapping gained via this tracepoint.
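The mapping from physical blocks back to inodes could be maintained along the following lines. This Python sketch assumes each gfs2_block_alloc event has been reduced to an (inode, first block, length, freed) tuple; that layout and the function name are hypothetical rather than the tracepoint's real field list.

```python
def track_block_owners(events):
    """Build a map of physical block number -> owning inode.

    events: (inode, first_block, length, freed) tuples in trace order,
    a hypothetical pre-parsed form of the gfs2_block_alloc output.
    freed is True when the event reports blocks being freed.
    """
    owners = {}
    for inode, first_block, length, freed in events:
        for block in range(first_block, first_block + length):
            if freed:
                owners.pop(block, None)   # block no longer belongs to any inode
            else:
                owners[block] = inode
    return owners
```

A block number reported by blktrace can then be looked up in the returned dictionary to find the inode it was allocated for, provided tracing was active when the allocation happened.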

C.8. Log tracepoints

The tracepoints in this subsystem track blocks being added to and removed from the journal (gfs2_pin), as well as the time taken to commit the transactions to the log (gfs2_log_flush). This can be very useful when trying to debug journaling performance issues.
The gfs2_log_blocks tracepoint keeps track of the reserved blocks in the log, which can help show if the log is too small for the workload, for example.
The gfs2_ail_flush tracepoint (Red Hat Enterprise Linux 6.2 and later) is similar to the gfs2_log_flush tracepoint in that it keeps track of the start and end of flushes of the AIL list. The AIL list contains buffers that have been through the log but have not yet been written back in place. It is flushed periodically in order to release more log space for use by the file system, or when a process requests a sync or fsync.

C.9. References

For more information about tracepoints and the GFS2 glocks file, refer to the following resources:

Revision History

Revision 5.0-7, Mon Feb 18 2013, Steven Levine
Version for 6.4 GA release
Revision 5.0-5, Mon Nov 26 2012, Steven Levine
Version for 6.4 Beta release
Revision 5.0-4, Tue Nov 13 2012, Steven Levine
Resolves: #860324
Updates chapter on GFS2 configuration and operational considerations with small clarifications.
Resolves: #807057
Adds note recommending consultation with an authorized Red Hat representative to verify your configuration prior to deployment.
Revision 5.0-1, Mon Oct 15 2012, Steven Levine
Updated chapter on operational considerations.
Revision 4.0-2, Thu Mar 28 2012, Steven Levine
Version for 6.3 GA release
Revision 4.0-1, Thu Mar 28 2012, Steven Levine
Resolves: #782482, #663944
Adds new chapter on GFS2 configuration and operational considerations.
Resolves: #757742
Clarifies necessity for using GFS2 with CLVM.
Resolves: #786621
Fixes small typographical error.
Revision 3.0-2, Thu Dec 1 2011, Steven Levine
Release for GA of Red Hat Enterprise Linux 6.2
Revision 3.0-1, Mon Sep 19 2011, Steven Levine
Initial revision for Red Hat Enterprise Linux 6.2 Beta release
Resolves: #704179
Documents support for the tunegfs2 command.
Resolves: #712390
Adds new appendix on GFS2 tracepoints.
Resolves: #705961
Resolves minor typographical errors.
Revision 2.0-1, Thu May 19 2011, Steven Levine
Initial release for Red Hat Enterprise Linux 6.1
Resolves: #549838
Documents support for standard Linux quota facilities in Red Hat Enterprise Linux 6.1.
Resolves: #608750
Clarifies description of GFS2 withdraw function.
Resolves: #660364
Corrects maximum GFS2 file system size information.
Resolves: #687874
Adds new chapter on GFS2 troubleshooting.
Resolves: #664848
Adds information on finding Context-Dependent Path Names before converting from GFS to GFS2.
Revision 1.0-1, Wed Nov 15 2010, Steven Levine
Initial release for Red Hat Enterprise Linux 6

Index

A

acl mount option, Mounting a File System
adding journals to a file system, Adding Journals to a File System
atime, configuring updates, Configuring atime Updates
  mounting with noatime, Mount with noatime
  mounting with relatime, Mount with relatime
audience, Audience

C

Configuration considerations, GFS2 Configuration and Operational Considerations
configuration, before, Before Setting Up GFS2
configuration, initial, Getting Started
  prerequisite tasks, Prerequisite Tasks
Context-Dependent Path Names (CDPNs)
  GFS to GFS2 Conversion, Conversion of Context-Dependent Path Names

D

data journaling, Data Journaling
debugfs, GFS2 tracepoints and the debugfs glocks File
debugfs file, Troubleshooting GFS2 Performance with the GFS2 Lock Dump
disk quotas
  additional resources, References
  assigning per group, Assigning Quotas per Group
  assigning per user, Assigning Quotas per User
  enabling, Configuring Disk Quotas
    creating quota files, Creating the Quota Database Files
    quotacheck, running, Creating the Quota Database Files
  hard limit, Assigning Quotas per User
  management of, Managing Disk Quotas
    quotacheck command, using to check, Keeping Quotas Accurate
    reporting, Managing Disk Quotas
  soft limit, Assigning Quotas per User

G

GFS2
  atime, configuring updates, Configuring atime Updates
    mounting with noatime, Mount with noatime
    mounting with relatime, Mount with relatime
  Configuration considerations, GFS2 Configuration and Operational Considerations
  managing, Managing GFS2
  Operation, GFS2 Configuration and Operational Considerations
  quota management, GFS2 Quota Management, Setting Up Quotas in Enforcement or Accounting Mode, GFS2 Quota Management with the gfs2_quota Command
    displaying quota limits, Displaying Quota Limits and Usage with the gfs2_quota Command
    enabling quota accounting, Enabling Quota Accounting
    enabling/disabling quota enforcement, Enabling/Disabling Quota Enforcement
    setting quotas, Setting Quotas with the gfs2_quota command
    synchronizing quotas, Synchronizing Quotas with the quotasync Command, Synchronizing Quotas with the gfs2_quota Command
  withdraw function, The GFS2 Withdraw Function
GFS2 file system maximum size, GFS2 Overview
GFS2-specific options for adding journals table, Complete Usage
GFS2-specific options for expanding file systems table, Complete Usage
gfs2_grow command, Growing a File System
gfs2_jadd command, Adding Journals to a File System
gfs2_quota command, GFS2 Quota Management with the gfs2_quota Command
glock, GFS2 tracepoints and the debugfs glocks File
glock flags, Troubleshooting GFS2 Performance with the GFS2 Lock Dump, The glock debugfs Interface
glock holder flags, Troubleshooting GFS2 Performance with the GFS2 Lock Dump, Glock Holders
glock types, Troubleshooting GFS2 Performance with the GFS2 Lock Dump, The glock debugfs Interface
growing a file system, Growing a File System

I

initial tasks
  setup, initial, Initial Setup Tasks
introduction, Introduction
  audience, Audience

M

making a file system, Making a File System
managing GFS2, Managing GFS2
maximum size, GFS2 file system, GFS2 Overview
mkfs command, Making a File System
mkfs.gfs2 command options table, Complete Options
mount command, Mounting a File System
mount table, Complete Usage
mounting a file system, Mounting a File System, Special Considerations when Mounting GFS2 File Systems

N

node locking, GFS2 Node Locking

O

overview, GFS2 Overview
  configuration, before, Before Setting Up GFS2
  features, new and changed, New and Changed Features

P

path names, context-dependent (CDPNs), Bind Mounts and Context-Dependent Path Names
performance tuning, Performance Tuning With GFS2
Posix locking, Issues with Posix Locking
preface (see introduction)
prerequisite tasks
  configuration, initial, Prerequisite Tasks

R

repairing a file system, Repairing a File System

S

setup, initial
  initial tasks, Initial Setup Tasks
suspending activity on a file system, Suspending Activity on a File System
system hang at unmount, Special Considerations when Mounting GFS2 File Systems

T

tables
  GFS2-specific options for adding journals, Complete Usage
  GFS2-specific options for expanding file systems, Complete Usage
  mkfs.gfs2 command options, Complete Options
  mount options, Complete Usage
tracepoints, GFS2 tracepoints and the debugfs glocks File
tuning, performance, Performance Tuning With GFS2

W

withdraw function, GFS2, The GFS2 Withdraw Function