GFS2 Implementation Under RHEL

This article will demonstrate setting up a simple RHCS (Red Hat Cluster Suite) two-node cluster, with an end goal of having a 50GB LUN shared between two servers, thus providing clustered shared storage to both nodes. This will enable applications running on the nodes to write to a shared filesystem, perform correct locking, and ensure filesystem integrity.

This type of configuration is central to many active-active application setups, where both nodes share a central content or configuration repository.

For this article, two RHEL 6.1 nodes, running on physical hardware (IBM blades) were used. Each node has multiple paths back to the 50GB SAN LUN presented, and multipathd will be used to manage path failover and rebuild in the event of interruption.

Validating Hardware

Prior to building our cluster, it is imperative that the appropriate kernel module(s) have been loaded. With QLogic 2xxx HBAs, running lsmod should yield something like:
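(Illustrative output only; the exact module names, sizes and use counts will vary with your HBA model and driver version.)

    # lsmod | grep qla
    qla2xxx               437743  0
    scsi_transport_fc      55299  1 qla2xxx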

Each of the physical servers has two HBAs installed. Whilst most HBA manufacturers offer software to check the status of the HBAs (for example, QLogic offer SANSurfer), I prefer to check the output of the dmesg command, or /var/log/dmesg, for appropriate detection messages. The correct detection of two QLogic HBAs by the OS should look something like the following:
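A quick way to check is to search the kernel ring buffer for the driver's initialisation messages, for example:

    # dmesg | egrep -i 'qlogic|qla2'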

Once you are happy that the Operating System has successfully detected the HBAs and loaded the appropriate kernel modules, you can proceed. If the HBAs were installed after Operating System installation, you should ensure that you follow the steps provided with your HBA documentation to have them made available to the Operating System. Most common HBAs already have appropriate modules bundled with the OS, so it may just be a case of enabling/configuring them in /etc/modprobe.conf.

Multipath Configuration

The next step is to configure multipathd, which manages multipath I/O access to the shared storage from each node. The actual multipathd configuration will vary depending on which SAN or other storage technology is being used, and should therefore be configured according to your storage array documentation. Our servers connect back to an IBM SAN Volume Controller (product 2145), which leads to a multipath configuration as follows:
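The stanza below is an illustrative sketch only, based on commonly published settings for the SVC (vendor IBM, product 2145); take the exact values, and any blacklist entries, from your own array documentation:

    defaults {
        user_friendly_names yes
    }

    devices {
        device {
            vendor                 "IBM"
            product                "2145"
            path_grouping_policy   group_by_prio
            prio                   alua
            path_checker           tur
            failback               immediate
            no_path_retry          5
        }
    }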

Once configured, start multipathd:
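On RHEL 6 this is done with the service command:

    # service multipathd start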

Once started, verify that all storage paths are available with the multipath -ll command:
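For example (each path should be reported as active and ready):

    # multipath -ll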

Excellent – all paths are online. You can obtain a similar list (and access to far more functionality and configuration commands) from the multipathd -k interactive command prompt, using the list paths subcommand.
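For example:

    # multipathd -k
    multipathd> list paths
    multipathd> quit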

Once you have confirmed storage availability and that all paths are active from both nodes, you can configure multipathd to start automatically on system boot:
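For example:

    # chkconfig multipathd on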

LUN Partitioning

We will create a single partition on the LUN of type 8e (Linux LVM) – this will house a Clustered Logical Volume Manager (CLVM) physical volume. Perform this step from a single node only, substituting the appropriate device path in place of mpathb if needed:
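A minimal interactive fdisk session looks something like the following (the keystrokes are summarised; adjust the device path to suit your environment):

    # fdisk /dev/mapper/mpathb
    (n = new primary partition, accepting the defaults to use the whole LUN;
     t = change the partition type, entering 8e for Linux LVM;
     w = write the partition table and exit)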

Ensure that the appropriate device nodes/links have been created:
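For example:

    # ls -l /dev/mapper/mpathb*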

On the second node, run partprobe, and check that the new partition is detected, and the appropriate device nodes/links have been created:
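For example:

    # partprobe
    # ls -l /dev/mapper/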

As you can see, the devices have been created with different device names on each node – depending on the current udev state and configuration on the server in question.

We can, however, define aliases within /etc/multipath.conf to assign hard device names to the multipathed device(s).

Find the WWID for your LUN as follows:
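One way is to query scsi_id against the multipath device; the WWID is also shown on the first line of each map in multipath -ll output. The device path below is illustrative:

    # /lib/udev/scsi_id --whitelisted --device=/dev/mapper/mpathb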

Use the WWID you’ve gleaned to define the alias in /etc/multipath.conf as follows on all cluster nodes:
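For example, using a placeholder WWID:

    multipaths {
        multipath {
            wwid    360050768XXXXXXXXXXXXXXXXXXXXXXXX
            alias   mpathb
        }
    }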

Force a multipath devmap reload:
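For example:

    # multipath -r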

All nodes will now have the same device nodes and links created under /dev/mapper/mpathb* for the base LUN and the partition created earlier.

Further Preparation

Before proceeding with the cluster configuration, several other prerequisite tasks must be performed. It is imperative that date and time are synchronised across the cluster for correct operation. First, check that NTP is peering correctly, and that the date/time are correct:
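For example:

    # ntpq -p
    # date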

If this returns an error, or the date and time are not correct, configure NTP appropriately.
If the date/time is correct and synchronisation is occurring correctly, sync the time back to the hardware clock:
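For example:

    # hwclock --systohc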

As an extra precaution against DNS failure, add entries for both nodes to each node’s /etc/hosts file:
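For example (the hostnames and addresses shown are placeholders for your own):

    192.168.100.101   node1.example.com   node1
    192.168.100.102   node2.example.com   node2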

These two safeguards will help to ensure that the cluster operates smoothly.

Software Installation

With RHEL, ensure that you have correctly registered your system either via rhn_register, or rhnreg_ks if you prefer keeping things on the command line. With CentOS, this step is not required. If using RHEL, you’ll need to log into Red Hat Network, and apply your Resilient Storage entitlements to both nodes at this time. If you haven’t purchased Resilient Storage entitlements, this step will obviously fail – go and spend your dollars before returning to this article.

Install the following packages, and their dependencies, via yum:

  • gfs2-utils – Utilities for managing the Global File System 2 (GFS2)
  • lvm2-cluster – Cluster extensions for userland logical volume management tools
  • openais – The OpenAIS standards-based cluster framework executive and APIs
  • cman – Red Hat Cluster Manager
  • modcluster – Red Hat Cluster Suite remote management
  • rgmanager – Open-source HA resource group failover for Red Hat Cluster

You can install the packages and their dependencies via yum as follows:
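For example:

    # yum install gfs2-utils lvm2-cluster openais cman modcluster rgmanager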

Verify that all packages have been correctly installed:
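For example:

    # rpm -q gfs2-utils lvm2-cluster openais cman modcluster rgmanager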

Now that the cluster framework and all supporting packages are installed, we can proceed to cluster configuration.

Cluster Configuration

The values supplied here will vary depending upon your site configuration. I find the easiest method to configure the cluster is to modify the cluster configuration file (/etc/cluster/cluster.conf) directly. There are tools (command line and GUI) available to create and edit this file, however I find a quick bit of vi-hackery the easiest way to get this job done.

Create /etc/cluster/cluster.conf on each node with the following contents (of course, substituting appropriate values depending on your configuration):
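The following is an illustrative skeleton only; the cluster name (gfscluster), node names and fence device name are placeholders, and the device path matches the partition created earlier:

    <?xml version="1.0"?>
    <cluster name="gfscluster" config_version="1">
      <cman two_node="1" expected_votes="1" broadcast="yes"/>
      <clusternodes>
        <clusternode name="node1.example.com" nodeid="1">
          <fence>
            <method name="fence_scsi">
              <device name="scsi0"/>
            </method>
          </fence>
          <unfence>
            <device name="scsi0" action="on"/>
          </unfence>
        </clusternode>
        <clusternode name="node2.example.com" nodeid="2">
          <fence>
            <method name="fence_scsi">
              <device name="scsi0"/>
            </method>
          </fence>
          <unfence>
            <device name="scsi0" action="on"/>
          </unfence>
        </clusternode>
      </clusternodes>
      <fencedevices>
        <fencedevice agent="fence_scsi" name="scsi0" devices="/dev/mapper/mpathbp1"/>
      </fencedevices>
      <rm/>
    </cluster>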

A few of these configuration directives are worth examining further. Ensure that both nodes have unique nodeid values; failure to do so will result in a split-brain cluster, essentially two single-node clusters running in parallel. Use the same hostnames for each clusternode as defined in /etc/hosts and DNS. Supply a partition, rather than the base device, to the devices attribute of the fence_scsi agent; in our case, this is /dev/mapper/mpathbp1. For ease of configuration (and as this is the only cluster on this subnet), broadcast is used for the cluster heartbeat. Because this is a two-node cluster, a couple of workarounds are required for correct quorum operation: set the two_node flag to 1 (enabling it), and set expected_votes to 1. This configuration circumvents the need for qdiskd, or a third node, to establish quorum.

Ensure that the fence and unfence methods are correct; otherwise the cluster will fail to fence properly, and nodes will not form quorum or join and leave the cluster cleanly.

If, during testing, a node does not fence correctly, you can manually acknowledge the failed fencing operation with fence_ack_manual. This allows a two-node cluster to form with a single node from a cold startup if the second node is in an inconsistent or failed state. You can check for fencing (and general cluster) log messages in /var/log/messages:
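For example:

    # grep -i fence /var/log/messages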

Validate the configuration before enabling it with ccs_config_validate:
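For example:

    # ccs_config_validate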

If ccs_config_validate doesn’t return errors, the cluster is correctly configured and is ready for its initial startup.

Cluster Startup

First, start the cluster manager, cman. This is the core cluster service, and will spawn the various low-level daemons required (fenced, the fencing daemon; corosync, the core cluster engine; and so on). Unless otherwise noted, all steps should be performed on both nodes. Start cman via the service command:
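For example:

    # service cman start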

Next, start rgmanager. No actual resources or resource groups are required for GFS2, but this daemon is included and started for completeness of the RHCS stack:
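For example:

    # service rgmanager start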

Enable the correct LVM locking_type for clustering. This updates the value of locking_type in /etc/lvm/lvm.conf from its default value of 1 (local, file-based locking) to 3 (built-in clustered locking).
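The lvmconf helper from the lvm2-cluster package makes this change for you:

    # lvmconf --enable-cluster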

Start the clustered LVM daemon:
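For example:

    # service clvmd start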

If no errors are experienced at this point, the core cluster is ready for use. Enable all cluster services to start automatically on system boot:
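For example:

    # chkconfig cman on
    # chkconfig clvmd on
    # chkconfig rgmanager on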

LVM Configuration

In order to provide a logical volume for the creation of our GFS2 filesystem, we must first create a new LVM physical volume on our shared storage (/dev/mapper/mpathbp1). Do this with the pvcreate command. These steps must be performed from a single node:
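For example:

    # pvcreate /dev/mapper/mpathbp1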

Create a new volume group, vg_shared, and ensure that you specify -c y to create a clustered volume:
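For example:

    # vgcreate -c y vg_shared /dev/mapper/mpathbp1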

Next, create an appropriately sized logical volume for your GFS2 filesystem. Our partition is 50GB in size:
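For example, using all free space in the volume group (the logical volume name lv_shared is illustrative):

    # lvcreate -n lv_shared -l 100%FREE vg_shared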

On the other node, ensure that the logical volume is available:
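For example:

    # lvscan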

If there are any issues, rescan the various LVM components:
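For example:

    # pvscan
    # vgscan
    # lvscan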

If, during lvscan, the new volume is listed as inactive, run the following command to activate it:
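For example:

    # lvchange -a y vg_shared/lv_shared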

The logical volume is now ready to receive the GFS2 filesystem.

GFS Configuration

Use the mkfs.gfs2 command to create the GFS2 filesystem. Ensure that lock_dlm is used for the locking protocol, and the first part of the LockTableName (specified with -t <clustername>:<fsname>) matches the cluster name defined in /etc/cluster/cluster.conf. Again, run this from one node only:
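For example, using the illustrative cluster name gfscluster from the configuration sketch above and a filesystem name of shared:

    # mkfs.gfs2 -p lock_dlm -t gfscluster:shared -j 4 /dev/vg_shared/lv_shared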

I created four journals, which will allow four nodes to mount the filesystem. Additional journals can be added later with the gfs2_jadd command should more nodes be required. Each journal consumes additional space on the GFS2 filesystem, which should be taken into account when sizing volumes.
Test mounting the volume on both nodes.
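For example, using a temporary mountpoint:

    # mkdir -p /shared/tmpmount
    # mount -t gfs2 /dev/vg_shared/lv_shared /shared/tmpmount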

If the test is successful, unmount the filesystem:
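For example:

    # umount /shared/tmpmount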

Update /etc/fstab on both nodes with the appropriate filesystem configuration. Ensure that you do NOT allow the system to fsck the filesystem on boot, otherwise it may attempt to check a filesystem mounted by another node. Also ensure that the noatime and nodiratime mount options are specified; these significantly improve GFS2 performance by disabling updates of file and directory access times, which are not usually required.
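An example fstab entry, using the paths from above (the final 0 disables boot-time fsck):

    /dev/vg_shared/lv_shared  /shared/tmpmount  gfs2  noatime,nodiratime  0 0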

Mount the filesystem on both nodes:
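For example:

    # mount /shared/tmpmount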

The filesystem is now mounted to both nodes, and is correctly locked and clustered.
You can now enable the automatic startup of GFS2:
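For example:

    # chkconfig gfs2 on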

Final Validation

Reboot both cluster nodes, and validate correct operation, reviewing system boot messages:
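For example, searching the logs for cluster-related messages after the reboot:

    # egrep -i 'fenced|dlm|gfs2' /var/log/messages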

Once both nodes are back up, run the following commands to verify cluster status:
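For example:

    # clustat
    # cman_tool status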

If no issues are noted, you are done! You will probably want a more sensible mountpoint than /shared/tmpmount, but as this is being done in a lab environment, it is suitable for my needs.
Check /var/log/messages should any issues be evident, and resolve them.

Conclusion

This article has walked through the preparation of shared storage, the installation of Red Hat Cluster Suite and the Global File System, the configuration of a simple two-node cluster, and the creation and mounting of a cluster-wide shared filesystem.

RHCS is a very complex suite of software, capable of meeting the most demanding high-availability requirements. If you want to learn more, consult the appropriate Red Hat documentation.