Tag Archives: clustering

Configuring GFS2 on CentOS 7

This article will briefly discuss how to configure a GFS2 shared filesystem across two nodes on CentOS 7. Rather than rehashing a lot of previous content, this article presumes that you have followed the steps in my previous article, in order to configure the initial cluster and storage, up to and including the configuration of the STONITH device – but no further. All other topology considerations, device paths/layouts, etc. are the same, and the cluster nodes are still centos05 and centos07. The cluster name is webcluster and the 8GB LUN is presented as /dev/disk/by-id/wwn-0x60014055f0cfae3d6254576932ddc1f7 upon which a single partition has been created: /dev/disk/by-id/wwn-0x60014055f0cfae3d6254576932ddc1f7-part1.

First, install the lvm2-cluster and gfs2-utils packages:
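
A minimal sketch of the install step, to be run on both nodes. The `run` wrapper is a hypothetical dry-run helper (not part of the original article): it only prints each command unless `APPLY=1` is set in the environment.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Install the clustered LVM and GFS2 userspace tools (repeat on each node).
install_gfs2_packages() {
    run yum install -y lvm2-cluster gfs2-utils
}

install_gfs2_packages
```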

Enable clustered locking for LVM, and reboot both nodes:
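
As a sketch, clustered locking can be switched on with `lvmconf`, which ships with lvm2-cluster. The `run` wrapper below is a hypothetical dry-run helper, so nothing is changed (and no reboot happens) unless `APPLY=1` is set.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Run on both nodes.
enable_clustered_lvm() {
    run lvmconf --enable-cluster   # sets locking_type = 3 in /etc/lvm/lvm.conf
    run reboot
}

enable_clustered_lvm
```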

Create clone resources for DLM and CLVMD, so that they can run on both nodes. Run pcs commands from a single node only:
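
The following sketch creates the two clones. The resource names `dlm` and `clvmd`, the 30s monitor interval and the `on-fail=fence` action are assumptions that follow common Red Hat examples; the `run` wrapper is a hypothetical dry-run helper that only prints unless `APPLY=1` is set.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Run from one node only; pcs names the clones dlm-clone and clvmd-clone.
create_lock_clones() {
    run pcs resource create dlm ocf:pacemaker:controld \
        op monitor interval=30s on-fail=fence clone interleave=true ordered=true
    run pcs resource create clvmd ocf:heartbeat:clvm \
        op monitor interval=30s on-fail=fence clone interleave=true ordered=true
}

create_lock_clones
```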

Create an ordering and a colocation constraint, so that DLM starts before CLVMD, and both resources start on the same node:
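
A sketch of the two constraints, assuming the clones are named `dlm-clone` and `clvmd-clone` (names not confirmed by the original text). The `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

constrain_lock_clones() {
    # DLM must be running before CLVMD starts ...
    run pcs constraint order start dlm-clone then clvmd-clone
    # ... and CLVMD may only run where DLM is running.
    run pcs constraint colocation add clvmd-clone with dlm-clone
}

constrain_lock_clones
```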

Check the status of the clone resources:

Set the no-quorum-policy of the cluster to freeze so that when quorum is lost, the remaining partition will do nothing until quorum is regained – GFS2 requires quorum to operate.
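
As a sketch, the property can be set with a single `pcs` call from any one node. The `run` wrapper is a hypothetical dry-run helper that prints rather than executes unless `APPLY=1` is set.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

freeze_on_quorum_loss() {
    run pcs property set no-quorum-policy=freeze
}

freeze_on_quorum_loss
```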

Create the LVM objects as required, again, from a single cluster node:
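
A sketch of the LVM layout. The partition path comes from the introduction above; the volume group name `clustervg`, the logical volume name `clusterlv` and the 4G size are assumptions chosen for illustration. The `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

PART=/dev/disk/by-id/wwn-0x60014055f0cfae3d6254576932ddc1f7-part1
VG=clustervg    # assumed name
LV=clusterlv    # assumed name

create_clustered_lvm() {
    run pvcreate "$PART"
    run vgcreate -Ay -cy "$VG" "$PART"   # -cy marks the volume group as clustered
    run lvcreate -L 4G -n "$LV" "$VG"    # assumed size, leaves room to grow later
}

create_clustered_lvm
```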

Create the GFS2 filesystem. The -t option should be specified as <clustername>:<fsname>, and the right number of journals should be specified (here 2 as we have two nodes accessing the filesystem):
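
The sketch below shows how the lock table name is assembled from the cluster name and a filesystem name. `webcluster` is the cluster name used in this article; the filesystem name `gfs2fs` and the device path are assumptions. The `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

CLUSTER=webcluster                 # cluster name from this article
FSNAME=gfs2fs                      # assumed filesystem name
JOURNALS=2                         # one journal per node that mounts the filesystem
DEVICE=/dev/clustervg/clusterlv    # assumed logical volume path

make_gfs2() {
    run mkfs.gfs2 -p lock_dlm -t "${CLUSTER}:${FSNAME}" -j "$JOURNALS" "$DEVICE"
}

make_gfs2
```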

We will not use /etc/fstab to specify the mount, rather we’ll use a Pacemaker-controlled resource:
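
A sketch of the cloned Filesystem resource. The resource name `gfs2fs_res`, the device path, the `/mnt/gfs2` mount point and the monitor settings are all assumptions; the mount point must already exist on both nodes. The `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

create_gfs2_resource() {
    run pcs resource create gfs2fs_res Filesystem \
        device=/dev/clustervg/clusterlv directory=/mnt/gfs2 fstype=gfs2 \
        options=noatime,nodiratime \
        op monitor interval=10s on-fail=fence \
        clone interleave=true
}

create_gfs2_resource
```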

This is configured as a clone resource so it will run on both nodes at the same time. Confirm that the mount has succeeded on both nodes:

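Note the use of noatime and nodiratime, which yield a performance benefit by avoiding an access-time update on every read (on Linux, noatime already implies nodiratime, so the second option is redundant but harmless). As per the Red Hat documentation, SELinux should be disabled too.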

Next, create an ordering constraint so that the filesystem resource is started after the CLVMD resource, and a colocation constraint so that both start on the same node:
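
A sketch of these constraints, assuming the filesystem clone is named `gfs2fs_res-clone` and the CLVMD clone is named `clvmd-clone` (both names are illustrative). The `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

constrain_gfs2_resource() {
    # The clustered volume must be active before the filesystem is mounted.
    run pcs constraint order start clvmd-clone then gfs2fs_res-clone
    run pcs constraint colocation add gfs2fs_res-clone with clvmd-clone
}

constrain_gfs2_resource
```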

And we’re done.

We can even grow the filesystem online:
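
As a sketch, growing is a two-step operation: extend the logical volume, then tell GFS2 to use the new space via the mounted filesystem. The device path, mount point and the +1G increment are assumptions; the `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

grow_gfs2() {
    run lvextend -L +1G /dev/clustervg/clusterlv   # run on one node only
    run gfs2_grow /mnt/gfs2                        # filesystem must be mounted
}

grow_gfs2
```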

Building a Highly-Available Apache Cluster on CentOS 7

This article will walk through the steps required to build a highly-available Apache cluster on CentOS 7. In CentOS 7 (as in Red Hat Enterprise Linux 7) the cluster stack has moved to Pacemaker/Corosync, with a new command line tool to manage the cluster (pcs, replacing commands such as ccs and clusvcadm in earlier releases).

The cluster will be a two node cluster comprising nodes centos05 and centos07, and iSCSI shared storage will be presented from node fedora01. There will be an 8GB LUN presented for shared storage, and a 1GB LUN for fencing purposes. I have covered setting up iSCSI storage with SCSI-3 persistent reservations in a previous article. There is no need to use CLVMD in this example as we will be utilising a simple failover filesystem instead.

The first step is to add appropriate entries to /etc/hosts on both nodes for all nodes, including the storage node, to safeguard against DNS failure:

Next, bring both cluster nodes fully up-to-date, and reboot them:


SCSI-3 Persistent Reservations on Fedora Core 20 with targetcli over iSCSI and Red Hat Cluster

In this article, I’ll show how to set up SCSI-3 Persistent Reservations on Fedora Core 20 using targetcli, serving a pair of iSCSI LUNs to a simple Red Hat Cluster that will host a failover filesystem for the purposes of testing the iSCSI implementation. The Linux IO target (LIO) (http://linux-iscsi.org/wiki/LIO) has been the Linux SCSI target since kernel version 2.6.38. It supports a rapidly growing number of fabric modules, and all existing Linux block devices as backstores. For the purposes of our demonstration, the important fact is that it supports operating as an iSCSI target. targetcli is the tool used to perform the LIO configuration. SCSI-3 persistent reservations are required for a number of cluster storage configurations for I/O fencing and failover/retakeover. Therefore, LIO can be used as the foundation for high-end clustering solutions such as Red Hat Cluster Suite. You can read more about persistent reservations here.

The nodes in the lab are as follows:

  • – Red Hat Cluster node 1 on CentOS 6.5
  • – Red Hat Cluster node 2 on CentOS 6.5
  • fedora01 – Fedora Core 20 storage node


I’ll start by installing targetcli onto fedora01:

Let’s check that it has been installed correctly:

Make sure that, before proceeding, any existing configuration is removed:


GFS2 Implementation Under RHEL

This article will demonstrate setting up a simple RHCS (Red Hat Cluster Suite) two-node cluster, with an end goal of having a 50GB LUN shared between two servers, thus providing clustered shared storage to both nodes. This will enable applications running on the nodes to write to a shared filesystem, perform correct locking, and ensure filesystem integrity.

This type of configuration is central to many active-active application setups, where both nodes share a central content or configuration repository.

For this article, two RHEL 6.1 nodes, running on physical hardware (IBM blades) were used. Each node has multiple paths back to the 50GB SAN LUN presented, and multipathd will be used to manage path failover and rebuild in the event of interruption.


Clustering with DRBD, Corosync and Pacemaker


This article will cover the build of a two-node high-availability cluster using DRBD (RAID1 over TCP/IP), the Corosync cluster engine, and the Pacemaker resource manager on CentOS 6.4. There are many applications for this type of cluster – as a free alternative to RHCS, for example. However, this example does have a couple of caveats. As this is being built in a lab environment on KVM guests, there will be no STONITH (Shoot The Other Node In The Head, a type of fencing). If this cluster goes split-brain, manual recovery may be required: telling DRBD which node is primary and which is secondary, and so on. In a Production environment, we’d use STONITH to connect to ILOMs (for example) and power off or reboot a misbehaving node. Quorum will also need to be disabled, as this stack doesn’t yet support the use of quorum disks – if you want that, go with RHCS (and use cman with the two_node parameter, with or without qdiskd).

This article, as always, presumes that you know what you are doing. The nodes used in this article are as follows:

  • – rhcs-node01.local – first cluster node – running CentOS 6.4
  • – rhcs-node02.local – second cluster node – running CentOS 6.4
  • – failover IP address

DRBD will be used to replicate a volume between the two nodes (in a Master/Slave fashion), and the hosts will eventually run the nginx webserver in a failover topology, with this example having documents being served from the replicated volume.

Ideally, four network interfaces per host should be used (1 for “standard” node communications, 1 for DRBD replication, 2 for Corosync), but for a lab environment a single interface per node is fine.

Let’s start the build …


Configuring Transitive IPMP on Solaris 11

We all know the pain of configuring probe-based IPMP under Solaris, with a slew of test addresses being required, and a long line of ifconfig configuration in our /etc/hostname.<interface> files.

With Solaris 11, there is a new type of probe-based IPMP called transitive probing. This new type of probing does not require test addresses, as per the documentation: “Transitive probes are sent by the alternate interfaces in the group to probe the active interface. An alternate interface is an underlying interface that does not actively receive any inbound IP packets”.

In this article, I will configure failover (active/passive) IPMP on clusternode1 (the first node of a Solaris Cluster I’m building). Interface net0 already has an address (configured at install time), and I’ll be adding it into an IPMP group ipmp0 along with a standby interface, net1. Make sure you are performing these steps via a console connection, as the original address associated with net0 will need to be removed before attempting to add it to an IPMP group.

First, ensure that there is an entry in /etc/hosts for the IP address you’re configuring IPMP for:

Next, ensure that automatic network configuration is disabled. In my case it already was, as I’d configured networking manually during the installation of Solaris 11:

Verify that the appropriate physical interfaces are available. In the following output, I’ll be bonding e1000g0 (net0) and e1000g1 (net1) into a failover IPMP group.

List the current addresses – from the output of ipadm show-addr I can see that I’ll need to delete net0/v4 and net0/v6, otherwise I’ll be unable to add net0 to the IPMP group.
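
A sketch of the clean-up, using the address object names mentioned above. The `run` wrapper is a hypothetical dry-run helper (not from the original article) that prints each command unless `APPLY=1` is set; run the real thing from the console, since it removes the address you may be connected through.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

clear_net0_addresses() {
    run ipadm show-addr              # review the address objects first
    run ipadm delete-addr net0/v4
    run ipadm delete-addr net0/v6
}

clear_net0_addresses
```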

As the net0 IP interface is already created, I only need to create the net1 interface:

I can then create the IPMP group, which I’ll call ipmp0:
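
One way to sketch this is the two-step form: create the empty group, then add the underlying interfaces. Whether the original used this form or the single-command `-i` variant is an assumption; the `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

create_ipmp_group() {
    run ipadm create-ipmp ipmp0
    run ipadm add-ipmp -i net0 -i net1 ipmp0   # both IP interfaces must already exist
}

create_ipmp_group
```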

Next, enable transitive probing, which is disabled by default:
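
As a sketch, transitive probing is a property of the IPMP SMF service, so it is toggled with `svccfg` and picked up with a refresh. The `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

enable_transitive_probing() {
    run svccfg -s svc:/network/ipmp setprop config/transitive-probing=true
    run svcadm refresh svc:/network/ipmp:default   # make the service re-read its properties
}

enable_transitive_probing
```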

And configure the appropriate interface (in my case net1) to be a standby interface (as I’m using failover):
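
A sketch of marking net1 as the standby interface via its IP interface property. The `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

set_standby_interface() {
    run ipadm set-ifprop -p standby=on -m ip net1
}

set_standby_interface
```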

Now I can create my IPv4 address on the IPMP group:
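
A sketch of the address creation, assuming the hostname clusternode1 resolves through the /etc/hosts entry added earlier and that the network uses a /24 prefix (the prefix length is an assumption, not stated in the article). The `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

PREFIX=24   # assumed prefix length; adjust to match your network

create_ipmp_address() {
    run ipadm create-addr -T static -a "clusternode1/${PREFIX}" ipmp0/v4
}

create_ipmp_address
```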

Finally, fix the default route. I removed the existing route and added a new default route using the new and correct interface – ipmp0:

You can use ipmpstat to verify the configuration and health of the IPMP group:

Let’s perform a failover test. I’ll disable net0 and ensure that the clusternode1 address fails over:
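
A sketch of the test: take net0 offline with `if_mpadm`, then check where the data address now lives. Reinstating the interface afterwards uses `if_mpadm -r net0`. The `run` wrapper is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

test_ipmp_failover() {
    run if_mpadm -d net0   # offline net0, forcing a failover to the standby
    run ipmpstat -an       # show which interface each data address is bound to
}

test_ipmp_failover
```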

It works! (and my SSH connection is still active…) – net1 is now active with the correct IP address. Let’s fail it back:

The address has failed back to net0, and again my SSH connection is still active. I can now continue with clusternode2, and the rest of the cluster install.


MySQL Cluster: Adding New Data Nodes Online

MySQL Cluster has a pretty cool feature that allows you to add new data nodes whilst the cluster is online, thus avoiding any downtime. This is incredibly useful for scaling out the data nodes and adding additional node groups. In this article, I’ll show how to add two new data nodes to an existing cluster that has two data nodes defined. I’ll also explain what needs to happen after the configuration change to ensure that any existing data is correctly partitioned across the new nodes.


Solaris Cluster 4.1 Part Four: Highly Available Containers


The previous article covered the configuration of two resource groups, each containing a failover zpool for use as the zonepath to a highly-available zone, and a failover IP address to be assigned to each zone. The two zones were also configured and installed, and we verified that they could be booted on either node of the cluster, provided that the storage had been failed over appropriately and was available on the node where the zone was being booted.

This final part in the series will cover the incorporation of the zone boot/shutdown/failover into the cluster framework, as well as the configuration of two iPlanet resources to illustrate how Solaris Cluster can manage SMF services deployed within a highly-available Solaris zone.

Highly-Available Zones

First, install the ha-zones data service, if you haven’t done so already. I installed the full cluster package suite, so already have all data services at my disposal:

Register the SUNW.gds resource type:

This is the Generic Data Service that is utilised by SUNWsczone (HA for Solaris Containers) for deploying highly-available zones. SUNWsczone supplies three highly-available mechanisms for zone deployment – sczbt (zone boot – used to start/stop/failover zones), sczsh (zone script resource – used for deploying highly-available services within zones, with start/stop scripts to control them) and sczsmf (zone SMF resource, used for deploying highly-available services within zones, with SMF services to control them). We’ll be using both sczbt and sczsmf.


Solaris Cluster 4.1 Part Three: Cluster Resources


In my previous article, we ended up with a working cluster, with all appropriate cluster software installed. In this article, I’ll start to configure cluster resources. I want to configure two resource groups, ha-zone-1-rg and ha-zone-2-rg. Each resource group will contain a highly-available failover filesystem, a highly-available failover IP address and a highly-available Solaris Zone. I’ll illustrate the process for cloning a zone to save on installation time: zones in Solaris 11 now use IPS and, unless you have a local IPS repository, will connect to http://pkg.oracle.com to download all appropriate packages during zone installation – not something you want to repeat too many times.

A summary of the resources/resource groups I’m looking to create is as follows:

  • ha-zone-1-rg – Resource group for the first set of failover resources
  • ha-zone-1-hasp – a SUNW.HAStoragePlus resource for the first failover zpool used for the zonepath for the first failover zone, ha-zone-1
  • ha-zone-1-lh-res – a SUNW.LogicalHostname resource for the first failover zone
  • ha-zone-1-res – a SUNW.gds resource, coupled with SUNWsczone/sczbt zone boot registration to create a highly-available zone, ha-zone-1
  • ha-zone-1-http-admin-smf-res – a SUNW.gds resource, coupled with SUNWsczone/sczsmf zone SMF service registration to create a highly-available iPlanet admin server instance
  • ha-zone-1-http-instance-smf-res – a SUNW.gds resource, coupled with SUNWsczone/sczsmf zone SMF service registration to create a highly-available iPlanet instance
  • ha-zone-2-rg – Resource group for the second set of failover resources
  • ha-zone-2-hasp – a SUNW.HAStoragePlus resource for the second failover zpool used for the zonepath for the second failover zone, ha-zone-2
  • ha-zone-2-lh-res – a SUNW.LogicalHostname resource for the second failover zone
  • ha-zone-2-res – a SUNW.gds resource, coupled with SUNWsczone/sczbt zone boot registration to create a highly-available zone, ha-zone-2

This article will cover a lot of ground, much more so than the previous two parts. By the end of the article, you will see two HA resource groups in action, each with a failover zpool and logical hostname resource. I’ll also install the two zones, but won’t make them HA as yet – that’ll be in the next part of the series, as will the configuration of the HA SMF iPlanet resources.

As always, ensure that you read the Oracle Solaris Cluster 4.1 documentation library for full details.

Let’s make a start …


Solaris Cluster 4.1 Part Two: iSCSI, Quorum Server, and Cluster Software Installation


The previous article in this series covered the initial preparation of our two cluster nodes, and the storage server. This article follows on from this by performing more work on the storage server – configuring the iSCSI LUNs that’ll be exported to our cluster nodes as shared disk devices, as well as installing the Solaris Cluster Quorum Server software. Then we move onto the cluster nodes, and install Solaris Cluster 4.1. By the end of this article, you’ll see an operational cluster – although it won’t have any resources created just yet.

iSCSI Configuration

Before we can configure iSCSI (which now requires COMSTAR configuration in Solaris 11), the appropriate package group needs to be installed – group/feature/storage-server. Install this package group on the storage server:

This will install quite a few packages (including things like AVS, Infiniband, Samba, etc.) but is the recommended method in the Oracle documentation. In any case, it provides the packages we want: scsi-target-mode-framework and iscsi/iscsi-target – and meets any dependencies. As an aside, you can find out what package owns a file via pkg search -l <filename> or pkg search file::<filename>:

Once the packages are installed, enable the SCSI target mode framework SMF service:

At this point, I’ll add a second disk to the datapool zpool to ensure there’s plenty of capacity for ZFS volume creation:

Let’s check how much free space we have:

OK – that’ll do – 39.6GB. Next, I’ll create two ZFS volumes, one for each zone that I’ll be deploying to the cluster. Each volume will be used as a failover zpool by the cluster, and will provide storage for a single failover zone. 8GB will suffice for each volume:

ZFS volumes are datasets that represent block devices, and are treated as such. They are useful for things such as this (and swap space, dump devices, etc.).
