Tag Archives: High-Availability

Configuring GFS2 on CentOS 7

This article will briefly discuss how to configure a GFS2 shared filesystem across two nodes on CentOS 7. Rather than rehashing a lot of previous content, this article presumes that you have followed the steps in my previous article to configure the initial cluster and storage, up to and including the configuration of the STONITH device – but no further. All other topology considerations, device paths/layouts, etc. are the same, and the cluster nodes are still centos05 and centos07. The cluster name is webcluster, and the 8GB LUN is presented as /dev/disk/by-id/wwn-0x60014055f0cfae3d6254576932ddc1f7, upon which a single partition has been created: /dev/disk/by-id/wwn-0x60014055f0cfae3d6254576932ddc1f7-part1.

First, install the lvm2-cluster and gfs2-utils packages:
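For example, something like the following on both nodes:

# Install clustered LVM and the GFS2 utilities
yum install -y lvm2-cluster gfs2-utils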

Enable clustered locking for LVM, and reboot both nodes:
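For example, on both nodes:

# Switch LVM to clustered locking (sets locking_type = 3 in /etc/lvm/lvm.conf)
lvmconf --enable-cluster
reboot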

Create clone resources for DLM and CLVMD, so that they can run on both nodes. Run pcs commands from a single node only:
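The commands look something like this (the resource names dlm and clvmd are arbitrary):

pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence \
    clone interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence \
    clone interleave=true ordered=true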

Create an ordering and a colocation constraint, so that DLM starts before CLVMD, and both resources start on the same node:
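For example, using the clone names created above:

pcs constraint order start dlm-clone then clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone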

Check the status of the clone resources:
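Both clones should report as Started on both nodes; for example:

pcs status resources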

Set the no-quorum-policy of the cluster to freeze so that, when quorum is lost, the remaining partition will do nothing until quorum is regained – GFS2 requires quorum to operate.
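For example:

pcs property set no-quorum-policy=freeze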

Create the LVM objects as required, again, from a single cluster node:
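Something along these lines, assuming a volume group named vg_gfs2 and a 4GB logical volume named lv_gfs2 (the names and size are arbitrary – note the -cy flag, which marks the volume group as clustered):

pvcreate /dev/disk/by-id/wwn-0x60014055f0cfae3d6254576932ddc1f7-part1
vgcreate -Ay -cy vg_gfs2 /dev/disk/by-id/wwn-0x60014055f0cfae3d6254576932ddc1f7-part1
lvcreate -L 4G -n lv_gfs2 vg_gfs2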

Create the GFS2 filesystem. The -t option should be specified as <clustername>:<fsname>, and the right number of journals should be specified (here 2 as we have two nodes accessing the filesystem):
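For example, with the cluster name webcluster and an arbitrary filesystem name of gfs2web:

mkfs.gfs2 -p lock_dlm -t webcluster:gfs2web -j 2 /dev/vg_gfs2/lv_gfs2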

We will not use /etc/fstab to specify the mount; instead, we'll use a Pacemaker-controlled resource:
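Something like the following (the resource name and the /mnt/gfs2 mount point are arbitrary):

pcs resource create gfs2fs ocf:heartbeat:Filesystem device=/dev/vg_gfs2/lv_gfs2 \
    directory=/mnt/gfs2 fstype=gfs2 options=noatime,nodiratime \
    op monitor interval=10s on-fail=fence clone interleave=true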

This is configured as a clone resource so it will run on both nodes at the same time. Confirm that the mount has succeeded on both nodes:
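For example, on each node:

mount | grep gfs2
df -hT /mnt/gfs2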

Note the use of noatime and nodiratime, which will yield a performance benefit. As per the Red Hat documentation, SELinux should be disabled too.

Next, create an ordering constraint so that the filesystem resource is started after the CLVMD resource, and a colocation constraint so that both start on the same node:
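For example:

pcs constraint order start clvmd-clone then gfs2fs-clone
pcs constraint colocation add gfs2fs-clone with clvmd-clone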

And we’re done.

We can even grow the filesystem online:
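Assuming some free extents remain in the volume group, something like the following extends the logical volume and then grows GFS2 while it is mounted:

lvextend -l +100%FREE /dev/vg_gfs2/lv_gfs2
gfs2_grow /mnt/gfs2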

Building a Highly-Available Apache Cluster on CentOS 7

This article will walk through the steps required to build a highly-available Apache cluster on CentOS 7. In CentOS 7 (as in Red Hat Enterprise Linux 7) the cluster stack has moved to Pacemaker/Corosync, with a new command line tool to manage the cluster (pcs, replacing commands such as ccs and clusvcadm in earlier releases).

The cluster will be a two-node cluster comprising nodes centos05 and centos07, and iSCSI shared storage will be presented from node fedora01. There will be an 8GB LUN presented for shared storage, and a 1GB LUN for fencing purposes. I have covered setting up iSCSI storage with SCSI-3 persistent reservations in a previous article. There is no need to use CLVMD in this example as we will be utilising a simple failover filesystem instead.

The first step is to add appropriate entries to /etc/hosts on both nodes for all nodes, including the storage node, to safeguard against DNS failure:
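For example (the addresses below are placeholders – substitute your own):

# /etc/hosts on both cluster nodes
<ip-of-centos05>   centos05
<ip-of-centos07>   centos07
<ip-of-fedora01>   fedora01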

Next, bring both cluster nodes fully up-to-date, and reboot them:
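For example, on both nodes:

yum update -y && reboot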

Continue reading

Clustering with DRBD, Corosync and Pacemaker

Introduction

This article will cover the build of a two-node high-availability cluster using DRBD (RAID1 over TCP/IP), the Corosync cluster engine, and the Pacemaker resource manager on CentOS 6.4. There are many applications for this type of cluster – as a free alternative to RHCS, for example. However, this example does have a couple of caveats. As it is being built in a lab environment on KVM guests, there will be no STONITH (Shoot The Other Node In The Head, a type of fencing). If this cluster goes split-brain, manual intervention may be required to recover: telling DRBD which node is primary and which is secondary, and so on. In a production environment, we'd use STONITH to connect to ILOMs (for example) and power off or reboot a misbehaving node. Quorum will also need to be disabled, as this stack doesn't yet support the use of quorum disks – if you want that, go with RHCS (and use cman with the two_node parameter, with or without qdiskd).

This article, as always, presumes that you know what you are doing. The nodes used in this article are as follows:

  • 192.168.122.30 – rhcs-node01.local – first cluster node – running CentOS 6.4
  • 192.168.122.31 – rhcs-node02.local – second cluster node – running CentOS 6.4
  • 192.168.122.33 – failover IP address

DRBD will be used to replicate a volume between the two nodes (in a Master/Slave fashion), and the hosts will eventually run the nginx webserver in a failover topology, with documents served from the replicated volume in this example.

Ideally, four network interfaces per host should be used (1 for “standard” node communications, 1 for DRBD replication, 2 for Corosync), but for a lab environment a single interface per node is fine.

Let’s start the build …

Continue reading

Secure MySQL Replication over SSL

MySQL is a popular open-source relational database management system. One of its core features is replication, and in this article I will show how to configure master and slave MySQL instances, and then configure replication from master to slave over SSL. Encryption helps protect the replication traffic from snooping. This type of replication has many uses: for example, disaster-recovery scenarios, where the slave can be switched to the master role in the event of a master outage; or performance, where all reads take place on the slave while writes and updates occur on the master. Replication can be configured without encryption, but encrypting with SSL is preferred as part of a defence-in-depth strategy – it's an extra layer of security.

This article already presumes a good working knowledge of MySQL. The master server is centosa with IP address 10.1.1.150, and is running a minimal installation of CentOS 6.4 x86_64. The slave, centosb, is running the same OS and has IP address 10.1.1.151. MySQL will be installed from the latest stable RPMs available at dev.mysql.com, rather than using the distribution-provided versions. The latest stable version available at the time of writing is 5.6.14.

This article will cover the configuration of an SSL-encrypted replicated environment from scratch – it does not cover the migration of an existing replicated configuration to an SSL-encrypted replicated configuration, or the migration of any existing data to a new slave.

Continue reading

MySQL Cluster: Adding New Data Nodes Online

MySQL Cluster has a pretty cool feature that allows you to add new data nodes whilst the cluster is online, thus avoiding any downtime. This is incredibly useful for scaling out the data nodes and adding additional node groups. In this article, I’ll show how to add two new data nodes to an existing cluster that has two data nodes defined. I’ll also explain what needs to happen after the configuration change to ensure that any existing data is correctly partitioned across the new nodes.

Continue reading

Solaris Cluster 4.1 Part Four: Highly Available Containers

Introduction

The previous article covered the configuration of two resource groups, each containing a failover zpool for use as the zonepath to a highly-available zone, and a failover IP address to be assigned to each zone. The two zones were also configured and installed, and we verified that they could be booted on either node of the cluster, provided that the storage had been failed over appropriately and was available on the node where the zone was being booted.

This final part in the series will cover the incorporation of the zone boot/shutdown/failover into the cluster framework, as well as the configuration of two iPlanet resources to illustrate how Solaris Cluster can manage SMF services deployed within a highly-available Solaris zone.

Highly-Available Zones

First, install the ha-zones data service, if you haven’t done so already. I installed the full cluster package suite, so already have all data services at my disposal:
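If it does need installing separately, the command is along these lines (the exact package FMRI here is an assumption – check pkg list 'ha-cluster*' for the precise name):

# Install the HA for Solaris Containers (ha-zones) data service – package name assumed
pkg install ha-cluster/data-service/ha-zones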

Register the SUNW.gds resource type:
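For example:

clresourcetype register SUNW.gds
clresourcetype list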

This is the Generic Data Service that is utilised by SUNWsczone (HA for Solaris Containers) for deploying highly-available zones. SUNWsczone supplies three mechanisms for highly-available zone deployment – sczbt (zone boot resource, used to start, stop and fail over zones), sczsh (zone script resource, used for deploying highly-available services within zones, with start/stop scripts to control them) and sczsmf (zone SMF resource, used for deploying highly-available services within zones, with SMF services to control them). We'll be using both sczbt and sczsmf.

Continue reading

Solaris Cluster 4.1 Part Three: Cluster Resources

Introduction

In my previous article, we ended up with a working cluster, with all appropriate cluster software installed. In this article, I'll start to configure cluster resources. I want to configure two resource groups, ha-zone-1-rg and ha-zone-2-rg. Each resource group will contain a highly-available failover filesystem, a highly-available failover IP address and a highly-available Solaris Zone. I'll illustrate the process for cloning a zone to save on installation time, as zones in Solaris 11 now use IPS and, unless you have a local IPS repository, will connect to http://pkg.oracle.com to download all appropriate packages during zone installation – not something you want to repeat too many times.

A summary of the resources/resource groups I’m looking to create is as follows:

  • ha-zone-1-rg – Resource group for the first set of failover resources
  • ha-zone-1-hasp – a SUNW.HAStoragePlus resource for the first failover zpool used for the zonepath for the first failover zone, ha-zone-1
  • ha-zone-1-lh-res – a SUNW.LogicalHostname resource for the first failover zone
  • ha-zone-1-res – a SUNW.gds resource, coupled with SUNWsczone/sczbt zone boot registration to create a highly-available zone, ha-zone-1
  • ha-zone-1-http-admin-smf-res – a SUNW.gds resource, coupled with SUNWsczone/sczsmf zone SMF service registration to create a highly-available iPlanet admin server instance
  • ha-zone-1-http-instance-smf-res – a SUNW.gds resource, coupled with SUNWsczone/sczsmf zone SMF service registration to create a highly-available iPlanet instance
  • ha-zone-2-rg – Resource group for the second set of failover resources
  • ha-zone-2-hasp – a SUNW.HAStoragePlus resource for the second failover zpool used for the zonepath for the second failover zone, ha-zone-2
  • ha-zone-2-lh-res – a SUNW.LogicalHostname resource for the second failover zone
  • ha-zone-2-res – a SUNW.gds resource, coupled with SUNWsczone/sczbt zone boot registration to create a highly-available zone, ha-zone-2

This article will cover a lot of ground, much more so than the previous two parts. By the end of the article, you will see two HA resource groups in action, each with a failover zpool and logical hostname resource. I’ll also install the two zones, but won’t make them HA as yet – that’ll be in the next part of the series, as will the configuration of the HA SMF iPlanet resources.
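To give a flavour of what's coming, a rough sketch of the commands for the first resource group might look like this (the zpool name ha-zone-1-pool is a placeholder):

# Create the first resource group
clresourcegroup create ha-zone-1-rg

# Register SUNW.HAStoragePlus (once per cluster) and add the failover zpool resource
clresourcetype register SUNW.HAStoragePlus
clresource create -g ha-zone-1-rg -t SUNW.HAStoragePlus \
    -p Zpools=ha-zone-1-pool ha-zone-1-hasp

# Add the logical hostname resource for the first zone's failover address
clreslogicalhostname create -g ha-zone-1-rg -h ha-zone-1 ha-zone-1-lh-res

# Bring the resource group online, enabling and managing its resources
clresourcegroup online -eM ha-zone-1-rg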

As always, ensure that you read the Oracle Solaris Cluster 4.1 documentation library for full details.

Let’s make a start …

Continue reading

Solaris Cluster 4.1 Part Two: iSCSI, Quorum Server, and Cluster Software Installation

Introduction

The previous article in this series covered the initial preparation of our two cluster nodes, and the storage server. This article follows on from this by performing more work on the storage server – configuring the iSCSI LUNs that’ll be exported to our cluster nodes as shared disk devices, as well as installing the Solaris Cluster Quorum Server software. Then we move onto the cluster nodes, and install Solaris Cluster 4.1. By the end of this article, you’ll see an operational cluster – although it won’t have any resources created just yet.

iSCSI Configuration

Before we can configure iSCSI (which now requires COMSTAR configuration in Solaris 11), the appropriate package group needs to be installed – group/feature/storage-server. Install this package group on the storage server:
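For example:

pkg install group/feature/storage-server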

This will install quite a few packages (including things like AVS, Infiniband, Samba, etc.) but is the recommended method in the Oracle documentation. In any case, it provides the packages we want: scsi-target-mode-framework and iscsi/iscsi-target – and meets any dependencies. As an aside, you can find out what package owns a file via pkg search -l <filename> or pkg search file::<filename>:

Once the packages are installed, enable the SCSI target mode framework SMF service:
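For example:

# Enable and verify svc:/system/stmf:default
svcadm enable stmf
svcs stmf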

At this point, I’ll add a second disk to the datapool zpool to ensure there’s plenty of capacity for ZFS volume creation:
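Something like the following, with the real device name substituted:

# Add a second disk to the existing pool (the device name is a placeholder)
zpool add datapool c2t1d0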

Let’s check how much free space we have:
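For example:

zfs list datapool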

OK – that’ll do – 39.6GB. Next, I’ll create two ZFS volumes, one for each zone that I’ll be deploying to the cluster. Each volume will be used as a failover zpool by the cluster, and will provide storage for a single failover zone. 8GB will suffice for each volume:
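For example (the volume names are arbitrary):

zfs create -V 8g datapool/ha-zone-1-vol
zfs create -V 8g datapool/ha-zone-2-vol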

ZFS volumes are datasets that represent block devices, and are treated as such. They are useful for things such as this (and swap space, dump devices, etc.).

Continue reading

Solaris Cluster 4.1 Part One: Initial Preparation

Introduction

This series of articles will cover the build of a two-node cluster running Solaris Cluster 4.1 on Solaris 11 x86. A few failover resource types will be introduced, and the end setup will have highly-available zones deployed to it. I’ll also install iPlanet on one of the zones, to illustrate how to incorporate an SMF service within a zone into the cluster-managed framework, after cloning it to save time in creating a second zone.

This is all being done under VMware Fusion 5.0 on Mac OS X. As VMware Fusion does not support disk sharing, I'll also build a third Solaris 11 node for use as an iSCSI target and Quorum Server. Later in the process, I'll remove the Quorum Server from the mix (since, on reflection, an iSCSI LUN can be used for this purpose). I've still kept the Quorum Server details in the article, however, as they're still of interest. Solaris 11 has totally changed the way iSCSI sharing happens, too. You have to configure COMSTAR – gone is zfs set shareiscsi=on :/

To start with, I installed Solaris 11/11 onto three VMs, clusternode1 and clusternode2 (each with 1.5GB RAM), and storagenode (with 1GB RAM). Both cluster nodes have a single 20GB disk for use as rpool, and the storagenode has a 20GB disk for rpool, and two additional 10GB disks for use as iSCSI LUNs on ZFS volumes. These LUNs will be presented to our cluster, and used for failover storage – which will then be used as the zonepaths for our highly-available zones. The node details are:

  • 10.1.1.70 – storageserver – iSCSI target and quorum server
  • 10.1.1.71 – ha-zone-1 – Zone to be provisioned to host iPlanet
  • 10.1.1.72 – ha-zone-2 – Zone to be provisioned to illustrate cloning
  • 10.1.1.80 – clusternode1 – Solaris Cluster 4.1 node
  • 10.1.1.90 – clusternode2 – Solaris Cluster 4.1 node

Each cluster node has four network interfaces, as follows:

  • net0 (e1000g0) – Public network
  • net1 (e1000g1) – Public network
  • net2 (e1000g2, vmnet2) – a private host-only network (with no other hosts on it)
  • net3 (e1000g3, vmnet4) – a private host-only network (with no other hosts on it)

net0 and net1 will be configured as an IPMP group with transitive probing. net2 and net3 will be used as private cluster interconnects. It is important that no other hosts are using the interconnect networks, otherwise the cluster installation software will detect the traffic and complain, as it could interfere with cluster communications.
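As a rough sketch of that IPMP configuration (the group name ipmp0 and the /24 prefix are assumptions; the address is clusternode1's from the list above – the full steps follow later in the series):

# Enable transitive probing for IPMP (no test addresses required)
svccfg -s svc:/network/ipmp setprop config/transitive-probing=true
svcadm refresh svc:/network/ipmp:default

# Group net0 and net1 into an IPMP group and assign the node's address to it
ipadm create-ip net0
ipadm create-ip net1
ipadm create-ipmp ipmp0
ipadm add-ipmp -i net0 -i net1 ipmp0
ipadm create-addr -T static -a 10.1.1.80/24 ipmp0/v4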

Let’s start with some basic preparation …

Continue reading

How to Cluster Oracle Weblogic 12c via the Command Line

In this article, I will show you how to create a two-node Weblogic 12c cluster using only the command line. Oracle Weblogic (formerly BEA Weblogic) is one of the most resilient, reliable and high-performance J2EE application servers that I’ve worked with. I’ve used it to host both custom applications, as well as commercial applications that required an enterprise-grade J2EE container to serve them.

The lab topology will be as follows:

Weblogic Server         Hostname   IP Address      Port   Cluster Name
AdminServer             dolan      172.16.18.169   7001   N/A
test_managed_server_1   dolan      172.16.18.169   7002   TestCluster
test_managed_server_2   gooby      172.16.18.172   7002   TestCluster

As you can see, AdminServer only runs on a single node. Once the managed servers have been configured and started for the first time by way of the administration server, they can be restarted independently – thus the loss of the administration server does not impact the general running of the cluster. It will, however, prevent administration of that cluster (configuration, deployments, etc.) until such time as the administration server is back online. There are ways around this (binding the administration server to a failover VIP provided by keepalived or similar), but for all but the most demanding usage a single administration server will suffice.

Both dolan and gooby run CentOS 6.3 x86_64. All steps should be run on both nodes unless otherwise noted.

Continue reading