In this article, I’ll show how to set up SCSI-3 Persistent Reservations on Fedora 20 using targetcli, serving a pair of iSCSI LUNs to a simple Red Hat Cluster that will host a failover filesystem for the purposes of testing the iSCSI implementation.
The Linux IO target (LIO) (http://linux-iscsi.org/wiki/LIO) has been the Linux SCSI target since kernel version 2.6.38. It supports a rapidly growing number of fabric modules, and all existing Linux block devices as backstores. For the purposes of our demonstration, the important fact is that it supports operating as an iSCSI target. targetcli is the tool used to perform the LIO configuration.
SCSI-3 persistent reservations are required by a number of cluster storage configurations for I/O fencing and failover/retakeover. LIO can therefore be used as the foundation for high-end clustering solutions such as Red Hat Cluster Suite. You can read more about persistent reservations here.
The nodes in the lab are as follows:
- 10.1.1.103 - centos03 - Red Hat Cluster node 1 on CentOS 6.5
- 10.1.1.104 - centos04 - Red Hat Cluster node 2 on CentOS 6.5
- 10.1.1.108 - fedora01 - Fedora 20 storage node
Installation
I’ll start by installing targetcli onto fedora01:
    [root@fedora01 ~]# yum -y install targetcli
Let’s check that it has been installed correctly:
    [root@fedora01 ~]# targetcli
    Warning: Could not load preferences file /root/.targetcli/prefs.bin.
    targetcli shell version 2.1.fb35
    Copyright 2011-2013 by Datera, Inc and others.
    For help on commands, type 'help'.

    /> ls
    o- / ...................................................................................... [...]
      o- backstores ........................................................................... [...]
      | o- block ............................................................... [Storage Objects: 0]
      | o- fileio .............................................................. [Storage Objects: 0]
      | o- pscsi ............................................................... [Storage Objects: 0]
      | o- ramdisk ............................................................. [Storage Objects: 0]
      o- iscsi ......................................................................... [Targets: 0]
      o- loopback ...................................................................... [Targets: 0]
      o- vhost ......................................................................... [Targets: 0]
    /> exit
    Global pref auto_save_on_exit=true
    Last 10 configs saved in /etc/target/backup.
    Configuration saved to /etc/target/saveconfig.json
Make sure that, before proceeding, any existing configuration is removed:
    [root@fedora01 ~]# targetcli clearconfig confirm=true
    All configuration cleared
Backing Store Configuration
I have attached a second disk to fedora01 that I will bring under LVM control. I’ll then create the iSCSI backing stores as logical volumes on this second disk.
    [root@fedora01 ~]# fdisk -l /dev/sdb
    Disk /dev/sdb: 20 GiB, 21474836480 bytes, 41943040 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
I’ll create the physical volume first, followed by a volume group - vg_data - on the new physical volume. I’ll then create two logical volumes, lv_fs_failover and lv_fs_fence. lv_fs_failover will be used by the cluster as a failover filesystem, and lv_fs_fence will be used as the fence device by the cluster via SCSI fencing and SCSI-3 persistent reservations.
    [root@fedora01 ~]# pvcreate /dev/sdb
      Physical volume "/dev/sdb" successfully created
    [root@fedora01 ~]# vgcreate vg_data /dev/sdb
      Volume group "vg_data" successfully created
    [root@fedora01 ~]# lvcreate -L 8G -n lv_fs_failover vg_data
      Logical volume "lv_fs_failover" created
    [root@fedora01 ~]# lvcreate -L 1G -n lv_fs_fence vg_data
      Logical volume "lv_fs_fence" created
Let’s verify that everything has been created as expected:
    [root@fedora01 ~]# lvs
      LV             VG      Attr       LSize  Pool Origin Data%  Move Log Cpy%Sync Convert
      root           fedora  -wi-ao---- 17.51g
      swap           fedora  -wi-ao----  2.00g
      lv_fs_failover vg_data -wi-a-----  8.00g
      lv_fs_fence    vg_data -wi-a-----  1.00g
    [root@fedora01 ~]# vgs
      VG      #PV #LV #SN Attr   VSize  VFree
      fedora    1   2   0 wz--n- 19.51g     0
      vg_data   1   2   0 wz--n- 20.00g 11.00g
    [root@fedora01 ~]# pvs
      PV         VG      Fmt  Attr PSize  PFree
      /dev/sda2  fedora  lvm2 a--  19.51g     0
      /dev/sdb   vg_data lvm2 a--  20.00g 11.00g
targetcli Configuration
Time to log into targetcli and begin configuration. We cd to /backstores/block - this is where we create the backing stores on the LVM logical volumes:
    [root@fedora01 system]# targetcli
    targetcli shell version 2.1.fb35
    Copyright 2011-2013 by Datera, Inc and others.
    For help on commands, type 'help'.

    /> ls
    o- / ...................................................................................... [...]
      o- backstores ........................................................................... [...]
      | o- block ............................................................... [Storage Objects: 0]
      | o- fileio .............................................................. [Storage Objects: 0]
      | o- pscsi ............................................................... [Storage Objects: 0]
      | o- ramdisk ............................................................. [Storage Objects: 0]
      o- iscsi ......................................................................... [Targets: 0]
      o- loopback ...................................................................... [Targets: 0]
      o- vhost ......................................................................... [Targets: 0]
    /> cd backstores
    /backstores> ls
    o- backstores ............................................................................. [...]
      o- block ................................................................. [Storage Objects: 0]
      o- fileio ................................................................ [Storage Objects: 0]
      o- pscsi ................................................................. [Storage Objects: 0]
      o- ramdisk ............................................................... [Storage Objects: 0]
    /backstores> cd block
    /backstores/block> ls
    o- block ................................................................... [Storage Objects: 0]
    /backstores/block>
Start by creating new backing stores on the previously created logical volumes - try to give the storage objects meaningful names:
    /backstores/block> create 8g-fs-failover /dev/vg_data/lv_fs_failover
    Created block storage object 8g-fs-failover using /dev/vg_data/lv_fs_failover.
    /backstores/block> create 1g-fs-fence /dev/vg_data/lv_fs_fence
    Created block storage object 1g-fs-fence using /dev/vg_data/lv_fs_fence.
    /backstores/block> ls
    o- block ................................................................... [Storage Objects: 2]
      o- 1g-fs-fence ............. [/dev/vg_data/lv_fs_fence (1.0GiB) write-thru deactivated]
      o- 8g-fs-failover .......... [/dev/vg_data/lv_fs_failover (8.0GiB) write-thru deactivated]
    /backstores/block>
Next, cd to /iscsi and disable discovery_auth - as this is only a lab we will not require authentication. This means that anyone who knows your initiator IQNs will be able to connect. We still configure ACLs on the connecting initiator IQNs, however.
    /backstores/block> cd /iscsi
    /iscsi> set discovery_auth enable=0
    Parameter enable is now 'False'.
Create the iSCSI target:
    /iscsi> create
    Created target iqn.2003-01.org.linux-iscsi.fedora01.x8664:sn.474dd7ea3998.
    Created TPG 1.
    /iscsi> ls
    o- iscsi ........................................................................... [Targets: 1]
      o- iqn.2003-01.org.linux-iscsi.fedora01.x8664:sn.474dd7ea3998 ....................... [TPGs: 1]
        o- tpg1 .............................................................. [no-gen-acls, no-auth]
          o- acls ......................................................................... [ACLs: 0]
          o- luns ......................................................................... [LUNs: 0]
          o- portals ................................................................... [Portals: 1]
            o- 0.0.0.0:3260 .................................................................... [OK]
As the listing shows, this also creates a default portal on 0.0.0.0:3260.
Next, create the LUNs on the backing stores previously configured:
    /iscsi> cd iqn.2003-01.org.linux-iscsi.fedora01.x8664:sn.474dd7ea3998/
    /iscsi/iqn.20....474dd7ea3998> cd tpg1/luns
    /iscsi/iqn.20...998/tpg1/luns> create /backstores/block/8g-fs-failover
    Created LUN 0.
    /iscsi/iqn.20...998/tpg1/luns> create /backstores/block/1g-fs-fence
    Created LUN 1.
Verify their creation:
    /iscsi/iqn.20...998/tpg1/luns> ls
    o- luns ............................................................................... [LUNs: 2]
      o- lun0 .................................. [block/8g-fs-failover (/dev/vg_data/lv_fs_failover)]
      o- lun1 ........................................ [block/1g-fs-fence (/dev/vg_data/lv_fs_fence)]
Next, we’ll create our (minimal) ACLs, so that only the appropriate initiator IQNs can connect to the LUNs:
    /iscsi/iqn.20...998/tpg1/luns> cd ../acls
    /iscsi/iqn.20...998/tpg1/acls>
First, on both cluster nodes, verify the current IQNs:
    [root@centos03 ~]# cat /etc/iscsi/initiatorname.iscsi
    InitiatorName=iqn.1994-05.com.redhat:613a453497b4
    [root@centos04 ~]# cat /etc/iscsi/initiatorname.iscsi
    InitiatorName=iqn.1994-05.com.redhat:f415d5ebe6c
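Since the ACLs are keyed on these exact strings, it can help to extract and sanity-check the IQNs rather than copy them by eye. The following is a minimal sketch: the helper names initiator_iqn and is_iqn are hypothetical, and the pattern is only a loose format check, not a full validation of the iSCSI naming rules.

```shell
#!/bin/sh
# Print the IQN from initiatorname.iscsi-style content read on stdin.
initiator_iqn() {
    sed -n 's/^InitiatorName=//p'
}

# Succeed if the argument looks like iqn.YYYY-MM.reversed.domain[:suffix].
# Loose illustrative check only (an assumption), not a complete validator.
is_iqn() {
    printf '%s\n' "$1" | grep -Eq '^iqn\.[0-9]{4}-[0-9]{2}\.[a-z0-9.-]+(:.+)?$'
}

# Demonstration using the value shown for centos03 above.
iqn=$(printf 'InitiatorName=iqn.1994-05.com.redhat:613a453497b4\n' | initiator_iqn)
if is_iqn "$iqn"; then
    echo "ACL candidate: $iqn"
fi
```

On a cluster node the same helper would be fed the real file, for example initiator_iqn < /etc/iscsi/initiatorname.iscsi.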
Create the ACLs:
    /iscsi/iqn.20...998/tpg1/acls> create iqn.1994-05.com.redhat:613a453497b4
    Created Node ACL for iqn.1994-05.com.redhat:613a453497b4
    Created mapped LUN 1.
    Created mapped LUN 0.
    /iscsi/iqn.20...998/tpg1/acls> create iqn.1994-05.com.redhat:f415d5ebe6c
    Created Node ACL for iqn.1994-05.com.redhat:f415d5ebe6c
    Created mapped LUN 1.
    Created mapped LUN 0.
Finally, save the configuration:
    /iscsi/iqn.20.../tpg1/portals> cd /
    /> saveconfig
    Last 10 configs saved in /etc/target/backup.
    Configuration saved to /etc/target/saveconfig.json
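The interactive session above can also be expressed as one-shot targetcli invocations, in the same style as the targetcli clearconfig call used earlier. The sketch below only prints the commands unless DRY_RUN=0 is set. Two things are assumptions on my part: that each interactive step maps onto a "targetcli <path> <command>" call on this targetcli version, and that the IQN is passed explicitly to create (in the session above targetcli generated it). The run and configure_target helpers are hypothetical names.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on the storage node to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

TARGET_IQN="iqn.2003-01.org.linux-iscsi.fedora01.x8664:sn.474dd7ea3998"
TPG="/iscsi/$TARGET_IQN/tpg1"

configure_target() {
    # Backing stores on the two logical volumes.
    run targetcli /backstores/block create 8g-fs-failover /dev/vg_data/lv_fs_failover
    run targetcli /backstores/block create 1g-fs-fence /dev/vg_data/lv_fs_fence
    # Lab only: no discovery authentication.
    run targetcli /iscsi set discovery_auth enable=0
    # Assumption: the IQN is given explicitly instead of being auto-generated.
    run targetcli /iscsi create "$TARGET_IQN"
    # LUN 0 and LUN 1, in the same order as the interactive session.
    run targetcli "$TPG/luns" create /backstores/block/8g-fs-failover
    run targetcli "$TPG/luns" create /backstores/block/1g-fs-fence
    # One ACL per cluster node initiator.
    run targetcli "$TPG/acls" create iqn.1994-05.com.redhat:613a453497b4
    run targetcli "$TPG/acls" create iqn.1994-05.com.redhat:f415d5ebe6c
    run targetcli saveconfig
}

configure_target
```

Printing first and executing second makes it easy to compare the plan against the interactive transcript before touching a live target.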
Enable and start the target service:
    [root@fedora01 ~]# systemctl enable target.service
    ln -s '/usr/lib/systemd/system/target.service' '/etc/systemd/system/multi-user.target.wants/target.service'
    [root@fedora01 ~]# systemctl start target.service
Add a firewalld rule, both to the running configuration and permanently, so that the initiators can connect:
    [root@fedora01 ~]# firewall-cmd --add-port=3260/tcp
    success
    [root@fedora01 ~]# firewall-cmd --permanent --add-port=3260/tcp
    success
Create the /var/target/pr directory so that persistent reservations work correctly:
    [root@fedora01 ~]# mkdir -p /var/target/pr
Failure to do this will result in error messages being seen in dmesg, such as:
    [ 6272.148841] filp_open(/var/target/pr/aptpl_fd9e668a-7d4b-47fe-acb0-a652dd005103) for APTPL metadata failed
    [ 6272.148844] SPC-3 PR: Could not update APTPL
Initiator Configuration
On each cluster node, discover the new target:
    [root@centos03 ~]# iscsiadm --mode discovery --type sendtargets --portal 10.1.1.108
    10.1.1.108:3260,1 iqn.2003-01.org.linux-iscsi.fedora01.x8664:sn.474dd7ea3998
    [root@centos04 ~]# iscsiadm --mode discovery --type sendtargets --portal 10.1.1.108
    10.1.1.108:3260,1 iqn.2003-01.org.linux-iscsi.fedora01.x8664:sn.474dd7ea3998
Start the iscsi service on both cluster nodes:
    [root@centos03 ~]# service iscsi start
    [root@centos03 ~]# chkconfig iscsi on
    [root@centos04 ~]# service iscsi start
    [root@centos04 ~]# chkconfig iscsi on
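Starting the iscsi service logs in to the discovered target. If a node does not log in by itself, an explicit login can be attempted with iscsiadm in node mode. This is a sketch that prints the commands by default; whether it is needed in this lab is an assumption, and run and login_target are hypothetical helper names.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a cluster node to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

PORTAL="10.1.1.108:3260"
TARGET_IQN="iqn.2003-01.org.linux-iscsi.fedora01.x8664:sn.474dd7ea3998"

login_target() {
    # Log in to the discovered target on the storage node's portal.
    run iscsiadm --mode node --targetname "$TARGET_IQN" --portal "$PORTAL" --login
    # List the active sessions from the initiator's point of view.
    run iscsiadm --mode session
}

login_target
```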
You can now use targetcli on the storage node to verify that the initiators are both logged in:
    [root@fedora01 ~]# targetcli sessions
    alias: centos04 sid: 16 type: Normal session-state: LOGGED_IN
    alias: centos03 sid: 15 type: Normal session-state: LOGGED_IN
Verify that the new LUNs are visible on the cluster nodes:
    [root@centos03 ~]# fdisk -l
    ...
    Disk /dev/sdb: 8589 MB, 8589934592 bytes
    64 heads, 32 sectors/track, 8192 cylinders
    Units = cylinders of 2048 * 512 = 1048576 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 4194304 bytes
    Disk identifier: 0x9b40fdcd

       Device Boot      Start         End      Blocks   Id  System

    Disk /dev/sdc: 1073 MB, 1073741824 bytes
    34 heads, 61 sectors/track, 1011 cylinders
    Units = cylinders of 2074 * 512 = 1061888 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 4194304 bytes
    Disk identifier: 0x00000000
    ...
As you can see, /dev/sdb maps to the lv_fs_failover logical volume on fedora01 and /dev/sdc maps to lv_fs_fence.
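The sdb and sdc names depend on discovery order and are not guaranteed to stay the same across reboots. udev normally also creates persistent links under /dev/disk/by-path for iSCSI LUNs, so a sketch like the following can be used to work out which link corresponds to which LUN. The by_path helper is a hypothetical name, and the assumption is that the cluster nodes follow the usual ip-<portal>-iscsi-<iqn>-lun-<n> naming; check with ls -l /dev/disk/by-path/ on the node itself.

```shell
#!/bin/sh
# Build the expected udev by-path name for an iSCSI LUN.
# $1 = portal (ip:port), $2 = target IQN, $3 = LUN number
by_path() {
    printf '/dev/disk/by-path/ip-%s-iscsi-%s-lun-%s\n' "$1" "$2" "$3"
}

PORTAL="10.1.1.108:3260"
TARGET_IQN="iqn.2003-01.org.linux-iscsi.fedora01.x8664:sn.474dd7ea3998"

# LUN 0 is the failover filesystem, LUN 1 is the fence device.
FAILOVER_DEV=$(by_path "$PORTAL" "$TARGET_IQN" 0)
FENCE_DEV=$(by_path "$PORTAL" "$TARGET_IQN" 1)

echo "failover: $FAILOVER_DEV"
echo "fence:    $FENCE_DEV"
```

The rest of this article keeps using /dev/sdb and /dev/sdc, as in the original lab.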
From one cluster node, create a partition on /dev/sdb:
    [root@centos03 ~]# fdisk /dev/sdb

    WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
             switch off the mode (command 'c') and change display units to
             sectors (command 'u').

    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    p
    Partition number (1-4): 1
    First cylinder (1-8192, default 1):
    Using default value 1
    Last cylinder, +cylinders or +size{K,M,G} (1-8192, default 8192):
    Using default value 8192

    Command (m for help): w
    The partition table has been altered!

    Calling ioctl() to re-read partition table.
    Syncing disks.
Create an ext4 filesystem:
    [root@centos03 ~]# mkfs.ext4 /dev/sdb1
    [root@centos03 ~]# tune2fs -c 0 -i 0 -m 0 /dev/sdb1
    tune2fs 1.41.12 (17-May-2010)
    Setting maximal mount count to -1
    Setting interval between checks to 0 seconds
    Setting reserved blocks percentage to 0% (0 blocks)
We now have an ext4 filesystem for the cluster to manage, and another device (/dev/sdc) to be used as a fence device by the cluster.
Cluster Preparation
Before starting to configure the Red Hat Cluster software, a few things need to be done. First, install NTP on both nodes:
    # yum -y install ntp
    # chkconfig ntpd on
    # ntpdate 1.au.pool.ntp.org
    # service ntpd start
    # ntpq -p
Next, for lab purposes, disable iptables and ip6tables, and switch SELinux into permissive mode:
    # service iptables stop
    # chkconfig iptables off
    # service ip6tables stop
    # chkconfig ip6tables off
    # vi /etc/selinux/config
    # grep '^SELINUX=' /etc/selinux/config
    SELINUX=permissive
    # setenforce permissive
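The edit made in vi can also be done non-interactively, which is easier to repeat on both nodes. This is a small sketch using GNU sed; set_permissive is a hypothetical helper name, and the demonstration runs against a scratch file rather than the real /etc/selinux/config.

```shell
#!/bin/sh
# Set SELINUX=permissive in the given config file, leaving other lines alone.
# The pattern is anchored on "SELINUX=" so SELINUXTYPE= is not touched.
set_permissive() {
    sed -i 's/^SELINUX=.*/SELINUX=permissive/' "$1"
}

# Demonstration on a scratch copy with sample content.
tmp=$(mktemp)
printf '# sample config\nSELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$tmp"
set_permissive "$tmp"
grep '^SELINUX=' "$tmp"
rm -f "$tmp"
```

On the cluster nodes the call would be set_permissive /etc/selinux/config, followed by the same grep check shown above.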
Modify /etc/hosts on both nodes, adding an entry for each cluster node:
    # vi /etc/hosts
    10.1.1.103 centos03
    10.1.1.104 centos04
Next, verify multicast connectivity with the omping utility. Start by installing it:
    # yum -y install omping
Run the command simultaneously on both nodes, and verify that multicast connectivity is established:
    [root@centos03 ~]# omping 10.1.1.104 10.1.1.103
    10.1.1.104 : waiting for response msg
    10.1.1.104 : joined (S,G) = (*, 232.43.211.234), pinging
    10.1.1.104 : unicast, seq=1, size=69 bytes, dist=0, time=0.497ms
    10.1.1.104 : multicast, seq=1, size=69 bytes, dist=0, time=0.507ms
    [root@centos04 ~]# omping 10.1.1.103 10.1.1.104
    10.1.1.103 : waiting for response msg
    10.1.1.103 : joined (S,G) = (*, 232.43.211.234), pinging
    10.1.1.103 : unicast, seq=1, size=69 bytes, dist=0, time=0.201ms
    10.1.1.103 : multicast, seq=1, size=69 bytes, dist=0, time=0.281ms
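The thing to look for is that each peer answers with both unicast and multicast lines; unicast alone means multicast is being dropped somewhere. If you want to script that check, a minimal sketch could look like this. The multicast_ok helper is a hypothetical name, and the sample text is taken from the centos03 output above.

```shell
#!/bin/sh
# Succeed only if omping output on stdin contains at least one unicast
# reply and at least one multicast reply.
multicast_ok() {
    awk '/: +unicast, seq=/ {u=1} /: +multicast, seq=/ {m=1} END {exit !(u && m)}'
}

sample='10.1.1.104 : waiting for response msg
10.1.1.104 : joined (S,G) = (*, 232.43.211.234), pinging
10.1.1.104 : unicast, seq=1, size=69 bytes, dist=0, time=0.497ms
10.1.1.104 : multicast, seq=1, size=69 bytes, dist=0, time=0.507ms'

if printf '%s\n' "$sample" | multicast_ok; then
    echo "multicast OK"
else
    echo "no multicast replies seen"
fi
```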
Install the appropriate packages required for Red Hat Cluster on both nodes:
    # yum -y install ricci cman rgmanager ccs
On both nodes, enable the appropriate services:
    # chkconfig ricci on
    # chkconfig cman on
    # chkconfig rgmanager on
Set a password for the ricci user:
    # passwd ricci
    Changing password for user ricci.
    New password:
    Retype new password:
    passwd: all authentication tokens updated successfully.
And start the ricci daemon:
    # service ricci start
    Starting system message bus:                               [  OK  ]
    Starting oddjobd:                                          [  OK  ]
    generating SSL certificates...  done
    Generating NSS database...  done
    Starting ricci:                                            [  OK  ]
We can now begin cluster configuration.
Cluster Configuration
The following commands should only be issued on one node of the cluster unless otherwise specified. centos03 will be configured first, and the configuration will then be synced across to centos04. The ccs utility is used to administer cluster configuration from the command line.
Create the cluster, which in our case is called testcluster:
    [root@centos03 ~]# ccs -h centos03 --createcluster testcluster
Add both nodes to the configuration, each having a single quorum vote:
    [root@centos03 ~]# ccs -h centos03 --addnode centos03 --votes=1 --nodeid=1
    Node centos03 added.
    [root@centos03 ~]# ccs -h centos03 --addnode centos04 --votes=1 --nodeid=2
    Node centos04 added.
Set the fence daemon properties as appropriate for your environment:
    [root@centos03 ~]# ccs -h centos03 --setfencedaemon post_fail_delay=0 post_join_delay=30
The post_fail_delay parameter is the number of seconds the fence daemon (fenced) waits before fencing a node (a member of the fence domain) after the node has failed. The post_join_delay parameter is the number of seconds the fence daemon (fenced) waits before fencing a node after the node joins the fence domain.
Set the cman daemon properties as appropriate for the correct operation of a two node cluster. For this, we need to set the two_node parameter to 1, and expected_votes to 1 for the cluster to remain quorate upon failure of a node:
    [root@centos03 ~]# ccs -h centos03 --setcman two_node=1 expected_votes=1
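To see why the two-node special case is needed, consider the usual majority rule, where quorum is floor(expected_votes / 2) + 1. The sketch below just does that arithmetic; the quorum helper is a hypothetical name, and the formula is stated here as the general majority rule rather than taken from this lab's output.

```shell
#!/bin/sh
# Votes needed for quorum under a simple majority rule:
# floor(expected_votes / 2) + 1
quorum() {
    echo $(( $1 / 2 + 1 ))
}

echo "two nodes, one vote each:   quorum = $(quorum 2)"
echo "three nodes, one vote each: quorum = $(quorum 3)"
```

With two single-vote nodes the majority rule demands both votes, so losing either node would leave the survivor inquorate. Setting two_node=1 with expected_votes=1 tells cman to make an exception and stay quorate with a single vote, which is also why working fencing matters so much in this configuration.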
Next, add the SCSI fencing method to both nodes:
    [root@centos03 ~]# ccs -h centos03 --addmethod scsi centos03
    Method scsi added to centos03.
    [root@centos03 ~]# ccs -h centos03 --addmethod scsi centos04
    Method scsi added to centos04.
We can now create our fence device on /dev/sdc:
    [root@centos03 ~]# ccs -h centos03 --addfencedev scsi_dev agent=fence_scsi devices=/dev/sdc logfile=/var/log/cluster/fence_scsi.log aptpl=1
Note that APTPL (Activate Persist Through Power Loss) is enabled, which is another feature that Linux IO supports. We also enable logging for the fence agent.
Next, the appropriate fence instances are added. The unfence instances are automatically created:
    [root@centos03 ~]# ccs -h centos03 --addfenceinst scsi_dev centos03 scsi key=1
    Note: Automatically adding unfence action... (use --nounfence to prevent this)
    [root@centos03 ~]# ccs -h centos03 --addfenceinst scsi_dev centos04 scsi key=2
    Note: Automatically adding unfence action... (use --nounfence to prevent this)
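fence_scsi works by registering a key for each node on the fence device and removing a node's key to fence it. Once the cluster is up and the nodes have been unfenced, the registrations can be inspected from either node with sg_persist. This sketch prints the commands by default and assumes the sg3_utils package is installed on the cluster nodes; run and show_reservations are hypothetical helper names.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a cluster node to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

FENCE_DEV=/dev/sdc

show_reservations() {
    # List the keys currently registered on the device.
    run sg_persist --in --read-keys --device="$FENCE_DEV"
    # Show the active reservation, if any, and its holder.
    run sg_persist --in --read-reservation --device="$FENCE_DEV"
}

show_reservations
```

If a node has been fenced, its key should no longer appear in the read-keys listing until it is unfenced again.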
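Create a failover domain. Note that ccs expects restricted, ordered and nofailback as bare keywords; the ordered=1 nofailback=1 form used here does not appear to take effect, which would explain why the --lsfailoverdomain listing at the end of this article reports ordered=0 and nofailback=0: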
    [root@centos03 ~]# ccs -h centos03 --addfailoverdomain fs-failover ordered=1 nofailback=1
And add the two cluster nodes as failover domain nodes:
    [root@centos03 ~]# ccs -h centos03 --addfailoverdomainnode fs-failover centos03 1
    [root@centos03 ~]# ccs -h centos03 --addfailoverdomainnode fs-failover centos04 2
Add a service to the failover domain:
    [root@centos03 ~]# ccs -h centos03 --addservice fs domain=fs-failover recovery=relocate autostart=1
Next, add the filesystem resource itself:
    [root@centos03 ~]# ccs -h centos03 --addresource fs name=failover_fs device=/dev/sdb1 mountpoint=/mnt fstype=ext4
Finally, add a subservice tying the filesystem resource back to the service. Subservices matter mainly when resource ordering is required, but I’ll add one anyway:
    [root@centos03 ~]# ccs -h centos03 --addsubservice fs fs ref=failover_fs
Ensure cman is started on both nodes:
    [root@centos03 ~]# service cman start
Update the active running configuration, and sync to centos04:
    [root@centos03 ~]# cman_tool version -r
    [root@centos03 ~]# ccs -h centos03 --sync --activate
Start all cluster services and resources, and verify the cluster is quorate:
    [root@centos03 ~]# ccs -h centos03 --startall
    Started centos03
    Started centos04
    [root@centos03 ~]# clustat
    Cluster Status for testcluster @ Thu Jul 17 20:10:44 2014
    Member Status: Quorate

     Member Name                             ID   Status
     ------ ----                             ---- ------
     centos03                                    1 Online, Local, rgmanager
     centos04                                    2 Online, rgmanager

     Service Name                   Owner (Last)                   State
     ------- ----                   ----- ------                   -----
     service:fs                     centos03                       started
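The two things worth checking in the clustat output are the "Member Status: Quorate" line and the owner of service:fs. A small sketch for scripting those checks follows; is_quorate and service_owner are hypothetical helper names, and the sample text is a trimmed copy of the output above.

```shell
#!/bin/sh
# Succeed if clustat output on stdin reports a quorate cluster.
is_quorate() {
    grep -q 'Member Status: Quorate'
}

# Print the current owner of the named service from clustat output on stdin.
service_owner() {
    awk -v svc="service:$1" '$1 == svc {print $2}'
}

sample='Cluster Status for testcluster @ Thu Jul 17 20:10:44 2014
Member Status: Quorate

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:fs                     centos03                       started'

if printf '%s\n' "$sample" | is_quorate; then
    echo "cluster is quorate"
fi
echo "fs owner: $(printf '%s\n' "$sample" | service_owner fs)"
```

On a live node the helpers would read from clustat directly, for example clustat | service_owner fs.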
Check that the filesystem is mounted on centos03:
    [root@centos03 ~]# df -h
    Filesystem                       Size  Used Avail Use% Mounted on
    /dev/mapper/vg_centos03-lv_root   18G  1.1G   16G   7% /
    tmpfs                            495M   23M  473M   5% /dev/shm
    /dev/sda1                        485M   33M  427M   8% /boot
    /dev/sdb1                        7.9G  146M  7.8G   2% /mnt
We can see /dev/sdb1 mounted on /mnt. Reboot centos03 and the filesystem should fail over to centos04:
    [root@centos03 ~]# reboot

    Broadcast message from root@centos03
            (/dev/pts/0) at 20:11 ...

    The system is going down for reboot NOW!

    [root@centos04 ~]# df -h
    Filesystem                       Size  Used Avail Use% Mounted on
    /dev/mapper/vg_centos04-lv_root   18G  1.1G   16G   7% /
    tmpfs                            495M   23M  473M   5% /dev/shm
    /dev/sda1                        485M   33M  427M   8% /boot
    /dev/sdb1                        7.9G  146M  7.8G   2% /mnt
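A reboot is the blunt way to test this. rgmanager also ships clusvcadm, which can relocate a service to another member without taking a node down. The sketch below prints the commands by default; I have not run it against this particular lab, so treat it as an assumption that the fs service relocates cleanly this way. run and relocate_fs are hypothetical helper names.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a cluster node to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Relocate the fs service to the member given as $1, then show cluster status.
relocate_fs() {
    run clusvcadm -r fs -m "$1"
    run clustat
}

relocate_fs centos04
```

After the relocation, df -h on the destination node should show /dev/sdb1 mounted on /mnt, just as in the reboot test.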
All is working as expected! You can further verify cluster status with the following commands:
    [root@centos03 ~]# ccs -h centos03 --lsnodes
    centos03: votes=1, nodeid=1
    centos04: votes=1, nodeid=2
    [root@centos03 ~]# ccs -h centos03 --lsfencedev
    scsi_dev: logfile=/var/log/cluster/fence_scsi.log, aptpl=1, devices=/dev/sdc, agent=fence_scsi
    [root@centos03 ~]# ccs -h centos03 --lsfailoverdomain
    fs-failover: restricted=0, ordered=0, nofailback=0
      centos03: priority=1
      centos04: priority=2
    [root@centos03 ~]# ccs -h centos03 --lsservices
    service: name=fs, domain=fs-failover, autostart=1, recovery=relocate
      fs: ref=failover_fs
    resources:
      fs: name=failover_fs, device=/dev/sdb1, mountpoint=/mnt, fstype=ext4