
Part VIII: Cluster Administration

Guidelines for various administration tasks.

CLUSTER ADMINISTRATION

Adding/removing cluster nodes

Adding cluster nodes to an existing configuration is really a piece of cake. All one has to do is determine the correct MAC address of the new cluster node, copy an existing cluster node entry and modify the network interface settings accordingly.

The MAC address can be found on the interface card, on the chassis of the server, or you may boot the server, drop to a shell (e.g. boot the kernel with the com-step parameter, hit continue five times, then break) and use the following command to show all interfaces:

# /sbin/ifconfig -a

Or:

# /sbin/ip addr show

Also note that with this command you can easily see which physical network interface is associated with eth0, eth1, eth2 or with bonding devices like bond0, bond1 and so on.
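
If only the interface names and MAC addresses are of interest, the output can be filtered, for example like this (a minimal sketch; the exact field positions may vary between iproute versions):

# /sbin/ip -o link show | awk '{print $2, $(NF-2)}'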

After you have modified cluster.conf with the copied node stanza and the correct MAC and IP addresses, you need to update the initrd, or create a new one on systems where com-ec is not configured.
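
For illustration, a copied and adapted node entry in cluster.conf might look roughly like the following sketch (node name, nodeid, MAC and IP address are placeholders; copy the exact com_info layout from an existing node, since it depends on the installed comoonics version):

<clusternode name="node202c" votes="1" nodeid="3">
  <com_info>
    <eth name="eth0" mac="00:11:22:AB:CD:F0" ip="192.168.123.47" mask="255.255.255.0" gateway=""/>
  </com_info>
</clusternode>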

Managing services (add/remove/start/stop)

It really depends on the service you want to manage, but usually cluster services are controlled with clusvcadm.

Disable a resource group:

# clusvcadm -d groupname

Enable a resource group:

# clusvcadm -e groupname

Enable a resource group on a specific node:

# clusvcadm -e groupname -m nodename

Relocate a resource to a different node:

# clusvcadm -r groupname -m nodename

Restart a resource on the same node:

# clusvcadm -R groupname

Stop a resource on the cluster:

# clusvcadm -s groupname
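
Before and after these operations it is useful to check where the resource groups are currently running. clustat (part of rgmanager) shows the overall cluster and service status; the second form limits the output to a single resource group:

# clustat
# clustat -s groupname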

Managing Interfaces (bonding, vlan)

A cluster is usually built with high availability in mind. Therefore most network interfaces are bonded so that failover is possible in case one link goes down. The configuration is straightforward and described in the following.

A bonded network interface consists of one master interface (named bond0, bond1 and so on) and at least one, but usually two or more, slave interfaces (regular eth0, eth1 etc.).

On the shell a bonded network configuration may be done as follows:

# modprobe bonding mode=0 miimon=100
# ifconfig eth1 down
# ifconfig eth3 down
# ifconfig eth4 down
# ifconfig bond1 hw ether 00:11:22:AA:BB:CC
# /sbin/ifconfig bond1 192.168.123.45 up
# /sbin/ifenslave bond1 eth1
# /sbin/ifenslave bond1 eth3
# /sbin/ifenslave bond1 eth4

If you want to enable 802.1q vlans:

# modprobe 8021q
# modprobe bonding mode=0 miimon=100
# ifconfig eth1 down
# ifconfig eth3 down
# ifconfig eth4 down
# ifconfig bond1 hw ether 00:11:22:AA:BB:CC
# /sbin/ifconfig bond1 192.168.123.45 up
# /sbin/ifenslave bond1 eth1
# /sbin/ifenslave bond1 eth3
# /sbin/ifenslave bond1 eth4
# vconfig add bond1 12
# vconfig add bond1 13
# ifconfig bond1.12 192.168.12.12 netmask 255.255.255.0 broadcast 192.168.12.255 up
# ifconfig bond1.13 192.168.13.12 netmask 255.255.255.0 broadcast 192.168.13.255 up
# echo 1 > /proc/sys/net/ipv4/ip_forward
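
Whether the bond and the vlans came up as intended can be verified through the proc interface; the bonding driver reports the state of the master and all slaves, and the 8021q module lists the configured vlans:

# cat /proc/net/bonding/bond1
# cat /proc/net/vlan/config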

Usually the configuration should be bootsafe, i.e. persistent across reboots. This is done by configuring the /etc/sysconfig/network-scripts/ifcfg-* files (ifcfg-bond1, ifcfg-eth1, ifcfg-eth3 and so on):

# cat /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
IPADDR=192.168.123.45
NETMASK=255.255.255.0
NETWORK=192.168.123.0
BROADCAST=192.168.123.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
USERCTL=no
ONBOOT=yes
MASTER=bond1
SLAVE=yes
BOOTPROTO=none
# cat /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
USERCTL=no
ONBOOT=yes
MASTER=bond1
SLAVE=yes
BOOTPROTO=none
# cat /etc/sysconfig/network-scripts/ifcfg-eth4
DEVICE=eth4
USERCTL=no
ONBOOT=yes
MASTER=bond1
SLAVE=yes
BOOTPROTO=none

Also don't forget to add an alias for the bonding module to /etc/modprobe.conf:

alias bond1 bonding
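
Once the ifcfg files and the modprobe.conf entry are in place, the persistent configuration can be activated without a reboot, for example by restarting the affected interface (note that on a running cluster node this briefly interrupts all traffic on that bond):

# ifdown bond1; ifup bond1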

For a bootsafe 802.1q vlan configuration you will need the following settings:

# cat /etc/sysconfig/network-scripts/ifcfg-bond1.12
DEVICE=bond1.12
VLAN=yes
IPADDR=192.168.12.12
NETMASK=255.255.255.0
NETWORK=192.168.12.0
BROADCAST=192.168.12.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
# cat /etc/sysconfig/network-scripts/ifcfg-bond1.13
DEVICE=bond1.13
VLAN=yes
IPADDR=192.168.13.12
NETMASK=255.255.255.0
NETWORK=192.168.13.0
BROADCAST=192.168.13.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
USERCTL=no
ONBOOT=yes
MASTER=bond1
SLAVE=yes
BOOTPROTO=none
# cat /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
USERCTL=no
ONBOOT=yes
MASTER=bond1
SLAVE=yes
BOOTPROTO=none
# cat /etc/sysconfig/network-scripts/ifcfg-eth4
DEVICE=eth4
USERCTL=no
ONBOOT=yes
MASTER=bond1
SLAVE=yes
BOOTPROTO=none

The following entries should be present in /etc/modprobe.conf (the 8021q vlan module is loaded automatically when the vlan interfaces are brought up, so no separate aliases are needed for bond1.12 and bond1.13):

alias bond1 bonding
options bond1 miimon=30 mode=1

If you need to specify further IP aliases or if you want to enable vlans, you can use the bonding interfaces just like regular interfaces.

For example, bond0:1 would be an IP alias (virtual interface) on bond0, while bond0.12 would be the vlan 12 interface on top of bond0.
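
A persistent IP alias follows the usual ifcfg naming scheme; a minimal sketch (device name and address are examples only):

# cat /etc/sysconfig/network-scripts/ifcfg-bond0:1
DEVICE=bond0:1
IPADDR=192.168.123.46
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no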

To change the interface used for inter-cluster communication, modify the network interface specified in the com_info section of /etc/cluster/cluster.conf:

<eth name="eth0" mac="00:11:22:AB:CD:EE" master="bond0" slave="yes"/>
<eth name="eth1" mac="00:11:22:AB:CD:EF" master="bond0" slave="yes"/>
<eth name="bond0" ip="192.168.123.45" mask="255.255.255.0" gateway=""/>

Or, if you want to use a vlan configuration:

<eth name="eth0" mac="00:11:22:AB:CD:EE" master="bond0" slave="yes"/>
<eth name="eth1" mac="00:11:22:AB:CD:EF" master="bond0" slave="yes"/>
<eth name="bond0"/>
<eth name="bond0.12" ip="192.168.123.45" mask="255.255.255.0" gateway=""/>

Upgrading the cluster

If you use only production channels that have passed our extensive quality assurance tests, the cluster upgrade itself should not be difficult. However, some of the clustered services may need tweaking. Therefore it is a good idea to create a cluster clone prior to upgrading the system. If anything goes wrong, or you run out of time, you are still able to boot a working cluster clone.

The detailed procedure done at Hilti is as follows:

Reboot the cluster node - One cluster node should be rebooted to ensure that no modifications have been made to the system that would prevent a clean start.

Create an archive clone - You should create a backup of the existing cluster using comoonics Enterprise Copy (com-ec). First determine the running kernel version:

# uname -r

Change "kernel version" and "cluster name" in the following listing to reflect the actual running kernel/cluster:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE localclone SYSTEM "/opt/atix/comoonics-cs/xml/comoonics-enterprise-clone.dtd">
<localclone source="disk" destination="backup">
  <cluster name="node202" sourcesuffix="" destsuffix="C"></cluster>
    <sourcedisks>
      <bootdisk name="/dev/sda"/>
      <rootdisk name="/dev/sdb"/>
    </sourcedisks>
    <destarchive name="node202C" path="/mnt/archive">
      <metaarchive name="meta-clone-node202-02-rhel5u1.tar"/>
      <rootarchive name="root-clone-node202-02-rhel5u1.tgz"/>
      <bootarchive name="boot-clone-node202-02-rhel5u1.tgz"/>
    </destarchive>
    <kernel version="2.6.18-53.el5xen"/>
</localclone>

Save this as /root/archive-clone.xml, or wherever you store your com-ec definitions.

# mkdir -p /mnt/archive
# mount -t nfs node601:/data/linux/clones/in/li/ /mnt/archive/
# ll /mnt/archive/
total 25164116
-rw-r--r-- 1 root root  745245149 Sep 25 11:43 boot-clone-node202-01.tgz
-rw-r--r-- 1 root root   14433509 Sep 17 13:27 boot-clone-node233-01.tgz
-rw-r--r-- 1 root root   23298568 Sep 18 15:29 boot-clone-node233-02.tgz
-rw-r--r-- 1 root root    7160809 Sep 18 14:53 boot-clone-node603-01.tgz
-rw-r--r-- 1 root root    7164846 Sep 18 15:15 boot-clone-node604-01.tgz
-rw-r--r-- 1 root root   21339914 Sep 18 15:30 boot-clone-node629-01.tgz
-rw-r--r-- 1 root root   14196833 Sep 15 11:34 boot-clone-node630-01.tgz
-rw-r--r-- 1 root root   14441910 Sep  4 22:29 boot-clone-node631-02.tgz
-rw-r--r-- 1 root root   21632874 Sep  5 11:49 boot-clone-node631-03.tgz
-rw-r--r-- 1 root root   21632874 Sep 18 15:33 boot-clone-node631-04.tgz
-rw-r--r-- 1 root root   21632875 Sep 18 15:42 boot-clone-node632-01.tgz
-rw-r--r-- 1 root root   14441921 Sep 18 15:46 boot-clone-node633-01.tgz
-rw-r--r-- 1 root root   14441921 Oct 19 14:40 boot-clone-node633-02.tgz
-rw-r--r-- 1 root root      10240 Sep 25 11:43 meta-clone-node202-01.tar
-rw-r--r-- 1 root root      10240 Sep 17 13:35 meta-clone-node233-01.tar
-rw-r--r-- 1 root root      10240 Sep 18 15:29 meta-clone-node233-02.tar
-rw-r--r-- 1 root root      10240 Sep 18 14:53 meta-clone-node603-01.tar
-rw-r--r-- 1 root root      10240 Sep 18 15:15 meta-clone-node604-01.tar
-rw-r--r-- 1 root root      10240 Sep 18 15:30 meta-clone-node629-01.tar
-rw-r--r-- 1 root root      10240 Sep 15 11:34 meta-clone-node630-01.tar
-rw-r--r-- 1 root root      20480 Sep  4 22:29 meta-clone-node631-02.tar
-rw-r--r-- 1 root root      10240 Sep  5 11:49 meta-clone-node631-03.tar
-rw-r--r-- 1 root root      10240 Sep 18 15:33 meta-clone-node631-04.tar
-rw-r--r-- 1 root root      10240 Sep 18 15:42 meta-clone-node632-01.tar
-rw-r--r-- 1 root root      10240 Sep 18 15:46 meta-clone-node633-01.tar
-rw-r--r-- 1 root root      10240 Oct 19 14:40 meta-clone-node633-02.tar
-rw-r--r-- 1 root root 3726527716 Sep 25 12:04 root-clone-node202-01.tgz
-rw-r--r-- 1 root root 1469880460 Sep 17 13:35 root-clone-node233-01.tgz
-rw-r--r-- 1 root root 1505457757 Sep 18 15:40 root-clone-node233-02.tgz
-rw-r--r-- 1 root root  801692639 Sep 18 15:00 root-clone-node603-01.tgz
-rw-r--r-- 1 root root  889991016 Sep 18 15:23 root-clone-node604-01.tgz
-rw-r--r-- 1 root root 1067183803 Sep 18 15:35 root-clone-node629-01.tgz
-rw-r--r-- 1 root root 1098792156 Sep 15 11:39 root-clone-node630-01.tgz
-rw-r--r-- 1 root root  901944549 Sep  4 22:35 root-clone-node631-02.tgz
-rw-r--r-- 1 root root  957838245 Sep  5 11:56 root-clone-node631-03.tgz
-rw-r--r-- 1 root root  962901616 Sep 18 15:40 root-clone-node631-04.tgz
-rw-r--r-- 1 root root 1162097708 Sep 18 15:50 root-clone-node632-01.tgz
-rw-r--r-- 1 root root 5115214979 Sep 18 16:05 root-clone-node633-01.tgz
-rw-r--r-- 1 root root 5115601227 Oct 19 15:02 root-clone-node633-02.tgz

ATTENTION: Make sure you don't overwrite files that you want to keep!

# df -h /mnt/archive/

ATTENTION: Make sure that there is enough space left. If in doubt, compare with the df -h and gfs_tool df / output of the system:

# gfs_tool df /
/:
SB lock proto = "lock_dlm"
SB lock table = "node202:lt_sharedroot"
SB ondisk format = 1309
SB multihost format = 1401
Block size = 4096
Journals = 4
Resource Groups = 118
Mounted lock proto = "lock_dlm"
Mounted lock table = "node202:lt_sharedroot"
Mounted host data = "jid=0:id=196609:first=1"
Journal number = 0
Lock module flags = 0
Local flocks = FALSE
Local caching = FALSE
Oopses OK = FALSE

Type           Total          Used           Free           use%
------------------------------------------------------------------------
inodes         96841          96841          0              100%
metadata       128386         2804           125582         2%
data           7506157        1100451        6405706        15%
# df -h
Filesystem            Size  Used Avail Use% Mounted on
rootfs                 30G  4.6G   25G  16% /
none                  7.8G  160K  7.8G   1% /dev
/dev/vg_local/lv_chroot
                       15G  548M   14G   4% /cdsl.local/var/comoonics/chroot
/dev/mapper/vg_node202_sr-lv_sharedroot
                       30G  4.6G   25G  16% /
/dev/mapper/vg_node202_sr-lv_sharedroot
                       30G  4.6G   25G  16% /cdsl.local
tmpfs                 7.8G     0  7.8G   0% /dev/shm
/dev/vg_local/lv_tmp  3.0G  155M  2.7G   6% /tmp
node601:/data/linux/clones/in/li/
                      299G  240G   60G  81% /mnt/archive
# com-ec -a -x /opt/atix/comoonics-cs/xsl/localclone.xsl archive-clone.xml

root@node202a:/mnt/archive#  com-ec -x /opt/atix/comoonics-cs/xsl/localclone.xsl /root/archive-clone.xml
-------------------com-ec : INFO Start of enterprisecopy node202-local-clone--------------------
-------------------com-ec : INFO Executing all sets 5--------------------
2008-02-07 12:51:05,586 comoonics.enterprisecopy.ComEnterpriseCopy INFO Executing copyset PartitionCopyset(copy-bootdisk:partition)
2008-02-07 12:51:05,701 comoonics.enterprisecopy.ComEnterpriseCopy INFO Executing copyset FilesystemCopyset(copy-bootdisk-filesystem:filesystem)
2008-02-07 12:52:15,014 comoonics.enterprisecopy.ComEnterpriseCopy INFO Executing copyset PartitionCopyset(copy-rootdisk:partition)
2008-02-07 12:52:15,050 comoonics.enterprisecopy.ComEnterpriseCopy INFO Executing copyset LVMCopyset(copy-rootdisk-lvm:lvm)
2008-02-07 12:52:15,934 comoonics.enterprisecopy.ComEnterpriseCopy INFO Executing copyset FilesystemCopyset(copy-rootdisk-filesystem:filesystem)
-------------------com-ec : INFO Finished execution of enterprisecopy node202-local-clone successfully--------------------

Check if files are named correctly:

# ll -h  /mnt/archive/*rhel5u1*
-rw-r--r-- 1 root root 677M Feb  7 12:50 boot-clone-node202-02-rhel5u1.tgz
-rw-r--r-- 1 root root  20K Feb  7 12:50 meta-clone-node202-02-rhel5u1.tar
-rw-r--r-- 1 root root 1.5G Feb  7 13:01 root-clone-node202-02-rhel5u1.tgz

Flush all caches to ensure that all buffers have been written to the remote share:

# sync && sync
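
Optionally, the readability of the freshly written archives can be spot-checked, e.g.:

# tar tzf /mnt/archive/root-clone-node202-02-rhel5u1.tgz > /dev/null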

Restore an old master clone - If you want to bring the cluster back to a previously stored, defined state, you may optionally restore a master clone.

First you have to move the current (or the desired) cluster configuration files to /etc/comoonics/enterprisecopy on the running node.

It is best to boot the comoonics live CD and mount node601:/data under /data, because then you can safely overwrite the non-clone volumes. Detailed information is provided in the com-ec documentation.

The comoonics live CD is attached to the node via iLO with the following command, run from node601:

# /opt/atix/comoonics-fencing/fence_ilo -x /data/services/ec/ILOMapLiveCD.xml -a node202a-rc -l user -p secret

You may use the files that are stored in node601:/data/services/ec/node202 for the following master clone process:

# cp /data/services/ec/node202/hosts_node202 /etc/comoonics/enterprisecopy/hosts_node202
# cp /data/services/ec/node202/fstab_node202 /etc/comoonics/enterprisecopy/fstab_node202
# cp /data/services/ec/node202/cluster_node202.conf /etc/comoonics/enterprisecopy/cluster_node202.conf

ATTENTION: These files will be copied to the destination cluster, so review all of them to make sure they are as intended! Otherwise you risk an unbootable cluster that needs debugging, or you will have to clone once again.

Then you need to modify the masterclone-node202.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE masterclone SYSTEM "masterclone.dtd">
<masterclone source="backup" destination="disk">
<sourcecluster name="node000" suffix=""></sourcecluster>
<destcluster name="node202" nodes="2">
  <node name="node202a" nodeid="1">
    <nic name="bond1" ip="10.0.45.28" gateway="10.0.47.254"/>
    <nic name="bond2" ip="10.226.2.126"/>
  </node>
  <node name="node202b" nodeid="2">
    <nic name="bond1" ip="10.0.45.29" gateway="10.0.47.254"/>
    <nic name="bond2" ip="10.226.2.127"/>
  </node>
</destcluster>
<sourcearchive name="clone-node000" path="/data/linux/clones/node000/">
      <metaarchive name="meta-clone-node000-09.tar"/>
      <rootarchive name="root-clone-node000-09.tgz"/>
      <bootarchive name="boot-clone-node000-09.tgz"/>
</sourcearchive>
<destdisks>
  <bootdisk name="/dev/sda"/>
  <rootdisk name="/dev/sdb"/>
</destdisks>
<kernel version="2.6.9-55.0.2.ELsmp"/>
<dirs><configs name="etc/comoonics/enterprisecopy"/></dirs>
</masterclone>

Now start the cloning process:

# com-ec -ad -x /data/services/ec/masterclone.xsl /etc/comoonics/enterprisecopy/masterclone-node202.xml

Now reboot the system and check if the restore was successful.

Update the cluster - The first step to update the cluster is to check the /etc/sysconfig/rhn/sources configuration. The update process at Hilti is not done via Red Hat Network directly; instead, the downloaded RPMs are stored in separate yum channels on node601 that mark a certain release state. Since we want to update to RHEL 4 U6, the node601 server should fetch all new channels from RHN, and then a new yum repository should be created according to the documentation in the Hilti wiki.

After you have all yum repositories available you may update the server including a new kernel release. This may take a few minutes depending on the number of packages to update:

# up2date -uf

Currently you have to apply the following hotfix:

# rpm -ivh util-linux-mount-hilti-2.12a-17.1.hoi.x86_64.rpm

Now you need to recompile the fibre channel host bus adapter drivers for the newly installed kernels:

# /opt/hp/src/hp_qla2x00src/compile_all_kernels

The next step is to create a new initrd. First mount the boot partition:

# mount /dev/sda1 /boot/
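
The shared-root initrd itself is rebuilt with the mkinitrd wrapper shipped with the comoonics bootimage packages. The exact path and options depend on the installed release; a typical invocation looks roughly like this (the image name is chosen to match the grub stanza below):

# /opt/atix/comoonics-bootimage/mkinitrd -f /boot/initrd_sr-2.6.9-67.0.4.ELsmp.img 2.6.9-67.0.4.ELsmp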

Now add a new boot stanza in /boot/grub/grub.conf:

title Red Hat Enterprise Linux AS (2.6.9-67.0.4.ELsmp sharedroot node202)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-67.0.4.ELsmp rw elevator=deadline
        initrd /initrd_sr-2.6.9-67.0.4.ELsmp.img

Now save and reboot the machine (preferably only one node of the cluster) to test if the update was successful.

The kernel boot options com-debug and com-step, as well as "1" to boot into single user mode, may help if you need to debug the cluster.
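
For a one-off debug boot these options can simply be appended to the kernel line in the GRUB menu at boot time, for example:

kernel /vmlinuz-2.6.9-67.0.4.ELsmp rw elevator=deadline com-step com-debug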

Cluster Shutdown

It is important to maintain quorum while shutting down the cluster. If the cluster becomes inquorate, the remaining nodes freeze and a clean shutdown becomes difficult! Node after node would then have to rejoin the cluster to gain enough votes again.

To reduce the number of votes and remove a node from the cluster, all you need to do is issue the following command on the node that is leaving:

# cman_tool leave remove

To redefine the number of expected votes you may use:

# cman_tool expected -e number_votes
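
The current quorum state, the total and expected votes, and the node membership can be checked at any time with:

# cman_tool status
# cman_tool nodes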

Adjusting iptables rules for Red Hat Cluster Suite

The prerequisite is that iptables is already in place and cluster traffic is currently being rejected.

Use the following shell script for RHEL5. Add the gnbd port to TCP_PORTS if gnbd is in use. If rgmanager is not used, remove the rgmanager ports from TCP_PORTS.

#!/bin/bash

IPTABLES=/sbin/iptables
CLUSTER_INTERFACE=eth0
TCP_PORTS="41966 41967 41968 41969 50006 50008 50009 21064"
UDP_PORTS="50007 5405"

echo -n "Applying iptables rules"
for port in $TCP_PORTS; do
  $IPTABLES -I INPUT  -i $CLUSTER_INTERFACE -p tcp -m tcp --sport $port -j ACCEPT
  $IPTABLES -I INPUT  -i $CLUSTER_INTERFACE -p tcp -m tcp --dport $port -j ACCEPT
  $IPTABLES -I OUTPUT -o $CLUSTER_INTERFACE -p tcp -m tcp --dport $port -j ACCEPT
  $IPTABLES -I OUTPUT -o $CLUSTER_INTERFACE -p tcp -m tcp --sport $port -j ACCEPT
done
for port in $UDP_PORTS; do
  $IPTABLES -I INPUT  -i $CLUSTER_INTERFACE -p udp -m udp --sport $port -j ACCEPT
  $IPTABLES -I INPUT  -i $CLUSTER_INTERFACE -p udp -m udp --dport $port -j ACCEPT
  $IPTABLES -I OUTPUT -o $CLUSTER_INTERFACE -p udp -m udp --dport $port -j ACCEPT
  $IPTABLES -I OUTPUT -o $CLUSTER_INTERFACE -p udp -m udp --sport $port -j ACCEPT
done
echo "[OK]"
echo -n "Saving new rules"
(/etc/init.d/iptables save && \
 echo "[OK]") || echo "[FAILED]"

Script for RHEL4:

#!/bin/bash

IPTABLES=/sbin/iptables
CLUSTER_INTERFACE=eth0
TCP_PORTS="41966 41967 41968 41969 50006 50008 50009 21064"
UDP_PORTS="50007 6809"

echo -n "Applying iptables rules"
for port in $TCP_PORTS; do
  $IPTABLES -I INPUT  -i $CLUSTER_INTERFACE -p tcp -m tcp --sport $port -j ACCEPT
  $IPTABLES -I INPUT  -i $CLUSTER_INTERFACE -p tcp -m tcp --dport $port -j ACCEPT
  $IPTABLES -I OUTPUT -o $CLUSTER_INTERFACE -p tcp -m tcp --dport $port -j ACCEPT
  $IPTABLES -I OUTPUT -o $CLUSTER_INTERFACE -p tcp -m tcp --sport $port -j ACCEPT
done
for port in $UDP_PORTS; do
  $IPTABLES -I INPUT  -i $CLUSTER_INTERFACE -p udp -m udp --sport $port -j ACCEPT
  $IPTABLES -I INPUT  -i $CLUSTER_INTERFACE -p udp -m udp --dport $port -j ACCEPT
  $IPTABLES -I OUTPUT -o $CLUSTER_INTERFACE -p udp -m udp --dport $port -j ACCEPT
  $IPTABLES -I OUTPUT -o $CLUSTER_INTERFACE -p udp -m udp --sport $port -j ACCEPT
done
echo "[OK]"
echo -n "Saving new rules"
(/etc/init.d/iptables save && \
 echo "[OK]") || echo "[FAILED]"
