Final Preparations after Cluster Setup
This documentation lists final preparations that should be done after initial cluster setup.
This documentation describes the final preparations to make after you have set up your cluster. Most of these settings are important for cluster maintenance, so check each of them carefully.
syslog Configuration
It is very important for safe operation of a cluster that all messages from the cluster nodes get channeled to a single monitoring server (grayhead).
- Check that the following line is present in the com_info section of each cluster node in /etc/cluster/cluster.conf:
<syslog name="syslog-server"/>
- Check that 'syslog-server' is resolvable via /etc/hosts
- Check that syslogd on 'syslog-server' is configured via '/etc/sysconfig/syslog' to receive external syslog messages (option -r):
# Options to syslogd
# -m 0 disables 'MARK' messages.
# -r enables logging from remote machines
# -x disables DNS lookups on messages received with -r
# See syslogd(8) for more details
SYSLOGD_OPTIONS="-m 0 -r -s cl.atix -s collocation.atix"
# Options to klogd
# -2 prints all kernel oops messages twice; once for klogd to decode, and
#    once for processing with 'ksymoops'
# -x disables all klogd processing of oops messages entirely
# See klogd(8) for more details
KLOGD_OPTIONS="-x"
- Check that all nodes send their syslog messages to the syslog server. This is configured in '/etc/syslog.conf'; the host name after '@' is the syslog server (realserver4 in this example):
(...)
*.* @realserver4
- Reload the syslog service on all nodes and the syslog server and test your configuration with a sample message:
# logger testmessage
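The forwarding rule can also be checked with a small script. The following is a minimal sketch, not part of the comoonics tools: the function name forwards_to is hypothetical, and it assumes the classic syslog.conf syntax shown above, with the selector *.* followed by @HOST.

```shell
# forwards_to FILE HOST
# Succeeds if FILE contains an uncommented rule "*.* @HOST".
# (Hypothetical helper; HOST is used as part of a regular expression.)
forwards_to() {
    # Drop comment lines first, then look for the forwarding rule.
    grep -v '^[[:space:]]*#' "$1" |
        grep -Eq "^[[:space:]]*\*\.\*[[:space:]]+@$2[[:space:]]*\$"
}
```

On a cluster node it could be called as `forwards_to /etc/syslog.conf syslog-server && echo "forwarding configured"`.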
netconsole Configuration
The syslog setting in /etc/cluster/cluster.conf makes the cluster node log its boot messages to the syslog-server. However, the boot process is hybrid: once the comoonics boot stage has finished, the cluster node must also be configured so that the remainder of the boot process is logged to the syslog-server.
- This is done via netconsole:
# cat /etc/sysconfig/netdump | grep -v \#
SYSLOGADDR=syslog-server
SYSLOGPORT=514
- To send kernel (printk) messages to a remote syslog server, the log level for kernel messages has to be raised. Add the following line to the file '/etc/sysconfig/syslog':
KLOGD_OPTIONS="-x -c8"
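Both files above are plain KEY=value files, so the effective settings can be read back with a short helper. This is a sketch under that assumption; the function name sysconfig_value is made up for illustration.

```shell
# sysconfig_value FILE KEY
# Prints the value of the last uncommented "KEY=value" line in FILE,
# with one pair of surrounding double quotes removed.
# (Hypothetical helper for KEY=value style files.)
sysconfig_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1 | sed 's/^"\(.*\)"$/\1/'
}
```

For example, `sysconfig_value /etc/sysconfig/netdump SYSLOGADDR` should print the configured syslog server, and `sysconfig_value /etc/sysconfig/syslog KLOGD_OPTIONS` should show whether the -c option is set.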
Passwordless Authentication
In an emergency you must be able to access each cluster node with ease. It is therefore useful to install an ssh key whose passphrase (mantra) you have memorized, so that you do not have to look up cryptic passwords.
- To install the key, either use the previously created one or create a new key with:
# ssh-keygen -t dsa
- Append the public key to /root/.ssh/authorized_keys and keep a copy of the public key in the same directory. Make sure that the files are not accessible to other users; otherwise they will not be accepted for authentication.
- Reload sshd:
# service sshd reload
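The permission requirement can be verified with a conservative check such as the sketch below. It assumes GNU stat (the -c option) as found on Linux, and the function name private_to_owner is hypothetical. It is stricter than what sshd itself enforces, since it rejects any group or other access at all.

```shell
# private_to_owner FILE
# Succeeds if neither group nor others have any permission bits on FILE.
# Assumes GNU stat; "stat -c %a" prints the mode in octal, e.g. 600.
private_to_owner() {
    mode=$(stat -c '%a' "$1") || return 1
    # A leading 0 makes the shell read the mode as an octal number;
    # 077 masks the group and other permission bits.
    [ $(( 0$mode & 077 )) -eq 0 ]
}
```

A possible call is `private_to_owner /root/.ssh/authorized_keys || chmod 600 /root/.ssh/authorized_keys`.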
Enable login via serial console
If you have to use a remote console, you usually need to log in via the serial console, because that is what the remote administration device is usually mapped to. The following steps configure your system accordingly.
- Allow ttyS0 as login device by appending ttyS0 to /etc/securetty
# grep S0 /etc/securetty
ttyS0
- Add agetty to /etc/inittab so that it will listen to ttyS0
# grep S0 /etc/inittab
co:2345:respawn:/sbin/agetty ttyS0 115200 vt100-nav
- Add the following settings to your kernel parameters in /boot/grub/menu.lst
# grep ttyS0 /boot/grub/menu.lst
kernel /vmlinuz-2.6.9-78.0.1.ELsmp ro root=/dev/vg_axqa02rc_sr/lv_sharedroot console=tty0 console=ttyS0,115200 com-step com-debug
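The three grep checks above can be combined into one pass. The following sketch uses hypothetical function names and only tests whether an uncommented line mentions ttyS0; it does not validate the rest of the line.

```shell
# mentions_tty FILE [TTY]
# Succeeds if an uncommented line in FILE contains TTY (default: ttyS0).
mentions_tty() {
    grep -v '^[[:space:]]*#' "$1" | grep -q "${2:-ttyS0}"
}

# check_serial_console FILE...
# Prints one "ok:" or "MISSING:" line per file.
check_serial_console() {
    for f in "$@"; do
        if mentions_tty "$f"; then
            echo "ok: $f"
        else
            echo "MISSING: $f"
        fi
    done
}
```

With the files named in the steps above, the call would be `check_serial_console /etc/securetty /etc/inittab /boot/grub/menu.lst`.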
Check the remote administration console
It is very important that you reboot your cluster nodes to verify that all output messages appear on the remote administration console (e.g. ILO). You should also be able to log in via this remote console.
Check your fstab
It is annoying to have to find out where your boot partition resides on a system with many partitions. With devices such as the quorum disk or the devices for the cloned boot, it is difficult to decide which device is the right one. Therefore you should prepare your /etc/fstab so that the right devices are already listed but not mounted automatically.
# cat /etc/fstab | grep boot
/dev/sda1 /boot ext3 noauto,defaults 1 2
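Whether an entry really carries the noauto option can be tested field by field instead of by eye. The helper below is a sketch with a hypothetical name; it assumes the standard fstab layout with the mount point in the second field and the comma-separated options in the fourth.

```shell
# has_mount_option FSTAB MOUNTPOINT OPTION
# Succeeds if the uncommented entry for MOUNTPOINT lists OPTION.
has_mount_option() {
    awk -v mp="$2" -v opt="$3" '
        $1 !~ /^#/ && $2 == mp {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

For the entry shown above, `has_mount_option /etc/fstab /boot noauto` should succeed.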
Check /etc/comoonics/enterprisecopy/*
The comoonics Enterprise Copy tool suite is very convenient for automating various tasks. The default installation ships some configuration files that may need slight modifications to match your hardware setup. For example, the devices need to be adapted depending on whether you use device-mapper multipath. Therefore check the files in /etc/comoonics/enterprisecopy and change the settings according to your configuration.
Check that the ntp server is configured correctly
There should be no time drift in your cluster and you should check that the right time server is configured and that it is reachable from all cluster nodes.
# cat /etc/ntp.conf | grep -v \#
restrict default nomodify notrap noquery
restrict 127.0.0.1
server 10.34.168.50
fudge 127.127.1.0 stratum 10
driftfile /var/lib/ntp/drift
broadcastdelay 0.008
keys /etc/ntp/keys
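To check reachability from every node, it helps to extract the configured time servers first. This is a minimal sketch; the function name ntp_servers is hypothetical, and it only looks at uncommented "server" lines.

```shell
# ntp_servers FILE
# Prints the address of every "server" entry in an ntp.conf style FILE.
# Commented lines are skipped because their first field is not "server".
ntp_servers() {
    awk '$1 == "server" { print $2 }' "$1"
}
```

Each printed address can then be tested from the node, for example with ping, or with `ntpdate -q` if that tool is installed.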
Check the runlevel 3 scripts
Various scripts are started with your Linux server. However, many of them make no sense in a clustered environment. Typical candidates for deactivation are:
isdn, kudzu, autofs, wpa_supplicant, cups
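The candidates can be switched off with chkconfig. The sketch below only prints the commands so that they can be reviewed first; the function name disable_commands is hypothetical, and restricting the change to runlevel 3 is an assumption taken from the heading of this section.

```shell
# disable_commands SERVICE...
# Prints (does not run) one chkconfig command per service that would
# switch the service off in runlevel 3.
disable_commands() {
    for svc in "$@"; do
        echo "chkconfig --level 3 $svc off"
    done
}
```

After reviewing the output of `disable_commands isdn kudzu autofs wpa_supplicant cups`, it can be piped to `sh` to apply the changes.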
kdump Configuration
kdump is the new mechanism on RHEL5 for providing vmcores when the kernel panics. Sometimes a kernel panic is provoked deliberately so that more debugging data can be obtained from a stalled system. It is therefore important to configure kdump on a cluster.
- Modify kdump configuration file
# cat /etc/kdump.conf | grep -v \#
ext3 LABEL=crash
core_collector makedumpfile -d 31
extra_modules cciss ext3 jbd
- Modify /etc/grub.conf. Append crashkernel to the kernel boot parameters:
crashkernel=128M@16M
- Create local ext3 filesystem for crashdumps
# lvcreate -n "LV_CRASH" -L +15G /dev/VG_LOCAL
# mkfs.ext3 /dev/VG_LOCAL/LV_CRASH
- Mount local ext3 filesystem to /var/crash
# cat /etc/fstab | grep CRASH
/dev/VG_LOCAL/LV_CRASH /var/crash ext3 defaults 0 0
- Activate and start kdump
# chkconfig kdump on
# service kdump start
- Test kdump
# echo "1" > /proc/sys/kernel/sysrq
# echo "c" > /proc/sysrq-trigger
- See if vmcore was written to /var/crash
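Before provoking a test panic, it is worth confirming that the running kernel was actually booted with the crashkernel parameter. The helper below is a sketch with a hypothetical name; it reads /proc/cmdline by default and accepts another file for testing.

```shell
# crashkernel_setting [FILE]
# Prints the value of the crashkernel= boot parameter found in FILE
# (default: /proc/cmdline). Prints nothing if the parameter is absent.
crashkernel_setting() {
    # Put each boot parameter on its own line, then strip the key.
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p'
}
```

With the setting from the steps above, `crashkernel_setting` should print 128M@16M after the reboot. An empty result means that the kernel was started without a reserved crash kernel region.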
When local disks are present, relocate your chroot environment
Many sharedroot clusters still have some local disks. These should be used to relocate the chroot environment, so that in an emergency debug information can be written to disk instead of to the ramdisk.
<chrootenv mountpoint = "/var/comoonics/chroot"
fstype = "ext3"
device = "/dev/vg_local/lv_comoonics"
chrootdir = "/var/comoonics/chroot"
/>