2006-01-25
| Revision History | ||
|---|---|---|
| Revision 1.1 | 2007-01-20 | MG |
| Small updates | ||
| Revision 1.0 | 2006-01-25 | MG |
| first official release | ||
| Revision 1.1 | 2006-09-01 | MG |
| adapted to new versions and bugfixes | ||
Abstract
This HOWTO briefly summarizes the steps needed to build a shared root on a GFS filesystem.
Whenever multiple servers share data with each other, and especially when those servers run common applications and are attached to a high-speed network, a cluster filesystem is a good way to share that data. Consequently, it is best to share not only application data but all data that can be shared. That includes the so-called root filesystem.
To achieve this, all servers need a shared resource to put the data on. Typically a high-speed network like “Fibre Channel” or “Gigabit Ethernet” is used.
The section called “Prerequisites” deals with the prerequisites necessary to understand and build a sharedroot with “GFS”.
The section called “Create the cluster configuration” describes the steps to build the cluster configuration and start the dependent services on the source machine.
After the cluster has been set up, the section called “Create the GFS filesystem and build the shared-root” points out all steps to set up “GFS” and to copy and configure the root on that filesystem.
Last but not least, a new initial ramdisk has to be built to boot the nodes into the cluster. The section called “Building the initial ramdisk” describes that process.
After this the cluster can be booted. The section called “Analysing the sharedroot cluster and other important tools” points out some important tools and sources to look at to keep the cluster running and to analyse problems.
This document, mini HOWTO build a sharedroot with GFS, is copyrighted (c) 2002 by Marc Grimme. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is available at http://www.gnu.org/copyleft/fdl.html.
Linux is a registered trademark of Linus Torvalds.
No liability for the contents of this document can be accepted. Use the concepts, examples and information at your own risk. There may be errors and inaccuracies that could be damaging to your system. Proceed with caution; although damage is highly unlikely, the author(s) do not take any responsibility.
All copyrights are held by their respective owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark. Naming of particular products or brands should not be seen as endorsements.
Feedback is most certainly welcome for this document. Send
your additions, comments and criticisms to the following
email address : <grimme (at) atix.de>.
You should have solid knowledge of storage and of how storage networks are built.
You should also have experience with Linux servers and understand the concept of initial ramdisks.
You should know about GFS and how a GFS filesystem is set up.
You should have at least two servers connected to some kind of storage network. Both servers need concurrent access to at least one, better two, logical units (LUNs).
You can also use virtualisation software like VMware or Xen. The same requirements apply there as well.
All servers should be able to boot from the storage network. When using “Fibre Channel” or “Parallel SCSI” that should not be a problem.
When you are using “Ethernet” as storage network, you have to get your servers to boot from it. If you are using “iSCSI”, get bootable “iSCSI HBAs” or else boot via “PXE”. That is the topic of other documentation. For more information see the section called “Further Information”.
It is very important to have a working fencing environment, as GFS requires it as well.
We require that one node is already preinstalled with a decent Linux operating system that supports “GFS”. A RHEL4-compatible distribution works best.
That preinstalled node serves as the template for building the shared root. So set it up as if it already were the sharedroot cluster. It is also best to let it boot from the storage network, so that you can be sure booting from your storage network works and that issue is already solved.
It is also required to have all relevant “gfs,cman,fence,magma” packages from the “clustersuite” Channel installed.
The packages that help to build the sharedroot and the initial ramdisk are also required. They can be downloaded from www.open-sharedroot.org in the files section. The packages are “comoonics-bootimage” and “comoonics-cs”. Try to get the most current version for your system. You can also download directly from here:
- comoonics-bootimage-fenceacksv (optional)
- comoonics-bootimage-fenceclient-ilo (optional)
- comoonics-bootimage-fenceclient-vmware (optional)
- comoonics-py-cs (optional)
- comoonics-py-ec (optional)
We are using five VMware guests (“gfs-node1” to “gfs-node5”) on one VMware host. “gfs-node1 to gfs-node3” boot into the sharedroot “/dev/VG_SHAREDROOT/LV_SHAREDROOT” and “gfs-node4 and gfs-node5” boot into the sharedroot “/dev/VG_SHAREDROOT_OPENSSI/LV_SHAREDROOT_OPENSSI”. Every “sharedroot” consists of two disks, one for the root filesystem and one for the bootdisk. Every node that boots into one of those pools has the bootdisk mapped as “LUN0” and the root as “LUN1”. All nodes have two NICs. The first, “eth0”, is used for locking and cluster communication and the other one, “eth1”, for the applications. “gfs-node1” is the template server. That means it sees all “LUNs” and boots the locally installed Linux that serves as base for the “sharedroot”.
Warning
Although they are the official Red Hat tools, never use “system-config-gfs/system-config-cluster”. The sharedroot needs information in the cluster configuration file that those tools always overwrite.
All cluster configuration options are written to the GFS cluster configuration file /etc/cluster/cluster.conf. The file is plain “XML”, so any editor can be used to edit it.
Let's recall the environment. Altogether we have five nodes involved in the cluster. “gfs-node1” boots from its local filesystem and is used for management purposes and as template for the sharedroot cluster. Later on it can easily be integrated into any sharedroot.
“gfs-node2” and “gfs-node3” will boot into the sharedroot called “VG_SHAREDROOT/LV_SHAREDROOT” and “gfs-node4” and “gfs-node5” will mount “VG_SHAREDROOT_OPENSSI/LV_SHAREDROOT_OPENSSI”. To determine the identity of the nodes, the cluster needs their MAC addresses and maps them to IP addresses. To monitor the boot process, “gfs-node1” is also configured as syslog server.
All this information is collected per node in the cluster configuration under the XML element “com_info” (see Figure 1, “Clusterconfig part for one node”). All other settings in the “com_info” element are self-explanatory.
Figure 1. Clusterconfig part for one node
<clusternode name="gfs-node2" votes="1" nodeid="2">
<com_info>
<syslog name="gfs-node1"/>
<rootvolume name="/dev/VG_SHAREDROOT/LV_SHAREDROOT"/>
<eth name="eth0" mac="00:0C:29:3C:16:07" ip="10.0.0.2" mask="255.255.255.0" gateway=""/>
</com_info>
...
</clusternode>
The next step is to create a cluster configuration for all nodes. That configuration can also be changed at any time afterwards. Figure 2, “The cluster configuration file” depicts the cluster configuration for our test cluster. In order to keep it simple some nodes are left out.
Note
Don't forget to check that every node listed under clusternode has an entry in the file /etc/hosts. The boot process also checks this automatically.
Figure 2. The cluster configuration file
<?xml version="1.0"?>
<cluster config_version="20" name="vmware_cluster">
<cman expected_votes="1" two_node="0">
</cman>
<fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="gfs-node1" votes="1" nodeid="1">
<com_info>
<syslog name="gfs-node1"/>
<rootvolume name="/dev/VG_SHAREDROOT/LV_SHAREDROOT"/>
<eth name="eth0" mac="00:0C:29:3B:XX:XX" ip="10.0.0.1" mask="255.255.255.0" gateway=""/>
<fenceackserver user="root" passwd="XXX"/>
</com_info>
<fence>
<method name="1">
<device name="fence_vmware_client" cfgfile="/mnt/data/vmware/GFS-Node-1/GFS-Node-1.vmx"/>
</method>
<method name="2">
<device name="fence_manual" nodename="gfs-node1"/>
</method>
</fence>
</clusternode>
<clusternode name="gfs-node2" votes="1" nodeid="2">
<com_info>
<syslog name="gfs-node1"/>
<rootvolume name="/dev/VG_SHAREDROOT/LV_SHAREDROOT"/>
<eth name="eth0" mac="00:0C:29:3C:XX:XX" ip="10.0.0.2" mask="255.255.255.0" gateway=""/>
<fenceackserver user="root" passwd="XXX"/>
</com_info>
<fence>
<method name="1">
<device name="fence_vmware_client" cfgfile="/mnt/data/vmware/GFS-Node-2/GFS-Node-2.vmx"/>
</method>
<method name="2">
<device name="fence_manual" nodename="gfs-node2"/>
</method>
</fence>
</clusternode>
...
<clusternode name="gfs-node4" votes="1" nodeid="4">
<com_info>
<syslog name="gfs-node1"/>
<rootvolume name="/dev/VG_SHAREDROOT_OPENSSI/LV_SHAREDROOT_OPENSSI"/>
<eth name="eth0" mac="00:0C:29:BC:XX:XX" ip="10.0.0.4" mask="255.255.255.0" gateway=""/>
<fenceackserver user="root" passwd="XXX"/>
</com_info>
<fence>
<method name="1">
<device name="fence_vmware_client" cfgfile="/mnt/data/vmware/GFS-Node-4/GFS-Node-4.vmx"/>
</method>
<method name="2">
<device name="fence_manual" nodename="gfs-node4"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice agent="fence_manual" name="fence_manual"/>
<fencedevice agent="/opt/atix/comoonics-fencing/fence_vmware_client" name="fence_vmware_client"
hostname="generix" username="user_for_vmwareconsole" password="the_password"
identityfile="ssh_id_file" verbose="on"
fence_vmware_master_cmd="/opt/atix/comoonics-fencing/fence_vmware_master"
/>
</fencedevices>
<rm>
<failoverdomains/>
<resources/>
</rm>
</cluster>
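The /etc/hosts check from the note above can also be done by hand before the first boot. The following is only a minimal sketch: the function name check_cluster_hosts is hypothetical and not part of the comoonics tools, and the sed expression assumes that every clusternode start tag sits on a single line, as in Figure 2.

```shell
#!/bin/sh
# Hypothetical helper: report every clusternode name from a cluster.conf-style
# file that has no entry in a hosts file.
# Usage: check_cluster_hosts CLUSTER_CONF HOSTS_FILE
check_cluster_hosts() {
    conf=$1
    hosts=$2
    missing=0
    # Extract the name="..." attribute of every <clusternode ...> start tag.
    for node in $(sed -n 's/.*<clusternode[^>]*[[:space:]]name="\([^"]*\)".*/\1/p' "$conf"); do
        # A hosts line is "address name [alias...]"; compare whole fields only
        # and skip comment lines.
        if ! awk -v n="$node" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }' "$hosts"; then
            echo "missing in $hosts: $node"
            missing=1
        fi
    done
    return $missing
}
```

On the template node it would be called as check_cluster_hosts /etc/cluster/cluster.conf /etc/hosts. It prints one line per missing node and returns a non-zero code if any node is missing.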
The next step is to start the ccs daemon with its initscript /etc/init.d/ccsd. Run /etc/init.d/ccsd start and check the response in the syslog (see Figure 3, “The syslog of a successfully started ccsd” for a successful start).
Figure 3. The syslog of a successfully started ccsd
Jan 26 15:07:30 gfs-node1 ccsd[3259]: Starting ccsd 1.0.2:
Jan 26 15:07:30 gfs-node1 ccsd[3259]: Built: Nov 20 2005 18:03:50
Jan 26 15:07:30 gfs-node1 ccsd[3259]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jan 26 15:07:30 gfs-node1 ccsd[3259]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
Jan 26 15:07:30 gfs-node1 ccsd[3259]: Initial status:: Quorate
Jan 26 15:07:31 gfs-node1 ccsd: startup succeeded
Jan 26 15:07:35 gfs-node1 ccsd[3259]: cluster.conf (cluster name = vmware_cluster, version = 20) found.
Jan 26 15:07:35 gfs-node1 ccsd[3259]: Remote copy of cluster.conf is from quorate node.
Jan 26 15:07:35 gfs-node1 ccsd[3259]: Local version # : 20
Jan 26 15:07:35 gfs-node1 ccsd[3259]: Remote version #: 20
The next step is to start the cluster manager “cman” with the command /etc/init.d/cman start. This implicitly runs “cman_tool join” and loads the appropriate lock module.
Note
Only dlm is supported for the shared root.
For help on “cman_tool” have a look at the manpage. For a successful start of the cluster manager with /etc/init.d/cman start see Figure 4, “The syslog of a successfully started cluster manager”. In this example the already running nodes can be ignored.
Figure 4. The syslog of a successfully started cluster manager
Jan 26 15:24:12 gfs-node1 kernel: CMAN <CVS> (built Jan 23 2006 23:21:04) installed
Jan 26 15:24:12 gfs-node1 kernel: CMAN: Waiting to join or form a Linux-cluster
Jan 26 15:24:13 gfs-node1 ccsd[3259]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
Jan 26 15:24:13 gfs-node1 ccsd[3259]: Initial status:: Inquorate
Jan 26 15:24:14 gfs-node1 kernel: CMAN: sending membership request
Jan 26 15:24:14 gfs-node1 kernel: CMAN: sending membership request
Jan 26 15:24:14 gfs-node1 kernel: CMAN: got node gfs-node4
Jan 26 15:24:14 gfs-node1 kernel: CMAN: got node gfs-node5
Jan 26 15:24:14 gfs-node1 kernel: CMAN: quorum regained, resuming activity
Jan 26 15:24:14 gfs-node1 ccsd[3259]: Cluster is quorate. Allowing connections.
Jan 26 15:24:14 gfs-node1 kernel: DLM 2.6.9-37.9 (built Nov 20 2005 17:57:43) installed
Jan 26 15:24:14 gfs-node1 cman: startup succeeded
As already stated, the template node is to be set up like any other node in the cluster, so the automatic start of “cman” at boot time must be disabled. That can be done with the command chkconfig cman off or any comparable tool of your distribution.
Note
The node already joins the cluster in the initial ramdisk, so it should not rejoin.
Warning
Make absolutely sure that “cman” is disabled from starting at boot time. Otherwise the cluster will panic and be fenced during boot.
Before the GFS filesystem is set up, the fence daemon has to be started. Also never forget to test the fencing. The initscript call /etc/init.d/fenced start starts fencing (see Figure 5, “The syslog of a successfully started fenced”).
Figure 5. The syslog of a successfully started fenced
Jan 26 15:36:55 gfs-node1 fenced: startup succeeded
The next step is to test whether fencing works as expected. This can be done by executing the command “fence_node” followed by the node name defined in the cluster configuration. Whenever you call fence_node gfs-node2, it should always return successfully (with return code 0) and the target node should show the fencing effect. That means it should reboot or be denied access to the shared storage.
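The return-code check described above could be wrapped in a small script. This is a sketch under assumptions: fence_node is expected on the PATH, while the function test_fencing and the variable FENCE_CMD are hypothetical names introduced here so that the wrapper can be tried without actually fencing a node.

```shell
#!/bin/sh
# Hypothetical wrapper: fence one node and report the return code.
# FENCE_CMD defaults to the real fence_node; override it for a dry run,
# for example FENCE_CMD=true.
FENCE_CMD=${FENCE_CMD:-fence_node}

test_fencing() {
    node=$1
    if $FENCE_CMD "$node"; then
        echo "fencing $node: OK (return code 0)"
    else
        rc=$?
        echo "fencing $node: FAILED (return code $rc)"
        return 1
    fi
}
```

Keep in mind that test_fencing gfs-node2 with the default command really fences gfs-node2, so the node reboots or loses access to the shared storage.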
As already stated, the template node is to be set up like any other node in the cluster, so the automatic start of “fenced” at boot time must be disabled. That can be done with the command chkconfig fenced off. Fencing is started in the initrd and afterwards restarted in a change root.
Note
Fencing directly on the shared root cannot work, because at fence time the root is frozen for I/O, so the fencing commands cannot be executed and the cluster will die. So in a shared-root environment the initrd prepares for fencing, copies the files onto a ramdisk, and starts the fence daemon on that root.
The fence daemon in the changeroot environment is started by the initscript “/etc/init.d/fenced-chroot”, which comes from the rpm “comoonics-bootimage”. Don't start it now, because you don't have the changeroot for fenced yet. Just check that it starts automatically (chkconfig --list fenced-chroot).
All other services like “clvmd” and the resource group manager “/etc/init.d/rgmanager” are optional, but should be enabled if unsure.
Also think about disabling the “GFS” service. If you do so, rebooting should be possible. But be aware that the “GFS” service must not appear in any /etc/rc directory. With “Red Hat” the command is chkconfig --del gfs.
The next step is to create the “GFS” filesystem. As the Linux volume manager “LVM” is capable of running in a clustered environment, it is the optimal base to put “GFS” on and is described in the following.
So first we need to label the disk(s) we want to put “GFS” on as “LVM” disks. The command pvcreate disk does that for us. Then the volume group for the system has to be created on that physical disk. This is done by the command vgcreate vg_name disk+. Now there is a newly created volume group and we still have to create a logical volume in it. The command lvcreate -n lv_name -L size vg_name creates a new logical volume.
In our environment we take disk “/dev/sdc” to be the physical disk and allocate the whole available storage on logical volume “LV_SHAREDROOT” within “VG_SHAREDROOT”. See Figure 6, “Creating the volume management” for the commands.
Figure 6. Creating the volume management
[root@gfs-node1 ~]# pvcreate /dev/sdc
/dev/cdrom: open failed: Read-only file system
Attempt to close device '/dev/cdrom' which is not open.
Physical volume "/dev/sdc" successfully created
[root@gfs-node1 ~]# vgcreate VG_SHAREDROOT /dev/sdc
Volume group "VG_SHAREDROOT" successfully created
[root@gfs-node1 ~]# vgdisplay VG_SHAREDROOT
--- Volume group ---
VG Name               VG_SHAREDROOT
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  1
VG Access             read/write
VG Status             resizable
Clustered             yes
Shared                no
MAX LV                0
Cur LV                0
Open LV               0
Max PV                0
Cur PV                1
Act PV                1
VG Size               10.00 GB
PE Size               4.00 MB
Total PE              2559
Alloc PE / Size       0 / 0
Free PE / Size        2559 / 10.00 GB
VG UUID               uE0c4p-uRrg-EZy8-tPFE-SjmE-mH1J-oQohZt
[root@gfs-node1 ~]# lvcreate -l 2559 -n LV_SHAREDROOT VG_SHAREDROOT
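In Figure 6 the extent count 2559 passed to lvcreate -l is read by hand from the “Total PE” line of the vgdisplay output (2559 extents of 4 MB are about 10 GB). As a rough sketch, that step could be scripted as follows. The function name lvcreate_all_extents is hypothetical, the volume group is assumed to be empty so that total and free extents are equal, and the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: read vgdisplay output on stdin and print the lvcreate
# call that allocates all physical extents of the volume group.
# Usage: vgdisplay VG_NAME | lvcreate_all_extents LV_NAME VG_NAME
lvcreate_all_extents() {
    lv=$1
    vg=$2
    # The extent count is the last field of the "Total PE" line.
    total=$(awk '/Total PE/ { print $NF }')
    [ -n "$total" ] || { echo "no 'Total PE' line found" >&2; return 1; }
    echo "lvcreate -l $total -n $lv $vg"
}
```

With the output shown in Figure 6, vgdisplay VG_SHAREDROOT | lvcreate_all_extents LV_SHAREDROOT VG_SHAREDROOT prints the same lvcreate call that the figure ends with.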
Now we need to create a filesystem on the logical volume “/dev/VG_SHAREDROOT/LV_SHAREDROOT”. For this step we need to know how many nodes will mount the filesystem, what name we give to the locktable, and what lock protocol we use.
In our case the filesystem is created for 8 nodes (one journal per node), the lock protocol is “lock_dlm” and the locktable is called “lt_sharedroot”. The locktable parameter is given in combination with the cluster name, which is “vmware_cluster” in our example (see Figure 7, “Create the filesystem”). Then we mount it on /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT. You can mount it wherever you want. Just change the path in the commands.
Figure 7. Create the filesystem
[root@gfs-node1 ~]# gfs_mkfs -j 8 -t vmware_cluster:lt_sharedroot -p lock_dlm /dev/VG_SHAREDROOT/LV_SHAREDROOT
This will destroy any data on /dev/VG_SHAREDROOT/LV_SHAREDROOT.
It appears to contain a GFS filesystem.
Are you sure you want to proceed? [y/n] y
Device: /dev/VG_SHAREDROOT/LV_SHAREDROOT
Blocksize: 4096
Filesystem Size: 1834796
Journals: 8
Resource Groups: 28
Locking Protocol: lock_dlm
Lock Table: vmware_cluster:lt_sharedroot
Syncing...
All Done
[root@gfs-node1 ~]# mount -t gfs /dev/VG_SHAREDROOT/LV_SHAREDROOT /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT/
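The cluster name in the -t argument has to match the name attribute of the cluster element in cluster.conf. The sketch below shows one way to avoid typing it twice. Both function names are hypothetical, the sed expression assumes the cluster start tag sits on one line as in Figure 2, and the gfs_mkfs call is only printed, using the options from Figure 7.

```shell
#!/bin/sh
# Hypothetical helpers, not part of GFS or the comoonics tools.

# Print the name attribute of the <cluster> element of a cluster.conf file.
cluster_name() {
    sed -n 's/.*<cluster[[:space:]][^>]*name="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Print (do not run) the gfs_mkfs call for the given journal count,
# cluster name, locktable name and block device.
gfs_mkfs_cmd() {
    echo "gfs_mkfs -j $1 -t $2:$3 -p lock_dlm $4"
}

# Values from the example environment of this document.
gfs_mkfs_cmd 8 vmware_cluster lt_sharedroot /dev/VG_SHAREDROOT/LV_SHAREDROOT
```

On a configured node the second argument could be replaced by "$(cluster_name /etc/cluster/cluster.conf)".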
As the next step we copy all data from the template system's root filesystem to the directory where our new GFS filesystem is mounted. Use whatever copy tool you like, and copy neither /sys nor /proc nor /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT/. In this example we can simply use cp -ax, as in Figure 8, “Copy to the sharedroot”. The -x option keeps cp on the root filesystem, so the contents of other mounted filesystems are skipped.
Figure 8. Copy to the sharedroot
[root@gfs-node1 ~]# cp -ax / /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT/
[root@gfs-node1 ~]# ls -l /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT/
total 140
drwxr-xr-x 2 root root 2048 Jan 21 04:03 bin
drwxr-xr-x 2 root root 3864 Jan 27 13:59 boot
drwxr-xr-x 3 root root 3864 Jan 18 15:36 cluster
drwxr-xr-x 2 root root 3864 Jan 27 14:01 dev
drwxr-xr-x 62 root root 2048 Jan 27 13:52 etc
drwxr-xr-x 2 root root 3864 Feb 22 2005 home
drwxr-xr-x 2 root root 3864 Feb 22 2005 initrd
drwxr-xr-x 12 root root 2048 Jan 21 04:03 lib
drwx------ 2 root root 3864 Jan 18 15:28 lost+found
drwxr-xr-x 4 root root 3864 Jan 27 12:10 media
drwxr-xr-x 2 root root 3864 Oct 7 15:03 misc
drwxr-xr-x 3 root root 3864 Jan 18 15:18 mnt
drwxr-xr-x 3 root root 3864 Jan 18 15:17 opt
drwxr-xr-x 2 root root 3864 Jan 27 14:01 proc
drwxr-x--- 5 root root 3864 Jan 26 15:37 root
drwxr-xr-x 2 root root 2048 Jan 21 04:03 sbin
drwxr-xr-x 2 root root 3864 Jan 27 14:03 selinux
drwxr-xr-x 2 root root 3864 Feb 22 2005 srv
drwxr-xr-x 2 root root 3864 Jan 27 13:59 sys
drwxrwxrwt 13 root root 3864 Jan 27 13:59 tmp
drwxr-xr-x 14 root root 3864 Jan 20 10:37 usr
drwxr-xr-x 18 root root 3864 Jan 18 14:30 var
The next and most important step is to set up the structure for the hostdependent files and make the non-shareable parts local for all the nodes. The “open-sharedroot” project provides tools that help to create that filesystem structure. By concept all hostdependent files are put under the tree /cluster/cdsl/{hostname}, and if shared files or directories are needed inside hostdependent directories, they are found in /cluster/shared. This structure is set up automatically by the script com_create_cdsl. That command extracts all nodes from the cluster configuration and creates the hostdependent file structure. If you then want to create a hostdependent or shared file, com_create_hostdependent_file and com_create_shared_file are the tools to help you. Another nice feature is that every hostdependent file created or moved with these scripts is also placed under /cluster/cdsl/default/{filename}, so that you can easily copy the directory /cluster/cdsl/default to /cluster/cdsl/{hostname}. You'll see more in the following example.
So the first thing we need to do is create the filesystem structure for the hostdependent files, as shown in Figure 9.
Figure 9.
[root@gfs-node1 ~]# com_create_cdsl -n -r /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT /
Creating hostdir "//cluster/cdsl/gfs-node1/"..(OK)
Creating hostdir "//cluster/cdsl/gfs-node2/"..(OK)
Creating hostdir "//cluster/cdsl/gfs-node3/"..(OK)
Creating hostdir "//cluster/cdsl/gfs-node4/"..(OK)
Creating hostdir "//cluster/cdsl/gfs-node5/"..(OK)
Creating hostdir "//cluster/cdsl/gfs-node6/"..(OK)
Creating local dir "/cdsl.local"..(OK)
[root@gfs-node1 ~]# mount --bind /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT/cluster/cdsl/1 /cdsl.local
[root@gfs-node1 ~]#
The next step is one on which many people have different opinions: which files have to be hostdependent and which don't. Our example cannot hold for every configuration. But experience shows that /var has to be completely hostdependent except for /var/lib, which can be shared again. /tmp should also be hostdependent, but see the section called “The /tmp directory”. Last but not least, /etc/mtab has to be hostdependent, as every node stores its mounted filesystems there. As it is recreated while the system boots, it is difficult to make it hostdependent: the file, or rather the link, is removed at boot time and created anew as a regular file. To work around this, just make it a symbolic link to /proc/mounts. Also don't forget hostdependent configuration files, for example those for network adapters. Another very important file is /etc/fstab. It should hold the new root and the other filesystems, adapted to your needs. And always try to keep configuration files shareable, which is possible most of the time.
Last but not least, don't forget to remove the configuration for the network adapter that the initial ramdisk already initializes for the cluster. That adapter should stay up all the time to allow reboots and shutdowns without fencing.
As /tmp is a directory that should be hostdependent on the one hand, while its data is only important during runtime on the other hand, it could also be put on a temporary filesystem or, if available, on local disks. There seem to be problems with some applications that rely on access time modifications in /tmp. As GFS by default does not modify access times, for good reason, it could be helpful to put /tmp on a RAM filesystem or on a local disk.
Figure 10, “Build files on shared root” shows the steps in our example.
Figure 10. Build files on shared root
[root@gfs-node1 grub]# mkdir /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT/var/lib/fence_tool
[root@gfs-node1 ~]# ls -l /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT/
total 144
total 150
drwxr-xr-x 2 root root 2048 Sep 11 2005 bin
drwxr-xr-x 4 root root 1024 May 12 15:06 boot
drwxr-xr-x 4 root root 3864 Oct 5 2005 cdsl.local
drwxr-xr-x 5 root root 3864 Oct 5 2005 cluster
-rw-r--r-- 1 root root 0 Sep 28 2005 comoonics-boot.log
drwxr-xr-x 14 root root 5880 May 12 15:03 dev
drwxr-xr-x 74 root root 2048 May 12 17:14 etc
drwxr-xr-x 2 root root 3864 May 9 16:42 home
drwx------ 2 root root 0 May 12 14:02 initrd
drwxr-xr-x 9 root root 2048 Sep 10 2005 lib
drwxr-xr-x 7 root root 2048 Sep 11 2005 lib64
drwx------ 2 root root 3864 Sep 9 2005 lost+found
drwxr-xr-x 6 root root 3864 Sep 30 2005 media
drwxr-xr-x 2 root root 3864 Dec 2 16:40 misc
drwxr-xr-x 3 root root 3864 Sep 9 2005 mnt
drwxr-xr-x 5 root root 3864 Sep 10 2005 opt
dr-xr-xr-x 183 root root 0 May 12 14:01 proc
drwxr-x--- 10 root root 3864 May 12 17:14 root
drwxr-xr-x 2 root root 2048 Sep 11 2005 sbin
drwxr-xr-x 2 root root 3864 Sep 30 2005 scratch
drwxr-xr-x 2 root root 3864 Sep 9 2005 selinux
drwxr-xr-x 2 root root 3864 Aug 12 2004 srv
drwxr-xr-x 9 root root 0 May 12 14:01 sys
drwxr-xr-x 8 root root 4096 May 15 04:02 tmp
drwxr-xr-x 16 root root 3864 Sep 10 2005 usr
lrwxrwxrwx 1 root root 15 Oct 5 2005 var -> cdsl.local//var
drwxr-xr-x 21 root root 3864 Sep 10 2005 var.orig
[root@gfs-node1 ~]# chroot /cluster/mount/VG_SHAREDROOT-LV_SHAREDROOT
[root@gfs-node1 /]# com_create_hostdependent_file -F gfs -n /var
Creating/Copying the file to all hosts...
Copying "//var" => "///cluster/cdsl/default//var"...(OK)
Copying "//var" => "///cluster/cdsl/gfs-node1//var"...(OK)
Copying "//var" => "///cluster/cdsl/gfs-node2//var"...(OK)
Copying "//var" => "///cluster/cdsl/gfs-node3//var"...(OK)
Copying "//var" => "///cluster/cdsl/gfs-node4//var"...(OK)
Copying "//var" => "///cluster/cdsl/gfs-node5//var"...(OK)
Copying "//var" => "///cluster/cdsl/gfs-node6//var"...(OK)
(DONE)
Moving //var => //var.orig...(OK)
Updating cdls structure...(OK)
[root@gfs-node1 /]# com_create_shared_file -F gfs -n /var/lib
Removing/backing up the file to all hosts...
Backing up "///cluster/cdsl/gfs-node1//var/lib" => "///cluster/cdsl/gfs-node1//var/lib.orig" ...(OK)
Backing up "///cluster/cdsl/gfs-node2//var/lib" => "///cluster/cdsl/gfs-node2//var/lib.orig" ...(OK)
Backing up "///cluster/cdsl/gfs-node3//var/lib" => "///cluster/cdsl/gfs-node3//var/lib.orig" ...(OK)
(DONE)
Moving /var/lib to shared tree => /cluster/shared//var/lib...(OK)
Updating cdls structure...
Updateing ../../../cluster/shared//var/lib ..(OK)
Updateing ../../../cluster/shared//var/lib ..(OK)
Updateing ../../../cluster/shared//var/lib ..(OK)
(DONE)
[root@gfs-node1 /]# ls -l
total 150
drwxr-xr-x 2 root root 2048 Sep 11 2005 bin
drwxr-xr-x 4 root root 1024 May 12 15:06 boot
drwxr-xr-x 4 root root 3864 Oct 5 2005 cdsl.local
drwxr-xr-x 5 root root 3864 Oct 5 2005 cluster
-rw-r--r-- 1 root root 0 Sep 28 2005 comoonics-boot.log
drwxr-xr-x 14 root root 5880 May 12 15:03 dev
drwxr-xr-x 74 root root 2048 May 12 17:14 etc
drwxr-xr-x 2 root root 3864 May 9 16:42 home
drwx------ 2 root root 0 May 12 14:02 initrd
drwxr-xr-x 9 root root 2048 Sep 10 2005 lib
drwxr-xr-x 7 root root 2048 Sep 11 2005 lib64
drwx------ 2 root root 3864 Sep 9 2005 lost+found
drwxr-xr-x 6 root root 3864 Sep 30 2005 media
drwxr-xr-x 2 root root 3864 Dec 2 16:40 misc
drwxr-xr-x 3 root root 3864 Sep 9 2005 mnt
drwxr-xr-x 5 root root 3864 Sep 10 2005 opt
dr-xr-xr-x 183 root root 0 May 12 14:01 proc
drwxr-x--- 10 root root 3864 May 12 17:14 root
drwxr-xr-x 2 root root 2048 Sep 11 2005 sbin
drwxr-xr-x 2 root root 3864 Sep 30 2005 scratch
drwxr-xr-x 2 root root 3864 Sep 9 2005 selinux
drwxr-xr-x 2 root root 3864 Aug 12 2004 srv
drwxr-xr-x 9 root root 0 May 12 14:01 sys
drwxr-xr-x 8 root root 4096 May 15 04:02 tmp
drwxr-xr-x 16 root root 3864 Sep 10 2005 usr
lrwxrwxrwx 1 root root 15 Oct 5 2005 var -> cdsl.local//var
drwxr-xr-x 21 root root 3864 Sep 10 2005 var.orig
[root@gfs-node1 /]# ls var
account cache crash db empty lib lib.orig local lock log mail nis opt preserve run spool tmp yp
[root@gfs-node1 /]# ls -l var
lrwxrwxrwx 1 root root 15 Jan 28 18:25 var -> cdsl.local//var
[root@gfs-node1 /]# ls -l /cluster/
total 12
drwxr-xr-x 6 root root 3864 Jan 28 18:25 cdsl
drwxr-xr-x 7 root root 3864 Jan 19 10:03 mount
drwxr-xr-x 3 root root 3864 Jan 28 18:26 shared
[root@gfs-node1 /]# ls -l /cluster/shared/var/lib/
total 52
drwxr-xr-x 2 root root 3864 Jan 18 15:37 alternatives
drwxr-xr-x 2 root root 3864 Jan 18 14:36 dhcp
drwxr-x--- 2 root root 3864 Feb 22 2005 dhcpv6
drwxr-xr-x 2 root root 3864 Feb 22 2005 games
-rw-r--r-- 1 root root 1138 Jan 27 13:16 logrotate.status
drwxr-xr-x 2 root root 3864 Jan 18 16:50 misc
drwxr-xr-x 4 root root 3864 Jan 18 14:31 nfs
drwxr-xr-x 2 root root 3864 Apr 9 2005 pcmcia
-rw------- 1 root root 512 Jan 27 12:09 random-seed
drwxr-xr-x 2 rpm rpm 3864 Jan 27 13:16 rpm
drwxr-x--- 2 root slocate 3864 Aug 22 04:26 slocate
-rw-r--r-- 1 root root 95 Oct 9 14:51 supportinfo
drwxr-xr-x 2 root root 3864 Oct 9 11:44 up2date
[root@gfs-node1 /]# ls -l /cluster/cdsl/gfs-node2/var/
total 72
drwxr-xr-x 2 root root 3864 Jan 18 14:30 account
drwxr-xr-x 5 root root 3864 Jan 18 14:31 cache
drwxr-xr-x 3 netdump netdump 3864 Jan 18 14:30 crash
drwxr-xr-x 3 root root 3864 Jan 18 14:30 db
drwxr-xr-x 3 root root 3864 Jan 18 14:30 empty
lrwxrwxrwx 1 root root 35 Jan 28 18:26 lib -> ../../../../cluster/shared//var/lib
drwxr-xr-x 12 root root 3864 Jan 18 14:36 lib.orig
drwxr-xr-x 2 root root 3864 Feb 22 2005 local
drwxrwxr-x 5 root lock 3864 Jan 27 13:16 lock
drwxr-xr-x 8 root root 3864 Jan 22 04:02 log
lrwxrwxrwx 1 root root 10 Jan 28 18:24 mail -> spool/mail
drwxr-xr-x 2 root root 3864 Feb 22 2005 nis
drwxr-xr-x 2 root root 3864 Feb 22 2005 opt
drwxr-xr-x 2 root root 3864 Feb 22 2005 preserve
drwxr-xr-x 14 root root 3864 Jan 27 12:10 run
drwxr-xr-x 15 root root 3864 Jan 18 15:37 spool
drwxrwxrwt 3 root root 3864 Jan 25 17:36 tmp
drwxr-xr-x 3 root root 3864 Jan 18 14:31 yp
[root@gfs-node1 /]# ls -l /cluster/cdsl/gfs-node2/var/lib/
total 52
drwxr-xr-x 2 root root 3864 Jan 18 15:37 alternatives
drwxr-xr-x 2 root root 3864 Jan 18 14:36 dhcp
drwxr-x--- 2 root root 3864 Feb 22 2005 dhcpv6
drwxr-xr-x 2 root root 3864 Feb 22 2005 games
-rw-r--r-- 1 root root 1138 Jan 27 13:16 logrotate.status
drwxr-xr-x 2 root root 3864 Jan 18 16:50 misc
drwxr-xr-x 4 root root 3864 Jan 18 14:31 nfs
drwxr-xr-x 2 root root 3864 Apr 9 2005 pcmcia
-rw------- 1 root root 512 Jan 27 12:09 random-seed
drwxr-xr-x 2 rpm rpm 3864 Jan 27 13:16 rpm
drwxr-x--- 2 root slocate 3864 Aug 22 04:26 slocate
-rw-r--r-- 1 root root 95 Oct 9 14:51 supportinfo
drwxr-xr-x 2 root root 3864 Oct 9 11:44 up2date
[root@gfs-node1 /]# cd etc/
[root@gfs-node1 etc]# ls -l mtab
-rw-r--r-- 1 root root 408 Jan 27 13:52 mtab
[root@gfs-node1 etc]# rm mtab
rm: remove regular file `mtab'? y
[root@gfs-node1 etc]# ln -s /proc/mounts mtab
[root@gfs-node1 etc]# cd etc/sysconfig
bash: cd: etc/sysconfig: No such file or directory
[root@gfs-node1 etc]# cd sysconfig
[root@gfs-node1 sysconfig]# com_create_hostdependent_file /etc/sysconfig/network
Creating/Copying the file to all hosts...
Copying "//etc/sysconfig/network" => "///cluster/cdsl/default//etc/sysconfig/network"...(OK)
Copying "//etc/sysconfig/network" => "///cluster/cdsl/gfs-node1//etc/sysconfig/network"...(OK)
Copying "//etc/sysconfig/network" => "///cluster/cdsl/gfs-node2//etc/sysconfig/network"...(OK)
Copying "//etc/sysconfig/network" => "///cluster/cdsl/gfs-node3//etc/sysconfig/network"...(OK)
Copying "//etc/sysconfig/network" => "///cluster/cdsl/gfs-node4//etc/sysconfig/network"...(OK)
Copying "//etc/sysconfig/network" => "///cluster/cdsl/gfs-node5//etc/sysconfig/network"...(OK)
Copying "//etc/sysconfig/network" => "///cluster/cdsl/gfs-node6//etc/sysconfig/network"...(OK)
(DONE)
Moving //etc/sysconfig/network => //etc/sysconfig/network.orig...(OK)
Updating cdls structure...(OK)
[root@gfs-node1 sysconfig]# rm network-scripts/ifcfg-eth0
rm: remove regular file `network-scripts/ifcfg-eth0'? y
For now it is still necessary to build a standard device filesystem hierarchy. If it is missing, the init process will fail. Please check, and copy the files if they are missing. In future versions of the comoonics-bootimage tools this will not be necessary any more.
Don't forget to adapt the fstab to the new filesystem layout. That means changing the root filesystem entry so that no filesystem check is done and the filesystem type is gfs. Also think about setting the noatime and nodiratime mount options.
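As an illustration of the fstab change described above, the sketch below prints a root entry for the example volume. The function name gfs_root_fstab_line is hypothetical. The two trailing zeros are the dump and fsck pass fields of fstab, and a pass number of 0 means that no filesystem check is run at boot.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line for a GFS root filesystem.
# Fields: device, mountpoint, type, options, dump (0), fsck pass (0 = no check).
gfs_root_fstab_line() {
    printf '%s\t/\tgfs\tnoatime,nodiratime\t0 0\n' "$1"
}

# Root volume of the example cluster in this document.
gfs_root_fstab_line /dev/VG_SHAREDROOT/LV_SHAREDROOT
```

The printed line would replace the existing root entry in the etc/fstab of the shared root. Any further mount options your setup needs are not covered by this sketch.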
The initial ramdisk needs to start some services in advance, and others that are enabled by default need to be disabled. In the following, all relevant services are listed in order of execution at boot time, with on or off meaning chkconfig servicename on|off.
- bootsr: on, source: comoonics-bootimage
Builds the chroot for fenced and sets the drop_count, either generically or as requested. Controlled by /etc/sysconfig/cluster.
- ccsd-chroot: on, source: comoonics-bootimage
Init script like the one for ccsd, but it starts ccsd in the chroot built by bootsr.
- ccsd: off, source: ccsd
Normal init script to start ccsd.
- cman: off, source: cman
Would try to rejoin the cluster. The node is already part of the cluster and rejoining would mean getting fenced, so disable this service.
- fenced: off, source: fence
Disabled because in a shared root fenced has to be started in a separate chroot.
- fenced-chroot: on, source: comoonics-bootimage
Starts fenced in a chroot prepared by bootsr. The default is /var/lib/fence_tool. It is advisable to change this to a directory residing on a local disk. Because the chroot is rebuilt at boot time, that directory holds no important data.
- fenceacksv: on, source: comoonics-bootimage-fenceacksv
Starts another service running in the fenced chroot. This service acknowledges nodes that were fenced manually.
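The on/off states listed above can be applied with a small loop around chkconfig. This is a minimal sketch: the function name and the DRY_RUN switch are assumptions, and by default the script only prints the commands so they can be checked first.

```shell
#!/bin/sh
# Sketch: apply the on/off states from the list above with chkconfig.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

set_sharedroot_services() {
    # "service:state" pairs in the boot order given in the text
    for entry in bootsr:on ccsd-chroot:on ccsd:off cman:off \
                 fenced:off fenced-chroot:on fenceacksv:on; do
        service="${entry%%:*}"
        state="${entry##*:}"
        if [ "$DRY_RUN" = "1" ]; then
            echo "chkconfig $service $state"
        else
            chkconfig "$service" "$state"
        fi
    done
}

set_sharedroot_services
```

Run it with DRY_RUN=0 as root on the template node once the printed commands match what you expect.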
Now we have a shared root that is ready to boot. Next we have to create the boot disk and the initial ramdisk.
Now the filesystem is configured for the first boot. The next step is to set up the boot process. In our example the boot files are on a dedicated disk, but putting them on a partition should work as well. We use “GRUB” as the boot loader; others such as “LILO” should not cause problems either. In this example the template node sees the boot disk as /dev/sdb, so the first step is to partition /dev/sdb. Create a primary boot partition, make a “GRUB”-supported filesystem on it and copy all relevant files onto it, as shown in Figure 11, “Create a boot partition and prepare it for GRUB”.
Figure 11. Create a boot partition and prepare it for GRUB
[root@gfs-node1 ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Disk /dev/sdb doesn't contain a valid partition table
[root@gfs-node1 ~]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): p
Disk /dev/sdb: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Device Boot Start End Blocks Id System
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-512, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-512, default 512):
Using default value 512
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@gfs-node1 ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 1 512 524272 83 Linux
[root@gfs-node1 ~]# mkfs.ext3 -L boot /dev/sdb1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=boot
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
131072 inodes, 524272 blocks
26213 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67633152
64 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 27 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@gfs-node1 ~]# mount /dev/sdb1 /mnt/loop
[root@gfs-node1 ~]# mkdir /mnt/loop/grub
[root@gfs-node1 ~]# cp -a /usr/share/grub/i386-redhat/* /mnt/loop/grub
[root@gfs-node1 ~]# ls -l /mnt/loop/grub/
total 302
-rw-r--r-- 1 root root 7956 Aug 23 05:38 e2fs_stage1_5
-rw-r--r-- 1 root root 7684 Aug 23 05:38 fat_stage1_5
-rw-r--r-- 1 root root 6996 Aug 23 05:38 ffs_stage1_5
-rw-r--r-- 1 root root 7028 Aug 23 05:38 iso9660_stage1_5
-rw-r--r-- 1 root root 8448 Aug 23 05:38 jfs_stage1_5
-rw-r--r-- 1 root root 7188 Aug 23 05:38 minix_stage1_5
-rw-r--r-- 1 root root 9396 Aug 23 05:38 reiserfs_stage1_5
-rw-r--r-- 1 root root 512 Aug 23 05:38 stage1
-rw-r--r-- 1 root root 103688 Aug 23 05:38 stage2
-rw-r--r-- 1 root root 103688 Aug 23 05:38 stage2_eltorito
-rw-r--r-- 1 root root 7272 Aug 23 05:38 ufs2_stage1_5
-rw-r--r-- 1 root root 6612 Aug 23 05:38 vstafs_stage1_5
-rw-r--r-- 1 root root 9308 Aug 23 05:38 xfs_stage1_5
[root@gfs-node1 ~]# uname -r
2.6.9-22.0.1.ELsmp
[root@gfs-node1 ~]# cp /boot/*$(uname -r)* /mnt/loop/
[root@gfs-node1 ~]# ls -l /mnt/loop/
total 3293
-rw-r--r-- 1 root root 48021 Jan 28 18:45 config-2.6.9-22.0.1.ELsmp
drwxr-xr-x 2 root root 1024 Jan 28 18:44 grub
-rw-r--r-- 1 root root 1099856 Jan 28 18:45 initrd-2.6.9-22.0.1.ELsmp.img
drwx------ 2 root root 12288 Jan 28 18:40 lost+found
-rw-r--r-- 1 root root 758541 Jan 28 18:45 System.map-2.6.9-22.0.1.ELsmp
-rw-r--r-- 1 root root 1426548 Jan 28 18:45 vmlinuz-2.6.9-22.0.1.ELsmp
[root@gfs-node1 ~]# rm /mnt/loop/initrd-2.6.9-22.0.1.ELsmp.img
rm: remove regular file `/mnt/loop/initrd-2.6.9-22.0.1.ELsmp.img'? y
[root@gfs-node1 ~]# cat /boot/grub/device.map
# this device map was generated by anaconda
(fd0) /dev/fd0
(hd0) /dev/sda
[root@gfs-node1 ~]# cp /boot/grub/device.map /mnt/loop/grub/
[root@gfs-node1 ~]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/centos/system
# initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.9-22.0.1.ELsmp)
root (hd0,0)
kernel /vmlinuz-2.6.9-22.0.1.ELsmp ro root=/dev/centos/system
initrd /initrd-2.6.9-22.0.1.ELsmp.img
[root@gfs-node1 ~]# cp /boot/grub/grub.conf /mnt/loop/grub/
[root@gfs-node1 ~]# vi /mnt/loop/grub/grub.conf
[root@gfs-node1 ~]# cat /mnt/loop/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/centos/system
# initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS sharedroot (2.6.9-22.0.1.ELsmp)
root (hd0,0)
kernel /vmlinuz-2.6.9-22.0.1.ELsmp rw
initrd /initrd_sr-2.6.9-22.0.1.ELsmp.img
title CentOS sharedroot (2.6.9-22.0.1.ELsmp failsave)
root (hd0,0)
kernel /vmlinuz-2.6.9-22.0.1.ELsmp rw
initrd /initrd_sr-2.6.9-22.0.1.ELsmp.img.failsave
[root@gfs-node1 ~]# cat /boot/grub/device.map
# this device map was generated by anaconda
(fd0) /dev/fd0
(hd0) /dev/sda
(hd1) /dev/sdb
[root@gfs-node1 ~]# grub
Probing devices to guess BIOS drives. This may take a long time.
GNU GRUB version 0.95 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]
grub> root (hd1)
Filesystem type unknown, using whole disk
grub> root (hd1,0)
Filesystem type is ext2fs, partition type 0x83
grub> setup (hd1,0)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hd1,0)"... failed (this is not fatal)
Running "embed /grub/e2fs_stage1_5 (hd1,0)"... failed (this is not fatal)
Running "install /grub/stage1 (hd1,0) /grub/stage2 p /grub/grub.conf "... succeeded
Done.
grub> quit
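Before rebooting, it can help to confirm that every kernel and initrd named in the new grub.conf really exists on the boot partition. The sketch below assumes a GRUB legacy grub.conf whose paths are relative to the partition, as in Figure 11; the function name is hypothetical. At this point in the procedure the shared-root initrd has not been built yet, so it is expected to be reported as missing until the ramdisk is created under exactly the name used in grub.conf.

```shell
#!/bin/sh
# Sketch: list kernel/initrd files named in a GRUB (legacy) grub.conf
# that are missing below the mounted boot partition.
check_grub_files() {
    bootdir="$1"    # mount point of the boot partition, e.g. /mnt/loop
    conf="$bootdir/grub/grub.conf"
    missing=0
    # take the first argument of every uncommented kernel/initrd line
    for path in $(awk '$1 == "kernel" || $1 == "initrd" { print $2 }' "$conf"); do
        if [ ! -e "$bootdir$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# check_grub_files /mnt/loop
```

The function returns a non-zero status when at least one file is missing, so it can also guard a reboot command in a script.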
There are two ways to create the initial ramdisk. One is to change root into the mounted GFS filesystem and build it from there; the other is to build it on the template node. Here we use the template node.
Before we use the new script from the comoonics-bootimage package we need to make sure everything is in place. The most important item is the “dependency file”, a text file that lists all binaries and special libraries needed for the initial ramdisk. The mkinitrd tool reads the configuration file /etc/comoonics/comoonics-bootimage.cfg, which contains the entry “dep_filename=/etc/comoonics/bootimage/files-$(uname -r).list”. That file is normally created as a symbolic link to /etc/comoonics/bootimage/gfs61-es40-files.i686.list, which lists all dependency files for the given architecture. If the link does not exist, either change the configuration file or create the link. After that we are ready to go.
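The check for the dependency file can be sketched as a small shell function. The paths in the example are the ones named above; the function name and its argument order are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: make sure the per-kernel dependency list exists, linking it to
# the architecture list named in the text if it does not.
ensure_dep_file() {
    listdir="$1"    # e.g. /etc/comoonics/bootimage
    kernel="$2"     # e.g. "$(uname -r)"
    target="$3"     # e.g. gfs61-es40-files.i686.list
    link="$listdir/files-$kernel.list"
    if [ -e "$link" ]; then
        echo "ok: $link"
    elif [ -e "$listdir/$target" ]; then
        # relative link, resolved inside $listdir
        ln -s "$target" "$link" && echo "linked: $link -> $target"
    else
        echo "error: $listdir/$target not found" >&2
        return 1
    fi
}

# Example (as root):
# ensure_dep_file /etc/comoonics/bootimage "$(uname -r)" gfs61-es40-files.i686.list
```

If neither the link nor the architecture list exists, the function fails, which corresponds to the case where the configuration file has to be changed instead.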
The initial ramdisk is created by the script /opt/atix/comoonics_bootimage/mkinitrd from the comoonics-bootimage package. It is called like the original “mkinitrd” but has a number of different options. A shared-root initial ramdisk has to provide far more functionality than a typical initrd, so the build process takes longer and would be much more complicated by hand. The new “mkinitrd” takes care of this for you.
Figure 12, “A typical process of building the initrd” shows the process. You can ignore the errors shown as long as none of the missing files are needed to boot your nodes.
Figure 12. A typical process of building the initrd
[root@gfs-node1 cdsl]# mkinitrd -f /mnt/loop/initrd_sr-2.6.9-22.0.1.ELsmp.gz 2.6.9-22.0.1.ELsmp
Retreiving dependent files...found 353 (OK)
Copying files...cp: cannot overwrite directory `/tmp/initrd.mnt.W30851//etc/init.d' with non-directory
cp: cannot stat `/etc/initiatorname.iscsi': No such file or directory
cp: cannot stat `/etc/iscsi.conf': No such file or directory
cp: cannot stat `/etc/rc.d/init.d/iscsi': No such file or directory
cp: cannot stat `fence_gnbd': No such file or directory
cp: cannot stat `gnbd_import': No such file or directory
cp: cannot stat `iscsid': No such file or directory
cp: cannot stat `iscsi-device': No such file or directory
cp: cannot stat `iscsi-iname': No such file or directory
cp: cannot stat `iscsi-ls': No such file or directory
cp: cannot stat `iscsi-mountall': No such file or directory
cp: cannot stat `iscsi-umountall': No such file or directory
cp: cannot stat `/lib/i686/librtkaio-2.3.2.so': No such file or directory
cp: cannot stat `/lib/libBrokenLocale-2.3.2.so': No such file or directory
cp: cannot stat `/lib/libNoVersion-2.3.2.so': No such file or directory
cp: cannot stat `lilo': No such file or directory
cp: cannot stat `/sbin/fence_apc_old': No such file or directory
cp: cannot stat `/sbin/fence_xcat': No such file or directory
cp: cannot stat `scsi_info': No such file or directory
cp: cannot stat `/usr/lib/libkrbafs.so.0': No such file or directory
cp: cannot stat `/usr/lib/libkrbafs.so.0.0.0': No such file or directory
(OK)
Copying kernelmodules (2.6.9-22.0.1.ELsmp)...(OK)
Post settings ..(OK)
Cpio and compress..(OK)
Cleaning up (/tmp/initrd.mnt.W30851, )...(OK)
-rw-r--r-- 1 root root 26707 Jan 28 19:12 /mnt/loop/initrd_sr-2.6.9-22.0.1.ELsmp.gz
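To decide whether the errors in Figure 12 matter, it is easier to look at the missing file names alone. The following sketch filters them out of captured build output; the function name is hypothetical, and it assumes the cp messages appear one per line in the form shown in the figure.

```shell
#!/bin/sh
# Sketch: extract the file names from "cp: cannot stat" messages so the
# list of missing files can be reviewed in one place.
list_missing_files() {
    # reads build output on stdin; prints one missing path per line
    sed -n "s/.*cannot stat [\`']\([^']*\)'.*/\1/p"
}

# Example: save the output of the mkinitrd run from Figure 12 to a file
# (including stderr), then review what was skipped:
# list_missing_files < /tmp/mkinitrd.log
```

Entries such as the iscsi or gnbd tools only matter if your nodes actually boot from those kinds of devices.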
This area lists other resources, which would include books, web sites, newsgroups, mailing lists, etc.
Open-Sharedroot site. open-sharedroot.sourceforge.net.
Red Hat Cluster Project pages. sources.redhat.com/cluster.