
Part III: Designing and Planning a Shared Root Cluster

General setup and design recommendations for a Shared Root Cluster

Avoiding Single Points of Failure (SPoFs)

Because a Shared Root Cluster is used for mission-critical applications, the recommended design includes high availability measures on several layers:

At the top layer there are usually more than two cluster nodes to allow application failover and server load balancing. It is also possible to build single-node Diskless Shared Root Clusters for some scenarios (e.g. a single-node staging cluster).

As the nodes of a Diskless Shared Root Cluster use local hard disks only for volatile information (e.g. /tmp or swap), the next layer connects the servers with their shared storage system. The disk space is usually provided by a SAN and only in a few cases by an NFS server. Therefore you will need a Fibre Channel network or a dedicated iSCSI Ethernet network between the SAN and the cluster nodes. For the majority of clusters the storage network must be fully redundant. That means that the cluster nodes are usually equipped with dual Fibre Channel HBAs (Host Bus Adapters) and every node is connected to the storage array via multiple paths provided by a redundant switched fabric. If one link goes down, the storage can still be reached over the other path.
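The benefit of a second path can be estimated with a simple availability calculation. The following is a minimal sketch that assumes the paths fail independently of each other; the function name and the per-path availability of 99% are purely illustrative and not taken from any measurement.

```python
def parallel_availability(path_availability: float, paths: int) -> float:
    """Availability of redundant paths, assuming independent failures.

    The storage stays reachable as long as at least one path works, so the
    combined unavailability is the product of the individual unavailabilities.
    """
    if not 0.0 <= path_availability <= 1.0:
        raise ValueError("path_availability must be between 0 and 1")
    if paths < 1:
        raise ValueError("paths must be at least 1")
    return 1.0 - (1.0 - path_availability) ** paths


if __name__ == "__main__":
    # 0.99 is an illustrative value, not a measured one.
    for n in (1, 2):
        print(f"{n} path(s): {parallel_availability(0.99, n):.4%}")
```

In practice failures are rarely fully independent (for example, both paths may share one power source), so the result is best read as an upper bound.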

The redundancy continues in the storage array. As the shared root partition is a crucial part of the Diskless Shared Root Cluster, the data is usually stored on a fully redundant RAID 1 (mirroring) volume. It is also important that the speed of the disk array is sufficient for the number of cluster nodes and clients it has to serve. You can influence the performance through the number of disks in the disk group that forms the RAID array. However, the storage controllers and the hard drive technology are limiting factors.

More redundancy is needed for the network that connects the cluster nodes with each other and for the network that the clients use to access the cluster resources. The network bandwidth should be as high as possible and you should use different carriers. A minimum of two bonded network interface cards (NICs) is needed for redundant connections. When all traffic shares these bonded interfaces, it is mandatory to use Virtual LANs (VLANs) together with Quality of Service (QoS) to separate the network packets and to allow traffic shaping, so that the Inter Cluster Communication (ICC) is delivered with high priority. Ideally you would use separate NICs for ICC, for the public connection of the clients and for your management connections to the cluster node.

You also need protection against power failure. That means that your servers should include redundant power supplies and your racks should be connected to at least two power sources. Ideally one power source is backed by both batteries and diesel power generators.

This setup allows you to achieve scalable performance and high availability and enables you to create disaster-tolerant infrastructures. To do so, place your cluster equipment in different fire compartments and replicate your data to a similar structure in another data center.

Failover Domains

In a cluster you have the option to define failover domains. A failover domain is a subset of your cluster nodes that is configured to run a specific service in the event of a system failure. That means you may specify the nodes where your service is allowed to run.

A failover domain may be configured with the following options:

  • Unrestricted - The specified nodes are preferred, but the service assigned to this domain may run on all available cluster nodes.
  • Restricted - Allows you to specify particular nodes to run a specific service. If none of these nodes is available, the service cannot be started.
  • Unordered - The service will start on any node within the failover domain without any preference or priority order.
  • Ordered - Allows you to specify a preference order among the nodes of the failover domain, so that the service runs on the most preferred node that is available.

Failover domains are usually unrestricted and unordered.

Tip: To implement the concept of a preferred member, create an unrestricted failover domain comprised of only one cluster member. By doing this, a service runs on the preferred member; in the event of a failure, the service fails over to any of the other members.
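The interaction of the four options can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the logic of any particular cluster manager; the names FailoverDomain and pick_node as well as the node names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FailoverDomain:
    """Hypothetical model of a failover domain."""
    members: Sequence[str]      # for ordered domains: most preferred node first
    restricted: bool = False
    ordered: bool = False


def pick_node(domain: FailoverDomain, online: Sequence[str]) -> Optional[str]:
    """Return the node that should run the service, or None if it cannot start."""
    if domain.ordered:
        # Ordered: honour the priority order given in the domain.
        candidates = [n for n in domain.members if n in online]
    else:
        # Unordered: any online member will do; take them as they come.
        candidates = [n for n in online if n in domain.members]
    if candidates:
        return candidates[0]
    if domain.restricted:
        # Restricted: no member is available, so the service cannot start.
        return None
    # Unrestricted: fall back to any other online cluster node.
    return online[0] if online else None


if __name__ == "__main__":
    # The "preferred member" tip: an unrestricted domain with a single member.
    preferred = FailoverDomain(members=["node1"])
    print(pick_node(preferred, ["node1", "node2"]))
    print(pick_node(preferred, ["node2", "node3"]))
```

With node1 online the service is placed there; once node1 is gone, the unrestricted domain lets the service move to one of the remaining nodes.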

Power down after failure

Most companies run their critical services on huge cluster grids. In such large scenarios the administrators are usually informed only of critical events, so that they are not alarmed falsely. As a result, the staff may miss that a cluster node was fenced and rejoined the cluster a few minutes later. In isolated cases this is acceptable, but what if there is a hardware malfunction and the same node gets fenced frequently? Such hardware must be identified at all costs, and it may be appropriate to power down the failed cluster node. The monitoring software or a Grayhead will take notice of this and will report that the node is down. The administrator may then decide to bring the node online again or, if the node is affected regularly, to inspect it further. Since only servers in good shape form the cluster, the quality of the cluster increases. However, this method should only be used in bigger clusters. In smaller ones the cluster services are usually better protected by bringing all nodes online again as fast as possible.
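One way to detect a node that is fenced frequently is to count fence events inside a sliding time window. The sketch below is a hypothetical helper, not part of any cluster software; the class name FenceTracker, the threshold of three events and the window of one hour are assumptions chosen for illustration, and timestamps are expected in non-decreasing order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FenceTracker:
    """Tracks fence events per node inside a sliding time window."""

    def __init__(self, max_events: int = 3, window_seconds: float = 3600.0) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_fence(self, node: str, timestamp: float) -> bool:
        """Record a fence event; return True if the node should stay powered down."""
        events = self._events[node]
        events.append(timestamp)
        # Drop events that are older than the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.max_events


if __name__ == "__main__":
    tracker = FenceTracker(max_events=3, window_seconds=600.0)
    for t in (0.0, 300.0, 500.0):
        keep_down = tracker.record_fence("node3", t)
        print(f"t={t:>5}: keep node3 powered down? {keep_down}")
```

A single fence event leaves the node free to rejoin; only repeated events within the window flag it for a power down and a closer inspection.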

Heartbeat Network

It is usually wise to use a dedicated heartbeat interface for all inter cluster communication. Most servers have heavy traffic on the public network interface, and therefore cluster communication should run over a separate NIC.

In a Shared Root Cluster the basic network configuration is done within the com_info section of /etc/cluster/cluster.conf. The IP address that is chosen for the cluster node should be placed in the /etc/hosts file together with the proper node name. From then on the inter cluster communication is done over the specified NIC.
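A quick consistency check can confirm that every node name is listed in /etc/hosts with an address on the heartbeat network. The following is a minimal sketch; the function names, the node names and the 10.0.0.0/24 subnet are hypothetical, and the hosts content is passed in as text so that the helper does not depend on the local system.

```python
import ipaddress
from typing import Dict, Iterable, List


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every host name in /etc/hosts-style text to its first listed address."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)
    return mapping


def nodes_outside_subnet(hosts_text: str, node_names: Iterable[str], subnet: str) -> List[str]:
    """Return node names that are missing or resolve outside the heartbeat subnet."""
    network = ipaddress.ip_network(subnet)
    mapping = parse_hosts(hosts_text)
    problems = []
    for name in node_names:
        address = mapping.get(name)
        if address is None or ipaddress.ip_address(address) not in network:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Illustrative hosts content; on a real node read /etc/hosts instead.
    example = "127.0.0.1 localhost\n10.0.0.1 node1\n192.168.1.2 node2\n"
    print(nodes_outside_subnet(example, ["node1", "node2"], "10.0.0.0/24"))
```

An empty result means that all listed nodes have a heartbeat address in the expected subnet.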

You should also consider using bonding interfaces so that the heartbeat network is carried by different network interfaces and separate network switches. In case of a network issue the alternative network path allows a seamless failover and continued cluster services.

