Tomcat 6 - Clustering

Clustering refers to running multiple instances of Tomcat that appear as one Tomcat instance. If one instance was to fail the other instances would take over thus the end user would not notice any failures.

Clustering in Tomcat enables a set of Tomcat instances on a LAN to appear to the users a single server, as detailed in the below picture. This architecture allows more requests to handled and can handle if one server were to crash (High Availability) .

Incoming requests are distributed across all servers, thus the service can handle more users. This approach is known as horizontal scaling thus you can buy cheaper hardware and still use existing hardware without having to upgrade your existing hardware.

There are a number of different clustering models that are used Master-Backup, Fail-Over, Tomcat uses both of these and incorporates load balancing as well.

Tomcat Clustering Model

The Tomcat clustering model can be divided into two layers and various components

The two layers that enable clustering are the load-balancing frontend and the state-sharing/synchronization backend. The front-end deals with incoming requests and balancing them over a number of instances, while the backend is concerned with ensuring that shared session data is available to different instances.

There are a number of load-balancing frontends that you can use

Round-robin DNS, whereby a domain name resolution results in a set of IP addresses
A hardware-based load balancer
A software load-balancer like PLB (Pure Load Balancer)
Apache mod_proxy or mod_jk as a load balancer

Depending on your budget, most setups either use hardware-based load-balancer or Apache mod_proxy/mod_jk. I have already touched on how to configure a Apache mod_proxy and mod_jk in the Apache Server selection. The area I did not cover was the use of Sticky Sessions or Session Affinity, what this means when set is that incoming requests with the same session are routed to the same Tomcat worker.

They only problem with sticky sessions is that if a Tomcat instance were to fail then all sessions are lost, but as with the load-balancing frontend, you have numerous session-sharing backends from which to choose. Each provides a different level of functionality as well as implementation complexity.

mod_proxy and mod_jk offer the following options

Sticky Sessions with no session sharing
Sticky Sessions with a Persistent Session Manager and a shared file store
Sticky Sessions with a Persistent Session Manager and a JDBC store to RDBMS
In-memory session replication

Sticky Session with no clustered session sharing ensures that that requests are handled by the same instance, the session ID is encoded with the route name of the server instance that created it, assisting in the routing of the request. This solution can be used by most Production system, it is simple and easy to maintain, there is no additional configuration or resource overhead but this solution has no HA capability, thus if a server were to crash all session data is lost with it.

Sticky Session with a persistence manager and a shared file store, which is already built into Tomcat. The idea is to stored session data thus in the event of a failure the session data can be retrieved. Using a shared disk device (NFS, SMB) all the Tomcat instances has access to the session data. However Tomcat will not guarantee when a sessions data will be persisted to the file store, thus you could have a case where a Tomcat instance crashes but the session data was not written to the file store. It only offers a slightly better solution to the one above.

Sticky session with a persistence session manager and a JDBC-based store, is the same as the file-based store, but uses a RDMBS instead

In-memory session replication, is session data replicated across all Tomcat instances within the cluster, Tomcat offers two solutions, replication across all instances within the cluster or replication to only its backup server, this solution offers a guaranteed session data replication, however this solution is more complex.

The Tomcat server instances running in the cluster are implemented as a communications group. At the Tomcat instance level, the cluster implementation is an instance of the SimpleTcpCluster class. Depending on your needs and the replication pattern, you can configure SimpleTcpCluster with one or two managers

DeltaManager - replicate sessions across all Tomcat instances
BackupManager - replicate sessions from a master to a backup instance

SimpleTcpCluster uses Apache Tribes to maintain communicate with the communications group. Group membership is established and maintained by Apache Tribes, it handles server crashes and recovery. Apache Tribes also offer several levels of guaranteed message delivery between group members. This is achieved updating in-session memory to reflect any session data changes, the replication is done immediately between members. This solution offers full HA but at a cost to heavy network loads, also additional hardware is often required to make sure there are no single points of failure in the network.

Session Management

Sessions are objects (which can contain and reference to other objects) that are kept on behalf of the a client. Because HTTP is stateless there is no simple way to maintain application state using the protocol alone. A server-side session is the main mechanism used to maintain state, it works as follows

The server writes a cookie to the users browser instance, the cookie contains a token to retrieve the server-side session (data structure)
The cookie is supplied by the browser instance every time it accesses a page on the site
The server reads the token in the cookie to extract the corresponding session.

If a browser does not support cookies, it is possible to use URL rewrite to achieve a similar effect, the URL is decorated with the session ID being used.

Setting up Multiple Instances on one Machine

The cluster consists of three independent Tomcat instances and uses the following, the following is the setup used by all three Session-Sharing methods which i will go into detail later, but first the front-end

mod_jk load-balancing frontend

In the real world you would setup each instance on a separate server, but in this case each instance must have the following

Its own configuration directory
Its own temp directory
Its own webapps directory
Its own temporary work directory
TCP ports that are different from each other (AJP Connector)
Optionally other TCP or JDBC resources, depending on the backend session-sharing mechanism being deployed

Three batch files called start1.bat, start2.bat and start3.bat are created and placed in the Tomcat bin directory. Each of the files set the CATALINA_HOME environment variable and then call the startup.bat file, so in each of the startup set the CATALINA_HOME variable to point to each instance

startup file	# startup1.bat set CATALINA_HOME=c:\cluster\machine1 call startup
shutdown file	# stop1.bat set CATALINA_HOME=c:\cluster\machine1 call shutdown

We will use the minimal configuration just to get the cluster up and running, so copy the directory structure below from a clean installation of Tomcat

First you must disable the HTTP connector, then change the AJP Connector and shutdown port for each of the instances

Instance Name	File to Modify	TCP Ports (Shutdown, AJP Connector)
machine 1	\cluster\machine1\conf\server.xml	8005, 8009
machine 2	\cluster\machine2\conf\server.xml	8105, 8109
machine 3	\cluster\machine3\conf\server.xml	8205, 8209

If you still have a problems starting the Tomcat instance due to port binding use the "netstat" command to find what is using the above port or to look for free ports.

Next you need to set a unique jvmRoute for each instance, i have already discussed jvmRoute

jvmRoute

Note: the other instances will be machine2 and machine3

To indicate to a Servlet Container that the application can be clustered, a Servlet 2.4 standard <distributable/> element is placed into the applications deployment descriptor (web.xml). If this element is not added, the session maintained by this application across the three Tomcat instances will not be shared. You can also add it to the Context element

distributable

The front-end is setup in the Apache Server, i am not going to go into to much detail as i have already covered load-balancing

workers.properties file

worker.list = loadbal1,stat1

worker.machine1.type = ajp13
worker.machine1.host =192.168.0.1
worker.machine1.port = 8009
worker.machine1.lbfactor = 10

worker.machine2.type = ajp13
worker.machine2.host =192.168.0.2
worker.machine2.port = 8109
worker.machine2.lbfactor = 10

worker.machine3.type = ajp13
worker.machine3.host =192.168.0.3
worker.machine3.port = 8209
worker.machine3.lbfactor = 10

worker.bal1.type = lb
worker.bal1.sticky_session = 1
worker.bal1.balance_workers = machine1, machine2, machine3

worker.stat1.type= status

httpd.conf updates

JkMount /examples/jsp/* bal1
JkMount /jkstatus/ stat1

JkWorkersFile conf/workers.properties

JSP page

# Create file sestest.jsp

<%@page language="java" %>
<html>
<body>
<h1><font color="red">Session serviced by machine1</font></h1>
<table align="center" border="1">
  <tr>
  <td>
     Session ID
  </td>
  <td>
     <%= session.getId() %></td>
  </td>
  <% session.setAttribute("abc","abc");%>
  </tr>
  <tr>
  <td>
    Created on
  </td>
  <td>
    <%= session.getCreationTime() %>
  </td>
  </tr>
</table>
</body>
</html>

Note: create for each instance and change the machine name above

Session-Sharing Backend

The above is the same setup for all three Session-Sharing backends, which I am now going to show you how to set these up.

The first i am going to setup is the In-Memory replication, two components needs to be configured to enable in-memory configuration, the <Cluster> element is responsible for the actual session replication, this includes the sending of new session information to the group, incorporating new incoming session information locally and management of group membership (it uses Apache Tribes). The other component is a replication Valve which is used to reduce the potential session replication traffic by ruling out (filtering) certain requests from session replication.

The only implementation of in-memory replication is called SimpleTcpCluster, it uses Apache Tribes for communication which uses regular multicasts (heartbeat packets) to determine membership. All node that are running must multicasts a heartbeat at regular frequency, if they do not send a heartbeat the node is considered dead and is removed from the cluster. The membership of the cluster is managed dynamically as nodes are added or removed. Session data is replicated between the all nodes in the cluster via TCP by using end-to-end communication.

Because of the amount of network traffic generated, you should only use a small number of nodes within the cluster, unless you have vast amounts of memory and can supply a high bandwidth network. You can reduce the amount of data by using the BackupManager (send only to one node, the backup node) instead of DeltaManager (sends to all nodes within the cluster).

Element	Description
<Cluster>	The cluster element is nested inside an enclosing <Host> element, it essentially enables session replication for all applications in the host.
<Manager>	This is a mandatory component, this is where you configure either DeltaManager or BackupManager, they both send replication information to others via Channels from the Apache Tribes group communications library.
<Channel>	A channel is an abstract endpoint, (like a socket) that a member of the group can send and receive replicated information through. Channels are managed and implemented by the Apache Tribes communications framework. Channel has only one attribute.
<Membership>	This attribute selects the physical network interface to use on the server (if you have only one network adapter you generally don't need this). This service is based on sending a multicasts heartbeat regularly, which determines and maintains information on the servers that are considered part of the group (cluster) at any point in time.
<Receiver>	This element configures the TCP receiver component of the Apache Tribes Framework, it receives the replicated data information from other members.
<Sender>	This element configures the TCP sender component of the Apache Tribes Framework, it sends the replicated data information to other members.
<Transport>	This element performs the real stuff, tribes support having a pool of senders, so that messages can be sent in parallel and if using NIO sender, you can send messages concurrently as well.
<Interceptor>	Interceptor components are nested components of <Channel> and are message processing components that can chained together to alter the behavior or add value to the option of a channel. basically you have a option flag that will trigger its operation.
<Valve>	This element acts as a filter for In-Memory replication, it reduces the actual session replication network traffic by determining if the current session needs to be replicated at the end of the request cycle. Even through this element is inside the <Cluster> element it is consider to be inside the <Host> element.
<ClusterListener>	Some of the work of the cluster is performed by hooking up listeners to replication messages that are passing through it. You must configure org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener if you are using JvmRouteBinderValve to ensure session stickiness transfers with a fail-over You must also configure org.apache.catalina.ha.session.ClusterSessionListener if using DeltaManager because this listener forwards the messages to the manager for delta and merging operations.

Now to go into details on what options each element can have

<Cluster> Element
Attribute Name	Description	default
className	The implementation Java Class for the cluster manager current uses org.apache.catalina.ha.tcp.SimpleTcpCluster
channelSendOptions	Option flags are included with messages sent and can be used to trigger Apache Tribes channel interceptors. The numerical value is a logical OR flag values including Channel.SEND_OPTIONS_ASYNCHRONRONUS 8 Channel.SEND_OPTIONS_BYTE_MESSAGE 1 Channel.SEND_OPTIONS_SECURE 16 Channel.SEND_OPTIONS_SYNCHRONIZED_ACK 4 Channel.SEND_OPTIONS_USE_ACK 2	11 (async with ack)
<Manager> Element (mandatory)
Attribute Name	Description	default
className	org.apache.catalina.ha.session.DeltaManager org.apache.catalina.ha.session.BackupManager
name	A name for the cluster manager, this name should be the same on all instances
notifyListeners-OnReplication	Indicates if any session listeners should be notified when sessions are replicated between instances	false
expireSessions-OnShutdown	Specifies whether it is necessary to expire of all sessions upon application shutdown	false
domainReplication	Specifies whether replication should be limited to domain members only, this option is only available for DeltaManager	false
mapSendOptions	When using the BackupManager, this maps the send options that are set to trigger interceptors	8 (async)
<Channel> Element
Attribute Name	Description	default
className	org.apache.catalina.tribes.group.GroupChannel
<Membership> Element
Attribute Name	Description	default
className	org.apache.catalina.tribes.membership.McastService
address	The multicast address selected for this instance	228.0.0.4
port	The multicast port used	45564
frequency (milliseconds)	Frequency which heartbeat multicasts are sent (in milliseconds)	500
dropTime (milliseconds)	The time elapsed without heartbeats before the service considers a member has died and removes it from the group (in milliseconds)	3000
ttl	Sets the time-to-live for multicast messages sent (may be used if network traffic are going through any routers)
soTimeout	The SO_TIMEOUT value on the socket that multicasts messages are sent to. Controls the maximum time to wait for a send/receive to complete	0
domain	For partitioning group members into separate domains for replication.
bind	The IP address of the adaptor that the service should bind to.	0.0.0.0
<Receiver> Element
Attribute Name	Description	default
className	org.apache.catalina.tribes.transport.nio.NioReceiver
address	The IP address to bind to, to receive incoming TCP data (you must sent this for multi-homed hosts)	auto
port	Selects the port to use for imcoming TCP data.	4000
autoBind	Tells the framework to hunt for an available port, starting for the specified port number <port> and add up to this number	1000
selectorTimeout (millseconds)	Bypass for old NIO bug. sets the milliseconds timeout while polling for incoming messages	5000
maxThreads	The maximum number of threads to create to receive incoming messages	6
minThreads	The minimum number of threads to create to receive incoming messages	6
<Sender> Element (must contain a Transmitter element)
Attribute Name	Description	default
className	org.apache.catalina.tribes.transport.ReplicationTransmitter
<Transmitter> Element
Attribute Name	Description	default
className	org.apache.catalina.tribes.transport.nio.PooledParallelSender org.apache.catalina.tribes.transport.bio.PooledMultiSender
maxRetryAttempts	The number of retries the framework conducts when encountering socket-level errors during sending of a message	1
timeout	The SO_TIMEOUT value on the socket that messages are sent on. Controls the maximum time to wait for a send to complete	3000
poolsize	Controls the maximum number of TCP connections opened by the sender between the current and another member in the group. only available when using org.apache.catalina.tribes.transport.nio.PooledParallelSender	25
<Interceptor> Element
Attribute Name	Description
org.apache.catalina.tribes.group.interceptors .TcpFailureDetector	When membership pings do not arrive the interceptor attempts to connect to the problematic member to validate that the member is no longer reachable before the membership list is adjusted
org.apache.catalina.tribes.group.interceptors .MessageDispatch5Interceptor	The asynchronous message dispatcher, triggered by default send option value 8 (its hard coded)
org.apache.catalina.tribes.group.interceptors .ThroughputInterceptor	Logs cluster messages throughput information to Tomcat logs
Replication <Valve> Element
Attribute Name	Description
className	org.apache.catalina.ha.tcp.ReplicationValve
filter	A semicolon-delimited list of URL pattern for requests that are to be filtered out.
Example
server.xml file	<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="8"> <Manager className="org.apache.catalina.ha.session.DeltaManager" expireSessionsOnShutdown="false" notifyListenersOnReplication="true"/> <Channel className="org.apache.catalina.tribes.group.GroupChannel"> <Membership className="org.apache.catalina.tribes.membership.McastService" address="228.0.0.8" bind="192.168.0.1" port="45564" frequency="500" dropTime="3000" /> <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver" address="192.168.0.1" port="4200" autoBind="100" selectorTimeout="5000" maxThreads="6" /> <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter"> <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/> </Sender> <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/> <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/> </Channel> <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=".\.gif;.\.js;.\.jpg;.\.htm;.\.html;.\.txt;" /> <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer" tempDir="D:/cluster/temp/war-temp/" deployDir="D:/cluster/temp/war-deploy/" watchDir="D:/cluster/temp/war-listen/" watchEnabled="false" /> <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/> <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/> </Cluster> Note: the port in bold is the one you need to change for each instance

Testing In-Memory Session Replication Cluster

You need to perform the following to test the In-Memory session replication

Ensure that the <Cluster>, <Manager> are uncommented in all three instances, also make sure that the <Context> element is commented out.
Start all three instances
Start the Apache server with the mod_jk module
Open a browser to http://<instance IP address>/examples/jsp/sesstest.jsp
Check that its working correctly, then shutdown the instance that had your session
Retest using the browser and hopefully it should pickup the other instance but have the same session ID

I have a catalina log file with cluster logging information, starting up, a node going down and rejoining, this log file is from the example I have given above, although I used different IP address but i am sure you get the idea.

Persistent Session Manager with a shared file store

To setup a persistent session manager you must comment out the <Cluster> element in each instance, this disables the In-Memory replication mechanism, then add a context.xml file to each instance with the below

context.xml

Restart the tomcat instances
Open a browser to http://<instance IP address>/examples/jsp/sesstest.jsp
Click a number of times, you should stay with the same instance (sticky session working)
Shutdown an instance
Retest using the browser and hopefully it should pickup the other instance but have the same session ID

Persistent Session Manager with a JDBC store

The only difference between a file-based store and a JDBC-based store is the <store> element

context.xml

Create a table in your database

create table tomcat_sessions (
  session_id varchar(100) not null primay key,
  valid_session char(1) not null,
  max_inactive int not null,
  last_access bigint not null,
  app_context varchar(255),
  session_data meduimlob,
  KEY kapp_context(app_context)
);

Restart the tomcat instances
Open a browser to http://<instance IP address>/examples/jsp/sesstest.jsp
Click a number of times, you should stay with the same instance (sticky session working)
Shutdown an instance
Retest using the browser and hopefully it should pickup the other instance but have the same session ID