Thursday, November 15, 2012

High Availability with JBoss SOA Platform


High Availability and why we need it

When employing an ESB, one of the most important aspects is that it be highly available. This requirement arises naturally from the fact that the ESB is an intermediary between other applications, which rely on it to supply them with data. When the intermediary is offline, the entire chain goes offline. To counter this, the ESB is deployed with a failover mechanism that makes it more resilient to failures.

JBoss SOA Platform is Red Hat's solution for enterprise SOA. It contains, among other things, the JBoss Application Server plus two products that we are interested in: JBoss ESB and JBoss Messaging. JBoss ESB is at its core a mediation engine: it contains the logic for getting messages from one endpoint to another. The endpoints are the sources and destinations for messages. They can be implemented with various technologies, such as JMS queues, web services, FTP sites, etc. JBoss Messaging is a JMS implementation; it enables you to set up and manage queues.

Combined, these two technologies enable you to move data across your enterprise: one application puts messages on a JMS queue, and JBoss ESB transports them to another queue where they are picked up by another application. One of the advantages of such an approach is that the two applications that are actually sharing data in this way don't need to know about each other. They don't even need to be running at the same time. The downside, as previously noted, is that the ESB is a single point of failure.

HA strategies and concepts

We mentioned the need to make the ESB resilient to failures. But how do we go about this? Remember that the root of the problem is that the ESB is a single point of failure. That is the case because it runs in a single process, and this process could stop for whatever reason.

But if we could have multiple ESBs running in their own processes, then the failure of one would not affect the other. The better the different processes are insulated from each other, the more robust the solution. For this to really work however, a mechanism would be needed so that in case of failure of one instance, the remaining ones take up the work it was doing.

But making the ESB resilient is not enough. Because the endpoints are the interface between the ESB and other systems, they too need to be HA, for without them no data can be shared. The HA strategy for an endpoint depends strongly on its transport. For instance, JBoss Messaging can cluster queues, and web services can be made HA via a dedicated HTTP load balancer. As it turns out, JBoss Messaging has its own out-of-the-box clustering support, so we just use that.

The diagram below illustrates the ideas. The dashed lines represent process boundaries. The two ESBs are insulated from each other, so the failure of one does not influence the other. For simplicity we consider the case where an external agent (sender) puts messages into the messaging system via an endpoint which is by itself HA.
  



Note that the process boundary in which the HA endpoint is running arises because of failover. There are actually multiple instances on different processes, but they can work together and take over each other's work. So as far as the other systems are concerned, it is as if only one very resilient instance was running. Moreover, adding more ESB instances will increase performance, as long as there are enough messages to consume.

One question comes to mind: what happens when an ESB instance takes a message from the endpoint and then fails? If no precaution is taken, the message is lost. A crucial feature of our HA implementation should therefore be that each message is processed inside a transaction, so that if processing fails the message becomes available on the endpoint again for the instances that are still active.
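The effect of transactional consumption can be illustrated with a small in-memory sketch. This is not JMS code: the class TransactedQueueSketch and its methods are hypothetical names invented for this illustration, and a thrown exception stands in for a failing ESB instance. It only shows the idea that a message is removed for good when processing succeeds, and is put back when it does not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// In-memory illustration only; a real deployment relies on the JMS/JCA
// transaction support of the messaging system, not on code like this.
class TransactedQueueSketch {
    private final Deque<String> pending = new ArrayDeque<>();

    void send(String message) {
        pending.addLast(message);
    }

    int size() {
        return pending.size();
    }

    // Hands the next message to the processor. If the processor throws,
    // the "transaction" is rolled back and the message returns to the queue.
    boolean consume(Consumer<String> processor) {
        String message = pending.pollFirst();
        if (message == null) {
            return false;               // nothing to consume
        }
        try {
            processor.accept(message);
            return true;                // commit: the message stays removed
        } catch (RuntimeException failure) {
            pending.addFirst(message);  // rollback: the message is available again
            return false;
        }
    }

    // One message, a first consumer that fails, a second that succeeds.
    static List<String> demo() {
        TransactedQueueSketch queue = new TransactedQueueSketch();
        queue.send("order-1");
        List<String> delivered = new ArrayList<>();
        queue.consume(m -> { throw new IllegalStateException("instance A failed"); });
        queue.consume(delivered::add);  // instance B picks the message up
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the message is processed by the second consumer even though the first one had already taken it from the queue.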

JBoss Clustering and Messaging

Since we are using JBoss Messaging for our endpoint implementation, we should briefly explain how its clustering works. JBoss Messaging can be made HA via JBoss Clustering, which is built into the JBoss Application Server. It uses the JGroups toolkit to enable different JBoss instances on the same network to find each other and to keep each instance aware of the others. Should one instance fail, the others become aware of this and take over its workload. JBoss Messaging uses this facility to detect a queue failure and migrate its messages to another running queue.

In order to have our cluster we will deploy JBoss AS on different nodes on the same network. A node in this case would be a different (virtual) machine, which would do a good job insulating the different processes from each other. 







We consider a case with just one queue, Queue1, for simplicity. Queue1 is deployed on separate nodes, and clustered. As you can see JBoss ESB is deployed along with Queue1 on the same node, therefore that node's failure would affect both. Still we have a backup on the other node, so HA is achieved.

One thing to note is that the consumers of Queue1, including JBoss ESB, can communicate directly with any instance of Queue1 on any node in the cluster. This is because JBM does load balancing and failover in its JMS client implementation. This is actually the key to JBoss Messaging's HA, because without it clients would simply fail along with the failing node.

JBM clients are configured to contact one single node, from which they receive a list of all available nodes in the cluster, which that node knows via JBoss Clustering. The client can then access the Queue1 instance on any of these nodes. Should the node to which the client is currently connected fail, the client automatically switches to another node on the list and accesses its Queue1 instance. The client can also load balance over the Queue1 instances, improving performance. This all happens transparently to the code that uses JBM: as far as that code is concerned there is only one Queue1, and it never went down.
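The client-side behaviour described above can be sketched as follows. This is a simplified illustration and not the JBM client implementation: FailoverClientSketch, markDown and send are hypothetical names, node failures are simulated by hand, and the addresses are simply the ones that appear in the logs later in this article.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified illustration of client-side failover over a known node list.
class FailoverClientSketch {
    private final List<String> nodes;                  // list received from the first node contacted
    private final Set<String> down = new HashSet<>();  // simulated failed nodes
    private int current = 0;                           // index of the node currently in use

    FailoverClientSketch(List<String> nodes) {
        this.nodes = nodes;
    }

    void markDown(String node) {
        down.add(node);
    }

    // Returns the node that accepted the message, switching to the next
    // node in the list when the current one is unavailable.
    String send(String message) {
        for (int attempts = 0; attempts < nodes.size(); attempts++) {
            String node = nodes.get(current);
            if (!down.contains(node)) {
                return node;
            }
            current = (current + 1) % nodes.size();    // fail over to the next node
        }
        throw new IllegalStateException("no live node left in the cluster");
    }

    public static void main(String[] args) {
        FailoverClientSketch client =
            new FailoverClientSketch(Arrays.asList("192.168.56.101", "192.168.56.102"));
        System.out.println(client.send("first"));      // handled by the first node
        client.markDown("192.168.56.101");
        System.out.println(client.send("second"));     // handled by the second node
    }
}
```

The calling code only ever calls send; which node actually handles the message is decided inside the client, which is the transparency the paragraph above refers to.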

But what happens in the unlikely event that the complete cluster goes down? We still do not want to lose our messages. Therefore the queues persist them in a separate storage system, in this case a database shared by all nodes. This database must of course also be made HA. The best strategy and implementation depend on the database vendor and are beyond the scope of this article. We will assume a MySQL database because it is a popular, proven product which can be made HA in a way that is transparent to clients.


Setting up an actual test  

Now that we have discussed some basic concepts, we can consider an actual case. For this example the ESB will take messages from a JMS queue named InBoundQueue and place them in the OutBoundQueue. In between, we will configure the ESB to pause for a while, thereby simulating some heavy processing.





The nodes are virtualized using VirtualBox, or your favorite virtualization product. We will set up JBoss ESB and JBoss Messaging in an HA configuration. Then we will consider a simple failover test and how to verify the results.

Virtualbox

Set up three virtual machines using your favorite OS: two for JBoss, one for MySQL. Make sure the virtual machines can reach each other over the network, for instance by choosing host-only networking in VirtualBox. Also make sure the IP address of the MySQL server is static; how to do this depends on the OS you choose. It also makes things easier if the other nodes have static IP addresses as well, for deployment, sending messages, etc.

Install and set up the necessary software on the nodes, such as the MySQL server and the Java JDK. Copy JBoss SOA Platform onto the JBoss nodes. I recommend setting up one JBoss node completely, then cloning it to create the other. The clone will need some customizing, see below.


JBoss Server Profile

First, create a new profile by creating a new directory under {soa.platform.install.dir}/jboss-as/server and copy into it the contents of the 'all' directory. I have chosen this one mainly because it contains the facilities necessary for clustering. Let's assume this profile is called 'my_cluster'.


JBoss setup

Then edit the file {soa.platform.install.dir}/jboss-as/tools/schema/build.properties according to your needs, such as database vendor and connection info.

Take special care to set org.jboss.esb.server.config to the profile created earlier, 'my_cluster', and make sure to set org.jboss.esb.clustered to 'true'. Also copy the JDBC driver jar for your database vendor (for instance, MySQL) to the lib directory of 'my_cluster' to avoid any errors further on.

Then, just run ant. The script will update the 'my_cluster' profile to work with your database and support clustering. 

JBoss will automatically use the credentials supplied to create the necessary tables for JMS.


JBoss startup

JBoss must be bound to the IP address of the machine it is running on. Also, do not forget to start it with the jboss.messaging.ServerPeerID option set to a value that is unique inside your cluster, and to use the 'my_cluster' profile. There is also an option to set the cluster partition name, jboss.partition.name. Nodes with the same partition name form a cluster when they are on the same network. This is useful for running multiple clusters on the same network, but does not concern us now.
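As a rough sketch, the start options mentioned above could be assembled like this. The two system properties come from the text; the ./run.sh script name and the -c (profile) and -b (bind address) switches are assumptions based on the usual JBoss AS start script, so check them against your installation. The class StartCommandSketch is a hypothetical helper, and the address and partition name are just the values seen in the logs further on.

```java
import java.util.ArrayList;
import java.util.List;

// Builds the argument list for starting one cluster node.
// Assumption: the stock run.sh script with its -c and -b switches is used.
class StartCommandSketch {
    static List<String> startCommand(String profile, String bindAddress,
                                     int serverPeerId, String partitionName) {
        List<String> command = new ArrayList<>();
        command.add("./run.sh");
        command.add("-c");                                            // server profile
        command.add(profile);
        command.add("-b");                                            // bind address of this machine
        command.add(bindAddress);
        command.add("-Djboss.messaging.ServerPeerID=" + serverPeerId); // unique per node
        command.add("-Djboss.partition.name=" + partitionName);        // same on all nodes of a cluster
        return command;
    }

    public static void main(String[] args) {
        // First node; the second node would use its own address and ServerPeerID 2.
        System.out.println(String.join(" ",
            startCommand("my_cluster", "192.168.56.101", 1, "DefaultPartition")));
    }
}
```

The resulting list could be passed to a ProcessBuilder or simply typed on the command line from the bin directory of the application server.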


JBoss ESB setup 

I will assume you are already familiar with developing and deploying .esb files. Edit jboss-esb.xml as follows:


<?xml version="1.0"?>
<jbossesb parameterReloadSecs="5"
 xmlns="http://anonsvn.labs.jboss.com/labs/jbossesb/trunk/product/etc/schemas/xml/jbossesb-1.2.0.xsd"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://anonsvn.labs.jboss.com/labs/jbossesb/trunk/product/etc/schemas/xml/jbossesb-1.2.0.xsd http://anonsvn.jboss.org/repos/labs/labs/jbossesb/trunk/product/etc/schemas/xml/jbossesb-1.2.0.xsd">
 <providers>
  <jms-jca-provider connection-factory="ClusteredConnectionFactory" name="JBossMQ">
   <jms-bus busid="quickstartEsbChannel">
    <jms-message-filter dest-name="queue/EsbQueue"
     dest-type="QUEUE" persistent="true" transacted="true"/>
   </jms-bus>
   <jms-bus busid="GwChannel">
    <jms-message-filter dest-name="queue/InBoundQueue"
     dest-type="QUEUE" persistent="true" transacted="true"/>
   </jms-bus>
  </jms-jca-provider>
 </providers>

 <services>
  <service category="myCategory"
   description="Hello World File Action (esb listener)" name="myFileListener">
   <listeners>
    <jms-listener busidref="GwChannel" is-gateway="true" name="gwlistener"/>
    <jms-listener busidref="quickstartEsbChannel" name="helloWorldFileAction"/>
   </listeners>
   <actions mep="OneWay">
    <action class="org.jboss.soa.esb.actions.SystemPrintln" name="PrintBodyConsole">
     <property name="message" value="== Printing body =="/>
     <property name="printfull" value="false"/>
    </action>
    <action
     class="org.company.WaitAction" name="wait">
     <property name="timeSecs" value="60"/>
    </action>
    <action class="org.jboss.soa.esb.actions.SystemPrintln" name="PrintSendConsole">
     <property name="message" value="== Sending to Queue =="/>
     <property name="printfull" value="false"/>
    </action>
    <action class="org.jboss.soa.esb.actions.routing.JMSRouter" name="PutInOutQueue">
     <property name="jndiName" value="queue/OutBoundQueue"/>
     <property name="unwrap" value="true"/>
    </action>
   </actions>
  </service>
 </services>
</jbossesb>


In the provider configuration, note that we are using 'ClusteredConnectionFactory'. This connection factory makes sure we will access the clustered queues as one, and profit from failover and loadbalancing. Also note that we have set the transacted flag to 'true', so that messages will reappear on the queue if processing in the service pipeline fails. 

We also set the persistent flag to 'true', so that messages will be preserved even when the whole cluster goes down. This amounts to letting the queue know we want persistent messages; it is up to the queue to actually provide this, which it will since JBoss Messaging has been configured to work with MySQL.

Furthermore, note the WaitAction. This is a custom class which causes processing to pause for a configurable amount of time. This gives us time to kill the node during the pause in a controlled manner:

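// Logger is assumed to be the log4j Logger.
import org.apache.log4j.Logger;
import org.jboss.soa.esb.ConfigurationException;
import org.jboss.soa.esb.actions.AbstractActionLifecycle;
import org.jboss.soa.esb.helpers.ConfigTree;
import org.jboss.soa.esb.message.Message;

// Pauses the action pipeline for a configurable number of seconds.
public class WaitAction extends AbstractActionLifecycle
{
    private final long _timeSecs;
    private final Logger _log;

    public WaitAction(ConfigTree tree) throws ConfigurationException {
        // 'timeSecs' is the property set on this action in jboss-esb.xml
        _timeSecs = Long.parseLong(tree.getRequiredAttribute("timeSecs"));
        _log = Logger.getLogger(this.getClass());
    }

    public Message process(Message message) throws InterruptedException {
        _log.info("Going to sleep for " + _timeSecs + " secs.");
        Thread.sleep(_timeSecs * 1000);
        _log.info("Awake, resume processing.");
        // Pass the message on unchanged to the next action
        return message;
    }
}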

Don't forget to use the correct FQN for the class in jboss-esb.xml.


JBoss Messaging

We need to set up the required queues. Edit jbm-queue-service.xml as follows:


<?xml version="1.0" encoding="UTF-8"?>
<server>
      <mbean code="org.jboss.jms.server.destination.QueueService"
            name="jboss.esb.quickstart.destination:service=Queue,name=EsbQueue"
            xmbean-dd="xmdesc/Queue-xmbean.xml">
            <depends optional-attribute-name="ServerPeer">
                jboss.messaging:service=ServerPeer
            </depends>
            <depends>jboss.messaging:service=PostOffice</depends>            
            <attribute name="Clustered">true</attribute>
      </mbean>
      <mbean code="org.jboss.jms.server.destination.QueueService"
            name="mycluster.destination:service=Queue,name=OutBoundQueue"
            xmbean-dd="xmdesc/Queue-xmbean.xml">
            <depends optional-attribute-name="ServerPeer">
                 jboss.messaging:service=ServerPeer
            </depends>
            <depends>jboss.messaging:service=PostOffice</depends>           
            <attribute name="Clustered">true</attribute>
      </mbean>
      <mbean code="org.jboss.jms.server.destination.QueueService"
            name="mycluster.destination:service=Queue,name=InBoundQueue"
            xmbean-dd="xmdesc/Queue-xmbean.xml">
            <depends optional-attribute-name="ServerPeer">
                 jboss.messaging:service=ServerPeer
            </depends>
            <depends>jboss.messaging:service=PostOffice</depends>
            <attribute name="Clustered">true</attribute>
      </mbean>
</server>


We mark the queues as clustered by setting the 'Clustered' attribute.


Starting the cluster and deployment

To start the cluster, first start one JBoss instance on one of the Virtualbox machines and wait for it to be fully up and running. This will be what JBoss calls the 'primary node'. You should be able to see some logging similar to this:


2011-10-21 04:17:55,252 INFO  [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] (main) Initializing partition DefaultPartition
2011-10-21 04:17:55,402 INFO  [STDOUT] (JBoss System Threads(1)-3)
---------------------------------------------------------
GMS: address is 192.168.56.101:55200 (cluster=DefaultPartition)
---------------------------------------------------------
2011-10-21 04:17:55,712 INFO  [org.jboss.cache.jmx.PlatformMBeanServerRegistration] (main) JBossCache MBeans were successfully registered to the platform mbean server.
2011-10-21 04:17:55,864 INFO  [STDOUT] (main)
---------------------------------------------------------
GMS: address is 192.168.56.101:55200 (cluster=DefaultPartition-HAPartitionCache)
---------------------------------------------------------
2011-10-21 04:17:58,410 INFO  [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] (JBoss System Threads(1)-3) Number of cluster members: 1
2011-10-21 04:17:58,410 INFO  [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] (JBoss System Threads(1)-3) Other members: 0
2011-10-21 04:17:58,414 INFO  [org.jboss.cache.RPCManagerImpl] (main) Received new cluster view: [192.168.56.101:55200|0] [192.168.56.101:55200]
2011-10-21 04:17:58,416 INFO  [org.jboss.cache.RPCManagerImpl] (main) Cache local address is 192.168.56.101:55200
2011-10-21 04:17:58,429 INFO  [org.jboss.cache.RPCManagerImpl] (main) state was retrieved successfully (in 2.57 seconds)


Then start the second node. You can see in the logging that they will find each other and form the cluster:


2011-10-21 04:20:55,432 INFO  [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-13,192.168.56.101:55200) org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener@32af3289 got new view [192.168.56.101:55200|1] [192.168.56.101:55200, 192.168.56.102:55200], old view is [192.168.56.101:55200|0] [192.168.56.101:55200]
2011-10-21 04:20:55,433 INFO  [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-13,192.168.56.101:55200) I am (192.168.56.101:55200)
2011-10-21 04:20:55,434 INFO  [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-13,192.168.56.101:55200) New Members : 1 ([192.168.56.102:55200])
2011-10-21 04:20:55,434 INFO  [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-13,192.168.56.101:55200) All Members : 2 ([192.168.56.101:55200, 192.168.56.102:55200])


After you have built your .esb file, deploy it in the farm directory of the primary node under my_cluster\farm. Watch in the logs as the queues are deployed on the primary and then secondary nodes.


Running some tests

In order to test failover, have your cluster fully running, then send some messages to the inbound queue using your favorite tool, such as Hermes JMS. Make sure the body of each message is unique, so that the messages are easily identified in the logging. Then take one node offline while it is processing a message and watch the logging on the other node. You should see the message appear there.

When all processing is done, you should see all messages safe and sound in the outgoing queue. I prefer to inspect the database in order to validate this:
  

[root@jboss ~]# mysql jboss -e 'select HEADERS from JBM_MSG'
+-------------------------------------------------------------------------------+
|HEADERS |
+-------------------------------------------------------------------------------+
OutBoundQueue  |ID:JBM-f36d6d66-9c2c-413e-a879-0510af996601 H.CORRELATIONID uickstartId[1321627299207] H.DEST
OutBoundQueue  |ID:JBM-9a448807-1911-475f-aa7f-661dfefa3164 H.CORRELATIONID uickstartId[1321627300266] H.DEST
OutBoundQueue  |ID:JBM-f83a3c80-31a7-484e-ba9a-e15eef316cc1 H.CORRELATIONID uickstartId[1321627302355] H.DEST
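If you prefer to automate this check, the comparison itself is simple. The sketch below assumes you have already collected the unique bodies you sent and the bodies you found on the outbound queue by whatever means you like; how to extract them from the database or from a JMS browser is left out. DeliveryCheckSketch and its methods are hypothetical names for this illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Compares what was sent with what arrived on the outbound queue.
class DeliveryCheckSketch {
    // Messages that were sent but never arrived: these would have been lost.
    static Set<String> missing(Collection<String> sent, Collection<String> received) {
        Set<String> result = new TreeSet<>(sent);
        result.removeAll(received);
        return result;
    }

    // Messages that arrived more than once, for instance after a redelivery.
    static Set<String> duplicates(Collection<String> received) {
        Set<String> seen = new HashSet<>();
        Set<String> result = new TreeSet<>();
        for (String body : received) {
            if (!seen.add(body)) {      // add returns false when already present
                result.add(body);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Collection<String> sent = Arrays.asList("msg-1", "msg-2", "msg-3");
        Collection<String> received = Arrays.asList("msg-1", "msg-3", "msg-3");
        System.out.println("missing: " + missing(sent, received));
        System.out.println("duplicates: " + duplicates(received));
    }
}
```

A successful failover test should report no missing messages. Whether duplicates can occur depends on how the transaction was rolled back, so it is worth checking for them as well.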



Conclusion

Clustering with JBoss is actually not that hard to set up because it is built into the JBoss Application Server, but the complexity of the solution can make the task seem daunting. I hope you now have some insight into HA concepts and their implementation with JBoss SOA Platform, as well as how to verify that you have a working solution.