Dynamic Clustering for J2EE Cloud Environments

Cloud computing is one of the emerging paradigms in today's computing world. One of the main advantages of migrating to the cloud is its elastic nature: resources are provisioned and de-provisioned dynamically according to the application's workload.

In a traditional on-premise J2EE infrastructure, information about the application server and web server resources is available at deployment time. Clustering such an infrastructure for scalability is much simpler because the resources are known beforehand. In a cloud environment, however, resources are provisioned and de-provisioned dynamically based on the workload. A J2EE cloud environment therefore has to reconfigure itself automatically whenever application server instances are added to or removed from the cluster. One solution from the open source space is to use the Apache httpd web server with the mod_cluster load balancing module and the JBoss application server.

This article discusses the features of mod_cluster that enable it to operate in a cloud environment, and then walks through the steps to set up a highly scalable J2EE cloud environment in your lab.

Introduction to mod_cluster
mod_cluster is an extension of the Apache httpd mod_proxy load balancing module and can balance HTTP requests across multiple instances of JBoss Application Server, JBoss Web standalone, or Tomcat. Its distinguishing feature is that once the initial configuration is done, no manual configuration changes are needed to add or remove JBoss AS or Tomcat instances.

mod_cluster works with two communication channels. The forward channel uses AJP, HTTP, or HTTPS to forward requests from httpd to one of the application server nodes. The backward channel is used by the application server nodes to send server-side information to httpd, and it is the key differentiator between mod_cluster and other load balancing modules. This channel carries real-time information about the load balancing factor of each node and about application life cycle events.

Some of the key features of mod_cluster that enable it to be used in cloud environments are:

Dynamic configuration
Common httpd-based load balancers such as mod_jk and mod_proxy require the workers (application servers) to be configured on the httpd side. To add a new worker, you have to change the httpd configuration and restart the proxy. This is an overhead for large or dynamically varying clusters such as those in cloud environments.

With mod_cluster, the proxy information is maintained on the application server side, either through a static list or through the advertise mechanism (using mod_advertise). When the workers start, their listeners receive multicast pings that contain the host and port of each proxy. The workers then send events to the detected proxies, and the proxies configure themselves automatically to balance requests between the nodes.
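
The auto-configuration described above can be pictured with a small Java sketch. It is only an illustration under stated assumptions: the class ProxyRegistrySketch and its methods register, unregister, and next are hypothetical names, the round-robin selection is a stand-in for mod_cluster's load-based selection, and the real exchange between workers and httpd happens over the network rather than through in-process calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a proxy-side table of workers that is updated
// by events from the workers instead of by edits to a configuration file.
final class ProxyRegistrySketch {
    private final List<String> workers = new ArrayList<>();
    private int cursor = 0;

    // Called when a worker announces itself to the proxy.
    synchronized void register(String worker) {
        if (!workers.contains(worker)) {
            workers.add(worker);
        }
    }

    // Called when a worker shuts down or is detected as gone.
    synchronized void unregister(String worker) {
        workers.remove(worker);
    }

    // Picks the next worker in plain round-robin order (an assumption made
    // for brevity; it is not how mod_cluster chooses a node).
    synchronized String next() {
        if (workers.isEmpty()) {
            throw new IllegalStateException("no workers registered");
        }
        cursor = cursor % workers.size();
        return workers.get(cursor++);
    }

    public static void main(String[] args) {
        ProxyRegistrySketch proxy = new ProxyRegistrySketch();
        proxy.register("jboss-node-1");   // hypothetical node names
        proxy.register("jboss-node-2");
        System.out.println(proxy.next());
        System.out.println(proxy.next());
        proxy.unregister("jboss-node-1"); // node leaves, no proxy restart needed
        System.out.println(proxy.next());
    }
}
```

The point of the sketch is that the set of workers changes at runtime through register and unregister calls, so the proxy never needs a configuration edit or a restart.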

Dynamic determination of load balancing factor
In common httpd-based load balancers, the ratio in which load is distributed among the workers is determined by a static factor set in the httpd load balancer configuration. With mod_cluster, the load balancing factor is determined on the application server side from values monitored at runtime. Load computation is pluggable: in addition to the default load metrics, you can write your own LoadMetric for whatever quantity you want to measure.
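
To make the idea of a runtime-computed factor concrete, here is a minimal Java sketch. It assumes each metric reports a load between 0.0 and 1.0 together with a weight, and it combines them into a factor from 1 (busy) to 100 (idle). The class LoadFactorSketch, the Metric type, and the formula are illustrative assumptions, not mod_cluster's actual LoadMetric API or its exact computation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of deriving a load balancing factor from
// several weighted runtime metrics.
final class LoadFactorSketch {

    // One metric reading: load in [0.0, 1.0] and a relative weight.
    static final class Metric {
        final double load;
        final int weight;

        Metric(double load, int weight) {
            this.load = load;
            this.weight = weight;
        }
    }

    // Returns a factor from 1 (fully loaded) to 100 (idle).
    static int loadFactor(Iterable<Metric> metrics) {
        double weightedLoad = 0.0;
        int totalWeight = 0;
        for (Metric m : metrics) {
            // Clamp out-of-range readings so one bad metric cannot skew the result.
            double clamped = Math.max(0.0, Math.min(1.0, m.load));
            weightedLoad += clamped * m.weight;
            totalWeight += m.weight;
        }
        if (totalWeight == 0) {
            return 100; // assumption: with no metrics, treat the node as idle
        }
        double average = weightedLoad / totalWeight;
        return Math.max(1, (int) Math.round(100 * (1.0 - average)));
    }

    public static void main(String[] args) {
        // Hypothetical readings: CPU at 50% (weight 2), heap at 20% (weight 1).
        List<Metric> metrics = Arrays.asList(new Metric(0.5, 2), new Metric(0.2, 1));
        System.out.println(loadFactor(metrics));
    }
}
```

A node that reports a higher factor would receive a larger share of requests, which is what lets the balance shift as load on each node changes.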

Fine grained web application life cycle
With mod_cluster, the applications deployed on the application server side are registered with the httpd side through the Mod-Cluster Management Protocol (MCMP). The httpd side therefore knows which applications are deployed on which instances, and it forwards requests only to the nodes that host the requested application. Because the proxies know which applications are deployed on each worker, we can keep highly sensitive applications in our private cloud and move less critical applications to a public provider. This lets us scale up without compromising security, since the data-sensitive applications stay on our own premises. For example, in a shopping cart scenario, catalog browsing can go to the public cloud while sensitive functions such as payments stay in the company's private cloud.
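
The routing decision described above can be sketched in a few lines of Java. This is an illustration only: ContextRoutingSketch, enableApp, removeApp, and candidates are hypothetical names, the node names and context paths are made up, and the real proxy keeps this information in httpd shared memory rather than in a Java map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of application-aware routing on the proxy side.
final class ContextRoutingSketch {
    // Maps a web application context path (for example "/catalog") to the nodes serving it.
    private final Map<String, Set<String>> nodesByContext = new HashMap<>();

    // Records that a node has deployed and enabled an application.
    void enableApp(String node, String context) {
        nodesByContext.computeIfAbsent(context, k -> new TreeSet<>()).add(node);
    }

    // Records that a node has undeployed an application.
    void removeApp(String node, String context) {
        Set<String> nodes = nodesByContext.get(context);
        if (nodes != null) {
            nodes.remove(node);
            if (nodes.isEmpty()) {
                nodesByContext.remove(context);
            }
        }
    }

    // Returns the nodes that may serve the request path, or an empty set if none can.
    Set<String> candidates(String requestPath) {
        String best = null;
        for (String context : nodesByContext.keySet()) {
            boolean matches = requestPath.equals(context) || requestPath.startsWith(context + "/");
            // Prefer the longest matching context path.
            if (matches && (best == null || context.length() > best.length())) {
                best = context;
            }
        }
        if (best == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(nodesByContext.get(best));
    }

    public static void main(String[] args) {
        ContextRoutingSketch proxy = new ContextRoutingSketch();
        proxy.enableApp("public-node-1", "/catalog");   // public cloud
        proxy.enableApp("public-node-2", "/catalog");
        proxy.enableApp("private-node-1", "/payment");  // private cloud only
        System.out.println(proxy.candidates("/catalog/items/42"));
        System.out.println(proxy.candidates("/payment/checkout"));
    }
}
```

In this sketch a payment request can only ever reach the private node, because no public node has registered the /payment context.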

The following sections describe how to set up a highly scalable J2EE environment in your lab using VMware (or any other private cloud solution such as Eucalyptus or OpenNebula).

Environment Set-Up
We have a VMware vCenter Server set up on one machine and connected to an ESXi host. The discussion assumes that the reader knows how to create a virtual machine and install a guest operating system in a VMware environment. Our virtual machine runs CentOS 5.4.

To set up a dynamic cluster in the cloud, we need a minimum of two instances:

  1. Apache httpd + mod_cluster
  2. JBoss 5.1 application server with mod_cluster

We now look in detail at creating each of these images/templates with the necessary startup scripts. The steps can be carried out on any physical or virtual machine with CentOS 5 installed.

Creating Apache httpd image
Step 1. Create a base virtual machine with CentOS 5.x as the guest operating system
Refer to the VMware documentation on creating a virtual machine and installing a guest OS.

Step 2. Install Apache httpd and mod_cluster in the virtual machine
Download the Apache httpd distribution bundled with the latest mod_cluster from the mod_cluster downloads page. To install httpd with mod_cluster, move the distribution to the VM and extract the mod_cluster-1.1.0.xxx-linux2-x86-ssl.tar.gz file using the following command:

tar xzvf mod_cluster-1.1.0.xxx-linux2-x86-ssl.tar.gz

By default this installs httpd with the required mod_cluster modules in the /opt/jboss directory.

Step 3. Configuring mod_cluster at httpd side
The httpd configuration file is httpd.conf, located in /opt/jboss/httpd/httpd/conf. From mod_cluster 1.1.0.CR2 onward, mod_cluster ships with some quick-start values:

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule slotmem_module modules/mod_slotmem.so
LoadModule manager_module modules/mod_manager.so
LoadModule proxy_cluster_module modules/mod_proxy_cluster.so
LoadModule advertise_module modules/mod_advertise.so

The above configuration loads the extra modules that httpd needs for mod_cluster. If you are adding mod_cluster to an existing httpd installation, you have to download the modules and add the above lines to the httpd.conf file.

# MOD_CLUSTER_ADDS
# Adjust to your hostname and subnet.
<IfModule manager_module>
  Listen *:6666
  ManagerBalancerName mycluster
  <VirtualHost *:6666>
    <Directory />
      Order deny,allow
      Deny from none
      Allow from all
    </Directory>
    KeepAliveTimeout 300
    MaxKeepAliveRequests 0
    #ServerAdvertise on http://@IP@:6666
    AdvertiseFrequency 5
    #AdvertiseSecurityKey secret
    #AdvertiseGroup @ADVIP@:23364
    <Location /mod_cluster_manager>
      SetHandler mod_cluster-manager
      Order deny,allow
      Deny from none
      Allow from all
    </Location>
  </VirtualHost>
</IfModule>

Customize the above configuration for your own needs. As written it allows access from any host, so it is not suitable for a production environment.

Step 4. Starting httpd at boot up
We need httpd to be up and running when the machine boots. To achieve this, we expose httpd as a service through an init script.

The script below can be used to start and stop httpd, including at boot.

#!/bin/sh
# chkconfig: - 64 36
# description: Apache Start|Restart|Stop Web Server

APACHE_HOME=/opt/jboss/httpd

case "$1" in
  start)
    echo "Starting Apache ..."
    # Change the location to your specific location
    $APACHE_HOME/sbin/apachectl start
    ;;
  stop)
    echo "Stopping Apache ..."
    # Change the location to your specific location
    $APACHE_HOME/sbin/apachectl stop
    ;;
  graceful)
    echo "Restarting Apache gracefully..."
    # Change the location to your specific location
    $APACHE_HOME/sbin/apachectl graceful
    ;;
  restart)
    echo "Restarting Apache ..."
    # Change the location to your specific location
    $APACHE_HOME/sbin/apachectl restart
    ;;
  *)
    echo "Usage: '$0' {start|stop|restart|graceful}" >&2
    exit 64
    ;;
esac

exit 0

Copy the above script to /etc/init.d/httpd, or write your own startup script for Apache httpd.

Give the file execute permission:

chmod +x /etc/init.d/httpd

Add httpd as a service at the required run levels:

chkconfig --add httpd
chkconfig --level 345 httpd on

To test the setup, run:

service httpd start

Starting Apache ...

Then open http://[ip]:[mod_clusterport]/mod_cluster_manager in the browser.

You should see the mod_cluster manager status page.

Step 5. Convert virtual machine to template
To avoid repeating the same steps every time an httpd virtual machine is needed, create a clone of the VM. In VMware, power off the virtual machine and clone it to a template.

Next we look at how to create the JBoss image.

Creating JBoss image
Step 1. Create the CentOS VM

Follow Step 1 of the httpd image creation above.

Step 2. Install Java
You can get the latest Java from the first location below; the second link explains the installation steps.

http://www.oracle.com/technetwork/java/javase/downloads/index.html

http://www.oracle.com/technetwork/java/javase/index-137561.html

Step 3. Installing JBoss AS with mod_cluster

We are using JBoss AS 5.1.0.GA, which can be downloaded from the JBoss site. Let $JBOSS_HOME be the JBoss installation directory. To install JBoss AS, simply extract the downloaded tar file.

For the demo, JBOSS_HOME=/home/jboss-5.1.0.GA

Download the latest Java bundles for mod_cluster from the mod_cluster downloads page. mod_cluster 1.1.0 works with JBoss AS 5.1 out of the box. Extract the mod_cluster-1.1.0.xxx-bin.tar.gz file and copy mod_cluster.sar to the deploy folder. The commands below assume the archive was extracted in the /tmp directory.

tar xzvf mod_cluster-1.1.0.CR3-bin.tar.gz
cp -r /tmp/mod_cluster.sar $JBOSS_HOME/server/all/deploy

Step 4. Configuration

The main configuration file is mod_cluster-jboss-beans.xml under

$JBOSS_HOME/server/all/deploy/mod_cluster.sar/META-INF/

By default mod_cluster is configured to work in clustered mode. In clustered mode, a single JBoss node is responsible for providing the entire cluster view to the front-end httpd processes. The default configuration discovers the proxies through the advertise mechanism provided by the mod_advertise module.

Step 5. JBoss as a service at startup

Execute the following commands to add a new user jboss and to install the startup script with execute permission.

#create and give permissions to user jboss
adduser jboss
chown -Rf jboss:jboss $JBOSS_HOME

#copy the default startup script to /etc/init.d
cd $JBOSS_HOME/bin
cp jboss_init_redhat.sh /etc/init.d/jboss
chmod +x /etc/init.d/jboss

Modify the /etc/init.d/jboss file so that JBOSS_HOME and JAVAPTH point to the actual installation directories.

# chkconfig: - 35 90
# description: JBoss Start|Restart|Stop Application Server
# pidfile: /var/run/jboss.pid
....

#JBoss installation directory (the demo location is used as the default)
JBOSS_HOME=${JBOSS_HOME:-"/home/jboss-5.1.0.GA"}

#define the user under which JBoss will run, or use 'RUNASIS' to run as the current user
JBOSS_USER=${JBOSS_USER:-"jboss"}

#make sure java is in your path (replace the placeholder with your JDK location)
JAVAPTH=${JAVAPTH:-"jdk installation folder"}

#configuration to use, usually one of 'minimal', 'default', 'all'
JBOSS_CONF=${JBOSS_CONF:-"all"}

#if JBOSS_HOST specified, use -b to bind JBoss services to that address
OS=`uname`
IP="" # store IP
case $OS in
  Linux) IP=`ifconfig eth0 | grep 'inet addr:' | grep -v '127.0.0.1' | cut -d: -f2 | awk '{ print $1 }'` ;;
  FreeBSD|OpenBSD) IP=`ifconfig eth0 | grep -E 'inet.[0-9]' | grep -v '127.0.0.1' | awk '{ print $2 }'` ;;
  SunOS) IP=`ifconfig -a eth0 | grep inet | grep -v '127.0.0.1' | awk '{ print $2 }'` ;;
  *) IP="Unknown" ;;
esac

#Bind to the current ip address
JBOSS_HOST=$IP
JBOSS_BIND_ADDR=${JBOSS_HOST:+"-b $JBOSS_HOST"}

The above modification binds JBoss to the IP address of the eth0 interface of the instance on which it runs, so each VM created from the template picks up its own address.

To add jboss as a service at startup,

#command to start jboss at runlevel 3,4 and 5

chkconfig --add jboss

chkconfig --level 345 jboss on

service jboss start

Check in the browser if JBoss is started

http://[jboss-ip]:8080

Step 6. Convert vm to template

In VMware, power off the virtual machine and clone it to a template.

Testing the environment
Create virtual machines from the templates created above, using either the VI Java API or the vCenter client. Once one instance each of Apache httpd and JBoss has been powered on, check the mod_cluster manager at the URL

http://[webserverip]:[mod_clusterport]/mod_cluster_manager/

We can see that the JBoss worker is now balanced by mod_cluster. If we create one more JBoss instance, it is added to the same balancer. If an application is deployed on both JBoss instances, its requests are distributed across them. In the same way, any new JBoss instance that starts up registers automatically with the proxy and becomes available for load balancing. If we kill a JBoss instance, it is automatically de-registered from the proxy balancer.

Set up in an environment where multicast is not supported
The above setup showed the mod_cluster configuration using the advertise mechanism, which relies on multicast pings for auto-discovery. However, major cloud providers such as Amazon EC2, Rackspace, and GoGrid don't support multicast in their environments. To overcome this, the proxy information can be passed through a JBoss system property (jboss.modcluster.proxyList) when the instance starts, or added through the addProxy method that mod_cluster exposes over JMX. The addProxy method takes the IP address of the httpd proxy and the port on which mod_cluster is listening. You can invoke it from the JBoss AS JMX Console or remotely from Java code.
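
As an illustration of the static-list option, the following Java sketch parses a proxy list into host and port pairs. It assumes the list is a comma-separated sequence of host:port entries; the class ProxyListSketch and the sample addresses are hypothetical, and this is not the parsing code that mod_cluster itself uses.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "host:port,host:port" into socket addresses.
final class ProxyListSketch {

    static List<InetSocketAddress> parse(String proxyList) {
        List<InetSocketAddress> proxies = new ArrayList<>();
        if (proxyList == null || proxyList.trim().isEmpty()) {
            return proxies; // nothing configured
        }
        for (String entry : proxyList.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("expected host:port but got '" + trimmed + "'");
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // Unresolved on purpose: the sketch should not trigger DNS lookups.
            proxies.add(InetSocketAddress.createUnresolved(host, port));
        }
        return proxies;
    }

    public static void main(String[] args) {
        // Sample addresses are made up for the example.
        for (InetSocketAddress proxy : parse("10.0.0.5:6666, 10.0.0.6:6666")) {
            System.out.println(proxy.getHostString() + " on port " + proxy.getPort());
        }
    }
}
```

A provisioning script could build such a list from the addresses of the running httpd instances and hand it to each new JBoss instance at startup.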

To disable the advertise mechanism, the following configuration changes are needed:

On the httpd side: set the ServerAdvertise directive to off in the httpd.conf file in /opt/jboss/httpd/httpd/conf

ServerAdvertise off

On the JBoss side: set the advertise property of the ModClusterConfig bean to false in mod_cluster-jboss-beans.xml under $JBOSS_HOME/server/all/deploy/mod_cluster.sar/META-INF/

<property name="advertise">false</property>

After setting these properties, create virtual machines from the templates. The mod_cluster manager page of the Apache instance now shows that the JBoss node has not been added. We therefore have to add the proxy to the JBoss mod_cluster configuration through JMX. The following code snippet can be used to add a proxy to the balancer.

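import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.naming.InitialContext;

// jbossInstanceIp, webServerIP and webServerModClusterPort are supplied by the caller
Hashtable<String, String> contextProps = new Hashtable<String, String>();
contextProps.put("java.naming.factory.initial", "org.jboss.naming.HttpNamingContextFactory");
contextProps.put("java.naming.provider.url", "http://" + jbossInstanceIp + ":8080/invoker/JNDIFactory");
contextProps.put("java.naming.factory.url.pkgs", "org.jboss.naming.client");
InitialContext ctx = new InitialContext(contextProps);
// Look up the remote MBean server connection exposed by JBoss AS
MBeanServerConnection server = (MBeanServerConnection) ctx.lookup("jmx/invoker/RMIAdaptor");
// Invoke addProxy(String host, int port) on the mod_cluster MBean
Object op = server.invoke(new ObjectName("jboss.web:service=ModCluster"), "addProxy",
    new Object[] { webServerIP, webServerModClusterPort }, new String[] { "java.lang.String", "int" });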

If you now check the mod_cluster manager page, you can see that the JBoss node is balanced by mod_cluster.

Conclusion
With capabilities such as dynamic addition of workers without configuration changes, knowledge of the deployed applications, and calculation of a real-time load balancing factor from different metrics, mod_cluster looks set to be the future of load balancer modules for Apache and to have a significant impact in cloud environments.

More Stories By Joel Mathew

Joel Mathew works as a Technology Analyst at SETLabs, the R&D division of Infosys Technologies Ltd. He has close to 3 years of experience in the development of cloud computing, Java and Java EE applications, Web 2.0, etc.
