|By Hovhannes Avoyan||
|October 25, 2012 08:55 AM EDT||
There are a variety of ways to implement proxying capabilities for web servers. As Apache is the most popular web server, we will try to implement proxying on it. Everyone who knows Apache well, probably knows that Apache implements proxying capability for AJP13 , FTP, CONNECT , HTTP/1.x.
The choice of reverse proxy server is fully dependent on what is actually trying to be hidden behind it. Each proxy mechanism has its own benefits and bottlenecks. Only for Apache, there are several ways to hide application servers (mod_proxy, mod_passenger, mod_wsgi, mod_jk). While mod_passenger and mod_wsgi are good for ruby and python servers respectively, these are a little bit outside the proxying idea. In this article I would like to discuss mod_proxy and mod_jk.
Now let’s think about what we have and what we want to put under proxy. The most common case is to put a pool of Tomcat servers behind Apache. Tomcat servers by default listen to 8080 for HTTP and 8009 for AJP. Now, we want to have Apache listen to 80 for incoming HTTP requests and 443 for HTTPS. People who have configured Tomcat for SSL will undoubtedly agree with me that SSL on Tomcat is quite annoying, so it’s better to implement SSL on the Apache side rather than playing with Tomcat’s keystores.
Okay, now we have two Tomcat servers on 2 different servers with our application installed, and both are on 8080 and an 8009 HTTP/AJP respectively. And one Apache on a third which will do HTTP on 80 , HTTPS on 443 for us and process requests to downstream Tomcat servers.
Situation 1 with mod_proxy and mod_proxy_http:
OK, here’s what this means:
User opens http://www.yourdomain.com in their browser
- Request comes to Apache
- Apache proxies it via HTTP to downstream Tomcat to port 8080
- Tomcat sends response to Apache via HTTP
- Apache delivers content to User’s browser
Well, so what are the pros and cons of this situation? We will provide some comparison tables below, but in general:
- Easy and quick to configure
- Works for all downstream application servers
- We do not have sticky sessions: if a user logs in to Tomcat1 and sends another request it will most likely go to Tomcat2 and the user will get a session expired error.
- mod_proxy does not support failover detection, so it will continue to send requests to downstream Tomcat even if it is down.
- Some Java applications exhibit unpredictable behavior when they are under a proxy environment. (From my experience, Atlassian Bamboo and Fisheye server’s progress bars stalled on several pages, but this was corrected by moving to JK; I have heard about other strange problems as well. )
Now let’s see Situation 2, where we use JK for downstream servers:
A REAL LIFE EXAMPLE
At first sight we can see that nothing has been changed, but this is only at first sight. The main difference here is that now Apache is talking to the Tomcats via AJP 13 and not HTTP protocol. So the process of opening the web site is the following:
- User opens http://www.yourdomain.com in their browser
- Request comes to Apache
- Apache proxies it via AJP 13 to downstream Tomcat to the port 8009
- Tomcat sends response to Apache via AJP
- Apache receives AJP and delivers content to Users browser via HTTP
It seems there is a little overhead with jumping around on HTTP and AJP, but there are benefits as well. Let’s see the Good and Bad sides of JK balancing:
- After a little tweaking we can have sticky sessions just by adding sticky_session=True on Apache and jvmRoute=”NODENAME” on the Tomcat sides. After this, users who are logged in to Tomcat1 will never be dropped to Tomcat2 until Tomcat1 is alive. (Actually you can Use Membase or Memcached as session store so users will never lose their session until it expires normally)
- We have node failure detection, so if Tomcat1 fails, Apache will not send requests to it until it detects that it is back.
- JK configuration is much more advanced than that of mod_proxy and allows lots of tweaking, which will result in better performance and make the environment work just as you need it to.
- JK has a web admin tool that allows you to decommission, suspend and play with the LB factor in real time.
- So far I have found only one bad thing: it is a little harder to configure, so it required some administrator skills.
At this moment you may be asking “Why do I need this? I have a single Tomcat server and it’s working fine”. As a matter of fact, you need to build a network which can handle your current load, be scalable and which will not affect the normal behavior of your websites. From this point of view, the choice of reverse proxy solution is quite reasonable.
Here is a real life example of one of our client server architectures, which I think is a good one
In general, the process is as follows:
- User does DNS request, gets ip address of one of the Varnish servers and the Static content server/s (NGINX).
- NGINX delivers content directly.
- Varnish caches whatever needs to be cached and sends request downstream to one of the Apaches.
- Apache gets JSESSIONID and forwards request via JK to the required Tomcat server or does balance if user does not have cookie.
- Tomcat servers keep sessions in local RAM and copy in Membase cluster (so even if one Tomcat fails another can retrieve its session from Membase ). Membase is clustered memcache so it is fault tolerant by nature (we will have a closer look at Membase in another article).
- Tomcat does needed application logic, (retrieves information from Hadoop/HBase database, etc.) and responds to Apache.
- Apache sends response back to Varnish.
- Varnish updates cache if needed and does delivery to client.
This is a real live working scenario, and it proved itself to be fault tolerant and extremely fast.
I know that after reading this article a lot of people will ask, “why is Apache needed when Varnish can do session stickiness, etc. …”
But the idea here is to use the best possible software for each particular role, software which has real and approved redundancy and reasonable layers of architecture which can help us to easily and quickly detect problems and fix them as they appear. Also, if we keep in mind that the client uses not only HTTP, but also HTTPS, I did not see any webserver which worked with SSL as smoothly as Apache did. Even if we do not have SSL initially, we will have it soon, and I do not believe that any web project can go far without SSL.
Following is a little comparison of JK and mod_proxy, so you can see more closely what these tools are.
|Node failure detection||mod_proxy_balancer has to be present in the server||7||Advanced||10|
|Backend SSL||supported (mod_ssl required)||5||not supported||0|
|Session stickiness||not supported||0||Supported via JVM Route||10|
|Protocols||HTTP, HTTPS||10||AJP 13||8|
|Node decommissioning||Manual needs Apache reload||3||Online via web admin||10|
|Web admin interface||Not present||0||Advanced with RO and RW support||10|
|Large AJP packet sizes||8K||5||Larger than 8K||10|
|Compatibility with other app. servers||Works with all HTTP application servers||10||AJP Compatible (Tomcat, Glassfish, etc. …)||5|
|Configuration||Compatible with Apache Httpd configuration file||10||Need separate JK Workers file in .properties format||8|
So now let’s do some stress tests on both mod_jk and mod_proxy. The Installation schema is as described above (one load balancer, two application servers.) On both Apache server hosts, monitoring software from Monitis.com is installed which will check the servers’ health in real time.
We have used Amazon EC2 medium instances for this test. Here are the load test results in both graphical and plain text mode.
Monitoring is implemented using Monitis M3 monitors.
There are 2 monitors used:
apache_monitor – used for apache server’s health check.
http_load monitor - used to check the load time difference during Apache benchmarking.
The mentioned monitors provide useful information which helps to find relationships between various metrics.
The graphic below depicts Apache worker’s status while busy (upper line) and idle (lower line) while benchmarking using
This graph shows Apache busy and idle worker processes on the Apache web server, so we can see that of 150 enabled processes, almost all are busy during the stress test.
Http content load time (time connect, time transfer, time total)
Following is data provided by siege after benchmarking 7 times (using mod_proxy), each time increasing the concurrent users’ number by 100:
|Concurrent conns.||Trans||Elap Time||Data Trans||Resp Time||Trans Rate||Throughput||Concurrent||Failed|
The graphic below represents Apache worker’s busy (upper line) and idle (lower line) status while benchmarking using
This graph shows Apache busy and idle worker processes on the Apache webserver, so we can see that of 150 enabled processes, almost all are busy during the stress test.
Http content load time (time connect, time transfer, time total)
Following is data provided by siege after benchmarking 7 times (using mod_jk), each time increasing the concurrent users number by 100:
|Concurrent conns.||Trans||Elap time||Data Trans||Resp Time||Trans time||Throughput||Concurrent||Failed|
Both mentioned modules, mod_proxy and mod_jk, are used as balancers for backend application servers such as Tomcat and GlassFish. What are the most important features in load balancing? I assumed node failure detection at first, and ease of session stability and load balancing configuration, without requiring any other extra tools or packages. Do not forget about performance, as well.
So what do we have? The resulting tables show that when advanced load balancing or node failure detection is needed, mod_jk is preferable. However, it cannot provide flexibility such as mod_proxy does when configuring (mod_proxy configuration is as easy as Apache configuration and there is no need for separate files like workers.properties) nor for compatibility needs with servers, other than AJP compatibility.
Now a little bit about performance. While the concurrent users count is not so high (in our case: 400), both servers’ behavior is similar, and it seems mod_proxy is able to provide better performance, but things changed as the number of concurrent users grew.
Take a look at this table:
|Concurrent users||Failed requests(10 Seconds Timeout)|
As you see, with an almost equal number of connections, mod_proxy fails approximately 59% more often.
If you have a small project, or need to hide a variety of application servers (Tomcat+Rails+Django), and if you need an easily configurable and fast SSL solution and your server load is not heavy, then use mod_proxy.
But if your goal is to loadbalance Java applications servers, then JK is definitely the better solution.Share Now:
Growth hacking is common for startups to make unheard-of progress in building their business. Career Hacks can help Geek Girls and those who support them (yes, that's you too, Dad!) to excel in this typically male-dominated world. Get ready to learn the facts: Is there a bias against women in the tech / developer communities? Why are women 50% of the workforce, but hold only 24% of the STEM or IT positions? Some beginnings of what to do about it! In her Day 2 Keynote at 17th Cloud Expo, Sandy Carter, IBM General Manager Cloud Ecosystem and Developers, and a Social Business Evangelist, wil...
Nov. 29, 2015 03:00 AM EST Reads: 586
PubNub has announced the release of BLOCKS, a set of customizable microservices that give developers a simple way to add code and deploy features for realtime apps.PubNub BLOCKS executes business logic directly on the data streaming through PubNub’s network without splitting it off to an intermediary server controlled by the customer. This revolutionary approach streamlines app development, reduces endpoint-to-endpoint latency, and allows apps to better leverage the enormous scalability of PubNub’s Data Stream Network.
Nov. 29, 2015 03:00 AM EST Reads: 333
Apps and devices shouldn't stop working when there's limited or no network connectivity. Learn how to bring data stored in a cloud database to the edge of the network (and back again) whenever an Internet connection is available. In his session at 17th Cloud Expo, Ben Perlmutter, a Sales Engineer with IBM Cloudant, demonstrated techniques for replicating cloud databases with devices in order to build offline-first mobile or Internet of Things (IoT) apps that can provide a better, faster user experience, both offline and online. The focus of this talk was on IBM Cloudant, Apache CouchDB, and ...
Nov. 29, 2015 02:45 AM EST Reads: 417
I recently attended and was a speaker at the 4th International Internet of @ThingsExpo at the Santa Clara Convention Center. I also had the opportunity to attend this event last year and I wrote a blog from that show talking about how the “Enterprise Impact of IoT” was a key theme of last year’s show. I was curious to see if the same theme would still resonate 365 days later and what, if any, changes I would see in the content presented.
Nov. 29, 2015 01:00 AM EST Reads: 434
Cloud computing delivers on-demand resources that provide businesses with flexibility and cost-savings. The challenge in moving workloads to the cloud has been the cost and complexity of ensuring the initial and ongoing security and regulatory (PCI, HIPAA, FFIEC) compliance across private and public clouds. Manual security compliance is slow, prone to human error, and represents over 50% of the cost of managing cloud applications. Determining how to automate cloud security compliance is critical to maintaining positive ROI. Raxak Protect is an automated security compliance SaaS platform and ma...
Nov. 28, 2015 08:00 PM EST Reads: 430
The Internet of Things (IoT) is growing rapidly by extending current technologies, products and networks. By 2020, Cisco estimates there will be 50 billion connected devices. Gartner has forecast revenues of over $300 billion, just to IoT suppliers. Now is the time to figure out how you’ll make money – not just create innovative products. With hundreds of new products and companies jumping into the IoT fray every month, there’s no shortage of innovation. Despite this, McKinsey/VisionMobile data shows "less than 10 percent of IoT developers are making enough to support a reasonably sized team....
Nov. 28, 2015 01:00 PM EST Reads: 479
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York and Silicon Valley. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound cha...
Nov. 28, 2015 12:00 PM EST Reads: 553
Just over a week ago I received a long and loud sustained applause for a presentation I delivered at this year’s Cloud Expo in Santa Clara. I was extremely pleased with the turnout and had some very good conversations with many of the attendees. Over the next few days I had many more meaningful conversations and was not only happy with the results but also learned a few new things. Here is everything I learned in those three days distilled into three short points.
Nov. 28, 2015 12:00 PM EST Reads: 339
DevOps is about increasing efficiency, but nothing is more inefficient than building the same application twice. However, this is a routine occurrence with enterprise applications that need both a rich desktop web interface and strong mobile support. With recent technological advances from Isomorphic Software and others, rich desktop and tuned mobile experiences can now be created with a single codebase – without compromising functionality, performance or usability. In his session at DevOps Summit, Charles Kendrick, CTO and Chief Architect at Isomorphic Software, demonstrated examples of com...
Nov. 28, 2015 11:45 AM EST Reads: 408
As organizations realize the scope of the Internet of Things, gaining key insights from Big Data, through the use of advanced analytics, becomes crucial. However, IoT also creates the need for petabyte scale storage of data from millions of devices. A new type of Storage is required which seamlessly integrates robust data analytics with massive scale. These storage systems will act as “smart systems” provide in-place analytics that speed discovery and enable businesses to quickly derive meaningful and actionable insights. In his session at @ThingsExpo, Paul Turner, Chief Marketing Officer at...
Nov. 28, 2015 11:15 AM EST Reads: 417
In his keynote at @ThingsExpo, Chris Matthieu, Director of IoT Engineering at Citrix and co-founder and CTO of Octoblu, focused on building an IoT platform and company. He provided a behind-the-scenes look at Octoblu’s platform, business, and pivots along the way (including the Citrix acquisition of Octoblu).
Nov. 28, 2015 11:00 AM EST Reads: 517
In his General Session at 17th Cloud Expo, Bruce Swann, Senior Product Marketing Manager for Adobe Campaign, explored the key ingredients of cross-channel marketing in a digital world. Learn how the Adobe Marketing Cloud can help marketers embrace opportunities for personalized, relevant and real-time customer engagement across offline (direct mail, point of sale, call center) and digital (email, website, SMS, mobile apps, social networks, connected objects).
Nov. 28, 2015 10:30 AM EST Reads: 316
We all know that data growth is exploding and storage budgets are shrinking. Instead of showing you charts on about how much data there is, in his General Session at 17th Cloud Expo, Scott Cleland, Senior Director of Product Marketing at HGST, showed how to capture all of your data in one place. After you have your data under control, you can then analyze it in one place, saving time and resources.
Nov. 28, 2015 10:00 AM EST Reads: 199
The Internet of Everything is re-shaping technology trends–moving away from “request/response” architecture to an “always-on” Streaming Web where data is in constant motion and secure, reliable communication is an absolute necessity. As more and more THINGS go online, the challenges that developers will need to address will only increase exponentially. In his session at @ThingsExpo, Todd Greene, Founder & CEO of PubNub, exploreed the current state of IoT connectivity and review key trends and technology requirements that will drive the Internet of Things from hype to reality.
Nov. 28, 2015 08:45 AM EST Reads: 442
Two weeks ago (November 3-5), I attended the Cloud Expo Silicon Valley as a speaker, where I presented on the security and privacy due diligence requirements for cloud solutions. Cloud security is a topical issue for every CIO, CISO, and technology buyer. Decision-makers are always looking for insights on how to mitigate the security risks of implementing and using cloud solutions. Based on the presentation topics covered at the conference, as well as the general discussions heard between sessions, I wanted to share some of my observations on emerging trends. As cyber security serves as a fou...
Nov. 28, 2015 08:45 AM EST Reads: 332
With all the incredible momentum behind the Internet of Things (IoT) industry, it is easy to forget that not a single CEO wakes up and wonders if “my IoT is broken.” What they wonder is if they are making the right decisions to do all they can to increase revenue, decrease costs, and improve customer experience – effectively the same challenges they have always had in growing their business. The exciting thing about the IoT industry is now these decisions can be better, faster, and smarter. Now all corporate assets – people, objects, and spaces – can share information about themselves and thei...
Nov. 28, 2015 06:00 AM EST Reads: 254
Continuous processes around the development and deployment of applications are both impacted by -- and a benefit to -- the Internet of Things trend. To help better understand the relationship between DevOps and a plethora of new end-devices and data please welcome Gary Gruver, consultant, author and a former IT executive who has led many large-scale IT transformation projects, and John Jeremiah, Technology Evangelist at Hewlett Packard Enterprise (HPE), on Twitter at @j_jeremiah. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.
Nov. 28, 2015 05:30 AM EST Reads: 735
Too often with compelling new technologies market participants become overly enamored with that attractiveness of the technology and neglect underlying business drivers. This tendency, what some call the “newest shiny object syndrome” is understandable given that virtually all of us are heavily engaged in technology. But it is also mistaken. Without concrete business cases driving its deployment, IoT, like many other technologies before it, will fade into obscurity.
Nov. 28, 2015 05:00 AM EST Reads: 366
Discussions of cloud computing have evolved in recent years from a focus on specific types of cloud, to a world of hybrid cloud, and to a world dominated by the APIs that make today's multi-cloud environments and hybrid clouds possible. In this Power Panel at 17th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the importance of customers being able to use the specific technologies they need, through environments and ecosystems that expose their APIs to make true change and transformation possible.
Nov. 28, 2015 04:00 AM EST Reads: 544
The Internet of Things is clearly many things: data collection and analytics, wearables, Smart Grids and Smart Cities, the Industrial Internet, and more. Cool platforms like Arduino, Raspberry Pi, Intel's Galileo and Edison, and a diverse world of sensors are making the IoT a great toy box for developers in all these areas. In this Power Panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists discussed what things are the most important, which will have the most profound effect on the world, and what should we expect to see over the next couple of years.
Nov. 28, 2015 03:30 AM EST Reads: 478