Welcome!

PHP Authors: Liz McMillan, Carmen Gonzalez, Hovhannes Avoyan, Lori MacVittie, Trevor Parsons

Blog Feed Post

How to Proxyfy Apache

INTRODUCTION

There are a variety of ways to implement proxying capabilities for web servers. As Apache is the most popular web server, we will try to implement proxying on it. Everyone who knows Apache well, probably knows that Apache implements proxying capability for AJP13 , FTP, CONNECT , HTTP/1.x.

The choice of reverse proxy server is fully dependent on what is actually trying to be hidden behind it. Each proxy mechanism has its own benefits and bottlenecks. Only for Apache, there are several ways to hide application servers (mod_proxy, mod_passenger, mod_wsgi, mod_jk). While mod_passenger and mod_wsgi are good for ruby and python servers respectively, these are a little bit outside the proxying idea. In this article I would like to discuss mod_proxy and mod_jk.

HOW TO

Now let’s think about what we have and what we want to put under proxy. The most common case is to put a pool of Tomcat servers behind Apache. Tomcat servers by default listen to 8080 for HTTP and 8009 for AJP. Now, we want to have Apache listen to 80 for incoming HTTP requests and 443 for HTTPS. People who have configured Tomcat for SSL will undoubtedly agree with me that SSL on Tomcat is quite annoying, so it’s better to implement SSL on the Apache side rather than playing with Tomcat’s keystores.

 

Okay, now we have two Tomcat servers on 2 different servers with our application installed, and both are on 8080 and an 8009 HTTP/AJP respectively. And one Apache on a third which will do HTTP on 80 , HTTPS on 443 for us and process requests to downstream Tomcat servers.

Situation 1 with mod_proxy and mod_proxy_http:

 

OK, here’s what this means:

 

User opens http://www.yourdomain.com in their browser

  1. Request comes to Apache
  2. Apache proxies it via HTTP to downstream Tomcat to port 8080
  3. Tomcat sends response to Apache via HTTP
  4. Apache delivers content to User’s browser

Well, so what are the pros and cons of this situation? We will provide some comparison tables below, but in general:

Pros:

  1. Easy and quick to configure
  2. Works for all downstream application servers

Cons:

  1. We do not have sticky sessions: if a user logs in to Tomcat1 and sends another request it will most likely go to Tomcat2 and the user will get a session expired error.
  2. mod_proxy does not support failover detection, so it will continue to send requests to downstream Tomcat even if it is down.
  3. Some Java applications exhibit unpredictable behavior when they are under a proxy environment. (From my experience, Atlassian Bamboo and Fisheye server’s progress bars stalled on several pages, but this was corrected by moving to JK; I have heard about other strange problems as well. )

Now let’s see Situation 2, where we use JK for downstream servers:

A REAL LIFE EXAMPLE

At first sight we can see that nothing has been changed, but this is only at first sight. The main difference here is that now Apache is talking to the Tomcats via AJP 13 and not HTTP protocol. So the process of opening the web site is the following:

  1. User opens http://www.yourdomain.com in their browser
  2. Request comes to Apache
  3. Apache proxies it via AJP 13 to downstream Tomcat to the port 8009
  4. Tomcat sends response to Apache via AJP
  5. Apache receives AJP and delivers content to Users browser via HTTP

It seems there is a little overhead with jumping around on HTTP and AJP, but there are benefits as well. Let’s see the Good and Bad sides of JK balancing:

Pros:

  1. After a little tweaking we can have sticky sessions just by adding sticky_session=True on Apache and jvmRoute=”NODENAME” on the Tomcat sides. After this, users who are logged in to Tomcat1 will never be dropped to Tomcat2 until Tomcat1 is alive. (Actually you can Use Membase or Memcached as session store so users will never lose their session until it expires normally)
  2. We have node failure detection, so if Tomcat1 fails, Apache will not send requests to it until it detects that it is back.
  3. JK configuration is much more advanced than that of mod_proxy and allows lots of tweaking, which will result in better performance and make the environment work just as you need it to.
  4. JK has a web admin tool that allows you to decommission, suspend and play with the LB factor in real time.

Cons:

  1. So far I have found only one bad thing: it is a little harder to configure, so it required some administrator skills.

At this moment you may be asking “Why do I need this? I have a single Tomcat server and it’s working fine”.  As a matter of fact, you need to build a network which can handle your current load, be scalable and which will not affect the normal behavior of your websites. From this point of view, the choice of reverse proxy solution is quite reasonable.

Here is a real life example of one of our client server architectures, which I think is a good one :)

 

In general, the process is as follows:

  1. User does DNS request, gets ip address of one of the Varnish servers and the Static content server/s (NGINX).
  2. NGINX delivers content directly.
  3. Varnish caches whatever needs to be cached and sends request downstream to one of the Apaches.
  4. Apache gets JSESSIONID and forwards request via JK to the required Tomcat server or does balance if user does not have cookie.
  5. Tomcat servers keep sessions in local RAM and copy in Membase cluster (so even if one Tomcat fails another can retrieve its session from Membase ). Membase is clustered memcache so it is fault tolerant by nature (we will have a closer look at Membase in another article).
  6. Tomcat does needed application logic, (retrieves information from Hadoop/HBase database, etc.) and responds to Apache.
  7. Apache sends response back to Varnish.
  8. Varnish updates cache if needed and does delivery to client.

This is a real live working scenario, and it proved itself to be fault tolerant and extremely fast.

I know that after reading this article a lot of people will ask, “why is Apache needed when Varnish can do session stickiness, etc. …”

But the idea here is to use the best possible software for each particular role, software which has real and approved redundancy and reasonable layers of architecture which can help us to easily and quickly detect problems and fix them as they appear. Also, if we keep in mind that the client uses not only HTTP, but also HTTPS, I did not see any webserver which worked with SSL as smoothly as Apache did. Even if we do not have SSL initially, we will have it soon, and I do not believe that any web project can go far without SSL.

Following is a little comparison of JK and mod_proxy, so you can see more closely what these tools are.

 

Features mod_proxy Weight mod_jk Weight
Load balancing Basic 5 Advanced 10
Node failure detection mod_proxy_balancer has to be present in the server 7 Advanced 10
Backend SSL supported (mod_ssl required) 5 not supported 0
Session stickiness not supported 0 Supported via JVM Route 10
Protocols HTTP, HTTPS 10 AJP 13 8
Node decommissioning Manual needs Apache reload 3 Online via web admin 10
Web admin interface Not present 0 Advanced with RO and RW support 10
Large AJP packet sizes 8K 5 Larger than 8K 10
Compatibility with other app. servers Works with all HTTP application servers 10 AJP Compatible (Tomcat, Glassfish, etc. …) 5
Configuration Compatible with Apache Httpd configuration file 10 Need separate JK Workers file in .properties format 8
Summary 55 81

 

So now let’s do some stress tests on both mod_jk and mod_proxy. The Installation schema is as described above (one load balancer, two application servers.) On both Apache server hosts, monitoring software from Monitis.com is installed which will check the servers’ health in real time.

We have used Amazon EC2 medium instances for this test. Here are the load test results in both graphical and plain text mode.

Monitoring is implemented using Monitis M3 monitors.

There are 2 monitors used:

apache_monitor – used for apache server’s health check.

http_load monitor - used to check the load time difference during Apache benchmarking.

 

The mentioned monitors provide useful information which helps to find relationships between various metrics.

mod_proxy:

The graphic below depicts Apache worker’s status while busy (upper line) and idle (lower line) while benchmarking using

mod_proxy balancer.

This graph shows Apache busy and idle worker processes on the Apache web server, so we can see that of 150 enabled processes, almost all are busy during the stress test.

 

Http content load time (time connect, time transfer, time total)

Following is data provided by siege after benchmarking 7 times (using mod_proxy), each time increasing the concurrent users’ number by 100:

 

Concurrent conns. Trans Elap Time Data Trans Resp Time Trans Rate Throughput Concurrent Failed
100 112173 359.18 206 0.32 312.30 0.57 99.93 0
200 181578 360.01 333 0.40 504.37 0.92 199.72 3
300 179025 360.00 329 0.60 497.29 0.91 299.37 5
400 177681 360.00 326 0.81 493.56 0.91 397.44 40
500 166401 359.99 305 1.07 462.24 0.85 494.52 130
600 160853 359.99 295 1.31 446.83 0.82 584.32 444

 

mod_jk:

The graphic below represents Apache worker’s busy (upper line) and idle (lower line) status while benchmarking using

mod_jk.


This graph shows Apache busy and idle worker processes on the Apache webserver, so we can see that of 150 enabled processes, almost all are busy during the stress test.

Http content load time (time connect, time transfer, time total)

Following is data provided by siege after benchmarking 7 times (using mod_jk), each time increasing the concurrent users number by 100:

 

Concurrent conns. Trans Elap time Data Trans Resp Time Trans time Throughput Concurrent Failed
100 106919 359.60 198 0.34 297.33 0.55 99.93 0
200 186123 360.01 345 0.39 516.99 0.96 199.76 0
300 183017 360.00 339 0.59 508.38 0.94 299.29 8
400 179891 360.00 333 0.80 499.70 0.93 397.34 49
500 169284 359.99 313 1.05 470.25 0.87 494.55 124
600 182954 359.99 339 1.16 508.22 0.94 590.32 258

 

 

CONCLUSION

Both mentioned modules, mod_proxy and mod_jk, are used as balancers for backend application servers such as Tomcat and GlassFish. What are the most important features in load balancing? I assumed node failure detection at first, and ease of session stability and load balancing configuration, without requiring any other extra tools or packages. Do not forget about performance, as well.

So what do we have? The resulting tables show that when advanced load balancing or node failure detection is needed, mod_jk is preferable. However, it cannot provide flexibility such as mod_proxy does when configuring (mod_proxy configuration is as easy as Apache configuration and there is no need for separate files like workers.properties) nor for compatibility needs with servers, other than AJP compatibility.

Now a little bit about performance. While the concurrent users count is not so high (in our case: 400), both servers’ behavior is similar, and it seems mod_proxy is able to provide better performance, but things changed as the number of concurrent users grew.

Take a look at this table:

 

Concurrent users Failed requests(10 Seconds Timeout)
mod_jk 590.32 258
mod_proxy 584.32 444

As you see, with an almost equal number of connections, mod_proxy fails approximately 59% more often.

If you have a small project, or need to hide a variety of application servers (Tomcat+Rails+Django), and if you need an easily configurable and fast SSL solution and your server load is not heavy, then use mod_proxy.

But if your goal is to loadbalance Java applications servers, then JK is definitely the better solution.

Share Now:del.icio.usDiggFacebookLinkedInBlinkListDZoneGoogle BookmarksRedditStumbleUponTwitterRSS

Read the original blog entry...

More Stories By Hovhannes Avoyan

Hovhannes Avoyan is the CEO of PicsArt, Inc.,

@ThingsExpo Stories
The explosion of new web/cloud/IoT-based applications and the data they generate are transforming our world right before our eyes. In this rush to adopt these new technologies, organizations are often ignoring fundamental questions concerning who owns the data and failing to ask for permission to conduct invasive surveillance of their customers. Organizations that are not transparent about how their systems gather data telemetry without offering shared data ownership risk product rejection, regu...
TechTarget storage websites are the best online information resource for news, tips and expert advice for the storage, backup and disaster recovery markets. By creating abundant, high-quality editorial content across more than 140 highly targeted technology-specific websites, TechTarget attracts and nurtures communities of technology buyers researching their companies' information technology needs. By understanding these buyers' content consumption behaviors, TechTarget creates the purchase inte...
In his keynote at 18th Cloud Expo, Andrew Keys, Co-Founder of ConsenSys Enterprise, provided an overview of the evolution of the Internet and the Database and the future of their combination – the Blockchain. Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settle...
Almost two-thirds of companies either have or soon will have IoT as the backbone of their business. Though, IoT is far more complex than most firms expected with a majority of IoT projects having failed. How can you not get trapped in the pitfalls? In his session at @ThingsExpo, Tony Shan, Chief IoTologist at Wipro, will introduce a holistic method of IoTification, which is the process of IoTifying the existing technology portfolios and business models to adopt and leverage IoT. He will delve in...
In his keynote at @ThingsExpo, Chris Matthieu, Director of IoT Engineering at Citrix and co-founder and CTO of Octoblu, focused on building an IoT platform and company. He provided a behind-the-scenes look at Octoblu’s platform, business, and pivots along the way (including the Citrix acquisition of Octoblu).
Who are you? How do you introduce yourself? Do you use a name, or do you greet a friend by the last four digits of his social security number? Assuming you don’t, why are we content to associate our identity with 10 random digits assigned by our phone company? Identity is an issue that affects everyone, but as individuals we don’t spend a lot of time thinking about it. In his session at @ThingsExpo, Ben Klang, Founder & President of Mojo Lingo, discussed the impact of technology on identity. Sho...
There are 66 million network cameras capturing terabytes of data. How did factories in Japan improve physical security at the facilities and improve employee productivity? Edge Computing reduces possible kilobytes of data collected per second to only a few kilobytes of data transmitted to the public cloud every day. Data is aggregated and analyzed close to sensors so only intelligent results need to be transmitted to the cloud. Non-essential data is recycled to optimize storage.
SYS-CON Events announced today that SD Times | BZ Media has been named “Media Sponsor” of SYS-CON's 20th International Cloud Expo, which will take place on June 6–8, 2017, at the Javits Center in New York City, NY. BZ Media LLC is a high-tech media company that produces technical conferences and expositions, and publishes a magazine, newsletters and websites in the software development, SharePoint, mobile development and commercial UAV markets.
“We're a global managed hosting provider. Our core customer set is a U.S.-based customer that is looking to go global,” explained Adam Rogers, Managing Director at ANEXIA, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
In today's uber-connected, consumer-centric, cloud-enabled, insights-driven, multi-device, global world, the focus of solutions has shifted from the product that is sold to the person who is buying the product or service. Enterprises have rebranded their business around the consumers of their products. The buyer is the person and the focus is not on the offering. The person is connected through multiple devices, wearables, at home, on the road, and in multiple locations, sometimes simultaneously...
China Unicom exhibit at the 19th International Cloud Expo, which took place at the Santa Clara Convention Center in Santa Clara, CA, in November 2016. China United Network Communications Group Co. Ltd ("China Unicom") was officially established in 2009 on the basis of the merger of former China Netcom and former China Unicom. China Unicom mainly operates a full range of telecommunications services including mobile broadband (GSM, WCDMA, LTE FDD, TD-LTE), fixed-line broadband, ICT, data communica...
As businesses adopt functionalities in cloud computing, it’s imperative that IT operations consistently ensure cloud systems work correctly – all of the time, and to their best capabilities. In his session at @BigDataExpo, Bernd Harzog, CEO and founder of OpsDataStore, will present an industry answer to the common question, “Are you running IT operations as efficiently and as cost effectively as you need to?” He will expound on the industry issues he frequently came up against as an analyst, and...
WebRTC is about the data channel as much as about video and audio conferencing. However, basically all commercial WebRTC applications have been built with a focus on audio and video. The handling of “data” has been limited to text chat and file download – all other data sharing seems to end with screensharing. What is holding back a more intensive use of peer-to-peer data? In her session at @ThingsExpo, Dr Silvia Pfeiffer, WebRTC Applications Team Lead at National ICT Australia, looked at differ...
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place June 6-8, 2017, at the Javits Center in New York City, New York, is co-located with 20th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry p...
IoT offers a value of almost $4 trillion to the manufacturing industry through platforms that can improve margins, optimize operations & drive high performance work teams. By using IoT technologies as a foundation, manufacturing customers are integrating worker safety with manufacturing systems, driving deep collaboration and utilizing analytics to exponentially increased per-unit margins. However, as Benoit Lheureux, the VP for Research at Gartner points out, “IoT project implementers often un...
SYS-CON Events announced today that Technologic Systems Inc., an embedded systems solutions company, will exhibit at SYS-CON's @ThingsExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Technologic Systems is an embedded systems company with headquarters in Fountain Hills, Arizona. They have been in business for 32 years, helping more than 8,000 OEM customers and building over a hundred COTS products that have never been discontinued. Technologic Systems’ pr...
SYS-CON Events announced today that IoT Now has been named “Media Sponsor” of SYS-CON's 20th International Cloud Expo, which will take place on June 6–8, 2017, at the Javits Center in New York City, NY. IoT Now explores the evolving opportunities and challenges facing CSPs, and it passes on some lessons learned from those who have taken the first steps in next-gen IoT services.
SYS-CON Events announced today that WineSOFT will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Based in Seoul and Irvine, WineSOFT is an innovative software house focusing on internet infrastructure solutions. The venture started as a bootstrap start-up in 2010 by focusing on making the internet faster and more powerful. WineSOFT’s knowledge is based on the expertise of TCP/IP, VPN, SSL, peer-to-peer, mob...
SYS-CON Events announced today that delaPlex will exhibit at SYS-CON's @CloudExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. delaPlex pioneered Software Development as a Service (SDaaS), which provides scalable resources to build, test, and deploy software. It’s a fast and more reliable way to develop a new product or expand your in-house team.
The Internet of Things can drive efficiency for airlines and airports. In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect with GE, and Sudip Majumder, senior director of development at Oracle, discussed the technical details of the connected airline baggage and related social media solutions. These IoT applications will enhance travelers' journey experience and drive efficiency for the airlines and the airports.