Web Stack Myths: PHP is Faster Than You Think

Lies, Damn Lies and Benchmarks. Software benchmarking is like trying to measure the individual performance of soccer players on a team and then using that information to predict how they will do in the World Cup. When there is a clear gap between the abilities of a novice and the abilities of a professional, the outcome is almost always predictable. Things get interesting, however, when you take professionals at the height of their careers and pit them against each other.

Benchmarking software follows the same philosophy. There are so many variables at play that it is often hard to predict how one set of benchmarks will play out in the real world. By forcing two state-of-the-art systems to their limits, we can learn about strengths and weaknesses in each system. Over the past week, we did a small project to benchmark the latest release of Monarch3 against Apache2 and PHP5.

We do this informally every few months to see whether our LMM (Linux + Monarch + MySQL) Web server stack is still outperforming the standard LAMP (Linux + Apache + MySQL + PHP) stack used for the vast majority of the websites on the Internet. It is a common complaint on the Web that PHP performance is dismal and that Apache could be faster. What we found surprised us. We thought we knew the state of the art in Apache and PHP performance – we were wrong.

The Questions

The questions that web developers often ask when deploying software are:

  • “Are our servers providing a quality service that is responsive to our customers?”
  • “Are our API services as fast and responsive as we want them to be?”
  • “Can we predict how our servers will perform under heavy load?”
  • “How many concurrent requests can we perform before locking up the cluster?”
  • “Did we do anything in this release to harm our throughput and latency?”

While benchmarking can’t truly answer all of these questions, it can give you a general idea of the performance characteristics of your service. We had replaced our Apache and PHP stack because our previous benchmarks had demonstrated that we couldn’t easily scale to hundreds of thousands of simultaneous connections and ensure a good quality of service for our customers. We based that decision on benchmarks that we had seen elsewhere on the Web, on intuition, and on rough benchmarks that we had run during the previous years. The one question that was looming over our heads was: Did we make the right decision?

Discoveries

Earlier this month, we decided to design a fairly rigorous benchmarking process. This benchmarking process will be used in future releases to ensure that the software we are releasing is predictable under heavy load. We learned a number of things during the benchmarking sessions:

  • Apache2+PHP5 performance was impressive and far better than we had anticipated.
  • Monarch is still at least twice as fast and more efficient than Apache2+PHP.
  • Apache2+PHP5 has quite a bit of performance jitter and latency as you add concurrent connections.
  • There may be a lock contention issue that is holding Monarch back.
  • Monarch has a very stable jitter profile, even under very heavy load.

The Benchmarking Setup

The benchmark that we designed is available via GitHub. It is best explained in the context of a soft real-time system. We must ensure that we are able to complete a Web request (one HTTP request to our website or API servers) within 3 seconds. Granted, three seconds is arbitrary – it changes from year to year, from service to service. Three seconds, however, is our soft real-time limit for the benchmark that we are running today. Anything over three seconds is deemed a failed request. We also want to make sure that network bandwidth doesn’t limit the tests, so we are using a 128-byte HTTP response. The more requests we can serve within that time limit, the better the system performance. So, the three questions that we’re asking with this particular benchmark are:

  • How many requests per second can Apache2+PHP5 do, and how does it relate to what Monarch can do?
  • What is the latency for both Apache2+PHP5 and Monarch as you increase the number of concurrent connections?
  • When do we fail to meet our 3-second response quality-of-service requirement?
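The pass/fail rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the benchmark code itself; the function name summarize and its arguments are made up for this example.

```python
SOFT_DEADLINE_S = 3.0  # the soft real-time limit used in this benchmark


def summarize(latencies_s, wall_clock_s, deadline_s=SOFT_DEADLINE_S):
    """Return (requests_per_second, failed_count) for one benchmark run.

    latencies_s  -- per-request completion times in seconds (hypothetical input)
    wall_clock_s -- how long the whole run took, in seconds
    """
    # A request fails the quality-of-service rule if it took longer than the deadline.
    failed = sum(1 for t in latencies_s if t > deadline_s)
    requests_per_second = len(latencies_s) / wall_clock_s
    return requests_per_second, failed


if __name__ == "__main__":
    # Four illustrative requests completed in 2 seconds of wall-clock time.
    print(summarize([0.5, 1.0, 3.5, 2.9], wall_clock_s=2.0))
```

A run meets the requirement only when the failed count is zero at that level of concurrency.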

We used two similarly equipped AMD Phenom 9150e quad-core Debian Linux boxes with 2GB of RAM and a 100Mbps network connection between them. The benchmarking software that we used was the Apache HTTP server benchmarking tool, which is run with the ab command on Linux. One of the boxes ran the server software, the other box ran the benchmarking software.

We wanted to make sure that we were primarily measuring the server performance, so the amount of data that was being served was reduced to just 128 bytes. The payload was 128 period characters (“.”). There was no compression of HTTP traffic. There was no use of HTTP Keep Alive. The number of available file descriptors for the test was set at 10,000 for the server software and 32,000 for the benchmarking client software. We should have set them to the same value, but we failed to do so before running the main benchmarks. When we spot checked with larger values for the server after the main benchmarks were completed, increasing the server file descriptor limit did not seem to affect performance much.

The Apache server configuration file can be found here. The Monarch configuration was set to the defaults. Monarch will automatically scale itself based on request demand, so no special configuration is needed for this benchmark.

The Apache benchmark URL data was generated by a PHP file that used a single print() function to print 128 period (‘.’) characters. The Monarch benchmark did the same, but was compiled down to a C++ module that was loaded into the Monarch web server (this is the standard way of programming in the Monarch environment – C++).

The test suite starts at 1 concurrent connection and tries to complete 100,000 requests to the server as fast as it can. The test then waits 5 seconds, so that any spare file descriptors can be freed by Linux, increments the concurrent connection count to 100 connections and tries to complete 100,000 requests to the server as fast as possible. It continues stepping up by 100 concurrent connections until it reaches 10,000 concurrent connections.
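The stepping schedule described above could be driven by a script along these lines. This is a minimal sketch and not the benchmark code we published; the URL in the usage note is a placeholder, and the helper names are invented for this example. The only ab options used are -n (total requests) and -c (concurrency).

```python
import subprocess
import time


def concurrency_steps(step=100, maximum=10_000):
    """1 connection first, then 100, 200, ... up to the maximum."""
    return [1] + list(range(step, maximum + 1, step))


def ab_command(url, concurrency, requests=100_000):
    # -n is the total number of requests, -c the number issued at a time.
    # ab leaves keep-alive off unless -k is passed, matching the setup above.
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def run_suite(url, pause_s=5):
    """Run every step in order and return the raw ab output per step."""
    outputs = {}
    for concurrency in concurrency_steps():
        result = subprocess.run(
            ab_command(url, concurrency), capture_output=True, text=True
        )
        outputs[concurrency] = result.stdout
        time.sleep(pause_s)  # give Linux time to free spare file descriptors
    return outputs
```

Calling run_suite("http://server.example/bench.php") would perform 101 runs of 100,000 requests each. The file descriptor limit on the client has to be raised before the high-concurrency steps will work.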

This data is then collated and graphed using software that we developed internally. The software has been released as open source software on GitHub.

Apache2 + PHP

When we performed this benchmark in 2008, our cluster was running Apache2+PHP4. The tests included a number of database hits. The results hovered around 20 – 100 requests per second, which was more than enough for most websites. Looking to the future, we knew that we wanted to process thousands, if not tens of thousands, of requests per second on each server. Most companies that build useful Web APIs, like Twitter and Facebook, must ensure this level of throughput. We didn’t think Apache2+PHP4 would be able to handle that load, but more importantly, we didn’t think Apache2+PHP5 could handle anywhere close to 5,000 requests per second.

This was a mistake on our part – we had read so many blog posts and benchmarks on the Internet blaming PHP for slow performance that we assumed it to be true instead of verifying the results for ourselves. We were wrong to think that PHP was a major bottleneck in our system:

The red crosses in the graph above are each one data point (one run at X concurrent connections for 100,000 requests). The smooth black line is a fitted Bézier curve that follows the trend in the data points.
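For readers who want to draw a similar trend line, one common approach is to treat the data points, sorted by concurrency, as the control points of a single Bézier curve and evaluate it with de Casteljau's algorithm. The sketch below assumes that approach; it is not taken from our graphing software, and the function names are made up for this example.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] using de Casteljau's algorithm.

    control_points must be a non-empty list of (x, y) pairs.
    """
    points = [(float(x), float(y)) for x, y in control_points]
    # Repeatedly interpolate between neighbours until one point remains.
    while len(points) > 1:
        points = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(points, points[1:])
        ]
    return points[0]


def bezier_curve(control_points, samples=50):
    """Sample the curve at evenly spaced values of t, endpoints included."""
    return [
        bezier_point(control_points, i / (samples - 1)) for i in range(samples)
    ]
```

The curve passes through the first and last data points only. The points in between pull it toward themselves, which is why individual outliers barely move the line.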

The first data point is low because it is only using 1 connection to process 100,000 requests. Since the step size is 100, the second data point is much higher since the client and server are able to fulfill many more requests simultaneously. In other words, one person using one straw to drink a glass of water will take far longer than 100 people using 100 straws to drink from the same glass of water. Gross, but you get the idea. You will see the same issue crop up in each graph.

The graph above shows that even at 2,000 simultaneous connections, Apache2+PHP5 can process almost 8,000 requests per second. Not bad for an interpreted language – it was certainly far more than what we thought PHP was capable of doing.
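Throughput and concurrency together give a rough feel for latency. By Little's law, the average number of requests in flight equals throughput multiplied by the mean time each request spends in the system. Assuming the benchmarking client keeps every connection busy, the numbers above imply a mean latency of roughly a quarter of a second. This is a back-of-envelope estimate, not a measured value, and the function name is made up for this example.

```python
def implied_mean_latency_s(concurrency, requests_per_second):
    """Little's law rearranged: mean latency = requests in flight / throughput."""
    return concurrency / requests_per_second


if __name__ == "__main__":
    # 2,000 connections kept busy at roughly 8,000 requests per second.
    print(implied_mean_latency_s(2000, 8000))  # 0.25
```

The same relation explains why latency has to rise when throughput stays flat or drops while connections keep being added.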

As you increase the number of concurrent connections, however, performance drops considerably and becomes quite jittery past 4,500 concurrent connections.

The following graph describes what the latency profile looks like for Apache2+PHP5 as we increase the number of concurrent connections:

Latency grows at a fairly predictable rate until around 4,500 connections. Past that point, latency becomes increasingly jittery and unpredictable. The soft real-time requirement of 3 seconds per request fails to be met at around 8,500 concurrent connections. So, does Monarch fare better?
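Finding the point where the requirement first fails is a simple scan over the per-step results. The sketch below assumes each step has been reduced to a (concurrency, latency in seconds) pair; which latency figure to use (mean, worst case, or a percentile) is left to the reader. The function name and the sample data are illustrative, not our measurements.

```python
def first_failing_concurrency(results, deadline_s=3.0):
    """Return the lowest concurrency whose latency exceeds the deadline.

    results -- iterable of (concurrency, latency_s) pairs, one per benchmark step
    Returns None if every step met the deadline.
    """
    for concurrency, latency_s in sorted(results):
        if latency_s > deadline_s:
            return concurrency
    return None


if __name__ == "__main__":
    # Made-up numbers purely to show the call.
    sample = [(100, 0.1), (8400, 2.9), (8500, 3.2), (8600, 2.8)]
    print(first_failing_concurrency(sample))  # 8500
```

Because of jitter, a later step can pass again after an earlier one has failed, so the first failure is the conservative number to report.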

Monarch

We designed and built Monarch as a soft real-time Web server. We wanted fast, predictable performance in order to ensure that the people browsing a Monarch-based website would have a very responsive experience. You can read more about the technical details behind Monarch on the GitHub website. Monarch is an open source project.

So, how did Monarch’s throughput fare as the number of concurrent connections increased?

We were fairly pleased with the performance. At peak performance, it is almost 40% better than Apache2+PHP5 – the performance gap increases as you add concurrent connections past peak performance. The rate is stable and shows little decline as you increase the number of concurrent connections.

You may also notice that some data points (the red crosses) are far above the black trend line. This happened during several benchmarking runs, but always in different places. These spikes in performance lead us to believe that there are some lock contention issues that are keeping Monarch performance artificially low. We’re still investigating why a small handful of the data points demonstrated dramatically increased performance.

The latency graphs demonstrate that Monarch’s response speed is still doing well:

The latency scales almost linearly with the number of concurrent connections. There are also low data points speckled on this graph (lower latency is better). These data points correspond with the high data points (high throughput is better) in the previous graph. Again, we believe this is due to some lock contention issues when Monarch is under load.

Conclusion

The most surprising discovery that came out of these benchmarks is how well Apache2 and PHP5 perform against a Monarch pre-compiled C++ module. If you have ever wondered how pure compiled code fares against PHP5, here’s your answer:

While Monarch out-performs Apache2+PHP5 in raw throughput, it’s not by more than an order of magnitude at peak performance. The gap may widen when real work is performed by the PHP script vs. the C++ module. While the Apache2+PHP5 approach fails the soft real-time requirement far before Monarch does, we live in a time of cheap servers, clustering and cloud computing. If your system doesn’t perform well under many concurrent connections, you can always buy more servers and load balancers. The benefits of writing a high-performance C++ Monarch module must be weighed against the gentle learning curve and fast development speed of writing a PHP script.

We are already planning a future benchmark that will push Monarch past the 3-second soft real-time limit.

Monarch really shines when it comes to low, predictable latency.

All in all, we’re glad that Monarch turned out to be such a small, fast, efficient and stable Web platform that operates predictably under heavy load. Our next set of benchmarks will focus on doing real-world work in PHP5 and Monarch.

If you have a common use case for an API server that you would like us to simulate in our benchmarks, please let us know in the comments. We’re trying to collect a common set of use cases for the purposes of benchmarking Apache2+PHP5 and Monarch web server performance.

©2017 Digital Bazaar, Inc. All rights reserved.
