In a blog post last month, we outlined why a traditional Apache+PHP setup will inevitably fail the growing needs of medium to large AJAX-based websites. The article is continued this month by analyzing different methods of scaling web services past the concurrency barrier inherent in a basic Apache+PHP setup.
While speaking with the technical minds of several companies in our industry, there were several very good questions raised about assumptions we had made when building our system. The feedback from the first blog post about this topic revolved around the following two questions:
- Why would anybody need to scale to 100K (one hundred thousand) concurrent web service requests?
- Isn’t database layout and server bandwidth the primary scalability issue for web services?
We tend to forget that not everybody spends their time thinking about scaling to this level. Scalability requirements that are quite natural to us sometimes come off as exceedingly steep web service requirements to others. The questions above are ones that we have grappled with in the past. Here are the answers that we’ve found over the past several years through vigorous internal debate.
Why do we need to scale past 100K concurrent web service requests?
Google, Amazon, Yahoo and other very large companies have huge data centers and handle more than 100K web service requests per second across a number of their business units. They achieve this through a combination of very large web server farms that are properly configured to load balance requests among hundreds of web service application processors, using Apache+PHP in some cases. It is rare for the majority of websites out there to see more than 50 simultaneous web service requests per second, so 100K simultaneous requests seems like a number that is outside of the mainstream.
While it is true that 100K concurrent requests is large by today’s standards, it is important to note that efficiency is a significant competitive advantage. If a web service processor can handle 100K concurrent requests per second, you can be sure that it will require less hardware to operate the web service processors and thus less cooling and other monthly fees that must come out of your organization’s bottom line. Would you rather manage 120 web service application processors or 4 of them?
Keep in mind that the web moves very quickly and that the number of AJAX requests per web page hit is rapidly increasing. 20 years ago, it was inconceivable that we would be managing terabytes of data on a single hard drive or that we would have the Web, accessed by over 1.4 billion people. History has shown that once an Internet technology is adopted, it is rare that its growth is not exponential.
Large and small sites running AJAX-based web applications can benefit from increased web service request throughput. While this may be a nice-to-have feature today, it will become a must-have feature as time ticks onward.
Isn’t database layout and server bandwidth the primary scalability issue for web services?
Database layout and caching is an important problem that receives a great deal of attention on the web. Numerous sites have suffered due to bad database layout and sloppily written SQL queries. However, the problem domain of databases and web service processors are different. This article focuses on ensuring that a web service processor is never a bottle neck.
For example – there are a number of web services that can run from data contained purely in RAM, no need for a database. Stock tickers, Facebook applications, IM systems, games, unit conversion calculators, and current weather data statistics, are just a few of the types of web service applications that will never be bound by a database layer.
Thin data pipes will almost always affect your website far more than a badly designed web service application processor. However, what happens when you have a really fat pipe to your website (as you often do in a data center)? HTTP requests can often be stuffed into a single TCP packet. Assuming that the packet is roughly 1KB in size and you have a low-bandwidth T1 link (1.544Mbps), you could quite conceivably receive up to 1,508 web service requests per second. 400 concurrent requests per second is beyond what a dedicated Apache+PHP server is capable of providing.
What happens when you have designed your database correctly and your site lives in a data center, but your web service processor fails to cope with spikes in usage? What happens when your web service processor is the bottleneck? How do you take advantage of the multiple cores available to most servers these days without having to re-train your development teams?
The good news is that you don’t need anything that is expensive, highly complex, difficult to administer, or uses a new programming language. Traditional software development tools coupled with some lessons learned in the 70s and 80s regarding good software design provide the solution to scaling web services on modern multi-core CPU configurations.
The Test Configuration
The test hardware and software configuration used the same one that was used in the previous blog post. We utilized a single computer containing a quad-core AMD Phenom (1.8Ghz per core) and 4GB of RAM (3.2GB available). The software was the Digital Bazaar MODEST Web Service Processor configured to run two different processing models:
- MODEST Operations (100,000 operations, executed from a thread pool size of 4 – one thread per CPU core)
- MODEST Fibers (100,000 fibers, running on 4 operations executed from a thread pool size of 4 – one operation/thread per CPU core)
The Operation-based processing model utilizes 100K Operations running on a thread pool size of 4 (one thread per CPU core). This model starts a new Operation for every web service request. Each operation will encode and decode a fairly complex and verbose JSON object 50 times in one shot.
The Fiber-based processing model utilizes 100K Fibers running on 4 Operations on a thread pool size of 4 (one thread per CPU core). There is one operation per CPU core, and one Fiber per web service request. Each fiber will concurrently encode and decode a fairly complex and verbose JSON object 50 times, performing one encode/decode cycle for every time it is scheduled to run.
The difference between the Operation-based processing model and the Fiber-based processing model is that 50 JSON encode/decodes are performed per Operation time slice. The Fiber-based processing model performs 1 JSON encode/decode per Operation time slice. The following table summarizes the differences between both processing models.
|Operation-based Model||Fiber-based Model||Threads||4||4|
|Decode/Encodes per time slice||50||1|
Each processing model was run with a thread pool size ranging from 1 to 8 and a work unit number ranging from 1 to 100,000. All work units were entered into each system before processing began. For example, 100K fibers were entered into the system and then the run function was called in order to simulate 100K simultaneous web service requests. The time taken to complete given a thread pool size and workload is shown on the vertical axis of the graphs below.
The two graphs above demonstrate a number of interesting things:
- The MODEST Web Service Processor is able to process 100,000 work units concurrently where POSIX threads fails to achieve processing 400 concurrent work units.
- It is possible to achieve 96% efficiency when scaling to multiple cores using the C/C++ programming language and a simple fiber-based application framework. There is no need to shift programming languages (like shifting to ERLANG) to achieve efficient scalability.
- Linear scalability is demonstrated given additional processing cores.
- Fibers are faster when less cores are used, slower when load is high and more cores are used (due to fiber scheduling overhead).
- Even though Fibers have much more concurrency, the processing throughput isn’t greatly affected compared to Operations.
While Fibers at heavy workloads may be slower than Operations at heavy workloads – the amount of concurrency achieved by Fibers are much higher than that achieved by Operations. Whether you use Fibers or Operations is wholly dependent on your application or particular web service call. MODEST allows you to mix and match Fibers and Operations at run-time based on the web service application or based on a particular web service call. If your web service application is performing something that requires low latency, like video streaming, you will want to use Fibers. If your web service application is performing bulk data processing and requires the highest throughput, you will want to use Operations.
How does the MODEST Web Services Processor scale to 100K and beyond?
Note that there were no optimizations made to the Linux kernel or the system on which the MODEST Web Services Processor ran. The application ran under a regular UNIX user account with default priorities. The hardware was a cheap, commercial off-the-shelf system. The secret lies in how the software is designed to operate and provide maximum throughput and concurrency.
The Secret of Fiber Efficiency
When you create a new POSIX thread under Linux, 8MB is allocated to the thread for data and stack space. This is far less than a process, which could take up tens if not hundreds of megabytes. When you create a new Fiber in MODEST, only the space that is needed by the Fiber is allocated – usually around ten to a couple of hundred bytes. Fibers perform co-operative multi-tasking, meaning that they voluntarily take themselves off of the work queue when they have completed a unit of work. This has two effects:
- Concurrency doesn’t come at a high memory cost (as it does with POSIX threads).
- Process/Thread scheduling overhead is greatly reduced.
- The unit of work is decided by the Fiber, not the Operating System.
Downsides of Fibers
There are some downsides to this model that one must be aware of before implementing such a system:
- If a Fiber is greedy or buggy, the entire system suffers. A fiber, by design, cooperatively yields processing time. If a fiber never yields processing time, it could consume 100% of a single CPU core.
- You must implement the web service in C++ if you are going to use the MODEST Web Services Processor. Most web service developers expect a web service to be written in Java, PHP, ASP or similar web technology.
The computing industry is known for jumping from one technology to the next, often before the technology matures and often because a technology claims to solve a current aggravation. Our company had jumped on the Tomcat/Java/SOAP bandwagon when it first started rolling and while it helped us get to where we are today, we would have been much better off not adopting the technology in the first place and building our own. That is the lesson learned over the past four years – sometimes you’re better off rolling your own.
Instead of the typical Tomcat/Java/SOAP or the Apache/PHP/JSON stack, we launched Bitmunk 3.0 in July 2008 with MODEST/C++/JSON as our web services stack, and we’re very happy with how it has been performing. We will release the source to MODEST after we get the next stable release of Bitmunk 3 out to the public.
Thanks go to Dave Longley for his hard work on and excellent design of the MODEST Web Services Processor and the rest of the core technology that allows us to do what we do. Mike Johnson, for design and development of our websites, databases and UIs, as well as keeping all the operational stuff running at Digital Bazaar. David I. Lehn for his hand in making MODEST a reality as well as continuing to push us to do things like this research and blog post.
This research is the result of their efforts to continue improving Bitmunk such that we can all some day achieve a healthy ecosystem for the purchase and sale of digital goods using web browsers, open standards, and royalty schemes that fairly reward artists, fans and publishers.