What we were expecting to see were MODEST operations and fibers outperforming the pure POSIX thread approach. We expected the pure POSIX thread approach to scale to around 3000-5000 threads before things started to bottle-neck. The process limit was set to 32,768 – so we thought that the hard limit would be in that neighborhood.
These are the types of assumptions, that when untested, lead to horrible scalability mishaps. Imagine launching a piece of software that used pure POSIX threads to scale connections and having this happen (the red line) just as you started to get lots of site traffic:
So, what caused this nightmare? Take a second or two to think about the problem and make an educated guess. You will see if you were right or not on the next page.
What went wrong…