Before I put any system I expect to run under load into production, I want to know how much load it can handle. Without that knowledge, how will I plan for scaling? How will I find the weak points? What kind of load will it put on my datastore? There's a lot more to consider, like disk access, CPU usage patterns and network IO, but let's keep it simple for now.
Once I'm happy with the results of my load test, I can use that baseline load figure when monitoring, scaling and developing the system. If someone changes something and the baseline shifts more than I expect, I know to take a closer look with them.
I recently load tested an API I'd written using Cyclone on Twisted. It was simple: fewer than five endpoints, essentially a thin bridge between clients and Redis with minimal authorization and data manipulation. I was very disappointed with the results, managing only about 600 r/s on an m1.small on EC2. Because the API was so simple, I decided to spend an hour re-implementing it in Node.js, and for completeness in Go and in Scala on the Play framework. Node, Go and Play performed very similarly, achieving around 1.2k r/s, almost double what Twisted was doing.
I was sad. I love Twisted. I used to just like it but since I discovered inline callbacks I've loved it.
For a while I wanted to marry it and have children with it.
Inline callbacks elegantly avoid the callback hell that asynchronous frameworks can land you in, and they make your code read more sequentially. But I couldn't justify running a high-traffic API at almost half the speed of Node just for pretty code.
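To show what I mean, here's a self-contained sketch of the two styles. `FakeDeferred` and `inline_callbacks` are toy stand-ins for Twisted's `Deferred` and `@defer.inlineCallbacks` (so it runs without Twisted installed); the names and the Redis lookup are illustrative, not my actual API code.

```python
import json

class FakeDeferred(object):
    """A synchronous toy stand-in for Twisted's Deferred."""
    def __init__(self, value):
        self.value = value
    def addCallback(self, fn):
        self.value = fn(self.value)
        return self

def redis_get(key):
    # Pretend Redis lookup that "fires" immediately.
    fake_store = {"user:1": '{"name": "ana"}'}
    return FakeDeferred(fake_store.get(key))

# Callback style: each processing step is bolted on as another callback.
def fetch_name_callbacks(key):
    d = redis_get(key)
    d.addCallback(json.loads)
    d.addCallback(lambda obj: obj["name"])
    return d

def inline_callbacks(gen_fn):
    # Toy version of @defer.inlineCallbacks: drive the generator,
    # feeding each yielded Deferred's result back into it.
    def wrapper(*args):
        gen = gen_fn(*args)
        result = None
        while True:
            try:
                d = gen.send(result)
            except StopIteration as stop:
                return FakeDeferred(stop.value)
            result = d.value
    return wrapper

# Inline-callbacks style: the same logic reads top to bottom.
@inline_callbacks
def fetch_name_inline(key):
    raw = yield redis_get(key)
    obj = json.loads(raw)
    return obj["name"]

print(fetch_name_callbacks("user:1").value)  # ana
print(fetch_name_inline("user:1").value)     # ana
```

Both return the same result; the difference is that the inline version still reads like straight-line code once you add error handling and more steps.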
I posted my benchmark to the Twisted mailing list on a Sunday and almost immediately got replies asking about my setup. Every answer pointed to the same thing: if I was using CPython instead of PyPy, I was probably doing it wrong. They were so right.
I was hoping PyPy would let Twisted match the other frameworks, but for my specific use case it smoked them. The same load test that yields 1.2k r/s on Node.js, Go and Play got me only a few hundred shy of 2k r/s with Twisted on PyPy. I'm no mathematician, but that's roughly a 60% performance gain, which happens to translate to a 60% cost saving.
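For the sceptical, the arithmetic checks out, assuming "a few hundred shy of 2k" means roughly 1.9k r/s (my reading, not an exact measured figure):

```python
node_go_play = 1200  # r/s on Node.js, Go and Play
twisted_pypy = 1900  # r/s on Twisted + PyPy (assumed from "a few hundred shy of 2k")
gain = (twisted_pypy - node_go_play) / float(node_go_play)
print("%.0f%%" % (gain * 100))  # 58%, in line with the ~60% claim
```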
On top of that, my load test highlighted an easy optimisation for my app. I was caching some objects in a distributed Redis cache we've started to use internally (more on that in another post), and we were using pickle to serialise and de-serialise them. I discovered that on PyPy, Cyclone's JSON serialiser was as quick as pickle. That alone isn't an optimisation, since it's the same speed, but it made me realise something obvious: where possible, I could store the cached data as JSON and send it to the client without de-serialising and re-serialising it, which made a difference of a few hundred requests a second. I know it sounds obvious, but sometimes you need to see what a difference a small change makes to truly appreciate it. Good times.
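The pass-through idea can be sketched in a few lines. Here `cache` is a plain dict standing in for the distributed Redis cache, and the key scheme and function names are illustrative, not my actual code:

```python
import json

cache = {}  # stand-in for the distributed Redis cache

def store_profile(user_id, profile):
    # Serialise once, at write time, and cache the JSON text itself.
    cache["profile:%d" % user_id] = json.dumps(profile)

def profile_response_body(user_id):
    # The cached value is already the JSON the client wants, so it can be
    # written straight to the response: no de-serialise/re-serialise step.
    return cache["profile:%d" % user_id]

store_profile(1, {"name": "ana", "plan": "pro"})
body = profile_response_body(1)
print(json.loads(body)["plan"])  # pro
```

With pickle, every cache hit paid for `pickle.loads` plus a `json.dumps`; storing JSON makes a hit a straight string copy to the socket.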
This is of course not to say Twisted on PyPy is a universal solution that will always produce the fastest apps. It isn't, and it won't. That's why it's important to understand the use cases for various frameworks, do your own benchmarking and use the most appropriate tool for the job. But in this case, PyPy kicked ass.