Serving 100k requests/second from a fanless Raspberry Pi 4 over Ethernet

A 12x performance boost to Node.js

I recently hooked up my Raspberry Pi 4 to my TV, and because I hadn’t benchmarked anything in a while, I decided to see if I could serve 100k HTTP req/sec from this little thing, over an actual Ethernet cable.

I have no fan on this thing, so I use a chunk of metal to passively cool it down somewhat.

I began by installing Ubuntu Server 20.10 for ARM64 via the Raspberry Pi Imager tool.

After booting Ubuntu on the Pi4, I then installed C++ build tools and cloned & built µWebSockets:

git clone --recursive https://github.com/uNetworking/uWebSockets
cd uWebSockets
make
./HelloWorldThreaded

Then I simply ran the HelloWorldThreaded executable and started htop.

If you don’t know about µWebSockets, this is roughly what it looks like, as a minimal snippet:

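#include "App.h"
#include <iostream>

int main() {
    // Route every GET request, whatever the URL, to one handler
    uWS::App().get("/*", [](auto *res, auto *req) {
        res->end("Hello world!");
    }).listen(3000, [](auto *listen_socket) {
        // listen_socket is null if the port could not be bound
        if (listen_socket) {
            std::cout << "Listening on port " << 3000 << std::endl;
        }
    }).run();
}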

On my laptop I built and ran the http_load_test of µSockets:

git clone https://github.com/uNetworking/uSockets
cd uSockets
make examples
./http_load_test 200 192.168.0.122 3000

The test above establishes 200 connections that make non-pipelined HTTP requests as fast as the server will allow. This got me to 93k req/sec at 400% CPU-time usage (all 4 CPUs on the Pi). So I figured that with my elite cooling solution I ought to manage a slight overclock from the default 1.5 GHz to 1.75 GHz.

Overclocking the Pi 4 is super simple — all you need to do is edit /boot/firmware/config.txt as root and add the lines:

over_voltage=2
arm_freq=1750

and reboot.

Success!

Running overclocked at 400% CPU-time usage — works better than anticipated.

The result was now a stable 106k routed HTTP req/sec. Over a run of a few minutes everything stayed stable and performed as I wanted it to, without overheating or failing in any other way.

The code running is a production-ready solution that’s currently deployed at many large companies. It does proper and secure HTTP parsing, URL routing, parameter and querystring parsing, timeouts and passes rigorous security testing. It also does (optional) TLS 1.3 encryption. So, just to point that out, it’s not your typical “100-liner benchmark winner” we are testing here. Also note yet again that we are doing non-pipelined HTTP requests here.

So, is that good?

These results are quite impressive. Or are they? I feel like 100k is something anyone can claim — you just pick the hardware to make it. So what I really like about this little experiment is that the hardware is fixed, cheap and well-known — anyone can pick this cheap computer up and compare with other software solutions. The fact that we are measuring a physical signal on a cable further simplifies the problem specification and eliminates ambiguity.

For comparison, running the same test with Node.js / Fastify as a cluster yields only 8.8k req/sec. That really puts things in perspective: we just outperformed Node.js / Fastify by 12x! That’s something you probably wouldn’t expect out of a simple software change, but there you have it. Nothing but software changed!

Fastify running as a cluster is entirely CPU-capped at only about 8.3% of the performance (8.8k vs. 106k req/sec)!

Despite Fastify openly boasting about its performance with texts like “the fastest web framework in town” and “serving the highest number of requests as possible” it really is a subpar solution in terms of performance — and so is Node.js — at least without native addons in use.

What about TLS 1.3 then?

We can go even further in our comparison: by enabling TLS 1.3 in µWebSockets we can run the same test with modern encryption enabled. Now we get a stable stream of 77k req/sec over this secure encryption standard. That is, we are outperforming Node.js / Fastify by 8.75x on a secure-vs-insecure basis! If that is not getting the message across, I don’t know what more to say.

With µWebSockets it is possible to serve modern TLS 1.3 traffic way faster than many other software solutions can serve even insecure cleartext traffic.

“This is an unfair comparison”

This test pitted a native C++ server against a scripted Node.js one, so of course the outcome was a given. Well, not really. If I run the same test using µWebSockets.js for Node.js, the numbers are a stable 75k req/sec (for cleartext). That’s still 8.5x that of Node.js with Fastify and about 71% of µWebSockets itself.

So you can’t really say that Node.js is “naturally slow and that’s expected”. Native C++ addons can boost the performance of Node.js by huge amounts. Node.js loaded with µWebSockets.js can compare quite favorably with full-on native solutions in many cases, especially if your particular application is performing lots of small-message I/O.

What this test shows, really, is that the choice of software can have a huge impact on overall performance.

If you need, really need, performance or simply want a shot at lowering your hardware cost you should really take a look at µWebSockets.js for Node.js or just go with µWebSockets for C++. Both are the same server project, only the Node.js variant is available from within, well, Node.js. Just like any other JavaScript module.
