100k secure WebSockets with Raspberry Pi 4
Practical benchmark of TLS 1.3 WebSockets on limited hardware
If you’ve ever checked your stocks during the virus outbreak, you might have noticed that valuations are updated without you having to reload the page. This happens with Yahoo Finance, TradingView and many others.
These updates are delivered over WebSockets: persistent connections through which the server can push data the moment it changes. The back-end holds a connection to your web browser for the duration of your visit, keeping you up to date. This connection is not free; it requires back-end resources.
Depending on what kind of software runs on the back-end, this cost can vary greatly, as hardware requirements scale by different factors. This post presents comparisons for a real-world use case involving secure WebSockets and limited hardware such as the Raspberry Pi 4.
Benchmark
Imagine we’re building something similar to those websites. You have fluctuating valuations on the back-end and want to present those fluctuations to the clients, the web browsers.
For this test we’re going to use a 2-second update granularity: we send updates to every connected client at this interval. We establish WebSockets over TLS 1.3 connections, and traffic is sent over an actual Ethernet network, not localhost.
Back-end hardware is held constant, one Raspberry Pi 4, while back-end software varies. We are going to consider one of the most popular solutions, Socket.IO, and of course µWebSockets. Both are available for Node.js.
Both will utilize all 4 CPUs and the same TLS cipher: TLS_AES_128_GCM_SHA256 (128-bit keys, TLS 1.3). Compression is configured identically, and Socket.IO is given the extra advantage of having its long-polling disabled.
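As a rough illustration of the shared TLS setup, pinning that cipher suite could look like the sketch below. This is not the benchmark’s actual code: the certificate paths are placeholders, and the option names come from the uWebSockets.js AppOptions and Node’s tls/https documentation.

```js
// Sketch: pinning TLS_AES_128_GCM_SHA256 on both back-ends (paths are placeholders)

// µWebSockets.js side: SSLApp accepts an ssl_ciphers option
const uWS = require('uWebSockets.js');
const uwsApp = uWS.SSLApp({
  key_file_name: 'key.pem',
  cert_file_name: 'cert.pem',
  ssl_ciphers: 'TLS_AES_128_GCM_SHA256'
});

// Socket.IO side: attach to a Node https server, with the TLS 1.3
// suite given in the `ciphers` option
const fs = require('fs');
const https = require('https');
const httpsServer = https.createServer({
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem'),
  minVersion: 'TLSv1.3',
  ciphers: 'TLS_AES_128_GCM_SHA256'
});
```

To use all four cores, one such process would typically be run per CPU, for example via Node’s cluster module; the post doesn’t specify the exact mechanism used.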
64-bit Ubuntu for Raspberry Pi 4
There’s not much to say about the operating system: we run the latest 64-bit Ubuntu for Raspberry Pi 4, and the only configuration change is ulimit -n 200000 to allow 200k file descriptors. The Pi 4 is connected via Ethernet cable, not Wi-Fi. About 290 MB of RAM goes to the system itself; this could be reduced for further optimization.
With µWebSockets
We will be using the built-in pub/sub support of µWebSockets for ease of development. For more tutorial-like information and a comparison with Socket.IO, you can read “Moving from Socket.IO to µWebSockets.js”.
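To give an idea of the shape of the server, here is a minimal sketch of that pub/sub setup. The certificate paths, port, and the 'valuations' topic name are placeholders I’ve chosen; the post doesn’t show its exact code.

```js
const uWS = require('uWebSockets.js');

const app = uWS.SSLApp({
  key_file_name: 'key.pem',  // placeholder certificate paths
  cert_file_name: 'cert.pem'
}).ws('/*', {
  compression: uWS.DISABLED, // compression off, as configured in the benchmark
  open: (ws) => {
    ws.subscribe('valuations'); // every client joins the broadcast topic
  }
}).listen(9001, (listenSocket) => {
  if (listenSocket) {
    console.log('Listening on port 9001');
  }
});

// Publish one fluctuating valuation to all subscribers every 2 seconds;
// µWebSockets fans the message out to every subscribed socket
setInterval(() => {
  app.publish('valuations', JSON.stringify({ price: 100 + Math.random() }));
}, 2000);
```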
We connect 100k secure WebSockets in about 3 minutes. Updates are published every 2 seconds, and the client counts them to verify that they arrive on time and at a steady rate, without “clumps” or drops.
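The post doesn’t describe the client harness, but the counting logic could look something like the following single-process sketch using the ws package. The server address and pacing numbers are made up, and a real 100k-connection client would likely be spread across several processes or machines.

```js
const WebSocket = require('ws');

const TARGET = 100000;             // total connections to open
const URL = 'wss://10.0.0.2:9001'; // hypothetical LAN address of the Pi

let received = 0;
let opened = 0;

// Pace connection attempts: 500 per second reaches 100k in roughly 3 minutes
const connector = setInterval(() => {
  for (let i = 0; i < 500 && opened < TARGET; i++, opened++) {
    const ws = new WebSocket(URL, {
      rejectUnauthorized: false // accept the lab's self-signed certificate
    });
    ws.on('message', () => received++);
  }
  if (opened >= TARGET) clearInterval(connector);
}, 1000);

// 100k clients updated every 2 seconds should yield ~50k messages per second
setInterval(() => {
  console.log(`messages/s: ${received}`);
  received = 0;
}, 1000);
```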
Tests run for many hours, and memory usage holds steady at 2.92 GB out of 3.70 GB available. Messages are delivered without hiccups, and we receive the expected number of messages per second without fluctuation. CPU usage hovers around 60%.
I measured a few thousand more connections without issues, but to keep a good margin we stop at 100k. We could push it to 120k, but then we’re dancing on the edge of stability, as we cannot consistently push 60k messages per second. We settle for a consistent rate of 50k secure messages per second, which is what 100k clients updated every 2 seconds require.
I have a fan on my Pi 4, but since things were running so smoothly I tried turning the fan off and kept the benchmark running for a few more hours. To my surprise it remained stable, now entirely quiet.
With Socket.IO
With Socket.IO the measurements contrast sharply. At 40k secure connections, Socket.IO crashes the moment we try to send anything: memory usage explodes and the process gets OOM-killed by the system.
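For reference, the Socket.IO side of the benchmark can be approximated like this. It’s a sketch against the current Socket.IO API; certificate paths, port, and the event name are placeholders, and the original test may have used an older Socket.IO version.

```js
const fs = require('fs');
const https = require('https');
const { Server } = require('socket.io');

const httpsServer = https.createServer({
  key: fs.readFileSync('key.pem'),  // same placeholder certificates as before
  cert: fs.readFileSync('cert.pem')
});

const io = new Server(httpsServer, {
  transports: ['websocket'], // long-polling disabled, as in the benchmark
  perMessageDeflate: false   // compression off, matching the µWebSockets side
});

httpsServer.listen(9001);

// Same broadcast pattern: one update to every connected client every 2 seconds
setInterval(() => {
  io.emit('update', { price: 100 + Math.random() });
}, 2000);
```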
Lowering the count to 30k, we survive for shorter periods. However, the client does not get nearly the number of messages it expects. Numbers are way off, and fluctuations in performance are massive. Socket.IO simply does not have the sending performance to cope with sends to 30k secure sockets every 2 seconds. The numbers hint at 20k being the absolute upper limit.
At 20k we are stable enough to run for longer periods. Socket.IO still does not keep up an even pace, however; it sends messages in variably sized “clumps” as it tries to catch up. We never get a steady wave of messages, and over a 120-second window we are still missing 40k messages stuck in backpressure!
20k secure connections is really past the edge of what Socket.IO can muster in this benchmark; I would say Socket.IO barely manages 10k connections properly.
Conclusion
Considering this very basic but common real-world case, it should be obvious that back-end cost (and stability) can vary greatly with software. Many believe that to solve an I/O problem you just have to throw more hardware at it, a belief I’ve stumbled upon many, many times. Yet everything in this test was held constant except the back-end software, and still we see massive deviations in overall performance.
Sure, at some point you have to start throwing hardware at the problem either way, but the scalability of your software greatly dictates how your hardware cost scales. This benchmark clearly shows that to scale Socket.IO servers you would need at least 5x, if not 10x, the hardware compared to something optimized like µWebSockets. This factor is linear and applies at any scale: what takes Socket.IO ten machines, µWebSockets could do with one, and so on.
And of course, this test only scratches the surface of what is possible with µWebSockets. As soon as more complex MQTT-style topic syntax is needed to implement your business logic, µWebSockets will excel even further in comparison to Socket.IO. The gap between the two, already large in this benchmark, can easily become astronomical in more complex use cases.
Thanks!