Multithreading is Hard (and You’re Probably Doing It Wrong)
I’ve never encountered a company that consistently does multithreading correctly. You might think that big, well-known Silicon Valley companies have it all figured out, but they make the same common mistakes as everyone else. Worse, these mistakes often go unnoticed because the teams don’t use ThreadSanitizer.
Multithreading is a textbook example of the Dunning-Kruger effect — you learn the basics, feel confident, and assume you’ve mastered it. But in reality, you’ve likely missed crucial details.
Use ThreadSanitizer Religiously
Skipping ThreadSanitizer is a slippery slope that leads to disaster. If you catch yourself thinking, “That warning is just a false positive, it doesn’t matter,” you’ve already lost. When your synchronization is correct and the whole program is built with ThreadSanitizer, there are no false positives. Your application should be entirely free of warnings.
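As a minimal sketch (assuming Clang or GCC on a platform that supports ThreadSanitizer), here is a shared counter incremented from two threads. Build it with -fsanitize=thread. If you delete the lock_guard line, the increments become a data race and ThreadSanitizer reports it at runtime with a "WARNING: ThreadSanitizer: data race" message. The names counter, increment_many and run_counter_demo are hypothetical.

```cpp
// Hypothetical demo. Build with ThreadSanitizer, e.g.:
//   clang++ -std=c++17 -g -O1 -fsanitize=thread demo.cpp
#include <mutex>
#include <thread>

int counter = 0;          // shared, non-atomic data
std::mutex counter_mutex; // the one mutex that guards `counter`

void increment_many(int n) {
    for (int i = 0; i < n; ++i) {
        // Remove this lock_guard and TSan reports a data race on `counter`.
        std::lock_guard<std::mutex> guard(counter_mutex);
        ++counter;
    }
}

int run_counter_demo() {
    counter = 0;
    std::thread a(increment_many, 100000);
    std::thread b(increment_many, 100000);
    a.join();
    b.join();
    // Reading without the lock is fine here: join() synchronizes with the
    // end of each thread, so both threads' writes are visible.
    return counter;
}
```

Calling run_counter_demo() from main() returns 200000.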
Understand the Importance of Memory Barriers
Most developers know what a mutex is and how it orders operations across threads. However, few understand the memory barrier it provides, and even fewer understand how it relates to the underlying atomic variable.
A minimal mutex (a spinlock) can be implemented like this:
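```cpp
#include <atomic>

// A minimal spinlock. Production mutexes (e.g. std::mutex) usually put
// waiting threads to sleep instead of spinning, but the ordering idea is the same.
class SimpleMutex {
    std::atomic<bool> locked{false};

public:
    void lock() {
        // exchange() returns the previous value: keep spinning while another
        // thread holds the lock. Acquire ordering makes everything the previous
        // owner wrote before unlock() visible once we get the lock.
        while (locked.exchange(true, std::memory_order_acquire)) {
            // Busy-wait until it becomes false
        }
    }

    void unlock() {
        // Release ordering publishes every write made while holding the
        // lock to the next thread that acquires it.
        locked.store(false, std::memory_order_release);
    }
};
```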
To understand a mutex, you must understand atomics and their acquire and release operations: load(acquire) (or a read-modify-write with acquire ordering, such as the exchange() in lock()) and store(release).
Why Are Memory Barriers Important?
CPUs will not read from or write to RAM unless necessary; they work out of per-core SRAM caches first. On top of that, both the compiler and the CPU may keep values in registers or store buffers and reorder memory operations. This means that if thread 1 modifies a variable, the change might not be visible to thread 2, even if thread 2 reads it later in wall-clock time (for example, after a sleep). To properly propagate changes across threads, you need:
- A load/acquire operation on the same atomic address in the receiving thread.
- A store/release operation on the same atomic address in the writing thread.
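A minimal sketch of these two requirements, assuming one writer and one reader; payload, ready, writer, reader and run_message_passing are hypothetical names. The plain int is published through the atomic flag. Replacing the acquire loop with a sleep would not give the same guarantee.

```cpp
#include <atomic>
#include <thread>

// `payload` is ordinary data; `ready` is the atomic that carries the synchronization.
int payload = 0;
std::atomic<bool> ready{false};

void writer() {
    payload = 42;                                 // plain write
    ready.store(true, std::memory_order_release); // publish it
}

int reader() {
    // Spin until the writer's release store is observed.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    // The acquire load saw the release store, so payload == 42 is guaranteed.
    return payload;
}

int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed); // reset before any thread starts
    int result = 0;
    std::thread r([&result] { result = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    return result;
}
```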
The Address of an Atomic Defines Its Synchronization Universe
Any atomic variable (and therefore any mutex, which is built on one) establishes its own isolated synchronization universe, or “communication channel”. To safely propagate data from one thread to another:
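- The writing thread must follow its writes with a store(release) on the atomic variable (for a mutex, that is unlock()).
- The reading thread must perform a load(acquire) on that same atomic variable (for a mutex, lock()) and observe the value that store wrote before it reads the data.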
Otherwise, the data may be stale, missing, or corrupt. In C++ terms, the program has a data race, which is undefined behavior.
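To make the “channel” idea concrete, here is a sketch where the data is only reachable through the atomic itself: a producer builds an object and publishes its address with store(release), and a consumer acquires that same atomic pointer before touching the object. Config, published, producer, consumer_retries and run_publication_demo are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <string>
#include <thread>

struct Config { // hypothetical data being handed over
    std::string name;
    int retries;
};

// This atomic is the channel: only a thread that loads it with acquire and
// sees the non-null pointer is guaranteed to see a fully constructed Config.
std::atomic<Config*> published{nullptr};

void producer() {
    Config* cfg = new Config{"primary", 3};          // build the data first
    published.store(cfg, std::memory_order_release); // then publish its address
}

int consumer_retries() {
    Config* cfg = nullptr;
    // Acquire on the SAME atomic the producer released.
    while ((cfg = published.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return cfg->retries; // safe: the construction in producer() happens-before this
}

int run_publication_demo() {
    published.store(nullptr, std::memory_order_relaxed);
    int retries = 0;
    std::thread c([&retries] { retries = consumer_retries(); });
    std::thread p(producer);
    p.join();
    c.join();
    delete published.load(std::memory_order_relaxed); // both threads are done
    return retries;
}
```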
The Most Common Multithreading Mistake
Many developers correctly use a mutex when writing data but fail to apply the same synchronization when reading it. This misunderstanding most likely comes from picturing RAM as the central stage of data exchange: you synchronize to “flush” the cache into RAM, and anyone can read it from there afterwards. Under that picture, synchronization naturally feels like a two-step operation (write, then read), and reading looks like a separate step that needs no protection. It’s not. This is the wrong mental model.
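Synchronization is not about flushing caches to RAM; in fact, RAM is rarely even touched. CPU cores keep their caches coherent by communicating with each other directly, without going through RAM. Seen this way, it makes sense that you must synchronize the reading side just as you synchronize the writing side: both threads take part in the exchange, and there is no middleman such as RAM holding a “published” copy. You are synchronizing the local SRAM cache of one CPU directly with another CPU. The guarantee comes from a pair of operations: a release (a mutex unlock) in the writer and an acquire on the same atomic (a lock of the same mutex) in the reader. In C++ memory-model terms, only that pairing creates a happens-before relationship between the write and the read. This is why you must use the same mutex (or atomic) on both sides. The mutex literally is your communication channel, and the data exchange is only guaranteed through it.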
Always ensure that reading shared data follows the same locking strategy as writing it.
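A sketch of that rule with std::mutex, assuming a small settings object shared between threads (SharedSettings, set_endpoint, endpoint and run_settings_demo are hypothetical). The getter takes the same lock as the setter. Skipping the lock in the getter “because it only reads” is exactly the mistake described above.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Every access, read or write, goes through the same mutex.
class SharedSettings {
public:
    void set_endpoint(std::string value) {
        std::lock_guard<std::mutex> guard(mutex_);
        endpoint_ = std::move(value);
    }

    std::string endpoint() const {
        // The common bug is to skip this lock "because we only read".
        std::lock_guard<std::mutex> guard(mutex_);
        return endpoint_; // copy made while holding the lock
    }

private:
    mutable std::mutex mutex_; // mutable so the const getter can lock it
    std::string endpoint_ = "localhost";
};

std::string run_settings_demo() {
    SharedSettings settings;
    std::thread writer([&settings] { settings.set_endpoint("example.internal"); });
    // Concurrent read: sees either the old or the new value, never a torn string.
    std::string during = settings.endpoint();
    writer.join();
    if (during != "localhost" && during != "example.internal") {
        return "unexpected";
    }
    return settings.endpoint(); // after join(), this is "example.internal"
}
```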
Conclusion
If you take away only a few things from this:
- Always use ThreadSanitizer — there are no acceptable warnings.
- Memory barriers matter: understand how load(acquire) and store(release) work.
- A mutex does more than synchronize time: it synchronizes memory.
- Reading shared data requires a mutex just as much as writing does.
Mastering multithreading isn’t just about avoiding race conditions — it’s about understanding memory models, atomic operations, and synchronization universes. If you ignore these principles, your code may seem to work — until it doesn’t.