Assert in production

Why your code should crash more

Nov 24, 2025

Assertions seem extreme: they don’t throw an exception, they abort the process. Sure, in development a crash might be convenient for debugging, but in production, a crash is terrifying. The instinct is to handle a failure, maybe degrade functionality, but keep the system running. Crashing feels like giving up. Yet our instinct betrays us: continuing after a violation is worse than crashing.

On 18 November 2025, Cloudflare suffered a global outage that returned HTTP 5xx errors across much of the Internet for hours. The culprit: a traffic-critical component attempted to read a file that had grown beyond its supported size, causing a crash, courtesy of a single unwrap(). In Rust, unwrap() is functionally an assertion: if the result holds a value, unwrap() returns it and execution continues; otherwise, it panics, terminating the thread or, depending on configuration, the whole process.

Figure: the code that generated the error.
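To make the assertion-like behavior of unwrap() concrete, here is a minimal sketch. It is not Cloudflare's code: `load_entries` and `MAX_ENTRIES` are hypothetical names invented for this illustration, and the limit is arbitrary. It assumes Rust's default panic strategy (unwinding), which is what lets `catch_unwind` observe the panic.

```rust
use std::panic;

/// Hypothetical limit, chosen only for this illustration.
const MAX_ENTRIES: usize = 4;

/// Hypothetical loader: rejects inputs larger than the supported size.
fn load_entries(entries: &[u32]) -> Result<Vec<u32>, String> {
    if entries.len() > MAX_ENTRIES {
        return Err(format!(
            "{} entries exceeds the limit of {}",
            entries.len(),
            MAX_ENTRIES
        ));
    }
    Ok(entries.to_vec())
}

fn main() {
    // Within the limit: unwrap() hands back the value and execution continues.
    let loaded = load_entries(&[1, 2, 3]).unwrap();
    println!("loaded {} entries", loaded.len());

    // Beyond the limit: unwrap() panics. catch_unwind is used here only so
    // this demonstration can observe the panic instead of ending the program.
    let outcome = panic::catch_unwind(|| load_entries(&[1, 2, 3, 4, 5]).unwrap());
    println!("oversized input panicked: {}", outcome.is_err());
}
```

The loader itself reports the problem as an ordinary error value; it is the caller's unwrap() that turns that error into a crash.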

Given the consequences of this crash, we face a serious question: Should assertions be enabled in production? Couldn’t the process have continued and done its best despite the violation?

Assertions

Assertion violations are not exceptions. Exceptions signal conditions within the system’s expected operating boundaries. Assertions encode invariants, conditions that must always hold. An assertion violation is an invariant violation and indicates that the system is outside its expected operating boundaries. Assertion violations and exceptions cannot be compared: they are categorically different.
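The distinction can be sketched in Rust, where expected failures are typically returned as `Result` values and invariants are checked with `assert!`. The `Account` type, its invariant, and the error variants below are hypothetical and exist only for this example.

```rust
/// Hypothetical account type, used only for illustration.
#[derive(Debug)]
struct Account {
    balance: i64,
}

/// Expected failures: conditions inside the operating boundaries.
#[derive(Debug, PartialEq)]
enum WithdrawError {
    InvalidAmount,
    InsufficientFunds,
}

impl Account {
    fn withdraw(&mut self, amount: i64) -> Result<(), WithdrawError> {
        // Invariant: the balance is never negative. If this check fails, some
        // earlier code is already broken, so there is no sensible error value
        // to hand back to the caller.
        assert!(self.balance >= 0, "invariant violated: negative balance");

        // Expected conditions: callers may pass bad or oversized amounts.
        if amount <= 0 {
            return Err(WithdrawError::InvalidAmount);
        }
        if amount > self.balance {
            return Err(WithdrawError::InsufficientFunds);
        }
        self.balance -= amount;
        Ok(())
    }
}

fn main() {
    let mut account = Account { balance: 100 };
    println!("{:?}", account.withdraw(30));
    println!("{:?}", account.withdraw(500));
    println!("remaining balance: {}", account.balance);
}
```

Note that Rust's `assert!` stays active in release builds, while `debug_assert!` is only checked when debug assertions are enabled, which is off by default in optimized builds.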

A system can operate inside its operating boundaries, triggering exceptions, or outside its operating boundaries, triggering assertion violations.

Let It Crash

Assertions are not the cause of an invariant violation; they are a witness to it. The component has already failed catastrophically, it just hasn’t crashed yet. If we allow the component to continue, we can no longer predict the outcome of any action, since the component is already operating outside its specification.

The only safe option: crash the component.

And Recover

Any component can crash at any moment: processes crash, networks partition, hardware breaks. Your system must already handle crashes. By crashing on an assertion violation, we transform an unpredictable situation (undefined behavior) into a well-understood one (component crash). We’re not introducing a new failure mode; we’re converting chaos into a failure mode the system already knows how to handle.
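A crash-and-recover loop can be sketched as follows. This is a simplified stand-in, not a production design: the worker runs in a thread and the supervisor lives in the same process, whereas a real process abort would need an external supervisor. The names `worker` and `supervise`, and the deliberately broken first attempt, are hypothetical.

```rust
use std::thread;

/// Hypothetical worker. Its invariant is that `attempt` is never zero; the
/// first run violates it on purpose to simulate a corrupted state.
fn worker(attempt: u32) -> u32 {
    assert!(attempt > 0, "invariant violated: attempt must be positive");
    attempt * 10
}

/// Hypothetical supervisor: runs the worker in its own thread and restarts it
/// with fresh state when it crashes, allowing up to `max_restarts` restarts.
fn supervise(max_restarts: u32) -> Option<u32> {
    for attempt in 0..=max_restarts {
        // join() returns Err when the spawned thread panicked.
        match thread::spawn(move || worker(attempt)).join() {
            Ok(value) => return Some(value),
            Err(_) => eprintln!("worker crashed on attempt {}, restarting", attempt),
        }
    }
    // Recovery itself failed: every allowed run crashed.
    None
}

fn main() {
    match supervise(3) {
        Some(value) => println!("worker finished with {}", value),
        None => println!("worker never recovered"),
    }
}
```

The supervisor never inspects why the worker failed. It only needs to handle one well-understood event, a crash, which is the conversion the paragraph above describes.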

Cloud-crash vs Cloud-bleed

During Cloudflare’s November outage, the proxy crashed, the recovery mechanisms failed, and much of the Internet went down. This appears to indicate that we should disable assertions in production: if unwrap() hadn’t crashed the proxy, maybe the Internet would have stayed up.

Consider Cloudbleed, the 2017 Cloudflare incident where a buffer overrun caused the system to leak sensitive data. Although the system had entered a failed state, it kept serving requests, silently violating fundamental safety guarantees. Crashing the system on detection of an invariant violation would have contained the accidental disclosure.

Would you rather be offline for a few hours or have your sensitive data exposed? The answer depends on your priorities, but for many systems, safety violations are more severe than liveness violations.

Lessons Learned

The November incident reveals a subtlety: Should the oversized file trigger a validation violation resulting in an exception, or an assertion violation resulting in a crash? Is the file considered external input or internal data?

While most engineers will opt for an exception in this case, once you’ve established an invariant, the principle is the same: crash on violation and ensure your recovery mechanisms can compensate for the crash. By that logic, the problem wasn’t that the component crashed—components must be allowed to crash. The problem was that the system couldn’t recover from that crash.
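The two classifications of the same size check can be put side by side in a short sketch. Everything here is hypothetical: `MAX_ENTRIES`, `TooLarge`, `validate_external`, and `accept_internal` are illustrative names, and keeping the previously loaded data on a validation error is just one possible reaction, not a description of what Cloudflare's system does.

```rust
/// Hypothetical limit, chosen only for this illustration.
const MAX_ENTRIES: usize = 4;

#[derive(Debug, PartialEq)]
struct TooLarge {
    len: usize,
}

/// External-input view: an oversized file is an expected condition, so it is
/// reported as an error and the caller decides what to do.
fn validate_external(entries: Vec<u32>) -> Result<Vec<u32>, TooLarge> {
    if entries.len() > MAX_ENTRIES {
        return Err(TooLarge { len: entries.len() });
    }
    Ok(entries)
}

/// Internal-data view: the limit is an invariant that upstream code is
/// supposed to guarantee, so a violation crashes the component.
fn accept_internal(entries: Vec<u32>) -> Vec<u32> {
    assert!(
        entries.len() <= MAX_ENTRIES,
        "invariant violated: {} entries exceeds {}",
        entries.len(),
        MAX_ENTRIES
    );
    entries
}

fn main() {
    let current = vec![1, 2];
    let incoming = vec![1, 2, 3, 4, 5];

    // External view: reject the update and keep the data already in use.
    let active = validate_external(incoming).unwrap_or(current);
    println!("active entries: {:?}", active);

    // Internal view: data within the limit passes through untouched, while an
    // oversized vector would panic here and leave recovery to a supervisor.
    println!("accepted: {:?}", accept_internal(vec![1, 2, 3]));
}
```

Which function is appropriate depends on where the trust boundary is drawn, which is exactly the question the incident raises.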

Conclusion

At Resonate, we enable assertions in production. We abort on invariant violation and rely on recovery. The Resonate Server, the core component of our durable execution framework Distributed Async Await, contains more than 200 assertions—every one will crash the process rather than allow the server to continue after a violation. The component responsible for orchestrating an entire distributed system consistently must itself be consistent.

Consider doing the same.

Nitin Bhide

Excellent explanation. I have always been a strong proponent of using assert in my work. However, developers in most companies, especially those focused on projects, are often reluctant to adopt this practice. In every project where I made extensive use of assert, the bug count was significantly lower, and the debugging effort was far less compared to other projects.
