Intro

I've read lots of explanations of when and how to use asserts, or other error handling. But I realised I've never actually sat down and thought through all the possibilities myself. Welcome to the ride!

For the purposes of this post, I will mainly consider "things that can happen in subsystems of a computer program other than successfully performing whatever operation you hoped to achieve".

I think it's important to think of "subsystems" in that way. A subsystem is about as general a term as I could manage, because sometimes this refers to a function, sometimes a class or other object, sometimes a library, sometimes a whole program, but I don't think the reasoning is tied to any particular one of those[1].

In particular, for any non-trivial program, it's important to recognise that subsystems are built out of subsystems, and the behaviour for a system may be different if it's "an encapsulated part of your code", "your whole program", "an external library" etc, which may become painfully apparent when it shifts from one to another.

It's also important to be realistic that in any non-trivial program, the number of ways things can go OTHER than the one you wanted massively dwarfs the 'correct' behaviour. If you want a one-time script that just has to work once, ignoring all of those except to stop them happening in this particular situation is a good trade-off. But in anything else, you need to devote a lot -- maybe most -- of your time to dealing with them.

Intro: footnotes

[1] Encapsulation is good. Objects are good. But encapsulation is a broader concept, and objects are *one* form of encapsulation, and different forms are appropriate in different circumstances.

Case 1: Successful operation

Yay! Not much to say here. Typically you return a value. Sometimes you don't return anything at all.

Case 2a: Internal error

This code is screwed up somehow. Some internal state is inconsistent. You read off the end of an array. You checked for invalid values, and then proceeded with a calculation and -- oops! found an invalid value anyway. You had two branches, an if and else, but somehow found some third state that wasn't covered.

Case 2b: Invalid input

The input (function parameters, possibly something read from a file, etc) doesn't fit the requirements you wrote for what they should be.

Sometimes this is conflated with 2a, but for reasons described below, I'm considering this separately to start with.

Case 2c, 2d, 2e...: Something else happened that prevented a successful completion

You were supposed to write to a file, but couldn't. Out of memory. Someone called "pop" but the list is empty.

How to deal with them

Traditionally, 2a is a place for asserts; 2c onward are a place for exceptions, error return values, or sometimes a null return value; and 2b is a place for massive arguments between those two options.

At least conceptually, I'd say the best way to deal with everything is the way Rust does.

Most return values are actually a union of either a successful return (usually with a value), or an error return (with some info, say an error type from a hierarchy to say what the error is).
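In Rust, that union is the standard Result type. Simplified, its shape is roughly:

    // The shape of Rust's standard Result type (simplified):
    enum Result<T, E> {
        Ok(T),   // successful return, carrying the value
        Err(E),  // error return, carrying info about what went wrong
    }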

This provides the *calling* code the maximum options with minimum fuss.

If the calling code just wants a quick answer and doesn't care about handling failure, there's an "unwrap" function which takes the return value, turns a successful return into a plain value, and fails the whole program if there's an error. It's a bit like "not checking for null", except that you can search the whole program for "unwrap" and make sure there aren't any before doing anything important with it; whereas it's hard to search for "should have checked for null and didn't", when it's not immediately obvious which pointers may sometimes be null and which should never be.
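A minimal sketch of that (parse_port is a made-up example function, not a standard one):

    // parse_port is a made-up example; it just wraps str::parse.
    fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
        s.parse()
    }

    fn main() {
        // unwrap(): a successful return becomes a plain value; an error
        // panics, taking the whole program down with it.
        let port = parse_port("8080").unwrap();
        println!("listening on {port}");
    }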

More often, you can use the "try" macro -- or its one-character syntactic sugar, the '?' suffix on an expression. That does the same thing, except that instead of failing the program, it early-returns from the current function with the same error.
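For example (read_config is a made-up name; File::open and read_to_string are real standard-library calls):

    use std::fs::File;
    use std::io::{self, Read};

    // Any io::Error from open() or read_to_string() early-returns from
    // this function via `?`, handing the same error to the caller.
    fn read_config(path: &str) -> Result<String, io::Error> {
        let mut text = String::new();
        File::open(path)?.read_to_string(&mut text)?;
        Ok(text)
    }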

That's often what you want: if you get some sort of error three levels deep, probably every function up the call chain wants to deal with it the same way, whatever that is. Until it hits a boundary between subsystems, where you may want to handle the error in a more specific way.

Indeed, I think not doing that is driven mostly by syntax. If you use error return values, every function that can fail needs an inconveniently long run of boilerplate saying "if any of these calls returned an error, early-return with that error". And if you use exceptions, you have a whole bunch of hidden complexity, where looking at a function gives no indication of how many errors may be abruptly propagated up through it, or from which lines.

I think in an ideal world, all the error cases would use this model. It would be easy to write code that returns an "invalid parameter" error with minimal boilerplate. And the calling code can either handle that error (if they care), or deliberately treat it as a failure (if it commits to validating the parameters in advance).
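As a sketch of what that could look like, with a made-up SqrtError type standing in for an "invalid parameter" error:

    // SqrtError and checked_sqrt are made up for illustration.
    #[derive(Debug)]
    enum SqrtError {
        NegativeInput(f64),
    }

    fn checked_sqrt(x: f64) -> Result<f64, SqrtError> {
        if x < 0.0 {
            return Err(SqrtError::NegativeInput(x));
        }
        Ok(x.sqrt())
    }

    fn main() {
        // The caller can handle the invalid-parameter error...
        match checked_sqrt(-1.0) {
            Ok(v) => println!("sqrt = {v}"),
            Err(SqrtError::NegativeInput(x)) => println!("can't sqrt {x}"),
        }
        // ...or, having validated in advance, treat any error as fatal.
        println!("sqrt = {}", checked_sqrt(2.0).unwrap());
    }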

And even in the "internal error" case, you likely want a return value. Usually the calling code should not handle those (since everything is probably screwed by then anyway). But if, say, you're dealing with a buggy library, you can choose to handle them if you want, rather than being forced to deal with crashes.

But they *would* still be a special case. For instance, your debugger would be configured to stop on any internal errors (basically, equivalent of asserts). Even more so than for exceptions. And you may choose to disable checks for them if you're desperate for optimisation (also like asserts).
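Rust already has a concrete version of that last trade-off: debug_assert! is checked in debug builds but compiled out of optimised builds, much like a C assert once NDEBUG is defined. A small sketch (mean is a made-up example function):

    fn mean(xs: &[f64]) -> f64 {
        // Checked in debug builds, compiled out in release builds.
        debug_assert!(!xs.is_empty(), "mean of an empty slice");
        xs.iter().sum::<f64>() / xs.len() as f64
    }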

Optimisation

That also gives an answer to how you should handle unexpected errors in code at runtime. The fail-fast people are correct that you usually shouldn't continue, since whatever you were doing has probably failed. But if you return an "internal error" value, higher layers of the program can handle that if they want.

Say your "open file" function fails under some conditions. Continuing to run the "open file" code after an error is bad -- it might be harmless (say, opening a file with an empty name), or it might be a security bug. But as long as you bail out of that function as soon as you detect the error, higher-level code might choose to treat that operation as a unit, and retry, or report an error to the user, rather than just killing the program.
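For instance, a sketch of what that higher-level handling might look like (open_with_retry is a made-up helper, not a standard API):

    use std::fs::File;
    use std::io;

    // Treat "open file" as a unit: retry a few times, then hand the
    // error up to be reported, rather than killing the program.
    fn open_with_retry(path: &str, attempts: u32) -> Result<File, io::Error> {
        let mut last_err = None;
        for _ in 0..attempts {
            match File::open(path) {
                Ok(f) => return Ok(f),
                Err(e) => last_err = Some(e),
            }
        }
        Err(last_err.expect("attempts must be at least 1"))
    }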

Conversely, if optimisation is key, you can choose the C-like approach of "crash immediately", or "disable all those checks since they should never happen anyway" and just let the consequences of that error bubble forward until they become irrelevant or something more serious goes wrong. Which isn't ideal, but is better than not having any of those checks in the first place and pretending they're not needed.

That leads to one of the points that got me into this post in the first place: it's usually better to have your asserts in runtime non-debug code than to not have the checks at all (unless you really do need the optimisation).
