4 min read

Nullable data

Nullable data
Photo by Valentin Lacoste / Unsplash

There is a pet peeve of mine when it comes to, what I consider, clean code. Or to be more precise: clean data. And that comes back to using a NULL (or any version of it) to extend the underlying type with something "more". Then it's underlying type ever designed to mean.

Every time I open a data store and see NULL in there, it raises many questions for me. Which are usually only answered by looking into code. And even then it is not always straightforward. Because of self-explanatory code. Or self-documenting. However you look at it, it takes more time than it should. In database theory, a NULL value represents the absence of data, but it is inherently ambiguous. For instance, does a NULL value in a "Price" field mean that the price is zero, unknown, or not applicable? Such ambiguity can lead to misinterpretation of data, which is dangerous for both analysis and application logic.

And then your code becomes littered with if (field == null), which then requires you to write some strange code to explain these cases. Again, this is is opinionated write-up, not a law. On the other hand, NULL in code is rarely called for, in my experience. Or ever useful. Now let us go through some of the scenarios. In 99% of cases, the default value of the type would work as intended. Now let me start with some examples.

Boolean as nullable. Here is where my face gets contorted at the sight of it. An example of this is query parameters, mostly when it comes to filtering. As an example, &onlyActive=true. On the implementation, this is the most common case I see:

public void Get(bool? onlyActive) {
  if (onlyActive is null) {
    // select both active & inactive
  }
  else if (onlyActive) {
    // select only active
  }
  else {
    // select onlyInactive
  }
}
Example of nullable boolean

And my mind just goes: Huuuuuh? Ok, seems harmless till you introduce some other state of the user, for example, banned or something along these lines. What now? Or when you just don't wanna apply the filter at all. The list goes on. It is clear that boolean is not the right answer if the value can be more than true or false. It is not meant for more, but we cram it to the brim with these kinds of hacks. And then; self-documenting code, DDD, etc. Sure, I guess.

Enum as nullable. Here is another gem that comes from time to time. Like we forget that enumeration can have a default value, but somehow it makes sense to use NULL it to represent it. Just to keep it fresh and make the next guy in line guessing. Because he will not have enough tiers shed over "agile" already. The example of a boolean being nullable is analogous here, so no need to repeat it. With an addition, that boolean example solution would be just to use a nice enum. And the solution for nullable enum: just use the damn default value for it. That is just opinion tho, so do whatever you feel like.

String as nullable. I never, ever, have come across a solution that models a string representation of some piece of data that has benefited from being nullable. But, but, the user didn't enter the optional text field. Okay, so it is empty. Problem solved. In any case, it is empty. Don't encode some state from UI into the meaning of it's value. If you need to persist or know of this, model it separately from the actual data. Here I can again start introducing examples like for boolean above. If you forgot one state, probably down the line with the new JS framework of the day who knows what else can come in? So model it separately from the actual data and the data itself is just empty. Sounds simple, till you make a comment on PR. Then you start that slow descent through nine circles of hell.

Number as nullable. Here I am going just include all number types out there. The same rules for integers as for floats. I touched upon it already in the introduction. With a "Price" example. I dare you to explain that to some salesperson, what NULL is. What if it means it is discontinued or free after so many purchases? The damn product will still have the same price, it will just not be charged. Or the product will have a state to indicate if available and so on. Or there will be a coupon applied at the end of the checkout process that will nullify the value. But the price of it will always be there, for auditing purposes and who knows what. Don't give meaning to something that will require you to write more documentation than just explaining business cases applied to that value.

Going to stop here, as I hope this illustrates the point I am trying to make. I never have come across a scenario where a NULL for any kind of data can't be made "cleaner". The database will be thankful and your replacement/future self will also salute you. I know in the world of document databases it is easy to fall into the trap, but resist the urge. In conclusion, while NULL seems like a convenient option for representing absent or optional data, it comes with various challenges that can make data models less robust and harder to maintain. By being aware of these pitfalls and considering alternatives, you can build more reliable and understandable systems.

As said when started, this is a pet peeve of mine. I am guilty of making these things nullable myself, due to dependencies. Then after I sorted the ducks, I would give them default values and model around these constraints. Garbage in, garbage out. A simple rule of thumb is that after that code crunch is over it still can be refactored. Don't make your life worse, because I bet you when you get back to it in a couple of years there will no be documentation. And you will be spending hours more than needed to understand why that value is NULL.

Till next time, NULL.