In a previous life, I had a long discussion on why adding booleans was a bad idea. And just recently one of the core Python developers suggested the same thing -- adding booleans, that is. This is a long rant on why such things are bad.
Once at some previous job, I blocked a code review in which the other developer did something like
if boolean1 + boolean2: do_thing()
Why was this bad, in my view? Because it was reducing the booleans to simple integers. While it is true that booleans are, internally, in Python, integers, what they represent is not integers, but actually true or false. People got it -- or, at least, they said they got it, but the code changed anyway -- and life moved on.
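To make the "internally integers" part concrete, here is a minimal sketch. The names flag_a and flag_b are hypothetical stand-ins for boolean1 and boolean2 from the review; the point is only that + hands back an int, while or keeps a boolean.

```python
# bool is a subclass of int, so arithmetic on booleans silently yields integers.
flag_a = True   # hypothetical stand-in for boolean1
flag_b = True   # hypothetical stand-in for boolean2

assert issubclass(bool, int)

total = flag_a + flag_b     # an int: the "trueness" became a quantity
either = flag_a or flag_b   # still a bool: the question stays "is any of them true?"

print(type(total).__name__, total)    # int 2
print(type(either).__name__, either)  # bool True
```

Both spellings work in an if, but only the second one says what is being asked.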
But, last month, I had to see Raymond Hettinger, one of the core Python developers, post this on twitter:
#python tip: The boolean values False and True are equal to zero and one.
Replace: sum(1 for x in data if pred(x))
Obviously, what he meant was to use
sum(pred(x) for x in my_list)
... instead of
sum(1 if pred(x) else 0 for x in my_list)
Again, this relies on the fact that Python uses integers behind the scenes to deal with booleans.
And, as you can guess, that really annoyed me.
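For reference, a small sketch showing that both spellings produce the same number when the predicate returns real booleans. The pred function and the contents of my_list are made up for illustration.

```python
def pred(x):
    """Hypothetical predicate: True for even numbers."""
    return x % 2 == 0

my_list = [1, 2, 3, 4, 6]  # illustrative data

# Adds True/False directly, relying on them being 1/0.
implicit = sum(pred(x) for x in my_list)

# Converts each boolean into a number before adding.
explicit = sum(1 if pred(x) else 0 for x in my_list)

print(implicit, explicit)  # 3 3
```

So the disagreement is not about the result, it is about what the line tells the reader.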
What Are Types
Let me explain, in a long rant, why "booleans are integers" is bad.
Imagine the computer memory. Imagine one specific memory location, being used, with this value:
65
What does it mean? That's where languages and their types come in.
For example, let's imagine that this location is being managed by a C program, and this program marked this value as a char. While chars in C represent 8-bit integers[1], they have long been used to represent one character in a string -- a sequence of chars represents a string[2]. So, when other developers see char, they think "Ok, this is the character with code 65" (which is "A", by the way).
If the same code used int, other developers would think "Yeah, this is the number 65".
And, just to screw things up, it could be an enum, in which the value 65 represents the 65th variant (element) of that enum.
And that's the reason types exist: instead of, say, developers managing memory directly and just changing its representation, like in Assembly, types provide a consistent representation over the internal storage.
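The same idea can be sketched in Python instead of C. This is only an illustration: the Variant enum is hypothetical, generated just so that one of its members has the value 65.

```python
from enum import Enum

raw = bytes([65])  # one byte of "memory" holding the value 65

as_int = raw[0]                # read as a small integer
as_char = raw.decode("ascii")  # read as a character

# Hypothetical enum with 100 members, V1..V100, numbered from 1.
Variant = Enum("Variant", [f"V{n}" for n in range(1, 101)])
as_variant = Variant(raw[0])   # read as the 65th variant

print(as_int, as_char, as_variant.name)  # 65 A V65
```

Same byte, three different meanings, and the only thing that changed was the type used to read it.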
The Problem With Adding Booleans
So, we saw that memory is just a bunch of bytes and what gives meaning to those bytes -- in programming languages, that is -- are types[3].
Now picture another developer reading the code above; they go from the top of the code to the bottom, and reach the line sum(pred(x) for x in my_list). Their first thought is that sum acts on numbers, so obviously pred is a function that returns numbers. But what number does it represent? So they go to pred and see it returning True or False. Now they have to trace back and rethink what the line did, leaving them with cognitive dissonance, which is a clever way of saying "they have to rethink what they already thought".
And too many situations with cognitive dissonance are what make code "unreadable" -- the line above is still readable, but what it shows (a sum of numbers) isn't what it actually does (a count of true answers).
Respect Your Types
Python is very loose with its type system[4], but that doesn't mean one can play "fast and furious" with it. Let's say that, at some point, Python developers decide to change True and False from their integer roots to be actual symbols -- things that simply "exist" and have no value[5][6]. Then everyone who handled booleans as integers would see their code misbehaving or crashing, simply because they didn't treat booleans as booleans.
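That thought experiment can be simulated today. The sketch below uses a hypothetical Truth enum as a stand-in for "booleans as symbols"; it is not how Python works or is planned to work, just a way to see which style of code would survive.

```python
from enum import Enum

class Truth(Enum):
    """Hypothetical symbolic booleans: they exist, but are not numbers."""
    YES = "yes"
    NO = "no"

def pred(x):
    """Hypothetical predicate answering with symbols instead of bools."""
    return Truth.YES if x > 0 else Truth.NO

my_list = [3, -1, 7]  # illustrative data

# Code that adds the answers breaks: a symbol cannot be added to 0.
try:
    sum(pred(x) for x in my_list)
    arithmetic_works = True
except TypeError:
    arithmetic_works = False

# Code that only asks "is it true?" keeps working.
count = sum(1 for x in my_list if pred(x) is Truth.YES)

print(arithmetic_works, count)  # False 2
```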
Now let's see the other option:
sum(1 if pred(x) else 0 for x in my_list).
This line is (a) longer and (b) slower due to the branching during execution (the if). But when you read something like this you see that there is a function whose value isn't being compared to anything, which probably means it returns a boolean[7]; if it is true, the expression yields 1; if it is false, it yields 0; and you're actually doing a sum of ones and zeroes -- as numbers.
No cognitive dissonance, no messing around, and all just because we treated types as types.
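A side effect of the explicit form, sketched under an assumption of my own: a predicate that answers with "truthy" values rather than real booleans. Here the hypothetical pred returns whatever re.match returns (a match object or None).

```python
import re

def pred(x):
    """Hypothetical predicate: returns a match object or None, not a bool."""
    return re.match(r"\d+", x)

my_list = ["42", "abc", "7up"]  # illustrative data

# Treating the answer as a truth value works with any truthy/falsy result.
explicit = sum(1 if pred(x) else 0 for x in my_list)

# Treating the answer as a number fails: a match object cannot be added to 0.
try:
    implicit = sum(pred(x) for x in my_list)
except TypeError:
    implicit = None

print(explicit, implicit)  # 2 None
```

The explicit line only depends on the answer being testable as true or false, not on how that answer is stored.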
PS: After a small discussion about what's better, I came up with a line that is better than
1 if pred(x) else 0:
sum(1 for x in my_list if pred(x))
Why would this be better? Because, when you think about what you actually want -- to count the number of Trues in the list -- you can actually use a feature of list comprehensions for filtering: the if at the end. This will count 1 (a number) only if the element being processed "agrees" with the predicate.
That line could be translated to something like
myList.stream()
    .filter(x -> pred(x))
    .mapToInt(x -> 1)
    .sum();
... in Java 8: you remove the non-True values from the list, convert the remaining ones to 1 (a number) and sum the total.
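The same filter-then-count reading can be spelled out in Python as a plain loop. This is just a sketch; pred and my_list are made-up examples.

```python
def pred(x):
    """Hypothetical predicate: True for non-empty strings."""
    return len(x) > 0

my_list = ["a", "", "bc", ""]  # illustrative data

# The one-liner: keep the elements that pass, count 1 for each.
count = sum(1 for x in my_list if pred(x))

# The same steps written out by hand.
manual = 0
for x in my_list:
    if pred(x):      # the boolean is only ever used as a condition
        manual += 1  # the number is only ever used as a number

print(count, manual)  # 2 2
```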
[1] They could use more than 8 bits, depending on the architecture, because the C Standard is very flexible on this point.
[2] It doesn't mean that every single char is a character in a string; it could be used exactly as an 8-bit integer.
[3] I'm being very loose here about types; there is a lot more complex context in them, but I'm going to stick with this "representation" for now.
[4] Maybe not as loose as C, which lets you "convert" memory that represents a float into an unsigned integer.
[5] Or, better yet, that can have any value and would still work.
[6] As far as I know, the Python standard already forces booleans to be integers, so that will never happen, but let's add this for the sake of discussion.
[7] This is one of the times I feel jealous of Clojurists, who can use ? in the names of their functions and actually have a coding style that says that predicates -- functions that return either True or False -- end with ?; so not only does someone reading a piece of Clojure code and seeing an is_valid? know it returns a boolean, it actually reads like a boolean check.