> - An index on a Boolean is useless; it's an easy mistake that will take memory and disk space for nothing.
I’ve seen this advice elsewhere as well, but I recently tried it and found it wasn’t the case on my data set. I have about 5M rows, with an extremely heavy bias toward one Boolean column being ‘false’. Adding a plain index on this column cut query time roughly in half. We’re in the millisecond range here, but still.
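Roughly what that looks like (table and column names here, `events` and `archived`, are made up for illustration):

```sql
-- Plain index on a heavily skewed Boolean column.
CREATE INDEX events_archived_idx ON events (archived);

-- Queries filtering on the rare value can then use the index
-- instead of scanning the whole table.
SELECT count(*) FROM events WHERE archived = true;
```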
There is no need to add the Boolean value itself to the index in this case, since within a partial index it is constant (true). You can index a more useful column instead, like id or whatever your queries use:
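For example, something along these lines (keeping the hypothetical `events`/`archived` names from above):

```sql
-- Partial index: only rows with archived = true are stored, and the
-- key is a column the queries actually use (here: id).
CREATE INDEX events_archived_id_idx
    ON events (id)
    WHERE archived = true;

-- A query whose predicate matches the index's WHERE clause can use it:
SELECT id FROM events WHERE archived = true ORDER BY id LIMIT 100;
```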
It does appear smaller, but only by single-digit megabytes on a table with millions of rows. Not a major difference for most use cases, I think, but good to know for the few where it would make a difference.
I know nothing about partial indices in Postgres, but it seems like for indexing a Boolean, you either index the true values or the false values, right? I feel like Postgres could intelligently choose to pick the less frequent value.
Is that correct? I would think that, even with a NOT NULL Boolean field, the physical table has three kinds of rows: those with a true value, those with a false value, and those no longer in the table (dead rows, holding either true or false, but that doesn’t matter).
If so, you can’t, in general, efficiently find the false rows just from knowing which rows are true, or vice versa.
You can also only use an index on the true rows to efficiently find the rows with other values if the index can return the true rows in order (so that you can use the logic “there’s a gap in the index ⇒ there are non-true rows in that gap”).
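Roughly speaking (again with the hypothetical names from above), this is why the planner only considers a partial index when the query’s predicate implies the index’s WHERE clause, so covering both values efficiently means one index per value:

```sql
-- Each value gets an index whose predicate the planner can match.
CREATE INDEX events_archived_true_idx  ON events (id) WHERE archived = true;
CREATE INDEX events_archived_false_idx ON events (id) WHERE archived = false;

-- Only the first index is a candidate here, because the query's
-- predicate implies "archived = true":
SELECT id FROM events WHERE archived = true;
```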