oggyboye's comments

oggyboye · on Dec 24, 2023

How does it work when indexing uuid columns?

LAC-Tech · on Dec 24, 2023

Very poorly is my understanding. There's various sequential UUID-like schemes that are more sortable by prefixing with bits of physical time. Off the top of my head, ULIDs and also UUID v7.

ahoka · on Dec 24, 2023

But those sequential IDs are often just not suitable due to security reasons.

sgarland · on Dec 24, 2023

Then don’t expose them. Adapt your API, don’t break the DB.

But in any case, both of the varieties mentioned still have random components; they aren’t purely sequential. Monotonic for the time component, yes.

ahoka · on Dec 25, 2023

I cannot not expose them, they exist to be exposed.

sgarland · on Dec 25, 2023

You could map them to a v4 UUID that isn’t a PK (or even indexed), and expose that instead.

sroussey · on Dec 24, 2023

I think this really only matters for clustered indexes.

Tostino · on Dec 24, 2023

No, you will have much larger index bloat as you add rows as page splits happen to accommodate the new rows.

sgarland · on Dec 24, 2023

Nope. Performance and WAL bloat will both occur.

https://www.cybertec-postgresql.com/en/unexpected-downsides-...

sroussey · on Dec 24, 2023

This is a great article, thank you!

sroussey · on Dec 24, 2023

I wonder how this works out for MySQL. Any time I’ve used uuid there I’ve used my own version (more like v7 that is available now).

This makes a big difference for the primary key (clustered index for both MySQL and postgresql).

Does anyone have an analysis for MySQL secondary indexes like the above referenced one for PostgreSQL secondary indexes?

sgarland · on Dec 24, 2023

MySQL stores the PK with every secondary index, and uses it to retrieve the requested rows (unless the query is covered by the index). I’d think for most queries, this would result in a similar slowdown.

masklinn · on Dec 24, 2023

A clustered index means the entire table data is included. It’s the same issue, just to an even higher degree.

kg · on Dec 24, 2023

Probably poorly by default, but you could use a hash of the uuid as a key (to try and more evenly spread the entropy) or key it off a suffix instead of a prefix since iirc that's where most of the entropy lives.

In practice if you want good performance and scalability it's important to select keys well.

hobs · on Dec 24, 2023

For this use case people generally choose sequential UUIDs or they want random ones to prevent hot pages for their inserts.

masklinn · on Dec 24, 2023

Normally, a uuid is orderable.

However uuid4 create a lot of work during updates, and tend to result in relatively low fill trees so querying is less efficient than it could be.

They also don’t really benefit from btrees as range or prefix queries are extremely rare.