Hacker Newsnew | past | comments | ask | show | jobs | submit | oggyboye's commentslogin

How does it work when indexing uuid columns?


Very poorly is my understanding. There's various sequential UUID-like schemes that are more sortable by prefixing with bits of physical time. Off the top of my head, ULIDs and also UUID v7.


But those sequential IDs are often just not suitable due to security reasons.


Then don’t expose them. Adapt your API, don’t break the DB.

But in any case, both of the varieties mentioned still have random components; they aren’t purely sequential. Monotonic for the time component, yes.


I cannot not expose them, they exist to be exposed.


You could map them to a v4 UUID that isn’t a PK (or even indexed), and expose that instead.


I think this really only matters for clustered indexes.


No, you will have much larger index bloat as you add rows as page splits happen to accommodate the new rows.


Nope. Performance and WAL bloat will both occur.

https://www.cybertec-postgresql.com/en/unexpected-downsides-...


This is a great article, thank you!


I wonder how this works out for MySQL. Any time I’ve used uuid there I’ve used my own version (more like v7 that is available now).

This makes a big difference for the primary key (clustered index for both MySQL and postgresql).

Does anyone have an analysis for MySQL secondary indexes like the above referenced one for PostgreSQL secondary indexes?


MySQL stores the PK with every secondary index, and uses it to retrieve the requested rows (unless the query is covered by the index). I’d think for most queries, this would result in a similar slowdown.


A clustered index means the entire table data is included. It’s the same issue, just to an even higher degree.


Probably poorly by default, but you could use a hash of the uuid as a key (to try and more evenly spread the entropy) or key it off a suffix instead of a prefix since iirc that's where most of the entropy lives.

In practice if you want good performance and scalability it's important to select keys well.


For this use case people generally choose sequential UUIDs or they want random ones to prevent hot pages for their inserts.


Normally, a uuid is orderable.

However uuid4 create a lot of work during updates, and tend to result in relatively low fill trees so querying is less efficient than it could be.

They also don’t really benefit from btrees as range or prefix queries are extremely rare.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: