Sorting is language specific even if you're restricted to languages using Latin ...

encom · 2025-10-19T19:56:10 1760903770

Before the Danish language adopted the letter "å" (in 1948), the vowel was written as "aa". In the Danish alphabet, "å" is the last letter. Therefore a list of three Danish city names would be correctly sorted as:

  * Albertslund
  * Odense
  * Aarhus
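A minimal Python sketch of how that ordering could be reproduced. The `danish_key` helper and the simplified alphabet are assumptions made for illustration; real Danish collation has more rules than this, which is why production code normally relies on a collation library rather than a hand-rolled key.

```python
# Simplified Danish alphabet: a-z followed by æ, ø, å (illustrative only).
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}


def danish_key(name: str) -> list[int]:
    """Hypothetical sort key: fold the older spelling "aa" into "å"."""
    folded = name.lower().replace("aa", "å")
    # Characters outside the alphabet sort after every letter in this sketch.
    return [RANK.get(ch, len(DANISH_ALPHABET)) for ch in folded]


cities = ["Aarhus", "Odense", "Albertslund"]

# Plain code point order puts "Aarhus" first.
print(sorted(cities))
# The simplified Danish key puts "Aarhus" last, because "aa" counts as "å".
print(sorted(cities, key=danish_key))
```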

This feels like material for another Tom Scott video.

tpmoney · 2025-10-19T21:53:35 1760910815

Not Tom Scott, but Dylan Beattie has done a handful of interesting talks[1] effectively on "there's no such thing as plain text" which in part covers this sort of thing. In fact, I think your Danish cities list is actually one of his examples.

[1]: https://www.youtube.com/watch?v=gd5uJ7Nlvvo

encom · 2025-10-20T14:46:20 1760971580

Finally had time to watch it, that was excellent. Thanks for the link.

Pike matchbox.

qw · 2025-10-20T09:10:54 1760951454

And to make it more interesting, Sweden also has the letter "å", but it's in the 27th place in the alphabet (followed by "ä" and "ö"). In the Danish/Norwegian alphabet, the letter "å" is the last letter of the alphabet.

plufz · 2025-10-20T07:48:39 1760946519

Haha. As if "tooghalvfems" weren't enough.

dmurray · 2025-10-19T18:43:30 1760899410

And that's why there are a hundred different possible values for LC_COLLATE, and it's completely normal that two popular Unix distributions picked different default values for that setting...right?

It would have been reasonable to conclude the article a third of the way through, and say "sorting is locale-dependent; if what you value is consistent behaviour between different OSs (instead of sorting based on the user's preferences), you need to implement the sorting yourself."

harrall · 2025-10-19T22:07:48 1760911668

Set LC_ALL=C, which gives you consistent sorting behavior.

The article does mention it but in passing.
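For a rough idea of what that ordering looks like: in the C locale, strings are compared by byte value, so the result does not depend on any language rules. The Python sketch below imitates that by sorting on the UTF-8 encoding; the sample names are made up for illustration.

```python
# Sample names chosen to show uppercase, lowercase, and non-ASCII behaviour.
names = ["banana", "Zebra", "apple", "Ärger", "Apple"]

# Compare raw UTF-8 bytes, which is how the C locale orders strings.
by_bytes = sorted(names, key=lambda s: s.encode("utf-8"))
print(by_bytes)

# UTF-8 preserves code point order, so Python's default string sort agrees.
print(by_bytes == sorted(names))
```

All uppercase ASCII letters come before all lowercase ones, and anything non-ASCII lands at the end. That is consistent across systems, but rarely what a human reader expects.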

tracker1 · 2025-10-20T20:58:27 1760993907

Beyond that, there's the question of what you are sorting and why... should File1.foo come before File005.foo or file020.foo? I've honestly thought about creating my own file manager just to sort files case-insensitively, with sequences of digits padded to the same length, and with case used only as a tie-breaker when names are otherwise identical: lowercase first, then uppercase, at the first difference in the original names.

My worry is that it would perform badly on really large directories... That said, for where it's a pain, it would be helpful to say the least.
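A minimal sketch of such a sort key in Python, under some assumptions: `natural_key` is a hypothetical helper, digit runs are compared as integers (which has the same effect as padding them to equal length), and only ASCII digits are treated as numbers.

```python
import re


def natural_key(name: str):
    """Hypothetical key: case-insensitive text, numeric digit runs, case tie-break."""
    # Splitting on a captured group yields text at even indexes, digits at odd ones.
    parts = re.split(r"([0-9]+)", name)
    primary = [
        int(part) if index % 2 else part.casefold()
        for index, part in enumerate(parts)
    ]
    # swapcase() makes lowercase sort before uppercase when the primary keys tie.
    return (primary, name.swapcase())


files = ["file020.foo", "File1.foo", "File005.foo", "file1.foo"]
print(sorted(files, key=natural_key))
```

The key is computed once per name, so the cost stays at a single O(n log n) sort plus one regex split per file; whether that is fast enough for very large directories would need measuring.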

1718627440 · 2025-10-21T14:51:09 1761058269

It isn't even just language/nation dependent; there are also different official sorting orders within a single language depending on the context, e.g. phone book vs. dictionary.

And then a lot of languages are used in different countries with different rules.
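German is one example of the phone book vs. dictionary split (picked here for illustration, not named above): dictionary ordering treats "ä" like "a", while phone book ordering treats it like "ae". A simplified Python sketch, where both key functions are hypothetical helpers and the names are made up:

```python
# Simplified umlaut handling for two German orderings (illustrative only).
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONE_BOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(word: str) -> str:
    # Dictionary style: umlauts sort like their base vowel.
    return word.lower().translate(DICTIONARY)


def phone_book_key(word: str) -> str:
    # Phone book style: umlauts sort like vowel + "e".
    return word.lower().translate(PHONE_BOOK)


names = ["Müller", "Mufti", "Muck"]
print(sorted(names, key=dictionary_key))
print(sorted(names, key=phone_book_key))
```

The same three names come out in a different order depending on which convention is applied, with no change of language or country involved.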