Sorting is language-specific even if you restrict yourself to languages that use Latin characters. E.g., how do you sort N relative to Ñ? How do you treat the Turkish variants of the letter I?
Doing a dumb sort by character or byte values is obviously the wrong call for any diacritics, but the right call may also depend on the language.
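To make the "dumb sort" failure concrete, here is a minimal Python sketch. The word list is my own illustration, not something from the article; the only point is that a plain code-point sort puts Ñ (U+00D1) after Z, whereas Spanish alphabetical order places ñ right after n.

```python
# Illustrative word list (an assumption, not taken from the article).
# "\u00d1" is the precomposed letter Ñ, written as an escape to avoid
# any ambiguity about Unicode normalization of the source text.
words = ["Zaragoza", "\u00d1u", "Nube"]

# Python's default string comparison is by code point, with no locale awareness.
by_code_point = sorted(words)

# Ñ (U+00D1) has a higher code point than every ASCII letter,
# so it lands after "Zaragoza" instead of right after "Nube".
print(by_code_point)
```

A locale-aware collation would be needed to get the Spanish dictionary order; this snippet only shows what goes wrong without one.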
Before the Danish language adopted the letter "å" (in 1948), the vowel was written as "aa". In the Danish alphabet, "å" is the last letter. Therefore a list of three Danish city names would be correctly sorted as:
* Albertslund
* Odense
* Aarhus
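The ordering above can be reproduced with a small hand-rolled sort key. This is only a sketch under loud assumptions: `DANISH_ALPHABET` and `danish_key` are names I made up, every "aa" is blindly treated as "å" (real collation rules have exceptions), and characters outside the 29 letters simply sort last.

```python
# Danish alphabet order: a-z, followed by æ, ø, å (å is the last letter).
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}

def danish_key(name):
    """Hypothetical sort key: fold case, then map the old spelling 'aa' to 'å'."""
    # Simplifying assumption: every "aa" is the old spelling of "å".
    folded = name.lower().replace("aa", "å")
    # Unknown characters (spaces, digits, ...) get a rank after 'å'.
    return [RANK.get(ch, len(DANISH_ALPHABET)) for ch in folded]

cities = ["Aarhus", "Odense", "Albertslund"]

danish_order = sorted(cities, key=danish_key)  # Albertslund, Odense, Aarhus
naive_order = sorted(cities)                   # Aarhus, Albertslund, Odense

print(danish_order)
print(naive_order)
```

In practice you would lean on a proper collation library or the platform locale rather than a table like this; the sketch is only meant to show why "Aarhus" ends up at the bottom.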
This feels like material for another Tom Scott video.
Not Tom Scott, but Dylan Beattie has done a handful of interesting talks[1] effectively on "there's no such thing as plain text" which in part covers this sort of thing. In fact, I think your Danish cities list is actually one of his examples.
And to make it more interesting, Sweden also has the letter "å", but it's in the 27th place in the alphabet (followed by "ä" and "ö"). In the Danish/Norwegian alphabet, the letter "å" is the last letter of the alphabet.
And that's why there are a hundred different possible values for LC_COLLATE, and it's completely normal that two popular Unix distributions picked different default values for that setting...right?
It would have been reasonable to conclude the article a third of the way through, and say "sorting is locale-dependent, if what you value is consistent behaviour between different OSs (instead of sorting based on the user's preferences) you need to implement the sorting yourself."
Beyond that, it also depends on what you are sorting and why: should File1.foo come before File005.foo or file020.foo? I've honestly thought about writing my own file manager just to sort files case-insensitively, with sequences of digits padded to the same length, and with case used only as a tiebreaker when names are otherwise identical: lowercase first, then uppercase, at the first difference in the original names.
My worry is that it would perform badly on really large directories... That said, in the places where the default order is a pain, it would be helpful, to say the least.
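For what it's worth, the scheme described above fits in a fairly small sort key. The following is a rough Python sketch, not a tested file-manager implementation: `natural_key` is a hypothetical name, digit runs are compared as integers (which has the same effect as padding them to equal length), and the "lowercase first" tiebreak is approximated with `swapcase()`, which only behaves as intended for ASCII letters.

```python
import re

def natural_key(name):
    """Hypothetical key: case-insensitive, numeric-aware, case as tiebreak."""
    chunks = []
    # Splitting on a capturing group alternates text and digit runs:
    # even indices are text, odd indices are digits.
    for i, chunk in enumerate(re.split(r"(\d+)", name)):
        if chunk == "":
            continue
        if i % 2 == 1:
            # Digit run: compare numerically, so 1 < 5 < 20 regardless of padding.
            chunks.append((0, int(chunk), ""))
        else:
            # Text run: compare case-insensitively.
            chunks.append((1, 0, chunk.casefold()))
    # Tiebreak only matters when the folded chunks are equal. swapcase() makes
    # lowercase letters compare lower than uppercase ones in ASCII order.
    return (chunks, name.swapcase())

files = ["File005.foo", "file020.foo", "File1.foo", "file1.foo"]
ordered = sorted(files, key=natural_key)
print(ordered)
```

On the performance worry: the key is computed once per file name and the sort itself is still O(n log n), so I would not expect this to be the bottleneck in a large directory, though I have not measured it.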
It isn't even just language- or nation-dependent: a single language can have different official sorting orders depending on the context, e.g. phone book vs. dictionary.
And then a lot of languages are used in different countries with different rules.