Hacker News
dahfizz
on Feb 9, 2022
on:
How UTF-8 Works
Some Unicode encodings require a char datatype that is 2+ bytes wide, hence a wide char.
wolverine876
on Feb 9, 2022
Right, so what does the GGP mean by "did not have to go to wide chars"?
dahfizz
on Feb 9, 2022
UTF-8 can be implemented with plain old 8-bit chars. It does not need wide chars the way UCS-2 does, for example, where every character is 16 bits (2 bytes).
Therefore, by going to UTF-8 instead of UCS-2, we did not have to go to wide chars.
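A rough way to see the difference, sketched in Python. The sample string and the helper name `code_units` are made up for illustration, and as far as I know Python ships no separate UCS-2 codec, so UTF-16 stands in for it here (the two agree for characters in the Basic Multilingual Plane):

```python
def code_units(text, encoding, unit_size):
    """Split the encoded form of `text` into fixed-size code units."""
    data = text.encode(encoding)
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


# Hypothetical sample: characters that take 1, 2 and 3 bytes in UTF-8.
sample = "a\u00e9\u20ac"  # "a", "é", "€"

utf8_units = code_units(sample, "utf-8", 1)      # 8-bit code units
ucs2_units = code_units(sample, "utf-16-le", 2)  # 16-bit code units, no BOM

print("UTF-8 units :", len(utf8_units), utf8_units)
print("16-bit units:", len(ucs2_units), ucs2_units)
```

The UTF-8 side is a sequence of single bytes that fits in ordinary chars, with a variable number of them per character. The 16-bit side needs a 2-byte unit for every character, including plain ASCII.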
cryptonector
on Feb 9, 2022
"wide chars" == wchar_t, sizeof(char) == 1, sizeof(wchar_t) > sizeof(char).
masklinn
on Feb 9, 2022
The code unit of UTF-8 is the byte, so you only deal with bytes and byte sequences. Thus there are no issues of byte order and such, which you do have to deal with in UTF-16 and UTF-32 (as their code units are 2 and 4 bytes respectively).
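To illustrate the byte-order point, a small Python sketch (the sample character is an arbitrary choice): the UTF-16 and UTF-32 encodings of the same character come out differently in little-endian and big-endian form, while UTF-8 has only one byte sequence.

```python
ch = "\u20ac"  # "€", U+20AC, an arbitrary sample character

utf8 = ch.encode("utf-8")
utf16_le, utf16_be = ch.encode("utf-16-le"), ch.encode("utf-16-be")
utf32_le, utf32_be = ch.encode("utf-32-le"), ch.encode("utf-32-be")

print("UTF-8    :", utf8.hex(" "))      # e2 82 ac
print("UTF-16 LE:", utf16_le.hex(" "))  # ac 20
print("UTF-16 BE:", utf16_be.hex(" "))  # 20 ac
print("UTF-32 LE:", utf32_le.hex(" "))  # ac 20 00 00
print("UTF-32 BE:", utf32_be.hex(" "))  # 00 00 20 ac
```

A reader of UTF-16 or UTF-32 data has to know, or detect via a byte order mark, which of the two layouts it was handed. A reader of UTF-8 never has to ask.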