
Some Unicode encodings require a char datatype that is 2+ bytes wide, hence a wide char.


Right, so what does the GGP mean by "did not have to go to wide chars"?


UTF-8 can be implemented with plain old 8-bit chars. It does not need wide chars the way UCS-2 does, for example, where every character is 16 bits (2 bytes).

Therefore, by going to UTF-8 instead of UCS-2, we did not have to go to wide chars.
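
To make the size difference concrete, here is a rough sketch in Python. It assumes UTF-16-LE as a stand-in for UCS-2 (the two agree for characters in the Basic Multilingual Plane), and `encoded_sizes` is just a helper name made up for this example.

```python
# Python has no dedicated UCS-2 codec; for characters in the Basic
# Multilingual Plane, UTF-16 produces the same bytes, so it stands in here.
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UCS-2 byte count) for one BMP character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in ("A", "\u00e9", "\u20ac"):  # 'A', 'é', '€'
    utf8_len, ucs2_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} byte(s), UCS-2 = {ucs2_len} bytes")
```

UTF-8 uses one to three 8-bit units for these characters, so an ordinary char array can hold it. UCS-2 always uses one 16-bit unit, which is what pushes you toward a wider character type.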


"wide chars" == wchar_t, sizeof(char) == 1, and on mainstream platforms sizeof(wchar_t) > sizeof(char) (typically 2 or 4).


The code unit of UTF-8 is the byte, so you only deal with bytes and byte sequences. Thus there are no byte-order issues, which you do have to deal with in UTF-16 and UTF-32 (as their code units are 2 and 4 bytes respectively).
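
A quick way to see the byte-order point, sketched in Python with the standard codecs. The sample string and variable names are only illustrative.

```python
text = "h\u00e9"  # 'h' followed by 'é' (U+00E9)

# UTF-8 has a single byte sequence for any given text.
utf8 = text.encode("utf-8")

# UTF-16 and UTF-32 have multi-byte code units, so each one comes in a
# little-endian and a big-endian flavour that differ byte for byte.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")
utf32_le = text.encode("utf-32-le")
utf32_be = text.encode("utf-32-be")

print(utf8.hex(" "))      # 68 c3 a9
print(utf16_le.hex(" "))  # 68 00 e9 00
print(utf16_be.hex(" "))  # 00 68 00 e9
print(utf32_le.hex(" "))  # 68 00 00 00 e9 00 00 00
print(utf32_be.hex(" "))  # 00 00 00 68 00 00 00 e9
```

The reader of a UTF-16 or UTF-32 stream has to know (or detect, e.g. from a byte order mark) which of the two layouts it is looking at. A UTF-8 reader never has that question.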



