
> Fun fact, UTF-8's prefix scheme can cover up to 31 payload bits.

It’d probably be more correct to say that it was originally defined to cover 31 payload bits: you can easily complete the first byte to get 7- and 8-byte sequences (36- and 42-bit payloads, since each continuation byte carries 6 bits).
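For anyone who wants to check the arithmetic, here is a minimal sketch that counts payload bits per sequence length. It assumes the prefix pattern is simply continued past 6 bytes (lead bytes 11111110 and 11111111), which no standard actually defines; `payload_bits` is a made-up helper name.

```python
def payload_bits(n: int) -> int:
    """Payload bits in an n-byte sequence under the UTF-8 prefix scheme.

    Lengths 7 and 8 are a hypothetical continuation of the pattern,
    not part of any UTF-8 specification.
    """
    if not 1 <= n <= 8:
        raise ValueError("sequence length must be between 1 and 8")
    if n == 1:
        return 7  # 0xxxxxxx
    # Lead byte: n one-bits, a zero bit (no room for it when n == 8),
    # and whatever is left over is payload.
    lead_bits = max(0, 8 - (n + 1))
    # Each continuation byte is 10xxxxxx, so it carries 6 payload bits.
    return lead_bits + 6 * (n - 1)


table = {n: payload_bits(n) for n in range(1, 9)}
for n, bits in table.items():
    print(f"{n} byte(s): {bits} payload bits")
```

The 6-byte row is the original 31-bit limit; the 7- and 8-byte rows have no payload bits left in the lead byte, so they are just 6 bits per continuation byte.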

Alternatively, you could reserve the 11111111 leading byte to flag the following bytes as counts (5 bits each, since you’d need a flag bit to indicate whether this was the last one), then put the actual payload afterwards. This would give you an unbounded payload size, though it would make the sequence length dynamic and streamed (whereas currently you can get the entire USV in two fetches, as the first byte tells you exactly how many continuation bytes you need).
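A rough sketch of what that could look like, purely as an illustration of the idea. Everything concrete here is an assumption: count bytes are shaped `10fccccc` (f = 1 means another count byte follows), the count is big-endian and gives the number of payload bytes, and payload bytes are ordinary `10xxxxxx` continuation bytes. The names `encode` and `decode` are hypothetical; this is not any real encoding.

```python
def encode(value: int) -> bytes:
    """Encode a non-negative integer in the hypothetical 0xFF-prefixed scheme."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # Split the value into 6-bit payload groups, most significant first.
    groups = []
    while True:
        groups.append(value & 0x3F)
        value >>= 6
        if value == 0:
            break
    groups.reverse()
    # Split the payload byte count into 5-bit groups, most significant first.
    n = len(groups)
    counts = []
    while True:
        counts.append(n & 0x1F)
        n >>= 5
        if n == 0:
            break
    counts.reverse()
    out = bytearray([0xFF])  # reserved lead byte
    for i, c in enumerate(counts):
        more = 0x20 if i < len(counts) - 1 else 0  # flag bit: more counts follow
        out.append(0x80 | more | c)
    out.extend(0x80 | g for g in groups)
    return bytes(out)


def decode(data: bytes) -> int:
    """Decode one value; the length is only known after reading the count bytes."""
    if not data or data[0] != 0xFF:
        raise ValueError("missing 0xFF lead byte")
    pos = 1
    n = 0
    while True:  # streamed: read count bytes until the flag bit is clear
        if pos >= len(data) or data[pos] & 0xC0 != 0x80:
            raise ValueError("truncated or malformed count byte")
        b = data[pos]
        pos += 1
        n = (n << 5) | (b & 0x1F)
        if not b & 0x20:
            break
    payload = data[pos:pos + n]
    if len(payload) != n:
        raise ValueError("truncated payload")
    value = 0
    for b in payload:
        if b & 0xC0 != 0x80:
            raise ValueError("malformed payload byte")
        value = (value << 6) | (b & 0x3F)
    return value
```

Note how `decode` cannot know how many bytes to fetch up front: it has to walk the count bytes first, which is the streaming cost mentioned above.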



Yeah, the current definition in RFC 3629 is restricted to 4 octets. Really interesting to see the history of the ranges UTF-8 was able to cover.



