Ask HN: Super-summary to go from "grapheme" to "bytes"?

Is this super-summary correct for understanding how what is shown on the user's screen expands into individual bytes?

1) A user sees some character on their screen => that's a "grapheme", which is a sequence of...
2) ...1 to N "Unicode code points", where a single "Unicode code point" can use...
3) ...1 to 6 "UTF-8" bytes.

Is that right (in the case of UTF-8 storage)? (I feel like I'm missing an intermediate step...)

(Indirectly related to "You can't just assume UTF-8" https://news.ycombinator.com/item?id=40195009 , comment https://news.ycombinator.com/item?id=40206149 , which links to https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ )

Thx :o)
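
For reference, the three levels can be poked at directly in Python. This is only a minimal sketch: `describe` is a hypothetical helper name (not from the linked article), and the input is assumed to already be a single grapheme, since Python's standard library has no grapheme segmentation. One thing it makes visible: UTF-8 as currently specified (RFC 3629) tops out at 4 bytes per code point; the 5- and 6-byte forms existed only in the original design.

```python
import unicodedata


def describe(grapheme: str) -> list[tuple[str, str, bytes]]:
    """Return (code point, Unicode name, UTF-8 bytes) for each code point.

    Hypothetical helper for illustration; the caller is assumed to pass
    a string that is already one user-perceived character.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), ch.encode("utf-8"))
        for ch in grapheme
    ]


# One character on screen: "e" followed by a combining acute accent.
e_acute = "e\u0301"

for code_point, name, utf8 in describe(e_acute):
    # Print the code point, its name, and its UTF-8 bytes in hex.
    print(code_point, name, utf8.hex(" "))

# Total size of the grapheme once stored as UTF-8.
print(len(e_acute.encode("utf-8")), "bytes in total")
```

With `"e\u0301"` this lists two code points (U+0065 and U+0301) taking 1 and 2 bytes respectively, so the single é on screen occupies 3 bytes in UTF-8.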