Learn to Transform UTF-8 Text to Unicode in a Snap
Introducing a Better Way in C++11
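If you're a programmer working with strings, you've likely needed to encode or decode UTF-8 text. Traditionally, this meant hand-written decoding loops or locale-dependent C functions such as mbstowcs(). (strtol() is sometimes mentioned in this context, but it only parses integers and performs no encoding conversion.) C++11 introduces a more streamlined set of tools: Unicode string literals, the char16_t and char32_t types, and std::codecvt facets that convert between them and UTF-8.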
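Using the 'u8' Prefix
In C++11, u8 is a prefix on a string literal, not a function or a suffix. It guarantees that the literal is stored as UTF-8, whatever the compiler's execution character set is. It works only on literals written in the source code and cannot convert a string that exists at run time. The related prefixes u and U produce UTF-16 and UTF-32 literals. For example, the following code stores "Hello, world!" once as UTF-8 and once as UTF-16 (in C++20 a u8 literal has type const char8_t[], so the first pointer would need to be const char8_t *):
const char *utf8_string = u8"Hello, world!";
const char16_t *utf16_string = u"Hello, world!";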
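To make the difference between the three literal prefixes concrete, here is a minimal sketch that counts the code units each encoding needs for the same character. The helper name code_units and the demo function check_literal_sizes are hypothetical, introduced only for illustration; the sketch assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the null terminator. Works for char, char8_t (C++20),
// char16_t, and char32_t literals alike.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+1F600 lies outside the Basic Multilingual Plane, so it needs
// 4 UTF-8 code units, 2 UTF-16 code units (a surrogate pair),
// and 1 UTF-32 code unit.
void check_literal_sizes() {
    assert(code_units(u8"\U0001F600") == 4);
    assert(code_units(u"\U0001F600") == 2);
    assert(code_units(U"\U0001F600") == 1);
}
```

The prefixes only choose how the compiler encodes the literal. Converting text that arrives at run time still needs an explicit conversion step.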
Encoding UTF-8 Byte Span to UTF-16 Character Span
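In certain scenarios, you may need to convert a read-only span of UTF-8 bytes into a span of UTF-16 code units. One standard-library option is std::codecvt<char16_t, char, std::mbstate_t>. Note that std::codecvt is a facet class template, not a function; its in() member performs the conversion. The facet's destructor is protected, so obtain it from a locale with std::use_facet rather than constructing it directly. This specialization is deprecated as of C++20.
#include <cassert>
#include <cwchar>
#include <locale>
#include <string>

const std::string utf8_string = "Hello, world!";
std::u16string utf16_string(utf8_string.size(), u'\0'); // at most one UTF-16 unit per UTF-8 byte
const auto &conv = std::use_facet<std::codecvt<char16_t, char, std::mbstate_t>>(std::locale::classic());
const char *utf8_begin = utf8_string.data();
const char *utf8_end = utf8_begin + utf8_string.size();
const char *utf8_next = nullptr;
char16_t *utf16_begin = &utf16_string[0];
char16_t *utf16_next = nullptr;
std::mbstate_t state{}; // must be non-const and zero-initialized
auto result = conv.in(state, utf8_begin, utf8_end, utf8_next, utf16_begin, utf16_begin + utf16_string.size(), utf16_next);
assert(result == std::codecvt_base::ok && utf8_next == utf8_end);
utf16_string.resize(utf16_next - utf16_begin); // drop unused capacity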
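The conversion can also be wrapped in a reusable function. The following is a minimal sketch, not a definitive implementation: the name utf8_to_utf16 and the choice to throw std::runtime_error on bad input are assumptions made for illustration. It relies on the std::codecvt<char16_t, char, std::mbstate_t> facet, which the standard requires every locale to provide but which is deprecated as of C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>      // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a UTF-8 byte string to UTF-16.
// Throws std::runtime_error if the input is malformed or truncated.
std::u16string utf8_to_utf16(const std::string &utf8) {
    using Facet = std::codecvt<char16_t, char, std::mbstate_t>;
    // The facet cannot be constructed directly (protected destructor),
    // so borrow the instance owned by the classic "C" locale.
    const Facet &conv = std::use_facet<Facet>(std::locale::classic());

    // UTF-16 never needs more code units than the UTF-8 input has bytes.
    std::u16string utf16(utf8.size(), u'\0');
    if (utf8.empty()) return utf16;

    std::mbstate_t state{};  // zero-initialized conversion state
    const char *from_begin = utf8.data();
    const char *from_end = from_begin + utf8.size();
    const char *from_next = nullptr;
    char16_t *to_begin = &utf16[0];
    char16_t *to_end = to_begin + utf16.size();
    char16_t *to_next = nullptr;

    const std::codecvt_base::result result =
        conv.in(state, from_begin, from_end, from_next,
                to_begin, to_end, to_next);
    if (result != std::codecvt_base::ok || from_next != from_end) {
        throw std::runtime_error("invalid or truncated UTF-8 input");
    }

    // Sanity check, then trim the buffer to the units actually written.
    assert(to_next >= to_begin && to_next <= to_end);
    utf16.resize(static_cast<std::size_t>(to_next - to_begin));
    return utf16;
}
```

A call such as utf8_to_utf16("Hello, world!") yields a std::u16string equal to u"Hello, world!". Sizing the output buffer up front avoids a retry loop, at the cost of a temporary over-allocation for non-ASCII text.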
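Alternative Approach: Reading One Byte at a Time with 'read()'
If loading the whole input into a C string first is not an option, you can read it one byte at a time with the POSIX read() function (declared in <unistd.h>) and decode as the bytes arrive. read() returns an ssize_t, not an int:
unsigned char byte;
ssize_t result = read(fd, &byte, 1); // 1: got a byte, 0: EOF, -1: error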
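Reading single bytes only helps if something assembles them into code points. The sketch below shows one way this could look, assuming a POSIX system. Utf8Decoder and decode_fd are hypothetical names, and the decoder is deliberately simplified: it checks lead and continuation byte patterns but does not reject overlong encodings or surrogate code points, and it treats an interrupted read() as an error.

```cpp
#include <cassert>
#include <cstdint>
#include <unistd.h>   // POSIX read(), ssize_t
#include <vector>

// Hypothetical incremental decoder: feed it one byte at a time.
struct Utf8Decoder {
    std::uint32_t code_point = 0;
    int remaining = 0;  // continuation bytes still expected

    // Returns 1 when code_point is complete, 0 when more bytes are
    // needed, and -1 on a malformed sequence.
    int feed(unsigned char byte) {
        assert(remaining >= 0 && remaining <= 3);
        if (remaining == 0) {
            if (byte < 0x80)           { code_point = byte; return 1; }
            if ((byte & 0xE0) == 0xC0) { code_point = byte & 0x1F; remaining = 1; return 0; }
            if ((byte & 0xF0) == 0xE0) { code_point = byte & 0x0F; remaining = 2; return 0; }
            if ((byte & 0xF8) == 0xF0) { code_point = byte & 0x07; remaining = 3; return 0; }
            return -1;  // stray continuation byte or invalid lead byte
        }
        if ((byte & 0xC0) != 0x80) { remaining = 0; return -1; }
        code_point = (code_point << 6) | (byte & 0x3F);
        --remaining;
        return remaining == 0 ? 1 : 0;
    }
};

// Read fd until end of file, one byte per read() call, appending each
// decoded code point to out. Returns false on a read error, malformed
// UTF-8, or a sequence cut off by end of file.
bool decode_fd(int fd, std::vector<std::uint32_t> &out) {
    Utf8Decoder decoder;
    unsigned char byte;
    ssize_t n;
    while ((n = read(fd, &byte, 1)) == 1) {
        const int status = decoder.feed(byte);
        if (status < 0) return false;
        if (status == 1) out.push_back(decoder.code_point);
    }
    return n == 0 && decoder.remaining == 0;
}
```

One system call per byte is slow for large inputs. Reading into a larger buffer and feeding the decoder from that buffer keeps the same decoding logic with far fewer calls.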