Convert UTF-8 to Unicode in C

Learn to Transform UTF-8 Text to Unicode in a Snap

A Better Way in C11

If you're a programmer working with strings, you've likely encountered the need to encode or decode UTF-8 text. Traditionally, this conversion was done in C with locale-dependent functions such as mbstowcs() or with hand-written decoders. C11 adds two features that make the job easier: u8 string literals, and the <uchar.h> conversion functions such as mbrtoc16() and mbrtoc32().

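Using the 'u8' Prefix

In C11, u8 is a prefix that applies only to string literals: writing u8"..." tells the compiler to store the literal as UTF-8 bytes, whatever the execution character set is. It does not convert an existing string at run time. To turn those bytes into wide characters you still need a conversion function, for example mbstowcs() (from <stdlib.h>) after a UTF-8 locale has been selected with setlocale():

const char *utf8_string = u8"Hello, world!";  /* stored as UTF-8 bytes */
wchar_t unicode_string[14];                   /* 13 characters plus terminator */
mbstowcs(unicode_string, utf8_string, 14);    /* requires a UTF-8 locale */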
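A locale-independent way to get from UTF-8 bytes to Unicode code points is to decode the bytes by hand. The following is a minimal sketch in C, not a library API: the names utf8_decode_one and utf8_to_utf32 are made up for illustration, and the error handling (reject the whole string on the first bad sequence) is only one of several reasonable policies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one code point from the UTF-8 bytes at s (n bytes available).
 * Stores the code point in *out and returns the number of bytes consumed,
 * or 0 if the sequence is truncated or malformed. */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Smallest code point that legitimately needs 1, 2, 3, or 4 bytes. */
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    uint32_t cp;
    size_t len;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len > n) return 0;          /* sequence cut off by end of input */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return len;
}

/* Decode a NUL-terminated UTF-8 string into dst (room for cap code points).
 * Returns the number of code points written, or (size_t)-1 if the input is
 * invalid or dst is too small. */
size_t utf8_to_utf32(const char *src, uint32_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t left = strlen(src);
    size_t count = 0;

    while (left > 0) {
        uint32_t cp;
        size_t used = utf8_decode_one(s, left, &cp);
        if (used == 0 || count == cap) return (size_t)-1;
        dst[count++] = cp;
        s += used;
        left -= used;
    }
    return count;
}
```

Calling utf8_to_utf32 on the bytes of "h", "é", "€" and an emoji yields four code points, one per character, even though the input is ten bytes long.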

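Converting a UTF-8 Byte Span to a UTF-16 Character Span

In certain scenarios, you may need to convert a UTF-8 encoded read-only byte span to a UTF-16 encoded character span. In C++ (not C), one way to do this is the in() member function of the std::codecvt<char16_t, char, std::mbstate_t> facet, declared in <locale>. Note that this facet specialization is deprecated since C++20. The snippet below also needs <string>, <cwchar>, and <cassert>:

std::string utf8_string = "Hello, world!";
std::u16string unicode_string(utf8_string.size(), u'\0');  // UTF-16 never needs more units than UTF-8 bytes
const auto &conv = std::use_facet<std::codecvt<char16_t, char, std::mbstate_t>>(std::locale::classic());
std::mbstate_t state{};
const char *from_next = nullptr;
char16_t *to_next = nullptr;
auto result = conv.in(state, utf8_string.data(), utf8_string.data() + utf8_string.size(), from_next, &unicode_string[0], &unicode_string[0] + unicode_string.size(), to_next);
assert(result == std::codecvt_base::ok);
unicode_string.resize(to_next - &unicode_string[0]);  // trim to the units actually written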
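If you are staying in plain C, the step that is specific to UTF-16 is encoding each decoded code point as one or two 16-bit units. Here is a small sketch of that step, assuming the code point has already been decoded from UTF-8. The function name utf16_encode_one is hypothetical, not part of any standard library.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
 * Writes 1 or 2 units to out and returns how many were written,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf16_encode_one(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                            /* not a valid scalar value */

    if (cp < 0x10000) {                      /* Basic Multilingual Plane */
        out[0] = (uint16_t)cp;
        return 1;
    }

    cp -= 0x10000;                           /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+20AC (the euro sign) fits in a single unit, while U+1F600 becomes the surrogate pair 0xD83D 0xDE00.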

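Alternative Approach: Reading One Byte at a Time with 'read()'

If reading the whole input into a C string is not an option, you can read one byte at a time with the POSIX read() function (declared in <unistd.h>) and decode the bytes as they arrive:

unsigned char byte;
ssize_t result = read(fd, &byte, 1);  /* 1 = one byte read, 0 = end of file, -1 = error */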
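When bytes arrive one at a time, the decoder has to remember how far into a multi-byte sequence it is. The sketch below shows one way to keep that state in C. The type utf8_stream and the function utf8_stream_push are illustrative names, and for brevity the sketch checks only the lead and continuation byte patterns, not overlong forms or surrogates.

```c
#include <stdint.h>

/* Decoder state carried between calls. Initialize with {0, 0}. */
typedef struct {
    uint32_t cp;        /* code point assembled so far */
    int remaining;      /* continuation bytes still expected */
} utf8_stream;

/* Feed one byte to the decoder.
 * Returns 1 when *out holds a complete code point,
 *         0 when more bytes are needed,
 *        -1 on a malformed sequence (the state is reset). */
int utf8_stream_push(utf8_stream *st, unsigned char byte, uint32_t *out)
{
    if (st->remaining == 0) {
        /* Expecting the first byte of a sequence. */
        if (byte < 0x80) { *out = byte; return 1; }
        if ((byte & 0xE0) == 0xC0)      { st->cp = byte & 0x1F; st->remaining = 1; }
        else if ((byte & 0xF0) == 0xE0) { st->cp = byte & 0x0F; st->remaining = 2; }
        else if ((byte & 0xF8) == 0xF0) { st->cp = byte & 0x07; st->remaining = 3; }
        else return -1;                 /* continuation byte with no lead byte */
        return 0;
    }

    /* Expecting a continuation byte of the form 10xxxxxx. */
    if ((byte & 0xC0) != 0x80) {
        st->remaining = 0;
        return -1;
    }
    st->cp = (st->cp << 6) | (byte & 0x3F);
    if (--st->remaining > 0)
        return 0;

    *out = st->cp;
    return 1;
}
```

A typical caller would loop while read(fd, &byte, 1) returns 1, pass each byte to utf8_stream_push, and handle a code point whenever the function returns 1.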

