Chinese character encoding

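Chinese character encoding is needed to store, process, and display in computers the Chinese characters used in the Chinese, Japanese, and Korean languages (collectively CJK). Common Chinese encoding systems include the Guobiao (GB) family of standards (GB 2312, GBK, and GB 18030), used mainly in mainland China, and Big5, used mainly in Taiwan and Hong Kong. Unicode also encodes Chinese characters.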

Text in Guobiao is usually displayed using simplified characters, and text in Big5 is usually displayed using traditional characters. No standard ties an encoding system to the font used to display it, but in practice font and encoding are always paired. For example, traditional Chinese glyphs cannot be mapped onto the GB encoding without compromising the meaning of some characters, because simplification sometimes merged several characters with different meanings and usages into a single, simpler form. Mapping in the many-to-one direction, that is, displaying Big5 text with simplified glyphs, is easy. Mapping one-to-many, assigning traditional glyphs to GB code points, is tricky, because whichever traditional form is chosen will be wrong for some usages. Simplified glyphs can technically be mapped to the Big5 encoding, but such a product would find no profitable market and is practically non-existent. Unlike Unicode, which assigns different code points to simplified and traditional characters, neither Big5 nor GB 2312, the original Guobiao character set, supports both traditional and simplified characters simultaneously.
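The Python sketch below illustrates this with the standard library's `gb2312`, `big5`, and `utf-8` codecs. The same character 中 ("middle") is stored as a different byte sequence in each encoding, and 發, the traditional form of 发, can be encoded in Big5 but not in GB 2312, whereas Unicode gives 发 and 發 separate code points. The helper name `encodable` is hypothetical, introduced only for this example.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# The same character is stored as different bytes in each encoding.
for codec in ("gb2312", "big5", "utf-8"):
    print(codec, "中".encode(codec).hex())
# gb2312 d6d0
# big5 a4a4
# utf-8 e4b8ad

# Unicode assigns the simplified and traditional forms separate code points.
print(f"U+{ord('发'):04X}", f"U+{ord('發'):04X}")

# Check which codecs can encode each form.
for codec in ("gb2312", "big5", "utf-8"):
    print(codec, "发:", encodable("发", codec), "發:", encodable("發", codec))
```

Because UTF-8 can encode any Unicode code point, a single UTF-8 string can hold 发 and 發 side by side. A legacy codec raises `UnicodeEncodeError` for characters outside its repertoire.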

Conversion between traditional and simplified Chinese is usually problematic. Traditional-to-simplified (many-to-one) conversion is simple, but it merges distinct characters, so the opposite conversion cannot reliably recover the original text. Simplified-to-traditional (one-to-many) conversion often requires usage context or common phrases to resolve the resulting ambiguities.
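The asymmetry can be sketched in Python. The tables below are hypothetical and deliberately tiny, covering only a handful of characters chosen for illustration. Production converters such as OpenCC rely on dictionaries with many thousands of entries. `t2s` is a plain per-character lookup. `s2t` first tries the longest matching phrase (here, at most two characters) and otherwise falls back to a default traditional form for each character.

```python
# Hypothetical, tiny tables for illustration only.
T2S = {
    "發": "发",  # emit, develop
    "髮": "发",  # hair
    "頭": "头",
    "麵": "面",  # noodles
    "條": "条",
}

# Default traditional form for each simplified character.
S2T_DEFAULT = {"发": "發", "头": "頭", "面": "面", "条": "條"}

# Phrases that override the per-character default in context.
S2T_PHRASES = {"头发": "頭髮", "面条": "麵條"}


def t2s(text: str) -> str:
    """Traditional to simplified: a per-character many-to-one lookup."""
    return "".join(T2S.get(ch, ch) for ch in text)


def s2t(text: str) -> str:
    """Simplified to traditional: longest phrase match first, then per-character default."""
    max_len = max(len(p) for p in S2T_PHRASES)
    out = []
    i = 0
    while i < len(text):
        # Try phrases from longest to shortest (two characters minimum).
        for n in range(min(max_len, len(text) - i), 1, -1):
            phrase = text[i:i + n]
            if phrase in S2T_PHRASES:
                out.append(S2T_PHRASES[phrase])
                i += n
                break
        else:
            # No phrase matched: fall back to the single-character default.
            out.append(S2T_DEFAULT.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(t2s("頭髮"), t2s("發展"))  # 头发 发展
print(s2t("头发"), s2t("发展"))  # 頭髮 發展
print(s2t(t2s("髮")))  # 發
```

The phrase table resolves 头发 to 頭髮 and leaves 发展 to become 發展 through the default. A lone 髮, however, becomes 发 and then 發 on the way back, which is exactly the information lost in the many-to-one direction.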

Another issue is that many encoding systems lack some characters. Although the missing characters are mostly literary and rarely used, their absence becomes a problem because people's names often contain them. For example, the second character of the given name of the Taiwanese politician Wang Jian-Hsuan is not included in some encoding systems.
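If text must be kept in a legacy encoding, one possible workaround is to write any characters the encoding lacks as numeric character references. Python's standard `xmlcharrefreplace` error handler does this, and `html.unescape` restores the characters. A minimal sketch follows. The function names and the sample string are hypothetical. U+20000, a CJK Extension B character outside the Basic Multilingual Plane, stands in for a rare name character. Storing the text in a Unicode encoding such as UTF-8 or GB 18030 avoids the problem altogether.

```python
import html


def encode_with_fallback(text: str, codec: str) -> bytes:
    """Encode text; characters the codec lacks become &#NNNN; references."""
    return text.encode(codec, errors="xmlcharrefreplace")


def decode_with_fallback(data: bytes, codec: str) -> str:
    """Decode bytes and expand numeric character references back to characters."""
    return html.unescape(data.decode(codec))


# Hypothetical sample: a common character plus U+20000, which is in
# neither Big5 nor GB 2312 (both cover only part of the BMP).
name = "中\U00020000"

legacy = encode_with_fallback(name, "big5")
print(legacy)  # b'\xa4\xa4&#131072;'
print(decode_with_fallback(legacy, "big5") == name)  # True

# GB 18030 maps every Unicode code point, so no fallback is needed.
print(name.encode("gb18030").decode("gb18030") == name)  # True
```

This round trip is only safe when the original text contains no literal `&...;` sequences, because `html.unescape` would expand those as well.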

